
Statistical Mechanics I

Peter S. Riseborough
November 15, 2011

Contents

1 Introduction

2 Thermodynamics
  2.1 The Foundations of Thermodynamics
  2.2 Thermodynamic Equilibrium
  2.3 The Conditions for Equilibrium
  2.4 The Equations of State
  2.5 Thermodynamic Processes
  2.6 Thermodynamic Potentials
  2.7 Thermodynamic Stability

3 Foundations of Statistical Mechanics
  3.1 Phase Space
  3.2 Trajectories in Phase Space
  3.3 Conserved Quantities and Accessible Phase Space
  3.4 Macroscopic Measurements and Time Averages
  3.5 Ensembles and Averages over Phase Space
  3.6 Liouville's Theorem
  3.7 The Ergodic Hypothesis
  3.8 Equal a priori Probabilities
  3.9 The Physical Significance of Entropy

4 The Micro-Canonical Ensemble
  4.1 Classical Harmonic Oscillators
  4.2 An Ideal Gas of Indistinguishable Particles
  4.3 Spin One-half Particles
  4.4 The Einstein Model of a Crystalline Solid
  4.5 Vacancies in a Crystal

5 The Canonical Ensemble
  5.1 The Boltzmann Distribution Function
  5.2 The Equipartition Theorem
  5.3 The Ideal Gas
  5.4 The Entropy of Mixing
  5.5 The Einstein Model of a Crystalline Solid
  5.6 Vacancies in a Crystal
  5.7 Quantum Spins in a Magnetic Field
  5.8 Interacting Ising Spin One-half Systems
  5.9 Density of States of Elementary Excitations
  5.10 The Debye Model of a Crystalline Solid
  5.11 Electromagnetic Cavities
  5.12 Energy Fluctuations
  5.13 The Boltzmann Distribution from Entropy Maximization
  5.14 The Gibbs Ensemble
  5.15 A Flexible Polymer

6 The Grand-Canonical Ensemble
  6.1 The Ideal Gas
  6.2 Fluctuations in the Number of Particles
  6.3 Energy Fluctuations in the Grand-Canonical Ensemble

7 Quantum Statistical Mechanics
  7.1 Quantum Microstates and Measurements
  7.2 The Density Operator and Thermal Averages
  7.3 Indistinguishable Particles
  7.4 The Spin-Statistics Theorem and Composite Particles
  7.5 Second Quantization

8 Fermi-Dirac Statistics
  8.1 Non-Interacting Fermions
  8.2 The Fermi-Dirac Distribution Function
  8.3 The Equation of State
  8.4 The Chemical Potential
  8.5 The Sommerfeld Expansion
  8.6 The Low-Temperature Specific Heat of an Electron Gas
  8.7 The Pauli Paramagnetic Susceptibility of an Electron Gas
  8.8 The High-Temperature Limit of the Susceptibility
  8.9 The Temperature-dependence of the Pressure of a Gas of Non-Interacting Fermions
  8.10 Fluctuations in the Occupation Numbers

9 Bose-Einstein Statistics
  9.1 Non-Interacting Bosons
  9.2 The Bose-Einstein Distribution Function
  9.3 The Equation of State for Non-Interacting Bosons
  9.4 The Fugacity at High Temperatures
  9.5 Fluctuations in the Occupation Numbers
  9.6 Bose-Einstein Condensation
  9.7 Superfluidity
  9.8 The Superfluid Velocity and Vortices

10 Phase Transitions
  10.1 Phase Transitions and Singularities
  10.2 The Mean-Field Approximation for an Ising Magnet
  10.3 The Landau-Ginzberg Free-Energy Functional
  10.4 Critical Phenomena
  10.5 Mean-Field Theory
  10.6 The Gaussian Approximation
  10.7 The Renormalization Group Technique
  10.8 Collective Modes and Symmetry Breaking
  10.9 Appendix: The One-Dimensional Ising Model
1 Introduction

Real materials are composed of a huge number of particles. For example, one cubic centimeter of copper or one liter of water contains about 10^23 atoms. The enormity of the number of degrees of freedom prevents one from being able to either determine or store the initial conditions, let alone solve the equations of motion. Hence, a detailed microscopic description appears impossible. Nevertheless, the equilibrium states of such materials can be defined by relatively few macroscopic quantities, such as temperature, pressure, volume, etc. These quantities reflect the collective properties of the constituents of the material, but can still be measured quite directly by macroscopic means. Likewise, certain non-equilibrium states of the material can also be described by a few easily measured quantities, such as the voltage drop across, or the electrical current flowing through, an electrical circuit element. Often, simple laws emerge between the macroscopic quantities that describe the properties of these complex systems. The subject of Thermodynamics is devoted to revealing relations, sometimes expected and sometimes unexpected, between the macroscopic quantities describing materials. Statistical Mechanics provides statistically based methods which bridge the gap between the physics of the individual particles that comprise the materials and the simple thermodynamic laws that describe the macroscopic properties of many-particle systems.

2 Thermodynamics

Thermodynamics is a branch of science that does not assert new fundamental principles but, instead, predicts universal relations between measurable quantities that characterize macroscopic systems. Specifically, thermodynamics involves the study of macroscopic coordinates which can be expressed in terms of an extremely large number of microscopic degrees of freedom, and which describe the macroscopic states of systems.

2.1 The Foundations of Thermodynamics

Macroscopic measurements have the attributes that they involve large numbers of microscopic degrees of freedom (such as the positions and momenta of 10^9 atoms) and are measured over extremely long time scales compared with the time scales describing the microscopic degrees of freedom (of the order of 10^-7 seconds). In general, for sufficiently large systems and when averaged over sufficiently long time scales, the fluctuations of the macroscopic variables are extremely small, so only the average values need be retained.

Typical macroscopic variables are the internal energy U, the number of particles N and the volume of the system V. The internal energy U is a precisely defined quantity which, in the absence of interactions between the system and its environment, is also a conserved quantity. For systems which contain particles that do not undergo reactions, the number of particles N is also a well-defined and conserved quantity. The volume V of a system can be measured to within reasonable accuracy. A measurement of the volume usually occurs over time scales that are long enough that the long-wavelength fluctuations of the atomic positions of the system's boundaries or walls are averaged over.
Thermodynamic measurements are usually indirect, and usually involve externally imposed constraints. Systems which are constrained such that they cannot exchange energy, volume or number of particles with their environments are said to be closed. Since one usually only measures changes in the internal energy of a system ΔU, such measurements necessarily involve the system's environment and the assumption that energy is conserved. Such measurements can be performed by doing electrical or mechanical work on the system, while preventing energy in the form of heat from flowing into or out of the system. The absence of heat flow is ensured by utilizing boundaries which are impermeable to heat flow; such boundaries are known as adiabatic. Alternatively, one may infer the change in internal energy of a system by putting it in contact with other systems that are monitored and that have been calibrated so that their changes in internal energy can be found. For any process, the increase in the internal energy ΔU can be expressed as the sum of the heat absorbed by the system ΔQ and the work done on the system ΔW

   ΔU = ΔQ + ΔW    (1)

where N is being held constant. This equation is a statement of the conservation of energy. This basic conservation law, in its various forms, forms an important principle of thermodynamics.

2.2 Thermodynamic Equilibrium

Given a macroscopic system, experience shows that the system will evolve to a state in which the macroscopic properties are determined by intrinsic factors and not by any external influences that had previously been exerted on the system. The final states, by definition, are independent of time and are known as equilibrium states.

Postulate I

It is postulated that, in equilibrium, the macroscopic states of a system can be characterized by a set of macroscopic variables. These variables may include variables taken from the set {U, V, N} together with any other macroscopic variables that must be added to the set in order to describe the equilibrium state uniquely.

For example, in a ferromagnet this set may be extended by adding the total magnetic moment M of the sample. Thus, for a ferromagnet one might specify the equilibrium state by the macroscopic variables {U, N, V, M}. Another example is given by a gas containing r different species of atoms, in which case the set of macroscopic variables should be extended to include the number of atoms of each species {N_1, N_2, ..., N_r}. Due to the constraint

   N = Σ_{i=1}^{r} N_i

the total number N of atoms should no longer be considered as an independent variable.
The set of variables {U, V, N, M, ...} are extensive variables, since they scale with the size of the system. This definition can be made more precise as follows: Consider a homogeneous system that is in thermal equilibrium. The value of the variable X for the equilibrated system is denoted by X_0. Then the variable X is extensive if, when one considers the system as being composed of λ identical subsystems (λ > 1), the value of the variable X for each subsystem is equal to λ^{-1} X_0. This definition assumes that the subsystems are sufficiently large that the fluctuations ΔX of X are negligibly small.
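This scaling definition of extensivity can be checked numerically for a concrete fundamental relation. The sketch below is illustrative only: it uses the ideal-gas (Sackur-Tetrode) entropy, with Boltzmann's constant set to one and all mass and Planck-constant factors absorbed into an arbitrary constant C0 (an assumed simplification); the function name `entropy` is hypothetical.

```python
from math import log

C0 = 2.5  # Sackur-Tetrode additive constant; its exact value is immaterial here

def entropy(U, V, N):
    """Ideal-gas (Sackur-Tetrode) entropy with k_B = 1 and all
    mass/Planck-constant factors absorbed into C0."""
    return N * (log(V / N) + 1.5 * log(U / N) + C0)

# Extensivity: scaling all extensive variables by lam scales S by lam,
# so each of the lam identical subsystems carries S0/lam.
U0, V0, N0 = 3.0, 2.0, 1.0
S0 = entropy(U0, V0, N0)
for lam in (2.0, 5.0, 10.0):
    assert abs(entropy(lam * U0, lam * V0, lam * N0) - lam * S0) < 1e-9
```

Any first-order homogeneous function of (U, V, N) passes the same test; a term like U²/V would break it.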
The extensive variables {U, V, N} that we have introduced so far all have mechanical significance. There are extensive variables that only have thermodynamic significance, and these variables can also be used to characterize equilibrium states. One such quantity is the entropy S.
Postulate II
The entropy S is defined only for equilibrium states, and takes on a value
which is uniquely defined by the state. That is, S is a single-valued function
S(U, V, N ) of the mechanical extensive variables. The entropy has the property
that it is maximized in an equilibrium state, with respect to the variation of
hypothetical internal constraints. The constraint must be designed so that, in
the absence of the constraint, the system is free to select any one of a number
of states each of which may be realized in the presence of the constraint. If the
hypothetical internal constraint characterized by the variable x is imposed on the
system, then the entropy of the system depends on the constraint through x and
can be denoted by S(x). The maximum value of the entropy of the unconstrained
system S is given by the maximum value of S(x) found when x is varied over
all possible values.
The function S(U, V, N) for a system is known as the fundamental relation, since all conceivable thermodynamic information about the system can be obtained from it.
Postulate III
The entropy of a system is not only an extensive variable; the entropy of a composite system is also the sum of the entropies of its components. The entropy is a continuous, differentiable and monotonically increasing function of the energy.

Postulate III ensures that when the absolute temperature T is defined for an equilibrium state, then T will be positive.
Postulate IV

The entropy of a system vanishes in the limit where

   ( ∂S/∂U )_{V,N} → ∞    (2)

The above condition identifies a state for which the absolute temperature approaches the limiting value T → 0. Postulate IV is equivalent to Nernst's postulate that the entropy takes on a universal value when T → 0. The above form of the postulate defines the universal value of the entropy to be zero.

2.3 The Conditions for Equilibrium

The above postulates allow the conditions for equilibrium to be expressed in terms of the derivatives of the entropy. Since entropy is an extensive quantity, its derivatives with respect to other extensive quantities will be intensive. That is, the derivatives are independent of the size of the system. The derivatives will be used to define intensive thermodynamic variables.
First, we shall make use of the postulate that entropy is a single-valued, monotonically increasing function of energy. This implies that the equation for the entropy

   S = S(U, V, N)    (3)

can be inverted to yield the energy as a function of entropy

   U = U(S, V, N)    (4)

This inversion may be difficult to perform if one is presented with a general expression for the function, but if the function is presented graphically the inversion is achieved by simply interchanging the axes. Consider making infinitesimal changes of the independent extensive variables (S, V, N); then the energy U(S, V, N) will change by an amount given by

   dU = ( ∂U/∂S )_{V,N} dS + ( ∂U/∂V )_{S,N} dV + ( ∂U/∂N )_{S,V} dN    (5)

The three quantities

   ( ∂U/∂S )_{V,N} ,  ( ∂U/∂V )_{S,N} ,  ( ∂U/∂N )_{S,V}    (6)

are intensive since, if a system in equilibrium is considered to be divided into identical subsystems, the values of these parameters for each of the subsystems are the same as for the combined system. These three quantities define the energy intensive parameters. A quantity is intensive if its value is independent of the scale of the system, that is, its value is independent of the amount of matter used in its measurement. The intensive quantities can be identified as follows:
By considering a process in which S and N are kept constant and V is allowed to change, one has

   dU = ( ∂U/∂V )_{S,N} dV    (7)

which, when considered in terms of mechanical work ΔW, leads to the identification

   ( ∂U/∂V )_{S,N} = − P    (8)

where P is the mechanical pressure. Likewise, when one considers a process in which V and N are kept constant and S is allowed to change, one has

   dU = ( ∂U/∂S )_{V,N} dS    (9)

which, when considered in terms of heat flow ΔQ, leads to the identification

   ( ∂U/∂S )_{V,N} = T    (10)

where T is the absolute temperature. Finally, on varying N, one has

   dU = ( ∂U/∂N )_{S,V} dN    (11)

which leads to the identification

   ( ∂U/∂N )_{S,V} = μ    (12)

where μ is the chemical potential. Thus, one obtains a relation between the infinitesimal changes of the extensive variables

   dU = T dS − P dV + μ dN    (13)

This is an expression of the conservation of energy.


Direct consideration of the entropy leads to the identification of the entropic intensive parameters. Consider making infinitesimal changes of the independent extensive variables (U, V, N); then the entropy S(U, V, N) will change by an amount given by

   dS = ( ∂S/∂U )_{V,N} dU + ( ∂S/∂V )_{U,N} dV + ( ∂S/∂N )_{U,V} dN    (14)

The values of the coefficients of the infinitesimal quantities are the entropic intensive parameters. By a suitable rearrangement of eqn(13) as

   dS = (1/T) dU + (P/T) dV − (μ/T) dN    (15)

one finds that the entropic intensive variables are given by

   ( ∂S/∂U )_{V,N} = 1/T ,  ( ∂S/∂V )_{U,N} = P/T ,  ( ∂S/∂N )_{U,V} = − μ/T    (16)

where T is the absolute temperature, P is the pressure and μ is the chemical potential.
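The entropic intensive parameters of eqn(16) can be illustrated by numerically differentiating an assumed fundamental relation. The sketch below uses the ideal-gas (Sackur-Tetrode) entropy with k_B = 1 and a generic central-difference helper (all names are hypothetical); the derivatives recover the familiar ideal-gas results U = (3/2) N T and P V = N T.

```python
from math import log

def entropy(U, V, N):
    # Sackur-Tetrode form with k_B = 1 and constants absorbed into 2.5
    return N * (log(V / N) + 1.5 * log(U / N) + 2.5)

def partial(f, args, i, h=1e-6):
    """Central finite-difference partial derivative of f w.r.t. args[i]."""
    a_plus = list(args); a_plus[i] += h
    a_minus = list(args); a_minus[i] -= h
    return (f(*a_plus) - f(*a_minus)) / (2 * h)

U, V, N = 3.0, 2.0, 1.0
inv_T = partial(entropy, (U, V, N), 0)      # (dS/dU)_{V,N} = 1/T
P_over_T = partial(entropy, (U, V, N), 1)   # (dS/dV)_{U,N} = P/T
T = 1.0 / inv_T
P = P_over_T * T
# For this fundamental relation, U = (3/2) N T and P V = N T:
assert abs(U - 1.5 * N * T) < 1e-4
assert abs(P * V - N * T) < 1e-4
```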
The conditions for equilibrium can be obtained from Postulate II and Postulate III, which state that the entropy is maximized in equilibrium and is additive. We shall consider a closed system composed of two systems in contact. System 1 is described by the extensive parameters {U1, V1, N1} and system 2 is described by {U2, V2, N2}. The total energy U_T = U1 + U2, volume V_T = V1 + V2 and number of particles N_T = N1 + N2 are fixed. The total entropy of the combined system S_T is given by

   S_T = S1(U1, V1, N1) + S2(U2, V2, N2)    (17)

and is a function of the variables {U1, V1, N1, U2, V2, N2}.


Heat Flow and Temperature

If one allows energy to be exchanged between the two systems, keeping the total energy U_T = U1 + U2 constant, then

   dS_T = ( ∂S1/∂U1 )_{V1,N1} dU1 + ( ∂S2/∂U2 )_{V2,N2} dU2    (18)

Since U_T is kept constant, one has dU1 = − dU2. Therefore, the change in the total entropy is given by

   dS_T = [ ( ∂S1/∂U1 )_{V1,N1} − ( ∂S2/∂U2 )_{V2,N2} ] dU1    (19)

Figure 1: An isolated system composed of two subsystems.


Furthermore, in equilibrium S_T is maximized with respect to the internal partitioning of the energy, so one has

   dS_T = 0    (20)

For this to be true, independent of the value of dU1, one must satisfy the condition

   ( ∂S1/∂U1 )_{V1,N1} = ( ∂S2/∂U2 )_{V2,N2}    (21)

or, equivalently

   1/T1 = 1/T2    (22)

Thus, the condition that two systems, which can only exchange internal energy by heat flow, are in thermal equilibrium is simply the condition that the temperatures of the two systems must be equal, T1 = T2.
Let us consider the same closed system, but one in which the two bodies are initially not in thermal contact with each other. Since the two systems are isolated, they are each in a state of equilibrium, but they may have different temperatures. However, if the two systems are put in thermal contact, the adiabatic constraint is removed and they will no longer be in thermal equilibrium. The system will evolve by exchanging energy, in the form of heat, between the two systems, and a new equilibrium state will be established. The new equilibrium state, obtained by removing the internal constraint, will have a larger entropy. Hence, for the two equilibrium states which differ infinitesimally in the partitioning of the energy, dS_T > 0 and

   dS_T = [ ( ∂S1/∂U1 )_{V1,N1} − ( ∂S2/∂U2 )_{V2,N2} ] dU1 > 0    (23)

or

   ( 1/T1 − 1/T2 ) dU1 > 0    (24)

This inequality shows that heat flows from systems with higher temperatures
to systems with lower temperatures, in agreement with expectations.
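This conclusion can be illustrated with a toy relaxation: repeatedly transfer a small amount of energy down the temperature gradient between two ideal-gas subsystems (k_B = 1; the entropy constants are dropped since only entropy differences matter). The step rule and all names below are assumptions made purely for illustration; the total entropy never decreases, and the temperatures equalize.

```python
from math import log

def S(U, V, N):
    # ideal-gas entropy, k_B = 1, additive constants dropped
    return N * (log(V / N) + 1.5 * log(U / N))

def temp(U, N):
    # from (dS/dU)_{V,N} = 1/T for this relation: T = 2U/(3N)
    return 2.0 * U / (3.0 * N)

U1, V1, N1 = 9.0, 1.0, 1.0   # hot subsystem
U2, V2, N2 = 3.0, 1.0, 1.0   # cold subsystem
S_tot = S(U1, V1, N1) + S(U2, V2, N2)
for _ in range(10000):
    # move a small amount of heat from the hotter to the colder body
    dU = 1e-3 * (temp(U1, N1) - temp(U2, N2))
    U1 -= dU
    U2 += dU
    S_new = S(U1, V1, N1) + S(U2, V2, N2)
    assert S_new >= S_tot - 1e-15   # the total entropy never decreases
    S_tot = S_new
assert abs(temp(U1, N1) - temp(U2, N2)) < 1e-3   # temperatures equalize
```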
Work and Pressure

Consider a system composed of two sub-systems, which are in contact and which can exchange energy and also exchange volume. System 1 is described by the extensive parameters {U1, V1, N1} and system 2 is described by {U2, V2, N2}. The total energy is fixed, as is the total volume. The energies and volumes of the sub-systems satisfy

   U_T = U1 + U2 ,  V_T = V1 + V2    (25)

and N1 and N2 are kept constant. For an equilibrium state, one can consider constraints that result in different partitionings of the energy and volume. The entropy of the total system is additive

   S_T = S1(U1, V1, N1) + S2(U2, V2, N2)    (26)

The infinitesimal change in the total entropy S_T found by making infinitesimal changes in U1 and V1 is given by

   dS_T = [ ( ∂S1/∂U1 )_{V1,N1} − ( ∂S2/∂U2 )_{V2,N2} ] dU1 + [ ( ∂S1/∂V1 )_{U1,N1} − ( ∂S2/∂V2 )_{U2,N2} ] dV1    (27)

since dU1 = − dU2 and dV1 = − dV2. Thus, on using the definitions for the intensive parameters of the sub-systems, one has

   dS_T = ( 1/T1 − 1/T2 ) dU1 + ( P1/T1 − P2/T2 ) dV1    (28)

Since the equilibrium state is that in which S_T is maximized with respect to the variations dU1 and dV1, one has dS_T = 0, which leads to the conditions

   1/T1 = 1/T2 ,  P1/T1 = P2/T2    (29)

Hence, the pressure and temperature of the two sub-systems are equal in the equilibrium state.
Furthermore, if the systems are initially in their individual equilibrium states but are not in equilibrium with each other, then they will ultimately come into thermodynamic equilibrium with each other. If the temperatures of the two subsystems are equal but the initial pressures of the two systems are not equal, then the change in entropy that occurs is given by

   dS_T = ( P1 − P2 ) dV1 / T > 0    (30)

Since dS_T > 0, one finds that if P1 > P2 then dV1 > 0. That is, the system at higher pressure will expand and the system at lower pressure will contract.
Matter Flow and Chemical Potential

The above reasoning can be extended to a system with fixed total energy, volume and number of particles, which is decomposed into two sub-systems that exchange energy, volume and number of particles. Since dU1 = − dU2, dV1 = − dV2 and dN1 = − dN2, one finds that an infinitesimal change in the extensive variables leads to an infinitesimal change in the total entropy, which is given by

   dS_T = [ ( ∂S1/∂U1 )_{V1,N1} − ( ∂S2/∂U2 )_{V2,N2} ] dU1
        + [ ( ∂S1/∂V1 )_{U1,N1} − ( ∂S2/∂V2 )_{U2,N2} ] dV1
        + [ ( ∂S1/∂N1 )_{U1,V1} − ( ∂S2/∂N2 )_{U2,V2} ] dN1

        = ( 1/T1 − 1/T2 ) dU1 + ( P1/T1 − P2/T2 ) dV1 − ( μ1/T1 − μ2/T2 ) dN1    (31)

Since the total entropy is maximized in equilibrium with respect to the internal constraints, one has dS_T = 0, which for equilibrium in the presence of a particle exchange process yields the condition

   μ1/T1 = μ2/T2    (32)

On the other hand, if the systems initially have chemical potentials that differ infinitesimally from each other, then

   dS_T = ( μ2 − μ1 ) dN1 / T > 0    (33)

Hence, if μ2 > μ1 then dN1 > 0. Therefore, particles flow from regions of higher chemical potential to regions of lower chemical potential.

Thus, two systems which are allowed to exchange energy, volume and particles have to satisfy the conditions

   T1 = T2 ,  P1 = P2 ,  μ1 = μ2    (34)

if they are in equilibrium.

2.4 The Equations of State

The fundamental relation S(U, V, N), or alternatively U(S, V, N), provides a complete thermodynamic description of a system. From the fundamental relation one can derive three equations of state. The expressions for the intensive parameters are equations of state

   T = T(S, V, N) ,  P = P(S, V, N) ,  μ = μ(S, V, N)    (35)

These particular equations relate the intensive parameters to the independent extensive parameters. If all three equations of state are not known, then one has an incomplete thermodynamic description of the system.

If one knows all three equations of state, one can construct the fundamental relation and, hence, one has a complete thermodynamic description of the system. This can be seen by considering the extensive nature of the fundamental relation and its behavior under a change of scale by s. The fundamental equation is homogeneous and of first order, so

   U(sS, sV, sN) = s U(S, V, N)    (36)
Differentiating the above equation w.r.t. s yields

   ( ∂U/∂(sS) )_{sV,sN} d(sS)/ds + ( ∂U/∂(sV) )_{sS,sN} d(sV)/ds + ( ∂U/∂(sN) )_{sS,sV} d(sN)/ds

   = ( ∂U/∂(sS) )_{sV,sN} S + ( ∂U/∂(sV) )_{sS,sN} V + ( ∂U/∂(sN) )_{sS,sV} N = U(S, V, N)

which, on setting s = 1, yields the Euler Equation

   ( ∂U/∂S )_{V,N} S + ( ∂U/∂V )_{S,N} V + ( ∂U/∂N )_{S,V} N = U    (37)

which, when expressed in terms of the intensive parameters, becomes

   T S − P V + μ N = U    (38)

In the entropy representation, one finds the Euler equation in the form

   (1/T) U + (P/T) V − (μ/T) N = S    (39)

which has exactly the same content as the Euler equation found from the energy representation. From either of these equations, it follows that knowledge of the three equations of state can be used to find the fundamental relation.
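The Euler equation (38) can be checked numerically for a concrete fundamental relation. The sketch below assumes the ideal-gas (Sackur-Tetrode) entropy with k_B = 1 and constants absorbed into the additive term, obtains T, P and μ from eqn(16) by central finite differences, and verifies U = T S − P V + μ N; all names are illustrative.

```python
from math import log

def entropy(U, V, N):
    # ideal-gas entropy, k_B = 1, constants absorbed into 2.5
    return N * (log(V / N) + 1.5 * log(U / N) + 2.5)

def grad(f, args, i, h=1e-6):
    # central finite-difference partial derivative
    a, b = list(args), list(args)
    a[i] += h
    b[i] -= h
    return (f(*a) - f(*b)) / (2 * h)

U, V, N = 3.0, 2.0, 1.5
S_val = entropy(U, V, N)
T = 1.0 / grad(entropy, (U, V, N), 0)     # (dS/dU)_{V,N} = 1/T
P = T * grad(entropy, (U, V, N), 1)       # (dS/dV)_{U,N} = P/T
mu = -T * grad(entropy, (U, V, N), 2)     # (dS/dN)_{U,V} = -mu/T
# Euler relation: U = T S - P V + mu N
assert abs(T * S_val - P * V + mu * N - U) < 1e-4
```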
The three intensive parameters cannot be used as a set of independent variables. This can be seen by considering the infinitesimal variations of the Euler Equation

   dU = T dS + S dT − P dV − V dP + μ dN + N dμ    (40)

and comparing it with the form of the first law of thermodynamics

   dU = T dS − P dV + μ dN    (41)

This leads to the discovery that the infinitesimal changes in the intensive parameters are related by the equation

   0 = S dT − V dP + N dμ    (42)

which is known as the Gibbs-Duhem relation. Thus, for a one-component system, there are only two independent intensive parameters, i.e. there are only two thermodynamic degrees of freedom.
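The Gibbs-Duhem relation (42) can likewise be verified numerically: take the ideal-gas intensive parameters (k_B = 1, with the Sackur-Tetrode constant chosen as 5/2, an arbitrary illustrative choice), make a small arbitrary displacement of (U, V, N), and check that S dT − V dP + N dμ vanishes to first order.

```python
from math import log

def entropy(U, V, N):
    return N * (log(V / N) + 1.5 * log(U / N) + 2.5)

def intensives(U, V, N):
    """T, P, mu for the ideal-gas fundamental relation above (k_B = 1)."""
    T = 2.0 * U / (3.0 * N)                      # from (dS/dU) = 1/T
    P = 2.0 * U / (3.0 * V)                      # from (dS/dV) = P/T
    mu = -T * (log(V / N) + 1.5 * log(U / N))    # from (dS/dN) = -mu/T
    return T, P, mu

U, V, N = 3.0, 2.0, 1.0
dU, dV, dN = 1e-6, -2e-6, 0.5e-6                 # an arbitrary small displacement
T0, P0, mu0 = intensives(U, V, N)
T1, P1, mu1 = intensives(U + dU, V + dV, N + dN)
S0 = entropy(U, V, N)
# Gibbs-Duhem: S dT - V dP + N dmu = 0 to first order in the displacement
residual = S0 * (T1 - T0) - V * (P1 - P0) + N * (mu1 - mu0)
assert abs(residual) < 1e-10
```

The residual is of second order in the displacement, reflecting that eqn(42) is an exact differential identity.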

2.5 Thermodynamic Processes

Not all processes that conserve energy represent real physical processes, since if the system is initially in a constrained equilibrium state, and an internal constraint is removed, then the final equilibrium state that is established must have a higher entropy.

A quasi-static process is one that proceeds sufficiently slowly that its trajectory in thermodynamic phase space can be approximated by a dense set of equilibrium states. Thus, at each macroscopic equilibrium state one can define an entropy S_j = S(U_j, V_j, N_j, X_j). The quasi-static process is a temporal succession of equilibrium states, connected by non-equilibrium states. Since, for any specific substance, an equilibrium state can be characterized by {U, V, N, X}, a state can be represented by a point on a hyper-surface S = S(U, V, N, X) in thermodynamic configuration space. The cuts of the hyper-surface at constant U are concave. The quasi-static processes trace out an almost continuous line on the hyper-surface. Since individual quasi-static processes are defined by a sequence of equilibrium states connected by non-equilibrium states, the entropy cannot decrease along any part of the sequence if it is to represent a possible process; therefore, S_{j+1} ≥ S_j. Thus, an allowed quasi-static process must follow a path on the hyper-surface which never has a segment on which S decreases. A reversible process is an allowed quasi-static process in which the overall entropy difference becomes infinitesimally small. Hence, a reversible process must proceed along a contour of the hyper-surface which has constant entropy. Therefore, reversible processes occur on a constant-entropy cut of the hyper-surface. The constant entropy cuts of the hyper-surface are convex.
Adiabatic Expansion of Electromagnetic Radiation in a Cavity

Consider a spherical cavity of radius R which contains electromagnetic radiation. The radius of the sphere expands slowly at a rate given by dR/dt. The spectral component of wavelength λ contained in the cavity will be changed by the expansion. The change occurs through a change of wavelength dλ that occurs at reflection from the moving boundary. Since the velocity dR/dt is much smaller than the speed of light c, one only needs to keep terms first-order in the velocity. A single reflection through an angle θ produces a Doppler shift of the radiation by an amount given by

   dλ = 2 (λ/c) (dR/dt) cos θ    (43)

Figure 2: Electromagnetic radiation being reflected through an angle θ from the walls of a slowly expanding spherical electromagnetic cavity.

The ray travels a distance 2 R cos θ between successive reflections. Hence, the time between successive reflections is given by

   2 R cos θ / c    (44)

Thus, the rate at which the wavelength changes is given by

   dλ/dt = (λ/R) (dR/dt)    (45)

On integrating the above equation, one finds that

   λ / R = Constant    (46)

Therefore, the wavelength scales with the radius. Quantum mechanically, each state evolves adiabatically so no transitions occur. The wavelength scales with the radius such as to match the boundary condition.
The equation of state for the electromagnetic radiation is

   P = (1/3) (U/V)    (47)

so for adiabatic expansion (with ΔQ = 0 and dN = 0), one has

   dU = − P dV    (48)

which leads to

   dU/U = − (1/3) (dV/V)    (49)

Hence, for adiabatic expansion, one has

   U V^{1/3} = Constant    (50)

Putting this together with Stefan's law

   U = σ V T^4    (51)

one finds that

   T^4 V^{4/3} = Constant    (52)

or R T = Constant. Thus, the temperature of the cavity decreases inversely with the radius of the cavity as R increases. Furthermore, since λ scales with R, one finds that the density of each spectral component must scale so that λ T is constant.
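The adiabat (50) can be recovered by directly integrating eqn(48) with the radiation equation of state (47). The sketch below uses a simple midpoint integrator with assumed initial values and step size; the invariant U V^{1/3} is conserved as the cavity volume triples.

```python
# Integrate dU = -P dV with P = U/(3V) as the cavity expands,
# and check the adiabat U V^{1/3} = const.
U, V = 1.0, 1.0
invariant0 = U * V ** (1.0 / 3.0)
dV = 1e-5
for _ in range(200000):              # expand from V = 1 to V = 3
    P = U / (3.0 * V)
    # midpoint (RK2) step for better accuracy
    U_mid = U - P * dV / 2.0
    P_mid = U_mid / (3.0 * (V + dV / 2.0))
    U -= P_mid * dV
    V += dV
assert abs(V - 3.0) < 1e-9
assert abs(U * V ** (1.0 / 3.0) - invariant0) < 1e-6
```

Combining the final U with Stefan's law, T^4 ∝ U/V ∝ V^{-4/3}, reproduces T ∝ 1/R.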

2.6 Thermodynamic Potentials

Postulate II is the entropy maximum principle, which can be restated as: the entropy of a system is maximized in equilibrium (at fixed total energy) with respect to variations of an internal parameter. This is a consequence of the concave geometry of the constant-energy cuts of the hyper-surface in thermodynamic configuration space. We have formulated thermodynamics in terms of the entropy and the extensive parameters. Thermodynamics has an equivalent formulation in terms of the energy and the extensive parameters. In this alternate formulation, the entropy maximum principle is replaced by the energy minimum principle. The energy minimum principle states that the energy is minimized in an equilibrium state with respect to variations of an internal parameter (for fixed values of the entropy). This is a consequence of the convex nature of the constant-entropy cuts of the hyper-surface in thermodynamic configuration space. The statements of the entropy maximum and the energy minimum principles are equivalent, as can be seen from the following mathematical proof.
Equivalence of the Entropy Maximum and Energy Minimum Principles

The entropy maximum principle can be stated as

   ( ∂S/∂X )_U = 0 ,  ( ∂²S/∂X² )_U < 0    (53)

where X is an internal parameter. From the chain rule, it immediately follows that the energy is an extremum in equilibrium, since

   ( ∂U/∂X )_S = − ( ∂S/∂X )_U / ( ∂S/∂U )_X = − T ( ∂S/∂X )_U = 0    (54)

Hence, it follows from the entropy maximum principle that the energy is an extremum.
That the energy extremum is a minimum follows by re-writing the second derivative of S as

   ( ∂²S/∂X² )_U = ( ∂/∂X ( ∂S/∂X )_U )_U    (55)

and designating the inner derivative by A, i.e. let

   A = ( ∂S/∂X )_U    (56)

so the entropy maximum principle requires that

   ( ∂²S/∂X² )_U = ( ∂A/∂X )_U < 0    (57)

If we consider A to be a function of (X, U) instead of (X, S), i.e. A(X, U) = A(X, S(X, U)), then

   ( ∂A/∂X )_U = ( ∂A/∂X )_S + ( ∂A/∂S )_X ( ∂S/∂X )_U    (58)

where the last term vanishes because of the entropy maximum principle. Hence,

   ( ∂A/∂X )_U = ( ∂A/∂X )_S    (59)
Thus, we have

   ( ∂²S/∂X² )_U = ( ∂/∂X ( ∂S/∂X )_U )_S < 0    (60)

Using the chain rule, the innermost partial derivative can be re-written as



 

S
U
S
=
(61)
X U
X S U X
Hence, on substituting this into the maximum principle, one has


 
 
 2 

U
S
S
=
X 2 U
X X
U X S

 2   S
 

 
U
U

S
S

(62)
=
X 2 S U X
X S X U X S
The last term vanishes since we have shown that the energy satisfies an extremum principle. Therefore, one has
 2 
 2  

S
U
S
=
X 2 U
X 2 S U X


1 2U
=
T X 2 S
< 0
(63)
Thus, since T > 0, we have


2U
X 2


> 0

(64)

so the energy satisfies the minimum principle if the entropy satisfies the maximum principle. The proof also shows that the energy minimum principle implies
the entropy maximum principle, so the two principles are equivalent.
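The equivalence can be illustrated numerically on a toy model. In the sketch below, two subsystems share a total energy U, the internal parameter X is the energy held by the first subsystem, and each subsystem has an ideal-gas-like entropy c ln E; the constant c, the grid, and all numerical values are illustrative assumptions, not part of the proof above.

```python
import math

c = 1.5  # entropy constant of each subsystem (illustrative value)

def S(X, U):
    """Total entropy when subsystem 1 holds energy X out of a total U."""
    return c * math.log(X) + c * math.log(U - X)

def U_of(X, S_tot):
    """Invert S(X, U) = S_tot for the total energy U at fixed partition X."""
    return X + math.exp((S_tot - c * math.log(X)) / c)

U_tot = 2.0
Xs = [0.01 * i for i in range(1, 200)]

# entropy maximum over the internal parameter X at fixed energy U_tot
X_star = max(Xs, key=lambda X: S(X, U_tot))
S_star = S(X_star, U_tot)

# energy minimum over X at the fixed total entropy S_star
X_min = min(Xs, key=lambda X: U_of(X, S_star))
print(X_star, X_min)  # both land on the symmetric partition X = U_tot / 2
```

Both searches single out the same equilibrium partition, as the equivalence proof requires.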
Sometimes it is more convenient to work with the intensive parameters rather than the extensive parameters. The intensive parameters are defined in terms of the partial derivatives of the fundamental relation S(U, V, N) or, equivalently, U(S, V, N). Taking partial derivatives usually leads to a loss of information, in the sense that a function can only be re-created from its derivative by integration, and then only up to a constant (or, more precisely, a function) of integration. Therefore, to avoid loss of information, one changes extensive variables to intensive variables by performing Legendre transformations.

Legendre Transformations

The Legendre transformation relies on the property of concavity of S(E, V, N) and is introduced so that one can work with a set of more convenient variables, such as T instead of S or P instead of V. This amounts to transforming from an extensive parameter to its conjugate intensive parameter, which is introduced as a derivative.

The Legendre transformation is introduced such that the change of variables is easily invertible. Instead of considering the convex function y = y(x)¹ as being given by the ordered pair (x, y) for each x, one can equally describe the curve by the envelope of a family of tangents to the curve. The tangent is a straight line
$$ y = p \, x + \phi(p) \qquad (65) $$
with slope p and a y-axis intercept denoted by φ(p). Due to the property of convexity, for each value of p there is a unique tangent to the curve. Hence, we have replaced the set of pairs (x, y) with a set of pairs (p, φ). The set of pairs (p, φ) describes the same curve and contains the same information as the set of pairs (x, y).

Figure 3: A concave function y(x) is specified by the envelope of a family of tangents with slopes p and y-axis intercepts φ(p).

Given a curve in the form of y = y(x), one can find φ(p) by taking the derivative to yield
$$ p = \frac{dy}{dx} \qquad (66) $$
which specifies the slope p of the tangent line at the tangent point x. The above equation can be inverted to yield x = x(p) and, hence, one can obtain y(p) from y = y(x(p)). Then, the y-axis intercept of the tangent can be found as a function of p from
$$ \phi(p) = y(p) - p \, x(p) \qquad (67) $$
The function φ(p) is the Legendre transform of y(x). The quantity φ(p) contains exactly the same information as y(x) but depends on the variable p instead of the variable x.

The inverse transform can be found by constructing (x, y) from (p, φ). First, the point x at which a tangent with slope p touches the curve is found. Second, after inverting x(p) to yield p(x), one finds y(x) from
$$ y = p(x) \, x + \phi(p(x)) \qquad (68) $$
The point of tangency x is found by considering a tangent with slope p and a neighboring tangent with slope p + dp. The tangent is described by
$$ y = p \, x + \phi(p) \qquad (69) $$
which is valid everywhere on the tangent, including the point of tangency, which we denote by (x, y). The neighboring tangent, which has an infinitesimally different slope p + dp, is described by a similar equation, but has a point of tangency (x + dx, y + dy) that differs infinitesimally from (x, y). To first-order in the infinitesimals, one finds that the coordinates describing the separation of the two points of tangency are related by
$$ dy = p \, dx + \left( x + \frac{d\phi}{dp} \right) dp \qquad (70) $$
However, since the two neighboring points of tangency lie on the same curve and because the slope of the tangent is p, one has
$$ dy = p \, dx \qquad (71) $$
Thus, we find that the x-coordinate of the point of tangency is determined by the equation
$$ x = - \frac{d\phi}{dp} \qquad (72) $$
The ordinate is given by
$$ y(x) = \phi + x \, p \qquad (73) $$
in which p has been expressed in terms of x. This is the inverse Legendre transformation.

The inverse Legendre transformation should be compared to the Legendre transformation
$$ \phi(p) = y - x \, p \qquad (74) $$
in which x has been expressed in terms of p via inversion of
$$ p = \frac{dy}{dx} \qquad (75) $$
Thus, the relation between (x, y) and (p, φ) is, apart from a minus sign, symmetrical between the Legendre and inverse Legendre transformations.

¹ The convexity or concavity of a function implies that the second derivative of the function has a definite sign. All that we shall require is that the second derivative of the function does not go to zero in the interval of x that is under consideration.
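As a numerical illustration, the sketch below carries out the transform and its inverse for the convex function y(x) = x², for which the transform is φ(p) = −p²/4; the function, the evaluation point, and the step size are illustrative choices.

```python
# Legendre transform of y(x) = x**2 and its inverse, following eqs. (66)-(73).
def y(x):
    return x * x

def dydx(x, h=1e-6):
    """Central-difference estimate of the slope p = dy/dx."""
    return (y(x + h) - y(x - h)) / (2 * h)

def legendre(x):
    """Return (p, phi): the slope and y-axis intercept of the tangent at x."""
    p = dydx(x)
    return p, y(x) - p * x

def inverse(p, phi, dphidp):
    """Recover (x, y) from the tangent data: x = -dphi/dp, y = phi + x p."""
    x = -dphidp
    return x, phi + x * p

x0 = 1.3
p0, phi0 = legendre(x0)
x1, y1 = inverse(p0, phi0, -p0 / 2)  # dphi/dp = -p/2 since phi(p) = -p**2/4
print(x1, y1)  # recovers x0 and y(x0): no information has been lost
```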
The Helmholtz Free-Energy F

The Helmholtz Free-Energy, denoted by F, is a function of the variables (T, V, N) and is obtained by performing a Legendre transform on the energy U(S, V, N). The process involves defining the temperature T via the derivative
$$ T = \left( \frac{\partial U}{\partial S} \right)_{V,N} \qquad (76) $$
and then defining a quantity F via
$$ F = U - T S \qquad (77) $$
The definition of T is used to express S as a function of T. Then, eliminating S from the two terms in the above expression for F yields the Helmholtz Free-Energy F(T, V, N).

One can show that F does not depend on S by considering an infinitesimal transformation
$$ dF = dU - S \, dT - T \, dS \qquad (78) $$
and then by substituting the expression
$$ dU = T \, dS - P \, dV + \mu \, dN \qquad (79) $$
obtained from U(S, V, N) and the definition of the energetic intensive parameters. Substitution of the expression for dU into dF yields
$$ dF = - S \, dT - P \, dV + \mu \, dN \qquad (80) $$
which shows that F only varies with T, V and N. It does not vary as dS is varied. Thus, F is a function of the variables (T, V, N). Furthermore, we see that S can be found from F as a derivative
$$ S = - \left( \frac{\partial F}{\partial T} \right)_{V,N} \qquad (81) $$
The Helmholtz Free-Energy has the interpretation that it represents the work done on the system in a process carried out at constant T (and N). This can be seen from the above infinitesimal form of dF since, under the condition that dT = 0, one has
$$ dF = - P \, dV \qquad (82) $$
The inverse transform is found by starting from F(T, V, N) and expressing S as
$$ S = - \left( \frac{\partial F}{\partial T} \right)_{V,N} \qquad (83) $$
This equation is used to express T as a function of S, i.e. T = T(S). The quantity U is formed via
$$ U = F + T S \qquad (84) $$
Elimination of T in favour of S in both terms leads to the energy U(S, V, N).
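The relations S = −(∂F/∂T)_{V,N} and U = F + T S can be checked numerically. The sketch below assumes the Helmholtz free energy of a classical monatomic ideal gas in reduced units (k = 1, additive entropy constants dropped — illustrative choices); the inverse transform should reproduce U = (3/2) N k T.

```python
import math

k = 1.0  # Boltzmann constant in reduced units (assumption)

def F(T, V, N):
    """Monatomic classical ideal-gas free energy, additive constants dropped."""
    return -N * k * T * (math.log(V / N) + 1.5 * math.log(T))

def S_from_F(T, V, N, h=1e-6):
    """Entropy from the derivative S = -(dF/dT)_{V,N}, by central differences."""
    return -(F(T + h, V, N) - F(T - h, V, N)) / (2 * h)

T, V, N = 2.0, 5.0, 1.0
U = F(T, V, N) + T * S_from_F(T, V, N)  # inverse transform U = F + T S
print(U, 1.5 * N * k * T)  # the two agree
```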
The Enthalpy H

The enthalpy is denoted by H and is a function of the variables (S, P, N). It is obtained by a Legendre transform on U(S, V, N) which eliminates the extensive variable V and introduces the intensive variable P. The pressure P is defined by the equation
$$ P = - \left( \frac{\partial U}{\partial V} \right)_{S,N} \qquad (85) $$
and then one forms the quantity H via
$$ H = U + P V \qquad (86) $$
On inverting the equation expressing the pressure as a function of V and eliminating V from H, one obtains the enthalpy H(S, P, N).

The enthalpy is a function of (S, P, N), as can be seen directly from the infinitesimal variation of H. Since
$$ dH = dU + V \, dP + P \, dV \qquad (87) $$
and as
$$ dU = T \, dS - P \, dV + \mu \, dN \qquad (88) $$
one finds that
$$ dH = T \, dS + V \, dP + \mu \, dN \qquad (89) $$
which shows that H only varies when S, P and N are varied. The above infinitesimal relation also shows that
$$ V = \left( \frac{\partial H}{\partial P} \right)_{S,N} \qquad (90) $$
The enthalpy has the interpretation that it represents the heat flowing into a system in a process at constant pressure (and constant N). This can be seen from the expression for the infinitesimal change in H when dP = 0
$$ dH = T \, dS \qquad (91) $$
which is recognized as an expression for the heat flow into the system.

The inverse Legendre transform of H(S, P, N) is U(S, V, N) and is performed by using the relation
$$ V = \left( \frac{\partial H}{\partial P} \right)_{S,N} \qquad (92) $$
to express P as a function of V. On forming U via
$$ U = H - P V \qquad (93) $$
and eliminating P from U, one has the energy U(S, V, N).

The Gibbs Free-Energy G

The Gibbs Free-Energy G(T, P, N) is formed by making two Legendre transformations on U(S, V, N), eliminating the extensive variables S and V and introducing their conjugate intensive parameters T and P. The process starts with U(S, V, N) and defines the two intensive parameters T and P via
$$ T = \left( \frac{\partial U}{\partial S} \right)_{V,N} \ , \qquad P = - \left( \frac{\partial U}{\partial V} \right)_{S,N} \qquad (94) $$
The quantity G is formed via
$$ G = U - T S + P V \qquad (95) $$
which, on eliminating S and V, leads to the Gibbs Free-Energy G(T, P, N).

On performing infinitesimal variations of S, T, V, P and N, one finds that the infinitesimal change in G is given by
$$ dG = dU - T \, dS - S \, dT + P \, dV + V \, dP \qquad (96) $$
which, on eliminating dU by using the equation
$$ dU = T \, dS - P \, dV + \mu \, dN \qquad (97) $$
leads to
$$ dG = - S \, dT + V \, dP + \mu \, dN \qquad (98) $$
This confirms that the Gibbs Free-Energy is a function of T, P and N, G(T, P, N). It also shows that
$$ S = - \left( \frac{\partial G}{\partial T} \right)_{P,N} \ , \qquad V = \left( \frac{\partial G}{\partial P} \right)_{T,N} \qquad (99) $$
The inverse (double) Legendre transform of G(T, P, N) yields U(S, V, N) and is performed by expressing the extensive parameters as
$$ S = - \left( \frac{\partial G}{\partial T} \right)_{P,N} \ , \qquad V = \left( \frac{\partial G}{\partial P} \right)_{T,N} \qquad (100) $$
and using these to express T in terms of S and P in terms of V. The energy is formed via
$$ U = G + T S - P V \qquad (101) $$
and, eliminating T and P in favour of S and V, one obtains U(S, V, N).
The Grand-Canonical Potential Ω

The Grand-Canonical Potential Ω(T, V, μ) is a function of T, V and μ. It is obtained by making a double Legendre transform on U(S, V, N) which eliminates S and N and replaces them by the intensive parameters T and μ. This thermodynamic potential is frequently used in Statistical Mechanics when working with the Grand-Canonical Ensemble, in which the energy and number of particles are allowed to vary as they are exchanged with a thermal and particle reservoir which has a fixed T and a fixed μ.

The double Legendre transformation involves the two intensive parameters defined by
$$ T = \left( \frac{\partial U}{\partial S} \right)_{V,N} \ , \qquad \mu = \left( \frac{\partial U}{\partial N} \right)_{S,V} \qquad (102) $$
The quantity Ω is formed as
$$ \Omega = U - T S - \mu N \qquad (103) $$
Elimination of the extensive variables S and N leads to Ω(T, V, μ), the Grand-Canonical Potential.

The infinitesimal change in Ω is given by
$$ d\Omega = dU - T \, dS - S \, dT - \mu \, dN - N \, d\mu \qquad (104) $$
which, on substituting for dU, leads to
$$ d\Omega = - S \, dT - P \, dV - N \, d\mu \qquad (105) $$
The above equation confirms that Ω only depends on the variables T, V and μ. Furthermore, this relation also shows that
$$ S = - \left( \frac{\partial \Omega}{\partial T} \right)_{V,\mu} \ , \qquad N = - \left( \frac{\partial \Omega}{\partial \mu} \right)_{T,V} \qquad (106) $$
The inverse (double) transformation uses the two relations
$$ S = - \left( \frac{\partial \Omega}{\partial T} \right)_{V,\mu} \ , \qquad N = - \left( \frac{\partial \Omega}{\partial \mu} \right)_{T,V} \qquad (107) $$
to express T and μ in terms of S and N. The quantity U is formed via
$$ U = \Omega + T S + \mu N \qquad (108) $$
which, on eliminating T and μ, leads to the energy as a function of S, V and N, i.e. U(S, V, N).
As examples of the use of thermodynamic potentials, we shall consider the
processes of Joule Free Expansion and the Joule-Thomson Throttling Process.
Joule Free Expansion

Consider a closed system which is composed of two chambers connected by a valve. Initially, one chamber is filled with gas and the second chamber is evacuated. The valve connecting the two chambers is opened so that the gas can expand into the vacuum.

The expansion process occurs at constant energy, since no heat flows into the system and no work is done in expanding into a vacuum. Hence, the process occurs at constant U.

Due to the expansion, the volume of the gas changes by an amount ΔV and, therefore, one might expect that the temperature of the gas may change by an amount ΔT. For a sufficiently small change in volume ΔV, one expects that ΔT and ΔV are related by
$$ \Delta T = \left( \frac{\partial T}{\partial V} \right)_{U,N} \Delta V \qquad (109) $$
where U is being kept constant.

On applying the chain rule, one finds
$$ \Delta T = - \left( \frac{\partial U}{\partial V} \right)_{T,N} \bigg/ \left( \frac{\partial U}{\partial T} \right)_{V,N} \ \Delta V \qquad (110) $$
However, from the expression for the infinitesimal change in U at fixed N
$$ dU = T \, dS - P \, dV \qquad (111) $$
one finds that the numerator can be expressed as
$$ \left( \frac{\partial U}{\partial V} \right)_{T,N} = T \left( \frac{\partial S}{\partial V} \right)_{T,N} - P \qquad (112) $$
whereas the denominator is identified as
$$ \left( \frac{\partial U}{\partial T} \right)_{V,N} = C_V \qquad (113) $$
which is the specific heat at constant volume.

The quantity proportional to
$$ \left( \frac{\partial S}{\partial V} \right)_{T,N} \qquad (114) $$
is not expressed in terms of directly measurable quantities. It can be expressed as a derivative of the pressure by using a Maxwell relation. We note that the quantity should be considered as a function of V, which is being varied, and also as a function of T, which is being held constant. Processes which are described in terms of the variables V and T can be described by the Helmholtz Free-Energy F(T, V, N), for which
$$ dF = - S \, dT - P \, dV + \mu \, dN \qquad (115) $$
The Helmholtz Free-Energy is an analytic function of T and V; therefore, it satisfies the Cauchy-Riemann condition
$$ \left( \frac{\partial^2 F}{\partial V \partial T} \right)_N = \left( \frac{\partial^2 F}{\partial T \partial V} \right)_N \qquad (116) $$
which, on using the infinitesimal form of dF to identify the inner partial derivatives of F, yields the Maxwell relation
$$ \left( \frac{\partial S}{\partial V} \right)_{T,N} = \left( \frac{\partial P}{\partial T} \right)_{V,N} \qquad (117) $$
Hence, the temperature change and volume change that occur in Joule Free Expansion are related via
$$ \Delta T = \frac{ P - T \left( \frac{\partial P}{\partial T} \right)_{V,N} }{ C_V } \ \Delta V \qquad (118) $$
which can be evaluated with the knowledge of the equation of state.

Since the expansion occurs at constant energy, one finds from
$$ dU = T \, dS - P \, dV = 0 \qquad (119) $$
that
$$ \left( \frac{\partial S}{\partial V} \right)_{U,N} = \frac{P}{T} \qquad (120) $$
Thus, the entropy increases on expansion, as it should for an irreversible process.
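Equation (118) can be evaluated once an equation of state is specified. The sketch below uses a van der Waals gas, P = NkT/(V − b) − a/V² with N = 1, with illustrative dimensionless constants a, b and a monatomic C_V (all assumptions); since T(∂P/∂T)_V − P = a/V² for this equation of state, the gas cools on free expansion.

```python
# Joule free expansion: dT = [P - T (dP/dT)_V] / C_V * dV, for a van der Waals gas.
a, b, Nk = 1.0, 0.05, 1.0   # illustrative constants, not for any specific gas
C_V = 1.5 * Nk              # monatomic ideal-gas value (assumption)

def P(T, V):
    return Nk * T / (V - b) - a / V**2

def dT_free_expansion(T, V, dV, h=1e-6):
    dPdT = (P(T + h, V) - P(T - h, V)) / (2 * h)  # finite-difference (dP/dT)_V
    return (P(T, V) - T * dPdT) / C_V * dV

T0, V0 = 1.0, 1.0
dT = dT_free_expansion(T0, V0, 0.01)
print(dT)  # negative: the attractive a-term makes the gas cool on free expansion
```

For an ideal gas (a = 0) the same formula gives no temperature change at all.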


Joule-Thomson Throttling Process

The Joule-Thomson throttling process involves the constant flow of fluid through a porous plug. The flowing fluid is adiabatically insulated so that heat cannot flow into or out of the fluid. The temperature and pressure of the fluid on either side of the porous plug are uniform, but are not equal
$$ T_1 \neq T_2 \qquad (121) $$
$$ P_1 \neq P_2 \qquad (122) $$
Thus, a pressure drop ΔP defined by
$$ \Delta P = P_1 - P_2 \qquad (123) $$
and a temperature drop ΔT defined by
$$ \Delta T = T_1 - T_2 \qquad (124) $$
occur across the porous plug.

Figure 4: A fluid is confined in a cylindrical tube between two pistons (solid black objects). The pistons force the fluid through the porous plug (orange hatched region). In this process, the pressure and temperature on each side of the plug are kept constant but are not equal.

The Joule-Thomson process is a process for which the enthalpy H is constant. This can be seen by considering a fixed mass of fluid as it flows through the plug. The pump that generates the pressure difference can, hypothetically, be replaced by two pistons. Consider the mass of fluid contained in the volume V1 between the first piston and the plug as having internal energy U1. When this volume of gas has been pushed through the plug, the piston has performed an amount of work P1 V1. The piston on the other side of the porous plug performs a negative amount of work equal to − P2 V2 when the gas occupies the volume V2 between the piston and the plug. The change in internal energy is given by
$$ U_2 - U_1 = P_1 V_1 - P_2 V_2 \qquad (125) $$
This implies that
$$ U_1 + P_1 V_1 = U_2 + P_2 V_2 \qquad (126) $$
i.e., the enthalpy H of the fluid is constant in the throttling process.

For sufficiently small changes in pressure, the temperature drop is related to the pressure drop by
$$ \Delta T = \left( \frac{\partial T}{\partial P} \right)_H \Delta P \qquad (127) $$
where the enthalpy H is being kept constant.

On applying the chain rule, one finds
$$ \Delta T = - \left( \frac{\partial H}{\partial P} \right)_{T,N} \bigg/ \left( \frac{\partial H}{\partial T} \right)_{P,N} \ \Delta P \qquad (128) $$
However, from the expression for the infinitesimal change in H at fixed N
$$ dH = T \, dS + V \, dP \qquad (129) $$
one finds that the numerator can be expressed as
$$ \left( \frac{\partial H}{\partial P} \right)_{T,N} = T \left( \frac{\partial S}{\partial P} \right)_{T,N} + V \qquad (130) $$
whereas the denominator is identified as
$$ \left( \frac{\partial H}{\partial T} \right)_{P,N} = C_P \qquad (131) $$
which is the specific heat at constant pressure.

The quantity proportional to
$$ \left( \frac{\partial S}{\partial P} \right)_{T,N} \qquad (132) $$
is not expressed in terms of directly measurable quantities. It can be expressed as a derivative of the volume by using a Maxwell relation. We note that the quantity should be considered as a function of P, which is being varied, and also as a function of T, which is being held constant. Processes which are described in terms of the variables P and T can be described by the Gibbs Free-Energy G(T, P, N), for which
$$ dG = - S \, dT + V \, dP + \mu \, dN \qquad (133) $$
The Gibbs Free-Energy is an analytic function of T and P; therefore, it satisfies the Cauchy-Riemann condition
$$ \left( \frac{\partial^2 G}{\partial P \partial T} \right)_N = \left( \frac{\partial^2 G}{\partial T \partial P} \right)_N \qquad (134) $$
which, on using the infinitesimal form of dG to identify the inner partial derivatives of G, yields the Maxwell relation
$$ \left( \frac{\partial S}{\partial P} \right)_{T,N} = - \left( \frac{\partial V}{\partial T} \right)_{P,N} \qquad (135) $$
Hence, the temperature change and pressure change that occur in the Joule-Thomson process are related via
$$ \Delta T = \frac{ T \left( \frac{\partial V}{\partial T} \right)_{P,N} - V }{ C_P } \ \Delta P \qquad (136) $$
which can be evaluated with the knowledge of the equation of state.

Since the expansion occurs at constant enthalpy, one finds from
$$ dH = T \, dS + V \, dP = 0 \qquad (137) $$
that
$$ \left( \frac{\partial S}{\partial P} \right)_{H,N} = - \frac{V}{T} \qquad (138) $$
Thus, the entropy increases for the irreversible Joule-Thomson process only if the pressure drops across the porous plug.
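The Joule-Thomson coefficient (∂T/∂P)_H = [T(∂V/∂T)_P − V]/C_P of eq. (136) can likewise be computed from an equation of state. The sketch below inverts a van der Waals equation of state for V(T, P) by bisection; the constants are illustrative, both temperatures are chosen above the critical temperature so the inversion is single-valued, and C_P is approximated by its monatomic ideal-gas value. The sign change of the coefficient locates the inversion temperature, which for this equation of state lies near 2a/(Nk b) at low pressure.

```python
# Joule-Thomson coefficient mu_JT = [T (dV/dT)_P - V] / C_P for a van der Waals gas.
a, b, Nk = 1.0, 0.05, 1.0   # illustrative constants
C_P = 2.5 * Nk              # monatomic ideal-gas value, used as an approximation

def P(T, V):
    return Nk * T / (V - b) - a / V**2

def V_of(T, P0, lo=0.06, hi=5000.0):
    """Invert P(T, V) = P0 for V by bisection (P is monotone in V above T_c)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if P(T, mid) > P0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mu_JT(T, P0, h=1e-5):
    V = V_of(T, P0)
    dVdT = (V_of(T + h, P0) - V_of(T - h, P0)) / (2 * h)
    return (T * dVdT - V) / C_P

print(mu_JT(10.0, 0.1), mu_JT(100.0, 0.1))  # cooling (> 0) at low T, heating (< 0) at high T
```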
The description of the above processes used two of the Maxwell relations. We shall give a fuller description of these relations below.

Maxwell Relations

The Maxwell Relations are statements about the analyticity of the thermodynamic potentials. The Maxwell relations are expressed in the form of an equality between the mixed second derivatives when taken in opposite order. If B(x, y) is a thermodynamic potential which depends on the independent variables x and y, then analyticity implies that
$$ \frac{\partial^2 B}{\partial x \partial y} = \frac{\partial^2 B}{\partial y \partial x} \qquad (139) $$
The Maxwell relations for the five thermodynamic potentials which we have considered are described below:
The Internal Energy U(S, V, N)

Since the infinitesimal change in the internal energy is written as
$$ dU = T \, dS - P \, dV + \mu \, dN \qquad (140) $$
one has the three Maxwell relations
$$ \left( \frac{\partial T}{\partial V} \right)_{S,N} = - \left( \frac{\partial P}{\partial S} \right)_{V,N} \ , \qquad \left( \frac{\partial T}{\partial N} \right)_{S,V} = \left( \frac{\partial \mu}{\partial S} \right)_{V,N} \ , \qquad - \left( \frac{\partial P}{\partial N} \right)_{S,V} = \left( \frac{\partial \mu}{\partial V} \right)_{S,N} \qquad (141) $$

The Helmholtz Free-Energy F(T, V, N)

Since the infinitesimal change in the Helmholtz Free-Energy is written as
$$ dF = - S \, dT - P \, dV + \mu \, dN \qquad (142) $$
one finds the relations
$$ \left( \frac{\partial S}{\partial V} \right)_{T,N} = \left( \frac{\partial P}{\partial T} \right)_{V,N} \ , \qquad - \left( \frac{\partial S}{\partial N} \right)_{T,V} = \left( \frac{\partial \mu}{\partial T} \right)_{V,N} \ , \qquad - \left( \frac{\partial P}{\partial N} \right)_{T,V} = \left( \frac{\partial \mu}{\partial V} \right)_{T,N} \qquad (143) $$

The Enthalpy H(S, P, N)

Since the infinitesimal change in the enthalpy is written as
$$ dH = T \, dS + V \, dP + \mu \, dN \qquad (144) $$
one has
$$ \left( \frac{\partial T}{\partial P} \right)_{S,N} = \left( \frac{\partial V}{\partial S} \right)_{P,N} \ , \qquad \left( \frac{\partial T}{\partial N} \right)_{S,P} = \left( \frac{\partial \mu}{\partial S} \right)_{P,N} \ , \qquad \left( \frac{\partial V}{\partial N} \right)_{S,P} = \left( \frac{\partial \mu}{\partial P} \right)_{S,N} \qquad (145) $$

The Gibbs Free-Energy G(T, P, N)

Since the infinitesimal change in the Gibbs Free-Energy is written as
$$ dG = - S \, dT + V \, dP + \mu \, dN \qquad (146) $$
one has
$$ - \left( \frac{\partial S}{\partial P} \right)_{T,N} = \left( \frac{\partial V}{\partial T} \right)_{P,N} \ , \qquad - \left( \frac{\partial S}{\partial N} \right)_{T,P} = \left( \frac{\partial \mu}{\partial T} \right)_{P,N} \ , \qquad \left( \frac{\partial V}{\partial N} \right)_{T,P} = \left( \frac{\partial \mu}{\partial P} \right)_{T,N} \qquad (147) $$

The Grand-Canonical Potential Ω(T, V, μ)

Since the infinitesimal change in the Grand-Canonical Potential is written as
$$ d\Omega = - S \, dT - P \, dV - N \, d\mu \qquad (148) $$
one finds the three Maxwell relations
$$ \left( \frac{\partial S}{\partial V} \right)_{T,\mu} = \left( \frac{\partial P}{\partial T} \right)_{V,\mu} \ , \qquad \left( \frac{\partial S}{\partial \mu} \right)_{T,V} = \left( \frac{\partial N}{\partial T} \right)_{V,\mu} \ , \qquad \left( \frac{\partial P}{\partial \mu} \right)_{T,V} = \left( \frac{\partial N}{\partial V} \right)_{T,\mu} \qquad (149) $$
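Any one of these relations can be checked by finite differences. The sketch below verifies (∂S/∂V)_{T,N} = (∂P/∂T)_{V,N} for a classical monatomic ideal-gas Helmholtz free energy in reduced units (an illustrative choice of potential and units); both sides should equal Nk/V.

```python
import math

Nk = 1.0  # N k in reduced units (assumption)

def F(T, V):
    """Monatomic ideal-gas Helmholtz free energy, additive constants dropped."""
    return -Nk * T * (math.log(V) + 1.5 * math.log(T))

h = 1e-5
T0, V0 = 2.0, 3.0
S = lambda T, V: -(F(T + h, V) - F(T - h, V)) / (2 * h)   # S = -(dF/dT)_V
Pr = lambda T, V: -(F(T, V + h) - F(T, V - h)) / (2 * h)  # P = -(dF/dV)_T
dSdV = (S(T0, V0 + h) - S(T0, V0 - h)) / (2 * h)
dPdT = (Pr(T0 + h, V0) - Pr(T0 - h, V0)) / (2 * h)
print(dSdV, dPdT)  # both approximately Nk / V0
```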

The Nernst Postulate

The Nernst postulate states that as T → 0, then S → 0. This postulate may not be universally valid. It can be motivated by noting that the specific heat C_V is positive, which implies that the internal energy U is a monotonically increasing function of the temperature T. Conversely, if T decreases, then U should decrease monotonically. Therefore, U should approach its smallest value as T → 0, and the system should be in a quantum mechanical ground state. The ground state is unique if it is non-degenerate or, in the case where the ground state has a spontaneously broken symmetry, may have the degeneracy associated with the broken symmetry. In either case, since the entropy is proportional to the logarithm of the degeneracy, one expects the entropy at T = 0 to be a minimum. For degeneracies which are not exponential in the size of the system, the entropy is not extensive and, therefore, can be considered as being effectively zero in the thermodynamic limit N → ∞. This assumption might not be valid for the case of highly frustrated systems such as ice or spin glasses, since these systems remain highly degenerate as T → 0.

Classically, the entropy can only be defined up to an additive constant, since classical states are continuous and, therefore, the number of states depends on the choice of the measure. Because of this, the classical version of Nernst's postulate states that the entropy reaches a universal minimum value in the limit T → 0. Therefore, Walther Nernst's initial 1906 formulation was that the T = 0 isotherm is also an isentrope². Max Planck's 1911 restatement of the postulate gave the value of zero to the entropy at T = 0. This restatement is frequently attributed to Simon³.

Nernst's postulate has a number of consequences. For example, the specific heat vanishes as T → 0. This follows since, if S approaches zero with a finite derivative, then
$$ C_V = T \left( \frac{\partial S}{\partial T} \right)_V \rightarrow 0 \qquad {\rm as} \ T \rightarrow 0 \qquad (150) $$
Likewise,
$$ C_P = T \left( \frac{\partial S}{\partial T} \right)_P \rightarrow 0 \qquad {\rm as} \ T \rightarrow 0 \qquad (151) $$
The thermal expansion coefficient also vanishes as T → 0, as the Maxwell relation
$$ \left( \frac{\partial V}{\partial T} \right)_{P,N} = - \left( \frac{\partial S}{\partial P} \right)_{T,N} \qquad (152) $$
shows that
$$ \left( \frac{\partial V}{\partial T} \right)_{P,N} \rightarrow 0 \qquad {\rm as} \ T \rightarrow 0 \qquad (153) $$
Hence, the coefficient of volume expansion, defined by
$$ \alpha = \frac{1}{V} \left( \frac{\partial V}{\partial T} \right)_{P,N} \qquad (154) $$
vanishes as T → 0.

Likewise, from the Maxwell relation
$$ \left( \frac{\partial P}{\partial T} \right)_{V,N} = \left( \frac{\partial S}{\partial V} \right)_{T,N} \qquad (155) $$
one discovers that
$$ \left( \frac{\partial P}{\partial T} \right)_{V,N} \rightarrow 0 \qquad {\rm as} \ T \rightarrow 0 \qquad (156) $$
In the limit T → 0, the difference between the specific heats at constant pressure and constant volume vanishes with a higher power of T than the power of T with which the specific heats themselves vanish.

From the above formulae, one realizes that the classical ideal gas does not satisfy Nernst's postulate. However, quantum mechanical ideal gasses do satisfy Nernst's postulate.

² W. Nernst, Über die Beziehung zwischen Wärmeentwicklung und maximaler Arbeit bei kondensierten Systemen, Ber. Kgl. Pr. Akad. Wiss. 52, 933-940, (1906).
³ F. Simon and F. Lange, Zur Frage die Entropie amorpher Substanzen, Zeit. für Physik, 38, 227-236 (1926).
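A quantum system that does satisfy the postulate is the harmonic (Einstein) oscillator. Writing x = ħω/(k_B T), its entropy and specific heat per oscillator, in units of k_B, are S = x/(eˣ − 1) − ln(1 − e⁻ˣ) and C = x² eˣ/(eˣ − 1)². The sketch below evaluates both and confirms that they vanish as T → 0, i.e. as x → ∞ (the sample values of x are illustrative):

```python
import math

def S(x):
    """Entropy per oscillator in units of k_B, with x = hbar*omega / (k_B T)."""
    return x / math.expm1(x) - math.log(-math.expm1(-x))

def C(x):
    """Specific heat per oscillator in units of k_B."""
    return x * x * math.exp(x) / math.expm1(x)**2

# increasing x means decreasing T; both S and C tend to zero
print(S(0.1), S(1.0), S(20.0))
print(C(0.1), C(1.0), C(20.0))
```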
Another consequence of the Nernst postulate is that the absolute zero of temperature cannot be attained by any means. More precisely, it is impossible, by any procedure, no matter how idealized, to reduce the temperature of any system to absolute zero in a finite number of operations. First, we shall consider the final step of a finite process. Cooling a substance below a bath temperature usually requires an adiabatic stage, since otherwise heat would leak from the bath to the system and thereby increase its temperature. Suppose that, by varying a parameter X from X1 to X2, one adiabatically cools a system from a finite temperature T1 to a final temperature T2. Then, the adiabaticity condition requires
$$ S(T_1, X_1) = S(T_2, X_2) \qquad (157) $$
Furthermore, if the final temperature T2 were reduced to zero, the right-hand side would vanish according to Simon's statement of Nernst's principle. Thus, we would require
$$ S(T_1, X_1) = 0 \qquad (158) $$
which is impossible for real systems, for which S is expected to only approach its minimal value in the limit T → 0. Hence, this suggests that the final stages of the process must involve infinitesimal temperature differences. Such a process is illustrated by a sequence of stages composed of adiabatic expansions between a high pressure P1 and a low pressure P2, followed by isothermal contractions between P2 and P1. The internal energy and temperature are lowered during the adiabatic expansion stages.

Figure 5: The unattainability of T = 0 is illustrated by a substance which undergoes a series of stages composed of an adiabatic expansion followed by an isothermal compression. An infinite number of stages would be required to reach T = 0.

The curves of entropy versus temperature for the different pressures must approach each other as T → 0. Hence, both the magnitudes of the temperature changes and of the entropy changes decrease in the successive stages as T approaches zero. Therefore, absolute zero can only be attained for this sequence in the limit of an infinite number of stages. For this example, the unattainability of absolute zero can simply be understood by noting that the adiabat becomes the isotherm as T → 0.
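The geometric character of the staircase can be mimicked with a toy entropy S(T, P) = a(P) T that obeys the Nernst postulate; the a(P) values below are illustrative assumptions. Each adiabatic stage from the P1 curve to the P2 curve maps T to (a1/a2) T, so the temperature decreases geometrically and never reaches zero in a finite number of stages:

```python
# Staircase cooling between two isobars with toy entropy S(T, P) = a(P) * T.
a1, a2 = 1.0, 2.0  # a(P1) < a(P2): the two S(T) curves merge only at T = 0

T = 1.0
temps = [T]
for _ in range(25):
    T = (a1 / a2) * T  # adiabatic stage: a2 * T_new = a1 * T_old
    temps.append(T)    # isothermal stage returns to the P1 curve at the same T
print(temps[-1])  # 2**(-25): small, but strictly positive after 25 stages
```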
Extremum Principles for Thermodynamic Potentials

The Energy Minimum Principle states that the equilibrium value of any unconstrained internal parameter X minimizes U(S, V, X) for a fixed value of S. It can be stated in terms of the first-order and second-order infinitesimal changes
$$ dU = 0 \ , \qquad d^2 U \geq 0 \qquad (159) $$
where S is held constant.

This principle can be formulated in terms of a composite system which is composed of a system and a reservoir, for which the total energy and entropy are defined by
$$ U_T = U + U_R \ , \qquad S_T = S + S_R \qquad (160) $$
The energy minimum principle applied to the combined system becomes
$$ d ( U + U_R ) = 0 \ , \qquad d^2 ( U + U_R ) \geq 0 \qquad (161) $$
where, since S_T is constant, dS_R = − dS. We also note that, if the reservoir is sufficiently larger than the system, one may set d²U_R = 0, in which case the second condition simplifies to
$$ d^2 U \geq 0 \qquad (162) $$
For a system in thermal contact with a reservoir at constant temperature T, the infinitesimal change in the internal energy of the reservoir is given by the heat it absorbs
$$ dU_R = T \, dS_R = - T \, dS \qquad (163) $$
Hence, one has
$$ dU + dU_R = dU - T \, dS \qquad (164) $$
which, if T is being held constant, leads to
$$ d ( U - T S ) = dF = 0 \qquad (165) $$
where F is defined as
$$ F = U - T S \qquad (166) $$
Hence, the quantity F satisfies an extremum principle for processes at constant T. For a sufficiently large reservoir, one may set d²U_R ≈ 0. This can be seen by examining the second-order change due to a fluctuation, say of the entropy. For this particular case,
$$ d^2 U_R = \left( \frac{\partial^2 U_R}{\partial S_R^2} \right) (dS_R)^2 = \left( \frac{\partial^2 U_R}{\partial S_R^2} \right) (dS)^2 = \frac{T}{C_R} \ (dS)^2 \qquad (167) $$
Likewise,
$$ d^2 U = \left( \frac{\partial^2 U}{\partial S^2} \right) (dS)^2 = \frac{T}{C} \ (dS)^2 \qquad (168) $$
Therefore, if C_R ≫ C, one has d²U ≫ d²U_R. Applying this type of consideration to the fluctuations of any set of extensive variables leads to the same conclusion. The extremum principle is a minimum principle since
$$ d^2 U = d^2 ( U - T S ) \qquad (169) $$
where the equality holds since T is being held constant and since S is an independent variable, so the last term can only contribute to the first-order change − T dS. Thus, one has the condition
$$ d^2 F \geq 0 \qquad (170) $$
If F is reinterpreted in terms of the Helmholtz Free-Energy F(T, V, N), this leads to the Helmholtz Minimum Principle. The Helmholtz Minimum Principle states that, for a system being held at constant temperature T, the equilibrium value of an unconstrained internal parameter X minimizes F(T, V, X).
For a system in contact with a pressure reservoir at pressure P, the infinitesimal change in the internal energy of the reservoir is equal to the work done on it
$$ dU_R = - P \, dV_R = P \, dV \qquad (171) $$
Hence, one has
$$ dU + dU_R = dU + P \, dV \qquad (172) $$
which, if P is being held constant, leads to
$$ d ( U + P V ) = dH = 0 \qquad (173) $$
where H is defined as
$$ H = U + P V \qquad (174) $$
Hence, the quantity H satisfies an extremum principle for processes at constant P. The extremum principle is a minimum principle since
$$ d^2 U = d^2 ( U + P V ) \geq 0 \qquad (175) $$
where the equality holds since P is being held constant and V is an independent variable. Thus, one has the condition
$$ d^2 H \geq 0 \qquad (176) $$
The Enthalpy Minimum Principle states that, for a system being held at constant pressure P, the equilibrium value of an unconstrained internal parameter X minimizes H(S, P, X).

For a system in contact with a reservoir at constant temperature T and constant pressure P,
$$ dU_R = T \, dS_R - P \, dV_R = - T \, dS + P \, dV \qquad (177) $$
Hence, one has
$$ dU + dU_R = dU - T \, dS + P \, dV \qquad (178) $$
which, if T and P are being held constant, leads to
$$ d ( U - T S + P V ) = dG = 0 \qquad (179) $$
where G is defined as
$$ G = U - T S + P V \qquad (180) $$
Hence, the quantity G satisfies an extremum principle for processes at constant T and P. The extremum principle is a minimum principle since
$$ d^2 U = d^2 ( U - T S + P V ) \geq 0 \qquad (181) $$
where the equality holds since T and P are being held constant and since S and V are independent variables. Thus, one has the condition
$$ d^2 G \geq 0 \qquad (182) $$
The Gibbs Minimum Principle states that, for a system being held at constant temperature T and pressure P, the equilibrium value of an unconstrained internal parameter X minimizes G(T, P, X).
A perhaps clearer, but less general, derivation of the minimum principle for thermodynamic potentials can be found directly from the entropy maximum principle. As an example of a minimum principle for a thermodynamic potential, consider a closed system composed of a system and a reservoir which are in thermal contact. The entropy of the combined system S_T is given by
$$ S_T(U, V, N : U_T, V_T, N_T) = S(U, V, N) + S_R(U_T - U, V_T - V, N_T - N) \qquad (183) $$
We shall consider the Taylor expansion of S_T in powers of U, and we shall assume that the reservoir is so much bigger than the system that the terms involving higher-order derivatives are negligibly small
$$ S_T(U, V, N : U_T, V_T, N_T) = S(U, V, N) + S_R(U_T, V_T - V, N_T - N) - \frac{U}{T_R} + \ldots = S_R(U_T, V_T - V, N_T - N) - \left( \frac{ U - T_R \, S(U, V, N) }{T_R} \right) \qquad (184) $$
where terms of the order N²/N_R have been neglected. We note that the term in the round parenthesis is of order N and contains all the information about the subsystem of interest. The entropy maximum principle applied to the combined system then implies that, in equilibrium, one must have
$$ \left( \frac{\partial S}{\partial U} \right)_{V,N} = \frac{1}{T_R} \qquad (185) $$
where T_R is the temperature of the thermal reservoir, defined via 1/T_R = ∂S_R(U_T)/∂U_T. Also, one has
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \leq 0 \qquad (186) $$
Now consider the convex generalized thermodynamic function F(U : T_R, V, N), previously identified in the expression for S_T, which is defined by
$$ F(U : T_R, V, N) = U - T_R \, S(U, V, N) \qquad (187) $$
for some constant T_R. The first two derivatives of F w.r.t. U are given by
$$ \left( \frac{\partial F}{\partial U} \right)_{V,N} = 1 - T_R \left( \frac{\partial S}{\partial U} \right)_{V,N} \qquad (188) $$
and
$$ \left( \frac{\partial^2 F}{\partial U^2} \right)_{V,N} = - T_R \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \qquad (189) $$
which shows that, if the parameter T_R is identified with the temperature T of the system, then F satisfies a minimum principle and that the minimum value of F is given by the Helmholtz Free-Energy F(T, V, N).
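The minimum property of F(U : T_R, V, N) can be seen explicitly for a monatomic ideal gas, for which S(U) = (3/2) Nk ln U up to additive constants (toy units; the grid and numerical values are illustrative). The minimum of U − T_R S(U) should fall at the energy for which the system temperature, T = U / ((3/2) Nk), equals T_R:

```python
import math

Nk, T_R = 1.0, 2.0  # reduced units and reservoir temperature (illustrative)

def F(U):
    """Generalized free energy F(U : T_R) = U - T_R * S(U) of eq. (187)."""
    return U - T_R * 1.5 * Nk * math.log(U)

Us = [0.01 * i for i in range(1, 1001)]
U_star = min(Us, key=F)
print(U_star, U_star / (1.5 * Nk))  # energy at the minimum, and the implied T = T_R
```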

2.7  Thermodynamic Stability

The condition of thermodynamic stability imposes the condition of convexity or concavity on the thermodynamic functions characterizing the system. Consider two identical systems in thermal contact. The entropy maximum principle holds for the combined system, of energy 2U, volume 2V and total number of particles 2N. For the combined system to be stable against fluctuations of the energy, the entropy function must satisfy the inequality
$$ S(2U, 2V, 2N) \geq S(U + \Delta U, V, N) + S(U - \Delta U, V, N) \qquad (190) $$
for any value of ΔU. Due to the extensive nature of the entropy, this inequality can be re-written as
$$ 2 \, S(U, V, N) \geq S(U + \Delta U, V, N) + S(U - \Delta U, V, N) \qquad (191) $$
Geometrically, the inequality expresses the fact that any chord joining two points on the curve S(U) must lie below the curve. Such a curve is known as a concave curve.

Figure 6: A concave curve representing S(U). Any chord connecting two points on S(U) must lie below the curve.

In the limit ΔU → 0, one obtains the weaker stability condition
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \leq 0 \qquad (192) $$
This condition must hold if the macroscopic state of the system characterized by (U, V, N) is an equilibrium state. This condition can be re-stated as
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} = \left( \frac{\partial}{\partial U} \frac{1}{T} \right)_{V,N} = - \frac{1}{T^2} \left( \frac{\partial T}{\partial U} \right)_{V,N} = - \frac{1}{T^2 \, C_{V,N}} \leq 0 \qquad (193) $$
Thus, for a system to be stable, its heat capacity at constant volume must be positive. This implies that the energy is a monotonically increasing function of the temperature at constant volume.

Likewise, if the energy and volume are allowed to fluctuate, the condition for stability becomes
$$ 2 \, S(U, V, N) \geq S(U + \Delta U, V + \Delta V, N) + S(U - \Delta U, V - \Delta V, N) \qquad (194) $$
which can be expanded to yield
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \Delta U^2 + 2 \left( \frac{\partial^2 S}{\partial U \partial V} \right)_N \Delta U \, \Delta V + \left( \frac{\partial^2 S}{\partial V^2} \right)_{U,N} \Delta V^2 \leq 0 \qquad (195) $$
The left-hand side of this inequality can be expressed as the sum of two terms
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \left[ \Delta U + \frac{ \left( \frac{\partial^2 S}{\partial U \partial V} \right)_N }{ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} } \Delta V \right]^2 + \left[ \left( \frac{\partial^2 S}{\partial V^2} \right)_{U,N} - \frac{ \left( \frac{\partial^2 S}{\partial U \partial V} \right)_N^2 }{ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} } \right] \Delta V^2 \leq 0 \qquad (196) $$
This leads to two weak conditions for stability, which are
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \leq 0 \qquad (197) $$
and
$$ \left( \frac{\partial^2 S}{\partial V^2} \right)_{U,N} - \frac{ \left( \frac{\partial^2 S}{\partial U \partial V} \right)_N^2 }{ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} } \leq 0 \qquad (198) $$
The last condition can be re-stated as
$$ \left( \frac{\partial^2 S}{\partial U^2} \right)_{V,N} \left( \frac{\partial^2 S}{\partial V^2} \right)_{U,N} \geq \left( \frac{\partial^2 S}{\partial U \partial V} \right)_N^2 \qquad (199) $$
which is a condition on the determinant of the matrix of the second-order derivatives. The two by two matrix is a particular example of a Hessian matrix which, more generally, is an N by N matrix of the second-order derivatives of a function of N independent variables. The Hessian is the determinant of the Hessian matrix. The Hessian describes the local curvature of the function. Although the above two conditions have been derived for two identical subsystems, they can be applied to any macroscopic part of a homogeneous system, since thermodynamic quantities are uniformly distributed throughout the system.
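For the ideal-gas entropy S(U, V) = Nk (ln V + (3/2) ln U) (toy units, additive constants dropped — an illustrative choice), the Hessian matrix can be evaluated by finite differences and checked against the stability conditions: both diagonal second derivatives are negative and the Hessian determinant is positive, so the conditions are satisfied.

```python
import math

Nk = 1.0  # N k in reduced units (assumption)

def S(U, V):
    """Ideal-gas entropy with additive constants dropped (toy units)."""
    return Nk * (math.log(V) + 1.5 * math.log(U))

h = 1e-4
U0, V0 = 2.0, 3.0
# central-difference second derivatives forming the Hessian matrix of S
SUU = (S(U0 + h, V0) - 2 * S(U0, V0) + S(U0 - h, V0)) / h**2
SVV = (S(U0, V0 + h) - 2 * S(U0, V0) + S(U0, V0 - h)) / h**2
SUV = (S(U0 + h, V0 + h) - S(U0 + h, V0 - h)
       - S(U0 - h, V0 + h) + S(U0 - h, V0 - h)) / (4 * h**2)
det = SUU * SVV - SUV**2
print(SUU, SVV, SUV, det)  # SUU < 0, SVV < 0, SUV = 0, det > 0
```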
Stability Conditions for Thermodynamic Potentials
The energy satisfies a minimum principle, which is reflected in the behavior of
the thermodynamic potentials. Therefore, the convexity of the thermodynamic
potentials can be used to obtain stability conditions for the thermodynamic
potentials.
The energy U (S, V, N ) satisfies a minimum principle. For a system composed
of two identical subsystems each with entropy S, volume V and number of
particles N , the condition for equilibrium under interchange of entropy and
volume is given by
U (S + S, V + V, N ) + U (S S, V V, N ) > 2 U (S, V, N )
For stability against entropy fluctuations, one has

\left( \frac{\partial^2 U}{\partial S^2} \right)_{V,N} \geq 0   (200)

or equivalently

\left( \frac{\partial T}{\partial S} \right)_{V,N} \geq 0   (201)

which leads to the condition C_V \geq 0, i.e. the specific heat at constant volume is always positive. Stability against volume fluctuations leads to
\left( \frac{\partial^2 U}{\partial V^2} \right)_{S,N} \geq 0

or equivalently

- \left( \frac{\partial P}{\partial V} \right)_{S,N} \geq 0   (202)
Thus, the energy is a convex function of the extensive variables, and the convexity leads to stability conditions against fluctuations of the extensive variables which always have the same signs. However, stability against simultaneous fluctuations of both S and V leads to a more complex and less restrictive condition
\left( \frac{\partial^2 U}{\partial S^2} \right)_{V,N} \left( \frac{\partial^2 U}{\partial V^2} \right)_{S,N} \geq \left( \frac{\partial^2 U}{\partial S \partial V} \right)_{N}^2   (203)
This can be shown to lead to the condition

\left( \frac{\partial P}{\partial V} \right)_{T,N} \leq 0   (204)

i.e. an increase in volume at constant temperature is always accompanied by a decrease in pressure.
The extension of the stability conditions to the thermodynamic potentials
involves some consideration of the properties of the Legendre transform. It
will be seen that the thermodynamic potentials are convex functions of their
extensive variables but are concave functions of their intensive variables.
Consider a function y(x) which satisfies a minimum condition. The Legendre transform of y(x) is \Phi(p). One notes that the Legendre transform and the inverse Legendre transform introduce the conjugate variables x and p via

p = \left( \frac{\partial y}{\partial x} \right)   (205)

and

x = - \left( \frac{\partial \Phi}{\partial p} \right)   (206)

These relations lead to

\frac{\partial p}{\partial x} = \frac{\partial^2 y}{\partial x^2}   (207)

and

\frac{\partial x}{\partial p} = - \frac{\partial^2 \Phi}{\partial p^2}   (208)

Thus, on equating the two expressions for \frac{dp}{dx}, one has

\frac{\partial p}{\partial x} = \frac{\partial^2 y}{\partial x^2} = - \left( \frac{\partial^2 \Phi}{\partial p^2} \right)^{-1}   (209)
which shows that the sign of the second derivative w.r.t. the conjugate variables changes under the Legendre transform. Therefore, the condition for stability against fluctuations in x, when expressed in terms of the thermodynamic potential y, has the opposite sign to the condition for stability against fluctuations in p when expressed in terms of \Phi. The stability conditions for fluctuations of the other variables (which are not involved in the Legendre transform) have the same sign for both y and \Phi.
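The sign change in (209) can be verified numerically for a concrete convex function. The sketch below assumes the illustrative choice y(x) = cosh x, for which p = y'(x) = sinh x can be inverted explicitly; the evaluation point and finite-difference step are arbitrary.

```python
import math

# Convex test function y(x) = cosh(x); its Legendre transform is
#   Phi(p) = y(x(p)) - p x(p)  with  p = y'(x) = sinh(x),  x(p) = asinh(p).
def y(x):
    return math.cosh(x)

def Phi(p):
    x = math.asinh(p)          # invert p = y'(x)
    return math.cosh(x) - p * x

def d2(f, u, h=1e-4):
    """Central finite-difference estimate of f''(u)."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / h**2

x0 = 0.7
p0 = math.sinh(x0)
# Equation (209): the second derivatives of y and Phi with respect to their
# own (conjugate) variables are reciprocal, up to a change of sign.
assert abs(d2(y, x0) + 1.0 / d2(Phi, p0)) < 1e-4
```

Here y''(x) = cosh x > 0 while Phi''(p) = -1/cosh x < 0, so the convex function transforms into a concave one, exactly as the text asserts for the thermodynamic potentials.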
The Helmholtz Free-Energy F(T, V, N) is derived from the Legendre transform of U(S, V, N) by eliminating the extensive variable S in favour of the intensive variable T. The condition for stability against temperature fluctuations is expressed in terms of F(T, V, N) as

\left( \frac{\partial^2 F}{\partial T^2} \right)_{V,N} \leq 0   (210)

which has the opposite sign to the stability condition against entropy fluctuations when expressed in terms of U(S, V, N). Stability against volume fluctuations leads to

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} \geq 0   (211)

which has the same sign as the stability conditions against volume fluctuations
when expressed in terms of U .
The stability condition for the enthalpy H(S, P, N) against entropy fluctuations is given by

\left( \frac{\partial^2 H}{\partial S^2} \right)_{P,N} \geq 0   (212)

which has the same sign as the stability condition against entropy fluctuations when expressed in terms of U(S, V, N). Stability against pressure fluctuations leads to

\left( \frac{\partial^2 H}{\partial P^2} \right)_{S,N} \leq 0   (213)
which has the opposite sign to the stability condition against volume fluctuations when expressed in terms of U.
The Gibbs Free-Energy involves a double Legendre transform of U, so both stability conditions have opposite signs. The condition for stability against temperature fluctuations is expressed in terms of G(T, P, N) as

\left( \frac{\partial^2 G}{\partial T^2} \right)_{P,N} \leq 0   (214)

which has the opposite sign to the stability condition against entropy fluctuations when expressed in terms of U(S, V, N). Stability against pressure fluctuations leads to the condition

\left( \frac{\partial^2 G}{\partial P^2} \right)_{T,N} \leq 0   (215)
which has the opposite sign to the stability condition against volume fluctuations when expressed in terms of U.
The stability against volume fluctuations of a system held at constant temperature is expressed in terms of the second derivative of the Helmholtz Free-Energy as

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} \geq 0   (216)

This can be related to the inequality

\left( \frac{\partial^2 U}{\partial S^2} \right)_{V,N} \left( \frac{\partial^2 U}{\partial V^2} \right)_{S,N} \geq \left( \frac{\partial^2 U}{\partial S \partial V} \right)_{N}^2   (217)

describing the stability condition obtained from the energy minimum principle. This can be proved by noting that the infinitesimal change in F shows that

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = - \left( \frac{\partial P}{\partial V} \right)_{T,N}   (218)

The derivative of P with respect to V at constant T can be expressed as a Jacobian

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = - \frac{\partial (P, T)}{\partial (V, T)}   (219)

Since we wish to express the inequality in terms of the energy, one should change variables from V and T to S and V. This can be achieved using the properties of the Jacobian

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = - \frac{\partial (P, T)}{\partial (S, V)} \, \frac{\partial (S, V)}{\partial (V, T)}   (220)
On using the antisymmetric nature of the Jacobian, one can recognize that the second factor is a derivative of S with respect to T, with V being held constant

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = \frac{\partial (P, T)}{\partial (S, V)} \, \frac{\partial (V, S)}{\partial (V, T)}

= \left( \frac{\partial S}{\partial T} \right)_{V,N} \frac{\partial (P, T)}{\partial (S, V)}

= \left( \frac{\partial S}{\partial T} \right)_{V,N} \left[ \left( \frac{\partial P}{\partial S} \right)_V \left( \frac{\partial T}{\partial V} \right)_S - \left( \frac{\partial T}{\partial S} \right)_V \left( \frac{\partial P}{\partial V} \right)_S \right]   (221)
where the expression for the Jacobian has been used to obtain the last line. On
recognizing that P and T are the energy intensive parameters, one can write
\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = \left( \frac{\partial S}{\partial T} \right)_{V,N} \left[ \left( \frac{\partial P}{\partial S} \right)_V \left( \frac{\partial T}{\partial V} \right)_S - \left( \frac{\partial T}{\partial S} \right)_V \left( \frac{\partial P}{\partial V} \right)_S \right]

= \left( \frac{\partial S}{\partial T} \right)_{V,N} \left[ \left( \frac{\partial^2 U}{\partial S^2} \right) \left( \frac{\partial^2 U}{\partial V^2} \right) - \left( \frac{\partial^2 U}{\partial S \partial V} \right) \left( \frac{\partial^2 U}{\partial V \partial S} \right) \right]

= \left( \frac{\partial S}{\partial T} \right)_{V,N} \left[ \left( \frac{\partial^2 U}{\partial S^2} \right)_V \left( \frac{\partial^2 U}{\partial V^2} \right)_S - \left( \frac{\partial^2 U}{\partial S \partial V} \right)^2 \right]   (222)
where the last line has been obtained by using the analyticity of U . Finally, one
can write
\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = \left( \frac{\partial S}{\partial T} \right)_{V,N} \left[ \left( \frac{\partial^2 U}{\partial S^2} \right)_V \left( \frac{\partial^2 U}{\partial V^2} \right)_S - \left( \frac{\partial^2 U}{\partial S \partial V} \right)^2 \right]

= \frac{ \left( \frac{\partial^2 U}{\partial S^2} \right) \left( \frac{\partial^2 U}{\partial V^2} \right) - \left( \frac{\partial^2 U}{\partial S \partial V} \right)^2 }{ \left( \frac{\partial T}{\partial S} \right)_{V,N} }

= \frac{ \left( \frac{\partial^2 U}{\partial S^2} \right) \left( \frac{\partial^2 U}{\partial V^2} \right) - \left( \frac{\partial^2 U}{\partial S \partial V} \right)^2 }{ \left( \frac{\partial^2 U}{\partial S^2} \right)_{V,N} }   (223)
which relates the stability condition against volume fluctuations at constant T to the stability condition for fluctuations in S and V.
Homework:
Prove the stability condition

\left( \frac{\partial^2 G}{\partial T^2} \right)_{P} \left( \frac{\partial^2 G}{\partial P^2} \right)_{T} - \left( \frac{\partial^2 G}{\partial T \partial P} \right)^2 \geq 0   (224)
Physical Consequences of Stability

The convexity of F with respect to V has been shown to lead to the condition

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = - \left( \frac{\partial P}{\partial V} \right)_{T,N} \geq 0   (225)
which can be expressed as

\left( \frac{\partial^2 F}{\partial V^2} \right)_{T,N} = \frac{1}{V \kappa_T} \geq 0   (226)

Hence, the isothermal compressibility \kappa_T, defined by

\kappa_T = - \frac{1}{V} \left( \frac{\partial V}{\partial P} \right)_{T} \geq 0   (227)

must always be positive. Likewise, the concavity of F with respect to T leads to the stability condition

C_V \geq 0   (228)

i.e. the specific heat at constant volume C_V must always be positive.
Another consequence of thermodynamic stability is that a thermodynamic system which is composed of parts that are free to move with respect to each other will become unstable if the temperature is negative. The entropy of the \alpha-th component of the system is a function of its internal energy, that is, U_\alpha minus the kinetic energy \frac{p_\alpha^2}{2 m_\alpha}. Since the entropies are additive, the total entropy is given by

S = \sum_\alpha S_\alpha \left( U_\alpha - \frac{p_\alpha^2}{2 m_\alpha} \right)   (229)

but is subject to the constraint that the total momentum is conserved

\sum_\alpha p_\alpha = P   (230)

The entropy has to be maximized subject to the constraint. This can be performed by using Lagrange's method of undetermined multipliers. Thus, \Phi is to be maximized with respect to p_\alpha, where

\Phi = \sum_\alpha \left[ S_\alpha \left( U_\alpha - \frac{p_\alpha^2}{2 m_\alpha} \right) + \lambda \cdot p_\alpha \right]   (231)

Maximizing \Phi w.r.t. p_\alpha leads to the equation

0 = - \left( \frac{\partial S_\alpha}{\partial U_\alpha} \right) \frac{p_\alpha}{m_\alpha} + \lambda = - \frac{1}{T} \frac{p_\alpha}{m_\alpha} + \lambda   (232)

which leads to the velocities of all the components being the same. Thus, no independent internal macroscopic linear motions are allowed in an equilibrium state. For the stationary state to be stable against the momentum fluctuations of the \alpha-th part, one requires that

- \frac{1}{m_\alpha T} \leq 0   (233)

Therefore, stability against break-up of the system requires that T \geq 0.


Homework:
Prove the two equalities

C_P - C_V = T V \frac{\alpha^2}{\kappa_T}   (234)

and

\frac{\kappa_S}{\kappa_T} = \frac{C_V}{C_P}   (235)

Hence, prove that the stability conditions imply the inequalities

C_P \geq C_V \geq 0   (236)

and

\kappa_T \geq \kappa_S \geq 0   (237)
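Although the homework asks for general proofs, the identities (234)-(237) can at least be checked for the special case of a monatomic ideal gas, for which \alpha = 1/T and \kappa_T = 1/P. The sketch below is an illustrative numeric check with arbitrary parameter values, not a proof.

```python
# Check of the homework identities for a monatomic ideal gas
# (P V = N k T, C_V = 3 N k / 2), an illustrative special case.
Nk = 1.0               # the product N k, in arbitrary units
T, V = 300.0, 2.0
P = Nk * T / V

C_V = 1.5 * Nk
C_P = 2.5 * Nk
alpha = 1.0 / T        # thermal expansion (1/V)(dV/dT)_P for the ideal gas
kappa_T = 1.0 / P      # isothermal compressibility -(1/V)(dV/dP)_T
kappa_S = kappa_T * C_V / C_P   # adiabatic compressibility, from (235)

# Equation (234):  C_P - C_V = T V alpha^2 / kappa_T
assert abs((C_P - C_V) - T * V * alpha**2 / kappa_T) < 1e-12
# Inequalities (236) and (237)
assert C_P >= C_V >= 0.0
assert kappa_T >= kappa_S >= 0.0
```

For this gas both sides of (234) reduce to N k, the familiar Mayer relation.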

The above conditions for stability are necessary but not sufficient to establish
that the equilibrium is completely stable, since a state may decrease its entropy
when there are infinitesimally small fluctuations in its macroscopic parameters,
but its entropy may increase if the deviations of the parameters have large
values. Such states are known as metastable states. A system which is in a
metastable state will remain there until a sufficiently large fluctuation occurs
that will take the system into a new state that is more stable.
Figure 7: A curve of the internal energy U(X) versus an internal variable X for a system which exhibits a stable and a metastable state.

3 Foundations of Statistical Mechanics

Statistical Mechanics provides us with:

(i) A basis for first-principles calculations of the thermodynamic quantities and transport coefficients of matter, in terms of the dynamics of its microscopic constituents.

(ii) A physical significance for entropy.

3.1 Phase Space

In general, phase space is the space of a set of ordered numbers which describes the microscopic states of a many-particle system. For a classical system, one can describe the state of the system by a set of continuously varying variables corresponding to the generalized momenta and generalized coordinates of each particle. However, for quantum systems, the Heisenberg uncertainty principle forbids one to know the momentum and position of any single particle precisely. In this case, the quantum states of a particle can be prescribed by specifying the eigenvalues of a mutually commuting set of operators representing physical observables. The eigenvalues can be either continuous or discrete. Thus, the phase space for a quantum system can either consist of a set of discrete numbers or can consist of a set of continuous numbers, as in the classical case.
Classical Phase Space
A microscopic state of a classical system of particles can be described by prescribing all the microscopic coordinates and momenta describing the internal degrees of freedom.
For a classical system of N particles moving in three-dimensional space, the state of one particle, at any instant of time, can be specified by prescribing the values of the three coordinates (q_1, q_2, q_3) and the values of the three canonically conjugate momenta (p_1, p_2, p_3).
The state of the many-particle system, at one instant of time, is prescribed by specifying the values of the 3N coordinates q_i, (i \in \{1, 2, 3, \ldots, 3N\}), and the values of the 3N canonically conjugate momenta p_i, (i \in \{1, 2, 3, \ldots, 3N\}). The space composed of the ordered set of 6N components of the coordinates and momenta is the phase space of the N particle system. This phase space has 6N dimensions.
Distinguishable Particles
For distinguishable particles for which each particle can be given a unique
label, each point in phase space represents a unique microscopic state.
Indistinguishable Particles
By contrast, for indistinguishable particles it is not admissible to label the particles. The material is invariant under all permutations of the sets of labels assigned to each of the N particles. There are N! such permutations for the N particle system, and each one of these N! permutations can be built by successively permuting the sets of (six) labels assigned to pairs of particles. To be sure, the permutation of a particle described by the values of the ordered set of variables \{q_1, q_2, q_3, p_1, p_2, p_3\} and a second particle described by the values of the ordered set \{q_4, q_5, q_6, p_4, p_5, p_6\} is achieved by the interchange of the values \{q_1, q_2, q_3, p_1, p_2, p_3\} \leftrightarrow \{q_4, q_5, q_6, p_4, p_5, p_6\}. Any of these N! permutations of the sets of labels assigned to the N particles has the action of transforming one point in phase space to a different point. Since it is not permissible to label indistinguishable particles, the resulting N! different points in phase space must represent the same physical state.
The Number of Microscopic States.
Given the correspondence between points in phase space and microscopic states of the system, it is useful to introduce a measure of the number of microscopic states of a system, N_\Gamma. One such measure is proportional to the volume of accessible phase space. Consider an infinitesimal volume element of phase space, defined by the conditions that the generalized momenta p_i lie in the intervals given by

P_i + \Delta p_i > p_i > P_i   (238)

and the coordinates q_i are restricted to the intervals

Q_i + \Delta q_i > q_i > Q_i   (239)

for all i. This infinitesimal volume element is given by

\Delta \Gamma = \prod_{i=1}^{3N} \Delta p_i \, \Delta q_i   (240)

The infinitesimal volume element of phase space has dimensions of p^{3N} q^{3N}.


To turn this into a dimensionless quantity, one has to divide by a quantity with dimensions of (action)^{3N}.

Figure 8: An infinitesimal hyper-cubic volume of phase space \Delta \Gamma = \Delta p^{3N} \Delta q^{3N}.

Although any quantity with dimensions of action would do, it is convenient to use 2 \pi \hbar as the measure for the action. With this particular choice, the dimensionless measure of the volume of phase space is given by

\frac{\Delta \Gamma}{( 2 \pi \hbar )^{3N}} = \prod_{i=1}^{3N} \frac{\Delta p_i \, \Delta q_i}{2 \pi \hbar}   (241)

The identification of \hbar with Planck's constant is convenient, since it allows one to make a connection with the number of quantum states in the quasi-classical limit. The Heisenberg uncertainty principle dictates that the momentum and position of a single-particle (wave-packet) state cannot be determined to better than \Delta p_i \, \Delta q_i > 2 \pi \hbar. Hence, it appears reasonable to define the volume of phase space occupied by a single-particle state as ( 2 \pi \hbar )^3, and so the dimensionless measure of the number of states for a single-particle system would be given by

\prod_{i=1}^{3} \frac{\Delta p_i \, \Delta q_i}{2 \pi \hbar}   (242)
and consequently, the measure of the number of distinct microscopic states is given by

N_\Gamma = \frac{\Delta \Gamma}{( 2 \pi \hbar )^{3N}} = \prod_{i=1}^{3N} \frac{\Delta p_i \, \Delta q_i}{2 \pi \hbar}   (243)

for a system of N distinguishable particles. If the particles are indistinguishable, the number of distinct microscopic states N_\Gamma is defined as

N_\Gamma = \frac{\Delta \Gamma}{N! \, ( 2 \pi \hbar )^{3N}}   (244)

where we have divided by N!, which is the number of permutations of the N sets of particle labels.

3.2 Trajectories in Phase Space

As time evolves, the system is also expected to evolve with time. For a classical system, the time evolution of the coordinates and momenta is governed by Hamilton's equations of motion, and the initial point in phase space will map out a trajectory in the 6N dimensional phase space. For a closed system, in which no time-dependent external fields are present, the Hamiltonian is a function of the set of 3N generalized momenta and the 3N generalized coordinates, H(\{p_i, q_i\}), and has no explicit time dependence. The rates of change of \{p_i, q_i\}, where i \in \{1, 2, 3, \ldots, 3N\}, are given by the set of Hamilton's equations of motion

\frac{dp_i}{dt} = \{ p_i , H \}_{PB} = - \frac{\partial H}{\partial q_i}

\frac{dq_i}{dt} = \{ q_i , H \}_{PB} = + \frac{\partial H}{\partial p_i}   (245)

where PB denotes the Poisson Bracket. The Poisson Bracket of two quantities A and B is defined as the antisymmetric quantity

\{ A , B \}_{PB} = \sum_{i=1}^{3N} \left( \frac{\partial A}{\partial q_i} \frac{\partial B}{\partial p_i} - \frac{\partial B}{\partial q_i} \frac{\partial A}{\partial p_i} \right)   (246)

The trajectory originating from a specific point in phase space will be given by the solution of Hamilton's equations of motion, where the initial conditions correspond to the values of the 6N variables at the initial point.
============================================
Example: Motion of a single particle in One-Dimension.
A particle of mass m moving in one dimension in the presence of a potential energy V(q) is described by the Hamiltonian

H = \frac{p^2}{2m} + V(q)   (247)

The motion of the particle is described by Hamilton's equations of motion, which simply reduce to the form

\frac{dp}{dt} = - \frac{\partial V}{\partial q}

\frac{dq}{dt} = \frac{p}{m}   (248)

as is expected.
============================================
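Hamilton's equations (248) can be integrated numerically for a concrete potential. The sketch below is an illustrative assumption: it uses the anharmonic choice V(q) = q^4/4 and the standard velocity-Verlet (leapfrog) scheme, with arbitrary step size and initial conditions.

```python
# Numerical integration of Hamilton's equations (248) for the illustrative
# potential V(q) = q^4/4, so that -dV/dq = -q^3.
m = 1.0

def dVdq(q):
    return q**3

def H(p, q):
    return p * p / (2.0 * m) + q**4 / 4.0

p, q, h = 0.0, 1.0, 1e-3
E0 = H(p, q)
for _ in range(20000):
    p -= 0.5 * h * dVdq(q)   # half kick:  dp/dt = -dV/dq
    q += h * p / m           # drift:      dq/dt = p/m
    p -= 0.5 * h * dVdq(q)   # half kick

# The energy is conserved along the trajectory; the small residual is the
# O(h^2) discretization error of the integrator.
assert abs(H(p, q) - E0) < 1e-4
```

The assertion reflects the statement, proved below via the Poisson Bracket, that H is a constant of motion for a closed system.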
The time dependence of any physical quantity A(\{p_i, q_i\} : t) can be evaluated by evaluating it on the trajectory followed in phase space. Hamilton's equations of motion have the consequence that the total derivative of any quantity A(\{p_i, q_i\} : t) can be found from the Poisson Bracket equation of motion

\frac{dA}{dt} = \sum_{i=1}^{3N} \left( \frac{dq_i}{dt} \frac{\partial A}{\partial q_i} + \frac{dp_i}{dt} \frac{\partial A}{\partial p_i} \right) + \frac{\partial A}{\partial t}

= \sum_{i=1}^{3N} \left( \frac{\partial H}{\partial p_i} \frac{\partial A}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial A}{\partial p_i} \right) + \frac{\partial A}{\partial t}

= \{ A , H \}_{PB} + \frac{\partial A}{\partial t}   (249)

The first term describes the implicit time dependence of A and the second term
describes its explicit time dependence.
If a quantity B has no explicit time dependence and the Poisson Bracket of B and H is zero, then B is conserved:

\frac{dB}{dt} = \{ B , H \}_{PB} + \frac{\partial B}{\partial t} = \{ B , H \}_{PB} = 0   (250)

where the first two lines follow from our stated assumptions. Since the total derivative governs the change of B as the system flows through phase space, B is conserved. As an example, since our Hamiltonian does not explicitly depend on time, the Poisson Bracket equation of motion shows that the total derivative of the Hamiltonian w.r.t. time is zero. Explicitly, the equation of motion for H is given by

\frac{dH}{dt} = \{ H , H \}_{PB} + \frac{\partial H}{\partial t} = \{ H , H \}_{PB} = 0   (251)

where the second line follows from the absence of any explicit time dependence
and the last line follows from the antisymmetric nature of the Poisson Bracket.
Hence, the energy is a constant of motion for our closed system. That is, the
energy is constant over the trajectory traversed in phase space.
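The same Poisson Bracket criterion can be tested numerically for a quantity other than H. The sketch below is an illustrative check (potential, phase point and finite-difference step are arbitrary assumptions) that the angular momentum L_z = x p_y - y p_x of a particle in a two-dimensional central potential has a vanishing Poisson Bracket with H, and hence, by (250), is conserved.

```python
import math

# Central-potential Hamiltonian in two dimensions, with the illustrative
# choice V(r) = exp(-r) and unit mass.
def H(x, y, px, py):
    r = math.hypot(x, y)
    return (px**2 + py**2) / 2.0 + math.exp(-r)

def Lz(x, y, px, py):
    return x * py - y * px

def poisson_bracket(A, B, pt, h=1e-5):
    """Finite-difference { A , B }_PB at the phase point pt = (x, y, px, py)."""
    def deriv(f, i):
        u = list(pt); u[i] += h; fp = f(*u)
        u[i] -= 2 * h; fm = f(*u)
        return (fp - fm) / (2.0 * h)
    # coordinates sit at indices 0, 1 and the conjugate momenta at 2, 3
    return sum(deriv(A, i) * deriv(B, i + 2) - deriv(B, i) * deriv(A, i + 2)
               for i in (0, 1))

pt = (0.3, -1.2, 0.8, 0.5)
assert abs(poisson_bracket(Lz, H, pt)) < 1e-8   # L_z is conserved
assert abs(poisson_bracket(H, H, pt)) < 1e-8    # antisymmetry: {H, H} = 0
```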

Figure 9: A microscopic state of a macroscopic system is described by a point in phase space and, as it evolves, it maps out a very complex trajectory which is governed by Hamilton's equations of motion.
The equations of motion allow us to follow the time evolution of a point in phase space, i.e. the evolution of the microscopic state of the system. The trajectory in phase space may be extremely complicated and rapidly varying. For example, in a collision between two neutral molecules, the change in momentum almost exclusively occurs when the separation between the two molecules is of the order of the molecular size. This length scale should be compared with the distance over which the momentum of each of the pair of molecules is constant, namely the distance travelled by a molecule between its successive collisions (i.e. the mean free path). The ratio of the mean free path to the molecular size is usually quite large for dilute gases. If these distances are scaled by the molecular velocities, one concludes that the momenta of the particles change rapidly at the collisions, and so the trajectory in phase space changes abruptly. The same ratio of distances also implies that the particular form of a trajectory is extremely sensitive to the initial conditions, since a small change in the initial conditions determines whether or not a particular collision will occur. The sensitivity to initial conditions and the complexity of the trajectories in phase space prohibit both analytic and numerical solution for realistic materials. Numerical solution is prohibited by the enormity of the requirements for storing the initial conditions, let alone for implementing the numerical solution of the equations of motion. Despite the complexity of trajectories in phase space, and their sensitivity to initial conditions, the trajectories do have some important common features.
The trajectory of a closed system cannot intersect with itself. This is a consequence of Hamilton's equations of motion completely specifying the future motion of a system once the set of initial conditions is given, since H has no explicit time dependence. A trajectory cannot cross itself, for if it did, Hamilton's equations would lead to an indeterminacy at the point of intersection. That is, there would be two possible solutions of Hamilton's equations of motion if the system's initial conditions placed it at the crossing point, which is not possible. However, it is possible for a trajectory to close up on itself and form a closed orbit.
Secondly, the trajectories only occupy the portion of phase space for which the constants of motion are equal to their initial values.

3.3 Conserved Quantities and Accessible Phase Space

If a system has a set of conserved quantities, then the trajectory followed by the system is restricted to a generalized surface in the 6N dimensional phase space, on which the conserved quantities take on their initial values. The set of points on the generalized surface is known as the accessible phase space \Gamma_a.
For a classical system where only the energy is conserved and has the initial value E, the accessible phase space is given by the set of points \{p_i, q_i\} that satisfy the equation

H(\{p_i, q_i\}) = E   (252)

or, if the energy is only known to within an uncertainty of \Delta E, then the accessible phase space is given by the set of points that satisfy the inequality

E + \Delta E > H(\{p_i, q_i\}) > E   (253)

============================================
Example: A One-Dimensional Classical Harmonic Oscillator
A particle of mass m constrained to move in one dimension, subject to a harmonic restoring force, is described by the Hamiltonian

H = \frac{p^2}{2m} + \frac{m \omega_0^2}{2} q^2   (254)

The phase space of this system corresponds to the entire two-dimensional plane (p, q). If the energy is known to lie in an interval of width \Delta E around E, then the accessible phase space \Gamma_a is determined by

E + \Delta E > \frac{p^2}{2m} + \frac{m \omega_0^2}{2} q^2 > E   (255)

The surfaces of constant energy⁴ are in the form of ellipses in phase space, with semi-major and semi-minor axes given by the turning points

p_{max} = \sqrt{ 2 m E }   (256)

and

q_{max} = \sqrt{ \frac{2 E}{m \omega_0^2} }   (257)

The ellipse encloses an area of phase space which is given by

\pi \, p_{max} \, q_{max} = 2 \pi \frac{E}{\omega_0}   (258)

Therefore, the accessible phase space \Gamma_a forms the area enclosed between two ellipses, one ellipse with energy E + \Delta E and the other with energy E.

Figure 10: The accessible area of phase space of a one-dimensional harmonic oscillator is the area enclosed by the two ellipses.

Thus, the area of accessible phase space is found as

\Gamma_a = 2 \pi \frac{\Delta E}{\omega_0}   (259)

On dividing by 2 \pi \hbar, we can turn \Gamma_a into a measure of the number of microscopic states accessible to the system, N_\Gamma; we find

N_\Gamma = \frac{\Delta E}{\hbar \omega_0}   (260)

This is a measure of the number of different states accessible to the system, and can be interpreted quantum mechanically as the number of different quantum states which correspond to the energy within the accuracy \Delta E that has been specified. The result N_\Gamma is just the uncertainty in the number of quanta in the system.

⁴ In this case the volume of phase space is an infinite two-dimensional area and, if the energy is specified precisely, the area of accessible phase space is a line.


============================================
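The result (259) can be checked by a simple Monte Carlo estimate of the area enclosed between the two ellipses. The parameter values in the sketch below (m = \omega_0 = 1, E = 1, \Delta E = 0.1) are arbitrary illustrative choices.

```python
import math, random

# Monte Carlo estimate of the accessible phase-space area (259) for a
# one-dimensional harmonic oscillator.
m, w0, E, dE = 1.0, 1.0, 1.0, 0.1

def H(p, q):
    return p * p / (2.0 * m) + 0.5 * m * w0**2 * q * q

# bounding box containing the outer constant-energy ellipse H = E + dE
pmax = math.sqrt(2.0 * m * (E + dE))              # cf. equation (256)
qmax = math.sqrt(2.0 * (E + dE) / (m * w0**2))    # cf. equation (257)

random.seed(1)
n, hits = 200000, 0
for _ in range(n):
    p = random.uniform(-pmax, pmax)
    q = random.uniform(-qmax, qmax)
    if E < H(p, q) < E + dE:
        hits += 1

area = hits / n * (2.0 * pmax) * (2.0 * qmax)
exact = 2.0 * math.pi * dE / w0                   # equation (259)
assert abs(area - exact) / exact < 0.05
```

Dividing the estimated area by 2\pi\hbar would then give the measure N_\Gamma of equation (260).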
In the most general case, where there are several other conserved quantities B_j(\{p_i, q_i\}) (say there are M in number) which have the specific values B_j, the accessible phase space will consist of the points in phase space where the surfaces B_j = B_j(\{p_i, q_i\}) corresponding to the conserved quantities intersect. That is, the accessible phase space corresponds to the points which are consistent with the values of all the M conserved quantities B_j

E = H(\{p_i, q_i\})

B_j = B_j(\{p_i, q_i\})   (261)

for all j \in 1, 2, \ldots, M. In all cases, the physical trajectories of the system are restricted to move within the accessible region of phase space.

3.4 Macroscopic Measurements and Time Averages

The measurement of thermodynamic quantities usually represents a relatively slow process when compared to microscopic time scales. Furthermore, the measurement involves the participation of many of the system's degrees of freedom. This implies that a macroscopic measurement of a quantity A corresponds to a time-average of a quantity A(\{p_i, q_i\}) over a trajectory in phase space, taken over a long period of time, and that the quantity A(\{p_i, q_i\}) must involve many degrees of freedom of the system. For a long period of time T, the macroscopic quantity is given by

\overline{A} = \frac{1}{T} \int_0^T dt \; A(\{p_i(t), q_i(t)\})   (262)

where A(\{p_i(t), q_i(t)\}) varies with time as the microscopic state changes with time. That is, the set of momenta and coordinates \{p_i(t), q_i(t)\} are considered to be implicit functions of time and are obtained by solving Hamilton's equations using the initial data.
As an example, consider the pressure on a container wall which encloses a dilute gas. The pressure P is defined as the force per unit area. The force F is averaged over a time long compared with the time between molecular collisions with the wall. The force is given by the rate of change of momentum of the molecules impinging on the wall. The force due to a molecular collision occurs over the time-scale during which the molecule is in close proximity to the wall. On introducing a short-ranged interaction between the particle and the wall, one finds that the instantaneous force exerted on the wall is given by

F_3(t) = \left. \frac{dV(q_3)}{dq_3} \right|_{q_3(t)}   (263)

where V(q_3) is a short-ranged potential due to the interaction of the particle with the wall. Therefore, the instantaneous pressure is given by

P(t) = \frac{1}{A} \sum_{i=1}^{N} \left. \frac{dV(q_{3i})}{dq_{3i}} \right|_{q_{3i}(t)}   (264)

where A is the area of the wall. The instantaneous pressure would have the appearance of a sparse sequence of delta-function-like spikes. The thermodynamic pressure is given by the time-average over an interval T in which many collisions occur

\overline{P} = \frac{1}{T} \int_0^T dt \; P(t) = \frac{1}{T A} \int_0^T dt \sum_{i=1}^{N} \left. \frac{dV(q_{3i})}{dq_{3i}} \right|_{q_{3i}(t)}   (265)

This result is of the form that we are considering. If the time average is over a long enough time interval, the result should be representative of the equilibrium state, in which \overline{P} does not change with time.
The process of time averaging over long intervals is extremely convenient, since it circumvents the question of which microscopic initial conditions should be used. For sufficiently long times, the same average would be obtained for many points on the trajectory. Thus, the long-time average is roughly equivalent to an average with a statistical distribution of microscopic initial conditions.

3.5 Ensembles and Averages over Phase Space

The time-average of any quantity over the trajectory in phase space can be replaced by an average over phase space, in which the different volumes are weighted with a distribution function \rho(\{p_i, q_i\} : t). The distribution function may depend on the point of phase space \{p_i, q_i\}, and may also depend on the time t.
Conceptually, the averaging over phase space may be envisaged by introducing an ensemble composed of a very large number of identical systems, each of which has the same set of values for its measured conserved quantities, so that all the systems represent the same macroscopic equilibrium state. Although the different systems making up the ensemble correspond to the same macroscopic equilibrium state, the systems may be in different microstates. The concept of ensemble averaging was first introduced by Maxwell in 1879 and developed more fully by Gibbs in 1902.
There are infinitely many possible choices of ensembles; one trivial example is that in which each system in the ensemble corresponds to the same initial microstate.

Another example corresponds to taking all the different points of the trajectory of one microscopic state as the initial states of the ensemble. A frequently used ensemble corresponds to distributing the probability density equally over all points in phase space compatible with the measured quantities of the macroscopic state.
The Probability Distribution Function

The probability distribution function \rho(\{p_i, q_i\} : t) could, in principle, be measured by determining the microstates of the systems composing the ensemble at time t and finding the relative number of systems which are found in microstates in the volume d\Gamma of phase space around the point \{p_i, q_i\}. In the limit that the number of systems in the ensemble goes to infinity, this ratio reduces to a probability. The probability dp(t) is expected to be proportional to the volume of phase space d\Gamma. Therefore, we expect that

dp(t) = \rho(\{p_i, q_i\} : t) \; d\Gamma   (266)

where \rho(\{p_i, q_i\} : t) is the probability distribution function. The probability distribution function is only finite for the accessible volume of phase space. Since probabilities are non-negative, so is the probability distribution function. Furthermore, since the probabilities are defined to be normalized to unity, the probability distribution function must also be normalized

1 = \int dp(t) = \int d\Gamma \; \rho(\{p_i, q_i\} : t)   (267)
for all times t. For a macroscopic system, the integration over d\Gamma may be restricted to the volume of accessible phase space without any loss of generality.
Ensemble Averages

Once \rho(\{p_i, q_i\} : t) has been determined, the measured value of any physical quantity A(\{p_i, q_i\} : t) of a system in a macroscopic state at time t can be represented by an ensemble average. The ensemble average is the average over phase space weighted by the probability distribution function

\overline{A}(t) = \int d\Gamma \; A(\{p_i, q_i\} : t) \; \rho(\{p_i, q_i\} : t)   (268)

If all the different points of the trajectory of one microscopic state are taken to define the initial states of the ensemble, then the ensemble average will coincide with the long-time average for that microscopic state. At the other extreme, if each system in the ensemble corresponds to the same initial microstate, then the ensemble average of a quantity at any time t will simply correspond to the value of the quantity for the microstate at time t.
The fundamental problem of statistical mechanics is to find the probability distribution function for the ensemble that most closely describes measurements on the macroscopic equilibrium states of physical systems. We shall examine the equations that determine the time-dependence of the probability distribution function in the next section.

3.6 Liouville's Theorem

Liouville's Theorem concerns how the probability distribution function for finding our N-particle system in some volume element of phase space at time t varies with time.
Since the probability is normalized and since the states of a system evolve on continuous trajectories in phase space, the probability density must satisfy a continuity equation. Consider a volume element d\Gamma of phase space; the number of systems of the ensemble that occupy this volume element is proportional to

\rho(\{p_i, q_i\} : t) \; d\Gamma   (269)

and the increase in the number of systems in this volume element that occurs in the time interval dt is proportional to

\left[ \rho(\{p_i, q_i\} : t + dt) - \rho(\{p_i, q_i\} : t) \right] d\Gamma \approx \frac{\partial \rho}{\partial t} \; d\Gamma \; dt   (270)

where we have used the Taylor expansion to obtain the right hand side of the equation. Due to the continuous nature of the trajectories, the increase in the number of trajectories in the volume must be due to system trajectories which cross the surface of our 6N-dimensional volume. That is, the net increase must be due to an excess of the flow across the bounding surfaces into the volume over the flow out of the volume.
Consider the infinitesimal volume of phase space d\Gamma where the i-th coordinate is restricted to lie between q_i and q_i + \Delta q_i and the i-th generalized momentum is restricted to lie between p_i and p_i + \Delta p_i. The volume element d\Gamma is given by

d\Gamma = \prod_{i=1}^{3N} \Delta q_i \, \Delta p_i   (271)

The pair of opposite surfaces defined by the coordinates q_i and q_i + \Delta q_i have 6N - 1 dimensions and have an area given by

\prod_{j=1, j \neq i}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j   (272)

Figure 11: An infinitesimal hyper-cubic element of phase space of dimensions d\Gamma = \prod_{i=1}^{3N} \Delta q_i \Delta p_i. In the time interval dt, the probability density within a distance \dot{q}_i dt perpendicular to the bounding surface at q_i is swept into the volume.

Trajectories which enter or leave the volume element d\Gamma must cross one of its 6N boundaries.
Flow In Across a Surface

All the systems of the ensemble in microstates within a distance \dot{q}_i dt behind the surface at q_i will enter d\Gamma in the time interval dt. That is, the ensemble systems in the volume \prod_{j=1, j \neq i}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j \; \dot{q}_i(\{p_i, q_i\}) \, dt will enter d\Gamma in the time interval dt. The number of systems in this volume is proportional to

\prod_{j=1, j \neq i}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j \; dt \; \dot{q}_i(\{p_i, q_i\}) \; \rho(\{p_i, q_i\} : t)   (273)
Flow Out Across a Surface

All the systems of the ensemble with microstates that are within a distance \dot{q}_i dt behind the surface at q_i + \Delta q_i will leave d\Gamma in the time interval dt. The number of systems in this volume is proportional to

\prod_{j=1, j \neq i}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j \; dt \; \dot{q}_i(\{p_i, q_i + \Delta q_i\}) \; \rho(\{p_i, q_i + \Delta q_i\} : t)   (274)

where the velocity and density must be evaluated at the position of the second surface.
The Net Flow into the Volume

The net flow into d\Gamma from a pair of coordinate surfaces is given by the difference between the flow crossing the coordinate surface entering the volume and the flow crossing the opposite surface leaving the volume

\prod_{j=1, j \neq i}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j \; dt \left[ \dot{q}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) - \dot{q}_i(\{p_i, q_i + \Delta q_i\}) \, \rho(\{p_i, q_i + \Delta q_i\} : t) \right]

\approx - \prod_{j=1}^{3N} \Delta q_j \prod_{j=1}^{3N} \Delta p_j \; dt \; \frac{\partial}{\partial q_i} \left[ \dot{q}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) \right]   (275)

where we have Taylor expanded in powers of Δq_i. Likewise, the net flow into dΓ from the pair of momentum surfaces at p_i and p_i + Δp_i is given by

    \prod_{j=1, j \neq i}^{3N} \Delta p_j \; \prod_{j=1}^{3N} \Delta q_j \; dt \left[ \dot{p}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) \; - \; \dot{p}_i(\{p_i + \Delta p_i, q_i\}) \, \rho(\{p_i + \Delta p_i, q_i\} : t) \right]

    \approx \; - \prod_{j=1}^{3N} \Delta q_j \; \prod_{j=1}^{3N} \Delta p_j \; dt \; \frac{\partial}{\partial p_i} \left( \dot{p}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) \right)                    (276)

On summing over all the 6N surfaces, one finds that the net increase of the number of ensemble systems in the volume dΓ that occurs in time dt due to their flowing across all its boundaries is proportional to

    - \prod_{j=1}^{3N} \Delta q_j \; \prod_{j=1}^{3N} \Delta p_j \; dt \; \sum_{i=1}^{3N} \left[ \frac{\partial}{\partial q_i} \left( \dot{q}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) \right) + \frac{\partial}{\partial p_i} \left( \dot{p}_i(\{p_i, q_i\}) \, \rho(\{p_i, q_i\} : t) \right) \right]                    (277)

The Continuity Equation

On equating the net increase of the probability in the infinitesimal volume element with the net probability flowing into the volume, one can cancel the factors of dt and dΓ. Hence, one finds that the probability density satisfies the linear partial differential equation

    \frac{\partial \rho}{\partial t} + \sum_{i=1}^{3N} \left[ \frac{\partial}{\partial q_i} \left( \dot{q}_i \, \rho \right) + \frac{\partial}{\partial p_i} \left( \dot{p}_i \, \rho \right) \right] = 0                    (278)
On expanding the derivatives of the products one obtains

    \frac{\partial \rho}{\partial t} + \sum_{i=1}^{3N} \left[ \rho \frac{\partial \dot{q}_i}{\partial q_i} + \dot{q}_i \frac{\partial \rho}{\partial q_i} + \rho \frac{\partial \dot{p}_i}{\partial p_i} + \dot{p}_i \frac{\partial \rho}{\partial p_i} \right] = 0                    (279)

The above expression simplifies on using Hamilton's equations of motion

    \dot{q}_i = \frac{\partial H}{\partial p_i}                    (280)

    \dot{p}_i = - \frac{\partial H}{\partial q_i}                    (281)

so one obtains

    \frac{\partial \dot{q}_i}{\partial q_i} = \frac{\partial^2 H}{\partial q_i \partial p_i}                    (282)

    \frac{\partial \dot{p}_i}{\partial p_i} = - \frac{\partial^2 H}{\partial p_i \partial q_i}                    (283)

On substituting these two relations in the equation of motion for ρ, the pair of second-order derivatives cancel and one finally obtains Liouville's equation

    \frac{\partial \rho}{\partial t} + \sum_{i=1}^{3N} \left[ \dot{q}_i \frac{\partial \rho}{\partial q_i} + \dot{p}_i \frac{\partial \rho}{\partial p_i} \right] = 0                    (284)

That is, the total derivative of ρ vanishes

    \frac{d \rho}{dt} = \frac{\partial \rho}{\partial t} + \sum_{i=1}^{3N} \left[ \dot{q}_i \frac{\partial \rho}{\partial q_i} + \dot{p}_i \frac{\partial \rho}{\partial p_i} \right] = 0                    (285)

The total derivative is the derivative of ρ evaluated on the trajectory followed by the system. Hence, ρ is constant along the trajectory. Therefore, Liouville's theorem states that ρ flows like an incompressible fluid.

Figure 12: The time evolution of an inhomogeneous probability density ρ({q_i, p_i}) satisfies a continuity equation.
On substituting Hamilton's equations for the expressions for \dot{q}_i and \dot{p}_i in Liouville's theorem, one recovers the Poisson Bracket equation of motion for ρ

    \frac{d \rho}{dt} = \frac{\partial \rho}{\partial t} + \sum_{i=1}^{3N} \left[ \frac{\partial H}{\partial p_i} \frac{\partial \rho}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial \rho}{\partial p_i} \right]

                      = \frac{\partial \rho}{\partial t} + \left[ \rho \, , \, H \right]_{PB} = 0                    (286)

which is in a form suitable for Canonical Quantization, in which case ρ and H should be replaced by operators and the Poisson Bracket by a commutator times an imaginary number.

Liouville's theorem is automatically satisfied for any ρ which has no explicit t-dependence and can be expressed in terms of the constants of motion. Specifically, when ρ is initially uniform over the accessible phase space, Liouville's theorem ensures that it will remain constant. To be sure, if the distribution satisfies

    \frac{\partial \rho}{\partial p_i} = 0 \qquad \forall \, i                    (287)

and

    \frac{\partial \rho}{\partial q_i} = 0 \qquad \forall \, i                    (288)

for all points {p_i, q_i} within the accessible volume of phase space (defined by H({p_i, q_i}) = E and any other relevant conservation laws), then Liouville's theorem yields

    \frac{\partial \rho}{\partial t} = 0                    (289)

============================================
Example: A Particle in a One-Dimensional Box.
We shall consider an example that illustrates how a probability density thins
and folds as time evolves. The example also shows that for sufficiently large
times, the probability distribution is finely divided and distributed over the volume of accessible phase space.
We shall consider an ensemble of systems. Each system is composed of a
single particle that is confined in a one-dimensional box of length L. When the
particle is not in contact with the walls, the Hamiltonian reduces to
    H(p, q) = \frac{p^2}{2m}                    (290)

The energies of the systems in the ensemble are bounded by

    E + \Delta E > H(p, q) > E                    (291)

which restricts the momenta to the two intervals

    p_{max} > p > p_{min} \qquad {\rm and} \qquad - p_{min} > p > - p_{max}                    (292)

The coordinates are restricted to the interval

    \frac{L}{2} > q > - \frac{L}{2}                    (293)

Thus, the volume of accessible phase space consists of two two-dimensional strips.

The probability distribution ρ(p, q : t) evolves according to Liouville's theorem

    \frac{\partial \rho}{\partial t} + \frac{\partial H}{\partial p} \frac{\partial \rho}{\partial q} - \frac{\partial H}{\partial q} \frac{\partial \rho}{\partial p} = 0                    (294)

which for volumes contained within the spatial boundaries reduces to

    \frac{\partial \rho}{\partial t} + \frac{p}{m} \frac{\partial \rho}{\partial q} = 0                    (295)

This equation has the general solution

    \rho(p, q : t) = A\left( q - \frac{p t}{m} \right) B(p)                    (296)

which is valid everywhere except at the locations of the walls. In the general solution, A and B are arbitrary functions which must be fixed by the boundary conditions.

We shall adopt the initial condition that the probability distribution function has the form

    \rho(p, q : 0) = \delta(q) \; B(p)                    (297)

which initially confines all the particles in the ensemble to the center q = 0. The momentum distribution function B(p) is evenly distributed over the allowed range

    B(p) = \frac{1}{2 ( p_{max} - p_{min} )} \left[ \Theta(p_{max} - p) \Theta(p - p_{min}) + \Theta(p + p_{max}) \Theta(- p - p_{min}) \right]                    (298)

For sufficiently short times, short enough so that the particles in the ensemble have not yet made contact with the walls, the solution is of the form

    \rho(p, q : t) = \delta\left( q - \frac{p}{m} t \right) B(p)                    (299)

which has the form of two segments of a line. The slope of the line in phase space is given by m/t. For small times the segments are almost vertical, and the line tilts over as t increases. The tilting is caused by the dispersion of the velocities, and causes the length of the line to increase. The increase in the length of the line does not affect the normalization, which is solely determined by B(p). At a time T_1 some particles in the ensemble will first strike the walls, that is, the line segments in available phase space will extend to q = ± L/2. This first happens when

    T_1 = \frac{L m}{2 p_{max}}                    (300)

Figure 13: The regions where the probability density for an ensemble of systems composed of a particle in a box is non-zero, at short times, is shown by the solid portion of the blue line. The slope of the line is caused by the dispersion in the velocities. The accessible phase space is enclosed by the red dashed lines between p_{max} and p_{min}, and a similar region in the lower half space.
For times greater than T_1 some of the ensemble's particles will be reflected from the walls. The solution of Liouville's equation can be found by the method of images. That is, the reflected portion of the probability density can be thought of as originating from identical systems with identical initial conditions, except that they are obtained by spatially reflecting our system at its boundaries q = ± L/2. The reflection requires that B(p) → B(−p) in the image. The probability distribution emanating from these image systems will enter the volume of our available phase space at time T_1 and will represent the reflected portion of the probability distribution function. The probability distribution that leaves our system represents the reflected portion of the probability distribution for the neighboring systems. Thus, we are mentally extending the region of accessible phase space in the spatial direction.

Figure 14: The regions where the probability density for an ensemble of particles in boxes is non-zero, for times slightly greater than the time of the first collision, is shown by the solid portion of the blue line. The two small line segments in the upper left-hand and lower right-hand portions of accessible phase space represent the regions of the probability density for systems where the particle has been reflected.

Figure 15: The extended phase space produced by reflecting the central area across its boundaries. In this extended system, the reflected probability density is simply represented by the free evolution of the initial probabilities of the image systems.

The solution just after the first reflection has occurred, but for times before any system has experienced two reflections, is given by
    \rho(p, q : t) = \sum_{n=-1}^{1} \delta\left( q - n L - \frac{p}{m} t \right) B(p)                    (301)

where q is restricted to the interval L/2 > q > − L/2. The folding of the distribution does not affect its normalization.

For larger times, for which any system in the ensemble has undergone multiple reflections, the set of systems must be periodically continued along the spatial axis. That is, we must consider multiple images of our system. The probability distribution valid at any time obviously has the form

    \rho(p, q : t) = \sum_{n=-\infty}^{\infty} \delta\left( q - n L - \frac{p}{m} t \right) B(p)                    (302)

where q is still restricted to the interval of length L. The probability distribution is non-zero on a set of parallel line segments with slope m/t. The line segments are separated by a distance (m L)/t along the momentum direction. For sufficiently large times, the slope of the lines will be small and they will be closely spaced. In conclusion, for sufficiently large times, we have shown that the probability distribution will be finely divided and spread throughout the volume of accessible phase space.

Figure 16: The regions where the probability density for a particle in a box is non-zero, for large times, is shown by the solid blue lines. For large times, particles in the ensemble have experienced different numbers of collisions, and the probability distribution is spread over many line segments.
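The thinning and folding described above can be reproduced with a direct simulation. The sketch below (with illustrative parameter values, not taken from the text) draws momenta from the two allowed intervals, starts every particle at q = 0, folds the free trajectories back into the box by specular reflection, and checks that at large times the ensemble is spread almost uniformly across the box.

```python
import random

L, m = 1.0, 1.0
E, dE = 1.0, 0.1  # energy window (illustrative values)
p_min, p_max = (2 * m * E) ** 0.5, (2 * m * (E + dE)) ** 0.5

def fold(x):
    # map an unreflected coordinate x onto [-L/2, L/2] by specular reflection
    y = (x + L / 2) % (2 * L)
    return (y - L / 2) if y <= L else (3 * L / 2 - y)

random.seed(0)
ensemble = [random.uniform(p_min, p_max) * random.choice([-1, 1])
            for _ in range(20000)]

t = 200.0  # long after many wall collisions
positions = [fold((p / m) * t) for p in ensemble]  # all start at q = 0

# crude uniformity check: count particles in each quarter of the box
bins = [0, 0, 0, 0]
for q in positions:
    bins[min(3, int((q + L / 2) / (L / 4)))] += 1
print(bins)  # roughly equal occupation of each quarter
```

With the momenta spread over a finite band, the positions wrap around the box many times by t = 200, so the four quarters are occupied nearly equally, in line with the fine division of the distribution argued above.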
============================================
The Time Dependence of Averages
Liouville's theorem shows that the time dependence of any quantity A({p_i, q_i}) (with no explicit t dependence) also follows from the Poisson Bracket equations. This can be seen by first multiplying Liouville's equation by A({p_i, q_i}) and then integrating over phase space.

    0 = \int d\Gamma \; A(\{p_i, q_i\}) \frac{\partial \rho}{\partial t} + \int d\Gamma \; A(\{p_i, q_i\}) \left[ \rho \, , \, H \right]_{PB}                    (303)

The derivatives of ρ w.r.t. the variables {p_i, q_i} that occur in the Poisson Bracket term can be removed by integrating by parts. That is, on noting that ρ vanishes on the boundaries of the integration, integration by parts yields

    0 = \int d\Gamma \; A \frac{\partial \rho}{\partial t} - \sum_{i=1}^{3N} \int d\Gamma \; \rho \left[ \frac{\partial}{\partial q_i} \left( A \frac{\partial H}{\partial p_i} \right) - \frac{\partial}{\partial p_i} \left( A \frac{\partial H}{\partial q_i} \right) \right]                    (304)

The derivatives of the products can be expanded to yield

    \int d\Gamma \; A \frac{\partial \rho}{\partial t} = \sum_{i=1}^{3N} \int d\Gamma \; \rho \left[ A \frac{\partial^2 H}{\partial q_i \partial p_i} + \frac{\partial A}{\partial q_i} \frac{\partial H}{\partial p_i} - A \frac{\partial^2 H}{\partial p_i \partial q_i} - \frac{\partial A}{\partial p_i} \frac{\partial H}{\partial q_i} \right]                    (305)

The terms proportional to the second derivative of the Hamiltonian cancel, leading to

    \frac{d \overline{A}}{dt} = \int d\Gamma \; A \frac{\partial \rho}{\partial t}
                              = \sum_{i=1}^{3N} \int d\Gamma \; \rho \left[ \frac{\partial A}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial A}{\partial p_i} \frac{\partial H}{\partial q_i} \right]
                              = \int d\Gamma \; \rho \left[ A \, , \, H \right]_{PB}                    (306)

which equates the time-derivative of the average of A with the average of its Poisson Bracket with H.

The above equation has the consequence that, for a macroscopic equilibrium state, the average of any quantity A that has no explicit t-dependence should be independent of time

    \frac{d \overline{A}}{dt} = \int d\Gamma \; \frac{\partial \rho}{\partial t} \; A = 0                    (307)

where the entire volume of the integration is fixed. (Note that in this expression, the total derivative has a different meaning from before, since the integration volume element is considered as being held fixed.) The requirement of the time-independence of any quantity A in equilibrium necessitates that the Poisson Bracket of ρ and H must vanish. This can be achieved if ρ only depends on H and any other conserved quantities.
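As a numerical check on eq. (306), consider A = q for an ensemble of one-dimensional harmonic oscillators: the bracket [q, H]_PB = p/M, so the ensemble average should obey d⟨q⟩/dt = ⟨p⟩/M. The sketch below (an added illustration with arbitrary parameters and an arbitrary non-equilibrium Gaussian initial ensemble) compares a finite-difference derivative of ⟨q⟩ with the averaged bracket.

```python
import math
import random

M, w0 = 1.0, 1.5
random.seed(3)
# sample an (arbitrary, non-equilibrium) initial ensemble of (q, p) pairs
states = [(random.gauss(1.0, 0.2), random.gauss(0.0, 0.3)) for _ in range(20000)]

def evolve(q, p, t):
    # exact harmonic-oscillator trajectory starting from (q, p)
    return (q * math.cos(w0 * t) + p / (M * w0) * math.sin(w0 * t),
            p * math.cos(w0 * t) - M * w0 * q * math.sin(w0 * t))

def avg(f, t):
    # ensemble (phase-space) average of f at time t
    return sum(f(*evolve(q, p, t)) for q, p in states) / len(states)

t, dt = 0.7, 1e-4
lhs = (avg(lambda q, p: q, t + dt) - avg(lambda q, p: q, t - dt)) / (2 * dt)
rhs = avg(lambda q, p: p, t) / M      # averaged Poisson bracket [q, H]_PB
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-6
```

The agreement holds at any t because the relation is satisfied trajectory by trajectory; only the finite-difference step introduces a (tiny) error.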

3.7 The Ergodic Hypothesis

In proving Liouville's theorem, we noted that

    \frac{\partial \dot{q}_i}{\partial q_i} = \frac{\partial^2 H}{\partial q_i \partial p_i}                    (308)

    \frac{\partial \dot{p}_i}{\partial p_i} = - \frac{\partial^2 H}{\partial p_i \partial q_i}                    (309)

This has the consequence that if one follows the flow of the systems of the ensemble with microstates contained in a specific volume of phase space dΓ at time t, then at time t′ the set of microstates will have evolved to occupy a volume of phase space dΓ′ such that

    d\Gamma = d\Gamma'                    (310)

This can be seen by considering the product of the canonically conjugate pairs of infinitesimal momenta and coordinates at time t

    dp_i \; dq_i                    (311)

At time t + dt the time evolution will have mapped the ends of these intervals onto new intervals, such that the lengths of the new intervals are given by

    dp_i' = dp_i \left( 1 + \frac{\partial \dot{p}_i}{\partial p_i} dt \right)                    (312)

and

    dq_i' = dq_i \left( 1 + \frac{\partial \dot{q}_i}{\partial q_i} dt \right)                    (313)

Therefore, the product of the new intervals is given by

    dp_i' \; dq_i' = dp_i \; dq_i \left( 1 + \left[ \frac{\partial \dot{q}_i}{\partial q_i} + \frac{\partial \dot{p}_i}{\partial p_i} \right] dt + O(dt^2) \right)                    (314)

which, since

    \frac{\partial \dot{q}_i}{\partial q_i} + \frac{\partial \dot{p}_i}{\partial p_i} = 0                    (315)

leaves the product invariant, to first-order in dt. Hence, since

    d\Gamma' = \prod_{i=1}^{3N} dp_i' \; dq_i'                    (316)

the size of the volume element occupied by the microstates is invariant, i.e. dΓ = dΓ′. This does not imply that the shape of the volume elements remains unchanged; in fact, they will become progressively distorted as time evolves. For most systems, for which the trajectories are very sensitive to the initial conditions, the volume elements will be stretched and folded, resulting in the volume being finely divided and distributed over the accessible phase space.

The initial formulation of the Ergodic Hypothesis was introduced by Boltzmann⁵ in 1871. A modified form of the hypothesis asserts that if the volume of accessible phase space is finite, then given a sufficiently long time interval, the trajectories of the microstates initially contained in a volume element dΓ will come arbitrarily close to every point in accessible phase space. If this hypothesis is true, then a long-time average of an ensemble containing states initially in dΓ will be practically equivalent to an average over the entire volume of accessible phase space with a suitable probability density. That is, the Ergodic Hypothesis leads one to expect that the equation

    \overline{A} = \frac{1}{T} \int_0^T dt \; A(\{p_i(t), q_i(t)\}) = \int d\Gamma \; A(\{p_i, q_i\}) \; \rho(\{p_i, q_i\})                    (317)

⁵ L. Boltzmann, Einige allgemeine Sätze über das Wärmegleichgewicht, Wien Ber. 63, 670-711 (1871).

holds for some ρ({p_i, q_i}) (the Ergodic Distribution) at sufficiently large times T.
The Ergodic Theorem

The Ergodic Theorem (due to J. von Neumann⁶, and then improved on by Birkhoff⁷ in the 1930s) states that, for the time-average of a quantity A along a trajectory that initially passes through any point in phase space, in the limit as the time goes to infinity:
(i) the time-average converges to a limit, and
(ii) that limit is equal to the weighted average of the quantity over accessible phase space. That is, the trajectory emanating from any initial point resembles the whole of the accessible phase space.

The Ergodic Theorem has been proved for collisions of hard spheres and for motion on geodesics on surfaces with constant negative curvature. Ergodicity can also be demonstrated for systems through computer simulations. The Ergodic Theorem has similar implications as a weaker theorem which is known as the Poincaré Recurrence Theorem.
Poincaré's Recurrence Theorem

The Poincaré Recurrence Theorem⁸ states that most systems will, after a sufficiently long time, return to a state very close to their initial state. The Poincaré Recurrence Time T_R is the time interval that has elapsed between the initial time and the time when the systems recur. The theorem was first proved by Henri Poincaré in 1890.

The proof is based on two facts:
(i) The phase trajectories of a closed system do not intersect.
(ii) The infinitesimal volume of phase space is conserved under time evolution.

Consider an arbitrarily small neighbourhood around any initial point in accessible phase space, and follow the volume's trajectory as the microstates evolve with time. The volume sweeps out a tube in phase space as it moves. The tube can never cross the regions that have already been swept out, since trajectories in phase space do not intersect. Hence, as the accessible phase space is a compact manifold, the total volume available for future motion without recurrence will decrease as the time increases. If the tube has not already returned to

⁶ J. von Neumann, Physical Applications of the Ergodic Hypothesis, Proc. Natl. Acad. Sci. 18, 263-266 (1932).
⁷ G.D. Birkhoff, Proof of the ergodic theorem, Proc. Natl. Acad. Sci. 17, 656-660 (1930).
⁸ H. Poincaré, Sur les courbes définies par une équation différentielle, Oeuvres, 1, Paris (1892).


Figure 17: A schematic description of Poincaré's Recurrence Theorem. Under time-evolution, a region ΔΓ of phase space sweeps out a trajectory in phase space. At each instant of time, the region occupies an equal volume of the accessible phase space Γ_a, so that ΔΓ = ΔΓ′ = ΔΓ″. After a time T_R, the trajectory of the region will come arbitrarily close to the initial region ΔΓ.
the initial neighbourhood (in which case recurrence has already occurred), then, since the total volume of accessible phase space is finite, in a finite time T_R all the volume of accessible phase space must be exhausted. At that time, the only possibility is that the phase tube returns to the neighbourhood of the initial point.

Quod Erat Demonstrandum (QED).

Thus, the trajectory comes arbitrarily close to itself at a later time T_R. If the trajectories don't repeat and form closed orbits, they must densely fill out all the available phase space. However, if Poincaré recurrence occurs before the entire volume of available phase space is swept out, some of it may remain unvisited. Liouville's theorem implies that the density of trajectories is uniform in the volume of accessible phase space that is visited. The Recurrence Time T_R is expected to be extremely large compared to the time scale of any macroscopic measurement, in which case the Recurrence Theorem cannot be used to justify replacing time averages with ensemble averages.
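Poincaré recurrence can be seen in a minimal quasi-periodic model (an added illustration, not part of the notes): two uncoupled phases winding around a 2-torus with incommensurate frequencies never retrace a closed orbit, yet they repeatedly return to any small neighbourhood of the starting point.

```python
import math

# Two uncoupled phases on the 2-torus advancing with incommensurate
# frequencies: the trajectory never closes, but it revisits any small
# neighbourhood of the starting point (Poincaré recurrence).
w1, w2 = 1.0, math.sqrt(2)   # incommensurate frequencies (illustrative)
dt, eps = 0.1, 0.05          # time step and neighbourhood size

def dist(a, b):
    # distance between two points on the circle [0, 1)
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

t, recurrences = 0.0, []
while len(recurrences) < 3 and t < 20000:
    t += dt
    th1 = (w1 * t / (2 * math.pi)) % 1.0
    th2 = (w2 * t / (2 * math.pi)) % 1.0
    if t > 1.0 and dist(th1, 0.0) < eps and dist(th2, 0.0) < eps:
        recurrences.append(t)
        t += 1.0   # skip forward so each recurrence is counted once
print("recurrence times:", recurrences)
```

Shrinking eps lengthens the waiting time between recurrences, in keeping with the remark that realistic recurrence times are enormous.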
A simple example of the Ergodic Theorem is given by the one-dimensional Harmonic Oscillator. The Double-Well Potential exhibits a region where ergodicity applies, but for low energies the motion may become constrained to one well, in which case ergodicity does not apply.

If the Ergodic Theorem holds then, for sufficiently large times T_R, the time-average of any quantity A

    \overline{A} = \frac{1}{T_R} \int_0^{T_R} dt \; A(\{p_i(t), q_i(t)\})                    (318)

represents the measured value of a macroscopic quantity, and the trajectory passes arbitrarily close to every point in phase space. If the system's trajectory dwells in the volume ΔΓ of phase space for time Δt, then the ratio

    \frac{\Delta t}{T_R}                    (319)

has a definite limit which defines the probability that, if the system is observed at some instant of time, it will be found to have a microscopic state in ΔΓ.
There are a number of systems which are known not to obey the Ergodic Hypothesis. These include integrable systems, or nearly integrable systems. An integrable system has a number of conservation laws B_i equal to half the number of dimensions of phase space. Furthermore, each pair of conserved quantities must be in involution

    \left[ B_i \, , \, B_j \right]_{PB} = 0                    (320)

These sets of conservation laws reduce the trajectory of an integrable system to motion on a 3N-dimensional surface embedded in the 6N-dimensional phase space. Furthermore, due to the involution condition, the normals to the surfaces of constant B_i lie in the surface and define a coordinate grid on the surface which does not have a singularity. The singularities of coordinate grids define the topology of the surface. For example, a coordinate grid on the surface of a sphere has two singularities (one at each pole), whereas a coordinate grid on a doughnut does not. The trajectories of an integrable system are confined to surfaces that have the topology of 3N-dimensional tori. Different initial conditions will lead to different tori that are nested within phase space. The motion on the torus can be separated into 3N different types of periodic modes. Since these modes have different frequencies, the motion is quasi-periodic. It is the separability of the coordinates that makes integrability a very special property. The very existence of 3N conserved quantities implies that most of the conserved quantities are microscopic and their values are not directly measurable by macroscopic means. Hence, they should not be considered as restricting the available phase space for the macroscopic state. Thus, it should be no surprise that integrable systems are generally considered to be non-ergodic.
============================================
Example: Circular Billiards

A simple example of an integrable system is given by the circular billiard. In this case, a particle is free to move within a circular area of the plane. The particle is confined to the area of radius R since it is specularly reflected by the perimeter. The spatial paths followed by the particle consist of a succession of chords. Whenever a chord meets the perimeter, the angle between the chord and the perimeter's tangent is the same for each reflection.

Figure 18: The basic geometry of the scattering for circular billiards.

The phase space is four-dimensional. However, there are two constants of motion, the energy E and the angular momentum p_φ. The angle φ satisfies the equation of motion

    \dot{\phi} = \frac{p_{\phi}}{m r^2}                    (321)

The radial motion is described by the pair of equations

    \dot{r} = \frac{p_r}{m}

    \dot{p}_r = \frac{p_{\phi}^2}{m r^3} - V_0 \; \delta( r - R )                    (322)

where V_0 → ∞. The second equation can be integrated once w.r.t. t by using an integrating factor of p_r on the left and m \dot{r} on the right, which introduces a constant of motion, which is the (conserved) energy. The remaining integration w.r.t. t leads to the solution for r(t). From this one sees that the radial coordinate r(t) performs anharmonic oscillations between the turning point at a

    a = \frac{p_{\phi}}{\sqrt{2 m E}}                    (323)

where 0 < a < R, and the radius R. The ratio of the period for a 2π rotation of φ to the period of the radial motion may not be a rational number. Hence, in general, the motion is quasi-periodic. Also, since the turning point a is a constant, the paths in the space of Cartesian coordinates are obviously excluded from a circular region of radius a centered on the origin.
============================================
When a system does not have 3N conserved quantities, the system is non-integrable. The trajectories are subject to far fewer restrictions and extend to higher dimensions in phase space. The trajectories are more sensitive to the initial conditions, so the trajectories are inevitably chaotic.


Figure 19: A spatial path traced out by the billiard ball over a long time interval.
The Kolmogorov-Arnold-Moser (KAM) theorem indicates that there is a specific criterion which separates ergodic from non-ergodic behaviour.

The trajectories of an integrable system are confined to a doughnut-shaped surface in phase space, an invariant torus. If the integrable system is subjected to different initial conditions, its trajectories in phase space will trace out different invariant tori. Inspection of the coordinates of an integrable system shows that the motion is quasi-periodic. The KAM theorem specifies the maximum magnitude of a small non-linear perturbation acting on a system (which, when unperturbed, is integrable) for which the quasi-periodic character of the orbits is still retained. For larger magnitudes of the perturbation, some invariant tori are destroyed and the orbits become chaotic, so ergodicity can be expected to hold. The KAM Theorem was first outlined by Andrey Kolmogorov⁹ in 1954. It was rigorously proved and extended by Vladimir Arnold¹⁰ (1963) and by Jürgen Moser¹¹ (1962).

3.8 Equal a priori Probabilities

The Hypothesis of Equal a priori Probabilities is an assumption that assigns equal probabilities to equal volumes of phase space. This hypothesis, first introduced by Boltzmann, assumes that the probability density for ensemble averaging over phase space is uniform. The hypothesis is based on the assumption that the dynamics does not preferentially bias some volume elements dΓ of available phase space over other elements with equal volumes. This hypothesis is consistent with Liouville's theorem, which ensures that an initially uniform probability distribution will remain uniform at all later times. The Equal a priori Hypothesis is equivalent to assuming that, in many consecutive rolls of a die, each of the six faces will appear with equal probability.

⁹ A.N. Kolmogorov, On Conservation of Conditionally Periodic Motions for a Small Change in Hamilton's Function, Dokl. Akad. Nauk SSSR 98, 527-530 (1954).
¹⁰ V.I. Arnold, Proof of a Theorem of A. N. Kolmogorov on the Preservation of Conditionally Periodic Motions under a Small Perturbation of the Hamiltonian, Uspehi Mat. Nauk 18, 13-40 (1963).
¹¹ J. Moser, On Invariant Curves of Area-Preserving Mappings of an Annulus, Nachr. Akad. Wiss. Göttingen Math.-Phys. Kl. II, 1-20 (1962).
If the Ergodic Hypothesis holds then, for sufficiently large times T_R, the time-average of any quantity A

    \overline{A} = \frac{1}{T_R} \int_0^{T_R} dt \; A(\{p_i(t), q_i(t)\})                    (324)

represents the measured value of a macroscopic quantity, and the trajectory passes arbitrarily close to every point in phase space. If the system's trajectory dwells in the volume ΔΓ of phase space for time Δt, then the ratio

    \frac{\Delta t}{T_R}                    (325)

has a definite limit which defines the probability that, if the system is observed at some instant of time, it will be found to have a microscopic state in ΔΓ. However, the Hypothesis of Equal a priori Probabilities assigns the probability density ρ for a system to be found in the volume ΔΓ of accessible phase space to a constant value given by the normalization condition

    \rho = \frac{1}{\Gamma_a}                    (326)

where Γ_a is the entire volume of accessible phase space. The requirement of the equality of the time-average and the ensemble average demands that the two probabilities must be equal

    \frac{\Delta t}{T_R} = \frac{\Delta \Gamma}{\Gamma_a}                    (327)

Hence, the Ergodic Hypothesis, when combined with the Hypothesis of Equal a priori Probabilities, requires that the trajectory must spend equal times in equal volumes of phase space.
============================================
Example: The One-Dimensional Harmonic Oscillator

We shall show, for the one-dimensional harmonic oscillator, that the time Δt spent in some volume ΔΓ = Δp Δq of its two-dimensional phase space is proportional to the volume. That is, the trajectory spends equal time in equal volumes.

The Hamiltonian is expressed as

    H(p, q) = \frac{p^2}{2 M} + \frac{M \omega_0^2}{2} q^2                    (328)

The equations of motion have the form

    \frac{dp}{dt} = - M \omega_0^2 \; q

    \frac{dq}{dt} = \frac{p}{M}                    (329)

The equations of motion for the one-dimensional Harmonic Oscillator can be integrated to yield

    p(t) = M \omega_0 \; A \cos( \omega_0 t + \phi )

    q(t) = A \sin( \omega_0 t + \phi )                    (330)

where the amplitude A and initial phase φ are constants of integration. The Hamiltonian is a constant of motion, and the accessible phase space is given by

    E + \Delta E > H(p, q) > E                    (331)

which leads to the constraint on the amplitude

    E + \Delta E > \frac{M \omega_0^2}{2} A^2 > E                    (332)
From the solution one finds that the orbits are closed and form ellipses in phase space, which pass arbitrarily close to every point in accessible phase space. The Poincaré recurrence time T_R is given by

    T_R = \frac{2 \pi}{\omega_0}                    (333)

Figure 20: A typical trajectory for the one-dimensional Classical Harmonic Oscillator is shown in blue. The initial phase φ is assumed to be unknown. The energy is known to within ΔE, so the accessible phase space Γ_a is the area enclosed between the two ellipses.

Consider an element of phase space ΔΓ = Δp Δq where Δq ≪ Δp. The trajectory will spend a time Δt in this volume element, where

    \Delta t = \frac{\Delta q}{| \dot{q} |} = \frac{\Delta q \; M}{| p |}                    (334)

Now, the extent of the volume of phase space at (p, q) is determined from the energy spread

    \Delta E = \frac{p \; \Delta p}{M} + M \omega_0^2 \; q \; \Delta q                    (335)

Since we have assumed that Δq ≪ Δp, the spread in energy is related to Δp via

    \Delta E = \frac{| p | \; \Delta p}{M}                    (336)

On substituting for M/|p| into the expression for Δt, we obtain

    \Delta t = \frac{\Delta q \; \Delta p}{\Delta E}                    (337)

However, as we have already shown, ΔE is related to the volume of accessible phase space Γ_a via

    \Gamma_a = 2 \pi \frac{\Delta E}{\omega_0}                    (338)

Therefore,

    \Delta t = \frac{\Delta q \; \Delta p}{\Gamma_a} \frac{2 \pi}{\omega_0} = \frac{\Delta \Gamma}{\Gamma_a} T_R                    (339)
Hence, we have shown that

    \frac{\Delta t}{T_R} = \frac{\Delta \Gamma}{\Gamma_a}                    (340)

which shows that the trajectory spends equal times in equal volumes of phase space.

Figure 21: The trajectory crosses an element of accessible phase space with a narrow width Δq in time Δt; the height Δp of the element is determined from the uncertainty in the energy ΔE.
This relation is independent of the assumed shape of the volume element, since if we considered a volume for which Δq ≫ Δp, then Δt is given by

    \Delta t = \frac{\Delta p}{| \dot{p} |} = \frac{\Delta p}{M \omega_0^2 \; | q |}                    (341)

However, in this case the extent of the volume of accessible phase space at the point (p, q) is determined from the energy spread

    \Delta E = M \omega_0^2 \; | q | \; \Delta q                    (342)

Therefore, one has

    \Delta t = \frac{\Delta q \; \Delta p}{\Delta E}                    (343)

which, on relating ΔE to the volume of accessible phase space Γ_a, leads to the same relation

    \frac{\Delta t}{T_R} = \frac{\Delta \Gamma}{\Gamma_a}                    (344)

This shows that the result does not depend on the specific shape of the element ΔΓ of phase space.
This example also illustrates how the average of a property of a system with unknown initial conditions (in this case the initial phase φ) can be thought of either as a time average or as an ensemble average.
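The equal-times result can also be tested without following the algebra: sample the trajectory uniformly in time and compare the fraction of time spent with q inside a narrow strip against the fraction of the accessible annulus E < H < E + ΔE lying in that strip. The sketch below is an added illustration with arbitrary parameter values.

```python
import math
import random

# For the 1-d harmonic oscillator, compare the fraction of time the
# trajectory spends with q in a narrow strip against the fraction of the
# accessible phase-space area lying in that strip (illustrative values).
M, w0, E, dE = 1.0, 1.0, 1.0, 0.01
A = math.sqrt(2 * (E + 0.5 * dE) / (M * w0 * w0))   # mid-band amplitude
strip = (0.2, 0.3)                                  # q-interval defining the cell

# time average over one period of q(t) = A sin(w0 t)
n, in_strip = 200000, 0
for i in range(n):
    q = A * math.sin(2 * math.pi * i / n)
    in_strip += strip[0] < q < strip[1]
time_frac = in_strip / n

# Monte-Carlo area of the annulus E < H < E + dE with q in the strip
random.seed(1)
hits, tot = 0, 0
pmax = math.sqrt(2 * M * (E + dE))
for _ in range(1000000):
    q = random.uniform(-A, A)
    p = random.uniform(-pmax, pmax)
    h = p * p / (2 * M) + 0.5 * M * w0 * w0 * q * q
    if E < h < E + dE:
        tot += 1
        hits += strip[0] < q < strip[1]
area_frac = hits / tot
print(time_frac, area_frac)
```

Within the Monte-Carlo noise the two fractions coincide, which is the content of eq. (340) for this choice of cell; moving or resizing the strip leaves the agreement intact.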
============================================
The hypothesis of equal a priori probabilities does provide a reasonable basis for calculating the equilibrium thermodynamic properties of a large number of physical systems. This anecdotal evidence provides justification for its use. However, one is led to suspect that ρ is not really uniform but instead is finely dispersed throughout the volume of accessible phase space. In our discussion of the Micro-Canonical Ensemble, and everything that follows from it, we shall be assuming that the Hypothesis of Equal a priori Probabilities is valid.

3.9 The Physical Significance of Entropy

A system can only make a transition from one macroscopic equilibrium state to another if the external conditions are changed. A change in external conditions, without supplying energy to the system, can be achieved by removing a constraint on the system. The removal of a constraint usually results in an increase in N_Γ, the number of microscopic states available to the system. It is convenient to introduce a measure of N_Γ which is extensive, or additive. Since N_Γ is multiplicative, ln N_Γ is additive and represents a measure of the number of microscopic states corresponding to the macroscopic equilibrium state. The removal of a constraint has the effect that ln N_Γ increases, as does the thermodynamic entropy. Therefore, this argument suggests that the entropy may be defined by

    S = k_B \ln N_{\Gamma}                    (345)

in which case the entropy is a measure of the dispersivity of the distribution of microscopic states. The factor of k_B (Boltzmann's constant) is required to give

Table 1: Percentage frequency of occurrence of the letters in English language texts.

    a  8.17    b  1.49    c  2.78    d  4.25    e  12.70
    f  2.23    g  2.01    h  6.09    i  6.97    j   0.15
    k  0.77    l  4.02    m  2.41    n  6.75    o   7.51
    p  1.93    q  0.09    r  5.99    s  6.33    t   9.06
    u  2.76    v  0.98    w  2.36    x  0.15    y   1.97
    z  0.07
the entropy the same dimensions as the thermodynamic entropy.

Information Theory

Shannon¹² has rigorously proved that the information content S of a probability distribution function of a random process with M possible outcomes is given by

    S = - \sum_{i=1}^{M} p_i \ln p_i                    (346)

where p_i is the probability of the i-th outcome.


Consider sending a message consisting of an ordered string of N values of the possible outcomes. The outcomes can be considered similar to the letters of an alphabet, in which case the message is a word containing N letters. Like languages, the letters don't occur with equal frequency; for example, in English language texts the letter e appears most frequently, and those who play the game Scrabble know that the letters q and z occur very infrequently.

The total possible number of messages of length N is just M^N. However, not all messages occur with equal probability, since if the outcome i occurs with a small probability p_i, messages in which the outcome i occurs a significant number of times have very small probabilities of appearing. In the analogy with words of N letters, some allowed words occur so infrequently that they are never listed in a dictionary.

¹² C.E. Shannon, A Mathematical Theory of Communication, Bell System Tech. J. 27, 379-423, 623-656 (1948).

A typical message of length N could be expected to contain the outcome i an average of N_i = N p_i times. Hence, one can determine the approximate number of times N_i each outcome i will occur in a typical message. Since the N_i are fixed, the set of typical messages merely differ in the order that these outcomes are listed. The number of these typical messages D_N can be found from the number of different ways of ordering the outcomes

    D_N = \frac{N!}{\prod_{i=1}^{M} N_i!}                    (347)

Hence, the dictionary of typical N-character messages (N-letter words) contains D_N entries. We could index each message in the dictionary by a number. Suppose we wish to transmit a message: instead of transmitting the string of characters of the message, we could transmit the index which specifies the place it has in the dictionary. If we were to transmit this index using a binary code, then allowing for all possible typical messages, one would have to transmit a string of binary digits of length given by

\log_2 D_N \approx \log_2 \frac{N!}{\prod_{i=1}^{M} N_i!} \approx - N \sum_{i=1}^{M} p_i \log_2 p_i    (348)

where the last line has been obtained by using Stirling's formula (valid for large N p_i). For a uniform probability distribution, this number would be just N \log_2 M. The difference in these numbers, divided by N, is the information content of the probability distribution function. Shannon's Theorem proves this rigorously.
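The counting argument can be checked numerically (a sketch, not from the notes): for a distribution in which the N p_i are integers, (1/N) log2 D_N approaches the Shannon information per symbol as N grows:

```python
import math

def bits_per_symbol(p, N):
    """(1/N) log2 of the number of typical messages D_N = N! / prod_i (N p_i)!, eq. (347)."""
    counts = [round(N * pi) for pi in p]
    assert sum(counts) == N, "N p_i must be integers that sum to N"
    # lgamma(n + 1) = ln n!, which avoids overflow for large N.
    log2_DN = math.lgamma(N + 1) / math.log(2)
    log2_DN -= sum(math.lgamma(n + 1) for n in counts) / math.log(2)
    return log2_DN / N

p = [0.5, 0.25, 0.25]
shannon = -sum(pi * math.log2(pi) for pi in p)   # = 1.5 bits per symbol
for N in (8, 80, 800):
    print(N, bits_per_symbol(p, N), shannon)
```

The finite-N values lie below the Shannon limit and converge toward it, consistent with eq. (348).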
The Entropy

We shall describe the entropy of a macroscopic state as a phase space average

S = - k_B \int d\Gamma \; \rho \, \ln ( \rho \, \Gamma_0 )    (349)

where the factor of \Gamma_0 has been introduced to make the argument of the logarithm dimensionless. It is convenient to express \Gamma_0 for a system of N indistinguishable particles moving in a three-dimensional space as

\Gamma_0 = N! \, ( 2 \pi \hbar )^{3N}    (350)

since the introduction of this factor and the use of the equal a priori hypothesis results in the expression

S = k_B \ln N_\Gamma    (351)

if one identifies the number of accessible microstates as

N_\Gamma = \frac{\Gamma_a}{\Gamma_0}    (352)

A different choice of \Gamma_0 will result in the entropy being defined up to an additive constant.
The Entropy and Equilibrium States

The assumption of equal a priori probabilities is only a simplification of the widely held belief that a system's physical trajectory follows an intricate path which changes rapidly and is finely spread across the volume of accessible phase space. The corresponding physical distribution function \rho will evolve with respect to time, according to Liouville's theorem. If \rho is expected to describe an equilibrium state, S should not evolve.

We shall show that the entropy defined by

S(t) = - k_B \int d\Gamma \; \rho(\{p_i, q_i\} : t) \, \ln \rho(\{p_i, q_i\} : t) \; - \; k_B \ln \Gamma_0    (353)

is time independent. The last term is an additive constant added to make the argument of the logarithm dimensionless and has no effect on our deliberations.
The time derivative of the entropy is given by

\frac{dS}{dt} = - k_B \int d\Gamma \left( \frac{\partial \rho}{\partial t} \ln \rho + \frac{\partial \rho}{\partial t} \right)
             = - k_B \int d\Gamma \; \frac{\partial \rho}{\partial t} \left( \ln \rho + 1 \right)    (354)
Using Liouville's theorem reduces this to

\frac{dS}{dt} = - k_B \int d\Gamma \; [ \rho , H ] \left( \ln \rho + 1 \right)
             = - k_B \int d\Gamma \sum_{i=1}^{3N} \left( \frac{\partial \rho}{\partial p_i} \frac{\partial H}{\partial q_i} - \frac{\partial \rho}{\partial q_i} \frac{\partial H}{\partial p_i} \right) \left( \ln \rho + 1 \right)    (355)
The terms linear in the derivatives of \rho can be transformed into factors of \rho, by integrating by parts. This yields

\frac{dS}{dt} = k_B \int d\Gamma \; \rho \sum_{i=1}^{3N} \left[ \frac{\partial}{\partial p_i} \left( \frac{\partial H}{\partial q_i} \left( \ln \rho + 1 \right) \right) - \frac{\partial}{\partial q_i} \left( \frac{\partial H}{\partial p_i} \left( \ln \rho + 1 \right) \right) \right]    (356)

since the boundary terms vanish. On expanding the derivatives of the terms in the round parentheses, one finds that some terms cancel

\frac{dS}{dt} = k_B \int d\Gamma \; \rho \sum_{i=1}^{3N} \left[ \frac{\partial^2 H}{\partial p_i \partial q_i} \left( \ln \rho + 1 \right) + \frac{\partial H}{\partial q_i} \frac{1}{\rho} \frac{\partial \rho}{\partial p_i} - \frac{\partial^2 H}{\partial q_i \partial p_i} \left( \ln \rho + 1 \right) - \frac{\partial H}{\partial p_i} \frac{1}{\rho} \frac{\partial \rho}{\partial q_i} \right]
             = k_B \int d\Gamma \sum_{i=1}^{3N} \left( \frac{\partial H}{\partial q_i} \frac{\partial \rho}{\partial p_i} - \frac{\partial H}{\partial p_i} \frac{\partial \rho}{\partial q_i} \right)    (357)

which on integrating by parts yields

\frac{dS}{dt} = k_B \int d\Gamma \; \rho \sum_{i=1}^{3N} \left( - \frac{\partial^2 H}{\partial p_i \partial q_i} + \frac{\partial^2 H}{\partial p_i \partial q_i} \right) = 0    (358)

Hence, the entropy of a state with a time-dependent probability density is constant.


From the above discussion, it is clear that the entropy of a system can only change if the Hamiltonian of the system is modified, such as by removing an external constraint. A removal of an external constraint will increase the volume of accessible phase space to \Gamma'_a. Furthermore, the assumption of equal a priori probabilities implies that in the final equilibrium state the probability density \rho' will be uniformly spread over the increased available phase space. From the normalization condition

1 = \int d\Gamma' \; \rho'    (359)

one finds that in the final state, the probability distribution function is given by

\rho' = \frac{1}{\Gamma'_a}    (360)

Since the entropy is given by

S = - k_B \int d\Gamma \; \rho \, \ln ( \rho \, \Gamma_0 )    (361)

where \Gamma_0 is constant and since \Gamma'_a > \Gamma_a, the entropy will have increased by an amount given by

\Delta S = k_B \ln \left( \frac{\Gamma'_a}{\Gamma_a} \right)    (362)

as expected from thermodynamics.
============================================

Example: Joule Free Expansion

We shall consider the Joule Free Expansion of an ideal gas. The gas is initially enclosed by a container of volume V, but a valve is opened so that the gas can expand into an adjacent chamber which initially contained a vacuum. The volume available to the gas in the final state is V'. Since the adjacent chamber is empty, no work is done in the expansion.

The Hamiltonian for an ideal gas can be represented by

H = \sum_{i=1}^{3N} \frac{p_i^2}{2m}    (363)

so the sum of the squares of the momenta is restricted by E. The volume of accessible phase space is given by
\Gamma_a = \prod_{i=1}^{3N} \left[ \int dp_i \int dq_i \right] \Theta\left( E - \sum_{i=1}^{3N} \frac{p_i^2}{2m} \right)    (364)

The integrations for the spatial coordinates separate from the integration over the momenta. The integration over the three spatial coordinates for each particle produces a factor of the volume. The integration over the momenta produces a result f(E) which depends on the energy but is independent of the volume. Hence, the expression for the available phase space has the form

\Gamma_a = V^N f(E)    (365)

On recognizing that the particles are indistinguishable, one finds that the entropy is given by

S = k_B \ln \frac{\Gamma_a}{\Gamma_0}
  = N k_B \ln V + k_B \ln f(E) - k_B \ln N! - 3 N k_B \ln( 2 \pi \hbar )    (366)

where \Gamma_0 is the measure of phase space that is used to define a single microscopic state. Thus, the change in entropy is given by

\Delta S = N k_B \ln \left( \frac{V'}{V} \right)    (367)

The same result may be obtained from thermodynamics. Starting from the expression for an infinitesimal change of the internal energy

dU = T \, dS - P \, dV + \mu \, dN    (368)

and recognizing that Joule Free expansion is a process for which

dU = 0 , \quad dN = 0    (369)

Therefore, one has

dS = \frac{P}{T} \, dV    (370)

and on using the equation of state for the ideal gas P V = N k_B T one finds

dS = N k_B \frac{dV}{V}    (371)

which integrates to yield

\Delta S = N k_B \ln \left( \frac{V'}{V} \right)    (372)

Hence, the expression for the change in entropy derived by using Statistical Mechanics is in agreement with the expression derived by using Thermodynamics.
============================================

4    The Micro-Canonical Ensemble

4.1    Classical Harmonic Oscillators

Consider a set of N classical harmonic oscillators described by the Hamiltonian

H = \sum_{i=1}^{dN} \left( \frac{p_i^2}{2m} + \frac{m \omega_0^2 q_i^2}{2} \right)    (373)

The above Hamiltonian may be adopted as a model of the vibrational motion of the atoms in a solid. However, it is being assumed that the vibrations of the atoms are independent and harmonic and that all the oscillators have the same frequency. Furthermore, our treatment will be based on classical mechanics. We shall consider the system in the Micro-Canonical Ensemble, where the energy is determined to within an uncertainty \Delta E

E > H > E - \Delta E    (374)

The volume of accessible phase space \Gamma_a is given by the integral

\Gamma_a = \prod_{i=1}^{dN} \left[ \int dq_i \int dp_i \right] \left[ \Theta(E - H) - \Theta(E - \Delta E - H) \right]    (375)

where \Theta(x) is the Heaviside step function. On transforming the coordinates to \tilde{p}_i = m \omega_0 q_i, the Hamiltonian can be written in the form

H = \frac{1}{2m} \sum_{i=1}^{dN} \left( p_i^2 + \tilde{p}_i^2 \right)    (376)

so the accessible phase space is defined by the inequalities

2 m E > \sum_{i=1}^{dN} \left( p_i^2 + \tilde{p}_i^2 \right) > 2 m ( E - \Delta E )    (377)

so

\Gamma_a = \prod_{i=1}^{dN} \left[ \frac{1}{m \omega_0} \int d\tilde{p}_i \int dp_i \right] \left[ \Theta(E - H) - \Theta(E - \Delta E - H) \right]    (378)

Thus, the volume of accessible phase space is proportional to the volume enclosed between two 2dN-dimensional hyperspheres of radii \sqrt{2 m E} and \sqrt{2 m ( E - \Delta E )}.
Therefore, we need to evaluate the volume enclosed by a hypersphere.
The Volume of a d-dimensional Hypersphere.


The volume of a d-dimensional hypersphere of radius R has the form

V_d(R) = C R^d    (379)

where C is a constant that depends on the dimensionality d. The constant can be determined by comparing two methods of evaluating the integral I_d, given by

I_d = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \ldots \int_{-\infty}^{\infty} dx_d \; \exp\left( - \sum_{i=1}^{d} x_i^2 \right)    (380)

In a d-dimensional Cartesian coordinate system, the integral can be evaluated as a product of d identical integrals

I_d = \prod_{i=1}^{d} \left[ \int_{-\infty}^{\infty} dx_i \exp\left( - x_i^2 \right) \right] = \pi^{\frac{d}{2}}    (381)

Alternatively, one may evaluate the integral in hyperspherical polar coordinates as

I_d = S_d \int_0^{\infty} dr \; r^{d-1} \exp\left( - r^2 \right)    (382)

where the radial coordinate is defined by

r^2 = \sum_{i=1}^{d} x_i^2    (383)

and S_d is the surface area of a d-dimensional unit sphere. This integral can be re-written in terms of the variable t = r^2 as

I_d = \frac{S_d}{2} \int_0^{\infty} dt \; t^{\frac{d}{2}-1} \exp\left( - t \right)    (384)

The integration is evaluated as

I_d = \frac{S_d}{2} \, \Gamma( \tfrac{d}{2} )    (385)

where \Gamma(n + 1) = n! is the factorial function. On equating the above two expressions, one obtains the equality

\frac{S_d}{2} \, \Gamma( \tfrac{d}{2} ) = \pi^{\frac{d}{2}}    (386)

Hence, we find that the surface area of a unit d-dimensional sphere, S_d, is given by

S_d = \frac{2 \pi^{\frac{d}{2}}}{\Gamma( \frac{d}{2} )}    (387)

Using this, one finds that the volume of a d-dimensional sphere of radius R is given by

V_d(R) = S_d \int_0^R dr \; r^{d-1}
       = \frac{1}{d} S_d R^d
       = \frac{2 \pi^{\frac{d}{2}}}{d \, \Gamma( \frac{d}{2} )} R^d
       = \frac{\pi^{\frac{d}{2}}}{\Gamma( \frac{d}{2} + 1 )} R^d    (388)

which is our final result.
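The closed form in eq. (388) can be checked numerically; the sketch below (not part of the original notes) compares it against a Monte Carlo estimate of the fraction of the bounding hypercube occupied by the unit ball:

```python
import math
import random

def sphere_volume(d, R=1.0):
    """V_d(R) = pi^(d/2) R^d / Gamma(d/2 + 1), eq. (388)."""
    return math.pi ** (d / 2) * R ** d / math.gamma(d / 2 + 1)

def mc_sphere_volume(d, n=100_000, seed=1):
    """Monte Carlo: fraction of random points of the cube [-1,1]^d inside the unit ball,
    scaled by the cube's volume 2^d."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n)
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0
    )
    return (2.0 ** d) * hits / n

for d in (2, 3, 4):
    print(d, sphere_volume(d), mc_sphere_volume(d))
```

For d = 2 and d = 3 the formula reproduces the familiar values pi R^2 and (4/3) pi R^3.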


The Volume of Accessible Phase Space

The volume of accessible phase space \Gamma_a is proportional to the volume enclosed by two 2dN-dimensional hyperspheres of radii \sqrt{2 m E} and \sqrt{2 m ( E - \Delta E )}. Using the above results, one finds

\Gamma_a = \frac{\pi^{dN}}{\Gamma(dN + 1)} \left( \frac{1}{m \omega_0} \right)^{dN} \left[ ( 2 m E )^{dN} - ( 2 m ( E - \Delta E ) )^{dN} \right]    (389)

where the factor of

\left( \frac{1}{m \omega_0} \right)^{dN}    (390)

is the Jacobian for the coordinate transformations. The number of accessible microstates N_\Gamma is then defined as

N_\Gamma = \frac{\Gamma_a}{( 2 \pi \hbar )^{dN}} = \frac{1}{\Gamma(dN + 1)} \left( \frac{E}{\hbar \omega_0} \right)^{dN} \left[ 1 - \left( 1 - \frac{\Delta E}{E} \right)^{dN} \right]    (391)

The second factor in the square brackets is extremely close to unity, since the term in the parenthesis is less than unity and the exponent is extremely large, so the subtracted term can be neglected

N_\Gamma \approx \frac{1}{\Gamma(dN + 1)} \left( \frac{E}{\hbar \omega_0} \right)^{dN}    (392)

This implies that, for sufficiently high dimensions, the volume of the hypersphere is the same as the volume of the hypershell.

Derivation of Stirling's Approximation

The Gamma function is defined by the integral

\Gamma(n + 1) = \int_0^{\infty} dx \; x^n \exp\left( - x \right)    (393)

which, for integer n, coincides with n!. This can be verified by repeated integration by parts

\Gamma(n + 1) = \int_0^{\infty} dx \; x^n \exp\left( - x \right)
             = - \int_0^{\infty} dx \; x^n \frac{\partial}{\partial x} \exp\left( - x \right)
             = - x^n \exp\left( - x \right) \Big|_0^{\infty} + n \int_0^{\infty} dx \; x^{n-1} \exp\left( - x \right)
             = n \, \Gamma(n)    (394)

which together with

\Gamma(1) = \int_0^{\infty} dx \; \exp\left( - x \right) = 1    (395)

leads to the evaluation of the integral as

\Gamma(n + 1) = \int_0^{\infty} dx \; x^n \exp\left( - x \right) = n!    (396)

for integer n.

Stirling's approximation to \ln n! can be obtained by evaluating the integral using the method of steepest descents.

n! = \int_0^{\infty} dx \; \exp\left( - x + n \ln x \right)    (397)

The extremal value of x = x_c is found from equating the derivative of the exponent to zero

- 1 + \frac{n}{x_c} = 0    (398)

This yields x_c = n. On expanding the integrand to second order in ( x - x_c ), one has

n! \approx \int_0^{\infty} dx \; \exp\left( - x_c + n \ln x_c \right) \exp\left( - \frac{n}{2 x_c^2} ( x - x_c )^2 \right)    (399)

On extending the lower limit of the integration to - \infty, one obtains the approximation

n! \approx \int_{-\infty}^{\infty} dx \; \exp\left( - x_c + n \ln x_c \right) \exp\left( - \frac{n}{2 x_c^2} ( x - x_c )^2 \right)
   = \sqrt{ \frac{2 \pi x_c^2}{n} } \exp\left( - x_c + n \ln x_c \right)    (400)

This is expected to be valid for sufficiently large n. On setting x_c = n, one has

n! \approx \sqrt{ 2 \pi n } \; \exp\left( - n + n \ln n \right)    (401)

Stirling's approximation is obtained by taking the logarithm, which yields

\ln n! = n \ln n - n + \frac{1}{2} \ln( 2 \pi n )    (402)

Stirling's approximation will be used frequently throughout this course.
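As a quick numerical illustration (not in the original notes), the steepest-descents result of eq. (402) can be compared against the exact value of ln n!:

```python
import math

def stirling_ln_factorial(n):
    """ln n! ~ n ln n - n + (1/2) ln(2 pi n), eq. (402)."""
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)

for n in (5, 50, 500):
    exact = math.lgamma(n + 1)          # lgamma(n+1) = ln n! exactly
    approx = stirling_ln_factorial(n)
    print(n, exact, approx, exact - approx)
```

The absolute error falls off like 1/(12 n), so the approximation becomes excellent already for modest n.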

The Entropy

The entropy S is given by

S = k_B \ln N_\Gamma \approx d N k_B \ln \left( \frac{E}{\hbar \omega_0} \right) - k_B \ln (dN)!    (403)

The logarithm of N! can be approximated for large N by Stirling's approximation. This can be quickly re-derived by noting that

\ln N! = \ln N + \ln(N-1) + \ln(N-2) + \ldots + \ln 2 + \ln 1    (404)

For large N, the sum on the right hand side can be approximated by an integral

\ln N! \approx \int_0^N dx \; \ln x = x \, ( \ln x - 1 ) \Big|_0^N = N ( \ln N - 1 )    (405)

which results in Stirling's approximation

\ln N! = N ( \ln N - 1 )    (406)

Using Stirling's approximation in the expression for the entropy S, one obtains

S(E, N) = d N k_B \ln \left( \frac{E}{\hbar \omega_0} \right) - k_B d N ( \ln(dN) - 1 )
        = d N k_B \ln \left( \frac{E}{d N \hbar \omega_0} \right) + k_B d N    (407)

which shows that the entropy is an extensive monotonically increasing function of E. This is the fundamental relation. In the Micro-Canonical Ensemble, the energy E is the thermodynamic energy U.

The temperature is defined by the derivative

\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_N    (408)

which yields

\frac{1}{T} = \frac{d N k_B}{U}    (409)

Hence, we find that the internal energy U is given by

U = d N k_B T    (410)

which shows that each degree of freedom carries the thermodynamic energy k_B T. The specific heat at constant volume is then found as

C_V = d N k_B    (411)

which is Dulong and Petit's law13. Dulong and Petit's law describes the high-temperature specific heat of solids quite well, but fails at low temperatures where the quantum mechanical nature of the solid manifests itself.

4.2    An Ideal Gas of Indistinguishable Particles

The Hamiltonian for an ideal gas is written as the sum of the kinetic energies

H = \sum_{i=1}^{dN} \frac{p_i^2}{2m}    (412)

The gas is contained in a volume V with linear dimensions L, such that

V = L^d    (413)

where d is the number of dimensions of space. In the Micro-Canonical Ensemble, the energy is constrained to an interval of width \Delta E according to the inequality

E > H > E - \Delta E    (414)

The volume of accessible phase space \Gamma_a is given by the multiple integral

\Gamma_a = \prod_{i=1}^{dN} \left[ \int dq_i \int dp_i \right] \left[ \Theta(E - H) - \Theta(E - \Delta E - H) \right]    (415)

13 A.-T. Petit and P.-L. Dulong, "Recherches sur quelques points importants de la Théorie de la Chaleur," Annales de Chimie et de Physique 10, 395-413 (1819).

The integration over the coordinates can be performed, leading to the expression

\Gamma_a = L^{dN} \prod_{i=1}^{dN} \left[ \int dp_i \right] \left[ \Theta(E - H) - \Theta(E - \Delta E - H) \right]
         = V^N \prod_{i=1}^{dN} \left[ \int dp_i \right] \left[ \Theta(E - H) - \Theta(E - \Delta E - H) \right]    (416)

The step functions constrain the momenta such that

2 m E > \sum_{i=1}^{dN} p_i^2 > 2 m ( E - \Delta E )    (417)

Thus, the integration over the momenta is equal to the volume contained between two dN-dimensional hyperspheres of radii \sqrt{2 m E} and \sqrt{2 m ( E - \Delta E )}. Using the expression for the volume of a dN-dimensional hypersphere

V_{dN}(R) = \frac{\pi^{\frac{dN}{2}}}{\Gamma( \frac{dN}{2} + 1 )} R^{dN}    (418)

one finds

\Gamma_a = V^N \frac{\pi^{\frac{dN}{2}}}{\Gamma( \frac{dN}{2} + 1 )} ( 2 m E )^{\frac{dN}{2}} \left[ 1 - \left( 1 - \frac{\Delta E}{E} \right)^{\frac{dN}{2}} \right]    (419)

Since

1 - \left( 1 - \frac{\Delta E}{E} \right)^{\frac{dN}{2}} \approx 1 - \exp\left( - \frac{d N \Delta E}{2 E} \right) \approx 1    (420)

the volume of accessible phase space is given by

\Gamma_a = V^N \frac{\pi^{\frac{dN}{2}}}{\Gamma( \frac{dN}{2} + 1 )} ( 2 m E )^{\frac{dN}{2}}    (421)

However, for an ideal gas of identical particles, we have to take into account that specifying all the momenta p_i and coordinates q_i of the N particles provides us with too much information. Since the N particles are identical, we cannot distinguish between two points of phase space that differ only by the interchange of identical particles. There are N! points corresponding to the different labelings of the particles. These N! points represent the same microstates of the system. To only account for the different microstates, one must divide the volume of accessible phase space by N!. Hence, the number of microscopic states N_\Gamma is given by

N_\Gamma = \frac{\Gamma_a}{N! \, ( 2 \pi \hbar )^{dN}} = \frac{V^N}{N! \, \Gamma( \frac{dN}{2} + 1 )} \left( \frac{m E}{2 \pi \hbar^2} \right)^{\frac{dN}{2}}    (422)

The entropy S is given by the expression

S = k_B \ln N_\Gamma
  = k_B \ln \left[ \frac{V^N}{N! \, \Gamma( \frac{dN}{2} + 1 )} \left( \frac{m E}{2 \pi \hbar^2} \right)^{\frac{dN}{2}} \right]
  = k_B \ln \left[ \frac{V^N}{N! \, ( \frac{dN}{2} )!} \left( \frac{m E}{2 \pi \hbar^2} \right)^{\frac{dN}{2}} \right]    (423)

On using Stirling's formula

\ln N! = N ( \ln N - 1 )    (424)

valid for large N, one finds

S = N k_B \ln \left( \frac{V}{N} \right) + \frac{dN}{2} k_B \ln \left( \frac{m E}{d \pi \hbar^2 N} \right) + N k_B + \frac{d}{2} N k_B    (425)

or

S = N k_B \ln \left[ \frac{V}{N} \left( \frac{m E}{d \pi \hbar^2 N} \right)^{\frac{d}{2}} \right] + \left( \frac{d + 2}{2} \right) N k_B    (426)

On identifying E with U, the thermodynamic energy, one has the fundamental relation S(U, V, N) from which all thermodynamic quantities can be obtained. The intensive quantities can be obtained by taking the appropriate derivatives. For example, the temperature is found from

\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_{V,N}    (427)

which yields

\frac{1}{T} = \frac{d N k_B}{2 U}    (428)

Hence, we have recovered the equation of state for an ideal gas

U = \frac{dN}{2} k_B T    (429)

Likewise, the pressure is given by

\frac{P}{T} = \left( \frac{\partial S}{\partial V} \right)_{U,N}    (430)

which yields

\frac{P}{T} = \frac{N k_B}{V}    (431)

which is the ideal gas law. The chemical potential is found from

- \frac{\mu}{T} = \left( \frac{\partial S}{\partial N} \right)_{U,V}    (432)
which yields

- \frac{\mu}{T} = k_B \ln \left[ \frac{V}{N} \left( \frac{m U}{d \pi \hbar^2 N} \right)^{\frac{d}{2}} \right] - \left( \frac{d + 2}{2} \right) k_B + \left( \frac{d + 2}{2} \right) k_B    (433)

Since

k_B T = \frac{2 U}{d N}    (434)

one has

- \frac{\mu}{T} = k_B \ln \left[ \frac{V}{N} \left( \frac{m k_B T}{2 \pi \hbar^2} \right)^{\frac{d}{2}} \right]    (435)

This can be re-written as

\frac{\mu}{T} = k_B \ln P + f(T)    (436)

where P is the pressure and f(T) is a function of only the temperature T.


On substituting the equation of state

U = \frac{dN}{2} k_B T    (437)

into the expression for the entropy, one finds

S = N k_B \ln \left[ \frac{V}{N} \left( \frac{m k_B T}{2 \pi \hbar^2} \right)^{\frac{d}{2}} \right] + \left( \frac{d + 2}{2} \right) N k_B    (438)

This is the Sackur-Tetrode formula for the entropy of an ideal gas.

\lambda = \frac{h}{( 2 \pi m k_B T )^{\frac{1}{2}}}    (439)

The Sackur-Tetrode equation was derived independently by Hugo Martin Tetrode and Otto Sackur, using Maxwell-Boltzmann statistics in 1912.

Note that the factor

( 2 \pi m k_B T )^{\frac{1}{2}}    (440)

has the character of an average thermal momentum of a molecule. We can define \lambda via

\lambda = \frac{h}{( 2 \pi m k_B T )^{\frac{1}{2}}}    (441)

as a thermal de Broglie wave length associated with the molecule. The entropy can be re-written as

S = N k_B \ln \left( \frac{V}{N \lambda^d} \right) + \left( \frac{d + 2}{2} \right) N k_B    (442)

which shows that the entropy S is essentially determined by the ratio of the volume per particle to the volume \lambda^d associated with the thermal de Broglie wavelength. The classical description is approximately valid in the limit

\frac{V}{N \lambda^d} \gg 1    (443)

where the uncertainties in particle positions are negligible compared with the average separation of the particles. When the above inequality does not apply, quantum effects become important.
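The criterion (443) can be illustrated with rough numbers (a sketch, not from the notes; the constants below are rounded values and helium at room conditions is just an example case):

```python
import math

# Approximate physical constants (SI units), rounded for illustration only.
H = 6.626e-34       # Planck constant, J s
KB = 1.381e-23      # Boltzmann constant, J/K
AMU = 1.661e-27     # atomic mass unit, kg

def thermal_wavelength(mass, T):
    """Thermal de Broglie wavelength lambda = h / sqrt(2 pi m k_B T), eq. (441)."""
    return H / math.sqrt(2.0 * math.pi * mass * KB * T)

# Illustrative case: helium gas at room temperature and atmospheric pressure.
m = 4.0 * AMU
T = 300.0           # K
P = 1.013e5         # Pa
n = P / (KB * T)    # number density from the ideal gas law P V = N k_B T
lam = thermal_wavelength(m, T)
degeneracy = n * lam ** 3   # = N lambda^d / V; classical regime requires << 1
print(lam, degeneracy)
```

The wavelength comes out well below the interatomic spacing, so the gas is deep in the classical regime where the Sackur-Tetrode formula applies.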
The Momentum Distribution of an atom in an Ideal Gas

The probability that a particle has momentum of magnitude |p| can be obtained using the Micro-Canonical Ensemble. The probability is found from the probability distribution by integrating over the coordinates of all the particles and integrating over all the momenta of the other particles. Thus, we find the momentum probability distribution function P(|p|) via

P(|p|) = \frac{1}{\Gamma_a} \prod_{i=1}^{dN} \left[ \int_0^L dq_i \right] \prod_{i=d+1}^{dN} \left[ \int dp_i \right] \left[ \Theta\left( 2 m E - \sum_{i=1}^{dN} p_i^2 \right) - \Theta\left( 2 m ( E - \Delta E ) - \sum_{i=1}^{dN} p_i^2 \right) \right]    (444)

which is evaluated as

P(|p|) \approx \frac{\pi^{\frac{d(N-1)}{2}}}{\Gamma( \frac{d(N-1)}{2} + 1 )} \left( 2 m E - | p |^2 \right)^{\frac{d(N-1)}{2}} \frac{\Gamma( \frac{dN}{2} + 1 )}{\pi^{\frac{dN}{2}} ( 2 m E )^{\frac{dN}{2}}}
       = \frac{\Gamma( \frac{dN}{2} + 1 )}{\pi^{\frac{d}{2}} \, \Gamma( \frac{d(N-1)}{2} + 1 )} \left( \frac{1}{2 m E} \right)^{\frac{d}{2}} \left( 1 - \frac{| p |^2}{2 m E} \right)^{\frac{d(N-1)}{2}}
       \approx \left( \frac{d N}{4 \pi m E} \right)^{\frac{d}{2}} \exp\left( - \frac{d(N-1)}{2} \frac{| p |^2}{2 m E} \right)    (445)

which is the desired result. On using the thermodynamic relation for the energy

U = \frac{dN}{2} k_B T    (446)

one obtains the Maxwell distribution

P(|p|) = \left( \frac{1}{2 \pi m k_B T} \right)^{\frac{d}{2}} \exp\left( - \frac{| p |^2}{2 m k_B T} \right)    (447)

which is properly normalized.
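As a consistency check (a sketch, not from the notes), eq. (447) for d = 3 can be integrated numerically over all momentum magnitudes to confirm the normalization; here s2 denotes the combination 2 m k_B T:

```python
import math

def maxwell_pdf(p, s2):
    """P(|p|) = (1 / (pi s2))^(3/2) exp(-p^2 / s2), eq. (447) for d = 3, s2 = 2 m kB T."""
    return (1.0 / (math.pi * s2)) ** 1.5 * math.exp(-p * p / s2)

def norm_integral(s2, pmax=10.0, n=100_000):
    """Integrate P(|p|) 4 pi p^2 dp over magnitudes by the trapezoidal rule."""
    h = pmax / n
    total = 0.0
    for k in range(n + 1):
        p = k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * 4.0 * math.pi * p * p * maxwell_pdf(p, s2)
    return total * h

print(norm_integral(1.0))   # should be close to 1
```

The factor 4 pi p^2 is the surface area of the momentum-space shell of radius p, converting the vector density into a density over magnitudes.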

4.3    Spin One-half Particles

A system of spin one-half particles is described by a discrete, not continuous, phase space. The discreteness is due to the discrete nature of the quantum mechanical eigenvalues.

Figure 22: A set of N spins, in the presence of a uniform applied magnetic field H^z directed along the z-axis. The spins are quantized along the z-direction, so their S^z components are given by \pm \hbar/2.

Consider a set of N spin one-half particles, in an applied magnetic field. The particles may be either aligned parallel or anti-parallel to the applied magnetic field H^z. The Hamiltonian describing the spins is given by


H = g B

N
X

Siz H z

(448)

i=1

where S z = 12 . If one defines the total magnetic moment as


M z = g B

N
X

Siz

(449)

i=1

one finds that the energy is determined by the magnetization via


H = M z Hz

(450)

Therefore, if the energy has a fixed value E, the accessible microstates are
determined by the fixed value of the magnetization M z . We shall introduce the
dimensionless magnetization as
m = M z /B

(451)

Hence, for a fixed energy there are (N + m)/2 spin-up particles and (N m)/2
spin-down particles. The number of ways of selecting (N + m)/2 particles out
of N particles as being spin up is given by
N!


N m
2

(452)


!

since there are N ways of selecting the first particle as being spin up, (N 1)
ways of selecting the second particle as being spin up, etc. This process continues
until the (N + m)/2 spin-up particle is chosen and this can be selected in (N +
1 (N + m)/2) ways. Since the number of choices is multiplicative, the product
of the number of choices give the result above. However, not all of these choices
lead to independent microstates. Interchanges of the (N +m)/2 spin up particles
between themselves lead to identical microstates. There are ((N + m)/2)! such
interchanges. The total number of discrete microstates with magnetization m
is found by diving the above result by ((N + m)/2)!. The end result is N given
by
N!
 

N = 
(453)
N +m
N m
!
!
2
2
The entropy S is found from
S = kB ln N

94

(454)

which is evaluated as

S = k_B \ln \frac{N!}{\left( \frac{N + m}{2} \right)! \left( \frac{N - m}{2} \right)!}
  = k_B \ln N! - k_B \ln \left( \frac{N + m}{2} \right)! - k_B \ln \left( \frac{N - m}{2} \right)!
  \approx k_B N \ln N - k_B \left( \frac{N + m}{2} \right) \ln \left( \frac{N + m}{2} \right) - k_B \left( \frac{N - m}{2} \right) \ln \left( \frac{N - m}{2} \right)    (455)

where we have used Stirling's approximation in the last line. Hence, the entropy has been expressed in terms of N and m, or equivalently in terms of E and N. This is the fundamental relation, from which we may derive all thermodynamic quantities.
Figure 23: The entropy S(E) as a function of the energy E for a model of a system of spins in a magnetic field is shown in (a). Since the energy is bounded from above, the entropy is not a monotonically increasing function of E. This has the consequence that T can become negative when there is population inversion, as is indicated in (b).
On identifying the fixed energy with the thermodynamic energy, one may use the definition of temperature

\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_N    (456)

or

\frac{1}{T} = \left( \frac{\partial S}{\partial m} \right)_N \left( \frac{\partial m}{\partial U} \right) = - \frac{1}{\mu_B H^z} \left( \frac{\partial S}{\partial m} \right)_N    (457)

Therefore, one has

\frac{\mu_B H^z}{k_B T} = \frac{1}{2} \ln \left( \frac{N + m}{2} \right) - \frac{1}{2} \ln \left( \frac{N - m}{2} \right) = \frac{1}{2} \ln \left( \frac{N + m}{N - m} \right)    (458)

This can be exponentiated to yield

\exp\left( \frac{2 \mu_B H^z}{k_B T} \right) = \frac{N + m}{N - m}    (459)

which can be solved for m as

m = N \left( \frac{\exp[ \frac{2 \mu_B H^z}{k_B T} ] - 1}{\exp[ \frac{2 \mu_B H^z}{k_B T} ] + 1} \right) = N \tanh\left( \frac{\mu_B H^z}{k_B T} \right)    (460)

Hence, the magnetization is an odd function of H^z and saturates for large fields and low temperatures at \pm N. Finally, we obtain the expression for the thermal average of the internal energy

U = - N \mu_B H^z \tanh\left( \frac{\mu_B H^z}{k_B T} \right)    (461)

which vanishes as the square of the field H^z in the limit of zero applied field, since the Hamiltonian is linear in H^z and since the magnetization is expected to vanish linearly as H^z vanishes.
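The saturation and the linear low-field behavior of eq. (460) are easy to verify numerically (an illustrative sketch in the dimensionless variable x = mu_B H^z / k_B T):

```python
import math

def magnetization_per_spin(x):
    """m/N = tanh(mu_B H^z / k_B T), eq. (460), with x the dimensionless field."""
    return math.tanh(x)

# Weak field: linear response m/N ~ x, so U ~ -N mu_B H^z x vanishes quadratically.
print(magnetization_per_spin(0.01))   # close to 0.01
# Strong field / low temperature: saturation at m/N -> 1.
print(magnetization_per_spin(10.0))   # close to 1
# Odd function of the field:
print(magnetization_per_spin(-10.0))  # close to -1
```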
Zero Applied Field

We shall now determine the magnetization probability distribution function in the limit of zero applied magnetic field. The spins of the particles may either be aligned parallel or anti-parallel to the axis of quantization. There are a total of 2^N possible microstates. Hence, for zero applied field

N_\Gamma = 2^N    (462)

Since all microstates are assumed to occur with equal probabilities, the probability of finding a system with magnetization m is given by

P(m) = \frac{1}{N_\Gamma} \frac{N!}{\left( \frac{N + m}{2} \right)! \left( \frac{N - m}{2} \right)!} = \left( \frac{1}{2} \right)^N \frac{N!}{\left( \frac{N + m}{2} \right)! \left( \frac{N - m}{2} \right)!}    (463)

which is normalized to unity. On using the more accurate form of Stirling's approximation that we found using the method of steepest descents

\ln N! \approx \frac{1}{2} \ln( 2 \pi N ) + N \ln N - N = \frac{1}{2} \ln( 2 \pi ) + \left( N + \frac{1}{2} \right) \ln N - N    (464)

in \ln P(m), one obtains

\ln P(m) \approx - N \ln 2 - \frac{1}{2} \ln( 2 \pi ) + \left( N + \frac{1}{2} \right) \ln N
         - \frac{N + 1}{2} \left( 1 + \frac{m}{N + 1} \right) \ln \left[ \frac{N}{2} \left( 1 + \frac{m}{N} \right) \right]
         - \frac{N + 1}{2} \left( 1 - \frac{m}{N + 1} \right) \ln \left[ \frac{N}{2} \left( 1 - \frac{m}{N} \right) \right]    (465)

On expanding in powers of m, the expression simplifies to

\ln P(m) \approx - \frac{1}{2} \ln( 2 \pi ) + \ln 2 - \frac{1}{2} \ln N - \frac{m^2}{2 N} + \ldots    (466)

Hence, one finds that the magnetization probability distribution function P(m) is approximated by a Gaussian distribution

P(m) \approx \sqrt{ \frac{2}{\pi N} } \exp\left( - \frac{m^2}{2 N} \right)    (467)
Therefore, the most probable value of the magnetization is m = 0 and the width of the distribution is given by \sqrt{N}. This is small compared with the total range of the possible magnetization, which is 2 N. Most of the microstates correspond to zero magnetization. This can be seen since the total number of available microstates is given by

2^N    (468)

and the number of states with zero magnetization is given by

\frac{N!}{(N/2)! \, (N/2)!} \approx \sqrt{ \frac{2}{\pi N} } \; 2^N    (469)

Thus, for H^z = 0, the relative size of the fluctuations in the magnetization is small.
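The quality of the Gaussian approximation (467), which is what Figure 24 illustrates, can be checked directly against the exact binomial distribution (463) (a sketch, not part of the notes):

```python
import math

def p_exact(N, m):
    """Exact P(m) = 2^-N N! / (((N+m)/2)! ((N-m)/2)!), eq. (463); N + m must be even."""
    up = (N + m) // 2
    return math.comb(N, up) / 2 ** N

def p_gauss(N, m):
    """Gaussian approximation P(m) ~ sqrt(2/(pi N)) exp(-m^2/(2N)), eq. (467)."""
    return math.sqrt(2.0 / (math.pi * N)) * math.exp(-m * m / (2.0 * N))

N = 30
for m in (0, 2, 6, 10):
    print(m, p_exact(N, m), p_gauss(N, m))
```

Already for N = 30 the two agree to a few parts in a thousand near the peak, and the exact distribution sums to unity over the allowed (even-spaced) values of m.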

Figure 24: The exact probability distribution P(m) of the magnetic moment m for a system of N spins, and the approximate Gaussian distribution. After scaling m with the size of the system N, the width of the distribution decreases on increasing N.

4.4    The Einstein Model of a Crystalline Solid

The Einstein Model of a crystalline solid considers the normal modes of vibrations of the lattice to be quantized, and it assumes that the frequencies of all the normal modes are identical and equal to \omega_0. It is a reasonable approximation for the optical phonon modes in a solid. For a solid with N unit cells,
where there are p atoms per unit cell, one expects there to be N' = 3(p - 1)N optic modes. The remaining 3N modes are expected to be acoustic modes.
Consider a set of N' quantum mechanical harmonic oscillators in the Micro-Canonical Ensemble. Each oscillator has the same frequency \omega_0. The total energy E is given by the sum of the energies of each individual quantum oscillator

E = \sum_{i=1}^{N'} \hbar \omega_0 \left( n_i + \frac{1}{2} \right)    (470)

where n_i is the number of quanta in the i-th oscillator. The possible values of n_i are the set 0, 1, 2, \ldots, \infty. The last term in the round parenthesis is the zero-point energy of the i-th oscillator.

If we subtract the zero-point energy for each quantum oscillator, then the energy E_{exc} available to distribute amongst the N' quantum mechanical harmonic oscillators is given by

E_{exc} = E - N' \frac{\hbar \omega_0}{2}    (471)

The excitation energy E_{exc} is to be distributed amongst the N' quantum oscillators

E_{exc} = \sum_{i=1}^{N'} \hbar \omega_0 n_i    (472)

The total number of quanta Q available to the entire system is given by

\frac{E_{exc}}{\hbar \omega_0} = \sum_{i=1}^{N'} n_i = Q    (473)

which acts as a restriction on the possible sets of values of n_i. Each possible distribution of the Q quanta is described by a set of integer values, {n_i}, which uniquely describes a microstate of the system. In any allowed microstate the values of {n_i} are restricted so that the number of quanta add up to Q.

There are Q quanta which must be distributed between the N' oscillators. We shall count all the possible ways of distributing the Q quanta among the N' oscillators. Let us consider each oscillator as a box and each quantum as a marble. Eventually, the marbles are to be considered as being indistinguishable, so interchanging any number of marbles will lead to the same configuration. We shall temporarily suspend this assumption and instead assume that the marbles could be tagged. Later, we shall restore the assumption of indistinguishability.

Figure 25: One microscopic state of a system in which Q quanta have been distributed amongst N' oscillators (Q = 7, N' = 6; here n_1 = 0, n_2 = 1, n_3 = 0, n_4 = 3, n_5 = 1, n_6 = 2).
The Number of Distinguishable Ways

The number of ways of putting Q marbles in N' boxes can be found by arranging the boxes in a row. In this case, a box shares a common wall with its neighboring boxes, so there are N' + 1 walls for the N' boxes. If one considers both the walls and marbles as being distinguishable objects, then in any distribution of the marbles in the boxes there are Q + N' + 1 objects in a row. If there are n_i marbles between two consecutive walls, then that box contains n_i marbles. If there are two consecutive walls in a distribution, then that box is empty. However, the first object and the last object are always walls, so really there are only Q + N' - 1 objects that can be re-arranged. Therefore, the total number of orderings can be found from the number of ways of arranging Q + N' - 1 objects in a row. This can be done in

( Q + N' - 1 )!    (474)

number of ways. This happens since there are Q + N' - 1 ways of selecting the first object. After the first object has been chosen, there are Q + N' - 2 objects that remain to be selected, so there are only Q + N' - 2 ways of selecting the second object. Likewise, there are Q + N' - 3 ways of choosing the third object, and this continues until only the last object is unselected, in which case there is only one possible way of choosing the last object. The number of possible arrangements is given by the product of the number of ways of making each independent choice. Thus, we have found that there are ( Q + N' - 1 )! possible ways of sequencing or ordering ( Q + N' - 1 ) distinguishable objects.
The Number of Indistinguishable Ways

We do need to consider the walls as being indistinguishable, and also the marbles should be considered as indistinguishable. If we permute the indistinguishable walls amongst themselves, the ordering that results is identical to the initial ordering. There are (N' - 1)! ways of permuting the N' - 1 movable walls amongst themselves. Hence, we should divide by (N' - 1)! to only count the number of orderings made by placing the marbles between indistinguishable walls. Likewise, if one permutes the Q indistinguishable marbles, it leads to an identical ordering, and there are Q! such permutations. So we have over-counted the number of orderings by Q!, and hence we also need to divide our result by a factor of Q!. Therefore, the total number of inequivalent ways N_\Gamma of distributing Q indistinguishable marbles in N' boxes is given by

N_\Gamma = \frac{( N' + Q - 1 )!}{( N' - 1 )! \; Q!}    (475)

This is equal to the total number of microstates N_\Gamma consistent with having a total number of quanta Q distributed amongst N' oscillators.
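Eq. (475) is the familiar "stars and bars" count, and it can be verified by brute-force enumeration for small systems (a sketch, not in the notes):

```python
from itertools import product
from math import comb

def count_microstates(n_osc, Q):
    """Brute-force count of the occupation sets {n_i} with n_1 + ... + n_{n_osc} = Q."""
    return sum(
        1 for ns in product(range(Q + 1), repeat=n_osc) if sum(ns) == Q
    )

def stars_and_bars(n_osc, Q):
    """N_Gamma = (N' + Q - 1)! / ((N' - 1)! Q!), eq. (475), as a binomial coefficient."""
    return comb(n_osc + Q - 1, Q)

for n_osc, Q in ((3, 4), (4, 5), (6, 7)):
    print(n_osc, Q, count_microstates(n_osc, Q), stars_and_bars(n_osc, Q))
```

The (N' = 6, Q = 7) case of Figure 25 gives 792 distinct microstates, one of which is the configuration drawn there.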
The Entropy
In the Micro-Canonical ensemble, the entropy S is given by the logarithm
of the number of accessible microstates N
S = kB ln N
On substituting the expression for N , one obtains

 0
(N + Q 1)!
S = kB ln
(N 0 1)! Q!


= kB ln(N 0 + Q 1)! ln(N 0 1)! ln Q!

(476)

(477)

On using Stirlings approximation


ln N ! N ( ln N 1 )
100

(478)

valid for large N , for all three terms, after some cancellation one has


S kB (N 0 +Q1) ln(N 0 +Q1) ( N 0 1 ) ln( N 0 1 ) Q ln Q (479)
which is valid for large Q and N 0 . It should be recalled that Q = (Eexc /h 0 ),
so S is a function of the total energy E. The above relation between the entropy
and the total energy is the same as the relation between the entropy and the thermodynamic energy U . The expression for S in terms of U is theFundamental
Relation for the thermodynamics of the model.

Figure 26: The entropy S(E) versus the dimensionless excitation energy, for the Einstein model of the specific heat of a solid.
We shall now consider the system to be in thermal equilibrium with a thermal reservoir held at temperature T. The temperature is defined by

\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_N    (480)

which yields

\frac{1}{T} = \left( \frac{\partial S}{\partial Q} \right) \left( \frac{\partial Q}{\partial U} \right)
           = \frac{k_B}{\hbar \omega_0} \left[ \ln( N' + Q - 1 ) - \ln Q \right]
           = \frac{k_B}{\hbar \omega_0} \ln \left( \frac{N' + Q - 1}{Q} \right)
           = \frac{k_B}{\hbar \omega_0} \ln \left( \frac{\hbar \omega_0 ( N' - 1 ) + U_{exc}}{U_{exc}} \right)    (481)

where it is now understood that the energy is the thermodynamic value U that is determined by T. On multiplying by \hbar \omega_0 / k_B and then exponentiating the equation, one finds

\exp\left( \frac{\hbar \omega_0}{k_B T} \right) = \frac{\hbar \omega_0 ( N' - 1 ) + U_{exc}}{U_{exc}}    (482)

or, on multiplying through by U_{exc},

U_{exc} \exp\left( \frac{\hbar \omega_0}{k_B T} \right) = \hbar \omega_0 ( N' - 1 ) + U_{exc}    (483)

This equation can be solved to yield U_{exc} as a function of T

U_{exc} = \frac{\hbar \omega_0 ( N' - 1 )}{\exp[ \frac{\hbar \omega_0}{k_B T} ] - 1}    (484)

We can neglect the term 1 compared with N' since, in our derivation, we have assumed that N' is very large. Since

U_{exc} = \sum_{i=1}^{N'} \hbar \omega_0 \, \overline{n}_i    (485)

we have found that the thermodynamic average number of quanta \overline{n} of energy \hbar \omega_0 in each quantum mechanical harmonic oscillator is given by

\overline{n} = \frac{1}{\exp[ \frac{\hbar \omega_0}{k_B T} ] - 1}    (486)

If we were to include the zero point energy, then the total thermodynamic energy is given by

U = \sum_{i=1}^{N'} \hbar \omega_0 \left( \overline{n}_i + \frac{1}{2} \right)
  = \sum_{i=1}^{N'} \frac{\hbar \omega_0}{2} \left( \frac{2}{\exp[ \frac{\hbar \omega_0}{k_B T} ] - 1} + 1 \right)
  = \sum_{i=1}^{N'} \frac{\hbar \omega_0}{2} \left( \frac{\exp[ \frac{\hbar \omega_0}{k_B T} ] + 1}{\exp[ \frac{\hbar \omega_0}{k_B T} ] - 1} \right)
  = N' \frac{\hbar \omega_0}{2} \coth\left( \frac{\hbar \omega_0}{2 k_B T} \right)    (487)

The specific heat of the Einstein model can be found from

C_V = \left( \frac{\partial U}{\partial T} \right)_V    (488)

Figure 27: The specific heat of diamond compared with the results of the Einstein Model. The parameter \theta_E = \hbar \omega_0 / k_B is a characteristic temperature that has been assigned the value \theta_E = 1320 K. At high temperatures the model is in reasonable agreement with experiment, but at low temperatures C_V falls to zero too rapidly as T \rightarrow 0. [After A. Einstein, Ann. Physik 22, 180-190 (1907).]
which yields
$$ C_V = N' k_B \left( \frac{\hbar\omega_0}{k_B T} \right)^2 \frac{\exp[ \frac{\hbar\omega_0}{k_B T} ]}{( \exp[ \frac{\hbar\omega_0}{k_B T} ] - 1 )^2} \qquad (489) $$
The specific heat tends to $N' k_B$ for temperatures $k_B T \gg \hbar\omega_0$, as is expected
classically. However, at low temperatures, defined by $k_B T \ll \hbar\omega_0$, the specific
heat falls to zero exponentially
$$ C_V \approx N' k_B \left( \frac{\hbar\omega_0}{k_B T} \right)^2 \exp\left[ - \frac{\hbar\omega_0}{k_B T} \right] \qquad (490) $$
Therefore, the specific heat vanishes in the limit $T \to 0$ in accordance with
Nernst's law. However, the specific heats of most materials deviate from the
prediction of the Einstein model at low temperatures.
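As a quick numerical check of Eqs. (489) and (490), the following sketch (not part of the text; temperature is measured in units of the Einstein temperature $\hbar\omega_0/k_B$) evaluates the specific heat per oscillator in both limits:

```python
import math

def einstein_cv(t_over_theta):
    """Specific heat per oscillator, C_V/(N' k_B), from Eq. (489).

    t_over_theta is the reduced temperature k_B T / (hbar omega_0).
    """
    x = 1.0 / t_over_theta          # x = hbar*omega_0 / (k_B T)
    return x**2 * math.exp(x) / (math.exp(x) - 1.0)**2

# High-temperature limit: C_V/(N' k_B) -> 1, the classical value
print(einstein_cv(100.0))   # close to 1
# Low-temperature limit: exponential suppression, Eq. (490)
print(einstein_cv(0.05))    # vanishingly small
```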

4.5 Vacancies in a Crystal

Consider a crystal composed of $N$ identical atoms arranged in a periodic lattice. If an atom is on a proper atomic site, then it has an energy which we
shall define to have a constant value denoted by $-\epsilon$. If an atom moves to
an interstitial site, it has an energy of zero. This is because it may diffuse to
the surface and escape from the crystal. Alternatively, the excitation energy
required to unbind an atom from its site and, thereby, create a vacancy is given
by $\epsilon$. We are considering the number of vacancies to be much smaller than
the number of lattice sites, so that we can neglect the possibility that two vacancies sit on neighboring lattice sites and, hence, neglect any effects of their
interactions.

Figure 28: A schematic depiction of a crystalline solid composed of $N$ atoms
which contains $N_v$ vacancies.

The number of possible vacancies $n_i$ on the single lattice site $i$ is
restricted to have values zero or one. That is, there either is a vacancy or there
is not. Also, we should note that the total number of vacancies is not conserved.
Let us consider a lattice with $N_v$ vacancies. This state has an energy which
is greater than the energy of a perfect lattice by the amount $U = N_v \epsilon$. We
should note that the vacancies are indistinguishable, since if we permute them,
the resulting state is identical. The number of distinct ways of distributing $N_v$
indistinguishable vacancies on $N$ lattice sites is given by
$$ N_\Gamma = \frac{N!}{N_v! \, ( N - N_v )!} \qquad (491) $$
This is just the number of ways of choosing the lattice sites for $N_v$ distinguishable vacancies, $N!/(N-N_v)!$, divided by the number of permutations of the
vacancies, $N_v!$. Dividing by $N_v!$ then just counts the vacancies as if they were
indistinguishable.
In the Micro-Canonical Ensemble, the entropy is given by
$$ S = k_B \ln N_\Gamma \qquad (492) $$
which, on using Stirling's approximation, yields
$$ S \approx k_B \left[ N \ln N - N_v \ln N_v - ( N - N_v ) \ln( N - N_v ) \right] \qquad (493) $$
which is a function of $U$ since $U = N_v \epsilon$.
The energy can be expressed in terms of temperature by using the relation
$$ \frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_N \qquad (494) $$
since the entropy is a function of energy. This yields
$$ \frac{1}{T} = \left( \frac{\partial S}{\partial N_v} \right) \left( \frac{\partial N_v}{\partial U} \right) = k_B \left[ \ln( N - N_v ) - \ln N_v \right] \left( \frac{\partial N_v}{\partial U} \right) = \frac{k_B}{\epsilon} \left[ \ln( N - N_v ) - \ln N_v \right] = \frac{k_B}{\epsilon} \ln\left[ \frac{N - N_v}{N_v} \right] \qquad (495) $$
After multiplying by $\epsilon/k_B$ and exponentiating, the expression can be inverted
to give the number of vacancies
$$ N_v = \frac{N}{\exp[ \frac{\epsilon}{k_B T} ] + 1} \qquad (496) $$

which shows that the average number of thermally excited vacancies on a site
is given by
$$ \frac{N_v}{N} = \frac{1}{\exp[ \frac{\epsilon}{k_B T} ] + 1} \qquad (497) $$
The thermodynamic energy $U$ at a temperature $T$ is given by the expression
$$ U = \frac{N \epsilon}{\exp[ \frac{\epsilon}{k_B T} ] + 1} \qquad (498) $$
At low temperatures, $\epsilon \gg k_B T$, this reduces to zero exponentially
$$ U \approx N \epsilon \exp\left[ - \frac{\epsilon}{k_B T} \right] \qquad (499) $$

At high temperatures (where the approximate model is not valid) half the lattice sites would host vacancies.
The specific heat due to the formation of vacancies is given by the expression
$$ C = N k_B \left( \frac{\epsilon}{2 k_B T} \right)^2 \mathrm{sech}^2\left( \frac{\epsilon}{2 k_B T} \right) \qquad (500) $$
which vanishes exponentially at low $T$, as is characteristic of systems with gaps in their excitation spectra. At high temperatures, the specific heat
vanishes as the inverse square of $T$, characteristic of a system with an energy
spectrum bounded from above. This form of the specific heat is known as a
Schottky anomaly or Schottky peak. The above expression has been derived
from the configurational entropy of the vacancies. In real materials, there will
also be a vibrational entropy since vacancies will cause local phonon modes to
form.
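The shape of the Schottky peak in Eq. (500) can be explored with a short numerical sketch (not from the text; temperature is measured in units of $\epsilon/k_B$, and the grid used to locate the peak is an arbitrary choice):

```python
import math

def schottky_c(t):
    """Schottky specific heat per site, C/(N k_B), from Eq. (500).

    t is the reduced temperature k_B T / epsilon.
    """
    y = 1.0 / (2.0 * t)               # y = epsilon / (2 k_B T)
    return (y / math.cosh(y))**2

# Scan a temperature grid: the curve rises from zero, peaks near
# k_B T ~ 0.4 epsilon, and falls off as 1/T^2 at high temperature.
ts = [0.1 * k for k in range(1, 31)]
peak_t = max(ts, key=schottky_c)
print(peak_t, schottky_c(peak_t))
```

The peak height of roughly 0.44 per site matches the scale of Figure 29.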
Figure 29: The Schottky specific heat $C/(N k_B)$ versus temperature $k_B T/\epsilon$ for a model of vacancies
in a crystalline solid composed of $N$ atoms.

5 The Canonical Ensemble

The Canonical Ensemble describes a closed system that is divided into two
parts, each with a fixed number of particles and fixed volumes. However, the
two subsystems can exchange energy with each other. One subsystem is the
system which is the focus of our interest. The second subsystem is assumed to
be much larger than the system of interest and is known as the environment.
The properties of the environment will not be of direct interest and its main
role will be to act as a thermal reservoir which absorbs or supplies energy to
the system of interest. The distribution function for the subsystem of interest
can be derived from the Micro-Canonical Probability Distribution Function for
the total system.

5.1 The Boltzmann Distribution Function

The total energy of the complete system $E_T$ is partitioned into the energy of
our subsystem $E$ and that of the thermal reservoir $E_R$
$$ E_T = E_R + E \qquad (501) $$
where the interaction energy between the system and the environment has been
assumed to be negligible. The infinitesimal volume element of total phase space
$d\Gamma_T$ is also assumed to be factorizable in terms of the product of the volume
element of the thermal reservoir $d\Gamma_R$ with the volume element of our subsystem
$d\Gamma$. This assumes that every degree of freedom for the total system can be
uniquely assigned either to the thermal reservoir or to the system of interest.
Hence, we assume that
$$ d\Gamma_T = d\Gamma_R \, d\Gamma \qquad (502) $$

The probability $dp_T$ of finding the total system in the volume element of phase
space $d\Gamma_T$ is described by the constant Micro-Canonical Distribution Function
$\rho_{mc}$
$$ dp_T = \rho_{mc} \, d\Gamma_R \, d\Gamma \qquad (503) $$
The probability $dp$ for finding the subsystem in the phase space volume element
$d\Gamma$ associated with the energy $H = E$ is found by integrating over all the
phase space of the reservoir consistent with the reservoir having the energy
$H_R = E_T - E$. Hence,
$$ dp = \rho_{mc} \, \Gamma_R( E_T - E ) \, d\Gamma \qquad (504) $$
where $\Gamma_R( E_T - E )$ is the volume of phase space accessible to the reservoir.
However, the entropy of the reservoir is related to the volume of its accessible
phase space via
$$ \Gamma_R( E_T - E ) = \Gamma_{R,0} \exp\left[ S_R( E_T - E )/k_B \right] \qquad (505) $$
where $\Gamma_{R,0}$ is the volume of phase space associated with one microscopic state of
the reservoir. Hence, the probability density for the system of interest is given
by
$$ \frac{dp}{d\Gamma} = \rho_{mc} \, \Gamma_{R,0} \exp\left[ S_R( E_T - E )/k_B \right] \qquad (506) $$
where $\rho_{mc}$ is just a constant. The energy of our subsystem $E$ is much smaller
than the energy of the total system $E_T$, since the energy is extensive and the thermal
reservoir is much larger than our subsystem. Hence, it is reasonable to assume
that the reservoir's entropy can be Taylor expanded in powers of $E$ and also
that the second and higher-order terms can be neglected. That is,
$$ S_R( E_T - E ) = S_R( E_T ) - \left( \frac{\partial S_R(E)}{\partial E} \right)_{E_T} E + \ldots \qquad (507) $$
but one recognizes that the derivative of the reservoir's entropy w.r.t. energy is
defined as the inverse of the temperature $T$ of the thermal reservoir
$$ \left( \frac{\partial S_R(E)}{\partial E} \right)_{E_T} = \frac{1}{T} \qquad (508) $$
Hence, the probability distribution function $\rho_c$ for finding the system in some
region of its phase space, as described in the Canonical Ensemble, depends on
the energy of the system $E$ via
$$ \frac{dp}{d\Gamma} \, \Gamma_0 = \rho_{mc} \, \Gamma_{R,0} \, \Gamma_0 \exp\left[ S_R( E_T )/k_B \right] \exp\left[ - \beta E \right] = \frac{1}{Z} \exp\left[ - \beta E \right] \qquad (509) $$
where $\beta = 1/(k_B T)$, $Z$ is a constant, and $\Gamma_0$ is the volume of phase space of the system which
is used to define a single microscopic state. The factor $\rho_{mc} \, \Gamma_{R,0} \, \Gamma_0$ is a dimensionless constant which is independent of the specific point of the system's
phase space, as is the first exponential factor. It is to be recalled that the region
of phase space $d\Gamma$ under consideration corresponds to a specific value of the system's energy $E$, hence one can express the Canonical Probability Distribution
Function as
$$ \rho_c(\{p_i, q_i\}) \, \Gamma_0 = \frac{1}{Z} \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (510) $$
which depends on the point $\{p_i, q_i\}$ of the system's phase space only via the
value of the system's Hamiltonian $H(\{p_i, q_i\})$. The dimensionless normalization
constant $Z$ is known as the Canonical Partition Function. The normalization
condition
$$ 1 = \int d\Gamma \, \frac{dp}{d\Gamma} = \int d\Gamma \, \rho_c(\{p_i, q_i\}) = \int \frac{d\Gamma}{\Gamma_0} \, \frac{1}{Z} \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (511) $$
can be used to express the Canonical Partition Function $Z$ as a weighted integral
over the entire phase space of our system
$$ Z = \int \frac{d\Gamma}{\Gamma_0} \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (512) $$
where the weighting function depends exponentially on the Hamiltonian $H$.
Hence, in the Canonical Ensemble, the only property of the environment that
actually appears in the distribution function is the temperature $T$. The distribution function $\rho_c(\{p_i, q_i\})$ is known as the Boltzmann Distribution Function.
In the Canonical Ensemble, averages of quantities $A(\{p_i, q_i\})$ belonging solely
to the system are evaluated as
$$ \overline{A} = \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \, A(\{p_i, q_i\}) \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (513) $$
where the range of integration runs over all the phase space of our system,
irrespective of the energy of the element of phase space. In the Canonical Distribution Function, the factor that depends exponentially on the Hamiltonian
replaces the restriction used in the Micro-Canonical Ensemble, where integration
only runs over regions of phase space which correspond to a fixed value of the
energy $E$.

The Relation between the Canonical Partition Function and the Helmholtz
Free-Energy

If the partition function is known, it can be used directly to yield the thermodynamic properties of the system. This follows once the partition function
has been related to the Helmholtz Free-Energy $F(T, V, N)$ of our system via
$$ Z = \exp\left[ - \beta F \right] \qquad (514) $$
This identification can be made by recalling that the partition function is related to the Micro-Canonical Distribution Function $\rho_{mc}$ and the entropy of the
thermal reservoir with energy $E_T$ via
$$ \frac{1}{Z} = \Gamma_{T,0} \, \rho_{mc} \exp\left[ S_R( E_T )/k_B \right] = \frac{\Gamma_{T,0}}{\Gamma_T( E_T )} \exp\left[ S_R( E_T )/k_B \right] \qquad (515) $$
where the product of the volumes of phase space representing one microscopic
state of the reservoir $\Gamma_{R,0}$ and one microscopic state of the subsystem $\Gamma_0$ has
been assumed to be related to the volume of phase space $\Gamma_{T,0}$ representing one
microscopic state of the total system by the equation $\Gamma_{R,0} \, \Gamma_0 = \Gamma_{T,0}$. The
second line follows from the relation between the Micro-Canonical Distribution
Function of the total system and the entropy evaluated at $E_T$. However, for
the total system, one has
$$ \frac{\Gamma_{T,0}}{\Gamma_T( E_T )} = \exp\left[ - S_T( E_T )/k_B \right] = \exp\left[ - ( S_R( E_T - U ) + S(U) )/k_B \right] \qquad (516) $$
where we have used the fact that the thermodynamic value of the entropy is
extensive and the thermodynamic entropy of the subsystem is evaluated at
the thermodynamic value of its energy $U$. (One expects from consideration of
the maximization of the entropy that the thermodynamic energy $U$ should be
equal to the most probable value of the energy. However, as we shall show, the
thermodynamic energy also coincides with the average energy $\overline{E}$.) On combining
the above two expressions, one finds that
$$ \frac{1}{Z} = \exp\left[ - ( S_R( E_T - U ) - S_R( E_T ) + S(U) )/k_B \right] \qquad (517) $$
which, on Taylor expanding the first term in the exponent in powers of the
relatively small average energy of the system $U$, yields
$$ \frac{1}{Z} = \exp\left[ \beta U - S(U)/k_B \right] \qquad (518) $$
where the higher-order terms in the expansion have been assumed negligible.
Since the Helmholtz Free-Energy of the system is defined as a Legendre transformation of the system's energy $U(S, V, N)$
$$ F = U - T \, S(U) \qquad (519) $$
then $F$ is a function of the variables $(T, V, N)$. Hence, one recognizes that the
Canonical Partition Function is related to the Helmholtz Free-Energy $F$ of the
subsystem of interest via
$$ Z = \exp\left[ - \beta F \right] \qquad (520) $$
and it is also a function of the variables $(T, V, N)$. For thermodynamic calculations,
it is more convenient to recast the above relation into the form
$$ F = - k_B T \ln Z \qquad (521) $$

The Equality between the Average and the Thermodynamic Energies.


The above analysis is completed by identifying the thermodynamic energy
$U$ with the average energy $\overline{E}$. First we shall note that, within the Canonical
Ensemble, the average energy $\overline{E}$ is defined as
$$ \overline{E} = \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \, H(\{p_i, q_i\}) \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (522) $$
which can be re-written as a logarithmic derivative of $Z$ w.r.t. $\beta$, since the
numerator of the integrand is recognized as the derivative of $Z$ w.r.t. $\beta$
$$ \overline{E} = - \frac{1}{Z} \frac{\partial}{\partial \beta} \int \frac{d\Gamma}{\Gamma_0} \exp\left[ - \beta H(\{p_i, q_i\}) \right] = - \frac{1}{Z} \frac{\partial Z}{\partial \beta} = - \frac{\partial \ln Z}{\partial \beta} \qquad (523) $$
However, $\ln Z$ is also related to the value of the Helmholtz Free-Energy $F$, so
one has
$$ \overline{E} = \frac{\partial ( \beta F )}{\partial \beta} = F + \beta \frac{\partial F}{\partial \beta} = F - T \frac{\partial F}{\partial T} = F + T S \qquad (524) $$
where $F$ is the Helmholtz Free-Energy of thermodynamics and the thermodynamic entropy $S$ has been introduced via
$$ S = - \left( \frac{\partial F}{\partial T} \right)_{V,N} \qquad (525) $$
Hence, since the Free-Energy and the thermodynamic energy are related via
$F = U - TS$, one finds that
$$ \overline{E} = U \qquad (526) $$
This shows that the thermodynamic energy $U$ coincides with the average energy $\overline{E}$
when calculated in the Canonical Ensemble.
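The identity $\overline{E} = -\partial \ln Z/\partial\beta$ of Eq. (523) can be illustrated numerically for a hypothetical discrete system (the three energy levels below are an arbitrary choice for illustration, not from the text):

```python
import math

# Hypothetical three-level system with energies 0, 1, 2 (arbitrary units).
energies = [0.0, 1.0, 2.0]

def partition(beta):
    """Canonical partition function Z as a sum over states."""
    return sum(math.exp(-beta * e) for e in energies)

def avg_energy(beta):
    """Direct Boltzmann-weighted average of the energy, Eq. (522)."""
    z = partition(beta)
    return sum(e * math.exp(-beta * e) for e in energies) / z

def minus_dlnz(beta, h=1e-6):
    """-d(ln Z)/d(beta) by a central finite difference, Eq. (523)."""
    return -(math.log(partition(beta + h))
             - math.log(partition(beta - h))) / (2 * h)

beta = 0.7
print(avg_energy(beta), minus_dlnz(beta))   # the two agree
```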

5.2 The Equipartition Theorem

5.3 The Ideal Gas

An ideal gas of $N$ particles moving in a $d$-dimensional space is described by the
Hamiltonian
$$ H_N = \sum_{i=1}^{dN} \frac{p_i^2}{2m} \qquad (527) $$
and the particles are constrained to move within a hypercubic volume with linear dimensions $L$.
The partition function $Z_N$ is given by
$$ Z_N = \frac{1}{N!} \frac{1}{( 2 \pi \hbar )^{dN}} \prod_{i=1}^{dN} \left( \int dp_i \int_0^L dq_i \right) \exp\left[ - \beta H_N \right] \qquad (528) $$
Since the Hamiltonian is the sum of independent terms, the expression for $Z_N$
can be expressed as a product of $dN$ terms
$$ Z_N = \frac{1}{N!} \frac{1}{( 2 \pi \hbar )^{dN}} \prod_{i=1}^{dN} \left( \int dp_i \exp\left[ - \beta \frac{p_i^2}{2m} \right] \int_0^L dq_i \right) = \frac{1}{N!} \frac{1}{( 2 \pi \hbar )^{dN}} \prod_{i=1}^{dN} \left( \frac{2 \pi m}{\beta} \right)^{\frac{1}{2}} L = \frac{V^N}{N!} \left( \frac{m k_B T}{2 \pi \hbar^2} \right)^{\frac{dN}{2}} \qquad (529) $$
On introducing the thermal de Broglie wave length $\lambda$, via
$$ \lambda = \frac{2 \pi \hbar}{( 2 \pi m k_B T )^{\frac{1}{2}}} \qquad (530) $$
one finds
$$ Z_N = \frac{1}{N!} \left( \frac{V}{\lambda^d} \right)^N \qquad (531) $$

Thermodynamic quantities can be obtained by recalling that the Helmholtz
Free-Energy is given by
$$ F = - k_B T \ln Z_N \qquad (532) $$
and by using Stirling's approximation
$$ \ln N! = N \ln N - N \qquad (533) $$
which yields
$$ F = - N k_B T \ln\left[ \frac{e V}{N \lambda^d} \right] \qquad (534) $$
One can find all other thermodynamic functions from $F$. Thus, one can obtain
the entropy from
$$ S = - \left( \frac{\partial F}{\partial T} \right)_{V,N} \qquad (535) $$
as
$$ S = N k_B \ln\left[ \frac{V e}{N \lambda^d} \right] + \frac{d}{2} N k_B \qquad (536) $$
which is the Sackur-Tetrode formula.
It is quite simple to show that the chemical potential $\mu$ is given by
$$ \mu = k_B T \ln\left[ \frac{N \lambda^d}{V} \right] \qquad (537) $$
The condition under which the classical description is expected to be a reasonable approximation is given by
$$ \frac{V}{N \lambda^d} \gg 1 \qquad (538) $$
Hence, we discover that the classical approximation is expected to be valid
whenever
$$ \exp\left[ - \beta \mu \right] \gg 1 \qquad (539) $$
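As an illustration of Eqs. (530) and (538), the following sketch estimates the thermal de Broglie wavelength and the degeneracy parameter $N\lambda^d/V$ for a dilute monatomic gas. The choice of argon at 300 K and 1 atm is an assumption for illustration only; the constants are standard SI values:

```python
import math

# Physical constants (SI)
hbar = 1.054571817e-34   # J s
k_B = 1.380649e-23       # J/K
amu = 1.66053906660e-27  # kg

def thermal_wavelength(mass, temperature):
    """Thermal de Broglie wavelength, Eq. (530)."""
    return 2 * math.pi * hbar / math.sqrt(2 * math.pi * mass * k_B * temperature)

# Argon gas at room temperature and atmospheric pressure (assumed values)
m = 39.948 * amu
T = 300.0
P = 101325.0
n = P / (k_B * T)                 # number density N/V from the ideal-gas law
lam = thermal_wavelength(m, T)
degeneracy = n * lam**3           # N lambda^d / V, must be << 1 classically
print(lam, degeneracy)
```

The degeneracy parameter comes out many orders of magnitude below unity, so the classical treatment of Eq. (531) is amply justified for such a gas.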

5.4 The Entropy of Mixing

The entropy of mixing is associated with the factor of $N!$ needed to describe the
microstates available to a gas of identical particles. This factor is required to
make the entropy extensive, so that on changing scale by a factor of $s$ we have
$$ S( sE, sV, sN ) = s \, S( E, V, N ) \qquad (540) $$
The $N!$ is also needed to make the expression for the chemical potential intensive.
Consider a container partitioned off into two volumes $V_1$ and $V_2$. The
containers hold $N_1$ and $N_2$ gas molecules respectively, and assume that the
molecules have the same masses and the gasses are kept at the same temperature (or average energy per particle). Then consider removing the partition.
If the gas molecules in the two containers are indistinguishable, then in the
Micro-Canonical Ensemble the entropy of the final state is given by
$$ S_{indis} = k_B \ln\left[ \frac{\Gamma_a}{( N_1 + N_2 )! \, ( 2 \pi \hbar )^{d(N_1+N_2)}} \right] \qquad (541) $$
which corresponds to dividing the enlarged accessible phase space $\Gamma_a$ by a factor
of $( N_1 + N_2 )!$ to avoid over-counting the number of microstates. Equivalently,
in the Canonical Ensemble the partition function $Z$ is given by
$$ Z_{indis} = \frac{1}{( N_1 + N_2 )!} \left( \frac{V_1 + V_2}{\lambda^d} \right)^{N_1+N_2} \qquad (542) $$
However, if the molecules are distinguishable, the accessible phase space of the
final state is the same as that for indistinguishable particles. However, it should
be divided by $N_1! \, N_2!$, corresponding to the number of permutations of like
molecules. In this case, the final state entropy is given by the expression
$$ S_{dis} = k_B \ln\left[ \frac{\Gamma_a}{( N_1! \, N_2! ) \, ( 2 \pi \hbar )^{d(N_1+N_2)}} \right] \qquad (543) $$
or, equivalently,
$$ Z_{dis} = \frac{1}{( N_1! \, N_2! )} \left( \frac{V_1 + V_2}{\lambda^d} \right)^{N_1+N_2} \qquad (544) $$
Since in this case the final state consists of a mixture of distinct gasses, the entropy of the mixture must be larger than the entropy of the mixture of identical
gasses. That is, it is expected that work would have to be expended to separate
the distinct molecules. The entropy of mixing is defined as
$$ \Delta S_{mix} = S_{dis} - S_{indis} \qquad (545) $$
and, since the $\Gamma_a$ are identical, it is found to be given by
$$ \Delta S_{mix} = k_B \ln( N_1 + N_2 )! - k_B \ln( N_1! \, N_2! ) \approx ( N_1 + N_2 ) k_B \ln( N_1 + N_2 ) - N_1 k_B \ln N_1 - N_2 k_B \ln N_2 = - ( N_1 + N_2 ) k_B \left[ \frac{N_1}{N_1+N_2} \ln \frac{N_1}{N_1+N_2} + \frac{N_2}{N_1+N_2} \ln \frac{N_2}{N_1+N_2} \right] \qquad (546) $$
which has a form reminiscent of Shannon's entropy.
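Equation (546) can be checked against the exact combinatorial expression; the following sketch (the particle numbers are arbitrary illustrative choices) compares the Stirling form with $k_B \ln[(N_1+N_2)!/(N_1!\,N_2!)]$, both in units of $k_B$:

```python
import math

def mixing_entropy(n1, n2):
    """Entropy of mixing per k_B in the Stirling form of Eq. (546)."""
    n = n1 + n2
    x1, x2 = n1 / n, n2 / n
    return -n * (x1 * math.log(x1) + x2 * math.log(x2))

def mixing_entropy_exact(n1, n2):
    """Exact combinatorial form, ln[(N1+N2)!/(N1! N2!)], via lgamma."""
    return math.lgamma(n1 + n2 + 1) - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)

# Equal amounts give ln 2 of mixing entropy per particle, and the
# exact form approaches the Stirling form for large particle numbers.
print(mixing_entropy(500, 500) / 1000)        # exactly ln 2
print(mixing_entropy_exact(500, 500) / 1000)  # close to ln 2
```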


5.5 The Einstein Model of a Crystalline Solid

We shall revisit the Einstein model of a Crystalline Solid in the Canonical
Ensemble. The Hamiltonian of $N'$ harmonic oscillators with frequency $\omega_0$ takes
the form
$$ \hat{H} = \sum_{i=1}^{N'} \hbar\omega_0 \left( \hat{n}_i + \frac{1}{2} \right) \qquad (547) $$
in the number operator representation. The set of possible eigenvalues of $\hat{n}_i$ are
the integer values $0, 1, 2, 3, \ldots, \infty$. In this occupation number representation,
the partition function $Z_{N'}$ is given by the trace
$$ Z_{N'} = \mathrm{Trace} \, \exp\left[ - \beta \hat{H} \right] = \mathrm{Trace} \, \exp\left[ - \beta \sum_{i=1}^{N'} \hbar\omega_0 \left( \hat{n}_i + \frac{1}{2} \right) \right] \qquad (548) $$
where the Trace is the sum over all the sets of quantum numbers $n_i$ for each
oscillator. Hence, on recognizing that the resulting expression involves the sum
of a geometric series, we have
$$ Z_{N'} = \prod_{i=1}^{N'} \left( \sum_{n_i=0}^{\infty} \exp\left[ - \beta \hbar\omega_0 \left( n_i + \frac{1}{2} \right) \right] \right) = \prod_{i=1}^{N'} \left( \frac{\exp[ - \beta \frac{\hbar\omega_0}{2} ]}{1 - \exp[ - \beta \hbar\omega_0 ]} \right) = \left( 2 \sinh \frac{\beta \hbar\omega_0}{2} \right)^{-N'} \qquad (549) $$
where each normal mode gives rise to an identical factor. The Free-Energy is
given by
$$ F = N' k_B T \ln\left[ 2 \sinh \frac{\beta \hbar\omega_0}{2} \right] \qquad (550) $$
The entropy $S$ is found from
$$ S = - \left( \frac{\partial F}{\partial T} \right) \qquad (551) $$
which yields
$$ S = - \frac{F}{T} + N' \frac{\hbar\omega_0}{2T} \coth \frac{\beta \hbar\omega_0}{2} \qquad (552) $$
However, since $F = U - TS$, one finds that the internal energy $U$ is given by
$$ U = N' \frac{\hbar\omega_0}{2} \coth \frac{\beta \hbar\omega_0}{2} \qquad (553) $$
This result is the same as that which was previously found using the Micro-Canonical Ensemble.
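The geometric-series result in Eq. (549) is easily verified numerically for a single oscillator (a sketch, not from the text; energies are measured in units of $\hbar\omega_0$ and the value of $\beta$ is an arbitrary choice):

```python
import math

# Check Eq. (549) for one oscillator: the sum over occupation numbers,
# truncated at a level where the terms are negligible, should equal
# 1/(2 sinh(beta/2)) in units where hbar*omega_0 = 1.
beta = 1.3

direct = sum(math.exp(-beta * (n + 0.5)) for n in range(200))
closed = 1.0 / (2.0 * math.sinh(beta / 2.0))
print(direct, closed)   # the two agree
```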

5.6 Vacancies in a Crystal

The Hamiltonian describing vacancies in a crystal can be described by
$$ H_N = \sum_{i=1}^{N} \epsilon \, n_i \qquad (554) $$
where the number of vacancies on site $i$ is defined by $n_i$. The number of vacancies
$n_i$ at site $i$ only has two possible values, unity or zero, since there either is a
vacancy or there is not. The partition function $Z_N$ is given by
$$ Z_N = \mathrm{Trace} \, \exp\left[ - \beta H_N \right] \qquad (555) $$
which is evaluated as
$$ Z_N = \mathrm{Trace} \, \exp\left[ - \beta \sum_{i=1}^{N} \epsilon \, n_i \right] = \mathrm{Trace} \prod_{i=1}^{N} \exp\left[ - \beta \epsilon \, n_i \right] \qquad (556) $$
The Trace runs over the set of all possible values of $n_i$ for each site. Thus
$$ Z_N = \prod_{i=1}^{N} \left( \sum_{n_i=0}^{1} \exp\left[ - \beta \epsilon \, n_i \right] \right) = \prod_{i=1}^{N} \left( 1 + \exp[ - \beta \epsilon ] \right) = \left( 1 + \exp[ - \beta \epsilon ] \right)^N \qquad (557) $$
which leads to the expression for the Free-Energy
$$ F = - N k_B T \ln\left[ 1 + \exp[ - \beta \epsilon ] \right] \qquad (558) $$
Hence, from thermodynamics, one finds that the energy $U$ is given by
$$ U = N \epsilon \, \frac{\exp[ - \beta \epsilon ]}{1 + \exp[ - \beta \epsilon ]} = \frac{N \epsilon}{\exp[ \beta \epsilon ] + 1} \qquad (559) $$
which is identical to the expression that was found using the Micro-Canonical
Ensemble.
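Equation (557) can be confirmed by brute force for a small lattice (a sketch, not from the text; the values of $\beta$, $\epsilon$, and $N$ below are arbitrary illustrative choices):

```python
import math
from itertools import product

# Brute-force check of Eq. (557): summing exp(-beta*E) over all 2^N
# vacancy configurations should reproduce (1 + exp(-beta*eps))^N.
beta, eps, N = 0.8, 1.0, 10

brute = sum(
    math.exp(-beta * eps * sum(config))
    for config in product((0, 1), repeat=N)
)
closed = (1.0 + math.exp(-beta * eps))**N
print(brute, closed)   # the two agree
```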

5.7 Quantum Spins in a Magnetic Field

Consider a set of $N$ quantum spins with magnitude $S$. We shall set $\hbar = 1$ for
convenience. The spins interact with a magnetic field $H^z$ through the Zeeman
interaction
$$ \hat{H}_{int} = - \sum_{i=1}^{N} g \mu_B H^z S_i^z \qquad (560) $$
where the $S_i^z$ have eigenvalues $m_i$ with $-S \le m \le S$.
Since the spins do not interact with each other, the partition function factorizes as
$$ Z = \left( \sum_{m=-S}^{S} \exp\left[ \beta g \mu_B H^z m \right] \right)^N = \left( \frac{ \exp\left[ \beta g \mu_B H^z ( S + \frac{1}{2} ) \right] - \exp\left[ - \beta g \mu_B H^z ( S + \frac{1}{2} ) \right] }{ \exp\left[ \beta g \mu_B H^z \frac{1}{2} \right] - \exp\left[ - \beta g \mu_B H^z \frac{1}{2} \right] } \right)^N = \left( \frac{ \sinh\left[ \beta g \mu_B H^z ( S + \frac{1}{2} ) \right] }{ \sinh\left[ \beta g \mu_B H^z \frac{1}{2} \right] } \right)^N \qquad (561) $$
The Free-Energy is given by
$$ F = - k_B T \ln Z \qquad (562) $$
which is evaluated as
$$ F = - N k_B T \ln \sinh\left[ \beta g \mu_B H^z ( S + \frac{1}{2} ) \right] + N k_B T \ln \sinh\left[ \beta g \mu_B H^z \frac{1}{2} \right] \qquad (563) $$
This can be expressed as
$$ F = - N g \mu_B S H^z - N k_B T \ln\left[ 1 - \exp[ - \beta g \mu_B H^z ( 2S + 1 ) ] \right] + N k_B T \ln\left[ 1 - \exp[ - \beta g \mu_B H^z ] \right] \qquad (564) $$
Using thermodynamics, one can obtain the entropy. At high temperatures, the
entropy saturates at
$$ S \approx N k_B \ln( 2S + 1 ) \qquad (565) $$
From the entropy and $F$, one can find the energy $U$, which is given by
$$ U = - N g \mu_B H^z \left[ ( S + \frac{1}{2} ) \coth\left[ \beta g \mu_B H^z ( S + \frac{1}{2} ) \right] - \frac{1}{2} \coth\left[ \beta g \mu_B H^z \frac{1}{2} \right] \right] \qquad (566) $$
The internal energy saturates at
$$ U = - N g \mu_B S H^z \qquad (567) $$
in the low temperature limit, $T \to 0$, where the spins are completely aligned
with the field. The internal energy vanishes in the high temperature limit, where
the different spin orientations have equal probabilities.
Homework:
Determine the magnetization $M^z$ defined by
$$ M^z = - \left( \frac{\partial F}{\partial H^z} \right) \qquad (568) $$
and the susceptibility $\chi^{z,z}$, which is defined as
$$ \chi^{z,z} = \left( \frac{\partial M^z}{\partial H^z} \right) \qquad (569) $$
Find the zero-field limit of the susceptibility.

5.8 Interacting Ising Spin One-half Systems

Consider a one-dimensional array of spins interacting via the Ising Hamiltonian^14 given by
$$ \hat{H} = - \sum_{i} J \, S_i^z S_{i+1}^z \qquad (570) $$
The operator $S^z$ has two possible eigenvalues, which are $+\frac{\hbar}{2}$ and $-\frac{\hbar}{2}$. The interaction
$J$ couples the z-components of the $N-1$ pairs of nearest neighbor spins. We shall
assume that the interaction $J$ has a positive value, so that the lowest energy
configuration is ferromagnetic, in which all the spins are aligned parallel to each
other.
The partition function is given by
$$ Z = \mathrm{Trace} \, \exp\left[ - \beta \hat{H} \right] = \mathrm{Trace} \prod_{i=1}^{N-1} \exp\left[ \beta J \, S_i^z S_{i+1}^z \right] \qquad (571) $$
which is the product of factors arising from each sequential pair-wise interaction.
The factors $\exp[ \beta J \, S_i^z S_{i+1}^z ]$ arising from an interaction can be re-written as
$$ \exp\left[ \beta J \, S_i^z S_{i+1}^z \right] = \exp\left[ + \frac{\beta J \hbar^2}{4} \right] \frac{1}{2} \left( 1 + \frac{4 \, S_i^z S_{i+1}^z}{\hbar^2} \right) + \exp\left[ - \frac{\beta J \hbar^2}{4} \right] \frac{1}{2} \left( 1 - \frac{4 \, S_i^z S_{i+1}^z}{\hbar^2} \right) = \cosh \frac{\beta J \hbar^2}{4} + \frac{4}{\hbar^2} \, S_i^z S_{i+1}^z \, \sinh \frac{\beta J \hbar^2}{4} \qquad (572) $$
since they are to be evaluated on the space where $S_i^z S_{i+1}^z = \pm \frac{\hbar^2}{4}$. Thus
$$ Z = \mathrm{Trace} \prod_{i=1}^{N-1} \left( \cosh \frac{\beta J \hbar^2}{4} + \frac{4}{\hbar^2} \, S_i^z S_{i+1}^z \, \sinh \frac{\beta J \hbar^2}{4} \right) \qquad (573) $$
The trace can be evaluated as a sum over all possible values of the spin eigenvalues
$$ \mathrm{Trace} \equiv \prod_{i=1}^{N} \left( \sum_{S_i^z = \pm \frac{\hbar}{2}} \right) \qquad (574) $$

14 E. Ising, Beitrag zur Theorie des Ferromagnetismus, Z. Phys. 31, 253-258, (1925).

The trace runs over all the $2^N$ possible microstates of the system. The trace can
be evaluated by noting that the summand in the expression for the partition
function only contains one factor which depends on $S_1^z$,
$$ \cosh \frac{\beta J \hbar^2}{4} + \frac{4}{\hbar^2} \, S_1^z S_2^z \, \sinh \frac{\beta J \hbar^2}{4} \qquad (575) $$
The terms odd in $S_1^z$ cancel when taking the trace. Hence, the trace over $S_1^z$
contributes a multiplicative factor of
$$ 2 \cosh \frac{\beta J \hbar^2}{4} \qquad (576) $$
to the partition function, where the factor of two comes from the two spin
directions. After the trace over $S_1^z$ has been performed, only the factor
$$ \cosh \frac{\beta J \hbar^2}{4} + \frac{4}{\hbar^2} \, S_2^z S_3^z \, \sinh \frac{\beta J \hbar^2}{4} \qquad (577) $$
depends on $S_2^z$. On taking the trace over $S_2^z$, the last term in this factor vanishes
and the trace contributes a second multiplicative factor of $2 \cosh \frac{\beta J \hbar^2}{4}$ to $Z$. Each
of the $N-1$ interactions contributes a factor of
$$ 2 \cosh \frac{\beta J \hbar^2}{4} \qquad (578) $$
to the partition function. The trace over the last spin produces a multiplicative
factor of $2$ to $Z$. Hence, the partition function is given by
$$ Z = 2 \left( 2 \cosh \frac{\beta J \hbar^2}{4} \right)^{N-1} \qquad (579) $$

The Free-Energy $F$ is given by
$$ F = - k_B T \ln Z \qquad (580) $$
which is evaluated as
$$ F = - N k_B T \ln 2 - ( N - 1 ) k_B T \ln \cosh \frac{\beta J \hbar^2}{4} \qquad (581) $$
The entropy $S$ is found from
$$ S = - \left( \frac{\partial F}{\partial T} \right) \qquad (582) $$
which yields
$$ S = N k_B \ln 2 + ( N - 1 ) k_B \ln \cosh \frac{\beta J \hbar^2}{4} - ( N - 1 ) k_B \, \frac{\beta J \hbar^2}{4} \tanh \frac{\beta J \hbar^2}{4} \qquad (583) $$
The entropy is seen to reach the value $N k_B \ln 2$ appropriate to non-interacting
spins in the limit $\beta \to 0$ and reaches the value of $k_B \ln 2$ in the limit $T \to 0$.
The internal energy $U$ is found from the relation
$$ F = U - T S \qquad (584) $$
as
$$ U = - ( N - 1 ) \frac{J \hbar^2}{4} \tanh \frac{\beta J \hbar^2}{4} \qquad (585) $$
The energy vanishes in the limit $\beta \to 0$ and saturates to the minimal value of
$- ( N - 1 ) \frac{J \hbar^2}{4}$ appropriate to the $( N - 1 )$ pair-wise interactions between
completely aligned spins in the low temperature limit $T \to 0$. Hence, the ground
state is two-fold degenerate and corresponds to minimizing the energy by the
spins aligning so that either they are all up or they are all down. While at high
temperatures, the system is dominated by the entropy, which is maximized by
randomizing the spin directions.
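The closed form Eq. (579) can be checked by explicitly summing over all $2^N$ spin configurations of a short open chain (a sketch, not from the text; $\hbar = 1$, and the values of $\beta$, $J$, and $N$ are arbitrary choices):

```python
import math
from itertools import product

# Brute-force check of Eq. (579): sum exp(-beta*E) over all 2^N spin
# configurations of the open chain, with E = -J * sum_i s_i s_{i+1}
# and s_i = +/- 1/2.  Compare with Z = 2 (2 cosh(beta*J/4))^(N-1).
beta, J, N = 1.1, 1.0, 8

brute = 0.0
for spins in product((-0.5, 0.5), repeat=N):
    energy = -J * sum(spins[i] * spins[i + 1] for i in range(N - 1))
    brute += math.exp(-beta * energy)

closed = 2.0 * (2.0 * math.cosh(beta * J / 4.0))**(N - 1)
print(brute, closed)   # the two agree
```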

5.9 Density of States of Elementary Excitations

Consider normal modes of excitation that extend throughout a hypercubic
volume $V = L^d$. If the excitations satisfy an isotropic dispersion relation of
the form
$$ \hbar\omega = \hbar\omega(k) \qquad (586) $$
where $\omega(k)$ is a monotonically increasing function of $k$, then this relation can
be inverted to yield
$$ k = k(\omega) \qquad (587) $$

Since the normal modes are confined to the system, the normal mode wave
functions must vanish on the walls of the system at $x_i = 0$ and $x_i = L$, for
$i = 1, 2, \ldots, d$. If the wave functions have the form
$$ \phi_\alpha(\mathbf{r}) = \frac{\phi_\alpha(\mathbf{k})}{\sqrt{V}} \, \sin( \mathbf{k} \cdot \mathbf{r} ) \qquad (588) $$
for each polarization $\alpha$, the allowed wave vectors satisfy the $d$ boundary conditions
$$ k_i L = \pi n_i \qquad (589) $$
for positive integer values of $n_i$. Thus, the allowed values of $k$ are quantized
and can be represented by a vector $\mathbf{n}$ in $n$-space
$$ \mathbf{n} = \frac{L \mathbf{k}}{\pi} \qquad (590) $$
which has positive integer components $n_1, n_2, n_3, \ldots, n_d$. In $n$-space, each
normal mode with polarization $\alpha$ is represented by a point with positive integer coordinates. Therefore, the normal modes per polarization form a lattice
of points arranged on a hyper-cubic lattice with lattice spacing unity. In the
segment composed of positive integers, there is one normal mode for each unit
volume of the lattice.

Figure 30: A pictorial representation of n-space in two-dimensions. Each state
corresponds to a point $(n_1, n_2)$ for positive integer values of the $n$'s. In the
positive quadrant, there is one state per unit area. The states with energy less
than $E$ are located in an area of the positive quadrant enclosed by a circular
arc of radius $r$ given by $r = L k(E)/\pi$.

Due to the monotonic nature of $\omega(k)$, the number of excitations per polarization with energies less than $\hbar\omega$, $N(\omega)$, is given by the number of lattice points
$\mathbf{n}$ which satisfy the inequality
$$ | \mathbf{n} | \le \frac{L k(\omega)}{\pi} \qquad (591) $$
or, more explicitly,
$$ \sqrt{ \sum_{i=1}^{d} n_i^2 } \le \frac{L k(\omega)}{\pi} \qquad (592) $$
Since the segment of $n$-space with positive integers,
$$ n_1 \ge 0, \; n_2 \ge 0, \; \ldots, \; n_{d-1} \ge 0, \; n_d \ge 0 \qquad (593) $$
is a fraction of $\frac{1}{2^d}$ of the entire volume of $n$-space, the number of normal modes
with energy less than $\hbar\omega$ is given by $\frac{1}{2^d}$ of the volume enclosed by a radius
$$ r = \frac{L k(\omega)}{\pi} \qquad (594) $$
where we have recalled that there is one normal mode for each unit cell in $n$-space and that each cell has a volume $1^d$. Hence, on dividing the expression for
the volume of a hypersphere of radius $r$ by $2^d$, one finds
$$ N(\omega) = \frac{1}{2^d} \, \frac{S_d}{d} \left( \frac{L k(\omega)}{\pi} \right)^d = \frac{S_d}{2^d \, d} \left( \frac{L k(\omega)}{\pi} \right)^d \qquad (595) $$
This assumes that no points lie on the bounding surface of the hypersphere, or,
if they do, that their numbers are negligible. The surface area of a unit $d$-dimensional
hypersphere is given by
$$ S_d = \frac{2 \pi^{\frac{d}{2}}}{\Gamma( \frac{d}{2} )} \qquad (596) $$
so
$$ N(\omega) = \frac{2 \pi^{\frac{d}{2}}}{d \, \Gamma( \frac{d}{2} )} \, V \left( \frac{k(\omega)}{2\pi} \right)^d \qquad (597) $$

The number of excitations per polarization with energy less than $\hbar\omega$ can be
expressed as an integral of the density of states per polarization, $\rho_\alpha(\omega)$, defined
as
$$ \rho_\alpha(\omega) = \sum_{\mathbf{k}} \delta( \omega - \omega(\mathbf{k}) ) \qquad (598) $$
as
$$ N(\omega) = \int^{\omega} d\omega' \, \rho_\alpha(\omega') = \sum_{\mathbf{k}} \Theta( \omega - \omega(\mathbf{k}) ) \qquad (599) $$
where $\Theta$ is the Heaviside step function. The step function restricts the summation to the number of normal modes with frequencies less than $\omega$, which are
counted with weight unity. Thus, the density of states per polarization can be
found from $N(\omega)$ by taking the derivative w.r.t. $\omega$
$$ \rho_\alpha(\omega) = \frac{d}{d\omega} N(\omega) \qquad (600) $$
Hence, we find that the density of states can be represented by
$$ \rho_\alpha(\omega) = \frac{2 \pi^{\frac{d}{2}} \, V}{\Gamma( \frac{d}{2} )} \left( \frac{k(\omega)}{2\pi} \right)^{d-1} \frac{1}{2\pi} \, \frac{dk(\omega)}{d\omega} \qquad (601) $$
The total density of states is given by the sum of the density of states for each
polarization.
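The counting argument behind Eq. (595) can be tested directly in two dimensions, where the number of positive-integer lattice points inside a quarter circle of radius $r$ should approach $\pi r^2/4$ for large $r$ (a sketch, not from the text; the radius below is an arbitrary choice):

```python
import math

# Count the lattice points (n1, n2) with positive integer components
# satisfying |n| <= r, and compare with the quarter-disk area (pi/4) r^2.
def count_modes(r):
    count = 0
    for n1 in range(1, int(r) + 1):
        for n2 in range(1, int(r) + 1):
            if n1 * n1 + n2 * n2 <= r * r:
                count += 1
    return count

r = 200.0
exact = count_modes(r)
estimate = math.pi * r * r / 4.0
print(exact, estimate)   # relative error shrinks as 1/r
```

The residual discrepancy scales with the perimeter of the arc, which is the "points on the bounding surface" correction noted in the text.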
Homework:
Find the density of states for particles moving in a three-dimensional space
obeying the dispersion relation
$$ \omega = c \, k^n \quad \text{for } n > 0 \qquad (602) $$

5.10 The Debye Model of a Crystalline Solid

Consider a crystalline solid consisting of $N$ atoms on a $d$-dimensional lattice.
The model treats the lattice vibrations as isotropic sound waves. The
sound waves are characterized by their wave vectors $\mathbf{k}$ and by their polarizations $\alpha$.
The vibrational modes consist of $N$ longitudinal modes and $(d-1)N$ transverse
modes. The dispersion relations for the modes will be denoted by $\omega_\alpha(k)$. The
Hamiltonian is given by
$$ \hat{H} = \sum_{\mathbf{k},\alpha} \hbar \omega_\alpha(k) \left( n_{\mathbf{k},\alpha} + \frac{1}{2} \right) \qquad (603) $$

where $n_{\mathbf{k},\alpha}$ is an integer quantum number.
The partition function $Z$ is given by
$$ Z = \prod_{\mathbf{k},\alpha} \left( \sum_{n_{\mathbf{k},\alpha}=0}^{\infty} \exp\left[ - \beta \hbar \omega_\alpha(k) \left( n_{\mathbf{k},\alpha} + \frac{1}{2} \right) \right] \right) = \prod_{\mathbf{k},\alpha} \left( \frac{\exp[ - \frac{1}{2} \beta \hbar \omega_\alpha(k) ]}{1 - \exp[ - \beta \hbar \omega_\alpha(k) ]} \right) \qquad (604) $$
where we have performed the sum over a geometric series. The Free-Energy $F$
is given by
$$ F = - k_B T \ln Z = k_B T \sum_{\mathbf{k},\alpha} \ln\left[ \exp[ + \frac{1}{2} \beta \hbar \omega_\alpha(k) ] - \exp[ - \frac{1}{2} \beta \hbar \omega_\alpha(k) ] \right] = k_B T \int d\omega \, \rho(\omega) \ln\left[ \exp[ + \frac{1}{2} \beta \hbar \omega ] - \exp[ - \frac{1}{2} \beta \hbar \omega ] \right] = \int d\omega \, \rho(\omega) \left( \frac{\hbar\omega}{2} + k_B T \ln\left[ 1 - \exp[ - \beta \hbar \omega ] \right] \right) \qquad (605) $$
where we have introduced the density of states $\rho(\omega)$ via
$$ \rho(\omega) = \sum_{\mathbf{k},\alpha} \delta( \omega - \omega_\alpha(k) ) \qquad (606) $$
Since the densities of states from the different polarizations are additive, one has
$$ \rho(\omega) = \frac{V S_d}{( 2 \pi )^d} \left( \frac{1}{c_L^d} + \frac{(d-1)}{c_T^d} \right) \omega^{d-1} \qquad (607) $$
where the dispersion relation for the longitudinal modes is given by $\omega = c_L k$
and the dispersion relation for the $(d-1)$ transverse modes is given by $\omega = c_T k$.
Since the lattice vibrations are only defined by the motion of point particles
arranged on a lattice, there is an upper limit on the wave vectors $k$ and, hence,
a maximum frequency. The maximum frequency $\omega_D$ is determined from the
condition that the total number of normal modes is $dN$. Thus,
$$ \int_0^{\omega_D} d\omega \, \rho(\omega) = d \, N \qquad (608) $$
which yields
$$ \frac{V S_d}{d \, ( 2 \pi )^d} \left( \frac{1}{c_L^d} + \frac{(d-1)}{c_T^d} \right) \omega_D^d = d \, N \qquad (609) $$

Figure 31: The density of states for the Debye model of a three-dimensional
solid containing $N$ atoms, with an upper cut-off frequency $\omega_D$.
Hence, we may write the density of states as
$$ \rho(\omega) = d^2 \, N \, \frac{\omega^{d-1}}{\omega_D^d} \qquad (610) $$
for $\omega_D \ge \omega \ge 0$, and is zero otherwise.


The Free-Energy is given in terms of the density of states as
$$ F = \int d\omega \, \rho(\omega) \left( \frac{\hbar\omega}{2} + k_B T \ln\left[ 1 - \exp[ - \beta \hbar \omega ] \right] \right) \qquad (611) $$
and the entropy $S$ is found from
$$ S = - \left( \frac{\partial F}{\partial T} \right)_V \qquad (612) $$
The internal energy is found from $F$ and $S$ as
$$ U = \int_0^{\omega_D} d\omega \, \rho(\omega) \, \hbar\omega \left( \frac{1}{2} + \frac{1}{\exp[ \beta \hbar \omega ] - 1} \right) \qquad (613) $$
The specific heat at constant volume is found from
$$ C_V = \left( \frac{\partial U}{\partial T} \right)_V \qquad (614) $$

which yields
$$ C_V = k_B \beta^2 \int d\omega \, \rho(\omega) \, \hbar^2 \omega^2 \, \frac{\exp[ \beta \hbar \omega ]}{( \exp[ \beta \hbar \omega ] - 1 )^2} \qquad (615) $$
or
$$ C_V = \frac{\hbar^2}{k_B T^2} \int d\omega \, \rho(\omega) \, \omega^2 \, \frac{\exp[ \beta \hbar \omega ]}{( \exp[ \beta \hbar \omega ] - 1 )^2} \qquad (616) $$
On substituting for the density of states, one finds
$$ C_V = \frac{\hbar^2 \, d^2 N}{k_B T^2 \, \omega_D^d} \int_0^{\omega_D} d\omega \, \omega^{d+1} \, \frac{\exp[ \beta \hbar \omega ]}{( \exp[ \beta \hbar \omega ] - 1 )^2} \qquad (617) $$

The specific heat can be evaluated in two limits. In the high-temperature
limit, $k_B T \gg \hbar\omega_D$, one has $k_B T \gg \hbar\omega$ for all the modes. In this limit, one can expand the
integrand in powers of $\beta\hbar\omega$, which leads to
$$ C_V \approx k_B \, \frac{d^2 N}{\omega_D^d} \int_0^{\omega_D} d\omega \, \omega^{d-1} = d \, N \, k_B \qquad (618) $$
Thus, at high temperatures, the Debye model of a solid reproduces Dulong and
Petit's law.
At low temperatures, $k_B T \ll \hbar\omega_D$, one can introduce a dimensionless
variable
$$ x = \beta \hbar \omega \qquad (619) $$
The maximum frequency $\omega_D$ corresponds to a maximum value $x_D$
$$ x_D \gg 1 \qquad (620) $$
The specific heat can be expressed as
$$ C_V = k_B \, \frac{d^2 N}{x_D^d} \int_0^{x_D} dx \, x^{d+1} \, \frac{\exp[ x ]}{( \exp[ x ] - 1 )^2} \qquad (621) $$
or, on extending the upper limit to infinity,
$$ C_V = k_B \, ( k_B T )^d \, \frac{d^2 N}{( \hbar \omega_D )^d} \int_0^{\infty} dx \, x^{d+1} \, \frac{\exp[ x ]}{( \exp[ x ] - 1 )^2} = k_B \, ( k_B T )^d \, \frac{d^2 N}{( \hbar \omega_D )^d} \, ( d + 1 ) \int_0^{\infty} dx \, \frac{x^d}{\exp[ x ] - 1} \qquad (622) $$
where the last line has been found through integration by parts. The remaining integral
is $\Gamma( d + 1 ) \, \zeta(d + 1)$, which combines with the prefactor $(d+1)$ to give $\Gamma( d + 2 ) \, \zeta(d + 1)$, so the end result is
$$ C_V = N k_B \left( \frac{k_B T}{\hbar \omega_D} \right)^d d^2 \, \Gamma( d + 2 ) \, \zeta(d + 1) \qquad (623) $$

Hence, the specific heat at low temperatures is proportional to $T^d$, which is in
accord with experimental observation.
For $d = 3$, one finds
$$ C_V = N k_B \left( \frac{k_B T}{\hbar \omega_D} \right)^3 \frac{216 \, \pi^4}{90} \qquad (624) $$
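The low-temperature integral in Eq. (622) can be evaluated numerically for $d = 3$; it should reproduce $\Gamma(5)\,\zeta(4) = 4\pi^4/15$ (a sketch, not from the text; the integration cut-offs and step count are arbitrary numerical choices):

```python
import math

# Debye integral for d = 3: int_0^inf x^4 e^x/(e^x - 1)^2 dx,
# which equals Gamma(5)*zeta(4) = 4*pi^4/15 after integration by parts.
def integrand(x):
    return x**4 * math.exp(x) / (math.exp(x) - 1.0)**2

# Simple trapezoidal rule on [a, b]; the integrand vanishes at both ends.
a, b, steps = 1e-6, 40.0, 400000
h = (b - a) / steps
total = 0.5 * (integrand(a) + integrand(b))
total += sum(integrand(a + i * h) for i in range(1, steps))
total *= h

exact = 4.0 * math.pi**4 / 15.0
print(total, exact)   # the two agree
```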

5.11 Electromagnetic Cavities

Electromagnetic waves can flow through a vacuum, therefore an empty cavity
can support electromagnetic modes. The normal modes can be represented by
their wave vectors $\mathbf{k}$ and by their two possible polarizations $\alpha$. However, unlike
the sound waves in a solid, there is no upper cut-off for the wave vector of an
electromagnetic wave. The Hamiltonian is given by
$$ \hat{H} = \sum_{\mathbf{k},\alpha} \hbar \omega_\alpha(k) \left( n_{\mathbf{k},\alpha} + \frac{1}{2} \right) \qquad (625) $$
where $n_{\mathbf{k},\alpha}$ is a quantum number (the number of photons) which has the allowed
values of $0, 1, 2, 3, \ldots, \infty$.
The partition function $Z$ is given by
$$ Z = \prod_{\mathbf{k},\alpha} \left( \sum_{n_{\mathbf{k},\alpha}=0}^{\infty} \exp\left[ - \beta \hbar \omega_\alpha(k) \left( n_{\mathbf{k},\alpha} + \frac{1}{2} \right) \right] \right) = \prod_{\mathbf{k},\alpha} \left( \frac{\exp[ - \frac{1}{2} \beta \hbar \omega_\alpha(k) ]}{1 - \exp[ - \beta \hbar \omega_\alpha(k) ]} \right) \qquad (626) $$

and the Free-Energy is given by
$$ F = - k_B T \ln Z = k_B T \sum_{\mathbf{k},\alpha} \ln\left[ \exp[ + \frac{1}{2} \beta \hbar \omega_\alpha(k) ] - \exp[ - \frac{1}{2} \beta \hbar \omega_\alpha(k) ] \right] = k_B T \int d\omega \, \rho(\omega) \ln\left[ \exp[ + \frac{1}{2} \beta \hbar \omega ] - \exp[ - \frac{1}{2} \beta \hbar \omega ] \right] = \int d\omega \, \rho(\omega) \left( \frac{\hbar\omega}{2} + k_B T \ln\left[ 1 - \exp[ - \beta \hbar \omega ] \right] \right) \qquad (627) $$
where the density of states $\rho(\omega)$ is given by
$$ \rho(\omega) = \sum_{\mathbf{k},\alpha} \delta( \omega - \omega_\alpha(k) ) \qquad (628) $$

The first term in the Free-Energy represents the (infinite) zero-point energy of the electromagnetic modes. It is divergent, because the electromagnetic cavity can support modes of arbitrarily high frequency. Divergences due to the presence of modes with arbitrarily large frequencies are known as ultra-violet divergences. Since only excitation energies are measured, the zero-point energy can usually be ignored. However, if the boundaries of the cavity are changed, there may be a measurable change in the zero-point energy of the cavity, such as is found in the Casimir effect15. That is, although it may be reasonable to speculate that the divergence in the zero-point energy may merely reflect our ignorance of the true physics at ultra-short distances, the zero-point energy cannot be dismissed since it does have some physical reality.
The density of states for the $( d - 1 )$ transverse electromagnetic modes can be described by
$$
\rho(\omega) = ( d - 1 ) \, \frac{S_d \, V \, \omega^{d-1}}{( 2 \pi )^d \, c^d} \qquad (629)
$$

Hence, the Free-Energy is given by the integral
$$
F = ( d - 1 ) \, \frac{S_d \, V}{( 2 \pi )^d \, c^d} \int_0^{\infty} d\omega \; \omega^{d-1} \left( \frac{\hbar \omega}{2} + k_B T \ln\left[ 1 - \exp[ - \beta \hbar \omega ] \right] \right) \qquad (630)
$$
The internal energy is given by
$$
U = F + T S = F - T \left( \frac{\partial F}{\partial T} \right)_V \qquad (631)
$$

which leads to
$$
U = ( d - 1 ) \, \frac{S_d \, V}{( 2 \pi c )^d} \int_0^{\infty} d\omega \; \omega^{d-1} \left( \frac{\hbar \omega}{2} + \frac{\hbar \omega}{\exp[ \beta \hbar \omega ] - 1} \right) \qquad (632)
$$

The first term is divergent and represents the zero-point energy. The second term, $\Delta U$, represents the energy of thermally excited photons. The second term can be evaluated by changing variable to $x$ defined by
$$
x = \beta \hbar \omega \qquad (633)
$$

Thus
$$
\Delta U = \frac{4 ( d - 1 ) \, \pi^{\frac{d+2}{2}}}{\Gamma( \frac{d}{2} )} \; V \hbar c \left( \frac{k_B T}{2 \pi \hbar c} \right)^{d+1} \int_0^{\infty} dx \; \frac{x^d}{\exp[ x ] - 1} \qquad (634)
$$

15 H.B.G. Casimir, On the attraction between two perfectly conducting plates, Proc. Kon. Nederland. Akad. Wetensch. B51, 793 (1948).

which leads to
$$
\Delta U = \frac{4 ( d - 1 ) \, \pi^{\frac{d+2}{2}}}{\Gamma( \frac{d}{2} )} \; V \hbar c \left( \frac{k_B T}{2 \pi \hbar c} \right)^{d+1} \Gamma( d + 1 ) \; \zeta( d + 1 ) \qquad (635)
$$

In three-dimensions, the thermal energy of the cavity is given by
$$
\Delta U = \frac{\pi^2 \, k_B^4}{15 \, \hbar^3 c^3} \; T^4 \; V \qquad (636)
$$
since $3! \, \zeta(4) = \frac{\pi^4}{15}$. This result is closely related to the Stefan-Boltzmann law which describes the energy flux radiating from an electromagnetic cavity held at temperature $T$.
The energy flux escaping through a wall of the cavity is given by the energy that passes through a unit area, per unit time. In a unit time, a photon travels a distance $c \cos\theta$ perpendicular to the wall, where $\theta$ denotes the angle subtended by the photon's velocity to the normal to the wall. Thus, the number of photons incident on a unit area of the wall, in a unit time, is given by the integral
$$
F_N = \int \frac{d\Omega}{S_d} \; c \cos\theta \int_0^{\infty} d\omega \; \frac{N_\omega}{V} \qquad (637)
$$
where the integration over the angle $\theta$ runs from $0$ to $\frac{\pi}{2}$ as the flux only includes light that is traveling towards the wall. In this expression $\int d\omega \, N_\omega / V$ is the photon density. Therefore, the energy flux is given by
$$
F_E = \int \frac{d\Omega}{S_d} \; c \cos\theta \int_0^{\infty} d\omega \; \hbar \omega \, \frac{N_\omega}{V}
= \frac{S_{d-1}}{S_d} \int_0^{\frac{\pi}{2}} d\theta \; \sin^{d-2}\theta \; c \cos\theta \int_0^{\infty} d\omega \; \hbar \omega \, \frac{N_\omega}{V}
= \frac{c \, S_{d-1}}{( d - 1 ) \, S_d} \int_0^{\infty} d\omega \; \hbar \omega \, \frac{N_\omega}{V} \qquad (638)
$$
The density of photons in a frequency interval $d\omega$ is given by
$$
\frac{N_\omega}{V} \, d\omega = \frac{( d - 1 ) \, S_d}{( 2 \pi c )^d} \; \frac{\omega^{d-1}}{\exp[ \beta \hbar \omega ] - 1} \; d\omega \qquad (639)
$$
Hence, the energy flux is given by
$$
F_E = \frac{c \, S_{d-1}}{( 2 \pi c )^d} \int_0^{\infty} d\omega \; \frac{\hbar \, \omega^d}{\exp[ \beta \hbar \omega ] - 1}
= 2 \pi \, \hbar \, c^2 \, S_{d-1} \left( \frac{k_B T}{2 \pi \hbar c} \right)^{d+1} \Gamma( d + 1 ) \; \zeta( d + 1 ) \qquad (640)
$$

For three dimensions, this reduces to
$$
F_E = \frac{\pi^2 \, k_B^4}{60 \, \hbar^3 c^2} \; T^4 \qquad (641)
$$

which is the Stefan-Boltzmann law inferred from experiment16 and then deduced theoretically17
$$
F_E = \sigma \, T^4 \qquad (642)
$$
where Stefan's constant $\sigma$ is given by
$$
\sigma = \frac{\pi^2 \, k_B^4}{60 \, \hbar^3 c^2} \qquad (643)
$$
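Eqn (643) can be evaluated with the (exact, SI-defined) values of the constants; this numerical aside is not part of the text, but it reproduces the familiar value $\sigma \approx 5.67 \times 10^{-8} \, \mathrm{W \, m^{-2} \, K^{-4}}$:

```python
import math

# SI values of the fundamental constants
hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J / K
c    = 2.99792458e8      # m / s

# eqn (643): sigma = pi^2 k_B^4 / (60 hbar^3 c^2)
sigma = math.pi**2 * k_B**4 / (60.0 * hbar**3 * c**2)
print(sigma)   # ~5.67e-8 W m^-2 K^-4
```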

Homework:
Show that the thermal energy, per unit volume, of electromagnetic radiation with frequency in the range $d\omega$ is given by
$$
\frac{dU}{V} = \frac{\hbar \, \omega^3}{\pi^2 c^3} \; \frac{1}{\exp[ \beta \hbar \omega ] - 1} \; d\omega \qquad (644)
$$
The spectrum of emitted radiation from a perfect emitter is a universal function of temperature which was first devised by Planck18. Show that at high temperatures ( $k_B T \gg \hbar \omega$ ) it reduces to the Rayleigh-Jeans Law19
$$
\frac{dU}{V} \approx \frac{\omega^2}{\pi^2 c^3} \; k_B T \; d\omega \qquad (645)
$$
in which each mode has an energy $k_B T$.
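The high-temperature limit is easy to see numerically (an illustration with arbitrarily chosen values of $\omega$ and $T$, not values from the text): for $\beta \hbar \omega \ll 1$ the Planck weight $\hbar \omega / ( e^{\beta \hbar \omega} - 1 )$ approaches $k_B T$, which is the content of the Rayleigh-Jeans limit.

```python
import math

hbar = 1.054571817e-34   # J s
k_B  = 1.380649e-23      # J / K

T = 300.0                # K (arbitrary illustrative value)
omega = 1.0e10           # rad/s, chosen so that beta*hbar*omega << 1
beta = 1.0 / (k_B * T)

x = beta * hbar * omega
# mean thermal energy per mode; expm1(x) = e^x - 1 without cancellation
planck_energy = hbar * omega / math.expm1(x)
print(x, planck_energy / (k_B * T))   # the ratio tends to 1 in this regime
```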


Homework:
Since the Universe is expanding, it is not in thermal equilibrium. The density of matter is so small that the microwave background can be considered to be decoupled from the matter. Therefore, the universe can be considered as an electromagnetic cavity filled with microwave radiation that obeys the Planck distribution with a temperature of approximately 2.725 K20. Because of the expansion of the universe, the photons are Doppler shifted. The Doppler shift reduces the photons' energies and squeezes the frequencies $\omega$ to new frequencies $\omega'$.
(i) If the linear-dimension of the universe is changed from $L$ to $L'$, how does the frequency of a photon change?
(ii) Show that the spectral density of thermally activated photons $N_\omega \, d\omega$ retains its form after the expansion, but that the temperature is changed from $T$ to $T'$. What is the final temperature $T'$ in terms of the initial temperature $T$ and $L'/L$?
(iii) Assume that the expansion of the universe is adiabatic, and then use thermodynamics to determine the change in temperature of the background microwave radiation when the linear-dimension $L$ of the universe is increased to $L'$.

16 J. Stefan, Über die Beziehung zwischen der Wärmestrahlung und der Temperatur, Sitzungsberichte der mathematisch-naturwissenschaftlichen Classe der kaiserlichen Akademie der Wissenschaften, Wien, Bd. 79, 391-428, (1879).
17 L. Boltzmann, Ableitung des Stefan'schen Gesetzes, betreffend die Abhängigkeit der Wärmestrahlung von der Temperatur aus der electromagnetischen Lichttheorie, Ann. Physik, 22, 291-294 (1884).
18 M. Planck, Über das Gesetz der Energieverteilung im Normalspectrum, Annalen der Physik, 3, 553-563 (1901).
19 Lord Rayleigh, Remarks on the Complete Theory of Radiation, Phil. Mag. 49, 539-540 (1900).
J.H. Jeans, On the Partition of Energy between Matter and Æther, Phil. Mag. 10, 91-98, (1905).
20 D.J. Fixsen, E.S. Cheng, J.M. Gales, J.C. Mather, R.A. Shafer and E.L. Wright, The Cosmic Microwave Background Spectrum from the full COBE-FIRAS data set, The Astrophysical Journal, 473, 576-587 (1996).

Figure 32: The spectrum $I(\omega, T)$ of electromagnetic radiation emitted from a cavity held at a temperature $T$.
Mathematical Appendix: Evaluation of the Riemann zeta function.

Consider the function
$$
f(k) = \int_0^{\infty} dx \; \frac{\sin( k x )}{\exp[ x ] - 1} \qquad (646)
$$

where the integrand is seen to be finite at x = 0. Its Taylor expansion can be


written as
$$
f(k) = \sum_{n=0}^{\infty} ( - 1 )^n \, \frac{k^{2n+1}}{( 2 n + 1 )!} \int_0^{\infty} dx \; \frac{x^{2n+1}}{\exp[ x ] - 1}
= \sum_{n=0}^{\infty} ( - 1 )^n \, k^{2n+1} \; \zeta( 2 n + 2 ) \qquad (647)
$$

so $f(k)$ can be regarded as the generating function for the Riemann zeta functions. The value of the coefficient of $k^{2n+1}$ is simply related to the Riemann zeta function $\zeta( 2 n + 2 )$. As $\sin( k x )$ is the imaginary part of $\exp[ i k x ]$, one can re-write the integral as the imaginary part of a related complex function
$$
f(k) = \lim_{\eta \rightarrow 0} \Im m \int_{\eta}^{\infty} dx \; \frac{\exp[ i k x ]}{\exp[ x ] - 1} \qquad (648)
$$

where, due to the finite value of $\eta$, the integration avoids the pole at the origin. The real function $f(k)$ can be evaluated by considering an integral of the related complex function over a contour $C$
$$
\oint_C dz \; \frac{\exp[ i k z ]}{\exp[ z ] - 1} \qquad (649)
$$
The integrand has simple poles at the points $2 \pi n i$ with integer $n$ on the imaginary axis, and has residues $\exp[ - 2 \pi k n ]$ at these points.

Figure 33: The contour of integration which avoids the poles on the imaginary axis.

The contour $C$ runs from $\eta$ to $R$ along the real axis, then to $R + 2 \pi i$ parallel to the imaginary axis, then the contour runs back parallel to the real axis to $\eta + 2 \pi i$. Then, to avoid the pole at $2 \pi i$, the contour follows a clockwise quarter-circle of radius $\eta$ centered on $2 \pi i$ from $\eta + 2 \pi i$ to the point $- i \eta + 2 \pi i$. The contour then runs down the imaginary axis from $- i \eta + 2 \pi i$ to $i \eta$, and finally returns to $\eta$, by following a quarter circle of radius $\eta$ centered on zero, thereby avoiding the pole at zero. The integral will be evaluated in the limit where $R \rightarrow \infty$ and $\eta \rightarrow 0$.
Since there are no poles enclosed by the integration contour, Cauchy's theorem yields
$$
\oint_C dz \; \frac{\exp[ i k z ]}{\exp[ z ] - 1} = 0 \qquad (650)
$$
In the limit $R \rightarrow \infty$, the contribution from the segment from $R$ to $R + 2 \pi i$ tends to zero, as the integrand vanishes due to the denominator. The integrations over the segments parallel to the real axis from $\eta$ to $R$ and from $R + 2 \pi i$ to $\eta + 2 \pi i$ can be combined to yield
$$
\left( 1 - \exp[ - 2 \pi k ] \right) \int_{\eta}^{\infty} dx \; \frac{\exp[ i k x ]}{\exp[ x ] - 1} \qquad (651)
$$
which has an imaginary part that is related to $f(k)$. The integrations over the quarter circles about the simple poles are both clockwise and are given by $- i \frac{\pi}{2}$ times the residues at the poles, and can be combined to yield
$$
- i \, \frac{\pi}{2} \left( 1 + \exp[ - 2 \pi k ] \right) \qquad (652)
$$
The remaining contribution runs from $- i \eta + 2 \pi i$ to $i \eta$
$$
\int_{2\pi - \eta}^{\eta} i \, dy \; \frac{\exp[ - k y ]}{\exp[ i y ] - 1}
= - \int_{\eta}^{2\pi - \eta} dy \; \frac{\exp[ - ( 2 k + i ) \frac{y}{2} ]}{2 \sin \frac{y}{2}}
= - \frac{1}{2} \int_{\eta}^{2\pi - \eta} dy \; \exp[ - k y ] \left( \cot \frac{y}{2} - i \right) \qquad (653)
$$
which, in the limit $\eta \rightarrow 0$, has an imaginary part that is given by
$$
\frac{1}{2} \int_0^{2\pi} dy \; \exp[ - k y ] = \frac{1}{2 k} \left( 1 - \exp[ - 2 \pi k ] \right) \qquad (654)
$$
If one now takes the imaginary part of the entire integral of eqn (650) and takes the limit $\eta \rightarrow 0$, one obtains
$$
\left( 1 - \exp[ - 2 \pi k ] \right) f(k) \; - \; \frac{\pi}{2} \left( 1 + \exp[ - 2 \pi k ] \right) \; + \; \frac{1}{2 k} \left( 1 - \exp[ - 2 \pi k ] \right) = 0 \qquad (655)
$$
On rearranging the equation, $f(k)$ is found to be given by
$$
f(k) = \frac{\pi}{2} \, \coth \pi k \; - \; \frac{1}{2 k} \qquad (656)
$$
Since the series expansion of $\coth \pi k$ is given by

$$
\coth \pi k = \frac{1}{\pi k} + \frac{1}{3} \, ( \pi k ) - \frac{1}{45} \, ( \pi k )^3 + \frac{2}{945} \, ( \pi k )^5 + \ldots \qquad (657)
$$

then
$$
f(k) = \frac{\pi^2}{6} \, k - \frac{\pi^4}{90} \, k^3 + \frac{\pi^6}{945} \, k^5 + \ldots \qquad (658)
$$
so the values of the Riemann zeta function are given by

$$
\zeta(2) = \frac{\pi^2}{6} \; , \qquad \zeta(4) = \frac{\pi^4}{90} \; , \qquad \zeta(6) = \frac{\pi^6}{945} \qquad (659)
$$
etc.
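The closed forms in eqn (659) are easy to confirm by direct summation of the defining series $\zeta(s) = \sum_{n \geq 1} n^{-s}$ (a numerical check, not part of the contour-integral derivation):

```python
import math

def zeta(s, terms=500000):
    # brute-force partial sum of the Riemann zeta function
    return sum(n**(-s) for n in range(1, terms + 1))

print(zeta(2), math.pi**2 / 6)      # ~1.644934
print(zeta(4), math.pi**4 / 90)     # ~1.082323
print(zeta(6), math.pi**6 / 945)    # ~1.017343
```

The slow $1/N$ convergence of the $s = 2$ tail is why a generous number of terms is used.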

5.12 Energy Fluctuations

The Canonical Distribution Function can be used to calculate the entire distributions of most physical quantities of the system. However, for most applications it is sufficient to consider the average values $\overline{A}$ and the moments of the fluctuations $\overline{\Delta A^n}$, where the fluctuation is defined as
$$
\Delta A = A - \overline{A} \qquad (660)
$$
The average fluctuation $\overline{\Delta A}$ is identically zero since
$$
\overline{\Delta A} = \overline{A} - \overline{A} = 0 \qquad (661)
$$
However, the mean squared fluctuation is given by
$$
\overline{\Delta A^2} = \overline{( A - \overline{A} )^2} = \overline{A^2} - \overline{A}^2 \qquad (662)
$$
which can be non-zero.


In the Canonical Ensemble, the energy is no longer fixed. However, the average value of the energy $\overline{E}$ was found to be equal to the thermodynamic value $U$, as
$$
\overline{E} = \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \; H \, \exp\left[ - \beta H \right]
= - \frac{1}{Z} \frac{\partial}{\partial \beta} \int \frac{d\Gamma}{\Gamma_0} \; \exp\left[ - \beta H \right]
= - \frac{\partial \ln Z}{\partial \beta}
= \frac{\partial}{\partial \beta} \left( \beta F \right)
= U \qquad (663)
$$
The mean squared fluctuation in the energy can be expressed as
$$
\overline{\Delta E^2} = \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \; H^2 \, \exp\left[ - \beta H \right] \; - \; \left( \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \; H \, \exp\left[ - \beta H \right] \right)^2
$$
$$
= \frac{1}{Z} \frac{\partial^2 Z}{\partial \beta^2} - \frac{1}{Z^2} \left( \frac{\partial Z}{\partial \beta} \right)^2
= \frac{\partial}{\partial \beta} \left( \frac{1}{Z} \frac{\partial Z}{\partial \beta} \right)
= \frac{\partial^2 \ln Z}{\partial \beta^2} \qquad (664)
$$

It should be noted that the mean squared energy fluctuation can also be expressed as a derivative of the average energy w.r.t. $\beta$
$$
\overline{\Delta E^2} = - \left( \frac{\partial \overline{E}}{\partial \beta} \right) \qquad (665)
$$
Therefore, on expressing this as a derivative w.r.t. $T$, we find that
$$
\overline{\Delta E^2} = k_B T^2 \left( \frac{\partial \overline{E}}{\partial T} \right)_{V,N} \qquad (666)
$$
Hence, the mean squared energy fluctuation can be expressed in terms of the specific heat at constant volume
$$
\overline{\Delta E^2} = k_B T^2 \; C_{V,N} \qquad (667)
$$

From this we deduce that the relative magnitude of the energy fluctuations, given by the dimensionless quantity
$$
\frac{\overline{\Delta E^2}}{\overline{E}^2} \qquad (668)
$$
is of the order of $1/N$ since
$$
\frac{\overline{\Delta E^2}}{\overline{E}^2} = \frac{k_B T^2 \, C_{V,N}}{\overline{E}^2} \sim \frac{1}{N} \qquad (669)
$$
where the similarity follows since $C_V$ is extensive and proportional to $N$, as is $\overline{E}$. Thus, on taking the square root, one sees that the relative magnitude of the root mean squared (rms) fluctuation in the energy vanishes in the thermodynamic limit, since
$$
\frac{\Delta E_{rms}}{\overline{E}} = \frac{\sqrt{ k_B T^2 \, C_{V,N} }}{\overline{E}} \sim \frac{1}{\sqrt{N}} \qquad (670)
$$
Therefore, the relative fluctuations of the energy are negligible in the thermodynamic limit. This suggests the reason why quantities calculated with the Canonical Distribution Function agree with those found with the Micro-Canonical Distribution Function.
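The fluctuation relation (667) can be verified explicitly for a simple model (a sketch using a hypothetical two-level system with splitting $\epsilon$, not an example from the text): compute $\overline{\Delta E^2}$ directly from the Boltzmann weights and compare it with $k_B T^2 C_V$, where $C_V$ is obtained by numerically differentiating $\overline{E}$.

```python
import math

def averages(beta, eps=1.0):
    # two-level system with energies 0 and eps: returns <E> and <E^2>
    z = 1.0 + math.exp(-beta * eps)
    e_avg = eps * math.exp(-beta * eps) / z
    e2_avg = eps**2 * math.exp(-beta * eps) / z
    return e_avg, e2_avg

k_B, T, eps = 1.0, 0.7, 1.0           # units in which k_B = 1
beta = 1.0 / (k_B * T)

e_avg, e2_avg = averages(beta, eps)
var_direct = e2_avg - e_avg**2        # <E^2> - <E>^2, eqn (662)

# C_V = d<E>/dT by a symmetric finite difference
dT = 1e-6
ep, _ = averages(1.0 / (k_B * (T + dT)), eps)
em, _ = averages(1.0 / (k_B * (T - dT)), eps)
C_V = (ep - em) / (2.0 * dT)

var_thermo = k_B * T**2 * C_V         # eqn (667)
print(var_direct, var_thermo)
```

Both routes give the same number, as eqn (667) requires.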
The underlying reason for the Canonical and Micro-Canonical Ensembles yielding identical results is that the energy probability distribution function is sharply peaked. The probability density that the system has energy $E$ is given by $P(E)$, where
$$
P(E) = \int d\Gamma \; \delta\left( E - H(\{p_i, q_i\}) \right) \; \rho_c(\{p_i, q_i\})
= \frac{1}{Z} \int \frac{d\Gamma}{\Gamma_0} \; \delta\left( E - H(\{p_i, q_i\}) \right) \; \exp\left[ - \beta H(\{p_i, q_i\}) \right]
= \frac{1}{Z} \, \frac{\Omega(E)}{\Gamma_0} \; \exp\left[ - \beta E \right] \qquad (671)
$$

which is the product of an exponentially decreasing function of energy $\exp[ - \beta E ]$ and $\Omega(E)$, the volume of accessible phase space for a system in the Micro-Canonical Ensemble with energy $E$.

Figure 34: The energy distribution function $P(E)$ in the Canonical Ensemble (shown in blue). The distribution is sharply peaked since it is the product of an exponentially decreasing factor $\exp[ - \beta E ]$ (red) and a rapidly increasing function $\Omega(E)$ (green).

We recognize that $\Omega(E)$ is an extremely rapidly increasing function of $E$, since for a typical system $\Omega(E) \sim E^{\alpha N}$ where $\alpha$ is a number of the order of unity. The most probable value of energy $E_{max}$ can be determined from the condition for the maximum of the energy distribution function
$$
\left. \frac{dP(E)}{dE} \right|_{E_{max}} = 0 \qquad (672)
$$
which leads to
$$
\left. \frac{d}{dE} \left( \Omega(E) \; \exp\left[ - \beta E \right] \right) \right|_{E_{max}} = 0 \qquad (673)
$$
On representing $\Omega(E)$ in terms of the entropy $S(E)$, one finds that the most probable value of the energy is given by the solution for $E_{max}$ of the equation
$$
\left. \frac{d}{dE} \left( \exp\left[ - \beta E + S(E)/k_B \right] \right) \right|_{E_{max}} = 0 \qquad (674)
$$
or, after some simplification,
$$
\left. - \frac{1}{T} + \frac{\partial S(E)}{\partial E} \right|_{E_{max}} = 0 \qquad (675)
$$

This equation is satisfied if the temperature $T$ of the thermal reservoir is equal to the temperature of the system. This condition certainly holds true in thermal equilibrium, in which case
$$
E_{max} = U \qquad (676)
$$
where $U$ is the thermodynamic energy. Thus, we find that the most probable value of the energy $E_{max}$ is equal to $U$, the thermodynamic energy. From our previous consideration, we infer that the most probable value of the energy is also equal to the average value of the energy $\overline{E}$,
$$
E_{max} = \overline{E} \qquad (677)
$$
Thus, the probability distribution function is sharply peaked at the average energy. The energy probability distribution function $P(E)$ can be approximated by a Gaussian expression, centered on $U$. This follows by Taylor expanding the exponent of $P(E)$ in powers of $( E - U )$
$$
P(E) = \frac{1}{Z} \, \exp\left[ - \beta F + \frac{1}{2 k_B} \left( \frac{d^2 S}{dE^2} \right)_U ( E - U )^2 + \ldots \right] \qquad (678)
$$
or, on cancelling the factor of $Z$ with $\exp[ - \beta F ]$, one finds
$$
P(E) = \exp\left[ \frac{1}{2 k_B} \left( \frac{d^2 S}{dE^2} \right)_U ( E - U )^2 + \ldots \right] \qquad (679)
$$

The energy width of the approximate Gaussian distribution is governed by the quantity
$$
- \left( \frac{1}{k_B} \left( \frac{d^2 S}{dE^2} \right)_U \right)^{-1}
= - \left( \frac{1}{k_B} \frac{\partial}{\partial U} \left( \frac{1}{T} \right)_{V,N} \right)^{-1}
= \left( \frac{1}{k_B T^2} \left( \frac{\partial T}{\partial U} \right)_{V,N} \right)^{-1}
= k_B T^2 \, C_V \qquad (680)
$$
Hence, the mean square fluctuations in the energy are given by
$$
\overline{\Delta E^2} = k_B T^2 \, C_V \qquad (681)
$$
in accordance with our previous calculation. We note that in the thermodynamic limit $N \rightarrow \infty$, the energy distribution is so sharply peaked that the energy fluctuations usually can be ignored.
Homework:
Show that for an ideal gas, the energy fluctuations in the Canonical Ensemble are such that
$$
\sqrt{ \frac{\overline{\Delta E^2}}{\overline{E}^2} } = \sqrt{ \frac{2}{3 N} } \qquad (682)
$$
Hence, the relative energy fluctuations vanish in the thermodynamic limit $N \rightarrow \infty$.
Homework:
Calculate $\overline{\Delta E^3}$ for a system in the Canonical Ensemble. Hence, show that
$$
\overline{\Delta E^3} = k_B^2 \left( T^4 \, \frac{\partial C_V}{\partial T} + 2 \, T^3 \, C_V \right) \qquad (683)
$$
and evaluate this for an ideal gas.
Homework:
Prove that
$$
\overline{\Delta E^n} = ( - 1 )^n \, \frac{\partial^n \ln Z}{\partial \beta^n} \qquad (684)
$$
generally holds for the Canonical Ensemble. Hence deduce that the higher-order moments of the energy fluctuations are all proportional to $N$.

5.13 The Boltzmann Distribution from Entropy Maximization

The general expression for entropy in terms of the probability distribution function $\rho_c(\{p_i, q_i\})$ is given by the integral over phase space
$$
S = - k_B \int d\Gamma \; \rho_c(\{p_i, q_i\}) \; \ln\left( \rho_c(\{p_i, q_i\}) \, \Gamma_0 \right) \qquad (685)
$$
This is trivially true in the Micro-Canonical Ensemble and is also true in the Canonical Ensemble where
$$
\Gamma_0 \, \rho_c(\{p_i, q_i\}) = \frac{1}{Z} \, \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (686)
$$
This can be seen by substituting the equation for $\rho_c(\{p_i, q_i\})$ in the expression for $S$, which leads to
$$
S = k_B \int d\Gamma \; \rho_c(\{p_i, q_i\}) \left( \beta H(\{p_i, q_i\}) + \ln Z \right) \qquad (687)
$$
However, we know that
$$
\ln Z = - \beta F \qquad (688)
$$
and the distribution function is normalized
$$
\int d\Gamma \; \rho_c(\{p_i, q_i\}) = 1 \qquad (689)
$$

so on multiplying by $T$ we find
$$
T S = \int d\Gamma \; \rho_c(\{p_i, q_i\}) \; H(\{p_i, q_i\}) \; - \; F \qquad (690)
$$
Since the average energy $\overline{E}$ is defined as
$$
\overline{E} = \int d\Gamma \; \rho_c(\{p_i, q_i\}) \; H(\{p_i, q_i\}) \qquad (691)
$$
then we end up with an expression for the Helmholtz Free-Energy
$$
F = \overline{E} - T S \qquad (692)
$$
Finally, since $\overline{E} = U$ (the thermodynamic energy), we have shown that
$$
S = - k_B \int d\Gamma \; \rho_c(\{p_i, q_i\}) \; \ln\left( \rho_c(\{p_i, q_i\}) \, \Gamma_0 \right) \qquad (693)
$$

which we shall regard as the fundamental form of S for any distribution function.
Derivation

Given the above form of $S$, one can derive the Canonical Distribution Function as the distribution function which maximizes the functional $S[\rho]$, subject to the requirements that the average energy is $U$ and that the distribution function is normalized. That is, $\rho_c$ must maximize the functional $S[\rho]$ subject to the constraints that
$$
1 = \int d\Gamma \; \rho(\{p_i, q_i\})
$$
$$
U = \int d\Gamma \; \rho(\{p_i, q_i\}) \; H(\{p_i, q_i\}) \qquad (694)
$$
The maximization of $S$ subject to the constraints is performed by using Lagrange's method of undetermined multipliers. In this method, one forms the functional $\Phi[\rho]$ defined by
$$
\Phi[\rho] = - k_B \int d\Gamma \; \rho(\{p_i, q_i\}) \; \ln\left( \rho(\{p_i, q_i\}) \, \Gamma_0 \right)
+ \lambda \left( 1 - \int d\Gamma \; \rho(\{p_i, q_i\}) \right)
+ \gamma \left( U - \int d\Gamma \; \rho(\{p_i, q_i\}) \; H(\{p_i, q_i\}) \right) \qquad (695)
$$
where $\lambda$ and $\gamma$ are undetermined numbers. If $\rho$ satisfies the two constraints, then $\Phi[\rho]$ is equal to $S[\rho]$, and then maximizing $\Phi$ is equivalent to maximizing $S$.

If $\Phi[\rho]$ is to be maximized by $\rho_c(\{p_i, q_i\})$, then a small change in $\rho(\{p_i, q_i\})$ by $\delta\rho(\{p_i, q_i\})$ should not change $\Phi$. That is, if we set
$$
\rho(\{p_i, q_i\}) = \rho_c(\{p_i, q_i\}) + \delta\rho(\{p_i, q_i\}) \qquad (696)
$$
where $\delta\rho$ is an arbitrary deviation, then the first-order change in $\Phi$, $\delta\Phi^{(1)}$, defined by
$$
\Phi[\rho_c + \delta\rho] - \Phi[\rho_c] = \delta\Phi^{(1)} + O(\delta\rho)^2 \qquad (697)
$$
must vanish
$$
\delta\Phi^{(1)} = 0 \qquad (698)
$$
If this condition was not satisfied, then a specific choice for the sign of $\delta\rho$ would cause $\Phi$ to increase further. Thus, the requirement that $\Phi[\rho]$ is maximized by $\rho_c$ leads to the condition
$$
\int d\Gamma \; \delta\rho(\{p_i, q_i\}) \left[ - k_B \ln\left( \rho_c(\{p_i, q_i\}) \, \Gamma_0 \right) - k_B - \lambda - \gamma \, H(\{p_i, q_i\}) \right] = 0 \qquad (699)
$$
This integral must vanish for any choice of $\delta\rho$. This can be achieved by requiring that the quantity inside the square brackets vanishes at every point in phase space. That is
$$
- k_B \ln\left( \rho_c(\{p_i, q_i\}) \, \Gamma_0 \right) = k_B + \lambda + \gamma \, H(\{p_i, q_i\}) \qquad (700)
$$
where $\lambda$ and $\gamma$ are undetermined constants. Hence, on exponentiating, we have
$$
\rho_c(\{p_i, q_i\}) \, \Gamma_0 = \exp\left[ - 1 - \lambda/k_B \right] \; \exp\left[ - \gamma \, H(\{p_i, q_i\})/k_B \right] \qquad (701)
$$
The constants $\lambda$ and $\gamma$ are determined by ensuring that the two constraints are satisfied. The conditions are satisfied by requiring that
$$
1 = \exp\left[ - 1 - \lambda/k_B \right] \int \frac{d\Gamma}{\Gamma_0} \; \exp\left[ - \gamma \, H(\{p_i, q_i\})/k_B \right]
$$
$$
U = \exp\left[ - 1 - \lambda/k_B \right] \int \frac{d\Gamma}{\Gamma_0} \; H(\{p_i, q_i\}) \; \exp\left[ - \gamma \, H(\{p_i, q_i\})/k_B \right] \qquad (702)
$$
which then has the effect that $\rho_c$ maximizes $S$. The two constraints suggest that one should rewrite the parameters as
$$
\gamma = \frac{1}{T} \qquad (703)
$$
and
$$
Z = \exp\left[ 1 + \lambda/k_B \right] \qquad (704)
$$
In fact, if the form of $\rho_c$ is substituted back into $S$, and one constraint is used to express $S$ in terms of $U$ and the second constraint to produce a constant term (independent of $U$), then if one demands that
$$
\left( \frac{\partial S}{\partial U} \right)_V = \frac{1}{T} \qquad (705)
$$
then one finds $\gamma = \frac{1}{T}$. Thus, the distribution that maximizes $S[\rho]$ is recognized as being the Boltzmann Distribution Function
$$
\rho_c(\{p_i, q_i\}) \, \Gamma_0 = \frac{1}{Z} \, \exp\left[ - \beta H(\{p_i, q_i\}) \right] \qquad (706)
$$
In summary, we have shown that the Boltzmann Distribution Function maximizes $S[\rho]$ subject to the two constraints
$$
1 = \int d\Gamma \; \rho_c(\{p_i, q_i\})
$$
$$
U = \int d\Gamma \; \rho_c(\{p_i, q_i\}) \; H(\{p_i, q_i\}) \qquad (707)
$$
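For a discrete toy spectrum (an illustration of the result above with hypothetical energy levels, not an example from the text), one can check that a Boltzmann distribution matches any prescribed mean energy $U$: since $\overline{E}(\beta)$ decreases monotonically with $\beta$, the multiplier can be found by bisection, and the constraint is then satisfied exactly.

```python
import math

levels = [0.0, 1.0, 2.0, 3.5]   # hypothetical energy levels
U_target = 1.2                  # prescribed average energy

def mean_energy(beta):
    # average energy in the Boltzmann distribution at inverse temperature beta
    w = [math.exp(-beta * e) for e in levels]
    z = sum(w)
    return sum(e * wi for e, wi in zip(levels, w)) / z

# <E> decreases monotonically with beta, so bisection works
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > U_target:
        lo = mid     # need a larger beta to lower <E>
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(beta, mean_energy(beta))
```

Here $U$ lies below the infinite-temperature average $1.625$, so the solution has $\beta > 0$.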

5.14 The Gibbs Ensemble

The Gibbs Ensemble corresponds to the situation where a closed system is partitioned into two parts, a smaller part which is our system of interest and the other part which comprises its environment. The system and its environment are allowed to exchange energy, and also the partition that separates the system from its environment is free to move. Therefore, the volume of the subsystem can be interchanged with the environment. The total energy $E_T$ is partitioned as
$$
E_T = E + E_R \qquad (708)
$$
and the volume is partitioned as
$$
V_T = V + V_R \qquad (709)
$$
The probability that the partition, considered by itself, will be found such that the volume of the system is in a range $dV$ around $V$ is assumed to be given by the ratio $dV/V_T$. The probability $dp$ that the closed system (including the partition) is in the joint volume element $d\Gamma_T$ and $dV$ is
$$
dp = \frac{1}{V_T} \; \rho_{mc} \; d\Gamma_T \; dV \qquad (710)
$$

On factorizing the infinitesimal volume element of total phase space $d\Gamma_T$ into contributions from the reservoir $d\Gamma_R$ and the system $d\Gamma$, one has
$$
dp = \frac{1}{V_T} \; \rho_{mc} \; d\Gamma_R \; d\Gamma \; dV \qquad (711)
$$

We are assuming that the phase space $d\Gamma$ is consistent with the position of the partition defining the volume $V$ and also that the system has energy $E$. The probability $dp$ that the system is in the volume element $d\Gamma$, irrespective of the microstates of the reservoir, is obtained by integrating over all of the reservoir's accessible phase space, consistent with the energy $E_T - E$ and volume $V_T - V$. The result is
$$
dp = \frac{1}{V_T} \; \rho_{mc} \; \Omega_R( E_T - E , V_T - V ) \; d\Gamma \; dV \qquad (712)
$$
The Gibbs Probability Distribution Function $\rho_G$ is defined via
$$
dp = \left( \frac{dp}{d\Gamma \, dV} \right) d\Gamma \; dV = \rho_G \; d\Gamma \; dV \qquad (713)
$$
and is found as
$$
\rho_G = \frac{1}{V_T} \; \rho_{mc} \; \Gamma_{R,0} \; \exp\left[ S_R( E_T - E , V_T - V )/k_B \right] \qquad (714)
$$

However, one also has
$$
\rho_{mc} = \frac{1}{\Gamma_{T,0}} \; \exp\left[ - S_T( E_T , V_T )/k_B \right]
= \frac{1}{\Gamma_{T,0}} \; \exp\left[ - \left( S_R( E_T - U , V_T - \overline{V} ) + S( U , \overline{V} ) \right)/k_B \right] \qquad (715)
$$
The phase space volumes representing single microscopic states of the total system, reservoir and subsystem are assumed to satisfy the relation
$$
\Gamma_{T,0} = \Gamma_{R,0} \; \Gamma_0 \qquad (716)
$$
Hence, we can express the Gibbs Probability Distribution Function as
$$
\rho_G \, \Gamma_0 = \frac{1}{V_T} \; \exp\left[ \left( S_R( E_T - E , V_T - V ) - S_R( E_T - U , V_T - \overline{V} ) - S( U , \overline{V} ) \right)/k_B \right] \qquad (717)
$$
The exponent can be expanded in powers of the energy and volume fluctuations
$$
S_R( E_T - E , V_T - V ) = S_R( E_T - U , V_T - \overline{V} )
+ ( U - E ) \left( \frac{\partial S_R( E_R , V_T - \overline{V} )}{\partial E_R} \right)_{E_T - U}
+ ( \overline{V} - V ) \left( \frac{\partial S_R( E_T - U , V_R )}{\partial V_R} \right)_{V_T - \overline{V}} + \ldots
$$
$$
= S_R( E_T - U , V_T - \overline{V} ) + \frac{( U - E )}{T} + \frac{P \, ( \overline{V} - V )}{T} + \ldots \qquad (718)
$$
On substituting this in the expression for $\rho_G$, one finds that
$$
\rho_G \, \Gamma_0 = \frac{1}{V_T} \; \exp\left[ \beta G \right] \; \exp\left[ - \beta ( E + P V ) \right] \qquad (719)
$$

where $G$ is the Gibbs Free-Energy $G(T, P, N)$ of the system
$$
G = U - T S + P \overline{V} \qquad (720)
$$
The only quantities in the Gibbs Distribution Function pertaining to the reservoir are its temperature and pressure. On introducing the Gibbs Partition Function $Y$ via
$$
Y = \exp\left[ - \beta G \right] \qquad (721)
$$
one can express the Gibbs Probability Distribution Function as
$$
\rho_G \, \Gamma_0 = \frac{1}{V_T} \; \frac{1}{Y} \; \exp\left[ - \beta P V \right] \; \exp\left[ - \beta H \right] \qquad (722)
$$
The Gibbs Partition Function

The normalization condition for the probability distribution function is given by
$$
1 = \int_0^{V_T} dV \int d\Gamma \; \rho_G
= \int_0^{V_T} \frac{dV}{V_T} \; \frac{1}{Y} \; \exp\left[ - \beta P V \right] \int \frac{d\Gamma}{\Gamma_0} \; \exp\left[ - \beta H \right]
= \int_0^{V_T} \frac{dV}{V_T} \; \frac{1}{Y} \; \exp\left[ - \beta P V \right] \; Z(V) \qquad (723)
$$
where $Z(V)$ is the Canonical Partition Function. Hence, one finds that the Gibbs Partition Function $Y$ is determined from
$$
Y = \frac{1}{V_T} \int_0^{V_T} dV \; \exp\left[ - \beta P V \right] \; Z(V) \qquad (724)
$$
which only involves quantities describing the system. Since the Canonical Partition Function is a function of the variables $(T, V, N)$, the Gibbs Partition Function is a function of $(T, P, N)$. Once $Y$ has been determined from the above equation, thermodynamic quantities can be evaluated from the Gibbs Free-Energy $G(T, P, N)$, which is expressed in terms of $Y$ as
$$
G = - k_B T \ln Y \qquad (725)
$$

============================================
Example: The Ideal Gas


The Gibbs Partition Function for the ideal gas is given by
$$
Y(T, P, N) = \int_0^{V_T} \frac{dV}{V_T} \; \exp\left[ - \beta P V \right] \; \frac{1}{N!} \prod_{i=1}^{3N} \left( \int \int \frac{dp_i \, dq_i}{2 \pi \hbar} \right) \exp\left[ - \beta \sum_{i=1}^{3N} \frac{p_i^2}{2m} \right] \qquad (726)
$$
which, on integrating over the particles' coordinates and momenta, becomes
$$
Y(T, P, N) = \int_0^{V_T} \frac{dV}{V_T} \; \exp\left[ - \beta P V \right] \; \frac{V^N}{N!} \left( \frac{\sqrt{2 \pi m k_B T}}{2 \pi \hbar} \right)^{3N} \qquad (727)
$$
The integral over $V$ is easily evaluated by changing variable to
$$
x = \beta P V \qquad (728)
$$
and yields
$$
Y(T, P, N) = \frac{1}{\beta P V_T} \left( \frac{1}{\beta P} \right)^N \left( \frac{\sqrt{2 \pi m k_B T}}{2 \pi \hbar} \right)^{3N} \qquad (729)
$$
where the factor of $N!$ has cancelled.
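The $V$-integration step leading to eqn (729) is just a Gamma-function integral, $\int_0^\infty dV \, e^{-\beta P V} \, V^N / N! = ( \beta P )^{-(N+1)}$; a quick numerical check (with arbitrary illustrative values of $N$ and $\beta P$, and the upper limit extended to infinity as in the text):

```python
import math

N = 5
betaP = 2.0

def integrand(V):
    # e^{-beta P V} V^N / N!
    return math.exp(-betaP * V) * V**N / math.factorial(N)

# composite trapezoid rule; the integrand is negligible beyond V ~ 30/betaP
n, a, b = 200000, 0.0, 30.0
h = (b - a) / n
numeric = h * (0.5 * (integrand(a) + integrand(b))
               + sum(integrand(a + i * h) for i in range(1, n)))

closed = betaP**(-(N + 1))   # the (beta P)^{-(N+1)} factor of eqn (729)
print(numeric, closed)
```

The cancellation of $N!$ is exactly the $\Gamma( N + 1 ) = N!$ produced by the integral.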


On ignoring the non-extensive contributions, the Gibbs Free-Energy is given by
$$
G = - k_B T \ln Y
= N k_B T \left( \ln P - \frac{5}{2} \ln( k_B T ) + \frac{3}{2} \ln\left( \frac{2 \pi \hbar^2}{m} \right) \right) \qquad (730)
$$

Since the infinitesimal variation in $G$ is given by
$$
dG = - S \, dT + \overline{V} \, dP + \mu \, dN \qquad (731)
$$
one finds that the average volume is given by
$$
\overline{V} = \left( \frac{\partial G}{\partial P} \right)_T = \frac{N k_B T}{P} \qquad (732)
$$
which is the ideal gas law.


The value of the enthalpy $\overline{H} = \overline{E} + P \overline{V}$ can be calculated directly from its average in the Gibbs Distribution
$$
\overline{H} = \overline{ E + P V }
= \frac{1}{Y} \int_0^{V_T} \frac{dV}{V_T} \int \frac{d\Gamma}{\Gamma_0} \; \left( H(\{p_i, q_i\}) + P V \right) \; \exp\left[ - \beta \left( H(\{p_i, q_i\}) + P V \right) \right] \qquad (733)
$$
which can simply be expressed as a derivative w.r.t. $\beta$ at constant $P$
$$
\overline{H} = - \frac{1}{Y} \left( \frac{\partial Y}{\partial \beta} \right)_P = - \left( \frac{\partial \ln Y}{\partial \beta} \right)_P \qquad (734)
$$

Thus, we find that the enthalpy is given by
$$
\overline{H} = \frac{5}{2} \, N k_B T \qquad (735)
$$
from which the specific heat at constant pressure $C_P$ is found as
$$
C_P = \frac{5}{2} \, N k_B \qquad (736)
$$
as is expected for the ideal gas.


============================================

5.15 A Flexible Polymer

Figure 35: Two successive links of a polymer. The orientation of the $i$-th monomer is defined by the polar coordinates $(\theta_i, \phi_i)$, in which the polar axis is defined by the displacement vector of the whole polymer which runs from one end to the other.
Consider a polymer made of a large number $N$ of monomers of length $a$. The length of the polymer is variable since, although the monomers are joined end to end, the joints are assumed to be freely flexible. That is, the monomers are joined in a way that allows free rotation at the ends. The length $L$ of the polymer is defined by its end-to-end distance, and this definition of the length also defines a preferred (polar) axis which has the direction of the vector joining the ends. The orientational degrees of freedom of the $i$-th monomer are given by the polar coordinates $(\theta_i, \phi_i)$. Hence, the length of the polymer is given by
$$
L = \sum_{i=1}^{N} a \cos \theta_i \qquad (737)
$$

Although there is only one polymer, the fact that it is composed of a very large number of monomers allows one to consider it as being in the thermodynamic limit and to use statistical mechanics effectively. The Hamiltonian is set equal to zero, since we are assuming that the monomers are freely jointed and have negligible masses.

Figure 36: A polymer chain consisting of $N$ links and length $L$. The polar axis is defined by the orientation of the displacement vector defining the length of the polymer.

The partition function $Z(L)$ for the polymer of length $L$ is given by

$$
Z(L) = \prod_{i=1}^{N} \left( \int_0^{2\pi} d\phi_i \int_0^{\pi} \sin \theta_i \, d\theta_i \right) \qquad (738)
$$
where the integrations are restricted by the constraint
$$
L = \sum_{i=1}^{N} a \cos \theta_i \qquad (739)
$$
In this case $Z$ coincides with $\Omega$ since $H = 0$.


If a tension $\mathcal{T}$ is applied to the polymer, it is described by a Gibbs Distribution Function $Y(T, \mathcal{T})$, defined by
$$
Y = \int_0^{N a} dL \; \exp\left[ \beta \mathcal{T} L \right] \prod_{i=1}^{N} \left( \int_0^{2\pi} d\phi_i \int_0^{\pi} \sin \theta_i \, d\theta_i \right) \; \delta\left( L - \sum_{i=1}^{N} a \cos \theta_i \right) \qquad (740)
$$
where the delta function has absorbed the factor of the normalization of the length probability density. The use of the Gibbs Distribution is justified since $\mathcal{T}$ and $L$ are analogous to $P$ and $V$. On performing the integral over $L$, one obtains
$$
Y = \prod_{i=1}^{N} \left( \int_0^{2\pi} d\phi_i \int_0^{\pi} \sin \theta_i \, d\theta_i \right) \; \exp\left[ \beta \mathcal{T} a \sum_{i=1}^{N} \cos \theta_i \right] \qquad (741)
$$

which is no longer subject to a constraint. The constraint on the length has been replaced by a non-uniform weighting function. The Gibbs Partition Function $Y$ can be evaluated as
$$
Y = \prod_{i=1}^{N} \left( \int_0^{2\pi} d\phi_i \int_0^{\pi} \sin \theta_i \, d\theta_i \; \exp\left[ \beta \mathcal{T} a \cos \theta_i \right] \right)
= \prod_{i=1}^{N} \left( 2 \pi \int_{-1}^{+1} d\cos \theta_i \; \exp\left[ \beta \mathcal{T} a \cos \theta_i \right] \right)
$$
$$
= \left( 2 \pi \; \frac{\exp[ \beta \mathcal{T} a ] - \exp[ - \beta \mathcal{T} a ]}{\beta \mathcal{T} a} \right)^N
= \left( 4 \pi \; \frac{\sinh[ \beta \mathcal{T} a ]}{\beta \mathcal{T} a} \right)^N \qquad (742)
$$

This has the form expected for $N$ non-interacting monomers, where the only quantity which couples the monomers is the tension across the polymer. The reason for the form is recognized most clearly in the limit $\beta \mathcal{T} a \rightarrow 0$, where the monomers are expected to be distributed uniformly over the solid angle $4 \pi$. Since, in this limit,
$$
\lim_{\mathcal{T} \rightarrow 0} \; \frac{\sinh[ \beta \mathcal{T} a ]}{\beta \mathcal{T} a} \rightarrow 1 \qquad (743)
$$
it is seen that $Y$ reduces to the product of the solid angles $4 \pi$ for each monomer.
The Gibbs Free-Energy $G$ is given by
$$
Y = \exp\left[ - \beta G \right] \qquad (744)
$$
where in this case $G$ is given by
$$
G = U - T S - \mathcal{T} \, \overline{L} \qquad (745)
$$
in which $\overline{L}$ is the average length. Since the change in the (thermodynamic) internal energy is given by
$$
dU = T \, dS + \mathcal{T} \, d\overline{L} \qquad (746)
$$
then $G$ is a function of $T$ and $\mathcal{T}$ since
$$
dG = - S \, dT - \overline{L} \, d\mathcal{T} \qquad (747)
$$

Furthermore, the average length can be determined from thermodynamics, as a partial derivative of $G$ with respect to the tension $\mathcal{T}$ at constant $T$
$$
\overline{L} = - \left( \frac{\partial G}{\partial \mathcal{T}} \right)_T \qquad (748)
$$
or
$$
\overline{L} = k_B T \left( \frac{\partial \ln Y}{\partial \mathcal{T}} \right)_T \qquad (749)
$$
which leads to the expression for the average length as a function of tension
$$
\overline{L} = k_B T \, N \; \frac{\partial}{\partial \mathcal{T}} \ln\left( \frac{\sinh[ \beta \mathcal{T} a ]}{\beta \mathcal{T} a} \right) \qquad (750)
$$
This is evaluated as
$$
\overline{L} = N a \left( \coth[ \beta \mathcal{T} a ] - \frac{1}{\beta \mathcal{T} a} \right) \qquad (751)
$$

It is seen that the effect of tension is that of extending the length of the polymer,

Figure 37: The length-tension relation for a polymer.

whereas the temperature acts to contract the polymer. For small values of the ratio of the tension to temperature, $\beta \mathcal{T} a \ll 1$, the relationship becomes approximately linear


$$
\overline{L} \approx N a \left( \frac{1}{3} \, ( \beta \mathcal{T} a ) - \frac{1}{45} \, ( \beta \mathcal{T} a )^3 + \ldots \right) \qquad (752)
$$
Therefore, for small tensions the polymer acts like a rubber band. The polymer will contract as the temperature is increased. However, for large values of the tension, $\beta \mathcal{T} a \gg 1$, the average length saturates at the value $N a$, as the length is given by
$$
\overline{L} \approx N a \left( 1 - \frac{1}{\beta \mathcal{T} a} + \ldots \right) \qquad (753)
$$
with exponentially small corrections. This occurs since, for large ratios of the tension to the temperature, all the segments are aligned parallel.
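The two limits of eqn (751) are easy to verify numerically (a sketch, where $x$ stands for $\beta \mathcal{T} a$):

```python
import math

def langevin(x):
    # L(x) = coth(x) - 1/x, the bracket of eqn (751)
    return 1.0 / math.tanh(x) - 1.0 / x

# small-tension limit, eqn (752): L(x) ~ x/3 - x^3/45
x_small = 0.01
print(langevin(x_small), x_small / 3.0 - x_small**3 / 45.0)

# large-tension limit, eqn (753): L(x) ~ 1 - 1/x
x_large = 50.0
print(langevin(x_large), 1.0 - 1.0 / x_large)
```

In each regime the exact function and the quoted expansion agree to high accuracy.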

147

The Helmholtz Free-Energy F is found from


F = G + T L

(754)

as


N kB
N kB

sinh[ T a ]
T ln 4
T a
T + N a T coth T a

where T should be expressed in terms of L.

148


(755)

6 The Grand-Canonical Ensemble

The Grand-Canonical Ensemble allows one to thermally average over a system which is able to exchange both energy and particles with its environment. The probability distribution function for the Grand-Canonical Ensemble can be derived by considering the closed system consisting of a subsystem and its larger environment. The probability distribution for the total closed system is calculated in the Micro-Canonical Ensemble. It is assumed that the number of particles and the energy can be uniquely partitioned into contributions from either the subsystem or its environment
$$
E_T = E + E_R
$$
$$
N_T = N + N_R \qquad (756)
$$
Likewise, one assumes that for a given value of $N$, the infinitesimal phase space volume element can also be uniquely partitioned into factors representing the system and its environment. In this case, the probability $dp$ for finding the entire closed system in a volume element $d\Gamma_T$ of its phase space
$$
dp = \rho_{mc} \; d\Gamma_T \qquad (757)
$$
can be expressed as
$$
dp = \rho_{mc} \; d\Gamma_N \; d\Gamma_{R, N_T - N} \qquad (758)
$$
where the system's phase space element is composed of the contributions from $N$ particles and has energy $E$, while the reservoir has $N_T - N$ particles and has energy $E_T - E$. Since we are only interested in the probability distribution for finding the system in a volume element $d\Gamma_N$, corresponding to having $N$ particles and energy $E$, and are not interested in the environment, we shall integrate over the phase space available to the environment. This results in the probability for finding the system in a state with $N$ particles and in a volume of phase space $d\Gamma_N$ with energy $E$ being given by
$$
dp = \rho_{mc} \; \Omega_{R, N_T - N}( E_T - E ) \; d\Gamma_N \qquad (759)
$$
where $\Omega_{R, N_T - N}( E_T - E )$ is the volume of accessible phase space for the reservoir, which has $N_T - N$ particles and energy $E_T - E$. The Micro-Canonical Distribution Function $\rho_{mc}$ can be expressed as
$$
\rho_{mc} = \frac{1}{\Omega_{T, N_T}( E_T )} \qquad (760)
$$

where $\Omega_{T, N_T}( E_T )$ is the entire volume of phase space accessible to the closed system with energy $E_T$. Since $\Omega_{T, N_T}( E_T )$ can be expressed in terms of the total entropy of the closed system $S_T( E_T )$, the Micro-Canonical Distribution Function can be expressed as
$$
\rho_{mc} = \frac{1}{\Gamma_{N_T, 0}} \; \exp\left[ - S_T( E_T , N_T )/k_B \right] \qquad (761)
$$

where $\Gamma_{N_T, 0}$ is the volume of phase space which represents one microscopic state of the system with $N_T$ particles. The volume of accessible phase space for the reservoir can also be written in terms of its entropy
$$
\Omega_{R, N_T - N}( E_T - E ) = \Gamma_{N_R, 0} \; \exp\left[ S_R( E_T - E , N_T - N )/k_B \right] \qquad (762)
$$
where $\Gamma_{N_R, 0}$ is the volume of phase space which represents a single microscopic state of the reservoir which contains $N_R$ particles. Hence, the probability $dp$ for finding the $N$ particle system in an infinitesimal volume of phase space $d\Gamma_N$ with energy $E$ is given by
$$
dp_{N,E} = \left( \frac{dp}{d\Gamma_N} \right) d\Gamma_N
= \frac{d\Gamma_N}{\Gamma_{N, 0}} \; \exp\left[ \left( S_R( E_T - E , N_T - N ) - S_T( E_T , N_T ) \right)/k_B \right] \qquad (763)
$$
where we have assumed that
$$
\Gamma_{N_T, 0} = \Gamma_{N_R, 0} \; \Gamma_{N, 0} \qquad (764)
$$
Since the environment has been assumed to be much larger than the system, both $E$ and $N$ are small compared to $E_T$ and $N_T$. Therefore, it is reasonable to assume that the entropy of the reservoir can be Taylor expanded in powers of the fluctuations of $E$ from the thermodynamic value $U$ and the fluctuations of $N$ from its thermodynamic value $\overline{N}$.
$$
S_R( E_T - E , N_T - N ) = S_R( E_T - U , N_T - \overline{N} )
+ ( U - E ) \left( \frac{\partial S_R( E_R , N_T - \overline{N} )}{\partial E_R} \right)_{E_T - U}
+ ( \overline{N} - N ) \left( \frac{\partial S_R( E_T - U , N_R )}{\partial N_R} \right)_{N_T - \overline{N}} + \ldots \qquad (765)
$$

On using the definitions of the thermal reservoir's temperature
$$
\left( \frac{\partial S_R( U_R , \overline{N}_R )}{\partial U_R} \right)_{N_R} = \frac{1}{T} \qquad (766)
$$
and its chemical potential
$$
\left( \frac{\partial S_R( U_R , \overline{N}_R )}{\partial N_R} \right)_{U_R} = - \frac{\mu}{T} \qquad (767)
$$
the expansion becomes
$$
S_R( E_T - E , N_T - N ) = S_R( E_T - U , N_T - \overline{N} ) + \frac{( U - E )}{T} - \frac{\mu \, ( \overline{N} - N )}{T} + \ldots \qquad (768)
$$

where $\mu$ and $T$ are the chemical potential and temperature of the reservoir. The total entropy $S_T( E_T , N_T )$ is extensive and can be decomposed as
$$
S_T( E_T , N_T ) = S_R( E_T - U , N_T - \overline{N} ) + S( U , \overline{N} ) \qquad (769)
$$
Thus, the Grand-Canonical Distribution Function can be written as
$$
\left( \frac{dp}{d\Gamma} \right)_{N,E} \Gamma_{N,0} = \exp\left[ - \beta ( E - \mu N ) \right] \; \exp\left[ \beta ( U - \mu \overline{N} ) - S( U , \overline{N} )/k_B \right] \qquad (770)
$$
or
$$
\left( \frac{dp}{d\Gamma} \right)_{N,E} \Gamma_{N,0} = \exp\left[ - \beta ( E - \mu N ) \right] \; \exp\left[ \beta \Omega \right] \qquad (771)
$$
where $\Omega$ is the Grand-Canonical Potential $\Omega(T, V, \mu)$
$$
\Omega = U - T S - \mu \overline{N} \qquad (772)
$$

describing the thermodynamics of the system. Once again, we note that the quantities $E$, $N$ and $\Omega$ in the probability distribution function are properties of the system and that the only quantities which describe the environment are the temperature $T$ and the chemical potential $\mu$. The Grand-Canonical Partition Function $\Xi$ is defined by
$$
\Xi = \exp\left[ - \beta \Omega \right] \qquad (773)
$$
so the Grand-Canonical Distribution Function can be written as
$$
\left( \frac{dp}{d\Gamma} \right)_N \Gamma_{N,0} = \frac{1}{\Xi} \; \exp\left[ - \beta ( H_N - \mu N ) \right] \qquad (774)
$$

where HN is the Hamiltonian for the N particle system. The exponential factor containing the Hamiltonian automatically provides different weights for the
regions of N particle phase space. The quantity
 
dp
dpN =
dN
(775)
d N
is the probability for finding the system to have N particles and be in the volume of phase space dN . Hence, the Grand-Canonical Probability Distribution
Function can be used in determining the average of any function defined on the
N particle phase space AN via
 
Z
X
dp
A =
dN
AN
(776)
d N
N =0

or, more explicitly




Z
1 X
dN
A =
exp ( HN N ) AN

N,0
N =0

151

(777)

and involves an integration over the N particle phase space and a summation
over all possible particle numbers.
The Grand-Canonical Partition Function

The quantity

dp_N \; = \; \left( \frac{dp}{d\Gamma} \right)_N d\Gamma_N    (778)

is the probability for finding the system to have N particles and also be in the
volume of phase space d\Gamma_N. The probability p_N for finding the system as having
N particles anywhere in its phase space is found by integrating over all d\Gamma_N

p_N \; = \; \frac{1}{\Xi} \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta ( H_N - \mu N ) \right]    (779)

Since the probability p_N must be normalized, one requires that

\sum_{N=0}^{\infty} p_N \; = \; 1    (780)

since a measurement of the number of particles in the system will give a result which is contained in the set 0, 1, 2, \ldots, \infty. This normalization condition
determines \Xi as being given by

\Xi \; = \; \sum_{N=0}^{\infty} \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta ( H_N - \mu N ) \right]
\; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N \right]
\; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] Z_N    (781)

which relates the Grand-Canonical Partition Function to a sum involving the
Canonical Partition Functions Z_N for the N particle systems.
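The relation (781) can be checked numerically once the canonical partition functions are known. The sketch below uses the ideal-gas form Z_N = x^N/N! (with x = V/\lambda^3; the values of x and the fugacity z below are assumed, arbitrary choices, not taken from the text) and verifies that the weights p_N = \exp[\beta\mu N] Z_N / \Xi are normalized and reproduce \overline{N}:

```python
import math

# Numerical sketch of eq. (781), Xi = sum_N exp(beta*mu*N) Z_N, for the
# ideal-gas Z_N = x**N / N!  (x = V/lambda^3; x and z are assumed values).
x = 2.5          # V / lambda^3
z = 0.8          # fugacity z = exp(beta * mu)

Nmax = 60
ZN = [x**N / math.factorial(N) for N in range(Nmax)]      # canonical Z_N
Xi = sum(z**N * Z for N, Z in enumerate(ZN))              # grand partition function
pN = [z**N * Z / Xi for N, Z in enumerate(ZN)]            # probabilities, eq. (785)

avg_N = sum(N * p for N, p in enumerate(pN))
print(sum(pN), avg_N)    # normalization = 1 and <N> = z*x = 2.0
```

For this choice \Xi = \exp[zx], so the p_N form a Poisson distribution with mean zx, anticipating the homework result (801).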
Thermodynamic Averages and the Grand-Canonical Ensemble

The above normalization condition can be used to evaluate \Xi and, hence, the
Grand-Canonical Potential \Omega. One can then evaluate thermodynamic averages
directly from \Omega(T, V, \mu), via

\Omega \; = \; - \, k_B T \, \ln \Xi    (782)

Thus, for example, knowing \Omega one can find the average number of particles
\overline{N} via the thermodynamic relation

\overline{N} \; = \; - \left( \frac{\partial \Omega}{\partial \mu} \right)_T    (783)

which can be expressed as

\overline{N} \; = \; k_B T \left( \frac{\partial \ln \Xi}{\partial \mu} \right)_T
\; = \; \frac{k_B T}{\Xi} \left( \frac{\partial \Xi}{\partial \mu} \right)_T
\; = \; \frac{k_B T}{\Xi} \sum_{N=0}^{\infty} \frac{\partial \exp[ \beta \mu N ]}{\partial \mu} \, Z_N
\; = \; \frac{1}{\Xi} \sum_{N=0}^{\infty} N \, \exp\left[ \beta \mu N \right] Z_N    (784)

Hence, p_N defined by

p_N \; = \; \frac{1}{\Xi} \, \exp\left[ \beta \mu N \right] Z_N    (785)

appears to be the probability for the system to have N particles, as

\overline{N} \; = \; \sum_{N=0}^{\infty} N \, p_N    (786)

Since Z_N is given by

Z_N \; = \; \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N \right]    (787)

on substituting in the expression for p_N and combining the exponentials, we
can express p_N as

p_N \; = \; \frac{1}{\Xi} \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta ( H_N - \mu N ) \right]    (788)

which is in agreement with our previous identification.

More generally, given the average of a quantity A defined by

\overline{A} \; = \; \frac{1}{\Xi} \sum_{N=0}^{\infty} \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta ( H_N - \mu N ) \right] A_N    (789)

and on defining the thermodynamic average of A in the Canonical Ensemble
with N particles as

\overline{A}_N \; = \; \frac{1}{Z_N} \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N \right] A_N    (790)

one finds that

\overline{A} \; = \; \frac{1}{\Xi} \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] Z_N \, \overline{A}_N
\; = \; \sum_{N=0}^{\infty} p_N \, \overline{A}_N    (791)

as is expected.
Homework:
Derive the probability distribution function \rho_{gc}(N, \{p_i, q_i\}_N) for the Grand-Canonical Ensemble by maximizing the entropy subject to the three constraints

\sum_{N=0}^{\infty} \int d\Gamma_N \, \rho_{gc} \; = \; 1 \;\; , \quad
\sum_{N=0}^{\infty} \int d\Gamma_N \, H_N \, \rho_{gc} \; = \; U \;\; , \quad
\sum_{N=0}^{\infty} \int d\Gamma_N \, N \, \rho_{gc} \; = \; \overline{N}    (792)
6.1 The Ideal Gas

The Grand-Canonical Partition Function is given by

\Xi \; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] \int \frac{d\Gamma_N}{N! \, ( 2 \pi \hbar )^{3N}} \, \exp\left[ - \beta H_N \right]
\; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] Z_N    (793)

However, for an ideal gas the Canonical Partition Function Z_N is given by

Z_N \; = \; \frac{1}{N!} \left( \frac{V}{\lambda^3} \right)^N    (794)

where \lambda is the thermal de Broglie wavelength. Therefore, one has

\Xi \; = \; \sum_{N=0}^{\infty} \frac{1}{N!} \left( \frac{\exp[ \beta \mu ] \, V}{\lambda^3} \right)^N
\; = \; \exp\left[ \frac{\exp[ \beta \mu ] \, V}{\lambda^3} \right]    (795)

This leads to the expression for the Grand-Canonical Potential

\Omega \; = \; - \, k_B T \, \ln \Xi \; = \; - \, k_B T \, \exp\left[ \beta \mu \right] \frac{V}{\lambda^3}    (796)

The average number of particles is given by

\overline{N} \; = \; - \left( \frac{\partial \Omega}{\partial \mu} \right)_{V,T} \; = \; \exp\left[ \beta \mu \right] \frac{V}{\lambda^3}    (797)

Thus, the chemical potential is given by the equation

\mu \; = \; k_B T \, \ln\left( \frac{\overline{N} \, \lambda^3}{V} \right)    (798)

which is identical to the result found by using the Canonical Ensemble. Furthermore, one can use the thermodynamic relation

P \; = \; - \left( \frac{\partial \Omega}{\partial V} \right)_{\mu,T} \; = \; \exp\left[ \beta \mu \right] \frac{k_B T}{\lambda^3}    (799)

which, on combining with the expression for \overline{N}, results in the ideal gas law

P \; = \; \overline{N} \, \frac{k_B T}{V}    (800)
Homework:
Consider an ideal gas of atoms represented by the Grand-Canonical Ensemble.
Show that the probability P_N of finding a subsystem with N atoms is given by

P_N \; = \; \frac{1}{N!} \, \overline{N}^N \, \exp\left[ - \overline{N} \right]    (801)

where \overline{N} is the average number of atoms.
6.2 Fluctuations in the Number of Particles

In the Grand-Canonical Ensemble, the probability of finding the system in a
state with N particles in a volume element d\Gamma_N of phase space is given by

dp_{N,E} \; = \; \left( \frac{dp}{d\Gamma} \right)_{N,E} d\Gamma_N
\; = \; \frac{1}{\Xi} \, \exp\left[ \beta \mu N \right] \exp\left[ - \beta H_N(\{p_i, q_i\}) \right] \frac{d\Gamma_N}{\Gamma_{N,0}}    (802)

where the Grand-Canonical Partition Function \Xi is given by

\Xi \; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N(\{p_i, q_i\}) \right]    (803)

The average number of particles \overline{N} is defined by the expression

\overline{N} \; = \; \frac{1}{\Xi} \sum_{N=0}^{\infty} N \, \exp\left[ \beta \mu N \right] \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N(\{p_i, q_i\}) \right]    (804)

where, since we are not interested in the position of the N-particle system in
its phase space, we have integrated over d\Gamma_N. The above expression can be
re-written in terms of a logarithmic derivative of \Xi w.r.t. the product \beta \mu.
Alternatively, on defining the fugacity z as

z \; = \; \exp\left[ \beta \mu \right]    (805)

one may express \overline{N} as

\overline{N} \; = \; \frac{1}{\Xi} \sum_{N=0}^{\infty} N \, z^N \int \frac{d\Gamma_N}{\Gamma_{N,0}} \, \exp\left[ - \beta H_N(\{p_i, q_i\}) \right]    (806)

which can be expressed as the derivative w.r.t. z

\overline{N} \; = \; \frac{1}{\Xi} \, z \, \frac{\partial \Xi}{\partial z} \; = \; z \, \frac{\partial \ln \Xi}{\partial z}    (807)

The average value \overline{N^2} can also be written as a second derivative w.r.t. z

\overline{N^2} \; = \; \frac{1}{\Xi} \left( z \, \frac{\partial}{\partial z} \; + \; z^2 \, \frac{\partial^2}{\partial z^2} \right) \Xi
\; = \; \frac{1}{\Xi} \, z \, \frac{\partial}{\partial z} \left( z \, \frac{\partial \Xi}{\partial z} \right)    (808)

Likewise, the average squared fluctuation of N is given by

\overline{\Delta N^2} \; = \; \overline{N^2} \; - \; \overline{N}^2
\; = \; \frac{1}{\Xi} \, z \, \frac{\partial}{\partial z} \left( z \, \frac{\partial \Xi}{\partial z} \right) \; - \; \left( \frac{1}{\Xi} \, z \, \frac{\partial \Xi}{\partial z} \right)^2
\; = \; z \, \frac{\partial}{\partial z} \left( z \, \frac{\partial \ln \Xi}{\partial z} \right)
\; = \; z \, \frac{\partial \overline{N}}{\partial z}
\; = \; k_B T \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{V,T}    (809)

The relative fluctuation of the particle number is given by

\frac{\overline{\Delta N^2}}{\overline{N}^2} \; = \; \frac{k_B T}{\overline{N}^2} \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{V,T}    (810)

which is of order 1/\overline{N} and vanishes in the thermodynamic limit.

The above expression for the relative fluctuations is not expressed in terms
of quantities that are easily measurable. However, the factor ( \partial \overline{N} / \partial \mu )_{V,T} can be
re-written in terms of ( \partial V / \partial P )_T, which is quite easily measurable. On combining
the expression for the infinitesimal change in the Grand-Canonical Potential

d\Omega \; = \; - \, S \, dT \; - \; P \, dV \; - \; \overline{N} \, d\mu    (811)

with

\Omega \; = \; - \, P \, V    (812)

one finds the relation

V \, dP \; = \; S \, dT \; + \; \overline{N} \, d\mu    (813)

For a process at constant T, this reduces to a relation between P and \mu

V \, dP \; = \; \overline{N} \, d\mu    (814)

which on dividing by dV becomes

V \left( \frac{\partial P}{\partial V} \right)_T \; = \; \overline{N} \left( \frac{\partial \mu}{\partial V} \right)_T    (815)

The second term in this relation can be expressed in terms of the derivative of
the volume per particle

v \; = \; \frac{V}{\overline{N}}    (816)

as

V \left( \frac{\partial P}{\partial V} \right)_T \; = \; \left( \frac{\partial \mu}{\partial v} \right)_T    (817)

However, this term can be reinterpreted as the derivative of \mu w.r.t. \overline{N}
at constant V. Therefore, one obtains

V \left( \frac{\partial P}{\partial V} \right)_T \; = \; - \; \frac{V}{v^2} \left( \frac{\partial \mu}{\partial \overline{N}} \right)_{V,T}    (818)

but since \overline{N} = V/v one has

V \left( \frac{\partial P}{\partial V} \right)_T \; = \; - \; \frac{\overline{N}^2}{V} \left( \frac{\partial \mu}{\partial \overline{N}} \right)_{V,T}    (819)

On inverting this relation and substituting it in the expression for the relative
fluctuations in the number of particles, one finds that

\frac{\overline{\Delta N^2}}{\overline{N}^2} \; = \; - \; \frac{k_B T}{V^2} \left( \frac{\partial V}{\partial P} \right)_T    (820)

Again, one finds that the relative fluctuations of the particle number are inversely proportional to the volume and thus vanish in the thermodynamic limit.
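The fluctuation relation (809) is independent of the details of the system, so it can be checked numerically with any set of canonical partition functions. The Z_N values below are made-up toy numbers, not taken from the text:

```python
# Numerical sketch of eq. (809): <dN^2> = z d<N>/dz for an arbitrary
# (made-up) set of canonical partition functions Z_N.
ZN = [1.0, 2.2, 1.7, 0.9, 0.3, 0.05]     # toy values

def moments(z):
    w = [z**N * Z for N, Z in enumerate(ZN)]
    Xi = sum(w)
    N1 = sum(N * wi for N, wi in enumerate(w)) / Xi       # N-bar
    N2 = sum(N * N * wi for N, wi in enumerate(w)) / Xi   # <N^2>
    return N1, N2

z = 0.6
N1, N2 = moments(z)
var_direct = N2 - N1**2
h = 1e-6
dN_dz = (moments(z + h)[0] - moments(z - h)[0]) / (2 * h)
print(var_direct - z * dN_dz)    # ~ 0
```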
Homework:
Show that for an ideal gas

\overline{\Delta N^2} \; = \; \overline{N}    (821)

6.3 Energy Fluctuations in the Grand-Canonical Ensemble
The average energy \overline{E} in the Grand-Canonical Ensemble can be represented as
a derivative of the logarithm of the Grand-Canonical Partition Function w.r.t. \beta
as

\overline{E} \; = \; - \left( \frac{\partial \ln \Xi}{\partial \beta} \right)_{z,V}    (822)

where the fugacity z is held constant. Likewise, the mean squared energy is
given by

\overline{E^2} \; = \; \frac{1}{\Xi} \left( \frac{\partial^2 \Xi}{\partial \beta^2} \right)_{z,V}    (823)

Hence, the mean squared fluctuation of the energy is given by

\overline{\Delta E^2} \; = \; \overline{E^2} \; - \; \overline{E}^2
\; = \; \left( \frac{\partial^2 \ln \Xi}{\partial \beta^2} \right)_{z,V}
\; = \; - \left( \frac{\partial \overline{E}}{\partial \beta} \right)_{z,V}
\; = \; k_B T^2 \left( \frac{\partial \overline{E}}{\partial T} \right)_{z,V}    (824)

The above relations are similar to the relations found for the Canonical Ensemble, but are different because the derivatives are evaluated at constant N
for the Canonical Ensemble and at constant fugacity for the Grand-Canonical
Ensemble. Hence, the energy fluctuations are different in the Canonical and the
Grand-Canonical Ensembles.

The cause of the difference between the fluctuations in the Grand-Canonical
and the Canonical Ensembles is not easy to discern from the above expression,
since the fugacity is difficult to measure. The difference can be made explicit
by using thermodynamics, in which case we identify the average energy \overline{E} in
the Grand-Canonical Ensemble with U. That is, since on holding V fixed

\overline{N} \; = \; \overline{N}(T, z)    (825)

one has

U \; = \; U(T, \overline{N}(T, z))    (826)

so the infinitesimal variation in U can be expressed as

dU \; = \; \left( \frac{\partial U}{\partial T} \right)_{\overline{N}} dT \; + \; \left( \frac{\partial U}{\partial \overline{N}} \right)_T d\overline{N}
\; = \; \left( \frac{\partial U}{\partial T} \right)_{\overline{N}} dT \; + \; \left( \frac{\partial U}{\partial \overline{N}} \right)_T \left[ \left( \frac{\partial \overline{N}}{\partial T} \right)_z dT \; + \; \left( \frac{\partial \overline{N}}{\partial z} \right)_T dz \right]    (827)

Hence, the derivative of U w.r.t. T with z kept constant is given by

\left( \frac{\partial U}{\partial T} \right)_{z,V} \; = \; \left( \frac{\partial U}{\partial T} \right)_{\overline{N},V} \; + \; \left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V} \left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V}
\; = \; C_{N,V} \; + \; \left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V} \left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V}    (828)

Therefore, part of the energy fluctuation in the Grand-Canonical Ensemble is
the same as the energy fluctuations in the Canonical Ensemble, where the number of particles and the volume are fixed, and the other contribution originates
from the temperature dependence of the number of particles.

One can obtain more insight into the origin of the energy fluctuations in the
Grand-Canonical Ensemble by using a specific thermodynamic relation involving
the factor

\left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V}    (829)

and another relation for

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V}    (830)

On considering the infinitesimal change

dU \; = \; T \, dS \; - \; P \, dV \; + \; \mu \, d\overline{N}    (831)

with constant T and V, then on dividing by d\overline{N} one obtains

\left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V} \; = \; T \left( \frac{\partial S}{\partial \overline{N}} \right)_{T,V} \; + \; \mu    (832)

Substitution of the Maxwell relation

\left( \frac{\partial S}{\partial \overline{N}} \right)_{T,V} \; = \; - \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V}    (833)

obtained from the analyticity of the Helmholtz Free-Energy F(T, V, N), yields
the first thermodynamic relation

\left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V} \; = \; \mu \; - \; T \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V}    (834)

The thermodynamic relation for the factor

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V}    (835)

is obtained from \overline{N}(T, V, \mu) by expressing \mu = \mu(T, z), so

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V} \; = \; \left( \frac{\partial \overline{N}}{\partial T} \right)_{\mu,V} \; + \; \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left( \frac{\partial \mu}{\partial T} \right)_{z,V}    (836)

The first term on the right-hand side can be re-written, yielding

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V} \; = \; - \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V} \; + \; \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left( \frac{\partial \mu}{\partial T} \right)_{z,V}    (837)

and on recognizing that \mu is related to the fugacity by

\mu \; = \; k_B T \, \ln z    (838)

then

\left( \frac{\partial \mu}{\partial T} \right)_{z,V} \; = \; \frac{\mu}{T}    (839)

Hence, we have

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V} \; = \; \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left[ \frac{\mu}{T} \; - \; \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V} \right]    (840)

\; = \; \frac{1}{T} \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left[ \mu \; - \; T \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V} \right]    (841)

On substituting the first relation in the above expression, one finds

\left( \frac{\partial \overline{N}}{\partial T} \right)_{z,V} \; = \; \frac{1}{T} \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V}    (842)

This analysis yields the two equivalent expressions for the energy fluctuations
in the Grand-Canonical Ensemble

\overline{\Delta E^2} \; = \; k_B T^2 \, C_{N,V} \; + \; k_B T \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left[ \mu \; - \; T \left( \frac{\partial \mu}{\partial T} \right)_{\overline{N},V} \right]^2    (843)

and

\overline{\Delta E^2} \; = \; k_B T^2 \, C_{N,V} \; + \; k_B T \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} \left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V}^2
\; = \; k_B T^2 \, C_{N,V} \; + \; \overline{\Delta N^2} \left( \frac{\partial U}{\partial \overline{N}} \right)_{T,V}^2    (844)

where, in the last line, we have used the equality

\overline{\Delta N^2} \; = \; k_B T \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V}    (845)

The second expression for \overline{\Delta E^2} shows that the mean squared energy fluctuations have two contributions: one originating from the mean squared energy
fluctuation with a fixed number of particles, while the second contribution comes
from the mean squared fluctuations in the particle number, where each particle
that is exchanged with the reservoir carries with it the energy ( \partial U / \partial \overline{N} )_{T,V}.
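The decomposition (844) can be sketched numerically for the classical ideal gas. At fixed fugacity, \ln \Xi = z V / \lambda^3 with \lambda^3 \propto \beta^{3/2}, so \ln \Xi(\beta) = {\rm const} \times \beta^{-3/2}; the values of z, V and the lumped constant c_0 below are arbitrary assumptions (with k_B = 1):

```python
# Ideal-gas check of eq. (844). At fixed z, ln Xi = z*V/lambda^3 and
# lambda^3 ~ beta^(3/2); z, V, c0 are assumed arbitrary values (kB = 1).
z, V, c0 = 0.7, 5.0, 2.0

def lnXi(beta):
    return z * V / (beta / c0)**1.5     # ln Xi at fixed fugacity

beta = 1.1
h = 1e-4
# <dE^2> = (d^2 ln Xi / d beta^2)_z , eq. (824), by a second difference
varE = (lnXi(beta + h) - 2 * lnXi(beta) + lnXi(beta - h)) / h**2
# decomposition (844), using the ideal-gas results C_{N,V} = (3/2) N kB,
# <dN^2> = N (eq. 821) and (dU/dN)_T = (3/2) kB T
kT = 1.0 / beta
N_avg = lnXi(beta)          # for the ideal gas, <N> = ln Xi
decomposed = 1.5 * N_avg * kT**2 + N_avg * (1.5 * kT)**2
print(varE, decomposed)     # both ~ (15/4) <N> (kB T)^2
```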
Homework:
Show that the specific heat at constant N is related to the specific heat at
constant \mu via the relation

C_{V,N} \; = \; C_{V,\mu} \; - \; T \, \frac{ \left( \frac{\partial \overline{N}}{\partial T} \right)^2_{\mu,V} }{ \left( \frac{\partial \overline{N}}{\partial \mu} \right)_{T,V} }    (846)
7 Quantum Statistical Mechanics

Quantum Statistical Mechanics describes the thermodynamic properties of macroscopically large many-particle quantum systems.

7.1 Quantum Microstates and Measurements

In Quantum Mechanics, a microscopic state of a many-particle system is represented by a vector in Hilbert space

| \psi >

Any two states | \psi > and | \phi > in Hilbert space have an inner product,
which is given by a complex number

< \phi | \psi > \; = \; < \psi | \phi >^*    (847)

The states are normalized to unity

< \psi | \psi > \; = \; 1    (848)

A set of states | \phi_n > forms an orthonormal set if their inner products satisfy

< \phi_n | \phi_m > \; = \; \delta_{n,m}    (849)

where \delta_{n,m} is the Kronecker delta function. An orthonormal set is complete if
any arbitrary state can be expanded as

| \psi > \; = \; \sum_n C_n \, | \phi_n >    (850)

where the expansion coefficients C_n are complex numbers, which are found as

C_n \; = \; < \phi_n | \psi >    (851)

Thus, the expansion is given by

| \psi > \; = \; \sum_n | \phi_n > < \phi_n | \psi >    (852)

Hence, we have the completeness condition

\hat{I} \; = \; \sum_n | \phi_n > < \phi_n |    (853)

Using the completeness condition, the normalization condition can be written as

< \psi | \psi > \; = \; \sum_n < \psi | \phi_n > < \phi_n | \psi > \; = \; 1    (854)

On inserting a complete set of coordinate eigenstates, this condition reduces to

1 \; = \; \prod_{i=1}^{3N} \left( \int dq_i \right) \Psi^*(q_1, q_2, q_3, \ldots, q_{3N}) \, \Psi(q_1, q_2, q_3, \ldots, q_{3N})    (855)

Physical observables A_j(\{p_i, q_i\}) are represented by Hermitean operators
\hat{A}_j(\{\hat{p}_i, \hat{q}_i\}). If the Poisson Bracket of two classical observables A_j and A_k is
represented by

\left[ \, A_j \, , \, A_k \, \right]_{PB}

then the Poisson Bracket of two quantum operators is represented by the commutator of the operators divided by i \hbar

\left[ \, \hat{A}_j \, , \, \hat{A}_k \, \right]_{PB} \; = \; \frac{1}{i \hbar} \left[ \, \hat{A}_j \, , \, \hat{A}_k \, \right]    (856)

In particular, since the Poisson Brackets for canonically conjugate coordinates
and momenta are given by

\left[ \, p_i \, , \, q_j \, \right]_{PB} \; = \; - \, \delta_{i,j}    (857)

one has the commutation relations

\left[ \, \hat{p}_i \, , \, \hat{q}_j \, \right] \; = \; - \, i \hbar \, \delta_{i,j}    (858)

The possible values of a measurement of A on a system are the eigenvalues
a_n found from the eigenvalue equation

\hat{A} \, | a_n > \; = \; a_n \, | a_n >    (859)

If a measurement of A on a system results in the value a_n, then immediately
after the measurement the system is definitely known to be in a state which is
an eigenstate of \hat{A} with eigenvalue a_n. The number of eigenstates corresponding to the same eigenvalue a_n is known as the degeneracy D_n. The degenerate
eigenstates | a_{n,\gamma} > can be orthonormalized. Since the observable quantities
are represented by Hermitean operators, the eigenstates form a complete set
and the eigenvalues a_n are real.

It is possible to know with certainty the simultaneous values of measurements
of A_j and A_k on a state if the operators \hat{A}_j and \hat{A}_k commute

\left[ \, \hat{A}_j \, , \, \hat{A}_k \, \right] \; = \; 0    (860)

in which case it is possible to find simultaneous eigenstates of \hat{A}_j and \hat{A}_k. A
state is completely determined if it is an eigenstate of a maximal set of mutually
commuting operators.

If a system is definitely in a state | \psi >, it is a pure state. The probability
that a measurement of A on the state | \psi > will yield the result a_n is given by

P(a_n) \; = \; \sum_{\gamma=1}^{D_{a_n}} | < a_{n,\gamma} | \psi > |^2    (861)

where the sum is over the D_n-fold degenerate eigenstates^{21} | a_{n,\gamma} >
that correspond to the eigenvalue a_n. Thus, the average value \overline{A} of the measurement of A on a pure state | \psi > is given by

\overline{A} \; = \; \sum_n P(a_n) \, a_n
\; = \; \sum_{n,\gamma} < \psi | a_{n,\gamma} > a_n < a_{n,\gamma} | \psi >
\; = \; \sum_{n,\gamma} < \psi | \hat{A} | a_{n,\gamma} > < a_{n,\gamma} | \psi >
\; = \; < \psi | \hat{A} | \psi >    (862)

In the coordinate representation, the average can be expressed as

\overline{A} \; = \; \prod_{i=1}^{3N} \left( \int dq_i \right) \Psi^*(q_1, q_2, \ldots, q_{3N}) \, \hat{A}(\{\hat{p}_i, \hat{q}_i\}) \, \Psi(q_1, q_2, \ldots, q_{3N})    (863)

In the time interval in which no measurements are performed, a state | \psi >
evolves according to the equation

i \hbar \, \frac{\partial}{\partial t} | \psi > \; = \; \hat{H} \, | \psi >    (864)

Thus, the time evolution of the state | \psi > in our closed system is given by

| \psi(t) > \; = \; \exp\left[ - \frac{i}{\hbar} \hat{H} t \right] | \psi >    (865)
7.2 The Density Operator and Thermal Averages

A macroscopic state may correspond to numerous microscopic states | \phi >,
which are assigned probabilities p_{\phi}. Such macroscopic states are known as
mixed states. The ensemble average of A is given by the weighted average

\overline{A} \; = \; \sum_{\phi} p_{\phi} < \phi | \hat{A} | \phi >
\; = \; \sum_{\phi} \sum_{n,m} p_{\phi} < \phi | \phi_n > < \phi_n | \hat{A} | \phi_m > < \phi_m | \phi >
\; = \; \sum_{n,m} < \phi_n | \hat{A} | \phi_m > \sum_{\phi} p_{\phi} < \phi_m | \phi > < \phi | \phi_n >    (866)

On defining the density operator \hat{\rho} via

\hat{\rho} \; = \; \sum_{\phi} p_{\phi} \, | \phi > < \phi |    (867)

the average can be represented as

\overline{A} \; = \; \sum_{n,m} < \phi_n | \hat{A} | \phi_m > < \phi_m | \hat{\rho} | \phi_n >
\; = \; \sum_n < \phi_n | \hat{A} \, \hat{\rho} | \phi_n >
\; = \; {\rm Trace} \; \hat{A} \, \hat{\rho}    (868)

where the last line defines the Trace over a complete set of states^{22}.

Since the probabilities p_{\phi} are normalized, the density operator satisfies

{\rm Trace} \; \hat{\rho} \; = \; \sum_n < \phi_n | \hat{\rho} | \phi_n >
\; = \; \sum_n \sum_{\phi} p_{\phi} < \phi_n | \phi > < \phi | \phi_n >
\; = \; \sum_{\phi} p_{\phi} \sum_n < \phi | \phi_n > < \phi_n | \phi >
\; = \; \sum_{\phi} p_{\phi} < \phi | \phi >
\; = \; \sum_{\phi} p_{\phi}
\; = \; 1    (869)

Thus, the trace of the density matrix is unity.

^{21} The states are assumed to have been orthogonalized, so that the eigenstates of \hat{A} form a complete orthonormal set.
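As a minimal numerical sketch of these definitions, one can build the canonical density matrix \hat{\rho} = \exp[-\beta \hat{H}]/Z for a hypothetical two-level Hamiltonian (the matrices below are arbitrary assumed values) and check that {\rm Trace}\,\hat{\rho} = 1 and that {\rm Trace}(\hat{\rho}\hat{A}) equals the Boltzmann-weighted average in the energy eigenbasis:

```python
import numpy as np

# Sketch of eqs. (868) and (883): rho = exp(-beta*H)/Z for a made-up 2x2
# Hermitean Hamiltonian, with Trace rho = 1 and <A> = Trace(rho A).
beta = 2.0
H = np.array([[0.0, 0.3],
              [0.3, 1.0]])            # assumed Hermitean Hamiltonian
E, U = np.linalg.eigh(H)              # exact diagonalization
rho = U @ np.diag(np.exp(-beta * E)) @ U.T.conj()
rho /= np.trace(rho)                  # divide by Z = Trace exp(-beta*H)

A = np.array([[1.0, 0.0],
              [0.0, -1.0]])           # an arbitrary observable
avg_A = np.trace(rho @ A).real
# the same average from Boltzmann weights in the eigenbasis, cf. eq. (885)
w = np.exp(-beta * E) / np.exp(-beta * E).sum()
avg_check = sum(w[a] * (U[:, a].conj() @ A @ U[:, a]).real for a in range(2))
print(np.trace(rho).real, avg_A - avg_check)    # 1 and ~ 0
```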


The time-dependence of the density operator is found from the time dependence of the basis states

\hat{\rho}(t) \; = \; \sum_{\phi} p_{\phi} \, | \phi(t) > < \phi(t) |
\; = \; \sum_{\phi} p_{\phi} \, \exp\left[ - \frac{i}{\hbar} \hat{H} t \right] | \phi > < \phi | \exp\left[ + \frac{i}{\hbar} \hat{H} t \right]    (875)

This shows that the time evolution of the density operator has the form of a
unitary transformation. From this, one finds that the equation of motion for the
density operator is given by

i \hbar \, \frac{\partial \hat{\rho}}{\partial t} \; = \; \left[ \, \hat{H} \, , \, \hat{\rho} \, \right]    (876)

or, equivalently

i \hbar \, \frac{d \hat{\rho}}{dt} \; = \; i \hbar \, \frac{\partial \hat{\rho}}{\partial t} \; + \; \left[ \, \hat{\rho} \, , \, \hat{H} \, \right] \; = \; 0    (877)

This last expression could have been derived directly from the Poisson equation
of motion for the probability density by Canonical Quantization.

^{22} Gleason's Theorem assures us that the only possible measure of probability on a Hilbert space has the form of a density matrix. [A. M. Gleason, "Measures on the closed subspaces of a Hilbert space", Journal of Mathematics and Mechanics 6, 885-893, (1957).]

In the Micro-Canonical Ensemble, all the states | \phi_{n,\alpha} > in the ensemble
must be energy eigenstates belonging to the same energy eigenvalue E = E_n

\hat{H} \, | \phi_{n,\alpha} > \; = \; E_n \, | \phi_{n,\alpha} >    (878)

The number of these degenerate eigenstates is denoted by N_{\Gamma}. Therefore, in an
equilibrium state, the probabilities are given by

p_{\alpha} \; = \; \frac{1}{N_{\Gamma}}    (879)

which is equivalent to the hypothesis of equal a priori probabilities. Thus, the
density operator in the Micro-Canonical Ensemble can be written as

\hat{\rho}_{mc} \; = \; \frac{1}{N_{\Gamma}} \sum_{\alpha=1}^{N_{\Gamma}} | \phi_{n,\alpha} > < \phi_{n,\alpha} |    (880)

On defining the von Neumann entropy in terms of the density operator by

S \; = \; - \, k_B \, {\rm Trace} \; \hat{\rho} \, \ln \hat{\rho}    (881)

this is evaluated as

S \; = \; - \, k_B \, \sum_{\alpha} p_{\alpha} \, \ln p_{\alpha} \; = \; k_B \, \ln N_{\Gamma}    (882)

in agreement with our previous notation. If the energy E of the Micro-Canonical
Ensemble corresponds to a non-degenerate state, it is a pure state with
N_{\Gamma} = 1, and the entropy vanishes. This observation is in accordance with the
universal constant value of the entropy in the limit T \rightarrow 0, demanded by Nernst's law, as
being defined as being zero.

Since the set of all energy eigenstates is complete, in the Canonical Ensemble the density matrix is given by

\hat{\rho}_c \; = \; \frac{1}{Z_N} \, \exp\left[ - \beta \hat{H}_N \right]    (883)

where the partition function is given by the normalization condition on \hat{\rho}_c

Z_N \; = \; {\rm Trace} \; \exp\left[ - \beta \hat{H}_N \right]    (884)

If the trace is evaluated in a complete set of energy eigenstates | \phi_{\alpha} >, the
result for the partition function reduces to

Z_N \; = \; \sum_{\alpha} \exp\left[ - \beta E_{\alpha} \right]    (885)

where the sum runs over all the degenerate states for each energy.

In the Grand-Canonical Ensemble, one is working in a Hilbert space with a
variable number of particles (Fock Space). In this case, one has

\hat{\rho}_{gc} \; = \; \frac{1}{\Xi} \, \exp\left[ - \beta ( \hat{H} - \mu \hat{N} ) \right]    (886)

in which the Grand-Canonical Partition Function is given by

\Xi \; = \; {\rm Trace} \; \exp\left[ - \beta ( \hat{H} - \mu \hat{N} ) \right]    (887)

where \hat{N} is the number operator. If the particle number is conserved, so that
both \hat{N} and \hat{H} can be diagonalized simultaneously, then the partition function
can be reduced to the form

\Xi \; = \; \sum_{N=0}^{\infty} \exp\left[ \beta \mu N \right] Z_N    (888)

where the N are the eigenvalues of the number operator.

Homework:
Consider a system which has statistically independent sub-systems A and B.
Statistically independent systems are defined as having a factorizable density
operator

\hat{\rho} \; = \; \hat{\rho}_A \, \hat{\rho}_B    (889)

The von Neumann entropies of the subsystems are defined as

S_A \; = \; - \, k_B \, {\rm Trace}_A \; \hat{\rho}_A \, \ln \hat{\rho}_A    (890)

and

S_B \; = \; - \, k_B \, {\rm Trace}_B \; \hat{\rho}_B \, \ln \hat{\rho}_B    (891)

where the trace over A is evaluated as a sum over a complete set of states for the
subsystem A, and a similar definition holds for subsystem B. Show that
the total entropy is additive

S \; = \; S_A \; + \; S_B    (892)

For two subsystems which are not statistically independent, one can define
the density operators for the subsystems by the partial traces

\hat{\rho}_A \; = \; {\rm Trace}_B \; \hat{\rho} \;\; , \quad \hat{\rho}_B \; = \; {\rm Trace}_A \; \hat{\rho}    (893)

where each trace is evaluated as a sum over a complete set of states for the
subsystem. In this case, one can prove that the entropy satisfies the triangle
inequality^{23}

S_A \; + \; S_B \; \geq \; S \; \geq \; | S_A - S_B |    (894)

Find an example of a quantum system in a pure state for which S = 0 but,
nevertheless, both S_A and S_B are greater than zero.

^{23} H. Araki and E. H. Lieb, "Entropy Inequalities", Communications in Mathematical Physics 18, 160-170 (1970).

Homework:
Show that the von Neumann entropy is constant, due to the unitary nature
of the time evolution operator.

7.3 Indistinguishable Particles

Any labelling of indistinguishable particles is unphysical, by definition. Since
any measurement of A will produce results which are independent of the choice
of labelling, the operators must be symmetric under any permutation of the
labels. Hence, every physical operator \hat{A}(\{\hat{p}_i, \hat{r}_i\}), including the Hamiltonian,
must be a symmetric function of the particle position and momentum vectors
(\hat{p}_i, \hat{r}_i) of the N particles. Any permutation of the set of N particles can be
represented in terms of successive interchanges of pairs of particles. The pair
of particles labelled as (\hat{p}_i, \hat{r}_i) and (\hat{p}_j, \hat{r}_j) are interchanged by the permutation
operator \hat{P}_{i,j}. The permutation operator \hat{P}_{i,j} is Hermitean and unitary. In the
coordinate representation, the permutation operator has the effect

\hat{P}_{i,j} \, \Psi(r_1, r_2, \ldots, r_i, \ldots, r_j, \ldots, r_N) \; = \; \Psi(r_1, r_2, \ldots, r_j, \ldots, r_i, \ldots, r_N)    (895)

Since any physical operator must be invariant under the permutation of any two
particles, one has

\hat{P}_{i,j} \, \hat{A}(\{\hat{p}_i, \hat{r}_i\}) \, \hat{P}_{i,j}^{-1} \; = \; \hat{A}(\{\hat{p}_i, \hat{r}_i\})    (896)

Hence, every \hat{P}_{i,j} commutes with every physical operator, including the Hamiltonian

\left[ \, \hat{P}_{i,j} \, , \, \hat{A} \, \right] \; = \; 0    (897)

Therefore, the permutation operators can be diagonalized simultaneously together with any complete set of compatible physical operators.

The eigenvalues p of the permutation operator are defined by

\hat{P}_{i,j} \, | \Psi_p > \; = \; p \, | \Psi_p >    (898)

However, as two successive interchanges of the labels i and j leave the state
unchanged, one has

\hat{P}_{i,j}^2 \; = \; \hat{I}    (899)

Thus, the eigenvalues of the permutation operators must satisfy

\hat{P}_{i,j}^2 \, | \Psi_p > \; = \; p^2 \, | \Psi_p > \; = \; | \Psi_p >    (900)

which leads to the solutions for the eigenvalues

p \; = \; \pm 1    (901)

which are constants of motion. Since the particles are indistinguishable, all pairs
of particles must have the same value of the eigenvalue p.

Since it is the real space probability density

| \Psi(r_1, r_2, \ldots, r_i, \ldots, r_j, \ldots, r_N) |^2    (902)

which is observable, and not the wave function, measurements on two states differing
only by the interchange of two particle labels will yield results which have identical
distributions.

Fermions and Bosons

Particles are known as fermions if their wave functions are antisymmetric
under the interchange of any pair of particles, whereas the particles are called
bosons if the wave function is symmetric under a single interchange.

In certain two-dimensional systems, states with mixed symmetry can occur^{24}. The exotic particles with mixed symmetries are known as anyons, and
they obey fractional statistics.

Fermions and Fermi-Dirac Statistics

Particles are fermions if their wave function is antisymmetric under the
interchange of any pair of particles

\Psi(r_1, r_2, \ldots, r_i, \ldots, r_j, \ldots, r_N) \; = \; - \, \Psi(r_1, r_2, \ldots, r_j, \ldots, r_i, \ldots, r_N)    (903)

Examples of fermions are given by electrons, neutrinos, quarks, protons and
neutrons, and He^3 atoms.

Bosons and Bose-Einstein Statistics

Particles are bosons if their wave function is symmetric under the interchange
of any pair of particles

\Psi(r_1, r_2, \ldots, r_i, \ldots, r_j, \ldots, r_N) \; = \; + \, \Psi(r_1, r_2, \ldots, r_j, \ldots, r_i, \ldots, r_N)    (904)

Examples of bosons are given by photons, gluons, phonons, and He^4 atoms.

One can represent an arbitrary N-particle state with the required symmetry
as a linear superposition of a complete set of orthonormal N-particle basis states
\Phi. These many-particle states are composed as a properly symmetrized product
of single-particle wave functions \phi_{\alpha}(r), which form a complete orthonormal set

\int d^3r \; \phi^*_{\alpha'}(r) \, \phi_{\alpha}(r) \; = \; \delta_{\alpha',\alpha} \;\; , \quad
\sum_{\alpha} \phi^*_{\alpha}(r') \, \phi_{\alpha}(r) \; = \; \delta^3( r - r' )    (905)

The many-particle basis states \Phi_{\alpha_1,\alpha_2,\ldots,\alpha_N}(r_1, r_2, \ldots, r_N) are composed from
a symmetrized linear superposition of N single-particle states

\Phi_{\alpha_1,\alpha_2,\ldots,\alpha_N}(r_1, r_2, \ldots, r_N) \; = \; \frac{1}{\sqrt{\mathcal{N}}} \sum_{P} ( \pm 1 )^{n_p} \; \hat{P} \; \phi_{\alpha_1}(r_1) \, \phi_{\alpha_2}(r_2) \, \ldots \, \phi_{\alpha_N}(r_N)    (906)

where the sum runs over all N! permutations P of the particle indices and n_p
is the order of the permutation. That is, n_p is the number of pairwise interchanges that produce the permutation. For boson wave functions, the positive
sign holds, and the minus sign holds for fermions.

The Pauli Exclusion Principle

The Pauli Exclusion Principle states that a single-particle state cannot contain more than one fermion. That is, any physically acceptable many-particle
state describing identical fermions \Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_j,\ldots} must have \alpha_i \neq \alpha_j. Otherwise, if \alpha_i = \alpha_j, then one has

\hat{P}_{i,j} \, \Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i,\ldots}(r_1, \ldots, r_i, \ldots, r_j, \ldots) \; = \; + \, \Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i,\ldots}(r_1, \ldots, r_i, \ldots, r_j, \ldots)    (907)

since changing the order of \phi_{\alpha_i}(r_i) and \phi_{\alpha_i}(r_j) does not change the sign of the
wave function. Furthermore, since the many-particle state is an eigenstate of
\hat{P}_{i,j} with eigenvalue -1, one also has

\hat{P}_{i,j} \, \Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i,\ldots}(r_1, \ldots, r_i, \ldots, r_j, \ldots) \; = \; - \, \Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i,\ldots}(r_1, \ldots, r_i, \ldots, r_j, \ldots)    (908)

Therefore, on equating the right-hand sides, one finds that

\Phi_{\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i,\ldots}(r_1, \ldots, r_i, \ldots, r_j, \ldots) \; = \; 0    (909)

which indicates that the state where two fermions occupy the same single-particle eigenstate does not exist.

The Occupation Number Representation

Instead of labeling the many-particle basis states by the eigenvalues \alpha_1, \alpha_2,
\ldots, \alpha_N, we can specify the number of times each single-particle eigenstate is
occupied. The number of times that a specific one-particle state \alpha occurs is
denoted by n_{\alpha}, which is called the occupation number. Specifying the occupation
numbers n_1, n_2, \ldots, uniquely specifies the many-particle state \Phi_{n_1,n_2,\ldots}.
For a system with N particles, the sum of the occupation numbers is just equal
to the total number of particles

\sum_{\alpha} n_{\alpha} \; = \; N    (910)

For fermions, the Pauli exclusion principle limits n_{\alpha} to have values of either 0
or 1. For bosons, n_{\alpha} can have any positive integer value, including zero.

The orthonormality relation

\prod_{i=1}^{N} \left( \int d^3r_i \right) \Phi^*_{n'_1,n'_2,\ldots}(r_1, r_2, \ldots, r_N) \, \Phi_{n_1,n_2,\ldots}(r_1, r_2, \ldots, r_N) \; = \; \delta_{n'_1,n_1} \, \delta_{n'_2,n_2} \, \ldots    (911)

leads to the identification of the normalization constant \mathcal{N}. For fermions, one
has

\mathcal{N} \; = \; N!    (912)

which is just the number of terms in the wave function. For bosons, the normalization is given by

\mathcal{N} \; = \; N! \, \prod_{\alpha} n_{\alpha}!    (913)

where it should be noted that 0! is defined as unity.

An arbitrary many-particle state \Psi can be expressed in terms of the set of
basis states as

\Psi(r_1, r_2, \ldots, r_N) \; = \; \sum_{\{n_{\alpha}\}} C(n_1, n_2, \ldots) \, \Phi_{n_1,n_2,\ldots}(r_1, r_2, \ldots, r_N)    (914)

where the expansion coefficients C(n_1, n_2, \ldots) play the role of the wavefunction in the occupation number representation.

^{24} J. M. Leinaas and J. Myrheim, "On the theory of identical particles", Il Nuovo Cimento B 37, 1-23 (1977).
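The counting implied by eq. (910) and the Pauli restriction can be sketched by brute-force enumeration. The system sizes below (N = 2 particles in M = 3 levels) are made-up illustrative choices:

```python
from itertools import product

# Sketch of the occupation-number counting around eq. (910): enumerate all
# sets {n_alpha} with sum n_alpha = N, for N = 2 particles in M = 3 levels.
M, N = 3, 2
bose = [occ for occ in product(range(N + 1), repeat=M) if sum(occ) == N]   # n = 0,1,2
fermi = [occ for occ in product((0, 1), repeat=M) if sum(occ) == N]        # Pauli: n = 0,1
print(len(bose), len(fermi))    # 6 boson configurations, 3 fermion configurations
```

The counts agree with the combinatorial formulas C(N+M-1, N) = 6 for bosons and C(M, N) = 3 for fermions.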

7.4 The Spin-Statistics Theorem and Composite Particles

The Spin-Statistics Theorem has its origins in Quantum Field Theory, and states
that fermions have half-odd-integer spins and that bosons have integer spins.
The theorem was first proposed by Markus Fierz^{25} and later proved by Wolfgang
Pauli^{26}. Rather than prove the theorem, we shall be content to show that if the
Spin-Statistics Theorem holds for elementary particles, then it will also hold for composite
particles. We shall also outline an observation which might be turned into a
proof of the Theorem.

Two indistinguishable composite particles are permuted if all the elementary particles composing one composite particle are interchanged with the corresponding constituent particles of the other. When two identical composite
particles, each of which is composed of n_F elementary fermions and n_B elementary bosons, are interchanged, the wave function will change by a factor of

( - 1 )^{n_F} \; ( + 1 )^{n_B}

Hence, if a composite particle contains an odd number of fermions, the composite particle will be a fermion, since the wave function of two such identical
composite particles is antisymmetric under the interchange of the particles. On
the other hand, if a composite particle contains an even number of fermions, the
composite particle will be a boson, since the wave function of two such identical
composite particles will be symmetric under the interchange of the particles.

[Figure 38: Composite particles are interchanged when their constituent particles
are interchanged. For a system with n_F fermions and n_B bosons, the interchange
changes the wave function representing the pair of composite particles by a factor
of ( - 1 )^{n_F} ( + 1 )^{n_B}.]

The above result is consistent with the application of the Spin-Statistics
Theorem. A composite particle containing n_F fermions and n_B bosons will
have a spin composed of n_F half-odd integers and n_B integers. If n_F is odd,
the resulting spin will be a half-odd integer, whereas if n_F is even the resulting
spin will be an integer.

Thus, a composite particle containing an odd number of fermions n_F will
have a half-odd-integer spin, and the wave function of a pair of identical composite particles with odd n_F will be antisymmetric under their interchange.
Likewise, a composite particle containing an even number of fermions will have
an integer spin, and the wave function of a pair of identical composite particles
with even n_F will be symmetric under the interchange of the composite particles. Hence, the Spin-Statistics Theorem will be true for identical composite
particles if it is true for their elementary constituents.

============================================

Example: The Isotopes of He

He^3 has two protons, a neutron and two electrons. Therefore, He^3 is a
fermion.

He^4 has an extra neutron. Thus, it contains two protons, two neutrons and
two electrons. Therefore, He^4 is a boson.

The difference in the quantum statistics of the two isotopes results in their
having very different properties at low temperatures, although they are chemically similar. For example, their phase diagrams are very different and He^4
exhibits the phenomenon of superfluidity.

============================================

The idea behind the Spin-Statistics Theorem is that field operators must
transform under Lorentz transformations according to the spin of the particles
that they describe. In a non-relativistic theory, the boost generator generates
rotations. It is noted that rotations through an angle 2\pi are equivalent to the
identity for fields with integer spins, such as scalar or vector fields. However,
this is not true for odd-half-integer spin fields represented by spinors.

Consider the product of two fields \hat{\phi} with arbitrary spin, given by

\hat{\phi}(r) \; R(\pi) \, \hat{\phi}(-r)    (915)

This product creates (or annihilates) two particles with spins that are rotated
by \pi relative to each other. Now consider a rotation of this configuration by \pi
around the origin. Under this rotation, the two points r and -r switch places,
and the spins of the two fields are rotated through an additional angle of \pi.
Thus, the product of the two field operators transforms as

R(2\pi) \, \hat{\phi}(-r) \; R(\pi) \, \hat{\phi}(r)    (916)

which for integer spins is equivalent to

+ \; \hat{\phi}(-r) \; R(\pi) \, \hat{\phi}(r)    (917)

and for half-integer spins, the product is equivalent to

- \; \hat{\phi}(-r) \; R(\pi) \, \hat{\phi}(r)    (918)

Therefore, the particles associated with two spinful operator fields can be interchanged by a rotation, but the interchange must also involve a change of sign for half-integer
spin fields.

A proof of the Spin-Statistics Theorem based on this observation would require that the following assumptions hold true:
(i) The theory has a Lorentz invariant Lagrangian.
(ii) The vacuum is Lorentz invariant.
(iii) The particle is a localized excitation which is not connected to any other
object.
(iv) The particle is propagating, so it has a finite mass.
(v) The particle is a real excitation, so that states involving the particle have
positive definite norms.

^{25} M. Fierz, "Über die relativistische Theorie kräftefreier Teilchen mit beliebigem Spin", Helvetica Physica Acta 12, 3-37, (1939).
^{26} W. Pauli, "The Connection Between Spin and Statistics", Phys. Rev. 58, 716-722 (1940).

Homework:
Determine whether the isotope Rb^{87} is a boson or a fermion.

7.5

Second Quantization

Second Quantization for Bosons

174

For Bose particles, one can introduce a set of operators a†_α with single-particle
quantum numbers α which are defined by
\[
\hat{a}^{\dagger}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; \sqrt{ n_{\alpha} + 1 } \; \Phi_{n_1,\dots,n_{\alpha}+1,\dots}(r_1,\dots,r_{N+1})
\tag{919}
\]
and the Hermitean conjugate operator is found to satisfy
\[
\hat{a}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; \sqrt{ n_{\alpha} } \; \Phi_{n_1,\dots,n_{\alpha}-1,\dots}(r_1,\dots,r_{N-1})
\tag{920}
\]
The creation operator a†_α adds an additional particle to the quantum level α,
and a_α annihilates a particle in the state α. If, in the initial state, n_α = 0,
then application of the annihilation operator yields zero.
One can define the number operator n̂_α as the combination
\[
\hat{n}_{\alpha} \; = \; \hat{a}^{\dagger}_{\alpha} \, \hat{a}_{\alpha}
\tag{921}
\]
which has the eigenvalue equation
\[
\hat{n}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; n_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N)
\tag{922}
\]

where the eigenvalue n_α is the occupation number for the single-particle state.
Hence, the number operator represents a measurement of the number of particles
in a quantum level α. The total number of particles in the system, N, corresponds
to the eigenvalues of the operator
\[
\hat{N} \; = \; \sum_{\alpha} \hat{n}_{\alpha} \; = \; \sum_{\alpha} \hat{a}^{\dagger}_{\alpha} \, \hat{a}_{\alpha}
\tag{923}
\]

The creation and annihilation operators satisfy the commutation relations
\[
[ \, \hat{a}_{\alpha} \, , \, \hat{a}^{\dagger}_{\alpha'} \, ] \; = \; \delta_{\alpha,\alpha'}
\tag{924}
\]
and
\[
[ \, \hat{a}_{\alpha} \, , \, \hat{a}_{\alpha'} \, ] \; = \; [ \, \hat{a}^{\dagger}_{\alpha} \, , \, \hat{a}^{\dagger}_{\alpha'} \, ] \; = \; 0
\tag{925}
\]
as can easily be shown. For example, the diagonal elements of the first commutator yield
\[
\left( \, \hat{a}_{\alpha} \hat{a}^{\dagger}_{\alpha} \, - \, \hat{a}^{\dagger}_{\alpha} \hat{a}_{\alpha} \, \right) \, | \, n_1 \dots n_{\alpha} \dots > \; = \; \left( \, ( n_{\alpha} + 1 ) \, - \, n_{\alpha} \, \right) \, | \, n_1 \dots n_{\alpha} \dots > \; = \; 1 \, | \, n_1 \dots n_{\alpha} \dots >
\tag{926}
\]
and the off-diagonal elements yield zero.


On defining the vacuum state as | 0 >, one can create any arbitrary basis
state by the repeated action of the creation operators
\[
| \, n_1 , n_2 , \dots > \; = \; \prod_{\alpha} \left( \frac{ ( \hat{a}^{\dagger}_{\alpha} )^{n_{\alpha}} }{ \sqrt{ n_{\alpha} ! } } \right) \, | \, 0 >
\tag{927}
\]
where the product runs over all the single-particle quantum numbers α.
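The operator algebra above can be checked concretely by representing a and a† as matrices on a truncated Fock basis. The following sketch (Python with NumPy; the truncation level n_max = 6 is an arbitrary choice of the illustration, not part of the notes) builds the matrix elements implied by Eqs. (919)-(920) and verifies the number operator (921) and the commutator (924):

```python
import numpy as np

def boson_ops(n_max):
    """Matrices of a and a-dagger on the truncated Fock basis |0>, ..., |n_max>.

    Matrix elements follow Eqs. (919)-(920):
        a |n> = sqrt(n) |n-1>,    a-dagger |n> = sqrt(n+1) |n+1>.
    """
    a = np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)   # annihilation operator
    return a, a.T                                         # creation = conjugate

a, adag = boson_ops(6)

# Number operator n = a-dagger a is diagonal with eigenvalues 0, 1, ..., 6 (Eq. 921)
print(np.diag(adag @ a))                       # -> [0. 1. 2. 3. 4. 5. 6.]

# Commutator [a, a-dagger] = 1 (Eq. 924); exact away from the truncation edge
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(6)))  # -> True
```

The commutator fails only in the last row, an artifact of truncating the infinite-dimensional Fock space.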
Any operator can also be expressed in terms of the creation and annihilation
operators. For example, the Hamiltonian of a system of interacting particles can
be written as
\[
\hat{H} \; = \; \sum_{\alpha',\alpha} < \alpha' | \hat{H}_0 | \alpha > \; \hat{a}^{\dagger}_{\alpha'} \, \hat{a}_{\alpha} \; + \; \frac{1}{2!} \sum_{\alpha',\beta';\alpha,\beta} < \alpha' \beta' | \hat{V}_{int} | \alpha \beta > \; \hat{a}^{\dagger}_{\alpha'} \, \hat{a}^{\dagger}_{\beta'} \, \hat{a}_{\beta} \, \hat{a}_{\alpha}
\tag{928}
\]
where
\[
< \alpha' | \hat{H}_0 | \alpha > \; = \; \int d^3r \; \phi^{*}_{\alpha'}(r) \, \left( \frac{ \hat{p}^2 }{ 2 m } + V_0(r) \right) \, \phi_{\alpha}(r)
\tag{929}
\]
are the matrix elements of the single-particle energy and
\[
< \alpha' \beta' | \hat{V}_{int} | \alpha \beta > \; = \; \int d^3r \int d^3r' \; \phi^{*}_{\alpha'}(r) \, \phi^{*}_{\beta'}(r') \; V_{int}(r,r') \; \phi_{\alpha}(r) \, \phi_{\beta}(r')
\tag{930}
\]
are the matrix elements of the two-body interaction terms.
Second Quantization for Fermions
For Fermi particles, one can formally introduce a set of operators c_α corresponding to the set of single-particle quantum numbers α. The operators are
defined by
\[
\hat{c}^{\dagger}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; \sqrt{ n_{\alpha} + 1 } \; \Phi_{n_1,\dots,n_{\alpha}+1,\dots}(r_1,\dots,r_{N+1})
\tag{931}
\]
for anti-symmetric states Φ. The Hermitean conjugate operator c_α is then found
to satisfy
\[
\hat{c}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; \sqrt{ n_{\alpha} } \; \Phi_{n_1,\dots,n_{\alpha}-1,\dots}(r_1,\dots,r_{N-1})
\tag{932}
\]
Due to the antisymmetric nature of the states, the creation and annihilation
operators must satisfy the anti-commutation relations
\[
\{ \, \hat{c}_{\alpha} \, , \, \hat{c}^{\dagger}_{\alpha'} \, \} \; = \; \delta_{\alpha,\alpha'}
\tag{933}
\]
and
\[
\{ \, \hat{c}_{\alpha} \, , \, \hat{c}_{\alpha'} \, \} \; = \; \{ \, \hat{c}^{\dagger}_{\alpha} \, , \, \hat{c}^{\dagger}_{\alpha'} \, \} \; = \; 0
\tag{934}
\]
where the anti-commutator of two operators Â and B̂ is defined by
\[
\{ \, \hat{A} \, , \, \hat{B} \, \} \; = \; \hat{A} \, \hat{B} \; + \; \hat{B} \, \hat{A}
\tag{935}
\]
These anti-commutation relations restrict the occupation numbers n_α for a
single-particle state to zero or unity, as required by the Pauli exclusion principle. This can be seen since the anti-commutation relation yields
\[
\hat{c}_{\alpha} \, \hat{c}_{\alpha} \, \Phi_{n_1,\dots}(r_1,\dots,r_N) \; = \; - \, \hat{c}_{\alpha} \, \hat{c}_{\alpha} \, \Phi_{n_1,\dots}(r_1,\dots,r_N)
\tag{936}
\]
which results in the identification
\[
\sqrt{ ( n_{\alpha} - 1 ) \, n_{\alpha} } \; \Phi_{n_1,\dots,n_{\alpha}-2,\dots}(r_1,\dots,r_{N-2}) \; = \; 0
\tag{937}
\]
so either n_α = 1 or n_α = 0. Incorporating this restriction into the definition of
the creation operator yields
\[
\hat{c}^{\dagger}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \;
\begin{cases}
\Phi_{n_1,\dots,n_{\alpha}+1,\dots}(r_1,\dots,r_{N+1}) & {\rm for} \; n_{\alpha} = 0 \\
0 & {\rm for} \; n_{\alpha} = 1
\end{cases}
\tag{938}
\]
and similarly for the annihilation operator
\[
\hat{c}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \;
\begin{cases}
\Phi_{n_1,\dots,n_{\alpha}-1,\dots}(r_1,\dots,r_{N-1}) & {\rm for} \; n_{\alpha} = 1 \\
0 & {\rm for} \; n_{\alpha} = 0
\end{cases}
\tag{939}
\]
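Because Eqs. (938)-(939) close on the two states | 0 > and | 1 >, a single fermion level can be represented by 2×2 matrices. The sketch below (Python with NumPy, an illustration rather than anything from the notes) verifies the anti-commutation relations (933)-(935):

```python
import numpy as np

# Single-level fermion basis: index 0 = empty |0>, index 1 = occupied |1>.
c = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # c |1> = |0>,  c |0> = 0          (Eq. 939)
cdag = c.T                      # c-dagger |0> = |1>, c-dagger |1> = 0  (Eq. 938)

def anticomm(A, B):             # anti-commutator of Eq. (935)
    return A @ B + B @ A

print(anticomm(c, cdag))                 # identity matrix, Eq. (933)
print(np.allclose(anticomm(c, c), 0))    # -> True: c c = 0, Eq. (934)
print(np.diag(cdag @ c))                 # -> [0. 1.], occupations 0 or 1
```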

The occupation number operator can be defined as
\[
\hat{n}_{\alpha} \; = \; \hat{c}^{\dagger}_{\alpha} \, \hat{c}_{\alpha}
\tag{940}
\]
which is seen to have the eigenvalue equation
\[
\hat{n}_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N) \; = \; n_{\alpha} \, \Phi_{n_1,\dots,n_{\alpha},\dots}(r_1,\dots,r_N)
\tag{941}
\]
similar to the case for bosons. The total number operator, N̂, is then defined as
\[
\hat{N} \; = \; \sum_{\alpha} \hat{n}_{\alpha} \; = \; \sum_{\alpha} \hat{c}^{\dagger}_{\alpha} \, \hat{c}_{\alpha}
\tag{942}
\]

On defining the vacuum state as | 0 >, one can create any arbitrary basis
state by the repeated action of the creation operators
\[
| \, n_1 , n_2 , \dots > \; = \; \prod_{\alpha} \, ( \hat{c}^{\dagger}_{\alpha} )^{n_{\alpha}} \, | \, 0 >
\tag{943}
\]
where the product runs over all the single-particle quantum numbers α.
Any operator can also be expressed in terms of the creation and annihilation
operators. For example, the Hamiltonian for a system of interacting particles
can be written as
\[
\hat{H} \; = \; \sum_{\alpha',\alpha} < \alpha' | \hat{H}_0 | \alpha > \; \hat{c}^{\dagger}_{\alpha'} \, \hat{c}_{\alpha} \; + \; \frac{1}{2!} \sum_{\alpha',\beta';\alpha,\beta} < \alpha' \beta' | \hat{V}_{int} | \alpha \beta > \; \hat{c}^{\dagger}_{\alpha'} \, \hat{c}^{\dagger}_{\beta'} \, \hat{c}_{\beta} \, \hat{c}_{\alpha}
\tag{944}
\]
where
\[
< \alpha' | \hat{H}_0 | \alpha > \; = \; \int d^3r \; \phi^{*}_{\alpha'}(r) \, \left( \frac{ \hat{p}^2 }{ 2 m } + V_0(r) \right) \, \phi_{\alpha}(r)
\tag{945}
\]
are the matrix elements of the single-particle energy and < α' β' | V_int | α β >
represents the matrix elements of the two-body interaction between properly
anti-symmetrized two-electron states. Since ordering is important for fermions,
it is important to note that the order of the annihilation operators is the reverse
of the order of the corresponding single-particle quantum numbers in the matrix
elements.
Coherent States
Since the occupation number of a level is unrestricted for bosons, an unusual
type of state is allowed. We shall focus our attention on one single-particle quantum
level, and shall drop the index α which labels the level. A coherent state | a >
is defined as an eigenstate of the annihilation operator27
\[
\hat{a} \, | \, a > \; = \; a \, | \, a >
\tag{946}
\]
where a is a complex number. For example, the vacuum state or ground state
is an eigenstate of the annihilation operator, in which case a = 0.
The coherent state28 can be found as a linear superposition of eigenstates of
the number operator with eigenvalues n
\[
| \, a > \; = \; \sum_{n=0}^{\infty} C_n \, | \, n >
\tag{947}
\]

On substituting this form in the definition of the coherent state
\[
\hat{a} \, | \, a > \; = \; \sum_{n} C_n \, \hat{a} \, | \, n > \; = \; a \, \sum_{n} C_n \, | \, n >
\tag{948}
\]
and using the property of the annihilation operator, one has
\[
\sum_{n} C_n \, \sqrt{n} \, | \, n - 1 > \; = \; a \, \sum_{n} C_n \, | \, n >
\tag{949}
\]
On taking the matrix elements of this equation with the state < m |, and using
the orthonormality of the eigenstates of the number operator, one finds
\[
C_{m+1} \, \sqrt{ m + 1 } \; = \; a \, C_m
\tag{950}
\]

27 E. Schrödinger, Der stetige Übergang von der Mikro- zur Makromechanik, Naturwissenschaften 14, 664 (1926).
28 R. J. Glauber, Coherent and Incoherent States of the Radiation Field, Phys. Rev. 131, 2766 (1963).
Hence, on iterating downwards, one finds
\[
C_m \; = \; \left( \frac{ a^m }{ \sqrt{ m ! } } \right) \, C_0
\tag{951}
\]
and the coherent state can be expressed as
\[
| \, a > \; = \; C_0 \, \sum_{n=0}^{\infty} \left( \frac{ a^n }{ \sqrt{ n ! } } \right) \, | \, n >
\tag{952}
\]
The normalization constant C_0 can be found from
\[
1 \; = \; C_0^{*} \, C_0 \, \sum_{n=0}^{\infty} \left( \frac{ a^{* \, n} \, a^n }{ n ! } \right)
\tag{953}
\]
by noting that the sum exponentiates to yield
\[
1 \; = \; C_0^{*} \, C_0 \, \exp\left[ \, a^{*} \, a \, \right]
\tag{954}
\]
so, on choosing the phase of C_0, one has
\[
C_0 \; = \; \exp\left[ \, - \, \frac{1}{2} \, a^{*} \, a \, \right]
\tag{955}
\]

From this, it can be shown that, if the number of bosons in a coherent state is
measured, the result n will occur with a probability given by
\[
P(n) \; = \; \frac{ ( a^{*} a )^n }{ n ! } \, \exp\left[ \, - \, a^{*} \, a \, \right]
\tag{956}
\]
Thus, the boson statistics are governed by a Poisson distribution. Furthermore,
the quantity a* a is the average number of bosons, n̄, present in the coherent
state.
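The Poisson statistics of Eq. (956) are easy to confirm numerically. The sketch below (plain Python; the amplitude a = 1.7 is an arbitrary choice of the illustration) checks that the weights P(n) are normalized and that the mean occupation equals a* a:

```python
import math

a = 1.7                  # arbitrary (real) coherent-state amplitude
nbar = a * a             # a* a, the mean boson number

# Poisson weights of Eq. (956), truncated where they are negligibly small
P = [nbar**n * math.exp(-nbar) / math.factorial(n) for n in range(60)]

print(abs(sum(P) - 1.0) < 1e-12)                                  # normalized
print(abs(sum(n * p for n, p in enumerate(P)) - nbar) < 1e-12)    # <n> = a* a
```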
The coherent states can be written in a more compact form. Since the state
with occupation number n can be written as
\[
| \, n > \; = \; \frac{ ( \hat{a}^{\dagger} )^n }{ \sqrt{ n ! } } \, | \, 0 >
\tag{957}
\]
the coherent state can also be expressed as
\[
| \, a > \; = \; \exp\left[ \, - \, \frac{ a^{*} a }{ 2 } \, \right] \, \sum_{n=0}^{\infty} \frac{ ( a \, \hat{a}^{\dagger} )^n }{ n ! } \, | \, 0 >
\tag{958}
\]
or, on summing the series as an exponential,
\[
| \, a > \; = \; \exp\left[ \, - \, \frac{1}{2} \, a^{*} \, a \, \right] \, \exp\left[ \, a \, \hat{a}^{\dagger} \, \right] \, | \, 0 >
\tag{959}
\]

Thus the coherent state is an infinite linear superposition of states with different
occupation numbers, in which each coefficient in the linear superposition has a specific
phase relation with every other coefficient.
The above equation represents a transformation between the number operator
states and the coherent states. The inverse transformation can be found by
expressing a in terms of a magnitude |a| and a phase φ
\[
a \; = \; | a | \, \exp\left[ \, i \, \varphi \, \right]
\tag{960}
\]
The number states can be expressed in terms of the coherent states via the
inverse transformation
\[
| \, n > \; = \; \frac{ \sqrt{ n ! } }{ | a |^n } \, \exp\left[ \, + \, \frac{1}{2} \, | a |^2 \, \right] \, \int_0^{2\pi} \frac{ d\varphi }{ 2 \pi } \, \exp\left[ \, - \, i \, n \, \varphi \, \right] \, | \, a >
\tag{961}
\]
by integrating over the phase of the coherent state. Since the set of occupation number states is complete, the set of coherent states must also span Hilbert
space. In fact, the set of coherent states is over-complete.
A number of systems do have states whose properties closely resemble coherent states, such as the photon states in a laser or the superfluid
condensate of He4 at low temperatures. Although these do not have the precise
mathematical form of the coherent states investigated by Glauber, the approximate states are characterized by having sufficiently large fluctuations in their
occupation numbers that the expectation value of the annihilation operator
is well-defined.

8 Fermi-Dirac Statistics

8.1 Non-Interacting Fermions

An ideal gas of non-interacting fermions is described by the Hamiltonian Ĥ,
which is given by the sum
\[
\hat{H} \; = \; \sum_{\alpha} \epsilon_{\alpha} \, \hat{n}_{\alpha}
\tag{962}
\]
where n̂_α represents the occupation number of the α-th single-particle energy
level. This is just the sum of the contributions from each particle, grouped
according to the energy levels that they occupy. Likewise, the operator N̂
representing the total number of particles is given by
\[
\hat{N} \; = \; \sum_{\alpha} \hat{n}_{\alpha}
\tag{963}
\]
where the sum is over the single-particle energy levels. Hence, for non-interacting
fermions, the density operator is diagonal in the occupation number representation.
The Grand-Canonical Partition function Ξ is given by
\[
\Xi \; = \; {\rm Trace} \; \exp\left[ \, - \, \beta \, ( \, \hat{H} \, - \, \mu \, \hat{N} \, ) \, \right]
\tag{964}
\]
where the trace is over a complete set of microscopic states with variable N
for the entire system. A convenient basis is given by the N-particle states
| n_1, n_2, ... >, since both Ĥ and N̂ are diagonal in this basis. The trace
reduces to the sum over all configurations { n_α }, and since the total number of
particles N is also being summed over, the trace is unrestricted. Therefore, the
trace can be evaluated by summing over all possible values of the eigenvalues
n_α for each consecutive value of α. That is,
\[
{\rm Trace} \; \equiv \; \sum_{n_1=0}^{1} \, \sum_{n_2=0}^{1} \, \sum_{n_3=0}^{1} \, \dots
\tag{965}
\]
Since the exponential term in Ξ reduces to the form of the exponential of a sum
of independent terms, it can be written as the product of exponential factors
\[
\exp\left[ \, - \, \beta \, ( \, \hat{H} \, - \, \mu \, \hat{N} \, ) \, \right] \; = \; \exp\left[ \, - \, \beta \, \sum_{\alpha} ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, \hat{n}_{\alpha} \, \right] \; = \; \prod_{\alpha} \, \exp\left[ \, - \, \beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, \hat{n}_{\alpha} \, \right]
\tag{966}
\]

Therefore, the trace in the expression for Ξ can be reduced to
\[
\Xi \; = \; \prod_{\alpha} \, \sum_{n_{\alpha}=0}^{1} \, \exp\left[ \, - \, \beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, n_{\alpha} \, \right]
\tag{967}
\]
where the sum is over all the occupation numbers n_α of either zero or unity, as
is allowed by Fermi-Dirac statistics. On performing the sum of the geometric
series, one obtains
\[
\sum_{n_{\alpha}=0}^{1} \, \exp\left[ \, - \, \beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, n_{\alpha} \, \right] \; = \; 1 \; + \; \exp\left[ \, - \, \beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, \right]
\tag{968}
\]
Therefore, the Grand-Canonical Partition Function is given by
\[
\Xi \; = \; \prod_{\alpha} \, \left( \, 1 \; + \; \exp\left[ \, - \, \beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \, \right] \, \right)
\tag{969}
\]

The Grand-Canonical Potential Ω is found from
\[
\Xi \; = \; \exp\left[ \, - \, \beta \, \Omega \, \right]
\tag{970}
\]
which, on taking the logarithm, yields
\[
- \, \beta \, \Omega \; = \; \sum_{\alpha} \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon_{\alpha} - \mu ) \right] \, \right)
\tag{971}
\]
or
\[
\Omega \; = \; - \, k_B \, T \, \sum_{\alpha} \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon_{\alpha} - \mu ) \right] \, \right)
\tag{972}
\]
On introducing the density of single-particle states ρ(ε) defined as
\[
\rho(\epsilon) \; = \; \sum_{\alpha} \, \delta( \, \epsilon \, - \, \epsilon_{\alpha} \, )
\tag{973}
\]
the summation can be transformed into an integral over ε
\[
\Omega \; = \; - \, k_B \, T \, \int_{\epsilon_0}^{\infty} d\epsilon \; \rho(\epsilon) \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right)
\tag{974}
\]
where the density of states is bounded from below by ε_0. After evaluating the
integral, one may find all the thermodynamic properties of the system from Ω.
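The factorization carried out in Eqs. (965)-(969) can be confirmed by brute force for a small number of levels: summing the Boltzmann weight over all 2^M occupation configurations must reproduce the product over levels. A sketch (plain Python; the level energies, β, and μ are arbitrary illustrative values, not from the notes):

```python
import itertools
import math

beta, mu = 2.0, 0.3
eps = [0.1, 0.5, 0.9, 1.4]      # arbitrary single-particle levels

# Unrestricted trace of Eq. (965): every configuration {n_alpha}, n_alpha in {0, 1}
Xi_trace = sum(
    math.exp(-beta * sum((e - mu) * n for e, n in zip(eps, occ)))
    for occ in itertools.product((0, 1), repeat=len(eps))
)

# Factorized product over levels, Eq. (969)
Xi_product = math.prod(1.0 + math.exp(-beta * (e - mu)) for e in eps)

print(abs(Xi_trace - Xi_product) < 1e-12)   # -> True
```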

8.2 The Fermi-Dirac Distribution Function

The average number of particles in the system, N, can be determined from Ω via
the relation
\[
N \; = \; - \, \left( \frac{ \partial \Omega }{ \partial \mu } \right)_{T,V}
\tag{975}
\]
which is evaluated as
\[
N \; = \; k_B \, T \, \beta \, \int d\epsilon \; \rho(\epsilon) \, \frac{ \exp\left[ - \beta ( \epsilon - \mu ) \right] }{ 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] } \; = \; \int d\epsilon \; \rho(\epsilon) \, \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 }
\tag{976}
\]
or equivalently
\[
N \; = \; \sum_{\alpha} \, \frac{ 1 }{ \exp\left[ \beta ( \epsilon_{\alpha} - \mu ) \right] \, + \, 1 }
\tag{977}
\]
but, by definition, one has
\[
N \; = \; \sum_{\alpha} \, \overline{n}_{\alpha}
\tag{978}
\]
Therefore, the function f(ε_α) defined by
\[
f( \epsilon_{\alpha} ) \; = \; \frac{ 1 }{ \exp\left[ \beta ( \epsilon_{\alpha} - \mu ) \right] \, + \, 1 }
\tag{979}
\]
represents the average number of particles n̄_α in a quantum level with a single-particle energy ε_α. The function f(ε) is the Fermi-Dirac distribution function.
The Fermi-Dirac distribution vanishes as
\[
f(\epsilon) \; \rightarrow \; \exp\left[ \, - \, \beta \, ( \, \epsilon \, - \, \mu \, ) \, \right]
\tag{980}
\]
for ε − μ ≫ k_B T. However, for μ − ε ≫ k_B T, the Fermi-Dirac distribution
function tends to unity
\[
f(\epsilon) \; \rightarrow \; 1 \; - \; \exp\left[ \, - \, \beta \, ( \, \mu \, - \, \epsilon \, ) \, \right]
\tag{981}
\]
and falls off rapidly from 1 to 0 at ε = μ, where it takes on the value
\[
f(\mu) \; = \; \frac{1}{2}
\tag{982}
\]
The range of ε over which the function differs from either 1 or 0 is governed by
k_B T.

Figure 39: The Fermi-Dirac distribution function.

Figure 40: The negative energy derivative of the Fermi-Dirac distribution function.


At zero temperature, all the states with single-particle energies less than μ
are occupied and all states with energies greater than μ are empty. The value
of μ(T = 0) is called the Fermi-energy and is denoted by ε_F. Note that since
\[
\int_{-\infty}^{\infty} d\epsilon \; \left( \, - \, \frac{ \partial f(\epsilon) }{ \partial \epsilon } \, \right) \; = \; 1
\tag{983}
\]
and as T → 0 one has
\[
- \, \frac{ \partial f(\epsilon) }{ \partial \epsilon } \; \rightarrow \; 0 \hspace{1cm} {\rm if} \;\; | \, \epsilon \, - \, \mu \, | \; > \; k_B \, T
\tag{984}
\]
then
\[
- \, \frac{ \partial f(\epsilon) }{ \partial \epsilon } \; \rightarrow \; \delta( \, \epsilon \, - \, \mu \, )
\tag{985}
\]
as it resembles the Dirac delta function.
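The δ-function limit can be checked numerically: the area under −∂f/∂ε is unity at any temperature, while the width of the peak shrinks with k_B T. A sketch (plain Python; the midpoint-rule grid is an ad hoc assumption of the illustration):

```python
import math

def minus_df(e, mu, kT):
    """Analytic -df/de for the Fermi-Dirac function."""
    x = (e - mu) / kT
    return 1.0 / (kT * (math.exp(x / 2) + math.exp(-x / 2)) ** 2)

mu = 1.0
for kT in (0.1, 0.01):
    de = kT / 200.0            # grid spacing resolving the thermal width
    area = sum(minus_df(mu - 30 * kT + (i + 0.5) * de, mu, kT) * de
               for i in range(12000))          # window mu +/- 30 kT
    print(round(area, 6))      # -> 1.0 at each temperature (Eq. 983)
```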

Thermodynamic Properties
The thermodynamic energy U coincides with the average energy E. The
thermodynamic energy can be obtained from the Grand-Canonical Potential
via its definition
\[
\Omega \; = \; U \; - \; T \, S \; - \; \mu \, N
\tag{986}
\]
which, together with
\[
S \; = \; - \, \left( \frac{ \partial \Omega }{ \partial T } \right)_{\mu,V} \hspace{1cm} N \; = \; - \, \left( \frac{ \partial \Omega }{ \partial \mu } \right)_{T,V}
\tag{987}
\]
can be inverted to yield
\[
U \; = \; \Omega \; + \; T \, S \; + \; \mu \, N \; = \; \Omega \; - \; T \, \left( \frac{ \partial \Omega }{ \partial T } \right)_{\mu,V} \; - \; \mu \, \left( \frac{ \partial \Omega }{ \partial \mu } \right)_{T,V}
\tag{988}
\]
On substituting
\[
\Omega \; = \; - \, k_B \, T \, \int d\epsilon \; \rho(\epsilon) \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right)
\tag{989}
\]
one obtains
\[
U \; = \; \int d\epsilon \; \rho(\epsilon) \, ( \epsilon - \mu ) \, \frac{ \exp\left[ - \beta ( \epsilon - \mu ) \right] }{ 1 + \exp\left[ - \beta ( \epsilon - \mu ) \right] } \; + \; \mu \, \int d\epsilon \; \rho(\epsilon) \, \frac{ \exp\left[ - \beta ( \epsilon - \mu ) \right] }{ 1 + \exp\left[ - \beta ( \epsilon - \mu ) \right] }
\]
\[
\; = \; \int d\epsilon \; \rho(\epsilon) \, \frac{ \epsilon }{ \exp\left[ \beta ( \epsilon - \mu ) \right] + 1 } \; = \; \int d\epsilon \; \rho(\epsilon) \; \epsilon \; f(\epsilon) \; = \; \sum_{\alpha} \epsilon_{\alpha} \, f(\epsilon_{\alpha}) \; = \; \sum_{\alpha} \epsilon_{\alpha} \, \overline{n}_{\alpha}
\tag{990}
\]
which shows that the thermodynamic energy for a system of particles is just the
average energy, since the average energy of a system of non-interacting particles
is just the sum over the average energies for each particle. This reinforces the
interpretation of f(ε) as the average number of fermions in a single-particle state
with energy ε.
The entropy S is determined from the equation
\[
S \; = \; - \, \left( \frac{ \partial \Omega }{ \partial T } \right)_{\mu,V}
\tag{991}
\]
which yields
\[
S \; = \; k_B \, \sum_{\alpha} \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon_{\alpha} - \mu ) \right] \, \right) \; + \; \frac{1}{T} \, \sum_{\alpha} ( \epsilon_{\alpha} - \mu ) \, \frac{ \exp\left[ - \beta ( \epsilon_{\alpha} - \mu ) \right] }{ 1 + \exp\left[ - \beta ( \epsilon_{\alpha} - \mu ) \right] }
\]
\[
\; = \; - \, k_B \, \sum_{\alpha} \ln\left( \, 1 \, - \, f(\epsilon_{\alpha}) \, \right) \; + \; k_B \, \sum_{\alpha} \beta \, ( \epsilon_{\alpha} - \mu ) \, f(\epsilon_{\alpha})
\tag{992}
\]
However, one may rewrite the factor
\[
\beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \; = \; \frac{ ( \, \epsilon_{\alpha} \, - \, \mu \, ) }{ k_B \, T }
\tag{993}
\]
as
\[
\beta \, ( \, \epsilon_{\alpha} \, - \, \mu \, ) \; = \; \ln \exp\left[ \beta ( \epsilon_{\alpha} - \mu ) \right] \; = \; \ln\left( \, 1 \, - \, f(\epsilon_{\alpha}) \, \right) \; - \; \ln f(\epsilon_{\alpha})
\tag{994}
\]
Therefore, on combining the above expressions, one finds that the entropy of the
non-interacting fermion gas can be expressed as
\[
S \; = \; - \, k_B \, \sum_{\alpha} \left[ \, ( \, 1 \, - \, f(\epsilon_{\alpha}) \, ) \, \ln\left( \, 1 \, - \, f(\epsilon_{\alpha}) \, \right) \; + \; f(\epsilon_{\alpha}) \, \ln f(\epsilon_{\alpha}) \, \right]
\tag{995}
\]

The entropy has the form of
\[
S \; = \; - \, k_B \, \sum \, p \, \ln p
\tag{996}
\]
where p_α = f(ε_α) is the probability that the α-th level is occupied and
p_α = 1 − f(ε_α) is the probability that the level is empty. This form of
the entropy follows naturally from the assumption that the non-interacting particles are statistically independent, together with the definition of the entropy.
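The per-level entropy implied by Eq. (995), s = −k_B [ (1−f) ln(1−f) + f ln f ], interpolates between zero for a definitely empty or definitely full level and a maximum of k_B ln 2 at f = 1/2. A quick check (plain Python, with k_B set to one for the illustration):

```python
import math

def level_entropy(f):
    """Per-level entropy of Eq. (995), in units of k_B."""
    s = 0.0
    for p in (f, 1.0 - f):     # p = f "occupied", p = 1 - f "empty" (Eq. 996)
        if p > 0.0:
            s -= p * math.log(p)
    return s

print(level_entropy(0.0), level_entropy(1.0))            # -> 0.0 0.0
print(abs(level_entropy(0.5) - math.log(2.0)) < 1e-12)   # -> True: maximum ln 2
```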

8.3 The Equation of State

The equation of state for a gas of non-interacting fermions can be found from Ω
by noting that
\[
\Omega \; = \; - \, P \, V
\tag{997}
\]
The equation of state can be obtained directly when the single-particle density
of states ρ(ε) has the form of a simple power law
\[
\rho(\epsilon) \; = \;
\begin{cases}
C \, \epsilon^{\alpha} & {\rm for} \; \epsilon \geq 0 \\
0 & {\rm otherwise}
\end{cases}
\tag{998}
\]
where α is a constant. The Grand-Canonical Potential is found as
\[
\Omega \; = \; - \, P \, V \; = \; - \, k_B \, T \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right)
\tag{999}
\]

Hence, one has
\[
\frac{ P \, V }{ k_B \, T } \; = \; C \, \int_0^{\infty} d\epsilon \; \epsilon^{\alpha} \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right) \; = \; \frac{ C }{ \alpha + 1 } \, \int_0^{\infty} d\epsilon \; \frac{ d \epsilon^{(\alpha+1)} }{ d \epsilon } \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right)
\tag{1000}
\]
On integrating by parts, one obtains
\[
\frac{ P \, V }{ k_B \, T } \; = \; \frac{ C }{ \alpha + 1 } \, \left[ \, \epsilon^{\alpha+1} \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right) \, \right]_0^{\infty} \; + \; \frac{ C }{ \alpha + 1 } \, \int_0^{\infty} d\epsilon \; \epsilon^{(\alpha+1)} \, \frac{ \beta \, \exp\left[ - \beta ( \epsilon - \mu ) \right] }{ 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] }
\tag{1001}
\]
The boundary terms vanish, since the density of states vanishes at the lower
limit of integration ε = 0 and the logarithmic factor vanishes exponentially
when ε → ∞. Thus, on canceling a factor of β, one finds
\[
P \, V \; = \; \frac{ 1 }{ \alpha + 1 } \, \int_0^{\infty} d\epsilon \; C \, \epsilon^{(\alpha+1)} \, \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 } \; = \; \frac{ 1 }{ \alpha + 1 } \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \; \epsilon \; \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 } \; = \; \frac{ U }{ \alpha + 1 }
\tag{1002}
\]
That is, the equation of state for the system of non-interacting fermions is found
as
\[
P \, V \; = \; \frac{ U }{ \alpha + 1 }
\tag{1003}
\]
For α = 1/2, the relation is identical to that found for the classical ideal gas.
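Equation (1003) can be tested numerically for the free-electron-like case α = 1/2: evaluating PV = −Ω from Eq. (999) and U independently by quadrature should give PV = (2/3)U. A sketch (plain Python; C, β, and μ are arbitrary illustrative values, and the midpoint grid is an assumption of the illustration):

```python
import math

C, alpha, beta, mu = 1.0, 0.5, 5.0, 1.0    # arbitrary illustrative parameters
f = lambda e: 1.0 / (math.exp(beta * (e - mu)) + 1.0)
rho = lambda e: C * e**alpha                # power-law density of states, Eq. (998)

# Midpoint quadrature on [0, e_max]; the integrands are negligible beyond
# e_max = mu + 40/beta because of the Fermi factor.
e_max, n = mu + 40.0 / beta, 200_000
de = e_max / n

PV = U = 0.0
for i in range(n):
    e = (i + 0.5) * de
    PV += rho(e) * math.log(1.0 + math.exp(-beta * (e - mu))) * de / beta  # -Omega
    U  += rho(e) * e * f(e) * de                                           # Eq. (990)

print(abs(PV - U / (alpha + 1.0)) / U < 1e-6)   # PV = U/(alpha+1), Eq. (1003)
```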
In fact, the high-temperature limit of the equation of state for the system of
non-interacting fermions can be evaluated as
\[
\Omega \; = \; - \, P \, V \; = \; - \, k_B \, T \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \right)
\]
\[
\; \approx \; - \, k_B \, T \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \; \approx \; - \, k_B \, T \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 } \; = \; - \, N \, k_B \, T
\tag{1004}
\]
since at high temperatures μ < 0, and since we have assumed that the single-particle density of states is zero below ε = 0. Therefore, we have re-derived
the ideal gas law from the high-temperature limit of a set of non-interacting
particles which obey Fermi-Dirac statistics.


8.4 The Chemical Potential

We have seen that, at high temperatures, the equation of state of a gas of
particles obeying Fermi-Dirac statistics reduces to the equation of state for
particles obeying classical statistics if
\[
\exp\left[ \, \beta \, \mu \, \right] \; \ll \; 1
\tag{1005}
\]
(where we have restricted ε ≥ 0), since under these conditions the Fermi-Dirac
distribution function reduces to the Boltzmann distribution function. The above
restriction is also consistent with our previous discussion of the ideal gas, where
\[
\exp\left[ \, \beta \, \mu(T) \, \right] \; = \; \frac{ N \, \lambda^3 }{ V }
\tag{1006}
\]
We shall examine the temperature dependence of the chemical potential and
show that Fermi-Dirac statistics is similar to classical Maxwell-Boltzmann statistics at sufficiently high temperatures.
For a system with a fixed number of fermions, N, governed by a condition
like electrical neutrality, the chemical potential is temperature dependent
and is found as the solution μ(T) of the equation
\[
N \; = \; \int_0^{\infty} d\epsilon \; \rho(\epsilon) \; f(\epsilon)
\tag{1007}
\]
For large and negative values of β μ(T), one can expand the Fermi-Dirac distribution function
\[
N \; = \; \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \frac{ \exp\left[ - \beta ( \epsilon - \mu ) \right] }{ 1 \, + \, \exp\left[ - \beta ( \epsilon - \mu ) \right] } \; = \; \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \exp\left[ - \beta ( \epsilon - \mu ) \right] \, \sum_{n=0}^{\infty} \, ( - 1 )^n \, \exp\left[ - n \, \beta ( \epsilon - \mu ) \right]
\]
\[
\; = \; \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, \exp\left[ \, n \, \beta \, \mu \, \right] \, \int_0^{\infty} d\epsilon \; \rho(\epsilon) \, \exp\left[ - n \, \beta \, \epsilon \right]
\tag{1008}
\]

On substituting for the chemical potential in terms of the fugacity z
\[
z \; = \; \exp\left[ \, \beta \, \mu \, \right]
\tag{1009}
\]
and on introducing an expression for the density of states
\[
\rho(\epsilon) \; = \; C \, \epsilon^{\alpha}
\tag{1010}
\]
one has
\[
N \; = \; C \, \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, z^n \, \int_0^{\infty} d\epsilon \; \epsilon^{\alpha} \, \exp\left[ - n \, \beta \, \epsilon \right]
\tag{1011}
\]
On changing the variable of integration from ε to the dimensionless variable x,
where x is defined as
\[
x \; = \; n \, \beta \, \epsilon
\tag{1012}
\]
the expression for the average number of particles has the form
\[
N \; = \; C \, ( k_B T )^{\alpha+1} \, \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, \frac{ z^n }{ n^{\alpha+1} } \, \int_0^{\infty} dx \; x^{\alpha} \, \exp\left[ - x \right] \; = \; C \, ( k_B T )^{\alpha+1} \, \Gamma( \alpha + 1 ) \, \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, \frac{ z^n }{ n^{\alpha+1} }
\tag{1013}
\]

where Γ(x) is the Gamma function. Since one can re-write the above equation
as
\[
\frac{ N }{ \Gamma( \alpha + 1 ) \; C \; ( k_B T )^{\alpha+1} } \; = \; \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, \frac{ z^n }{ n^{\alpha+1} }
\tag{1014}
\]
one can see that, for fixed N and high T, z must be small. In the case where
z ≪ 1, which occurs for sufficiently high temperatures, one may retain only the
first term in the power series in z. This leads to the solution for z and, hence,
the chemical potential
\[
z \; \approx \; \frac{ N }{ \Gamma( \alpha + 1 ) \; C \; ( k_B T )^{\alpha+1} }
\tag{1015}
\]
or alternatively
\[
\mu(T) \; \approx \; - \, k_B \, T \, \ln\left( \frac{ \Gamma( \alpha + 1 ) \; C \; ( k_B T )^{\alpha+1} }{ N } \right)
\tag{1016}
\]

which illustrates that μ(T) must be temperature dependent. Furthermore, we
see that, at sufficiently high temperatures, only the first term in the expansion
of the Fermi-Dirac distribution contributes. In this limit, Fermi-Dirac statistics reduces to classical Maxwell-Boltzmann statistics. We shall see later that
something similar happens in the high-temperature limit with Bose-Einstein
statistics.
One also sees that, on decreasing the temperature downwards from
the high-temperature limit, z increases. Therefore, one must retain an
increasing number of terms in the expansion
\[
\frac{ N }{ \Gamma( \alpha + 1 ) \; C \; ( k_B T )^{\alpha+1} } \; = \; \sum_{n=1}^{\infty} \, ( - 1 )^{n+1} \, \frac{ z^n }{ n^{\alpha+1} }
\tag{1017}
\]
if one wants to determine the chemical potential accurately at lower temperatures. The reversion of the series is only practical if z ≤ 1; in this regime
the Fermi-Dirac gas is said to be non-degenerate. For temperatures below the
degeneracy temperature, at which μ = 0 and therefore z = 1, the chemical
potential must be found by other methods.
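Below the degeneracy temperature, where the series in z is useless, μ(T) can still be obtained by solving Eq. (1007) numerically, for instance by bisection. A sketch for ρ(ε) = C√ε (plain Python; the parameter values and quadrature grid are assumptions of the illustration):

```python
import math

C, kT = 1.0, 0.05                       # arbitrary units; rho(e) = C sqrt(e)
rho = lambda e: C * math.sqrt(e)

def particle_number(mu, e_max=20.0, n=20_000):
    """Midpoint quadrature of N = integral of rho(e) f(e) de  (Eq. 1007)."""
    de = e_max / n
    total = 0.0
    for i in range(n):
        e = (i + 0.5) * de
        x = (e - mu) / kT
        total += 0.0 if x > 500.0 else rho(e) * de / (math.exp(x) + 1.0)
    return total

# Fix N at its T = 0 value for a Fermi energy eF = 1:  N = (2/3) C eF^(3/2)
N_target, lo, hi = 2.0 / 3.0, 0.0, 5.0
for _ in range(60):                     # bisection on mu(T)
    mid = 0.5 * (lo + hi)
    if particle_number(mid) < N_target:
        lo = mid
    else:
        hi = mid

mu_T = 0.5 * (lo + hi)
print(round(mu_T, 3))    # slightly below eF = 1, since d rho / d e > 0 here
```

The downward shift of μ(T) from ε_F is consistent with the sign argument around Eq. (1055) below, since ∂ρ/∂ε > 0 for this density of states.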

8.5 The Sommerfeld Expansion

Many thermodynamic properties of a system of electrons can be written in the
form
\[
A(T) \; = \; \int_{-\infty}^{\infty} d\epsilon \; \gamma(\epsilon) \; f(\epsilon)
\tag{1018}
\]
where γ(ε) is some function of the single-particle energy ε multiplied by the
single-particle density of states ρ(ε), and f(ε) is the Fermi-function.
Integrals of this form can be evaluated very accurately if the temperature
T is of the order of room temperature, T ∼ 300 K, which corresponds to an
energy scale of
\[
k_B \, T \; \sim \; \frac{1}{40} \; {\rm eV}
\tag{1019}
\]
and if the typical energy scale for γ(ε) is between 1 and 10 eV. Typical electronic
scales are given by the binding energy of an electron in a Hydrogen atom, 13.6
eV, or the total band width in a transition metal, which is about 10 eV. In Na,
the chemical potential measured from the bottom of the valence band density
of states is about 3 eV, and is about 12 eV for Al. Clearly, under ambient
conditions in a metal one has
\[
\mu \; \gg \; k_B \, T
\tag{1020}
\]
so a metal can usually be thought of as being below its degeneracy temperature.


The Sommerfeld expansion29 expresses integrals of the form
\[
A(T) \; = \; \int_{-\infty}^{\infty} d\epsilon \; \gamma(\epsilon) \; f(\epsilon)
\tag{1021}
\]
in terms of a sum of the T = 0 limit of the integral and a power series in k_B T / μ.
As a first approximation, one can estimate A(T) as
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; 0
\tag{1022}
\]
since at T = 0, one can write the Fermi-function as
\[
f(\epsilon) \; \approx \; 1 \hspace{0.5cm} {\rm for} \; \epsilon < \mu(T) \hspace{1.5cm} f(\epsilon) \; \approx \; 0 \hspace{0.5cm} {\rm for} \; \epsilon > \mu(T)
\tag{1023}
\]

29 A. Sommerfeld, Zur Elektronentheorie der Metalle auf Grund der Fermischen Statistik, Zeitschrift für Physik 47, 1-3, (1928).

We would like to obtain a better approximation, which reflects the temperature
dependence of the Fermi-function f(ε). A better approximation for A(T) can
be obtained by re-writing the exact expression for A(T) as
\[
A(T) \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \, \left( \, f(\epsilon) \, - \, 1 \, \right) \; + \; \int_{\mu(T)}^{\infty} d\epsilon \; \gamma(\epsilon) \, \left( \, f(\epsilon) \, - \, 0 \, \right)
\tag{1024}
\]
In this we have included the exact corrections to the T = 0 approximation for
each region of the integral. This is evaluated as
\[
A(T) \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; - \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \, \frac{ 1 }{ \exp\left[ \beta ( \mu - \epsilon ) \right] \, + \, 1 } \; + \; \int_{\mu(T)}^{\infty} d\epsilon \; \gamma(\epsilon) \, \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 }
\tag{1025}
\]
where we have substituted the identity
\[
1 \; - \; \frac{ 1 }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 } \; = \; \frac{ \exp\left[ \beta ( \epsilon - \mu ) \right] }{ \exp\left[ \beta ( \epsilon - \mu ) \right] \, + \, 1 } \; = \; \frac{ 1 }{ \exp\left[ \beta ( \mu - \epsilon ) \right] \, + \, 1 }
\tag{1026}
\]

in the second term of the first line. The two temperature-dependent correction
terms in A(T) involve a function of the form
\[
\frac{ 1 }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1027}
\]
We shall set x = β ( ε − μ(T) ), or
\[
\epsilon \; = \; \mu(T) \; + \; k_B \, T \, x
\tag{1028}
\]
in the second correction term, which yields
\[
A(T) \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; - \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \, \frac{ 1 }{ \exp\left[ \beta ( \mu - \epsilon ) \right] \, + \, 1 } \; + \; k_B \, T \, \int_0^{\infty} dx \; \gamma( \mu + k_B T x ) \, \frac{ 1 }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1029}
\]

The first correction term can be expressed in terms of the variable y = β ( μ(T) − ε ), or
\[
\epsilon \; = \; \mu(T) \; - \; k_B \, T \, y
\tag{1030}
\]
and the boundaries of the integration over y run from 0 to ∞.
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; - \; k_B \, T \, \int_0^{\infty} dy \; \gamma( \mu - k_B T y ) \, \frac{ 1 }{ \exp\left[ \, y \, \right] \, + \, 1 } \; + \; k_B \, T \, \int_0^{\infty} dx \; \gamma( \mu + k_B T x ) \, \frac{ 1 }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1031}
\]
Except for the terms +k_B T x and −k_B T y in the arguments of the function γ,
the correction terms would cancel and vanish. On changing the integration
variable from y to x in the second term, the integrals can be combined as
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; k_B \, T \, \int_0^{\infty} dx \; \left[ \, \gamma( \mu + k_B T x ) \; - \; \gamma( \mu - k_B T x ) \, \right] \, \frac{ 1 }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1032}
\]

On Taylor expanding the terms in the large square parenthesis, one finds that
only the odd terms in k_B T x survive.
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; 2 \, k_B \, T \, \int_0^{\infty} dx \; \sum_{n=0}^{\infty} \, \frac{ ( \, k_B T \, x \, )^{2n+1} }{ ( 2 n + 1 ) ! } \, \left. \frac{ \partial^{2n+1} \gamma(\epsilon) }{ \partial \epsilon^{2n+1} } \right|_{\mu(T)} \, \frac{ 1 }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1033}
\]
On interchanging the order of the summation and the integration, one obtains
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; 2 \, \sum_{n=0}^{\infty} \, \frac{ ( k_B T )^{2n+2} }{ ( 2 n + 1 ) ! } \, \left. \frac{ \partial^{2n+1} \gamma(\epsilon) }{ \partial \epsilon^{2n+1} } \right|_{\mu(T)} \, \int_0^{\infty} dx \; \frac{ x^{2n+1} }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1034}
\]
where all the derivatives of γ are to be evaluated at μ(T). The integrals over
x are convergent. One should note that the power series only contains terms of
even powers in k_B T. Since the derivatives of γ, such as
\[
\frac{ \partial^{2n+1} \gamma(\epsilon) }{ \partial \epsilon^{2n+1} }
\tag{1035}
\]
have the dimensions of γ divided by an energy scale to the power 2n+1, one might think of this expansion as being in
powers of the dimensionless quantity
\[
\left( \frac{ k_B \, T }{ \mu } \right)^2
\tag{1036}
\]
which is assumed to be much smaller than unity. Therefore, the series could be
expected to be rapidly convergent.
The integral
\[
\int_0^{\infty} dx \; \frac{ x^{2n+1} }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1037}
\]
is convergent, since it is just the area under a curve that varies as x^{2n+1} for
small x and vanishes as exp[−x] x^{2n+1} for large x.

Figure 41: The integrand x^n / ( exp[x] + 1 ).

The integral can be evaluated by considering I_m given by
\[
I_m \; = \; \int_0^{\infty} dx \; \frac{ x^m }{ \exp\left[ \, x \, \right] \, + \, 1 }
\tag{1038}
\]
and noting that, since x > 0, then exp[−x] < 1. Therefore, by rewriting the
integral as
\[
I_m \; = \; \int_0^{\infty} dx \; \frac{ x^m }{ \exp\left[ \, x \, \right] \, \left( \, 1 \, + \, \exp\left[ - x \right] \, \right) } \; = \; \int_0^{\infty} dx \; \frac{ x^m \, \exp\left[ - x \right] }{ 1 \, + \, \exp\left[ - x \right] }
\tag{1039}
\]
one can expand the integrand in powers of exp[−x]
\[
I_m \; = \; \int_0^{\infty} dx \; x^m \, \exp\left[ - x \right] \, \sum_{l=0}^{\infty} \, ( - 1 )^l \, \exp\left[ - l x \right] \; = \; \sum_{l=0}^{\infty} \, ( - 1 )^l \, \int_0^{\infty} dx \; x^m \, \exp\left[ - ( l + 1 ) x \right]
\tag{1040}
\]

On changing the variable of integration from x to y, where
\[
y \; = \; ( \, l \, + \, 1 \, ) \, x
\tag{1041}
\]
one has
\[
I_m \; = \; \sum_{l=0}^{\infty} \, \frac{ ( - 1 )^l }{ ( \, l \, + \, 1 \, )^{m+1} } \, \int_0^{\infty} dy \; y^m \, \exp\left[ - y \right]
\tag{1042}
\]
The integral ∫ dy y^m exp[−y] can be evaluated by successive integration by
parts. That is,
\[
\int_0^{\infty} dy \; y^m \, \exp\left[ - y \right] \; = \; - \, \int_0^{\infty} dy \; y^m \, \frac{ \partial }{ \partial y } \exp\left[ - y \right] \; = \; - \, \left[ \, y^m \, \exp\left[ - y \right] \, \right]_0^{\infty} \; + \; \int_0^{\infty} dy \; m \, y^{m-1} \, \exp\left[ - y \right]
\tag{1043}
\]
The boundary term vanishes like y^m near y = 0 and vanishes like exp[−y] when
y → ∞. Therefore,
\[
\int_0^{\infty} dy \; y^m \, \exp\left[ - y \right] \; = \; m \, \int_0^{\infty} dy \; y^{m-1} \, \exp\left[ - y \right] \; = \; m ! \, \int_0^{\infty} dy \; \exp\left[ - y \right] \; = \; m !
\tag{1044}
\]
Thus, we have
\[
I_m \; = \; m ! \, \sum_{l=0}^{\infty} \, \frac{ ( - 1 )^l }{ ( \, l \, + \, 1 \, )^{m+1} }
\tag{1045}
\]
However, since the Riemann zeta function is defined by
\[
\zeta( m + 1 ) \; = \; \sum_{l=0}^{\infty} \, \frac{ 1 }{ ( \, l \, + \, 1 \, )^{m+1} }
\tag{1046}
\]
one has
\[
I_m \; = \; m ! \, \left( \, 1 \, - \, \frac{ 2 }{ 2^{m+1} } \, \right) \, \zeta( m + 1 )
\tag{1047}
\]
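The closed form (1047) can be spot-checked by direct quadrature: I_1 = π²/12 ≈ 0.8225 and I_3 = 7π⁴/120 ≈ 5.682. A sketch (plain Python, midpoint rule; the grid and cutoff are assumptions of the illustration):

```python
import math

def I(m, x_max=60.0, n=200_000):
    """Midpoint quadrature of Eq. (1038): integral of x^m / (e^x + 1) over (0, inf)."""
    dx = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += x**m / (math.exp(x) + 1.0) * dx
    return total

# Closed form, Eq. (1047), using zeta(2) = pi^2/6 and zeta(4) = pi^4/90 (Eq. 1050)
I1_exact = 1 * (1 - 2**-1) * math.pi**2 / 6     # 1! (1 - 1/2) zeta(2) = pi^2/12
I3_exact = 6 * (1 - 2**-3) * math.pi**4 / 90    # 3! (1 - 1/8) zeta(4) = 7 pi^4/120

print(abs(I(1) - I1_exact) < 1e-6)   # -> True
print(abs(I(3) - I3_exact) < 1e-6)   # -> True
```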

Therefore, the Sommerfeld expansion takes the form
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; 2 \, \sum_{n=0}^{\infty} \, \left( \, 1 \, - \, \frac{ 1 }{ 2^{(2n+1)} } \, \right) \, \zeta( 2 ( n + 1 ) ) \; ( k_B T )^{2n+2} \, \left. \frac{ \partial^{2n+1} \gamma(\epsilon) }{ \partial \epsilon^{2n+1} } \right|_{\mu(T)}
\tag{1048}
\]
which can be written as
\[
A(T) \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; 2 \, \sum_{n=1}^{\infty} \, \left( \, 1 \, - \, \frac{ 1 }{ 2^{(2n-1)} } \, \right) \, \zeta( 2 n ) \; ( k_B T )^{2n} \, \left. \frac{ \partial^{2n-1} \gamma(\epsilon) }{ \partial \epsilon^{2n-1} } \right|_{\mu(T)}
\tag{1049}
\]

where μ(T) is the value of the chemical potential at temperature T, and the
Riemann ζ function has the values
\[
\zeta(2) \; = \; \frac{ \pi^2 }{ 6 } \hspace{1cm} \zeta(4) \; = \; \frac{ \pi^4 }{ 90 } \hspace{1cm} \zeta(6) \; = \; \frac{ \pi^6 }{ 945 }
\tag{1050}
\]
etc. Thus, at sufficiently low temperatures, one expects that one might be able
to approximate A(T) by the Sommerfeld expansion
\[
A(T) \; \approx \; \int_{-\infty}^{\mu(T)} d\epsilon \; \gamma(\epsilon) \; + \; \frac{ \pi^2 }{ 6 } \, ( k_B T )^2 \, \left. \frac{ \partial \gamma(\epsilon) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\tag{1051}
\]

8.6 The Low-Temperature Specific Heat of an Electron Gas

The condition of electrical neutrality determines the number of electrons in a
metal and keeps the number constant. The number of electrons N is given by
\[
N \; = \; \int_{-\infty}^{\infty} d\epsilon \; \rho(\epsilon) \; f(\epsilon)
\tag{1052}
\]
which can be approximated by the first few terms in the Sommerfeld expansion
\[
N \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \rho(\epsilon) \; + \; \frac{ \pi^2 }{ 6 } \, ( k_B T )^2 \, \left. \frac{ \partial \rho(\epsilon) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\tag{1053}
\]
This yields the temperature dependence of μ(T). Since N is independent of
temperature,
\[
\frac{ \partial N }{ \partial T } \; = \; 0 \; = \; \left( \frac{ \partial \mu }{ \partial T } \right) \, \rho(\mu) \; + \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \left. \frac{ \partial \rho(\epsilon) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\tag{1054}
\]


Thus, we have found that
\[
\frac{ \partial \mu }{ \partial T } \; = \; - \, \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \frac{ 1 }{ \rho(\mu) } \, \left. \frac{ \partial \rho(\epsilon) }{ \partial \epsilon } \right|_{\mu(T)}
\tag{1055}
\]

which implies that the derivative of μ w.r.t. T has the opposite sign to ∂ρ/∂ε.
Thus, if μ(T = 0) is just below a peak in ρ(ε), then the integral expression for
N runs over a range of ε which avoids the peak. If μ did not decrease with
increasing T, then at finite temperatures the peak, when multiplied by the tail
of the Fermi-function, could give an extra contribution to N.

Figure 42: The density of states ρ(ε) and the density of states weighted by the
Fermi-Dirac distribution function, ρ(ε) f(ε).

This increase must be offset by moving μ(T) down from ε_F, so that the contribution from the tail at
the peak is smaller and is offset by the smaller area under the curve up to μ(T).
Similar reasoning applies for the increase in μ(T) if ε_F is located just above a
peak in ρ(ε). However, if ε_F is located at the top of a symmetric peak in the
density of states, then the chemical potential should not depend on temperature.
The internal energy can also be expressed as
\[
E \; = \; \int_{-\infty}^{\infty} d\epsilon \; \rho(\epsilon) \; \epsilon \; f(\epsilon)
\tag{1056}
\]
which can be approximated by the first few terms in the Sommerfeld expansion
\[
E \; = \; \int_{-\infty}^{\mu(T)} d\epsilon \; \rho(\epsilon) \; \epsilon \; + \; \frac{ \pi^2 }{ 6 } \, ( k_B T )^2 \, \left. \frac{ \partial ( \, \rho(\epsilon) \, \epsilon \, ) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\tag{1057}
\]
The specific heat is given by the temperature derivative of the internal energy
at fixed V
\[
C_V \; = \; \left( \frac{ \partial E }{ \partial T } \right)_V \; = \; \frac{ \partial }{ \partial T } \int_{-\infty}^{\mu(T)} d\epsilon \; \rho(\epsilon) \; \epsilon \; + \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \left. \frac{ \partial ( \, \rho(\epsilon) \, \epsilon \, ) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\]
\[
\; = \; \mu \, \rho(\mu) \, \frac{ \partial \mu }{ \partial T } \; + \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \left. \frac{ \partial ( \, \rho(\epsilon) \, \epsilon \, ) }{ \partial \epsilon } \right|_{\mu(T)} \; + \; \dots
\tag{1058}
\]
On substituting for ∂μ/∂T, one finds
\[
C_V \; = \; - \, \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \mu \, \left. \frac{ \partial \rho }{ \partial \epsilon } \right|_{\mu} \; + \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \left( \, \rho(\mu) \, + \, \mu \, \left. \frac{ \partial \rho }{ \partial \epsilon } \right|_{\mu} \, \right) \; = \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \rho(\mu) \; + \; O( T^3 )
\tag{1059}
\]
since, on expanding ∂(ρ ε)/∂ε, one finds that the term μ ∂ρ/∂ε cancels with the term
coming from the temperature dependence of the chemical potential. Hence, the
low-temperature electronic specific heat at constant volume is linearly proportional to temperature, and the coefficient involves the density of states at the
Fermi-energy.
\[
C_V \; \approx \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \rho(\mu)
\tag{1060}
\]
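Equation (1060) can be tested numerically in a case where the chemical-potential shift is negligible: for a flat density of states, ∂ρ/∂ε = 0, so by Eq. (1055) μ is temperature independent up to exponentially small terms, and C_V may be computed at fixed μ. The sketch below (plain Python, k_B = 1; the parameters and quadrature grid are assumptions of the illustration) compares a centered difference of U(T) with (π²/3) k_B² T ρ₀:

```python
import math

rho0, mu = 1.0, 1.0            # flat density of states rho(e) = rho0, k_B = 1

def U(kT, e_max=30.0, n=100_000):
    """U = integral of rho(e) e f(e) de  (Eq. 990), by midpoint quadrature."""
    de = e_max / n
    total = 0.0
    for i in range(n):
        e = (i + 0.5) * de
        x = min((e - mu) / kT, 500.0)          # guard the exponential
        total += rho0 * e * de / (math.exp(x) + 1.0)
    return total

kT, d = 0.05, 0.001
cv_numeric = (U(kT + d) - U(kT - d)) / (2 * d)     # centered finite difference
cv_sommerfeld = (math.pi**2 / 3) * kT * rho0       # Eq. (1060) with k_B = 1

print(abs(cv_numeric - cv_sommerfeld) / cv_sommerfeld < 0.01)   # -> True
```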

The above result is in contrast with the specific heat of a classical gas, which
is given by
\[
C_V \; = \; \frac{3}{2} \, N \, k_B
\tag{1061}
\]
The result found using quantum statistical mechanics,
\[
C_V \; \approx \; \frac{ \pi^2 }{ 3 } \, k_B \, ( k_B T ) \, \rho(\mu)
\tag{1062}
\]
is consistent with Nernst's law, as
\[
C_V \; = \; T \, \left( \frac{ \partial S }{ \partial T } \right)_V
\tag{1063}
\]
vanishes as T → 0, since S vanishes as T → 0. This occurs since the quantum
ground state is unique, so that S/N vanishes as T → 0. The uniqueness occurs
since the lowest-energy single-particle states are all occupied by one electron,
in accordance with the Pauli exclusion principle. Since there is no degeneracy,
S = 0.
The specific heat of the electron gas is proportional to T. This can be understood by considering the effect of the Pauli exclusion principle. In a classical
gas, where there is no exclusion principle, on supplying thermal energy to
the system, on average each particle acquires a kinetic energy of (3/2) k_B T. Hence,
the excitation energy of the system is proportional to k_B T and
\[
\Delta E \; = \; \frac{3}{2} \, N \, k_B \, T
\tag{1064}
\]
so
\[
C_V \; = \; \frac{ \partial \Delta E }{ \partial T } \; = \; \frac{3}{2} \, N \, k_B
\tag{1065}
\]

For fermions, if one supplies thermal energy to the system, only the electrons within k_B T of the Fermi-energy can be excited. An electron in an energy
level far below ε_F cannot be excited by k_B T, since the final state with higher
energy is already occupied; the Pauli exclusion principle forbids it to be
excited. However, electrons within k_B T of the Fermi-energy can be excited:
the initial state is occupied, but the final state is above the Fermi-energy and
can accept the excited electron.
Only the electrons within k_B T of the Fermi-energy can be excited. The
number of these electrons is approximately given by
\[
\rho( \epsilon_F ) \; k_B \, T
\tag{1066}
\]
Each of these electrons can be excited by the thermal energy k_B T, so the
increase in the system's energy is given by
\[
\Delta E \; = \; \rho( \epsilon_F ) \, ( \, k_B \, T \, )^2
\tag{1067}
\]
Hence, the specific heat is estimated as
\[
C_V \; = \; \frac{ \partial \Delta E }{ \partial T } \; = \; 2 \, \rho( \epsilon_F ) \, k_B^2 \, T
\tag{1068}
\]
which shows that the linear T dependence is due to the Pauli exclusion principle.


Similar arguments apply to other thermodynamic properties and to the transport coefficients of the electron gas. The states far from ε_F are inert, since they cannot
be excited by the small energies involved in the processes: their electrons
cannot move up in energy because the desired final states are already occupied.
The Pauli exclusion principle blocks these states from participating in processes
which involve low excitation energies. Thus, they don't participate in electrical
conduction, etc. These processes are all dominated by the states near ε_F; hence
they depend on the density of states evaluated at the Fermi-energy, ρ(μ), or its
derivatives. Hence, it may be useful to find other experimental properties which can be used to measure ρ(μ).

8.7 The Pauli Paramagnetic Susceptibility of an Electron Gas

The Pauli paramagnetic susceptibility provides an alternate measure of the
single-particle density of states at the Fermi-energy. The effect of the Pauli
exclusion principle limits the number of electrons that are allowed to flip their
spins to those within an energy of the order of μ_B H of the Fermi-energy.
Consider a gas of non-interacting electrons, each of which carries a spin
S = 1/2. These spins couple to an applied magnetic field H^z aligned along the
z-axis via the anomalous Zeeman interaction
\[
\hat{H}_{int} \; = \; - \, g \, \mu_B \, \hat{S}^z \, H^z
\tag{1069}
\]
where g = 2 is the gyromagnetic ratio. Hence, in the presence of the field, the
single-electron energy levels become
\[
\epsilon_{\alpha,\sigma} \; = \;
\begin{cases}
\epsilon_{\alpha} \, - \, \mu_B \, H^z & {\rm for} \; S^z = + \frac{1}{2} \\
\epsilon_{\alpha} \, + \, \mu_B \, H^z & {\rm for} \; S^z = - \frac{1}{2}
\end{cases}
\tag{1070}
\]
Therefore, the Grand-Canonical potential is given by
\[
\Omega \; = \; - \, k_B \, T \, \sum_{\alpha,\sigma} \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon_{\alpha} - \sigma \mu_B H^z - \mu ) \right] \, \right)
\tag{1071}
\]
where σ = ±1, which can be written as an integral over the (zero-field) density of states
\[
\Omega \; = \; - \, k_B \, T \, \frac{1}{2} \, \sum_{\sigma=\pm 1} \, \int d\epsilon \; \rho(\epsilon) \, \ln\left( \, 1 \, + \, \exp\left[ - \beta ( \epsilon - \sigma \mu_B H^z - \mu ) \right] \, \right)
\tag{1072}
\]
where the factor ρ(ε)/2 is the density of states per spin direction (for zero applied
field). The magnetization is defined as
\[
M^z \; = \; - \, \left( \frac{ \partial \Omega }{ \partial H^z } \right)
\tag{1073}
\]
which yields
\[
M^z \; = \; \frac{1}{2} \, \sum_{\sigma=\pm 1} \, \sigma \, \mu_B \, \int d\epsilon \; \rho(\epsilon) \, f( \epsilon - \sigma \mu_B H^z )
\tag{1074}
\]
This can simply be interpreted as μ_B times the excess of up-spin electrons over
down-spin electrons
\[
M^z \; = \; \mu_B \, \sum_{\alpha} \, \left( \, \overline{n}_{\alpha,\uparrow} \; - \; \overline{n}_{\alpha,\downarrow} \, \right)
\tag{1075}
\]
which vanishes when H^z → 0. The differential susceptibility is defined as

Figure 43: The spin-split single-electron density of states (red), in the presence
of a finite magnetic field. Due to the Pauli principle, the field can only realign
the spins of electrons which have energies within μ_B H^z of the Fermi-energy.
\[
\chi^{z,z} \; = \; \left( \frac{ \partial M^z }{ \partial H^z } \right)_{H^z = 0}
\tag{1076}
\]

which measures the linear field dependence of M^z. The susceptibility is evaluated
as
\[
\chi^{z,z} \; = \; - \, \frac{1}{2} \, \sum_{\sigma=\pm 1} \, \sigma^2 \, \mu_B^2 \, \int d\epsilon \; \rho(\epsilon) \, \frac{ \partial f }{ \partial \epsilon }( \epsilon - \sigma \mu_B H^z )
\tag{1077}
\]
which remains finite in the limit H^z → 0. In the limit of zero field, the
susceptibility simplifies to
\[
\chi^{z,z} \; = \; - \, \frac{1}{2} \, \sum_{\sigma=\pm 1} \, \mu_B^2 \, \int d\epsilon \; \rho(\epsilon) \, \frac{ \partial f }{ \partial \epsilon } \; = \; - \, \mu_B^2 \, \int d\epsilon \; \rho(\epsilon) \, \frac{ \partial f }{ \partial \epsilon }
\tag{1078}
\]

In the limit of zero temperature, one has
\[
- \, \frac{ \partial f }{ \partial \epsilon } \; = \; \delta( \, \epsilon \, - \, \mu \, )
\tag{1079}
\]
therefore, one finds
\[
\chi^{z,z} \; = \; \mu_B^2 \; \rho(\mu)
\tag{1080}
\]
Hence, the ratio of the specific heat and the susceptibility is given by
\[
\frac{ \mu_B^2 \, C_V }{ \chi^{z,z} \, T } \; = \; \frac{ \pi^2 \, k_B^2 }{ 3 }
\tag{1081}
\]
This relation is independent of ρ(μ) and provides a check on the theory. It is
satisfied for all simple metals and for most of the early transition metals. Thus
the low-temperature limit of χ^{z,z} is a measure of the density of states at the
Fermi-energy.
The leading temperature-dependent corrections to χ can be obtained from
the Sommerfeld expansion. The susceptibility is given by

    χ^{zz} = μ_B² ∫ dε ρ(ε) ( − ∂f/∂ε )                                  (1082)

and on integrating by parts, one obtains

    χ^{zz} = − μ_B² [ ρ(ε) f(ε) ]_{−∞}^{∞} + μ_B² ∫ dε ( ∂ρ/∂ε ) f(ε)
           = μ_B² ∫ dε ( ∂ρ/∂ε ) f(ε)                                    (1083)

since the boundary terms vanish. On using the Sommerfeld expansion, one
obtains the result

    χ^{zz} = μ_B² ρ(μ) + μ_B² ( π²/6 ) ( k_B T )² ( ∂²ρ/∂ε² )_μ + ...    (1084)

The corrections aren't important unless k_B T is comparable to the energy scale
over which ρ(ε) varies, i.e. the temperature is of the order of the energy over
which ρ varies. Thus, χ^{zz} is approximately temperature independent.
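As a numerical check on (1084), the following Python sketch (an illustration, not part of the original notes) compares the exact integral (1082) with the two-term Sommerfeld expansion. The model density of states ρ(ε) = √ε and the units k_B = μ_B = μ = 1 are assumptions made purely for the example:

```python
import numpy as np

# Sketch: compare chi = int rho(eps) (-df/deps) deps with its Sommerfeld
# expansion.  Assumed model: rho(eps) = sqrt(eps); units k_B = mu_B = mu = 1.
mu, T = 1.0, 0.05
beta = 1.0 / T

eps = np.linspace(0.0, 3.0, 30001)
rho = np.sqrt(eps)

# -df/deps = (beta/4) sech^2[ beta (eps - mu) / 2 ]; clip to avoid overflow
arg = np.clip(beta * (eps - mu) / 2.0, -300.0, 300.0)
minus_df = (beta / 4.0) / np.cosh(arg) ** 2

def trapz(y, x):
    # simple trapezoidal quadrature
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

chi_exact = trapz(rho * minus_df, eps)

# Two-term Sommerfeld expansion (1084): rho(mu) + (pi^2/6) T^2 rho''(mu),
# with rho''(eps) = -(1/4) eps^(-3/2) for this model
chi_sommerfeld = np.sqrt(mu) - (np.pi**2 / 6.0) * T**2 * 0.25 * mu**-1.5

print(chi_exact, chi_sommerfeld)
```

At k_B T = 0.05 μ the two agree to better than one part in 10⁴, consistent with the neglected corrections being of order (k_B T/μ)⁴.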

8.8    The High-Temperature Limit of the Susceptibility

The high-temperature limit of the susceptibility

    χ^{zz} = μ_B² ∫ dε ρ(ε) ( − ∂f/∂ε )                                  (1085)

can be found by using the high-temperature approximation

    f(ε) ≈ exp[ − β ( ε − μ ) ]                                          (1086)

which yields the approximation

    χ^{zz} ≈ β μ_B² ∫ dε ρ(ε) exp[ − β ( ε − μ ) ]                       (1087)

which is evaluated as

    χ^{zz} ≈ β μ_B² N                                                    (1088)

on utilizing the expression for the number of electrons

    N = ∫ dε ρ(ε) exp[ − β ( ε − μ ) ]                                   (1089)

Hence, the Pauli paramagnetic susceptibility turns over into a Curie law at sufficiently high temperatures.

Figure 44: The temperature dependence of the Pauli paramagnetic susceptibility
χ(T) (blue curve). The high-temperature Curie law is shown by the dashed red
curve.

Since the high-temperature variation first happens when k_B T ≈ μ, the
high-temperature limit first applies at temperatures of the order of T ≈ 12,000 K.
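The full crossover between the Pauli and the Curie limits can be traced numerically. The sketch below is an illustration (not from the notes): it assumes a free-electron density of states ρ(ε) = C√ε, units ε_F = k_B = μ_B = 1 with N = 1, and uses `scipy.optimize.brentq` to fix the chemical potential at each temperature:

```python
import numpy as np
from scipy.optimize import brentq

# Assumed units: k_B = mu_B = 1, Fermi energy eps_F = 1, and N = 1 electron.
# Total density of states rho(eps) = C sqrt(eps), with N = (2/3) C eps_F^(3/2).
C = 1.5

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def grid(T):
    return np.linspace(0.0, 1.0 + 40.0 * T, 200001)

def particle_number(mu, T):
    eps = grid(T)
    f = 1.0 / (np.exp(np.clip((eps - mu) / T, -300.0, 300.0)) + 1.0)
    return trapz(C * np.sqrt(eps) * f, eps)

def chi(T):
    # fix mu(T) by the condition N = 1, then chi = mu_B^2 int rho (-df/deps)
    mu = brentq(lambda m: particle_number(m, T) - 1.0, -500.0, 2.0)
    eps = grid(T)
    arg = np.clip((eps - mu) / (2.0 * T), -300.0, 300.0)
    minus_df = 1.0 / (4.0 * T * np.cosh(arg) ** 2)
    return trapz(C * np.sqrt(eps) * minus_df, eps)

chi_low, chi_high = chi(0.01), chi(20.0)
print(chi_low)          # degenerate limit: ~ rho(eps_F) = 1.5
print(chi_high * 20.0)  # classical limit: chi ~ N/(k_B T), so chi*T ~ N = 1
```

The low-temperature value reproduces μ_B² ρ(ε_F), eq. (1080), while at k_B T ≫ ε_F the product χT saturates at μ_B² N, the Curie law (1088).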

8.9    The Temperature-dependence of the Pressure of a Gas of Non-Interacting Fermions

The Sommerfeld expansion can also be used to calculate the temperature-dependence of the pressure for a gas of non-interacting fermions at low temperatures. Starting from the expression for the Grand-Canonical potential Ω = − P V
and integrating by parts, one finds that

    Ω = − P V = − (2/3) ∫_0^∞ dε ρ(ε) ε 1 / ( exp[ β ( ε − μ ) ] + 1 )   (1090)

where the density of states (including a factor of 2 for both spin directions) is
given by

    ρ(ε) = ( V / 2π² ) ( 2m / ħ² )^{3/2} ε^{1/2}                         (1091)

Hence, on substituting for the single-particle density of states and on canceling
a factor of V, one finds that the pressure is given by

    P = ( 1 / 3π² ) ( 2m / ħ² )^{3/2} ∫_0^∞ dε ε^{3/2} f(ε)              (1092)

where f(ε) is the Fermi-Dirac distribution function. On using the Sommerfeld
expansion, one obtains the approximate expression

    P ≈ ( 1 / 3π² ) ( 2m / ħ² )^{3/2} [ ∫_0^{μ(T)} dε ε^{3/2} + ( π²/6 ) ( k_B T )² (3/2) μ(T)^{1/2} + ... ]   (1093)

which is evaluated as

    P ≈ ( 1 / 3π² ) ( 2m / ħ² )^{3/2} [ (2/5) μ(T)^{5/2} + ( π²/4 ) ( k_B T )² μ(T)^{1/2} + ... ]   (1094)

The temperature dependence of the chemical potential is approximately given
by

    μ(T) ≈ ε_F [ 1 − ( π²/12 ) ( k_B T / ε_F )² + ... ]                  (1095)

Hence, we obtain the final result

    P ≈ ( 1 / 3π² ) ( 2m / ħ² )^{3/2} [ (2/5) ε_F^{5/2} + ( π²/6 ) ( k_B T )² ε_F^{1/2} + ... ]
      ≈ ( 2 / 15π² ) ( 2m / ħ² )^{3/2} ε_F^{5/2} [ 1 + ( 5π²/12 ) ( k_B T / ε_F )² + ... ]   (1096)

Thus, a gas of non-interacting particles which obey Fermi-Dirac statistics exerts
a finite pressure in the limit T → 0. This can be understood since, at T = 0,
the particles occupy states with finite momenta up to the Fermi-energy and,
therefore, they collide with the container's walls giving rise to pressure. This is
in direct contrast with the behavior found for a classical ideal gas and, as we
shall see later, is also in contrast with the pressure of a non-interacting gas of
bosons at low temperatures.
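The magnitude of this zero-temperature pressure is substantial for the electron gas in a metal. The leading term of (1096) can be rewritten as P = (2/5) n ε_F. The sketch below evaluates it for an assumed conduction-electron density appropriate to copper (n ≈ 8.5 × 10²⁸ m⁻³, a value not taken from these notes):

```python
import math

# Assumed input: conduction-electron density of copper (one electron per
# atom), n ~ 8.5e28 m^-3; the physical constants are CODATA values.
hbar = 1.054571817e-34   # J s
m_e  = 9.1093837015e-31  # kg
eV   = 1.602176634e-19   # J
n    = 8.5e28            # m^-3

# eps_F = hbar^2 (3 pi^2 n)^(2/3) / (2 m_e), from n = (1/3pi^2)(2m eps_F/hbar^2)^(3/2)
eps_F = hbar**2 * (3.0 * math.pi**2 * n) ** (2.0 / 3.0) / (2.0 * m_e)

# leading term of (1096), equivalent to P = (2/5) n eps_F
P0 = 0.4 * n * eps_F

print(eps_F / eV)  # ~ 7 eV
print(P0 / 1e9)    # ~ 38 GPa
```

The resulting degeneracy pressure of tens of GPa is balanced, in a real metal, by the electrostatic attraction to the ions.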

8.10    Fluctuations in the Occupation Numbers

If one considers a group of energy levels ε_α with large enough degeneracy, then
it is possible to consider the statistics of the occupation number n_α.
The average occupation number of the energy levels ε_α is given by

    n̄_α = (1/Ξ) Trace n̂_α exp[ − β ( Ĥ − μ N̂ ) ]
         = − k_B T ( ∂/∂ε_α ) ln Ξ                                        (1097)

where the derivative is w.r.t. the energy level ε_α. As expected, this is given by

    n̄_α = 1 / ( exp[ β ( ε_α − μ ) ] + 1 ) = f( ε_α )                    (1098)

which is just the Fermi-Dirac distribution function.
The mean-squared fluctuation around this average is given by

    n̄²_α − ( n̄_α )² = ( 1/β² ) ( ∂²/∂ε_α² ) ln Ξ = − ( 1/β ) ( ∂n̄_α/∂ε_α )
                    = f( ε_α ) ( 1 − f( ε_α ) )                           (1099)

The mean-squared number fluctuation is reduced from the classical (Poissonian)
value of n̄_α. In fact, due to the Pauli exclusion principle, the fluctuations
are only non-zero in an energy width of order k_B T around the Fermi-energy.
The reduction in the fluctuation of fermion occupation numbers is in strong
contrast to the fluctuations that are found for particles that obey Bose-Einstein
statistics.
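Equation (1099) can be checked by brute force, since a single fermionic level has only the two microstates n = 0 and n = 1. A minimal sketch (the values of β, ε and μ are arbitrary assumptions for the example):

```python
import math

# Single fermionic level eps with occupation n in {0, 1}; grand-canonical
# weights w(n) = exp[-beta (eps - mu) n].  Check <n> = f and var(n) = f(1-f).
beta, eps, mu = 2.0, 0.7, 0.3

w = [math.exp(-beta * (eps - mu) * n) for n in (0, 1)]
Z = sum(w)
n_avg  = sum(n * w[n] for n in (0, 1)) / Z
n2_avg = sum(n**2 * w[n] for n in (0, 1)) / Z
var = n2_avg - n_avg**2

f = 1.0 / (math.exp(beta * (eps - mu)) + 1.0)
print(n_avg, var, f * (1.0 - f))
```

Since n² = n for n = 0, 1, the variance f(1 − f) follows identically, with no approximation.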

9    Bose-Einstein Statistics

9.1    Non-Interacting Bosons

An ideal gas of non-interacting bosons³⁰ is described by the Hamiltonian Ĥ
which is given by the sum

    Ĥ = Σ_α ε_α n̂_α                                                      (1100)

where n̂_α represents the occupation number of the α-th single-particle energy
level. This is just the sum of the contributions from each particle, grouped
according to the energy levels that they occupy. Likewise, the operator N̂
representing the total number of particles is given by

    N̂ = Σ_α n̂_α                                                          (1101)

where the sum is over the single-particle energy levels. Hence, for non-interacting
bosons, the density operator is diagonal in the occupation number representation.
The Grand-Canonical Partition function Ξ is given by

    Ξ = Trace exp[ − β ( Ĥ − μ N̂ ) ]                                     (1102)

where the trace is over a complete set of microscopic states with variable N
for the entire system. A convenient basis is given by the N-particle states
| n₁, n₂, ... ⟩, since both Ĥ and N̂ are diagonal in this basis. The trace
reduces to the sum over all configurations { n_α }, and since the total number of
particles N is also being summed over, the trace is unrestricted. Therefore, the
trace can be evaluated by summing over all possible values of the eigenvalues
n_α for each consecutive value of α. That is

    Trace ≡ Σ_{n₁=0}^∞ Σ_{n₂=0}^∞ Σ_{n₃=0}^∞ ...                         (1103)

Since the exponential term in Ξ reduces to the form of the exponential of a sum
of independent terms, it can be written as the product of exponential factors

    exp[ − β ( Ĥ − μ N̂ ) ] = exp[ − β Σ_α ( ε_α − μ ) n̂_α ]
                            = Π_α exp[ − β ( ε_α − μ ) n̂_α ]              (1104)

³⁰ A. Einstein, "Quantentheorie des einatomigen idealen Gases", Sitzungsberichte der
Preussischen Akademie der Wissenschaften, 1, 3-14, (1925).

Therefore, the trace in the expression for Ξ can be reduced to

    Ξ = Π_α Σ_{n_α=0}^∞ exp[ − β ( ε_α − μ ) n_α ]                       (1105)

where the sum is over all the occupation numbers n_α allowed by Bose-Einstein
statistics. The summation is of the form of a geometric series, as can be seen
by introducing the variable x_α defined by

    x_α = exp[ − β ( ε_α − μ ) ]                                         (1106)

so

    Σ_{n_α=0}^∞ exp[ − β ( ε_α − μ ) n_α ] = Σ_{n_α=0}^∞ x_α^{n_α}       (1107)

This geometric series converges if x_α < 1, i.e.

    exp[ − β ( ε_α − μ ) ] < 1                                           (1108)

which requires that ε_α > μ. This condition has to hold for all ε_α, so μ must
be smaller than the lowest single-particle energy level ε₀. Therefore, we require
that

    ε₀ > μ                                                               (1109)

On performing the sum of the geometric series, one obtains

    Σ_{n_α=0}^∞ exp[ − β ( ε_α − μ ) n_α ] = 1 / ( 1 − exp[ − β ( ε_α − μ ) ] )    where ε₀ > μ   (1110)

Therefore, the Grand-Canonical Partition Function is given by

    Ξ = Π_α 1 / ( 1 − exp[ − β ( ε_α − μ ) ] )                           (1111)

The Grand-Canonical Potential Ω is found from

    Ξ = exp[ − β Ω ]                                                     (1112)

which, on taking the logarithm, yields

    − β Ω = − Σ_α ln( 1 − exp[ − β ( ε_α − μ ) ] )                       (1113)

or

    Ω = k_B T Σ_α ln( 1 − exp[ − β ( ε_α − μ ) ] )                       (1114)

On introducing the density of single-particle states ρ(ε) defined as

    ρ(ε) = Σ_α δ( ε − ε_α )                                              (1115)

the summation can be transformed into an integral over ε

    Ω = k_B T ∫ dε ρ(ε) ln( 1 − exp[ − β ( ε − μ ) ] )                   (1116)

where the density of states goes to zero below ε₀. After evaluating the integral,
one may find all the thermodynamic properties of the system from Ω.
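The factorization of the unrestricted trace, eqs. (1103)-(1105), and the closed form (1111) can be verified directly for a small number of levels, truncating each occupation sum once the geometric series has converged. A sketch with arbitrarily assumed level energies and chemical potential:

```python
import math
from itertools import product

# Three bosonic levels with assumed energies; each occupation sum is
# truncated at n_max, where x^n_max is already negligible (~1e-13 here).
beta, mu = 1.0, -0.5
levels = [0.0, 0.3, 1.1]
n_max = 60

# brute-force unrestricted trace, eq. (1103)
Xi_trace = sum(
    math.exp(-beta * sum((e - mu) * n for e, n in zip(levels, ns)))
    for ns in product(range(n_max), repeat=len(levels))
)

# closed form, eq. (1111): product of geometric-series sums
Xi_product = 1.0
for e in levels:
    Xi_product *= 1.0 / (1.0 - math.exp(-beta * (e - mu)))

print(Xi_trace, Xi_product)
```

The two evaluations agree to the truncation error of the geometric series, confirming that the unrestricted trace factorizes level by level.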

9.2    The Bose-Einstein Distribution Function

The average number of particles in the system N can be determined from Ω via
the relation

    N = − ( ∂Ω/∂μ )_{T,V}                                                (1117)

which is evaluated as

    N = k_B T ∫ dε ρ(ε) β exp[ − β ( ε − μ ) ] / ( 1 − exp[ − β ( ε − μ ) ] )
      = ∫ dε ρ(ε) 1 / ( exp[ β ( ε − μ ) ] − 1 )                         (1118)

or equivalently

    N = Σ_α 1 / ( exp[ β ( ε_α − μ ) ] − 1 )                             (1119)

but, by definition, one has

    N = Σ_α n̄_α                                                          (1120)

Therefore, the function N(ε) defined by

    N(ε) = 1 / ( exp[ β ( ε − μ ) ] − 1 )                                (1121)

represents the average number of particles n̄_α in a quantum level with a single-particle energy ε. The function N(ε) is the Bose-Einstein distribution function.
The Bose-Einstein distribution vanishes as

    N(ε) ~ exp[ − β ( ε − μ ) ]                                          (1122)

for ε − μ ≫ k_B T. For low energies, μ < ε < μ + k_B T, the Bose-Einstein
distribution varies as

    N(ε) ≈ k_B T / ( ε − μ )                                             (1123)

which can become arbitrarily large.

Figure 45: The Bose-Einstein distribution function N(ε) plotted versus β ( ε − μ ).
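The identification of N(ε) as the average occupation of a single level can be checked by evaluating the weighted sum ⟨n⟩ = Σ n xⁿ / Σ xⁿ directly. The sketch below (with arbitrarily assumed β and μ) also illustrates the low-energy behavior (1123):

```python
import math

# One bosonic level: <n> = sum_n n x^n / sum_n x^n, x = exp[-beta(eps-mu)].
# Assumed values of beta and mu; the sums are truncated at n_max.
beta, mu = 1.0, -0.01

def n_avg(eps, n_max=2000):
    x = math.exp(-beta * (eps - mu))
    Z = sum(x**n for n in range(n_max))
    return sum(n * x**n for n in range(n_max)) / Z

def bose(eps):
    return 1.0 / (math.exp(beta * (eps - mu)) - 1.0)

print(n_avg(1.0), bose(1.0))                 # agree, eq. (1121)
print(bose(0.0), 1.0 / (beta * (0.0 - mu)))  # low-energy limit, eq. (1123)
```

With β(ε − μ) = 0.01 the occupation is already close to 100, showing how the average diverges as the level energy approaches μ.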


The Bose-Einstein distribution function enters the expressions for other thermodynamic quantities, such as the average energy. The energy can be found from the
expression for the entropy S. The entropy can be found from the infinitesimal
relation

    dΩ = − S dT − P dV − N dμ                                            (1124)

which yields

    S = − ( ∂Ω/∂T )_{V,μ}
      = − Ω/T + (1/T) ∫ dε ρ(ε) ( ε − μ ) / ( exp[ β ( ε − μ ) ] − 1 )   (1125)

On recalling the expression for N, one has

    S = − ( Ω + μ N )/T + (1/T) ∫ dε ρ(ε) ε 1 / ( exp[ β ( ε − μ ) ] − 1 )   (1126)

On using the definition

    Ω = U − T S − μ N                                                    (1127)

one finds that the thermodynamic energy is given by

    U = ∫ dε ρ(ε) ε 1 / ( exp[ β ( ε − μ ) ] − 1 )                       (1128)

or alternately as

    U = Σ_α ε_α 1 / ( exp[ β ( ε_α − μ ) ] − 1 )                         (1129)

The thermodynamic energy U should be compared with the expression for the
average energy of the non-interacting particles

    E = Σ_α ε_α n̄_α                                                      (1130)

This comparison reconfirms our identification of the average number of particles
in the α-th energy level as

    n̄_α = 1 / ( exp[ β ( ε_α − μ ) ] − 1 )                               (1131)

In general, the Bose-Einstein distribution function is defined as

    N( ε_α ) = 1 / ( exp[ β ( ε_α − μ ) ] − 1 )                          (1132)

where ε_α > μ, and corresponds to the thermal average of the occupation number
of a quantum level with single-particle energy ε_α. Since the occupation numbers
can have the values n_α = 0, 1, 2, ..., one has

    n̄_α ≥ 0                                                              (1133)

so the average value must also satisfy the same inequality

    N( ε_α ) ≥ 0                                                         (1134)

The positivity of N( ε_α ) requires that ε_α > μ. However, there does exist a
possibility that, in thermal equilibrium, a level can be occupied by an average
number of particles that tends to infinity. This possibility requires that the
energy of this level is sufficiently close to μ, i.e. ε₀ → μ.
For a system with a fixed average number of particles N, the equation

    N = ∫ dε ρ(ε) N(ε)                                                   (1135)

has to be regarded as an implicit equation for μ(T). Once μ(T) has been
found, one may then calculate other thermodynamic averages. For example,
the average energy of our system of non-interacting particles is given by

    U = ∫ dε ρ(ε) ε N(ε)                                                 (1136)

etc.

9.3    The Equation of State for Non-Interacting Bosons

The equation of state for a gas of non-interacting bosons can be found from Ω
by noting that

    Ω = − P V                                                            (1137)

The equation of state can be obtained directly when the single-particle density
of states ρ(ε) has the form of a simple power law

    ρ(ε) = C ε^α    for ε ≥ 0
         = 0        otherwise                                            (1138)

where α is a constant. The Grand-Canonical Potential is found as

    Ω = − P V = k_B T ∫_0^∞ dε ρ(ε) ln( 1 − exp[ − β ( ε − μ ) ] )       (1139)

Hence, one has

    − P V / k_B T = C ∫_0^∞ dε ε^α ln( 1 − exp[ − β ( ε − μ ) ] )
                  = ( C / (α+1) ) ∫_0^∞ dε ( d ε^{α+1} / dε ) ln( 1 − exp[ − β ( ε − μ ) ] )   (1140)

On integrating by parts, one obtains

    − P V / k_B T = ( C / (α+1) ) [ ε^{α+1} ln( 1 − exp[ − β ( ε − μ ) ] ) ]_0^∞
                    − ( C / (α+1) ) ∫_0^∞ dε ε^{α+1} β exp[ − β ( ε − μ ) ] / ( 1 − exp[ − β ( ε − μ ) ] )   (1141)

The boundary terms vanish, since the density of states vanishes at the lower
limit of integration ε = 0 and the logarithmic factor vanishes exponentially
when ε → ∞. Thus, on canceling a factor of β, one finds

    P V = ( 1 / (α+1) ) ∫_0^∞ dε C ε^{α+1} 1 / ( exp[ β ( ε − μ ) ] − 1 )
        = ( 1 / (α+1) ) ∫_0^∞ dε ρ(ε) ε 1 / ( exp[ β ( ε − μ ) ] − 1 )
        = U / ( α + 1 )                                                  (1142)

That is, the equation of state for an ideal gas of bosons is found as

    P V = U / ( α + 1 )                                                  (1143)

The same method was used to find the equation of state for an ideal gas of
fermions. The result is the same. Therefore, the equation of state holds true,
independent of the quantum statistics used. The equation of state must also
apply to the classical ideal gas, as the ideal gas can be considered as the high-temperature limiting form of the ideal quantum gasses.
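The equation of state (1143) is easy to verify numerically for α = 1/2. The sketch below (an illustration with assumed units C = k_B T = 1 and an arbitrary fugacity z < 1) evaluates −Ω and U by quadrature and checks that P V = −Ω = (2/3) U:

```python
import numpy as np

# alpha = 1/2 (three dimensions); assumed units C = k_B T = 1, fugacity z < 1.
z = 0.8
x = np.linspace(0.0, 60.0, 600001)   # x = beta * eps

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Omega = int dx x^(1/2) ln(1 - z e^-x), eq. (1139) in reduced variables
Omega = trapz(np.sqrt(x) * np.log(1.0 - z * np.exp(-x)), x)

# U = int dx x^(3/2) / (z^-1 e^x - 1), eq. (1136) in reduced variables
U = trapz(x**1.5 / (np.exp(x) / z - 1.0), x)

print(-Omega, (2.0 / 3.0) * U)   # P V = -Omega = (2/3) U, eq. (1143)
```

The same check passes for any z in (0, 1), as the integration by parts leading to (1142) never used the value of the chemical potential.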

9.4    The Fugacity at High Temperatures

For a system with a fixed number of particles N, the equation

    N = ∫ dε ρ(ε) N(ε)                                                   (1144)

implicitly determines μ(T). For non-relativistic bosons with the dispersion relation

    ε_k = ħ² k² / 2m                                                     (1145)

in three dimensions, the density of states can be calculated by replacing the sum
over discrete values of k by an integral over the density of points in phase space

    ρ(ε) = Σ_k δ( ε − ε_k )
         = ( V / (2π)³ ) ∫ d³k δ( ε − ε_k )
         = ( V / 2π² ) ∫_0^∞ dk k² δ( ε − ħ² k² / 2m )
         = ( V m / 2π² ħ² ) √( 2mε / ħ² )                                (1146)

Thus, the average number of particles is expressed as

    N = ( V / 4π² ) ( 2m / ħ² )^{3/2} ∫_0^∞ dε ε^{1/2} 1 / ( z^{−1} exp[ βε ] − 1 )   (1147)

where the fugacity z is defined as

    z = exp[ β μ ]                                                       (1148)

The equation for N can be re-written in terms of the dimensionless variable
x = βε as

    N = ( V / 4π² ) ( 2 m k_B T / ħ² )^{3/2} ∫_0^∞ dx x^{1/2} 1 / ( z^{−1} exp[x] − 1 )   (1149)

which determines the fugacity z as a function of temperature. This equation
can be expressed as

    ( 2 / √π ) ∫_0^∞ dx x^{1/2} 1 / ( z^{−1} exp[x] − 1 ) = ( N / V ) ( 2πħ² / m k_B T )^{3/2}   (1150)

or as

    ( 1 / Γ(3/2) ) ∫_0^∞ dx x^{1/2} 1 / ( z^{−1} exp[x] − 1 ) = ( N / V ) ( 2πħ² / m k_B T )^{3/2}   (1151)

where

    Γ(α+1) = ∫_0^∞ dx x^α exp[ − x ]                                     (1152)

This type of integral appears frequently in the evaluation of other quantities³¹. We shall denote the integral I_{α+1}(z) as

    I_{α+1}(z) = ( 1 / Γ(α+1) ) ∫_0^∞ dx x^α 1 / ( z^{−1} exp[x] − 1 )   (1153)

where z ≤ 1 and α > 0. The integrand can be expanded as

    I_{α+1}(z) = ( 1 / Γ(α+1) ) ∫_0^∞ dx x^α z exp[−x] / ( 1 − z exp[−x] )
               = ( 1 / Γ(α+1) ) ∫_0^∞ dx x^α Σ_{m=1}^∞ z^m exp[ − m x ]  (1154)

On transforming the variable of integration to y = mx, one obtains

    I_{α+1}(z) = ( 1 / Γ(α+1) ) ( ∫_0^∞ dy y^α exp[−y] ) Σ_{m=1}^∞ z^m / m^{α+1}
               = Σ_{m=1}^∞ z^m / m^{α+1}                                 (1155)

For small z, z ≪ 1, one only needs to retain the first term, so

    I_{α+1}(z) ≈ z                                                       (1156)

while for z equal to unity, z = 1, the integral has the value

    I_{α+1}(1) = Σ_{m=1}^∞ m^{−(α+1)} = ζ(α+1)                           (1157)

The equation determining N can be expressed in terms of the above set of
functions as

    I_{3/2}(z) = ( N / V ) ( 2πħ² / m k_B T )^{3/2}                      (1158)

³¹ J.E. Robinson, Phys. Rev. 83, 678 (1951).
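The series (1155) provides a practical way of evaluating the I_{α+1}(z) functions. The sketch below sums partial series by brute force; note that at z = 1 the convergence of ζ(3/2) is slow, the truncation error being approximately 2/√m_max:

```python
# Partial sums of the series (1155): I_{alpha+1}(z) = sum_m z^m / m^(alpha+1).
def I(s, z, m_max=2_000_000):
    # s = alpha + 1; truncated sum, accurate only once the tail is small
    return sum(z**m / m**s for m in range(1, m_max))

zeta_3_2 = I(1.5, 1.0)          # zeta(3/2) ~ 2.612, up to ~2/sqrt(m_max)
print(zeta_3_2)
print(I(2.5, 0.01, m_max=100))  # small-z limit, eq. (1156): ~ z = 0.01
```

The value 2.612 for ζ(3/2) is the one quoted in Figure 46 and used below in the condensation condition.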

where α = 1/2. This equation can be solved graphically: if the function I_{3/2}(z)
is plotted vs z, the intersection with the line representing the right-hand
side yields the solution for z. At high temperatures, one finds that the solution
for z is much less than unity, so that the chemical potential is negative. As
the temperature decreases, the value of z increases towards its maximum value
of one, which corresponds to μ increasing towards zero. The temperature for
which z = 1 is given by

    ζ( 3/2 ) = ( N / V ) ( 2πħ² / m k_B T_c )^{3/2}                      (1159)

For temperatures below T_c, the equation for N cannot be satisfied and the
lowest energy level has to have a macroscopic occupation number. That is,
for T < T_c, the bosons must condense into the lowest energy state, as first
predicted by Einstein. For this low-temperature range, it is no longer sufficient
to use a continuum expression for the density of single-particle states ρ(ε), which
fails to give the proper weight for the lowest energy state. That is, the method
used for calculating the density of states approximates the sum over states by
an integration over the density of points in phase space. Therefore, it
only calculates the average number of points on the constant-energy surface.
This approximation fails miserably at very low energies, when the number of
points on the constant-energy surface is low. A better approximation has the
form

    ρ(ε) = δ(ε) + C ε^{1/2}                                              (1160)

which explicitly includes a delta function of weight unity for the lowest energy
state and C is an extensive constant.
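Equation (1159) is easily evaluated for realistic parameters. The sketch below uses assumed values appropriate to liquid He⁴ (mass number 4 and a mass density of about 145 kg/m³; these inputs are not taken from the notes) and recovers a condensation temperature of about 3 K:

```python
import math

# Assumed inputs: liquid He-4 mass density ~ 145 kg/m^3, atomic mass 4.0026 u.
# The physical constants are CODATA values.
hbar = 1.054571817e-34               # J s
k_B  = 1.380649e-23                  # J/K
m    = 4.002602 * 1.66053906660e-27  # kg
n    = 145.0 / m                     # number density, m^-3
zeta_3_2 = 2.612

# Invert eq. (1159): k_B T_c = (2 pi hbar^2 / m) (n / zeta(3/2))^(2/3)
T_c = (2.0 * math.pi * hbar**2 / (m * k_B)) * (n / zeta_3_2) ** (2.0 / 3.0)
print(T_c)   # ~ 3.1 K
```

This order-of-magnitude agreement with the observed He II transition at 2.18 K is one of the arguments discussed in the section on superfluidity below.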

9.5    Fluctuations in the Occupation Numbers

9.6    Bose-Einstein Condensation

The expression for the Bose-Einstein distribution function only makes sense
when the chemical potential μ is less than the energy of the single-particle
quantum state ε_α. Since this is true for all ε_α, and since we have defined the
lowest single-particle energy as ε₀ = 0, we must require that μ < 0. As was first
pointed out by Einstein, if μ approaches zero, then there can be a macroscopic
occupation of a single-particle quantum level.
When there is a macroscopic occupation of the lowest energy level, the single-particle density of states must explicitly include the lowest energy state. In this
case, we need to use the better approximation to the density of states given by

    ρ(ε) = δ(ε) + C ε^α Θ(ε)                                             (1161)

where the function Θ(ε) is the Heaviside step function. The delta function represents the lowest energy state. The second term represents the approximation
for the extensive part of the density of states. This expression can also be used
at temperatures above T_c, since the contribution from the delta function is not
extensive and can be ignored.
On using the above expression for the density of states to evaluate the average number of particles, one finds

    N = ∫ dε ρ(ε) N(ε)
      = ∫ dε ( δ(ε) + C ε^α Θ(ε) ) N(ε)
      = N(0) + ∫_0^∞ dε C ε^α N(ε)                                       (1162)

where the first term represents the number of particles in the quantum level
with zero energy. On changing the variable of integration to x = βε, the
expression can be re-written as

    N = z/(1−z) + C ( k_B T )^{α+1} ∫_0^∞ dx x^α 1 / ( z^{−1} exp[x] − 1 )
      = z/(1−z) + Γ(α+1) C ( k_B T )^{α+1} I_{α+1}(z)                    (1163)

This equation determines z when the average number of particles N is fixed.
The above equation can be interpreted as the sum of the particles in the lowest
energy state N₀ and the number of particles in the excited states N_exc

    N = N₀ + N_exc                                                       (1164)

Figure 46: The graphical solution for the fugacity found from plotting both
I_{3/2}(z) (blue) and A [ N − z/(1−z) ] (red) versus z. The point of intersection
of the curves yields the value of the fugacity. Note that although I_{3/2}(z) has a
divergent derivative at z = 1, its value there is finite and is given by 2.612.

where the number of excited particles is given by

    N_exc = Γ(α+1) C ( k_B T )^{α+1} I_{α+1}(z)                          (1165)

The number of particles in the condensate N₀ is given by

    N₀ = z / ( 1 − z )                                                   (1166)

Above the Condensation Temperature

For T > T_c, one has

    N_exc = N                                                            (1167)

as the explicit equation for N determines the value of z to be z < 1. Thus,
the number of particles in the condensate is not extensive and is negligible, so
all the particles can be considered as being in the excited states.

The Condensation Temperature

The Bose-Einstein condensation temperature T_c is evaluated from the condition that z = 1 − η, where η is a small positive quantity

    N = Γ(α+1) C ( k_B T_c )^{α+1} ζ(α+1)                                (1168)

which determines the lowest temperature at which the number of the particles
in the condensate is still negligibly small ( N₀ ~ 1 ). Note that, since C ∝ V,
the condensation temperature T_c depends on the density, or if the number of
particles is fixed, it depends on the volume, T_c(V).

Below the Condensation Temperature

For temperatures below T_c, T < T_c, the number of particles in the excited
states is temperature dependent and decreases towards zero as T is reduced to
zero, according to a simple power law

    N_exc = Γ(α+1) C ( k_B T )^{α+1} ζ(α+1)                              (1169)

or equivalently

    N_exc = N ( T / T_c )^{α+1}                                          (1170)

where we have used the equation for the Bose-Einstein condensation temperature to eliminate the constant C. The number of particles in the condensate is
defined as

    N₀ = N − N_exc                                                       (1171)

which is evaluated as

    N₀ = N [ 1 − ( T / T_c )^{α+1} ]                                     (1172)

which tends to N in the limit T → 0.

Figure 47: The temperature-dependence of the relative number of condensate
particles N₀/N and the relative number of excited particles N_exc/N.

In this case, one also has

    z / ( 1 − z ) = N [ 1 − ( T / T_c )^{α+1} ]                          (1173)

which determines z as

    z = [ 1 + (1/N) [ 1 − ( T / T_c )^{α+1} ]^{−1} ]^{−1}
      ≈ 1 − (1/N) [ 1 − ( T / T_c )^{α+1} ]^{−1}                         (1174)

which confirms that z → 1 in the thermodynamic limit.
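The relations (1170)-(1174) are simple enough to tabulate directly. The sketch below (for α = 1/2 and an arbitrarily assumed total particle number N = 10⁶) evaluates the condensate fraction at T = T_c/2 and the corresponding fugacity:

```python
# Condensate and excited fractions below T_c for alpha = 1/2, eqs. (1170)-(1172),
# and the fugacity (1174) for an assumed finite particle number N.
alpha = 0.5
N = 1.0e6
t = 0.5                        # t = T / T_c

nexc = t ** (alpha + 1.0)      # N_exc / N
n0 = 1.0 - nexc                # N_0 / N
print(n0, nexc)                # at T_c/2: N_0/N = 1 - 0.5^1.5 ~ 0.65

z = 1.0 / (1.0 + 1.0 / (N * n0))   # eq. (1174); z lies just below 1
print(1.0 - z)
```

Even halfway to T_c, about 65% of the particles sit in the zero-energy level, while 1 − z is of order 1/N₀, i.e. negligibly different from 1 for a macroscopic system.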


Properties of the Idealized Condensed Phase

The average energy U of the condensed phase of an ideal gas of bosons is
given by

    U = ∫ dε ρ(ε) ε N(ε)                                                 (1175)

which, together with the expression for the single-particle density of states

    ρ(ε) = δ(ε) + C ε^α Θ(ε)                                             (1176)

yields the expression

    U = ∫_0^∞ dε C ε^{α+1} N(ε)                                          (1177)

Figure 48: The velocity distribution of a gas of Rb atoms at three different
temperatures, showing the evolution of the macroscopic occupation of the lowest-energy single-particle state as the temperature is lowered. [M.H. Anderson, J.R.
Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, "Observation of Bose-Einstein Condensation in a Dilute Atomic Vapor", Science 269, 198-201 (1995).]

since the N(0) particles with ε = 0 don't contribute to the total energy. Then,
for z → 1, the integral reduces to

    U = C ( k_B T )^{α+2} Γ(α+2) I_{α+2}(1)
      = C ( k_B T )^{α+2} Γ(α+2) ζ(α+2)                                  (1178)

Then, the specific heat for T ≤ T_c can be found from

    C_{V,N} = ( ∂U/∂T )_{V,N}
            = k_B C ( k_B T )^{α+1} Γ(α+3) ζ(α+2)                        (1179)

which follows a simple power law in T

    C_{V,N} = N k_B ( α + 2 ) ( α + 1 ) [ ζ(α+2) / ζ(α+1) ] ( T / T_c )^{α+1}   (1180)

The power-law variation is simply understood. At T = 0, all the particles
occupy the state with ε = 0. At finite temperatures, the states with energy less
than k_B T are occupied. For a density of states proportional to ε^α, there are
approximately ( k_B T )^{α+1} occupied states, each of which carries an energy of
approximately k_B T. Therefore, the total energy is proportional to ( k_B T )^{α+2}.
Hence, the specific heat is proportional to ( k_B T )^{α+1}.

Figure 49: The temperature dependence of the heat capacity for an ideal Bose
gas.
The Cusp in the Heat Capacity at T_c

At the condensation temperature T_c, the specific heat has the value of

    C_{V,N}(T_c) = N k_B ( α + 2 ) ( α + 1 ) [ ζ(α+2) / ζ(α+1) ]         (1181)

which for α = 1/2 takes the value

    C_{V,N}(T_c) = (15/4) [ ζ(5/2) / ζ(3/2) ] N k_B ≈ 1.925 N k_B        (1182)

Thus, the specific heat at T_c exceeds the high-temperature classical value of
1.5 N k_B. In fact, there is a cusp in C_{V,N} at T_c.
The existence of a cusp can be seen by examining the general expression for
the heat capacity, valid in the normal liquid and the condensed phase,

    C_{V,N} = ( ∂U/∂T )_{V,N}
            = k_B C ( k_B T )^{α+1} Γ(α+2) [ ( α + 2 ) I_{α+2}(z) + T ( ∂z/∂T ) ( ∂/∂z ) I_{α+2}(z) ]   (1183)

However, since

    I_{α+1}(z) = Σ_{m=1}^∞ z^m / m^{α+1}                                 (1184)

then the derivative w.r.t. z is simply given by

    ( ∂/∂z ) I_{α+1}(z) = (1/z) I_α(z)                                   (1185)

Thus, the specific heat can be expressed as

    C_{V,N} = C k_B ( k_B T )^{α+1} Γ(α+2) [ ( α + 2 ) I_{α+2}(z) + T ( ∂z/∂T ) (1/z) I_{α+1}(z) ]   (1186)

which requires knowledge of T ∂z/∂T in the normal phase. In the Bose-condensed
phase, z = 1, so the derivative vanishes. The value of T ∂z/∂T in the normal phase
can be found from the condition

    ( ∂N/∂T )_V = 0                                                      (1187)

which yields

    0 = ( ∂N/∂T )_V
      = C k_B ( k_B T )^α Γ(α+1) [ ( α + 1 ) I_{α+1}(z) + T ( ∂z/∂T ) (1/z) I_α(z) ]   (1188)

This has the solution

    T ( ∂z/∂T ) = − ( α + 1 ) z I_{α+1}(z) / I_α(z)                      (1189)

Therefore, the specific heat for the normal phase is given by the expressions

    C_{V,N} = C k_B ( k_B T )^{α+1} Γ(α+2) [ ( α + 2 ) I_{α+2}(z) − ( α + 1 ) I²_{α+1}(z) / I_α(z) ]
            = ( α + 1 ) N k_B [ ( α + 2 ) I_{α+2}(z) / I_{α+1}(z) − ( α + 1 ) I_{α+1}(z) / I_α(z) ]   (1190)

In the high-temperature limit, this can be expanded as

    C_{V,N} ≈ ( α + 1 ) N k_B [ 1 + ( α / 2^{α+2} ) z + ... ]            (1191)

which reaches the classical limit as z approaches zero and increases when z
increases. If α = 1/2, the denominator of the last term in the exact expression
diverges at the Bose-Einstein condensation temperature, where z = 1. This
occurs as the sum defining ζ(1/2) is divergent. The divergence of I_{1/2}(z) at
z = 1 causes the last term to vanish and makes the specific heat continuous at
T_c. However, the specific heat does have a discontinuity in its slope, which is
given by

    Δ ( ∂C_{V,N}/∂T ) |_{T_c} = 3.66 N k_B / T_c                         (1192)
A cusp is also seen in the temperature dependence of the specific heat of liquid
He⁴, which is a signature of the so-called λ transition.

Figure 50: The temperature dependence of the experimentally determined heat
capacity of He⁴ near the λ point. [J.A. Lipa, J.A. Nissen, D.A. Stricker, D.R.
Swanson and T.C.P. Chui, "Specific Heat of Liquid Helium in Zero Gravity very
near the Lambda Point", Phys. Rev. B 68, 174518 (2003).]

The Pressure in the Condensed Phase

The pressure can be found directly from the Grand-Canonical Potential

    Ω = − P V                                                            (1193)

which is evaluated as

    − P V = Ω = k_B T ∫ dε ρ(ε) ln( 1 − exp[ − β ( ε − μ ) ] )           (1194)

with

    ρ(ε) = δ(ε) + C ε^α Θ(ε)                                             (1195)

and

    z = exp[ β μ ]                                                       (1196)

This yields

    P V = k_B T ln N(0) − k_B T ln z − k_B T C ∫_0^∞ dε ε^α ln( 1 − z exp[ − βε ] )

where the contribution of the lowest energy level has been rewritten by using
1 − z = z / N(0), which follows from N(0) = z/(1−z). On changing variable to
x = βε, one obtains

    P V = k_B T ln N(0) − k_B T ln z − C ( k_B T )^{α+2} ∫_0^∞ dx x^α ln( 1 − z exp[−x] )   (1197)

and integrating by parts in the last term yields

    P V = k_B T ln N(0) − k_B T ln z + ( C / (α+1) ) ( k_B T )^{α+2} ∫_0^∞ dx x^{α+1} / ( z^{−1} exp[x] − 1 )   (1198)

where the boundary terms have vanished. Hence, we find the equation of state
has the form

    P V = k_B T ln N(0) − k_B T ln z + ( C / (α+1) ) ( k_B T )^{α+2} Γ(α+2) I_{α+2}(z)   (1199)

For T < T_c, one has z ≈ 1, therefore the expression reduces to

    P V = k_B T ln N(0) + ( C / (α+1) ) ( k_B T )^{α+2} Γ(α+2) ζ(α+2)    (1200)

Since C is proportional to the volume, the first and last terms are extensive,
while the logarithmic term is not and can be neglected. This further reduces
the equation of state to the form

    P V = C ( k_B T )^{α+2} Γ(α+1) ζ(α+2)                                (1201)

or equivalently

    P = ( C / V ) ( k_B T )^{α+2} Γ(α+1) ζ(α+2)                          (1202)

in which the volume dependence of C cancels with V, so one finds that the pressure
is independent of volume. The pressure only depends on temperature, for T <
T_c. Thus, the isotherms become flat on entering the condensed phase.
Furthermore, since

    N_exc = Γ(α+1) C ( k_B T )^{α+1} ζ(α+1)                              (1203)

one can write

    P V = [ ζ(α+2) / ζ(α+1) ] N_exc k_B T                                (1204)

This makes sense, since only the excited particles carry momentum and collide
with the walls.
Figure 51: The P − T relations for ideal boson gasses with two different densities
(blue) and different T_c's (vertical dashed lines). In the Bose-Einstein condensed
phase, the P − T curves collapse onto one curve. The classical asymptotic
limiting forms, P = (N/V) k_B T, for the P − T relations at these two densities
are shown by the dashed red lines.

Figure 52: The P − V relations for an ideal boson gas at two different temperatures. Since the number of particles is fixed, the condensation temperature
depends on volume as T_c(V) ∝ V^{−2/3}. Thus, the critical pressure P_c varies as
P_c ∝ V^{−5/3}.

The Entropy of the Condensed Phase

The entropy of the condensate can be found from

    S = − ( ∂Ω/∂T )_{V,μ}                                                (1205)

which leads to

    T S = − Ω + U − μ N
        = [ ( α + 2 ) / ( α + 1 ) ] U − μ N                              (1206)

In the Bose-condensed phase, μ = 0. Hence, with

    U = ( α + 1 ) [ ζ(α+2) / ζ(α+1) ] N_exc k_B T                        (1207)

one finds that the entropy is given by

    S = ( α + 2 ) [ ζ(α+2) / ζ(α+1) ] k_B N_exc                          (1208)

This implies that only the excited particles are disordered and contribute to the
entropy.
The above discussion relates to non-interacting boson gasses. It is expected
that Bose-Einstein condensation may be hindered by interactions, since the density should resemble the squared modulus of the lowest-energy single-particle
wave function and exhibit the same non-uniformity in space. Local interactions
are expected to make the fluid's density uniform.
Homework:
Rb⁸⁷ was reported to Bose condense³². The gas was trapped in a three-dimensional harmonic potential with a frequency of about 750 Hz.
(i) Determine an approximate form for the density of states.
(ii) For a number density of 2.5 × 10¹² per cm³, estimate the Bose-Einstein
condensation temperature and compare it to the reported value of T_c ≈ 170 nK.

9.7

Superfluidity

He has two electrons located in the 1s orbitals, so it has a closed atomic shell
and is relatively chemically inert. There does exist a van der Waals interaction
between the He atoms. The interatomic potential consists of the sum of the
short-ranged exponential repulsion and the attractive van der Waals interaction33 . The potential has a weak minimum of depth 9 K, at a distance of about
3
A. Since the atom is relatively light, it doesnt solidify easily. If the atoms
did solidify, with a lattice spacing d, then the uncertainty in the momentum p
of any atom would be of the order of h/d. The kinetic energy would be of the
order of
p2
h2

(1209)
2m
2 m d2
32 M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman and E.A. Cornell, Observation of Bose-Einstein Condensation in a Dilute Atomic Vapor, Science, 269, 198-201, (1995).
33 J.C. Slater and J.G. Kirkwood, The van der waals forces in gases, Phys. Rev. 37,
682-697 (1931).

223

V(R) [ Kelvin ]

-5

-10

-15
2

R/a0

10

Figure 53: The interaction potential of atomic Helium as a function of radial


distance R/a0 , where a0 is the Bohr radius.
For d ∼ 3 Å, this energy is greater than the depth of the potential minimum. Hence, the lattice would melt. Therefore, He remains a liquid at ambient pressures. However, He interacts quite strongly, so at pressures of the order of 25 atmospheres it can solidify at very low temperatures.
The isotope He3 is a fermion and forms a Fermi liquid at low temperatures. On the other hand, He4 is a boson and obeys Bose-Einstein statistics. He4 can be cooled by a process of evaporation. When it is cooled, the Helium becomes turbulent, just like boiling water. However, at a temperature of 2.2 K it suddenly becomes clear and the turbulence disappears. This signals a phase change from the He I phase to He II.

Figure 54: The P-T phase diagram of He4. He4 remains a fluid for pressures below 2.5 MPa. The Liquid-Gas critical point is located at a temperature of 5.2 K. The liquid phase undergoes a further transition from He I (the normal liquid) to He II (the superfluid), as the temperature is reduced below 2.18 K.


He II has unusual properties: it flows as if it has a vanishing viscosity, and can flow through narrow capillaries³⁴. That is, it exhibits superflow. On the other hand, when the fluid is placed in a cavity containing closely spaced disks that are free to rotate, and the disks are forced to perform oscillations through some small angles (torsional oscillations), then the effective moment of inertia is temperature dependent. The increased moment of inertia indicates that some fluid is being dragged by the rotating disks. Above the critical temperature Tc it was found by Andronikashvilli³⁵ that all the fluid was being dragged by the plates, but below Tc the moment of inertia decreased. In fact, the fraction of the fluid which is dragged by the rotating disks is found to vanish as T is decreased to zero.
In 1938 Fritz London³⁶ proposed that the transition to He II is related to Bose-Einstein condensation. The reasons for this proposal were:
(i) The observed critical temperature Tc has the magnitude of 2.18 K, whereas the Bose-Einstein condensation temperature for free bosons is given by
\[
k_B T_c = \frac{2 \pi \hbar^2}{m} \left( \frac{N}{\zeta(\frac{3}{2}) \, V} \right)^{\frac{2}{3}}
\tag{1210}
\]
which is calculated as 3.14 K.
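The 3.14 K value can be reproduced from Eq. (1210) (a sketch; the liquid He-4 mass density of about 145 kg/m³ is an assumed standard value, not given in the text):

```python
import math

# Ideal-gas Bose-Einstein condensation temperature, Eq. (1210), evaluated
# with the particle density of liquid He-4 (mass density ~145 kg/m^3).
hbar = 1.055e-34
kB = 1.381e-23
m = 4.0 * 1.66e-27          # He-4 atomic mass (kg)
n = 145.0 / m               # number density N/V (m^-3)
zeta_3_2 = 2.612            # Riemann zeta(3/2)

Tc = (2.0 * math.pi * hbar**2 / (m * kB)) * (n / zeta_3_2)**(2.0 / 3.0)
print(Tc)   # ~ 3.1 K, close to the observed 2.18 K lambda temperature
```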


(ii) The transition occurs in the boson system He4 but not in the fermionic
system He3 .
(iii) For the Bose-condensed phase of an ideal gas, the pressure is independent of the density and is only a function of temperature. Hence, pressure variations are directly related to temperature variations, as is true for He II. The specific heat of He II also shows a λ-like anomaly at Tc, whereas the ideal Bose gas is expected to exhibit a cusp at Tc. However, lambda-like anomalies in the specific heat are universally found at second-order phase transitions.
Superfluid He differs from the Bose-Einstein condensate in many details,
primarily due to the effects of interactions. For example, an ideal Bose condensate would not be a superfluid. However, interactions can change the form of
the dispersion relation for the low-energy excitations so that He II is a superfluid.
The Excitation Spectrum of a Weakly-Interacting Bose Condensate
³⁴ P. Kapitza, Viscosity of Liquid Helium below the λ-point, Nature, 141, 74-74, (1938); J.F. Allen and A.D. Misener, Flow of Liquid Helium II, Nature, 141, 75-75, (1938).
³⁵ E.I. Andronikashvilli, Direct Observation of Two Kinds of Motion for Helium II, Zh. Eksp. Theor. Fiz., 16, 780 (1946).
³⁶ F. London, The λ-Phenomenon of Liquid Helium and the Bose-Einstein Degeneracy, Nature, 141, 643 (1938).


Consider a gas of weakly-interacting bosons contained in a volume V . The


Hamiltonian is described by
\[
\hat{H} = \sum_{\alpha} \langle \alpha | \frac{\hat{p}^2}{2m} | \alpha \rangle \; \hat{a}^{\dagger}_{\alpha} \hat{a}_{\alpha} \; + \; \frac{1}{2!} \sum_{\alpha,\beta;\alpha',\beta'} \langle \alpha' \beta' | \hat{V}_{int} | \alpha \beta \rangle \; \hat{a}^{\dagger}_{\alpha'} \hat{a}^{\dagger}_{\beta'} \hat{a}_{\beta} \hat{a}_{\alpha}
\tag{1211}
\]
where the two-body interaction represents a short-ranged repulsive interaction.
The one-body part can be diagonalized by choosing the single-particle wave functions φ_α(r) to be momentum eigenstates
\[
\phi_k(r) = \frac{1}{\sqrt{V}} \exp\left[ \, i \, k \cdot r \, \right]
\tag{1212}
\]
The Hamiltonian can be expressed as
\[
\hat{H} = \sum_{k} \frac{\hbar^2 k^2}{2m} \, \hat{a}^{\dagger}_{k} \hat{a}_{k} \; + \; \frac{1}{2! \, V} \sum_{k,k';q} V_{int}(q) \; \hat{a}^{\dagger}_{k-q} \hat{a}^{\dagger}_{k'+q} \hat{a}_{k'} \hat{a}_{k}
\tag{1213}
\]
where the scattering term conserves momentum and V_{int}(q) is the Fourier Transform of the interaction potential. Therefore, the two-body scattering does not change the total momentum of the system. We shall assume that the potential is sufficiently short-ranged so that the limit, lim_{q→0} V_{int}(q) = V_{int}(0), is well-defined.
At sufficiently low temperatures, the bosons are expected to form a condensate. Let the number of particles in the condensate be N₀; we shall assume that N₀ is much larger than the number of excited particles N_exc
\[
\hat{N}_{exc} = \sum_{k \neq 0} \hat{a}^{\dagger}_{k} \hat{a}_{k}
\tag{1214}
\]
so
\[
N_0 = N - \sum_{k \neq 0} \hat{a}^{\dagger}_{k} \hat{a}_{k}
\tag{1215}
\]

The condensate contains a large number of bosons with k = 0 and can, therefore, be considered to be a coherent state. The creation and annihilation operators a†₀ and a₀ can be replaced by the complex numbers a₀* and a₀, respectively. Thus, the interaction Hamiltonian can be expanded in powers of a₀* or a₀ as

\[
\hat{H} \approx \sum_{k} \frac{\hbar^2 k^2}{2m} \, \hat{a}^{\dagger}_{k} \hat{a}_{k} \; + \; \frac{1}{2! \, V} \, | a_0 |^4 \, V_{int}(0)
+ \frac{1}{2! \, V} \sum_{k \neq 0} V_{int}(k) \left( a_0^{* \, 2} \, \hat{a}_{k} \hat{a}_{-k} + a_0^{2} \, \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} \right)
+ \frac{2}{2! \, V} \sum_{k \neq 0} V_{int}(0) \, | a_0 |^2 \left( \hat{a}^{\dagger}_{k} \hat{a}_{k} + \hat{a}^{\dagger}_{-k} \hat{a}_{-k} \right) + \ldots
\tag{1216}
\]


Terms cubic in a₀ and a₀* are forbidden due to the requirement of conservation of momentum. In this expression, we have ignored terms involving more than two excited-boson creation or annihilation operators. The above form of the Hamiltonian contains terms involving unbalanced creation and annihilation operators. The term with two creation operators represents processes in which two bosons are scattered out of the condensate, and the term with two annihilation operators represents the absorption of two bosons into the condensate. On replacing | a₀ |² by N − Σ_{k≠0} a†_k a_k, one finds

\[
\hat{H} \approx \sum_{k} \frac{\hbar^2 k^2}{2m} \, \hat{a}^{\dagger}_{k} \hat{a}_{k} \; + \; \frac{N^2}{2! \, V} \, V_{int}(0)
+ \frac{N}{2! \, V} \sum_{k \neq 0} V_{int}(k) \left( \exp[ + 2 i \varphi ] \, \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} + \exp[ - 2 i \varphi ] \, \hat{a}_{k} \hat{a}_{-k} \right)
+ \frac{N}{2! \, V} \sum_{k \neq 0} V_{int}(0) \left( \hat{a}^{\dagger}_{k} \hat{a}_{k} + \hat{a}^{\dagger}_{-k} \hat{a}_{-k} \right) + \ldots
\tag{1217}
\]

where φ is a constant phase. When the condensate adopts a phase, the continuous U(1) phase symmetry of the Hamiltonian has been spontaneously broken.
The Hamiltonian can be put in diagonal form³⁷ by using a suitably chosen unitary transformation Û, so
\[
\hat{H}' = \hat{U} \, \hat{H} \, \hat{U}^{\dagger}
\tag{1218}
\]

The transformed Hamiltonian Ĥ' has the same spectrum of eigenvalues as the original Hamiltonian Ĥ. We shall choose the transformation to be of the form
\[
\hat{U} = \exp\left[ \sum_{k \neq 0} \theta_k \left( \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} \exp[ + 2 i \varphi_k ] - \hat{a}_{k} \hat{a}_{-k} \exp[ - 2 i \varphi_k ] \right) \right]
\tag{1219}
\]
which is unitary when θ_k is real. The creation operators transform as
\[
\hat{U} \, \hat{a}^{\dagger}_{k} \, \hat{U}^{\dagger} = \cosh\theta_k \; \hat{a}^{\dagger}_{k} + \sinh\theta_k \, \exp[ - 2 i \varphi_k ] \; \hat{a}_{-k}
\tag{1220}
\]
and the annihilation operators are found to transform as
\[
\hat{U} \, \hat{a}_{k} \, \hat{U}^{\dagger} = \cosh\theta_k \; \hat{a}_{k} + \sinh\theta_k \, \exp[ + 2 i \varphi_k ] \; \hat{a}^{\dagger}_{-k}
\tag{1221}
\]

Since the transformation is unitary, it does not affect the canonical commutation relations
\[
[ \; \hat{U} \hat{a}_{k} \hat{U}^{\dagger} \; , \; \hat{U} \hat{a}^{\dagger}_{k'} \hat{U}^{\dagger} \; ] = \delta_{k,k'}
\tag{1222}
\]
and
\[
[ \; \hat{U} \hat{a}_{k} \hat{U}^{\dagger} \; , \; \hat{U} \hat{a}_{k'} \hat{U}^{\dagger} \; ] = [ \; \hat{a}_{k} \; , \; \hat{a}_{k'} \; ] = 0
\tag{1223}
\]

³⁷ N.N. Bogoliubov, On the Theory of Superfluidity, J. Phys. USSR, 11, 23 (1947).

The transformed Hamiltonian takes the form
\[
\hat{H}' = \frac{N^2}{2! \, V} \, V_{int}(0)
+ \sum_{k \neq 0} \left( \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0) \right) \left( \cosh^2\theta_k \; \hat{a}^{\dagger}_{k} \hat{a}_{k} + \sinh^2\theta_k \; \hat{a}_{-k} \hat{a}^{\dagger}_{-k} \right)
+ \frac{N}{V} \sum_{k \neq 0} V_{int}(k) \, \sinh\theta_k \cosh\theta_k \, \cos[ 2 ( \varphi - \varphi_k ) ] \left( \hat{a}^{\dagger}_{k} \hat{a}_{k} + \hat{a}_{-k} \hat{a}^{\dagger}_{-k} \right)
+ \sum_{k \neq 0} \left( \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0) \right) \sinh\theta_k \cosh\theta_k \left( \exp[ + 2 i \varphi_k ] \, \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} + \exp[ - 2 i \varphi_k ] \, \hat{a}_{k} \hat{a}_{-k} \right)
+ \frac{N}{2! \, V} \sum_{k \neq 0} V_{int}(k) \left( \cosh^2\theta_k + \sinh^2\theta_k \right) \left( \exp[ + 2 i \varphi ] \, \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} + \exp[ - 2 i \varphi ] \, \hat{a}_{k} \hat{a}_{-k} \right)
\tag{1224}
\]
when written in terms of the original creation and annihilation operators. The terms non-diagonal in the particle creation and annihilation operators can be eliminated by the appropriate choice of θ_k and φ_k. We shall set φ_k equal to the phase of the condensate, φ_k = φ. Then, the off-diagonal terms vanish if one chooses θ_k to satisfy
\[
\tanh 2\theta_k = - \frac{ \frac{N}{V} V_{int}(k) }{ \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0) }
\tag{1225}
\]
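The cancellation produced by the choice of θ_k in Eq. (1225) can be verified numerically. The sketch below (units ℏ = m = 1, with assumed values for the interaction parameters) checks that the coefficient of the anomalous pair-creation terms in Eq. (1224) vanishes:

```python
import math

# Check that theta_k from Eq. (1225) kills the off-diagonal terms of Eq. (1224).
# Units hbar = m = 1; g0 and gk are assumed values of (N/V)V_int(0) and (N/V)V_int(k).
g0, gk = 1.0, 0.8
k = 0.7                                # an arbitrary wavevector

eps = 0.5 * k**2 + g0                  # coefficient of the diagonal terms
theta = 0.5 * math.atanh(-gk / eps)    # Eq. (1225)

# Coefficient of the a^dagger_k a^dagger_{-k} terms after the transformation
off_diag = eps * math.sinh(theta) * math.cosh(theta) \
           + 0.5 * gk * (math.cosh(theta)**2 + math.sinh(theta)**2)
print(abs(off_diag))                   # ~ 0: the anomalous terms cancel

E = math.sqrt(eps**2 - gk**2)          # quasiparticle energy, Eq. (1227)
print(E)
```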

With this choice, the Hamiltonian reduces to
\[
\hat{H}' = \frac{N^2}{2! \, V} \, V_{int}(0) + \frac{1}{2} \sum_{k \neq 0} \left[ E(k) - \left( \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0) \right) \right] + \sum_{k \neq 0} E(k) \; \hat{a}^{\dagger}_{k} \hat{a}_{k}
\tag{1226}
\]

where the first line represents the ground-state energy and the second line represents the energy of the elementary excitations. The energy of an elementary excitation E(k) is given by
\[
E(k) = \sqrt{ \left( \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0) \right)^2 - \left( \frac{N}{V} V_{int}(k) \right)^2 }
\tag{1227}
\]

This dispersion relation vanishes identically at k = 0 and is approximately linear for small k,
\[
E(k) \approx \hbar k \, \sqrt{ \frac{N}{m V} V_{int}(0) }
\tag{1228}
\]

where the excitations have the characteristics of phonons. The excitations are the Goldstone modes associated with the broken gauge symmetry³⁸. It should be noted that V_{int}(0) must be positive for this solution to be stable. At higher values of k the dispersion relation reduces to
\[
E(k) \approx \frac{\hbar^2 k^2}{2m} + \frac{N}{V} V_{int}(0)
\tag{1229}
\]
which represents the bare particle dispersion relation together with a constant energy shift due to the interaction with the particles in the condensate.
In summary, one observes that due to the interactions with the particles in the condensate, the dispersion of the elementary excitations has changed from quadratic to linear. This has the important experimental consequence that the specific heat changes from being proportional to T^{3/2} at low temperatures to having a T³ variation.
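The two limits of E(k) can be illustrated numerically (a sketch in units ℏ = m = 1, assuming a contact interaction so that V_int(k) = V_int(0), with g denoting (N/V)V_int(0)):

```python
import numpy as np

# Bogoliubov dispersion, Eq. (1227), in units hbar = m = 1,
# with g = (N/V) V_int(0) = (N/V) V_int(k) (assumed contact interaction).
g = 1.0

def E(k):
    eps = 0.5 * k**2 + g
    return np.sqrt(eps**2 - g**2)

# Small k: linear (phonon) behaviour E ~ k sqrt(g), Eq. (1228)
k_small = 1e-4
print(E(k_small) / (k_small * np.sqrt(g)))   # ~ 1

# Large k: free-particle dispersion plus a constant shift, Eq. (1229)
k_large = 50.0
print(E(k_large) - (0.5 * k_large**2 + g))   # ~ 0
```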
The Coherent Nature of The Ground State
The replacement of the condensate creation and annihilation operators may have obscured the physics, especially since the unitary transformation does not conserve the number of particles. An appropriate generalization to the case of conserved particle number is given by
\[
\hat{U} = \exp\left[ \sum_{k \neq 0} \frac{\theta_k}{N} \left( \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} \hat{a}_{0} \hat{a}_{0} - \hat{a}^{\dagger}_{0} \hat{a}^{\dagger}_{0} \hat{a}_{k} \hat{a}_{-k} \right) \right]
\tag{1230}
\]

where, for convenience, we have set the phase of the condensate to zero. The states of the untransformed system can be obtained from those of the transformed system by the inverse transformation
\[
| \Psi \rangle = \hat{U}^{\dagger} \, | \Psi' \rangle
\tag{1231}
\]

In the primed frame, the ground state is an eigenstate of the number operator n̂_k = a†_k a_k with eigenvalue zero. Hence, in the primed frame, the ground state simply corresponds to N bosons in the condensate
\[
| \Psi' \rangle = \frac{1}{\sqrt{N!}} \; ( \hat{a}^{\dagger}_{0} )^N \; | 0 \rangle
\tag{1232}
\]

³⁸ Gauge symmetry is broken since each condensate has a particular value for the phase φ, whereas the theory shows that the energy of the Bose-condensed state is independent of φ, where φ is a continuous variable that lies in the range 2π > φ > 0. Hence, the Bose-condensate has broken the continuous phase-symmetry of the Hamiltonian.


where | 0 ⟩ is the vacuum. Since the vacuum satisfies
\[
\hat{U} \, | 0 \rangle = | 0 \rangle
\tag{1233}
\]
one finds that the ground state of the un-transformed system is given by
\[
| \Psi \rangle = \hat{U}^{\dagger} \, | \Psi' \rangle = \frac{1}{\sqrt{N!}} \, \hat{U}^{\dagger} \, ( \hat{a}^{\dagger}_{0} )^N \, \hat{U} \, | 0 \rangle = \frac{1}{\sqrt{N!}} \left( \hat{U}^{\dagger} \, \hat{a}^{\dagger}_{0} \hat{a}^{\dagger}_{0} \, \hat{U} \right)^{\frac{N}{2}} | 0 \rangle
\tag{1234}
\]

Thus, the ground state of the condensate has the form of a product of linear superpositions³⁹
\[
| \Psi \rangle \approx \frac{1}{\sqrt{N!}} \left( \hat{a}^{\dagger}_{0} \hat{a}^{\dagger}_{0} - \frac{2}{N} \sum_{k \neq 0} \theta_k \; \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} + \ldots \right)^{\frac{N}{2}} | 0 \rangle
\approx \frac{1}{\sqrt{N!}} \left( \hat{a}^{\dagger}_{0} \hat{a}^{\dagger}_{0} - \sum_{k \neq 0} \tanh\theta_k \; \hat{a}^{\dagger}_{k} \hat{a}^{\dagger}_{-k} \right)^{\frac{N}{2}} | 0 \rangle
\tag{1235}
\]

in which it is seen that the interaction has scattered pairs of bosons out of the condensate. Conservation of momentum shows that the pairs of particles scattered out of the condensate have zero total momentum. Thus, the number of particles with zero momentum is smaller than the total number of particles. The ground state is a form of coherent state, in the sense that the number of particles in the condensate is large, as are the number fluctuations. It is also seen that the components of the ground state with different numbers of particles in the condensate have definite phase relationships.
The states with a single elementary excitation present are proportional to
\[
\hat{U}^{\dagger} \, \hat{a}^{\dagger}_{k} \, \hat{U} \; | \Psi \rangle = \left( \cosh\theta_k \; \hat{a}^{\dagger}_{k} \hat{a}_{0} - \sinh\theta_k \; \hat{a}^{\dagger}_{0} \hat{a}_{-k} \right) | \Psi \rangle
\tag{1236}
\]
Hence, the elementary excitations of the Bose-Einstein condensate are of the form of a linear superposition. The relative weight of the single-particle excitation a†_k in this excited state is significantly reduced for small k.
Thus, the interaction not only produces a change in the dispersion relations of the excitations of a Bose-Einstein condensate, but also changes the character of the excitations.
39 M. Girardeau and R. Arnowitt, Theory of Many-Boson Systems: Pair Theory, Phys.
Rev. 113, 755-761 (1959).


The Critical Velocity


The change in the character of the dispersion relation at low energies has the consequence that a condensate of weakly-interacting bosons exhibits superfluidity. Superfluidity is the property of flowing through narrow capillaries without exhibiting viscosity; thus, a superfluid will continue to flow in the absence of any driving forces.
We shall consider a fluid at zero temperature, flowing with velocity v through a capillary. In the primed frame of reference moving with the fluid, the walls of the capillary are moving with velocity −v. In this primed reference frame, consider the fluid to initially be in a Bose-Einstein condensate, which carries no momentum. The total energy of the fluid would be M c², the rest-mass energy, and the total momentum is zero. If the viscosity were finite, the interaction with the moving capillary walls would cause the fluid to start moving. The change in the state of the initial fluid could only be caused by exciting the internal degrees of freedom of the superfluid. That is, if viscosity is present, the interaction with the moving capillary walls should produce an excitation in the liquid. In the primed reference frame, the total energy of the condensate in which there is an excitation with momentum p' is given by
\[
E'_T = M c^2 + E(p')
\tag{1237}
\]
In the reference frame where the capillary walls are stationary, this energy is given by the Lorentz transformation
\[
E_T = \frac{ E'_T + v \cdot p' }{ \sqrt{ 1 - \frac{v^2}{c^2} } } \approx M c^2 + E(p') + p' \cdot v + \frac{M}{2} v^2
\tag{1238}
\]

and the momentum in the rest frame is given by
\[
p = \frac{1}{ \sqrt{ 1 - \frac{v^2}{c^2} } } \left( p' + v \, \frac{E'_T}{c^2} \right) \approx p' + M v
\tag{1239}
\]
Hence, the energy of the excitation in the stationary reference frame is given by
\[
E = E(p') + p' \cdot v
\tag{1240}
\]

and its momentum is p'. The excitation energy must be negative, if the excitation is to be allowed. The rationale for this is that, since the capillary is at rest and at T = 0, it cannot provide the energy necessary to create a positive-energy excitation. On the other hand, since the fluid is moving, it can lose energy by reducing its state of motion and dissipate the excess energy through the creation of positive-energy excitations in the capillary's walls. Hence, we require that the excitation energy is negative
\[
E(p') + p' \cdot v < 0
\tag{1241}
\]

It is possible to satisfy this criterion if p' is anti-parallel to v. After the excitation has occurred, the liquid is expected to have slowed down.

Figure 55: The experimentally determined excitation spectrum of He II at T = 1.1 K. [R.A. Cowley and A.D.B. Woods, Inelastic Scattering of Thermal Neutrons from Liquid Helium, Can. J. Phys. 49, 177 (1971).]
The criterion for dissipation to occur in the fluid is that
\[
E(p') + p' \cdot v < 0
\tag{1242}
\]
On assuming that v and p' are oppositely directed, the criterion for viscous flow reduces to
\[
v > \min \, E(p')/p'
\tag{1243}
\]
for some value of p'. The critical velocity v_c is defined as
\[
v_c = \min \, E(p')/p'
\tag{1244}
\]
The Landau criterion⁴⁰ states that superflow occurs when v_c > v > 0 and viscous flow occurs when v > v_c. Geometrically, the critical velocity is the minimum value of the slope of a line from the origin to a point p' on the curve E(p). This point is given by the solution of
\[
\frac{E(p')}{p'} = \left. \frac{dE(p)}{dp} \right|_{p'}
\tag{1245}
\]
Any Bose-Einstein condensate with a parabolic dispersion relation E(p) cannot exhibit superflow, since p' = 0. Therefore, for a Bose-Einstein condensate with

⁴⁰ L.D. Landau, The Theory of Superfluidity of He II, J. Phys. USSR, 5, 71, (1941).


a parabolic dispersion relation, v_c = 0, so any flow is viscous. For He4, the theoretically calculated and experimentally measured dispersion relation E(p) is linear at small p but exhibits a minimum at some finite value of p, which is known as the roton minimum. The roton minimum is caused by the strong interactions between the particles, which are almost strong enough to result in the formation of a crystal. In fact, Feynman has shown that the dispersion relation can be written in terms of the geometric structure factor S(p) via
\[
E(p) = \frac{p^2}{2 \, m \, S(p)}
\tag{1246}
\]
For large p, the structure factor has a maximum at a momentum p₀ given by p₀ ∼ h/d, where d is the interparticle spacing. The maximum in the structure factor gives rise to the observed roton minimum. However, the observed roton minimum gives a critical velocity of 60 m/sec. This velocity is several orders of magnitude greater than the critical velocity observed in He II. Feynman⁴¹ has shown that the critical velocity is very close to that expected for a single vortex ring.
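The roton estimate of the Landau critical velocity can be reproduced numerically. The sketch below uses the standard Landau roton parameters for He II (gap Δ/kB ≈ 8.65 K, p₀/ℏ ≈ 1.92 Å⁻¹, effective mass μ ≈ 0.16 m_He); these parameter values are assumptions, not taken from the text:

```python
import numpy as np

# Landau critical velocity v_c = min E(p)/p, Eq. (1244), estimated from the
# roton part of the He II spectrum, E(p) = Delta + (p - p0)^2 / (2 mu).
hbar = 1.055e-34
kB = 1.381e-23
m_He = 4.0 * 1.66e-27

Delta = 8.65 * kB       # assumed roton gap (J)
p0 = 1.92e10 * hbar     # assumed roton momentum (kg m/s)
mu = 0.16 * m_He        # assumed roton effective mass (kg)

p = np.linspace(0.5 * p0, 1.5 * p0, 100001)
E = Delta + (p - p0)**2 / (2.0 * mu)
vc = np.min(E / p)
print(vc)   # ~ 60 m/s, the roton estimate quoted in the text
```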

9.8 The Superfluid Velocity and Vortices

A spatially varying condensate can be characterized by a spatially varying wave function ψ(r), which is given by the creation amplitude for the particles in the condensate. Then, the total number of particles in the condensate N₀ is given by
\[
N_0 = \int d^3r \; | \psi(r) |^2
\tag{1247}
\]
The spatial variation of the condensate can be described by a phenomenological Landau Free-Energy
\[
F[\psi, \psi^*] = \int d^3r \left[ \frac{\hbar^2}{2m} \, | \nabla \psi |^2 + V(r) \, | \psi |^2 + V_{int} \, | \psi |^4 \right]
\tag{1248}
\]
where the last term represents a localized interaction between the particles in the condensate. Minimization of the Free-Energy w.r.t. ψ* yields a Schrodinger-like equation with non-linear terms. Conservation of particles requires that the condensate density ρ and current density j must satisfy the continuity equation
\[
\frac{\partial \rho}{\partial t} + \nabla \cdot j = 0
\tag{1249}
\]
where
\[
\rho(r) = | \psi |^2
\tag{1250}
\]

41 R.P. Feynman and M. Cohen, Energy Spectrum of the Excitations in Liquid Helium,
Physical Review, 102, 1189-1204, (1956).


and
\[
j(r) = \frac{\hbar}{2 m i} \left( \psi^* \nabla \psi - \psi \nabla \psi^* \right)
\tag{1251}
\]

If the amplitude of the condensate wave function varies slowly compared to the phase, one may write
\[
\psi(r) = \sqrt{ \frac{N_0}{V} } \, \exp\left[ \, i \, \theta(r) \, \right]
\tag{1252}
\]
Hence, one finds that the condensate's current density is given by
\[
j(r) = \frac{\hbar}{m} \, \frac{N_0}{V} \, \nabla \theta
\tag{1253}
\]

This allows one to define a superfluid velocity v_s via
\[
v_s = \frac{\hbar}{m} \, \nabla \theta
\tag{1254}
\]

which is governed by the spatial variation of the phase of the condensate.
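The relation between the current density of Eq. (1251) and the phase-gradient form of Eq. (1253) can be checked numerically for an assumed plane-wave phase (a sketch, in units ℏ = m = 1):

```python
import numpy as np

# Check that j = (1/2i)(psi* grad psi - psi grad psi*) reduces to
# n0 * grad(theta) for psi = sqrt(n0) exp(i q x), units hbar = m = 1.
n0 = 2.0      # condensate density N0/V
q = 0.3       # assumed constant phase gradient, theta(x) = q x

x = np.linspace(0.0, 10.0, 20001)
psi = np.sqrt(n0) * np.exp(1j * q * x)

dpsi = np.gradient(psi, x)             # finite-difference derivative
j = (psi.conj() * dpsi - psi * dpsi.conj()) / 2j
print(j.real[10000])                   # ~ n0 * q = 0.6 in the bulk
```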


The circulation of the superfluid velocity field is given by the line integral around a closed loop inside the superfluid
\[
\oint dr \cdot v_s = \frac{\hbar}{m} \oint dr \cdot \nabla \theta
\tag{1255}
\]
This equation relates the superfluid circulation to the change of the phase of the condensate wave function at the end point of the loop. Since the condensate wave function must be single-valued, the phase at any point must be defined up to a multiple of 2π. Thus, from continuity of the wave function, the circulation must be quantized⁴²
\[
\oint dr \cdot v_s = \frac{\hbar}{m} \, 2 \pi \, n
\tag{1256}
\]
where n is an integer, n = 0, ±1, ±2, .... The quantization of circulation is a manifestation of the quantum nature of the superfluid. If one now considers a loop in the condensate which is simply connected and shrinks the size of the loop to zero, one finds that the only possible value of the phase quantum number n is zero. Hence, the condensate must be irrotational. On the other hand, if the condensate is multiply connected, then one cannot shrink the loop to zero, and the quantum number n can be non-zero. Vortices with low numbers of circulation quanta are preferred energetically. Thus, the circulation of the superfluid can be non-zero when the loop encloses regions in which the condensate density is zero. This analysis has the experimental consequence that if one starts to rotate a cylindrical vessel containing a superfluid, the superfluid liquid will initially remain at rest. However, if the angular velocity is increased

⁴² L. Onsager, Statistical Hydrodynamics, Nuovo Cim. 6, Suppl. 2, pp. 279-287 (1949).
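For He-4, a single quantum of circulation, Eq. (1256) with n = 1, evaluates to roughly 10⁻⁷ m²/s:

```python
# Quantum of circulation (hbar/m) * 2*pi = h/m for a He-4 atom.
h = 6.626e-34        # Planck constant (J s)
m_He = 4.0 * 1.66e-27  # He-4 atomic mass (kg)

kappa = h / m_He
print(kappa)         # ~ 1.0e-7 m^2/s
```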


Figure 56: A single vortex excitation in a superfluid contained in a cylinder rotating with angular velocity ω.

Figure 57: The configuration of vortex excitations in a superfluid contained within a rotating cylinder.
above a critical value, excitations consisting of normal regions, called vortices, are introduced into the superfluid. A vortex consists of a one-dimensional region of normal fluid (i.e. non-superfluid) that is oriented parallel to the axis of rotation and extends throughout the entire length of the cylinder [See fig(56).]. The vortices are topological excitations. An increase in the angular velocity of the cylinder will result in an increase in the number of vortices that penetrate the superfluid, and the vortex lines will form a two-dimensional array⁴³. In a non-rotating superfluid, it is possible to have vortices, if the vortices close up on themselves in loops. Such vortex rings have the form of smoke rings [See fig(58).]. In a flowing liquid, large vortex rings can be created before the critical velocity for the creation of rotons is reached. Hence, the critical velocity of a superfluid is usually determined by the vortices. If the lateral dimension of the system is reduced below the size of the vortex rings, the critical velocity is bounded from above by the critical velocity inferred from the roton excitations.

Figure 58: A vortex ring moving through a superfluid.

⁴³ H.E. Hall and W.F. Vinen, The Rotation of Liquid He II, Proc. Roy. Soc. London, Series A, 238, 204-215 (1956); 238, 215-234 (1956).

10 Phase Transitions

In 1944, Lars Onsager⁴⁴ published the exact solution of the two-dimensional Ising Model in zero field on a square lattice. His exact solution was a tour de force of mathematical physics. Onsager used the transfer matrix technique to describe a finite square lattice of size L × L and then diagonalized the matrix by finding the irreducible representations of a related matrix algebra. The exact solution demonstrated that, in the thermodynamic limit, the system exhibits a phase transition which is marked by singularities in the physical properties. The form of the Hamiltonian of the Ising Model is given by
\[
\hat{H} = - \sum_{i,j} J_{i,j} \, S^z_i S^z_j
\tag{1257}
\]

where J_{i,j} = J if i and j are on neighboring lattice sites and J_{i,j} = 0 otherwise. The spin variables can only have the allowed values S^z_i = ±1. We shall assume J > 0, which favors states where the neighboring S^z values have the same sign. Onsager found that the exact partition function is given by the expression


\[
\frac{\ln Z}{N} = \ln 2 + \frac{a^2}{2 \, ( 2 \pi )^2} \int_{-\frac{\pi}{a}}^{\frac{\pi}{a}} dk_x \int_{-\frac{\pi}{a}}^{\frac{\pi}{a}} dk_y \; \ln\left[ \, \cosh^2( 2 \beta J ) - \sinh( 2 \beta J ) \, ( \cos k_x a + \cos k_y a ) \, \right]
\tag{1258}
\]
The argument of the logarithm is non-negative and the integral exists for all values of βJ. For J > 0, the minimum value of the argument occurs for k = 0, and is given by
\[
\cosh^2( 2 \beta J ) - 2 \sinh( 2 \beta J ) = \left( \, 1 - \sinh( 2 \beta J ) \, \right)^2
\tag{1259}
\]

The non-analytic behavior of F occurs at the temperature at which this minimum value vanishes. This gives rise to the identification of the critical temperature as the solution of the equation
\[
\sinh( 2 \beta_c J ) = 1
\tag{1260}
\]
or, equivalently,
\[
\tanh \beta_c J = \sqrt{2} - 1
\tag{1261}
\]

⁴⁴ L. Onsager, Phys. Rev. 65, 117 (1944).
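Equations (1260) and (1261) fix the critical point; in units J = kB = 1 one finds kB Tc / J = 2 / ln(1 + √2) ≈ 2.269:

```python
import math

# Square-lattice Ising critical point from Eq. (1260): sinh(2 beta_c J) = 1,
# in units J = kB = 1, so 2 beta_c = arcsinh(1) = ln(1 + sqrt(2)).
beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))
Tc = 1.0 / beta_c
print(Tc)   # ~ 2.269

# Consistency with the equivalent form, Eq. (1261): tanh(beta_c J) = sqrt(2) - 1
print(math.tanh(beta_c) - (math.sqrt(2.0) - 1.0))   # ~ 0
```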

For temperatures below the critical temperature, the system is in a ferromagnetic state. The non-analyticity originates with the long-wavelength behavior of the integral, and can be found by approximating the integral by
\[
\frac{\ln Z}{N} \approx \ln 2 + \frac{a^2}{2 \, ( 2 \pi )} \int_{0}^{\frac{\pi}{a}} dk \; k \, \ln\left[ \, ( 1 - \sinh( 2 \beta J ) )^2 + \frac{1}{2} \sinh( 2 \beta J ) \, ( k a )^2 \, \right]
\approx \ln 2 + \frac{1}{4 \pi} \int_{0}^{\frac{\pi^2}{2}} dx \; \ln\left[ \, ( 1 - \sinh( 2 \beta J ) )^2 + \sinh( 2 \beta J ) \, x \, \right]
\tag{1262}
\]
This yields the expression for the non-analytic part of the Free-Energy
\[
\left( \frac{\ln Z}{N} \right)_{sing} \approx - \frac{ ( 1 - \sinh( 2 \beta J ) )^2 }{ 4 \pi \, \sinh( 2 \beta J ) } \, \ln \left| \, 1 - \sinh( 2 \beta J ) \, \right|
\tag{1263}
\]
Hence, the specific heat is found to diverge logarithmically at the transition temperature
\[
\frac{C}{N} \approx - \frac{8}{\pi} \, k_B \, ( \beta_c J )^2 \, \ln | T - T_c |
\tag{1264}
\]
which is symmetrical around the transition temperature. Onsager stated without proof that, for temperatures below Tc, the zero-field magnetization, defined by the average value
\[
M = \sum_{i=1}^{N} < S^z_i >
\tag{1265}
\]
varies as
\[
M = N \left( \, 1 - \sinh^{-4}( 2 \beta J ) \, \right)^{\frac{1}{8}}
\tag{1266}
\]
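The Onsager-Yang result (1266) can be evaluated directly (units J = kB = 1); the magnetization per spin vanishes for sinh(2βJ) ≤ 1, i.e. above Tc:

```python
import math

# Onsager-Yang zero-field magnetization per spin, Eq. (1266) with N = 1,
# in units J = kB = 1.
Tc = 2.0 / math.log(1.0 + math.sqrt(2.0))   # ~ 2.269

def m(T):
    s = math.sinh(2.0 / T)
    return (1.0 - s**(-4))**0.125 if s > 1.0 else 0.0

print(m(1.0))          # ~ 0.9993 : nearly saturated well below Tc
print(m(0.999 * Tc))   # ~ 0.5   : still sizeable, reflecting the small 1/8 exponent
print(m(1.5 * Tc))     # 0.0     : no spontaneous magnetization above Tc
```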

The proof of this last result was eventually published by C.N. Yang⁴⁵. Onsager's success is of great historical importance and is without parallel since, to date, no exact solutions have been found for models of similar physical importance, such as the three-dimensional Ising model, or the two- or three-dimensional versions of the Heisenberg model. Onsager's results provided the only rigorous treatment of a phase transition until, three decades later, the renormalization group technique was finally formulated.

10.1 Phase Transitions and Singularities

The non-analytic behavior of the Free-Energy that is seen in the vicinity of a phase transition is, at first sight, quite disturbing, since it implies that the partition function is also anomalous. This is unexpected, since the partition function is defined as
\[
Z(\beta) = \mathrm{Trace} \; \exp\left[ - \beta \hat{H} \right]
\tag{1267}
\]

⁴⁵ C.N. Yang, Phys. Rev. 85, 808 (1952).


which is understood to be the sum of a finite number of positive terms and, therefore, is an analytic function of β (except possibly when β → ∞). Furthermore, the sum of these terms exponentiates to yield a Free-Energy F_N(β) which has an extensive part (proportional to N) and other parts which become negligible when N → ∞. This would lead us to expect that the Free-Energy is also an analytic function. However, the sequence of analytic functions F_N(β) need not be analytic in the limit N → ∞.
Just to make this explicit, consider an Ising Spin Hamiltonian with long-ranged pair-wise interactions
\[
\hat{H} = J \sum_{i>j} S^z_i S^z_j
\tag{1268}
\]

where S^z_i = ±½. Any spin interacts with every other spin with the same interaction strength. The Hamiltonian can be re-written as
\[
\hat{H} = \frac{J}{2} \left( \sum_{i=1}^{N} S^z_i \right)^2 - \frac{J}{2} \sum_{i=1}^{N} ( S^z_i )^2 = \frac{J}{2} \left( \sum_{i=1}^{N} S^z_i \right)^2 - \frac{J N}{8}
\tag{1269}
\]

Strictly speaking, it is necessary to formally define J = J'/N if we were to demand that the Free-Energy be extensive. We shall ignore this, simply to avoid carrying around extra factors of N. If J is negative, the ground state corresponds to a ferromagnetic state for which
\[
\sum_{i=1}^{N} S^z_i = \pm \frac{N}{2} \quad , \quad \text{for } J < 0
\tag{1270}
\]
The ferromagnetic state is two-fold degenerate. On the other hand, if J > 0, the ground state (when N is even) corresponds to
\[
\sum_{i=1}^{N} S^z_i = 0 \quad , \quad \text{for } J > 0
\tag{1271}
\]

which is highly degenerate. It is easy to show that the degeneracy is given by C^N_{N/2}. The partition function Z_N(β) can be expanded in powers of a parameter z defined as
\[
z = \exp\left[ - \frac{\beta J}{8} \right]
\tag{1272}
\]
For positive J, z is reduced from 1 to 0 as T is reduced from ∞ to 0. For negative J, z increases from 1 to ∞ as T is reduced from ∞ to 0. The partition


function is found to be given by
\[
Z_N(\beta) = \exp\left[ + \frac{\beta J N}{8} \right] \; \sum_{m=0}^{N} C^N_m \; z^{( N - 2m )^2}
\tag{1273}
\]
where C^N_m are the binomial coefficients. This expression contains a factor which is a very high-order polynomial in z. The polynomial has no roots on the positive real axis except, perhaps, at the point z = 0. However, it does have pairs of complex conjugate roots in the complex z-plane. The roots may be multiple roots. For our model, it is seen that the pairs of roots are located on circles enclosing the origin z = 0. As z approaches a point which is a root, the partition function approaches zero and the Free-Energy F_N(z) diverges logarithmically.

Figure 59: The distribution of zeroes of the partition function Z₉(z) for the Ising Model with long-ranged interactions, in the complex z-plane. The dashed blue circle has a radius of unity.
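For the N = 9 case plotted in Figure 59, the zeros of the polynomial factor in Eq. (1273) can be computed directly, confirming that none lie on the positive real z-axis (a sketch):

```python
import numpy as np
from math import comb

# Zeros of the polynomial factor of Z_N(z) in Eq. (1273) for N = 9:
# P(z) = sum_m C(N, m) z^((N - 2m)^2), a degree-81 polynomial in z.
N = 9
deg = N * N
coeffs = np.zeros(deg + 1)
for mm in range(N + 1):
    coeffs[(N - 2 * mm)**2] += comb(N, mm)

# numpy.roots expects the highest-order coefficient first
roots = np.roots(coeffs[::-1])

# No root lies on the positive real axis: every root with a positive
# real part has a non-negligible imaginary part.
positive_real = roots[(roots.real > 0) & (np.abs(roots.imag) < 1e-9)]
print(len(positive_real))   # 0
```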
In general, the partition function is expected to have the form
\[
Z_N(z) = \exp\left[ N A(z) \right] \; \prod_{\gamma} ( z - z_{\gamma} ) ( z - z^*_{\gamma} )
\tag{1274}
\]
where z_γ and z*_γ are the pairs of complex conjugate roots in the complex z-plane, and A(z) is a simple function. The Free-Energy F_N(z) is given by
\[
F_N(z) = - k_B T \, N A(z) - k_B T \sum_{\gamma} \ln\left[ ( z - z_{\gamma} ) ( z - z^*_{\gamma} ) \right]
\tag{1275}
\]


which has singularities in the complex z-plane. Lee and Yang⁴⁶ proved that the limit
\[
\lim_{N \to \infty} \frac{1}{N} F_N(z) = - k_B T \lim_{N \to \infty} \frac{1}{N} \ln Z_N(z)
\tag{1276}
\]
exists for all real positive z and is a continuous, monotonically increasing function of z. Also, for any region which does not contain any roots of Z_N(z), lim_{N→∞} (1/N) F_N(z) is analytic in this region. If these conditions are satisfied for all physical values of z, the system does not exhibit a phase transition.
As the limit N → ∞ is approached, the zeroes of Z_N(z) may approach the real axis and pinch it off at a real value of z, z_c. The conditions of the Lee-Yang theorem do not apply in the immediate vicinity of this point. If the zeroes approach a point z_c on the real axis continuously as N is increased, then the point z_c may be located on a branch cut of F(z), which would yield non-analytic behavior at z_c, or equivalently at β_c. In such a case, z_c would define a critical temperature T_c at which the Free-Energy is singular.

10.2 The Mean-Field Approximation for an Ising Magnet

Consider an Ising Hamiltonian in the presence of an applied magnetic field H^z
\[
\hat{H} = - \sum_{i,j} J_{i,j} \, S^z_i S^z_j - \sum_{i} \frac{g \mu_B}{\hbar} \, S^z_i \, H^z
\tag{1277}
\]
The operator S^z has (2S + ℏ)/ℏ possible eigenvalues, which are −S, −S + ℏ, ..., S − ℏ, S. The interaction J_{i,j} couples the z-components of nearest-neighbor spins. We shall assume that the interaction J is short-ranged and takes on the same positive value between each pair of nearest-neighboring spins, so that the lowest-energy configuration is ferromagnetic, in which all the spins are aligned parallel to each other. Although the Hamiltonian has an extremely simple form, the only known exact expressions for the Free-Energy have been found for the special cases where the spins are arranged on one- or two-dimensional lattices⁴⁷. Therefore, we shall have to describe this system approximately, by using the mean-field approximation first introduced by Weiss.
We shall define the average magnetization per spin as m and express the Hamiltonian as
\[
\hat{H} = - \sum_{i,j} J_{i,j} \left[ \, m + ( S^z_i - m ) \, \right] \left[ \, m + ( S^z_j - m ) \, \right] - \frac{g \mu_B}{\hbar} \sum_{i} \left[ \, m + ( S^z_i - m ) \, \right] H^z
\tag{1278}
\]

⁴⁶ C.N. Yang and T.D. Lee, Statistical Theory of Equations of State and Phase Transitions: 1. Theory of Condensation, Phys. Rev. 87, 404-409, (1952); T.D. Lee and C.N. Yang, Statistical Theory of Equations of State and Phase Transitions: 2. Lattice Gas and Ising Model, Phys. Rev. 87, 410-419, (1952).
⁴⁷ L. Onsager, Crystal Statistics I: A two-dimensional model with an order-disorder transition, Phys. Rev. 65, 117-149 (1944); L. Onsager, unpublished: Nuovo Cimento 6, Suppl. p. 261 (1949).

and expand in powers of the fluctuations ( S^z_i − m ) of the spins from their average value. To first order in the fluctuations, one has
\[
\hat{H}_{MF} = \sum_{i,j} J_{i,j} \, m^2 - \sum_{i,j} J_{i,j} \, m \, ( S^z_i + S^z_j ) - \sum_{i} \frac{g \mu_B}{\hbar} \, S^z_i \, H^z
\tag{1279}
\]

where we have neglected the terms of second order in the fluctuations. One should note that the above Hamiltonian resembles that expected for non-interacting spins in an effective magnetic field H_eff, given by
\[
H_{eff} = H^z + \frac{2 \hbar}{g \mu_B} \sum_{j} J_{j,i} \, m
\tag{1280}
\]

The mean-field partition function Z can be calculated as
\[
Z_{MF} = \mathrm{Trace} \; \exp\left[ - \beta \hat{H}_{MF} \right]
\tag{1281}
\]
where the Trace runs over all the possible spin configurations. Thus, the Trace corresponds to the product of sums over the (2S/ℏ + 1) possible configurations of each spin. Since the spins are no longer coupled, the mean-field Hamiltonian factorizes. Hence, the partition function has the form
\[
Z_{MF} = \exp\left[ - \beta \sum_{i,j} J_{i,j} m^2 \right] \; \prod_{i=1}^{N} \left[ \sum_{S^z_i = -S}^{+S} \exp\left[ \frac{\beta g \mu_B}{\hbar} \, S^z_i \, H_{eff} \right] \right]
\tag{1282}
\]
(1282)
The trace can be performed yielding the result

ZM F = exp

Ji,j m2

  sinh

(2S+
h)
h


sinh

i,j

g B Hef f
2

g B Hef f
2


N
(1283)

Hence, in the mean-field approximation, the Free-Energy is given by





X
(2S + h) g B Hef f
FM F =
Ji,j m2 N kB T ln sinh
h
2
i,j



g B Hef f
+ N kB T ln sinh
(1284)
2
The magnetization is found from the thermodynamic relation
\[
M^z = - \left( \frac{\partial F}{\partial H^z} \right)
\tag{1285}
\]

which yields
\[
M^z = N g \mu_B \left[ \frac{2S + \hbar}{2 \hbar} \coth\left( \frac{( 2S + \hbar )}{\hbar} \frac{\beta g \mu_B H_{eff}}{2} \right) - \frac{1}{2} \coth\left( \frac{\beta g \mu_B H_{eff}}{2} \right) \right]
\tag{1286}
\]
On recognizing that
\[
M^z = \frac{g \mu_B}{\hbar} \sum_{i} < S^z_i >
\tag{1287}
\]

one finds that the average value of S^z is independent of the site and is given by
\[
< S^z_0 > = \frac{2S + \hbar}{2} \coth\left( \frac{( 2S + \hbar )}{\hbar} \frac{\beta g \mu_B H_{eff}}{2} \right) - \frac{\hbar}{2} \coth\left( \frac{\beta g \mu_B H_{eff}}{2} \right)
\tag{1288}
\]
or, equivalently, on using the definition of m as the average value of the z-component of the spin,
\[
m = \frac{2S + \hbar}{2} \coth\left[ \frac{( 2S + \hbar )}{\hbar} \, \frac{\beta \, ( \, g \mu_B H^z + 2 \hbar m \sum_j J_{j,0} \, )}{2} \right] - \frac{\hbar}{2} \coth\left[ \frac{\beta \, ( \, g \mu_B H^z + 2 \hbar m \sum_j J_{j,0} \, )}{2} \right]
\tag{1289}
\]
This non-linear equation determines the value of m. It is known as the self-consistency equation, since the equation for m has to be solved self-consistently, as m also enters non-linearly on the right-hand side. The equation can be solved graphically.
For H z =
simplifies to

0 the two spin directions are equivalent, and the equation


 P

hm
(2S + h)
2S + h
j Jj,0
coth
2

h
2
 P


hm
h

j Jj,0

coth
2
2


(1290)

Both the left and right hand sides are odd functions of m. This symmetry is a consequence of the symmetry of the Hamiltonian under spin inversion S^z_i → − S^z_i, when H^z = 0. The graphical solution is illustrated in the figure. At high temperatures, the equation has only one solution, m = 0, whereas at low temperatures there are three solutions: one solution corresponds to m = 0 and the other two solutions correspond to m = ± m0(T), located symmetrically about m = 0. The value of m0(T) increases continuously from 0 and saturates at S as T is decreased towards zero. The critical temperature Tc at which the pair of non-zero solutions first appears can be found by expanding the right-hand side w.r.t. m, since it is expected that m → 0 just below Tc. This leads to the equation
\[
m = \frac{ 2 \beta }{ 3 } \sum_j J_{j,0} \ S \ ( S + \hbar ) \ m + O(m^3) \qquad (1291)
\]



Figure 60: The graphical solution of the mean-field self-consistency equation, for temperatures above Tc.
Figure 61: The graphical solution of the mean-field self-consistency equation, for temperatures below Tc.
On assuming that the cubic terms are negligibly small, this has the solution m = 0 unless the temperature is equal to Tc, which satisfies

\[
k_B T_c = \frac{2}{3} \sum_j J_{j,0} \ S \ ( S + \hbar ) \qquad (1292)
\]

At the critical temperature Tc, a non-zero (but still infinitesimal) value of m0(T) is first allowed. At this temperature, the graphical solution shows that the straight line through the origin is tangent to the magnetization curve at the origin. For temperatures just below Tc, the cubic terms in m in the self-consistency equation determine the magnitude of the solution, so that m0 varies as (Tc − T)^{1/2}.

Below Tc, the system must be described by one of the three possible values for m. In equilibrium, the Free-Energy should be minimized. The condition that F_MF is an extremum w.r.t. m is equivalent to the above self-consistency condition for m. It is seen that, for temperatures at which the non-zero solutions exist, the Free-Energy is minimized by the non-zero solutions for m, and that these minima are degenerate. The system will spontaneously break the

Figure 62: The m-dependence of the mean-field Free-Energy, for temperatures above Tc.
spin inversion symmetry, by settling into one of the two equivalent states. When H^z = 0, the mean-field approximation indicates that there is a second-order phase transition at Tc. The transition occurs at the temperature at which the location of the absolute minimum of F_MF(m) changes from m = 0, for temperatures above Tc, to the two equivalent locations m = ± m0(T), for temperatures below Tc.

The mean-field approximation shows the existence of another type of phase transition, which occurs when a small symmetry-breaking applied magnetic field H^z is varied. In this case, the applied field stabilizes one of the minima (say + m0) w.r.t. the minimum at − m0. However, on changing the applied field from a slightly positive to a slightly negative value, the equilibrium value of m jumps discontinuously from + m0 to − m0. This discontinuous jump

Figure 63: The m-dependence of the mean-field Free-Energy, for temperatures below Tc.

between two pre-existing minima of F(m) is characteristic of a first-order phase transition.

Figure 64: The m-dependence of the mean-field Free-Energy, for temperatures below Tc and various values of the applied field.

The defining characteristic of a first-order transition is the presence
of discontinuous first-derivatives of the Free-Energy^{48}. Since the entropy is a first-derivative of the Free-Energy with respect to temperature, first-order transitions may involve a latent heat. Since energy is not transferred instantaneously between a system and its environment, first-order transitions are frequently associated with mixed-phase regimes, in which some parts of the system have completed the transition and others have not. Another characteristic is the presence of hysteresis. For systems where the transformation proceeds slowly, a phase may exist at fields where its Free-Energy is not a global minimum but only a local minimum. Hence, on decreasing the field, the transformation may first occur at a negative value of the field. Furthermore, if the field is subsequently increased, the reverse transformation may occur at a positive value of the field. The point at which the transformation occurs is determined by the rate at which the field is changed and the time-scale required for the system to nucleate the new phase.

^{48} P. Ehrenfest (1880-1933) proposed a classification of phase transitions based on the discontinuities of the derivatives of the Free-Energy. A transition was defined to be an n-th order transition if the lowest-order derivative of the Free-Energy which was discontinuous across the transition was the n-th order derivative. This classification fell into disfavor when it was realized that a derivative could diverge without being discontinuous.
Figure 65: The H − T phase diagram for the mean-field description of a ferromagnet.
The phase diagram in the H − T plane shows a line of first-order transitions at low temperatures, which ends at the critical point (H = 0, T = Tc). Due to the symmetry of the magnetic system, the line of first-order transitions is vertical. On keeping H = 0 and varying T, the system exhibits a second-order phase transition at the critical point.
Phase transitions are found in many different types of system. Despite the differences between the microscopic descriptions, phase transitions can usually be described in the same manner. For example, in the liquid-gas phase transition, the role of the magnetization is played by the density and the magnetic field is replaced by the pressure. The line of first-order transitions is not vertical but has a finite slope, so it can be crossed by changing the temperature.
These transitions can be described in similar manners. In this description, the microscopic variables are replaced by coarse-grained variables that represent the collective coordinates of a large number of the microscopic degrees of freedom. The resulting description only retains the essential characteristics of the underlying microscopic systems, such as the symmetry, the dimensionality and the character of the ordering.

Figure 66: The P − T phase diagram for the liquid-gas transition. The liquid and gas are separated by a line of first-order transitions which ends at a critical point.

10.3 The Landau-Ginzburg Free-Energy Functional

The Landau-Ginzburg formulation of critical phenomena is based on a coarse-grained version of statistical mechanics, in which the microscopic degrees of freedom pertaining to some small region of space have been replaced by a small set of collective variables that describe the state of the volume element. This leads to a formulation of the statistical mechanics of the system in terms of the collective variables for each small volume element. It is common to introduce an order parameter defined for the microscopic d-dimensional volume element d^d r surrounding the point r. The order parameter characterizes the change in the system that occurs at the phase transition. For a system which changes symmetry, the order parameter is an extensive quantity (for example, it could be a scalar quantity, or it could be a vector quantity with a number of components that we will denote by n, etc.) which is non-zero in the state with lower symmetry and is zero in the state with higher symmetry. For simplicity, we shall only consider the case when one type of symmetry is broken, in which case only one order parameter is required. The volume of the system is to be partitioned into a set of identical infinitesimal volumes (or cells) d^d r surrounding a set of points that we shall label by the variable r. The order parameter φ(r) can be expressed as the sum of microscopic quantities φ_i that are expressed in terms of the microscopic degrees of freedom contained in the volume element d^d r surrounding the point r
\[
\phi(r) = \sum_{i \in r} \phi_i \qquad (1293)
\]

The partition function Z is expressed as the trace over the Hamiltonian

\[
Z = {\rm Trace} \ \exp\left[ - \beta H \right] \qquad (1294)
\]

The Landau-Ginzburg Free-Energy Functional F[φ(r)] is a number that depends on the function φ(r) and is expressed in terms of a trace

\[
\exp\left[ - \beta F[\phi(r)] \right] = {\rm Trace} \ \prod_{r} \delta\left( \phi(r) - \sum_{i \in r} \phi_i \right) \exp\left[ - \beta H \right] \qquad (1295)
\]

where the product contains a delta function which constrains the microscopic variables in the volume elements around each point r to be consistent with the value of φ(r) at that point. Hence, the partition function can be expressed as an integral over the possible values of φ(r) for each cell labeled by r.



\[
Z = \prod_{r} \int d\phi(r) \ \exp\left[ - \beta F[\phi] \right] \qquad (1296)
\]

This is recognized as a functional integral, and it should be noted that the set of possible functions defined by the values of φ(r) at each point of space r includes many wild functions that change discontinuously from point to point, and also includes functions that vary smoothly over space. The functional integral over the set of all possible functions φ(r) is weighted exponentially by the Landau-Ginzburg Free-Energy Functional. The path integral is conventionally denoted by

\[
Z = \int D\phi(r) \ \exp\left[ - \beta F[\phi] \right] \qquad (1297)
\]
The Landau-Ginzburg Free-Energy plays the role of a Hamiltonian, which generally depends on T, and describes the physical probabilities in terms of the collective variables φ(r) on the length scale dictated by the choice of the size of the volume elements d^d r. It contains all the physics that is encoded in the Helmholtz Free-Energy F. Like the Hamiltonian, the Landau-Ginzburg Free-Energy Functional is a scalar. In principle, the Landau-Ginzburg Free-Energy Functional should be calculated from knowledge of the model and its symmetries. In practice, one can understand the properties of phase transitions in a quite universal way close to a second-order phase transition, or a weakly first-order transition, where the order parameter is quite small. In such cases, one can expand the Landau-Ginzburg Free-Energy Functional in powers of the order parameter, keeping only terms of low order. The constraints imposed by stability and symmetry on the finite number of terms retained provide severe restrictions on the form of the Landau-Ginzburg Free-Energy Functional that describes the phase transition of a system. This severe restriction causes all the different phase transitions of physical systems to fall into a small number of universality classes, which are determined only by the dimensionality of the system d and the dimensionality of the order parameter n. Systems which fall into the same universality class have the same types of non-analytic temperature variations.
For example, a system residing in a d-dimensional Euclidean space which is characterized by an n-dimensional vector order parameter φ = (φ_1, φ_2, ..., φ_n), and has a Hamiltonian which is symmetric under rotations of the order parameter, can be described by the expanded Landau-Ginzburg Free-Energy Functional

\[
F[\phi] = \int d^d r \left[ \ F_0 + F_2 \ \phi(r) \ . \ \phi(r) + F_4 \ ( \ \phi(r) \ . \ \phi(r) \ )^2 - h(r) \ . \ \phi(r) + c \sum_{i=1}^{n} ( \ \nabla \phi_i \ . \ \nabla \phi_i \ ) \ \right] \qquad (1298)
\]

In this expression F_0, F_2, F_4 and c are constants that might depend on temperature and may also depend on the microscopic length scales of the system. If the above expansion is to describe stable systems that have small values of the order parameter, it is necessary to assume that F_4 > 0. The Free-Energy Functional has been expressed in terms of quantities that are invariant under the symmetries of space and the order parameter. The invariant quantities include the identity and the scalar product

\[
\phi(r) \ . \ \phi(r) = \sum_{i=1}^{n} \phi_i(r) \ \phi_i(r) \qquad (1299)
\]

The first three terms represent the Free-Energy density for the cells, in the absence of an external field. Since the material is assumed to be homogeneous, the coefficients F_0, F_2 and F_4 are independent of r. The fourth term represents the effect of a spatially varying applied external field h(r) that is conjugate to φ(r). The application of the field breaks the symmetry under rotations of the order parameter. The final term represents the interaction between neighboring cells, which tends to suppress rapid spatial variations of the order parameter and, hence, gives large weights to the functions φ(r) which are smoothly varying. The gradient term involves two types of scalar products: one type is associated with the d-dimensional scalar product of the gradients and the other is associated with an n-dimensional scalar product of the vector order parameter. The appearance of the gradient is due to the restriction to large length scales in the Landau-Ginzburg formulation. In this case, expressions such as

\[
\sum_{\delta} \left( \ \phi(r + \delta) - \phi(r) \ \right)^2 \qquad (1300)
\]

which tend to keep the value of φ in the cell r close to the values of φ in the neighboring cells at r + δ, can be expanded, leading to

\[
\sum_{i=1}^{n} \sum_{\delta} \left( \ \phi_i(r + \delta) - \phi_i(r) \ \right)^2 \ \approx \ \sum_{\delta} \sum_{i=1}^{n} ( \ \delta \ . \ \nabla \phi_i \ )^2 \ = \ c \sum_{i=1}^{n} ( \ \nabla \phi_i \ . \ \nabla \phi_i \ ) \qquad (1301)
\]

where we have assumed that the higher-order terms in the small length scale δ are negligibly small and that the neighboring cells are distributed isotropically

in space. This assumption of isotropic space and slow variations leads to the Landau-Ginzburg Functional having a form similar to the Lagrangians of continuum Field Theories. Apart from the coefficients F_0, F_2, F_4 and c, the form of the Lagrangian only depends on the values of n and d. However, for systems which undergo more than one type of phase transition, it may be necessary to introduce more than one order parameter, in which case the Landau-Ginzburg Free-Energy Functional can have more complicated forms.
Linear Response Theory
For simplicity, we shall consider the case where the order parameter is a scalar. In general, if a system is subject to a uniform applied field h_0 with an additional small (perhaps non-uniform) component δh(r), so that

\[
h(r) = h_0 + \delta h(r) \qquad (1302)
\]

then one expects that the additional small component of the field will induce a small additional (non-uniform) component into the expectation value of the local order-parameter < φ(r) >

\[
< \phi(r) > = \phi_0 + \delta \phi(r) \qquad (1303)
\]

The average of the order-parameter is defined by

\[
< \phi(r) > = \frac{1}{Z} \int D\phi \ \phi(r) \ \exp\left[ - \beta F[\phi] \right] \qquad (1304)
\]

where the Trace has been replaced by a path integral, and the Hamiltonian H has been replaced by the Landau-Ginzburg Free-Energy Functional F[φ]. The Landau-Ginzburg Free-Energy Functional includes both the uniform applied field and the small (non-uniform) component. On expanding the exponent and the denominator in powers of δh(r), one has






\[
< \phi(r) > = \frac{ \int D\phi \ \phi(r) \left( 1 + \beta \int d^d r' \ \delta h(r') \ \phi(r') \right) \exp\left[ - \beta F[\phi] \right] \bigg|_{\delta h = 0} }{ \int D\phi \left( 1 + \beta \int d^d r' \ \delta h(r') \ \phi(r') \right) \exp\left[ - \beta F[\phi] \right] \bigg|_{\delta h = 0} } \qquad (1305)
\]
The divisor is expanded to lowest non-trivial order as

\[
\frac{ 1 }{ \int D\phi \left( 1 + \beta \int d^d r' \ \delta h(r') \ \phi(r') \right) \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \ \approx \ \frac{ 1 }{ \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \ - \ \frac{ \int D\phi \ \beta \int d^d r' \ \delta h(r') \ \phi(r') \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} }{ \left( \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} \right)^2 } \qquad (1306)
\]

Since the uniform part of the order parameter satisfies

\[
\phi_0 = \frac{ \int D\phi \ \phi(r) \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} }{ \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \qquad (1307)
\]

then the small additional (non-uniform) component of the order parameter is given by the expression

\[
\delta \phi(r) = \beta \int d^d r' \ \delta h(r') \left[ \ \frac{ \int D\phi \ \phi(r) \ \phi(r') \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} }{ \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \ - \ \frac{ \int D\phi \ \phi(r) \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} }{ \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \ \frac{ \int D\phi \ \phi(r') \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} }{ \int D\phi \ \exp\left[ - \beta F[\phi] \right] \big|_{\delta h = 0} } \ \right] \qquad (1308)
\]
where the integration over the additional part of the applied field δh(r') has been taken out of the averages. The above equation can be written in a more compact form as

\[
\delta \phi(r) = \beta \int d^d r' \left[ \ < \phi(r) \ \phi(r') > - < \phi(r) > < \phi(r') > \ \right] \delta h(r') \qquad (1309)
\]

in which the averages are calculated with δh = 0. On defining the two-point correlation function S(r,r') as

\[
S(r,r') = < \phi(r) \ \phi(r') > - < \phi(r) > < \phi(r') > \qquad (1310)
\]

then the small induced component of the order-parameter is given by

\[
\delta \phi(r) = \beta \int d^d r' \ S(r,r') \ \delta h(r') \qquad (1311)
\]

which is a linear response relation that connects the small change in the order-parameter at position r to the change in the applied field at position r'.

For translationally invariant systems, the correlation function does not depend separately on r and r', but only on the relative separation r − r'. For materials which are translationally invariant, one can displace the origin through a distance r', leading to the expression

\[
S(r - r') = < \phi(r - r') \ \phi(0) > - < \phi(r - r') > < \phi(0) > \qquad (1312)
\]

which only depends on the difference r − r'. In the limit where δh becomes uniform one finds that, due to translational invariance, δφ becomes uniform and is given by

\[
\delta \phi = \beta \int d^d r' \ S(r - r') \ \delta h \qquad (1313)
\]

Due to the isotropy of space, the induced value of δφ can be expressed as

\[
\delta \phi = \beta \ \delta h \int d^d r' \ S(r') \qquad (1314)
\]
Hence, the uniform differential susceptibility, χ, defined by

\[
\chi = \left( \frac{ \partial \ \delta \phi }{ \partial \ \delta h } \right) \qquad (1315)
\]

is found to be proportional to the spatial integral of the correlation function

\[
\chi = \beta \int d^d r \ S(r) \qquad (1316)
\]

where the correlation function S(r) is calculated with δh = 0.
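This fluctuation-response relation can be verified exactly in a discrete toy model. The sketch below is an illustrative check, not part of the notes: it enumerates all configurations of a small periodic Ising chain and compares the susceptibility per site, β(⟨M²⟩ − ⟨M⟩²)/N, with the discrete analogue of eq. (1316), β Σ_r S(r); the chain length N, the coupling J and the inverse temperature β are arbitrary choices (kB = 1).

```python
import itertools
import math

# Exact enumeration check of chi = beta * sum_r S(r) for a periodic Ising chain.
N, J, beta = 6, 1.0, 0.7

def energy(s):
    return -J * sum(s[i] * s[(i + 1) % N] for i in range(N))

states = list(itertools.product([-1, 1], repeat=N))
weights = [math.exp(-beta * energy(s)) for s in states]
Z = sum(weights)

def avg(f):
    return sum(w * f(s) for s, w in zip(states, weights)) / Z

# susceptibility per site from the field derivative (fluctuation formula)
chi = beta * (avg(lambda s: sum(s) ** 2) - avg(lambda s: sum(s)) ** 2) / N

# the same quantity from the summed two-point correlation function S(r)
S = [avg(lambda s, r=r: s[0] * s[r]) - avg(lambda s: s[0]) * avg(lambda s, r=r: s[r])
     for r in range(N)]
chi_corr = beta * sum(S)

print(chi, chi_corr)  # the two expressions agree up to rounding
```

Translational invariance of the ring converts the double sum in ⟨M²⟩ into N times the sum over relative separations, which is exactly the mechanism behind eq. (1316).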

10.4 Critical Phenomena

The simplest phase diagram exhibiting a phase transition is a line of first-order transitions which ends in a critical point. If the line of first-order transitions is traversed along its length towards the critical point, the system approaches a second-order phase transition. The order parameter (which is given by a first-order derivative of the Free-Energy w.r.t. an applied field) is discontinuous as the system is taken on a path which crosses the line of first-order transitions. The difference in the order parameter, between the opposite sides of the line, characterizes the difference between the two phases. The magnitude of the discontinuity diminishes for paths that cross the line of first-order transitions closer to the critical point. This signifies that there is no discernible difference between the phases at the critical point. Above the critical point, only a unique phase exists and the order parameter is zero.
Phenomena that happen at a critical point are known as critical phenomena. Close to the critical point, physical quantities may show non-analytic temperature variations. For example, one may consider a uniform order parameter defined by the derivative of the Free-Energy w.r.t. a uniform applied field

\[
\phi = - \left( \frac{ \partial F }{ \partial h } \right) \qquad (1317)
\]

a zero-field susceptibility defined as

\[
\chi = \left( \frac{ \partial \phi }{ \partial h } \right) \qquad (1318)
\]

or the heat capacity C

\[
C = T \left( \frac{ \partial S }{ \partial T } \right) = - T \left( \frac{ \partial^2 F }{ \partial T^2 } \right) \qquad (1319)
\]

In the vicinity of the critical point, these quantities exhibit non-analytic temperature variations. A dimensionless parameter t is introduced as

\[
t = \frac{ T - T_c }{ T_c } \qquad (1320)
\]

as a measure of the distance to the critical point. The temperature variation of any quantity is decomposed into a regular part and a non-analytic part. The temperature dependence of the non-analytic part can be expressed in terms of t. It is found that the leading non-analytic parts follow temperature variations given by

\[
\phi \sim | t |^{\beta} \quad {\rm for} \ T < T_c \ {\rm and} \ h = 0
\]
\[
\chi \sim | t |^{- \gamma} \quad {\rm for} \ h = 0
\]
\[
C \sim | t |^{- \alpha} \quad {\rm for} \ h = 0
\]
\[
\phi \sim | h |^{\frac{1}{\delta}} \quad {\rm for} \ T = T_c \qquad (1321)
\]

where the exponents α, β, γ and δ are known as the critical exponents. Generally, the value of a critical exponent (say λ is the exponent of a quantity A) is determined by taking the limit

\[
\lambda = \lim_{t \rightarrow 0} \frac{ \ln A }{ \ln | t | } \qquad (1322)
\]

The critical exponent describes the leading-order temperature variation. However, one expects correction terms, so that a quantity A may vary as

\[
A = c \ | t |^{\lambda} \left( \ 1 + D \ | t |^{\lambda_1} + \ldots \ \right) \qquad (1323)
\]
where λ_1 > 0. For a second-order transition, there should be no latent heat on passing through the transition, thus

\[
L = \lim_{\epsilon \rightarrow 0} \int_{T_c - \epsilon}^{T_c + \epsilon} dT \ C(T) = 0 \qquad (1324)
\]

so 1 > α. The value of α is actually significantly smaller than unity, and for some systems (for example, the two-dimensional Ising Model) C varies logarithmically

\[
C \sim \ln \left| \frac{ T_c }{ T - T_c } \right| \qquad (1325)
\]

when T is close to Tc. Since

\[
\ln \frac{1}{| t |} = \lim_{\alpha \rightarrow 0} \frac{1}{\alpha} \left( \ | t |^{- \alpha} - 1 \ \right) \qquad (1326)
\]

the logarithmic variation corresponds to α = 0.

There are other critical exponents that are introduced to characterize the spatial correlations of the order parameter. Thus, for example, one can introduce a correlation function S(r) as the average of the product of the fluctuations of the local order parameter Δφ(r), defined via

\[
\Delta \phi(r) = \phi(r) - < \phi(r) > \qquad (1327)
\]

The correlation function S(r − r') is introduced as

\[
S(r - r') = < \Delta \phi(r) \ \Delta \phi(r') > = < \phi(r) \ \phi(r') > - < \phi(r) > < \phi(r') > \qquad (1328)
\]

The last term has the effect that the correlation function decays to zero at large distances, for temperatures above and below Tc. Since we are assuming that the system is invariant under translations, one expects that the average value is non-zero below Tc, where it satisfies

\[
< \phi(r) > = < \phi(r') > \qquad (1329)
\]

so that it is independent of the position. Also, due to translational invariance, one has

\[
< \phi(r) \ \phi(r') > = < \phi(r - r') \ \phi(0) > \qquad (1330)
\]

so the correlation function only depends on r − r'. Furthermore, due to isotropy in space,

\[
S(r - r') = S( | r - r' | ) \qquad (1331)
\]

independent of the orientation of the vector r − r'. In the limit |r − r'| → ∞, one expects that the value of φ at r will be unrelated to the value of φ at r'. Therefore, one expects that in this limit

\[
\lim_{| r - r' | \rightarrow \infty} < \phi(r) \ \phi(r') > \ \rightarrow \ < \phi(r) > < \phi(r') > \qquad (1332)
\]

Hence, one expects that the correlation function decays to zero at large distances

\[
\lim_{| r - r' | \rightarrow \infty} S(r - r') \ \rightarrow \ 0 \qquad (1333)
\]

One can define a correlation length ξ such that

\[
S(r - r') \ \sim \ \left( \frac{ 1 }{ | r - r' | } \right)^{d - 2 + \eta} \exp\left[ - \frac{ | r - r' | }{ \xi } \right] \qquad (1334)
\]

where d is the number of spatial dimensions, which holds for h = 0 and T ≠ Tc. The exponent η is the anomalous dimension. The correlation length ξ describes the length scale above which the correlations die out. The correlation length is found to diverge as the critical temperature is approached, so that

\[
\xi \ \sim \ | t |^{- \nu} \qquad (1335)
\]

where the critical exponent is denoted by ν. One can define another exponent which describes the spatial correlations at T = Tc. If one defines the Fourier components φ_k of the local order parameter φ(r) via

\[
\phi_k = \frac{1}{V} \int d^d r \ \exp\left[ - i \ k \ . \ r \right] \phi(r) \qquad (1336)
\]

then one may define a momentum-space correlation function

\[
< \phi_k \ \phi_{-k} > = \frac{1}{V} \int d^d r_1 \int d^d r_2 \ \exp\left[ i \ k \ . \ ( r_1 - r_2 ) \right] < \Delta \phi(r_1) \ \Delta \phi(r_2) > \qquad (1337)
\]

as the Fourier transform of S(r − r') for k ≠ 0. For T = Tc one defines the exponent η via

\[
< \phi_k \ \phi_{-k} > \ \sim \ | k |^{- 2 + \eta} \quad {\rm as} \ k \rightarrow 0 \qquad (1338)
\]

Experiments reveal that, to within experimental accuracy, the exponents, such as α, γ and ν, have the same values no matter in which direction the limit t → 0 is taken. That is, the exponents have the same values for T > Tc as for T < Tc. However, the coefficients of the non-analytic temperature variations usually have different magnitudes for temperatures above and below Tc. That is, the singularities are of the same type no matter whether the transition is approached from above or below. The same sets of values for the critical exponents are found for many different types of transitions. The same sets of values of the critical exponents are found for systems with transition temperatures that differ by many orders of magnitude and, for transitions that occur in crystalline materials, are independent of the type of crystal structure. Transitions which share the same set of values of the critical exponents form what is known as a universality class.
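The limiting procedure of eq. (1322) can be illustrated with synthetic data. The sketch below is illustrative only: it builds a quantity A(t) = c |t|^{−g} (1 + D |t|^{1/2}) with made-up values of c, D, the exponent g and the correction exponent 1/2, and shows that the log-ratio estimate converges (slowly, because of the non-universal prefactor c) towards g as t → 0.

```python
import math

# Recover the exponent of a synthetic singular quantity from ln A / ln|t|.
c, D, g = 2.0, 0.8, 1.25  # illustrative prefactor, correction amplitude, exponent

def A(t):
    return c * abs(t) ** (-g) * (1.0 + D * abs(t) ** 0.5)

for t in (1e-2, 1e-4, 1e-8, 1e-12):
    estimate = -math.log(A(t)) / math.log(abs(t))
    print(t, estimate)  # drifts towards g = 1.25 as t -> 0
```

The slow approach of the estimate to the true exponent is the practical reason why correction terms such as those in eq. (1323) must be included when fitting experimental data.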
The values of the critical exponents are not independent of one another. Historically, it was first shown that thermodynamics requires that the six exponents must satisfy (more than) four inequalities. For example, using convexity arguments, Rushbrooke showed that the heat capacity measured at constant φ, C_φ, must be positive, hence

\[
C_h \ \geq \ \frac{ T }{ \chi } \left( \frac{ \partial \phi }{ \partial T } \right)_h^2 \qquad (1339)
\]

Table 2: The experimentally determined values of the critical exponents for a number of three-dimensional systems with order parameters of dimension n.

        Liquid-Gas     Binary Fluid       β-brass       Norm-Super      Ferromagnet    Antiferromagnet
        Xe             Methanol-hexane    Cu-Zn         ⁴He             Fe             RbMnF₃

  α     0.08 ± 0.02    0.113 ± .005       0.05 ± .06    −0.014 ± .016   −0.12 ± .01    −0.139 ± .007
  β     0.344 ± .003   0.322 ± .002       0.305 ± .005   0.34 ± .01      0.37 ± .01     0.316 ± .008
  γ     1.203 ± .002   1.239 ± .002       1.25 ± .02     1.33 ± .03      1.33 ± .015    1.397 ± .034
  δ     4.4 ± .4       4.85 ± .03         −              3.95 ± .15      4.3 ± .1       −
  η     0.1 ± .1       0.017 ± .015       0.08 ± .017    0.21 ± .05      0.07 ± .04     0.067 ± .01
  ν     0.57           0.625 ± .006       0.65 ± .02     0.672 ± .001    0.69 ± .02     −

The above inequality implies that the exponents must also satisfy the inequality

\[
\alpha + 2 \beta + \gamma \ \geq \ 2 \qquad (1340)
\]

Experimental evidence accumulated that the inequalities were, in fact, equalities. The exponents α = 0, β = 1/8, γ = 7/4, ν = 1 and η = 1/4, found from Onsager's exact solution of the two-dimensional Ising Model, also satisfy the same equalities. The exponent equalities include the Rushbrooke relation

\[
\alpha + 2 \beta + \gamma = 2 \qquad (1341)
\]

and the Griffiths relation

\[
\alpha + \beta \ ( \delta + 1 ) = 2 \qquad (1342)
\]


which describe thermodynamic properties. Widom showed that the exponents would satisfy equalities if the singular part of the Helmholtz Free-Energy F did not depend separately on T and h but, instead, could be written as

\[
F(T,h) = | t |^{2 - \alpha} \ \psi\left( \frac{ h }{ | t |^{\beta \delta} } \right) \qquad (1343)
\]

where

\[
t = \frac{ T - T_c }{ T_c } \qquad (1344)
\]

Leo Kadanoff introduced the idea that the exponents expressing spatial correlations are also related. These relations include the Fisher relation

\[
\gamma = ( 2 - \eta ) \ \nu \qquad (1345)
\]

and the hyper-scaling relation, or Josephson relation,

\[
\nu \ d = 2 - \alpha \qquad (1346)
\]

The Josephson relation is the only relation which involves the dimensionality d. It becomes invalid for sufficiently large d, that is, when d exceeds the upper critical dimensionality d_c. For d > d_c, all the critical exponents become independent of d.
The scaling that is found in the proximity of a phase transition can be understood as a consequence of the fluctuations of the order parameter that occur as the phase transition is approached. The picture is that, as the temperature is decreased towards the critical temperature, the material exhibits ordered islands whose spatial extent ξ increases with decreasing t. Furthermore, it is the long-ranged, large-scale fluctuations that dominate the physical divergences. At the transition t → 0, so ξ → ∞, and therefore the system becomes scale invariant. The scaling hypothesis assumes that the correlation length ξ is the only relevant characteristic length scale of the system close to the transition and that all other length scales must be expressible in terms of ξ. Hence, the effects of the microscopic length scale a should be expressible in terms of the ratio a/ξ, which vanishes close to the transition. The temperature dependence of static properties can then be inferred from dimensional analysis. Thus, the Free-Energy (measured in units of kB T) per unit volume has dimensions L^{−d} which, on substituting ξ for L, leads to a variation as ξ^{−d}. Since the specific heat has exponent α, and involves the second derivative of F w.r.t. T, F should scale as |t|^{2−α}. If the correlation function is normalized to L^{2−d−η} then, on noting that S(r) is proportional to < Δφ(r) Δφ(0) >, one has

\[
\phi \ \sim \ L^{\frac{ 2 - d - \eta }{ 2 }} \qquad (1347)
\]

which then sets

\[
\phi \ \sim \ \xi^{\frac{ 2 - d - \eta }{ 2 }} \qquad (1348)
\]

Also, since, from linear response theory, the susceptibility can be expressed as

\[
\chi \ \sim \ \beta \int d^d r \ < \Delta \phi(r) \ \Delta \phi(0) > \qquad (1349)
\]

one has χ ∼ L^{2−η}, or

\[
\chi \ \sim \ \xi^{2 - \eta} \qquad (1350)
\]

On using ξ ∼ |t|^{−ν}, one obtains the scaling relations

\[
2 - \alpha = \nu \ d
\]
\[
\beta = \frac{ \nu }{ 2 } \ ( d - 2 + \eta )
\]
\[
\gamma = \nu \ ( 2 - \eta ) \qquad (1351)
\]

The first is recognized as the Josephson hyper-scaling relation and the last is the Fisher relation. The exponent δ can be obtained by first determining the length scale of the conjugate field from the defining relation

\[
\phi = - \left( \frac{ \partial F }{ \partial h } \right) \qquad (1352)
\]

which leads to the identification that

\[
h \ \sim \ L^{- d} \ L^{- \frac{ 2 - d - \eta }{ 2 }} \ \sim \ L^{- \frac{ 2 + d - \eta }{ 2 }} \qquad (1353)
\]

Since φ ∼ h^{1/δ}, one has

\[
\xi^{\frac{ 2 - d - \eta }{ 2 }} \ \sim \ \left( \ \xi^{- \frac{ 2 + d - \eta }{ 2 }} \ \right)^{\frac{1}{\delta}} \qquad (1354)
\]

Hence, one finds the scaling relation

\[
\delta = \frac{ 2 + d - \eta }{ d - 2 + \eta } \qquad (1355)
\]

The two relations consisting of

\[
\beta \ \delta = \frac{ \nu }{ 2 } \ ( 2 + d - \eta ) \qquad (1356)
\]

and

\[
\beta = \frac{ \nu }{ 2 } \ ( d - 2 + \eta ) \qquad (1357)
\]

can be shown to be equivalent to the Griffiths and Rushbrooke relations. The Griffiths relation can be found by simply adding the two relations, yielding

\[
\beta \ ( \delta + 1 ) = \nu \ d \qquad (1358)
\]

On using the Josephson relation, the above equation is recognized as yielding the Griffiths relation

\[
\beta \ ( \delta + 1 ) = ( 2 - \alpha ) \qquad (1359)
\]

Likewise, on subtracting the relations expressed in eqn(1356) and eqn(1357), one obtains

\[
\beta \ ( \delta - 1 ) = \nu \ ( 2 - \eta ) \qquad (1360)
\]

which, using the Fisher relation, yields

\[
\beta \ ( \delta - 1 ) = \gamma \qquad (1361)
\]

Adding the above equation to the Griffiths relation leads to the Rushbrooke relation. Scaling analysis indicates that one may consider there to be only two independent exponents, such as ν and the anomalous dimension η, but it does not fix their values.

10.5 Mean-Field Theory

Landau-Ginzburg Mean-Field Theory is an approximation which replaces the exact evaluation of the path integral for the partition function

\[
Z = \int D\phi \ \exp\left[ - \beta F[\phi] \right] \qquad (1362)
\]

and, hence, the exact evaluation of the Free-Energy. It can be viewed as an approximate evaluation of the path integral analogous to the evaluation of an ordinary integral by the method of steepest descents. In the method of steepest descents, one evaluates an integral of the form

\[
I = \int dx \ \exp\left[ - f(x) \right] \qquad (1363)
\]

by finding the value of x, say x_0, for which the exponent f(x) is minimum, and then approximating f(x) by a parabola

\[
f(x) = f(x_0) + \frac{1}{2} \left( \frac{ d^2 f }{ d x^2 } \right)_{x_0} ( x - x_0 )^2 + \ldots \qquad (1364)
\]

This approximation is based on the assumption that the value of the integral I has its largest contribution from the region around x_0. This leads to the result

\[
I \ \approx \ \sqrt{ \frac{ 2 \pi }{ \left( \frac{ d^2 f }{ d x^2 } \right)_{x_0} } } \ \exp\left[ - f(x_0) \right] \qquad (1365)
\]

The dominant contribution to the integral is given by the exponential factor, that is, it is determined by the exponent at the value of x for which the exponent of the integrand is extremal. Mean-field theory approximates the path integral by its extremal value. That is, the partition function Z is approximated by

\[
Z \ \approx \ \exp\left[ - \beta F[\phi_0] \right] \qquad (1366)
\]

where φ_0 is the φ which minimizes the functional F[φ].
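The accuracy of the steepest-descents estimate (1365) can be checked numerically when the integrand is sharply peaked. The sketch below is an illustrative choice of exponent, f(x) = 50x² + x⁴, whose minimum lies at x0 = 0 with f''(x0) = 100; all numbers (the quadrature range and grid) are arbitrary.

```python
import math

def f(x):
    # sharply peaked exponent: minimum at x0 = 0 with f''(0) = 100
    return 50.0 * x * x + x ** 4

# brute-force Riemann-sum evaluation of I = \int dx exp(-f(x))
L, n = 2.0, 40001
h = 2 * L / (n - 1)
I_exact = h * sum(math.exp(-f(-L + k * h)) for k in range(n))

# steepest descents: I ~ sqrt(2*pi / f''(x0)) * exp(-f(x0))
I_sd = math.sqrt(2 * math.pi / 100.0) * math.exp(-f(0.0))

print(I_exact, I_sd)  # the quartic term gives only a small correction
```

The sharper the minimum (the larger f''(x0) relative to the anharmonic terms), the better the approximation; this is the sense in which mean-field theory becomes accurate when fluctuations about φ_0 are small.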


Let us assume that there is a smooth φ_0(r) which minimizes F[φ] and then consider a set of φs with the form

\[
\phi(r) = \phi_0(r) + \lambda \ \eta(r) \qquad (1367)
\]

where η(r) is an arbitrary function which vanishes at the boundaries of the system. The parameter λ can be continuously varied from zero to unity, and represents the amplitude of the deviation from φ_0. For a fixed η, the Free-Energy functional depends on λ and can be written as F(λ). The extremum condition implies that

\[
\left( \frac{ \partial F(\lambda) }{ \partial \lambda } \right)_{\lambda = 0} = 0 \qquad (1368)
\]

so the term first-order in λ in the expansion of F[φ_0 + λη] must be zero. That is, if

\[
F[\phi_0 + \lambda \eta] = F[\phi_0] + \lambda \ F^1 + \ldots \qquad (1369)
\]

then, for φ_0 to be an extremum, one requires that

\[
F^1 = 0 \qquad (1370)
\]

The explicit form of the above condition is that

\[
0 = \int d^d r \left[ \ 2 F_2 \ ( \phi_0 \ . \ \eta ) + 4 F_4 \ ( \phi_0 \ . \ \phi_0 ) \ ( \phi_0 \ . \ \eta ) - h \ . \ \eta + 2 c \ \nabla \phi_0 \ . \ \nabla \eta \ \right] \qquad (1371)
\]

where the last term must be interpreted as involving the two types of scalar product, as previously discussed. On integrating the last term by parts and recalling that η vanishes at the boundaries, one can write the integrand as a product of a factor that depends on φ_0 and a factor of η,

\[
0 = \int d^d r \ \eta \ . \left[ \ 2 F_2 \ \phi_0 + 4 F_4 \ \phi_0 \ ( \phi_0 \ . \ \phi_0 ) - h - 2 c \ \nabla^2 \phi_0 \ \right] \qquad (1372)
\]
This must vanish for an arbitrary η. One can choose η such that

\[
\eta(r) = \eta_0 \ \delta^d( r - r_0 ) \qquad (1373)
\]

for some arbitrary point r_0. The integration over d^d r can be performed, leading to the requirement that φ_0 must satisfy the equation

\[
0 = 2 F_2 \ \phi_0(r_0) + 4 F_4 \ \phi_0(r_0) \ ( \ \phi_0(r_0) \ . \ \phi_0(r_0) \ ) - h(r_0) - 2 c \ \nabla^2 \phi_0(r_0) \qquad (1374)
\]

for any arbitrarily chosen point r_0. The functions φ_0(r) which satisfy the above equation extremalize F[φ] for any choice of η(r). We shall write eqn(1374) in the form

\[
\left[ \ 2 F_2 + 4 F_4 \ ( \ \phi_0(r) \ . \ \phi_0(r) \ ) - 2 c \ \nabla^2 \ \right] \phi_0(r) = h(r) \qquad (1375)
\]

in which the spatially varying applied field acts as a source. This equation governs all the extrema of F[φ].

We shall first consider the physical properties associated with the extrema for which φ is uniform across the system, and then consider the physical properties associated with the spatially varying solutions.
Uniform Solutions

The differential equation simplifies, for spatially uniform solutions and zero applied field, h = 0, to

\[
\left[ \ 2 F_2 + 4 F_4 \ \phi \ . \ \phi \ \right] \phi = 0 \qquad (1376)
\]

which has the solutions

\[
\phi = 0 \qquad (1377)
\]

and

\[
\phi \ . \ \phi = - \frac{ 2 F_2 }{ 4 F_4 } \qquad (1378)
\]

The second solution only makes sense if

\[
\phi \ . \ \phi > 0 \qquad (1379)
\]

since the order parameter is assumed to be a real vector quantity. When

\[
- \frac{ 2 F_2 }{ 4 F_4 } < 0 \qquad (1380)
\]

the only physically acceptable solution corresponds to φ = 0. The magnitude of the order parameter may take on a non-zero value if

\[
- \frac{ 2 F_2 }{ 4 F_4 } > 0 \qquad (1381)
\]

In this case, in addition to the solution φ = 0, there exists the possibility of continuously degenerate solutions. Due to stability considerations, one must have F_4 > 0. Hence, a second-order phase transition may occur when F_2 changes sign. This motivates the notation

\[
F_2 = A \ ( T - T_c ) \qquad (1382)
\]

with A > 0. For T > Tc, there is only one unique solution, which is given by φ = 0, so the mean-field value of the Free-Energy is given by

\[
F[\phi = 0] = V \ F_0 \qquad (1383)
\]

whereas, for T < Tc, one has the possibility of an additional solution corresponding to

\[
\phi_0 \ . \ \phi_0 = \frac{ 2 A \ ( T_c - T ) }{ 4 F_4 } \qquad (1384)
\]

which fixes the magnitude of φ_0 as

\[
\phi_0 = \sqrt{ \frac{ 2 A \ ( T_c - T ) }{ 4 F_4 } } \qquad (1385)
\]
The non-zero solution is continuously degenerate with respect to the orientation of the vector order parameter. The presence of an infinitesimal applied magnetic field allows the system to choose a direction of \phi_0 which spontaneously breaks the rotational symmetry of the Hamiltonian. The mean-field Free-Energy corresponding to the broken-symmetry solution is found as

    F[\phi_0] = V ( F_0 - \frac{2 A^2 ( T_c - T )^2}{4 F_4} + \frac{A^2 ( T_c - T )^2}{4 F_4} )
              = V ( F_0 - \frac{A^2 ( T_c - T )^2}{4 F_4} )     (1386)

which is lower than the Free-Energy of the solution with \phi = 0. Hence, in the mean-field approximation, the system will condense into the broken-symmetry state. Thus, the mean-field approximation describes a second-order transition at T_c and, for T < T_c, the magnitude of the order parameter has a temperature-dependence given by

    | \phi_0 | = \sqrt{ \frac{2 A ( T_c - T )}{4 F_4} }     (1387)

The value of the mean-field critical exponent for the order parameter, \beta, is, therefore, fixed at \beta = \frac{1}{2}.
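The mean-field result of eqns(1385)-(1387) can be checked with a short numerical sketch that minimizes the Landau free-energy density directly. The parameter values A, F_4, T_c below are arbitrary illustrative choices, not values from the text.

```python
import math

# Illustrative parameters (arbitrary values, not taken from the text)
A, F4, Tc = 1.0, 0.25, 100.0

def free_energy_density(phi, T):
    """Landau free-energy density f = F2 phi^2 + F4 phi^4 with F2 = A (T - Tc)."""
    F2 = A * (T - Tc)
    return F2 * phi**2 + F4 * phi**4

def minimize_phi(T):
    """Brute-force scan for the phi >= 0 that minimizes f at temperature T."""
    phis = [i * 1e-3 for i in range(20001)]      # phi in [0, 20]
    return min(phis, key=lambda p: free_energy_density(p, T))

for T in (99.0, 96.0, 84.0):
    phi_num = minimize_phi(T)
    phi_exact = math.sqrt(A * (Tc - T) / (2.0 * F4))   # eqn (1385)
    print(T, round(phi_num, 3), round(phi_exact, 3))
```

Scaling T_c - T by a factor of 16 scales the minimizing \phi by 4, consistent with the exponent \beta = 1/2.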
The order parameter exponent on the critical isotherm, \delta, defined via

    | h | \propto | \phi |^{\delta}     (1388)

is found from the equation

    ( 2 A ( T - T_c ) + 4 F_4 \phi_0 \cdot \phi_0 ) \phi_0 = h     (1389)

which, for T = T_c, reduces to

    4 F_4 ( \phi_0 \cdot \phi_0 ) \phi_0 = h     (1390)

which leads to the identification of the critical exponent \delta = 3.


The mean-field specific heat C is calculated from

    C = - T ( \frac{\partial^2 F}{\partial T^2} )     (1391)

which, for T > T_c, yields

    C = 0     for T > T_c     (1392)

but, for T < T_c, gives

    C = V \frac{A^2 T}{2 F_4}     for T < T_c     (1393)

Hence, in the mean-field approximation, the specific heat exhibits a discontinuous jump at T_c, of magnitude V \frac{A^2 T_c}{2 F_4}. The appropriate mean-field specific-heat exponents are \alpha = \alpha' = 0.
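The discontinuity of eqn(1393) can be verified numerically by differentiating the singular part of the mean-field Free-Energy, eqn(1386), with finite differences. The parameters are arbitrary illustrative values.

```python
# Finite-difference check of the mean-field specific-heat jump (eqn 1393).
# A, F4, Tc, V are arbitrary illustrative values, not taken from the text.
A, F4, Tc, V = 1.0, 0.25, 100.0, 1.0

def F_singular(T):
    """Singular part of the mean-field Free-Energy (constant V F0 omitted)."""
    return -V * (A * (Tc - T))**2 / (4.0 * F4) if T < Tc else 0.0

def specific_heat(T, dT=1e-3):
    """C = -T d^2F/dT^2 by central finite differences."""
    d2F = (F_singular(T + dT) - 2.0 * F_singular(T) + F_singular(T - dT)) / dT**2
    return -T * d2F

C_below = specific_heat(99.9)            # just below Tc
C_above = specific_heat(100.1)           # just above Tc
C_jump_exact = V * A**2 * Tc / (2.0 * F4)
print(C_below, C_above, C_jump_exact)
```

The computed specific heat vanishes above T_c and approaches V A^2 T_c / (2 F_4) just below it, reproducing the discontinuous jump.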
The above solutions are all uniform, due to the homogeneity of space, and
due to the uniformity of the applied fields.
Spatially Varying Solutions

It has been shown that the non-uniform configurations \phi(r) which minimize F[\phi] satisfy the partial differential equation

    ( 2 F_2 + 4 F_4 \phi(r) \cdot \phi(r) - 2 c \nabla^2 ) \phi(r) = h(r)     (1394)

We shall consider spatially varying solutions, in which the spatial variations are induced by an applied field which has a small non-uniform part

    h(r) = h_0 + \Delta h(r)     (1395)

The local order-parameter \phi(r) will be expressed in terms of a uniform component \phi_0 and a spatially varying part \Delta\phi(r)

    \phi(r) = \phi_0 + \Delta\phi(r)     (1396)

The spatially varying part of the order-parameter \Delta\phi(r) vanishes in the limit that \Delta h(r) vanishes. In this limit, \phi_0 minimizes F[\phi] in the presence of h_0. The terms of first-order in the small spatially varying components satisfy the equation

    ( 2 F_2 + 4 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla^2 ) \Delta\phi(r) + 8 F_4 \phi_0 ( \phi_0 \cdot \Delta\phi(r) ) = \Delta h(r)     (1397)

This equation indicates that, for temperatures below T_c, the mean-field response will be different depending on the relative orientation of the spatially varying field \Delta h(r) and the direction of the uniform order parameter \phi_0. For temperatures above the critical temperature T_c, the vector order parameter vanishes and the equation simplifies to

    ( 2 F_2 - 2 c \nabla^2 ) \Delta\phi(r) = \Delta h(r)     (1398)

In this case, the magnitude of the induced vector order parameter is independent of the orientation of the spatially varying applied field. This is expected since, in the absence of the order parameter which has spontaneously broken the symmetry, the system is isotropic.
Longitudinal Response

For temperatures below the critical temperature and when the non-uniform part of the applied field \Delta h(r) is parallel to \phi_0, the mean-field response is longitudinal and satisfies the equation

    ( 2 F_2 + 12 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla^2 ) \Delta\phi_L(r) = \Delta h_L(r)     (1399)

This equation is also valid for temperatures above T_c, where \phi_0 vanishes, although the distinction between longitudinal and transverse is then undefined. It should be noted that the equation has a different form in the two temperature regimes.
Transverse Response
The transverse response is only defined for temperatures below the critical
temperature. If (r) is transverse to 0 , the mean-field response is determined
from the partial differential equation


2 F2 + 4 F4 0 . 0 2 c 2 T (r) = T (r)
(1400)
In the limit 0 0, the uniform order parameter 0 satisfies the equation



2 F2 + 4 F4 0 . 0

= 0

(1401)

Hence, the partial differential equation for the transverse response in the meanfield approximation simplifies to
2 c 2 T (r) = T (r)

(1402)

The solution of this equation determines the order parameter for which the
Landau-Ginzberg Free-Energy Functional is extremal.
The Mean-Field Correlation Functions

These differences in the response show that the correlation function S(r) involved in the linear response theory must be considered as a tensor quantity. The mean-field equations for the order parameter allow one to calculate the (mean-field) tensor correlation function. Linear response theory describes how \Delta\phi(r) is related to \Delta h(r). In particular, if a tensor correlation function S_{i,j}(r) is defined via

    S_{i,j}(r - r') = < \phi_i(r) \phi_j(r') > - < \phi_i(r) > < \phi_j(r') >     (1403)

one finds that the components satisfy the linear response relations

    \Delta\phi_i(r) = \beta \sum_j \int d^d r' \; S_{i,j}(r - r') \; \Delta h_j(r')     (1404)

Substitution of the linear response relation into eqn(1397) results in an integral equation which is linear in the components of \Delta h. The left-hand side involves an integral over r' of the components of \Delta h(r') weighted by a matrix function involving the derivatives of S_{i,j}(r - r'). The right-hand side only depends on a component of \Delta h(r). The integral equation has to be satisfied for an arbitrary \Delta h(r). The equation is satisfied if the matrix function is proportional to a d-dimensional delta function \delta^d(r - r'). The integration over r' can then be trivially performed. This results in a matrix equation which, after eliminating the arbitrary spatially varying field, determines the tensor response function.
We shall first examine the longitudinal response function S_L which satisfies the linear response relation

    \Delta\phi_L(r) = \beta \int d^d r' \; S_L(r - r') \; \Delta h_L(r')     (1405)

On substituting the longitudinal linear response relation into eqn(1399), which describes the mean-field order parameter, one finds

    ( 2 F_2 + 12 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla_r^2 ) \int d^d r' \; S_L(r - r') \; \Delta h_L(r') = k_B T \; \Delta h_L(r)     (1406)

On changing the order of integration and differentiation, one obtains

    \int d^d r' \; ( 2 F_2 + 12 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla_r^2 ) S_L(r - r') \; \Delta h_L(r') = k_B T \; \Delta h_L(r)     (1407)

where \Delta h_L is to be regarded as an arbitrary function. The equation is satisfied for any \Delta h_L(r), if the (mean-field) longitudinal correlation function satisfies

    ( 2 F_2 + 12 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla_r^2 ) S_L(r - r') = k_B T \; \delta^d( r - r' )     (1408)

as can be verified by substitution. Hence, we have determined a partial differential equation which determines the longitudinal correlation function below T_c and also determines the correlation function above T_c.
Likewise, one can determine the transverse correlation function S_T(r - r') which satisfies the linear response relation

    \Delta\phi_T(r) = \beta \int d^d r' \; S_T(r - r') \; \Delta h_T(r')     (1409)

On substituting the transverse linear response relation into eqn(1402), which describes the mean-field order parameter, one finds

    - 2 c \nabla_r^2 \int d^d r' \; S_T(r - r') \; \Delta h_T(r') = k_B T \; \Delta h_T(r)     (1410)

On changing the order of integration and differentiation, one obtains

    \int d^d r' \; ( - 2 c \nabla_r^2 ) S_T(r - r') \; \Delta h_T(r') = k_B T \; \Delta h_T(r)     (1411)

where \Delta h_T is to be regarded as an arbitrary function. The equation is satisfied for any \Delta h_T(r), if the (mean-field) transverse correlation function satisfies

    - 2 c \nabla_r^2 S_T(r - r') = k_B T \; \delta^d( r - r' )     (1412)

The above equation can be used to determine the transverse correlation function in the mean-field approximation.
For T > T_c the mean-field order parameter is zero, so the correlation function satisfies

    ( 2 A ( T - T_c ) - 2 c \nabla^2 ) S(r - r') = k_B T \; \delta^d( r - r' )     (1413)

In the low-temperature phase, T < T_c, the order parameter is given by

    | \phi_0 |^2 = - \frac{A ( T - T_c )}{2 F_4}     (1414)

therefore, the longitudinal correlation function satisfies the partial differential equation

    ( 4 A ( T_c - T ) - 2 c \nabla_r^2 ) S_L(r - r') = k_B T \; \delta^d( r - r' )     (1415)

Dimensional analysis of the above two equations indicates that the correlation length \xi should be given by

    \xi = \sqrt{ \frac{c}{A ( T - T_c )} }     (1416)

for T > T_c, and

    \xi = \sqrt{ \frac{c}{2 A ( T_c - T )} }     (1417)

for T < T_c. Since the correlation length is defined to scale as

    \xi \propto | T - T_c |^{-\nu}     (1418)

one finds that the critical exponents are the same above and below T_c and are given by \nu = \nu' = \frac{1}{2}.
The above equations can be expressed in terms of the correlation length \xi as

    ( \xi^{-2} - \nabla_r^2 ) S(r - r') = \frac{k_B T}{2 c} \; \delta^d( r - r' )     (1419)

The equation can be solved by Fourier transforming, leading to

    ( \xi^{-2} + k^2 ) S(k) = \frac{k_B T}{2 c}     (1420)

Thus,

    S(k) = \frac{k_B T}{2 c} \; \frac{1}{\xi^{-2} + k^2}     (1421)

The correlation function S(r) is given by the inverse Fourier Transformation

    S(r) = \int \frac{d^d k}{( 2 \pi )^d} \; \exp( i k \cdot r ) \; S(k)
         = \frac{k_B T}{2 c} \int \frac{d^d k}{( 2 \pi )^d} \; \frac{\exp( i k \cdot r )}{\xi^{-2} + k^2}     (1422)
For three dimensions, the integral is evaluated as

    S(r) = \frac{k_B T}{2 c} \int_0^{\infty} \frac{dk \; k^2}{( 2 \pi )^3} \int_0^{2\pi} d\varphi \int_0^{\pi} d\theta \; \sin\theta \; \frac{\exp( i k r \cos\theta )}{\xi^{-2} + k^2}
         = \frac{k_B T}{2 c} \int_0^{\infty} \frac{dk \; k^2}{( 2 \pi )^2} \; \frac{\exp( + i k r ) - \exp( - i k r )}{i k r \; ( \xi^{-2} + k^2 )}
         = \frac{k_B T}{4 \pi^2 c \; r} \; \frac{1}{2 i} \int_{-\infty}^{+\infty} dk \; \frac{k \; \exp( + i k r )}{\xi^{-2} + k^2}     (1423)

where the two terms have been combined by extending the integration over k to the range -\infty to +\infty. The remaining integration can be performed using Cauchy's theorem by completing the contour with a semi-circle in the upper-half complex plane. The contribution of the semi-circular contour at infinity vanishes due to Jordan's lemma. The integral is dominated by the residue at the pole k = i \xi^{-1}, leading to

    S(r) = \frac{k_B T}{8 \pi c} \; \frac{\exp( - \frac{r}{\xi} )}{r}     (1424)

which leads to the identification of the (three-dimensional) mean-field value of the critical exponent \eta as \eta = 0.^{49}

^{49} The three-dimensional result is quite special. A more general form of the mean-field form of S(r) in d dimensions is given by

    S(r) = \frac{k_B T}{2 c S_d ( d - 2 )} \; \frac{\exp( - \frac{r}{\xi} )}{r^{(d-2)}} \; F( \frac{r}{\xi} )     (1425)

where F(x) has the properties that F(0) = 1 and, for large r,

    F( \frac{r}{\xi} ) \propto ( \frac{r}{\xi} )^{\frac{(d-3)}{2}}     (1426)

For d = 3 the function reduces to a constant, which is unity.
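The Ornstein-Zernike form of eqn(1424) can be checked numerically: away from the origin, S(r) \propto \exp(-r/\xi)/r must satisfy the homogeneous version of eqn(1419). The sketch below uses the radial identity \nabla^2 S = (1/r) d^2(rS)/dr^2 and an arbitrary illustrative value of \xi; the overall prefactor k_B T / 8\pi c drops out of the homogeneous equation.

```python
import math

# Check that S(r) = exp(-r/xi)/r solves (xi^-2 - Laplacian) S = 0 for r > 0,
# using the radial identity  Laplacian S = (1/r) d^2(r S)/dr^2.
xi = 2.0                       # illustrative correlation length

def u(r):                      # u(r) = r * S(r)
    return math.exp(-r / xi)

def laplacian_S(r, h=1e-4):
    # central second difference of u(r), divided by r
    return (u(r + h) - 2.0 * u(r) + u(r - h)) / (h * h * r)

for r in (0.5, 1.0, 3.0):
    S = u(r) / r
    residual = S / xi**2 - laplacian_S(r)
    print(r, residual)         # residual should be ~0 away from the origin
```

The residual vanishes to the accuracy of the finite-difference stencil, confirming that the exponentially screened 1/r form solves the mean-field equation for r > 0.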

The susceptibility \chi in the high-temperature phase and the longitudinal susceptibility \chi_L in the low-temperature ordered phase can be obtained from linear response theory. The susceptibility is given by

    \chi = \beta \int d^d r \; S(r)     (1427)

which, in three dimensions, leads to the mean-field susceptibility given by

    \chi = \frac{1}{8 \pi c} \int d^3 r \; \frac{\exp( - \frac{r}{\xi} )}{r}
         = \frac{1}{2 c} \int_0^{\infty} dr \; r \; \exp( - \frac{r}{\xi} )
         = \frac{\xi^2}{2 c} \int_0^{\infty} dx \; x \; \exp( - x )     (1428)

where the dimensionless parameter x

    x = \frac{r}{\xi}     (1429)

has been introduced in the last line. The integral can be evaluated using integration by parts, leading to

    \chi = \frac{\xi^2}{2 c}     (1430)

Hence the susceptibilities are given by

    \chi = \frac{1}{2 A ( T - T_c )}     (1431)

for T > T_c, whereas, for T < T_c, the longitudinal susceptibility is given by

    \chi_L = \frac{1}{4 A ( T_c - T )}     (1432)
Both susceptibilities diverge as the transition temperature is approached. Thus, in mean-field theory the susceptibility critical exponent \gamma is the same in the low- and high-temperature phases and is given by \gamma = \gamma' = 1. However, the amplitude of the divergent term is a factor of two greater for the high-temperature phase. The transverse susceptibility \chi_T can also be calculated from the transverse correlation function S_T(r). The correlation length for the transverse correlation function diverges for all temperatures below T_c. The divergence of the transverse correlation length is connected with Goldstone's theorem. It has the effect that S_T(r) satisfies the equation

    - 2 c \nabla^2 S_T(r) = k_B T \; \delta^d(r)     (1433)

One can integrate the equation for S_T over r, where the integration runs over a volume which contains the origin, and then use Gauss's theorem to express the remaining volume integral as an integral over the surface of the volume

    - 2 c \int d^d r \; \nabla^2 S_T(r) = k_B T \int d^d r \; \delta^d(r)
    - 2 c \int d^d r \; \nabla^2 S_T(r) = k_B T
    - 2 c \oint dS^{d-1} \cdot \nabla S_T(r) = k_B T     (1434)

where the direction of the (d-1)-dimensional surface element, dS^{d-1}, is defined to be normal to the surface of integration. Since S_T(r) is spherically symmetric, the integration is easily performed over the surface of a hyper-sphere of radius r, leading to

    - 2 c S_d r^{d-1} \; \hat{e}_r \cdot \nabla S_T(r) = k_B T     (1435)

where S_d is the surface area of a d-dimensional unit sphere. Thus,

    \frac{\partial S_T}{\partial r} = - \frac{k_B T}{2 c S_d} \; \frac{1}{r^{d-1}}     (1436)

For d > 2, the expression can be integrated leading to the transverse correlation function S_T(r) being given by

    S_T(r) = \frac{k_B T}{2 c ( d - 2 ) S_d} \; \frac{1}{r^{d-2}}     (1437)

The transverse susceptibility, \chi_T, can then be found from S_T via

    \chi_T = \beta \int d^d r \; S_T(r)
           \propto \int_0^L dr \; r^{d-1} \; \frac{1}{r^{d-2}}
           \propto \int_0^L dr \; r
           \propto L^2     (1438)

and diverges in the thermodynamic limit where the linear dimension of the system, L, is sent to infinity. The transverse susceptibility \chi_T is infinite since the application of a small transverse field can cause the vector order parameter to re-orient.
The results for the high-temperature and the longitudinal susceptibilities could have been determined directly from the equation

    ( 2 F_2 + 4 F_4 \phi_0 \cdot \phi_0 - 2 c \nabla^2 ) \Delta\phi(r) + 8 F_4 \phi_0 ( \phi_0 \cdot \Delta\phi(r) ) = \Delta h(r)     (1439)

by considering the limit in which \Delta\phi and \Delta h become independent of r. In this limit, one obtains

    ( 2 F_2 + 4 F_4 \phi_0 \cdot \phi_0 ) \Delta\phi + 8 F_4 \phi_0 ( \phi_0 \cdot \Delta\phi ) = \Delta h     (1440)

The differential longitudinal susceptibility \chi_L, defined as

    \chi_L = \frac{\partial \Delta\phi_L}{\partial \Delta h_L}     (1441)

is found to be given by

    \chi_L = \frac{1}{2 F_2 + 12 F_4 \phi_0 \cdot \phi_0}     (1442)

which is evaluated as

    \chi_L = \frac{1}{4 A ( T_c - T )}     (1443)

The susceptibility in the disordered phase, where \phi_0 = 0, is given by

    \chi = \frac{1}{2 F_2}     (1444)

which reduces to

    \chi = \frac{1}{2 A ( T - T_c )}     (1445)

as found previously.

10.6

The Gaussian Approximation

The Gaussian approximation is an approximation in which one evaluates the corrections to mean-field theory due to the small amplitude fluctuations about the mean-field ground state. The Gaussian approximation is analogous to the steepest-descents evaluation of an integral of the form

    I = \int dx \; \exp( - f(x) )     (1446)

The mean-field approximation corresponds to approximating the integral by the exponential of the function - f(x_0) at the value x_0 which minimizes f(x). The Gaussian approximation includes the corrections due to integrating the lowest-order (quadratic) terms in the Taylor expansion of f(x). This leads to the result

    I \approx \sqrt{ \frac{2 \pi}{ ( \frac{d^2 f}{dx^2} )_{x_0} } } \; \exp( - f(x_0) )
      = \exp( - f(x_0) - \frac{1}{2} \ln ( \frac{1}{2 \pi} ( \frac{d^2 f}{dx^2} )_{x_0} ) )     (1447)

which gives a correction to the exponent which is smaller than the leading contribution.
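The saddle-point hierarchy of eqns(1446)-(1447) can be made concrete with a small numerical sketch. The choice f(x) = a x^2 + b x^4 (with arbitrary illustrative coefficients) has its minimum at x_0 = 0 with f''(x_0) = 2a.

```python
import math

# Compare the exact integral I = int dx exp(-f(x)) with its mean-field
# (saddle-point) and Gaussian estimates, eqn (1447), for the illustrative
# choice f(x) = a x^2 + b x^4.
a, b = 2.0, 0.1

def f(x):
    return a * x * x + b * x**4

# exact value by midpoint quadrature on a wide interval
n, L = 400000, 10.0
dx = 2.0 * L / n
I_exact = sum(math.exp(-f(-L + (i + 0.5) * dx)) for i in range(n)) * dx

I_meanfield = math.exp(-f(0.0))                      # keeps only exp(-f(x0))
I_gaussian = math.sqrt(2.0 * math.pi / (2.0 * a))    # adds quadratic fluctuations
print(I_exact, I_meanfield, I_gaussian)
```

The Gaussian estimate lies much closer to the exact value than the bare saddle-point value, illustrating that the fluctuation term is a genuine, but subleading, correction.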
The Gaussian approximation goes one step beyond the mean-field approximation. The mean-field approximation finds the most probable configuration \phi_0, and the Gaussian approximation additionally takes into account the small amplitude fluctuations \delta\phi(r). The spatially varying order parameter can be represented as

    \phi(r) = \phi_0 + \delta\phi(r)     (1448)

in which the fluctuations \delta\phi(r) are assumed to have small amplitudes

    | \delta\phi(r) | \ll | \phi_0 |     (1449)

The small amplitude fluctuations can be expressed in terms of their Fourier components \delta\phi_k via

    \delta\phi(r) = \frac{1}{\sqrt{V}} \sum_k \exp( + i k \cdot r ) \; \delta\phi_k     (1450)

It should be noted that, since the \delta\phi(r) are real functions, then

    \delta\phi_{-k} = \delta\phi_k^*     (1451)

and the \delta\phi_k are complex functions with real and imaginary parts. The k-space representation can be used to simplify the expression for the Landau-Ginzberg Free-Energy Functional. The path integral can be re-expressed in terms of integrals over the Fourier components \delta\phi_{i,k}

    Z = \exp( - \beta F )
      = \prod_{i,k} ( \int d\delta\phi_{i,k} ) \; \exp( - \beta F[\phi] )     (1452)

where we are formally assuming that the Fourier components \delta\phi_{i,k} are independent fields. The Gaussian approximation retains terms in F[\phi] up to quadratic order in the \delta\phi_{i,k}. This allows the functional integral of the resulting approximate integrand to be evaluated exactly.
Like the mean-field approximation, the Gaussian approximation takes on different forms in the ordered and disordered phases.
The Gaussian Approximation for T > T_c

For T > T_c the mean-field order parameter is given by \phi_0 = 0, and the non-trivial part of the Free-Energy Functional can be expressed as

    \int d^d r \; ( F_2 \; \delta\phi(r)^2 + c \; ( \nabla \delta\phi(r) )^2 + F_4 \; ( \delta\phi(r)^2 )^2 )     (1453)

which can be written in terms of the Fourier components as

    \sum_k ( F_2 + c k^2 ) \; \delta\phi_k \cdot \delta\phi_{-k} + \frac{1}{V} \sum_{k_1,k_2,k_3} F_4 \; ( \delta\phi_{k_1} \cdot \delta\phi_{k_2} ) \; ( \delta\phi_{k_3} \cdot \delta\phi_{-k_1-k_2-k_3} )     (1454)

In obtaining the above expression, we have used the identity

    \int d^d r \; \exp( i ( k_1 + k_2 + k_3 + k_4 ) \cdot r ) = V \; \delta_{k_1+k_2+k_3+k_4}     (1455)

where \delta is the Kronecker delta function. In the Gaussian approximation, where \delta\phi_k is assumed to be small, one neglects the fourth-order term proportional to F_4. Hence, one has

    F[\phi] \approx F_0 V + \sum_k ( F_2 + c k^2 ) \; \delta\phi_k \cdot \delta\phi_{-k}
            \approx F_0 V + \sum_k \sum_{i=1}^{n} ( A ( T - T_c ) + c k^2 ) \; \delta\phi_{i,k} \delta\phi_{i,-k}     (1456)
The condition

    \delta\phi_{-k} = \delta\phi_k^*     (1457)

relates the fields at the points k and -k. Since the two fields are not independent, it is convenient to partition k-space into two disjoint regions: one region, denoted by \Omega', which contains the set of points k, and a second region that contains all the points -k obtained by inversion of the points in the region \Omega'. The primed region, \Omega', is chosen such that all points of k-space are contained in either the region \Omega' or its inversion partner. The Gaussian functional integral is evaluated by first re-writing it as

    Z = \exp( - \beta F )
      = \prod_{i,k} ( \int d\delta\phi_{i,k} ) \; \exp( - \beta F[\phi] )
      = \prod_{i,k}' ( \int d\delta\phi_{i,k} \int d\delta\phi_{i,-k} ) \; \exp( - \beta F[\phi] )     (1458)

where the values of k in the primed products are restricted to the region \Omega'. The variable of integration is changed from \delta\phi_{i,k} and \delta\phi_{i,-k} to the real and imaginary parts of the components of the field, \Re e \, \delta\phi_{i,k} and \Im m \, \delta\phi_{i,k}

    Z = \prod_{i,k}' ( 2 \int d \Re e \, \delta\phi_{i,k} \int d \Im m \, \delta\phi_{i,k} ) \; \exp( - \beta F[\phi] )     (1459)

where the Jacobian of the transformation is 2. The approximate Free-Energy Functional is also re-written as a summation over k, where the k-values in the summation are restricted to the primed region.
    F[\phi] \approx F_0 V + 2 \sum_k' \sum_{i=1}^{n} ( A ( T - T_c ) + c k^2 ) \; \delta\phi_{i,k} \delta\phi_{i,-k}
            \approx F_0 V + 2 \sum_k' \sum_{i=1}^{n} ( A ( T - T_c ) + c k^2 ) \; ( ( \Re e \, \delta\phi_{i,k} )^2 + ( \Im m \, \delta\phi_{i,k} )^2 )     (1460)
On performing the Gaussian integrals, one finds that the Partition Function Z is approximated by

    Z \approx \exp( - \beta F_0 V ) \; \prod_{i=1}^{n} \prod_k' ( \frac{A ( T - T_c ) + c k^2}{2 \pi k_B T} )^{-1}
      \approx \exp( - \beta F_0 V ) \; \prod_{i=1}^{n} \prod_k ( \frac{A ( T - T_c ) + c k^2}{2 \pi k_B T} )^{-\frac{1}{2}}     (1461)

where, in the last line, we have restored the product to run over the entire range of k. In this expression, each of the n components of the order parameter yields an identical factor. Thus, since

    Z = \exp( - \beta F )     (1462)

the Gaussian approximation to the Free-Energy, for T > Tc , is given by the


expression


A ( T Tc ) + c k 2
kB T n X
ln
(1463)
F F0 V +
2
2 kB T
k

where the summation runs over the full range of k. The specific heat can be
obtained from the expression
 2 
F
(1464)
C = T
T 2
The most divergent term in C is recognized as
Z
T 2 n kB
dd k
A2
C
V
d
2
( 2 ) ( A ( T Tc ) + c k 2 )2

(1465)

As we shall see, this exhibits different types of behavior depending on whether d > 4 or d < 4. On setting

    x = k \xi     (1466)

where the correlation length \xi is defined by

    \xi^{-2} = \frac{A ( T - T_c )}{c}     (1467)

one finds that the leading divergence of the specific heat is given by

    C \approx \frac{n k_B A^2 T^2}{2 c^2} \; V \; \xi^{4-d} \int \frac{d^d x}{( 2 \pi )^d} \; ( \frac{1}{1 + x^2} )^2     (1468)

The integral is convergent at large k for 4 > d, in which case it is independent of the cut-off for k. For larger dimensionalities, d, the integral may exhibit an ultra-violet divergence if the cut-off is ignored. However, the upper cut-off for x does depend on the lattice spacing a and is given by x_c \sim \xi / a, so the expression for the part of C displayed becomes independent of \xi when d > 4, leading to

    C \approx \frac{n k_B A^2 T^2}{2 c^2} \; V \; a^{4-d}     (1469)
Thus, for d > 4, we have found that the critical exponent \alpha is given by \alpha = 0. In general, the only divergences of interest are those which occur at k \rightarrow 0 when \xi \rightarrow \infty. Divergences that occur due to the behavior of the integrand in the region k \rightarrow 0 are known as infra-red divergences. For 4 > d, the integral in eqn(1468) is convergent when \xi is finite, and one obtains

    C \propto \xi^{4-d} \propto ( T - T_c )^{-\frac{(4-d)}{2}}     (1470)

This expression reflects the infra-red divergence which occurs at T = T_c. Thus, in the Gaussian approximation and with 4 > d, the critical exponent \alpha has been calculated as

    \alpha = \frac{( 4 - d )}{2}     (1471)

which differs from the value \alpha = 0 found from the discontinuous specific heat as calculated in the mean-field approximation. For d > 4, the Gaussian approximation was found to yield the specific-heat exponent \alpha = 0, just like the value of \alpha found in the mean-field approximation. The exponents of the Gaussian and the mean-field approximation first coincide when d = 4. The difference in the exponents found for d < 4 indicates that the fluctuations of the order parameter need to be accounted for in dimensions less than four.
The Gaussian Approximation for T_c > T

For temperatures below T_c, the mean-field order parameter is non-zero and will be parallel to any uniform applied field, h_L, no matter how small. We shall orient our coordinate system so that the applied field lies along one axis, the longitudinal axis. The longitudinal component of the order parameter, \phi_L(r), will be written as the sum of the mean-field order parameter \phi_L and the spatially varying fluctuations

    \phi_L(r) = \phi_L + \delta\phi_L(r)     (1472)

The remaining ( n - 1 ) components are transverse components which represent truly spatially varying fluctuations, i.e. they have no uniform components. The transverse components will be denoted as

    \delta\phi_j(r)     (1473)

for j = 1, 2, \ldots, n-1. On substituting these expressions into the Ginzberg-Landau Functional, one obtains

    F[\phi] = V ( F_0 + F_2 \phi_L^2 + F_4 \phi_L^4 - h_L \phi_L )
            + \int d^d r \; ( F_2 \; \delta\phi_L^2(r) + 6 F_4 \; \phi_L^2 \; \delta\phi_L^2(r) + c \; ( \nabla \delta\phi_L )^2
            + F_2 \sum_{j=1}^{n-1} \delta\phi_j^2(r) + 2 F_4 \phi_L^2 \sum_{j=1}^{n-1} \delta\phi_j^2(r) + c \sum_{j=1}^{n-1} ( \nabla \delta\phi_j )^2
            + 4 F_4 \phi_L \; \delta\phi_L^3(r) + 4 F_4 \phi_L \; \delta\phi_L(r) \sum_{j=1}^{n-1} \delta\phi_j^2(r)
            + F_4 \; \delta\phi_L^4(r) + 2 F_4 \; \delta\phi_L^2(r) \sum_{j=1}^{n-1} \delta\phi_j^2(r) + F_4 \sum_{i,j}^{n-1} \delta\phi_i^2(r) \; \delta\phi_j^2(r) )     (1474)

The first line represents the Landau-Ginzberg Free-Energy for a uniform longitudinal order parameter. The Gaussian approximation consists of minimizing the first line, as in mean-field theory, and retains the terms in the second and third lines, as they are of quadratic order in the fluctuations. The terms in the last two lines are neglected, since they are of cubic and quartic order in the fluctuations. The fluctuating parts of the fields are expressed in terms of their Fourier components

    \delta\phi_L(r) = \frac{1}{\sqrt{V}} \sum_k \exp( + i k \cdot r ) \; \delta\phi_{L,k}     (1475)

and

    \delta\phi_j(r) = \frac{1}{\sqrt{V}} \sum_k \exp( + i k \cdot r ) \; \delta\phi_{j,k}     (1476)

On substituting into the Gaussian approximation for the Free-Energy Functional, one obtains

    F[\phi] \approx V ( F_0 + F_2 \phi_L^2 + F_4 \phi_L^4 - h_L \phi_L )
            + \sum_k ( F_2 + 6 F_4 \phi_L^2 + c k^2 ) \; \delta\phi_{L,k} \delta\phi_{L,-k}
            + \sum_{j=1}^{n-1} \sum_k ( F_2 + 2 F_4 \phi_L^2 + c k^2 ) \; \delta\phi_{j,k} \delta\phi_{j,-k}     (1477)
Since \phi_L minimizes the first term in the approximate Free-Energy Functional, it satisfies

    ( 2 F_2 + 4 F_4 \phi_L^2 ) \phi_L = h_L     (1478)

or

    F_2 + 2 F_4 \phi_L^2 = \frac{h_L}{2 \phi_L}     (1479)

On utilizing the expression for h_L, one can express the approximate Free-Energy Functional as

    F[\phi] \approx V ( F_0 - F_4 \phi_L^4 - \frac{h_L \phi_L}{2} )
            + \sum_k ( 4 F_4 \phi_L^2 + \frac{h_L}{2 \phi_L} + c k^2 ) \; \delta\phi_{L,k} \delta\phi_{L,-k}
            + \sum_{j=1}^{n-1} \sum_k ( \frac{h_L}{2 \phi_L} + c k^2 ) \; \delta\phi_{j,k} \delta\phi_{j,-k}     (1480)

The longitudinal and transverse fluctuations behave differently. It is seen that the longitudinal fluctuations are primarily stabilized by the non-zero order parameter, whereas only the applied field stabilizes the transverse fluctuations. The Gaussian path integral can be evaluated leading to the Gaussian approximation to the Free-Energy

    F \approx F[\phi_L] + \frac{k_B T}{2} \sum_k \ln( \frac{4 F_4 \phi_L^2 + \frac{h_L}{2 \phi_L} + c k^2}{2 \pi k_B T} )
            + ( n - 1 ) \frac{k_B T}{2} \sum_k \ln( \frac{\frac{h_L}{2 \phi_L} + c k^2}{2 \pi k_B T} )     (1481)

The Free-Energy can be used to calculate the divergent part of the specific heat and its critical exponent \alpha'. The specific heat is given by the sum of the contributions from the mean-field theory and the longitudinal Gaussian fluctuations. Note that the amplitude of the singular part of the Free-Energy is different above and below T_c, and this leads to an extra factor of n in the specific heat of the high-temperature phase, whereas, at low temperatures, there is an extra factor of 2^{\frac{(d-4)}{2}}.
The Ginzberg Criterion

The Ginzberg Criterion provides an estimate of the temperature range in which the results of mean-field theory may be reasonable. Mean-field theory (or the Gaussian approximation) may be considered reasonable whenever the fluctuations in the order parameter are smaller than the average value of the order parameter. The size of the mean-squared fluctuations can be estimated by S(r) evaluated at a length scale given by the correlation length \xi. Hence, the results of mean-field theory may be reasonable when

    1 > \frac{S(\xi)}{\phi_0^2}     (1482)

or, equivalently,

    \phi_0^2 > \frac{k_B T}{8 \pi c} \; \frac{1}{\xi^{d-2}}     (1483)

which leads to

    \frac{A ( T_c - T )}{2 F_4} > \frac{k_B T}{8 \pi c} \; ( \frac{2 A ( T_c - T )}{c} )^{\frac{d-2}{2}}     (1484)

or

    \frac{2 \pi c^2}{k_B T F_4} > ( \frac{2 A ( T_c - T )}{c} )^{\frac{d-4}{2}}     (1485)

This suggests that, generally, mean-field theory might be reasonable for temperatures outside the critical region, which is a narrow temperature window around T_c. The fluctuations dominate in the critical region. The Ginzberg criterion also indicates that mean-field theory, or the Gaussian approximation, might also be reasonable for all temperatures in four or higher dimensions. The upper critical dimension d_{uc} is the dimension above which the critical point can be treated in the Gaussian approximation, and for an ordinary second-order transition d_{uc} = 4.
There is also a lower critical dimensionality d_{lc}. Mermin and Wagner have shown that a phase with spontaneously broken continuous symmetry is unstable for dimensions less than two, since long-wavelength transverse fluctuations of the order parameter are divergent. In this case, the lower critical dimensionality d_{lc}, below which a phase transition cannot occur, is d_{lc} = 2. The divergence of the fluctuations for 2 > d found in systems with a continuously broken symmetry is related to the presence of Goldstone modes. Due to the divergence of the fluctuations, the average value of the order parameter is not well-defined and, therefore, the fluctuations dynamically restore the broken symmetry. The suppression of ordering can be seen in a different way, by examining how T_c is reduced in the self-consistent Gaussian approximation.
The Self-Consistent Gaussian Approximation

The self-consistent Gaussian approximation starts from the approximate Free-Energy of the Gaussian model in the form

    F[\phi] \approx F[\phi_L] + \frac{k_B T}{2} \sum_k \ln( \frac{F_2 + 6 F_4 \phi_L^2 + c k^2}{2 \pi k_B T} )
            + ( n - 1 ) \frac{k_B T}{2} \sum_k \ln( \frac{F_2 + 2 F_4 \phi_L^2 + c k^2}{2 \pi k_B T} )     (1486)

This expression holds true for T both greater than and smaller than T_c. For temperatures above T_c, one expects that \phi_L will be zero, and the two logarithmic terms can be combined, since there is no physical distinction between the longitudinal and transverse directions if \phi_L = 0. Minimization w.r.t. \phi_L leads to the solutions of either

    \phi_L = 0     (1487)

or

    0 = ( F_2 + 2 F_4 \phi_L^2 ) + \frac{k_B T}{2} \sum_k \frac{6 F_4}{F_2 + 6 F_4 \phi_L^2 + c k^2} + ( n - 1 ) \frac{k_B T}{2} \sum_k \frac{2 F_4}{F_2 + 2 F_4 \phi_L^2 + c k^2}
      = F_2 + 2 F_4 \phi_L^2 + 6 F_4 \sum_k < \delta\phi_{L,k} \delta\phi_{L,-k} > + ( n - 1 ) 2 F_4 \sum_k < \delta\phi_{T,k} \delta\phi_{T,-k} >     (1488)

where the last two terms have been recognized as involving the fluctuations of the order parameter, as evaluated in the Gaussian approximation. The critical temperature T_c is the temperature at which two infinitesimal but real solutions for \phi_L first occur. This is to be contrasted with the approximate critical temperature, T_c^{(0)}, defined by

    F_2 = A ( T - T_c^{(0)} )     (1489)

The true critical temperature T_c is determined from the equation

    0 = F_2 + ( n + 2 ) \; 2 F_4 \; \frac{k_B T_c}{2} \sum_k \frac{1}{F_2 + c k^2}     (1490)

which, since the last term is positive, reduces T_c below T_c^{(0)}. At T_c^{(0)}, the last term can be expressed in terms of an integral

    ( n + 2 ) \; 2 F_4 \; ( \frac{k_B T_c^{(0)}}{2 c} ) \; \frac{S_d}{( 2 \pi )^d} \; V \int dk \; k^{d-3}     (1491)

For d > 3, the integral is finite and of the order \frac{a^{2-d}}{(d-2)}; hence, one expects that the shift of T_c will be reasonably moderate. On the other hand, for 3 > d the integral representing the order parameter fluctuations is divergent due to the behavior at k \rightarrow 0, thereby suppressing T_c to much lower temperatures. The logarithmic divergence of the correction to T_c that occurs for d = 2 is consistent with the value of the lower critical dimensionality d_{lc} = 2 that is inferred from the Mermin-Wagner theorem.
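The dimension-dependence of the fluctuation integral in eqn(1491) can be seen directly. Using the antiderivative of k^{d-3} with an illustrative upper cut-off of order \pi/a (a = 1 here), the integral stays finite as the infra-red cut-off k_min is lowered for d = 3, but diverges logarithmically for d = 2.

```python
import math

# The fluctuation integral of eqn (1491), int dk k^(d-3), with an upper
# cut-off of order pi/a: finite at small k for d = 3, logarithmically
# divergent as k_min -> 0 for d = 2.
k_max = math.pi                # illustrative upper cut-off (pi/a with a = 1)

def fluct_integral(d, k_min):
    # analytic antiderivative of k^(d-3)
    if d == 2:
        return math.log(k_max / k_min)
    return (k_max**(d - 2) - k_min**(d - 2)) / (d - 2)

for k_min in (1e-2, 1e-4, 1e-6):
    print(k_min, fluct_integral(3, k_min), fluct_integral(2, k_min))
```

As k_min shrinks, the d = 3 column saturates while the d = 2 column grows without bound, which is the suppression of T_c discussed above.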

10.7

The Renormalization Group Technique

The scaling behavior shows that there exists a single relevant length scale that describes the large-scale, long-ranged fluctuations that dominate the singular parts of the Free-Energy. The scaling theory and the formulation of the Landau-Ginzberg Free-Energy Functional indicate that the microscopic length scales in the Hamiltonian are irrelevant. The scaling hypothesis describes the change in the fluctuations as this length scale is changed by, for example, changing the temperature. Furthermore, at the critical temperature, the system appears to exhibit the same behavior at all length scales. The renormalization group technique supplements the scaling hypothesis by incorporating the effect of the short-scale physics. It shows that, if the length scale is changed, then the effective interactions controlling the large-scale fluctuations also change. The interactions between the long-ranged fluctuations are re-scaled when the short-ranged fluctuations are removed by integrating them out. The method involves the following three steps:

(i) Integrating out the short-scale fluctuations of the system, thereby increasing the effective short-distance cut-off for the system.

(ii) Re-defining all length scales, so that the new cut-off appears indistinguishable from the old cut-off.

(iii) Re-defining, or renormalizing, the interactions governing the fluctuations of the order parameter.

The above procedure introduces the idea of an operation that can be compounded, resulting in a semi-group rather than a group, since the operations are not uniquely invertible. The operations result in a flow in both the form and the parameters involved in the Landau-Ginzberg Functional as the length scale is changed by successive infinitesimal increments. The set of parameters {F_2, F_4, \ldots, c} that describe the most general form of the Landau-Ginzberg Free-Energy Functional describes a point in parameter space. A change in scale by a factor of \lambda results in a flow between two different points in parameter space

    \{ F_2', F_4', \ldots, c' \} = R(\lambda) \{ F_2, F_4, \ldots, c \}     (1492)

At the critical point, the above operations should leave the renormalized Landau-Ginzberg Functional invariant, reflecting the scale-invariance that occurs at the critical point,

    \{ F_2^*, F_4^*, \ldots, c^* \} = R(\lambda) \{ F_2^*, F_4^*, \ldots, c^* \}     (1493)

The corresponding invariant point of parameter space \{ F_2^*, F_4^*, \ldots, c^* \} is known as a fixed point. Sometimes the properties of a system which is close to a fixed point can be inferred from the flow of the parameters under the renormalization group operations by linearizing the flow around the fixed point. In this case, the procedure results in the recovery of the phenomena described by the scaling hypothesis together with the actual values of the critical exponents.
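The three steps above can be illustrated with a standard toy example that is not taken from the text: decimating every other spin of the one-dimensional Ising chain gives the exact recursion K' = (1/2) ln cosh(2K) for the dimensionless coupling K = J / k_B T, and iterating this map traces out an RG flow in a one-parameter space.

```python
import math

# Toy RG flow (1-d Ising decimation, a standard textbook example):
# integrating out every other spin gives the exact recursion
#   K' = (1/2) ln cosh(2K)   for  K = J / (k_B T).
def rg_step(K):
    return 0.5 * math.log(math.cosh(2.0 * K))

K = 2.0                        # start from a strong coupling
flow = [K]
for _ in range(20):
    K = rg_step(K)
    flow.append(K)
print([round(k, 4) for k in flow[:6]])
```

For any finite starting coupling, the flow runs to the stable fixed point K* = 0 (the disordered phase); K* = infinity is the unstable fixed point, so the one-dimensional chain has no finite-temperature transition. This is the simplest instance of the flow and fixed-point structure described by eqns(1492)-(1493).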

10.8

Collective Modes and Symmetry Breaking

Goldstone's theorem^{50} pertains to phase transitions where the Hamiltonian has a continuous symmetry that is spontaneously broken. That is, the ground state is infinitely degenerate and does not have the full symmetry of the Hamiltonian. The theorem states that, in the broken-symmetry state, the system will have a branch of collective boson excitations with a dispersion relation which reaches \omega = 0 at k = 0.
Furthermore, in the long-wavelength limit, the bosons are non-interacting, and a coherent superposition of these bosons connects the broken-symmetry state to the continuum of degenerate states. This is easily understood since, if the system was to be physically transformed from one broken-symmetry state to another, no energy would have to be supplied to the system. Thus, the bosons dynamically restore the broken symmetry.

^{50} J. Goldstone, "Field Theories with Superconductor Solutions," Nuovo Cimento 19, 154-164, (1961).
Goldstone bosons in the form of spin waves were already known to exist in ferromagnets and antiferromagnets51, where the continuous spin-rotational symmetry is spontaneously broken at low temperatures. For ferromagnets, the ground state and the spin-wave dispersion relations can be calculated exactly.
Ironically, P.W. Anderson had already investigated the dynamic modes associated with a superconductor52 prior to Goldstone's work. Anderson had found, contrary to the Goldstone theorem, that the bosons in a superconductor have a finite excitation energy, similar to the plasmon energy of the metal. A posteriori, this is obvious, since metals neither become transparent nor change colour when they start to superconduct. Anderson's idea was subsequently picked up by Peter Higgs53, by Tom Kibble and co-workers54, and also by Francois Englert and Robert Brout55, who noted that, if long-ranged interactions were present, the modes would acquire a mass. The massive modes, associated with the breaking of a continuous symmetry in the presence of long-ranged interactions, are known as Kibble-Higgs modes.
Here we shall examine the Goldstone bosons of a Heisenberg ferromagnet, which is a slightly unusual case since the order parameter of a ferromagnet is a conserved quantity.
The Ferromagnetic State
The fully polarized ferromagnetic state $| 0 \rangle$ has all the spins aligned and is an exact eigenstate of the Heisenberg Hamiltonian. The Hamiltonian can be written as a scalar product
$$ \hat{H} \ = \ - \sum_{i,j} J_{i,j} \ \vec{S}_i \cdot \vec{S}_j \ = \ - \sum_{i,j} J_{i,j} \left[ \ S^z_i \ S^z_j \ + \ \frac{1}{2} \left( \ S^+_i \ S^-_j \ + \ S^-_i \ S^+_j \ \right) \ \right] \qquad (1494) $$

where the sum runs over pairs of sites. We shall assume that the spontaneous
51 P.W. Anderson, An Approximate Quantum Theory of the Antiferromagnetic Ground State, Phys. Rev. 86, 694-701, (1952).
52 P.W. Anderson, Random-Phase Approximation in the Theory of Superconductivity, Phys. Rev. 112, 1900-1916, (1958); Plasmons, Gauge Invariance, and Mass, Phys. Rev. 130, 439, (1963).
53 P. Higgs, Broken Symmetries and the Masses of Gauge Bosons, Phys. Rev. Lett. 13, 508-509, (1964).
54 G. Guralnik, C.R. Hagen and T.W.B. Kibble, Global Conservation Laws and Massless Particles, Phys. Rev. Lett. 13, 585-587, (1964).
55 F. Englert and R. Brout, Broken Symmetry and the Mass of Gauge Vector Mesons, Phys. Rev. Lett. 13, 321-323, (1964).


magnetization is parallel to the z-axis. Then
$$ \hat{H} \ | 0 \rangle \ = \ - \sum_{i,j} J_{i,j} \left[ \ S^z_i \ S^z_j \ + \ \frac{1}{2} \left( \ S^+_i \ S^-_j \ + \ S^-_i \ S^+_j \ \right) \ \right] | 0 \rangle $$
$$ \qquad \ = \ - \sum_{i,j} J_{i,j} \left[ \ S^2 \ + \ \frac{1}{2} \left( \ S^+_i \ S^-_j \ + \ S^-_i \ S^+_j \ \right) \ \right] | 0 \rangle $$
$$ \qquad \ = \ - \sum_{i,j} J_{i,j} \ S^2 \ | 0 \rangle \qquad (1495) $$
The first line follows since all the spins are aligned with the z-axis and are eigenstates of $S^z_i$ with eigenvalue $S$
$$ S^z_i \ | S_i \rangle \ = \ S \ | S_i \rangle \qquad (1496) $$
The second line occurs since the spin-flip terms vanish, as they all involve the spin-raising operator at a site and
$$ S^+_i \ | S_i \rangle \ = \ 0 \qquad (1497) $$
as the spin cannot be raised further. Hence, the fully-polarized ferromagnetic state is an exact eigenstate with eigenvalue $E_0 = - \sum_{i,j} J_{i,j} \ S^2$. This ground state is infinitely degenerate, since any rotation of the total magnetization leads to an equivalent symmetry-broken ground state.
Conservation of Magnetization
The z-component of the total magnetization $\hat{M}^z$ is defined as
$$ \hat{M}^z \ = \ \sum_{i=1}^{N} S^z_i \qquad (1498) $$
The magnetization commutes with the Hamiltonian and so is conserved. The commutator
$$ [ \ \hat{M}^z \ , \ \hat{H} \ ] \qquad (1499) $$
can be evaluated with the aid of the commutation relations
$$ [ \ S^z_i \ , \ S^+_j \ ] \ = \ + \ \delta_{i,j} \ \hbar \ S^+_i $$
$$ [ \ S^z_i \ , \ S^-_j \ ] \ = \ - \ \delta_{i,j} \ \hbar \ S^-_i \qquad (1500) $$
From which one finds that
$$ [ \ \hat{M}^z \ , \ \hat{H} \ ] \ = \ - \ \frac{\hbar}{2} \sum_{i,j} J_{i,j} \left( \ S^+_i \ S^-_j \ - \ S^-_i \ S^+_j \ \right) \ + \ \frac{\hbar}{2} \sum_{i,j} J_{i,j} \left( \ S^+_i \ S^-_j \ - \ S^-_i \ S^+_j \ \right) \ = \ 0 \qquad (1501) $$
independent of any choice for the exchange interaction. Hence, the total magnetization is conserved.
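Both the eigenstate property (1495) and the vanishing commutator (1501) can be verified numerically by constructing the operators explicitly on a small cluster. The sketch below is our own illustration, not part of the notes: it uses spin-1/2 operators with $\hbar = 1$ on a hypothetical three-site ring with a uniform ferromagnetic exchange constant $J$, taking the sign convention $\hat{H} = - J \sum_{\rm bonds} \vec{S}_i \cdot \vec{S}_j$.

```python
import numpy as np

# Spin-1/2 operators (hbar = 1) in the basis (up, down)
sz = np.array([[0.5, 0.0], [0.0, -0.5]])
sp = np.array([[0.0, 1.0], [0.0, 0.0]])   # S^+
sm = sp.T.copy()                          # S^-
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-site chain."""
    mats = [I2] * n
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

n, J = 3, 1.0
H = np.zeros((2 ** n, 2 ** n))
for i in range(n):
    j = (i + 1) % n                       # nearest-neighbour bonds of the ring
    H -= J * (site_op(sz, i, n) @ site_op(sz, j, n)
              + 0.5 * (site_op(sp, i, n) @ site_op(sm, j, n)
                       + site_op(sm, i, n) @ site_op(sp, j, n)))

Mz = sum(site_op(sz, i, n) for i in range(n))
commutator = Mz @ H - H @ Mz              # should vanish, cf. eq. (1501)

psi0 = np.zeros(2 ** n)
psi0[0] = 1.0                             # fully polarized state, all spins up
E0 = -n * J * 0.25                        # - sum over bonds of J S^2, S = 1/2
```

Running the check confirms that the fully polarized state is an exact eigenstate and that the total magnetization commutes with the Hamiltonian, for this small ring.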
The Spin Wave Dispersion Relation
The spin-wave state $| q \rangle$ is a linear superposition of ferromagnetic states with a single flipped spin. The spin-wave state can be expressed as
$$ | q \rangle \ = \ S^-_q \ | 0 \rangle \ = \ \frac{1}{\sqrt{N}} \sum_{j} \exp\left[ \ i \ \vec{q} \cdot \vec{R}_j \ \right] \ S^-_j \ | 0 \rangle \qquad (1502) $$
which satisfies the energy-eigenvalue equation
$$ \hat{H} \ | q \rangle \ = \ E_q \ | q \rangle \qquad (1503) $$
where the energy eigenvalue can be expressed in terms of the ground-state energy $E_0$ and the spin-wave excitation energy $\hbar \omega_q$
$$ E_q \ = \ E_0 \ + \ \hbar \omega_q \qquad (1504) $$

The dispersion relation can be found from
$$ [ \ \hat{H} \ , \ S^-_q \ ] \ | 0 \rangle \ = \ \hbar \omega_q \ S^-_q \ | 0 \rangle \qquad (1505) $$
by using the commutation relations
$$ [ \ S^z_j \ , \ S^-_i \ ] \ = \ - \ \delta_{i,j} \ \hbar \ S^-_i \qquad (1506) $$
and
$$ [ \ S^+_j \ , \ S^-_i \ ] \ = \ 2 \ \delta_{i,j} \ \hbar \ S^z_i \qquad (1507) $$
On using these two commutation relations, one finds that the commutation relation between $S^-_q$ and the Hamiltonian produces




$$ [ \ \hat{H} \ , \ S^-_q \ ] \ = \ \frac{\hbar}{\sqrt{N}} \sum_{i,j} J_{i,j} \left( \ S^z_i \ S^-_j \ - \ S^-_i \ S^z_j \ \right) \exp\left[ \ i \ \vec{q} \cdot \vec{R}_j \ \right] \ + \ \frac{\hbar}{\sqrt{N}} \sum_{i,j} J_{i,j} \left( \ S^z_j \ S^-_i \ - \ S^-_j \ S^z_i \ \right) \exp\left[ \ i \ \vec{q} \cdot \vec{R}_i \ \right] \qquad (1508) $$
Thus, when acting on the ferromagnetic state, the commutation relation reduces to
$$ [ \ \hat{H} \ , \ S^-_q \ ] \ | 0 \rangle \ = \ \frac{\hbar S}{\sqrt{N}} \sum_{i,j} J_{i,j} \left( \ S^-_j \ - \ S^-_i \ \right) \exp\left[ \ i \ \vec{q} \cdot \vec{R}_j \ \right] | 0 \rangle \ + \ \frac{\hbar S}{\sqrt{N}} \sum_{i,j} J_{i,j} \left( \ S^-_i \ - \ S^-_j \ \right) \exp\left[ \ i \ \vec{q} \cdot \vec{R}_i \ \right] | 0 \rangle \qquad (1509) $$

which can be further reduced by noting that the pairwise interaction depends only on the relative separation of the sites and not on their absolute location in the lattice. Therefore, on expressing this in terms of $S^-_q$ and a sum over the nearest-neighbor sites, one has
$$ [ \ \hat{H} \ , \ S^-_q \ ] \ | 0 \rangle \ = \ \sum_{i} J_{i,0} \ \hbar \ S \left( \ 1 \ - \ \cos[ \ \vec{q} \cdot \vec{R}_{i,0} \ ] \ \right) \ S^-_q \ | 0 \rangle \qquad (1510) $$
Thus, the dispersion relation is evaluated as a sum over nearest-neighbor sites
$$ \hbar \omega_q \ = \ \sum_{i} J_{0,i} \ \hbar \ S \left( \ 1 \ - \ \cos[ \ \vec{q} \cdot \vec{R}_{0,i} \ ] \ \right) \qquad (1511) $$
which vanishes quadratically as $q \rightarrow 0$. Usually, Goldstone modes have linear dispersion relations; however, due to the conserved nature of the order parameter, ferromagnetic spin waves have quadratic dispersion relations.
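For a one-dimensional chain with nearest-neighbour exchange $J$ and lattice spacing $a$, the sum in eq. (1511) runs over the two neighbours at $\delta = \pm a$, giving $\hbar \omega_q = 2 J \hbar S ( 1 - \cos q a ) \approx J \hbar S a^2 q^2$ for small $q$. The following minimal sketch of the quadratic behaviour is our own illustration (units with $\hbar = a = 1$; the values of $J$ and $S$ are arbitrary):

```python
import math

def omega_q(q, J=1.0, S=0.5, a=1.0):
    """hbar * omega_q of eq. (1511) for a 1-d chain with hbar = 1:
    a sum over the two nearest neighbours at delta = +a and -a."""
    return sum(J * S * (1.0 - math.cos(q * d)) for d in (+a, -a))
```

The mode frequency vanishes exactly at $q = 0$ and approaches $J S a^2 q^2$ as $q \rightarrow 0$, confirming the quadratic dispersion.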

10.9 Appendix: The One-Dimensional Ising Model

The Transfer Matrix Solution of the One-Dimensional Ising Model in an Applied Field
We shall apply periodic boundary conditions to a one-dimensional chain of length $N$, so that $S^z_{N+1} = S^z_1$, and the system of spins becomes equivalent to a ring of spins with $N$ links. The partition function can be written as the trace over $N$ factors, each of which depends on the values of two neighboring spins
$$ Z \ = \ {\rm Trace} \ \prod_{i=1}^{N} \ T( S^z_i , S^z_{i+1} ) \qquad (1512) $$
where the factors have been symmetrized
$$ T( S^z_i , S^z_{i+1} ) \ = \ \exp\left[ \ \beta J \ S^z_i \ S^z_{i+1} \ + \ \frac{\beta B}{2} \left( \ S^z_i \ + \ S^z_{i+1} \ \right) \ \right] \qquad (1513) $$
The factors $T( S^z_i , S^z_{i+1} )$ can be regarded as elements of a matrix ${\bf T}$ (the transfer matrix)
$$ T( S^z_i , S^z_{i+1} ) \ = \ \langle \ S^z_i \ | \ {\bf T} \ | \ S^z_{i+1} \ \rangle \qquad (1514) $$

where $| S^z_{i+1} \rangle$ is a column vector
$$ | \ S^z_{i+1} \ \rangle \ = \ \left( \begin{array}{c} \frac{ 1 + S^z_{i+1} }{2} \\ \frac{ 1 - S^z_{i+1} }{2} \end{array} \right) \qquad (1515) $$
and $\langle S^z_i |$ is a row vector
$$ \langle \ S^z_i \ | \ = \ \left( \begin{array}{cc} \frac{ 1 + S^z_i }{2} & \frac{ 1 - S^z_i }{2} \end{array} \right) \qquad (1516) $$
Thus, the matrix ${\bf T}$ is given by
$$ {\bf T} \ = \ \left( \begin{array}{cc} \exp[ \ + \ \beta \ ( \ J \ + \ B \ ) \ ] & \exp[ \ - \ \beta \ J \ ] \\ \exp[ \ - \ \beta \ J \ ] & \exp[ \ + \ \beta \ ( \ J \ - \ B \ ) \ ] \end{array} \right) \qquad (1517) $$

Since
$$ \sum_{S^z_{i+1}} \ | \ S^z_{i+1} \ \rangle \ \langle \ S^z_{i+1} \ | \ = \ {\bf I} \qquad (1518) $$
where ${\bf I}$ is the unit $2 \times 2$ matrix
$$ {\bf I} \ = \ \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) \qquad (1519) $$
the summation over $S^z_{i+1}$ in the expression can be performed as
$$ \sum_{S^z_{i+1}} \ \langle \ S^z_i \ | \ {\bf T} \ | \ S^z_{i+1} \ \rangle \ \langle \ S^z_{i+1} \ | \ {\bf T} \ | \ S^z_{i+2} \ \rangle \ = \ \langle \ S^z_i \ | \ {\bf T} \ {\bf T} \ | \ S^z_{i+2} \ \rangle \ = \ \langle \ S^z_i \ | \ {\bf T}^2 \ | \ S^z_{i+2} \ \rangle \qquad (1520) $$
Using the completeness property iteratively, the successive traces over the variables $S^z_i$ for $i = 2, \ldots, N$ in the expression for $Z$ can be replaced by successive multiplications of the matrices ${\bf T}$. Thus
$$ Z \ = \ \sum_{S^z_1 = \pm 1} \ \langle \ S^z_1 \ | \ {\bf T}^N \ | \ S^z_1 \ \rangle \qquad (1521) $$

The partition function can be expressed in terms of the eigenvalues $\lambda_i$ of ${\bf T}$, defined by the eigenvalue equation
$$ {\bf T} \ | \ \lambda_i \ \rangle \ = \ \lambda_i \ | \ \lambda_i \ \rangle \qquad (1522) $$
for $i = 1$ or $i = 2$. We shall assume that $\lambda_1 > \lambda_2$. On defining a $2 \times 2$ matrix ${\bf S}$ as a row vector of the two column vectors $| \lambda_1 \rangle$ and $| \lambda_2 \rangle$
$$ {\bf S} \ = \ \left( \begin{array}{cc} | \ \lambda_1 \ \rangle & | \ \lambda_2 \ \rangle \end{array} \right) \qquad (1523) $$
then, as the similarity transform based on ${\bf S}$ diagonalizes ${\bf T}$, the inverse transformation is given by
$$ {\bf T} \ = \ {\bf S} \ \left( \begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_2 \end{array} \right) \ {\bf S}^{-1} \qquad (1524) $$

Then, the partition function can be evaluated as
$$ Z \ = \ \sum_{S^z_1 = \pm 1} \ \langle \ S^z_1 \ | \ {\bf S} \ \left( \begin{array}{cc} \lambda_1^N & 0 \\ 0 & \lambda_2^N \end{array} \right) \ {\bf S}^{-1} \ | \ S^z_1 \ \rangle \qquad (1525) $$
and on utilizing the cyclic invariance of the trace, one finds the result
$$ Z \ = \ \sum_{S^z_1 = \pm 1} \ \langle \ S^z_1 \ | \ \left( \begin{array}{cc} \lambda_1^N & 0 \\ 0 & \lambda_2^N \end{array} \right) \ | \ S^z_1 \ \rangle \ = \ \lambda_1^N \ + \ \lambda_2^N \qquad (1526) $$
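The result $Z = \lambda_1^N + \lambda_2^N = {\rm Tr} \, {\bf T}^N$ can be checked against a direct sum over all $2^N$ spin configurations of the ring. The sketch below is our own illustration (arbitrary parameter values); it uses the symmetrized matrix of eq. (1517), whose Boltzmann weight corresponds to the energy $-J \sum_i S_i S_{i+1} - B \sum_i S_i$:

```python
import itertools
import math

def transfer_Z(N, J, B, beta):
    """Z = Tr T^N with the 2x2 transfer matrix of eq. (1517)."""
    T = [[math.exp(beta * (J + B)), math.exp(-beta * J)],
         [math.exp(-beta * J),     math.exp(beta * (J - B))]]
    M = [[1.0, 0.0], [0.0, 1.0]]           # identity; accumulate T^N
    for _ in range(N):
        M = [[sum(M[i][k] * T[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return M[0][0] + M[1][1]

def brute_Z(N, J, B, beta):
    """Direct sum over all 2^N configurations of the ring (S_{N+1} = S_1),
    with Boltzmann weight exp[ beta ( J sum S_i S_{i+1} + B sum S_i ) ]."""
    Z = 0.0
    for s in itertools.product([+1, -1], repeat=N):
        E = sum(J * s[i] * s[(i + 1) % N] + B * s[i] for i in range(N))
        Z += math.exp(beta * E)
    return Z
```

For small $N$ the two evaluations agree to machine precision, which is a useful check on the matrix elements of ${\bf T}$.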

The Free-Energy $F$ is given by
$$ F \ = \ - \ k_B T \ \ln\left( \ \lambda_1^N \ + \ \lambda_2^N \ \right) \ = \ - \ k_B T \ N \ \ln \lambda_1 \ - \ k_B T \ \ln\left( \ 1 \ + \ \frac{ \lambda_2^N }{ \lambda_1^N } \ \right) \qquad (1527) $$
The second term can be neglected since it is of order unity. Thus,
$$ F \ \approx \ - \ k_B T \ N \ \ln \lambda_1 \qquad (1528) $$
The eigenvalues are determined from the secular equation
$$ \left| \begin{array}{cc} \exp[ \ + \ \beta \ ( \ J \ + \ B \ ) \ ] \ - \ \lambda & \exp[ \ - \ \beta \ J \ ] \\ \exp[ \ - \ \beta \ J \ ] & \exp[ \ + \ \beta \ ( \ J \ - \ B \ ) \ ] \ - \ \lambda \end{array} \right| \ = \ 0 \qquad (1529) $$
which can be expressed as the quadratic equation
$$ \lambda^2 \ - \ 2 \ \lambda \ \exp[ \ \beta J \ ] \ \cosh \beta B \ + \ 4 \ \cosh \beta J \ \sinh \beta J \ = \ 0 \qquad (1530) $$
This has the two solutions
$$ \lambda_{\pm} \ = \ \exp[ \ \beta J \ ] \ \cosh \beta B \ \pm \ \sqrt{ \ \exp[ \ + \ 2 \beta J \ ] \ \sinh^2 \beta B \ + \ \exp[ \ - \ 2 \beta J \ ] \ } \qquad (1531) $$
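The product of the two roots is $\lambda_+ \lambda_- = e^{2\beta J} - e^{-2\beta J} = 4 \cosh \beta J \sinh \beta J$, consistent with the constant term of eq. (1530). This can be checked numerically; the sketch below is our own illustration with arbitrary parameter values:

```python
import math

def lambdas(J, B, beta):
    """The two roots of eq. (1531)."""
    a = math.exp(beta * J) * math.cosh(beta * B)
    d = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * B) ** 2
                  + math.exp(-2 * beta * J))
    return a + d, a - d

def quadratic_lhs(lam, J, B, beta):
    """Left-hand side of the quadratic secular equation (1530)."""
    return (lam ** 2
            - 2.0 * lam * math.exp(beta * J) * math.cosh(beta * B)
            + 4.0 * math.cosh(beta * J) * math.sinh(beta * J))
```

Both roots make the left-hand side of eq. (1530) vanish to within rounding error, and both are positive for $J > 0$.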

Thus, the final result for the Free-Energy is given by
$$ F \ = \ - \ N \ k_B T \ \ln\left[ \ \exp[ \ \beta J \ ] \ \cosh \beta B \ + \ \sqrt{ \ \exp[ \ + \ 2 \beta J \ ] \ \sinh^2 \beta B \ + \ \exp[ \ - \ 2 \beta J \ ] \ } \ \right] \qquad (1532) $$
The magnetization is defined as
$$ M \ = \ - \ \frac{ \partial F }{ \partial B } \qquad (1533) $$
which yields
$$ M \ = \ N \ \frac{ \exp[ \ \beta J \ ] \ \sinh \beta B }{ \sqrt{ \ \exp[ \ + \ 2 \beta J \ ] \ \sinh^2 \beta B \ + \ \exp[ \ - \ 2 \beta J \ ] \ } } \qquad (1534) $$

Since the Free-Energy is an analytic function of $B$ and $T$, and as $M(B,T)$ is also an analytic function of $B$ for all $T$, the system does not exhibit a phase transition at any finite $T$.
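As a consistency check, the closed form (1534) should equal $- \partial F / \partial B$ with $F$ taken from eq. (1532). A numerical-derivative sketch (our own illustration, with $k_B = 1$ and arbitrary values of $J$, $\beta$ and $N$):

```python
import math

def free_energy(B, J=1.0, beta=1.0, N=100):
    """F of eq. (1532), with k_B T = 1/beta."""
    lam1 = (math.exp(beta * J) * math.cosh(beta * B)
            + math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * B) ** 2
                        + math.exp(-2 * beta * J)))
    return -(N / beta) * math.log(lam1)

def magnetization(B, J=1.0, beta=1.0, N=100):
    """Closed form of eq. (1534)."""
    num = math.exp(beta * J) * math.sinh(beta * B)
    den = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * B) ** 2
                    + math.exp(-2 * beta * J))
    return N * num / den
```

A central finite difference of `free_energy` reproduces `magnetization` to high accuracy, confirming the differentiation.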
Zeroes of the Partition Function
The zeroes of the partition function are determined as
$$ Z_N(z) \ = \ \lambda_1^N \ + \ \lambda_2^N \ = \ 0 \qquad (1535) $$
We shall express $z$ as
$$ z \ = \ \exp[ \ - \ 4 \beta J \ + \ 2 \beta B \ ] \ = \ \exp[ \ - \ 4 \beta J \ + \ i \theta \ ] \qquad (1536) $$
corresponding to $\beta B = i \frac{\theta}{2}$. On defining a parameter $r$ via
$$ r \ = \ \exp[ \ - \ 2 \beta J \ ] \qquad (1537) $$
then
$$ z \ = \ r^2 \ \exp[ \ i \theta \ ] \qquad (1538) $$
The values of $\theta$ for which $Z_N(z) = 0$ are determined from the equation
$$ \left( \ \cos\frac{\theta}{2} \ + \ \left[ \ r^2 \ - \ \sin^2\frac{\theta}{2} \ \right]^{\frac{1}{2}} \ \right)^N \ + \ \left( \ \cos\frac{\theta}{2} \ - \ \left[ \ r^2 \ - \ \sin^2\frac{\theta}{2} \ \right]^{\frac{1}{2}} \ \right)^N \ = \ 0 \qquad (1539) $$
On setting
$$ \cos \phi \ = \ ( \ 1 \ - \ r^2 \ )^{-\frac{1}{2}} \ \cos\frac{\theta}{2} \qquad (1540) $$
the equation for $\theta$ can be simplified to
$$ ( \ 1 \ - \ r^2 \ )^{\frac{N}{2}} \ \left[ \ ( \ \cos \phi \ + \ i \ \sin \phi \ )^N \ + \ ( \ \cos \phi \ - \ i \ \sin \phi \ )^N \ \right] \ = \ 0 \qquad (1541) $$
or, equivalently
$$ ( \ 1 \ - \ r^2 \ )^{\frac{N}{2}} \ 2 \ \cos N \phi \ = \ 0 \qquad (1542) $$
which has solutions given by
$$ \phi \ = \ \left( \ k \ - \ \frac{1}{2} \ \right) \ \frac{\pi}{N} \qquad (1543) $$
for $k = 1, 2, \ldots, N$. Since
$$ \cos \theta \ = \ 2 \ \cos^2\frac{\theta}{2} \ - \ 1 \qquad (1544) $$
one has the solution
$$ \cos \theta \ = \ - \ r^2 \ + \ ( \ 1 \ - \ r^2 \ ) \ \cos 2 \phi \ = \ - \ r^2 \ + \ ( \ 1 \ - \ r^2 \ ) \ \cos\left( \ \frac{ ( \ 2 k \ - \ 1 \ ) \ \pi }{N} \ \right) \qquad (1545) $$

for $k = 1, 2, \ldots, N$. This determines the distribution of zeroes of $Z_N(z)$. It is seen that they reside on a circle of radius $r^2$ in the complex $z$ plane. The zeroes lie on an arc of the circle starting at the angle $\theta_i = \cos^{-1}( 1 - 2 r^2 ) = 2 \sin^{-1} r$ and ending at the angle $\theta_f = 2 \pi - \theta_i$. The arc has a gap of angular width $4 \sin^{-1} r$ centered on the real $z$-axis; the gap only closes, allowing the zeroes to pinch the real $z$-axis, when $r \rightarrow 0$ or, equivalently, when $T \rightarrow 0$. Thus, the model can be considered as exhibiting a phase transition at $T = 0$. In this case, the critical exponents are identified as $\alpha = 1$, $\beta = 0$, $\gamma = 1$, $\nu = 1$ and $\delta = \infty$. The exponents satisfy the scaling relations. The existence of a phase transition in one dimension is not inconsistent with the Mermin-Wagner theorem, since the broken symmetry is discrete.
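At $\beta B = i \theta / 2$ the two eigenvalues reduce to $\lambda_{\pm} = e^{\beta J} [ \cos(\theta/2) \pm i ( \sin^2(\theta/2) - r^2 )^{1/2} ]$, which have equal modulus on the arc, and the angles of eq. (1545) make $\lambda_+^N + \lambda_-^N$ vanish. A numerical sketch of this (our own illustration; the values of $N$ and $\beta J$ are arbitrary):

```python
import cmath
import math

def max_zero_residual(N, betaJ):
    """Largest |lambda_+^N + lambda_-^N| / |lambda_+|^N over the N angles
    theta_k of eq. (1545); it should vanish at the Lee-Yang zeroes."""
    r = math.exp(-2.0 * betaJ)
    worst = 0.0
    for k in range(1, N + 1):
        cos_t = -r * r + (1.0 - r * r) * math.cos((2 * k - 1) * math.pi / N)
        half = math.acos(cos_t) / 2.0                 # theta_k / 2
        s = cmath.sqrt(r * r - math.sin(half) ** 2)   # purely imaginary on the arc
        lp = math.exp(betaJ) * (math.cos(half) + s)
        lm = math.exp(betaJ) * (math.cos(half) - s)
        worst = max(worst, abs(lp ** N + lm ** N) / abs(lp) ** N)
    return worst
```

The residual is at the level of rounding error for every $k$, confirming that the angles (1545) locate the zeroes of $Z_N(z)$.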
The distribution of zeroes on the circle, $\rho(\theta)$, is defined as
$$ \rho(\theta) \ = \ \frac{1}{N} \left( \ \frac{dk}{d\theta} \ \right) \qquad (1546) $$
and is evaluated in the two regions as
$$ \rho(\theta) \ = \ \frac{1}{2 \pi} \ \frac{ \sin\frac{\theta}{2} }{ \sqrt{ \ \sin^2\frac{\theta}{2} \ - \ r^2 \ } } \qquad {\rm if} \ \theta_f \ > \ \theta \ > \ \theta_i \qquad (1547) $$
and
$$ \rho(\theta) \ = \ 0 \qquad {\rm otherwise} \qquad (1548) $$
respectively. The distribution is normalized to unity
$$ \int_0^{2\pi} d\theta \ \rho(\theta) \ = \ 1 $$
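Since $\rho(\theta) = (1/N)(dk/d\theta)$, the reciprocal spacing of consecutive angles $\theta_k$ from eq. (1545) should reproduce the density (1547). A numerical sketch (our own illustration; $r$ and $N$ are arbitrary, and $k$ is taken on the branch $0 < \theta < \pi$):

```python
import math

def theta_k(k, N, r):
    """Angle of the k-th zero, eq. (1545), on the branch 0 < theta < pi."""
    return math.acos(-r * r + (1.0 - r * r) * math.cos((2 * k - 1) * math.pi / N))

def rho(theta, r):
    """Density of zeroes on the arc, eq. (1547)."""
    s = math.sin(0.5 * theta)
    return s / (2.0 * math.pi * math.sqrt(s * s - r * r))
```

For large $N$, the finite-difference estimate $1/[N(\theta_{k+1} - \theta_k)]$ agrees with $\rho$ evaluated at the midpoint of the interval.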
