
Statistical Thermodynamics

Carlo Carraro

© Carlo Carraro – CBE 240 Lecture notes – Fall 2020 – Draft
Preface

These lecture notes present an introduction to the statistical foundations of equilibrium thermodynamics.
I take a Bayesian approach to the topic, where macroscopic entropy is introduced as a measure of the
information that could be gained by observing the microscopic state of a physical system. This approach is
appealing for its logical simplicity. It is rooted in the principle of insufficient reason, enunciated by Jakob
Bernoulli, which in this context essentially states that in the absence of information to the contrary, a physical
system is equally likely to be found in any microscopic state compatible with the known conservation laws
(such as particle number, energy, and so on); entropy quantifies this absence of information. A commonly
followed alternative starting point, more aligned with a frequentist viewpoint of statistics, is the ergodic
hypothesis, that a system will in due time visit all microscopic states compatible with conservation laws; but
this due time can be so long compared to any practical timescale for observation (or even to the proverbial
age of the universe), that one must always question whether any such observation yields representative
time-averages. This issue is particularly vexing in molecular dynamics simulations. In fact, breakdown of
ergodicity occurs in many interesting systems, such as in any system exhibiting phase transitions.
The material presented here strives to be self-contained, although knowledge of calculus is assumed, as well
as some familiarity with thermodynamics, as developed in introductory undergraduate science or engineering
courses. Chapter 1 introduces the concept of macroscopic vs microscopic state of a system. The mathematical
tools needed in statistical thermodynamics are laid out in chapters 2 and 3 (counting and probability theory,
respectively). After introducing entropy in probability theory, we transition to the physical world with an
application to the classical ideal gas in Chapter 4. Here, the power of maximum entropy as a thermodynamic
potential begins to become clear. The properties of entropy are further developed in Chapter 5, which
includes the statement of the Second Law of Thermodynamics, two important consequences of which are
explored in Chapter 6 (Carnot's and Clausius' theorems). Chapters 7 and 8 are devoted to the topic of
boundary conditions, implemented macroscopically through the construction of suitable thermodynamic
potentials (Chapter 7) and interpreted probabilistically as the process of entropy maximization with suitable
constraints (Chapter 8). Chapter 9 deals with open systems, and lays the foundation for the study of
phase equilibria and phase transitions. Chapter 10 develops the mean field approximation for systems whose
degrees of freedom can be modeled as Bernoulli variables. Finally, Chapter 11 presents some exact results
to highlight the limitations of mean field theory and to illustrate the role of dimensionality in its breakdown.
The emphasis throughout is on the logical structure of statistical thermodynamics more than on any
particular application; the goal is to empower the reader to think about very different physical situations in
a coherent, unified fashion. Accordingly, the organization of the notes tends to have each section dedicated
to the exposition of a fundamental concept followed by a worked out example (sometimes reinforced in
the exercises at the end of each chapter). These examples and exercises are taken from a variety of topics
in physical chemistry, solid state physics, and materials science. The often cursory or introductory
treatment afforded in these notes to important topics like the theory of electrolytic solutions, the ideal Fermi
gas, or the blackbody radiation, is meant to encourage the students to explore further on their own, and to
instill the confidence that they are well equipped to do so.

Berkeley, CA July 2020

Contents

Preface

1 The goal of Statistical Thermodynamics
  1.1 Microscopic State of a System
  1.2 Hamiltonian Mechanics and Phase Space
  1.3 Macroscopic State of a System; Gibbs Phase Rule
  1.4 Thermodynamic Equilibrium and the Zeroth Law of Thermodynamics
  1.5 The Ergodic Hypothesis
  1.6 Exercises

2 Counting
  2.1 How Many? A Simple Rule
  2.2 Permutations, Factorials, and Stirling's Formula
  2.3 Combinations
  2.4 Distinguishable vs Indistinguishable Objects
  2.5 Quantum Statistics
  2.6 The Binomial Theorem
  2.7 Exercises

3 Probability
  3.1 Definition of Probability
  3.2 Conditional Probability and Bayesian Inference
  3.3 Moments of Probability Distributions: Expectation and Variance
  3.4 Joint Probability Distributions, Independence, and Covariance
  3.5 Binomial Distribution
  3.6 Binomial Distribution for Large n: Gaussian and Poisson Distributions
  3.7 Uniform Distribution and Cumulative Distribution Function
  3.8 Distribution of a Function of Random Variables
  3.9 Characteristic Function and Central Limit Theorem
  3.10 Exercises

4 Entropy
  4.1 Entropy of a Random Process
  4.2 Boltzmann's Entropy Formula
  4.3 Discrete Phase Space: the Entropy of a Lattice Gas
  4.4 Continuum Phase Space: the Entropy of the Classical Ideal Gas
  4.5 Principle of Maximum Entropy
  4.6 Exercises

5 Properties of Entropy
  5.1 The Dependence of Entropy on E, V, N
  5.2 Thermodynamic Forces
  5.3 Connection to Kinetic Temperature
  5.4 Homogeneity of Entropy
  5.5 Thermodynamic Susceptibilities
  5.6 Entropy Changes and Reversibility
  5.7 Clausius' Statement of the Second Law of Thermodynamics
  5.8 Entropy at Low Temperature
  5.9 Exercises

6 Thermodynamic Processes and Cycles
  6.1 Joule Expansion and Other Processes for an Ideal Gas
  6.2 The Carnot Cycle and Carnot's Theorem
  6.3 Clausius' Theorem
  6.4 Exercises

7 Thermodynamic Potentials
  7.1 The Concept of Free Energy
  7.2 Systematic Construction of Thermodynamic Potentials
  7.3 Stability Criteria for Thermodynamic Potentials
  7.4 The Calculus of Thermodynamics: Maxwell Relations and Jacobians
  7.5 Exercises

8 Statistical Mechanics in Thermal Equilibrium
  8.1 Probability Distribution at Constant Temperature
  8.2 The Canonical Partition Function
  8.3 Energy Fluctuations and Heat Capacity
  8.4 Equivalence of Microcanonical and Canonical Routes to Thermodynamics
  8.5 Partition Function of the Classical Harmonic Oscillator
  8.6 Partition Function of the Quantized Harmonic Oscillator
  8.7 Heat Capacity of Crystals and Blackbody Radiation
  8.8 Exercises

9 Statistical Mechanics of Open Systems
  9.1 Equilibrium under Particle Flux
  9.2 Probability Distribution of an Open System at Constant (T, V, µ)
  9.3 Adsorption Equilibrium
  9.4 Ideal Fermi Gas
  9.5 Virial Expansion
  9.6 van der Waals Equation of State
  9.7 Phase Coexistence and Metastable States
  9.8 What makes a Phase?
  9.9 Absence of Phase Transitions in Finite Systems and the Theorem of Lee and Yang
  9.10 Exercises

10 Mean Field Theory
  10.1 Lattice Models of Binary Systems
  10.2 Regular Solution Theory
  10.3 The Ising Model
  10.4 Mean Field Theory of the Ising Model
  10.5 Limitations on the Applicability of Mean Field Theory
  10.6 Charge Screening in Coulomb Systems: Poisson-Boltzmann and Debye-Hückel Theory
  10.7 Exercises

11 Exact Results and Breakdown of Mean Field Theory
  11.1 Ising Model in 1D: the Transfer Matrix Method
  11.2 Breakdown of Mean Field Theory: Fluctuations and Defects
  11.3 The Landau Theory of Phase Transitions
  11.4 Correlations near Critical Points
  11.5 Exercises
Chapter 1

The goal of Statistical Thermodynamics

The laws of thermodynamics are empirical statements concerning energy conservation (first law) and the
maximum amount of work that can be obtained in a thermodynamic transformation (second law). The goal
of statistical thermodynamics is to derive these laws from the microscopic description of the system through
a statistical treatment of the microscopic degrees of freedom.

1.1 Microscopic State of a System


Consider a macroscopic system composed of N particles. For the system to be regarded as macroscopic,
we may require, loosely speaking, that N be of order Avogadro's number (this is a wild overestimate, as
we shall see, but is in line with our everyday experience). In classical mechanics, the state of the system is
specified once the identity¹ of all particles is known, and we know their positions and momenta at a given
instant in time. The evolution of the system at arbitrary points in the future² is predicted by integrating
Newton's equations of motion, $m_i \vec{a}_i = \vec{F}_i$ for all $i$. This is a system of $3N$ second order ordinary differential
equations, and thus $6N$ initial conditions are needed to specify a solution completely; sometimes we say that
the system possesses $6N$ degrees of freedom.

1.2 Hamiltonian Mechanics and Phase Space


The laws of classical mechanics can be formulated in several equivalent ways. One such formulation, due to
Hamilton, is particularly useful in the development of statistical mechanics. It is based on the Hamiltonian
function, the sum of kinetic and potential energies of all particles: $H(p, r) = KE(\{\vec{p}_i\}) + PE(\{\vec{r}_i\})$, from
which the equations of motion (Hamilton's equations) are written as

$$\dot{p}_i = -\frac{\partial H}{\partial r_i}, \qquad \dot{r}_i = \frac{\partial H}{\partial p_i}. \qquad (1.2.1)$$

The $6N$ components (in 3D) of the position and momentum variables, $\vec{r}_i$ and $\vec{p}_i$, span the "phase space" of
the $N$-particle system; the state of the system corresponds to a point in phase space. It is easy to see that
$\frac{dH}{dt} = \frac{\partial H}{\partial t}$, so that if the potential energy is independent of time, the value of the Hamiltonian is constant in
time (conservation of energy). Later on, we will deal with systems for which we can't write down equations
of motion, but we can still specify the energy as a function of certain "degrees of freedom", or microscopic
variables. In statistical mechanics, it is common to refer to such an energy function as the "Hamiltonian" of
the system, even though the system does not follow Hamiltonian mechanics.
¹ The identity of a point particle in classical mechanics is determined by its mass and its interactions, i.e., by the Hamiltonian (see next section).
² We can also tell the system's past history if we integrate backwards in time.
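Equation 1.2.1 is easy to explore numerically. The short Python sketch below is an illustration only (the unit mass, unit spring constant, and time step are choices made here, not part of the formalism): it integrates Hamilton's equations for a one-dimensional harmonic oscillator, $H = p^2/2m + Kx^2/2$, with the symplectic Euler scheme and checks that the value of $H$ stays essentially constant, as argued above.

# Sketch: Hamilton's equations for a 1D harmonic oscillator,
# H = p^2/(2m) + K x^2/2, integrated with the symplectic Euler scheme.
# m = K = 1 and the time step are illustrative assumptions.
m, K = 1.0, 1.0
x, p = 1.0, 0.0                  # one point in phase space
dt, nsteps = 1e-3, 10_000

def H(x, p):
    return p**2 / (2 * m) + K * x**2 / 2

H0 = H(x, p)
for _ in range(nsteps):
    p -= K * x * dt              # dp/dt = -dH/dx = -K x
    x += (p / m) * dt            # dx/dt = +dH/dp = p/m
print(abs(H(x, p) - H0) / H0)    # relative energy drift stays small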


1.3 Macroscopic State of a System; Gibbs Phase Rule


For a macroscopic system, keeping track of all degrees of freedom is not practical, and one must be content
with measuring a much smaller number of parameters, such as volume, pressure, temperature, and chemical
composition. A suitable subset of these variables is needed to define the macroscopic, or thermodynamic,
state of the system. Exactly how many variables are needed is stated by Gibbs phase rule, which in the
absence of chemical reactions states that

$$N_v = 2 + N_c - \phi, \qquad (1.3.1)$$

where $N_v$ is the number of variables, or thermodynamic degrees of freedom, $N_c$ the number of independent
components, and $\phi$ the number of phases. A phase is a macroscopically homogeneous substance.³ For
example, we take as our variables $p$, $T$, and the variables that determine the compositions of all phases.
(It is understood that the total amount of stuff in the system is arbitrary; for instance, we could always
refer everything to one mole, so this arbitrary "variable" does not contribute to $N_v$.) The state of a system
described in this way is referred to as a macrostate. The macrostate of a system can be changed, e.g., by
doing work on the system or by heating it; it also changes if particles flow into or out of it.

³ More precisely, it is a homogeneous mixture of all compounds that can be formed by the chemical elements present.

1.4 Thermodynamic Equilibrium and the Zeroth Law of Thermodynamics
A system is said to be in thermodynamic equilibrium if its thermodynamic (i.e., macroscopic) properties do
not vary with time. Sometimes we’ll distinguish between thermal, mechanical, and chemical equilibrium, by
which we mean, respectively, that temperature, pressure, and composition of the system are not changing
in time. Temperature, unlike pressure or chemical identity, has no counterpart in classical mechanics; it is
defined empirically by the instrument used to measure it (the thermometer). Its usefulness relies on the
premise that it is a transitive property. This fact is often stated as

The Zeroth Law of Thermodynamics: Two bodies that are separately in thermal equilibrium
with a third one must also be in thermal equilibrium with each other.

1.5 The Ergodic Hypothesis


Since the thermodynamic state of a system in equilibrium doesn’t change in time, time itself will never
appear as a parameter in equilibrium thermodynamics, unlike in classical mechanics. How, then, can we
hope to establish a connection between the two disciplines? We have established, by our choice, that a
thermodynamic state is specified by a much smaller set of variables than a microscopic state. That means
that to each macrostate there corresponds a very large number of microstates. Statistical mechanics reduces
the calculation of a given observable (anything that can be measured, like density, compressibility, heat
capacity, etc.) for a macrostate to the statistical average of that observable over all microstates corresponding
to the given macrostate. It is often assumed that a system that at some instant in time happens to be
in a given microscopic state will visit, or access, in due time, all other microstates compatible with the
conservation laws of microscopic mechanics (e.g., if the system is isolated, all accessible microstates must
have the same energy). This is the ergodic hypothesis. It forms the foundation of molecular dynamics, where
the macroscopic (thermodynamic) properties of a system are measured by following the trajectories of the
particles over a long time and taking the time-average of the observables of interest. Providing a rigorous
justification of the ergodic hypothesis for a given system is not possible in general; in fact, many systems are
known in which ergodicity does not hold. One might also wonder whether the macroscopic properties
of a system can have anything at all to do with ergodicity, since the time it takes a system to visit most
accessible states is typically much larger than the age of the universe, i.e., many orders of magnitude larger
than the time scale of any experiment by which those properties are measured. In contrast, it turns out that
statistical averaging over a large number of microstates is free from these fundamental difficulties and can
often be carried out with relative ease.

1.6 Exercises
1. How many microscopic degrees of freedom for a system of 2 particles? What if the particles are
connected by a rigid rod of negligible mass? What if the particles are connected by a spring of
negligible mass?
2. How many macroscopic degrees of freedom for a 0.1M solution of NaCl in water? How many macro-
scopic degrees of freedom for a gaseous mixture of oxygen and nitrogen? How many degrees of freedom
for liquid water in equilibrium with its vapor and with ice?


Chapter 2

Counting

2.1 How Many? A Simple Rule


In statistical mechanics we are often required to count the number of elements of a large set, for instance
the number of possible energy states of a system, or the number of configurations of spins on a lattice, or
the number of initial conditions of a set of equations, etc.¹ The process of counting involves arranging the
elements of the set in question in a one-to-one correspondence with (a subset of) the natural numbers.² Note
that “arranging” implies some sort of ordering, and this suggests that the process of counting can be broken
down into a sequence of steps. The first rule of counting says that if we have m choices of arrangements
at one step and for each of these we have n choices for the next step, then the total number of arrangements
that can be made in the two steps is m × n.

Example: The Power Set

Consider a finite set A with n elements. The power set of A is the set of all distinct subsets of A,
including A itself and the empty set ∅. To count the number of subsets of A, we employ a useful
trick. We construct a correspondence between each subset of A and a length-n binary string (a string
of 0s and 1s). The string has length n because there are n elements in A. Let us order the elements of
A (it can be done because A is countable). For a given subset of A, it either contains the first element
of A, in which case we assign the value 1 to the first digit of the binary string, or not, in which case
we assign the value 0. Then we keep going: our subset either contains the second element of A, or
not, and so forth. We see that this procedure determines a unique binary string, and moreover any
given string determines a subset uniquely (for instance, the empty set corresponds to a string of 0s;
the string $11\ldots 1$ corresponds to the set $A$ itself). How many strings of length $n$ are there? If we
write down the string, at each digit we have a choice of two values, so by the first rule of counting
there are $2^n$ binary strings, and the power set of $A$ has $2^n$ elements.
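The binary-string correspondence translates directly into a few lines of Python; a sketch (the particular set A and the use of the standard itertools module are illustrative choices):

# Sketch: enumerate the power set of A via length-n binary strings.
from itertools import product

A = ['a', 'b', 'c']                         # n = 3 elements
subsets = [{x for x, bit in zip(A, bits) if bit}
           for bits in product([0, 1], repeat=len(A))]
print(len(subsets))                         # 8 = 2**3 strings/subsets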

2.2 Permutations, Factorials, and Stirling’s Formula


A simple but fundamental application of counting reveals that the number of permutations of n distinct
objects is n · (n − 1) · · · 2 · 1 = n!. Note that n objects are in one-to-one correspondence with the set of
integers {1, ..., n}, so we have n ways to pick the first object, n − 1 to pick the second, and so on, until we
have a single choice for the last one. We can say that there are n! ways of ordering a set of size n. Now let
us count the ways of ordering a subset of size k picked from the set {1, ..., n}. We proceed as before, but
stop at the $k$-th step, so we have altogether $n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$ choices. (If $k = n$, we get back the value $n!$ because by definition $0! = 1$.)
¹ One of the seminal papers on counting, by the great mathematician G. Pólya, was motivated by the problem of enumerating structural isomers.
² This procedure is always possible for a finite set. For an infinite set, it may not be possible, in which case the set is said to be uncountable.


In statistical physics, we usually deal with very large numbers of
objects. In this case, we can use Stirling's approximation for factorials,³

$$n! \approx \sqrt{2\pi n}\,(n/e)^n \qquad (2.2.1)$$
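The accuracy claimed in the footnote is easy to check numerically; a quick sketch in Python (the chosen values of n are arbitrary):

# Sketch: ratio of Stirling's approximation to the exact factorial.
from math import factorial, sqrt, pi, e

for n in (5, 10, 50, 100):
    stirling = sqrt(2 * pi * n) * (n / e)**n
    print(n, stirling / factorial(n))   # ratio approaches 1 from below

The ratio is already about 0.983 at n = 5 and about 0.999 at n = 100.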

2.3 Combinations

In the preceding section, we considered the objects and their arrangements to be distinguishable. In other
words, we considered an ordered list of k objects picked from a group of n. Now instead we consider the case
when the arrangement of the objects does not matter. Since there are k! arrangements of a list of k objects,
but we now consider them the same list, we must divide the result of the previous section by k! (this is the
second rule of counting). In other words, we say that the number of ways of choosing k objects out of n
without replacement and irrespective of order is

 
$$\binom{n}{k} := \frac{n!}{(n-k)!\,k!}. \qquad (2.3.1)$$

Here, “irrespective of order” means that the permutations of the k objects are not regarded as different.

2.4 Distinguishable vs Indistinguishable Objects

In the previous two sections, we counted arrangements of objects chosen from a set “without replacement”
(or without repetition), meaning that each of the n objects could be chosen only once (as when dealing
from a deck of cards). In some instances we are interested in the possibility that the n elements can occur
repeatedly in a “lineup”. We did precisely this when we counted the number of binary strings of length n.
We can generalize to n-strings with an m-character alphabet, or equivalently n distinguishable balls thrown
into $m$ distinguishable bins. By the first rule of counting, there are $m^n$ ways to throw $n$ distinguishable balls
into $m$ distinguishable bins, because for each of the $n$ balls, we have $m$ choices of bins to throw them into.
Suppose now the balls are indistinguishable. We want to know how many arrangements there are for
$n$ indistinguishable balls in $m$ bins; the bins are distinguishable.⁴ This counting problem arises in many
practical instances, including in quantum statistical mechanics where one needs to count the arrangements
of n identical particles among m energy levels. Imagine arranging the balls in a row; we may as well visualize
them as a string of n 0s. Now these balls have to be arranged in m bins; some bins may be empty. If we had
just two bins, we would need simply to specify where the first bin ends and the second begins, which can
be done by laying down the digit 1 as the divide between the two groups of zeros (if the first bin is empty,
the 1 will be to the left of all the 0s; if the second bin is empty, it will be to the right; remember, bins are
distinguishable!) Now we extend the procedure to $m$ bins by placing $m-1$ digits 1 as the divides between
consecutive bins.⁵ Therefore, we have constructed a string of $n+m-1$ 0s and 1s, containing exactly $n$ 0s in
any position. How many such strings are there? This is a familiar problem, which we solved in the previous
section; the answer is

$$\binom{n+m-1}{n}.$$

³ This formula is quite accurate even for single digit integers!
⁴ Note that we avoided asking about throws; why?
⁵ This classic counting problem is known as "stars and bars," stars for the balls and bars for the bin walls.



2.5 Quantum Statistics


Counting in Classical and Quantum Systems

Consider two point particles and assume that they are not interacting with each other. Suppose
we know the microstates that a single particle can occupy, i.e., the single particle energy levels,
and suppose there are three such levels. How many microstates can the two-particle system be
found in? If the particles are distinguishable, there are $3^2 = 9$ possible microstates; in statistical
mechanics, considering particles as distinguishable is referred to as Boltzmann counting.
If the particles are indistinguishable, and multiple occupancy of a level is allowed, then there is
a total of $\binom{2+3-1}{2} = 6$ possible microstates; in statistical mechanics, considering particles as
indistinguishable with no occupancy restriction is referred to as Bose counting. Finally, if no two
indistinguishable particles can occupy the same level, which is referred to as Fermi counting, one
has $\binom{3}{2} = 3$ possible microstates (just choose the two levels out of three that are occupied).
Mathematically, Boltzmann, Bose, or Fermi statistics is simply a matter of changing the counting
rule. In Nature, it turns out that particles never follow Boltzmann counting. Integer spin particles
(like a hydrogen atom or a photon) follow Bose counting and are called Bosons; half-integer spin
particles (like a deuterium atom or an electron) follow Fermi counting and are called Fermions.

It is interesting to generalize the example above to arbitrary numbers of particles, $N$, and levels, $M$. In
each of the three cases, one finds $M^N$ (distinguishable), $\binom{N+M-1}{N}$ (indistinguishable with no restrictions),
and $\binom{M}{N}$ (indistinguishable with single occupancy, or "without repetition"; note that $M \geq N$ necessarily).
What happens when the number of levels (or bins) is much larger than the number of particles (or balls)?
Look at the limiting case $M \gg N$ (which implies $M \gg 1$). Then $\binom{N+M-1}{N} \approx \binom{M}{N} \approx \frac{M^N}{N!}$, so
both Bose and Fermi counting give the same approximate result, which is the Boltzmann counting divided
by $N!$. This happens because if there are many more bins than balls, most of the bins will be empty, and
a few will be singly occupied. Multiple occupancy will be extremely rare. Then, the only consequence of
indistinguishability is that the order in which we put the balls into their bins (one per bin) is irrelevant, and
thus we divide by $N!$, the number of all permutations of the balls.
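The dilute limit can be verified numerically; a minimal sketch (the particular N and the sequence of M values are arbitrary choices):

# Sketch: Boltzmann, Bose, and Fermi counts for N particles in M levels;
# in the dilute limit M >> N both quantum counts approach M^N / N!.
from math import comb, factorial

N = 3
for M in (5, 50, 500, 5000):
    bose = comb(N + M - 1, N)
    fermi = comb(M, N)
    boltz_over_Nfact = M**N / factorial(N)
    print(M, bose / boltz_over_Nfact, fermi / boltz_over_Nfact)
# both ratios tend to 1 as M grows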

2.6 The Binomial Theorem


One can use counting arguments to prove the equality between two quantities, if one can prove that both
count the same set. Proofs of this sort are called combinatorial proofs. For instance, we have shown that the
number of outcomes in a string of $n$ coin tosses is $2^n$ (multiply the possibilities at each toss, $2 \times 2 \times \cdots$, $n$
times). But imagine we have just observed one such string of $n$ tosses, and saw $k$ heads. There are just $\binom{n}{k}$
such strings, and clearly it is possible to exhaust all possibilities of head, tail combinations by having any
number of heads $0 \leq k \leq n$ in our string of $n$ tosses. This means that

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$

The equality we have just proved is a particular case of the important Binomial Theorem, which states that

$$(p+q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}. \qquad (2.6.1)$$

The proof of the general case is left as an exercise.



Example: Configurations of a gas

Consider $n$ gas molecules in a container of volume $V$, which is divided in two by an imaginary
partition with a hole (the partition may just as well be real, as long as we have no "macroscopic"
way of determining the amount of gas inside each subdivision). The figure represents three
possible microstates for the case where $n = 6$. For the purpose of this simple-minded example, a
microstate is defined by how the molecules split between the upper and lower compartment of the
box; so the microstate at left has $l = 6$. There are $\sum_{l=0}^{n} \binom{n}{l} = 2^n = 64$ microstates corresponding
to the macrostate specified by $n$ and $V$ (forgetting about energy altogether for simplicity). There
is $\binom{n}{0} = 1$ microstate represented by the leftmost figure; $\binom{n}{1} = 6$ for the middle one, and
$\binom{n}{3} = 20$ for the rightmost one. Although the configuration at right is more "typical", all of them
are possible microstates of the system. If we assume that the system spends equal time in each
microstate (ergodic hypothesis), then the configuration at left will occur for about a minute every
hour, while the one at right for about twenty minutes.
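The time estimates quoted at the end of the example follow from the equal-time assumption; a quick check in Python (the labels l follow the figure, with l the occupancy of one compartment):

# Sketch: minutes per hour spent in each microstate class for n = 6.
from math import comb

n, total = 6, 2**6                     # 64 equally likely microstates
for l in (6, 1, 3):                    # left, middle, right configurations
    print(l, comb(n, l), 60 * comb(n, l) / total, "min/hour")
# l = 6: 1/64 of an hour (about a minute); l = 3: 20/64 (about 19 min)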

2.7 Exercises
1. How many poker hands are there? How many bridge hands? How many bridge hands with two aces?
(A poker hand has 5 cards; a bridge hand has 13; and the whole deck 52.)
2. A set S has n elements. How many subsets of S have at least 3 elements?
3. How many nonnegative integer solutions does the equation x1 + x2 + x3 + x4 + x5 + x6 = 9 have?

4. Prove the binomial theorem using a combinatorial argument.


5. Prove Pascal's identity between binomial coefficients, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$, using a combinatorial
argument. Hint: count all possible teams of $k$ students that can be selected from a group of $n$ students,
who are Alice and her $n-1$ classmates.
6. (Vandermonde convolution) Prove that $\sum_{k=0}^{j} \binom{n}{k} \binom{m}{j-k} = \binom{n+m}{j}$.

7. In the example of Sect. 2.6, how many molecules would it take for them to be completely segregated
in the lower container for only one minute over the entire life of the universe? (The universe is about
15 billion years young.)


Chapter 3

Probability

3.1 Definition of Probability


Suppose we perform an experiment, possibly repeated many times. Assume we can't tell the outcome of a
particular trial in advance: in other words, the experiment in question observes a random variable $X$. Let the
"probability space" $\chi$ be the set of all values that the random variable $X$ can take, i.e., the set of all possible
outcomes. For instance, our random variable could be the face of a coin in a coin toss; then $\chi = \{\mathrm{head}, \mathrm{tail}\}$.
For the moment, we assume $\chi$ is finite, hence countable: $\chi = \{x_i\}$, $i = 1, \ldots, N$. A subset of $\chi$ is called an
"event." A probability mass function $p$ assigns to each outcome a real number. It is subject to three defining
axioms:
i. The probability of an outcome is between zero and one: $0 \leq p(x_i) \leq 1$ for all $x_i \in \chi$.
ii. The probability is "normalized" to one, meaning that

$$\sum_{i=1}^{N} p(x_i) = 1.$$

iii. The probability of a union of disjoint (mutually exclusive) events is the sum of the probabilities of the
individual events.
In an experiment, we can sometimes treat the outcome as a continuous variable (e.g., the position of
an object on a line). Then, $\chi$ is not a countable collection and so we associate a (positive semidefinite)
probability density function (pdf) $p(x)$ with outcomes distributed in the (infinitesimal) interval $(x, x+dx)$
by equating their probability to $p(x)dx$. In this case, the axioms still apply with the following changes:
$p(x) \geq 0$ (but not necessarily less than or equal to one¹) and the sums in the axioms above become integrals (e.g.,
axiom ii becomes $\int_{-\infty}^{\infty} dx\, p(x) = 1$).
To develop an intuition for the meaning of the probability axioms it is useful to picture a planar board
of arbitrary shape and finite mass. We take the total mass of the board to be the unit mass (axiom 2). The
board could be infinitely large (in which case we know the areal mass density must be infinitesimally small
over most of the board so that the mass can be unity). An outcome is a piece of the board. The probability
of the outcome is the mass of that piece of the board. Therefore, its mass is between nothing and the mass
of the whole board (axiom 1). If we break a section of the board into several pieces, their masses add up to
the mass of the original section (axiom 3).
In general, there are two conceptual steps involved in the statistical description of the physical world. The
first is to find the appropriate correspondence between χ and a set of numbers (this is the identification of
the random variable); the second is to prescribe the correct form of p(x) and predict the physical properties
of the system in question (if you are an experimentalist, these are the very practical steps of deciding what
to measure, and then doing it). Statistical mechanics gives us the recipe for the second step. The first step is
the selection of a model, and is guided largely by physical intuition and symmetry considerations. Of course,
it is also possible to travel the path in reverse, and use an experiment to infer if the probability distribution
was assigned correctly, as we discuss in the next section.
¹ It is still true that the probability of any event must be less than or equal to one!


3.2 Conditional Probability and Bayesian Inference

The probability distribution contains all information available about the random variable X. Therefore,
the probability distribution itself depends on the information available, and must be updated when new
information becomes available. The process of updating knowledge after making an observation is called
Bayesian inference, and rests on the concept of conditional probability: the probability of B given that A is
true (indicated by p(B|A)) is given by the probability that A and B are true, normalized by the probability
of A, which is written formally as

$$p(B|A) = \frac{p(B \cap A)}{p(A)}.$$

In terms of our board analogy, imagine drawing two closed figures, A and B, on the board. Then, p(A) is the
mass of the piece of board that we would get if we cut off figure A; remember the unit of mass is the mass
of the whole board. Now, figures A and B could intersect. Then, p(B ∩ A) is the mass of the intersection,
and p(B|A) is the mass of the intersection, expressed in units of the mass of figure A. In other words, we
forget about the original board and restrict our interest to figure A. In probability theory, we say that we
are “conditioning on A.” A useful theorem due to Bayes states that

$$p(B|A) = p(A|B)\,\frac{p(B)}{p(A)}.$$

Example: Unfair coins

1. (Updating knowledge) Suppose we are given a bag with two coins. One is fair, and one has two
heads. Without looking, we reach for a coin and toss it. What is the probability that we observe
tail? This kind of problem can easily be solved by drawing a tree, where the branches exiting from
each node represent a possible outcome and are labeled with the corresponding probability. The
probability that we observe tail is 1/4, since there is a probability 1/2 of having picked the fair coin,
and for this, the probability of tail is 1/2. Suppose we do observe tail, and toss the same coin again.
What is the probability that we observe tail at the second toss? Clearly, we know after the first toss
that we must be holding the fair coin, so if we toss it again, we have a 1/2 probability of getting a
tail. However, consider this problem (same bag as before): without looking, we reach for a coin and
toss it twice. What is the probability of observing tail at the second toss? Here, we only look after
the second toss. Then, the probability of tail is still 1/4, since there has been no update of our prior
knowledge.
2. (Bayes’ theorem). Suppose we are given a bag with ten coins. Eight are fair, and two have two
heads. Without looking, we reach for a coin, flip it, and observe head. What is the probability that the
coin is fair? This is a typical application of Bayes’ theorem. Let us call p(F ), p(U ) the probabilities
that the coin is fair and unfair, respectively; and p(H), p(T ) the probabilities of observing head and
tail, respectively. Bayes theorem states that

$$p(F|H) = p(H|F)\,\frac{p(F)}{p(H)}.$$

The numerator terms are easily found: $p(H|F)p(F) = (1/2)(4/5) = 2/5$. To find the denominator,
note that $p(H) = p(H \cap F) + p(H \cap U) = p(H|F)p(F) + p(H|U)p(U)$. The first equality follows
from axiom iii and the second from the definition of conditional probability. But $p(H|F) = 1/2$ and
$p(H|U) = 1$, so $p(F|H) = \frac{2/5}{2/5 + 1 \times 1/5} = 2/3$.
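A Monte Carlo simulation offers an independent check of this result; a sketch (the sample size and seed are arbitrary choices) that draws a coin from the bag, flips it, and tabulates how often the coin turns out to be fair among the flips that came up heads:

# Sketch: Monte Carlo check that p(F|H) = 2/3 for the ten-coin bag.
import random

random.seed(0)
fair_and_head = heads = 0
for _ in range(100_000):
    fair = random.random() < 0.8            # 8 of the 10 coins are fair
    head = (random.random() < 0.5) if fair else True
    if head:
        heads += 1
        fair_and_head += fair
print(fair_and_head / heads)                # close to 2/3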



3.3 Moments of Probability Distributions: Expectation and Variance

Two properties of a random variable are particularly noteworthy: the expectation (or mean) and the variance.
The expectation of $X$ is defined as

$$E(X) = \begin{cases} \sum_{i=1}^{N} x_i\, p(x_i), & X \text{ discrete;} \\ \int dx\; x\, p(x), & X \text{ continuous.} \end{cases} \qquad (3.3.1)$$

It is the average of all values of $x$ weighted by their probability. Thus, if we run a long sequence of independent
experiments that measure the random variable $X$ (i.e., each experiment samples $X$) and take the long time
average, the average value will converge to the expectation of $X$. This argument can be formalized by the
central limit theorem, as we will see later. The expectation value is sometimes indicated by the bracket
notation $\langle x \rangle_p$ (where the subscript denotes the probability distribution that is being sampled) or simply by
$\langle x \rangle$ when the distribution is unambiguously understood. A measure of the spread of the distribution $p(x)$
about the average is given by the expectation of the square of the distances between $x$ and $\langle x \rangle$, which is the
variance

$$\mathrm{Var}(X) = \begin{cases} \sum_{i=1}^{N} x_i^2\, p(x_i) - \langle x \rangle^2, & X \text{ discrete;} \\ \int dx\; x^2\, p(x) - \langle x \rangle^2, & X \text{ continuous.} \end{cases} \qquad (3.3.2)$$

The standard deviation is $\sigma = \sqrt{\mathrm{Var}(x)}$ and has the units of $x$. The quantity $\langle x^n \rangle$ is called the $n$-th moment
of $x$.

3.4 Joint Probability Distributions, Independence, and Covariance

Our definitions and results extend easily to the case when we are dealing with multiple random variables at
once (as, for example, if measuring the velocity of a particle in 3D space, or the position of two molecules
in a liquid, etc.). Suppose our outcome is specified by a pair of numbers $(x, y)$; then we define the joint
probability distribution $p(x, y)$, normalized to $\int dx\,dy\; p(x, y) = 1$, as the probability of the outcome $(x, y)$.
The random variables are independent if and only if the joint probability factorizes into the product of the
probabilities of the single variables, $p(x_1, x_2, \ldots, x_n) = p_1(x_1)\,p_2(x_2) \cdots p_n(x_n)$. In addition to the means
and variances of $x$ and $y$, defined as before (e.g., $\langle y \rangle = \int dx\,dy\; y\,p(x, y)$), one defines the covariance of $x$ and
$y$ as

$$\mathrm{Cov}(x, y) = \int dx\,dy\; xy\,p(x, y) - \langle x \rangle \langle y \rangle.$$

The covariance is often called correlation function. If two variables are independent, then $p(x, y) = p(x)p(y)$,
so $\mathrm{Cov}(x, y) = 0$, but the converse is not necessarily true. If $\mathrm{Cov}(x, y) = 0$, then $x$ and $y$ are said to be
linearly uncorrelated or linearly independent.

3.5 Binomial Distribution


The task of computing probabilities is one of counting the ratio of favorable outcomes to all possible outcomes.
The most important counting formula in classical statistical mechanics concerns binary random variables,
i.e., a variable that can take on two values, say x = 1 with probability q or x = 0 with probability (1 − q).
This random process is called a Bernoulli process. A simple example of a Bernoulli process is a coin toss. A
binomial random variable counts the number of successes in $n$ independent Bernoulli trials. The probability
of $k$ successes in $n$ trials is given by the binomial distribution

$$P_{n,q}(k) = \binom{n}{k} q^k (1-q)^{n-k}.$$

This result follows simply from the assumption that the trials are independent, so that the joint probability
for the $n$ trials factorizes, and from applying axiom iii after counting the number of different (nonoverlapping)
ways to get the desired number of successes. It is easy to see that $P_{n,q}(k)$ is normalized from the binomial
expansion of $1^n = [q + (1-q)]^n = \sum_{k=0}^{n} P_{n,q}(k)$, and that $E(k) = nq$ and $\mathrm{Var}(k) = nq(1-q)$. The binomial
distribution models, for example, a random walk, i.e., a process where a walker takes a sequence of $n$
independent random steps, each step to the right with probability $q$ or to the left with probability $(1-q)$
(a fair coin or an unbiased walk corresponds to $q = 0.5$). Random walks can in turn be used to model many
interesting physical systems, from polymers to diffusion processes.
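The moment formulas $E(k) = nq$ and $\mathrm{Var}(k) = nq(1-q)$ (derived in the exercises) can be checked by direct sampling; a sketch, with arbitrary $n$, $q$, and sample size:

# Sketch: sample mean and variance of k successes in n Bernoulli trials,
# compared with the exact nq and nq(1 - q).
import random

random.seed(1)
n, q, trials = 100, 0.3, 20_000
ks = [sum(random.random() < q for _ in range(n)) for _ in range(trials)]
mean = sum(ks) / trials
var = sum((k - mean)**2 for k in ks) / trials
print(mean, n * q)            # both ~ 30
print(var, n * q * (1 - q))   # both ~ 21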

3.6 Binomial Distribution for Large n: Gaussian and Poisson Distributions

It is interesting to study the limit of the binomial distribution for $n \to \infty$. First consider the case where the
probability of success is $q \neq 0$. We expect to see $\langle k \rangle = nq$ successes, which is a large number since we assume
that $q$ is finite; moreover, the typical spread of the distribution is $\sigma = \sqrt{nq(1-q)}$, so $\frac{\sigma}{\langle k \rangle} \approx \frac{1}{\sqrt{n}} \to 0$ and
we expect a strongly peaked distribution around $\langle k \rangle$. Treating $k$ as a continuous variable $x$, with $\mu := nq$, and
using Stirling's formula, one finds (the proof is left as an exercise) that in the limit of large $n$ and finite $q$,
the binomial distribution approaches the celebrated Gaussian distribution,

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

The Gaussian distribution has expectation $\mu$ and standard deviation $\sigma$. It can always be reduced to standard,
or normal, form, $\mu = 0$ and $\sigma = 1$, with an appropriate change of variable.
Next consider the case in which the number of trials $n \to \infty$, but the probability of success is infinitesimally
small, so that $nq := \lambda$ stays constant as $n \to \infty$. This situation models a lot of real life problems
involving random "rare events" with constant rate, such as unstable isotope decays. Under the stated
hypothesis, we can write

$$P_{n,q}(k) = \binom{n}{k} q^k (1-q)^{n-k} = \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n} \frac{n!}{(n-k)!\,n^k} \left(1 - \frac{\lambda}{n}\right)^{-k} \approx \frac{\lambda^k}{k!}\, e^{-\lambda} := P_\lambda(k), \quad n \to \infty.$$

This is the Poisson distribution. It has expectation $\langle k \rangle = \lambda$ and variance $\langle k^2 \rangle - \langle k \rangle^2 = \lambda$.

3.7 Uniform Distribution and Cumulative Distribution Function

A very important probability distribution is the uniform distribution. For a discrete random variable, the
uniform probability distribution is simply $p_i = \mathrm{const} = N^{-1}$, where $N$ is the cardinality (the size) of the
probability space. The classical example is the fair die. The concept of uniform probability distribution
extends naturally to continuous distributions. A continuous random variable distributed according to the
pdf²

$$p(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases} \qquad (3.7.1)$$

is called a uniformly distributed random variable in the interval $[a, b]$. With appropriate shift and rescaling
of the units, we can always reduce to the case $0 \leq x \leq 1$.
² For a continuous random variable, it makes no difference whether we define the pdf on a closed or open interval.



Computer generated (pseudo)random numbers are distributed uniformly in the interval $[0, 1)$. Other
distributions can be conveniently sampled from the uniform distribution with the following trick. Consider
the random variable $X$ with arbitrary pdf $p(x)$, where $-\infty \leq x \leq \infty$ without loss of generality. The
cumulative distribution function (CDF) of $X$ is defined as the probability that $X \leq x$, or

$$F(x) = \int_{-\infty}^{x} dx'\, p(x'), \qquad (3.7.2)$$

so that $p(x) = \frac{dF(x)}{dx}$. Put $y = F(x)$; $y$ is clearly a random variable; it depends on $x$ in such a way that
if $x_1 \leq x \leq x_2$, then $F(x_1) \leq y \leq F(x_2)$ because $p(x)$ is nonnegative. We assert that $y$ is uniformly
distributed in $[0, 1]$. This follows from the fact that $g(y)dy = p(x)dx$, but since $\frac{dy}{dx} = p(x)$, we must have
$g(y) = 1$. Therefore, we can pick a random number $y$ uniformly distributed in $[0, 1)$ and be guaranteed that
$x = F^{-1}(y)$ will be distributed according to $p(x)$. Here $F^{-1}$, the inverse function of $F$, is accessible analytically
or numerically.
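As a concrete instance of the trick, take the exponential pdf $p(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, whose CDF $F(x) = 1 - e^{-\lambda x}$ inverts in closed form to $F^{-1}(y) = -\ln(1-y)/\lambda$. A sketch (the value of $\lambda$ and the sample size are arbitrary):

# Sketch: inverse-CDF sampling of p(x) = lam * exp(-lam * x), x >= 0.
# F(x) = 1 - exp(-lam x)  =>  F^{-1}(y) = -ln(1 - y) / lam.
import random
from math import log

random.seed(2)
lam = 2.0
xs = [-log(1.0 - random.random()) / lam for _ in range(100_000)]
print(sum(xs) / len(xs))   # sample mean, ~ 1/lam = 0.5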

3.8 Distribution of a Function of Random Variables

We will often encounter the case where we need to find the distribution of a function of random variables
with known joint distribution. The systematic way to solve this problem is to enforce the constraint with the
aid of a Dirac delta function (see example in the box below). In many cases, one can equivalently carry out
the task by integrating the joint probability of the random variables over the portion of the probability space
where the constraint is known to be satisfied. An example is the important problem of the distribution of the
sum of random variables. Consider two independent random variables, $(x, y)$, each uniformly distributed in
$[0, 1]$. The joint pdf is of course the unit constant over the square domain $[0 \leq x \leq 1] \times [0 \leq y \leq 1]$. What
is the probability distribution of $z = x + y$? That is, what is the probability that $x, y$ add up to a given
number $z$? Clearly, neither variable can be greater than $z$, so we know, for example, that $0 \leq x \leq z$ if $z \leq 1$,
and $z - 1 \leq x \leq 1$ if $z > 1$; either way, we have a choice for $x$. However, for a given $x$, $y$ has to be exactly
$z - x$, so there is no choice for $y$ and we can express $p(z)$ as a single integral in $x$:

$$p_Z(z) = \begin{cases} \int_0^z dx = z, & 0 \leq z \leq 1 \\ \int_{z-1}^1 dx = 2 - z, & 1 \leq z \leq 2 \\ 0 & \text{otherwise.} \end{cases} \qquad (3.8.1)$$

Note that the same result can be arrived at with the use of the Dirac $\delta$. Let $u(t) = 1$ for $0 \leq t \leq 1$ and $u(t) = 0$
otherwise be the uniform pdf.³ Then

$$p_Z(z) = \iint dx\,dy\; u(x)\,u(y)\,\delta(x + y - z) = \int_0^1 dx\; u(x)\,u(z - x), \qquad (3.8.2)$$

which evaluates to the answer in Eq. 3.8.1. Equation 3.8.2 is interesting because it shows explicitly the
general fact that given two independent random variables, the pdf of their sum is the convolution of their
pdfs.

³ A more compact notation is $u(t) = \theta(t)\,\theta(1-t)$, where $\theta(t)$ is the Heaviside step function.



Pair correlation function

What is the probability distribution of the distance between two particular molecules in a liquid?
Suppose we know the joint distribution of $\vec{r}_1, \vec{r}_2$, namely $p_{12}(\vec{r}_1, \vec{r}_2)$; we are looking for the
distribution of $r = \sqrt{(\vec{r}_1 - \vec{r}_2)^2}$. Clearly,

$$p(r) := \mathrm{Prob}\!\left(r = \sqrt{(\vec{r}_1 - \vec{r}_2)^2}\right) = \iint d^3r_1\, d^3r_2\; p_{12}(\vec{r}_1, \vec{r}_2)\,\delta\!\left(r - \sqrt{(\vec{r}_1 - \vec{r}_2)^2}\right),$$

where $\delta$ is the Dirac delta function. Recall that $\delta(f(x))$ has support where $f(x) = 0$. Therefore, it
enforces the constraint $r = \sqrt{(\vec{r}_1 - \vec{r}_2)^2}$, while the integral over the joint distribution measures the
size of the region of probability space where the constraint is obeyed. Imagine particle 1 is at the
origin; then in a uniform medium, the number of particles in a shell between $r$ and $r + dr$ increases
proportionately to the volume of the shell, which is $4\pi r^2 dr$. Therefore, the information about the
inhomogeneity, or structure, of the liquid is contained in the pair correlation function $g(r)$ defined as

$$4\pi r^2 \rho\, g(r) = \frac{1}{N} \left\langle \sum_i \sum_{j \neq i} \delta\!\left(r - \sqrt{(\vec{r}_i - \vec{r}_j)^2}\right) \right\rangle \qquad (3.8.3)$$

where $\rho = N/V$. With this normalization, $g(r) \to 1$ as $r \to \infty$. However, at small distances, the plot
of $g(r)$ has oscillations revealing the existence of a hard core and of an average coordination shell. In
numerical simulations, the evaluation of integrals with a $\delta$ function like the one above can be carried
out by discretization, and in that way they are reduced to the process of creating a histogram.
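The histogram procedure just mentioned can be sketched in a few lines of NumPy. This is an illustration only: the particle number, box size, and bin choices are arbitrary, and uniformly random positions stand in for a real liquid, so here $g(r) \approx 1$ at all distances rather than showing liquid structure.

# Sketch: g(r) by histogramming pair distances in a periodic cubic box;
# ideal-gas (uncorrelated) positions should give g(r) ~ 1.
import numpy as np

rng = np.random.default_rng(3)
N, L = 200, 10.0
pos = rng.uniform(0.0, L, size=(N, 3))
d = pos[:, None, :] - pos[None, :, :]
d -= L * np.round(d / L)                        # minimum-image convention
r = np.sqrt((d**2).sum(-1))[np.triu_indices(N, k=1)]
hist, edges = np.histogram(r, bins=25, range=(0.5, L / 2))
shell = 4 * np.pi * (0.5 * (edges[:-1] + edges[1:]))**2 * np.diff(edges)
g = hist / (0.5 * N * (N - 1) * shell / L**3)   # pairs per shell / ideal
print(g.round(2))                               # ~ 1 within noise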

3.9 Characteristic Function and Central Limit Theorem

The characteristic function of a probability distribution is defined as the expectation value of the imaginary
exponential

$$\phi(k) = \langle e^{ikx} \rangle = \int dx\, p(x)\, e^{ikx}. \qquad (3.9.1)$$

It has the property that, if the moments of $p$ exist, they can be generated as coefficients of the Taylor
expansion of $\phi$ at the origin:

$$\langle x^n \rangle = (-i)^n \left.\frac{d^n \phi(k)}{dk^n}\right|_{k=0}, \qquad (3.9.2)$$

as can be seen from Eq. 3.9.1 by differentiating under the integral.⁴

⁴ This may not always be allowed; in fact, the moments of a distribution may not always exist, but the integral in Eq. 3.9.1 always exists, and so does the characteristic function.



Fourier Transform

The characteristic function is essentially the Fourier transform of the probability distribution. The
Fourier transform of a function $f(x)$ is defined as

$$\hat{f}(k) = \int dx\, f(x)\, e^{-ikx}, \qquad (3.9.3)$$

so that we see that $\phi(k) = \hat{p}(-k)$. The inverse Fourier transform is

$$f(x) = \int \frac{dk}{2\pi}\, \hat{f}(k)\, e^{ikx}. \qquad (3.9.4)$$

As an example, it is easy to show, by completing the squares, that the Fourier transform of a Gaussian
function is itself a Gaussian (but it is not normalized):

$$\frac{1}{\sqrt{2\pi\sigma^2}} \int dx\, e^{-ikx} \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right) = e^{-ikx_0} \exp\left(\frac{-k^2\sigma^2}{2}\right). \qquad (3.9.5)$$

The width of the Fourier transform is the inverse of the width of the original Gaussian. This general
feature of Fourier transforms, that a broad function has a narrow transform and vice versa, is
intimately related to the Heisenberg uncertainty principle of quantum mechanics.

The characteristic function is important in statistical mechanics as it is often used to calculate correlation
functions. Here, we use it to demonstrate a fundamental result of statistics, the Central Limit Theorem.
Suppose we measure the random variable $X$, having probability distribution $p_X(x)$, $n$ times; that is, we
build a sequence $\{x_i, i = 1, \ldots, n\}$ of independent, identically distributed random variables all drawn from
the distribution $p_X(x)$. We assume the moments of $X$ exist and put $\mu := \langle x \rangle$, $\sigma^2 := \langle x^2 \rangle - \mu^2$. Let
$y = \frac{1}{n} \sum_{i=1}^{n} x_i$ be the average of the measurements, which is a new random variable, a function of the $\{x_i\}$.
Under these conditions, the central limit theorem states the remarkable fact that the probability distribution
$p_Y(y)$ approaches a Gaussian distribution with mean $\mu$ and variance $\tau^2 = \sigma^2/n$ as $n \to \infty$, regardless of
the actual shape of the distribution $p_X(x)$.
To show how this happens, consider the characteristic function $\phi_X$ of the distribution $p_X$ and the characteristic
function $\phi_Y$ of the distribution $p_Y$. By definition,

$$\phi_Y(k) = \langle e^{iky} \rangle = \int dx_1 \ldots dx_n\, p(x_1) \cdots p(x_n) \exp\left(\frac{ik \sum_i x_i}{n}\right) = \prod_{i=1}^{n} \int dx_i\, p_X(x_i) \exp\left(\frac{ikx_i}{n}\right) = \left[\phi_X\!\left(\frac{k}{n}\right)\right]^n. \qquad (3.9.6)$$

Now, note that in the limit $n \to \infty$, the argument of $\phi_X(k/n)$ goes to zero, suggesting an approximation by
Taylor expansion:

$$\left[\phi_X\!\left(\frac{k}{n}\right)\right]^n = \left[\phi_X(0) + \frac{k}{n} \left.\frac{d\phi(k)}{dk}\right|_{k=0} + \frac{k^2}{2n^2} \left.\frac{d^2\phi(k)}{dk^2}\right|_{k=0} + \ldots\right]^n = \left[1 + i\frac{k}{n}\mu - \frac{k^2}{2n}\tau^2 + O(1/n^2)\right]^n.$$

Recalling that

$$\lim_{n \to \infty} \left(1 + \frac{t}{n}\right)^n = e^t,$$

one finds

$$\phi_Y(k) = e^{ik\mu} \exp\left(-\frac{k^2 \tau^2}{2}\right); \qquad (3.9.7)$$

by taking the inverse Fourier transform, one arrives at the desired result

$$p_Y(y) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left(-\frac{(y - \mu)^2}{2\tau^2}\right). \qquad (3.9.8)$$
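The convergence asserted by the theorem can be observed directly; a numerical sketch, taking the parent distribution to be uniform on $[0, 1)$ (for which $\mu = 1/2$ and $\sigma^2 = 1/12$), with arbitrary $n$ and sample size:

# Sketch: averages of n uniform [0,1) variables; the CLT predicts
# mean mu = 1/2 and variance tau^2 = sigma^2/n = 1/(12 n).
import random
from statistics import mean, pvariance

random.seed(4)
n, trials = 50, 20_000
ys = [mean(random.random() for _ in range(n)) for _ in range(trials)]
print(mean(ys))                      # ~ 0.5
print(pvariance(ys), 1 / (12 * n))   # both ~ 0.00167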



3.10 Exercises
1. Maxwell velocity distribution. We will prove later on that the velocity $\vec{v}$ of a molecule in a classical
real gas follows a Gaussian probability distribution (Maxwell distribution)

$$p(\vec{v}) = \frac{1}{(2\pi kT/m)^{3/2}}\, e^{-\frac{m\vec{v}^2}{2kT}}.$$

Using this information, find the average velocity, the standard deviation (also called rms velocity), and
the correlation function $\langle v_x v_z \rangle$. What property of the Maxwell distribution is responsible for the value
of the correlation function just found?
2. Show that for the binomial distribution $P_{n,q}(k)$, $E(k) = nq$ and $\mathrm{Var}(k) = nq(1-q)$.
3. How many configurations are there for a substitutional alloy AB, with $N$ total atoms and $N_A$ atoms
of type A?
4. What is the probability that an unbiased random walker takes 10 steps and lands 2 steps away from
the starting point? 5 steps away?
5. Let x, y be uniform independent random variables in [0, 1], and z = x + y. In this problem, you will
arrive at result 3.8.1 by a different route (so do not assume the result of 3.8.1 or 3.8.2!). Calculate the
probability that x + y ≤ z (you may do an integral, but it is easier to use elementary geometry). Since
this is the CDF of z, differentiate to obtain the pdf and verify Eq. 3.8.1.
6. Prove that the characteristic function of a probability distribution always exists. Then, calculate the
characteristic function of the Cauchy pdf (aka Lorentzian curve), $p(x) = \frac{\gamma}{\pi[(x - x_0)^2 + \gamma^2]}$, and show that
the second moment does not exist, both by direct calculation and by the characteristic function route.
Lorentzians are ubiquitous in spectroscopy.


Chapter 4

Entropy

4.1 Entropy of a Random Process


While the variance is sometimes used as a measure of the spread of a distribution,¹ another way to characterize
a random process is by its entropy,²

$$S(X) = -\sum_i p(x_i) \ln p(x_i). \qquad (4.1.1)$$

For example, consider a six sided die. If the die is fair, the probability that the die lands on each face is
the same (1/6). Then, $S = \ln 6$, the logarithm of the number of possible outcomes. Suppose now the die
is rigged, so that it almost always lands on 3 or 4; then $S \approx \ln 2 < \ln 6$. We suspect that the entropy gets
bigger, the flatter the probability, and that it is maximum when all outcomes are equally likely. This fact
will be proven later. Note that when there is only one outcome (so the variable $X$ is not random at all, but
is instead deterministic), entropy attains its minimum possible value, $S = 0$.
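Equation 4.1.1 is immediate to evaluate for the two dice just discussed; a sketch (the rigged probabilities used here are an arbitrary illustration):

# Sketch: entropy of a fair die vs a die rigged to land mostly on 3 or 4.
from math import log

def entropy(p):
    return -sum(pi * log(pi) for pi in p if pi > 0)

fair = [1/6] * 6
rigged = [0.01, 0.01, 0.48, 0.48, 0.01, 0.01]
print(entropy(fair), log(6))    # both ~ 1.792
print(entropy(rigged), log(2))  # ~ 0.89 vs ln 2 ~ 0.693

As the rigging sharpens (the stray 0.01 weights go to zero), the second value approaches ln 2, as stated above.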

4.2 Boltzmann’s Entropy Formula


The connection between classical thermodynamics and statistical mechanics was first postulated by Ludwig
Boltzmann, who asserted that the thermodynamic entropy of a system at equilibrium is proportional to the
natural logarithm of the number of microstates, $W$, corresponding to the system's macroscopic state.³ Since
the macroscopic equilibrium state of an isolated one-component, one-phase system of N particles is specified
by its total energy and by the volume, W can depend on E, V , and N only. Another way to think of this is
that V fixes the system’s boundaries, and within these boundaries, E and N are conserved quantities if the
system is isolated. At equilibrium, the macroscopic state of the system does not change, and neither do V ,
E, and N . Thus we must allow W to depend on these quantities. Then

S(E, V, N ) = k ln W, (4.2.1)

where the constant k (sometimes written as kB ) is called the Boltzmann constant. Comparing Boltzmann’s
entropy formula with Eq. 4.1.1 and recalling that microstates are drawn from a uniform pdf, we see that
there is agreement up to the proportionality constant k: entropy quantifies the information we stand to gain
about a thermodynamic system if we actually decide to measure which microstate the system is occupying.
The logarithmic dependence of entropy follows from the requirement that the entropy of two isolated
systems must be additive. Consider two isolated systems, A and B, with number of microstates WA and WB ,
1 More precisely, one must look for a nondimensional number, such as the ratio of mean to standard deviation. Also note that

when a distribution deviates substantially from normal (Gaussian), mean and variance may no longer be the most meaningful
descriptors of tendency.
2 This definition is used in information theory, where entropy is often referred to as “Shannon entropy” and denoted by H;

the logarithm is then taken in base 2.


3 W is sometimes called the multiplicity of the system.


respectively. Since the two systems do not interact, the number of microstates of the combined system is
WA WB by the first rule of counting. Then the entropy is k ln WA WB = SA + SB . It can be shown that the
logarithm is the only function that satisfies this property. From Boltzmann’s formula, it also follows that
S ≥ 0.

4.3 Discrete Phase Space: the Entropy of a Lattice Gas


A lattice gas is a highly idealized model of a gas, where each particle’s coordinate belongs to a set of discrete
values (e.g., $\mathbb{Z}^3$ for a cubic lattice); since coordinates are not differentiable, momenta are not defined. While
dynamic laws could be prescribed, the usual lattice model lacks dynamics altogether. An ideal lattice gas is
one where two particles on different lattice sites do not interact with each other. However, the lattice gas is
defined with the constraint that no two particles can occupy the same site. Therefore, the lattice gas is not
a model of an ideal gas, not just because it omits kinetic energy, but more importantly because it includes
excluded volume effects (hard core repulsion between particles). How do we calculate the entropy of a lattice
gas of N particles? We must calculate all possible configurations of the N particles on the lattice. If the
lattice has $M$ sites, obviously with $M \ge N$, there are $W = \binom{M}{N}$ ways of putting the particles on the lattice, and the entropy $S = k \ln W$ takes on a particularly simple form when expressed in terms of the fraction of occupied sites, $\phi := N/M$, $0 \le \phi \le 1$. Using Stirling's approximation, one finds
 
$$S = -kM\left[\phi \ln \phi + (1 - \phi)\ln(1 - \phi)\right]. \tag{4.3.1}$$

Not surprisingly, this is the entropy of $M$ independent Bernoulli variables.
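The accuracy of this approximation is easy to check numerically; the sketch below compares the exact $k\ln\binom{M}{N}$ (with $k=1$) against Eq. 4.3.1 at quarter filling, using math.lgamma to evaluate $\ln n!$ without overflow:

```python
import math

def S_exact(M, N):
    # ln C(M, N), computed via lgamma to avoid huge factorials.
    return math.lgamma(M + 1) - math.lgamma(N + 1) - math.lgamma(M - N + 1)

def S_stirling(M, N):
    # Eq. 4.3.1 in terms of the filling fraction phi = N/M.
    phi = N / M
    return -M * (phi * math.log(phi) + (1 - phi) * math.log(1 - phi))

for M in (100, 10_000, 1_000_000):
    N = M // 4  # quarter filling
    print(M, S_exact(M, N), S_stirling(M, N))  # relative error shrinks with M
```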


Two-level system

The two-level system (tls) describes a particle that can occupy either one of two energy levels, say a ground state with energy $E_0 = 0$ and an excited state with energy $E_1 = \epsilon$. Consider $N$ such particles with total energy $E$. Assume they are distinguishable and noninteracting, so that each of them can be treated as a tls independent of the others. What is the entropy? To calculate $S$, we need to compute $W$, the number of ways to put the $N$ particles in the two energy states. If energy did not matter, the answer would be $2^N$, but we know that the total energy is $E$. Therefore, there are $n(E) = E/\epsilon$ particles in the excited state, $W = \binom{N}{n(E)}$, and $S(E, N) = k \ln \binom{N}{n(E)}$. Note that entropy does not depend on volume in this simple model.

4.4 Continuum Phase Space: the Entropy of the Classical Ideal Gas
Consider an ideal monoatomic gas, that is, N noninteracting point particles confined to a volume V with
total energy E. How do we calculate entropy in the case where the microstate is specified by continuous
variables? We must calculate how many ways we have to assign positions and momenta to the N particles,
compatibly with the constraints. The degrees of freedom are 3N coordinates and 3N momenta, spanning a
(6N − 1)-dimensional space because of the constraint of constant energy,
$$\sum_i \vec{p}_i^{\,2}/2m = E.$$

Each possible microstate corresponds to a point in this space. What does phase space look like? Imagine for simplicity you just have two particles on a line segment of length $L$. Then phase space is spanned by the variables $x_1, x_2, p_1, p_2$, with $p_1^2 + p_2^2 = 2mE$. Thus, the pair of coordinates can be chosen anywhere inside a rectangle of area $L^2$ and the pair of momenta can be chosen anywhere on the circumference of a circle of radius $\sqrt{2mE}$. In the general case, coordinates span a $3N$-dimensional hypercube, of volume $V^N$, and



momenta span the surface of a $3N$-dimensional hypersphere of area4

$$S_{3N} = \frac{2\pi^{3N/2}}{\Gamma(3N/2)}\,(2mE)^{(3N-1)/2}.$$

Two important observations are in order before proceeding. First, there is a symmetry of phase space
that we have overlooked in our counting argument. Consider the case where we have two particles. Pick a
point in phase space and carry out the following operation: reflect the coordinates and the momenta about
the x1 = x2 and p1 = p2 axes. Clearly, this means switching positions and momenta of the two particles.
But if the masses are the same (the particles are identical), then we have simply switched the labels of the
two particles, yielding a physical configuration that no measurement could distinguish from the previous
one. This means that we have double counted the number of available configurations and so we must divide
the phase space volume by 2!, or N ! in the general case of N identical particles. Note that in the case of the
lattice gas, the binomial distribution formula does this for us automatically.
Second, knowing the volume of available phase space still doesn’t solve our problem. After all, entropy
was defined through the logarithm of a number, whereas the volume of phase space has dimensions [Lp]3N .
This indicates that we are missing a factor with those dimensions and suggests that we subdivide phase
space into elementary cells of volume (∆x∆p)3N . The multiplicity of the system is given simply by the
number of phase space cells available to the system, divided by N !. In classical physics, there is no criterion
to fix the value of the elementary phase space cell volume. In quantum mechanics, Heisenberg’s uncertainty
principle tells us that it is impossible to determine position and momentum of a particle simultaneously
with arbitrary accuracy; therefore phase space must be coarse grained. It turns out that the volume of
a phase space cell is $h^{3N}$, where $h$ is Planck's constant (we will demonstrate this later). Entropy is
then obtained from the natural logarithm of the number of elementary phase space cells. Using Stirling’s
approximation for the factorials and up to terms of order N (note that 3N − 1 ≈ 3N ), one finds
$$S = kN \ln\left[\frac{V}{N}\left(\frac{4\pi m E}{3h^2 N}\right)^{3/2}\right] + \frac{5kN}{2}. \tag{4.4.1}$$

This formula is called the Sackur-Tetrode entropy after its discoverers.
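As a numerical sanity check, the following sketch evaluates Eq. 4.4.1 for one mole of argon at 298.15 K and 1 bar, with $E = \frac{3}{2}NkT$ (anticipating Eq. 5.3.5); the result lands on the tabulated standard molar entropy of argon, about 155 J/(mol K):

```python
import math

k  = 1.380649e-23    # Boltzmann constant, J/K
h  = 6.62607015e-34  # Planck constant, J s
NA = 6.02214076e23   # Avogadro number, 1/mol

def sackur_tetrode(E, V, N, m):
    # Eq. 4.4.1: entropy of a classical monoatomic ideal gas, in J/K.
    return N * k * (math.log((V / N) * (4 * math.pi * m * E / (3 * h**2 * N))**1.5) + 2.5)

m = 39.95 * 1.66053907e-27   # mass of an argon atom, kg
T, P, N = 298.15, 1.0e5, NA
V = N * k * T / P            # molar volume from the ideal gas law
E = 1.5 * N * k * T          # total kinetic energy, E = (3/2) N k T
print(sackur_tetrode(E, V, N, m))  # ~154.8 J/(mol K)
```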


To recap what we have learned in the previous two examples: to calculate the entropy of an isolated
system (E=constant) with a discrete number of degrees of freedom, count (add up) all the microstates with
the specified energy. For systems with continuously varying degrees of freedom, perform an integral over the
degrees of freedom to find the volume of phase space, using

$$\frac{d^{3N}x\, d^{3N}p}{N!\,h^{3N}}$$
as the measure and constraining the Hamiltonian to the specified value of energy, E. While this integral has
a simple geometric interpretation in the case of an ideal gas, it can more generally be cast in the form
$$g(E, V, N) = \int \cdots \int \frac{d^{3N}x\, d^{3N}p}{N!\,h^{3N}}\; \delta[E - H(\{\vec{p}_i\}, \{\vec{x}_i\})]. \tag{4.4.2}$$

$g(E, V, N)$ is called the density of states because the number of states with energy between $E$ and $E + dE$ is given by $W(E, V, N) = g(E, V, N)\,dE$.5

4.5 Principle of Maximum Entropy


Maximum or minimum (or, more generally, extremum) principles are commonplace in physics. For instance,
we are familiar with the notion that static equilibrium is determined by potential energy being extremal,
or that light travels between two points on the path of stationary time (Fermat's principle). Statistical
4 The Γ function generalizes factorials to the complex plane. For integer argument, Γ(n) = (n − 1)!, and thus the asymptotic

behavior on the positive real axis is given by Stirling’s formula


5 There are many other common notations for the density of states, such as ρ(E) and Ω(E).



mechanics has its own extremum principle. It states that for an isolated system, there is a functional of the
probability distribution of the microstates that takes on its maximum value for the distribution corresponding
to thermodynamic equilibrium. This functional is
$$S = -k \sum_i p_i \ln p_i. \tag{4.5.1}$$

The definition of statistical entropy ensures that


(a) Boltzmann's expression of thermodynamic entropy is retrieved upon maximization;
(b) it is possible to predict quantities measured in macroscopic (thermodynamic) experiments, as expectation
values over the equilibrium probability distribution.

Several remarks are in order. First, there is a fundamental distinction between extremum principles in
dynamics and the maximum entropy principle. The former stem from the homogeneity of space and time.
The latter results from the “homogeneity” of a probability space that exists because we have “chosen” to
ignore the information encoded in the microscopic states of the system. Probabilities are conditional by
nature; they depend, as we have seen, on what knowledge is available at a given time. Next, note that
Eq. 4.5.1 does not have an immediate counterpart for continuous pdfs, which are dimensional objects and
cannot serve as arguments of transcendental functions like the logarithm. This points to some missing factor,
like a phase space measure, as already noted in our calculation of the classical ideal gas entropy. The problem
is, however, considerably more complex, and in the remainder of these notes we will limit our use of Eq. 4.5.1
to discrete distributions only.
Maximization by Lagrange Multipliers

We now show explicitly that Boltzmann’s formula follows immediately upon carrying out the maxi-
mization procedure on Eq. 4.5.1. This is done by imposing the condition that the derivatives of S with
respect to the $\{p_i\}$ be zero; however, the probabilities cannot be treated as independent variables, because they must add up to unity: $\sum_{i=1}^{W} p_i = 1$. Constrained extremization problems of this kind are best handled by the method of Lagrange multipliers. Thus, we introduce a new parameter, $\alpha$, and extremize the function $S(\{p_i\}) + k\alpha \sum_i p_i$ with respect to the $\{p_i\}$ (unconstrained extremization). This step is very straightforward and yields expressions for the $\{p_i\}$'s that depend on the value of $\alpha$. We are then able to choose $\alpha$ such that the constraint $\sum_i p_i = 1$ is obeyed. Let us see how this works in practice:
$$\frac{\partial}{\partial p_i}\left(-k \sum_i p_i \ln p_i + k\alpha \sum_i p_i\right) = -k \ln p_i - k + k\alpha = 0 \implies p_i = e^{\alpha - 1}.$$

Then,
$$\sum_i p_i(\alpha) = 1 \implies 1 = W e^{\alpha - 1},$$
or
$$p_i = 1/W,$$
thus recovering the expected result that the probabilities of all microstates are the same and equal to
the reciprocal of the multiplicity of the system, which gives Boltzmann’s statement. The proof that
the extremum found in this manner is in fact a maximum is left as an exercise.
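The same maximization can also be done numerically; here is a sketch (assuming numpy and scipy are available) that minimizes $-S$ subject to normalization and recovers the uniform distribution $p_i = 1/W$:

```python
import numpy as np
from scipy.optimize import minimize

W = 6  # multiplicity: number of accessible microstates

def neg_entropy(p):
    # We minimize -S/k = sum_i p_i ln p_i; the clip avoids log(0) during the search.
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

constraints = ({'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0},)
p0 = np.random.default_rng(0).dirichlet(np.ones(W))  # arbitrary normalized start

res = minimize(neg_entropy, p0, bounds=[(0, 1)] * W, constraints=constraints)
print(res.x)                # ~[1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
print(-res.fun, np.log(W))  # maximum entropy equals ln W: Boltzmann's formula
```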

4.6 Exercises
1. Show that the extremum of S found by the method of Lagrange multipliers is a maximum.
2. Show that statistical entropy is positive semidefinite (for any discrete probability distribution, not just
for the most probable).



3. What is the entropy of a quarter-filled lattice gas with M sites? Of a half filled one? Of a three-quarter
filled one? Do you notice a symmetry?
4. Looking at your answer from the previous problem, what filling of the lattice gas has the highest
entropy? Does this mean that the equilibrium state of the lattice gas is the half filled lattice?

5. Invert the Sackur-Tetrode equation to find E(S, V, N ). Why are you guaranteed that the function can
in fact be inverted?


Chapter 5

Properties of Entropy

5.1 The Dependence of Entropy on E, V, N


Let us examine the qualitative dependence of entropy on its three variables. We examine each variable in turn, keeping the other two constant.

(a) Increase the energy at constant volume and number of particles. In a classical Hamiltonian system, at constant volume, an increase in energy can be effected by keeping the positions of the particles fixed (constant potential energy) and increasing their kinetic energy. This results in a larger number of choices of the momenta, which means higher entropy. Thus, entropy increases monotonically with energy. Nevertheless, it is possible to imagine physical systems where the energy is bounded above (which means that there has to be a maximum energy state). In this case, as energy is added to the system, all the particles eventually would end up in the highest energy state, so the entropy would evaluate to zero. Therefore, in these systems, the entropy would be nonmonotonic. Nuclear spins are examples of systems where this condition can be realized temporarily (over short time scales); eventually, the spin degrees of freedom transfer energy to the translational degrees of freedom.

(b) Increase the volume at constant energy and number of particles. Increasing volume always offers more choices of particle placement, so entropy is a monotonically increasing function of volume.

(c) Increase the number of particles at constant energy and volume. If the particles act approximately independently of one another (as they usually do at very low density), then increasing their number will increase the number of ways energy can be partitioned among them, as well as the number of spatial configurations available to them. However, in the presence of “excluded volume” interactions, there is a limit to the number of particles that can be accommodated at constant volume and energy; therefore, in this case, entropy will increase but then decrease as a function of N (see the lattice gas for a simple example).

5.2 Thermodynamic Forces


From the foregoing considerations, we see that whenever entropy can be regarded as a differentiable function
of its variables, then sign of its partial derivatives is dictated by physical considerations. Additional consid-
erations of equilibrium allow us to identify these derivatives as thermodynamic forces familiar from everyday
experience. Consider two isolated systems A and B, with entropies SA (EA , VA , NA ) and SB (EB , VB , NB )
and multiplicities $W_{A,B} = \exp(S_{A,B}/k)$. Note that the isolated systems start out with maximum multiplicity (or maximum entropy), meaning that the probabilities $p_{i_A}, p_{i_B}$ of the microstates in each of the systems are such that $S_{A,B} \equiv -\sum_{i_{A,B}} p_{i_{A,B}} \ln p_{i_{A,B}}$ are as large as possible. Since the systems do not interact with each
other, and their entropies are maximum, the total entropy SA + SB is also maximum. But suppose now we
let the two systems exchange energy, for instance by putting them in contact through a diathermic wall. In
other words, we remove the constraint that EA,B = constant. Of course, conservation of energy implies that
EA + EB = constant, so any increase in energy of one system is compensated by a corresponding loss in
the other: dEA = −dEB . However, the microstate probabilities can reorganize to maximize the combined
multiplicity WA WB or the total entropy SA +SB . The combined entropy will be maximum when it no longer


changes under infinitesimal exchange of energy,

$$0 = \frac{d(S_A + S_B)}{dE_A} = \frac{dS_A}{dE_A} + \frac{dS_B}{dE_A} = \frac{dS_A}{dE_A} - \frac{dS_B}{dE_B},$$

or $\frac{dS_A}{dE_A} = \frac{dS_B}{dE_B}$. Hence, we recognize that energy exchange stops once the quantity $\partial S/\partial E$ is equal on the
two sides of the diathermal wall (we use the partial derivative to remember that the diathermal wall does
not allow change of volume nor particle transfer). We can reason in completely similar ways to reach the
conclusion that ∂S/∂V is a quantity that governs mechanical equilibrium, and ∂S/∂N is a quantity that
governs chemical equilibrium. In other words, the three quantities obtained by taking the derivative of
the entropy with respect to the variables E, V, N are the driving forces for energy, volume, and particle
exchange, respectively.1 Note that since entropy and its variables are all extensive, the first derivatives are
all intensive quantities. These quantities are important because it is through an application of these forces
that we can change the respective state variables. In fact, in laboratory conditions, the forces are often easier
to control than the corresponding (“conjugate”) extensive variables.

5.3 Connection to Kinetic Temperature


To identify these forces in terms of quantities we measure in the laboratory, let us analyze an isolated ideal
monoatomic gas with total energy E , N atoms of mass m, confined in a box of volume V = Lx Ly Lz with
walls perpendicular to the three axes. It is then straightforward to derive the pressure of the gas using kinetic
theory. A particle in the gas will impinge elastically, say from the right on the left wall of the box of area
Ly Lz , undergoing a change in momentum ∆px = 2mvx with frequency f = vx /(2Lx ) (momentum parallel
to the wall is conserved). The time-averaged force is then Fx = f ∆px = mvx2 /Lx , contributing an amount
$F_x/(L_y L_z)$ to the pressure. Adding the contribution of all particles, and observing that for randomly moving particles, $\sum m v_x^2 = \sum m v_y^2 = \sum m v_z^2 = (2/3)E$, since for the ideal gas the energy is all kinetic, we arrive at

$$P = \frac{2E}{3V}, \tag{5.3.1}$$
a well known result of kinetic theory. Moreover, from the empirical definition of temperature via the ideal
gas thermometer, we know that P ∝ N T /V.2 We can define the scale of temperature in such a way that the
proportionality constant coincides with Boltzmann’s constant, so that

$$\frac{P}{T} = \frac{kN}{V}. \tag{5.3.2}$$
Now, using the Sackur-Tetrode entropy formula for the ideal gas, we find


$$\left.\frac{\partial S}{\partial E}\right|_{V,N} = \frac{3kN}{2E} \tag{5.3.3a}$$
$$\left.\frac{\partial S}{\partial V}\right|_{E,N} = \frac{kN}{V}. \tag{5.3.3b}$$

Thus we are led to the straightforward identifications



$$\left.\frac{\partial S}{\partial E}\right|_{V,N} = \frac{1}{T} \tag{5.3.4a}$$
$$\left.\frac{\partial S}{\partial V}\right|_{E,N} = \frac{P}{T}, \tag{5.3.4b}$$
1N is of course a discrete variable, but on account of its magnitude in thermodynamics it can usually be treated as continuous.
2 Note that this result implies that the energy of a given amount of ideal gas is directly proportional to temperature and
independent of pressure or volume.



and moreover we find that


$$E = \frac{3NkT}{2}. \tag{5.3.5}$$
We can use the remaining derivative to define a new quantity, the chemical potential µ, as

$$\left.\frac{\partial S}{\partial N}\right|_{E,V} := -\frac{\mu}{T}.$$

To understand the meaning of µ, begin with the differential form of entropy:


$$dS(E, V, N) = \frac{dE}{T} + \frac{P\,dV}{T} - \frac{\mu\,dN}{T}. \tag{5.3.6}$$
Since T > 0, we can invert S(E, V, N ) for E(S, V, N ), and obtain the differential form for the energy of the
system:
dE(S, V, N ) = T dS − P dV + µdN. (5.3.7)
From Eq. 5.3.7, the meaning of the chemical potential µ becomes transparent: it is the change in energy upon
adding a particle to the system at constant entropy and volume. In thermodynamics, energy is sometimes
referred to as “internal energy” and denoted by the symbol U . Working with energy, rather than entropy,
is easier when we are required to apply the first law of thermodynamics. The differential of energy is the
sum of terms of the form (thermodynamic force)×(differential change in extensive property); this is entirely
analogous to the form of the potential energy in classical mechanics, $dU = -\vec{F} \cdot d\vec{x}$, and is one reason we
call the internal energy a “thermodynamic potential”.
Chemical Potential of a Classical Ideal Gas; Quantum Degeneracy

We can obtain an expression for the chemical potential of the classical monoatomic ideal gas by
differentiating the Sackur-Tetrode entropy and using Eq. 5.3.5:
$$\mu = -T\left.\frac{\partial S}{\partial N}\right|_{E,V} = -kT \ln\left[\left(\frac{2\pi m k T}{h^2}\right)^{3/2} \frac{V}{N}\right]. \tag{5.3.8}$$

Note that the chemical potential is large and negative at sufficiently high temperature, but it can be
positive at low T and high density. It is interesting to see when the chemical potential switches sign:
$$\mu = 0 \implies \left(\frac{h^2}{2\pi m k T}\right)^{1/2} = \left(\frac{V}{N}\right)^{1/3}. \tag{5.3.9}$$

The quantity on the left hand side has dimension of a length and is called the thermal de Broglie
wave length; it is usually denoted by λ,
$$\lambda = \left(\frac{h^2}{2\pi m k T}\right)^{1/2}. \tag{5.3.10}$$

Its dependence on Planck’s constant h tells us that it is a quantum mechanical parameter. Classical
theory breaks down when the thermal de Broglie wave length becomes comparable to the interparticle
separation; under these conditions the gas is called “degenerate.” Thus, classical statistical mechanics
cannot be used when the chemical potential approaches zero (from below): matter at extremely high
pressure or very low temperature, in the sense defined by Eq. 5.3.9, behaves quantum mechanically.
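A short numerical sketch of Eq. 5.3.10 for helium-4, comparing $\lambda$ with the ideal-gas interparticle spacing $(V/N)^{1/3}$ at 1 bar (the chosen conditions are illustrative only):

```python
import math

k = 1.380649e-23    # Boltzmann constant, J/K
h = 6.62607015e-34  # Planck constant, J s
u = 1.66053907e-27  # atomic mass unit, kg

def lambda_thermal(m, T):
    # Eq. 5.3.10: thermal de Broglie wavelength, in meters.
    return math.sqrt(h**2 / (2 * math.pi * m * k * T))

m = 4.0 * u  # helium-4
for T in (300.0, 1.0):
    spacing = (k * T / 1.0e5)**(1 / 3)  # (V/N)^(1/3) for an ideal gas at 1 bar
    regime = "classical" if lambda_thermal(m, T) < spacing else "degenerate"
    print(T, lambda_thermal(m, T), spacing, regime)
```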

5.4 Homogeneity of Entropy


We have seen that the additivity of entropy motivated the use of the logarithm in its definition. Now
consider a system in equilibrium. What would happen to the entropy if we doubled the energy, volume,



and number of particles in the system? Imagine that we carve out subsystems of unit volume, large enough
that each subsystem can still be considered macroscopic - i.e., a valid statistical representation of the whole
system. This means that after doubling, the new system would simply be made up of twice the number of
representative subsystems, but from the point of view of each subsystem, nothing would have changed: in
other words, all properties that can be measured locally (such as particle density, energy density, temperature,
compressibility, and so on) would remain the same. This consideration leads us to posit that the entropy is
a “homogeneous function of degree one.” Formally, we say

S(λE, λV, λN ) = λS(E, V, N ). (5.4.1)

Note that homogeneity cannot be justified if the interactions among particles have macroscopic range (e.g.,
Coulomb or gravitational).3

5.5 Thermodynamic Susceptibilities


In this section we explore a very important consequence of the two properties of entropy: (a) homogeneity,
S(λE, λV, λN ) = λS(E, V, N ) and (b) second law. We have seen that the Second Law of thermodynamics
for an isolated system with given E, V , N states that the state of unconstrained equilibrium has an entropy
greater than all states of constrained equilibrium with the same E, V , N . Together with the homogeneity
assumption, this implies that the entropy is a concave function of energy, volume, and number of particles.
This can be proven simply as follows. Take two systems 1 and 2, initially separated, with the same volume
V and N and with energies E1 = E + ∆ and E2 = E − ∆ (you may think, for instance, of two isolated gas
cylinders with the same volume and number moles, but slightly different energy, separated by an insulating
partition). The total initial entropy is then Si = S(E + ∆, V, N ) + S(E − ∆, V, N ). Now bring the two
systems together and remove the partition. The final entropy is then Sf = S(2E, 2V, 2N ). By the second
law, we have, upon removal of the constraint (partition): Sf ≥ Si , while by the homogeneity of entropy, we
have Sf = 2S(E, V, N ). Therefore,

S(E + ∆, V, N ) + S(E − ∆, V, N ) − 2S(E, V, N ) ≤ 0,

which is the statement of concavity of entropy. This means that entropy, regarded as a function of energy
(constant V , N ) has negative curvature. The principal curvatures of entropy (or any other potential) are
called susceptibilities or responses, since they describe how a system “responds” to a change in the affected
variable. Here, we considered an injection of energy at constant volume, which is a quantity of heat, since
no work is done at constant volume. The relevant thermodynamic susceptibility is thus the heat capacity at
constant volume (defined in Sect. 5.6), which therefore is positive semidefinite.

5.6 Entropy Changes and Reversibility


In a thermodynamic process a system exchanges energy, volume, or particles with its surroundings. Together,
system and surroundings form an isolated system, conventionally referred to as the “universe.”4 During a
process, the system moves from an initial state of equilibrium to a final state, also of equilibrium. When
changes happen under an infinitesimal gradient (of T , P , or µ), the system remains arbitrarily close to
states of equilibrium throughout the process. Such processes are called reversible and the total entropy
(system+surroundings) does not change. This can be seen by calculating the entropy change explicitly. For
instance, upon exchange of an infinitesimal amount of energy, dE, under infinitesimal temperature difference
dT , at constant V and N , we can write:
 
$$dS_{univ} = dS_{sys} + dS_{surr} = \frac{\partial S_{sys}}{\partial E}\,dE + \frac{\partial S_{surr}}{\partial E}\,(-dE) = \left(\frac{1}{T - dT} - \frac{1}{T}\right) dE = \frac{1}{T^2}\,dT\,dE,$$
3 Forinstance, a cubic meter of iron oxide in Earth’s crust is subject to very different stresses than its counterpart on Mars.
4 The system proper is what we are interested in, and the surroundings are whatever is needed to effect the changes in state
variables, for example a heat source/sink (often called a reservoir) that can deliver/absorb energy to/from the system.



which is a higher order infinitesimal and therefore vanishes. In contrast, under finite temperature difference,
∆T , the change in entropy of the universe is positive, since dE and ∆T have the same sign.
Note that dE is an energy change at constant volume, and since no mechanical work is done at constant
volume, the energy change is due to heat. In general an infinitesimal amount of heat exchanged reversibly
at temperature T , at constant volume and number of particles, can be expressed as

δqrev = T dS. (5.6.1)

Likewise, δWrev = P dV is the infinitesimal amount of reversible work done by the system. Note that
reversible work implies that the pressure P of the system is equal to the external pressure Pext . If the work
is not done reversibly, then $P \neq P_{ext}$, and the external pressure must be used in the computation of the work done by the system: $W = \int_{V_i}^{V_f} P_{ext}\, dV$ (think, e.g., about a gas expanding against a piston with friction).
Operationally, one measures heat by calorimetry, and it is convenient to express a quantity of heat exchanged
at constant volume as δq = cV dT , which holds regardless of whether the heat is exchanged reversibly or not.
Combining the two relations for reversible heat exchange, we find
$$dS\big|_{V,N} = \frac{c_V\, dT}{T}. \tag{5.6.2}$$
Why is this expression useful? Entropy is, after all, a function of E, V, N , so the change in entropy upon
absorption of a quantity of heat $Q$ should be calculated as $S(E+Q, V, N) - S(E, V, N) = \int_E^{E+Q} dE'\, \frac{\partial S(E', V, N)}{\partial E'}$. However, temperature is usually easier to measure than energy, so it is desirable to have an expression of entropy in terms of $T$ rather than $E$, or an expression for energy in terms of $T$ rather than $S$.
Heat exchange with a reservoir

A heat reservoir, or heat bath, or simply a reservoir, is a very large body that can exchange finite
amounts of heat without change in temperature: it has practically infinite heat capacity. What is
the entropy change of a reservoir upon exchange of a quantity of heat Q? Since ∆T = Q/cV is
infinitesimal, $T$ can be treated as constant in the integral for entropy. Therefore $\Delta S = c_V\,\frac{\Delta T}{T} = \frac{Q}{T}$.
Recalling the sign convention on heat exchange (absorbed heat quantity is positive), note that the
entropy of the reservoir increases when heat goes in and decreases when heat goes out.

Example: Irreversible heat exchange

There are two identical blocks of a certain material of constant heat capacity cV ; one is at temperature
TH and the other at TL < TH . The two blocks are the system. They are brought together and left
to equilibrate. We assume that the blocks are isolated from the surroundings, and we neglect volume
expansion. We want to find the final temperature and the change in entropy of the universe. Since
the blocks are isolated, and no work is done, the quantity of heat flowing out of one is equal in
magnitude to that flowing into the other by conservation of energy. Let Tf be the final temperature
after the blocks equilibrate. Thus, $-Q_H = c_V(T_H - T_f) = Q_L = c_V(T_f - T_L)$ and $T_f = \frac{T_H + T_L}{2}$. The change in entropy is $\Delta S_{Univ} = \Delta S_{syst} = \Delta S_H + \Delta S_L$, which is conveniently expressed as an integral over temperature, since the initial and final states of the blocks are given in terms of temperature: $\Delta S_{Univ} = c_V \ln\frac{T_f^2}{T_H T_L} = c_V \ln\frac{(T_H + T_L)^2}{4\,T_H T_L}$, which is manifestly greater than zero.
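In code, with $c_V$ set to 1 for simplicity, this example reads:

```python
import math

def equilibrate(TH, TL, cV=1.0):
    # Direct contact of two identical blocks: Tf is the arithmetic mean,
    # and the entropy generated is cV ln[(TH+TL)^2 / (4 TH TL)] >= 0.
    Tf = 0.5 * (TH + TL)
    dS = cV * math.log(Tf**2 / (TH * TL))
    return Tf, dS

print(equilibrate(400.0, 200.0))  # (300.0, ~0.118): dS > 0, irreversible
```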

5.7 Clausius’ Statement of the Second Law of Thermodynamics


Let us return to the fundamental result of Sect. 5.2, namely that if we let two systems A and B that are
separately in equilibrium exchange energy with each other at constant V and N (which means exchanging
heat!), the new state of equilibrium is achieved when $\left.\frac{\partial S}{\partial E}\right|_{V,N} \equiv \frac{1}{T}$ is the same in the two systems. This means that heat flow happens in the direction that equalizes the two temperatures. This is proven by noting that the system with the larger $\partial S/\partial E$ must see it decrease. But since $\partial^2 S/\partial E^2 < 0$, this means that when energy enters the system, $\partial S/\partial E$ decreases. So energy enters the colder system and exits the hotter one. The second
law of thermodynamics, which antedates Boltzmann’s formula and the principle of maximum entropy, and
was based entirely on experimental evidence, was formulated by Clausius in the following way:

The Second Law of Thermodynamics (Clausius) There is no process whose only effect is to
transfer a quantity of heat from a colder body to a hotter one.

We see that Clausius’ statement of the second law is equivalent to the principle of maximum entropy.

5.8 Entropy at Low Temperature


Let us return to the differential expression for entropy given by Eq. 5.6.2. Since heat capacities are relatively
easy to measure (and to compute from first principles), this expression lets us determine the entropy of a
substance at any temperature, up to an integration constant:
$$S(T) = S(0) + \int_0^T \frac{c_V(T')}{T'}\, dT'. \tag{5.8.1}$$

This equation is interesting because it tells us that, for entropy to be defined, there can be at most an
integrable singularity at low T , so for any system in thermodynamic equilibrium, CV → 0 as T → 0.
As to the value of the integration constant, Boltzmann’s formula states that it is the logarithm of the
number of states of the system in equilibrium at T = 0, which means in mechanical equilibrium. Since
matter obeys the laws of quantum mechanics (experimental fact), we must answer the question, what is the
degeneracy of the quantum mechanical ground state of a Hamiltonian system? In most systems of interest,
the ground state turns out to be either nondegenerate, or have a finite degeneracy, so that S(0) is not an
extensive constant and can be taken to be zero: this is Nernst’s third law of thermodynamics. Examples
of nondegenerate ground states are all gases, crystals and liquid He. To provide a general answer to the
question of ground state degeneracy, one must rephrase it in terms of eigenvalues of large matrices. A large
degeneracy implies the existence of high symmetry, and such symmetry is usually broken by the slightest
perturbation, so that high degeneracies are lifted in physical systems.

5.9 Exercises
1. Differentiate both sides of Eq. 5.4.1 with respect to $\lambda$ and then put $\lambda = 1$ to show that $S = \frac{E + PV - \mu N}{T}$.

2. Rearranging the equation from the previous exercise shows that E = T S − P V + µN . Regarding all
variables as independent, take the total differential and then subtract Eq. 5.3.7 to obtain an equation
relating the differentials of intensive variables only (Gibbs-Duhem equation). Use this result to explain
Eq. 1.3.1.
3. According to Gibbs-Duhem, µ is a function of P and T only. Starting from Eq. 5.3.8, give the explicit
formula for µ(T, P ) for the ideal gas.


Chapter 6

Thermodynamic Processes and Cycles

6.1 Joule Expansion and Other Processes for an Ideal Gas


Consider two insulated containers of equal volumes V , connected by a thin tube with a valve. On one side
there is gas, on the other there is vacuum. What happens when we open the valve and let the gas fill both
containers? We know that the process is irreversible, since the pressures on the two sides of the valve are
different. Moreover, we know that energy does not change during the process, since no heat can flow in or
out of the insulated container, and no work is done by the gas (Pext = 0). So, ∆S > 0 and ∆E = 0. Nothing
further can be said without knowledge of the equation of state of the gas. Let us assume the gas is ideal.
Then, from Eq. 4.4.1, we can calculate ∆S = kN ln 2, while from Sect. 5.3 we have ∆T = 0. Expansion into
a vacuum is an experiment performed by Joule and is called the Joule process. Joule did not use insulated
containers; instead, his apparatus was immersed in a calorimeter. He did not know kinetic theory, so he used
the experiment to discover that ∆T = 0 for an ideal gas. Ideal gas processes are easy to analyze, using the
equation of state P V = N kT , and the fact that the internal energy of the ideal gas depends on temperature
only, E = cV T . Important processes are summarized in Table 6.1.

Table 6.1: Processes for one mole of an Ideal Gas ($PV = RT$)

Process                Equation                                 Work done by gas     Heat absorbed       $\Delta S$
Reversible Isotherm    $P_i V_i = P_f V_f = RT$                 $RT \ln(V_f/V_i)$    $RT \ln(V_f/V_i)$   $R \ln(V_f/V_i)$
Reversible Adiabat     $T_i V_i^{R/c_V} = T_f V_f^{R/c_V}$      $c_V (T_i - T_f)$    $0$                 $0$
Reversible Isochore    $V_i = V_f$                              $0$                  $c_V (T_f - T_i)$   $c_V \ln(T_f/T_i)$
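The table translates directly into code; here is a sketch for one mole of a monoatomic ideal gas, with $c_V = 3R/2$ assumed in the defaults:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def isotherm(T, Vi, Vf):
    # Reversible isotherm: W = Q = RT ln(Vf/Vi), dS = R ln(Vf/Vi).
    W = R * T * math.log(Vf / Vi)
    return {'W': W, 'Q': W, 'dS': W / T}

def adiabat(Ti, Tf, cV=1.5 * R):
    # Reversible adiabat: Q = 0, W = cV (Ti - Tf), dS = 0.
    return {'W': cV * (Ti - Tf), 'Q': 0.0, 'dS': 0.0}

def isochore(Ti, Tf, cV=1.5 * R):
    # Reversible isochore: W = 0, Q = cV (Tf - Ti), dS = cV ln(Tf/Ti).
    return {'W': 0.0, 'Q': cV * (Tf - Ti), 'dS': cV * math.log(Tf / Ti)}

print(isotherm(300.0, 1.0, 2.0), adiabat(400.0, 300.0), isochore(300.0, 400.0))
```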

6.2 The Carnot Cycle and Carnot’s Theorem


A process where the values of the state variables of the system in the final state are the same as in the initial
state is called a cycle. Thus in a cycle, by construction, ∆Ssys = 0 and ∆Esys = 0, so nothing happens to
the system. Cycles are useful in practice because they can modify the surroundings of a system, for instance
by absorbing a quantity of heat, partially converting it to work, and rejecting the balance of the heat (since
∆Esys = 0), as in an internal combustion engine. The extent of the conversion of heat into work is the
efficiency of the cycle, defined as
$$\eta := \frac{W}{|Q_{in}|} = \frac{|Q_{in}| - |Q_{out}|}{|Q_{in}|}. \tag{6.2.1}$$
Why must cycles reject heat? If a cyclic process took out heat from a body at some temperature T1 and
converted it entirely into work, then the work could be used to heat up (e.g., by friction or Joule heating)
a second body at T2 > T1 ; after the cycle is complete, the only effect would be the transfer of a quantity


of heat from a colder body to a hotter one. This is in violation of Clausius’s statement of the second law.
Hence,

The Second Law of Thermodynamics (Kelvin) It is impossible to have a process whose only
effect is to convert into work heat extracted from a source at a single temperature.

Carnot's cycle is a special cycle, characterized by the fact that it exchanges heat only at two temperatures. It consists of two isothermal processes, at $T_H$ and $T_C$, and two adiabatic processes between $T_H$ and $T_C$; all processes are reversible. Figure 6.1 represents the cycle for an ideal gas on the $PV$ plane. Starting at point A on the colder isotherm, the gas is compressed adiabatically until the hotter isotherm is reached (B), then expanded isothermally until the desired amount of heat has been absorbed (C). Then, the gas is expanded adiabatically until the lower temperature is reached (D), and finally the gas is compressed isothermally to the original volume (A). The heat absorbed and the work done at each step can be read off Table 6.1, from which we find the efficiency

$$\eta_C = 1 - \frac{T_C}{T_H}. \tag{6.2.2}$$

[Figure 6.1: Representation of the Carnot cycle for an ideal gas on the (P, V) plane]

The importance of this result is manifest through


the following statement: the efficiency of a Carnot cycle between two temperatures TH and TC is the
highest possible (Carnot’s theorem). The theorem is proven by contradiction: if Carnot’s theorem were
false, the second law of thermodynamics (Kelvin’s statement) would be violated. Imagine we have found a
cycle that takes a quantity of heat $|Q'_H|$ from a reservoir at temperature $T_H$, does work $W'$, and dumps $|Q'_C| = |Q'_H| - W'$ into a reservoir at temperature $T_C$, with $|Q'_C| < \frac{T_C}{T_H}|Q'_H|$, so that $\eta' > \eta_C$. Now, since the Carnot cycle is reversible, we can use it in reverse to absorb heat $|Q'_C|$ from the reservoir at $T_C$, which is then restored to its pristine condition. Next, we use work $W = \frac{\eta_C}{1-\eta_C}|Q'_C|$ to complete the reverse Carnot cycle and dump $|Q'_C| + W$ into the hotter reservoir. Since $\eta_C < \eta'$, we have $W < W'$ and $|Q'_C| + W < |Q'_H|$, so the net result is that we have converted entirely into work a quantity of heat ($|Q'_H| - |Q'_C| - W > 0$) taken from a single source at constant temperature $T_H$, contradicting the second law.
Example: Reversible heat exchange

Consider again the two identical blocks of Example 2 of
Sect. 5.6. However, now the blocks are not brought together. What process can be devised to
extract the maximum amount of work, Wmax , from them? What are the final temperature and the
entropy generated in the process? The blocks are two sources of heat at different temperatures. To
extract work with the maximum possible efficiency, we run a Carnot cycle between them. However,
since the blocks are not infinite reservoirs, their temperatures will change after each cycle (TH will
decrease, TL will increase). So we consider a sequence of infinitesimal Carnot cycles, absorbing dQH
from the hotter block, and rejecting dQL into the colder one. From the properties of Carnot cycles it
follows that $\frac{dQ_H}{T_H} = \frac{dQ_L}{T_L}$, or $dS_H + dS_L = 0$, so the total entropy does not change in the process. But $\Delta S_H = c\ln\frac{T_f}{T_H}$ and $\Delta S_L = c\ln\frac{T_f}{T_L}$, so we have $c\ln\frac{T_f^2}{T_H T_L} = 0$, or $T_f = \sqrt{T_H T_L}$. The maximum work is $W = Q_H - Q_L = c(T_H + T_L - 2T_f) = c(\sqrt{T_H} - \sqrt{T_L})^2$ and is obtained when the entropy change is zero (reversible process).
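A short sketch contrasting this reversible equilibration with the direct-contact (irreversible) example of Sect. 5.6:

```python
import math

def reversible(TH, TL, c=1.0):
    # Sequence of infinitesimal Carnot cycles: Tf = sqrt(TH TL), maximum work out.
    Tf = math.sqrt(TH * TL)
    return Tf, c * (math.sqrt(TH) - math.sqrt(TL))**2

def direct_contact(TH, TL, c=1.0):
    # Irreversible equilibration of Sect. 5.6: no work extracted.
    return 0.5 * (TH + TL), 0.0

print(direct_contact(400.0, 200.0))  # (300.0, 0.0)
print(reversible(400.0, 200.0))      # (~282.8, ~34.3): lower Tf, work extracted
```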



6.3 Clausius’ Theorem


Let us now consider the quantity $\oint \frac{dq}{T}$, where $\oint$ denotes an integral over a cycle (i.e., around any closed path in the state variables). In other words, consider a system undergoing a cyclic transformation, reversible or not, and break it down into a large number of small steps, $i = 1, \ldots, N$, with $N$ large; at each step, let temperature be well defined ($T = T_i$), and consider the quantity of heat $Q_i$ (with sign!) exchanged by the system. The loop integral is then well approximated by $\sum_{i=1}^{N} \frac{Q_i}{T_i}$, $N \to \infty$. We have seen in Sect. 5.6 that $\delta q_{rev} = T\,dS$, so we can state immediately that for a reversible cycle, $\oint_{rev} \frac{dq}{T} = \oint \frac{dq_{rev}}{T} = \oint dS = 0$. What can we say for an irreversible cycle? In that case, $\oint \frac{dq}{T} < 0$. This result follows from considering the discrete sum approximation to the loop integral, and at each of the $N$ steps introducing a Carnot cycle that restores the heat to the surroundings. In so doing, the $i$-th cycle produces or uses up work and exchanges heat $Q'_i$ with a common reservoir at some temperature $T_0$. All we have achieved is that the system effectively exchanges heat with the same reservoir at $T = T_0$ (rather than with the surroundings at variable temperature), and it does so through an infinite number of infinitesimal Carnot cycles. In the $i$-th Carnot cycle, we have $Q_i/Q'_i = T_i/T_0$, so that $\sum_{i=1}^{N} \frac{Q_i}{T_i} = \frac{1}{T_0}\sum_{i=1}^{N} Q'_i$. Now, we see that the only thing that happens in this process (since the Carnot cycles and the system end up unchanged) is that some heat, $\sum_{i=1}^{N} Q'_i$, is taken from a source at constant temperature $T_0$ and converted to work, which is impossible (Kelvin's statement of the second law); so that means that work has actually been consumed, or $\sum_{i=1}^{N} Q'_i < 0$. But this means that $\sum_{i=1}^{N} \frac{Q_i}{T_i} < 0$.

Example: Irreversible cycle

To illustrate how Clausius' theorem works, consider the ideal gas cycle shown at right. Process 1-2 is a Joule expansion, an irreversible process; all other processes are reversible. Therefore the cycle is irreversible and we expect the theorem to hold as a strict inequality. To verify that this is the case, compute the heat into the system for each process. For the Joule expansion and for the two adiabats, $Q = 0$. For the isothermal compression at low temperature, $Q_c = RT_c \ln\frac{V_4}{V_3} < 0$. Hence, $\oint \frac{dq}{T} = \frac{Q_c}{T_c} < 0$. Note that this cycle is not very useful, as it converts work into heat; it has an efficiency of negative infinity!
6.4 Exercises
1. Verify the relations given in Table 6.1.

2. Relate explicitly the heat capacity at constant volume, cV , to the second derivative of entropy with
respect to energy, and show that cV > 0.
3. Show that a body for which (a) temperature starts to increase at constant volume while at the same
time (b) heat starts flowing out of the body cannot have been in equilibrium at the time T started to
increase.

4. An isolated box of volume 2V is separated into two volumes V1 and V2 > V1 by a sliding diathermal
partition. There is one mole of ideal gas on each side, and the temperature is initially Ti on both sides.
Assume that movement of the partition can be harnessed to extract work. Calculate the maximum
work that can be extracted from the system and the final temperature Tf .


Chapter 7

Thermodynamic Potentials

7.1 The Concept of Free Energy


An immediate consequence of Clausius’ theorem is that, for any transformation that takes the system from
state A to state B (both states of equilibrium),
$$\int_A^B \frac{dq}{T} \le S(B) - S(A); \tag{7.1.1}$$

the equality holds iff the transformation is reversible. The proof is left as an exercise. Consider now an
isolated system. Then, no heat can be exchanged during a transformation. Thus, the left hand side of
Eq. 7.1.1 vanishes, and we find, again, that for an isolated system undergoing a transformation between two
states, the entropy will increase if the transformation is irreversible, and stay the same if it is reversible.
Consider instead a transformation at constant temperature. Then, the l.h.s. of Eq. 7.1.1 is Q/T , so that we
can write Q ≤ T ∆S; using the first law, Q − W = ∆E, we arrive at a bound for the amount of work that
can be done by a system during a transformation at constant temperature:

W ≤ T ∆S − ∆E. (7.1.2)

This result extends the concept of mechanical potential energy, $W = -\delta(PE)$, to thermodynamics, where heat
exchanges are considered. It is natural to define a new thermodynamic potential,

F (T, V, N ) = E(S, V, N ) − T S = −P V + µN, (7.1.3)

such that the bound on the available work is W ≤ −∆F . Equality as usual implies a reversible transfor-
mation. The potential F is called the Helmholtz free energy.1 The word free energy signifies that it is the
maximum amount of energy available for the system to do work while in thermal equilibrium with its sur-
roundings. (Thermal equilibrium with the environment guarantees that heat can be exchanged reversibly.)
When the free energy has reached its minimum, no more work can be done by the system, which therefore
will have reached mechanical equilibrium as well. So the minimum of the Helmholtz free energy corresponds
to the stable state of the system in thermal equilibrium with the environment at temperature T , in analogy
with the maximum of entropy, which corresponds to the stable state of an isolated system. Note that in the
former case, the system is at constant temperature, while in the latter, it is the energy that is constant. But
T = ∂E/∂S suggests that the relation in Eq. 7.1.3 is not accidental, but rather the result of a systematic way
of constructing potential functions. This construction is the mathematical tool called Legendre transform.

7.2 Systematic Construction of Thermodynamic Potentials


1 It is also sometimes indicated by the symbol A, from the German word “Arbeit” meaning work.


Consider the convex function f (x) as sketched in the figure at right. Con-
vexity ensures that f is differentiable (except possibly at a finite number
of points, an important case we ignore for the moment) and that the first
derivative of $f$, $df/dx$, is monotonic. This means that each point on the curve has a unique slope, $s := df/dx$, so that we can specify the function in terms of its slope rather than in terms of $x$. We may be tempted to put $g(s) \overset{?}{=} f(x(s))$, but that would not work, since knowing that $s = 1$ when
f = 2 leaves us with infinite choices for where to draw the line with slope
s that intersects the flat line y = 2. Clearly the information y = 2 is not
useful to us. What we need is the value of the y−intercept of the tangent
line to f ; in this way, we trade the information encoded in (x, f (x)) for
the information encoded in (slope, intercept) of the tangent line to f .2 The
figure illustrates the geometric significance of the transform. Analytically,
we can define the transform as

$$g(s) = \min_x\,[f(x) - sx], \tag{7.2.1}$$

where the operation of finding the minimum for all x explicitly demonstrates that the right hand side is no
longer a function of x. It is common, especially in thermodynamics,3 to write simply

g(s) = f (x) − sx, (7.2.2)

where it is understood that either s or x must be treated as a constant parameter; then, holding, say, s
constant, and differentiating with respect to $x$, one has $s = df/dx$, and vice versa, holding $x$ constant, one has $x = -dg/ds$. Two useful properties of the Legendre transform are

(i) the Legendre transform of a convex function is concave in the new variable;4
(ii) the Legendre transform of the Legendre transform of f is f .
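A grid-based evaluation of Eq. 7.2.1 is a useful sanity check; for the convex test function $f(x) = x^2$ the transform is known in closed form, $g(s) = -s^2/4$ (numpy assumed available):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 200001)   # dense grid covering the relevant slopes

def legendre(fx, s_values):
    # Eq. 7.2.1: g(s) = min_x [f(x) - s x], evaluated pointwise on the grid.
    return np.array([np.min(fx - s * x) for s in s_values])

fx = x**2                            # convex test function
s = np.array([-2.0, 0.0, 1.0, 3.0])
print(legendre(fx, s))               # ~[-1, 0, -0.25, -2.25] = -s^2/4
```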

With the aid of the Legendre transform, we immediately recognize the Helmholtz free energy F (T, V, N ) =
E − T S as the Legendre transform of the internal energy E(S, V, N ); see Eq. 7.1.3. Two more potentials are
used frequently in thermodynamics: the enthalpy

H(S, P, N ) = U + P V (7.2.3)

and the Gibbs free energy

G(T, P, N ) = F + P V = µN. (7.2.4)

2 To verify that information is preserved, consider for instance that if you were to color the plane below each tangent line

you would be coloring the plane below f , owing to the convexity of f .


3 Thermodynamics also uses a peculiar sign definition for the intercept of the tangent line; an alternative, symmetric, definition

of the Legendre transforms is f (x) + g(s) = sx.


4 This holds true for the customary definition of the transform in thermodynamics. With the alternative definition f (x) +

g(s) = sx, the Legendre transform of a convex function is convex.



Thermodynamic potential vs equation of state

An equation such as E = E(T, V, N ), which gives energy as a function of T rather than S, is an


example of loss of information entailed in expressing a thermodynamic potential as a function of
variables other than the “natural” ones without carrying out the full Legendre transform. Consider
an ideal gas, for which E = CV T . While this equation is useful (for instance in analyzing Joule
expansion), it is not complete. We can’t use it to derive the pressure or the chemical potential, for
example. The full thermodynamic potential can be recovered if we know enough equations of state.
Enough means one per degree of freedom (i.e., two for a one-component system). For instance, we can
use E = CV T along with P V = N kT and the Gibbs-Duhem equation to find µ (try as an exercise!):
     
$$E\, d\!\left(\frac{1}{T}\right) + V\, d\!\left(\frac{P}{T}\right) - N\, d\!\left(\frac{\mu}{T}\right) = 0.$$

Even then, we would be missing an integration constant for entropy (the third law and quantum
mechanics can help to determine it, as we will see).

7.3 Stability Criteria for Thermodynamic Potentials


In Sect. 5.5 the concavity of entropy was proven. It follows that the energy is a convex function of the
variables (S, V, N ), which means that the Hessian is positive definite. This implies that the responses (heat
capacity at constant volume and isentropic compressibility) are positive

$$\left.\frac{\partial^2 E}{\partial S^2}\right|_{V,N} = \left.\frac{\partial T}{\partial S}\right|_{V,N} = \frac{T}{C_V} > 0 \tag{7.3.1a}$$
$$\left.\frac{\partial^2 E}{\partial V^2}\right|_{S,N} = -\left.\frac{\partial p}{\partial V}\right|_{S,N} = \frac{1}{V \kappa_S} > 0; \tag{7.3.1b}$$

and that the determinant of the hessian matrix is positive (the proof is left as an exercise)
$$\left.\frac{\partial^2 E}{\partial S^2}\right|_{V,N} \left.\frac{\partial^2 E}{\partial V^2}\right|_{S,N} > \left(\left.\frac{\partial^2 E}{\partial S\, \partial V}\right|_{N}\right)^2. \tag{7.3.2}$$

Noting that the Legendre transform involves pairs of “conjugate” variables, one of which is intensive and the
other extensive (by Euler’s homogeneity property), and the energy is a function of all extensive variables, we
conclude immediately that the thermodynamic potentials F, H, G must be convex functions of the extensive
variables and concave functions of the intensive ones. For example, after Legendre transformation to the
Helmholtz free energy, the stability conditions become
$$\left.\frac{\partial^2 F}{\partial T^2}\right|_{V,N} = -\left.\frac{\partial S}{\partial T}\right|_{V,N} = -\frac{C_V}{T} < 0 \tag{7.3.3a}$$
$$\left.\frac{\partial^2 F}{\partial V^2}\right|_{T,N} = -\left.\frac{\partial p}{\partial V}\right|_{T,N} = \frac{1}{V \kappa_T} > 0. \tag{7.3.3b}$$

The compressibility is now isothermal instead of isentropic, since the Legendre transform switched the
independent variable from S to T .

7.4 The Calculus of Thermodynamics: Maxwell Relations and Jacobians
In solving thermodynamics problems, we are often required to switch between different potential representa-
tions. This may happen when we encounter the derivative of one variable that is not readily available, and


c Carlo Carraro – CBE 240 Lecture notes – Fall 2020 – Draft 35
CHAPTER 7. THERMODYNAMIC POTENTIALS

we wish to express it as a derivative of some other variable accessible through an equation of state; or when
we want to reduce the second derivative of a potential to one of the three standard responses (heat capacity,
compressibility, thermal expansivity). There is nothing fundamental about these manipulations, other than
they are used to cast a calculation in terms of quantities that are easily measured in an experiment. There are
two useful mathematical tricks to solve these kinds of problems, Maxwell relations and Jacobians. Maxwell relations simply state that thermodynamic potentials are potential functions. That means that the order of differentiation does not matter,5 so that, for example, $\frac{\partial^2 F}{\partial T\, \partial V} = \frac{\partial^2 F}{\partial V\, \partial T}$, which means $\left.\frac{\partial S}{\partial V}\right|_{T,N} = \left.\frac{\partial P}{\partial T}\right|_{V,N}$. Note that the two variables will never be a conjugate pair!

Jacobians are determinants useful to compute derivatives with change of variables, such as r(x, y), s(x, y) →
r(u, v), s(u, v). The following relation holds:

$$\begin{vmatrix} \partial r/\partial x & \partial r/\partial y \\ \partial s/\partial x & \partial s/\partial y \end{vmatrix} \begin{vmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{vmatrix} = \begin{vmatrix} \partial r/\partial u & \partial r/\partial v \\ \partial s/\partial u & \partial s/\partial v \end{vmatrix}, \tag{7.4.1}$$

which can be written short hand as

$$\frac{\partial(r, s)}{\partial(x, y)}\, \frac{\partial(x, y)}{\partial(u, v)} = \frac{\partial(r, s)}{\partial(u, v)}.$$

We can use this relation conveniently even when there is just one partial derivative, but we are changing the variable to be held constant. For instance, say we want to switch from $C_V = T\left.\frac{\partial S}{\partial T}\right|_V$ to $C_P = T\left.\frac{\partial S}{\partial T}\right|_P$. Then, remembering $\partial V/\partial V = 1$, we have

$$\frac{C_V}{T} = \frac{\partial(S, V)}{\partial(T, V)} = \frac{\partial(S, V)}{\partial(T, P)}\, \frac{\partial(T, P)}{\partial(T, V)} = \left[\left.\frac{\partial S}{\partial T}\right|_P \left.\frac{\partial V}{\partial P}\right|_T - \left.\frac{\partial S}{\partial P}\right|_T \left.\frac{\partial V}{\partial T}\right|_P\right] \left.\frac{\partial P}{\partial V}\right|_T = \frac{C_P}{T} - \frac{V\alpha^2}{\kappa_T}, \tag{7.4.2}$$

where we have used a Maxwell relation along with the definition of the isobaric expansion coefficient $\alpha$ to write $\left.\frac{\partial S}{\partial P}\right|_{T,N} = -\left.\frac{\partial V}{\partial T}\right|_{P,N} := -V\alpha$.

5 Provided the second derivatives are continuous (Clairaut’s theorem).
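Eq. 7.4.2 rearranges to the familiar relation $C_P - C_V = TV\alpha^2/\kappa_T$, which for the ideal gas must reduce to $Nk$; a symbolic check with sympy (assumed available):

```python
import sympy as sp

T, P, N, k = sp.symbols('T P N k', positive=True)

V = N * k * T / P                         # ideal-gas equation of state, V(T, P)
alpha  = sp.simplify(sp.diff(V, T) / V)   # isobaric expansivity: 1/T
kappaT = sp.simplify(-sp.diff(V, P) / V)  # isothermal compressibility: 1/P

# C_P - C_V = T V alpha^2 / kappa_T (rearranged Eq. 7.4.2); should equal N k.
print(sp.simplify(T * V * alpha**2 / kappaT))  # -> N*k
```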



Application: Adiabatic Demagnetization

In this example we practice writing the first law for magnetic work rather than expansion work, using
Maxwell relations and Jacobi determinants, and using an analogy between magnetic and compression
work. Adiabatic demagnetization is the cooling of a magnetic sample analogous to the cooling of
a gas during an adiabatic expansion; it was discovered by W. Giauque and is employed to reach
temperatures in the mK range. It exploits the fact that the spins of a magnetic salt can be aligned in
a strong magnetic field, acquiring large negative potential energy; upon decreasing the field adiabati-
cally, this magnetic potential energy increases from a large negative to zero so that the kinetic energy
of microscopic motion must decrease: random motion of the magnetic nuclei decreases, and the salt
cools toward absolute zero. Here, we are interested in finding out how temperature changes with
changing magnetic field in the adiabatic process, i.e., $\left.\frac{\partial T}{\partial B}\right|_S$, assuming the equation of state $MT = B$.
We can express this derivative using the Jacobian determinants:

$$\frac{\partial(T, S)}{\partial(B, S)}\, \frac{\partial(B, S)}{\partial(B, T)} = \frac{\partial(T, S)}{\partial(B, T)} = -\frac{\partial(T, S)}{\partial(T, B)},$$

where the − sign occurs because we switched columns in a determinant. The identity above is
rewritten as
$$\left(\frac{\partial T}{\partial B}\right)_S \left(\frac{\partial S}{\partial T}\right)_B = -\left(\frac{\partial S}{\partial B}\right)_T.$$
Of the partial derivatives of entropy, one is heat capacity at constant field, ∂S/∂T |B = CB /T , and
the other can be found through a Maxwell relation. So we must construct thermodynamic potentials.
The work done per unit volume on an isotropic sample of magnetic salt in increasing its magnetization
from M to M + dM in a magnetic field B is BdM . The first law in differential form is written as
$dU = T\,dS + B\,dM$. Noting the analogy $-P \to B$, we write the Gibbs free energy as $G = U - TS - BM$. From here, we find the Maxwell relation expressing $\left.\frac{\partial S}{\partial B}\right|_T$ as a function of $M$ and $T$: $\left.\frac{\partial S}{\partial B}\right|_T = \left.\frac{\partial M}{\partial T}\right|_B = -\frac{B}{T^2}$ (using the given equation of state). Putting it all together, we reach the final result expressing
the temperature change in the adiabatic demagnetization in terms of experimental quantities:

$$\left.\frac{\partial T}{\partial B}\right|_S = -\frac{\partial S/\partial B|_T}{\partial S/\partial T|_B} = \frac{B}{T^2}\, \frac{T}{C_B} = \frac{M}{C_B}.$$

This type of cooling is very effective at low temperature because CB vanishes as T → 0.

7.5 Exercises
1. Justify the following relations for two systems 1 and 2 initially in equilibrium at temperature T :

F (T, V1 , N1 ) + F (T, V2 , N2 ) ≥ F (T, V1 + V2 , N1 + N2 )

and
F (T, 2V, 2N ) = 2F (T, V, N );

then use them to show that the isothermal compressibility, $\kappa_T = -\frac{1}{V}\left.\frac{\partial V}{\partial P}\right|_T$, is positive semidefinite in thermal equilibrium.

2. Consider an extremely crude model of a long chain molecule, made up by adding identical monomers
sequentially. Each monomer can be added in one of two configurations, straight or kinked, and for
simplicity assume that (a) straight and kinked occur with equal probability; (b) a straight monomer
contributes a length a to the length of the molecule, while a kinked one does not contribute. Thus, for
a molecule of N monomers, the maximum length is N a, and if n is the number of straight monomers,
the actual length is L = na.



(i) Write the probability distribution for the length of a molecule.


(ii) Consider now the molecule under tension τ at temperature T . Write the differential of the appro-
priate thermodynamic potential for this situation, neglecting volumetric expansion.
(iii) Find an expression for the tension (a function of a, N, n, T ) and explain its dependence on
temperature.

3. This exercise should help make sense of the Legendre transform. Why is the Helmholtz free energy not a function of $E$? Consider a system with energy, volume, and number of particles $E, V, N$, and a temperature bath at temperature $T_b$. Show that $\frac{\partial F(T_b, E, V, N)}{\partial E} = 0$ (i.e., $F$ does not depend on $E$) if
system and bath are in thermal equilibrium.


Chapter 8

Statistical Mechanics in Thermal


Equilibrium

8.1 Probability Distribution at Constant Temperature


Consider two systems A and B that can exchange energy with each other
at constant volume and particle number. We know that the total energy
E = EA + EB is conserved, but EA and EB are microscopically fluctu-
ating quantities. For example, consider the molecular collision process,
illustrated in Fig. 8.1: during the momentum–and energy–conserving col-
lision (mediated by the diathermal wall) energy is transferred from the
left to the right of the partition. Since there is no volume change, en-
ergy is exchanged in the form of heat. Now, we know that there cannot
be a macroscopic transfer of energy, or else the temperature on one side
would become higher than on the other side, which would make energy
flow in the opposite direction. In other words, the process in Fig. 8.1 is
microscopically allowed to take place, but macroscopically, the energy
transferred through the diathermal wall must average out to zero in ther-
mal equilibrium. In other words, focussing on one side of the partition, we expect the system to be able to access any microstate $i$ of arbitrary energy $E_i$, with probability $p_i(E_i, V, N)$, and subject to the constraint that the average energy, $\langle E\rangle = \sum_i p_i E_i$, is constant.

[Figure 8.1: Microscopic process of energy transfer through a diathermal wall.]

To find out the probability distribution $\{p_i\}$, we maximize its entropy subject to the energy constraint and to the normalization condition:

$$\frac{\partial}{\partial p_i}\left(S + k\alpha \sum_i p_i - k\beta \sum_i p_i E_i\right) = 0, \qquad (8.1.1)$$

where the negative sign in front of the constant β is inconsequential and has been chosen for later convenience.
Proceeding as in the example of Sect. 4.5, we find
$$\frac{\partial}{\partial p_i}\left(-k\sum_i p_i \ln p_i + k\alpha \sum_i p_i - k\beta \sum_i p_i E_i\right) = -k\ln p_i - k + k\alpha - k\beta E_i = 0 \implies p_i = e^{\alpha-1}e^{-\beta E_i}. \qquad (8.1.2)$$

The parameter $\alpha$ can be eliminated by imposing normalization of probability, $\sum_i p_i(\alpha) = 1$, which implies

$$p_i = \frac{e^{-\beta E_i}}{\sum_i e^{-\beta E_i}}. \qquad (8.1.3)$$

In contrast to isolated systems, the probability distribution of the microstates of a system in equilibrium
with a reservoir is no longer uniform. It is customary to refer to isolated systems as “microcanonical” and


to systems in thermal equilibrium as “canonical.” To identify the constant β, rewrite the first equality in
Eq. 8.1.2 in differential form,
$$(-k\ln p_i - k)\,dp_i + k\alpha\,dp_i - k\beta E_i\,dp_i = 0,$$

sum over all $i$, and note that $\sum_i dp_i = 0$. The equation then becomes

$$dS - k\beta\, d\langle E\rangle = 0;$$

recalling that the system is in equilibrium and at constant volume, so $d\langle E\rangle = \delta q_{rev} = T\,dS$, this implies

$$\beta = \frac{1}{kT}. \qquad (8.1.4)$$

Example: Two-level system revisited

Consider $N$ distinguishable particles in a two-level system (level spacing $\epsilon$) with total energy $E$. We showed in Sect. 4.3 that $S(E,N) = k\ln\binom{N}{n(E)}$. Starting from this result, we want to describe the same system, but in terms of temperature, rather than total energy. Putting $p := n(E)/N = E/(N\epsilon)$ and using Stirling's formula, one finds $S = -kN\big[p\ln p + (1-p)\ln(1-p)\big]$. Next, calculate

$$\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{\partial S}{\partial p}\frac{\partial p}{\partial E} = \frac{1}{N\epsilon}\frac{\partial S}{\partial p} = -\frac{k}{\epsilon}\ln\frac{p}{1-p}.$$

This formula can be inverted to express $p$ as a function of temperature:

$$p = \frac{e^{-\beta\epsilon}}{1 + e^{-\beta\epsilon}}.$$

The probability that a particle has energy $\epsilon$ at temperature $T$ agrees with Eq. 8.1.3.
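A quick numerical sketch of this inversion (entropy per particle, in units with $k = \epsilon = 1$; an illustrative choice, not from the text):

```python
import numpy as np

# Two-level system check: compute beta = dS/dE numerically from
# S(p) = -[p ln p + (1-p) ln(1-p)] (per particle, k = eps = 1) and verify
# that inverting gives the canonical p = exp(-beta)/(1 + exp(-beta)).
p = np.linspace(0.01, 0.49, 500)              # occupied fraction; E = p here
S = -(p*np.log(p) + (1-p)*np.log(1-p))
beta = np.gradient(S, p)                      # 1/kT = dS/dE
p_canonical = np.exp(-beta)/(1 + np.exp(-beta))
print(np.max(np.abs(p - p_canonical)))        # small finite-difference error
```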

8.2 The Canonical Partition Function


Consider the structure of the probability of a state at constant temperature, Eq. 8.1.3. The numerator, the exponential of minus the energy of the state in units of $kT$, is called the Boltzmann factor. It tells us that the probability of any one state decreases exponentially with its energy. However, it does not tell us anything about the number of states at a particular energy level $E_\ell$, or in an infinitesimal energy range $(E, E+dE)$.1
The number of states at any given energy is taken into account in the denominator of Eq. 8.1.3, which is
the sum over all states of the Boltzmann factors and is called, well, sum over states or canonical partition
function, and denoted by the letter $Q$ or $Z$:2

$$Q(T,V,N) = Z = \begin{cases} \displaystyle\sum_i e^{-\beta E_i}\ \text{(states)} = \sum_\ell g_\ell\, e^{-\beta E_\ell}\ \text{(energy levels)} & \text{(discrete levels)};\\[3mm] \displaystyle\int \frac{d^{3N}x\, d^{3N}p}{N!\,h^{3N}}\, e^{-\beta H(\{x\},\{p\})} = \int dE\, g(E)\, e^{-\beta E} & \text{(continuous phase space)}. \end{cases} \qquad (8.2.1)$$
The factor of $N!\,h^{3N}$ in the phase space integral was introduced in Sect. 4.4.
From the partition function, with the aid of Eq. 8.1.3, we can calculate all thermodynamic quantities, as functions of $(T,V,N)$ or equivalently of $(\beta, V, N)$. The internal energy is obtained as

$$E(T,V,N) = \langle E\rangle = \sum_i E_i\, p_i = \frac{\sum_i E_i\, e^{-\beta E_i}}{\sum_i e^{-\beta E_i}} = -\frac{\partial \ln Q}{\partial \beta}, \qquad (8.2.2)$$
1 The former is often called the degeneracy of the energy level, indicated by $g_\ell$, while the latter (the density of states) we have already encountered in Sect. 4.4.
2 $Z$ is from the German “Zustandssumme,” sum over states.



while entropy is obtained as

$$S(T,V,N) = -k\sum_i p_i \ln p_i = -k\sum_i \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}}\,\ln\!\left(\frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}}\right) = k\beta\langle E\rangle + k\ln Q. \qquad (8.2.3)$$

Rearranging, we find ln Q = −β[E(T, V, N ) − T S(T, V, N )], or

Q = e−βF (T,V,N ) , (8.2.4)

where F (T, V, N ) is the Helmholtz free energy. This shows that Q contains all information about the
thermodynamic properties of the system.
Partition Function of the Ideal Gas

The classical ideal gas is a straightforward application of Eq. 8.2.1. Consider $N$ noninteracting particles confined to a volume $V$ and in equilibrium at temperature $T$. The Hamiltonian is $H = \sum_{i=1}^{N} \frac{\vec p_i^{\,2}}{2m}$. Therefore,

$$Q = \int \frac{d^{3N}x\, d^{3N}p}{N!\,h^{3N}}\, e^{-\beta\sum_{i=1}^N \frac{\vec p_i^{\,2}}{2m}} = \frac{V^N}{h^{3N}N!}\left(\int d^3p\, e^{-\frac{\beta p^2}{2m}}\right)^{N} = \frac{V^N}{N!}\left(\frac{2\pi m kT}{h^2}\right)^{3N/2} = \frac{1}{N!}\left(\frac{V}{\lambda^3}\right)^{N},$$

where $\lambda$ is the thermal de Broglie wave length (Eq. 5.3.10). From here, the Helmholtz free energy is calculated (with $v = V/N$ and using Stirling's formula) as

$$F = -NkT\ln\frac{V}{\lambda^3} + kT\ln N! = NkT\ln\frac{\lambda^3}{v} - NkT,$$

the pressure as

$$P = -\frac{\partial F}{\partial V} = \frac{NkT}{V},$$
and we can check that the entropy agrees with the Sackur-Tetrode expression (see exercises).
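As a numerical sanity check, one can differentiate $\ln Q$ directly and recover the ideal gas law; the sketch below uses illustrative parameters in units with $k = 1$ and $\lambda$ held fixed:

```python
import numpy as np

# Check P = kT d(ln Q)/dV = NkT/V for the ideal gas (k = 1 units).
def lnQ(V, N, lam=1.0):
    # ln Q = N ln(V/lam^3) - ln N!, with Stirling for ln N!
    return N*np.log(V/lam**3) - (N*np.log(N) - N)

N, T, V, dV = 1000.0, 1.0, 50.0, 1e-4
P = T*(lnQ(V+dV, N) - lnQ(V-dV, N))/(2*dV)   # central finite difference
print(P, N*T/V)                               # both ~ 20
```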

Note that in the example above the partition function factorizes into the product of single particle partition functions,

$$Q_N = \frac{Q_1^N}{N!}. \qquad (8.2.5)$$
This is a general consequence of the fact that the Hamiltonian of noninteracting particles is the sum of single
particle Hamiltonians. In the general case of interacting particles, this does not happen, and the partition
function is usually impossible to calculate exactly. Consider, for instance, the case of a fluid of particles
interacting through a pairwise potential, $v(\vec r_i - \vec r_j)$. The partition function then becomes

$$Q = \int \frac{d^{3N}r\, d^{3N}p}{N!\,h^{3N}}\,\exp\!\left[-\beta\sum_{i=1}^N \frac{\vec p_i^{\,2}}{2m} - \beta\sum_i\sum_{j\neq i} v(\vec r_i - \vec r_j)\right] = \frac{1}{N!}\int \frac{d^{3N}r}{\lambda^{3N}}\,\exp\!\left[-\beta\sum_i\sum_{j\neq i} v(\vec r_i - \vec r_j)\right]. \qquad (8.2.6)$$

The configurational integrals can be tackled by numerical methods or through various approximation strategies.
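The workhorse numerical method here is Markov-chain Monte Carlo. The following minimal Metropolis sketch samples the configurational Boltzmann weight $e^{-\beta U}$ for a small two-dimensional fluid with a Lennard-Jones-like pair potential; the box size, potential, and all parameters are illustrative choices, not prescriptions from the text:

```python
import numpy as np

# Minimal Metropolis sampler for exp(-beta*U) of a small 2D fluid in a
# periodic box (illustrative parameters; LJ units sigma = eps = 1).
rng = np.random.default_rng(0)
N, L, beta, step = 16, 6.0, 1.0, 0.3
g = np.arange(4)*(L/4) + L/8
x = np.array([(a, b) for a in g for b in g])     # start on a square lattice

def energy(conf):
    U = 0.0
    for i in range(N):
        d = conf - conf[i]
        d -= L*np.round(d/L)                     # minimum-image convention
        r2 = np.sum(d**2, axis=1)
        r2[i] = np.inf                           # exclude self-interaction
        U += 0.5*np.sum(4.0*(r2**-6 - r2**-3))   # pairwise LJ energy
    return U

U = energy(x)
for move in range(5000):
    i = rng.integers(N)
    trial = x.copy()
    trial[i] = (trial[i] + rng.normal(0, step, 2)) % L
    dU = energy(trial) - U
    if dU < 0 or rng.random() < np.exp(-beta*dU):  # Metropolis acceptance
        x, U = trial, U + dU
print("sampled potential energy:", U)
```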

8.3 Energy Fluctuations and Heat Capacity


Among the properties of a system that are readily accessible through the partition function is the heat capacity. Recall that

$$C_V = \frac{\partial E}{\partial T} = -\frac{\beta}{T}\frac{\partial E}{\partial \beta}$$



at constant $V, N$. Thus, from Eq. 8.2.2, it follows that

$$C_V = \frac{\beta}{T}\frac{\partial}{\partial\beta}\frac{\partial\ln Q}{\partial\beta} = \frac{\beta}{T}\left[\frac{1}{Q}\frac{\partial^2 Q}{\partial\beta^2} - \frac{1}{Q^2}\left(\frac{\partial Q}{\partial\beta}\right)^2\right] = \frac{1}{kT^2}\left(\langle E^2\rangle - \langle E\rangle^2\right) = \frac{1}{kT^2}\left\langle\left(E - \langle E\rangle\right)^2\right\rangle, \qquad (8.3.1)$$
where we have used the fact that the partition function generates the n-th moment of E through its n-th
order derivative with respect to β. Equation 8.3.1 confirms that heat capacity is nonnegative, since it is the
average of a squared quantity (the mean square fluctuation of energy).
Heat capacity of two-level system

Consider a system with two nondegenerate energy levels $E_0 = 0$ and $E_1 = \epsilon$. In this simple case, the moments are given by $\langle E^n\rangle = \epsilon^n\frac{e^{-\beta\epsilon}}{1+e^{-\beta\epsilon}}$ by straightforward calculation. It is instructive, however, to derive them by differentiation of the partition function $Q = 1 + e^{-\beta\epsilon}$. Then, $\langle E\rangle = -\frac{\partial\ln Q}{\partial\beta} = \epsilon\frac{e^{-\beta\epsilon}}{1+e^{-\beta\epsilon}}$ and $\langle E^2\rangle = (-1)^2\frac{1}{Q}\frac{\partial^2 Q}{\partial\beta^2} = \epsilon^2\frac{e^{-\beta\epsilon}}{1+e^{-\beta\epsilon}}$. The heat capacity is

$$C_V = \frac{1}{kT^2}\left(\langle E^2\rangle - \langle E\rangle^2\right) = \frac{\epsilon^2}{kT^2}\frac{e^{-\beta\epsilon}}{(1+e^{-\beta\epsilon})^2}, \qquad (8.3.2)$$

or, in terms of the dimensionless parameter $x = \beta\epsilon$,

$$C_V = kx^2\frac{e^x}{(1+e^x)^2}. \qquad (8.3.3)$$

Note that the heat capacity has a maximum around $x = 2.4$, or $kT = 0.42\,\epsilon$ (the so-called Schottky anomaly), and vanishes in both high and low temperature limits. The result has a simple interpretation in terms of fluctuations: in those limits, the system is either “frozen” in the lower energy level, or it populates both equally; in either case, a small change in temperature cannot elicit an appreciable response. On the other hand, when $kT$ is of order $\epsilon$, thermal fluctuations are very effective at promoting excursions between the two energy levels, causing a peak in the heat capacity.
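A short numerical sketch locating the Schottky maximum (in units $k = 1$, $x = \beta\epsilon$):

```python
import numpy as np

# Locate the maximum of C_V/k = x^2 e^x/(1+e^x)^2 on a fine grid.
x = np.linspace(0.01, 10.0, 100000)
cv = x**2*np.exp(x)/(1 + np.exp(x))**2
i = np.argmax(cv)
print("x* =", x[i], "  kT*/eps =", 1/x[i], "  C_V(max)/k =", cv[i])
# -> x* ~ 2.40, kT*/eps ~ 0.42, C_V(max)/k ~ 0.44, as quoted in the text
```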

8.4 Equivalence of Microcanonical and Canonical Routes to Thermodynamics
We stated earlier on that all information is encoded in the multiplicity of a system, $W(E,V,N)$, or equivalently in the density of states $g(E,V,N)$, from which thermodynamics follows from Boltzmann's entropy formula. In which way are the microcanonical and canonical approaches related? The answer is, mathematically, very simple, and physically quite deep. The canonical partition function is the Laplace transform of the density of states in phase space:

$$Q(\beta) = \mathcal{L}(g) = \int dE\, g(E)\, e^{-\beta E}. \qquad (8.4.1)$$

This relation implies that the logarithm of Q is related to the logarithm of g(E) by a Legendre transformation,
which is just what connects Helmholtz free energy to entropy (or energy). The relation between Laplace
and Legendre transforms becomes exact in the thermodynamic limit, which means for N → ∞. This can be
shown by evaluating the integral in Eq. 8.4.1 using the saddle point method. First note that the integrand
of Eq. 8.4.1 is the product of two factors, the density of states and the Boltzmann factor; the first grows
rapidly with energy,3 while the second decreases exponentially with it. We therefore expect the product to
3 Although we have shown this explicitly only for the ideal gas, see Eq. 4.4.1, we can easily see that the conclusion remains
valid in the presence of interactions.



be strongly peaked at a particular value of energy, $E^*$. Next, we rewrite the integrand as

$$g(E)\,e^{-\beta E} = \Delta^{-1}\exp\!\left[\frac{S(E)}{k} - \beta E\right],$$

where $\Delta$ is some arbitrary constant with dimensions of energy, which we disregard hereafter; next, note that in this way, the integrand takes on the form $e^{-N\varphi(\eta)}$ with $\eta = E/N$ and $\varphi(\eta) = \beta\eta - S(\eta,v,1)/k$. The maximum of the integrand corresponds to the minimum of $\varphi$, so we can solve for $E^* = N\eta^*$ by setting $\frac{d\varphi}{d\eta} = 0$. We find

$$\left.\frac{\partial S(\eta,v,1)}{\partial\eta}\right|_{\eta=\eta^*} = \left.\frac{\partial S(E,V,N)}{\partial E}\right|_{E=E^*} = \frac{1}{T}.$$

Proceeding to the second derivative, we find

$$\left.\frac{\partial^2\varphi}{\partial\eta^2}\right|_{\eta=\eta^*} = -\frac{1}{k}\left.\frac{\partial^2 S(\eta,v,1)}{\partial\eta^2}\right|_{\eta=\eta^*} = -\frac{N}{k}\left.\frac{\partial}{\partial E}\frac{1}{T}\right|_{E=E^*} = \frac{N}{kT^2 C_V},$$

which is indeed a minimum since $C_V > 0$. Expanding $\varphi$ in Taylor series around $\eta^*$, we find

$$e^{-N\varphi(\eta)} \approx e^{-N\varphi(\eta^*) - \frac{N^2}{2kT^2C_V}(\eta-\eta^*)^2} = g(E^*)\,e^{-\beta E^*}\,e^{-\frac{(E-E^*)^2}{2kT^2C_V}}. \qquad (8.4.2)$$

The gaussian integral then evaluates to $\sqrt{2\pi kT^2C_V}$. Upon taking the logarithm, it contributes a term $O(\ln N)$, which is much smaller than the extensive, $O(N)$, terms $E^* - TS(E^*)$ in the thermodynamic limit.
This is a very important result because it shows that
(a) the energy of a system in equilibrium at constant temperature is gaussian distributed in the thermody-
namic limit;
(b) since the standard deviation of the distribution is proportional to N 1/2 while the mean is proportional
to N , the distribution approaches a δ-function in the thermodynamic limit: almost all samples of the energy
of a macroscopic system yield the average value.
Thus, it does not matter whether we derive thermodynamics starting from a large isolated system with con-
stant energy, or from a large system in equilibrium with a constant temperature heat bath: both descriptions
will predict the same thermodynamic properties.
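This sharpening of the energy distribution is easy to exhibit numerically. For $N$ independent two-level units the canonical energy is binomially distributed, and its relative width shrinks as $N^{-1/2}$; the parameters below are illustrative:

```python
import numpy as np

# Relative energy fluctuations of N independent two-level units (eps = kT = 1):
# E = eps * Binomial(N, p), so std(E)/mean(E) ~ N**-0.5.
rng = np.random.default_rng(1)
p = np.exp(-1.0)/(1 + np.exp(-1.0))          # single-unit excitation probability
for N in (100, 10_000, 1_000_000):
    E = rng.binomial(N, p, size=5000).astype(float)
    print(N, E.std()/E.mean())               # shrinks ~ tenfold per factor 100
```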

8.5 Partition Function of the Classical Harmonic Oscillator


One of the most useful models in physics is the harmonic oscillator. It describes the motion of a particle
displaced slightly from its equilibrium position and subject to a linear restoring force. It can be used, for
instance, as a crude model of a crystal, where an atom oscillates about its equilibrium position at each lattice
site. If ω is the angular frequency of the oscillation, the Hamiltonian is given, in one dimension, by

$$H = \frac{p^2}{2m} + \frac{m\omega^2 x^2}{2}. \qquad (8.5.1)$$
The partition function involves momentum and coordinate integrations. The momentum integral is a gaussian integral, exactly the same as for the ideal gas; the coordinate integral is also gaussian, so it yields the same dependence on temperature:4

$$Q = \int \frac{dp\,dx}{2\pi\hbar}\,\exp\!\left(-\beta\frac{p^2}{2m}\right)\exp\!\left(-\beta\frac{m\omega^2x^2}{2}\right) = \frac{1}{\beta\hbar\omega}. \qquad (8.5.2)$$

The thermodynamical quantities of interest are the internal energy

$$\langle E\rangle = -\frac{\partial\ln Q}{\partial\beta} = -\frac{\partial\ln(kT/\hbar\omega)}{\partial\beta} = \frac{1}{\beta} = kT \qquad (8.5.3)$$

4 We use the customary notation $\hbar = h/2\pi$.



and the heat capacity

$$C_V = \frac{d\langle E\rangle}{dT} = k. \qquad (8.5.4)$$

This result is a consequence of a general theorem:

Classical Equipartition Theorem. Each degree of freedom that contributes quadratically to the classical Hamiltonian of a system will contribute an amount $\frac{kT}{2}$ to its internal energy and an amount $\frac{k}{2}$ to the heat capacity.

For a crystal with $N$ atoms, classical equipartition implies a constant heat capacity $C_V = 3Nk$, as there are 6 degrees of freedom per atom (three coordinates and three momenta) that contribute a quadratic term to the Hamiltonian. This is actually observed for most crystals at room temperature, and is known as the law of Dulong-Petit. However, the law fails disastrously at low temperature, where the heat capacity is found experimentally to go to zero as $T^3$. Of course, we already knew that it must fail if entropy is well defined (Sect. 5.8).5
Example: Rigid Rotator

Consider a diatomic molecule, like N₂. Regarding it as a rigid dumbbell aligned with the $\hat z$-axis, we can write the Hamiltonian as the sum of the kinetic energy of the center of mass, $\frac{p_x^2+p_y^2+p_z^2}{2M}$, and the rotational energy, or kinetic energy in the center of mass frame, $\frac{L_x^2+L_y^2}{2I}$. Here, $I$ is the moment of inertia about the $\hat x$- or $\hat y$-axes ($I_z = 0$, so there is no degree of freedom or energy associated with rotation about the molecular axis). Since the Hamiltonian has five quadratic degrees of freedom, by the equipartition theorem, $C_V = \frac{5}{2}k$.
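Equipartition is also easy to verify by direct sampling, since each quadratic degree of freedom is Gaussian-distributed in the canonical ensemble. A minimal sketch (illustrative parameters, $k = m = \omega = 1$):

```python
import numpy as np

# Each quadratic degree of freedom contributes kT/2 to <E>: sample the
# Boltzmann distribution of one harmonic oscillator (k = m = omega = 1).
rng = np.random.default_rng(2)
T = 0.7
p = rng.normal(0, np.sqrt(T), 1_000_000)   # exp(-p^2/2T) is Gaussian
x = rng.normal(0, np.sqrt(T), 1_000_000)   # exp(-x^2/2T) likewise
print((p**2/2).mean(), (x**2/2).mean(), "each ~ T/2 =", T/2)
```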

8.6 Partition Function of the Quantized Harmonic Oscillator


In quantum mechanics, it is often convenient to specify the states of a system by the eigenvalues of the
Hamiltonian and their degeneracies. These are all the possible values of energy that can be measured, and
the degeneracy reflects any other degree of freedom that is indifferent to the measurement and can take on
their own values independently.6 Other ways of describing the state of a system are equally valid; the energy
route is particularly convenient because of the structure of the canonical partition function. The energy
levels of Hamiltonian 8.5.1, found by solving the Schrödinger equation, are

$$E_n = \hbar\omega\, n + \frac{\hbar\omega}{2}, \qquad n = 0, 1, \ldots, \infty \qquad (8.6.1)$$
and are singly degenerate (gn = 1 for all n). The quantity ~ω is often referred to as “quantum of energy” and
n is the number of such quanta present at a given energy, or “occupation number.” The partition function
is calculated by summing the geometric series

$$Q = e^{-\frac{\beta\hbar\omega}{2}}\sum_{n=0}^{\infty} e^{-\beta\hbar\omega n} = \frac{e^{-\frac{\beta\hbar\omega}{2}}}{1 - e^{-\beta\hbar\omega}} = \frac{1}{2\sinh\frac{\beta\hbar\omega}{2}}. \qquad (8.6.2)$$

The internal energy is

$$\langle E\rangle = -\frac{\partial\ln Q}{\partial\beta} = \frac{\partial\ln\!\left(\sinh\frac{\beta\hbar\omega}{2}\right)}{\partial\beta} = \frac{\hbar\omega}{2}\coth\frac{\beta\hbar\omega}{2} = \hbar\omega\left(\frac{1}{2} + \frac{1}{e^{\beta\hbar\omega}-1}\right) \qquad (8.6.3)$$

and from it, we recognize that

$$\langle n\rangle = \frac{\langle E\rangle}{\hbar\omega} - \frac{1}{2} = \frac{1}{e^{\beta\hbar\omega}-1}, \qquad (8.6.4)$$
5 The classical gas heat capacity result is also invalid at low temperature. This miserable failure of classical statistical mechanics is in fact what prompted the initial discovery of quantum mechanics.
6 For example, energy may not depend on orbital angular momentum, spin, or polarization.



which is sometimes called the average occupation number. The heat capacity is

$$C_V = \frac{d\langle E\rangle}{dT} = -k\beta^2\frac{d\langle E\rangle}{d\beta} = k\left(\frac{\beta\hbar\omega/2}{\sinh(\beta\hbar\omega/2)}\right)^2. \qquad (8.6.5)$$

How does this formula compare to the classical case? In general, we compare quantum mechanical to classical results by setting $\hbar \to 0$. Here, we see that, since $\frac{x}{\sinh x} \to 1$ for $x \to 0$, we recover the equipartition result in the classical limit. Now let us study the important limits of high and low temperature, $\beta \to 0$ and $\beta \to \infty$, respectively. We immediately realize that the high $T$ limit and the classical limit coincide, since $\beta$ multiplies $\hbar$. Hence, in the high temperature limit, the quantum mechanical oscillator behaves like a classical oscillator. In the low $T$ limit, however, $\beta \to \infty$ causes the denominator to diverge exponentially, winning over the linearly diverging numerator. Therefore, the heat capacity vanishes in this limit, unlike the prediction of classical equipartition. This result might have been expected, of course, once it is realized that, near $T = 0$, any system with discrete energy levels looks like a two level system (looking up from the ground state, all that matters to the system is the first rung in the energy ladder).
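The two limits are easy to see numerically (a sketch in units $k = 1$, $x = \beta\hbar\omega$):

```python
import numpy as np

# Quantum oscillator heat capacity C_V/k = (x/2 / sinh(x/2))^2, x = beta*hbar*omega.
for x in (0.01, 0.1, 1.0, 5.0, 20.0):
    cv = (x/2/np.sinh(x/2))**2
    print(f"x = {x:6.2f}   C_V/k = {cv:.4f}")
# x -> 0 recovers equipartition (C_V -> k); large x freezes the mode out.
```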
Example: Heat Capacity of Diatomic Gases

Consider a diatomic molecule. This time, take into account the molecular bond between the two atoms; think of it as a stiff spring rather than a rigid dumbbell. Although the experimental heat capacity of a diatomic molecule at room temperature, $C_V = \frac{5}{2}k$, is explained purely by its translational and rotational degrees of freedom, the bond should contribute as well. Typical bond energies are $\hbar\omega \sim 0.1$–$0.5$ eV, while at room temperature $kT \approx 0.025$ eV. This means that the vibrational degree of freedom is “frozen out”: the molecule is sitting in the vibrational ground state as if it were at zero temperature. At higher than room temperatures (e.g., $\sim 10^3$ K) the heat capacity becomes indeed $\frac{7}{2}k$. Conversely, at lower than room temperatures ($< 10^2$ K) the rotational degrees of freedom freeze out: due to quantization of angular momentum ($L^2 = \hbar^2\ell(\ell+1)$, $\ell = 0, 1, \ldots$), the molecule eventually settles in the rotational ground state when $kT \ll \frac{\hbar^2}{2I}$. Although the mechanical origin of the energy spectrum is different, statistical mechanics provides a unified explanation of the experimental observations. However, the vanishing of $C_V$ at $T = 0$ in spite of the remaining translational degrees of freedom has a completely different origin.

8.7 Heat Capacity of Crystals and Blackbody Radiation


The harmonic oscillator also describes the dynamical behavior of the normal modes of a many-particle,
interacting mechanical system close to mechanical equilibrium, since in those conditions the potential energy
can be expanded in Taylor series of the coordinates through second order, and the resulting quadratic form
can be diagonalized. For an isolated system of N particles, with 3N coordinates, there will be 3N − 3
normal modes, the eigenvectors of the quadratic form in the center of mass (the center of mass can translate
freely). Let α be an index labelling the normal modes; it will take on 3N − 3 values. Thus, a description
of a system using the normal modes contains the same information as the original description in terms of
real space coordinates and momenta: the state of the system is specified by listing the occupation numbers
nα of each normal mode. The normal modes behave approximately as independent harmonic oscillators,
each eigenvector having a particular eigenfrequency, ωα . For a crystal, the lowest energy normal modes are
sound waves, and are called the acoustic phonons. They are quantized long wave length oscillations of the
crystal’s unit cells about their equilibrium positions on the crystal lattice; the index α is the wave vector ~k
of the oscillation.7 Since the normal modes are independent, the partition function for each polarization is
the product of the single-mode partition functions
$$Q = \prod_{\vec k}\frac{1}{2\sinh\frac{\beta\hbar\omega(\vec k)}{2}}. \qquad (8.7.1)$$

7 There are three polarization states for each vibration, one longitudinal and two transverse, which must also be accounted
for by proper indexing. If we approximate them as degenerate, we can simply multiply occupation numbers by a factor of 3.



The internal energy of the crystal is

$$\langle E\rangle = -3\frac{\partial\ln Q}{\partial\beta} = 3\sum_{\vec k}\hbar\omega(\vec k)\left(\frac{1}{2} + \frac{1}{e^{\beta\hbar\omega(\vec k)}-1}\right), \qquad (8.7.2)$$

where the factor of 3 takes into account the three possible polarization states. The sum over normal modes
is a sum over a very dense grid of wave vectors and can be replaced by an integral. After some bookkeeping, one arrives at

$$E = 9NkT\frac{T^3}{T_D^3}\int_0^{T_D/T} dx\,\frac{x^3}{e^x-1}, \qquad (8.7.3)$$

where the “Debye temperature” is set by the energy of the highest vibrational mode (the mode with shortest
wave length, comparable to the unit cell size). It is interesting to examine the two limiting cases of low and
high temperature:

$$E = \begin{cases} \displaystyle 9NkT\frac{T^3}{T_D^3}\int_0^{\infty} dx\,\frac{x^3}{e^x-1} = AT^4 & T \ll T_D;\\[3mm] \displaystyle 9NkT\frac{T^3}{T_D^3}\int_0^{T_D/T} dx\,x^2 = 3NkT & T \gg T_D, \end{cases} \qquad (8.7.4)$$

which imply for the heat capacity

$$C_V = \begin{cases} 4AT^3 & T \ll T_D;\\ 3Nk & T \gg T_D. \end{cases} \qquad (8.7.5)$$

Classical equipartition is recovered at high temperature, while at low temperature the heat capacity has a
power law dependence on temperature. This power law behavior is brought forth by the long wave length
phonon modes, and is observed experimentally in insulating crystals (the heat capacity of conductors has a
contribution from free electrons). It is called the Debye specific heat.
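The crossover between the two regimes in Eq. 8.7.5 can be traced numerically by evaluating the Debye integral of Eq. 8.7.3 and differentiating; a minimal sketch in reduced units ($k = N = 1$, illustrative $T_D$):

```python
import numpy as np
from scipy.integrate import quad

# Debye internal energy (Eq. 8.7.3, k = N = 1) and C_V by finite differences.
TD = 1.0   # Debye temperature in reduced units (illustrative)

def E(T):
    I, _ = quad(lambda x: x**3/np.expm1(x), 0.0, TD/T)
    return 9.0*T*(T/TD)**3*I

for T in (0.02, 0.05, 0.5, 2.0, 10.0):
    h = 1e-4*T
    CV = (E(T+h) - E(T-h))/(2*h)
    print(f"T/TD = {T/TD:6.2f}   C_V/3Nk = {CV/3:.4f}")
# Low T: C_V ~ (12 pi^4/5)(T/TD)^3; high T: C_V -> 3Nk (Dulong-Petit).
```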
The low temperature limit of the equations just derived describes correctly another important physical
system, blackbody radiation. Blackbody radiation is the name given to electromagnetic field oscillations in
a cavity in thermal equilibrium.8 A blackbody emits radiation with a spectral distribution characteristic of
temperature only that be derived from the partition function in Eq. 8.7.1 proceeding in way similar to the
crystal, except for the facts that (a) photons only have two polarization states; (b) there is no underlying
lattice, so there is no minimum wave length; and (c) there is no fixed number of underlying degrees of
freedom. When this is taken into account (details left as an exercise), the energy density is calculated as

E π 2 (kT )4
= . (8.7.6)
V 15 (~c)3

Multiplying by the speed of light, one obtains the energy flux radiated by the black body, known as the
Stefan-Boltzmann law.

8 The name derives from the fact that a small hole in the cavity wall will absorb any photon incident upon it (perfect absorber, hence “black”); it must also be a perfect emitter if it is to remain in equilibrium at constant temperature.



Blackbody Thermodynamics

An instructive exercise in the application of thermodynamic potentials is to work out the thermody-
namic properties of blackbody radiation. We can begin from our calculation of the energy equation
of state,
E = bV T 4 ,
where b is a constant, from which we also have

CV = 4bV T 3 .

According to Sect. 7.2, this is a false start, because E(T, V, N ) is an equation of state, not a fun-
damental relation. Of course, we could evaluate the partition function and get the free energy from
there, but it turns out that this is not needed. The key point is that, as we have already observed,
the number of particles (electromagnetic oscillators, or photons) is fluctuating. The thermodynamic
potentials do not depend on it, which means that the chemical potential is zero. Hence, we do have
an additional equation of state, µ = 0, to complement the energy equation of state, so the problem
is not underdetermined. We can proceed to calculate entropy by integrating the heat capacity:
Z T
2 4
S(T, V ) = S(0) + dT 0 4bV T 0 = bV T 3 .
0 3

Then, the Helmholtz free energy is


$$F(T,V) = E - TS = -\frac{1}{3}bVT^4.$$
From here, we can calculate the pressure,

$$P = -\left.\frac{\partial F}{\partial V}\right|_T = \frac{1}{3}bT^4,$$

and we obtain the pressure equation of state of a gas of photons,


$$PV = \frac{E}{3}.$$
Note the factor of 2 difference from the ideal gas (Eq. 5.3.1), which is due to the fact that photons are
relativistic particles, so their energy has a linear dependence on momentum, instead of a quadratic
one.

8.8 Exercises
1. Starting from the ideal gas partition function, calculate the entropy and show that it agrees with the
Sackur-Tetrode expression.

2. Show that for a classical gas the probability distribution of the velocities is Maxwellian (cf. Sect. 3.10).
3. Work out Eq. 8.7.6 from a suitable modification of Eq. 8.7.2. (Hint: replace $\sum_{\vec k}$ with $\frac{V}{(2\pi)^3}\int d^3k$; you will need $\int_0^\infty dx\,\frac{x^3}{e^x-1} = \frac{\pi^4}{15}$. You should shift the energy of all oscillators to be zero in the ground state to obtain the desired result.)

4. Calculate how pressure varies with volume for a reversible adiabatic expansion of a photon gas.
5. Consider a classical particle in the potential well $V(x) = a\frac{x^2}{2} + b\frac{x^4}{4}$, with $a$ and $b$ positive constants. Use a Taylor expansion to calculate approximately the partition function $Q$, the average energy $\langle E\rangle$,



and the heat capacity CV through 1st order in b. State a meaningful (dimensionless) criterion for
the validity of the approximation. Explain on physical grounds why CV is smaller than for the pure
harmonic potential.


Chapter 9

Statistical Mechanics of Open Systems

9.1 Equilibrium under Particle Flux


Consider a one-component, two-phase system, where the particles can move between phase A and phase B;
the two phases are at constant and equal temperature T and pressure P . We already know, from Gibbs’
phase rule, that P and T cannot be independent; now, we show that the two phases in equilibrium must have
the same chemical potential. At constant T and P , and total number of particles N = NA + NB , the system
is in equilibrium when the total Gibbs potential is minimum: dG = dGA + dGB = µA dNA + µB dNB = 0.
Since N is constant, dNA = −dNB , so µA = µB . Thus, chemical potential differences are the driving force
for particle exchange, much in the same way as temperature gradients drive energy (heat) exchange. Note
that the condition µA (P, T ) = µB (P, T ) implies that P cannot be independent of T , in agreement with the
phase rule.
Clausius-Clapeyron Equation

As an application, let us determine the shape of the liquid-vapor coexistence curve. Consider a liquid
in equilibrium with its vapor. As we take a step along the curve, we have that the chemical potentials
of the two phases remain equal. Thus, dµv = dµl . Since dµ = −sdT + vdP , with s and v specific
entropies and volumes, we have (sv − sl )dT = (vv − vl )dP , and noting that T (sv − sl ) = ∆hvap is
the specific enthalpy of vaporization, also called latent heat, and that $v_v \gg v_l$, with $v_v \approx kT/P$ (away from the critical point!), we find

$$\frac{dP}{dT} = \frac{\Delta h_{vap}\, P}{kT^2},$$

and, integrating, we find

$$P = P_0\exp\!\left[\Delta h_{vap}\left(\frac{1}{kT_0} - \frac{1}{kT}\right)\right]. \qquad (9.1.1)$$
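As a rough numerical illustration of Eq. 9.1.1 (a sketch with water-like numbers, $\Delta h_{vap} \approx 0.42$ eV per molecule and $P_0 = 1$ atm at $T_0 = 373$ K; these values are assumptions for illustration, not a fit):

```python
import numpy as np

# Integrated Clausius-Clapeyron vapor pressure, Eq. 9.1.1.
k_eV = 8.617e-5                  # Boltzmann constant, eV/K
dh, P0, T0 = 0.42, 1.0, 373.0    # rough water-like numbers (illustrative)

def P(T):
    return P0*np.exp(dh*(1/(k_eV*T0) - 1/(k_eV*T)))

for T in (300.0, 330.0, 373.0, 400.0):
    print(f"T = {T:5.1f} K   P = {P(T):7.3f} atm")
```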

9.2 Probability Distribution of an Open System at Constant (T, V, µ)


Consider a system at constant (T, V, N ). Let its partition function be Q(T, V, N ) = e−βF (T,V,N ) . If we carve
out of it a subsystem in a small volume, V1 , in equilibrium it will have the same temperature T , but a
variable number of particles N1 , with 0 ≤ N1 ≤ N . Let us neglect the interactions between the subsystem
and the rest of the system, except for the possibility of particle exchange. Then,1
$$Q(T,V,N) = \sum_{N_1=0}^{N} Q(T, V-V_1, N-N_1)\, Q(T, V_1, N_1), \qquad (9.2.1)$$

1 Note that we should not include the multiplicity factor of $\binom{N}{N_1}$ to count all the ways of splitting the particles between the two subsystems. The Gibbs prescription already accounts for that, as is verified easily.


which suggests that we take as the properly normalized probability of $N_1$ particles being in $V_1$ the expression

$$p(N_1; T, V_1) = \frac{Q(T, V-V_1, N-N_1)\,Q(T, V_1, N_1)}{Q(T,V,N)} = Q(T, V_1, N_1)\,e^{-\beta[F(T, V-V_1, N-N_1) - F(T,V,N)]}. \qquad (9.2.2)$$
Now recall that since $V_1 \ll V$, the complement system of $V_1$ can be viewed as a reservoir of heat and particles, in the sense that in equilibrium, all its average extensive properties will be much larger than those of the system (in the ratio $V : V_1$); therefore we can Taylor expand the free energy of the reservoir and use the relations $\frac{\partial F}{\partial V} = -P$ and $\frac{\partial F}{\partial N} = \mu$. Now we let $N \to \infty$ and $V \to \infty$ for the reservoir, and we drop the subscripts “1” from $N_1$ and $V_1$ for the system to obtain the probability distribution

$$p(N; T, V) = Q(T,V,N)\,e^{\beta\mu N}\,e^{-\beta PV}. \qquad (9.2.3)$$

Rearranging, summing over all particles in the open subsystem, and using the normalization of probability,
we find

$$e^{\beta PV}\sum_{N} p(N; T, V) = e^{\beta PV} = \sum_{N=0}^{\infty} Q(T,V,N)\,e^{\beta\mu N}. \qquad (9.2.4)$$

The rhs of the equation is called the grand canonical partition function, denoted by Ξ:

$$\Xi(T, V, \mu) = \sum_{N=0}^{\infty} e^{\beta\mu N}\, Q(T,V,N). \qquad (9.2.5)$$

The thermodynamic potential associated with it is called the Grand Potential Ω, a.k.a. the Landau Potential
or Landau free energy, defined by
Ξ(T, V, µ) = e−βΩ . (9.2.6)
For a P V system, it is obvious from Eq. 9.2.4 that Ω = −P V . To generalize to any type of work, it is
convenient to express the Landau potential directly as the Legendre transform of the Helmholtz free energy
with respect to the particle number:

Ω(T, V, µ) = F (T, V, N ) − µN. (9.2.7)

The grand canonical partition function is often written in terms of fugacity, defined as

z = eβµ , (9.2.8)

so that one has

$$\Xi(T, V, z) = \sum_{N=0}^{\infty} z^N\, Q(T,V,N). \qquad (9.2.9)$$

Since the system can exchange both energy and particles with a reservoir, they are both stochastic variables
with expectations given by2

$$\langle N\rangle = z\left.\frac{\partial\ln\Xi}{\partial z}\right|_{T,V}, \qquad (9.2.10a)$$

$$\langle E\rangle = -\left.\frac{\partial\ln\Xi}{\partial\beta}\right|_{V,z}. \qquad (9.2.10b)$$

Similar to the relation between microcanonical and canonical distributions, the grand canonical also pro-
vides an equivalent description to the canonical distribution, in the sense that the particle number, albeit
fluctuating, is Gaussian distributed and strongly peaked around the canonical value (the average value). The
variance of the particle number fluctuation is given by
$$\langle N^2\rangle - \langle N\rangle^2 = (kT)^2\frac{\partial^2\ln\Xi(T,V,\mu)}{\partial\mu^2} = \langle N\rangle\, kT\,\frac{\kappa_T}{v}, \qquad (9.2.11)$$
2 It is important to remember which variables are kept constant. A different expression for the energy is obtained by working at constant $\mu$ rather than $z$! (See exercises.)



where $\kappa_T$ is the isothermal compressibility, which cannot, therefore, be negative. Just as we saw for the energy fluctuation, the mean square particle fluctuation is an extensive quantity, which implies that the relative standard deviation vanishes as $\langle N\rangle^{-1/2}$ in the thermodynamic limit.
Ideal gas in the grand canonical ensemble

The grand canonical partition function of the ideal gas is


$$\Xi(T, V, z) = \sum_{N=0}^{\infty} z^N Q(T,V,N) = \sum_{N=0}^{\infty} z^N \frac{V^N}{\lambda^{3N} N!} = e^{\frac{zV}{\lambda^3}}. \qquad (9.2.12)$$

Thus, $\ln\Xi = \frac{zV}{\lambda^3}$ and $\langle N\rangle = z\left.\frac{\partial\ln\Xi}{\partial z}\right|_{T,V} = \ln\Xi$. But $\ln\Xi = \beta PV$, so $\beta PV = \langle N\rangle$, which is the equation of state of the ideal gas.
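A quick numerical check of the box above: truncating the sum over $N$ at a value well past the mean reproduces $\langle N\rangle = zV/\lambda^3$ (Poisson statistics). Parameters are illustrative:

```python
# <N> from the truncated grand canonical sum of the ideal gas.
z, V, lam = 0.8, 30.0, 1.0
a = z*V/lam**3                    # expected <N> (= 24 here)
w, ws, Nw = 1.0, 0.0, 0.0         # w = a^N/N!, running value and sums
for N in range(200):
    ws += w
    Nw += N*w
    w *= a/(N + 1)
print(Nw/ws, "vs", a)             # both ~ 24
```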

9.3 Adsorption Equilibrium


Physical adsorption is a phenomenon that takes place at phase boundaries or interfaces, where molecules
from a bulk phase tend to segregate onto the boundary because of favorable interaction with the other
phase. For example, gas molecules in a container can adsorb onto the walls, and possibly build a thick film.
Physical adsorption is driven by van der Waals attraction and does not involve the formation of new chemical
species, but may or may not involve the formation of new phases. When chemical bonds are formed, one
talks of chemisorption. The binding of ligands to an enzyme is conceptually similar to the adsorption of gas
molecules onto a substrate.
Pictured at right is a lattice model of a surface in equilibrium with a
bulk phase. There are M sites altogether, of which N are surface sites,
and m < M particles altogether; of these, n ≤ N ≤ m particles can
reside on the surface. An atom in the bulk has energy $E = 0$ (ideal lattice gas), and an atom on the surface has energy $E = -\epsilon < 0$. The partition function of the system can be written conveniently after listing all possible energy states, which are $E_n = -n\epsilon$, $n = 0, \ldots, N$, and their degeneracies, which are $g_n = \binom{N}{n}\binom{M-N}{m-n}$:

$$Q = \sum_{n=0}^{N} g_n e^{-\beta E_n} = \sum_{n=0}^{N}\binom{M-N}{m-n}\binom{N}{n}\, e^{\beta\epsilon n}. \qquad (9.3.1)$$

To examine the behavior of the system as a function of temperature, we note that for $T \to \infty$ ($\beta = 0$), the Boltzmann factors are all unity, so all $M$ sites are equivalent, and $Q = \binom{M}{m}$, which is what the summation evaluates to (Vandermonde convolution); in the opposite limit, $T \to 0$ ($\beta \to \infty$), the sum is dominated by the largest term ($n = N$), so all surface sites are filled.
To examine the behavior of the system as a function of gas pressure, let us assume $N \ll m$. In this case, the surface can be regarded as an open system in equilibrium with a particle reservoir, which we take to be an ideal gas, so we can put

$$\mu_{ads} = \mu_{ig} = kT\ln\frac{P}{P_0}.$$
The grand canonical partition function of the adsorbate is

$$\Xi(T, N, z) = \sum_{n=0}^{N}\binom{N}{n}\, z^n e^{\beta\epsilon n} = \left(1 + ze^{\beta\epsilon}\right)^N, \qquad (9.3.2)$$

with $z = e^{\beta\mu_{ig}} = P/P_0$. Note the form of the partition function as the product of $N$ single-site partition functions, due to the assumption that sites are independent (and distinguishable). The surface coverage, $\theta$,



is found from Eq. 9.2.10:

$$\theta = \frac{\langle n\rangle}{N} = \frac{1}{N}\frac{\partial\ln\Xi}{\partial\ln z} = z\frac{\partial\ln(1+ze^{\beta\epsilon})}{\partial z} = \frac{ze^{\beta\epsilon}}{1+ze^{\beta\epsilon}} = \frac{KP}{1+KP}, \qquad (9.3.3)$$

with $K = e^{\beta\epsilon}/P_0$. This is the Langmuir isotherm. Rearranging the Langmuir isotherm equation, we can write

$$\frac{\theta}{(1-\theta)P} = K,$$

which makes it manifest (at least to chemistry students) that $K$ is the equilibrium constant for the process $S + X \rightleftharpoons SX$ binding the substrate to the adsorbate.
Example: Cooperative Binding

The structure of the site partition function of the adsorbate reveals that it is a polynomial of degree
equal to the number of adsorbed particles per site (or in the number of ligands per enzyme). This
is a general feature of the grand canonical partition function, which is a polynomial in z for finite
number of particles. For historical reasons, the partition function for ligand adsorption goes by the
name of binding polynomial. A plot of the average site occupation number, θ, versus pressure (or
bulk phase concentration [L]) is often used to extract information about cooperativity in the binding
process. Positive cooperativity means that subsequent binding events are facilitated by previous
binding events, e.g., when the first ligand deforms the enzyme such as to make it more favorable to
bind the next ligand. For instance, assume a two-ligand adsorption process where the second ligand
has a much more favorable adsorption energy; then, by the definition of the equilibrium constant K
in Eq. 9.3.3, we see that the isotherm is dominated by the quadratic term of the polynomial. This can
result in an isotherm with a sigmoidal shape. The effective degree of the polynomial (which reveals
the effective number of ligands cooperating in the binding event) can be extracted from the slope of
the logarithmic plot of $\ln\frac{\theta}{1-\theta}$ vs $\ln[L]$ (the Hill plot). Positive cooperativity is famously observed in the case of O₂
adsorption by hemoglobin, where the degree of the polynomial is 4 and the slope of the Hill plot is
about 2.8.
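A sketch of the Hill-plot analysis for a hypothetical two-site binding polynomial with a strongly cooperative second binding event (all constants illustrative, not hemoglobin data):

```python
import numpy as np

# Two-site binding polynomial Xi = 1 + K1*L + K1*K2*L^2 with K2 >> K1.
K1, K2 = 1.0, 100.0
L = np.logspace(-3, 2, 400)
Xi = 1 + K1*L + K1*K2*L**2
theta = (K1*L + 2*K1*K2*L**2)/(2*Xi)        # average occupation per site
hill = np.log(theta/(1 - theta))
slope = np.gradient(hill, np.log(L))        # Hill slope
print("max Hill slope:", slope.max())       # between 1 and 2 (cooperative)
```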

9.4 Ideal Fermi Gas


In the previous section, we have derived the partition function of a surface where each site can be either
empty or singly occupied (Eq. 9.3.2). Because each site was independent (i.e., particles on different sites did
not interact with each other), we were able to write the partition function as a product of single site factors,
dependent on site energy, chemical potential, and temperature. A gas of noninteracting Fermions can be
treated in exactly the same way, with the proviso that now “site” refers to a quantum state of the Fermi
particles. Quantum states are specified through a complete set of quantum numbers, which for an ideal
quantum gas in a box of volume $V$ are simply the values of quantized momentum, $\hbar\vec k$, and spin, $s = \pm 1/2$. The energy of electrons in the state $(\vec k, s)$ is

$$\epsilon_{\vec k} = \frac{\hbar^2 k^2}{2m}, \qquad (9.4.1)$$

independent of spin. A spin-up and a spin-down electron can have the same momentum $\hbar\vec k$, so each momentum state has a spin degeneracy factor of $g_s = 2s+1 = 2$. Therefore, the grand canonical partition function is

$$\Xi = \prod_{\vec k, s}\left(1 + e^{\beta(\mu - \epsilon_{\vec k})}\right). \qquad (9.4.2)$$

It is easy to pass to the continuum limit by considering the logarithm of the partition function. Proceeding



as in exercise 8.3, we obtain

$$\ln\Xi = V g_s\int\frac{d^3k}{(2\pi)^3}\,\ln\!\left(1 + e^{\beta(\mu-\epsilon_{\vec k})}\right) \qquad (9.4.3)$$

$$\langle N\rangle = z\frac{\partial\ln\Xi}{\partial z} = V g_s\int\frac{d^3k}{(2\pi)^3}\,\frac{e^{\beta(\mu-\epsilon_{\vec k})}}{1 + e^{\beta(\mu-\epsilon_{\vec k})}} := V\int\frac{d^3k}{(2\pi)^3}\, n_{\vec k}. \qquad (9.4.4)$$

The second equation above3 defines the single particle occupation number, given by

$$n_{\vec k} = n(\epsilon_{\vec k}) = \frac{g_s}{e^{\beta(\epsilon_{\vec k}-\mu)} + 1}, \qquad (9.4.5)$$

which is the celebrated Fermi distribution.
In the limit $T \to 0$, the Fermi distribution reduces to the complement of a step function, $n(\epsilon) = 1 - \theta(\epsilon - \mu)$ (per spin state). The physical meaning of this formula is that at $T = 0$, all states with energy below the chemical potential are completely occupied, and all states above it are completely empty. The chemical potential of a Fermi gas at $T = 0$ is called the Fermi energy or Fermi level:

$$\mu(T = 0) = \epsilon_F.$$

The corresponding momentum is called the Fermi momentum, $k_F$. At finite temperature, some electrons with energy below the Fermi level can be thermally excited to levels above $\epsilon_F$, as illustrated in the figure. The spread in energy where this is possible is of order $kT$. In other words, the Fermi distribution at finite $T$ differs from a step function only over a region of approximate extent $\epsilon_F - kT < \epsilon < \epsilon_F + kT$. The value of the Fermi energy of a typical metal is several eV, or a few hundred times the value of $kT$ at room temperature. So in normal laboratory conditions, metals behave like Fermi gases near $T = 0$.
The Fermi level of a Fermi gas is calculated easily by noting that at $T = 0$ all levels, and only those levels, with $|\vec k| < k_F$ are occupied, so the total number of electrons is

$$N = V\int\frac{d^3k}{(2\pi)^3}\, n_{\vec k} = 2V\int\frac{d^3k}{(2\pi)^3}\left[1 - \theta(\epsilon_{\vec k} - \epsilon_F)\right] = \frac{V}{\pi^2}\int_0^{k_F} dk\, k^2 = \frac{V}{3\pi^2}k_F^3, \qquad (9.4.6)$$

yielding the dependence of Fermi energy on electron density:

$$\epsilon_F = \frac{\hbar^2k_F^2}{2m} = \frac{\hbar^2}{2m}\left(3\pi^2\frac{N}{V}\right)^{2/3}. \qquad (9.4.7)$$
In the same way, one can calculate the average energy,

$$E = Vg_s\int\frac{d^3k}{(2\pi)^3}\,\frac{\hbar^2k^2}{2m}\, n_{\vec k} = \frac{3}{5}N\epsilon_F. \qquad (9.4.8)$$
We can use Eq. 9.4.8 to study the properties of the Fermi gas at zero temperature. First, note that S = 0,
since there is a unique configuration of the ground state. Thus, partial derivatives of the energy with respect
to volume or number are taken at constant entropy, and we can calculate chemical potential and pressure as
$$\mu = \frac{\partial E}{\partial N} = \epsilon_F \qquad (9.4.9)$$

$$P = -\frac{\partial E}{\partial V} = \frac{2}{5}\frac{N}{V}\epsilon_F. \qquad (9.4.10)$$
The first equation confirms the result we had anticipated, that the Fermi energy is the chemical potential at
T = 0. The second equation states the remarkable fact that at zero temperature the Fermi gas possesses a
large residual pressure, unlike the classical ideal gas. This fact is a direct consequence of the Pauli exclusion
3 Note the resemblance to the coverage equation of the Langmuir isotherm.



principle. For typical values of Fermi energy, $\epsilon_F = 10$ eV, and density, $N/V = 10^{22}$ cm⁻³, we find $P \approx 3\times10^4$ atmospheres!
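Plugging numbers into Eqs. 9.4.7 and 9.4.10 is straightforward; the sketch below uses the density quoted above (note that at this density the computed $\epsilon_F$ comes out lower than the 10 eV typical of dense metals):

```python
import numpy as np

# Fermi energy and degeneracy pressure at N/V = 1e22 cm^-3 (SI units).
hbar, m_e = 1.0546e-34, 9.109e-31
n = 1e22*1e6                           # m^-3

eF = hbar**2/(2*m_e)*(3*np.pi**2*n)**(2/3)
P = 0.4*n*eF                           # P = (2/5)(N/V) eF
print("eF =", eF/1.602e-19, "eV")      # ~1.7 eV at this density
print("P  =", P/1.013e5, "atm")        # ~1e4 atm: a huge residual pressure
```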
Application: Contact Potential

Consider two different metals (with different Fermi energies) in thermal equilibrium. When the metals
are connected by a wire, the difference in Fermi energy (and therefore, electron gas pressure) forces
electrons to flow from one metal to the other, until the metals charge up sufficiently that the resulting
voltage (electrostatic potential difference, ∆φ) causes the current flow to stop. This voltage is called
the contact potential. When electrons stop flowing, the electrostatic potential difference has balanced
exactly the chemical potential difference. The equilibrium condition is given by the minimum of the
Helmholtz free energy. Assume the Fermi energy (and so the chemical potential) of metal 1 is larger
than that of metal 2, so that Ne electrons are transferred from metal 1 to metal 2. The total free
energy is then
$$F(N_e) = F_1(-N_e) + F_2(N_e) + \frac{(eN_e)^2}{2C},$$
where F1,2 are the free energies of the two metals, C is the capacitance of the metals (function of their
shape and distance), e is the proton charge, and the ratio of (charge)2 over twice the capacitance is
the electrostatic energy stored in the capacitor as a result of the build-up of charge eNe . Since the
total Helmholtz free energy must be minimum with respect to the partition of the charge between
the metals, we set $\frac{\partial F}{\partial N_e} = 0$. This means

$$-\mu_1 + \mu_2 + \frac{e^2 N_e}{C} = 0.$$
The last term can be expressed in terms of the potential difference between the two electrodes of
the capacitors: Q = CV , where V is the potential of the positively charged electrode (metal 1 that
lost electrons) minus the potential of the negatively charged metal (metal 2 that gained electrons):
$V = \phi_1 - \phi_2$. Thus, $-\mu_1 + \mu_2 + e\phi_1 - e\phi_2 = 0$, or

$$\mu_1 - e\phi_1 = \mu_2 - e\phi_2. \qquad (9.4.11)$$

This equilibrium condition can be stated as the condition of equality of the electrochemical potential
of the two metals.

9.5 Virial Expansion


The binding polynomial interpretation of the grand canonical partition function presented in Sect. 9.3 sug-
gests that its structure may be used as a tool to generate power series expansion of experimental quantities
of interest, like equations of state and correlation functions. At first sight, it may seem appealing that
coefficients of the partition function of higher degree involve an increasing number of particles; we may be
tempted to retain only a few terms, at least for dilute enough systems. In reality, this is not the case, since
the fugacity is not a good expansion parameter (the convergence of the series is very slow). After some
thought, this should be expected since we have seen that, even for the ideal gas, we had to sum the whole
infinite series to obtain the correct equation of state. It turns out that density, rather than fugacity, is a
more sensible parameter to use in a power series. While graph theory provides a formal justification of
this assertion, it is not easy to tell how fast expansions of this sort converge, and they should be used with
circumspection, especially in dense systems.
A virial equation of state has the form

$$\beta P = \sum_{n=1}^{\infty} B_n(T)\,\rho^n, \qquad (9.5.1)$$

where Bn is the n-th virial coefficient and depends only on temperature. Of course, we already know that



B1 = 1 from matching to the ideal gas equation of state. To limit bookkeeping to a minimum, we work out
explicitly the expression of the second virial coefficient B2 , although the method can be used quite generally
for higher order coefficients. Taking the logarithm of Eq. 9.2.9, we have
$$\beta P = \frac{1}{V}\ln\sum_{N=0}^{\infty} z^N Q(T,V,N) = \frac{1}{V}\ln\!\left(1 + zQ_1 + z^2Q_2 + \ldots\right), \qquad (9.5.2)$$

which yields an undesired expansion in fugacity. However, we do know density as a power series of fugacity,
so our plan is to invert that expansion and substitute into Eq. 9.5.2 to find the virial coefficients. The power
expansion of density comes from Eq. 9.2.10:
 
$$\rho = \frac{\langle N\rangle}{V} = \frac{z}{V}\frac{\partial\ln\Xi}{\partial z} = \frac{z}{V}\frac{\partial}{\partial z}\ln\!\left(1 + zQ_1 + z^2Q_2 + \ldots\right). \qquad (9.5.3)$$

Since Eq. 9.5.3 shows that ρ = O(z), and we are looking for an expression valid to O(ρ2 ) to get B2 , we
don’t need to keep track of terms o(z 2 ); thus, recalling the Taylor expansion of the logarithm, ln(1 + x) =
x − x2 /2 + . . . , we have

$$\rho = \frac{z}{V}\frac{\partial}{\partial z}\left(zQ_1 + z^2Q_2 - \frac{z^2}{2}Q_1^2\right) + O(z^3) = \frac{1}{V}\left(zQ_1 + 2z^2Q_2 - z^2Q_1^2\right) + O(z^3). \qquad (9.5.4)$$
Moreover, writing fugacity as a power series of density,

z = aρ + bρ2 + O(ρ3 ), (9.5.5)

substituting into the right hand side of Eq. 9.5.4, and grouping like powers of ρ, we obtain
$$0 = \left(\frac{Q_1}{V}a - 1\right)\rho + \left(\frac{Q_1}{V}b + a^2\frac{2Q_2}{V} - a^2\frac{Q_1^2}{V}\right)\rho^2 + O(\rho^3). \qquad (9.5.6)$$

A power series vanishes identically when all the coefficients do; hence,
 
$$a = \frac{V}{Q_1}; \qquad \frac{b}{a} = V\left(1 - 2\frac{Q_2}{Q_1^2}\right). \qquad (9.5.7)$$

Substituting Eq. 9.5.5 into Eq. 9.5.2, and using the values of $a$ and $b$ just determined, we find

$$\beta P = \frac{Q_1}{V}\left(a\rho + b\rho^2\right) + \left(\frac{Q_2}{V} - \frac{Q_1^2}{2V}\right)a^2\rho^2 = \rho + \frac{V}{2}\left(1 - 2\frac{Q_2}{Q_1^2}\right)\rho^2 + O(\rho^3), \qquad (9.5.8)$$

which allows us to identify

$$B_2(T) = \frac{V}{2}\left(1 - 2\frac{Q_2}{Q_1^2}\right). \qquad (9.5.9)$$
As a check, note that for the ideal gas $2Q_2 = Q_1^2$, so the second virial coefficient is indeed zero. Now, let us assume a real gas where the particles interact via a two-body potential, $v(\vec r_{12})$. Then, $Q_1 = V/\lambda^3$ and

$$Q_2 = \frac{1}{2\lambda^6}\int d^3r_1\, d^3r_2\, e^{-\beta v(\vec r_{12})}.$$

The integral over the center of mass yields the volume, so we find

$$B_2(T) = \frac{1}{2}\left[V - \int d^3r\, e^{-\beta v(\vec r)}\right] = \frac{1}{2}\int d^3r\left[1 - e^{-\beta v(\vec r)}\right]. \qquad (9.5.10)$$

Typical intermolecular potentials have a strong, short range repulsion –over a distance the size of the “hard
core” radius σ– and a weak, long range attraction. Over the hard core region, the integrand is approximately
1, so the integral acquires a positive contribution approximately equal to the volume of the hard core region,
the “excluded volume,” and independent of temperature. Outside of the hard core region and out to infinity,



an estimate for the contribution of the weak tail of the potential is given by β d3 rv(r), a negative quantity
R

for an attractive potential, and a decreasing (in magnitude) function of temperature. Therefore, the second
virial coefficient is expected to start negative at low temperature, and become positive at high temperature.
Such temperature dependence is characteristic of the competition between entropic (excluded volume) and
energetic (long range attraction) effects, with entropy dominating at high temperature.
Example: Boyle temperature of the square well potential

Consider a square well potential,

$$v(\vec r) = \begin{cases} \infty & |\vec r| \le \sigma\\ -\epsilon & \sigma < |\vec r| \le R\\ 0 & R < |\vec r|, \end{cases}$$

with $\epsilon > 0$. Then,

$$B_2(T) = 2\pi\int_0^\infty dr\, r^2\left(1 - e^{-\beta v(r)}\right) = 2\pi\int_0^\sigma dr\, r^2 + 2\pi\int_\sigma^R dr\, r^2\left(1 - e^{\beta\epsilon}\right) = \frac{2\pi}{3}\left[\sigma^3 + (1 - e^{\beta\epsilon})(R^3 - \sigma^3)\right].$$

The second virial coefficient vanishes at

$$T = \frac{\epsilon/k}{\ln\!\left[R^3/(R^3 - \sigma^3)\right]}.$$

At this temperature, which is called the Boyle temperature, the gas behaves the closest to an ideal
gas.
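The Boyle temperature of the square well is easy to reproduce numerically (a sketch in units $k = \epsilon = \sigma = 1$, with an illustrative $R = 1.5$):

```python
import numpy as np
from scipy.optimize import brentq

# B2(T) of the square well (units k = eps = sigma = 1) and its Boyle point.
R = 1.5

def B2(T):
    return 2*np.pi/3*(1 + (1 - np.exp(1/T))*(R**3 - 1))

TB = brentq(B2, 0.1, 100.0)                       # numerical root of B2
print("Boyle T (numeric) :", TB)
print("Boyle T (analytic):", 1/np.log(R**3/(R**3 - 1)))   # ~2.84
```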

9.6 van der Waals Equation of State


As seen in the square well potential example, the competition between entropy and energy is played out in the contrasting effects of the two parameters of the potential, $\sigma$ and $\epsilon$. Assuming $\epsilon \ll kT$, the square well virial coefficient can be written in terms of two parameters, $a$ and $b$, defined through $B_2(T) = \frac{2\pi}{3}\sigma^3 - \beta\frac{2\pi}{3}\epsilon(R^3 - \sigma^3) = b - \beta a$. If the second virial coefficient exists, this approximate identification of a length and an energy scale is always possible, regardless of the actual form of the interaction.4 Then, Eq. 9.5.1 can be recast in the form

$$\beta P = \rho + (b - \beta a)\rho^2,$$

or

$$\beta(P + a\rho^2) = \rho(1 + b\rho). \qquad (9.6.1)$$
Note that the right hand side is the low density expansion of

$$\frac{\rho}{1 - b\rho} = \rho\left(1 + b\rho + b^2\rho^2 + \ldots\right).$$
Although this looks like an ad hoc stipulation, we can see that it describes the physically correct high density limit, where pressure grows rapidly as molecules are squeezed together. Moreover, it reduces to the known exact expression of the equation of state for the one-dimensional hard rod gas (see exercises). This limit cannot be obtained from a truncated virial expansion, since such an expansion is designed to capture correctly the low density behavior of real fluids. Substituting into Eq. 9.6.1, we obtain the van der Waals equation of state,5 which is more commonly written as

$$\left(P + a\frac{N^2}{V^2}\right)(V - Nb) = NkT. \qquad (9.6.2)$$
4 The second virial coefficient may not exist, as in the case of the Coulomb potential.
5 J. D. van der Waals was awarded the 1910 Nobel Prize in Physics “for his work on the equation of state for gases and liquids.” Put in historical context, the equation of state allowed van der Waals to estimate the size of atoms – in times when many were not convinced of their existence – and the size of interatomic forces – a field to which he contributed significantly.



The interesting feature of this equation of state is that it is a cubic equation in the density (or volume). This feature makes it possible to have more than one solution for density at fixed pressure and temperature, as can be seen by plotting isotherms in the $PV$ plane. There exists an isotherm, called the critical isotherm, above which there is only one solution for the density, and below which there are three. The intermediate density solution does not correspond to an equilibrium value, because the positive slope of the isotherm through it implies a negative isothermal compressibility, which violates the second law of thermodynamics; the remaining two values of the density are compatible with stability criteria. Hence, there is a region of the $PV$ plane where the equation of state gives unphysical results. To define the bound-
aries of this instability region precisely, we will work from the thermodynamic potential, since the equation
of state is not a fundamental relation and does not contain sufficient information to derive the thermody-
namics of the system. As temperature approaches the critical temperature from below, the instability region
shrinks to a point; this implies that the critical isotherm must have both zero slope and an inflection point.
These two conditions, solved simultaneously with the original equation of state, allow us to identify critical
temperature Tc , pressure Pc , and volume Vc uniquely:

$$\begin{cases} \left.\dfrac{\partial P}{\partial V}\right|_{T_c,V_c} = 0\\[3mm] \left.\dfrac{\partial^2 P}{\partial V^2}\right|_{T_c,V_c} = 0\\[3mm] \left(P_c + a\dfrac{N^2}{V_c^2}\right)(V_c - Nb) - NkT_c = 0 \end{cases} \implies \begin{cases} V_c = 3Nb\\[2mm] P_c = \dfrac{a}{27b^2}\\[2mm] kT_c = \dfrac{8a}{27b}. \end{cases} \qquad (9.6.3)$$

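The algebra leading to Eq. 9.6.3 can be verified symbolically; a short sketch using sympy:

```python
import sympy as sp

# Solve dP/dV = 0 and d2P/dV2 = 0 for the van der Waals isotherm.
V, N, a, b, kT = sp.symbols('V N a b kT', positive=True)
P = N*kT/(V - N*b) - a*N**2/V**2

sols = sp.solve([sp.diff(P, V), sp.diff(P, V, 2)], [V, kT], dict=True)
print(sols)                              # V = 3*N*b, kT = 8*a/(27*b)
print(sp.simplify(P.subs(sols[0])))      # P_c = a/(27*b**2)
```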
Law of Corresponding States

We can use $\{T_c, V_c, P_c\}$ to eliminate $\{N, a, b\}$ from the equation of state. Doing so, we realize that we can recast the equation of state entirely in terms of rescaled variables $\tau := \frac{T}{T_c}$, $\nu := \frac{V}{V_c}$, and $\psi := \frac{P}{P_c}$:

$$\left(\psi + \frac{3}{\nu^2}\right)(3\nu - 1) = 8\tau, \qquad (9.6.4)$$

which is known as the law of corresponding states. This equation is remarkable because it predicts
that if we scale temperature, pressure, and volume by their critical point values, all real fluids obey
the same equation of state regardless of the interaction between them. The reduced compressibility
factor at the critical point is predicted to be a universal constant, Pc Vc /N kTc = 3/8. How does the
prediction fare experimentally? Experimental values usually fall between 0.2 and 0.3. Nevertheless,
a glance at generalized compressibility charts shows that the data for real gases do tend to fall on
“universal” curves.

9.7 Phase Coexistence and Metastable States


To describe the behavior of the system below the critical
point, let us look for a suitable thermodynamic potential. The
Helmholtz free energy of the van der Waals gas is

$$F(T,V,N) = -NkT\left[\ln\frac{V - Nb}{N\lambda^3} + 1\right] - \frac{N^2a}{V}. \qquad (9.7.1)$$

Later on, we will learn to construct (approximate) expressions for the free energy of interacting systems in a systematic way.



For the moment, we simply accept this form of the free energy
based on the fact that (a) it yields the desired equation of state; (b) it recovers the correct ideal gas free
energy when the interaction parameters a and b are set to zero. Now, imagine the system is following a van
der Waals isotherm. By the discussion of Sect. 9.1, we know that the chemical potential (or specific Gibbs
free energy) is minimum in these conditions. The Gibbs potential is

$$G(T,P,N) = \min_V\left\{-NkT\left[\ln\frac{V - Nb}{N\lambda^3} + 1\right] - \frac{N^2a}{V} + VP\right\}. \qquad (9.7.2)$$
Minimization by taking the derivative with respect to V is equivalent to working with the equation of state,
except that now we are equipped with the means of classifying the stability of each solution by comparing the
Gibbs free energies (or chemical potentials). The figure shows the rescaled Gibbs free energy g := G/N kTc
using the rescaled units defined in the example box, (ψ, τ ), for a fixed τ < 1 as a function of ψ, i.e., following
a van der Waals isotherm. Note that the Gibbs free energy is given by the minimum of the plot for any given
pressure, which means that the system in equilibrium must move straight from point 2 to point 6, without
visiting the “bowtie” path (2,3,4,5,6) (that path corresponds to the values of the argument of the right hand
side of Eq. 9.7.2, F (T, V, N ) − V P (T, V, N ), before carrying out the prescribed minimization). Note also that
the point (2,6) corresponds to a cusp of g: the derivative of the free energy is discontinuous there. A point
where the first derivative of a thermodynamic potential has a discontinuity is called a first order phase
∂g

transition. In this case, the derivative we have considered is v = ∂P T,N
; at the first order transition of the
van der Waals fluid, two phases with different specific volume coexist (liquid and vapor). Experimentally,
calorimetry is often used to detect first order transitions from the measurement of the associated latent heat,
which signals a discontinuity in specific entropy $s = -\left.\frac{\partial g}{\partial T}\right|_{P,N}$.
Although our discussion of the phase diagram has been dismissive of the bowtie path on the van der Waals isotherm, real systems often are found in a thermodynamic state corresponding to points on the (2,3) and (5,6) portions of the path. The existence of such states in equilibrium does not run counter to any fundamental law (unlike “states” along the (4,5) path, which would be states of negative isothermal compressibility). The characteristic feature of such states is metastability: they are local, not global, minima of the free energy, and they are separated from the global minimum by a free energy barrier.
Application: Classical Nucleation Theory

The classical theory of homogeneous nucleation considers a metastable state, say a supercooled vapor,
where a fluctuation has occurred producing a droplet of the stable liquid phase. Let gV be the Gibbs
free energy per unit volume of the vapor and gL that of the liquid; since the vapor phase is metastable,
∆g := gL − gV < 0. When the droplet forms, a phase boundary appears in the material, with an
associated surface tension σLV , which is the reversible work needed to create a unit area of the
interface (microscopically, this work is required because interfaces involve broken bonds). The free
energy change upon formation of a droplet of radius R at constant P, T is then given by

$$\Delta G = \frac{4\pi R^3}{3}\Delta g + 4\pi R^2\sigma_{LV}.$$
As a function of droplet radius, the free energy first increases, since the negative bulk term is proportional to $R^3$ while the positive surface term is proportional to $R^2$. There exists a critical radius $R^* = \frac{2\sigma_{LV}}{-\Delta g}$ for which the total free energy cost of the stable phase droplet is maximum and equal to

$$\Delta G^* = \frac{16\pi\sigma_{LV}^3}{3(\Delta g)^2}.$$

According to classical nucleation theory, this is the barrier that needs to be overcome for the formation
of the stable phase, and kinetic theory then predicts a homogeneous nucleation rate proportional to
exp(−β∆G∗ ).

There are also second order phase transitions, which occur with no latent heat. The transition from resistive to superconductive state in metals is of this type. Second order phase transitions were originally



given this name because of a jump in the heat capacity (second order derivative of the free energy) observed, e.g., in the superconducting transition. However, this is not always true; sometimes other types of singularity are observed. The important point is that the first derivatives of the free energy are continuous; hence the phases are indistinguishable at the exact transition point – there is no phase coexistence or double minimum in thermodynamic potentials. The critical point of the van der Waals fluid is an example of such an instance: there, the two minima of Eq. 9.7.2 coalesce and the latent heat and density difference disappear.

9.8 What makes a Phase?

In the introduction, a phase of matter was defined as a homogeneous mixture of all compounds that can
be formed by the chemical elements present. Now if one tries to be more specific regarding the meaning of
homogeneous, several difficulties arise. One difficulty has to do with kinetics. Imagine filling a vessel with
two immiscible fluids, like water and oil. The presence of two phases will be evident from the presence of a
very sharp interface, which leaves no room for ambiguity at room temperature, where the entropic penalty
of not mixing is beaten hands down by the favorable energetics of keeping water’s hydrogen bond network
intact (save for the broken bonds at the interface). Now if the vessel is shaken vigorously, the liquid will
appear milky and homogeneous: a mixture of microscopic droplets of the two liquids.6 How many phases
are there? The correct answer is – one has no business talking about phases because the system is not in
equilibrium. If one waits long enough, water and oil will separate again. Long enough may be a minute
or two, or even a few hours, depending on the type of oil used and the cleanliness of the container, among
other factors; regardless, the system in equilibrium has two phases, the water and the oil. We can define
a measurable quantity, such as the expectation value of the density of water molecules, as the parameter
distinguishing the phases: it is nearly unity (normalized to the density of pure water at the same T, P ) in the
water phase and nearly zero in the oil phase, with an abrupt, discontinuous jump at the interface (here we
take a macroscopic viewpoint; interfaces are not abrupt at the atomic scale, although away from the critical
point, they are actually abrupt on the scale of a few nm). Other choices are possible, such as the difference
in molar concentration of water vs oil. The important point is that there exists a parameter that changes
abruptly at the phase boundary. We call this parameter an order parameter.

Now imagine adding some detergent into the vessel; shake it again and the same milky mixture appears.
Only this time it will tend to stay around longer, a lot longer. We are willing to call that equilibrium. How
many phases are there? The milky mixture is one phase; it is called the middle phase, since it coexists
in equilibrium with water (denser, at the bottom) and oil (lighter, on top). Now, this is considerably
trickier than the kinetics-dominated case of oil-water alone. How different is the mixture from before?
Macroscopically, not much, except that it is stable. Microscopically, it consists of water and oil domains
separated by a monolayer surfactant film. Having microscopic water and oil domains is entropically favorable,
and the energy cost (oil-water surface tension) is made smaller by the surfactant, which stabilizes the
homogeneous mixture.7

6 The milky appearance of the “homogeneous” system is caused by the difference in optical density between the two fluids,
just like in a fog (water has a lower index of refraction than oil).
7 This qualitative discussion is an oversimplification of the physics, but it is a good starting point. The middle phase is actually
a generic name for a rich variety of possible phases.



Example: Spreading Pressure

The lowering of the surface tension by a dilute adsorbed film is a general phenomenon due to the
entropic advantage of making more surface states available to the adsorbed molecules. Let σ12 be
the surface tension at the interface of a two-phase system. A differential increase in the interfacial
area A at constant T, V, N1 , N2 has a free energy cost of dF = σ12 dA. Now consider the addition of
a small amount of a third component, adsorbed at the interface, with areal density ns = Ns /A. The
free energy of this dilute surface film is Fs = −T Ss + Ns µs . Interactions among adsorbed molecules
are neglected, and the adsorption energy per particle is included in the chemical potential µs . The
entropy of the surface film, Ss , is proportional to the logarithm of the available area, by Boltzmann’s
law: Ss = Ns k ln A/A0 . A differential increase in the interfacial area at constant T, V, N1 , N2 , Ns has
a free energy cost of dF = σ12 dA − T (∂Ss/∂A) dA = (σ12 − ns kT) dA. So the free energy cost of creating
more interface is lowered by an amount proportional to the areal density of surface molecules ns and to
temperature, which is just the two-dimensional ideal gas pressure, or spreading pressure. The build-up
of the surface film results in a smaller surface tension.
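A minimal numerical sketch of this result, assuming the ideal two-dimensional gas expression Π = ns kT (the coverage used is a hypothetical example):

```python
kB = 1.380649e-23   # J/K

def effective_surface_tension(sigma12, area_per_molecule, T):
    """Surface tension lowered by an ideal 2D gas film: sigma12 - ns*kB*T."""
    ns = 1.0 / area_per_molecule       # areal density of adsorbed molecules
    return sigma12 - ns * kB * T       # spreading pressure Pi = ns*kB*T

# Hypothetical example: a water surface (72 mJ/m^2) carrying one adsorbed
# molecule per 1 nm^2 at room temperature.
sigma_eff = effective_surface_tension(72e-3, 1e-18, 298.0)
print(f"effective surface tension = {sigma_eff*1e3:.1f} mJ/m^2")
```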

So what makes the middle phase a phase? What is its order parameter? Since the middle phase coexists
with the water and oil phases, there must be something that tells them apart. Again, many choices are
possible, one being the (normalized) concentration of water, which will exhibit three values, approximately
one in the water phase, zero in the oil phase, and something in the middle in the middle phase, with two
distinct discontinuities at the two interfaces. Now here is the catch. We are necessarily assuming that we
are talking about an average concentration of water, with the average taken over a region containing many
surfactant-stabilized water and oil domains; otherwise, on a submicrometer scale, the density of water jumps
in a binary fashion between zero and one (or vice versa) as one traverses the surfactant film from the oil side
to the water side (or vice versa). So in defining the order parameter, we must first take averages over some
short (microscopic) length scales. This procedure works, because, as we will see later, the phenomenology
of phase transitions is dominated by long-wavelength fluctuations; what happens at short scales can be
averaged out.

9.9 Absence of Phase Transitions in Finite Systems and the Theorem of Lee and Yang
From the discussion of the van der Waals gas and its generalization through the concept of order parameter,
we have discovered that the study of phase transitions revolves around a systematic way of calculating with
partition functions – multivariable statistical distributions – by averaging first over the “less important”
degrees of freedom until we are left with what seems important to describe the macroscopic behavior of the
system; for the van der Waals gas, it was the density. Once this parameter, call it φ, has been identified
(there can be more than one, it can be a vector, it can depend on space, etc.), then we look at the behavior
of the average of this parameter in the space of thermodynamic variables (e.g., P, T ). The average of the
order parameter can be obtained from a thermodynamic potential, as in the case of the density, which by
combining Eqs. 9.2.7 and 9.2.10 can be written as

ρ = −(z/V) ∂Ω/∂z.    (9.9.1)
If it happens that this parameter does not vary smoothly,8 then the thermodynamic potential is not an
analytic function (an analytic function is infinitely differentiable). Now, the partition function of a system
with N particles is a polynomial of degree N in the fugacity z. Polynomials are analytic functions, and thus
a system with a finite number of particles cannot have a phase transition. You may ask what happens in the
limit N → ∞. Isn’t an infinite series also infinitely differentiable? The answer lies in a theorem by Lee and
Yang, which we state without proof. Consider the grand canonical partition function Q. For finite N , Q is
8 It can jump (first order transition), or stay constant above some temperature and slowly vary below it (second order
transition), for example.



a polynomial with positive coefficients; it cannot have real roots. However, extending it to complex values
of fugacity z, it will have N complex roots. Now, in the limit V → ∞, N → ∞, the roots of Q can converge
to a value z0 on the real axis. When this happens, Q will not be analytic at z0 . Therefore, there is a phase
transition for that value of fugacity.
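A small numerical illustration of the finite-N statement (a sketch with made-up canonical coefficients QN, not a result from the text): the zeros of a polynomial with positive coefficients never fall on the positive real fugacity axis.

```python
import numpy as np

# Grand partition function Q(z) = sum_N Q_N z^N for a small, hypothetical
# system; the canonical coefficients Q_N are made-up positive numbers.
QN = [1.0, 3.0, 4.5, 4.0, 2.0]          # Q_0 ... Q_4

zeros = np.roots(QN[::-1])              # np.roots wants highest power first
print(zeros)

# A polynomial with positive coefficients has no zeros on the positive
# real axis, so Q is analytic there and no phase transition can occur at
# finite N; only as N -> infinity can the zeros pinch the real axis.
for z0 in zeros:
    assert not (abs(z0.imag) < 1e-12 and z0.real > 0)
```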

9.10 Exercises
1. Derive an expression for the expectation of energy in the grand canonical formalism, keeping the
chemical potential µ fixed, instead of the fugacity.
2. Derive the expression for the variance of the particle number Eq. 9.2.11.
3. Work out the adsorption problem if each surface site can adsorb 0, 1, or 2 particles, with respective
energies E0 = 0, E1 = −ε, E2 = −2ε + η, with ε > 0. Discuss the three cases η > 0, η = 0, η < 0 and
sketch coverage vs pressure for each.
4. Fermi gas at low temperature For the ideal Fermi gas (N electrons in volume V ), the internal
energy at low, but nonzero, temperature can be calculated approximately from Eq. 9.4.3, yielding the
expression

E = (3/5) N εF [ 1 + (5π^2/12) (kT/εF)^2 ].

(a) Is room temperature “low”? Justify your answer for a typical value of εF = 5 eV.
(b) Work out the specific heat and the entropy. Are they consistent with the Third Law?
(c) Suppose your metal has one free electron per ion core. Compare the contribution of the specific heat
of the ion lattice to that of the electron gas at room temperature, assuming that classical equipartition
is valid for the ions.
5. Calculate the virial coefficient of the hard core van der Waals potential,

v(r) = ∞ for r ≤ σ;   v(r) = −ε (σ/r)^6 for σ < r,

assuming you can use a Taylor expansion appropriate for high temperature. Why is ε ≪ kT a reasonable
assumption for intermolecular potentials? (e.g., for argon, ε ≈ 1.7×10^−21 J.)
6. Tonks’ gas Calculate the canonical partition function Q(T, L, N ) for a gas of N hard rods in one
dimension on the segment [0, L]. Each rod has length σ, and there is no other interaction besides the
hard core potential (excluded volume). (Note that this interaction imposes a natural order on the
string of particles: two hard core particles cannot switch places in one dimension.) Then from the
partition function, find the equation of state and compare it to the van der Waals form.
7. Show that Eq. 9.7.1 reproduces the van der Waals equation of state and yields the ideal gas free energy
when we set a = b = 0.
8. Sketch a comparison of how the Gibbs free energies of a gas, a liquid, and a solid vary as a function of
(a) pressure and (b) temperature.
9. Will it liquefy? A mole of diatomic van der Waals gas is passed through a Joule-Thomson expansion
valve. In this problem, analyze the process in the P, T plane, using a convenient approximation of the
virial equation of state,

V = RT/P + B(T ),

with B(T ) = b − a/(RT ). The Joule-Thomson expansion is isenthalpic: H(Pi , Ti ) = H(Pf , Tf ).
(a) Write an expression for the differential of H regarded as a function of T and P .
(b) Devise a suitable path consisting of isothermal and isobaric steps for which the use of the ideal gas
CP is justified. Sketch this path and use it to find an equation for the final temperature Tf .



10. DPPC is a surfactant lining the lung. As a monolayer on water, its spreading pressure Π is measured
to be 4 mJ m−2 at a specific area of 0.4 nm2 at room temperature in a Langmuir trough. The surface
tension of pure water is σ0 = 72 mJ m−2 . Assuming ideal 2D gas behavior for DPPC on water, what
specific area would give you a water surface tension σ = 40 mJ m−2 ? Compare your answer to the
experimental value of 0.24 nm2 and explain the discrepancy. Propose a one-parameter modification of
the ideal gas equation of state that would behave more realistically at low specific area.


Chapter 10

Mean Field Theory

10.1 Lattice Models of Binary Systems


Lattice models are an important class of models where the degrees of freedom of the system reside on a
discrete lattice. They may describe, for instance, the presence of an atom of type A vs. B in a binary alloy or
diblock copolymer, or the presence of solvent vs. solute in a solution, or a filled (liquid) versus empty (gas)
volume element in a fluid. The lattice is embedded in a D-dimensional space, so the degrees of freedom are
defined in Z^D. The degrees of freedom themselves can be binary (Bernoulli) variables, classical or quantum
spins, real-valued functions, vectors, etc., depending on the physical variable they describe. Each lattice site
is connected to a number of nearest neighbors, denoted by z (no relation to fugacity!). Unless specified
otherwise, we will be working on D-dimensional cubic lattices, for which z = 2D. Note that in most cases
considered here, the lattice is simply a convenient artifact that allows one to work with discrete degrees
of freedom; the fact that it may impose an underlying “crystalline” order, or periodicity, on the degrees
of freedom is accidental and irrelevant. Thus, in the case of the lattice gas, the condensed phase at zero
temperature should be regarded as a liquid phase, and nothing can be inferred about freezing transitions.
In the next sections, we explore two models of binary systems: regular solution theory and the Ising model of
ferromagnetism. It will become apparent that the models are actually the same, in the sense that there is an
exact mapping between them, as should be expected from the fact that the underlying degrees of freedom
are drawn from the Bernoulli distribution in both cases.

10.2 Regular Solution Theory


Consider a binary solution on a lattice; there are NA particles of solvent and NB particles of solute, filling
the lattice completely. The entropy and energy are
S = k ln [N!/(NA! NB!)]   and   U = MAA wAA + MBB wBB + MAB wAB ,
where the ws and M s are known interaction energies and unknown numbers (respectively) of the three types
of bonds AA, BB, and AB. Note that there is, in fact, a single unknown in the problem, because

zNA = 2MAA + MAB and zNB = 2MBB + MAB

are two equations constraining the three M s, since each AA bond terminates into two A particles and each
AB bond terminates into one A particle, and there are a total of zNA bonds terminating into A particles,
and likewise for B. So one can eliminate two of the M s by solving the constraints as
MAA = (zNA − MAB)/2   and   MBB = (zNB − MAB)/2.
In the absence of microscopic information, the value of MAB , needed to determine the exact energy of a
configuration, remains unknown. However, we can assign it probabilistically by noting that the probability


that a neighbor of a particle of type A is a particle of type B is given simply by NB /N . (Strictly speaking
the probability is NB /(N − 1), but N is of order Avogadro’s number.) Thus, the Bragg-Williams mean
field assumption replaces MAB by its expected value:
MAB ≈ ⟨MAB⟩ = z NA NB / N.    (10.2.1)
Now the problem is completely determined and can be solved. Conventionally, one defines the temperature-
dependent exchange parameter

χ = (z/kT) [ wAB − (wAA + wBB)/2 ],    (10.2.2)
whereupon the energy becomes
U = (zwAA/2) NA + (zwBB/2) NB + kT χ NA NB / N.
In terms of fractions of solvent and solute,
x = NA/N   and   (1 − x) = NB/N,
the excess free energy of the solution, ∆F (x) := F (NA , NB ) − F (NA , 0) − F (0, NB ) is given by

∆F(x)/(N kT) = χ x(1 − x) + x ln x + (1 − x) ln(1 − x).    (10.2.3)
This is Hildebrand’s regular solution theory. Plotting out the excess free
energy as a function of composition for various values of χ (which depends
on temperature), we see that there are two regimes, much like for the van
der Waals fluid: for χ < 2, the excess free energy has a single minimum as
a function of x, while for χ > 2 there are two; this means that the system is
stable, at a given temperature, in two phases with different composition, a low
x phase composed mostly of type B particles and a high x phase composed
mostly of type A particles: the two components are immiscible and phase
separate. The critical temperature given by χ = 2 is the temperature below which the phase diagram shows
an immiscibility region. Note that there are systems where χ has a less trivial temperature dependence than
indicated by Eq. 10.2.2; this happens, for example, in some strong hydrogen bonding systems, where the
hydrogen bonding enthalpy becomes very strong at low temperature and favors the mixing of a hydrogen
bonding solute with water below some lower critical mixing temperature.
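A brief numerical sketch of Eq. 10.2.3 (grid minimization, with χ values chosen on either side of the critical value 2) showing the one- vs. two-minimum regimes:

```python
import numpy as np

def dF(x, chi):
    """Excess free energy per site of Eq. 10.2.3, in units of kT."""
    return chi * x * (1 - x) + x * np.log(x) + (1 - x) * np.log(1 - x)

x = np.linspace(1e-4, 1 - 1e-4, 100001)
for chi in (1.5, 2.5):
    print(f"chi = {chi}: global minimum at x = {x[np.argmin(dF(x, chi))]:.3f}")

# For chi < 2 the single minimum sits at x = 1/2 (the mixed state); for
# chi > 2 argmin lands on one of the two symmetric minima: the components
# are immiscible and phase separate.
```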

10.3 The Ising Model


The degrees of freedom of the Ising model are binary variables at each site, si , i = 1, . . . , N , taking up one
of two possible values ±1: we will call them spins.1 Let −h be a potential acting on each spin (the chemical
potential including any external field).2 In the absence of spin-spin (interparticle) interactions, the partition
function becomes the product of single-site partition functions:3

Q(T, N, h) = Σ_{si=±1} e^{βh Σ_{i=1}^N si} = Σ_{si=±1} Π_{i=1}^N e^{βh si} = [2 cosh(βh)]^N ,    (10.3.1)

where the notation makes the correspondence V → N , µ → h explicit. The summation in Eq. 10.3.1 is
simply an application of the binomial theorem.
1 They are of course classical spin variables.
2 The negative sign comes from the analogy with magnetism, where the energy is minimum when dipoles align with the
external field.
3 Just as we would find for a collection of distinguishable tls particles. The binary degrees of freedom are distinguishable

because they reside on different lattice sites.


c Carlo Carraro – CBE 240 Lecture notes – Fall 2020 – Draft 64
10.3. THE ISING MODEL

Equation 10.3.1 shows that the probability of a spin configuration is given by

p(s1 , s2 , . . . , sN ) = e^{βh Σ_{i=1}^N si} / Q = Π_{i=1}^N e^{βh si} / Q^{1/N} = Π_{i=1}^N p(si ),    (10.3.2)

which is the product of single-site probabilities, and thus implies that the spins are independent random
variables when the spins are not interacting with each other. Now suppose instead that there is an interaction
between spins, which we write as a two-body interaction parametrized by a traceless4 symmetric matrix Jij

H = − Σ_{i,j>i} Jij si sj − h Σ_i si .    (10.3.3)

Assuming for the moment that the interaction is of short range and without loss of generality taking it to
involve only nearest neighbors, we can further simplify the model and write it as

H = −J Σ_{<i,j>} si sj − h Σ_i si ,    (10.3.4)

where J parameterizes the interaction strength and < i, j > denotes that the sum runs over all pairs of
nearest neighbors. This is the Hamiltonian of the Ising model.

Example: Lattice Gas to Ising Model

Consider a lattice gas, where each site i can be empty or singly occupied (ni = 0 or 1) and particles
on neighboring sites have an interaction energy u; let µ be the chemical potential. Then, H =
u Σ_{<i,j>} ni nj − µ Σ_i ni . Introducing new variables for the degrees of freedom, si = 2ni − 1, we
obtain the Ising Hamiltonian (up to an inconsequential constant) with J = −u/4 and h = µ/2 − uz/4.

The partition function of the Ising model is

Q = Σ_{si=±1} e^{βJ Σ_{<i,j>} si sj + βh Σ_i si} .    (10.3.5)

Now it is immediately clear that the probability of a given spin configuration does not factor as a product
of site probabilities, because of the correlations introduced by the nearest neighbor interaction term.

4 The trace would represent a self-interaction, which would be included in the potential h.



Example: Three-spin clusters in one dimension

Consider the partition function of a one dimensional (linear) cluster of three spins:
Q = Σ_{s1=±1} Σ_{s2=±1} Σ_{s3=±1} e^{βJ(s1 s2 + s2 s3)} e^{βh(s1 + s2 + s3)} .

Eliminating the middle spin first, we find

Q = Σ_{s1=±1} Σ_{s3=±1} [ e^{βJ(s1+s3)} e^{βh(s1+s3+1)} + e^{−βJ(s1+s3)} e^{βh(s1+s3−1)} ]
  = Σ_{s1=±1} Σ_{s3=±1} e^{βh(s1+s3)} [ e^{βh} e^{βJ(s1+s3)} + e^{−βh} e^{−βJ(s1+s3)} ]    (10.3.6)
  = Σ_{s1=±1} Σ_{s3=±1} e^{βh(s1+s3)} e^{ln[2 cosh(βJ(s1+s3)+βh)]} ,

which shows us that the middle spin effectively couples its neighbor to the left to its neighbor to the
right via an interaction given by

β J˜ = ln[2 cosh(βJ(s1 + s3 ) + βh)],

even though these two spins were not interacting in the original Hamiltonian (they are not nearest
neighbors). This correlation prevents one from writing the probability of a configuration as the
product of independent site factors.
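A minimal brute-force check of this bookkeeping (a sketch; all 2^3 configurations are enumerated and compared against the reduced sum of Eq. 10.3.6):

```python
import itertools, math

J, h, beta = 1.0, 0.3, 0.7   # arbitrary test values

# Direct enumeration of the open 3-spin chain.
Q_direct = sum(
    math.exp(beta * J * (s1 * s2 + s2 * s3) + beta * h * (s1 + s2 + s3))
    for s1, s2, s3 in itertools.product((1, -1), repeat=3)
)

# Middle spin summed out analytically, as in Eq. 10.3.6.
Q_reduced = sum(
    math.exp(beta * h * (s1 + s3))
    * 2 * math.cosh(beta * J * (s1 + s3) + beta * h)
    for s1, s3 in itertools.product((1, -1), repeat=2)
)

assert math.isclose(Q_direct, Q_reduced)
print(Q_direct, Q_reduced)
```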

It is not obvious how the Ising model can be solved in closed form in the general case. However, the
exact solution can be found in 1D and in 2D. The 1D solution is straightforward and instructive, although
it is uneventful: no phase transition is present in the 1D Ising model at nonzero temperature. In 2D,
the solution is a mathematical tour de force, and it provides a benchmark for numerical and approximate
methods of studying phase transitions. Before exploring exact solutions, we apply mean field theory to
obtain approximate solutions in all dimensions, subject to the caveats described at the end of Sect. 10.2.

10.4 Mean Field Theory of the Ising Model


Let us explore a general method of constructing mean field theories of statistical mechanics models. The
method consists of replacing the true probability of a microstate with an approximate one that is the product
of single particle (i.e., site) probabilities of the form given in Eq. 10.3.2. For a system at constant temperature
and external field, which for now we allow to be site-dependent, the Gibbs energy is calculated easily under
the mean field approximation, since
GMF = ⟨H⟩ − T S = −J Σ_{<i,j>} ⟨si⟩⟨sj⟩ − Σ_i hi ⟨si⟩ + kT Σ_i [ pi ln pi + (1 − pi) ln(1 − pi) ].    (10.4.1)

The crucial point is that the factorization of the probability into single site terms (i.e., the assumption of
statistical independence) allowed us to replace the expectation of the product hsi sj i with the product of
expectations hsi ihsj i. Now note that the expectation of the local spin, which is the local magnetization, is
given by
mi = hsi i = 1 × pi + (−1) × (1 − pi ) = 2pi − 1. (10.4.2)
Expressing pi in terms of mi allows us to rewrite the Gibbs free energy in terms of the magnetization alone
as
GMF = −J Σ_{<i,j>} mi mj − Σ_i hi mi + kT Σ_i [ ((1 + mi)/2) ln((1 + mi)/2) + ((1 − mi)/2) ln((1 − mi)/2) ].    (10.4.3)

We recognize the magnetization as the order parameter of the system (cf. Sect. 9.8). Now, we know that G
is a function of T, h, N only; therefore, at equilibrium, it must be minimum with respect to the values of the



mi :

∂G/∂mi = −J Σ_{j∈nn(i)} mj − hi + (kT/2) ln[(1 + mi)/(1 − mi)] = 0,   i = 1, . . . , N,    (10.4.4)

where the summation extends over the nearest neighbors of the i-th spin. Solving the Ising model in the
mean field approximation thus requires the solution of a coupled system of N transcendental equations.
The task becomes much simpler if the external field is
constant. In that case, the order parameter is uniform and
the minimum free energy solution is the constant solution m
that satisfies
−Jzm − h + (kT/2) ln[(1 + m)/(1 − m)] = 0,    (10.4.5)
or, equivalently,

m = tanh(βJzm + βh), (10.4.6)

which is known, in the field of magnetism, as the Curie-Weiss equation.
We can use this equation to predict the phase diagram of the
Ising model. Let us begin by setting h = 0. Clearly, m = 0 is always a solution, and is the only solution
at high temperature β → 0. In the figure, the solutions of the mean field equation are found by a graphical
method for small and large β. Clearly, there exists a critical temperature, which we will calculate shortly,
below which spontaneous magnetization appears as a solution of the mean field Curie-Weiss equation. The
mean field Gibbs free energy function G(T, h; m) is also displayed, to allow one to verify the stability of the
solution; and at the bottom, three isotherms are plotted, a supercritical, the critical, and a subcritical one.
It is instructive to explore the behavior of the magnetization near Tc analytically. If at some temperature
spontaneous magnetization is to emerge, two new solutions (the equation is symmetric under a change of
sign of m) have to branch out from the zero solution, and hence, they will start out very small, so we can
expand the rhs of Eq. 10.4.6 in powers of m: tanh x ≈ x − x^3/3 + . . . . Retaining terms up to the cubic term
only,5 one finds

m = βJzm − (βJzm)^3/3,    (10.4.7)
which yields the solutions

m(T, h = 0) = 0   for kT ≥ Jz;
m(T, h = 0) = 0 or ±√3 (βJz − 1)^{1/2}   for kT < Jz, as βJz − 1 → 0⁺.

Below Tc , the zero magnetization solution is unstable because it corresponds to a maximum in the free
energy. Note the dependence of Tc on system dimensionality D, through the number of nearest neighbors z;
for hypercubic lattices, kTc = 2JD in D dimensions. In simple terms, Tc can be viewed as the temperature
above which entropy-driven behavior (disorder) takes over enthalpy-driven behavior (order). Hence, entropy
is more important (more destabilizing) in lower dimensions.
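A short sketch solving the Curie-Weiss equation at h = 0 by fixed-point iteration (J and z are arbitrary illustrative values; kTc = Jz = 4 in these units):

```python
import numpy as np

def curie_weiss_m(beta, J=1.0, z=4, h=0.0, m0=0.9, iters=5000):
    """Solve m = tanh(beta*(J*z*m + h)) by fixed-point iteration.
    Seeding with m0 > 0 selects the positive branch when it exists."""
    m = m0
    for _ in range(iters):
        m = np.tanh(beta * (J * z * m + h))
    return m

J, z = 1.0, 4   # kTc = J*z = 4 in these units
for kT in (5.0, 4.5, 3.0, 2.0):
    print(f"kT = {kT}: m = {curie_weiss_m(1.0/kT, J=J, z=z):.4f}")
# Above kTc the iteration collapses to m = 0; below kTc it converges to
# the nonzero spontaneous magnetization.
```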

10.5 Limitations on the Applicability of Mean Field Theory


The qualitative discussion of the Bragg-Williams assumption highlights a very important point of mean field
theories: their applicability depends on the number of nearest neighbors z. This is essentially a consequence
of the central limit theorem, because Var(MAB) ∝ z, so the relative error of replacing MAB with ⟨MAB⟩
decreases as 1/√z as z gets larger. Interestingly, there are several ways of achieving a large z; one is through
a large number of nearest neighbors (a highly coordinated lattice), another is through long (infinite) range
interactions. Thus, we expect mean field theory to be a better approximation in higher dimensional systems
5 The analogy with the cubic van der Waals equation of state is not a coincidence.



or in the presence of long range forces. In the next section, we show how mean field theory succeeds in
giving reasonable approximations to the problem of particles interacting through the Coulomb potential,
something the virial expansion could not handle. However, we will find later that mean field theory is
utterly inapplicable to the one dimensional nearest neighbor linear Ising spin chain, where the exact solution
shows that the critical temperature vanishes, in contrast to the prediction of mean field theory. This is
because in one dimension a fluctuation at a single site can be overwhelmingly important, since z is so
small.

10.6 Charge Screening in Coulomb Systems: Poisson-Boltzmann and Debye-Hückel Theory
We saw earlier that the virial expansion does not exist for systems of charged particles, owing to the slow
decay of the potential. In fact, it would be rather unsettling if an expansion in powers of the density
(or fugacity) converged quickly for long range interactions, because such interactions involve a large number
of particles, which are unaccounted for by the lowest virial coefficients. From our discussion of mean field
theory, we may expect that it fares better in dealing with Coulomb systems. Consider the mean field Gibbs
energy for spins on the lattice, Eq. 10.4.1. If we are to apply this equation to a system interacting via the Coulomb
force, two modifications need to be made:
(a) We need to retain the full spatial dependence of the interaction coupling Jij since the interaction goes
well beyond nearest neighbors; hence, the energy term must be rewritten as
Σ_{i>j} Jij ⟨si⟩⟨sj⟩ = e^2 Σ_{i>j} ⟨si⟩⟨sj⟩ / (4πε|rij|),

where e is the proton charge, ε the permittivity of the medium, and the sign of the “spin” variable corresponds
to the sign of the charge;
(b) The system must have zero net charge, or else the thermodynamic potentials will not be extensive. This
can be arranged by a suitable choice of the chemical potential; in this case h = 0 ensures zero net charge by
symmetry.
Along with these modifications, we also assume for simplicity that the positive and negative ions are singly
charged (q± = ±e). Then, the mean field equation becomes
mi = tanh( −(βe^2/4πε) Σ_j mj/|rij| ).    (10.6.1)

We then replace the lattice with a continuum, and thus the site “magnetization” variables mi with the local
volume density of charge, mi → (V/N) ρ(r), and the sum with an integral ((1/N) Σ_i → (1/V) ∫ d^3r):

ρ(r) = 2ρ0 tanh( −(βe^2/4πε) ∫ d^3r′ ρ(r′)/|r − r′| ),    (10.6.2)

with ρ0 = N/2V the average number density of positive charges. Finally, we note that
(e/4πε) ∫ d^3r′ ρ(r′)/|r − r′| = ϕ(r)
is the electrostatic potential at point r and use the first Maxwell equation,

∇^2 ϕ(r) = −e ρ(r)/ε,    (10.6.3)
to arrive at
∇^2 ϕ(r) = (2eρ0/ε) tanh[βeϕ(r)].    (10.6.4)
This equation is called the “crowded” Poisson-Boltzmann equation, since it is the appropriate replacement for
the more commonly used Poisson-Boltzmann equation for densely packed electrolytes (such as near charged
electrodes). Both Eq. 10.6.4 and the plain Poisson-Boltzmann equation,

∇^2 ϕ(r) = (2eρ0/ε) sinh[βeϕ(r)],    (10.6.5)



can be linearized to capture the behavior of dilute electrolytes6 and the behavior of the potential at long
distances, where it is expected to be small. Then, we are allowed to retain only the first term in the Taylor
expansion, tanh x ≈ sinh x ≈ x. This limit corresponds to the Debye-Hückel approximation,

∇^2 ϕ(r) = (2βe^2 ρ0/ε) ϕ(r).    (10.6.6)
For a unit test charge in an electrolytic medium, the electrostatic potential from equation 10.6.6 at a distance
r from the charge is
ϕ(r) = e^{−r/λ} / (4πεr),    (10.6.7)
which decays exponentially with decay length given in terms of the ionic strength by

λ = [ εkT / (e^2 Σ_i ρ0,i zi^2) ]^{1/2},    (10.6.8)

where the i-th species has charge number zi and density ρ0,i , and the charge neutrality condition is
Σ_i ρ0,i qi = 0. This length is called the Debye screening length. Note that the Debye-Hückel potential
cannot be recovered as a truncated expansion in the density, which explains our failure to obtain a
sensible result for the second virial coefficient for the Coulomb interaction. In this respect, mean field theory
is much more successful.
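As a numerical sketch of Eq. 10.6.8 for a symmetric 1:1 electrolyte in water at room temperature (the 0.1 M concentration is an arbitrary example):

```python
import numpy as np

kB = 1.380649e-23        # J/K
e = 1.602176634e-19      # C
eps0 = 8.8541878128e-12  # F/m
NA = 6.02214076e23       # 1/mol

def debye_length(conc_molar, T=298.0, eps_r=78.5, z=1):
    """Debye screening length of a symmetric z:z electrolyte, Eq. 10.6.8."""
    rho = conc_molar * 1e3 * NA          # ions of each sign per m^3
    ionic = 2 * rho * z**2               # sum_i rho_0i * z_i^2
    return np.sqrt(eps_r * eps0 * kB * T / (e**2 * ionic))

# 0.1 M monovalent salt in water: the familiar answer is about 1 nm.
print(f"lambda = {debye_length(0.1)*1e9:.2f} nm")
```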

10.7 Exercises
1. Show that the interfacial tension in the lattice model of regular solutions is given by γAB = (kT/za) χ, where
a is the unit area of the interface.
2. Consider a lattice gas with nearest neighbor interactions, with bond energy −ε < 0. Let z be the
lattice coordination, ρ the fraction of occupied sites.
(a) Calculate the pressure and the second virial coefficient. In what limit is the ideal gas equation of
state recovered?
(b) Find the heat capacity. (Hint: work it out from the entropy) Explain your result in terms of the
limitations of the lattice model.
3. (Cooperative Adsorption) Consider a lattice model of surface adsorption, where the particles form an
ideal gas in the bulk gas phase, but they interact once they are adsorbed on the surface. There are
n particles occupying N sites on the surface; an adsorbed particle has energy E = −ε < 0, and two
particles occupying nearest neighbor sites have negative interaction energy −J (thus, J > 0).
(a) Defining surface coverage as θ = n/N , find the free energy F (T, N, θ) in mean field theory.
(b) From F , find the chemical potential µa of the adsorbed atoms in terms of temperature and coverage.
If the adsorbed atoms are in equilibrium with a gas phase, which we take to be an ideal gas, find a
relation between coverage and pressure. Express it as P (θ) (unlike the Langmuir isotherm, this relation
cannot be inverted in closed form).
(c) Express the condition for pressure to have an inflection point as a cubic equation in coverage. (Hint:
show that if the graph of y = f (x) has an inflection point at x = x0 , then the graph of its inverse
x = g(y) has an inflection point at y0 = f (x0 )).
(d) Cubic equations always have a real root. Set a bound on the range of T for which this root will be
physically meaningful within the mean field theory framework.

6 The Poisson-Boltzmann equation is also used to model charged gases (plasmas).


Chapter 11

Exact Results and Breakdown of Mean Field Theory

11.1 Ising Model in 1D: the Transfer Matrix Method


The three-spin cluster example of Sect. 10.3 suggests a general strategy for solving the Ising model in 1D.
Consider again the 3-spin cluster, and rewrite the partition function as
Q = Σ_{s1=±1} Σ_{s2=±1} Σ_{s3=±1} e^{βh(s1+s3)/2} e^{βJ s1 s2 + βh(s1+s2)/2} e^{βJ s2 s3 + βh(s2+s3)/2}
  = Σ_{s1=±1} Σ_{s3=±1} e^{βh(s1+s3)/2} [ Σ_{s2=±1} T(s1 , s2 ) T(s2 , s3 ) ],    (11.1.1)

where

T(si , si+1 ) = exp[ βJ si si+1 + (βh/2)(si + si+1) ]    (11.1.2)
represents the contribution of each of the two bonds in the cluster. The factor of e^{βh(s1+s3)/2} hanging in
front reminds us that there is no bond between the first and last spin in our cluster. Now, T(si , si+1 ) is a
function of the two spins connected by the i-th bond; it is, in fact, a 2 × 2 matrix, because each of its indices
can take on two values. T is called the transfer matrix; T11 = T (↑↑), T12 = T (↑↓), T21 = T (↓↑), and
T22 = T (↓↓) correspond to all possible combinations of spin values across the bond between s1 and s2 :

T = [[T11 , T12 ], [T21 , T22 ]] = [[e^{β(J+h)}, e^{−βJ}], [e^{−βJ}, e^{β(J−h)}]].    (11.1.3)
Next, note that Σ_{s2=±1} T(s1 , s2 ) T(s2 , s3 ) = T_{s1↑} T_{↑s3} + T_{s1↓} T_{↓s3} is simply the matrix product of two transfer
matrices. So computing the partition function of a linear chain of spins has been reduced to computing the
product of 2 × 2 matrices.
Example: Three-spin cluster again

Let us rework the result from the previous section using the transfer matrix formalism. The matrix
product in Eq. 11.1.1 is

T^2 = [[e^{β(J+h)}, e^{−βJ}], [e^{−βJ}, e^{β(J−h)}]]^2
    = [[e^{2β(J+h)} + e^{−2βJ},  e^{βh} + e^{−βh}], [e^{βh} + e^{−βh},  e^{2β(J−h)} + e^{−2βJ}]] := [M(s1 , s3 )].
Therefore, the partition function is Q = Σ_{s1=±1} Σ_{s3=±1} e^{βh(s1+s3)/2} M(s1 , s3 ). It is a simple exercise to
verify that this result agrees with Eq. 10.3.6.


For a chain of N spins, it is convenient to impose periodic boundary conditions, which means that the
first spin of the chain makes a bond with the last one,1 as if the chain were wrapped around a circle; in other
words, sN +1 = s1 . With this boundary condition, Eq. 11.1.1 becomes, for N spins,
Q = Σ_{s1=±1} Σ_{s2=±1} · · · Σ_{sN=±1} T(s1 , s2 ) T(s2 , s3 ) . . . T(sN , s1 ) = Tr(T^N ).    (11.1.4)

Since T^N is a 2 × 2 matrix, it has two eigenvalues, λ±^N , where λ± are the eigenvalues of T , with λ+ > λ− ;
hence, Q = λ+^N + λ−^N , or

Q = λ+^N [ 1 + (λ−/λ+)^N ] → λ+^N ,   N → ∞.    (11.1.5)
The thermodynamics of the model is recovered from the free energy per spin,

G/N = g(T, h) = −kT ln λ+ = −kT ln{ e^{βJ} cosh(βh) + [e^{2βJ} cosh^2(βh) − 2 sinh(2βJ)]^{1/2} },    (11.1.6)

from which the magnetization (average spin) can be calculated as

m(T, h) = (1/N) Σ_i ⟨si⟩ = −∂g/∂h = sinh(βh) / [sinh^2(βh) + e^{−4βJ}]^{1/2} .    (11.1.7)

This result tells us that


(a) in zero external field, there is no magnetization; i.e., the one dimensional model does not have a (nonzero)
Curie temperature;
(b) the limits of zero field and zero temperature do not commute, which is a hallmark of phase transitions.
In other words, the optimist’s reading of Eq. 11.1.7 is that the 1D Ising model has a phase transition
at T = 0. This is a correct statement, supported additionally by the important result that the correlation
length of the system diverges as T → 0 (see Exercises), which is another signature of phase transitions (as was
discussed briefly in Sec. 9.8 and will be discussed in more depth later). On the other hand, the pessimist’s
reading is that at any nonzero temperature, long range order is absent. Since the “mechanical” ground state
of the system is clearly one of two possible states with all spins aligned (these states minimize the value of
the Hamiltonian) and thus (1/N) Σ_i si = ±1 ≠ 0, entropy must play a crucial role.
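A compact numerical sketch of the transfer matrix solution, comparing the magnetization from the closed form of Eq. 11.1.7 with a numerical derivative of the largest-eigenvalue free energy (parameter values are arbitrary):

```python
import numpy as np

def transfer_matrix(beta, J, h):
    """2x2 transfer matrix of the 1D Ising chain, Eq. 11.1.3."""
    return np.array([
        [np.exp(beta * (J + h)), np.exp(-beta * J)],
        [np.exp(-beta * J),      np.exp(beta * (J - h))],
    ])

def g(beta, J, h):
    """Free energy per spin from the largest eigenvalue, Eq. 11.1.6."""
    return -np.log(np.linalg.eigvalsh(transfer_matrix(beta, J, h)).max()) / beta

beta, J, h = 1.0, 1.0, 0.2

# Magnetization from the closed form, Eq. 11.1.7 ...
m_exact = np.sinh(beta*h) / np.sqrt(np.sinh(beta*h)**2 + np.exp(-4*beta*J))

# ... and from a numerical derivative m = -dg/dh:
dh = 1e-6
m_numeric = -(g(beta, J, h + dh) - g(beta, J, h)) / dh

print(m_exact, m_numeric)   # the two agree closely
```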

11.2 Breakdown of Mean Field Theory: Fluctuations and Defects


Mean field theory relies on the ansatz2 that the probability of a microstate is the product of single site (for
a lattice model) or single particle probabilities; the specific form of the site probability is then chosen to
minimize the free energy. A powerful method for testing the prediction of mean field theory is to try to see
if we can beat it with a better guess, which should be guided by physical considerations. In particular, if
mean field theory predicts an ordered state at low temperature, one should look for ways of breaking up the
order while lowering the system’s free energy. If this is possible, the original prediction of mean field theory
will have been proven wrong. We use this approach next to show how we could have reached the result that
the 1D Ising chain cannot have spontaneous magnetization at any nonzero temperature without working out
the exact solution of the model.
Consider the 1D Ising chain with nearest neighbor ferromagnetic
interaction (J > 0) in zero external field. Let us consider the three
spin configurations in the figure. The top configuration is the min-
imum energy configuration (pointing up; everything we say is sym-
metric under reversal of all spins); we take it as the reference state
(state of zero energy). The middle configuration might be expected
1 By introducing this extra bond, the partition function no longer has unpaired factors of e^{βh s1} and e^{βh sN}, and is expressed
entirely as a product of transfer matrices.
2 Ansatz is a fancy word for guess that doesn’t make you feel clueless.



to be the lowest energy excitation of the system, just a single flipped


spin; the energy cost of doing this is ∆E = 4J, since two “bonds”
are “broken” (marked with ×). We might expect the system to tolerate a few of these excitations, but still
be overall ordered, in the sense that m = (1/N) Σ_i si > 0. We have overlooked, however, the case illustrated
at the bottom, that of a domain of up-spins coexisting with a domain of down-spins. This state has lower
energy than the single spin flip, since it involves just one broken bond, so its energy cost is ∆E = 2J, but
more worrisome is that it may entail overall zero magnetization (about half the spins are up and half down).
This excitation is called a domain wall; unlike the single spin flip, a domain wall represents an abrupt change
of the order parameter; in other words, a proliferation of domain walls signifies that the system is unable
to settle into a macroscopic phase. Note that the single spin flip can be viewed as a tightly bound pair of
domain walls. But there is no interaction between domain walls in the model, as is easily ascertained by
effecting the substitution
wi = (1 − si si+1)/2,    (11.2.1)
which turns the nearest neighbor Ising chain into an ideal lattice gas of domain walls. Thus, there can’t
be any force binding domain walls together, and their random placement prevents any long range
correlation between spins. In fact, the correlation length can be at most as large as the average distance
between domain walls, which is their inverse density. Then, it suffices to calculate the equilibrium density of
domain walls. To do so, we must consider both energy and entropy. Let us then consider M domain walls,
that can be placed on any bond of the chain. The free energy of the system (relative to the ordered state)
is given by

∆G(T, h = 0; M ) = 2M J − kT ln C(N, M ),    (11.2.2)

where C(N, M ) = N!/[M!(N − M)!] counts the ways of placing the walls.
Defining the density of domain walls as w = M/N , the free energy per spin becomes

∆G(T, h = 0; w) = 2Jw + kT [ w ln w + (1 − w) ln(1 − w) ].    (11.2.3)

The free energy must be minimum with respect to the density of domain walls:
∂∆G/∂w = 0   =⇒   2J + kT ln[w/(1 − w)] = 0,    (11.2.4)
which means that there is a finite density of domain walls at any nonzero temperature:
w = 1/(e^{2βJ} + 1),    (11.2.5)
and therefore a finite correlation length3 of order the mean spacing between domain walls:
ξ ∼ 1/w ∼ e^{2βJ}    (11.2.6)
at low temperature.
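A quick numerical sketch of Eqs. 11.2.5 and 11.2.6 (arbitrary J), showing that the wall density never vanishes at kT > 0 while the correlation length grows exponentially as T → 0:

```python
import numpy as np

J = 1.0
for kT in (2.0, 1.0, 0.5, 0.25):
    w = 1.0 / (np.exp(2 * J / kT) + 1)   # wall density, Eq. 11.2.5
    print(f"kT = {kT}: w = {w:.4f}, xi ~ 1/w = {1.0/w:.1f} lattice spacings")

# The wall density never vanishes at kT > 0 (no long range order), but
# the correlation length xi grows as e^{2J/kT} when T -> 0.
```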
There is a great deal of physical insight to be gained from inspection of Eq. 11.2.2. The crucial feature
is that the energy of the defect is independent of the system size L, while the entropy scales with ln L (L is
simply the number of lattice sites N times the lattice spacing). Therefore, at any nonzero temperature there
will be a size large enough for defect to become entropically favorable. This scaling argument illustrates the
role of spatial dimension and of the range of the interactions. For example, it is easy to see that a spin chain
with long range coupling, decaying as 1/x2 at large separation x, yields a domain wall energy proportional to
ln L. We may expect a phase transition at finite T in this case, and the exact solution of the model confirms
this expectation. As another example, consider the Ising model in two dimensions. A naive choice of a
rigid domain wall predicts an energy that grows linearly with system size and a logarithmic entropy, so that
domain walls would have positive free energy at any temperature in a large system (∆G ≈ 2JL − kT ln L)
and they could not make the system unstable. The exercises show that this argument is too crude, since
in reality a zigzagging domain wall has entropy that scales linearly with L, competing with the energy; but
3 The exact expression of the correlation length is derived in the exercises.



even if the naive argument were correct, one could not conclude that the system is always ordered, because
there could be other defects that are more energetically favorable than the rigid domain wall and could drive
a phase transition. In summary, arguments such as those made in this section can only be used to prove the
absence of an ordered phase, but not its presence.

11.3 The Landau Theory of Phase Transitions


The notions of order parameter and of analyticity of the free energy are central to the Landau theory of
phase transitions. The theory consists in recognizing the order parameter as the important degree of freedom
of the system near the transition. All other degrees of freedom behave nicely (by definition, since we only
can tell there is a transition between different phases because of the order parameter). We can think of them
as averaging away quickly (both in time and in space). So imagine we perform all the averages involved in
the calculation of the partition function, except that over the order parameter. Rigorously, we could appeal
to Sect. 3.8: define an order parameter as a random variable of the degrees of freedom of the system, say the
coordinates of the particles, φ{xi }; since the probability that this function has any particular value φ is one,
1 = ∫ dφ δ(φ − φ{xi }),

we can write the partition function as


Q = ∫ [d^{3N}x d^{3N}p / (N! h^{3N})] ∫ dφ e^{−βH({x},{p})} δ(φ − φ{xi })    (11.3.1)
and performing all integrals except for that over φ we have
Q = ∫ dφ e^{−βV f(φ,T)}    (11.3.2)

where f (φ, T ) is some complicated function of the order parameter and of temperature; by the extensivity
of the free energy, ln Q, we know that f must be intensive, since we have scaled out a factor of volume: it is
a free energy density. Taking the thermodynamic limit via the saddle point method, one has

Q ≈ e−βV f (φmin ,T ) , (11.3.3)

or
F = V f (φmin ) + const.
The problem is, what is f (φ, T )? Landau’s great insight was to recognize that the requirement of analyticity
away from a point of phase transition constrains the form of f greatly. First, it must be a power series of φ,
with temperature-dependent coefficients. The temperature dependence arises from all the integrals over the
degrees of freedom. Second, it must obey all symmetries imposed by the physics. These are usually (but not
always) easy to identify; in the next chapter we will see that an important class of systems, including fluids,
binary mixtures, and certain ferromagnets, can be described by an order parameter that has “up-down”
symmetry: the free energy must not change if we flip the sign of φ. If the system is subject to an external
force field h that couples linearly to φ, the free energy must not change if the signs of both h and φ are
flipped simultaneously. Third, consider the system near a critical point, where the difference between the
two phases (experimentally, the density difference, surface tension, latent heat, etc.) is small. Then we
can expect that the order parameter will be small and only the first few terms of the power series will be
important; this is an approximation that can actually be verified a posteriori. Finally, local fluctuations in φ
should be considered in principle (based on Sect. 9.8, or if the external field is spatially modulated): then
the order parameter should be taken to be a function of position (i.e., a family of local order parameters4 )
and the free energy density should be allowed to depend also on the gradient of the order parameter and to
contain terms such as (∇φ)2 .
4 In
this case, the integral over φ in Eq. 11.3.2 actually becomes a multiple integral over all local order parameters; integrals
where the integration variable is a function are called functional integrals.



Putting all these considerations together, the free energy density can only have the form

f (φ, h) = f0 + (a/2) φ^2 + (b/4) φ^4 + (c/2) (∇φ)^2 − hφ + . . . ,    (11.3.4)

where the dots remind us that we left out terms of higher order in φ and in its gradient. The coefficients
a, b, c and the “background” free energy f0 are temperature dependent. This expression of the free energy
is exact, but is often called “phenomenological,” meaning that the coefficients are not derived from first
principles. Unfortunately, its exact solution is difficult and is beyond the scope of these notes; however, it
is instructive to work out a simple solution under assumptions that amount to a “mean field
approximation.” We will explore the meaning and the limits of this approximation in the next chapter in
great detail. Here, we just state the main results of Landau mean field theory, obtained by neglecting the
effect of gradient terms in the Landau free energy, and assuming a spatially constant order parameter φ.
The thermodynamic limit is obtained by minimizing the free energy with respect to φ:

df/dφ = 0   =⇒   aφ + bφ^3 − h = 0.    (11.3.5)

Consider h = 0 for simplicity, and let f0 , b, c be some nonzero constants, while a = a0 (T − Tc ), with a0 > 0.
When a > 0, there is a single, stable solution, φmin = 0, meaning that the order parameter vanishes: this
is the high temperature disordered phase. However, if a < 0, the zero solution becomes unstable and two
stable solutions emerge,
φmin = ±(−a/b)^{1/2} ∝ τ^{1/2},    (11.3.6)

with

τ = (Tc − T)/Tc .    (11.3.7)

There are now two phases and the free energy, f (φmin , T ) develops a singularity at exactly the point where
a = 0, or T = Tc . As the critical temperature is approached from below, the order parameter becomes
smaller and smaller, according to a power law with exponent 1/2. Many other physical quantities turn out
to have a power law dependence near the critical temperature, giving rise to a maze of critical exponents.
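A short sketch minimizing the uniform Landau free energy on a grid (illustrative coefficients a0 = b = Tc = 1), confirming the τ^{1/2} power law of Eq. 11.3.6:

```python
import numpy as np

a0, b, Tc = 1.0, 1.0, 1.0   # illustrative coefficients

def phi_min(T):
    """Grid-minimize f(phi) = (a/2) phi^2 + (b/4) phi^4 with a = a0(T - Tc)."""
    a = a0 * (T - Tc)
    phi = np.linspace(-2.0, 2.0, 400001)
    f = 0.5 * a * phi**2 + 0.25 * b * phi**4
    return abs(phi[np.argmin(f)])

for tau in (0.2, 0.1, 0.05):
    T = Tc * (1 - tau)
    print(f"tau = {tau}: phi_min = {phi_min(T):.4f}, "
          f"analytic (-a/b)^1/2 = {np.sqrt(a0 * Tc * tau / b):.4f}")
```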
Susceptibility near the critical point

The order parameter susceptibility

χ = ∂φ(h)/∂h |_{h=0}
also follows a power law near the critical point. To calculate it, we must retain the external field,
take the derivative, and then set the field to zero. We can write

φ(h) = φ(0) + χh

through first order in h, where φ(0) is the solution to Eq. 11.3.5 in zero field. Thus, for T > Tc ,
φ(0) = 0, so aχh = h and
χ = 1/a ∼ |τ|^{−1},   T > Tc .
For T < Tc , φ(0) = ±(−a/b)^{1/2}, so (φ(0) + χh)(a + bφ(0)^2 + 2bφ(0)χh) − h = 0, neglecting terms of
order h^2. But a + bφ(0)^2 = 0, so we can write (φ(0) + χh) 2bφ(0)χh − h = 0, or

χ = 1/[2bφ(0)^2] = −1/(2a) ∼ |τ|^{−1},   T < Tc .

From the calculation, we see that the susceptibility diverges as 1/|τ | both above and below Tc , but
the amplitude coefficients are different, which is the normal occurrence near the critical point. The
heat capacity is another example; see Exercises.



11.4 Correlations near Critical Points


The power law behavior of a system’s properties near the critical point, and the associated critical exponents,
are a manifestation of what is referred to as universal behavior. This is related to the law of corresponding
states for real gases (see Eq. 9.6.4), since it means that data from disparate systems can be collapsed on
the same curve; however, it has both broader applicability and much deeper meaning. In fact, a power law
behavior signifies the absence of a characteristic scale in the system.5
Correlation Length in Landau Theory

A characteristic length scale is set by the size of the subsystems within which particles behave co-
operatively; on larger scales, particles appear uncorrelated. Near the critical point, this length scale
diverges: the system is macroscopically correlated. How this happens can be understood from an
approximation to Landau theory, that retains the gradient term but neglects the nonlinear terms in
Eq. 11.3.4 Z  
3 c 2 a 2
F [φ] = d r (∇φ) + φ − h(~r)φ(~r) . (11.4.1)
2 2
We are interested in the following question: how far is the effect of a small, localized perturbation
felt in the medium as T → T + ? For instance, we could imagine flipping a spin (or a small cluster of
spins) in a magnet near the Curie point or putting a droplet of liquid in a barely supercritical fluid.
Mathematically, the knob for doing this is the external field, h(~r). We need to minimize the free
energy with respect to the order parameter in the presence of the perturbation. Since we have only
retained the linear and quadratic terms of the free energy, it is convenient to work with the Fourier
transform of the order parameter and of the perturbation, φ̂(~k) and ĥ(~k), using the three-dimensional
extension of Eq. 3.9.4. Note that since φ(~r) is real, we have φ̂(−~k) = φ̂∗ (~k), where ∗ indicates the
complex conjugate. Thus the free energy can be written as
F[φ̂, φ̂*] = ∫ [d^3k/(2π)^3] { [(c/2)k^2 + a/2] φ̂(k)φ̂*(k) − (1/2)[ ĥ(k)φ̂*(k) + ĥ*(k)φ̂(k) ] }.    (11.4.2)

The advantage of the Fourier transform is that each value of ~k is decoupled from the others. If we
regard the continuum of k-space as a dense mesh of discrete points, the free energy is a diagonal
quadratic form, and its minimization is a straightforward algebraic problem:

∂F/∂φ̂(k) = 0   =⇒   (ck^2 + a) φ̂*(k) = ĥ*(k)   =⇒   φ̂(k) = ĥ(k)/(ck^2 + a).    (11.4.3)

Now, if h(r) is taken to be localized, i.e., a point source h(r) = hδ(~r), then ĥ(~k) = h is a constant.
Thus,
φ(r) = ∫ [d^3k/(2π)^3] φ̂(k) e^{ik·r} = (h/c) ∫ [d^3k/(2π)^3] e^{ik·r}/[k^2 + (a/c)] = [h/(4πcr)] exp(−(a/c)^{1/2} r) ∼ e^{−r/ξ},    (11.4.4)

showing that the order parameter decays exponentially away from the localized perturbation with
characteristic length

ξ = (c/a)^{1/2} = [c/(a0 Tc |τ|)]^{1/2} ∼ |τ|^{−1/2}.    (11.4.5)
ξ is the correlation length. Note the mathematical analogy between ξ and the Debye screening
length, stemming from the fact that the linearized differential equations solved in the two cases are
identical.

The existence and the value of the correlation length are among the most powerful predictions of Landau
5A familiar example may be fractals, which are objects that look the same on all length scales and are characterized by
“fractal exponents.”



Theory. As long as ξ stays finite, the system has a well defined characteristic length. However, ξ has a
negative critical exponent, meaning that it becomes infinite as the critical point is approached (we have just
shown it from above; the same is true from below). The exact value of the exponent is somewhat different
from 1/2 when calculations are performed retaining the term φ4 in the free energy, but the qualitative
behavior of the correlation length is well captured by Eq. 11.4.5. In particular, the equation shows us
that at the critical point, a system is infinitely correlated, so that a perturbation anywhere in the system
must eventually be “felt” anywhere else. If the theory is extended to the time domain to include relaxational
dynamic effects, one finds that relaxation times also diverge at the critical point. As a consequence, a system
takes a long time to reach equilibrium: experiments near critical points are extremely tricky to perform for
this reason. So are computer simulations under the same conditions. In particular, (a) the approach to
the critical point is severely limited by the size of the system, which must be greater than the correlation
length at a given temperature, and (b) equilibration times become very large, making computations rather
CPU-intensive.

11.5 Exercises
1. Consider the one dimensional Ising model in zero external field, H = −J Σ_i si si+1 . The correlation
function, Γ(k) = ⟨si si+k⟩, by symmetry depends only on the distance between lattice points, k. Show
that Γ(k) = ⟨si si+1 si+1 si+2 si+2 . . . si+k−1 si+k−1 si+k⟩. Then, rewrite the Hamiltonian as
H = −Σ_i Ji si si+1 ; at the end all Ji will be set equal to J. Show that ⟨si si+1⟩ = (1/βQ) ∂Q/∂Ji , and
⟨si si+k⟩ = (1/β^k Q) ∂^k Q/(∂Ji . . . ∂Ji+k−1 ). Calculate the partition function and show that
Γ(k) = (tanh(βJ))^k . Does the correlation length ever diverge?
2. The free energy cost of having a straight domain wall in a 2D Ising model on an L × L lattice is
2JL − kT ln L. But how about if the wall is not straight? Consider the following rule for making a
domain wall: draw the wall from left to right, but instead of going straight right at each step, you can
go straight, or one step up or one step down. How does the free energy scale with L and what does
this result tell about order in the 2D Ising model?
3. Show that the heat capacity at constant volume is discontinuous at T = Tc in Landau mean field
theory. This means that the approach to Tc is of the type cV = c± |τ |0 , i.e., the critical exponent is
zero, and the amplitudes are different above and below Tc .

4. Derive the exponent of the critical isotherm in Landau mean field theory; that is, find how φ depends
on h along the isotherm T = Tc .


Index

Adiabatic Demagnetization, 37
Bayes’ Theorem, 10
Bernoulli process, 11
Binomial Theorem, 7
Blackbody Radiation, 47
Boltzmann Factor, 40
Bosons, 7
Canonical, 40
Carnot Cycle, 30
Carnot’s Theorem, 30
Central Limit Theorem, 15
Chemical Potential, 25
Chemical Potential, Ideal Gas, 25
Classical Nucleation Theory, 58
Clausius’ Theorem, 31
Clausius-Clapeyron Equation, 49
Concavity of Entropy, 26
Conditional Probability, 10
Conserved Quantities and Entropy, 17
Contact Potential, 54
Cooperativity, 52
Correlation Function, 11
Correlation Length, 76
Corresponding States Law, 57
Counting, First Rule, 5
Counting, Second Rule, 6
Covariance, 11
Critical Isotherm (Landau Theory), 77
Cumulant Distribution Function (CDF), 12
Debye Model (Crystals), 46
Debye-Hückel approximation, 69
Domain Wall, 73
Efficiency, 29
Electrochemical Potential, 54
Entropy, 17
Entropy and Random Processes, 17
Entropy, Ideal Gas, 19
Entropy, Lattice Gas, 18
Equilibrium Constant, 52
Equilibrium, Thermodynamic, 23
Equipartition of Energy, 44
Exchange Parameter, 64
Expectation (Value), 11
Fermi Distribution, 53
Fermi Energy, 54
Fermions, 7
First Law of Thermodynamics, 1
Fourier Transform, 15
Fugacity, 50
Gaussian Distribution, 12
Gibbs-Duhem Equation, 28
Hamiltonian, 1
Harmonic Oscillator, Classical, 43, 44
Heat Capacity, 41
Heat Capacity (Diatomic Molecule), 45
Hill Plot, 52
Homogeneous Function, 26
Independence (of Random Variables), 11
Internal Energy, 25
Internal Energy and Canonical Partition Function, 40
Ising Spin Chain, 72
Joint Probability, 11
Kinetic Theory, 24
Lagrange Multipliers, 20
Landau Potential, 50
Landau Theory, 74
Langmuir Isotherm, 52
Legendre Transform, 33
Macrostate, 2
Maxwell Distribution, 16
Maxwell Relations, 36
Mean Field Theory (Bragg-Williams), 64
Mean Field Theory (Curie-Weiss), 67
Metastability, 58
Microcanonical, 40
Microstate, 1
Nucleation Rate, 58
Occupation Number, 44
Order Parameter, 59
Partition Function, Canonical, 40
Partition Function, Grand Canonical, 50
Phase Equilibrium, 49
Phase Rule, 2
Phase Space, 1
Phase Transition, Order of, 58
Poisson Distribution, 12
Poisson-Boltzmann Equation, 68
Probability Axioms, 9
Quantum Degeneracy, 25
Quantum Statistics, 7
Reservoir, 27
Responses, Thermodynamic, 26
Reversibility and Maximum Work, 30
Reversible Processes, 26
Sackur-Tetrode Formula, 19
Saddle Point Method, 42
Screening, 69
Second Law of Thermodynamics, 28
Spreading Pressure, 60
Square Well Potential, 56
Stability, 35
Standard Deviation, 11
Stirling’s Formula, 6
Thermal de Broglie Wave Length, 25
Thermodynamic Potentials and Equations of State, 35
Third Law of Thermodynamics, 28
Two-level system, 18
Uniform Distribution, 12
Variance, 11
Virial Coefficients, 54
Zeroth Law of Thermodynamics, 2
