CBE240book PDF
Carlo Carraro
© Carlo Carraro – CBE 240 Lecture notes – Fall 2020 – Draft
Preface
These lecture notes present an introduction to the statistical foundations of equilibrium thermodynamics.
I take a Bayesian approach to the topic, where macroscopic entropy is introduced as a measure of the
information that could be gained by observing the microscopic state of a physical system. This approach is
appealing for its logical simplicity. It is rooted in the principle of insufficient reason, enunciated by Jakob
Bernoulli, which in this context essentially states that in the absence of information to the contrary, a physical
system is equally likely to be found in any microscopic state compatible with the known conservation laws
(such as particle number, energy, and so on); entropy quantifies this absence of information. A commonly
followed alternative starting point, more aligned with a frequentist viewpoint of statistics, is the ergodic
hypothesis, that a system will in due time visit all microscopic states compatible with conservation laws; but
this due time can be so long compared to any practical timescale for observation (or even to the proverbial
age of the universe), that one must always question whether any such observation yields representative
time-averages. This issue is particularly vexing in molecular dynamics simulations. In fact, breakdown of
ergodicity occurs in many interesting systems, such as in any system exhibiting phase transitions.
The material presented here strives to be self-contained, although knowledge of calculus is assumed, as well
as some familiarity with thermodynamics, as developed in introductory undergraduate science or engineering
courses. Chapter 1 introduces the concept of macroscopic vs microscopic state of a system. The mathematical
tools needed in statistical thermodynamics are laid out in chapters 2 and 3 (counting and probability theory,
respectively). After introducing entropy in probability theory, we transition to the physical world with an
application to the classical ideal gas in Chapter 4. Here, the power of maximum entropy as a thermodynamic
potential begins to become clear. The properties of entropy are further developed in Chapter 5, which
includes the statement of the Second Law of Thermodynamics, two important consequences of which are
explored in Chapter 6 (Carnot’s and Clausius’ theorems). Chapters 7 and 8 are devoted to the topic of
boundary conditions, implemented macroscopically through the construction of suitable thermodynamic
potentials (Chapter 7) and interpreted probabilistically as the process of entropy maximization with suitable
constraints (Chapter 8). Chapter 9 deals with open systems, and lays the foundation for the study of
phase equilibria and phase transitions. Chapter 10 develops the mean field approximation for systems whose
degrees of freedom can be modeled as Bernoulli variables. Finally, Chapter 11 presents some exact results
to highlight the limitations of mean field theory and to illustrate the role of dimensionality in its
breakdown.
The emphasis throughout is on the logical structure of statistical thermodynamics more than on any
particular application; the goal is to empower the reader to think about very different physical situations in
a coherent, unified fashion. Accordingly, the organization of the notes tends to have each section dedicated
to the exposition of a fundamental concept followed by a worked out example (sometimes reinforced in
the exercises at the end of each chapter). These examples and exercises are taken from a variety of topics
in physical chemistry, solid state physics, and materials science. The often cursory or introductory
treatment afforded in these notes to important topics like the theory of electrolytic solutions, the ideal Fermi
gas, or blackbody radiation is meant to encourage students to explore further on their own, and to
instill the confidence that they are well equipped to do so.
Contents
Preface
2 Counting
2.1 How Many? A Simple Rule
2.2 Permutations, Factorials, and Stirling’s Formula
2.3 Combinations
2.4 Distinguishable vs Indistinguishable Objects
2.5 Quantum Statistics
2.6 The Binomial Theorem
2.7 Exercises
3 Probability
3.1 Definition of Probability
3.2 Conditional Probability and Bayesian Inference
3.3 Moments of Probability Distributions: Expectation and Variance
3.4 Joint Probability Distributions, Independence, and Covariance
3.5 Binomial Distribution
3.6 Binomial Distribution for Large n: Gaussian and Poisson Distributions
3.7 Uniform Distribution and Cumulative Distribution Function
3.8 Distribution of a Function of Random Variables
3.9 Characteristic Function and Central Limit Theorem
3.10 Exercises
4 Entropy
4.1 Entropy of a Random Process
4.2 Boltzmann’s Entropy Formula
4.3 Discrete Phase Space: the Entropy of a Lattice Gas
4.4 Continuum Phase Space: the Entropy of the Classical Ideal Gas
4.5 Principle of Maximum Entropy
4.6 Exercises
5 Properties of Entropy
5.1 The Dependence of Entropy on E, V, N
5.2 Thermodynamic Forces
5.3 Connection to Kinetic Temperature
5.4 Homogeneity of Entropy
5.5 Thermodynamic Susceptibilities
5.6 Entropy Changes and Reversibility
5.7 Clausius’ Statement of the Second Law of Thermodynamics
5.8 Entropy at Low Temperature
5.9 Exercises
7 Thermodynamic Potentials
7.1 The Concept of Free Energy
7.2 Systematic Construction of Thermodynamic Potentials
7.3 Stability Criteria for Thermodynamic Potentials
7.4 The Calculus of Thermodynamics: Maxwell Relations and Jacobians
7.5 Exercises
Chapter 1
The laws of thermodynamics are empirical statements concerning energy conservation (first law) and the
maximum amount of work that can be obtained in a thermodynamic transformation (second law). The goal
of statistical thermodynamics is to derive these laws from the microscopic description of the system through
a statistical treatment of the microscopic degrees of freedom.
$$\dot{p}_i = -\frac{\partial H}{\partial r_i}, \qquad \dot{r}_i = \frac{\partial H}{\partial p_i}. \tag{1.2.1}$$
The 6N components (in 3D) of the position and momentum variables, $\vec{r}_i$ and $\vec{p}_i$, span the “phase space” of
the N-particle system; the state of the system corresponds to a point in phase space. It is easy to see that
$dH/dt = \partial H/\partial t$, so that if the potential energy is independent of time, the value of the Hamiltonian is constant in
time (conservation of energy). Later on, we will deal with systems for which we can’t write down equations
of motion, but we can still specify the energy as a function of certain “degrees of freedom”, or microscopic
variables. In statistical mechanics, it is common to refer to such an energy function as the “Hamiltonian” of
the system, even though the system does not follow Hamiltonian mechanics.
1 The identity of a point particle in classical mechanics is determined by its mass and its interactions, i.e., by the Hamiltonian
CHAPTER 1. THE GOAL OF STATISTICAL THERMODYNAMICS
$$N_v = 2 + N_c - \phi, \tag{1.3.1}$$
where $N_v$ is the number of variables, or thermodynamic degrees of freedom, $N_c$ the number of independent
components, and $\phi$ the number of phases. A phase is a macroscopically homogeneous substance.³ For
example, we take as our variables p, T, and the variables that determine the compositions of all phases.
(It is understood that the total amount of stuff in the system is arbitrary, for instance we could always
refer everything to one mole, so this arbitrary “variable” does not contribute to $N_v$.) The state of a system
described in this way is referred to as a macrostate. The macrostate of a system can be changed, e.g., by
doing work on the system or by heating it; it also changes if particles flow into or out of it.
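As a small illustration, the phase rule of Eq. 1.3.1 can be written as a one-line function. This is only a sketch; the function name and the error check are illustrative choices, not part of the notes.

```python
def gibbs_degrees_of_freedom(n_components: int, n_phases: int) -> int:
    """Return N_v = 2 + N_c - phi (Eq. 1.3.1)."""
    n_v = 2 + n_components - n_phases
    if n_v < 0:
        # hypothetical guard: the rule cannot give a negative number of variables
        raise ValueError("more coexisting phases than the phase rule allows")
    return n_v

# One component in a single phase: p and T can be varied independently.
print(gibbs_degrees_of_freedom(1, 1))  # 2
# One component with two coexisting phases: only one free variable remains.
print(gibbs_degrees_of_freedom(1, 2))  # 1
```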
The Zeroth Law of Thermodynamics: Two bodies that are separately in thermal equilibrium
with a third one must also be in thermal equilibrium with each other.
than the time scale of any experiment by which those properties are measured. In contrast, it turns out that
statistical averaging over a large number of microstates is free from these fundamental difficulties and can
often be carried out with relative ease.
1.6 Exercises
1. How many microscopic degrees of freedom for a system of 2 particles? What if the particles are
connected by a rigid rod of negligible mass? What if the particles are connected by a spring of
negligible mass?
2. How many macroscopic degrees of freedom for a 0.1 M solution of NaCl in water? How many macroscopic
degrees of freedom for a gaseous mixture of oxygen and nitrogen? How many degrees of freedom
for liquid water in equilibrium with its vapor and with ice?
Chapter 2
Counting
Consider a finite set A with n elements. The power set of A is the set of all distinct subsets of A,
including A itself and the empty set ∅. To count the number of subsets of A, we employ a useful
trick. We construct a correspondence between each subset of A and a length-n binary string (a string
of 0s and 1s). The string has length n because there are n elements in A. Let us order the elements of
A (it can be done because A is countable). For a given subset of A, it either contains the first element
of A, in which case we assign the value 1 to the first digit of the binary string, or not, in which case
we assign the value 0. Then we keep going: our subset either contains the second element of A, or
not, and so forth. We see that this procedure determines a unique binary string, and moreover any
given string determines a subset uniquely (for instance, the empty set corresponds to a string of 0s;
the string 11 . . . 1 corresponds to the set A itself). How many strings of length n are there? If we
write down the string, at each digit we have a choice of two values, so by the first rule of counting
there are $2^n$ binary strings, and the power set of A has $2^n$ elements.
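The correspondence between subsets and binary strings can be checked directly for a small set. The sketch below (the helper name `power_set` is an illustrative choice) builds every length-n binary string and reads off the subset it encodes.

```python
from itertools import product

def power_set(elements):
    """List all subsets of `elements` via the binary-string correspondence."""
    elements = list(elements)  # fix an ordering of the elements
    subsets = []
    for bits in product((0, 1), repeat=len(elements)):
        # digit i is 1 if the i-th element belongs to the subset
        subsets.append({e for e, b in zip(elements, bits) if b == 1})
    return subsets

subsets = power_set(["a", "b", "c"])
print(len(subsets))  # 8, i.e. 2**3
```

The all-zeros string yields the empty set and the all-ones string yields the full set, as described in the text.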
structural isomers.
2 This procedure is always possible for a finite set. For an infinite set, it may not be possible, in which case the set is said to
be uncountable.
the value n! because by definition 0! = 1.) In statistical physics, we usually deal with very large numbers of
objects. In this case, we can use Stirling’s approximation for factorials,³
$$n! \approx \sqrt{2\pi n}\,(n/e)^n \tag{2.2.1}$$
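To get a feeling for the quality of Eq. 2.2.1, one can compare it with the exact factorial for a few values of n. This is a minimal numerical sketch; the function name `stirling` is an illustrative choice.

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation sqrt(2 pi n) (n/e)^n of n! (Eq. 2.2.1)."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10, 50, 100):
    ratio = stirling(n) / math.factorial(n)
    print(n, ratio)  # the ratio approaches 1 as n grows
```

The relative error is already below one percent at n = 10 and keeps shrinking as n increases.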
2.3 Combinations
In the preceding section, we considered the objects and their arrangements to be distinguishable. In other
words, we considered an ordered list of k objects picked from a group of n. Now instead we consider the case
when the arrangement of the objects does not matter. Since there are k! arrangements of a list of k objects,
but we now consider them the same list, we must divide the result of the previous section by k! (this is the
second rule of counting). In other words, we say that the number of ways of choosing k objects out of n
without replacement and irrespective of order is
$$\binom{n}{k} := \frac{n!}{(n-k)!\,k!}. \tag{2.3.1}$$
Here, “irrespective of order” means that the permutations of the k objects are not regarded as different.
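The division by k! can be verified by brute force for small numbers. The sketch below counts ordered lists and unordered selections of k = 3 objects out of n = 6 (values chosen only for illustration) and compares them with Eq. 2.3.1.

```python
import math
from itertools import combinations

def n_choose_k(n: int, k: int) -> int:
    """Binomial coefficient n! / ((n-k)! k!) from Eq. 2.3.1."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

n, k = 6, 3
ordered = math.factorial(n) // math.factorial(n - k)   # ordered lists of k objects
unordered = sum(1 for _ in combinations(range(n), k))  # brute-force enumeration
print(ordered, unordered, n_choose_k(n, k))  # 120 20 20
```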
In the previous two sections, we counted arrangements of objects chosen from a set “without replacement”
(or without repetition), meaning that each of the n objects could be chosen only once (as when dealing
from a deck of cards). In some instances we are interested in the possibility that the n elements can occur
repeatedly in a “lineup”. We did precisely this when we counted the number of binary strings of length n.
We can generalize to n-strings with an m-character alphabet, or equivalently n distinguishable balls thrown
into m distinguishable bins. By the first rule of counting, there are $m^n$ ways to throw n distinguishable balls
into m distinguishable bins, because for each of the n balls, we have m choices of bins to throw them into.
Suppose now the balls are indistinguishable. We want to know how many arrangements there are for
n indistinguishable balls in m bins; the bins are distinguishable.⁴ This counting problem arises in many
practical instances, including in quantum statistical mechanics where one needs to count the arrangements
of n identical particles among m energy levels. Imagine arranging the balls in a row; we may as well visualize
them as a string of n 0s. Now these balls have to be arranged in m bins; some bins may be empty. If we had
just two bins, we would need simply to specify where the first bin ends and the second begins, which can
be done by laying down the digit 1 as the divide between the two groups of zeros (if the first bin is empty,
the 1 will be to the left of all the 0s; if the second bin is empty, it will be to the right; remember, bins are
distinguishable!). Now we extend the procedure to m bins by placing m − 1 digits 1 as the divides between
consecutive bins.⁵ Therefore, we have constructed a string of n + m − 1 0s and 1s, containing exactly n 0s in
any position. How many such strings are there? This is a familiar problem, which we solved in the previous
section; the answer is
$$\binom{n+m-1}{n}.$$
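The result can be checked by listing all occupation numbers explicitly for a small case. The following sketch (function names are illustrative) enumerates every way of distributing the balls among the bins and compares the count with the binomial coefficient above.

```python
import math
from itertools import product

def count_by_enumeration(n_balls: int, m_bins: int) -> int:
    """Count occupation-number lists (n_1, ..., n_m) with n_1 + ... + n_m = n_balls."""
    return sum(
        1
        for occupation in product(range(n_balls + 1), repeat=m_bins)
        if sum(occupation) == n_balls
    )

def count_by_formula(n_balls: int, m_bins: int) -> int:
    """Number of strings with n zeros and m - 1 ones: C(n + m - 1, n)."""
    return math.comb(n_balls + m_bins - 1, n_balls)

print(count_by_enumeration(4, 3), count_by_formula(4, 3))  # 15 15
```

The brute-force enumeration grows quickly with the number of bins, so it is only meant as a check on small examples.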
2.5 Quantum Statistics
Consider two point particles and assume that they are not interacting with each other. Suppose
we know the microstates that a single particle can occupy, i.e., the single particle energy levels,
and suppose there are three such levels. How many microstates can the two-particle system be
found in? If the particles are distinguishable, there are $3^2 = 9$ possible microstates; in statistical
mechanics, considering particles as distinguishable is referred to as Boltzmann counting.
If the particles are indistinguishable, and multiple occupancy of a level is allowed, then there is a total of
$\binom{2+3-1}{2} = 6$ possible microstates; in statistical mechanics, considering particles as indistinguishable
with no occupancy restriction is referred to as Bose counting. Finally, if no two indistinguishable particles
can occupy the same level, which is referred to as Fermi counting, one has $\binom{3}{2} = 3$ possible
microstates (just choose the two out
It is interesting to generalize the example above to an arbitrary number of particles, N, and levels, M. In
each of the three cases, one finds $M^N$ (distinguishable), $\binom{N+M-1}{N}$ (indistinguishable with no restrictions),
and $\binom{M}{N}$ (indistinguishable with single occupancy, or “without repetition”; note that M ≥ N necessarily).
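The three counting schemes map directly onto standard enumeration tools. In the sketch below (the function name and dictionary keys are illustrative), ordered assignments, multisets, and sets of levels reproduce Boltzmann, Bose, and Fermi counting for two particles and three levels.

```python
import math
from itertools import product, combinations, combinations_with_replacement

def count_microstates(n_particles: int, n_levels: int) -> dict:
    levels = range(n_levels)
    return {
        # distinguishable particles: an ordered assignment of a level to each particle
        "Boltzmann": sum(1 for _ in product(levels, repeat=n_particles)),
        # indistinguishable, multiple occupancy allowed: a multiset of levels
        "Bose": sum(1 for _ in combinations_with_replacement(levels, n_particles)),
        # indistinguishable, at most one particle per level: a set of levels
        "Fermi": sum(1 for _ in combinations(levels, n_particles)),
    }

print(count_microstates(2, 3))  # {'Boltzmann': 9, 'Bose': 6, 'Fermi': 3}
```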
What happens when the number of levels (or bins) is much larger than the number of particles (or balls)?
Look at the limiting cases for M ≫ N (which implies M ≫ 1). Then, $\binom{N+M-1}{N} \approx \binom{N+M}{N} \approx \frac{M^N}{N!} \approx \binom{M}{N}$, so
both Bose and Fermi counting give the same approximate result, which is the Boltzmann counting divided
by N !. This happens because if there are many more bins than balls, most of the bins will be empty, and
a few will be singly occupied. Multiple occupancy will be extremely rare. Then, the only consequence of
indistinguishability is that the order in which we put the balls into their bins (one per bin) is irrelevant, and
thus we divide by N !, the number of all permutations of the balls.
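A quick numerical check of this limit, as a sketch with an arbitrarily chosen N = 3: the ratios of the Bose and Fermi counts to $M^N/N!$ both tend to one as M grows, approaching from above and from below, respectively.

```python
import math

def counting_ratios(n_particles: int, n_levels: int):
    """Return Bose and Fermi counts divided by the corrected Boltzmann count M^N / N!."""
    corrected_boltzmann = n_levels ** n_particles / math.factorial(n_particles)
    bose = math.comb(n_particles + n_levels - 1, n_particles)
    fermi = math.comb(n_levels, n_particles)
    return bose / corrected_boltzmann, fermi / corrected_boltzmann

for m in (10, 100, 10_000):
    print(m, counting_ratios(3, m))
```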
The equality we have just proved is a particular case of the important Binomial Theorem, which states that
$$(p + q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}. \tag{2.6.1}$$
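Equation 2.6.1 is easy to verify numerically for specific values; the numbers below are arbitrary and the function name is an illustrative choice.

```python
import math

def binomial_sum(p, q, n):
    """Right-hand side of Eq. 2.6.1."""
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

print(binomial_sum(2, 3, 4), (2 + 3) ** 4)  # 625 625
# With p = q = 1 the sum of all binomial coefficients reduces to 2**n.
print(binomial_sum(1, 1, 10))  # 1024
```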
the configuration at right is more “typical”, all of them are possible microstates of the system. If
we assume that the system spends equal time in each microstate (ergodic hypothesis), then the
configuration at left will occur for about a minute every hour, while the one at right for about twenty
minutes.
2.7 Exercises
1. How many poker hands are there? How many bridge hands? How many bridge hands with two aces?
(A poker hand has 5 cards; a bridge hand has 13; and the whole deck 52.)
2. A set S has n elements. How many subsets of S have at least 3 elements?
3. How many nonnegative integer solutions does the equation x1 + x2 + x3 + x4 + x5 + x6 = 9 have?
7. In the example of Sect. 2.6, how many molecules would it take for them to be completely segregated
in the lower container for only one minute over the entire life of the universe? (The universe is about
15 billion years young.)
Chapter 3
Probability
iii. The probability of a union of disjoint (mutually exclusive) events is the sum of the probabilities of the
individual events.
In an experiment, we can sometimes treat the outcome as a continuous variable (e.g., the position of
an object on a line). Then, χ is not a countable collection and so we associate a (positive semidefinite)
probability density function (pdf) p(x) with outcomes distributed in the (infinitesimal) interval (x, x + dx)
by equating their probability to p(x)dx. In this case, the axioms still apply with the following changes:
p(x) ≥ 0 (but not necessarily less than or equal to one¹) and the sums in the axioms above become integrals (e.g.,
axiom ii becomes $\int_{-\infty}^{\infty} dx\, p(x) = 1$).
To develop an intuition for the meaning of the probability axioms it is useful to picture a planar board
of arbitrary shape and finite mass. We take the total mass of the board to be the unit mass (axiom ii). The
board could be infinitely large (in which case we know the areal mass density must be infinitesimally small
over most of the board so that the mass can be unity). An outcome is a piece of the board. The probability
of the outcome is the mass of that piece of the board. Therefore, its mass is between nothing and the mass
of the whole board (axiom i). If we break a section of the board into several pieces, their masses add up to
the mass of the original section (axiom iii).
In general, there are two conceptual steps involved in the statistical description of the physical world. The
first is to find the appropriate correspondence between χ and a set of numbers (this is the identification of
the random variable); the second is to prescribe the correct form of p(x) and predict the physical properties
of the system in question (if you are an experimentalist, these are the very practical steps of deciding what
to measure, and then doing it). Statistical mechanics gives us the recipe for the second step. The first step is
the selection of a model, and is guided largely by physical intuition and symmetry considerations. Of course,
it is also possible to travel the path in reverse, and use an experiment to infer if the probability distribution
was assigned correctly, as we discuss in the next section.
1 It is still true that the probability of any event must be less than or equal to one!
The probability distribution contains all information available about the random variable X. Therefore,
the probability distribution itself depends on the information available, and must be updated when new
information becomes available. The process of updating knowledge after making an observation is called
Bayesian inference, and rests on the concept of conditional probability: the probability of B given that A is
true (indicated by p(B|A)) is given by the probability that A and B are true, normalized by the probability
of A, which is written formally as
$$p(B|A) = \frac{p(B \cap A)}{p(A)}.$$
In terms of our board analogy, imagine drawing two closed figures, A and B, on the board. Then, p(A) is the
mass of the piece of board that we would get if we cut off figure A; remember the unit of mass is the mass
of the whole board. Now, figures A and B could intersect. Then, p(B ∩ A) is the mass of the intersection,
and p(B|A) is the mass of the intersection, expressed in units of the mass of figure A. In other words, we
forget about the original board and restrict our interest to figure A. In probability theory, we say that we
are “conditioning on A.” A useful theorem due to Bayes states that
$$p(B|A) = p(A|B)\,\frac{p(B)}{p(A)}.$$
1. (Updating knowledge) Suppose we are given a bag with two coins. One is fair, and one has two
heads. Without looking, we reach for a coin and toss it. What is the probability that we observe
tail? This kind of problem can easily be solved by drawing a tree, where the branches exiting from
each node represent a possible outcome and are labeled with the corresponding probability. The
probability that we observe tail is 1/4, since there is a probability 1/2 of having picked the fair coin,
and for this, the probability of tail is 1/2. Suppose we do observe tail, and toss the same coin again.
What is the probability that we observe tail at the second toss? Clearly, we know after the first toss
that we must be holding the fair coin, so if we toss it again, we have a 1/2 probability of getting a
tail. However, consider this problem (same bag as before): without looking, we reach for a coin and
toss it twice. What is the probability of observing tail at the second toss? Here, we only look after
the second toss. Then, the probability of tail is still 1/4, since there has been no update of our prior
knowledge.
2. (Bayes’ theorem). Suppose we are given a bag with ten coins. Eight are fair, and two have two
heads. Without looking, we reach for a coin, flip it, and observe head. What is the probability that the
coin is fair? This is a typical application of Bayes’ theorem. Let us call p(F ), p(U ) the probabilities
that the coin is fair and unfair, respectively; and p(H), p(T ) the probabilities of observing head and
tail, respectively. Bayes’ theorem states that
$$p(F|H) = p(H|F)\,\frac{p(F)}{p(H)}.$$
The numerator terms are easily found: p(H|F )p(F ) = (1/2)(4/5) = 2/5. To find the denominator,
note that p(H) = p(H ∩ F ) + p(H ∩ U ) = p(H|F )p(F ) + p(H|U )p(U ). The first equality follows
from axiom iii and the second from the definition of conditional probability. But p(H|F) = 1/2 and
p(H|U) = 1, so $p(F|H) = \frac{2/5}{2/5 + 1 \times 1/5} = 2/3$.
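The arithmetic of the two coin examples above can be reproduced with exact fractions. This is a minimal sketch; the function `posterior_fair` and its parameter names are illustrative and simply encode Bayes’ theorem together with the decomposition of p(H) used in the second example.

```python
from fractions import Fraction as F

# Example 1: bag with one fair coin and one two-headed coin.
p_fair = F(1, 2)
p_tail = p_fair * F(1, 2) + (1 - p_fair) * 0   # tail requires the fair coin
print(p_tail)  # 1/4

# Example 2: eight fair coins and two two-headed coins; we observe head.
def posterior_fair(p_fair, p_head_given_fair=F(1, 2), p_head_given_unfair=F(1)):
    """p(F|H) = p(H|F) p(F) / p(H), with p(H) = p(H|F) p(F) + p(H|U) p(U)."""
    p_head = p_head_given_fair * p_fair + p_head_given_unfair * (1 - p_fair)
    return p_head_given_fair * p_fair / p_head

print(posterior_fair(F(8, 10)))  # 2/3
```

Observing head lowers the probability that the coin is fair from the prior 4/5 to 2/3, since head is more likely with a two-headed coin.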
3.3 Moments of Probability Distributions: Expectation and Variance
It is the average of all values of x weighted by their probability. Thus, if we run a long sequence of independent
experiments that measure the random variable X (i.e., each experiment samples X) and take the long time
average, the average value will converge to the expectation of X. This argument can be formalized by the
central limit theorem, as we will see later. The expectation value is sometimes indicated by the bracket
notation $\langle x \rangle_p$ (where the subscript denotes the probability distribution that is being sampled) or simply by
$\langle x \rangle$ when the distribution is unambiguously understood. A measure of the spread of the distribution p(x)
about the average is given by the expectation of the square of the distances between x and $\langle x \rangle$, which is the
variance
$$\mathrm{Var}(X) = \begin{cases} \displaystyle\sum_{i=1}^{N} x_i^2\, p(x_i) - \langle x \rangle^2, & X \text{ discrete};\\[1ex] \displaystyle\int dx\, x^2 p(x) - \langle x \rangle^2, & X \text{ continuous}. \end{cases} \tag{3.3.2}$$
The standard deviation is $\sigma = \sqrt{\mathrm{Var}(x)}$ and has the units of x. The quantity $\langle x^n \rangle$ is called the n-th moment
of x.
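As a concrete instance of the discrete case of Eq. 3.3.2, the sketch below computes the expectation and variance of a fair six-sided die with exact fractions. The die is an illustrative choice, not an example taken from the notes.

```python
from fractions import Fraction

def expectation(values, probs):
    """<x> = sum_i x_i p(x_i)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = <x^2> - <x>^2, the discrete case of Eq. 3.3.2."""
    mean = expectation(values, probs)
    second_moment = expectation([x * x for x in values], probs)
    return second_moment - mean**2

faces = range(1, 7)
probs = [Fraction(1, 6)] * 6
print(expectation(faces, probs), variance(faces, probs))  # 7/2 35/12
```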
The covariance is often called the correlation function. If two variables are independent, then p(x, y) = p(x)p(y),
so Cov(x, y) = 0, but the converse is not necessarily true. If Cov(x, y) = 0, then x and y are said to be
linearly uncorrelated or linearly independent.
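A standard counterexample shows why vanishing covariance does not imply independence. In the sketch below (an illustrative construction, not taken from the notes), X takes the values −1, 0, 1 with equal probability and Y = X², so Y is fully determined by X and yet the covariance is zero.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with equal probability and Y = X**2,
# so Y is completely determined by X.
xs = [-1, 0, 1]
p = Fraction(1, 3)

mean_x = sum(p * x for x in xs)
mean_y = sum(p * x**2 for x in xs)
mean_xy = sum(p * x * x**2 for x in xs)
cov = mean_xy - mean_x * mean_y
print(cov)  # 0

# Independence would require p(x, y) = p(x) p(y) for every pair; check x = 0, y = 0.
p_joint = p                       # the pair (0, 0) occurs whenever x = 0
p_x0, p_y0 = p, p                 # p(X = 0) = 1/3 and p(Y = 0) = 1/3
print(p_joint == p_x0 * p_y0)     # False
```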
binomial random variable counts the number of successes in n independent Bernoulli trials. The probability
of k successes in n trials is given by the binomial distribution
$$P_{n,q}(k) = \binom{n}{k} q^k (1-q)^{n-k}.$$
This result follows simply from the assumption that the trials are independent, so that the joint probability
for the n trials factorizes, and from applying axiom iii after counting the number of different (nonoverlapping)
ways to get the desired number of successes. It is easy to see that $P_{n,q}(k)$ is normalized from the binomial
expansion of $1^n = [q + (1-q)]^n = \sum_{k=0}^{n} P_{n,q}(k)$ and that E(k) = nq and Var(k) = nq(1 − q). The binomial
distribution models, for example, a random walk, i.e., a process where a walker takes a sequence of n
independent random steps, each step to the right with probability q or to the left with probability (1 − q)
(a fair coin or an unbiased walk corresponds to q = 0.5). Random walks can in turn be used to model many
interesting physical systems, from polymers to diffusion processes.
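The normalization, mean, and variance quoted above can be checked numerically by summing over all k. The parameter values n = 20 and q = 0.3 in this sketch are arbitrary.

```python
import math

def binomial_pmf(k: int, n: int, q: float) -> float:
    """P_{n,q}(k): probability of k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

n, q = 20, 0.3
pmf = [binomial_pmf(k, n, q) for k in range(n + 1)]
norm = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k**2 * p for k, p in enumerate(pmf)) - mean**2
print(norm, mean, var)  # close to 1, n*q = 6, and n*q*(1-q) = 4.2
```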
This is the Poisson distribution. It has expectation $\langle k \rangle = \lambda$ and variance $\langle k^2 \rangle - \langle k \rangle^2 = \lambda$.
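The equality of mean and variance can be checked numerically. The sketch below assumes the standard form of the Poisson distribution, $e^{-\lambda}\lambda^k/k!$, and truncates the infinite sum at k = 100, which is an illustrative cutoff that is more than sufficient for the value of λ used here.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Standard Poisson probability exp(-lam) lam^k / k! (assumed form)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.5
ks = range(100)  # truncation; the neglected tail is negligible for this lam
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k**2 * poisson_pmf(k, lam) for k in ks) - mean**2
print(mean, var)  # both close to lam = 2.5
```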
Computer generated (pseudo)random numbers are distributed uniformly in the interval [0, 1). Other
distributions can be conveniently sampled from the uniform distribution with the following trick. Consider
the random variable X with arbitrary pdf p(x), where −∞ ≤ x ≤ ∞ without loss of generality. The
cumulative distribution function (CDF) of X is defined as the probability that X ≤ x, or
$$F(x) = \int_{-\infty}^{x} dx'\, p(x'), \tag{3.7.2}$$
so that $p(x) = dF(x)/dx$. Put y = F(x); y is clearly a random variable; it depends on x in such a way that
if $x_1 \le x \le x_2$, then $F(x_1) \le y \le F(x_2)$ because p(x) is nonnegative. We assert that y is uniformly
distributed in [0, 1]. This follows from the fact that the pdf g(y) of y satisfies g(y)dy = p(x)dx, but since
$dy/dx = p(x)$, we must have g(y) = 1. Therefore, we can pick a random number y uniformly distributed in
[0, 1) and be guaranteed that $x = F^{-1}(y)$ will be distributed according to p(x). $F^{-1}$, the inverse function
of F, is accessible analytically or numerically.
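As an illustration of this trick, consider the exponential pdf p(x) = a exp(−a x) for x ≥ 0 (an example chosen here because its CDF, F(x) = 1 − exp(−a x), can be inverted analytically; it is not taken from the notes). The sketch below draws uniform numbers and maps them through $F^{-1}$.

```python
import math
import random

def sample_exponential(rate: float, rng: random.Random) -> float:
    """Draw x with pdf p(x) = rate * exp(-rate * x), x >= 0, by inverting the CDF."""
    y = rng.random()                   # uniform in [0, 1)
    return -math.log(1.0 - y) / rate   # x = F^{-1}(y), with F(x) = 1 - exp(-rate x)

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1/rate = 0.5
```

When $F^{-1}$ is not available in closed form, the same idea can be applied by solving F(x) = y numerically for each draw.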
We will often encounter the case where we need to find the distribution of a function of random variables
with known joint distribution. The systematic way to solve this problem is to enforce the constraint with the
aid of a Dirac delta function (see example in the box below). In many cases, one can equivalently carry out
the task by integrating the joint probability of the random variables over the portion of the probability space
where the constraint is known to be satisfied. An example is the important problem of the distribution of the
sum of random variables. Consider two independent random variables, (x, y), each uniformly distributed in
[0, 1]. The joint pdf is of course the unit constant over the square domain [0 ≤ x ≤ 1] × [0 ≤ y ≤ 1]. What
is the probability distribution of z = x + y? That is, what is the probability that x, y add up to a given
number z? Clearly, neither variable can be greater than z, so we know, for example, that 0 ≤ x ≤ z if z ≤ 1,
and z − 1 ≤ x ≤ 1 if z > 1; either way, we have a choice for x. However, for a given x, y has to be exactly
z − x, so there is no choice for y and we can express p(z) as a single integral in x:
$$p_Z(z) = \begin{cases} \displaystyle\int_0^z dx = z, & 0 \le z \le 1 \\[1ex] \displaystyle\int_{z-1}^{1} dx = 2 - z, & 1 \le z \le 2 \\[1ex] 0 & \text{otherwise.} \end{cases} \tag{3.8.1}$$
Note that the same result can be arrived at with the use of the Dirac δ. Let u(t) = 1, 0 ≤ t ≤ 1; u(t) = 0
otherwise, be the uniform pdf.³ Then
$$p_Z(z) = \iint dx\,dy\; u(x)u(y)\,\delta(x + y - z) = \int_0^1 dx\; u(x)u(z - x), \tag{3.8.2}$$
which evaluates to the answer in Eq. 3.8.1. Equation 3.8.2 is interesting because it shows explicitly the
general fact that given two independent random variables, the pdf of their sum is the convolution of their
pdfs.
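The triangular pdf of Eq. 3.8.1 can be checked by sampling. The sketch below adds pairs of uniform random numbers and compares a histogram of the sums with the exact result; the number of samples and bins are arbitrary choices.

```python
import random

def triangular_pdf(z: float) -> float:
    """p_Z(z) from Eq. 3.8.1."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

rng = random.Random(1)
n_samples, n_bins = 200_000, 20
width = 2.0 / n_bins
counts = [0] * n_bins
for _ in range(n_samples):
    z = rng.random() + rng.random()
    counts[min(int(z / width), n_bins - 1)] += 1

# Histogram estimate of the pdf at each bin centre versus the exact result.
for i, c in enumerate(counts):
    centre = (i + 0.5) * width
    print(f"{centre:.2f}  {c / (n_samples * width):.3f}  {triangular_pdf(centre):.3f}")
```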
3 A more compact notation is u(t) = θ(t)θ(1 − t), where θ(t) is the Heaviside step function.
where ρ = N/V . With this normalization, g(r) → 1 as r → ∞. However, at small distances, the plot
of g(r) has oscillations revealing the existence of a hard core and of an average coordination shell. In
numerical simulations, the evaluation of integrals with δ functions like the one above can be carried
out by discretization and in that way they are reduced to the process of creating a histogram.
The characteristic function of a probability distribution is defined as the expectation value of the imaginary
exponential
$$\phi(k) = \langle e^{ikx} \rangle = \int dx\, p(x)\, e^{ikx}. \tag{3.9.1}$$
It has the property that, if the moments of p exist, they can be generated as coefficients of the Taylor
expansion of φ at the origin:
$$\langle x^n \rangle = (-i)^n \left. \frac{d^n \phi(k)}{dk^n} \right|_{k=0}, \tag{3.9.2}$$
4 This may not always be allowed; in fact, the moments of a distribution may not always exist, but the integral in Eq. 3.9.1
always exists, and so does the characteristic function.
Fourier Transform
The characteristic function is essentially the Fourier transform of the probability distribution. The
Fourier Transform of a function f (x) is defined as
$$\hat{f}(k) = \int dx\, f(x)\, e^{-ikx}, \tag{3.9.3}$$
$$\frac{1}{\sqrt{2\pi\sigma^2}} \int dx\, e^{-ikx} \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right) = e^{-ikx_0} \exp\left(-\frac{k^2\sigma^2}{2}\right). \tag{3.9.5}$$
The width of the Fourier transform is the inverse of the width of the original Gaussian. This general
feature of Fourier transforms, that a broad function has a narrow transform and vice versa, is
intimately related to the Heisenberg uncertainty principle of quantum mechanics.
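Equation 3.9.5 can be verified by evaluating the Fourier integral numerically. The sketch below uses a plain Riemann sum over a window of ±12σ around $x_0$; the window, the number of grid points, and the test values of k, $x_0$, and σ are all illustrative choices.

```python
import cmath
import math

def gaussian(x, x0, sigma):
    """Normalized Gaussian with mean x0 and standard deviation sigma."""
    return math.exp(-(x - x0) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def fourier_transform_numeric(k, x0, sigma, half_width=12.0, n_points=4001):
    """Riemann-sum approximation of the integral of f(x) exp(-ikx) around x0."""
    a = x0 - half_width * sigma
    dx = 2 * half_width * sigma / (n_points - 1)
    total = 0j
    for j in range(n_points):
        x = a + j * dx
        total += gaussian(x, x0, sigma) * cmath.exp(-1j * k * x) * dx
    return total

def fourier_transform_exact(k, x0, sigma):
    """Right-hand side of Eq. 3.9.5."""
    return cmath.exp(-1j * k * x0) * math.exp(-(k * sigma) ** 2 / 2)

k, x0, sigma = 1.3, 0.7, 0.5
print(fourier_transform_numeric(k, x0, sigma), fourier_transform_exact(k, x0, sigma))
```

Increasing σ at fixed k makes the magnitude of the transform smaller, which is the broad-function, narrow-transform behavior described above.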
The characteristic function is important in statistical mechanics as it is often used to calculate correlation
functions. Here, we use it to demonstrate a fundamental result of statistics, the Central Limit Theorem.
Suppose we measure the random variable X, having probability distribution $p_X(x)$, n times; that is, we
build a sequence $\{x_i,\ i = 1, \dots, n\}$ of independent, identically distributed random variables all drawn from
the distribution $p_X(x)$. We assume the moments of X exist and put $\mu := \langle x \rangle$, $\sigma^2 := \langle x^2 \rangle - \mu^2$. Let
$y = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the average of the measurements, which is a new random variable, a function of the $\{x_i\}$.
Under these conditions, the central limit theorem states the remarkable fact that the probability distribution
$p_Y(y)$ approaches a Gaussian distribution with mean $\mu$ and variance $\tau^2 = \sigma^2/n$ as n → ∞, regardless of
the actual shape of the distribution $p_X(x)$.
To show how this happens, consider the characteristic function φX of the distribution pX and the char-
acteristic function φY of the distribution pY . By definition,
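$$\phi_Y(k) = \langle e^{iky} \rangle = \int dx_1 \dots dx_n\, p(x_1) \dots p(x_n) \exp\left(\frac{ik \sum_i x_i}{n}\right) = \prod_{i=1}^{n} \int dx_i\, p_X(x_i) \exp\left(\frac{ikx_i}{n}\right) = \left[\phi_X\!\left(\frac{k}{n}\right)\right]^n. \tag{3.9.6}$$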
Now, note that in the limit n → ∞, the argument of φX (k/n) goes to zero, suggesting an approximation by
Taylor expansion
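$$\left[\phi_X\!\left(\frac{k}{n}\right)\right]^n = \left[\phi_X(0) + \frac{k}{n} \left.\frac{d\phi(k)}{dk}\right|_{k=0} + \frac{k^2}{2n^2} \left.\frac{d^2\phi(k)}{dk^2}\right|_{k=0} + \dots\right]^n = \left[1 + i\frac{k}{n}\mu - \frac{k^2}{2n}\tau^2 + O(1/n^2)\right]^n.$$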
Recalling that
$$\lim_{n \to \infty} \left(1 + \frac{t}{n}\right)^n = e^t,$$
one finds
$$\phi_Y(k) = e^{ik\mu} \exp\left(-\frac{k^2\tau^2}{2}\right); \tag{3.9.7}$$
by taking the inverse Fourier transform, one arrives at the desired result
$$p_Y(y) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left[-\frac{(y - \mu)^2}{2\tau^2}\right]. \tag{3.9.8}$$
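The theorem can be checked numerically. The following is a minimal sketch, not part of the derivation: it assumes X is uniform on [0, 1] (so µ = 1/2 and σ² = 1/12), and the function name `sample_means`, the seed, and the sample sizes are illustrative choices only.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return `trials` averages, each taken over n uniform draws on [0, 1]."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

# Assumed test distribution: uniform on [0, 1], with mu = 1/2 and sigma^2 = 1/12.
mu, sigma2 = 0.5, 1.0 / 12.0
n = 50
means = sample_means(n, trials=20000)

tau2 = sigma2 / n                        # variance predicted by the theorem
est_mean = statistics.fmean(means)       # sample mean of the averages y
est_var = statistics.pvariance(means)    # sample variance of the averages y

# Fraction of averages within one tau of mu; a Gaussian gives about 0.683.
tau = tau2 ** 0.5
frac = sum(abs(y - mu) < tau for y in means) / len(means)
print(est_mean, est_var, tau2, frac)
```

The estimated variance should track σ²/n even though the uniform pdf looks nothing like a Gaussian; repeating the experiment with a different parent distribution of finite variance is a useful variation.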
3.10 Exercises
1. Maxwell velocity distribution We will prove later on that the velocity ~v of a molecule in a classical
real gas follows a Gaussian probability distribution (Maxwell distribution)
$$p(\vec{v}) = \frac{1}{(2\pi kT/m)^{3/2}}\, e^{-\frac{m \vec{v}^2}{2kT}}.$$
Using this information, find the average velocity, the standard deviation (also called rms velocity), and
the correlation function hvx vz i. What property of the Maxwell distribution is responsible for the value
of the correlation function just found?
2. Show that for the binomial distribution Pn,q (k), E(k) = nq and Var(k) = nq(1 − q).
3. How many configurations are there for a substitutional alloy AB, with N total atoms, and NA atoms
of type A?
4. What is the probability that an unbiased random walker takes 10 steps and lands 2 steps away from
the starting point? 5 steps away?
5. Let x, y be uniform independent random variables in [0, 1], and z = x + y. In this problem, you will
arrive at result 3.8.1 by a different route (so do not assume the result of 3.8.1 or 3.8.2!). Calculate the
probability that x + y ≤ z (you may do an integral, but it is easier to use elementary geometry). Since
this is the CDF of z, differentiate to obtain the pdf and verify Eq. 3.8.1.
6. Prove that the characteristic function of a probability distribution always exists. Then, calculate the
characteristic function of the Cauchy pdf (aka Lorentzian curve), $p(x) = \frac{\gamma}{\pi[(x - x_0)^2 + \gamma^2]}$, and show that
the second moment does not exist both by direct calculation and by the characteristic function route.
Lorentzians are ubiquitous in spectroscopy.
Chapter 4
Entropy
For example, consider a six sided die. If the die is fair, the probability that the die lands on each face is
the same (1/6). Then, S = ln 6, the logarithm of the number of possible outcomes. Suppose now the die
is rigged, so that it almost always lands on 3 or 4; then S ≈ ln 2 < ln 6. We suspect that the entropy gets
bigger, the flatter the probability, and that it is maximum when all outcomes are equally likely. This fact
will be proven later. Note that when there is only one outcome (so the variable X is not random at all, but
is instead deterministic), entropy attains its minimum possible value, S = 0.
S(E, V, N ) = k ln W, (4.2.1)
where the constant k (sometimes written as kB ) is called the Boltzmann constant. Comparing Boltzmann’s
entropy formula with Eq. 4.1.1 and recalling that microstates are drawn from a uniform pdf, we see that
there is agreement up to the proportionality constant k: entropy quantifies the information we stand to gain
about a thermodynamic system if we actually decide to measure which microstate the system is occupying.
The logarithmic dependence of entropy follows from the requirement that the entropy of two isolated
systems must be additive. Consider two isolated systems, A and B, with number of microstates WA and WB ,
1 More precisely, one must look for a nondimensional number, such as the ratio of mean to standard deviation. Also note that
when a distribution deviates substantially from normal (Gaussian), mean and variance may no longer be the most meaningful
descriptors of tendency.
2 This definition is used in information theory, where entropy is often referred to as “Shannon entropy” and denoted by H;
respectively. Since the two systems do not interact, the number of microstates of the combined system is
WA WB by the first rule of counting. Then the entropy is k ln WA WB = SA + SB . It can be shown that the
logarithm is the only function that satisfies this property. From Boltzmann’s formula, it also follows that
S ≥ 0.
The two-level system (tls) describes a particle that can occupy either one of two energy levels, say a
ground state with energy $E_0 = 0$ and an excited state with energy $E_1 = \epsilon$. Consider N such particles
with total energy E. Assume they are distinguishable and noninteracting, so that each of them can be
treated as a tls independent of the others. What is the entropy? To calculate S, we need to compute
W, the number of ways to put the N particles in the two energy states. If energy did not matter,
the answer would be $2^N$, but we know that the total energy is E. Therefore, there are $n(E) = E/\epsilon$
particles in the excited state, $W = \binom{N}{n(E)}$, and $S(E, N) = k \ln \binom{N}{n(E)}$. Note that entropy does not
depend on volume in this simple model.
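The counting above is easy to evaluate directly. The sketch below works in units of k and compares the exact value ln W with its Stirling estimate; the function names and the choice N = 1000 are illustrative assumptions, not part of the notes.

```python
from math import comb, log

def tls_entropy(N, n):
    """Entropy in units of k for N two-level particles with n excited: ln C(N, n)."""
    return log(comb(N, n))

def tls_entropy_stirling(N, n):
    """Stirling estimate -N [x ln x + (1 - x) ln(1 - x)] with x = n/N, 0 < x < 1."""
    x = n / N
    return -N * (x * log(x) + (1 - x) * log(1 - x))

N = 1000
for n in (1, 250, 500, 750):
    # n = E / epsilon is the number of particles in the excited state
    print(n, tls_entropy(N, n), tls_entropy_stirling(N, n))
```

The output shows that the entropy is symmetric about half filling and stays below N ln 2, the value one would get if the energy constraint were ignored.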
Each possible microstate corresponds to a point in this space. What does phase space look like? Imagine for
simplicity you just have two particles on a line segment of length L. Then phase space is spanned by the
variables $x_1, x_2, p_1, p_2$, with $p_1^2 + p_2^2 = 2mE$. Thus, the pair of coordinates can be chosen anywhere inside
a rectangle of area $L^2$ and the pair of momenta can be chosen anywhere on the circumference of a circle
of radius $\sqrt{2mE}$. In the general case, coordinates span a 3N-dimensional hypercube, of volume $V^N$, and
4.5. PRINCIPLE OF MAXIMUM ENTROPY
$$S_{3N} = \frac{2\pi^{3N/2}}{\Gamma(3N/2)}\, (2mE)^{(3N-1)/2}.$$
Two important observations are in order before proceeding. First, there is a symmetry of phase space
that we have overlooked in our counting argument. Consider the case where we have two particles. Pick a
point in phase space and carry out the following operation: reflect the coordinates and the momenta about
the x1 = x2 and p1 = p2 axes. Clearly, this means switching positions and momenta of the two particles.
But if the masses are the same (the particles are identical), then we have simply switched the labels of the
two particles, yielding a physical configuration that no measurement could distinguish from the previous
one. This means that we have double counted the number of available configurations and so we must divide
the phase space volume by 2!, or N ! in the general case of N identical particles. Note that in the case of the
lattice gas, the binomial distribution formula does this for us automatically.
Second, knowing the volume of available phase space still doesn’t solve our problem. After all, entropy
was defined through the logarithm of a number, whereas the volume of phase space has dimensions $[Lp]^{3N}$.
This indicates that we are missing a factor with those dimensions and suggests that we subdivide phase
space into elementary cells of volume $(\Delta x \Delta p)^{3N}$. The multiplicity of the system is given simply by the
number of phase space cells available to the system, divided by N!. In classical physics, there is no criterion
to fix the value of the elementary phase space cell volume. In quantum mechanics, Heisenberg’s uncertainty
principle tells us that it is impossible to determine position and momentum of a particle simultaneously
with arbitrary accuracy; therefore phase space must be coarse grained. It turns out that the volume of
a phase space cell is $h^{3N}$, where h is Planck’s constant (we will demonstrate this later). Entropy is
then obtained from the natural logarithm of the number of elementary phase space cells. Using Stirling’s
approximation for the factorials and up to terms of order N (note that 3N − 1 ≈ 3N), one finds
$$S = kN \ln\left[\frac{V}{N} \left(\frac{4\pi m E}{3h^2 N}\right)^{3/2}\right] + \frac{5kN}{2}. \tag{4.4.1}$$
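To get a feeling for the numbers, Eq. 4.4.1 can be evaluated for a real gas. The sketch below is illustrative only: the choice of one mole of argon at 298.15 K and 1 bar is an assumption, and so is the use of the ideal gas relations PV = NkT and E = (3/2)NkT to fix V and E from T and P. The function name `sackur_tetrode` is hypothetical.

```python
from math import log, pi

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
N_A = 6.02214076e23       # Avogadro constant, 1/mol
AMU = 1.66053906660e-27   # atomic mass unit, kg

def sackur_tetrode(E, V, N, m):
    """Entropy (J/K) from Eq. 4.4.1 for N atoms of mass m with energy E in volume V."""
    cell_count = (V / N) * (4 * pi * m * E / (3 * H**2 * N)) ** 1.5
    return K_B * N * (log(cell_count) + 2.5)

# Assumed state: one mole of argon at 298.15 K and 1 bar.
T, P, N = 298.15, 1.0e5, N_A
m = 39.948 * AMU
V = N * K_B * T / P        # ideal gas law (assumption used to fix V)
E = 1.5 * N * K_B * T      # monatomic ideal gas energy (assumption used to fix E)
S = sackur_tetrode(E, V, N, m)
print(S)                   # molar entropy in J/(mol K)
```

The result should land near the tabulated standard molar entropy of argon, roughly 155 J/(mol K). Doubling E, V, and N together doubles S, which is a quick check of extensivity.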
$$\frac{d^{3N}x\, d^{3N}p}{N!\, h^{3N}}$$
as the measure and constraining the Hamiltonian to the specified value of energy, E. While this integral has
a simple geometric interpretation in the case of an ideal gas, it can more generally be cast in the form
$$g(E, V, N) = \int \cdots \int \frac{d^{3N}x\, d^{3N}p}{N!\, h^{3N}}\, \delta[E - H(\{\vec{p}_i\}, \{\vec{x}_i\})]. \tag{4.4.2}$$
g(E, V, N) is called the density of states because the number of states with energy between E and
E + dE is given by W(E, V, N) = g(E, V, N) dE.⁵
mechanics has its own extremum principle. It states that for an isolated system, there is a functional of the
probability distribution of the microstates that takes on its maximum value for the distribution corresponding
to thermodynamic equilibrium. This functional is
$$S = -k \sum_i p_i \ln p_i. \tag{4.5.1}$$
Several remarks are in order. First, there is a fundamental distinction between extremum principles in
dynamics and the maximum entropy principle. The former stem from the homogeneity of space and time.
The latter results from the “homogeneity” of a probability space that exists because we have “chosen” to
ignore the information encoded in the microscopic states of the system. Probabilities are conditional by
nature; they depend, as we have seen, on what knowledge is available at a given time. Next, note that
Eq. 4.5.1 does not have an immediate counterpart for continuous pdfs, which are dimensional objects and
cannot serve as arguments of transcendental functions like the logarithm. This points to some missing factor,
like a phase space measure, as already noted in our calculation of the classical ideal gas entropy. The problem
is, however, considerably more complex, and in the remainder of these notes we will limit our use of Eq. 4.5.1
to discrete distributions only.
Maximization by Lagrange Multipliers
We now show explicitly that Boltzmann’s formula follows immediately upon carrying out the maximization
procedure on Eq. 4.5.1. This is done by imposing the condition that the derivatives of S with
respect to the $\{p_i\}$ be zero; however, the probabilities cannot be treated as independent variables,
because they must add up to unity: $\sum_{i=1}^{W} p_i = 1$. Constrained extremization problems of this kind are
best handled by the method of Lagrange multipliers. Thus, we introduce a new parameter, α, and
extremize the function $S(\{p_i\}) + k\alpha \sum_i p_i$ with respect to the $\{p_i\}$ (unconstrained extremization).
This step is very straightforward and yields expressions for the $\{p_i\}$’s that depend on the value of α.
We are then able to choose α such that the constraint $\sum_i p_i = 1$ is obeyed. Let us see how this works
in practice:
$$\frac{\partial}{\partial p_i} \left[-k \sum_j p_j \ln p_j + k\alpha \sum_j p_j\right] = -k \ln p_i - k + k\alpha = 0 \implies p_i = e^{\alpha - 1}.$$
Then,
$$\sum_i p_i(\alpha) = 1 \implies 1 = W e^{\alpha - 1}$$
or
$$p_i = 1/W,$$
thus recovering the expected result that the probabilities of all microstates are the same and equal to
the reciprocal of the multiplicity of the system, which gives Boltzmann’s statement. The proof that
the extremum found in this manner is in fact a maximum is left as an exercise.
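A numerical spot check of this result, offered as a sketch rather than a proof: we draw many random normalized distributions over W outcomes and compare their entropy (in units of k) with that of the uniform distribution. The value W = 6, the seed, and the helper names are illustrative assumptions.

```python
import random
from math import log

def entropy(p):
    """Statistical entropy in units of k; terms with p_i = 0 contribute zero."""
    return -sum(x * log(x) for x in p if x > 0)

def random_distribution(W, rng):
    """A random normalized distribution over W outcomes."""
    weights = [rng.random() for _ in range(W)]
    total = sum(weights)
    return [w / total for w in weights]

W = 6
rng = random.Random(1)
uniform = [1.0 / W] * W
trial_entropies = [entropy(random_distribution(W, rng)) for _ in range(1000)]

# No trial should exceed the uniform value ln W.
print(entropy(uniform), log(W), max(trial_entropies))
```

Sampling cannot replace the second-derivative argument requested in the exercise, but it makes the claim concrete: every trial distribution has entropy below ln W, and a deterministic outcome gives S = 0.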
4.6 Exercises
1. Show that the extremum of S found by the method of Lagrange multipliers is a maximum.
2. Show that statistical entropy is positive semidefinite (for any discrete probability distribution, not just
for the most probable).
3. What is the entropy of a quarter-filled lattice gas with M sites? Of a half filled one? Of a three-quarter
filled one? Do you notice a symmetry?
4. Looking at your answer from the previous problem, what filling of the lattice gas has the highest
entropy? Does this mean that the equilibrium state of the lattice gas is the half filled lattice?
5. Invert the Sackur-Tetrode equation to find E(S, V, N ). Why are you guaranteed that the function can
in fact be inverted?
Chapter 5
Properties of Entropy
$$P = \frac{2E}{3V}, \tag{5.3.1}$$
a well known result of kinetic theory. Moreover, from the empirical definition of temperature via the ideal
gas thermometer, we know that P ∝ NT/V.² We can define the scale of temperature in such a way that the
proportionality constant coincides with Boltzmann’s constant, so that
$$\frac{P}{T} = \frac{kN}{V}. \tag{5.3.2}$$
Now, using the Sackur-Tetrode entropy formula for the ideal gas, we find
$$\left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{3kN}{2E} \tag{5.3.3a}$$
$$\left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{kN}{V}. \tag{5.3.3b}$$
5.4. HOMOGENEITY OF ENTROPY
We can obtain an expression for the chemical potential of the classical monoatomic ideal gas by
differentiating the Sackur-Tetrode entropy and using Eq. 5.3.5:
$$\mu = -T \left(\frac{\partial S}{\partial N}\right)_{E,V} = -kT \ln\left[\left(\frac{2\pi m k T}{h^2}\right)^{3/2} \frac{V}{N}\right]. \tag{5.3.8}$$
Note that the chemical potential is large and negative at sufficiently high temperature, but it can be
positive at low T and high density. It is interesting to see when the chemical potential switches sign:
$$\mu = 0 \implies \left(\frac{h^2}{2\pi m k T}\right)^{1/2} = \left(\frac{V}{N}\right)^{1/3}. \tag{5.3.9}$$
The quantity on the left hand side has dimension of a length and is called the thermal de Broglie
wavelength; it is usually denoted by λ,
$$\lambda = \left(\frac{h^2}{2\pi m k T}\right)^{1/2}. \tag{5.3.10}$$
Its dependence on Planck’s constant h tells us that it is a quantum mechanical parameter. Classical
theory breaks down when the thermal de Broglie wavelength becomes comparable to the interparticle
separation; under these conditions the gas is called “degenerate.” Thus, classical statistical mechanics
cannot be used when the chemical potential approaches zero (from below): matter at extremely high
pressure or very low temperature, in the sense defined by Eq. 5.3.9, behaves quantum mechanically.
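As an illustration of the scales involved, the sketch below evaluates λ and µ/kT from Eqs. 5.3.8 and 5.3.10, using the equivalent form µ = kT ln(ρλ³) with ρ = N/V. The example gas (argon at 300 K and 1 bar) and the function names are assumptions made for illustration.

```python
from math import log, pi, sqrt

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
AMU = 1.66053906660e-27   # atomic mass unit, kg

def thermal_wavelength(m, T):
    """Thermal de Broglie wavelength of Eq. 5.3.10, in meters."""
    return H / sqrt(2 * pi * m * K_B * T)

def mu_over_kT(m, T, number_density):
    """mu/kT = ln(rho lambda^3), which is Eq. 5.3.8 rewritten with rho = N/V."""
    return log(number_density * thermal_wavelength(m, T) ** 3)

# Assumed example: argon at 300 K and 1 bar, density from the ideal gas law.
m_ar, T, P = 39.948 * AMU, 300.0, 1.0e5
rho = P / (K_B * T)
lam = thermal_wavelength(m_ar, T)
spacing = rho ** (-1.0 / 3.0)      # mean interparticle separation (V/N)^(1/3)
print(lam, spacing, mu_over_kT(m_ar, T, rho))
```

Under these conditions λ is a small fraction of an ångström while the interparticle separation is a few nanometers, so µ is strongly negative and the classical treatment is safe.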
and number of particles in the system? Imagine that we carve out subsystems of unit volume, large enough
that each subsystem can still be considered macroscopic - i.e., a valid statistical representation of the whole
system. This means that after doubling, the new system would simply be made up of twice the number of
representative subsystems, but from the point of view of each subsystem, nothing would have changed: in
other words, all properties that can be measured locally (such as particle density, energy density, temperature,
compressibility, and so on) would remain the same. This consideration leads us to posit that the entropy is
a “homogeneous function of degree one.” Formally, we say
Note that homogeneity cannot be justified if the interactions among particles have macroscopic range (e.g.,
Coulomb or gravitational).3
which is the statement of concavity of entropy. This means that entropy, regarded as a function of energy
(constant V , N ) has negative curvature. The principal curvatures of entropy (or any other potential) are
called susceptibilities or responses, since they describe how a system “responds” to a change in the affected
variable. Here, we considered an injection of energy at constant volume, which is a quantity of heat, since
no work is done at constant volume. The relevant thermodynamic susceptibility is thus the heat capacity at
constant volume (defined in Sect. 5.6), which therefore is positive semidefinite.
5.7. CLAUSIUS’ STATEMENT OF THE SECOND LAW OF THERMODYNAMICS
which is a higher order infinitesimal and therefore vanishes. In contrast, under finite temperature difference,
∆T , the change in entropy of the universe is positive, since dE and ∆T have the same sign.
Note that dE is an energy change at constant volume, and since no mechanical work is done at constant
volume, the energy change is due to heat. In general an infinitesimal amount of heat exchanged reversibly
at temperature T , at constant volume and number of particles, can be expressed as
Likewise, $\delta W_{rev} = P\, dV$ is the infinitesimal amount of reversible work done by the system. Note that
reversible work implies that the pressure P of the system is equal to the external pressure $P_{ext}$. If the work
is not done reversibly, then $P \neq P_{ext}$, and the external pressure must be used in the computation of the work
done by the system: $W = \int_{V_i}^{V_f} P_{ext}\, dV$ (think, e.g., about a gas expanding against a piston with friction).
Operationally, one measures heat by calorimetry, and it is convenient to express a quantity of heat exchanged
at constant volume as δq = cV dT , which holds regardless of whether the heat is exchanged reversibly or not.
Combining the two relations for reversible heat exchange, we find
$$\left(dS\right)_{V,N} = \frac{c_V\, dT}{T}. \tag{5.6.2}$$
Why is this expression useful? Entropy is, after all, a function of E, V, N, so the change in entropy upon
absorption of a quantity of heat Q should be calculated as $S(E+Q, V, N) - S(E, V, N) = \int_E^{E+Q} dE'\, \frac{\partial S(E', V, N)}{\partial E'}$.
However, temperature is usually easier to measure than energy, so it is desirable to have an expression for
entropy in terms of T rather than E, or an expression for energy in terms of T rather than S.
Heat exchange with a reservoir
A heat reservoir, or heat bath, or simply a reservoir, is a very large body that can exchange finite
amounts of heat without change in temperature: it has practically infinite heat capacity. What is
the entropy change of a reservoir upon exchange of a quantity of heat Q? Since $\Delta T = Q/c_V$ is
infinitesimal, T can be treated as constant in the integral for entropy. Therefore $\Delta S = c_V \Delta T / T = Q/T$.
Recalling the sign convention on heat exchange (absorbed heat quantity is positive), note that the
entropy of the reservoir increases when heat goes in and decreases when heat goes out.
There are two identical blocks of a certain material of constant heat capacity $c_V$; one is at temperature
$T_H$ and the other at $T_L < T_H$. The two blocks are the system. They are brought together and left
to equilibrate. We assume that the blocks are isolated from the surroundings, and we neglect volume
expansion. We want to find the final temperature and the change in entropy of the universe. Since
the blocks are isolated, and no work is done, the quantity of heat flowing out of one is equal in
magnitude to that flowing into the other by conservation of energy. Let $T_f$ be the final temperature
after the blocks equilibrate. Thus, $-Q_H = c_V (T_H - T_f) = Q_L = c_V (T_f - T_L)$ and $T_f = (T_H + T_L)/2$. The
change in entropy is $\Delta S_{Univ} = \Delta S_{syst} = \Delta S_H + \Delta S_L$, which is conveniently expressed as an integral
over temperature, since the initial and final states of the blocks are given in terms of temperature:
$\Delta S_{Univ} = c_V \ln \frac{T_f^2}{T_H T_L} = c_V \ln \frac{(T_H + T_L)^2}{4 T_H T_L}$, which is manifestly greater than zero.
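The example can be put into numbers with a few lines of code. This is a minimal sketch; the function name `equilibrate` and the temperatures used are illustrative assumptions.

```python
from math import log

def equilibrate(c_v, T_hot, T_cold):
    """Final temperature and total entropy change for two identical blocks
    of constant heat capacity c_v brought into thermal contact."""
    T_f = 0.5 * (T_hot + T_cold)             # energy conservation
    dS_hot = c_v * log(T_f / T_hot)          # integral of c_v dT / T, negative
    dS_cold = c_v * log(T_f / T_cold)        # integral of c_v dT / T, positive
    return T_f, dS_hot + dS_cold

# Assumed values: c_v = 1 (arbitrary units), blocks at 400 K and 200 K.
T_f, dS = equilibrate(1.0, 400.0, 200.0)
print(T_f, dS)
```

The hot block loses entropy and the cold block gains more than the hot one loses, so the sum is positive for any pair of unequal temperatures and vanishes only when the blocks start at the same temperature.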
enters the system $\frac{\partial S}{\partial E}$ decreases. So energy enters the colder system and exits the hotter one. The second
law of thermodynamics, which antedates Boltzmann’s formula and the principle of maximum entropy, and
was based entirely on experimental evidence, was formulated by Clausius in the following way:
The Second Law of Thermodynamics (Clausius) There is no process whose only effect is to
transfer a quantity of heat from a colder body to a hotter one.
We see that Clausius’ statement of the second law is equivalent to the principle of maximum entropy.
This equation is interesting because it tells us that, for entropy to be defined, there can be at most an
integrable singularity at low T , so for any system in thermodynamic equilibrium, CV → 0 as T → 0.
As to the value of the integration constant, Boltzmann’s formula states that it is the logarithm of the
number of states of the system in equilibrium at T = 0, which means in mechanical equilibrium. Since
matter obeys the laws of quantum mechanics (experimental fact), we must answer the question, what is the
degeneracy of the quantum mechanical ground state of a Hamiltonian system? In most systems of interest,
the ground state turns out to be either nondegenerate, or have a finite degeneracy, so that S(0) is not an
extensive constant and can be taken to be zero: this is Nernst’s third law of thermodynamics. Examples
of nondegenerate ground states are all gases, crystals and liquid He. To provide a general answer to the
question of ground state degeneracy, one must rephrase it in terms of eigenvalues of large matrices. A large
degeneracy implies the existence of high symmetry, and such symmetry is usually broken by the slightest
perturbation, so that high degeneracies are lifted in physical systems.
5.9 Exercises
1. Differentiate both sides of Eq. 5.4.1 with respect to λ and then put λ = 1 to show that $S = \frac{E + PV - \mu N}{T}$.
2. Rearranging the equation from the previous exercise shows that E = T S − P V + µN . Regarding all
variables as independent, take the total differential and then subtract Eq. 5.3.7 to obtain an equation
relating the differentials of intensive variables only (Gibbs-Duhem equation). Use this result to explain
Eq. 1.3.1.
3. According to Gibbs-Duhem, µ is a function of P and T only. Starting from Eq. 5.3.8, give the explicit
formula for µ(T, P ) for the ideal gas.
Chapter 6
CHAPTER 6. THERMODYNAMIC PROCESSES AND CYCLES
of heat from a colder body to a hotter one. This is in violation of Clausius’s statement of the second law.
Hence,
The Second Law of Thermodynamics (Kelvin) It is impossible to have a process whose only
effect is to convert into work heat extracted from a source at a single temperature.
Example (Reversible heat exchange) Consider again the two identical blocks of Example 2 of
Sect. 5.6. However, now the blocks are not brought together. What process can be devised to
extract the maximum amount of work, Wmax , from them? What are the final temperature and the
entropy generated in the process? The blocks are two sources of heat at different temperatures. To
extract work with the maximum possible efficiency, we run a Carnot cycle between them. However,
since the blocks are not infinite reservoirs, their temperatures will change after each cycle (TH will
decrease, TL will increase). So we consider a sequence of infinitesimal Carnot cycles, absorbing dQH
from the hotter block, and rejecting $dQ_L$ into the colder one. From the properties of Carnot cycles it
follows that $\frac{dQ_H}{T_H} = \frac{dQ_L}{T_L}$, or $dS_H + dS_L = 0$, so the total entropy does not change in the process. But
$\Delta S_H = c \ln \frac{T_f}{T_H}$ and $\Delta S_L = c \ln \frac{T_f}{T_L}$, so we have $c \ln \frac{T_f^2}{T_H T_L} = 0$, or $T_f = \sqrt{T_H T_L}$. The maximum work
is $W = Q_H - Q_L = c(T_H + T_L - 2T_f) = c(\sqrt{T_H} - \sqrt{T_L})^2$ and is obtained when the entropy change
is zero (reversible process).
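A short numerical version of this example, written as a sketch under the same assumptions (two identical blocks of constant heat capacity c); the function name and the temperatures are illustrative.

```python
from math import log, sqrt

def reversible_extraction(c, T_hot, T_cold):
    """Final temperature, maximum work, and entropy change when work is
    extracted reversibly from two identical blocks of heat capacity c."""
    T_f = sqrt(T_hot * T_cold)                   # from dS_H + dS_L = 0
    work = c * (T_hot + T_cold - 2.0 * T_f)      # W = Q_H - Q_L
    dS = c * log(T_f / T_hot) + c * log(T_f / T_cold)
    return T_f, work, dS

# Assumed values: c = 1 (arbitrary units), blocks at 400 K and 100 K.
T_f, W_max, dS = reversible_extraction(1.0, 400.0, 100.0)
print(T_f, W_max, dS)
```

The final temperature is the geometric mean of the initial ones, which is always below the arithmetic mean reached when the blocks are simply put in contact; the energy difference between the two end states is exactly the work extracted.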
6.3. CLAUSIUS’ THEOREM
6.4 Exercises
1. Verify the relations given in Table 6.1.
2. Relate explicitly the heat capacity at constant volume, cV , to the second derivative of entropy with
respect to energy, and show that cV > 0.
3. Show that a body for which (a) temperature starts to increase at constant volume while at the same
time (b) heat starts flowing out of the body cannot have been in equilibrium at the time T started to
increase.
4. An isolated box of volume 2V is separated into two volumes V1 and V2 > V1 by a sliding diathermal
partition. There is one mole of ideal gas on each side, and the temperature is initially Ti on both sides.
Assume that movement of the partition can be harnessed to extract work. Calculate the maximum
work that can be extracted from the system and the final temperature Tf .
Chapter 7
Thermodynamic Potentials
the equality holds iff the transformation is reversible. The proof is left as an exercise. Consider now an
isolated system. Then, no heat can be exchanged during a transformation. Thus, the left hand side of
Eq. 7.1.1 vanishes, and we find, again, that for an isolated system undergoing a transformation between two
states, the entropy will increase if the transformation is irreversible, and stay the same if it is reversible.
Consider instead a transformation at constant temperature. Then, the l.h.s. of Eq. 7.1.1 is Q/T , so that we
can write Q ≤ T ∆S; using the first law, Q − W = ∆E, we arrive at a bound for the amount of work that
can be done by a system during a transformation at constant temperature:
W ≤ T ∆S − ∆E. (7.1.2)
This result extends the concept of mechanical Potential Energy, W = −δP E to thermodynamics, where heat
exchanges are considered. It is natural to define a new thermodynamic potential,
$$F = E - TS, \tag{7.1.3}$$
such that the bound on the available work is W ≤ −∆F . Equality as usual implies a reversible transfor-
mation. The potential F is called the Helmholtz free energy.1 The word free energy signifies that it is the
maximum amount of energy available for the system to do work while in thermal equilibrium with its sur-
roundings. (Thermal equilibrium with the environment guarantees that heat can be exchanged reversibly.)
When the free energy has reached its minimum, no more work can be done by the system, which therefore
will have reached mechanical equilibrium as well. So the minimum of the Helmholtz free energy corresponds
to the stable state of the system in thermal equilibrium with the environment at temperature T , in analogy
with the maximum of entropy, which corresponds to the stable state of an isolated system. Note that in the
former case, the system is at constant temperature, while in the latter, it is the energy that is constant. But
T = ∂E/∂S suggests that the relation in Eq. 7.1.3 is not “casual,” but rather the result of a systematic way
of constructing potential functions. This construction is the mathematical tool called Legendre transform.
Consider the convex function f(x) as sketched in the figure at right. Convexity ensures that f is
differentiable (except possibly at a finite number of points, an important case we ignore for the moment)
and that the first derivative of f, $\frac{df}{dx}$, is monotonic. This means that each point on the
curve has a unique slope, $s := \frac{df}{dx}$, so that we can specify the function in
terms of its slope rather than in terms of x. We may be tempted to put
$g(s) \stackrel{?}{=} f(x(s))$, but that would not work, since knowing that s = 1 when
f = 2 leaves us with infinite choices for where to draw the line with slope
s that intersects the flat line y = 2. Clearly the information y = 2 is not
useful to us. What we need is the value of the y-intercept of the tangent
line to f; in this way, we trade the information encoded in (x, f(x)) for
the information encoded in (slope, intercept) of the tangent line to f.² The
figure illustrates the geometric significance of the transform. Analytically,
we can define the transform as
where the operation of finding the minimum for all x explicitly demonstrates that the right hand side is no
longer a function of x. It is common, especially in thermodynamics,3 to write simply
where it is understood that either s or x must be treated as a constant parameter; then, holding, say, s
constant, and differentiating with respect to x, one has $s = \frac{df}{dx}$, and vice versa, holding x constant, one has
$x = -\frac{dg}{ds}$. Two useful properties of the Legendre transform are
(i) the Legendre transform of a convex function is concave in the new variable;4
(ii) the Legendre transform of the Legendre transform of f is f .
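Both properties can be checked numerically on a simple case. The sketch below assumes the convention g(s) = min over x of [f(x) − sx], which is consistent with x = −dg/ds and with F = E − TS; the test function f(x) = x², the grids, and the function names are illustrative choices. For this f, the transform is g(s) = −s²/4, which is concave.

```python
def legendre(f, s, xs):
    """g(s) = min over the grid xs of f(x) - s*x (convention assumed here)."""
    return min(f(x) - s * x for x in xs)

def inverse_legendre(g, x, ss):
    """f(x) = max over the grid ss of g(s) + s*x, undoing the transform above."""
    return max(g(s) + s * x for s in ss)

def f(x):
    """Convex test function."""
    return x * x

x_grid = [i / 1000.0 for i in range(-4000, 4001)]   # x in [-4, 4]
s_grid = [i / 10.0 for i in range(-40, 41)]         # s in [-4, 4]

def g(s):
    """Numerical Legendre transform of f; analytically g(s) = -s**2 / 4."""
    return legendre(f, s, x_grid)

print(g(1.0), g(2.0))                         # compare with -0.25 and -1.0
print(inverse_legendre(g, 1.0, s_grid))       # should return f(1.0) = 1.0
```

The grid minimization is deliberately naive; it mirrors the geometric picture of scanning over x for the tangent line of slope s with the lowest intercept.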
With the aid of the Legendre transform, we immediately recognize the Helmholtz free energy F (T, V, N ) =
E − T S as the Legendre transform of the internal energy E(S, V, N ); see Eq. 7.1.3. Two more potentials are
used frequently in thermodynamics: the enthalpy
H(S, P, N ) = U + P V (7.2.3)
2 To verify that information is preserved, consider for instance that if you were to color the plane below each tangent line
7.3. STABILITY CRITERIA FOR THERMODYNAMIC POTENTIALS
Even then, we would be missing an integration constant for entropy (the third law and quantum
mechanics can help to determine it, as we will see).
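$$\left(\frac{\partial^2 E}{\partial S^2}\right)_{V,N} = \left(\frac{\partial T}{\partial S}\right)_{V,N} = \frac{T}{C_V} > 0 \tag{7.3.1a}$$
$$\left(\frac{\partial^2 E}{\partial V^2}\right)_{S,N} = -\left(\frac{\partial p}{\partial V}\right)_{S,N} = \frac{1}{V \kappa_S} > 0; \tag{7.3.1b}$$
and that the determinant of the Hessian matrix is positive (the proof is left as an exercise)
$$\left(\frac{\partial^2 E}{\partial S^2}\right)_{V,N} \left(\frac{\partial^2 E}{\partial V^2}\right)_{S,N} > \left(\frac{\partial^2 E}{\partial S\, \partial V}\right)_N^2. \tag{7.3.2}$$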
Noting that the Legendre transform involves pairs of “conjugate” variables, one of which is intensive and the
other extensive (by Euler’s homogeneity property), and the energy is a function of all extensive variables, we
conclude immediately that the thermodynamic potentials F, H, G must be convex functions of the extensive
variables and concave functions of the intensive ones. For example, after Legendre transformation to the
Helmholtz free energy, the stability conditions become
$$\left(\frac{\partial^2 F}{\partial T^2}\right)_{V,N} = -\left(\frac{\partial S}{\partial T}\right)_{V,N} = -\frac{C_V}{T} < 0 \tag{7.3.3a}$$
$$\left(\frac{\partial^2 F}{\partial V^2}\right)_{T,N} = -\left(\frac{\partial p}{\partial V}\right)_{T,N} = \frac{1}{V \kappa_T} > 0. \tag{7.3.3b}$$
The compressibility is now isothermal instead of isentropic, since the Legendre transform switched the
independent variable from S to T .
we wish to express as a derivative of some other variable accessible through an equation of state; or when
we want to reduce the second derivative of a potential to one of the three standard responses (heat capacity,
compressibility, thermal expansivity). There is nothing fundamental about these manipulations, other than
that they are used to cast a calculation in terms of quantities that are easily measured in an experiment. There are
two useful mathematical tricks to solve these kinds of problems, Maxwell relations and Jacobians. Maxwell
relations simply state that thermodynamic potentials are potential functions. That means that the order of
differentiation does not matter,⁵ so that, for example, $\frac{\partial^2 F}{\partial T\, \partial V} = \frac{\partial^2 F}{\partial V\, \partial T}$, which means $\left(\frac{\partial S}{\partial V}\right)_{T,N} = \left(\frac{\partial P}{\partial T}\right)_{V,N}$. Note
that the two variables will never be a conjugate pair!
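Jacobians are determinants useful to compute derivatives with change of variables, such as r(x, y), s(x, y) →
r(u, v), s(u, v). The following relation holds:
$$\begin{vmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\ \frac{\partial s}{\partial x} & \frac{\partial s}{\partial y} \end{vmatrix} \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} \frac{\partial r}{\partial u} & \frac{\partial r}{\partial v} \\ \frac{\partial s}{\partial u} & \frac{\partial s}{\partial v} \end{vmatrix}. \tag{7.4.1}$$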
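We can use this relation conveniently even when there is just one partial derivative, but we are changing
the variable to be held constant. For instance, say we want to switch from $C_V = T \left(\frac{\partial S}{\partial T}\right)_V$ to $C_P = T \left(\frac{\partial S}{\partial T}\right)_P$.
Then, remembering ∂V/∂V = 1, we have
$$\frac{C_V}{T} = \frac{\partial(S, V)}{\partial(T, V)} = \frac{\partial(S, V)}{\partial(T, P)} \frac{\partial(T, P)}{\partial(T, V)} = \left[\left(\frac{\partial S}{\partial T}\right)_P \left(\frac{\partial V}{\partial P}\right)_T - \left(\frac{\partial S}{\partial P}\right)_T \left(\frac{\partial V}{\partial T}\right)_P\right] \left(\frac{\partial P}{\partial V}\right)_T = \frac{C_P}{T} - \frac{V \alpha^2}{\kappa_T}, \tag{7.4.2}$$
where we have used a Maxwell relation along with the definition of the isobaric expansion coefficient α to
write $\left(\frac{\partial S}{\partial P}\right)_{T,N} = -\left(\frac{\partial V}{\partial T}\right)_{P,N} := -V\alpha$.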
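Eq. 7.4.2 can be rearranged as $C_P - C_V = T V \alpha^2 / \kappa_T$, which is easy to test on an equation of state. The sketch below uses the ideal gas V = nRT/P purely as an assumed test case, for which the difference should equal nR; the derivatives are taken by central finite differences, and the function names and step size are illustrative.

```python
R = 8.314462618   # molar gas constant, J/(mol K)

def volume(T, P, n=1.0):
    """Ideal gas equation of state, used here only as a test case."""
    return n * R * T / P

def cp_minus_cv(T, P, n=1.0, h=1e-4):
    """T V alpha^2 / kappa_T from central finite differences of V(T, P)."""
    V = volume(T, P, n)
    dV_dT = (volume(T * (1 + h), P, n) - volume(T * (1 - h), P, n)) / (2 * h * T)
    dV_dP = (volume(T, P * (1 + h), n) - volume(T, P * (1 - h), n)) / (2 * h * P)
    alpha = dV_dT / V        # isobaric expansion coefficient
    kappa_T = -dV_dP / V     # isothermal compressibility
    return T * V * alpha ** 2 / kappa_T

# Assumed state: one mole at 300 K and 1 bar.
print(cp_minus_cv(300.0, 1.0e5), R)
```

Replacing `volume` with another equation of state V(T, P) gives the corresponding heat capacity difference without any further algebra, which is the practical point of reducing derivatives to α and κ_T.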
In this example we practice writing the first law for magnetic work rather than expansion work, using
Maxwell relations and Jacobi determinants, and using an analogy between magnetic and compression
work. Adiabatic demagnetization is the cooling of a magnetic sample analogous to the cooling of
a gas during an adiabatic expansion; it was discovered by W. Giauque and is employed to reach
temperatures in the mK range. It exploits the fact that the spins of a magnetic salt can be aligned in
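a strong magnetic field, acquiring large negative potential energy; upon decreasing the field adiabatically,
this magnetic potential energy increases from a large negative value to zero, so that the kinetic energy
of microscopic motion must decrease: random motion of the magnetic nuclei decreases, and the salt
cools toward absolute zero. Here, we are interested in finding out how temperature changes with
changing magnetic field in the adiabatic process, i.e., $(\partial T/\partial B)_S$, assuming the equation of state $MT = B$.
We can express this derivative using the Jacobian determinants:
\[
\left(\frac{\partial T}{\partial B}\right)_S = \frac{\partial(T,S)}{\partial(B,S)} = \frac{\partial(T,S)}{\partial(B,T)}\,\frac{\partial(B,T)}{\partial(B,S)} = -\frac{\partial(S,T)}{\partial(B,T)}\,\frac{\partial(B,T)}{\partial(B,S)},
\]
where the − sign occurs because we switched columns in a determinant. The identity above is
rewritten as
\[
\left(\frac{\partial T}{\partial B}\right)_S \left(\frac{\partial S}{\partial T}\right)_B = -\left(\frac{\partial S}{\partial B}\right)_T .
\]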
Of the partial derivatives of entropy, one is heat capacity at constant field, ∂S/∂T |B = CB /T , and
the other can be found through a Maxwell relation. So we must construct thermodynamic potentials.
The work done per unit volume on an isotropic sample of magnetic salt in increasing its magnetization
from M to M + dM in a magnetic field B is BdM . The first law in differential form is written as
dU = T dS + B dM. Noting the analogy −P → B, we write the Gibbs free energy as G = U − TS − BM.
From here, we find the Maxwell relation expressing $(\partial S/\partial B)_T$ as a function of M and T:
$(\partial S/\partial B)_T = (\partial M/\partial T)_B = -B/T^2$ (using the given equation of state). Putting it all together, we reach the final result expressing
the temperature change in the adiabatic demagnetization in terms of experimental quantities:
\[
\left(\frac{\partial T}{\partial B}\right)_S = -\frac{(\partial S/\partial B)_T}{(\partial S/\partial T)_B} = \frac{B}{T^2}\,\frac{T}{C_B} = \frac{M}{C_B}.
\]
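The result $(\partial T/\partial B)_S = M/C_B$ can be checked numerically on a toy model. The sketch below assumes an entropy of the form $S(T,B) = c_0\ln T - B^2/2T^2$, which is consistent with the Maxwell relation and the equation of state $MT = B$; the constant "lattice" heat capacity `c0` and the function names are hypothetical choices made only for this illustration.

```python
import math

def entropy(T, B, c0=1.0):
    # Assumed toy entropy: (dS/dB)_T = -B/T^2 = (dM/dT)_B for M = B/T
    return c0 * math.log(T) - B**2 / (2.0 * T**2)

def dT_dB_adiabatic(T, B, c0=1.0, h=1e-6):
    """-(dS/dB)_T / (dS/dT)_B evaluated by central differences."""
    dS_dB = (entropy(T, B + h, c0) - entropy(T, B - h, c0)) / (2 * h)
    dS_dT = (entropy(T + h, B, c0) - entropy(T - h, B, c0)) / (2 * h)
    return -dS_dB / dS_dT

def closed_form(T, B, c0=1.0):
    M = B / T                   # equation of state M T = B
    C_B = c0 + B**2 / T**2      # C_B = T (dS/dT)_B for the toy entropy
    return M / C_B

print(dT_dB_adiabatic(2.0, 3.0), closed_form(2.0, 3.0))
```

Both numbers are positive, so lowering the field at constant entropy lowers the temperature, as the text describes.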
7.5 Exercises
1. Justify the following relations for two systems 1 and 2 initially in equilibrium at temperature T :
and
F (T, 2V, 2N ) = 2F (T, V, N );
then use them to show that the isothermal compressibility, $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T$, is positive semidefinite in
thermal equilibrium.
2. Consider an extremely crude model of a long chain molecule, made up by adding identical monomers
sequentially. Each monomer can be added in one of two configurations, straight or kinked, and for
simplicity assume that (a) straight and kinked occur with equal probability; (b) a straight monomer
contributes a length a to the length of the molecule, while a kinked one does not contribute. Thus, for
a molecule of N monomers, the maximum length is N a, and if n is the number of straight monomers,
the actual length is L = na.
3. This exercise should help make sense of the Legendre transform. Why is the Helmholtz free energy
not a function of E? Consider a system with energy, volume, and number of particles E, V, N, and a
temperature bath at temperature $T_b$. Show that $\partial F(T_b, E, V, N)/\partial E = 0$ (i.e., F does not depend on E) if
system and bath are in thermal equilibrium.
Chapter 8
where the negative sign in front of the constant β is inconsequential and has been chosen for later convenience.
Proceeding as in the example of Sect. 4.5, we find
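\[
\frac{\partial}{\partial p_i}\Big[-k\sum_i p_i\ln p_i + k\alpha\sum_i p_i - k\beta\sum_i p_i E_i\Big] = -k\ln p_i - k + k\alpha - k\beta E_i = 0 \;\Longrightarrow\; p_i = e^{\alpha-1}e^{-\beta E_i}. \qquad (8.1.2)
\]
The parameter α can be eliminated by imposing normalization of probability, $\sum_i p_i(\alpha) = 1$, which implies
\[
p_i = \frac{e^{-\beta E_i}}{\sum_i e^{-\beta E_i}}. \qquad (8.1.3)
\]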
In contrast to isolated systems, the probability distribution of the microstates of a system in equilibrium
with a reservoir is no longer uniform. It is customary to refer to isolated systems as “microcanonical” and
to systems in thermal equilibrium as “canonical.” To identify the constant β, rewrite the first equality in
Eq. 8.1.2 in differential form,
\[
(-k\ln p_i - k)\,dp_i + k\alpha\,dp_i - k\beta E_i\,dp_i = 0,
\]
sum over all i, and note that $\sum_i dp_i = 0$. The equation then becomes
\[
dS - k\beta\,d\langle E\rangle = 0;
\]
recalling that the system is in equilibrium and at constant volume so $d\langle E\rangle = \delta q_{rev} = T\,dS$, this implies
\[
\beta = \frac{1}{kT}. \qquad (8.1.4)
\]
Consider N distinguishable particles in a two-level system with total energy E. We showed in Sect. 4.3
that $S(E,N) = k\ln\binom{N}{n(E)}$. Starting from this result, we want to describe the same system, but in
terms of temperature, rather than total energy. Putting $p := n(E)/N = E/(N\epsilon)$ and using Stirling’s formula,
one finds $S = -kN\left[p\ln p + (1-p)\ln(1-p)\right]$. Next, calculate
\[
\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{\partial S}{\partial p}\,\frac{\partial p}{\partial E} = \frac{1}{N\epsilon}\,\frac{\partial S}{\partial p} = -\frac{k}{\epsilon}\ln\frac{p}{1-p}.
\]
This formula can be inverted to express p as a function of temperature:
\[
p = \frac{e^{-\beta\epsilon}}{1+e^{-\beta\epsilon}}.
\]
The probability that the particles have energy ε at temperature T agrees with Eq. 8.1.3.
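The equivalence of the two descriptions can also be seen without Stirling's formula. The following sketch takes the exact microcanonical entropy $k\ln\binom{N}{n}$, estimates $\beta\epsilon = \partial(S/k)/\partial n$ by a finite difference, and feeds it into the canonical occupation probability. The values of N and n, and the function names, are illustrative assumptions.

```python
import math

def ln_multiplicity(N, n):
    """ln of the binomial coefficient C(N, n), i.e. S/k for n excited particles."""
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

def beta_eps_microcanonical(N, n):
    """beta*eps = d(S/k)/dn, estimated by a central difference in n."""
    return 0.5 * (ln_multiplicity(N, n + 1) - ln_multiplicity(N, n - 1))

def p_canonical(beta_eps):
    """Canonical probability of the upper level."""
    return math.exp(-beta_eps) / (1.0 + math.exp(-beta_eps))

N, n = 10**6, 2 * 10**5          # a fraction p = 0.2 of the particles is excited
x = beta_eps_microcanonical(N, n)
print(n / N, p_canonical(x))     # the two numbers should nearly coincide
```

For large N the agreement is essentially exact, which is the content of the statement that the result agrees with Eq. 8.1.3.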
\[
S(T,V,N) = -k\sum_i p_i\ln p_i = -k\sum_i \frac{e^{-\beta E_i}}{\sum_i e^{-\beta E_i}}\,\ln\frac{e^{-\beta E_i}}{\sum_i e^{-\beta E_i}} = k\beta\langle E\rangle + k\ln Q. \qquad (8.2.3)
\]
where F (T, V, N ) is the Helmholtz free energy. This shows that Q contains all information about the
thermodynamic properties of the system.
Partition Function of the Ideal Gas
The classical ideal gas is a straightforward application of Eq. 8.2.1. Consider N noninteracting particles
confined to a volume V and in equilibrium at temperature T. The Hamiltonian is $H = \sum_{i=1}^{N} \vec p_i^{\,2}/2m$.
Therefore,
\[
Q = \int \frac{d^{3N}x\,d^{3N}p}{N!\,h^{3N}}\, e^{-\beta\sum_{i=1}^{N}\vec p_i^{\,2}/2m}
= \frac{V^N}{N!\,h^{3N}}\left(\int d^3p\, e^{-\beta p^2/2m}\right)^{N}
= \frac{V^N}{N!}\left(\frac{2\pi mkT}{h^2}\right)^{3N/2}
= \frac{1}{N!}\left(\frac{V}{\lambda^3}\right)^{N},
\]
where λ is the thermal de Broglie wave length (Eq. 5.3.10). From here, the Helmholtz free energy is
calculated (with v = V/N and using Stirling’s formula) as
\[
F = -NkT\ln\frac{V}{\lambda^3} + kT\ln N! = NkT\ln\frac{\lambda^3}{v} - NkT,
\]
the pressure as
\[
P = -\frac{\partial F}{\partial V} = \frac{NkT}{V},
\]
and we can check that the entropy agrees with the Sackur-Tetrode expression (see exercises).
Note that in the example above the partition function factorizes into the product of single particle
partition functions,
\[
Q_N = \frac{Q_1^N}{N!}. \qquad (8.2.5)
\]
This is a general consequence of the fact that the Hamiltonian of noninteracting particles is the sum of single
particle Hamiltonians. In the general case of interacting particles, this does not happen, and the partition
function is usually impossible to calculate exactly. Consider, for instance, the case of a fluid of particles
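interacting through a pairwise potential, $v(\vec r_i - \vec r_j)$. The partition function then becomes
\[
Q = \int \frac{d^{3N}r\,d^{3N}p}{N!\,h^{3N}} \exp\Big[-\beta\sum_{i=1}^{N}\frac{\vec p_i^{\,2}}{2m} - \beta\sum_i\sum_{j\neq i} v(\vec r_i - \vec r_j)\Big]
= \frac{1}{N!\,\lambda^{3N}}\int d^{3N}r\, \exp\Big[-\beta\sum_i\sum_{j\neq i} v(\vec r_i - \vec r_j)\Big]. \qquad (8.2.6)
\]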
The configurational integrals can be tackled by numerical methods or through various approximation strate-
gies.
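Consider a system with two nondegenerate energy levels $E_0 = 0$ and $E_1 = \epsilon$. In this simple case, the
moments are given by $\langle E^n\rangle = \epsilon^n e^{-\beta\epsilon}/(1+e^{-\beta\epsilon})$ by straightforward calculation. It is instructive, however, to
derive them by differentiation of the partition function $Q = 1 + e^{-\beta\epsilon}$. Then, $\langle E\rangle = -\partial\ln Q/\partial\beta = \epsilon e^{-\beta\epsilon}/(1+e^{-\beta\epsilon})$
and $\langle E^2\rangle = (-1)^2\,\frac{1}{Q}\frac{\partial^2 Q}{\partial\beta^2} = \epsilon^2 e^{-\beta\epsilon}/(1+e^{-\beta\epsilon})$. The heat capacity is
\[
C_V = \frac{1}{kT^2}\left(\langle E^2\rangle - \langle E\rangle^2\right) = \frac{1}{kT^2}\,\frac{\epsilon^2 e^{-\beta\epsilon}}{(1+e^{-\beta\epsilon})^2}, \qquad (8.3.2)
\]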
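The fluctuation formula can be compared with a direct derivative of the mean energy. The sketch below works in units where $k = 1$ and $\epsilon = 1$ (an assumption made for convenience) and checks that $(\langle E^2\rangle - \langle E\rangle^2)/kT^2$ equals $d\langle E\rangle/dT = -k\beta^2\, d\langle E\rangle/d\beta$; the function names are illustrative.

```python
import math

def mean_energy(beta, eps=1.0):
    """<E> for the two-level system with levels 0 and eps."""
    x = math.exp(-beta * eps)
    return eps * x / (1.0 + x)

def cv_from_fluctuations(beta, eps=1.0, k=1.0):
    """Eq. 8.3.2: variance of E divided by k T^2 (note 1/(k T^2) = k beta^2)."""
    x = math.exp(-beta * eps)
    variance = eps**2 * x / (1.0 + x) ** 2
    return k * beta**2 * variance

def cv_from_derivative(beta, eps=1.0, k=1.0, h=1e-5):
    """C_V = d<E>/dT = -k beta^2 d<E>/dbeta, by central difference."""
    dE = (mean_energy(beta + h, eps) - mean_energy(beta - h, eps)) / (2 * h)
    return -k * beta**2 * dE

for beta in (0.5, 1.0, 3.0):
    print(beta, cv_from_fluctuations(beta), cv_from_derivative(beta))
```

The two columns agree to within the finite-difference error, and both vanish in the limits of very low and very high temperature.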
This relation implies that the logarithm of Q is related to the logarithm of g(E) by a Legendre transformation,
which is just what connects Helmholtz free energy to entropy (or energy). The relation between Laplace
and Legendre transforms becomes exact in the thermodynamic limit, which means for N → ∞. This can be
shown by evaluating the integral in Eq. 8.4.1 using the saddle point method. First note that the integrand
of Eq. 8.4.1 is the product of two factors, the density of states and the Boltzmann factor; the first grows
rapidly with energy,3 while the second decreases exponentially with it. We therefore expect the product to
3 Although we have shown this explicitly only for the ideal gas, see Eq. 4.4.1, we can easily see that the conclusion remains
valid in the presence of interactions.
where ∆ is some arbitrary constant with dimensions of energy, which we disregard hereafter; next, note that
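in this way, it takes on the form $e^{-N\phi(\eta)}$ with η = E/N and φ(η) = βη − S(η, v, 1)/k. The maximum of the
integrand corresponds to the minimum of φ, so we can solve for $E^* = N\eta^*$ by setting $d\phi/d\eta = 0$. We find
\[
\left.\frac{\partial S(\eta,v,1)}{\partial\eta}\right|_{\eta=\eta^*} = \left.\frac{\partial S(E,V,N)}{\partial E}\right|_{E=E^*} = \frac{1}{T}.
\]
\[
\left.\frac{\partial^2\phi}{\partial\eta^2}\right|_{\eta=\eta^*} = -\left.\frac{\partial^2 S(\eta,v,1)}{k\,\partial\eta^2}\right|_{\eta=\eta^*} = -\frac{N}{k}\left.\frac{\partial}{\partial E}\frac{1}{T}\right|_{E=E^*} = \frac{N}{kT^2C_V},
\]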
\[
H = \frac{p^2}{2m} + \frac{m\omega^2x^2}{2}. \qquad (8.5.1)
\]
The partition function involves momentum and coordinate integrations. The momentum integral is a gaussian
integral, exactly the same as for the ideal gas; the coordinate integral is also gaussian, so it yields the same
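dependence on temperature⁴
\[
Q = \int \frac{dp\,dx}{2\pi\hbar}\,\exp\left(-\beta\frac{p^2}{2m}\right)\exp\left(-\beta\frac{m\omega^2x^2}{2}\right) = \frac{1}{\beta\hbar\omega}. \qquad (8.5.2)
\]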
Classical Equipartition Theorem Each degree of freedom that contributes quadratically to the
classical Hamiltonian of a system will contribute an amount $kT/2$ to its internal energy and an amount
$k/2$ to the heat capacity.
For a crystal with N atoms, classical equipartition implies a constant heat capacity $C_V = 3Nk$, as there
are 6 degrees of freedom per atom that contribute a quadratic term to the Hamiltonian. This is actually
observed for most crystals at room temperature, and is known as the law of Dulong-Petit. However, the law
fails disastrously at low temperature, where the heat capacity is found experimentally to go to zero as $T^3$.
Of course, we already knew that it must fail if entropy is well defined (Sect. 5.8).⁵
Example: Rigid Rotator
Consider a diatomic molecule, like N₂. Regarding it as a rigid dumbbell aligned with the ẑ-axis,
we can write the Hamiltonian as the sum of the kinetic energy of the center of mass, $(p_x^2+p_y^2+p_z^2)/2M$,
and the rotational energy, or kinetic energy in the center-of-mass frame, $(L_x^2+L_y^2)/2I$. Here, I is the moment
of inertia about the x̂- or ŷ-axes ($I_z = 0$, so there is no degree of freedom or energy associated with
rotation about the molecular axis). Since the Hamiltonian has five quadratic degrees of freedom, by
the equipartition theorem, $C_V = \frac{5}{2}k$.
which is sometimes called the average occupation number. The heat capacity is
\[
C_V = \frac{d\langle E\rangle}{dT} = -k\beta^2\,\frac{d\langle E\rangle}{d\beta} = k\left[\frac{\beta\hbar\omega/2}{\sinh(\beta\hbar\omega/2)}\right]^2. \qquad (8.6.5)
\]
How does this formula compare to the classical case? In general, we compare quantum mechanical to classical
results by setting ħ → 0. Here, we see that, since $x/\sinh x \to 1$ for x → 0, we recover the equipartition result
in the classical limit. Now let us study the important limits of high and low temperature, β → 0 and
β → ∞, respectively. We immediately realize that the high T limit and the classical limit coincide, since β
multiplies ħ. Hence, in the high temperature limit, the quantum mechanical oscillator behaves like a classical
oscillator. In the low T limit, however, β → ∞ causes the denominator to diverge exponentially, winning
over the linearly diverging numerator. Therefore, the heat capacity vanishes in this limit, contrary to the prediction of
classical equipartition. This result might have been expected, of course, once it is realized that, at T = 0,
any system with discrete energy levels looks like a two level system (looking up from the ground state, all
that matters to the system is the first rung in the energy ladder).
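The two limits of Eq. 8.6.5 are easy to tabulate. The sketch below expresses the heat capacity of a single oscillator in units of k as a function of the dimensionless temperature $t = kT/\hbar\omega$; the function name and the sample values of t are illustrative choices.

```python
import math

def oscillator_cv(t):
    """C_V / k for one quantum harmonic oscillator, with t = kT / (hbar * omega)."""
    x = 0.5 / t                      # x = beta * hbar * omega / 2
    return (x / math.sinh(x)) ** 2

for t in (0.05, 0.1, 0.5, 1.0, 10.0, 100.0):
    print(t, oscillator_cv(t))
```

At t = 0.1 (a vibrational quantum ten times larger than kT) the oscillator contributes less than one percent of k, while for t much larger than 1 the value approaches the equipartition result of 1.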
Example: Heat Capacity of Diatomic Gases
Consider a diatomic molecule. This time, take into account the molecular bond between the two
atoms; think of it as a stiff spring rather than a rigid dumbbell. Although the experimental heat
capacity of a diatomic molecule at room temperature, $C_V = \frac{5}{2}k$, is explained purely by its translational
and rotational degrees of freedom, the bond should contribute as well. Typical bond vibrational energies are
ħω ∼ 0.1 − 0.5 eV, while at room temperature kT ≈ 0.025 eV. This means that the vibrational
degree of freedom is “frozen out”: the molecule is sitting in the vibrational ground state as if it were
at zero temperature. At higher than room temperatures (e.g., ∼10³ K) the heat capacity indeed becomes
$\frac{7}{2}k$. Conversely, at lower than room temperatures (<10² K) the rotational degrees of freedom
freeze out: due to quantization of angular momentum ($L^2 = \hbar^2\ell(\ell+1)$, ℓ = 0, 1, . . . ), the molecule
eventually settles in the rotational ground state when $kT \ll \hbar^2/2I$. Although the mechanical origin of the
energy spectrum is different, statistical mechanics provides a unified explanation of the experimental
observations. However, the vanishing of $C_V$ at T = 0 in spite of the remaining translational degrees
of freedom has a completely different origin.
7 There are three polarization states for each vibration, one longitudinal and two transverse, which must also be accounted
for by proper indexing. If we approximate them as degenerate, we can simply multiply occupation numbers by a factor of 3.
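\[
\langle E\rangle = -3\,\frac{\partial\ln Q}{\partial\beta} = 3\sum_{\vec k}\hbar\omega(\vec k)\left[\frac{1}{2} + \frac{1}{e^{\beta\hbar\omega(\vec k)}-1}\right], \qquad (8.7.2)
\]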
where the factor of 3 takes into account the three possible polarization states. The sum over normal modes
is a sum over a very dense lattice of wave vectors and can be replaced by an integral. After some bookkeeping, one arrives
at
\[
E = 9NkT\,\frac{T^3}{T_D^3}\int_0^{T_D/T} dx\,\frac{x^3}{e^x-1}, \qquad (8.7.3)
\]
where the “Debye temperature” is set by the energy of the highest vibrational mode (the mode with shortest
wave length, comparable to the unit cell size). It is interesting to examine the two limiting cases of low and
high temperature:
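\[
E = \begin{cases}
9NkT\,\dfrac{T^3}{T_D^3}\displaystyle\int_0^{\infty} dx\,\frac{x^3}{e^x-1} = AT^4, & T \ll T_D; \\[2ex]
9NkT\,\dfrac{T^3}{T_D^3}\displaystyle\int_0^{T_D/T} dx\,x^2 = 3NkT, & T \gg T_D.
\end{cases} \qquad (8.7.4)
\]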
Classical equipartition is recovered at high temperature, while at low temperature the heat capacity has a
power law dependence on temperature. This power law behavior is brought forth by the long wave length
phonon modes, and is observed experimentally in insulating crystals (the heat capacity of conductors has a
contribution from free electrons). It is called the Debye specific heat.
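The two limits of Eq. 8.7.4 can be reproduced by integrating Eq. 8.7.3 numerically. The sketch below returns $E/(NkT_D)$ as a function of $t = T/T_D$ using a simple midpoint rule; the cutoff at x = 60, the number of steps, and the function name are assumptions made for this illustration. The low temperature constant uses $\int_0^\infty dx\,x^3/(e^x-1) = \pi^4/15$.

```python
import math

def debye_energy(t, steps=20000):
    """E / (N k T_D) from Eq. 8.7.3 as a function of t = T / T_D (midpoint rule)."""
    upper = min(1.0 / t, 60.0)       # beyond x ~ 60 the integrand is negligible
    h = upper / steps
    integral = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        integral += x**3 / math.expm1(x)
    return 9.0 * t**4 * integral * h

# Ratio to the low temperature form (3 pi^4 / 5) t^4, and to the equipartition value 3 t
low = debye_energy(0.02) / ((3.0 * math.pi**4 / 5.0) * 0.02**4)
high = debye_energy(100.0) / (3.0 * 100.0)
print(low, high)
```

Both ratios are close to 1. The high temperature ratio approaches 1 slowly, because the integral approaches the classical value with a temperature-independent offset.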
The low temperature limit of the equations just derived describes correctly another important physical
system, blackbody radiation. Blackbody radiation is the name given to electromagnetic field oscillations in
a cavity in thermal equilibrium.⁸ A blackbody emits radiation with a spectral distribution characteristic of
temperature only, which can be derived from the partition function in Eq. 8.7.1 proceeding in a way similar to the
crystal, except for the facts that (a) photons only have two polarization states; (b) there is no underlying
lattice, so there is no minimum wave length; and (c) there is no fixed number of underlying degrees of
freedom. When this is taken into account (details left as an exercise), the energy density is calculated as
\[
\frac{E}{V} = \frac{\pi^2}{15}\,\frac{(kT)^4}{(\hbar c)^3}. \qquad (8.7.6)
\]
Multiplying by the speed of light (and by a geometric factor of 1/4 that accounts for the directions of emission), one obtains the energy flux radiated by the black body, known as the
Stefan-Boltzmann law.
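As a numerical illustration, Eq. 8.7.6 can be evaluated with SI values of the constants. The sketch below computes the energy density and the radiated flux, including the geometric factor of 1/4 that relates the flux through a small hole to c times the energy density; the function names are illustrative, and the constants are standard SI values typed in by hand.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s

def energy_density(T):
    """Eq. 8.7.6: blackbody energy per unit volume, in J/m^3."""
    return math.pi**2 * (K_B * T) ** 4 / (15.0 * (HBAR * C) ** 3)

def radiated_flux(T):
    """Energy flux through a small hole in the cavity wall, in W/m^2."""
    return 0.25 * C * energy_density(T)

sigma = radiated_flux(1.0)   # coefficient of T^4 in the Stefan-Boltzmann law
print(sigma)
```

The printed coefficient is the Stefan-Boltzmann constant, about 5.67e-8 W m⁻² K⁻⁴.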
8 The name derives from the fact that a small hole in the cavity wall will absorb any photon incident upon it (perfect absorber,
hence “black”); it must also be a perfect emitter if it has to remain in equilibrium at constant temperature.
Blackbody Thermodynamics
An instructive exercise in the application of thermodynamic potentials is to work out the thermody-
namic properties of blackbody radiation. We can begin from our calculation of the energy equation
of state,
$E = bVT^4$,
where b is a constant, from which we also have
$C_V = 4bVT^3$.
According to Sect. 7.2, this is a false start, because E(T, V, N ) is an equation of state, not a fun-
damental relation. Of course, we could evaluate the partition function and get the free energy from
there, but it turns out that this is not needed. The key point is that, as we have already observed,
the number of particles (electromagnetic oscillators, or photons) is fluctuating. The thermodynamic
potentials do not depend on it, which means that the chemical potential is zero. Hence, we do have
an additional equation of state, µ = 0, to complement the energy equation of state, so the problem
is not underdetermined. We can proceed to calculate entropy by integrating the heat capacity:
\[
S(T,V) = S(0) + \int_0^T dT'\,4bVT'^2 = \frac{4}{3}\,bVT^3.
\]
8.8 Exercises
1. Starting from the ideal gas partition function, calculate the entropy and show that it agrees with the
Sackur-Tetrode expression.
2. Show that for a classical gas the probability distribution of the velocities is Maxwellian (cf. Sect. 3.10).
3. Work out Eq. 8.7.6 from a suitable modification of Eq. 8.7.2. (Hint: replace $\sum_{\vec k}$ with $\frac{V}{(2\pi)^3}\int d^3k$; you
will need $\int_0^\infty dx\,\frac{x^3}{e^x-1} = \frac{\pi^4}{15}$. You should shift the energy of all oscillators to be zero in the ground state to
obtain the desired result.)
4. Calculate how pressure varies with volume for a reversible adiabatic expansion of a photon gas.
5. Consider a classical particle in the potential well $V(x) = a\frac{x^2}{2} + b\frac{x^4}{4}$, with a and b positive constants.
Use a Taylor expansion to calculate approximately the partition function Q, the average energy $\langle E\rangle$,
and the heat capacity CV through 1st order in b. State a meaningful (dimensionless) criterion for
the validity of the approximation. Explain on physical grounds why CV is smaller than for the pure
harmonic potential.
Chapter 9
As an application, let us determine the shape of the liquid-vapor coexistence curve. Consider a liquid
in equilibrium with its vapor. As we take a step along the curve, we have that the chemical potentials
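of the two phases remain equal. Thus, $d\mu_v = d\mu_l$. Since dµ = −s dT + v dP, with s and v specific
entropies and volumes, we have $(s_v - s_l)\,dT = (v_v - v_l)\,dP$, and noting that $T(s_v - s_l) = \Delta h_{vap}$ is
the specific enthalpy of vaporization, also called latent heat, and that $v_v - v_l \approx v_v \approx kT/P$ since $v_v \gg v_l$ (away from the
critical point!) we find
\[
\frac{dP}{dT} = \frac{\Delta h_{vap}\,P}{kT^2},
\]
and, integrating (with $\Delta h_{vap}$ treated as independent of temperature), we find
\[
P = P_0\exp\left[\Delta h_{vap}\left(\frac{1}{kT_0} - \frac{1}{kT}\right)\right]. \qquad (9.1.1)
\]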
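A short numerical sketch shows that Eq. 9.1.1 does solve the differential equation for dP/dT. The latent heat is passed as $\Delta h_{vap}/k$ in kelvin; the value 5000 K, the reference point, and the function names are hypothetical numbers chosen only for illustration, not data for a specific substance.

```python
import math

def vapor_pressure(T, T0, P0, dh_over_k):
    """Eq. 9.1.1, with dh_over_k = (latent heat per molecule) / k, in kelvin."""
    return P0 * math.exp(dh_over_k * (1.0 / T0 - 1.0 / T))

def slope_numeric(T, T0, P0, dh_over_k, h=1e-3):
    """dP/dT along the coexistence curve, by central difference."""
    up = vapor_pressure(T + h, T0, P0, dh_over_k)
    down = vapor_pressure(T - h, T0, P0, dh_over_k)
    return (up - down) / (2 * h)

T0, P0, dh_over_k = 373.0, 1.0, 5000.0     # illustrative reference point and latent heat
T = 350.0
P = vapor_pressure(T, T0, P0, dh_over_k)
print(slope_numeric(T, T0, P0, dh_over_k), dh_over_k * P / T**2)
```

The two printed slopes agree, and the curve passes through the reference point (T0, P0) by construction.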
1 Note that we should not include the multiplicity factor of $\binom{N}{N_1}$ to count all the ways of splitting the particles between the
two subsystems. The Gibbs prescription already accounts for that, as is verified easily.
which suggests that we take as the properly normalized probability of N1 particles being in V1 the expression
\[
p(N_1; T, V_1) = \frac{Q(T, V-V_1, N-N_1)\,Q(T, V_1, N_1)}{Q(T, V, N)} = Q(T, V_1, N_1)\,e^{-\beta[F(T, V-V_1, N-N_1) - F(T, V, N)]}. \qquad (9.2.2)
\]
Now recall that since $V_1 \ll V$, the complement system of $V_1$ can be viewed as a reservoir of heat and
particles, in the sense that in equilibrium, all its average extensive properties will be much larger than those
of the system (in the ratio $V : V_1$); therefore we can Taylor expand the free energy of the reservoir and use
the relations that $\partial F/\partial V = -P$ and $\partial F/\partial N = \mu$. Now we let N → ∞ and V → ∞ for the reservoir, and we drop
the subscripts “1” from $N_1$ and $V_1$ for the system to obtain the probability distribution
\[
p(N; T, V) = Q(T, V, N)\,e^{\beta\mu N}e^{-\beta PV}. \qquad (9.2.3)
\]
Rearranging, summing over all particle numbers in the open subsystem, and using the normalization of probability,
we find
\[
\sum_{N=0}^{\infty} e^{\beta PV}p(N; T, V) = e^{\beta PV} = \sum_{N=0}^{\infty} Q(T, V, N)\,e^{\beta\mu N}. \qquad (9.2.4)
\]
The rhs of the equation is called the grand canonical partition function, denoted by Ξ:
\[
\Xi(T, V, \mu) = \sum_{N=0}^{\infty} e^{\beta\mu N}Q(T, V, N). \qquad (9.2.5)
\]
The thermodynamic potential associated with it is called the Grand Potential Ω, a.k.a. the Landau Potential
or Landau free energy, defined by
\[
\Xi(T, V, \mu) = e^{-\beta\Omega}. \qquad (9.2.6)
\]
For a P V system, it is obvious from Eq. 9.2.4 that Ω = −P V . To generalize to any type of work, it is
convenient to express the Landau potential directly as the Legendre transform of the Helmholtz free energy
with respect to the particle number:
The grand canonical partition function is often written in terms of fugacity, defined as
\[
z = e^{\beta\mu}, \qquad (9.2.8)
\]
Since the system can exchange both energy and particles with a reservoir, they are both stochastic variables
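with expectations given by²
\[
\langle N\rangle = z\left(\frac{\partial\ln\Xi}{\partial z}\right)_{T,V} \qquad (9.2.10a)
\]
\[
\langle E\rangle = -\left(\frac{\partial\ln\Xi}{\partial\beta}\right)_{V,z}. \qquad (9.2.10b)
\]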
Similar to the relation between microcanonical and canonical distributions, the grand canonical also pro-
vides an equivalent description to the canonical distribution, in the sense that the particle number, albeit
fluctuating, is Gaussian distributed and strongly peaked around the canonical value (the average value). The
variance of the particle number fluctuation is given by
\[
\langle N^2\rangle - \langle N\rangle^2 = (kT)^2\,\frac{\partial^2\ln\Xi(T, V, \mu)}{\partial\mu^2} = \langle N\rangle kT\,\frac{\kappa_T}{v}, \qquad (9.2.11)
\]
2 It is important to remember what variables are kept constant. A different expression for the energy is obtained by working
at constant µ rather than z! (See exercises.)
where $\kappa_T$ is the isothermal compressibility, which cannot, therefore, be negative. Just as we saw for the
energy fluctuation, the mean square particle fluctuation is an extensive quantity, which implies that the
relative standard deviation, $\sqrt{\langle N^2\rangle - \langle N\rangle^2}/\langle N\rangle$, vanishes as $\langle N\rangle^{-1/2}$ in the thermodynamic limit.
Ideal gas in the grand canonical ensemble
\[
Q = \sum_{n=0}^{N} g_n e^{-\beta E_n} = \sum_{n=0}^{N}\binom{M-N}{m-n}\binom{N}{n}e^{\beta\epsilon n}. \qquad (9.3.1)
\]
To examine the behavior of the system as a function of temperature, we note that for T → ∞ (β = 0), the
Boltzmann factors are all unity, so all M sites are equivalent, and $Q = \binom{M}{m}$, which is what the summation
evaluates to (Vandermonde convolution); in the opposite limit, T → 0 (β → ∞), the sum is dominated by
the largest term (n = N), so all surface sites are filled.
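Both limits of Eq. 9.3.1 can be checked by brute-force summation. In the sketch below, M is read as the total number of sites, N as the number of surface (binding) sites, and m as the number of particles; this reading of the symbols, the small sample values, and the function names are assumptions made for illustration.

```python
from math import comb, exp

def adsorption_Q(M, N, m, beta_eps):
    """Eq. 9.3.1: sum over the number n of particles bound to surface sites."""
    return sum(comb(M - N, m - n) * comb(N, n) * exp(beta_eps * n)
               for n in range(0, min(N, m) + 1))

def mean_bound(M, N, m, beta_eps):
    """Average number of occupied surface sites."""
    weights = [comb(M - N, m - n) * comb(N, n) * exp(beta_eps * n)
               for n in range(0, min(N, m) + 1)]
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)

M, N, m = 30, 10, 12
print(adsorption_Q(M, N, m, 0.0), comb(M, m))   # Vandermonde convolution at beta = 0
print(mean_bound(M, N, m, 20.0))                # approaches N at low temperature
```

At β = 0 the sum reproduces the binomial coefficient, and for large βε the mean occupation saturates at the number of surface sites.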
To examine the behavior of the system as a function of gas pressure, let us assume $N \ll m$. In this case,
the surface can be regarded as an open system in equilibrium with a particle reservoir, which we take to be
an ideal gas, so we can put
\[
\mu_{ads} = \mu_{ig} = kT\ln\frac{P}{P_0}.
\]
The grand canonical partition function of the adsorbate is
\[
\Xi(T, N, z) = \sum_{n=0}^{N}\binom{N}{n}z^n e^{\beta\epsilon n} = (1 + ze^{\beta\epsilon})^N, \qquad (9.3.2)
\]
with $z = e^{\beta\mu_{ig}} = P/P_0$. Note the form of the partition function as the product of N single-site partition
functions, due to the assumption that sites are independent (and distinguishable). The surface coverage, θ,
with $K = e^{\beta\epsilon}/P_0$. This is the Langmuir isotherm. Rearranging the Langmuir isotherm equation, we can
write
\[
\frac{\theta}{(1-\theta)P} = K,
\]
which makes it manifest (at least to chemistry students) that K is the equilibrium constant for the process
S + X ⇌ SX binding the substrate to the adsorbate.
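The rearranged isotherm can be verified directly from the grand partition function of Eq. 9.3.2. The sketch below takes the coverage to be $\theta = \langle n\rangle/N = (z/N)\,\partial\ln\Xi/\partial z$, which is an assumption about the definition consistent with Eq. 9.2.10a, and evaluates the derivative numerically; the parameter values and function names are illustrative.

```python
import math

def ln_Xi(z, N, beta_eps):
    """Logarithm of Eq. 9.3.2: N independent sites, each empty or singly occupied."""
    return N * math.log1p(z * math.exp(beta_eps))

def coverage(P, P0, N, beta_eps, h=1e-6):
    """theta = <n>/N = (z/N) d ln Xi / dz, with z = P/P0 (central difference)."""
    z = P / P0
    dlnXi_dz = (ln_Xi(z * (1 + h), N, beta_eps) - ln_Xi(z * (1 - h), N, beta_eps)) / (2 * z * h)
    return z * dlnXi_dz / N

P0, beta_eps, N = 1.0, 2.0, 100
K = math.exp(beta_eps) / P0
for P in (0.01, 0.3, 5.0):
    theta = coverage(P, P0, N, beta_eps)
    print(P, theta, theta / ((1 - theta) * P), K)
```

The ratio θ/((1 − θ)P) comes out independent of pressure and equal to K, while θ itself rises from 0 toward 1 as the pressure increases.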
Example: Cooperative Binding
The structure of the site partition function of the adsorbate reveals that it is a polynomial of degree
equal to the number of adsorbed particles per site (or in the number of ligands per enzyme). This
is a general feature of the grand canonical partition function, which is a polynomial in z for finite
number of particles. For historical reasons, the partition function for ligand adsorption goes by the
name of binding polynomial. A plot of the average site occupation number, θ, versus pressure (or
bulk phase concentration [L]) is often used to extract information about cooperativity in the binding
process. Positive cooperativity means that subsequent binding events are facilitated by previous
binding events, e.g., when the first ligand deforms the enzyme such as to make it more favorable to
bind the next ligand. For instance, assume a two-ligand adsorption process where the second ligand
has a much more favorable adsorption energy; then, by the definition of the equilibrium constant K
in Eq. 9.3.3, we see that the isotherm is dominated by the quadratic term of the polynomial. This can
result in an isotherm with a sigmoidal shape. The effective degree of the polynomial (which reveals
the effective number of ligands cooperating in the binding event) can be extracted from the slope of
the logarithmic plot of $\ln\frac{\theta}{1-\theta}$ vs ln[L] (the Hill plot). Positive cooperativity is famously observed in the case of O₂
adsorption by hemoglobin, where the degree of the polynomial is 4 and the slope of the Hill plot is
about 2.8.
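The connection between cooperativity and the Hill slope can be illustrated with a two-ligand binding polynomial. The sketch below assumes a site partition function of the form $1 + ax + bx^2$, where x stands for the ligand concentration (or fugacity) and a, b are hypothetical coefficients; the occupation per binding position is taken as $\theta = \langle n\rangle/2$.

```python
import math

def hill_slope(a, b, x, h=1e-4):
    """Slope of ln[theta/(1-theta)] vs ln x for the binding polynomial 1 + a x + b x^2."""
    def logit(xx):
        n_avg = (a * xx + 2 * b * xx**2) / (1 + a * xx + b * xx**2)   # x d ln(Xi)/dx
        theta = n_avg / 2.0                                           # two ligands per site
        return math.log(theta / (1 - theta))
    return (logit(x * math.exp(h)) - logit(x * math.exp(-h))) / (2 * h)

print(hill_slope(2.0, 1.0, 1.0))    # independent binding, polynomial (1 + x)^2
print(hill_slope(0.02, 1.0, 1.0))   # second ligand strongly favored
```

With independent binding the slope is 1, as for the Langmuir isotherm. When the quadratic term dominates, the slope approaches 2, the degree of the polynomial.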
independent of spin. A spin-up and a spin-down electron can have the same momentum ~k, so each momentum
state has a spin degeneracy factor of $g_s = 2s + 1 = 2$. Therefore, the grand canonical partition function is
\[
\Xi = \prod_{\vec k, s}\left[1 + e^{\beta(\mu - \epsilon_{\vec k})}\right]. \qquad (9.4.2)
\]
It is easy to pass to the continuum limit by considering the logarithm of the partition function. Proceeding
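\[
\ln\Xi = Vg_s\int\frac{d^3k}{(2\pi)^3}\,\ln\left[1 + e^{\beta(\mu - \epsilon_{\vec k})}\right] \qquad (9.4.3)
\]
\[
\langle N\rangle = z\,\frac{\partial\ln\Xi}{\partial z} = Vg_s\int\frac{d^3k}{(2\pi)^3}\,\frac{e^{\beta(\mu - \epsilon_{\vec k})}}{1 + e^{\beta(\mu - \epsilon_{\vec k})}} := V\int\frac{d^3k}{(2\pi)^3}\,n_{\vec k}. \qquad (9.4.4)
\]
The second equation above³ defines the single particle occupation number, given by
\[
n_{\vec k} = n(\epsilon_{\vec k}) = g_s\,\frac{1}{e^{\beta(\epsilon_{\vec k} - \mu)} + 1}, \qquad (9.4.5)
\]
which is the celebrated Fermi distribution.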
In the limit T → 0, the Fermi distribution reduces to the
complement of a step function, $n(\epsilon_{\vec k}) = 1 - \theta(\epsilon_{\vec k} - \mu)$ per spin state. The
physical meaning of this formula is that at T = 0, all states with
energy below the chemical potential are completely occupied,
and all states above it are completely empty. The chemical
potential of a Fermi gas at T = 0 is called the Fermi energy or
Fermi level:
\[
\mu(T = 0) = \epsilon_F.
\]
The corresponding momentum is called the Fermi momentum, $k_F$. At finite temperature, some electrons
with energy below the Fermi level can be thermally excited to levels above $\epsilon_F$, as illustrated in the figure.
The spread in energy where this is possible is of order kT. In other words, the Fermi distribution at finite T
differs from a step function only over a region of approximate extent $\epsilon_F - kT < \epsilon < \epsilon_F + kT$. The value of the
Fermi energy of a typical metal is several eV, or a few hundred times the value of kT at room temperature.
So in normal laboratory conditions, metals behave like Fermi gases near T = 0.
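The Fermi level of a Fermi gas is calculated easily by noting that at T = 0 all levels, and only those
levels, with $|\vec k| < k_F$ are occupied, so the total number of electrons is
\[
N = V\int\frac{d^3k}{(2\pi)^3}\,n_{\vec k} = 2V\int\frac{d^3k}{(2\pi)^3}\left[1 - \theta(\epsilon_{\vec k} - \epsilon_F)\right] = \frac{V}{\pi^2}\int_0^{k_F} dk\,k^2 = \frac{V}{3\pi^2}\,k_F^3, \qquad (9.4.6)
\]
yielding the dependence of Fermi energy on electron density:
\[
\epsilon_F = \frac{\hbar^2k_F^2}{2m} = \frac{\hbar^2}{2m}\left(3\pi^2\,\frac{N}{V}\right)^{2/3}. \qquad (9.4.7)
\]
In the same way, one can calculate the average energy (with the spin degeneracy already included in $n_{\vec k}$, Eq. 9.4.5),
\[
E = V\int\frac{d^3k}{(2\pi)^3}\,\frac{\hbar^2k^2}{2m}\,n_{\vec k} = \frac{3}{5}\,N\epsilon_F. \qquad (9.4.8)
\]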
We can use Eq. 9.4.8 to study the properties of the Fermi gas at zero temperature. First, note that S = 0,
since there is a unique configuration of the ground state. Thus, partial derivatives of the energy with respect
to volume or number are taken at constant entropy, and we can calculate chemical potential and pressure as
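\[
\mu = \frac{\partial E}{\partial N} = \epsilon_F \qquad (9.4.9)
\]
\[
P = -\frac{\partial E}{\partial V} = \frac{2N}{5V}\,\epsilon_F. \qquad (9.4.10)
\]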
The first equation confirms the result we had anticipated, that the Fermi energy is the chemical potential at
T = 0. The second equation states the remarkable fact that at zero temperature the Fermi gas possesses a
large residual pressure, unlike the classical ideal gas. This fact is a direct consequence of the Pauli exclusion
3 Note the resemblance to the coverage equation of the Langmuir isotherm.
principle. For typical values of Fermi energy, $\epsilon_F = 10$ eV, and density, $N/V = 10^{22}$ cm⁻³, we find $P \approx 6\times10^4$
atmospheres!
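The order of magnitude can be reproduced from Eq. 9.4.10 in SI units. In the sketch below the constants are standard SI values typed in by hand and the function names are illustrative; the second function evaluates Eq. 9.4.7 for the same density, which shows that the quoted Fermi energy and density are meant as round numbers rather than a matched pair.

```python
import math

EV = 1.602176634e-19      # joules per electron volt
ATM = 101325.0            # pascals per atmosphere
HBAR = 1.054571817e-34    # J s
M_E = 9.1093837015e-31    # electron mass, kg

def fermi_pressure(n, eps_F):
    """Eq. 9.4.10: P = (2/5) (N/V) eps_F, with n in m^-3 and eps_F in joules."""
    return 0.4 * n * eps_F

def fermi_energy(n):
    """Eq. 9.4.7 for free electrons, with n in m^-3; returns joules."""
    return HBAR**2 / (2.0 * M_E) * (3.0 * math.pi**2 * n) ** (2.0 / 3.0)

n = 1e22 * 1e6                                    # 10^22 cm^-3 converted to m^-3
print(fermi_pressure(n, 10.0 * EV) / ATM)         # pressure in atmospheres
print(fermi_energy(n) / EV)                       # Fermi energy in eV at this density
```

With these inputs the pressure comes out at several times 10⁴ atmospheres, enormous compared with the pressure of a classical gas at the same density and room temperature.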
Application: Contact Potential
Consider two different metals (with different Fermi energies) in thermal equilibrium. When the metals
are connected by a wire, the difference in Fermi energy (and therefore, electron gas pressure) forces
electrons to flow from one metal to the other, until the metals charge up sufficiently that the resulting
voltage (electrostatic potential difference, ∆φ) causes the current flow to stop. This voltage is called
the contact potential. When electrons stop flowing, the electrostatic potential difference has balanced
exactly the chemical potential difference. The equilibrium condition is given by the minimum of the
Helmholtz free energy. Assume the Fermi energy (and so the chemical potential) of metal 1 is larger
than that of metal 2, so that Ne electrons are transferred from metal 1 to metal 2. The total free
energy is then
\[
F(N_e) = F_1(-N_e) + F_2(N_e) + \frac{(eN_e)^2}{2C},
\]
where $F_{1,2}$ are the free energies of the two metals, C is the capacitance of the metals (function of their
shape and distance), e is the proton charge, and the ratio of (charge)² over twice the capacitance is
the electrostatic energy stored in the capacitor as a result of the build-up of charge $eN_e$. Since the
total Helmholtz free energy must be minimum with respect to the partition of the charge between
the metals, we set $\partial F/\partial N_e = 0$. This means
\[
-\mu_1 + \mu_2 + \frac{e^2N_e}{C} = 0.
\]
The last term can be expressed in terms of the potential difference between the two electrodes of
the capacitor: Q = CV with $Q = eN_e$, where V is the potential of the positively charged electrode (metal 1 that
lost electrons) minus the potential of the negatively charged metal (metal 2 that gained electrons):
$V = \phi_1 - \phi_2$. Thus, $-\mu_1 + \mu_2 + e\phi_1 - e\phi_2 = 0$ or
\[
\mu_1 - e\phi_1 = \mu_2 - e\phi_2.
\]
This equilibrium condition can be stated as the condition of equality of the electrochemical potential
of the two metals.
where Bn is the n-th virial coefficient and depends only on temperature. Of course, we already know that
B1 = 1 from matching to the ideal gas equation of state. To limit bookkeeping to a minimum, we work out
explicitly the expression of the second virial coefficient B2 , although the method can be used quite generally
for higher order coefficients. Taking the logarithm of Eq. 9.2.9, we have
\[
\beta P = \frac{1}{V}\ln\sum_{N=0}^{\infty} z^N Q(T, V, N) = \frac{1}{V}\ln\left(1 + zQ_1 + z^2Q_2 + \dots\right), \qquad (9.5.2)
\]
which yields an undesired expansion in fugacity. However, we do know density as a power series of fugacity,
so our plan is to invert that expansion and substitute into Eq. 9.5.2 to find the virial coefficients. The power
expansion of density comes from Eq. 9.2.10:
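\[
\rho = \frac{\langle N\rangle}{V} = \frac{z}{V}\,\frac{\partial\ln\Xi}{\partial z} = \frac{z}{V}\,\frac{\partial}{\partial z}\ln\left(1 + zQ_1 + z^2Q_2 + \dots\right). \qquad (9.5.3)
\]
Since Eq. 9.5.3 shows that ρ = O(z), and we are looking for an expression valid to O(ρ²) to get $B_2$, we
don’t need to keep track of terms $o(z^2)$; thus, recalling the Taylor expansion of the logarithm, ln(1 + x) =
x − x²/2 + . . . , we have
\[
\rho = \frac{z}{V}\,\frac{\partial}{\partial z}\left(zQ_1 + z^2Q_2 - \frac{z^2}{2}Q_1^2\right) + O(z^3) = \frac{1}{V}\left(zQ_1 + 2z^2Q_2 - z^2Q_1^2\right) + O(z^3). \qquad (9.5.4)
\]
Moreover, writing fugacity as a power series of density,
\[
z = a\rho + b\rho^2 + O(\rho^3), \qquad (9.5.5)
\]
substituting into the right hand side of Eq. 9.5.4, and grouping like powers of ρ, we obtain
\[
0 = \left(a\,\frac{Q_1}{V} - 1\right)\rho + \left(b\,\frac{Q_1}{V} + a^2\,\frac{2Q_2}{V} - a^2\,\frac{Q_1^2}{V}\right)\rho^2 + O(\rho^3). \qquad (9.5.6)
\]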
A power series vanishes identically when all the coefficients do; hence,
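\[
a = \frac{V}{Q_1}; \qquad \frac{b}{a} = V\left(1 - 2\,\frac{Q_2}{Q_1^2}\right). \qquad (9.5.7)
\]
Substituting Eq. 9.5.5 into Eq. 9.5.2, and using the values of a and b just determined, we find
\[
\beta P = \frac{Q_1}{V}\,a\rho + \frac{Q_1}{V}\,b\rho^2 + \left(\frac{Q_2}{V} - \frac{Q_1^2}{2V}\right)a^2\rho^2 = \rho + \frac{V}{2}\left(1 - 2\,\frac{Q_2}{Q_1^2}\right)\rho^2 + O(\rho^3), \qquad (9.5.8)
\]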
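The coefficient of ρ² in Eq. 9.5.8 can be tested numerically without any series inversion. The sketch below truncates Ξ at second order in z, picks $Q_2$ so that $(V/2)(1 - 2Q_2/Q_1^2)$ equals a chosen target value, and then extracts $(\beta P - \rho)/\rho^2$ at very small fugacity. All numbers (V, Q1, the target B2, and z) are arbitrary illustrative choices.

```python
import math

def second_virial_estimate(V=50.0, Q1=50.0, B2=0.3, z=1e-7):
    """Estimate (beta*P - rho) / rho^2 from Xi = 1 + z Q1 + z^2 Q2 at small z."""
    # Choose Q2 so that (V/2) (1 - 2 Q2 / Q1^2) equals the target B2
    Q2 = 0.5 * Q1**2 * (1.0 - 2.0 * B2 / V)
    xi_minus_1 = z * Q1 + z * z * Q2
    beta_P = math.log1p(xi_minus_1) / V                            # Eq. 9.5.2
    rho = (z / V) * (Q1 + 2.0 * z * Q2) / (1.0 + xi_minus_1)       # Eq. 9.5.3
    return (beta_P - rho) / rho**2

print(second_virial_estimate())          # close to the target value 0.3
print(second_virial_estimate(B2=0.0))    # ideal-gas relation Q2 = Q1^2 / 2 gives nearly zero
```

The small residual difference from the target comes from the neglected third-order terms and shrinks as z is reduced.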
Typical intermolecular potentials have a strong, short range repulsion –over a distance the size of the “hard
core” radius σ– and a weak, long range attraction. Over the hard core region, the integrand is approximately
1, so the integral acquires a positive contribution approximately equal to the volume of the hard core region,
the “excluded volume,” and independent of temperature. Outside of the hard core region and out to infinity,
an estimate for the contribution of the weak tail of the potential is given by $\beta\int d^3r\,v(r)$, a negative quantity
for an attractive potential, and a decreasing (in magnitude) function of temperature. Therefore, the second
virial coefficient is expected to start negative at low temperature, and become positive at high temperature.
Such temperature dependence is characteristic of the competition between entropic (excluded volume) and
energetic (long range attraction) effects, with entropy dominating at high temperature.
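The series inversion that leads to Eq. 9.5.8 can be checked numerically. The sketch below truncates the grand partition function after the $z^2$ term and uses made-up values of $Q_1$, $Q_2$, and $V$ (assumptions chosen only for illustration, not values from these notes). It compares $(\beta P - \rho)/\rho^2$ at small fugacity with the coefficient of $\rho^2$ read off from Eq. 9.5.8.

```python
import math

def beta_p_and_rho(z, q1, q2, volume):
    """Return (beta*P, rho) from Eqs. 9.5.2 and 9.5.3 with Xi truncated at z^2."""
    x = z * q1 + z * z * q2                                  # Xi - 1
    beta_p = math.log1p(x) / volume                          # (1/V) ln Xi
    rho = (z / volume) * (q1 + 2.0 * z * q2) / (1.0 + x)     # (z/V) d ln Xi / dz
    return beta_p, rho

# Hypothetical numbers, for illustration only.
Q1, Q2, V = 50.0, 1200.0, 10.0

# Coefficient of rho^2 in Eq. 9.5.8.
b2_formula = 0.5 * V * (1.0 - 2.0 * Q2 / Q1**2)

# At small fugacity, (beta*P - rho) / rho^2 should approach the same number.
beta_p, rho = beta_p_and_rho(1e-7, Q1, Q2, V)
b2_numeric = (beta_p - rho) / rho**2

print(b2_formula, b2_numeric)
```

The two numbers agree up to a correction of order ρ, which is the neglected third virial term.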
Example: Boyle temperature of the square well potential
At this temperature, which is called the Boyle temperature, the gas behaves the closest to an ideal
gas.
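As an illustration of the Boyle temperature, the sketch below evaluates the second virial coefficient of a square well potential. It assumes the standard expression $B_2 = 2\pi\int_0^\infty r^2\,(1 - e^{-\beta v(r)})\,dr$, which is not reproduced in this excerpt, and a well of depth `eps` extending from σ to `well_range` × σ. These parameter names and the numerical values are assumptions made for the example.

```python
import math

def b2_square_well(kt, eps, sigma, well_range):
    """Second virial coefficient of a square well: hard core for r < sigma,
    depth -eps for sigma < r < well_range * sigma, zero beyond."""
    hard_core = 2.0 * math.pi * sigma**3 / 3.0           # excluded-volume contribution
    shell = well_range**3 - 1.0                          # relative volume of the attractive shell
    return hard_core * (1.0 - shell * (math.exp(eps / kt) - 1.0))

def boyle_kt(eps, well_range):
    """Value of kT at which b2_square_well vanishes."""
    cube = well_range**3
    return eps / math.log(cube / (cube - 1.0))

# Hypothetical parameters in reduced units (eps = sigma = 1).
kt_boyle = boyle_kt(1.0, 1.5)
print(kt_boyle)                                          # kT_B in units of eps
print(b2_square_well(0.5 * kt_boyle, 1.0, 1.0, 1.5))     # negative below the Boyle temperature
print(b2_square_well(2.0 * kt_boyle, 1.0, 1.0, 1.5))     # positive above it
```

The sign change reproduces the competition described above: attraction dominates at low temperature, excluded volume at high temperature.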
We can use {Tc , Vc , Pc } to eliminate {N, a, b} from the equation of state. Doing so, we realize that we
can recast the equation of state entirely in terms of rescaled variables $\tau := T/T_c$, $\nu := V/V_c$, and $\psi := P/P_c$:
$$\left(\psi + \frac{3}{\nu^2}\right)(3\nu - 1) = 8\tau, \tag{9.6.4}$$
which is known as the law of corresponding states. This equation is remarkable because it predicts
that if we scale temperature, pressure, and volume by their critical point values, all real fluids obey
the same equation of state regardless of the details of the interactions between their molecules. The reduced compressibility
factor at the critical point is predicted to be a universal constant, $P_c V_c / N k T_c = 3/8$. How does the
prediction fare experimentally? Experimental values usually fall between 0.2 and 0.3, below the predicted 0.375. Nevertheless,
a glance at generalized compressibility charts shows that the data for real gases do tend to fall on
“universal” curves.
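A quick numerical check of the reduced equation of state is sketched below. It solves Eq. 9.6.4 for ψ, verifies by finite differences that the critical isotherm has a horizontal inflection point at ν = 1, and evaluates the critical compressibility factor. The critical constants used ($V_c = 3Nb$, $kT_c = 8a/27b$, $P_c = a/27b^2$) are the standard van der Waals results, and the values of a, b, and N are arbitrary assumptions.

```python
def reduced_pressure(nu, tau):
    """psi(nu, tau), obtained by solving Eq. 9.6.4 for psi."""
    return 8.0 * tau / (3.0 * nu - 1.0) - 3.0 / nu**2

# Central finite differences along the critical isotherm tau = 1 at nu = 1.
step = 1e-4
p_plus = reduced_pressure(1.0 + step, 1.0)
p_mid = reduced_pressure(1.0, 1.0)
p_minus = reduced_pressure(1.0 - step, 1.0)
slope = (p_plus - p_minus) / (2.0 * step)
curvature = (p_plus - 2.0 * p_mid + p_minus) / step**2

# Critical compressibility factor from the van der Waals critical constants.
a, b, n_particles = 2.0, 0.5, 100.0     # hypothetical values; the result does not depend on them
v_c = 3.0 * n_particles * b
kt_c = 8.0 * a / (27.0 * b)
p_c = a / (27.0 * b**2)
z_c = p_c * v_c / (n_particles * kt_c)

print(p_mid, slope, curvature, z_c)
```

Changing a, b, or N leaves `z_c` unchanged, which is the content of the law of corresponding states at the critical point.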
$$F(T,V,N) = -NkT\left[\ln\left(\frac{V - Nb}{N\lambda^3}\right) + 1\right] - \frac{N^2 a}{V}. \tag{9.7.1}$$
For the moment, we simply accept this form of the free energy
based on the fact that (a) it yields the desired equation of state; (b) it recovers the correct ideal gas free
energy when the interaction parameters a and b are set to zero. Now, imagine the system is following a van
der Waals isotherm. By the discussion of Sect. 9.1, we know that the chemical potential (or specific Gibbs
free energy) is minimum in these conditions. The Gibbs potential is
$$G(T,P,N) = \min_V\left\{-NkT\left[\ln\left(\frac{V - Nb}{N\lambda^3}\right) + 1\right] - \frac{N^2 a}{V} + VP\right\}. \tag{9.7.2}$$
Minimization by taking the derivative with respect to V is equivalent to working with the equation of state,
except that now we are equipped with the means of classifying the stability of each solution by comparing the
Gibbs free energies (or chemical potentials). The figure shows the rescaled Gibbs free energy g := G/N kTc
using the rescaled units defined in the example box, (ψ, τ ), for a fixed τ < 1 as a function of ψ, i.e., following
a van der Waals isotherm. Note that the Gibbs free energy is given by the minimum of the plot for any given
pressure, which means that the system in equilibrium must move straight from point 2 to point 6, without
visiting the “bowtie” path (2,3,4,5,6) (that path corresponds to the values of the argument of the right hand
side of Eq. 9.7.2, $F(T,V,N) + V\,P(T,V,N)$, before carrying out the prescribed minimization). Note also that
the point (2,6) corresponds to a cusp of g: the derivative of the free energy is discontinuous there. A point
where the first derivative of a thermodynamic potential has a discontinuity is called a first order phase
transition. In this case, the derivative we have considered is $v = (\partial g/\partial P)_{T,N}$; at the first order transition of the
van der Waals fluid, two phases with different specific volume coexist (liquid and vapor). Experimentally,
calorimetry is often used to detect first order transitions from the measurement of the associated latent heat,
which signals a discontinuity in specific entropy $s = -(\partial g/\partial T)_{P,N}$.
Although our discussion of the phase diagram has been dismissive of the bowtie path on the van der Waals
isotherm, real systems often are found in a thermodynamic state corresponding to points on the (2,3) and
(5,6) portions of the path. The existence of such states in equilibrium does not run counter to any fundamental
law (unlike “states” along the (4,5) path, which would be states of negative isothermal compressibility). The
characteristic feature of such states is metastability: they are local, not global minima of the free energy,
and they are separated from the global minimum by a free energy barrier.
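The minimization in Eq. 9.7.2 can be explored numerically in the rescaled variables (ψ, τ, ν) of Eq. 9.6.4. The sketch below is one possible implementation, not the procedure used to draw the figure: for a fixed τ < 1 it locates the pressure at which the liquid-like and vapor-like local minima have equal Gibbs free energy. In reduced units the argument of the minimum becomes $-\tau\ln(3\nu - 1) - 9/(8\nu) + 3\psi\nu/8$ plus terms independent of ν, which are dropped. The function names and the choice τ = 0.9 are assumptions.

```python
import math

def pressure(nu, tau):
    """Reduced van der Waals pressure psi(nu, tau), from Eq. 9.6.4."""
    return 8.0 * tau / (3.0 * nu - 1.0) - 3.0 / nu**2

def gibbs(nu, tau, psi):
    """Argument of the min in Eq. 9.7.2 in units of N k Tc, without nu-independent terms."""
    return -tau * math.log(3.0 * nu - 1.0) - 9.0 / (8.0 * nu) + 0.375 * psi * nu

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign between lo and hi."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coexistence(tau):
    """Return (psi, nu_liquid, nu_vapor) where the two local minima have equal g."""
    # Spinodal volumes, where d(psi)/d(nu) = 0, i.e. 4 tau nu^3 = (3 nu - 1)^2.
    spinodal = lambda nu: 4.0 * tau * nu**3 - (3.0 * nu - 1.0) ** 2
    nu_a = bisect(spinodal, 1.0 / 3.0 + 1e-9, 1.0)   # end of the liquid branch
    nu_b = bisect(spinodal, 1.0, 50.0)               # end of the vapor branch

    def branches(psi):
        nu_liq = bisect(lambda nu: pressure(nu, tau) - psi, 1.0 / 3.0 + 1e-9, nu_a)
        nu_vap = bisect(lambda nu: pressure(nu, tau) - psi, nu_b, 1e4)
        return nu_liq, nu_vap

    def gap(psi):
        nu_liq, nu_vap = branches(psi)
        return gibbs(nu_liq, tau, psi) - gibbs(nu_vap, tau, psi)

    # Both branches exist only between the two spinodal pressures.
    psi_lo = max(pressure(nu_a, tau), 1e-3) + 1e-9
    psi_hi = pressure(nu_b, tau) - 1e-9
    psi = bisect(gap, psi_lo, psi_hi)
    return (psi,) + branches(psi)

psi_coex, nu_liquid, nu_vapor = coexistence(0.9)
print(psi_coex, nu_liquid, nu_vapor)
```

Between the spinodal pressures and away from `psi_coex`, the branch with the higher `gibbs` value is the metastable one.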
Application: Classical Nucleation Theory
The classical theory of homogeneous nucleation considers a metastable state, say a supercooled vapor,
where a fluctuation has occurred producing a droplet of the stable liquid phase. Let gV be the Gibbs
free energy per unit volume of the vapor and gL that of the liquid; since the vapor phase is metastable,
∆g := gL − gV < 0. When the droplet forms, a phase boundary appears in the material, with an
associated surface tension σLV , which is the reversible work needed to create a unit area of the
interface (microscopically, this work is required because interfaces involve broken bonds). The free
energy change upon formation of a droplet of radius R at constant P, T is then given by
$$\Delta G = \frac{4\pi R^3}{3}\Delta g + 4\pi R^2\sigma_{LV}.$$
As a function of droplet radius, the free energy first increases, since the negative bulk term is proportional
to $R^3$ while the positive surface term is proportional to $R^2$. There exists a critical radius
$R^* = 2\sigma_{LV}/(-\Delta g)$ for which the total free energy cost of the stable phase droplet is maximum and equal to
$$\Delta G^* = \frac{16\pi\sigma_{LV}^3}{3(\Delta g)^2}.$$
According to classical nucleation theory, this is the barrier that needs to be overcome for the formation
of the stable phase, and kinetic theory then predicts a homogeneous nucleation rate proportional to
exp(−β∆G∗ ).
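The critical radius and barrier height can be evaluated directly from the expressions above. The following sketch uses illustrative, assumed values for Δg and the surface tension (they are not data from these notes) and checks that the droplet free energy evaluated at R* equals the barrier and is a maximum.

```python
import math

def droplet_free_energy(radius, delta_g, sigma_lv):
    """Bulk term (negative for delta_g < 0) plus surface term."""
    return 4.0 * math.pi * radius**3 / 3.0 * delta_g + 4.0 * math.pi * radius**2 * sigma_lv

def critical_radius(delta_g, sigma_lv):
    """R* = 2 sigma_LV / (-delta_g)."""
    return 2.0 * sigma_lv / (-delta_g)

def nucleation_barrier(delta_g, sigma_lv):
    """Delta G* = 16 pi sigma_LV^3 / (3 delta_g^2)."""
    return 16.0 * math.pi * sigma_lv**3 / (3.0 * delta_g**2)

# Hypothetical inputs in SI units.
DELTA_G = -1.0e8      # J/m^3, free energy difference per unit volume (liquid minus vapor)
SIGMA_LV = 0.072      # J/m^2, liquid-vapor surface tension

r_star = critical_radius(DELTA_G, SIGMA_LV)
barrier = nucleation_barrier(DELTA_G, SIGMA_LV)
print(r_star, barrier, droplet_free_energy(r_star, DELTA_G, SIGMA_LV))
```

Dividing `barrier` by kT gives the exponent that controls the nucleation rate.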
There are also second order phase transitions, which occur with no latent heat. The transition from
resistive to superconductive state in metals is of this type. Second order phase transitions were originally
given this name because of a jump in the heat capacity (second order derivative of the free energy) observed,
e.g., in the superconducting transition. However, this is not always true; sometimes other types of singularity
are observed. The important point is that the first derivatives of the free energy are continuous; hence the
phases are indistinguishable at the exact transition point – there is no phase coexistence or double minimum
in thermodynamic potentials. The critical point of the van der Waals fluid is an example of such an instance:
there, the two minima of Eq. 9.7.2 coalesce and the latent heat and density difference disappear.
In the introduction, a phase of matter was defined as a homogeneous mixture of all compounds that can
be formed by the chemical elements present. Now if one tries to be more specific regarding the meaning of
homogeneous, several difficulties arise. One difficulty has to do with kinetics. Imagine filling a vessel with
two immiscible fluids, like water and oil. The presence of two phases will be evident from the presence of a
very sharp interface, which leaves no room for ambiguity at room temperature, where the entropic penalty
of not mixing is beaten hands down by the favorable energetics of keeping water’s hydrogen bond network
intact (save for the broken bonds at the interface). Now if the vessel is shaken vigorously, the liquid will
appear milky and homogeneous: a mixture of microscopic droplets of the two liquids.6 How many phases
are there? The correct answer is – one has no business talking about phases because the system is not in
equilibrium. If one waits long enough, water and oil will separate again. Long enough may be a minute
or two, or even a few hours, depending on the type of oil used and the cleanliness of the container, among
other factors; regardless, the system in equilibrium has two phases, the water and the oil. We can define
a measurable quantity, such as the expectation value of the density of water molecules, as the parameter
distinguishing the phases: it is nearly unity (normalized to the density of pure water at the same T, P ) in the
water phase and nearly zero in the oil phase, with an abrupt, discontinuous jump at the interface (here we
take a macroscopic viewpoint; interfaces are not abrupt at the atomic scale, although away from the critical
point, they are actually abrupt on the scale of a few nm). Other choices are possible, such as the difference
in molar concentration of water vs oil. The important point is that there exists a parameter that changes
abruptly at the phase boundary. We call this parameter an order parameter.
Now imagine adding some detergent into the vessel; shake it again and the same milky mixture appears.
Only this time it will tend to stay around longer, a lot longer. We are willing to call that equilibrium. How
many phases are there? The milky mixture is one phase; it is called the middle phase, since it coexists
in equilibrium with water (denser, at the bottom) and oil (lighter, on top). Now, this is considerably
trickier than the kinetics-dominated case of oil-water alone. How different is the mixture from before?
Macroscopically, not much, except that it is stable. Microscopically, it consists of water and oil domains
separated by a monolayer surfactant film. Having microscopic water and oil domains is entropically favorable,
and the energy cost (oil-water surface tension) is made smaller by the surfactant, which stabilizes the
homogeneous mixture.7
6 The milky appearance of the “homogeneous” system is caused by the difference in optical density between the two fluids,
just like in a fog (water has a lower index of refraction than oil).
7 This qualitative discussion is an oversimplification of the physics, but is a good starting point. The middle phase is actually
The lowering of the surface tension by a dilute adsorbed film is a general phenomenon due to the
entropic advantage of making more surface states available to the adsorbed molecules. Let σ12 be
the surface tension at the interface of a two-phase system. A differential increase in the interfacial
area A at constant T, V, N1 , N2 has a free energy cost of dF = σ12 dA. Now consider the addition of
a small amount of a third component, adsorbed at the interface, with areal density ns = Ns /A. The
free energy of this dilute surface film is Fs = −T Ss + Ns µs . Interactions among adsorbed molecules
are neglected, and the adsorption energy per particle is included in the chemical potential µs . The
entropy of the surface film, Ss , is proportional to the logarithm of the available area, by Boltzmann’s
law: $S_s = N_s k \ln(A/A_0)$. A differential increase in the interfacial area at constant T, V, N1 , N2 , Ns has
a free energy cost of $dF = \sigma_{12}\,dA - T\,(\partial S_s/\partial A)\,dA = (\sigma_{12} - n_s kT)\,dA$. So the free energy cost of creating
more interface is lowered by an amount proportional to the areal density of surface molecules ns and
temperature, which is just the two dimensional ideal gas pressure, or spreading pressure. The build
up of the surface film results in a smaller surface tension.
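To get a feeling for the size of the effect, the relation $\sigma = \sigma_{12} - n_s kT$ derived above can be evaluated for a dilute film. The numbers below (bare surface tension, area per adsorbed molecule, temperature) are assumptions chosen for illustration.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K

def spreading_pressure(area_per_molecule, temperature):
    """Two dimensional ideal gas pressure n_s k T, with n_s = 1 / area_per_molecule."""
    return K_B * temperature / area_per_molecule

def film_surface_tension(sigma_bare, area_per_molecule, temperature):
    """Surface tension lowered by the spreading pressure of the adsorbed film."""
    return sigma_bare - spreading_pressure(area_per_molecule, temperature)

# Hypothetical film: 1 nm^2 per molecule on an interface with 72 mJ/m^2, at 298 K.
sigma_film = film_surface_tension(0.072, 1.0e-18, 298.0)
print(sigma_film)   # J/m^2
```

The ideal film is only a reasonable model at large area per molecule, where interactions among adsorbed molecules can be neglected.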
So what makes the middle phase a phase? What is its order parameter? Since the middle phase coexists
with the water and oil phases, there must be something that tells them apart. Again, many choices are
possible, one being the (normalized) concentration of water, which will exhibit three values, approximately
one in the water phase, zero in the oil phase, and something in the middle in the middle phase, with two
distinct discontinuities at the two interfaces. Now here is the catch. We are necessarily assuming that we
are talking about an average concentration of water, with the average taken over a region containing many
surfactant-stabilized water and oil domains; otherwise, on a submicrometer scale, the density of water jumps
in a binary fashion between zero and one (or vice versa) as one traverses the surfactant film from the oil side
to the water side (or vice versa). So in defining the order parameter, we must first take averages over some
short (microscopic) length scales. This procedure works, because, as we will see later, the phenomenology
of phase transitions is dominated by long wave length fluctuations; what happens at short scales can be
averaged out.
$$\rho = -\frac{z}{V}\frac{\partial\Omega}{\partial z}. \tag{9.9.1}$$
If it happens that this parameter does not vary smoothly,8 then the thermodynamic potential is not an
analytic function (an analytic function is infinitely differentiable). Now, the partition function of a system
with N particles is a polynomial of degree N in the fugacity z. Polynomials are analytic functions, and thus
a system with a finite number of particles cannot have a phase transition. You may ask what happens in the
limit N → ∞. Isn’t an infinite series also infinitely differentiable? The answer lies in a theorem by Lee and
Yang, which we state without proof. Consider the grand canonical partition function Q. For finite N , Q is
8 It can jump (first order transition), or stay constant above some temperature and slowly vary below it (second order
transition), for example.
a polynomial with positive coefficients; it cannot have real positive roots. However, extending it to complex values
of fugacity z, it will have N complex roots. Now, in the limit V → ∞, N → ∞, the roots of Q can converge
to a value z0 on the real axis. When this happens, Q will not be analytic at z0 . Therefore, there is a phase
transition for that value of fugacity.
9.10 Exercises
1. Derive an expression for the expectation of energy in the grand canonical formalism, keeping the
chemical potential µ fixed, instead of the fugacity.
2. Derive the expression for the variance of the particle number Eq. 9.2.11.
3. Work out the adsorption problem if each surface site can adsorb 0, 1, or 2 particles, with respective
energies E0 = 0, E1 = −ε, E2 = −2ε + η, with ε > 0. Discuss the three cases η > 0, η = 0, η < 0 and
sketch coverage vs pressure for each.
4. Fermi gas at low temperature For the ideal Fermi gas (N electrons in volume V ), the internal
energy at low, but nonzero temperature, can be calculated approximately from Eq. 9.4.3, yielding the
expression
$$E = \frac{3}{5}N\epsilon_F\left[1 + \frac{5\pi^2}{12}\left(\frac{kT}{\epsilon_F}\right)^2\right].$$
(a) Is room temperature “low”? Justify your answer for a typical value of $\epsilon_F$ = 5 eV.
(b) Work out the specific heat and the entropy. Are they consistent with the Third Law?
(c) Suppose your metal has one free electron per ion core. Compare the contribution of the specific heat
of the ion lattice to that of the electron gas at room temperature, assuming that classical equipartition
is valid for the ions.
5. Calculate the virial coefficient of the hard core van der Waals potential,
$$v(r) = \begin{cases} \infty & r \le \sigma \\ -\epsilon\left(\dfrac{\sigma}{r}\right)^6 & \sigma < r, \end{cases}$$
assuming you can use a Taylor expansion appropriate for high temperature. Why is ε ≪ kT a reasonable
assumption for intermolecular potentials? (e.g., for argon, ε ≈ 1.7×10⁻²¹ J.)
6. Tonks gas Calculate the canonical partition function Q(T, L, N ) for a gas of N hard rods in one
dimension on the segment [0, L]. Each rod has length σ, and there is no other interaction besides the
hard core potential (excluded volume). (Note that this interaction imposes a natural order on the
string of particles: two hard core particles cannot switch places in one dimension.) Then from the
partition function, find the equation of state and compare it to the van der Waals form.
7. Show that Eq. 9.7.1 reproduces the van der Waals equation of state and yields the ideal gas free energy
when we set a = b = 0.
8. Sketch a comparison of how the Gibbs free energies of a gas, a liquid, and a solid vary as a function of
(a) pressure and (b) temperature.
9. Will it liquefy? A mole of diatomic van der Waals gas is passed through a Joule-Thomson expansion
valve. In this problem, analyze the process in the P, T plane, using a convenient approximation of the
virial equation of state,
$$V = \frac{RT}{P} + B(T)$$
with $B(T) = b - \frac{a}{RT}$. The Joule-Thomson expansion is isenthalpic: H(Pi , Ti ) = H(Pf , Tf ).
(a) Write an expression for the differential of H regarded as a function of T and P .
(b) Devise a suitable path consisting of isothermal and isobaric steps for which the use of the ideal gas
CP is justified. Sketch this path and use it to find an equation for the final temperature Tf .
10. DPPC is a surfactant lining the lung. As a monolayer on water, its spreading pressure Π is measured
to be 4 mJ m⁻² at specific area of 0.4 nm² at room temperature in a Langmuir trough. The surface
tension of pure water is σ0 = 72 mJ m⁻². Assuming ideal 2D gas behavior for DPPC on water, what
specific area would give you a water surface tension σ = 40 mJ m⁻²? Compare your answer to the
experimental value of 0.24 nm² and explain the discrepancy. Propose a one-parameter modification of
the ideal gas equation of state that would behave more realistically at low specific area.
Chapter 10
are two equations constraining the three M s, since each AA bond terminates into two A particles and each
AB bond terminates into one A particle, and there are a total of zNA bonds terminating into A particles,
and likewise for B. So one can eliminate two of the M s by solving the constraints as
$$M_{AA} = \frac{zN_A - M_{AB}}{2} \quad\text{and}\quad M_{BB} = \frac{zN_B - M_{AB}}{2}.$$
In the absence of microscopic information, the value of MAB , needed to determine the exact energy of a
configuration, remains unknown. However, we can assign it probabilistically by noting that the probability
that a neighbor of a particle of type A is a particle of type B is given simply by NB /N . (Strictly speaking
the probability is NB /(N − 1), but N is of order Avogadro’s number.) Thus, the Bragg-Williams mean
field assumption replaces MAB by its expected value:
$$M_{AB} \approx \langle M_{AB}\rangle = z\,\frac{N_A N_B}{N}. \tag{10.2.1}$$
Now the problem is completely determined and can be solved. Conventionally, one defines the temperature-
dependent exchange parameter
$$\chi = \frac{z}{kT}\left(w_{AB} - \frac{w_{AA} + w_{BB}}{2}\right), \tag{10.2.2}$$
whereupon the energy becomes
$$U = \frac{z w_{AA}}{2}N_A + \frac{z w_{BB}}{2}N_B + kT\chi\,\frac{N_A N_B}{N}.$$
In terms of fractions of solvent and solute,
$$x = \frac{N_A}{N} \quad\text{and}\quad (1 - x) = \frac{N_B}{N},$$
the excess free energy of the solution, ∆F (x) := F (NA , NB ) − F (NA , 0) − F (0, NB ) is given by
$$\frac{\Delta F(x)}{NkT} = \chi\,x(1 - x) + x\ln x + (1 - x)\ln(1 - x). \tag{10.2.3}$$
This is Hildebrand’s regular solution theory. Plotting out the excess free
energy as a function of composition for various values of χ (which depends
on temperature), we see that there are two regimes, much like for the van
der Waals fluid: for χ < 2, the excess free energy has a single minimum as
a function of x, while for χ > 2 there are two; this means that the system is
stable, at a given temperature, in two phases with different composition, a low
x phase composed mostly of type B particles and a high x phase composed
mostly of type A particles: the two components are immiscible and phase
separate. The critical temperature given by χ = 2 is the temperature below which the phase diagram shows
an immiscibility region. Note that there are systems where χ has a less trivial temperature dependence than
indicated by Eq. 10.2.2; this happens, for example, in some strong hydrogen bonding systems, where the
hydrogen bonding enthalpy becomes very strong at low temperature and favors the mixing of a hydrogen
bonding solute with water below some lower critical mixing temperature.
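The change of regime at χ = 2 is easy to see numerically. The sketch below tabulates Eq. 10.2.3 on a grid of compositions and counts the interior local minima for one value of χ below 2 and one above. The grid size and the sample values of χ are arbitrary choices.

```python
import math

def excess_free_energy(x, chi):
    """Delta F / (N k T) from Eq. 10.2.3, for 0 < x < 1."""
    return chi * x * (1.0 - x) + x * math.log(x) + (1.0 - x) * math.log(1.0 - x)

def count_minima(chi, n_points=2001):
    """Count interior local minima of the excess free energy on a uniform grid."""
    xs = [(i + 1) / (n_points + 1) for i in range(n_points)]   # excludes x = 0 and x = 1
    f = [excess_free_energy(x, chi) for x in xs]
    return sum(1 for i in range(1, n_points - 1) if f[i] < f[i - 1] and f[i] < f[i + 1])

for chi in (1.0, 3.0):
    print(chi, count_minima(chi))
```

For χ > 2 the two minima sit symmetrically about x = 1/2, reflecting the A-B symmetry of Eq. 10.2.3.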
$$Q(T,N,h) = \sum_{\{s_i=\pm1\}} e^{\beta h\sum_{i=1}^{N} s_i} = \sum_{\{s_i=\pm1\}}\prod_{i=1}^{N} e^{\beta h s_i} = \left[2\cosh(\beta h)\right]^N, \tag{10.3.1}$$
where the notation makes the correspondence V → N , µ → h explicit. The summation in Eq. 10.3.1 is
simply an application of the binomial theorem.
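For a small number of spins, the closed form in Eq. 10.3.1 can be compared with a direct sum over all $2^N$ configurations. This is a brute-force check, feasible only for small N; the parameter values are arbitrary.

```python
import itertools
import math

def partition_brute_force(n_spins, beta, h):
    """Sum exp(beta * h * sum_i s_i) over all configurations s_i = +/-1."""
    return sum(
        math.exp(beta * h * sum(config))
        for config in itertools.product((-1, 1), repeat=n_spins)
    )

def partition_closed_form(n_spins, beta, h):
    """[2 cosh(beta h)]^N, the right hand side of Eq. 10.3.1."""
    return (2.0 * math.cosh(beta * h)) ** n_spins

print(partition_brute_force(6, 0.7, 0.3), partition_closed_form(6, 0.7, 0.3))
```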
1 They are of course classical spin variables.
2 The negative sign comes from the analogy with magnetism, where the energy is minimum when dipoles align with the
external field.
3 Just as we would find for a collection of distinguishable tls particles. The binary degrees of freedom are distinguishable
$$p(s_1, s_2, \dots, s_N) = \frac{\exp\left(\beta\sum_{i=1}^{N} h s_i\right)}{Q} = \prod_{i=1}^{N}\frac{e^{\beta h s_i}}{Q^{1/N}} = \prod_{i=1}^{N} p(s_i), \tag{10.3.2}$$
which is the product of single-site probabilities, and thus implies that the spins are independent random
variables when the spins are not interacting with each other. Now suppose instead that there is an interaction
between spins, which we write as a two-body interaction parametrized by a traceless4 symmetric matrix Jij
$$H = -\sum_{i,\,j>i} J_{ij}\, s_i s_j - h\sum_i s_i. \tag{10.3.3}$$
Assuming for the moment that the interaction is of short range and without loss of generality taking it to
involve only nearest neighbors, we can further simplify the model and write it as
$$H = -J\sum_{\langle i,j\rangle} s_i s_j - h\sum_i s_i, \tag{10.3.4}$$
where J parameterizes the interaction strength and < i, j > denotes that the sum runs over all pairs of
nearest neighbors. This is the Hamiltonian of the Ising model.
Consider a lattice gas, where each site i can be empty or singly occupied (ni = 0 or 1) and particles
on neighboring sites have an interaction energy u; let µ be the chemical potential. Then,
$H = u\sum_{\langle i,j\rangle} n_i n_j - \mu\sum_i n_i$. Introducing new variables for the degrees of freedom, si = 2ni − 1, we
obtain the Ising Hamiltonian (up to an inconsequential constant) with J = −u/4 and h = µ/2 − uz/4,
where z is the number of nearest neighbors of a site.
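The mapping between the lattice gas and the Ising model can be verified by enumeration on a small ring (coordination z = 2). Substituting $n_i = (1 + s_i)/2$ and counting each bond once gives J = −u/4 and h = µ/2 − uz/4; with these values the two energies should differ by the same constant for every configuration. The ring size and the values of u and µ below are arbitrary assumptions.

```python
import itertools

def lattice_gas_energy(occupations, u, mu):
    """u * sum over ring bonds of n_i n_j, minus mu * sum_i n_i."""
    n = len(occupations)
    bonds = sum(occupations[i] * occupations[(i + 1) % n] for i in range(n))
    return u * bonds - mu * sum(occupations)

def ising_energy(spins, coupling, field):
    """-J * sum over ring bonds of s_i s_j, minus h * sum_i s_i (Eq. 10.3.4)."""
    n = len(spins)
    bonds = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return -coupling * bonds - field * sum(spins)

U, MU, Z, N_SITES = 0.8, 0.3, 2, 6          # hypothetical parameters; Z = 2 on a ring
J = -U / 4.0
H_FIELD = MU / 2.0 - U * Z / 4.0

offsets = []
for occ in itertools.product((0, 1), repeat=N_SITES):
    spins = tuple(2 * n_i - 1 for n_i in occ)
    offsets.append(lattice_gas_energy(occ, U, MU) - ising_energy(spins, J, H_FIELD))

print(min(offsets), max(offsets))            # equal if the mapping is right
```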
$$Q = \sum_{\{s_i=\pm1\}} e^{\beta J\sum_{\langle i,j\rangle} s_i s_j + \beta h\sum_i s_i}. \tag{10.3.5}$$
Now it is immediately clear that the probability of a given spin configuration does not factor as a product
of site probabilities, because of the correlations introduced by the nearest neighbor interaction term.
4 The trace would represent a self-interaction, which would be included in the potential h.
Consider the partition function of a one dimensional (linear) cluster of three spins:
$$Q = \sum_{s_1=\pm1}\sum_{s_2=\pm1}\sum_{s_3=\pm1} e^{\beta J(s_1 s_2 + s_2 s_3)}\, e^{\beta h(s_1 + s_2 + s_3)}.$$
which shows us that the middle spin effectively couples its neighbor to the left to its neighbor to the
right via an interaction given by
even though these two spins were not interacting in the original Hamiltonian (they are not nearest
neighbors). This correlation prevents one from writing the probability of a configuration as the
product of independent site factors.
It is not obvious how the Ising model can be solved in closed form in the general case. However, the
exact solution can be found in 1D and in 2D. The 1D solution is straightforward and instructive, although
it is uneventful: no phase transition is present in the 1D Ising model at nonzero temperature. In 2D,
the solution is a mathematical tour de force, and it provides a benchmark for numerical and approximate
methods of studying phase transitions. Before exploring exact solutions, we apply mean field theory to
obtain approximate solutions in all dimensions, subject to the caveats described at the end of Sect. 10.2.
The crucial point is that the factorization of the probability into single site terms (i.e., the assumption of
statistical independence) allowed us to replace the expectation of the product ⟨si sj⟩ with the product of
expectations ⟨si⟩⟨sj⟩. Now note that the expectation of the local spin, which is the local magnetization, is
given by
$$m_i = \langle s_i\rangle = 1\times p_i + (-1)\times(1 - p_i) = 2p_i - 1. \tag{10.4.2}$$
Expressing pi in terms of mi allows us to rewrite the Gibbs free energy in terms of the magnetization alone
as
$$G_{MF} = -J\sum_{\langle i,j\rangle} m_i m_j - \sum_i h_i m_i + kT\sum_i\left[\frac{1 + m_i}{2}\ln\frac{1 + m_i}{2} + \frac{1 - m_i}{2}\ln\frac{1 - m_i}{2}\right]. \tag{10.4.3}$$
We recognize the magnetization as the order parameter of the system (cf. Sect. 9.8). Now, we know that G
is a function of T, h, N only; therefore, at equilibrium, it must be minimum with respect to the values of the
$m_i$:
$$\frac{\partial G}{\partial m_i} = -J\sum_{j\in\{nn\}_i} m_j - h_i + \frac{kT}{2}\ln\frac{1 + m_i}{1 - m_i} = 0, \qquad i = 1, \dots, N, \tag{10.4.4}$$
where the summation extends over the nearest neighbors of the i-th spin. Solving the Ising model in the
mean field approximation thus requires the solution of a coupled system of N transcendental equations.
The task becomes much simpler if the external field is
constant. In that case, the order parameter is uniform and
the minimum free energy solution is the constant solution m
that satisfies
$$-Jzm - h + \frac{kT}{2}\ln\frac{1 + m}{1 - m} = 0, \tag{10.4.5}$$
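or, equivalently,
$$m = \tanh\left[\beta(Jzm + h)\right].$$
For h = 0, a nonzero solution of this equation exists only below the critical temperature $kT_c = Jz$.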
Below Tc , the zero magnetization solution is unstable because it corresponds to a maximum in the free
energy. Note the dependence of Tc on system dimensionality D, through the number of nearest neighbors z;
for hypercubic lattices, kTc = 2JD in D dimensions. In simple terms, Tc can be viewed as the temperature
above which entropy-driven behavior (disorder) takes over enthalpy-driven behavior (order). Hence, entropy
is more important (more destabilizing) in lower dimensions.
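Equation 10.4.5 can be rearranged as $m = \tanh[\beta(Jzm + h)]$, which invites a solution by fixed-point iteration. The sketch below is one simple way to do this (not necessarily the most efficient); the coupling, coordination number, and temperatures are assumed values in units where k = 1, so that mean field theory predicts $T_c = Jz = 4$.

```python
import math

def mean_field_magnetization(temperature, coupling, z, field=0.0, start=0.9, iterations=2000):
    """Iterate m -> tanh((J z m + h) / kT) from a positive starting guess (k = 1)."""
    m = start
    for _ in range(iterations):
        m = math.tanh((coupling * z * m + field) / temperature)
    return m

J_COUPLING, Z_NEIGHBORS = 1.0, 4          # hypothetical square-lattice values

m_above = mean_field_magnetization(5.0, J_COUPLING, Z_NEIGHBORS)   # T > Tc
m_below = mean_field_magnetization(3.0, J_COUPLING, Z_NEIGHBORS)   # T < Tc
print(m_above, m_below)
```

Starting from a negative guess below $T_c$ yields the symmetric solution −m, while m = 0 remains a fixed point of the iteration but is unstable, in line with the discussion above.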
or in the presence of long range forces. In the next section, we show how mean field theory succeeds in
giving reasonable approximations to the problem of particles interacting through the Coulomb potential,
something the virial expansion could not handle. However, we will find later that mean field theory is
utterly inapplicable to the one dimensional nearest neighbor linear Ising spin chain, where the exact solution
shows that the critical temperature vanishes, in contrast to the prediction of mean field theory. This is
because in one dimension a fluctuation at a single site can be overwhelmingly important, since z is so
small.
where e is the proton charge, ε the permittivity of the medium, and the sign of the “spin” variable corresponds
to the sign of the charge;
(b) The system must have zero net charge, or else the thermodynamic potentials will not be extensive. This
can be arranged by a suitable choice of the chemical potential; in this case h = 0 ensures zero net charge by
symmetry.
Along with these modifications, we also assume for simplicity that the positive and negative ions are singly
charged (q± = ±e). Then, the mean field equation becomes
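$$m_i = \tanh\left(-\frac{\beta e^2}{4\pi\epsilon}\sum_j\frac{m_j}{|r_{ij}|}\right). \tag{10.6.1}$$
We then replace the lattice with a continuum and thus the site “magnetization” variables $m_i$ with the local
volume density of charge, $m_i \to \frac{V}{N}\rho(\vec r)$, and the sum with an integral ($\frac{1}{N}\sum_i \to \frac{1}{V}\int d^3r$):
$$\rho(\vec r) = 2\rho_0\tanh\left(-\frac{\beta e^2}{4\pi\epsilon}\int d^3r'\,\frac{\rho(\vec r\,')}{|\vec r - \vec r\,'|}\right), \tag{10.6.2}$$
with $\rho_0 = N/2V$ the average number density of positive charges. Finally, we note that
$\frac{e}{4\pi\epsilon}\int d^3r'\,\frac{\rho(\vec r\,')}{|\vec r - \vec r\,'|} = \varphi(\vec r)$
is the electrostatic potential at point $\vec r$ and use the first Maxwell equation,
$$\nabla^2\varphi(\vec r) = -\frac{e\rho(\vec r)}{\epsilon}, \tag{10.6.3}$$
to arrive at
$$\nabla^2\varphi(\vec r) = \frac{2e\rho_0}{\epsilon}\tanh[\beta e\varphi(\vec r)]. \tag{10.6.4}$$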
This equation is called the “crowded” Poisson-Boltzmann equation, since it is the appropriate replacement for
the more commonly used Poisson-Boltzmann equation for densely packed electrolytes (such as near charged
electrodes). Both Eq. 10.6.4 and the plain Poisson-Boltzmann equation,
$$\nabla^2\varphi(\vec r) = \frac{2e\rho_0}{\epsilon}\sinh[\beta e\varphi(\vec r)], \tag{10.6.5}$$
can be linearized to capture the behavior of dilute electrolytes6 and the behavior of the potential at long
distances, where it is expected to be small. Then, we are allowed to retain only the first term in the Taylor
expansion of tanh x or sinh x ≈ x. This limit corresponds to the Debye-Hückel approximation,
$$\nabla^2\varphi(\vec r) = \beta\,\frac{2e^2\rho_0}{\epsilon}\,\varphi(\vec r). \tag{10.6.6}$$
For a unit test charge in an electrolytic medium, the electrostatic potential from equation 10.6.6 at a distance
r from the charge is
$$\varphi(\vec r) = \frac{e^{-r/\lambda}}{4\pi\epsilon r}, \tag{10.6.7}$$
which decays exponentially with decay length given in terms of the ionic strength by
$$\lambda = \sqrt{\frac{\epsilon kT}{e^2\sum_i\rho_{0,i}\,z_i^2}}, \tag{10.6.8}$$
where the i-th species has charge number $z_i$ and density $\rho_{0,i}$, and the charge neutrality condition is
$\sum_{i=1}^{n}\rho_{0,i}\,q_i = 0$. This length is called the Debye screening length. Note that the Debye-Hückel potential
cannot be recovered as a truncated expansion in the density, which explains our failure to obtain a
sensible result for the second virial coefficient for the Coulomb interaction. In this respect, mean field theory
is much more successful.
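As a numerical illustration of Eq. 10.6.8, the sketch below evaluates the Debye screening length for a 1:1 electrolyte. The physical constants are standard SI values; the relative permittivity of 78.5 (roughly that of water near room temperature), the concentration, and the temperature are assumptions made for the example.

```python
import math

E_CHARGE = 1.602176634e-19     # proton charge, C
K_B = 1.380649e-23             # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12       # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23       # 1/mol

def debye_length(species, relative_permittivity, temperature):
    """Eq. 10.6.8; species is an iterable of (charge number z_i, number density in m^-3)."""
    strength = sum(density * z * z for z, density in species)
    permittivity = relative_permittivity * EPS_0
    return math.sqrt(permittivity * K_B * temperature / (E_CHARGE**2 * strength))

# Hypothetical case: 0.1 mol/L of a monovalent salt in water at 298.15 K.
ion_density = 0.1 * 1000.0 * AVOGADRO            # ions of each sign per m^3
screening_length = debye_length([(+1, ion_density), (-1, ion_density)], 78.5, 298.15)
print(screening_length)                          # in meters, of order a nanometer
```

Because λ scales as the inverse square root of the ion density, diluting the salt a hundredfold lengthens the screening length tenfold.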
10.7 Exercises
1. Show that the interfacial tension in the lattice model of regular solutions is given by $\gamma_{AB} = \frac{kT}{za}\chi$, where
a is the unit area of the interface.
2. Consider a lattice gas with nearest neighbor interactions, with bond energy −ε < 0. Let z be the
lattice coordination, ρ the fraction of occupied sites.
(a) Calculate the pressure and the second virial coefficient. In what limit is the ideal gas equation of
state recovered?
(b) Find the heat capacity. (Hint: work it out from the entropy) Explain your result in terms of the
limitations of the lattice model.
3. (Cooperative Adsorption) Consider a lattice model of surface adsorption, where the particles form an
ideal gas in the bulk gas phase, but they interact once they are adsorbed on the surface. There are
n particles occupying N sites on the surface; an adsorbed particle has energy E = −ε < 0, and two
particles occupying nearest neighbor sites have negative interaction energy −J (thus, J > 0).
(a) Defining surface coverage as θ = n/N , find the free energy F (T, N, θ) in mean field theory.
(b) From F , find the chemical potential µa of the adsorbed atoms in terms of temperature and coverage.
If the adsorbed atoms are in equilibrium with a gas phase, which we take to be an ideal gas, find a
relation between coverage and pressure. Express it as P (θ) (unlike the Langmuir isotherm, this relation
cannot be inverted in closed form).
(c) Express the condition for pressure to have an inflection point as a cubic equation in coverage. (Hint:
show that if the graph of y = f (x) has an inflection point at x = x0 , then the graph of its inverse
x = g(y) has an inflection point at y0 = f (x0 )).
(d) Cubic equations always have a real root. Set a bound on the range of T for which this root will be
physically meaningful within the mean field theory framework.
Chapter 11
where
$$T(s_i, s_{i+1}) = \exp\left\{\beta\left[J s_i s_{i+1} + \frac{h}{2}(s_i + s_{i+1})\right]\right\} \tag{11.1.2}$$
represents the contribution of each of the two bonds in the cluster. The factor of $e^{\beta h(s_1 + s_3)/2}$ hanging in
front reminds us that there is no bond between the first and last spin in our cluster. Now, T (si , si+1 ) is a
function of the two spins connected by the i-th bond; it is, in fact, a 2 × 2 matrix, because each of its indices
can take on two values. T is called the transfer matrix; T11 = T (↑↑), T12 = T (↑↓), T21 = T (↓↑), and
T22 = T (↓↓) correspond to all possible combinations of spin values across the bond between s1 and s2 :
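$$T = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} = \begin{pmatrix} e^{\beta(J+h)} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta(J-h)} \end{pmatrix}. \tag{11.1.3}$$
Next, note that $\sum_{s_2=\pm1} T(s_1, s_2)T(s_2, s_3) = T_{s_1\uparrow}T_{\uparrow s_3} + T_{s_1\downarrow}T_{\downarrow s_3}$ is simply the matrix product of two transfer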
matrices. So computing the partition function of a linear chain of spins has been reduced to computing the
product of 2 × 2 matrices.
Example: Three-spin cluster again
Let us rework the result from the previous section using the transfer matrix formalism. The matrix
product in Eq. 11.1.1 is
For a chain of N spins, it is convenient to impose periodic boundary conditions, which means that the
first spin of the chain makes a bond with the last one,1 as if the chain were wrapped around a circle; in other
words, sN +1 = s1 . With this boundary condition, Eq. 11.1.1 becomes, for N spins,
$$Q = \sum_{s_1=\pm1}\sum_{s_2=\pm1}\cdots\sum_{s_N=\pm1} T(s_1, s_2)T(s_2, s_3)\cdots T(s_N, s_1) = \mathrm{Tr}(T^N). \tag{11.1.4}$$
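The identity in Eq. 11.1.4 can be tested on a short periodic chain by comparing the trace of the N-th power of the transfer matrix of Eq. 11.1.3 with an explicit sum over all spin configurations. The sketch below uses plain Python lists for the 2 × 2 matrices; the chain length and the values of β, J, and h are arbitrary assumptions.

```python
import itertools
import math

def transfer_matrix(beta, coupling, field):
    """The 2 x 2 matrix of Eq. 11.1.3, rows and columns ordered (up, down)."""
    return [
        [math.exp(beta * (coupling + field)), math.exp(-beta * coupling)],
        [math.exp(-beta * coupling), math.exp(beta * (coupling - field))],
    ]

def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def partition_transfer(n_spins, beta, coupling, field):
    """Tr(T^N) for a periodic chain of n_spins."""
    t = transfer_matrix(beta, coupling, field)
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_spins):
        power = matmul2(power, t)
    return power[0][0] + power[1][1]

def partition_brute_force(n_spins, beta, coupling, field):
    """Direct sum of Boltzmann factors over all 2^N configurations of the ring."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=n_spins):
        bonds = sum(s[i] * s[(i + 1) % n_spins] for i in range(n_spins))
        total += math.exp(beta * (coupling * bonds + field * sum(s)))
    return total

print(partition_transfer(8, 0.6, 1.0, 0.25), partition_brute_force(8, 0.6, 1.0, 0.25))
```

The brute-force sum costs $2^N$ operations, whereas the transfer matrix product costs only N, which is what makes the method practical for long chains.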
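\[ m(T, h) = \frac{1}{N} \sum_i \langle s_i \rangle = -\frac{\partial g}{\partial h} = \frac{\sinh \beta h}{\left[ \sinh^2 \beta h + e^{-4\beta J} \right]^{1/2}}. \tag{11.1.7} \]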
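The trace formula and the magnetization can be checked numerically. The sketch below is an illustration, not the derivation used in the notes. It relies on the eigenvalues of the matrix in Eq. 11.1.3, $\lambda_\pm = e^{\beta J} \cosh \beta h \pm [e^{2\beta J} \sinh^2 \beta h + e^{-2\beta J}]^{1/2}$, so that $\mathrm{Tr}(T^N) = \lambda_+^N + \lambda_-^N$. It also assumes $g = -(kT/N) \ln Q$ with $k = 1$. The function names are hypothetical.

```python
import math
from itertools import product


def eigenvalues(beta, J, h):
    """Eigenvalues of the symmetric 2x2 transfer matrix of Eq. 11.1.3."""
    mean = math.exp(beta * J) * math.cosh(beta * h)
    root = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * h) ** 2
                     + math.exp(-2 * beta * J))
    return mean + root, mean - root


def log_q_ring(beta, J, h, n):
    """ln Tr(T^n) = ln(lam_plus^n + lam_minus^n), evaluated without overflow."""
    lam_plus, lam_minus = eigenvalues(beta, J, h)
    return n * math.log(lam_plus) + math.log1p((lam_minus / lam_plus) ** n)


def log_q_brute_force(beta, J, h, n):
    """ln Q from an explicit sum over all 2^n states of a periodic chain."""
    total = 0.0
    for s in product((+1, -1), repeat=n):
        energy = -sum(J * s[i] * s[(i + 1) % n] + h * s[i] for i in range(n))
        total += math.exp(-beta * energy)
    return math.log(total)


def magnetization_numeric(beta, J, h, n, dh=1e-5):
    """m = -dg/dh = (1/(beta*n)) d(ln Q)/dh, by a central difference."""
    d_log_q = log_q_ring(beta, J, h + dh, n) - log_q_ring(beta, J, h - dh, n)
    return d_log_q / (2 * dh * beta * n)


def magnetization_exact(beta, J, h):
    """Closed form of Eq. 11.1.7 (thermodynamic limit)."""
    s = math.sinh(beta * h)
    return s / math.sqrt(s * s + math.exp(-4 * beta * J))


if __name__ == "__main__":
    print(log_q_ring(0.7, 1.0, 0.2, 8), log_q_brute_force(0.7, 1.0, 0.2, 8))
    print(magnetization_numeric(1.0, 1.0, 0.3, 200), magnetization_exact(1.0, 1.0, 0.3))
```

For a chain of a few hundred spins the contribution of $\lambda_-$ is already negligible, which is why the finite-chain derivative should reproduce Eq. 11.1.7 closely. Note also that $m$ vanishes at $h = 0$ for every finite temperature.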
11.2. BREAKDOWN OF MEAN FIELD THEORY: FLUCTUATIONS AND DEFECTS
The free energy must be minimum with respect to the density of domain walls:
\[ \frac{\partial \Delta G}{\partial w} = 0 \implies 2J + kT \ln \frac{w}{1 - w} = 0, \tag{11.2.4} \]
which means that there is a finite density of domain walls at any nonzero temperature:
\[ w = \frac{1}{e^{2\beta J} + 1}, \tag{11.2.5} \]
and therefore a finite correlation length³ of order the mean spacing between domain walls:
\[ \xi \sim \frac{1}{w} \sim e^{2\beta J} \tag{11.2.6} \]
at low temperature.
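To make Eqs. 11.2.4 to 11.2.6 concrete, here is a minimal sketch that evaluates the wall density and checks that it is a minimum. It assumes a free energy per site of the form $2Jw + kT[w \ln w + (1 - w) \ln(1 - w)]$, which is the form whose derivative reproduces Eq. 11.2.4; units are chosen so that $k = 1$, and the function names are illustrative.

```python
import math


def wall_density(beta, J):
    """Equilibrium density of domain walls, Eq. 11.2.5."""
    return 1.0 / (math.exp(2 * beta * J) + 1.0)


def free_energy_per_site(w, beta, J):
    """Assumed form: wall energy 2*J*w minus T times the mixing entropy (k = 1)."""
    kT = 1.0 / beta
    return 2 * J * w + kT * (w * math.log(w) + (1 - w) * math.log(1 - w))


def stationarity_residual(w, beta, J):
    """Left-hand side of Eq. 11.2.4; it vanishes at the equilibrium density."""
    return 2 * J + (1.0 / beta) * math.log(w / (1 - w))


if __name__ == "__main__":
    J = 1.0
    for beta in (0.5, 1.0, 2.0, 4.0):
        w = wall_density(beta, J)
        # 1/w is the mean spacing between walls, in units of the lattice spacing.
        print(f"beta*J = {beta * J:3.1f}   w = {w:.3e}   1/w = {1 / w:.3e}")
```

The mean spacing $1/w$ grows exponentially as the temperature is lowered but stays finite at any nonzero temperature, which is the content of Eq. 11.2.6.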
There is a great deal of physical insight to be gained from inspection of Eq. 11.2.2. The crucial feature
is that the energy of the defect is independent of the system size $L$, while the entropy scales with $\ln L$ ($L$ is
simply the number of lattice sites $N$ times the lattice spacing). Therefore, at any nonzero temperature there
will be a size large enough for a defect to become entropically favorable. This scaling argument illustrates the
role of spatial dimension and of the range of the interactions. For example, it is easy to see that a spin chain
with long range coupling, decaying as $1/x^2$ at large separation $x$, yields a domain wall energy proportional to
$\ln L$. We may expect a phase transition at finite $T$ in this case, and the exact solution of the model confirms
this expectation. As another example, consider the Ising model in two dimensions. A naive choice of a
rigid domain wall predicts an energy that grows linearly with system size and a logarithmic entropy, so that
domain walls would have positive free energy at any temperature in a large system ($\Delta G \approx 2JL - kT \ln L$)
and they could not make the system unstable. The exercises show that this argument is too crude, since
in reality a zigzagging domain wall has entropy that scales linearly with $L$, competing with the energy; but
3 The exact expression of the correlation length is derived in the exercises.
even if the naive argument were correct, one could not conclude that the system is always ordered, because
there could be other defects that are more energetically favorable than the rigid domain wall and could drive
a phase transition. In summary, arguments such as those made in this section can only be used to prove the
absence of an ordered phase, but not its presence.
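The scaling argument can be illustrated with a few lines of arithmetic. The sketch below assumes that the single-wall free energy in one dimension has the form $\Delta G = 2J - kT \ln L$ (a size-independent energy and a logarithmic entropy, as described above), and uses the rigid-wall estimate $\Delta G \approx 2JL - kT \ln L$ quoted in the text for two dimensions. Here $L$ is measured in lattice spacings and the function names are hypothetical.

```python
import math


def dg_wall_1d(L, J, kT):
    """Assumed single-wall free energy in 1D: constant energy, entropy ~ ln L."""
    return 2 * J - kT * math.log(L)


def dg_rigid_wall_2d(L, J, kT):
    """Rigid-wall estimate in 2D quoted in the text: energy ~ L, entropy ~ ln L."""
    return 2 * J * L - kT * math.log(L)


def crossover_size_1d(J, kT):
    """Size beyond which a wall lowers the free energy of the 1D chain."""
    return math.exp(2 * J / kT)


if __name__ == "__main__":
    J, kT = 1.0, 0.5
    print("1D crossover size:", crossover_size_1d(J, kT))
    for L in (10, 100, 1000, 10000):
        print(L, dg_wall_1d(L, J, kT), dg_rigid_wall_2d(L, J, kT))
```

In one dimension the sign of $\Delta G$ changes at $L = e^{2\beta J}$, the same scale that appears in Eq. 11.2.6; in the rigid-wall estimate for two dimensions the energy term dominates for large $L$, so $\Delta G$ stays positive.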
where f (φ, T ) is some complicated function of the order parameter and of temperature; by the extensivity
of the free energy, ln Q, we know that f must be intensive, since we have scaled out a factor of volume: it is
a free energy density. Taking the thermodynamic limit via the saddle point method, one has
or
F = V f (φmin ) + const.
The problem is, what is $f(\phi, T)$? Landau's great insight was to recognize that the requirement of analyticity
away from a point of phase transition constrains the form of $f$ greatly. First, it must be a power series in $\phi$,
with temperature-dependent coefficients. The temperature dependence arises from all the integrals over the
degrees of freedom. Second, it must obey all symmetries imposed by the physics. These are usually (but not
always) easy to identify; in the next chapter we will see that an important class of systems, including fluids,
binary mixtures, and certain ferromagnets, can be described by an order parameter that has “up-down”
symmetry: the free energy must not change if we flip the sign of $\phi$. If the system is subject to an external
force field $h$ that couples linearly to $\phi$, the free energy must not change if the signs of both $h$ and $\phi$ are
flipped simultaneously. Third, consider the system near a critical point, where the difference between the
two phases (experimentally, the density difference, surface tension, latent heat, etc.) is small. Then we
can expect that the order parameter will be small and only the first few terms of the power series will be
important; this is an approximation that can actually be verified a posteriori. Finally, local fluctuations in $\phi$
should be considered in principle (based on Sect. 9.8, or if the external field is spatially modulated): then
the order parameter should be taken to be a function of position (i.e., a family of local order parameters⁴)
and the free energy density should be allowed to depend also on the gradient of the order parameter and to
contain terms such as $(\nabla\phi)^2$.
4 In this case, the integral over $\phi$ in Eq. 11.3.2 actually becomes a multiple integral over all local order parameters; integrals
where the integration variable is a function are called functional integrals.
11.3. THE LANDAU THEORY OF PHASE TRANSITIONS
Putting all these considerations together, the free energy density can only have the form
\[ f(\phi, h) = f_0 + \frac{a}{2} \phi^2 + \frac{b}{4} \phi^4 + \frac{c}{2} (\nabla\phi)^2 - h\phi + \ldots, \tag{11.3.4} \]
where the dots remind us that we left out terms of higher order in $\phi$ and in its gradient. The coefficients
$a$, $b$, $c$ and the “background” free energy $f_0$ are temperature dependent. This expression of the free energy
is exact, but is often called “phenomenological,” meaning that the coefficients are not derived from first
principles. Unfortunately, its exact solution is difficult and is beyond the scope of these notes; however, it
is instructive to work out a simple solution under assumptions that amount to a “mean field
approximation.” We will explore the meaning and the limits of this approximation in the next chapter in
great detail. Here, we just state the main results of Landau mean field theory, obtained by neglecting the
effect of gradient terms in the Landau free energy, and assuming a spatially constant order parameter $\phi$.
The thermodynamic limit is obtained by minimizing the free energy with respect to $\phi$:
\[ \frac{df}{d\phi} = 0 \implies a\phi + b\phi^3 - h = 0. \tag{11.3.5} \]
Consider $h = 0$ for simplicity, and let $f_0$, $b$, $c$ be some nonzero constants (with $b > 0$, as stability requires),
while $a = a_0 (T - T_c)$, with $a_0 > 0$.
When $a > 0$, there is a single, stable solution, $\phi_{\min} = 0$, meaning that the order parameter vanishes: this
is the high temperature disordered phase. However, if $a < 0$, the zero solution becomes unstable and two
stable solutions emerge,
\[ \phi_{\min} = \pm\sqrt{-a/b} \propto \tau^{1/2}, \tag{11.3.6} \]
with
\[ \tau = \frac{T_c - T}{T_c}. \tag{11.3.7} \]
There are now two phases, and the free energy $f(\phi_{\min}, T)$ develops a singularity at exactly the point where
$a = 0$, or $T = T_c$. As the critical temperature is approached from below, the order parameter becomes
smaller and smaller, according to a power law with exponent 1/2. Many other physical quantities turn out
to have a power law dependence near the critical temperature, giving rise to a maze of critical exponents.
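The mean field result can be checked by minimizing the Landau free energy density numerically. The sketch below drops the gradient term and assumes constant $b > 0$ and $a = -a_0 T_c \tau$; the ternary search, the parameter values, and the function names are illustrative choices, not taken from the notes.

```python
import math


def landau_f(phi, a, b, h=0.0, f0=0.0):
    """Uniform Landau free energy density: f0 + (a/2) phi^2 + (b/4) phi^4 - h phi."""
    return f0 + 0.5 * a * phi ** 2 + 0.25 * b * phi ** 4 - h * phi


def minimize_landau(a, b, h=0.0, phi_max=10.0, iterations=200):
    """Ternary search for the minimum on 0 <= phi <= phi_max.

    Assumes b > 0 and h >= 0, so that f has a single minimum on this interval.
    """
    lo, hi = 0.0, phi_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if landau_f(m1, a, b, h) < landau_f(m2, a, b, h):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def order_parameter_exponent(a0=1.0, Tc=1.0, b=1.0, tau1=1e-2, tau2=1e-3):
    """Slope of ln(phi_min) versus ln(tau) between two reduced temperatures."""
    phi1 = minimize_landau(-a0 * Tc * tau1, b)
    phi2 = minimize_landau(-a0 * Tc * tau2, b)
    return math.log(phi1 / phi2) / math.log(tau1 / tau2)


if __name__ == "__main__":
    print("phi_min for a = -0.04, b = 1:", minimize_landau(-0.04, 1.0))
    print("phi_min for a = +0.04, b = 1:", minimize_landau(0.04, 1.0))
    print("estimated exponent:", order_parameter_exponent())
```

Below $T_c$ the numerical minimum should coincide with $\sqrt{-a/b}$, above $T_c$ it should collapse to zero, and the estimated slope should be close to 1/2, in agreement with Eq. 11.3.6.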
Susceptibility near the critical point
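\[ \phi(h) = \phi(0) + \chi h \]
through first order in $h$, where $\phi(0)$ is the solution to Eq. 11.3.5 in zero field. Thus, for $T > T_c$,
$\phi(0) = 0$, so $a\chi h = h$ and
\[ \chi = 1/a \sim |\tau|^{-1}, \qquad T > T_c. \]
For $T < T_c$, $\phi(0) = \pm\sqrt{-a/b}$, so $(\phi(0) + \chi h)(a + b\phi(0)^2 + 2b\phi(0)\chi h) - h = 0$ neglecting terms of
order $h^2$. But $a + b\phi(0)^2 = 0$, so we can write $(\phi(0) + \chi h)\, 2b\phi(0)\chi h - h = 0$ or, keeping only terms of first order in $h$,
\[ \chi = \frac{1}{2b\phi(0)^2} = -\frac{1}{2a} \sim |\tau|^{-1}, \qquad T < T_c. \]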
From the calculation, we see that the susceptibility diverges as 1/|τ | both above and below Tc , but
the amplitude coefficients are different, which is the normal occurrence near the critical point. The
heat capacity is another example; see Exercises.
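A numerical version of this calculation is sketched below. It solves Eq. 11.3.5 by Newton iteration, starting from the zero-field solution, and estimates $\chi$ from a central difference in $h$. Linearizing Eq. 11.3.5 about the zero-field solution gives $\chi = 1/a$ above $T_c$ and $\chi = 1/(2|a|)$ below, which is what the code compares against. The solver and the function names are assumptions made for illustration.

```python
import math


def solve_phi(a, b, h, phi_start, iterations=50):
    """Newton iteration for a*phi + b*phi^3 - h = 0, starting near phi_start."""
    phi = phi_start
    for _ in range(iterations):
        g = a * phi + b * phi ** 3 - h
        dg = a + 3 * b * phi ** 2  # positive near either stable branch (a != 0)
        phi -= g / dg
    return phi


def susceptibility(a, b, dh=1e-6):
    """chi = d(phi)/dh at h = 0 on the stable branch, by a central difference."""
    phi0 = math.sqrt(-a / b) if a < 0 else 0.0
    return (solve_phi(a, b, dh, phi0) - solve_phi(a, b, -dh, phi0)) / (2 * dh)


if __name__ == "__main__":
    b = 1.0
    for a in (0.04, 0.01, -0.01, -0.04):
        expected = 1 / a if a > 0 else 1 / (2 * abs(a))
        print(f"a = {a:+.2f}   chi = {susceptibility(a, b):9.4f}   linearized = {expected:9.4f}")
```

At equal distance $|a|$ from the critical point, the susceptibility above $T_c$ is twice that below, which illustrates the different amplitude coefficients on the two sides of the transition.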
A characteristic length scale is set by the size of the subsystems within which particles behave
cooperatively; on larger scales, particles appear uncorrelated. Near the critical point, this length scale
diverges: the system is macroscopically correlated. How this happens can be understood from an
approximation to Landau theory that retains the gradient term but neglects the nonlinear terms in
Eq. 11.3.4:
\[ F[\phi] = \int d^3r \left[ \frac{c}{2} (\nabla\phi)^2 + \frac{a}{2} \phi^2 - h(\vec{r}) \phi(\vec{r}) \right]. \tag{11.4.1} \]
We are interested in the following question: how far is the effect of a small, localized perturbation
felt in the medium as $T \to T_c^+$? For instance, we could imagine flipping a spin (or a small cluster of
spins) in a magnet near the Curie point or putting a droplet of liquid in a barely supercritical fluid.
Mathematically, the knob for doing this is the external field, $h(\vec{r})$. We need to minimize the free
energy with respect to the order parameter in the presence of the perturbation. Since we have only
retained the linear and quadratic terms of the free energy, it is convenient to work with the Fourier
transform of the order parameter and of the perturbation, $\hat\phi(\vec{k})$ and $\hat{h}(\vec{k})$, using the three-dimensional
extension of Eq. 3.9.4. Note that since $\phi(\vec{r})$ is real, we have $\hat\phi(-\vec{k}) = \hat\phi^*(\vec{k})$, where $*$ indicates the
complex conjugate. Thus the free energy can be written as
\[ F[\hat\phi, \hat\phi^*] = \int \frac{d^3k}{(2\pi)^3} \left\{ \left( \frac{c}{2} k^2 + \frac{a}{2} \right) \hat\phi(\vec{k}) \hat\phi^*(\vec{k}) - \frac{1}{2} \left[ \hat{h}(\vec{k}) \hat\phi^*(\vec{k}) + \hat{h}^*(\vec{k}) \hat\phi(\vec{k}) \right] \right\}. \tag{11.4.2} \]
The advantage of the Fourier transform is that each value of $\vec{k}$ is decoupled from the others. If we
regard the continuum of $k$-space as a dense mesh of discrete points, the free energy is a diagonal
quadratic form, and its minimization is a straightforward algebraic problem:
\[ \frac{\partial F}{\partial \hat\phi(\vec{k})} = 0 \implies (ck^2 + a)\, \hat\phi^*(\vec{k}) = \hat{h}^*(\vec{k}) \implies \hat\phi(\vec{k}) = \frac{\hat{h}(\vec{k})}{ck^2 + a}. \tag{11.4.3} \]
Now, if $h(\vec{r})$ is taken to be localized, i.e., a point source $h(\vec{r}) = h\,\delta(\vec{r})$, then $\hat{h}(\vec{k}) = h$ is a constant.
Thus,
\[ \phi(\vec{r}) = \int \frac{d^3k}{(2\pi)^3}\, \hat\phi(\vec{k})\, e^{i\vec{k}\cdot\vec{r}} = \frac{h}{c} \int \frac{d^3k}{(2\pi)^3}\, \frac{e^{i\vec{k}\cdot\vec{r}}}{k^2 + (a/c)} = \frac{h}{4\pi c\, r} \exp\left( -\sqrt{\frac{a}{c}}\, r \right) \sim e^{-r/\xi}, \tag{11.4.4} \]
showing that the order parameter decays exponentially away from the localized perturbation with
characteristic length
\[ \xi = \sqrt{\frac{c}{a}} = \sqrt{\frac{c}{a_0 T_c |\tau|}} \sim |\tau|^{-1/2}. \tag{11.4.5} \]
$\xi$ is the correlation length. Note the mathematical analogy between $\xi$ and the Debye screening
length, stemming from the fact that the linearized differential equations solved in the two cases are
identical.
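The following sketch illustrates the two statements numerically. It assumes the screened-Coulomb (Yukawa) form $\phi(r) = h\, e^{-r/\xi} / (4\pi c\, r)$ for the response to a point source and checks by finite differences that, away from the origin, it satisfies the linearized equation $c(\phi'' + 2\phi'/r) = a\phi$ obtained by minimizing Eq. 11.4.1. It then evaluates the exponent of $\xi$ from two reduced temperatures. Parameter values and function names are illustrative.

```python
import math


def correlation_length(tau_abs, a0=1.0, Tc=1.0, c=1.0):
    """xi = sqrt(c / a) with a = a0 * Tc * |tau| (approach from above Tc)."""
    return math.sqrt(c / (a0 * Tc * tau_abs))


def phi_point_source(r, h, a, c):
    """Assumed response to a point source h*delta(r): screened Coulomb form."""
    xi = math.sqrt(c / a)
    return h * math.exp(-r / xi) / (4 * math.pi * c * r)


def radial_residual(r, h, a, c, dr=1e-3):
    """Relative residual of c*(phi'' + 2*phi'/r) - a*phi at radius r > 0."""
    f_minus = phi_point_source(r - dr, h, a, c)
    f_zero = phi_point_source(r, h, a, c)
    f_plus = phi_point_source(r + dr, h, a, c)
    d1 = (f_plus - f_minus) / (2 * dr)
    d2 = (f_plus - 2 * f_zero + f_minus) / dr ** 2
    return (c * (d2 + 2 * d1 / r) - a * f_zero) / (a * f_zero)


def correlation_exponent(tau1=1e-2, tau2=1e-4):
    """Slope of ln(xi) versus ln|tau|; Eq. 11.4.5 predicts -1/2."""
    ratio = correlation_length(tau1) / correlation_length(tau2)
    return math.log(ratio) / math.log(tau1 / tau2)


if __name__ == "__main__":
    print("relative residual at r = 1.3:", radial_residual(1.3, 1.0, 0.5, 2.0))
    print("exponent of xi:", correlation_exponent())
```

The algebraic prefactor $1/r$ does not affect the conclusion: the decay length of the exponential is $\xi = \sqrt{c/a}$, and it grows without bound as $|\tau| \to 0$.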
The existence and the value of the correlation length are among the most powerful predictions of Landau
5 A familiar example may be fractals, which are objects that look the same on all length scales and are characterized by
“fractal exponents.”
Theory. As long as ξ stays finite, the system has a well defined characteristic length. However, ξ has a
negative critical exponent, meaning that it becomes infinite as the critical point is approached (we have just
shown it from above; the same is true from below). The exact value of the exponent is somewhat different
from 1/2 when calculations are performed retaining the term φ4 in the free energy, but the qualitative
behavior of the correlation length is well captured by Eq. 11.4.5. In particular, the equation shows us
that at the critical point, a system is infinitely correlated, so that a perturbation anywhere in the system
must eventually be “felt” anywhere else. If the theory is extended to the time domain to include relaxational
dynamic effects, one finds that relaxation times also diverge at the critical point. As a consequence, a system
takes a long time to reach equilibrium: experiments near critical points are extremely tricky to perform for
this reason. So are computer simulations under the same conditions. In particular, (a) the approach to
the critical point is severely limited by the size of the system, which must be greater than the correlation
length at a given temperature, and (b) equilibration times become very large, making computations rather
CPU-intensive.
11.5 Exercises
1. Consider the one dimensional Ising model in zero external field, $H = -J \sum_i s_i s_{i+1}$. The correlation
function, $\Gamma(k) = \langle s_i s_{i+k} \rangle$, by symmetry depends only on the distance between lattice points,
$k$. Show that $\Gamma(k) = \langle s_i s_{i+1} s_{i+1} s_{i+2} s_{i+2} \ldots s_{i+k-1} s_{i+k-1} s_{i+k} \rangle$. Then, rewrite the Hamiltonian as
$H = -\sum_i J_i s_i s_{i+1}$; at the end all $J_i$ will be set equal to $J$. Show that
\[ \langle s_i s_{i+1} \rangle = \frac{1}{\beta Q} \frac{\partial Q}{\partial J_i}, \qquad \langle s_i s_{i+k} \rangle = \frac{1}{\beta^k Q} \frac{\partial^k Q}{\partial J_i \ldots \partial J_{i+k-1}}. \]
Calculate the partition function and show that $\Gamma(k) = (\tanh(\beta J))^k$. Does the correlation
length ever diverge?
2. The free energy cost of having a straight domain wall in a 2D Ising model on an L × L lattice is
2JL − kT ln L. But how about if the wall is not straight? Consider the following rule for making a
domain wall: draw the wall from left to right, but instead of going straight right at each step, you can
go straight, or one step up or one step down. How does the free energy scale with $L$, and what does
this result tell us about order in the 2D Ising model?
3. Show that the heat capacity at constant volume is discontinuous at $T = T_c$ in Landau mean field
theory. This means that the approach to $T_c$ is of the type $c_V = c_\pm |\tau|^0$, i.e., the critical exponent is
zero, and the amplitudes are different above and below $T_c$.
4. Derive the exponent of the critical isotherm in Landau mean field theory; that is, find how φ depends
on h along the isotherm T = Tc .
Index
Quantum Degeneracy, 25
Quantum Statistics, 7
Reservoir, 27
Responses, Thermodynamic, 26
Reversibility and Maximum Work, 30
Reversible Processes, 26
Sackur-Tetrode Formula, 19
Saddle Point Method, 42
Screening, 69
Second Law of Thermodynamics, 28
Spreading Pressure, 60
Square Well Potential, 56
Stability, 35
Standard Deviation, 11
Stirling’s Formula, 6
Uniform Distribution, 12
Variance, 11
Virial Coefficients, 54