PCMI Notes
Lecture notes
(Preliminary Version)
Marek Biskup
Contents

1 Prerequisites
1.1 Basic probability concepts
2 Random walks
2.1 Random walks and limit laws
2.2 Transition in d = 2: Recurrence vs transience
2.3 Transition in d = 4 & Loop-erased random walk
2.4 Harmonic analysis and electric networks
2.5 Random walks on resistor networks
3 Branching processes
3.1 Galton-Watson branching process
3.2 Critical process & duality
3.3 Tree percolation
3.4 Erdős-Rényi random graph
4 Percolation
4.1 Percolation transition
4.2 Uniqueness of infinite component
Chapter 1
Prerequisites
The purpose of this introductory chapter is to present the basic concepts that will
be needed throughout the course. We will for the most part stay away from stating
theorems, lemmas, etc.; these will be the subject of the later chapters.
Probability originated in games of chance. A prime example associated with prob-
ability is the experiment of tossing a coin. This is a procedure intended to produce
a random choice out of two answers: Heads or Tails. Generally, the outcomes of
such a random experiment are collected in a set Ω called the sample space. For the
coin toss, we simply have Ω = { H, T }.
Before probability theory was axiomatized and embedded into the rest of math-
ematics, the word probability referred to the frequencies with which the various
outcomes — i.e., values from Ω — were seen in repeated random experiments.
For instance, it is a known fact that tossing a coin many times generally results in
about the same fraction of heads and tails. If ω denotes the result of the coin toss,
we extrapolate from this
P(ω = heads) = P(ω = tails) = 1/2. (1.1)
We read: the probability to get heads is 1/2, and similarly for tails. For more
general discrete sets Ω — finite or countably infinite — we similarly need to extract
the frequencies with which each element z of Ω occurs. The value thus obtained is
then proclaimed to be the probability to get z, namely, P(ω = z).
Once we know the frequencies for all elements of Ω, we can start asking more
general questions, e.g., for a set A ⊂ Ω, how likely it is that the outcome will fall
into A. An example of this is rolling a die. Here Ω = {1, . . . , 6} with P(ω = j) = 1/6
for each j = 1, . . . , 6. However, if we take A = {5, 6} we may ask what is the
probability that ω ∈ A? It turns out
P(ω ∈ A) = 1/6 + 1/6 = 1/3 (1.2)
where we used our common sense to conclude that the frequency with which we
can see 5 or 6 is simply the sum of the frequencies to see 5 and to see 6.
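The frequency reasoning behind (1.2) is easy to test empirically. Here is a minimal Python sketch (ours, not part of the original notes; the seed is fixed so the run is reproducible) that rolls a fair die many times and compares the empirical frequency of A = {5, 6} with 1/3:

```python
import random

random.seed(0)

n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]

# Empirical frequency of the event A = {5, 6}
freq_A = sum(1 for r in rolls if r in (5, 6)) / n

# Common sense: the frequency of A is the sum of the frequencies of 5 and of 6
freq_5 = rolls.count(5) / n
freq_6 = rolls.count(6) / n

print(freq_A)            # close to 1/3
print(freq_5 + freq_6)   # agrees with freq_A, since the counts add exactly
```

The additivity used in (1.2) is visible here as an exact identity of counts: the rolls landing in A are precisely the rolls showing 5 plus the rolls showing 6.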
The previous observation quickly results in the conclusion that it is actually better
to define probability as a function of subsets of Ω which we call events. We will
thus talk of probability of a set A, writing P( A), or probability that A occurs. From
its nature P( A) has to be a number between 0 and 1 and such that P(Ω) = 1
and P(∅) = 0. We will also require that if A and B are disjoint events, then
P( A ∪ B ) = P( A ) + P( B ) (1.3)
which reflects the fact that the relative frequency of results from A ∪ B is the sum
of the relative frequency of a result in A and the relative frequency of a result in B.
Since something has to come up as a result of the experiment, we have
∑_{ω∈Ω} P({ω}) = 1 (1.4)
The conditions (1-3) are very natural as they basically ensure that the set F is closed
under all basic set-theoretical operations.
Exercise 1.2 Show that F is closed also under countable intersections,
A1, A2, · · · ∈ F ⇒ ⋂_{n≥1} An ∈ F (1.6)
Also the axioms (A1-A2) are fairly natural. Indeed, (A1) is a generalization of the
statement that the probability of a set of individual outcomes is simply the sum of
individual probabilities, while (A2) ensures that “something must occur,” i.e., the
frequencies of all outcomes add up to one.
Before we pass to some examples, let us record some basic facts about probability
spaces. The proofs of these are omitted for brevity. Nevertheless, the reader may
find them to be very interesting exercises.
Lemma 1.3 Suppose (Ω, F, P) is a probability space. Then P has the following properties:
(1) if A1 ⊃ A2 ⊃ . . . , then
P(⋂_{n≥1} An) = lim_{n→∞} P(An) (1.10)
(2) if A1 ⊂ A2 ⊂ . . . , then
P(⋃_{n≥1} An) = lim_{n→∞} P(An) (1.11)
P({ω}) = 1/2^n. (1.12)
To see that this is reasonable we appeal to our intuition that — for the coin to be
fair — each sequence should have the same probability. Since there are 2^n distinct
sequences, each should have probability 2^{−n}.
Example 1.6 Rolling a die : A die has six sides and if it is fair—i.e., no “sticky”
sides—all have equal chance to come up as a result of rolling it. The sample space
is Ω = {1, 2, . . . , 6} and the probability is given by P({ω }) = 1/6 for all ω ∈ Ω.
Example 1.7 Rolling a die n times: We proceed as for coin tosses. The sample space
is Ωn = {1, 2, . . . , 6}^n and each sequence ω ∈ Ωn of n numbers from {1, 2, . . . , 6}
has probability
P({ω}) = 1/|Ωn| = 1/6^n. (1.13)
Example 1.8 Recording only the 6’s: We add a twist to the previous example. Again
we will roll a die n times, but now we only want to record whether we got a “6” or
not. The result will be a sequence of 1’s and 0’s and so our sample space is Ω′n =
{0, 1}^n, that is, the same as for the coin tosses! However, the probability is very
different. Indeed, if ω ∈ Ω′n is one such sequence, its probability will be
P({ω}) = ∏_{k=1}^{n} (1/6)^{ω_k} (5/6)^{1−ω_k}, (1.14)
Here η → ω means that ηk equals six if and only if ωk = 1, for all k = 1, . . . , n, and
the last expression is the consequence of the fact that there are five different choices
for ηk whenever ωk = 0. As is easy to check, this is exactly (1.14).
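The product formula (1.14) can be checked by brute force for small n: sum the probabilities of all die sequences η that record to a given 0-1 sequence ω. The following sketch (ours; the function names are illustrative) does exactly that:

```python
from itertools import product

def prob_formula(omega):
    """Probability of the 0-1 record omega via the product formula (1.14)."""
    p = 1.0
    for w in omega:
        p *= (1 / 6) if w == 1 else (5 / 6)
    return p

def prob_by_counting(omega):
    """Brute force: sum over all die sequences eta that record to omega,
    i.e. eta_k == 6 exactly when omega_k == 1; each has probability 6^-n."""
    n = len(omega)
    matching = 0
    for eta in product(range(1, 7), repeat=n):
        if all((e == 6) == (w == 1) for e, w in zip(eta, omega)):
            matching += 1
    return matching / 6**n

omega = (1, 0, 0, 1)
print(prob_formula(omega))      # (1/6)^2 * (5/6)^2 = 25/1296
print(prob_by_counting(omega))  # the same number, by counting
```

The count equals 5^{#zeros in ω}, which is precisely the "five different choices for η_k whenever ω_k = 0" argument above.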
{ω ∈ Ω : X (ω ) ∈ I } ∈ F (1.16)
for any interval I ⊂ R. A random variable is vector valued if it takes values in Rd (in this
case we substitute intervals by balls in (1.16)).
The condition (1.16) is entirely technical and can be omitted except for its role in
the following definition:
Definition 1.10 [Distribution function] Consider a random variable X taking values
in R. Then the function
F ( x ) = P({ω : X (ω ) ≤ x }) (1.17)
is called the distribution function.
Clearly, we need (1.16) for at least intervals of the form I = (−∞, x ] in order to be
able to define the distribution function. Generally, we refer to expressions of the
form P( X ∈ A) as the distribution of X.
Problem 1.11 Show that F takes values in [0, 1], is non-decreasing and right
continuous with lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0.
The reason for this name is seen from the following observation:
Problem 1.14 Suppose X1 , X2 , . . . , Xn can only take values in a countable set R.
Show that knowing the value of one random variable, say X1 , does not influence
the distribution of the others. Explicitly, for any a1 , . . . , an ∈ R,
P ( X1 = a 1 , X2 = a 2 , . . . , X n = a n ) = P ( X1 = a 1 ) P ( X2 = a 2 , . . . , X n = a n ) (1.21)
Let us go back to Example 1.8. The right-hand side in (1.14) is not the most general
imaginable on Ωn . An important extension is as follows:
Example 1.15 Bernoulli random variables: Consider the sample space Ωn = {0, 1}^n
and let p ∈ [0, 1]. For each sequence ω ∈ Ωn, define
P({ω}) = ∏_{k=1}^{n} p^{ω_k} (1 − p)^{1−ω_k} (1.22)
This is the Bernoulli distribution. Note that the outcomes of the experiment with this
distribution are manifestly independent in the above sense.
S n = X1 + · · · + X n , (1.23)
that is, the sum of the first n elements of the random sequence (Xk). As is easy
to check, Sn can take any integer value between 0 and n. Suppose we wish to
calculate the distribution of Sn, i.e., the collection of numbers P(Sn = k) for all k =
0, 1, . . . , n. The answer is as follows:
Lemma 1.16 [Binomial distribution] Let ( X j ) be Bernoulli with parameter p ∈ [0, 1].
For each k = 0, 1, . . . , n, we have
P(Sn = k) = (n choose k) p^k (1 − p)^{n−k}. (1.24)
A random variable with this distribution is called Binomial with parameters n and p.
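A quick way to convince oneself of Lemma 1.16 is to compare (1.24) with a brute-force enumeration of {0, 1}^n weighted by (1.22). A Python sketch (ours; n is kept small so the enumeration is feasible):

```python
from itertools import product
from math import comb

def binomial_pmf(n, p, k):
    """P(S_n = k) via the closed formula (1.24)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pmf_by_enumeration(n, p, k):
    """Sum the Bernoulli weights (1.22) over all omega with sum(omega) == k."""
    total = 0.0
    for omega in product((0, 1), repeat=n):
        if sum(omega) == k:
            w = 1.0
            for b in omega:
                w *= p if b else 1 - p
            total += w
    return total

n, p = 10, 0.3
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(sum(pmf))                                            # sums to 1
print(binomial_pmf(n, p, 4), pmf_by_enumeration(n, p, 4))  # the two agree
```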
P(Z_{n,λ/n} = m) −→ (λ^m/m!) e^{−λ} as n → ∞ (1.26)
Yet another example of a discrete random variable is the geometric random variable,
which can be arrived at as follows:
Problem 1.18 Let X1, X2, . . . be independent 0-1-valued random variables
with P(Xk = 1) = p and P(Xk = 0) = 1 − p. We may think of these as results
of tossing an unfair coin. Define T to be the time when the first 1 appears. Show that
P(T = n) = (1 − p)^{n−1} p, n ≥ 1. (1.28)
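A short simulation (ours; fixed seed for reproducibility) of the waiting time T confirms the geometric law (1.28):

```python
import random

random.seed(1)
p = 0.25

def first_success_time():
    """Toss a p-coin until the first 1 appears; return the toss count T."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

trials = 200_000
samples = [first_success_time() for _ in range(trials)]

# Compare the empirical frequency of {T = n} with (1 - p)**(n - 1) * p
for n in (1, 2, 3, 4):
    print(n, samples.count(n) / trials, (1 - p)**(n - 1) * p)
```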
Examples of random variables with continuous distribution are easier to state be-
cause one only has to give the corresponding probability density function f ( x ).
Note that in these cases
P(X ∈ A) = ∫_A f(x) dx (1.29)
f(x) = (1/√(2π)) e^{−x²/2} (1.31)
Y = (X − µ)/σ (1.32)
has the distribution of N (0, 1).
Example 1.21 Exponential: This random variable takes values in [0, ∞); the probability
density is
f(x) = λ e^{−λx} for x > 0, and f(x) = 0 otherwise. (1.33)
Example 1.22 Cauchy : A Cauchy random variable takes all values in R and it has
the probability density
f(x) = (1/π) · 1/(1 + x²) (1.34)
Having given a sufficient number of representative examples, let us address the
last item on the “basic probability list,” namely, computing expectations.
Definition 1.23 [Expectation] Let X be a random variable taking values in R. Then
the expectation, EX, is defined by
EX = ∑_a a P(X = a) (1.35)
if X has a discrete distribution, and by
EX = ∫_R x f(x) dx
if X has continuous distribution with probability density f. We say that EX does not exist
if either the sum or the integral is not well defined (e.g., is of type ∞ − ∞, etc).
We omit the proof for brevity. The idea of the proof is easier to see in the analogous
statement for discrete random variables:
Exercise 1.26 Let X have a discrete distribution and let us define Eg(X) using the
probability mass function of g(X). Show that
Eg(X) = ∑_a g(a) P(X = a). (1.39)
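Identity (1.39) can be verified mechanically for a small example. The sketch below (ours) uses exact rational arithmetic, so both sides agree exactly:

```python
from fractions import Fraction

# X = outcome of a fair die roll, and g(x) = x mod 3
p = {a: Fraction(1, 6) for a in range(1, 7)}

def g(x):
    return x % 3

# Left-hand side of (1.39): first build the probability mass function of g(X)...
pg = {}
for a, w in p.items():
    pg[g(a)] = pg.get(g(a), Fraction(0)) + w
lhs = sum(b * w for b, w in pg.items())

# ...and compare with the right-hand side, summing directly over values of X
rhs = sum(g(a) * w for a, w in p.items())
print(lhs, rhs)  # both equal 1
```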
Chapter 2
Random walks
Random walks are one of the basic objects studied in probability theory. The moti-
vation comes from observations of various random motions in physical and biolog-
ical sciences. The most well-known example is the erratic motion of pollen grains
immersed in a fluid — observed by botanist Robert Brown in 1827 — caused, as
we now know, by collisions with rapid molecules. The latter example serves just
as well for the introduction of Brownian motion. As will be discussed in a par-
allel course, Brownian motion is a continuous analogue of random walk and, not
surprisingly, there is a deep connection between both subjects.
The definition of a random walk uses the concept of independent random variables
whose technical aspects are reviewed in Chapter 1. For now let us just think of
independent random variables as outcomes of a sequence of random experiments
where the result of one experiment is not at all influenced by the outcomes of the
other experiments.
Definition 2.1 [Random walk] Suppose that X1, X2, . . . is a sequence of R^d-valued
independent and identically distributed random variables. A random walk started at
z ∈ R^d is the sequence (Sn)n≥0 where S0 = z and
S n = S n −1 + X n , n ≥ 1. (2.1)
The quantities ( Xn ) are referred to as steps of the random walk.
Our interpretation of the above formula is as follows: The variable Sn marks the
position of the walk at time n. At each time the walk chooses a step at random —
with the same step distribution at each time — and adds the result to its current
position. The above can also be written as
S n = z + X1 + · · · + X n (2.2)
for each n ≥ 1. Note that while the steps X1 , X2 , . . . are independent as random
variables, the actual positions of the walk S0 , S1 , . . . are not.
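Definition 2.1 translates directly into code. A minimal Python sketch (ours) builds the path from the steps and checks the equivalent formula (2.2):

```python
import random

random.seed(0)

def random_walk(z, steps):
    """Positions S_0 = z, S_n = S_{n-1} + X_n as in (2.1)."""
    path = [z]
    for x in steps:
        path.append(path[-1] + x)
    return path

# Simple random walk on Z: i.i.d. steps uniform on {-1, +1}
n = 20
steps = [random.choice((-1, 1)) for _ in range(n)]
path = random_walk(0, steps)

print(path)
# (2.2): S_n = z + X_1 + ... + X_n
assert all(path[k] == sum(steps[:k]) for k in range(n + 1))
```

Note that the positions are built from overlapping partial sums, which is exactly why S_0, S_1, . . . fail to be independent even though the steps are.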
Figure 2.1: A path of length 10^4 of the simple random walk on Z drawn by inter-
polating linearly between the points with coordinates (n, Sn), n = 0, . . . , 10^4.
Exercise 2.2 Let (Sn )n≥0 be a random walk. Show that S2n − Sn and Sn are inde-
pendent and have the same distribution.
The walk then jumps left or right with equal probability at each time. This case is
more correctly referred to as the “simple symmetric random walk,” but the adjective
“symmetric” is almost invariably dropped. In the other cases, i.e., when P(X1 = 1) = p
and P(X1 = −1) = 1 − p with p ≠ 1/2, the walk is referred to as biased. The bias is to
the right when p > 1/2 and to the left when p < 1/2.
Example 2.4 Simple random walk on Z^d: This is a d-dimensional version of the first
example. Here X1 takes values in {±ê1, . . . , ±êd}, where êk is the “coordinate vec-
tor” (0, . . . , 0, 1, 0, . . . , 0) in R^d with the “1” appearing in the k-th position. This
random walk is confined to the set of points in R^d with integer coordinates,
Z^d = {(n1, . . . , nd) : n1, . . . , nd ∈ Z}. (2.5)
Figure 2.2: The set of vertices visited by a two-dimensional simple random walk
before it exited a box of side 10^3. The walk was started at the center of the box and
it took 682613 steps to reach the boundary.
The easiest example to visualize is the case of d = 2, where Z² is the set of vertices
of a square grid. Thinking of Z² as a graph, the links between neighboring
vertices represent the allowed transitions of the walk. Most appearances
of this random walk are in the symmetric case, i.e., when X1 takes any of the 2d
allowed values with equal probabilities.
Example 2.5 “As the knight jumps” random walk on Z²: This random walk takes
the steps allowed to the knight in the game of chess; i.e., there are 8 allowed jumps.
Some experience with chess reveals that the random walk can reach every vertex
of Z² in a finite number of steps. This fails to be true if we further reduce the steps
to only those in the top line; the random walk is then restricted to a fraction 1/3
of all vertices in Z²; see Fig. 2.3.
Example 2.6 Gaussian random walk: This random walk has steps that can take any
value in R. The probability distribution of X1 is normal (or Gaussian) with mean
Figure 2.3: The set of allowed steps (arrows) and reachable vertices (dots) for the
random walk discussed in Example 2.5.
where α is a parameter with 0 < α < 2. As is seen by comparing Fig. 2.1 and
Fig. 2.4, a distinction between this random walk and the SRW is clear at first sight.
Theorem 2.8 [Strong Law of Large Numbers] Suppose that E| X1 | < ∞. Then, with
probability one,
lim_{n→∞} Sn/n exists and equals EX1 (2.10)
The expectation EX1 thus defines the asymptotic velocity of the walk. In particular,
if EX1 ≠ 0 then the walk moves away from the starting point at linear speed, while
for EX1 = 0 the speed is zero.
Exercise 2.9 Show that if EX1 ≠ 0, the probability that the random walk with
steps X1, X2, . . . visits the starting point infinitely often is zero.
Problem 2.10 An example of a heavy tailed random walk is the Cauchy random walk
where X1 has Cauchy distribution characterized by the probability density
f(x) = (1/π) · 1/(1 + x²). (2.11)
Next we will describe the fluctuations of the position Sn around its mean:
Theorem 2.11 [Central Limit Theorem] Consider a one-dimensional random walk
with E(X1²) < ∞. Then, as n → ∞,
(Sn − n EX1)/√n (2.12)
has asymptotically normal distribution with mean zero and variance σ² = Var(X1).
The crux of this result is that, for the walks with EX1 = 0, the distribution of the
endpoint is asymptotically very close to that of the Gaussian random walk with a
properly adjusted variance. This is a manifestation of a much more general invari-
ance principle that deals with the distribution of the entire path of the random walk.
The limiting object there is Brownian motion.
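The Central Limit Theorem can be watched in action with a short simulation (ours; fixed seed). For ±1 steps, EX1 = 0 and Var(X1) = 1, so Sn/√n should be approximately standard normal; we compare an empirical probability with the normal distribution function:

```python
import random
from math import erf, sqrt

random.seed(2)

# Steps uniform on {-1, +1}: EX1 = 0 and Var(X1) = 1
n, trials = 400, 5000
endpoints = [
    sum(random.choice((-1, 1)) for _ in range(n)) / sqrt(n)
    for _ in range(trials)
]

# Empirical P(S_n / sqrt(n) <= 1) vs the standard normal CDF at 1
emp = sum(1 for y in endpoints if y <= 1.0) / trials
Phi = 0.5 * (1 + erf(1 / sqrt(2)))
print(emp, Phi)  # both close to 0.84
```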
Problem 2.12 Consider the Gaussian random walk of length n. Show that the
largest step is of size order √(log n) and that the difference between the first and
second largest positive step tends to zero as n → ∞.
Exercise 2.13 Suppose 1 < α < 2 and consider the first n steps of the heavy tailed
random walk from Example 2.7. Show that the probability that the largest step is
at least twice as large as any other step is bounded away from zero uniformly in n.
Problem 2.14 Suppose now that 0 < α < 1. Show that, with probability that is
uniformly positive in n, the largest step of a heavy tailed random walk of length n
is larger than the sum of the remaining steps. See Fig. 2.4.
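Problems 2.12–2.14 contrast light and heavy tails. The sketch below (ours) uses a Pareto-type step with P(X > t) = t^{−α} for t ≥ 1 — a convenient heavy-tailed stand-in, not the exact distribution of Example 2.7 — and measures what fraction of the total sum is contributed by the single largest step:

```python
import random

random.seed(3)

def pareto_step(alpha):
    """Heavy tailed step: P(X > t) = t^(-alpha) for t >= 1, sampled by
    inverse transform.  A stand-in distribution, not the one from the text."""
    return random.random() ** (-1.0 / alpha)

def max_share(alpha, n):
    """Fraction of S_n contributed by the single largest step."""
    steps = [pareto_step(alpha) for _ in range(n)]
    return max(steps) / sum(steps)

n, trials = 1000, 300
share = {a: sum(max_share(a, n) for _ in range(trials)) / trials
         for a in (0.5, 1.5)}
print(share)  # alpha = 0.5: the largest step is macroscopic; alpha = 1.5: small
```

For α = 0.5 the largest step routinely carries a macroscopic share of the sum (the phenomenon of Problem 2.14), while for α = 1.5 its share is small, though the walk still fails to satisfy the CLT.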
Figure 2.4: A plot of 25000 steps of the heavy tailed random walk from Example 2.7
with α = 1.2. The defining feature of heavy tailed random walks is the presence of
“macroscopic” jumps, i.e., those comparable with the typical distance of the walk
from the starting point at the time of their occurrence. In particular, the Central
Limit Theorem does not apply due to the lack of the second moment of X1 .
Next we turn to the following two questions:
(1) Under what conditions does a random walk return to its starting position
infinitely often?
(2) When do the paths of two independent copies of the same random walk inter-
sect infinitely often?
The interest in these is bolstered by the fact that the answer depends sensitively on
the dimension. Explicitly, for rather generic step distributions, the character of the
answer changes as dimension goes from 2 to 3 for the first question and from 4 to 5
for the second question.
Throughout this section we will focus on the first question.
Definition 2.15 [Recurrence & transience] We say that a random walk is recurrent
if it visits its starting position infinitely often with probability one and transient if it visits
its starting position finitely often with probability one.
Our analysis begins by showing that every random walk is either recurrent or tran-
sient; no intermediate scenarios take place. Let N be the number of visits of (Sn ) to
its starting point S0 ,
N = ∑_{n≥0} 1{Sn = S0}. (2.13)
P( N = 1) = P( τ = ∞ ). (2.15)
P(N = n) = P(τ = ∞) P(τ < ∞)^{n−1}. (2.16)
Consider the first visit back to the origin and suppose it occurred at time τ = k.
Then N = n + 1 implies that the walk S′m = Sk+m − Sk — namely, the part of the
walk (Sn) after time k — makes n visits back to its starting point, S′0 = 0. But
the walk S′m is independent of the event {τ = k} because τ = k is determined
by X1, . . . , Xk while S′m is a function of only Xk+1, Xk+2, . . . . This implies
P(N = n + 1 & τ = k) = P( ∑_{m≥0} 1{S′m = 0} = n & τ = k )
= P( ∑_{m≥0} 1{S′m = 0} = n ) P(τ = k) = P(N = n) P(τ = k) (2.18)
implying P( N < ∞) = 0. If, on the other hand, P(τ = ∞) > 0 then P(τ < ∞) < 1
and, by (2.17), the probabilities P( N = n) form a geometric sequence. Summing
over all n in the range 1 ≤ n < ∞ gives
P(N < ∞) = P(τ = ∞) / (1 − P(τ < ∞)) = 1 (2.20)
as desired.
Problem 2.17 Suppose (Sn ) is a random walk and let x be such that P(Sn = x ) > 0
for some n ≥ 0. Prove that with probability one (Sn ) visits x only finitely often
if (Sn ) is transient and infinitely often if (Sn ) is recurrent.
The main technical point of the previous derivations is that transience can be char-
acterized in terms of finiteness of EN:
Lemma 2.18 A random walk is transient if EN < ∞ and recurrent if EN = ∞.
Proof. If EN < ∞ then P(N < ∞) = 1 and the walk is transient. The other
implication is more subtle. Assume P(N < ∞) = 1 and note that then also
P(τ = ∞) > 0. The sequence P(N = n) thus decays exponentially and so
EN = ∑_{n=1}^{∞} n P(N = n) = P(τ = ∞) ∑_{n=1}^{∞} n P(τ < ∞)^{n−1}
= P(τ = ∞) / [1 − P(τ < ∞)]² = 1 / P(τ = ∞) (2.21)
Exercise 2.19 As noted in the proof, the fact that EN < ∞ implies P( N < ∞) = 1
is special for the context under consideration. To see this is not true in general, find
an example of an integer valued random variable Z ≥ 0 such that P( Z < ∞) = 1
but EZ = ∞.
Exercise 2.20 Show that the probability P(Sn = 0) for the simple symmetric random
walk in d = 1 decays like n^{−1/2}. Conclude that the walk is recurrent.
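For Exercise 2.20 the return probabilities can be computed exactly: the walk returns at time 2n iff it takes n steps up and n down, so P(S_{2n} = 0) = C(2n, n) 2^{−2n}, which by Stirling's formula behaves like (πn)^{−1/2}. A numerical check (ours):

```python
from math import comb, pi, sqrt

def return_prob(n):
    """P(S_{2n} = 0) for the simple symmetric walk on Z: choose which n of
    the 2n steps go up, out of 2^(2n) equally likely step sequences."""
    return comb(2 * n, n) / 4**n

for n in (10, 100, 1000):
    print(n, return_prob(n) * sqrt(pi * n))  # tends to 1, i.e. P ~ (pi n)^(-1/2)

# The partial sums of P(S_n = 0) grow without bound, so EN = infinity:
# by Lemma 2.18 the walk is recurrent.
print(sum(return_prob(n) for n in range(1, 1001)))
```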
Then
EN = lim_{t↑1} ∫_{[−π,π]^d} dk/(2π)^d · 1/(1 − t ϕ(k)) (2.23)
1{Sn = 0} = ∫_{[−π,π]^d} dk/(2π)^d · e^{ik·Sn}, (2.24)
(Here is where we used the fact that the walk is confined to integer lattice.)
Taking expectation in (2.24), we thus get
P(Sn = 0) = ∫_{[−π,π]^d} dk/(2π)^d · E(e^{ik·Sn}). (2.26)
Next multiply (2.26) by tn for some t ∈ [0, 1) and sum on n ≥ 0. This gives
∑_{n=0}^{∞} t^n P(Sn = 0) = ∑_{n≥0} t^n ∫_{[−π,π]^d} dk/(2π)^d ϕ(k)^n
= ∫_{[−π,π]^d} dk/(2π)^d ∑_{n≥0} (t ϕ(k))^n (2.28)
= ∫_{[−π,π]^d} dk/(2π)^d · 1/(1 − t ϕ(k))
where we used that |t ϕ(k)| ≤ t < 1 to see that the sum and the integral can be
interchanged in the second line. Taking the limit t ↑ 1 makes the left-hand side
tend to ∑_{n≥0} P(Sn = 0) = EN.
These observations allow us to characterize when the simple random walk is
recurrent and when it is transient:
Theorem 2.22 [Recurrence/transience of SRW] The simple symmetric random walk
on Z^d is recurrent in dimensions d = 1, 2 and transient in dimensions d ≥ 3.
Proof. To apply the previous lemma, we need to calculate ϕ for the SRW. Using that
the walk makes steps only in (positive or negative) coordinate directions, we get
ϕ(k) = (1/2d) e^{ik_1} + (1/2d) e^{−ik_1} + · · · + (1/2d) e^{ik_d} + (1/2d) e^{−ik_d}
= (1/d) cos(k_1) + · · · + (1/d) cos(k_d). (2.29)
The elementary bounds
1 − cos(x) = 2 sin²(x/2) and 2x/π ≤ sin(x) ≤ x, 0 ≤ x ≤ π/2, (2.30)
π
yield
(2/π²) k_i² ≤ 1 − cos(k_i) ≤ k_i²/2 (2.31)
Plugging this in the definition of ϕ(k ) shows that
1 − t + 2t |k|²/(π² d) ≤ 1 − t ϕ(k) ≤ 1 − t + |k|²/(2d). (2.32)
Taking the limit t ↑ 1, we find that the function k ↦ [1 − ϕ(k)]^{−1} is integrable
around k = 0 if and only if the function k ↦ |k|^{−2} is, i.e.,
EN < ∞ if and only if ∫_{|k|<1} dk/|k|² < ∞ (2.33)
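The dichotomy of Theorem 2.22 can be explored numerically via the Fourier representation (2.26). The sketch below (ours) computes P(Sn = 0) for the SRW on Z^d by averaging ϕ(k)^n over a finite grid of Fourier modes — an average that equals the true probability exactly as long as n is smaller than the grid size, since the walk cannot wrap around — and watches the partial sums of EN: they keep growing in d = 2 but nearly converge in d = 3:

```python
from itertools import product
from math import cos, pi

def partial_sums_EN(d, nmax, M=32):
    """Partial sums of sum_n P(S_n = 0) for the SRW on Z^d.  P(S_n = 0) is
    the average of phi(k)^n, phi as in (2.29), over the shifted grid
    k = 2*pi*(j + 1/2)/M - pi; exact for n < M (no wrap-around aliasing)."""
    ks = [2 * pi * (j + 0.5) / M - pi for j in range(M)]
    phis = [sum(cos(k) for k in kk) / d for kk in product(ks, repeat=d)]
    sums, total = [], 0.0
    powers = [1.0] * len(phis)
    for _ in range(nmax + 1):
        total += sum(powers) / len(phis)   # adds P(S_n = 0)
        sums.append(total)
        powers = [p * f for p, f in zip(powers, phis)]
    return sums

s2 = partial_sums_EN(2, 30)
s3 = partial_sums_EN(3, 30)
print(s2[10], s2[20], s2[30])  # keeps growing: EN = infinity (recurrence)
print(s3[10], s3[20], s3[30])  # increments shrink fast: EN < infinity (transience)
```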
P(X1 = n) = (1/2) [ 1/|n|^α − 1/(|n| + 1)^α ], n ≠ 0. (2.34)
This section is devoted to the second question from Section 2.2, which concerns
the intersection of the paths of independent copies of the same random
walk. Consider two independent copies (Sn) and (S̃n) of the same random walk.
We are interested in the cardinality of the set
I(S, S̃) := {Sn : n ≥ 0} ∩ {S̃n : n ≥ 0}. (2.35)
First we note that in some cases the question can be answered directly:
Exercise 2.26 Use Problem 2.17 to show that paths of two independent copies of a
(non-constant) recurrent random walk meet at infinitely many distinct points.
This allows us to focus, as we will do from now on, on transient random walks
only. Some of these can be still handled by geometric arguments:
Problem 2.27 Show that the paths of two independent copies of a simple random
walk on Z, biased or symmetric, intersect infinitely often with probability one.
To address the general case, instead of |I(S, S̃)| we will work with the number
N^(2) = ∑_{m,n≥0} 1{Sn = S̃m} (2.36)
that counts the number of pairs of times at which the walks collided. To see that
this comes at no loss, we note that
N^(2) < ∞ ⇒ |I(S, S̃)| < ∞ (2.37)
To get the opposite implication, we note:
Lemma 2.28 Suppose the random walks S and S̃ are transient. Then
P(N^(2) = ∞) = 1 if and only if P(|I(S, S̃)| = ∞) = 1 (2.38)
Proof. For each x, let n_x denote the number of visits of (Sn) to x and let ñ_x be the
corresponding quantity for (S̃n). By the assumption of transience,
n_x < ∞ and ñ_x < ∞ for every x with probability one. Next we note
N^(2) = ∑_{x∈I(S,S̃)} ∑_{m,n≥0} 1{Sn = x} 1{S̃m = x} = ∑_{x∈I(S,S̃)} n_x ñ_x (2.40)
If |I(S, S̃)| < ∞ then, since each n_x and ñ_x is finite, the sum would be finite,
implying N^(2) < ∞. Thus, if P(N^(2) = ∞) = 1 then we must have |I(S, S̃)| = ∞
with probability one.
We now proceed to characterize the transient random walks for which N^(2) is finite
with probability one. The analysis is analogous to the question of recurrence vs
transience, but some steps are more tedious and so we will be a bit sketchy at times.
Using arguments that we omit for brevity, one can again show that P(N^(2) < ∞)
takes only the values zero and one and
P(N^(2) < ∞) = 1 if and only if EN^(2) < ∞. (2.41)
Next we prove:
Lemma 2.29 Consider a random walk on Z^d with steps X1, X2, . . . and let, as before,
ϕ(k) = E(e^{ik·X1}). Then
EN^(2) = lim_{t↑1} ∫_{[−π,π]^d} dk/(2π)^d · 1/|1 − t ϕ(k)|² (2.42)
1{Sn = S̃m} = ∫_{[−π,π]^d} dk/(2π)^d · e^{ik·(Sn − S̃m)}. (2.43)
Applying (2.27) we have
E(e^{ik·(Sn − S̃m)}) = ϕ(k)^n \overline{ϕ(k)}^m (2.44)
P(Sn = S̃m) = ∫_{[−π,π]^d} dk/(2π)^d · ϕ(k)^n \overline{ϕ(k)}^m (2.45)
∑_{m,n≥0} t^{m+n} P(Sn = S̃m) = ∫_{[−π,π]^d} dk/(2π)^d · 1/|1 − t ϕ(k)|². (2.46)
EN^(2) < ∞ if and only if ∫_{|k|<1} dk/|k|⁴ < ∞. (2.47)
The integral is finite if and only if d ≥ 5.
Problem 2.31 For what values of α > 0 do the paths of independent copies of the
walk described in Problem 2.25 intersect infinitely often?
There is a heuristic explanation of the above phenomena: The fact that the random
walk is recurrent in d = 2, but just barely, means that the path of the walk
is two-dimensional. (This is actually a theorem if we interpret the dimension in the
sense of Hausdorff dimension.) Now it is a fact from geometry that two generic
two-dimensional subspaces of R^d do not intersect in dimensions d ≥ 5 while they do
in dimensions d ≤ 4. Hence we should expect that Theorem 2.30 is true, except
perhaps for the subtle boundary case d = 4.
Problem 2.32 To verify the above heuristics, let us investigate the intersections of m
paths of SRW. Explicitly, let S^(1), . . . , S^(m) be m independent SRWs and define
N^(m) = ∑_{ℓ1,...,ℓm ≥ 0} 1{S^(1)_{ℓ1} = · · · = S^(m)_{ℓm}} (2.48)
Zm = S_{Tm}, Tm ≤ n. (2.50)
The subject of the LERW goes way beyond the level and scope of these notes. (In-
deed, it has only been proved recently that, in all dimensions, the LERW has a well
defined scaling limit which is understood in d = 2 — see Fig. 2.5 — and d ≥ 4,
but not in d = 3.) However, the analysis of the path-avoiding property of the SRW
allows us to catch at least a glimpse of what is going on in dimensions d ≥ 5.
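The chronological loop erasure itself is a simple algorithm: scan the path and, each time a vertex is revisited, delete the cycle created since its first visit. A Python sketch (ours; the vertex type can be any hashable object, e.g. tuples for Z²):

```python
def loop_erase(path):
    """Chronological loop erasure of a path (list of hashable vertices):
    whenever a vertex repeats, cut the loop made since its first visit."""
    erased = []
    index = {}                     # vertex -> its position in `erased`
    for v in path:
        if v in index:
            cut = index[v] + 1     # keep the first visit of v, drop the loop
            for w in erased[cut:]:
                del index[w]
            erased = erased[:cut]
        else:
            index[v] = len(erased)
            erased.append(v)
    return erased

# A small path on Z^2 that traverses one square loop before moving on
path = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (-1, 0)]
print(loop_erase(path))  # [(0, 0), (-1, 0)]
```

The output is always self-avoiding, and cut points of the original path survive the erasure, in line with Lemma 2.35 below.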
The key notion to be studied in high dimension is that of a cut point. The cleanest
way to define this notion is for the two sided random walk which is a sequence
of random variables (Sn )n∈Z indexed by (both positive and negative) integers,
where Sn is defined by
Sn = X1 + · · · + Xn for n ≥ 1, Sn = 0 for n = 0, and Sn = −(Xn+1 + · · · + X0) for n < 0.
intersect only at time n = 0 — the starting point. The time k is then referred to as the cut
time of the random walk (Sn ).
Figure 2.5: A path of the loop erased random walk obtained by loop-erasure from
the SRW from Fig. 2.2. The trace of the SRW is depicted in light gray. While the
SRW needed 682613 steps to exit the box, its loop erasure took only 3765 steps.
Lemma 2.35 Consider the two sided random walk (Sn ) and let ( Zn ) be the loop erasure of
the n ≥ 0 portion of the path. Then the sequence ( Zn ) visits all cutpoints (of the two-sided
path) on the n ≥ 0 portion of the path (Sn ) in chronological order.
Proof. The loop erasure removes only vertices on the path that are inside cycles.
Cut points are never part of a cycle and so they will never be loop-erased.
The fact that the SRW and the LERW agree on all cut points has profound
consequences provided we can control the frequency of occurrence of cut points. We
state a very weak claim to this effect:
Lemma 2.36 Let Rn be the number of cut times — in the sense of Definition 2.34 — in
the set of times {1, . . . , n}. Then
ERn = n P(N^(2) = 1) (2.52)
and, for every ε ∈ (0, 1),
P(Rn ≥ εn) ≥ [P(N^(2) = 1) − ε] / (1 − ε). (2.53)
Figure 2.6: A schematic picture of the path of a two sided random walk which, in
high dimension, we may think of as a chain of little tangles or knots separated by
cutpoints (marked by the bullets).
Proof. We have
Rn = ∑_{k=1}^{n} 1{k is a cut time} (2.54)
Taking expectation we get
ERn = ∑_{k=1}^{n} P(k is a cut time). (2.55)
But the path of the two-sided random walk looks the same from every time, and
so P(k is a cut time) equals the probability that 0 is a cut time. That probability in
turn equals P(N^(2) = 1). This proves (2.52). To get (2.53) we note
ERn ≤ εn (1 − P(Rn ≥ εn)) + n P(Rn ≥ εn). (2.56)
Then (2.53) follows from (2.52) and some simple algebra.
Of course having a positive density of points where the SRW and the LERW agree is
not sufficient to push the path correspondence through. However, if we can show
that the “tangles” between the cutpoints have negligible diameter and that none of
them consumes a macroscopic amount of time, then on a large scale the paths of
the LERW and the SRW will be hardly distinguishable.
Random walks have a surprising connection to electric or, more specifically, resistor
networks. This connection provides very efficient means to estimate various hitting
probabilities and other important characteristics of random walks. The underlying
common ground is the subject of harmonic analysis.
We begin with the definition of a resistor network:
Definition 2.37 [Resistor network] A resistor network is an unoriented (locally fi-
nite) graph G = (V, E) endowed with a collection (c_xy)_{(x,y)∈E} of positive and finite
numbers — called conductances — that obey the symmetry
c_xy = c_yx, (x, y) ∈ E, (2.57)
For ease of exposition, we also introduce the notation i ( x ) for the total current,
i(x) := ∑_{y∈V} i_xy (2.60)
out of vertex x. There are two basic engineering questions that one may ask about
resistor networks:
(1) Suppose the values of the potential u are fixed on a set A ⊂ V. Find the poten-
tial at the remaining nodes.
(2) Suppose that we are given the total current i ( x ) out of the vertices in A ⊂ V.
Find the potential at the nodes of V that is consistent with these currents.
The context underlying these questions is sketched in Figs. 2.7 and 2.8.
Of course, the above questions would not come even close to having a unique
solution without imposing an additional physical principle:
Definition 2.39 [Kirchhoff’s Law of Currents] We say that a collection of currents
(i xy ) obeys Kirchhoff’s law of currents in the set W ⊂ V if the total current out of any
vertex in W is conserved, i.e.,
i ( x ) = 0, x ∈ W. (2.61)
(Lf)(x) = ∑_{y∈V} c_xy ( f(y) − f(x) ) (2.62)
Figure 2.7: A circuit demonstrating the setting in the first electrostatic problem
mentioned above. Here vertices on the extreme left and right are placed on con-
ducting plates that, with the help of a battery, keep them at a constant electrostatic
potential. The problem is to determine the potential at the “internal” vertices.
Let W ⊂ V and suppose u is an electric potential for which the currents defined by Ohm’s
law satisfy Kirchhoff’s law of currents in W. Then
(Lu)( x ) = 0, x ∈ W. (2.63)
Proof. Using Ohm’s Law, the formula for the current out of x becomes
i(x) = ∑_{y∈V} i_xy = ∑_{y∈V} c_xy ( u(y) − u(x) ) = (Lu)(x) (2.64)
The object L is a linear operator in the sense that it operates on a function to produce
another function and the operation is linear. In Markov chain theory, which we will
touch upon briefly in the next section, L is referred to as the generator.
Note that while the definition of harmonicity of f speaks only about the vertices
in W, vertices outside W may get involved due to the non-local nature of L. Har-
monic functions are special in that they satisfy the Maximum Principle. Given a
set W ⊂ V, we use
∂W = {y ∈ V \ W : ∃ x ∈ W such that (x, y) ∈ E} (2.65)
to denote the outer boundary of W.
Figure 2.8: A circuit demonstrating the setting in the second electrostatic problem
above. The topology of the circuit is as in Fig. 2.7, but now the vertices on the
sides have a prescribed current flowing in/out of them. The problem is again to
determine the electrostatic potential consistent with these currents.
In fact, f cannot have a strict local maximum on W, and if it has a local maximum on W then it is constant on W ∪ ∂W.
Indeed, if f(y) ≤ f(x) for all neighbors y of x with at least one inequality strict, then
∑_{y∈V} c_{xy} f(y) < ( ∑_{y∈V} c_{xy} ) f(x).
But that is impossible because the difference of the left- and right-hand sides equals (L f)(x), which is zero because x ∈ W and because f is harmonic at x.
Now suppose that the inequality on the right of (2.66) does not hold. Then the maximum of f over W ∪ ∂W occurs on W. We claim that then f is constant on W ∪ ∂W. Indeed, if x ∈ W ∪ ∂W were a vertex where f is not equal to its maximum but that has a neighbor where it is, then we would run into a contradiction with the first part of the proof, by which f must be constant on the neighborhood of any local maximum. Hence, f is constant on W ∪ ∂W. But then the inequality on the right of (2.66) does hold, and so we have a contradiction anyway. The inequality on the left is equivalent to that on the right by passing to −f.
u(x) = u_0(x), x ∈ V \ W. (2.69)
Moreover, this function is the unique minimizer of the Dirichlet energy functional
E(u) = (1/2) ∑_{x,y∈V : (x,y)∈E} c_{xy} ( u(y) − u(x) )². (2.70)
Proof. First we will establish uniqueness. Suppose u and ũ are two distinct functions which are harmonic on W with respect to L and both of which obey (2.69). Then v = u − ũ is harmonic on W and vanishes on V \ W. But the Maximum Principle then forces v to vanish on all of W ∪ ∂W, i.e., u = ũ, a contradiction.
As for existence, a minimizer u of E subject to (2.69) exists, and at any such minimizer
∂E(u)/∂u(x) = −2 ∑_{y∈V} c_{xy} ( u(y) − u(x) ) = −2(Lu)(x) = 0, x ∈ W, (2.72)
i.e., the minimizer is harmonic on W.
Proof. This is an immediate consequence of the fact that the minimum of a family
of non-decreasing functions is non-decreasing.
Figure 2.9: The setting for the application of the serial law (top) and parallel law
(bottom). In the top picture the sequence of nodes is replaced by a single link whose
resistance is the sum of the individual resistances. In the bottom picture, the cluster
of parallel links can be replaced by a single link whose conductance is the sum of
the individual conductances.
Let us go back to the two questions we posed above and work them out a little
more quantitatively. Suppose A and B are disjoint sets in V and suppose that A
is kept at potential u = 0 and B at a constant potential u = U > 0. A current I
will then flow from A to B. Thinking of the whole network as just one resistor, the
natural question is what is its effective resistance Reff = U/I. A formal definition of
this quantity is as follows:
Definition 2.45 [Effective resistance] Let A, B ⊂ V be disjoint. The effective resis-
tance Reff ( A, B) is a number in [0, ∞] defined by
Reff ( A, B)−1 = inf E (u) : 0 ≤ u ≤ 1, u ≡ 0 on A, u ≡ 1 on B .
(2.73)
Problem 2.46 Show that adding or removing the condition 0 ≤ u ≤ 1 does not
change the value of the infimum.
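Definition 2.45 can be made concrete on small networks: the infimum in (2.73) is attained by the harmonic potential, which solves a linear system. The following sketch (our own illustration; the function name and the NumPy-based setup are not from the text) computes R_eff between two vertices this way.

```python
import numpy as np

def effective_resistance(n, edges, a, b):
    """R_eff({a}, {b}) on an n-vertex network; edges = [(x, y, c_xy), ...].

    Solves the Dirichlet problem (Lu)(x) = 0 for x outside {a, b} with
    u = 0 at a and u = 1 at b, then returns 1/E(u) as in (2.73).
    """
    C = np.zeros((n, n))
    for x, y, c in edges:
        C[x, y] += c          # parallel edges simply add conductances
        C[y, x] += c
    L = C - np.diag(C.sum(axis=1))   # (Lf)(x) = sum_y c_xy (f(y) - f(x))
    u = np.zeros(n)
    u[b] = 1.0
    interior = [x for x in range(n) if x not in (a, b)]
    if interior:
        A = L[np.ix_(interior, interior)]
        rhs = -L[np.ix_(interior, [a, b])] @ u[[a, b]]
        u[interior] = np.linalg.solve(A, rhs)
    # Dirichlet energy of the harmonic minimizer equals R_eff^{-1}
    energy = sum(c * (u[y] - u[x]) ** 2 for x, y, c in edges)
    return 1.0 / energy

# Two unit resistors in series (path 0-1-2): R_eff = 2
print(effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0)], 0, 2))
# Two unit resistors in parallel: R_eff = 1/2
print(effective_resistance(2, [(0, 1, 1.0), (0, 1, 1.0)], 0, 1))
```

The two printed values anticipate the serial and parallel laws discussed below.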
Exercise 2.49 Find the effective resistance Reff ( x, y) between any pair of vertices of
the ring {0, . . . , N − 1} where N − 1 is considered a neighbor of 0 and all edges
have a unit conductance.
Problem 2.50 Consider a two-dimensional torus T N of N × N vertices. Explicitly,
T N is a graph with vertex set V = {0, . . . , N − 1} × {0, . . . , N − 1} and edges be-
tween any pair ( x1 , y1 ) ∈ V and ( x2 , y2 ) ∈ V with
| x1 − x2 | mod N + |y1 − y2 | mod N = 1. (2.76)
All edges have a unit conductance. Find the effective resistance Reff ( x, y) between
a neighboring pair ( x, y) of vertices of T N . Hint: Use discrete Fourier transform.
As it turns out, the most important instance of effective resistance Reff ( x, y) is when
one of the points is “at infinity.” The precise definition is as follows:
Definition 2.51 [Resistance to infinity] Consider an infinite resistor network and let B_R be a sequence of balls of radius R centered at a designated point 0. The resistance R_eff(x, ∞) from x to ∞ is then defined by the monotone limit
R_eff(x, ∞) = lim_{R→∞} R_eff({x}, B_R^c).
Exercise 2.52 Show that the value of Reff ( x, ∞) does not depend on the choice of
the designated point 0.
Exercise 2.53 Show that Reff ( x, ∞) = ∞ for a network given as an infinite chain of
vertices, i.e., G = Z with the usual nearest neighbor structure.
Apart from monotonicity, resistor networks have the convenient property that certain parts of the network can be modified without changing effective resistances between sets not intersecting the modified part. The best known examples of this are the parallel and serial laws.
For the sake of stating these laws without annoying provisos, we will temporarily
assume that vertices of G may have multiple edges between them. (Each such
edge then has its own private conductance.) In graph theory, this means that we
allow G to be an unoriented multigraph. The parallel and serial laws tell us how to
reinterpret such networks back in terms of graphs.
Lemma 2.54 [Serial Law] Suppose a resistor network contains a sequence of incident
edges e1 , . . . , e` of the form e j = ( x j−1 , x j ) such that the vertices x j , j = 1, . . . , ` − 1,
are all of degree 2. Then the effective resistance Reff ( A, B) between any sets A, B not
containing x1 , . . . , x`−1 does not change if we replace the edges e1 , . . . , e` by a single edge e
with resistance
r_e = r_{e_1} + · · · + r_{e_ℓ}. (2.78)
Lemma 2.55 [Parallel Law] Suppose two vertices x, y have multiple edges e1 , . . . , en
between them. Then the effective resistance Reff ( A, B) between any sets A, B does not
change if we replace these by a single edge e with conductance
c_e = c_{e_1} + · · · + c_{e_n}. (2.79)
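As a quick illustration of the two laws (a sketch of our own; the function names are hypothetical), one can reduce the unit-conductance ring of Exercise 2.49 by hand: between two neighboring vertices, the ring is one direct edge in parallel with a chain of N − 1 edges.

```python
def series_resistance(resistances):
    """Serial Law (2.78): resistances of edges in series add."""
    return sum(resistances)

def parallel_conductance(conductances):
    """Parallel Law (2.79): conductances of parallel edges add."""
    return sum(conductances)

# Ring of N vertices with unit conductances, seen between neighbors x, y
# (cf. Exercise 2.49): a direct edge (r = 1) in parallel with a chain of
# N - 1 unit edges (r = N - 1).
N = 5
r_chain = series_resistance([1.0] * (N - 1))          # N - 1
c_total = parallel_conductance([1.0, 1.0 / r_chain])  # 1 + 1/(N - 1)
print(1.0 / c_total)  # (N - 1)/N = 0.8
```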
To demonstrate the utility of resistor networks for the study of random walks, we
will now define a random walk on a resistor network. Strictly speaking, this will not be
a random walk in the sense of Definition 2.1 because resistor networks generally do
not have any underlying (let alone Euclidean) geometry. However, the definition
will fall into the class of Markov chains that are natural generalizations of random
walks to a non-geometric setting.
Definition 2.58 [Random walk on resistor network] Suppose we have a resistor
network — i.e., a connected graph G = (V, E) and a collection of conductances ce , e ∈ E.
A random walk on this network is a collection of random variables Z_0, Z_1, . . . such that, for all n ≥ 1 and all z_1, . . . , z_n ∈ V,
P(Z_0 = z, Z_1 = z_1, . . . , Z_n = z_n) = P(z, z_1) P(z_1, z_2) · · · P(z_{n−1}, z_n), (2.80)
where
P(x, y) := c_{xy}/π(x) with π(x) := ∑_{y∈V} c_{xy}, (2.81)
and where z ∈ V is a given initial vertex, i.e.,
P(Z_0 = z) = 1. (2.82)
2.5. RANDOM WALKS ON RESISTOR NETWORKS 35
To mark the initial condition explicitly, we will write Pz for the distribution of the walks
subject to the initial condition (2.82).
Example 2.59 Any symmetric random walk on Zd is a random walk on the resis-
tor network with nodes Zd and an edge between any pair of vertices that can be
reached in one step of the random walk. Indeed, if X1 , X2 , . . . denote the steps of
the random walk (Sn ) with S0 = z, then
P_z(S_1 = z_1, . . . , S_n = z_n) = P(X_1 = z_1 − z) · · · P(X_n = z_n − z_{n−1}). (2.83)
To see that this is of the form (2.80–2.81), we define the conductance c xy by
c xy = P( X1 = y − x ) (2.84)
and note that symmetry of the step distribution implies c xy = cyx while the nor-
malization gives π ( x ) = 1.
The symmetry assumption is crucial for having P( x, y) of the form (2.81). If one
is content with just the Markov property (2.80), then any random walk on Zd will
do. The simplest example of a symmetric random walk is the simple random walk,
which just chooses a neighbor at random and passes to it. This “dynamical rule”
generalizes to arbitrary graphs:
Example 2.60 Random walk on a graph: Consider a locally finite unoriented graph G
and let d( x ) denote the degree of vertex x. Define
c xy = 1, ( x, y) ∈ E. (2.85)
This defines a resistor network; the random walk on this network is often referred
to as random walk on G because the probability to jump from x to neighbor y is
P(x, y) = 1/d(x), (x, y) ∈ E, (2.86)
which corresponds to choosing a neighbor at random. In this case π ( x ) = d( x ).
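The transition mechanism P(x, y) = c_{xy}/π(x) is easy to implement directly. A minimal sketch (our own; the dictionary-based graph representation is an assumption of the example, not the text's notation):

```python
import random

def walk_step(conductances, x, rng=random):
    """One step of the random walk on a resistor network: from x, jump to a
    neighbor y with probability c_xy / pi(x), where pi(x) = sum_y c_xy.
    Here conductances[x] is a dict {neighbor: c_xy}."""
    nbrs = conductances[x]
    pi_x = sum(nbrs.values())
    r = rng.uniform(0.0, pi_x)
    acc = 0.0
    for y, c in nbrs.items():
        acc += c
        if r <= acc:
            return y
    return y  # guard against floating-point rounding

# Unit conductances on a triangle: each step picks a neighbor uniformly,
# i.e., the simple random walk on the graph (Example 2.60).
G = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}
x = 0
for _ in range(10):
    x = walk_step(G, x)
print(x in G)  # the walk never leaves the vertex set
```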
Problem 2.62 Prove (2.88) by showing that Pz (τW c > n) ≤ e−δn for some δ > 0.
Hint: Show that e := minz∈W Pz (τW c ≤ k ) > 0 when k is the diameter of W. Then
iterate along multiples of k to prove that Pz (τW c > nk) ≤ (1 − e)n .
u(x) = ∑_{y∈V} E_x[ u_0(Z_{τ_{W^c}}) 1_{{Z_1 = y}} ]. (2.90)
Since the probability of each path factors into a product (2.80), the Markov property gives
E_x[ u_0(Z_{τ_{W^c}}) 1_{{Z_1 = y}} ] = P(x, y) u(y), x ∈ W, (2.91)
and so
u(x) = ∑_{y∈V} P(x, y) u(y). (2.92)
In explicit terms,
π(x) u(x) = ∑_{y∈V} c_{xy} u(y). (2.93)
But π(x) is the sum of c_{xy} over all y, and so the difference of the right- and left-hand sides of (2.93) equals (Lu)(x), which is therefore zero.
The probabilistic interpretation of the solution allows us to rewrite the formula for
effective resistance as follows:
Lemma 2.63 Let W ⊂ V be a finite set and let x ∈ W. Let Tx denote the first return
time of the walk ( Zn ) to x,
Tx = inf{n ≥ 1 : Zn = x }. (2.94)
Then
R_eff({x}, W^c)^{−1} = π(x) P_x( T_x ≥ τ_{W^c} ). (2.95)
We thus have to show that E(u) equals the right-hand side of (2.95). For that we insert
( u(y) − u(z) )² = u(y)( u(y) − u(z) ) + u(z)( u(z) − u(y) ) (2.98)
into (2.70), where we used that c_{yz} = c_{zy} to write the contribution of each term on the right of (2.98) using the same expression. But (Lu)(z) = 0 for z ∈ W \ {x} and u(z) = 0 for z ∈ W^c. At z = x we have u(z) = 1 and
−(Lu)(x) = ∑_{y∈V} c_{xy} ( 1 − P_y(T_x < τ_{W^c}) ).
Theorem 2.64 [Effective resistance and recurrence vs transience] Recall the nota-
tion (2.94) for T_x. Then
R_eff(x, ∞) = ∞ ⇔ P_x(T_x < ∞) = 1 (2.101)
and
R_eff(x, ∞) < ∞ ⇔ P_x(T_x = ∞) > 0. (2.102)
Proof. It clearly suffices to prove only (2.102). By Lemma 2.63 and Exercise 2.52, for the sequence of balls B_R of radius R centered at any designated point,
R_eff({x}, B_R^c)^{−1} = π(x) P_x( T_x ≥ τ_{B_R^c} ) → π(x) P_x(T_x = ∞) as R → ∞,
and the claim follows from Definition 2.51.
Exercise 2.65 Use Exercise 2.53 and Problem 2.57 to show that the random walk
on Z is recurrent and that on a regular ternary tree is transient.
Figure 2.11: The setting for the proof of the Nash-Williams estimate. Only the edges between B_n(0) and its complement are fully drawn. By setting the potential on the boundary vertices of B_n(0) to a constant, these vertices are effectively fused into one. The resistance between the fused vertex and B_n(0)^c is |∂B_n|^{−1}.
Proof. This follows because the effective resistance to infinity, R_eff(x, ∞), is a decreasing function of the conductances.
Exercise 2.67 Show that there exists a critical dimension dc ∈ N ∪ {∞} such that
the d-dimensional SRW is recurrent in dimensions d ≤ dc and transient in d > dc .
Problem 2.68 Show that removal of a single edge from Z^d does not change its recurrence/transience properties.
R_{n+1} ≥ R_n + 1/|∂B_n|, (2.107)
where
|∂B_n| = #{ (x, y) ∈ E : x ∈ B_n(0), y ∈ B_n(0)^c }. (2.108)
Proof. The SRW can be thought of as a random walk on the electrical network with graph structure Z^d and a unit conductance on each edge. In the notation of Lemma 2.69 we have |∂B_n| ≤ 2d n^{d−1} and so
R_n ≥ ∑_{k=1}^{n−1} 1/|∂B_k| ≥ (1/2d) ∑_{k=1}^{n−1} k^{1−d}. (2.109)
Chapter 3

Branching processes

A Galton-Watson branching process is specified by its offspring distribution: a sequence (p_n)_{n≥0} of non-negative numbers with
∑_{n≥0} p_n = 1. (3.1)
42 CHAPTER 3. BRANCHING PROCESSES
The second line in the formula for Xn+1 shows that once the sequence ( Xn ) hits
zero — i.e., once the family has died out — it will be zero forever. We refer to
this situation as extinction; the opposite case is referred to as survival. The first goal is
to characterize offspring distributions for which extinction occurs with probability
one — or, complementarily, survival occurs with a positive probability.
Consider the moment-generating function φ_n(s) := E e^{−s X_n} and let λ(s) := −log E e^{−s ξ} = −log ∑_{n≥0} p_n e^{−sn}. Conditioning on X_n and using the independence of the ξ’s, we get
E e^{−s(ξ_{n+1,1} + ··· + ξ_{n+1,X_n})} = ∑_{k≥0} E[ e^{−s(ξ_{n+1,1} + ··· + ξ_{n+1,k})} 1_{{X_n = k}} ]
= ∑_{k≥0} e^{−λ(s)k} P(X_n = k) = E e^{−λ(s) X_n}.
The right-hand side is the moment-generating function of Xn at the point λ(s). This
is the content of (3.6). Note that λ(s) ≥ 0 once s ≥ 0 and so there is no problem
with using λ(s) as an argument of φn .
To derive the explicit formula for φ_n we first solve recursively for φ_n to get
φ_n(s) = φ_0( λ ◦ · · · ◦ λ (s) ) (n-fold composition of λ). (3.10)
3.1. GALTON-WATSON BRANCHING PROCESS 43
Figure 3.1: The plot of s ↦ λ(s) for the offspring distribution with p_0 = 1 − p and p_3 = p, for p taking values 0.25, 0.33 and 0.6, respectively. These values represent the three generic regimes distinguished — going left to right — by whether λ′(0+) is less than one, equal to one, or larger than one. As λ is strictly concave and non-negative with λ(0) = 0, only in the last case does λ have a strictly positive fixed point. We refer to the three situations as subcritical, critical and supercritical.
(Again, we are using that λ maps (0, ∞) into (0, ∞) and so the iterated map is well defined.) From here (3.7) follows by noting that φ_0(s) = e^{−s} due to (3.2).
In light of (3.7), the question whether φ_n(s) → 1 or not now boils down to the question whether λ_n(s) → 0 or not. To find the right criterion, we will need to characterize the analytic properties of λ:
Lemma 3.3 Suppose that pn < 1 for all n. Then λ is non-decreasing and continuous
on [0, ∞) and strictly concave and differentiable on (0, ∞). In addition,
lim_{s↓0} λ′(s) = ∑_{n≥0} n p_n (3.11)
and
lim_{s→∞} λ′(s) = inf{ n : p_n > 0 }. (3.12)
Proof. Since p_n ≤ 1, the series ∑_{n≥0} e^{−sn} p_n is absolutely summable, locally uniformly in s > 0, and so it can be differentiated term by term. In particular, the first derivative λ′(s) is the expectation
λ′(s) = ∑_{n≥0} n p_n e^{−sn + λ(s)} (3.13)
of the probability mass function n ↦ p_n e^{−sn} e^{λ(s)} on N ∪ {0}, while the second derivative λ″(s) is the negative of the corresponding variance. Under the condition p_n < 1 for all n, the variance is non-zero and so λ″(s) < 0 for all s > 0. This establishes differentiability and strict concavity on (0, ∞); continuity at s = 0 is directly checked. The limit of the derivatives (3.11) exists by concavity and equals the corresponding limit of (3.13). To prove (3.12), let k be the infimum on the right-hand side. Then e^{−λ(s)} ∼ p_k e^{−sk} as s → ∞, and so all terms but the k-th in (3.13) disappear in the limit s → ∞. The k-th term converges to k and so (3.12) holds.
Exercise 3.4 Compute λ″(s) explicitly and show that it can be written as the negative of a variance. Use this to show that λ″(s) < 0 for all s > 0 once at least two of the p_n’s are non-zero.
The statements in the lemma imply that the function looks as in Fig. 3.1.
Exercise 3.5 Consider the branching process with offspring distribution determined by p_0 = p and p_2 = 1 − p with 0 < p < 1. Sketch the graph of λ(s) and characterize the regime where λ has a non-zero fixed point.
Theorem 3.6 Consider a Galton-Watson branching process with offspring distribution (p_n), abbreviate µ := Eξ = ∑_{n≥0} n p_n and suppose p_0 > 0.
(1) If µ ≤ 1, then the process dies out with probability one, i.e.,
P(X_n ≥ 1) → 0 as n → ∞. (3.15)
(2) If µ > 1, then the process dies out with probability e^{−s⋆},
P(X_n = 0) → e^{−s⋆} as n → ∞, (3.16)
where s⋆ is the unique positive solution to λ(s) = s, and it survives forever and, in fact, the population size goes to infinity, with the complementary probability,
P(X_n ≥ M) → 1 − e^{−s⋆} as n → ∞ (3.17)
for every M ≥ 1.
Proof. Suppose first that µ ≤ 1. The strict concavity and the fact that λ′(s) → µ as s ↓ 0 ensure that λ(s) < s for all s > 0. This means λ_{n+1}(s) = λ(λ_n(s)) < λ_n(s), i.e., the sequence n ↦ λ_n(s) is strictly decreasing. Thus for each s ≥ 0 the limit r(s) := lim_{n→∞} λ_n(s) exists. But λ_{n+1}(s) = λ(λ_n(s)) and the continuity of λ imply λ(r(s)) = r(s).
As λ(s) < s for s > 0, the only point that satisfies this equality on [0, ∞) is r (s) = 0.
We conclude that λ_n(s) → 0, yielding φ_n(s) = e^{−λ_n(s)} → 1. From
φ_n(s) ≤ 1 − (1 − e^{−s}) P(X_n ≥ 1) (3.21)
it follows that
P(X_n ≥ 1) → 0 as n → ∞, (3.22)
as we desired to prove.
Next let us assume µ > 1. Then we have λ(s) > s for s sufficiently small. On the other hand, due to p_0 > 0 and (3.12), we have λ(s) < s once s is large. Thus there exists a non-zero solution to λ(s) = s. Strict concavity of λ implies that this solution is actually unique: if s⋆ is the least such positive solution, then λ′(s⋆) ≤ 1 and strict concavity tell us that λ(s) > s for 0 < s < s⋆ and λ(s) < s for s > s⋆.
We claim that
λ_n(s) → s⋆ as n → ∞, s > 0. (3.23)
This is proved in the same way as for µ ≤ 1 but we have to deal with s > s?
and s < s? separately.
For s > s⋆ we have s⋆ < λ(s) < s and so the sequence λ_n(s) is decreasing. The limit is again a fixed point of λ(s) = s and since s⋆ is the only one available, we have λ_n(s) → s⋆.
For 0 < s < s⋆ we instead have s < λ(s) < s⋆, where we used that λ(s) is increasing to get the second inequality. It follows that λ_n(s) is increasing in this regime. The limit is a fixed point of λ and so λ_n(s) → s⋆ in this case as well.
Having established (3.23), we now note that it implies φ_n(s) = e^{−λ_n(s)} → e^{−s⋆} for every s > 0. Since P(X_n = 0) ≤ φ_n(s) ≤ P(X_n = 0) + e^{−s}, taking s large we must have
P(X_n = 0) → e^{−s⋆} as n → ∞. (3.26)
To get (3.17), suppose that P(X_n = m) ≥ e for some m ≤ M along a subsequence of n’s. The bound φ_n(s) ≥ P(X_n = 0) + e e^{−sM} along that subsequence would imply lim inf_{n→∞} φ_n(s) ≥ e^{−s⋆} + e e^{−sM} > e^{−s⋆}, a contradiction. Hence, P(X_n = m) tends to zero for every finite m ≥ 1, proving (3.17).
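The fixed-point picture behind this dichotomy can be probed numerically. The sketch below (our own illustration) uses the offspring distribution p_0 = 1 − p, p_3 = p from Fig. 3.1; iterating λ from a large starting point converges to the largest fixed point (s⋆ in the supercritical case, 0 otherwise), so e^{−s⋆} recovers the extinction probability.

```python
import math

def lam(s, p):
    """lambda(s) = -log E[e^{-s xi}] for the offspring law p_0 = 1 - p, p_3 = p."""
    return -math.log((1.0 - p) + p * math.exp(-3.0 * s))

def extinction_probability(p, n_iter=200):
    """Iterate s -> lambda(s) from a large s; since lambda(s) < s above the
    largest fixed point, the iterates decrease to s_star (supercritical) or
    to 0, and the process dies out with probability e^{-s_star}."""
    s = 10.0
    for _ in range(n_iter):
        s = lam(s, p)
    return math.exp(-s)

print(round(extinction_probability(0.6), 4))   # supercritical (mu = 1.8): 0.4574
print(round(extinction_probability(0.25), 4))  # subcritical (mu = 0.75): 1.0
```

The supercritical value agrees with the smallest root q of q = 0.4 + 0.6 q³, namely q = (√33 − 3)/6.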
Exercise 3.7 Suppose p0 = 0 and p1 < 1. Show that Xn → ∞ with probability one.
Problem 3.8 Suppose µ < 1. Show that P( Xn ≥ 1) decays exponentially with n.
The analysis in the previous section revealed the following picture: Branching pro-
cesses undergo an abrupt change of behavior when µ increases through one. This
is a manifestation of what physicists call a phase transition. The goal of this section is
to investigate the situation at the critical point, i.e., for generic branching processes
with mean offspring µ = 1. Here is the desired result:
Theorem 3.9 Suppose µ := Eξ = 1 and σ² := Var(ξ) ∈ (0, ∞). Then
P(X_n ≥ 1) = (2/(σ²n)) (1 + o(1)), n → ∞. (3.28)
Moreover, for every z > 0,
P( X_n/n ≥ z | X_n ≥ 1 ) → e^{−z(2/σ²)} as n → ∞. (3.29)
Lemma 3.10 For every s > 0 and every n ≥ 0,
(1 − φ_n(s)) / (1 − e^{−s}) = ∑_{k≥1} e^{−s(k−1)} P(X_n ≥ k). (3.30)
Lemma 3.11 Suppose µ = 1 and σ² := Var(ξ) ∈ (0, ∞). Then, for each s > 0,
1/λ_n(s) = (σ²/2) n (1 + o(1)), n → ∞, (3.34)
while, for each θ > 0,
1/λ_n(θ/n) = n/θ + (σ²/2) n (1 + o(1)), n → ∞. (3.35)
Proof. First, let us get the intuitive idea for the appearance of 1/n scaling for λn .
We know that λn (s) → 0 for any s ≥ 0. Since λn+1 (s) = λ(λn (s)), we can thus
expand λ about s = 0 to get a simpler recurrence relation. A computation shows
λ(s) = µs − (σ²/2) s² + o(s²) (3.36)
and since µ = 1, we thus have
λ_{n+1}(s) = λ_n(s) − (σ²/2) λ_n(s)² (1 + o(1)). (3.37)
This is solved to leading order by setting λ_n(s) = c/n, which gives c = 2/σ².
We actually do not have to work much harder in order to get the above calculation
under control. First, the existence of the first two derivatives of λ tells us that, for
each e > 0, we can find s0 (e) > 0 such that
1/s + (σ²/2)(1 − e) ≤ 1/λ(s) ≤ 1/s + (σ²/2)(1 + e), 0 < s < s_0. (3.38)
Thus, we have
1/λ_n(s) + (σ²/2)(1 − e) ≤ 1/λ_{n+1}(s) ≤ 1/λ_n(s) + (σ²/2)(1 + e) (3.39)
whenever λ_n(s) < s_0. Let n_0 = n_0(s) be the last n for which this fails when the iterations are started from s, i.e.,
n_0 = sup{ n ≥ 0 : λ_n(s) ≥ s_0(e) }. (3.40)
Since s_0(e) ≤ λ_{n_0}(s) ≤ λ(s_0(e)), summing the inequalities from n_0 on we get
1/λ(s_0(e)) + (σ²/2)(1 − e)(n − n_0) ≤ 1/λ_n(s) ≤ 1/s_0(e) + (σ²/2)(1 + e)(n − n_0). (3.41)
An analogous argument started from s = θ/n gives
n/θ + (σ²/2)(1 − e) k ≤ 1/λ_k(θ/n) ≤ n/θ + (σ²/2)(1 + e) k (3.42)
Proof of Theorem 3.9. First we prove (3.28). Fix s > 0. By (3.34) we then have
1 − φ_n(s) = (2/(σ²n)) (1 + o(1)), n → ∞. (3.43)
The bounds 0 ≤ P(X_n ≥ k) ≤ P(X_n ≥ 1) and Lemma 3.10 imply
P(X_n ≥ 1) ≤ (1 − φ_n(s))/(1 − e^{−s}) ≤ P(X_n ≥ 1)/(1 − e^{−s}). (3.44)
From here (3.28) follows by taking n → ∞ and s → ∞.
Next we plug s = θ/n into the left-hand side of (3.30) and apply (3.35) to get
(1 − φ_n(θ/n))/(1 − e^{−θ/n}) = (1/θ)/(1/θ + σ²/2) + o(1), n → ∞. (3.45)
The identity
P( Xn ≥ k ) = P( Xn ≥ 1) P( Xn ≥ k | Xn ≥ 1) (3.46)
turns (3.30) into
(1 − φ_n(θ/n))/(1 − e^{−θ/n}) = n P(X_n ≥ 1) e^{θ/n} (1/n) ∑_{k≥1} e^{−θk/n} P(X_n ≥ k | X_n ≥ 1). (3.47)
Since n P(X_n ≥ 1) → 2/σ² by (3.28), comparing with (3.45) yields
lim_{n→∞} (1/n) ∑_{k≥1} e^{−θk/n} P(X_n ≥ k | X_n ≥ 1) = 1/(θ + 2/σ²). (3.48)
Supposing that
P(X_n ≥ zn | X_n ≥ 1) → G(z), n → ∞, (3.49)
for some non-increasing function G : (0, ∞) → (0, 1), we can interpret the sum on the left of (3.48) as the Riemann sum of an (improper) integral to get
∫_0^∞ e^{−θz} G(z) dz = 1/(θ + 2/σ²). (3.50)
By the properties of the Laplace transform (whose discussion we omit) this can
only be true if
G(z) = e^{−z(2/σ²)}. (3.51)
3.2. CRITICAL PROCESS & DUALITY 49
But this shows that the limit (3.49) must exist because from any subsequence we
can always extract a limit by using Cantor’s diagonal argument and the fact that G
is decreasing (again we omit details here).
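The 1/n scaling in Theorem 3.9 can be checked by simulation. A sketch (our own) with the concrete critical law p_0 = p_2 = 1/2, for which µ = 1 and σ² = 1, so that the prediction is n · P(X_n ≥ 1) → 2:

```python
import random

def survives(n, rng):
    """Run a critical branching process (p_0 = p_2 = 1/2) for n generations
    and report whether X_n >= 1."""
    x = 1
    for _ in range(n):
        # each of the x individuals has 2 children with prob. 1/2, else 0
        x = sum(2 for _ in range(x) if rng.random() < 0.5)
        if x == 0:
            return False
    return True

rng = random.Random(7)
n, runs = 50, 20000
phat = sum(survives(n, rng) for _ in range(runs)) / runs
# Theorem 3.9 predicts n * P(X_n >= 1) -> 2/sigma^2 = 2, up to the o(1)
# correction, which is still visible at n = 50
print(n * phat)
```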
From Theorem 3.9 we learn that, conditional on survival up to time n, the number
of surviving individuals is of order n. Next we will look at what happens when we
condition on extinction. Of course, this is going to have a noticeable effect only on
the supercritical processes.
Theorem 3.12 [Duality] Consider a Galton-Watson branching process with a supercritical offspring distribution (p_n). Assume 0 < p_0 < 1, let s⋆ be the unique positive solution to λ(s) = s, and define
q_n = p_n e^{−s⋆ n + λ(s⋆)}, n ≥ 0. (3.52)
Then ∑_{n≥0} q_n = 1 and
∑_{n≥0} n q_n = λ′(s⋆) < 1, (3.53)
i.e., (q_n) is a subcritical offspring distribution. Moreover, denoting by T_p the family tree of the original process and by T_q that of the process with offspring distribution (q_n), for every finite tree T,
P(T_q = T) = P(T_p = T | extinction). (3.54)
Proof. Consider a finite tree T and let V be the set of vertices and E the set of edges.
Let n(v) denote the number of children of vertex v ∈ T. The probability that T
occurs is then
P(T_p = T) = ∏_{v∈T} p_{n(v)}. (3.55)
Conditioning on extinction multiplies this by e^{s⋆}. Using
|V| = |E| + 1, (3.56)
the identity ∑_{v∈T} n(v) = |E| and the fact that λ(s⋆) = s⋆, the factor e^{s⋆} can be distributed over the vertices of T. This shows
P(T_p = T | extinction) = ∏_{v∈T} p_{n(v)} e^{−s⋆ n(v) + λ(s⋆)}, (3.58)
which is exactly (3.54).
An interesting special case of duality is the case when a process is self-dual. This
loosely defined term refers to the situation when the dual process has a distribution
“of the same kind” as the original process. Here are some examples:
Example 3.13 Binomial distribution: Let θ ∈ [0, 1] and consider the offspring distribution (p_n) which is Binomial(N, θ), i.e.,
p_n = (N choose n) θ^n (1 − θ)^{N−n}, 0 ≤ n ≤ N. (3.59)
Then the dual process is also binomial, with parameters N and θ⋆, where
θ⋆/(1 − θ⋆) = e^{−s⋆} θ/(1 − θ) (3.60)
and s⋆ is determined from (1 − θ + θe^{−s⋆})^N = e^{−s⋆}.
Example 3.14 Poisson distribution: A limit case of the above is the Poisson offspring distribution,
p_n = (λ^n/n!) e^{−λ}, n ≥ 0, (3.61)
with λ > 1. Here the dual (q_n) is also Poisson but with parameter λ⋆, which is defined as the unique number less than one with
λ e^{−λ} = λ⋆ e^{−λ⋆}. (3.62)
The self-dual point is λ = 1.
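Equation (3.62) is easy to solve numerically. A sketch (ours) that finds λ⋆ by bisection, using that x ↦ x e^{−x} is increasing on (0, 1):

```python
import math

def dual_poisson(lam, tol=1e-12):
    """The dual parameter lam_star < 1 solving lam*e^{-lam} = x*e^{-x} (3.62)."""
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < target:  # x e^{-x} increases on (0, 1)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(dual_poisson(2.0), 4))  # approximately 0.4064
```

So the dual of a Poisson(2) offspring law is (approximately) Poisson(0.4064); as λ ↓ 1 the dual parameter tends to 1, the self-dual point.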
Exercise 3.15 Verify the claims in the previous examples.
Problem 3.16 Characterize the distribution of the size of the first generation, X_1, conditioned on survival forever, i.e., on the event ∩_{n≥1} {X_n ≥ 1}.
Problem 3.17 Apply the above analysis to prove that if µ > 1, then for each µ̃ < µ, P(X_n ≥ µ̃^n | X_n ≥ 1) → 1 as n → ∞.
θ(p) = P_p( |C_ω(∅)| = ∞ ), (3.65)
where the index p denotes that we are considering percolation with parameter p. The
principal observations concerning tree percolation are summarized as follows:
Theorem 3.19 [Bond percolation on T_b] Consider bond percolation on T_b with b ≥ 2 and parameter p. Define p_c = 1/b. Then
θ(p) = 0 for p ≤ p_c, while θ(p) > 0 for p > p_c. (3.66)
To keep the subject of branching processes in the back of our mind, we begin by
noting a connection between percolation and branching:
Lemma 3.20 [Connection to branching] Let Xn be the number of vertices in Cω (∅)
that have distance n to the root. Then ( Xn )n≥0 has the law of a Galton-Watson branching
process with binomial offspring distribution (pn ),
p_n = (b choose n) p^n (1 − p)^{b−n} for 0 ≤ n ≤ b, and p_n = 0 otherwise. (3.70)
Proof. Let Vn be the vertices of Tb that have distance n to the root. Suppose that Xn
is known and let v1 , . . . , v Xn be the vertices in Cω (∅) ∩ Vn . Define
ξ n+1,j = # u ∈ Vn+1 : (v j , u) ∈ Eω . (3.71)
Then, clearly,
Xn+1 = ξ n+1,1 + · · · + ξ n+1,Xn (3.72)
Corollary 3.21 The probability θ(p) to have an infinite component at the root is the maximal positive solution in [0, 1] to the equation
θ = 1 − (1 − pθ)^b. (3.73)
Proof. Recall the connection with branching. The event that |C_ω(∅)| = ∞ clearly coincides with the event that the branching process (X_n) lives forever. The probability to die out equals q = e^{−s⋆}, where the latter quantity is the smallest solution to the equation
q = ∑_{n≥0} p_n q^n = (pq + 1 − p)^b, (3.75)
and setting θ = 1 − q turns this into (3.73).
where C̃_ω(v) denotes the connected component of v in the subtree of T_b rooted at v. The union on the right-hand side is disjoint and the components are actually independent of each other. Hence,
3.3. TREE PERCOLATION 53
Figure 3.4: The graph of Ψ ↦ x(1 − p + pΨ)^b for the tree with b = 4 at x = 0.8 and p = 0.3. The function is convex and has a unique fixed point with Ψ ∈ (0, 1).
Ψ_p(x) = x E_p[ x^{|C̃_ω(v_1)|} · · · x^{|C̃_ω(v_{X_1})|} ] = x ∑_{n=0}^{b} p_n Ψ_p(x)^n. (3.77)
Equation (3.74) now follows from (3.70). The right-hand side of (3.74) is convex in Ψ, strictly positive at Ψ = 0 and less than one at Ψ = 1. There is thus a unique intersection in (0, 1), which then has to be the value of Ψ_p(x).
Exercise 3.22 Set b = 2 and/or b = 3 and solve (3.73) explicitly. See Fig. 3.3.
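The fixed-point equation (3.73) is also amenable to direct iteration: starting from θ = 1, the iterates decrease monotonically to the maximal solution. A sketch (our own):

```python
def theta(p, b, n_iter=1000):
    """Maximal solution of theta = 1 - (1 - p*theta)^b (eq. 3.73), obtained
    by iterating from theta = 1; the map is increasing and concave, so the
    iterates decrease to the largest fixed point."""
    t = 1.0
    for _ in range(n_iter):
        t = 1.0 - (1.0 - p * t) ** b
    return t

# For b = 2 one can solve (3.73) by hand (cf. Exercise 3.22):
# theta(p) = (2p - 1)/p^2 for p > p_c = 1/2.
print(round(theta(0.75, 2), 6))  # (2*0.75 - 1)/0.75^2 = 0.888889
print(theta(0.40, 2) < 1e-12)    # subcritical: iterates tend to 0 -> True
```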
Problem 3.23 Use the argument in the previous proof to show that P p (|Cω (∅)| ≥
n) decays exponentially with n for all p < pc .
Exercise 3.24 Suppose p > pc . Use the duality for branching processes (Theo-
rem 3.12) to show that the tail of the probability distribution of the finite compo-
nents, P p (n ≤ |Cω (∅)| < ∞), decays exponentially with n.
To get the asymptotic for the component size distribution at pc , we also note:
Lemma 3.25 For every x < 1,
(1 − Ψ_p(x)) / (1 − x) = ∑_{n≥1} x^{n−1} P_p( |C_ω(∅)| ≥ n ). (3.78)
Proof. The proof is essentially identical to that of Lemma 3.10. Let us abbreviate a_n = P_p(|C_ω(∅)| ≥ n). Then
Ψ_p(x) = ∑_{n≥1} x^n P_p( |C_ω(∅)| = n ) = ∑_{n≥1} x^n (a_n − a_{n+1}). (3.79)
Summation by parts turns this into
Ψ_p(x) = a_1 + ∑_{n≥1} (x^n − x^{n−1}) a_n. (3.80)
The first term on the right-hand side is a_1 = 1 while the second term can be written as (x − 1) ∑_{n≥1} x^{n−1} a_n. This and a bit of algebra yield (3.78).
Proof of Theorem 3.19. The claims concerning θ ( p) are derived by analyzing (3.73).
Let us temporarily denote Φ(θ) = 1 − (1 − pθ)^b. Then Φ is concave on [0, 1] and
Φ(θ) = pbθ − (b(b−1)/2) p² θ² + o(θ²), θ ↓ 0. (3.81)
This shows that for pb ≤ 1 the only fixed point of Φ is θ = 0, while for pb > 1 there
are two solutions: one at θ = 0 and one at θ = θ ( p) > 0. This proves (3.66).
To get the critical scaling (3.67), we note that (3.81) can be rewritten as
Φ(θ) = θ + bθ(p − p_c) − (b(b−1)/2) p² (1 + o(1)) θ². (3.82)
The positive solution θ(p) of Φ(θ) = θ will thus satisfy
θ(p) = (2b²/(b−1)) (1 + o(1)) (p − p_c), p ↓ p_c. (3.83)
so we just need to find the right asymptotic of Ψ_p(x) near x = 1. To this end we denote g(x) = 1 − Ψ_p(x) and note that (3.74) becomes
g(x) = 1 − x (1 − p g(x))^b. (3.85)
Specializing to b = 2 and p = p_c = 1/2, this is a quadratic equation in g(x) whose relevant solution is
g(x) = (2/x)( √(1 − x) − (1 − x) ), (3.87)
where the sign of the square root was chosen to make g positive on (0, 1) as it
should be. By (3.78) we in turn have
x g(x)/(1 − x) = ∑_{n≥1} x^n P_p( |C_ω(∅)| ≥ n ) (3.88)
and so to find the probabilities P p (|Cω (∅)| ≥ n) we just need to expand the left-
hand side into a Taylor series about x = 0 and use that the coefficients of this
expansion are uniquely determined. This is said nearly as easily as it is done:
x g(x)/(1 − x) = 2 ( 1/√(1 − x) − 1 ) = 2 ∑_{n≥1} (2n choose n) x^n/4^n. (3.89)
Some of our computations above have perhaps been somewhat unnecessarily for-
mal. For instance, (3.73) can be derived as follows:
Let 1 − θ be the probability that the root is in a finite component. For that to be
true, every occupied edge from the root must end up at a vertex whose component
in the forward direction is also finite. The probability that the vertex v1 is like this
is 1 − pθ, and similarly for all b neighbors of ∅. As these events are independent
for distinct neighbors, this yields
1 − θ = (1 − pθ)^b, (3.93)
which is (3.73). Similarly, we can also compute χ( p) directly: By (3.76),
|Cω (∅)| = 1 + ω∅,v1 |C̃ω (v1 )| + · · · + ω∅,vb |C̃ω (vb )| (3.94)
Using that ω∅,v is independent of |C̃ω (v)|, taking expectations we get
χ( p) = 1 + bpχ( p) (3.95)
whereby χ(p) = p_c/(p_c − p).
The recursive nature of the tree, and of the ensuing calculations, allows us to look
at some more complicated variants of the percolation process:
Problem 3.27 k-core percolation : Consider the problem of so called k-core percola-
tion on the tree Tb . Here we take k ≥ 3 and take a sample of C (∅). Then we start
applying the following pruning procedure: if a vertex has fewer than k “children” to
which it is connected by an occupied edge, we remove it from the component along
with the subtree rooted in it. Applying this over and over, this gives us a decreasing
sequence Cn (∅) of subtrees of C (∅). Let ϑ ( p) denote the probability that Cn (∅) is
infinite for all n. Show that ϑ ( p) is the largest positive solution to
ϑ = π_k(ϑp), where π_k(λ) = ∑_{ℓ=k}^{b} (b choose ℓ) λ^ℓ (1 − λ)^{b−ℓ}, (3.96)
i.e., π_k(λ) is the probability that a Binomial(b, λ) random variable is at least k. Explain why ϑ(p) for 1-core percolation equals θ(p) for ordinary percolation.
Having understood percolation on the regular rooted tree, we can now move on to a slightly more complicated setting, namely percolation on the complete graph K_n. This problem emerged independently of the developments in percolation, as a model of a random graph, and is named after its inventors Erdős and Rényi.
The complete graph Kn has vertices {1, . . . , n} and an (unoriented) edge between
every pair of distinct vertices. Given p ∈ [0, 1] we toss a biased coin for each edge
and if it comes out heads — which happens with probability p — we keep the edge
and if we get tails then we discard it. We call the resulting random graph G (n, p).
The principal question of interest is the distribution of the largest connected com-
ponent; particularly, when it is of order n.
Our main observation is that a percolation transition still occurs in this setting, even though the formulation is somewhat less clean due to the necessity of taking n → ∞:
Theorem 3.28 For any α ≥ 0 and e > 0, let θ_{e,n}(α) denote the probability that vertex “1” lies in a component of G(n, α/n) of size at least en. Then, for e > 0 sufficiently small, θ_{e,n}(α) converges as n → ∞ to θ(α), the maximal non-negative solution of
θ = 1 − e^{−αθ}. (3.98)
In particular, θ(α) = 0 for α ≤ 1, while θ(α) > 0 for α > 1. (3.99)
Exercise 3.29 Show that θ(α) is the probability that the branching process with Poisson offspring distribution p_n = (α^n/n!) e^{−α} survives forever.
3.4. ERDŐS-RÉNYI RANDOM GRAPH 57
The key idea of our proof is to explore the component of vertex “1” using a search algorithm. The algorithm keeps vertices in three classes: explored and active — called jointly “discovered” — and undiscovered. Initially, we mark vertex “1” as active and the others as undiscovered. Then we repeat the following:
(1) Pick the active vertex v that has the least index.
(2) Probe every edge between v and an undiscovered vertex.
(3) Change the status of v to explored and its newly discovered neighbors to active.
The algorithm stops when we run out of active vertices. It is easy to check that this
happens when we have explored the entire connected component of vertex “1”.
Let A_k denote the number of active vertices at the k-th stage of the algorithm; the initial marking of “1” is represented by A_0 = 1. If v is the active vertex with the least index at the k-th stage, we use L_k to denote the number of its as yet undiscovered neighbors. We have
Ak+1 = Ak + Lk − 1. (3.100)
Note that n − k − Ak is the number of undiscovered vertices after the k-th run of
the above procedure.
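The exploration procedure translates directly into code. The sketch below (our own; it processes active vertices in stack order rather than by least index, which does not affect the distribution of the component size) probes edges only between the current vertex and the undiscovered set:

```python
import random

def component_size(n, alpha, rng):
    """Size of the component of vertex 0 in G(n, alpha/n), computed by the
    exploration algorithm: every pair is probed at most once."""
    p = alpha / n
    active = [0]
    undiscovered = set(range(1, n))
    explored = 0
    while active:
        v = active.pop()
        # each undiscovered vertex attaches to v with probability alpha/n
        newly = [u for u in undiscovered if rng.random() < p]
        for u in newly:
            undiscovered.remove(u)
            active.append(u)
        explored += 1
    return explored

rng = random.Random(0)
sizes = [component_size(500, 2.0, rng) for _ in range(10)]
# For alpha = 2 > 1, vertex 0 lies in a giant component of size roughly
# 0.8 * n (theta solving theta = 1 - e^{-2 theta}) in a positive fraction
# of the runs, so the maximum over a few runs should exceed n/2
print(max(sizes) > 250)
```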
Lemma 3.30 Conditional on A_k, the number L_k of newly discovered vertices is binomial,
L_k ∼ Binomial(n − k − A_k, α/n). (3.101)
Proof. The newly discovered vertices are chosen, independently with probability α/n each, from the set of n − k − A_k undiscovered vertices.
We will use the above algorithm to design a coupling with the corresponding search
algorithm for bond percolation on a regular rooted tree — or, alternatively, with a
branching process with a binomial offspring distribution. Recall that Tb denotes
the rooted tree with forward degree b and Kn the complete graph on n vertices. To
make all percolation processes notationally distinct, we will from now on write PTn
for the law of percolation on Tn and PKn for the corresponding law on Kn . In all
cases below the probability that an edge is occupied is α/n.
Lemma 3.31 [Coupling with tree percolation] For m ≤ n and r ≤ n − m,
P_{T_m}( |C(∅)| ≥ r ) ≤ P_{K_n}( |C(1)| ≥ r ) ≤ P_{T_n}( |C(∅)| ≥ r ). (3.102)
Proof. The proof is based on the observation that, as long as less than n − m vertices
have been discovered, we can couple the variables Lk for Tm , Kn and Tn so that
L_k^{(T_m)} ≤ L_k^{(K_n)} ≤ L_k^{(T_n)}. (3.103)
Now think of the binomial random variable Binom(b, α/n) as the sum of the first b
terms in a sequence of Bernoulli random variables that are 1 with probability α/n
and zero otherwise. To get Lk(Tm) we then add only the first m, to sample Lk(Kn) we
add the first n − k − Ak , and to get Lk(Tn) we add the first n of these variables. Under
the condition

	m ≤ n − k − Ak ≤ n	(3.105)

we will then have (3.103). The upper bound in this condition is trivial and the
lower bound will hold as long as k + Ak ≤ n − m.
This argument shows that, if the connected component C (1) of vertex “1” in Kn has
size at least r, then so does the component C (∅) in Tn , i.e., the right inequality in (3.102)
holds. Similarly, thinking of adding the discovered vertices to the tree one by one,
before the component C (1) of Kn reaches the size r ≤ n − m, the component C (∅)
of Tm will not be larger than r. This implies
	PKn (|C (1)| < r) ≤ PTm (|C (∅)| < r),	r ≤ n − m,	(3.106)
The following observation will be helpful in the proof:
Lemma 3.32 [Continuity in offspring distribution] Let p(n) = (pm(n)) be a family
of offspring distributions and let p be an offspring distribution such that 0 < p0 < 1.
Suppose that, for each m ≥ 0,

	pm(n) −→ pm	as n → ∞.	(3.107)

Then for each r ≥ 1,

	Pp(n) (r ≤ ∑`≥0 X` < ∞) −→ Pp (r ≤ ∑`≥0 X` < ∞)	as n → ∞,	(3.108)

where Pp(n) and Pp denote the laws of the branching process with offspring distributions p(n)
and p, respectively.
Problem 3.33 Define

	λ(n) (s) = − log ∑m≥0 pm(n) e−sm	(3.109)

and show that λ(n) (s) → λ(s), where λ(s) is defined using p = (pm ). Then use this
to prove the lemma.
Now we are ready to prove our result for the Erdős-Rényi random graph:
Proof of Theorem 3.28. We will prove upper and lower bounds on the quantity

	θε,n := PKn (|C (1)| ≥ εn).	(3.110)

By Lemma 3.31, it follows that

	θε,n ≥ PT(1−ε)n (|C (∅)| ≥ εn) ≥ PT(1−ε)n (|C (∅)| = ∞).	(3.111)

Equation (3.73) in Corollary 3.21 shows that the right-hand side is the largest posi-
tive solution to the equation

	θ = 1 − (1 − (α/n) θ)(1−ε)n .	(3.112)
As

	(1 − (α/n) θ)(1−ε)n −→ e−(1−ε)αθ	as n → ∞,	(3.113)

this equation and, by convexity of the right-hand side, also its maximal positive
solution converge to that of (3.98) in the limits n → ∞ followed by ε ↓ 0. This
proves (3.97) with the limits replaced by limes inferior.
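The limiting fixed-point equation — which, per (3.113) with ε ↓ 0, reads θ = 1 − e−αθ — has no closed form, but its maximal solution is easy to find numerically. The sketch below is our own (the helper name `survival_prob` is ours): iterating the map θ ↦ 1 − e−αθ from θ = 1 converges monotonically down to the largest fixed point.

```python
from math import exp

def survival_prob(alpha, tol=1e-12, max_iter=100000):
    """Maximal solution of theta = 1 - exp(-alpha*theta), found by iterating
    the map from theta = 1 downward; for alpha <= 1 the iterates drift to 0."""
    t = 1.0
    for _ in range(max_iter):
        nt = 1.0 - exp(-alpha * t)
        if abs(nt - t) < tol:
            return nt
        t = nt
    return t
```

For example, α = 2 gives θ ≈ 0.7968, while any α ≤ 1 drives the iteration to 0, matching the (sub)critical regime.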
It remains to prove the corresponding upper bound. Lemma 3.31 gives us

	θε,n ≤ PTn (|C (∅)| ≥ εn) ≤ PTn (|C (∅)| = ∞) + PTn (εn ≤ |C (∅)| < ∞).	(3.114)

The first term on the right-hand side then converges to the solution to (3.73) and so
it suffices to show that the second term vanishes after we take the required limits.
To this end we fix r ≥ 1 and write, for n so large that εn ≥ r,

	PTn (εn ≤ |C (∅)| < ∞) ≤ PTn (r ≤ |C (∅)| < ∞).	(3.115)
The only n-dependence on the right-hand side is then through the offspring distri-
bution. As

	(n m) (α/n)m (1 − α/n)n−m −→ (αm /m!) e−α	as n → ∞,	m ≥ 0,	(3.117)

the offspring distribution converges to Poisson, pm (α) = (αm /m!) e−α , and so, by
Lemma 3.32, we have

	PTn (r ≤ |C (∅)| < ∞) −→ Pp(α) (r ≤ ∑`≥0 X` < ∞)	as n → ∞.	(3.118)
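The Poisson limit of the binomial offspring distribution is easy to check numerically; the sketch below is our own illustration (function names are ours) comparing the Binom(n, α/n) and Poisson(α) probability mass functions pointwise.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, m):
    """P(Binom(n, p) = m)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def poisson_pmf(alpha, m):
    """P(Poisson(alpha) = m)."""
    return alpha**m / factorial(m) * exp(-alpha)

def max_pmf_gap(n, alpha, m_max=10):
    """Largest pointwise gap between the two pmfs over m = 0, ..., m_max."""
    return max(abs(binom_pmf(n, alpha / n, m) - poisson_pmf(alpha, m))
               for m in range(m_max + 1))
```

For fixed α the gap shrinks roughly like 1/n, which is the quantitative content behind (3.117).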
The rest of the analysis has to be done depending on the value of α. Indeed, if α ≤ 1
— which corresponds to the (sub)critical regime — then the branching process dies
out with probability one and so the right-hand side tends to zero as r → ∞. For
α > 1, we can use duality (Theorem 3.12) to convert this to a similar question for
α < 1 and so the convergence to zero holds in this case as well. We conclude that,
after the required limits, θε,n is bounded by the maximal positive solution of (3.98).
This proves (3.97) with limes superior and thus finishes the proof.
The fact that all vertices of Kn look “the same” suggests that θ (α) actually represents
the fraction of vertices in components of macroscopic size. This is indeed the case,
but we will not try to prove it here. In fact, more is known:
Theorem 3.34 Given a realization of G (n, α/n), let C1 , C2 , . . . be the list of all connected
components ranked decreasingly by their size. Then we have:
(1) If α < 1, then |C1 | = O(log n), i.e., all components are at most logarithmic.
(2) If α > 1, then |C1 | = θ (α)n + o (n) and |C2 | = O(log n), i.e., there is a unique giant
component and all other components are of at most logarithmic size.
(3) If α = 1, then |C1 |, |C2 |, . . . are all of order n2/3 with a nontrivial limit distribution
of n−2/3 |C1 |, n−2/3 |C2 |, . . . .
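The dichotomy in parts (1) and (2) is easy to probe by simulation. The sketch below is entirely our own (nothing here is from the notes): it samples G(n, α/n) and computes the largest-component fraction with a union-find structure. For α = 2 the fraction should hover near θ(2) ≈ 0.80, while for α = 1/2 it is tiny.

```python
import random
from collections import Counter

def largest_component_fraction(n, alpha, seed=0):
    """Sample G(n, alpha/n) and return |C_1|/n, computed via union-find
    with path halving.  Monte Carlo illustration only."""
    rng = random.Random(seed)
    p = alpha / n
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):           # each pair is an edge w.p. alpha/n
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n
```

Even at modest n the giant component is unmistakable once α > 1, in line with Theorem 3.34(2).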
Case (3) is the most interesting because it corresponds to the critical behavior
we saw at p = pc on the regular tree. The regime actually extends over an entire
critical window, i.e., for values
	p = 1/n + λ/n4/3	(3.119)
where λ is any fixed real number. One of the key tools to analyze this regime is the
tree-search algorithm that we used at the beginning of this section.
Chapter 4
Percolation
Percolation was mentioned earlier in these notes in the context of infinite trees and
complete graphs. Here we will consider percolation on the hypercubic lattice Zd ,
or more generally on graphs with underlying spatial geometry. The main goal is
to prove the uniqueness of the infinite connected component by the famous Burton-
Keane uniqueness argument.
4.1 Percolation transition

The goal of this section is to establish the existence of a unique percolation threshold.
The argument is based on monotonicity in p. Intuitively it is clear that increasing p
will result in a larger number of occupied edges, which in turn means that the
graph is more likely to contain a large connected component. To make this more
precise, we will couple percolation for all p’s on one probability space and exhibit
the monotonicity explicitly.
Lemma 4.1 [Coupling of all p’s] Consider a graph G = (V, E) and consider a family
of i.i.d. uniform random variables Ue indexed by the edges e ∈ E. For each p ∈ [0, 1] define
	ωe(p) = 1 if Ue ≤ p,	ωe(p) = 0 if Ue > p.	(4.1)

Then (ωe(p)) are Bernoulli with parameter p and the graph with vertices V and edges
E(p) = {e ∈ E : ωe(p) = 1} has the law of bond percolation on G with parameter p.
Proof. The random variables ωe(p) are independent because the Ue ’s are indepen-
dent. As

	P(ωe(p) = 1) = P(Ue ≤ p) = p,	(4.2)

the distribution of (ωe(p)) is Bernoulli with parameter p.
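The coupling of Lemma 4.1 is a one-liner in code. Below is our own minimal sketch (function name ours): a single family of uniforms drives the configurations for every p, and monotonicity of the occupied-edge sets in p is immediate.

```python
import random

def coupled_configurations(edges, ps, seed=0):
    """Realize bond percolation at several parameters p on a common
    probability space: omega_e^(p) = 1 iff U_e <= p, as in (4.1)."""
    rng = random.Random(seed)
    U = {e: rng.random() for e in edges}
    return {p: {e: int(U[e] <= p) for e in edges} for p in ps}

# a small path graph; raising p can only add occupied edges
edges = [(i, i + 1) for i in range(10)]
cfg = coupled_configurations(edges, [0.3, 0.6, 0.9], seed=1)
assert all(cfg[0.3][e] <= cfg[0.6][e] <= cfg[0.9][e] for e in edges)
```

This pointwise monotonicity is exactly what makes p ↦ θ(p) increasing below.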
The above allows us to establish the monotonicity of the quantity
	θ (p) = Pp (|Cω (∅)| = ∞).	(4.3)
Figure 4.1: A sample from bond percolation on Z2 with p = 0.65. Only the sites
that have a connection via occupied edges to the boundary of the 50 × 50 box are
depicted; the others are suppressed.
Note that, unlike for the regular tree Tb , we are not making any claim about the
value of θ ( p) at p = pc . It is expected that θ ( pc ) = 0 in all d ≥ 2 but proofs exist
only for d = 2 and d ≥ 19. This is one of the most annoying basic open problems in
percolation theory.
Exercise 4.3 Show that pc = 1 in d = 1.
Theorem 4.4 [Non-triviality of percolation threshold] For all d ≥ 2,

	1/(2d − 1) ≤ pc (d) ≤ 2/3.	(4.4)

Proof of pc ≥ 1/(2d − 1). If |Cω (∅)| = ∞, then there are arbitrarily long
occupied edge-self-avoiding paths from ∅ “to infinity.” Let An be the event that
there exists such a path of length at least n and let cn (d) be the number of such
paths of length n starting from the origin. Then

	Pp (An ) ≤ cn (d) pn .	(4.7)
In order to bound the right-hand side, we need to estimate cn (d). We can do this as
follows: The path has 2d ways to leave the origin. In every next step, there are at
most 2d − 1 choices to go because the path cannot immediately “backtrack” to the
previous vertex. This implies

	cn (d) ≤ 2d (2d − 1)n−1 .	(4.8)

Hence, once p < 1/(2d − 1), we have Pp (An ) → 0 as n → ∞ and, since the event
{|Cω (∅)| = ∞} is contained in An for every n, we get θ (p) = 0. This shows
pc ≥ 1/(2d − 1).
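The non-backtracking bound on cn(d) can be sanity-checked against the true self-avoiding-walk counts for small n. The brute-force counter below is our own illustration (exponential time, so only tiny n are feasible).

```python
def count_saws(n, d=2):
    """Count self-avoiding walks of length n from the origin in Z^d by
    brute-force depth-first enumeration.  Only feasible for small n."""
    steps = []
    for i in range(d):
        for s in (1, -1):
            e = [0] * d
            e[i] = s
            steps.append(tuple(e))

    def extend(x, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for s in steps:
            y = tuple(a + b for a, b in zip(x, s))
            if y not in visited:
                visited.add(y)
                total += extend(y, visited, remaining - 1)
                visited.remove(y)
        return total

    origin = (0,) * d
    return extend(origin, {origin}, n)
```

In d = 2 this gives c1 = 4, c2 = 12, c3 = 36, c4 = 100 against the bound 4 · 3^(n−1) = 4, 12, 36, 108: equality up to n = 3 and strict inequality from n = 4 on, because the shortest loop in Z2 has length four.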
For the upper bound it will be convenient to consider the shifted lattice
(Z2 )∗ := Z2 + (1/2, 1/2), i.e., the grid Z2 shifted by half lattice spacing in each
direction. This graph has the property that each of its edges crosses exactly one
edge in Z2 and vice versa. For this reason we will refer to (Z2 )∗ as the dual graph
and to the edge of (Z2 )∗ crossing edge e of Z2 as the dual edge.
Proof of pc < 1. We will prove that in all d ≥ 2,
	pc ≤ 2/3.	(4.11)
Figure 4.2: A connected component of occupied edges and a contour on the dual
lattice bisecting (some of) the vacant edges adjacent to this component.
Given a contour Γ — a closed, non-self-crossing path in the dual lattice — let BΓ be
the event that every edge of Z2 crossed by Γ is vacant. If |Cω (v)| < ∞, then BΓ
occurs for some contour Γ with v ∈ Int Γ, where Int Γ denotes the set of vertices
of Z2 surrounded by Γ and |Γ| denotes the number of edges in Γ. Now

	Pp (BΓ ) ≤ (1 − p)|Γ|	(4.13)

and, since the number of contours of length n surrounding a given vertex grows at
most like n 3n , summing over all contours Γ with Int Γ ⊇ [−N, N ]2 — which forces
|Γ| ≥ 8N — yields

	Pp (∀v ∈ [−N, N ]2 : |Cω (v)| < ∞) ≤ ∑n≥8N n 3n (1 − p)n	(4.14)

for all N ≥ 1. Once 3(1 − p) < 1, the series on the right-hand side is absolutely
summable and the N → ∞ limit is thus zero. Therefore, the complementary event,
{∃v : |Cω (v)| = ∞}, has probability one.
Historically, another critical value of p was introduced to characterize the values
of p for which the mean cluster size is finite. Let, as before,

	χ(p) := Ep |Cω (∅)|.	(4.15)

The coupling of all p’s implies that p 7→ χ(p) is increasing and so it makes sense to
define the critical value

	πc = sup{p ≥ 0 : χ(p) < ∞}.	(4.16)
Theorem 4.5 [Sharpness of the transition] For bond percolation on Zd , πc = pc .

This very simple but deep theorem — whose validity we already checked for the reg-
ular tree — has profound consequences. One of them is the following observation
due to Hammersley. Consider the two-point connectivity function,
	τp (x, y) = Pp (x ∈ Cω (y)).	(4.17)
Despite the apparent lack of symmetry on the right-hand side, we clearly have
τp ( x, y) = τp (y, x ) because the event boils down to having an occupied path be-
tween x and y.
Theorem 4.6 Suppose χ(p) < ∞. Then there exists c > 0 such that

	τp (x, y) ≤ e−c|x−y| ,	x, y ∈ Zd .	(4.18)
Combined with the previous result, this says that the two-point connectivities de-
cay exponentially throughout the entire subcritical regime. In other words, the
result rules out the existence of a so-called intermediate phase, which would be charac-
terized by absence of percolation and yet non-exponential decay of connectivities.
We will prove this theorem once we have introduced the BK-inequality.
Exercise 4.7 Prove Theorem 4.5 in d = 1.
This implies

	Pp (RL ≥ n) ≤ ∑x∈BL ∑y∈Zd : |y−x|≥n τp (x, y).	(4.20)

From Theorem 4.6 we know that τp (x, y) ≤ e−c|x−y| and a calculation shows that
the right-hand side then decays exponentially in n.
4.2 Uniqueness of infinite component

We will invoke some useful “abstract nonsense” theorems from probability theory.
The first one goes back to Kolmogorov:
Definition 4.10 [Tail event] Let (ηj ) be random variables indexed by j ∈ N. We say
that an event A is a tail event if the following holds: For any η ∈ A and any k ≥ 1, any η′
such that ηj′ = ηj for j ≥ k obeys η′ ∈ A.
In other words, A is a tail event if changing any finite number of the random vari-
ables ηj won’t affect the containment in A.
Theorem 4.11 [Kolmogorov’s Zero-One Law] Suppose the (ηj ) are i.i.d. Then every tail
event has probability either zero or one.
This law may be deemed responsible for many limit theorems in probability theory.
E.g., it implies that the event that limn→∞ (X1 + · · · + Xn )/n exists and equals EX1 is
a tail event and so, for i.i.d. Xj ’s, it occurs with probability zero or one.
Exercise 4.12 Verify this statement.
Corollary 4.13 Let N be the number of infinite connected components in bond percolation
on Zd . Then the events {N = 0}, {1 ≤ N < ∞} and {N = ∞} are tail events and so
exactly one of them occurs with probability one and the others with probability zero.
Proof. Tail events are not affected by a change of any finite number of edges. It is
easy to check that if there is no infinite connected component, then a change of any
finite number of edges will not introduce one. Similarly, if there are infinitely many
infinite components, changing any finite number of edges is not going to destroy
more than a finite number of them. So { N = 0} and { N = ∞} are tail events. The
event {1 ≤ N < ∞}, being the complement of { N = 0} ∪ { N = ∞}, is then a tail
event as well.
Note that, for different p, a different event may be the one with probability one.
E.g., for p < pc we definitely have Pp (N = 0) = 1 while for p > pc we have Pp (N >
0) = 1. The latter follows because, as Pp (N > 0) is positive and {N > 0} is a tail
event, it must already have probability one.
Our other technical “abstract nonsense” tool will be the Ergodic Theorem. Con-
sider a collection of random variables (ηx ) indexed by vertices in Zd . We define
the translation by z to be the function Tz that acts on the η’s as
( Tz η ) x = ηx+z , x ∈ Zd . (4.24)
In other words, Tz shifts η in such a way that the coordinate ηz lands at the origin. If
f (η ) is a function of these random variables, then f ◦ Tz denotes the function such
that ( f ◦ Tz )(η ) = f ( Tz (η )).
Definition 4.14 [Shift invariant events] We say that an event A is shift invariant if
for each η ∈ A,
Tz (η ) ∈ A holds for all z ∈ Zd .	(4.25)
Theorem 4.15 [Spatial Ergodic Theorem for Bernoulli] Let (ηx ) be i.i.d. random
variables indexed by x ∈ Zd and let f = f (η ) be a function thereof. Suppose that
E| f (η )| < ∞. Then, with probability one,
	limn→∞ (2n + 1)−d ∑x : |x|≤n f ◦ Tx = E f (η ).	(4.26)
This theorem has a number of proofs all of which go somewhat beyond the scope
of these notes. However, one of the possible lines of attack actually gets very close:
Problem 4.16 Suppose d = 1 for simplicity and let f = f (η ) be a function that
depends only on finitely many coordinates. Use the Strong Law of Large Numbers
to show that (4.26) holds. Note: From here Theorem 4.15 follows by approximating
every integrable function by functions of this type.
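In the spirit of Problem 4.16, here is our own d = 1 numerical illustration of (4.26) for the finitely-supported function f (η) = η0 η1 with i.i.d. Bernoulli(p) variables: the spatial average approaches E f = p² (all names below are ours).

```python
import random

def spatial_average(n, p=0.5, seed=0):
    """Average of f composed with T_x over |x| <= n for f(eta) = eta_0*eta_1,
    with (eta_x) i.i.d. Bernoulli(p).  Illustrates (4.26) in d = 1."""
    rng = random.Random(seed)
    eta = {x: int(rng.random() < p) for x in range(-n, n + 2)}
    total = sum(eta[x] * eta[x + 1] for x in range(-n, n + 1))
    return total / (2 * n + 1)
```

For p = 1/2 the limiting value is E[η0 η1 ] = 1/4, and the empirical average is already close for moderate n.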
Let us see how bond percolation on Zd fits into the above framework. For each ver-
tex z ∈ Zd , we have a collection of d independent random variables ωe1 (z) , . . . , ωed (z)
where e1 (z), . . . , ed (z) denote the edges with one endpoint at z that are oriented in
the positive coordinate direction. We then set
η z = ( ω e1 ( z ) , . . . , ω e d ( z ) ) . (4.27)
As each ωe appears in exactly one ηz , these variables are i.i.d. and they encode the
entire percolation configuration. Going back to the question about the number of
infinite connected components, we can thus draw the following immediate conse-
quence of Theorem 4.15:
Corollary 4.17 Consider bond percolation on Zd with parameter p ∈ [0, 1]. Then there
exists k ∈ {0, 1, . . . } ∪ {∞} such that P p ( N = k ) = 1.
Proof. The event { N = k } is translation invariant because shifting the ω’s will not
affect the component structure of the resulting random graph. Hence P p ( N = k )
is either zero or one. But the union of these events over k ∈ {0} ∪ N ∪ {∞} is
everything and so one of these events actually must occur with probability one.
Again, which k is the “lucky one” depends on the value of p. Next we observe:
Lemma 4.18 Suppose that 0 < p < 1. If P p ( N = k ) > 0 for some k ≥ 2, then we also
have P p ( N ≤ k − 1) > 0.
Proof. Without loss of generality assume k < ∞. If the full graph contains k infinite
components with positive probability, then there exists n such that the box Λn =
[−n, n]d ∩ Zd intersects at least two of them with positive probability. But every
configuration in the box Λn has positive probability and, in particular, so does the
one where all edges with both endpoints in Λn are occupied. In this configuration,
all infinite components intersecting Λn become connected and, as there were at
least two such components, in the new configuration we have one less than origi-
nally. It follows that Pp (N ≤ k − 1) > 0 as claimed.
Corollary 4.19 [Zero, One or Infinity] Consider bond percolation on Zd with param-
eter p. Then exactly one of the events { N = 0}, { N = 1} or { N = ∞} has probability
one and the others have probability zero.
Proof. The previous lemma shows that {N = k} cannot have probability one for
2 ≤ k < ∞. So, by Corollary 4.17, each such event must have probability zero.
Finally, we nail things down even further:
Theorem 4.20 [Burton-Keane’s Uniqueness Theorem] Consider bond percolation
on Zd with parameter p. Then P p ( N = ∞) = 0.
Proof of Theorem 4.20. Call a vertex v an encounter point if v lies in an infinite clus-
ter and the removal of v splits this cluster into at least three infinite components.
The first step of the proof is to realize that, if Pp (N = ∞) = 1, then the origin
has a positive probability to be an encounter point. Indeed, for a sufficiently large
box Λn we will find at least three distinct infinite clusters that intersect Λn with a
positive probability. We can connect these clusters by paths to
the origin and set all other edges in Λn to vacant. This will make the origin an
encounter point.
Let E(v) be the event that v is an encounter point. The Ergodic Theorem now
implies that if
	q := Pp (E(v)) > 0,	(4.28)

then the box Λn will, with probability tending to one, contain at least (q/2)|Λn | en-
counter points.
We will show that this leads to a contradiction: Label encounter points in Λn in
some fashion, say, v1 , . . . , vm , and for each vi define a distinct vertex wi , called an
exit point, on the boundary of Λn as follows: For v1 pick a vertex in one of the
components meeting at v1 . For v2 do the same noting that even if v2 lies on a path
between v1 and w1 , the fact that v2 is an encounter point implies the existence of another
vertex on the boundary that is not on this path. This applies for each vi because
even if it lies on a path between some v j , j < i, and wk , k < i, there will always be
another path from vi to the boundary that has not yet been used.
We conclude that, given we have m ≥ (q/2)|Λn | encounter points inside Λn , we can
find as many distinct exit points on ∂Λn . That would imply

	|∂Λn |/|Λn | ≥ q/2,	(4.29)
which is a contradiction for n sufficiently large as |∂Λn | grows like nd−1 while |Λn |
grows like nd . Hence we could not have P p ( N = ∞) > 0 to begin with and so, by
Corollary 4.19, either Pp (N = 0) = 1 or Pp (N = 1) = 1.
Exercise 4.22 Derive a uniform lower bound on the probability that, under the
condition (4.28), the box Λn will contain at least (q/2)|Λn | encounter points. Use that
the expected number of encounter points in this box is q|Λn |.