
BRANCHING PROCESSES

1. GALTON-WATSON PROCESSES

Galton-Watson processes were introduced by Francis Galton in 1889 as a simple mathematical model for the propagation of family names. They were reinvented by Leo Szilard in the late 1930s as models for the proliferation of free neutrons in a nuclear fission reaction. Generalizations of the extinction probability formulas that we shall derive below played a role in the calculation of the critical mass of fissionable material needed for a sustained chain reaction. Galton-Watson processes continue to play a fundamental role in both the theory and applications of stochastic processes.
First, an informal description: A population of individuals (which may represent people, organisms, free neutrons, etc., depending on the context) evolves in discrete time $n = 0, 1, 2, \dots$ according to the following rules. Each $n$th generation individual produces a random number (possibly 0) of individuals, called offspring, in the $(n+1)$st generation. The offspring counts $\xi_\alpha, \xi_\beta, \xi_\gamma, \dots$ for distinct individuals $\alpha, \beta, \gamma, \dots$ are mutually independent, and also independent of the offspring counts of individuals from earlier generations. Furthermore, they are identically distributed, with common distribution $\{p_k\}_{k \ge 0}$. The state $Z_n$ of the Galton-Watson process at time $n$ is the number of individuals in the $n$th generation.
More formally,

Definition 1. A Galton-Watson process $\{Z_n\}_{n \ge 0}$ with offspring distribution $F = \{p_k\}_{k \ge 0}$ is a discrete-time Markov chain taking values in the set $\mathbb{Z}_+$ of nonnegative integers whose transition probabilities are as follows:

(1)    $P\{Z_{n+1} = k \mid Z_n = m\} = p_k^{*m}.$

Here $\{p_k^{*m}\}$ denotes the $m$th convolution power of the distribution $\{p_k\}$. In other words, the conditional distribution of $Z_{n+1}$ given that $Z_n = m$ is the distribution of the sum of $m$ i.i.d. random variables each with distribution $\{p_k\}$. The default initial state is $Z_0 = 1$.

Construction: A Galton-Watson process with offspring distribution $F = \{p_k\}_{k \ge 0}$ can be built on any probability space that supports an infinite sequence of i.i.d. random variables all with distribution $F$. Assume that these are arranged in a doubly infinite array, as follows:

$\xi^1_1, \xi^1_2, \xi^1_3, \cdots$
$\xi^2_1, \xi^2_2, \xi^2_3, \cdots$
$\xi^3_1, \xi^3_2, \xi^3_3, \cdots$
etc.

Set $Z_0 = 1$, and inductively define

(2)    $Z_{n+1} = \sum_{i=1}^{Z_n} \xi^{n+1}_i.$

The independence of the random variables $\xi^n_i$ guarantees that the sequence $(Z_n)_{n \ge 0}$ has the Markov property, and that the conditional distributions satisfy equation (1).
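The inductive definition (2) translates directly into a simulation. The following Python sketch is an illustration only, not part of the construction above: the function name `simulate_galton_watson` and the example offspring distribution are assumptions chosen for the example, and the offspring distribution is assumed to have finite support so that it can be passed as a list $[p_0, p_1, \dots]$.

```python
import random

def simulate_galton_watson(offspring_pmf, generations, rng):
    """Return [Z_0, ..., Z_generations] built by the recursion (2).

    offspring_pmf[k] is p_k; finite support is an assumption of this sketch.
    """
    offspring_values = list(range(len(offspring_pmf)))
    sizes = [1]  # default initial state Z_0 = 1
    for _ in range(generations):
        z = sizes[-1]
        if z == 0:
            sizes.append(0)  # state 0 is absorbing
            continue
        # Z_{n+1} is the sum of Z_n i.i.d. draws from the offspring distribution.
        draws = rng.choices(offspring_values, weights=offspring_pmf, k=z)
        sizes.append(sum(draws))
    return sizes

rng = random.Random(0)
pmf = [0.25, 0.25, 0.5]  # hypothetical offspring distribution p_0, p_1, p_2
path = simulate_galton_watson(pmf, 10, rng)
print(path)
```

Each call to `rng.choices` plays the role of one row of the doubly infinite array: it supplies exactly as many offspring counts as there are individuals in the current generation.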

For certain choices of the offspring distribution $F$, the Galton-Watson process isn't very interesting. For example, if $F$ is the probability distribution that puts mass 1 on the integer 17, then the evolution of the process is purely deterministic:
$Z_n = 17^n$ for every $n \ge 0$.
Another uninteresting case is when $F$ has the form
$p_0 p_1 > 0$ and $p_0 + p_1 = 1$.
In this case the population remains at its initial size $Z_0 = 1$ for a random number of steps with a geometric distribution, then jumps to 0, where it remains forever. (Observe that for any Galton-Watson process, with any offspring distribution, the state 0 is an absorbing state.) To avoid having to consider these uninteresting cases separately in every result to follow, we make the following standing assumption:
Assumption 1. The offspring distribution is not a point mass (that is, there is no $k \ge 0$ such that $p_k = 1$), and it places positive probability on some integer $k \ge 2$. Furthermore, the offspring distribution has finite mean $\mu > 0$ and finite variance $\sigma^2 > 0$.

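1.1. First Moment Calculation. The inductive definition (2) allows a painless calculation of the means $E Z_n$. Since the random variables $\xi^{n+1}_i$ are independent of $Z_n$,

$$\begin{aligned}
E Z_{n+1} &= \sum_{m=0}^{\infty} E\left[\left(\sum_{i=1}^{m} \xi^{n+1}_i\right) 1\{Z_n = m\}\right] \\
&= \sum_{m=0}^{\infty} E\left(\sum_{i=1}^{m} \xi^{n+1}_i\right) P\{Z_n = m\} \\
&= \sum_{m=0}^{\infty} m \mu \, P\{Z_n = m\} \\
&= \mu E Z_n.
\end{aligned}$$

Since $E Z_0 = 1$, it follows that

(3)    $E Z_n = \mu^n.$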

Corollary 1. If $\mu < 1$ then with probability one the Galton-Watson process dies out eventually, i.e., $Z_n = 0$ for all but finitely many $n$. Furthermore, if $\tau = \min\{n : Z_n = 0\}$ is the extinction time, then
$$P\{\tau > n\} \le \mu^n.$$

Proof. The event $\{\tau > n\}$ coincides with the event $\{Z_n \ge 1\}$. By Markov's inequality,

$$P\{Z_n \ge 1\} \le E Z_n = \mu^n.$$

Since $\mu < 1$, the right side tends to 0 as $n \to \infty$, so $P\{\tau = \infty\} = 0$.

1.2. Recursive Structure and Generating Functions. The Galton-Watson process $Z_n$ has a simple recursive structure that makes it amenable to analysis by generating function methods. Each of the first-generation individuals $\alpha, \beta, \gamma, \dots$ behaves independently of the others; moreover, all of its descendants (the offspring of the offspring, etc.) behave independently of the descendants of the other first-generation individuals. Thus, each of the first-generation individuals engenders an independent copy of the Galton-Watson process. It follows that a Galton-Watson process is obtained by attaching to the single individual in the 0th generation $Z_1$ (conditionally) independent copies of the Galton-Watson process. The recursive structure leads to a simple set of relations among the probability generating functions of the random variables $Z_n$:

Proposition 2. Denote by $\varphi_n(t) = E t^{Z_n}$ the probability generating function of the random variable $Z_n$, and by $\varphi(t) = \sum_{k=0}^{\infty} p_k t^k$ the probability generating function of the offspring distribution. Then $\varphi_n$ is the $n$-fold composition of $\varphi$ by itself, that is,

(4)    $\varphi_0(t) = t$ and
(5)    $\varphi_{n+1}(t) = \varphi(\varphi_n(t)) = \varphi_n(\varphi(t))$ for all $n \ge 0$.

Proof. There are two ways to proceed, both simple. The first uses the recursive structure directly to deduce that $Z_{n+1}$ is the sum of $Z_1$ conditionally independent copies of $Z_n$. Thus,

$$\varphi_{n+1}(t) = E t^{Z_{n+1}} = E \varphi_n(t)^{Z_1} = \varphi(\varphi_n(t)).$$

The second argument relies on the fact that the generating function of the $m$th convolution power $\{p_k^{*m}\}$ is the $m$th power of the generating function $\varphi(t)$ of $\{p_k\}$. Thus,

$$\begin{aligned}
\varphi_{n+1}(t) = E t^{Z_{n+1}} &= \sum_{k=0}^{\infty} E(t^{Z_{n+1}} \mid Z_n = k) P(Z_n = k) \\
&= \sum_{k=0}^{\infty} \varphi(t)^k P(Z_n = k) \\
&= \varphi_n(\varphi(t)).
\end{aligned}$$

By induction on $n$, this is the $(n+1)$st iterate of the function $\varphi(t)$.
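The composition rule (4)-(5) is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the helper names `pgf` and `iterate_pgf` and the example offspring distribution are hypothetical, and the offspring distribution is assumed to have finite support.

```python
def pgf(pmf, t):
    """Evaluate phi(t) = sum_k p_k t^k by Horner's rule (pmf[k] is p_k)."""
    value = 0.0
    for p in reversed(pmf):
        value = value * t + p
    return value

def iterate_pgf(pmf, n, t):
    """Evaluate phi_n(t), the n-fold composition of phi, using (4) and (5)."""
    value = t  # phi_0(t) = t
    for _ in range(n):
        value = pgf(pmf, value)  # one more application of phi
    return value

pmf = [0.25, 0.25, 0.5]  # hypothetical offspring distribution p_0, p_1, p_2
for n in range(6):
    # phi_n(0) is the probability that generation n is empty.
    print(n, iterate_pgf(pmf, n, 0.0))
```

Evaluating at $t = 0$ gives the probabilities $P\{Z_n = 0\}$, which form a nondecreasing sequence.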

Problem 1. (A) Show that if the mean offspring number $\mu := \sum_k k p_k < \infty$ then the expected size of the $n$th generation is $E Z_n = \mu^n$. (B) Show that if the variance $\sigma^2 = \sum_k (k - \mu)^2 p_k < \infty$ then the variance of $Z_n$ is finite, and give a formula for it.

Properties of the Generating Function $\varphi(t)$: Assumption 1 guarantees that $\varphi(t)$ is not a linear function, because the offspring distribution puts mass on some integer $k \ge 2$. Thus, $\varphi(t)$ has the following properties:

(A) $\varphi(t)$ is strictly increasing for $0 \le t \le 1$.
(B) $\varphi(t)$ is strictly convex, with strictly increasing first derivative.
(C) $\varphi(1) = 1$.

1.3. Extinction Probability. If for some $n$ the population size $Z_n = 0$ then the population size is 0 in all subsequent generations. In such an event, the population is said to be extinct. The first time that the population size is 0 (formally, $\tau = \min\{n : Z_n = 0\}$, or $\tau = \infty$ if there is no such $n$) is called the extinction time. The most obvious and natural question concerning the behavior of a Galton-Watson process is: What is the probability $P\{\tau < \infty\}$ of extinction?
Proposition 3. The probability of extinction is the smallest nonnegative root $t = \zeta$ of the equation
(6)    $\varphi(t) = t.$

Proof. The key idea is recursion. Consider what must happen in order for the event $\tau < \infty$ of extinction to occur: Either (a) the single individual alive at time 0 has no offspring; or (b) each of its offspring must engender a Galton-Watson process that reaches extinction. Possibility (a) occurs with probability $p_0$. Conditional on the event that $Z_1 = k$, possibility (b) occurs with probability $\zeta^k$. Therefore,

$$\zeta = p_0 + \sum_{k=1}^{\infty} p_k \zeta^k = \varphi(\zeta),$$

that is, the extinction probability $\zeta$ is a root of the Fixed-Point Equation (6).
There is an alternative proof that $\zeta = \varphi(\zeta)$ that uses the iteration formula (5) for the probability generating function of $Z_n$. Observe that the probability of the event $Z_n = 0$ is easily recovered from the generating function $\varphi_n(t)$:
$$P\{Z_n = 0\} = \varphi_n(0).$$
By the nature of the Galton-Watson process, these probabilities are nondecreasing in $n$, because if $Z_n = 0$ then $Z_{n+1} = 0$. Therefore, the limit $\zeta := \lim_{n \to \infty} \varphi_n(0)$ exists, and its value is the extinction probability for the Galton-Watson process. The limit $\zeta$ must be a root of the Fixed-Point Equation, because by the continuity of $\varphi$,

$$\begin{aligned}
\varphi(\zeta) &= \varphi\left(\lim_{n \to \infty} \varphi_n(0)\right) \\
&= \lim_{n \to \infty} \varphi(\varphi_n(0)) \\
&= \lim_{n \to \infty} \varphi_{n+1}(0) \\
&= \zeta.
\end{aligned}$$

Finally, it remains to show that $\zeta$ is the smallest nonnegative root of the Fixed-Point Equation. Let $r$ be any nonnegative root. The claim follows from the monotonicity of the probability generating functions $\varphi_n$: since $r \ge 0$,
$$\varphi_n(0) \le \varphi_n(r) = r.$$
Taking the limit of each side as $n \to \infty$ reveals that $\zeta \le r$.
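The alternative proof suggests a numerical recipe: iterate $\varphi$ starting from 0 until the values stabilize. The sketch below is one possible way to do this, not a prescribed method; the function name `extinction_probability`, the tolerance, the iteration cap, and the example distributions are all assumptions of the illustration. In the critical case the iteration converges slowly, so the returned value may only be approximate when the iteration cap is reached.

```python
def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Approximate zeta = lim phi_n(0) by fixed-point iteration from 0.

    pmf[k] is p_k (finite support assumed). Because phi_n(0) increases to the
    smallest nonnegative root of phi(t) = t, starting at 0 targets that root.
    """
    zeta = 0.0
    for _ in range(max_iter):
        nxt = sum(p * zeta**k for k, p in enumerate(pmf))  # phi(zeta)
        if abs(nxt - zeta) < tol:
            return nxt
        zeta = nxt
    return zeta

# Hypothetical supercritical example: phi(t) = 1/4 + t/4 + t^2/2 has mean 5/4.
print(extinction_probability([0.25, 0.25, 0.5]))
# Hypothetical subcritical example with mean 3/4.
print(extinction_probability([0.5, 0.25, 0.25]))
```

For the first example the equation $\varphi(t) = t$ reduces to $2t^2 - 3t + 1 = 0$, with roots $1/2$ and $1$, so the iteration should settle near $1/2$.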

It now behooves us to find out what we can about the roots of the Fixed-Point Equation (6). First, observe that there is always at least one nonnegative root, to wit, $t = 1$, because $\varphi(t)$ is a probability generating function. Furthermore, since Assumption 1 guarantees that $\varphi(t)$ is strictly convex, roots of equation (6) must be isolated. The next proposition asserts that there are either one or two roots in $[0, 1]$, depending on whether the mean number of offspring $\mu := \sum_k k p_k$ is greater than one.
Definition 2. A Galton-Watson process with mean offspring number $\mu$ is said to be supercritical if $\mu > 1$, critical if $\mu = 1$, or subcritical if $\mu < 1$.
Proposition 4. Unless the offspring distribution is the degenerate distribution that puts mass 1 at $k = 1$, the Fixed-Point Equation (6) has either one or two roots in $[0, 1]$. In the supercritical case, the Fixed-Point Equation has a unique root $t = \zeta$ less than one. In the critical and subcritical cases, the only root in $[0, 1]$ is $t = 1$.

Together with Proposition 3 this implies that extinction is certain (that is, has probability one) if and only if the Galton-Watson process is critical or subcritical. If, on the other hand, it is supercritical then the probability of extinction is $\zeta < 1$.

Proof. By assumption, the generating function $\varphi(t)$ is strictly convex, with strictly increasing first derivative and positive second derivative. Hence, if $\mu = \varphi'(1) \le 1$ then there cannot be a root $0 \le \zeta < 1$ of the Fixed-Point Equation $\zeta = \varphi(\zeta)$. This follows from the Mean Value Theorem, which implies that if $\zeta = \varphi(\zeta)$ and $1 = \varphi(1)$ then there would be a point $\zeta < \theta < 1$ where $\varphi'(\theta) = 1$; since $\varphi'$ is strictly increasing, this would force $\varphi'(1) > 1$, a contradiction.
Next, consider the case $\mu > 1$. If $p_0 = 0$ then the Fixed-Point Equation has roots $t = 0$ and $t = 1$, and because $\varphi(t)$ is strictly convex, there are no other positive roots. So suppose that $p_0 > 0$, so that $\varphi(0) = p_0 > 0$. Since $\varphi'(1) = \mu > 1$, Taylor's formula implies that $\varphi(t) < t$ for values of $t < 1$ sufficiently near 1. Thus, $\varphi(0) - 0 > 0$ and $\varphi(t_*) - t_* < 0$ for some $0 < t_* < 1$. By the Intermediate Value Theorem, there must exist $\zeta \in (0, t_*)$ such that $\varphi(\zeta) - \zeta = 0$. By strict convexity, $\varphi(t) - t$ has at most two zeros, so $\zeta$ and 1 are the only roots.
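As a worked illustration of Proposition 4 in the supercritical case, consider the offspring distribution $p_0 = 1/4$, $p_2 = 3/4$. This distribution is a hypothetical choice made for the example, not one taken from the text; it satisfies Assumption 1.

```latex
\begin{aligned}
\varphi(t) &= \tfrac{1}{4} + \tfrac{3}{4} t^2, \qquad \mu = \varphi'(1) = \tfrac{3}{2} > 1, \\
\varphi(t) = t &\iff 3t^2 - 4t + 1 = 0 \iff (3t - 1)(t - 1) = 0, \\
\zeta &= \tfrac{1}{3}.
\end{aligned}
```

The two roots are $1/3$ and $1$, and the smaller one is the extinction probability, in agreement with Proposition 3.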

1.4. Tail of the Extinction Time Distribution. For both critical and subcritical Galton-Watson processes extinction is certain. However, critical and subcritical Galton-Watson processes differ dramatically in certain respects, most notably in the distribution of the time to extinction. This is defined as follows:
(7)    $\tau = \min\{n \ge 1 : Z_n = 0\}.$
Proposition 5. Let $\{Z_n\}_{n \ge 0}$ be a Galton-Watson process whose offspring distribution $F$ has mean $\mu \le 1$ and variance $\sigma^2 < \infty$. Denote by $\tau$ the extinction time. Then
(A) If $\mu < 1$ then there exists $C = C_F \in (0, \infty)$ such that $P\{\tau > n\} \sim C \mu^n$ as $n \to \infty$.
(B) If $\mu = 1$ then $P\{\tau > n\} \sim 2/(\sigma^2 n)$ as $n \to \infty$.

Thus, in the subcritical case, the extinction time has an exponentially decaying tail, and hence finite moments of all orders. On the other hand, in the critical case the extinction time has infinite mean.

Proof. First note that $P\{\tau > n\} = P\{Z_n > 0\}$. Recall from the proof of Proposition 3 that $P\{Z_n = 0\} = \varphi_n(0)$; hence,
$$P\{\tau > n\} = 1 - \varphi_n(0).$$
This shows that the tail of the distribution is determined by the speed at which the sequence $\varphi_n(0)$ approaches 1. In the subcritical case, the graph of the generating function $\varphi(t)$ has slope $\mu < 1$ at $t = 1$, whereas in the critical case the slope is $\mu = 1$. It is this difference that accounts for the drastic difference in the rate of convergence.

Subcritical Case: Consider first the case where $\mu = \varphi'(1) < 1$. Recall from the proof of Proposition 3 that in this case the sequence $\varphi_n(0)$ increases and has limit 1. Thus, for $n$ large, $\varphi_n(0)$ will be near 1, and in this neighborhood the first-order Taylor series will provide a good approximation to $\varphi$. Consequently,

(8)
$$\begin{aligned}
1 - \varphi_{n+1}(0) &= 1 - \varphi(\varphi_n(0)) \\
&= 1 - \varphi(1 - (1 - \varphi_n(0))) \\
&= 1 - (1 - \varphi'(1)(1 - \varphi_n(0))) + O((1 - \varphi_n(0))^2) \\
&= \mu (1 - \varphi_n(0)) + O((1 - \varphi_n(0))^2).
\end{aligned}$$

If not for the remainder term, we would have an exact equality $1 - \varphi_{n+1}(0) = \mu(1 - \varphi_n(0))$, which could be iterated to give
$$1 - \varphi_n(0) = \mu^n (1 - \varphi_0(0)) = \mu^n.$$
This would prove the assertion (A). Unfortunately, the equalities are exact only in the special case where the generating function $\varphi(t)$ is linear. In the general case, the remainder term in the Taylor series expansion (8) must be accounted for.
Because the generating function $\varphi(t)$ is convex, with derivative $\varphi'(1) = \mu$, the error in the approximation (8) is negative: in particular, for some constant $0 < C < \infty$,
$$\mu(1 - \varphi_n(0)) - C(1 - \varphi_n(0))^2 \le 1 - \varphi_{n+1}(0) \le \mu(1 - \varphi_n(0)).$$
The upper bound implies that $1 - \varphi_n(0) \le \mu^n$ (repeat the iteration argument above, replacing equalities by inequalities!). Now divide through by $\mu(1 - \varphi_n(0))$ and write $C' = C/\mu$ to get

$$\begin{aligned}
1 - C'(1 - \varphi_n(0)) &\le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1 \quad \Longrightarrow \\
1 - C' \mu^n &\le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1.
\end{aligned}$$

Thus, successive ratios of the terms $\mu^{-n}(1 - \varphi_n(0))$ are exceedingly close to 1, the error decaying geometrically. Since these errors are summable, Weierstrass' Theorem on convergence of products implies that
$$\lim_{n \to \infty} \frac{\mu^{-n}(1 - \varphi_n(0))}{\mu^{0}(1 - \varphi_0(0))} = \lim_{n \to \infty} \mu^{-n}(1 - \varphi_n(0)) := C_F$$
exists and is positive.

Critical Case: Exercise. (See Problem 4 below.)
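Both rates in Proposition 5 can be observed numerically from the identity $P\{\tau > n\} = 1 - \varphi_n(0)$. The sketch below is an illustration only; the function name `survival_probabilities` and the two example offspring distributions (one subcritical, one critical) are assumptions made for the example.

```python
def survival_probabilities(pmf, n_max):
    """Return [P{tau > n} for n = 0, ..., n_max] via P{tau > n} = 1 - phi_n(0)."""
    q = 0.0  # phi_0(0) = 0
    out = [1.0]
    for _ in range(n_max):
        q = sum(p * q**k for k, p in enumerate(pmf))  # phi_{n+1}(0) = phi(phi_n(0))
        out.append(1.0 - q)
    return out

# Subcritical (hypothetical): mean mu = 3/4, so P{tau > n} / mu^n should level off.
pmf = [0.5, 0.25, 0.25]
mu = sum(k * p for k, p in enumerate(pmf))
surv = survival_probabilities(pmf, 40)
ratios = [s / mu**n for n, s in enumerate(surv)]
print(ratios[10], ratios[20], ratios[40])

# Critical (hypothetical): mean 1, variance 1/2, so n * P{tau > n} should approach 4.
crit_pmf = [0.25, 0.5, 0.25]
crit_surv = survival_probabilities(crit_pmf, 2000)
print(2000 * crit_surv[2000])
```

The ratios in the subcritical case decrease toward a positive constant, which plays the role of $C_F$; the critical case converges much more slowly, which is consistent with the $1/n$ tail.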

1.5. Asymptotic Growth Rate for Supercritical Galton-Watson Processes. It is not hard to see that if a Galton-Watson process $Z_n$ is supercritical (that is, the mean offspring number $\mu > 1$) then either $Z_n = 0$ eventually or $Z_n \to \infty$. Here is an informal argument for the case where $p_0 > 0$: Each time that $Z_n = K$, for some $K \ge 1$, there is chance $p_0^K$ that $Z_{n+1} = 0$. If somehow the process $Z_n$ were to visit the state $K$ infinitely many times, then it would have infinitely many chances to hit an event of probability $p_0^K$; but once it hits this event, it is absorbed in the state 0 and can never revisit state $K$. This argument can be made rigorous:
Problem 2. Prove that if a Markov chain has an absorbing state $z$, and if $x \ne z$ is a state such that $z$ is accessible from $x$, then $x$ is transient.

If $Z_n$ is supercritical, then it follows that with positive probability (namely, 1 minus the probability of extinction) $Z_n \to \infty$. How fast does it grow?
Theorem 6. There exists a nonnegative random variable $W$ such that
(9)    $\lim_{n \to \infty} Z_n / \mu^n = W$ almost surely.

If the offspring distribution has finite second moment$^1$ and $\mu > 1$ then the limit random variable $W$ is positive on the event that $Z_n \to \infty$.

Given the Martingale Convergence Theorem, the convergence (9) is easy; however, (9) is quite difficult to prove without martingales. In section 2 below, I will prove an analogous convergence theorem for a continuous-time branching process.

Footnote 1 (to Theorem 6). Actually, it is enough that $\sum_{k \ge 3} p_k k \log k < \infty$: this is the Kesten-Stigum theorem.

1.6. Problems.
Problem 3. Suppose that the offspring distribution is nondegenerate, with mean $\mu \ne 1$, and let $\zeta$ be the smallest positive root of the Fixed-Point Equation. (A) Show that if $\mu \ne 1$ then the root $\zeta$ is an attractive fixed point of $\varphi$, that is, $\varphi'(\zeta) < 1$. (B) Prove that for a suitable positive constant $C$,
$$\zeta - \varphi_n(0) \sim C \varphi'(\zeta)^n.$$
(Hence the term attractive fixed point.)
Problem 4. Suppose that the offspring distribution is nondegenerate, with mean $\mu = 1$. This is called the critical case. Suppose also that the offspring distribution has finite variance $\sigma^2$. (A) Prove that for a suitable positive constant $C$,
$$1 - \varphi_n(0) \sim C/n.$$
(B) Use the result of part (A) to conclude that the distribution of the extinction time has the following scaling property: for every $x > 1$,
$$\lim_{n \to \infty} P(\tau > nx \mid \tau > n) = C/x.$$

HINT for part (A): The Taylor series approximation to $\varphi(t)$ at $t = 1$ leads to the following approximate relationship, valid for large $n$:
$$1 - \varphi_{n+1}(0) \approx 1 - \varphi_n(0) - \tfrac{1}{2} \varphi''(1)(1 - \varphi_n(0))^2,$$
which at first does not seem to help, but on further inspection does. The trick is to change variables: if $x_n$ is a sequence of positive numbers that satisfies the recursion
$$x_{n+1} = x_n - b x_n^2$$
then the sequence $y_n := 1/x_n$ satisfies
$$y_{n+1} = y_n + b + b^2/y_n + \dots.$$
Problem 5. There's a Galton-Watson process in my random walk! Let $S_n$ be the simple nearest-neighbor random walk on the integers started at $S_0 = 1$. Define $T$ to be the time of the first visit to the origin, that is, the smallest $n \ge 1$ such that $S_n = 0$. Define $Z_0 = 1$ and
$$Z_k = \sum_{n=0}^{T-1} 1\{S_n = k \text{ and } S_{n+1} = k + 1\}.$$
In words, $Z_k$ is the number of times that the random walk $S_n$ crosses from $k$ to $k + 1$ before first visiting 0.

(A) Prove that the sequence $\{Z_k\}_{k \ge 0}$ is a Galton-Watson process, and identify the offspring distribution as a geometric distribution.

(B) Calculate the probability generating function of the offspring distribution, and observe that it is a linear fractional transformation. (See Ahlfors, Complex Analysis, ch. 1 for the definition and basic theory of LFTs. Alternatively, try the Wikipedia article.)

(C) Use the result of (B) to find out as much as you can about the distribution of $Z_k$.

(D) Show that $T = \sum_{k \ge 1} Z_k$ is the total number of individuals ever born in the course of the Galton-Watson process, and show that $\tau$ (the extinction time of the Galton-Watson process) is the maximum displacement $M$ from 0 attained by the random walk before its first return to the origin. What does the result of Problem 4, part (B), tell you about the distribution of $M$?

2. YULE'S BINARY FISSION PROCESS

2.1. Definition and Construction. The Yule process is a continuous-time branching model, in which individuals undergo binary fission at random times. It evolves as follows: Each individual, independently of all others and of the past of the process, waits an exponentially distributed time and then splits into two identical particles. (It is useful for the construction below to take the view that at each fission time the fissioning particle survives and creates one new clone of itself.) The exponential waiting times all have mean 1. Because the exponential random variables are mutually independent, the probability that two fissions will occur simultaneously is 0.
A Yule process started by 1 particle at time 0 can be built from independent Poisson processes as follows. Let $\{N_j(t)\}_{j \in \mathbb{N}}$ be a sequence of independent unit-rate Poisson counting processes. Since the interoccurrence times in a Poisson process are exponential-1, the jump times in the Poisson process $N_j(t)$ can be used as the fission times of the $j$th particle; at each such fission time, a new particle must be added to the population, and so a new Poisson process $N_k(t)$ must be activated. Thus, the time $T_m$ at which the $m$th fission occurs can be defined as follows: set $T_0 = 0$ and

(10)   $T_m = \min\{t > T_{m-1} : \sum_{j=1}^{m} (N_j(t) - N_j(T_{m-1})) = 1\}.$

Thus, $T_m$ is the first time after $T_{m-1}$ that one of the first $m$ Poisson processes jumps. The size $Z_t$ of the population at time $t$ is then
(11)   $Z_t = m$ for $T_{m-1} \le t < T_m$.
A similar construction can be given for a Yule process starting with $Z_0 = k \ge 2$ particles: just change the definition of the fission times $T_m$ to

(12)   $T_m = \min\{t > T_{m-1} : \sum_{j=1}^{m+k-1} (N_j(t) - N_j(T_{m-1})) = 1\}.$

Alternatively, a Yule process with $Z_0 = k$ can be obtained by superposing $k$ independent Yule processes $Z^j_t$, all with $Z^j_0 = 1$, that is,

(13)   $Z_t = \sum_{j=1}^{k} Z^j_t.$

Problem 6. Show that by suitably indexing the Poisson processes in the first construction (12) one can deduce the superposition representation (13).
Problem 7. Calculate the mean $E Z_t$ and variance $\mathrm{var}(Z_t)$ of the population size in a Yule process. For the mean you should get $E Z_t = e^t$. HINT: Condition on the time of the first fission.

2.2. Asymptotic Growth.

Theorem 7. Let $Z_t$ be the population size at time $t$ in a Yule process with $Z_0 = 1$. Then
(14)   $Z_t / e^t \xrightarrow{\text{a.s.}} W$
where $W$ has the unit exponential distribution.

The proof has two parts: First, it must be shown that $Z_t / e^t$ converges to something; and second, it must be shown that the limit random variable $W$ is exponentially distributed. The proof of almost sure convergence will be based on a careful analysis of the first passage times $T_m$ defined by (10). Convergence of $Z_t / e^t$ to a positive random variable $W$ is equivalent to convergence of $\log Z_t - t$ to a real-valued limit $\log W$. Since $Z_t$ is a counting process (that is, it is nondecreasing in $t$ and its only discontinuities are jumps of size 1), convergence of $\log Z_t - t$ is equivalent to showing that there exists a finite random variable $Y = -\log W$ such that
(15)   $\lim_{m \to \infty} (T_m - \log m) = Y.$
To accomplish this, we will use the following consequence of the construction (10).

Proposition 8. Let $T_m$ be the fission times in a Yule process $Z_t$ with $Z_0 = k$. Then the interoccurrence times $\tau_m := T_m - T_{m-1}$ are independent, exponentially distributed random variables with expectations $E \tau_m = 1/(m + k - 1)$.

Proof (Sketch). The random variable $T_m$ is the first time after $T_{m-1}$ at which one of the Poisson processes $N_j(t)$, for $1 \le j \le m + k - 1$, has a jump. Times between jumps in a Poisson process are exponentially distributed with mean 1, and jump times in independent Poisson processes are independent. Thus, the time until the next jump in $m$ independent Poisson processes is the minimum of $m$ independent exponentials, which is exponentially distributed with mean $1/m$.
This is not quite a complete argument, because the start times $T_m$ are random. However, it is not difficult (exercise!) to turn the preceding into a rigorous argument by integrating out over the possible values of $T_m$ and the possible choices for which Poisson processes jump at which times.
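Proposition 8 gives a convenient way to simulate the fission times without tracking individual Poisson processes: add up independent exponential waiting times with means $1/(m + k - 1)$. The sketch below is an illustration under that reading; the function name `yule_fission_times` and the sample sizes are assumptions of the example.

```python
import math
import random

def yule_fission_times(num_fissions, rng, k=1):
    """Return [T_1, ..., T_num_fissions] for a Yule process with Z_0 = k."""
    times, t = [], 0.0
    for m in range(1, num_fissions + 1):
        # Proposition 8: tau_m is exponential with mean 1/(m + k - 1),
        # i.e. rate m + k - 1 (expovariate takes the rate).
        t += rng.expovariate(m + k - 1)
        times.append(t)
    return times

rng = random.Random(7)
for _ in range(3):
    times = yule_fission_times(10_000, rng)
    # T_m - log m for a large m, one value per independent run.
    print(times[-1] - math.log(len(times)))
```

Different runs give different values of $T_m - \log m$, which is what one expects if the limit in (15) is a genuinely random variable.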

The family of exponential distributions is closed under scale transformations: In particular, if $Y$ is exponentially distributed with mean 1 and $\alpha > 0$ is a scalar, then $\alpha Y$ is exponentially distributed with mean $\alpha$. Since the variance $\mathrm{var}(Y)$ of a unit exponential is 1, it follows that the variance $\mathrm{var}(\alpha Y)$ of an exponential with mean $\alpha$ is $\alpha^2$. Consequently, if $\tau_m = T_m - T_{m-1}$ is the time between the $(m-1)$th and the $m$th fission times in a Yule process with $Z_0 = 1$, then

(16)   $E \tau_m = m^{-1}$ and $\mathrm{var}(\tau_m) = m^{-2}$,

and so

(17)   $E T_m = \sum_{k=1}^{m} k^{-1} \sim \log m$ and $\mathrm{var}(T_m) = \sum_{k=1}^{m} k^{-2} \to \zeta(2) < \infty$

as $m \to \infty$, where $\zeta(2) = \sum_{k=1}^{\infty} k^{-2}$. In particular, the variance of $T_m$ remains bounded as $m \to \infty$, and so the distribution of $T_m$ remains concentrated around $\log m$. In fact, $T_m - \log m$ converges, to a possibly random limit, by the following general result about random series of independent random variables:

Theorem 9. Let $X_j$ be independent random variables with mean $E X_j = 0$ and finite variances $\mathrm{var}(X_j) = \sigma_j^2$. Then

(18)   $\sum_{j=1}^{\infty} \sigma_j^2 := \sigma^2 < \infty \quad \Longrightarrow \quad \lim_{n \to \infty} \sum_{j=1}^{n} X_j := S$

exists and is finite with probability one, and the limit random variable $S$ has mean zero and variance $\sigma^2$.

A proof of Theorem 9, based on Wald's Second Identity, is given in section 3 below. Modulo this, we have proved (15) (apply the theorem to the centered interoccurrence times $X_j = \tau_j - 1/j$, and recall that $\sum_{j=1}^{m} 1/j - \log m$ converges), and hence that $W = \lim_{t \to \infty} Z_t / e^t$ exists and is finite and strictly positive with probability 1.

2.3. Characterization of the Exponential Distributions. It remains to show that the limit random variable $W$ is exponentially distributed with mean 1. For this, we appeal to self-similarity. Let $T = T_1$ be the time of the first fission. At this instant, two identical offspring particles are produced, each of which engenders its own Yule process. Thus,

(19)   $Z_t = 1$ if $t < T$, and
       $Z_t = Z'_{t-T} + Z''_{t-T}$ if $t \ge T$,

where $Z'_s$ and $Z''_s$ are independent Yule processes, independent of the fission time $T$, each started with $Z'_0 = Z''_0 = 1$ particle. Divide each side by $e^t$ and let $t \to \infty$ to get

(20)   $W = e^{-T}(W' + W'') = U(W' + W'')$

where $T$ is a unit exponential and $W', W''$ are independent replicas of $W$, both independent of $T$. Note that $U = e^{-T}$ is uniformly distributed on the unit interval.

Proposition 10. If $W$ is a positive random variable that satisfies the distributional equation (20) then $W$ has an exponential distribution. Conversely, there exist (on some probability space) independent unit exponential random variables $T, W', W''$ such that the random variable $W$ defined by (20) also has the unit exponential distribution.

Proof. The converse half is easy, given what we know about Poisson processes: Take a unit-intensity Poisson process $N_t$ and let $\nu$ be the time of the second occurrence. Then $\nu$ is the sum of two independent unit exponentials. Furthermore, we know that the time of the first occurrence is, conditional on $\nu$, uniformly distributed on the interval $[0, \nu]$. Thus, if we multiply $\nu$ by an independent uniform-[0,1] random variable $U$, we obtain a random variable $U\nu$ whose distribution coincides with that of the time of the first occurrence in a Poisson process, which is unit exponential. (Note: The random variable $U\nu$ so obtained is not the same as the time of first occurrence in $N_t$, but its distribution must be the same.)
The direct half is harder. I will show that if $W$ is a positive random variable that satisfies (20) then its Laplace transform

(21)   $\varphi(\theta) := E e^{-\theta W}$

must coincide with the Laplace transform of the exponential distribution with mean $1/\alpha$, for some value of $\alpha > 0$. By the Uniqueness Theorem for Laplace transforms, this will imply that $W$ has an exponential distribution. The strategy will be to take the Laplace transform of both sides of (20), and to split the expectation on the right side into two, one for the event $\{U < 1 - \varepsilon\}$ and the other for $\{U \ge 1 - \varepsilon\}$. Letting $\varepsilon \to 0$ will then lead to a first-order differential equation for $\varphi(\theta)$ whose only solutions coincide with Laplace transforms of exponential distributions.
The sordid details: equation (20) and the independence of $U, W', W''$ imply that for any $\varepsilon > 0$ and every $\theta > 0$,

$$\begin{aligned}
\varphi(\theta) = E e^{-\theta U (W' + W'')} &= \int_0^1 E e^{-\theta u (W' + W'')} \, du \\
&= \int_0^{1 - \varepsilon} E e^{-\theta u (W' + W'')} \, du + \int_{1 - \varepsilon}^1 E e^{-\theta u (W' + W'')} \, du \\
&= (1 - \varepsilon) \int_0^1 E e^{-\theta (1 - \varepsilon) u (W' + W'')} \, du + \int_{1 - \varepsilon}^1 E e^{-\theta u W'} E e^{-\theta u W''} \, du \\
&= (1 - \varepsilon) \varphi(\theta - \theta \varepsilon) + \int_{1 - \varepsilon}^1 \varphi(\theta u)^2 \, du.
\end{aligned}$$

Subtract $\varphi(\theta(1 - \varepsilon))$ from both sides and divide by $\varepsilon$ to get

$$\frac{\varphi(\theta) - \varphi(\theta - \theta \varepsilon)}{\varepsilon} = -\varphi(\theta - \theta \varepsilon) + \frac{1}{\varepsilon} \int_{1 - \varepsilon}^1 \varphi(\theta u)^2 \, du.$$

Now take $\varepsilon \to 0$ and use the continuity and boundedness of $\varphi(\theta)$ together with the Fundamental Theorem of Calculus to conclude that

(22)   $\theta \varphi'(\theta) = -\varphi(\theta) + \varphi(\theta)^2.$

It is easily checked that for any $\alpha > 0$ the Laplace transform $\varphi_\alpha(\theta) = \alpha/(\alpha + \theta)$ of the exponential distribution with mean $1/\alpha > 0$ is a solution of the differential equation (22). This gives a one-parameter family of solutions; by the uniqueness theorem for first-order ordinary differential equations, it follows that these are the only solutions.
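For completeness, here is the routine check that $\varphi_\alpha(\theta) = \alpha/(\alpha + \theta)$ satisfies (22); both sides reduce to the same expression.

```latex
\begin{aligned}
\theta \varphi_\alpha'(\theta)
  &= \theta \cdot \left( -\frac{\alpha}{(\alpha + \theta)^2} \right)
   = -\frac{\alpha \theta}{(\alpha + \theta)^2}, \\
-\varphi_\alpha(\theta) + \varphi_\alpha(\theta)^2
  &= \varphi_\alpha(\theta) \bigl( \varphi_\alpha(\theta) - 1 \bigr)
   = \frac{\alpha}{\alpha + \theta} \cdot \frac{-\theta}{\alpha + \theta}
   = -\frac{\alpha \theta}{(\alpha + \theta)^2}.
\end{aligned}
```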

3. CONVERGENCE OF RANDOM SERIES

This section is devoted to the proof of Theorem 9. Assume that $X_1, X_2, \dots$ are independent random variables with means $E X_j = 0$ and finite variances $\sigma_j^2 = E X_j^2$, and for each $n = 0, 1, 2, \dots$ set

(23)   $S_n = \sum_{j=1}^{n} X_j.$

Wald's Second Identity. For any bounded stopping time $T$,

(24)   $E S_T^2 = E \sum_{j=1}^{T} \sigma_j^2.$

Proof. Since $T$ is a stopping time, for any integer $k \ge 1$ the event $\{T \ge k\} = \{T > k - 1\}$ depends only on the random variables $X_i$ for $i < k$, and hence is independent of $X_k$. In particular, if $j < k$ then $E X_j X_k 1\{T \ge k\} = E X_j 1\{T \ge k\} \, E X_k = 0$. Now suppose that $T$ is a bounded stopping time; then $T \le m$ almost surely for some integer $m \ge 1$. Thus,

$$\begin{aligned}
E S_T^2 &= E \left( \sum_{k=1}^{m} X_k 1\{T \ge k\} \right)^2 \\
&= \sum_{k=1}^{m} E X_k^2 1\{T \ge k\} + 2 \sum_{1 \le j < k \le m} E X_j X_k 1\{T \ge k\} \\
&= \sum_{k=1}^{m} E X_k^2 1\{T \ge k\} \\
&= \sum_{k=1}^{m} E X_k^2 \, E 1\{T \ge k\} \\
&= \sum_{k=1}^{m} \sigma_k^2 \, E 1\{T \ge k\} \\
&= E \sum_{k=1}^{T} \sigma_k^2.
\end{aligned}$$

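Corollary 11. ($L^2$ Maximal Inequality) Assume that the total variance $\sigma^2 := \sum_{j=1}^{\infty} \sigma_j^2 < \infty$. Then for any $\alpha > 0$,

(25)   $P\{\sup_{n \ge 1} |S_n| \ge \alpha\} \le \sigma^2 / \alpha^2.$

Proof. Define $T$ to be the first $n$ such that $|S_n| \ge \alpha$, or $+\infty$ if there is no such $n$. The event of interest, that $\sup |S_n| \ge \alpha$, coincides with the event $\{T < \infty\}$. This in turn is the increasing limit of the events $\{T \le m\}$ as $m \to \infty$. Now for each finite $m$ the random variable $T \wedge m$ is a bounded stopping time, so Wald's Identity implies
$$E S_{T \wedge m}^2 = E \left( \sum_{j=1}^{T \wedge m} \sigma_j^2 \right) \le \sigma^2.$$

Hence,
$$\alpha^2 P\{T \le m\} \le E S_{T \wedge m}^2 1\{T \le m\} \le E S_{T \wedge m}^2 \le \sigma^2.$$
Letting $m \to \infty$ gives (25).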
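The Maximal Inequality can be checked by simulation on a concrete series. The sketch below is only an illustration: the choice $X_j = \pm 1/j$ with independent fair signs (so that $E X_j = 0$ and $\sigma_j^2 = 1/j^2$), the truncation at finitely many terms, the threshold, and the function names are all assumptions of the example.

```python
import random

def max_abs_partial_sum(num_terms, rng):
    """Return max over n <= num_terms of |S_n|, where X_j = +-1/j with fair signs."""
    s, running_max = 0.0, 0.0
    for j in range(1, num_terms + 1):
        s += rng.choice((-1.0, 1.0)) / j  # E X_j = 0, var(X_j) = 1/j^2
        running_max = max(running_max, abs(s))
    return running_max

def exceedance_frequency(alpha, num_terms, trials, rng):
    """Empirical frequency of the event {max_n |S_n| >= alpha}."""
    hits = sum(max_abs_partial_sum(num_terms, rng) >= alpha for _ in range(trials))
    return hits / trials

rng = random.Random(2024)
num_terms, alpha = 200, 2.0
total_variance = sum(1.0 / j**2 for j in range(1, num_terms + 1))
freq = exceedance_frequency(alpha, num_terms, 2000, rng)
bound = total_variance / alpha**2  # right side of (25) for the truncated series
print(freq, bound)
```

The empirical frequency should fall below the bound; the inequality is not sharp for this series, so a visible gap is expected.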


Convergence of Random Sequences: Strategy. Let $\{s_n\}_{n \ge 1}$ be a sequence of real (or complex) numbers. To show that the sequence $s_n$ converges, it suffices to prove that it is Cauchy; and for this, it suffices to show that for every $k \ge 1$ (or for all sufficiently large $k$) there exists an integer $n_k$ such that
(26)   $|s_{n_k} - s_n| \le 2^{-k}$ for all $n \ge n_k$.
Now suppose that the sequence $S_n$ is random. To prove that this sequence converges with probability one, it suffices to exhibit a sequence of integers $n_k$ such that the complements $G_k^c$ of the events
(27)   $G_k := \{|S_{n_k} - S_n| \le 2^{-k} \ \ \forall \, n \ge n_k\}$
occur only finitely many times, with probability one. For this, it is enough to prove that

(28)   $\sum_{k=1}^{\infty} P(G_k^c) = E \sum_{k=1}^{\infty} 1_{G_k^c} < \infty,$

because if the expectation is finite then the random count itself must be finite, with probability one. This is the Borel-Cantelli criterion for convergence of a random series.

Proof of Theorem 9. Assume then that the random variables $S_n$ are the partial sums (23) of independent random variables $X_j$ with means $E X_j = 0$ and variances $E X_j^2 = \sigma_j^2$ such that the total variance $\sigma^2 = \sum_j \sigma_j^2 < \infty$. Then for every $k \ge 1$ there exists $n_k < \infty$ such that

$$\sum_{j = n_k}^{\infty} \sigma_j^2 \le 8^{-k}.$$

By the Maximal Inequality,

$$P(G_k^c) = P\{\sup_{n \ge n_k} |S_n - S_{n_k}| > 2^{-k}\} \le 8^{-k} / 4^{-k} = 2^{-k}.$$

Since $\sum_k 2^{-k} < \infty$, the Borel-Cantelli criterion is satisfied, and so the sequence $S_n$ is, almost surely, Cauchy, and therefore has a finite limit $S$. Exercise: If you know the basics of measure theory, prove that $E S = 0$ and $E S^2 = \sigma^2$. Hint: First show that $S_n \to S$ in $L^2$, and conclude that the sequence $S_n$ is uniformly integrable.


4. THE POLYA URN

4.1. Rules of the game. The Polya urn is the simplest stochastic model of self-reinforcing behavior, in which repetition of a particular act makes it more likely that the same act will be repeated in the future. Suppose that every afternoon you visit a video-game arcade with two games: MS. PAC-MAN and SPACE INVADERS. On the first day, not having played either game before, you choose one at random. With each play, you develop a bit more skill at the game you choose, and your preference for it increases, making it more likely that you will choose it next time: In particular, if after $n$ visits you have played MS. PAC-MAN $R_n$ times and SPACE INVADERS $B_n = n - R_n$ times, then on the $(n+1)$st day the chance that you decide to put your quarter in SPACE INVADERS is
$$\theta_n := \frac{B_n + 1}{n + 2}.$$
It is natural to ask if after a while your relative preferences for the two games will begin to stabilize, and if so to what?
It is traditional to re-formulate this model as an urn model. At each step, a ball is chosen at random from among the collection of all balls (each colored either RED or BLUE) in the urn, and is then replaced, together with a new ball of the same color. More formally:

Definition 3. The Polya urn is a Markov chain $(R_n, B_n)$ on the space $\mathbb{N} \times \mathbb{N}$ of positive integer pairs $(r, b)$ with transition probabilities
(29)   $p((r, b), (r + 1, b)) = r/(r + b),$
       $p((r, b), (r, b + 1)) = b/(r + b).$
The default initial state is $(1, 1)$. The associated sampling process is the sequence $X_n$ of Bernoulli random variables defined by $X_n = 1$ if $R_{n+1} = R_n + 1$ and $X_n = 0$ if $R_{n+1} = R_n$.
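The transition probabilities (29) are simple to simulate. The following sketch is an illustration only; the function name `polya_urn` and the run lengths are assumptions of the example.

```python
import random

def polya_urn(steps, rng, red=1, blue=1):
    """Run the Polya urn for `steps` draws from state (red, blue); return the final state."""
    for _ in range(steps):
        # By (29), a red ball is drawn (and a red ball added) with probability r/(r+b).
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red, blue

rng = random.Random(3)
for _ in range(5):
    r, b = polya_urn(10_000, rng)
    print(r / (r + b))  # fraction of red balls after 10,000 draws
```

Printing the red fraction for several independent runs gives a feel for the question raised in 4.1, namely whether the proportions stabilize and, if so, to what.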

4.2. The Polya urn and the Yule process. Hidden within the Yule binary fission process is a Polya urn. Here's how it works: Start two independent Yule processes $Y_t^R$ and $Y_t^B$, each having one particle at time 0 (thus, $Y_0^R = Y_0^B = 1$). Mark the particles of the process $Y_t^R$ red, and those of the process $Y_t^B$ blue. Set
$$Y_t = Y_t^R + Y_t^B;$$
then $Y_t$ is itself a Yule process, with initial state $Y_0 = 2$.
Start the Yule process with two particles, one RED, the other BLUE. (Or, more generally, start it with $r_0$ red and $b_0$ blue.) Recall that at the time $T_m$ of the $m$th fission, one particle is chosen at random from the particles in existence and cloned. This creates a new particle of the same color as its parent. Thus, the mechanism for duplicating particles in the Yule process works exactly the same way as the replication of balls in the Polya urn: in particular, the sequence of draws (RED or BLUE) made at times $T_1, T_2, \dots$ has the same law as the sampling process associated with the Polya urn.

(To be continued.)
