Stochastic Processes Notes
Overview of Probability
We call (X, Ω, P) a probability space. Here Ω is the sample space, X : Ω → R is a random variable (RV)
and P is a probability (measure); this is a function on subsets of Ω. Elements ω ∈ Ω are called outcomes.
Subsets of Ω are called events.
Given A ⊆ R, P(X ∈ A) = P{ω ∈ Ω : X(ω) ∈ A}. Given x ∈ R, {X = x} = {ω ∈ Ω : X(ω) = x}.
Example
Suppose we toss a coin twice, then Ω = {HH, HT, TH, TT}, |Ω| = 4, and X is the number of heads. Then Ω_X
is the set of values that X takes, that is Ω_X = {0, 1, 2}. Now {X = 1} = {ω : X(ω) = 1} = {HT, TH}, so
P(X = 1) = P(HT) + P(TH).
Ω (or Ω_X) could be discrete, for example Ω_X = {x_1, x_2, ...}. We require Σ_{x ∈ Ω_X} P(X = x) = 1. Ω (or Ω_X)
could be continuous. If Ω_X = [0, 1], P(X ∈ A) = ∫_A f_X(x) dx. Here f_X(x) is a probability density function (pdf),
with f_X(x) ≥ 0 and ∫_{Ω_X} f_X(x) dx = 1.
Expectation
If Ω_X = {x_1, x_2, ...} and g : Ω_X → R then E(g(X)) = Σ_i g(x_i) P(X = x_i). If g(X) = X, then E(X) := μ_X, the mean
of X. If g(X) = (X − μ_X)² then E(g(X)) := Var(X) = σ_X², the variance of X.
Let Ω_X = {0, 1, 2, ...}. We say that X ~ Poisson(λ) if P(X = r) = λ^r e^{−λ}/r! for r ∈ N ∪ {0}. Recall that
exp(x) = Σ_{r≥0} x^r/r!, which means that Σ_r P(X = r) = 1. For the Poisson distribution the mean and variance are
both λ. Other discrete distributions include the geometric and hypergeometric distributions.
The exponential distribution is memoryless, that is the time left to wait does not depend on the time already waited.
More specifically, P(X > t + s | X > s) = P(X > t) = e^{−λt}.
Gaussian/Normal distribution
We say X ~ N(μ, σ²) if the pdf is
f_X(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
for x ∈ R. Then E(X) = μ and Var(X) = σ².
The Central Limit Theorem (CLT)
Suppose (X_i)_{i=1}^{n} are IID RVs with E(X_i) = μ, Var(X_i) = σ², and let S_n = Σ_{i=1}^{n} X_i. Then
Z_n = (S_n − nμ)/(σ√n) → N(0, 1),
that is, the distribution converges to N(0, 1). If A = [a, b] then
P(Z_n ∈ A) → ∫_a^b exp(−u²/2)/√(2π) du.
This can be applied, for example, if we take S_n to be the number of heads in n fair coin tosses. E(S_n) = n/2,
Var(S_n) = Σ_{i=1}^{n} Var(X_i) = n/4. Hence
P((S_n − n/2)/(√n/2) ∈ [a, b]) → ∫_a^b exp(−u²/2)/√(2π) du.
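As an aside (not from the notes), here is a minimal Python sketch of this coin-toss example: it simulates Z_n and compares P(Z_n ∈ [a, b]) with the standard normal probability. The sample size and number of trials are arbitrary choices.

```python
import random
import math

def standardised_heads(n):
    """Simulate n fair coin tosses and return (S_n - n/2) / (sqrt(n)/2)."""
    s = sum(random.random() < 0.5 for _ in range(n))
    return (s - n / 2) / (math.sqrt(n) / 2)

n, trials = 1000, 20000
samples = [standardised_heads(n) for _ in range(trials)]

# Empirical P(Z_n in [a, b]) should be close to the standard normal probability.
a, b = -1.0, 1.0
empirical = sum(a <= z <= b for z in samples) / trials
# Phi(b) - Phi(a) for the standard normal, via the error function.
normal = 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))
print(f"empirical {empirical:.3f} vs normal {normal:.3f}")  # both ~0.683
```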
The moment generating function (MGF) of a RV X is M_X(t) = E(e^{tX}), and (d^r M_X/dt^r)|_{t=0} = E(X^r).
If Y ~ Binomial(n, p) then
M_Y(t) = E(e^{tY}) = Σ_{r=0}^{n} e^{tr} (n choose r) p^r (1 − p)^{n−r} = (1 − p + p e^t)^n
by the Binomial theorem. Hence the sum of n IID Bernoulli(p) trials has a Binomial(n, p) distribution. Now fix λ > 0 and
let λ = np with n → ∞ (so p → 0). (In the special case n → ∞ with p close to 1/2 we can instead apply the CLT.)
Using the fact that lim_{n→∞} (1 + x/n)^n = e^x,
M_Y(t) = (1 + λ(e^t − 1)/n)^n → exp(λ(e^t − 1)).
If Z ~ Poisson(λ), P(Z = r) = λ^r e^{−λ}/r! for r ≥ 0. Then
M_Z(t) = Σ_{r=0}^{∞} e^{tr} λ^r e^{−λ}/r! = Σ_{r=0}^{∞} (λ e^t)^r e^{−λ}/r! = exp(λ(e^t − 1)),
since e^x = Σ_{r≥0} x^r/r!. So the Binomial(n, λ/n) distribution converges to the Poisson(λ) distribution as n → ∞.
The probability generating function (PGF) of a discrete RV X is G_X(θ) = E(θ^X) = Σ_{n=0}^{∞} θ^n P(X = n). If θ = e^t then we recover M_X(t).
Properties
G_X(1) = 1.
dG_X/dθ = Σ_{n≥1} n θ^{n−1} P(X = n), so (dG_X/dθ)|_{θ=1} = E(X).
G''_X(1) = E(X(X − 1)).
Example
X = X_1 + ... + X_n, where the X_i ~ Bernoulli(p). Now G_{X_i}(θ) = (1 − p + pθ). If the X_i are independent then
G_X(θ) = G_{X_1}(θ) G_{X_2}(θ) ... G_{X_n}(θ) = (1 − p + pθ)^n. So X ~ Bin(n, p).
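A small sketch (my own, not in the notes) of this PGF product done numerically: multiplying the Bernoulli PGF (1 − p) + pθ by itself n times, as a polynomial in θ, recovers the Binomial(n, p) probabilities as the coefficients.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of theta)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binomial_pmf_via_pgf(n, p):
    bernoulli = [1 - p, p]          # G_{X_i}(theta) = (1 - p) + p*theta
    g = [1.0]                       # PGF of the empty sum
    for _ in range(n):
        g = poly_mul(g, bernoulli)  # G_X(theta) = (1 - p + p*theta)^n
    return g                        # coefficient of theta^k is P(X = k)

print(binomial_pmf_via_pgf(4, 0.3))  # matches C(4, k) * 0.3^k * 0.7^(4-k)
```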
Example
Consider G_X(θ) = 1/(2 − θ). Then
G_X(θ) = E(θ^X) = Σ_{n=0}^{∞} θ^n P(X = n) = (1/2) · 1/(1 − θ/2) = Σ_{n=0}^{∞} θ^n/2^{n+1},
so P(X = n) = 1/2^{n+1}.
The joint pmf of discrete RVs X and Y is f_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j), and the conditional distribution of X given Y = y_j is
P(X = x_i | Y = y_j) = P(X = x_i, Y = y_j)/P(Y = y_j) = f_{X,Y}(x_i, y_j)/f_Y(y_j).
If X, Y are independent then f_{X,Y}(x_i, y_j) = f_X(x_i) f_Y(y_j) for all i, j. Given g : Ω_X × Ω_Y → R, E(g(X, Y)) =
Σ_{i,j} g(x_i, y_j) f_{X,Y}(x_i, y_j). If X, Y are independent and g(X, Y) = h_1(X) h_2(Y) then E(g(X, Y)) = E(h_1(X)) E(h_2(Y)).
The conditional expectation of X given Y is the quantity E(X|Y ). This is a function of Y , the average over
X given a value of Y. If Y = y_j, then
E(X | Y = y_j) = Σ_i x_i P(X = x_i | Y = y_j) = Σ_i x_i f_{X,Y}(x_i, y_j)/f_Y(y_j),
a function of Y = yj . E(X|Y ) is a RV which is governed by the probability distribution of Y , hence we can also
take expectations.
Tower rule
E(E(X|Y )) = E(X).
We have a useful check: if X and Y are independent then E(X|Y) = E(X). In general,
E(E(X|Y)) = Σ_j (Σ_i x_i f_{X,Y}(x_i, y_j)/f_Y(y_j)) f_Y(y_j) = Σ_i x_i Σ_j f_{X,Y}(x_i, y_j) = Σ_i x_i f_X(x_i) = E(X).
Compound processes
Suppose (X_i)_{i=1}^{∞} are IID RVs with PGF G_X(θ) (since the X_i are IID, X denotes any one of them). Suppose N is a RV with PGF G_N(θ),
independent of the X_i. Let Z = X_1 + X_2 + ... + X_N. Z is a compound process, a random sum of random variables.
Proposition
For the compound process Z, the PGF is G_Z(θ) = G_N(G_X(θ)) = G_N ∘ G_X(θ).
Proof
By definition
G_Z(θ) = E(θ^Z) = Σ_{n=0}^{∞} θ^n P(Z = n) = E(θ^{X_1+X_2+...+X_N}) = E(E(θ^{X_1+X_2+...+X_N} | N))   [Tower rule]
= Σ_{n=0}^{∞} E(θ^{X_1+X_2+...+X_n} | N = n) P(N = n) = Σ_{n=0}^{∞} E(θ^{X_1}) E(θ^{X_2}) ... E(θ^{X_n}) P(N = n)   [Independence]
= Σ_{n=0}^{∞} G_X(θ)^n P(N = n) = G_N(G_X(θ)).
Example
Let X ~ Bernoulli(1/2) and let N be the score on a fair die, independent of the X_i. Then
G_X(θ) = Σ_n θ^n P(X = n) = (1 + θ)/2   and   G_N(θ) = E(θ^N) = (1/6) Σ_{n=1}^{6} θ^n.
Hence
G_Z(θ) = G_N((1 + θ)/2) = (1/6) Σ_{n=1}^{6} (1 + θ)^n/2^n.
It follows that P(Z = k) is given by the coefficient of θ^k in the sum. By the binomial theorem, we have
P(Z = k) = Σ_{n=1}^{6} (1/6) (n choose k) (1/2)^n, recalling that (n choose k) = 0 for k > n.
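As a check on this example (my own, not part of the notes), the sketch below simulates Z = X_1 + ... + X_N, assuming as above that N is a fair die score and the X_i are Bernoulli(1/2), and compares the empirical distribution with the coefficient formula.

```python
import random
from math import comb

def pz_formula(k):
    """P(Z = k) = (1/6) * sum_{n=1}^{6} C(n, k) / 2^n, with C(n, k) = 0 for k > n."""
    return sum(comb(n, k) / 2 ** n for n in range(1, 7)) / 6

def simulate_z():
    n = random.randint(1, 6)                              # N uniform on {1,...,6}
    return sum(random.random() < 0.5 for _ in range(n))   # sum of n Bernoulli(1/2)

trials = 200_000
counts = [0] * 7
for _ in range(trials):
    counts[simulate_z()] += 1

for k in range(7):
    print(k, round(counts[k] / trials, 4), round(pz_formula(k), 4))
```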
Branching Processes
Let S_n be the number of individuals in a population at time n. Suppose S_0 = 1 (one individual at time 0). Individuals evolve at each timestep according to a common RV X, and evolve independently of others. We assume X
has PGF G_X(θ). Let X_i, i ≥ 1, be IID copies of X. We want to work out the long term behaviour of S_n, E(S_n)
and P(S_n = 0).
We use generating function analysis. For S_n, denote the PGF by G_n(θ). Since S_1 = X, G_1(θ) = G_X(θ). For
S_2, G_2(θ) = E(θ^{S_2}) = E(E(θ^{S_2} | X)) = G_X ∘ G_X(θ) by the previous proposition. Similarly, G_3(θ) = E(E(θ^{S_3} | S_2)) =
G_X ∘ G_X ∘ G_X(θ).
Proposition
G_n(θ) = G_X ∘ G_X ∘ ... ∘ G_X(θ) (n-fold composition). Moreover G_n(θ) = G_X(G_{n−1}(θ)) = G_{n−1}(G_X(θ)).
Proof
This follows easily by induction.
Remark: the coefficient of θ^k in G_n(θ) gives P(S_n = k).
Expected behaviour of Sn
We want to study E(S_n) = (dG_n/dθ)|_{θ=1} = G'_n(1). Let μ = E(X) = G'_X(1). We work out E(S_n) iteratively. Now
G_n(θ) = G_X(G_{n−1}(θ)) ⟹ G'_n(θ) = G'_X(G_{n−1}(θ)) G'_{n−1}(θ)   [Chain rule]
⟹ G'_n(1) = G'_X(G_{n−1}(1)) G'_{n−1}(1)
⟹ G'_n(1) = G'_X(1) G'_{n−1}(1)   [Since G_{n−1}(1) = 1, as for any PGF]
⟹ E(S_n) = μ E(S_{n−1}).
Since E(S_1) = μ, we can apply this iteratively to get μ_n := E(S_n) = μ^n.
Probability of extinction
Recall that, given G_n(θ), P(S_n = 0) = G_n(0). Let e_n = G_n(0) be the probability of extinction by time n. Let
e = lim_{n→∞} e_n be the probability of ultimate extinction. Now e_1 = G_X(0), e_2 = G_X(G_X(0)) = G_X(e_1). By
iteration, e_{n+1} = G_X(e_n), that is e_n = G_X ∘ G_X ∘ ... ∘ G_X(0) (n-fold composition).
Finding e
We can begin to find e by plotting G_X(θ) for θ ∈ [0, 1]. Note that G_X(0) ∈ [0, 1], G_X(1) = 1 and, since
G_X(θ) = Σ_{n=0}^{∞} θ^n P(X = n), G_X(θ) is increasing. There are two cases, as seen in the following figure.
Figure 1: The two cases for the graph of G_X(θ) against θ ∈ [0, 1]; e is the smallest solution of G_X(θ) = θ in [0, 1].
In one example, rearranging the equation G_X(e) = e gives 2 − 6e + 3e² + e³ = 0 ⟹ (e − 1)(e² + 4e − 2) = 0, so the three roots are
e = 1 and e = −2 ± √6. We need to take the positive root of the quadratic factor for this to make sense as a probability, so e = √6 − 2 ≈ 0.449.
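A minimal sketch (mine, not from the notes) of the iteration e_{n+1} = G_X(e_n): it converges to the smallest fixed point of G_X in [0, 1]. The offspring PGF used below, G_X(θ) = 0.2 + 0.3θ + 0.5θ², is a made-up example whose extinction probability is 0.4.

```python
def extinction_probability(pgf, tol=1e-12, max_iter=10_000):
    """Iterate e_{n+1} = G_X(e_n) starting from e_1 = G_X(0); converges to the
    smallest fixed point of G_X in [0, 1], the ultimate extinction probability."""
    e = pgf(0.0)
    for _ in range(max_iter):
        e_next = pgf(e)
        if abs(e_next - e) < tol:
            return e_next
        e = e_next
    return e

# Hypothetical offspring distribution: P(X=0)=0.2, P(X=1)=0.3, P(X=2)=0.5.
g = lambda theta: 0.2 + 0.3 * theta + 0.5 * theta ** 2
print(extinction_probability(g))  # ~0.4, the root of 0.5 e^2 - 0.7 e + 0.2 = 0
```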
Envelope problem
Take two envelopes, one contains twice the amount of the other. Pick an envelope, suppose it contains amount x.
Hence the other envelope contains either x/2 or 2x. The expected value for switching is (1/2)(x/2) + (1/2)(2x) = 5x/4 > x,
suggesting that you should always switch. This is a misconception, since the same argument can be applied again
after switching, suggesting you would then be better off returning to the envelope you already had, which gives a paradox.
Consider S_0 = Y random, with PGF G_Y(θ) specified. How do the results above change? In this case, S_1 =
X_1 + X_2 + ... + X_Y and S_{n+1} = X_1 + X_2 + ... + X_{S_n}.
Proposition
The PGF G_n(θ) for S_n is given by G_n(θ) = G_Y(G̃_n(θ)), where G̃_n(θ) is the PGF of S_n in the S_0 = 1 case.
Proof
We make the observation that G_1(θ) = E(E(θ^{S_1} | Y)) = G_Y(G̃_1(θ)) = G_Y(G_X(θ)). We can apply this argument
repeatedly to get the result.
Consequences
μ_n := E(S_n) = μ_Y μ̃_n, where μ̃_n = μ^n is the S_0 = 1 case mean. Also e_n = G_Y(ẽ_n), where ẽ_n is the S_0 = 1 case extinction probability. Hence
e = G_Y(ẽ), where ẽ is the ultimate extinction probability assuming S_0 = 1.
Example
Suppose S_0 = 6 and G_X(θ) = 0.3 + 0.5θ + 0.2θ². Work out E(S_n) and e.
We have E(S_n) = μ_Y μ̃_n. Since μ_Y = 6 and μ̃ = G'_X(1) = 0.5 + 0.4 = 0.9, E(S_n) = 6(0.9)^n. Similarly, e = G_Y(ẽ). Since ẽ = 1 (as μ̃ < 1),
e = G_Y(1) = 1.
What if G_Y(θ) = (0.4θ + 0.6)³?
This is a Binomial(3, 0.4) distribution for S_0. So E(Y) = 1.2, which means that E(S_n) = 1.2(0.9)^n. Also e = G_Y(ẽ) =
(0.4ẽ + 0.6)³ = 1 since ẽ = 1. Remark: if ẽ = 1, then we always get e = 1 whenever G_Y(θ) is a well defined PGF.
Poisson Processes
Definition
Events occur as a Poisson process if the intervals of time between events are IID exponentially distributed RVs.
Recall that a RV T has an exponential distribution of rate λ if its pdf is given by f_T(t) = λe^{−λt} for t ≥ 0 and λ > 0 (and 0
otherwise). For example,
P(T > t) = ∫_t^{∞} λe^{−λu} du = e^{−λt},    E(T) = ∫_0^{∞} t λe^{−λt} dt = 1/λ.
The mean time between successive events is 1/λ.
Let T_1 be the time to the first event, T_2 be the time between the first and second events, ..., T_k be the time
between the (k − 1)-th and the k-th events. Let S_n = T_1 + T_2 + ... + T_n; this is the time to the n-th event. Assume
{T_k}_{k=1}^{n} is a sequence of IID RVs, each with exponential distribution of rate λ. Recall the memoryless property of
the exponential distribution: P(T > t + s | T > s) = P(T > t).
Questions: What is the distribution of S_n? Given a specified time t, how many events occur within this time?
Remark: Suppose we have n lightbulbs in sequence. We may be interested to find P(min{T_1, ..., T_n} ≥ t) or
P(max{T_1, ..., T_n} ≤ t).
Theorem 2.1
The time S_n to the n-th event follows a gamma distribution with pdf
g_n(t) = λ(λt)^{n−1} exp(−λt)/(n − 1)!
for t ≥ 0, n ≥ 1. Note that n = 1 gives the exponential distribution.
Proof
We will compute the moment generating functions (MGF) for Sn via the MGFs for the Tk , and show that this
coincides with the MGF for a RV Y with pdf gn (t). By uniqueness of MGFs, the distributions must then coincide.
Recall that M_T(t) = E(e^{tT}). T has an exponential distribution, hence
M_T(t) = ∫_0^{∞} e^{tu} (λe^{−λu}) du = λ ∫_0^{∞} e^{−u(λ−t)} du = [−λ e^{−u(λ−t)}/(λ − t)]_{u=0}^{∞} = λ/(λ − t),
that is, assuming t < λ. It follows that
M_{S_n}(t) = E(e^{tS_n}) = E(e^{tT_1} e^{tT_2} ... e^{tT_n}) = E(e^{tT_1}) E(e^{tT_2}) ... E(e^{tT_n}) = (λ/(λ − t))^n,
which one can check coincides with the MGF of a RV Y with pdf g_n(t).
Suppose we now wish to fix some period of time t and consider the number N of events in this time.
Theorem 2.2
The number N_t of events in time t follows a Poisson distribution with parameter λt, that is N_t ~ Po(λt). That is,
P(N_t = r) = (λt)^r exp(−λt)/r!   for r ≥ 0.
Proof
Consider the following. For r ≥ 1, P(at least r events in time period t) = P(time to the r-th event is at most t) = ∫_0^t g_r(x) dx. Then
P(exactly r events in time t) = P(at least r events in time t) − P(at least r + 1 events in time t)
= ∫_0^t g_r(x) dx − ∫_0^t g_{r+1}(x) dx = (λt)^r exp(−λt)/r!,
where the last equality follows on integrating by parts.
We can draw several consequences from this.
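As an illustration (mine, with arbitrary λ and t), the following sketch builds a Poisson process from IID exponential inter-event times and checks that the number of events in [0, t] has mean and variance close to λt and matches the Poisson pmf.

```python
import random
import math

def count_events(rate, t):
    """Number of events in [0, t] when inter-event times are Exponential(rate)."""
    time, n = 0.0, 0
    while True:
        time += random.expovariate(rate)
        if time > t:
            return n
        n += 1

rate, t, trials = 2.0, 3.0, 50_000
counts = [count_events(rate, t) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(f"mean {mean:.3f}, variance {var:.3f}, lambda*t {rate * t:.3f}")

# Compare the empirical P(N_t = r) with the Poisson(lambda*t) pmf for a few r.
for r in range(4):
    empirical = counts.count(r) / trials
    pmf = (rate * t) ** r * math.exp(-rate * t) / math.factorial(r)
    print(r, round(empirical, 4), round(pmf, 4))
```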
Combining Poisson processes
A Poisson stream is a sequence of arrivals (events) where the inter-arrival times are independent and follow an
exponential distribution.
Suppose males (M) arrive at a shop as a Poisson stream of rate λ_m. Suppose females (F) also arrive as a Poisson
stream with rate λ_f. Assume both streams are independent. We want to analyse the combined process for the total
arrivals.
Method 1
Let G(t) := P(time to the next arrival is less than t). This is the probability distribution of the arrival time of
the next customer, irrespective of being male or female. Now G(t) = 1 − P(no arrivals before time t) = 1 − P(no
males arrive in time t) P(no females arrive in time t) = 1 − exp(−λ_m t) exp(−λ_f t) = 1 − exp(−(λ_m + λ_f)t).
Method 2
Let N be the number of arrivals in time t, N^(m) be the number of male arrivals and N^(f) be the number of female
arrivals. Then N^(m) ~ Po(λ_m t), N^(f) ~ Po(λ_f t). Then
P(N = k) = P(N^(m) + N^(f) = k) = Σ_{r=0}^{k} P(N^(m) = r and N^(f) = k − r) = Σ_{r=0}^{k} P(N^(m) = r) P(N^(f) = k − r)
= Σ_{r=0}^{k} [(λ_m t)^r exp(−λ_m t)/r!] [(λ_f t)^{k−r} exp(−λ_f t)/(k − r)!]
= [exp(−λ_m t) exp(−λ_f t)/k!] Σ_{r=0}^{k} (k choose r) (λ_m t)^r (λ_f t)^{k−r}
= ((λ_m + λ_f)t)^k exp(−(λ_m + λ_f)t)/k!
by the binomial theorem. Hence N ~ Po((λ_m + λ_f)t).
Splitting processes
Suppose customers arrive as a Poisson stream with combined rate λ. For each customer that arrives, there is a
probability p that the customer is male, and probability 1 − p that the customer is female. Arrivals are independent.
We want to find the arrival process for females alone. We expect λ_f = λ(1 − p). Now,
P(no females arrive in time t) = P(no arrivals in time t) + Σ_{n=1}^{∞} P(n arrivals in time t, each arrival is male)
= exp(−λt) + Σ_{n=1}^{∞} [(λt)^n exp(−λt)/n!] p^n = exp(−λt) (1 + Σ_{n=1}^{∞} (λtp)^n/n!) = exp(−λt) exp(λtp) = exp(−λ(1 − p)t).
Hence the inter-arrival times for females have an exponential distribution with parameter λ(1 − p).
Remark: we can extend combining and splitting to an arbitrary number of Poisson stream types.
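A short sketch (not in the notes; the rate and p are arbitrary) that simulates the splitting construction and checks that the female inter-arrival gaps have mean 1/(λ(1 − p)).

```python
import random

def female_interarrival_times(rate, p, horizon):
    """Simulate a rate-`rate` Poisson stream on [0, horizon]; each arrival is
    male with probability p, female otherwise. Return female inter-arrival gaps."""
    t, last_female, gaps = 0.0, 0.0, []
    while True:
        t += random.expovariate(rate)
        if t > horizon:
            return gaps
        if random.random() >= p:          # this arrival is female
            gaps.append(t - last_female)
            last_female = t

rate, p = 5.0, 0.3
gaps = female_interarrival_times(rate, p, horizon=20_000.0)
mean_gap = sum(gaps) / len(gaps)
print(f"mean female gap {mean_gap:.4f}, expected 1/(lambda(1-p)) = {1 / (rate * (1 - p)):.4f}")
```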
Example
Suppose events occur as a Poisson stream. Suppose we know that there are N events in time T (N, T fixed). Now
fix t < T . What is the probability distribution that governs the number of events in time t?
We can split the time interval T into two parts, one up to time t and one after time t. Now
P(r events in time t | N events in time T) = P({r events in time t} ∩ {N events in time T}) / P(N events in time T)
= P({r events in time t} ∩ {N − r events in time T − t}) / P(N events in time T)
by the memoryless property of the exponential distribution. The two events in the numerator are independent, so we can take the
product of the probabilities. Hence
P(r events in time t | N events in time T) = [(λt)^r exp(−λt)/r!] [(λ(T − t))^{N−r} exp(−λ(T − t))/(N − r)!] · N!/[(λT)^N exp(−λT)]
= (N choose r) p_t^r (1 − p_t)^{N−r},
where p_t = t/T.
Remark: Also consider the time to the r-th event given N events in time T. The corresponding distribution
is a beta distribution and the mean time taken is rT/(N + 1).
Example
Suppose students arrive to Harrison as a Poisson stream of rate 3 per unit time.
(i) Find P (5 students enter within time 2).
(ii) Find P (Time taken for 4-th student to arrive is at least 2).
(iii) Find P (3 students enter in time 1|5 students enter by time 2).
(iv) If one student entered in time 1, show that the time the student entered is uniformly distributed on [0, 1].
(i) We need to find P(N_2 = 5) for λ = 3, so N_2 ~ Po(6). P(N_2 = 5) = 6^5 exp(−6)/5! = 0.161 to 3 s.f.
(ii) Following the proof of Theorem 2.2: P(time to the 4-th event is at least 2) = P(at most 3 events in time 2) = P(N_2 ≤ 3) = e^{−6}(1 + 6 + 6²/2 + 6³/3!) = 0.151 to 3 s.f.
(iii) For N = 5, T = 2, r = 3, t = 1: P(3 students in time 1 | 5 students in time 2) = (5 choose 3) (1/2)³ (1/2)² = 5/16.
(iv) The single event in time T is governed by a Binomial(1, t/T) distribution. Since T = 1, the probability of success is t, which
means we get the probability distribution for a Uniform[0, 1] RV.
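A quick numerical check of parts (i)–(iii) (my own, using only the Python standard library):

```python
import math

def poisson_pmf(k, mu):
    return mu ** k * math.exp(-mu) / math.factorial(k)

mu = 3 * 2  # lambda = 3 per unit time over a window of length 2, so N_2 ~ Po(6)

# (i) P(N_2 = 5)
print(round(poisson_pmf(5, mu), 3))                         # 0.161

# (ii) P(time to 4th arrival >= 2) = P(N_2 <= 3)
print(round(sum(poisson_pmf(r, mu) for r in range(4)), 3))  # 0.151

# (iii) Binomial(5, 1/2) probability of 3 successes
print(math.comb(5, 3) * 0.5 ** 5)                           # 0.3125 = 5/16
```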
Poisson Rate Equations (Steady state)
Previously we have had the number of arrivals in time t given by N_t ~ Poisson(λt). If N_t is specified alone, the system
state will just tend to ∞. We'll consider a corresponding departure process (also with a Poisson distribution,
Poisson(μt)). We'll examine the long run average, the probability distribution governing the state of the system in
future time (after transient effects). We hope that the long run probabilities that govern the number of individuals
in the system settle down to constants.
A quantity o(x) satisfies lim_{x→0} o(x)/x = 0; for example x^{3/2} is a suitable choice for o(x), but x isn't. In time period δt, the number N of events is
governed by a Poisson(λδt) distribution. We want to look at
P(N = 0) = exp(−λδt) = 1 − λδt + λ²δt²/2! − ... = 1 − λδt + o(δt),
P(N = 1) = λδt exp(−λδt) = λδt + o(δt),
P(N ≥ 2) = o(δt).
We'll treat probabilities which are o(δt) as insignificant. We'll now consider a system which contains a population
of individuals. Denote the system state by n, that is the number of individuals in the system. Given state n, we
assume arrivals are governed by a Poisson(λ_n δt) distribution, and departures are governed by a Poisson distribution
with rate μ_n (per unit time). Observe that if the state of the system changes then so do the probability distributions
that govern arrivals and departures, that is the rates λ_n and μ_n will change. If the system is in state n = 0 (empty)
then μ_0 = 0. Also, assume an upper limit on capacity so that if we are in state N (for some fixed number) then
λ_N = 0.
State diagram
Figure 2: The state diagram for the system, showing the directions of the rate constants λ_i and μ_j.
Define P_n(t) to be the probability of being in state n at time t. We'll compare P_n(t) to its neighbours. In fact
the evolution of P_n(t) with time will just depend on the neighbours: transitions to states differing from n by at
least 2 have probability o(δt) in a time interval δt. Note we assume all processes are independent. Now suppose
P_n(t) is given for all 0 ≤ n ≤ N. Consider P_0(t + δt), that is the probability of being in state 0 at time t + δt. Now
P_0(t + δt) = P(state 0 at time t and no arrivals in time δt) + P(state 1 at time t and one departure from state 1
in time δt) + P(state at least 2 at time t and sufficient departures to reach 0). Hence
P_0(t + δt) = P_0(t)(1 − λ_0 δt) + P_1(t) μ_1 δt + o(δt).    (1)
Similarly
P_N(t + δt) = P_N(t)(1 − μ_N δt) + P_{N−1}(t) λ_{N−1} δt + o(δt).    (2)
Now we consider 0 < n < N. In this case we also need to include the possibility of no arrivals or departures about
our given state. Hence
P_n(t + δt) = P_{n−1}(t) λ_{n−1} δt + P_{n+1}(t) μ_{n+1} δt + P_n(t)(1 − λ_n δt)(1 − μ_n δt) + o(δt)    (3)
            = P_{n−1}(t) λ_{n−1} δt + P_{n+1}(t) μ_{n+1} δt + P_n(t)(1 − λ_n δt − μ_n δt) + o(δt).    (4)
We can rearrange Equations 1 to 3 to get the LHSs in the form (P_n(t + δt) − P_n(t))/δt and take δt → 0. These LHSs become
dP_n/dt for 0 ≤ n ≤ N. Hence we obtain the Poisson rate equations:
Equation 1 ⟹ dP_0/dt = μ_1 P_1(t) − λ_0 P_0(t)    (5)
Equation 2 ⟹ dP_N/dt = λ_{N−1} P_{N−1}(t) − μ_N P_N(t)    (6)
Equation 3 ⟹ dP_n/dt = λ_{n−1} P_{n−1}(t) + μ_{n+1} P_{n+1}(t) − (λ_n + μ_n) P_n(t)   for 0 < n < N.    (7)
We now have N + 1 coupled ODEs. We will be interested in the steady state, meaning that the long run
behaviour is time-independent. Sufficient conditions for (that is, conditions that imply) a steady state are at least one of the
following: (i) an upper limit on capacity; (ii) after some state n = n_0, the departure rate μ_n is greater than the
arrival rate λ_n for all n ≥ n_0.
We'll assume that the system tends to a steady state. Moreover we assume that the steady state is independent of the initial state and transient effects can be ignored (quickly) as time t increases. As we approach the
steady state, P_n(t) → P_n as t → ∞ (for some constant P_n). Hence dP_n/dt → 0, and so we set dP_n/dt = 0 in Equations
5 to 7 to get:
Equation 5 ⟹ 0 = μ_1 P_1 − λ_0 P_0    (8)
Equation 6 ⟹ 0 = λ_{N−1} P_{N−1} − μ_N P_N    (9)
Equation 7 ⟹ 0 = λ_{n−1} P_{n−1} + μ_{n+1} P_{n+1} − (λ_n + μ_n) P_n   for 0 < n < N.    (10)
Solving these iteratively gives
P_n = [λ_{n−1} λ_{n−2} ... λ_0 / (μ_n μ_{n−1} ... μ_1)] P_0,
and P_0 is found from the normalisation Σ_{n=0}^{N} P_n = 1. The expected number in the system is then Σ_{n=0}^{N} n P_n.
Recall that if N = ∞ we require lim_{n→∞} λ_n/μ_n < 1. If lim_{n→∞} λ_n/μ_n > 1 then there is no steady state (that is, the state
tends to infinity with probability 1). If lim_{n→∞} λ_n/μ_n = 1 then finding a steady state is possible but not always
guaranteed.
Example
Suppose we have two engineers to repair a set of three photocopiers. Individual machines break down at a rate
of once per hour. Repair time is 30 minutes per machine on average. The state of the system is the number of
machines broken. Times are exponentially distributed.
The individual breakdown rate is 1 per hour so λ = 1. The repair rate per engineer is 2 per hour so μ = 2.
Figure 4: The steady state diagram for the case of 3 photocopiers and 2 engineers.
To find the required information we equate probability flows between neighbouring states. Doing this, we find the equations 3P_0 = 2P_1,
2P_1 = 4P_2 and P_2 = 4P_3. Solving each of these equations in terms of P_0 and using P_0 + P_1 + P_2 + P_3 = 1 we find
P_0 = 16/55. Using this we can calculate E(X) = 57/55.
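A small numerical sketch (mine, not part of the notes) that rebuilds the steady state for this example from the general birth–death solution P_n ∝ (λ_{n−1}···λ_0)/(μ_n···μ_1), assuming the 3-machine, 2-engineer rates described above; it recovers P_0 = 16/55 and E(X) = 57/55.

```python
from fractions import Fraction

# State n = number of broken machines; 3 machines, 2 engineers.
lam = {0: 3, 1: 2, 2: 1}   # breakdown rate out of state n (working machines x 1 per hour)
mu = {1: 2, 2: 4, 3: 4}    # repair rate out of state n (busy engineers x 2 per hour)

weights = [Fraction(1)]    # unnormalised P_n / P_0
for n in range(1, 4):
    weights.append(weights[-1] * Fraction(lam[n - 1], mu[n]))

total = sum(weights)
P = [w / total for w in weights]
print("P_0 =", P[0])                                   # 16/55
print("E(X) =", sum(n * p for n, p in enumerate(P)))   # 57/55
```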
Queueing Theory
As usual, we ignore transient effects and assume a steady state. The analysis is via steady state diagrams. We
have the notation that a G1/G2/n queue is a queue whose arrival process is governed by a process G1, with a service
(departure) process given by G2, and n is the number of servers. Denote by G1/G2/n/∞ the queue as above
with infinite capacity (we usually omit the ∞).
We'll consider G1 = M(λ) and G2 = M(μ), where the arrival rate is λ and the individual service rate is μ,
and G1, G2 are Poisson streams (note the 'M' denotes Markov). The mean time between successive arrivals is 1/λ
(from the exponential distribution). We will focus on M/M/n queues with n = 1, 2 specifically. We'll also consider
finite or infinite capacity for n = 1. We'll analyse the probability distribution of the system size, the expected system
size and the waiting time in the system.
Suppose we have an M/M/1 = M/M/1/∞ queue. This is a single server queue with infinite capacity. There
is an arrival rate of λ individuals and a service rate μ. The state is the number of individuals in the system, that
is the sum of the number of people in the queue and the number of people being served. Let ρ = λ/μ be the traffic
intensity parameter. We get the following steady state equations: λP_0 = μP_1, ..., λP_n = μP_{n+1}, .... By induction we
can see that P_1 = ρP_0, P_2 = ρ²P_0, ..., P_n = ρ^n P_0. We then find P_0 from Σ_{n=0}^{∞} P_n = 1. Hence P_0 (Σ_{n=0}^{∞} ρ^n) = 1,
which means that P_0 = 1 − ρ (provided ρ < 1). If ρ ≥ 1 then there is no steady state solution and the system size
tends to ∞. Hence P_n = ρ^n (1 − ρ).
We define Ls to be the mean number of individuals in the system and Lq to be the mean number of individuals in the queue. Now
L_s = Σ_{n=0}^{∞} n P_n = Σ_{n=0}^{∞} n ρ^n (1 − ρ) = (1 − ρ) ρ Σ_{n=0}^{∞} n ρ^{n−1} = ρ/(1 − ρ),
since Σ_{n=0}^{∞} n x^{n−1} = 1/(1 − x)² provided |x| < 1. We can obtain L_q in two ways. Firstly, L_q = Σ_{n=1}^{∞} (n − 1) P_n = ρ²/(1 − ρ).
Alternatively, L_s is the mean number of people in the queue plus the mean number of people being served, that is
L_q + 0 · P(system empty) + 1 · P(system busy). Hence
L_s = L_q + 0 · P_0 + 1 · (1 − P_0) = L_q + ρ  ⟹  L_q = ρ/(1 − ρ) − ρ = ρ²/(1 − ρ).
For the waiting time in the system, an arriving customer waits for the customers already present and then for their own service, so
W_s = (1/μ) Σ_{n=0}^{∞} n P_n + 1/μ = (L_s + 1)/μ = 1/(μ(1 − ρ)),
or equivalently W_s = L_s/λ = ρ/(λ(1 − ρ)) = 1/(μ(1 − ρ)). Similarly
W_q = L_q/λ = ρ²/(λ(1 − ρ)) = ρ/(μ(1 − ρ)).
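The following helper (my own summary, not from the notes) collects the M/M/1 formulas just derived; the λ and μ in the example call are arbitrary.

```python
def mm1_metrics(lam, mu):
    """Steady-state metrics for an M/M/1 queue with arrival rate lam < mu."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("no steady state: require lam < mu")
    Ls = rho / (1 - rho)           # mean number in the system
    Lq = rho ** 2 / (1 - rho)      # mean number in the queue
    Ws = Ls / lam                  # = 1 / (mu - lam), mean time in the system
    Wq = Lq / lam                  # mean time waiting in the queue
    return Ls, Lq, Ws, Wq

print(mm1_metrics(lam=2.0, mu=5.0))  # approximately (0.667, 0.267, 0.333, 0.133)
```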
Figure 5: The set up for the M/M/2 queue, note that a single line is formed and the first customer joins any empty
server.
For the M/M/2 queue, let ρ = λ/(2μ). Equating probability flows gives P_1 = (λ/μ)P_0 = 2ρP_0 and, in general, P_n = 2ρ^n P_0 for n ≥ 1. Then
1 = P_0 (1 + 2 Σ_{n=1}^{∞} ρ^n) = P_0 (1 + 2ρ/(1 − ρ))  ⟹  P_0 = (1 − ρ)/(1 + ρ),   with P_n = 2ρ^n P_0 for n ≥ 1.
This steady state solution is valid provided ρ < 1. As before we have
L_s = Σ_{n=0}^{∞} n P_n = 2ρ/(1 − ρ²),    L_q = Σ_{n=2}^{∞} (n − 2) P_n = 2ρ³/(1 − ρ²).
We can also use the relation L_s = L_q plus the expected number of people being served. To get W_s and W_q we will
use Little's formula: W_s = L_s/λ and W_q = L_q/λ, that is, the expected time spent in the system and in the queue respectively.
M/M/N systems
Suppose we have a system with N servers, infinite capacity, an arrival rate λ and a service rate μ (per server). Equating probability flows gives
P_n = (λ/μ)^n/n! · P_0   if n ≤ N,    P_n = (λ/μ)^n/(N! N^{n−N}) · P_0   if n ≥ N,
provided we set ρ = λ/(Nμ) (with ρ < 1 for a steady state). As before we can set Σ_{n=0}^{∞} P_n = 1 to solve for P_0 and hence for P_n. In this case we can
apply Little's theorem, so L_s = Σ_{n=0}^{∞} n P_n and W_s = L_s/λ, L_q = Σ_{n=N}^{∞} (n − N) P_n and W_q = L_q/λ.
Example
Suppose we have an M/M/1 queue with finite capacity N. The possible system states are 0, 1, 2, ..., N. The arrival
rate is λ and the service rate is μ, and we define the state of the system to be the number of people in the shop. We
can use our previous steady state equations to find the steady state probabilities P_n = ρ^n P_0 for n ≤ N, where ρ = λ/μ.
As usual we solve for P_0 by setting Σ_{n=0}^{N} P_n = 1, that is P_0 (1 + ρ + ρ² + ... + ρ^N) = 1. This is a geometric series, so
P_0 = (1 − ρ)/(1 − ρ^{N+1}) provided ρ ≠ 1. In the case of ρ = 1 we have P_0 (1 + 1 + ... + 1) = 1 so P_0 = 1/(N + 1). Using the ρ ≠ 1
case we have P_n = ρ^n (1 − ρ)/(1 − ρ^{N+1}) for n ≤ N. As N → ∞, the results are consistent with the infinite capacity case. Now
L_s = Σ_{n=0}^{N} n P_n = P_0 Σ_{n=0}^{N} n ρ^n. Let X be the state of the system. Then
G_X(θ) = E(θ^X) = Σ_{n=0}^{N} θ^n P(X = n) = Σ_{n=0}^{N} P_0 (ρθ)^n = P_0 (1 − (ρθ)^{N+1})/(1 − ρθ).
We then compute
L_s = G'_X(1) = ρ(1 − (N + 1)ρ^N + N ρ^{N+1}) / [(1 − ρ)(1 − ρ^{N+1})].
In this case we are unable to apply Little's theorem directly (due to customers potentially being turned away). We need to
replace λ by a modified effective arrival rate λ_eff = λ(1 − P_N).
Littles Formulae
Let λ_eff denote the effective arrival rate, that is the rate at which customers arrive and actually join the queue,
that is the arrival rate for customers who eventually get served.
Little's Theorem
L_s = λ_eff W_s and L_q = λ_eff W_q, where L_s (L_q) is the average number of customers in the system (queue) and W_s (W_q)
is the average waiting time in the system (queue).
Idea of proof
An arriving customer sees L_s in the system (on average). The customer spends time W_s in the system before
departing, and in this time a further λ_eff W_s have arrived. Since we're in steady state, the number seen on arrival
should balance the number seen on departure, hence L_s = λ_eff W_s.
In general λ_eff ≠ λ. For M/M/1 and M/M/2 queues with infinite capacity, λ_eff = λ. For an M/M/1 queue with
finite capacity N, λ_eff = λ(1 − P_N). In general,
λ_eff = Σ_{n=0}^{∞} (probability of being in state n)(probability customer stays, given state n)(arrival rate to state n).
Remark: We have L_s = Σ_{n=0}^{∞} n P_n and L_q = Σ_{n=r}^{∞} (n − r) P_n (for r servers). Also, L_s = L_q + the expected number of people being served.
Figure 7: The two queue systems to be compared. In the case of the two queues, new customers join either queue
with equal probability.
We can quote the results for L_s and W_s for the M/M/1 and M/M/2 queues: L_s = ρ/(1 − ρ) for M/M/1, and L_s = 2ρ/(1 − ρ²)
for M/M/2. For each M/M/1 queue the arrival rate is λ/2, so ρ = (λ/2)/μ = λ/(2μ); for the M/M/2 queue, ρ = λ/(2μ) as well. For M/M/2,
L_s = 2(λ/2μ)/(1 − (λ/2μ)²) = 4λμ/(4μ² − λ²).
For each M/M/1 queue, L_s = (λ/2μ)/(1 − λ/2μ) = λ/(2μ − λ), hence the system total is 2L_s = 2λ/(2μ − λ). For M/M/2,
W_s = L_s/λ = 4μ/(4μ² − λ²)
(which is valid given our assumption that λ < 2μ). For each M/M/1 queue we have
W_s = (individual server L_s)/(arrival rate to the server) = [λ/(2μ − λ)]/(λ/2) = 2/(2μ − λ).
Hence
W_s (two M/M/1) − W_s (M/M/2) = 2/(2μ − λ) − 4μ/(4μ² − λ²) = 2λ/(4μ² − λ²) > 0.
Hence the expected waiting time in two M/M/1 queues is longer than that of one M/M/2 queue. We can also
compare the expected number of people in each of the two systems:
Expected number in parallel M/M/1 − Expected number in M/M/2 = 2λ/(2μ − λ) − 4λμ/(4μ² − λ²) = 2λ²/(4μ² − λ²) > 0.
Hence we can expect to see more people in the two M/M/1 queues than in the single M/M/2 queue.
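A quick numeric check of this comparison (my own; λ and μ are arbitrary, subject to λ < 2μ):

```python
def two_mm1_vs_mm2(lam, mu):
    """Compare two parallel M/M/1 queues (each fed lam/2) with one M/M/2 queue."""
    assert lam < 2 * mu
    # Two M/M/1 queues, each with arrival rate lam/2:
    Ls_mm1_total = 2 * (lam / 2) / (mu - lam / 2)   # = 2*lam / (2*mu - lam)
    Ws_mm1 = 1 / (mu - lam / 2)                     # = 2 / (2*mu - lam)
    # One M/M/2 queue with arrival rate lam:
    Ls_mm2 = 4 * lam * mu / (4 * mu ** 2 - lam ** 2)
    Ws_mm2 = Ls_mm2 / lam                           # Little's formula
    return Ls_mm1_total, Ls_mm2, Ws_mm1, Ws_mm2

Ls1, Ls2, Ws1, Ws2 = two_mm1_vs_mm2(lam=3.0, mu=2.0)
print(f"people:  2 x M/M/1 {Ls1:.3f} > M/M/2 {Ls2:.3f}")
print(f"waiting: 2 x M/M/1 {Ws1:.3f} > M/M/2 {Ws2:.3f}")
```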
Limitations in the model
1. Finite capacity is usual in realistic models. However for N large, the infinite capacity model is a good
approximation.
2. Customers tend to opt for the queue of minimum length.
3. Arrival and service processes are not always Poisson.
Markov Chains
Each day is considered either cloudy (C) or sunny (S). If C occurs on any given day then on the following day C
occurs with probability 1/2. If S occurs on any given day then it is cloudy with probability 1/3 on the next day.
Let Sn denote the event of being sunny on day n. Let Cn denote the event of being cloudy on day n. Let
P (0) = (P (S0 ), P (C0 )) denote the initial probability state at time 0. For example, if P (0) = (0, 1) then we are
certain that it is cloudy on day 0. We let P (n) = (P (Sn ), P (Cn )). We want to know P (n) given some P (0) . We can
use our information to draw a probability transition diagram.
Ordering the states as (S, C), the transition matrix is
T = [ 2/3  1/3 ]
    [ 1/2  1/2 ].
The components in row 1 of T are the transitions from S and the entries in row 2 are the transitions from C. Note
that the row sum is always 1. Notice also that this rule does not depend on the day n. Hence P^(n+1) = P^(n) T. In
general T could depend on time n, and in this case we write T := T(n). However, here T does not depend on time.
By iteration, we see that P^(1) = P^(0) T, ..., P^(n) = P^(0) T^n. We can observe that as n → ∞, P^(n) settles to a limit
vector, which we call P. That is, lim_{n→∞} P^(n) = P; moreover lim_{n→∞} P^(n+1) = P. But we have
lim_{n→∞} P^(n+1) = lim_{n→∞} (P^(n) T) = (lim_{n→∞} P^(n)) T = P T  ⟹  P = P T.
We find P by solving P = P T subject to the normalisation Σ_{i=1}^{m} P_i = 1, where m is the number of states. We also want to classify the states of the
system in terms of their recurrence properties. That is, we want to know the frequency of visits to each state as
time evolves.
Consider two urns, Urn 1 and Urn 2. These contain between them 3 balls labelled 1, 2, 3. A ball is selected
at random with an equal chance for any of them to be chosen. We then take that labelled ball and transfer it from
one urn to the other. The state of the system at time n, denoted by X_n, is the number of balls in Urn 1. We want
to find P^(n) and P (the behaviour as n → ∞).
Figure 9: The probability transition diagram for the case of two urns and three balls.
From the probability transition diagram, we can see that
T = [ 0    1    0    0   ]
    [ 1/3  0    2/3  0   ]
    [ 0    2/3  0    1/3 ]
    [ 0    0    1    0   ].
We observe that as n increases, P^(n) oscillates and does not converge to a limit. So P = lim_{n→∞} P^(n) does not
exist as previously defined. However, solving P T = P subject to Σ_{i=0}^{3} P_i = 1 gives a solution P = (1/8, 3/8, 3/8, 1/8).
Before solving this, let's fully understand the two state Markov chain.
Figure 10: The probability transition diagram for the two state Markov chain, where 0 ≤ a, b ≤ 1.
From the probability transition diagram, we can see that
T = [ 1 − a    a   ]
    [   b    1 − b ].
If P is the steady state probability then P = P T and P_1 + P_2 = 1. From the matrix equation we get
P_1 = (1 − a)P_1 + bP_2,   P_2 = aP_1 + (1 − b)P_2   ⟹   aP_1 = bP_2.
Combining this with the probability equation gives P = (b/(a + b), a/(a + b)) for 0 < a, b < 1. Solving det(T − λI) = 0 gives
the eigenvalues λ = 1 and λ = 1 − a − b ∈ (−1, 1) for 0 < a, b < 1. In this case, if |1 − a − b| < 1 then for any P^(0),
P^(0) T^n → P with rate bounded by |1 − a − b|. There are also two special cases to consider. If a = b = 1 then the
state vectors form an alternating sequence (1, 0) → (0, 1) → (1, 0) → .... Moreover (1/2, 1/2) → (1/2, 1/2) under T. However,
if we redefine P to be the average of all of the state vectors then (1/n) Σ_{k=0}^{n−1} P^(k) → P. The second special case is
b = 0, a ≠ 0. In this case P = (0, 1) and P^(n) = ((1 − a)^n, 1 − (1 − a)^n), given P^(0) = (1, 0).
Going back to the urn problem with three balls, P^(n) oscillates with period 2; however
(1/n) Σ_{k=0}^{n−1} P^(k) → P = (1/8, 3/8, 3/8, 1/8) as n → ∞.
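A short sketch (mine, not in the notes) checking these claims numerically for the urn chain: the long-run average of P^(k) approaches (1/8, 3/8, 3/8, 1/8), and that vector satisfies P = PT exactly.

```python
def mat_vec_left(p, T):
    """Compute the row-vector product p T."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

T = [[0, 1, 0, 0],
     [1/3, 0, 2/3, 0],
     [0, 2/3, 0, 1/3],
     [0, 0, 1, 0]]

p = [1.0, 0.0, 0.0, 0.0]   # start in state 0 (Urn 1 empty) with certainty
avg = [0.0] * 4
steps = 10_000
for _ in range(steps):
    avg = [a + pi for a, pi in zip(avg, p)]
    p = mat_vec_left(p, T)

print([round(a / steps, 3) for a in avg])          # ~[0.125, 0.375, 0.375, 0.125]

# The candidate steady state vector satisfies P = P T exactly:
P = [1/8, 3/8, 3/8, 1/8]
print([round(x, 3) for x in mat_vec_left(P, T)])   # [0.125, 0.375, 0.375, 0.125]
```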
What if we now want to consider M balls in two urns? Again each ball is equally likely to be chosen and Xn denotes
the number of balls in Urn 1.
Figure 11: The probability transition diagram for the urn problem with M balls and two urns.
As before we want to determine P. We observe that T_{ij} = P(X_{n+1} = j | X_n = i) gives an (M + 1) × (M + 1)
matrix with zeros on the diagonal. We examine the vector P = (P_0, P_1, ..., P_M) such that P = P T subject to
P_0 + P_1 + ... + P_M = 1. This gives P_i = (M choose i) (1/2)^M for 0 ≤ i ≤ M (this is the Binomial distribution). For
M = 3, we saw P = (1/8, 3/8, 3/8, 1/8). However P^(n) = P^(0) T^n does not converge to P as n → ∞ (it oscillates). Moreover, for the long-run
average, (1/n) Σ_{k=0}^{n−1} P^(k) → P as n → ∞.
Classification of states (subchains)
Definition
A state j is accessible from i if (T^n)_{ij} > 0 for some n > 0. Two states i, j communicate if they are accessible to
each other. This induces an equivalence relation (↔) on the states, that is:
1. If i ↔ j then j ↔ i.
2. i ↔ i.
3. If i ↔ j and j ↔ k then i ↔ k (transitivity).
Consequently, communication splits the Markov chain into subchains. These are disjoint equivalence classes.
A problem in the study of Markov chains is finding these irreducible subchains (consisting of communicating
states). A Markov chain is irreducible if all states within communicate with each other, for example the urn
problem is irreducible.
A state i is called absorbing if T_{ii} = 1 and T_{ij} = 0 for j ≠ i. A state i is periodic with period k > 1 if
(T^n)_{ii} > 0 when k | n and (T^n)_{ii} = 0 otherwise. State i is aperiodic if no such k exists. The urn problem is a Markov
chain with period 2, since all states have period 2.
Recurrence of states
Let f_i^(n) be the probability of the first return to state i occurring at time n (starting from i at time 0). Let
f_i = Σ_{n≥1} f_i^(n); this is the probability of eventual return to state i (starting from i initially). Notice that
f_i^(n) ≠ (T^(n))_{ii}, where (T^(n))_{ij} = P(X_n = j | X_0 = i): the latter (T^(n))_{ii} includes intermediate returns before time n.
Classification of recurrence
If f_i = 1 then state i is called recurrent, hence the return to state i is certain (this is equivalently expressed by
Σ_{n≥1} (T^(n))_{ii} = ∞). If f_i < 1 then state i is said to be transient and return is not certain (again characterised by
Σ_{n≥1} (T^(n))_{ii} < ∞). In the case of a transient state, the number of returns is governed by a geometric distribution with
parameter f_i. In a recurrent state, any number of returns occurs with probability 1. Let μ_i = Σ_{n=1}^{∞} n f_i^(n),
the expected recurrence time. For a recurrent state i, we say that state i is positively recurrent if μ_i is finite,
but we say state i is null recurrent if μ_i is infinite. We say a state is ergodic if it is aperiodic and positively
recurrent. Similarly, a subchain is called ergodic if all of its communicating states are ergodic.
Remark: The urn problem has an irreducible Markov chain but is not ergodic. The previous example considering sunny and cloudy weather can be shown to be ergodic.
For ergodic chains, there exists some n such that (T^(n))_{ij} > 0 for all i, j. For ergodic chains, we see that P^(n) → P
with P = P T. This need not hold for irreducible (for example periodic) chains. Instead, (1/n) Σ_{k=0}^{n−1} P^(k) → P as
n → ∞.
General approach
Given a Markov matrix T, first draw the corresponding probability transition diagram.
Decide which states communicate, and hence identify subchains.
Decide which states are absorbing, periodic or aperiodic.
Decide which states are recurrent or transient. Calculate f_i = Σ_{n≥1} f_i^(n) (the sum of the probabilities of first
return at time n over all positive n). If f_i = 1 then state i is recurrent and if f_i < 1 then state i is transient.
If state i is recurrent, we compute μ_i = Σ_{n≥1} n f_i^(n). If μ_i < ∞ then state i is positively recurrent. If μ_i = ∞
then state i is null recurrent (this requires the use of series convergence tests).
Remark: If T is a constant transition matrix (in time n) then every recurrent state has μ_i < ∞ (f_i^(n) → 0
exponentially fast).
We pose a classical problem. At each play of a game the gambler wins 1 with probability p and loses 1 with
probability q = 1 − p. The gambler aims to reach a fortune of N before being ruined at 0. The gambler starts with i, for 0 ≤ i ≤ N.
What is the probability that the gambler wins? Let X_n be the gambler's fortune at time n, so X_n ∈ {0, 1, ..., N}.
By considering Markov chains, we observe that states 0 and N are absorbing states.
Figure 12: The probability transition diagram for the gamblers ruin problem.
Let E_i be the event of winning given that we start in state i, and θ_i be the probability of winning given that
we start in state i. We can show that the states 1, ..., N − 1 are transient, so with probability 1 we will eventually
reach either state 0 or state N. We aim to get a recurrence relation (difference equation) between θ_i, θ_{i−1} and θ_{i+1},
which we will then solve for θ_i in terms of i. Notice that we can write
θ_i = P(winning | win at time 1) P(win at time 1) + P(winning | lose at time 1) P(lose at time 1) = pθ_{i+1} + qθ_{i−1}.    (12)
We know that θ_0 = 0 and θ_N = 1. We take a trial solution θ_i = Aλ^i for some unknown λ and A. Then
Aλ^i = pAλ^{i+1} + qAλ^{i−1}  ⟹  Aλ^{i−1}(pλ² − λ + q) = 0  ⟹  (λ − 1)(pλ − q) = 0  ⟹  λ = 1 or λ = q/p.
Note that we should always have λ = 1 as a solution. Combining these two values of λ, we get the general solution
θ_i = A + B(q/p)^i. From the boundary conditions we have
θ_0 = 0 ⟹ A + B = 0,
θ_N = 1 ⟹ A + B(q/p)^N = 1  ⟹  θ_i = (1 − (q/p)^i)/(1 − (q/p)^N).    (13)
If p = q = 1/2 then (solving directly, or letting q/p → 1)
θ_i = i/N.    (14)
More generally, suppose a Markov chain has a set A of absorbing states, split into losing states L and winning states W, and let θ_i be the probability of eventually reaching W starting from state i. Then for i ∉ A,
θ_i = Σ_j T_{ij} θ_j.    (15)
Note that this result is not the same as the steady state vector equation P = P T ⟺ P_j = Σ_i P_i T_{ij}. The steady
state vector equation has non-zero values for states in A.
In order to solve this problem, we also need to impose boundary conditions: if state i ∈ A then θ_i = 0 if i ∈ L
and θ_i = 1 if i ∈ W. In the classical gambler's ruin problem, A = {0, N}, L = {0} and W = {N}. Let D_i be the
expected time to reach some state in A given that we start in some state i ∉ A. From this we can define a set of
boundary conditions D_i = 0 if state i ∈ A.
Lemma
If T_{ij} is the transition matrix then
D_i = Σ_j D_j T_{ij} + 1.    (16)
Proof
Let E(n, i) be the event of reaching A in time n, starting in state i. Then
P(E(n, i)) = Σ_j P(E(n, i) | i → j) P(i → j) = Σ_j P(E(n − 1, j)) T_{ij}.    (17)
By definition, D_i = Σ_{n≥0} n P(E(n, i)). Then D_i = Σ_{n≥1} Σ_j n P(E(n − 1, j)) T_{ij}. We interchange the sums and
relabel n by n + 1, then
D_i = Σ_j Σ_{n≥0} (n + 1) P(E(n, j)) T_{ij}  ⟹  D_i = Σ_j (Σ_{n≥0} n P(E(n, j))) T_{ij} + Σ_j (Σ_{n≥0} P(E(n, j))) T_{ij} = Σ_j D_j T_{ij} + 1,
using Σ_{n≥0} P(E(n, j)) = 1 and Σ_j T_{ij} = 1 in the last step.
Example
We consider the classical gambler's ruin problem. Immediately, Equation 16 gives D_i = pD_{i+1} + qD_{i−1} + 1. This
is a difference equation with homogeneous solution D_i = A + B(q/p)^i, and we take a trial particular solution
D_i = C + Ei. We substitute this into the above to find C and E. Then we use D_0 = D_N = 0 to find A and B. If
p = q = 1/2 we can check that D_i = i(N − i).
It is usually best to solve Equation 15 (for θ_i) or Equation 16 (for D_i) by direct algebra on the simultaneous equations.
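A brief sketch (my own, parameters arbitrary) that checks the formula for θ_i, and for p = 1/2 the duration D_i = i(N − i), by simulating the gambler's fortune directly.

```python
import random

def gambler_simulation(i, N, p, trials=50_000):
    """Estimate P(reach N before 0) and the mean duration, starting from fortune i."""
    wins, total_time = 0, 0
    for _ in range(trials):
        x, t = i, 0
        while 0 < x < N:
            x += 1 if random.random() < p else -1
            t += 1
        wins += (x == N)
        total_time += t
    return wins / trials, total_time / trials

def theta_exact(i, N, p):
    """theta_i = (1 - (q/p)^i) / (1 - (q/p)^N), or i/N when p = 1/2."""
    q = 1 - p
    if abs(p - q) < 1e-12:
        return i / N
    r = q / p
    return (1 - r ** i) / (1 - r ** N)

N, i = 10, 3
for p in (0.5, 0.6):
    win_prob, duration = gambler_simulation(i, N, p)
    print(p, round(win_prob, 3), round(theta_exact(i, N, p), 3), round(duration, 1))
# For p = 0.5 the duration should be close to i(N - i) = 21.
```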
Example
Suppose we toss a fair coin. How long do we have to wait in order to see 3 heads in a row?
Let E_n be the event of seeing HHH at time n. Then the E_n are not independent over n, since if the (n − 1)-th toss
comes up T then E_n and E_{n+1} cannot occur.
Figure 13: The probability transition diagram for the coin toss example. Note that the unlabelled arrows have
probability 1/2.
In this example we have states {S, T, H, HH, HHH} and D_HHH = 0 since HHH is an absorbing state. From
Equation 16 we get
D_S = (1/2)D_H + (1/2)D_T + 1,
D_H = (1/2)D_HH + (1/2)D_T + 1,
D_HH = (1/2)D_HHH + (1/2)D_T + 1,
D_T = (1/2)D_T + (1/2)D_H + 1.
Using the boundary condition D_HHH = 0 and back-substitution gives D_S = 14. We can extend this to requiring
N heads in a row; in this case we can show that D_S = 2^{N+1} − 2.
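A quick simulation (not part of the notes) of the waiting time for a run of N heads, to compare with 2^{N+1} − 2:

```python
import random

def tosses_until_run_of_heads(n_heads):
    """Toss a fair coin until n_heads consecutive heads appear; return the count."""
    run, tosses = 0, 0
    while run < n_heads:
        tosses += 1
        run = run + 1 if random.random() < 0.5 else 0
    return tosses

trials = 100_000
for n in (2, 3, 4):
    mean = sum(tosses_until_run_of_heads(n) for _ in range(trials)) / trials
    print(n, round(mean, 2), 2 ** (n + 1) - 2)   # empirical mean vs 2^(N+1) - 2
```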
Example
Suppose coins are flipped in sequence. Player 1 wins when HH occurs and Player 2 wins when TH occurs. The
sequence of coin tosses continues until one of these events occurs. What is the probability that Player 1 wins?
Figure 14: The probability transition diagram for this game. Note that each arrow has probability 1/2. The
values are the probabilities of reaching each winning state.
We can see from the diagram that the probability of Player 1 winning is 1/4, since the only way for Player 1 to
win is if the first two coins are H (once a T appears, the first subsequent H completes TH before HH can occur). We can see this another way. Let θ_i be the probability of reaching HH from
state i. We want to know θ_S. The states of the system are {S, H, T, HH, TH} and the boundary conditions are
θ_HH = 1, θ_TH = 0. We then solve θ_i = Σ_j T_{ij} θ_j, so θ_S = (1/2)θ_H + (1/2)θ_T, θ_T = (1/2)θ_T + (1/2)θ_TH, θ_H = (1/2)θ_HH + (1/2)θ_T. Solving
these equations simultaneously gives θ_S = 1/4. We can also calculate D_i, the mean time to finish given state i, in a
similar way, noting the boundary conditions D_HH = D_TH = 0.
This idea generalises. Given any sequence of coin states, we can create a sequence that beats the original
sequence more often than not by removing the last state and adding a suitably chosen state to the beginning of the sequence; for example,
HTHHTTH loses to HHTHHTT more often than not. Proving the relative probabilities proceeds in a similar
method to the above.
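Finally, a small simulation sketch (mine, not from the notes) of the two games mentioned here; the pattern strings are the ones quoted above.

```python
import random

def first_to_appear(pattern_a, pattern_b):
    """Toss a fair coin until pattern_a or pattern_b appears; return the winner."""
    history = ""
    while True:
        history += random.choice("HT")
        if history.endswith(pattern_a):
            return "A"
        if history.endswith(pattern_b):
            return "B"

def win_probability(pattern_a, pattern_b, trials=50_000):
    wins = sum(first_to_appear(pattern_a, pattern_b) == "A" for _ in range(trials))
    return wins / trials

print(win_probability("HH", "TH"))                           # ~0.25: Player 1 wins 1/4 of the time
print(win_probability("HTHHTTH", "HHTHHTT", trials=10_000))  # well below 0.5: the original sequence loses
```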