6.265/15.070J Fall 2013
Lecture 1
9/4/2013
ρ(x, y) = (Σ_{1≤j≤d} (x_j - y_j)²)^{1/2},
or, more generally,
ρ_p(x, y) = (Σ_{1≤j≤d} |x_j - y_j|^p)^{1/p},  p ≥ 1.
also a metric space under ρ(x, y) = Σ_{n∈N} 2^{-n} min(ρ_n(x, y), 1), where ρ_n is the
metric defined on C[0, n]. (Why did we have to use the min operator in the definition above?) We call ρ_T and ρ the uniform metric. We will also write ‖x - y‖_T
or ‖x - y‖ instead of ρ_T.
sup_{x∈A} |x(0)| < ∞,  (1)
and
lim_{δ→0} sup_{x∈A} w_x(δ) = 0.  (2)
Proof. We only show that if A is compact then (1) and (2) hold. The converse is
established using a similar type of mathematical analysis/topology argument.
We already know that if A is compact it needs to be closed. The assertion
(1) follows from Proposition 2. We now show (2). For any s, t ∈ [0, T] we have
|y(t) - y(s)| ≤ |y(t) - x(t)| + |x(t) - x(s)| + |x(s) - y(s)| ≤ |x(t) - x(s)| + 2‖x - y‖.
Similarly we show that |x(t) - x(s)| ≤ |y(t) - y(s)| + 2‖x - y‖. Therefore, for
every δ > 0,
|w_x(δ) - w_y(δ)| ≤ 2‖x - y‖.  (3)
Suppose A is compact but (2) does not hold. Then we can find a sequence
x_{n_i} ∈ A, i ≥ 1, such that w_{x_{n_i}}(1/n_i) ≥ c for some c > 0. Since A is compact,
there is a further subsequence of x_{n_i} which converges to some x ∈ A. To
ease the notation we denote this subsequence again by x_{n_i}. Thus ‖x_{n_i} - x‖ → 0.
From (3) we obtain
|w_x(1/n_i) - w_{x_{n_i}}(1/n_i)| ≤ 2‖x - x_{n_i}‖ < c/2
for all i larger than some i_0. This implies that
w_x(1/n_i) ≥ c/2
for all i ≥ i_0, which contradicts the uniform continuity of x on [0, T] (namely, w_x(δ) → 0 as δ → 0).
Convergence of mappings
Problem 7. Use Proposition 3 (or anything else useful) to prove that C[0, T] is
complete.
That C[0, T] has a dense countable subset can be shown via approximations
by polynomials with rational coefficients (we skip the details).
The space C[0, ∞) equipped with the uniform metric will be convenient when we
discuss Brownian motion and its applications later in the course, since Brownian
motion has continuous samples. Many important processes in practice, however,
including queueing, storage, manufacturing and supply chain processes,
are not continuous, due to the discrete quantities involved. As a result we need to
deal with probability concepts on spaces of not necessarily continuous functions.
Denote by D[0, ∞) the space of all functions x on [0, ∞) taking values in R,
or in general any metric space (S, ρ), such that x is right-continuous and has left
limits. Namely, for every t_0, the limits lim_{t↑t_0} x(t) and lim_{t↓t_0} x(t) exist, and lim_{t↓t_0} x(t) = x(t_0). As an example, think about a process describing the number of customers
in a branch of a bank. This process is described as a piece-wise constant function. We adopt the convention that at a moment when a customer arrives/departs,
the number of customers is identified with the number of customers right after the arrival/departure. This makes the process right-continuous. It also has left
limits, since it is piece-wise constant.
Similarly, define D[0, T] to be the space of right-continuous functions on
[0, T] with left limits. We will write RCLL for short. On D[0, T] and D[0, ∞) we
would like to define a metric which measures proximity between the functions (processes). We can try to use the uniform metric again. Let us consider
the following two processes x, y ∈ D[0, T]. Fix θ ∈ [0, T) and δ > 0 such that
θ + δ < T, and define x(z) = 1{z ≥ θ}, y(z) = 1{z ≥ θ + δ}. We see that x
and y coincide everywhere except for a small interval [θ, θ + δ). It makes sense
to regard these processes as close to each other. Yet ‖x - y‖_T = 1.
Thus the uniform metric is inadequate. For this reason Skorohod introduced the so-called Skorohod metric. Before we define the Skorohod metric let us discuss the
idea behind it. The problem with the uniform metric was that the two processes x, y
described above were close to each other in the sense that one is a perturbed version of the other, where the amount of perturbation is δ. In particular, consider
the following piece-wise linear function λ : [0, T] → [0, T] given by
λ(t) = θt/(θ + δ) for t ∈ [0, θ + δ], and
λ(t) = θ + (T - θ)(t - θ - δ)/(T - θ - δ) for t ∈ [θ + δ, T].
We see that x(λ(t)) = y(t). In other words, we rescaled the time axis [0, T] by
a small amount and made y close to (in fact identical to) x. This motivates the
following definition. From here on we use the following notation: x ∧ y stands
for min(x, y) and x ∨ y stands for max(x, y).
Definition 5. Let Λ be the space of strictly increasing continuous functions λ
from [0, T] onto [0, T]. The Skorohod metric on D[0, T] is defined by
d_s(x, y) = inf_{λ∈Λ} ( ‖λ - I‖ ∨ ‖x∘λ - y‖ ),
where I denotes the identity map I(t) = t.
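To make the example concrete, here is a small numerical sketch in Python of the two indicator processes above, with an exaggerated shift δ for visibility: their uniform distance is 1, yet composing x with the explicit time change λ matches y exactly, so the Skorohod distance is at most ‖λ - I‖, which is of order δ. The specific values of T, θ, δ and the grid resolution are illustrative choices, not from the notes.

```python
# x(z) = 1{z >= theta}, y(z) = 1{z >= theta + delta} on [0, T]
T, theta, delta = 2.0, 0.5, 0.25

def lam(t):
    # piecewise-linear time change with lam(theta + delta) = theta, lam(T) = T
    if t <= theta + delta:
        return theta * t / (theta + delta)
    return theta + (T - theta) * (t - theta - delta) / (T - theta - delta)

x = lambda z: 1.0 if z >= theta else 0.0
y = lambda z: 1.0 if z >= theta + delta else 0.0

grid = [i * T / 100000 for i in range(100001)]
sup_uniform = max(abs(x(t) - y(t)) for t in grid)       # uniform distance: 1
sup_time_change = max(abs(lam(t) - t) for t in grid)    # ||lam - I||: delta
sup_matched = max(abs(x(lam(t)) - y(t)) for t in grid)  # x composed with lam equals y
```

So along this particular λ the max in the Skorohod metric is ‖λ - I‖ ∨ 0 = δ, while the uniform metric sees distance 1.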
References
[1] P. Billingsley, Convergence of probability measures, Wiley-Interscience
publication, 1999.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Fall 2013
9/9/2013
Preliminary notes
P(|(X_1 + ⋯ + X_n)/n - μ| > ε) → 0,
as n → ∞.
But how quickly does this convergence to zero occur? We can try to use the Chebyshev inequality, which says
P(|(X_1 + ⋯ + X_n)/n - μ| > ε) ≤ Var(X_1)/(nε²).
For θ > 0,
P(Σ_{1≤i≤n} X_i > na) = P(e^{θ Σ_{1≤i≤n} X_i} > e^{θna})
≤ E[e^{θ Σ_{1≤i≤n} X_i}] / e^{θna}   (Markov inequality)
= E[∏_i e^{θX_i}] / (e^{θa})^n.
But recall that the X_i are i.i.d. Therefore E[∏_i e^{θX_i}] = (E[e^{θX_1}])^n, and we obtain the upper bound
P(Σ_{1≤i≤n} X_i/n > a) ≤ (E[e^{θX_1}]/e^{θa})^n.  (1)
Of course this bound is meaningful only if the ratio E[e^{θX_1}]/e^{θa} is less than
unity. We recognize E[e^{θX_1}] as the moment generating function of X_1 and denote it by M(θ). For the bound to be useful, we need E[e^{θX_1}] to be at least
finite. If we could show that this ratio is less than unity, we would be done:
exponentially fast decay of the probability would be established.
Similarly, suppose we want to estimate P(Σ_{1≤i≤n} X_i/n < a). Taking θ < 0, the same argument gives
P(Σ_{1≤i≤n} X_i/n < a) ≤ (M(θ)/e^{θa})^n.
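As a sanity check, the following Python sketch compares the bound (M(θ)/e^{θa})^n, optimized over a crude grid of θ > 0, with a Monte Carlo estimate of the tail probability for Bernoulli(1/2) variables. The choice of distribution, a, n, the grid and the seed are illustrative assumptions.

```python
import math, random

def chernoff_bound(mgf, theta, a, n):
    # the bound P(sum_{i<=n} X_i / n >= a) <= (M(theta)/e^{theta a})^n, theta > 0
    return (mgf(theta) / math.exp(theta * a)) ** n

mgf_bernoulli = lambda t: (1 + math.exp(t)) / 2   # M(theta) for Bernoulli(1/2)

n, a = 100, 0.75
# optimize the bound over a crude grid of theta values in (0, 5)
bound = min(chernoff_bound(mgf_bernoulli, i / 100, a, n) for i in range(1, 500))

random.seed(0)
trials = 20000
hits = sum(sum(random.randint(0, 1) for _ in range(n)) >= a * n
           for _ in range(trials))
empirical = hits / trials  # crude Monte Carlo estimate of the tail probability
```

For these parameters the optimized bound is of order e^{-13}, so the Monte Carlo estimate with only 20000 trials is essentially always zero; this is exactly the regime where the exponential bound is informative and naive simulation is not.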
Consider, for example, X exponentially distributed with parameter λ:
M(θ) = ∫_0^∞ e^{θx} λ e^{-λx} dx = λ ∫_0^∞ e^{-(λ-θ)x} dx.
When θ < λ this integral is equal to λ/(λ - θ), since ∫_0^∞ e^{-(λ-θ)x} dx = 1/(λ - θ). But
when θ ≥ λ, the integral is infinite. Thus for the exponential distribution the moment generating
function is finite iff θ < λ, in which case M(θ) = λ/(λ - θ). In this case the
domain of the moment generating function is D(M) = (-∞, λ).
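A quick numerical sanity check of this closed form, via a left Riemann sum in Python; the truncation point and step count are arbitrary illustrative choices:

```python
import math

def mgf_exponential_numeric(lam, theta, upper=40.0, steps=200000):
    # left Riemann sum for the integral of e^{theta x} * lam * e^{-lam x}
    # over [0, upper]; a finite-range approximation, valid only for theta < lam
    h = upper / steps
    return sum(math.exp((theta - lam) * x) * lam
               for x in (i * h for i in range(steps))) * h

approx = mgf_exponential_numeric(1.0, 0.5)   # closed form: 1/(1 - 0.5) = 2
```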
For X standard Gaussian,
M(θ) = ∫ e^{θx} (1/√(2π)) e^{-x²/2} dx = e^{θ²/2} (1/√(2π)) ∫ e^{-(x-θ)²/2} dx.
Introducing the change of variables y = x - θ, we obtain that the integral
is equal to
(1/√(2π)) ∫ e^{-y²/2} dy = 1,
so that M(θ) = e^{θ²/2} and D(M) = R.
For X Poisson with parameter λ,
M(θ) = Σ_{m≥0} e^{θm} e^{-λ} λ^m/m! = e^{-λ} Σ_{m≥0} (λe^θ)^m/m! = e^{λ(e^θ - 1)}
(where we use the formula Σ_{m≥0} t^m/m! = e^t). Thus again D(M) = R.
This again has to do with the fact that λ^m/m! decays at a rate similar to
1/m!, which is faster than any exponential growth rate e^{θm}.
Y_θ = (exp(θX) - exp(θ_0 X))/(θ - θ_0).
Since (d/dθ) exp(θx) = x exp(θx), almost surely Y_θ → X exp(θ_0 X) as
θ → θ_0. Thus to establish the claim it suffices to show that convergence
of expectations holds as well, namely lim_{θ→θ_0} E[Y_θ] = E[X exp(θ_0 X)], and
E[X exp(θ_0 X)] < ∞. For this purpose we will use the Dominated Convergence
Theorem. Namely, we will identify a random variable Z such that |Y_θ| ≤ Z almost surely for all θ in some interval (θ_0 - ε, θ_0 + ε), and E[Z] < ∞.
Fix ε > 0 small enough so that (θ_0 - ε, θ_0 + ε) ⊂ (θ_1, θ_2). Let Z =
ε^{-1} exp(θ_0 X + ε|X|). Using the Taylor expansion of the exp(·) function, for every
θ ∈ (θ_0 - ε, θ_0 + ε), we have
Y_θ = exp(θ_0 X) (X + (θ - θ_0) X²/2! + (θ - θ_0)² X³/3! + ⋯ + (θ - θ_0)^{n-1} X^n/n! + ⋯),
which gives
|Y_θ| ≤ exp(θ_0 X) (|X| + ε|X|²/2! + ⋯ + ε^{n-1}|X|^n/n! + ⋯)
= exp(θ_0 X) ε^{-1} (exp(ε|X|) - 1)
≤ exp(θ_0 X) ε^{-1} exp(ε|X|)
= Z.
It remains to show that E[Z] < ∞. We have
E[Z] = ε^{-1} E[exp(θ_0 X + εX) 1{X ≥ 0}] + ε^{-1} E[exp(θ_0 X - εX) 1{X < 0}]
≤ ε^{-1} E[exp(θ_0 X + εX)] + ε^{-1} E[exp(θ_0 X - εX)]
= ε^{-1} M(θ_0 + ε) + ε^{-1} M(θ_0 - ε)
< ∞,
since ε was chosen so that (θ_0 - ε, θ_0 + ε) ⊂ (θ_1, θ_2) ⊂ D(M). This completes
the proof of the proposition.
Problem 1.
(a) Establish part (a) of Proposition 1.
(b) Construct an example of a random variable for which the corresponding
If a > μ, then there exists θ > 0 such that M(θ)/e^{θa} < 1 and
P(Σ_{1≤i≤n} X_i/n > a) ≤ (M(θ)/e^{θa})^n.
Similarly, if a < μ, then there exists θ < 0 such that M(θ)/e^{θa} < 1 and
P(Σ_{1≤i≤n} X_i/n < a) ≤ (M(θ)/e^{θa})^n.
How small can we make the ratio M(θ)/exp(θa)? We have some freedom
in choosing θ as long as E[e^{θX_1}] is finite. So we could try to find θ which
minimizes the ratio M(θ)/e^{θa}. This is what we will do in the rest of the lecture.
The surprising conclusion of the large deviations theory is very often that such
a minimizing value θ* exists and is tight. Namely, it provides the correct decay
rate! In this case we will be able to say
P(Σ_{1≤i≤n} X_i/n > a) ≈ exp(-I(a)n),
where I(a) = -log (M(θ*)/e^{θ*a}) = θ*a - log M(θ*).
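The optimization over θ can be carried out numerically. The sketch below (Python) evaluates I(a) = sup_θ (θa - log M(θ)) on a grid; for the standard Gaussian, log M(θ) = θ²/2, the supremum is attained at θ = a and equals a²/2. The grid bounds and resolution are illustrative assumptions.

```python
def rate_function(log_mgf, a, lo=-10.0, hi=10.0, steps=20000):
    # grid search for I(a) = sup_theta (theta * a - log M(theta))
    best = float("-inf")
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        best = max(best, t * a - log_mgf(t))
    return best

gaussian_log_mgf = lambda t: t * t / 2  # standard normal: log M(theta) = theta^2/2
```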
Legendre transforms
For instance, for X_i Poisson with parameter 1.2, the sum Σ_{1≤i≤n} X_i is Poisson with parameter 1.2n, so
P(Σ_{1≤i≤n} X_i ≤ n - 1) = Σ_{0≤k≤n-1} ((1.2n)^k/k!) e^{-1.2n}.
For X_i standard Gaussian, P(Σ_{1≤i≤n} X_i > an) = P(N(0, 1) > a√n). For large n this Gaussian tail probability is approximated by the value of the density itself at a√n, namely (1/√(2πna²)) e^{-na²/2}. This is
consistent with the value given by the large deviations theory: the
lower order magnitude term 1/√(2πna²) simply disappears in the approximation on the
log scale.
For X_i Poisson with parameter λ, we can write explicitly
P(Σ_{1≤i≤n} X_i > an) = Σ_{m>an} ((λn)^m/m!) e^{-λn}.
But again it is hard to infer a more explicit rate of decay using this expression.
5 Additional reading materials
Chapter 0 of [2]. This is a non-technical introduction to the field which describes the motivation and various applications of the large deviations theory.
Soft reading.
Chapter 2.2 of [1].
References
[1] A. Dembo and O. Zeitouni, Large deviations techniques and applications, Springer, 1998.
[2] A. Shwartz and A. Weiss, Large deviations for performance analysis, Chapman and Hall, 1995.
Fall 2013
9/11/2013
Content.
1. Cramér's Theorem.
2. Rate function and properties.
3. Change of measure technique.
1 Cramér's Theorem
In the previous lecture we established the upper bound
lim sup_n (1/n) log P(S_n ≥ a) ≤ -I(a),
and we have indicated that the bound is tight. Namely, ideally we would like to
establish the limit
lim_n (1/n) log P(S_n ≥ a) = -I(a),  (1)
but unfortunately this statement is not precisely correct. Consider the following
example. Let X be an integer-valued random variable, and A = {m/p : m ∈ Z, p is an odd prime}. Then for prime n we have P(S_n ∈ A) = 1; but for n = 2^k
we have P(S_n ∈ A) = 0. As a result, the limit lim_n (1/n) log P(S_n ∈ A) in this case
does not exist.
The sense in which the identity (1) does hold is given by Cramér's Theorem below.
Theorem 1 (Cramér's Theorem). Given a sequence of i.i.d. real valued random variables X_i, i ≥ 1, with a common moment generating function M(θ) =
E[exp(θX_1)], the following holds:
(a) For any closed set F ⊂ R,
lim sup_n (1/n) log P(S_n ∈ F) ≤ -inf_{x∈F} I(x).
(b) For any open set U ⊂ R,
lim inf_n (1/n) log P(S_n ∈ U) ≥ -inf_{x∈U} I(x).
We will prove the theorem only for the special case when D(M) = R
(namely, the MGF is finite everywhere) and when the support of X is the entire
R; namely, for every K > 0, P(X > K) > 0 and P(X < -K) > 0. For
example, a Gaussian random variable satisfies this property.
To see the power of the theorem, let us apply it to the tail of S_n. In the
following section we will establish that I(x) is a non-decreasing function on the
interval [μ, ∞). Furthermore, we will establish that if it is finite in some interval
containing x, it is also continuous at x. Thus fix a ≥ μ and suppose I is finite in
an interval containing a. Taking F to be the closed set [a, ∞) with a > μ, we
obtain from part (a)
lim sup_n (1/n) log P(S_n ∈ [a, ∞)) ≤ -min_{x≥a} I(x) = -I(a).
On the other hand, taking U to be the open set (a, ∞), we obtain from part (b)
lim inf_n (1/n) log P(S_n ∈ [a, ∞)) ≥ lim inf_n (1/n) log P(S_n ∈ (a, ∞))
≥ -inf_{x>a} I(x)
= -I(a),
where the last equality uses the continuity of I at a. Thus in this special case indeed the large deviations limit exists:
lim_n (1/n) log P(S_n ≥ a) = -I(a).
The limit is insensitive to whether the inequality is strict, in the sense that we
also have
lim_n (1/n) log P(S_n > a) = -I(a).
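For standard Gaussian X_i the tail probability is available in closed form, so the convergence of the normalized log-probability to -I(a) = -a²/2 can be checked directly (Python sketch, using P(S_n ≥ a) = P(N(0,1) ≥ a√n) and math.erfc; the values of a and n are illustrative):

```python
import math

def tail_prob(n, a):
    # P(S_n >= a) where S_n is the average of n standard normals: S_n ~ N(0, 1/n)
    return 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2))

def normalized_log_tail(n, a):
    return -math.log(tail_prob(n, a)) / n

# for a = 1 the rate is I(1) = 1/2; correction terms vanish like log(n)/n
rates = [normalized_log_tail(n, 1.0) for n in (10, 100, 1000)]
```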
y = M′(θ_0)/M(θ_0).  (2)
Proof of part (a). Convexity is due to the fact that I(x) is a point-wise supremum
of the functions θx - log M(θ), each linear in x. Precisely, consider λ ∈ (0, 1):
I(λx + (1 - λ)y) = sup_θ [θ(λx + (1 - λ)y) - log M(θ)]
≤ λ sup_θ [θx - log M(θ)] + (1 - λ) sup_θ [θy - log M(θ)]
= λI(x) + (1 - λ)I(y).
This establishes the convexity. Now, since M(0) = 1, we have I(x) ≥ 0·x -
log M(0) = 0, and the non-negativity is established. By Jensen's inequality, we
have that
M(θ) = E[exp(θX_1)] ≥ exp(θE[X_1]) = exp(θμ).
For every K > 0,
lim inf_{θ→∞} (1/θ) log M(θ) = lim inf_{θ→∞} (1/θ) log ∫ exp(θx) dP(x)
≥ lim inf_{θ→∞} (1/θ) log (exp(θK) P([K, ∞)))
= K + lim inf_{θ→∞} (1/θ) log P([K, ∞))
= K  (since supp(X_1) = R, we have P([K, ∞)) > 0).
Since K is arbitrary,
lim inf_{θ→∞} (1/θ) log M(θ) = ∞.
Similarly,
lim inf_{θ→-∞} (1/|θ|) log M(θ) = ∞.
Therefore,
lim_{|θ|→∞} (θx - log M(θ)) = -∞.
log P (Sn + ) I(+ )
n
(3)
1
log P (Sn ) I( )
n
(4)
Similarly, we have
lim sup
n
(5)
(6)
1
log(xn + yn ) inf I(x).
xF
n
(you are asked to establish the last implication as an exercise). We have estab
lished
1
lim sup log P (Sn F ) inf I(x)
(7)
xF
n
n
Proof of the upper bound in statement (a) is complete.
Proof of Cramér's Theorem, Part (b). Fix an open set U ⊂ R. Fix ε > 0 and
find y such that I(y) ≤ inf_{x∈U} I(x) + ε. It is sufficient to show that
lim inf_n (1/n) log P(S_n ∈ U) ≥ -I(y).  (8)
Find θ_0 such that
I(y) = θ_0 y - log M(θ_0).
Such θ_0 exists by Proposition 1. Since y > μ, then again by Proposition 1 we
may assume θ_0 ≥ 0.
We will use the change-of-measure technique to obtain the lower bound. For
this, consider a new random variable X_{θ_0} defined by
P(X_{θ_0} ≤ z) = (1/M(θ_0)) ∫_{-∞}^z exp(θ_0 x) dP(x).
Now,
E[X_{θ_0}] = (1/M(θ_0)) ∫ x exp(θ_0 x) dP(x) = M′(θ_0)/M(θ_0) = y,
where the second equality was established in the previous lecture, and the last
equality follows by the choice of θ_0 and Proposition 1. Since U is open we can
find δ > 0 small enough so that (y - δ, y + δ) ⊂ U. Thus, we have
P(S_n ∈ U) ≥ P(S_n ∈ (y - δ, y + δ))
= ∫_{|n^{-1} Σ_i x_i - y| < δ} dP(x_1) ⋯ dP(x_n)
= ∫_{|n^{-1} Σ_i x_i - y| < δ} exp(-θ_0 Σ_{1≤i≤n} x_i) M^n(θ_0) dP_{θ_0}(x_1) ⋯ dP_{θ_0}(x_n),  (9)
where P_{θ_0} is the distribution of X_{θ_0}. Since θ_0 is non-negative, on the event
|n^{-1} Σ_i x_i - y| < δ we have exp(-θ_0 Σ_i x_i) ≥ exp(-θ_0 yn - θ_0 δn), and we obtain the bound
P(S_n ∈ (y - δ, y + δ)) ≥ exp(-θ_0 yn - θ_0 δn) M^n(θ_0) P_{θ_0}(|n^{-1} Σ_{1≤i≤n} X_{i,θ_0} - y| < δ),
where the X_{i,θ_0} are i.i.d. copies of X_{θ_0}. Since E[X_{θ_0}] = y, by the Weak Law of
Large Numbers the last probability converges to one. We obtain
lim inf_n (1/n) log P(S_n ∈ U) ≥ -θ_0 y - θ_0 δ + log M(θ_0)
= -I(y) - θ_0 δ.
Recalling that θ_0 depends on y only and sending δ to zero, we obtain (8). This
completes the proof of part (b).
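The same exponential tilting is the basis of importance sampling for rare-event simulation. The Python sketch below estimates P(S_n ≥ a) for standard Gaussian X_i by sampling from the tilted measure N(θ_0, 1) with θ_0 = a (for the standard normal, M′(θ)/M(θ) = θ) and reweighting by the likelihood ratio; the parameters n, a, the trial count and the seed are illustrative assumptions.

```python
import math, random

rng = random.Random(3)
n, a = 50, 0.8
theta0 = a            # tilt so that the mean under the new measure is exactly a
trials = 20000

total = 0.0
for _ in range(trials):
    s = sum(rng.gauss(theta0, 1.0) for _ in range(n))   # sample under P_{theta0}
    if s >= a * n:
        # likelihood ratio dP/dP_{theta0} = exp(-theta0*s) * M(theta0)^n,
        # with M(theta0) = exp(theta0^2/2) for the standard normal
        total += math.exp(-theta0 * s + n * theta0 * theta0 / 2)
estimate = total / trials

exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2))  # closed-form check
```

The target probability here is of order 10^{-9}, so naive Monte Carlo with 20000 samples would essentially always return zero, while the tilted estimator is accurate to a few percent.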
Fall 2013
9/16/2013
Content.
1. Insurance problem
2. Queueing problem
3. Buffer overflow probability
The ruin event, namely that the starting capital S_0 plus accumulated premiums minus claims ever drops to zero or below, is
∪_N { S_0 + Σ_{n=1}^N (C - A_n) ≤ 0 } = { max_N Σ_{n=1}^N (A_n - C) ≥ S_0 }.
If E[A_1] ≥ C, we have P(max_N Σ_{n=1}^N (A_n - C) ≥ S_0) = 1. Thus the
interesting case is E[A_1] < C (negative drift), and the goal is to determine the
starting capital S_0 such that
P(max_N Σ_{n=1}^N (A_n - C) ≥ S_0) ≤ ε.
The queue length after N steps can be written as
Q_N = max( max_{1≤n≤N-1} Σ_{k=1}^n (A_{N-k} - C), 0 ),
which, since the A_k are i.i.d., is equal in distribution to
max( max_{1≤n≤N-1} Σ_{k=1}^n (A_k - C), 0 ).
Our goal is to design the size of the queue length storage (buffer) B so that
the likelihood that the number of packets in the queue exceeds B is small. In
communication applications this is important since every packet not fitting into
the buffer is dropped. Thus the goal is to find a buffer size B > 0 such that
P(Q ≥ B) ≈ P(max_{n≥1} Σ_{k=1}^n (A_k - C) ≥ B) ≤ ε.
Theorem 1. Suppose E[A_1] < C. Then
lim_{B→∞} (1/B) log P(max_{n≥1} Σ_{k=1}^n (A_k - C) ≥ B) = -θ*, where θ* = sup{θ > 0 : M(θ) < exp(θC)}.
k=1
d
d
M ()
= E[A],
exp(C)
=C
d
d
=0
=0
Since E[An ] < C, then there exists small enough so that M () < exp(C),
M ( )
exp( C )
n
)
k=1
when B is large. Thus given select B such that exp( B) , and we can
set B = 1 log 1 .
Example. Suppose A is distributed uniformly on [0, a] and C = 2. Then
M(θ) = ∫_0^a exp(θt) a^{-1} dt = (exp(θa) - 1)/(θa).
Then
θ* = sup{θ > 0 : M(θ) ≤ exp(θC)} = sup{θ > 0 : (exp(θa) - 1)/(θa) ≤ exp(2θ)}.
Case 1: a = 3. We have θ* = sup{θ > 0 : exp(3θ) - 1 ≤ 3θ exp(2θ)}, i.e.
θ* = 1.54078.
Case 2: a = 4. We have {θ > 0 : exp(4θ) - 1 ≤ 4θ exp(2θ)} = ∅, since
E[A] = 2 = C, and thus θ* = 0: the random walk Σ_{k≤n} (A_k - C) has zero drift, and there is no exponential decay of the overflow probability in Theorem 1.
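The characteristic equation M(θ) = e^{θC} can be solved numerically. The Python sketch below finds θ* by bisection for the Uniform[0, a] example above; the bracketing interval is an illustrative assumption.

```python
import math

def mgf_uniform(theta, a):
    # MGF of Uniform[0, a]: (e^{theta a} - 1) / (theta a), for theta != 0
    return (math.exp(theta * a) - 1) / (theta * a)

def theta_star(a, C, lo=1e-6, hi=40.0, iters=100):
    # bisection on f(theta) = M(theta) - e^{theta C}; f < 0 on (0, theta*)
    f = lambda t: mgf_uniform(t, a) - math.exp(C * t)
    assert f(lo) < 0 < f(hi), "bracket must straddle the root"
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2
```

For a = 4 (Case 2) the bracketing assertion fails for every positive bracket, consistent with θ* = 0 there.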
Proof of Theorem 1. We will first prove an upper bound and then a lower bound;
combining them yields the result. For the upper bound, we have, by the union bound,
P(max_n Σ_{k=1}^n (A_k - C) ≥ B) ≤ Σ_{n≥1} P(Σ_{k=1}^n (A_k - C) ≥ B)
= Σ_{n≥1} P((1/n) Σ_{k=1}^n A_k ≥ C + B/n)
≤ Σ_{n≥1} exp(-n((C + B/n)θ - log M(θ)))
= exp(-θB) Σ_{n≥1} exp(-n(θC - log M(θ))),
where the second inequality is the Chernoff bound, valid for every θ > 0. Fix any
such θ with θC > log M(θ); the geometric series then converges, giving
(1/B) log P(max_n Σ_{k=1}^n (A_k - C) ≥ B) ≤ -θ + (1/B) log([1 - exp(-(θC - log M(θ)))]^{-1}).
Letting B → ∞ and then optimizing over θ < θ*, we obtain the upper bound -θ*.
For the lower bound, note that for every n,
P(max_n Σ_{k=1}^n (A_k - C) ≥ B) ≥ P(Σ_{k=1}^n (A_k - C) ≥ B).
Fix t > 0 and take n = ⌈Bt⌉. Since ⌈Bt⌉/t ≥ B, the event Σ_{k=1}^{⌈Bt⌉} (A_k - C) ≥ ⌈Bt⌉/t implies the event of interest. Then
lim inf_B (1/B) log P(max_n Σ_{k=1}^n (A_k - C) ≥ B)
≥ lim inf_B (1/B) log P(Σ_{k=1}^{⌈Bt⌉} (A_k - C) ≥ ⌈Bt⌉/t)
= t lim inf_n (1/n) log P(Σ_{k=1}^n (A_k - C) ≥ n/t)
= t lim inf_n (1/n) log P((1/n) Σ_{k=1}^n A_k ≥ C + 1/t)
≥ -t inf_{x>C+1/t} I(x),
where the last step is the lower bound of Cramér's Theorem. Taking the supremum over t > 0,
lim inf_B (1/B) log P(max_n Σ_{k=1}^n (A_k - C) ≥ B) ≥ -inf_{t>0} t inf_{x>C+1/t} I(x).  (1)
We claim that
inf_{t>0} t inf_{x>C+1/t} I(x) = inf_{t>0} t I(C + 1/t),
since I is non-decreasing and continuous on the relevant interval, so that inf_{x>C+1/t} I(x) = I(C + 1/t). Therefore
lim inf_B (1/B) log P(max_n Σ_{k=1}^n (A_k - C) ≥ B) ≥ -inf_{t>0} t I(C + 1/t).
An exercise in HW 2 shows that sup{θ > 0 : M(θ) < exp(θC)} = inf_{t>0} t I(C + 1/t), which completes the proof.
Fall 2013
9/16/2013
Content.
1. Large Deviations in many dimensions
2. Gärtner–Ellis Theorem
3. Large Deviations for Markov chains
1 Large Deviations in R^d
Most of the developments in this lecture follow the Dembo and Zeitouni book [1].
Let X_n ∈ R^d be i.i.d. random variables and A ⊂ R^d. Let S_n = Σ_{1≤i≤n} X_i.
The large deviations question is now regarding the existence of the limit
lim_n (1/n) log P(S_n/n ∈ A).
As before, the bounds take the form: for closed F ⊂ R^d,
lim sup_n (1/n) log P(S_n/n ∈ F) ≤ -inf_{x∈F} I(x),
and for open U ⊂ R^d,
lim inf_n (1/n) log P(S_n/n ∈ U) ≥ -inf_{x∈U} I(x).
Unfortunately, the theorem does not hold in full generality, and an additional condition, such as M(θ) < ∞ for all θ, is needed. Known counterexamples are somewhat involved and can be found in a paper by Dinwoodie [2],
which builds on an earlier work of Slaby [5]. The difficulty arises because there is
no longer a notion of monotonicity of I(x) as a function of the vector x. This
is not the tightest condition and more general conditions are possible, see [1].
The proof of the theorem is skipped and can be found in [1].
Let us consider an example of an application of Theorem 2. Let X_n =_d N(0, Σ),
where d = 2 and
Σ = ( 1  1/2 ; 1/2  1 ),  F = {(x_1, x_2) : 2x_1 + x_2 ≥ 5}.
Goal: prove that the limit lim_n (1/n) log P(S_n/n ∈ F) exists and compute it.
By the upper bound part,
lim sup_n (1/n) log P(S_n/n ∈ F) ≤ -inf_{x∈F} I(x).
We have
M(θ) = E[exp((θ, X))], where (θ, X) =_d N(0, θ^T Σ θ) = N(0, θ_1² + θ_1θ_2 + θ_2²),
with θ = (θ_1, θ_2). Thus
M(θ) = exp((1/2)(θ_1² + θ_1θ_2 + θ_2²)),
and
I(x) = sup_{θ_1,θ_2} (θ_1 x_1 + θ_2 x_2 - (1/2)(θ_1² + θ_1θ_2 + θ_2²)).
Let
g(θ_1, θ_2) = θ_1 x_1 + θ_2 x_2 - (1/2)(θ_1² + θ_1θ_2 + θ_2²).
From (d/dθ_j) g(θ_1, θ_2) = 0, we have that
x_1 - θ_1 - (1/2)θ_2 = 0,
x_2 - θ_2 - (1/2)θ_1 = 0,
which gives
θ_1 = (4/3)x_1 - (2/3)x_2,  θ_2 = (4/3)x_2 - (2/3)x_1.
Then
I(x_1, x_2) = (2/3)(x_1² + x_2² - x_1x_2).
So we need to find
inf_{x_1,x_2} (2/3)(x_1² + x_2² - x_1x_2)  s.t. 2x_1 + x_2 ≥ 5  (x ∈ F).
This becomes a non-linear optimization problem. Applying the Karush–Kuhn–Tucker conditions for
min f  s.t. g ≤ 0:
∇f + μ∇g = 0,  μg = 0,  μ ≥ 0,
with f = (2/3)(x_1² + x_2² - x_1x_2) and g = 5 - 2x_1 - x_2, we obtain
((4/3)x_1 - (2/3)x_2, (4/3)x_2 - (2/3)x_1) - μ(2, 1) = 0,  μ(2x_1 + x_2 - 5) = 0.
If 2x_1 + x_2 - 5 > 0, then μ = 0 and further x_1 = x_2 = 0. But this violates
2x_1 + x_2 ≥ 5. So we have 2x_1 + x_2 - 5 = 0, which implies x_2 = 5 - 2x_1.
Thus, we have a one-dimensional unconstrained minimization problem:
min_{x_1} (2/3)(x_1² + (5 - 2x_1)² - x_1(5 - 2x_1)),
which gives x_1 = 25/14, x_2 = 5 - 2x_1 = 10/7.
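A brute-force scan (Python) confirms the constrained minimizer; the grid range and resolution are arbitrary illustrative choices.

```python
def I(x1, x2):
    # the rate function computed above
    return (2 / 3) * (x1 * x1 + x2 * x2 - x1 * x2)

# on the active constraint 2*x1 + x2 = 5, substitute x2 = 5 - 2*x1 and scan x1
best_val, best_x1 = min((I(x1, 5 - 2 * x1), x1)
                        for x1 in (i / 10000 for i in range(0, 50001)))
```

Here both the minimizer x_1 and the minimum value happen to equal 25/14.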
Therefore
lim sup_n (1/n) log P(S_n/n ∈ F) ≤ -inf_{x∈F} I(x_1, x_2) = -I(25/14, 10/7).
The matching lower bound is obtained by applying part (b) of the theorem to the
open set {(x_1, x_2) : 2x_1 + x_2 > 5}, over which the infimum of I is the same.
Combining, we obtain
lim_n (1/n) log P(S_n/n ∈ F) = lim inf_n (1/n) log P(S_n/n ∈ F)
= lim sup_n (1/n) log P(S_n/n ∈ F)
= -I(25/14, 10/7) = -25/14 ≈ -1.79.
2 Gärtner–Ellis Theorem
The Gärtner–Ellis Theorem deals with large deviations events when the sequence
X_n is not necessarily independent. One immediate application of this theorem
is large deviations for Markov chains, which we will discuss in the following
section.
Let X_n be a sequence of not necessarily independent random variables in
R^d. Then, in general, for S_n = Σ_{1≤k≤n} X_k the identity E[exp((θ, S_n))] =
(E[exp((θ, X_1))])^n does not hold. Nevertheless there exists a broad set of conditions under which the large deviations bounds hold. Thus consider a general
sequence of random variables Y_n ∈ R^d, which stands for (1/n)S_n in the i.i.d.
case. Let Λ_n(θ) = n^{-1} log E[exp(n(θ, Y_n))]. Note that in the i.i.d. case
Λ_n(θ) = (1/n) log E[exp(n(θ, n^{-1} S_n))] = (1/n) log M^n(θ) = log M(θ).
If the limit
Λ(θ) = lim_n Λ_n(θ)  (5)
takes place for some limiting function Λ, then under certain additional technical
assumptions, the large deviations principle holds for the rate function
I(x) ≜ sup_θ ((θ, x) - Λ(θ)).  (6)
Formally:
Theorem 2. Given a sequence of random variables Y_n, suppose the limit Λ(θ) in
(5) exists for all θ ∈ R^d. Furthermore, suppose Λ(θ) is finite and differentiable
everywhere on R^d. Then the following large deviations bounds hold for I defined
by (6):
lim sup_n (1/n) log P(Y_n ∈ F) ≤ -inf_{x∈F} I(x) for every closed F ⊂ R^d,
lim inf_n (1/n) log P(Y_n ∈ U) ≥ -inf_{x∈U} I(x) for every open U ⊂ R^d.
As for Theorem 1, this is not the most general version of the theorem. The
version above is established as exercise 2.3.20 in [1]. More general versions can
be found there as well.
Can we use a Chernoff type argument to get an upper bound? For θ ≥ 0 (in one dimension), we
have
P(Y_n ≥ a) = P(exp(nθY_n) ≥ exp(nθa)) ≤ exp(-n(θa - Λ_n(θ))).
So we can get an upper bound with exponent
sup_{θ≥0} (θa - Λ_n(θ)).
In the i.i.d. case we used the fact that sup_{θ≥0}(θa - log M(θ)) = sup_θ (θa - log M(θ))
when a > μ = E[X]. But now we are dealing with the multidimensional case,
where such an identity does not make sense.
3 Large Deviations for Markov chains
Theorem 3 (Perron–Frobenius). Suppose B is an irreducible non-negative N×N matrix. Then B has an eigenvalue λ such that:
1. λ > 0 is real.
2. For every eigenvalue γ of B, |γ| ≤ λ, where |γ| is the norm of the (possibly complex) γ.
3. The left and right eigenvectors of B, denoted by φ and ψ, corresponding to λ, are unique up to a constant multiple and have strictly positive components.
This theorem can be found in many books on linear algebra, for example [4].
The following corollary of the Perron–Frobenius Theorem shows that the rate of growth of the sequence of matrices B^n is essentially λ^n. Specifically:
Corollary 1. For every vector α = (α_j, 1 ≤ j ≤ N) with strictly positive
elements, the following holds:
lim_n (1/n) log Σ_{1≤j≤N} B^n_{i,j} α_j = lim_n (1/n) log Σ_{1≤j≤N} B^n_{j,i} α_j = log λ.
Proof sketch. Since ψ has strictly positive components, there exist constants 0 < c_1 ≤ c_2 such that c_1 ψ_j ≤ α_j ≤ c_2 ψ_j for all j. Therefore
lim_n (1/n) log Σ_{1≤j≤N} B^n_{i,j} α_j = lim_n (1/n) log Σ_{1≤j≤N} B^n_{i,j} ψ_j
= lim_n (1/n) log (λ^n ψ_i)
= log λ.
The second identity is established similarly.
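The corollary is easy to check numerically for a small positive matrix; for B = ((2,1),(1,3)) the Perron–Frobenius eigenvalue is (5+√5)/2. The Python sketch below is illustrative: the matrix, the vector α and the power n = 40 are assumptions, not from the notes.

```python
import math

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

B = [[2.0, 1.0], [1.0, 3.0]]     # irreducible non-negative matrix
alpha = [1.0, 1.0]               # strictly positive vector
lam = (5 + math.sqrt(5)) / 2     # its Perron-Frobenius eigenvalue

Bn = [[1.0, 0.0], [0.0, 1.0]]    # identity; will accumulate B^n
n = 40
for _ in range(n):
    Bn = mat_mult(Bn, B)
growth = math.log(sum(Bn[0][j] * alpha[j] for j in range(2))) / n
```

The gap between `growth` and log λ decays like (1/n) log c for a constant c depending on α, as in the sandwich argument above.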
Now, given a finite-state Markov chain X_n with transition matrix P = (P_{i,j}, 1 ≤ i, j ≤ N), a function f : {1, …, N} → R^d and a vector θ ∈ R^d,
consider the modified matrix P_θ = (e^{(θ, f(j))} P_{i,j}, 1 ≤ i, j ≤ N). Then P_θ is an
irreducible non-negative matrix, since P is such a matrix. Let λ(P_θ) denote its
Perron–Frobenius eigenvalue.
Theorem 4. The sequence n^{-1} S_n = n^{-1} Σ_{1≤k≤n} f(X_k) satisfies the large deviations bounds with rate function I(x) = sup_{θ∈R^d} ((θ, x) - log λ(P_θ)). Specifically,
for every state i_0, closed set F ⊂ R^d and every open set U ⊂ R^d, the following holds:
lim sup_n (1/n) log P(n^{-1} S_n ∈ F | X_0 = i_0) ≤ -inf_{x∈F} I(x),
lim inf_n (1/n) log P(n^{-1} S_n ∈ U | X_0 = i_0) ≥ -inf_{x∈U} I(x).
The key step is the computation of Λ(θ):
n Λ_n(θ) = log E[exp((θ, Σ_{1≤k≤n} f(X_k))) | X_0 = i_0] = log ( Σ_{1≤j≤N} P_θ^n(i_0, j) ),
where P_θ^n(i, j) denotes the (i, j)-th entry of the matrix P_θ^n. Letting α_j = 1 and
applying Corollary 1, we obtain
lim_n Λ_n(θ) = log λ(P_θ).
Fall 2013
9/23/2013
Content.
1. A heuristic construction of a Brownian motion from a random walk.
2. Definition and basic properties of a Brownian motion.
Historical notes
1765 Jan Ingenhousz: observations of carbon dust in alcohol.
1828 Robert Brown observed that pollen grains suspended in water perform a continual swarming motion.
The developments in this lecture follow closely the book by Resnick [3].
In this section we provide a heuristic construction of a Brownian motion
from a random walk. The derivation below is not a proof. We will provide a rigorous construction of a Brownian motion when we study the weak convergence
theory.
The large deviations theory predicts exponential decay of probabilities P(Σ_{1≤i≤n} X_i >
an) when a > μ = E[X_1] and E[e^{θX_1}] is finite. Naturally, the decay is
slower the closer a is to μ. We considered only the case when a was a constant.
But what if a is a function of n: a = a_n? The Central Limit Theorem tells us
that the decay disappears when a_n - μ is of order 1/√n. Recall:
Theorem 1 (CLT). Given an i.i.d. sequence (X_n, n ≥ 1) with E[X_1] =
μ, var[X_1] = σ², for every constant a,
lim_n P((Σ_{1≤i≤n} X_i - nμ)/(σ√n) ≤ a) = ∫_{-∞}^a (1/√(2π)) e^{-t²/2} dt.
Now let us look at a sequence of partial sums S_n = Σ_{1≤i≤n} (X_i - μ). For
simplicity assume μ = 0, so that we look at S_n = Σ_{1≤i≤n} X_i. Can we say
anything about S_n as a function of n? In fact, let us make it a function of a real
variable t ∈ R_+ and rescale it by √n as follows. Define B_n(t) = Σ_{1≤i≤⌊nt⌋} X_i/√n
for every t ≥ 0.
Denote by N(μ, σ²) the distribution function of a normal r.v. with mean μ
and variance σ².
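The rescaled walk B_n(t) is easy to simulate. The sketch below (Python, with ±1 steps so that σ = 1) checks that B_n(1) has approximately zero mean and unit variance, as the CLT predicts. The value of n, the trial count and the seed are arbitrary illustrative choices.

```python
import math, random

def B_n_endpoint(n, rng):
    # B_n(1) = sum_{i <= n} X_i / sqrt(n) for X_i = +-1 with equal probability
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(n)) / math.sqrt(n)

rng = random.Random(42)
samples = [B_n_endpoint(400, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
```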
1. For every fixed 0 ≤ s < t, by the CLT the distribution of
Σ_{⌊ns⌋<i≤⌊nt⌋} X_i / √(⌊nt⌋ - ⌊ns⌋)
converges to N(0, σ²). Writing
B_n(t) - B_n(s) = ( Σ_{ns<i≤nt} X_i / √(nt - ns) ) √((nt - ns)/n) ≈ ( Σ_{ns<i≤nt} X_i / √(nt - ns) ) √(t - s),
we obtain that B_n(t) - B_n(s) converges in distribution to N(0, σ²(t - s)).
2. For every t_1 < t_2, the sums Σ_{1≤i≤nt_1} X_i and Σ_{nt_1<i≤nt_2} X_i are independent, so the increments B_n(t_1) and B_n(t_2) - B_n(t_1) are independent; the same applies to any finite collection of increments over non-overlapping intervals.
3. B_n(0) = 0 by definition.
Definition
One can also consider a Brownian motion which does not start at zero: B̃(0) = x for some value x ≠ 0. We may define this process as
x + B(t), where B is a Brownian motion.
Problem 1.
1. Let Ω be the space of all (not necessarily continuous) functions ω : R_+ → R.
(i) Construct an example of a stochastic process in Ω which satisfies conditions (a)-(c) of the Brownian motion, but such that every path is almost surely discontinuous.
(ii) Construct an example of a stochastic process in Ω which satisfies conditions (a)-(c) of the Brownian motion, but such that almost surely the path is discontinuous at every point t ∈ [0, 1].
HINT: work with the Brownian motion.
2. Suppose B(t) is a stochastic process defined on the set of all (not necessarily continuous) functions x : R_+ → R satisfying properties (a)-(c) of
Definition 1. Prove that for every t ≥ 0, lim_n B(t + 1/n) = B(t) almost
surely.
Properties
The joint density of increments over a partition t_1 < ⋯ < t_n is a product of Gaussian densities, with the factor for the increment from t_i to t_{i+1} proportional to exp(-(x_{i+1} - x_i)²/(2(t_{i+1} - t_i))).
The covariance function: for s < t,
E[B(t)B(s)] = E[(B(s) + B(t) - B(s))B(s)] = E[B(s)²] = s,
using the independence of B(t) - B(s) and B(s).
Consider the time-inverted process B^(1)(t) = tB(1/t). For s < t, the increment tB(1/t) - sB(1/s) is zero mean Gaussian with
variance
s²(1/s - 1/t) + (t - s)²(1/t) = t - s.
This proves (c).
We now return to (b). Take any t_1 < t_2 < t_3. We established in (c) that all
the differences B^(1)(t_2) - B^(1)(t_1), B^(1)(t_3) - B^(1)(t_2), B^(1)(t_3) - B^(1)(t_1) =
(B^(1)(t_3) - B^(1)(t_2)) + (B^(1)(t_2) - B^(1)(t_1)) are zero mean Gaussian with variances t_2 - t_1, t_3 - t_2 and t_3 - t_1 respectively. In particular, the variance of
B^(1)(t_3) - B^(1)(t_1) is the sum of the variances of B^(1)(t_3) - B^(1)(t_2) and
B^(1)(t_2) - B^(1)(t_1). This implies that the covariance of the summands is zero.
Moreover, from part (b) it is not difficult to establish that B^(1)(t_3) - B^(1)(t_2)
and B^(1)(t_2) - B^(1)(t_1) are jointly Gaussian. Recall that two jointly Gaussian
random variables are independent if and only if their covariance is zero.
It remains to prove the continuity at zero of B^(1)(t). We need to show the
continuity almost surely, so that the zero measure set corresponding to the samples ω ∈ C[0, ∞) where the continuity does not hold can be thrown away.
Thus, we need to show that the probability measure of the set
A = {ω ∈ C[0, ∞) : lim_{t→0} tB(1/t, ω) = 0}
is equal to unity.
We will use the Strong Law of Large Numbers (SLLN). First set t = 1/n
and consider tB(1/t) = B(n)/n. Because of the independent Gaussian increments property, B(n) = Σ_{1≤i≤n} (B(i) - B(i - 1)) is the sum of n
i.i.d. standard normal random variables. By the SLLN we then have B(n)/n →
E[B(1) - B(0)] = 0 a.s. We showed convergence to zero along the sequence
t = 1/n almost surely. Now we need to take care of the other values of t or,
equivalently, values s ∈ [n, n + 1). For any such s we have
|B(s)/s - B(n)/n| ≤ |B(s)/s - B(n)/s| + |B(n)/s - B(n)/n|
≤ |B(n)| |1/s - 1/n| + (1/n) sup_{n≤s≤n+1} |B(s) - B(n)|
≤ |B(n)|/n² + (1/n) sup_{n≤s≤n+1} |B(s) - B(n)|.  (1)
The first term converges to zero a.s., since B(n)/n → 0 a.s. implies B(n)/n² → 0
a.s. Now consider the second term and set Z_n = sup_{n≤s≤n+1} |B(s) - B(n)|.
We claim that for every ε > 0,
P(Z_n/n > ε i.o.) = P(ω ∈ C[0, ∞) : Z_n(ω)/n > ε i.o.) = 0,  (2)
where i.o. stands for infinitely often. Suppose (2) was indeed the case. The
equality means that for almost all samples ω the inequality Z_n(ω)/n > ε happens for at most finitely many n. This means exactly that for almost all ω (that
is, a.s.) Z_n(ω)/n → 0 as n → ∞. Combining with (1) we would conclude that
a.s.
sup_{n≤s≤n+1} |B(s)/s - B(n)/n| → 0,
as n → ∞. Since we already know that B(n)/n → 0, we would conclude that
a.s. lim_{s→∞} B(s)/s = 0, and this means almost sure continuity of B^(1)(t) at
zero.
It remains to show (2). We observe that due to the independent stationary
increments property, the distribution of Z_n is the same as that of Z_1. This is
the distribution of the maximum of the absolute value of a standard Brownian
motion during the interval [0, 1]. In the following lecture we will show that this
maximum has finite expectation: E[|Z_1|] < ∞. On the other hand,
∞ > E[|Z_1|] = ∫_0^∞ P(|Z_1| > x) dx = Σ_{n≥0} ∫_{εn}^{ε(n+1)} P(|Z_1| > x) dx
≥ ε Σ_{n≥0} P(|Z_1| > ε(n + 1)) = ε Σ_{n≥1} P(|Z_1| > εn) = ε Σ_{n≥1} P(Z_n/n > ε),
since Z_n is distributed as Z_1. Thus the sum Σ_{n≥1} P(Z_n/n > ε) is finite. Now we use the Borel–Cantelli
Lemma to conclude that (2) indeed holds.
References
[1] P. Billingsley, Convergence of probability measures, Wiley-Interscience
publication, 1999.
[2] R. Durrett, Probability: theory and examples, Duxbury Press, second edi
tion, 1996.
[3] S. Resnick, Adventures in stochastic processes, Birkhäuser Boston, Inc., 1992.
Fall 2013
9/25/2013
Content.
1. Quick intro to stopping times
2. Reflection principle
3. Brownian motion with drift
Stopping times are, loosely speaking, rules by which we interrupt the process
without looking at the process after it was interrupted. For example, "sell your
stock the first time it hits $20 per share" is a stopping rule. Whereas "sell your
stock one day before it hits $20 per share" is not a stopping rule, since we do
not know the day (if any) when it hits this price.
Given a stochastic process {X_t}_{t≥0} with t ∈ Z_+ or t ∈ R_+, a random variable
T is called a stopping time if for every time t the event {T ≤ t} is completely
determined by the history {X_s}_{0≤s≤t}.
This is not a formal definition. The formal definition will be given later when
we study filtration. Then we will give the definition in terms of the underlying
(Ω, F, P). For now, though, let us just adopt this loose definition.
for any given t. Surprisingly, the resulting expression is very simple and follows
from one of the key properties of the Brownian motion: the reflection principle.
Given a > 0, define
T_a = inf{t : B(t) = a},
the first time when the Brownian motion hits level a. When no such time exists
we define T_a = ∞, although we now show that it is finite almost surely.
Proposition 1. T_a < ∞ almost surely.
Proof. Note that if B hits some level b ≥ a almost surely, then by continuity
and since B(0) = 0, it hits level a almost surely. Therefore, it suffices to prove
that lim sup_t B(t) = ∞ almost surely. This in turn will follow from
lim sup_n B(n) = ∞ almost surely.
Problem 1. Prove that lim sup_n |B(n)| = ∞ almost surely.
B̂(t) = B(T_a + t) - a, t ≥ 0,  (1)
is also a Brownian motion, independent from B(t), t ≤ T_a. The only issue here
is that T_a is a random instance, and this property was established for
fixed times t. It turns out (we do not prove this) that the property also holds
for a random time T_a, since it is a stopping time and is finite almost surely. The
first claim is an immediate consequence of the definition: we can determine whether
T_a ≤ t by looking at the path B(u), 0 ≤ u ≤ t. The almost sure
finiteness was established in Proposition 1. The property (1) is called the strong
independent increments property of the Brownian motion.
Theorem 1 (The reflection principle). Given a standard Brownian motion
B(t), for every a ≥ 0,
P(M(t) ≥ a) = 2P(B(t) ≥ a) = 2 ∫_a^∞ (1/√(2πt)) e^{-x²/(2t)} dx.
Proof. We have
P(B(t) ≥ a) = P(B(t) ≥ a, M(t) ≥ a) + P(B(t) ≥ a, M(t) < a).  (2)
Note, however, that P(B(t) ≥ a, M(t) < a) = 0 since M(t) ≥ B(t). Now
P(B(t) ≥ a, M(t) ≥ a) = P(B(t) ≥ a | M(t) ≥ a) P(M(t) ≥ a) = (1/2) P(M(t) ≥ a),
since, conditioned on M(t) ≥ a (that is, T_a ≤ t), the process B(T_a + s) - a is again a
Brownian motion, which satisfies P(B(t) - a ≥ 0) = 1/2 for every t. Applying
this identity, we obtain
P(B(t) ≥ a) = (1/2) P(M(t) ≥ a).
This establishes the required identity (2).
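A Monte Carlo check of the reflection principle at t = 1, a = 1 (Python): simulate discretized Brownian paths and compare the frequency of {M(1) ≥ 1} with twice the frequency of {B(1) ≥ 1}. The step count, trial count and seed are illustrative choices; note that discretization slightly undercounts the running maximum.

```python
import math, random

rng = random.Random(7)
steps, trials, a = 1000, 2000, 1.0
dt = 1.0 / steps

count_max = count_end = 0
for _ in range(trials):
    b = m = 0.0
    for _ in range(steps):
        b += rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        m = max(m, b)
    count_max += m >= a
    count_end += b >= a

p_max = count_max / trials             # estimates P(M(1) >= 1), about 0.317
p_twice_end = 2 * count_end / trials   # estimates 2 P(B(1) >= 1)
```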
We now establish the joint probability distribution of M(t) and B(t).
Proposition 2. For every a > 0, y ≥ 0,
P(M(t) ≥ a, B(t) ≤ a - y) = P(B(t) > a + y).  (3)
Proof. We have
P(B(t) > a + y) = P(B(t) > a + y, M(t) ≥ a) + P(B(t) > a + y, M(t) < a)
= P(B(t) > a + y, M(t) ≥ a)
= P(B(T_a + (t - T_a)) - a > y | M(t) ≥ a) P(M(t) ≥ a).
But since B(T_a + (t - T_a)) - a is, by the strong independent increments property, also a Brownian
motion, then, by symmetry,
P(B(T_a + (t - T_a)) - a > y | M(t) ≥ a)
= P(B(T_a + (t - T_a)) - a < -y | M(t) ≥ a)
= P(B(t) < a - y | M(t) ≥ a).
We conclude
P(B(t) > a + y) = P(B(t) < a - y | M(t) ≥ a) P(M(t) ≥ a)
= P(B(t) < a - y, M(t) ≥ a).
We now compute the Laplace transform E[e^{-λT_a}], λ > 0, and show that E[e^{-λT_a}] = e^{-√(2λ) a}.
By the reflection principle,
P(T_a ≤ t) = P(M(t) ≥ a) = 2 ∫_{a/√t}^∞ (1/√(2π)) e^{-x²/2} dx = 2(1 - N(a/√t)).
Differentiating in t, the density of T_a is
(a/√(2π)) t^{-3/2} e^{-a²/(2t)}.
Therefore
E[e^{-λT_a}] = ∫_0^∞ e^{-λt} (a/√(2π)) t^{-3/2} e^{-a²/(2t)} dt.
Computing this integral is a boring exercise in calculus. We just state the result,
which is e^{-√(2λ) a}.
When μ < 0, this means that M(∞) ≜ sup_{t≥0} B_μ(t) < ∞ almost surely. On
the other hand, M(∞) ≥ 0 (why?).
Our goal now is to compute the probability distribution of M(∞). It turns out to be exponential: P(M(∞) > x) = e^{-2|μ|x} for x ≥ 0.
The direct proof of this result can be found in Section 6.8 of Resnick's
book [3]. The proof consists of two parts. We first show that the distribution
of M(∞) is exponential. Then we compute its parameter.
Later on we will study an alternative proof based on the optional stopping
theory for martingale processes.
4 Additional reading materials
Sections 6.5 and 6.8 from Chapter 6 of Resnick's book Adventures in Stochastic Processes.
Sections 7.3 and 7.4 in Durrett [2].
Billingsley [1], Section 9.
References
[1] P. Billingsley, Convergence of probability measures, Wiley-Interscience
publication, 1999.
[2] R. Durrett, Probability: theory and examples, Duxbury Press, second edi
tion, 1996.
[3] S. Resnick, Adventures in stochastic processes, Birkhäuser Boston, Inc., 1992.
Fall 2013
9/30/2013
Content.
1. Unbounded variation of a Brownian motion.
2. Bounded quadratic variation of a Brownian motion.
Any sequence of values 0 = t0 < t1 < ⋯ < tn = T is called a partition Π = (t0, …, tn) of the interval [0, T]. Given a continuous function f : [0, T] → R, its total variation is defined to be

LV(f) ≜ sup_Π Σ_{1≤k≤n} |f(tk) − f(tk−1)|,

where the supremum is taken over all possible partitions Π of the interval [0, T], for all n. A function f is said to have bounded variation if its total variation is finite.

Theorem 1. Almost surely, no path of a Brownian motion has bounded variation on [0, T], for every T ≥ 0. Namely, for every T,

P(ω : LV(B(ω)) < ∞) = 0.
The main tool is to use the following result from real analysis, which we do
not prove: if a function f has bounded variation on [0, T ] then it is differentiable
almost everywhere on [0, T ]. We will now show that quite the opposite is true.
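Theorem 1 can be illustrated numerically (a sketch, not part of the original notes, assuming NumPy): over a partition with k equal intervals, the sum of absolute Brownian increments grows like √(2k/π), so the total variation diverges as the partition is refined:

```python
import numpy as np

# Total variation of one discretized Brownian path on [0, 1], evaluated over
# coarser and coarser dyadic partitions: it keeps growing as the mesh shrinks.
rng = np.random.default_rng(1)
n = 2 ** 16
incr = rng.normal(0.0, np.sqrt(1.0 / n), size=n)  # finest-grid increments

tv = {}
for k in (2 ** 8, 2 ** 12, 2 ** 16):
    step = n // k
    coarse = incr.reshape(k, step).sum(axis=1)  # increments over k intervals
    tv[k] = float(np.abs(coarse).sum())

print(tv)  # roughly sqrt(2k/pi): grows without bound as k increases
```

The expected value of each sum is √(2k/π), consistent with LV(B) = ∞ almost surely.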
Proposition 1. Brownian motion is almost surely nowhere differentiable. Specifically,

P(∀ t ≥ 0 : lim sup_{h→0} |B(t + h) − B(t)| / |h| = ∞) = 1.

Proof. Fix T > 0, M > 0 and consider A(M, T) ⊂ C[0, ∞), the set of all paths ω ∈ C[0, ∞) such that there exists at least one point t ∈ [0, T] with

lim sup_{h→0} |B(t + h) − B(t)| / |h| ≤ M.  (1)
A(M, T) ⊂ ∪_n An.  (2)

Fix ω satisfying (1) with the corresponding t, and fix n. Find k = max{j : j/n ≤ t}. Define

Yk = max{ |B((k+2)/n) − B((k+1)/n)|, |B((k+1)/n) − B(k/n)|, |B(k/n) − B((k−1)/n)| }.
In other words, consider the maximum increment of the Brownian motion over these three short intervals. We claim that Yk ≤ 6M/n for every path ω ∈ An. To prove the required bound on Yk, we first consider

|B((k+2)/n) − B((k+1)/n)| ≤ |B((k+2)/n) − B(t)| + |B(t) − B((k+1)/n)| ≤ 2M·(2/n) + 2M·(1/n) = 6M/n,

since both (k+1)/n and (k+2)/n are within distance 2/n of t, and near t the difference quotients are bounded by 2M. The other two increments in the definition of Yk are bounded in the same way.

Let Bn be the event that Yk ≤ 6M/n for the relevant k; thus An ⊂ Bn. Below we show P(Bn) → 0. Combining this with An ⊂ Bn, we conclude P(An) = 0. Combining with (2), this will imply that P(A(M, T)) = 0, and we will be done.
Now, to obtain the required bound on P(Bn), we note that, since the increments of a Brownian motion are independent and identically distributed,

P(Bn) ≤ Σ_{0≤k≤Tn} P(Yk ≤ 6M/n)
≤ Tn · P( max{ |B(3/n) − B(2/n)|, |B(2/n) − B(1/n)|, |B(1/n) − B(0)| } ≤ 6M/n )
= Tn · [P(|B(1/n)| ≤ 6M/n)]³.  (3)

Finally, we just analyze this probability. We have

P(|B(1/n)| ≤ 6M/n) = P(|B(1)| ≤ 6M/√n).

Since B(1) has the standard normal distribution, its density at any point is at most 1/√(2π), so this probability is at most 2·(6M)/√(2πn).
We conclude that the expression in (3) is, ignoring constants, O(n·(1/√n)³) = O(n^{−1/2}) → 0, as required.

2  Bounded quadratic variation of a Brownian motion

Theorem 2. Fix T > 0 and a sequence of partitions Πi = (t0, …, tn) of [0, T] with Δ(Πi) → 0, and let Q(Πi) ≜ Σ_{1≤k≤n} (B(tk) − B(tk−1))². Then

Q(Πi) → T in probability.  (4)

If, in addition, Δ(Πi) = Ei/i² for some sequence Ei → 0, then

Q(Πi) → T almost surely.  (5)

In words, the standard Brownian motion has almost surely finite quadratic variation, which is equal to T.
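The statement is easy to probe numerically (a sketch assuming NumPy; not part of the original notes): for an equal-spacing partition with n intervals, the quadratic variation has mean T and variance 2T²/n:

```python
import numpy as np

# Quadratic variation of a discretized standard Brownian motion on [0, T]:
# the sum of squared increments approaches T as the partition is refined.
rng = np.random.default_rng(2)
T = 2.0
qs = {}
for n in (10 ** 3, 10 ** 5):
    incr = rng.normal(0.0, np.sqrt(T / n), size=n)  # increments over n intervals
    qs[n] = float(np.sum(incr ** 2))
print(qs)  # both values near T = 2.0, tighter for larger n
```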
Proof. We will use the following fact. Let Z be a standard Normal random
variable. Then E[Z⁴] = 3 (cute, isn't it?). The proof can be obtained using
Laplace transforms of Normal random variables or integration by parts, and we
skip the details.
Let Δk = (B(tk) − B(tk−1))² − (tk − tk−1). Then, using the independent Gaussian increments property of the Brownian motion, Δk is a sequence of independent zero-mean random variables. We have

Q(Πi) − T = Σ_{1≤k≤n} Δk.

Since E[Z⁴] = 3 for a standard normal Z, E[Δk²] = 2(tk − tk−1)². Therefore

E[(Q(Πi) − T)²] = Σ_{1≤k≤n} E[Δk²] = 2 Σ_{1≤k≤n} (tk − tk−1)² ≤ 2 Δ(Πi) Σ_{1≤k≤n} (tk − tk−1) = 2 Δ(Πi) T.
Now, if lim_i Δ(Πi) = 0, then the bound converges to zero as well. This establishes the first part of the theorem.

To prove the second part, identify a sequence Ei → 0 such that Δ(Πi) = Ei/i². By assumption, such a sequence exists. By Markov's inequality,

P((Q(Πi) − T)² > 2Ei) ≤ E[(Q(Πi) − T)²]/(2Ei) ≤ 2Δ(Πi)T/(2Ei) = T/i².  (6)

Since Σ_i T/i² < ∞, the sum of probabilities in (6) is finite. Then, applying the Borel-Cantelli Lemma, the probability that (Q(Πi) − T)² > 2Ei for infinitely many i is zero. Since Ei → 0, this exactly means that, almost surely, lim_i Q(Πi) = T.
3 Additional reading materials
Sections 6.11 and 6.12 of Chapter 6 in Resnick's book [1].
References
[1] S. Resnick, Adventures in stochastic processes, Birkhäuser Boston, Inc., 1992.
Fall 2013
10/2/2013
Content.
1. Conditional expectations
2. Martingales, sub-martingales and super-martingales
1  Conditional Expectations

1.1  Definition

1.2  Simple properties

Consider the trivial case G = {∅, Ω}. We claim that the constant value c = E[X] is E[X|G]. Indeed, any constant function is measurable with respect to any σ-field, so (a) holds. For (b), we have E[X 1{Ω}] = E[X] = c and E[c 1{Ω}] = E[c] = c; and E[X 1{∅}] = 0 and E[c 1{∅}] = 0.

At the other extreme, suppose G = F. Then we claim that X = E[X|G]. The condition (b) trivially holds. The condition (a) also holds because of the equality between the two σ-fields.

Let us go back to our example of conditional expectation with respect to an event A ⊂ Ω. Consider the associated σ-field G = {∅, A, Aᶜ, Ω} (we established in the first lecture that this is indeed a σ-field). Consider a random variable Y : Ω → R defined as

Y(ω) = E[X|A] = E[X 1{A}]/P(A) ≜ c1 for ω ∈ A, and
Y(ω) = E[X|Aᶜ] = E[X 1{Aᶜ}]/P(Aᶜ) ≜ c2 for ω ∈ Aᶜ.
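The defining property (b), E[X 1{G}] = E[Y 1{G}] for G ∈ G, can be seen empirically for this four-event σ-field. The following sketch (not part of the original notes, assuming NumPy) builds Y from sample averages, so the identity holds exactly by construction:

```python
import numpy as np

# Empirical version of Y = E[X | G] for G = {empty, A, A^c, Omega}:
# Y equals c1 = E[X|A] on A and c2 = E[X|A^c] on A^c.
rng = np.random.default_rng(3)
X = rng.exponential(1.0, size=200000)
A = X > 1.0                                  # the conditioning event
Y = np.where(A, X[A].mean(), X[~A].mean())   # c1 on A, c2 on A^c
# property (b) on the generating set: E[X 1{A}] = E[Y 1{A}], E[X] = E[Y]
print(np.mean(X * A), np.mean(Y * A), X.mean(), Y.mean())
```

For Exp(1) and A = {X > 1}, c1 = E[X | X > 1] = 2 by memorylessness, which the sample average reproduces.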
Proof of existence
Problem 1. Prove that Y is unique up to measure zero. That is, if Y′ is also a RN derivative, then Y = Y′ a.s. w.r.t. P1 and hence P2.

We now use this theorem to establish the existence of conditional expectations. Thus we have G ⊂ F, P is a probability measure on F, and X is measurable with respect to F. We will only consider the case X ≥ 0 such that E[X] < ∞. We also assume that X is not constant, so that E[X] > 0. Consider a new probability measure P2 on G defined as follows:

P2(A) = E_P[X 1{A}] / E_P[X],  A ∈ G.
Properties
Conditional Jensen's inequality. Let φ be a convex function with E[|X|], E[|φ(X)|] < ∞. Then φ(E[X|G]) ≤ E[φ(X)|G].

Proof. We use the following representation of a convex function, which we do not prove (see Durrett [1]). Let

A = {(a, b) ∈ Q² : ax + b ≤ φ(x) for all x}.

Then φ(x) = sup{ax + b : (a, b) ∈ A}.

Now we prove Jensen's inequality. For any pair of rationals a, b ∈ Q satisfying the bound above, we have, by monotonicity, E[φ(X)|G] ≥ aE[X|G] + b a.s., implying E[φ(X)|G] ≥ sup{aE[X|G] + b : (a, b) ∈ A} = φ(E[X|G]) a.s.
Tower property. Suppose G1 ⊂ G2 ⊂ F. Then E[E[X|G1]|G2] = E[X|G1] and E[E[X|G2]|G1] = E[X|G1]. That is, the smaller σ-field wins.

Proof. By definition, E[X|G1] is G1-measurable. Therefore it is G2-measurable. Then the first equality follows from the fact that E[X|G] = X when X is G-measurable, which we established earlier. Now fix any A ∈ G1. Denote E[X|G1] by Y1 and E[X|G2] by Y2; then Y1 is G1-measurable and Y2 is G2-measurable. Then
E[Y1 1{A}] = E[X1{A}],
simply by the denition of Y1 = E[X|G1 ]. On the other hand, we also have
A G2 . Therefore
E[X1{A}] = E[Y2 1{A}].
Combining the two equalities we see that E[Y2 1{A}] = E[Y1 1{A}] for every
A G1 . Therefore, E[Y2 |G1 ] = Y1 , which is the desired result.
An important special case is when G1 is the trivial σ-field {∅, Ω}. We obtain that for every σ-field G,
E[E[X|G]] = E[X].
3  Martingales

3.1  Definition

3.2  Simple examples

1. Random walk. Let Xn, n = 1, 2, … be an i.i.d. sequence with mean μ and variance σ² < ∞. Let Fn be the Borel σ-algebra on Rⁿ. Then Sn − nμ, where Sn = Σ_{1≤k≤n} Xk, is a martingale. Indeed, Sn is adapted to Fn, and

E[S_{n+1} − (n+1)μ | Fn] = E[X_{n+1} − μ + Sn − nμ | Fn]
= E[X_{n+1} − μ | Fn] + E[Sn − nμ | Fn]
=(a) E[X_{n+1} − μ] + Sn − nμ
= Sn − nμ.

Here, in (a), we used the fact that X_{n+1} is independent from Fn and Sn is Fn-measurable.

2. Random walk squared. Under the same setting, suppose in addition μ = 0. Then Sn² − nσ² is a martingale. The proof of this fact is very similar.
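Both examples are easy to check in simulation (a sketch, not part of the notes, assuming NumPy): for μ = 0 the processes Sn and Sn² − nσ² should each have mean zero at every fixed time:

```python
import numpy as np

# Check E[S_n] = 0 and E[S_n^2 - n*sigma^2] = 0 for an i.i.d. walk with mu = 0.
rng = np.random.default_rng(4)
n, paths = 50, 100000
steps = rng.uniform(-1.0, 1.0, size=(paths, n))  # mean 0, variance 1/3
S = steps.sum(axis=1)
sigma2 = 1.0 / 3.0
m1 = float(S.mean())                       # estimate of E[S_n]
m2 = float((S ** 2 - n * sigma2).mean())   # estimate of E[S_n^2 - n*sigma^2]
print(m1, m2)  # both near 0
```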
References
[1] R. Durrett, Probability: theory and examples, Duxbury Press, second edition, 1996.
Fall 2013
10/4/2013
Content.
1. Martingales and properties.
2. Stopping times and Optional Stopping Theorem.
Martingales
Brownian motion with drift. Now consider a Brownian motion with drift μ and standard deviation σ. That is, consider B_μ(t) = μt + σB(t), where B is the standard Brownian motion. It is straightforward to show that B_μ(t) − μt is a martingale. Also it is simple to see that (B_μ(t) − μt)² − σ²t is also a martingale.

Wald's martingale. Suppose Xn, n ∈ N is an i.i.d. sequence with E[X1] = μ, such that the moment generating function M(θ) ≜ E[e^{θX1}] < ∞ for some θ > 0. Let Sn = Σ_{1≤k≤n} Xk. Then

Zn = exp(θSn) / Mⁿ(θ)

is a martingale.
Properties of martingales

for every 1 ≤ m ≤ n, implying that E[Zn | Fm] = Zm. This completes the proof.
3  Stopping times

In the example above, showing how to create a winning gambling system, notice that part of the strategy was to stop the first time we win. Thus the game is interrupted at a random time, which depends on the observed conditions. We now formalize this using the concept of stopping times. The end goal of this section is to establish the following result: if the gambling strategy involves a stopping time which is bounded, then our expected gain is non-positive.

Definition 2. Given a filtration {Ft}_{t∈T} on a sample space Ω, a random variable τ : Ω → T is called a stopping time if the event {τ ≤ t} ≜ {ω ∈ Ω : τ(ω) ≤ t} ∈ Ft for every t.

Here we consider again the two cases T = N or T = R⁺. Note that we do not need to specify a probability measure here. The concept of stopping times is framed only in terms of measurability with respect to σ-fields.
Each of the events in the union on the right-hand side is measurable with respect to Fn. This example is the familiar Wald stopping time: τ is the smallest n for which a random walk X1 + ⋯ + Xn ≥ x, except that we do not say that the random sequence should be i.i.d. and, in fact, we do not say anything about the probability law of the sequence at all.

Consider the standard Wiener measure on C([0, ∞)); that is, consider the standard Brownian motion. Then, given a > 0, Ta = inf{t : B(t) = a} is a stopping time with respect to the filtration Bt, t ∈ R⁺, where Bt is the filtration described in the beginning of the lecture. This is the familiar hitting time of the Brownian motion.
The main result in terms of stopping times that we wish to establish is as
follows.
Theorem 3. Suppose Xn is a supermartingale and τ is a stopping time which is a.s. bounded: τ ≤ M a.s. for some M. Then E[X_τ] ≤ E[X0]. In other words, if there exists a bound on the number of rounds for betting, then the expected net gain is non-positive, provided that in each round the expected gain is non-positive.

This theorem will be established as a result of several short lemmas.

Lemma 1. Suppose τ is a stopping time corresponding to the filtration Fn. Then the sequence of random variables Hn = 1{τ ≥ n} is predictable.

Proof. Hn is a random variable which takes values 0 and 1. Note that the event {Hn = 0} = {τ < n} = {τ ≤ n − 1}. Since τ is a stopping time, the event {τ ≤ n − 1} ∈ F_{n−1}. Thus Hn is predictable.
Σ_{1≤m≤n} Hm(Xm − X_{m−1}) = Σ_{0≤m≤n−1} Xm(Hm − H_{m+1}) + Hn Xn − X0.  (1)

Note H0 = 1{τ ≥ 0} = 1 and Hm − H_{m+1} = 1{τ ≥ m} − 1{τ ≥ m+1} = 1{τ = m}. Therefore the expression on the right-hand side of (1) is equal to X_{τ∧n} − X0. By Theorem 2, the left-hand side of (1) is a supermartingale. We conclude that Yn = X_{τ∧n} is a supermartingale.

Now we are ready to obtain our end result.

Proof of Theorem 3. The process Yn = X_{τ∧n} is a supermartingale by Corollary 1. Therefore

E[Y_M] ≤ E[Y_0].

But Y_M = X_{τ∧M} = X_τ and Y_0 = X_{τ∧0} = X_0. We conclude E[X_τ] ≤ E[X_0]. This concludes the proof of Theorem 3.
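Theorem 3 can be seen in a small simulation (a sketch, not part of the notes, assuming NumPy): a random walk with negative-mean steps is a supermartingale, and for the bounded stopping time "first time the walk reaches 1, capped at M rounds" the expected stopped value stays below E[X0] = 0:

```python
import numpy as np

# Optional stopping for a supermartingale: mean step -0.1, stop when the
# walk first reaches 1, but never later than M steps.
rng = np.random.default_rng(5)
paths, M = 100000, 50
steps = rng.choice([-1.0, 1.0], size=(paths, M), p=[0.55, 0.45])
S = np.cumsum(steps, axis=1)
hit = S >= 1.0
tau = np.where(hit.any(axis=1), hit.argmax(axis=1), M - 1)  # capped stopping index
S_tau = S[np.arange(paths), tau]
print(S_tau.mean())  # should be negative: no winning system with bounded stopping
```

Note that even though every stopped-and-successful path ends at +1, the capped losers drag the average below zero, exactly as the theorem predicts.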
References
[1] R. Durrett, Probability: theory and examples, Duxbury Press, second edition, 1996.
Fall 2013
10/9/2013
Content.
1. Second stopping theorem.
2. Doob-Kolmogorov inequality.
3. Applications of stopping theorems to hitting times of a Brownian motion.
Doob-Kolmogorov inequality

Theorem 2 (Doob-Kolmogorov inequality). Suppose Xn is a martingale. Then for every ε > 0,

P(max_{1≤m≤n} |Xm| ≥ ε) ≤ E[Xn²]/ε².

Proof sketch. Let Bm, 1 ≤ m ≤ n, be the event that |Xi| < ε for i < m and |Xm| ≥ ε (the first time the martingale exceeds ε in absolute value is m). The events Bm are disjoint, so

E[Xn²] ≥ Σ_{1≤m≤n} E[Xn² 1{Bm}].

Note

E[Xn² 1{Bm}] = E[(Xn − Xm + Xm)² 1{Bm}]
= E[(Xn − Xm)² 1{Bm}] + 2E[(Xn − Xm)Xm 1{Bm}] + E[Xm² 1{Bm}]
≥ E[Xm² 1{Bm}] ≥ ε² P(Bm),

where the cross term vanishes by the martingale property (condition on Fm). Summing over 1 ≤ m ≤ n gives the result.

More generally, for every p ≥ 1,

P(max_{1≤m≤n} |Xm| ≥ ε) ≤ E[|Xn|^p]/ε^p.

Proof. The proof of the general case is more complicated, but when p ≥ 2 we almost immediately obtain the result. Using conditional Jensen's inequality we know that |Xn| is a submartingale, as |·| is a convex function. It is also non-negative. The function x^{p/2} is convex and increasing when p ≥ 2 and x ≥ 0. Recall from the previous lecture that this implies |Xn|^{p/2} is also a submartingale. Applying Theorem 2 we obtain

P(max_{1≤m≤n} |Xm| ≥ ε) = P(max_{1≤m≤n} |Xm|^{p/2} ≥ ε^{p/2}) ≤ E[|Xn|^p]/ε^p.

The same bound holds for a continuous martingale Xt on [0, T]:

P(sup_{t≤T} |Xt| ≥ ε) ≤ E[|X_T|^p]/ε^p.
We now use the martingale theory and optional stopping theorems to derive some properties of hitting times of a Brownian motion. Our setting is either a standard Brownian motion B(t) or a Brownian motion with drift B_μ(t) = μt + σB(t). In both cases the starting value is assumed to be 0. We fix a < 0 < b and ask the question: what is the probability that B_μ(t) hits a before b? For simplicity we write B instead of B_μ, but keep in mind that we are talking about Brownian motion with drift.

We define

Ta = inf{t : B_μ(t) = a},  Tb = inf{t : B_μ(t) = b},  Tab = min(Ta, Tb).

In Lecture 6, Problem 1 we established that when B is standard, lim sup_{t→∞} B(t) = ∞ a.s. Thus Tb < ∞ a.s. By symmetry, Ta < ∞ a.s. Now we ask the question: what is the probability P(Tab = Ta)? We will use the optional stopping theorems established before. The only issue is that we are now dealing with continuous-time processes. The derivations of stopping theorems require more details (for example, defining predictable sequences is trickier). We skip the details and just assume that the optional stopping theorems apply in our case as well.
The case of the standard Brownian motion is the simplest.
Theorem 4. Let Ta, Tb, Tab be defined with respect to the standard Brownian motion B(t). Then

P(Tab = Ta) = |b| / (|a| + |b|).

Proof. Recall that B is a martingale. Observe that Tab defines a stopping time: the event {Tab ≤ t} ∈ Bt (whether Tab ≤ t is determined completely by the path of the Brownian motion up to time t). Therefore, by Corollary 1 in the previous lecture, Yt ≜ B(t ∧ Tab) is also a martingale. Note that it is a bounded martingale, since its absolute value is at most max(|a|, |b|). Theorem 1 applied to Yt then implies that E[Y_{Tab}] = E[B(Tab)] = E[Y0] = E[B(0)] = 0. On the other hand, when Tab = Ta we have B(Tab) = B(Ta) = a and, conversely, when Tab = Tb we have B(Tab) = B(Tb) = b. Therefore

E[B(Tab)] = aP(Tab = Ta) + bP(Tab = Tb) = −|a|P(Tab = Ta) + |b|P(Tab = Tb).

Since P(Tab = Ta) + P(Tab = Tb) = 1, then, combining with the fact E[B(Tab)] = 0, we obtain

P(Tab = Ta) = |b| / (|a| + |b|),  P(Tab = Tb) = |a| / (|a| + |b|).
We now consider the more difficult case, when the drift of the Brownian motion μ ≠ 0. Specifically, assume μ < 0. Recall that in this case lim_{t→∞} B_μ(t) = −∞ a.s., so Tab ≤ Ta < ∞ a.s. Again we want to compute P(Tab = Ta).

We fix drift μ < 0, variance σ² > 0 and consider q(θ) = μθ + (1/2)σ²θ².

Proposition 1. For every θ, the process V(t) = e^{θB_μ(t) − q(θ)t} is a martingale.

Proof. We first need to check that E[|V(t)|] < ∞. We leave it as an exercise. We have, for every 0 ≤ s < t,

E[V(t)|Bs] = E[e^{θ(B_μ(t) − B_μ(s))} e^{−q(θ)(t−s)} e^{θB_μ(s) − q(θ)s} | Bs]
= E[e^{θ(B_μ(t) − B_μ(s))}] e^{−q(θ)(t−s)} e^{θB_μ(s) − q(θ)s},

where the second equality follows from the independent increments property of the Brownian motion and from the fact E[e^{θB_μ(s) − q(θ)s}|Bs] = e^{θB_μ(s) − q(θ)s}. Since B_μ(t) − B_μ(s) is normal with mean μ(t−s) and variance σ²(t−s),

E[e^{θ(B_μ(t) − B_μ(s))}] = e^{μθ(t−s) + (1/2)σ²θ²(t−s)} = e^{q(θ)(t−s)}.  (1)

Thus E[V(t)|Bs] = V(s), and V is a martingale.
Applying the optional stopping theorem to the bounded martingale V(t ∧ Tab) with θ* = 2|μ|/σ², the non-zero root of q(θ) = 0, we get E[e^{θ*B_μ(Tab)}] = 1, i.e. e^{θ*a}P(Tab = Ta) + e^{θ*b}P(Tab = Tb) = 1, and therefore

P(Tab = Tb) = (1 − e^{θ*a}) / (e^{θ*b} − e^{θ*a}) = (1 − e^{−2|μ||a|/σ²}) / (e^{2|μ||b|/σ²} − e^{−2|μ||a|/σ²}).

Compared with the driftless case, the probability of hitting b first is exponentially tilted. Now let us take a → −∞. The events Aa = {Tab = Ta} are monotone: Aa′ ⊂ Aa for a′ < a < 0. Therefore

P(∩_{a<0} Aa) = lim_{a→−∞} P(Aa) = lim_{a→−∞} [ 1 − (1 − e^{−2|μ||a|/σ²}) / (e^{2|μ||b|/σ²} − e^{−2|μ||a|/σ²}) ] = 1 − e^{−2|μ||b|/σ²}.

But what is the event ∩_{a<0} Aa? Since Brownian motion has continuous paths, this event is exactly the event that the Brownian motion never hits the positive level b. That is, the event sup_{t≥0} B_μ(t) < b. We conclude that when the drift of the Brownian motion is negative,

P(sup_{t≥0} B_μ(t) ≥ b) = e^{−2|μ||b|/σ²}.

Recall, from Lecture 6, that we already established this fact directly from the properties of the Brownian motion: the supremum of a Brownian motion with a negative drift has an exponential distribution with parameter 2|μ|/σ².
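The exponential law of the supremum is easy to probe by simulation (a sketch, not part of the notes, assuming NumPy). Discretization slightly undershoots the supremum and the horizon must be long enough that the drifted path has effectively stopped climbing, so only moderate accuracy is expected:

```python
import numpy as np

# sup_t B_mu(t) for mu = -0.5, sigma = 1 should be ~ Exponential(2|mu|) = Exp(1):
# compare P(sup >= b) with exp(2*mu*b) = exp(-b).
rng = np.random.default_rng(8)
mu, T, n = -0.5, 40.0, 4000
dt = T / n
sups = []
for _ in range(10):                       # 10 chunks of 2000 paths each
    incr = mu * dt + rng.normal(0.0, np.sqrt(dt), size=(2000, n))
    path_max = np.cumsum(incr, axis=1).max(axis=1)
    sups.append(np.maximum(path_max, 0.0))
sup = np.concatenate(sups)
est = {b: float(np.mean(sup >= b)) for b in (0.5, 1.0)}
print(est, {b: float(np.exp(2 * mu * b)) for b in (0.5, 1.0)})
```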
4 Additional reading materials
Durrett [1] Chapter 4.
Grimmett and Stirzaker [2] Section 7.8.
References
[1] R. Durrett, Probability: theory and examples, Duxbury Press, second edition, 1996.
[2] G. R. Grimmett and D. R. Stirzaker, Probability and random processes,
Oxford University Press, 2005.
Fall 2013
10/9/2013
Content.
1. Martingale Convergence Theorem
2. Doobs Inequality Revisited
3. Martingale Convergence in Lp
4. Backward Martingales. SLLN Using Backward Martingale
5. Hewitt-Savage 0 1 Law
6. De-Finettis Theorem
Then, almost surely, X∞ = lim_n Xn exists and is finite in expectation. That is, define X∞ = lim sup_n Xn; then Xn → X∞ a.s. and E[|X∞|] < ∞.

Proof. The proof relies on Doob's Upcrossing Lemma. For that, consider

Λ ≜ {ω : Xn(ω) does not converge to a limit in [−∞, ∞]}
= {ω : lim inf_n Xn(ω) < lim sup_n Xn(ω)}
= ∪_{a<b, a,b∈Q} {ω : lim inf_n Xn(ω) < a < b < lim sup_n Xn(ω)} ≜ ∪_{a<b} Λ_{a,b},  (1)

where Q is the set of rational values. Let UN[a, b](ω) be the largest k satisfying the following: there exist

0 ≤ s1 < t1 < ⋯ < sk < tk ≤ N

such that X_{si}(ω) < a < b < X_{ti}(ω), 1 ≤ i ≤ k.  (2)

Doob's upcrossing lemma proves that P(Λ_{a,b}) = 0 for every a < b. Then we have from (1) that P(Λ) = 0. Thus Xn(ω) converges in [−∞, ∞] a.s. That is, X∞ = lim_n Xn exists a.s.
n
Now,

E[|X∞|] = E[lim inf_n |Xn|] ≤ lim inf_n E[|Xn|] < ∞,

by Fatou's lemma and the assumption of the theorem.

It remains to prove the upcrossing lemma. Define the predictable betting sequence Cn by

C1 = 1 if X0(ω) < a, and 0 otherwise;

and, inductively, Cn = 1 if (C_{n−1} = 1 and X_{n−1}(ω) ≤ b) or (C_{n−1} = 0 and X_{n−1}(ω) < a), and 0 otherwise. Let

Yn = Σ_{1≤k≤n} Ck (Xk − X_{k−1}).

We claim that

YN(ω) ≥ (b − a) UN[a, b](ω) − (XN(ω) − a)⁻,

where (XN − a)⁻ = a − XN if XN ≤ a, and 0 otherwise.

Indeed, let UN[a, b] = k. Then there are 0 ≤ s1 < t1 < ⋯ < sk < tk ≤ N such that X_{si}(ω) < a < b < X_{ti}(ω), i = 1, …, k. By definition, C_{si+1} = 1 for all i ≥ 1. Further, Ct(ω) = 1 for si + 1 ≤ t ≤ li, where li ≤ ti is the smallest time t ≥ si such that Xt(ω) > b. Without loss of generality, assume that s1 = min{n : Xn < a}, and let s_{k+1} = min{n > tk : Xn(ω) < a}. Then

YN(ω) = Σ_{1≤i≤k} (X_{li} − X_{si}) + (XN − X_{s_{k+1}}) 1{s_{k+1} ≤ N} ≥ (b − a)k − (XN(ω) − a)⁻,

since each completed upcrossing contributes at least b − a and the possible final incomplete one contributes at least −(XN − a)⁻.

Since Cn is predictable, bounded and non-negative, Yn inherits the supermartingale (resp. submartingale) property of Xn; in the supermartingale case E[YN] ≤ E[Y0] = 0, so

(b − a) E[UN[a, b]] ≤ E[(XN − a)⁻] ≤ E[|XN|] + |a| ≤ sup_n E[|Xn|] + |a| < ∞.

Now, UN[a, b] ↗ U∞[a, b]. Hence, by the Monotone Convergence Theorem, E[UN[a, b]] ↗ E[U∞[a, b]]. That is, E[U∞[a, b]] < ∞. Hence P(U∞[a, b] = ∞) = 0.
2  Doob's Inequality Revisited

Theorem 2. Let Xn be a sub-MG and let X*n = max_{0≤m≤n} X⁺m. Given ε > 0, let A = {X*n ≥ ε}. Then

ε P(A) ≤ E[Xn⁺ 1{A}] ≤ E[Xn⁺].

Proof sketch. Let

N = min{m : Xm ≥ ε or m = n},  (3)

a bounded stopping time. On the event A we have X_N ≥ ε, so ε P(A) ≤ E[X_N 1{A}]. By optional stopping for the sub-MG, E[X_N 1{A}] ≤ E[Xn 1{A}] ≤ E[Xn⁺ 1{A}] ≤ E[Xn⁺]. In particular, for every ε > 0,

P(max_{0≤k≤n} Xk ≥ ε) ≤ E[Xn⁺]/ε.
3  Martingale Convergence in Lp

Let p > 1 and let q be the conjugate exponent, 1/q = 1 − 1/p. For M > 0, set X*_{n,M} = min(X*n, M); the truncation keeps all moments finite. Using the tail formula for moments and Doob's inequality,

E[(X*_{n,M})^p] = ∫_0^∞ p λ^{p−1} P(X*_{n,M} ≥ λ) dλ
≤ ∫_0^∞ p λ^{p−1} λ^{−1} E[Xn⁺ 1{X*_{n,M} ≥ λ}] dλ
= p E[ Xn⁺ ∫_0^{X*_{n,M}} λ^{p−2} dλ ]
= (p/(p−1)) E[Xn⁺ (X*_{n,M})^{p−1}]
≤ q E[(Xn⁺)^p]^{1/p} E[(X*_{n,M})^p]^{1/q},

where we used

P(X*_{n,M} ≥ λ) = 0 if M < λ, and P(X*_{n,M} ≥ λ) = P(X*n ≥ λ) if M ≥ λ,

together with Hölder's inequality (with 1/q = 1 − 1/p) in the last step. Dividing both sides by E[(X*_{n,M})^p]^{1/q}, which is finite thanks to the truncation,

‖X*_{n,M}‖_p ≤ q ‖Xn⁺‖_p,  (9)

and letting M → ∞, ‖X*n‖_p ≤ q ‖Xn⁺‖_p.
4  Backward Martingales

Let Xn = E[X0|Fn], n ≤ 0, be a backward martingale with respect to the decreasing filtration Fn, and let Λ = ∪_{a<b, a,b∈Q} Λ_{a,b} as before. Now, recall that Un[a, b] is the number of upcrossings of [a, b] by X_{−n}, X_{−n+1}, …, X0 as n → ∞. By the upcrossing inequality, it follows that

(b − a) E[Un[a, b]] ≤ E[|X0|] + |a|.

Since Un[a, b] ↗ U∞[a, b], by the Monotone Convergence Theorem we have

E[U∞[a, b]] < ∞ ⟹ P(Λ_{a,b}) = 0.

This implies Xn converges a.s. Now, Xn = E[X0|Fn]; therefore {Xn} is UI. This implies Xn → X_{−∞} in L¹.

Theorem 6. Let X_{−∞} = lim_{n→−∞} Xn and F_{−∞} = ∩_n Fn. Then X_{−∞} = E[X0|F_{−∞}].

Proof. Let Xn = E[X0|Fn]. If A ∈ F_{−∞} ⊂ Fn, then E[Xn; A] = E[X0; A]. Now,

|E[Xn; A] − E[X_{−∞}; A]| = |E[Xn − X_{−∞}; A]| ≤ E[|Xn − X_{−∞}|; A] ≤ E[|Xn − X_{−∞}|] → 0 as n → −∞ (by Theorem 5).

Hence E[X_{−∞}; A] = E[X0; A]. Thus X_{−∞} = E[X0|F_{−∞}].

Theorem 7. Let Fn ↓ F_{−∞} and Y ∈ L¹. Then E[Y|Fn] → E[Y|F_{−∞}] a.s. and in L¹.

Proof. Xn = E[Y|Fn] is a backward MG by definition. Therefore Xn → X_{−∞} a.s. and in L¹. By Theorem 6, X_{−∞} = E[X0|F_{−∞}] = E[Y|F_{−∞}]. Thus E[Y|Fn] → E[Y|F_{−∞}].
Let ξ1, ξ2, … be i.i.d. with E[|ξ1|] < ∞, Sn = Σ_{1≤i≤n} ξi, and F_{−n} = σ(Sn, S_{n+1}, …). Define X_{−n} = Sn/n = E[ξ1|F_{−n}] (by symmetry, E[ξi|Sn, S_{n+1}, …] = Sn/n for every i ≤ n). Then

E[X_{−n}|F_{−(n+1)}] = E[(1/n) Σ_{1≤i≤n} E[ξi|S_{n+1}] | F_{−(n+1)}]
= E[ξ1|S_{n+1}]
= S_{n+1}/(n + 1)
= X_{−(n+1)},

so X_{−n} is a backward MG.

By Theorems 5 and 7, we have X_{−n} → X_{−∞} a.s. and in L¹, with X_{−∞} = E[ξ1|F_{−∞}]. Now F_{−∞} is in E (the exchangeable σ-algebra). By the Hewitt-Savage 0-1 law (proved next), E is trivial. That is, E[ξ1|F_{−∞}] is a constant. Therefore X_{−∞} = E[ξ1] is also a constant. Thus,

lim_n Sn/n = X_{−∞} = E[ξ1]  a.s.
If φ is bounded, then

An(φ) → E[φ(X1, …, Xk)]  a.s.

Proof. An(φ) ∈ En by definition. So, writing (n)_k = n(n−1)⋯(n−k+1) for the number of ordered k-tuples of distinct indices ≤ n,

An(φ) = E[An(φ)|En]
= (1/(n)_k) Σ_{i1,…,ik} E[φ(X_{i1}, …, X_{ik})|En]
= (1/(n)_k) Σ_{i1,…,ik} E[φ(X1, …, Xk)|En]  (10)
= E[φ(X1, …, Xk)|En] → E[φ(X1, …, Xk)|E].  (11)

We want to show that E[φ(X1, …, Xk)|E] is indeed E[φ(X1, …, Xk)]. First, we show that E[φ(X1, …, Xk)|E] ∈ σ(X_{k+1}, …), since φ is bounded. Then we use the fact that if E[X|G] ∈ F, where X is independent of F, then E[X|G] is constant, equal to E[X]. This will complete the proof of the Lemma.
First step: consider An(φ). It has (n)_k terms, of which k(n−1)_{k−1} contain X1. Therefore, the effect of the terms containing X1 is bounded by

|Tn(1)| ≤ (1/(n)_k) · k(n−1)_{k−1} · ‖φ‖_∞ = k · ((n−k)!/n!) · ((n−1)!/(n−k)!) · ‖φ‖_∞ = (k/n) ‖φ‖_∞ → 0 as n → ∞.  (12)

Let A¹n(φ) = An(φ) − Tn(1). Then we have A¹n(φ) → E[φ(X1, …, Xk)|E] from (11) and (12). Thus E[φ(X1, …, Xk)|E] does not depend on X1. Similarly, repeating the argument for X2, …, Xk, we obtain that

E[φ(X1, …, Xk)|E] ∈ σ(X_{k+1}, X_{k+2}, …).  (13)
De Finetti's Theorem

Theorem 10. Let X1, X2, … be exchangeable; that is, for any n and π ∈ Sn, (X1, …, Xn) =d (X_{π(1)}, …, X_{π(n)}). Then, conditional on E, X1, X2, … are i.i.d.

Proof. As in the Hewitt-Savage proof and the Lemma, define An(φ) = (1/(n)_k) Σ_{i∈I_{n,k}} φ(X_{i1}, …, X_{ik}), where I_{n,k} denotes the set of ordered k-tuples of distinct indices ≤ n. Then, due to exchangeability, for bounded f of k−1 arguments and bounded g of one argument,

Σ_{i∈I_{n,k−1}} f(X_{i1}, …, X_{i_{k−1}}) Σ_{1≤m≤n} g(Xm) = Σ_{i∈I_{n,k}} f(X_{i1}, …, X_{i_{k−1}}) g(X_{ik}) + Σ_{j=1}^{k−1} Σ_{i∈I_{n,k−1}} f(X_{i1}, …, X_{i_{k−1}}) g(X_{ij}).  (15)

Let φj(X1, …, X_{k−1}) = f(X1, …, X_{k−1}) g(Xj), 1 ≤ j ≤ k−1, and φ(X1, …, Xk) = f(X1, …, X_{k−1}) g(Xk). Then (15) reads

n (n)_{k−1} An(f) An(g) = (n)_k An(φ) + (n)_{k−1} Σ_{j=1}^{k−1} An(φj),

that is,

An(f) An(g) = ((n − k + 1)/n) An(φ) + (1/n) Σ_{j=1}^{k−1} An(φj).  (16)

Since the An(φj) are bounded, the last term vanishes as n → ∞. Thus, we have, using (16) and induction on k, that for any collection of bounded functions f1, …, fk,

E[Π_{1≤i≤k} fi(Xi) | E] = Π_{1≤i≤k} E[fi(Xi) | E].
Fall 2013
10/21/2013
Content.
1. Exponential concentration for martingales with bounded increments
2. Concentration for Lipschitz continuous functions
3. Examples in statistics and random graph theory
1  Azuma-Hoeffding inequality

Suppose Xn is a martingale wrt filtration Fn such that X0 = 0. The goal of this lecture is to obtain bounds of the form P(|Xn| ≥ δn) ≤ exp(−Θ(n)) under some conditions on Xn. Note that since E[Xn] = 0, deviation from zero is the right regime in which to look for rare events. It turns out the exponential bound of the form above holds under the very simple assumption that the increments of Xn are bounded. The theorem below is known as the Azuma-Hoeffding Inequality.

Theorem 1 (Azuma-Hoeffding Inequality). Suppose Xn, n ≥ 1 is a martingale such that X0 = 0 and |Xi − X_{i−1}| ≤ di, 1 ≤ i ≤ n almost surely, for some constants di, 1 ≤ i ≤ n. Then, for every t > 0,

P(|Xn| > t) ≤ 2 exp(−t² / (2 Σ_{i=1}^n di²)).
Proof. Fix θ > 0 and let f(x) = exp(θx). When |x/di| ≤ 1 we can write x as the convex combination

x = (1/2)(1 + x/di) di + (1/2)(1 − x/di)(−di),

so, by convexity of f,

exp(θx) = f(x) ≤ (1/2)(1 + x/di) f(di) + (1/2)(1 − x/di) f(−di)
= (f(di) + f(−di))/2 + ((f(di) − f(−di))/(2di)) x.  (1)

For the first term, using the Taylor expansion of the exponential,

(e^a + e^{−a})/2 = Σ_{k≥0} a^{2k}/(2k)! ≤ Σ_{k≥0} (a²/2)^k / k! = exp(a²/2)   (because 2^k k! ≤ (2k)!).  (2)

Applying (2) with a = θdi to (1),

exp(θx) ≤ exp(θ²di²/2) + ((exp(θdi) − exp(−θdi))/(2di)) x.  (3)
We now turn to our martingale sequence Xn. For every t > 0 and every θ > 0 we have

P(Xn ≥ t) = P(exp(θXn) ≥ exp(θt)) ≤ exp(−θt) E[exp(θXn)] = exp(−θt) E[exp(θ Σ_{1≤i≤n} (Xi − X_{i−1}))],

where X0 = 0 was used in the last equality. Applying the tower property of conditional expectation,

E[exp(θ Σ_{1≤i≤n} (Xi − X_{i−1}))] = E[ E[exp(θ(Xn − X_{n−1})) | F_{n−1}] exp(θ Σ_{1≤i≤n−1} (Xi − X_{i−1})) ]
≤ E[ exp(θ Σ_{1≤i≤n−1} (Xi − X_{i−1})) ] ( exp(θ²dn²/2) + ((exp(θdn) − exp(−θdn))/(2dn)) E[Xn − X_{n−1}|F_{n−1}] ),

where (3) was used in the last inequality. The martingale property implies E[Xn − X_{n−1}|F_{n−1}] = 0, and we have obtained the upper bound

E[exp(θ Σ_{1≤i≤n} (Xi − X_{i−1}))] ≤ E[exp(θ Σ_{1≤i≤n−1} (Xi − X_{i−1}))] exp(θ²dn²/2).

Iterating over n, and combining with the first display,

P(Xn ≥ t) ≤ exp(−θt) exp(θ² Σ_i di²/2).

Optimizing over the choice of θ, we see that the tightest bound is obtained by setting θ = t / Σ_i di² > 0, leading to the upper bound

P(Xn ≥ t) ≤ exp(−t² / (2 Σ_i di²)).

A similar approach using θ < 0 gives, for every t > 0,

P(Xn ≤ −t) ≤ exp(−t² / (2 Σ_i di²)).

Combining, we obtain the required result.
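A quick empirical comparison of the tail with the bound (a sketch, not part of the notes, assuming NumPy), using the ±1 random walk, for which di = 1:

```python
import numpy as np

# Azuma-Hoeffding for the +-1 walk: P(|X_n| > t) <= 2 exp(-t^2 / (2n)).
rng = np.random.default_rng(10)
paths, n = 100000, 100
X_n = rng.choice([-1.0, 1.0], size=(paths, n)).sum(axis=1)
emp = {}
for t in (10.0, 20.0, 30.0):
    emp[t] = float(np.mean(np.abs(X_n) > t))
    print(t, emp[t], 2 * np.exp(-t ** 2 / (2 * n)))  # empirical tail vs bound
```

The empirical tail sits below the bound at every t; the bound is loose by a constant factor, which is the price of its generality.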
2  Concentration for Lipschitz continuous functions

Suppose g satisfies, for every x1, …, xn and y1, …, yn,

|g(x1, …, xn) − g(y1, …, yn)| ≤ Σ_{i=1}^n di 1{xi ≠ yi}.  (4)

In particular, when a vector x changes value only in its i-th coordinate, the amount of change in g is at most di. As a special case, consider the subset of vectors x = (x1, …, xn) such that |xi| ≤ ci, and suppose g is Lipschitz continuous with constant K; namely, for every x, y, |g(x) − g(y)| ≤ K|x − y|, where |x − y| = Σ_i |xi − yi|. Then for any two such vectors

|g(x) − g(y)| ≤ K|x − y| ≤ K Σ_i 2ci 1{xi ≠ yi},
Introduce the Doob martingale Mi = E[g(X1, …, Xn) | X1, …, Xi]; explicitly,

Mi = ∫ g(x1, …, xn) dP(x_{i+1}) ⋯ dP(xn),
M_{i+1} = ∫ g(x1, …, xn) dP(x_{i+2}) ⋯ dP(xn).

Thus

|M_{i+1} − Mi| = | ∫ ( g(x1, …, xn) − ∫ g(x1, …, xn) dP(x_{i+1}) ) dP(x_{i+2}) ⋯ dP(xn) |
≤ ∫ d_{i+1} dP(x_{i+1}) ⋯ dP(xn) = d_{i+1}.

This derivation represents the simple idea that Mi and M_{i+1} differ only in averaging out X_{i+1} in Mi. Now, defining M̃i = Mi − M0 = Mi − E[g(X1, …, Xn)], we have that M̃i is also a martingale with differences bounded by di, but with the additional property M̃0 = 0. Applying Theorem 1, we obtain the required result.
3  Two examples

We now consider two applications of the concentration inequalities developed in the previous sections. Our first example concerns convergence of empirical distributions to the true distributions of random variables. Specifically, suppose we have a distribution function F and an i.i.d. sequence X1, …, Xn with distribution F. From the sample we construct the empirical distribution function.
2
deviations type bound 2 exp( r2dn ). Taking instead t = r n, we obtain Gaus
r2
sian type bound 2 exp( 2d
). Namely, M Cn = E[M Cn ] + ( n). This is a
meaningful concentration around the mean since, as we have discussed above
E[M Cn ] = (n).
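The empirical-distribution example above can be explored numerically. The Kolmogorov-Smirnov statistic Dn = sup_x |Fn(x) − F(x)| changes by at most di = 1/n when one sample point changes, so the bounded-differences inequality gives P(|Dn − E[Dn]| ≥ t) ≤ 2 exp(−nt²/2): fluctuations of order 1/√n. A sketch (not part of the notes, assuming NumPy):

```python
import numpy as np

# Spread of the KS statistic D_n for uniform samples: std is O(1/sqrt(n)).
rng = np.random.default_rng(11)
n, reps = 500, 2000
U = np.sort(rng.uniform(size=(reps, n)), axis=1)   # order statistics per replicate
grid = np.arange(1, n + 1) / n
# D_n = max over i of max(|i/n - U_(i)|, |U_(i) - (i-1)/n|)
D = np.maximum(np.abs(grid - U), np.abs(U - (grid - 1.0 / n))).max(axis=1)
print(D.mean(), D.std())  # mean ~ 0.87/sqrt(n), std considerably smaller
```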
Fall 2013
10/23/2013
Content.
1  Talagrand's inequality

Given x, y ∈ Ωⁿ, the Hamming distance is

d(x, y) = |{i : 1 ≤ i ≤ n, xi ≠ yi}| = Σ_{i=1}^n 1{xi ≠ yi},

and, for a = (a1, …, an) ∈ R⁺ⁿ, the weighted Hamming distance is

da(x, y) = Σ_{i=1}^n ai 1{xi ≠ yi}.

Also, |a| = √(Σ_i ai²). Given A ⊂ Ωⁿ, define the convex distance

D^c_A(x) = sup_{|a|=1} da(x, A),  where da(x, A) = inf{da(x, y) : y ∈ A}.

Theorem 1 (Talagrand's inequality). For every A and t > 0,

P(D^c_A ≥ t) ≤ (1/P(A)) exp(−t²/4).  (1)

2  Applications

2.1  Functions with bounded differences
Suppose f satisfies |f(x) − f(y)| ≤ Σ_i di 1{xi ≠ yi} for all x, y. Therefore, with ai = di/√(Σ_i di²),

f(x) − f(y) ≤ Σ_i di 1{xi ≠ yi} = ‖d‖₂ Σ_i ai 1{xi ≠ yi} = ‖d‖₂ da(x, y),

where ‖d‖₂ = √(Σ_i di²) and |a| = 1. Thus the normalized function F(x) = f(x)/‖d‖₂ satisfies F(x) ≤ F(y) + da(x, y).
Let A = {F ≤ m}. By definition of D^c_A(x),

D^c_A(x) = sup_{a:|a|=1} da(x, A) ≥ da(x, y)

for the given a above with |a| = 1 and any y ∈ A. Now, for any y ∈ A, by definition F(y) ≤ m. Then,

F(x) ≤ F(y) + da(x, y) ≤ m + D^c_A(x),

which implies {F ≥ m + r} ⊂ {D^c_A(x) ≥ r}. By Talagrand's inequality, for any r ≥ 0,

P({F ≥ m + r}) ≤ P({D^c_A ≥ r}) ≤ (1/P(A)) exp(−r²/4).

That is,

P({F ≤ m}) P({F ≥ m + r}) ≤ exp(−r²/4).  (2)

Taking m = mF, a median of F, so that P(F ≤ mF) ≥ 1/2 and P(F ≥ mF) ≥ 1/2, we obtain

P(F ≥ mF + r) ≤ 2 exp(−r²/4),  P(F ≤ mF − r) ≤ 2 exp(−r²/4).

Thus,

P(|F − mF| ≥ r) ≤ 4 exp(−r²/4).  (3)
Suprema of linear functions. Let F(x) = sup_{t∈T} Σ_{i=1}^n ti xi, where xi ∈ [ui, vi]. For any fixed t ∈ T,

Σ_{i=1}^n ti xi ≤ Σ_{i=1}^n ti yi + Σ_i |ti||yi − xi| ≤ Σ_i ti yi + Σ_i |ti|(vi − ui) 1{yi ≠ xi}.

Taking suprema over t ∈ T and normalizing the weights as before,

F(x) ≤ F(y) + K da(x, y),  (4)

with K = sup_{t∈T} √(Σ_i ti²(vi − ui)²). Hence the argument above applies to F/K and yields

P(|F − mF| ≥ r) ≤ 4 exp(−r²/(4K²)).
Now,

E[F] = ∫_0^{mF} 1 ds + ∫_0^∞ P(F ≥ mF + ε) dε
≤ mF + ∫_0^∞ 2 exp(−ε²/4) dε
= mF + 2√π.

Thus,

|E[F] − mF| ≤ 2√π,

so concentration around the median implies concentration around the mean, up to an additive constant.
2.2  Longest increasing subsequence

Let Ln(X1, …, Xn) be the length of the longest increasing subsequence, and let mn be a median of Ln. (Note that Ln changes by at most 1 when a single coordinate changes.) We will show

P(Ln ≥ mn + r) ≤ 2 exp(−r²/(4(mn + r))),  P(Ln ≤ mn − r) ≤ 2 exp(−r²/(4mn)).

Let I = I(x) be the index set of some longest increasing subsequence of x. For any y,

Ln(y) ≥ Ln(x) − Σ_{i=1}^n 1{i ∈ I} 1{xi ≠ yi} = Ln(x) − √(Ln(x)) Σ_{i=1}^n ai(x) 1{xi ≠ yi},

where we define

ai(x) = 1/√(Ln(x)) if i ∈ I, and 0 otherwise,

so that |a(x)| = 1. Hence, for A = {Ln ≤ mn} and any y ∈ A,

D^c_A(x) ≥ d_{a(x)}(x, y) ≥ (Ln(x) − mn)/√(Ln(x)).

For x such that Ln(x) ≥ mn + r, the right-hand side of the above is minimal when Ln(x) = mn + r. Therefore, we have

D^c_A(x) ≥ r/√(mn + r).

That is,

Ln(x) ≥ mn + r ⟹ D^c_A(x) ≥ r/√(mn + r),  for A = {Ln ≤ mn}.

By Talagrand's inequality and P(A) ≥ 1/2,

P(Ln ≥ mn + r) ≤ P(D^c_A ≥ r/√(mn + r)) ≤ (1/P(A)) exp(−r²/(4(mn + r))) ≤ 2 exp(−r²/(4(mn + r))).

To establish the lower bound, repeat the argument with x such that Ln(x) ≥ s + u and A = {Ln ≤ s}. Then we obtain

D^c_A(x) ≥ u/√(s + u).

Taking s = mn − r and u = r, so that s + u = mn, and using P(Ln ≥ mn) ≥ 1/2,

1/2 ≤ P(Ln ≥ mn) ≤ P(D^c_A ≥ r/√mn) ≤ (1/P(Ln ≤ mn − r)) exp(−r²/(4mn)),

which implies

P(Ln ≤ mn − r) ≤ 2 exp(−r²/(4mn)).
Preparation. Given a set A and x ∈ Ωⁿ, recall

D^c_A(x) = sup_{a∈R⁺ⁿ, |a|=1} da(x, A),  da(x, A) = inf_{y∈A} da(x, y).

Let

UA(x) = {s ∈ {0, 1}ⁿ : s = 1(x ≠ y) for some y ∈ A},

where 1(x ≠ y) denotes the vector (1{x1 ≠ y1}, …, 1{xn ≠ yn}), and let VA(x) be the convex hull of UA(x). Note that, for any given a, inf_{y∈VA(x)} a·y = inf_{s∈UA(x)} a·s = da(x, A), since the objective is linear. Also,

x ∈ A ⟹ 1(x ≠ x) = 0 ∈ UA(x) ⟹ 0 ∈ VA(x).

Thus,

D^c_A(x) = sup_{|a|=1, a∈R⁺ⁿ} inf_{y∈VA(x)} a·y.

Proposition. D^c_A(x) = inf_{y∈VA(x)} |y|.

Proof. (i) D^c_A(x) ≤ inf_{y∈VA(x)} |y|: since the infimum inf_{y∈VA(x)} |y| is achieved, let z be such that |z| = inf_{y∈VA(x)} |y|. Now, for any a ∈ R⁺ⁿ with |a| = 1,

inf_{y∈VA(x)} a·y ≤ a·z ≤ |a||z| = |z|.

(ii) D^c_A(x) ≥ inf_{y∈VA(x)} |y|: let z be the point achieving the minimum of f(y) ≜ |y|² = Σ_i yi² in VA(x). Due to convexity of the objective and of the domain, we have ∇f(z)·(y − z) ≥ 0 for any y ∈ VA(x). Since ∇f(z) = 2z, the condition implies

(y − z)·z ≥ 0 ⟹ y·z ≥ z·z = |z|².

Thus, for a = z/|z|,

inf_{y∈VA(x)} a·y ≥ |z|.

But for any given a, inf_{y∈VA(x)} a·y = inf_{s∈UA(x)} a·s = da(x, A), as explained before. That is, sup_{a:|a|=1} da(x, A) ≥ |z| = inf_{y∈VA(x)} |y|. This completes the proof.
Now we are ready to establish the inequality of Talagrand. The proof is via induction on n. Consider n = 1 and a given set A. Now,

D^c_A(x) = 0 for x ∈ A,  and 1 for x ∉ A.

Then,

∫ exp((D^c_A)²/4) dP = ∫_A exp(0) dP + ∫_{Aᶜ} exp(1/4) dP = P(A) + e^{1/4}(1 − P(A)) ≤ 1/P(A).  (5)

For the last inequality, let f(x) = e^{1/4} − (e^{1/4} − 1)x and g(x) = 1/x. Then f is a decreasing linear function while g is a decreasing convex function, and f(1) = g(1) = 1, so f(x) ≤ g(x) on (0, 1]. Thus, the result is established for n = 1.
Induction hypothesis. Suppose the result holds for some n. We shall assume, for ease of the proof, that all coordinate spaces are identical: Ω1 = Ω2 = ⋯ = Ω.

Let A ⊂ Ωⁿ⁺¹. Let B be its projection on Ωⁿ, and let A(ω), ω ∈ Ω, be the section of A along ω: if x ∈ Ωⁿ, then z = (x, ω) ∈ Ωⁿ⁺¹. We observe the following.

If s ∈ U_{A(ω)}(x), then (s, 0) ∈ UA(z). Indeed, for some y ∈ Ωⁿ such that (y, ω) ∈ A, s = 1(x ≠ y); therefore (s, 0) = (1(x ≠ y), 1(ω ≠ ω)) = 1(z ≠ (y, ω)), where (y, ω) ∈ A. Further, if t ∈ UB(x), then (t, 1) ∈ UA(z). This is because of the following: B = {x̃ ∈ Ωⁿ : (x̃, ω̃) ∈ A for some ω̃}. Now, if t ∈ UB(x), then there exists y ∈ B such that t = 1(x ≠ y), and (t, 1) = (1(x ≠ y), 1(ω ≠ ω̃)) = 1(z ≠ (y, ω̃)), as long as there exists ω̃ with (y, ω̃) ∈ A and ω̃ ≠ ω (if instead only (y, ω) ∈ A, then (t, 0) ∈ UA(z), which is even better).

Given this, it follows that if ξ ∈ V_{A(ω)}(x), η ∈ VB(x) and λ ∈ [0, 1], then (λξ + (1−λ)η, 1−λ) ∈ VA(z). Recall that D^c_A(z)² = inf_{y∈VA(z)} |y|². Therefore

D^c_A(z)² ≤ (1−λ)² + |λξ + (1−λ)η|² ≤ (1−λ)² + λ|ξ|² + (1−λ)|η|²,  (6)

by convexity of |·|². Optimizing over ξ and η,

D^c_A(z)² ≤ (1−λ)² + λ D^c_{A(ω)}(x)² + (1−λ) D^c_B(x)².
Therefore,

exp(D^c_A(z)²/4) ≤ exp((1−λ)²/4) exp(D^c_{A(ω)}(x)²/4)^λ exp(D^c_B(x)²/4)^{1−λ},

and, integrating over x ∈ Ωⁿ and writing X = exp(D^c_{A(ω)}(x)²/4), Y = exp(D^c_B(x)²/4),

∫_{Ωⁿ} exp(D^c_A(z)²/4) dPⁿ(x) ≤ exp((1−λ)²/4) E[X^λ Y^{1−λ}]
≤ exp((1−λ)²/4) E[X]^λ E[Y]^{1−λ}   (Hölder, with p = 1/λ, q = 1/(1−λ))
= exp((1−λ)²/4) ( ∫_{Ωⁿ} exp(D^c_{A(ω)}(x)²/4) dPⁿ(x) )^λ ( ∫_{Ωⁿ} exp(D^c_B(x)²/4) dPⁿ(x) )^{1−λ}
≤ exp((1−λ)²/4) (1/P(A(ω)))^λ (1/P(B))^{1−λ}   (by the induction hypothesis)
= exp((1−λ)²/4) (1/P(B)) (P(A(ω))/P(B))^{−λ}.  (7)
(7) is true for any λ ∈ [0, 1], so for the tightest upper bound we shall optimize.

Claim: for any u ∈ [0, 1], inf_{λ∈[0,1]} exp((1−λ)²/4) u^{−λ} ≤ 2 − u.

Therefore, with u = P(A(ω))/P(B), (7) reduces to

∫_{Ωⁿ} exp(D^c_A(x, ω)²/4) dPⁿ(x) ≤ (1/P(B)) (2 − P(A(ω))/P(B)).

Therefore,

∫_{Ωⁿ⁺¹} exp(D^c_A(x, ω)²/4) dPⁿ(x) dP(ω) ≤ ∫ (1/P(B)) (2 − P(A(ω))/P(B)) dP(ω)
= (1/P(B)) (2 − Pⁿ⁺¹(A)/P(B))
≤ 1/Pⁿ⁺¹(A),  (8)

where the last step uses u(2 − u) ≤ 1 for all u, applied with u = Pⁿ⁺¹(A)/P(B) ≤ 1.
Proof of the claim: inf_{λ∈[0,1]} exp((1−λ)²/4) u^{−λ} ≤ 2 − u.

If u ≤ e^{−1/2}: take λ = 0. The left-hand side is then exp(1/4) ≤ 2 − e^{−1/2} ≤ 2 − u.

If u ≥ e^{−1/2}: take λ = 1 + 2 log u ∈ [0, 1]. Then 1 − λ = −2 log u and

exp((1−λ)²/4) u^{−λ} = exp(log² u − (1 + 2 log u) log u) = exp(−log u − log² u).

We have that

1 ≥ u ≥ e^{−1/2} ⟹ −1/2 ≤ log u ≤ 0 ⟹ 0 ≤ −log u ≤ 1/2 and 0 ≤ log² u ≤ 1/4,

and a direct calculation on this range verifies exp(−log u − log² u) ≤ 2 − u: both sides equal 1 at u = 1, and at u = e^{−1/2} the left side is e^{1/4} < 2 − e^{−1/2}.
Fall 2013
10/28/2013
Content.
1. Spaces L2 , M2 , M2,c .
2. Quadratic variation property of continuous martingales.
Proposition. Let X ∈ M2. Then, for every x > 0,

P(sup_{t≤T} Xt ≥ x) ≤ E[X_T²]/x².

Proof. Consider any sequence of partitions Πn = {0 = t0ⁿ < t1ⁿ < ⋯ < t_{Nn}ⁿ = T} such that Δ(Πn) = max_j |t_{j+1}ⁿ − t_jⁿ| → 0. Additionally, suppose that the sequence Πn is nested, in the sense that for every n1 ≤ n2, every point in Π_{n1} is also a point in Π_{n2}. Let Xtⁿ = X_{t_jⁿ}, where j = max{i : t_iⁿ ≤ t}. Then Xtⁿ is a sub-martingale adapted to the same filtration (notice that this would not be the case if we instead chose right ends of the intervals). By the discrete version of the D-K inequality (see previous lectures), we have

P(max_{j≤Nn} X_{t_jⁿ} ≥ x) = P(sup_{t≤T} Xtⁿ ≥ x) ≤ E[X_T²]/x².
The same inequality, applied to the difference of two martingales, controls uniform convergence of paths:

P(sup_{t≤T} |X_t^{(n)} − Xt| > ε) ≤ (1/ε²) E[(X_T^{(n)} − X_T)²] → 0.

Choosing a subsequence nk such that

P(sup_{t≤T} |X_t^{(nk)} − Xt| > 1/k) ≤ 1/2^k,

the Borel-Cantelli Lemma implies sup_{t≤T} |X_t^{(nk)}(ω) − Xt(ω)| → 0 almost surely.
⟨X⟩_t = lim_{Πn : Δ(Πn)→0} Σ_{0≤j≤n−1} (X_{t_{j+1}} − X_{t_j})²,

where the limit is over all partitions Πn = {0 = t0 < t1 < ⋯ < tn = t} and Δ(Πn) = max_j |tj − t_{j−1}|.
Proof. Fix s < t. Let X ∈ M2,c. We have

E[(Xt − Xs)² − (⟨X⟩_t − ⟨X⟩_s) | Fs] = E[Xt² − 2XtXs + Xs² − (⟨X⟩_t − ⟨X⟩_s) | Fs]
= E[Xt² | Fs] − 2Xs E[Xt | Fs] + Xs² − E[⟨X⟩_t | Fs] + ⟨X⟩_s
= E[Xt² − ⟨X⟩_t | Fs] − Xs² + ⟨X⟩_s
= 0.  (1)

Thus, for every s < t ≤ u < v, by conditioning first on Fu and using the tower property, we obtain

E[((Xt − Xs)² − (⟨X⟩_t − ⟨X⟩_s))((Xv − Xu)² − (⟨X⟩_v − ⟨X⟩_u))] = 0.
j
6M 4 .
j
4
6M E[sup{|Xr Xs |4 : |r s| (n )}].
Now X() is a.s. continuous and therefore uniformly continuous on [0, t].
Therefore, a.s. sup{|Xr Xs |2 : |r s| (n )} 0 as (n ) 0.
Also |Xr Xs | 2M a.s. Applying Bounded Convergence Theorem, we ob
tain that E[sup{|Xr Xs |4 : |r s| (n )}] converges to zero as well and
the result is obtained.
We now return to the proof of the proposition. We first assume |X_s| ≤ M and ⟨X⟩_s ≤ M a.s. for s ∈ [0, t]. We have (using a telescoping sum)

E[(Σ_j (X_{t_{j+1}} − X_{t_j})^2 − ⟨X⟩_t)^2] = E[(Σ_j ((X_{t_{j+1}} − X_{t_j})^2 − (⟨X⟩_{t_{j+1}} − ⟨X⟩_{t_j})))^2].

When we expand the square, the terms corresponding to cross products with j_1 ≠ j_2 disappear due to (1). Thus the expression is equal to

E[Σ_j ((X_{t_{j+1}} − X_{t_j})^2 − (⟨X⟩_{t_{j+1}} − ⟨X⟩_{t_j}))^2]
≤ 2E[Σ_j (X_{t_{j+1}} − X_{t_j})^4] + 2E[Σ_j (⟨X⟩_{t_{j+1}} − ⟨X⟩_{t_j})^2].
We now analyze the second term. Since ⟨X⟩_t is a.s. non-decreasing, then

Σ_j (⟨X⟩_{t_{j+1}} − ⟨X⟩_{t_j})^2 ≤ sup{⟨X⟩_r − ⟨X⟩_s : 0 ≤ s ≤ r ≤ t, |r − s| ≤ Δ(Π_n)} Σ_j (⟨X⟩_{t_{j+1}} − ⟨X⟩_{t_j})
= sup{⟨X⟩_r − ⟨X⟩_s : 0 ≤ s ≤ r ≤ t, |r − s| ≤ Δ(Π_n)} ⟨X⟩_t.     (2)

Now ⟨X⟩_t is a.s. continuous and thus the supremum term converges to zero a.s. as n → ∞. On the other hand, a.s., ⟨X⟩_t(⟨X⟩_r − ⟨X⟩_s) ≤ 2M^2. Thus, using the Bounded Convergence Theorem, we obtain that the expectation in (2) converges to zero as well. We conclude that in the bounded case |X_s|, ⟨X⟩_s ≤ M on [0, t], the quadratic variation of X_s over [0, t] converges to ⟨X⟩_t in the L^2 sense. This implies convergence in probability as well.
It remains to analyze the general (unbounded) case. Introduce stopping times T_M for every M ∈ R_+ as follows:

T_M = min{t : |X_t| ≥ M or ⟨X⟩_t ≥ M}.

Consider X_t^M ≜ X_{t ∧ T_M}. Then X^M ∈ M_{2,c} and is a.s. bounded. Further, since X_t^2 − ⟨X⟩_t is a martingale, X_{t ∧ T_M}^2 − ⟨X⟩_{t ∧ T_M} is a bounded martingale. Since the Doob-Meyer decomposition is unique, we see that ⟨X⟩_{t ∧ T_M} is indeed the unique non-decreasing component of the stopped martingale X_{t ∧ T_M}. There is a subtlety here: X_t^M is a continuous martingale and therefore it has its own quadratic variation ⟨X^M⟩_t - the unique non-decreasing a.s. process such that (X_t^M)^2 − ⟨X^M⟩_t is a martingale. It is a priori not obvious that ⟨X^M⟩_t is the same as ⟨X⟩_{t ∧ T_M} - the quadratic variation of X_t stopped at T_M. But due to uniqueness of the D-M decomposition, it is.
Fix ε > 0, t ≥ 0 and find M large enough so that P(T_M < t) < ε/2. This is possible since X_t and ⟨X⟩_t are continuous processes. Now we have

P(|Σ_j (X_{t_{j+1}} − X_{t_j})^2 − ⟨X⟩_t| > ε)
≤ P(|Σ_j (X_{t_{j+1}} − X_{t_j})^2 − ⟨X⟩_t| > ε, t ≤ T_M) + P(T_M < t)
= P(|Σ_j (X_{t_{j+1} ∧ T_M} − X_{t_j ∧ T_M})^2 − ⟨X⟩_{t ∧ T_M}| > ε, t ≤ T_M) + P(T_M < t)
≤ P(|Σ_j (X_{t_{j+1} ∧ T_M} − X_{t_j ∧ T_M})^2 − ⟨X⟩_{t ∧ T_M}| > ε) + P(T_M < t).

We already established the result for bounded martingales and quadratic variations. Thus, there exists δ = δ(ε) > 0 such that, provided Δ(Π) < δ, we have

P(|Σ_j (X_{t_{j+1} ∧ T_M} − X_{t_j ∧ T_M})^2 − ⟨X⟩_{t ∧ T_M}| > ε) < ε/2.
References
[1] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus,
Springer, 1991.
Fall 2013
10/30/2013
Content.
1. Simple processes. Ito isometry
2. First 3 steps in constructing Ito integral for general processes
For a simple process X, the Ito integral is defined as

I_t(X(ω)) = Σ_{0 ≤ j ≤ n−1} X_{t_j}(B_{t_{j+1} ∧ t} − B_{t_j ∧ t}),

and it satisfies

I_t(aX + bY) = aI_t(X) + bI_t(Y),     (1)
E[I_t^2(X)] = E[∫_0^t X_s^2 ds],     (2)
I_t(X) ∈ M_{2,c},     (3)
⟨I(X)⟩_t = ∫_0^t X_s^2 ds.     (4)

Notice that (4) is a generalization of the Ito isometry. We only prove the Ito isometry; the proof of (4) follows along the same lines.
Proof. Define t_n = t for convenience. We begin with (1). Let {t_j^1} and {t_j^2} be the partitions corresponding to the simple processes X and Y. Consider the partition {t_j} obtained as a union of these two partitions. For each t_j which belongs to the second partition but not the first, define X_{t_j} = X_{t_i^1}, where t_i^1 is the largest point not exceeding t_j. Do a similar thing for Y. Observe that now X_t = X_{t_j} for t ∈ [t_j, t_{j+1}). The linearity of the Ito integral then follows straight from the definition.
Now for (2) we have

E[I_t^2(X)] = Σ_j E[X_{t_j}^2 (t_{j+1} − t_j)] = E[Σ_j X_{t_j}^2 (t_{j+1} − t_j)] = E[∫_0^t X_s^2 ds].
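The isometry (2) can be sanity-checked by Monte Carlo for the simple integrand X_s = B_{t_j} on [t_j, t_{j+1}) (a simple-process approximation of X = B, for which both sides equal t^2/2 = 0.5 when t = 1). A hedged sketch, not part of the lecture:

```python
import random
import math

def ito_isometry_check(t=1.0, n=200, paths=2000, seed=1):
    """Monte Carlo check of E[I_t(X)^2] = E[int_0^t X_s^2 ds] for the
    simple integrand X_s = B_{t_j} on [t_j, t_{j+1}).  Returns the two
    sample averages, which should nearly agree."""
    rng = random.Random(seed)
    dt = t / n
    lhs = rhs = 0.0
    for _ in range(paths):
        b = 0.0
        integral = 0.0   # I_t(X) = sum_j B_{t_j} (B_{t_{j+1}} - B_{t_j})
        riemann = 0.0    # int_0^t B_s^2 ds  (left-endpoint rule)
        for _ in range(n):
            db = rng.gauss(0.0, math.sqrt(dt))
            integral += b * db
            riemann += b * b * dt
            b += db
        lhs += integral ** 2
        rhs += riemann
    return lhs / paths, rhs / paths
```

Both averages concentrate near t^2/2; only statistical and discretization error separates them.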
Let us show (3). We already know that the process I_t(X) is continuous. From the Ito isometry it follows that E[I_t^2(X)] < ∞. It remains to show that it is a martingale. Thus fix s < t. Define t_n = t and define j_0 = max{j : t_j ≤ s}. Then

E[I_t(X) | F_s] = E[Σ_{j ≤ n−1} X_{t_j}(B_{t_{j+1}} − B_{t_j}) | F_s]
= E[Σ_{j ≤ j_0 − 1} X_{t_j}(B_{t_{j+1}} − B_{t_j}) | F_s] + E[Σ_{j ≥ j_0} X_{t_j}(B_{t_{j+1}} − B_{t_j}) | F_s]
= Σ_{j ≤ j_0 − 1} X_{t_j}(B_{t_{j+1}} − B_{t_j}) + X_{t_{j_0}}(B_s − B_{t_{j_0}})
= I_s(X)

(think about justifying the last two equalities).
lim_n E[∫_0^T (X_t^n − X_t)^2 dt] = 0.     (5)

Indeed,

∫_0^T (X_t^n − X_t)^2 dt ≤ 2∫_0^T (X_t^n)^2 dt + 2∫_0^T X_t^2 dt ≤ 4∫_0^T X_t^2 dt.

Since E[∫_0^T X_t^2 dt] < ∞, then applying the Dominated Convergence Theorem, we obtain the result.
Exercise 1. Establish (7) by applying instead Monotone Convergence Theorem.
References
[1] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus,
Springer, 1991.
[2] B. Øksendal, Stochastic differential equations, Springer, 1991.
Fall 2013
11/04/2013
Content.
1. Definition of Ito integral
2. Properties of Ito integral
Let X^n be a sequence of simple processes satisfying

lim_n E[∫_0^T (X_t^n − X_t)^2 dt] = 0.     (1)

Then, by the Ito isometry,

E[(I_T(X^n) − I_T(X^m))^2] = E[∫_0^T (X_t^n − X_t^m)^2 dt] ≤ 2E[∫_0^T (X_t^n − X_t)^2 dt] + 2E[∫_0^T (X_t^m − X_t)^2 dt].
But since the sequence X^n satisfies (1), it follows that the sequence I_T(X^n) is Cauchy in the L^2 sense. Recall now from Theorem 2.2 of the previous lecture that each I_t(X^n) is a continuous square integrable martingale: I_t(X^n) ∈ M_{2,c}. Applying Proposition 2, Lecture 1, which states that M_{2,c} is a closed space, there exists a limit Z_t, t ∈ [0, T] in M_{2,c} satisfying E[(Z_T − I_T(X^n))^2] → 0. The same applies to every t ≤ T since (Z_t − I_t(X^n))^2 is a submartingale.

It remains to show that such a process Z_t is unique. If Z̃_t is a limit of some sequence X̃^n satisfying (1), then by the submartingale inequality for every ε > 0 we have P(sup_{t ≤ T} |Z_t − Z̃_t| ≥ ε) ≤ E[(Z_T − Z̃_T)^2]/ε^2. But

E[(Z_T − Z̃_T)^2] ≤ 3E[(Z_T − I_T(X^n))^2] + 3E[(I_T(X^n) − I_T(X̃^n))^2] + 3E[(I_T(X̃^n) − Z̃_T)^2],

and the right-hand side converges to zero. Thus E[(Z_T − Z̃_T)^2] = 0. It follows that Z_t = Z̃_t a.s. on [0, T]. Since T was arbitrary, we obtain an a.s. unique limit on R_+.
Now we can formally state the definition of the Ito integral.

Definition 1 (Ito integral). Given a stochastic process X_t ∈ L^2 and T > 0, its Ito integral I_t(X), t ∈ [0, T], is defined to be the unique process Z_t constructed in Proposition 2.

We have defined the Ito integral as a process which is defined only on a finite interval [0, T]. With a little bit of extra work it can be extended to a process I_t(X) defined for all t ≥ 0, by taking T → ∞ and taking appropriate limits. Details can be found in [1] and are omitted, as we will deal exclusively with Ito integrals defined on a finite interval.
2
2.1
Let us compute the Ito integral for the special case X_t = B_t. We will do this directly from the definition. Later on we will develop calculus rules for computing the Ito integral in many interesting cases.

We fix a sequence of partitions Π_n : 0 = t_0 < ... < t_n = T and consider B_t^n = B_{t_j}, t ∈ [t_j, t_{j+1}). Assume that lim_n Δ(Π_n) = 0, where Δ(Π_n) = max_j |t_{j+1} − t_j|. We first show that this is sufficient for having

lim_n E[∫_0^T (B_t − B_t^n)^2 dt] = 0.     (2)
Indeed,

∫_0^T (B_t − B_t^n)^2 dt = Σ_{j=0}^{n−1} ∫_{t_j}^{t_{j+1}} (B_t − B_{t_j})^2 dt.

We have

E[∫_{t_j}^{t_{j+1}} (B_t − B_{t_j})^2 dt] = ∫_{t_j}^{t_{j+1}} (t − t_j) dt = (t_{j+1} − t_j)^2/2,

implying

E[∫_0^T (B_t − B_t^n)^2 dt] = (1/2) Σ_{j=0}^{n−1} (t_{j+1} − t_j)^2 ≤ (1/2) Δ(Π_n) Σ_{j=0}^{n−1} (t_{j+1} − t_j) = Δ(Π_n) T/2 → 0.
Next, using a telescoping sum,

B_T^2 = Σ_{j=0}^{n−1} (B_{t_{j+1}}^2 − B_{t_j}^2) = Σ_{j=0}^{n−1} (B_{t_{j+1}} − B_{t_j})^2 + 2 Σ_{j=0}^{n−1} B_{t_j}(B_{t_{j+1}} − B_{t_j}).

We have established that

Σ_{j=0}^{n−1} (B_{t_{j+1}} − B_{t_j})^2 → T

in L^2 (recall that the only requirement for this convergence was that Δ(Π_n) → 0). Therefore, also in L^2,

Σ_{j=0}^{n−1} B_{t_j}(B_{t_{j+1}} − B_{t_j}) → (1/2)B^2(T) − T/2.
We conclude

I_T(B) = ∫_0^T B_t dB_t = (1/2)B^2(T) − T/2.
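The identity I_T(B) = B^2(T)/2 − T/2 can be observed pathwise: the discrete Ito sum Σ_j B_{t_j}(B_{t_{j+1}} − B_{t_j}) equals (B_T^2 − Σ_j (ΔB_j)^2)/2 exactly, and the quadratic-variation term concentrates near T. A small sketch (names ours):

```python
import random
import math

def bdB_vs_formula(t=1.0, n=10**5, seed=2):
    """Along one simulated path, compare the Ito sum
    sum_j B_{t_j}(B_{t_{j+1}} - B_{t_j}) with (1/2)B_T^2 - T/2."""
    rng = random.Random(seed)
    dt = t / n
    b = 0.0
    ito_sum = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += b * db
        b += db
    return ito_sum, 0.5 * b * b - 0.5 * t
```

The two returned values differ only by (T − Σ_j (ΔB_j)^2)/2, which shrinks as the partition is refined.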
Further, recall that since B_t ∈ M_{2,c}, it admits a unique Doob-Meyer decomposition B_t^2 = t + M_t, where t = ⟨B⟩_t is the quadratic variation of B_t and M_t is a continuous martingale. Thus we recognize M_t to be 2I_t(B).
2.2 Properties

The Ito integral is linear:

I_t(aX + bY) = aI_t(X) + bI_t(Y),     (3)

and for every s < t,

E[(I_t(X) − I_s(X))^2 | F_s] = E[∫_s^t X_u^2 du | F_s], with ⟨I(X)⟩_t = ∫_0^t X_u^2 du.     (4)
Proof. The proof of (3) is straightforward and is skipped. We now prove (4). Fix any set A ∈ F_s. We need to show that

E[(I_t(X) − I_s(X))^2 I(A)] = E[I(A) ∫_s^t X_u^2 du].
from the definition of I_t(X). Similarly we show that all the other terms with a factor of 2 in front converge to zero. By property (2.6) of Theorem 2.2, previous lecture, we have

E[(I_t(X^n) − I_s(X^n))^2 I(A)] = E[I(A) ∫_s^t (X_u^n)^2 du].
Now

|E[I(A) ∫_s^t (X_u^n)^2 du] − E[I(A) ∫_s^t X_u^2 du]| = |E[I(A) ∫_s^t (X_u^n − X_u)(X_u^n + X_u) du]|
≤ E[∫_s^t |(X_u^n − X_u)(X_u^n + X_u)| du]
≤ E^{1/2}[∫_s^t (X_u^n − X_u)^2 du] E^{1/2}[∫_s^t (X_u^n + X_u)^2 du],

where the Cauchy-Schwarz inequality was used in the last step. Now the first term in the product converges to zero by the assumption (1), and the second is uniformly bounded in n (exercise). The assertion then follows.
Now we prove the last part. Applying Proposition 3 from Lecture 1, it suffices to show that I_t^2(X) − ∫_0^t X_s^2 ds is a martingale, since then by uniqueness of the Doob-Meyer decomposition we must have ⟨I(X)⟩_t = ∫_0^t X_s^2 ds. But note that (4) is equivalent to

E[(I_t(X) − I_s(X))^2 | F_s] = E[I_t^2(X) | F_s] − I_s^2(X) = E[∫_s^t X_u^2 du | F_s]
= E[∫_0^t X_u^2 du | F_s] − E[∫_0^s X_u^2 du | F_s]
= E[∫_0^t X_u^2 du | F_s] − ∫_0^s X_u^2 du.

Namely,

E[I_t^2(X) | F_s] − E[∫_0^t X_u^2 du | F_s] = I_s^2(X) − ∫_0^s X_u^2 du,

namely, I_t^2(X) − ∫_0^t X_s^2 ds is indeed a martingale.
References
[1] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus,
Springer, 1991.
[2] B. Øksendal, Stochastic differential equations, Springer, 1991.
Fall 2013
11/13/2013
Content.
1. Ito process and functions of Ito processes. Ito formula.
2. Multidimensional Ito formula. Integration by parts.
Ito process

Recall that

I_t(B) = ∫_0^t B_s dB_s = (1/2)B_t^2 − t/2,

or

B^2(t) = 2 ∫_0^t B(s) dB(s) + t.     (1)

We say that X_t is an Ito process if

X_t = X_0 + ∫_0^t U_s ds + ∫_0^t V_s dB_s.     (2)

Thus B_t^2 is an Ito process: B_t^2 = ∫_0^t ds + 2 ∫_0^t B_s dB_s, or d(B_t^2) = dt + 2B_t dB_t. Note the difference from the usual differentiation: dx^2 = 2x dx. The additional term dt arises because Brownian motion B is not differentiable and instead has quadratic variation.
Notation Given an Ito process dXt = Ut dt+Vt dBt , let us introduce the notation
(dXt )2 which stands for Vt2 dt. Equivalently (dXt )2 is (dXt ) (dXt ) which is
computed using the rules dt dt = dt dBt = dBt dt = 0, dBt dBt = dt.
Ito formula

Theorem 1 (Ito formula). Suppose X_t is an Ito process and g is twice continuously differentiable. Then Y_t = g(X_t) is an Ito process satisfying

dY_t = (∂g/∂x)(X_t) dX_t + (1/2)(∂^2 g/∂x^2)(X_t)(dX_t)^2.

Using the notational convention for dX_t = U_t dt + V_t dB_t and (dX_t)^2, we can rewrite the Ito formula as

dY_t = ((∂g/∂x)(X_t)U_t + (1/2)(∂^2 g/∂x^2)(X_t)V_t^2) dt + (∂g/∂x)(X_t)V_t dB_t.
Thus, we see that the space of Ito processes is closed under twice-continuously differentiable transformations.

Proof sketch of Theorem 1. We will do this for a very special case. We assume that the derivatives ∂g/∂x, ∂^2 g/∂x^2, as well as U and V, are all bounded simple processes. The general case is then obtained by approximating U and V by bounded simple processes, in a way similar to how we defined the Ito integral.
Let Π_n : 0 = t_0 < t_1 < ... < t_n = t be a sequence of partitions such that Δ(Π_n) → 0. We use the notation ΔB(t_j) = B(t_{j+1}) − B(t_j), ΔX(t_j) = X(t_{j+1}) − X(t_j). Taylor expanding,

g(X(t)) = g(X(0)) + Σ_{j<n} (∂g/∂x)(X(t_j)) ΔX(t_j) + (1/2) Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) Δ^2 X(t_j) + Σ_{j<n} o(Δ^2 X(t_j)).
Now, we have

ΔX(t_j) = X(t_{j+1}) − X(t_j) = U(t_j)(t_{j+1} − t_j) + V(t_j)(B(t_{j+1}) − B(t_j)).

Thus, we obtain

Σ_{j<n} (∂g/∂x)(X(t_j)) ΔX(t_j) = Σ_{j<n} (∂g/∂x)(X(t_j)) (U(t_j)(t_{j+1} − t_j) + V(t_j)(B(t_{j+1}) − B(t_j))).

We claim that the convergence

Σ_{j<n} (∂g/∂x)(X(t_j)) U(t_j)(t_{j+1} − t_j) → ∫_0^t (∂g/∂x)(X(s)) U(s) ds

and

Σ_{j<n} (∂g/∂x)(X(t_j)) V(t_j)(B(t_{j+1}) − B(t_j)) → ∫_0^t (∂g/∂x)(X(s)) V(s) dB(s)
takes place. For the first convergence, let us fix any sample ω. Then this convergence follows straight from the definition of the Riemann integral, since Δ(Π_n) → 0. Thus we have a.s. convergence. Since by our assumptions the integrated variables are bounded, the Bounded Convergence Theorem (applied to the uniform distribution on [0, t], just as in Proposition 1, Lecture 12) implies convergence in L^2. To prove the second convergence, consider a simple process ĝ which is defined to be (∂g/∂x)(X(t_j)) for all t ∈ [t_j, t_{j+1}). Then the left-hand side is the Ito integral of ĝ(s)V(s). Then, by the definition of the Ito integral, the convergence to the right-hand side holds if the following convergence takes place:

lim_n E[∫_0^t (ĝ(s)V(s) − (∂g/∂x)(X(s))V(s))^2 ds] = 0.
Next, the sum Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) Δ^2 X(t_j) expands into

Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) U^2(t_j)(t_{j+1} − t_j)^2
+ 2 Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) U(t_j) V(t_j)(t_{j+1} − t_j) ΔB(t_j)
+ Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) V^2(t_j) Δ^2 B(t_j).
We now analyze these terms as Δ(Π_n) → 0. Recall our assumption that ∂g/∂x, ∂^2 g/∂x^2 and U are bounded, say by at most C > 0. Therefore the first sum converges to zero provided Δ(Π_n) → 0. To analyze the second sum, we square it and take the expected value:

E[(2 Σ_{j<n} (∂^2 g/∂x^2)(X(t_j)) U(t_j) V(t_j)(t_{j+1} − t_j) ΔB(t_j))^2]
= 4 Σ_{j<n} E[((∂^2 g/∂x^2)(X(t_j)))^2 U^2(t_j) V^2(t_j)](t_{j+1} − t_j)^3,

where to obtain this equality we first condition on the field F_{t_j}, note that the expected value of all cross products vanishes, and use E[Δ^2 B(t_j) | F_{t_j}] = t_{j+1} − t_j. Again, since the second partial derivative and U, V are bounded, the entire term converges to zero provided that Δ(Π_n) → 0. In order to analyze the last sum the following result is needed.
Problem 1 (Generalized Quadratic Variation). Suppose a(s) ∈ H^2 and Π_n : 0 = t_0 < ... < t_n = t is a sequence of partitions satisfying Δ(Π_n) → 0 as n → ∞. Then the following convergence occurs in L^2:

lim_n Σ_j a(t_j)(B(t_{j+1}) − B(t_j))^2 = ∫_0^t a(s) ds.

HINT: use the same approach that we used in establishing the quadratic variation of B.
Using this result we establish that the last sum converges in L^2 to

∫_0^t (∂^2 g/∂x^2)(X(s)) V^2(s) ds.

It remains to analyze Σ_j o(Δ^2 X(t_j)), and using similar techniques it can be shown that this term vanishes in L^2 norm as Δ(Π_n) → 0. Putting all of this together, we conclude that g(X(t)) is approximated in the L^2 sense by

g(X(0)) + ∫_0^t (∂g/∂x)(X(s)) dX(s) + (1/2) ∫_0^t (∂^2 g/∂x^2)(X(s)) V^2(s) ds.

But recall that V^2(s) ds = (dX(s))^2. Making this substitution, we complete the derivation of the Ito formula.
Let us apply Theorem 1 to several examples.
Exercise 1. Verify that in all of the examples below the underlying processes
are in L2 .
Example 1. Let us re-derive our formula (1) using the Ito formula. Since B_t = ∫_0^t dB_s is an Ito process and g(x) = (1/2)x^2 is twice continuously differentiable, then by the Ito formula we have

d((1/2)B_t^2) = dg(B_t) = (∂g/∂x) dB_t + (1/2)(∂^2 g/∂x^2)(dB_t)^2
= B_t dB_t + (1/2)(dB_t)^2
= B_t dB_t + dt/2,

which matches (1).
Example 2. Let us apply the Ito formula to B_t^4. We obtain

d(B_t^4) = 4B_t^3 dB_t + (1/2) 12 B_t^2 dt = 4B_t^3 dB_t + 6B_t^2 dt,

namely, written in integral form,

B_t^4 = 4 ∫_0^t B_s^3 dB_s + 6 ∫_0^t B_s^2 ds.

Taking expectations of both sides and recalling that the Ito integral is a martingale, we obtain

E[∫_0^t B_s^2 ds] = (1/6) E[B_t^4],
5
which we find to be (1/6)3t2 as the 4-th moment of a normal zero mean dis4
tribution with
R t std2 is 3 . Recall an earlier exercise where you were asked to
compute E[ 0 Bs ds] directly. We see that Ito calculus is useful even in computing conventional integrals.
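The value E[∫_0^t B_s^2 ds] = t^2/2 is easy to confirm numerically through the moment identity above, since only B_t ~ N(0, t) needs to be sampled. A sketch (function name ours):

```python
import random
import math

def mean_integral_of_b_squared(t=1.0, paths=20000, seed=3):
    """Estimate E[int_0^t B_s^2 ds] via the identity
    E[int_0^t B_s^2 ds] = E[B_t^4]/6, sampling B_t ~ N(0, t) directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        bt = rng.gauss(0.0, math.sqrt(t))
        total += bt ** 4
    return total / paths / 6.0  # should be close to t^2/2
```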
In the multidimensional case, the cross terms dX_{i,t} dX_{j,t} are computed using the rules dt dt = dt dB_i = dB_i dt = 0, dB_i dB_j = 0 for all i ≠ j, and (dB_i)^2 = dt.
Let us now do a quick example illustrating the use of the Ito formula. Consider g(t, B_t) = e^{tB_t}. We will use the Ito formula to find its stochastic differential. Since both t and B_t are Ito processes and g(t, x) = e^{tx} is a twice continuously differentiable function g : R^2 → R, the formula applies. We have ∂g/∂t = x e^{tx}, ∂^2 g/∂t^2 = x^2 e^{tx}, ∂g/∂x = t e^{tx}, ∂^2 g/∂x^2 = t^2 e^{tx}. Then we can find its Ito representation using the Ito formula as

d(e^{tB_t}) = e^{tB_t} B_t dt + (1/2) e^{tB_t} B_t^2 (dt)^2 + t e^{tB_t} dB_t + (1/2) t^2 e^{tB_t} (dB_t)^2
= e^{tB_t}(B_t + (1/2)t^2) dt + t e^{tB_t} dB_t,

since (dt)^2 = 0 and (dB_t)^2 = dt.
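The computed differential of e^{tB_t} can be checked by accumulating its dt and dB_t increments along a fine discretization of a single Brownian path and comparing with the exact change e^{tB_t} − 1. A rough sketch with ad hoc step sizes (the discretization error is of order √Δt, so only approximate agreement is expected):

```python
import random
import math

def ito_formula_exp_tb(t=1.0, n=50000, seed=4):
    """Along one Brownian path, accumulate the Ito-formula increments
    dY = e^{sB_s}(B_s + s^2/2) ds + s e^{sB_s} dB_s
    and compare the result with the exact change e^{tB_t} - e^{0}."""
    rng = random.Random(seed)
    dt = t / n
    b, s, y = 0.0, 0.0, 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        g = math.exp(s * b)
        y += g * (b + 0.5 * s * s) * dt + s * g * db
        b += db
        s += dt
    return y, math.exp(t * b) - 1.0
```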
Problem 2. Use the Ito formula to find ∫ e^{B_t} dB_t. In other words, you need to represent this integral in terms of expressions not involving dB_t (as we did for ∫ B_t dB_t).
Suppose f is a continuously differentiable function. Let us use the Ito formula to find ∫_0^t f_s dB_s and derive the integration by parts formula. In other words, we look at the special simple case when X is a deterministic process, i.e., X_s = f_s a.s. First we observe that f ∈ L^2. Indeed, it is differentiable and therefore continuous. This implies that f is bounded on any finite interval, and therefore E[∫_0^t f_s^2 ds] = ∫_0^t f_s^2 ds < ∞.

Introduce g(t, x) = f_t x. We find that ∂g/∂t = (df/dt) x, ∂g/∂x = f_t, ∂^2 g/∂t^2 = (d^2 f/dt^2) x, and the second order partial derivative with respect to x vanishes. Therefore, using the Ito formula, we obtain

d(g(t, B_t)) = (df/dt) B_t dt + (1/2)(d^2 f/dt^2) B_t (dt)^2 + f_t dB_t + 0 = (df/dt) B_t dt + f_t dB_t,

implying

∫_0^t f_s dB_s = f_t B_t − ∫_0^t B_s (df/ds) ds.
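The integration by parts formula can be illustrated pathwise with the deterministic integrand f(s) = s (our choice for the example; any C^1 function works):

```python
import random
import math

def integration_by_parts_check(t=1.0, n=10000, seed=5):
    """Check int_0^t f_s dB_s = f_t B_t - int_0^t B_s f'(s) ds
    pathwise for the deterministic integrand f(s) = s."""
    rng = random.Random(seed)
    dt = t / n
    b, s = 0.0, 0.0
    stochastic = 0.0  # sum_j s_j (B_{t_{j+1}} - B_{t_j})
    riemann = 0.0     # int_0^t B_s * f'(s) ds with f'(s) = 1
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        stochastic += s * db
        riemann += b * dt
        b += db
        s += dt
    return stochastic, t * b - riemann
```

By discrete summation by parts the two sides agree up to a term of size Δs·|B_t|, so the match is essentially exact for fine grids.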
References
[1] B. Øksendal, Stochastic differential equations, Springer, 1991.
Fall 2013
11/20/2013
Content.
1. Trading strategies
2. Black-Scholes option pricing formula
A trading strategy φ_t applied to an Ito process X_t generates the gains

∫_0^t φ_s dX_s = ∫_0^t φ_s μ_s ds + ∫_0^t φ_s σ_s dB_s.     (1)

A strategy is self-financed if

φ_t X_t = φ_0 X_0 + ∫_0^t φ(s) dX(s),     (2)

and, in the case of m securities,

Σ_{1≤j≤m} φ_{j,t} X_{j,t} = Σ_{1≤j≤m} φ_{0j} X_{0j} + Σ_{1≤j≤m} ∫_0^t φ_j(s) dX_j(s).     (3)
1jm
This denition simply means that whatever we have at any time we invest.
This is easier to understand when we have a simple strategy, that is L02 .
Then we trade at times 0 = t0 < t1 < < tn = t. We start with portfolio
0 worth of dollars of a security. At time t1 our portfolio is worth
0 buy 0 X
0 + 0 (X
t X
0 ) Wt . We create some other portfolio t with the
0 X
1
1
1
t Xt )
tj (X
j+1
j
jn1
Z t
(s)dX(s).
It is a standard assumption in finance theory that market prices are such that there does not exist a trading strategy creating an arbitrage opportunity on any horizon T > 0. As it turns out, under technical assumptions, this is equivalent to the existence of a change of measure such that, with respect to the new measure, X is a martingale.
2

For the purposes of this section we assume that we are dealing with two securities:

1. A stock, whose time t price S_t follows a geometric Brownian motion

S_t = x exp(μt + σB_t)

for some constants μ, σ; and

2. A bond, whose price β_t at time t is given as

β_t = β_0 exp(rt)

for some constants β_0, r.

Both of these are Ito processes. The stock dynamics we find using the Ito formula:

dS = S(μ dt + σ dB) + (1/2) S σ^2 (dB)^2 = S((μ + σ^2/2) dt + σ dB).     (4)
The reason for this deterministic dependence has to do with the fact that one can simply replicate the option by carefully constructing a portfolio consisting of a bond and a stock. Another surprising aspect of the Black-Scholes result was that the portfolio and the price can be computed in a very explicit form.

We first present the Black-Scholes formula and then discuss its relevance to option pricing.
The Black-Scholes Formula. We are given a strike price K, a time instance T, an interest rate r and a volatility σ. Define

z = (log(x/K) + (r + σ^2/2)(T − t)) / (σ √(T − t)).     (5)
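The excerpt defines z in (5), but the displayed price formula itself was lost in transcription; the standard Black-Scholes call price consistent with this z is C(x, t) = xN(z) − Ke^{−r(T−t)} N(z − σ√(T−t)). A sketch assuming that standard closed form:

```python
import math

def norm_cdf(y):
    """Standard normal cumulative distribution N(y)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def black_scholes_call(x, K, r, sigma, T, t=0.0):
    """Standard Black-Scholes call price with the z of (5):
    C(x, t) = x N(z) - K e^{-r(T-t)} N(z - sigma sqrt(T-t))."""
    tau = T - t
    z = (math.log(x / K) + (r + sigma ** 2 / 2.0) * tau) / (sigma * math.sqrt(tau))
    return x * norm_cdf(z) - K * math.exp(-r * tau) * norm_cdf(z - sigma * math.sqrt(tau))
```

As σ → 0 with x e^{rT} > K, the price tends to x − Ke^{−rT}, matching the degenerate-case analysis that follows.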
Before we discuss how to establish the Black-Scholes formula (we will only sketch the proof), let us discuss its behavior in various extreme cases. Say again σ = 0. In this case it must be that μ = r. What is the worth of the option at time t = 0? Suppose the price of the stock at time zero is x. At time T the option pays, with probability one, the amount xe^{rT} − K when this is positive. If xe^{rT} ≥ K, then its worth at time t = 0 is exactly x − e^{−rT}K, as we can create a net payoff of xe^{rT} − K by investing x − e^{−rT}K into stock or bonds (which are equivalent since μ = r). Therefore the right price of this option at time zero is exactly x − e^{−rT}K. But if xe^{rT} < K, i.e. x − Ke^{−rT} < 0, then with probability one the option does not pay anything. Therefore it is worth zero.

Let us see whether this matches what the Black-Scholes formula predicts. We first compute z as σ → 0. The limit of z is

lim_{σ→0} (log(x/K) + (r + σ^2/2)T) / (σ√T) = +∞ when xe^{rT} > K, and −∞ when xe^{rT} < K.

Similarly, as t → T,

lim_{t→T} C(x, t) = xN(∞) − KN(∞) = x − K

when x > K, and = 0 when x < K. This is again consistent with common sense. As the strike time T approaches, the uncertainty about the stock gradually disappears and the option is worth x − K when x > K and 0 when x < K, namely it is worth exactly max(x − K, 0) - which is its payoff upon maturity.
Proof sketch for the Black-Scholes Theorem. We will show that there exists a self-financed trading strategy a_t, b_t for trading stocks S_t and bonds β_t such that a_T S_T + b_T β_T = max(S_T − K, 0), that is, the value of the portfolio at time T is exactly the payoff max(S_T − K, 0) of the option at time T. Since there is no arbitrage, the price of the option at time t = 0 must be exactly a_0 S_0 + b_0 β_0.

We will first assume that the right price C(S_t, t) of the option at time t is nice. Specifically, the corresponding function C(x, t) is twice continuously differentiable. Later on, when we actually find C, we simply verify that this is indeed the case. For now, we make this assumption and let us try to infer the function C as well as the self-financed strategies a and b. We want to find a self-financed trading strategy a, b such that

a_t S_t + b_t β_t = C(S_t, t).     (6)
By the Ito formula,

dC = (∂C/∂t + (μ + σ^2/2) S ∂C/∂x + (1/2) σ^2 S^2 ∂^2 C/∂x^2) dt + σS (∂C/∂x) dB,     (7)

or, in differential form, the self-financing condition reads

d(aS + bβ) = a dS + b dβ = (a(μ + σ^2/2)S + bβr) dt + aσS dB.     (8)

Matching the dB coefficients in (7) and (8) gives a_t = (∂C/∂x)(S_t, t), and then from (6)

b_t = (1/β_t)(C(S_t, t) − a_t S_t) = (1/β_t)(C(S_t, t) − (∂C/∂x) S_t).     (9)
On the other hand, we need to match the remaining dt coefficients in (8) and (7):

bβr = ∂C/∂t + (1/2)(∂^2 C/∂x^2) σ^2 S^2,

which, substituting (9), gives

rC − r(∂C/∂x)S = ∂C/∂t + (1/2)(∂^2 C/∂x^2) σ^2 S^2.
Fall 2013
11/25/2013
Content.

1. σ-fields on metric spaces.
2. Kolmogorov σ-field on C[0, T].
3. Weak convergence.

In the first two sections we review some concepts from measure theory on metric spaces. Then in the last section we begin the discussion of the theory of weak convergence, by stating and proving the important Portmanteau Theorem, which gives four equivalent definitions of weak convergence.
1

Now let us focus on C[0, T] and the Borel σ-field B on it. For each t ≥ 0 define a projection π_t : C[0, T] → R as π_t(x) = x(t). Observe that π_t is a uniformly continuous mapping. Indeed,

|π_t(x) − π_t(y)| = |x(t) − y(t)| ≤ ‖x − y‖.

This immediately implies uniform continuity.

The family of projection mappings π_t gives rise to an alternative σ-field.

Definition 2. The Kolmogorov σ-field K on C[0, T] is the σ-field generated by π_t^{-1}(B), t ∈ [0, T], B ∈ B(R), where B(R) is the Borel σ-field of R.
It turns out (and this will be useful) that the two σ-fields are identical:

Theorem 1. The Kolmogorov σ-field K is identical to the Borel σ-field B of C[0, T].

Proof. First we show that K ⊆ B. Since π_t is continuous, then for every open set U ⊆ R, π_t^{-1}(U) is open in C[0, T]. This applies to all open intervals U. Thus each π_t^{-1}(U) ∈ B. This shows K ⊆ B.

Now we show the other direction. Since C[0, T] is Polish, then by Corollary 1 it suffices to check that every ball B(x, r) ∈ K. Fix x ∈ C[0, T], r ≥ 0. For each rational q ∈ [0, T], consider B_q ≜ π_q^{-1}([x(q) − r, x(q) + r]). This is the set of all functions y such that |y(q) − x(q)| ≤ r. Consider ∩_q B_q. As a countable intersection, this is a set in K. We claim that ∩_q B_q equals the closed ball B̄(x, r). This implies the result.

To establish the claim, note that B̄(x, r) ⊆ B_q for each q. Now suppose y ∉ B̄(x, r). Namely, for some t ∈ [0, T] we have |y(t) − x(t)| ≥ r + δ > r. Find a sequence of rational values q_n converging to t. By continuity of x, y we have x(q_n) → x(t), y(q_n) → y(t). Therefore for all sufficiently large n we have |y(q_n) − x(q_n)| > r. This means y ∉ B_{q_n}.
Weak convergence

Given a closed set F and ε > 0, let F^ε denote its ε-neighborhood and let X_ε be a continuous function equal to 1 on F and vanishing outside F^ε.     (1)

Note that X_ε is a continuous bounded function. Therefore, by assumption, lim_n E_{P_n}[X_ε] = E_P[X_ε]. Combining, we obtain that

lim sup_n P_n(F) ≤ lim sup_n E_{P_n}[X_ε] = lim_n E_{P_n}[X_ε] = E_P[X_ε] ≤ P(F^ε).

Also,

∫ P_n(A(t)) dt → ∫ P(A(t)) dt.
0
References
[1] P. Billingsley, Convergence of probability measures, Wiley-Interscience
publication, 1999.
Fall 2013
11/27/2013
Content.
1. Additional technical results on weak convergence
2. Functional Strong Law of Large Numbers
3. Existence of Wiener measure (Brownian motion)
We are about to establish two very important limit results in the theory of stochastic processes. In probability theory two cornerstone theorems are the (Weak or Strong) Law of Large Numbers and the Central Limit Theorem. These theorems have direct analogues in the theory of stochastic processes: the Functional Strong Law of Large Numbers (FSLLN) and the Functional Central Limit Theorem (FCLT), also known as the Donsker Theorem. The second theorem contains in it the fact that the Wiener measure exists.
2

Given an i.i.d. sequence X_1, X_2, ... with E[X_1] = 0, define the scaled piece-wise linear process

N_n(t, ω) = S_{⌊nt⌋}(ω)/n + (nt − ⌊nt⌋) X_{⌊nt⌋+1}(ω)/n.     (1)
As we see, just as the SLLN, the FSLLN holds without any assumptions on the variance of X_1, that is, even if σ = ∞.

Here is another way to state the FSLLN. We may consider the functions N_n defined on the entire interval [0, ∞) using the same defining identity (1). Recall that the sets [0, T] are compact in R. An equivalent way of stating the FSLLN is that N_n converges to zero almost surely, uniformly on compact sets.
Proof. Fix ε > 0 and T > 0. By the SLLN we have that for almost all realizations of the sequence X_1(ω), X_2(ω), ..., there exists n_0(ω) such that for all n > n_0(ω),

|S_n(ω)|/n < ε/T.

We let M(ω) = max_{1 ≤ m ≤ n_0(ω)} |S_m(ω)|. We claim that for n > M(ω)/ε, there holds

sup_{0 ≤ t ≤ T} |N_n(t)| < ε.

We consider two cases. Suppose t ∈ [0, T] is such that ⌊nt⌋ > n_0(ω). Then

|N_n(t)| ≤ max(|S_{⌊nt⌋}(ω)|, |S_{⌊nt⌋+1}(ω)|)/n.

We have

|S_{⌊nt⌋}(ω)|/n = (|S_{⌊nt⌋}(ω)|/⌊nt⌋)(⌊nt⌋/n) < (ε/T) t ≤ ε.

Using a similar bound on |S_{⌊nt⌋+1}(ω)|, we obtain

|N_n(t)| < ε.

Suppose now t is such that ⌊nt⌋ ≤ n_0(ω). Then

|N_n(t)| ≤ M(ω)/n < ε,

since, by our choice, n > M(ω)/ε. We conclude sup_{0 ≤ t ≤ T} |N_n(t)| < ε for all n > M(ω)/ε. This concludes the proof.
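The proof above can be mirrored in a quick simulation: for zero-mean increments (uniform on [−1, 1] here, our choice), sup_{t ≤ 1} |N_n(t)| shrinks as n grows. Since N_n is piecewise linear, the supremum is attained at the grid points S_k/n:

```python
import random

def fslln_sup(n, seed=6):
    """sup_{0<=t<=1} |N_n(t)| for N_n(t) = S_{floor(nt)}/n (linearly
    interpolated), with zero-mean X_i uniform on [-1, 1].  Since N_n is
    piecewise linear, the supremum is attained at grid points."""
    rng = random.Random(seed)
    s, m = 0.0, 0.0
    for _ in range(n):
        s += rng.uniform(-1.0, 1.0)
        m = max(m, abs(s))
    return m / n
```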
3 Wiener measure
Define

N_n(t, ω) = S_{⌊nt⌋}(ω)/√n + (nt − ⌊nt⌋) X_{⌊nt⌋+1}(ω)/√n,  n ≥ 1, t ∈ [0, T].     (2)

This is again a piece-wise linear continuous function. Then for each n we obtain a mapping

φ_n : R^∞ → C[0, T].

Of course, for each n, the mapping φ_n depends only on the first ⌊nT⌋ + 1 coordinates of samples in R^∞.
Lemma 1. Each mapping φ_n is measurable.

Proof. Here is where it helps to know that the Kolmogorov σ-field is identical to the Borel σ-field on C[0, T], that is, Theorem 1.4 from the previous lecture. Indeed, now it suffices to show that φ_n^{-1}(A) is measurable for each set A of the form {x ∈ C[0, T] : x(t) ≤ y}. We have

φ_n^{-1}(A) = {ω : N_n(t, ω) ≤ y} = {ω : S_m(ω)/√n + (nt − m) X_{m+1}(ω)/√n ≤ y},

where m = ⌊nt⌋. This defines a measurable subset of R^{m+1} × R^∞. One way to see this is to observe that the function f : R^{m+1} → R defined by f(x_1, ..., x_{m+1}) = Σ_{1 ≤ k ≤ m} x_k/√n + (nt − m) x_{m+1}/√n is continuous and therefore measurable. We conclude that φ_n is measurable for each n.
Note that this is indeed a very subtle result. We could try to use the submartingale inequality, since S_k^2 is a submartingale. It will give

P(max_{k ≤ n} |S_k| ≥ δ√n) ≤ E[S_n^2]/(δ^2 n) = σ^2/δ^2,

which does not decay in n. Instead, one uses the bound

P(max_{k ≤ n} |S_k| ≥ δ√n) ≤ P(|S_n| ≥ (1/3)δ√n) + 3 max_{k ≤ n} P(|S_k| ≥ (1/3)δ√n).
Fix ε > 0. Let Φ denote the cumulative standard normal distribution function. Find δ_0 large enough so that 2(1 − Φ(δ/3)) < ε/3 for all δ ≥ δ_0. Fix any such δ. By the CLT we can find n_0 = n_0(δ) large enough so that, for every k ≤ n_0,

P(|S_k| ≥ (1/3)δ√n) ≤ E[S_k^2]/((1/9)δ^2 n) = σ^2 k/((1/9)δ^2 n) ≤ ε/3.

We conclude that for all n ≥ 27n_0/ε, the bound (4) holds for

max_{i ≤ j ≤ ⌊nT⌋ : j − i ≤ δn} |Σ_{i ≤ k ≤ j} X_k|.
Exercise 1. Use this to finish the proof of the lemma. Hint: partition the interval [0, T] into length-δ intervals and use Lemma 2.
Consider

N_n(t_1) = S_{⌊nt_1⌋}(ω)/√n + (nt_1 − ⌊nt_1⌋) X_{⌊nt_1⌋+1}(ω)/√n.

The second term in the sum converges to zero in probability. The first term we rewrite as

(S_{⌊nt_1⌋}(ω)/√⌊nt_1⌋) · (√⌊nt_1⌋/√n),

and by the CLT it converges to a normal N(0, t_1) distribution. Similarly, consider

N_n(t_2) − N_n(t_1) = Σ_{nt_1 < m ≤ nt_2} X_m(ω)/√n + (nt_2 − ⌊nt_2⌋) X_{⌊nt_2⌋+1}(ω)/√n − (nt_1 − ⌊nt_1⌋) X_{⌊nt_1⌋+1}(ω)/√n.

Again by the CLT we see that it converges to a normal N(0, t_2 − t_1) distribution. Moreover, the joint distribution of (N_n(t_1), N_n(t_2) − N_n(t_1)) converges to the joint distribution of two independent normals with zero mean and variances t_1, t_2 − t_1. Namely, (N_n(t_1), N_n(t_2)) converges in distribution to the distribution of (B(t_1), B(t_2)).
4 Applications
Theorem 4 has applications beyond the existence of Wiener measure. Here is
one of them.
Theorem 5. The following convergence holds:

max_{1 ≤ k ≤ n} S_k/√n ⇒ sup_{0 ≤ t ≤ 1} B(t),     (5)

and, in particular,

P(max_{1 ≤ k ≤ n} S_k/√n ≥ y) → 2(1 − Φ(y)).     (6)
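The limit (6) can be compared against a simple ±1 random walk by Monte Carlo (walk type, sample sizes and the threshold y = 1 are our choices for illustration):

```python
import random
import math

def max_walk_tail(y=1.0, n=400, paths=5000, seed=7):
    """Monte Carlo estimate of P(max_{1<=k<=n} S_k / sqrt(n) >= y) for a
    simple +-1 random walk, to be compared with 2(1 - Phi(y))."""
    rng = random.Random(seed)
    hits = 0
    threshold = y * math.sqrt(n)
    for _ in range(paths):
        s = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            if s >= threshold:
                hits += 1
                break
    return hits / paths

def limit_tail(y):
    """The limiting value 2(1 - Phi(y))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(y / math.sqrt(2.0))))
```

For n = 400 the estimate already agrees with 2(1 − Φ(1)) ≈ 0.317 up to a small lattice correction and sampling noise.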
Fall 2013
12/09/2013
Content.

1. G/G/1 queueing system
2. One dimensional reflection mapping
3. Reflected Brownian Motion (RBM)
Technical preliminaries

Denote by B(t) the cumulative busy time of the server:

B(t) = ∫_0^t 1{Q(s) > 0} ds.

The jobs which have not been processed yet form a queue. Denote by Q(t) the length of the queue at time t. Then we naturally obtain

Q(t) = Q(0) + A(t) − D(t) = Q(0) + A(t) − S(B(t)).     (1)

The workload satisfies

Z(t) = V(Q(0) + A(t)) − B(t),     (2)

and the cumulative idling time is

I(t) = t − B(t) = ∫_0^t 1{Q(s) = 0} ds.     (3)
The main question we ask for the G/G/1 system is to characterize the behavior of the queue length Q(t) and the workload Z(t) as a process or in steady state (t = ∞), when a steady state exists, assuming that we know the distributions of the interarrival and service times. Ideally, we would like to compute the distribution of Q(t), Z(t). But this is, with some exceptions, infeasible. Thus the focus will be on obtaining approximations. As we will see, we can get good approximations when the system is in the so-called heavy traffic regime. This is the regime when the traffic intensity approaches one. Then the typically observed queue length will be very large and, as we will see, can be approximated by a certain reflected Brownian motion, for which both the time dependent and the steady-state distributions are known.
3

Introduce

X(t) = V(Q(0) + A(t)) − t.     (4)

Then X(t) represents the total work that arrived into the system up to time t, minus t. Then we can rewrite (2) as

Z(t) = X(t) + I(t) ≥ 0.     (5)

The idling time is non-decreasing:

dI(t)/dt ≥ 0,     (6)

whenever the derivative is defined. Moreover, since the cumulative idling time can increase only when there are no jobs in the system, then

∫_0^∞ Z(t) dI(t) = 0.     (7)

There is a good reason to write equations (5),(6),(7): these equations turn out to define Z, I uniquely from X, and we now establish this fact in a more general setting. We say that x ∈ D[0, ∞) has no downward jumps if at every point of discontinuity t_0 we have lim_{t↑t_0} x(t) ≤ x(t_0).
Problem 2. Show that X has no downward jumps.
Theorem 1 (Reflection (Skorohod) Mapping Theorem). Given x ∈ D[0, ∞) with x(0) ≥ 0 and no downward jumps, there exists a unique pair y ∈ C[0, ∞), z ∈ D[0, ∞) such that z = x + y ≥ 0, y is non-decreasing with y(0) = 0, y increases only when z = 0, and explicitly

y(t) = sup_{0 ≤ s ≤ t} (−x(s))^+,     (8)
z(t) = x(t) + sup_{0 ≤ s ≤ t} (−x(s))^+.     (9)
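The explicit forms (8),(9) translate directly into a one-pass algorithm on a discretized path (a sketch; the function name is ours):

```python
def reflect(x):
    """One-dimensional Skorohod reflection of a discretized path x with
    x[0] >= 0: returns (y, z) with y(t) = sup_{s<=t} max(-x(s), 0) and
    z = x + y, so that z >= 0 always."""
    y, z = [], []
    running = 0.0
    for v in x:
        running = max(running, -v)  # sup_{s<=t} (-x(s))^+, since running starts at 0
        y.append(running)
        z.append(v + running)
    return y, z
```

Note that the running supremum increases only while z = 0, the discrete analogue of ∫ z dy = 0.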
For uniqueness, suppose (y, z) and (y', z') are two such pairs. Then

(1/2)(z(t) − z'(t))^2 = ∫_0^t (z(s) − z'(s)) (d/ds)(z(s) − z'(s)) ds
= ∫_0^t (z(s) − z'(s)) ((d/ds) y(s) − (d/ds) y'(s)) ds.

But, by assumption, ∫ z dy = ∫ z' dy' = 0, and we obtain the expression

−∫_0^t (z(s) (d/ds) y'(s) + z'(s) (d/ds) y(s)) ds ≤ 0.

We conclude z(t) = z'(t). It is then immediate that y(t) = y'(t).
It remains to show that the mappings y = Ψ(x) and z = Φ(x) are Lipschitz continuous. That is, we need to show that for some constants C_1, C_2 > 0 and every pair x, x' ∈ D[0, T], we have

‖Ψ(x) − Ψ(x')‖_T ≤ C_1 ‖x − x'‖_T,  ‖Φ(x) − Φ(x')‖_T ≤ C_2 ‖x − x'‖_T.

Denote ‖x − x'‖_T by V ≥ 0. For any t ∈ [0, T] we have

Ψ(x)(t) − Ψ(x')(t) = sup_{0 ≤ s ≤ t} (−x(s))^+ − sup_{0 ≤ s ≤ t} (−x'(s))^+
≤ sup_{0 ≤ s ≤ t} ((−x'(s))^+ + V) − sup_{0 ≤ s ≤ t} (−x'(s))^+
= V.

Similarly, we show Ψ(x')(t) − Ψ(x)(t) ≤ V. This proves Lipschitz continuity of Ψ with constant C_1 = 1. The Lipschitz continuity of Φ then follows immediately with constant C_2 = 2.
The reflection mapping also satisfies the following important memoryless property: if we consider the process starting from some time t_0, its reflection is the same as if we started at time 0:

Proposition 1. Given t_0 > 0, x ∈ D[0, ∞) and the reflection y = Ψ(x), z = Φ(x) of x, consider the modified process x̃(t) = z(t_0) + x(t_0 + t) − x(t_0), t ≥ 0. Then Φ(x̃)(t) = z(t_0 + t) and Ψ(x̃)(t) = y(t_0 + t) − y(t_0), t ≥ 0.

Problem 3. Establish this proposition.
We will see later on that when the G/G/1 queueing system is in heavy traffic, the process X(t) defined in (4) is approximated well by a Brownian motion. For now assume that it is in fact a Brownian motion. Note that X(t) is a process whose distribution we know in principle, since it is directly linked to the arrival and service processes. Thus if we can find the reflection of X under the Skorohod mapping, we obtain an approximation of the workload process Z(t).

Definition 1. A (one-dimensional) Reflected Brownian Motion (RBM) is the process Z = Φ(B) obtained by the Skorohod mapping (Ψ, Φ) when the input process is a Brownian motion B(t), B(0) ≥ 0. When B has drift μ and variance σ^2, we also write Z = RBM(μ, σ^2).

Knowing the explicit forms (8),(9) of the reflection mapping allows us to say something about the distribution of the reflected process Z when the input process is a Brownian motion.

Theorem 2. A Reflected Brownian Motion Z(t) = RBM(μ, σ^2) converges in distribution to some limiting random variable Z(∞) iff μ < 0. In this case Z(∞) is exponentially distributed with parameter 2|μ|/σ^2.
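Theorem 2 can be checked by simulating a discretized RBM with the Euler step z ← max(z + μΔt + σΔB, 0) and comparing the long-run average with the stationary mean σ^2/(2|μ|) of the exponential limit law (the scheme and parameters are our choices; the discretization introduces a bias of order √Δt):

```python
import random
import math

def rbm_mean(mu=-1.0, sigma=1.0, T=500.0, dt=0.001, seed=8):
    """Long-run average of a discretized RBM(mu, sigma^2), to compare
    with the stationary mean sigma^2 / (2|mu|) of the exponential
    limit law with parameter 2|mu| / sigma^2 (valid for mu < 0)."""
    rng = random.Random(seed)
    n = int(T / dt)
    z, total = 0.0, 0.0
    sd = sigma * math.sqrt(dt)
    for _ in range(n):
        z = max(z + mu * dt + rng.gauss(0.0, sd), 0.0)  # Euler step + reflection
        total += z
    return total / n
```

For μ = −1, σ = 1 the stationary mean is 0.5, and the time average lands close to it.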
It should not be surprising that we get a limiting exponential distribution when the drift is negative and no limiting distribution when the drift is non-negative. After all, we have established these facts for the maximum of a Brownian motion M(t) = sup_{0 ≤ s ≤ t} B(s). We just make the appropriate adjustments.

Proof. First assume B(0) = 0. Observe that in this case sup_{0 ≤ s ≤ t} (−B(s))^+ = sup_{0 ≤ s ≤ t} (−B(s)). Then

P(Z(t) ≤ z) = P(B(t) + sup_{0 ≤ s ≤ t} (−B(s)) ≤ z)
= P(sup_{0 ≤ s ≤ t} (B(t) − B(s)) ≤ z)
= P(sup_{0 ≤ s ≤ t} B̃(s) ≤ z)
= P(sup_{0 ≤ s ≤ t} B(s) ≤ z),

where B̃(s) = B(t) − B(t − s) is again a Brownian motion with the same drift and variance.
References
[1] H. Chen and D. Yao, Fundamentals of queueing networks: Performance,
asymptotics and optimization, Springer-Verlag, 2001.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Fall 2013
due 9/16/2009
Problem 2. Given two metric spaces (S_1, ρ_1), (S_2, ρ_2), show that a function f : S_1 → S_2 is continuous if and only if for every open set O ⊆ S_2, f^{-1}(O) is an open subset of S_1.
Problem 5. Establish the following fact (which we have used in proving the
upper bound part of Cramér's theorem for general closed sets F): given two
strictly positive sequences xn, yn > 0, show that if lim sup_n (1/n) log xn ≤ I
and lim sup_n (1/n) log yn ≤ I, then lim sup_n (1/n) log(xn + yn) ≤ I.
Problem 6. Suppose M(θ) < ∞ for all θ. Show that I(x) is a strictly convex
function.
Hint. Give a direct proof of convexity of I and see where the inequality may
turn into equality. You may use the following fact, which we have established in
the class: for every x there exists θ0 such that x = M′(θ0)/M(θ0).
Fall 2013
due 9/30/2013
Problem 1. The following identity was used in the proof of Theorem 1 in Lecture 4: sup{θ > 0 : M(θ) < exp(Cθ)} = inf_{t>0} t I(C + 1/t) (see the proof for
details). Establish this identity.
Hint: Establish the convexity of log M(θ) in the region where M(θ) is finite.
Letting θ* = sup{θ > 0 : M(θ) < exp(Cθ)}, use the convexity property above
to argue that log M(θ) ≥ Cθ* + (d log M(θ)/dθ)|_{θ=θ*} (θ − θ*). Use this property to finish the proof.
Problem 2. This problem concerns the rate of convergence to the limits for the
large deviations bounds. Namely, how quickly does n⁻¹ log P(n⁻¹Sn ∈ A)
converge to −inf_{x∈A} I(x), where Sn is the sum of n i.i.d. random variables?
Of course the question is relevant only to the cases when this convergence takes
place.
(a) Let Sn be the sum of n i.i.d. random variables Xi, 1 ≤ i ≤ n, taking values
in R. Suppose the moment generating function M(θ) = E[exp(θX)] is
finite everywhere. Let a ≥ E[X]. Recall that we have established
in class that in this case the convergence lim_n n⁻¹ log P(n⁻¹Sn ≥ a) =
−I(a) takes place. Show that in fact there exists a constant C such that
|n⁻¹ log P(n⁻¹Sn ≥ a) + I(a)| ≤ C/n,
for all sufficiently large n. Namely, the rate of convergence is at least as
fast as O(1/n). Hint: examine the proof of Cramér's theorem.
(b) Show that the rate O(1/n) cannot be improved.
Problem 3. Let Xi, i ≥ 1 be i.i.d. standard normal random variables and let
Yi, i ≥ 1 be i.i.d. uniform on [−1, 1], independent of the Xi. Show that the limit
lim_n (1/n) log P( (n⁻¹ Σ_{1≤i≤n} Xi)² + (n⁻¹ Σ_{1≤i≤n} Yi)² ≥ 1 )
exists and compute it numerically. You may use MATLAB (or any other software
of your choice) and an approximate numeric answer is acceptable.
Fall 2013
due 10/23/2013
Problem 1. Let B be a standard Brownian motion. Show that P(lim sup_{t→∞} B(t) =
∞) = 1.
Problem 2. (a) Consider the following sequence of partitions Πn, n = 1, 2, . . .
of [0, T] given by ti = iT/n, 0 ≤ i ≤ n. Prove that the quadratic variation of a standard
Brownian motion almost surely converges to T: lim_n Q(Πn, B) = T a.s., even
though Σ_n Δ(Πn) = Σ_n T/n = ∞.
(b) Suppose now the partition is generated by drawing n independent random
values tk = Uk, 1 ≤ k ≤ n, uniformly from [0, T] and independently
of the Brownian motion. Prove that lim_n Q(Πn, B) = T a.s. Note that almost
sure convergence here is with respect to the product probability space of both the
Brownian motion and the uniform sampling.
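As a numeric illustration of part (a) (added here; not part of the pset), the sum of squared increments over the uniform partition concentrates sharply around T:

```python
import random

def quadratic_variation(T=2.0, n=200000, rng=random.Random(0)):
    # Q(Pi_n, B) = sum_i (B(t_{i+1}) - B(t_i))^2 with t_i = iT/n.
    # The increments are i.i.d. N(0, T/n), so E[Q] = T and Var[Q] = 2T^2/n.
    dt = T / n
    return sum(rng.gauss(0.0, dt ** 0.5) ** 2 for _ in range(n))

print(quadratic_variation())  # concentrates near T = 2.0
```

The standard deviation of Q here is sqrt(2T²/n) ≈ 0.006, so a single run already lands very close to T.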
Problem 3. Suppose a random variable X is independent from a σ-field G ⊂ F. Namely, for every
measurable A ⊂ R and B ∈ G, P({X ∈ A} ∩ B) = P(X ∈ A)P(B). Prove that
E[X|G] = E[X].
Problem 4. Consider an asymmetric simple random walk Q(t) on Z given by
P(Q(t + 1) = x + 1|Q(t) = x) = p and P(Q(t + 1) = x − 1|Q(t) = x) = 1 − p
for some 0 < p < 1.
1. Construct a function of the state φ(x), x ∈ Z, such that φ(Q(t)) is a martingale.
2. Suppose Q(0) = z > 0 and p > 1/2. Compute the probability that the
random walk never hits 0 in terms of z, p.
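For intuition (a hint-level sketch added here, not the official solution): the classical choice in part 1 is the exponential martingale φ(x) = ((1 − p)/p)^x, and optional stopping then yields P(the walk ever hits 0 | Q(0) = z) = ((1 − p)/p)^z when p > 1/2, so the walk never hits 0 with probability 1 − ((1 − p)/p)^z. A simulation consistent with this formula:

```python
import random

def never_hits_zero_estimate(z=3, p=0.7, n_runs=20000, rng=random.Random(0)):
    # Estimate P(walk started at z never hits 0). Since p > 1/2 the walk
    # drifts up; once it reaches 60 we declare survival, because the residual
    # hitting probability ((1-p)/p)^60 is negligible.
    survived = 0
    for _ in range(n_runs):
        x = z
        while 0 < x < 60:
            x += 1 if rng.random() < p else -1
        if x >= 60:
            survived += 1
    return survived / n_runs

print(never_hits_zero_estimate())   # simulation estimate
print(1 - (0.3 / 0.7) ** 3)         # formula 1 - ((1-p)/p)^z, about 0.9213
```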
Problem 5. On a probability space (Ω, F, P) consider a sequence of random
variables X1, X2, . . . , Xn and σ-fields F1, . . . , Fn ⊂ F such that E[Xj|F_{j−1}] =
X_{j−1} and E[X_j²] < ∞.
1. Prove directly (without using Jensen's inequality) that E[X_j²] ≥ E[X_{j−1}²]
for all j = 2, . . . , n. Hint: consider (Xj − X_{j−1})².
Problem 6. The purpose of this exercise is to extend some of the stopping time
theory to processes which are (semi)-continuous. Suppose Xt is a continuous-time submartingale adapted to Ft, t ∈ R+, and T is a stopping time taking values
in R+ ∪ {∞}. Suppose additionally that Xt is a.s. a right-continuous function
with left limits (RCLL).
(a) Suppose there exists a countably infinite strictly increasing sequence tn ∈
R+, n ≥ 0, such that P(T ∈ {tn, n ≥ 0} ∪ {∞}) = 1. Emulate the proof
for discrete time processes to show that X_{t∧T}, t ∈ R+, is a submartingale.
(b) Given a general stopping time T taking values in R+ ∪ {∞}, consider
a sequence of r.v. Tn defined by Tn(ω) = k/2ⁿ, k = 1, 2, . . ., if T(ω) ∈
[(k − 1)/2ⁿ, k/2ⁿ), and Tn(ω) = ∞ if T(ω) = ∞. Establish that Tn is a stopping
time for every n.
(c) Suppose the submartingale Xt is in L², namely E[X_t²] < ∞ for all t. Show
that X_{T∧t} is a submartingale as well.
Hint: Use part (b), the Doob-Kolmogorov inequality and the Dominated Convergence Theorem.
Fall 2013
due 11/13/2013
As we have discussed in the lecture, it is known that the limit lim_n E[Ln]/√n exists.
Show that in fact it must be the case that lim_n mn/√n exists as well and equals
the same limit; namely, the median mn of Ln grows at the same rate as the mean.
Hint: use a concentration inequality with bounds of the form exp(−t²/(Cn)).
(a) Define ∫_0^t Xs dMs for simple processes. Show that the resulting process
is a martingale and establish the Itô isometry for it.
(b) Given an arbitrary X ∈ L², show that if Xⁿ is a sequence of simple processes such that lim_n E[∫_0^t (X^n_s − Xs)² d⟨M⟩_s] = 0, then the sequence
∫_0^t X^n_s dMs is Cauchy for every t in the L² sense. Use this to define
∫_0^t Xs dMs for any process X ∈ L² and establish the Itô isometry for the Itô
integral. Here the integration d⟨M⟩_s is understood in the Stieltjes sense.
You do not need to prove existence of processes X^n_t satisfying the requirement above (unless you would like to).
Fall 2013
Problem 1
Let fn(t) = (t/T)ⁿ for t ∈ [0, T]. Then we have that fn ∈ C[0, T] for n =
1, 2, .... Let K = {fn, n = 1, 2, ...}. In order to prove K is closed, it suffices
to prove C[0, T]\K is open. For any f ∈ C[0, T]\K, assume inf_n ‖f − fn‖ = 0.
Then there exists a subsequence {fni, i = 1, 2, ...} such that ‖f − fni‖ → 0 as
i → ∞. Since fn(t) → 0 for every 0 ≤ t < T while fn(T) = 1, the uniform limit
must satisfy
f(t) = 0, if 0 ≤ t < T,
f(t) = 1, if t = T,
which is not a continuous function, contradicting f ∈ C[0, T]. (Concretely, at
tn = T(1 − 1/(1 + n)) we have fn(tn) = (1 − 1/(1 + n))ⁿ ≥ 1/3 for every n,
while the limit above vanishes there, so ‖f − fni‖ ≥ 1/3 along the subsequence,
again a contradiction.) Therefore r := inf_n ‖f − fn‖ > 0, and the open ball
B°(f, r) is contained in C[0, T]\K. Hence C[0, T]\K is open and K is closed.
Problem 2
Problem 3
Proof. Suppose f1, f2, ... is a Cauchy sequence in C[0, T] with the uniform metric ‖x − y‖. For any ε > 0, there exists an N > 0 such that for any n1, n2 > N,
we have
ε > ‖fn1 − fn2‖ = sup_{t∈[0,T]} |fn1(t) − fn2(t)|.
In particular, for every fixed t the real sequence fn(t) is Cauchy and hence converges to some limit f(t). Letting n2 → ∞ above gives ‖fn1 − f‖ ≤ ε for all
n1 > N, so fn → f uniformly; as a uniform limit of continuous functions,
f ∈ C[0, T]. Thus C[0, T] is complete.
Problem 4
Proof. Part a.
M(0) = E[e^{0·X}] = 1. If M(θ) < ∞ for some θ > 0, then for any θ′ ∈ (0, θ], we
have
M(θ′) = E[e^{θ′X}] = ∫ e^{θ′x} dP(x) ≤ ∫_{x≥0} e^{θx} dP(x) + 1 ≤ M(θ) + 1 < ∞.
Likewise, if M(θ) < ∞ for some θ < 0, then for any θ′ ∈ [θ, 0), we have
M(θ′) = E[e^{θ′X}] = ∫ e^{θ′x} dP(x) ≤ ∫_{x≤0} e^{θx} dP(x) + 1 ≤ M(θ) + 1 < ∞.
Part b.
Suppose X has the Cauchy distribution, i.e. its density function is
fX(x) = 1/(π(1 + x²)).
Then M(θ) = E[exp(θX)] = ∞ for every θ ≠ 0, so the domain of finiteness can
be the single point {0}. For an example in which the domain is a nontrivial
bounded interval, consider the density
fX(x) = A exp(−|x| − √|x|),
where A ≈ 1.10045 is a normalizing constant. For θ ∈ [−1, 1] it is readily
verified that
M(θ) = E[exp(θX)] = A ∫_0^∞ exp(−(1+θ)x − √x) dx + A ∫_0^∞ exp(−(1−θ)x − √x) dx < ∞,
while M(θ) = ∞ for |θ| > 1.
(For instance, if X is a Bernoulli(1/2) random variable, M(θ) = (1 + exp(θ))/2,
and for x > 1,
θx − log((1 + exp(θ))/2) ≥ θx − log(exp(θ)) = θ(x − 1) → ∞ as θ → ∞,
so I(x) = ∞ outside the support.)
Problem 5
Proof. Consider two strictly positive sequences xn > 0 and yn > 0. Since
lim sup_n (log xn)/n ≤ I and lim sup_n (log yn)/n ≤ I, for any ε1 > 0 there exists
an N such that for any n > N, we have
sup_{m≥n} (log xm)/m ≤ I + ε1 and sup_{m≥n} (log ym)/m ≤ I + ε1.
Thus, for any m ≥ n,
max{(log xm)/m, (log ym)/m} ≤ I + ε1,
and therefore
sup_{m≥n} max{(log xm)/m, (log ym)/m} ≤ I + ε1.
Since xm + ym ≤ 2 max{xm, ym},
(log(xm + ym))/m ≤ (log 2)/m + max{(log xm)/m, (log ym)/m},
and hence for any ε2 > 0 and all n large enough,
sup_{m≥n} (log(xm + ym))/m ≤ (log 2)/n + sup_{m≥n} max{(log xm)/m, (log ym)/m} ≤ I + ε1 + ε2.
Letting n → ∞ and then ε1, ε2 → 0, we conclude that lim sup_n (log(xn + yn))/n ≤ I.
Problem 6
Proof. For any x1, x2 and λ ∈ [0, 1], let x = λx1 + (1 − λ)x2, and observe
I(x) = I(λx1 + (1 − λ)x2) = sup_θ ((λx1 + (1 − λ)x2)θ − log M(θ)) ≤ λI(x1) + (1 − λ)I(x2).   (1)
If I(x) is not strictly convex, there exist x1 ≠ x2 and λ ∈ (0, 1) such that (1)
holds with equality, i.e.
sup_θ {λ(θx1 − log M(θ)) + (1 − λ)(θx2 − log M(θ))}
= λ sup_θ (θx1 − log M(θ)) + (1 − λ) sup_θ (θx2 − log M(θ)).
However, we know that for every x ∈ R there exists θ0 ∈ R such that I(x) =
θ0x − log M(θ0). Moreover, θ0 satisfies
x = M′(θ0)/M(θ0).
Let θ0 ∈ R be such that
M′(θ0)/M(θ0) = λx1 + (1 − λ)x2,
so that θ0 achieves the supremum on the left-hand side. Clearly, if either θ0x1 − log M(θ0) < sup_θ (θx1 − log M(θ)) or θ0x2 − log M(θ0) <
sup_θ (θx2 − log M(θ)), then the equality does not hold. Therefore θ0 also
achieves the maximum for both x1 and x2. By first order conditions, we also
obtain
M′(θ0)/M(θ0) = x1 and M′(θ0)/M(θ0) = x2,
which implies that x1 = x2 and thus gives a contradiction.
Fall 2013
Problem 1
Proof. Lecture note 4 has shown that the set {θ > 0 : M(θ) < exp(Cθ)} is
nonempty. Let
θ* := sup{θ > 0 : M(θ) < exp(Cθ)}.
If θ* = ∞, which implies that M(θ) < exp(Cθ) holds for all θ > 0, we have
inf_{t>0} t I(C + 1/t) = inf_{t>0} sup_{θ∈R} {t(θC − log M(θ)) + θ} = ∞ = θ*,
since for every t > 0 the expression in braces is at least θ. Suppose now θ* < ∞.
By continuity, log M(θ*) = Cθ*. Taking θ = θ* inside the supremum gives, for
every t > 0,
t I(C + 1/t) = sup_θ {t(θC − log M(θ)) + θ} ≥ t(θ*C − log M(θ*)) + θ* = θ*.   (1)
For the reverse inequality, log M(θ) is convex in the region where M(θ) is finite,
hence
log M(θ) ≥ log M(θ*) + (d log M(θ)/dθ)|_{θ=θ*} (θ − θ*) = Cθ* + (M′(θ*)/M(θ*))(θ − θ*),
and therefore
inf_{t>0} sup_θ {t(θ(C + 1/t) − log M(θ))} ≤ inf_{t>0} sup_θ {t(C − M′(θ*)/M(θ*) + 1/t)(θ − θ*)} + θ*.   (2)
Then we will establish the fact that M′(θ*)/M(θ*) ≥ C. Suppose not; then by
convexity, for sufficiently small h > 0,
(log M(θ*) − log M(θ* − h))/h ≤ (d log M(θ)/dθ)|_{θ=θ*} < C,
which implies that
log M(θ* − h) > log M(θ*) − Ch = C(θ* − h), i.e. M(θ* − h) > exp(C(θ* − h)),
which contradicts the definition of θ*. By the facts that
sup_θ t(C − M′(θ*)/M(θ*) + 1/t)(θ − θ*) ≥ 0 (take θ = θ*)
and M′(θ*)/M(θ*) ≥ C, choosing t with 1/t = M′(θ*)/M(θ*) − C makes the
inner expression vanish identically, so that
inf_{t>0} sup_θ t(C − M′(θ*)/M(θ*) + 1/t)(θ − θ*) = 0.   (3)
From (1), (2) and (3), we have the result inf_{t>0} t I(C + 1/t) = θ*.
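As a concrete sanity check of the identity (added here; the Gaussian example is ours, not part of the solution): for a standard normal X, M(θ) = exp(θ²/2) and I(x) = x²/2, so θ* = sup{θ > 0 : θ²/2 < Cθ} = 2C, while inf_{t>0} t·I(C + 1/t) = inf_{t>0} t(C + 1/t)²/2 is attained at t = 1/C with value 2C:

```python
# Numeric check of sup{theta>0 : M(theta) < exp(C*theta)} = inf_{t>0} t*I(C + 1/t)
# for X ~ N(0,1): M(theta) = exp(theta^2/2) and I(x) = x^2/2.
C = 1.5

# Left-hand side: theta^2/2 < C*theta holds exactly for 0 < theta < 2C.
theta_star = 2 * C

# Right-hand side: crude grid search over t > 0.
ts = [k / 1000.0 for k in range(1, 100001)]
rhs = min(t * (C + 1.0 / t) ** 2 / 2 for t in ts)
print(theta_star, rhs)  # both equal 2C = 3.0 (rhs up to grid error)
```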
Problem 2
(a). Let θ0 > 0 achieve I(a) = θ0a − log M(θ0), and let Yi denote i.i.d. samples
from the corresponding exponentially tilted distribution, whose mean is a. The
change of measure step in the proof of Cramér's theorem gives, for every δ > 0,
P(n⁻¹Sn − a ∈ [0, δ)) ≥ exp(−nI(a) − θ0nδ) P( Σ_{i=1}^n (Yi − a) ∈ [0, nδ) ).
By the CLT, setting δ = O(n^{−1/2}) gives
P( Σ_{i=1}^n (Yi − a) ∈ [0, nδ) ) = O(1).
Thus, we have
n⁻¹ log P(n⁻¹Sn ≥ a) + I(a) ≥ n⁻¹ log P(n⁻¹Sn − a ∈ [0, δ)) + I(a) = −O(n^{−1/2}).
Combining with the result from the upper bound, n⁻¹ log P(n⁻¹Sn ≥ a) ≤ −I(a),
we obtain the claimed bound on |n⁻¹ log P(n⁻¹Sn ≥ a) + I(a)| for all sufficiently
large n.
(b). Take a = E[X]. It is obvious that P(n⁻¹Sn ≥ a) → 1/2 as n → ∞. Recalling
that I(a) = 0, we have
n⁻¹ log P(n⁻¹Sn ≥ a) + I(a) = n⁻¹ log P(n⁻¹Sn ≥ a) ~ −(log 2)/n,
so the discrepancy is of order exactly 1/n and the rate O(1/n) cannot be improved.
Problem 3
Denote
Mn = ( n⁻¹ Σ_{i≤n} Xi , n⁻¹ Σ_{i≤n} Yi ).
Let B0(1) be the open ball of radius one in R². From these definitions, we can
rewrite
P( (n⁻¹ Σ_{i=1}^n Xi)² + (n⁻¹ Σ_{i=1}^n Yi)² ≥ 1 ) = P(Mn ∉ B0(1)),
and by the two-dimensional large deviations principle,
lim_n n⁻¹ log P(Mn ∉ B0(1)) = −inf_{(x,y)∈B0(1)ᶜ} I(x, y),
where
I(x, y) = sup_{(θ1,θ2)∈R²} (θ1x + θ2y − log M(θ1, θ2)),
with
M(θ1, θ2) = E[exp(θ1X + θ2Y)].
Note that since (X, Y) are presumed independent, log M(θ1, θ2) = log MX(θ1) +
log MY(θ2), with MX(θ1) = E[exp(θ1X)] and MY(θ2) = E[exp(θ2Y)].
We can easily compute that
MX(θ) = exp(θ²/2)
and
MY(θ) = E[e^{θY}] = (1/2) ∫_{−1}^{1} e^{θy} dy = (e^θ − e^{−θ})/(2θ).
Since (x, y) are decoupled in the definition of I(x, y), we obtain I(x, y) =
IX(x) + IY(y) with
IX(x) = sup_{θ1} (θ1x − θ1²/2) = x²/2,
IY(y) = sup_{θ2} (θ2y − log((e^{θ2} − e^{−θ2})/(2θ2))).
Since IX(x) is increasing in |x| and IY(y) is increasing in |y|, the infimum over
B0(1)ᶜ is attained on the circle x² + y² = 1, which can be reparametrized as a
one-dimensional search over an angle α. Optimizing over α, we find that the
minimum of I(x, y) is obtained at x = ±1, y = 0, and that the value is equal to
1/2. We obtain
lim_n n⁻¹ log P( (n⁻¹ Σ_{i=1}^n Xi)² + (n⁻¹ Σ_{i=1}^n Yi)² ≥ 1 ) = −1/2.
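The final one-dimensional optimization is easy to reproduce (a Python sketch standing in for the MATLAB computation; the grid-search helpers below are ours, not part of the original solution):

```python
import math

def log_MY(theta):
    # log E[exp(theta*Y)] for Y uniform on [-1, 1]: log(sinh(theta)/theta)
    if abs(theta) < 1e-8:
        return 0.0
    return math.log(math.sinh(theta) / theta)

THETAS = [k / 50.0 for k in range(-2000, 2001)]  # grid on [-40, 40]

def I_Y(y):
    # Legendre transform sup_theta (theta*y - log MY(theta)), by grid search
    return max(th * y - log_MY(th) for th in THETAS)

def I_X(x):
    # Legendre transform of theta^2/2 (standard normal X) is x^2/2
    return x * x / 2

# Minimize I_X(x) + I_Y(y) over the unit circle x = cos(a), y = sin(a).
alphas = [2 * math.pi * k / 400 for k in range(400)]
rate = min(I_X(math.cos(a)) + I_Y(math.sin(a)) for a in alphas)
print(rate)  # ≈ 0.5, attained at y = 0, x = ±1
```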
Problem 4
We denote by Yn the set of all length-n sequences which satisfy condition (a). The
first step of our method will be to construct a Markov chain with the following
properties:
For every n ≥ 0, any sequence (X1, ..., Xn) generated by the Markov
chain belongs to Yn.
For every n ≥ 0 and every (x1, ..., xn) ∈ Yn, (x1, ..., xn) has positive
probability, and all sequences of Yn are almost equally likely.
Consider a general Markov chain with two states (0, 1) and general transition
probabilities (P00, P01; P10, P11). We immediately realize that if P11 > 0, sequences with two consecutive ones don't have zero probability (in particular, for
n = 2, the sequence (1, 1) has probability π(1)P11). Therefore, we set P11 = 0
(and thus P10 = 1), and verify this enforces the first condition.
Let now P00 = p, P01 = 1 − p, and let's find p such that all sequences are
almost equiprobable. What is the probability of a sequence (X1, ..., Xn)?
Every 1 in the sequence (X1, ..., Xn) necessarily transited from a 0, with
probability (1 − p).
Zeroes in the sequence (X1, ..., Xn) can come either from another 0, in
which case they contribute a p to the joint probability of (X1, ..., Xn), or from
a 1, in which case they contribute a 1. Denote by N0 and N1 the numbers of 0
and 1 in the sequence (X1, ..., Xn). Since each 1 of the sequence transits to a
0 of the sequence, there are N1 zeroes which contribute a probability of 1, and
thus N0 − N1 zeroes contribute a probability of p. This is only almost correct,
though, since we have to account for the initial state X1 and the final state Xn.
By choosing for initial distribution π(0) = p and π(1) = 1 − p, the above
reasoning applies correctly to X1.
Our last problem is when the last state is 1, in which case that 1 does not
give a 1-to-0 transition, and the number of transitions contributing p is therefore
N0 − N1 + 1. In summary, under the assumptions given above, we have:
P(X1, ..., Xn) = (1 − p)^{N1} p^{N0−N1}, when Xn = 0,
P(X1, ..., Xn) = (1 − p)^{N1} p^{N0−N1+1}, when Xn = 1.
Using N0 + N1 = n, this can be rewritten as
P(X1, ..., Xn) = ((1 − p)/p²)^{N1} pⁿ, when Xn = 0,
P(X1, ..., Xn) = ((1 − p)/p²)^{N1} p^{n+1}, when Xn = 1.
We conclude that if (1 − p)/p² = 1, i.e. p² + p − 1 = 0, the dependence on
N1 disappears. This equation has positive solution p = (√5 − 1)/2 ≈ 0.6180, which we take in the rest
of the problem (trivia: 1/p = φ, the golden ratio). The steady state distribution
of the resulting Markov chain can be easily computed to be π = (π0, π1) =
(1/(2 − p), (1 − p)/(2 − p)) ≈ (0.7236, 0.2764). We also obtain the almost equiprobable condition:
P(X1, ..., Xn) = pⁿ, when Xn = 0,
P(X1, ..., Xn) = p^{n+1}, when Xn = 1.
We now relate this Markov chain to the problem at hand. Note the following: log |Zn| =
log(|Zn|/|Yn|) + log |Yn|, and therefore
lim_n (1/n) log |Zn| = lim_n (1/n) log |Yn| + lim_n (1/n) log(|Zn|/|Yn|).
Let us compute first lim_n (1/n) log |Yn|. This is easily done using our Markov
chain. Fix n ≥ 0, and observe that since our Markov chain only generates
sequences which belong to Yn, we have
1 = Σ_{(X1,...,Xn)∈Yn} P(X1, ..., Xn).
Note that for any (X1, ..., Xn) ∈ Yn, we have p^{n+1} ≤ P(X1, ..., Xn) ≤ pⁿ, and
so we obtain
p^{n+1} |Yn| ≤ 1 ≤ pⁿ |Yn|,
and so
(1/p)ⁿ ≤ |Yn| ≤ (1/p)^{n+1},
which gives lim_n (1/n) log |Yn| = log(1/p) = log φ.
Next we compute lim_n (1/n) log(|Zn|/|Yn|). We have
P(G(X1, ..., Xn) ≥ k) = Σ_{(X1,...,Xn)∈Zn} P(X1, ..., Xn),
and combining p^{n+1} ≤ P(X1, ..., Xn) ≤ pⁿ with (1/p)ⁿ ≤ |Yn| ≤ (1/p)^{n+1}
gives
p · |Zn|/|Yn| ≤ P(G(X1, ..., Xn) ≥ k) ≤ (1/p) · |Zn|/|Yn|.
The probability in the middle can be estimated by the large deviations principle
for finite state Markov chains, with tilted matrix
M(θ) = [ p exp(θ)   1 − p ; exp(θ)   0 ],
whose largest eigenvalue is
λ(θ) = ( p exp(θ) + sqrt(p² exp(2θ) + 4(1 − p) exp(θ)) ) / 2,
and the rate function of the MC is
I(x) = sup_θ (θx − log λ(θ)).
Since the mean of f under the steady state distribution π is above 0.7, the minimum min_{x≥0.7} I(x) = I(π0) = 0. Thus lim_n (1/n) log(|Zn|/|Yn|) = 0, and we
conclude
lim_n (1/n) log |Zn| = log φ ≈ 0.4812.
In general, for k ≤ π0, we will have
lim_n (1/n) log |Zn(k)| = log φ ≈ 0.4812.
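The counting part can be double-checked directly (an added illustration): the sets Yn of binary sequences with no two consecutive ones obey the Fibonacci recursion, and (1/n) log |Yn| indeed approaches log φ ≈ 0.4812:

```python
import math

def count_no_adjacent_ones(n):
    # |Y_n| = number of binary sequences of length n with no two consecutive
    # ones; end0/end1 track valid sequences ending in 0 / ending in 1.
    end0, end1 = 1, 1  # length-1 sequences: "0" and "1"
    for _ in range(n - 1):
        end0, end1 = end0 + end1, end0
    return end0 + end1

phi = (1 + math.sqrt(5)) / 2
n = 500
print(math.log(count_no_adjacent_ones(n)) / n, math.log(phi))  # both ≈ 0.4812
```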
5 Problem 5
5.1 1(i)
Consider a standard Brownian motion B, and let U be a uniform random variable over [1/2, 1]. Let
W(t) = B(t) when t ≠ U, and W(U) = 0.
With probability 1, B(U) is not zero, and therefore lim_{t→U} W(t) = lim_{t→U} B(t) =
B(U) ≠ 0 = W(U), so W is not continuous at U. For any finite collection of times t = (t1, ..., tn) and real numbers x = (x1, ..., xn), denote
W(t) = (W(t1), ..., W(tn)). Then
P(W(t) ≤ x) = P(U ∉ {ti, 1 ≤ i ≤ n}) P(W(t) ≤ x | U ∉ {ti, 1 ≤ i ≤ n})
+ P(U ∈ {ti, 1 ≤ i ≤ n}) P(W(t) ≤ x | U ∈ {ti, 1 ≤ i ≤ n}).
Note that P(U ∈ {ti, 1 ≤ i ≤ n}) = 0, and P(W(t) ≤ x | U ∉ {ti, 1 ≤ i ≤
n}) = P(B(t) ≤ x), and thus the process W has exactly the same distributional
properties as B (Gaussian process, independent and stationary increments with
zero mean and variance proportional to the size of the interval).
5.2 1(ii)
Let X be a Gaussian random variable (mean 0, standard deviation 1), and denote
by QX the set {q + x, q ∈ Q} ∩ R+, where Q is the set of rational numbers and
x is the realization of X. Let
W(t) = B(t), when t ∉ QX\{0},
W(t) = B(t) + 1, when t ∈ QX\{0}.
Through the exact same argument as in 1(i), W has the same distributional properties
as B (this is because QX, just like {ti, 1 ≤ i ≤ n}, has measure zero for a
random variable with density).
However, note that for any t > 0,
| t − ( x + ⌊(t − x)10ⁿ⌋ / 10ⁿ ) | ≤ 10⁻ⁿ,
proving that lim_n ( x + ⌊(t − x)10ⁿ⌋ / 10ⁿ ) = t. However, for any n, x +
⌊(t − x)10ⁿ⌋ / 10ⁿ ∈ QX, and so
lim_n W( x + ⌊(t − x)10ⁿ⌋ / 10ⁿ ) = B(t) + 1 ≠ B(t).
This proves W(t) is surely discontinuous everywhere.
5.3
Let t ≥ 0, and consider the event En = {|B(t + 1/n) − B(t)| > ε}. Then, since
B(t + 1/n) − B(t) is equal in distribution to n^{−1/2}N, where N is a standard normal,
by Chebyshev's inequality we have
P(En) = P(n^{−1/2}|N| > ε) = P(|N| > εn^{1/2}) = P(N⁴ > ε⁴n²) ≤ 3/(ε⁴n²),
using E[N⁴] = 3. Since Σ_n 3/(ε⁴n²) < ∞, the Borel-Cantelli lemma implies
that almost surely only finitely many of the events En occur.
Problem 6
The event B ∈ AR is included in the event {B(2) − B(1) = B(1) − B(0)}, and
thus
P(B ∈ AR) ≤ P(B(2) − B(1) = B(1) − B(0)) = 0,
since the probability that two atomless, independent random variables are equal
is zero (easy to prove using conditional probabilities).
Pset #3 Solutions
Problem 1 (15 points) Suppose Xi is an i.i.d. zero mean sequence of random variables
with an everywhere finite moment generating function M(θ) = E[exp(θX1)]. Argue the existence of, and express in terms of M(θ) and z, the following large deviations limit:
lim_n (1/n) log P( Σ_{1≤i≠j≤n} Xi Xj > n² z ).
Problem 2 (15 points) Establish the following identity directly from the definition of the
Itô integral:
∫_0^t s dBs = tBt − ∫_0^t Bs ds.
Hint: approximate ∫_0^t Bs ds by sums of the form Σ_i B_{t_{i+1}} (t_{i+1} − t_i).
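A discretized check of this identity (an added illustration): the discrete left-endpoint sums satisfy the identity exactly by summation by parts, so the two sides agree to floating-point precision along any simulated path:

```python
import random

rng = random.Random(0)
t_final, n = 1.0, 100000
dt = t_final / n
b = 0.0            # running value of B at the current grid point
ito_sum = 0.0      # sum_i t_i (B_{t_{i+1}} - B_{t_i}), the Ito sum for int s dB_s
riemann_sum = 0.0  # sum_i B_{t_{i+1}} (t_{i+1} - t_i), as suggested in the hint
for i in range(n):
    db = rng.gauss(0.0, dt ** 0.5)
    ito_sum += (i * dt) * db
    b += db
    riemann_sum += b * dt
print(ito_sum, t_final * b - riemann_sum)  # equal up to rounding error
```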
Problem 3 (15 points) Given a stochastic process Xt, the so-called Stratonovich integral
∫_0^t Xs ∘ dBs of Xt with respect to a Brownian motion Bt is defined as the L² limit of
lim_n Σ_{0≤i≤n−1} X_{(t_i + t_{i+1})/2} (B_{t_{i+1}} − B_{t_i}),
December, 2013
lim_n P( max_{1≤k≤nt} ( (1/n) Σ_{1≤i≤k} Xi − k/n ) ≥ z ).
Fall 2013
Midterm Solutions
Problem 1
Suppose a random variable X is such that P(X > 1) = 0 and P(X > 1 − ε) >
0 for every ε > 0. Recall that the large deviations rate function is defined
to be I(x) = sup_θ (θx − log M(θ)) for every real value x, where M(θ) =
E[exp(θX)], for every real value θ.
(a) Show that I(x) = ∞ for every x > 1.
Since P(X > 1) = 0, we have
M(θ) = ∫_{−∞}^{1} exp(θx) dPX(x) ≤ exp(θ) for every θ ≥ 0,
and hence for x > 1,
I(x) ≥ sup_{θ≥0} (θx − θ) = sup_{θ≥0} θ(x − 1) = ∞.
(b) Show that I(x) < ∞ for every E[X] ≤ x < 1.
Take any ε > 0 such that x < 1 − ε, and note that for θ ≥ 0,
M(θ) ≥ ∫_{1−ε}^{1} exp(θx) dPX(x) ≥ exp(θ(1 − ε)) P(X > 1 − ε),
so θx − log M(θ) ≤ θ(x − (1 − ε)) − log P(X > 1 − ε) is bounded above over
θ ≥ 0; for θ < 0, Jensen's inequality gives M(θ) ≥ exp(θE[X]), so θx −
log M(θ) ≤ θ(x − E[X]) ≤ 0. Hence I(x) < ∞.
(c) Splitting the integral, for any ε > 0 and θ ≥ 0,
M(θ) = ∫_{−∞}^{1−ε} exp(θx) dPX(x) + ∫_{1−ε}^{1} exp(θx) dPX(x) ≤ exp(θ(1 − ε)) + exp(θ) P(1 − ε ≤ X ≤ 1).
Let f(θ, ε) denote the quantity above. For any θ ≥ 0 and ε > 0, we have
M(θ) ≤ f(θ, ε), and we obtain that
I(1) ≥ sup_{θ≥0, ε>0} (θ − log f(θ, ε)) = sup_{θ≥0, ε>0} ( −log( exp(−θε) + P(1 − ε ≤ X ≤ 1) ) ).
Letting θ → ∞ for fixed ε and then ε → 0, the assumption lim_{ε→0} P(1 − ε ≤
X ≤ 1) = 0 yields I(1) = ∞.
Problem 2
Here the underlying chain is i.i.d., so that Pi,j = πj for all i, and every row of the
tilted matrix equals
( exp(θf(x1))π1 · · · exp(θf(xN))πN ).
Let v = [1, ..., 1]ᵀ and M(θ) = E[exp(θf(X1))]. Then we have that
P̂(θ)v = M(θ)v.
Since P̂(θ) has rank 1 and M(θ) > 0, we have that M(θ) is the Perron-Frobenius
eigenvalue of P̂(θ). Thus, we have
I(x) = sup_θ (θx − log λ(P̂(θ))) = sup_θ (θx − log M(θ)).
Problem 3
(b) Let Xt = Wt/(Wt + Bt). We first verify that Xt is a martingale:
E[ W_{t+1}/(W_{t+1} + B_{t+1}) | Ft ]
= (Wt/(Wt + Bt)) · (Wt + 1)/(1 + Wt + Bt) + (Bt/(Wt + Bt)) · Wt/(1 + Wt + Bt)
= Wt(1 + Wt + Bt) / ((Wt + Bt)(1 + Wt + Bt))
= Wt/(Wt + Bt).
Thus Wt/(Wt + Bt), t ≥ 0, is a martingale. Since |X_{t∧T}| ≤ 1, the optional stopping theorem gives that XT is an almost surely well defined random variable and
E[XT] = E[X0]. Thus, we have
E[XT] = (1/2) P(T < ∞) + α P(T = ∞) = E[X0] = 2/3,   (1)
where 0 ≤ α ≤ 1 is the conditional expectation of the limit of Wt/(Wt + Bt) as
t → ∞ on the event {T = ∞} (the limit exists by the martingale convergence
theorem). By (1) and α ≤ 1, we have
2/3 ≤ (1/2) P(T < ∞) + (1 − P(T < ∞)) = 1 − (1/2) P(T < ∞),
and therefore P(T < ∞) ≤ 2/3.
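A simulation consistent with this bound (an added illustration, not part of the solutions); with the Pólya-type urn dynamics above, the empirical frequency of {T < ∞} stays below 2/3:

```python
import random

def hits_half(rng, horizon=300):
    # One urn run starting from W=2 white, B=1 black; returns True if the
    # white-ball fraction ever equals exactly 1/2 within the horizon.
    W, B = 2, 1
    for _ in range(horizon):
        if rng.random() < W / (W + B):
            W += 1
        else:
            B += 1
        if 2 * W == W + B:  # proportion of white balls is exactly 50%
            return True
    return False

rng = random.Random(0)
n = 20000
estimate = sum(hits_half(rng) for _ in range(n)) / n
print(estimate)  # empirical P(T < infinity); stays below the bound 2/3
```

Note that P(T = 1) = 1/3 already (the first added ball is black with probability 1/3), so the estimate sits between 1/3 and 2/3.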
6.265-15.070
Midterm Exam
Date: October 30, 2013
6pm-8pm
80 points total
Problem 1 (30 points). Suppose a random variable X is such that P(X > 1) = 0 and
P(X > 1 − ε) > 0 for every ε > 0. Recall that the large deviations rate function is defined to
be I(x) = sup_θ (θx − log M(θ)) for every real value x, where M(θ) = E[exp(θX)], for every
real value θ.
(a) Show that I(x) = ∞ for every x > 1.
(b) Show that I(x) < ∞ for every E[X] ≤ x < 1.
(c) Suppose lim_{ε→0} P(1 − ε ≤ X ≤ 1) = 0. Show that I(1) = ∞.
Problem 2 (20 points) Recall the following one-dimensional version of the Large Deviations Principle for finite state Markov chains. Given an N-state Markov chain Xn, n ≥ 0
with transition matrix Pi,j, 1 ≤ i, j ≤ N, and a function f : {1, . . . , N} → R, the sequence
n⁻¹ Σ_{1≤i≤n} f(Xi) satisfies the LDP with rate function I(x) = sup_θ (θx − log λ(P̂(θ))), where
λ(P̂(θ)) is the Perron-Frobenius eigenvalue of the tilted matrix P̂(θ) = (Pi,j exp(θf(j))).
Problem 3 (30 points) The following two parts can be done independently.
(a) Suppose Xn, n ≥ 0 is a martingale such that the distribution of Xn is identical for all
n and the second moment of Xn is finite. Establish that Xn = X0 almost surely for all
n.
(b) An urn contains two white balls and one black ball at time zero. At each time t =
1, 2, . . . exactly one ball is added to the urn. Specifically, if at time t ≥ 0 there are
Wt white balls and Bt black balls, the ball added at time t + 1 is white with probability
Wt/(Wt + Bt) and is black with the remaining probability Bt/(Wt + Bt). In particular,
since there were three balls at the beginning, and at every time t ≥ 1 exactly one ball
is added, then Wt + Bt = t + 3, t ≥ 0. Let T be the first time when the proportion of
white balls is exactly 50% if such a time exists, and T = ∞ if this is never the case.
Namely T = min{t : Wt/(Wt + Bt) = 1/2} if the set of such t is non-empty, and T = ∞
otherwise. Establish the upper bound P(T < ∞) ≤ 2/3.
Hint: Show that Wt/(Wt + Bt) is a martingale and use the Optional Stopping Theorem.