Tree Scheme
John B. Walsh1
Department of Mathematics, University of British Columbia, Vancouver B.C. V6T
1Y4, Canada
(e-mail: walsh@math.ubc.ca)
1 Introduction
The binomial tree scheme was introduced by Cox, Ross, and Rubinstein [1] as a
simplification of the Black-Scholes model for valuing options, and it is a popular
and practical way to evaluate various contingent claims. Much of its usefulness
stems from the fact that it mimics the real-time development of the stock price,
making it easy to adapt it to the computation of American and other options.
From another point of view, however, it is simply a numerical method for solving
initial-value problems for a certain partial differential equation. As such, it is
known to be of first order [6], [7], [2], [3], at least for standard options. That is,
the error varies inversely with the number of time steps.
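The first-order rate is easy to observe numerically. The following is a minimal sketch (not from the paper; the parameter values are illustrative) that prices a European call on a standard Cox-Ross-Rubinstein tree and compares it with the Black-Scholes value; the scaled error n·|error| stays roughly bounded, which is what first-order convergence means.

```python
import math

def bs_call(s0, K, r, sigma, T):
    """Black-Scholes European call price, with the normal cdf built from math.erf."""
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

def crr_call(s0, K, r, sigma, T, n):
    """Cox-Ross-Rubinstein binomial price of a European call with n time steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1.0 / u                              # down factor
    q = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):                       # backward induction
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

if __name__ == "__main__":
    s0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
    exact = bs_call(s0, K, r, sigma, T)
    for n in (50, 100, 200, 400):
        print(n, n * abs(crr_call(s0, K, r, sigma, T, n) - exact))
```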
A key point in typical financial problems is that the data is not smooth. For
instance, if the stock value at term is x, the payoff for the European call option
is of the form f(x) = (x − K)+, which has a discontinuous derivative. Others,
such as digital and barrier options, have discontinuous payoffs. This leads to
an apparent irregularity of convergence. It is possible, for example, to halve
the step size and actually increase the error. This phenomenon comes from the
discontinuity in the derivative, and makes it quite delicate to apply methods such
as Richardson extrapolation and other higher-order methods which depend on
the existence of higher-order derivatives in the data.

[1] I would like to thank O. Walsh for suggesting this problem and for many helpful conversations.
The aim of this paper is to study the convergence closely. We will determine
the exact rate of convergence, and we will even find explicit expressions for the
constants in the leading error term.
Merely knowing the form of the error allows us to modify the Richardson
extrapolation method to get a scheme of order 3/2.
We will also see that the delta, which determines the hedging strategy, can
be computed from the tree scheme, and that it converges at exactly the same
rate.
The argument is purely probabilistic. The Black-Scholes model treats the
stock price as a diffusion process, while the binomial scheme treats it as a Markov
chain. We use a procedure called Skorokhod embedding to embed the Markov
chain in the diffusion process. This allows a close comparison of the two, and
an accurate evaluation of the error. This was done in a slightly different way by
L.C.G. Rogers and E.J. Stapleton [9], who used it to speed up the convergence of
the binomial tree scheme.
This embedding lets us split the error into two relatively easily analyzed
parts, one which depends on the global behavior of the data, and the other
which depends on its local properties.
2 Embeddings
The stock price (St ) in the Black-Scholes model is a logarithmic Brownian mo-
tion, and their famous hedging argument tells us that in order to calculate
option prices, the discounted stock price $\tilde S_t \stackrel{\mathrm{def}}{=} e^{-rt} S_t$ should be a martingale.
This hedging argument does not depend on the fact that the stock price is a
logarithmic Brownian motion, but only on the fact that the market is complete:
the stock prices in other complete-market models should also be martingales,
at least for the purposes of pricing options. Even in incomplete markets, it is
common to use a martingale measure to calculate option prices, at least as a
first approximation.
It is a general fact [8] that any martingale can be embedded in a Brownian
motion with the same initial value by Skorokhod embedding, and a strictly
positive martingale can be embedded in a logarithmic Brownian motion. That
means that one can embed the discounted stock price from other single-stock
models in the discounted Black-Scholes stock price. Suppose for example, that
Yk , k = 0, 1, 2, . . . is the stock price in a discrete model, and that Y0 = S0 .
Under the martingale measure, the discounted stock price $\tilde Y_k \stackrel{\mathrm{def}}{=} e^{-kr\delta}\, Y_k$ is a
martingale. Then there are (possibly randomized) stopping times 0 = τ0 < τ1 <
. . . for St such that the processes {Ỹk , k = 0, 1, 2, . . . } and {S̃τk , k = 0, 1, 2, . . . }
have exactly the same distribution. Thus the process (Ỹk ) is embedded in S̃t : Ỹk
is just the process S̃t sampled at discrete times. However, the times are random,
not fixed. This is what we mean by embedding.
We note that this embedding works for a single-stock market, but not in gen-
eral for a multi-stock market, unless the stocks evolve independently, or nearly
so.
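This embedding is easy to see in simulation. The sketch below is an illustration, not the paper's construction: it runs a crude Euler discretization of the discounted log-price (drift −σ²/2 under the martingale measure) and samples the price each time it has moved by the factor a = e^h or a^{-1}. The sampled values form a binomial chain, and the empirical up-move frequency is close to 1/(1+a), the martingale transition probability of Section 3. All parameter values are arbitrary.

```python
import math, random

def embed_chain(s0=1.0, sigma=0.2, h=0.1, dt=1e-3, n_moves=500, seed=1):
    """Sample a simulated discounted Black-Scholes price at the successive
    (random) times it first moves by the factor a = e^h or 1/a.  Returns the
    embedded binomial chain and the fraction of up-moves."""
    random.seed(seed)
    a = math.exp(h)
    x = 0.0                        # x_t = log(S~_t / s0)
    chain, ups = [s0], 0
    for _ in range(n_moves):
        ref = x
        while abs(x - ref) < h:    # run the diffusion until it moves by +-h
            x += -0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        x = ref + (h if x > ref else -h)   # snap to the crossed barrier
        up = x > ref
        ups += up
        chain.append(chain[-1] * (a if up else 1.0 / a))
    return chain, ups / n_moves

if __name__ == "__main__":
    chain, up_frac = embed_chain()
    print("empirical P(up):", up_frac, "  1/(1+a):", 1.0 / (1.0 + math.exp(0.1)))
```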
Let f be a positive function. Suppose there is a contingent claim, such as a
European option, which pays off an amount f (ST ) at time T if the stock price at
time T is ST . If S0 = s0 , its value at time zero is V (s0 , 0) ≡ e−rT E{f (ST )} . On
the other hand, if T = nδ, the same contingent claim for the discrete model pays
f(Yn) at maturity and has a value at time zero of $U(s_0,0) \stackrel{\mathrm{def}}{=} e^{-rT}E\{f(Y_n)\}$.
But $Y_n = e^{rT}\tilde Y_n$ has the same distribution as $e^{rT}\tilde S_{\tau_n}$, while $S_T = e^{rT}\tilde S_T$. Thus
$U(s_0,0) = e^{-rT}E\{f(e^{rT}\tilde S_{\tau_n})\}$, and the difference between the two values is

$$P\bigl\{\tilde Y_{j+1} = a\tilde Y_j \mid \tilde Y_j\bigr\} \stackrel{\mathrm{def}}{=} \frac{1}{a+1} = q\,, \qquad P\bigl\{\tilde Y_{j+1} = a^{-1}\tilde Y_j \mid \tilde Y_j\bigr\} \stackrel{\mathrm{def}}{=} 1-q\,.$$
$$U(\tilde Y_k, k) \stackrel{\mathrm{def}}{=} e^{-r(T-k\delta)}E\{f(Y_n)\mid Y_k\} = e^{-r(T-k\delta)}E\{f(e^{rT}\tilde Y_n)\mid Y_k\}\,. \qquad (2)$$

Let $u(j,k) = U(a^j, k)$. Then u is the solution of the difference scheme
$$u(j,k) = e^{-r\delta}\bigl[q\,u(j+1,k+1) + (1-q)\,u(j-1,k+1)\bigr],\ k < n\,; \qquad u(j,n) = f(e^{rT}a^j),\ j\in\mathbb Z\,. \qquad (3)$$
Under its own martingale measure, the corresponding Black-Scholes model
will have a stock price given by
$$S_t = S_0\, e^{\sigma W_t + (r-\frac12\sigma^2)t}\,, \qquad t\ge 0\,, \qquad (4)$$
4 Results
We say that a function f is piecewise C (k) if f, f 0 , . . . , f (k) have at most finitely
many discontinuities and no oscillatory discontinuities. We will treat the follow-
ing class of possible payoff functions.
Let us introduce some notation which will be in force for the remainder of
the paper. Let f ∈ K and consider a contingent claim which pays an amount
f (s) at a fixed time T > 0 if the stock price at time T is s. Let n be the number
of time steps in the discrete model, so that the time-step is δ = T/n. The space
step h is then $h = \sigma\sqrt{T/n}$.
The error depends on the discontinuities of f and f 0 , and on the relation of
these discontinuities to the lattice points.
This is a special case of Theorem 4.2 below, so there is no need for a separate
proof. We collect (10), and Propositions 9.5, 9.6 and 9.7, and use (46) to express
them in terms of f instead of g. We get:
$$\begin{aligned}
E^{\mathrm{tot}}(f) = \frac{e^{-rT}}{n}\Biggl[&\Bigl(\frac{5}{12} + \frac{\sigma^2 T}{6} + \frac{\sigma^4 T^2}{192}\Bigr)E\{f(S_T)\} - \frac{1}{6\sigma^2 T}\,E\bigl\{(\log(\tilde S_T/s_0))^2 f(S_T)\bigr\}\\
&- \frac{1}{12\sigma^4 T^2}\,E\bigl\{(\log(\tilde S_T/s_0))^4 f(S_T)\bigr\} + \frac{2}{3}\sigma^2 T\,E\bigl\{S_T^2 f''(S_T)\bigr\}\\
&+ \sigma^2 T\sum_i \Bigl(s_i\,\Delta f'(s_i) - \frac12\Delta f(s_i)\Bigr)\Bigl(\frac13 + 2\theta(\tilde s_i/s_0)\bigl(1-\theta(\tilde s_i/s_0)\bigr)\Bigr)\hat p(\log(\tilde s_i/s_0))\\
&- \frac13\sum_{i:\log(\tilde s_i/s_0)\in N_h^e}\log(\tilde s_i/s_0)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0))\\
&+ \frac16\sum_{i:\log(\tilde s_i/s_0)\in N_h^o}\log(\tilde s_i/s_0)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0))\Biggr]\\
&+ e^{-rT}\,\frac{\sigma\sqrt T}{\sqrt n}\sum_{i:\log(\tilde s_i/s_0)\notin h\mathbb Z}\bigl(2\theta(\tilde s_i/s_0)-1\bigr)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0)) + O\Bigl(\frac{1}{n^{3/2}}\Bigr) \qquad (8)
\end{aligned}$$
where the expectations are taken with respect to the martingale measure.
Remark 4.3 We have expressed the errors in terms of E{f (ST )}. However, we
can also express them in terms of erT S̃τn , and it might be better to do so, since
this is exactly what the binomial scheme computes. Indeed, the theorem tells us
that the expectations of f (ST ) and f (erT S̃τn ) only differ by O(1/n), and they
occur as coefficients multiplying 1/n in (8) so one can replace ST by erT S̃τn in
(8) and the result will only change by O(n−2 ), so these formulas remain correct.
So in fact ST and erT S̃τn are interchangeable in (8); and, for the same reason,
both are interchangeable with Sτn .
The delta, which determines the hedging strategy in the Black-Scholes model,
can also be estimated in the tree scheme, and its estimate also converges with
order one. (See Section 10.) Let $\check\theta(s) = \mathrm{frac}\bigl(\frac{h+\log s}{2h}\bigr)$.
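Concretely, θ̌ is just a fractional part. A small sketch (the parameter values are illustrative; here s stands for the ratio K̃/s₀):

```python
import math

def theta_check(s, h):
    """theta-check(s) = frac((h + log s) / (2h)): the fractional position of
    log s, offset by h, within the even lattice 2hZ."""
    return ((h + math.log(s)) / (2.0 * h)) % 1.0

if __name__ == "__main__":
    # h = sigma*sqrt(T/n) changes with n, so theta-check wobbles as n varies
    sigma, T, ratio = 0.2, 1.0, math.exp(-0.05) * 95.0 / 100.0
    for n in (100, 144, 196, 256):
        h = sigma * math.sqrt(T / n)
        print(n, round(theta_check(ratio, h), 3))
```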
Corollary 4.4 Suppose that f is continuous and both f and f 0 are in K. The
symmetric estimate (35) of the delta converges with order one. For a call or put
option with strike price K, there are constants A and B such that the error at
time 0 is of the form
$$\frac1n\Bigl(A + B\,\check\theta(\tilde K)\bigl(1-\check\theta(\tilde K)\bigr)\Bigr) + o(n^{-1})\,. \qquad (9)$$
2. The striking fact about the tree scheme’s convergence is that, even when
restricted to even values of n, the error goes to zero at the general rate of O(1/n),
but “with a wobble”: there are constants c1 < c2 for which $c_1/n < E^{\mathrm{tot}}(f) < c_2/n$,
and the error fluctuates quasi-periodically between these bounds.
The reason is clear from (8). For example, a typical European call with strike
price K pays off f (x) = (x − K)+ and (8) simplifies: the last three series vanish,
and the first reduces to the single term
$$\sigma^2 T\,K\Bigl(\frac13 + 2\theta(1-\theta)\Bigr)\hat p(\log(\tilde s/s_0))\,.$$
The quantity to focus on is θ. It is in effect the fractional distance (in log scale)
from K̃ to the nearest even lattice point. In log scale, the lattice points are
multiples of $\sigma\sqrt{T/n}$, so the whole lattice changes as n changes. This means
that θ changes with n too. It can vary from 0 to 1, so this term can vary by a
factor of nearly three. It is not the only error term, but it is important, and it
is why there are cases where one can double the number of steps and more than
double the error at the same time.
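The wobble is easy to reproduce numerically. The sketch below uses the standard raw-price CRR lattice (the paper's lattice carries the discounted price, but, as remark 4 below notes, the behavior of the two is virtually identical); the parameters are illustrative. The scaled error n·|error| stays bounded but oscillates as n varies:

```python
import math

def bs_call(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d1 - sigma * math.sqrt(T))

def crr_call(s0, K, r, sigma, T, n):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt)); d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)
    disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

if __name__ == "__main__":
    s0, K, r, sigma, T = 100.0, 95.0, 0.05, 0.2, 1.0
    exact = bs_call(s0, K, r, sigma, T)
    scaled = [n * abs(crr_call(s0, K, r, sigma, T, n) - exact)
              for n in range(50, 300, 4)]   # even n only, as in the text
    print("n*|error| ranges over [%.4f, %.4f]" % (min(scaled), max(scaled)))
```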
3. The coefficients in Theorem 4.2 are rather complex, and Corollary 4.1 is
handier for vanilla options. It shows that one can make a Richardson-like ex-
trapolation to increase the order of convergence. If we run the tree for three
values of n which give different values of θ, we can then write down (7) for the
three, solve for the coefficients A and B, and subtract off the first order error
terms, giving us potentially a scheme of order 3/2. In fact, one could do this
cheaply: use two runs at roughly the square root of n, and then one at n. This
might be of interest when using the scheme to value American options.
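A crude version of this idea can be sketched in code. Below, the error model price(n) ≈ V + (A + B·θ(1−θ))/n is fitted by least squares over several tree runs, and the fitted V is taken as the extrapolated price. The θ used here (the fractional lattice position of the discounted log-strike), the raw-price CRR lattice, and the choice of n values are illustrative assumptions rather than the paper's exact procedure, and this version fits the three constants jointly instead of following the three-run scheme literally.

```python
import math

def bs_call(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d1 - sigma * math.sqrt(T))

def crr_call(s0, K, r, sigma, T, n):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt)); d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d); disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

def wobble_weight(s0, K, r, sigma, T, n):
    """theta*(1-theta), with theta the fractional lattice position of the
    discounted log-strike (an illustrative stand-in for the paper's theta)."""
    h = sigma * math.sqrt(T / n)
    th = ((h + math.log(K * math.exp(-r * T) / s0)) / (2.0 * h)) % 1.0
    return th * (1.0 - th)

def solve3(M, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(3):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[c])]
    return [A[i][3] / A[i][i] for i in range(3)]

def extrapolate(s0, K, r, sigma, T, ns=tuple(range(60, 201, 10))):
    """Least-squares fit of price(n) = V + A/n + B*w(n)/n; returns V."""
    rows = [[1.0, 1.0 / n, wobble_weight(s0, K, r, sigma, T, n) / n] for n in ns]
    ys = [crr_call(s0, K, r, sigma, T, n) for n in ns]
    XtX = [[sum(r_[i] * r_[j] for r_ in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r_[i] * y for r_, y in zip(rows, ys)) for i in range(3)]
    return solve3(XtX, Xty)[0]

if __name__ == "__main__":
    args = (100.0, 95.0, 0.05, 0.2, 1.0)
    print("extrapolated error:", abs(extrapolate(*args) - bs_call(*args)))
```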
4. It is usually the raw stock price, not the discounted price, which evolves on
the lattice. However, our numerical studies have shown that the behavior of the
two schemes is virtually identical: to adapt Corollary 4.1 to the evolution of
the raw price, just replace the discounted strike price K̃ by the raw strike price
K in the definition of θ. We have therefore used the discounted price for its
convenience in the embedding.
5. From a purely probabilistic point of view, Theorem 4.2 is a rate-of-convergence
result for a central limit theorem for Bernoulli random variables. If we take f to
be the indicator function of (−∞, z], we recover the Berry-Esseen bound. (We
thank the referee for pointing this out.)
can be defined explicitly. Define stopping times τ0, τ1, τ2, ... by induction: τ0 = 0, and τk+1 is the first time after τk at which $\tilde S_t$ equals either $a\tilde S_{\tau_k}$ or $a^{-1}\tilde S_{\tau_k}$.
As S̃t is a martingale, so is S̃τ0 , S̃τ1 , . . . Since S̃τk+1 can only equal aS̃τk or
−1
a S̃τk , we must have
$$P\{\tilde S_{\tau_{k+1}} = a\tilde S_{\tau_k} \mid \tilde S_{\tau_k}\} = \frac{1}{a+1}\,, \qquad P\{\tilde S_{\tau_{k+1}} = a^{-1}\tilde S_{\tau_k} \mid \tilde S_{\tau_k}\} = \frac{a}{a+1}\,.$$
It follows that (S̃τk ) is a Markov chain with the same transition probabilities
as (Ỹk ); since S̃τ0 = Y0 = 1, the two are identical processes. It follows that
the error in the binomial scheme (considered as an approximation to the Black-
Scholes model) is given by
$$E^{\mathrm{tot}}(f) \stackrel{\mathrm{def}}{=} u(1,0) - v(1,0) = e^{-rT}E\bigl\{f(e^{rT}\tilde S_{\tau_n}) - f(e^{rT}\tilde S_T)\bigr\}\,. \qquad (10)$$
Here is a quick heuristic argument to show that the convergence is first order.
Expand E{f (ST +s )} in a Taylor series. It is
Next, we make a Girsanov transformation to remove the drift of Xt . Let ξ
be the maximum of T , τn , and τJ , where τJ is defined below—the value of ξ is
not important, so long as it is larger than the values of t we work with—and set
$$dQ = e^{\frac12 X_\xi + \frac18\sigma^2\xi}\, dP\,.$$
By Girsanov’s Theorem [4], $\{\tfrac1\sigma X_t,\ 0\le t\le\xi\}$ is a standard Brownian motion
on (Ω, F , Q). We will call Q the Brownian measure to distinguish it from the
martingale measure P . We will do all our calculations in terms of Q, and then
translate the results back to P at the very end. Under the measure Q, Xt is
a Brownian motion, and (Xτj ) is a simple symmetric random walk on hZ. It
alternates between even and odd multiples of h. To smooth this out, we will
restrict ourselves to even values of j and n.
Thus let n = 2m for some integer m and define
$$g(x) \stackrel{\mathrm{def}}{=} f(s_0 e^{x+rT})\, e^{-\frac x2 - \frac{\sigma^2 T}{8}}\,.$$
Then
$$E^{\mathrm{tot}}(f) = e^{-rT}E^Q\Bigl\{\bigl(f(s_0 e^{X_{\tau_n}+rT}) - f(s_0 e^{X_T+rT})\bigr)\,e^{-\frac12 X_\xi - \frac18\sigma^2\xi}\Bigr\}\,. \qquad (13)$$
Now $e^{-\frac12 X_t - \sigma^2 t/8}$ is a Q-martingale, so as $\tau_n\le\xi$,
$$E^P\{f(S_{\tau_n})\} = E^Q\bigl\{f(S_{\tau_n})\,e^{-\frac12 X_\xi - \sigma^2\xi/8}\bigr\} = E^Q\bigl\{f(S_{\tau_n})\,e^{-\frac12 X_{\tau_n} - \sigma^2\tau_n/8}\bigr\} = E^Q\bigl\{g(X_{\tau_n})\,e^{-\sigma^2(\tau_n-T)/8}\bigr\}$$
and
$$E^P\{f(S_T)\} = E^Q\{g(X_T)\}\,.$$
Thus
$$E^{\mathrm{tot}}(f) = e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_T)\} + e^{-rT}E^Q\Bigl\{g(X_{\tau_n})\Bigl(e^{-\frac{\sigma^2}{8}(\tau_n-T)} - 1\Bigr)\Bigr\}\,,$$
where we have used Proposition 11.1 and expanded in powers of 1/n. Thus
$$\begin{aligned}
E^{\mathrm{tot}}(f) &= e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_T)\} + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(n^{-2})\\
&= e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_{\tau_J})\} + e^{-rT}E^Q\{g(X_{\tau_J}) - g(X_T)\} + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(n^{-2})\\
&\stackrel{\mathrm{def}}{=} \hat E^{\mathrm{glob}}(g) + \hat E^{\mathrm{loc}}(g) + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(1/n^2)\,, \qquad (14)
\end{aligned}$$
which defines Ê glob (g) and Ê loc (g). The final term comes from the fact that we
defined g with a fixed time T instead of the random time ξ when we changed
the probability measure.
This splits the error into two parts. The global error Ê glob (g) can be handled
with a suitable modification of the Taylor series argument of (11). The local error
Ê loc (g) can be computed explicitly, and it is here that the local properties such
as the continuity and differentiability of g come into play.
$$\hat E^{\mathrm{glob}}(g) = \frac1n\Bigl[\frac{5}{12}E^Q\{g(X_{\tau_n})\} - \frac{1}{6\sigma^2 T}E^Q\{X_{\tau_n}^2 g(X_{\tau_n})\} - \frac{1}{12\sigma^4 T^2}E^Q\{X_{\tau_n}^4 g(X_{\tau_n})\}\Bigr] + O(n^{-3/2})\,. \qquad (15)$$
Proof. Let Pn (x) be the transition probabilities of a simple symmetric random
walk on the integers, so that Pj (x) = P Q {Xτj = hx}. Let us remark that J is
independent of (Xτj ) so that
$$P^Q\{X_{\tau_J} = hx\} = \sum_{k=-n}^{\infty} P^Q\{J-n=k\}\,P_{n+k}(x)\,,$$

$$\sum_{k=-n}^{\infty}\sum_{x=-n}^{n} P^Q\{J-n=k\}\,P_n(x)\,\frac{k^p x^q}{n^r}\,g(xh) = E^Q\Bigl\{\Bigl(\frac{J-n}{\sqrt n}\Bigr)^p\Bigr\}\,E^Q\bigl\{X_{\tau_n}^q\,g(X_{\tau_n})\bigr\}\,\frac{n^{\frac{p+q}{2}-r}}{(\sigma\sqrt T)^q}\,. \qquad (16)$$
By Proposition 11.2 of the Appendix, the two expectations are bounded, so
if p ≠ 1 this term has order $\frac{p+q}{2}-r$, which is the effective order of $\frac{k^p x^q}{n^r}$. By
Corollary 11.4, the contributions to this integral for $|x| > n^{3/5}$ and/or $|k| > n^{3/5}$
go to zero faster than any power of n. Thus we can restrict ourselves to the sum
over the values $\max(|x|,|k|) \le n^{3/5}$, in which case $P_{n+k}(x)$ and $P_n(x)$ are both
defined, and
$$E^Q\{g(X_{\tau_n}) - g(X_{\tau_J})\} = \sum_k\sum_x P^Q\{J-n=k\}\,\bigl(P_n(x) - P_{n+k}(x)\bigr)\,g(hx)$$
$$= \sum_x\sum_k \Bigl(\frac{k}{2n} - \frac{3k^2+4kx^2}{8n^2} + \frac{3k^2x^2}{4n^3} - \frac{k^2x^4}{8n^4} + Q_3\Bigr)P_n(x)\,g(hx)\,,$$
$$\begin{aligned}
\frac1n\Bigl[&\Bigl(\frac12 E^Q\{J-n\} - \frac{3}{8n}E^Q\{(J-n)^2\}\Bigr)E^Q\{g(X_{\tau_n})\}\\
&- \frac{1}{\sigma^2 T}\Bigl(\frac12 E^Q\{J-n\} - \frac{3}{4n}E^Q\{(J-n)^2\}\Bigr)E^Q\{X_{\tau_n}^2 g(X_{\tau_n})\}\\
&- \frac{1}{8\sigma^4 T^2 n}E^Q\{(J-n)^2\}\,E^Q\{X_{\tau_n}^4 g(X_{\tau_n})\}\Bigr] + O(n^{-3/2})\,. \qquad (17)
\end{aligned}$$
Proposition 11.2 gives the values of E{J −n} = 4/3+O(h) and E{(J −n)2 } =
2n/3 + O(1). Substituting, we get (15).
translates into exponential boundedness of g: there exist A > 0 and a > 0 such
that |g(x)| ≤ Aea|x| for all x.
Let $N_h^e \stackrel{\mathrm{def}}{=} 2h\mathbb Z$ and $N_h^o \stackrel{\mathrm{def}}{=} h + N_h^e$ be the sets of even and odd multiples of h
respectively. Recall that J was the first even integer j such that τj > T. Let us
define
$$L \stackrel{\mathrm{def}}{=} \sup\{j : \tau_j < T\}\,, \qquad (18)$$
so that τL is the last stopping time before T.

There are two cases. Either L is an odd integer, in which case $X_{\tau_L}\in N_h^o$,
L = J − 1, and $\tau_L = \tau_{J-1} < T < \tau_J$; or L is an even integer, in which case
$X_{\tau_L}\in N_h^e$, L = J − 2, and $\tau_L = \tau_{J-2} < T < \tau_{J-1}$. Note that in either case,
$\tau_L \le t \le T \implies |X_t - X_{\tau_L}| < h$.
Define two operators, Πe and Πo on functions u(x), x ∈ R by:
• Πe u(x) = u(x) if x ∈ Nhe , and x 7→ Πe u(x) is linear in each interval
[2kh, (2k + 2)h], k ∈ N.
• Πo u(x) = u(x) if x ∈ Nho , and x 7→ Πo u(x) is linear in each interval
[(2k − 1)h, (2k + 1)h], k ∈ N.
Thus Πe u and Πo u are linear interpolations of u in between the even (respec-
tively odd) multiples of h.
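A direct implementation of Πe (assuming the definition above; Πo is the same with the lattice shifted by h) exhibits the classical linear-interpolation error that drives the local error computed below. The function and parameters here are illustrative:

```python
import math

def pi_even(u, h):
    """Pi^e u: piecewise-linear interpolation of u between even multiples of h."""
    def interp(x):
        k = math.floor(x / (2.0 * h))          # x lies in [2kh, (2k+2)h]
        x0, x1 = 2.0 * k * h, (2.0 * k + 2.0) * h
        t = (x - x0) / (x1 - x0)
        return (1.0 - t) * u(x0) + t * u(x1)
    return interp

if __name__ == "__main__":
    h = 0.05
    g = lambda x: x * x
    Pg = pi_even(g, h)
    # midpoint rule for the interpolation error over one cell [0, 2h]
    N = 2000
    dx = 2.0 * h / N
    integral = sum((Pg((i + 0.5) * dx) - g((i + 0.5) * dx)) * dx for i in range(N))
    # classical error for linear interpolation of g on [a,b]: (b-a)^3 g''/12
    print(integral, (2.0 * h) ** 3 * 2.0 / 12.0)
```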
Apply the Markov property at T . Xt is a Brownian motion from T on, and
if L is odd, then τJ is the first time after T that Xt hits Nhe , so, using the known
hitting probabilities of Brownian motion,
Let
$$q(x) \stackrel{\mathrm{def}}{=} P\{L \text{ is even} \mid X_T = x\}\,.$$
Note that L is even if and only if $X_{\tau_L}\in N_h^e$, and if this is so, Xt does not
hit $N_h^o$ between τL and T. Reverse Xt from time T: let $\hat X_t \stackrel{\mathrm{def}}{=} X_{T-t}$, 0 ≤ t ≤ T.
Then L is even if and only if X̂t hits Nhe before it hits Nho . But now, if we
condition on XT = x, or equivalently on X̂0 = x, then {X̂t , 0 ≤ t ≤ T } is a
Brownian bridge, and X̂t − X̂0 is a Brownian motion. Thus we can calculate the
exact probability that it hits Nhe before Nho . More simply, we can just note that if
h is small, the probability of hitting Nhe before Nho is not much influenced by the
drift, and so it is approximately that of unconditional Brownian motion. Thus,
if $\hat X_0 = x \in (2kh, (2k+1)h)$, then $q(x) = P\{\hat X_t \text{ reaches } 2kh \text{ before } (2k+1)h\} \sim \frac{(2k+1)h - x}{h} = \mathrm{dist}(x, N_h^o)/h$, where dist(x, Λ) is the distance from x to the set Λ.
Thus
$$q(x) = \frac1h\,\mathrm{dist}(x, N_h^o) + O(h)\,. \qquad (20)$$
Proposition 9.1
$$E\{g(X_{\tau_J}) - g(X_T)\} = E\{\Pi^e g(X_T) - g(X_T)\} + E\bigl\{\bigl(\Pi^o\Pi^e g(X_T) - \Pi^e g(X_T)\bigr)\,q(X_T)\bigr\}\,.$$
Proof. Let us write E{g(XτJ )} = E{g(XτJ ), L even } + E{g(XτJ ), L odd }.
Note that {L odd } ∈ F T , so it is conditionally independent of {XT +t , t ≥ 0}
given XT . Thus
$$E\{g(X_{\tau_J})\} = E\Bigl\{E\{g(X_{\tau_J}) \mid X_T, L \text{ even}\}\,P\{L \text{ even} \mid X_T\} + E\{g(X_{\tau_J}) \mid X_T, L \text{ odd}\}\,P\{L \text{ odd} \mid X_T\}\Bigr\}\,.$$
$$E^{\mathrm{loc}}(g) = \int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx + \frac{h^2}{3}\sum_k \Delta_k\,p(2kh) + O(h^3)\,. \qquad (21)$$
Proof. The first integral equals the first expectation on the right hand side of
Proposition 9.1. The second expectation can be written
$$\int_{-\infty}^{\infty}\bigl(\Pi^o\Pi^e g(x) - \Pi^e g(x)\bigr)\,q(x)\,p(x)\,dx\,. \qquad (22)$$
To simplify the second term, let ξ(x) = Πe g(x). Then ξ is piecewise linear with
vertices on Nhe , so we can write it in the form
$$\xi(x) = ax + b + \sum_k \frac12\,\Delta_k\,|x - 2kh|$$
for some a and b. Since Πo is a linear operator and Πo (ax + b) ≡ ax + b, we see
this is
$$= \frac12\sum_k \Delta_k \int_{-\infty}^{\infty}\bigl(\Pi^o|x-2kh| - |x-2kh|\bigr)\,q(x)\,p(x)\,dx\,.$$
Now |x − 2kh| is linear on both (−∞, 2kh) and (2kh, ∞), so that Πo |x −
2kh| = |x − 2kh| except on the interval [(2k − 1)h, (2k + 1)h]. On that interval,
Πo |x − 2kh| ≡ h and q(x) = (h − |x − 2kh|)/h, for q(x) is approximately 1/h
times the distance to the nearest odd multiple of h. Write p(x) = p(2kh) + O(h)
there. Then
$$\int_{-\infty}^{\infty}\bigl(\Pi^o|x-2kh| - |x-2kh|\bigr)q(x)p(x)\,dx = \frac1h\int_{(2k-1)h}^{(2k+1)h}\bigl(h - |x-2kh|\bigr)^2\bigl(p(2kh) + O(h)\bigr)\,dx$$
$$= \frac23\,h^2\,p(2kh) + O(h^3)\,. \qquad (23)$$
If we remember that if g = |x − 2kh|, ∆k = 2, the corollary follows.
We can decompose g as follows. Define the modified Heaviside function H̃(x)
by
$$\tilde H(x) = \begin{cases} 1 & \text{if } x > 0\\ \tfrac12 & \text{if } x = 0\\ 0 & \text{if } x < 0 \end{cases}$$
and set
$$g_1(x) = \sum_y \Delta g(y)\,\tilde H(x-y)\,, \qquad g_2(x) = \sum_y \frac12\,(\Delta g')(y)\,|x-y|\,. \qquad (24)$$
Proof. The sums in (24) are finite. By the definition of K, g(x) = ½(g(x+) +
g(x−)) at any discontinuity of g. It is easy to check that if we define g1 by
(24), then g − g1 is continuous. (This is the reason we modified the Heaviside
function.) However, it may still have a finite number of discontinuities in its
derivative. We remove these by subtracting g2: set $g_3 \stackrel{\mathrm{def}}{=} g - g_1 - g_2$. Then it is
easy to see that g3 is continuous, has a continuous first derivative, and that g3′′
is piecewise continuous.
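As a concrete illustration (a hypothetical g, not one from the paper), take g with a jump of 2 at x = 1 and a kink at x = 0; then g1 absorbs the jump, g2 the kink, and g3 = g − g1 − g2 comes out smooth (here simply x/2):

```python
def Htilde(x):
    """Modified Heaviside: 1 for x > 0, 1/2 at x = 0, 0 for x < 0."""
    return 1.0 if x > 0 else (0.5 if x == 0 else 0.0)

def g(x):   # jump Delta g(1) = 2, kink (Delta g')(0) = 1, midpoint convention at 1
    return max(x, 0.0) + 2.0 * Htilde(x - 1.0)

def g1(x):  # carries the jump
    return 2.0 * Htilde(x - 1.0)

def g2(x):  # carries the kink: (1/2) * (Delta g')(0) * |x - 0|
    return 0.5 * abs(x)

def g3(x):  # smooth remainder; here g3(x) = x/2
    return g(x) - g1(x) - g2(x)

if __name__ == "__main__":
    print([g3(x) for x in (-1.0, 0.0, 0.7, 1.0, 2.0)])
```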
Remark 9.4 The local error is not hard to calculate, but it will have to be
handled separately for each of the functions g1 , g2 , and g3 .
9.1 The Smooth Case
Proposition 9.5 Suppose g is in C(2) and that g and its first two derivatives
are exponentially bounded. Then
$$\hat E^{\mathrm{loc}}(g) = \frac{2h^2}{3}\int_{-\infty}^{\infty} g''(x)\,p(x)\,dx + o(h^2)\,. \qquad (25)$$
If g ∈ C(4) and g′′′ and g(iv) are exponentially bounded, the error is O(h⁴).
Proof. We will calculate the right hand side of (21). Let Ik be the interval
[2kh, (2k+2)h] and let yk = (2k+1)h be its midpoint. Write $\int_{-\infty}^{\infty}(\Pi^e g(x) - g(x))p(x)\,dx = \sum_k\int_{I_k}(\Pi^e g(x) - g(x))p(x)\,dx$. Expand g around yk: $g(x) = g(y_k) + g'(y_k)(x-y_k) + \frac12 g''(y_k)(x-y_k)^2 + o(h^2)$. Notice that on Ik, Πe g(x) −

This is a Riemann sum for the integral $(h^2/3)\int g''(x)p(x)\,dx$. (One has to
be slightly careful here: the o(h²) term is uniform, so it doesn’t cause trouble in
the improper integral. There is an o(1) error in approximating the integral by
the sum, but as it multiplies the coefficient of h², the error is o(h²) in any case.
If g ∈ C(4), it is O(h⁴).) Thus
$$= \frac{h^2}{3}\int g''(x)\,p(x)\,dx + o(h^2)\,. \qquad (26)$$
The second contribution to the error in (21) is $\frac{h^2}{3}\sum_k\Delta_k\,p(2kh) + O(h^3)$,
where Δk is the discontinuity of the derivative of Πe g at $x_k \stackrel{\mathrm{def}}{=} 2kh$:
Proposition 9.6 Suppose g is continuous and piecewise linear. Then
$$\hat E^{\mathrm{loc}}(g) = h^2\sum_y \Delta g'(y)\Bigl(\frac13 + 2\hat\theta(y)\bigl(1-\hat\theta(y)\bigr)\Bigr)p(y) + O(h^3)\,. \qquad (27)$$
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = \bigl(p(y) + O(h)\bigr)\int_{I_k}\bigl(\Pi^e g(x) - g(x)\bigr)\,dx\,,$$
so the final term is just 2h²/3. Adding this to (28), we get (27).
$$\hat E^{\mathrm{loc}}(g) = h\sum_{y\notin h\mathbb Z}\bigl(2\hat\theta(y)-1\bigr)\Delta g(y)\,p(y) - \frac{h^2}{3\sigma^2 T}\sum_{y\in N_h^e} y\,\Delta g(y)\,p(y) + \frac{h^2}{6\sigma^2 T}\sum_{y\in N_h^o} y\,\Delta g(y)\,p(y)\,. \qquad (29)$$
Proof. By Lemma 9.3 we can write $g(x) = \sum_y \Delta g(y)\,\tilde H(x-y)$. By linearity, it
is enough to consider the case where g(x) = H̃(x − y). Once again, we compute
the integrals in (21). Let Ik = [2kh, (2k+2)h]. If y ∈ Ik and 0 < θ̂(y) < 1, we
note that Πe g(x) = 0 if x < 2kh, Πe g(x) = 1 if x > (2k+2)h, and Πe g is linear
in Ik. Write p(x) = p(y) + O(h) on Ik and note that the only contribution to
the integral comes from Ik:
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = \Bigl[\int_{2kh}^{(2k+2)h}\frac{x-2kh}{2h}\,dx - \int_{2(k+\hat\theta(y))h}^{(2k+2)h}dx\Bigr]\bigl(p(y)+O(h)\bigr)$$
$$= \bigl(2\hat\theta(y)-1\bigr)\,p(y)\,h + O(h^2)\,. \qquad (30)$$
The cases θ̂(y) = ½ and θ̂(y) = 0 are special. In both cases we need to
expand p up to a linear term, since the constant term cancels out. So write
$p(x) = p(y) + p'(y)(x-y) + O(h^2)$, x ∈ Ik. If g(x) = H̃(x − y), y ∈ Ik, then
θ̂(y) = ½ means y = (2k+1)h. Noting that the contribution from p(y) vanishes,
the first error term will be
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = p'(y)\Bigl[\int_{2kh}^{(2k+2)h}\frac{(x-2kh)(x-y)}{2h}\,dx - \int_{2(k+\hat\theta(y))h}^{(2k+2)h}(x-y)\,dx\Bigr]$$
$$= -\frac16\,h^2\,p'(y) = \frac{h^2 y}{6\sigma^2 T}\,p(y)\,, \qquad (31)$$
where we have used the fact that $p'(x) = -\frac{x}{\sigma^2 T}\,p(x)$.
If θ̂ = 0, then y = 2kh and g = H̃(x − 2kh), so g(2kh) = ½. Thus
Πe g(x) = (x − (2k−2)h)/4h if (2k−2)h < x < (2k+2)h. It is zero for
x ≤ (2k−2)h and one for x > (2k+2)h, so that
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = p(2kh)\Bigl[\int_{(2k-2)h}^{(2k+2)h}\frac{x-(2k-2)h}{4h}\,dx - \int_{2kh}^{(2k+2)h}dx\Bigr]$$
$$\quad + p'(2kh)\Bigl[\int_{(2k-2)h}^{(2k+2)h}\frac{(x-(2k-2)h)(x-2kh)}{4h}\,dx - \int_{2kh}^{(2k+2)h}(x-2kh)\,dx\Bigr]\,. \qquad (32)$$
10 Convergence of the Delta: Proof of Cor. 4.4

The price of our derivative at time t < T is
$$V(s,t) = e^{-r(T-t)}E\{f(S_T)\mid S_t = s\}\,. \qquad (34)$$
The hedging strategy depends on the space derivative ∂V/∂s, which is called the
delta. It is of interest to know how well the tree scheme estimates this. From
(34),
$$\frac{\partial V}{\partial s}(s,t) = e^{-r(T-t)}E\Bigl\{f'(S_T)\,\frac{S_T}{s}\Bigm| S_t = s\Bigr\} = \lim_{h\to 0}\frac{V(e^h s, t) - V(e^{-h}s, t)}{s(e^h - e^{-h})}\,.$$
If t = kδ and $s = e^{jh+rt}$, we approximate ∂V/∂s by the symmetric discrete
derivative
$$\frac{u(j+1,k) - u(j-1,k)}{s(e^h - e^{-h})}\,, \qquad (35)$$
where u is the solution of the tree scheme (3).
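As an illustration, the symmetric estimate can be computed on the discounted lattice of Section 3 and compared with the Black-Scholes delta Φ(d1). The recursion, the probability q = 1/(1+a), and the parameters below are a sketch under that convention, not the paper's code:

```python
import math

def tree_delta(s0, K, r, sigma, T, n):
    """Symmetric delta estimate (35) from a discounted-lattice tree (sketch)."""
    dt = T / n
    h = sigma * math.sqrt(dt)
    a = math.exp(h)
    q = 1.0 / (a + 1.0)                  # martingale probability for the discounted price
    disc = math.exp(-r * dt)
    payoff = lambda j: max(s0 * math.exp(r * T) * a ** j - K, 0.0)
    lo, hi = -n - 1, n + 1
    u = {j: payoff(j) for j in range(lo, hi + 1)}
    for _ in range(n):                   # backward induction; the range shrinks by 1 a side
        lo, hi = lo + 1, hi - 1
        u = {j: disc * (q * u[j + 1] + (1 - q) * u[j - 1]) for j in range(lo, hi + 1)}
    return (u[1] - u[-1]) / (s0 * (a - 1.0 / a))

def bs_delta(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

if __name__ == "__main__":
    est = tree_delta(100.0, 100.0, 0.05, 0.2, 1.0, 200)
    print(est, bs_delta(100.0, 100.0, 0.05, 0.2, 1.0))
```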
Remark 10.1 Estimating the delta is essentially equivalent to running the
scheme on f 0 , not f . If f 0 is continuous, the result follows from Theorem 4.2.
However, if f 0 is discontinuous—as it is for a call or a put—and if the disconti-
nuity falls on a non-lattice point, Theorem 4.2 would give order 1/2, not order 1,
which does not imply Corollary 4.4. In fact it depends on some uniform bounds
which come from Theorem 4.2 and the fact we use the symmetric estimate of
the derivative. Thus there is something to prove.
Proof. By the Markov property, it is enough to prove the result for t = 0 and
S0 = 1. We will also assume that r = 0 to simplify notation.
The key remark is that if St is a logarithmic Brownian motion from s, then
$e^h S_t$ and $e^{-h}S_t$ are logarithmic Brownian motions from $e^h s$ and $e^{-h}s$ respectively, so that
$$\frac{\partial V}{\partial s}(1,0) = \lim_{h\to 0}\frac{V(e^h s, 0) - V(e^{-h}s, 0)}{e^h - e^{-h}} = \lim_{h\to 0} E\{\hat f(S_T, h)\}\,,$$
where
$$\hat f(s,h) = \frac{f(e^h s) - f(e^{-h}s)}{e^h - e^{-h}}\,.$$
Now f 0 ∈ K so that f and its first three derivatives are polynomially bounded,
hence there is a polynomial Q(s) which bounds fˆ, ∂ fˆ/∂s and ∂ 2 fˆ/∂s2 , uniformly
for h < 1. This will justify passages to the limit, so that, for instance,
$\frac{\partial V}{\partial s}(1,0) = E\{S_T f'(S_T)\}$.
$$E(h) \stackrel{\mathrm{def}}{=} E\{\hat f(S_{\tau_n},h) - S_T f'(S_T)\} = E\{\hat f(S_{\tau_n},h) - \hat f(S_T,h)\} + E\{\hat f(S_T,h) - S_T f'(S_T)\} \stackrel{\mathrm{def}}{=} E^1(h) + E^2(h)\,.$$
Now f ∈ K, hence so is f̂(·, h). Thus E¹(h) is the error for the payoff function
f̂(·, h), and Theorem 4.2 applies. By the uniform polynomial bound on f̂ and
related functions, these coefficients are uniformly bounded in h for h < 1, and
we can conclude that there is a constant A such that E¹(h) ≤ Ah² for small h.
The bound on E²(h) is straight analysis. We can write
$$E^2(h) = \int_{-\infty}^{\infty}\bigl(\hat f(s,h) - s f'(s)\bigr)\,p(s)\,ds\,,$$
where p is the density of S_T. Now
$$\hat f(s,h) - s f'(s) = \frac{1}{e^h - e^{-h}}\int_{se^{-h}}^{se^h}\bigl(f'(u) - f'(s)\bigr)\,du\,.$$
If f ∈ C(2) on the interval, expand f′ to first order in a Taylor series and
integrate to see this is $\frac12 s^2 h^2 f''(s) + o(h^2)$. In any case, if |f′′| ≤ C on the
interval, it is bounded by Cs²h. There are only finitely many points where
f ∉ C(3); each contributes at most Cs²h² to E², so we see that |E²(h)| ≤ Bh² for
some other constant B.
To prove (9), let f = (s − K)+ and evaluate E¹(h) by Theorem 4.2. Note
that f̂ will have discontinuities of approximately 1/2h and −1/2h in its derivative
at s = K − h and s = K + h respectively. Note also that θ(log s) (see Section 3)
is periodic with period 2h, so that $\theta(e^h K) = \theta(e^{-h}K) \stackrel{\mathrm{def}}{=} \check\theta$, and we can write (8)
in the form
$$E^1(h) = h^2\Bigl[C\,\frac{\hat p(\log(K-h)) - \hat p(\log(K+h))}{2h} + D\,\check\theta(1-\check\theta)\Bigr]\,.$$
The ratio converges to $-\hat p'(\log K)$ and (9) follows. This completes the proof,
except to remark that θ̌ corresponds to θ̂, for the odd instead of the even multiples
of h.
11 Appendix
11.1 Moments of τn and J
The (very complicated!) coefficients in Theorem 4.2 come from moments of τn
and J. We will derive them here. We will assume that P is the Brownian
measure, i.e. that Xt is a Brownian motion. Thus we will not write E Q and P Q
to indicate that we are using the Brownian measure. We can write Xt = σWt ,
where {Wt , t ≥ 0} is a standard Brownian motion.
Proposition 11.1 (i) τ1 has the same distribution as (T/n)ν, where ν = inf{t >
0 : |Wt| = 1}, so it has the moment generating function
$$F_1(\lambda) \stackrel{\mathrm{def}}{=} E\{e^{\lambda\tau_1}\} = \Bigl(\cos\sqrt{\tfrac{2\lambda T}{n}}\Bigr)^{-1}\,, \qquad -\infty < \lambda < \frac{n\pi^2}{8T}\,. \qquad (36)$$
(ii) τ1, τ2 − τ1, τ3 − τ2, ... are i.i.d., independent of Xτ1, Xτ2, ....
(iii) $E\{\tau_1\} = \frac Tn$, $\mathrm{var}\{\tau_1\} = \frac{2T^2}{3n^2}$; $E\{\tau_n\} = T$, $\mathrm{var}\{\tau_n\} = \frac{2T^2}{3n}$.
(iv) For each k ≥ 1 there are constants ck > 0, Ck > 0 such that
$$E\{\tau_1^k\} = \frac{c_k T^k}{n^k}\,, \qquad E\{|\tau_n - T|^k\} \le C_k\,\frac{T^k}{n^{k/2}}\,.$$
Proof. (i) follows by Brownian scaling, and the moment generating function
is well-known for λ < 0 [4]; it is not difficult to extend to λ > 0. Then (ii) is
well-known, see e.g. [9], and (iii) is an easy consequence of (i).
For (iv), notice that τ1 has finite exponential moments, so that the moments
in question are finite. The kth moment of τ1 is determined by Brownian scaling:
$c_k = E\{\nu^k\}$. To get the kth central moment, note that by (ii) we can write
τn − T as a sum of n i.i.d. copies of τ1 − T/n, say τn − T = η1 + · · · + ηn. The
ηj have mean zero, so by Burkholder’s and Hölder’s inequalities in that order,
$$E\{(\tau_n - T)^k\} \le C_k\,E\Bigl\{\Bigl(\sum_{j=1}^n \eta_j^2\Bigr)^{k/2}\Bigr\} \le C_k\,n^{\frac k2 - 1}\sum_{j=1}^n E\{|\eta_j|^k\}\,.$$
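The first moments in (iii) can be checked directly from the moment generating function (36) by numerical differentiation at λ = 0. A small illustrative sketch (the negative-λ branch uses cos(iy) = cosh(y)):

```python
import math

def F1(lam, T, n):
    """MGF of tau_1 from (36): E{exp(lam * tau_1)} = 1 / cos(sqrt(2 lam T / n))."""
    z = 2.0 * lam * T / n
    # for lam < 0 the cosine of an imaginary argument is a cosh
    return 1.0 / math.cos(math.sqrt(z)) if z >= 0 else 1.0 / math.cosh(math.sqrt(-z))

if __name__ == "__main__":
    T, n, eps = 1.0, 10.0, 1e-5
    mean = (F1(eps, T, n) - F1(-eps, T, n)) / (2.0 * eps)
    second = (F1(eps, T, n) - 2.0 * F1(0.0, T, n) + F1(-eps, T, n)) / eps ** 2
    print(mean, T / n)                       # E{tau_1} = T/n
    print(second - mean ** 2, 2 * T ** 2 / (3 * n ** 2))  # var{tau_1} = 2T^2/(3n^2)
```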
(iv) For k > 1, there exists a constant Ck such that $E\{(J-n)^k\} \le C_k\,n^{k/2}$.
Proof. Set $\eta_j = \tau_j - \tau_{j-1} - \frac Tn$, j = 1, 2, ..., and put
$$M_j = \sum_{i=1}^j \eta_i = \tau_j - j\,E\{\tau_1\}\,.$$
Then (Mj ) is a martingale. Apply the stopping theorem to the bounded stopping
time J ∧ N and let N → ∞ to see that 0 = E{MJ } = E{τJ } − E{J}E{τ1 } , so
that
$$E\{J\} = \frac{E\{\tau_J\}}{E\{\tau_1\}} = \frac nT\,E\{\tau_J\}\,. \qquad (37)$$
Now τJ > T, so to find its expectation, notice that, as in Proposition 9.1, τJ
will either be the first hit of $N_h^e$ after T, if L is odd (see (18)), or it will be the
first hit of $N_h^e$ after the first hit of $N_h^o$ after T, if L is even. The expected time for
Brownian motion to reach the endpoints of an interval is well known: if X0 = x ∈
(a, b), the expected time for X to leave (a, b) is $\sigma^{-2}(x-a)(b-x)$. Let dist(x, A)
be the shortest distance from x to the set A. If XT = x, the expected additional
time to reach $N_h^e$ is $\sigma^{-2}(h^2 - \mathrm{dist}^2(x, N_h^o))$, while the expected additional time
to reach $N_h^o$ is $\sigma^{-2}(h^2 - \mathrm{dist}^2(x, N_h^e))$. Once at $N_h^o$, the expected time to reach
$N_h^e$ from there is $T/n = h^2/\sigma^2$. Now by (20), $P\{L \text{ is even}\mid X_T = x\} = q(x) = \mathrm{dist}(x, N_h^o)/h + O(h)$, and $P\{L \text{ is odd}\mid X_T = x\} = \mathrm{dist}(x, N_h^e)/h + O(h)$.
Thus, as L is conditionally independent of {XT +t , t ≥ 0} given XT , we have
E{τJ − T | XT = x} = P {L is odd | XT = x}E{τJ − T | XT = x, L is odd} +
P {L is even | XT = x}E{τJ − T | XT = x, L is even}, so that if p(x) is the
density of XT ,
$$E\{\tau_J - T\} = \int_{-\infty}^{\infty}\sigma^{-2}p(x)\bigl(h^2 - \mathrm{dist}^2(x, N_h^o)\bigr)\bigl(h^{-1}\mathrm{dist}(x, N_h^e) + O(h)\bigr)\,dx$$
$$\qquad + \int_{-\infty}^{\infty}\sigma^{-2}p(x)\bigl(2h^2 - \mathrm{dist}^2(x, N_h^e)\bigr)\bigl(h^{-1}\mathrm{dist}(x, N_h^o) + O(h)\bigr)\,dx\,. \qquad (38)$$
−∞
R∞ P R xk +h
Now let xk = (2k + 1)h and write −∞ = k (1/2h) xk −h 2h. Write
p(x) = p(xk )(1 + O(h)) on the interval (xk − h, xk + h). We can then do the
integrals explicitly:
X p(xk ) Z xk +h
(1 + O(h))σ −2 h2 − dist2 (x, Nho ) (dist (x, Nhe )/h + O(h)) dx
2h xk −h
k
5 −2 2 X 5 T
= σ h p(xk ) 2h + O(h3 ) ∼ (39)
12 12 n
k
R
since the Riemann sum approximates p(x) dx = 1. The other integral is similar,
and gives 11 T 4T 3
12 n , so we see E{τJ − T } = 3 n + O(h ) , which implies (i) and (ii).
$$E\Bigl\{\Bigl(\frac{|J-n|}{\sqrt n}\Bigr)^k\Bigr\} \le -4\int_0^{\infty} y^k\,d\bigl(e^{-\frac{3y^2}{4}}\bigr) \stackrel{\mathrm{def}}{=} C_k\,,$$
which proves the assertion.
We will need to control the tails of the distributions of τn and J. The
following proposition is the key.
Proposition 11.3 Let (ξn) be a sequence of reals. Suppose m = nξn, where
$\sqrt n\,\xi_n \to \infty$ as n → ∞. Then, as n → ∞,
$$P\Bigl\{\sqrt n\,\Bigl|\tau_m - \frac mn T\Bigr| > \rho\Bigr\} \le 2\,e^{-\frac{3\rho^2}{4T^2\xi_n}}\Bigl(1 + O\Bigl(\frac{1}{\xi_n\sqrt n}\Bigr)\Bigr)\,. \qquad (40)$$
n ξn n
n√ m o √
def m
Pn = P n τm − T > ρ ≤ e−λρ E{eλ n(τm − n T ) }
n
4λ2 T 2 ξn
λT 2 !
−√ nξn − x2 x4
e n e
= e−λρ q = e−λρ .
cos 2λT√ cos x
n
q
2λT 3ρ
where x = √
n
. Take logs and choose λ = 2ξn T 2 to see that
ρ2 3 9 x2
log Pn ≤ − − + log cos x .
ξn T 2 2 x4 2
√
Expand log cos x near x = 0 and note that x2 = O(1/ξn n) = o(1), so this is
3ρ2 1
=− 2
+ O( √ ) . (41)
4ξn T ξn n
The other direction is similar. For λ > 0, let
$$P_n \stackrel{\mathrm{def}}{=} P\Bigl\{\sqrt n\Bigl(\tau_m - \frac mn T\Bigr) < -\rho\Bigr\} \le e^{-\lambda\rho}\,E\bigl\{e^{-\lambda\sqrt n(\tau_m - \frac mn T)}\bigr\} = e^{-\lambda\rho}\Biggl(\frac{e^{\frac{\lambda T}{\sqrt n}}}{\cosh\sqrt{\frac{2\lambda T}{\sqrt n}}}\Biggr)^{n\xi_n}\,,$$
which differs from the above only in that cosh replaces the cosine. Exactly the
same manipulations show that Pn is again bounded by (41), and the conclusion
follows.
$$\lim_{n\to\infty} n^p\,E\bigl\{g(X_{\tau_n});\,|\tau_n - T| > n^{-\frac14+\epsilon}\bigr\} = 0\,; \qquad (43)$$
$$\lim_{n\to\infty} n^p\,E\bigl\{g(X_{\tau_J});\,|J - n| > n^{\frac12+\epsilon}\bigr\} = 0\,. \qquad (44)$$
Proof. Let ξ > 1. Then $P\{J > n\xi\} = P\{\tau_{n\xi} < T\} \le P\{\sqrt n\,|\tau_{n\xi} - \xi T| > \sqrt n\,(\xi-1)T\}$. By (40), $P\{J - n > n(\xi-1)\} \le e^{-\frac{3n(\xi-1)^2}{4\xi}}$. Take $y = (\xi-1)\sqrt n$ to
see that $P\{J - n > y\sqrt n\} \le 2e^{-\frac{3y^2}{4(1+y/\sqrt n)}}$. Similarly, for ξ < 1, $P\{J < n\xi\} = P\{\tau_{n\xi} > T\} \le P\{\sqrt n\,|\tau_{n\xi} - \xi T| > \sqrt n\,(1-\xi)T\}$. Use (40) to get the same bound
for $P\{J - n < -y\sqrt n\}$, and add to get (42).
Next, Xτn and τn are independent and $|g(x)| \le Ae^{a|x|}$ for some A and a, so
$$\bigl|E\{g(X_{\tau_n});\,|\tau_n - T| > n^{-\frac14+\epsilon}\}\bigr| \le A\,E\{e^{a|X_{\tau_n}|}\}\,P\{|\tau_n - T| > n^{-\frac14+\epsilon}\}\,.$$
Xτn is binomial, so we use its moment generating function to see that
$E\{e^{a|X_{\tau_n}|}\} \le 2e^{a\sigma\sqrt{nT}}$. Combine this with the bound (40) on the tails of τn
to see (43).

The second assertion follows from Corollary 11.4, once we notice that in any
case $|X_{\tau_J} - X_T| \le 4h$, so that $|E\{g(X_{\tau_J})\}| \le Ae^{a|X_{\tau_J}|} \le Ae^{a|X_T| + 4ah}$.
$$P_n(x) = \frac{n!\,2^{-n}}{\bigl(\frac{n+x}{2}\bigr)!\,\bigl(\frac{n-x}{2}\bigr)!}\,, \qquad (45)$$
which is the probability of taking ½(n+x) positive steps and ½(n−x) negative
steps out of n total steps. Now let
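Formula (45) can be checked mechanically against the one-step recursion of the simple symmetric random walk; a small illustrative sketch:

```python
from math import comb

def P(n, x):
    """(45): probability that a simple symmetric random walk started at 0 is at x
    after n steps (zero unless n and x have the same parity and |x| <= n)."""
    if (n + x) % 2 or abs(x) > n:
        return 0.0
    return comb(n, (n + x) // 2) * 0.5 ** n

if __name__ == "__main__":
    # one-step recursion: P_n(x) = (P_{n-1}(x-1) + P_{n-1}(x+1)) / 2
    for n in range(1, 12):
        for x in range(-n, n + 1):
            assert abs(P(n, x) - 0.5 * (P(n - 1, x - 1) + P(n - 1, x + 1))) < 1e-15
    print(P(4, 0))
```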
Proposition 11.5 Let n, k, and x be even integers with $\max(|k|,|x|) \le n^{3/5}$.
Then
$$R(n,k,x) = 1 - \frac{k}{2n} + \frac{3k^2+4kx^2}{8n^2} - \frac{3k^2x^2}{4n^3} + \frac{k^2x^4}{8n^4} + Q_3 + O(n^{-3/2})\,,$$
where Q3 is a sum of monomials of effective order at most −3/2.
Proof. We need Stirling’s formula in the form $n! = \sqrt{2\pi}\,n^{n+\frac12}\,e^{-n + \frac{1}{12n} + O(n^{-3})}$.
The O(n⁻³) term is uniform in the sense that there is an a such that for all n it
is between −a/n³ and a/n³.
Write R(n, k, x) in terms of factorials and use Stirling’s formula on each. We
can write R(n, k, x) = R1R2, where R2 comes from the factors $e^{\frac{1}{12n}+O(n^{-3})}$. Let
m = n/2, ξ = k/n, and η = x/n. Then
$$\begin{aligned}
\log R_1 = {}&\bigl(2m(1+\xi)+\tfrac12\bigr)\log(1+\xi) + \bigl(m(1+\eta)+\tfrac12\bigr)\log(1+\eta) + \bigl(m(1-\eta)+\tfrac12\bigr)\log(1-\eta)\\
&- \bigl(m(1+\xi+\eta)+\tfrac12\bigr)\log(1+\xi+\eta) - \bigl(m(1+\xi-\eta)+\tfrac12\bigr)\log(1+\xi-\eta)
\end{aligned}$$
$$\log R_2 = \frac{1}{24m}\Bigl(\frac{1}{1+\xi} + \frac{2}{1+\eta} + \frac{2}{1-\eta} - 1 - \frac{2}{1+\xi+\eta} - \frac{2}{1+\xi-\eta}\Bigr)\,.$$
Notice that errors in log R are of the same order as those of R; that is, if
rn → r > 0 and | log rn − log r| < cn−p , then |rn − r| < 2rcn−p for large n. Thus
it is enough to determine log R up to terms of order 1/n.
Since |k| and |x| are smaller than n3/5 , ξ and η are smaller than n−2/5 , and
an easy calculation shows that
$$\log R_2 = \frac{\xi}{4n} + O(n^{-3/2})\,.$$
Expand log R1 in a power series in ξ and η. To see how many terms we need
to keep, note that the coefficients may be O(n). If we include terms up to order
6 in ξ and η, the remainder will be o(n⁻³ᐟ²); those making an O(1/n) or larger
contribution will be of the form $\xi^p\eta^q$ with p + q ≤ 2, and $m\xi^p\eta^q$ with p + q ≤ 5,
so that
$$\log R_1 = -\frac12\xi + \frac14\xi^2 + m\eta^2\xi - m\eta^2\xi^2 + m\eta^2\xi^3 + \frac12 m\eta^4\xi + \hat S(m,\xi,\eta) + o(n^{-3/2})\,,$$
where Ŝ(m, ξ, η) is a polynomial whose terms are all o(1/n). Notice that log R2
is also o(1/n), so we can include it as part of Ŝ. In terms of n, k, and x,
$$\log R = -\frac{k}{2n} + \frac{k^2+2kx^2}{4n^2} - \frac{k^2x^2}{2n^3} + \frac{2k^3x^2+kx^4}{4n^4} + S(n,k,x) + o(n^{-3/2}) \stackrel{\mathrm{def}}{=} Q(n,k,x) + S(n,k,x) + o(n^{-3/2})\,,$$
where S(n, k, x) is a sum of monomials, each of which is o(1/n). The largest
term in Q is $kx^2/2n^2 \le n^{-1/5}$, so that we can write
$$R = e^{Q+S}\bigl(1 + o(n^{-3/2})\bigr) = \Bigl(\sum_{p=0}^{8}\frac{(Q+S)^p}{p!}\Bigr)\bigl(1 + o(n^{-3/2})\bigr) \stackrel{\mathrm{def}}{=} \bigl(1 + o(n^{-3/2})\bigr)\,Q_1\,.$$
Check the effective order of the terms in the polynomial Q1: we see that
$$\hat O\Bigl(\frac{k}{2n}\Bigr) = \hat O\Bigl(\frac{kx^2}{2n^2}\Bigr) = -\frac12\,, \qquad \hat O\Bigl(\frac{k^2}{4n^2}\Bigr) = \hat O\Bigl(\frac{k^2x^2}{2n^3}\Bigr) = -1\,,$$
and the other two terms have effective order −3/2. All terms in S have effective
order less than −1, and hence less than or equal to −3/2, since all effective
orders are multiples of 1/2. Now the effective order of a product of monomials
is the sum of the effective orders, so that the only terms in Q1 of effective order
at least −1 are those in Q and the three terms from ½Q² which come from the
squares and products of the terms of effective order −1/2, namely
$$\frac{k^2}{8n^2} + \frac{k^2x^4}{8n^4} - \frac{k^2x^2}{4n^3}\,.$$
All other terms have effective orders at most −3/2. Thus define
$$Q_2 \stackrel{\mathrm{def}}{=} 1 - \frac{k}{2n} + \frac{3k^2+4kx^2}{8n^2} - \frac{3k^2x^2}{4n^3} + \frac{k^2x^4}{8n^4}\,,$$
and let Q3 be all the other terms in Q1 . Then R = (Q2 +Q3 )(1+o(n−3/2 )) where
all terms in Q3 have effective order at most −3/2. This proves the proposition.
$$E^Q\{g(X_T)\} = E^P\{f(S_T)\}$$
$$E^Q\{g''(X_T)\} = E^P\Bigl\{S_T^2 f''(S_T) + \frac14 f(S_T)\Bigr\} \qquad (46)$$
$$E^Q\{X_T^k\,g(X_T)\} = E^P\bigl\{(\log(\tilde S_T/s_0))^k f(S_T)\bigr\}\,.$$
References

[1] Cox, J.C., S.A. Ross, and M. Rubinstein, Option pricing: a simplified approach, Journal of Financial Economics 7 (1979), 229–263.

[2] Diener, F. and M. Diener, Asymptotics of the price oscillations of a vanilla option in a tree model (preprint).

[3] Diener, F. and M. Diener, Asymptotics of the price of a barrier option in a tree model (preprint).