Tree Scheme
John B. Walsh1
Department of Mathematics, University of British Columbia, Vancouver B.C. V6T
1Y4, Canada
(e-mail: walsh@math.ubc.ca)
1 Introduction
The binomial tree scheme was introduced by Cox, Ross, and Rubinstein [1] as a
simplification of the Black-Scholes model for valuing options, and it is a popular
and practical way to evaluate various contingent claims. Much of its usefulness
stems from the fact that it mimics the real-time development of the stock price,
making it easy to adapt it to the computation of American and other options.
From another point of view, however, it is simply a numerical method for solving
initial-value problems for a certain partial differential equation. As such, it is
known to be of first order [6], [7], [2], [3], at least for standard options. That is,
the error varies inversely with the number of time steps.
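The first-order rate is easy to observe numerically. The following is a minimal sketch (not from the paper; the parameter values are illustrative) that prices a European call on a standard Cox-Ross-Rubinstein tree and compares it with the Black-Scholes value; the scaled error n·|error| stays roughly bounded, which is what first-order convergence means.

```python
import math

def bs_call(s0, K, r, sigma, T):
    """Black-Scholes European call price, with the normal cdf built from math.erf."""
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

def crr_call(s0, K, r, sigma, T, n):
    """Cox-Ross-Rubinstein binomial price of a European call with n time steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1.0 / u                              # down factor
    q = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):                       # backward induction
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

if __name__ == "__main__":
    s0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
    exact = bs_call(s0, K, r, sigma, T)
    for n in (50, 100, 200, 400):
        print(n, n * abs(crr_call(s0, K, r, sigma, T, n) - exact))
```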
A key point in typical financial problems is that the data is not smooth. For
instance, if the stock value at term is x, the payoff for the European call option
is of the form f(x) = (x − K)+, which has a discontinuous derivative. Others,
such as digital and barrier options, have discontinuous payoffs. This leads to
an apparent irregularity of convergence. It is possible, for example, to halve
the step size and actually increase the error. This phenomenon comes from the
discontinuity in the derivative, and makes it quite delicate to apply methods such
as Richardson extrapolation and other higher-order methods which depend on
the existence of higher-order derivatives in the data.

[1] I would like to thank O. Walsh for suggesting this problem and for many helpful conversations.
The aim of this paper is to study the convergence closely. We will determine
the exact rate of convergence, and we will even find explicit expressions for the
constants in the leading error term.
Merely knowing the form of the error allows us to modify the Richardson
extrapolation method to get a scheme of order 3/2.
We will also see that the delta, which determines the hedging strategy, can
be computed from the tree scheme, and that it converges at exactly the same
rate.
The argument is purely probabilistic. The Black-Scholes model treats the
stock price as a diffusion process, while the binomial scheme treats it as a Markov
chain. We use a procedure called Skorokhod embedding to embed the Markov
chain in the diffusion process. This allows a close comparison of the two, and
an accurate evaluation of the error. This was done in a slightly different way by
L.C.G. Rogers and E.J. Stapleton [9], who used it to speed up the convergence of
the binomial tree scheme.
This embedding lets us split the error into two relatively easily analyzed
parts, one which depends on the global behavior of the data, and the other
which depends on its local properties.
2 Embeddings
The stock price (St ) in the Black-Scholes model is a logarithmic Brownian mo-
tion, and their famous hedging argument tells us that in order to calculate
option prices, the discounted stock price $\tilde S_t \stackrel{\mathrm{def}}{=} e^{-rt} S_t$ should be a martingale.
This hedging argument does not depend on the fact that the stock price is a
logarithmic Brownian motion, but only on the fact that the market is complete:
the stock prices in other complete-market models should also be martingales,
at least for the purposes of pricing options. Even in incomplete markets, it is
common to use a martingale measure to calculate option prices, at least as a
first approximation.
It is a general fact [8] that any martingale can be embedded in a Brownian
motion with the same initial value by Skorokhod embedding, and a strictly
positive martingale can be embedded in a logarithmic Brownian motion. That
means that one can embed the discounted stock price from other single-stock
models in the discounted Black-Scholes stock price. Suppose for example, that
Yk , k = 0, 1, 2, . . . is the stock price in a discrete model, and that Y0 = S0 .
Under the martingale measure, the discounted stock price $\tilde Y_k \stackrel{\mathrm{def}}{=} e^{-kr\delta}\, Y_k$ is a
martingale. Then there are (possibly randomized) stopping times 0 = τ0 < τ1 <
. . . for St such that the processes {Ỹk , k = 0, 1, 2, . . . } and {S̃τk , k = 0, 1, 2, . . . }
have exactly the same distribution. Thus the process (Ỹk ) is embedded in S̃t : Ỹk
is just the process S̃t sampled at discrete times. However, the times are random,
not fixed. This is what we mean by embedding.
We note that this embedding works for a single-stock market, but not in gen-
eral for a multi-stock market, unless the stocks evolve independently, or nearly
so.
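This embedding is easy to see in simulation. The sketch below is an illustration, not the paper's construction: it runs a crude Euler discretization of the discounted log-price (drift −σ²/2 under the martingale measure) and samples the price each time it has moved by the factor a = e^h or a^{-1}. The sampled values form a binomial chain, and the empirical up-move frequency is close to 1/(1+a), the martingale transition probability of Section 3. All parameter values are arbitrary.

```python
import math, random

def embed_chain(s0=1.0, sigma=0.2, h=0.1, dt=1e-3, n_moves=500, seed=1):
    """Sample a simulated discounted Black-Scholes price at the successive
    (random) times it first moves by the factor a = e^h or 1/a.  Returns the
    embedded binomial chain and the fraction of up-moves."""
    random.seed(seed)
    a = math.exp(h)
    x = 0.0                        # x_t = log(S~_t / s0)
    chain, ups = [s0], 0
    for _ in range(n_moves):
        ref = x
        while abs(x - ref) < h:    # run the diffusion until it moves by +-h
            x += -0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        x = ref + (h if x > ref else -h)   # snap to the crossed barrier
        up = x > ref
        ups += up
        chain.append(chain[-1] * (a if up else 1.0 / a))
    return chain, ups / n_moves

if __name__ == "__main__":
    chain, up_frac = embed_chain()
    print("empirical P(up):", up_frac, "  1/(1+a):", 1.0 / (1.0 + math.exp(0.1)))
```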
Let f be a positive function. Suppose there is a contingent claim, such as a
European option, which pays off an amount f (ST ) at time T if the stock price at
time T is ST . If S0 = s0 , its value at time zero is V (s0 , 0) ≡ e−rT E{f (ST )} . On
the other hand, if T = nδ, the same contingent claim for the discrete model pays
f(Yn) at maturity and has a value at time zero of $U(s_0,0) \stackrel{\mathrm{def}}{=} e^{-rT}E\{f(Y_n)\}$.
But $Y_n = e^{rT}\tilde Y_n$ has the same distribution as $e^{rT}\tilde S_{\tau_n}$, while $S_T = e^{rT}\tilde S_T$. Thus
$U(s_0,0) = e^{-rT}E\{f(e^{rT}\tilde S_{\tau_n})\}$, and the difference between the two values is

$$P\bigl\{\tilde Y_{j+1} = a\tilde Y_j \mid \tilde Y_j\bigr\} \stackrel{\mathrm{def}}{=} \frac{1}{a+1} = q\,, \qquad P\bigl\{\tilde Y_{j+1} = a^{-1}\tilde Y_j \mid \tilde Y_j\bigr\} \stackrel{\mathrm{def}}{=} 1-q\,.$$
$$U(\tilde Y_k, k) \stackrel{\mathrm{def}}{=} e^{-r(T-k\delta)}E\{f(Y_n)\mid Y_k\} = e^{-r(T-k\delta)}E\{f(e^{rT}\tilde Y_n)\mid Y_k\}\,. \qquad (2)$$

Let $u(j,k) = U(a^j, k)$. Then u is the solution of the difference scheme
$$u(j,k) = e^{-r\delta}\bigl[q\,u(j+1,k+1) + (1-q)\,u(j-1,k+1)\bigr],\ k < n\,; \qquad u(j,n) = f(e^{rT}a^j),\ j\in\mathbb Z\,. \qquad (3)$$
Under its own martingale measure, the corresponding Black-Scholes model
will have a stock price given by
$$S_t = S_0\, e^{\sigma W_t + (r-\frac12\sigma^2)t}\,, \qquad t\ge 0\,, \qquad (4)$$
4 Results
We say that a function f is piecewise C (k) if f, f 0 , . . . , f (k) have at most finitely
many discontinuities and no oscillatory discontinuities. We will treat the follow-
ing class of possible payoff functions.
Let us introduce some notation which will be in force for the remainder of
the paper. Let f ∈ K and consider a contingent claim which pays an amount
f (s) at a fixed time T > 0 if the stock price at time T is s. Let n be the number
of time steps in the discrete model, so that the time-step is δ = T/n. The space
step h is then $h = \sigma\sqrt{T/n}$.
The error depends on the discontinuities of f and f 0 , and on the relation of
these discontinuities to the lattice points.
This is a special case of Theorem 4.2 below, so there is no need for a separate
proof. We collect (10), and Propositions 9.5, 9.6 and 9.7, and use (46) to express
them in terms of f instead of g. We get:
$$\begin{aligned}
E^{\mathrm{tot}}(f) = \frac{e^{-rT}}{n}\Biggl[&\Bigl(\frac{5}{12} + \frac{\sigma^2 T}{6} + \frac{\sigma^4 T^2}{192}\Bigr)E\{f(S_T)\} - \frac{1}{6\sigma^2 T}\,E\bigl\{(\log(\tilde S_T/s_0))^2 f(S_T)\bigr\}\\
&- \frac{1}{12\sigma^4 T^2}\,E\bigl\{(\log(\tilde S_T/s_0))^4 f(S_T)\bigr\} + \frac{2}{3}\sigma^2 T\,E\bigl\{S_T^2 f''(S_T)\bigr\}\\
&+ \sigma^2 T\sum_i \Bigl(s_i\,\Delta f'(s_i) - \frac12\Delta f(s_i)\Bigr)\Bigl(\frac13 + 2\theta(\tilde s_i/s_0)\bigl(1-\theta(\tilde s_i/s_0)\bigr)\Bigr)\hat p(\log(\tilde s_i/s_0))\\
&- \frac13\sum_{i:\log(\tilde s_i/s_0)\in N_h^e}\log(\tilde s_i/s_0)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0))\\
&+ \frac16\sum_{i:\log(\tilde s_i/s_0)\in N_h^o}\log(\tilde s_i/s_0)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0))\Biggr]\\
&+ e^{-rT}\,\frac{\sigma\sqrt T}{\sqrt n}\sum_{i:\log(\tilde s_i/s_0)\notin h\mathbb Z}\bigl(2\theta(\tilde s_i/s_0)-1\bigr)\,\Delta f(s_i)\,\hat p(\log(\tilde s_i/s_0)) + O\Bigl(\frac{1}{n^{3/2}}\Bigr) \qquad (8)
\end{aligned}$$
where the expectations are taken with respect to the martingale measure.
Remark 4.3 We have expressed the errors in terms of E{f (ST )}. However, we
can also express them in terms of erT S̃τn , and it might be better to do so, since
this is exactly what the binomial scheme computes. Indeed, the theorem tells us
that the expectations of f (ST ) and f (erT S̃τn ) only differ by O(1/n), and they
occur as coefficients multiplying 1/n in (8) so one can replace ST by erT S̃τn in
(8) and the result will only change by O(n−2 ), so these formulas remain correct.
So in fact ST and erT S̃τn are interchangeable in (8); and, for the same reason,
both are interchangeable with Sτn .
The delta, which determines the hedging strategy in the Black-Scholes model,
can also be estimated in the tree scheme, and its estimate also converges with
order one. (See Section 10.) Let $\check\theta(s) = \mathrm{frac}\bigl(\frac{h+\log s}{2h}\bigr)$.
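Concretely, θ̌ is just a fractional part. A small sketch (the parameter values are illustrative; here s stands for the ratio K̃/s₀):

```python
import math

def theta_check(s, h):
    """theta-check(s) = frac((h + log s) / (2h)): the fractional position of
    log s, offset by h, within the even lattice 2hZ."""
    return ((h + math.log(s)) / (2.0 * h)) % 1.0

if __name__ == "__main__":
    # h = sigma*sqrt(T/n) changes with n, so theta-check wobbles as n varies
    sigma, T, ratio = 0.2, 1.0, math.exp(-0.05) * 95.0 / 100.0
    for n in (100, 144, 196, 256):
        h = sigma * math.sqrt(T / n)
        print(n, round(theta_check(ratio, h), 3))
```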
Corollary 4.4 Suppose that f is continuous and both f and f 0 are in K. The
symmetric estimate (35) of the delta converges with order one. For a call or put
option with strike price K, there are constants A and B such that the error at
time 0 is of the form
$$\frac1n\Bigl(A + B\,\check\theta(\tilde K)\bigl(1-\check\theta(\tilde K)\bigr)\Bigr) + o(n^{-1})\,. \qquad (9)$$
2. The striking fact about the tree scheme’s convergence is that, even when
restricted to even values of n, the error goes to zero at the general rate of O(1/n),
but “with a wobble”: there are constants c1 < c2 for which $c_1/n < E^{\mathrm{tot}}(f) < c_2/n$,
and the error fluctuates quasi-periodically between these bounds.
The reason is clear from (8). For example, a typical European call with strike
price K pays off f (x) = (x − K)+ and (8) simplifies: the last three series vanish,
and the first reduces to the single term
$$\sigma^2 T\,K\Bigl(\frac13 + 2\theta(1-\theta)\Bigr)\hat p(\log(\tilde s/s_0))\,.$$
The quantity to focus on is θ. It is in effect the fractional distance (in log scale)
from K̃ to the nearest even lattice point. In log scale, the lattice points are
multiples of $\sigma\sqrt{T/n}$, so the whole lattice changes as n changes. This means
that θ changes with n too. It can vary from 0 to 1, so this term can vary by a
factor of nearly three. It is not the only error term, but it is important, and it
is why there are cases where one can double the number of steps and more than
double the error at the same time.
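The wobble is easy to reproduce numerically. The sketch below uses the standard raw-price CRR lattice (the paper's lattice carries the discounted price, but, as remark 4 below notes, the behavior of the two is virtually identical); the parameters are illustrative. The scaled error n·|error| stays bounded but oscillates as n varies:

```python
import math

def bs_call(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d1 - sigma * math.sqrt(T))

def crr_call(s0, K, r, sigma, T, n):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt)); d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)
    disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

if __name__ == "__main__":
    s0, K, r, sigma, T = 100.0, 95.0, 0.05, 0.2, 1.0
    exact = bs_call(s0, K, r, sigma, T)
    scaled = [n * abs(crr_call(s0, K, r, sigma, T, n) - exact)
              for n in range(50, 300, 4)]   # even n only, as in the text
    print("n*|error| ranges over [%.4f, %.4f]" % (min(scaled), max(scaled)))
```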
3. The coefficients in Theorem 4.2 are rather complex, and Corollary 4.1 is
handier for vanilla options. It shows that one can make a Richardson-like ex-
trapolation to increase the order of convergence. If we run the tree for three
values of n which give different values of θ, we can then write down (7) for the
three, solve for the coefficients A and B, and subtract off the first order error
terms, giving us potentially a scheme of order 3/2. In fact, one could do this
cheaply: use two runs at roughly the square root of n, and then one at n. This
might be of interest when using the scheme to value American options.
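A crude version of this idea can be sketched in code. Below, the error model price(n) ≈ V + (A + B·θ(1−θ))/n is fitted by least squares over several tree runs, and the fitted V is taken as the extrapolated price. The θ used here (the fractional lattice position of the discounted log-strike), the raw-price CRR lattice, and the choice of n values are illustrative assumptions rather than the paper's exact procedure, and this version fits the three constants jointly instead of following the three-run scheme literally.

```python
import math

def bs_call(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * Phi(d1) - K * math.exp(-r * T) * Phi(d1 - sigma * math.sqrt(T))

def crr_call(s0, K, r, sigma, T, n):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt)); d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d); disc = math.exp(-r * dt)
    v = [max(s0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):
        v = [disc * (q * v[j + 1] + (1 - q) * v[j]) for j in range(len(v) - 1)]
    return v[0]

def wobble_weight(s0, K, r, sigma, T, n):
    """theta*(1-theta), with theta the fractional lattice position of the
    discounted log-strike (an illustrative stand-in for the paper's theta)."""
    h = sigma * math.sqrt(T / n)
    th = ((h + math.log(K * math.exp(-r * T) / s0)) / (2.0 * h)) % 1.0
    return th * (1.0 - th)

def solve3(M, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(3):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[c])]
    return [A[i][3] / A[i][i] for i in range(3)]

def extrapolate(s0, K, r, sigma, T, ns=tuple(range(60, 201, 10))):
    """Least-squares fit of price(n) = V + A/n + B*w(n)/n; returns V."""
    rows = [[1.0, 1.0 / n, wobble_weight(s0, K, r, sigma, T, n) / n] for n in ns]
    ys = [crr_call(s0, K, r, sigma, T, n) for n in ns]
    XtX = [[sum(r_[i] * r_[j] for r_ in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r_[i] * y for r_, y in zip(rows, ys)) for i in range(3)]
    return solve3(XtX, Xty)[0]

if __name__ == "__main__":
    args = (100.0, 95.0, 0.05, 0.2, 1.0)
    print("extrapolated error:", abs(extrapolate(*args) - bs_call(*args)))
```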
4. It is usually the raw stock price, not the discounted price, which evolves on
the lattice. However, our numerical studies have shown that the behavior of the
two schemes is virtually identical: to adapt Corollary 4.1 to the evolution of
the raw price, just replace the discounted strike price K̃ by the raw strike price
K in the definition of θ. We have therefore used the discounted price for its
convenience in the embedding.
5. From a purely probabilistic point of view, Theorem 4.2 is a rate-of-convergence
result for a central limit theorem for Bernoulli random variables. If we take f to
be the indicator function of (−∞, z], we recover the Berry-Esseen bound. (We
thank the referee for pointing this out.)
can be defined explicitly. Define stopping times τ0, τ1, τ2, ... by induction: τ0 = 0, and τk+1 is the first time after τk at which $\tilde S_t$ equals either $a\tilde S_{\tau_k}$ or $a^{-1}\tilde S_{\tau_k}$.
As S̃t is a martingale, so is S̃τ0 , S̃τ1 , . . . Since S̃τk+1 can only equal aS̃τk or
−1
a S̃τk , we must have
$$P\{\tilde S_{\tau_{k+1}} = a\tilde S_{\tau_k} \mid \tilde S_{\tau_k}\} = \frac{1}{a+1}\,, \qquad P\{\tilde S_{\tau_{k+1}} = a^{-1}\tilde S_{\tau_k} \mid \tilde S_{\tau_k}\} = \frac{a}{a+1}\,.$$
It follows that (S̃τk ) is a Markov chain with the same transition probabilities
as (Ỹk ); since S̃τ0 = Y0 = 1, the two are identical processes. It follows that
the error in the binomial scheme (considered as an approximation to the Black-
Scholes model) is given by
$$E^{\mathrm{tot}}(f) \stackrel{\mathrm{def}}{=} u(1,0) - v(1,0) = e^{-rT}E\bigl\{f(e^{rT}\tilde S_{\tau_n}) - f(e^{rT}\tilde S_T)\bigr\}\,. \qquad (10)$$
Here is a quick heuristic argument to show that the convergence is first order.
Expand E{f (ST +s )} in a Taylor series. It is
Next, we make a Girsanov transformation to remove the drift of Xt . Let ξ
be the maximum of T , τn , and τJ , where τJ is defined below—the value of ξ is
not important, so long as it is larger than the values of t we work with—and set
$$dQ = e^{\frac12 X_\xi + \frac18\sigma^2\xi}\, dP\,.$$
By Girsanov’s Theorem [4], $\{\tfrac1\sigma X_t,\ 0\le t\le\xi\}$ is a standard Brownian motion
on (Ω, F , Q). We will call Q the Brownian measure to distinguish it from the
martingale measure P . We will do all our calculations in terms of Q, and then
translate the results back to P at the very end. Under the measure Q, Xt is
a Brownian motion, and (Xτj ) is a simple symmetric random walk on hZ. It
alternates between even and odd multiples of h. To smooth this out, we will
restrict ourselves to even values of j and n.
Thus let n = 2m for some integer m and define
$$g(x) \stackrel{\mathrm{def}}{=} f(s_0 e^{x+rT})\, e^{-\frac x2 - \frac{\sigma^2 T}{8}}\,.$$
Then
$$E^{\mathrm{tot}}(f) = e^{-rT}E^Q\Bigl\{\bigl(f(s_0 e^{X_{\tau_n}+rT}) - f(s_0 e^{X_T+rT})\bigr)\,e^{-\frac12 X_\xi - \frac18\sigma^2\xi}\Bigr\}\,. \qquad (13)$$
Now $e^{-\frac12 X_t - \sigma^2 t/8}$ is a Q-martingale, so as $\tau_n\le\xi$,
$$E^P\{f(S_{\tau_n})\} = E^Q\bigl\{f(S_{\tau_n})\,e^{-\frac12 X_\xi - \sigma^2\xi/8}\bigr\} = E^Q\bigl\{f(S_{\tau_n})\,e^{-\frac12 X_{\tau_n} - \sigma^2\tau_n/8}\bigr\} = E^Q\bigl\{g(X_{\tau_n})\,e^{-\sigma^2(\tau_n-T)/8}\bigr\}$$
and
$$E^P\{f(S_T)\} = E^Q\{g(X_T)\}\,.$$
Thus
$$E^{\mathrm{tot}}(f) = e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_T)\} + e^{-rT}E^Q\Bigl\{g(X_{\tau_n})\Bigl(e^{-\frac{\sigma^2}{8}(\tau_n-T)} - 1\Bigr)\Bigr\}\,,$$
where we have used Proposition 11.1 and expanded in powers of 1/n. Thus
$$\begin{aligned}
E^{\mathrm{tot}}(f) &= e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_T)\} + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(n^{-2})\\
&= e^{-rT}E^Q\{g(X_{\tau_n}) - g(X_{\tau_J})\} + e^{-rT}E^Q\{g(X_{\tau_J}) - g(X_T)\} + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(n^{-2})\\
&\stackrel{\mathrm{def}}{=} \hat E^{\mathrm{glob}}(g) + \hat E^{\mathrm{loc}}(g) + e^{-rT}\frac{\sigma^4T^2}{192n}E^Q\{g(X_{\tau_n})\} + O(1/n^2)\,, \qquad (14)
\end{aligned}$$
which defines Ê glob (g) and Ê loc (g). The final term comes from the fact that we
defined g with a fixed time T instead of the random time ξ when we changed
the probability measure.
This splits the error into two parts. The global error Ê glob (g) can be handled
with a suitable modification of the Taylor series argument of (11). The local error
Ê loc (g) can be computed explicitly, and it is here that the local properties such
as the continuity and differentiability of g come into play.
$$\hat E^{\mathrm{glob}}(g) = \frac1n\Bigl[\frac{5}{12}E^Q\{g(X_{\tau_n})\} - \frac{1}{6\sigma^2 T}E^Q\{X_{\tau_n}^2 g(X_{\tau_n})\} - \frac{1}{12\sigma^4 T^2}E^Q\{X_{\tau_n}^4 g(X_{\tau_n})\}\Bigr] + O(n^{-3/2})\,. \qquad (15)$$
Proof. Let Pn (x) be the transition probabilities of a simple symmetric random
walk on the integers, so that Pj (x) = P Q {Xτj = hx}. Let us remark that J is
independent of (Xτj ) so that
$$P^Q\{X_{\tau_J} = hx\} = \sum_{k=-n}^{\infty} P^Q\{J-n=k\}\,P_{n+k}(x)\,,$$

$$\sum_{k=-n}^{\infty}\sum_{x=-n}^{n} P^Q\{J-n=k\}\,P_n(x)\,\frac{k^p x^q}{n^r}\,g(xh) = E^Q\Bigl\{\Bigl(\frac{J-n}{\sqrt n}\Bigr)^p\Bigr\}\,E^Q\bigl\{X_{\tau_n}^q\,g(X_{\tau_n})\bigr\}\,\frac{n^{\frac{p+q}{2}-r}}{(\sigma\sqrt T)^q}\,. \qquad (16)$$
By Proposition 11.2 of the Appendix, the two expectations are bounded, so
if p ≠ 1 this term has order $\frac{p+q}{2}-r$, which is the effective order of $\frac{k^p x^q}{n^r}$. By
Corollary 11.4, the contributions to this integral for $|x| > n^{3/5}$ and/or $|k| > n^{3/5}$
go to zero faster than any power of n. Thus we can restrict ourselves to the sum
over the values $\max(|x|,|k|) \le n^{3/5}$, in which case $P_{n+k}(x)$ and $P_n(x)$ are both
defined, and
$$E^Q\{g(X_{\tau_n}) - g(X_{\tau_J})\} = \sum_k\sum_x P^Q\{J-n=k\}\,\bigl(P_n(x) - P_{n+k}(x)\bigr)\,g(hx)$$
$$= \sum_x\sum_k \Bigl(\frac{k}{2n} - \frac{3k^2+4kx^2}{8n^2} + \frac{3k^2x^2}{4n^3} - \frac{k^2x^4}{8n^4} + Q_3\Bigr)P_n(x)\,g(hx)\,,$$
$$\begin{aligned}
\frac1n\Bigl[&\Bigl(\frac12 E^Q\{J-n\} - \frac{3}{8n}E^Q\{(J-n)^2\}\Bigr)E^Q\{g(X_{\tau_n})\}\\
&- \frac{1}{\sigma^2 T}\Bigl(\frac12 E^Q\{J-n\} - \frac{3}{4n}E^Q\{(J-n)^2\}\Bigr)E^Q\{X_{\tau_n}^2 g(X_{\tau_n})\}\\
&- \frac{1}{8\sigma^4 T^2 n}E^Q\{(J-n)^2\}\,E^Q\{X_{\tau_n}^4 g(X_{\tau_n})\}\Bigr] + O(n^{-3/2})\,. \qquad (17)
\end{aligned}$$
Proposition 11.2 gives the values of E{J −n} = 4/3+O(h) and E{(J −n)2 } =
2n/3 + O(1). Substituting, we get (15).
translates into exponential boundedness of g: there exist A > 0 and a > 0 such
that |g(x)| ≤ Aea|x| for all x.
Let $N_h^e \stackrel{\mathrm{def}}{=} 2h\mathbb Z$ and $N_h^o \stackrel{\mathrm{def}}{=} h + N_h^e$ be the sets of even and odd multiples of h
respectively. Recall that J was the first even integer j such that τj > T. Let us
define
$$L \stackrel{\mathrm{def}}{=} \sup\{j : \tau_j < T\}\,, \qquad (18)$$
so that τL is the last stopping time before T.

There are two cases. Either L is an odd integer, in which case $X_{\tau_L}\in N_h^o$,
L = J − 1, and $\tau_L = \tau_{J-1} < T < \tau_J$; or L is an even integer, in which case
$X_{\tau_L}\in N_h^e$, L = J − 2, and $\tau_L = \tau_{J-2} < T < \tau_{J-1}$. Note that in either case,
$\tau_L \le t \le T \implies |X_t - X_{\tau_L}| < h$.
Define two operators, Πe and Πo on functions u(x), x ∈ R by:
• Πe u(x) = u(x) if x ∈ Nhe , and x 7→ Πe u(x) is linear in each interval
[2kh, (2k + 2)h], k ∈ N.
• Πo u(x) = u(x) if x ∈ Nho , and x 7→ Πo u(x) is linear in each interval
[(2k − 1)h, (2k + 1)h], k ∈ N.
Thus Πe u and Πo u are linear interpolations of u in between the even (respec-
tively odd) multiples of h.
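A direct implementation of Πe (assuming the definition above; Πo is the same with the lattice shifted by h) exhibits the classical linear-interpolation error that drives the local error computed below. The function and parameters here are illustrative:

```python
import math

def pi_even(u, h):
    """Pi^e u: piecewise-linear interpolation of u between even multiples of h."""
    def interp(x):
        k = math.floor(x / (2.0 * h))          # x lies in [2kh, (2k+2)h]
        x0, x1 = 2.0 * k * h, (2.0 * k + 2.0) * h
        t = (x - x0) / (x1 - x0)
        return (1.0 - t) * u(x0) + t * u(x1)
    return interp

if __name__ == "__main__":
    h = 0.05
    g = lambda x: x * x
    Pg = pi_even(g, h)
    # midpoint rule for the interpolation error over one cell [0, 2h]
    N = 2000
    dx = 2.0 * h / N
    integral = sum((Pg((i + 0.5) * dx) - g((i + 0.5) * dx)) * dx for i in range(N))
    # classical error for linear interpolation of g on [a,b]: (b-a)^3 g''/12
    print(integral, (2.0 * h) ** 3 * 2.0 / 12.0)
```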
Apply the Markov property at T . Xt is a Brownian motion from T on, and
if L is odd, then τJ is the first time after T that Xt hits Nhe , so, using the known
hitting probabilities of Brownian motion,
Let
$$q(x) \stackrel{\mathrm{def}}{=} P\{L \text{ is even} \mid X_T = x\}\,.$$
Note that L is even if and only if $X_{\tau_L}\in N_h^e$, and if this is so, Xt does not
hit $N_h^o$ between τL and T. Reverse Xt from time T: let $\hat X_t \stackrel{\mathrm{def}}{=} X_{T-t}$, 0 ≤ t ≤ T.
Then L is even if and only if X̂t hits Nhe before it hits Nho . But now, if we
condition on XT = x, or equivalently on X̂0 = x, then {X̂t , 0 ≤ t ≤ T } is a
Brownian bridge, and X̂t − X̂0 is a Brownian motion. Thus we can calculate the
exact probability that it hits Nhe before Nho . More simply, we can just note that if
h is small, the probability of hitting Nhe before Nho is not much influenced by the
drift, and so it is approximately that of unconditional Brownian motion. Thus,
if $\hat X_0 = x \in (2kh, (2k+1)h)$, then $q(x) = P\{\hat X_t \text{ reaches } 2kh \text{ before } (2k+1)h\} \sim \frac{(2k+1)h - x}{h} = \mathrm{dist}(x, N_h^o)/h$, where dist(x, Λ) is the distance from x to the set Λ.
Thus
$$q(x) = \frac1h\,\mathrm{dist}(x, N_h^o) + O(h)\,. \qquad (20)$$
Proposition 9.1
$$E\{g(X_{\tau_J}) - g(X_T)\} = E\{\Pi^e g(X_T) - g(X_T)\} + E\bigl\{\bigl(\Pi^o\Pi^e g(X_T) - \Pi^e g(X_T)\bigr)\,q(X_T)\bigr\}\,.$$
Proof. Let us write E{g(XτJ )} = E{g(XτJ ), L even } + E{g(XτJ ), L odd }.
Note that {L odd } ∈ F T , so it is conditionally independent of {XT +t , t ≥ 0}
given XT . Thus
$$E\{g(X_{\tau_J})\} = E\Bigl\{E\{g(X_{\tau_J}) \mid X_T, L \text{ even}\}\,P\{L \text{ even} \mid X_T\} + E\{g(X_{\tau_J}) \mid X_T, L \text{ odd}\}\,P\{L \text{ odd} \mid X_T\}\Bigr\}\,.$$
$$E^{\mathrm{loc}}(g) = \int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx + \frac{h^2}{3}\sum_k \Delta_k\,p(2kh) + O(h^3)\,. \qquad (21)$$
Proof. The first integral equals the first expectation on the right hand side of
Proposition 9.1. The second expectation can be written
$$\int_{-\infty}^{\infty}\bigl(\Pi^o\Pi^e g(x) - \Pi^e g(x)\bigr)\,q(x)\,p(x)\,dx\,. \qquad (22)$$
To simplify the second term, let ξ(x) = Πe g(x). Then ξ is piecewise linear with
vertices on Nhe , so we can write it in the form
$$\xi(x) = ax + b + \sum_k \frac12\,\Delta_k\,|x - 2kh|$$
for some a and b. Since Πo is a linear operator and Πo (ax + b) ≡ ax + b, we see
this is
$$= \frac12\sum_k \Delta_k \int_{-\infty}^{\infty}\bigl(\Pi^o|x-2kh| - |x-2kh|\bigr)\,q(x)\,p(x)\,dx\,.$$
Now |x − 2kh| is linear on both (−∞, 2kh) and (2kh, ∞), so that Πo |x −
2kh| = |x − 2kh| except on the interval [(2k − 1)h, (2k + 1)h]. On that interval,
Πo |x − 2kh| ≡ h and q(x) = (h − |x − 2kh|)/h, for q(x) is approximately 1/h
times the distance to the nearest odd multiple of h. Write p(x) = p(2kh) + O(h)
there. Then
$$\int_{-\infty}^{\infty}\bigl(\Pi^o|x-2kh| - |x-2kh|\bigr)q(x)p(x)\,dx = \frac1h\int_{(2k-1)h}^{(2k+1)h}\bigl(h - |x-2kh|\bigr)^2\bigl(p(2kh) + O(h)\bigr)\,dx$$
$$= \frac23\,h^2\,p(2kh) + O(h^3)\,. \qquad (23)$$
If we remember that if g = |x − 2kh|, ∆k = 2, the corollary follows.
We can decompose g as follows. Define the modified Heaviside function H̃(x)
by
$$\tilde H(x) = \begin{cases} 1 & \text{if } x > 0\\ \tfrac12 & \text{if } x = 0\\ 0 & \text{if } x < 0 \end{cases}$$
and set
$$g_1(x) = \sum_y \Delta g(y)\,\tilde H(x-y)\,, \qquad g_2(x) = \sum_y \frac12\,(\Delta g')(y)\,|x-y|\,. \qquad (24)$$
Proof. The sums in (24) are finite. By the definition of K, g(x) = ½(g(x+) +
g(x−)) at any discontinuity of g. It is easy to check that if we define g1 by
(24), then g − g1 is continuous. (This is the reason we modified the Heaviside
function.) However, it may still have a finite number of discontinuities in its
derivative. We remove these by subtracting g2: set $g_3 \stackrel{\mathrm{def}}{=} g - g_1 - g_2$. Then it is
easy to see that g3 is continuous, has a continuous first derivative, and that g3′′
is piecewise continuous.
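As a concrete illustration (a hypothetical g, not one from the paper), take g with a jump of 2 at x = 1 and a kink at x = 0; then g1 absorbs the jump, g2 the kink, and g3 = g − g1 − g2 comes out smooth (here simply x/2):

```python
def Htilde(x):
    """Modified Heaviside: 1 for x > 0, 1/2 at x = 0, 0 for x < 0."""
    return 1.0 if x > 0 else (0.5 if x == 0 else 0.0)

def g(x):   # jump Delta g(1) = 2, kink (Delta g')(0) = 1, midpoint convention at 1
    return max(x, 0.0) + 2.0 * Htilde(x - 1.0)

def g1(x):  # carries the jump
    return 2.0 * Htilde(x - 1.0)

def g2(x):  # carries the kink: (1/2) * (Delta g')(0) * |x - 0|
    return 0.5 * abs(x)

def g3(x):  # smooth remainder; here g3(x) = x/2
    return g(x) - g1(x) - g2(x)

if __name__ == "__main__":
    print([g3(x) for x in (-1.0, 0.0, 0.7, 1.0, 2.0)])
```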
Remark 9.4 The local error is not hard to calculate, but it will have to be
handled separately for each of the functions g1 , g2 , and g3 .
9.1 The Smooth Case
Proposition 9.5 Suppose g is in C(2) and that g and its first two derivatives
are exponentially bounded. Then
$$\hat E^{\mathrm{loc}}(g) = \frac{2h^2}{3}\int_{-\infty}^{\infty} g''(x)\,p(x)\,dx + o(h^2)\,. \qquad (25)$$
If g ∈ C(4) and g′′′ and g(iv) are exponentially bounded, the error is O(h⁴).
Proof. We will calculate the right hand side of (21). Let Ik be the interval
[2kh, (2k+2)h] and let yk = (2k+1)h be its midpoint. Write $\int_{-\infty}^{\infty}(\Pi^e g(x) - g(x))p(x)\,dx = \sum_k\int_{I_k}(\Pi^e g(x) - g(x))p(x)\,dx$. Expand g around yk: $g(x) = g(y_k) + g'(y_k)(x-y_k) + \frac12 g''(y_k)(x-y_k)^2 + o(h^2)$. Notice that on Ik, Πe g(x) −

This is a Riemann sum for the integral $(h^2/3)\int g''(x)p(x)\,dx$. (One has to
be slightly careful here: the o(h²) term is uniform, so it doesn’t cause trouble in
the improper integral. There is an o(1) error in approximating the integral by
the sum, but as it multiplies the coefficient of h², the error is o(h²) in any case.
If g ∈ C(4), it is O(h⁴).) Thus
$$= \frac{h^2}{3}\int g''(x)\,p(x)\,dx + o(h^2)\,. \qquad (26)$$
The second contribution to the error in (21) is $\frac{h^2}{3}\sum_k\Delta_k\,p(2kh) + O(h^3)$,
where Δk is the discontinuity of the derivative of Πe g at $x_k \stackrel{\mathrm{def}}{=} 2kh$:
Proposition 9.6 Suppose g is continuous and piecewise linear. Then
$$\hat E^{\mathrm{loc}}(g) = h^2\sum_y \Delta g'(y)\Bigl(\frac13 + 2\hat\theta(y)\bigl(1-\hat\theta(y)\bigr)\Bigr)p(y) + O(h^3)\,. \qquad (27)$$
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = \bigl(p(y) + O(h)\bigr)\int_{I_k}\bigl(\Pi^e g(x) - g(x)\bigr)\,dx\,,$$
so the final term is just 2h²/3. Adding this to (28), we get (27).
$$\hat E^{\mathrm{loc}}(g) = h\sum_{y\notin h\mathbb Z}\bigl(2\hat\theta(y)-1\bigr)\Delta g(y)\,p(y) - \frac{h^2}{3\sigma^2 T}\sum_{y\in N_h^e} y\,\Delta g(y)\,p(y) + \frac{h^2}{6\sigma^2 T}\sum_{y\in N_h^o} y\,\Delta g(y)\,p(y)\,. \qquad (29)$$
Proof. By Lemma 9.3 we can write $g(x) = \sum_y \Delta g(y)\,\tilde H(x-y)$. By linearity, it
is enough to consider the case where g(x) = H̃(x − y). Once again, we compute
the integrals in (21). Let Ik = [2kh, (2k+2)h]. If y ∈ Ik and 0 < θ̂(y) < 1, we
note that Πe g(x) = 0 if x < 2kh, Πe g(x) = 1 if x > (2k+2)h, and Πe g is linear
in Ik. Write p(x) = p(y) + O(h) on Ik and note that the only contribution to
the integral comes from Ik:
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = \Bigl[\int_{2kh}^{(2k+2)h}\frac{x-2kh}{2h}\,dx - \int_{2(k+\hat\theta(y))h}^{(2k+2)h}dx\Bigr]\bigl(p(y)+O(h)\bigr)$$
$$= \bigl(2\hat\theta(y)-1\bigr)\,p(y)\,h + O(h^2)\,. \qquad (30)$$
The cases θ̂(y) = ½ and θ̂(y) = 0 are special. In both cases we need to
expand p up to a linear term, since the constant term cancels out. So write
$p(x) = p(y) + p'(y)(x-y) + O(h^2)$, x ∈ Ik. If g(x) = H̃(x − y), y ∈ Ik, then
θ̂(y) = ½ means y = (2k+1)h. Noting that the contribution from p(y) vanishes,
the first error term will be
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = p'(y)\Bigl[\int_{2kh}^{(2k+2)h}\frac{(x-2kh)(x-y)}{2h}\,dx - \int_{2(k+\hat\theta(y))h}^{(2k+2)h}(x-y)\,dx\Bigr]$$
$$= -\frac16\,h^2\,p'(y) = \frac{h^2 y}{6\sigma^2 T}\,p(y)\,, \qquad (31)$$
where we have used the fact that $p'(x) = -\frac{x}{\sigma^2 T}\,p(x)$.
If θ̂ = 0, then y = 2kh and g = H̃(x − 2kh), so g(2kh) = ½. Thus
Πe g(x) = (x − (2k−2)h)/4h if (2k−2)h < x < (2k+2)h. It is zero for
x ≤ (2k−2)h and one for x > (2k+2)h, so that
$$\int_{-\infty}^{\infty}\bigl(\Pi^e g(x) - g(x)\bigr)p(x)\,dx = p(2kh)\Bigl[\int_{(2k-2)h}^{(2k+2)h}\frac{x-(2k-2)h}{4h}\,dx - \int_{2kh}^{(2k+2)h}dx\Bigr]$$
$$\quad + p'(2kh)\Bigl[\int_{(2k-2)h}^{(2k+2)h}\frac{(x-(2k-2)h)(x-2kh)}{4h}\,dx - \int_{2kh}^{(2k+2)h}(x-2kh)\,dx\Bigr]\,. \qquad (32)$$
10 Convergence of the Delta: Proof of Cor. 4.4

The price of our derivative at time t < T is
$$V(s,t) = e^{-r(T-t)}E\{f(S_T)\mid S_t = s\}\,. \qquad (34)$$
The hedging strategy depends on the space derivative ∂V/∂s, which is called the
delta. It is of interest to know how well the tree scheme estimates this. From
(34),
$$\frac{\partial V}{\partial s}(s,t) = e^{-r(T-t)}E\Bigl\{f'(S_T)\,\frac{S_T}{s}\Bigm| S_t = s\Bigr\} = \lim_{h\to 0}\frac{V(e^h s, t) - V(e^{-h}s, t)}{s(e^h - e^{-h})}\,.$$
If t = kδ and $s = e^{jh+rt}$, we approximate ∂V/∂s by the symmetric discrete
derivative
$$\frac{u(j+1,k) - u(j-1,k)}{s(e^h - e^{-h})}\,, \qquad (35)$$
where u is the solution of the tree scheme (3).
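As an illustration, the symmetric estimate can be computed on the discounted lattice of Section 3 and compared with the Black-Scholes delta Φ(d1). The recursion, the probability q = 1/(1+a), and the parameters below are a sketch under that convention, not the paper's code:

```python
import math

def tree_delta(s0, K, r, sigma, T, n):
    """Symmetric delta estimate (35) from a discounted-lattice tree (sketch)."""
    dt = T / n
    h = sigma * math.sqrt(dt)
    a = math.exp(h)
    q = 1.0 / (a + 1.0)                  # martingale probability for the discounted price
    disc = math.exp(-r * dt)
    payoff = lambda j: max(s0 * math.exp(r * T) * a ** j - K, 0.0)
    lo, hi = -n - 1, n + 1
    u = {j: payoff(j) for j in range(lo, hi + 1)}
    for _ in range(n):                   # backward induction; the range shrinks by 1 a side
        lo, hi = lo + 1, hi - 1
        u = {j: disc * (q * u[j + 1] + (1 - q) * u[j - 1]) for j in range(lo, hi + 1)}
    return (u[1] - u[-1]) / (s0 * (a - 1.0 / a))

def bs_delta(s0, K, r, sigma, T):
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

if __name__ == "__main__":
    est = tree_delta(100.0, 100.0, 0.05, 0.2, 1.0, 200)
    print(est, bs_delta(100.0, 100.0, 0.05, 0.2, 1.0))
```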
Remark 10.1 Estimating the delta is essentially equivalent to running the
scheme on f 0 , not f . If f 0 is continuous, the result follows from Theorem 4.2.
However, if f 0 is discontinuous—as it is for a call or a put—and if the disconti-
nuity falls on a non-lattice point, Theorem 4.2 would give order 1/2, not order 1,
which does not imply Corollary 4.4. In fact it depends on some uniform bounds
which come from Theorem 4.2 and the fact we use the symmetric estimate of
the derivative. Thus there is something to prove.
Proof. By the Markov property, it is enough to prove the result for t = 0 and
S0 = 1. We will also assume that r = 0 to simplify notation.
The key remark is that if St is a logarithmic Brownian motion from s, then
$e^h S_t$ and $e^{-h}S_t$ are logarithmic Brownian motions from $e^h s$ and $e^{-h}s$ respectively, so that
$$\frac{\partial V}{\partial s}(1,0) = \lim_{h\to 0}\frac{V(e^h s, 0) - V(e^{-h}s, 0)}{e^h - e^{-h}} = \lim_{h\to 0} E\{\hat f(S_T, h)\}\,,$$
where
$$\hat f(s,h) = \frac{f(e^h s) - f(e^{-h}s)}{e^h - e^{-h}}\,.$$
Now f 0 ∈ K so that f and its first three derivatives are polynomially bounded,
hence there is a polynomial Q(s) which bounds fˆ, ∂ fˆ/∂s and ∂ 2 fˆ/∂s2 , uniformly
for h < 1. This will justify passages to the limit, so that, for instance,
$\frac{\partial V}{\partial s}(1,0) = E\{S_T f'(S_T)\}$.
$$E(h) \stackrel{\mathrm{def}}{=} E\{\hat f(S_{\tau_n},h) - S_T f'(S_T)\} = E\{\hat f(S_{\tau_n},h) - \hat f(S_T,h)\} + E\{\hat f(S_T,h) - S_T f'(S_T)\} \stackrel{\mathrm{def}}{=} E^1(h) + E^2(h)\,.$$
Now f ∈ K, hence so is f̂(·, h). Thus E¹(h) is the error for the payoff function
f̂(·, h), and Theorem 4.2 applies. By the uniform polynomial bound on f̂ and
related functions, these coefficients are uniformly bounded in h for h < 1, and
we can conclude that there is a constant A such that E¹(h) ≤ Ah² for small h.
The bound on E²(h) is straight analysis. We can write
$$E^2(h) = \int_{-\infty}^{\infty}\bigl(\hat f(s,h) - s f'(s)\bigr)\,p(s)\,ds\,,$$
where p is the density of S_T. Now
$$\hat f(s,h) - s f'(s) = \frac{1}{e^h - e^{-h}}\int_{se^{-h}}^{se^h}\bigl(f'(u) - f'(s)\bigr)\,du\,.$$
If f ∈ C(2) on the interval, expand f′ to first order in a Taylor series and
integrate to see this is $\frac12 s^2 h^2 f''(s) + o(h^2)$. In any case, if |f′′| ≤ C on the
interval, it is bounded by Cs²h. There are only finitely many points where
f ∉ C(3); each contributes at most Cs²h² to E², so we see that |E²(h)| ≤ Bh² for
some other constant B.
To prove (9), let f = (s − K)+ and evaluate E¹(h) by Theorem 4.2. Note
that f̂ will have discontinuities of approximately 1/2h and −1/2h in its derivative
at s = K − h and s = K + h respectively. Note also that θ(log s) (see Section 3)
is periodic with period 2h, so that $\theta(e^h K) = \theta(e^{-h}K) \stackrel{\mathrm{def}}{=} \check\theta$, and we can write (8)
in the form
$$E^1(h) = h^2\Bigl[C\,\frac{\hat p(\log(K-h)) - \hat p(\log(K+h))}{2h} + D\,\check\theta(1-\check\theta)\Bigr]\,.$$
The ratio converges to $-\hat p'(\log K)$ and (9) follows. This completes the proof,
except to remark that θ̌ corresponds to θ̂, for the odd instead of the even multiples
of h.
11 Appendix
11.1 Moments of τn and J
The (very complicated!) coefficients in Theorem 4.2 come from moments of τn
and J. We will derive them here. We will assume that P is the Brownian
measure, i.e. that Xt is a Brownian motion. Thus we will not write E Q and P Q
to indicate that we are using the Brownian measure. We can write Xt = σWt ,
where {Wt , t ≥ 0} is a standard Brownian motion.
Proposition 11.1 (i) τ1 has the same distribution as (T/n)ν, where ν = inf{t >
0 : |Wt| = 1}, so it has the moment generating function
$$F_1(\lambda) \stackrel{\mathrm{def}}{=} E\{e^{\lambda\tau_1}\} = \Bigl(\cos\sqrt{\tfrac{2\lambda T}{n}}\Bigr)^{-1}\,, \qquad -\infty < \lambda < \frac{n\pi^2}{8T}\,. \qquad (36)$$
(ii) τ1, τ2 − τ1, τ3 − τ2, ... are i.i.d., independent of Xτ1, Xτ2, ....
(iii) $E\{\tau_1\} = \frac Tn$, $\mathrm{var}\{\tau_1\} = \frac{2T^2}{3n^2}$; $E\{\tau_n\} = T$, $\mathrm{var}\{\tau_n\} = \frac{2T^2}{3n}$.
(iv) For each k ≥ 1 there are constants ck > 0, Ck > 0 such that
$$E\{\tau_1^k\} = \frac{c_k T^k}{n^k}\,, \qquad E\{|\tau_n - T|^k\} \le C_k\,\frac{T^k}{n^{k/2}}\,.$$
Proof. (i) follows by Brownian scaling, and the moment generating function
is well-known for λ < 0 [4]; it is not difficult to extend to λ > 0. Then (ii) is
well-known, see e.g. [9], and (iii) is an easy consequence of (i).
For (iv), notice that τ1 has finite exponential moments, so that the moments
in question are finite. The kth moment of τ1 is determined by Brownian scaling:
$c_k = E\{\nu^k\}$. To get the kth central moment, note that by (ii) we can write
τn − T as a sum of n i.i.d. copies of τ1 − T/n, say τn − T = η1 + · · · + ηn. The
ηj have mean zero, so by Burkholder’s and Hölder’s inequalities in that order,
$$E\{(\tau_n - T)^k\} \le C_k\,E\Bigl\{\Bigl(\sum_{j=1}^n \eta_j^2\Bigr)^{k/2}\Bigr\} \le C_k\,n^{\frac k2 - 1}\sum_{j=1}^n E\{|\eta_j|^k\}\,.$$
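The first moments in (iii) can be checked directly from the moment generating function (36) by numerical differentiation at λ = 0. A small illustrative sketch (the negative-λ branch uses cos(iy) = cosh(y)):

```python
import math

def F1(lam, T, n):
    """MGF of tau_1 from (36): E{exp(lam * tau_1)} = 1 / cos(sqrt(2 lam T / n))."""
    z = 2.0 * lam * T / n
    # for lam < 0 the cosine of an imaginary argument is a cosh
    return 1.0 / math.cos(math.sqrt(z)) if z >= 0 else 1.0 / math.cosh(math.sqrt(-z))

if __name__ == "__main__":
    T, n, eps = 1.0, 10.0, 1e-5
    mean = (F1(eps, T, n) - F1(-eps, T, n)) / (2.0 * eps)
    second = (F1(eps, T, n) - 2.0 * F1(0.0, T, n) + F1(-eps, T, n)) / eps ** 2
    print(mean, T / n)                       # E{tau_1} = T/n
    print(second - mean ** 2, 2 * T ** 2 / (3 * n ** 2))  # var{tau_1} = 2T^2/(3n^2)
```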
(iv) For k > 1, there exists a constant Ck such that $E\{(J-n)^k\} \le C_k\,n^{k/2}$.
Proof. Set $\eta_j = \tau_j - \tau_{j-1} - \frac Tn$, j = 1, 2, ..., and put
$$M_j = \sum_{i=1}^j \eta_i = \tau_j - j\,E\{\tau_1\}\,.$$
Then (Mj ) is a martingale. Apply the stopping theorem to the bounded stopping
time J ∧ N and let N → ∞ to see that 0 = E{MJ } = E{τJ } − E{J}E{τ1 } , so
that
$$E\{J\} = \frac{E\{\tau_J\}}{E\{\tau_1\}} = \frac nT\,E\{\tau_J\}\,. \qquad (37)$$
Now τJ > T, so to find its expectation, notice that, as in Proposition 9.1, τJ
will either be the first hit of $N_h^e$ after T, if L is odd (see (18)), or it will be the
first hit of $N_h^e$ after the first hit of $N_h^o$ after T, if L is even. The expected time for
Brownian motion to reach the endpoints of an interval is well known: if X0 = x ∈
(a, b), the expected time for X to leave (a, b) is $\sigma^{-2}(x-a)(b-x)$. Let dist(x, A)
be the shortest distance from x to the set A. If XT = x, the expected additional
time to reach $N_h^e$ is $\sigma^{-2}(h^2 - \mathrm{dist}^2(x, N_h^o))$, while the expected additional time
to reach $N_h^o$ is $\sigma^{-2}(h^2 - \mathrm{dist}^2(x, N_h^e))$. Once at $N_h^o$, the expected time to reach
$N_h^e$ from there is $T/n = h^2/\sigma^2$. Now by (20), $P\{L \text{ is even}\mid X_T = x\} = q(x) = \mathrm{dist}(x, N_h^o)/h + O(h)$, and $P\{L \text{ is odd}\mid X_T = x\} = \mathrm{dist}(x, N_h^e)/h + O(h)$.
Thus, as L is conditionally independent of {XT +t , t ≥ 0} given XT , we have
E{τJ − T | XT = x} = P {L is odd | XT = x}E{τJ − T | XT = x, L is odd} +
P {L is even | XT = x}E{τJ − T | XT = x, L is even}, so that if p(x) is the
density of XT ,
$$E\{\tau_J - T\} = \int_{-\infty}^{\infty}\sigma^{-2}p(x)\bigl(h^2 - \mathrm{dist}^2(x, N_h^o)\bigr)\bigl(h^{-1}\mathrm{dist}(x, N_h^e) + O(h)\bigr)\,dx$$
$$\qquad + \int_{-\infty}^{\infty}\sigma^{-2}p(x)\bigl(2h^2 - \mathrm{dist}^2(x, N_h^e)\bigr)\bigl(h^{-1}\mathrm{dist}(x, N_h^o) + O(h)\bigr)\,dx\,. \qquad (38)$$
−∞
R∞ P R xk +h
Now let xk = (2k + 1)h and write −∞ = k (1/2h) xk −h 2h. Write
p(x) = p(xk )(1 + O(h)) on the interval (xk − h, xk + h). We can then do the
integrals explicitly:
X p(xk ) Z xk +h
(1 + O(h))σ −2 h2 − dist2 (x, Nho ) (dist (x, Nhe )/h + O(h)) dx
2h xk −h
k
5 −2 2 X 5 T
= σ h p(xk ) 2h + O(h3 ) ∼ (39)
12 12 n
k
R
since the Riemann sum approximates p(x) dx = 1. The other integral is similar,
and gives 11 T 4T 3
12 n , so we see E{τJ − T } = 3 n + O(h ) , which implies (i) and (ii).
$$E\Bigl\{\Bigl(\frac{|J-n|}{\sqrt n}\Bigr)^k\Bigr\} \le -4\int_0^{\infty} y^k\,d\bigl(e^{-\frac{3y^2}{4}}\bigr) \stackrel{\mathrm{def}}{=} C_k\,,$$
which proves the assertion.
We will need to control the tails of the distributions of τn and J. The
following proposition is the key.
Proposition 11.3 Let (ξn) be a sequence of reals. Suppose m = nξn, where
$\sqrt n\,\xi_n \to \infty$ as n → ∞. Then, as n → ∞,
$$P\Bigl\{\sqrt n\,\Bigl|\tau_m - \frac mn T\Bigr| > \rho\Bigr\} \le 2\,e^{-\frac{3\rho^2}{4T^2\xi_n}}\Bigl(1 + O\Bigl(\frac{1}{\xi_n\sqrt n}\Bigr)\Bigr)\,. \qquad (40)$$
n ξn n
n√ m o √
def m
Pn = P n τm − T > ρ ≤ e−λρ E{eλ n(τm − n T ) }
n
4λ2 T 2 ξn
λT 2 !
−√ nξn − x2 x4
e n e
= e−λρ q = e−λρ .
cos 2λT√ cos x
n
q
2λT 3ρ
where x = √
n
. Take logs and choose λ = 2ξn T 2 to see that
ρ2 3 9 x2
log Pn ≤ − − + log cos x .
ξn T 2 2 x4 2
√
Expand log cos x near x = 0 and note that x2 = O(1/ξn n) = o(1), so this is
3ρ2 1
=− 2
+ O( √ ) . (41)
4ξn T ξn n
The other direction is similar. For λ > 0, let
$$P_n \stackrel{\mathrm{def}}{=} P\Bigl\{\sqrt n\Bigl(\tau_m - \frac mn T\Bigr) < -\rho\Bigr\} \le e^{-\lambda\rho}\,E\bigl\{e^{-\lambda\sqrt n(\tau_m - \frac mn T)}\bigr\} = e^{-\lambda\rho}\Biggl(\frac{e^{\frac{\lambda T}{\sqrt n}}}{\cosh\sqrt{\frac{2\lambda T}{\sqrt n}}}\Biggr)^{n\xi_n}\,,$$
which differs from the above only in that cosh replaces the cosine. Exactly the
same manipulations show that Pn is again bounded by (41), and the conclusion
follows.
$$\lim_{n\to\infty} n^p\,E\bigl\{g(X_{\tau_n});\,|\tau_n - T| > n^{-\frac14+\epsilon}\bigr\} = 0\,; \qquad (43)$$
$$\lim_{n\to\infty} n^p\,E\bigl\{g(X_{\tau_J});\,|J - n| > n^{\frac12+\epsilon}\bigr\} = 0\,. \qquad (44)$$
Proof. Let ξ > 1. Then $P\{J > n\xi\} = P\{\tau_{n\xi} < T\} \le P\{\sqrt n\,|\tau_{n\xi} - \xi T| > \sqrt n\,(\xi-1)T\}$. By (40), $P\{J - n > n(\xi-1)\} \le e^{-\frac{3n(\xi-1)^2}{4\xi}}$. Take $y = (\xi-1)\sqrt n$ to
see that $P\{J - n > y\sqrt n\} \le 2e^{-\frac{3y^2}{4(1+y/\sqrt n)}}$. Similarly, for ξ < 1, $P\{J < n\xi\} = P\{\tau_{n\xi} > T\} \le P\{\sqrt n\,|\tau_{n\xi} - \xi T| > \sqrt n\,(1-\xi)T\}$. Use (40) to get the same bound
for $P\{J - n < -y\sqrt n\}$, and add to get (42).
Next, Xτn and τn are independent and $|g(x)| \le Ae^{a|x|}$ for some A and a, so
$$\bigl|E\{g(X_{\tau_n});\,|\tau_n - T| > n^{-\frac14+\epsilon}\}\bigr| \le A\,E\{e^{a|X_{\tau_n}|}\}\,P\{|\tau_n - T| > n^{-\frac14+\epsilon}\}\,.$$
Xτn is binomial, so we use its moment generating function to see that
$E\{e^{a|X_{\tau_n}|}\} \le 2e^{a\sigma\sqrt{nT}}$. Combine this with the bound (40) on the tails of τn
to see (43).

The second assertion follows from Corollary 11.4, once we notice that in any
case $|X_{\tau_J} - X_T| \le 4h$, so that $|E\{g(X_{\tau_J})\}| \le Ae^{a|X_{\tau_J}|} \le Ae^{a|X_T| + 4ah}$.
$$P_n(x) = \frac{n!\,2^{-n}}{\bigl(\frac{n+x}{2}\bigr)!\,\bigl(\frac{n-x}{2}\bigr)!}\,, \qquad (45)$$
which is the probability of taking ½(n+x) positive steps and ½(n−x) negative
steps out of n total steps. Now let
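Formula (45) can be checked mechanically against the one-step recursion of the simple symmetric random walk; a small illustrative sketch:

```python
from math import comb

def P(n, x):
    """(45): probability that a simple symmetric random walk started at 0 is at x
    after n steps (zero unless n and x have the same parity and |x| <= n)."""
    if (n + x) % 2 or abs(x) > n:
        return 0.0
    return comb(n, (n + x) // 2) * 0.5 ** n

if __name__ == "__main__":
    # one-step recursion: P_n(x) = (P_{n-1}(x-1) + P_{n-1}(x+1)) / 2
    for n in range(1, 12):
        for x in range(-n, n + 1):
            assert abs(P(n, x) - 0.5 * (P(n - 1, x - 1) + P(n - 1, x + 1))) < 1e-15
    print(P(4, 0))
```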
Proposition 11.5 Let n, k, and x be even integers with $\max(|k|,|x|) \le n^{3/5}$.
Then
$$R(n,k,x) = 1 - \frac{k}{2n} + \frac{3k^2+4kx^2}{8n^2} - \frac{3k^2x^2}{4n^3} + \frac{k^2x^4}{8n^4} + Q_3 + O(n^{-3/2})\,,$$
where Q3 is a sum of monomials of effective order at most −3/2.
Proof. We need Stirling’s formula in the form $n! = \sqrt{2\pi}\,n^{n+\frac12}\,e^{-n + \frac{1}{12n} + O(n^{-3})}$.
The O(n⁻³) term is uniform in the sense that there is an a such that for all n it
is between −a/n³ and a/n³.
Write R(n, k, x) in terms of factorials and use Stirling’s formula on each. We
can write R(n, k, x) = R1R2, where R2 comes from the factors $e^{\frac{1}{12n}+O(n^{-3})}$. Let
m = n/2, ξ = k/n, and η = x/n. Then
$$\begin{aligned}
\log R_1 = {}&\bigl(2m(1+\xi)+\tfrac12\bigr)\log(1+\xi) + \bigl(m(1+\eta)+\tfrac12\bigr)\log(1+\eta) + \bigl(m(1-\eta)+\tfrac12\bigr)\log(1-\eta)\\
&- \bigl(m(1+\xi+\eta)+\tfrac12\bigr)\log(1+\xi+\eta) - \bigl(m(1+\xi-\eta)+\tfrac12\bigr)\log(1+\xi-\eta)
\end{aligned}$$
$$\log R_2 = \frac{1}{24m}\Bigl(\frac{1}{1+\xi} + \frac{2}{1+\eta} + \frac{2}{1-\eta} - 1 - \frac{2}{1+\xi+\eta} - \frac{2}{1+\xi-\eta}\Bigr)\,.$$
Notice that errors in log R are of the same order as those of R; that is, if
rn → r > 0 and | log rn − log r| < cn−p , then |rn − r| < 2rcn−p for large n. Thus
it is enough to determine log R up to terms of order 1/n.
Since |k| and |x| are smaller than n3/5 , ξ and η are smaller than n−2/5 , and
an easy calculation shows that
$$\log R_2 = \frac{\xi}{4n} + O(n^{-3/2})\,.$$
Expand log R1 in a power series in ξ and η. To see how many terms we need
to keep, note that the coefficients may be O(n). If we include terms up to order
6 in ξ and η, the remainder will be o(n⁻³ᐟ²); those making an O(1/n) or larger
contribution will be of the form $\xi^p\eta^q$ with p + q ≤ 2, and $m\xi^p\eta^q$ with p + q ≤ 5,
so that
$$\log R_1 = -\frac12\xi + \frac14\xi^2 + m\eta^2\xi - m\eta^2\xi^2 + m\eta^2\xi^3 + \frac12 m\eta^4\xi + \hat S(m,\xi,\eta) + o(n^{-3/2})\,,$$
where Ŝ(m, ξ, η) is a polynomial whose terms are all o(1/n). Notice that log R2
is also o(1/n), so we can include it as part of Ŝ. In terms of n, k, and x,
$$\log R = -\frac{k}{2n} + \frac{k^2+2kx^2}{4n^2} - \frac{k^2x^2}{2n^3} + \frac{2k^3x^2+kx^4}{4n^4} + S(n,k,x) + o(n^{-3/2}) \stackrel{\mathrm{def}}{=} Q(n,k,x) + S(n,k,x) + o(n^{-3/2})\,,$$
where S(n, k, x) is a sum of monomials, each of which is o(1/n). The largest
term in Q is $kx^2/2n^2 \le n^{-1/5}$, so that we can write
$$R = e^{Q+S}\bigl(1 + o(n^{-3/2})\bigr) = \Bigl(\sum_{p=0}^{8}\frac{(Q+S)^p}{p!}\Bigr)\bigl(1 + o(n^{-3/2})\bigr) \stackrel{\mathrm{def}}{=} \bigl(1 + o(n^{-3/2})\bigr)\,Q_1\,.$$
Check the effective order of the terms in the polynomial Q1: we see that
$$\hat O\Bigl(\frac{k}{2n}\Bigr) = \hat O\Bigl(\frac{kx^2}{2n^2}\Bigr) = -\frac12\,, \qquad \hat O\Bigl(\frac{k^2}{4n^2}\Bigr) = \hat O\Bigl(\frac{k^2x^2}{2n^3}\Bigr) = -1\,,$$
and the other two terms have effective order −3/2. All terms in S have effective
order less than −1, and hence less than or equal to −3/2, since all effective
orders are multiples of 1/2. Now the effective order of a product of monomials
is the sum of the effective orders, so that the only terms in Q1 of effective order
at least −1 are those in Q and the three terms from ½Q² which come from the
squares and products of the terms of effective order −1/2, namely
$$\frac{k^2}{8n^2} + \frac{k^2x^4}{8n^4} - \frac{k^2x^2}{4n^3}\,.$$
All other terms have effective orders at most −3/2. Thus define
$$Q_2 \stackrel{\mathrm{def}}{=} 1 - \frac{k}{2n} + \frac{3k^2+4kx^2}{8n^2} - \frac{3k^2x^2}{4n^3} + \frac{k^2x^4}{8n^4}\,,$$
and let Q3 be all the other terms in Q1 . Then R = (Q2 +Q3 )(1+o(n−3/2 )) where
all terms in Q3 have effective order at most −3/2. This proves the proposition.
$$E^Q\{g(X_T)\} = E^P\{f(S_T)\}$$
$$E^Q\{g''(X_T)\} = E^P\Bigl\{S_T^2 f''(S_T) + \frac14 f(S_T)\Bigr\} \qquad (46)$$
$$E^Q\{X_T^k\,g(X_T)\} = E^P\bigl\{(\log(\tilde S_T/s_0))^k f(S_T)\bigr\}\,.$$
References

[1] Cox, J.C., S.A. Ross, and M. Rubinstein, Option pricing: a simplified approach, Journal of Financial Economics 7 (1979), 229–263.

[2] Diener, F. and M. Diener, Asymptotics of the price oscillations of a vanilla option in a tree model (preprint).

[3] Diener, F. and M. Diener, Asymptotics of the price of a barrier option in a tree model (preprint).