Lect Notes 4
4.1 Numerical differentiation

4.1.1 Computation of derivatives

A direct approach
Let us recall that the derivative of a function is given by

   F'(x) = lim_{Δx→0} [F(x + Δx) - F(x)] / Δx    (4.1)
The problem is then: how big should Δx be? It is obvious that Δx should be small, in order to be as close as possible to the limit. The problem is that it cannot be too small because of the numerical precision of the computer. Assume for a while that the computer can only deliver a precision of 1e-2 and that we select Δx = 0.00001; then F(x + Δx) - F(x) would be 0 for the computer, and so would the approximate derivative.
Further, Taylor's expansion theorem states

   F(x + Δx) = F(x) + F'(x)Δx + (F''(ξ)/2) Δx²

for some ξ between x and x + Δx. Suppose the computed values F̂ differ from the true ones by at most e, |F̂(x) - F(x)| ≤ e, and that |F''| is bounded by M. Then the error of the one-sided approximation satisfies

   | [F̂(x + Δx) - F̂(x)] / Δx - F'(x) | ≤ 2e/Δx + (M/2) Δx

The right-hand side is minimized for Δx = 2 √(e/M),
such that the upper bound is 2√(eM). One problem here is that we usually do not know M. However, from a practical point of view, most people use the following scheme for Δx:

   Δx = 1e-5 . max(|x|, 1e-8)

which essentially amounts to working at machine precision.
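This step-size rule can be sketched in Python (the notes' own code is in Matlab; the helper name `forward_diff` is ours):

```python
import math

def forward_diff(F, x):
    # one-sided (forward) difference with the practical step-size rule
    # dx = 1e-5 * max(|x|, 1e-8) discussed above
    dx = 1e-5 * max(abs(x), 1e-8)
    return (F(x + dx) - F(x)) / dx

d = forward_diff(math.exp, 1.0)  # derivative of exp at x = 1
```

With this step size, the approximation of exp'(1) agrees with e to roughly five digits.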
Similarly, rather than taking a forward difference, we may also take the backward difference

   F'(x) ≃ [F(x) - F(x - Δx)] / Δx    (4.2)
Central difference

There are a number of situations where one-sided differences are not accurate enough; one potential solution is then to use the central difference (or two-sided difference) approach, which essentially amounts to computing the derivative using the backward-forward formula

   F'(x) ≃ [F(x + Δx) - F(x - Δx)] / (2Δx)    (4.3)
What do we gain from using this formula? To see this, let us consider the Taylor series expansions of F(x + Δx) and F(x - Δx):

   F(x + Δx) = F(x) + F'(x)Δx + (1/2) F''(x)Δx² + (1/6) F⁽³⁾(ξ₁)Δx³    (4.4)
   F(x - Δx) = F(x) - F'(x)Δx + (1/2) F''(x)Δx² - (1/6) F⁽³⁾(ξ₂)Δx³    (4.5)

Subtracting (4.5) from (4.4) and rearranging, we obtain

   F'(x) = [F(x + Δx) - F(x - Δx)] / (2Δx) - (Δx²/6) F⁽³⁾(ξ)    (4.6)

such that the approximation error is now of order O(Δx²) rather than O(Δx).
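The accuracy gain can be checked quickly in Python (a sketch; `central_diff` is our own helper, not from the notes):

```python
import math

def central_diff(F, x, dx=1e-5):
    # two-sided difference: truncation error is O(dx^2) rather than O(dx)
    return (F(x + dx) - F(x - dx)) / (2 * dx)

dx = 1e-5
err_forward = abs((math.sin(1.0 + dx) - math.sin(1.0)) / dx - math.cos(1.0))
err_central = abs(central_diff(math.sin, 1.0) - math.cos(1.0))
```

For sin at x = 1, the central error falls several orders of magnitude below the forward one.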
Dropping the, hopefully tiny, term O(h^{k+1}) from this equation, we obtain a linear equation, A = A(h) + φh^k, in the two unknowns A and φ. But this really gives a different equation for each possible value of h. We can therefore get two different equations to identify both A and φ by just using two different step sizes. Doing this, using step sizes h and h/2, for any h, we have

   A = A(h) + φh^k + O(h^{k+1})
   A = A(h/2) + φ(h/2)^k + O(h^{k+1})    (4.7)

(note that, in the two equations above, the symbol O(h^{k+1}) is used to stand for two different sums of terms of order h^{k+1} and higher). Taking 2^k times the second equation and subtracting the first, we obtain

   A = [2^k A(h/2) - A(h)] / (2^k - 1) + O(h^{k+1})

where, once again, O(h^{k+1}) stands for a new sum of terms of order h^{k+1} and higher. Denoting

   B(h) = [2^k A(h/2) - A(h)] / (2^k - 1)

we then have

   A = B(h) + O(h^{k+1})
What have we done so far? We have defined an approximation B(h) whose error is of order k + 1 rather than k, such that it is a better approximation than A(h). The generation of a new, improved approximation for A from two A(h)'s with different values of h is called Richardson extrapolation. We can then continue the process with B(h) to get a new, better approximation. This method is widely used when computing numerical integrals or numerical derivatives.
Numerical differentiation with Richardson extrapolation. Assume we want to compute the first order derivative of a function F ∈ C^{2n}(R) at a point x*. We may first compute the approximate quantity

   D_0^0(F) = [F(x* + h₀) - F(x* - h₀)] / (2h₀)

then halve the step, h₁ = h₀/2, and compute

   D_1^0(F) = [F(x* + h₁) - F(x* - h₁)] / (2h₁)

Then, according to the previous section, we may compute a better approximation as (since k = 2 in the case of central numerical differentiation)

   D_0^1(F) = [4 D_1^0(F) - D_0^0(F)] / 3

with an error estimate given by [D_1^0(F) - D_0^0(F)] / 3. Iterating on this idea, the general recursion is

   D_j^ℓ(F) = [4^ℓ D_{j+1}^{ℓ-1}(F) - D_j^{ℓ-1}(F)] / (4^ℓ - 1)

where D_j^ℓ(F) has an error proportional to h_j^{2(ℓ+1)}.
Matlab Code: Richardson Extrapolation

function D = richardson(f,x,tol,varargin);
%
% function D = richardson(f,x,tol,P1,...,Pn);
%
% f   : function whose derivative is computed
% x   : point at which the derivative is evaluated
% tol : relative tolerance
%
h    = 1;
j    = 1;
rerr = 1;
fs   = feval(f,x+h,varargin{:});
fm   = feval(f,x-h,varargin{:});
D(1,1) = (fs-fm)/(2*h);
while rerr>tol;
   % derivative with updated step size
   h  = h/2;
   fs = feval(f,x+h,varargin{:});
   fm = feval(f,x-h,varargin{:});
   D(j+1,1) = (fs-fm)/(2*h);
   % recursion
   for k = 1:j;
      D(j+1,k+1) = D(j+1,k) + (D(j+1,k)-D(j,k))/(4^k-1);
   end
   % compute errors
   err  = abs(D(j+1,j+1)-D(j,j));
   rerr = 2*err/(abs(D(j+1,j+1))+abs(D(j,j))+eps);
   j = j+1;
end
n = size(D,1);
D = D(n,n);
4.1.2 Partial Derivatives

Let us now consider that, rather than having a single variable function, the problem is multidimensional, such that F : R^n → R, and that we now want to compute the first order partial derivative

   F_i(x) = ∂F(x)/∂x_i

This may be achieved extremely easily by computing, for example in the case of the central difference formula,

   F_i(x) ≃ [F(x + e_i Δx) - F(x - e_i Δx)] / (2Δx)

where e_i is a vector whose i-th component is 1 and all other elements are 0.
Matlab Code: Jacobian Computation

function J = jacobian(func,x0,method,varargin);
%
% function J = jacobian(func,x0,method,P1,...,Pn);
%
% func   : function whose jacobian is computed
% x0     : point at which the jacobian is evaluated
% method = 'l' -> left difference
%        = 'r' -> right difference
%        = 'c' -> central difference
%
x0  = x0(:);
f   = feval(func,x0,varargin{:});
m   = length(x0);
n   = length(f);
J   = zeros(n,m);
dev = diag(.00001*max(abs(x0),1e-8*ones(size(x0))));
if (lower(method)=='l');
   for i=1:m;
      ff    = feval(func,x0+dev(:,i),varargin{:});
      J(:,i)= (ff-f)/dev(i,i);
   end;
elseif (lower(method)=='r')
   for i=1:m;
      fb    = feval(func,x0-dev(:,i),varargin{:});
      J(:,i)= (f-fb)/dev(i,i);
   end;
elseif (lower(method)=='c')
   for i=1:m;
      ff    = feval(func,x0+dev(:,i),varargin{:});
      fb    = feval(func,x0-dev(:,i),varargin{:});
      J(:,i)= (ff-fb)/(2*dev(i,i));
   end;
else
   error('Bad method specified')
end
4.1.3 Hessian

The Hessian matrix can be computed relying on the same approach as for the Jacobian matrix. Let us consider for example that we want to compute the second order derivative of a function F : R → R using a central difference approach, as we have seen it delivers higher accuracy. Let us first write the Taylor expansions of F(x + Δx) and F(x - Δx) up to order 3:

   F(x + Δx) = F(x) + F'(x)Δx + (Δx²/2) F''(x) + (Δx³/6) F⁽³⁾(x) + (Δx⁴/4!) F⁽⁴⁾(ξ₁)
   F(x - Δx) = F(x) - F'(x)Δx + (Δx²/2) F''(x) - (Δx³/6) F⁽³⁾(x) + (Δx⁴/4!) F⁽⁴⁾(ξ₂)

Summing the two expansions, the odd-order terms cancel:

   F(x + Δx) + F(x - Δx) = 2F(x) + Δx² F''(x) + (Δx⁴/4!) [F⁽⁴⁾(ξ₁) + F⁽⁴⁾(ξ₂)]

such that

   F''(x) = [F(x + Δx) - 2F(x) + F(x - Δx)] / Δx² - (Δx²/12) F⁽⁴⁾(ξ)

and the approximation error on the second order derivative is O(Δx²).
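A minimal Python sketch of this second-derivative formula (our own helper name):

```python
import math

def second_diff(F, x, dx=1e-4):
    # central second difference: [F(x+dx) - 2F(x) + F(x-dx)] / dx^2,
    # with truncation error O(dx^2)
    return (F(x + dx) - 2.0 * F(x) + F(x - dx)) / dx**2

d2 = second_diff(math.exp, 1.0)  # second derivative of exp at x = 1
```

Note the larger step (1e-4 rather than 1e-5): dividing by dx² amplifies roundoff, so the optimal step is bigger than for first derivatives.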
4.2 Numerical Integration

A typical problem is to compute an integral such as the expectation of a function of a Gaussian random variable,

   (1/(σ√(2π))) ∫ f(X, ε) e^{-ε²/(2σ²)} dε

Most numerical integration methods approximate such an integral by a finite weighted sum of function evaluations,

   ∫ F(x)dx ≃ Σ_{i=0}^{n} ω_i F(x_i)

where the coefficients ω_i depend on the method chosen to compute the integral. This approach to numerical integration is known as the quadrature problem. These methods essentially differ by (i) the weights that are assigned to each function evaluation and (ii) the nodes at which the function is evaluated. In fact, basic quadrature methods may be categorized into two broad classes:

1. Methods based on equally spaced data points: these are the Newton-Cotes formulas (the midpoint rule, the trapezoid rule and Simpson's rule).

2. Methods based on data points which are not equally spaced: these are the Gaussian quadrature formulas.
4.2.1 Newton-Cotes formulas

[Figure 4.1: Newton-Cotes integration]

Midpoint rule

The midpoint rule evaluates the function at the midpoint of the interval:

   ∫_a^b F(x)dx = (b - a) F((a + b)/2) + ((b - a)³/24) F''(ξ)

where ξ ∈ [a; b], such that the approximate integral is given by

   Î = (b - a) F((a + b)/2)

Note that this rule does not make any use of the end points. It is noteworthy that this approximation is far too coarse to be accurate, such that what is usually done is to break the interval [a; b] into smaller intervals and compute the approximation on each subinterval. The integral is then given by cumulating the subintegrals; we therefore end up with a composite rule. Hence, assume that the interval [a; b] is broken into n > 1 subintervals of size h = (b - a)/n, with midpoints x_i = a + (i - 1/2)h, i = 1, ..., n; then

   Î_n = h Σ_{i=1}^{n} F(x_i)
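The composite midpoint rule is a one-liner in Python (a sketch; `midpoint_rule` is our own name):

```python
import math

def midpoint_rule(F, a, b, n):
    # composite midpoint rule: h times the sum of F at the n midpoints
    h = (b - a) / n
    return h * sum(F(a + (i - 0.5) * h) for i in range(1, n + 1))

approx = midpoint_rule(math.exp, 0.0, 1.0, 100)  # integral of exp over [0,1]
```

With 100 subintervals the error for the exponential function is of order 1e-5, consistent with the O(h²) error of the rule.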
Trapezoid rule

The trapezoid rule essentially amounts to using a linear approximation of the function to be integrated between the two end points of the interval. This then defines the trapezoid {(a, 0), (a, F(a)), (b, F(b)), (b, 0)}, whose area, and consequently the approximate integral, is given by

   Î = ((b - a)/2) (F(a) + F(b))

Indeed, the linear interpolant of F between a and b is

   F(x) ≃ [(x - b)/(a - b)] F(a) + [(x - a)/(b - a)] F(b)

then

   ∫_a^b F(x)dx ≃ ∫_a^b [(x - b)/(a - b)] F(a) + [(x - a)/(b - a)] F(b) dx
               = (1/(b - a)) ∫_a^b (b - x)F(a) + (x - a)F(b) dx
               = (1/(b - a)) ∫_a^b (bF(a) - aF(b)) + x(F(b) - F(a)) dx
               = bF(a) - aF(b) + (1/(b - a)) ∫_a^b x(F(b) - F(a)) dx
               = bF(a) - aF(b) + ((b² - a²)/(2(b - a))) (F(b) - F(a))
               = bF(a) - aF(b) + ((b + a)/2) (F(b) - F(a))
               = ((b - a)/2) (F(a) + F(b))
Simpson's rule

Simpson's rule attempts to circumvent an inefficiency of the trapezoid rule: a composite trapezoid rule may be far too coarse if F is smooth. An alternative is then to use a piecewise quadratic approximation of F that uses the values of F at a, b and (b + a)/2 as interpolating nodes. Figure 4.2 illustrates the rule: the thick line is the function F to be integrated and the thin line is the quadratic interpolant for this function. A quadratic interpolation may be obtained by the Lagrange interpolation formula, where ξ = (b + a)/2:

   L(x) = [(x - ξ)(x - b)/((a - ξ)(a - b))] F(a) + [(x - a)(x - b)/((ξ - a)(ξ - b))] F(ξ)
        + [(x - a)(x - ξ)/((b - a)(b - ξ))] F(b)

Setting h = (b - a)/2, we have a - ξ = -h, a - b = -2h and ξ - b = -h, so the integral of each term can be computed separately:

   I₁ = ∫_a^b [(x - ξ)(x - b)/(2h²)] F(a) dx
      = (F(a)/(2h²)) [ (b³ - a³)/3 - (ξ + b)(b² - a²)/2 + ξb(b - a) ]
      = (F(a)/(12h)) (b² - 2ab + a²) = F(a) h/3

   I₂ = -∫_a^b [(x - a)(x - b)/h²] F(ξ) dx
      = -(F(ξ)/h²) [ (b³ - a³)/3 - (b + a)(b² - a²)/2 + ab(b - a) ]
      = (4h/3) F(ξ)

   I₃ = ∫_a^b [(x - a)(x - ξ)/(2h²)] F(b) dx
      = (F(b)/(2h²)) [ (b³ - a³)/3 - (a + ξ)(b² - a²)/2 + aξ(b - a) ]
      = (F(b)/(12h)) (b² - 2ab + a²) = F(b) h/3

Summing, the approximate integral is

   Î = I₁ + I₂ + I₃ = ((b - a)/6) [ F(a) + 4F((a + b)/2) + F(b) ]

If, as for the midpoint and trapezoid rules, we want to compute a better approximation of the integral, we break [a; b] into an even number n > 2 of subintervals of size h = (b - a)/n, with x_i = a + ih, i = 0, ..., n. Then the composite Simpson's rule is

   Î_n = (h/3) [ F(x₀) + 4F(x₁) + 2F(x₂) + 4F(x₃) + ... + 2F(x_{n-2}) + 4F(x_{n-1}) + F(x_n) ]
Matlab Code: Simpson's Rule Integration

function simp=simpson(func,a,b,n,varargin);
%
% function simp=simpson(func,a,b,n,P1,...,Pn);
%
% func      : function to be integrated
% a         : lower bound of the interval
% b         : upper bound of the interval
% n         : even number of sub-intervals => n+1 points
% P1,...,Pn : parameters of the function
%
h   = (b-a)/n;
x   = a+[0:n]*h;
y   = feval(func,x,varargin{:});
simp= h*(2*(1+rem(1:n-1,2))*y(2:n)'+y(1)+y(n+1))/3;
An integral over an unbounded domain, ∫_{-∞}^{∞} F(x)dx, may be approximated by ∫_a^b F(x)dx, setting a and b to large enough negative and positive values. However, this may be a particularly slow way of approximating the integral, and the next theorem provides an indirect way to achieve higher efficiency.

Theorem 1 If φ : R → R is a monotonically increasing, C¹ function on the interval [a; b], then for any integrable function F(x) on [a; b] we have

   ∫_a^b F(x)dx = ∫_{φ⁻¹(a)}^{φ⁻¹(b)} F(φ(y)) φ'(y) dy

This theorem is just what we usually call a change of variables, and converts a problem where we want to integrate a function in the variable x into a perfectly equivalent problem where we integrate with respect to y, with y and x being related by the nonlinear relation x = φ(y).
As an example, let us assume that we want to compute the average of a transformation G of a Gaussian random variable x ∼ N(0, 1). This is given by

   (1/√(2π)) ∫_{-∞}^{∞} G(x) e^{-x²/2} dx

such that F(x) = G(x) e^{-x²/2}/√(2π). Using the change of variable x = √2 z, this rewrites

   (1/√π) ∫_{-∞}^{∞} G(√2 z) e^{-z²} dz

Alternatively, the whole real line can be mapped into [0; 1] using

   φ(y) = log(y/(1 - y))  such that  φ'(y) = 1/(y(1 - y))

In this case, the integral rewrites

   ∫_0^1 F(log(y/(1 - y))) . 1/(y(1 - y)) dy

or

   (1/√π) ∫_0^1 G(√2 log(y/(1 - y))) e^{-[log(y/(1-y))]²} . 1/(y(1 - y)) dy

that is, ∫_0^1 h(y)dy with

   h(y) = (1/√π) G(√2 log(y/(1 - y))) e^{-[log(y/(1-y))]²} / (y(1 - y))
Table 4.1 reports the results for the different methods we have seen so far. As can be seen, the midpoint and the trapezoid rules perform pretty well with 20 subintervals, as the error is less than 1e-4, while Simpson's rule is less efficient, as we need 40 subintervals to be able to reach a reasonable accuracy. We will see in the next section that there exist more efficient methods to deal with this type of problem.

Note that not all changes of variables are admissible. Indeed, in this case we might have used φ(y) = log(y/(1 - y))^{1/4}, which also maps [0; 1] into R in a monotonically increasing way. But this would not have been an admissible transformation.
Table 4.1 (errors in parentheses)

   Midpoint                Trapezoid               Simpson
   2.2232 (-0.574451)      1.1284 (0.520344)       1.5045 (0.144219)
   1.6399 (0.0087836)      1.6758 (-0.0270535)     1.8582 (-0.209519)
   1.6397 (0.00900982)     1.6579 (-0.00913495)    1.6519 (-0.0031621)
   1.6453 (0.00342031)     1.6520 (-0.00332232)    1.6427 (0.00604608)
   1.6488 (-4.31809e-005)  1.6487 (4.89979e-005)   1.6475 (0.00117277)
   1.6487 (-2.92988e-006)  1.6487 (2.90848e-006)   1.6487 (-1.24547e-005)
4.2.2 Gaussian quadrature

The general problem is to approximate

   ∫_a^b F(x)dx ≃ Σ_{i=1}^{n} ω_i F(x_i)

for some quadrature nodes x_i ∈ [a; b] and quadrature weights ω_i. All x_i's are arbitrarily set in Newton-Cotes formulas; as we have seen, we just imposed an equally spaced grid over the interval [a; b]. Then the weights ω_i follow from the fact that we want the approximation to be exact for any polynomial of order lower than or equal to the degree of the polynomials used to approximate the function. The question raised by Gaussian quadrature is then: "Isn't there a more efficient way to set the nodes and the weights?" The answer is clearly: Yes. The key point is then to try to get a good approximation to ∫F(x)dx. In fact, Gaussian quadrature is much more general than simple integration, as it actually computes an approximation to the weighted integral

   ∫_a^b F(x)w(x)dx ≃ Σ_{i=1}^{n} ω_i F(x_i)

The nodes and weights are built from a family of polynomials {φ_k} that are mutually orthogonal with respect to the weighting function w(x) on the interval [a; b]. A key property of such orthogonal polynomials is that φ_n has n distinct real roots, all lying inside (a; b).
We will take advantage of this property: the nodes will be the roots of the orthogonal polynomial of order n, while the weights will be chosen such that the Gaussian formula is exact for lower order polynomials:

   ∫_a^b φ_k(x)w(x)dx = Σ_{i=1}^{n} ω_i φ_k(x_i)  for k = 0, ..., n - 1

This implies that the weights can be recovered by solving a linear system of the form

   ω₁φ₀(x₁) + ... + ω_n φ₀(x_n) = ∫_a^b w(x)dx
   ω₁φ₁(x₁) + ... + ω_n φ₁(x_n) = 0
        ...
   ω₁φ_{n-1}(x₁) + ... + ω_n φ_{n-1}(x_n) = 0

which rewrites Φω = γ with

   Φ = [ φ₀(x₁)      ...  φ₀(x_n)
          ...        ...   ...
         φ_{n-1}(x₁) ...  φ_{n-1}(x_n) ],   ω = (ω₁, ..., ω_n)',   γ = (∫_a^b w(x)dx, 0, ..., 0)'

Note that the orthogonality property of the polynomials implies that the matrix Φ is invertible, such that ω = Φ⁻¹γ. We now review the most commonly used Gaussian quadrature formulas.
Gauss-Chebychev quadrature

This particular quadrature can be applied to problems that take the form

   ∫_{-1}^{1} F(x)(1 - x²)^{-1/2} dx

for which the quadrature formula is

   ∫_{-1}^{1} F(x)(1 - x²)^{-1/2} dx = (π/n) Σ_{i=1}^{n} F(x_i) + (π/2^{2n-1}) F^{(2n)}(ξ)/(2n)!

for ξ ∈ [-1; 1], and where the nodes are given by the roots of the Chebychev polynomial of order n:

   x_i = cos( (2i - 1)π/(2n) ),  i = 1, ..., n

It is obviously the case that we rarely have to compute an integral that exactly takes the form this quadrature imposes, and we are rather likely to compute ∫_a^b F(x)dx. We therefore first use the linear change of variable

   y = 2(x - a)/(b - a) - 1,  implying  dy = 2dx/(b - a)

such that

   ∫_a^b F(x)dx = ((b - a)/2) ∫_{-1}^{1} F(a + (y + 1)(b - a)/2) dy

Then, multiplying and dividing the integrand by √(1 - y²), the integral takes the Chebychev form

   ((b - a)/2) ∫_{-1}^{1} G(y)(1 - y²)^{-1/2} dy  with  G(y) = F(a + (y + 1)(b - a)/2) √(1 - y²)

such that

   ∫_a^b F(x)dx ≃ (π(b - a)/(2n)) Σ_{i=1}^{n} F(a + (y_i + 1)(b - a)/2) √(1 - y_i²)
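Since the nodes have a closed form, this rule is easy to sketch in Python (`gauss_chebyshev` is our own helper name):

```python
import math

def gauss_chebyshev(F, a, b, n):
    # nodes are Chebychev roots y_i = cos((2i-1)pi/(2n)); the sqrt(1-y^2)
    # factor turns the plain integral into the Chebychev weighted form
    total = 0.0
    for i in range(1, n + 1):
        y = math.cos((2 * i - 1) * math.pi / (2 * n))
        total += F(a + (y + 1) * (b - a) / 2) * math.sqrt(1.0 - y * y)
    return math.pi * (b - a) / (2 * n) * total

approx = gauss_chebyshev(math.exp, 0.0, 1.0, 100)
```

Note that because of the √(1 - y²) factor the convergence here is only polynomial in n, unlike a quadrature whose weight matches the integrand exactly.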
Gauss-Legendre quadrature

This quadrature applies to plain integrals, with weighting function w(x) = 1:

   ∫_{-1}^{1} F(x)dx = Σ_{i=1}^{n} ω_i F(x_i) + R_n

where the error R_n involves F^{(2n)}(ξ) for some ξ ∈ [-1; 1]. In this case, both the nodes and the weights are non-trivial to compute. Nevertheless, we can generate the nodes using any root finding procedure, and the weights can be computed as explained earlier, noting that ∫_{-1}^{1} w(x)dx = 2. For a general interval we use the transformation

   y = 2(x - a)/(b - a) - 1,  implying  dy = 2dx/(b - a)

such that

   ∫_a^b F(x)dx ≃ ((b - a)/2) Σ_{i=1}^{n} ω_i F(a + (y_i + 1)(b - a)/2)

where y_i and ω_i are the Gauss-Legendre nodes and weights over the interval [-1; 1].
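NumPy ships the Gauss-Legendre nodes and weights, so the mapped rule is short to sketch (our own helper name):

```python
import numpy as np

def gauss_legendre(F, a, b, n):
    # nodes/weights on [-1,1] from numpy, then the affine map to [a,b]
    y, w = np.polynomial.legendre.leggauss(n)
    x = a + (y + 1.0) * (b - a) / 2.0
    return (b - a) / 2.0 * np.sum(w * F(x))

approx = gauss_legendre(np.exp, 0.0, 1.0, 8)
```

Already with n = 8 nodes the integral of eˣ over [0, 1] is accurate to machine precision, illustrating the spectral accuracy of Gaussian rules on smooth integrands.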
Such a simple formula has a direct implication when we want to compute the discounted value of an asset, the welfare of an agent or the discounted sum of profits in a finite horizon problem, as such quantities can be computed solving an integral of the form

   ∫_0^T e^{-θt} Π(t) dt

in the case of a profit function. However, it will often be the case that we will want to compute such quantities in an infinite horizon model, something that this quadrature method cannot achieve unless we consider a change of variables of the kind we studied earlier. Nevertheless, there exists a specific Gaussian quadrature that can achieve this task.

As an example of the potential of the Gauss-Legendre quadrature formula, we compute the welfare function of an individual that lives an infinite number of periods. Time is continuous and the welfare function takes the form

   W = ∫_0^T e^{-θt} [c(t)^{1-σ}/(1 - σ)] dt

where we assume that c(t) = c* e^{gt}. Results for n = 2, 4, 8 and 12 and T = 10, 50, 100 and 1000 (as an approximation to T = ∞) are reported in table 4.2, where we set g = 0.01, θ = 0.05 and c* = 1. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n ≥ 8, except for T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in most of the cases.
Gauss-Laguerre quadrature

This particular quadrature can be applied to problems that take the form

   ∫_0^∞ F(x) e^{-x} dx

such that the weighting function is w(x) = e^{-x}. The approximation is then given by

   ∫_0^∞ F(x) e^{-x} dx = Σ_{i=1}^{n} ω_i F(x_i) + ((n!)²/(2n)!) F^{(2n)}(ξ)

Here again both the nodes and the weights are non-trivial to compute. Nevertheless, we can generate the nodes using any root finding procedure, and the weights can be computed as explained earlier, noting that ∫_0^∞ w(x)dx = 1.

Consider now a discounted sum ∫_0^∞ e^{-θt} F(t) dt. The problem involves a discount rate θ that should be eliminated to stick to the form the quadrature can handle; the change of variable x = θt yields

   ∫_0^∞ e^{-θt} F(t) dt = (1/θ) ∫_0^∞ e^{-x} F(x/θ) dx

and can be approximated by

   (1/θ) Σ_{i=1}^{n} ω_i F(y_i/θ)
Table 4.2: Welfare in finite horizon (errors in parentheses)

n     σ = 2.5              σ = 1                σ = 0.5              σ = 0.9
T=10
2     -3.5392              -8.2420              15.3833              8.3929
      (-3.19388e-006)      (-4.85944e-005)      (0.000322752)        (0.000232844)
4     -3.5392              -8.2420              15.3836              8.3931
      (-3.10862e-014)      (-3.01981e-012)      (7.1676e-011)        (6.8459e-011)
8     -3.5392              -8.2420              15.3836              8.3931
      (0)                  (1.77636e-015)       (1.77636e-015)       (-1.77636e-015)
12    -3.5392              -8.2420              15.3836              8.3931
      (-4.44089e-016)      (0)                  (3.55271e-015)       (1.77636e-015)
T=50
2     -11.4098             -21.5457             33.6783              17.6039
      (-0.00614435)        (-0.0708747)         (0.360647)           (0.242766)
4     -11.4159             -21.6166             34.0389              17.8467
      (-3.62327e-008)      (-2.71432e-006)      (4.87265e-005)       (4.32532e-005)
8     -11.4159             -21.6166             34.0390              17.8467
      (3.55271e-015)       (3.55271e-015)       (7.10543e-015)       (3.55271e-015)
12    -11.4159             -21.6166             34.0390              17.8467
      (-3.55271e-015)      (-7.10543e-015)      (1.42109e-014)       (7.10543e-015)
T=100
2     -14.5764             -23.6040             32.5837              16.4972
      (-0.110221)          (-0.938113)          (3.63138)            (2.28361)
4     -14.6866             -24.5416             36.2078              18.7749
      (-1.02204e-005)      (-0.000550308)       (0.00724483)         (0.00594034)
8     -14.6866             -24.5421             36.2150              18.7808
      (3.55271e-015)       (-1.03739e-012)      (1.68896e-010)       (2.39957e-010)
12    -14.6866             -24.5421             36.2150              18.7808
      (-5.32907e-015)      (-1.77636e-014)      (2.84217e-014)       (1.77636e-014)
T=1000
2     -1.0153              -0.1066              0.0090               0.0021
      (-14.9847)           (-24.8934)           (36.3547)            (18.8303)
4     -12.2966             -10.8203             7.6372               3.2140
      (-3.70336)           (-14.1797)           (28.7264)            (15.6184)
8     -15.9954             -24.7917             34.7956              17.7361
      (-0.00459599)        (-0.208262)          (1.56803)            (1.09634)
12    -16.0000             -24.9998             36.3557              18.8245
      (-2.01256e-007)      (-0.000188532)       (0.00798507)         (0.00784393)
where y_i and ω_i are the Gauss-Laguerre nodes and weights over the interval [0; ∞). Back to the welfare example, where we assume that c(t) = c* e^{gt}: results for n = 2, 4, 8 and 12 are reported in table 4.3, where we again set g = 0.01, θ = 0.05 and c* = 1. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n ≥ 8. It is worth noting that the method performs far better than the Gauss-Legendre quadrature method with T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in some cases.
Table 4.3: Welfare in infinite horizon (errors in parentheses)

n     σ = 2.5           σ = 1             σ = 0.5           σ = 0.9
2     -15.6110          -24.9907          36.3631           18.8299
      (0.388994)        (0.00925028)      (0.000517411)     (0.00248525)
4     -15.9938          -25.0000          36.3636           18.8324
      (0.00622584)      (1.90929e-006)    (3.66246e-009)    (1.59375e-007)
8     -16.0000          -25.0000          36.3636           18.8324
      (1.26797e-006)    (6.03961e-014)    (0)               (0)
12    -16.0000          -25.0000          36.3636           18.8324
      (2.33914e-010)    (0)               (0)               (3.55271e-015)
Gauss-Hermite quadrature

This type of quadrature will be particularly useful when we consider stochastic processes with Gaussian distributions, as it approximates integrals of the type

   ∫_{-∞}^{∞} F(x) e^{-x²} dx

for which the approximation is

   ∫_{-∞}^{∞} F(x) e^{-x²} dx = Σ_{i=1}^{n} ω_i F(x_i) + (n!√π/(2^n (2n)!)) F^{(2n)}(ξ)

Here again both the nodes and the weights are non-trivial to compute. The nodes can be computed using any root finding procedure, and the weights can be computed as explained earlier, noting that ∫_{-∞}^{∞} w(x)dx = √π. In applications we rather face expectations of the form

   (1/(σ√(2π))) ∫_{-∞}^{∞} F(x) e^{-(x-μ)²/(2σ²)} dx

for x ∼ N(μ, σ²); in order to stick to the problem this type of approach can explicitly solve, we need to transform the variable using the linear map

   y = (x - μ)/(σ√2)

such that the integral rewrites

   (1/√π) ∫_{-∞}^{∞} F(σ√2 y + μ) e^{-y²} dy

and can therefore be approximated by

   (1/√π) Σ_{i=1}^{n} ω_i F(σ√2 y_i + μ)

where y_i and ω_i are the Gauss-Hermite nodes and weights over the interval (-∞; ∞).
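This expectation formula is a few lines in Python with NumPy's Hermite nodes (our own helper name):

```python
import numpy as np

def gauss_hermite_expectation(G, mu, sigma, n):
    # E[G(x)] for x ~ N(mu, sigma^2), via the map x = sqrt(2)*sigma*y + mu
    y, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * G(np.sqrt(2.0) * sigma * y + mu)) / np.sqrt(np.pi)

# mean of a lognormal variable: E[exp(x)] = exp(mu + sigma^2/2)
m = gauss_hermite_expectation(np.exp, 0.0, 0.5, 12)
```

With n = 12 nodes and σ = 0.5 the lognormal mean is recovered essentially to machine precision, matching the corresponding entry of table 4.4.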
As a first example, let us compute the average of a lognormal distribution, that is log(X) ∼ N(μ, σ²). We then know that E(X) = exp(μ + σ²/2). This may be computed with the formula above, setting F(x) = e^x; results for μ = 0 and different values of σ are reported in table 4.4.
Table 4.4: Gauss-Hermite quadrature (errors in parentheses)

n     σ = 0.01          σ = 0.1           σ = 0.5           σ = 1.0           σ = 2.0
2     1.00005           1.00500           1.12763           1.54308           3.76219
      (8.33353e-10)     (8.35280e-06)     (0.00552249)      (0.105641)        (3.62686)
4     1.00005           1.00501           1.13315           1.64797           6.99531
      (2.22045e-16)     (5.96634e-12)     (2.46494e-06)     (0.000752311)     (0.393743)
8     1.00005           1.00501           1.13315           1.64872           7.38873
      (2.22045e-16)     (4.44089e-16)     (3.06422e-14)     (2.44652e-09)     (0.00032857)
12    1.00005           1.00501           1.13315           1.64872           7.38906
      (3.55271e-15)     (3.55271e-15)     (4.88498e-15)     (1.35447e-14)     (3.4044e-08)
Gauss-Hermite quadrature is especially useful for the discretization of shocks that we will face when we deal with methods for solving rational expectations models. In fact, we will often face shocks that follow Gaussian AR(1) processes

   x_{t+1} = ρ x_t + (1 - ρ) x̄ + ε_{t+1}

where ε_{t+1} ∼ N(0, σ²). This implies that

   ∫ f(x_{t+1}|x_t) dx_{t+1} = ∫ (1/(σ√(2π))) exp{ -(1/2) [(x_{t+1} - ρx_t - (1 - ρ)x̄)/σ]² } dx_{t+1} = 1

which illustrates the fact that x is a continuous random variable. The question we now ask is: does there exist a discrete representation of x which is equivalent to its continuous representation? The answer to this question is yes, as shown in Tauchen and Hussey [1991]. Tauchen and Hussey propose to replace the integral by

   ∫ [f(x_{t+1}|x_t)/f(x_{t+1}|x̄)] f(x_{t+1}|x̄) dx_{t+1} ≡ ∫ Φ(x_{t+1}; x_t, x̄) f(x_{t+1}|x̄) dx_{t+1} = 1
where f(x_{t+1}|x̄) denotes the density of x_{t+1} conditional on the fact that x_t = x̄, and

   Φ(x_{t+1}; x_t, x̄) = f(x_{t+1}|x_t)/f(x_{t+1}|x̄)
                      = exp{ -(1/2) [ ((x_{t+1} - ρx_t - (1 - ρ)x̄)/σ)² - ((x_{t+1} - x̄)/σ)² ] }

We can then use the standard linear transformation and impose y_t = (x_t - x̄)/(σ√2) to get

   (1/√π) ∫ Φ(σ√2 y_{t+1} + x̄; σ√2 y_t + x̄, x̄) e^{-y_{t+1}²} dy_{t+1} = 1

for which we can use a Gauss-Hermite quadrature. Assume then that we have the n nodes y_j and weights ω_j, such that

   (1/√π) Σ_{j=1}^{n} ω_j Φ(y_j; y_i; x̄) ≃ 1
This provides a candidate π̂_ij for the transition probability from state i to state j, but remember that the quadrature is just an approximation, such that it will generally be the case that Σ_{j=1}^{n} π̂_ij = 1 will not hold exactly. Tauchen and Hussey therefore propose the following modification:

   π̂_ij = ω_j Φ(y_j; y_i; x̄) / (√π s_i)

where s_i = (1/√π) Σ_{j=1}^{n} ω_j Φ(y_j; y_i; x̄).
As an example, with 4 nodes we obtain a transition matrix of the form

   Π = [ 0.7330  0.2557  0.0113  0.0000
         0.1745  0.5964  0.2214  0.0077
         0.0077  0.2214  0.5964  0.1745
         0.0000  0.0113  0.2557  0.7330 ]

meaning for instance that we stay in state 1 with probability 0.7330, but will transit from state 2 to state 3 with probability 0.2214.
Matlab Code: Discretization of an AR(1)

n     = 4;            % number of nodes
xbar  = 0;            % mean of the x process
rho   = 0.95;         % persistence parameter
sigma = 0.01;         % volatility
[xx,wx] = gauss_herm(n);             % nodes and weights for x
x_d   = sqrt(2)*sigma*xx+xbar;       % discrete states
x     = xx(:,ones(n,1));             % origin nodes (rows)
y     = x';                          % destination nodes (columns)
w     = wx(:,ones(n,1))';            % weights attached to destinations
%
% computation
%
px = (exp(y.*y-(y-rho*x).*(y-rho*x)).*w)./sqrt(pi);
sx = sum(px,2);
px = px./sx(:,ones(n,1));            % normalize each row to sum to 1
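The same computation can be sketched in Python (our own transcription, not the notes' code; note that the 4-state matrix printed above appears to correspond to ρ = 0.9 rather than the ρ = 0.95 set in the Matlab listing, an assumption we use in the check below):

```python
import numpy as np

def tauchen_hussey(n, xbar, rho, sigma):
    # discretize x' = rho*x + (1-rho)*xbar + eps, eps ~ N(0, sigma^2)
    y, w = np.polynomial.hermite.hermgauss(n)   # Gauss-Hermite nodes/weights
    x_d = np.sqrt(2.0) * sigma * y + xbar       # discrete states
    yj = y[np.newaxis, :]                       # destination nodes
    yi = y[:, np.newaxis]                       # origin nodes
    P = w[np.newaxis, :] * np.exp(yj**2 - (yj - rho * yi)**2) / np.sqrt(np.pi)
    P = P / P.sum(axis=1, keepdims=True)        # Tauchen-Hussey normalization
    return x_d, P

states, P = tauchen_hussey(4, 0.0, 0.9, 0.01)
```

Each row of P then sums to one by construction, and the entries reproduce the matrix shown above.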
4.2.3 Potential problems

In all the cases we dealt with in the previous sections, the integrals were well defined, or at least existed (up to some examples), but there may exist singularities in the function, such that the integral may not be well defined. For instance, think of integrating a function like x^{-1/2} over [0; 1]: the function diverges at 0. How will the methods we presented in the previous section perform? The following theorem by Davis and Rabinowitz [1984] states that standard methods can still be used.

Theorem 3 Assume that there exists a continuous, monotonically increasing function G : [0; 1] → R such that ∫_0^1 G(x)dx < ∞ and |F(x)| ≤ |G(x)| on [0; 1]; then the Newton-Cotes rule (with F(1) = 0 to avoid the singularity in 1) and the Gauss-Legendre quadrature rule converge to ∫_0^1 F(x)dx as n increases to ∞.

Therefore, we can still apply standard methods to compute such integrals, but convergence is much slower and the error formulas cannot be used anymore, as ‖F^{(k)}(x)‖ is infinite for k ≥ 1.
4.2.4 Multivariate integration

Quadrature methods extend the one-dimensional approach to higher dimensions by multiplying sums. For instance, let x_i^k and ω_i^k, i = 1, ..., n_k, be the quadrature nodes and weights of the one dimensional problem along dimension k ∈ {1, ..., s}, which can be obtained either from a Newton-Cotes or a Gaussian quadrature formula. The product rule then approximates the s-dimensional integral by

   Σ_{i₁=1}^{n₁} ... Σ_{i_s=1}^{n_s} ω_{i₁}¹ ... ω_{i_s}^s F(x_{i₁}¹, ..., x_{i_s}^s)

A potential difficulty with this approach is that when the dimension of the space increases, the computational cost increases exponentially; this is the so-called curse of dimensionality. Therefore, this approach should be restricted to low-dimensional problems.
As an example of the use of this type of method, let us assume that we want to compute the first order moment of the 2-dimensional function F(x₁, x₂), where

   (x₁, x₂)' ∼ N(μ, Σ)  with  μ = (μ₁, μ₂)'  and  Σ = [σ₁₁ σ₁₂; σ₁₂ σ₂₂]

that is the integral

   (2π)⁻¹ |Σ|^{-1/2} ∫∫ F(x₁, x₂) exp( -(1/2)(x - μ)' Σ⁻¹ (x - μ) ) dx₁ dx₂

where x = (x₁, x₂)'. Let Ω be the Cholesky decomposition of Σ, such that Σ = ΩΩ', and let us make the change of variable

   y = Ω⁻¹(x - μ)/√2  ⇔  x = √2 Ω y + μ

Then the integral rewrites

   π⁻¹ ∫∫ F(√2 Ω y + μ) exp( -Σ_{i=1}^{2} y_i² ) dy₁ dy₂

and the product Gauss-Hermite rule yields the approximation

   (1/π) Σ_{i₁=1}^{n₁} Σ_{i₂=1}^{n₂} ω_{i₁}¹ ω_{i₂}² F( √2 ω₁₁ y_{i₁} + μ₁, √2 (ω₂₁ y_{i₁} + ω₂₂ y_{i₂}) + μ₂ )

where the ω_{kl} denote the entries of Ω. As an illustration we take F(x₁, x₂) = (e^{x₁} - e^{μ₁})(e^{x₂} - e^{μ₂}) and

   Σ = [ 0.0100  0.0075
         0.0075  0.0200 ]
The results are reported in table 4.5, where we consider different values for n₁ and n₂. It appears that the method converges pretty fast, as the true value for the integral is 0.01038358129717, which is attained for n₁ ≥ 8 and n₂ ≥ 8.
Table 4.5: 2D Gauss-Hermite quadrature

nx\ny    2                   4                   8                   12
2        0.01029112845254    0.01038328639869    0.01038328710679    0.01038328710679
4        0.01029142086814    0.01038358058862    0.01038358129674    0.01038358129674
8        0.01029142086857    0.01038358058906    0.01038358129717    0.01038358129717
12       0.01029142086857    0.01038358058906    0.01038358129717    0.01038358129717
Matlab Code: 2D Gauss-Hermite Quadrature

n       = 2;                            % dimension of the problem
n1      = 8;                            % number of nodes for x1
[x1,w1] = gauss_herm(n1);               % nodes and weights for x1
n2      = 8;                            % number of nodes for x2
[x2,w2] = gauss_herm(n2);               % nodes and weights for x2
Sigma   = [0.01 0.0075;0.0075 0.02];    % covariance matrix
Omega   = chol(Sigma)';                 % (lower) Cholesky decomposition
mu1     = 0.1;                          % mean of x1
mu2     = 0.2;                          % mean of x2
int     = 0;
for i=1:n1;
   for j=1:n2;
      x12 = sqrt(2)*Omega*[x1(i);x2(j)]+[mu1;mu2];
      f   = (exp(x12(1))-exp(mu1))*(exp(x12(2))-exp(mu2));
      int = int+w1(i)*w2(j)*f;
   end
end
int=int/sqrt(pi^n);
4.2.5 Monte-Carlo integration

Monte-Carlo methods approximate the integral of F over [a; b] by the sample average of function evaluations at N random points x_i drawn uniformly from [a; b]:

   Î = ((b - a)/N) Σ_{i=1}^{N} F(x_i)

Such methods rest on the generation of (pseudo) random numbers.³

³ There have been attempts to build truly random number generators, but these techniques were far too costly and awkward.
⁴ Generating a 2-dimensional sequence may be done extracting subsequences: y_k = (x_{2k+1}, x_{2k+2}).
Shortcomings of this kind for these numbers led to push linear congruential methods into disfavor; the solution has been to design more complicated generators. An example of those generators quoted by Judd [1998] is the multiple prime random number generator, for which we report the Matlab code. This pseudo random number generator, proposed by Haas [1987], generates integers between 0 and 99999, such that dividing the sequence by 100,000 returns numbers that approximate a uniform random variable over [0; 1] with 5 digits of precision. If higher precision is needed, the sequence may just be concatenated using the scheme (for 8 digits of precision) 100000 . x_{2k} + x_{2k+1}. The main advantage of this generator is its very long period.
Matlab Code: Multiple Prime Random Number Generator

long = 10000;          % length of the sample
m    = 971;
ia   = 11113;
ib   = 104322;
x    = zeros(long,1);
x(1) = 481;
for i= 2:long;
   m = m+7;
   ia= ia+1907;
   ib= ib+73939;
   if m>=9973;m=m-9871;end
   if ia>=99991;ia=ia-89989;end
   if ib>=224729;ib=ib-96233;end
   x(i)=mod(x(i-1)*m+ia+ib,100000);
end
x = x/100000;          % map the integers into [0;1]

[Figure: x_{k+1} plotted against x_k for the generated sequence]

This generator has a period length of about 10^25, such that it passes a lot of randomness tests.
A key feature of all these random number generators is that they attempt to draw numbers from a uniform distribution over the interval [0; 1]. There may however be some cases where we would like to draw numbers from another distribution, most notably the normal distribution. The way to handle this problem is then to invert the cumulative density function of the distribution we want to generate, to get a random draw from this particular distribution. More formally, assume we want numbers generated from the distribution F(.), and we have a draw {x_i}_{i=1}^N from the uniform distribution; then the draw {y_i}_{i=1}^N is obtained as y_i = F⁻¹(x_i). Inverting this function may be trivial in some cases (say the uniform over [a; b]).
The variance of the crude Monte-Carlo estimator is

   var( (1/N) Σ_{i=1}^{N} X_i ) = σ²/N  where  σ² = var(X_i)

In practice σ² is unknown, but it can be estimated by

   σ̂² = (1/(N - 1)) Σ_{i=1}^{N} (X_i - X̄)²  with  X̄ = (1/N) Σ_{i=1}^{N} X_i

such that, for the integral itself,

   σ̂_F² = (1/(N - 1)) Σ_{i=1}^{N} (F(x_i) - Î_F)²
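The crude estimator and its standard error take a few lines in Python (a sketch; `crude_mc` is our own name):

```python
import math
import random

def crude_mc(F, n, seed=0):
    # crude Monte-Carlo over [0,1]: sample mean and its standard error
    rng = random.Random(seed)
    draws = [F(rng.random()) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, math.sqrt(var / n)

est, se = crude_mc(math.exp, 100000)
```

With 100,000 draws the standard error for eˣ on [0, 1] is around 1.5e-3, in line with the 1/√N decline visible in table 4.6.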
We report in table 4.6 the results obtained integrating the exponential function over [0; 1]. This table illustrates why Monte-Carlo integration is seldom used as such: convergence is slow, as the standard error only declines at rate 1/√N.

Table 4.6: Crude Monte-Carlo example: ∫_0^1 e^x dx

N           Î_f           σ̂
10          1.54903750    0.13529216
100         1.69945455    0.05408852
1000        1.72543465    0.01625793
10000       1.72454262    0.00494992
100000      1.72139292    0.00156246
1000000     1.71853252    0.00049203
Antithetic variates

The antithetic variates method lowers the variance of the estimate by pairing each draw x_i with its negatively correlated counterpart 1 - x_i:

   Î_f^A = (1/(2N)) Σ_{i=1}^{N} ( F(x_i) + F(1 - x_i) )

⁵ Note that we used the same seed when generating this integral and the one we generated using crude Monte-Carlo.
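A Python sketch of the antithetic estimator (our own helper name):

```python
import math
import random

def antithetic_mc(F, n, seed=0):
    # pair each uniform draw x with its antithetic counterpart 1 - x
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random()
        total += F(x) + F(1.0 - x)
    return total / (2 * n)

est = antithetic_mc(math.exp, 50000)
```

Because F(x) and F(1 - x) are negatively correlated for a monotone F, the variance of each pair average is well below that of two independent draws.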
Table 4.7: Antithetic variates example: ∫_0^1 e^x dx

N           Î_f           σ̂
10          1.71170096    0.02061231
100         1.73211884    0.00908890
1000        1.72472178    0.00282691
10000       1.71917393    0.00088709
100000      1.71874441    0.00027981
1000000     1.71827383    0.00008845
Stratified sampling

The idea of stratified sampling is to break the interval of integration into subintervals, here [0; α] and [α; 1], and to apply crude Monte-Carlo on each of them:

   Î_f^s = (α/N_a) Σ_{i=1}^{N_a} F(x_i^a) + ((1 - α)/N_b) Σ_{i=1}^{N_b} F(x_i^b)

where x_i^a ∈ [0; α] and x_i^b ∈ [α; 1]. Then the variance of this estimator is given by

   (α²/N_a) var_a(F(x)) + ((1 - α)²/N_b) var_b(F(x))

which, allocating the draws as N_a = αN and N_b = (1 - α)N, equals

   (α/N) var_a(F(x)) + ((1 - α)/N) var_b(F(x))
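A Python sketch of the stratified estimator with the proportional allocation N_a = αN (our own helper name):

```python
import math
import random

def stratified_mc(F, n, alpha=0.25, seed=0):
    # draw alpha*n points uniformly on [0,alpha] and the rest on [alpha,1]
    rng = random.Random(seed)
    na = int(alpha * n)
    nb = n - na
    part_a = alpha * sum(F(alpha * rng.random()) for _ in range(na)) / na
    part_b = (1.0 - alpha) * sum(F(alpha + (1.0 - alpha) * rng.random())
                                 for _ in range(nb)) / nb
    return part_a + part_b

est = stratified_mc(math.exp, 100000)
```

The gain comes from replacing the overall variance of F by the (smaller) within-stratum variances.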
Table 4.8 reports results for the exponential function for α = 0.25. As can be seen from the table, beyond the 10 points example there are essentially no differences between the crude Monte-Carlo method and the stratified sampling approach in the evaluation of the integral, while there is a potential gain from this approach in the variance of the estimates. The problem that remains to be fixed is: how should α be selected? In fact, we would like to select α such that we minimize the volatility of the estimator.
Table 4.8: Stratified sampling example: ∫_0^1 e^x dx

N           Î_f           σ̂
10          1.52182534    0.11224567
100         1.69945455    0.04137204
1000        1.72543465    0.01187637
10000       1.72454262    0.00359030
100000      1.72139292    0.00114040
Control variates

The control variates method uses a function φ that is close to F but whose integral is known in closed form, relying on the identity

   ∫ F(x)dx = ∫ (F(x) - φ(x))dx + ∫ φ(x)dx

Since F - φ has a smaller variance than F, the first integral can be computed accurately by Monte-Carlo, while the second is known exactly. For the exponential function over [0; 1], a natural candidate is φ(x) = 1 + x, whose integral is 1.5. Table 4.9 reports the results. As can be seen, the method performs a little worse than the antithetic variates, but far better than crude Monte-Carlo.
Table 4.9: Control variates example: ∫_0^1 e^x dx

N           Î_f           σ̂
10          1.64503465    0.05006855
100         1.71897083    0.02293349
1000        1.72499149    0.00688639
10000       1.72132486    0.00210111
100000      1.71983807    0.00066429
1000000     1.71838279    0.00020900
Importance sampling

Importance sampling rests on rewriting the integral using a density G(x) on the domain D:

   ∫_D F(x)dx = ∫_D [F(x)/G(x)] G(x)dx = ∫_D H(x)G(x)dx  with  H(x) = F(x)/G(x)

so that the integral is the expectation of H(x) when x is drawn from the density G, and may be estimated by the sample mean of H over draws from G. The variance of this estimator is

   (1/N) [ ∫_D (F(x)²/G(x)) dx - ( ∫_D F(x)dx )² ]

such that a density G that is close to proportional to F delivers a low variance.
Table 4.10: Importance sampling example: ∫_0^1 e^x dx

N           Î_f           σ̂
10          1.54903750    0.04278314
100         1.69945455    0.00540885
1000        1.72543465    0.00051412
10000       1.72454262    0.00004950
100000      1.72139292    0.00000494
1000000     1.71853252    0.00000049
4.2.6 Quasi-Monte Carlo methods

Definition 1 A sequence {x_i}_{i=1}^∞ ⊂ D ⊂ R^n is said to be equidistributed over D iff

   lim_{N→∞} (λ(D)/N) Σ_{i=1}^{N} F(x_i) = ∫_D F(x)dx

for every Riemann-integrable F, where λ(D) is the measure of D. In the scalar case D = [a; b], this reads

   lim_{N→∞} ((b - a)/N) Σ_{i=1}^{N} F(x_i) = ∫_a^b F(x)dx

Remember that the fractional part of a number is the part that lies right after the dot. It is denoted by {.}, such that {2.5} = 0.5, and can be computed as

   {x} = x - max{k ∈ Z | k ≤ x}

The Matlab function that returns this component is x-fix(x).
Commonly used equidistributed sequences over the n-dimensional hypercube are, for k = 1, 2, ...:

- Weyl: ({k√p₁}, ..., {k√p_n})
- Haber: ({(k(k+1)/2)√p₁}, ..., {(k(k+1)/2)√p_n})
- Niederreiter: ({k . 2^{1/(n+1)}}, ..., {k . 2^{n/(n+1)}})
- Baker: ({k e^{r₁}}, ..., {k e^{r_n}}), where the r_s are rational and distinct numbers

In all these cases, the p_s are usually prime numbers. Figure 4.6 reports a 2-dimensional sample of 1000 points for each type of sequence. There obviously
[Figure 4.6: Quasi-Monte Carlo sequences; panels: Weyl, Haber, Niederreiter and Baker]
exist other ways of obtaining sequences for quasi-Monte Carlo methods that rely on low discrepancy approaches, Fourier methods, or the so-called good lattice points approach. The interested reader may refer to chapter 9 in Judd [1998], but we will not investigate this any further as this would bring us far away from our initial purpose.
Matlab Code: Equidistributed Sequences

n  = 2;                 % dimension of the space
nb = 1000;              % number of data points
K  = [1:nb]';           % k=1,...,nb
seq= 'NIEDERREITER';    % type of sequence
switch upper(seq)
case 'WEYL'             % Weyl
   p = sqrt(primes(n+1));
   x = K*p;
   x = x-fix(x);
case 'HABER'            % Haber
   p = sqrt(primes(n+1));
   x = (K.*(K+1)./2)*p;
   x = x-fix(x);
case 'NIEDERREITER'     % Niederreiter
   x = K*(2.^((1:n)/(1+n)));
   x = x-fix(x);
case 'BAKER'            % Baker
   x = K*exp(1./primes(n+1));
   x = x-fix(x);
otherwise
   error('Unknown sequence requested')
end
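Used as quadrature nodes, such a sequence integrates smooth functions with an error that falls much faster than 1/√N; a one-dimensional Python sketch with the Weyl sequence (our own helper name):

```python
import math

def weyl_integrate(F, n):
    # 1-dimensional Weyl sequence x_k = {k*sqrt(2)} as quadrature nodes
    root = math.sqrt(2.0)
    total = 0.0
    for k in range(1, n + 1):
        total += F((k * root) % 1.0)
    return total / n

est = weyl_integrate(math.exp, 100000)
```

With 100,000 points the error for eˣ on [0, 1] is of order 1e-5, consistent with the Weyl column of table 4.11.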
Table 4.11: Quasi Monte-Carlo example: ∫_0^1 e^x dx (errors in parentheses)

N         Weyl                       Haber                      Niederreiter               Baker
10        1.67548650 (0.0427953)     1.72014839 (0.00186656)    1.67548650 (0.0427953)     1.82322097 (0.104939)
100       1.71386433 (0.0044175)     1.75678423 (0.0385024)     1.71386433 (0.0044175)     1.71871676 (0.000434929)
1000      1.71803058 (0.000251247)   1.71480932 (0.00347251)    1.71803058 (0.000251247)   1.71817437 (0.000107457)
10000     1.71830854 (2.67146e-005)  1.71495774 (0.00332409)    1.71830854 (2.67146e-005)  1.71829897 (1.71431e-005)
100000    1.71829045 (8.62217e-006)  1.71890493 (0.000623101)   1.71829045 (8.62217e-006)  1.71827363 (8.20223e-006)
1000000   1.71828227 (4.36844e-007)  1.71816697 (0.000114855)   1.71828227 (4.36844e-007)  1.71828124 (5.9314e-007)
Bibliography

Davis, P.J. and P. Rabinowitz, Methods of Numerical Integration, New York: Academic Press, 1984.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Tauchen, G. and R. Hussey, "Quadrature-Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models", Econometrica, 1991, 59 (2), 371-396.
Contents

4 Numerical differentiation and integration
  4.1 Numerical differentiation
      4.1.1 Computation of derivatives
      4.1.2 Partial Derivatives
      4.1.3 Hessian
  4.2 Numerical Integration
      4.2.1 Newton-Cotes formulas
      4.2.2 Gaussian quadrature
      4.2.3 Potential problems
      4.2.4 Multivariate integration
      4.2.5 Monte-Carlo integration
      4.2.6 Quasi-Monte Carlo methods
List of Figures

4.1 Newton-Cotes integration
4.2 Simpson's rule
4.6 Quasi-Monte Carlo sequences

List of Tables

4.3 Welfare in infinite horizon
4.4 Gauss-Hermite quadrature
4.5 2D Gauss-Hermite quadrature
4.6 Crude Monte-Carlo example: ∫_0^1 e^x dx
4.7 Antithetic variates example: ∫_0^1 e^x dx
4.8 Stratified sampling example: ∫_0^1 e^x dx
4.9 Control variates example: ∫_0^1 e^x dx
4.10 Importance sampling example: ∫_0^1 e^x dx
4.11 Quasi Monte-Carlo example: ∫_0^1 e^x dx