Matlab For Microeconometrics: Numerical Optimization: Nick Kuminoff Virginia Tech: Fall 2008
For a thorough introduction to the mathematics of numerical methods and proofs of convergence
rates, see:
Judd, Kenneth L. 1998. Numerical Methods in Economics. The MIT Press, Cambridge, Massachusetts.
For a less formal, but much more applicable, discussion of how to implement numerical methods
in Matlab, see:
Miranda, Mario J. and Paul L. Fackler. 2002. Applied Computational Economics and
Finance. The MIT Press, Cambridge, Massachusetts.
Finally, if you open Matlab, select Help > Matlab Help, and then select Contents from the
tab menu, there is an optimization toolbox guide that you may find useful.
TABLE OF CONTENTS
1. BASICS & QUICK REFERENCE TO COMMON COMMANDS
5. NEWTON-RAPHSON
- All algorithms require you to code your objective function as a separate m-file.
- Matlab does minimization, not maximization. Thus, if your analytical model is set up as
  a maximization problem, you need to multiply your objective function by -1 (or apply some
  other transformation) so that numerical minimization returns the desired parameter vector.
- You can use different algorithms to minimize the same objective function.
- Use the optimset command to define the stopping criteria for whichever algorithm you
  use.
- You can instruct the fminunc algorithm to return the gradient and Hessian evaluated at the
  solution, which may be useful for subsequent inference.
- Code and data to generate the results shown here can be downloaded from the course web
  page.
The table below contains a quick reference to the most commonly used commands for
unconstrained optimization. Matlab also has a set of algorithms for constrained
optimization. To see these, view the introduction to the optimization toolbox from the
Help menu.
COMMAND       USE

1. UNCONSTRAINED OPTIMIZATION
fzero         finds the root of a scalar nonlinear function
fminsearch    minimizes a function using the derivative-free Nelder-Mead simplex algorithm
fminunc       minimizes an unconstrained function using quasi-Newton methods
optimset      defines stopping criteria and other options for the algorithms
The first step in solving a nonlinear optimization problem is to write an m-file which defines an
objective function that you seek to minimize. While textbooks often define nonlinear
econometric models as maximization problems, Matlab's numerical algorithms search for
parameter values that minimize a function. This is a trivial inconvenience since most
maximization problems can be converted to minimization problems simply by multiplying the
objective function by -1.
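As a toy illustration of the sign flip (not from the original notes), suppose we want to maximize g(b) = -(b-2)^2; handing fminsearch the negated function recovers the maximizer:

```matlab
% Toy example (hypothetical): maximize g(b) = -(b-2)^2 by minimizing -g(b).
g = @(b) -(b-2).^2;        % function we want to MAXIMIZE
f = @(b) -g(b);            % multiply by -1 for the minimizer
bhat = fminsearch(f, 0)    % converges to the maximizer of g, b = 2
```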
To begin, consider the following nonlinear model:
y = exp(Xβ) + u                                                    (2.1)

Suppose we want to use nonlinear least squares to estimate this model for β. The objective
function could be defined as follows:

Q(β) = (1/N) Σᵢ [yᵢ − exp(Xᵢβ)]² ,  where X is N×k and β is k×1    (2.2)
We can create an m-file named obj.m that evaluates this function by writing the following four
lines of code:
>> function Q=obj(beta,y,X)    % define function            (2.3)
>> N=size(y,1);                % # observations             (2.4)
>> f=y-exp(X*beta);            % evaluate obj for all i     (2.5)
>> Q=(1/N)*sum(f.^2);          % objective function         (2.6)
Line (2.3) tells Matlab that you are creating a new function named obj that takes beta, y, and X
as its arguments, evaluates them, and returns the scalar Q. Line (2.4) measures the number of
rows in the y vector. Line (2.5) evaluates the argument inside the square brackets in (2.2) for
every observation. Finally, line (2.6) evaluates the objective function in (2.2).
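Before taking obj.m to real data, it can help to verify it on simulated data where the answer is known. Here is a minimal sketch; all of the data below are made up for the check:

```matlab
% Sanity check for obj.m on simulated data (hypothetical example)
X = randn(100,3);                    % fake regressors
beta0 = [0.5; -0.2; 0.1];            % true parameters
y = exp(X*beta0) + 0.1*randn(100,1); % data generated from (2.1)
Q0 = obj(beta0, y, X)                % should be near the noise variance, 0.01
Q1 = obj(zeros(3,1), y, X)           % should be larger than Q0
```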
Personally, I find that it is often helpful to precede the function with some descriptive text
explaining what the objective function is doing. My m-file would look something like this:
%-----------------------------------------------%
% OBJ is the NLS objective function for the     %
% exponential regression model in (2.1)         %
%-----------------------------------------------%
function Q=obj(beta,y,X)   % define function
N=size(y,1);               % # observations
f=y-exp(X*beta);           % evaluate obj for all i
Q=(1/N)*sum(f.^2);         % objective function
Now that we have an objective function, we want to tell Matlab to solve for the parameter vector,
, that minimizes the function. This can be done using a number of different algorithms. The
next three sections describe the most common approaches.
Finally, make sure that your objective function is included in Matlab's path.
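If Matlab reports that it cannot find your function, add its folder to the search path; the folder name below is hypothetical:

```matlab
addpath('c:\econometrics\project1')   % hypothetical folder containing obj.m
which obj                             % verifies that Matlab can now find obj.m
```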
[Figure: one iteration of the Nelder-Mead simplex algorithm, showing how the worst vertex of
the simplex (A, B, C) is reflected or expanded through the centroid of the remaining vertices.]

Step 4: repeat steps 1-3 until the simplex collapses onto the solution, i.e., until its maximum
side length falls below some tolerance.
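To make the simplex operations concrete, here is a minimal sketch of one reflection step in two dimensions, using a toy objective rather than the housing model:

```matlab
% One Nelder-Mead reflection step on a toy objective (illustrative sketch)
f  = @(b) (b(1)-1)^2 + (b(2)+2)^2;      % toy function to minimize
V  = [0 0; 1 0; 0 1]';                  % simplex vertices stored as columns
fv = [f(V(:,1)) f(V(:,2)) f(V(:,3))];   % objective at each vertex
[fmax,worst] = max(fv);                 % identify the worst vertex
keep = V(:, [1:worst-1 worst+1:3]);     % the other two vertices
c = mean(keep,2);                       % their centroid
r = c + (c - V(:,worst));               % reflect the worst vertex through it
if f(r) < fmax, V(:,worst) = r; end     % accept the reflection if it improves
```

The full algorithm repeats this step, adding expansions and shrinks, until the simplex collapses as described in Step 4.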
To illustrate how to implement the algorithm, we can use 2,753 observations on houses from
Lucas Davis's (AER, 2004) cancer cluster paper:

y  = duration between home sales
X  = [x1 ... x7], where
x1 = Churchill county indicator
x2 = acres
x3 = square feet
x4 = age
x5 = age²
x6 = acres²
x7 = constant

For example, we might be interested in how the time between sales differed between Churchill
county and Lyon county. One hypothesis would be that the cancer cluster in Churchill made it
harder for homeowners to sell their homes. In this case, we are primarily interested in recovering
β₁, the coefficient on x1.
The following lines of code load the data and prepare it for the estimation:
load optimization.mat           % load data                  (3.1)
N=size(parcel,1);               % # observations             (3.2)
y=duration;                     % dependent variable         (3.3)
X=[church house ones(N,1)];     % independent variables      (3.4)
k=size(X,2);                    % # independent variables    (3.5)
[min(X)' median(X)' max(X)']    % look at scaling of data    (3.6)
Importantly, the data should all be scaled to be about the same magnitude. This will prevent the
objective function from being very flat in some dimensions relative to others. Having variables
that are scaled by vastly different orders of magnitude can cause optimization algorithms to
terminate prematurely. Line (3.6) summarizes the range of the data:
Variable      min      median    max
Churchill     0.000    0.000     1.000
acres         0.000    0.023     5.300
sqft          0.204    1.432     4.554
age           0.000    0.090     1.000
age²          0.000    0.008     1.000
acres²        0.000    0.000     2.809
constant      1.000    1.000     1.000
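One simple way to impose this kind of scaling, shown here as an illustration rather than as part of the original code, is to divide each column of X by its maximum absolute value:

```matlab
% Rescale each regressor to lie in [-1,1] (illustrative sketch; remember
% to convert the estimated coefficients back to the original scale)
s  = max(abs(X));             % 1 x k vector of column scales
Xs = X ./ repmat(s, N, 1);    % rescaled regressors (2008-era syntax)
% beta on the original scale = beta_scaled ./ s'
```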
The Nelder-Mead algorithm can be used to estimate the nonlinear least squares model in (2.2)
based on these data. First, we need to define a set of starting values for the parameters and the
stopping criteria for the algorithm. This is done in the following three lines of code:
>> start=ones(k,1);                                                                              (3.7)
>> opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);    (3.8)
>> beta=fminsearch('obj',start,opt,y,X)                                                          (3.9)
Line (3.7) defines a vector of ones as a starting guess for β. Line (3.8) defines the stopping
criteria for the algorithm, as well as other options, as follows:

'Display','iter'     display output from each iteration
'TolFun',1e-8        stop when the change in the objective function is less than 1e-8
'TolX',1e-8          stop when the change in the parameter vector is less than 1e-8
'MaxIter',1e+6       maximum number of iterations
'MaxFunEval',1e+6    maximum number of function evaluations
Whether or not to display the output from each iteration is a matter of personal preference.
However, it is important to make sure the other four criteria are set appropriately. Otherwise,
Matlab's default values for the stopping criteria may cause the function to terminate before it
reaches the true solution. Finally, line (3.9) calls the Nelder-Mead algorithm (aka fminsearch) to
minimize the objective function contained in the file obj.m, using the starting values defined
by start, the options defined by opt, and the data contained in the vector y and the matrix X.
The syntax and order in which you enter the arguments in (3.9) matters! You always need to
enter the name of your objective function first, in single quotes, followed by the starting values,
the options, and finally by the other data you want to pass to your objective function.
Importantly, the order in which these data are entered must also match the order in which they
are defined in your objective function. Notice that the objective function in (2.3) defines the
vector of parameters first, followed by the dependent variable, and finally by the matrix of
covariates. This matches the order in (3.9). For convenience, here are both lines again:
>> function Q=obj(beta,y,X)
>> beta=fminsearch('obj',start,opt,y,X)
So far, we have defined two separate m-files: the general NLS exponential objective function
on lines (2.3)-(2.6), and the script file on lines (3.1)-(3.9), which defines starting values and
other options for the Nelder-Mead algorithm and then uses the fminsearch command to call the
algorithm to minimize the objective function using some predefined data and the specified
options. After calling the minimization procedure in (3.9), Matlab displays the following output:
Iteration   Func-count     min f(x)       Procedure
    0            1         3.40011e+007
    1            8         3.40011e+007   initial simplex
    2           10         1.97289e+007   expand
    3           11         1.97289e+007   reflect
    4           12         1.97289e+007   reflect
    .            .              .              .
 1988         3017         709769         reflect
 1989         3018         709769         reflect
 1990         3027         709769         shrink
Optimization terminated:
the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-008
and F(X) satisfies the convergence criteria using OPTIONS.TolFun of 1.000000e-008
beta =
-0.0742
-0.2321
0.1298
2.9995
-3.9431
0.3996
6.8009
For each iteration the results tell you the cumulative number of function evaluations, the current
value of the objective function at the best point on the simplex, and the procedure used to adjust
the simplex. The subsequent termination message tells you that the algorithm converged
according to the criteria defined in (3.8). Finally, the parameter estimates defined by beta
minimize (2.2). As we hypothesized, β̂₁ < 0, perhaps suggesting that, all else constant, houses
in Churchill take longer to sell. The following table explores the sensitivity of this result to the
tolerances on the objective function and the parameter vector.
                 (1)       (2)       (3)       (4)
Churchill      -0.0696   -0.0742   -0.0742   -0.0742
acres          -0.2882   -0.2321   -0.2321   -0.2321
sqft            0.1386    0.1298    0.1298    0.1298
age             3.9862    2.9995    2.9995    2.9995
age²           -5.8904   -3.9431   -3.9431   -3.9431
acres²          0.4971    0.3996    0.3996    0.3996
constant        6.7186    6.8009    6.8009    6.8009
Tolerance         …         …         …         …
# iterations      …         …         …         …
obj fn value      …         …         …         …
converged         …         …         …         …
Moving from left to right in the table, the tolerances become increasingly precise. In column (1)
we are satisfied with precision in our estimates out to 4 decimal places, while in column (4) we
require precision out to 8 decimal places. As we increase the required precision, the number of
iterations required for convergence increases. Importantly, the parameter estimates change as
well. With the low precision in column (1), the algorithm converges before it reaches the
optimum. This may be because the objective function is relatively flat near the solution, so that
further changes in the parameters lead to very small changes in the function itself. This has the
largest impact on the coefficients for age and age². Increasing the precision in columns (2)-(4)
leads to a superior solution. We know it is superior because it has a lower value for the
objective function.
With data that are scaled around unity, it is typical to use tolerances on the parameters
somewhere between 1e-6 and 1e-10. Why not increase the precision to an extremely small
number like 1e-100, just to be safe? There are two reasons. First, Matlab starts to run into
rounding error problems with numbers smaller than 1e-16. Second, decreasing the tolerance can
dramatically increase the time required for convergence. That said, it certainly does not hurt to
try decreasing the tolerance to a very small number like 1e-12 as a robustness check on your
results. However, it will take longer to converge.
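A robustness check like the one in the table can be automated by looping over tolerance values. Here is a sketch reusing obj.m, start, y, and X from section 3:

```matlab
% Re-estimate the model under progressively tighter tolerances (sketch)
tols = [1e-4 1e-6 1e-8 1e-10];
for j = 1:length(tols)
    opt = optimset('TolFun',tols(j),'TolX',tols(j),...
                   'MaxIter',1e+6,'MaxFunEval',1e+6);
    b(:,j) = fminsearch('obj',start,opt,y,X);   % estimates at tolerance j
    q(j)   = obj(b(:,j),y,X);                   % objective value at solution
end
[b; q]                                          % compare across columns
```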
Let ∇Q(β) = ∂Q(β)/∂β and H(β) = ∂²Q(β)/∂β∂β′. The quasi-Newton algorithm iterates on

βₜ₊₁ = βₜ − H̃⁻¹∇̃Q , where                                (4.1)
H̃  = positive definite approximation to H                 (4.2)
∇̃Q = finite-difference approximation to ∇Q                (4.3)
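The command that produced the output below was lost from this copy of the notes; based on the calls in sections 3 and 5, it was presumably a call to fminunc of roughly this form (reconstructed, not verbatim):

```matlab
% Presumed quasi-Newton call (reconstruction; the original line is missing)
start = ones(k,1);
opt = optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,...
               'MaxIter',1e+6,'MaxFunEval',1e+6);
beta = fminunc('obj',start,opt,y,X)   % finite-difference gradient and Hessian
```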
beta =
-0.0742
-0.2321
0.1298
2.9995
-3.9430
0.3996
6.8009
It converges to the same results as the Nelder-Mead algorithm. The difference is that it
converged about 100 times faster, taking only 94 iterations to reach the same solution.
In many cases, you will want to use the Hessian or the gradient, evaluated at the solution, in
order to calculate standard errors or to perform other inference. You can obtain this information
by altering the command line used to call the algorithm to read:
>> [beta,fval,flag,output,grad,H]=fminunc('obj',start,opt,y,X)    (4.4)

In addition to the parameter estimates, this call returns:

fval      Q(β̂), the value of the objective function at the solution
flag      exit flag; positive (negative) values indicate convergence (problems)
output    information about the optimization algorithm used
grad      ∇̃Q(β̂), the finite-difference gradient evaluated at the solution
H         H̃(β̂), the finite-difference Hessian evaluated at the solution
Notice that the gradient and Hessian are both finite-difference approximations. Alternatively, we
can calculate exact expressions if we are willing to code them ourselves.
∂Q(β)/∂β = −(2/N) Σᵢ [yᵢ − exp(Xᵢβ)] exp(Xᵢβ) Xᵢ′        (5.1)
We could create a new objective function that returns the gradient as a second output. Here is
the code for an m-file called obj_grad.m:
%---------------------------------------------------%
% OBJ_GRAD is the objective function to an NLS      %
%   problem, plus the gradient                      %
%                                                   %
% function [Q dQ]=obj_grad(beta,y,X)                %
%                                                   %
% INPUTS                                            %
%   beta: kx1 vector of parameters                  %
%   y:    Nx1 vector of dependent variables         %
%   X:    Nxk matrix of independent variables       %
%                                                   %
% OUTPUTS                                           %
%   Q:    NLS objective function to minimize        %
%   dQ:   gradient                                  %
%---------------------------------------------------%
function [Q dQ]=obj_grad(beta,y,X)                    % (5.2)
N=size(y,1);                                          % (5.3)
f=y-exp(X*beta);                                      % (5.4)
Q=(1/N)*sum(f.^2);                                    % (5.5)
if nargout==2,                                        % (5.6)
   dQ=(-2/N)*(X'*(f.*exp(X*beta)));                   % (5.7)
end                                                   % (5.8)
%---------------------------------------------------%
Line (5.2) specifies that there are two outputs, Q(β) and ∇Q(β). Lines (5.3)-(5.5) code the
objective function, just as in the obj.m file. Line (5.6) tells Matlab to evaluate the following
line only if a second output is requested. (This saves computational time, since it is not
necessary to calculate ∇Q(β) on every step of the algorithm, during the line search procedure,
for example.) Finally, lines (5.7)-(5.8) evaluate the gradient in (5.1).
start=ones(k,1);                                                                                           (5.9)
opt=optimset('GradObj','on','Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);  (5.10)
beta=fminunc('obj_grad',start,opt,y,X)                                                                     (5.11)
Setting 'GradObj','on' tells Matlab that you have coded the gradient as a second output in
your objective function. Here are the estimation results:
                                          First-order
Iteration   f(x)           Norm of step   optimality    CG-iterations
    0       3.40011e+007                  2.97e+008
    1       1.3967e+007    0.755648       1.09e+008     1
    2       1.3967e+007    10             1.09e+008     1
    3       1.38358e+007   2.5            4.77e+007     0
    4       6.30737e+006   0.227829       1.7e+007      2
    .            .              .              .        .
   48       709769         0.00423822     27.3          3
   49       709769         0.00151899     75            3
   50       709769         0.000774996    19.2          3
The algorithm converged faster than the quasi-Newton method with a finite-difference gradient
illustrated in the previous section. This reflects the increased precision in the gradient, combined
with fewer evaluations of the objective function. While the parameter estimates are nearly
identical to those estimated in the previous two sections, they are not exactly the same: a couple
of the parameters differ at the fourth decimal place. Again, this reflects the increased precision
in the gradient. In general, using an analytical gradient, instead of an approximation, may have a
small impact on the resulting parameter estimates. This difference may or may not be
economically important, depending on the problem. Here, for convenience, are the differences:
              Numerical   Analytical
Churchill      -0.0742     -0.0742
acres          -0.2321     -0.2321
sqft            0.1298      0.1299
age             2.9995      2.9991
age²           -3.9431     -3.9423
acres²          0.3996      0.3995
constant        6.8009      6.8010
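A quick way to verify hand-coded derivatives like (5.7) is to compare them against central finite differences at an arbitrary point. This sketch is not part of the original notes:

```matlab
% Check the analytical gradient in obj_grad.m by finite differences
b = ones(k,1);                         % arbitrary evaluation point
[Q,dQ] = obj_grad(b,y,X);              % analytical gradient
h = 1e-6; dQfd = zeros(k,1);
for j = 1:k
    e = zeros(k,1); e(j) = h;
    dQfd(j) = (obj_grad(b+e,y,X) - obj_grad(b-e,y,X))/(2*h);
end
max(abs(dQfd - dQ))                    % should be close to zero
```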
Scale your data. Scaling each variable to be around 1 often helps the algorithm to
converge faster and increases the robustness of results to alternative choices for the
convergence tolerances.

Try multiple starting values. Particularly if you are having trouble getting an estimator
to converge, try some different starting values. The most common choices are vectors of
ones or zeros, or random numbers. It is also quite easy to write a quasi-grid search that
loops over a large set of starting values and saves the parameter estimates and objective
function value associated with each set of starting values. Remember, Rᵏ is a large
place.

Try different algorithms. Estimating the model using both the Nelder-Mead algorithm
and a Newton-based method can provide a robustness check on your results. Nelder-Mead
tends to be more robust to nearly discontinuous functions or poorly scaled
problems. If one does not work, perhaps the other will.

Constrained optimization. If you can use economic reasoning to place bounds on some
of your parameters, you may want to use Matlab's constrained optimization algorithms,
such as fmincon, which essentially implements fminunc in a bounded space.
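The quasi-grid search over starting values mentioned above can be sketched in a few lines, reusing obj.m, opt, y, and X from section 3:

```matlab
% Loop over random starting values and keep the best solution (sketch)
best = Inf;
for s = 1:50
    start = randn(k,1);                        % random starting guess
    b = fminsearch('obj',start,opt,y,X);
    q = obj(b,y,X);
    if q < best, best = q; bhat = b; end       % keep the lowest objective
end
bhat                                           % best estimate found
```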