Applied Econometrics Using Matlab
James P. LeSage
Department of Economics
University of Toledo
October, 1999
Preface
Acknowledgements
Contents
1 Introduction 1
Chapter 2 Appendix 42
3 Utility Functions 45
3.1 Calendar function utilities . . . . . . . . . . . . . . . . . . . . 45
3.2 Printing and plotting matrices . . . . . . . . . . . . . . . . . 49
3.3 Data transformation utilities . . . . . . . . . . . . . . . . . . 65
3.4 Gauss functions . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.5 Wrapper functions . . . . . . . . . . . . . . . . . . . . . . . . 73
3.6 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . 76
Chapter 3 Appendix 77
4 Regression Diagnostics 80
4.1 Collinearity diagnostics and procedures . . . . . . . . . . . . 80
4.2 Outlier diagnostics and procedures . . . . . . . . . . . . . . . 94
4.3 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . 100
References 313
List of Examples
List of Figures
List of Tables
Chapter 1
Introduction
Chapter 2
Regression using MATLAB
result = ols(y,x);
The structure variable ‘result’ returned by our ols function might have
fields named ‘rsqr’, ‘tstat’, ‘beta’, etc. These fields might contain the R-
squared statistic, t−statistics for the β̂ estimates and the least-squares es-
timates β̂. One virtue of using the structure to return regression results is
that the user can access individual fields of interest as follows:
bhat = result.beta;
disp('The R-squared is:');
result.rsqr
disp('The 2nd t-statistic is:');
result.tstat(2,1)
There is nothing sacred about the name ‘result’ used for the returned
structure in the above example; we could have used:
bill_clinton = ols(y,x);
result2 = ols(y,x);
restricted = ols(y,x);
unrestricted = ols(y,x);
That is, the name of the structure to which the ols function returns its
information is assigned by the user when calling the function.
To examine the nature of the structure in the variable ‘result’, we can
simply type the structure name without a semi-colon and MATLAB will
present information about the structure variable as follows:
result =
meth: 'ols'
y: [100x1 double]
nobs: 100.00
nvar: 3.00
beta: [ 3x1 double]
yhat: [100x1 double]
resid: [100x1 double]
sige: 1.01
tstat: [ 3x1 double]
rsqr: 0.74
rbar: 0.73
dw: 1.89
Each field of the structure is indicated, and for scalar components the
value of the field is displayed. In the example above, ‘nobs’, ‘nvar’, ‘sige’,
‘rsqr’, ‘rbar’, and ‘dw’ are scalar fields, so their values are displayed. Matrix
or vector fields are not displayed, but the size and type of the matrix or
vector field is indicated. Scalar string arguments are displayed as illustrated
by the ‘meth’ field which contains the string ‘ols’ indicating the regression
method that was used to produce the structure. The contents of vector or
matrix strings would not be displayed, just their size and type. Matrix and
vector fields of the structure can be displayed or accessed using the MATLAB
conventions of typing the matrix or vector name without a semi-colon. For
example,
result.resid
result.y
would display the residual vector and the dependent variable vector y in the
MATLAB command window.
Another virtue of using ‘structures’ to return results from our regression
functions is that we can pass these structures to another related function
that would print or plot the regression results. These related functions can
query the structure they receive and intelligently decipher the ‘meth’ field
to determine what type of regression results are being printed or plotted.
For example, we could have a function prt that prints regression results and
another plt that plots actual versus fitted and/or residuals. Both of these
functions take a regression structure as an input argument. Example 2.1 provides
a concrete illustration of these ideas.
The example assumes the existence of functions ols, prt, plt and data
matrices y, x in files ‘y.data’ and ‘x.data’. Given these, we carry out a regres-
sion, print results and plot the actual versus predicted as well as residuals
with the MATLAB code shown in example 2.1. We will discuss the prt and
plt functions in Section 2.4.
% ----- Example 2.1 Demonstrate regression using the ols() function
load y.data;
load x.data;
result = ols(y,x);
prt(result);
plt(result);
The first line of the file ‘ols.m’ contains the function declaration:

function results=ols(y,x)

The keyword ‘function’ instructs MATLAB that the code in the file ‘ols.m’
represents a callable MATLAB function.
The help portion of the MATLAB ‘ols’ function is presented below and
follows immediately after the first line as shown. All lines containing the
MATLAB comment symbol ‘%’ will be displayed in the MATLAB command
window when the user types ‘help ols’.
function results=ols(y,x)
% PURPOSE: least-squares regression
%---------------------------------------------------
% USAGE: results = ols(y,x)
% where: y = dependent variable vector (nobs x 1)
% x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: a structure
% results.meth = 'ols'
% results.beta = bhat
% results.tstat = t-stats
% results.yhat = yhat
% results.resid = residuals
% results.sige = e'*e/(n-k)
% results.rsqr = rsquared
% results.rbar = rbar-squared
% results.dw = Durbin-Watson Statistic
% results.nobs = nobs
% results.nvar = nvars
% results.y = y data vector
% --------------------------------------------------
% SEE ALSO: prt(results), plt(results)
%---------------------------------------------------
results.nobs = nobs
results.nvar = nvars
results.y = y data vector
--------------------------------------------------
SEE ALSO: rtrace, prt, plt
---------------------------------------------------
REFERENCES: David Birkes, Yadolah Dodge, 1993, Alternative Methods of
Regression; and Hoerl, Kennard, Baldwin, 1975, ‘Ridge Regression: Some
Simulations’, Communications in Statistics
Now, turning attention to the actual MATLAB code for estimating the
ordinary least-squares model, we begin processing the input arguments to
carry out least-squares estimation based on a model involving y and x. We
first check for the correct number of input arguments using the MATLAB
‘nargin’ variable.
if (nargin ~= 2); error('Wrong # of arguments to ols');
else
[nobs nvar] = size(x); [nobs2 junk] = size(y);
if (nobs ~= nobs2); error('x and y must have same # obs in ols');
end;
end;
If we don’t have two input arguments, the user has made an error which
we indicate using the MATLAB error function. The ols function will return
without processing any of the input arguments in this case. Another error
check involves the number of rows in the y vector and x matrix which should
be equal. We use the MATLAB size function to implement this check in
the code above.
Assuming that the user provided two input arguments, and the number
of rows in x and y are the same, we can pass on to using the input information
to carry out a regression.
The ‘nobs’ and ‘nvar’ returned by the MATLAB size function are pieces
of information that we promised to return in our results structure, so we
construct these fields using a ‘.nobs’ and ‘.nvar’ appended to the ‘results’
variable specified in the function declaration. We also fill in the ‘meth’ field
and the ‘y’ vector fields.
results.meth = 'ols';
results.y = y;
results.nobs = nobs;
results.nvar = nvar;
The decision to return the actual y data vector was made to facilitate
the plt function that will plot the actual versus predicted values from the
regression along with the residuals. Having the y data vector in the structure
makes it easy to call the plt function with only the structure returned by a
regression function.
We can proceed to estimate least-squares coefficients β̂ = (X′X)⁻¹X′y,
but we have to choose a solution method for the least-squares problem.
The two most commonly used approaches are based on the Cholesky and
qr matrix decompositions. The regression library ols function uses the qr
matrix decomposition method for reasons that will be made clear in the next
section. A first point to note is that we require more than a simple solution
for β̂, because we need to calculate t-statistics for the β̂ estimates. This
requires that we compute (X′X)⁻¹ which is done using the MATLAB ‘slash’
operator to invert the (X′X) matrix. We represent (X′X) using (r′r), where
r is an upper triangular matrix returned by the qr decomposition.
[q r] = qr(x,0);
xpxi = (r'*r)\eye(nvar);
results.beta = r\(q'*y);

An alternative solution, based directly on the normal equations (the Cholesky
approach discussed in the next section), would be:

xpxi = (x'*x)\eye(nvar);
results.beta = xpxi*(x'*y);

Given either of these solutions, we compute the remaining elements of the
results structure as follows:
results.yhat = x*results.beta;
results.resid = y - results.yhat;
sigu = results.resid'*results.resid;
results.sige = sigu/(nobs-nvar);
tmp = (results.sige)*(diag(xpxi));
results.tstat = results.beta./(sqrt(tmp));
ym = y - mean(y);
rsqr1 = sigu; rsqr2 = ym'*ym;
results.rsqr = 1.0 - rsqr1/rsqr2; % r-squared
rsqr1 = rsqr1/(nobs-nvar);
rsqr2 = rsqr2/(nobs-1.0);
results.rbar = 1 - (rsqr1/rsqr2); % rbar-squared
ediff = results.resid(2:nobs) - results.resid(1:nobs-1);
results.dw = (ediff'*ediff)/sigu; % durbin-watson
If small changes in the data elements lead to large changes in the
solution to an estimation problem, we say that the problem is ill-conditioned.
The implication is that only the first digit is numerically accurate, meaning
that an estimate of β̂ = .83 only informs us that the parameter is between
.80 and .90!
These statements are based on some theoretical bounds calculated in
Belsley, Kuh, and Welsch (1980), leading us to conclude that the results
could be affected in the ways stated. The theoretical bounds are upper
bounds on the potential problems, reflecting the worst that could happen.
To examine the numerical accuracy of the Cholesky and qr approaches to
solving the least-squares problem, we rely on a “benchmark” data set for
which we know the true parameter values. We can then test the two regres-
sion algorithms to see how accurately they compute estimates of the true
parameter values. Such a benchmark is shown in (2.1).
X = | 1   1+γ   1+γ   ...   1+γ |        y = | (n−1) + (n−2)γ + ε |
    | 1   γ+ε   γ     ...   γ   |            | (n−2)γ + ε         |
    | 1   γ     γ+ε   ...   γ   |            | (n−2)γ + ε         |
    | :   :     :           :   |            | :                  |
    | 1   γ     γ     ...   γ+ε |            | (n−2)γ + ε         |
    | 1   γ     γ     ...   γ   |            | (n−1) + (n−2)γ − ε |        (2.1)
This is a modification of a benchmark originally proposed by Wampler
(1980) set forth in Simon and LeSage (1988a, 1988b). The n by (n − 1)
matrix X and the n x 1 vector y in (2.1) represent the Wampler benchmark
with the parameter γ added to the last (n − 2) columns of the X matrix,
and with (n − 2)γ added to the y vector. When γ = 0 this benchmark is
equivalent to the original Wampler benchmark. The modified benchmark
shares the Wampler property that its solution is a column of ones for all
values of ε > 0, and for all values of γ, so the coefficient estimates are unity
irrespective of ill-conditioning. This property makes it easy to judge how
accurate the least-squares computational solutions for the estimates are. We
simply need to compare the estimates to the true value of one.

The parameters γ and ε in (2.1) control the severity of two types of
near-linear relationships in the data matrix X. The parameter ε controls
the amount of collinearity between the last (n − 2) columns of the data matrix
X. As the parameter ε is decreased towards zero, the last (n − 2) columns
move closer to becoming perfect linear combinations with each other. The
implication is that the parameter ε acts to control the amount of collinearity,
or the severity of the near linear combinations among the last (n − 2) columns
of the data matrix. As we make the value of ε smaller we produce an
increasingly ill-conditioned least-squares problem.
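To make this concrete, the following sketch generates the benchmark in (2.1)
for one choice of n, γ and ε and compares the qr solution to a solution based
on the normal equations. The particular values of n, γ and ε, and the
comparison printed at the end, are our own illustration rather than code
from the regression library.

% ----- sketch: build the benchmark in (2.1) and compare solution methods
n = 10; g = 100; e = 1e-7;                 % arbitrary illustrative values
X = [ones(n,1) g*ones(n,n-2)];             % n by (n-1) matrix, last n-2 columns = gamma
X(1,2:n-1) = (1+g)*ones(1,n-2);            % first row contains 1+gamma
for i=2:n-1; X(i,i) = g + e; end;          % gamma+epsilon down the "diagonal"
y = ((n-2)*g + e)*ones(n,1);               % middle elements of y
y(1,1) = (n-1) + (n-2)*g + e;              % first element of y
y(n,1) = (n-1) + (n-2)*g - e;              % last element of y
[q r] = qr(X,0); bqr = r\(q'*y);           % qr solution
bchol = (X'*X)\(X'*y);                     % normal-equations solution
[bqr bchol]                                % both should equal a column of ones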
%---------------------------------------------------
% RETURNS: nothing, just plots regression results
% --------------------------------------------------
% NOTE: user must supply pause commands, none are in plt_reg function
% e.g. plt_reg(results);
% pause;
% plt_reg(results2);
% --------------------------------------------------
% SEE ALSO: prt_reg(results), prt, plt
%---------------------------------------------------
switch results.meth
case {'arma','boxcox','boxcox2','logit','ols','olsc','probit','ridge', ...
'theil','tobit','hwhite','tsls','nwest'}
subplot(211), plot(tt,results.y,'-',tt,results.yhat,'--');
title([upper(results.meth), ' Actual vs. Predicted']);
subplot(212), plot(tt,results.resid); title('Residuals');
case {'robust','olst','lad'}
subplot(311), plot(tt,results.y,'-',tt,results.yhat,'--');
title([upper(results.meth), ' Actual vs. Predicted']);
subplot(312), plot(tt,results.resid); title('Residuals');
subplot(313), plot(tt,results.weight); title('Estimated weights');
otherwise
error('method not recognized by plt_reg');
end;
subplot(111);
The ‘switch’ statement examines the ‘meth’ field of the results structure
passed to the plt_reg function as an argument and executes the plotting
commands if the ‘meth’ field is one of the regression methods implemented
in our function library. In the event that the user passed a result structure
from a function other than one of our regression functions, the ‘otherwise’
statement is executed which prints an error message.
The switch statement also helps us to distinguish special cases of robust,
olst, lad regressions where the estimated weights are plotted along with the
actual versus predicted and residuals. These weights allow the user to detect
the presence of outliers in the regression relationship. A similar approach
could be used to extend the plt_reg function to accommodate other special
regression functions where additional or specialized plots are desired.
A decision was made not to place the ‘pause’ command in the plt_reg
function, but rather let the user place this statement in the calling program
or function. An implication of this is that the user controls viewing regression
plots in ‘for loops’ or in the case of multiple invocations of the plt_reg
function. For example, only the second ‘plot’ will be shown in the following
code.
result1 = ols(y,x1);
plt_reg(result1);
result2 = ols(y,x2);
plt_reg(result2);
If the user wishes to see the regression plots associated with the first
regression, the code would need to be modified as follows:
result1 = ols(y,x1);
plt_reg(result1);
pause;
result2 = ols(y,x2);
plt_reg(result2);
The ‘pause’ statement would force a plot of the results from the first
regression and wait for the user to strike any key before proceeding with the
second regression and accompanying plot of these results.
Our plt_reg function would work with new regression functions that we
add to the library, provided that the regression returns a structure containing
the fields ‘.y’, ‘.yhat’, ‘.resid’, ‘.nobs’ and ‘.meth’. We simply need to add the
new method to the switch-case statement.
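For instance, if a new function returned these fields with a ‘meth’ field of
(say) ‘mymeth’ (a hypothetical name used for illustration), the only change
needed would be to include it in the first case of the switch statement:

case {'arma','boxcox','boxcox2','logit','ols','olsc','probit','ridge', ...
'theil','tobit','hwhite','tsls','nwest','mymeth'}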
A more detailed example of using the results structure is the prt_reg
function from the regression library. This function provides a printout of
regression results similar to those provided by many statistical packages.
The function relies on the ‘meth’ field to determine what type of regression
results are being printed, and uses the ‘switch-case’ statement to implement
specialized methods for different types of regressions.

A small fragment of the prt_reg function showing the specialized printing
for the ols and ridge regression methods is presented below:
function prt_reg(results,vnames,fid)
% PURPOSE: Prints output using regression results structures
%---------------------------------------------------
% USAGE: prt_reg(results,vnames,fid)
% Where: results = a structure returned by a regression
% vnames = an optional vector of variable names
% fid = optional file-id for printing results to a file
% (defaults to the MATLAB command window)
%---------------------------------------------------
% NOTES: e.g. vnames = strvcat('y','const','x1','x2');
fprintf(fid,'*******************************************************\n');
% <=================== end of ols,white, newey-west case
case {'ridge'} % <=================== ridge regressions
fprintf(fid,'Ridge Regression Estimates \n');
if (nflag == 1)
fprintf(fid,'Dependent Variable = %16s \n',vnames(1,:));
end;
fprintf(fid,'R-squared      = %9.4f \n',results.rsqr);
fprintf(fid,'Rbar-squared   = %9.4f \n',results.rbar);
fprintf(fid,'sigma^2        = %9.4f \n',results.sige);
fprintf(fid,'Durbin-Watson  = %9.4f \n',results.dw);
fprintf(fid,'Ridge theta    = %16.8g \n',results.theta);
fprintf(fid,'Nobs, Nvars    = %6d,%6d \n',results.nobs,results.nvar);
fprintf(fid,'*******************************************************\n');
% <=================== end of ridge regression case
otherwise
error('method unknown to the prt_reg function');
end;
tout = tdis_prb(results.tstat,nobs-nvar); % find t-stat probabilities
tmp = [results.beta results.tstat tout]; % matrix to be printed
% column labels for printing results
bstring = 'Coefficient'; tstring = 't-statistic'; pstring = 't-probability';
cnames = strvcat(bstring,tstring,pstring);
in.cnames = cnames; in.rnames = Vname; in.fmt = '%16.6f'; in.fid = fid;
mprint(tmp,in); % print estimates, t-statistics and probabilities
The first use of prt_reg produces a printout of results to the MATLAB
command window that uses ‘generic’ variable names:
Ordinary Least-squares Estimates
R-squared = 0.8525
Rbar-squared = 0.8494
sigma^2 = 0.6466
Durbin-Watson = 1.8791
Nobs, Nvars = 100, 3
***************************************************************
Variable Coefficient t-statistic t-probability
variable 1 1.208077 16.142388 0.000000
variable 2 0.979668 11.313323 0.000000
variable 3 1.041908 13.176289 0.000000
The second use of prt_reg uses the user-supplied variable names. The
MATLAB function strvcat carries out a vertical concatenation of strings
and pads the shorter strings in the ‘vnames’ vector to have a fixed width
based on the longer strings. A fixed width string containing the variable
names is required by the prt_reg function. Note that we could have used:
vnames = ['y variable';
          'constant  ';
          'population';
          'income    '];
but this takes up more space and is slightly less convenient as we have
to provide the padding of strings ourselves. Using the ‘vnames’ input in
the prt_reg function would result in the following printed to the MATLAB
command window.
Ordinary Least-squares Estimates
Dependent Variable = y variable
R-squared = 0.8525
Rbar-squared = 0.8494
sigma^2 = 0.6466
Durbin-Watson = 1.8791
Nobs, Nvars = 100, 3
***************************************************************
Variable Coefficient t-statistic t-probability
constant 1.208077 16.142388 0.000000
population 0.979668 11.313323 0.000000
income 1.041908 13.176289 0.000000
The third case specifies an output file opened with the command:
fid = fopen('ols.out','w');
The file ‘ols.out’ would contain output identical to that from the second use
of prt_reg. It is the user’s responsibility to close the file that was opened
using the MATLAB command:
fclose(fid);
Variable names are constructed if the user did not supply a vector of
variable names and placed in a MATLAB fixed-width string-array named
‘Vname’, with the first name in the array being the row-label heading ‘Vari-
able’ which is used by the function mprint. For the case where the user
supplied variable names, we simply transfer these to a MATLAB ‘string-
array’ named ‘Vname’, again with the first element ‘Variable’ that will be
used by mprint. We do error checking on the number of variable names
supplied which should equal the number of explanatory variables plus the
dependent variable (nvar+1). In the event that the user supplies the wrong
number of variable names, we issue a warning and print output results using
the generic variable names.
Vname = 'Variable';
for i=1:nvar;
tmp = ['variable ',num2str(i)]; Vname = strvcat(Vname,tmp);
end;
if (nflag == 1) % the user supplied variable names
[tst_n nsize] = size(vnames);
if tst_n ~= nvar+1
warning('Wrong # of variable names in prt_reg -- check vnames argument');
fprintf(fid,'will use generic variable names \n');
nflag = 0;
else
Vname = 'Variable';
for i=1:nvar; Vname = strvcat(Vname,vnames(i+1,:)); end;
end; % end of nflag issue
input argument ‘fid’ makes it easy to handle both the case where the user
wishes output printed to the MATLAB command window or to an output
file. The ‘fid’ argument takes on a value of ‘1’ to print to the command
window, or a file identifier returned by fopen for output printed to a file.
Finally, after printing the specialized output, the coefficient estimates, t-
statistics and marginal probabilities that are in common to all regressions are
printed. The marginal probabilities are calculated using a function tdis_prb
from the distributions library discussed in Chapter 9. This function returns
the marginal probabilities given a vector of t−distributed random variates
along with a degrees of freedom parameter. The code to print coefficient
estimates, t-statistics and marginal probabilities is common to all regression
printing procedures, so it makes sense to move it to the end of the ‘switch-
case’ code and execute it once as shown below. We rely on the function
mprint discussed in Chapter 3 to do the actual printing of the matrix of
regression results with row and column labels specified as fields of a structure
variable ‘in’. Use of structure variables with fields as input arguments to
functions is a convenient way to pass a large number of optional arguments
to MATLAB functions, a subject taken up in Chapter 3.
tout = tdis_prb(results.tstat,nobs-nvar); % find t-stat probabilities
tmp = [results.beta results.tstat tout]; % matrix to be printed
% column labels for printing results
bstring = 'Coefficient'; tstring = 't-statistic'; pstring = 't-probability';
cnames = strvcat(bstring,tstring,pstring);
in.cnames = cnames; in.rnames = Vname; in.fmt = '%16.6f'; in.fid = fid;
mprint(tmp,in); % print estimates, t-statistics and probabilities
The case of a ridge regression illustrates the need for customized code
to print results for different types of regressions. This regression produces
a ridge parameter estimate based on a suggested formula from Hoerl and
Kennard (1970), or allows for a user-supplied value. In either case, the
regression output should display this important parameter that was used to
produce the coefficient estimates.
case {'ridge'} % <=================== ridge regressions
fprintf(fid,'Ridge Regression Estimates \n');
if (nflag == 1)
fprintf(fid,'Dependent Variable = %16s \n',vnames(1,:));
end;
fprintf(fid,'R-squared      = %9.4f \n',results.rsqr);
fprintf(fid,'Rbar-squared   = %9.4f \n',results.rbar);
fprintf(fid,'sigma^2        = %9.4f \n',results.sige);
fprintf(fid,'Durbin-Watson  = %9.4f \n',results.dw);
fprintf(fid,'Ridge theta    = %16.8g \n',results.theta);
t-statistics and probabilities that should be in the ‘.beta, .tstat’ fields of the
structure returned by the theil function.
As an example of adding a code segment to handle a new regression
method, consider how we would alter the prt_reg function to add a Box-
Jenkins ARMA method. First, we need to add a ‘case’ based on the Box-
Jenkins ‘meth’ field which is ‘arma’. The specialized code for the ‘arma’
method handles variable names in a way specific to a Box-Jenkins arma
model. It also presents output information regarding the number of AR and
MA parameters in the model, log-likelihood function value and number of
iterations required to find a solution to the nonlinear optimization problem
used to find the estimates, as shown below.
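A sketch of what such a case might look like follows. The structure fields
results.ar, results.ma, results.like and results.iter, as well as the exact
formatting, are illustrative assumptions rather than the library's actual code.

case {'arma'} % <=================== box-jenkins arma models (sketch)
p = results.ar; q = results.ma;           % # of AR and MA parameters (assumed field names)
fprintf(fid,'Box-Jenkins ARMA(%d,%d) Estimates \n',p,q);
fprintf(fid,'log-likelihood  = %16.8g \n',results.like);
fprintf(fid,'# of iterations = %6d \n',results.iter);
fprintf(fid,'Nobs            = %6d \n',results.nobs);
% variable names specific to an arma model: AR terms followed by MA terms
Vname = 'Variable';
for i=1:p; Vname = strvcat(Vname,['AR ',num2str(i)]); end;
for i=1:q; Vname = strvcat(Vname,['MA ',num2str(i)]); end;
fprintf(fid,'*******************************************************\n');
% <=================== end of arma case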
Provided that we placed the regression estimates and t-statistics from the
arma regression routine into structure fields ‘results.beta’ and ‘results.tstat’,
the common code (that already exists) for printing regression results would
work with this new function.
This produced the following profile output for the ols function:
The total time spent to carry out the regression involving 1000 obser-
vations and 15 explanatory variables was 0.1 seconds. Three lines of code
accounted for 100% of the time and these are listed in order as: [54 44 47].
Line #54 accounted for 50% of the total time, whereas the qr decomposi-
tion on line #44 only accounted for 40% of the time. Line #54 computes
the mean of the y-vector used to determine the R-squared statistic. The
third slowest computation involved line #47 where the backsolution for the
coefficients took place, requiring 10% of the total time.
These results shed additional light on the speed differences between the
Cholesky and qr decomposition methods discussed earlier in Section 2.3.
Using either the Cholesky or qr method, we would still be spending the
majority of time computing the mean of the y-vector and backsolving for
the coefficients, and these computations account for 60% of the total time
required by the ols function. Saving a few hundredths of a second using the
Cholesky in place of the qr decomposition would not noticeably improve the
performance of the ols function from the user’s viewpoint.
The profile report on the prt_reg function was as follows:
Here we see that printing the results took 0.47 seconds, almost five times
the 0.10 seconds needed to compute the regression results. It is not surpris-
ing that computation of the marginal probabilities for the t-statistics on line
#354 took 32% of the total time. These computations require use of the in-
complete beta function which in turn draws on the log gamma function, both
of which are computationally intensive routines. Most of the time (45%) was
spent actually printing the output to the MATLAB command window which
is done in the ‘for-loop’ at line #367. (Note that we replaced the call to the
mprint function with the ‘for loop’ and explicit fprintf statements to make
it clear that printing activity actually takes time.)
One conclusion we should draw from these profiling results is that the
design decision to place computation of the marginal probabilities for the t-
statistics in the prt reg function instead of in the ols function makes sense.
Users who wish to carry out Monte Carlo experiments involving a large num-
ber of least-squares regressions and save the coefficient estimates would be
hampered by the slow-down associated with evaluating the marginal prob-
abilities in the ols function.
A second conclusion is that if we are interested in least-squares estimates
for β alone (as in the case of a two-stage least-squares estimation procedure),
we might implement a separate function named olsb. Computing β̂ coefficients
using a specialized function would save time by avoiding computation
of information, such as the R-squared statistic, that is not needed.
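A minimal sketch of such a function, reusing the qr solution from the ols
code shown earlier (the name olsb comes from the text; the rest is our own
illustration), would be:

function beta = olsb(y,x)
% PURPOSE: return only the least-squares coefficient estimates
[nobs nvar] = size(x);
[q r] = qr(x,0);          % qr decomposition of the explanatory variables matrix
beta = r\(q'*y);          % backsolve for the coefficient estimates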
The comments concerning speed differences between Cholesky and qr
solutions to the least-squares problem are amplified by these profiling results.
It would take the same amount of time to print results from either solution
method, and as indicated, the time needed to print the results is five times
that required to compute the results!
We could delve into the time-profile for the tdis_prb function which is
part of the distributions library discussed in Chapter 9. This turns out to
be un-enlightening as the routine spends 100% of its time in the MATLAB
incomplete beta function (betainc) which we cannot enhance.
A final point regarding use of the MATLAB profile command is that
you need to position MATLAB in the directory that contains the source file
for the function you are attempting to profile. For example, to profile the
tdis_prb function which is in the ‘distrib’ directory, we need to move to
this directory before executing the profile tdis_prb and profile report
commands.
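A profiling session along these lines might look like the following sketch,
where the call to tdis_prb with random inputs is simply our own illustration
to give the profiler something to measure.

cd distrib                          % move to the directory containing tdis_prb.m
profile tdis_prb                    % turn on profiling for this function
prob = tdis_prb(randn(1000,1),25);  % exercise the function
profile report                      % display the profiling results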
For specific examples of this canned format you can examine the demon-
stration files in the regression function library.
In this section, we wish to go beyond simple demonstrations of the var-
ious estimation procedures to illustrate how the results structures can be
useful in computing various econometric statistics and performing hypothe-
sis tests based on regression results. This section contains sub-sections that
illustrate various uses of the regression function library and ways to produce
new functions that extend the library. Later chapters illustrate additional
regression functions not discussed here.
% ----- Example 2.4 Using the ols() function for Monte Carlo
nobs = 100; nvar = 5; ntrials = 100;
b = ones(nvar,1); % true betas = 1
We recover the estimates β̂ from the ‘results’ structure each time through
the loop, transpose and place them in the ‘ith’ row of the matrix ‘bsave’.
After the loop completes, we compute the mean and standard deviations of
the estimates and print these out for each of the 5 coefficients. MATLAB
mean and std functions work on the columns of matrices, motivating our
storage scheme for the ‘bsave’ matrix.
To provide a graphical depiction of our results, we use the MATLAB
hist function to produce a histogram of the distribution of estimates. The
hist function works on each column when given a matrix input argument,
producing 5 separate histograms (one for each column) that are color-coded.
We used the LaTeX notation, ‘backslash beta’ in the ‘ylabel’, ‘xlabel’ and
‘legend’ commands to produce a Greek symbol, β, in the y and x labels and
the LaTeX underscore to create the subscripted β_i, i = 1, ..., 5 symbols in
the legend. Figure 2.1 shows the resulting 5-way histogram.
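The remainder of example 2.4 is easy to reconstruct from this description;
a sketch is shown below, where the data generating step (a constant term
plus standard normal regressors and disturbances) is our own assumption.

bsave = zeros(ntrials,nvar);              % storage for the estimates
for i=1:ntrials;
x = [ones(nobs,1) randn(nobs,nvar-1)];    % generate a data set (assumption)
y = x*b + randn(nobs,1);
result = ols(y,x);                        % do ols estimation
bsave(i,:) = result.beta';                % save transposed estimates in the ith row
end;
mean(bsave)                               % means of the coefficient estimates
std(bsave)                                % standard deviations of the estimates
hist(bsave);                              % 5-way histogram, one color per column
ylabel('frequency of \beta outcomes'); xlabel('Estimated \beta values');
legend('\beta_1','\beta_2','\beta_3','\beta_4','\beta_5');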
It is clear from Figure 2.1 that although the estimates are centered on
the true value of unity, the distribution extends down to 0.75 and up to 1.25.
The implication is that particular data samples may produce estimates far
from truth despite the use of an unbiased estimation procedure.
[Figure 2.1: histograms of the outcomes for β1 through β5 across the Monte
Carlo trials; the horizontal axis shows the ‘Estimated β values’ (roughly 0.6
to 1.4) and the vertical axis the ‘frequency of β outcomes’.]
y_t = X_t β + u_t
u_t = ρ u_{t−1} + ε_t                                    (2.3)
Example 2.5 carries out least-squares estimation, prints and plots the results
for the generated data set.
yt = y(101:n,1); xt = xmat(101:n,:);
n = n-100; % reset n to reflect truncation
Vnames = strvcat('y','cterm','x2','x3');
result = ols(yt,xt); prt(result,Vnames); plt(result);
3. modifying the prt_reg function by adding specific code for printing
the output. Your printing code should probably present the results
from the iterations, which can be passed to the prt_reg function in the
results structure.

You might compare your code to that in olsc.m from the regression
function library.
Maximum likelihood estimation of the least-squares model containing
serial correlation requires that we simultaneously minimize the negative of
the log-likelihood function with respect to the parameters ρ, β and σε in the
problem. This can be done using a simplex minimization procedure that
exists as part of the MATLAB toolbox. Other more powerful multidimen-
sional optimization procedures are discussed in Chapter 10 which takes up
the topic of general maximum likelihood estimation of econometric models.
We will use a MATLAB function that minimizes a function of several
variables using a simplex algorithm. The function is named fmins and it
has the following input format for our application:
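A sketch of such a call is shown below; the starting values placed in ‘parm’
(least-squares estimates for β and an arbitrary value for ρ) are our own
choice and need not match those used in the library function.

res = ols(y,x);                           % least-squares starting values for beta
parm = [0.5; res.beta];                   % the starting value for rho is an assumption
pout = fmins('ar1_like',parm,[],[],y,x);  % simplex minimization, returns [rho; beta]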
The string input argument ‘ar1_like’ is the name of a MATLAB function
we must write to evaluate the negative of the log-likelihood function.
L(ρ, β) = Σ_{t=2}^{n} (e_t − ρ e_{t−1})²                                    (2.4)
But, this likelihood ignores the first observation and would only be appro-
priate if we do not view the serial correlation process as having been in
operation during the past. The function we use is not conditional on the
first observation and takes the form (see Green, 1997 page 600):
where σε² has been concentrated out of the likelihood function. We compute
this parameter estimate using e*′e*/(n − k), where e* = y* − X*β̂.
The MATLAB function ar1_like to compute this log-likelihood for various
values of the parameters ρ and β is shown below:
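A sketch of such a function is given below. It evaluates the negative of the
unconditional log-likelihood (up to an additive constant) with σε² concentrated
out, using transformed data y*, X* in which the first observation is scaled by
sqrt(1 − ρ²) and the remaining observations are quasi-differenced; the exact
coding of the library's ar1_like may differ.

function llike = ar1_like(param,y,x)
% PURPOSE: negative of the concentrated log-likelihood for the AR(1) error model (sketch)
[n k] = size(x);
rho = param(1,1);
beta = param(2:k+1,1);
ys = y - rho*[0; y(1:n-1,1)];            % quasi-difference the dependent variable
ys(1,1) = sqrt(1 - rho*rho)*y(1,1);      % rescale the first observation
xs = x - rho*[zeros(1,k); x(1:n-1,:)];   % quasi-difference the explanatory variables
xs(1,:) = sqrt(1 - rho*rho)*x(1,:);
e = ys - xs*beta;                        % e* = y* - X*beta
llike = (n/2)*log(e'*e) - 0.5*log(1 - rho*rho); % negative log-likelihood, constants dropped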
The resulting parameter estimates for ρ, β are returned from the fmins
function and used to compute an estimate of σε². These results are then
printed. For an example of a function in the regression library that imple-
ments this approach, see olsar1.
F = [(e′e_r − e′e_u)/m] / [e′e_u/(n − k)]                                    (2.6)
where m is the number of restrictions (five in our example) and n, k are the
number of observations and number of explanatory variables in the unre-
stricted model respectively.
We use ols for the two regressions and send the ‘results’ structures to a
function waldf that will carry out the joint F-test and return the results.
Assuming the existence of a function waldf to implement the joint F-
test, the MATLAB code to carry out the test would be as shown in example
2.8.
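Example 2.8 can be sketched as follows, where the data files and the choice
of which five variables to exclude from the restricted model are hypothetical.

% ----- a sketch in the spirit of Example 2.8
load y.data; load x.data;              % x = matrix for the unrestricted model
[nobs nvar] = size(x);
xr = x(:,1:nvar-5);                    % drop the last five variables (assumption)
resultr = ols(y,xr);                   % restricted ols regression
resultu = ols(y,x);                    % unrestricted ols regression
[fstat fprb] = waldf(resultr,resultu); % joint F-test of the five restrictions
disp('Wald F-statistic and marginal probability');
[fstat fprb]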
All of the information necessary to carry out the joint F-test resides in
the two results structures, so the function waldf needs only these as input
arguments. Below is the waldf function.
function [fstat, fprb] = waldf(resultr,resultu)
% PURPOSE: computes Wald F-test for two regressions
%---------------------------------------------------
% USAGE: [fstat fprob] = waldf(resultr,resultu)
% Where: resultr = results structure from ols() restricted regression
% resultu = results structure from ols() unrestricted regression
%---------------------------------------------------
% RETURNS: fstat = {(essr - essu)/#restrict}/{essu/(nobs-nvar)}
% fprb = marginal probability for fstat
% NOTE: large fstat => reject as inconsistent with the data
%---------------------------------------------------
if nargin ~= 2 % flag incorrect arguments
error('waldf: Wrong # of input arguments');
elseif isstruct(resultu) == 0
error('waldf requires an ols results structure as input');
elseif isstruct(resultr) == 0
error('waldf requires an ols results structure as input');
end;
% get nobs, nvar from unrestricted and restricted regressions
nu = resultu.nobs; nr = resultr.nobs;
ku = resultu.nvar; kr = resultr.nvar;
if nu ~= nr
error('waldf: the # of obs in the results structures are different');
end;
if (ku - kr) < 0 % flag reversed input arguments
error('waldf: negative dof, check for reversed input arguments');
end;
% recover residual sum of squares from .sige field of the result structure
epeu = resultu.sige*(nu-ku); eper = resultr.sige*(nr-kr);
numr = ku - kr; % find # of restrictions
ddof = nu-ku; % find denominator dof
fstat1 = (eper - epeu)/numr; % numerator
fstat2 = epeu/(nu-ku); % denominator
fstat = fstat1/fstat2; fprb = fdis_prb(fstat,numr,ddof);
The only point to note is the use of the function fdis_prb which returns
the marginal probability for the F-statistic based on numerator and
denominator degrees of freedom parameters. This function is part of the
distribution functions library discussed in Chapter 9.
As another example, consider carrying out an LM specification test for
the same type of model based on restricted and unrestricted explanatory
variables matrices. This test involves a regression of the residuals from the
restricted model on the explanatory variables matrix from the unrestricted
model. The test is then based on the statistic computed using n*R² from
this regression, which is chi-squared (χ²) distributed with degrees of freedom
equal to the number of restrictions.
We can implement a function lm_test that takes as input arguments the
‘results’ structure from the restricted regression and an explanatory variables
matrix for the unrestricted model. The lm_test function will: carry out the
regression of the residuals from the restricted model on the variables in
the unrestricted model, compute the chi-squared statistic, and evaluate the
marginal probability for this statistic. We will return a regression ‘results’
structure from the regression performed in the lm_test function so users
can print out the regression if they so desire. Example 2.9 shows a program
that calls the lm_test function to carry out the test.
Note that one of the arguments returned by the lm_test function is a
regression ‘results’ structure that can be printed using the prt function. These
regression results reflect the regression of the residuals from the restricted
model on the unrestricted data matrix. The user might be interested in
seeing which particular explanatory variables in the unrestricted model matrix
exhibited a significant influence on the residuals from the restricted model.
The results structure is passed back from the lm_test function just as if it
were any scalar or matrix argument being returned from the function.
The function lm_test is shown below.
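A sketch of such a function follows. It assumes the distributions library
supplies a chi-squared marginal probability routine, called chis_prb here, so
the library's actual listing may differ in its details.

function [lmstat, lmprob, reslm] = lm_test(resultr,xu)
% PURPOSE: LM specification test (sketch)
% resultr = results structure from the restricted ols regression
% xu      = explanatory variables matrix from the unrestricted model
nobs = resultr.nobs;
er = resultr.resid;                 % residuals from the restricted model
reslm = ols(er,xu);                 % regress them on the unrestricted X matrix
lmstat = nobs*reslm.rsqr;           % the n*R-squared statistic
m = reslm.nvar - resultr.nvar;      % number of restrictions
lmprob = chis_prb(lmstat,m);        % marginal probability, chi-squared with m dof
% reslm is returned so the user can print the auxiliary regression with prt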
Chapter 2 Appendix
Chapter 3
Utility Functions
This chapter presents utility functions that we will use throughout the remainder
of the text. We can categorize the various functions described here
into functions for:

1. working with calendar dates and time-series dates,

2. printing and plotting matrices,

3. transforming time-series data,

4. providing compatibility with Gauss functions.

The material in this chapter is divided into four sections that correspond
to the types of utility functions enumerated above. A final section discusses
development of “wrapper” functions that call other functions to make printing
and plotting econometric procedure results simpler for the user.
The first function cal stores information regarding the calendar dates
covered by our time-series data in a MATLAB structure variable. We would
use the function at the outset of our analysis to package information regard-
ing the starting year, period and frequency of our data series. This packaged
structure of calendar information can then be passed to other time series
estimation and printing functions.
For example, consider a case where we are dealing with monthly data
that begins in January, 1982, we would store this information using the call:
cal_struct = cal(1982,1,12);
which returns a structure, ‘cal struct’ with the following fields for the be-
ginning year, period and frequency.
cal_struct.beg_yr
cal_struct.beg_per
cal_struct.freq
The field ‘beg yr’ would equal 1982, ‘beg per’ would equal 1 for January,
and ‘freq’ would equal 12 to designate monthly data. Beyond setting up
calendar information regarding the starting dates for our time-series data,
we might want to know the calendar date associated with a particular ob-
servation. The cal function can provide this information as well, returning
it as part of the structure. The documentation for the function is:
PURPOSE: create a time-series calendar structure variable that
associates a date with an observation #
-----------------------------------------------------
USAGE: result = cal(begin_yr,begin_per,freq,obs)
or: result = cal(cstruc,obs)
where: begin_yr = beginning year, e.g., 1982
begin_per = beginning period, e.g., 3
freq = frequency, 1=annual,4=quarterly,12=monthly
obs = optional argument for an observation #
cstruc = a structure returned by a previous call to cal()
-----------------------------------------------------
RETURNS: a structure:
result.beg_yr = begin_yr
result.beg_per = begin_period
result.freq = frequency
result.obs = obs (if input)
result.year = year for obs (if input)
result.period = period for obs (if input)
recorded in the ‘.year’ and ‘.period’ fields. We also include the observation
number in the field ‘.obs’ of the returned structure. To illustrate use of this
function, consider that our data series start in January, 1982 and we wish to
determine the calendar year and period associated with observation number
83, we would make the call:
cal_struct = cal(1982,1,12,83)
cal_struct =
beg_yr: 1982
beg_per: 1
freq: 12
obs: 83
year: 1988
period: 11
cstr = cal(1980,1,12);
begf = ical(1991,6,cstr);
This would set begf=138, which we can utilize when calling our fore-
casting function, or simply printing the data series observations for this
particular date.
The third function, tsdate allows us to display and print date strings
associated with annual, quarterly or monthly time-series, which is helpful
for printing results and forecasted values with date labels. Documentation
for this function is:
PURPOSE: produce a time-series date string for an observation #
given beginning year, beginning period and frequency of the data
---------------------------------------------------
USAGE: out = tsdate(beg_yr,beg_period,freq,obsn);
or: tsdate(beg_yr,beg_period,freq,obsn);
or: tsdate(cal_struct,obsn);
where: beg_yr = beginning year, e.g., 1974
beg_period = beginning period, e.g., 4 for april
freq = 1 for annual, 4 for quarterly, 12 for monthly
obsn = the observation #
cal_struct = a structure returned by cal()
The function uses MATLAB ‘nargout’ to determine whether the user has
requested an output argument or not. In the event that no output argument
is requested, the function calls the MATLAB datestr function with no semi-
colon after the call to produce a printed string in the MATLAB command
window. For cases where the user desires more control over the printing
format of the date string, or wishes to include the date string in an output
file, we can call the function with an output argument. As an illustration of
the first case, consider the following code:
% ----- Example 3.2 Using the tsdate() function
cstruct = cal(1982,1,4);
for i=1:2;
tsdate(cstruct,i);
end;
which would produce the following output in the MATLAB command win-
dow:
Q1-82
Q2-82
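For the second case, where the date strings are captured in an output
argument, a loop like the following sketch could be used; the monthly
calendar beginning in January 1982 and the six-month span are assumptions
chosen to match the cell-array shown below.

cstruct = cal(1982,1,12);
fdates = cell(1,6);
for i=1:6;
fdates{i} = tsdate(cstruct,i);   % store the date string rather than printing it
end;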
that would place each date in the cell-array for use later in printing. The
cell-array looks as follows:
fdates =
  'Jan82'  'Feb82'  'Mar82'  'Apr82'  'May82'  'Jun82'
task. We also provide a function to plot time-series with dates on the time-
axis.
The function lprint transforms a MATLAB matrix by adding formatting
symbols needed to produce a table in LaTeX, a widely-used mathematical
typesetting program.
This is the first example of a function that uses the MATLAB structure
variable as an input argument. This allows us to provide a large number of
input arguments using a single structure variable. It also simplifies parsing
the input arguments in the function and setting default options.
The output from this example is shown below. Matrices that have a
large number of columns are automatically wrapped when printed, with
wrapping determined by both the numeric format and column names. To
understand how wrapping works, note that the lprint function examines
the width of column names as well as the width of the numeric format
supplied. The numeric format is padded at the left to match column names
that are wider than the format. If column names are not as wide as the
numeric format, these are right-justified. For example, if you supply a 15
character column name and a ‘10.6f’ numeric format, the columns will be
15 characters wide with the numbers printed in the last 10 character places
in the ‘10.6f’ format. On the other hand, if you supply 8 character column
names and ‘10.6f’ numeric format, the columns will be 10 characters wide
with the column names right-justified to fill the right-most 8 characters.
Wrapping occurs after a number of columns determined by 80 divided
by the numeric format width. Depending on the numeric format, you can
achieve a different number of columns for your table. For example, using a
format ‘10.3f’ would allow 8 columns in your table, whereas using ‘5d’ would
allow up to 16 columns. The format of ‘20.4f’ in the example produces
wrapping after 4 columns. Note that the width of the printout may be
wider than 80 columns because of the column name padding considerations
discussed in the previous paragraph.
using no options
-0.2212 & 0.3829 & -0.6628 \\
-1.5460 & -0.7566 & -1.0419 \\
-0.7883 & -1.2447 & -0.6663 \\
-0.6978 & -0.9010 & 0.2130 \\
0.0539 & -1.1149 & -1.2154 \\
using fmt option
-0 & 0 & -1000 \\
-2000 & -1000 & -1000 \\
-1000 & -1000 & -1000 \\
-1000 & -1000 & 0 \\
0 & -1000 & -1000 \\
using column names and fmt option
Illinois & Ohio & Indiana \\
-0.000 & 0.000 & -1000.000 \\
-2000.000 & -1000.000 & -1000.000 \\
-1000.000 & -1000.000 & -1000.000 \\
-1000.000 & -1000.000 & 0.000 \\
0.000 & -1000.000 & -1000.000 \\
row and column labels
Rows Illinois & Ohio & Indiana \\
row1 -0.000 & 0.000 & -1000.000 \\
row2 -2000.000 & -1000.000 & -1000.000 \\
row3 -1000.000 & -1000.000 & -1000.000 \\
row4 -1000.000 & -1000.000 & 0.000 \\
row5 0.000 & -1000.000 & -1000.000 \\
wrapped output for large matrices
IL & OH & IN & WV &
-0.6453 & 0.6141 & 0.1035 & 1.5466 &
1.0657 & -0.6160 & 0.4756 & 0.6984 &
-0.1516 & 1.0661 & 1.2727 & 0.8227 &
0.8837 & -0.8217 & 1.6452 & 1.1852 &
0.8678 & -1.0294 & 0.8200 & 0.4893 &
function mprint(y,info)
% note structure variable named info in argument declaration
[nobs nvars] = size(y);
fid = 1; rflag = 0; cflag = 0; rnum = 0; nfmts = 1; % setup defaults
begr = 1; endr = nobs; begc = 1; endc = nvars; fmt = '%10.4f'; cwidth = 80;
if nargin == 1
% rely on defaults
elseif nargin == 2
if ~isstruct(info)
error('mprint: you must supply the options as a structure variable');
end;
fields = fieldnames(info);
nf = length(fields);
for i=1:nf
if strcmp(fields{i},'fmt')
fmts = info.fmt; [nfmts junk] = size(fmts);
if nfmts <= nvars, fmt = fmts;
else
error('mprint: wrong # of formats in string -- need nvar');
end;
elseif strcmp(fields{i},'fid'), fid = info.fid;
elseif strcmp(fields{i},'begc'), begc = info.begc;
elseif strcmp(fields{i},'begr'), begr = info.begr;
elseif strcmp(fields{i},'endc'), endc = info.endc;
elseif strcmp(fields{i},'endr'), endr = info.endr;
elseif strcmp(fields{i},'cnames'), cnames = info.cnames; cflag = 1;
elseif strcmp(fields{i},'rnames'), rnames = info.rnames; rflag = 1;
elseif strcmp(fields{i},'rflag'), rnum = info.rflag;
elseif strcmp(fields{i},'width'); cwidth = info.width;
end;
end;
else
error('Wrong # of arguments to mprint');
end; % end of if-elseif input checking
This function takes a dates structure, begin and end period and format
as arguments. As indicated in the function documentation, the user can
call this function using a number of different input arguments in almost any
order. Example 3.6 demonstrates some of the alternatives.
This flexibility to allow user input of any number of the options in almost
any order is achieved using the MATLAB varargin variable. This is a
generally useful approach to crafting functions, so we provide the details of
how this is accomplished. The function declaration contains the MATLAB
keyword ‘varargin’ which is a cell-array that we parse inside the function. It
is our responsibility to correctly deduce the number and order of the input
arguments supplied by the user.
function tsprint(y,cstruc,varargin)
% NOTE the use of varargin keyword in the function declaration
[nobs nvar] = size(y); fmt = '%10.4f'; % set defaults
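The body of the parsing code falls along the following lines. This fragment
is only a sketch of the logic described in the next paragraph, not the actual
tsprint source.

nargs = length(varargin);             % number of optional arguments supplied
numcnt = 0;                           % count of numeric arguments seen so far
for i=1:nargs
arg = varargin{i};
if isnumeric(arg)                     % begp and endp arrive as observation numbers
numcnt = numcnt + 1;
if numcnt == 1, begp = arg; else, endp = arg; end;
else                                  % character input: a format string or variable names
[nrows ncols] = size(arg);
if nrows > 1, vnames = arg;           % a character matrix is taken as variable names
elseif strncmp(arg,'%',1), fmt = arg; % a single row starting with '%' is a format
else, vnames = arg;                   % a single variable name
end;
end;
end;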
Our deductions are based on logic and the use of MATLAB functions
isnumeric to detect ‘begp’ input arguments and the size command to dis-
tinguish formats from variable names supplied as input arguments. Needless
to say, there are limits to what can be accomplished with this approach de-
pending on the nature of the input arguments. In addition, writing this type
of function is more prone to errors than the use of the structure variable as
an input argument demonstrated by the mprint and lprint functions. The
advantage is that the user need not define an input structure, but can pass
input arguments directly to the function. Passing input arguments directly
may be more convenient when typing commands to the MATLAB command
window, which is likely to be the case when using the tsprint function.
rnames = 'Date';
for k=begp:endp;
rnames = strvcat(rnames,tsdate(cstruc,k));
end;
in.rnames = rnames;
in.fmt = fmt;
in.cnames = vnames;
mprint(y(begp:endp,:),in);
The output varies due to the different formats and variable names sup-
plied as shown below.
Date illinos indiana kentucky michigan ohio
Jan90 192 76 348 97 171
Feb90 190 76 350 96 170
Mar90 192 78 356 95 172
Apr90 192 80 357 97 174
May90 194 82 361 104 177
Jun90 199 83 361 106 179
Jul90 200 84 354 105 181
Aug90 200 84 357 84 181
Sep90 199 83 360 83 179
Oct90 199 83 353 82 177
Nov90 197 82 350 81 176
Dec90 196 82 353 96 171
Date IL IN KY MI OH PA TN WV
Jan90 192.00 76.00 348.00 97.00 171.00 267.00 61.00 352.00
Feb90 190.00 76.00 350.00 96.00 170.00 266.00 57.00 349.00
Mar90 192.00 78.00 356.00 95.00 172.00 271.00 62.00 350.00
Apr90 192.00 80.00 357.00 97.00 174.00 273.00 62.00 351.00
May90 194.00 82.00 361.00 104.00 177.00 277.00 63.00 355.00
Jun90 199.00 83.00 361.00 106.00 179.00 279.00 63.00 361.00
Jul90 200.00 84.00 354.00 105.00 181.00 279.00 63.00 359.00
Aug90 200.00 84.00 357.00 84.00 181.00 281.00 63.00 358.00
Sep90 199.00 83.00 360.00 83.00 179.00 280.00 62.00 358.00
Oct90 199.00 83.00 353.00 82.00 177.00 277.00 62.00 357.00
Nov90 197.00 82.00 350.00 81.00 176.00 268.00 61.00 361.00
Dec90 196.00 82.00 353.00 96.00 171.00 264.00 59.00 360.00
Date IL IN KY MI
Jan90 192 76 348 97
Feb90 190 76 350 96
Mar90 192 78 356 95
Apr90 192 80 357 97
May90 194 82 361 104
Jun90 199 83 361 106
Jul90 200 84 354 105
Aug90 200 84 357 84
Sep90 199 83 360 83
Oct90 199 83 353 82
Nov90 197 82 350 81
Dec90 196 82 353 96
Date OH PA TN WV
Jan90 171 267 61 352
Feb90 170 266 57 349
Mar90 172 271 62 350
Apr90 174 273 62 351
May90 177 277 63 355
Jun90 179 279 63 361
Jul90 181 279 63 359
Aug90 181 281 63 358
Sep90 179 280 62 358
Oct90 177 277 62 357
Nov90 176 268 61 361
Dec90 171 264 59 360
A warning about using the cal and tsprint functions. When you trun-
cate time-series to account for transformations, you should re-set the calen-
dar with another call to the cal function based on the new dates associated
with the truncated series. Example 3.8 provides an illustration of this.
% ----- Example 3.8 Truncating time-series and the cal() function
dates = cal(1982,1,12); load test.dat; y = growthr(test,12);
vnames = strvcat('IL','IN','KY','MI','OH','PA','TN','WV');
% define beginning and ending print dates
begp = ical(1983,1,dates); endp = ical(1984,12,dates);
tsprint(y,dates,begp,endp,vnames);
ynew = trimr(y,dates.freq,0); % truncate initial observations
tdates = cal(1983,1,12); % reset the calendar after truncation
% re-define beginning and ending print dates based on the new calendar
begp = ical(1983,1,tdates); endp = ical(1983,12,tdates);
tsprint(ynew,tdates,begp,endp,vnames);
One nice feature of using the cal and ical functions is that dates information
is documented in the file. It should be fairly clear what the estimation,
forecasting, truncation, etc. dates are, simply by examining the file.
In addition to printing time-series data we might wish to plot time-series
variables. A function tsplot was created for this purpose. The usage format
is similar to tsprint and relies on a structure from cal. The documentation
for this function is:
tsplot produces graphs that look like that shown in Figure 3.1. (Colors
are used to distinguish the lines in MATLAB). The time-axis labeling is not
ideal, but this is due to limitations in MATLAB. The function attempts to
distinguish between graphs that involve fewer observations and those that
involve a large number of observations. A vertical grid is used for the case
where we have a small number of observations.
If you invoke multiple calls to tsplot, you are responsible for placing
MATLAB pause statements as in the case of the plt function from the
regression library. You can add titles as well as use the MATLAB subplot
feature with this function. Example 3.10 produces time-series plots using
the subplot command to generate multiple plots in a single figure.
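A sketch along the lines of example 3.10 is shown below; the data
transformations follow example 3.8, and tsplot is assumed to accept the
same begp, endp and vnames arguments as tsprint.

% ----- a sketch in the spirit of Example 3.10
dates = cal(1982,1,12); load test.dat;     % monthly employment time-series
y = growthr(test,12); yt = trimr(y,12,0);  % growth rates, initial observations truncated
tdates = cal(1983,1,12);                   % calendar for the truncated series
vnames = strvcat('IL','IN');
[nobs junk] = size(yt);
subplot(211), tsplot(yt(:,1:2),tdates,1,nobs,vnames);    % full sample
begp = ical(1990,1,tdates); endp = ical(1992,12,tdates);
subplot(212), tsplot(yt(:,1:2),tdates,begp,endp,vnames); % a sub-sample covering 1990-1992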
Note, we never really used the ‘dates’ structure returned by the first call
to cal. Nonetheless, this command in the file provides documentation that
[Figure: tsplot graphs of the IL and IN growth-rate series; the panels show
the full sample, a sub-sample covering 1990-1992, and quarterly averages
produced with mth2qtr().]
[Figure: employment growth-rate series with upturn and downturn periods
indicated, covering Mar82 through Sep98.]
that is truncated to “feed the lags”. This approach would return different
length vectors or matrices depending on the type of transformation and
frequency of the data. For example, in the case of seasonal differencing, we
lose the first ‘freq’ observations where freq=4 or 12 depending on whether
we are working with quarterly or monthly data. When we consider that
after implementing a seasonal differencing transformation, we might then
construct lagged values of the seasonally differenced series, it becomes clear
that this approach has the potential to raise havoc.
The second design option is to always return the same size vector or
matrix that is used as an input argument to the function. This requires that
the user take responsibility for truncating the vector or matrix returned by
the function. We supply an auxiliary function, trimr, that can help with
the truncation task facing the user.
An example to illustrate using a seasonal differencing function sdiff and
The function trimr is designed after a similar function from the Gauss
programming language; its documentation is:
PURPOSE: return a matrix (or vector) x stripped of the specified rows.
-----------------------------------------------------
USAGE: z = trimr(x,n1,n2)
where: x = input matrix (or vector) (n x k)
n1 = first n1 rows to strip
n2 = last n2 rows to strip
NOTE: modeled after Gauss trimr function
-----------------------------------------------------
RETURNS: z = x(n1+1:n-n2,:)
-----------------------------------------------------
Note that we utilize our cal function structure in the sdiff function, but
an integer ‘freq’ argument works as well. That is: ysdiff = sdiff(test,12);
would return the same matrix of seasonally differenced data as in the exam-
ple above.
In addition to the sdiff function, we create an ordinary differencing func-
tion tdiff. (Note, we cannot use the name diff because there is a MATLAB
function named diff). This function will produce traditional time-series
differences and takes the form:
ydiff = tdiff(y,k);
which would return the matrix or vector y differenced k times. For example:
k = 1 produces: yt − yt−1, and k = 2 returns: ∆²yt = yt − 2yt−1 + yt−2.
Another time-series data transformation function is the lag function that
produces lagged vectors or matrices. The format is:
xlag1 = lag(x); % defaults to 1-lag
xlag12 = lag(x,12); % produces 12-period lag
where here again, we would have to use the trimr function to eliminate the
zeros in the initial positions of the matrices or vectors xlag1, and xlag12
returned by the lag function.
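A brief sketch of this pattern using stand-in data:

x = randn(100,3);                  % stand-in data matrix
xlag1  = lag(x);                   % first lag, zeros in row 1
xlag12 = lag(x,12);                % twelfth lag, zeros in rows 1 through 12
x      = trimr(x,12,0);            % truncate everything to conformable sizes
xlag1  = trimr(xlag1,12,0);
xlag12 = trimr(xlag12,12,0);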
For VAR models, we need to create matrices containing a group of con-
tiguous lags that are used as explanatory variables in these models. A special
lag function mlag is used for this purpose. Given a matrix Yt containing
two time-series variable vectors:
matdiv - divides matrices that are not of the same dimension but are
row or column compatible. (NOTE: there is no Gauss function like
this, but Gauss allows this type of operation on matrices.)
matmul - multiplies matrices that are not of the same dimension but
are row or column compatible.
matadd - adds matrices that are not of the same dimension but are
row or column compatible.
matsub - subtracts matrices that are not of the same dimension but are
row or column compatible.
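A sketch of these operations (the two-argument calling convention is assumed):

x = randn(100,4);              % 100 x 4 matrix
v = randn(100,1);              % column-compatible 100 x 1 vector
s = randn(1,4);                % row-compatible 1 x 4 vector
z1 = matmul(x,v);              % multiplies each column of x by v
z2 = matdiv(x,v);              % divides each column of x by v
z3 = matadd(x,s);              % adds s to every row of x
z4 = matsub(x,s);              % subtracts s from every row of x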
8 print;
9 b_old = 0;
10 b_new = b;
11 w = x;
12 do until abs(b_new - b_old) < 1e-3;
13 b_old = b_new;
14 b_new = invpd(w'x)*(w'y);
15 w = x./abs(y - x * b_new);
16 endo;
17 print "LAD estimates " b_new';
4. Gauss carries out regression estimation using: (y/x) on line #7, which
we need to replace with a MATLAB ‘backslash’ operator.
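In MATLAB the equivalent statement would be written along these lines
(stand-in data shown):

y = randn(100,1); x = [ones(100,1) randn(100,2)];   % stand-in data
b = x\y;                       % least-squares solution via the backslash operator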
Of course, beyond simply converting the code, we can turn this into a
full-blown regression procedure and add it to the regression library. The
code for this is shown below.
conv = max(abs(b_new-b_old));
while (conv > crit) & (iter <= maxit);
b_old = b_new; b_new = invpd(w'*x)*(w'*y);
resid = (abs(y-x*b_new)); ind = find(resid < 0.00001);
resid(ind) = 0.00001; w = matdiv(x,resid);
iter = iter+1; conv = max(abs(b_new-b_old));
end;
results.meth = 'lad'; results.beta = b_new;
results.y = y; results.nobs = nobs;
results.nvar = nvar; results.yhat = x*results.beta;
results.resid = y - results.yhat;
sigu = results.resid'*results.resid;
results.sige = sigu/(nobs-nvar);
tmp = (results.sige)*(diag(inv(w'*x)));
results.tstat = results.beta./(sqrt(tmp));
ym = y - ones(nobs,1)*mean(y); rsqr1 = sigu; rsqr2 = ym'*ym;
results.rsqr = 1.0 - rsqr1/rsqr2; % r-squared
rsqr1 = rsqr1/(nobs-nvar); rsqr2 = rsqr2/(nobs-1.0);
results.rbar = 1 - (rsqr1/rsqr2); % rbar-squared
ediff = results.resid(2:nobs) - results.resid(1:nobs-1);
results.dw = (ediff'*ediff)/sigu; % durbin-watson
results.iter = iter; results.conv = conv;
some users may directly call these functions in lieu of using the generic plt
function.
function plt(results,vnames)
% PURPOSE: Plots results structures returned by most functions
% by calling the appropriate plotting function
%---------------------------------------------------
% USAGE: plt(results,vnames)
% Where: results = a structure returned by an econometric function
% vnames = an optional vector of variable names
% --------------------------------------------------
% NOTES: this is simply a wrapper function that calls another function
% --------------------------------------------------
% RETURNS: nothing, just plots the results
% --------------------------------------------------
Note that the function plt contains no code to carry out plotting; it con-
sists of a large switch statement with multiple case arguments that key on
the ‘meth’ field of the structure variable input. This allows you to construct
your own plotting functions that plot information contained in ‘results struc-
tures’ and add another ‘case’ to the large switch statement in the function
plt. We illustrate how to do this using another wrapper function prt below.
In addition to plotting results, many of the econometric estimation and
testing methods produce results that are printed using an associated prt_reg,
prt_var, etc., function. A wrapper function named prt seems appropriate
here as well and we have constructed one as part of the Econometrics Tool-
box. All functions that provide ‘results structures’ with information that
can be printed have been added to the function prt, so a simple call to prt
with the ‘results structure’ will provide printed results. As in the case of
plt, the function prt contains no code for actually printing results, it is sim-
ply a large ‘switch’ statement that queries the ‘meth’ field of the structure
variable input and contains many ‘case statements’ where the appropriate
printing functions are called.
All regression results structures and statistical tests results can be printed
using prt(results), or prt(results,vnames,fid). This was accomplished
by providing a specialized printing function to provide formatted output
for each regression or statistical testing function. A call to the specialized
function is then added as a ‘case’ argument in the wrapper function prt.
You should of course take the same approach when constructing your own
functions.
As an illustration of this, consider the functions johansen, adf, cadf,
phillips that implement cointegration tests. A function prt_coint was con-
structed that provides nicely formatted printing of the results structures
returned by each of these functions. We then add a case to the switch
statement in the prt function that looks as follows:
case {'johansen','adf','cadf','phillips'}
% call prt_coint
if arg == 1
prt_coint(results);
elseif arg == 2
prt_coint(results,vnames);
elseif arg == 3
prt_coint(results,vnames,fid);
else
prt_coint(results,[],fid);
end;
This ensures that a call to the function prt will produce printed re-
sults for the results structures produced by these four cointegration testing
functions by calling the appropriate prt_coint function. It saves users the
trouble of knowing or remembering the name of your printing function.
When users submit econometric functions for inclusion in the Econo-
metrics Toolbox, they should provide a printing or plotting function where
appropriate, and I will add a ‘case’ statement to the prt or plt function in
the toolbox to call these printing or plotting functions.
Chapter 3 Appendix
The utility functions discussed in this chapter (as well as others not dis-
cussed) are in a subdirectory util.
utility function library
Chapter 4
Regression Diagnostics
X2 = X3 + u (4.2)
Instead, we generate the X2 vector from the X3 vector with an added random
error vector u. Equation (4.2) represents X2 as a near linear combination of
X3 where the strength of the linear dependency is determined by the size of
the u vector. To generate data sets with an increasing amount of collinearity
between X2 and X3 , we adopted the following strategy:
1. First set the variance of the random normal error vector u at 1.0 and
generate the X2 vector from the X3 vector.
where the only change between these 1000 Y vectors and those from
the benchmark generation is the collinear nature of the X2 and X3
vectors.
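A sketch of this type of generation and sampling experiment (the variable
names and the use of ols are illustrative details, not the exact program):

n = 100; ndraws = 1000;
x1 = randn(n,1); x3 = randn(n,1);
sigu = 1.0;                           % variance of u; re-run with 0.5 and 0.1
x2 = x3 + sqrt(sigu)*randn(n,1);      % X2 as a near linear combination of X3
X = [ones(n,1) x1 x2 x3];
bsave = zeros(ndraws,4);
for i=1:ndraws
  y = X*ones(4,1) + randn(n,1);       % true parameters all equal to unity
  res = ols(y,X);
  bsave(i,:) = res.beta';
end
fprintf('means   %8.4f %8.4f %8.4f %8.4f\n',mean(bsave));
fprintf('std dev %8.4f %8.4f %8.4f %8.4f\n',std(bsave));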
The results of the experiment showing both the means and standard
deviations from the distribution of estimates are:
beta means      alpha     beta    gamma    theta
benchmark      1.0033   1.0027   1.0047   0.9903
sigu=1.0       1.0055   1.0027   1.0003   0.9899
sigu=0.5       1.0055   1.0027   1.0007   0.9896
sigu=0.1       1.0055   1.0027   1.0034   0.9868
A first point to note about the experimental outcomes is that the means
of the estimates are unaffected by the collinearity problem. Collinearity cre-
ates problems with regard to the variance of the distribution of the estimates,
not the mean. A second point is that the benchmark data set produced pre-
cise estimates, with standard deviations for the distribution of outcomes
around 0.33. These standard deviations would result in t-statistics around
3, allowing us to infer that the true parameters are significantly different
from zero.
Turning attention to the standard deviations from the three collinear
data sets we see a clear illustration that increasing the severity of the near
linear combination between X2 and X3 produces an increase in the standard
deviation of the resulting distribution for the γ and θ estimates associated
with X2 and X3. The increase is about three-fold for the worst case where
σ²u = 0.1 and the strength of the collinear relation between X2 and X3 is
the greatest.
A diagnostic technique presented in Chapter 3 of Regression Diagnos-
tics by Belsley, Kuh, and Welsch (1980) is implemented in the function
bkw. The diagnostic is capable of determining the number of near linear
dependencies in a given data matrix X, and the diagnostic identifies which
variables are involved in each linear dependency. This diagnostic technique
is based on the Singular Value Decomposition that decomposes a matrix
X = UDV′, where U contains the eigenvectors of X and D is a diagonal
matrix containing eigenvalues.
For diagnostic purposes the singular value decomposition is applied to
the variance-covariance matrix of the least-squares estimates and rearranged
to form a table of variance-decomposition proportions. The procedure for a
k variable least-squares model is described in the following. The variance of
the estimate β̂k can be expressed as shown in (4.3).
var(β̂k) = σ̂²ε Σ_{j=1}^{k} (V²kj / λ²j)                    (4.3)
The diagnostic value of this expression lies in the fact that it decomposes
var(β̂k) into a sum of components, each associated with one of the k singular
values λ²j that appear in the denominator. Expression (4.4) expands the
summation in (4.3) to show this more clearly.
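Expanded term by term, the sum in (4.3) reads:

var(β̂k) = σ̂²ε ( V²k1/λ²1 + V²k2/λ²2 + · · · + V²kk/λ²k )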
vnames = strvcat('x1','x2')
---------------------------------------------------
RETURNS:
nothing, just prints the table out
--------------------------------------------------
SEE ALSO: dfbeta, rdiag, diagnose
---------------------------------------------------
REFERENCES: Belsley, Kuh, Welsch, 1980 Regression Diagnostics
----------------------------------------------------
The function allows a variable name vector and format as optional inputs.
As a convenience, either a variable name vector with names for the variables
in the data matrix X or one that includes a variable name for y as well as
the variables in X can be used. This is because the bkw function is often
called in the context of regression modeling, so we need only construct a
single variable name string vector that can be used for printing regression
results as well as labelling variables in the bkw output.
As an example of using the bkw function to carry out tests for collinear-
ity, the program below generates a collinear data set and uses the bkw
function to test for near linear relationships.
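A sketch along these lines (the data generation shown is illustrative):

n = 100;
x = randn(n,5);
x(:,1) = x(:,2) + x(:,4) + 0.1*randn(n,1);   % near linear relation among variables 1, 2, 4
vnames = strvcat('x1','x2','x3','x4','x5');
bkw(x,vnames);                               % prints the variance-decomposition table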
The results of the program are shown below. They detect the near linear
relationship between variables 1, 2 and 4 which we generated in the data
matrix X.
matrix before inversion. The scalar term γ is called the ‘ridge’ parameter.
The ridge regression formula is shown in (4.7).
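For reference, the familiar ridge expression adds the scalar γ times an
identity matrix to X′X before inversion:

β̂R = (X′X + γ Ik)⁻¹ X′y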
The results from ols and ridge estimation are shown below. From these
results we see that the near linear relationship between variables x1 , x2 and
x4 leads to a decrease in precision for these estimates. The ridge estimates
increase the precision as indicated by the larger t−statistics.
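Estimates of this kind might be produced with calls along the following lines
(a sketch; ridge is assumed to accept ridge(y,x), choosing the Hoerl-Kennard
value of the ridge parameter when none is supplied):

n = 100;
x = randn(n,5);
x(:,1) = x(:,2) + x(:,4) + 0.1*randn(n,1);   % collinear data as before
y = x*ones(5,1) + randn(n,1);
vnames = strvcat('y','x1','x2','x3','x4','x5');
res1 = ols(y,x);    prt(res1,vnames);        % least-squares estimates
res2 = ridge(y,x);  prt(res2,vnames);        % ridge estimates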
Rbar-squared = 0.9015
sigma^2 = 0.9237
Durbin-Watson = 1.6826
Nobs, Nvars = 100, 5
***************************************************************
Variable Coefficient t-statistic t-probability
variable 1 -0.293563 -0.243685 0.808000
variable 2 2.258060 1.871842 0.064305
variable 3 1.192133 11.598932 0.000000
variable 4 2.220418 1.796384 0.075612
variable 5 0.922009 8.674158 0.000000
A point to note about ridge regression is that it does not produce un-
biased estimates. The amount of bias in the estimates is a function of how
large the value of the ridge parameter γ is. Larger values of γ lead to im-
proved precision in the estimates — at a cost of increased bias.
A function rtrace helps to assess the trade-off between bias and effi-
ciency by plotting the ridge estimates for a range of alternative values of the
ridge parameter. The documentation for rtrace is:
To the extent that the parameter values vary greatly from those associ-
ated with values of θ = 0, we can infer that a great deal of bias has been
introduced by the ridge regression. A graph produced by rtrace is shown
in Figure 4.1, indicating a fair amount of bias associated with 3 of the 5
parameters in this example.
Another solution for the problem of collinearity is to use a Bayesian
model to introduce prior information. A function theil produces a set of
estimates based on the ‘mixed estimation’ method set forth in Theil and
Goldberger (1961). The documentation for this function is:
PURPOSE: computes Theil-Goldberger mixed estimator
y = X B + E, E = N(0,sige*IN)
c = R B + U, U = N(0,v)
---------------------------------------------------
USAGE: results = theil(y,x,c,R,v)
where: y = dependent variable vector
x = independent variables matrix of rank(k)
c = a vector of prior mean values, (c above)
R = a matrix of rank(r) (R above)
v = prior variance-covariance (var-cov(U) above)
---------------------------------------------------
RETURNS: a structure:
results.meth = ’theil’
[Figure 4.1: ridge trace plot of the regression coefficients for x1-x5 against
the value of θ; the vertical line shows the Hoerl-Kennard θ value]
The user can specify subjective prior information in the form of a normal
prior for the parameters β in the model. Theil and Goldberger showed
that this prior information can be expressed as stochastic linear restrictions
taking the form:
c = Rβ + u                                    (4.11)
These matrices are used as additional dummy or fake data observations in
the estimation process. The original least-squares model in matrix form can
be rewritten as in (4.12) to show the role of the matrices defined above.
[  y  ]     [  X  ]         [  ε  ]
[ ... ]  =  [ ... ] β   +   [ ... ]                    (4.12)
[  c  ]     [  R  ]         [  u  ]
The partitioning symbol, (. . .), in the matrices and vectors of (4.12)
designates that we are adding the matrix R and the vectors c and u to the
original matrix X and vectors y and ε. These additional observations make
it clear we are augmenting the weak sample data with our prior information.
At this point we use an OLS estimation algorithm on the modified y vector
and X matrix to arrive at the “mixed estimates”. One minor complication
arises here, however: the theoretical disturbance vector no longer consists of
the simple ε vector which obeys the Gauss-Markov assumptions, but has an
additional u vector added to it. We need to consider the variance-covariance
structure of this new disturbance vector which requires a minor modification
of the OLS formula producing the resulting “mixed estimator” shown in
(4.13).
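One standard way of writing the resulting mixed estimator (a sketch in the
notation above, with V denoting the variance-covariance matrix of u) is:

β̂TG = (σ⁻²X′X + R′V⁻¹R)⁻¹ (σ⁻²X′y + R′V⁻¹c)

with σ² replaced in practice by an estimate such as the least-squares σ̂².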
This program produced the following results indicating that use of prior
information improved the precision of the estimates compared to the least-
squares estimates.
An example where we generate a data set and then artificially create two
outliers at observations #50 and #70 is shown below. The graphical output
from plt_dfb in Figure 4.2 shows a graph of the change in β̂ associated with
omitting each observation. We see evidence of the outliers at observations
#50 and #70 in the plot.
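A sketch of such a program (the data generation and the dfbeta and plt_dfb
calling conventions shown are assumptions):

n = 100;
x = [ones(n,1) randn(n,3)];
y = x*ones(4,1) + 0.5*randn(n,1);
y(50) = y(50) + 10;  y(70) = y(70) - 10;     % two artificial outliers
vnames = strvcat('x1','x2','x3','x4');
res = dfbeta(y,x);                           % change in beta-hat from omitting each observation
plt_dfb(res,vnames);                         % produces a plot like Figure 4.2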
[Figure 4.2: df betas plots for x1-x4 across observations 1 to 100, showing
the influence of the outliers at observations 50 and 70]
[Figure: dffits, studentized residuals, and hat-matrix diagonals across
observations 1 to 100]
Note that we can use our function mprint from Chapter 3 to produce
a formatted printout of the results that looks as follows:
Chapter 4 Appendix
Chapter 5
VAR and Error Correction Models
This chapter describes the design and use of MATLAB functions to imple-
ment vector autoregressive (VAR) and error correction (EC) models. The
MATLAB functions described here provide a consistent user-interface in
terms of the MATLAB help information, and related routines to print and
plot results from the various models. One of the primary uses of VAR and
EC models is econometric forecasting, for which we provide a set of func-
tions.
Section 5.1 describes the basic VAR model and our function to estimate
and print results for this method. Section 5.2 turns attention to EC models
while Section 5.3 discusses Bayesian variants on these models. Finally, we
take up forecasting in Section 5.4. All of the functions implemented in our
vector autoregressive function library are documented in the Appendix to
this chapter.
[ y1t ]   [ A11(ℓ) . . . A1n(ℓ) ] [ y1t ]   [ C1 ]   [ ε1t ]
[ y2t ] = [   .              .  ] [ y2t ] + [ C2 ] + [ ε2t ]        (5.1)
[  .  ]   [   .              .  ] [  .  ]   [  . ]   [  .  ]
[ ynt ]   [ An1(ℓ) . . . Ann(ℓ) ] [ ynt ]   [ Cn ]   [ εnt ]
The VAR model posits a set of relationships between past lagged values
of all variables in the model and the current value of each variable in the
model. For example, if the yit represent employment in state i at time
t, the VAR model structure allows employment variation in each state to
be explained by past employment variation in the state itself, yit−k , k =
1, . . . , m as well as past employment variation in other states, yjt−k , k =
1, . . . , m, j ≠ i. This is attractive since regional or state differences in
business cycle activity suggest lead/lag relationships in employment of the
type set forth by the VAR model structure.
The model is estimated using ordinary least-squares, so we can draw
on our ols routine from the regression library. A function var produces
estimates for the coefficients in the VAR model as well as related regression
statistics and Granger-causality test statistics.
The documentation for the var function is:
There are two utility functions that help analyze VAR model Granger-
causality output. The first is pgranger, which prints a matrix of the
marginal probabilities associated with the Granger-causality tests in a con-
venient format for the purpose of inference. The documentation is:
PURPOSE: prints VAR model Granger-causality results
--------------------------------------------------
USAGE: pgranger(results,varargin);
where: results = a structure returned by var(), ecm()
varargin = a variable input list containing
vnames = an optional variable name vector
cutoff = probability cutoff used when printing
usage example 1: pgranger(result,0.05);
example 2: pgranger(result,vnames);
example 3: pgranger(result,vnames,0.01);
example 4: pgranger(result,0.05,vnames);
----------------------------------------------------
e.g. cutoff = 0.05 would only print
marginal probabilities < 0.05
---------------------------------------------------
NOTES: constant term is added automatically to vnames list
you need only enter VAR variable names plus deterministic
---------------------------------------------------
Rather than print out the detailed VAR estimation results, we might be interested
in drawing inferences regarding Granger-causality from the marginal proba-
bilities. The following program would produce a printout of just these prob-
abilities. It utilizes an option to suppress printing of probabilities greater
than 0.1, so that our inferences would be drawn on the basis of a 90% con-
fidence level.
% ----- Example 5.2 Using the pgranger() function
dates = cal(1982,1,12); % monthly data starts in 82,1
load test.dat; % monthly mining employment in 8 states
y = growthr(test,12); % convert to growth-rates
yt = trimr(y,dates.freq,0); % truncate
dates = cal(1983,1,1); % redefine the calendar for truncation
vname = strvcat('il','in','ky','mi','oh','pa','tn','wv');
nlag = 12;
res = var(yt,nlag); % estimate 12-lag VAR model
cutoff = 0.1; % examine Granger-causality at 90% level
pgranger(res,vname,cutoff); % print Granger probabilities
The format of the output is such that the columns reflect the Granger-
causal impact of the column-variable on the row-variable. That is, Indiana,
Kentucky, Pennsylvania, Tennessee and West Virginia exert a significant
Granger-causal impact on Illinois employment whereas Michigan and Ohio
do not. Indiana exerts the most impact, affecting Illinois, Michigan, Ohio,
Tennessee, and West Virginia.
The second utility is a function pftest that prints just the Granger-
causality joint F-tests from the VAR model. Use of this function is similar
to pgranger: we simply call the function with the results structure returned
by the var function, e.g., pftest(result,vnames), where the ‘vnames’ ar-
gument is an optional string-vector of variable names. This function would
produce the following output for each equation of a VAR model based on
all eight states:
A few points are worth noting regarding how the var function was implemented. We rely
on the ols function from the regression library to carry out the estimation
of each equation in the model and transfer the ‘results structure’ returned
by ols to a new var results structure array for each equation. Specifically
the code looks as follows:
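A sketch of that step (not the function's exact code; the ols field names
follow Chapter 2):

% y = data matrix (nobs x neqs) and nlag are assumed to be already defined
[nobs neqs] = size(y);
xmat = [mlag(y,nlag) ones(nobs,1)];      % lags of all variables plus a constant term
xmat = trimr(xmat,nlag,0);               % truncate to feed the lags
yt   = trimr(y,nlag,0);
for j=1:neqs
  res = ols(yt(:,j),xmat);               % least-squares estimates for equation j
  results(j).beta  = res.beta;           % transfer fields to the structure array
  results(j).tstat = res.tstat;
  results(j).resid = res.resid;
end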
The explanatory variables matrix, ‘xmat’ is the same for all equations of
the VAR model, so we form this matrix before entering the ‘for-loop’ over
equations. Structure arrays can be formed by simply specifying:
struct(i).fieldname, where i is the array index. This makes them just as
easy to use as the structure variables we presented in the discussion of the
regression function library in Chapter 2.
The most complicated part of the var function is implementation of the
Granger-causality tests which requires that we produce residuals based on
models that sequentially omit each variable in the model from the explana-
tory variables matrix in each equation. These residuals reflect the restricted
model in the joint F-test, whereas the VAR model residuals represent the
unrestricted model. The Granger-causality tests represent a series of tests
for the joint significance of each variable in each equation of the model. This
sequence of calculations is illustrated in the code below:
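A rough sketch of these calculations, written here as a stand-alone function
rather than the toolbox's exact code:

function fstat = granger_ftest(yj, xlag, nlag, neqs)
% Joint F-tests that each variable Granger-causes equation j (illustrative sketch).
%  yj   = dependent variable vector for equation j, lags already trimmed
%  xlag = [lags of variable 1, ..., lags of variable neqs, constant term]
%  nlag = number of lags, neqs = number of variables in the model
nobs = length(yj);
k = neqs*nlag + 1;                       % parameters in the unrestricted model
bu = xlag\yj;                            % unrestricted model via the backslash operator
eu = yj - xlag*bu; sigu = eu'*eu;        % unrestricted residual sum of squares
fstat = zeros(neqs,1);
for r=1:neqs                             % omit the lags of variable r in turn
  keep = [1:(r-1)*nlag, r*nlag+1:neqs*nlag, neqs*nlag+1];
  xtmp = xlag(:,keep);                   % restricted explanatory variables matrix
  br = xtmp\yj;                          % restricted model
  er = yj - xtmp*br; sigr = er'*er;      % restricted residual sum of squares
  fstat(r) = ((sigr - sigu)/nlag)/(sigu/(nobs - k));
end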
using the matrix xtmp with a Cholesky solution provided by the MATLAB
‘backslash’ operator. We take the Cholesky approach rather than the qr
matrix decomposition because a profile of the var function showed that
over 50% of the time spent in the routine was devoted to this step. In
contrast, only 12% of the time was spent determining the VAR regression
information using the ols command. Finally, note that the code above is
embedded in a loop over all equations in the model, so we store the ‘ftest’
and ‘fprob’ results in the structure for equation j.
Although the illustrations so far have not involved use of deterministic
variables in the VAR model, the var function is capable of handling these
variables. As an example, we could include a set of seasonal dummy variables
in the VAR model using:
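For example, something along these lines (sdummy is assumed to be the
utility that creates seasonal dummy variables, and var is assumed to accept
a matrix of deterministic variables as a third argument):

dates = cal(1982,1,12);
load test.dat;                         % monthly mining employment in 8 states
y = growthr(test,12);                  % convert to growth rates
y = trimr(y,dates.freq,0);             % truncate the initial observations
[nobs neqs] = size(y);
sdum = sdummy(nobs,dates.freq);        % seasonal dummy variables
nlag = 6;
result = var(y,nlag,sdum);             % VAR model with deterministic variables
prt(result);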
A handy option on the prt_var (and prt) function is the ability to print
the VAR model estimation results to an output file. Because these results
are quite large, they can be difficult to examine in the MATLAB command
window. Note that the wrapper function prt described in Chapter 3 also
works to print results from VAR model estimation, as does plt.
---------------------------------------------------
NOTE: - constant term is added automatically to vnames list
you need only enter VAR variable names plus deterministic
- you may use prt_var(results,[],fid) to print
output to a file with no vnames
---------------------------------------------------
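A sketch of directing printed output to a file using the fid argument (result
and vnames as in the earlier examples):

fid = fopen('var_results.out','w');    % open an output file for writing
prt(result,vnames,fid);                % or: prt_var(result,[],fid) with no vnames
fclose(fid);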
In addition to the prt and prt_var functions, there are plt and plt_var
functions that produce graphs of the actual versus predicted and residuals
for these models.
One final issue associated with specifying a VAR model is the lag length
to employ. A commonly used approach to determining the lag length is to
perform statistical tests of models with longer lags versus shorter lag lengths.
We view the longer lag models as an unrestricted model versus the restricted
shorter lag version of the model, and construct a likelihood ratio statistic
to test for the significance of imposing the restrictions. If the restrictions
are associated with a statistically significant degradation in model fit, we
conclude that the longer lag length model is more appropriate, rejecting the
shorter lag model.
Specifically, the chi-squared distributed test statistic, which has degrees
of freedom equal to the number of restrictions imposed, is:
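The statistic typically takes the usual likelihood ratio form (a sketch; c
denotes Sims' small-sample correction, the number of parameters per equation
in the unrestricted model):

LR = (T − c)(ln|Σ̂r| − ln|Σ̂u|)

where T is the number of observations and Σ̂r, Σ̂u are the residual covariance
matrices from the restricted and unrestricted models.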
sims = 1;
disp('LR-ratio results with Sims correction');
lrratio(y,maxlag,minlag,sims);
nperiods = 48;
[m1 m2] = irf(results,nperiods,'o1',vnames);
The results are provided graphically as eight graphs for the example
involving 8 equations, where a typical graph is shown in Figure 5.1. This
function puts up multiple MATLAB graph windows with a varying number
of the impulse response graphs in each window. This allows the user to
examine all of the responses and compare them using the multiple MATLAB
figure windows. The figure presents only a single graph window showing
three of the eight graphs produced by the example program.
zt = α′yt                                    (5.3)
Engle and Granger (1987) provide a Representation Theorem stating
that if two or more series in yt are co-integrated, there exists an error cor-
rection representation taking the following form:
where: zt−1 = yt−1 − θ − αxt−1 , ci are constant terms and εit denote distur-
bances in the model.
We provide a function adf (augmented Dickey-Fuller) to test time-series
for the I(1), I(0) property, and another routine cadf (co-integrating aug-
mented Dickey-Fuller) to carry out the tests from step two above on zt to
This would be used to test a time-series vector for I(1) or I(0) status.
Allowance is made for polynomial time trends as well as constant terms in
the function and a set of critical values are returned in a structure by the
function. A function prt_coint (as well as prt) can be used to print output
from adf, cadf and johansen, saving users the work of formatting and
printing the result structure output.
The function cadf is used for the case of two variables, yt , xt , where we
wish to test whether the condition yt = αxt can be interpreted as an equi-
librium relationship between the two series. The function documentation
is:
PURPOSE: compute augmented Dickey-Fuller statistic for residuals
from a cointegrating regression, allowing for deterministic
polynomial trends
------------------------------------------------------------
USAGE: results = cadf(y,x,p,nlag)
where: y = dependent variable time-series vector
x = explanatory variables matrix
p = order of time polynomial in the null-hypothesis
p = -1, no deterministic part
p = 0, for constant term
space, but all lags produced the same inferences. One point to note is that
the adf and cadf functions return a set of 6 critical values for significance
levels 1%,5%,10%,90%,95%,99% as indicated in the documentation for these
functions. Only three are printed for purposes of clarity, but all are available
in the results structure returned by the functions.
Augmented DF test for unit root variable: illinois
ADF t-statistic # of lags AR(1) estimate
-0.164599 6 0.998867
1% Crit Value 5% Crit Value 10% Crit Value
-3.464 -2.912 -2.588
We see from the adf function results that both Illinois and Indiana are
I(1) variables. We cannot reject the augmented Dickey-Fuller unit root hypothesis
because our t-statistics for both Illinois and Indiana are less than (in absolute
value terms) the critical value of -2.588 at the 90% level.
From the results of cadf we find that Illinois and Indiana mining employ-
ment are not co-integrated, again because the t-statistic of -1.67 does not
exceed the 90% critical value of -3.08 (in absolute value terms). We would
conclude that an EC model is not appropriate for these two time-series.
For most EC models, more than two variables are involved so the Engle
and Granger two-step procedure needs to be generalized. Johansen (1988)
provides this generalization which takes the form of a likelihood-ratio test.
We implement this test in the function johansen. The Johansen procedure
provides a test statistic for determining r, the number of co-integrating
relationships between the n variables in yt as well as a set of r co-integrating
vectors that can be used to construct error correction variables for the EC
model.
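A sketch of calling the function (the argument order johansen(y,p,nlag),
with p the order of the time polynomial, is an assumption patterned on the
adf and cadf conventions):

y = load('test.dat');                  % levels of the eight state employment series
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
p = 0;                                 % constant term in the co-integrating relations
nlag = 9;                              % an illustrative lag length
result = johansen(y,p,nlag);
prt(result,vnames);                    % prints trace and maximal eigenvalue statistics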
As a brief motivation for the work carried out by the johansen function,
we start with a reparameterization of the EC model:
The johansen function will return results for a sequence of tests against
alternative numbers of co-integrating relationships ranging from r ≤ 0 up
to r ≤ m − 1, where m is the number of variables in the matrix y.
The function prt provides a printout of the trace and maximal eigenvalue
statistics as well as the critical values returned in the johansen results
structure.
The printout does not present the eigenvalues and eigenvectors, but they
are available in the results structure returned by johansen as they are
needed to form the co-integrating variables for the EC model. The focus of
co-integration testing would be the trace and maximal eigenvalue statistics
along with the critical values. For this example, we find: (using the 95%
level of significance) the trace statistic rejects r ≤ 0 because the statistic of
307.689 is greater than the critical value of 159.529; it also rejects r ≤ 1,
r ≤ 2, r ≤ 3, r ≤ 4, and r ≤ 5 because these trace statistics exceed the
associated critical values; for r ≤ 6 we cannot reject H0, so we conclude that
r = 6. Note that using the 99% level, we would conclude r = 4 as the trace
statistic of 52.520 associated with r ≤ 4 does not exceed the 99% critical
value of 54.681.
We find a different inference using the maximal eigenvalue statistic. This
statistic allows us to reject r ≤ 0 as well as r ≤ 1 and r ≤ 2 at the 95% level.
We cannot reject r ≤ 3, because the maximal eigenvalue statistic of 30.791
does not exceed the critical value of 33.878 associated with the 95% level.
The printed output is shown below for a single state indicating the pres-
ence of two co-integrating relationships involving the states of Illinois and
Indiana. The estimates for the error correction variables are labeled as such
in the printout. Granger causality tests are printed, and these would form
the basis for valid causality inferences in the case where co-integrating rela-
tionships existed among the variables in the VAR model.
Dependent Variable = wv
R-squared = 0.1975
Rbar-squared = 0.1018
sige = 341.6896
Nobs, Nvars = 170, 19
******************************************************************
Variable Coefficient t-statistic t-probability
il lag1 0.141055 0.261353 0.794176
il lag2 0.234429 0.445400 0.656669
in lag1 1.630666 1.517740 0.131171
in lag2 -1.647557 -1.455714 0.147548
ky lag1 0.378668 1.350430 0.178899
ky lag2 0.176312 0.631297 0.528801
mi lag1 0.053280 0.142198 0.887113
mi lag2 0.273078 0.725186 0.469460
oh lag1 -0.810631 -1.449055 0.149396
oh lag2 0.464429 0.882730 0.378785
pa lag1 -0.597630 -2.158357 0.032480
pa lag2 -0.011435 -0.038014 0.969727
tn lag1 -0.049296 -0.045237 0.963978
tn lag2 0.666889 0.618039 0.537480
wv lag1 -0.004150 -0.033183 0.973572
wv lag2 -0.112727 -0.921061 0.358488
ec term il -2.158992 -1.522859 0.129886
ec term in -2.311267 -1.630267 0.105129
constant 8.312788 0.450423 0.653052
****** Granger Causality Tests *******
Variable F-value Probability
il 0.115699 0.890822
in 2.700028 0.070449
ky 0.725708 0.485662
mi 0.242540 0.784938
oh 1.436085 0.241087
pa 2.042959 0.133213
tn 0.584267 0.558769
wv 1.465858 0.234146
Johansen MLE estimates
NULL: Trace Statistic Crit 90% Crit 95% Crit 99%
r <= 0 il 214.390 153.634 159.529 171.090
r <= 1 in 141.482 120.367 125.618 135.982
r <= 2 ky 90.363 91.109 95.754 104.964
r <= 3 oh 61.555 65.820 69.819 77.820
r <= 4 tn 37.103 44.493 47.855 54.681
r <= 5 wv 21.070 27.067 29.796 35.463
r <= 6 pa 10.605 13.429 15.494 19.935
r <= 7 mi 3.192 2.705 3.841 6.635
NULL: Eigen Statistic Crit 90% Crit 95% Crit 99%
r <= 0 il 72.908 49.285 52.362 58.663
r <= 1 in 51.118 43.295 46.230 52.307
r <= 2 ky 28.808 37.279 40.076 45.866
r <= 3 oh 24.452 31.238 33.878 39.369
r <= 4 tn 16.034 25.124 27.586 32.717
r <= 5 wv 10.465 18.893 21.131 25.865
r <= 6 pa 7.413 12.297 14.264 18.520
r <= 7 mi 3.192 2.705 3.841 6.635
The results indicate that given the two lag model, two co-integrating
relationships were found leading to the inclusion of two error correction
variables in the model. The co-integrating relationships are based on the
trace statistics compared to the critical values at the 95% level. From the
trace statistics in the printed output we see that H0: r ≤ 2 could not be rejected
at the 95% level because the trace statistic of 90.363 is less than the asso-
ciated critical value of 95.754. Keep in mind that the user has the option
of specifying the number of co-integrating relations to be used in the ecm
function as an optional argument. If you wish to work at the 90% level of
significance, we would conclude from the johansen results that r = 4 co-
integrating relationships exist. To estimate an ecm model based on r = 4
we need simply call the ecm function with:
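A sketch of such a call (passing r as the optional argument described above):

y = load('test.dat');                                 % levels data
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
nlag = 2; r = 4;                                      % lags and co-integrating relations
result = ecm(y,nlag,r);
prt(result,vnames);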
βi ∼ N(1, σ²βi)
βj ∼ N(0, σ²βj)                              (5.6)
where βi denotes the coefficients associated with the lagged dependent vari-
able in each equation of the VAR and βj represents any other coefficient.
The prior means for lagged dependent variables are set to unity in belief
that these are important explanatory variables. On the other hand, a prior
mean of zero is assigned to all other coefficients in the equation, βj in (5.6),
indicating that these variables are viewed as less important in the model.
The prior variances, σ²βi, specify uncertainty about the prior means β̄i =
1, and σ²βj indicates uncertainty regarding the means β̄j = 0. Because
the VAR model contains a large number of parameters, Doan, Litterman
and Sims (1984) suggested a formula to generate the standard deviations
as a function of a small number of hyperparameters: θ, φ and a weighting
matrix w(i, j). This approach allows a practitioner to specify individual
prior variances for a large number of coefficients in the model using only a
few parameters that are labeled hyperparameters. The specification of the
standard deviation of the prior imposed on variable j in equation i at lag k
is:
σijk = θ w(i, j) k^(−φ) (σ̂uj / σ̂ui)                (5.7)
where σ̂ui is the estimated standard error from a univariate autoregression
involving variable i, so that (σ̂uj /σ̂ui ) is a scaling factor that adjusts for vary-
ing magnitudes of the variables across equations i and j. Doan, Litterman
and Sims (1984) labeled the parameter θ as ‘overall tightness’, reflecting the
standard deviation of the prior on the first lag of the dependent variable.
The term k −φ is a lag decay function with 0 ≤ φ ≤ 1 reflecting the decay
rate, a shrinkage of the standard deviation with increasing lag length. This
has the effect of imposing the prior means of zero more tightly as the lag
length increases, based on the belief that more distant lags represent less
important variables in the model. The function w(i, j) specifies the tight-
ness of the prior for variable j in equation i relative to the tightness of the
own-lags of variable i in equation i.
The overall tightness and lag decay hyperparameters used in the stan-
dard Minnesota prior have values θ = 0.1, φ = 1.0. The weighting matrix
used is:
       [  1    0.5  . . .  0.5 ]
       [ 0.5    1   . . .  0.5 ]
W  =   [  .     .     .     .  ]                     (5.8)
       [ 0.5   0.5  . . .   1  ]
This weighting matrix imposes β̄i = 1 loosely, because the lagged de-
pendent variable in each equation is felt to be an important variable. The
weighting matrix also imposes the prior mean of zero for coefficients on
other variables in each equation more tightly since the βj coefficients are
associated with variables considered less important in the model.
A function bvar will provide estimates for this model. The function
documentation is:
results(eq).tprob = t-probabilities
results(eq).resid = residuals
results(eq).yhat = predicted values
results(eq).y = actual values
results(eq).sige = e’e/(n-k)
results(eq).rsqr = r-squared
results(eq).rbar = r-squared adjusted
---------------------------------------------------
SEE ALSO: bvarf, var, ecm, rvar, plt_var, prt_var
---------------------------------------------------
The printout shows the hyperparameter values associated with the prior.
It does not provide Granger-causality test results as these are invalid given
the Bayesian prior applied to the model. Results for a single equation of the
mining employment example are shown below.
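Output of this form might be produced by a call along the following lines
(a sketch using the Minnesota prior values θ = 0.1, φ = 1.0 and symmetric
weight 0.5 from the text; bvar is assumed to accept a scalar weight for the
symmetric case):

y = load('test.dat');                          % monthly mining employment
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
nlag = 2;                                      % two lags plus a constant (Nvars = 17)
tight = 0.1; weight = 0.5; decay = 1.0;        % Minnesota prior hyperparameters
result = bvar(y,nlag,tight,weight,decay);
prt(result,vnames);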
Dependent Variable = il
R-squared = 0.9942
Rbar-squared = 0.9936
sige = 12.8634
Nobs, Nvars = 171, 17
******************************************************************
Variable Coefficient t-statistic t-probability
il lag1 1.134855 11.535932 0.000000
il lag2 -0.161258 -1.677089 0.095363
in lag1 0.390429 1.880834 0.061705
in lag2 -0.503872 -2.596937 0.010230
ky lag1 0.049429 0.898347 0.370271
ky lag2 -0.026436 -0.515639 0.606776
mi lag1 -0.037327 -0.497504 0.619476
mi lag2 -0.026391 -0.377058 0.706601
oh lag1 -0.159669 -1.673863 0.095996
oh lag2 0.191425 2.063498 0.040585
pa lag1 0.179610 3.524719 0.000545
pa lag2 -0.122678 -2.520538 0.012639
tn lag1 0.156344 0.773333 0.440399
tn lag2 -0.288358 -1.437796 0.152330
wv lag1 -0.046808 -2.072769 0.039703
wv lag2 0.014753 0.681126 0.496719
constant 9.454700 2.275103 0.024149
There have been a number of attempts to alter the fact that the Minnesota
prior treats all variables in the VAR model except the lagged dependent
variable in an identical fashion. Some of the modifications suggested have
focused entirely on alternative specifications for the prior variance. Usually,
this involves a different (non-symmetric) weight matrix W and a larger value
of 0.2 for the overall tightness hyperparameter θ in place of the value θ = 0.1
used in the Minnesota prior. The larger overall tightness hyperparameter
setting allows for more influence from other variables in the model. For
example, LeSage and Pan (1995) constructed a weight matrix based on
first-order spatial contiguity to emphasize variables from neighboring states
in a multi-state agricultural output forecasting model. LeSage and Magura
(1991) employed interindustry input-output weights to place more emphasis
on related industries in a multi-industry employment forecasting model.
These approaches can be implemented using the bvar function by con-
structing an appropriate weight matrix. For example, the first order conti-
guity structure for the eight states in our mining employment example can
be converted to a set of prior weights by placing values of unity on the main
diagonal of the weight matrix, and in positions that represent contiguous
entities. An example is shown in (5.9), where row 1 of the weight matrix
is associated with the time-series for the state of Illinois. We place a value
of unity on the main diagonal to indicate that autoregressive values from
Illinois are considered important variables. We also place values of one in
columns 2 and 3, reflecting the fact that Indiana (variable 2) and Kentucky
(variable 3) are states that have borders touching Illinois. For other states
result = bvar(y,nlag,tight,w,decay);
prt(result,vnames);
yit = αi + Σ_{j=1}^{n} Cij yjt−1 + uit                (5.12)
This suggests a prior mean for the VAR model coefficients on variables
associated with the first own-lag of important variables equal to 1/ci , where
ci is the number of important variables in each equation i of the model. In
the example shown in (5.13), the prior means for the first own-lag of the
important variables y2t−1 and y3t−1 in the y1t equation of the VAR would
equal 0.5. The prior means for unimportant variables, y1t−1 , y4t−1 and y5t−1
in this equation would be zero.
This prior is quite different from the Minnesota prior in that it may
downweight the lagged dependent variable using a zero prior mean to dis-
count the autoregressive influence of past values of this variable. In contrast,
the Minnesota prior emphasizes a random-walk with drift model that relies
on prior means centered on a model: yit = αi + yit−1 + uit , where the in-
tercept term reflects drift in the random-walk model and is estimated using
a diffuse prior. The random-walk averaging prior is centered on a random-
walk model that averages over important variables in each equation of the
model and allows for drift as well. As in the case of the Minnesota prior,
the drift parameters αi are estimated using a diffuse prior.
Consistent with the Minnesota prior, LeSage and Krivelyova use zero
as a prior mean for coefficients on all lags other than first lags. Litterman
(1986) motivates reliance on zero prior means for many of the parameters
of the VAR model by appealing to ridge regression. Recall, ridge regression
can be interpreted as a Bayesian model that specifies prior means of zero
It should be noted that the prior relies on inappropriate zero prior means
for the important variables at lags greater than one for two reasons. First, it
is difficult to specify a reasonable alternative prior mean for these variables
that would have universal applicability in a large number of VAR model
applications. The difficulty of assigning meaningful prior means that have
universal appeal is most likely the reason that past studies relied on the
Minnesota prior means while simply altering the prior variances. A prior
mean that averages over previous period values of the important variables
has universal appeal and widespread applicability in VAR modeling. The
second motivation for relying on inappropriate zero prior means for longer
lags of the important variables is that overparameterization and collinearity
problems that plague the VAR model are best overcome by relying on a
parsimonious representation. Zero prior means for the majority of the large
number of coefficients in the VAR model are consistent with this goal of par-
simony and have been demonstrated to produce improved forecast accuracy
in a wide variety of applications of the Minnesota prior.
A flexible form with which to state prior standard deviations for variable
j in equation i at lag length k is shown in (5.14).
π(aijk) = N(1/ci, σc),    j ∈ C,  k = 1,           i, j = 1, . . . , n
π(aijk) = N(0, τσc/k),    j ∈ C,  k = 2, . . . , m,  i, j = 1, . . . , n
π(aijk) = N(0, θσc/k),    j ∉ C,  k = 1, . . . , m,  i, j = 1, . . . , n     (5.14)
where:
restriction necessary to ensure that the prior mean of zero is imposed on the
parameters associated with lags greater than one for important variables
loosely, relative to a tight imposition of the prior mean of 1/ci on first own-
lags of important variables. We use θσc /k for lags on unimportant variables
whose prior means are zero, imposing a decrease in the variance as the lag
length increases. The restriction in (5.17) would impose the zero means for
unimportant variables with more confidence than the zero prior means for
important variables.
This mathematical formulation adequately captures all aspects of the
intuitive motivation for the prior variance specification enumerated above.
A quick way to see this is to examine a graphical depiction of the prior mean
and standard deviation for an important versus unimportant variable. An
artificial example was constructed for an important variable in Figure 5.1
and an unimportant variable in Figure 5.2. Figure 5.1 shows the prior
mean along with five upper and lower limits derived from the prior standard
deviations in (5.14). The five standard deviation limits shown in the figure
reflect ± 2 standard deviation limits resulting from alternative settings for
the prior hyperparameter τ ranging from 5 to 9 and a value of σc = 0.25.
Larger values of τ generated the wider upper and lower limits.
The solid line in Figure 5.1 reflects a prior mean of 0.2 for lag 1 indicating
five important variables, and a prior mean of zero for all other lags. The prior
standard deviation at lag 1 is relatively tight producing a small band around
the averaging prior mean for this lag. This imposes the ‘averaging’ prior
belief with a fair amount of certainty. Beginning at lag 2, the prior standard
deviation is increased to reflect relative uncertainty about the new prior
mean of zero for lags greater than unity. Recall, we believe that important
variables at lags greater than unity will exert some influence, making the
prior mean of zero not quite appropriate. Hence, we implement this prior
mean with greater uncertainty.
Figure 5.2 shows an example of the prior means and standard deviations
for an unimportant variable based on σc = 0.25 and five values of θ ranging
from .35 to .75. Again, the larger θ values produce wider upper and lower
limits. The prior for unimportant variables is motivated by the Minnesota
prior that also uses zero prior means and rapid decay with increasing lag
length.
A function rvar implements the random-walk averaging prior and a re-
lated function recm carries out estimation for an EC model based on this
prior. The documentation for the rvar function is shown below, where we
have eliminated information regarding the results structure variable returned
by the function to save space.
[Figure 5.1: prior mean and upper/lower limits (± two prior standard
deviations) for an important variable, plotted against lag lengths 1 to 12]
[Figure 5.2: prior mean and upper/lower limits (± two prior standard
deviations) for an unimportant variable, plotted against lag lengths 1 to 12]
typical values would be: sig = .1-.3, tau = 4-8, theta = .5-1
---------------------------------------------------
NOTES: - estimation is carried out in annualized growth terms because
the prior means rely on common (growth-rate) scaling of variables
hence the need for a freq argument input.
- constant term included automatically
---------------------------------------------------
y1 = XA + ε1 (5.18)
where it is assumed that var(ε1) = σ²I. The stochastic prior restrictions for this
single equation can be written as:
[ m111 ]   [ σ/σ111    0      . . .     0    ] [ a111 ]   [ u111 ]
[ m112 ]   [    0    σ/σ112    0        0    ] [ a112 ]   [ u112 ]
[  ...  ] = [    0       0      .        0    ] [  ...  ] + [  ...  ]      (5.19)
[ mnnk ]   [    0       0      0     σ/σnnk ] [ annk ]   [ unnk ]
where we assume var(u) = σ²I and the σijk take the form shown in (5.7)
for the Minnesota prior, and that set forth in (5.14) for the random-walk
averaging prior model. Similarly, the prior means mijk take the form de-
scribed for the Minnesota and averaging priors. Noting that (5.19) can be
written in the form suggested by Theil-Goldberger:
r = RA + u                                     (5.20)
the estimates for a typical equation are derived using (5.21).
Â = (X′X + R′R)⁻¹(X′y1 + R′r)                  (5.21)
The difference in prior means specified by the Minnesota prior and the
random-walk averaging prior resides in the mijk terms found on the left-
hand-side of the equality in (5.20). The Minnesota prior indicates values:
(σ/σ111 , 0, . . . , 0)0 , where the non-zero value occurs in the position represent-
ing the lagged dependent variable. The averaging prior would have non-zero
values in locations associated with important variables and zeros elsewhere.
Note that you input the variables y in levels form, indicate any of four
data transformations that will be used when estimating the VAR model,
and the function varf will carry out this transformation, produce estimates
and forecasts that are converted back to levels form. This greatly simplifies
the task of producing and comparing forecasts based on alternative data
transformations.
Of course, if you desire a transformation other than the four provided,
such as logs, you can transform the variables y prior to calling the function
and specify ‘transf=0’. In this case, the function does not provide levels
forecasts, but rather forecasts of the logged-levels will be returned. Setting
‘transf=0’, produces estimates based on the data input and returns forecasts
based on this data.
As an example of comparing alternative VAR model forecasts based on
two of the four alternative transformations, consider the program in example
5.12.
% ----- Example 5.12 Forecasting VAR models
y = load('test.dat'); % a test data set containing
% monthly mining employment for
% il,in,ky,mi,oh,pa,tn,wv
dates = cal(1982,1,12); % data covers 1982,1 to 1996,5
nfor = 12; % number of forecast periods
nlag = 6; % number of lags in var-model
begf = ical(1995,1,dates); % beginning forecast period
endf = ical(1995,12,dates); % ending forecast period
% no data transformation example
fcast1 = varf(y,nlag,nfor,begf);
% seasonal differences data transformation example
freq = 12; % set frequency of the data to monthly
fcast2 = varf(y,nlag,nfor,begf,[],freq);
% compute percentage forecast errors
actual = y(begf:endf,:);
error1 = (actual-fcast1)./actual;
error2 = (actual-fcast2)./actual;
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
fdates = cal(1995,1,12);
fprintf(1,'VAR model in levels percentage errors \n');
tsprint(error1*100,fdates,vnames,'%7.2f');
fprintf(1,'VAR - seasonally differenced data percentage errors \n');
tsprint(error2*100,fdates,vnames,'%7.2f');
3-step 0.68 1.22 1.26 1.39 2.34 1.17 1.00 1.16 1.91
4-step 0.93 1.53 1.45 1.46 2.81 1.39 1.25 1.35 2.02
5-step 1.24 1.84 1.63 1.74 3.27 1.55 1.57 1.53 2.10
6-step 1.55 2.22 1.70 2.05 3.41 1.53 1.81 1.64 2.15
7-step 1.84 2.62 1.59 2.24 3.93 1.68 1.99 1.76 2.49
8-step 2.21 3.00 1.56 2.34 4.45 1.82 2.10 1.89 2.87
9-step 2.55 3.30 1.59 2.58 4.69 1.93 2.33 1.99 3.15
10-step 2.89 3.64 1.74 2.65 5.15 2.08 2.51 2.12 3.39
11-step 3.25 3.98 1.86 2.75 5.75 2.29 2.70 2.27 3.70
12-step 3.60 4.36 1.94 2.86 6.01 2.40 2.94 2.23 3.96
The program code stores the individual MAPE forecast errors in a struc-
ture variable using: err(cnt).rm2 = abs((actual-frm2)./actual);, which
will have fields for the errors from all five models. These fields are matrices
of dimension 12 x 12, containing MAPE errors for each of the 12-step-ahead
forecasts for time cnt and for each of the 12 industries. We are not re-
ally interested in these individual results, but present this as an illustration.
As part of the illustration, we show how to access the individual results to
compute the average MAPE errors for each horizon and industry. If you
wished to access industry number 2’s forecast errors based on the model us-
ing r co-integrating relations, for the first experimental forecast period you
would use: err(1).rm(:,2). The results from our experiment are shown
below. These results represent an average over a total of 312 twelve-step-
ahead forecasts. Our simple MATLAB program produced a total of 224,640
forecasts, based on 312 twelve-step-ahead forecasts, for 12 industries, times
5 models!
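A sketch of how the stored errors might be averaged (err from the program
above; the field name follows the err(cnt).rm2 convention shown earlier):

ntimes = 312;                         % number of twelve-step-ahead forecast dates
rm2avg = zeros(12,12);                % 12 forecast horizons by 12 industries
for cnt=1:ntimes
  rm2avg = rm2avg + err(cnt).rm2;
end
rm2avg = rm2avg/ntimes;               % average MAPE by horizon and industry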
Our experiment indicates that using more than the r co-integrating re-
lationships determined by the Johansen likelihood trace statistic degrades
the forecast accuracy. This is clear from the large number of forecast error
ratios greater than unity for the two models based on r + 1 and r + 2 ver-
sus those from the model based on r. On the other hand, using a smaller
number of co-integrating relationships than indicated by the Johansen trace
statistic seems to improve forecast accuracy. In a large number of industries
at many of the twelve forecast horizons, we see comparison ratios less than
unity. Further, the forecast errors associated with r − 2 are superior to those
from r − 1, producing smaller comparison ratios in 9 of the 12 industries.
Chapter 5 Appendix
co-integration library
Chapter 6
Markov Chain Monte Carlo Models
Initialize θ0
Repeat {
y = Xβ + ε                                   (6.2)
ε ∼ N(0, σ²In)
Here r is an m-vector containing the prior means and T is an m × m matrix containing the
prior variances and covariances.
As is well known, when m < k the prior in (6.3) is improper, but it can
be justified as the limiting case of a set of proper priors. For our purposes
it is convenient to express (6.3) in an alternative (equivalent) form based on
a factorization of T⁻¹ into Q′Q = T⁻¹, and q = Qr, leading to (6.4).
For simplicity, we assume the diffuse prior for σ, π₂(σ) ∝ (1/σ), but all
of our results would follow for the case of an informative conjugate gamma
prior for this parameter. Following the usual Bayesian methodology, we
combine the likelihood function for our simple model:
In (6.6), we have used the notation β̂(σ) to convey that the mean of the
posterior, β̂, is conditional on the parameter σ, as is the variance, denoted
by V (σ). This single parameter prevents analytical solution of the Bayesian
regression problem. In order to overcome this problem, Theil and Gold-
berger (1961) observed that conditional on σ, the posterior density for β is
multivariate normal. They proposed that σ² be replaced by an estimated
value, σ̂² = (y − Xβ̂)′(y − Xβ̂)/(n − k), based on least-squares estimates β̂.
The advantage of this solution is that the estimation problem can be solved
using existing least-squares regression software. Their solution produces a
point estimate which we label β̂T G and an associated variance-covariance
estimate, both of which are shown in (6.7). This estimation procedure is
implemented by the function theil discussed in Chapter 4.
2. Compute the mean and variance of β using (6.8) and (6.9) conditional
on the initial value σ⁰.
The above four steps are known as a ‘single pass’ through our (two-step)
Gibbs sampler, where we have replaced the initial arbitrary values of β⁰ and
σ⁰ with new values labeled β¹ and σ¹. We now return to step 1 using the
new values β¹ and σ¹ in place of the initial values β⁰ and σ⁰, and make
another ‘pass’ through the sampler. This produces a new set of values, β²
and σ².
Gelfand and Smith (1990) outline fairly weak conditions under which
continued passes through our Gibbs sampler will produce a distribution of
(βⁱ, σⁱ) values that converges to the joint posterior density in which we are
interested, p(β, σ). Given independent realizations of βⁱ, σⁱ, the strong law
of large numbers suggests we can approximate the expected value of the β, σ
parameters using averages of these sampled values.
To illustrate the Gibbs sampler for our Bayesian regression model, we
generate a regression model data set containing 100 observations and 3
explanatory variables: an intercept term and two uncorrelated explanatory
variables generated from a standard normal distribution. The true values
of β0 for the intercept term and the two slope parameters β1 and β2 were
set to unity. A standard normal error term (mean zero, variance equal to
unity) was used in generating the data.
The prior means for the β parameters were set to unity and the prior
variance used was also unity, indicating a fair amount of uncertainty. The
following MATLAB program implements the Gibbs sampler for this model.
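A minimal sketch along these lines is shown here; it is an illustration rather than the toolbox example program, and norm_rnd(V) is assumed to return a multivariate normal draw with covariance V while chis_rnd(1,n) is assumed to return a single chi-squared(n) draw.
% ----- sketch: Gibbs sampler for the Bayesian regression model
n = 100; k = 3;                       % generate the data set
x = [ones(n,1) randn(n,2)];           % intercept plus two regressors
b = ones(k,1); y = x*b + randn(n,1);  % true betas and sigma equal to unity
r = ones(k,1); T = eye(k);            % prior means and variances of unity
Q = chol(inv(T)); q = Q*r;            % prior expressed as in (6.4)
ndraw = 1100; nomit = 100;            % total draws and burn-in draws
bsave = zeros(ndraw-nomit,k);         % storage for beta draws
ssave = zeros(ndraw-nomit,1);         % storage for sigma draws
sige = 1;                             % arbitrary starting value for sigma^2
xpx = x'*x; xpy = x'*y;
t0 = clock;
for i=1:ndraw
V = inv(xpx/sige + Q'*Q);             % conditional posterior variance of beta
bhat = V*(xpy/sige + Q'*q);           % conditional posterior mean of beta
beta = norm_rnd(V) + bhat;            % multivariate normal draw for beta
e = y - x*beta;                       % draw sigma^2 conditional on beta
sige = (e'*e)/chis_rnd(1,n);
if i > nomit                          % save draws past the burn-in period
bsave(i-nomit,:) = beta'; ssave(i-nomit,1) = sige;
end;
end;
etime(clock,t0)                       % elapsed time in seconds
[mean(bsave)' std(bsave)']            % posterior means and std deviations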
We rely on MATLAB functions norm rnd and chis rnd to provide the
multivariate normal and chi-squared random draws which are part of the
distributions function library discussed in Chapter 9. Note also that we omit
the first 100 draws at start-up to allow the Gibbs sampler to achieve a
steady state before we begin sampling for the parameter distributions.
The results are shown below, where we find that it took only 1.75 seconds
to carry out the 1100 draws and produce a sample of 1000 draws on which
we can base our posterior inferences regarding the parameters β and σ. For
comparison purposes, we produced estimates using the theil function from
the regression function library.
elapsed_time =
1.7516
Gibbs estimates
results with 400 passes and much better results with 10,000 passes. We will
illustrate Tobit censored regression in Chapter 7.
Best et al. (1995) provide a set of Splus functions that implement six
different MCMC convergence diagnostics, some of which have been imple-
mented in a MATLAB function coda. This function provides: autocorre-
lation estimates, Raftery-Lewis (1995) MCMC diagnostics, Geweke (1992)
NSE (numerical standard error) and RNE (relative numerical efficiency)
estimates, and a Geweke chi-squared test on the means from the first 20%
of the sample versus the last 50%. We describe the role of each of these
diagnostic measures using an applied example.
First, we have implemented the Gibbs sampler for the Bayesian regres-
sion model as a MATLAB function ols g that returns a structure variable
containing the draws from the sampler along with other information. Details
regarding this function are presented in Section 6.4.
Some options pertain to using the function to estimate heteroscedastic
linear models, a subject covered in the next section. For our purposes we can
use the function to produce Gibbs samples for the Bayesian homoscedastic
linear model by using a large value of the hyperparameter r. Note that this
function utilizes a structure variable named ‘prior’ to input information to
the function. Here is an example that uses the function to produce Gibbs
draws that should be similar to those illustrated in the previous section
(because we use a large hyperparameter value of r = 100).
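A minimal sketch of such a call might look as follows, using the y and x generated earlier; the argument order follows the ols g documentation reproduced later in this section, and the prior field values are illustrative.
ndraw = 1100; nomit = 100;
prior.beta = ones(3,1);               % prior means of unity, as before
prior.bcov = eye(3);                  % prior variances of unity
prior.rval = 100;                     % large r approximates homoscedastic errors
result = ols_g(y,x,ndraw,nomit,prior);
coda(result.bdraw);                   % convergence diagnostics for the beta draws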
The sampled Gibbs draws for the parameters β are in the results structure
field result.bdraw, which we send down to the coda function to produce
convergence diagnostics. This function uses a MATLAB variable ‘nargout’
to determine if the user has called the function with an output argument. If
so, the function returns a result structure variable that can be printed later
using the prt (or prt gibbs) functions. In the case where the user supplies
no output argument (as in the example code above), the convergence
diagnostics will be printed to the MATLAB command window.
Note that if we wished to analyze convergence for the estimates of the σ
parameters in the model, we could call the function with these as arguments
in addition to the draws for the β parameters, using:
coda([result.bdraw result.sdraw]);
Variable beta2
NSE estimate Mean N.S.E. Chi-sq Prob
i.i.d. 0.980194 0.004015 0.078412
4% taper 0.980483 0.004075 0.090039
8% taper 0.980873 0.003915 0.088567
15% taper 0.982157 0.003365 0.095039
Variable beta3
NSE estimate Mean N.S.E. Chi-sq Prob
i.i.d. 0.940191 0.004233 0.961599
4% taper 0.940155 0.003972 0.957111
8% taper 0.940150 0.003735 0.954198
15% taper 0.940160 0.003178 0.946683
basis for a 95% interval estimate. This information is set using ‘info.q’,
which has a default value of 0.025.
Given our draws for β, raftery dichotomizes the draws using a binary
time-series that is unity if a draw lies at or below the ‘info.q’ quantile and zero otherwise. This binary
chain should be approximately Markovian so standard results for two-state
Markov chains can be used to estimate how long the chain should be run to
achieve the desired accuracy for the chosen quantile ‘info.q’.
The function coda prints out three different estimates from the raftery
function: a thinning ratio, which is a function of the amount of autocorrela-
tion in the draws; the number of draws to use for ‘burn-in’ before beginning
to sample the draws for purposes of posterior inference; and the total number
of draws needed to achieve the accuracy goals.
Some terminology will help in understanding the raftery output. It is
always a good idea to discard a number of initial draws referred to as “burn-
in” draws for the sampler. Starting from arbitrary parameter values makes
it unlikely that initial draws come from the stationary distribution needed
to construct posterior estimates. Another practice followed by researchers
involves saving only every third, fifth, tenth, etc. draw, since the draws from
a Markov chain are not independent. This practice is labeled “thinning”
the chain. Neither thinning nor burn-in is mandatory in Gibbs sampling,
and both tend to reduce the effective number of draws on which posterior
estimates are based.
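Either practice is easy to carry out on the draws returned by the sampler; for example, keeping every fifth draw amounts to something like the following line (result.bdraw as described above):
bthin = result.bdraw(1:5:end,:);      % thin the chain by keeping every 5th draw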
From the coda output, we see that the thinning estimate provided by
raftery in the second column is 1, which is consistent with the lack of
autocorrelation in the sequence of draws. The third column reports that
only 2 draws are required for burn-in, which is quite small. Of course, we
started our sampler using the default least-squares estimates provided by
ols g which should be close to the true values of unity used to generate the
regression data set. In the fourth column, we find the total number of draws
needed to achieve the desired accuracy for each parameter. This is given as
3869 for β1, β2 and β3, which exceeds the 1,000 draws we used, so it would
be advisable to run the sampler again using this larger number of draws.
On the other hand, a call to the function raftery with the desired accu-
racy (‘info.r’) set to 0.01, so that nominal reporting based on a 95% interval
using the 0.025 and 0.975 quantile points should result in actual posterior
values that lie between 0.95 and 0.97, produces the results shown below.
These indicate that our 1,000 draws would be adequate to produce this
desired level of accuracy for the posterior.
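A sketch of how such options might be supplied is shown below; only the field names follow the text, and the exact position of the options structure in the coda argument list is an assumption.
info.q = 0.025;                       % posterior quantile of interest (the default)
info.r = 0.01;                        % desired accuracy for that quantile
coda(result.bdraw,info);              % diagnostics based on these settings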
The Nmin reported in the fifth column represents the number of draws
that would be needed if the draws represented an iid chain, which is virtually
true in our case. Finally, the I−statistic is the ratio of the fourth to the fifth
column. Raftery and Lewis indicate that values exceeding 5 for this statistic
are indicative of convergence problems with the sampler.
an equilibrium state, we would expect the means from these two splits of
the sample to be roughly equal. A Z−test of the hypothesis of equality of
these two means is carried out and the chi-squared marginal significance is
reported. For our illustrative example, the second β parameter does not
fare well on these tests. We cannot reject the hypothesis of equal means at
the 95% level of significance, but we can at the 90% level. Increasing the
number of draws to 4,100 (suggested by the Raftery and Lewis diagnostics)
and discarding the first 100 for burn-in produced the following results for
the chi-squared test of equality of the means from the first 20% versus the
last 50% of the 4,000 draws.
Variable beta2
NSE estimate Mean N.S.E. Chi-sq Prob
i.i.d. 0.982636 0.001962 0.596623
4% taper 0.982807 0.001856 0.612251
8% taper 0.982956 0.001668 0.623582
15% taper 0.982959 0.001695 0.630431
Variable beta3
NSE estimate Mean N.S.E. Chi-sq Prob
i.i.d. 0.942324 0.002137 0.907969
4% taper 0.942323 0.001813 0.891450
8% taper 0.942284 0.001823 0.885642
15% taper 0.942288 0.001936 0.892751
Here we see that the means are equal, indicating no problems with con-
vergence. The coda function allows the user to specify the proportions of
the sample used to carry out this test as ‘info.p1’ and ‘info.p2’ in the struc-
ture variable used to input user-options to coda. The default values based
on the first 20% of the sample versus the last 50% are values used by the
Splus version of CODA.
The chi-squared tests are implemented by a call inside coda to a MAT-
LAB function apm. This function allows one to produce posterior moment
estimates that represent an average over two sets of draws. This function
can be used without invoking coda and would be useful in cases where one
wishes to combine the draws from two separate runs of the sampler, as in
the code fragment shown below.
randn(’seed’,30301);
result2 = ols_g(y,x,prior,ndraw2,nomit,start2);
gres2 = momentg(result2.bdraw);
result = apm(gres1,gres2);
prt(result)
Geweke Chi-squared test for each parameter chain
First 33 % versus Last 67 % of the sample
Variable variable 1
NSE estimate Mean N.S.E. Equality chi sq
i.i.d. 1.04442600 0.00285852 0.6886650
4% taper 1.04414401 0.00303774 0.6864577
8% taper 1.04428137 0.00303411 0.6934830
15% taper 1.04455890 0.00267794 0.6867602
Variable variable 2
NSE estimate Mean N.S.E. Equality chi sq
i.i.d. 0.97690447 0.00271544 0.9589017
4% taper 0.97686937 0.00233080 0.9581417
8% taper 0.97684701 0.00199957 0.9586626
15% taper 0.97683040 0.00172392 0.9614142
Variable variable 3
NSE estimate Mean N.S.E. Equality chi sq
i.i.d. 0.93683408 0.00298336 0.7947394
4% taper 0.93733319 0.00260842 0.7662486
8% taper 0.93728792 0.00243342 0.7458718
15% taper 0.93728776 0.00227986 0.7293916
y = Xβ + ε        (6.14)
ε ∼ N(0, σ²V),   V = diag(v1, v2, . . . , vn)
β ∼ N(c, T)
σ ∼ (1/σ)
r/vi ∼ ID χ²(r)/r
r ∼ Γ(m, k)
small values of r, we can see the impact of the prior distribution assumed
for vi by considering that the mean of the prior is r/(r − 2) and the mode
of the prior equals r/(r + 2). Small values of the hyperparameter r allow the
vi to take on a skewed form where the mean and mode are quite different.
This is illustrated in Figure 6.1 where distributions for vi associated with
various values of the parameter r are presented.
Figure 6.1: Prior distributions for vi associated with r = 2, 5 and 20 (prior probability density plotted against vi values)
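These mean and mode expressions can be checked, and the figure reproduced approximately, with a short calculation; the sketch below assumes the prior r/vi ∼ χ²(r), so that vi = r/χ²(r) with the scaled inverse chi-squared density shown in the comment.
v = (0.01:0.01:3)';                   % grid of vi values
hold on;
for r = [2 5 20]
% density of vi = r/chi2(r): (r/2)^(r/2)/gamma(r/2) * v^(-(r/2+1)) * exp(-r/(2v))
f = ((r/2)^(r/2)/gamma(r/2))*v.^(-(r/2+1)).*exp(-r./(2*v));
plot(v,f);
end;
hold off;
xlabel('Vi values'); ylabel('prior probability density');
legend('r=2','r=5','r=20');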
[ ∑ᵢ₌₁ⁿ (eᵢ²/vᵢ)/σ² ] | (β, V) ∼ χ²(n)        (6.16)
This result parallels our simple case from section 6.2 where we adjust the ei
using the relative variance terms vi .
Geweke (1993) shows that the posterior distribution of V conditional on
(β, σ) is proportional to:
1. Begin with arbitrary values for the parameters β⁰, σ⁰, vi⁰ and r⁰, which
we designate with the superscript 0.
5. Using β¹ and σ¹, calculate expression (6.17) and use the value along
with an n−vector of random χ²(r + 1) draws to determine vi, i =
1, . . . , n.
to handle draws for the vi parameters and the Γ(m, k) draw to update the
hyperparameter r. To demonstrate MATLAB code for this model, we gener-
ate a heteroscedastic data set by multiplying normally distributed constant
variance disturbances for the last 50 observations in a sample of 100 by the
square root of a time trend variable. Prior means for β were set equal to
the true values of unity and prior variances equal to unity as well. A diffuse
prior for σ was employed and the Γ(m, k) prior for r was set to m = 8, k = 2
indicating a prior belief in heteroscedasticity.
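A minimal sketch of this experiment follows; it is an illustration rather than the book's example program, and the ols g argument order follows the documentation shown below.
n = 100; k = 3;
x = [ones(n,1) randn(n,2)]; b = ones(k,1);
e = randn(n,1);                       % constant variance disturbances
tt = (1:n)';                          % time trend variable
e(51:n,1) = e(51:n,1).*sqrt(tt(51:n,1));  % heteroscedastic last 50 observations
y = x*b + e;
ndraw = 2100; nomit = 100;
prior.beta = ones(k,1);               % prior means equal to the true values
prior.bcov = eye(k);                  % prior variances of unity
prior.m = 8; prior.k = 2;             % Gamma(m,k) prior for r
result = ols_g(y,x,ndraw,nomit,prior);
prt(result);                          % printed via the prt_gibbs function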
The program makes use of the fact that V⁻¹ is a diagonal matrix, so we
transform the vector y by multiplying it by √(V⁻¹) and we use the Gauss
function matmul to carry out the same transformation on the matrix X.
Using this transformation saves memory and speeds up the
Gibbs sampler. The function gamm rnd from the distributions function
library (see Chapter 9) is used to produce the random Γ(m, k) draws and
chis rnd is capable of generating a vector of random χ2 draws.
The results based on 1,100, 2,100 and 10,100 draws with the first 100
omitted for burn-in are shown below along with the Theil-Goldberger es-
timates. We can see that convergence is not a problem as the means and
standard deviations from the sample of 2,100 and 10,100 draws are quite
close. The time required to carry out the draws on a Macintosh G3 266
MHz computer is also reported for each set of draws. The heteroscedastic
linear model estimates are slightly more accurate than the Theil-Goldberger
estimates, presumably because the latter do not take the heteroscedastic na-
ture of the disturbances into account. Note that both estimation procedures
are based on the same prior information.
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
variable 1 1.543398 1.429473 0.156081
variable 2 1.104141 0.996164 0.321649
variable 3 1.025292 0.953436 0.342739
A plot of the mean over the 2000 samples of vi estimates is shown in Fig-
ure 6.2. These estimates appear to capture the nature of the heteroscedastic
disturbances introduced in the model for observations 51 to 100.
Figure 6.2: Mean of the vi estimates plotted against observations
y = X B + E, E = N(0,sige*V),
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
B = N(c,T), sige = gamma(nu,d0)
---------------------------------------------------
USAGE: results = ols_g(y,x,ndraw,nomit,prior,start)
where: y = dependent variable vector
x = independent variables matrix of rank(k)
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
prior = a structure for prior information input:
prior.beta, prior means for beta, c above
prior.bcov, prior beta covariance , T above
prior.rval, r prior hyperparameter, default=4
prior.m, informative Gamma(m,k) prior on r
prior.k, informative Gamma(m,k) prior on r
prior.nu, informative Gamma(nu,d0) prior on sige
prior.d0 informative Gamma(nu,d0) prior on sige
default for above: nu=0,d0=0 (diffuse prior)
examine the sampling process. Note that the format of the draws returned in
the structure is such that mean(results.bdraw) or std(results.bdraw)
will produce posterior means and standard deviations.
A corresponding function prt gibbs is used to process the results struc-
ture returned by ols g and print output in the form of a regression as shown
below:
(1 − φ1 L − φ2 L² − . . . − φm Lᵐ)yt = c + εt        (6.18)
where we wish to impose the restriction that the mth order difference equa-
tion is stable. This requires that the roots of
(1 − φ1 z − φ2 z² − . . . − φm zᵐ) = 0        (6.19)
lie outside the unit circle. Restrictions such as this, as well as non-linear
restrictions, can be imposed on the parameters during Gibbs sampling by
simply rejecting values that do not meet the restrictions (see Gelfand, Hills,
Racine-Poon and Smith, 1990). Below is a function ar g that implements
a Gibbs sampler for this model and imposes the stability restrictions using
rejection sampling. Information regarding the results structure is not shown
to save space, but the function returns a structure variable containing draws
and other information in a format similar to the ols g function.
The function allows for informative or diffuse priors on the noise vari-
ance σε2 and allows for a homoscedastic or heteroscedastic implementation
using the chi-squared prior described for the Bayesian heteroscedastic lin-
ear model. Processing this input information from the user is accomplished
using the MATLAB ‘fieldnames’ command and ‘strcmp’ statements, with
defaults provided by the function for cases where the user inputs no infor-
mation for a particular field.
One point to note is the rejection sampling code, where we use the MAT-
LAB function roots to examine the roots of the polynomial for stability. We
also rely on a MATLAB function fliplr that ‘flips’ a vector or matrix from
‘left to right’. This is needed because of the format assumed for the poly-
nomial coefficients by the roots function. If the stability condition is not
met, we simply carry out another multivariate random draw to obtain a new
vector of coefficients and check for stability again. This process continues
until we obtain a coefficient draw that meets the stability conditions. One
could view this as discarding (or rejecting) draws where the coefficients are
inconsistent with stability.
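A sketch of what such a rejection step might look like is shown below; the variable names are illustrative rather than those of the ar g code, and norm_rnd(V) is assumed to return a draw from N(0,V).
accept = 0;
while accept == 0
bdraw = bhat + norm_rnd(V);           % candidate multivariate normal draw
phi = bdraw(2:end,1);                 % AR coefficients, excluding the constant
coef = [-fliplr(phi') 1];             % coefficients of 1 - phi1*z - ... - phim*z^m
if min(abs(roots(coef))) > 1          % stability: all roots outside the unit circle
accept = 1;                           % keep this draw
end;                                  % otherwise, draw again
end;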
Use of rejection sampling can produce problems if the acceptance rate
becomes excessively low. Consider that if we need a sample of 1000 draws
to obtain posterior inferences and the acceptance rate is 1 of 100 draws,
we would need to make 100,000 multivariate random draws for the autore-
gressive parameters to obtain a usable sample of 1000 draws on which to
base our posterior inferences. To help the user monitor problems that
might arise due to low acceptance rates, we calculate the acceptance rate
and return this information in the results structure. Of course, the function
prt gibbs was modified to produce a printout of the results structure from
the ar g estimation procedure.
In addition to returning the time required in seconds, the function places
a ‘waitbar’ on the screen that shows graphically the progress of the Gibbs
sampling. This is accomplished with the MATLAB function waitbar().
The following program generates data based on an AR(2) model which
is on the edge of the stability conditions: the AR(1) and AR(2) coefficients
sum to unity, whereas the stability restriction for this model requires that
these two coefficients sum to less than unity. The demonstration program
compares estimates from ols, ar g and theil.
% ----- Example 6.7 Using the ar_g() function
n = 200; k = 3; e = randn(n,1)*2; y = zeros(n,1);
for i=3:n
y(i,1) = 1 + y(i-1,1)*0.25 + y(i-2,1)*0.75 + e(i,1);
end;
x = [ones(n,1) mlag(y,2)];
yt = trimr(y,100,0); xt = trimr(x,100,0); % omit first 100 for startup
vnames = strvcat(’y-variable’,’constant’,’ylag1’,’ylag2’);
res1 = ols(yt,xt);
prt(res1,vnames);
ndraw = 1100; nomit = 100;
bmean = zeros(k,1); bcov = eye(k)*100;
prior.bcov = bcov; % diffuse prior variance
prior.beta = bmean; % prior means of zero
res2 = ar_g(yt,2,ndraw,nomit,prior);
prt(res2,’y-variable’);
res3 = theil(yt,xt,bmean,eye(k),bcov);
prt(res3,vnames);
The results from running the program are shown below. Running the
program 100 times produced results where the stability conditions were vi-
olated 10 percent of the time by the least-squares and Theil-Goldberger
estimates. Figure 6.3 shows a histogram of the distribution of φ1 + φ2 for
the 100 sets of estimates, where the 10 violations of stability for least-squares
and Theil-Goldberger appear as the two bars farthest to the right. The mean
acceptance rate for the Gibbs sampler over the 100 runs was 0.78, and the
median was 0.86.
***************************************************************
Variable Prior Mean Std Deviation
constant 0.000000 10.000000
ylag1 0.000000 10.000000
ylag2 0.000000 10.000000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
constant -0.672706 -0.296170 0.767733
ylag1 0.350736 2.355845 0.020493
ylag2 0.666585 4.391659 0.000029
Figure 6.3: Distribution of φ1 + φ2 over the 100 runs for the ols, gibbs and theil estimates (frequency of outcomes plotted against φ1 + φ2)
y = ρW y + ε        (6.20)
ε ∼ N(0, σ²In)
        0     0.5   0.5   0     0
        0.5   0     0.5   0     0
W  =    0.33  0.33  0     0.33  0                    (6.22)
        0     0     0.5   0     0.5
        0     0     0     1     0
The parameter ρ is a coefficient on the spatially lagged dependent vari-
able W y that reflects the influence of this explanatory variable on variation
in the dependent variable y. The model is called a first order spatial autore-
gression because it represents a spatial analogy to the first order autoregres-
sive model from time series analysis, yt = ρyt−1 + εt . Multiplying a vector
y containing cross-sectional data observations for the 5 areas by the standardized
spatial contiguity matrix W produces an explanatory variable equal to the
mean of observations from contiguous states.
Using diffuse priors, π(ρ) and π(σ) for the parameters (ρ, σ) shown in
(6.23),
which can be combined with the likelihood for this model, we arrive at a
joint posterior distribution for the parameters, p(ρ, σ|y).
p(ρ, σ|y) ∝ |In − ρW| σ^−(n+1) exp{ −(y − ρW y)′(y − ρW y)/(2σ²) }        (6.24)
If we treat ρ as known, the kernel for the conditional posterior for σ
given ρ takes the form:
p(σ|ρ, y) ∝ σ^−(n+1) exp{ −ε′ε/(2σ²) }        (6.25)
where ε = y − ρW y. It is important to note that by conditioning on ρ
(treating it as known) we can subsume the determinant, |In − ρW|, as part
of the constant of proportionality, leaving us with one of the standard dis-
tributional forms. From (6.25) we conclude that ε′ε/σ² ∼ χ²(n).
Unfortunately, the conditional distribution of ρ given σ takes the follow-
ing non-standard form:
Figure: the determinant |In − ρW| plotted against ρ values
2. We adjust the parameter c used in the Metropolis step after the initial
‘nomit’ passes through the sampler based on a two standard deviation
measure of the ρ values sampled up to this point.
3. Since ‘rho2’ is the candidate value that might become our updated
value of ρ depending on the outcome of the Metropolis step, we carry
out rejection sampling on this draw to ensure that sampled values will
meet the restriction. In the event that ‘rho2’ does not meet the re-
striction, we discard it and draw another value. This process continues
until we obtain a value for ‘rho2’ that meets the restriction (a sketch
of this step appears after this list).
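A minimal sketch of a Metropolis step of this kind follows; the tuning constant c, the bounds rmin and rmax, and the names sige and W are illustrative, and the far g code itself differs in its details.
rho2 = rho + c*randn(1,1);            % candidate from a random-walk proposal
while (rho2 < rmin) | (rho2 > rmax)   % rejection sampling to enforce the
rho2 = rho + c*randn(1,1);            % bounds on rho
end;
e0 = y - rho*W*y; e1 = y - rho2*W*y;  % evaluate the log conditional posterior
lp0 = log(det(eye(n) - rho*W)) - (e0'*e0)/(2*sige);
lp1 = log(det(eye(n) - rho2*W)) - (e1'*e1)/(2*sige);
if log(rand(1,1)) <= min(0,lp1-lp0)   % Metropolis acceptance rule
rho = rho2; acc = acc + 1;            % accept the candidate and count it
end;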
This estimator has been incorporated in a function far g that allows the
user to input an informative prior for the spatial autocorrelation parameter
ρ.
We carried out an experiment to illustrate Metropolis sampling on the
conditional distribution of ρ in the first-order spatial autoregressive model.
A series of ten models were generated using values of ρ ranging from -0.9
to 0.9 in increments of 0.2. (The bounds on ρ for this standardized spatial
weight matrix were -1.54 and 1.0, so these values of ρ used to generate the
data were within the bounds.) Estimates for all ten models were derived
using a sample of 2,100 draws with the first 100 omitted for “burn-in”. In
addition to producing Gibbs estimates based on a diffuse prior centered on
the true value of ρ with a variance of 10, we also produced maximum
likelihood estimates for the ten models using the maximum likelihood meth-
ods presented in Anselin (1988). Timing results as well as acceptance rates
for the Gibbs sampler are reported in Table 5.1.
From the table we see that the Metropolis algorithm produced estimates
close to the true value of ρ used to generate the data vector y, as did the
maximum likelihood method. The acceptance rates are lower for the value
of ρ = 0.9 because this magnitude is near the upper bound of unity on ρ for
this particular weight matrix.
The input format for these functions relies on a structure variable ‘prior’
for inputting prior hyperparameters. For example, the documentation for
the function bvar g is shown below, where information regarding the results
structure returned by the function was eliminated to save space.
PURPOSE: Gibbs sampling estimates for Bayesian vector
autoregressive model using Minnesota-type prior
y = A(L) Y + X B + E, E = N(0,sige*V),
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
c = R A(L) + U, U = N(0,Z), Minnesota prior
a diffuse prior is used for B associated with deterministic
variables
---------------------------------------------------
USAGE: result = bvar_g(y,nlag,prior,ndraw,nomit,x)
where: y = an (nobs x neqs) matrix of y-vectors
nlag = the lag length
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
prior = a structure variable
prior.tight, Litterman’s tightness hyperparameter
prior.weight, Litterman’s weight (matrix or scalar)
prior.decay, Litterman’s lag decay = lag^(-decay)
prior.rval, r prior hyperparameter, default=4
prior.m, informative Gamma(m,k) prior on r
prior.k, informative Gamma(m,k) prior on r
x = an optional (nobs x nx) matrix of variables
NOTE: constant vector automatically included
---------------------------------------------------
The output from these functions can be printed using the wrapper func-
tion prt, which calls a function prt varg that does the actual work of
printing the output results.
In addition, Gibbs forecasting functions bvarf g, rvarf g, becmf g and
recmf g, analogous to the forecasting functions with corresponding names
described in Chapter 5, are in the vector autoregressive function library.
Probit and tobit models can be estimated using Gibbs sampling, a sub-
ject covered in the next chapter. Functions probit g and tobit g implement
this method for estimating these models and allow for heteroscedastic errors
using the chi-squared prior described in Section 6.3.
A Bayesian model averaging procedure set forth in Raftery, Madigan
and Hoeting (1997) is implemented in the function bma g. Leamer (1983)
suggested that a Bayesian solution to model specification uncertainty is to
average over all possible models using the posterior probabilities of the mod-
els as weights. Raftery, Madigan and Hoeting (1997) devise a way to imple-
# of draws = 5000
nu,lam,phi = 4.000, 0.250, 3.000
# of models = 20
time(seconds) = 30.8
***************************************************************
Model averaging information
Model v1 v2 v3 v4 v5 v6 Prob Visit
model 1 1 1 0 1 1 1 1.624 479
model 2 1 1 1 0 1 1 33.485 425
model 3 1 1 0 0 1 1 64.823 212
***************************************************************
Variable Coefficient t-statistic t-probability
const 10.069325 222.741085 0.000000
v1 0.947768 25.995923 0.000000
v2 0.996021 26.755817 0.000000
v3 -0.026374 -0.734773 0.463338
v4 0.000018 0.000462 0.999632
v5 4.785116 58.007254 0.000000
v6 1.131978 10.349354 0.000000
From the example, we see that the correct model is found and assigned
a posterior probability of 64.8%. The printout also shows how many times
the MCMC sampler ‘visited’ each particular model. The remaining 4,000
MCMC draws spent time visiting other models with posterior probabilities
less than one percent. All unique models visited are returned in the results
structure variable by the bma g function so you can examine them. It is
often the case that even when the procedure does not find the true model,
the estimates averaged over all models are still quite close to truth. The
true model is frequently among the high posterior probability models.
This function was created from Splus code provided by Raftery, Madigan
and Hoeting available on the internet. There are a few points to note about
using the function bma g. The function assigns a diffuse prior based on data vectors
y and explanatory matrices X that are standardized inside the function so
the default prior should work well in most applied settings. Example 6.8
above relies on these default prior settings, but they can be changed as
an input option. Priors for dichotomous or polychotomous variables are
treated differently from continuous variables, requiring that these two types
of variables be entered separately as in Example 6.9. The documentation
for the function is:
PURPOSE: Bayes model averaging estimates of Raftery, Madigan and Hoeting
-----------------------------------------------------------------
USAGE: result = bma_g(ndraw,y,x1,x2,prior)
or: result = bma_g(ndraw,y,x1)
printed below, we see that 30 of the 503 distinct models exhibited posterior
probabilities exceeding 1 percent.
Certain variables such as student attendance rates (attend), median in-
come (medinc), teachers average salary (tpay), a large city dummy variable
(bcity) and a dummy variable for the northwest region of the state (northw)
appeared in all of the 30 high posterior probability models. Other vari-
ables such as small city dummy variable (scity) as well as regional dummy
variables (southe, northe, northc) never appeared in the high posterior prob-
ability models.
Perhaps most interesting are the increased posterior probabilities asso-
ciated with the class size variable (csize) entering models 21 to 30 and the
welfare (welf) variable entering models 23 to 30. Also of interest is that
expenditures per pupil (expnd) enters and exits the 30 highest posterior
probability models. This may explain why economists disagree about the
impact of resources on student performance. Different model specifications
will find this variable to be significant or insignificant depending on the other
variables entered in the model.
model 15 0 1 1 0 1 1 1 1
model 16 0 1 1 0 0 1 1 1
model 17 0 1 1 1 0 0 1 1
model 18 1 1 1 0 1 1 1 1
model 19 1 1 1 0 0 1 1 1
model 20 0 1 1 0 0 0 1 1
model 21 0 1 1 0 1 1 1 1
model 22 0 1 1 0 0 1 1 1
model 23 1 1 1 1 1 1 1 1
model 24 1 1 1 1 0 1 1 1
model 25 0 1 1 1 1 1 1 1
model 26 1 1 1 1 1 1 1 1
model 27 1 1 1 1 0 1 1 1
model 28 0 1 1 1 0 1 1 1
model 29 0 1 1 1 1 1 1 1
model 30 0 1 1 1 0 1 1 1
Model bcity scity subur southe northc northe northw Prob Visit
model 1 1 0 1 0 0 0 1 1.415 2
model 2 1 0 1 0 0 0 1 1.466 3
model 3 1 0 1 0 0 0 1 1.570 2
model 4 1 0 1 0 0 0 1 1.620 3
model 5 1 0 1 0 0 0 1 1.622 1
model 6 1 0 1 0 0 0 1 1.643 4
model 7 1 0 1 0 0 0 1 1.794 2
model 8 1 0 1 0 0 0 1 1.900 2
model 9 1 0 1 0 0 0 1 1.945 1
model 10 1 0 1 0 0 0 1 1.971 1
model 11 1 0 1 0 0 0 1 1.997 2
model 12 1 0 1 0 0 0 1 2.061 3
model 13 1 0 1 0 0 0 1 2.201 2
model 14 1 0 1 0 0 0 1 2.338 1
model 15 1 0 1 0 0 0 1 2.358 1
model 16 1 0 1 0 0 0 1 2.519 2
model 17 1 0 1 0 0 0 1 2.606 2
model 18 1 0 1 0 0 0 1 2.757 5
model 19 1 0 1 0 0 0 1 2.938 5
model 20 1 0 1 0 0 0 1 3.119 4
model 21 1 0 1 0 0 0 1 3.179 3
model 22 1 0 1 0 0 0 1 3.329 4
model 23 1 0 1 0 0 0 1 3.781 2
model 24 1 0 1 0 0 0 1 4.247 6
model 25 1 0 1 0 0 0 1 4.571 4
model 26 1 0 1 0 0 0 1 4.612 7
model 27 1 0 1 0 0 0 1 5.083 4
model 28 1 0 1 0 0 0 1 5.147 5
model 29 1 0 1 0 0 0 1 7.420 7
model 30 1 0 1 0 0 0 1 8.172 5
***************************************************************
Variable Coefficient t-statistic t-probability
const -374.606723 -7.995893 0.000000
unemp 0.029118 0.088849 0.929231
nwhite -0.287751 -6.413557 0.000000
medinc 0.000748 5.498948 0.000000
welf -0.113650 -1.286391 0.198795
expnd -0.000138 -0.352608 0.724504
csize -0.287077 -1.475582 0.140572
tpay 0.000478 3.661464 0.000273
attend 4.358154 9.260027 0.000000
bcity 17.116312 3.692185 0.000242
scity -0.000026 -0.000009 0.999993
subur 1.935253 1.543629 0.123197
southe 0.002544 0.001778 0.998582
northc 0.000012 0.000008 0.999994
northe 0.000123 0.000109 0.999913
northw 5.651431 4.328001 0.000018
of draws and writing the output to a file for storage using the mprint
function. Additional sequences of draws can then be generated and placed
in files for storage until a sufficient sample of draws has been accumulated.
Most of the MCMC estimation functions allow the user to input starting
values which can also be useful in this circumstance. A series of draws
based on alternative starting values is another approach that is often used
to diagnose convergence of the sampler.
Chapter 6 Appendix
Chapter 7
The regression function library contains routines for estimating limited de-
pendent variable logit, probit, and tobit models. In addition, there are
Gibbs sampling functions that implement the MCMC methods for estimating
these models proposed by Chib (1992) and Albert and Chib (1993).
These models arise when the dependent variable y in our regression
model takes values 0, 1, 2, . . . representing counts of some event or a coding
system for qualitative outcomes. For example, y = 0 might represent a cod-
ing scheme indicating a lack of labor force participation by an individual in
our sample, and y = 1 denotes participation in the labor force. As another
example where the values taken on by y represent counts, we might have
y = 0, 1, 2, . . . denoting the number of foreign direct investment projects in
a given year for a sample of states in the U.S.
Regression analysis of these data is usually interpreted in the framework
of a probability model that attempts to describe Prob(event i occurs) =
F(X : parameters). If the outcomes represent two possibilities, y = 0, 1, the
model is said to be binary, whereas models with more than two outcomes
are referred to as multinomial or polychotomous.
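For reference, the binary logit and probit specifications used throughout this chapter take the standard textbook forms
Prob(yi = 1) = exp(xi β)/(1 + exp(xi β))        (logit)
Prob(yi = 1) = Φ(xi β)                          (probit)
where Φ denotes the standard normal cumulative distribution function.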
Ordinary least-squares can be used to carry out a regression with a binary
response variable y = 0, 1, but two problems arise. First, the errors are by
construction heteroscedastic, because the actual y = 0, 1 minus the
value Xβ equals either −Xβ or ι − Xβ. Note also that the heteroscedastic errors are a
function of the parameter vector β. The second problem with least-squares
is that the predicted values can take on values outside the (0,1) interval,
Figure 7.1: The logit, normal, and t (7 and 2 degrees of freedom) distributions compared
Section 7.1 presents the logit and probit regression functions from the
regression function library and Section 7.2 takes up Gibbs sampling esti-
mation of these models. In Section 7.3 tobit regression is discussed and
Section 7.4 turns attention to Gibbs sampling this model.
The resulting printouts are shown below. The logit function displays
the usual coefficient estimates, t−statistics and marginal probabilities as
well as measures of fit proposed by McFadden (1984) and Estrella (1998).
The maximized value of the log-likelihood function is reported along with
a log-likelihood ratio test statistic and marginal probability that all slopes
in the model are zero. This is based on a comparison of the maximized
log-likelihood (Lu) versus the log-likelihood for a restricted model (Lr) with
only a constant term.
Solving the logit model for parameter estimates involves non-linear op-
timization of the likelihood function. Fortunately, the likelihood function
takes a form that is globally concave allowing us to use Newton’s method.
This approach would usually converge in just a few iterations unless we
encounter an ill-conditioned data set. (We don’t actually use Newton’s
method, an issue discussed below.)
Since this is our first encounter with optimization, an examination of the
logit function is instructive. The iterative ‘while-loop’ that optimizes the
likelihood function is shown below:
iter = 1;
while (iter < maxit) & (crit > tol)
tmp = (i+exp(-x*b)); pdf = exp(-x*b)./(tmp.*tmp); cdf = i./(i+exp(-x*b)); % i is a vector of ones defined earlier in the function
tmp = find(cdf <=0); [n1 n2] = size(tmp);
if n1 ~= 0; cdf(tmp) = 0.00001; end;
tmp = find(cdf >= 1); [n1 n2] = size(tmp);
if n1 ~= 0; cdf(tmp) = 0.99999; end;
% gradient vector for logit, see page 883 Green, 1997
term1 = y.*(pdf./cdf); term2 = (i-y).*(pdf./(i-cdf));
for kk=1:k;
tmp1(:,kk) = term1.*x(:,kk); tmp2(:,kk) = term2.*x(:,kk);
end;
g = tmp1-tmp2; gs = (sum(g))’; delta = exp(x*b)./(i+exp(x*b));
% Hessian for logit, see page 883 Green, 1997
H = zeros(k,k);
for ii=1:t;
xp = x(ii,:)’; H = H - delta(ii,1)*(1-delta(ii,1))*(xp*x(ii,:));
end;
db = -inv(H)*gs;
% stepsize determination
s = 2; term1 = 0; term2 = 1;
while term2 > term1
s = s/2;
term1 = lo_like(b+s*db,y,x);
term2 = lo_like(b+s*db/2,y,x);
end;
bn = b + s*db; crit = abs(max(max(db)));
b = bn; iter = iter + 1;
end; % end of while
The MATLAB variables ‘maxit’ and ‘tol’ can be set by the user as an
input option, but there are default values supplied by the function. The
documentation for the function shown below makes this clear.
PURPOSE: computes logistic regression estimates
---------------------------------------------------
USAGE: results = logit(y,x,maxit,tol)
where: y = dependent variable vector (nobs x 1)
x = independent variables matrix (nobs x nvar)
maxit = optional (default=100)
tol = optional convergence (default=1e-6)
---------------------------------------------------
RETURNS: a structure
result.meth = ’logit’
result.beta = bhat
result.tstat = t-stats
result.yhat = yhat
result.resid = residuals
result.sige = e’*e/(n-k)
result.r2mf = McFadden pseudo-R^2
result.rsqr = Estrella R^2
result.lratio = LR-ratio test against intercept model
result.lik = unrestricted Likelihood
result.cnvg = convergence criterion, max(max(-inv(H)*g))
result.iter = # of iterations
result.nobs = nobs
result.nvar = nvars
result.zip = # of 0’s
result.one = # of 1’s
result.y = y data vector
--------------------------------------------------
SEE ALSO: prt(results), probit(), tobit()
---------------------------------------------------
The ‘while-loop’ first evaluates the logistic probability density and cumu-
lative density function at the initial least-squares values set for the param-
eter vector β, returned by the ols function. Values for the cdf outside the
(0,1) bounds are set to 0.00001 and 0.99999 to avoid overflow and underflow
computational problems by the following code.
tmp = find(cdf <=0); [n1 n2] = size(tmp);
if n1 ~= 0; cdf(tmp) = 0.00001; end;
tmp = find(cdf >= 1); [n1 n2] = size(tmp);
if n1 ~= 0; cdf(tmp) = 0.99999; end;
β1 = β0 + λ0 ∆0        (7.5)
Gradient methods of optimization rely on a direction vector ∆ = Wg,
where W is a positive definite matrix and g is the gradient of the function
evaluated at the current value β0 , ∂F(β0)/∂β0 . Newton’s method is based on
a linear Taylor series expansion of the first order conditions: ∂F(β0)/∂β0 =
0, which leads to W = −H⁻¹ and ∆ = −H⁻¹g. Note that Newton’s method
implicitly sets λ0 to unity.
Our algorithm determines a stepsize variable for λ, which need not be
unity, using the following code.
db = -inv(H)*gs;
% stepsize determination
s = 2;
term1 = 0; term2 = 1;
while term2 > term1
s = s/2;
term1 = lo_like(b+s*db,y,x);
term2 = lo_like(b+s*db/2,y,x);
end;
bn = b + s*db; crit = abs(max(max(db)));
b = bn; iter = iter + 1;
end; % end of while
f(zi | ρ, β, σ) ∼ { N(ỹi , σi²), truncated at the left by 0,   if yi = 1
                  { N(ỹi , σi²), truncated at the right by 0,  if yi = 0        (7.6)
Because the probit model is unable to identify both β and σε², we scale our
problem to make σε² equal to unity.
These expressions simply indicate that we can replace values of yi = 1
with the sampled normals truncated at the left by 0 and values of yi = 0
with sampled normals truncated at the right by 0.
As an intuitive view of why this works, consider the following MATLAB
program that generates a vector of latent values z and then converts these
values to zero or one, depending on whether they take on values greater or
less than zero. The program carries out Gibbs sampling using the function
probit g. During the sampling, generated values for the latent variable z
from each pass through the sampler are saved and returned in a structure
variable results.ydraw. The program then plots the mean of these predic-
tions versus the actual vector of latent z values, which is shown in Figure 7.2.
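A minimal sketch of such a program is shown below; it is an illustration rather than the book's example, and the prior field names follow the ols g convention used earlier.
n = 100; k = 4;
x = randn(n,k); beta = ones(k,1);
z = x*beta + randn(n,1);              % latent values
y = zeros(n,1); y(z > 0) = 1;         % observed 0/1 outcomes
ndraw = 1100; nomit = 100;
prior.beta = zeros(k,1);              % diffuse prior means for beta
prior.bcov = eye(k)*10000;            % diffuse prior variance for beta
prior.rval = 100;
results = probit_g(y,x,ndraw,nomit,prior);
zmean = mean(results.ydraw);          % mean of the saved latent draws
plot(1:n,z',1:n,zmean);               % compare actual z to the sampler's draws
legend('actual','draws');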
Figure 7.2: Actual latent y-values versus the mean of the y-values drawn by the sampler, plotted over the 100 observations
The first point to note is that the function allows a prior for the param-
eters β and implements the heteroscedastic model based on t−distributed
errors discussed in Chapter 6. Albert and Chib (1993) point out that tak-
ing this approach allows one to produce a family of models that encompass
both traditional logit and probit models. Recall that the logistic distribu-
tion is somewhat near the t−distribution with seven degrees of freedom and
the normal distribution assumed by probit models can be represented as a
t−distribution with a very large degrees of freedom parameter.
An implication of this is that the user can rely on a large prior hyperpa-
rameter value for r = 100 say, and a diffuse prior for β to produce estimates
close to a traditional probit model. On the other hand, setting r = 7 or even
r = 3 and relying on a diffuse prior for β should produce estimates close to
those from a traditional logit regression. Green (1997) states that the issue
of which distributional form should be used on applied econometric prob-
lems is unresolved. He further indicates that inferences from either logit or
probit models are often the same.
Example 7.3 illustrates these ideas by comparing maximum likelihood
logit estimates to those from Gibbs sampling with a hyperparameter value
for r = 7.
The results shown below indicate that the estimates, t−statistics, and
fit from maximum likelihood logit and Gibbs sampling are quite similar as
they should be.
Log-Likelihood = -18.2430
# of iterations = 10
Convergence criterion = 6.5006301e-11
Nobs, Nvars = 100, 4
# of 0’s, # of 1’s = 51, 49
***************************************************************
Variable Coefficient t-statistic t-probability
variable 1 -1.852287 -3.582471 0.000537
variable 2 -1.759636 -3.304998 0.001336
variable 3 2.024351 3.784205 0.000268
variable 4 1.849547 3.825844 0.000232
Example 7.4 illustrates that we can rely on the same function probit g
to produce probit estimates using a setting for the hyperparameter r of 100,
which results in the normal probability model.
end;
prt(probit(y,x)); % print out maximum likelihood estimates
ndraw = 2100; nomit = 100;
prior.beta = zeros(4,1); % diffuse prior means for beta
prior.bcov = eye(4)*10000;% diffuse prior variance for beta
prior.rval = 100; % probit prior r-value
resp = probit_g(y,x,ndraw,nomit,prior); % Gibbs sampling
prt(resp); % print Gibbs probit results
Again, the results shown below indicate that Gibbs sampling can be used
to produce estimates similar to those from maximum likelihood estimation.
Probit Maximum Likelihood Estimates
McFadden R-squared = 0.5786
Estrella R-squared = 0.6981
LR-ratio, 2*(Lu-Lr) = 80.1866
LR p-value = 0.0000
Log-Likelihood = -29.2014
# of iterations = 8
Convergence criterion = 1.027844e-07
Nobs, Nvars = 100, 4
# of 0’s, # of 1’s = 51, 49
***************************************************************
Variable Coefficient t-statistic t-probability
variable 1 -1.148952 -3.862861 0.000203
variable 2 -1.393497 -4.186982 0.000063
variable 3 1.062447 3.167247 0.002064
variable 4 1.573209 4.353392 0.000034
75. The Gibbs estimates that robustify for the outliers are closer to the true
values of the parameters used to generate the data, as we would expect.
Logit Maximum Likelihood Estimates
McFadden R-squared = 0.6299
Estrella R-squared = 0.7443
LR-ratio, 2*(Lu-Lr) = 86.4123
LR p-value = 0.0000
Log-Likelihood = -25.3868
# of iterations = 9
Convergence criterion = 2.9283713e-07
Nobs, Nvars = 100, 4
# of 0’s, # of 1’s = 44, 56
***************************************************************
Variable Coefficient t-statistic t-probability
variable 1 -1.579212 -3.587537 0.000528
variable 2 -2.488294 -3.743234 0.000310
variable 3 2.633014 4.038189 0.000108
variable 4 1.958486 3.632929 0.000452
Figure: posterior mean of the vi estimates plotted against observations
vi = ((e.*e) + in*rval)./chiv;
V = in./vi;
if mm ~= 0
rval = gamm_rnd(1,1,mm,kk); % update rval
end;
if i > nomit % if we are past burn-in, save the draws
bsave(i-nomit,:) = bhat’;
ymean = ymean + lp;
vmean = vmean + vi;
yhat = yhat + 1-bb;
if mm~= 0
rsave(i-nomit,1) = rval;
end;
end; % end of if i > nomit
waitbar(i/ndraw);
end; % End the sampling
gtime = etime(clock,t0);
close(hwait);
There are a few points to note about the sampling routine. First, we allow the
user to rely on an improper prior for the hyperparameter r, or an infor-
mative Gamma(m,k) prior. In the event of an informative Gamma prior,
the MATLAB variable mm ≠ 0 and we update the hyperparameter r us-
ing a random gamma draw produced by the function gamm rnd based on
the informative prior parameters stored in mm, kk. Note also that we save
the mean of the latent variable draws for y and the mean of the non-constant
variance parameters vi and return these in the result structure variable.
The code that carries out the left-truncated normal random draw is:
% update z
lp=xstar*bhat;                        % conditional means X*beta
bb=.5*(1+erf(-lp/sqrt(2)));           % Phi(-lp), the normal mass below zero
tt=(bb.*(1-yin)+(1-bb).*yin).*rand(n,1)+bb.*yin;  % uniform on (bb,1) if yin=1, on (0,bb) if yin=0
y=sqrt(2)*erfinv(2*tt-1) + lp;        % inverse-cdf transform yields the truncated normal draw
yi* = xi β + εi        (7.7)
yi = 0     if yi* ≤ 0        (7.8)
yi = yi*   if yi* > 0        (7.9)
where the index function produces a set of limit and non-limit observations.
This results in a likelihood function that consists of two parts, one part
corresponding to a classical regression for the non-limit observations and
the other part involving a discrete distribution of the relevant probabilities
for the limiting cases.
The log-likelihood function shown in (7.10) was placed in the MATLAB
function to like which we maximize using the function maxlik from the
optimization function library discussed in Chapter 10.
lnL = Σ_{yi>0} −(1/2)[ln(2π) + ln σ² + (yi − xi β)²/σ²] + Σ_{yi=0} ln[1 − Φ(xi β/σ)]        (7.10)
The documentation for the tobit function that produces maximum like-
lihood estimates for this model is shown below.
% results.resid = residuals
% results.sige = e’*e/(n-k)
% results.rsqr = rsquared
% results.rbar = rbar-squared
% results.lik = Log-Likelihood value
% results.iter = # of iterations taken
% results.grad = gradient at solution
% results.opt = method of optimization used
% results.nobs = nobs
% results.nobsc = # of censored observations
% results.nvar = nvars
% results.y = y data vector
The resulting estimates and the CPU time taken are shown below. We
see that the estimates produced by all three optimization methods are iden-
tical to three decimal places. All optimization algorithms took about the
same time. In addition to reporting estimates, t-statistics and marginal
probabilities, the printing routine for tobit estimation reports the gradient
at the solution, which might provide useful information if the user wishes to
change convergence tolerances.
elapsed_time = 1.1433
Tobit Regression Estimates
Dependent Variable = y
R-squared = 0.8028
Rbar-squared = 0.8008
sigma^2 = 0.8144
Log-Likelihood = -33.80814
# iterations = 13
optimization = bfgs
Nobs, Nvars = 100, 2
# of censored = 81
***************************************************************
gradient at solution
Variable Gradient
x1 0.00214371
x2 -0.00233626
sigma 0.00148077
Variable Coefficient t-statistic t-probability
x1 -2.240217 -4.913982 0.000004
x2 -2.194446 -6.957942 0.000000
elapsed_time = 0.6400
Tobit Regression Estimates
Dependent Variable = y
R-squared = 0.8028
Rbar-squared = 0.8008
sigma^2 = 0.8144
Log-Likelihood = -33.80814
# iterations = 13
optimization = bhhh
Nobs, Nvars = 100, 2
# of censored = 81
***************************************************************
gradient at solution
Variable Gradient
x1 0.00214371
x2 -0.00233626
sigma 0.00148077
Variable Coefficient t-statistic t-probability
x1 -2.240217 -4.913982 0.000004
x2 -2.194446 -6.957942 0.000000
elapsed_time = 0.6229
Tobit Regression Estimates
Dependent Variable = y
R-squared = 0.8028
Rbar-squared = 0.8008
sigma^2 = 0.8144
Log-Likelihood = -33.80814
# iterations = 13
optimization = dfp
Nobs, Nvars = 100, 2
# of censored = 81
***************************************************************
gradient at solution
Variable Gradient
x1 0.00214371
x2 -0.00233626
sigma 0.00148077
Variable Coefficient t-statistic t-probability
x1 -2.240217 -4.913982 0.000004
x2 -2.194446 -6.957942 0.000000
y = X B + E, E = N(0,sige*V),
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
B = N(c,T), sige = gamma(nu,d0)
----------------------------------------------------------------
USAGE: result = tobit_g(y,x,ndraw,nomit,prior,start)
where: y = nobs x 1 dependent variable vector
x = nobs x nvar explanatory variables matrix
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
prior = a structure variable for prior information input
prior.beta, prior means for beta, c above (default=0)
The maximum likelihood and Gibbs sampling estimates are very similar,
as they should be, and would lead to similar inferences. As in the case of
the Gibbs sampling probit model, the value of the Bayesian tobit model lies
in its ability to incorporate subjective prior information and to deal with
cases where outliers or non-constant variance exist in the model.
and points out that maximum likelihood estimates are problematical in this
circumstance. Studies relying on a single dummy variable or one with group-
wise heteroscedasticity indicate that heteroscedasticity presents a serious
problem for maximum likelihood estimation. An approach to solving this
problem is to replace the constant variance term σ with σi , where specifica-
tion of a particular model for σi needs to be made by the investigator. This
of course complicates the task of maximizing the likelihood function.
The Bayesian approach introduced by Geweke (1993) implemented in
the function tobit g eliminates the need to specify the form of the non-
constant variance and accommodates the case of outliers as well as non-
constant variance. In addition, the estimated parameters vi based on the
mean of Gibbs draws (representing the mean of posterior distribution for
these parameters) can be used to draw inferences regarding the nature of
the non-constant variance.
Chapter 7 Appendix
The maximum likelihood logit, mlogit, probit and tobit estimation functions
as well as the Gibbs sampling functions probit g and tobit g discussed in this
chapter are part of the regression function library in subdirectory regress.
The spatial autoregressive versions of the probit and tobit models men-
tioned in the chapter summary are in the spatial econometrics function
library in subdirectory spatial.
Chapter 8
Simultaneous Equation Models
Ct = α + βYt + ε (8.1)
where Ct represents aggregate consumption at time t and Yt is disposable
income at time t.
Because of the national income accounting identity in this simple model:
Yt ≡ Ct + It , we cannot plausibly argue that the variable Yt is fixed in re-
peated samples. The accounting identity implies that every time Ct changes,
so will the variable Yt .
A solution to the ‘simultaneity problem’ is to rely on two-stage least-
squares estimation rather than ordinary least-squares. The function tsls will
Note that the MATLAB function tsls requires that the user supply a
matrix of variables that will be used as instruments during the first-stage
regression. In example 8.1 all exogenous variables in the system of equations
are used as instrumental variables, a necessary condition for two-stage least-
squares to produce consistent estimates.
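As an illustration of the call itself, the y2 equation might be estimated as
in the sketch below. The argument order shown (dependent variable, right-
hand-side endogenous variable, exogenous variables in the equation, and the
matrix of instruments) is an assumption based on the description above.
xall = [iota x1 x2];                  % all exogenous variables in the system
result = tsls(y2,y1,[iota x2],xall);  % two-stage least-squares for the y2 equation
prt(result);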
Least-squares and two-stage least-squares estimates from example 8.1 are
presented below. The least-squares estimates exhibit the simultaneity bias
by producing estimates that are significantly different from the true values
of β = 1, used to generate the dependent variable y2. In contrast, the two-
stage least-squares estimates are much closer to the true values of unity.
The results from the example also illustrate that the simultaneity bias tends
to center on the parameter estimates associated with the constant term and
the right-hand-side endogenous variable y1. The coefficient estimates for
x2 from least-squares and two-stage least-squares are remarkably similar.
Ordinary Least-squares Estimates
Dependent Variable = y2-eqn
R-squared = 0.9144
Rbar-squared = 0.9136
sigma^2 = 0.5186
Durbin-Watson = 2.1629
Nobs, Nvars = 200, 3
***************************************************************
Variable Coefficient t-statistic t-probability
y1 variable 1.554894 42.144421 0.000000
constant 0.462866 7.494077 0.000000
x2 variable 0.937300 18.154273 0.000000
R-squared = 0.7986
Rbar-squared = 0.7966
sigma^2 = 1.2205
Durbin-Watson = 2.0078
Nobs, Nvars = 200, 3
***************************************************************
Variable Coefficient t-statistic t-probability
y1 variable 0.952439 10.968835 0.000000
constant 1.031016 9.100701 0.000000
x2 variable 0.937037 11.830380 0.000000
After error checking on the input arguments, the function simply forms
the matrices needed to carry out two-stage least-squares estimation of the
parameters (e.g., Green, 1997). Given parameter estimates, the usual sum-
mary statistics measuring fit and dispersion of the parameters are calculated
and added to the results structure variable that is returned by the function.
Of course, a corresponding set of code to carry out printing the results was
added to the prt reg function, which is called by the wrapper function prt.
As a final example that clearly demonstrates the nature of inconsistent
estimates, consider example 8.2 where a Monte Carlo experiment is carried
out to compare least-squares and two-stage least-squares over 100 runs. In
the program code for example 8.2, we rely on the utility function mprint
described in Chapter 3, to produce a tabular print-out of our results with
row and column labels. Another point to note regarding the code is that we
dimension our matrices to store the coefficient estimates as (100,3), where we
have 3 parameter estimates and 100 Monte Carlo runs. This facilitates using
the MATLAB functions mean and std to compute means and standard
deviations of the resulting estimates. Recall these functions work down the
columns of matrices to compute averages and standard deviations of each
column in the input matrix.
% ----- Example 8.2 Monte Carlo study of ols() vs. tsls()
nobs = 200;
x1 = randn(nobs,1); x2 = randn(nobs,1);
b1 = 1.0; b2 = 1.0; iota = ones(nobs,1);
y1 = zeros(nobs,1); y2 = zeros(nobs,1);
evec = randn(nobs,1);
% create simultaneously determined variables y1,y2
for i=1:nobs;
y1(i,1) = iota(i,1) + x1(i,1)*b1 + evec(i,1);
y2(i,1) = iota(i,1) + y1(i,1) + x2(i,1)*b2 + evec(i,1);
end;
% use all exogenous in the system as instruments
xall = [iota x1 x2];
niter = 100; % number of Monte Carlo loops
bols = zeros(niter,3); % storage for ols results
b2sls = zeros(niter,3); % storage for 2sls results
disp(’patience -- doing 100 2sls regressions’);
for iter=1:niter; % do Monte Carlo looping
y1 = zeros(nobs,1); y2 = zeros(nobs,1); evec = randn(nobs,1);
% create simultaneously determined variables y1,y2
for i=1:nobs;
y1(i,1) = iota(i,1)*1.0 + x1(i,1)*b1 + evec(i,1);
y2(i,1) = iota(i,1)*1.0 + y1(i,1)*1.0 + x2(i,1)*b2 + evec(i,1);
end;
result1 = ols(y2,[y1 iota x2]); % do ols regression
indicates the equation structure of the entire system as well as all of the
associated variable vectors. This problem was solved using MATLAB struc-
ture variables. An additional challenge is that the results structure must
return results for all equations in the system, which we also solve using a
MATLAB structure variable, as in the case of vector autoregressive models
discussed in Chapter 5.
The documentation for the thsls function is:
PURPOSE: computes Three-Stage Least-squares Regression
for a model with neqs-equations
---------------------------------------------------
USAGE: results = thsls(neqs,y,Y,X)
where:
neqs = # of equations
y = an ’eq’ structure containing dependent variables
e.g. y(1).eq = y1; y(2).eq = y2; y(3).eq = y3;
Y = an ’eq’ structure containing RHS endogenous
e.g. Y(1).eq = []; Y(2).eq = [y1 y3]; Y(3).eq = y2;
X = an ’eq’ structure containing exogenous/lagged endogenous
e.g. X(1).eq = [iota x1 x2];
X(2).eq = [iota x1];
X(3).eq = [iota x1 x2 x3];
---------------------------------------------------
NOTE: X(i), i=1,...,G should include a constant vector
if you want one in the equation
---------------------------------------------------
RETURNS a structure:
result.meth = ’thsls’
result(eq).beta = bhat for each equation
result(eq).tstat = tstat for each equation
result(eq).tprob = tprobs for each equation
result(eq).resid = residuals for each equation
result(eq).yhat = yhats for each equation
result(eq).y = y for each equation
result(eq).rsqr = r-squared for each equation
result(eq).rbar = r-squared adj for each equation
result(eq).nvar = nvar in each equation
result(eq).sige = e’e/nobs for each equation
result(eq).dw = Durbin-Watson
result.nobs = nobs
result.neqs = neqs
result.sigma = sig(i,j) across equations
result.ccor = correlation of residuals across equations
--------------------------------------------------
SEE ALSO: prt, prt_eqs, plt
---------------------------------------------------
3. The function does not check for identification; that is the user's re-
sponsibility.
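As a usage sketch based on the examples embedded in the documentation
above (the variables y1, y2, y3, x1, x2, x3 and iota are assumed to exist), a
three-equation system would be set up and estimated as follows.
neqs = 3;
y(1).eq = y1; y(2).eq = y2; y(3).eq = y3;       % dependent variables
Y(1).eq = []; Y(2).eq = [y1 y3]; Y(3).eq = y2;  % right-hand-side endogenous
X(1).eq = [iota x1 x2];                         % exogenous variables, including
X(2).eq = [iota x1];                            %  iota for an intercept
X(3).eq = [iota x1 x2 x3];
result = thsls(neqs,y,Y,X);
prt(result);                                    % or prt_eqs(result)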
***************************************************************
Variable Coefficient t-statistic t-probability
constant 9.985283 100.821086 0.000000
x1 var 1.101060 9.621838 0.000000
Cross-equation correlations
equation y1-LHS y2-LHS y3-LHS
y1-LHS 1.0000 0.1439 -0.1181
y2-LHS 0.1439 1.0000 0.6731
y3-LHS -0.1181 0.6731 1.0000
There is also a function plt eqs that is called by the wrapper function
plt to produce graphs of the actual versus predicted and residuals for each
equation of the system. The graphs are produced in a ‘for loop’ with a
pause between the graphs for each equation in the system. A virtue of the
tt=1:nobs;
clf;
cnt = 1;
for j=1:neqs;
nvar = results(j).nvar;
subplot(2,1,1), plot(tt,results(j).y,’-’,tt,results(j).yhat,’--’);
if nflag == 1
title([upper(results(1).meth), ’ Act vs. Predicted ’,vnames(cnt,:)]);
else
title([upper(results(1).meth), ’ Act vs. Predicted eq ’,num2str(j)]);
end;
subplot(2,1,2), plot(tt,results(j).resid)
cnt = cnt+nvar+1;
pause;
end;
function resid=olse(y,x)
% PURPOSE: OLS regression returning only residual vector
%---------------------------------------------------
% USAGE: residual = olse(y,x)
% where: y = dependent variable vector (nobs x 1)
% x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: the residual vector
%---------------------------------------------------
if (nargin ~= 2); error(’Wrong # of arguments to olse’); end;
beta = x\y;
resid = y - x*beta;
The documentation for the sur function is shown below with the infor-
mation regarding the results structure returned by the function omitted to
save space.
field ‘crit’ allows input of the convergence criterion. The latter scalar input
represents the change in the sum of the absolute value of the β̂ estimates
from iteration to iteration. When the estimates change by less than the
‘crit’ value from one iteration to the next, convergence has been achieved
and iteration stops.
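A sketch of the two calls used in this situation is shown below. The exact
sur argument list is an assumption, since the full documentation is omitted
here; the sketch assumes an iteration flag plus an information structure
carrying the ‘crit’ field.
info.crit = 0.001;               % convergence criterion on the change in bhat
result  = sur(neqs,y,x,1,info);  % iterated SUR estimates
result2 = sur(neqs,y,x);         % single-step SUR, no iteration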
As an example, consider the Grunfeld investment model, where we have
annual investment time-series covering the period 1935-54 for five firms.
In addition, market value of the firm and desired capital stock serve as
explanatory variables in the model. The explanatory variables represent
market value lagged one year and a constructed desired capital stock variable
that is also lagged one year. See Theil (1971) for an explanation of this
model.
Example 8.4 estimates the Grunfeld model using the sur function with
and without iteration.
vname = strvcat(vname1,vname2,vname3,vname4,vname5);
prt(result,vname); % print results for iteration
prt(result2,vname);% print results for no iteration
The results indicate that iteration does not make a great deal of difference
in the resulting estimates. To conserve on space, we present comparative
results for a two-equation model with General Electric and Westinghouse,
for which results are reported in Theil (1971).
Cross-equation correlations
equation I gen electr I westinghouse
I gen electr 1.0000 0.7730
I westinghouse 0.7730 1.0000
Cross-equation correlations
equation I gen electr I westinghouse
I gen electr 1.0000 0.7650
I westinghouse 0.7650 1.0000
plotting results by embedding the previous code in a for-loop over the equa-
tions in the model.
Chapter 8 Appendix
The tsls, thsls and sur estimation functions discussed in this chapter are
part of the regression function library in the subdirectory regress.
Chapter 9
Distribution functions library
4. F-distribution (fdis)
There are also some related, more specialized functions. These are shown
in the list below:
unif rnd - generates uniform draws between upper and lower limits.
Section 9.1 demonstrates use of the pdf, cdf, inv, rnd statistical distri-
bution functions and the specialized functions are taken up in Section 9.2.
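A short sketch in the spirit of example 9.1, which works with a beta(10,5)
distribution, is shown below. The beta_pdf, beta_cdf, beta_inv and beta_rnd
names follow the pdf, cdf, inv, rnd naming convention assumed here.
n = 1000; a = 10; b = 5;
tst = rand(n,1);                        % evaluation points on (0,1)
pdf = beta_pdf(tst,a,b);                % density values
cdf = beta_cdf(tst,a,b);                % distribution function values
x   = beta_inv(tst,a,b);                % quantile values
draws = beta_rnd(n,a,b);                % random draws
fprintf('mean should       = %16.8f \n',a/(a+b));
fprintf('mean of draws     = %16.8f \n',mean(draws));
fprintf('variance should   = %16.8f \n',(a*b)/((a+b)*(a+b)*(a+b+1)));
fprintf('variance of draws = %16.8f \n',var(draws));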
The results from running the example 9.1 program are displayed below
and the plots produced by the program are shown in Figure 9.1.
mean should = 0.66666667
mean of draws = 0.66721498
variance should = 0.01388889
variance of draws = 0.01426892
Figure 9.1: pdf, cdf and quantiles of the beta(10,5) distribution
Specifically, Σ⁻¹ ∼ Wm(S⁻¹(θ), n − m + 1), where S(θ) = Σⁿᵢ₌₁ εᵢⱼ ε′ᵢⱼ (Box and
Tiao, 1992).
An example from Tanner (1991) provides an applied illustration of how
this distribution arises in the context of Gibbs sampling. Data on the growth
of 30 young rats measured over a span of five weeks were analyzed in Gelfand,
Hills, Racine-Poon and Smith (1990). Assuming a linear model for the indi-
vidual rats' growth curves:
(α0 , β0 ) ∼ N (γ, C)
Σ⁻¹ ∼ W ((ρR)⁻¹ , ρ)
p(σ² | . . .) = IG{(n + ν₀)/2, (1/2)[Σᵢ (Yᵢ − Xᵢθᵢ)′(Yᵢ − Xᵢθᵢ) + ν₀τ₀]}
Another set of specialized functions, nmlt_rnd and nmrt_rnd, were used
to produce left- and right-truncated normal draws when producing Gibbs
sampling estimates for the probit and tobit models. Example 9.3 shows the
use of these functions and produces a series of three histograms based on
draws from the truncated normal distributions. It should be noted that
one could implement these functions by simply drawing from a normal
distribution and rejecting draws that don't meet the truncation restrictions.
This, however, is
not very efficient and tends to produce a very slow routine. The functions
nmlt rnd and nmrt rnd are based on FORTRAN code that implements
an efficient method described in Geweke (1991).
% ----- Example 9.3 Left- and right-truncated normal draws
n = 1000; x = zeros(n,1);
% generate from -infinity < 0
for i=1:n; x(i,1) = nmrt_rnd(0); end;
subplot(3,1,1), hist(x,20); xlabel(’right-truncated at zero normal’);
% generate from 1 < infinity
for i=1:n; x(i,1) = nmlt_rnd(1); end;
subplot(3,1,2), hist(x,20); xlabel(’left-truncated at +1 normal’);
% generate from -1 < +infinity
for i=1:n; x(i,1) = nmlt_rnd(-1); end;
subplot(3,1,3), hist(x,20); xlabel(’left-truncated at -1 normal’);
Figure: histograms of draws from the right-truncated at zero, left-truncated
at +1, and left-truncated at -1 normal distributions
The timing results from running the program are shown below, indicating
that for the case of truncation at -3, the simple rejection approach took over
80 times as long as the nmrt rnd function to produce 10 draws. Similarly,
for the truncation limit at -2, rejection sampling took 7 times as long as
nmrt rnd. For truncation limits ranging from -1 to 3 rejection sampling
was equal or up to 3 times faster than the nmrt rnd function.
Rejection sampling versus nmrt_rnd function
(time in seconds)
truncation at   nmrt_rnd   rejection
     -3          0.0763     6.2548
     -2          0.0326     0.2305
     -1          0.0340     0.0301
      0          0.0321     0.0125
      1          0.0244     0.0091
      2          0.0248     0.0081
      3          0.0294     0.0092
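A sketch of the comparison just described (not the original program) is shown
below; draws truncated at ‘limit’ are produced first with nmrt_rnd and then
by rejecting standard normal draws that violate the truncation restriction.
limit = -3; ndraws = 10; x = zeros(ndraws,1);
tic;
for i=1:ndraws; x(i,1) = nmrt_rnd(limit); end;
disp('time taken by nmrt_rnd'); toc;
tic;
for i=1:ndraws;
 d = randn(1,1);
 while d > limit, d = randn(1,1); end;  % reject draws above the truncation limit
 x(i,1) = d;
end;
disp('time taken by rejection sampling'); toc;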
Figure: density of a contaminated normal distribution versus the standard
normal distribution
Chapter 9 Appendix
Chapter 10
Optimization functions
library
associated with values like λ = 1/2 or λ = √2. Some of the functional
forms associated with values for λ1 , λ2 are:
ln yi = β0 + ln Xi β + εi ,   λ1 = 0, λ2 = 0   (10.1)
yi = β0 + ln Xi β + εi ,   λ1 = 1, λ2 = 0   (10.2)
yi = β0 + Xi β + εi ,   λ1 = 1, λ2 = 1   (10.3)
L(λ|y, X) = const + (λ − 1) Σⁿᵢ₌₁ ln yi − (n/2) ln σ̂²(λ)   (10.4)
where:
To maximize this log-likelihood function with respect to the parameter
λ we can use the MATLAB function fmin, which implements a simplex
optimization procedure. This function also allows us to set lower and upper
limits on the parameter λ to which the simplex search will be constrained.
Values of −2 ≤ λ ≤ 2 are often thought to represent a reasonable range of
feasible values for the parameter λ in the Box-Cox model.
Our first task in using fmin is to write a log-likelihood function that
evaluates the concentrated log-likelihood for any value of the parameter λ
and returns a scalar value equal to the negative of the log-likelihood function.
(Minimizing the negative log-likelihood is equivalent to maximizing the log-
likelihood.) This function is shown below:
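A minimal sketch of such a function follows; it may differ in detail from the
toolbox listing, and it assumes that when ‘model=1’ the intercept column in
the first position of x is left untransformed.
function like = box_lik(lam,y,x,model)
% PURPOSE: negative of the concentrated Box-Cox log-likelihood in (10.4)
[n k] = size(x);
ys = boxc_trans(y,lam);                 % transform y
xs = x;
if model == 1                           % transform the non-constant columns of x
 xs(:,2:k) = boxc_trans(x(:,2:k),lam);
end;
bhat = xs\ys;                           % least-squares on the transformed data
e = ys - xs*bhat;
sige = (e'*e)/n;                        % concentrated sigma^2(lambda)
like = (lam-1)*sum(log(y)) - (n/2)*log(sige);
like = -like;                           % negative, since fmin minimizes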
The function relies on another function boxc trans to carry out the
Box-Cox data transformation. It also contains an argument ‘model’ that
allows for a case (‘model=0’) where the y variable alone is transformed and
another case (‘model=1’) where both y and the X variables are transformed.
The function boxc trans is:
function z = boxc_trans(x,lam)
% PURPOSE: compute box-cox transformation
%----------------------------------------------------
% USAGE: bdata = boxc_trans(data,lam)
% where: lam = scalar transformation parameter
% data = matrix nobs x k
%----------------------------------------------------
% RETURNS: bdata = data matrix box-cox transformed
[n k] = size(x); z = zeros(n,k); iota = ones(n,1);
for i=1:k;
if lam ~= 0, z(:,i) = (x(:,i).^lam - iota)/lam;
else, z(:,i) = log(x(:,i)); end;
end;
This call is made by a Box-Cox regression function box cox which al-
lows the user to input a sample data vector y and matrix X as well as lower
and upper limits on the parameter λ, a flag to indicate whether the transfor-
mation should be applied to just y or both the y and X data, and optional
optimization options. The function documentation is:
PURPOSE: box-cox regression using a single scalar transformation
parameter for both y and (optionally) x
-----------------------------------------
USAGE: results = box_cox(y,x,lam_lo,lam_up,model,foptions)
where: y = dependent variable vector
x = explanatory variables matrix
(intercept vector in 1st column --- if desired)
lam_lo = scalar, lower limit for simplex search
lam_up = scalar, upper limit for simplex search
model = 0 for y-transform only
= 1 for both y, and x-transform
foptions = (optional) a 4x1 vector of optimization information
foptions(1) = flag to display intermediate results while working
foptions(2) = convergence for simplex (default = 1e-4)
foptions(3) = convergence for function value (default = 1e-4)
foptions(4) = maximum number of iterations (default = 500)
-----------------------------------------
RETURNS: a structure:
results.meth = ’boxcox’
results.beta = bhat estimates
results.lam = lamda estimate
results.tstat = t-stats for bhat
results.yhat = yhat (box-cox transformed)
results.resid = residuals
results.sige = e’*e/(n-k)
results.rsqr = rsquared
results.rbar = rbar-squared
results.nobs = nobs
results.nvar = nvars
results.y = y data vector (box-cox transformed)
results.iter = # of iterations
results.like = -log likelihood function value
--------------------------------------------------
NOTE: uses MATLAB simplex function fmin
--------------------------------------------------
SEE ALSO: prt(results), plt(results)
---------------------------------------------------
The function box cox is responsible for checking that the input data are
positive (as required by the Box-Cox regression procedure) and providing
default optimization options for the user if they are not input. The function
also constructs estimates of β̂, σ̂ 2 , t−statistics, R−squared, predicted values,
etc., based on the maximum likelihood estimate of the parameter λ. These
values are returned in a structure variable suitable for use with the prt reg
function for printing results and plt reg function for plotting actual vs.
predicted values and residuals.
Example 10.1 demonstrates use of the function box cox. In the example,
two models are generated. One model involves a y-transformation only such
that the estimated value of λ should equal zero. This is accomplished by
transforming a typical regression generated y-variable using the exponential
function, and carrying out the Box-Cox regression on the transformed y =
exp(y). The second model performs no transformation on y or X, so the
estimated value of λ should equal unity.
% --- Example 10.1 Simplex max likelihood estimation for Box-Cox model
% generate box-cox model data
n = 100; k = 2; kp1 = k+1;
x = abs(randn(n,k)) + ones(n,k)*10;
btrue = ones(k,1); epsil = 0.2*randn(n,1); x = [ones(n,1) x];
y = 10*x(:,1) + x(:,2:k+1)*btrue + epsil;
ycheck = find(y > 0); % ensure y-positive
if length(ycheck) ~= n,error(’all y-values must be positive’); end;
yt = exp(y); % should produce lambda = 0 estimate
model = 0; % transform only y-variable
result = box_cox(yt,x,-2,2,model); prt(result);
model = 1; % transform both y,x variables
xcheck = find(x > 0);
if length(xcheck) ~= n*kp1, error(’all x-values must be positive’); end;
yt = y; xt = x; % should produce lambda=1 estimate
result = box_cox(yt,xt,-2,2,model); prt(result); plt(result);
The printed output from the box cox function indicates λ estimates
near zero for the first regression and unity for the second. The parameter
estimates are also reasonably close to the values used to generate the data.
y = ρW y + Xβ + ε (10.6)
This model involves a cross-section of observations made at various points
in space. The parameter ρ is a coefficient on the spatially lagged dependent
variable W y, that reflects the influence of neighboring observations on varia-
tion in individual elements of the dependent variable y. The model is called
a first-order spatial autoregression because it represents a spatial analogy to
the first-order autoregressive model from time series analysis, yt = ρyt−1 + εt .
The matrix W represents a standardized spatial contiguity matrix such that
the row-sums are unity. The product W y produces an explanatory variable
equal to the mean of observations from contiguous areas.
Anselin (1988) provides the theoretical information matrix for this model
along with a concentrated likelihood function that depends on the single
parameter ρ. Using the theoretical information matrix, we can construct a
dispersion estimate and associated t−statistic for the important spatial lag
where λmin and λmax are the minimum and maximum eigenvalues of the
standardized spatial weight matrix W . We can impose this restriction using
fmin as already demonstrated.
Given a maximum likelihood estimate for ρ, we can compute estimates
for β, σ 2 , and construct the theoretical information matrix which we use to
produce a variance-covariance matrix for the estimates. The following code
fragment from the function sar provides an example of this approach.
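A sketch of the approach (not the actual sar code, and omitting the infor-
mation matrix calculations) is shown below; ‘f_sar’ denotes a hypothetical
concentrated log-likelihood function of ρ, and lmin, lmax are the eigenvalue
bounds discussed above.
n = length(y);
options = zeros(1,18);                       % default simplex options
rho = fmin('f_sar',1/lmin,1/lmax,options,y,x,W);
B = speye(n) - rho*sparse(W);                % (I_n - rho*W)
bhat = (x'*x)\(x'*B*y);                      % beta estimate given rho
e = B*y - x*bhat;
sige = (e'*e)/n;                             % sigma^2 estimate given rho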
It should be noted that a similar approach could be taken with the Box-
Cox model to produce an estimate for the dispersion of the transformation
parameter λ. Fomby, Hill and Johnson (1984) provide the theoretical infor-
mation matrix for the Box-Cox model.
Summarizing our discussion of the MATLAB function fmin, we should
not have a great deal of difficulty since single parameter optimization prob-
lems should be easy to solve. The ability to impose restrictions on the
domain of the parameter over which we are optimizing is often useful (or re-
quired) given theoretical knowledge of the likelihood function, or the nature
of the problem being solved. Despite the seeming simplicity of a univariate
likelihood function that depends on a single parameter, a number of prob-
lems in econometrics take this form once other parameters are concentrated
out of the likelihood function. The greatest problem with this approach to
yt = Xβ + ut (10.7)
ut = ρut−1 + ε (10.8)
A regression function ols ar1 performs error checking on the user input,
establishes optimization options for the user, and supplies starting values
based on the Cochrane-Orcutt estimation function olsc. The code to carry
out Cochrane-Orcutt estimation, set up optimization options, and call fmins
looks as follows:
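A sketch of these steps is shown below (not the actual ols_ar1 code);
‘ar1_like’ is a hypothetical likelihood function for the AR(1) error model,
and olsc is assumed to return its ρ estimate in a ‘rho’ field.
res = olsc(y,x);                 % Cochrane-Orcutt estimates for starting values
parm = [res.rho
        res.beta];               % starting values [rho ; beta]
options = zeros(1,18);           % MATLAB foptions-style options vector
options(1) = 0;                  % no intermediate printing
options(2) = 1e-4;               % convergence tolerance on the parameters
parm1 = fmins('ar1_like',parm,options,[],y,x);
rho = parm1(1,1); beta = parm1(2:end,1);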
5. Go to step 1, and replace the initial values with the new values of
βi , σi .
end;
results.rsqr1 = 1 - sigu1/y1s; results.rsqr2 = 1 - sigu2/y2s;
like = log((1-lamda).*f1 + lamda.*f2); results.like = sum(like);
R-squared = 0.9934
sigma^2 = 2.4245
Nobs, Nvars = 100, 3
***************************************
Variable Coefficient t-statistic t-probability
x2_1 4.78698864 19.75899413 0.00000000
x2_2 4.67524699 21.70327247 0.00000000
x2_3 5.26189140 24.83553712 0.00000000
Switching equation
Conv criterion = 0.00097950
# iterations = 53
# obs regime 1 = 51
# obs regime 2 = 49
log Likelihood = -136.7960
Nobs, Nvars = 100, 3
***************************************
Variable Coefficient t-statistic t-probability
x3_1 1.04653177 10.97136877 0.00000000
x3_2 0.96811598 8.97270638 0.00000000
x3_3 1.04817888 10.45553196 0.00000000
The estimated parameters are close to the true values of one and five
for the two regimes as are the parameters of unity used in the switching
equation. Figure 10.1 shows the actual y-values versus predictions. The plot
shows predictions classified into regimes 1 and 2 based on the probabilities
for each regime being greater than 0.5. The results structure returned by
switch_em contains predicted values for all n observations, but the graph
shows only the predicted values based on this classification into the two
regimes.
Summarizing the EM approach to optimization, we require a likelihood
function that can be maximized once some missing data values or parameters
are replaced by their expectations. We then loop through an expectation-
maximization sequence where the expected values of the parameters or miss-
ing sample data are substituted to produce a full-data likelihood function.
The remaining parameters or observations are computed based on maxi-
mizing the likelihood function and these estimates are used to produce new
expectations for the missing sample data or parameters. This process con-
tinues until convergence in the parameter estimates. A number of estimation
problems have been structured in a form amenable to EM optimization. For
example, Shumway and Stoffer (1982) provide an EM algorithm for estimating
time-series state space models, and van Norden and Schaller (1993) provide
an EM algorithm for estimating the Markov transition regime-switching
model of Hamilton (1989). McMillen (1992) sets forth an EM approach to
estimating spatial autoregressive logit/probit models.
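A generic skeleton of the loop just described is shown below; e_step and
m_step are hypothetical functions standing in for the model-specific expec-
tation and maximization calculations.
theta = theta0;                          % starting parameter values
crit = 1e-6; change = 1; iter = 0; maxit = 1000;
while (change > crit) & (iter < maxit)
 expdata = e_step(theta,y);              % E-step: expectations of missing data
 theta_new = m_step(expdata,y);          % M-step: maximize the full-data likelihood
 change = sum(abs(theta_new - theta));   % convergence check on the parameters
 theta = theta_new; iter = iter + 1;
end;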
An interesting point is that for most cases where an EM approach to es-
Figure 10.1: Actual y-values versus predictions (upper panel) and estimated
probabilities for regime 1 and regime 2 (lower panel)
1. The frpr min, dfp min, pow min functions begin by setting the
hessian matrix to the identity and building up the hessian estimate
with updating. This avoids inversion of the hessian matrix which can
become non-positive definite in some problems. The maxlik function
inverts the hessian matrix but relies on the function invpd to force
non-positive definite matrices to be positive definite prior to inver-
sion. (The small eigenvalues of the non-positive definite matrix are
arbitrarily augmented to accomplish this.)
2. The frpr min, dfp min, pow min functions use a function linmin
that can encounter problems, simply print an error message and quit.
function like = to_liked(b,y,x)
% PURPOSE: negative of the tobit log-likelihood; b = [beta ; sigma]
k = length(b)-1;
beta = b(1:k,1); sigma = b(k+1,1);  % extract parameters
h = .000001;                        % small constant to avoid log(0) (value assumed)
xb = x*beta;
llf1 = -(y-xb).^2./(2*sigma) - .5*log(2*pi*sigma);
xbs = xb./sqrt(sigma); cdf = .5*(1+erf(xbs./sqrt(2)));
llf2 = log(h+(1-cdf));
llf = (y > 0).*llf1 + (y <= 0).*llf2;
like = -sum(llf);% scalar result
Now we can create a function to read the data and call these optimization
functions to solve the problem. The documentation for all of the optimiza-
tion functions take a similar form, so they can be used interchangeably when
attempting to solve maximum likelihood problems. The documentation for
dfp min is shown below. There are four required input arguments, the
function name given as a string, a vector of starting parameter values, a
structure variable containing optimization options, and a variable argument
list of optional arguments that will be passed to the function. In this case,
the variable argument list contains the data vector for y and data matrix X.
Other optimization options can be input as fields in the structure variable,
and these differ depending on the optimization function being called. For
example, the dfp min function allows the user to provide optional argu-
ments in the structure fields for the maximum number of iterations, a flag
for printing intermediate results and a convergence tolerance.
Example 10.3 shows a program that calls five of the optimization func-
tions to solve the tobit estimation problem. A comparison of the time
taken by each optimization function, the estimates produced, and the log-
likelihood function values and the hessian matrices evaluated at the solution
are presented.
% --- Example 10.3 Maximum likelihood estimation of the Tobit model
n=200; k=5; randn(’seed’,20201); x = randn(n,k); beta = ones(k,1);
y = x*beta + randn(n,1); % generate uncensored data
% now censor the data
for i=1:n, if y(i,1) < 0, y(i,1) = 0.0; end; end;
% use ols for starting values
res = ols(y,x); b = res.beta; sige = res.sige;
parm = [b
sige]; % starting values
info.maxit = 100;
% solve using frpr_min routine
tic; [parm1,like1,hess1,niter1] = frpr_min(’to_liked’,parm,info,y,x);
disp(’time taken by frpr routine’); toc;
% solve using dfp_min routine
tic; [parm2,like2,hess2,niter2] = dfp_min(’to_liked’,parm,info,y,x);
disp(’time taken by dfp routine’); toc;
% solve using pow_min routine
tic; [parm3,like3,hess3,niter3] = pow_min(’to_liked’,parm,info,y,x);
disp(’time taken by powell routine’); toc;
% solve using maxlik routine
tic; [parm4,like4,hess4,grad,niter4,fail] = maxlik(’to_liked’,parm,info,y,x);
disp(’time taken by maxlik routine’); toc;
% solve using minz routine
infoz.call = ’other’;
infoz.maxit = 500;
infoz.prt = 0;
tic;
[parm5,infoz,stat] = minz(’to_liked’,parm,infoz,y,x);
disp(’time taken by minz routine’); toc;
niter5 = stat.iter;
like5 = stat.f;
hess5 = stat.Hi;
The output from example 10.3 is shown below, where we see that all
five functions found nearly identical results. A primary difference was in
the time needed by the alternative optimization methods.
comparison of # of iterations
fprf dfp powell maxlik minz
8 7 5 10 11
comparison of hessians
fprf hessian
137.15 -6.66 -30.32 -2.11 2.97 -2.50
-6.66 136.03 -17.15 -4.10 -5.99 -2.13
-30.32 -17.15 126.68 -7.98 -10.56 -5.10
-2.11 -4.10 -7.98 126.65 -4.78 -5.57
2.97 -5.99 -10.56 -4.78 145.19 -8.04
-2.50 -2.13 -5.10 -5.57 -8.04 71.06
dfp hessian
137.12 -6.66 -30.31 -2.11 2.97 -2.52
-6.66 136.00 -17.15 -4.10 -5.98 -2.11
-30.31 -17.15 126.65 -7.97 -10.57 -5.14
function [x,f,hessn,gradn,niter]=optsolv(fun,x,info,varargin)
PURPOSE: a modified version of Shor’s r-algorithm to minimize func
---------------------------------------------------
USAGE: [x,f,hess,gradn,niter] = optsolv(func,b,info)
Where: func = likelihood function to minimize (<=36 characters long)
b = parameter vector fed to func
info = a structure with fields:
info.maxit = maximum # of iterations (default = 1000)
info.btol = b tolerance for convergence (default = 1e-7)
info.ftol = func tolerance for convergence (default = 1e-7)
info.pflag = 1 for printing iterations, 0 for no printing
varargin = arguments passed to the function
---------------------------------------------------
RETURNS: x = (kx1) minimizing vector
f = value of func at solution values
hessn = hessian evaluated at the solution values
gradn = gradient evaluated at the solution
niter = # of iterations
---------------------------------------------------
NOTE: - func must take the form func(b,P0,P1,...)
where: b = parameter vector (k x 1)
P0,P1,... = arguments passed to the function
---------------------------------------------------
if ~isstruct(info)
error(’optsolv: options should be in a structure variable’);
end;
% default options
options=[-1,1.e-4,1.e-6,1000,0,1.e-8,2.5,1e-11];
app = 1; % no user supplied gradients
constr = 0; % unconstrained problem
% parse options
fields = fieldnames(info);
nf = length(fields); xcheck = 0; ycheck = 0;
for i=1:nf
if strcmp(fields{i},’maxit’), options(4) = info.maxit;
elseif strcmp(fields{i},’btol’), options(2) = info.btol;
elseif strcmp(fields{i},’ftol’), options(3) = info.ftol;
elseif strcmp(fields{i},’pflag’), options(5) = info.pflag;
end;
end;
funfcn = fcnchk(fun,length(varargin));
We simply feed the user-input options directly into the options vector used
by the function.
which do not supply the y and X data arguments required by our function.
The modified function apprgrdn was written to accept a call with a variable
arguments list.
g=apprgrdn(parm,f,funfcn,deltaparm,varargin{:});
The third step involves changing the likelihood function calls from inside
the apprgrdn function. The calls to the likelihood function in the original
apprgrdn function were of the form:
fi=feval(fun,b);
These were modified with the code shown below. First we modify the func-
tion to rely on the MATLAB fcnchk function to provide a string argument
for the function name. Next, we call the likelihood function using the string
‘funfcn’ containing its name, a vector of parameters ‘x0’, and the variable
argument list ‘varargin{:}’.
funfcn = fcnchk(funfcn,length(varargin));
fi=feval(funfcn,x0,varargin{:});
A fourth step relates to the fact that many optimization routines crafted
for non-econometric purposes do not provide a hessian argument. This can
be remedied with the function hessian from the optimization function li-
brary. It evaluates the hessian at a given parameter vector using a likelihood
function argument. The documentation for hessian is:
Notice that it was designed to work with likelihood functions matching the
format of those in the optimization function library. We can produce a
hessian matrix evaluated at the solution vector with a call to hessian, for
optimization algorithms like solvopt that don’t provide this information.
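For example, given a solution vector (here called parm1) returned by such
a routine, a call of the following form would produce the hessian; the argu-
ment order is assumed to match the likelihood function convention used
throughout this chapter.
hessn = hessian('to_liked',parm1,y,x);   % hessian evaluated at the solution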
An example of using this new optimization routine is shown below, where
we compare the solution times and results with the maxlik function demon-
strated previously.
The results from example 10.4 are shown below, indicating that the
solvopt function might be a valuable addition to the optimization function
library.
time taken by optsolv routine = 7.4620
time taken by maxlik routine = 1.1209
comparison of estimates
optsolv maxlik
1.082 1.082
1.145 1.145
1.168 1.168
1.021 1.021
1.155 1.155
0.867 0.867
comparison of # of iterations
optsolv maxlik
23 10
comparison of hessians
optsolv
145.137 -5.641 -6.786 7.562 -14.985 -7.246
-5.641 125.331 -11.404 -4.000 -16.000 -1.483
-6.786 -11.404 145.101 31.777 -10.786 -0.818
7.562 -4.000 31.777 129.980 -28.289 -1.575
-14.985 -16.000 -10.786 -28.289 152.888 -1.994
-7.246 -1.483 -0.818 -1.575 -1.994 76.805
maxlik
145.145 -5.641 -6.787 7.562 -14.985 -7.248
-5.641 125.339 -11.405 -4.000 -16.001 -1.486
-6.787 -11.405 145.110 31.779 -10.787 -0.815
7.562 -4.000 31.779 129.989 -28.291 -1.575
-14.985 -16.001 -10.787 -28.291 152.898 -1.985
-7.248 -1.486 -0.815 -1.575 -1.985 76.824
Chapter 10 Appendix
dfp_min - Davidon-Fletcher-Powell
frpr_min - Fletcher-Reeves-Polak-Ribiere
maxlik - general all-purpose optimization routine
minz - general purpose optimization routine
pow_min - Powell conjugate gradient
optsolv - yet another general purpose optimization routine
Chapter 11
Handling sparse matrices
0 1 1 0 0
1 0 1 0 0
W = 1 1 0 1 0 (11.1)
0 0 1 0 1
0 0 0 1 0
Information regarding first-order contiguity is recorded for each obser-
vation as ones for areas that are neighbors (e.g., observations 2 and 3 are
neighbors to 1) and zeros for those that are not (e.g., observations 4 and
5 are not neighbors to 1). By convention, zeros are placed on the main
diagonal of the spatial weight matrix.
For the case of our 3,107 county sample, this matrix would be sparse
since the largest number of neighbors to any county is 8 and the average
number of neighbors is 4. A great many of the elements in the contiguity
matrix W are zero, meeting the definition of a sparse matrix.
To understand how sparse matrix algorithms conserve on storage space
and computer memory, consider that we need only record the non-zero ele-
ments of a sparse matrix for storage. Since these represent a small fraction
of the total 3107x3107 = 9,653,449 elements in the weight matrix, we save
a tremendous amount of computer memory. In fact, for our example of the
3,107 U.S. counties, only 12,429 non-zero elements were found in the first-
order spatial contiguity matrix, representing a very small fraction (about
0.4 percent) of the total elements.
MATLAB provides a function sparse that can be used to construct a
large sparse matrix by simply indicating the row and column positions of
non-zero elements and the value of the matrix element for these non-zero
row and column elements. Continuing with our example, we can store the
first-order contiguity matrix in a single data file containing 12,429 rows with
3 columns that take the form:
load ford.dat; % first-order contiguity in (row, column, value) form (file name assumed)
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
clear ford; % clear out the matrix to save RAM memory
W = sparse(ii,jj,ss,3107,3107);
clear ii; clear jj; clear ss; % clear out these vectors to save memory
spy(W);
Figure: spy plots comparing the sparsity structure of W and W*W*W (nz = 1165)
is provided in terms of both floating point operations (flops) and the time
required.
From the results presented below we see that the minimum degree or-
dering accounts for the greatly reduced number of floating point opera-
tions during sparse matrix multiplication. The time is slightly faster for the
straightforward sparse matrix multiplication because the internal ordering
is faster than our explicit ordering.
Figure 11.3: Minimum degree ordering versus unordered Pace and Barry
matrix (nz = 12428 in each case)
rsum = sum(win’);
wout = zeros(n1,n1);
for i=1:n1
wout(i,:) = win(i,:)/rsum(1,i);
end;
This code is inefficient because we need to allocate space for wout, which
could be a very large but sparse matrix containing millions of elements, most
of which are zero. Rather than allocate this memory, we can rely on the find
command to extract only non-zero elements on which the standardization
operates as shown below.
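A sketch of this find-based standardization (not necessarily the toolbox
code) is:
[i1,j1,s1] = find(win);              % row, column and value of non-zero elements
rsum = sum(win,2);                   % sparse-aware row sums
sout = s1./full(rsum(i1));           % divide each non-zero element by its row sum
wout = sparse(i1,j1,sout,n1,n1);     % rebuild the standardized matrix in sparse form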
y = ρW1 y + ε (11.2)
ε ∼ N (0, σ²In )
spatial analogy to the first order autoregressive model from time series anal-
ysis, yt = ρyt−1 + εt , where total reliance is on the past period observations
to explain variation in yt .
It is conventional to standardize the spatial weight matrix W so that
the row sums are unity and to put the vector of sample observations y
in deviations from the means form to eliminate the intercept term from
the model. Anselin (1988) shows that ordinary least-squares estimation of
this model will produce biased and inconsistent estimates. Because of this,
a maximum likelihood approach can be used to find an estimate of the
parameter ρ using the likelihood function shown in (11.3).
L(y|ρ, σ²) = (1/(2πσ²)^(n/2)) |In − ρW | exp{−(1/(2σ²)) (y − ρW y)′(y − ρW y)}   (11.3)
use more execution time. Some experimentation on my part with the vari-
ous options that can be set has led me to believe this is an optimal setting
for this type of model. The command sparse informs MATLAB that the
matrix W is sparse and the command speye creates an identity matrix
in sparse format. We set up an initial matrix based on (In − 0.1W ) from
which we construct a column vector of minimum degree permutations for
this sparse matrix. By executing the lu command with this vector, we man-
age to operate on a sparser set of LU factors than if we operated on the
matrix z = (I − ρW ).
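A sketch of the log-determinant calculation just described is shown below
(illustrative only; the minimum degree ordering function is colmmd in
MATLAB 5 and colamd in later versions).
n = length(y); spW = sparse(W);
p = colmmd(speye(n) - 0.1*spW);       % minimum degree permutation from (I - 0.1W)
z = speye(n) - rho*spW;
[l,u] = lu(z(:,p));                   % LU factors of the column-permuted matrix
detval = sum(log(abs(diag(u))));      % log |I_n - rho*W|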
Given this function to evaluate the log likelihood for very large spatial
weight matrices W , we can now rely on the same fmin simplex optimization
algorithm demonstrated in Chapter 10. Another place where we can rely
on sparse matrix functions is in determining the minimum and maximum
eigenvalues of the matrix W . We will use these values to set the feasible
range for ρ in our call to the simplex search function fmin. The code for
carrying this out is:
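A sketch using the sparse eigs function is shown below; the 'LR' and 'SR'
option strings used here follow older MATLAB versions ('largestreal' and
'smallestreal' in recent releases), and the reciprocals of the resulting eigen-
values provide the feasible range for ρ.
lmax = eigs(sparse(W),1,'LR');     % largest real eigenvalue of W
lmin = eigs(sparse(W),1,'SR');     % smallest real eigenvalue of W
rmin = 1/lmin; rmax = 1/lmax;      % feasible range for rho in the fmin call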
The code that we execute at the outset in our function far g to compute
determinant values over a grid of ρ values is shown below. After finding the
minimum and maximum eigenvalues using our approach from the previous
section, we define a grid based on increments of 0.01 over these values and
evaluate the determinant over this grid.
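A sketch of this grid calculation is shown below (illustrative only; rmin and
rmax denote the bounds implied by the minimum and maximum eigenvalues,
and colmmd would be colamd in later MATLAB versions).
rgrid = (rmin+0.01:0.01:rmax-0.01)';     % grid of rho values in 0.01 increments
detval = zeros(length(rgrid),2);
n = length(y); spW = sparse(W);
p = colmmd(speye(n) - 0.1*spW);          % minimum degree permutation
for i = 1:length(rgrid);
 z = speye(n) - rgrid(i)*spW;
 [l,u] = lu(z(:,p));
 detval(i,1) = sum(log(abs(diag(u))));   % log-determinant value
 detval(i,2) = rgrid(i);                 % associated rho value
end;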
Note that we save the values of the determinant alongside the associated
values of ρ in a 2-column matrix named detval. We will simply pass this
matrix to the conditional distribution function c far which is shown below:
% ---------------------------------------------------
i1 = find(detval(:,2) <= rho + 0.005);
i2 = find(detval(:,2) <= rho - 0.005);
i1 = max(i1); i2 = max(i2);
index = round((i1+i2)/2);
detm = detval(index,1); n = length(y);
z = speye(n) - rho*sparse(W);
if nargin == 5, % diffuse prior
epe = (n/2)*log(y’*z’*z*y);
elseif nargin == 7 % informative prior
epe = (n/2)*log(y’*z’*z*y + (rho-c)^2/T);
end;
cout = -epe -(n/2)*log(sige) + detm;
In the function c far, we find the determinant value that is closest to the
ρ value for which we are evaluating the conditional distribution. This is very
fast in comparison to calculating the determinant. Since we need to carry
out a large number of draws, this approach works better than computing
determinants for every draw.
Another point is that we allow for the case of a normally distributed
informative prior on the parameter ρ in the model, which changes the con-
ditional distribution slightly. The documentation for our function far g is
shown below.
We also present the maximum likelihood results for comparison with the
Gibbs sampling results. If there is no substantial heterogeneity in the distur-
bance, the two sets of estimates should be similar, as we saw in Chapter 6.
From the results, we see that the estimates are similar, suggesting a lack of
heterogeneity that would lead to different estimated values for ρ and σ.
Note that the time needed to produce 1100 draws was around 378 sec-
onds, making this estimation method competitive with the maximum likeli-
hood approach which took around 100 seconds.
Chapter 11 Appendix
References
Dempster, A.P., N.M. Laird and D.B. Rubin. 1977. “Maximum like-
lihood from incomplete data via the EM algorithm,” Journal of the
Royal Statistical Society, Series B, Vol. 39, pp. 1-38.
Dickey, David A., Dennis W. Jansen and Daniel L. Thornton. 1991.
“A primer on cointegration with an application to money and in-
come,” Federal Reserve Bulletin, Federal Reserve Bank of St. Louis,
March/April, pp. 58-78.
Doan, Thomas, Robert. B. Litterman, and Christopher A. Sims. 1984.
“Forecasting and conditional projections using realistic prior distribu-
tions,” Econometric Reviews, Vol. 3, pp. 1-100.
Engle, Robert F. and Clive W.J. Granger. 1987. “Co-integration
and Error Correction: Representation, Estimation and Testing,” Eco-
nometrica, Vol. 55, pp. 251-76.
Estrella, Arturo. 1998. “A new measure of fit for equations with di-
chotomous dependent variables,” Journal of Business & Economic Statis-
tics, Vol. 16, no. 2, pp. 198-205.
Fomby, Thomas B., R. Carter Hill and Stanley R. Johnson. 1984.
Advanced Econometric Methods, (Springer Verlag: New York).
Geman, S., and D. Geman. 1984. “Stochastic relaxation, Gibbs distri-
butions, and the Bayesian restoration of images,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 6, pp. 721-741.
Gelfand, Alan E., and A.F.M Smith. 1990. “Sampling-Based Ap-
proaches to Calculating Marginal Densities”, Journal of the American
Statistical Association, Vol. 85, pp. 398-409.
Gelfand, Alan E., Susan E. Hills, Amy Racine-Poon and Adrian F.M.
Smith. 1990. “Illustration of Bayesian Inference in Normal Data Mod-
els Using Gibbs Sampling”, Journal of the American Statistical Asso-
ciation, Vol. 85, pp. 972-985.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin.
1995. Bayesian Data Analysis, (London: Chapman & Hall).
Geweke, John. 1991. “Efficient Simulation from the Multivariate nor-
mal and Student-t Distributions Subject to Linear Constraints,” in
Computing Science and Statistics: Proceedings of the Twenty-Third
Symposium on the Interface.
Hoerl A.E., R.W. Kennard and K.F. Baldwin, 1975. “Ridge Regres-
sion: Some Simulations,” Communications in Statistics, A, Vol. 4, pp.
105-23.
Raftery, Adrian E., and Steven M. Lewis. 1992a. “How many iter-
ations in the Gibbs sampler?”, in Bayesian Statistics, Vol. 4, J.M.
Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, eds. (Oxford
University Press: Oxford), pp. 763-773.
Raftery, Adrian E., and Steven M. Lewis. 1992b. “One long run
with diagnostics: Implementation strategies for Markov chain Monte
Carlo”, Statistical Science, Vol. 7, pp. 493-497.
Raftery, Adrian E., and Steven M. Lewis. 1995. “The number of it-
erations, convergence diagnostics and generic Metropolis algorithms”,
forthcoming in Practical Markov Chain Monte Carlo, W.R. Gilks, D.J.
Spiegelhalter and S. Richardson, eds. (Chapman and Hall: London).
Raftery, Adrian E., David Madigan and Jennifer A. Hoeting. 1997.
“Bayesian model averaging for linear regression models,” Journal of
the American Statistical Association, Vol. 92, pp. 179-191.
Shoesmith, Gary L. 1992. “Cointegration, Error Correction and Im-
proved Regional VAR Forecasting,” Journal of Forecasting, Vol. 11,
pp. 91-109.
Shoesmith, Gary L. 1995. “Multiple Cointegrating Vectors, Error Cor-
rection, and Litterman's Model,” International Journal of Forecasting,
Vol. 11, pp. 557-567.
Simon, S.D., and J.P. LeSage, 1988a. “The Impact of Collinearity In-
volving the Intercept Term on the Numerical Accuracy of Regression,”
Computational Statistics in Economics and Management Science, Vol.
1 no. 2, pp. 137-152.
Simon, S.D., and J.P. LeSage, 1988b. “Benchmarking Numerical Ac-
curacy of Statistical Algorithms,” Computational Statistics and Data
Analysis, Vol. 7, pp. 197-209.
Sims, Christopher A. 1980. “Macroeconomics and Reality,” Econo-
metrica Vol. 48, pp. 1-48.
Smith, A.F.M. and G.O. Roberts. 1992. “Bayesian Statistics without
Tears: A Sampling-Resampling Perspective,” The American Statisti-
cian, Vol. 46, pp. 84-88.
Spector, L., and M. Mazzeo. 1980. “Probit Analysis and Economic
Education.” Journal of Economic Education, Vol. 11, pp. 37-44.
Tanner, Martin A. 1991. Tools for Statistical Inference, (Springer-
Verlag: New York).
Theil, Henri. 1971. Principles of Econometrics, (John Wiley & Sons:
New York).
Theil, Henri and Arthur S. Goldberger. 1961. “On Pure and Mixed
Statistical Estimation in Economics,” International Economic Review,
Vol. 2, pp. 65-78.
Wampler, R.H., 1980. “Test Procedures and Test Problems for Least-
Squares Algorithms,” Journal of Econometrics, Vol. 12, pp. 3-22.
Appendix: Toolbox functions
C:\matlab\toolbox\econ
co-integration library
dfp_min - Davidon-Fletcher-Powell
frpr_min - Fletcher-Reeves-Polak-Ribiere
maxlik - general all-purpose optimization routine
pow_min - Powell conjugate gradient
solvopt - yet another general purpose optimization routine