
Engle 1982


Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation
Author(s): Robert F. Engle
Reviewed work(s):
Source: Econometrica, Vol. 50, No. 4 (Jul., 1982), pp. 987-1007
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1912773 .
Accessed: 12/10/2012 08:11

Econometrica, Vol. 50, No. 4 (July, 1982)

AUTOREGRESSIVE CONDITIONAL HETEROSCEDASTICITY WITH ESTIMATES OF THE VARIANCE OF UNITED KINGDOM INFLATION¹

BY ROBERT F. ENGLE

Traditional econometric models assume a constant one-period forecast variance. To
generalize this implausible assumption, a new class of stochastic processes called
autoregressive conditional heteroscedastic (ARCH) processes is introduced in this paper. These
are mean zero, serially uncorrelated processes with nonconstant variances conditional on
the past, but constant unconditional variances. For such processes, the recent past gives
information about the one-period forecast variance.
A regression model is then introduced with disturbances following an ARCH process.
Maximum likelihood estimators are described and a simple scoring iteration formulated.
Ordinary least squares maintains its optimality properties in this set-up, but maximum
likelihood is more efficient. The relative efficiency is calculated and can be infinite. To test
whether the disturbances follow an ARCH process, the Lagrange multiplier procedure is
employed. The test is based simply on the autocorrelation of the squared OLS residuals.
This model is used to estimate the means and variances of inflation in the U.K. The
ARCH effect is found to be significant and the estimated variances increase substantially
during the chaotic seventies.

1. INTRODUCTION

IF A RANDOM VARIABLE $y_t$ is drawn from the conditional density function
$f(y_t \mid y_{t-1})$, the forecast of today's value based upon the past information, under
standard assumptions, is simply $E(y_t \mid y_{t-1})$, which depends upon the value of the
conditioning variable $y_{t-1}$. The variance of this one-period forecast is given by
$V(y_t \mid y_{t-1})$. Such an expression recognizes that the conditional forecast variance
depends upon past information and may therefore be a random variable. For
conventional econometric models, however, the conditional variance does not
depend upon $y_{t-1}$. This paper will propose a class of models where the variance
does depend upon the past and will argue for their usefulness in economics.
Estimation methods, tests for the presence of such models, and an empirical
example will be presented.
Consider initially the first-order autoregression

$y_t = \gamma y_{t-1} + \epsilon_t$

where $\epsilon$ is white noise with $V(\epsilon) = \sigma^2$. The conditional mean of $y_t$ is $\gamma y_{t-1}$ while
the unconditional mean is zero. Clearly, the vast improvement in forecasts due to
time-series models stems from the use of the conditional mean. The conditional
variance of $y_t$ is $\sigma^2$ while the unconditional variance is $\sigma^2/(1-\gamma^2)$. For real
processes one might expect better forecast intervals if additional information
from the past were allowed to affect the forecast variance; a more general class
of models seems desirable.

¹This paper was written while the author was visiting the London School of Economics. He
benefited greatly from many stimulating conversations with David Hendry and helpful suggestions
by Denis Sargan and Andrew Harvey. Special thanks are due Frank Srba who carried out the
computations. Further insightful comments are due to Clive Granger, Tom Rothenberg, Edmond
Malinvaud, Jean-Francois Richard, Wayne Fuller, and two anonymous referees. The research was
supported by NSF SOC 78-09476 and The International Centre for Economics and Related
Disciplines. All errors remain the author's responsibility.

The standard approach of heteroscedasticity is to introduce an exogenous
variable $x_t$ which predicts the variance. With a known zero mean, the model
might be

$y_t = \epsilon_t x_{t-1}$

where again $V(\epsilon) = \sigma^2$. The variance of $y_t$ is simply $\sigma^2 x_{t-1}^2$ and, therefore, the
forecast interval depends upon the evolution of an exogenous variable. This
standard solution to the problem seems unsatisfactory, as it requires a specification
of the causes of the changing variance, rather than recognizing that both
conditional means and variances may jointly evolve over time. Perhaps because
of this difficulty, heteroscedasticity corrections are rarely considered in
time-series data.
A model which allows the conditional variance to depend on the past realization
of the series is the bilinear model described by Granger and Andersen [13].
A simple case is

$y_t = \epsilon_t y_{t-1}$.

The conditional variance is now $\sigma^2 y_{t-1}^2$. However, the unconditional variance is
either zero or infinity, which makes this an unattractive formulation, although
slight generalizations avoid this problem.
A preferable model is

$y_t = \epsilon_t h_t^{1/2}$,
$h_t = \alpha_0 + \alpha_1 y_{t-1}^2$,

with $V(\epsilon_t) = 1$. This is an example of what will be called an autoregressive
conditional heteroscedasticity (ARCH) model. It is not exactly a bilinear model,
but is very close to one. Adding the assumption of normality, it can be more
directly expressed in terms of $\psi_t$, the information set available at time $t$. Using
conditional densities,

(1)  $y_t \mid \psi_{t-1} \sim N(0, h_t)$,

(2)  $h_t = \alpha_0 + \alpha_1 y_{t-1}^2$.

The variance function can be expressed more generally as

(3)  $h_t = h(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, \alpha)$

where $p$ is the order of the ARCH process and $\alpha$ is a vector of unknown
parameters.
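
As an illustration of (1) and (2), the following minimal sketch simulates a first-order linear ARCH process. It is not part of the original paper: the function names, the NumPy dependency, the burn-in length, and the parameter values ($\alpha_0 = 1$, $\alpha_1 = 0.5$) are all assumptions chosen for the example. It is meant to show that the simulated series is close to serially uncorrelated while its squares are not.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, T, seed=0, burn=500):
    """Simulate y_t = eps_t * sqrt(h_t) with h_t = alpha0 + alpha1 * y_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + burn)   # V(eps_t) = 1, as in the text
    y = np.zeros(T + burn)
    h = np.zeros(T + burn)
    h[0] = alpha0 / (1.0 - alpha1)        # assumed start: the unconditional variance
    y[0] = eps[0] * np.sqrt(h[0])
    for t in range(1, T + burn):
        h[t] = alpha0 + alpha1 * y[t - 1] ** 2   # equation (2)
        y[t] = eps[t] * np.sqrt(h[t])            # draw from N(0, h_t), equation (1)
    return y[burn:], h[burn:]

def autocorr1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

y, h = simulate_arch1(alpha0=1.0, alpha1=0.5, T=20000)
print("sample variance:", y.var())              # compare with alpha0 / (1 - alpha1) = 2
print("autocorrelation of y:", autocorr1(y))
print("autocorrelation of y^2:", autocorr1(y ** 2))
```

The burn-in is only a convenience to reduce dependence on the starting value; any start with a finite variance would serve.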

The ARCH regression model is obtained by assuming that the mean of $y_t$ is
given as $x_t\beta$, a linear combination of lagged endogenous and exogenous variables
included in the information set $\psi_{t-1}$ with $\beta$ a vector of unknown parameters.
Formally,

      $y_t \mid \psi_{t-1} \sim N(x_t\beta, h_t)$,
(4)   $h_t = h(\epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_{t-p}, \alpha)$,
      $\epsilon_t = y_t - x_t\beta$.

The variance function can be further generalized to include current and lagged
$x$'s as these also enter the information set. The $h$ function then becomes

(5)   $h_t = h(\epsilon_{t-1}, \ldots, \epsilon_{t-p}, x_t, x_{t-1}, \ldots, x_{t-p}, \alpha)$

or simply

      $h_t = h(\psi_{t-1}, \alpha)$.

This generalization will not be treated in this paper, but represents a simple
extension of the results. In particular, if the $h$ function factors into

      $h_t = h_\epsilon(\epsilon_{t-1}, \ldots, \epsilon_{t-p}, \alpha)\, h_x(x_t, \ldots, x_{t-p})$

the two types of heteroscedasticity can be dealt with sequentially by first
correcting for the $x$ component and then fitting the ARCH model on the
transformed data.
The ARCH regression model in (4) has a variety of characteristics which make
it attractive for econometric applications. Econometric forecasters have found
that their ability to predict the future varies from one period to another. McNees
[17, p. 52] suggests that, "the inherent uncertainty or randomness associated with
different forecast periods seems to vary widely over time." He also documents
that, "large and small errors tend to cluster together (in contiguous time
periods)." This analysis immediately suggests the usefulness of the ARCH model
where the underlying forecast variance may change over time and is predicted by
past forecast errors. The results presented by McNees also show some serial
correlation during the episodes of large variance.
A second example is found in monetary theory and the theory of finance. By
the simplest assumptions, portfolios of financial assets are held as functions of
the expected means and variances of the rates of return. Any shifts in asset
demand must be associated with changes in expected means and variances of the
rates of return. If the mean is assumed to follow a standard regression or
time-series model, the variance is immediately constrained to be constant over
time. The use of an exogenous variable to explain changes in variance is usually
not appropriate.
A third interpretation is that the ARCH regression model is an approximation
to a more complex regression which has non-ARCH disturbances. The ARCH
specification might then be picking up the effect of variables omitted from the
estimated model. The existence of an ARCH effect would be interpreted as
evidence of misspecification, either by omitted variables or through structural
change. If this is the case, ARCH may be a better approximation to reality than
making standard assumptions about the disturbances, but trying to find the
omitted variable or determine the nature of the structural change would be even
better.
Empirical work using time-series data frequently adopts ad hoc methods to
measure (and allow) shifts in the variance over time. For example, Klein [15]
obtains estimates of variance by constructing the five-period moving variance
about the ten-period moving mean of annual inflation rates. Others, such as
Khan [14], resort to the notion of "variability" rather than variance, and use the
absolute value of the first difference of the inflation rate. Engle [10] compares
these with the ARCH estimates for U.S. data.

2. THE LIKELIHOOD FUNCTION

Suppose $y_t$ is generated by an ARCH process described in equations (1) and
(3). The properties of this process can easily be determined by repeated application
of the relation $Ex = E(E(x \mid \psi))$. The mean of $y_t$ is zero and all
autocovariances are zero. The unconditional variance is given by $\sigma_t^2 = Ey_t^2 = Eh_t$. For
many functions $h$ and values of $\alpha$, the variance is independent of $t$. Under such
conditions, $y_t$ is covariance stationary; a set of sufficient conditions for this is
derived below.
Although the process defined by (1) and (3) has all observations conditionally
normally distributed, the vector of $y$ is not jointly normally distributed. The joint
density is the product of all the conditional densities and, therefore, the log
likelihood is the sum of the conditional normal log likelihoods corresponding to
(1) and (3). Let $l$ be the average log likelihood and $l_t$ be the log likelihood of the
$t$th observation and $T$ the sample size. Then

(6)   $l = \frac{1}{T}\sum_{t=1}^{T} l_t$,
      $l_t = -\frac{1}{2}\log h_t - \frac{1}{2} y_t^2/h_t$,

apart from some constants in the likelihood.


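To estimate the unknown parameters $\alpha$, this likelihood function can be
maximized. The first-order conditions are

(7)   $\frac{\partial l_t}{\partial\alpha} = \frac{1}{2h_t}\frac{\partial h_t}{\partial\alpha}\left(\frac{y_t^2}{h_t} - 1\right)$

and the Hessian is

(8)   $\frac{\partial^2 l_t}{\partial\alpha\,\partial\alpha'} = -\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\alpha}\frac{\partial h_t}{\partial\alpha'}\left(\frac{y_t^2}{h_t}\right) + \left[\frac{y_t^2}{h_t} - 1\right]\frac{\partial}{\partial\alpha'}\left[\frac{1}{2h_t}\frac{\partial h_t}{\partial\alpha}\right]$.

The conditional expectation of the second term, given $\psi_{t-m-1}$, is zero, and of the
last factor in the first, is just one. Hence, the information matrix, which is simply
the negative expectation of the Hessian averaged over all observations, becomes

(9)   $\mathcal{I}_{\alpha\alpha} = \frac{1}{T}\sum_t E\left[\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\alpha}\frac{\partial h_t}{\partial\alpha'}\right]$

which is consistently estimated by

(10)  $\hat{\mathcal{I}}_{\alpha\alpha} = \frac{1}{T}\sum_t \left[\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\alpha}\frac{\partial h_t}{\partial\alpha'}\right]$.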
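If the $h$ function is $p$th order linear (in the squares), so that it can be written as

(11)  $h_t = \alpha_0 + \alpha_1 y_{t-1}^2 + \cdots + \alpha_p y_{t-p}^2$,

then the information matrix and gradient have a particularly simple form. Let
$z_t = (1, y_{t-1}^2, \ldots, y_{t-p}^2)$ and $\alpha' = (\alpha_0, \alpha_1, \ldots, \alpha_p)$ so that (11) can be rewritten as

(12)  $h_t = z_t\alpha$.

The gradient then becomes simply

(13)  $\frac{\partial l_t}{\partial\alpha} = \frac{1}{2h_t} z_t'\left(\frac{y_t^2}{h_t} - 1\right)$

and the estimate of the information matrix

(14)  $\hat{\mathcal{I}}_{\alpha\alpha} = \frac{1}{2T}\sum_t \left(z_t'z_t/h_t^2\right)$.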
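
The expressions (6) and (12) to (14) translate almost line by line into array code. The sketch below is an illustration only, assuming NumPy; the function names are hypothetical, and the first $p$ observations are simply dropped so that every $z_t$ is fully defined. The analytic gradient is compared with a central finite difference as a sanity check.

```python
import numpy as np

def arch_design(y, p):
    """Rows z_t = (1, y_{t-1}^2, ..., y_{t-p}^2); the first p observations are dropped."""
    y2 = y ** 2
    T = len(y)
    cols = [np.ones(T - p)] + [y2[p - j:T - j] for j in range(1, p + 1)]
    return np.column_stack(cols), y2[p:]

def avg_loglik(alpha, z, y2):
    h = z @ alpha                                            # equation (12)
    return float(np.mean(-0.5 * np.log(h) - 0.5 * y2 / h))   # equation (6)

def gradient(alpha, z, y2):
    h = z @ alpha
    w = (y2 / h - 1.0) / (2.0 * h)
    return (z * w[:, None]).mean(axis=0)                     # average of (13) over t

def information(alpha, z):
    h = z @ alpha
    z_scaled = z / h[:, None]
    return z_scaled.T @ z_scaled / (2.0 * len(h))            # equation (14)

# Illustrative data and parameter values (assumptions, not from the paper).
rng = np.random.default_rng(1)
y = rng.standard_normal(500)
z, y2 = arch_design(y, p=2)
alpha = np.array([0.8, 0.1, 0.1])

g = gradient(alpha, z, y2)
step = 1e-6
g_fd = np.array([
    (avg_loglik(alpha + step * d, z, y2) - avg_loglik(alpha - step * d, z, y2)) / (2 * step)
    for d in np.eye(len(alpha))
])
print("max gradient discrepancy:", np.max(np.abs(g - g_fd)))
print("information matrix:\n", information(alpha, z))
```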

3. DISTRIBUTION OF THE FIRST-ORDER LINEAR ARCH PROCESS

The simplest and often very useful ARCH model is the first-order linear model
given by (1) and (2). A large observation for $y$ will lead to a large variance for the
next period's distribution, but the memory is confined to one period. If $\alpha_1 = 0$, of
course $y$ will be Gaussian white noise and if it is a positive number, successive
observations will be dependent through higher-order moments. As shown below,
if $\alpha_1$ is too large, the variance of the process will be infinite.
To determine the conditions for the process to be stationary and to find the
marginal distribution of the $y$'s, a recursive argument is required. The odd
moments are immediately seen to be zero by symmetry and the even moments
are computed using the following theorem. In all cases it is assumed that the
process begins indefinitely far in the past with $2r$ finite initial moments.

THEOREM 1: For integer $r$, the $2r$th moment of a first-order linear ARCH
process with $\alpha_0 > 0$, $\alpha_1 \geq 0$, exists if, and only if,

      $\alpha_1^r \prod_{j=1}^{r} (2j - 1) < 1$.

A constructive expression for the moments is given in the proof.

PROOF: See Appendix.

The theorem is easily used to find the second and fourth moments of a
first-order process. Letting $w_t = (y_t^4, y_t^2)'$,

      $E(w_t \mid \psi_{t-1}) = \begin{pmatrix} 3\alpha_0^2 \\ \alpha_0 \end{pmatrix} + \begin{pmatrix} 3\alpha_1^2 & 6\alpha_0\alpha_1 \\ 0 & \alpha_1 \end{pmatrix} w_{t-1}$.

The condition for the variance to be finite is simply that $\alpha_1 < 1$, while to have a
finite fourth moment it is also required that $3\alpha_1^2 < 1$. If these conditions are met,
the moments can be computed from (A4) as

(15)  $E(w_t) = \begin{bmatrix} \left[\dfrac{3\alpha_0^2}{(1-\alpha_1)^2}\right]\left[\dfrac{1-\alpha_1^2}{1-3\alpha_1^2}\right] \\ \dfrac{\alpha_0}{1-\alpha_1} \end{bmatrix}$.

The lower element is the unconditional variance, while the upper product gives
the fourth moment. The first expression in square brackets is three times the
squared variance. For $\alpha_1 \neq 0$, the second term is strictly greater than one
implying a fourth moment greater than that of a normal random variable.
The first-order ARCH process generates data with fatter tails than the normal
density. Many statistical procedures have been designed to be robust to large
errors, but to the author's knowledge, none of this literature has made use of the
fact that temporal clustering of outliers can be used to predict their occurrence
and minimize their effects. This is exactly the approach taken by the ARCH
model.
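
The moment condition of Theorem 1 and the closed forms in (15) can be checked numerically. The following sketch uses only the Python standard library; the function names and the parameter values are illustrative assumptions. For $\alpha_0 = 1$, $\alpha_1 = 0.5$ the implied kurtosis is 9, compared with 3 for a normal variate.

```python
from math import prod

def moment_exists(alpha1, r):
    """Theorem 1: the 2r-th moment exists iff alpha1^r * prod_{j=1..r}(2j - 1) < 1."""
    return alpha1 ** r * prod(2 * j - 1 for j in range(1, r + 1)) < 1

def arch1_moments(alpha0, alpha1):
    """Unconditional variance and fourth moment of a first-order ARCH process, from (15)."""
    if not moment_exists(alpha1, 2):
        raise ValueError("fourth moment does not exist: need 3 * alpha1^2 < 1")
    var = alpha0 / (1 - alpha1)
    fourth = 3 * var ** 2 * (1 - alpha1 ** 2) / (1 - 3 * alpha1 ** 2)
    return var, fourth

var, fourth = arch1_moments(1.0, 0.5)
print(var, fourth, fourth / var ** 2)   # 2.0 36.0 9.0
```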

4. GENERAL ARCH PROCESSES

The conditions for a first-order linear ARCH process to have a finite variance
and, therefore, to be covariance stationary can directly be generalized for
pth-order processes.
THEOREM 2: The $p$th-order linear ARCH process, with $\alpha_0 > 0$, $\alpha_1, \ldots, \alpha_p \geq 0$,
is covariance stationary if, and only if, the associated characteristic equation
has all roots outside the unit circle. The stationary variance is given by
$E(y_t^2) = \alpha_0 / \left(1 - \sum_{j=1}^{p} \alpha_j\right)$.

PROOF: See Appendix.
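
A minimal numerical check of Theorem 2 is sketched below, assuming NumPy. The exact form of the characteristic equation is deferred to the Appendix; the sketch assumes it is $1 - \alpha_1 z - \cdots - \alpha_p z^p = 0$, and the function name is hypothetical. With nonnegative coefficients, the roots of this polynomial lie outside the unit circle exactly when $\sum_j \alpha_j < 1$, which matches the denominator of the stationary variance.

```python
import numpy as np

def arch_stationary_variance(alpha0, alphas):
    """Return (is_stationary, variance) for a linear ARCH(p) process.

    Assumes the characteristic equation 1 - a_1 z - ... - a_p z^p = 0.
    """
    alphas = np.asarray(alphas, dtype=float)
    if alpha0 <= 0 or np.any(alphas < 0):
        raise ValueError("need alpha0 > 0 and alpha_j >= 0")
    # np.roots expects coefficients ordered from the highest power down.
    coeffs = np.concatenate((-alphas[::-1], [1.0]))
    roots = np.roots(coeffs)
    stationary = bool(np.all(np.abs(roots) > 1.0))
    variance = alpha0 / (1.0 - alphas.sum()) if stationary else float("inf")
    return stationary, variance

print(arch_stationary_variance(1.0, [0.2, 0.3]))   # stationary case
print(arch_stationary_variance(1.0, [0.7, 0.5]))   # coefficients sum above one
```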

Although the $p$th-order linear model is a convenient specification, it is likely
that other formulations of the variance model may be more appropriate for
particular applications. Two simple alternatives are the exponential and absolute
value forms:

(16)  $h_t = \exp(\alpha_0 + \alpha_1 y_{t-1}^2)$,

(17)  $h_t = \alpha_0 + \alpha_1 |y_{t-1}|$.

These provide an interesting contrast. The exponential form has the advantage
that the variance is positive for all values of $\alpha$, but it is not difficult to show
that data generated from such a model have infinite variance for any value of
$\alpha_1 \neq 0$. The implications of this deserve further study. The absolute value form
requires both parameters to be positive, but can be shown to have finite variance
for any parameter values.
In order to find estimation results which are more general than the linear
model, general conditions on the variance model will be formulated and shown
to be implied for the linear process.
Let $\xi_t$ be a $p \times 1$ random vector drawn from the sample space $\Xi$, which has
elements $\xi_t' = (\xi_{t-1}, \ldots, \xi_{t-p})$. For any $\xi_t$, let $\xi_t^*$ be identical, except that the $m$th
element has been multiplied by $-1$, where $m$ lies between 1 and $p$.

DEFINITION: The ARCH process defined by (1) and (3) is symmetric if

(a) $h(\xi_t) = h(\xi_t^*)$ for any $m$ and $\xi_t \in \Xi$,

(b) $\partial h(\xi_t)/\partial\alpha_i = \partial h(\xi_t^*)/\partial\alpha_i$ for any $m$, $i$ and $\xi_t \in \Xi$,

(c) $\partial h(\xi_t)/\partial\xi_{t-m} = -\partial h(\xi_t^*)/\partial\xi_{t-m}$ for any $m$ and $\xi_t \in \Xi$.

All the functions described have been symmetric. This condition is the main
distinction between mean and variance models.
Another characterization of general ARCH models is in terms of regularity
conditions.

DEFINITION: The ARCH model defined by (1) and (3) is regular if

(a) $\min h(\xi_t) \geq \delta$ for some $\delta > 0$ and $\xi_t \in \Xi$,

(b) $E\left(|\partial h(\xi_t)/\partial\alpha_i|\,|\partial h(\xi_t)/\partial\xi_{t-m}| \mid \psi_{t-m-1}\right)$ exists for all $i$, $m$, $t$.

The first portion of the definition is very important and easy to check, as it
requires the variance always to be positive. This eliminates, for example, the
log-log autoregression. The second portion is difficult to check in some cases, yet
should generally be true if the process is stationary with bounded derivatives,
since conditional expectations are finite if unconditional ones are. Condition (b)
is a sufficient condition for the existence of some expectations of the Hessian
used in Theorem 4. Presumably weaker conditions could be found.

THEOREM 3: The $p$th-order linear ARCH model satisfies the regularity conditions,
if $\alpha_0 > 0$ and $\alpha_1, \ldots, \alpha_p \geq 0$.

PROOF: See Appendix.

In the estimation portion of the paper, a very substantial simplification results
if the ARCH process is symmetric and regular.

5. ARCH REGRESSION MODELS

If the ARCH random variables discussed thus far have a non-zero mean,
which can be expressed as a linear combination of exogenous and lagged
dependent variables, then a regression framework is appropriate, and the model
can be written as in (4) or (5). An alternative interpretation for the model is that
the disturbances in a linear regression follow an ARCH process.
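In the $p$th-order linear case, the specification and likelihood are given by

      $y_t \mid \psi_{t-1} \sim N(x_t\beta, h_t)$,
      $h_t = \alpha_0 + \alpha_1\epsilon_{t-1}^2 + \cdots + \alpha_p\epsilon_{t-p}^2$,
(18)  $\epsilon_t = y_t - x_t\beta$,
      $l = \frac{1}{T}\sum_{t=1}^{T} l_t$,
      $l_t = -\frac{1}{2}\log h_t - \frac{1}{2}\epsilon_t^2/h_t$,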

where $x_t$ may include lagged dependent and exogenous variables and an irrelevant
constant has been omitted from the likelihood. This likelihood function can
be maximized with respect to the unknown parameters $\alpha$ and $\beta$. Attractive
methods for computing such an estimate and its properties are discussed below.
Under the assumptions in (18), the ordinary least squares estimator of $\beta$ is still
consistent as $x$ and $\epsilon$ are uncorrelated through the definition of the regression as
a conditional expectation. If the $x$'s can be treated as fixed constants then the
least squares standard errors will be correct; however, if there are lagged
dependent variables in $x_t$, the standard errors as conventionally computed will
not be consistent, since the squares of the disturbances will be correlated with
squares of the $x$'s. This is an extension of White's [18] argument on
heteroscedasticity and it suggests that using his alternative form for the covariance matrix
would give a consistent estimate of the least-squares standard errors.
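
The contrast between the conventional and White-type covariance estimates can be sketched as follows. This is an illustration under stated assumptions rather than the paper's computation: it uses NumPy, the simplest form of the heteroscedasticity-consistent estimator, $(X'X)^{-1}\left(\sum_t e_t^2 x_t'x_t\right)(X'X)^{-1}$, with no small-sample correction, and an invented example (a first-order autoregression with first-order ARCH disturbances). The function name is hypothetical.

```python
import numpy as np

def ols_with_white_cov(X, y):
    """OLS coefficients, conventional covariance, and White-type covariance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    T, K = X.shape
    conventional = (e @ e / (T - K)) * XtX_inv
    meat = (X * (e ** 2)[:, None]).T @ X      # sum over t of e_t^2 * x_t' x_t
    white = XtX_inv @ meat @ XtX_inv
    return beta, conventional, white

# Invented example: y_t = 0.5 * y_{t-1} + e_t, with ARCH(1) disturbances e_t.
rng = np.random.default_rng(0)
T = 5000
e = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    h = 1.0 + 0.5 * e[t - 1] ** 2
    e[t] = rng.standard_normal() * np.sqrt(h)
    y[t] = 0.5 * y[t - 1] + e[t]

X = np.column_stack([np.ones(T - 1), y[:-1]])   # constant and lagged dependent variable
beta, conventional, white = ols_with_white_cov(X, y[1:])
print("OLS estimate:", beta)
print("conventional s.e.:", np.sqrt(np.diag(conventional)))
print("White-type s.e.:  ", np.sqrt(np.diag(white)))
```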
If the regressors include no lagged dependent variables and the process is
stationary, then letting $y$ and $x$ be the $T \times 1$ and $T \times K$ vector and matrix of
dependent and independent variables, respectively,

(19)  $E(y \mid x) = x\beta$,
      $\mathrm{Var}(y \mid x) = \sigma^2 I$

and the Gauss-Markov assumptions are satisfied. Ordinary least squares is the
best linear unbiased estimator for the model in (18) and the variance estimates
are unbiased and consistent. However, maximum likelihood is different and
consequently asymptotically superior; ordinary least squares does not achieve the
Cramer-Rao bound. The maximum-likelihood estimator is nonlinear and is
more efficient than OLS by an amount calculated in Section 7.
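The maximum likelihood estimator is found by solving the first order conditions.
The derivative with respect to $\beta$ is

(20)  $\frac{\partial l_t}{\partial\beta} = \frac{\epsilon_t x_t'}{h_t} + \frac{1}{2h_t}\frac{\partial h_t}{\partial\beta}\left(\frac{\epsilon_t^2}{h_t} - 1\right)$.

The first term is the familiar first-order condition for an exogenous heteroscedastic
correction; the second term results because $h_t$ is also a function of the $\beta$'s,
as in Amemiya [1]. Substituting the linear variance function gives

(21)  $\frac{\partial l}{\partial\beta} = \frac{1}{T}\sum_t \left[\frac{\epsilon_t x_t'}{h_t} - \frac{1}{h_t}\left(\frac{\epsilon_t^2}{h_t} - 1\right)\sum_j \alpha_j\epsilon_{t-j}x_{t-j}'\right]$,

which can be rewritten approximately by collecting terms in $x$ and $\epsilon$ as

(22)  $\frac{\partial l}{\partial\beta} = \frac{1}{T}\sum_t x_t'\epsilon_t\left[h_t^{-1} - \sum_{j=1}^{p}\alpha_j h_{t+j}^{-1}\left(\frac{\epsilon_{t+j}^2}{h_{t+j}} - 1\right)\right] = \frac{1}{T}\sum_t x_t'\epsilon_t s_t$.

The Hessian is

      $\frac{\partial^2 l_t}{\partial\beta\,\partial\beta'} = -\frac{x_t'x_t}{h_t} - \frac{1}{2h_t^2}\frac{\partial h_t}{\partial\beta}\frac{\partial h_t}{\partial\beta'}\left(\frac{\epsilon_t^2}{h_t}\right) - \frac{2\epsilon_t x_t'}{h_t^2}\frac{\partial h_t}{\partial\beta'} + \left(\frac{\epsilon_t^2}{h_t} - 1\right)\frac{\partial}{\partial\beta'}\left[\frac{1}{2h_t}\frac{\partial h_t}{\partial\beta}\right]$.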

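Taking conditional expectations of the Hessian, the last two terms vanish
because $h_t$ is entirely a function of the past. Similarly, $\epsilon_t^2/h_t$ becomes one, since it
is the only current value in the second term. Notice that these results hold
regardless of whether $x_t$ includes lagged-dependent variables. The information
matrix is the average over all $t$ of the expected value of the conditional
expectation and is, therefore, given by

(23)  $\mathcal{I}_{\beta\beta} = \frac{1}{T}\sum_t E\left[E\left(-\frac{\partial^2 l_t}{\partial\beta\,\partial\beta'} \,\Big|\, \psi_{t-1}\right)\right]$

      $= \frac{1}{T}\sum_t E\left[\frac{x_t'x_t}{h_t} + \frac{1}{2h_t^2}\frac{\partial h_t}{\partial\beta}\frac{\partial h_t}{\partial\beta'}\right]$.

For the $p$th order linear ARCH regression this is consistently estimated by

(24)  $\hat{\mathcal{I}}_{\beta\beta} = \frac{1}{T}\sum_t \left[\frac{x_t'x_t}{h_t} + 2\sum_j \alpha_j^2\frac{\epsilon_{t-j}^2}{h_t^2}x_{t-j}'x_{t-j}\right]$.

By gathering terms in $x_t'x_t$, (24) can be rewritten, except for end effects, as

(25)  $\hat{\mathcal{I}}_{\beta\beta} = \frac{1}{T}\sum_t x_t'x_t\left[h_t^{-1} + 2\epsilon_t^2\sum_{j=1}^{p}\alpha_j^2 h_{t+j}^{-2}\right] = \frac{1}{T}\sum_t x_t'x_t r_t^2$.

In a similar fashion, the off-diagonal blocks of the information matrix can be
expressed as:

(26)  $\mathcal{I}_{\alpha\beta} = \frac{1}{T}\sum_t E\left(\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\alpha}\frac{\partial h_t}{\partial\beta'}\right)$.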

The important result to be shown in Theorem 4 below is that this off-diagonal
block is zero. The implications are far-reaching in that estimation of $\alpha$ and $\beta$ can
be undertaken separately without asymptotic loss of efficiency and their variances
can be calculated separately.

THEOREM 4: If an ARCH regression model is symmetric and regular, then
$\mathcal{I}_{\alpha\beta} = 0$.

PROOF: See Appendix.

6. ESTIMATION OF THE ARCH REGRESSION MODEL

Because of the block diagonality of the information matrix, the estimation of $\alpha$
and $\beta$ can be considered separately without loss of asymptotic efficiency.
Furthermore, either can be estimated with full efficiency based only on a
consistent estimate of the other. See, for example, Cox and Hinkley [6, p. 308].
The procedure recommended here is to initially estimate $\beta$ by ordinary least
squares, and obtain the residuals. From these residuals, an efficient estimate of $\alpha$
can be constructed, and based upon these $\hat\alpha$ estimates, efficient estimates of $\beta$ are
found. The iterations are calculated using the scoring algorithm. Each step for a
parameter vector $\phi$ produces estimates $\phi^{i+1}$ based on $\phi^i$ according to

(27)  $\phi^{i+1} = \phi^i + [\mathcal{I}^i]^{-1}\frac{1}{T}\sum_t \frac{\partial l_t^i}{\partial\phi}$

where $\mathcal{I}^i$ and $\partial l_t^i/\partial\phi$ are evaluated at $\phi^i$. The advantage of this algorithm is
partly that it requires only first derivatives of the likelihood function in this case
and partly that it uses the statistical properties of the problem to tailor the
algorithm to this application.
For the $p$th-order linear model, the scoring step for $\alpha$ can be rewritten by
substituting (12), (13), and (14) into (27) and interpreting $y_t^2$ as the residuals $e_t^2$.
The iteration is simply

(28)  $\alpha^{i+1} = \alpha^i + (\tilde z'\tilde z)^{-1}\tilde z'f$

where

      $\tilde z_t = (1, e_{t-1}^2, \ldots, e_{t-p}^2)/h_t^i$,
      $\tilde z' = (\tilde z_1', \ldots, \tilde z_T')$,
      $f_t = (e_t^2 - h_t^i)/h_t^i$,
      $f' = (f_1, \ldots, f_T)$.

In these expressions, $e_t$ is the residual from iteration $i$, $h_t^i$ is the estimated
conditional variance, and $\alpha^i$ is the estimate of the vector of unknown parameters
from iteration $i$. Each step is, therefore, easily constructed from a least-squares
regression on transformed variables. The variance-covariance matrix of the
parameters is consistently estimated by the inverse of the estimate of the
information matrix divided by $T$, which is simply $2(\tilde z'\tilde z)^{-1}$. This differs slightly
from $\hat\sigma^2(\tilde z'\tilde z)^{-1}$ computed by the auxiliary regression. Asymptotically, $\hat\sigma^2 = 2$, if
the distributional assumptions are correct, but it is not clear which formulation is
better in practice.
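
The scoring step (28) is a least-squares regression of $f$ on $\tilde z$, which the sketch below carries out with NumPy. It is a hedged illustration, not the paper's program: the simulation helper, the starting value (the sample variance with a zero ARCH coefficient), the fixed number of iterations, and the true parameter values are assumptions made for the example, and no step-size control or constraint handling is attempted beyond a positivity check on $h_t$.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, T, seed=0):
    """Hypothetical helper: e_t = eps_t * sqrt(alpha0 + alpha1 * e_{t-1}^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    e = np.zeros(T)
    h = alpha0 / (1.0 - alpha1)            # start at the stationary variance
    for t in range(T):
        e[t] = eps[t] * np.sqrt(h)
        h = alpha0 + alpha1 * e[t] ** 2    # variance for the next period
    return e

def scoring_step(alpha, e, p):
    """One step of (28); also returns 2 * (z~' z~)^{-1} evaluated at the incoming alpha."""
    e2 = e ** 2
    T = len(e2)
    z = np.column_stack([np.ones(T - p)] + [e2[p - j:T - j] for j in range(1, p + 1)])
    h = z @ alpha
    if np.any(h <= 0):
        raise ValueError("non-positive conditional variance; check alpha")
    z_tilde = z / h[:, None]
    f = (e2[p:] - h) / h
    step, *_ = np.linalg.lstsq(z_tilde, f, rcond=None)   # auxiliary regression
    cov = 2.0 * np.linalg.inv(z_tilde.T @ z_tilde)
    return alpha + step, cov

e = simulate_arch1(alpha0=1.0, alpha1=0.4, T=5000)
alpha = np.array([e.var(), 0.0])           # assumed start: homoscedastic fit
for _ in range(25):
    alpha, cov = scoring_step(alpha, e, p=1)
print("alpha estimate:", alpha)
print("approximate standard errors:", np.sqrt(np.diag(cov)))
```

From the homoscedastic start, the first step reduces to an ordinary regression of $e_t^2$ on a constant and its lag, which is one reason this starting point is convenient.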
The parameters in $\alpha$ must satisfy some nonnegativity conditions and some
stationarity conditions. These could be imposed via penalty functions or the
parameters could be estimated and checked for conformity. The latter approach
is used here, although a perhaps useful reformulation of the model might employ
squares to impose the nonnegativity constraints directly:

(29)  $h_t = \alpha_0^2 + \alpha_1^2\epsilon_{t-1}^2 + \cdots + \alpha_p^2\epsilon_{t-p}^2$.


Convergence for such an iteration can be formulated in many ways. Following
Belsley [3], a simple criterion is the gradient around the inverse Hessian. For a
parameter vector, $\phi$, this is

(30)  $\theta = \frac{\partial l}{\partial\phi'}\left(\frac{\partial^2 l}{\partial\phi\,\partial\phi'}\right)^{-1}\frac{\partial l}{\partial\phi}$.

Using $\theta$ as the convergence criterion is attractive, as it provides a natural
normalization and as it is interpretable as the remainder term in a Taylor-series
expansion about the estimated maximum. In any case, substituting the gradient
and estimated information matrix in (30), $\theta = R^2$ of the auxiliary regression.
For a given estimate of $\alpha$, a scoring step can be computed to improve the
estimate of $\beta$. The scoring algorithm for $\beta$ is

(31)  $\beta^{i+1} = \beta^i + [\hat{\mathcal{I}}_{\beta\beta}]^{-1}\frac{\partial l^i}{\partial\beta}$.

Defining $\tilde x_t = x_t r_t$ and $\tilde e_t = e_t s_t/r_t$ with $\tilde x$ and $\tilde e$ as the corresponding matrix and
vector, (31) can be rewritten using (22) and (25) and $e_t$ for the estimate of $\epsilon_t$ on
the $i$th iteration, as

(32)  $\beta^{i+1} = \beta^i + (\tilde x'\tilde x)^{-1}\tilde x'\tilde e$.

Thus, an ordinary least-squares program can again perform the scoring iteration,
and $(\tilde x'\tilde x)^{-1}$ from this calculation will be the final variance-covariance matrix of
the maximum likelihood estimates of $\beta$.
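Under the conditions of Crowder's [7] theorem for martingales, it can be
established that the maximum likelihood estimators $\hat\alpha$ and $\hat\beta$ are asymptotically
normally distributed with limiting distribution

(33)  $\sqrt{T}(\hat\alpha - \alpha) \xrightarrow{D} N(0, \mathcal{I}_{\alpha\alpha}^{-1})$,
      $\sqrt{T}(\hat\beta - \beta) \xrightarrow{D} N(0, \mathcal{I}_{\beta\beta}^{-1})$.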

7. GAINS IN EFFICIENCY FROM MAXIMUM LIKELIHOOD ESTIMATION

The gain in efficiency from using the maximum-likelihood estimation rather
than OLS has been asserted above. In this section, the gains are calculated for a
special case. Consider the linear stationary ARCH model with $p = 1$ and all $x_t$
exogenous. This is the case where the Gauss-Markov theorem applies and OLS
has a variance matrix $\sigma^2(x'x)^{-1} = E\epsilon_t^2\left(\sum_t x_t'x_t\right)^{-1}$. The stationary variance is
$\sigma^2 = \alpha_0/(1 - \alpha_1)$.
The information matrix for this case becomes, from (25),

      $E\left[\frac{1}{T}\sum_t x_t'x_t\left(h_t^{-1} + 2\epsilon_t^2\alpha_1^2/h_{t+1}^2\right)\right]$.


With $x$ exogenous, the expectation is only necessary over the scale factor.
Because the disturbance process is stationary, the variance-covariance matrix is
proportional to that for OLS and the relative efficiency depends only upon the
scale factors. The relative efficiency of MLE to OLS is, therefore,

      $R = E\left(h_t^{-1} + 2\epsilon_t^2\alpha_1^2/h_{t+1}^2\right)\sigma^2$.

Now substitute $h_t = \alpha_0 + \alpha_1\epsilon_{t-1}^2$, $\sigma^2 = \alpha_0/(1 - \alpha_1)$, and $\gamma = \alpha_1/(1 - \alpha_1)$. Recognizing
that $\epsilon_{t-1}^2$ and $\epsilon_t^2$ have the same density, define for each

      $u = \epsilon(1 - \alpha_1)^{1/2}/\alpha_0^{1/2}$.

The expression for the relative efficiency becomes

(34)  $R = E\left(\frac{1 + \gamma}{1 + \gamma u^2}\right) + 2\gamma^2 E\left(\frac{u^2}{(1 + \gamma u^2)^2}\right)$

where $u$ has variance one and mean zero. From Jensen's inequality, the expected
value of a reciprocal exceeds the reciprocal of the expected value and, therefore,
the first term is greater than unity. The second is positive, so there is a gain in
efficiency whenever $\gamma \neq 0$. $Eu^{-2}$ is infinite because $u^2$ is conditionally chi
squared with one degree of freedom. Thus, the limit of the relative efficiency goes
to infinity with $\gamma$:

      $\lim_{\gamma \to \infty} R \to \infty$.

For $\alpha_1$ close to unity, the gain in efficiency from using a maximum likelihood
estimator may be very large.
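
The two expectations in (34) have no simple closed form, but they can be approximated by simulation. The sketch below is an assumption-laden illustration using NumPy: it fixes $\alpha_0 = 1$, simulates a long stationary first-order ARCH series, forms $u_t^2 = \epsilon_t^2/\sigma^2$, and averages the two terms. The function name, sample size, and grid of $\alpha_1$ values are invented for the example, and the Monte Carlo estimates become noisier as $\alpha_1$ approaches one.

```python
import numpy as np

def relative_efficiency(alpha1, T=100_000, seed=0):
    """Monte Carlo approximation of R in (34) for a stationary ARCH(1), alpha0 = 1."""
    gamma = alpha1 / (1.0 - alpha1)
    sigma2 = 1.0 / (1.0 - alpha1)          # stationary variance with alpha0 = 1
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal(T)
    eps = np.empty(T)
    h = sigma2
    for t in range(T):
        eps[t] = draws[t] * np.sqrt(h)
        h = 1.0 + alpha1 * eps[t] ** 2     # conditional variance for t + 1
    u2 = eps ** 2 / sigma2                 # u^2 = eps^2 * (1 - alpha1) / alpha0
    first = np.mean((1.0 + gamma) / (1.0 + gamma * u2))
    second = 2.0 * gamma ** 2 * np.mean(u2 / (1.0 + gamma * u2) ** 2)
    return float(first + second)

for a1 in (0.0, 0.25, 0.5, 0.75):
    print(f"alpha1 = {a1:.2f}  estimated R = {relative_efficiency(a1):.3f}")
```

With $\alpha_1 = 0$ the disturbances are homoscedastic, $\gamma = 0$, and the expression collapses to $R = 1$, so OLS and maximum likelihood coincide in efficiency.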

8. TESTING FOR ARCH DISTURBANCES

In the linear regression model, with or without lagged-dependent variables,
OLS is the appropriate procedure if the disturbances are not conditionally
heteroscedastic. Because the ARCH model requires iterative procedures, it may
be desirable to test whether it is appropriate before going to the effort to estimate
it. The Lagrange multiplier test procedure is ideal for this as in many similar
cases. See, for example, Breusch and Pagan [4, 5], Godfrey [12], and Engle [9].
Under the null hypothesis, $\alpha_1 = \alpha_2 = \cdots = \alpha_p = 0$. The test is based upon the
score under the null and the information matrix under the null. Consider the
ARCH model with $h_t = h(z_t\alpha)$, where $h$ is some differentiable function which,
therefore, includes both the linear and exponential cases as well as lots of others
and $z_t = (1, e_{t-1}^2, \ldots, e_{t-p}^2)$ where $e_t$ are the ordinary least squares residuals.
Under the null, $h_t$ is a constant denoted $h^0$. Writing $\partial h_t/\partial\alpha = h'z_t'$, where $h'$ is
the scalar derivative of $h$, the score and information can be written as

      $\frac{\partial l}{\partial\alpha}\Big|_0 = \frac{h^{0\prime}}{2h^0}\,\frac{1}{T}\sum_t z_t'\left(\frac{e_t^2}{h^0} - 1\right)$,

      $\mathcal{I}_{\alpha\alpha}^0 = \frac{1}{2}\left(\frac{h^{0\prime}}{h^0}\right)^2 E z_t'z_t$,

and, therefore, the LM test statistic can be consistently estimated by

(35)  $\xi^* = \frac{1}{2} f^{0\prime}z(z'z)^{-1}z'f^0$

where $z' = (z_1', \ldots, z_T')$, and $f^0$ is the column vector of $\left(\frac{e_t^2}{h^0} - 1\right)$.

This is the form used by Breusch and Pagan [4] and Godfrey [12] for testing for
heteroscedasticity. As they point out, all reference to the $h$ function has
disappeared and, thus, the test is the same for any $h$ which is a function only of $z_t\alpha$.
In this problem, the expectation required in the information matrix could be
evaluated quite simply under the null; this could have superior finite sample
performance. A second simplification, which is appropriate for this model as well
as the heteroscedasticity model, is to note that plim $f^{0\prime}f^0/T = 2$ because normality
has already been assumed. Thus, an asymptotically equivalent statistic would
be

(36)  $\xi = T f^{0\prime}z(z'z)^{-1}z'f^0 / f^{0\prime}f^0 = TR^2$

where $R^2$ is the squared multiple correlation between $f^0$ and $z$. Since adding a
constant and multiplying by a scalar will not change the $R^2$ of a regression, this
is also the $R^2$ of the regression of $e_t^2$ on an intercept and $p$ lagged values of $e_t^2$.
The statistic will be asymptotically distributed as chi square with $p$ degrees of
freedom when the null hypothesis is true.
The test procedure is to run the OLS regression and save the residuals. Regress
the squared residuals on a constant and $p$ lags and test $TR^2$ as a $\chi_p^2$. This will be
an asymptotically locally most powerful test, a characterization it shares with
likelihood ratio and Wald tests. The same test has been proposed by Granger
and Andersen [13] to test for higher moments in bilinear time series.
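
The test procedure just described can be sketched in a few lines, assuming NumPy. The function and helper names are hypothetical, the residuals are simulated rather than taken from an actual OLS fit, and the 5 per cent critical value of a chi-square with one degree of freedom (about 3.84) is quoted only as a reference point.

```python
import numpy as np

def arch_lm_test(e, p):
    """T * R^2 from regressing e_t^2 on a constant and p lags of e_t^2."""
    e2 = np.asarray(e, dtype=float) ** 2
    n = len(e2)
    T = n - p                                  # usable observations
    X = np.column_stack([np.ones(T)] + [e2[p - j:n - j] for j in range(1, p + 1)])
    target = e2[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return T * r2

def simulate_arch1(alpha0, alpha1, T, seed=0):
    """Hypothetical helper producing ARCH(1) disturbances."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    e = np.zeros(T)
    h = alpha0 / (1.0 - alpha1)
    for t in range(T):
        e[t] = eps[t] * np.sqrt(h)
        h = alpha0 + alpha1 * e[t] ** 2
    return e

white_noise = np.random.default_rng(1).standard_normal(2000)
arch_resid = simulate_arch1(1.0, 0.5, 2000)
stat_null = arch_lm_test(white_noise, p=1)
stat_arch = arch_lm_test(arch_resid, p=1)
print("TR^2, white noise:", stat_null)   # compare with the chi-square(1) 5% value, about 3.84
print("TR^2, ARCH(1):    ", stat_arch)
```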

9. ESTIMATION OF THE VARIANCE OF INFLATION

Economic theory frequently suggests that economic agents respond not only to
the mean, but also to higher moments of economic random variables. In
financial theory, the variance as well as the mean of the rate of return are
determinants of portfolio decisions. In macroeconomics, Lucas [16], for example,
argues that the variance of inflation is a determinant of the response to various
shocks. Furthermore, the variance of inflation may be of independent interest as
it is the unanticipated component which is responsible for the bulk of the welfare
loss due to inflation. Friedman [11] also argues that, as high inflation will
generally be associated with high variability of inflation, the statistical relation-
ship between inflation and unemployment should have a positive slope, not a
negative one as in the traditional Phillips curve.
Measuring the variance of inflation over time has presented problems to
various researchers. Khan [14] has used the absolute value of the first difference
of inflation while Klein [15] has used a moving variance around a moving mean.
Each of these approaches makes very simple assumptions about the mean of the
distribution, which are inconsistent with conventional econometric approaches.
The ARCH method allows a conventional regression specification for the mean
function, with a variance which is permitted to change stochastically over the
sample period. For a comparison of several measures for U.S. data, see Engle
[10].
A conventional price equation was estimated using British data from 1958-II
through 1977-II. It was assumed that price inflation followed wage increases;
thus the model is a restricted transfer function.
Letting $\dot p$ be the first difference of the log of the quarterly consumer price
index and $w$ be the log of the quarterly index of manual wage rates, the model
chosen after some experimentation was

(37)  $\dot p = \beta_1\dot p_{-1} + \beta_2\dot p_{-4} + \beta_3\dot p_{-5} + \beta_4(p - w)_{-1} + \beta_5$.

The model has typical seasonal behavior with the first, fourth, and fifth lags of
the first difference. The lagged value of the real wage is the error correction
mechanism of Davidson, et al. [8], which restricts the lag weights to give a
constant real wage in the long run. As this is a reduced form, the current wage
rate cannot enter.
The least squares estimates of this model are given in Table I. The fit is quite
good, with less than 1 per cent standard error of forecast, and all t statistics
greater than 3. Notice that $\dot p_{-4}$ and $\dot p_{-5}$ have equal and opposite signs, suggesting
that it is the acceleration of inflation one year ago which explains much of the
short-run behavior in prices.

TABLE I
ORDINARY LEAST SQUARES (37)^a

Variable    $\dot p_{-1}$   $\dot p_{-4}$   $\dot p_{-5}$   $(p-w)_{-1}$   Const.     $\alpha_0$ ($\times 10^{-6}$)   $\alpha_1$
Coeff.      0.334           0.408           -0.404          -0.0559        0.0257     89                              0
St. Err.    0.103           0.110           0.114           0.0136         0.00572
t Stat.     3.25            3.72            3.55            4.12           4.49

a Dependent variable $\dot p = \log(P) - \log(P_{-1})$, where $P$ is the quarterly U.K. consumer price index. $w = \log(W)$,
where $W$ is the U.K. index of manual wage rates. Sample period 1958-II to 1977-II.
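
As a rough illustration of how the regressors in (37) line up in time, the following sketch builds the dependent variable and the design matrix from two log-level series and fits them by least squares. It is not the author's code: the function names `build_design` and `fit_ols` are hypothetical, NumPy is assumed, and no actual U.K. data are included.

```python
import numpy as np


def build_design(log_p, log_w):
    """Return (y, X) for the price equation (37).

    y_t is inflation (first difference of log prices); the columns of X are
    inflation lagged 1, 4 and 5 quarters, the lagged level term (p - w)_{t-1},
    and a constant.
    """
    log_p = np.asarray(log_p, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    dp = np.diff(log_p)              # dp[k] = log_p[k + 1] - log_p[k]
    k = np.arange(5, dp.size)        # first usable row needs dp[k - 5]
    y = dp[k]
    X = np.column_stack([
        dp[k - 1],                   # inflation lagged one quarter
        dp[k - 4],                   # inflation lagged four quarters
        dp[k - 5],                   # inflation lagged five quarters
        log_p[k] - log_w[k],         # (p - w) lagged one quarter
        np.ones(k.size),             # constant
    ])
    return y, X


def fit_ols(y, X):
    """Least squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Five observations are lost to the longest inflation lag and one more to differencing, so a series of n log levels yields n - 6 usable rows.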

To establish the reliability of the model by conventional criteria, it was tested
for serial correlation and for coefficient restrictions. Godfrey's [12] Lagrange
multiplier test, for serial correlation up to sixth order, yields a chi-squared
statistic with 6 degrees of freedom of 4.53, which is not significant, and the
square of Durbin's h is 0.57. Only the 9th autocorrelation of the least squares
residuals exceeds two asymptotic standard errors and, thus, the hypothesis of
white noise disturbances can be accepted. The model was compared with an
unrestricted regression, including all lagged p and w from one quarter through
six. The asymptotic F statistic was 2.04, which is not significant at the 5 per cent
level. When (37) was tested for the exclusion of $w_{-1}$ through $w_{-6}$, the statistic
was 2.34, which is barely significant at the 5 per cent but not the 2.5 per cent
level. The only variable which enters significantly in either of these regressions is
$w_{-6}$ and it seems unattractive to include this alone.
The Lagrange multiplier test for a first-order linear ARCH effect for the model
in (37) was not significant. However, testing for a fourth-order linear ARCH
process, the chi-squared statistic with 4 degrees of freedom was 15.2, which is
highly significant. Assuming that agents discount past residuals, a linearly
declining set of weights was formulated to give the model
(38) $h_t = \alpha_0 + \alpha_1 (0.4\,\epsilon_{t-1}^2 + 0.3\,\epsilon_{t-2}^2 + 0.2\,\epsilon_{t-3}^2 + 0.1\,\epsilon_{t-4}^2)$

which is used in the balance of the paper. A two-parameter variance function
was chosen because it was suspected that the nonnegativity and stationarity
constraints on the $\alpha$'s would be hard to satisfy in an unrestricted model. The
chi-squared test for $\alpha_1 = 0$ in (38) was 6.1, which has one degree of freedom.
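
The ARCH test quoted above can be sketched as an auxiliary regression, assuming the T·R² form of the Lagrange multiplier statistic: the squared least squares residuals are regressed on a constant and p of their own lags, and the number of observations times the R² is compared with a chi-squared distribution with p degrees of freedom. The function name `arch_lm_stat` and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np


def arch_lm_stat(resid, p):
    """T * R^2 from regressing e_t^2 on a constant and p lags of e_t^2."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = e2.size
    y = e2[p:]                                   # e_t^2 for t = p, ..., n - 1
    lags = [e2[p - j:n - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(y.size)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return y.size * r_squared
```

For reference, the 5 per cent critical values of the chi-squared distribution are about 9.49 with 4 degrees of freedom and 3.84 with 1 degree of freedom, so the statistics of 15.2 and 6.1 reported in the text both exceed them.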
One step of the scoring algorithm was employed to estimate model (37) and
(38). The scoring step on $\alpha$ was performed first and then, using the new efficient
$\alpha$, the algorithm obtains, in one step, efficient estimates of $\beta$. These are given in
Table II. The procedure was also iterated to convergence by doing three steps on
$\alpha$, followed by three steps on $\beta$, followed by three more steps on $\alpha$, and so forth.
Convergence, within 0.1 per cent of the final value, occurred after two sets of $\alpha$
and $\beta$ steps. These results are given in Table III.
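
A minimal sketch of one scoring step on the variance parameters of (38) follows, holding the residuals fixed. It assumes normal errors, under which the step reduces to a least squares regression of $\epsilon_t^2/h_t - 1$ on $(\partial h_t/\partial\alpha)/h_t$; the function names are hypothetical, the companion step on $\beta$ is omitted, and nothing here is taken from the author's own program.

```python
import numpy as np

# Declining lag weights from equation (38).
WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def weighted_lag_sq(eps):
    """Weighted sum of the four lagged squared residuals, for t = 4, ..., T - 1."""
    e2 = np.asarray(eps, dtype=float) ** 2
    n = e2.size
    return sum(w * e2[4 - j:n - j] for j, w in enumerate(WEIGHTS, start=1))


def scoring_step_alpha(eps, alpha):
    """One scoring step for alpha = (alpha_0, alpha_1) in model (38)."""
    alpha = np.asarray(alpha, dtype=float)
    e2 = np.asarray(eps, dtype=float) ** 2
    s = weighted_lag_sq(eps)
    z = np.column_stack([np.ones(s.size), s])   # dh_t / d(alpha_0, alpha_1)
    h = z @ alpha                               # conditional variances, must be > 0
    z_scaled = z / h[:, None]
    f = e2[4:] / h - 1.0                        # scaled variance "surprise"
    step, *_ = np.linalg.lstsq(z_scaled, f, rcond=None)
    return alpha + step
```

When the squared residuals coincide with the implied variances the surprise term is zero and the step leaves the parameters unchanged, which is a convenient sanity check.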

TABLE II
MAXIMUM LIKELIHOOD ESTIMATES OF ARCH MODEL (37) (38)
ONE-STEP SCORING ESTIMATES^a

Variable    $\dot p_{-1}$   $\dot p_{-4}$   $\dot p_{-5}$   $(p-w)_{-1}$   Const.     $\alpha_0$ ($\times 10^{-6}$)   $\alpha_1$
Coeff.      0.210           0.270           -0.334          -0.0697        0.0321     19                              0.846
St. Err.    0.110           0.094           0.109           0.0117         0.00498    14                              0.243
t Stat.     1.90            2.86            3.06            5.98           6.44       1.32                            3.49

a Dependent variable $\dot p = \log(P) - \log(P_{-1})$, where $P$ is the quarterly U.K. consumer price index. $w = \log(W)$,
where $W$ is the U.K. index of manual wage rates. Sample period 1958-II to 1977-II.

TABLE III
MAXIMUM LIKELIHOOD ESTIMATES OF ARCH MODEL (37) (38)
ITERATED ESTIMATES^a

Variable    $\dot p_{-1}$   $\dot p_{-4}$   $\dot p_{-5}$   $(p-w)_{-1}$   Const.     $\alpha_0$ ($\times 10^{-6}$)   $\alpha_1$
Coeff.      0.162           0.264           -0.325          -0.0707        0.0328     14                              0.955
St. Err.    0.108           0.0892          0.0987          0.0115         0.00491    8.5                             0.298
t Stat.     1.50            2.96            3.29            6.17           6.67       1.56                            3.20

a Dependent variable $\dot p = \log(P) - \log(P_{-1})$, where $P$ is the quarterly U.K. consumer price index. $w = \log(W)$,
where $W$ is the U.K. index of manual wage rates. Sample period 1958-II to 1977-II.

The maximum likelihood estimates differ from the least squares estimates primarily
in decreasing the sizes of the short-run dynamic coefficients and increasing
the coefficient on the long run, as incorporated in the error correction
mechanism. The acceleration term is not so clearly implied as in the least squares
estimates. These seem reasonable results, since much of the inflationary dynam-
ics are estimated by a period of very severe inflation in the middle seventies.
This, however, is also the period of the largest forecast errors and, hence, the
maximum likelihood estimator will discount these observations. By the end of the
sample period, inflationary levels were rather modest and one might expect that
the maximum likelihood estimates would provide a better forecasting equation.
The standard errors for ordinary least squares are generally greater than for
maximum likelihood. The least squares standard errors are 15 per cent to 25 per
cent greater, with one exception where the standard error actually falls by 5 per
cent to 7 per cent. As mentioned earlier, however, the least squares estimates are
biased when there are lagged dependent variables. The Wald test for $\alpha_1 = 0$ is
also significant.
The final estimates of $h_t$ are the one-step-ahead forecast variances. For the
one-step scoring estimator, these vary from $23 \times 10^{-6}$ to $481 \times 10^{-6}$. That is, the
forecast standard deviation ranges from 0.5 per cent to 2.2 per cent, which is
more than a factor of 4. The average of the $h_t$ since 1974 is $230 \times 10^{-6}$, as
compared with $42 \times 10^{-6}$ during the last four years of the sixties. Thus, the
standard deviation of inflation increased from 0.6 per cent to 1.5 per cent over a
few years, as the economy moved from the rather predictable sixties into the
chaotic seventies.
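
The percentages in this paragraph are square roots of the quoted variances, expressed in per cent. The short sketch below only redoes that arithmetic; the helper name `pct_std_dev` and the dictionary labels are illustrative.

```python
import math


def pct_std_dev(variance):
    """Standard deviation, in per cent, implied by a forecast variance."""
    return 100.0 * math.sqrt(variance)


# Variances quoted in the text, in the units of the squared log difference.
quoted = {
    "smallest h_t": 23e-6,
    "largest h_t": 481e-6,
    "average, last four years of the sixties": 42e-6,
    "average since 1974": 230e-6,
}

for label, variance in quoted.items():
    print(f"{label}: {pct_std_dev(variance):.1f} per cent")
```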
In order to determine whether the confidence intervals arising from the ARCH
model were superior to the least squares model, the outliers were examined. The
expected number of residuals exceeding two (conditional) standard deviations is
3.5. For ordinary least squares, there were 5 while ARCH produced 3. For least
squares these occurred in '74-I, '75-I, '75-II, '75-IV, and '76-II; they all occur
within three years of each other and, in fact, three of them are in the same year.
For the ARCH model, they are much more spread out and only one of the least
squares points remains an outlier, although the others are still large. Examining
the observations exceeding one standard deviation shows similar effects. In the
seventies, there were 13 OLS and 12 ARCH residuals outside one sigma, which
are both above the expected value of 9. In the sixties, there were 6 for OLS, 10
for ARCH and an expected number of 12. Thus, the number of outliers for
ordinary least squares is reasonable; however, the timing of their occurrence is
far from random. The ARCH model comes closer to truly random residuals after
standardizing for their conditional distributions.
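
The outlier comparison can be sketched as follows, under two assumptions that are not stated explicitly in the text: the standardized residuals are treated as standard normal, and the sample is taken to contain 77 quarterly observations (1958-II through 1977-II inclusive), which reproduces the expected count of 3.5 quoted above. The function names are hypothetical.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))


def expected_exceedances(n_obs, k):
    """Expected number of residuals beyond k standard deviations."""
    return n_obs * two_sided_tail(k)


def count_exceedances(resid, cond_sd, k):
    """Count residuals larger than k times their own conditional std. dev."""
    return sum(abs(e) > k * s for e, s in zip(resid, cond_sd))


print(round(expected_exceedances(77, 2.0), 1))
```

For least squares, `cond_sd` would hold one constant standard error repeated for every quarter; for the ARCH model it would hold the square roots of the fitted $h_t$.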
This example illustrates the usefulness of the ARCH model for improving the
performance of a least squares model and for obtaining more realistic forecast
variances.

University of California, San Diego

Manuscript received July, 1979; final revision received July, 1981.

APPENDIX
PROOF OF THEOREM 1: Let

(A1) $w_t' = (y_t^{2r}, y_t^{2(r-1)}, \ldots, y_t^2).$

First, it is shown that there is an upper triangular $r \times r$ matrix $A$ and $r \times 1$ vector $b$ such that

(A2) $E(w_t \mid \psi_{t-1}) = b + A w_{t-1}.$

For any zero-mean normal random variable $u$, with variance $\sigma^2$,

$E(u^{2r}) = \sigma^{2r} \prod_{j=1}^{r} (2j - 1).$

Because the conditional distribution of $y$ is normal

(A3) $E(y_t^{2m} \mid \psi_{t-1}) = h_t^m \prod_{j=1}^{m} (2j - 1) = (\alpha_1 y_{t-1}^2 + \alpha_0)^m \prod_{j=1}^{m} (2j - 1).$

Expanding this expression establishes that the moment is a linear combination of $w_{t-1}$. Furthermore,
only powers of $y$ less than or equal to $2m$ are required; therefore, $A$ in (A2) is upper triangular.
Now

$E(w_t \mid \psi_{t-2}) = b + A(b + A w_{t-2})$

or in general

$E(w_t \mid \psi_{t-k}) = (I + A + A^2 + \cdots + A^{k-1}) b + A^k w_{t-k}.$

Because the series starts indefinitely far in the past with $2r$ finite moments, the limit as $k$ goes to
infinity exists if, and only if, all the eigenvalues of $A$ lie within the unit circle.
The limit can be written as

$\lim_{k \to \infty} E(w_t \mid \psi_{t-k}) = (I - A)^{-1} b,$

which does not depend upon the conditioning variables and does not depend upon $t$. Hence, this is
an expression for the stationary moments of the unconditional distribution of $y$.

(A4) $E(w_t) = (I - A)^{-1} b.$


It remains only to establish that the condition in the theorem is necessary and sufficient to have all
eigenvalues lie within the unit circle. As the matrix has already been shown to be upper triangular,
the diagonal elements are the eigenvalues. From (A3), it is seen that the diagonal elements are simply

$\theta_m \equiv \alpha_1^m \prod_{j=1}^{m} (2j - 1) = \prod_{j=1}^{m} \alpha_1 (2j - 1)$

for $m = 1, \ldots, r$. If $\theta_r$ exceeds or equals unity, the eigenvalues do not lie in the unit circle. It must
also be shown that if $\theta_r < 1$, then $\theta_m < 1$ for all $m < r$. Notice that $\theta_m$ is a product of $m$ factors which
are monotonically increasing. If the $m$th factor is greater than one, then $\theta_{m-1}$ will necessarily be
smaller than $\theta_m$. If the $m$th factor is less than one, all the other factors must also be less than one and,
therefore, $\theta_{m-1}$ must also have all factors less than one and have a value less than one. This
establishes that a necessary and sufficient condition for all diagonal elements to be less than one is
that $\theta_r < 1$, which is the statement in the theorem. Q.E.D.
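
The moment condition just established is easy to evaluate numerically. The sketch below assumes a first-order linear ARCH process with $\alpha_1 \geq 0$; the function names `theta` and `moment_exists` are illustrative only.

```python
def theta(alpha1, m):
    """theta_m = product over j = 1..m of alpha1 * (2j - 1)."""
    value = 1.0
    for j in range(1, m + 1):
        value *= alpha1 * (2 * j - 1)
    return value


def moment_exists(alpha1, r):
    """True if the 2r-th moment exists under the condition theta_r < 1."""
    return theta(alpha1, r) < 1.0


# The variance (r = 1) needs alpha1 < 1; the fourth moment (r = 2)
# needs 3 * alpha1**2 < 1, i.e. alpha1 below roughly 0.577.
for a1 in (0.5, 0.6):
    print(a1, moment_exists(a1, 1), moment_exists(a1, 2))
```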

PROOF OF THEOREM 2: Let

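$w_t' = (y_t^2, y_{t-1}^2, \ldots, y_{t-p}^2).$

Then in terms of the companion matrix $A$,

(A5) $E(w_t \mid \psi_{t-1}) = b + A w_{t-1}$

where $b' = (\alpha_0, 0, \ldots, 0)$ and

$A = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_p & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$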

Taking successive expectations

$E(w_t \mid \psi_{t-k}) = (I + A + A^2 + \cdots + A^{k-1}) b + A^k w_{t-k}.$

Because the series starts indefinitely far in the past with finite variance, if, and only if, all eigenvalues
lie within the unit circle, the limit exists and is given by

(A6) $\lim_{k \to \infty} E(w_t \mid \psi_{t-k}) = (I - A)^{-1} b.$

As this does not depend upon initial conditions or on $t$, this vector is the common variance for all $t$.
As is well known in time series analysis, this condition is equivalent to the condition that all the roots
of the characteristic equation, formed from the $\alpha$'s, lie outside the unit circle. See Anderson [2, p.
177]. Finally, the limit of the first element can be rewritten as

(A7) $E y_t^2 = \alpha_0 / (1 - \sum_{j=1}^{p} \alpha_j).$ Q.E.D.
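
As a numerical check on (A5) to (A7), the sketch below builds the companion matrix, verifies that its eigenvalues lie inside the unit circle, and solves $(I - A)^{-1} b$ for the stationary variance. The function names are hypothetical and NumPy is assumed; this is an illustration of the argument, not part of the original proof.

```python
import numpy as np


def companion(alphas):
    """(p + 1) x (p + 1) companion matrix A of (A5)."""
    a = np.asarray(alphas, dtype=float)
    p = a.size
    A = np.zeros((p + 1, p + 1))
    A[0, :p] = a              # first row: alpha_1, ..., alpha_p, 0
    A[1:, :p] = np.eye(p)     # shifted identity below the first row
    return A


def stationary_variance(alpha0, alphas):
    """First element of (I - A)^{-1} b, the unconditional variance."""
    A = companion(alphas)
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("an eigenvalue lies on or outside the unit circle")
    b = np.zeros(A.shape[0])
    b[0] = alpha0
    return np.linalg.solve(np.eye(A.shape[0]) - A, b)[0]


# Agrees with the closed form alpha_0 / (1 - sum of alphas) in (A7).
print(stationary_variance(1.0, [0.3, 0.2]), 1.0 / (1.0 - 0.5))
```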

PROOF OF THEOREM 3: Clearly, under the conditions, $h(\xi_t) \geq \alpha_0 > 0$, establishing part (a). Let

$\phi_{i,m,t} = E(|\partial h(\xi_t)/\partial \alpha_i| \, |\partial h(\xi_t)/\partial \epsilon_{t-m}| \mid \psi_{t-m-1})$

$= 2\alpha_m E(|\epsilon_{t-i}^2| \, |\epsilon_{t-m}| \mid \psi_{t-m-1}).$

Now there are three cases: $i > m$, $i = m$, and $i < m$. If $i > m$, then $\epsilon_{t-i} \in \psi_{t-m-1}$ and the
conditional expectation of $|\epsilon_{t-m}|$ is finite, because the conditional density is normal. If $i = m$, then
the expectation becomes $E(|\epsilon_{t-m}|^3 \mid \psi_{t-m-1})$. Again, because the conditional density is normal, all
moments exist including the expectation of the third power of the absolute value. If $i < m$, the
expectation is taken in two parts, first with respect to $t - i - 1$:

$\phi_{i,m,t} = 2\alpha_m E\{ |\epsilon_{t-m}| \, E(\epsilon_{t-i}^2 \mid \psi_{t-i-1}) \mid \psi_{t-m-1} \}$

$= 2\alpha_m E\{ |\epsilon_{t-m}| (\alpha_0 + \sum_{j=1}^{p} \alpha_j \epsilon_{t-i-j}^2) \mid \psi_{t-m-1} \}$

$= 2\alpha_m \alpha_0 E\{ |\epsilon_{t-m}| \mid \psi_{t-m-1} \} + \sum_{j=1}^{p} \alpha_j \phi_{i+j,m,t}.$

In the final expression, the initial index on $\phi$ is larger and, therefore, may fall into either of the
preceding cases, which, therefore, establishes the existence of the term. If there remain terms with
$i + j < m$, the recursion can be repeated. As all lags are finite, an expression for $\phi_{i,m,t}$ can be written
as a constant times the third absolute moment of $\epsilon_{t-m}$ conditional on $\psi_{t-m-1}$, plus another constant
times the first absolute moment. As these are both conditionally normal, and as the constants must be
finite as they have a finite number of terms, the second part of the regularity condition has been
established. Q.E.D.
To establish Theorem 4, a careful symmetry argument is required, beginning with the following
lemma.

LEMMA: Let $u$ and $v$ be any two random variables. $E(g(u, v) \mid v)$ will be an anti-symmetric function
of $v$ if $g$ is anti-symmetric in $v$, the conditional density of $u \mid v$ is symmetric in $v$, and the expectation
exists.

PROOF:

$E(g(u, -v) \mid -v) = -E(g(u, v) \mid -v)$ because $g$ is anti-symmetric in $v$

$= -E(g(u, v) \mid v)$ because the conditional density is symmetric.

Q.E.D.

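PROOF OF THEOREM 4: The $i, j$ element of $\mathcal{I}_{\alpha\beta}$ is given by

$(\mathcal{I}_{\alpha\beta})_{ij} = \frac{1}{2T} \sum_t E\left( \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \beta_j} \right)$

$= -\frac{1}{2T} \sum_t \sum_{m=1}^{p} E\left[ \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \epsilon_{t-m}} x_{j,t-m} \right]$ by the chain rule.

If the expectation of the term in square brackets, conditional on $\psi_{t-m-1}$, is zero for all $i, j, t, m$, then
the theorem is proven.

$E\left( \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \epsilon_{t-m}} x_{j,t-m} \mid \psi_{t-m-1} \right) = x_{j,t-m} E\left( \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \epsilon_{t-m}} \mid \psi_{t-m-1} \right)$

because $x_{j,t-m}$ is either exogenous or it is a lagged dependent variable, in which case it is included in
$\psi_{t-m-1}$.

$\left| E\left( \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \epsilon_{t-m}} \mid \psi_{t-m-1} \right) \right| \leq E\left( \frac{1}{h_t^2} \left| \frac{\partial h_t}{\partial \alpha_i} \right| \left| \frac{\partial h_t}{\partial \epsilon_{t-m}} \right| \mid \psi_{t-m-1} \right) \leq \frac{1}{\delta^2} E\left( \left| \frac{\partial h_t}{\partial \alpha_i} \right| \left| \frac{\partial h_t}{\partial \epsilon_{t-m}} \right| \mid \psi_{t-m-1} \right)$

by part (a) of the regularity conditions, where $\delta > 0$ is the lower bound on $h_t$, and this integral is finite by
part (b) of the condition. Hence, each term is finite. Now take the expectation in two steps, first with
respect to $\psi_{t-m}$. This must therefore also be finite.

$E\left( \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha_i} \frac{\partial h_t}{\partial \epsilon_{t-m}} \mid \psi_{t-m} \right) \equiv g(\epsilon_{t-m}).$

By the symmetry assumption, $h_t^{-2} \, \partial h_t / \partial \alpha_i$ is symmetric in $\epsilon_{t-m}$ and $\partial h_t / \partial \epsilon_{t-m}$ is anti-symmetric. Therefore,
the whole expression is anti-symmetric in $\epsilon_{t-m}$, which is part of the conditioning set $\psi_{t-m}$. Because $h$
is symmetric, the conditional density must be symmetric in $\epsilon_{t-m}$ and the lemma can be invoked to
show that $g(\epsilon_{t-m})$ is anti-symmetric.
Finally, taking expectations of $g$ conditional on $\psi_{t-m-1}$ gives zero, because the density of $\epsilon_{t-m}$
conditional on the past is a symmetric (normal) density and the theorem is established. Q.E.D.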

REFERENCES
[1] AMEMIYA, T.: "Regression Analysis when the Variance of the Dependent Variable is Propor-
tional to the Square of Its Expectation," Journal of the American Statistical Association,
68(1973), 928-934.
[2] ANDERSON, T. W.: The Statistical Analysis of Time Series. New York: John Wiley and Sons,
1958.
[3] BELSLEY, DAVID: "On the Efficient Computation of Non-Linear Full-Information Maximum
Likelihood Estimator," paper presented to the European Meetings of the Econometric Society,
Athens, 1979.
[4] BREUSCH, T. S., AND A. R. PAGAN: "A Simple Test for Heteroscedasticity and Random
Coefficient Variation," Econometrica, 47(1979), 1287-1294.
[5] BREUSCH, T. S., AND A. R. PAGAN: "The Lagrange Multiplier Test and Its Applications to Model Specification," Review of
Economic Studies, 47(1980), 239-254.
[6] Cox, D. R., AND D. V. HINKLEY: Theoretical Statistics. London: Chapman and Hall, 1974.
[7] CROWDER, M. J.: "Maximum Likelihood Estimation for Dependent Observations," Journal of
the Royal Statistical Society, Series B, 38(1976), 45-53.
[8] DAVIDSON, J. E. H., D. F. HENDRY, F. SRBA, AND S. YEO: "Econometric Modelling of the
Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the
United Kingdom," The Economic Journal, 88(1978), 661-691.
[9] ENGLE, R. F.: "A General Approach to the Construction of Model Diagnostics Based upon the
Lagrange Multiplier Principle," University of California, San Diego Discussion Paper 79-43,
1979.
[10] ENGLE, R. F.: "Estimates of the Variance of U.S. Inflation Based on the ARCH Model," University of
California, San Diego Discussion Paper 80-14, 1980.
[11] FRIEDMAN, MILTON: "Nobel Lecture: Inflation and Unemployment," Journal of Political Econ-
omy, 85(1977), 451-472.
[12] GODFREY, L. G.: "Testing Against General Autoregressive and Moving Average Error Models
When the Regressors Include Lagged Dependent Variables," Econometrica, 46(1978), 1293-
1302.
[13] GRANGER, C. W. J., AND A. ANDERSEN: An Introduction to Bilinear Time-Series Models.
Gottingen: Vandenhoeck and Ruprecht, 1978.
[14] KHAN, M. S.: "The Variability of Expectations in Hyperinflations," Journal of Political Economy,
85(1977), 817-827.
[15] KLEIN, B.: "The Demand for Quality-Adjusted Cash Balances: Price Uncertainty in the U.S.
Demand for Money Function," Journal of Political Economy, 85(1977), 692-715.
[16] LUCAS, R. E., JR.: "Some International Evidence on Output-Inflation Tradeoffs," American
Economic Review, 63(1973), 326-334.
[17] McNEES, S. S.: "The Forecasting Record for the 1970's," New England Economic Review,
September/October 1979, 33-53.
[18] WHITE, H.: "A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test for
Heteroscedasticity," Econometrica, 48(1980), 817-838.
