AUTOREGRESSIVE CONDITIONAL HETEROSCEDASTICITY WITH ESTIMATES OF THE VARIANCE OF UNITED KINGDOM INFLATION
Author(s): Robert F. Engle
Source: Econometrica, Vol. 50, No. 4 (Jul., 1982), pp. 987-1007
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1912773
Econometrica, Vol. 50, No. 4 (July, 1982)
BY ROBERT F. ENGLE
1. INTRODUCTION
where ε_t is white noise with V(ε_t) = σ². The conditional mean of y_t is γy_{t−1}, while
the unconditional mean is zero. Clearly, the vast improvement in forecasts due to
time-series models stems from the use of the conditional mean. The conditional
'This paper was written while the author was visiting the London School of Economics. He
benefited greatly from many stimulating conversations with David Hendry and helpful suggestions
by Denis Sargan and Andrew Harvey. Special thanks are due Frank Srba who carried out the
computations. Further insightful comments are due to Clive Granger, Tom Rothenberg, Edmond
Malinvaud, Jean-Francois Richard, Wayne Fuller, and two anonymous referees. The research was
supported by NSF SOC 78-09476 and The International Centre for Economics and Related
Disciplines. All errors remain the author's responsibility.
where again V(ε_t) = σ². The variance of y_t is simply σ²x²_{t−1} and, therefore, the
forecast interval depends upon the evolution of an exogenous variable. This
standard solution to the problem seems unsatisfactory, as it requires a specification
of the causes of the changing variance, rather than recognizing that both
conditional means and variances may jointly evolve over time. Perhaps because
of this difficulty, heteroscedasticity corrections are rarely considered in time-
series data.
A model which allows the conditional variance to depend on the past realization
of the series is the bilinear model described by Granger and Andersen [13].
A simple case is
y_t = ε_t y_{t−1}.

The conditional variance is now σ²y²_{t−1}. However, the unconditional variance is
either zero or infinity, which makes this an unattractive formulation, although
slight generalizations avoid this problem.
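As an aside not in the original paper, the degeneracy of this bilinear case is easy to see in simulation: with σ = 1 the running product of the |ε|'s collapses toward zero almost surely, while a much larger σ makes it explode. A minimal Python sketch (names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_path(sigma, T=200, y0=1.0):
    """Simulate y_t = eps_t * y_{t-1} with eps_t ~ N(0, sigma^2)."""
    y = y0
    for _ in range(T):
        y = rng.normal(0.0, sigma) * y
    return y

# E[log|eps|] < 0 when sigma = 1, so |y_t| collapses toward zero;
# for sigma = 10, E[log|eps|] > 0 and the path explodes.
print(abs(bilinear_path(1.0)))    # typically astronomically small
print(abs(bilinear_path(10.0)))   # typically astronomically large
```

The sketch illustrates why the unconditional variance is degenerate: log|y_t| is a random walk with nonzero drift, so |y_t| converges to zero or diverges depending on the sign of E log|ε_t|.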
A preferable model is
y_t = ε_t h_t^{1/2},
h_t = α_0 + α_1 y²_{t−1},

with V(ε_t) = 1. This is an example of what will be called an autoregressive
conditional heteroscedasticity (ARCH) model. It is not exactly a bilinear model,
but is very close to one. Adding the assumption of normality, it can be more
directly expressed in terms of At, the information set available at time t. Using
conditional densities,
(1) y_t | ψ_{t−1} ~ N(0, h_t),
(2) h_t = α_0 + α_1 y²_{t−1}.

In the regression framework,

(3) y_t | ψ_{t−1} ~ N(x_t β, h_t),
(4) h_t = h(ε_{t−1}, ε_{t−2}, ..., ε_{t−p}, α),
(5) ε_t = y_t − x_t β.
The variance function can be further generalized to include current and lagged
x's as these also enter the information set. The h function then becomes
or simply
h_t = h(ψ_{t−1}, α).
This generalization will not be treated in this paper, but represents a simple
extension of the results. In particular, if the h function factors into
h_t = h_ε(ε_{t−1}, ..., ε_{t−p}, α) h_x(x_t, ..., x_{t−p}).
The derivative of the log likelihood of the tth observation with respect to α is

(7) ∂l_t/∂α = (1/(2h_t)) (∂h_t/∂α) (y_t²/h_t − 1),
and the Hessian is

(8) ∂²l_t/∂α∂α′ = −(1/(2h_t²)) (∂h_t/∂α)(∂h_t/∂α′)(y_t²/h_t) + (y_t²/h_t − 1) (∂/∂α′)[(1/(2h_t)) ∂h_t/∂α].
The conditional expectation of the second term, given ψ_{t−1}, is zero, and of the
last factor in the first, is just one. Hence, the information matrix, which is simply
the negative expectation of the Hessian averaged over all observations, becomes
(10) I_αα = (1/T) Σ_t E[(1/(2h_t²)) (∂h_t/∂α)(∂h_t/∂α′)].
If the h function is pth order linear (in the squares), so that it can be written as
(11) h_t = α_0 + α_1 y²_{t−1} + ... + α_p y²_{t−p},
then the information matrix and gradient have a particularly simple form. Let
z_t = (1, y²_{t−1}, ..., y²_{t−p}) and α′ = (α_0, α_1, ..., α_p), so that (11) can be rewritten as

(12) h_t = z_t α.
The gradient then becomes simply
(13) ∂l_t/∂α = (1/(2h_t)) z_t′ (y_t²/h_t − 1).
The information matrix is simply

(14) I_αα = (1/(2T)) Σ_t E(z_t′z_t/h_t²).
The simplest and often very useful ARCH model is the first-order linear model
given by (1) and (2). A large observation for y will lead to a large variance for the
next period's distribution, but the memory is confined to one period. If α_1 = 0, of
course, y will be Gaussian white noise, and if it is a positive number, successive
observations will be dependent through higher-order moments. As shown below,
if α_1 is too large, the variance of the process will be infinite.
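As an added illustration (not part of the original paper), the first-order linear ARCH process is straightforward to simulate: a large draw raises only the next period's conditional variance, and the sample variance settles near α_0/(1 − α_1) when α_1 < 1. A sketch in Python (names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_arch1(a0, a1, T, burn=500):
    """Simulate y_t = eps_t * h_t**0.5 with h_t = a0 + a1 * y_{t-1}**2."""
    n = T + burn
    y = np.zeros(n)
    h = np.zeros(n)
    h[0] = a0 / (1 - a1)                 # start from the unconditional variance
    y[0] = rng.normal() * np.sqrt(h[0])
    for t in range(1, n):
        h[t] = a0 + a1 * y[t - 1] ** 2   # memory confined to one period
        y[t] = rng.normal() * np.sqrt(h[t])
    return y[burn:], h[burn:]

y, h = simulate_arch1(a0=1.0, a1=0.5, T=20000)
print(y.var())   # near a0 / (1 - a1) = 2
```

Plotting such a path shows the clustering of large and small deviations that motivates the model.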
To determine the conditions for the process to be stationary and to find the
marginal distribution of the y's, a recursive argument is required. The odd
moments are immediately seen to be zero by symmetry and the even moments
are computed using the following theorem. In all cases it is assumed that the
process begins indefinitely far in the past with 2r finite initial moments.
The theorem is easily used to find the second and fourth moments of a
first-order process. Letting w_t = (y_t⁴, y_t²)′, the condition for the variance to be
finite is simply that α_1 < 1, while to have a finite fourth moment it is also required
that 3α_1² < 1. If these conditions are met, the moments can be computed from (A4) as

(15) E(w_t) = ( [3α_0²/(1 − α_1)²][(1 − α_1²)/(1 − 3α_1²)],  α_0/(1 − α_1) )′.
The lower element is the unconditional variance, while the upper product gives
the fourth moment. The first expression in square brackets is three times the
squared variance. For α_1 ≠ 0, the second term is strictly greater than one,
implying a fourth moment greater than that of a normal random variable.
The first-order ARCH process generates data with fatter tails than the normal
density. Many statistical procedures have been designed to be robust to large
errors, but to the author's knowledge, none of this literature has made use of the
fact that temporal clustering of outliers can be used to predict their occurrence
and minimize their effects. This is exactly the approach taken by the ARCH
model.
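These moment formulas are easy to check numerically (again an added illustration, not from the paper): simulating a first-order process with 3α_1² < 1 reproduces the variance and fourth moment of (15), and the sample kurtosis exceeds that of the normal. A sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

a0, a1, T = 1.0, 0.3, 400_000         # 3 * a1**2 < 1: fourth moment exists
y = np.zeros(T)
y[0] = rng.normal() * np.sqrt(a0 / (1 - a1))
for t in range(1, T):
    h = a0 + a1 * y[t - 1] ** 2
    y[t] = rng.normal() * np.sqrt(h)

var_theory = a0 / (1 - a1)            # lower element of (15)
m4_theory = (3 * a0**2 / (1 - a1)**2) * (1 - a1**2) / (1 - 3 * a1**2)
kurt = (y**4).mean() / (y**2).mean() ** 2

print((y**2).mean(), var_theory)      # both near 1.43
print((y**4).mean(), m4_theory)       # both near 7.6
print(kurt)                           # above 3: fatter tails than the normal
```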
The conditions for a first-order linear ARCH process to have a finite variance
and, therefore, to be covariance stationary can directly be generalized for
pth-order processes.
Two alternatives to the linear form are the exponential and absolute value variance functions:

(16) h_t = exp(α_0 + α_1 y²_{t−1}),
(17) h_t = α_0 + α_1 |y_{t−1}|.
These provide an interesting contrast. The exponential form has the advantage
that the variance is positive for all values of the parameters, but it is not difficult to show
that data generated from such a model have infinite variance for any value of
α_1 ≠ 0. The implications of this deserve further study. The absolute value form
requires both parameters to be positive, but can be shown to have finite variance
for any parameter values.
In order to find estimation results which are more general than the linear
model, general conditions on the variance model will be formulated and shown
to be implied for the linear process.
Let ξ_t be a p × 1 random vector drawn from the sample space Ξ, which has
elements ξ_t = (ε_{t−1}, ..., ε_{t−p}). For any ξ_t, let ξ_t* be identical, except that the mth
element has been multiplied by −1, where m lies between 1 and p.
All the functions described have been symmetric. This condition is the main
distinction between mean and variance models.
Another characterization of general ARCH models is in terms of regularity
conditions.
The first portion of the definition is very important and easy to check, as it
requires the variance always to be positive. This eliminates, for example, the
log-log autoregression. The second portion is difficult to check in some cases, yet
should generally be true if the process is stationary with bounded derivatives,
since conditional expectations are finite if unconditional ones are. Condition (b)
is a sufficient condition for the existence of some expectations of the Hessian
used in Theorem 4. Presumably weaker conditions could be found.
THEOREM 3: The pth-order linear ARCH model satisfies the regularity conditions if α_0 > 0 and α_1, ..., α_p ≥ 0.
If the ARCH random variables discussed thus far have a non-zero mean,
which can be expressed as a linear combination of exogenous and lagged
dependent variables, then a regression framework is appropriate, and the model
can be written as in (4) or (5). An alternative interpretation for the model is that
the disturbances in a linear regression follow an ARCH process.
In the pth-order linear case, the specification and likelihood are given by

(18) y_t | ψ_{t−1} ~ N(x_t β, h_t),
     h_t = α_0 + α_1 ε²_{t−1} + ... + α_p ε²_{t−p},
     ε_t = y_t − x_t β,

(19) l = (1/T) Σ_t l_t,    l_t = −(1/2) log h_t − (1/2) ε_t²/h_t,
where x_t may include lagged dependent and exogenous variables and an irrelevant
constant has been omitted from the likelihood. This likelihood function can
be maximized with respect to the unknown parameters α and β. Attractive
methods for computing such an estimate and its properties are discussed below.
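The likelihood above can be evaluated directly. The following sketch (an added illustration; the function name and the data are hypothetical) computes the average of −l_t for an ARCH(p) regression, omitting the same irrelevant constant:

```python
import numpy as np

def arch_neg_loglik(params, y, X, p):
    """Average negative log likelihood for the ARCH(p) regression:
    l_t = -0.5*log(h_t) - 0.5*eps_t**2/h_t, constant omitted."""
    k = X.shape[1]
    beta, alpha = params[:k], params[k:]
    eps = y - X @ beta
    total = 0.0
    for t in range(p, len(y)):
        h = alpha[0] + sum(alpha[j] * eps[t - j] ** 2 for j in range(1, p + 1))
        total += -0.5 * np.log(h) - 0.5 * eps[t] ** 2 / h
    return -total / (len(y) - p)

# Sanity check on homoscedastic data (alpha_1 = 0): the value should be
# near 0.5*log(sigma^2) + 0.5 = 0.5 when sigma^2 = 1.
rng = np.random.default_rng(3)
T = 1000
X = np.ones((T, 1))
y = 2.0 + rng.normal(size=T)
nll = arch_neg_loglik(np.array([2.0, 1.0, 0.0]), y, X, p=1)
print(nll)
```

A function of this form could be handed to a numerical optimizer, though the scoring algorithm described below exploits the structure of the problem instead.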
Under the assumptions in (18), the ordinary least squares estimator of β is still
consistent as x and E are uncorrelated through the definition of the regression as
a conditional expectation. If the x's can be treated as fixed constants then the
least squares standard errors will be correct; however, if there are lagged
dependent variables in x_t, the standard errors as conventionally computed will
not be consistent, since the squares of the disturbances will be correlated with
(20) ∂l_t/∂β = x_t′ε_t/h_t + (1/(2h_t)) (∂h_t/∂β)(ε_t²/h_t − 1).
The first term is the familiar first-order condition for an exogenous heteroscedastic
correction; the second term results because h_t is also a function of the β's,
as in Amemiya [1]. Substituting the linear variance function gives
(22) ∂l_t/∂β = x_t′ε_t/h_t − (1/h_t)[Σ_{j=1}^{p} α_j ε_{t−j} x_{t−j}′](ε_t²/h_t − 1).

The Hessian is

(23) ∂²l_t/∂β∂β′ = −x_t′x_t/h_t − (ε_t²/h_t)(1/(2h_t²))(∂h_t/∂β)(∂h_t/∂β′)
     + (ε_t²/h_t − 1) (∂/∂β′)[(1/(2h_t)) ∂h_t/∂β]
     − (ε_t/h_t²)[x_t′(∂h_t/∂β′) + (∂h_t/∂β) x_t].
Taking conditional expectations of the Hessian, the last two terms vanish
because h_t is entirely a function of the past. Similarly, ε_t²/h_t becomes one, since it
is the only current value in the second term. Notice that these results hold
regardless of whether xt includes lagged-dependent variables. The information
matrix is the average over all t of the expected value of the conditional
expectation and is, therefore, given by
For the pth order linear ARCH regression this is consistently estimated by

(24) Î_ββ = (1/T) Σ_t [x_t′x_t/h_t + (1/(2h_t²)) (∂h_t/∂β)(∂h_t/∂β′)].
By gathering terms in x_t′x_t, (24) can be rewritten, except for end effects, as

(25) Î_ββ = (1/T) Σ_t x_t′x_t r_t,    r_t = 1/h_t + 2ε_t² Σ_{j=1}^{p} α_j²/h_{t+j}².
(27) θ^{i+1} = θ^i + [Î^i]^{−1} (∂l/∂θ)^i,

where Î^i and (∂l/∂θ)^i are evaluated at θ^i. The advantage of this algorithm is
partly that it requires only first derivatives of the likelihood function in this case
and partly that it uses the statistical properties of the problem to tailor the
algorithm to this application.
For the pth-order linear model, the scoring step for α can be rewritten by
substituting (12), (13), and (14) into (27) and interpreting the y's as the residuals e_t.
The iteration is simply

(28) α^{i+1} = α^i + (z̃′z̃)^{−1} z̃′f̃,

where z̃_t = z_t/h_t, f̃_t = (e_t² − h_t)/h_t, and z̃ and f̃ stack these elements over t.
In these expressions, e_t is the residual from iteration i, h_t^i is the estimated
conditional variance, and α^i is the estimate of the vector of unknown parameters
from iteration i. Each step is, therefore, easily constructed from a least-squares
regression on transformed variables. The variance-covariance matrix of the
parameters is consistently estimated by the inverse of the estimate of the
information matrix divided by T, which is simply 2(z̃′z̃)^{−1}. This differs slightly
from s²(z̃′z̃)^{−1} as computed by the auxiliary regression. Asymptotically, s² = 2 if
the distributional assumptions are correct, but it is not clear which formulation is
better in practice.
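One scoring step of (28) can be sketched as the auxiliary least-squares regression just described (an added illustration on simulated data; names are hypothetical):

```python
import numpy as np

def scoring_step_alpha(alpha, eps, p):
    """One scoring update for the linear ARCH(p) parameters: regress
    f_t = (e_t^2 - h_t)/h_t on z_t/h_t and add the coefficients to alpha."""
    rows, f = [], []
    for t in range(p, len(eps)):
        z = np.r_[1.0, eps[t - p:t][::-1] ** 2]   # (1, e_{t-1}^2, ..., e_{t-p}^2)
        h = z @ alpha
        rows.append(z / h)                        # z-tilde
        f.append((eps[t] ** 2 - h) / h)           # f-tilde
    Z = np.asarray(rows)
    delta, *_ = np.linalg.lstsq(Z, np.asarray(f), rcond=None)
    return alpha + delta

rng = np.random.default_rng(4)
T, a0, a1 = 20000, 1.0, 0.4
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rng.normal() * np.sqrt(a0 + a1 * eps[t - 1] ** 2)

alpha = np.array([0.9, 0.3])       # starting values near the truth
for _ in range(5):
    alpha = scoring_step_alpha(alpha, eps, p=1)
print(alpha)   # should settle near (1.0, 0.4)
```

Each update is an ordinary least-squares regression on transformed variables, as the text describes.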
The parameters in α must satisfy some nonnegativity conditions and some
stationarity conditions. These could be imposed via penalty functions or the
parameters could be estimated and checked for conformity. The latter approach
is used here, although a perhaps useful reformulation of the model might employ
squares to impose the nonnegativity constraints directly. The scoring step for β
likewise takes the form of a least-squares regression on transformed variables:

(30) β^{i+1} = β^i + (x̃′x̃)^{−1} x̃′ẽ.
Thus, an ordinary least-squares program can again perform the scoring iteration,
and (x̃′x̃)^{−1} from this calculation will be the final variance-covariance matrix of
the maximum likelihood estimates of β.
Under the conditions of Crowder's [7] theorem for martingales, it can be
established that the maximum likelihood estimators α̂ and β̂ are asymptotically
normally distributed with limiting distribution
(32) √T(α̂ − α) →_D N(0, I_αα^{−1}),

(33) √T(β̂ − β) →_D N(0, I_ββ^{−1}).
With x exogenous, the expectation is only necessary over the scale factor.
Because the disturbance process is stationary, the variance-covariance matrix is
proportional to that for OLS and the relative efficiency depends only upon the
scale factors. The relative efficiency of MLE to OLS is, therefore,
(34) R = (1 + γ) E[1/(1 + γu_t²)] + 2γ² E[u_t²/(1 + γu_t²)²],

where u_t² = y_t²(1 − α_1)/α_0, so that E(u_t²) = 1, and γ = α_1/(1 − α_1), so that
h_{t+1} = α_0(1 + γu_t²).
Under the null hypothesis α_1 = ... = α_p = 0, the variance is constant at h_0 = α_0,
and the score and estimated information are

∂l/∂α |_0 = (1/(2h_0)) Σ_t z_t′ (e_t²/h_0 − 1) = (1/(2h_0)) z′f^0,
Î_αα^0 = (1/(2h_0²)) z′z/T,

where f^0 has typical element e_t²/h_0 − 1 and z has rows z_t, so that the Lagrange
multiplier statistic is

ξ = (1/2) f^0′ z (z′z)^{−1} z′ f^0.
This is the form used by Breusch and Pagan [4] and Godfrey [12] for testing for
heteroscedasticity. As they point out, all reference to the h function has dis-
appeared and, thus, the test is the same for any h which is a function only of z_t α.
In this problem, the expectation required in the information matrix could be
evaluated quite simply under the null; this could have superior finite sample
performance. A second simplification, which is appropriate for this model as well
as the heteroscedasticity model, is to note that plim f^0′f^0/T = 2 because normality
has already been assumed. Thus, an asymptotically equivalent statistic would
be ξ* = TR², where R² is the squared multiple correlation between f^0 and z; since
adding a constant and multiplying by a scalar do not change the R² of a regression,
this is also the R² of the regression of e_t² on an intercept and p lagged squared
residuals.
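In the TR² form, one regresses the squared residuals on an intercept and p lagged squared residuals and refers T times the R² of that regression to a χ²(p) distribution. A minimal sketch (an added illustration on simulated data; names are hypothetical):

```python
import numpy as np

def arch_lm_test(eps, p):
    """TR^2 Lagrange multiplier test for ARCH: regress eps_t^2 on an
    intercept and p lagged squared residuals; T*R^2 ~ chi^2(p) under H0."""
    e2 = eps ** 2
    dep = e2[p:]
    cols = [np.ones(len(dep))] + [e2[p - j:len(e2) - j] for j in range(1, p + 1)]
    Z = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    resid = dep - Z @ b
    r2 = 1.0 - resid.var() / dep.var()
    return len(dep) * r2

rng = np.random.default_rng(5)
e_null = rng.normal(size=2000)                 # homoscedastic: no ARCH
e_arch = np.zeros(2000)
for t in range(1, 2000):
    e_arch[t] = rng.normal() * np.sqrt(1.0 + 0.5 * e_arch[t - 1] ** 2)

print(arch_lm_test(e_null, p=1))   # small: consistent with chi^2(1)
print(arch_lm_test(e_arch, p=1))   # large: rejects homoscedasticity
```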
Economic theory frequently suggests that economic agents respond not only to
the mean, but also to higher moments of economic random variables. In
financial theory, the variance as well as the mean of the rate of return are
determinants of portfolio decisions. In macroeconomics, Lucas [16], for example,
The model has typical seasonal behavior with the first, fourth, and fifth lags of
the first difference. The lagged value of the real wage is the error correction
mechanism of Davidson, et al. [8], which restricts the lag weights to give a
constant real wage in the long run. As this is a reduced form, the current wage
rate cannot enter.
The least squares estimates of this model are given in Table I. The fit is quite
good, with less than 1 per cent standard error of forecast, and all t statistics
greater than 3. Notice that p_{t−4} and p_{t−5} have equal and opposite signs, suggesting
that it is the acceleration of inflation one year ago which explains much of the
short-run behavior in prices.
TABLE I
ORDINARY LEAST SQUARES ESTIMATES OF (36)a

TABLE II
MAXIMUM LIKELIHOOD ESTIMATES OF ARCH MODEL (36) AND (37): ONE-STEP SCORING ESTIMATESa

TABLE III
MAXIMUM LIKELIHOOD ESTIMATES OF ARCH MODEL (36) AND (37): ITERATED ESTIMATESa

a Dependent variable p = log(P) − log(P_{−1}), where P is the quarterly U.K. consumer price index, and w = log(W), where W
is the U.K. index of manual wage rates. Sample period 1958-II to 1977-II.
the coefficient on the long run, as incorporated in the error correction mechanism.
The acceleration term is not so clearly implied as in the least squares
estimates. These seem reasonable results, since much of the inflationary dynamics
are estimated by a period of very severe inflation in the middle seventies.
This, however, is also the period of the largest forecast errors and, hence, the
maximum likelihood estimator will discount these observations. By the end of the
sample period, inflationary levels were rather modest and one might expect that
the maximum likelihood estimates would provide a better forecasting equation.
The standard errors for ordinary least squares are generally greater than for
maximum likelihood. The least squares standard errors are 15 per cent to 25 per
cent greater, with one exception where the standard error actually falls by 5 per
cent to 7 per cent. As mentioned earlier, however, the least squares standard errors are
biased when there are lagged dependent variables. The Wald test for α_1 = 0 is
also significant.
The final estimates of h_t are the one-step-ahead forecast variances. For the
one-step scoring estimator, these vary from 23 × 10⁻⁶ to 481 × 10⁻⁶. That is, the
forecast standard deviation ranges from 0.5 per cent to 2.2 per cent, which is
more than a factor of 4. The average of the h_t since 1974 is 230 × 10⁻⁶, as
compared with 42 × 10⁻⁶ during the last four years of the sixties. Thus, the
standard deviation of inflation increased from 0.6 per cent to 1.5 per cent over a
few years, as the economy moved from the rather predictable sixties into the
chaotic seventies.
In order to determine whether the confidence intervals arising from the ARCH
model were superior to the least squares model, the outliers were examined. The
expected number of residuals exceeding two (conditional) standard deviations is
3.5. For ordinary least squares, there were 5 while ARCH produced 3. For least
squares these occurred in '74-I, '75-I, '75-II, '75-IV, and '76-II; they all occur
within three years of each other and, in fact, three of them are in the same year.
For the ARCH model, they are much more spread out and only one of the least
squares points remains an outlier, although the others are still large. Examining
the observations exceeding one standard deviation shows similar effects. In the
seventies, there were 13 OLS and 12 ARCH residuals outside one sigma, which
are both above the expected value of 9. In the sixties, there were 6 for OLS, 10
for ARCH and an expected number of 12. Thus, the number of outliers for
APPENDIX
PROOF OF THEOREM 1: Let

w_t = (y_t^{2r}, y_t^{2(r−1)}, ..., y_t²)′.

First, it is shown that there is an upper triangular r × r matrix A and r × 1 vector b such that

(A2) E(w_t | ψ_{t−1}) = b + A w_{t−1}.

Because the conditional density of y_t is normal, the mth even conditional moment is

(A3) E(y_t^{2m} | ψ_{t−1}) = (α_1 y²_{t−1} + α_0)^m ∏_{j=1}^{m} (2j − 1).

Expanding this expression establishes that the moment is a linear combination of w_{t−1}. Furthermore,
only powers of y less than or equal to 2m are required; therefore, A in (A2) is upper triangular.
Now

E(w_t | ψ_{t−2}) = b + A(b + A w_{t−2}),

or in general, iterating k times,

(A4) lim_{k→∞} E(w_t | ψ_{t−k−1}) = (I + A + A² + ...)b = (I − A)^{−1} b

when all the eigenvalues of A lie inside the unit circle. This does not depend upon the conditioning variables and does not depend upon t. Hence, this is
an expression for the stationary moments of the unconditional distribution of y.
It remains only to establish that the condition in the theorem is necessary and sufficient to have all
eigenvalues lie within the unit circle. As the matrix has already been shown to be upper triangular,
the diagonal elements are the eigenvalues. From (A3), it is seen that the diagonal elements are simply
θ_m = α_1^m ∏_{j=1}^{m} (2j − 1)

for m = 1, ..., r. If θ_r exceeds or equals unity, the eigenvalues do not all lie in the unit circle. It must
also be shown that if θ_r < 1, then θ_m < 1 for all m < r. Notice that θ_m is a product of m factors which
are monotonically increasing. If the mth factor is greater than one, then θ_{m−1} will necessarily be
smaller than θ_m. If the mth factor is less than one, all the other factors must also be less than one and,
therefore, θ_{m−1} must also have all factors less than one and have a value less than one. This
establishes that a necessary and sufficient condition for all diagonal elements to be less than one is
that θ_r < 1, which is the statement in the theorem. Q.E.D.
w_t = (y_t², y²_{t−1}, ..., y²_{t−p+1})′.
As this does not depend upon initial conditions or on t, this vector is the common variance for all t.
As is well known in time series analysis, this condition is equivalent to the condition that all the roots
of the characteristic equation, formed from the a's, lie outside the unit circle. See Anderson [2, p.
177]. Finally, the limit of the first element can be rewritten as
PROOF OF THEOREM 3: Clearly, under the conditions, h(ξ_t) ≥ α_0 > 0, establishing part (a). Let
φ_{i,m,t} denote the conditional expectation required for part (b).
Now there are three cases: i > m, i = m, and i < m. If i > m, then ε_{t−i} is included in ψ_{t−m−1} and the
conditional expectation of |ε_{t−i}|³ is finite, because the conditional density is normal. If i = m, then
the expectation becomes E(|ε_{t−m}|³ | ψ_{t−m−1}). Again, because the conditional density is normal, all
1006 ROBERT F. ENGLE
moments exist including the expectation of the third power of the absolute value. If i < m, the
expectation is taken in two parts, first with respect to t - i - 1:
φ_{i,m,t} = 2α_i α_0 E(|ε_{t−m}| | ψ_{t−m−1}) + Σ_j α_j φ_{i+j,m,t}.
In the final expression, the initial index on φ is larger and, therefore, may fall into either of the
preceding cases, which, therefore, establishes the existence of the term. If there remain terms with
i + j < m, the recursion can be repeated. As all lags are finite, an expression for φ_{i,m,t} can be written
as a constant times the third absolute moment of ε_{t−m} conditional on ψ_{t−m−1}, plus another constant
times the first absolute moment. As these are both moments of a conditional normal, and as the constants must be
finite as they have a finite number of terms, the second part of the regularity condition has been
established. Q.E.D.
To establish Theorem 4, a careful symmetry argument is required, beginning with the following
lemma.
LEMMA: Let u and v be any two random variables. E(g(u, v) | v) will be an anti-symmetric function
of v if g is anti-symmetric in v, the conditional density of u | v is symmetric in v, and the expectation
exists.
PROOF:
Q.E.D.
PROOF OF THEOREM 4: A typical element of the off-diagonal block of the information matrix is

I_{α_i β_j} = (1/(2T)) Σ_t E[(1/h_t²)(∂h_t/∂α_i)(∂h_t/∂β_j)]
           = −(1/(2T)) Σ_t Σ_m E[x_{t−m,j}(1/h_t²)(∂h_t/∂α_i)(∂h_t/∂ε_{t−m})]

by the chain rule.
If the expectation of the term in square brackets, conditional on At-m- m is zero for all i, j, t, m, then
the theorem is proven.
E[x_{t−m,j}(1/h_t²)(∂h_t/∂α_i)(∂h_t/∂ε_{t−m}) | ψ_{t−m−1}]
= x_{t−m,j} E[(1/h_t²)(∂h_t/∂α_i)(∂h_t/∂ε_{t−m}) | ψ_{t−m−1}],

because x_{t−m} is either exogenous or it is a lagged dependent variable, in which case it is included in
ψ_{t−m−1}.
Furthermore,

|E[(1/h_t²)(∂h_t/∂α_i)(∂h_t/∂ε_{t−m}) | ψ_{t−m−1}]|
≤ E[(1/h_t²) |∂h_t/∂α_i| |∂h_t/∂ε_{t−m}| | ψ_{t−m−1}]
≤ (1/δ²) E[|∂h_t/∂α_i| |∂h_t/∂ε_{t−m}| | ψ_{t−m−1}]
by part (a) of the regularity conditions and this integral is finite by part (b) of the condition. Hence,
each term is finite. Now take the expectation in two steps, first with respect to ψ_{t−m}; this must
therefore also be finite. Finally, (∂h_t/∂α_i)(∂h_t/∂ε_{t−m})/h_t² is anti-symmetric in ε_{t−m}, while the
conditional density of ε_{t−m} is symmetric, so by the lemma the conditional expectation is zero, which
establishes the theorem. Q.E.D.
REFERENCES
[1] AMEMIYA, T.: "Regression Analysis when the Variance of the Dependent Variable is Propor-
tional to the Square of Its Expectation," Journal of the American Statistical Association,
68(1973), 928-934.
[2] ANDERSON, T. W.: The Statistical Analysis of Time Series. New York: John Wiley and Sons,
1958.
[3] BELSLEY, DAVID: "On the Efficient Computation of Non-Linear Full-Information Maximum
Likelihood Estimator," paper presented to the European Meetings of the Econometric Society,
Athens, 1979.
[4] BREUSCH, T. S., AND A. R. PAGAN: "A Simple Test for Heteroscedasticity and Random
Coefficient Variation," Econometrica, 46(1978), 1287-1294.
[5] : "The Lagrange Multiplier Test and Its Applications to Model Specification," Review of
Economic Studies, 47(1980), 239-254.
[6] Cox, D. R., AND D. V. HINKLEY: Theoretical Statistics. London: Chapman and Hall, 1974.
[7] CROWDER, M. J.: "Maximum Likelihood Estimation for Dependent Observations," Journal of
the Royal Statistical Society, Series B, 38(1976), 45-53.
[8] DAVIDSON, J. E. H., D. F. HENDRY, F. SRBA, AND S. YEO: "Econometric Modelling of the
Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the
United Kingdom," The Economic Journal, 88(1978), 661-691.
[9] ENGLE, R. F.: "A General Approach to the Construction of Model Diagnostics Based upon the
Lagrange Multiplier Principle," University of California, San Diego Discussion Paper 79-43,
1979.
[10] : "Estimates of the Variance of U.S. Inflation Based on the ARCH Model," University of
California, San Diego Discussion Paper 80-14, 1980.
[11] FRIEDMAN, MILTON: "Nobel Lecture: Inflation and Unemployment," Journal of Political Econ-
omy, 85(1977), 451-472.
[12] GODFREY, L. G.: "Testing Against General Autoregressive and Moving Average Error Models
When the Regressors Include Lagged Dependent Variables," Econometrica, 46(1978), 1293-
1302.
[13] GRANGER, C. W. J., AND A. ANDERSEN: An Introduction to Bilinear Time-Series Models.
Gottingen: Vandenhoeck and Ruprecht, 1978.
[14] KHAN, M. S.: "The Variability of Expectations in Hyperinflations," Journal of Political Economy,
85(1977), 817-827.
[15] KLEIN, B.: "The Demand for Quality-Adjusted Cash Balances: Price Uncertainty in the U.S.
Demand for Money Function," Journal of Political Economy, 85(1977), 692-715.
[16] LUCAS, R. E., JR.: "Some International Evidence on Output-Inflation Tradeoffs," American
Economic Review, 63(1973), 326-334.
[17] McNEES, S. S.: "The Forecasting Record for the 1970's," New England Economic Review,
September/October 1979, 33-53.
[18] WHITE, H.: "A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test for
Heteroscedasticity," Econometrica, 48(1980), 817-838.