Working paper No. 9904
September 1999

The Likelihood of Multivariate GARCH Models is Ill-Conditioned

Miguel Jerez
José Casals
Sonia Sotoca

Instituto Complutense de Análisis Económico
Universidad Complutense de Madrid
Facultad de Económicas, Campus de Somosaguas
28223 Madrid
Telephone 913942611 - FAX 91 2942613
Internet: http://www.ucm.es/info/icae/
E-mail: icaesec@ccee.ucm.es
ABSTRACT
The likelihood of multivariate GARCH models is ill-conditioned for two reasons. First, financial time
series often display high correlations, implying that an eigenvalue of the conditional covariances fluctuates
near the zero boundary. Second, GARCH models explain conditional covariances in terms of a linear
combination of delayed squared errors and their conditional expectation; this functional form implies that
the likelihood function is almost flat in the neighborhood of the optimal estimates. Building on this
analysis we propose a linear transformation of the data which not only stabilizes the likelihood computation,
but also provides insight into the statistical properties of the data. The use of this transformation is illustrated
by modeling the short-run conditional correlations of four nominal exchange rates.
Key words: ARCH, GARCH, maximum-likelihood
JEL classification: C130, C320, C510.
Mailing address: Departamento de Fundamentos del Análisis Económico II, Universidad Complutense,
Campus de Somosaguas, 28223 Madrid, Spain. E-mail: mjerez@ccee.ucm.es.
1. Introduction.

Since the seminal paper of Engle (1982), many works describe the volatility of financial yields using
models with conditional heteroskedastic errors. Univariate models in the ARCH family are useful to
measure and forecast the volatility of single assets. While this is important, problems of risk-assessment,
asset-allocation, hedging and options pricing require knowledge of the properties of multivariate series.
Often, these properties can be represented adequately by means of a vector GARCH model.

In our experience, maximum-likelihood (ML) estimation of multivariate GARCH models
often implies:

1) a high computational cost,
2) sensitivity of the estimates to changes in both the sample and the initial conditions of the iterative
algorithm,
3) frequent iteration on solutions where conditional covariances have negative eigenvalues and, because
of this,
4) non-convergence or convergence to solutions with nonzero gradient. This 'false convergence'
situation happens because many nonlinear algorithms stop when changes in the function or
parameter values are considered small enough. In an ill-conditioned case, these heuristic criteria can
be satisfied in solutions with a nonzero gradient.

This paper analyzes the causes of such bad behavior. We conclude that it is due to a) the fact that financial
time series often exhibit high unconditional correlations and b) identifiability problems derived from the
functional form of GARCH processes. We will refer to these problems as "high correlations" and
"identifiability".

Poor identifiability is implied by the functional form of the GARCH model. It explains the conditional
covariance as a function of delayed cross-products of errors and the conditional expectation of these
cross-products. Obviously these variables share much common information and, in the neighborhood of the
optimal estimates, are bound to be very similar. Therefore, point-estimates of the parameters will be
highly correlated and imprecise. On the other hand, poor identifiability does not affect the capacity of a
GARCH model to describe and forecast volatility and, except in extreme situations, should not
compromise the stability of ML algorithms.

The issue of high correlations is more critical. It implies that at least one eigenvalue of the
unconditional covariance is close to zero. Then, the smallest eigenvalues of conditional covariances
fluctuate near the zero boundary and, in a context of iterative nonlinear methods, it is easy to iterate on a
trial solution where conditional covariances are not positive-definite. In this situation computing the
likelihood results in unbounded or mathematically undefined operations.

When both identifiability and high correlation problems occur, a) the likelihood function is almost flat
in the neighborhood of the optimal estimates and b) this point is close to the zone of the parametric space
where covariances are not positive-semidefinite. This situation spells disaster for iterative ML methods.

Building on this analysis we propose a linear transformation of data designed to project the eigenvalues
of conditional covariances far from the zero boundary and to optimize their relative value. This
transformation is closely related to principal components and is useful not only to stabilize the
computation of the likelihood, but also to analyze the statistical properties of the sample.

The structure of the paper is as follows. Section 2 states the problem of likelihood computation on standard
grounds. Section 3 describes in detail the problems summarized above and discusses their implications.
Section 4 defines the stabilizing data transformation and characterizes its properties. Section 5 applies this
data transformation to model the short-run conditional correlations of four nominal exchange rates. Finally,
Section 6 discusses previous results and summarizes the main conclusions.

2. Problem statement and notation.

Consider a (k×1) random vector $y_t$ which, by means of an econometric model, is decomposed as
$y_t = E_{t-1}(y_t) + e_t$, where $E_{t-1}(\cdot)$ denotes the expectation of the argument conditional on the information set up to
$t-1$, $\Omega_{t-1}$. In a conditional heteroskedastic framework, the errors $e_t$ are such that $e_t \sim iid(0, \Sigma)$,
$e_t \mid \Omega_{t-1} \sim iid(0, \Sigma_t)$.

Assume without loss of generality that $y_t = e_t$. If the conditional covariance $\Sigma_t$ depends on a vector $\theta$ of
unknown parameters, the minus log gaussian likelihood of a sample of size N is:

$$\ell(e_1, \ldots, e_N) = \frac{Nk}{2}\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left[\ln|\Sigma_t| + e_t^T \Sigma_t^{-1} e_t\right] \tag{1}$$

The literature proposes different ways to parametrize $\Sigma_t$. Many formulations are encompassed by the
multivariate GARCH(p,q). To avoid unnecessary complications, in the rest of the paper we will assume
that p=q=1. The vector GARCH(1,1) model is characterized by:

$$\mathrm{vech}(\Sigma_t) = w + A\,\mathrm{vech}(e_{t-1}e_{t-1}^T) + B\,\mathrm{vech}(\Sigma_{t-1}) \tag{2}$$

where vech(.) denotes the vector-half operator, which stacks the lower triangle of an N×N symmetric
matrix into a [N(N+1)/2]×1 vector.
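The likelihood (1) under the recursion (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`vech`, `unvech`, `minus_loglik`) and the start-up value `Sigma0` for the first conditional covariance are assumptions introduced here. The function returns `nan` as soon as a conditional covariance has a non-positive eigenvalue, which is the situation where the log-determinant in (1) is undefined.

```python
import numpy as np

def vech(S):
    """Stack the lower triangle of a symmetric matrix, column by column."""
    k = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(k)])

def unvech(v, k):
    """Rebuild the symmetric k x k matrix from its vector-half."""
    S = np.zeros((k, k))
    pos = 0
    for j in range(k):
        S[j:, j] = v[pos:pos + k - j]
        S[j, j:] = v[pos:pos + k - j]
        pos += k - j
    return S

def minus_loglik(e, w, A, B, Sigma0):
    """Minus log gaussian likelihood (1) under the vech-GARCH(1,1) recursion (2).

    e is an (N, k) array of errors; Sigma0 is an assumed start-up covariance.
    Returns (value, smallest eigenvalue seen); value is nan if some
    conditional covariance is not positive-definite.
    """
    N, k = e.shape
    h = vech(Sigma0)
    total = 0.5 * N * k * np.log(2.0 * np.pi)
    min_eig = np.inf
    for t in range(N):
        Sigma_t = unvech(h, k)
        eig = np.linalg.eigvalsh(Sigma_t)          # ascending order
        min_eig = min(min_eig, eig[0])
        if eig[0] <= 0.0:
            return np.nan, min_eig                 # ln|Sigma_t| undefined
        quad = e[t] @ np.linalg.solve(Sigma_t, e[t])
        total += 0.5 * (np.sum(np.log(eig)) + quad)
        # recursion (2): next conditional covariance in vech form
        h = w + A @ vech(np.outer(e[t], e[t])) + B @ h
    return total, min_eig
```

Tracking the smallest eigenvalue alongside the likelihood value makes it easy to see how close a trial solution is to the zero boundary.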
The following remarks summarize some features of model (2) that will be used in the rest of the paper:

1) It has a large number of parameters, even for moderate sizes of k. Many authors worry about this
lack of parsimony and suggest simplifying assumptions like diagonal structure (Bollerslev et al.,
1988) or constant correlations (Bollerslev, 1990).

2) The functional form (2) does not assure the positive-definiteness of conditional covariances. In fact,
this is a very difficult condition to impose except in drastically simplified versions of the model.

3) By definition, the variables in the right-hand-side of (2) are such that:

$$\mathrm{vech}(e_t e_t^T) = \mathrm{vech}(\Sigma_t) + v_t \tag{3}$$

where $v_t$ is (conditionally and unconditionally) a zero-mean uncorrelated process with a complex
heteroskedasticity (Bollerslev, 1988, pp. 123).

4) Generalizing the univariate result in Bollerslev (1988), the decomposition (3) allows one to express
(2) as a VARMA(1,1) model:

$$(I - \Phi L)\,\mathrm{vech}(e_t e_t^T) = w + (I - \Theta L)\,v_t \tag{4}$$

where L is the lag operator, $v_t$ are the innovations defined in (3) and the AR and MA factors are
related to the polynomials in (2) by $\Phi = A + B$ and $\Theta = B$, respectively. If model (2) is such that
the roots of $|I - \Phi\lambda| = 0$ lie outside the unit circle, then (4) can be written as:

$$\mathrm{vech}(e_t e_t^T) = \mathrm{vech}(\Sigma) + (I - \Phi L)^{-1}(I - \Theta L)\,v_t \tag{5}$$

where the constant term is the vector-half of the unconditional covariance:

$$\mathrm{vech}(\Sigma) = (I - \Phi)^{-1} w \tag{6}$$

Unless otherwise indicated we will use the representation (5)-(6), keeping in mind that it is observationally
equivalent to the standard form (2).

3. Sources of ill-conditioning in likelihood computation.

3.1. High correlations.

Financial time series often display high unconditional correlations. Some explanations of this empirical
regularity may be a) common statistical features of data, b) common factors due to the nature of the series
(e.g. exchange rates are often related to a single currency) or c) simultaneous volatility clusters. In terms
of principal components, high correlations imply that there is at least one quasi-deterministic linear
combination of the series, characterized by a small eigenvalue of the unconditional covariance. In this
situation the smallest eigenvalues of conditional covariances will fluctuate near the zero boundary.

Taking into account the form of the log-likelihood function (1), this situation is dangerous because:

1) Iterating on a solution $\hat\theta$, where $\Sigma_t(\hat\theta)$ has small eigenvalues, may yield floating-point errors
or unbounded results when computing:
1.1) the sequences $\ln|\Sigma_t(\hat\theta)|$ and $\Sigma_t(\hat\theta)^{-1}$ (t = 1, ..., N) in (1).
1.2) the first and second-order derivatives of (1), which are functions of $\Sigma_t(\hat\theta)^{-1}$.

2) If $\Sigma_t(\hat\theta)$ has some negative eigenvalues, computation of $\ln|\Sigma_t(\hat\theta)|$ (t = 1, ..., N) results in
mathematically undefined operations. Besides, many ML algorithms rely on the use of the Cholesky
decomposition to avoid the explicit inversion of covariance matrices. As Cholesky factors require
these matrices to be positive-definite, negative eigenvalues also induce errors in this way when
computing the function (1) or its derivatives. In our experience, simple perturbation
techniques help to avoid runtime errors, but are not useful to achieve convergence.

The following example illustrates the effect of high correlations on the eigenvalues of conditional
covariances.

Example 1. Consider the bivariate GARCH(1,1) model expressed in the form (5):

$$\begin{bmatrix} e_{1t}^2 \\ e_{1t}e_{2t} \\ e_{2t}^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 \\ \sigma_{12} \\ \sigma_2^2 \end{bmatrix} + \begin{bmatrix} 1-.97L & 0 & 0 \\ 0 & 1-.90L & 0 \\ 0 & 0 & 1-.85L \end{bmatrix}^{-1} \begin{bmatrix} 1-.86L & 0 & 0 \\ 0 & 1-.80L & 0 \\ 0 & 0 & 1-.73L \end{bmatrix} \begin{bmatrix} v_{1t} \\ v_{12t} \\ v_{2t} \end{bmatrix} \tag{7}$$

and the unconditional covariances:

$$\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 1.0 & .8 \\ .8 & 1.0 \end{bmatrix}; \text{ with eigenvalues: } \lambda_1 = 1.8,\ \lambda_2 = .2, \tag{8}$$

and
$$\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 1.0 & .1 \\ .1 & 1.0 \end{bmatrix}; \text{ with eigenvalues: } \lambda_1 = 1.1 \text{ and } \lambda_2 = .9. \tag{9}$$

Note that the ratio between the smallest and largest eigenvalues in the first case ($\lambda_2/\lambda_1 = .111$) is much
lower than in the second case ($\lambda_2/\lambda_1 = .818$). This fact characterizes a (not extreme) ill-conditioned
situation.

The example consists of:

1) Obtaining two realizations with N=300 of a bivariate white noise process $e_t$, whose conditional
covariances are given by model (7)-(8) for the first series, and model (7)-(9) for the second series.

2) Computing the sequences of conditional covariances and the corresponding eigenvalues, using the
true value of the parameters.

Figure 1 represents the smallest and largest eigenvalues of $\Sigma_t(\theta)$ in the ill-conditioned case ($\sigma_{12} = .8$).
Note that the first sequence fluctuates very close to the zero boundary, its extreme values being min=.019
and max=.288. Figure 2 displays the same eigenvalues in the well-conditioned case ($\sigma_{12} = .1$). Note that
the sequence of smallest eigenvalues (min=.354, max=.960) is farther from zero than in the previous case.

[Insert Figure 1]

[Insert Figure 2]

The sequences in Figures 1 and 2 have been computed with the true values of the parameters. A sensitivity
analysis reveals that small perturbations of the parameters in the ill-conditioned case yield negative
eigenvalues. For example, if the MA parameter of the covariance equation in (7) is set to .82 instead of
its true value .80, then the sequence of conditional covariances has several negative eigenvalues, the
smallest being -0.012. In the well-conditioned case, however, the eigenvalues are much more robust. Therefore,
a nonlinear ML algorithm has a higher risk of iterating on a solution with negative eigenvalues when
correlations between the series are high, like those in (8), than when they are small.

3.2. Poor identifiability.

As we said in the Introduction, poor identifiability is due to the functional form of the GARCH model.
To simplify the analysis, we will discuss this problem in a univariate framework. Assume therefore that
$y_t = e_t$, $e_t \sim iid(0, \sigma^2)$, $e_t \mid \Omega_{t-1} \sim iid(0, \sigma_t^2)$. A GARCH(1,1) in the standard form (2) is:

$$\sigma_t^2 = w + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2 \tag{10}$$

According to (3), the variables in the right-hand-side of (10) are related by:

$$e_{t-1}^2 = \sigma_{t-1}^2 + v_{t-1} \tag{11}$$

$v_{t-1}$ being an uncorrelated, zero-mean heteroskedastic noise. Eqs. (10)-(11) imply that:

1) The variables in the right-hand-side of (10), $e_{t-1}^2$ and $\sigma_{t-1}^2$, are such that: $E_{t-2}(e_{t-1}^2) = \sigma_{t-1}^2$.

2) The term $v_{t-1}$ in (11) can be interpreted as the information in $e_{t-1}^2$ which is not contained in $\sigma_{t-1}^2$.
Then, if the information (or variance) of $v_{t-1}$ is low, it will be difficult to obtain independent
estimates of $\alpha$ and $\beta$, whereas some linear combination of these parameters will be identified.

Therefore, the likelihood of (10) is very flat in some directions of the parametric space. It is difficult to
say when this problem will be important, because the support of $v_{t-1}$ changes in time (Bollerslev, 1988,
pp. 123) so its variance is almost impossible to describe analytically. One may guess that if model (10)
shows high persistence (i.e. if $\alpha + \beta \approx 1$) the parameters will be more identifiable because $\sigma_{t-1}^2$ would
be less adaptive to $e_{t-1}^2$ than in a model with less persistence.

The following example illustrates the poor identifiability of a GARCH(1,1) model using simulated data.

Example 2. Consider 500 samples of the process $e_t \sim iid(0, \sigma^2)$, $e_t \mid \Omega_{t-1} \sim iidN(0, \sigma_t^2)$ with conditional
variances following a GARCH(1,1) model in ARMA form:

$$e_t^2 = \sigma^2 + \frac{1 - \theta L}{1 - \phi L}\, v_t \tag{12}$$

with $\sigma^2 = 1$, $\theta = .6$ and $\phi = .7$. The ML estimates of the parameters in (12), their correlations and the
corresponding principal components are summarized in Table 1.

[Insert Table 1]

Note that:

1) Point estimates are close to the true values.

2) The estimate of the unconditional variance is almost orthogonal to the rest of the parameters. This
situation is characterized both by a) small correlations of $\hat\sigma^2$ with $\hat\phi$ and $\hat\theta$, and b) an eigenvalue
of 1.0 associated with the eigenvector [1 .01 -.1].

3) Correlation between $\hat\phi$ and $\hat\theta$ is .98. The highest eigenvalue (1.98) is associated with the
eigenvector [.04 .71 .71], showing that the sum of both parameters is well identified. On the other
hand, the smallest eigenvalue (.02) is associated with the eigenvector [.05 .71 -.71]. The difference
between both estimates, which is the $\alpha$ parameter in (10), is then ill-identified.
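The eigenvalue pattern reported for Example 2 can be reproduced approximately from the quoted correlation alone. The sketch below is a simplification, not the paper's computation: it keeps only the 2×2 block for the AR and MA estimates and ignores the small correlations with the variance estimate, so the eigenvectors have two components instead of three. The helper name `alpha_beta_from_arma` is hypothetical; it uses the relations $\Phi = A + B$ and $\Theta = B$ stated for (4), which in the univariate case give $\alpha = \phi - \theta$ and $\beta = \theta$.

```python
import numpy as np

rho = 0.98                      # correlation between the AR and MA estimates
R = np.array([[1.0, rho],
              [rho, 1.0]])      # 2x2 correlation block (simplifying assumption)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
eigval, eigvec = np.linalg.eigh(R)

def alpha_beta_from_arma(phi, theta):
    """Map ARMA-form parameters of (12) to the standard GARCH(1,1) form (10)."""
    return phi - theta, theta

alpha, beta = alpha_beta_from_arma(0.7, 0.6)

print("eigenvalues:", eigval)
print("direction of the small eigenvalue:", eigvec[:, 0])
print("direction of the large eigenvalue:", eigvec[:, 1])
print("alpha, beta:", alpha, beta)
```

The large eigenvalue belongs to the direction in which both estimates move together, and the small one to their difference, which is the parameter $\alpha$.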
Figure 3 shows the optimal estimates (represented by a '+' sign) corresponding to a log-likelihood of
720.840, and the isoquants of the log-likelihood conditional to $\hat\sigma^2 = 1.065$. The isoquants are chosen to
represent confidence regions for $\phi$ and $\theta$, from a 5% confidence (given by the finer conic section) up to
95% in increments of 10 percent points. The first three isoquants are labeled with the corresponding
likelihood value. This Figure shows that a) big zones of the parametric space have a likelihood similar to
the optimal and b) confidence regions are wide and, therefore, point-estimates are very uncertain.

[Insert Figure 3]

4. Stabilizing likelihood computation.

According to the previous analysis, let $e_t$ be a (k×1) random vector such that:

$$e_t \sim iid(0, \Sigma) \tag{13}$$

$$e_t \mid \Omega_{t-1} \sim iid(0, \Sigma_t) \tag{14}$$

and consider the linear transformation:

$$e_t^* = V e_t \tag{15}$$

where V is a (k×k) matrix of real numbers such that $|V| \neq 0$.

The problem of high correlations, discussed in Section 3.1, arises when an eigenvalue of $\Sigma$ is relatively
small. Then, the data can be optimally scaled by choosing:

$$V = \Lambda^{-1/2} M^T \tag{16}$$

where the matrices in the right-hand-side of (16) are given by the eigenvalue-eigenvector decomposition:

$$\Sigma = M \Lambda M^T \tag{17}$$

4.1. Analytic properties of the stabilizing linear transformation.

The following propositions relate the stochastic properties of $e_t^*$ with those of $e_t$.

Proposition 1. The unconditional and conditional distributions of $e_t^*$ are:

$$e_t^* \sim iid(0, I) \tag{18}$$

$$e_t^* \mid \Omega_{t-1} \sim iid(0, \Sigma_t^*), \quad \Sigma_t^* = V \Sigma_t V^T \tag{19}$$

Proof. The result follows immediately from (13)-(17).

Note that the result in (18) implies that the transformation defined by (15)-(17) is optimal, as it scales all
the eigenvalues of the unconditional covariance to unity, thus achieving the optimal condition number of
one. An additional advantage is that the transformed values $e_t^*$ have a meaningful statistical interpretation,
as standardized principal components of $e_t$.

Proposition 2. If $\Sigma_t$ is such that:

$$\mathrm{vech}(\Sigma_t) = w + A\,\mathrm{vech}(e_{t-1}e_{t-1}^T) + B\,\mathrm{vech}(\Sigma_{t-1}) \tag{20}$$

then $\Sigma_t^*$ follows the GARCH(1,1) motion law:

$$\mathrm{vech}(\Sigma_t^*) = w^* + A^*\,\mathrm{vech}[e_{t-1}^*(e_{t-1}^*)^T] + B^*\,\mathrm{vech}(\Sigma_{t-1}^*) \tag{21}$$

where:

$$w^* = P^{-1} w \tag{22}$$

$$A^* = P^{-1} A P \tag{23}$$

$$B^* = P^{-1} B P \tag{24}$$

$$P = \Delta_1 [V^{-1} \otimes V^{-1}] \Delta_2 \tag{25}$$

and $\Delta_1$, $\Delta_2$ are 0-1 matrices such that, for any symmetric matrix S, $\mathrm{vech}(S) = \Delta_1 \mathrm{vec}(S)$ and
$\mathrm{vec}(S) = \Delta_2 \mathrm{vech}(S)$, vec(.) being the operator which stacks the columns of an N×N matrix into an N²×1
vector.

Proof. See Appendix A.
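The scaling in (15)-(17) is easy to check numerically. The following sketch assumes the ill-conditioned unconditional covariance (8) of Example 1 and a hypothetical helper name, `stabilizing_transform`; it only illustrates that the transformed covariance is the identity, so that the ratio between the smallest and largest eigenvalues moves from about .111 to one.

```python
import numpy as np

def stabilizing_transform(Sigma):
    """Return V = Lambda^{-1/2} M^T, with Sigma = M Lambda M^T as in (16)-(17)."""
    lam, M = np.linalg.eigh(Sigma)          # columns of M: orthonormal eigenvectors
    return np.diag(lam ** -0.5) @ M.T

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])              # unconditional covariance (8)
V = stabilizing_transform(Sigma)

lam = np.linalg.eigvalsh(Sigma)
ratio_before = lam[0] / lam[-1]             # smallest / largest eigenvalue

Sigma_star = V @ Sigma @ V.T                # covariance of the transformed data
lam_star = np.linalg.eigvalsh(Sigma_star)
ratio_after = lam_star[0] / lam_star[-1]

print("ratio before:", ratio_before)
print("ratio after: ", ratio_after)
```

In practice `Sigma` would be replaced by a sample estimate, as described in the econometric implementation.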
Corollary 1. If the variance model is expressed in the form (5):

$$\mathrm{vech}(e_t e_t^T) = \mathrm{vech}(\Sigma) + (I - \Phi L)^{-1}(I - \Theta L)\,v_t \tag{26}$$

the cross-products of the transformed data follow the VARMA model:

$$\mathrm{vech}[e_t^*(e_t^*)^T] = \mathrm{vech}(I) + (I - \Phi^* L)^{-1}(I - \Theta^* L)\,v_t^* \tag{27}$$

where:

$$\Phi^* = P^{-1}\Phi P, \qquad \Theta^* = P^{-1}\Theta P, \qquad v_t^* = P^{-1} v_t \tag{28}$$

Proposition 3.

$$\ell(e_1, \ldots, e_N) = \ell(e_1^*, \ldots, e_N^*) + \frac{N}{2}\log|\Lambda| \tag{29}$$

$\ell(\cdot)$ being the minus log gaussian density of a sample.

Proof. See Appendix B.

Note that, replacing (18) by $e_t^* \sim iid(0, V\Sigma V^T)$, Propositions 1 and 2 hold for any choice of V. A general
result analogous to Proposition 3 is easy to derive following the proof in Appendix B, as only the final
simplification relies on the particular choice of V given in (16).

4.2. Econometric implementation.

The results in Section 4.1 were derived for the true values of the data generating process. Building on
them, the following empirical implementation is straightforward:

Step 1: Starting from a sample $\{e_t\}_{t=1,\ldots,N}$, compute an estimate of the unconditional covariance matrix,
$\hat\Sigma$, the eigenvalue-eigenvector decomposition (17), the matrix $\hat V$ using the sample analogue of (16) and
the transformed series $\{e_t^*\}_{t=1,\ldots,N}$ using (15). Specify a GARCH model for $e_t^*$. We will assume that it
is a GARCH(1,1) in the form (2).

Step 2:

Step 2.1: Compute consistent estimates for the parameters in (21), $\hat w^*$, $\hat A^*$ and $\hat B^*$. If ML is used,
assure that the corresponding gradient is small enough.

Step 2.2: Compute the covariances $\{\hat\Sigma_t^*\}_{t=1,\ldots,N}$ according to (21). Check the smallest eigenvalue
to assure that it is positive.

Step 2.3: If required, obtain estimates of the parameters in (2) through the expressions:

$$\hat w = \hat P \hat w^* \tag{30}$$

$$\hat A = \hat P \hat A^* \hat P^{-1} \tag{31}$$

$$\hat B = \hat P \hat B^* \hat P^{-1} \tag{32}$$

where $\hat P$ denotes the sample analogue of P, see Eq. (25). Finally, compute estimates of the
conditional covariances using:

$$\hat\Sigma_t = \hat V^{-1} \hat\Sigma_t^* (\hat V^{-1})^T \tag{33}$$

Expressions (30)-(32) follow immediately from Eqs. (22)-(25) and (33) follows from (19).
Note that consistency is assured by Slutsky's theorem. If ML were employed to
compute the estimates in Step 2.1, Proposition 3 assures that the estimates $\hat w$, $\hat A$ and $\hat B$ are
asymptotically equivalent to ML estimates. It can also be applied to compute information
criteria or LR statistics.

Step 3: If required, compute estimates of the covariances of $\hat w$, $\hat A$ and $\hat B$ using the following Proposition:

Proposition 4. If $\widehat{cov}(\hat w^*)$, $\widehat{cov}(\hat A^*)$ and $\widehat{cov}(\hat B^*)$ are consistent estimates of the covariances of $\hat w^*$, $\hat A^*$
and $\hat B^*$, respectively, then the expressions:

$$\widehat{cov}(\hat w) = \hat P\, \widehat{cov}(\hat w^*)\, \hat P^T \tag{34}$$

$$\widehat{cov}(\hat A) = [(\hat P^{-1})^T \otimes \hat P]\, \widehat{cov}(\hat A^*)\, [(\hat P^{-1})^T \otimes \hat P]^T \tag{35}$$

$$\widehat{cov}(\hat B) = [(\hat P^{-1})^T \otimes \hat P]\, \widehat{cov}(\hat B^*)\, [(\hat P^{-1})^T \otimes \hat P]^T \tag{36}$$

provide consistent estimates of the covariances of $\hat w$, $\hat A$ and $\hat B$.

Proof. Expression (34) follows immediately from (30). Applying the vec() operator to both sides of (31)
we obtain:

$$\mathrm{vec}(\hat A) = [(\hat P^{-1})^T \otimes \hat P]\, \mathrm{vec}(\hat A^*) \tag{37}$$

which implies (35). The proof of (36) is analogous to this one.

This implementation allows one to obtain results for the original data from those corresponding to the transformed
data. The following example illustrates its application.
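Step 2.3 relies on the matrix P of (25). A minimal Python sketch of how P and the back-transformation (30)-(32) might be coded is given below. All function names are hypothetical, and the 0-1 matrices are built under the assumption that vec stacks columns and vech stacks the lower triangle column by column, as in the definitions above.

```python
import numpy as np

def vech_index(k):
    """(row, col) pairs of the lower triangle, column by column."""
    return [(i, j) for j in range(k) for i in range(j, k)]

def elimination(k):
    """0-1 matrix D1 such that vech(S) = D1 vec(S)."""
    idx = vech_index(k)
    D1 = np.zeros((len(idx), k * k))
    for r, (i, j) in enumerate(idx):
        D1[r, j * k + i] = 1.0          # vec stacks columns: (i, j) sits at j*k + i
    return D1

def duplication(k):
    """0-1 matrix D2 such that vec(S) = D2 vech(S) for symmetric S."""
    idx = vech_index(k)
    D2 = np.zeros((k * k, len(idx)))
    for r, (i, j) in enumerate(idx):
        D2[j * k + i, r] = 1.0
        D2[i * k + j, r] = 1.0
    return D2

def p_matrix(V):
    """P = D1 [V^{-1} (x) V^{-1}] D2, as in (25)."""
    k = V.shape[0]
    Vinv = np.linalg.inv(V)
    return elimination(k) @ np.kron(Vinv, Vinv) @ duplication(k)

def back_transform(w_star, A_star, B_star, V):
    """Parameters of the original model from those of the transformed one, (30)-(32)."""
    P = p_matrix(V)
    Pinv = np.linalg.inv(P)
    return P @ w_star, P @ A_star @ Pinv, P @ B_star @ Pinv
```

A quick consistency check is that P maps vech(I) into the vector-half of the unconditional covariance used to build V.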
5. Empirical example: short-run alignment of exchange rates.

It is well known that many exchange rates fluctuate in the same direction and in similar proportions. This
co-movement can be explained by competitive appreciation or depreciation policies, by international
agreements or just by the fact that all the rates are expressed in terms of a common numeraire (often the
US Dollar) whose performance affects them all.

Long-term comovements can be effectively measured through sample correlations. On the other hand,
short-term fluctuations may deviate substantially from the alignment implied by the long-run correlation
matrix. In this Section we model short-run comovements of four relevant currencies through the
conditional correlations implied by a vector GARCH model.

Consider the spot bid exchange rates of Deutsche Mark (DM), French Franc (FF), British Pound (BP) and
Japanese Yen (JY) against US Dollar, observed in the London Market during 695 weeks, from January
1985 to April 1998. The data has been logged, differenced and scaled by a factor of 100, to obtain the
corresponding log percent yields. Excess returns are then computed by subtracting the sample mean.

[Insert Table 2]

We tried to fit diagonal GARCH(1,1) models to all the possible pairs of the excess returns. Most of the
attempts converged to solutions with a nonzero gradient and some negative eigenvalue in the conditional
covariances. Convergence was obtained only when JY was included in the pair. Taking into account the
analysis in Section 3.1 this was to be expected, as the correlation between JY returns and those of the other
currencies is relatively small. All the attempts to build a model for three series failed to converge.
Therefore, we will use the data transformation defined in Section 4.

Table 2 summarizes the main descriptive statistics of the excess returns. Note that a) all the series exhibit
excess kurtosis and some asymmetry, perhaps relevant for BP and JY, b) the correlations are high, ranging
from .48 (BP-JY) to .98 (DM-FF) and, in accordance with this fact, c) the ratio between the lowest and highest
eigenvalues of the covariance matrix ($\lambda_{min}/\lambda_{max} = .0069$) suggests that there will be a problem of high
correlations. Note that the scaled eigenvectors in the last panel of Table 2 are the sample analogues of V
in (16).

Inspection of data scaled according to (15)-(17) reveals that the first series has a big outlier (-12.8 standard
deviations) in the second week of April 1986. The corresponding scaled eigenvector implies that this series
is roughly the difference between the returns of DM and FF (see Table 2). This anomalous value does not
occur in a cluster of high volatility and its source was traced to a) a high positive fluctuation of the FF
exchange rate (+2.77 standard deviations), combined with b) a simultaneous small negative variation of
the DM (-.69 standard deviations). As the correlation between both series is .98, this combination is
unlikely.

The anomalous FF excess return was corrected using a simple intervention model, see Box and Tiao
(1975). Table 3 summarizes both the new scaled eigenvector matrix and the Box-Ljung Q statistics of
cross-products of the transformed series. This test rejects the null of no conditional heteroskedasticity.
Figure 4 shows the resulting scaled series.

[Insert Table 3]

[Insert Figure 4]

A standard analysis of the scaled series and their cross-products suggests that a diagonal GARCH(1,1) will
be adequate to capture most of the conditional heteroskedasticity. Table 4 summarizes the ML estimates
of this model, expressed in the VARMA form (5). Note that:

1) All the parameters are much higher than their standard errors. As the scaled data is not gaussian, this
is only informal evidence of statistical significance.

2) Many AR parameters are close to one, which implies a high persistence of variance effects.

3) The parameters in the constant term, which are the unconditional covariances, have been constrained
to identity matrix values, in coherence with the properties of the data transformation, see Eq. (18). Free
estimates of these parameters (not shown here) are very similar to these and a likelihood-ratio test
would not reject the null that the unconditional covariance is equal to identity.

4) True convergence has been achieved, as the square root norm of the gradient in both cases is small.

5) After convergence, we have computed the sequences of conditional covariances implied by the
model, both for the scaled and original data. The minimum eigenvalues of both sequences, shown
in the last two rows of Table 4, are positive.

[Insert Table 4]

Table 5 summarizes the descriptive statistics of standardized residuals. Apart from a typical excess
kurtosis, there are no symptoms of misspecification. In particular, the Box-Ljung statistics do not reject
the null of conditional homoskedasticity.

[Insert Table 5]

Figure 5 shows the conditional volatilities (square roots of conditional variances) implied by the model.
Note that: a) volatilities of DM and FF returns are almost equal, b) BP returns share common periods of
volatility with DM and FF yields and c) JY is more stable than the European currencies.
[Insert Figure 5]
Figure 6 shows the conditional correlations implied by the model, which have clear and intuitive patterns.
First, conditional correlations between DM and FF returns are close to unity, with transitory deviations
in the last half of the sample. This result is hardly surprising, as both currencies are in the hard core of the
EMS. Second, conditional correlations of BP returns with other European currencies are weaker (around
.80, with highs and lows of .93 and .45 respectively) and there is a decreasing trend in the last part of the
sample. Finally, correlations of JY returns with those of European currencies are relatively small, around
.5 to .6 with highs and lows of .95 and 0, respectively.
[Insert Figure 6]
6. Concluding remarks.
The first part of this paper concludes that iterative ML estimation of multivariate GARCH models is prone
to diverge due to negative eigenvalues in the conditional covariances. The literature is unanimously concerned
about the positive definiteness of these matrices and is aware that ML estimation of multivariate
ARCH models is difficult. Many authors, e.g., Engle and Kroner (1995), worry also about the large
number of parameters of unconstrained ARCH processes.
Whereas lack of parsimony contributes to instability of ML, two reasons suggest that it is not such a
serious problem by itself. First, in a context of high-frequency financial data, availability of huge datasets
somewhat balances overparametrization. Second, simplified ARCH models (e.g., diagonal GARCH) often
show the same instability as unconstrained specifications. We think that the high correlations and
identifiability problems discussed in Sections 3 and 4 provide a more direct explanation than lack of
parsimony. Besides, they suggest how to detect the potential problem before model building and how to
improve the behavior of ML algorithms.
We have shown that the econometric implementation outlined in Section 4, which is closely related to
factor-ARCH modeling, see Engle et al. (1990), contributes to the stability of likelihood computation. It
also confirms that instability in likelihood computation is mainly due to the relative scale of the
unconditional covariance eigenvalues. On the other hand it has clear limitations, as it does not assure
conditional covariances to be positive-definite. This requires using a different parametrization like, e.g.,
the previously mentioned constant correlations form or the BEKK model, see Engle and Kroner (1995).
The proposed transformation has three additional advantages. First, working with original or transformed
data makes no difference for practical purposes, as the propositions in Section 4 define one-to-one relationships
between their main stochastic properties. Second, the transformed variables, besides an obvious financial
interpretation as yields of orthogonal portfolios, have a clear statistical meaning and may help in model
building, e.g., by revealing unlikely comovements, as was illustrated in the empirical example in Section
5. Third, as the unconditional covariance of the transformed variables is identity, imposing the
corresponding constraints reduces the number of free parameters in the model and improves
identifiability.
Empirical evidence, not shown here, suggests that the data transformation improves the performance of
ML algorithms even when using stable parametrizations as, for example, the BEKK model, see Engle and
Kroner (1995). We think that this happens because the transformation improves the scaling of both the
data and the conditional covariance eigenvalues. Obviously if a model assures that conditional covariances
are positive-definite, negative eigenvalues are not an issue. However, ill-conditioning problems also arise
when some eigenvalues are positive but close to zero.
The issue of high correlations is obviously the more important of the two, as it compromises the validity of
estimates. This problem is easy to detect before model building, using the eigenvalues of a sample
unconditional correlation matrix and the corresponding condition number.
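A pre-modeling check of this kind might look like the following Python sketch. The data are simulated purely for illustration (they are not the exchange-rate series of Section 5), and the function name `eigenvalue_ratio` is an assumption; the diagnostic is the ratio between the smallest and largest eigenvalues of the sample correlation matrix, the same quantity used in Example 1.

```python
import numpy as np

def eigenvalue_ratio(returns):
    """Smallest / largest eigenvalue of the sample correlation matrix.

    returns is an (N, k) array with one series per column. Values close
    to zero signal the high-correlation problem.
    """
    R = np.corrcoef(returns, rowvar=False)
    lam = np.linalg.eigvalsh(R)              # ascending order
    return lam[0] / lam[-1]

rng = np.random.default_rng(1)
common = rng.standard_normal(500)

# Two simulated series driven by one common factor (highly correlated) ...
high = np.column_stack([common + 0.2 * rng.standard_normal(500),
                        common + 0.2 * rng.standard_normal(500)])
# ... and two independent simulated series (weakly correlated).
low = rng.standard_normal((500, 2))

ratio_high = eigenvalue_ratio(high)
ratio_low = eigenvalue_ratio(low)
print("highly correlated pair:", ratio_high)
print("weakly correlated pair:", ratio_low)
```

The reciprocal of this ratio is the condition number of the correlation matrix, so either quantity can be reported.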
Except in extreme cases, the problem of identifiability is important only when combined with high
correlations. By itself, it implies that point-estimates will be highly correlated and imprecise. On the other
hand, it does not affect the capacity of GARCH specifications to describe and forecast volatility and can
be dealt with by restrictions on the model parameters, e.g., imposing IGARCH constraints. Existence of
cofeatures in variance, see Engle and Kozicki (1993), also allows one to improve identifiability by
simplifying the model dynamic structure.
14
15
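Acknowledgements.

Alfonso Novales made useful comments and suggestions on previous versions of this work. Sonia Sotoca
acknowledges financial support from CICYT, project PB95-0912/95, and Fundación Caja de Madrid.

References.

Bollerslev, T., 1988. On the Correlation Structure for the Generalized Autoregressive Conditional
Heteroskedastic Process. Journal of Time Series Analysis, 9, 2, 121-131.

Bollerslev, T., 1990. Modelling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate
Generalized ARCH Approach. Review of Economics and Statistics, 72, 498-505.

Bollerslev, T., R.F. Engle and J.M. Wooldridge, 1988. A Capital-Asset Pricing Model with Time-Varying
Covariances. Journal of Political Economy, 96/1, 116-131.

Box, G.E.P. and G.C. Tiao, 1975. Intervention analysis with applications to economic and environmental
problems. Journal of the American Statistical Association, 70, 70-79.

Engle, R.F., 1982. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K.
Inflation. Econometrica, 50, 987-1007.

Engle, R.F., V.K. Ng and M. Rothschild, 1990. Asset Pricing with a FACTOR-ARCH Covariance
Structure: Empirical Estimates for Treasury Bills. Journal of Econometrics, 45, 213-237.

Engle, R.F. and S. Kozicki, 1993. Testing for Common Features. Journal of Business and Economic
Statistics, 11, 369-380.

Engle, R.F. and K.F. Kroner, 1995. Multivariate Simultaneous Generalized ARCH. Econometric Theory,
11, 122-150.

Appendix A. Proof of Proposition 2.

Eqs. (15) and (19) imply that:

$$\varepsilon_t = V^{-1}\varepsilon_t^* \tag{A.1}$$

$$\Sigma_t = V^{-1}\Sigma_t^*(V^{-1})^T \tag{A.2}$$

Substituting (A.1) and (A.2) in (20) yields:

$$\mathrm{vech}\left[V^{-1}\Sigma_t^*(V^{-1})^T\right] = w + A\,\mathrm{vech}\left[V^{-1}\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T(V^{-1})^T\right] + B\,\mathrm{vech}\left[V^{-1}\Sigma_{t-1}^*(V^{-1})^T\right] \tag{A.3}$$

The next steps use the following algebraic result:

$$\mathrm{vec}(ABA^T) = (A \otimes A)\,\mathrm{vec}(B) \tag{A.4}$$

and the fact that the vech(·) and vec(·) operators are such that, for any symmetric matrix S,
$\mathrm{vech}(S) = \Delta_1\,\mathrm{vec}(S)$ and $\mathrm{vec}(S) = \Delta_2\,\mathrm{vech}(S)$, where $\Delta_1$ and $\Delta_2$ are 0-1 matrices.

Then, Exp. (A.3) in vec(·) form becomes:

$$\Delta_1\,\mathrm{vec}\left[V^{-1}\Sigma_t^*(V^{-1})^T\right] = w + A\,\Delta_1\,\mathrm{vec}\left[V^{-1}\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T(V^{-1})^T\right] + B\,\Delta_1\,\mathrm{vec}\left[V^{-1}\Sigma_{t-1}^*(V^{-1})^T\right] \tag{A.5}$$

and by result (A.4):

$$\Delta_1\left[V^{-1} \otimes V^{-1}\right]\mathrm{vec}(\Sigma_t^*) = w + A\,\Delta_1\left[V^{-1} \otimes V^{-1}\right]\mathrm{vec}\left[\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T\right] + B\,\Delta_1\left[V^{-1} \otimes V^{-1}\right]\mathrm{vec}(\Sigma_{t-1}^*) \tag{A.6}$$

which can be expressed in vech(·) notation as:

$$\Delta_1\left[V^{-1} \otimes V^{-1}\right]\Delta_2\,\mathrm{vech}(\Sigma_t^*) = w + A\,\Delta_1\left[V^{-1} \otimes V^{-1}\right]\Delta_2\,\mathrm{vech}\left[\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T\right] + B\,\Delta_1\left[V^{-1} \otimes V^{-1}\right]\Delta_2\,\mathrm{vech}(\Sigma_{t-1}^*) \tag{A.7}$$

Denoting $P = \Delta_1\left[V^{-1} \otimes V^{-1}\right]\Delta_2$ simplifies (A.7) to:

$$P\,\mathrm{vech}(\Sigma_t^*) = w + AP\,\mathrm{vech}\left[\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T\right] + BP\,\mathrm{vech}(\Sigma_{t-1}^*) \tag{A.8}$$

which implies:

$$\mathrm{vech}(\Sigma_t^*) = P^{-1}w + P^{-1}AP\,\mathrm{vech}\left[\varepsilon_{t-1}^*(\varepsilon_{t-1}^*)^T\right] + P^{-1}BP\,\mathrm{vech}(\Sigma_{t-1}^*) \tag{A.9}$$

Finally, identifying the parameter matrices in (A.9) and (21) yields Exp. (22)-(25).
∎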
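The algebra in the proof of Proposition 2 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, a column-major vec(·), a vech(·) that stacks the lower triangle column by column, and randomly generated (hypothetical) values for V, w, A and B. The helper names `vec`, `vech` and `selection_matrices` are illustrative. It verifies result (A.4) and that one step of the recursion gives the same covariance in original and transformed variables when the transformed parameters are P⁻¹w, P⁻¹AP and P⁻¹BP.

```python
import numpy as np


def vec(S):
    """Stack the columns of S into one vector (column-major order)."""
    return S.flatten(order="F")


def vech(S):
    """Stack the lower-triangular part of S column by column."""
    return np.concatenate([S[j:, j] for j in range(S.shape[0])])


def selection_matrices(k):
    """Return 0-1 matrices (D1, D2): vech(S) = D1 vec(S), vec(S) = D2 vech(S)."""
    pairs = [(i, j) for j in range(k) for i in range(j, k)]
    D1 = np.zeros((len(pairs), k * k))
    D2 = np.zeros((k * k, len(pairs)))
    for r, (i, j) in enumerate(pairs):
        D1[r, j * k + i] = 1.0  # pick S[i, j] out of vec(S)
        D2[j * k + i, r] = 1.0  # write it back to S[i, j] ...
        D2[i * k + j, r] = 1.0  # ... and to its mirror S[j, i]
    return D1, D2


rng = np.random.default_rng(0)
k = 3
m = k * (k + 1) // 2

# Hypothetical inputs: an invertible V and arbitrary vech-GARCH parameters.
V = 2.0 * np.eye(k) + 0.3 * rng.normal(size=(k, k))
V_inv = np.linalg.inv(V)
w = rng.random(m)
A = 0.1 * rng.random((m, m))
B = 0.1 * rng.random((m, m))

# Transformed error and covariance at t-1 (any symmetric S_star works).
e_star = rng.normal(size=k)
G = rng.normal(size=(k, k))
S_star = G @ G.T + np.eye(k)

D1, D2 = selection_matrices(k)
P = D1 @ np.kron(V_inv, V_inv) @ D2
P_inv = np.linalg.inv(P)

# Result (A.4) with A = V^{-1}: vec(V^{-1} S V^{-T}) = (V^{-1} kron V^{-1}) vec(S).
kron_identity_holds = np.allclose(
    vec(V_inv @ S_star @ V_inv.T), np.kron(V_inv, V_inv) @ vec(S_star)
)

# One step of the original recursion, fed with back-transformed quantities.
e = V_inv @ e_star
S = V_inv @ S_star @ V_inv.T
next_cov = w + A @ vech(np.outer(e, e)) + B @ vech(S)

# The same step in transformed variables, as in (A.9).
next_cov_star = (
    P_inv @ w
    + P_inv @ A @ P @ vech(np.outer(e_star, e_star))
    + P_inv @ B @ P @ vech(S_star)
)

# P maps vech of a transformed covariance back to vech of the original one.
recursion_matches = np.allclose(P @ next_cov_star, next_cov)
print(kron_identity_holds, recursion_matches)
```

Building P from explicit 0-1 matrices mirrors the notation of the proof; for large k a direct implementation of the map S ↦ V⁻¹S(V⁻¹)ᵀ on the lower triangle would avoid forming the k² × k² Kronecker product.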
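Appendix B. Proof of Proposition 3.

According to (14), the minus log Gaussian likelihood of a sample of size N is:

$$\ell(\varepsilon_1, \ldots, \varepsilon_N) = \frac{1}{2}Nk\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left(\ln|\Sigma_t| + \varepsilon_t^T\Sigma_t^{-1}\varepsilon_t\right) \tag{B.1}$$

Substituting (A.1) and (A.2) in (B.1) yields:

$$\ell(\varepsilon_1, \ldots, \varepsilon_N) = \frac{1}{2}Nk\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left\{\ln\left|V^{-1}\Sigma_t^*(V^{-1})^T\right| + (\varepsilon_t^*)^T(V^{-1})^T\left[V^{-1}\Sigma_t^*(V^{-1})^T\right]^{-1}V^{-1}\varepsilon_t^*\right\} \tag{B.2}$$

and the terms inside the summation are such that:

$$\ln\left|V^{-1}\Sigma_t^*(V^{-1})^T\right| = \ln|\Sigma_t^*| - 2\ln|V| = \ln|\Sigma_t^*| + \ln|\Lambda| \tag{B.3}$$

$$(\varepsilon_t^*)^T(V^{-1})^T\left[V^{-1}\Sigma_t^*(V^{-1})^T\right]^{-1}V^{-1}\varepsilon_t^* = (\varepsilon_t^*)^T(\Sigma_t^*)^{-1}\varepsilon_t^* \tag{B.4}$$

To understand the simplification in (B.3), note that (16) implies that $|V| = |\Lambda|^{-1/2}$, because the
determinant of the eigenvector matrix M is one and, therefore, $\ln|V| = -\frac{1}{2}\ln|\Lambda|$.

Finally, substituting (B.3) and (B.4) in (B.2) implies that:

$$\ell(\varepsilon_1, \ldots, \varepsilon_N) = \frac{1}{2}Nk\ln(2\pi) + \frac{1}{2}N\ln|\Lambda| + \frac{1}{2}\sum_{t=1}^{N}\left\{\ln|\Sigma_t^*| + (\varepsilon_t^*)^T(\Sigma_t^*)^{-1}\varepsilon_t^*\right\} \tag{B.5}$$
∎

Fig. 1. Eigenvalues of the conditional covariances in the ill-conditioned case (σ12 = .8).
[Two panels: largest eigenvalue of Sigma(t) and smallest eigenvalue of Sigma(t).]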
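The effect of the transformation on the likelihood in Appendix B can be illustrated numerically. The sketch below is an assumption-laden example rather than the authors' implementation: it uses NumPy, takes V = Λ^(-1/2) Mᵀ as a reading of Eq. (16), and simulates data from a hypothetical, highly correlated covariance matrix `R` with a time-varying scale. It checks that the minus log-likelihoods of original and transformed data differ only by the constant (N/2) ln|Λ|, and that the transformed covariances are better conditioned.

```python
import numpy as np


def minus_loglik(errors, covs):
    """0.5*N*k*ln(2*pi) + 0.5*sum_t(ln|S_t| + e_t' S_t^{-1} e_t)."""
    n, k = errors.shape
    total = 0.5 * n * k * np.log(2.0 * np.pi)
    for e, S in zip(errors, covs):
        _, logdet = np.linalg.slogdet(S)
        total += 0.5 * (logdet + e @ np.linalg.solve(S, e))
    return total


rng = np.random.default_rng(1)
n, k = 50, 3

# Hypothetical covariance structure with one very high correlation.
R = np.array([[1.00, 0.95, 0.70],
              [0.95, 1.00, 0.70],
              [0.70, 0.70, 1.00]])

# Assumed form of the transformation: V = Lambda^{-1/2} M^T.
lam, M = np.linalg.eigh(R)
V = np.diag(lam ** -0.5) @ M.T

# Conditional covariances: R times a time-varying positive scale.
covs = np.array([R * (1.0 + 0.5 * rng.random()) for _ in range(n)])
errors = np.array([rng.multivariate_normal(np.zeros(k), S) for S in covs])

# Transformed errors and covariances: e*_t = V e_t, S*_t = V S_t V^T.
errors_star = errors @ V.T
covs_star = np.array([V @ S @ V.T for S in covs])

l_orig = minus_loglik(errors, covs)
l_star = minus_loglik(errors_star, covs_star)
offset = 0.5 * n * np.sum(np.log(lam))  # (N/2) ln|Lambda|

cond_orig = np.linalg.cond(covs[0])
cond_star = np.linalg.cond(covs_star[0])
print(l_orig - l_star, offset, cond_orig, cond_star)
```

Because the offset does not depend on the model parameters, minimizing either function gives the same estimates; only the numerical conditioning of the computation changes.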
Fig. 2. Eigenvalues of the conditional covariances in the well-conditioned case (σ12 = .1).
[Two panels: smallest eigenvalue of Sigma(t) and largest eigenvalue of Sigma(t).]

Fig. 3. Isoquants of the log-likelihood function of model (12).
[Contour plot; one axis is labeled phi.]
Figure 4. Scaled yields after intervention in FF.
[Four panels: standardized plots of series #1, #2, #3 and #4.]

Figure 5. Estimated conditional volatilities.
[Four panels: volatility of DM/USD, FF/USD, BP/USD and JY/USD log pct. returns.]
Figure 6. Estimated conditional correlations.
[Six panels: correlation of DM-BP, DM-FF, FF-BP, DM-JY, BP-JY and FF-JY log pct. returns.]

Table 1. ML estimates, correlations and principal components information.

Parameter    True value    Estimate†
σ²           1.0           1.065 (.091)
φ            .7            .706 (.203)
θ            .6            .609 (.231)

Correlations, eigenvalues and eigenvectors (by rows): the assignment of these entries to cells is not
recoverable; in source order they read --, --, 1, 1, 0.01, -0.1, 0.06, 1, --, 0.02, 0.05, 0.71, -0.71, 0,
0.98, 1, 1.98, 0.04, 0.71, 0.71, 1, 0.4.

† The figure in parentheses is the standard deviation of the estimate.
Table 2. Descriptive statistics of excess returns.

Statistic             DM        FF        BP        JY
Standard deviation    1.358     1.31      1.359     1.298
Skewness             -0.046    -0.021     0.432    -0.609
Excess kurtosis       1.608     1.874     3.986     2.313

Sample correlations:
        DM        FF        BP        JY
DM      1         --        --        --
FF      0.978     1         --        --
BP      0.777     0.781     1         --
JY      0.635     0.623     0.477     1

Eigen-structure of the covariance matrix:
Eigenvalue    % of var.    Scaled eigenvectors [matrix V in Eq. (16)]
0.039          0.55         3.535    -3.651     0.041    -0.078
0.472          6.66        -0.652    -0.628     1.062     0.413
0.954         13.45        -0.102    -0.123    -0.482     0.889
5.627         79.34         0.233     0.224     0.209     0.171

Table 3. Transformation coefficients and Q statistics of the scaled series.

Scaled eigenvectors [matrix V in Eq. (16)] after intervention:
        DM        FF        BP        JY
DM      4.032    -4.195     0.068    -0.088
FF     -0.652    -0.628     1.062     0.413
BP     -0.102    -0.123    -0.482     0.889
JY      0.233     0.224     0.209     0.171

Ljung-Box Q statistic (for 10 lags of the autocorrelation function of cross-products of the
transformed series)†
             Series #1    Series #2    Series #3    Series #4
Series #1    288.13       --           --           --
Series #2     42.45       19.67        --           --
Series #3     63.27       12.58        57.87        --
Series #4     28.6        23.09        84.04        25.19

† The 95% percentile of a χ² distribution with 10 degrees of freedom is 18.3. As the data are not
Gaussian, this is only an indicative critical value of the statistic under the null of no autocorrelation.
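The eigen-structure in Table 2 can be approximately reproduced from the rounded standard deviations and sample correlations it reports. This is a sketch under two assumptions: that NumPy is available, and that the scaled eigenvector matrix of Eq. (16) has the form V = Λ^(-1/2) Mᵀ, which is consistent with the row norms in the table. Because the inputs are rounded to three decimals, the eigenvalues match the table only approximately, and the signs of the eigenvectors are arbitrary.

```python
import numpy as np

# Rounded values copied from Table 2 (order: DM, FF, BP, JY).
std = np.array([1.358, 1.310, 1.359, 1.298])
corr = np.array([
    [1.000, 0.978, 0.777, 0.635],
    [0.978, 1.000, 0.781, 0.623],
    [0.777, 0.781, 1.000, 0.477],
    [0.635, 0.623, 0.477, 1.000],
])

# Sample covariance matrix rebuilt from standard deviations and correlations.
cov = np.outer(std, std) * corr

# Eigenvalues in ascending order; columns of M are orthonormal eigenvectors.
lam, M = np.linalg.eigh(cov)

# Assumed form of Eq. (16): each eigenvector (as a row) scaled by lambda^{-1/2}.
V = np.diag(lam ** -0.5) @ M.T

# Percentage of total variance explained by each principal component.
share = 100.0 * lam / lam.sum()

# Covariance of the transformed series V x_t; it should be the identity.
scaled_cov = V @ cov @ V.T

print(np.round(lam, 3))
print(np.round(share, 2))
print(np.round(V, 3))
```

The smallest eigenvalue is the one that makes the likelihood computation fragile: its row of V has the largest coefficients, loading with opposite signs on the two most highly correlated series, DM and FF.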
Table 4. ML estimates of the GARCH(1,1) model (standard deviations in parentheses).

vech(ε*_t ε*_t^T)    σ_ij‡     φ_ij           θ_ij
(ε*_1t)²             1 (--)    .955 (.010)    .683 (.017)
ε*_2t ε*_1t          0 (--)    .895 (.015)    .845 (.015)
ε*_3t ε*_1t          0 (--)    .273 (.009)    .238 (.008)
ε*_4t ε*_1t          0 (--)    .442 (.007)    .232 (.004)
(ε*_2t)²             1 (--)    .895 (.023)    .795 (.020)
ε*_3t ε*_2t          0 (--)    .936 (.012)    .846 (.014)
ε*_4t ε*_2t          0 (--)    .971 (.010)    .925 (.014)
(ε*_3t)²             1 (--)    .891 (.015)    .763 (.013)
ε*_4t ε*_3t          0 (--)    .957 (.025)    .880 (.020)
(ε*_4t)²             1 (--)    .895 (.018)    .745 (.018)

Diagnostics of estimation results:
Gaussian likelihood (minus log) on convergence    3618.78
Square root norm of gradient                      0.0773
Min. eigenvalue of scaled data covariances        0.0658
Min. eigenvalue of original data covariances      0.0046

‡ The parameters in this column are constrained to identity matrix values, according to the transformation
(15)-(17). The minus log likelihood corresponding to this model with free covariances is 3614.52.
Therefore, an LR test would not reject the constraints at the 95% confidence level.

Table 5. Statistics of standardized residuals.

                   Series #1    Series #2    Series #3    Series #4
Skewness           -0.583        0.481       -0.735       -0.015
Excess kurtosis     3.156        5.016        2.376        1.865

Ljung-Box Q statistic (for 10 lags of the autocorrelation function of cross-products of the
standardized series)†
             Series #1    Series #2    Series #3    Series #4
Series #1    5.30         --           --           --
Series #2    4.08         4.48         --           --
Series #3    6.13         7.67         9.38         --
Series #4    8.80         16.11        5.19         9.90

† The 95% percentile of a χ² distribution with 10 degrees of freedom is 18.3. As the data are not
Gaussian, this is only an indicative critical value of the statistic under the null of no autocorrelation.
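The likelihood-ratio comparison mentioned in the footnote to Table 4 can be reproduced from the two reported minus log-likelihood values. The sketch below uses only the Python standard library. The number of restrictions, `df = 10`, is an assumption (one per constrained entry of the σ_ij column), and the helper `chi2_sf_even_df` is a hypothetical name for the closed-form upper-tail probability of a chi-square variable, which is valid only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


# Minus log-likelihood values reported with Table 4.
constrained = 3618.78  # covariances constrained to the identity matrix
free = 3614.52         # covariances estimated freely

# LR statistic: twice the gap between the two minus log-likelihoods.
lr_stat = 2.0 * (constrained - free)

# Assumption: one restriction per element of the constrained column.
df = 10
p_value = chi2_sf_even_df(lr_stat, df)
reject_at_5pct = p_value < 0.05

print(round(lr_stat, 2), round(p_value, 3), reject_at_5pct)
```

Under this assumption the statistic is well below the 18.3 critical value quoted in the table footnotes, in line with the statement that the constraints are not rejected.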