Working paper (Documento de trabajo) No. 9904, September 1999
Instituto Complutense de Análisis Económico, UNIVERSIDAD COMPLUTENSE
FACULTAD DE ECONOMICAS, Campus de Somosaguas, 28223 MADRID
Telephone 913942611 - FAX 91 2942613
Internet: http://www.ucm.es/info/icae/
E-mail: icaesec@ccee.ucm.es

THE LIKELIHOOD OF MULTIVARIATE GARCH MODELS IS ILL-CONDITIONED

Miguel Jerez, José Casals, Sonia Sotoca
Universidad Complutense de Madrid
Campus de Somosaguas, 28223 Madrid

ABSTRACT

The likelihood of multivariate GARCH models is ill-conditioned because of two facts. First, financial time series often display high correlations, implying that an eigenvalue of the conditional covariances fluctuates near the zero boundary. Second, GARCH models explain conditional covariances in terms of a linear combination of delayed squared errors and their conditional expectation; this functional form implies that the likelihood function is almost flat in the neighborhood of the optimal estimates. Building on this analysis we propose a linear transformation of data which not only stabilizes the likelihood computation, but also provides insight into the statistical properties of data. The use of this transformation is illustrated by modeling the short-run conditional correlations of four nominal exchange rates.

RESUMEN (Spanish-language abstract)

The likelihood of multivariate GARCH processes is ill-conditioned for two reasons. First, financial series are often strongly correlated, which implies that an eigenvalue of the conditional covariance matrices is close to zero.
Second, GARCH models explain the conditional variance in terms of lagged squared errors and their conditional expectation; this functional form implies that the likelihood function is practically flat in the neighborhood of the optimal estimates. Building on this analysis, we propose a linear transformation of the data that not only stabilizes the likelihood computation, but also helps to analyze the statistical properties of the data. The use of this transformation is illustrated by modeling the short-run conditional correlations of four nominal exchange rates.

Key words: ARCH, GARCH, maximum likelihood.
JEL classification: C130, C320, C510.
Mailing address: Departamento de Fundamentos del Análisis Económico II, Universidad Complutense, Campus de Somosaguas, 28223 Madrid, Spain. E-mail: mjerez@ccee.ucm.es.

1. Introduction.

Since the seminal paper of Engle (1982) many works describe the volatility of financial yields using models with conditional heteroskedastic errors. Univariate models in the ARCH family are useful to measure and forecast the volatility of single assets. While this is important, problems of risk assessment, asset allocation, hedging and option pricing require knowledge of the properties of multivariate series. Often, these properties can be represented adequately by means of a vector GARCH model.

In our experience, maximum-likelihood (ML) estimation of multivariate GARCH models often implies:
1) a high computational cost,
2) sensitivity of the estimates to changes in both the sample and the initial conditions of the iterative algorithm,
3) frequent iteration on solutions where conditional covariances have negative eigenvalues and, because of this,
4) non-convergence or convergence to solutions with a nonzero gradient.

This 'false convergence' situation happens because many nonlinear algorithms stop when changes in the function or parameter values are considered small enough. In an ill-conditioned case, these heuristic criteria can be satisfied at solutions with a nonzero gradient.

This paper analyzes the causes of such bad behavior. We conclude that it is due to a) the fact that financial time series often exhibit high unconditional correlations and b) identifiability problems derived from the functional form of GARCH processes. We will refer to these problems as "high correlations" and "identifiability", respectively.

Poor identifiability is implied by the functional form of the GARCH model. It explains the conditional covariance as a function of delayed cross-products of errors and the conditional expectation of these cross-products. Obviously these variables share much common information and, in the neighborhood of the optimal estimates, are bound to be very similar. Therefore, point estimates of the parameters will be highly correlated and imprecise. On the other hand, poor identifiability does not affect the capacity of a GARCH model to describe and forecast volatility and, except in extreme situations, should not compromise the stability of ML algorithms.

The issue of high correlations is more critical. It implies that at least one eigenvalue of the unconditional covariance is close to zero. Then, the smallest eigenvalues of the conditional covariances fluctuate near the zero boundary and, in a context of iterative nonlinear methods, it is easy to iterate on a trial solution where conditional covariances are not positive-definite. In this situation computing the likelihood results in unbounded or mathematically undefined operations. When both the identifiability and the high-correlation problems occur, a) the likelihood function is almost flat in the neighborhood of the optimal estimates and b) this point is close to the zone of the parametric space where covariances are not positive-semidefinite. This situation spells disaster for iterative ML methods.

Building on this analysis we propose a linear transformation of data designed to project the eigenvalues of conditional covariances far from the zero boundary and to optimize their relative value. This transformation is closely related to principal components and is useful not only to stabilize the computation of the likelihood, but also to analyze the statistical properties of the sample.

The structure of the paper is as follows. Section 2 states the problem of likelihood computation on standard grounds. Section 3 describes in detail the problems summarized above and discusses their implications. Section 4 defines the stabilizing data transformation and characterizes its properties. Section 5 applies this data transformation to model the short-run conditional correlations of four nominal exchange rates. Finally, Section 6 discusses previous results and summarizes the main conclusions.

2. Problem statement and notation.

Consider a (k×1) random vector $y_t$ which, by means of an econometric model, is decomposed as $y_t = E_{t-1}(y_t) + e_t$, $E_{t-1}(\cdot)$ being the expectation of the argument conditional on the information set up to $t-1$, $\Omega_{t-1}$. In a conditional heteroskedastic framework, the errors $e_t$ are such that $e_t \sim iid(0, \Sigma)$, $e_t \mid \Omega_{t-1} \sim iid(0, \Sigma_t)$. Assume without loss of generality that $y_t = e_t$. If the conditional covariance $\Sigma_t$ depends on a vector $\theta$ of unknown parameters, the minus log gaussian likelihood of a sample of size N is:

$$\ell(\theta) = \frac{1}{2} N k \ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left[\ln\lvert\Sigma_t(\theta)\rvert + e_t^{T}\Sigma_t(\theta)^{-1}e_t\right] \tag{1}$$

The literature proposes different ways to parametrize $\Sigma_t$. Many formulations are encompassed by the multivariate GARCH(p,q). To avoid unnecessary complications, in the rest of the paper we will assume that p = q = 1. The vector GARCH(1,1) model is characterized by:

$$\mathrm{vech}(\Sigma_t) = w + A\,\mathrm{vech}(e_{t-1}e_{t-1}^{T}) + B\,\mathrm{vech}(\Sigma_{t-1}) \tag{2}$$

where vech(·) denotes the vector-half operator, which stacks the lower triangle of an N×N symmetric matrix into an [N(N+1)/2]×1 vector.
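To make the computation in (1)-(2) concrete, the following is a minimal Python/NumPy sketch (not the authors' code) that runs the vech recursion (2) and evaluates the minus log-likelihood (1) through a Cholesky factorization, returning infinity whenever some $\Sigma_t$ is not positive-definite, which is the failure mode discussed in Section 3.1. Assumptions, all ours: the recursion is started at $\Sigma_1$ equal to a given matrix (here the unconditional covariance), the simulated innovations are Gaussian, and the demo parameters are the diagonal dynamics of Example 1 (Section 3), mapped to (2) through $A = \Phi - \Theta$, $B = \Theta$ and $w = (I - \Phi)\,\mathrm{vech}(\Sigma)$, combined with the well-conditioned covariance (9). The helper names (`vech`, `unvech`, `garch_covariances`, `neg_loglik`, `simulate`) are hypothetical.

```python
import numpy as np


def vech(S):
    """Stack the lower triangle of a symmetric matrix column by column."""
    rows, cols = np.triu_indices(S.shape[0])
    return S[cols, rows]


def unvech(v, k):
    """Inverse of vech: rebuild the (k x k) symmetric matrix."""
    rows, cols = np.triu_indices(k)
    S = np.zeros((k, k))
    S[cols, rows] = v
    S[rows, cols] = v
    return S


def garch_covariances(e, w, A, B, sigma1):
    """Vector GARCH(1,1) recursion (2), started at Sigma_1 = sigma1 (assumption)."""
    k = e.shape[1]
    sigmas = [sigma1]
    for t in range(1, e.shape[0]):
        h = w + A @ vech(np.outer(e[t - 1], e[t - 1])) + B @ vech(sigmas[-1])
        sigmas.append(unvech(h, k))
    return sigmas


def neg_loglik(e, sigmas):
    """Minus log Gaussian likelihood (1); np.inf if some Sigma_t is not PD."""
    N, k = e.shape
    total = N * k * np.log(2.0 * np.pi)
    for et, S in zip(e, sigmas):
        try:
            L = np.linalg.cholesky(S)          # fails unless S is PD
        except np.linalg.LinAlgError:
            return np.inf                      # infeasible trial solution
        z = np.linalg.solve(L, et)             # z'z = e_t' Sigma_t^{-1} e_t
        total += 2.0 * np.sum(np.log(np.diag(L))) + z @ z  # ln|S| + quad form
    return 0.5 * total


def simulate(N, w, A, B, sigma1, rng):
    """Gaussian draws e_t = chol(Sigma_t) z_t with Sigma_t following (2)."""
    k = sigma1.shape[0]
    e = np.empty((N, k))
    S = sigma1
    for t in range(N):
        e[t] = np.linalg.cholesky(S) @ rng.standard_normal(k)
        S = unvech(w + A @ vech(np.outer(e[t], e[t])) + B @ vech(S), k)
    return e


# Example 1 dynamics: AR (Phi) and MA (Theta) factors of the VARMA form (5).
Phi = np.diag([0.97, 0.86, 0.85])
Theta = np.diag([0.90, 0.80, 0.73])
A, B = Phi - Theta, Theta                      # from Phi = A + B, Theta = B
Sigma = np.array([[1.0, 0.1], [0.1, 1.0]])     # well-conditioned case (9)
w = (np.eye(3) - Phi) @ vech(Sigma)            # from (6)

rng = np.random.default_rng(0)
e = simulate(300, w, A, B, Sigma, rng)
sigmas = garch_covariances(e, w, A, B, Sigma)
print("minus log-likelihood:", neg_loglik(e, sigmas))
print("smallest eigenvalue:", min(np.linalg.eigvalsh(S)[0] for S in sigmas))
```

With the covariance (9), every $\Sigma_t$ in this sketch stays positive-definite: unvech(w) is positive-definite and the diagonal A and B act elementwise with positive-definite coefficient patterns. With the ill-conditioned covariance (8), unvech(w) has a negative eigenvalue, so that guarantee is lost and `simulate` itself may raise `LinAlgError`.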
The following remarks summarize some features of model (2) that will be used in the rest of the paper:

1) It has a large number of parameters, even for moderate sizes of k. Many authors worry about this lack of parsimony and suggest simplifying assumptions like a diagonal structure (Bollerslev et al., 1988) or constant correlations (Bollerslev, 1990).

2) The functional form (2) does not ensure the positive-definiteness of the conditional covariances. In fact, this is a very difficult condition to impose except in drastically simplified versions of the model.

3) By definition, the variables in the right-hand side of (2) are such that:

$$\mathrm{vech}(e_t e_t^{T}) = \mathrm{vech}(\Sigma_t) + v_t \tag{3}$$

where $v_t$ is (conditionally and unconditionally) a zero-mean uncorrelated process with a complex heteroskedasticity (Bollerslev, 1988, p. 123).

4) Generalizing the univariate result in Bollerslev (1988), the decomposition (3) allows one to express (2) as a VARMA(1,1) model:

$$(I - \Phi L)\,\mathrm{vech}(e_t e_t^{T}) = w + (I - \Theta L)\,v_t \tag{4}$$

where L is the lag operator, $v_t$ are the innovations defined in (3), and the AR and MA factors are related to the matrices in (2) by $\Phi = A + B$ and $\Theta = B$, respectively. If model (2) is such that the roots of $\lvert I - \Phi\lambda\rvert = 0$ lie outside the unit circle, then (4) can be written as:

$$\mathrm{vech}(e_t e_t^{T}) = \mathrm{vech}(\Sigma) + (I - \Phi L)^{-1}(I - \Theta L)\,v_t \tag{5}$$

where the constant term is the vector-half of the unconditional covariance:

$$\mathrm{vech}(\Sigma) = (I - \Phi)^{-1} w \tag{6}$$

Unless otherwise indicated we will use the representation (5)-(6), keeping in mind that it is observationally equivalent to the standard form (2).

3. Sources of ill-conditioning in likelihood computation.

3.1. High correlations.

Financial time series often display high unconditional correlations. Some explanations of this empirical regularity may be a) common statistical features of data, b) common factors due to the nature of the series (e.g., exchange rates are often related to a single currency) or c) simultaneous volatility clusters. In terms of principal components, high correlations imply that there is at least one quasi-deterministic linear combination of the series, characterized by a small eigenvalue of the unconditional covariance. In this situation the smallest eigenvalues of the conditional covariances will fluctuate near the zero boundary. Taking into account the form of the log-likelihood function (1), this situation is dangerous because:

1) Iterating on a solution $\hat\theta$, where $\Sigma_t(\hat\theta)$ has small eigenvalues, may yield floating-point errors or unbounded results when computing:
1.1) the sequences $\ln\lvert\Sigma_t(\hat\theta)\rvert$ and $\Sigma_t(\hat\theta)^{-1}$ (t = 1, ..., N) in (1);
1.2) the first and second-order derivatives of (1), which are functions of $\Sigma_t(\hat\theta)^{-1}$.

2) If $\Sigma_t(\hat\theta)$ has some negative eigenvalues, the computation of $\ln\lvert\Sigma_t(\hat\theta)\rvert$ (t = 1, ..., N) may result in mathematically undefined operations. Besides, many ML algorithms rely on the Cholesky decomposition to avoid the explicit inversion of covariance matrices. As Cholesky factors require these matrices to be positive-definite, negative eigenvalues also induce errors in this way when computing the function (1) or its derivatives.

In our experience, simple perturbation techniques help to avoid runtime errors, but are not useful to achieve convergence.

The following example illustrates the effect of high correlations on the eigenvalues of conditional covariances.

Example 1. Consider the bivariate GARCH(1,1) model expressed in the form (5):

$$\begin{bmatrix} e_{1t}^2 \\ e_{1t}e_{2t} \\ e_{2t}^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 \\ \sigma_{12} \\ \sigma_2^2 \end{bmatrix} + \begin{bmatrix} \frac{1-.90L}{1-.97L} & 0 & 0 \\ 0 & \frac{1-.80L}{1-.86L} & 0 \\ 0 & 0 & \frac{1-.73L}{1-.85L} \end{bmatrix} \begin{bmatrix} v_{1t} \\ v_{12t} \\ v_{2t} \end{bmatrix} \tag{7}$$

and the unconditional covariances:

$$\Sigma = \begin{bmatrix} 1.0 & .8 \\ .8 & 1.0 \end{bmatrix}, \quad \text{with eigenvalues } \lambda_1 = 1.8,\ \lambda_2 = .2 \tag{8}$$

$$\Sigma = \begin{bmatrix} 1.0 & .1 \\ .1 & 1.0 \end{bmatrix}, \quad \text{with eigenvalues } \lambda_1 = 1.1,\ \lambda_2 = .9 \tag{9}$$

Note that the ratio between the smallest and largest eigenvalues in the first case ($\lambda_2/\lambda_1 = .111$) is much lower than in the second case ($\lambda_2/\lambda_1 = .818$). This fact characterizes a (not extreme) ill-conditioned situation. The example consists of:

1) Obtaining two realizations with N = 300 of a bivariate white noise process $e_t$, whose conditional covariances are given by model (7)-(8) for the first series and by model (7)-(9) for the second series.
2) Computing the sequences of conditional covariances and the corresponding eigenvalues, using the true values of the parameters.

Figure 1 represents the smallest and largest eigenvalues of $\Sigma_t(\theta)$ in the ill-conditioned case ($\sigma_{12} = .8$). Note that the first sequence fluctuates very close to the zero boundary, its extreme values being min = .019 and max = .288. Figure 2 displays the same eigenvalues in the well-conditioned case ($\sigma_{12} = .1$). Note that the sequence of smallest eigenvalues (min = .354, max = .960) is farther from zero than in the previous case.

[Insert Figure 1]
[Insert Figure 2]

The sequences in Figures 1 and 2 have been computed with the true values of the parameters. A sensitivity analysis reveals that small perturbations of the parameters in the ill-conditioned case yield negative eigenvalues. For example, if the MA parameter of the covariance equation in (7) is set to .82 instead of its true value .80, then the sequence of conditional covariances has several negative eigenvalues, the smallest being -0.012. In the well-conditioned case, however, the eigenvalues are much more robust. Therefore, a nonlinear ML algorithm has a higher risk of iterating on a solution with negative eigenvalues when correlations between the series are high, like those in (8), than when they are small.

3.2. Poor identifiability.

As we said in the Introduction, poor identifiability is due to the functional form of the GARCH model. To simplify the analysis, we will discuss this problem in a univariate framework. Assume therefore that $y_t = e_t$, $e_t \sim iid(0, \sigma^2)$, $e_t \mid \Omega_{t-1} \sim iid(0, \sigma_t^2)$. A GARCH(1,1) in the standard form (2) is:

$$\sigma_t^2 = \omega + \alpha e_{t-1}^2 + \beta\sigma_{t-1}^2 \tag{10}$$

According to (3), the variables in the right-hand side of (10) are related by:

$$e_{t-1}^2 = \sigma_{t-1}^2 + v_{t-1} \tag{11}$$

$v_{t-1}$ being an uncorrelated, zero-mean heteroskedastic noise. Eqs. (10)-(11) imply that:

1) The variables in the right-hand side of (10), $e_{t-1}^2$ and $\sigma_{t-1}^2$, are such that $E_{t-2}(e_{t-1}^2) = \sigma_{t-1}^2$.
2) The term $v_{t-1}$ in (11) can be interpreted as the information in $e_{t-1}^2$ which is not contained in $\sigma_{t-1}^2$. Then, if the information (or variance) of $v_{t-1}$ is low, it will be difficult to obtain independent estimates of α and β, whereas some linear combination of these parameters will be identified.

Therefore, the likelihood of (10) is very flat in some directions of the parametric space. It is difficult to say when this problem will be important, because the support of $v_{t-1}$ changes in time (Bollerslev, 1988, p. 123), so its variance is almost impossible to describe analytically. One may guess that if model (10) shows high persistence, i.e., if α + β ≈ 1, the parameters will be more identifiable, because $\sigma_{t-1}^2$ would be less adaptive to $e_{t-1}^2$ than in a model with less persistence.

The following example illustrates the poor identifiability of a GARCH(1,1) model using simulated data.

Example 2. Consider a sample of 500 observations of the process $e_t \sim iid(0, \sigma^2)$, $e_t \mid \Omega_{t-1} \sim iidN(0, \sigma_t^2)$, with conditional variances following a GARCH(1,1) model in ARMA form:

$$e_t^2 = \sigma^2 + \frac{1 - \theta L}{1 - \phi L}\,v_t \tag{12}$$

with $\sigma^2 = 1$, θ = .6 and φ = .7. The ML estimates of the parameters in (12), their correlations and the corresponding principal components are summarized in Table 1.

[Insert Table 1]

Note that:
1) Point estimates are close to the true values.
2) The estimate of the unconditional variance is almost orthogonal to the rest of the parameters. This situation is characterized both by a) small correlations of $\hat\sigma^2$ with $\hat\phi$ and $\hat\theta$, and b) an eigenvalue of 1.0 associated with the eigenvector [1 .01 -.1].
3) The correlation between $\hat\phi$ and $\hat\theta$ is .98.
The highest eigenvalue (1.98) is associated with the eigenvector [.04 .71 .71], showing that the sum of both parameters is well identified. On the other hand, the smallest eigenvalue (.02) is associated with the eigenvector [.05 .71 -.71]. The difference between both estimates, which is the α parameter in (10), is therefore ill-identified.

Figure 3 shows the optimal estimates (represented by a '+' sign), corresponding to a log-likelihood of 720.840, and the isoquants of the log-likelihood conditional on $\hat\sigma^2 = 1.065$. The isoquants are chosen to represent confidence regions for φ and θ, from 5% confidence (given by the innermost conic section) up to 95% in increments of 10 percentage points. The first three isoquants are labeled with the corresponding likelihood value. This figure shows that a) large zones of the parametric space have a likelihood similar to the optimum and b) confidence regions are wide and, therefore, point estimates are very uncertain.

[Insert Figure 3]

4. Stabilizing likelihood computation.

According to the previous analysis, let $e_t$ be a (k×1) random vector such that:

$$e_t \sim iid(0, \Sigma) \tag{13}$$

$$e_t \mid \Omega_{t-1} \sim iid(0, \Sigma_t) \tag{14}$$

and consider the linear transformation:

$$e_t^{*} = V e_t \tag{15}$$

where V is a (k×k) matrix of real numbers such that $\lvert V\rvert \neq 0$. The problem of high correlations, discussed in Section 3.1, arises when an eigenvalue of Σ is relatively small. Then, the data can be optimally scaled by choosing:

$$V = \Lambda^{-1/2} M^{T} \tag{16}$$

where the matrices in the right-hand side of (16) are given by the eigenvalue-eigenvector decomposition:

$$\Sigma = M \Lambda M^{T} \tag{17}$$

4.1. Analytic properties of the stabilizing linear transformation.

The following propositions relate the stochastic properties of $e_t^{*}$ to those of $e_t$.

Proposition 1. The unconditional and conditional distributions of $e_t^{*}$ are:

$$e_t^{*} \sim iid(0, I) \tag{18}$$

$$e_t^{*} \mid \Omega_{t-1} \sim iid(0, \Sigma_t^{*}), \qquad \Sigma_t^{*} = V \Sigma_t V^{T} \tag{19}$$

Proof. The result follows immediately from (13)-(17).

Note that the result in (18) implies that the transformation defined by (15)-(17) is optimal, as it scales all the eigenvalues of the unconditional covariance to unity, thus achieving the optimal condition number of one. An additional advantage is that the transformed values $e_t^{*}$ have a meaningful statistical interpretation, as standardized principal components of $e_t$.

Proposition 2. If $\Sigma_t$ is such that:

$$\mathrm{vech}(\Sigma_t) = w + A\,\mathrm{vech}(e_{t-1}e_{t-1}^{T}) + B\,\mathrm{vech}(\Sigma_{t-1}) \tag{20}$$

then $\Sigma_t^{*}$ follows the GARCH(1,1) motion law:

$$\mathrm{vech}(\Sigma_t^{*}) = w^{*} + A^{*}\,\mathrm{vech}(e_{t-1}^{*}e_{t-1}^{*T}) + B^{*}\,\mathrm{vech}(\Sigma_{t-1}^{*}) \tag{21}$$

where:

$$w^{*} = P^{-1} w \tag{22}$$

$$A^{*} = P^{-1} A P \tag{23}$$

$$B^{*} = P^{-1} B P \tag{24}$$

$$P = \Delta_1 \left[V^{-1} \otimes V^{-1}\right] \Delta_2 \tag{25}$$

and $\Delta_1$, $\Delta_2$ are 0-1 matrices such that, for any symmetric matrix S, $\mathrm{vech}(S) = \Delta_1 \mathrm{vec}(S)$ and $\mathrm{vec}(S) = \Delta_2 \mathrm{vech}(S)$, vec(·) being the operator which stacks the columns of an N×N matrix into an N²×1 vector.

Proof. See Appendix A.

Corollary 1. If the variance model is expressed in the form (5):

$$\mathrm{vech}(e_t e_t^{T}) = \mathrm{vech}(\Sigma) + (I - \Phi L)^{-1}(I - \Theta L)\,v_t \tag{26}$$

the cross-products of the transformed data follow the VARMA model:

$$\mathrm{vech}(e_t^{*} e_t^{*T}) = \mathrm{vech}(\Sigma^{*}) + (I - \Phi^{*} L)^{-1}(I - \Theta^{*} L)\,v_t^{*} \tag{27}$$

where:

$$\mathrm{vech}(\Sigma^{*}) = P^{-1}\mathrm{vech}(\Sigma), \quad \Phi^{*} = P^{-1}\Phi P, \quad \Theta^{*} = P^{-1}\Theta P, \quad v_t^{*} = P^{-1} v_t \tag{28}$$

Proposition 3. Let $(e_1, e_2, \ldots, e_N)$ be a sample. Then:

$$\ell(e_1, e_2, \ldots, e_N) = \ell(e_1^{*}, e_2^{*}, \ldots, e_N^{*}) + \frac{N}{2}\ln\lvert\Lambda\rvert \tag{29}$$

ℓ(·) being the minus log gaussian density of a sample.

Proof. See Appendix B.

Note that, replacing (18) by $e_t^{*} \sim iid(0, V\Sigma V^{T})$, Propositions 1 and 2 hold for any choice of V. A general result analogous to Proposition 3 is easy to derive following the proof in Appendix B, as only the final simplification relies on the particular choice of V given in (16).

4.2. Econometric implementation.

The results in Section 4.1 were derived for the true values of the data generating process. Building on them, the following empirical implementation is straightforward:

Step 1: Starting from a sample $\{e_t\}_{t=1,\ldots,N}$, compute an estimate of the unconditional covariance matrix, $\hat\Sigma$, the eigenvalue-eigenvector decomposition (17), the matrix $\hat V$ using the sample analogue of (16) and the transformed series $\{e_t^{*}\}_{t=1,\ldots,N}$ using (15).

Step 2: Specify a GARCH model for $e_t^{*}$. We will assume that it is a GARCH(1,1) in the form (2).

Step 2.1: Compute consistent estimates of the parameters in (21), $\hat w^{*}$, $\hat A^{*}$ and $\hat B^{*}$. If ML is used, make sure that the corresponding gradient is small enough.

Step 2.2: Compute the covariances $\{\hat\Sigma_t^{*}\}_{t=1,\ldots,N}$ according to (21). Check the smallest eigenvalue to make sure that it is positive.

Step 2.3: If required, obtain estimates of the parameters in (2) through the expressions:

$$\hat w = \hat P \hat w^{*} \tag{30}$$

$$\hat A = \hat P \hat A^{*} \hat P^{-1} \tag{31}$$

$$\hat B = \hat P \hat B^{*} \hat P^{-1} \tag{32}$$

where $\hat P$ denotes the sample analogue of P, see Eq. (25). Finally, compute estimates of the conditional covariances using:

$$\hat\Sigma_t = \hat V^{-1} \hat\Sigma_t^{*} (\hat V^{-1})^{T} \tag{33}$$

Expressions (30)-(32) follow immediately from Eqs. (22)-(25), and (33) follows from (19). Note that consistency is assured by Slutsky's theorem. If ML were employed to compute the estimates in Step 2.1, Proposition 3 ensures that the estimates $\hat w$, $\hat A$ and $\hat B$ are asymptotically equivalent to ML estimates. It can also be applied to compute information criteria or LR statistics.

Step 3: If required, compute estimates of the covariances of $\hat w$, $\hat A$ and $\hat B$ using the following proposition.

Proposition 4. If $\widehat{\mathrm{cov}}(\hat w^{*})$, $\widehat{\mathrm{cov}}(\mathrm{vec}\,\hat A^{*})$ and $\widehat{\mathrm{cov}}(\mathrm{vec}\,\hat B^{*})$ are consistent estimates of the covariances of $\hat w^{*}$, $\hat A^{*}$ and $\hat B^{*}$, respectively, then the expressions:

$$\widehat{\mathrm{cov}}(\hat w) = \hat P\,\widehat{\mathrm{cov}}(\hat w^{*})\,\hat P^{T} \tag{34}$$

$$\widehat{\mathrm{cov}}(\mathrm{vec}\,\hat A) = \left[(\hat P^{-1})^{T} \otimes \hat P\right] \widehat{\mathrm{cov}}(\mathrm{vec}\,\hat A^{*}) \left[(\hat P^{-1})^{T} \otimes \hat P\right]^{T} \tag{35}$$

$$\widehat{\mathrm{cov}}(\mathrm{vec}\,\hat B) = \left[(\hat P^{-1})^{T} \otimes \hat P\right] \widehat{\mathrm{cov}}(\mathrm{vec}\,\hat B^{*}) \left[(\hat P^{-1})^{T} \otimes \hat P\right]^{T} \tag{36}$$

provide consistent estimates of the covariances of $\hat w$, $\hat A$ and $\hat B$.

Proof. Expression (34) follows immediately from (30). Applying the vec(·) operator to both sides of (31) we obtain:

$$\mathrm{vec}(\hat A) = \left[(\hat P^{-1})^{T} \otimes \hat P\right] \mathrm{vec}(\hat A^{*}) \tag{37}$$

which implies (35). The proof of (36) is analogous to this one.

This implementation allows one to obtain results for the original data from those corresponding to the transformed data. The following example illustrates its application.

5. Empirical example: short-run alignment of exchange rates.

It is well known that many exchange rates fluctuate in the same direction and in similar proportions. This co-movement can be explained by competitive appreciation or depreciation policies, by international agreements or just by the fact that all the rates are expressed in terms of a common numeraire (often the US Dollar) whose performance affects them all.

Long-term comovements can be effectively measured through sample correlations. On the other hand, short-term fluctuations may deviate substantially from the alignment implied by the long-run correlation matrix. In this Section we model short-run comovements of four relevant currencies through the conditional correlations implied by a vector GARCH model.

Consider the spot bid exchange rates of the Deutsche Mark (DM), French Franc (FF), British Pound (BP) and Japanese Yen (JY) against the US Dollar, observed in the London Market during 695 weeks, from January 1985 to April 1998. The data have been logged, differenced and scaled by a factor of 100, to obtain the corresponding log percent yields. Excess returns are then computed by subtracting the sample mean.

We tried to fit diagonal GARCH(1,1) models to all the possible pairs of excess returns. Most of the attempts converged to solutions with a nonzero gradient and some negative eigenvalue in the conditional covariances. Convergence was obtained only when JY was included in the pair. Taking into account the analysis in Section 3.1 this was to be expected, as the correlation between JY returns and those of the other currencies is relatively small. All the attempts to build a model for three series failed to converge. Therefore, we will use the data transformation defined in Section 4.

Table 2 summarizes the main descriptive statistics of the excess returns. Note that a) all the series exhibit excess kurtosis and some asymmetry, perhaps relevant for BP and JY, b) the correlations are high, ranging from .48 (BP-JY) to .98 (DM-FF), and c) in line with this fact, the ratio between the lowest and highest eigenvalues of the covariance matrix ($\lambda_{\min}/\lambda_{\max} = .0069$) suggests that there will be a problem of high correlations. Note that the scaled eigenvectors in the last panel of Table 2 are the sample analogues of V in (16).

[Insert Table 2]

Inspection of data scaled according to (15)-(17) reveals that the first series has a big outlier (-12.8 standard deviations) in the second week of April 1986. The corresponding scaled eigenvector implies that this series is roughly the difference between the returns of DM and FF (see Table 2). This anomalous value does not occur in a cluster of high volatility, and its source was traced to a) a high positive fluctuation of the FF exchange rate (+2.77 standard deviations), combined with b) a simultaneous small negative variation of the DM (-.69 standard deviations). As the correlation between both series is .98, this combination is unlikely.

The anomalous FF excess return was corrected using a simple intervention model, see Box and Tiao (1975). Table 3 summarizes both the new scaled eigenvector matrix and the Box-Ljung Q statistics of cross-products of the transformed series. This test rejects the null of no conditional heteroskedasticity. Figure 4 shows the resulting scaled series.

[Insert Table 3]
[Insert Figure 4]

A standard analysis of the scaled series and their cross-products suggests that a diagonal GARCH(1,1) will be adequate to capture most of the conditional heteroskedasticity. Table 4 summarizes the ML estimates of this model, expressed in the VARMA form (5). Note that:

1) All the parameters are much higher than their standard errors. As the scaled data are not gaussian, this is only informal evidence of statistical significance.
2) Many AR parameters are close to one, which implies a high persistence of variance effects.
3) The parameters in the constant term, which are the unconditional covariances, have been constrained to identity-matrix values, consistently with the properties of the data transformation, see Eq. (18). Free estimates of these parameters (not shown here) are very similar to these, and a likelihood-ratio test would not reject the null that the unconditional covariance is equal to the identity.
4) True convergence has been achieved, as the square-root norm of the gradient in both cases is small.
5) After convergence, we computed the sequences of conditional covariances implied by the model both for the scaled and for the original data. The minimum eigenvalues of both sequences, shown in the last two rows of Table 4, are positive.

[Insert Table 4]

Table 5 summarizes the descriptive statistics of the standardized residuals. Apart from a typical excess kurtosis, there are no symptoms of misspecification. In particular, the Box-Ljung statistics do not reject the null of conditional homoskedasticity.

[Insert Table 5]

Figure 5 shows the conditional volatilities (square roots of conditional variances) implied by the model. Note that: a) the volatilities of DM and FF returns are almost equal, b) BP returns share common periods of volatility with DM and FF yields and c) JY is more stable than the European currencies.

[Insert Figure 5]

Figure 6 shows the conditional correlations implied by the model, which have clear and intuitive patterns. First, conditional correlations between DM and FF returns are close to unity, with transitory deviations in the last half of the sample. This result is hardly surprising, as both currencies are in the hard core of the EMS. Second, conditional correlations of BP returns with the other European currencies are weaker (around .80, with highs and lows of .93 and .45, respectively) and there is a decreasing trend in the last part of the sample. Finally, correlations of JY returns with those of the European currencies are relatively small, around .5 to .6, with highs and lows of .95 and 0, respectively.

[Insert Figure 6]

6. Concluding remarks.
The first part of this paper concludes that iterative ML estimation of multivariate GARCH models is prone to diverge due to negative eigenvalues in the conditional covariances. The literature is unanimously concerned about the positive-definiteness of these matrices and is aware that ML estimation of multivariate ARCH models is difficult. Many authors, e.g., Engle and Kroner (1995), also worry about the large number of parameters of unconstrained ARCH processes. Whereas lack of parsimony contributes to the instability of ML, two reasons suggest that it is not such a serious problem by itself. First, in a context of high-frequency financial data, the availability of huge datasets somewhat balances overparametrization. Second, simplified ARCH models (e.g., diagonal GARCH) often show the same instability as unconstrained specifications. We think that the high-correlation and identifiability problems discussed in Sections 3 and 4 provide a more direct explanation than lack of parsimony. Besides, they suggest how to detect the potential problem before model building and how to improve the behavior of ML algorithms.

We have shown that the econometric implementation outlined in Section 4, which is closely related to factor-ARCH modeling, see Engle et al. (1990), contributes to the stability of likelihood computation. It also confirms that instability in likelihood computation is mainly due to the relative scale of the unconditional covariance eigenvalues. On the other hand, it has clear limitations, as it does not ensure that conditional covariances are positive-definite. This requires using a different parametrization such as the previously mentioned constant-correlations form or the BEKK model, see Engle and Kroner (1995).

The proposed transformation has three additional advantages. First, working with original or transformed data is indifferent for practical purposes, as the propositions in Section 4 define one-to-one relationships between their main stochastic properties. Second, the transformed variables, besides an obvious financial interpretation as yields of orthogonal portfolios, have a clear statistical meaning and may help in model building, e.g., by revealing unlikely comovements, as was illustrated in the empirical example in Section 5. Third, as the unconditional covariance of the transformed variables is the identity, imposing the corresponding constraints reduces the number of free parameters in the model and improves identifiability.

Empirical evidence, not shown here, suggests that the data transformation improves the performance of ML algorithms even when using stable parametrizations such as the BEKK model, see Engle and Kroner (1995). We think that this happens because the transformation improves the scaling of both the data and the conditional covariance eigenvalues. Obviously, if a model ensures that conditional covariances are positive-definite, negative eigenvalues are not an issue. However, ill-conditioning problems also arise when some eigenvalues are positive but close to zero.

Of the two problems analyzed, the issue of high correlations is obviously the more important, as it compromises the validity of the estimates. This problem is easy to detect before model building, using the eigenvalues of a sample unconditional correlation matrix and the corresponding condition number.

Except in extreme cases, the problem of identifiability is important only when combined with high correlations. By itself, it implies that point estimates will be highly correlated and imprecise. On the other hand, it does not affect the capacity of GARCH specifications to describe and forecast volatility, and it can be dealt with by restrictions on the model parameters, e.g., imposing IGARCH constraints.
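The pre-estimation check suggested in these remarks (eigenvalues of the sample unconditional correlation matrix and its condition number), together with Step 1 of Section 4.2, can be sketched in a few lines of Python/NumPy. This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, the sample covariance uses divisor N, and the demo data are simulated Gaussian draws with the unconditional covariance (8) of Example 1, not the exchange-rate sample of Section 5 (only its size, 695, is borrowed). The sketch reports the ratio λmin/λmax (the reciprocal of the condition number), builds $\hat V = \hat\Lambda^{-1/2}\hat M^{T}$ as in (16)-(17), and numerically checks the identity of Proposition 3 in the special case of a constant covariance.

```python
import numpy as np


def eigen_ratio(S):
    """lambda_min / lambda_max of a symmetric matrix (1 / condition number)."""
    lam = np.linalg.eigvalsh(S)            # eigenvalues in ascending order
    return lam[0] / lam[-1]


def stabilizing_transform(e):
    """Sample analogue of (15)-(17); rows of e are e_t'. Returns e*, V, Lambda."""
    Sigma_hat = np.cov(e, rowvar=False, bias=True)
    lam, M = np.linalg.eigh(Sigma_hat)     # Sigma_hat = M diag(lam) M'
    V = np.diag(lam ** -0.5) @ M.T         # V = Lambda^{-1/2} M'
    return e @ V.T, V, lam                 # e*_t = V e_t, stacked by rows


def iid_gaussian_nll(x, S):
    """Minus log Gaussian likelihood (1) with Sigma_t = S for every t."""
    N, k = x.shape
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum("ti,ij,tj->", x, np.linalg.inv(S), x)
    return 0.5 * (N * k * np.log(2.0 * np.pi) + N * logdet + quad)


# Ill- and well-conditioned covariances (8) and (9) of Example 1.
S8 = np.array([[1.0, 0.8], [0.8, 1.0]])
S9 = np.array([[1.0, 0.1], [0.1, 1.0]])
print(round(eigen_ratio(S8), 3), round(eigen_ratio(S9), 3))  # prints 0.111 0.818

rng = np.random.default_rng(1)
e = rng.standard_normal((695, 2)) @ np.linalg.cholesky(S8).T
print("ratio in sample correlations:", eigen_ratio(np.corrcoef(e, rowvar=False)))

e_star, V, lam = stabilizing_transform(e)
print(np.cov(e_star, rowvar=False, bias=True).round(12))  # identity, as in (18)

# Proposition 3 with a constant covariance: l(e) = l(e*) + (N/2) ln|Lambda|.
Sigma_hat = np.cov(e, rowvar=False, bias=True)
lhs = iid_gaussian_nll(e, Sigma_hat)
rhs = iid_gaussian_nll(e_star, np.eye(2)) + 0.5 * len(e) * np.sum(np.log(lam))
print("difference:", lhs - rhs)
```

The identity holds exactly (up to rounding) here because $\hat V^{T}\hat V = \hat\Sigma^{-1}$, so the quadratic forms coincide and the log-determinants differ by $\ln\lvert\hat\Lambda\rvert$. With time-varying $\Sigma_t$, the same argument applies observation by observation, as in Appendix B.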
The existence of cofeatures in variance, see Engle and Kozicki (1993), also allows one to improve identifiability by simplifying the dynamic structure of the model.

Acknowledgements.
Alfonso Novales made useful comments and suggestions on previous versions of this work. Sonia Sotoca acknowledges financial support from CICYT, project PB95-0912/95, and Fundación Caja de Madrid.

References.
Bollerslev, T., 1988. On the Correlation Structure for the Generalized Autoregressive Conditional Heteroskedastic Process. Journal of Time Series Analysis, 9(2), 121-131.
Bollerslev, T., 1990. Modelling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate Generalized ARCH Approach. Review of Economics and Statistics, 72, 498-505.
Bollerslev, T., R.F. Engle and J.M. Wooldridge, 1988. A Capital-Asset Pricing Model with Time-Varying Covariances. Journal of Political Economy, 96(1), 116-131.
Box, G.E.P. and G.C. Tiao, 1975. Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association, 70, 70-79.
Engle, R.F., 1982. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation. Econometrica, 50, 987-1007.
Engle, R.F., V.K. Ng and M. Rothschild, 1990. Asset Pricing with a FACTOR-ARCH Covariance Structure: Empirical Estimates for Treasury Bills. Journal of Econometrics, 45, 213-237.
Engle, R.F. and S. Kozicki, 1993. Testing for Common Features. Journal of Business and Economic Statistics, 11, 369-380.
Engle, R.F. and K.F. Kroner, 1995. Multivariate Simultaneous Generalized ARCH. Econometric Theory, 11, 122-150.

Appendix A. Proof of Proposition 2.
Eqs. (15) and (19) imply that:
$$ e_t = V^{-1} e_t^{*} \tag{A.1} $$
$$ \Sigma_t = V^{-1}\Sigma_t^{*}(V^{-1})^{T} \tag{A.2} $$
Substituting (A.1) and (A.2) in (20) yields:
$$ \mathrm{vech}\left[V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right] = w + A\,\mathrm{vech}\left[V^{-1}e_{t-1}^{*}(e_{t-1}^{*})^{T}(V^{-1})^{T}\right] + B\,\mathrm{vech}\left[V^{-1}\Sigma_{t-1}^{*}(V^{-1})^{T}\right] \tag{A.3} $$
The next steps require the following algebraic result:
$$ \mathrm{vec}(ABA^{T}) = (A\otimes A)\,\mathrm{vec}(B) \tag{A.4} $$
and the fact that the vech(·) and vec(·) operators are such that, for any symmetric matrix $S$, $\mathrm{vech}(S) = \Delta_1\mathrm{vec}(S)$ and $\mathrm{vec}(S) = \Delta_2\mathrm{vech}(S)$, where $\Delta_1$ and $\Delta_2$ are 0-1 matrices. Then, Exp. (A.3) in vec(·) form becomes:
$$ \Delta_1\mathrm{vec}\left[V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right] = w + A\Delta_1\mathrm{vec}\left[V^{-1}e_{t-1}^{*}(e_{t-1}^{*})^{T}(V^{-1})^{T}\right] + B\Delta_1\mathrm{vec}\left[V^{-1}\Sigma_{t-1}^{*}(V^{-1})^{T}\right] \tag{A.5} $$
and, by result (A.4):
$$ \Delta_1[V^{-1}\otimes V^{-1}]\,\mathrm{vec}(\Sigma_t^{*}) = w + A\Delta_1[V^{-1}\otimes V^{-1}]\,\mathrm{vec}\left[e_{t-1}^{*}(e_{t-1}^{*})^{T}\right] + B\Delta_1[V^{-1}\otimes V^{-1}]\,\mathrm{vec}(\Sigma_{t-1}^{*}) \tag{A.6} $$
which can be expressed in vech(·) notation as:
$$ \Delta_1[V^{-1}\otimes V^{-1}]\Delta_2\,\mathrm{vech}(\Sigma_t^{*}) = w + A\Delta_1[V^{-1}\otimes V^{-1}]\Delta_2\,\mathrm{vech}\left[e_{t-1}^{*}(e_{t-1}^{*})^{T}\right] + B\Delta_1[V^{-1}\otimes V^{-1}]\Delta_2\,\mathrm{vech}(\Sigma_{t-1}^{*}) \tag{A.7} $$
Denoting $P = \Delta_1[V^{-1}\otimes V^{-1}]\Delta_2$ simplifies (A.7) to:
$$ P\,\mathrm{vech}(\Sigma_t^{*}) = w + AP\,\mathrm{vech}\left[e_{t-1}^{*}(e_{t-1}^{*})^{T}\right] + BP\,\mathrm{vech}(\Sigma_{t-1}^{*}) \tag{A.8} $$
which, as $P$ is invertible, implies:
$$ \mathrm{vech}(\Sigma_t^{*}) = P^{-1}w + P^{-1}AP\,\mathrm{vech}\left[e_{t-1}^{*}(e_{t-1}^{*})^{T}\right] + P^{-1}BP\,\mathrm{vech}(\Sigma_{t-1}^{*}) \tag{A.9} $$
Finally, identifying the parameter matrices in (A.9) and (21) yields Exps. (22)-(25).

Appendix B. Proof of Proposition 3.
According to (14), the minus log Gaussian likelihood of a sample of size N is:
$$ \ell(e_1,\dots,e_N) = \frac{Nk}{2}\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left(\ln|\Sigma_t| + e_t^{T}\Sigma_t^{-1}e_t\right) \tag{B.1} $$
Substituting (A.1) and (A.2) in (B.1) yields:
$$ \ell(e_1,\dots,e_N) = \frac{Nk}{2}\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{N}\left\{\ln\left|V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right| + (e_t^{*})^{T}(V^{-1})^{T}\left[V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right]^{-1}V^{-1}e_t^{*}\right\} \tag{B.2} $$
and the terms inside the summation are such that:
$$ \ln\left|V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right| + (e_t^{*})^{T}(V^{-1})^{T}\left[V^{-1}\Sigma_t^{*}(V^{-1})^{T}\right]^{-1}V^{-1}e_t^{*} = \ln|\Sigma_t^{*}| - 2\ln|V| + (e_t^{*})^{T}(\Sigma_t^{*})^{-1}e_t^{*} \tag{B.3} $$
To understand the simplification in (B.3), note that $|V^{-1}\Sigma_t^{*}(V^{-1})^{T}| = |\Sigma_t^{*}|/|V|^{2}$ and that $V$ and $V^{-1}$ cancel in the quadratic form. Moreover, (16) implies that $|V| = |\Lambda|^{-1/2}$, because the determinant of the eigenvector matrix $M$ is one and, therefore:
$$ \ln|V| = -\frac{1}{2}\ln|\Lambda| \tag{B.4} $$
Finally, substituting (B.3) and (B.4) in (B.2) implies that:
$$ \ell(e_1,\dots,e_N) = \ell(e_1^{*},\dots,e_N^{*}) + \frac{N}{2}\ln|\Lambda| \tag{B.5} $$

[Figure 1. Eigenvalues of the conditional covariances in the ill-conditioned case. Panels: largest eigenvalue of Sigma(t); smallest eigenvalue of Sigma(t). Time axis from 0 to 300.]

[Figure 2. Eigenvalues of the conditional covariances in the well-conditioned case (σ12 = .1). Panels: smallest eigenvalue of Sigma(t); largest eigenvalue of Sigma(t). Time axis from 0 to 300.]

[Figure 3. Contour lines (isoquants) of the log-likelihood function of model (12). Horizontal axis: phi.]

[Figure 4. Scaled yields after intervention in FF. Panels: standardized plots of series #1 to #4.]

[Figure 5. Estimated conditional volatilities. Panels: volatility of DM/USD, FF/USD, BP/USD and JY/USD log pct returns.]
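The two operator facts used in the proof of Proposition 2 (Appendix A) can be checked numerically. The Python/NumPy sketch below is an illustration, not the authors' code. The helper names `vec`, `vech`, `elimination_matrix` and `duplication_matrix` are hypothetical, and Δ1 and Δ2 are built for k = 4 under the common convention that vech stacks the lower triangle column by column, which the appendix does not spell out. V is a random invertible matrix, not the one estimated in the paper. The code checks result (A.4), checks that P = Δ1[V^{-1}⊗V^{-1}]Δ2 maps vech(S) into vech(V^{-1}S(V^{-1})^T), and checks that P is invertible, which is what allows (A.8) to be rewritten as (A.9).

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return X.reshape(-1, order="F")

def elimination_matrix(k):
    """0-1 matrix Delta_1 such that vech(S) = Delta_1 vec(S)."""
    D1 = np.zeros((k * (k + 1) // 2, k * k))
    row = 0
    for j in range(k):          # columns of S
        for i in range(j, k):   # entries on or below the diagonal
            D1[row, j * k + i] = 1.0
            row += 1
    return D1

def duplication_matrix(k):
    """0-1 matrix Delta_2 such that vec(S) = Delta_2 vech(S) for symmetric S."""
    D2 = np.zeros((k * k, k * (k + 1) // 2))
    col = 0
    for j in range(k):
        for i in range(j, k):
            D2[j * k + i, col] = 1.0  # entry (i, j) of S
            D2[i * k + j, col] = 1.0  # its mirror image (j, i)
            col += 1
    return D2

k = 4
D1, D2 = elimination_matrix(k), duplication_matrix(k)

def vech(S):
    """Lower-triangular part of S stacked column by column."""
    return D1 @ vec(S)

rng = np.random.default_rng(0)
A = rng.normal(size=(k, k))
X = rng.normal(size=(k, k))
S = X @ X.T                                  # arbitrary symmetric matrix
V = rng.normal(size=(k, k)) + k * np.eye(k)  # hypothetical invertible V
V_inv = np.linalg.inv(V)

# Result (A.4): vec(A B A^T) = (A kron A) vec(B), here with B = X.
print(np.allclose(vec(A @ X @ A.T), np.kron(A, A) @ vec(X)))

# P = Delta_1 [V^{-1} kron V^{-1}] Delta_2 maps vech(S) to vech(V^{-1} S V^{-T}).
P = D1 @ np.kron(V_inv, V_inv) @ D2
print(np.allclose(P @ vech(S), vech(V_inv @ S @ V_inv.T)))

# P is invertible, with inverse Delta_1 [V kron V] Delta_2.
P_inv = D1 @ np.kron(V, V) @ D2
print(np.allclose(P_inv @ P, np.eye(P.shape[0])))
```

The inverse works because Δ2 followed by Δ1 acts as the identity on vech vectors, and [V ⊗ V] undoes [V^{-1} ⊗ V^{-1}] on the vectorized symmetric matrix in between.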
[Figure 6. Estimated conditional correlations. Panels: correlations DM-BP, DM-FF, FF-BP, DM-JY, BP-JY and FF-JY log pct returns.]

Table 1. ML estimates, correlations and principal components information.
True values:   σ² = 1.0                φ = .7                θ = .6
Estimates†:    σ² = 1.065 (.091)       φ = .706 (.203)       θ = .609 (.231)
Correlations, eigenvalues and eigenvectors (by rows): -- -- 1 1 0.01 -0.1 0.06 1 -- 0.02 0.05 0.71 -0.71 0 0.98 1 1.98 0.04 0.71 0.71
† The figure in parentheses is the standard deviation of the estimate.

Table 2. Descriptive statistics of excess returns.
Statistic              DM        FF        BP        JY
Standard deviation     1.358     1.31      1.359     1.298
Skewness              -0.046    -0.021     0.432    -0.609
Excess kurtosis        1.608     1.874     3.986     2.313
Sample correlations:
DM                     1
FF                     0.978     1
BP                     0.777     0.781     1
JY                     0.635     0.623     0.477     1
Eigen-structure of the covariance matrix:
Eigenvalue   % of var.   Scaled eigenvectors [matrix V in Eq. (16)]
                            DM        FF        BP        JY
0.039         0.55        3.535    -3.651     0.041    -0.078
0.472         6.66       -0.652    -0.628     1.062     0.413
0.954        13.45       -0.102    -0.123    -0.482     0.889
5.627        79.34        0.233     0.224     0.209     0.171

Table 3. Transformation coefficients and Q statistics of the scaled series.
Scaled eigenvectors [matrix V in Eq. (16)] after intervention:
   DM        FF        BP        JY
 4.032    -4.195     0.068    -0.088
-0.652    -0.628     1.062     0.413
-0.102    -0.123    -0.482     0.889
 0.233     0.224     0.209     0.171
Ljung-Box Q statistic (for 10 lags of the autocorrelation function of cross-products of the transformed series)†:
              Series #1   Series #2   Series #3   Series #4
Series #1     288.13
Series #2      42.45       19.67
Series #3      63.27       12.58       57.87
Series #4      28.6        23.09       84.04       25.19
† The 95% percentile of a χ²(10) is 18.3. As the data are not Gaussian, this is only an indicative critical value of the statistic under the null of no autocorrelation.

Table 4. ML estimates of the GARCH(1,1) model (standard deviations in parentheses).
vech(e*_t (e*_t)^T)   σ_ij†     φ_ij           θ_ij
(e*_1t)²              1 (--)    .955 (.010)    .683 (.017)
e*_1t e*_2t           0 (--)    .895 (.015)    .845 (.015)
e*_1t e*_3t           0 (--)    .273 (.009)    .238 (.008)
e*_1t e*_4t           0 (--)    .442 (.007)    .232 (.004)
(e*_2t)²              1 (--)    .895 (.023)    .795 (.020)
e*_2t e*_3t           0 (--)    .936 (.012)    .846 (.014)
e*_2t e*_4t           0 (--)    .971 (.010)    .925 (.014)
(e*_3t)²              1 (--)    .891 (.015)    .763 (.013)
e*_3t e*_4t           0 (--)    .957 (.025)    .880 (.020)
(e*_4t)²              1 (--)    .895 (.018)    .745 (.018)
Diagnostics of estimation results:
Gaussian likelihood (minus log) on convergence     3618.78
Square root norm of gradient                       0.0773
Min. eigenvalue of scaled data covariances         0.0658
Min. eigenvalue of original data covariances       0.0046
† The parameters in this column are constrained to identity matrix values, according to the transformation (15)-(17). The minus log likelihood corresponding to this model with free covariances is 3614.52. Therefore, an LR test would not reject the constraints at the 95% confidence level.

Table 5. Statistics of standardized residuals.
                   Series #1   Series #2   Series #3   Series #4
Skewness            -0.583       0.481      -0.735      -0.015
Excess kurtosis      3.156       5.016       2.376       1.865
Ljung-Box Q statistic (for 10 lags of the autocorrelation function of cross-products of the standardized series)†:
              Series #1   Series #2   Series #3   Series #4
Series #1       5.30
Series #2       4.08        4.48
Series #3       6.13        7.67        9.38
Series #4       8.80       16.11        5.19        9.90
† The 95% percentile of a χ²(10) is 18.3. As the data are not Gaussian, this is only an indicative critical value of the statistic under the null of no autocorrelation.
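The critical value quoted in the notes to Tables 3 and 5, and the LR test reported for the model in Table 4, can be reproduced with a few lines of Python. This sketch is not part of the original computations. It uses the closed-form chi-square CDF that holds for an even number of degrees of freedom, inverted by bisection to avoid depending on a statistics library. It also assumes that the LR test has 10 degrees of freedom, one for each constrained σ_ij entry in Table 4. The function names are hypothetical.

```python
import math

def chi2_cdf_even_df(x, df):
    """CDF of a chi-square variable with an even number of degrees of freedom.

    For df = 2m: P(X <= x) = 1 - exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    tail = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * tail

def chi2_quantile_even_df(p, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Invert chi2_cdf_even_df by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

critical_10 = chi2_quantile_even_df(0.95, 10)

# Minus log-likelihoods reported with Table 4.
nll_constrained = 3618.78  # unconditional covariances fixed at the identity
nll_free = 3614.52         # free unconditional covariances
n_constraints = 10         # the ten sigma_ij entries of a 4 x 4 vech
lr = 2.0 * (nll_constrained - nll_free)
critical_lr = chi2_quantile_even_df(0.95, n_constraints)

print(f"95% quantile of chi2(10): {critical_10:.1f}")
print(f"LR = {lr:.2f}, reject at 5%: {lr > critical_lr}")
```

By (B.5), the minus log-likelihoods of the original and the transformed data differ only by the constant (N/2) ln|Λ|. That constant cancels in the difference, so the LR statistic is the same on either scale.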