Bayesian Model Estimation and Selection for
the Weekly Colombian Exchange Rate
Norberto Rodríguez Niño∗
September 2000
Abstract

This document reviews and applies recently developed techniques for Bayesian estimation and model selection in the context of time series modeling for stochastic volatility. After a literature review of Generalized Autoregressive Conditional Heteroscedastic models, Stochastic Volatility models, and the relevant results on Markov Chain Monte Carlo (MCMC) methods, an example applying such techniques is shown. The methodology is applied to a series of weekly Colombian-USA exchange rates with seven different models. The GARCH model which uses the Type-IV Pearson distribution is favored by the selection technique, Reversible Jump MCMC, over the other models, including a Stochastic Volatility model with a Student-t distribution.
∗ This work is based on the Master Report presented to The University of Texas at Austin as a requirement for my degree of Master of Science in Statistics. I would like to thank Dr. Edward George for his guidance, his advice, and his patience. However, any mistakes that remain are my own. Needless to say, I am very thankful to the Colombian Central Bank for its financial support during my studies at UT.
Contents

1 Introduction
2 Background
  2.1 Generalized Autoregressive Conditional Heteroscedastic Models
  2.2 State Space Models
  2.3 Markov Chain Monte Carlo Methods
  2.4 Model Selection
    2.4.1 Markov Chains for Super-Models
    2.4.2 Markov Chains with Jumps
3 Methodology
  3.1 The Data
  3.2 Estimation
  3.3 Model Selection
4 Results
  4.1 Estimation
  4.2 Model Selection
5 Conclusions
References
1 Introduction
Two different and competing techniques are nowadays used by econometricians and statisticians to model volatility, as in asset returns or exchange rates. One of them, Autoregressive Conditional Heteroscedastic (ARCH) models, their generalization, GARCH, and multiple extensions have proven very successful in modeling financial time series with time-varying volatility.
The competing alternative to GARCH models is Stochastic Volatility models, mainly treated in the frequentist framework, which have more "theoretical" background: they appear in the financial literature on option pricing as a generalization of the Black-Scholes model.
Usually the researcher faces the question of which model to use. Several alternatives have been proposed in the frequentist statistical framework to deal with this, ranging from R², through the traditional and extensively used Akaike Information Criterion (AIC), to the PRESS statistic and many others. In those alternatives the model residuals are obtained (usually observed minus fitted, or perhaps deviance residuals) and aggregated to form measures of adequacy.
Unfortunately, 'classical' approaches to model choice are limited. The well-known standard Neyman-Pearson theory provides an 'optimal' test through the likelihood ratio for the limited case of the comparison of two distinct models. More generally, the likelihood ratio test enables a choice between models only in the nested case, where there is an unambiguous null hypothesis. Selection is based upon an asymptotic χ² approximation, which is usually poor for small sample sizes. Frequentist theory does not offer much for model selection among non-nested models, which are not rare in practice (see Poirier, 1995, and Gelfand, 1995, for references). This document uses MCMC, an intuitive, computationally easy-to-implement, and inexpensive Bayesian alternative, to decide among suitable models.
This document is organized as follows: Section 2 presents a review of
GARCH models and Stochastic Volatility models, especially those models
to be used later in this paper; then it moves on to MCMC methods and
Bayesian model selection techniques. Section 3 deals with how to use MCMC
to implement estimation of GARCH models, as Vrontos et al. (2000) suggest,
and how to estimate SV models. Section 4 deals with the results and section
5 presents the main conclusions, some suggestions and limitations.
2 Background
This section presents a short review of some statistical and econometric models proposed in the literature to model time-varying volatility series. After
that, Bayesian techniques for model estimation and selection are briefly presented.
2.1 Generalized Autoregressive Conditional Heteroscedastic Models.
Time series models traditionally fitted in practice are supposed to have constant variance, but when working with high frequency time series that is seldom the case, as can be seen in Figure 2. Autoregressive Conditional Heteroscedastic (ARCH) models have been proposed in the literature to deal with this problem, and those in the spirit of Engle (1982) could be tried; this is

y_t = ε_t   or   y_t = c + ε_t   or   y_t = c_t + ε_t        (1)

as some model for the levels of the observed time series, with c_t some function of lags of y_t or other exogenous variables, or even an ARMA term (see Box and Jenkins, 1976); where ε_t ∼ N(0, h_t²) and h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}², for α_i ≥ 0, i = 0, 1, …, r. Hence, the conditional variances are thought of as a function of the squares of the previous observational residuals. In the original and simple case where normality is assumed, the likelihood is given by:

l(y | α₀, …, α_r, h₀²) = ∏_{t=1}^T (2π h_t²)^{−1/2} exp(−ε_t² / 2h_t²)        (2)

where, in financial return time series analysis, the most important unknown to estimate and forecast is usually the volatility, h_t².
A Generalized ARCH (GARCH) model, which usually results in more parsimonious representations, as Bollerslev (1986) proposed, assumes that the conditional variances follow an ARMA process, thus:

h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}² + Σ_{j=1}^s β_j h_{t−j}²        (3)

with the analogous likelihood, as in the ARCH case, if normality of ε_t is assumed; with restrictions α_i ≥ 0, i = 0, 1, …, r, and β_j ≥ 0, j = 1, 2, …, s, to guarantee that h_t² ≥ 0 for all t, and Σ_{i=1}^r α_i + Σ_{j=1}^s β_j < 1 in order to assure stationarity in variance. Weaker restrictions can be required in practice, though (see Nelson and Cao, 1992).
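For concreteness, the Gaussian likelihood (2) combined with the variance recursion (3) can be evaluated as in the following sketch; the parameter values and the simulator are illustrative assumptions, not the estimates obtained later in this paper.

```python
import math, random

def simulate_garch(alpha0, alpha1, beta1, T, seed=42):
    """Simulate eq. (3) with r = s = 1: h2_t = alpha0 + alpha1*eps2_{t-1} + beta1*h2_{t-1},
    eps_t ~ N(0, h2_t), started at the unconditional variance."""
    rng = random.Random(seed)
    h2 = alpha0 / (1.0 - alpha1 - beta1)
    eps = []
    for _ in range(T):
        e = rng.gauss(0.0, math.sqrt(h2))
        eps.append(e)
        h2 = alpha0 + alpha1 * e * e + beta1 * h2
    return eps

def garch_loglik(eps, alpha0, alpha1, beta1, h2_0):
    """Gaussian log-likelihood of eq. (2) with the variance recursion of eq. (3)."""
    h2, ll = h2_0, 0.0
    for e in eps:
        ll += -0.5 * (math.log(2.0 * math.pi * h2) + e * e / h2)
        h2 = alpha0 + alpha1 * e * e + beta1 * h2
    return ll

eps = simulate_garch(0.1, 0.15, 0.8, 500)
ll = garch_loglik(eps, 0.1, 0.15, 0.8, 0.1 / (1.0 - 0.15 - 0.8))
```

A routine like `garch_loglik` is the building block of the MCMC samplers used later, since every Metropolis-Hastings step requires one likelihood evaluation.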
Another extension of (3) uses a Student-t distribution with n degrees of freedom to account for the heavier tails of the distribution of the error process {ε_t}, as introduced by Bollerslev (1987) and by Baillie and Bollerslev (1989). The likelihood then is

l(y | α₀, …, α_r, β₁, …, β_s, n, h₀²) = ∏_{t=1}^T [ Γ((n+1)/2) / ( Γ(n/2) [π n h_t²]^{1/2} ) ] ( 1 + ε_t² / (n h_t²) )^{−(n+1)/2}        (4)
An Exponential GARCH (EGARCH) was introduced by Nelson (1991) to avoid imposing restrictions on the α_i, β_j in (3), with

ln(h_t²) = α₀ + Σ_{i=1}^r [ ϕ_i ( |ε_{t−i}/h_{t−i}| − E|ε_{t−i}/h_{t−i}| ) + α_i ε_{t−i}/h_{t−i} ] + Σ_{j=1}^s β_j ln(h_{t−j}²)        (5)

for the conditional variance of {ε_t}, so h_t² > 0 for all t, and E|ε_{t−i}/h_{t−i}| = Γ(2/ν) / [Γ(1/ν) Γ(3/ν)]^{1/2}.
Additionally, assuming that ε_t follows a Generalized Gaussian Distribution (GED), the likelihood is:

l(y | α₀, α_i, ϕ_i, β_j, ν, h₀²) = ∏_{t=1}^T c h_t^{−1} exp( −0.5 |ε_t / (λ h_t)|^ν )        (6)

where c = ν / ( λ 2^{(1+1/ν)} Γ(1/ν) ) and λ = [ 2^{(−2/ν)} Γ(1/ν) / Γ(3/ν) ]^{1/2}.
The use of Bernoulli mixtures of two normal distributions, proposed by Ball and Torous (1983), was successfully implemented by Vlaar and Palm (1993):

h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}² + Σ_{j=1}^s β_j h_{t−j}²

for the conditional variances. They used an MA(1) term to model the changes in levels of several exchange rates, as:

ε_t = y_t − φ₀ − λν − θ₁ ε_{t−1}        (7)

and each ε_t is distributed as a mixture of two normals, hence

ε_t ∼ (1 − λ) N(−λν, h_t²) + λ N((1 − λ)ν, h_t² + δ²)
where λ is the jump intensity, ν is the expected jump size, and δ² the expected change in variance. This representation is useful and intuitive for economies with target zones or bands for their exchange rates, like Colombia or the European Economic Union. The MA term, θ₁ ε_{t−1}, is explained as allowing for mean reversion. Therefore, the likelihood is expressed as:
l(y | φ₀, θ₁, α_i, β_j, λ, δ, ν, h₀²) = ∏_{t=1}^T (2π)^{−1/2} [ ((1 − λ)/h_t) exp( −(ε_t + λν)² / 2h_t² ) + ( λ / (h_t² + δ²)^{1/2} ) exp( −(ε_t − (1 − λ)ν)² / 2(h_t² + δ²) ) ]        (8)
More recently, Bera and Premaratne (2000) propose the use of the Pearson Type-IV family of distributions in order to model skewness and leptokurtosis that are larger than usual. For a GARCH(1,1), the log-likelihood is:

l(y|Θ) = Σ_{t=1}^T l_t(y|Θ)

l_t(y|Θ) = −ln(α₀) − 0.5 ln(h_t) − ln(C) + δ arctan( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) ) − ((r+2)/2) ln[ 1 + ( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) )² ]        (9)

with

C = ∫₀^π sin^r ψ exp{−δ ψ} dψ

and Θ = (α₀, α₁, β₁, r, δ, μ, φ₀, φ₁). They used

y_t = φ₀ + φ₁ y_{t−1} + ε_t

and

h_t² = 1 + α₁ ε_{t−1}² + β₁ h_{t−1}²
for their empirical application, but, of course, the model on y_t can be extended.
But according to Nagahara (1999), (9) should be

l_t(y|Θ) = −ln(α₀) − 0.5 ln(h_t) − ln(C) + (r+2)δ arctan( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) ) − ((r+2)/2) ln[ 1 + ( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) )² ]        (10)

with

C = exp[π(r+2)δ/2] ∫₀^π sin^r ψ exp[−δ(r+2)ψ] dψ
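The normalizing constant C has no closed form, but it is a one-dimensional integral that can be approximated numerically; the following sketch uses the trapezoidal rule, and the values of r and δ shown are arbitrary illustrations, not the estimates reported later.

```python
import math

def pearson4_const(r, delta, n=20000):
    """Trapezoidal approximation of Nagahara's normalizing constant:
    C = exp(pi*(r+2)*delta/2) * integral_0^pi sin(psi)^r * exp(-delta*(r+2)*psi) dpsi."""
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        psi = k * h
        f = math.sin(psi) ** r * math.exp(-delta * (r + 2.0) * psi)
        total += f if 0 < k < n else 0.5 * f   # half weight at the endpoints
    return math.exp(math.pi * (r + 2.0) * delta / 2.0) * h * total

# C for illustrative values r = 3.5, delta = 0.05 (assumptions, not estimates)
C = pearson4_const(3.5, 0.05)
```

A quick sanity check is available in closed form: with δ = 0 and r = 2 the integral reduces to ∫₀^π sin²ψ dψ = π/2.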
When information arrives in random order and the data refer to close-to-close periods, Hsieh (1989) showed that the use of a normal-lognormal mixture distribution improves the fit over other GARCH alternatives. That distribution is not tried here because it requires the computation of a definite integral over ℝ; one must rely on numerical integration to provide a suitable solution to that problem.
2.2 State Space Models.
A different alternative in time-series analysis is a State-Space (S-S) model; as West and Harrison (1997) state, they extend and update the seminal paper by Harrison and Stevens (1976). The model is typically represented by two equations:

Observation Equation:  y_t = F_t′ θ_t + ν_t,   ν_t ∼ N(0, V_t),        (11)

System Equation:  θ_t = G_t θ_{t−1} + w_t,   w_t ∼ N(0, W_t)        (12)

where y_t is the observed (sometimes latent) variable; θ_t is a vector of unknown parameters, which follows the first order Markov process (12); ν_t is a vector of unobserved and uncorrelated stochastic error terms; G_t is a matrix of known coefficients; and w_t is an unobserved stochastic term, generally assumed uncorrelated with ν_t.
It is worth noting that any data series for which there exists a natural ordering of observations fits into the dynamic framework, so the time series need not be equally spaced, and missing data problems can easily be handled in this context. Pole et al. (1994) present applied methodology, as well as multiple examples.
Usually, and without much loss, V_t can be considered constant, and, working in terms of the precision φ = V^{−1}, it is possible to get estimates of W_t. The likelihood for the S-S model is given below:

l(y|Θ) = ∏_{t=1}^T (2π V_t)^{−1/2} exp( −(y_t − F_t′θ_t)² / 2V_t )        (13)

which is the density of a normal with mean F_t′θ_t and variance V_t. Prior probabilities can be set up on θ₀ and a fully Bayesian analysis of the State-Space model can be run.
A non-linear (non-Gaussian) S-S model can be set up as follows (see Harvey et al., 1994, for details):

y_t = ε_t exp(h_t/2)        (14)

as the non-linear observation equation, and

h_t = γ + φ h_{t−1} + η_t,   η_t ∼ N(0, σ_η²),        (15)

as the system equation, where h_t = ln(σ_t²).
This model, which is known in the financial and econometric literature as the Stochastic Volatility (SV) model, can be transformed to get a linear observation equation, as

ln(y_t²) = h_t + ln(ε_t²)        (16)

where y_t is the mean-corrected return at time t; h_t is the log-volatility at time t, which is assumed to follow a stationary process, with h₀ drawn from the stationary distribution; ε_t and η_t are uncorrelated standard normal white noise shocks; φ is the persistence of the volatility (with |φ| < 1 as the stationarity restriction); and σ_η² is the volatility of the log-volatility.
The likelihood, assuming a normal distribution, can be obtained by using a Kalman Filter (see Jacquier et al., 1994, Ap. B.1). The use of a Student-t distribution with ν degrees of freedom on ε has been considered, and the Kalman Filter needs some minor modifications¹ (see Ruiz, 1994).

¹ E.g., the use of digamma and trigamma functions; Abramowitz and Stegun (1967) offer computational details.
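To make the SV specification concrete, the following sketch simulates (14)-(15), drawing h₀ from its stationary law, and forms the linearized observations (16); the parameter values are arbitrary illustrations, not estimates from this paper.

```python
import math, random

def simulate_sv(gamma, phi, sig_eta, T, seed=5):
    """Simulate (14)-(15): y_t = eps_t * exp(h_t/2),
    h_t = gamma + phi*h_{t-1} + eta_t, with h_0 from the stationary distribution."""
    rng = random.Random(seed)
    mu_h = gamma / (1.0 - phi)                  # stationary mean of h_t
    var_h = sig_eta ** 2 / (1.0 - phi ** 2)     # stationary variance of h_t
    h = rng.gauss(mu_h, math.sqrt(var_h))
    ys, hs = [], []
    for _ in range(T):
        ys.append(rng.gauss(0.0, 1.0) * math.exp(h / 2.0))
        hs.append(h)
        h = gamma + phi * h + rng.gauss(0.0, sig_eta)
    return ys, hs

y, h = simulate_sv(-0.2, 0.95, 0.3, 1000)
z = [math.log(yt * yt) for yt in y]             # linearized observations, eq. (16)
```

The series z is what the (quasi-likelihood) Kalman filter operates on, since ln(ε_t²) is far from normal and only its first two moments are matched.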
2.3 Markov Chain Monte Carlo Methods.
When doing fully Bayesian analysis of complex or high-dimensional models, the researcher usually faces the problem of non-conjugacy, meaning that no exact analytical posterior distribution can be obtained. This leads to the necessity of using simulation approaches. Direct simulation is often impossible, due to the complicated mathematical form of the posterior distribution in many applied models. Because of that, interest in and application of Markov Chain Monte Carlo methods (referred to by the acronym MCMC) as a tool for numerical computation of complex integrals, particularly in Bayesian analysis, have risen sharply.
The key to Markov Chain simulation is to create a Markov process whose
stationary distribution is a specified π(θ|y) and to run the simulation long
enough that the distribution of the current draws is close enough to the stationary distribution. Once the simulation algorithm has been implemented,
it should be iterated until convergence has been reached, or, if convergence is
painfully slow, the algorithm should be altered. Hence, the study of MCMC
has seen a corresponding interest in the convergence properties of the resultant chains, which may be assessed through a suite of diagnostics borrowed
from diverse areas such as time series, exploratory data analysis (EDA), and
central limit theory.
The most widely used Markov Chain Monte Carlo methods are the Gibbs Sampler and the group of Metropolis-Hastings algorithms; a good description of them can be found in Gamerman (1997) or Gelman et al. (1995), among many others.
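The core of the Metropolis-Hastings family can be sketched in a few lines; the following random-walk sampler targets a standard normal "posterior" purely for illustration — the target, scale, and iteration count are assumptions, not values used in the paper.

```python
import math, random

def metropolis(log_post, x0, scale, n_iter, seed=7):
    """Random-walk Metropolis: propose x' = x + scale*z with z ~ N(0,1),
    accept with probability min(1, pi(x')/pi(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    draws, accepted = [], 0
    for _ in range(n_iter):
        xp = x + scale * rng.gauss(0.0, 1.0)
        lpp = log_post(xp)
        if lpp >= lp or rng.random() < math.exp(lpp - lp):
            x, lp, accepted = xp, lpp, accepted + 1
        draws.append(x)
    return draws, accepted / n_iter

# target: a standard normal log-density, up to an additive constant
draws, acc = metropolis(lambda x: -0.5 * x * x, 0.0, 2.4, 20000)
```

After enough iterations the empirical distribution of `draws` approximates the target; monitoring the acceptance rate `acc` is the usual first tuning device for the proposal scale.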
2.4 Model Selection

The most damaging comment on the standard practice of choosing a single model, and then proceeding conditional on it, is that the researcher's uncertainty is understated.
– Poirier (1995, p. 605)
For frequentists the model selection problem reduces to choosing one from
a set of M models. This is usually the main aim of the analysis, and is done
according to some model selection criterion, as stated above. Bootstrap
methods have been used for model selection (See Maddala and Li, 1996, sec
5, for references). For another perspective see Poskitt and Tremayne (1983).
Essentially, two alternative approaches in the Bayesian context are presented. The first was introduced by Carlin and Chib (1995) and considers all models in a single formulation, called here a supermodel. The Markov Chain simulation scheme for this supermodel is presented below. The second approach presents sophisticated simulation techniques using Markov Chains with jumps between the different models; it is referred to as Reversible Jumping, and it was introduced by Green (1995).
It will be assumed throughout this section that y is observed and can be described according to a model M_j with parameters θ_j of dimension d_j, taking values in a parameter space Θ_j ⊂ ℝ^{d_j}, j = 1, 2, …, M. The value of M could be ∞ as, for instance, when considering countable classes of models. m serves the purpose of indicating a specific model.
Assume for the moment that the posterior distribution π(θ, j), the joint distribution of the super-parameter and the model indicator, is to be obtained. However, the main interest in inference is to obtain the posterior distributions of θ_j | m = j, j = 1, 2, …, M, and of m. These distributions respectively provide the posterior inference within each of the models and the posterior probabilities of the models. The supermodel approach provides a sample from the more general, perhaps unnecessary, joint posterior distribution, whereas the approach with jumps only provides samples from θ_j | m = j, j = 1, 2, …, M, and m. The presence of common parameters does not pose any problem here.
2.4.1 Markov Chains for Super-Models.
The joint distribution of all random quantities is given by

π(y, θ, j) = π(y|θ, j) π(θ|j) π_j        (17)

where j is the value of m and π_j = P(m = j). Given that m = j, the distribution of y depends on θ only through θ_j, or mathematically,

π(y|θ, j) = π(y|θ_j, j)        (18)

Assume also that the θ_j are conditionally independent, given the value of m. Hence,

π(θ|j) = ∏_{i=1}^M π(θ_i|j)

Note that the prior distribution π(θ_i|j), for i ≠ j, does not make much sense: it specifies the distribution of the parameters of model i, conditioned on the fact that this is not the true model. Carlin and Chib (1995) refer to these as pseudo-prior or linking distributions. Due to the conditional independence
(18), these priors do not interfere in the expressions of the marginal predictive
densities for each model. Nevertheless, they are relevant for the construction
of the chain and must be specified.
It follows from the above specification that

π(y, θ, j) = π(y|θ_j, j) ∏_{i=1}^M π(θ_i|j) π_j

which is proportional to the joint posterior distribution of θ and m. A natural blocking is formed by grouping each model's parameters and m. The full conditional distributions for θ₁, θ₂, …, θ_M and m are obtained as follows:

• For block θ_j, j = 1, …, M,

p_j(θ_j) ∝ { π(y|θ_j, j) π(θ_j|j),  for m = j
           { π(θ_j|i),              for m = i ≠ j

• For block m,

p_M(j) = k^{−1} π(y|θ_j, j) ∏_{i=1}^M π(θ_i|j) π_j,   j = 1, …, M

that is, a discrete distribution with proportionality constant

k = Σ_{l=1}^M π(y|θ_l, l) ∏_{i=1}^M π(θ_i|l) π_l
m can always be sampled directly because it has a discrete distribution.
Direct sampling from blocks θj will depend on the conjugacy structure for
model m = j and the form of the pseudo prior distribution. When direct
sampling for some of the θj ’s is not possible, Metropolis-Hastings steps may
be used.
The above scheme satisfies the conditions of a conventional Markov Chain
and therefore converges to the target distribution given by the posterior.
Comparison between models is based on the marginal posterior distribution
of m, p(j), j = 1, . . . , M. These probabilities are estimated by the proportion of values of m equal to j in the sample of size n.
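As an illustration of the scheme above, the following toy sampler chooses between two hypothetical normal models for the same data, with the pseudo-priors set equal to the (conjugate) conditional posteriors, a common practical choice; the models, priors, and data are assumptions made for the sketch, not those used later in this paper.

```python
import math, random

def carlin_chib(y, n_iter=5000, seed=3):
    """Toy supermodel sampler for M1: y_i ~ N(theta_1, 1) vs M2: y_i ~ N(theta_2, 4),
    each with a N(0, 10) prior on its mean; pseudo-priors = conditional posteriors."""
    rng = random.Random(seed)
    n, s = len(y), sum(y)
    sig2 = [1.0, 4.0]

    def post_params(v, tau2=10.0):            # conjugate posterior of theta_j
        prec = n / v + 1.0 / tau2
        return (s / v) / prec, 1.0 / prec

    def loglik(th, v):
        return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - th) ** 2 / v)
                   for yi in y)

    def lognorm(x, mu, v):
        return -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)

    pp = [post_params(sig2[0]), post_params(sig2[1])]   # pseudo-prior (mean, var)
    th = [pp[0][0], pp[1][0]]
    m, visits = 0, [0, 0]
    for _ in range(n_iter):
        for j in (0, 1):                      # update each parameter block
            mu, v = post_params(sig2[j]) if m == j else pp[j]
            th[j] = rng.gauss(mu, math.sqrt(v))
        # discrete full conditional of the model indicator m (equal prior probs)
        logw = [loglik(th[j], sig2[j]) + lognorm(th[j], 0.0, 10.0)
                + lognorm(th[1 - j], *pp[1 - j]) for j in (0, 1)]
        d = min(max(logw[1] - logw[0], -700.0), 700.0)
        m = 0 if rng.random() < 1.0 / (1.0 + math.exp(d)) else 1
        visits[m] += 1
    return visits[0] / n_iter, visits[1] / n_iter

y = [0.5 if i % 2 == 0 else -0.5 for i in range(40)]   # toy low-dispersion data
p1, p2 = carlin_chib(y)
```

For these data the unit-variance model fits far better, so the chain should spend nearly all its time at m = 1, and p1 estimates the posterior model probability.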
The pseudo-prior distributions must be carefully chosen, as they affect the rate of convergence of the chain. Carlin and Chib (1995) recommend the use of simple standard approximations based on univariate estimates obtained from pilot chains. The same authors suggest using fairly vague prior distributions, but it is well known that with this practice, on models with different dimensions, the Bayes factors turn out to be very sensitive. So, this prior setting may need further justification to satisfy potential users.

Finally, this approach is not applicable to the case of a countably infinite number of models under consideration. Hence, the number of practical and theoretical difficulties of this approach suggests it should be used with care; see Gamerman (1997), where more details can be found.
2.4.2 Markov Chains with Jumps.
Green (1995) introduced a reversible-jump MCMC strategy for generating from the joint posterior π(m, θ_m|y), based on the standard Metropolis-Hastings approach. Reversible-jump MCMC was also applied by Richardson and Green (1997), for an analysis of univariate normal mixtures; by Nobile and Green (2000), for factorial experiments using mixture modeling; and by Dellaportas and Forster (1999), for the analysis of contingency tables. During reversible-jump MCMC sampling, the constructed Markov Chain moves within and between models, so that the limiting proportion of visits to a given model is the required π(m|y).
In general, suppose that the current state of the Markov Chain at time t is (m, θ_m), where θ_m has dimension d(θ_m), and a move is proposed at time t + 1 to a new model m′ with probability j(m, m′) and corresponding parameter vector θ′_{m′}. Then, a vector u is generated from a specified proposal density q(u|θ_m, m, m′), and (θ′_{m′}, u′) = g_{m,m′}(θ_m, u) is set for a specified invertible function g_{m,m′} such that g_{m′,m} = g_{m,m′}^{−1}. Note that d(θ_m) + d(u) = d(θ′_{m′}) + d(u′). Green (1995) showed that, if the new move is accepted as the next realization of the Markov Chain with probability α = min{1, r}, where

r = [ π(y|m′, θ′_{m′}) π(θ′_{m′}|m′) π(m′) j(m′, m) q(u′|θ′_{m′}, m′, m) ] / [ π(y|m, θ_m) π(θ_m|m) π(m) j(m, m′) q(u|θ_m, m, m′) ] · |J|        (19)

with J = ∂(θ′_{m′}, u′)/∂(θ_m, u) denoting the Jacobian of the transformation, then the chain satisfies the condition of detailed balance and has the required limiting distribution π(m, θ_m|y). The condition of detailed balance requires that the equilibrium probability of moving from a state (m, θ_m) to (m′, θ′_{m′}) equals that of moving from (m′, θ′_{m′}) to (m, θ_m); for details, see Green (1995).
To implement the reversible-jump MCMC, the probabilities j(m, m′) need to be specified for every proposed move, as well as the proposal distributions q(u|θ_m, m, m′) and q(u′|θ′_{m′}, m′, m) and the function g_{m,m′}. These choices do not affect the results in terms of the models selected, but they may crucially affect the convergence rate of the Markov Chain. For the probability j(m, m′), one non-informative alternative is j(m, m′) = (M − 1)^{−1} for all m, m′, when at each state of the chain a move from one model to another is always proposed.
Vrontos et al. (2000) proposed a modification of Green's technique, which they have successfully implemented in a series of experiments with GARCH and EGARCH models; it is described as follows. First, they suggest that all the parameters of the proposed model be generated from a proposal distribution. Consequently, (θ′_{m′}, u′) = (u, θ_m) with d(θ_m) = d(u′) and d(θ′_{m′}) = d(u), q(u|θ_m, m, m′) = q(u|m′), q(u′|θ′_{m′}, m′, m) = q(u′|m), and the Jacobian in (19) is 1. In this case, the probability of acceptance of the new move as the next realization of the Markov Chain is given by α = min{1, r}, where

r = [ π(y|m′, θ′_{m′}) π(θ′_{m′}|m′) π(m′) j(m′, m) q(u′|m) ] / [ π(y|m, θ_m) π(θ_m|m) π(m) j(m, m′) q(u|m′) ]        (20)

The proposal densities q(u|m′) and q(u′|m) can be chosen by investigation of a "pilot run." They start the chain from the best available starting values (e.g., the maximum likelihood estimates when available) and simulate the "within-model" Markov Chain many times to obtain approximate marginal posterior means and covariance matrices for each model's parameter vector. These estimates are then used to construct the proposal densities q(u|m′) and q(u′|m), taken as multivariate normal densities.
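The modified acceptance ratio (20) can be sketched on a pair of toy models of differing dimension; the models, priors, and pilot-based proposals below are assumptions invented for the illustration (within-model refreshment moves are omitted), not the seven models compared later.

```python
import math, random

def rj_mcmc(y, n_iter=5000, seed=11):
    """Vrontos-style jump sketch between M1: y_i ~ N(mu, 1) (one parameter)
    and M2: y_i ~ N(mu, v), v free (two parameters). The whole parameter
    vector of the proposed model is drawn from an independence proposal
    fitted to pilot moment estimates, so the Jacobian in (19) is 1 and the
    acceptance ratio reduces to (20)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n

    def ln_norm(x, mu, v):
        return -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)

    def log_post(model, th):                  # log prior + log likelihood
        if model == 1:
            mu, v, lp = th[0], 1.0, ln_norm(th[0], 0.0, 100.0)
        else:
            mu, v = th[0], math.exp(th[1])    # th[1] is log variance
            lp = ln_norm(th[0], 0.0, 100.0) + ln_norm(th[1], 0.0, 100.0)
        return lp + sum(ln_norm(yi, mu, v) for yi in y)

    # independence proposals q(.|m) built from "pilot" moment estimates
    prop = {1: [(ybar, 4.0 * s2 / n)],
            2: [(ybar, 4.0 * s2 / n), (math.log(s2), 8.0 / n)]}

    def q_draw(model):
        th = [rng.gauss(mu, math.sqrt(v)) for mu, v in prop[model]]
        lq = sum(ln_norm(t, mu, v) for t, (mu, v) in zip(th, prop[model]))
        return th, lq

    model, th = 1, [ybar]
    lq_cur = ln_norm(ybar, *prop[1][0])
    visits = {1: 0, 2: 0}
    for _ in range(n_iter):
        new = 2 if model == 1 else 1          # j(m, m') = 1: propose the other model
        th_new, lq_new = q_draw(new)
        lr = (log_post(new, th_new) + lq_cur) - (log_post(model, th) + lq_new)
        if lr >= 0.0 or rng.random() < math.exp(lr):
            model, th, lq_cur = new, th_new, lq_new
        visits[model] += 1
    return visits[1] / n_iter, visits[2] / n_iter

y = [0.3, -0.2, 0.5, -0.4, 0.1, 0.0, -0.3, 0.2] * 5   # toy data, variance far below 1
p1, p2 = rj_mcmc(y)
```

The returned proportions of visits estimate π(m|y); for these low-variance data the free-variance model should dominate.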
3 Methodology
3.1 The Data.
In order to illustrate the estimation and selection methodologies, weekly observations of the US/Colombian (spot) exchange rate are used. Let er_t represent the Friday exchange rates, running from October 21, 1991, through December 29, 1999; that makes T = 428 observations. The daily exchange rate corresponds to the weighted average of trading, selling, and buying, of U.S. dollars in the open market. In the event of the market being closed on a Friday, the observation on the previous Thursday was used.

Figure 1 shows the raw data; from this, and with the help of a unit root test (see Enders, 1995), it is easy to conclude that the exchange rate has a unit root, so, as usual, one works with the first difference of the natural logarithm of the exchange rates, the returns, also known as continuously compounded rates of return (r_t = ln(er_t) − ln(er_{t−1}) for t = 1, …, T), whose representation is shown in Fig. 2.
[Figure 1: Weekly Colombian Exchange Rate.]
From Figure 3, which shows the histogram of returns, it is noteworthy that the skewness and kurtosis coefficients for r_t, which are 0.8581 and 5.6872, respectively, are both significantly positive and much larger than common, thus showing asymmetry and leptokurtosis. As pointed out by Vlaar and
[Figure 2: Weekly Return from Colombian Exchange Rate.]
Palm (1993), the skewness could be the result of the asymmetry in the movements of the parity adjustments, and a high kurtosis could result from a time-varying variance. These two results lead to considering distributions different from the normal, whose use is unlikely to yield appropriate results.

Also, it is clear that the variance is not constant at all, which can be seen from the Ljung-Box autocorrelation statistic of the squared returns: Q²(12) = 28.32 and Q²(24) = 44.86 for lags 12 and 24, respectively.
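The descriptive statistics just quoted can be reproduced, for any price series, along the following lines; this is a sketch with illustrative helper functions, not the code used for the paper's figures.

```python
import math

def log_returns(prices):
    """Continuously compounded returns r_t = ln(er_t) - ln(er_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def moments(x):
    """Sample skewness and kurtosis coefficients."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x) / n
    skew = sum((xi - m) ** 3 for xi in x) / n / s2 ** 1.5
    kurt = sum((xi - m) ** 4 for xi in x) / n / s2 ** 2
    return skew, kurt

def ljung_box(x, max_lag):
    """Ljung-Box Q statistic; applied to squared returns it points to ARCH effects."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((xi - m) ** 2 for xi in x) / n
    q = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q
```

For the series studied here one would compute, e.g., `ljung_box([r * r for r in log_returns(er)], 12)` and compare it with a χ² critical value.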
3.2 Estimation.

Based on the theoretical justification of several models, and taking into account the particularities of the Colombian economy, seven different models were proposed for estimation and selection among them; they are described below.

With the aim of implementing a fully Bayesian analysis, MCMC was used to estimate the models²; many of them are estimated for the first time from the Bayesian point of view, which is a new contribution of this work.

First, an ARMA(1,1) model with normal disturbances was fitted; this model could be tried in practice when the researcher ignores the fact that the variance is not constant. In this report, such a model is also used as the reference model, against which the others are compared. Non-informative Beta(1,1) priors from −1 to 1³ were chosen for φ₁ and θ₁. A N(0,5) prior was used for φ₀; finally, a nearly non-informative but proper Inverse-Gamma(2.001,0.001) (so, expected value and variance equal to 1/1000) was used for σ².⁴

[Figure 3: Histogram of return.]

Second, an ARMA(1,1) for the levels of the returns plus a GARCH(1,1) for the variance, so that ε_t in (1) comes from an ARMA(1,1), and V(ε_t) = h_t² from (3) with r = 1 and s = 1; still assuming normality, the likelihood is given by (2). Priors: non-informative U(−1,1) for φ₁ and θ₁; 1/α₀ for α₀; a N(0,5) for φ₀; and U(0,1) for α₁ and β₁; in order to assure stationarity in (3), any proposal that did not satisfy α₁ + β₁ < 1 was rejected. From initial runs h₀² turned out to be statistically equal to zero, thus it was set to α₀/(1 − α₁ − β₁), as Nelson and Cao (1992) propose, for computational convenience.

Third, an ARMA(1,1) plus a GARCH(1,1) with a Student-t distribution; proposals with α₁ + β₁ ≥ 1 in (3) are rejected for stationarity purposes; n > 2 in (4); h₀² was set as before. Priors: 1/α₀² for α₀; 1/(n − 2)² on n; and α₁, β₁ as in the second model. φ₀, φ₁, and θ₁ were treated as in the first model.

Fourth, an ARMA(1,1) plus an EGARCH(1,1), (5) instead of (3), with a GED distribution; therefore, the likelihood is given by (6). Priors: φ₀, φ₁, and θ₁ as above; a N(0,1) was used as prior for α₀; N(0,5) for α₁; for β₁ a U(−1,1); ϕ₁ is assumed identically equal to zero; and U(0,80) for ν. Any proposal with |β₁| > 1 was rejected for stationarity purposes.

² Maximum Likelihood Estimation (MLE) was tried, but problems getting precise estimates of the variance-covariance matrix for some models restricted its use.
³ In the sense that if X ∼ Beta(1,1) then Y = 2X − 1 ∼ Beta(1,1) from −1 to 1. See Jacquier et al. (1999).
⁴ 1/σ² as a prior for σ² was tried too, with similar results.
Fifth, an ARMA(1,1) plus a GARCH(1,1) with a mixture of two normals as the error term distribution, using (8) as the likelihood. Priors: φ₀, φ₁, and θ₁ as before; 1/α₀² for α₀; for α₁ and β₁, U(0,1); N(0,10) for ν; 1/(δ²)² for δ²; finally, U(0,1) for λ.

Sixth, an ARMA(1,1) plus a GARCH(1,1) with the Type-IV Pearson distribution; hence, the likelihood is computed by using (10). Priors: an Inverse-Gamma(2.001,0.0001) was used for α₀; U(0,1) for α₁ and β₁; (r − 2) following an Inverse-Gamma(2.001,0.0001); N(0,1) for δ; and N(0,5) for µ. The distributions used previously for φ₀, φ₁, and θ₁ were used here too. The restriction r > 3 is imposed, so the first four moments of the Type-IV Pearson distribution exist. Again, any proposal with α₁ + β₁ ≥ 1 is rejected. Proposals with α₀ < 0.0001 were rejected for computational reasons.

Seventh, and finally, a Stochastic Volatility model with a Student-t distribution on the error term, expressed in (16) and (15). A non-informative U(−1,1) prior for φ, a N(0,5) for γ, and an Inverse-Gamma(2.001,0.0001) for σ_η² were used; for ν, a U(5,80) was used, as Jacquier et al. (1999) did, to assure the Student-t has at least its first four moments.
Although the use of single versus multiple (parallel) chains in MCMC is an open discussion, single chains for each model are used in this work because of time and computing resource limitations. For a recent discussion of this dilemma, the reader is referred to Mengersen et al. (1999), which contains points in favor of each alternative.
Raftery and Lewis' (1996) strategy was implemented here, with the constant c selected in such a way that the proportion of proposals accepted was between 20 and 50%, as has become common practice. The update is element-by-element and in random order. However, when high correlations between parameters in conjunction with slow convergence were found, a blocking update was implemented to improve convergence.

In every case, a final chain of 80,000 was run, and then steps of 50 to 300 were taken to avoid large autocorrelations in the chains. In that way, first-order autocorrelations no larger than 0.55 were guaranteed.
Convergence of each chain is assessed by applying Geweke's (1992) criterion, whose null hypothesis is that stationarity has been reached and whose test statistic is supposed to follow a standard normal distribution under H₀. This test is implemented by using CODA (see Best et al., 1997).
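The idea behind Geweke's diagnostic can be sketched as follows; for simplicity the variance of each segment mean is estimated here by batch means, a stand-in for the spectral-density-at-zero estimator that CODA uses, so the numbers are only indicative.

```python
import math, random

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke-style z score: compare the mean of the first 10% of the chain
    with the mean of the last 50%; each sample-mean variance is estimated
    by batch means (10 batches per segment)."""
    def mean_and_se2(x, n_batches=10):
        b = len(x) // n_batches
        bm = [sum(x[i * b:(i + 1) * b]) / b for i in range(n_batches)]
        m = sum(bm) / n_batches
        return m, sum((v - m) ** 2 for v in bm) / (n_batches - 1) / n_batches
    m1, v1 = mean_and_se2(chain[: int(first * len(chain))])
    m2, v2 = mean_and_se2(chain[int((1.0 - last) * len(chain)):])
    return (m1 - m2) / math.sqrt(v1 + v2)

rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
```

Under stationarity the score behaves roughly like a standard normal; a drifting chain (for example, one that is still converging) produces a large |z|.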
The mean vector and variance-covariance matrix of each chain are to be obtained in order to feed the RJMCMC, as explained later.
3.3 Model Selection.
The model selection exercise consists of applying the Reversible Jump MCMC algorithm to obtain the posterior model probabilities, running 200,000 iterations and recording the proportion of each 2,000 in which model m (m = 1, 2, …, 7) is selected. For checking stability, visual analysis is used. Although there are some fresh results on assessing convergence in RJMCMC, their value is not yet well known, as mentioned by Brooks and Giudici (1999).
For all seven models, the same priors mentioned in Section 3.2 are used. The proposal densities q(u|m′) and q(u′|m) for each parameter vector were constructed by using the MCMC output of the separate model runs described above. These densities are taken as multivariate normals, with mean vector and covariance matrix equal to the corresponding sample mean vector and covariance matrix of the parameters in each model.
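Drawing from such a pilot-based multivariate normal proposal reduces to a Cholesky factorization of the sample covariance matrix; the following two-parameter sketch uses hypothetical pilot-run moments, not the values estimated for the paper's models.

```python
import math, random

def chol2(S):
    """Cholesky factor L (lower triangular) of a 2x2 covariance matrix S = L L'."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def mvn_proposal(mean, S, rng):
    """One draw u ~ N(mean, S): u = mean + L z with z standard normal."""
    L = chol2(S)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mean[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

# hypothetical pilot-run output for a two-parameter block
mean = [0.45, 0.40]
S = [[0.010, 0.004], [0.004, 0.020]]
rng = random.Random(1)
draws = [mvn_proposal(mean, S, rng) for _ in range(20000)]
```

The same factorization also yields the proposal density needed in (20), since log q(u|m′) is a quadratic form in L⁻¹(u − mean).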
4 Results

4.1 Estimation.
It took between 12 hours 24 minutes and 56 hours 48 minutes, from the fastest to the slowest model, to run all the iterations, using a computer with a Pentium I 233 MHz processor and 64 MB of RAM, running under WINDOWS-98 Second Edition. Times reduce to one third using a Pentium III 700 MHz processor with the same software and 192 MB of RAM.

In the following presentation the return rates are expressed on a 0-100 scale, which was used for computational and presentational reasons; otherwise, the models which use a GARCH component get stuck, it seems because values of α₀ go so close to zero that the algorithm gets overwhelmed.
Table 1: Geweke’s Convergence z Scores for the Seven Models.
[Table 1 reports, for each of the seven models, the Geweke z score of every
parameter: φ0, φ1, θ1, σ² for the ARMA(1,1):Normal model; φ0, φ1, θ1, α0,
α1, β1 plus the distribution-specific parameters (ν; λ, ν, δ²; r, δ, µ) for the
GARCH-type models; and φ, γ, ν, σ²η for the Stochastic Volatility model.
Most scores lie within ±2; the GARCH(1,1):MixN model shows several
parameters with |z| > 2.]
The Geweke convergence z scores for the seven models are presented in
Table 1; looking at the numbers, it is clear that convergence has been reached
for almost every parameter in all models.
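For reference, the Geweke diagnostic compares the mean of an early segment of the chain with the mean of a late segment. The sketch below uses the plain sample variance for brevity; the paper's version instead uses spectral-density-based standard errors (footnote 5), so the numbers would differ slightly:

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke convergence z score: compare the mean of the first 10% of
    the chain with the mean of the last 50%.  Naive sample variances are
    used here in place of spectral-density estimates."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    a = x[: int(first * n)]          # early segment
    b = x[int((1 - last) * n):]      # late segment
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a)
                                           + b.var(ddof=1) / len(b))
```

Values of |z| beyond roughly 2 suggest the two segments have not converged to the same distribution.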
The estimation results are presented in Table 2, with standard errors in
parentheses. Although parameter transformations were tried for some
models, convergence was not improved, hence their use was discarded. Regarding
Table 2, apart from the non-inclusion of some parameters in most of the
models, no additional work on excluding non-significant parameters was
attempted, both because of time limitations and because improving any one
specific model is not the main purpose of this work.
Table 2: Estimation Results of the Seven Competing Models.

Model One, ARMA(1,1):Normal:
  φ0 = 0.269 (0.022), φ1 = -0.107 (0.010), θ1 = 0.112 (6.6e-3),
  σ² = 0.103 (3.1e-3)

Model Two, ARMA(1,1)+GARCH(1,1):N:
  φ0 = 0.295 (0.0041), φ1 = -0.586 (0.0179), θ1 = 0.564 (0.016),
  α0 = 0.203 (0.0024), α1 = 0.490 (0.004), β1 = 0.398 (0.004)

Model Three, ARMA(1,1)+GARCH(1,1):t:
  φ0 = 0.153 (0.005), φ1 = 0.151 (0.024), θ1 = -0.075 (0.021),
  α0 = 0.119 (0.008), α1 = 0.623 (0.01), β1 = 0.525 (0.006),
  ν = 3.20 (0.06)

Model Four, ARMA(1,1)+EGARCH(1,1):GED:
  φ0 = 0.045 (0.003), φ1 = 0.578 (0.032), θ1 = -0.441 (0.029),
  α0 = -0.484 (0.009), α1 = 0.419 (0.007), β1 = 0.431 (0.009),
  ν = 0.864 (0.005)

Model Five, ARMA(1,1)+GARCH(1,1):MixN:
  φ0 = 0.256 (0.013), φ1 = -0.022 (0.052), θ1 = 0.065 (0.044),
  α0 = 0.047 (0.009), α1 = 0.321 (0.011), β1 = 0.475 (0.017),
  λ = 0.247 (0.015), ν = 0.525 (0.046), δ² = 2.33 (0.182)

Model Six, ARMA(1,1)+GARCH(1,1):T-IV Pearson:
  φ0 = -0.079 (0.009), φ1 = -0.0627 (0.015), θ1 = 1.9e-3 (0.013),
  α0 = 0.111 (3.8e-4), α1 = 2.9e-4 (1.1e-5), β1 = 0.999 (1.8e-5),
  r = 3.17 (8.1e-3), δ = 0.063 (4.3e-3), µ = -6.1e-3 (7.4e-4)

Model Seven, STOCHASTIC VOL.:t:
  φ = -0.196 (0.023), γ = 0.223 (0.109), ν = 60.70 (0.53),
  σ²η = 0.007 (0.0016)
5. Such standard errors refer to the time-series estimates, which are asymptotic: the square root of the spectral density estimate divided by the sample size.
6. Like the logarithm, or φ′ = ln((1 + φ)/(1 − φ)) when |φ| < 1, hence φ′ ∈ ℝ.
Fig. 4 presents the resulting chain diagrams and the histograms of
the posterior sample of the parameters of the ARMA(1,1) model. The shapes
of the posterior distributions of the φ0 and σ² parameters indicate asymmetry,
hence deviation from normality. Fig. 5 does the same for the GARCH-N model,
Fig. 6 for the GARCH-t model, Fig. 7 for the EGARCH-GED model, Fig. 8
for the GARCH-MixN model, Fig. 9 for the GARCH Type-IV Pearson model,
and Fig. 10 for the SV-t model.
4.2 Model Selection.
The processing time for 80,000 iterations, using the same computer as for
estimation, was 16 hours and 23 minutes for all seven models, a total of
47 parameters whose output uses over 32 MB of disk space. Note that, according
to Fig. 11, this is a conservative run length, and less than one fourth of the
run could be sufficient to achieve the same posterior distributions.
Table 3: Posterior Probabilities and Bayes Factors of the Seven Competing Models.

Model                   Distribution    Posterior Prob.  Bayes Factor
ARMA(1,1)               Normal          0.00001          1.000
ARMA(1,1)+GARCH(1,1)    Normal          0.00001          1.000
ARMA(1,1)+GARCH(1,1)    t               0.00003          3.000
ARMA(1,1)+EGARCH(1,1)   GED             0.00001          1.000
ARMA(1,1)+GARCH(1,1)    Mixt. Normal    0.00005          5.000
ARMA(1,1)+GARCH(1,1)    T.-IV Pearson   0.99990          199989.000
STOCHASTIC VOL          t               0.00001          1.000
The RJMCMC results are displayed in Table 3, which shows the posterior
probabilities and Bayes factors for the seven models. The last column gives
the relative weight against the worst models, ARMA(1,1), SV, and
ARMA+GARCH. According to these results it is very clear that model six,
ARMA(1,1)+GARCH(1,1) with the Type-IV Pearson distribution, outperforms
the rest of them, with posterior probability 0.9999.
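The posterior probabilities and Bayes factors of Table 3 follow directly from the visit counts of the RJMCMC chain. A sketch, assuming equal prior model probabilities (so that posterior odds equal Bayes factors) and a chain stored as a vector of model indicators:

```python
import numpy as np

def posterior_probs_and_bf(model_chain, n_models=7):
    """Posterior model probabilities as RJMCMC visit frequencies, and
    Bayes factors of each model against the least-visited model, the
    convention used for the last column of Table 3.  Equal prior model
    probabilities are assumed."""
    counts = np.bincount(np.asarray(model_chain), minlength=n_models)
    probs = counts / counts.sum()
    worst = probs[probs > 0].min()   # weakest model actually visited
    return probs, probs / worst
```
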
7. In this graph, as well as in the rest of this chapter, the following convention is used: T_1
refers to φ0, T_2 to φ1, T_3 to θ1, and T_4 to σ². Hence, for example, T_9 in Fig. 8
denotes δ².
Figure 4: Convergence Diagrams and Histograms of the Posterior Sample for ARMA(1,1) Model
Figure 11 shows the convergence behavior of the chain. That figure illustrates
the probability of each of the seven models across the sweeps, calculated
ergodically every 2,000 iterations. Note that the only model that is visited
very often in the Reversible Jump MCMC algorithm is model six.
The next exercise consists of rerunning the chain, this time with the six
less probable models, that is to say, all but model six. Analogous results are
obtained, the outperforming model this time being model five, ARMA(1,1)+
GARCH(1,1) with the Normal Mixture distribution. Neither figure nor table
is presented for this case.
Figure 5: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-Normal Model
Additionally, a chain of the same length was run with only the five least
probable models. This time the favored model is ARMA(1,1)+GARCH(1,1)
with the Student-t distribution.
Finally, an exercise with models one, two, four, and seven was run. Figure
12 and Table 4 show the results with only the four most improbable
models. In this case the transition probabilities, j(m, m′), were taken as
inversely proportional to the number of parameters in each model. This
Figure 6: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-t Model
exercise provides evidence that model four is a posteriori the fourth most
probable, after models six, five, and three, in that order.
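The jump probabilities described above can be laid out as a matrix. The sketch below is an assumption-laden illustration: the parameter counts passed in the test are only examples, and it is assumed the chain never proposes staying in the current model:

```python
import numpy as np

def jump_probs(n_params):
    """Between-model proposal probabilities j(m, m') taken inversely
    proportional to the number of parameters of each candidate model m',
    with no probability of proposing the current model m itself."""
    n_params = np.asarray(n_params, dtype=float)
    M = len(n_params)
    j = np.zeros((M, M))
    for m in range(M):
        w = 1.0 / n_params     # smaller models are proposed more often
        w[m] = 0.0             # never propose the current model
        j[m] = w / w.sum()     # normalize each row to sum to one
    return j
```

Each row m of the returned matrix is the proposal distribution over destination models when the chain currently sits in model m.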
Table 4: Posterior Probabilities and Bayes Factors of Four of the Seven
Competing Models.

Model                   Distribution    Posterior Prob.  Bayes Factor
ARMA(1,1)               Normal          0.0097           3.070
ARMA(1,1)+GARCH(1,1)    Normal          0.1796           56.754
ARMA(1,1)+EGARCH(1,1)   GED             0.8075           255.132
STOCHASTIC VOL          t               0.0032           1.000
Figure 7: Convergence Diagrams and Histograms of the Posterior Sample for EGARCH-GED Model
Figure 8: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-MixN Model
Figure 9: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-T4P Model
Figure 10: Convergence Diagrams and Histograms of the Posterior Sample for SV-t Model
Figure 11: Convergence Behavior of the Seven Models
Figure 12: Convergence Behavior of Four of the Seven Models
5 Conclusions.
In this report the important issue of model estimation and model selection for
time-varying volatility models was addressed, using a Bayesian approach and
MCMC methods; this offers advantages over competing alternatives.
The two most important approaches to time-varying volatility were considered
(SV and GARCH). From the results, the GARCH model with Bera and
Premaratne's Type-IV Pearson distribution was widely favored; after it, the
Normal Mixture model is selected as the best. However, it is clear that the
assumptions of each of the favored models should still be checked.
It should be clear that all the model-choice results are conditional on
the seven models initially selected; if other models were included, or some
excluded, the results could change. Comments similar to those by George
(1999) on Hoeting et al. (1999) apply here. In practice there will always be
models left out, such as high-order GARCH models or SV with many alternative
distributions; unfortunately, the option of including many other models at
the same time is severely limited by computational resources.
The robustness of the RJMCMC to different priors could be tested, but
the much greater computational time required discourages this practice.
When the main purpose of the model selection exercise is forecasting,
Bayesian model averaging can easily be implemented once the RJMCMC
has been run and its results saved (see Hoeting et al., 1999, or Clyde,
1999, and the specific GARCH and EGARCH case in Vrontos et al., 2000).
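A point-forecast version of such model averaging is a one-liner once the model-indicator chain and per-model forecasts are saved. The sketch below is illustrative; the names and the use of posterior visit frequencies as weights are assumptions consistent with the RJMCMC output described above:

```python
import numpy as np

def bma_forecast(model_chain, forecasts):
    """Bayesian-model-averaged point forecast from saved RJMCMC output:
    each model's forecast is weighted by its posterior probability,
    estimated as the proportion of sweeps spent in that model."""
    counts = np.bincount(np.asarray(model_chain), minlength=len(forecasts))
    weights = counts / counts.sum()
    return float(np.dot(weights, forecasts))
```
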
A fruitful avenue for future research would be the parsimonious incorporation
of these features into multivariate models of stochastic volatility; see
Jaquier et al. (1999).
As for the specific case of the Colombian exchange rate, the effect of
exogenous shocks should be modeled with dummy variables, as Copeland
and Wang (1994) did; such a task could be the topic of forthcoming work to
be reported elsewhere.
Finally, and no less important, more work by specialists is required on
computational algorithms and the numerical behavior faced when working
with values extremely near zero, as described at the beginning of Section 4.1.
It seems wise to end with a quote by G. E. P. Box, referenced by Poirier
(1995, p. xi):
All models are wrong but some are useful.
References
Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. New York: Dover
Publications, Inc.
Baillie, R. T. and Bollerslev, T. (1989). The Message in Daily Exchange
Rates: A Conditional-Variance Tale. Journal of Business and Economic
Statistics, 7 (3), 297—305.
Ball, C. and Torous, W. (1983). A Simplified Jump Process for Common
Stock Returns. Journal of Financial and Quantitative Analysis, 18 (1), 53—
65.
Bera, A. and Premaratne, G. (2000). Modeling Asymmetry and Excess Kurtosis in Stock Return Data. Technical report, Department of Economics,
University of Illinois U. C., Urbana-Champaign, IL.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroscedasticity. Journal of Econometrics, 31, 307—327.
Bollerslev, T. (1987). A Conditionally Heteroskedastic Time Series Model
for Speculative Prices and Rates of Return. The Review of Economics and
Statistics, 69, 542—547.
Brooks, S. P. and Guidici, P. (1999). Convergence Assessment for Reversible
Jump MCMC Simulation. Bayesian Statistics 6. (Bernardo, J. M., Berger,
J. O., Dawid, A.P. and Smith, A.F.M., eds.). Oxford University Press, 733—
742.
Carlin, B. P. and Chib, S. (1995). Bayesian Model Choice via Markov Chain
Monte Carlo Methods. Journal of the Royal Statistical Society (Ser B),
57 (3), 473—484.
Copeland, L. S. and Wang, P. (1994). Estimating Daily Seasonality in Foreign Exchange Rate Changes. Journal of Forecasting, 13 (6), 519—528.
Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica,
50 (4), 987—1007.
Gamerman, D. (1997). Markov Chain Monte Carlo: Stochastic Simulation
for Bayesian Inference. New York: Chapman-Hall.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian
Data Analysis. New York: Chapman-Hall/CRC.
Green, P. J. (1995). Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika, 82 (4), 711—732.
Harrison, P. J. and Stevens, C. F. (1976). Bayesian Forecasting (with discussion). Journal of the Royal Statistical Society (Ser B), 38 (1), 205—247.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian
Model Averaging: A Tutorial (with comments). Statistical Science,
14 (4), 382—417.
Hsieh, D. A. (1989). Modeling Heteroscedasticity in Daily Foreign-Exchange
Rates. Journal of Business and Economic Statistics, 7 (3), 307—317.
Jaquier, E., Polson, N., and Rossi, P. (1999). Stochastic Volatility: Univariate and Multivariate Extensions. Technical report, Graduate School of
Business, The University of Chicago, Chicago, IL.
Mengersen, K. L., Robert, C. P., and Guihenneuc-Jouyaux, C. (1999). MCMC Convergence Diagnostics: A Review. Bayesian Statistics 6. (Bernardo, J.
M., Berger, J. O., Dawid, A.P. and Smith, A.F.M., eds.). Oxford University
Press, 415—440.
Nagahara, Y. (1999). The PDF and CDF of Pearson Type IV Distribution
and the ML Estimation of the Parameters. Statistics and Probability Letters,
43, 251—264.
Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A
New approach. Econometrica, 59 (2), 347—370.
Nelson, D. B. and Cao, C. Q. (1992). Inequality Constraints in the Univariate GARCH Model. Journal of Business and Economic Statistics, 10 (2),
229—235.
Poirier, D. J. (1995). Intermediate Statistics and Econometrics: A Comparative Approach. Cambridge, MA: The MIT Press.
Pole, A., West, M., and Harrison, J. (1994). Applied Bayesian Forecasting
and Time Series Analysis. New York: Chapman-Hall.
Poskitt, D. S. and Tremayne, A. R. (1983). On the Posterior Odds of Time
Series Models. Biometrika, 70, 157—162.
Vlaar, P. J. G. and Palm, F. C. (1993). The Message in Weekly Exchange
Rates in the European Monetary System: Mean Reversion, Conditional Heteroscedasticity, and Jumps. Journal of Business and Economic Statistics,
11 (3), 351—360.
Vrontos, I. D., Dellaportas, P., and Politis, D. N. (2000). Full Bayesian Inference for GARCH and EGARCH Models. Journal of Business and Economic
Statistics, 18 (2), 187—198.
West, M. and Harrison, P. J. (1997). Bayesian Forecasting and Dynamic
Models. Second Ed. New York: Springer-Verlag.