Bayesian Model Estimation and Selection for
the Weekly Colombian Exchange Rate
Norberto Rodríguez Niño∗
September 2000
Abstract

This document reviews and applies recently developed techniques for Bayesian estimation and model selection in the context of time series modeling for stochastic volatility. After a literature review of Generalized Autoregressive Conditional Heteroscedastic models, Stochastic Volatility models, and the relevant results on Markov Chain Monte Carlo (MCMC) methods, an example applying such techniques is shown. The methodology is applied to a series of weekly Colombian-USA exchange rates with seven different models. The GARCH model which uses the Type-IV Pearson distribution is favored by the selection technique, Reversible Jump MCMC, over the other models, including a Stochastic Volatility model with a Student-t distribution.
∗ This work is based on the Master Report presented to The University of Texas at Austin as a requirement for my degree of Master of Science in Statistics. I would like to thank Dr. Edward George for his guidance, his advice, and his patience. However, any mistakes that remain are my own. Needless to say, I am very thankful to the Colombian Central Bank for its financial support during my studies at UT.
Contents

1 Introduction
2 Background
  2.1 Generalized Autoregressive Conditional Heteroscedastic Models
  2.2 State Space Models
  2.3 Markov Chain Monte Carlo Methods
  2.4 Model Selection
    2.4.1 Markov Chains for Super-Models
    2.4.2 Markov Chains with Jumps
3 Methodology
  3.1 The Data
  3.2 Estimation
  3.3 Model Selection
4 Results
  4.1 Estimation
  4.2 Model Selection
5 Conclusions
References
1 Introduction
Two different and competing techniques are nowadays used by econometricians and statisticians to model volatility, as in asset returns or exchange rates. One of them, Autoregressive Conditional Heteroscedastic (ARCH) models, their generalization, GARCH, and multiple extensions have proven very successful in modeling financial time series with time-varying volatility.
The competing alternative to GARCH models is Stochastic Volatility models, mainly treated in the frequentist framework, which have more "theoretical" background: they appear in the financial literature on option pricing as a generalization of the Black-Scholes model.
Usually the researcher faces the question of which model to use. Several alternatives have been proposed in the frequentist statistical framework to deal with this, ranging from R², through the traditional and extensively used Akaike Information Criterion (AIC), to the PRESS statistic and many others. In those alternatives the model residuals are obtained (usually observed minus fitted, or perhaps deviance residuals) and aggregated to form measures of adequacy.
Unfortunately, 'classical' approaches to model choice are limited. The well-known standard Neyman-Pearson theory provides an 'optimal' test through the likelihood ratio for the limited case of the comparison of two distinct models. More generally, the likelihood ratio test enables a choice between models only in the nested case, where there is an unambiguous null hypothesis. Selection is based upon an asymptotic χ² approximation, which is usually poor for small sample sizes. Frequentist theory does not offer much for model selection among non-nested models, which are not rare in practice (see Poirier, 1995, and Gelfand, 1995, for references). This document uses MCMC, an intuitive, computationally easy-to-implement, and inexpensive Bayesian alternative, to decide among suitable models.
This document is organized as follows: Section 2 presents a review of
GARCH models and Stochastic Volatility models, especially those models
to be used later in this paper; then it moves on to MCMC methods and
Bayesian model selection techniques. Section 3 deals with how to use MCMC
to implement estimation of GARCH models, as Vrontos et al. (2000) suggest,
and how to estimate SV models. Section 4 deals with the results and section
5 presents the main conclusions, some suggestions and limitations.
2 Background
This section presents a short review of some statistical and econometric models proposed in the literature to model time-varying volatility series. After
that, Bayesian techniques for model estimation and selection are briefly presented.
2.1 Generalized Autoregressive Conditional Heteroscedastic Models.
Time series models traditionally fitted in practice are supposed to have constant variance, but when working with high frequency time series that is seldom the case, as can be seen in Figure 2. Autoregressive Conditional Heteroscedastic (ARCH) models have been proposed in the literature to deal with this problem, and those in the spirit of Engle (1982) could be tried; this is

y_t = ε_t   or   y_t = c + ε_t   or   y_t = c_t + ε_t        (1)

as some model for the levels of the observed time series, with c_t some function of lags of y_t or other exogenous variables, or even an ARMA term (see Box and Jenkins, 1976); where ε_t ∼ N(0, h_t²) and h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}², for α_i ≥ 0, i = 0, 1, …, r. Hence, the conditional variances are thought of as a function of the squares of the previous observational residuals. In the original and simple case where normality is assumed, the likelihood is given by:

l(y | α₀, …, α_r, h₀²) = ∏_{t=1}^T (2π h_t²)^{−1/2} exp(−ε_t² / 2h_t²)        (2)

where, in financial return time series analysis, the most important unknown to estimate and forecast is usually the volatility, h_t².
A Generalized ARCH (GARCH) model, which usually results in more parsimonious representations, as Bollerslev (1986) proposed, assumes that the conditional variances follow an ARMA process, thus:

h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}² + Σ_{j=1}^s β_j h_{t−j}²        (3)

with the analogous likelihood, as in the ARCH case, if normality of ε_t is assumed; with restrictions α_i ≥ 0, i = 0, 1, …, r, and β_j ≥ 0, j = 1, 2, …, s, to guarantee that h_t² ≥ 0 for all t, and Σ_{i=1}^r α_i + Σ_{j=1}^s β_j < 1 in order to assure stationarity in variance. Weaker restrictions can be required in practice, though (see Nelson and Cao, 1992).
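For concreteness, the Gaussian likelihood (2) combined with the variance recursion (3) can be evaluated as in the following sketch; the parameter values and the simulator are illustrative assumptions, not the estimates obtained later in this paper.

```python
import math, random

def simulate_garch(alpha0, alpha1, beta1, T, seed=42):
    """Simulate eq. (3) with r = s = 1: h2_t = alpha0 + alpha1*eps2_{t-1} + beta1*h2_{t-1},
    eps_t ~ N(0, h2_t), started at the unconditional variance."""
    rng = random.Random(seed)
    h2 = alpha0 / (1.0 - alpha1 - beta1)
    eps = []
    for _ in range(T):
        e = rng.gauss(0.0, math.sqrt(h2))
        eps.append(e)
        h2 = alpha0 + alpha1 * e * e + beta1 * h2
    return eps

def garch_loglik(eps, alpha0, alpha1, beta1, h2_0):
    """Gaussian log-likelihood of eq. (2) with the variance recursion of eq. (3)."""
    h2, ll = h2_0, 0.0
    for e in eps:
        ll += -0.5 * (math.log(2.0 * math.pi * h2) + e * e / h2)
        h2 = alpha0 + alpha1 * e * e + beta1 * h2
    return ll

eps = simulate_garch(0.1, 0.15, 0.8, 500)
ll = garch_loglik(eps, 0.1, 0.15, 0.8, 0.1 / (1.0 - 0.15 - 0.8))
```

A routine like `garch_loglik` is the building block of the MCMC samplers used later, since every Metropolis-Hastings step requires one likelihood evaluation.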
Another extension of (3) uses a Student-t distribution with n degrees of freedom to account for the heavier tails of the distribution of the error process {ε_t}, as introduced by Bollerslev (1987) and by Baillie and Bollerslev (1989). The likelihood then is

l(y | α₀, …, α_r, β₁, …, β_s, n, h₀²) = ∏_{t=1}^T [ Γ((n+1)/2) / ( Γ(n/2) [π n h_t²]^{1/2} ) ] ( 1 + ε_t² / (n h_t²) )^{−(n+1)/2}        (4)
An Exponential GARCH (EGARCH) was introduced by Nelson (1991) to avoid imposing restrictions on the α_i, β_j in (3), with

ln(h_t²) = α₀ + Σ_{i=1}^r [ ϕ_i ( |ε_{t−i}/h_{t−i}| − E|ε_{t−i}/h_{t−i}| ) + α_i ε_{t−i}/h_{t−i} ] + Σ_{j=1}^s β_j ln(h_{t−j}²)        (5)

for the conditional variance of {ε_t}, so h_t² > 0 for all t, and E|ε_{t−i}/h_{t−i}| = Γ(2/ν) / [Γ(1/ν) Γ(3/ν)]^{1/2}.
Additionally, assuming that ε_t follows a Generalized Gaussian Distribution (GED), the likelihood is:

l(y | α₀, α_i, ϕ_i, β_j, ν, h₀²) = ∏_{t=1}^T c h_t^{−1} exp( −0.5 |ε_t / (λ h_t)|^ν )        (6)

where c = ν / ( λ 2^{(1+1/ν)} Γ(1/ν) ) and λ = [ 2^{(−2/ν)} Γ(1/ν) / Γ(3/ν) ]^{1/2}.
The use of Bernoulli mixtures of two normal distributions, proposed by Ball and Torous (1983), was successfully implemented by Vlaar and Palm (1993):

h_t² = α₀ + Σ_{i=1}^r α_i ε_{t−i}² + Σ_{j=1}^s β_j h_{t−j}²

for the conditional variances. They used an MA(1) term to model the changes in levels of several exchange rates, as:

ε_t = y_t − φ₀ − λν − θ₁ ε_{t−1}        (7)

and each ε_t is distributed as a mixture of two normals, hence

ε_t ∼ (1 − λ) N(−λν, h_t²) + λ N((1 − λ)ν, h_t² + δ²)
where λ is the jump intensity, ν is the expected jump size, and δ² the expected change in variance. This representation is useful and intuitive for economies with target zones or bands for their exchange rates, like Colombia or the European Economic Union. The MA term, θ₁ ε_{t−1}, is explained as allowing for mean reversion. Therefore, the likelihood is expressed as:
l(y | φ₀, θ₁, α_i, β_j, λ, δ, ν, h₀²) = ∏_{t=1}^T (2π)^{−1/2} [ ((1 − λ)/h_t) exp( −(ε_t + λν)² / 2h_t² ) + ( λ / (h_t² + δ²)^{1/2} ) exp( −(ε_t − (1 − λ)ν)² / 2(h_t² + δ²) ) ]        (8)
More recently, Bera and Premaratne (2000) propose the use of the Pearson Type-IV family of distributions in order to model skewness and leptokurtosis that are larger than usual. For a GARCH(1,1), the log-likelihood is:

l(y|Θ) = Σ_{t=1}^T l_t(y|Θ)

l_t(y|Θ) = −ln(α₀) − 0.5 ln(h_t) − ln(C) + δ arctan( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) ) − ((r+2)/2) ln[ 1 + ( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) )² ]        (9)

with

C = ∫₀^π sin^r ψ exp{−δ ψ} dψ

and Θ = (α₀, α₁, β₁, r, δ, μ, φ₀, φ₁). They used

y_t = φ₀ + φ₁ y_{t−1} + ε_t

and

h_t² = 1 + α₁ ε_{t−1}² + β₁ h_{t−1}²
for their empirical application, but, of course, the model on y_t can be extended.
But according to Nagahara (1999), (9) should be

l_t(y|Θ) = −ln(α₀) − 0.5 ln(h_t) − ln(C) + (r+2)δ arctan( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) ) − ((r+2)/2) ln[ 1 + ( (y_t − φ₀ − φ₁y_{t−1} − μh_t) / (α₀h_t) )² ]        (10)

with

C = exp[π(r+2)δ/2] ∫₀^π sin^r ψ exp[−δ(r+2)ψ] dψ
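The normalizing constant C has no closed form, but it is a one-dimensional integral that can be approximated numerically; the following sketch uses the trapezoidal rule, and the values of r and δ shown are arbitrary illustrations, not the estimates reported later.

```python
import math

def pearson4_const(r, delta, n=20000):
    """Trapezoidal approximation of Nagahara's normalizing constant:
    C = exp(pi*(r+2)*delta/2) * integral_0^pi sin(psi)^r * exp(-delta*(r+2)*psi) dpsi."""
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        psi = k * h
        f = math.sin(psi) ** r * math.exp(-delta * (r + 2.0) * psi)
        total += f if 0 < k < n else 0.5 * f   # half weight at the endpoints
    return math.exp(math.pi * (r + 2.0) * delta / 2.0) * h * total

# C for illustrative values r = 3.5, delta = 0.05 (assumptions, not estimates)
C = pearson4_const(3.5, 0.05)
```

A quick sanity check is available in closed form: with δ = 0 and r = 2 the integral reduces to ∫₀^π sin²ψ dψ = π/2.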
When information arrives in random order and the data refer to close-to-close periods, Hsieh (1989) showed that the use of a normal-lognormal mixture distribution improves the fit over other GARCH alternatives. That distribution is not tried here because it requires the computation of a definite integral over ℝ; one must rely on numerical integration to provide a suitable solution to that problem.
2.2 State Space Models.
A different alternative in time-series analysis is a State-Space (S-S) model; as West and Harrison (1997) state, they extend and update the seminal paper by Harrison and Stevens (1976). The model is typically represented by two equations:

Observation Equation:  y_t = F_t′ θ_t + ν_t,   ν_t ∼ N(0, V_t),        (11)

System Equation:  θ_t = G_t θ_{t−1} + w_t,   w_t ∼ N(0, W_t)        (12)

where y_t is the observed (sometimes latent) variable; θ_t is a vector of unknown parameters, which follows the first order Markov process (12); ν_t is a vector of unobserved and uncorrelated stochastic error terms; G_t is a matrix of known coefficients; and w_t is an unobserved stochastic term, generally assumed uncorrelated with ν_t.
It is worth noting that any data series for which there exists a natural ordering of observations fits into the dynamic framework, so the time series need not be equally spaced, and missing data problems can easily be handled in this context. Pole et al. (1994) present applied methodology, as well as multiple examples.
Usually, and without much loss, V_t can be considered constant, and, working in terms of the precision φ = V^{−1}, it is possible to get estimates of W_t. The likelihood for the S-S model is given below:

l(y|Θ) = ∏_{t=1}^T (2π V_t)^{−1/2} exp( −(y_t − F_t′θ_t)² / 2V_t )        (13)

which is the density of a normal with mean F_t′θ_t and variance V_t. Prior probabilities can be set up on θ₀ and a fully Bayesian analysis of the State-Space model can be run.
A non-linear (non-Gaussian) S-S model can be set up as follows (see Harvey et al., 1994, for details):

y_t = ε_t exp(h_t/2)        (14)

as the non-linear observation equation, and

h_t = γ + φ h_{t−1} + η_t,   η_t ∼ N(0, σ_η²),        (15)

as the system equation, where h_t = ln(σ_t²).
This model, which is known in the financial and econometric literature as the Stochastic Volatility (SV) model, can be transformed to get a linear observation equation, as

ln(y_t²) = h_t + ln(ε_t²)        (16)

where y_t is the mean-corrected return at time t; h_t is the log-volatility at time t, which is assumed to follow a stationary process, with h₀ drawn from the stationary distribution; ε_t and η_t are uncorrelated standard normal white noise shocks; φ is the persistence of the volatility (with |φ| < 1 as the stationarity restriction); and σ_η² is the volatility of the log-volatility.
The likelihood, assuming a normal distribution, can be obtained by using a Kalman Filter (see Jacquier et al., 1994, Ap. B.1). The use of a Student-t distribution with ν degrees of freedom on ε has been considered, and the Kalman Filter needs some minor modifications¹ (see Ruiz, 1994).

¹ E.g., the use of digamma and trigamma functions; Abramowitz and Stegun (1967) offer computational details.
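To make the SV specification concrete, the following sketch simulates (14)-(15), drawing h₀ from its stationary law, and forms the linearized observations (16); the parameter values are arbitrary illustrations, not estimates from this paper.

```python
import math, random

def simulate_sv(gamma, phi, sig_eta, T, seed=5):
    """Simulate (14)-(15): y_t = eps_t * exp(h_t/2),
    h_t = gamma + phi*h_{t-1} + eta_t, with h_0 from the stationary distribution."""
    rng = random.Random(seed)
    mu_h = gamma / (1.0 - phi)                  # stationary mean of h_t
    var_h = sig_eta ** 2 / (1.0 - phi ** 2)     # stationary variance of h_t
    h = rng.gauss(mu_h, math.sqrt(var_h))
    ys, hs = [], []
    for _ in range(T):
        ys.append(rng.gauss(0.0, 1.0) * math.exp(h / 2.0))
        hs.append(h)
        h = gamma + phi * h + rng.gauss(0.0, sig_eta)
    return ys, hs

y, h = simulate_sv(-0.2, 0.95, 0.3, 1000)
z = [math.log(yt * yt) for yt in y]             # linearized observations, eq. (16)
```

The series z is what the (quasi-likelihood) Kalman filter operates on, since ln(ε_t²) is far from normal and only its first two moments are matched.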
2.3 Markov Chain Monte Carlo Methods.
When doing fully Bayesian analysis of complex or high-dimensional models, the researcher usually faces the problem of non-conjugacy, meaning that no exact analytical posterior distribution can be obtained. This leads to the necessity of using simulation approaches. Direct simulation is often impossible, due to the complicated mathematical form of the posterior distribution in many applied models. Because of that, interest in and application of Markov Chain Monte Carlo methods (referred to by the acronym MCMC) as a tool for numerical computation of complex integrals, particularly in Bayesian analysis, have risen sharply.
The key to Markov Chain simulation is to create a Markov process whose
stationary distribution is a specified π(θ|y) and to run the simulation long
enough that the distribution of the current draws is close enough to the stationary distribution. Once the simulation algorithm has been implemented,
it should be iterated until convergence has been reached, or, if convergence is
painfully slow, the algorithm should be altered. Hence, the study of MCMC
has seen a corresponding interest in the convergence properties of the resultant chains, which may be assessed through a suite of diagnostics borrowed
from diverse areas such as time series, exploratory data analysis (EDA), and
central limit theory.
The most widely used Markov Chain Monte Carlo methods are the Gibbs Sampler and the group of Metropolis-Hastings algorithms; a good description of them can be found in Gamerman (1997) or Gelman et al. (1995), among many others.
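The core of the Metropolis-Hastings family can be sketched in a few lines; the following random-walk sampler targets a standard normal "posterior" purely for illustration — the target, scale, and iteration count are assumptions, not values used in the paper.

```python
import math, random

def metropolis(log_post, x0, scale, n_iter, seed=7):
    """Random-walk Metropolis: propose x' = x + scale*z with z ~ N(0,1),
    accept with probability min(1, pi(x')/pi(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    draws, accepted = [], 0
    for _ in range(n_iter):
        xp = x + scale * rng.gauss(0.0, 1.0)
        lpp = log_post(xp)
        if lpp >= lp or rng.random() < math.exp(lpp - lp):
            x, lp, accepted = xp, lpp, accepted + 1
        draws.append(x)
    return draws, accepted / n_iter

# target: a standard normal log-density, up to an additive constant
draws, acc = metropolis(lambda x: -0.5 * x * x, 0.0, 2.4, 20000)
```

After enough iterations the empirical distribution of `draws` approximates the target; monitoring the acceptance rate `acc` is the usual first tuning device for the proposal scale.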
2.4 Model Selection

The most damaging comment on the standard practice of choosing a single model, and then proceeding conditional on it, is that the researcher's uncertainty is understated.
– Poirier (1995, p. 605)
For frequentists the model selection problem reduces to choosing one from
a set of M models. This is usually the main aim of the analysis, and is done
according to some model selection criterion, as stated above. Bootstrap
methods have been used for model selection (See Maddala and Li, 1996, sec
5, for references). For another perspective see Poskitt and Tremayne (1983).
Essentially, two alternative approaches in the Bayesian context are presented. The first was introduced by Carlin and Chib (1995) and considers all models in a single formulation, called here a supermodel. The Markov Chain simulation scheme for this supermodel is presented below. The second approach presents sophisticated simulation techniques using Markov Chains with jumps between the different models; it is referred to as Reversible Jumping, and it was introduced by Green (1995).
It will be assumed throughout this section that y is observed and can be described according to a model M_j with parameters θ_j of dimension d_j, taking values in a parameter space Θ_j ⊂ ℝ^{d_j}, j = 1, 2, …, M. The value of M could be ∞ as, for instance, when considering countable classes of models. m serves the purpose of indicating a specific model.
Assume for the moment that the posterior distribution π(θ, j), the joint distribution of the super-parameter and the model indicator, is to be obtained. However, the main interest in inference is to obtain the posterior distributions of θ_j | m = j, j = 1, 2, …, M, and of m. These distributions respectively provide the posterior inference within each of the models and the posterior probabilities of the models. The supermodel approach provides a sample from the more general, perhaps unnecessary, joint posterior distribution, whereas the approach with jumps only provides samples from θ_j | m = j, j = 1, 2, …, M, and m. The presence of common parameters does not pose any problem here.
2.4.1 Markov Chains for Super-Models.
The joint distribution of all random quantities is given by

π(y, θ, j) = π(y|θ, j) π(θ|j) π_j        (17)

where j is the value of m and π_j = P(m = j). Given that m = j, the distribution of y depends on θ only through θ_j, or mathematically,

π(y|θ, j) = π(y|θ_j, j)        (18)

Assume also that the θ_j are conditionally independent, given the value of m. Hence,

π(θ|j) = ∏_{i=1}^M π(θ_i|j)

Note that the prior distribution π(θ_i|j), for i ≠ j, does not make much sense: it specifies the distribution of the parameters of model i, conditioned on the fact that this is not the true model. Carlin and Chib (1995) refer to these as pseudo-prior or linking distributions. Due to the conditional independence
(18), these priors do not interfere in the expressions of the marginal predictive
densities for each model. Nevertheless, they are relevant for the construction
of the chain and must be specified.
It follows from the above specification that

π(y, θ, j) = π(y|θ_j, j) ∏_{i=1}^M π(θ_i|j) π_j

which is proportional to the joint posterior distribution of θ and m. A natural blocking is formed by grouping each model's parameters and m. The full conditional distributions for θ₁, θ₂, …, θ_M and m are obtained as follows:

• For block θ_j, j = 1, …, M,

p_j(θ_j) ∝ { π(y|θ_j, j) π(θ_j|j),  for m = j
           { π(θ_j|i),              for m = i ≠ j

• For block m,

p_M(j) = k^{−1} π(y|θ_j, j) ∏_{i=1}^M π(θ_i|j) π_j,   j = 1, …, M

that is, a discrete distribution with proportionality constant

k = Σ_{l=1}^M π(y|θ_l, l) ∏_{i=1}^M π(θ_i|l) π_l
m can always be sampled directly because it has a discrete distribution.
Direct sampling from blocks θj will depend on the conjugacy structure for
model m = j and the form of the pseudo prior distribution. When direct
sampling for some of the θj ’s is not possible, Metropolis-Hastings steps may
be used.
The above scheme satisfies the conditions of a conventional Markov Chain
and therefore converges to the target distribution given by the posterior.
Comparison between models is based on the marginal posterior distribution
of m, p(j), j = 1, . . . , M. These probabilities are estimated by the proportion of values of m equal to j in the sample of size n.
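As an illustration of the scheme above, the following toy sampler chooses between two hypothetical normal models for the same data, with the pseudo-priors set equal to the (conjugate) conditional posteriors, a common practical choice; the models, priors, and data are assumptions made for the sketch, not those used later in this paper.

```python
import math, random

def carlin_chib(y, n_iter=5000, seed=3):
    """Toy supermodel sampler for M1: y_i ~ N(theta_1, 1) vs M2: y_i ~ N(theta_2, 4),
    each with a N(0, 10) prior on its mean; pseudo-priors = conditional posteriors."""
    rng = random.Random(seed)
    n, s = len(y), sum(y)
    sig2 = [1.0, 4.0]

    def post_params(v, tau2=10.0):            # conjugate posterior of theta_j
        prec = n / v + 1.0 / tau2
        return (s / v) / prec, 1.0 / prec

    def loglik(th, v):
        return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - th) ** 2 / v)
                   for yi in y)

    def lognorm(x, mu, v):
        return -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)

    pp = [post_params(sig2[0]), post_params(sig2[1])]   # pseudo-prior (mean, var)
    th = [pp[0][0], pp[1][0]]
    m, visits = 0, [0, 0]
    for _ in range(n_iter):
        for j in (0, 1):                      # update each parameter block
            mu, v = post_params(sig2[j]) if m == j else pp[j]
            th[j] = rng.gauss(mu, math.sqrt(v))
        # discrete full conditional of the model indicator m (equal prior probs)
        logw = [loglik(th[j], sig2[j]) + lognorm(th[j], 0.0, 10.0)
                + lognorm(th[1 - j], *pp[1 - j]) for j in (0, 1)]
        d = min(max(logw[1] - logw[0], -700.0), 700.0)
        m = 0 if rng.random() < 1.0 / (1.0 + math.exp(d)) else 1
        visits[m] += 1
    return visits[0] / n_iter, visits[1] / n_iter

y = [0.5 if i % 2 == 0 else -0.5 for i in range(40)]   # toy low-dispersion data
p1, p2 = carlin_chib(y)
```

For these data the unit-variance model fits far better, so the chain should spend nearly all its time at m = 1, and p1 estimates the posterior model probability.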
The pseudo-prior distributions must be carefully chosen, as they affect the rate of convergence of the chain. Carlin and Chib (1995) recommend the use of simple standard approximations based on univariate estimates obtained from pilot chains. The same authors suggest using fairly vague prior distributions, but it is well known that with this practice, on models with different dimensions, the Bayes factors turn out to be very sensitive. So, this prior setting may need further justification to satisfy potential users.

Finally, this approach is not applicable to the case of a countably infinite number of models under consideration. Hence, the number of practical and theoretical difficulties of this approach suggests it should be used with care; see Gamerman (1997), where more details can be found.
2.4.2 Markov Chains with Jumps.
Green (1995) introduced a reversible-jump MCMC strategy for generating from the joint posterior π(m, θ_m|y), based on the standard Metropolis-Hastings approach. Reversible-jump MCMC was also applied by Richardson and Green (1997), for an analysis of univariate normal mixtures; by Nobile and Green (2000), for factorial experiments using mixture modeling; and by Dellaportas and Forster (1999), for the analysis of contingency tables. During reversible-jump MCMC sampling, the constructed Markov Chain moves within and between models, so that the limiting proportion of visits to a given model is the required π(m|y).
In general, suppose that the current state of the Markov Chain at time t is (m, θ_m), where θ_m has dimension d(θ_m), and a move is proposed at time t + 1 to a new model m′ with probability j(m, m′) and corresponding parameter vector θ′_{m′}. Then, a vector u is generated from a specified proposal density q(u|θ_m, m, m′), and (θ′_{m′}, u′) = g_{m,m′}(θ_m, u) is set for a specified invertible function g_{m,m′} such that g_{m′,m} = g_{m,m′}^{−1}. Note that d(θ_m) + d(u) = d(θ′_{m′}) + d(u′). Green (1995) showed that, if the new move is accepted as the next realization of the Markov Chain with probability α = min{1, r}, where

r = [ π(y|m′, θ′_{m′}) π(θ′_{m′}|m′) π(m′) j(m′, m) q(u′|θ′_{m′}, m′, m) ] / [ π(y|m, θ_m) π(θ_m|m) π(m) j(m, m′) q(u|θ_m, m, m′) ] · |J|        (19)

with J = ∂(θ′_{m′}, u′)/∂(θ_m, u) denoting the Jacobian of the transformation, then the chain satisfies the condition of detailed balance and has the required limiting distribution π(m, θ_m|y). The condition of detailed balance requires that the equilibrium probability of moving from a state (m, θ_m) to (m′, θ′_{m′}) equals that of moving from (m′, θ′_{m′}) to (m, θ_m); for details, see Green (1995).
To implement the reversible-jump MCMC, the probabilities j(m, m′) need to be specified for every proposed move, as well as the proposal distributions q(u|θ_m, m, m′) and q(u′|θ′_{m′}, m′, m) and the function g_{m,m′}. These choices do not affect the results in terms of the models selected, but they may crucially affect the convergence rate of the Markov Chain. For the probability j(m, m′), one non-informative alternative is j(m, m′) = (M − 1)^{−1} for all m, m′, when at each state of the chain a move from one model to another is always proposed.
Vrontos et al. (2000) proposed a modification of Green's technique, which they have successfully implemented in a series of experiments with GARCH and EGARCH models; it is described as follows. First, they suggest that all the parameters of the proposed model be generated from a proposal distribution. Consequently, (θ′_{m′}, u′) = (u, θ_m) with d(θ_m) = d(u′) and d(θ′_{m′}) = d(u), q(u|θ_m, m, m′) = q(u|m′), q(u′|θ′_{m′}, m′, m) = q(u′|m), and the Jacobian in (19) is 1. In this case, the probability of acceptance of the new move as the next realization of the Markov Chain is given by α = min{1, r}, where

r = [ π(y|m′, θ′_{m′}) π(θ′_{m′}|m′) π(m′) j(m′, m) q(u′|m) ] / [ π(y|m, θ_m) π(θ_m|m) π(m) j(m, m′) q(u|m′) ]        (20)

The proposal densities q(u|m′) and q(u′|m) can be chosen by investigation of a "pilot run." They start the chain from the best available starting values (e.g., the maximum likelihood estimates when available) and simulate the "within-model" Markov Chain many times to obtain approximate marginal posterior means and covariance matrices for each model's parameter vector. These estimates are then used to construct the proposal densities q(u|m′) and q(u′|m), taken as multivariate normal densities.
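The modified acceptance ratio (20) can be sketched on a pair of toy models of differing dimension; the models, priors, and pilot-based proposals below are assumptions invented for the illustration (within-model refreshment moves are omitted), not the seven models compared later.

```python
import math, random

def rj_mcmc(y, n_iter=5000, seed=11):
    """Vrontos-style jump sketch between M1: y_i ~ N(mu, 1) (one parameter)
    and M2: y_i ~ N(mu, v), v free (two parameters). The whole parameter
    vector of the proposed model is drawn from an independence proposal
    fitted to pilot moment estimates, so the Jacobian in (19) is 1 and the
    acceptance ratio reduces to (20)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n

    def ln_norm(x, mu, v):
        return -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)

    def log_post(model, th):                  # log prior + log likelihood
        if model == 1:
            mu, v, lp = th[0], 1.0, ln_norm(th[0], 0.0, 100.0)
        else:
            mu, v = th[0], math.exp(th[1])    # th[1] is log variance
            lp = ln_norm(th[0], 0.0, 100.0) + ln_norm(th[1], 0.0, 100.0)
        return lp + sum(ln_norm(yi, mu, v) for yi in y)

    # independence proposals q(.|m) built from "pilot" moment estimates
    prop = {1: [(ybar, 4.0 * s2 / n)],
            2: [(ybar, 4.0 * s2 / n), (math.log(s2), 8.0 / n)]}

    def q_draw(model):
        th = [rng.gauss(mu, math.sqrt(v)) for mu, v in prop[model]]
        lq = sum(ln_norm(t, mu, v) for t, (mu, v) in zip(th, prop[model]))
        return th, lq

    model, th = 1, [ybar]
    lq_cur = ln_norm(ybar, *prop[1][0])
    visits = {1: 0, 2: 0}
    for _ in range(n_iter):
        new = 2 if model == 1 else 1          # j(m, m') = 1: propose the other model
        th_new, lq_new = q_draw(new)
        lr = (log_post(new, th_new) + lq_cur) - (log_post(model, th) + lq_new)
        if lr >= 0.0 or rng.random() < math.exp(lr):
            model, th, lq_cur = new, th_new, lq_new
        visits[model] += 1
    return visits[1] / n_iter, visits[2] / n_iter

y = [0.3, -0.2, 0.5, -0.4, 0.1, 0.0, -0.3, 0.2] * 5   # toy data, variance far below 1
p1, p2 = rj_mcmc(y)
```

The returned proportions of visits estimate π(m|y); for these low-variance data the free-variance model should dominate.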
3 Methodology
3.1 The Data.
In order to illustrate the estimation and selection methodologies, weekly observations of the US/Colombian (spot) exchange rate are used. Let er_t represent the Friday exchange rates, running from October 21, 1991, through December 29, 1999; that makes T = 428 observations. The daily exchange rate corresponds to the weighted average of trading, selling, and buying, of U.S. dollars in the open market. In the event of the market being closed on a Friday, the observation on the previous Thursday was used.

Figure 1 shows the raw data; from this, and with the help of a unit root test (see Enders, 1995), it is easy to conclude that the exchange rate has a unit root, so, as usual, one works with the first difference of the natural logarithm of the exchange rates, the returns, also known as continuously compounded rates of return (r_t = ln(er_t) − ln(er_{t−1}) for t = 1, …, T), whose representation is shown in Fig. 2.
[Figure 1: Weekly Colombian Exchange Rate.]
From Figure 3, which shows the histogram of returns, it is noteworthy that the skewness and kurtosis coefficients for r_t, which are 0.8581 and 5.6872, respectively, are both significantly positive and much larger than common, thus showing asymmetry and leptokurtosis. As pointed out by Vlaar and
[Figure 2: Weekly Return from Colombian Exchange Rate.]
Palm (1993), the skewness could be the result of the asymmetry in the movements of the parity adjustments, and a high kurtosis could result from a time-varying variance. These two results lead to considering distributions different from the normal, whose use is unlikely to yield appropriate results.

Also, it is clear that the variance is not constant at all, which can be seen from the Ljung-Box autocorrelation statistic of the squared returns: Q²(12) = 28.32 and Q²(24) = 44.86 for lags 12 and 24, respectively.
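The descriptive statistics just quoted can be reproduced, for any price series, along the following lines; this is a sketch with illustrative helper functions, not the code used for the paper's figures.

```python
import math

def log_returns(prices):
    """Continuously compounded returns r_t = ln(er_t) - ln(er_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def moments(x):
    """Sample skewness and kurtosis coefficients."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x) / n
    skew = sum((xi - m) ** 3 for xi in x) / n / s2 ** 1.5
    kurt = sum((xi - m) ** 4 for xi in x) / n / s2 ** 2
    return skew, kurt

def ljung_box(x, max_lag):
    """Ljung-Box Q statistic; applied to squared returns it points to ARCH effects."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((xi - m) ** 2 for xi in x) / n
    q = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q
```

For the series studied here one would compute, e.g., `ljung_box([r * r for r in log_returns(er)], 12)` and compare it with a χ² critical value.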
3.2 Estimation.

Based on the theoretical justification of several models, and taking into account the particularities of the Colombian economy, seven different models were proposed for estimation and selection among them; they are described below.

With the aim of implementing a fully Bayesian analysis, MCMC was used to estimate the models²; many of them are estimated for the first time from the Bayesian point of view, which is a new contribution of this work.

First, an ARMA(1,1) model with normal disturbances was fitted; this model could be tried in practice when the researcher ignores the fact that the variance is not constant. In this report, such a model is also used as the reference model, against which the others are compared. Non-informative Beta(1,1) priors from −1 to 1³ were chosen for φ₁ and θ₁. A N(0,5) prior was used for φ₀; finally, a nearly non-informative but proper Inverse-Gamma(2.001,0.001) (so, expected value and variance equal to 1/1000) was used for σ².⁴

[Figure 3: Histogram of return.]

Second, an ARMA(1,1) for the levels of the returns plus a GARCH(1,1) for the variance, so that ε_t in (1) comes from an ARMA(1,1), and V(ε_t) = h_t² from (3) with r = 1 and s = 1; still assuming normality, the likelihood is given by (2). Priors: non-informative U(−1,1) for φ₁ and θ₁; 1/α₀ for α₀; a N(0,5) for φ₀; and U(0,1) for α₁ and β₁; in order to assure stationarity in (3), any proposal that did not satisfy α₁ + β₁ < 1 was rejected. From initial runs h₀² turned out to be statistically equal to zero, thus it was set to α₀/(1 − α₁ − β₁), as Nelson and Cao (1992) propose, for computational convenience.

Third, an ARMA(1,1) plus a GARCH(1,1) with a Student-t distribution; proposals with α₁ + β₁ ≥ 1 in (3) are rejected for stationarity purposes; n > 2 in (4); h₀² was set as before. Priors: 1/α₀² for α₀; 1/(n − 2)² on n; and α₁, β₁ as in the second model. φ₀, φ₁, and θ₁ were treated as in the first model.

Fourth, an ARMA(1,1) plus an EGARCH(1,1), (5) instead of (3), with a GED distribution; therefore, the likelihood is given by (6). Priors: φ₀, φ₁, and θ₁ as above; a N(0,1) was used as prior for α₀; N(0,5) for α₁; for β₁ a U(−1,1); ϕ₁ is assumed identically equal to zero; and U(0,80) for ν. Any proposal with |β₁| > 1 was rejected for stationarity purposes.

² Maximum Likelihood Estimation (MLE) was tried, but problems getting precise estimates of the variance-covariance matrix for some models restricted its use.
³ In the sense that if X ∼ Beta(1,1) then Y = 2X − 1 ∼ Beta(1,1) from −1 to 1. See Jacquier et al. (1999).
⁴ 1/σ² as a prior for σ² was tried too, with similar results.
Fifth, an ARMA(1,1) plus a GARCH(1,1) with a mixture of two normals as the error term distribution, using (8) as the likelihood. Priors: φ₀, φ₁, and θ₁ as before; 1/α₀² for α₀; for α₁ and β₁, U(0,1); N(0,10) for ν; 1/(δ²)² for δ²; finally, U(0,1) for λ.

Sixth, an ARMA(1,1) plus a GARCH(1,1) with the Type-IV Pearson distribution; hence, the likelihood is computed by using (10). Priors: an Inverse-Gamma(2.001,0.0001) was used for α₀; U(0,1) for α₁ and β₁; (r − 2) following an Inverse-Gamma(2.001,0.0001); N(0,1) for δ; and N(0,5) for µ. The distributions used previously for φ₀, φ₁, and θ₁ were used here too. The restriction r > 3 is imposed, so the first four moments of the Type-IV Pearson distribution exist. Again, any proposal with α₁ + β₁ ≥ 1 is rejected. Proposals with α₀ < 0.0001 were rejected for computational reasons.

Seventh, and finally, a Stochastic Volatility model with a Student-t distribution on the error term, expressed in (16) and (15). A non-informative U(−1,1) prior for φ, a N(0,5) for γ, and an Inverse-Gamma(2.001,0.0001) for σ_η² were used; for ν, a U(5,80) was used, as Jacquier et al. (1999) did, to assure the Student-t has at least its first four moments.
Although the use of single versus multiple (parallel) chains in MCMC is an open discussion, single chains for each model are used in this work because of time and computing resource limitations. For a recent discussion of this dilemma, the reader is referred to Mengersen et al. (1999), which contains points in favor of each alternative.
Raftery and Lewis' (1996) strategy was implemented here, with the constant c selected in such a way that the proportion of proposals accepted was between 20 and 50%, as has become common practice. The update is element-by-element and in random order. However, when high correlations between parameters in conjunction with slow convergence were found, a blocking update was implemented to improve convergence.

In every case, a final chain of 80,000 was run, and then steps of 50 to 300 were taken to avoid large autocorrelations in the chains. In that way, first-order autocorrelations no larger than 0.55 were guaranteed.
Convergence of each chain is assessed by applying Geweke's (1992) criterion, whose null hypothesis is that stationarity has been reached and whose test statistic is supposed to follow a standard normal distribution under H₀. This test is implemented by using CODA (see Best et al., 1997).
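The idea behind Geweke's diagnostic can be sketched as follows; for simplicity the variance of each segment mean is estimated here by batch means, a stand-in for the spectral-density-at-zero estimator that CODA uses, so the numbers are only indicative.

```python
import math, random

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke-style z score: compare the mean of the first 10% of the chain
    with the mean of the last 50%; each sample-mean variance is estimated
    by batch means (10 batches per segment)."""
    def mean_and_se2(x, n_batches=10):
        b = len(x) // n_batches
        bm = [sum(x[i * b:(i + 1) * b]) / b for i in range(n_batches)]
        m = sum(bm) / n_batches
        return m, sum((v - m) ** 2 for v in bm) / (n_batches - 1) / n_batches
    m1, v1 = mean_and_se2(chain[: int(first * len(chain))])
    m2, v2 = mean_and_se2(chain[int((1.0 - last) * len(chain)):])
    return (m1 - m2) / math.sqrt(v1 + v2)

rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
```

Under stationarity the score behaves roughly like a standard normal; a drifting chain (for example, one that is still converging) produces a large |z|.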
The mean vector and variance-covariance matrix of each chain are to be obtained in order to feed the RJMCMC, as explained later.
3.3 Model Selection.
The model selection exercise consists of applying the Reversible Jump MCMC algorithm to obtain the posterior model probabilities, running 200,000 iterations and recording the proportion of each 2,000 in which model m (m = 1, 2, …, 7) is selected. For checking stability, visual analysis is used. Although there are some fresh results on assessing convergence in RJMCMC, their value is not yet well known, as mentioned by Brooks and Giudici (1999).
For all seven models, the same priors mentioned in Section 3.2 are used. The proposal densities q(u|m′) and q(u′|m) for each parameter vector were constructed by using the MCMC output of the separate model runs described above. These densities are taken as multivariate normals, with mean vector and covariance matrix equal to the corresponding sample mean vector and covariance matrix of the parameters in each model.
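Drawing from such a pilot-based multivariate normal proposal reduces to a Cholesky factorization of the sample covariance matrix; the following two-parameter sketch uses hypothetical pilot-run moments, not the values estimated for the paper's models.

```python
import math, random

def chol2(S):
    """Cholesky factor L (lower triangular) of a 2x2 covariance matrix S = L L'."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def mvn_proposal(mean, S, rng):
    """One draw u ~ N(mean, S): u = mean + L z with z standard normal."""
    L = chol2(S)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mean[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

# hypothetical pilot-run output for a two-parameter block
mean = [0.45, 0.40]
S = [[0.010, 0.004], [0.004, 0.020]]
rng = random.Random(1)
draws = [mvn_proposal(mean, S, rng) for _ in range(20000)]
```

The same factorization also yields the proposal density needed in (20), since log q(u|m′) is a quadratic form in L⁻¹(u − mean).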
4 Results

4.1 Estimation.
It took between 12 hours 24 minutes and 56 hours 48 minutes, from the fastest to the slowest model, to run all the iterations, using a computer with a Pentium I 233 MHz processor and 64 MB of RAM, running under WINDOWS-98 Second Edition. Times reduce to one third using a Pentium III 700 MHz processor with the same software and 192 MB of RAM.

In the following presentation the return rates are expressed on a 0-100 scale, which was used for computational and presentational reasons; otherwise, the models which use a GARCH component get stuck, it seems because values of α₀ go so close to zero that the algorithm gets overwhelmed.
Table 1: Geweke’s Convergence z Scores for the Seven Models.
[Table 1 reports, for each of the seven models, the Geweke z score of every
parameter: φ0, φ1, θ1, σ² for the ARMA(1,1):Normal model; φ0, φ1, θ1, α0,
α1, β1 plus the distribution-specific parameters (ν; λ, ν, δ²; r, δ, µ) for the
GARCH-type models; and φ, γ, ν, σ²η for the Stochastic Volatility model.
Most scores lie within ±2; the GARCH(1,1):MixN model shows several
parameters with |z| > 2.]
The Geweke convergence z scores for the seven models are presented in
Table 1; looking at the numbers, it is clear that convergence has been reached
for almost every parameter in all models.
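For reference, the Geweke diagnostic compares the mean of an early segment of the chain with the mean of a late segment. The sketch below uses the plain sample variance for brevity; the paper's version instead uses spectral-density-based standard errors (footnote 5), so the numbers would differ slightly:

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke convergence z score: compare the mean of the first 10% of
    the chain with the mean of the last 50%.  Naive sample variances are
    used here in place of spectral-density estimates."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    a = x[: int(first * n)]          # early segment
    b = x[int((1 - last) * n):]      # late segment
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a)
                                           + b.var(ddof=1) / len(b))
```

Values of |z| beyond roughly 2 suggest the two segments have not converged to the same distribution.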
The estimation results are presented in Table 2, with standard errors in
parentheses. Although parameter transformations were tried for some
models, convergence was not improved, hence their use was discarded. Regarding
Table 2, apart from the non-inclusion of some parameters in most of the
models, no additional work on excluding non-significant parameters was
attempted, both because of time limitations and because improving any one
specific model is not the main purpose of this work.
Table 2: Estimation Results of the Seven Competing Models.

Model One, ARMA(1,1):Normal:
  φ0 = 0.269 (0.022), φ1 = -0.107 (0.010), θ1 = 0.112 (6.6e-3),
  σ² = 0.103 (3.1e-3)

Model Two, ARMA(1,1)+GARCH(1,1):N:
  φ0 = 0.295 (0.0041), φ1 = -0.586 (0.0179), θ1 = 0.564 (0.016),
  α0 = 0.203 (0.0024), α1 = 0.490 (0.004), β1 = 0.398 (0.004)

Model Three, ARMA(1,1)+GARCH(1,1):t:
  φ0 = 0.153 (0.005), φ1 = 0.151 (0.024), θ1 = -0.075 (0.021),
  α0 = 0.119 (0.008), α1 = 0.623 (0.01), β1 = 0.525 (0.006),
  ν = 3.20 (0.06)

Model Four, ARMA(1,1)+EGARCH(1,1):GED:
  φ0 = 0.045 (0.003), φ1 = 0.578 (0.032), θ1 = -0.441 (0.029),
  α0 = -0.484 (0.009), α1 = 0.419 (0.007), β1 = 0.431 (0.009),
  ν = 0.864 (0.005)

Model Five, ARMA(1,1)+GARCH(1,1):MixN:
  φ0 = 0.256 (0.013), φ1 = -0.022 (0.052), θ1 = 0.065 (0.044),
  α0 = 0.047 (0.009), α1 = 0.321 (0.011), β1 = 0.475 (0.017),
  λ = 0.247 (0.015), ν = 0.525 (0.046), δ² = 2.33 (0.182)

Model Six, ARMA(1,1)+GARCH(1,1):T-IV Pearson:
  φ0 = -0.079 (0.009), φ1 = -0.0627 (0.015), θ1 = 1.9e-3 (0.013),
  α0 = 0.111 (3.8e-4), α1 = 2.9e-4 (1.1e-5), β1 = 0.999 (1.8e-5),
  r = 3.17 (8.1e-3), δ = 0.063 (4.3e-3), µ = -6.1e-3 (7.4e-4)

Model Seven, STOCHASTIC VOL.:t:
  φ = -0.196 (0.023), γ = 0.223 (0.109), ν = 60.70 (0.53),
  σ²η = 0.007 (0.0016)
5. Such standard errors refer to the time-series estimates, which are asymptotic: the square root of the spectral density estimate divided by the sample size.
6. Like the logarithm, or φ′ = ln((1 + φ)/(1 − φ)) when |φ| < 1, hence φ′ ∈ ℝ.
Fig. 4 presents the resulting chain diagrams and the histograms of
the posterior sample of the parameters of the ARMA(1,1) model. The shapes
of the posterior distributions of the φ0 and σ² parameters indicate asymmetry,
hence deviation from normality. Fig. 5 does the same for the GARCH-N model,
Fig. 6 for the GARCH-t model, Fig. 7 for the EGARCH-GED model, Fig. 8
for the GARCH-MixN model, Fig. 9 for the GARCH Type-IV Pearson model,
and Fig. 10 for the SV-t model.
4.2 Model Selection.
The processing time for 80,000 iterations, using the same computer as for
estimation, was 16 hours and 23 minutes for all seven models, a total of
47 parameters whose output uses over 32 MB of disk space. Note that, according
to Fig. 11, this is a conservative run length, and less than one fourth of the
run could be sufficient to achieve the same posterior distributions.
Table 3: Posterior Probabilities and Bayes Factors of the Seven Competing Models.

Model                   Distribution    Posterior Prob.  Bayes Factor
ARMA(1,1)               Normal          0.00001          1.000
ARMA(1,1)+GARCH(1,1)    Normal          0.00001          1.000
ARMA(1,1)+GARCH(1,1)    t               0.00003          3.000
ARMA(1,1)+EGARCH(1,1)   GED             0.00001          1.000
ARMA(1,1)+GARCH(1,1)    Mixt. Normal    0.00005          5.000
ARMA(1,1)+GARCH(1,1)    T.-IV Pearson   0.99990          199989.000
STOCHASTIC VOL          t               0.00001          1.000
The RJMCMC results are displayed in Table 3, which shows the posterior
probabilities and Bayes factors for the seven models. The last column gives
the relative weight against the worst models, ARMA(1,1), SV, and
ARMA+GARCH. According to these results it is very clear that model six,
ARMA(1,1)+GARCH(1,1) with the Type-IV Pearson distribution, outperforms
the rest of them, with posterior probability 0.9999.
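The posterior probabilities and Bayes factors of Table 3 follow directly from the visit counts of the RJMCMC chain. A sketch, assuming equal prior model probabilities (so that posterior odds equal Bayes factors) and a chain stored as a vector of model indicators:

```python
import numpy as np

def posterior_probs_and_bf(model_chain, n_models=7):
    """Posterior model probabilities as RJMCMC visit frequencies, and
    Bayes factors of each model against the least-visited model, the
    convention used for the last column of Table 3.  Equal prior model
    probabilities are assumed."""
    counts = np.bincount(np.asarray(model_chain), minlength=n_models)
    probs = counts / counts.sum()
    worst = probs[probs > 0].min()   # weakest model actually visited
    return probs, probs / worst
```
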
7. In this graph, as well as in the rest of this chapter, the following convention is used: T_1
refers to φ0, T_2 to φ1, T_3 to θ1, and T_4 to σ². Hence, for example, T_9 in Fig. 8
denotes δ².
Figure 4: Convergence Diagrams and Histograms of the Posterior Sample for ARMA(1,1) Model
Figure 11 shows the convergence behavior of the chain. That figure illustrates
the probability of each of the seven models across the sweeps, calculated
ergodically every 2,000 iterations. Note that the only model that is visited
very often in the Reversible Jump MCMC algorithm is model six.
The next exercise consists of rerunning the chain, this time with the six
less probable models, that is to say, all but model six. Analogous results are
obtained, the outperforming model this time being model five, ARMA(1,1)+
GARCH(1,1) with the Normal Mixture distribution. Neither figure nor table
is presented for this case.
Figure 5: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-Normal Model
Additionally, a chain of the same length was run with only the five least
probable models. This time the favored model is ARMA(1,1)+GARCH(1,1)
with the Student-t distribution.
Finally, an exercise with models one, two, four, and seven was run. Figure
12 and Table 4 show the results with only the four most improbable
models. In this case the transition probabilities, j(m, m′), were taken as
inversely proportional to the number of parameters in each model. This
Figure 6: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-t Model
exercise provides evidence that model four is a posteriori the fourth most
probable, after models six, five, and three, in that order.
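The jump probabilities described above can be laid out as a matrix. The sketch below is an assumption-laden illustration: the parameter counts passed in the test are only examples, and it is assumed the chain never proposes staying in the current model:

```python
import numpy as np

def jump_probs(n_params):
    """Between-model proposal probabilities j(m, m') taken inversely
    proportional to the number of parameters of each candidate model m',
    with no probability of proposing the current model m itself."""
    n_params = np.asarray(n_params, dtype=float)
    M = len(n_params)
    j = np.zeros((M, M))
    for m in range(M):
        w = 1.0 / n_params     # smaller models are proposed more often
        w[m] = 0.0             # never propose the current model
        j[m] = w / w.sum()     # normalize each row to sum to one
    return j
```

Each row m of the returned matrix is the proposal distribution over destination models when the chain currently sits in model m.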
Table 4: Posterior Probabilities and Bayes Factors of Four of the Seven
Competing Models.

Model                   Distribution    Posterior Prob.  Bayes Factor
ARMA(1,1)               Normal          0.0097           3.070
ARMA(1,1)+GARCH(1,1)    Normal          0.1796           56.754
ARMA(1,1)+EGARCH(1,1)   GED             0.8075           255.132
STOCHASTIC VOL          t               0.0032           1.000
Figure 7: Convergence Diagrams and Histograms of the Posterior Sample for EGARCH-GED Model
Figure 8: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-MixN Model
Figure 9: Convergence Diagrams and Histograms of the Posterior Sample for GARCH-T4P Model
Figure 10: Convergence Diagrams and Histograms of the Posterior Sample for SV-t Model
Figure 11: Convergence Behavior of the Seven Models
Figure 12: Convergence Behavior of Four of the Seven Models
5 Conclusions.
In this report the important issue of model estimation and model selection for
time-varying volatility models was addressed, using a Bayesian approach and
MCMC methods; this offers advantages over competing alternatives.
The two most important approaches to time-varying volatility were considered
(SV and GARCH). From the results, the GARCH model with Bera and
Premaratne's Type-IV Pearson distribution was widely favored; after it, the
Normal Mixture model is selected as the best. However, it is clear that the
assumptions of each of the favored models should still be checked.
It should be clear that all the model-choice results are conditional on
the seven models initially selected; if other models were included, or some
excluded, the results could change. Comments similar to those by George
(1999) on Hoeting et al. (1999) apply here. In practice there will always be
models left out, such as high-order GARCH models or SV with many alternative
distributions; unfortunately, the option of including many other models at
the same time is severely limited by computational resources.
The robustness of the RJMCMC to different priors could be tested, but
the much greater computational time required discourages this practice.
When the main purpose of the model selection exercise is forecasting,
Bayesian model averaging can easily be implemented once the RJMCMC
has been run and its results saved (see Hoeting et al., 1999, or Clyde,
1999, and the specific GARCH and EGARCH case in Vrontos et al., 2000).
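A point-forecast version of such model averaging is a one-liner once the model-indicator chain and per-model forecasts are saved. The sketch below is illustrative; the names and the use of posterior visit frequencies as weights are assumptions consistent with the RJMCMC output described above:

```python
import numpy as np

def bma_forecast(model_chain, forecasts):
    """Bayesian-model-averaged point forecast from saved RJMCMC output:
    each model's forecast is weighted by its posterior probability,
    estimated as the proportion of sweeps spent in that model."""
    counts = np.bincount(np.asarray(model_chain), minlength=len(forecasts))
    weights = counts / counts.sum()
    return float(np.dot(weights, forecasts))
```
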
A fruitful avenue for future research would be the parsimonious incorporation
of these features into multivariate models of stochastic volatility; see
Jaquier et al. (1999).
As for the specific case of the Colombian exchange rate, the effect of
exogenous shocks should be modeled with dummy variables, as Copeland
and Wang (1994) did; such a task could be the topic of forthcoming work to
be reported elsewhere.
Finally, and no less important, more work by specialists is required on
computational algorithms and the numerical behavior faced when working
with values extremely near zero, as described at the beginning of Section 4.1.
It seems wise to end with a quote by G. E. P. Box, referenced by Poirier
(1995, p. xi):
All models are wrong but some are useful.
References
Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. New York: Dover
Publications, Inc.
Baillie, R. T. and Bollerslev, T. (1989). The Message in Daily Exchange
Rates: A Conditional-Variance Tale. Journal of Business and Economic
Statistics, 7 (3), 297—305.
Ball, C. and Torous, W. (1983). A Simplified Jump Process for Common
Stock Returns. Journal of Financial and Quantitative Analysis, 18 (1), 53—
65.
Bera, A. and Premaratne, G. (2000). Modeling Asymmetry and Excess Kurtosis in Stock Return Data. Technical report, Department of Economics,
University of Illinois U. C., Urbana-Champaign, IL.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroscedasticity. Journal of Econometrics, 31, 307—327.
Bollerslev, T. (1987). A Conditionally Heteroskedastic Time Series Model
for Speculative Prices and Rates of Return. The Review of Economics and
Statistics, 69, 542—547.
Brooks, S. P. and Guidici, P. (1999). Convergence Assessment for Reversible
Jump MCMC Simulation. Bayesian Statistics 6. (Bernardo, J. M., Berger,
J. O., Dawid, A.P. and Smith, A.F.M., eds.). Oxford University Press, 733—
742.
Carlin, B. P. and Chib, S. (1995). Bayesian Model Choice via Markov Chain
Monte Carlo Methods. Journal of the Royal Statistical Society (Ser B),
57 (3), 473—484.
Copeland, L. S. and Wang, P. (1994). Estimating Daily Seasonality in Foreign Exchange Rate Changes. Journal of Forecasting, 13 (6), 519—528.
Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica,
50 (4), 987—1007.
Gamerman, D. (1997). Markov Chain Monte Carlo: Stochastic Simulation
for Bayesian Inference. New York: Chapman-Hall.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian
Data Analysis. New York: Chapman-Hall/CRC.
Green, P. J. (1995). Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika, 82 (4), 711—732.
Harrison, P. J. and Stevens, C. F. (1976). Bayesian Forecasting (with discussion). Journal of the Royal Statistical Society (Ser B), 38 (1), 205—247.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian
Model Averaging: A Tutorial (with comments). Statistical Science,
14 (4), 382—417.
Hsieh, D. A. (1989). Modeling Heteroscedasticity in Daily Foreign-Exchange
Rates. Journal of Business and Economic Statistics, 7 (3), 307—317.
Jaquier, E., Polson, N., and Rossi, P. (1999). Stochastic Volatility: Univariate and Multivariate Extensions. Technical report, Graduate School of
Business, The University of Chicago, Chicago, IL.
Mengersen, K. L., Robert, C. P., and Guihenneuc-Jouyaux, C. (1999). MCMC Convergence Diagnostics: A Review. Bayesian Statistics 6. (Bernardo, J.
M., Berger, J. O., Dawid, A.P. and Smith, A.F.M., eds.). Oxford University
Press, 415—440.
Nagahara, Y. (1999). The PDF and CDF of Pearson Type IV Distribution
and the ML Estimation of the Parameters. Statistics and Probability Letters,
43, 251—264.
Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A
New approach. Econometrica, 59 (2), 347—370.
Nelson, D. B. and Cao, C. Q. (1992). Inequality Constraints in the Univariate GARCH Model. Journal of Business and Economic Statistics, 10 (2),
229—235.
Poirier, D. J. (1995). Intermediate Statistics and Econometrics: A Comparative Approach. Cambridge, MA: The MIT Press.
Pole, A., West, M., and Harrison, J. (1994). Applied Bayesian Forecasting
and Time Series Analysis. New York: Chapman-Hall.
Poskitt, D. S. and Tremayne, A. R. (1983). On the Posterior Odds of Time
Series Models. Biometrika, 70, 157—162.
Vlaar, P. J. G. and Palm, F. C. (1993). The Message in Weekly Exchange
Rates in the European Monetary System: Mean Reversion, Conditional Heteroscedasticity, and Jumps. Journal of Business and Economic Statistics,
11 (3), 351—360.
Vrontos, I. D., Dellaportas, P., and Politis, D. N. (2000). Full Bayesian Inference for GARCH and EGARCH Models. Journal of Business and Economic
Statistics, 18 (2), 187—198.
West, M. and Harrison, P. J. (1997). Bayesian Forecasting and Dynamic
Models. Second Ed. New York: Springer-Verlag.