Bayesian Model Choice of Grouped t-copula

arXiv:1103.0606v1 [q-fin.CP] 3 Mar 2011

Xiaolin Luo^1* and Pavel V. Shevchenko^2

This version: 2 March 2011

1 CSIRO Mathematics, Informatics and Statistics, Australia; e-mail: Xiaolin.Luo@csiro.au
2 CSIRO Mathematics, Informatics and Statistics, Australia; e-mail: Pavel.Shevchenko@csiro.au
* Corresponding author

Abstract

One of the most popular copulas for modeling dependence structures is the t-copula. Recently the grouped t-copula was generalized to allow each group to have one member only, so that a priori grouping is not required and the dependence modeling is more flexible. This paper describes a Markov chain Monte Carlo (MCMC) method under the Bayesian inference framework for estimating and choosing t-copula models. Using historical data of foreign exchange (FX) rates as a case study, we found that Bayesian model choice criteria overwhelmingly favor the generalized t-copula. In addition, all the criteria agree on the second most likely model, and these inferences are consistent with classical likelihood ratio tests. Finally, we demonstrate the impact of model choice on the conditional Value-at-Risk for portfolios of six major FX rates.

Key words: grouped t-copula, dependence modeling, Bayesian model choice, Markov chain Monte Carlo, foreign exchange.

1 Introduction

Copula functions have become popular and flexible tools in modeling multivariate dependence among financial risk factors. In practice, one of the most popular copulas in modeling multivariate financial data is perhaps the t-copula implied by the multivariate t-distribution (hereafter referred to as the standard t-copula); see Embrechts et al (2001), Fang et al (2002), and Demarta and McNeil (2005). This is due to its simplicity in terms of simulation and calibration, combined with its ability to model the tail dependence often observed in financial returns data. Papers by Mashal et al (2003) and Breymann et al (2003) have demonstrated that the empirical fit of the standard t-copula is superior in most cases when compared to the Gaussian copula.

However, the standard t-copula is often criticized due to the restriction of having only one parameter for the degrees of freedom (dof), which may limit its ability to model tail dependence in the multivariate case. To overcome this problem, Daul et al (2003) proposed the grouped t-copula, where risks are grouped into classes and each class has its own standard t-copula with a specific dof. This, however, requires an a priori choice of classes, and it is not always obvious how the risk factors should be divided into sub-groups. An adequate choice of grouping configurations requires substantial additional effort if there is no natural grouping, for example, by sector or asset class.

Recently, the grouped t-copula was generalized to a new t-copula with multiple dof parameters (hereafter referred to as the generalized t-copula); see Luo and Shevchenko (2010) and Venter et al (2007). This copula can be viewed as a grouped t-copula with each group having only one member. It has the advantages of a grouped t-copula with flexible modeling of multivariate dependence, yet at the same time it overcomes the difficulties with an a priori choice of groups. For convenience, denote the new copula as the t̃_ν-copula, where ν = (ν_1, ..., ν_n) denotes the vector of dof parameters and n is the number of dimensions.
Luo and Shevchenko (2010) demonstrated that some characteristics of this new copula in the bivariate case are quite different from those of the standard t-copula. For example, the copula is not exchangeable if ν_1 ≠ ν_2, and the tail dependence implied by the t̃_ν-copula depends on both dof parameters. The difference between the t̃_ν- and standard t-copulas, in terms of impact on Value-at-Risk (VaR) and conditional Value-at-Risk (CVaR) of a portfolio, can be significant, as demonstrated by simulation experiments for the bivariate case; this difference can even be much larger than the difference between the Gaussian copula and the standard t-copula. In examples of maximum likelihood fitting to USD/AUD and USD/JPY daily return data, the standard t-copula was statistically rejected by a formal likelihood ratio test in favour of the t̃_ν-copula (i.e. the dof parameters in the t̃_ν-copula were statistically different).

This paper presents a Bayesian model selection study of t-copula models in the multivariate case. We demonstrate how to perform Bayesian inference using Markov chain Monte Carlo (MCMC) simulations to estimate parameters and make decisions on model choice. From a Bayesian point of view, model parameters are random variables whose distribution can be inferred by combining the prior density with the likelihood of the observed data. The complete posterior distribution of the parameters resulting from Bayesian MCMC allows further analysis such as model selection and parameter uncertainty quantification. Specifically, we solve a variable selection problem in the same vein as discussed in Cairns (2000). Increasingly, Bayesian MCMC finds new applications in quantitative financial risk modeling; recent examples are found in Peters et al (2009, 2010) for insurance, Shevchenko (2010) for operational risk and Luo and Shevchenko (2010) for credit risk.

As a case study, we consider modeling dependence among six major foreign exchange (FX) rates (AUD, CAD, CHF, EUR, GBP and JPY, all against USD) using t-copulas. Following common practice (see e.g. McNeil et al 2005), we use the GARCH(1,1) model to standardize the log-returns of the exchange rates marginally. The GARCH-filtered residuals of the six major FX rates are then modeled by a t-copula. In this study we consider altogether 33 competing t-copula models: the standard t-copula, 31 grouped t-copulas and the generalized t-copula (i.e. the t̃_ν-copula). The 31 grouped t-copulas are the complete set of all possible two-group configurations of the six FX rates (see Table 1). We present procedures and results of MCMC simulation for t-copula models under the Bayesian framework. We also demonstrate, using Bayesian model inference on actual data, that the generalized t-copula (t̃_ν-copula) is convincingly the model of choice for modeling the dependence between the six FX majors among the 33 considered t-copula models. Even compared with the best grouped t-copula chosen from the 31 possible two-group configurations, the t̃_ν-copula is overwhelmingly favoured by the Bayes factors obtained from the MCMC posterior distribution. We demonstrate that joint calibration of the grouped t-copula can be done very efficiently by applying MCMC. Using model parameters estimated from MCMC, we also demonstrate the impact of model choice on the CVaR of two portfolios of the six FX majors.

The organisation of this paper is as follows. Section 2 introduces the various t-copula models and notation.
Then it describes the GARCH model filtering for the six FX majors and the calibration of the t-copula models using the maximum likelihood method. Section 3 discusses the Bayesian inference formulation, the MCMC simulation algorithm, the reciprocal importance sampling estimator and the deviance information criterion for model selection. Direct computation of the posterior model probability is also discussed in Section 3. Section 4 presents MCMC results and the corresponding Bayesian model selection, in comparison with the traditional maximum likelihood results and likelihood ratio tests. Examples of portfolio CVaR calculation using selected models and calibrated parameters are provided in Section 5, demonstrating the impact of model choice on risk quantification. Concluding remarks are given in the final section.

2 Model, data and maximum likelihood calibration

It is well known from Sklar's theorem (see Sklar 1959 and Joe 1997) that any joint distribution function F with continuous (strictly increasing) margins F_1, F_2, ..., F_n has a unique copula

$$C(\mathbf{u}) = F\big(F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_n^{-1}(u_n)\big). \qquad (1)$$

The t-copulas are most easily described and understood by a stochastic representation, as defined below.

2.1 t-copula models

We introduce notation and definitions as follows:

• Z = (Z_1, ..., Z_n)' is a random vector from the multivariate normal distribution Φ_Σ(z) with zero mean vector, unit variances and correlation matrix Σ.
• U = (U_1, ..., U_n)' is defined on the [0,1]^n domain.
• V is a random variable from the uniform (0,1) distribution, independent of Z.
• W = G_ν^{-1}(V), where G_ν(·) is the distribution function of √(ν/S) with S distributed from the chi-square distribution with ν dof; note that W and Z are independent.
• t_ν(·) is the standard univariate t-distribution and t_ν^{-1}(·) is its inverse.

Then we have the following representations.

Standard t-copula. The random vector

$$\mathbf{X} = W \times \mathbf{Z} \qquad (2)$$

is distributed from a multivariate t-distribution, and the random vector

$$\mathbf{U} = \big(t_\nu(X_1), \ldots, t_\nu(X_n)\big)' \qquad (3)$$

is distributed from the standard t-copula.

Grouped t-copula. Partition {1, 2, ..., n} into m non-overlapping sub-groups of sizes n_1, ..., n_m. Then the copula of the random vector

$$\mathbf{X} = \big(W_1 Z_1, \ldots, W_1 Z_{n_1}, W_2 Z_{n_1+1}, \ldots, W_2 Z_{n_1+n_2}, \ldots, W_m Z_n\big)', \qquad (4)$$

where W_k = G_{ν_k}^{-1}(V), k = 1, ..., m, is the grouped t-copula. That is,

$$\mathbf{U} = \big(t_{\nu_1}(X_1), \ldots, t_{\nu_1}(X_{n_1}), t_{\nu_2}(X_{n_1+1}), \ldots, t_{\nu_2}(X_{n_1+n_2}), \ldots, t_{\nu_m}(X_n)\big)' \qquad (5)$$

is a random vector from the grouped t-copula. Here, the copula of each group is a standard t-copula with its own dof parameter (i.e. ν_k is the dof parameter of the standard t-copula for the k-th group).

Generalized t-copula with multiple dof (t̃_ν-copula). Consider the grouped t-copula where each group has a single member. In this case the copula of the random vector

$$\mathbf{X} = \big(W_1 Z_1, W_2 Z_2, \ldots, W_n Z_n\big)' \qquad (6)$$

is said to be a t-copula with multiple dof parameters ν = (ν_1, ..., ν_n), which we denote as the t̃_ν-copula. That is,

$$\mathbf{U} = \big(t_{\nu_1}(X_1), t_{\nu_2}(X_2), \ldots, t_{\nu_n}(X_n)\big)' \qquad (7)$$

is a random vector distributed according to the t̃_ν-copula. Note that all W_i are perfectly dependent, since they are driven by the same uniform random variable V.

Remark: Given the above stochastic representation, simulation of the t̃_ν-copula is straightforward. In the case of the standard t-copula ν_1 = · · · = ν_n = ν, and in the case of the grouped t-copula the corresponding subsets share the same dof parameter; that is, the standard t-copula and the grouped t-copula are special cases of the t̃_ν-copula.
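As a rough illustration of how direct this simulation is, the following Python sketch (ours, not the authors' code; the function name and interface are hypothetical) draws from the t̃_ν-copula using the representation (6)-(7), with a single shared uniform V driving all W_i:

```python
import numpy as np
from scipy import stats

def simulate_t_nu_copula(nu, Sigma, n_samples, seed=None):
    """Draw samples from the t~_nu-copula via the representation (6)-(7):
    U_i = t_{nu_i}(W_i * Z_i), with all W_i driven by one shared uniform V."""
    rng = np.random.default_rng(seed)
    n = len(nu)
    Z = rng.multivariate_normal(np.zeros(n), Sigma, size=n_samples)
    V = rng.uniform(size=n_samples)
    U = np.empty((n_samples, n))
    for i, nu_i in enumerate(nu):
        # W_i = G_{nu_i}^{-1}(V): since G_nu(w) = P(sqrt(nu/S) <= w)
        #                                       = 1 - F_{chi2_nu}(nu / w^2),
        # the inverse is w = sqrt(nu / F_{chi2_nu}^{-1}(1 - v)).
        W_i = np.sqrt(nu_i / stats.chi2.ppf(1.0 - V, df=nu_i))
        U[:, i] = stats.t.cdf(W_i * Z[:, i], df=nu_i)
    return U

# Equal dof values recover the standard t-copula; a grouped t-copula simply
# repeats dof values across group members, e.g. nu = [5, 5, 5, 12, 12, 12].
```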
From the stochastic representation (6)-(7), it is easy to show that the t̃_ν-copula distribution has the following explicit integral expression

$$C_\nu^\Sigma(\mathbf{u}) = \int_0^1 \Phi_\Sigma\big(z_1(u_1,s), \ldots, z_n(u_n,s)\big)\,ds \qquad (8)$$

and its density is

$$c_\nu^\Sigma(\mathbf{u}) = \frac{\partial^n C_\nu^\Sigma(\mathbf{u})}{\partial u_1 \cdots \partial u_n} = \int_0^1 \varphi_\Sigma\big(z_1(u_1,s), \ldots, z_n(u_n,s)\big) \prod_{i=1}^{n} [w_i(s)]^{-1}\,ds \Big/ \prod_{i=1}^{n} f_{\nu_i}(x_i). \qquad (9)$$

Here:

• z_i(u_i, s) = t_{ν_i}^{-1}(u_i)/w_i(s), i = 1, 2, ..., n;
• w_i(s) = G_{ν_i}^{-1}(s);
• $\varphi_\Sigma(z_1,\ldots,z_n) = \exp(-\tfrac{1}{2}\mathbf{z}'\Sigma^{-1}\mathbf{z})/[(2\pi)^{n/2}(\det\Sigma)^{1/2}]$ is the multivariate normal density;
• x_i = t_{ν_i}^{-1}(u_i), i = 1, 2, ..., n;
• $f_\nu(x) = (1 + x^2/\nu)^{-(\nu+1)/2}\,\Gamma((\nu+1)/2)/[\Gamma(\nu/2)\sqrt{\nu\pi}]$ is the univariate t-density, where Γ(·) is the gamma function.

The multivariate density (9) involves a one-dimensional integration, which makes the density calculation computationally more demanding than in the case of the standard t-copula, but still practical using available fast and accurate algorithms for one-dimensional integration. If all the dof parameters are equal, i.e. ν_1 = · · · = ν_n = ν, then it is easy to show that the copula defined by (8) becomes the standard t-copula; see Luo and Shevchenko (2010) for a proof.

2.2 FX data and GARCH filtering

As a case study we consider modeling dependence between six FX rates using the t-copulas introduced in the previous section. The daily foreign exchange rate data for the six FX majors in the period January 2004 to April 2008 (a total of 1092 trading days) were downloaded from the Federal Reserve Statistical Release (http://www.federalreserve.gov/releases). These daily data have been certified by the Federal Reserve Bank of New York as the noon buying rates in New York City. For our purpose, we study the six major currencies (AUD, CAD, CHF, EUR, GBP and JPY). Rates were converted to USD per currency unit, if not already in this convention. This unified convention allows a portfolio of currencies to be conveniently valued in terms of a single currency, the USD.

Following common practice (see McNeil et al 2005), we use the GARCH(1,1) model to standardize the log-returns of the exchange rates marginally. The GARCH(1,1) model calculates the current squared volatility σ_t² as

$$\sigma_t^2 = \omega + \alpha(x_{t-1} - \mu)^2 + \beta\sigma_{t-1}^2, \qquad \omega \ge 0,\ \alpha, \beta \ge 0,\ \alpha + \beta < 1, \qquad (10)$$

where x_{t-1} denotes the log-return of an exchange rate on date t − 1. The GARCH parameters ω, α and β are estimated using the maximum likelihood method. The log-return is modeled as

$$x_t = \mu + \sigma_t\,\varepsilon^{(t)}, \qquad (11)$$

where μ is the average historical return (drift) for the asset and ε^(t) is a sequence of iid random variables referred to as the residuals. The GARCH-filtered residuals of the FX rates were then used to fit the t-copula models. Before the fitting, the residuals were transformed to the (0,1) domain marginally using the empirical distributions of the residuals; a sketch of this filtering step is given below.
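The following is a minimal sketch of the filtering step (our illustration, assuming the GARCH parameters μ, ω, α and β have already been estimated by maximum likelihood as described above); the recursion follows (10)-(11), and the rank transform maps residuals into (0,1):

```python
import numpy as np

def garch11_residuals(x, mu, omega, alpha, beta):
    """Standardized residuals eps_t = (x_t - mu) / sigma_t from the
    GARCH(1,1) recursion (10); sigma_0^2 is initialized at the
    unconditional variance omega / (1 - alpha - beta)."""
    sigma2 = np.empty_like(x, dtype=float)
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(x)):
        sigma2[t] = omega + alpha * (x[t - 1] - mu) ** 2 + beta * sigma2[t - 1]
    return (x - mu) / np.sqrt(sigma2)

def empirical_uniform(eps):
    """Map residuals to the (0,1) domain via the empirical distribution;
    ranks are scaled by K + 1 so no value hits 0 or 1 exactly."""
    K = len(eps)
    ranks = np.argsort(np.argsort(eps)) + 1
    return ranks / (K + 1.0)
```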
2.3 Configuration of grouped t-copula

With six dimensions, the grouped t-copula can have a total of 201 possible group combinations (not counting the standard t-copula and the t̃_ν-copula). In this study we concentrate on the class of configurations with two groups only, which is the next level of complexity compared with the standard t-copula. This reduces the number of possible grouped t-copula models to 31. These 31 grouped t-copula models are:

• 10 models from the complete subset of (3,3) configurations (two groups with three members in each group);
• 15 models from the complete subset of (2,4) configurations (two members in the first group and four members in the second group);
• 6 models from the complete subset of (1,5) configurations (one member in the first group and five members in the second group).

Note that a (1,5) combination is the same as a (5,1) combination, and a (2,4) combination is the same as a (4,2) combination. So altogether we have 33 competing models to choose from: the standard t-copula, the 31 two-group grouped t-copulas and the generalized t-copula (t̃_ν-copula). Table 1 lists all 33 models for modeling the six FX majors, their grouping configurations and parameter notations. In column 2 of Table 1, each pair of parentheses defines a sub-group. The generalized t-copula has six sub-groups with a single member each, while the standard t-copula has one group containing all six members. Note that for the grouped t-copula, exchanging the two sub-groups makes no difference – the two configurations have exactly the same combinations of members, so no new models emerge from this exchange.

2.4 Maximum likelihood calibration

Consider a random vector of data Y = (Y_1, ..., Y_n)'. To estimate a parametric copula from observations y^(j), j = 1, ..., K, where K is the number of observations, the first step is to project the data onto the [0,1]^n domain to obtain u^(j), using estimated marginal distributions. In our study the margins are modeled using empirical distributions, but they can also be modeled using parametric distributions or a combination of these methods, e.g. an empirical distribution for the body and a generalized Pareto distribution for the tail of a marginal distribution (McNeil et al 2005, page 233). Given the pseudo-sample u^(j) constructed from the original data, the copula parameters can be estimated using, for example, the maximum likelihood method or MCMC.

Accurate maximum likelihood estimates (MLEs) of the copula parameters should be obtained by fitting all unknown parameters jointly. In practice, to simplify the calibration procedure, the correlation matrix coefficients for t-copulas are often calculated pair-wise from Kendall's tau rank correlation coefficients τ(Y_i, Y_j) via the formula (McNeil et al 2005)

$$\Sigma_{ij} = \sin\left(\tfrac{1}{2}\pi\,\tau(Y_i, Y_j)\right). \qquad (12)$$

Then, in a second stage, the dof parameters ν_1, ..., ν_n are estimated. Strictly speaking, (12) is valid for the bivariate case only; in practice, however, it works well in the multivariate case too. It was noted in Daul et al (2003) that formula (12) is still highly accurate even when it is applied to find the correlation coefficients between risks from different groups. McNeil et al (2005) observed that the parameters estimated using Kendall's tau are identical to those obtained by joint estimation to two significant digits, confirming the good accuracy of the Kendall's tau simplification. It was also observed in Luo and Shevchenko (2010) that the difference in estimated parameters between the Kendall's tau approximation and the joint estimation was mostly in the third significant digit and was smaller than the standard errors of the MLEs. In addition, a study of small sample properties in Luo and Shevchenko (2010) showed that the bias introduced by the Kendall's tau approximation is very small even for a sample size of 50; in the present work the data sample size is over 1000. The small bias of the Kendall's tau approximation is certainly insignificant when compared with the often large difference between dof parameters of different t-copula models. In other words, using (12) for the correlation coefficients should cause little material difference in the present model choice study, where the difference is expected to come from the different group configurations.
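A minimal sketch of this first stage (ours, using scipy's Kendall tau): pairwise rank correlations are converted to the correlation matrix via (12).

```python
import numpy as np
from scipy import stats

def correlation_from_kendall(u):
    """Estimate Sigma pair-wise via Sigma_ij = sin(pi * tau_ij / 2),
    formula (12), from a pseudo-sample u of shape (K, n)."""
    n = u.shape[1]
    Sigma = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = stats.kendalltau(u[:, i], u[:, j])
            Sigma[i, j] = Sigma[j, i] = np.sin(0.5 * np.pi * tau)
    return Sigma
```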
Because the Kendall's tau approximation is applied pair-wise, we have an identical correlation matrix for all the copula models considered. This simplification is computationally very significant for the grouped t-copula, for which calibration using density (9) is computationally demanding. By using the Kendall's tau approximation, the number of unknown parameters reduces from M = n(n+1)/2 to M = n for the generalized t-copula. With the six dimensions considered in this study, this amounts to a reduction from 21 parameters to only 6. For the grouped t-copula with two groups, the reduction is from 17 to 2, even more dramatic. A substantial saving of computing time is achieved in both cases.

Remark: An accurate calibration of the grouped t-copula requires joint estimation of the dof parameters. Sometimes in practice an approximate approach is taken where a grouped t-copula is calibrated marginally, i.e. each sub-group is calibrated separately as a standard t-copula. This approximation is not always justified; also, it cannot be applied to the generalized t-copula. For a proper and fair comparison between the grouped t-copula and the generalized t-copula, in this study we perform joint calibration for both copulas. When the grouped t-copula is calibrated jointly, its density is given by the integral formula (9), the same as for the generalized t-copula, so a proper joint calibration of the grouped t-copula is also computationally demanding compared with the calibration of a standard t-copula.

Let ν be the vector of n dof parameters ν_1, ..., ν_n (the grouped t-copula is treated as a special case of the t̃_ν-copula). Denote the density of the t̃_ν-copula evaluated at u^(j) as c_ν(u^(j)), which can be obtained using (9). Then the MLEs for ν are calculated by maximizing the log-likelihood function

$$
\begin{aligned}
\ell_U(\nu) &= \ln \prod_{j=1}^{K} c_\nu^\Sigma(\mathbf{u}^{(j)})\\
&= \sum_{j=1}^{K} \ln\left[\int_0^1 \varphi_\Sigma\big(z_1^{(j)}(s), \ldots, z_n^{(j)}(s)\big) \prod_{i=1}^{n} [w_i(s)]^{-1}\,ds\right]\\
&\quad + \sum_{j=1}^{K}\sum_{i=1}^{n} \tfrac{1}{2}(\nu_i + 1)\ln\!\big[1 + (x_i^{(j)})^2/\nu_i\big]\\
&\quad + K\sum_{i=1}^{n}\Big\{\tfrac{1}{2}\ln(\nu_i\pi) + \ln\big[\Gamma(\tfrac{1}{2}\nu_i)/\Gamma(\tfrac{1}{2}(\nu_i + 1))\big]\Big\}, \qquad (13)
\end{aligned}
$$

where $x_i^{(j)} = t_{\nu_i}^{-1}(u_i^{(j)})$ and $z_i^{(j)}(s) = x_i^{(j)}/w_i(s)$, i = 1, ..., n, j = 1, ..., K. In this work we use the double precision IMSL function DQDAGS, a globally adaptive integration scheme documented in Piessens et al (1983), for the integration in (9). For the maximization of (13) the double precision IMSL function DBCPOL is used, which employs a direct-search simplex algorithm that does not require calculation of gradients.
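As an illustration of how (9) can be evaluated without IMSL, here is a sketch (ours) using scipy's adaptive quadrature in place of DQDAGS; summing this log-density over the pseudo-sample then gives the log-likelihood (13):

```python
import numpy as np
from scipy import stats, integrate

def log_copula_density(u, nu, Sigma):
    """Log-density (9) of the t~_nu-copula at a point u in (0,1)^n,
    with the one-dimensional integral over s evaluated by adaptive
    quadrature (scipy's quad standing in for the IMSL routine DQDAGS)."""
    nu = np.asarray(nu, dtype=float)
    x = stats.t.ppf(u, df=nu)                 # x_i = t_{nu_i}^{-1}(u_i)
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    n = len(u)

    def integrand(s):
        w = np.sqrt(nu / stats.chi2.ppf(1.0 - s, df=nu))   # w_i(s) = G_{nu_i}^{-1}(s)
        z = x / w                                          # z_i(u_i, s)
        log_phi = -0.5 * z @ Sigma_inv @ z \
                  - 0.5 * (n * np.log(2 * np.pi) + logdet)
        return np.exp(log_phi - np.sum(np.log(w)))         # phi_Sigma * prod w_i^{-1}

    integral, _ = integrate.quad(integrand, 0.0, 1.0)
    # Divide by the product of univariate t-densities, as in (9).
    return np.log(integral) - np.sum(stats.t.logpdf(x, df=nu))
```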
3 Bayesian inference and MCMC

In this section we describe the Bayesian approach and the MCMC procedure used to estimate the t-copulas, and the model selection criteria used to choose among the t-copula models. Under the Bayesian approach, the model parameters θ (in our case θ is just the dof parameter vector ν) are treated as random variables. Given a prior distribution π(θ) and a conditional density of the data given θ (i.e. the likelihood) π(y|θ), the joint density of the data Y and the model parameters θ is π(y, θ) = π(y|θ)π(θ). Having observed data Y, the distribution of θ conditional on Y, the posterior distribution, is determined by Bayes' theorem:

$$\pi(\theta|\mathbf{y}) = \frac{\pi(\mathbf{y}|\theta)\,\pi(\theta)}{\int \pi(\mathbf{y}|\theta)\,\pi(\theta)\,d\theta} \propto \pi(\mathbf{y}|\theta)\,\pi(\theta). \qquad (14)$$

The posterior can then be used for predictive inference. There is a large number of useful texts on Bayesian inference; for a good introduction, see Berger (1985) and Robert (2001).

3.1 MCMC under the Bayesian framework

The explicit evaluation of the normalization constant in (14) is often difficult, especially in high dimensions. The complexity in our case is evident from the log-likelihood expression (13). The MCMC method provides a highly efficient alternative to traditional techniques by sampling from the posterior indirectly and performing the integration implicitly. MCMC is especially suited to a Bayesian inference framework. It facilitates the quantification of parameter uncertainty and model risk. It also allows a unified estimation procedure that estimates both parameters and latent variables; in the latter case a special algorithm called data augmentation can be employed, see Tanner and Wong (1987). The Bayesian estimates of particular interest from MCMC are the maximum a posteriori (MAP) estimate and the minimum mean square error (MMSE) estimate, defined as

$$\mathrm{MAP}: \quad \hat{\theta}_{MAP} = \arg\max_{\theta}\,\pi(\theta|\mathbf{y}), \qquad (15)$$
$$\mathrm{MMSE}: \quad \hat{\theta}_{MMSE} = \mathrm{E}[\theta|\mathbf{y}]. \qquad (16)$$

The MAP and MMSE estimates are the posterior mode and mean respectively. If the prior π(θ) is constant and the parameter range includes the MLE, then the MAP estimate coincides with the MLE.

3.2 Metropolis-Hastings algorithm

In our case study we use the Metropolis-Hastings algorithm, first described by Hastings (1970) as a generalization of the Metropolis algorithm (Metropolis et al 1953). Denote the state vector at step t as θ^(t); we wish to update it to a new state θ^(t+1). We generate a candidate θ* from a proposal density q(θ|θ^(t)), and accept this point as the new state of the chain with probability

$$\alpha(\theta^{(t)}, \theta^{*}) = \min\left\{1,\ \frac{\pi(\theta^{*})\,q(\theta^{(t)}|\theta^{*})}{\pi(\theta^{(t)})\,q(\theta^{*}|\theta^{(t)})}\right\}. \qquad (17)$$

If the proposal is accepted, the new state is θ^(t+1) = θ*; otherwise θ^(t+1) = θ^(t). The single-component Metropolis-Hastings algorithm is often more efficient in practice. Here the state vector is partitioned into components θ^(t) = (θ_1^(t), θ_2^(t), ..., θ_n^(t)), which are updated one by one or block by block. This was the framework for MCMC originally proposed by Metropolis et al (1953), and it is adopted in this study. The likelihood is computed as π(y|θ) = exp(ℓ_y(θ)), where ℓ_y(θ) is the log-likelihood given by (13). In the computer implementation, we take advantage of the fact that only one component is updated at each sub-step of the single-component Metropolis-Hastings algorithm by saving and re-using any values not affected by the current update. For example, each evaluation of (13) calls for the inverse of the t-distribution for all the data points and all the dof values; saving and re-using these inverse values reduces the calculation by a factor of six for the six-dimensional MCMC computation. A minimal sketch of the sampler is given below.
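The following compact sketch of the single-component sampler is our illustration, not the authors' implementation; it uses the truncated Gaussian proposal (25) introduced in Section 4, and it includes the Hastings ratio, since truncation makes the proposal asymmetric near the support bounds:

```python
import numpy as np
from scipy import stats

def metropolis_hastings(log_post, theta0, prop_sd, n_steps,
                        lo=1.0, hi=100.0, seed=None):
    """Single-component random-walk Metropolis-Hastings, acceptance (17),
    with Gaussian proposals truncated to (lo, hi) as in (25).  log_post is
    the log-posterior; prop_sd holds one pre-tuned proposal sd per component."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_steps, len(theta)))
    for t in range(n_steps):
        for k in range(len(theta)):
            a = (lo - theta[k]) / prop_sd[k]
            b = (hi - theta[k]) / prop_sd[k]
            prop = stats.truncnorm.rvs(a, b, loc=theta[k], scale=prop_sd[k],
                                       random_state=rng)
            cand = theta.copy()
            cand[k] = prop
            lp_cand = log_post(cand)
            # Hastings correction: the truncation normalizers in (25) differ
            # between the forward and reverse moves near the bounds.
            log_q_fwd = stats.truncnorm.logpdf(prop, a, b,
                                               loc=theta[k], scale=prop_sd[k])
            a2 = (lo - prop) / prop_sd[k]
            b2 = (hi - prop) / prop_sd[k]
            log_q_back = stats.truncnorm.logpdf(theta[k], a2, b2,
                                                loc=prop, scale=prop_sd[k])
            if np.log(rng.uniform()) < lp_cand - lp + log_q_back - log_q_fwd:
                theta, lp = cand, lp_cand
        chain[t] = theta
    return chain
```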
3.3 Bayesian model selection using MCMC

Powerful MCMC methods such as the Gibbs sampler (Gelfand and Smith 1990) and the Metropolis-Hastings (MH) algorithm (Hastings 1970) enable direct estimation of the posterior and predictive quantities of interest, but do not lend themselves readily to estimation of the model probabilities. While one of the most common classical techniques is the Bayesian Information Criterion (BIC) (Schwarz 1978), many new approaches have been suggested in the literature. The most widely used methods include the harmonic mean estimator of Newton and Raftery (1994), importance sampling (Fruhwirth-Schnatter 1995), the reciprocal importance sampling estimator (Gelfand and Dey 1994), and bridge sampling (Meng and Wong 1996, Fruhwirth-Schnatter 2004). A comprehensive review of some of these methods applied to Bayesian model selection can be found in Kass and Raftery (1995).

Consider model M with parameter vector θ. The model likelihood given data y can be found by integrating out the parameter θ:

$$\pi(\mathbf{y}|M) = \int \pi(\mathbf{y}|\theta, M)\,\pi(\theta|M)\,d\theta, \qquad (18)$$

where π(θ|M) is the prior density of θ in model M. Given a set of H competing models M = (M_1, M_2, ..., M_H), the Bayesian alternative to traditional hypothesis testing is to evaluate and compare the posterior probability ratios between the models. For model M_l (1 ≤ l ≤ H), assuming some prior knowledge about the model probability π(M_l), we can compute the posterior probabilities for all models using the model likelihoods:

$$\pi(M_l|\mathbf{y}) = \frac{\pi(\mathbf{y}|M_l)\,\pi(M_l)}{\sum_{h=1}^{H} \pi(\mathbf{y}|M_h)\,\pi(M_h)}. \qquad (19)$$

Consider two competing models M_1 and M_2, parameterized by θ_[1] and θ_[2] respectively. The choice between the two models can be based on the posterior model probability ratio

$$\frac{\pi(M_1|\mathbf{y})}{\pi(M_2|\mathbf{y})} = \frac{\pi(\mathbf{y}|M_1)\,\pi(M_1)}{\pi(\mathbf{y}|M_2)\,\pi(M_2)} = B_{12}\,\frac{\pi(M_1)}{\pi(M_2)}, \qquad (20)$$

where B_12 = π(y|M_1)/π(y|M_2) is the Bayes factor, the ratio of the posterior odds of model M_1 to that of model M_2. As shown by Lavine and Schervish (1999), an accurate interpretation of the Bayes factor is that B_12 captures the change of the odds in favour of model M_1 as we move from prior to posterior. Jeffreys (1961) recommended a scale of evidence for interpreting Bayes factors, which was later modified by Wasserman (1997); a Bayes factor B_12 > 10 is considered strong evidence in favour of M_1. For a detailed review of Bayes factors, see Kass and Raftery (1995).

Typically, the integral (18) required by the Bayes factor is not analytically tractable, and sampling-based methods must be used to estimate the model likelihoods. In the current study we use three methods for model selection:

• direct estimation of the Bayes factor in (20) using the reciprocal importance sampling estimator presented in Section 3.3.1;
• the deviance information criterion (Section 3.3.2);
• direct computation of the posterior model probabilities using the formula presented in Section 3.3.3.

3.3.1 Reciprocal importance sampling estimator

Given samples θ^(t), t = 1, ..., N from the posterior distribution obtained through MCMC, Gelfand and Dey (1994) proposed the reciprocal importance sampling estimator (RISE) to approximate the model likelihood:

$$\pi(\mathbf{y}|M) \approx \left[\frac{1}{N}\sum_{t=1}^{N} \frac{h(\theta^{(t)}|M)}{\pi(\mathbf{y}|\theta^{(t)}, M)\,\pi(\theta^{(t)}|M)}\right]^{-1}, \qquad (21)$$

where h plays the role of an importance sampling density roughly matching the posterior. Gelfand and Dey (1994) suggested a multivariate normal or t density with mean and covariance fitted to the posterior sample. The RISE estimator can be regarded as a generalization of the harmonic mean estimator suggested by Newton and Raftery (1994): if h = 1, then (21) becomes the harmonic mean estimator. Other estimators include bridge sampling, proposed by Meng and Wong (1996), and Chib's candidate's estimator (Chib 1995).
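A sketch of the RISE computation (ours), with a multivariate normal importance density h fitted to the posterior sample as Gelfand and Dey suggest; log_lik and log_prior are the log-likelihood and log-prior evaluated at each posterior draw:

```python
import numpy as np
from scipy import stats

def rise_log_model_likelihood(samples, log_lik, log_prior):
    """Reciprocal importance sampling estimate (21) of log pi(y|M).
    samples: (N, d) posterior draws; log_lik, log_prior: length-N arrays."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    log_h = stats.multivariate_normal.logpdf(samples, mean=mean, cov=cov)
    log_terms = log_h - log_lik - log_prior        # log of h / (likelihood * prior)
    m = log_terms.max()                            # log-sum-exp for stability
    log_avg = m + np.log(np.mean(np.exp(log_terms - m)))
    return -log_avg                                # reciprocal of the average
```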
In a recent comparison study by Miazhynskaia and Dorffner (2006), these estimators were employed as competing methods for Bayesian model selection on GARCH-type models, along with the reversible jump MCMC. It was demonstrated that the RISE estimator (with either a normal or a t importance sampling density), the bridge sampling method and Chib's algorithm gave statistically equal performance in model selection, and that their performance more or less matched that of the much more involved reversible jump MCMC.

3.3.2 Deviance information criterion

The deviance information criterion (DIC) is a generalization of the Bayesian information criterion (Schwarz 1978, Spiegelhalter et al 2002). For a given model M (for simplicity we drop the notation M in the formulas below) the deviance is defined as

$$D(\theta) = -2\log(\pi(\mathbf{y}|\theta)) + C, \qquad (22)$$

where the constant C is common to all nested models. Then DIC is calculated as

$$\mathrm{DIC} = 2\,\mathrm{E}_\theta[D(\theta)] - D(\mathrm{E}_\theta[\theta]) = \mathrm{E}_\theta[D(\theta)] + \big(\mathrm{E}_\theta[D(\theta)] - D(\mathrm{E}_\theta[\theta])\big), \qquad (23)$$

where E_θ[·] denotes expectation with respect to the posterior of θ. The expectation E_θ[D(θ)] is a measure of how well the model fits the data: the smaller its value, the better the fit. The difference E_θ[D(θ)] − D(E_θ[θ]) can be regarded as the effective number of parameters: the larger this term, the easier it is for the model to fit the data. So the DIC criterion favours the model with a better fit but at the same time penalizes the model with more parameters. Under this setting the model with the smallest DIC value is the preferred model.

3.3.3 Posterior model probabilities

A popular approach to model choice is based on reversible jump MCMC (Green 1995). Here we adopt an alternative proposed recently by Peters et al (2009) based on the work of Congdon (2006). In this procedure the posterior model probabilities π(M_l|y) are estimated using the Markov chain in each model as

$$\pi(M_l|\mathbf{y}) = \frac{1}{N}\sum_{t=1}^{N} \frac{L_{\mathbf{y}}(M_l, \theta_{[l]}^{(t)})}{\sum_{h=1}^{H} L_{\mathbf{y}}(M_h, \theta_{[h]}^{(t)})}, \qquad (24)$$

where θ_[l]^(t) is the MCMC posterior sample at Markov chain step t for model M_l, L_y(M_l, θ_[l]^(t)) is the likelihood of y for model M_l with parameter vector θ_[l]^(t), and N is the total number of MCMC steps after the burn-in period. In (24) it is assumed that the priors π(θ_[l]|M_l) and π(M_l) are constant.
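Both criteria reduce to simple averages over the chains; a sketch (ours), under the flat-prior assumption stated above:

```python
import numpy as np

def dic(log_lik_chain, log_lik_at_mean):
    """DIC (23) with D(theta) = -2 log-likelihood (the constant C in (22)
    drops out of model comparisons): DIC = 2 E[D(theta)] - D(E[theta])."""
    d_bar = -2.0 * np.mean(log_lik_chain)
    return 2.0 * d_bar - (-2.0 * log_lik_at_mean)

def posterior_model_probs(log_lik_by_model):
    """Congdon-type estimate (24): log_lik_by_model has shape (H, N) with
    log L_y(M_h, theta_[h]^(t)); likelihoods are normalized across models
    at each step t, then averaged over the N steps."""
    ll = np.asarray(log_lik_by_model)
    m = ll.max(axis=0, keepdims=True)              # stabilize the exponentials
    w = np.exp(ll - m)
    return (w / w.sum(axis=0, keepdims=True)).mean(axis=1)
```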
4 MCMC simulation results and analysis

Prior distributions. In all MCMC simulation runs, we assume a uniform prior for every model parameter. The only subjective judgement we bring to the prior is the support of the dof parameters. Denote the k-th dof parameter of the h-th t-copula model as ν_k^(h) (see Table 1). We impose common lower and upper bounds for all dof components, specifically 1 = ν_min < ν_k^(h) < ν_max = 100. In our case study the support (1, 100) for the dof parameter of the t-distribution should be sufficiently large to allow the posterior to be implied mainly by the observed data. To make sure the range is sufficiently large, we also tested a wider range of (1, 200) and found no material difference in the results.

MCMC procedure. The starting value of the Markov chain for each component is set to a uniform random number drawn independently from the support (ν_min, ν_max). In the single-component Metropolis-Hastings algorithm, we adopt a truncated Gaussian distribution as the random walk proposal density q(·|·) in (17). For each component, the mean of the Gaussian density is set to the current state and the variance is pre-tuned so that the acceptance rate is close to the optimal level. For d-dimensional target distributions with iid components, the asymptotic optimal acceptance rate has been reported to be 0.234; see Gelman et al (1997) and Roberts and Rosenthal (2001). In pre-tuning the variances for all components we set 0.234 as the target acceptance rate. In addition, the Gaussian density is truncated below ν_min and above ν_max to ensure each proposal is drawn within the support of the parameters. Specifically, for the k-th component at chain step t, the proposal density is

$$q_k(\theta^{*}|\theta_k^{(t)}) = \frac{f_N(\theta^{*};\,\theta_k^{(t)}, \sigma_k)}{F_N(\nu_{\max};\,\theta_k^{(t)}, \sigma_k) - F_N(\nu_{\min};\,\theta_k^{(t)}, \sigma_k)}, \qquad (25)$$

where f_N(·; μ, σ) and F_N(·; μ, σ) are the Gaussian density and distribution functions respectively, with mean μ and standard deviation σ.

An independent Markov chain was run for each of the 33 models listed in Table 1. Each run consists of three stages:

• Tuning – adjust the proposal standard deviation to achieve the optimal acceptance rate for each component.
• "Burn-in" – samples from this period are discarded.
• Posterior sampling – here the Markov chain is considered to have converged to the stationary target distribution and the samples are used for model estimates.

Unless stated otherwise, we use a "burn-in" period of length N_b = 20,000. We then let the chain run for an additional N = 100,000 iterations to generate the posterior samples. Each step contains a complete update of all components.

MCMC convergence. Figure 1 shows the first 30,000 samples, taken after the burn-in period, of the dof component ν_1^(0) for model M_0 (i.e. the generalized t-copula). Since M_0 has the highest parameter dimension among all the candidate models, in general it requires the longest chain to converge to a stationary distribution. The figure shows that after the burn-in period the samples mix well over the support of the posterior distribution. In addition to inspecting the sample paths, we also monitor the autocorrelation of the samples. Figure 2 shows the autocorrelations over multiple lags computed from the posterior samples of component ν_1^(0) of model M_0. A useful value to compute from these autocorrelations for each component is the autocorrelation time, defined as

$$\tau_k = 1 + 2\sum_{g=1}^{\infty}\rho_k(g), \qquad (26)$$

where ρ_k(g) is the autocorrelation at lag g for component θ_k. The autocorrelation time is sometimes used to compute an "effective sample size" by dividing the number of samples by τ_k; the standard errors for the parameters can then be based on the effective sample size to compensate for the autocorrelation (see Ripley 1987, Neal 1993). In practice it is necessary to cut off the sum in (26) at g = g_k^max, where the autocorrelations appear to have fallen to near zero, because including higher lags adds too much noise (for some interesting discussion of this issue, see Kass et al 1998). As shown in Figure 2, in these well-mixed MCMC samples the autocorrelation falls to near zero quickly and stays near zero at larger lags. For this study we have chosen g_k^max for each component such that the autocorrelation at lag g_k^max has fallen below 0.01. That is, the autocorrelation time τ_k is estimated by

$$\hat{\tau}_k \approx 1 + 2\sum_{g=1}^{g_k^{\max}}\rho_k(g), \qquad g_k^{\max} = \min\{g : \rho_k(g) < 0.01\}. \qquad (27)$$

The τ̂_k values estimated from the MCMC output for model M_0 are shown in Table 2, along with the cut-off lag numbers g_k^max. MCMC convergence characteristics for other components and other models are very similar to those shown here for model M_0. A sketch of this estimator is given below.
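A sketch (ours) of the truncated estimator (27), with the empirical autocorrelation computed by direct summation:

```python
import numpy as np

def autocorrelation_time(x, cutoff=0.01):
    """Estimate tau_k via (27): sum autocorrelations up to the first lag g
    with rho(g) < cutoff.  Returns (tau_hat, g_max)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.dot(x, x) / n                           # lag-0 autocovariance
    tau, g = 1.0, 0
    for g in range(1, n):
        rho = np.dot(x[:-g], x[g:]) / (n * c0)      # acf estimate at lag g
        if rho < cutoff:
            break
        tau += 2.0 * rho
    return tau, g
```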
4.1 Bayesian estimates of parameters

This section presents results for the posterior mean (MMSE), the posterior mode (MAP) and the numerical error due to the finite number of MCMC iterations.

4.1.1 Posterior mean and its numerical error

Table 3 shows the estimated means from the MCMC posterior samples for all 33 models. The standard errors (numerical error due to the finite number of MCMC iterations) are shown in parentheses, and the log-likelihoods corresponding to the estimated means are in the last column. Since the samples from MCMC are typically serially correlated, the usual formula for the standard error of a sample mean (i.e. the standard deviation divided by √N) would significantly under-estimate the error. Here, we use batch sampling for the standard error estimate of the MCMC posterior mean; see Gilks et al (1996). Consider an MCMC posterior sample y_1, y_2, ..., y_N of length N = Q × L, where L is sufficiently large that the batch means

$$\bar{y}_q = \frac{1}{L}\sum_{t=(q-1)L+1}^{qL} y^{(t)}, \qquad q = 1, \ldots, Q \qquad (28)$$

can be considered approximately independent. Then ȳ = (ȳ_1 + · · · + ȳ_Q)/Q, and the standard error of the posterior sample mean ȳ can be approximated by

$$\sqrt{\mathrm{Var}(\bar{y})} \approx \frac{1}{\sqrt{Q}}\sqrt{\frac{1}{Q-1}\sum_{q=1}^{Q}(\bar{y}_q - \bar{y})^2}. \qquad (29)$$

Note that Q is the number of quasi-independent batches and L = N/Q is the size of each batch.
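A sketch (ours) of the batch-means standard error (28)-(29); the number of batches Q is a tuning choice not specified in the text, so the default below is purely illustrative:

```python
import numpy as np

def batch_mean_se(y, Q=50):
    """Standard error (29) of the MCMC posterior mean: split the chain into
    Q batches of length L = N // Q and treat the batch means (28) as
    approximately independent.  Q = 50 is an illustrative choice."""
    y = np.asarray(y, dtype=float)
    L = len(y) // Q
    batch_means = y[:Q * L].reshape(Q, L).mean(axis=1)
    return batch_means.std(ddof=1) / np.sqrt(Q)
```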
4.1.2 Posterior mode and likelihood ratio tests

The posterior modes taken from the MCMC samples for all 33 models are shown in Table 4, along with the corresponding log-likelihood values. Using the likelihood values corresponding to the posterior modes in Table 4, a classical likelihood ratio test can be performed to compare model likelihoods. Consider the null hypothesis that the observed FX daily return data are from the distribution described by the grouped t-copula model M_1, and the alternative hypothesis that the data are distributed according to the generalized t-copula model M_0. The likelihood ratio for the two models is simply Λ = L_1/L_0, where L_1 and L_0 are the maximum likelihood values (i.e. the likelihood values at the mode) for M_1 and M_0 respectively. The test statistic −2 log(Λ) is asymptotically χ² distributed with degrees of freedom equal to the difference in the number of dof parameters between M_0 and M_1, which is 4 in this case. We can perform likelihood ratio tests on all other grouped t-copula models (M_h, h = 2, ..., 31) against the same alternative hypothesis of model M_0, the generalized t-copula; for the standard t-copula the difference in the number of parameters is 5. The test statistics and the associated p-values (χ² significance) are given in Table 4. Clearly, according to the p-values, all the null hypotheses should be rejected, and the alternative hypothesis, the generalized t-copula model M_0, is statistically justified.

Excluding model M_0, among the other 32 t-copula models (M_h, h = 1, ..., 32) the one achieving the highest likelihood is M_27, which is one of the six (1,5) two-group configurations. The p-value of this best grouped t-copula model against M_0 is 0.0045, which is still very small, suggesting a rather strong rejection of the grouped t-copula (including the standard t-copula) in favour of the t̃_ν-copula model M_0. Achieving the highest likelihood among the ten (3,3) configurations is model M_4. It is interesting to notice that both M_27 and M_4 have the three European currencies (CHF, EUR, GBP) in one group (see Table 1), perhaps reflecting a natural geopolitical and economic grouping.

4.2 Bayesian model choice

While the likelihood ratio test relies on a single point estimate, Bayesian model choice makes decisions based on the entire posterior distribution. As discussed in Section 3.3, three Bayesian inference criteria were used to choose among the 33 t-copula models: RISE given by (21); DIC given by (23); and the posterior model probabilities (24). The RISE calculation involves fitting the MCMC posterior samples with a multivariate normal or t-distribution and taking the expectation of the reciprocal likelihood; the DIC calculation requires taking expectations of the likelihood and of the parameters.

Column 2 of Table 5 shows the RISE-based Bayes factor B_0h = R_0/R_h, h = 1, ..., 32, where R_h is the RISE value for model M_h. That is, B_0h (1 ≤ h ≤ 32) is a measure of the strength of the evidence that the generalized t-copula (model M_0) is the Bayesian choice. The very large Bayes factors (B_0h > e^11 > 5.9 × 10^4) shown in Table 5 overwhelmingly support the generalized t-copula, confirming the likelihood ratio tests discussed previously. Excluding model M_0, these Bayes factors also point to M_27 as the most favoured model among the grouped t-copulas (M_h, 1 ≤ h ≤ 32), again confirming the likelihood ratio tests. The larger the Bayes factor B_0h, the stronger the case against model M_h (1 ≤ h ≤ 32).

The DIC values for the 33 models are shown in column 3 of Table 5. Since only relative DIC values matter, the common constant in (22) was set so that the DIC value for model M_0 is zero. As shown in Table 5, the DIC values for all the other models are significantly positive relative to that of M_0; thus under the DIC criterion the model of choice is clearly M_0, i.e. the generalized t-copula. In addition, like the RISE-based Bayes factors and the likelihood ratio tests, the DIC values also pick M_27 as the most likely grouped t-copula model after M_0. The larger the DIC value, the stronger the case against the model. It is interesting to observe that the magnitude of the DIC value is close to that of the logarithm of the Bayes factor based on the reciprocal importance sampling estimator, when both are evaluated relative to the same model M_0.

As shown in column 4 of Table 5, the results for the posterior model probabilities (24) also agree with the RISE and DIC results: model M_0 has a very high probability of 88%, and model M_27 has the second highest probability. If we exclude model M_0, then model M_27 has a high probability of 68%. In summary, all three Bayesian choice criteria point to the same model M_0 as the best choice, followed by model M_27, and these choices are in agreement with the classical likelihood ratio tests shown in Table 4.

5 Conditional Value-at-Risk

Consider a portfolio of the six major currencies. Denote the exchange rates (USD per currency unit) for these currencies at time t by S_i^(t), i = 1, ..., 6, and assume we hold λ_i units of the i-th currency. The portfolio value at time t is then V^(t) = Σ_{i=1}^{6} λ_i S_i^(t). The log-return of the i-th currency at time t + 1 is given by x_i^(t+1) = ln S_i^(t+1) − ln S_i^(t). The portfolio loss over one time step is then

$$-\delta V^{(t+1)} = V^{(t)} - V^{(t+1)} = \sum_{i=1}^{6} \lambda_i S_i^{(t)}\Big(1 - \exp\big(x_i^{(t+1)}\big)\Big) = V^{(t)}\sum_{i=1}^{6} w_i\Big(1 - \exp\big(x_i^{(t+1)}\big)\Big), \qquad (30)$$

where w_i = λ_i S_i^(t)/V^(t) is the proportion of the portfolio value in currency i at time t, i.e. the dollar weight of the i-th currency. We wish to simulate the distribution of the relative portfolio loss

$$Z = -\delta V^{(t+1)}/V^{(t)} = \sum_{i=1}^{6} w_i\Big(1 - \exp\big(x_i^{(t+1)}\big)\Big) \approx \sum_{i=1}^{6}\big(-w_i\,x_i^{(t+1)}\big).$$

In the present study we model the dependence of the log-returns x_i^(t+1) by one of the t-copula models described in the previous sections. Recall that the dof parameters and their posterior distributions have already been obtained by Bayesian MCMC. To focus on the impact of the copula models, we use the standard normal distribution for all six marginals. We take CVaR as our risk measure. Assume that a random variable Z has continuous density f(·) and distribution F(·). Given a threshold quantile level α, the CVaR above F^{-1}(α) is defined as

$$\mathrm{CVaR}_\alpha[Z] = \mathrm{E}\big[Z\,\big|\,Z \ge F^{-1}(\alpha)\big] = \frac{1}{1-\alpha}\int_{F^{-1}(\alpha)}^{\infty} x f(x)\,dx, \qquad (31)$$

which is the average of the losses exceeding F^{-1}(α). A sketch of the Monte Carlo CVaR calculation is given below.
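A sketch (ours) of the Monte Carlo CVaR computation, reusing the simulate_t_nu_copula sketch from Section 2.1 and the linearized loss Z ≈ Σ_i (−w_i x_i) with standard normal marginals:

```python
import numpy as np
from scipy import stats

def cvar_monte_carlo(nu, Sigma, weights, alpha=0.99, n_sim=10**6, seed=None):
    """Estimate CVaR_alpha (31) of Z = -sum_i w_i x_i, where the dependence
    of x is given by the t~_nu-copula and the marginals are standard normal."""
    U = simulate_t_nu_copula(nu, Sigma, n_sim, seed)   # sketch from Section 2.1
    x = stats.norm.ppf(U)                              # standard normal marginals
    losses = -(x * np.asarray(weights)).sum(axis=1)
    var_alpha = np.quantile(losses, alpha)             # F^{-1}(alpha)
    return losses[losses >= var_alpha].mean()          # average loss beyond VaR
```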
To demonstrate the model impact on risk quantification, we compare the CVaR of the two most likely models, M_0 and M_27, the best and the second best of all 33 candidates. CVaR is calculated numerically using 10^7 Monte Carlo simulations with the t-copula model parameters given in Table 4. Table 6 shows CVaR_0.99 predicted by models M_0 and M_27 for two portfolios (defined by the weights in Table 6). Note that both portfolios contain negative weights (selling the currency) and the weights in each portfolio add to 1.0. As shown in Table 6, model M_27 underestimates the 0.99 CVaR by 16% for the first portfolio, while this underestimate reverses to a slight overestimate for the second portfolio, assuming the correct estimates are those of model M_0. The second portfolio differs from the first only slightly – swapping the long/short positions of EUR and CHF in the first portfolio yields the second portfolio. The two portfolios are deliberately chosen to demonstrate that the model impact on risk quantification can go in either direction – it may be an overestimation or an underestimation, depending on the portfolio. Thus it is important to choose the most statistically suitable model, such as by means of Bayesian model inference. Table 7 compares the 0.99 CVaR predictions of model M_0 with those of model M_4, the most likely model from the (3,3) configurations, for the same two portfolios as in Table 6. Here again the 0.99 CVaR for the first portfolio is underestimated by the incorrect model, and for the second portfolio it is overestimated.

6 Conclusion

This paper describes a Bayesian model choice methodology for t-copula models. As an illustration, altogether 33 six-dimensional t-copula models were considered: the generalized t-copula; the standard t-copula; and 31 grouped t-copula models from the complete subset of (3,3), (2,4) and (1,5) configurations. MCMC simulations under a Bayesian inference framework were performed to obtain the posterior distribution of the dof parameters for all 33 t-copula models. Using historical data of foreign exchange rates as a case study, we found that Bayesian model choice based on the RISE, the DIC and the posterior model probabilities overwhelmingly favors the generalized t-copula model M_0. In addition, all three Bayesian choice criteria point to the same second most likely model, M_27. These Bayesian choices are also in agreement with classical likelihood ratio tests. The impact of model choice on the CVaR for two portfolios of the six FX majors was observed to be significant. For a comprehensive modeling of multivariate dependence in finance or insurance, there are other issues in data analysis that should be addressed carefully, such as time-dependent correlation parameters and validation; these are not considered in the present study.

7 Acknowledgement

We would like to thank Gareth Peters and John Donnelly for helpful discussions and comments on the manuscript.

References

[1] Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.
[2] Breymann, W., Dias, A. and Embrechts, P. (2003) Dependence structures for multivariate high-frequency data in finance. Quantitative Finance 3, 1–14.
[3] Cairns, A. (2000) A discussion of parameter and model uncertainty in insurance. Insurance: Mathematics and Economics 27, 313–330.
[4] Chib, S. (1995) Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90(432), 1313–1321.
[5] Congdon, P. (2006) Bayesian Statistical Modelling. John Wiley & Sons, Ltd, 2nd edn.
[6] Daul, S., De Giorgi, E., Lindskog, F. and McNeil, A. (2003) The grouped t-copula with an application to credit risk. Risk 16, 73–76.
[7] Demarta, S. and McNeil, A. (2005) The t copula and related copulas. International Statistical Review 73(1), 111–129.
[8] Embrechts, P., McNeil, A. and Straumann, D. (2001) Correlation and dependence in risk management: properties and pitfalls. In: Dempster, M. and Moffatt, H. (Eds.), Risk Management: Value at Risk and Beyond, pp. 176–223, Cambridge University Press.
[9] Fang, H., Fang, K. and Kotz, S. (2002) The meta-elliptical distributions with given marginals. Journal of Multivariate Analysis 82, 1–16.
[10] Fruhwirth-Schnatter, S. (1995) Bayesian model discrimination and Bayes factors for linear Gaussian state space models. Journal of the Royal Statistical Society, Series B 57, 237–246.
[11] Fruhwirth-Schnatter, S. (2004) Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. The Econometrics Journal 7, 143–167.
[12] Gelfand, A. and Dey, D. (1994) Bayesian model choice: asymptotic and exact calculations. Journal of the Royal Statistical Society, Series B 56, 501–514.
[13] Gelfand, A. and Smith, A. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
[14] Gelman, A., Gilks, W. R. and Roberts, G. O. (1997) Weak convergence and optimal scaling of random walk Metropolis algorithms. Annals of Applied Probability 7, 110–120.
[15] Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (1996) Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
[16] Green, P. (1995) Reversible jump MCMC computation and Bayesian model determination. Biometrika 82, 711–732.
[17] Hastings, W. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
[18] Jeffreys, H. (1961) Theory of Probability. Oxford University Press, Oxford, 3rd edn.
[19] Joe, H. (1997) Multivariate Models and Dependence Concepts. Chapman & Hall, London.
[20] Kass, R. and Raftery, A. (1995) Bayes factors. Journal of the American Statistical Association 90, 773–792.
[21] Kass, R. E., Carlin, B. P., Gelman, A. and Neal, R. M. (1998) Markov chain Monte Carlo in practice: a roundtable discussion. The American Statistician 52(2), 93–100.
[22] Lavine, M. and Schervish, M. J. (1999) Bayes factors: what they are and what they are not. The American Statistician 53(2), 119–122.
[23] Luo, X. and Shevchenko, P. V. (2010) The t copula with multiple parameters of degrees of freedom: bivariate characteristics and application to risk management. Quantitative Finance 10(9), 1039–1054.
[24] Luo, X. and Shevchenko, P. V. (2010) LGD credit risk model: estimation of capital with parameter uncertainty using MCMC. Preprint arXiv:1011.2827v3, available from http://arxiv.org.
[25] Mashal, R., Naldi, M. and Zeevi, A. (2003) On the dependence of equity and asset returns. Risk 16, 83–87.
[26] McNeil, A. J., Frey, R. and Embrechts, P. (2005) Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton.
[27] Meng, X. and Wong, W. (1996) Simulating ratios of normalizing constants via a simple identity. Statistica Sinica 6, 831–860.
[28] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1091.
[29] Miazhynskaia, T. and Dorffner, G. (2006) A comparison of Bayesian model selection based on MCMC with an application to GARCH-type models. Statistical Papers 47, 525–549.
[30] Neal, R. M. (1993) Probabilistic inference using Markov chain Monte Carlo methods. Technical report, Department of Computer Science, University of Toronto.
[31] Newton, M. and Raftery, A. (1994) Approximate Bayesian inference by the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B 56, 1–48.
[32] Peters, G. W., Shevchenko, P. V. and Wüthrich, M. V. (2009) Model uncertainty in claims reserving within Tweedie's compound Poisson models. ASTIN Bulletin 39(1), 1–33.
[33] Peters, G. W., Wüthrich, M. V. and Shevchenko, P. V. (2010) Chain ladder method: Bayesian bootstrap versus classical bootstrap. Insurance: Mathematics and Economics 47(1), 36–51.
[34] Piessens, R., De Doncker-Kapenga, E., Überhuber, C. W. and Kahaner, D. K. (1983) QUADPACK – A Subroutine Package for Automatic Integration. Springer-Verlag.
[35] Ripley, B. D. (1987) Stochastic Simulation. Wiley, New York.
[36] Robert, C. P. (2001) The Bayesian Choice. Springer-Verlag, New York.
[37] Roberts, G. O. and Rosenthal, J. S. (2001) Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science 16, 351–367.
[38] Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6, 461–464.
[39] Shevchenko, P. V. (2010) Implementing loss distribution approach for operational risk. Applied Stochastic Models in Business and Industry 26(3), 277–307.
[40] Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229–231.
[41] Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and Van Der Linde, A. (2002) Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64(4), 583–639.
[42] Tanner, M. A. and Wong, W. H. (1987) The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82(398), 528–540.
[43] Venter, G., Barnett, J., Kreps, R. and Major, J. (2007) Multivariate copulas for financial modeling. Variance 1(1), 103–119.
[44] Wasserman, L. (1997) Bayesian model selection and model averaging. Technical report, Statistics Department, Carnegie Mellon University.

[Figure 1: Markov chain path of parameter ν_1^(0) of model M_0 (first 30,000 post-burn-in samples).]

[Figure 2: Autocorrelation of Markov chain samples for dof parameter ν_1^(0) of model M_0 (lags 1 to 25).]

Table 1: Group configurations and parameters for the 33 t-copula models.
Model  Group configuration                               Parameters
M0     (AUD), (CAD), (CHF), (EUR), (GBP), (JPY)          ν_1^(0), ν_2^(0), ν_3^(0), ν_4^(0), ν_5^(0), ν_6^(0)
M1     (AUD, CAD, CHF), (EUR, GBP, JPY)                  ν_1^(1), ν_2^(1)
M2     (AUD, CAD, EUR), (CHF, GBP, JPY)                  ν_1^(2), ν_2^(2)
M3     (AUD, CAD, GBP), (CHF, EUR, JPY)                  ν_1^(3), ν_2^(3)
M4     (AUD, CAD, JPY), (CHF, EUR, GBP)                  ν_1^(4), ν_2^(4)
M5     (AUD, CHF, EUR), (CAD, GBP, JPY)                  ν_1^(5), ν_2^(5)
M6     (AUD, CHF, GBP), (CAD, EUR, JPY)                  ν_1^(6), ν_2^(6)
M7     (AUD, CHF, JPY), (CAD, EUR, GBP)                  ν_1^(7), ν_2^(7)
M8     (AUD, EUR, GBP), (CAD, CHF, JPY)                  ν_1^(8), ν_2^(8)
M9     (AUD, EUR, JPY), (CAD, CHF, GBP)                  ν_1^(9), ν_2^(9)
M10    (AUD, GBP, JPY), (CAD, CHF, EUR)                  ν_1^(10), ν_2^(10)
M11    (GBP, JPY), (AUD, CAD, CHF, EUR)                  ν_1^(11), ν_2^(11)
M12    (AUD, CAD), (CHF, EUR, GBP, JPY)                  ν_1^(12), ν_2^(12)
M13    (AUD, CHF), (CAD, EUR, GBP, JPY)                  ν_1^(13), ν_2^(13)
M14    (AUD, EUR), (CAD, CHF, GBP, JPY)                  ν_1^(14), ν_2^(14)
M15    (AUD, GBP), (CAD, CHF, EUR, JPY)                  ν_1^(15), ν_2^(15)
M16    (AUD, JPY), (CAD, CHF, EUR, GBP)                  ν_1^(16), ν_2^(16)
M17    (CAD, CHF), (AUD, EUR, GBP, JPY)                  ν_1^(17), ν_2^(17)
M18    (CAD, EUR), (AUD, CHF, GBP, JPY)                  ν_1^(18), ν_2^(18)
M19    (CAD, GBP), (AUD, CHF, EUR, JPY)                  ν_1^(19), ν_2^(19)
M20    (CAD, JPY), (AUD, CHF, EUR, GBP)                  ν_1^(20), ν_2^(20)
M21    (CHF, EUR), (AUD, CAD, GBP, JPY)                  ν_1^(21), ν_2^(21)
M22    (CHF, GBP), (AUD, CAD, EUR, JPY)                  ν_1^(22), ν_2^(22)
M23    (CHF, JPY), (AUD, CAD, GBP, EUR)                  ν_1^(23), ν_2^(23)
M24    (EUR, GBP), (AUD, CAD, CHF, JPY)                  ν_1^(24), ν_2^(24)
M25    (EUR, JPY), (AUD, CAD, GBP, CHF)                  ν_1^(25), ν_2^(25)
M26    (AUD), (CAD, CHF, EUR, GBP, JPY)                  ν_1^(26), ν_2^(26)
M27    (CAD), (AUD, CHF, EUR, GBP, JPY)                  ν_1^(27), ν_2^(27)
M28    (CHF), (CAD, AUD, EUR, GBP, JPY)                  ν_1^(28), ν_2^(28)
M29    (EUR), (CAD, CHF, AUD, GBP, JPY)                  ν_1^(29), ν_2^(29)
M30    (GBP), (CAD, CHF, EUR, AUD, JPY)                  ν_1^(30), ν_2^(30)
M31    (JPY), (CAD, CHF, EUR, GBP, AUD)                  ν_1^(31), ν_2^(31)
M32    (AUD, CAD, CHF, EUR, GBP, JPY)                    ν_1^(32)

Table 2: Autocorrelation time estimates and corresponding cut-off lag numbers for model M_0.

Parameter   ν_1^(0)   ν_2^(0)   ν_3^(0)   ν_4^(0)   ν_5^(0)   ν_6^(0)
τ̂_k          8.79      2.23      23.5      23.1      8.41      8.69
g_k^max      23        14        65        57        30        34

Table 3: MCMC output values of posterior mean, standard error and log-likelihood.
Model  Posterior mean (standard error)                                              Log-likelihood
M0     ν_1^(0) = 15.4 (0.79), ν_2^(0) = 67.3 (1.3), ν_3^(0) = 8.76 (0.34),
       ν_4^(0) = 6.38 (0.20), ν_5^(0) = 11.6 (0.46), ν_6^(0) = 18.3 (1.1)           2353.1
M1     ν_1^(1) = 15.1 (0.29),  ν_2^(1) = 9.37 (0.14)                                2342.8
M2     ν_1^(2) = 10.2 (0.15),  ν_2^(2) = 13.4 (0.21)                                2338.6
M3     ν_1^(3) = 18.2 (0.28),  ν_2^(3) = 8.56 (0.09)                                2341.9
M4     ν_1^(4) = 24.4 (0.82),  ν_2^(4) = 7.78 (0.16)                                2343.8
M5     ν_1^(5) = 8.51 (0.13),  ν_2^(5) = 18.7 (0.49)                                2341.9
M6     ν_1^(6) = 13.4 (0.26),  ν_2^(6) = 10.2 (0.16)                                2338.6
M7     ν_1^(7) = 13.6 (0.27),  ν_2^(7) = 10.3 (0.17)                                2338.6
M8     ν_1^(8) = 9.24 (0.13),  ν_2^(8) = 15.7 (0.35)                                2343.2
M9     ν_1^(9) = 8.76 (0.13),  ν_2^(9) = 14.0 (0.32)                                2343.2
M10    ν_1^(10) = 12.6 (0.23), ν_2^(10) = 11.1 (0.17)                               2336.7
M11    ν_1^(11) = 27.9 (4.87), ν_2^(11) = 8.6 (0.09)                                2336.7
M12    ν_1^(12) = 14.1 (0.44), ν_2^(12) = 10.6 (0.14)                               2336.7
M13    ν_1^(13) = 8.56 (0.11), ν_2^(13) = 13.8 (0.25)                               2343.6
M14    ν_1^(14) = 13.2 (0.74), ν_2^(14) = 11.1 (0.14)                               2336.7
M15    ν_1^(15) = 13.3 (0.94), ν_2^(15) = 11.2 (0.17)                               2336.6
M16    ν_1^(16) = 17.4 (0.85), ν_2^(16) = 9.78 (0.13)                               2343.0
M17    ν_1^(17) = 9.78 (0.14), ν_2^(17) = 12.9 (0.2)                                2338.9
M18    ν_1^(18) = 24.9 (4.4),  ν_2^(18) = 8.96 (0.09)                               2342.5
M19    ν_1^(19) = 32.5 (7.99), ν_2^(19) = 8.76 (0.08)                               2343.4
M20    ν_1^(20) = 7.46 (0.11), ν_2^(20) = 16.8 (0.59)                               2343.0
M21    ν_1^(21) = 14.2 (0.43), ν_2^(21) = 10.5 (0.14)                               2338.6
M22    ν_1^(22) = 14.3 (0.49), ν_2^(22) = 10.5 (0.15)                               2338.4
M23    ν_1^(23) = 8.64 (0.11), ν_2^(23) = 14.2 (0.32)                               2343.7
M24    ν_1^(24) = 8.66 (0.09), ν_2^(24) = 13.6 (0.23)                               2343.4
M25    ν_1^(25) = 12.8 (0.50), ν_2^(25) = 11.2 (0.13)                               2336.7
M26    ν_1^(26) = 15.2 (1.63), ν_2^(26) = 11.3 (0.31)                               2336.4
M27    ν_1^(27) = 64.7 (3.01), ν_2^(27) = 9.26 (0.24)                               2346.7
M28    ν_1^(28) = 16.0 (1.15), ν_2^(28) = 11.0 (0.38)                               2338.4
M29    ν_1^(29) = 7.94 (0.32), ν_2^(29) = 12.9 (0.48)                               2344.4
M30    ν_1^(30) = 15.1 (1.53), ν_2^(30) = 11.3 (0.32)                               2336.5
M31    ν_1^(31) = 11.4 (0.32), ν_2^(31) = 1.81 (0.17)                               2336.4
M32    ν_1^(32) = 11.4 (0.14)                                                       2336.8

Table 4: MCMC output values of posterior mode, corresponding log-likelihood, likelihood ratio statistic and p-value comparing M_h with M_0.
Model  Posterior mode ν^(h)                      Log-likelihood   −2 log(Λ)   p-value
M0     (11.5, 82.4, 7.92, 5.81, 10.3, 14.3)      2354.3           0           N/A
M1     (14.0, 8.96)                              2342.9           22.8        0.00014
M2     (9.75, 12.6)                              2338.7           31.2        <0.00001
M3     (16.6, 8.22)                              2342.1           24.4        <0.0001
M4     (21.0, 7.49)                              2344.1           20.4        0.00042
M5     (8.17, 16.9)                              2342.1           24.4        <0.0001
M6     (12.6, 9.73)                              2338.8           31.0        <0.00001
M7     (12.8, 9.79)                              2338.7           31.2        <0.00001
M8     (8.84, 14.4)                              2343.4           21.8        0.00022
M9     (8.76, 14.0)                              2343.3           22.0        0.00020
M10    (11.8, 10.5)                              2336.9           34.8        <0.000001
M11    (22.6, 8.68)                              2343.0           22.6        0.00015
M12    (13.1, 10.2)                              2338.6           31.5        <0.00001
M13    (8.23, 13.1)                              2343.7           21.3        0.00028
M14    (11.9, 10.7)                              2336.9           34.8        <0.000001
M15    (11.7, 10.8)                              2336.8           34.9        <0.000001
M16    (15.7, 9.39)                              2343.2           22.2        0.00018
M17    (9.31, 12.7)                              2339.0           30.6        <0.00001
M18    (20.9, 8.66)                              2342.7           23.1        0.00012
M19    (25.2, 8.48)                              2343.8           21.1        0.00030
M20    (7.14, 15.7)                              2343.1           22.3        0.00017
M21    (13.2, 10.1)                              2338.5           31.5        <0.00001
M22    (13.3, 10.1)                              2338.5           31.5        <0.00001
M23    (8.27, 13.4)                              2343.8           21.0        0.00031
M24    (8.31, 12.9)                              2343.5           21.7        0.00023
M25    (11.7, 10.8)                              2336.8           34.9        <0.000001
M26    (11.8, 10.9)                              2336.8           34.9        <0.000001
M27    (68.3, 9.03)                              2346.8           15.1        0.0045
M28    (14.0, 10.5)                              2338.6           31.4        <0.00001
M29    (7.55, 12.2)                              2344.5           19.6        0.00060
M30    (12.0, 10.9)                              2336.8           34.9        <0.000001
M31    (11.4, 11.0)                              2336.8           35.0        <0.000001
M32    (11.1)                                    2336.9           34.9        <0.00001

Table 5: Bayes factors B_0h, DIC and model probabilities of all candidates.

Model  log(B_0h)   DIC     Model prob. (%)   Model prob. (%) excl. M0
M0     0           0       88.5              N/A
M1     16.7        15.8    0.15              1.28
M2     20.1        24.0    <0.1              <0.1
M3     16.6        17.5    <0.1              0.54
M4     16.4        13.0    0.45              3.89
M5     18.7        17.4    <0.1              0.55
M6     19.3        24.1    <0.1              <0.1
M7     21.5        24.3    <0.1              <0.1
M8     14.5        15.0    0.12              1.07
M9     18.9        15.1    0.22              1.89
M10    21.9        27.8    <0.1              <0.1
M11    18.0        15.6    0.15              1.26
M12    24.9        24.7    <0.1              <0.1
M13    14.9        14.3    0.31              2.68
M14    23.2        27.9    <0.1              <0.1
M15    25.5        28.0    <0.1              <0.1
M16    14.2        15.3    0.20              1.73
M17    21.7        23.7    <0.1              <0.1
M18    15.3        16.1    0.12              1.03
M19    15.1        14.1    0.29              2.52
M20    15.1        15.3    0.19              1.64
M21    21.5        24.3    <0.1              <0.1
M22    20.6        24.6    <0.1              <0.1
M23    17.1        14.1    0.36              3.10
M24    15.4        14.7    0.25              2.22
M25    22.6        27.9    <0.1              <0.1
M26    22.2        28.0    <0.1              <0.1
M27    11.0        6.73    7.8               68.3
M28    21.0        24.6    <0.1              <0.1
M29    16.3        12.7    0.7               6.18
M30    28.0        28.0    <0.1              <0.1
M31    22.8        28.1    <0.1              <0.1
M32    22.3        28.1    <0.1              <0.1

Table 6: The 0.99 conditional Value-at-Risk (CVaR_0.99) predicted by models M_0 and M_27 for two portfolios of the six major currencies. δ = (CVaR_0.99^(M27) − CVaR_0.99^(M0))/CVaR_0.99^(M0) is the relative difference in CVaR between M_27 and M_0. Standard errors are in parentheses.

Portfolio weights w_i (AUD, CAD, CHF, EUR, GBP, JPY)   CVaR_0.99^(M0)   CVaR_0.99^(M27)   δ
(0.25, 0.25, 0.8, −0.8, 0.25, 0.25)                    1.707 (0.004)    1.425 (0.003)     −16.5%
(0.25, 0.25, −0.8, 0.8, 0.25, 0.25)                    1.737 (0.003)    1.782 (0.003)     2.6%

Table 7: The 0.99 conditional Value-at-Risk (CVaR_0.99) predicted by models M_0 and M_4 for two portfolios of the six major currencies. δ = (CVaR_0.99^(M4) − CVaR_0.99^(M0))/CVaR_0.99^(M0) is the relative difference in CVaR between M_4 and M_0. Standard errors are in parentheses.

Portfolio weights w_i (AUD, CAD, CHF, EUR, GBP, JPY)   CVaR_0.99^(M0)   CVaR_0.99^(M4)    δ
(0.25, 0.25, 0.8, −0.8, 0.25, 0.25)                    1.571 (0.004)    1.366 (0.003)     −13.0%
(0.25, 0.25, −0.8, 0.8, 0.25, 0.25)                    1.608 (0.003)    1.732 (0.003)     7.7%