Calibration of VaR Models with Overlapping Data
doi:10.1017/S1357321719000151
DISCUSSION PAPER
Abstract
Under the European Union’s Solvency II regulations, insurance firms are required to use a one-year VaR
(Value at Risk) approach. This involves a one-year projection of the balance sheet and requires sufficient
capital to be solvent in 99.5% of outcomes. The Solvency II Internal Model risk calibrations require annual
changes in market indices/term structure for the estimation of risk distribution for each of the Internal
Model risk drivers. This presents a significant challenge for calibrators in terms of:
• Robustness of the calibration that is relevant to the current market regimes and at the same time able to
represent the historically observed worst crisis;
• Stability of the calibration model year on year with arrival of new information.
The above points need careful consideration to avoid credibility issues with the Solvency Capital
Requirement (SCR) calculation, in that the results are subject to high levels of uncertainty.
For market risks, common industry practice to compensate for the limited number of historic annual
data points is to use overlapping annual changes. Overlapping changes are dependent on each other, and
this dependence can cause issues in estimation, statistical testing, and communication of uncertainty levels
around risk calibrations.
This paper discusses the issues with the use of overlapping data when producing risk calibrations for an
Internal Model. A comparison of the overlapping data approach with the alternative non-overlapping data
approach is presented. A comparison is made of the bias and mean squared error of the first four
cumulants under four different statistical models. For some statistical models it is found that overlapping
data can be used with bias corrections to obtain similarly unbiased results as non-overlapping data, but
with significantly lower mean squared errors. For more complex statistical models (e.g. GARCH) it is
found that published bias corrections for non-overlapping and overlapping datasets do not result in
unbiased cumulant estimates and/or lead to increased variance of the process.
In order to test the goodness of fit of probability distributions to the datasets, it is common to use
statistical tests. Most of these tests do not function when using overlapping data, as overlapping data
breach the independence assumption underlying most statistical tests. We present and test an
adjustment to one of the statistical tests (the Kolmogorov–Smirnov goodness-of-fit test) to allow for
overlapping data.
Finally, we explore the methods of converting “high”-frequency (e.g. monthly data) to “low”-frequency
data (e.g. annual data). This is an alternative methodology to using overlapping data, and the approach of
fitting a statistical model to monthly data and then using the monthly model aggregated over 12 time steps
to model annual returns is explored. There are a number of methods available for this approach. We
explore two of the widely used approaches for aggregating the time series.
1. Executive Summary
1.1 Overview
Under the European Union Solvency II regulations, insurance firms are required to calculate a
one-year Value at Risk (VaR) of their balance sheet to a 1 in 200 level. This involves a one-year
projection of a market-consistent balance sheet and requires sufficient capital to be solvent in
99.5% of outcomes. In order to calculate one-year 99.5th percentile VaR, a significant volume of
one-year non-overlapping data is needed. In practice there is often a limited amount of relevant
market data for market risk calibrations and an even more limited reliable and relevant data
history for insurance/operational risks.
Two of the key issues with the available market data are:
• The dataset available may be relatively long (e.g. for corporate credit spread risk, Moody's default and downgrade data are available from 1919¹), but the data may not be directly relevant or granular enough for risk calibration.
• The dataset may be very relevant to the risk exposure and as granular as required, but the data history may not be long enough; for example, for corporate credit spread risk, Merrill Lynch or iBoxx data are available only from 1996 or 2006, respectively.
Given these data limitations, the options available to the calibrator are to:
• use overlapping data or non-overlapping data (if overlapping data are used, is there any
adjustment that can be made to the probability distribution calibrations and statistical tests to
ensure that the calibration is still fit for purpose?) or
• use non-overlapping data with higher frequency than annual (e.g. monthly) and extract the
statistical properties of this data which can allow us to aggregate the time series to lower-
frequency (e.g. annual) time series.
In section 3 of this paper we consider adjustments to correct for bias in probability distributions
calibrated using overlapping data. In section 4 adjustments to statistical tests are defined and
tested. In section 5, issues with using data periods shorter than a year and then aggregating to
produce annualised calibrations are considered.
The key findings on the use of overlapping versus non-overlapping data are:
• Using published bias adjustments, both overlapping and non-overlapping data can be used to give unbiased estimates of statistical models where monthly returns are not autocorrelated. Where returns are autocorrelated, bias is more complex for both overlapping and non-overlapping data.
¹ Many datasets may be available for about 100 years, which could be considered sufficient; however, there may still be a materially wide confidence interval around the 1-in-200 point.
• In general, overlapping data are more likely to be closer to the exact answer than non-
overlapping data. By using more of the available data, overlapping data generally give cumu-
lant estimates with lower MSE than using non-overlapping data.
Alternatives to using annual overlapping data include:
• Use of non-overlapping monthly data and annualising these using the empirical correlation that is present in the time series (see section 5.2 for further details). The key points to note from the use of annualisation are:
• This technique involves:
○ fitting a probability distribution to monthly data;
○ simulating a large computer-generated dataset from this fitted distribution;
○ aggregating the simulated monthly returns into annual returns using a copula or other relevant techniques.
• It utilises all the data points and therefore would not miss any information that is present in the data; in the absence of information on future data trends, this leads to a more stable calibration overall.
• In the dataset we explored, it improves the fits considerably in comparison to non-overlapping data or monthly annual overlapping data because of the large simulated dataset used in the calibration.
• However, it does not remove the autocorrelation issue completely (as monthly non-
overlapping data or, for that matter, any “high”-frequency data could be autocorrelated)
and does not handle the issues around volatility clustering.
• Use of statistical techniques such as “temporal aggregation” (section 5.3). The key points to
note from the use of temporal aggregation are:
• Temporal aggregation involves fitting a time series model to monthly data, then using this
time series model to model annual data.
• It utilises as much data as possible without any key events being missed.
• It improves the fit to the empirical data and leads to a stable calibration.
• It can handle data with volatility clustering and autocorrelation.
• However, it suffers from issues such as possible loss of information during the increased
number of data transformations and is complex to understand and communicate to
stakeholders.
• Use of autocorrelation adjustment (or “de-smoothing” the data). This technique is not covered
here as this is a widely researched topic (Marcatoo, 2003). However, a similar technique by
Sun et al. (2009) has been used in section 3, which corrects for bias in the estimate of data variance.
1.5 Conclusions
The key messages concluded from this paper are:
• There is a constant struggle between finding relevant data for risk calibration and sufficient
data for robust calibration.
• Using overlapping data is acceptable for Internal Model calibration; however, communica-
tion of uncertainty in the model and parameters to the stakeholder is important.
• There are some credible alternatives to using overlapping data such as temporal aggregation
and annualisation; however, these alternatives bring their own limitations, and understand-
ing of these limitations is key to using these alternatives. We recommend considering the
comparison of calibration using both non-overlapping monthly data annualised with over-
lapping annual data and discussing the advantages, robustness, and limitations of both the
approaches with stakeholders before finalising the calibration approach.
• Diversification benefit using internal models is one of the key discussion topics among
industry participants. So far, we have only analysed univariate time series. Further efforts
are required in terms of analysing the impact of overlapping data on covariance and corre-
lation properties between two time series.
• Similarly, the impact on statistical techniques such as dimension reduction techniques (e.g. PCA) needs investigation. Initial efforts can be made by treating each dimension as a single univariate time series, applying techniques such as temporal aggregation or annualisation, and then applying dimension reduction techniques on both overlapping and non-overlapping transformed datasets to understand the impact.
• The impact on statistical tests other than the KS test has not been investigated. We have also
not investigated using different probability distributions than the normal distribution for the
KS test. Both these areas could be investigated further using the methods covered in this
paper.
• Measurement of parameter and model uncertainty in the light of new information has not been investigated for either the "annualisation" method or the "temporal aggregation" method.
Hansen and Hodrick (1980) examined the predictive power of 6-month forward foreign
exchange rates. The period over which a regression is conducted is 6 months – yet monthly
observations are readily available but clearly dependent. The authors derived the asymptotic
distribution of regression statistics using the Generalised Method of Moments (GMM; Hansen
1982) which does not require independent errors. The regression statistics are consistent, and
GMM provides a formula for the standard error. This approach has proved influential, and several
estimators have been developed for the resulting standard error: Hansen and Hodrick’s original,
Newey and West (1987), and Hodrick (1992) being prominent examples. Newey and West errors
are the most commonly used in practice.
However, the derived distribution of fitted statistics is only true asymptotically, and the small-
sample behaviour is often unknown. Many authors use bootstrapping or Monte Carlo simulation
to assess the degree of confidence to attach to a specific statistical solution. For example, one
prominent strand of finance literature has examined the power of current dividend yields to
predict future equity returns. Ang and Bekaert (2006) and Wei and Wright (2013) show using
Monte Carlo simulation that the standard approach of Newey and West errors produces a
test size (i.e. probability of a type I error) which is much worse than when using Hodrick
(1992) errors.
In addition to the asymptotic theory, there has also been work on small-sample behaviour.
Cochrane (1988) examines the multi-year behaviour of a time series (GNP) for which quarterly data
are available. He calculates the variance of this time series using overlapping time periods and
computes the adjustment factor required to make this calculation unbiased in the case of a random
walk. This adjustment factor generalises the n−1 denominator Bessel correction in the non-
overlapping case. Kiesel, Perraudin and Taylor (2001) extend this approach to third and fourth
cumulants.
Müller (1993) conducts a theoretical investigation into the use of overlapping data to
estimate statistics from time series. He concludes that while the estimation of a sample mean is
not improved by using overlapping rather than non-overlapping data, if the mean is known, then
the standard error of sample variance can be reduced by about 1/3 when using overlapping data. His
analysis of sample variance is extended to the case of unknown mean, again with improvements of
about 1/3, by Sun et al. (2009). Sun et al. also suggest an alternative approach of using the average of
non-overlapping estimates. Like Cochrane and Müller, this leads to a reduction in variance of about
1/3 compared to using just non-overlapping data drawn from the full sample.
Efforts have been made to understand the statistical properties and/or behaviour of the “high”-
frequency (e.g. monthly or daily data points) time series data to transform these into “low”-
frequency time series data (i.e. annual data points) via statistical techniques such as temporal
aggregation. Initial efforts were made to understand the temporal aggregation of ARIMA processes,
and Amemiya and Wu (1972) led the research in this area. Drost and Nijman (1993) developed
closed-form solutions for temporal aggregation of GARCH processes and described relationships
between various ARIMA processes under “high”-frequency and their transformation under
“low”-frequency time series. Chan et al. (2008) show various aggregation techniques using equity returns (S&P 500 data) and their impact on real-life situations.
The method of moments is not the only, and not necessarily the best, method for fitting
distributions to data, with maximum likelihood being an alternative. There are some comparisons
within the literature; we note the following points:
• There are published results bounding the misspecification error between two distributions with shared first four moments, while, as far as we know, there are no corresponding results bounding misspecification error for maximum likelihood estimates.
• The method of moments often has the advantage of simpler calculation and easy verification
that a fitted distribution indeed replicates sample properties.
• The adaptation of the maximum likelihood method to overlapping data does not seem to
have been widely explored in the literature, while (as we have seen) various overlapping
corrections have been published for method-of-moments estimates. For this reason, in
the current paper, we have focused on moments/cumulants.
The four statistical models used as reference models in this comparison are:
• Brownian process
• Normal inverse Gaussian process
• ARMA process
• GARCH process²
The comparison proceeds as follows:
• Simulate a monthly time series of n years of data from one of the four processes above.
• Calculate annual returns using overlapping and non-overlapping data.
• Calculate the first four cumulants of annual returns (for overlapping and non-overlapping
data).
• Compare the estimated cumulants with known cumulants.
• Repeat 1000 times to estimate the bias and MSE of both overlapping and non-overlapping data.
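As an illustration of these steps, a minimal Python sketch for the Brownian reference model is given below; the horizon, parameter values and helper names are illustrative choices rather than the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def annual_returns(monthly, overlapping):
    """Aggregate monthly log-returns into annual returns."""
    if overlapping:
        # rolling 12-month sums, one per month (dependent observations)
        return np.convolve(monthly, np.ones(12), mode="valid")
    # disjoint 12-month blocks (independent observations)
    n = len(monthly) // 12
    return monthly[: n * 12].reshape(n, 12).sum(axis=1)

def first_four_cumulants(x):
    m = x.mean()
    c2 = ((x - m) ** 2).mean()                # variance (divisor n)
    c3 = ((x - m) ** 3).mean()                # third cumulant
    c4 = ((x - m) ** 4).mean() - 3 * c2 ** 2  # fourth cumulant
    return np.array([m, c2, c3, c4])

years, n_sims = 10, 1000
mu_m, sigma_m = 0.05 / 12, 0.15 / np.sqrt(12)              # illustrative monthly drift and volatility
true = np.array([12 * mu_m, 12 * sigma_m ** 2, 0.0, 0.0])  # known annual cumulants (Brownian case)

estimates = {True: [], False: []}
for _ in range(n_sims):
    monthly = rng.normal(mu_m, sigma_m, size=12 * years)
    for overlapping in (True, False):
        estimates[overlapping].append(first_four_cumulants(annual_returns(monthly, overlapping)))

for overlapping in (True, False):
    e = np.array(estimates[overlapping])
    bias = e.mean(axis=0) - true
    mse = ((e - true) ** 2).mean(axis=0)
    label = "overlapping" if overlapping else "non-overlapping"
    print(label, "bias:", bias.round(4), "MSE:", mse.round(5))
```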
The analysis has been carried out for all years up to year 50, and the results are shown below.
The results for ARMA and normal inverse Gaussian are in Appendix B.
² The GARCH(p,q) model specification is calibrated by making sure that the sum of the ARCH and GARCH coefficients is less than 1 (|α + β| < 1), to ensure the time series remains stable.
The diagram shows the bias in the plot on the left and the MSE on the plot on the right. The
overlapping and non-overlapping data estimates of the mean appear very similar and not
obviously biased. These also have very similar MSE across all years.
Neither approach appears to have any systematic bias for the third cumulant. The MSE is significantly higher for non-overlapping data than for overlapping data.
In this case the non-overlapping data appear to have a higher downward bias than overlapping
data at all terms; both estimates appear biased. The non-overlapping data have higher MSE than
the overlapping data.
Plots of the bias and MSE for the normal inverse Gaussian are given in Appendix B. These are
very similar to those of the Brownian process.
The diagram shows the bias in the plot on the left and the MSE on the plot on the right. The
overlapping and non-overlapping data estimates of the mean appear very similar after 20 years.
Below 20 years, the data show some bias under both overlapping and non-overlapping data series.
These have very similar MSEs after 10 years, and below 10 years, non-overlapping data have
marginally lower MSEs.
Both approaches have similar levels of bias, particularly when available data are limited. Bias
corrections now both overstate the variance particularly strongly for datasets with less than 10 years
data. The MSE for overlapping data appears to be materially lower than non-overlapping data.
The non-overlapping data have both higher bias and MSE compared to overlapping data across
all years.
The non-overlapping data have lower bias compared to overlapping data. However, overlap-
ping data have lower MSE.
• For the Brownian and normal inverse Gaussian reference models, the non-overlapping and
overlapping data series are both downwardly biased to a similar extent. In both cases bias can
be corrected for by using the Bessel correction for non-overlapping data, and the Cochrane
(1988) or Sun et al. (2009) corrections for overlapping data. The MSE is lower for the over-
lapping data (due to the additional data included). These results suggest that for the estima-
tion of the second cumulant, the overlapping data perform better due to lower MSE and have
a greater likelihood to be nearer to the true answer.
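To illustrate the kind of corrections referred to above, the sketch below applies the Bessel correction to non-overlapping annual variances and a Cochrane-style small-sample correction to overlapping annual variances under a random-walk assumption; the exact form of the correction published in Sun et al. (2009) may differ, so this should be read as an assumed, illustrative version only.

```python
import numpy as np

def var_nonoverlapping(annual_changes):
    # Bessel-corrected variance of disjoint annual changes (divisor n - 1)
    return np.asarray(annual_changes).var(ddof=1)

def var_overlapping(levels, k=12):
    """Bias-corrected variance of k-month changes computed from overlapping windows of a
    monthly index-level series, assuming independent increments (random-walk null)."""
    levels = np.asarray(levels)
    T = len(levels) - 1                   # number of monthly increments
    mu = (levels[-1] - levels[0]) / T     # estimated drift per month
    diffs = levels[k:] - levels[:-k]      # overlapping k-month changes
    denom = (T - k + 1) * (1 - k / T)     # small-sample correction (Cochrane-style)
    return np.sum((diffs - k * mu) ** 2) / denom

# quick check against the known annual variance of a simulated random walk
rng = np.random.default_rng(1)
monthly = rng.normal(0.0, 0.04, size=12 * 30)        # 30 years of monthly increments
levels = np.concatenate([[0.0], np.cumsum(monthly)])
print(var_overlapping(levels), 12 * 0.04 ** 2)       # close on average over many replications
```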
The KS test adjusted for sampling error involves the following steps:
1. Fit a normal distribution to the dataset of n data points and calculate the parameters of the normal distribution.
2. Measure the Kolmogorov distance for the fitted distribution and the dataset, call this D.
3. Simulate n data points from a normal distribution with the same parameters as found in
step 1. Re-fit another normal distribution and calculate the Kolmogorov distance between
this newly fitted normal and the simulated data.
4. Repeat step 3 1000 (or suitably large) times to generate a distribution of Kolmogorov distances.
5. Calculate the percentile of the distribution from step 4 at which the distance D falls.
6. If the distance D is greater than the 95th percentile of the probability distribution calculated
in step 4, then it is rejected at the 5% level.
The reason this approach works is because the KS distance is calculated between the data and a
fitted distribution and then compared with 1000 randomly generated such distances. If the dis-
tance between the data and the fitted distribution is greater than 95% of the randomly generated
distances, then there is statistically significant evidence against the hypothesis that the data are
from the fitted distribution.
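A minimal implementation of this Monte Carlo procedure might look as follows; the function names and the choice of 1,000 resamples are illustrative.

```python
import numpy as np
from scipy import stats

def ks_distance_to_fitted_normal(x):
    # MLE fit of a normal, then the Kolmogorov distance between the data and that fit
    mu, sigma = x.mean(), x.std(ddof=0)
    return stats.kstest(x, "norm", args=(mu, sigma)).statistic

def ks_test_sampling_error(data, n_sims=1000, seed=0):
    """Monte Carlo p-value of the KS test allowing for parameters fitted to the data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    mu, sigma = data.mean(), data.std(ddof=0)          # step 1: fit
    d_obs = ks_distance_to_fitted_normal(data)         # step 2: observed distance D
    d_sim = np.array([
        ks_distance_to_fitted_normal(rng.normal(mu, sigma, n))  # steps 3-4: simulate and re-fit
        for _ in range(n_sims)
    ])
    return np.mean(d_sim >= d_obs)                     # steps 5-6: Monte Carlo p-value

# example: genuinely normal data, so the test should reject roughly 5% of the time at the 5% level
rng = np.random.default_rng(42)
print(ks_test_sampling_error(rng.normal(0.0, 1.0, 100)))
```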
The adjustment for overlapping data uses the same approach as described above to correct for sampling error, except that both the data being tested and the data simulated as part of the test are overlapping data. The steps are:
1. Fit a normal distribution to the dataset of n overlapping data points and calculate the parameters of the normal distribution.
2. Measure the Kolmogorov distance for the fitted distribution and the dataset, call this D.
3. Simulate n overlapping data points from a normal distribution with the same parameters as
found in step 1. Re-fit another normal distribution and calculate the Kolmogorov distance
between this newly fitted normal and the simulated data.
4. Repeat step 3 1000 times to generate a distribution of Kolmogorov distances.
5. Calculate the percentile of the distribution from step 4 at which the distance D falls.
6. If the distance D is greater than the 95th percentile of the probability distribution calculated
in step 4, then it is rejected at the 5% level.
A key question is how to simulate the overlapping data in step 3 in the list above. For Lévy-stable processes such as the normal distribution, this can be done by simulating from the normal
distribution at a monthly timeframe and then calculating the annual overlapping data directly
from the monthly simulated data. For processes which are not Levy-stable, an alternative is to
directly simulate annual data and then aggregate into overlapping data using a Gaussian copula
with a correlation matrix which gives the theoretical correlation between adjacent overlapping
data points, where the non-overlapping data are independent. This approach generates corre-
lated data from the non-Levy-stable distribution, where the correlations between adjacent data
points are in line with theoretical correlations for overlapping data. (This last approach is not
tested below.)
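For the Lévy-stable (normal) route described above, a sketch of simulating the overlapping reference datasets and computing the Monte Carlo p-value is given below; the rolling 12-month sums mirror the way the observed overlapping annual data are constructed, and the names are illustrative.

```python
import numpy as np
from scipy import stats

def simulate_overlapping_annual(n_annual, mu_a, sigma_a, rng):
    """Simulate n_annual overlapping annual returns: draw monthly normals with 1/12 of the
    annual mean and 1/12 of the annual variance, then take rolling 12-month sums."""
    monthly = rng.normal(mu_a / 12, sigma_a / np.sqrt(12), n_annual + 11)
    return np.convolve(monthly, np.ones(12), mode="valid")

def ks_test_overlapping(data, n_sims=1000, seed=0):
    """Monte Carlo p-value of the KS test allowing for fitted parameters and overlapping data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    mu, sigma = data.mean(), data.std(ddof=0)
    d_obs = stats.kstest(data, "norm", args=(mu, sigma)).statistic
    d_sim = []
    for _ in range(n_sims):
        sim = simulate_overlapping_annual(n, mu, sigma, rng)
        d_sim.append(stats.kstest(sim, "norm", args=(sim.mean(), sim.std(ddof=0))).statistic)
    return np.mean(np.array(d_sim) >= d_obs)
```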
This adjustment works for the same reason as the adjustment described in section 4.1. The KS
distance is generated between the data and the fitted distribution. This distance is then compared
with 1000 randomly generated distances, except this time using overlapping data. If the distance
between the data and the fitted distribution is greater than 95% of the randomly generated dis-
tances, then there is statistically significant evidence against the hypothesis that the data are from
the fitted distribution.
To check that these adjustments give tests of the correct size, five tests were carried out:
1. Test of the standard KS test. This is done using non-overlapping simulated data from a normal distribution with mean 0 and standard deviation 1.
a. 100 data points are simulated from this normal distribution.
b. The KS test is carried out between this simulated data and the normal distribution with
parameters 0 for mean and 1 for standard deviation.
c. The p-value is calculated from this KS test.
d. Steps a, b, and c are repeated 1000 times and the number of p-values lower than 5% are
calculated and divided by 1000.
2. Test of the KS test with sample error. This test is done using non-overlapping simulated data
from a normal distribution with mean 0 and standard deviation 1. The difference between
this test and test 1 is that step b in test 1 is using known parameter values, whereas this test
uses parameters from a distribution fitted to the data.
a. 100 data points are simulated from this normal distribution.
b. The normal distribution is fitted to the data using the maximum likelihood esti-
mate (MLE).
c. The KS test is carried out between the simulated data and the fitted normal distribution.
d. The p-value is calculated for this KS test.
e. Steps a, b, c and d are repeated 1000 times and the number of p-values lower than 5% are
calculated and divided by 1000.
3. Test of the KS test with correction for sample error. This test is done using non-overlapping
simulated data from a normal distribution with mean 0 and standard deviation 1. The dif-
ference between this test and test 2 is that step c is carried out using the KS test adjusted for
sample error.
a. 100 data points are simulated from this normal distribution.
b. The normal distribution is fitted to the data using the MLE.
c. The KS test adjusted for sample error (as described in section 4.1) is carried out between
the simulated data and the fitted normal distribution.
d. The p-value is calculated for this KS test.
e. Steps a, b, c, and d are repeated 1000 times and the number of p-values lower than 5% is calculated and divided by 1000.
4. Test of the KS test with correction for sample error applied to overlapping data. This test is
done using overlapping simulated data from a normal distribution with mean 0 and
standard deviation 1. The difference between this test and test 3 is that the simulated data
in this test is from an overlapping dataset.
a. 100 data points are simulated from this normal distribution.
b. The normal distribution is fitted to the data using the MLE.
c. The KS test adjusted for sample error (as described in section 4.1) is carried out between
the simulated data and the fitted normal distribution.
d. The p-value is calculated for this KS test.
e. Steps a, b, c, and d are repeated 1000 times and the number of p-values lower than 5% are
calculated and divided by 1000.
5. Test of the KS test with correction for sample error applied to overlapping data, and correction for overlapping data (as described in section 4.2). This test is done using overlapping
simulated data from a normal distribution with mean 0 and standard deviation 1. The
difference between this test and test 4 is that the KS test corrects for overlapping data as
well as sample error.
a. 100 data points are simulated from this normal distribution.
b. The normal distribution is fitted to the data using the MLE.
c. The KS test adjusted for sample error and for overlapping data (as described in section 4.2) is carried out between the simulated data and the fitted normal distribution. This was done with a reduced sample size of 500 in the KS test to improve run times.
d. The p-value is calculated for this KS test.
e. Steps a, b, c, and d are repeated 500 times and the number of p-values lower than 5% are
calculated and divided by 500.
The percentage of simulations with a p-value below 5% for each test is:
Test 1: 4.3%
Test 2: 0%
Test 3: 5.0%
Test 4: 44%
Test 5: 5.3%
• Use of non-overlapping monthly data but annualising these using the autocorrelation that is present in the time series (section 5.2):
○ This technique involves fitting a probability distribution to monthly data, simulating a large computer-generated dataset from this fitted distribution, and aggregating the simulated monthly returns into annual returns using a copula and the correlation.
○ It utilises all the data points and leads to a stable calibration.
○ In the dataset we explored, it improves the fits considerably in comparison to non-overlapping data or monthly annual overlapping data.
○ However, it does not remove the autocorrelation issue completely and does not handle the issues around volatility clustering.
• Use of statistical techniques such as "temporal aggregation" (section 5.3):
○ It improves the fit to the empirical data and leads to a stable calibration.
○ It can handle data with volatility clustering and avoids the issue of autocorrelation.
• Use of autocorrelation adjustment (or "de-smoothing" the data). This technique is not covered here as it is a widely researched topic (Marcatoo, 2003). However, we have tried a similar technique by Sun et al. (2009), which corrects the bias in the overlapping variance of the data. We have analysed the impact of using this adjustment in section 3 of the paper and do not discuss it further in this section.
The testing carried out in section 5 is based on empirical data where the underlying model
driving the data is unknown. As the model is unknown, the bias and MSE tests carried out in
section 3 are not possible (as these require the model parameters to be known).
We fit distributions to these annualised simulations. We present the use of this technique
using Merrill Lynch (ML) credit data, where we compare the results of using annual overlapping
data (without any aggregation approach) and using the above autocorrelation aggregation
approach.
• The dataset is limited (starting in 1996) and therefore the utilisation of information available
in each of the data points is important.
• This dataset has a single extreme market event (2008–2009 global credit crisis) and the rest of
the data are relatively benign.
• Two significant challenges for calibrating this dataset are:
○ If we use an annual non-overlapping dataset, we may lose the key events of 2008–2009
global credit crisis where the extreme movements in spreads happened during June
2008–March 2009 (a nine-month period).
○ If we use an annual overlapping dataset, there are more data points in the fitting process than with an annual non-overlapping dataset, but still not sufficient for generating a credible and robust fit at the 99.5th percentile point.
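A sketch of the annualisation approach compared here is given below: a distribution is fitted to the monthly changes, a Gaussian copula with an AR(1)-style correlation matrix built from the empirical lag-1 autocorrelation links 12 consecutive months, and the simulated months are summed into annual changes. The normal marginal and the AR(1) correlation structure are simplifying assumptions for illustration; the fits discussed below use richer marginals such as the hyperbolic distribution.

```python
import numpy as np
from scipy import stats

def annualise_monthly(monthly_changes, n_sims=100_000, seed=0):
    """Simulate annual changes from monthly data via a Gaussian copula."""
    rng = np.random.default_rng(seed)
    monthly_changes = np.asarray(monthly_changes)

    # 1. fit a marginal distribution to the monthly changes (normal for illustration)
    mu, sigma = stats.norm.fit(monthly_changes)

    # 2. empirical lag-1 autocorrelation, extended to an AR(1)-style 12x12 correlation matrix
    x = monthly_changes - monthly_changes.mean()
    rho1 = np.sum(x[1:] * x[:-1]) / np.sum(x * x)
    lags = np.abs(np.subtract.outer(np.arange(12), np.arange(12)))
    corr = rho1 ** lags

    # 3. Gaussian copula: correlated normals -> uniforms -> fitted marginal, then sum 12 months
    z = rng.multivariate_normal(np.zeros(12), corr, size=n_sims)
    u = stats.norm.cdf(z)
    months = stats.norm.ppf(u, loc=mu, scale=sigma)
    return months.sum(axis=1)

# usage: fit the target distribution to annualise_monthly(spread_changes)
# and read off the 99.5th percentile of the simulated annual changes
```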
Figure 1. Annual overlapping versus monthly non-overlapping – A rating – all maturities. Under the annual overlapping time series (top left) the ACF starts at 1, slowly converges to 0 (slower decay) and then becomes negative, and exceeds the 95% confidence level for the first nine lags. Under the monthly non-overlapping time series (bottom left) the ACF quickly falls to a very low number, and beyond lag 2 the autocorrelations are within the 95% confidence interval for most time lags. For all practical purposes we can ignore the ACF after time lag 2. This suggests that the monthly non-overlapping time series is less autocorrelated than the annual overlapping time series. Similarly, the PACF for monthly non-overlapping data (bottom right) shows more time steps where autocorrelations beyond lag 2 are within the 95% confidence interval in comparison to the annual overlapping data (top right). The purpose of performing these tests is to show whether the monthly non-overlapping time series is more conducive to modelling.
Figure 2. Annual overlapping versus monthly non-overlapping with autocorrelation. From the QQ plots (both using the hyperbolic distribution) for monthly annual overlapping data and monthly non-overlapping data with annualisation, it is clear that using monthly non-overlapping data with autocorrelation appears to improve the fits in the body as well as in the tails. This is because the QQ plot shows a much closer fit to the diagonal for the monthly non-overlapping data with annualisation. Note: we used the hyperbolic distribution as it is considered one of the more sophisticated distributions; similar conclusions can be drawn using simpler distributions, such as the normal distribution.
Temporal aggregation can be very useful in cases where we have limited relevant market data
available for calibration and we want to infer the annual process from the monthly/daily process.
5.3.1 Introduction
Under the temporal aggregation technique, the low-frequency data series is called the aggregate
series (e.g. annual series), as shown in Table 2. The high-frequency data series is called the disag-
gregate series (e.g. monthly series). Deriving a low-frequency model from the high-frequency
model is a two-stage procedure:
• First, the orders of the low-frequency (aggregate) model are inferred from the orders of the high-frequency (disaggregate) model.
• After inferring the orders, the parameters of the low-frequency model should be recovered from the high-frequency ones, rather than estimated directly. Hence, the low-frequency model parameters incorporate all the economic information from the high-frequency data.
Table 1. Key quantiles: monthly annualised versus monthly annual overlapping data
$$Y_t = W(L)\,y_t = \sum_{j=0}^{A} w_j\,y_{t-j} = \sum_{j=0}^{k-1} L^j y_t$$
where $W(L)$ is the lag polynomial of order $A$, $W(L) = 1 + L + \dots + L^{k-1}$, and $k$ represents the order of aggregation.
If the disaggregate time series $y_t$ were to follow a model of the type
$$\phi(L)\,y_t = \theta(L)\,\varepsilon_t$$
where $\phi(L)$ and $\theta(L)$ are lag polynomials and $\varepsilon_t$ is an error term, then the temporally aggregated time series can be described by
$$\beta(B)\,Y_t = \phi(B)\,\varepsilon_t$$
Table 2. GARCH(1,1) parameters fitted to monthly non-overlapping data and the corresponding temporally aggregated (annual) parameters

Parameter   Monthly non-overlapping   Temporally aggregated
Mu          0                         0
Omega       0.00014                   0.01591
Alpha       0.1475                    0.1656
Beta        0.8071                    0.4070
We perform a time series regression model to estimate the coefficients of an ARMA or ARIMA
model on monthly non-overlapping data.
Temporal aggregation leads to a loss of information in the data when performing various data
transformations. However, empirical work done using equity risk data shows that this loss
of information has not been materially significant based on the quantile results observed
under various approaches in section 5.3.2.
Rigorous testing and validation of the behaviour of residuals will be necessary. It is complex to
understand and communicate.
The main complication with using the temporal aggregation technique is that it involves solving an algebraic system of equations, which can become complex for time series models of higher orders, for example ARIMA(p,d,q) where p, d, and/or q exceed 3.
5.3.1.1 Technical details for AR(1) process. We study this technique using a simple auto-regressive
AR(1) process. Assume that the monthly log-return rt follows an AR(1) process (Chan et al., 2008).
$$r_t = \phi\, r_{t-1} + a_t, \qquad a_t \sim N(0, \sigma_a^2)$$
The annual return is denoted $R_T$ and the frequency of aggregation is $m$ (where $m = 12$ for annual aggregation). The lag-$s$ autocovariance function of the $m$-period aggregated log-return variable is
$$\mathrm{Cov}(R_T, R_{T-s}) = \frac{\sigma_a^2}{1-\phi^2}\left[m + 2(m-1)\phi + 2(m-2)\phi^2 + \dots + 2\phi^{m-1}\right] \quad \text{if } s = 0$$
$$\mathrm{Cov}(R_T, R_{T-s}) = \phi^{m(|s|-1)+1}\left(1 + \phi + \phi^2 + \dots + \phi^{m-1}\right)^2 \frac{\sigma_a^2}{1-\phi^2} \quad \text{if } s = \pm 1, \pm 2, \dots$$
$$\mathrm{Var}(R_T) = \frac{\sigma_a^2}{1-\phi^2}\left[12 + 22\phi + 20\phi^2 + \dots + 2\phi^{11}\right] \quad \text{when } s = 0 \text{ and } m = 12$$
The aggregated annual process follows an ARMA(1,1) model
$$(1 - \phi^* L)\,R_T = (1 + \theta^* L)\,a_t^*, \qquad a_t^* \sim N(0, \sigma_{a^*}^2)$$
with $\phi^* = \phi^m$ (for real-life applications where we use $\phi^{12}$ for annualisation, this will be close to zero and therefore the process essentially becomes an MA(1) process). For $|\phi^*| < 1$, the moving-average parameter $\theta^*$ satisfies
$$\frac{(1+\phi^m\theta^*)(\phi^m+\theta^*)}{1+2\phi^m\theta^*+\theta^{*2}} = \frac{\phi\left(1+\phi+\phi^2+\dots+\phi^{m-1}\right)^2}{m + 2(m-1)\phi + 2(m-2)\phi^2 + \dots + 2\phi^{m-1}}$$
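A short sketch of recovering the aggregated ARMA(1,1) parameters from a monthly AR(1) coefficient, using the relationships above, is given below; the function name and the example value of $\phi$ are illustrative.

```python
import numpy as np

def aggregate_ar1(phi, m=12):
    """Map a monthly AR(1) coefficient phi to the (phi*, theta*) of the aggregated
    ARMA(1,1) process, using the lag-0 and lag-1 autocovariances derived above.
    The common factor sigma_a^2 / (1 - phi^2) cancels in the autocorrelation ratio."""
    phi_star = phi ** m
    gamma0 = m + 2 * np.sum((m - np.arange(1, m)) * phi ** np.arange(1, m))
    gamma1 = phi * (phi ** np.arange(m)).sum() ** 2
    rho1 = gamma1 / gamma0                           # lag-1 autocorrelation of annual returns

    # (1 + phi* theta)(phi* + theta) / (1 + 2 phi* theta + theta^2) = rho1,
    # rearranged into a quadratic in theta:
    a = phi_star - rho1
    b = 1 + phi_star ** 2 - 2 * rho1 * phi_star
    c = phi_star - rho1
    roots = np.roots([a, b, c])
    theta_star = roots[np.abs(roots) < 1][0].real    # invertible root
    return phi_star, theta_star

print(aggregate_ar1(0.2))   # phi* = 0.2**12 is ~0, so the annual process is close to MA(1)
```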
5.3.1.2 Technical details for GARCH (1, 1) process. Let $a_t = r_t - \mu$ be a mean-corrected log-return following a GARCH(1,1) process:
$$\varepsilon_t = a_t / h_t^{0.5}, \qquad h_t = \omega + \beta h_{t-1} + \alpha a_{t-1}^2$$
The $m$-month non-overlapping period return can be "weakly" approximated by a GARCH(1,1) process with corresponding parameters
$$\mu^* = m\mu, \qquad \omega^* = m\omega\,\frac{1-(\alpha+\beta)^m}{1-(\alpha+\beta)}, \qquad \alpha^* = (\alpha+\beta)^m - \beta^*$$
where $|\beta^*| < 1$ is a solution of the quadratic equation
$$\frac{\beta^*}{1+\beta^{*2}} = \frac{\Theta(\alpha+\beta)^m - \Lambda}{\Theta\left[1+(\alpha+\beta)^{2m}\right] - 2\Lambda}$$
with
$$\Lambda = \frac{\left[\alpha - \alpha\beta(\alpha+\beta)\right]\left[1-(\alpha+\beta)^{2m}\right]}{1-(\alpha+\beta)^2}$$
$$\Theta = m(1-\beta)^2 + \frac{2m(m-1)(1-\alpha-\beta)^2\left(1-2\alpha\beta-\beta^2\right)}{(\kappa-1)\left[1-(\alpha+\beta)^2\right]} + \frac{4\left[m-1-m(\alpha+\beta)+(\alpha+\beta)^m\right]\left[\alpha-\alpha\beta(\alpha+\beta)\right]}{1-(\alpha+\beta)^2}$$
where $\kappa$ is the unconditional kurtosis of the data. The kurtosis of the aggregated process is
$$\kappa^* = 3 + \frac{\kappa-3}{m} + 6(\kappa-1)\,\frac{\left[\alpha-\alpha\beta(\alpha+\beta)\right]\left[m-1-m(\alpha+\beta)+(\alpha+\beta)^m\right]}{m^2(1-\alpha-\beta)^2\left(1-2\alpha\beta-\beta^2\right)}$$
The calibration parameters of GARCH (1, 1) process fitted to monthly non-overlapping and
temporally aggregated GARCH (1, 1) are outlined in Table 2.
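The parameter mapping above can be sketched in code as follows. It reproduces the $\omega^*$ value of Table 2 from the monthly parameters; the $\alpha^*/\beta^*$ split additionally requires the unconditional kurtosis $\kappa$ of the monthly data, which is not reported in Table 2, so a placeholder value is assumed here.

```python
import numpy as np

def aggregate_garch11(omega, alpha, beta, kappa, m=12):
    """Weak GARCH(1,1) aggregation over m periods, using the Drost-Nijman style
    formulas as reconstructed in section 5.3.1.2."""
    s = alpha + beta
    lam = (alpha - alpha * beta * s) * (1 - s ** (2 * m)) / (1 - s ** 2)
    theta = (m * (1 - beta) ** 2
             + 2 * m * (m - 1) * (1 - s) ** 2 * (1 - 2 * alpha * beta - beta ** 2)
               / ((kappa - 1) * (1 - s ** 2))
             + 4 * (m - 1 - m * s + s ** m) * (alpha - alpha * beta * s) / (1 - s ** 2))
    c = (theta * s ** m - lam) / (theta * (1 + s ** (2 * m)) - 2 * lam)
    beta_m = (1 - np.sqrt(1 - 4 * c ** 2)) / (2 * c)   # root of beta/(1+beta^2) = c with |beta| < 1
    alpha_m = s ** m - beta_m
    omega_m = m * omega * (1 - s ** m) / (1 - s)
    return omega_m, alpha_m, beta_m

# monthly parameters from Table 2; kappa = 5 is a placeholder assumption
print(aggregate_garch11(omega=0.00014, alpha=0.1475, beta=0.8071, kappa=5.0))
# omega* comes out at roughly 0.016, consistent with Table 2;
# alpha* + beta* = (alpha + beta)**12, approximately 0.57, regardless of kappa
```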
6. Conclusions
This paper has considered some of the main issues with overlapping data as well as looking at the
alternatives.
Section 3 presented the results of a simulation study designed to test whether overlapping or
non-overlapping data are better for distribution fitting. For the models tested, overlapping data
appear to be better as biases can be removed (in a similar way to non-overlapping data), but over-
lapping makes a greater use of the data, meaning it has a lower MSE. A lower MSE suggests that
distributions fitted with overlapping data are more likely to be closer to the correct answer.
Section 4 discussed the issues of statistical tests using overlapping data. A methodology was
tested and the adjustment for overlapping data was found to correct the statistical tests in line
with expectations.
Section 5 presented alternative methods for model fitting, by fitting the model to shorter time
frame data and then aggregating the monthly model into an annual model. This approach was
successfully tested in a practical example.
The overall conclusions from this paper are:
• Overlapping data can be used to calibrate probability distributions, and this is expected to be a better approach than using non-overlapping data, particularly when there is a constant struggle between finding relevant data for risk calibration and maximising the use of data for a robust calibration. However, communication of the uncertainty in the model and/or parameters to the stakeholder is equally important.
• Some credible alternatives exist to using overlapping data such as temporal aggregation and
annualisation. However, these alternatives bring their own limitations, and understanding
these limitations is key to using these alternatives. We recommend considering a comparison
of calibration using both non-overlapping monthly data annualised with overlapping annual
data, and discussing with stakeholders the advantages, robustness, and limitations of both the
approaches before finalising the calibration approach.
References
Amemiya, T. & Wu, R.Y. (1972). The effect of aggregation on prediction in the autoregressive model. Journal of American
Statistical Association, 67, 628–632.
Ang, A. & Bekaert, G. (2006). Stock return predictability: Is it there? The Review of Financial Studies, 20, 651–707.
Bain, L.J. & Engelhardt, M. (1992). Introduction to Probability and Mathematical Statistics. ISBN 0-534-92930-3. Pacific
Grove, CA: Duxbury/Thomson Learning.
Chan, W.S., Cheung, S.H., Zhang, L.X. & Wu, K.H. (2008). Temporal aggregation of equity return time-series models.
Mathematics and Computers in Simulation, 78, 172–180.
Cochrane, J.H. (1988). How big is the random walk in GNP? Journal of Political Economy, 96, 893–920.
Conover, W.J. (1999). Practical Nonparametric Statistics (3rd ed.). New York: Wiley.
Cont, R. (2005). Volatility clustering in financial markets: empirical facts and agent-based models, in Long Memory in
Economics (ed. A.K. Teyssiere), Springer: Berlin, Heidelberg.
Drost, F.C. & Nijman, T.E. (1993). Temporal aggregation of GARCH processes. Econometrica, 61, 909–927.
Frankland, R. (chair), Biffis, E., Dullaway, D., Eshun, S., Holtham, A., Smith, A., Varnell, E. & Wilkins, T. (2008). The
modelling of extreme events. British Actuarial Journal, 15, 99–201.
Hansen, L.P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
Hansen, L.P. & Hodrick, R.J. (1980). Forward exchange rates as optimal predictors of future spot rates: an econometric
analysis. Journal of Political Economy, 88, 829–853.
Hodrick, R.J. (1992). Dividend yields and expected stock returns: Alternative procedures for inference and measurement. The
Review of Financial Studies, 5, 357–386.
Jarvis, S.J., Sharpe, J. & Smith, A.D. (2017). Ersatz model tests. British Actuarial Journal, 22, 490–521.
Kiesel, R., Perraudin, W. & Taylor, A. (2001). The structure of credit risk: spread volatility and ratings transitions. Working
Paper 131, Bank of England. https://www.bankofengland.co.uk/working-paper/2001/the-structure-of-credit-risk-spread-
volatility-and-ratings-transitions
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P. & Shin, Y. (1992). Testing the null hypothesis of stationarity against the
alternative of a unit root. Journal of Econometrics, 54, 159–178.
Ljung, G.M. & Box, G.E.P. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297–303.
Mandelbrot, B. (1963). The variation of certain speculative prices. Journal of Business, 36, 394–419.
Marcatoo, P.B. (2003). The Measurement and Modelling of Commercial Real Estate. Presented to IFOA.
Mathworks. (2017). uk.mathworks.com. Retrieved from https://uk.mathworks.com/help/econ/kpsstest.html.
Müller, U.A. (1993). Statistics of variables observed over overlapping intervals. Olsen & Associates Research Group discussion
paper. Available at http://www.olsendata.com/fileadmin/Publications/Working_Papers/931130-intervalOverlap.pdf
Newey, W.K. & West, K.D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica, 55, 703–708.
Phillips, P.C.B. & Perron, P. (1988). Testing for a unit root in time series regression. Biometrika, 75, 335–346.
Sun, H., Nelken, I., Han, G. & Guo, J. (2009). Error of VaR by overlapping intervals. Asia Risk Magazine, April 2009. https://
www.risk.net/risk-management/1500264/error-var-overlapping-intervals
Wei, M. & Wright, J.H. (2013). Reverse regressions and long-horizon forecasting. Journal of Applied Econometrics, 28,
353–371.
Appendix A
In this section, we provide the mathematical definitions and descriptions of technical terms used in the paper.4
$$\kappa_1 = \mu = E(X)$$
$$\kappa_2 = E\left[(X-\mu)^2\right]$$
$$\kappa_3 = E\left[(X-\mu)^3\right]$$
$$\kappa_4 = E\left[(X-\mu)^4\right] - 3\kappa_2^2$$
⁴ Source: http://mondi.web.elte.hu/spssdoku/algoritmusok/acf_pacf.pdf
- Estimate the mean, variance, skewness, and kurtosis from the historical data.
- Pick a four-parameter distribution family.
- Evaluate whether the estimated (skew, kurtosis) combination is feasible for the chosen family. If not, adjust the historical
values by projecting onto the boundary of the feasible region.
- Find the distribution matching the adjusted historical skewness and kurtosis.
- Match the mean and variance by shifting and scaling.
- Compare the fitted distribution to the historic data, either by inspection of histograms or more formal statistical tests. If
the fit is not good enough, then think of another four-parameter family and repeat from the third step above.
⁵ The WTW risk calibration survey 2016 suggests that four-parameter distributions such as hyperbolic and EGB2 are widely used by UK insurers. Note that the WTW risk calibration survey 2016 is not a publicly available document; however, it can be made available on request from Willis Towers Watson.
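A brief sketch of the first steps of this fitting process is given below, including the general feasibility constraint that kurtosis must be at least skewness² + 1; the projection onto a particular family's feasible region depends on the family chosen and is not shown.

```python
import numpy as np

def sample_moments(x):
    """Mean, variance, skewness and (non-excess) kurtosis estimated from data (divisor n)."""
    x = np.asarray(x)
    m = x.mean()
    v = ((x - m) ** 2).mean()
    skew = ((x - m) ** 3).mean() / v ** 1.5
    kurt = ((x - m) ** 4).mean() / v ** 2
    return m, v, skew, kurt

def is_feasible(skew, kurt):
    # For any probability distribution, kurtosis >= skewness^2 + 1;
    # a given four-parameter family covers only a subset of this region.
    return kurt >= skew ** 2 + 1

rng = np.random.default_rng(0)
m, v, s, k = sample_moments(rng.standard_t(df=6, size=500))
print(m, v, s, k, is_feasible(s, k))
```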
The plots show the bias in the plot on the left and the MSE on the plot on the right. The overlapping and non-overlapping data
estimates of the mean appear very similar and not obviously biased. These also have very similar MSE across all years. This has
very similar conclusions to the Brownian case.
The second cumulant is the variance (with divisor n). This has very similar conclusions to the Brownian case.
• Overlapping and non-overlapping data both give biased estimates of the second cumulant to a similar extent across all
terms.
• The bias correction factors (using divisor n−1 for non-overlapping variance and the Nelken formula for overlapping
variance) appear to remove the bias. This is evidence that the Nelken bias correction factor works for other processes
than just Brownian motion.
• The plot on the right shows the MSEs for the two approaches, with overlapping data appearing to have lower MSE for
all terms.
Neither approach appears to have any systematic bias for the mean. The MSE is significantly higher for non-overlapping data than for overlapping data.
In this case the non-overlapping data appear to have a higher downward bias than the overlapping data at all terms; both
estimates appear biased. The bias does not appear to tend to zero as the number of years increases, but it rises above the known
value. The non-overlapping data have higher MSE than the overlapping data.
The plots show the bias in the plot on the left and the MSE on the plot on the right. The overlapping and non-overlapping data
estimates of the mean appear very similar and unbiased. These also have very similar MSEs after 10 years, but overlapping data
appear to have marginally higher MSE below 10 years.
The plot on the left shows that the overlapping and non-overlapping estimates of variance (with divisor n) are too low with
similar bias levels for all terms. This is more marked, the lower the number of years data, and the bias appears to disappear as n
gets larger.
The plot on the left also shows the second cumulant but bias-corrected, using a divisor (n−1) instead of n for the non-over-
lapping data and using the formula in Sun et al. as well as in Cochrane (1988) for the overlapping data. Both these corrections
appear to have removed the bias across all terms for overlapping and non-overlapping data. The MSE is very similar for both
overlapping and non-overlapping data.
It is important to note that neither approach appears to have any materially different bias. Non-overlapping data have higher
MSE compared to overlapping data.
Non-overlapping data have lower bias compared to overlapping data, but overlapping data have lower MSE.
The Phillips–Perron (PP) test is based on the regression
$$y_t = \alpha + \rho\, y_{t-1} + \delta t + u_t$$
The results are used to calculate the test statistics proposed by Phillips and Perron. Phillips and Perron’s test statistics can be
viewed as Dickey–Fuller statistics that have been made robust to serial correlation by using the Newey–West (1987) hetero-
skedasticity- and autocorrelation-consistent covariance matrix estimator. Under PP unit root test, the hypothesis is as follows:
H null: The time series has a unit root (which means it is non-stationary).
H alternative: The time series does not have a unit root (which means it is stationary).
$$Y_t = \beta t + r_t + e_t$$
where $r_t = r_{t-1} + u_t$ is a random walk, the initial value $r_0 = \alpha$ serves as an intercept, $t$ is the time index, and the $u_t$ are independent identically distributed $(0, \sigma_u^2)$. Under the KPSS test, the hypothesis is as follows:
H null: The time series is trend/level stationary (which means it does not show trends).
H alternative: The time series is not trend/level stationary (which means it does show trends).
$$Q = n(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^2}{n-k}$$
where $n$ is the sample size, $\hat{\rho}_k^2$ is the squared sample autocorrelation at lag $k$, and $h$ is the number of lags being tested. Under the null hypothesis, $Q \sim \chi^2_h$, where $h$ is the degrees of freedom.
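These tests are available in standard libraries; a sketch of how they might be applied to a monthly changes series is shown below (the KPSS and Ljung–Box calls are from statsmodels and the Phillips–Perron test from the arch package; the series name is a placeholder).

```python
import numpy as np
from statsmodels.tsa.stattools import kpss
from statsmodels.stats.diagnostic import acorr_ljungbox
from arch.unitroot import PhillipsPerron

def stationarity_report(series, lags=12):
    """Print p-values for the PP, KPSS (level and trend) and Ljung-Box tests."""
    series = np.asarray(series)

    pp = PhillipsPerron(series)                      # null: unit root (non-stationary)
    print("PP p-value:", pp.pvalue)

    for regression, label in (("c", "level"), ("ct", "trend")):
        stat, pval, _, _ = kpss(series, regression=regression, nlags="auto")
        print(f"KPSS {label} p-value:", pval)        # null: level/trend stationary

    lb = acorr_ljungbox(series, lags=[lags])         # null: no serial correlation
    print("Ljung-Box p-value:", lb["lb_pvalue"].iloc[0])

# usage (placeholder series name): stationarity_report(monthly_spread_changes)
```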
These tests have been applied to corporate bond indices with the results presented in Table C.1.
The credit spread data is subject to a number of different stationarity tests. If the process is stationary, it is more conducive for
a robust calibration because its statistical properties remain constant over time (e.g. the mean, variance, autocorrelation, etc.,
do not change). If the process is not stationary, the variation in the fitting parameters can be significant as the new information
emerges in the new data, or in some cases the model may no longer remain valid. This point is important for stakeholders
because the stability of the SCR depends upon the stability of risk calibrations.
There are various definitions of stationarity in the literature. We present a “weak” stationarity definition here. We believe it is
widely used; however, stronger forms may be required, for example, when considering higher moments.
A process is said to be covariance stationary or “weakly stationary” if its first and second moments are time-invariant, that is,
$$E(Y_t) = \mu < \infty\ \ \forall t, \qquad \mathrm{Var}(Y_t) = \gamma_0 < \infty\ \ \forall t, \qquad \mathrm{Cov}(Y_t, Y_{t-s}) = \gamma_s\ \ \forall t, s$$
⁶ A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified.
⁷ No adjustment has been applied to this stationarity test for overlapping bias.
⁸ p-values are used to determine statistical significance in a hypothesis test. Intuitively, p-values above the threshold indicate that the data are consistent with a true null hypothesis, and p-values below the threshold indicate that the data are unlikely under a true null hypothesis. Typically, a 5% threshold is used in many applications.
Table C.1. Credit indices: monthly non-overlapping data – stationarity and unit root tests (entries are p-values, %)
PP single-mean test: 1, 1, 1, 1
PP trend test: 1, 1, 7, 6
KPSS trend: 24, 67, 59, 72
KPSS level: 15, 26, 67, 75
Ljung-Box: 90, 13, 0, 0
PP test – Stationary – p-value 1%: The p-value is less than 5%, which suggests that we reject the null hypothesis of the time series having a unit root. This is strong evidence of stationarity in the time series. Monthly non-overlapping annualised data and monthly annual overlapping data both have similar results, supporting that neither time series shows the presence of a unit root.
KPSS trend stationarity test – Stationary – p-values 15–75%: The p-values are greater than 5%, which means we are unable to reject the null hypothesis; the time series is trend stationary. Monthly non-overlapping annualised data and monthly annual overlapping data both have similar results, supporting that both time series are trend stationary.
KPSS level stationarity test – Stationary – p-values 15–41%: The p-values are greater than 5%, which means we are unable to reject the null hypothesis; the time series is level stationary. Monthly non-overlapping annualised data and monthly annual overlapping data both have similar results, supporting that both time series are level stationary.
Ljung-Box test – Not independent (annual overlapping) – p-values >10% (monthly annualised): For the monthly non-overlapping annualised data the p-values are greater than 5%, so we are unable to reject the null hypothesis of no serial correlation; these data do not show serial correlation. For the monthly annual overlapping data the p-values are below 5%, so the null hypothesis is rejected and that series does show serial correlation.
• It is important to note that the Ljung-Box test suggests that the monthly annual overlapping data have serial correlation, whereas we are unable to reject the null hypothesis of no serial correlation for the monthly non-overlapping annualisation approach.
The purpose of performing these tests is to show that annualised monthly non-overlapping data can be a better alternative to monthly annual overlapping data, provided the annualisation can be carried out.
Cite this article: Frankland R, Smith AD, Sharpe J, Bhatia R, Jarvis S, Jakhria P, and Mehta G. Calibration of VaR models with
overlapping data. British Actuarial Journal. https://doi.org/10.1017/S1357321719000151