1. Introduction
Forecasting extreme quantiles, which is a part of extreme value (EV) analysis, is an important problem in many applications. For example, in analyzing ocean storm severity it is often critical to forecast an extreme quantile, such as the 99% quantile, for the maximum wave height over a certain time period to facilitate adequate construction design [1]. In financial risk management, forecasting extreme quantiles, or so-called VaR analysis, is critical for establishing adequate bank capital levels [2]. In fact, extreme value analysis, and quantile forecasting, has found wide application in fields as varied as target detection [3], communication systems [4], image analysis [5], power systems [6], and population studies [7].
In this paper, which is an extension of the work presented in [8], we derive a recursive method for Bayesian forecasting of extreme quantiles where the underlying process is non-stationary. This differs from our previous work in that our focus here is on the predictive distribution of future observations, rather than simply the model parameters, and we also properly invert the predictive distribution to obtain efficient quantile forecasts. In our previous work, we obtained improved parameter estimates relative to traditional methods, but our prediction ability, as measured by predictive p-values, was hampered by a number of issues. We have made several improvements to that work, including modifying the support assumptions on the observed data and implementing an adaptive particle filtering algorithm, both of which result in demonstrable improvement in quantile forecasting.
We model the maximum of a block of data, which has an asymptotic generalized extreme value (GEV) distribution [9], via a non-linear state-space model where the parameters are driven by a hidden stochastic process with unknown static parameters. To recursively estimate the model parameters, we utilize a particle filter (PF) [10] and, in particular, we derive a Rao-Blackwellized particle filter (RBPF) [11] to marginalize the unknown, static parameters. Importantly, we derive a recursive solution for the predictive state density, which is required for the particle filter, and we design an algorithm that bears similarity to the well-known Kalman filter [12] and is therefore readily implemented.
The work presented here, and in [8], is an extension of some previous studies. In [13], a deterministic trend is applied to a subset of the GEV parameters, while in [14,15,16], a GEV parameter subset is dynamic with a known state equation. More recently, Ref. [17] developed a dynamic model for the tail-index and a Gaussian-mixture approximation to linearize the estimation problem. Our work extends these previous studies in a number of ways. First, under a reasonable assumption, we reduce the number of GEV parameters and model the remaining set as a vector Markov process with unknown system and covariance matrices. Second, we recursively compute the marginalized state density, eliminating the need to estimate unwanted nuisance parameters, and, in the process, we derive recursive expressions for the necessary sufficient statistics. This allows for a fast, real-time implementation without the need to batch-process observations. Lastly, we derive the predictive distribution for the block-maximum that we use to forecast extreme quantiles.
The paper is organized as follows. In the next section, we formulate the problem for the time-varying block-maximum and propose a Bayesian approach by deriving the recursive solution for marginalizing the nuisance parameters (Rao-Blackwellization). In Section 3, we implement our solution and derive our simplified likelihood function. We describe our algorithm in detail and, using simulated data, show the method's ability to forecast extreme quantiles, outperforming traditional methods. In the last section, we discuss our results and use our approach to analyze S&P 500 stock market returns from 1928 to 2020. We also conclude the paper with ideas for further research.
2. Materials and Methods
The starting point for our analysis is a sequence of block-maxima denoted by $\{y_k\}$. The $k$'th block-maximum is the maximum of a strictly stationary time series, $\{x_t\}$, so that
$$y_k = \max_{t \in T_k}\, x_t,$$
where the index $t$ typically represents time and the $k$'th block consists of data with time indices between $t_{k-1} + 1$ and $t_k$. An example would be in financial markets where the underlying data is the daily return of a stock market index (e.g., S&P 500) and $y_k$ is the largest daily return loss experienced in year $k$ (see Figure 1). In this case, $x_t = -r_t$, where $r_t$ is the daily return defined as the percentage change in the price, $r_t = 100\,(p_t - p_{t-1})/p_{t-1}$.
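As a concrete illustration of this construction, the short sketch below computes yearly block-maxima of daily percentage losses from a date-indexed price series. It assumes NumPy and pandas; the helper `yearly_block_maxima` and the synthetic price path are our own illustrative stand-ins, not part of the paper's method.

```python
import numpy as np
import pandas as pd

def yearly_block_maxima(prices: pd.Series) -> pd.Series:
    """Largest daily percentage loss per calendar year (the block-maxima y_k)."""
    returns = 100.0 * prices.pct_change().dropna()  # daily return r_t in percent
    losses = -returns                               # x_t = -r_t, so losses are positive
    return losses.groupby(losses.index.year).max()  # one maximum per yearly block

# Example with a synthetic price path standing in for, e.g., S&P 500 closes:
rng = np.random.default_rng(0)
dates = pd.bdate_range("2018-01-01", "2020-12-31")
prices = pd.Series(100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(len(dates)))),
                   index=dates)
print(yearly_block_maxima(prices))
```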
Similar to the central-limit theorem, there is a limiting distribution for the normalized block-maximum of i.i.d. random variables (RVs). The Fisher-Tippett-Gnedenko (FTG) theorem states that the only non-degenerate limiting distribution for the block-maximum is in the generalized extreme value (GEV) family [18] with shape parameter, or tail-index, $\xi$. That is, the cumulative distribution function (cdf) of the block-maximum, properly normalized, converges to a distribution $G(y)$ as the block length $n \to \infty$, which has the Jenkinson-von Mises representation [19],
$$G(y;\,\xi,\sigma,\mu) = \exp\left\{-\left[1 + \xi\left(\frac{y - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \qquad 1 + \xi\left(\frac{y - \mu}{\sigma}\right) > 0. \qquad (1)$$
When the tail-index is positive ($\xi > 0$), the resulting distribution is the Fréchet distribution, with $G(y) = \exp\{-y^{-1/\xi}\}$ for $y > 0$ in standard form. The Fréchet is the limiting distribution for many heavy-tailed underlying random variables [20]. Upon normalization, a three-parameter family is obtained and, therefore, we asymptotically model each of the block-maxima as
$$y_k \sim G(y;\,\xi_k,\sigma_k,\mu_k), \qquad (2)$$
with $\xi_k > 0$ and support $y_k > \mu_k - \sigma_k/\xi_k$. We note that the FTG theorem's i.i.d. assumption can be relaxed to include most strictly stationary processes [21].
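For reference, Equation (1) and its inverse (the quantile function) are simple to code directly; a minimal NumPy sketch follows. Coding the cdf by hand also sidesteps sign-convention pitfalls: for instance, scipy.stats.genextreme parameterizes the shape as $c = -\xi$.

```python
import numpy as np

def gev_cdf(y, xi, sigma, mu):
    """GEV cdf of Equation (1); written for the heavy-tailed case xi > 0."""
    z = 1.0 + xi * (np.asarray(y, dtype=float) - mu) / sigma
    # Below the lower support point (z <= 0) the cdf is exactly 0.
    return np.where(z > 0.0, np.exp(-np.clip(z, 1e-12, None) ** (-1.0 / xi)), 0.0)

def gev_quantile(q, xi, sigma, mu):
    """Inverse cdf: solve G(y) = q for 0 < q < 1."""
    return mu + (sigma / xi) * ((-np.log(q)) ** (-xi) - 1.0)
```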
To capture the non-stationary effects and clustering often witnessed in real-world data (see Figure 1), we propose to model the parameters of the GEV distribution as time-varying and, to ensure positivity of the shape and scale parameters, we define the unknown state vector as
$$\mathbf{x}_k = \begin{bmatrix} \log \xi_k & \log \sigma_k & \mu_k \end{bmatrix}^T \qquad (3)$$
and the state transition equation as
$$\mathbf{x}_{k+1} = \mathbf{A}\,\mathbf{x}_k + \mathbf{u}_k, \qquad (4)$$
where $\mathbf{u}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$ is an i.i.d. sequence. We assume the matrices $\mathbf{A}$ and $\mathbf{Q}$ are unknown, and the Gaussian process noise, $\mathbf{u}_k$, was chosen for its analytical tractability and maximal entropy property. Modeling the state as a first-order auto-regressive process, as in Equations (3) and (4), is similar to stochastic volatility models [22].
We specify the observation, or measurement, equation as
$$y_k = \mu_k + \sigma_k\, w_k, \qquad (5)$$
where $(\xi_k, \sigma_k, \mu_k)$ are given by the state vector in (3), and $w_k$ is distributed according to (2), with unit scale and zero location, and is, hence, a function of $\mathbf{x}_k$. Specifically, $w_k$ is a standard GEV RV that is scaled and translated and, combined, Equations (4) and (5) form a non-linear state-space model from which we wish to make Bayesian inferences.
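To make the state-space model concrete, the sketch below simulates states from Equation (4) and then draws each observation by inverting the GEV cdf, per Equations (2) and (5). The numerical choices for $\mathbf{A}$, $\mathbf{Q}$, and the initial state are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

p = 3                            # state: [log xi, log sigma, mu]
A = 0.95 * np.eye(p)             # assumed stable transition matrix (illustrative)
Q = 0.05**2 * np.eye(p)          # assumed process-noise covariance (illustrative)
x0 = np.array([np.log(0.3), np.log(1.0), 2.0])  # assumed initial state

def simulate(K):
    """Draw states via Equation (4) and observations via Equations (2) and (5)."""
    xs, ys, x = [], [], x0
    for _ in range(K):
        x = A @ x + rng.multivariate_normal(np.zeros(p), Q)    # state transition (4)
        xi, sigma, mu = np.exp(x[0]), np.exp(x[1]), x[2]
        u = rng.uniform()                                      # invert the GEV cdf
        y = mu + (sigma / xi) * ((-np.log(u)) ** (-xi) - 1.0)  # y_k ~ GEV(xi, sigma, mu)
        xs.append(x); ys.append(y)
    return np.array(xs), np.array(ys)

states, obs = simulate(225)
```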
The Bayesian inferences we wish to make come in the form of three probability density functions (pdf). The first is the state filtering density $p(\mathbf{x}_k \mid y_{1:k})$, where $y_{1:k} = \{y_1, \dots, y_k\}$. The second is the state predictive density $p(\mathbf{x}_{k+1} \mid y_{1:k})$. Lastly, and most germane to our goal, is the predictive distribution of the observation,
$$p(y_{k+1} \mid y_{1:k}) = \int p(y_{k+1} \mid \mathbf{x}_{k+1})\, p(\mathbf{x}_{k+1} \mid y_{1:k})\, d\mathbf{x}_{k+1}. \qquad (7)$$
Since our goal is to estimate extreme quantiles, we require the cdf of this predictive density, $F(y_{k+1} \mid y_{1:k})$, which we can obtain under suitable conditions as
$$F(y_{k+1} \mid y_{1:k}) = \int G(y_{k+1};\, \mathbf{x}_{k+1})\, p(\mathbf{x}_{k+1} \mid y_{1:k})\, d\mathbf{x}_{k+1}. \qquad (8)$$
In words, the cumulative predictive distribution is the expected GEV distribution conditioned on the state predictive density, and our goal is to forecast $\hat{y}^q_{k+1} = F^{-1}(q \mid y_{1:k})$, the $100q$%-quantile.
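With a particle approximation of the state predictive density, Equation (8) and its inversion are straightforward: average the per-particle GEV cdfs on a grid of $y$ values and search for the desired quantile level. The grid-based inversion below is one simple way to do this; the grid itself is an implementation choice, not part of the paper.

```python
import numpy as np

def predictive_quantile(q, particles, y_grid):
    """Forecast the q-quantile by inverting a particle estimate of Equation (8).

    particles : (N, 3) array approximating p(x_{k+1} | y_{1:k}); each row
                holds (log xi, log sigma, mu) for one particle.
    """
    xi = np.exp(particles[:, 0])[:, None]
    sigma = np.exp(particles[:, 1])[:, None]
    mu = particles[:, 2][:, None]
    z = 1.0 + xi * (y_grid[None, :] - mu) / sigma
    cdf_i = np.where(z > 0.0, np.exp(-np.clip(z, 1e-12, None) ** (-1.0 / xi)), 0.0)
    F = cdf_i.mean(axis=0)           # average GEV cdf over particles -> predictive cdf
    j = np.searchsorted(F, q)        # smallest grid point with F(y) >= q
    return y_grid[min(j, len(y_grid) - 1)]
```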
To accomplish our goal, we need to derive a recursive solution for the filtering density, $p(\mathbf{x}_k \mid y_{1:k})$, and the state predictive density, $p(\mathbf{x}_{k+1} \mid y_{1:k})$. This can be done analytically for the case of known linear state-space models with Gaussian noise. However, with a non-linear state-space model, or non-Gaussian disturbances, a numerical approach is needed.
From standard Bayesian analysis, we can write
$$p(\mathbf{x}_{0:k} \mid y_{1:k}) \propto p(y_k \mid \mathbf{x}_k)\, p(\mathbf{x}_{0:k} \mid y_{1:k-1}), \qquad (9)$$
where $\mathbf{x}_{0:k} = \{\mathbf{x}_0, \dots, \mathbf{x}_k\}$ is the state stream, i.e., the state vectors from the initial state, at time 0, up to the current state, at time $k$. The likelihood function, $p(y_k \mid \mathbf{x}_k)$, highlights the assumption that the current observation depends only on the current value of the state vector. If we further assume that the state vector process is Markov, independent of the observations, we can write
$$p(\mathbf{x}_{0:k} \mid y_{1:k}) \propto p(y_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{x}_{0:k-1})\, p(\mathbf{x}_{0:k-1} \mid y_{1:k-1}), \qquad (10)$$
which is the classic Bayesian recursion equation. If we can further invoke that the state process is first-order Markov, so that $p(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$, then we get the integral form of the Bayesian recursion [23],
$$p(\mathbf{x}_k \mid y_{1:k}) \propto p(y_k \mid \mathbf{x}_k) \int p(\mathbf{x}_k \mid \mathbf{x}_{k-1})\, p(\mathbf{x}_{k-1} \mid y_{1:k-1})\, d\mathbf{x}_{k-1}, \qquad (11)$$
which can be implemented with a standard particle filter [24].
In our case, where the static parameters in the state transition Equation (4) are unknown, we cannot directly implement Equation (10) or (11) since the current state depends on its complete history through the unknown parameters. The tack we take is to derive a Rao-Blackwellized particle filter via marginalizing the unknown parameters $(\mathbf{A}, \mathbf{Q})$,
$$p(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}) = \int p(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{A}, \mathbf{Q})\, p(\mathbf{A}, \mathbf{Q} \mid \mathbf{x}_{0:k-1})\, d\mathbf{A}\, d\mathbf{Q}, \qquad (12)$$
to obtain $p(\mathbf{x}_k \mid \mathcal{T}_{k-1})$, where $\mathcal{T}_{k-1}$ are a set of sufficient statistics dependent only on the state stream up to time $k-1$. In doing so, we need an efficient, recursive solution for updating $\mathcal{T}_{k-1}$ to $\mathcal{T}_k$ so that the filter can be implemented without the need for repeated, large matrix operations or the need to retain the complete state history.
To start, we note that Equation (4) can be written compactly as a general linear model for the complete state stream up to time $k$ as
$$\mathbf{X}_k = \mathbf{\Theta}\, \mathbf{Z}_k + \mathbf{E}_k, \qquad (13)$$
where $\mathbf{X}_k = [\mathbf{x}_1, \dots, \mathbf{x}_k]$ is $p \times k$, with $p$ being the number of state variables. The unknown matrix $\mathbf{\Theta} = \mathbf{A}$, and $\mathbf{Z}_k$ has columns given by $\mathbf{z}_i = \mathbf{x}_{i-1}$. The noise term $\mathbf{E}_k$ is a random Normal matrix whose columns are i.i.d. $\mathcal{N}(\mathbf{0}, \mathbf{Q})$ with $\mathbf{Q}$ unknown. For the Rao-Blackwellized particle filter, the columns of $\mathbf{X}_k$ are formed by the particles from 1 to $k$ and the columns of $\mathbf{Z}_k$ include the past particles from 0 to $k-1$. Therefore, Equation (13) is the state transition equation that describes the evolution of the state stream and, in the filter, each particle stream operates under its own version of this multivariate regression model.
The predictive density for the general linear model, with a non-informative Jeffreys prior [25], results in a multivariate Student-t distribution [26], which for a $p$-dimensional random vector is denoted as $St_p(v, \mathbf{m}, \mathbf{S})$, with $v$ degrees of freedom, mean $\mathbf{m}$, and scaling matrix $\mathbf{S}$. Using the non-informative prior, $p(\mathbf{\Theta}, \mathbf{Q}) \propto |\mathbf{Q}|^{-(p+1)/2}$, we can write the marginal predictive density for $\mathbf{x}_{k+1}$, given the state stream, as a multivariate Student-t distribution with $v_k$ degrees of freedom, i.e.,
$$p(\mathbf{x}_{k+1} \mid \mathbf{x}_{0:k}) = St_p\!\left(v_k,\, \mathbf{m}_{k+1},\, \mathbf{S}_{k+1}\right). \qquad (14)$$
The mean of this distribution is derived as
$$\mathbf{m}_{k+1} = \hat{\mathbf{\Theta}}_k\, \mathbf{x}_k, \qquad (15)$$
and the scale matrix is
$$\mathbf{S}_{k+1} = \frac{1 + \mathbf{x}_k^T \left(\mathbf{Z}_k \mathbf{Z}_k^T\right)^{-1} \mathbf{x}_k}{v_k}\; \mathbf{R}_k, \qquad (16)$$
with the matrices defined as
$$\hat{\mathbf{\Theta}}_k = \mathbf{X}_k \mathbf{Z}_k^T \left(\mathbf{Z}_k \mathbf{Z}_k^T\right)^{-1}, \qquad \mathbf{R}_k = \left(\mathbf{X}_k - \hat{\mathbf{\Theta}}_k \mathbf{Z}_k\right)\left(\mathbf{X}_k - \hat{\mathbf{\Theta}}_k \mathbf{Z}_k\right)^T.$$
As is, Equations (14)–(16) can be used to compute the state predictive density and to extrapolate particles from time $k$ to time $k+1$. That said, the resulting matrix operations are computationally expensive and a recursive solution is desired. Letting $\mathbf{P}_k = \left(\mathbf{Z}_k \mathbf{Z}_k^T\right)^{-1}$, we derived recursive expressions for the mean and the scale matrix as follows:
$$\mathbf{K}_k = \frac{\mathbf{P}_{k-1}\, \mathbf{z}_k}{1 + \mathbf{z}_k^T \mathbf{P}_{k-1}\, \mathbf{z}_k}, \qquad (17)$$
$$\hat{\mathbf{\Theta}}_k = \hat{\mathbf{\Theta}}_{k-1} + \left(\mathbf{x}_k - \hat{\mathbf{\Theta}}_{k-1}\, \mathbf{z}_k\right) \mathbf{K}_k^T, \qquad (18)$$
$$\mathbf{P}_k = \mathbf{P}_{k-1} - \mathbf{K}_k\, \mathbf{z}_k^T\, \mathbf{P}_{k-1}, \qquad (19)$$
$$\mathbf{R}_k = \mathbf{R}_{k-1} + \left(\mathbf{x}_k - \hat{\mathbf{\Theta}}_{k-1}\, \mathbf{z}_k\right) \left(\mathbf{x}_k - \hat{\mathbf{\Theta}}_k\, \mathbf{z}_k\right)^T, \qquad (20)$$
$$v_k = v_{k-1} + 1. \qquad (21)$$
These equations bear a resemblance to the recursive equations found in the Kalman filter [27] and are readily implemented with minimal computational and memory requirements. The specific algorithm we use is discussed in the next section, and we note that the above procedure can likewise be used in the case of a linear observation equation with unknown observation matrix.
3. Results
To implement our solution, we first make a simplifying assumption that reduces our state vector to $\mathbf{x}_k = [\log \xi_k \ \ \log \sigma_k]^T$. Recall that for the GEV distribution, with $\xi > 0$, the support for the observation is $y > \mu - \sigma/\xi$, where we have dropped the subscript $k$ for simplicity. To eliminate the parameter $\mu$, we constrain the lower support point, $\mu - \sigma/\xi$, to be zero, allowing us to solve for $\mu = \sigma/\xi$. This is a generalization of the constraint used in [8], where the support was $y > 1$, or $\mu - \sigma/\xi = 1$, resulting in $\mu = 1 + \sigma/\xi$. The main issue in the previous study was that small values for the parameter $\xi$ resulted in a substantial portion of the left tail with negligible probability, since $\mu = 1 + \sigma/\xi \to \infty$ as $\xi \to 0$, resulting in sub-par prediction performance. While the parameter estimates in [8] were quite reasonable, and outperformed the maximum likelihood (ML) method, quantile estimates were biased.
Under our support constraint, we can write
$$\mu_k = \sigma_k / \xi_k, \qquad (22)$$
and the now two-parameter GEV distribution is
$$G(y;\,\xi_k,\sigma_k) = \exp\left\{-\left(\frac{\xi_k\, y}{\sigma_k}\right)^{-1/\xi_k}\right\}, \qquad (23)$$
with support $y > 0$. The likelihood function, $p(y_k \mid \mathbf{x}_k)$, can then be written as
$$p(y_k \mid \mathbf{x}_k) = \frac{1}{\sigma_k} \left(\frac{\xi_k\, y_k}{\sigma_k}\right)^{-1/\xi_k - 1} \exp\left\{-\left(\frac{\xi_k\, y_k}{\sigma_k}\right)^{-1/\xi_k}\right\}, \qquad (24)$$
where the substitutions $\xi_k = e^{x_{1,k}}$ and $\sigma_k = e^{x_{2,k}}$ can readily be made.
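For particle weighting, a direct coding of Equation (24) in log form is numerically safer; a minimal sketch:

```python
import numpy as np

def loglik(y, x):
    """Log of the likelihood in Equation (24) for a single observation y > 0.

    x = (log xi, log sigma) is one particle of the reduced state vector.
    """
    xi, sigma = np.exp(x[0]), np.exp(x[1])
    s = xi * y / sigma
    return -np.log(sigma) - (1.0 / xi + 1.0) * np.log(s) - s ** (-1.0 / xi)
```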
Armed with the likelihood in Equation (24) and the sufficient statistics, computed recursively in Equations (17)–(21), we are ready to implement our version of the RBPF (see Algorithm 1).
Algorithm 1 RBPF recursive implementation.

Initialization ($k = 0$, $N$ particles):
- Draw particles $\mathbf{x}_0^{(i)}$, $i = 1, \dots, N$, from the prior and initialize the sufficient statistics $\{\hat{\mathbf{\Theta}}_0^{(i)}, \mathbf{P}_0^{(i)}, \mathbf{R}_0^{(i)}, v_0\}$.

Recursive Loop ($k = 1, 2, \dots$):
1. Sample $\mathbf{x}_k^{(i)}$, for $i = 1, \dots, N$, from the Student-t predictive density (14) (particle extrapolation).
2. Compute predictive statistics: $\mathbf{m}_k^{(i)}$ via (15) and $\mathbf{S}_k^{(i)}$ via (16), in recursive form.
3. Compute quantile estimates, p-values, and adapt $N$ (see Algorithm 2).
4. Systematically resample particles, $\mathbf{x}_k^{(i)}$, using the likelihood weights $p(y_k \mid \mathbf{x}_k^{(i)})$ in (24).
5. Compute contemporaneous statistics: compute $\mathbf{K}_k^{(i)}$ from (17); update $\hat{\mathbf{\Theta}}_k^{(i)}$ via (18) and $\mathbf{P}_k^{(i)}$ via (19); update $\mathbf{R}_{k-1}^{(i)}$ to $\mathbf{R}_k^{(i)}$ from (20) and $v_k$ from (21).
6. Set $k \leftarrow k + 1$ and repeat the loop.
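Putting the pieces together, the skeleton below sketches one pass of the Algorithm 1 loop, reusing `loglik` and `update_stats` from the earlier sketches. It is an illustrative reading of the algorithm, not the authors' code; the Student-t draw uses the standard normal/chi-square construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_student_t(m, S, v):
    """One draw from a multivariate Student-t St_p(v, m, S)."""
    z = rng.multivariate_normal(np.zeros(len(m)), S)
    return m + z * np.sqrt(v / rng.chisquare(v))

def rbpf_step(particles, stats, y):
    """One pass of the Algorithm 1 loop (illustrative skeleton).

    particles : (N, p) array of current state particles
    stats     : list of per-particle (theta, P, R, v) sufficient statistics
    """
    N = len(particles)
    new = np.empty_like(particles)
    for i, (theta, P, R, v) in enumerate(stats):         # extrapolate via (14)-(16)
        z = particles[i]
        m = theta @ z                                    # predictive mean (15)
        S = (1.0 + z @ P @ z) / v * R                    # predictive scale (16)
        new[i] = sample_student_t(m, S, v)
    logw = np.array([loglik(y, x) for x in new])         # weights from (24)
    w = np.exp(logw - logw.max()); w /= w.sum()
    u = (rng.uniform() + np.arange(N)) / N               # systematic resampling
    idx = np.minimum(np.searchsorted(np.cumsum(w), u), N - 1)
    stats = [update_stats(*stats[j], new[j], particles[j]) for j in idx]  # (17)-(21)
    return new[idx], stats
```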
In Algorithm 1, estimates for $\xi_k$ and $\sigma_k$ are produced by a weighted sum of the particles $\{\mathbf{x}_k^{(i)}\}$ available after processing the observation, $y_k$, at time $k$. To illustrate the performance of the algorithm, we simulated the stochastic GEV parameters from an uncoupled, mean-reverting AR(1) process of the form
$$x_{j,k+1} = \bar{x}_j + a\left(x_{j,k} - \bar{x}_j\right) + u_{j,k}, \qquad j = 1, 2,$$
where $u_{1,k}$ and $u_{2,k}$ are independent zero-mean Gaussian RVs. The process parameters, i.e., the means, the AR coefficient, and the noise variances, were chosen to be representative of time-series encountered in heavy-tailed phenomena such as financial markets, and we refer the reader to [8] for representative simulation examples.
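The sketch below shows one way to generate such uncoupled, mean-reverting AR(1) parameter paths. The mean levels, AR coefficient, and noise scale are placeholders; the paper's specific settings are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_params(K, a=0.9, mean_logxi=np.log(0.3), mean_logsig=np.log(1.0), q=0.1):
    """Mean-reverting AR(1) paths for log xi_k and log sigma_k (illustrative values)."""
    logxi, logsig = np.empty(K), np.empty(K)
    logxi[0], logsig[0] = mean_logxi, mean_logsig
    for k in range(1, K):
        logxi[k] = mean_logxi + a * (logxi[k - 1] - mean_logxi) + q * rng.standard_normal()
        logsig[k] = mean_logsig + a * (logsig[k - 1] - mean_logsig) + q * rng.standard_normal()
    return np.exp(logxi), np.exp(logsig)

xi_path, sigma_path = simulate_params(225)
```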
We produced 225 samples for each simulation run and initialized the filter with a fixed number of particles. At each point in time we computed estimates for the GEV parameters $\xi_k$ and $\sigma_k$. We also computed the maximum likelihood (ML) estimates for the GEV parameters from (24) and used at least 25 observations ($k \geq 25$) before reporting estimation errors. For each of the 100 simulation runs, we computed a root-mean-square (RMS) error for the parameter estimates, across time, and shown in Figure 2 are the histograms of the RMS errors for both the RBPF and the ML method. We can see that the RBPF method outperforms, with an average RMS error for $\xi$ of 0.11 vs. 0.17 for the ML method. Similarly, the average RMS error for $\sigma$ was 0.32% vs. 0.41% for the ML method.
One of the improvements we made to our original algorithm was to use systematic resampling of the existing set of particles, versus standard multinomial resampling, which improves the filter's performance. The other improvement was to adapt the number of particles based on an assessment of the filter's predictive performance, as in [28], although we modified their method so that new particles, if needed, are generated from the initial prior distribution. Finally, the quantile estimates, $\hat{y}^q_{k+1} = \hat{F}^{-1}(q \mid y_{1:k})$ for quantile level $q$, based on averaging the GEV cdf over the particles that approximate the predictive state distribution, were improved. In particular, we computed an estimate of the predictive distribution via approximating Equation (8), versus plugging point estimates of the GEV parameters into the GEV cdf. Algorithm 2 provides the details of the method used for quantile estimation, computing p-values, and adapting the number of particles in the filter.
To test our approach, we examined the predictive performance of our method, outlined in Algorithm 2, versus maximum likelihood. We applied a chi-square test as described in [29]. At each time $k$, we effectively partitioned the support of the predictive cdf estimate, $\hat{F}(y_{k+1} \mid y_{1:k})$, into $B$ equiprobable buckets. For the ML method, we did the same using the ML estimates of the GEV parameters as "truth". We then compared the $(k+1)$st observation, $y_{k+1}$, to each bucket interval to find its location and to create an empirical frequency distribution. If the model is true, this frequency distribution will be a sample estimate of a uniform distribution, and the statistic $D$ is approximately chi-square with $B - 1$ degrees of freedom (see Algorithm 2 for details).
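The test reduces to counting which equiprobable bucket each realized observation falls into and applying a standard chi-square test to the counts; a minimal sketch, with SciPy supplying the chi-square tail probability:

```python
import numpy as np
from scipy import stats

def predictive_pvalue(counts):
    """Chi-square uniformity test on bucket counts of realized observations.

    counts[b] = number of observations landing in equiprobable bucket b.
    """
    counts = np.asarray(counts, dtype=float)
    n, B = counts.sum(), len(counts)
    D = np.sum((counts - n / B) ** 2 / (n / B))  # chi-square statistic
    return stats.chi2.sf(D, df=B - 1)            # p-value with B-1 degrees of freedom

# Example: 200 forecasts sorted into 10 equiprobable buckets
print(predictive_pvalue([22, 18, 21, 19, 20, 17, 23, 20, 21, 19]))
```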
For our test, we partitioned the support into $B$ equiprobable intervals, and shown in Figure 3 are the frequency distributions for the entire 100 runs. We note that, in total, there were 20,000 sets of quantiles forecasted and each forecast was based solely on past data. More importantly, each prediction was based on a varied amount of past data. For example, in each simulation run, the first set of quantile forecasts was based on just 25 observations ($k = 25$) and the last forecasts were based on all of that run's observations except the last ($k = 224$). We see that the RBPF method has a more uniform frequency distribution compared to the ML. The ML, similar to our previous work in [8], tended to under-estimate the 5% quantile, leading to more observations below it than expected. More importantly, the extreme quantiles were overestimated, leading to fewer observations exceeding those thresholds. While one might argue that the bias toward higher thresholds, for those extreme quantiles, is at least "conservative", it is clearly more beneficial to have a more accurate cdf and quantile forecast, particularly since these quantile forecasts are typically used to compute exceedance probabilities over multi-block periods.
Algorithm 2 Compute quantile estimates, p-values, and adapt the number of particles $N$.

Initialization (prior to Recursive Loop):
- Set bucket probabilities $q_b = b/B$ for $b = 1, \dots, B$ ($B$ = # of buckets).
- Create the y-grid $\{y_j\}$ over the support.
- Initialize counters $c_b = 0$ for $b = 1, \dots, B$.

Within Recursive Loop (each time $k$):
1. Compute the predictive cdf on the grid: $\hat{F}(y_j) = \frac{1}{N} \sum_{i=1}^{N} G(y_j;\, \mathbf{x}_k^{(i)})$.
2. Invert the predictive cdf: for each $q_b$, find the index $j_b$ such that $\hat{F}(y_{j_b}) \leq q_b < \hat{F}(y_{j_b+1})$.
3. Increment the counter $c_b$ of the bucket containing the realized observation.
4. If enough observations have been processed (compute p-values):
   - Compute the statistic $D = \sum_{b=1}^{B} (c_b - n/B)^2 / (n/B)$, where $n$ is the number of counted observations.
   - Compute the p-value using a $\chi^2_{B-1}$ test.
   - If the p-value falls below a lower threshold (double # of particles): initialize new particles (see Initialization in Algorithm 1).
   - If the p-value exceeds an upper threshold (halve # of particles): randomly discard existing particles.
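The particle-count adaptation in Algorithm 2 can be sketched as follows. The p-value thresholds and the `draw_prior` helper are illustrative assumptions; per the text, new particles are generated from the initial prior distribution.

```python
import numpy as np

def adapt_particle_count(particles, stats, pval, draw_prior, low=0.05, high=0.5):
    """Double or halve the particle set based on the predictive p-value.

    draw_prior(n) should return n fresh (particle, sufficient-statistics) pairs,
    as in the Initialization step of Algorithm 1.
    """
    rng = np.random.default_rng()
    N = len(particles)
    if pval < low:                                   # poor predictive fit: double N
        fresh, fresh_stats = draw_prior(N)
        particles = np.vstack([particles, fresh])
        stats = stats + fresh_stats
    elif pval > high:                                # comfortable fit: halve N
        keep = rng.choice(N, size=N // 2, replace=False)
        particles, stats = particles[keep], [stats[j] for j in keep]
    return particles, stats
```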
To further illustrate the improvement in the performance of our approach, we show the histogram of the final p-values, for the 100 simulation runs, in Figure 4. The ML method has about 35 of the p-values in the range [0, 0.1), compared to an expected count of 10, and most of the p-values are less than 0.5. This indicates that one would reject the ML model in most instances. In our previous work, our simulated p-values were worse than the ML method due to the reasons cited previously. Our new RBPF method, derived herein, has p-values close to the expected number for each bucket (i.e., 10) and produces quite reasonable quantile forecasts. To further assess the accuracy of forecasting extreme quantiles with the RBPF, we plotted the empirical quantile estimate versus the predicted in Figure 5.
As seen in Figure 5, at the predicted 99.9%-quantile, 99.87% of the observations were below the threshold. Stated differently, 26 out of the 20,000 predictions exceeded the forecasted 99.9%-quantile threshold, which is six more than expected. We find this to be excellent given that many of the forecasts are made with limited data in hand. At the 99%-quantile, 0.15% additional observations exceeded our forecasts: out of the 20,000 predictions, 200 exceedances were expected versus the 230 observed. Our simulation results bear out the view that our methodology not only outperforms traditional quantile forecast methods (e.g., ML) but also our previous work in [8].
4. Discussion
From our results, illustrated in the previous section, we see that our methodology offers improvements in forecasting extreme quantiles, particularly when the time-series is non-stationary. This is true in many applications where extreme behaviour tends to cluster in time, such as in finance, or trend over time, such as in climatology. To illustrate the application of our approach to real-world data, we applied it to the data shown in Figure 1, which are the block-maxima of daily losses, or negative returns expressed in percent, for the S&P 500 stock market index, a widely used proxy for the overall stock market.
For completeness, we show in Figure 6 the likelihood surface for all of the data from 1928–2020 (93 years/block-maxima). While the ML estimates of the GEV parameters, using all of the data, correspond to the peak of this surface, one can see there is a large degree of uncertainty in the GEV parameters. For example, there is considerable probability that $\xi > 1/2$, which would imply an underlying time-series with infinite variance, and this has implications for financial models that rely on finite second moments (e.g., Black-Scholes). The Bayesian approach is to use all of the information available from the data to produce posterior densities from which to produce parameter estimates and, more importantly, to forecast. This is particularly important when forecasting quantiles since they can be quite sensitive to point estimates given the highly non-linear form of the GEV cdf.
Shown in Figure 7 are the S&P 500 block-maxima along with quantile forecasts over time. For the 90% quantile forecast, 8 out of 92 observed stock-market returns exceeded the forecast, or 8.7% of the time, which is well within what one would expect. The current, forecasted 90% quantile is 7.1% and the 99% quantile forecast is 23.8%. We should expect to see this type of market "crash", or worse, about once a century. While the range of the forecasted 90% quantile, from 5.6% in the 1970s to 8% post financial crisis, may seem minimal, we do note that more extreme quantile estimates have significant variation. For example, the 99% quantile forecast went from 18% to 25% and the 99.5% quantile went from 25.8% to 35% over the same period. This would be of use to a financial institution when constructing risk scenarios for stress testing their portfolios. Last year, in 2020, the block-maximum was a 12% loss, during the COVID-19 pandemic crisis, which was at the 96% quantile. We should expect to see at least this 12% loss about once every 25 years, and it should not be considered unusual, highlighting the importance of quantile forecasting: bad things do happen!
It is worth pointing out that the p-value based on the forecasted quantiles was a robust 0.75, versus 0.01 for the ML method. While one must be wary of using p-values to accept a model, we feel that the results we have shown in this paper clearly indicate an improved method of forecasting. But much work remains to be done. For one, we simply used block-maxima in this paper, which one can argue wastes data. We would like to extend our approach to use higher-order statistics, or exceedances above a threshold. We would also like to broaden our simulation study and apply our results to other data sets, particularly in the area of climatology. Lastly, additional computational efficiencies should be explored to allow real-time application to high-frequency data.