Bayesian Forecasting of Dynamic Extreme Quantiles

Johnston, Douglas E.

doi:10.3390/forecast3040045

Open AccessArticle

Bayesian Forecasting of Dynamic Extreme Quantiles

by

Douglas E. Johnston

Farmingdale State College, The State University of New York, Farmingdale, NY 11735, USA

Forecasting 2021, 3(4), 729-740; https://doi.org/10.3390/forecast3040045

Submission received: 1 September 2021 / Revised: 1 October 2021 / Accepted: 7 October 2021 / Published: 11 October 2021

(This article belongs to the Special Issue Bayesian Time Series Forecasting)

Download

Browse Figures

Versions Notes

Abstract

:

In this paper, we provide a novel Bayesian solution to forecasting extreme quantile thresholds that are dynamic in nature. This is an important problem in many fields of study including climatology, structural engineering, and finance. We utilize results from extreme value theory to provide the backdrop for developing a state-space model for the unknown parameters of the observed time-series. To solve for the requisite probability densities, we derive a Rao-Blackwellized particle filter and, most importantly, a computationally efficient, recursive solution. Using the filter, the predictive distribution of future observations, conditioned on the past data, is forecast at each time-step and used to compute extreme quantile levels. We illustrate the improvement in forecasting ability, versus traditional methods, using simulations and also apply our technique to financial market data.

Keywords:

extreme value theory; quantile forecasting; particle filter; risk-management; value-at-risk

1. Introduction

Forecasting extreme quantiles, which is a part of extreme value (EV) analysis, is an important problem in many applications. For example, in analyzing ocean storm severity it is often critical to forecast an extreme quantile, such as the 99% quantile, for the maximum wave height over a certain time period to facilitate adequate construction design [1]. In financial risk-management, forecasting extreme quantiles, or so-called VaR analysis, is critical for establishing adequate bank capital levels [2]. In fact, extreme value analysis, and quantile forecasting, has found wide application in fields as varied as target detection [3], communication systems [4], image analysis [5], power systems [6], and population studies [7].

In this paper, which is an extension of the work presented in [8], we derive a recursive method for Bayesian forecasting of extreme quantiles where the underlying process is non-stationary. This differs from our previous work given our focus here is on the predictive distribution of future observations, rather than simply the model parameters, and we also properly invert the predictive distribution to obtain efficient quantile forecasts. In our previous work, we obtained improved parameter estimates, versus traditional methods, but our prediction ability, as measured by predictive p-values, was hampered by a number of issues. We made a number of improvements to our previous work including modifying the support assumptions on the observed data as well as implementing an adaptive particle filtering algorithm both of which resulted in demonstrable improvement in quantile forecasting.

We model the maximum of a block of data, which has an asymptotic general extreme value (GEV) distribution [9], via a non-linear state-space model where the parameters are driven by a hidden stochastic process with unknown static parameters. To recursively estimate the model parameters, we utilize a particle filter (PF) [10] and, in particular, we derive a Rao-Blackwellized particle filter (RBPF) [11] to marginalize the unknown, static parameters. Importantly, we derive a recursive solution for the predictive state density, which is required for the particle filter, and we design an algorithm that bears similarity to the well-known Kalman filter [12] and is therefore readily implemented.

The work presented here, and in [8], is an extension of some previous studies. In [13], a deterministic trend is applied to a subset of the GEV parameters, while in [14,15,16], a GEV parameter subset is dynamic with known state equation. More recently, Ref. [17] developed a dynamic model for the tail-index and a Gaussian-mixture approximation to linearize the estimation problem. Our work has extended these previous studies in a number of ways. First, under a reasonable assumption, we reduce the number of GEV parameters and model the remaining set as a vector Markov process, with unknown system and covariance matrix. Second, we recursively compute the marginalized state density, eliminating the need to estimate unwanted nuisance parameters and, in the process, we derive recursive expressions for the necessary sufficient statistics. This allows for a fast, real-time implementation without the need to batch-process observations. Lastly, we derive the predictive distribution, for the block-maximum, that we use to forecast extreme quantiles.

The paper is organized as follows. In the next section, we formulate the problem for the time-varying block-maximum and propose a Bayesian approach by deriving the recursive solution for marginalizing the nuisance parameters-Rao-Blackwellization. In Section 3, we implement our solution and derive our simplified likelihood function. We describe our algorithm in detail and show the method’s ability to forecast extreme quantiles, outperforming traditional methods, using simulated data. In the last section, we discuss our results and use our approach to analyze S&P 500 stock market returns from 1928 to 2020. We also conclude the paper with ideas for further research.

2. Materials and Methods

The starting point for our analysis is a sequence of block-maximums denoted by

y_{k} \in R, k = 1, \dots N

. The k’th block-maximum is the maximum of a strictly stationary time series,

s_{t},

so that

y_{k} = max_{N_{k - 1} < t \leq N_{k}} s_{t},

(1)

where the index t typically represents time and the k’th block consists of data with time indices

(N_{k - 1}, N_{k}]

and

N_{0} = 0

. An example would be in financial markets where the underlying data is the daily return of a stock market index (e.g., S&P 500) and

y_{k}

is the largest daily return loss experienced in year k (see Figure 1). In this case,

s_{t} = - r_{t}

, where

r_{t} = (p_{t} - p_{t - 1}) / p_{t - 1}

is the daily return defined as the percentage change in the price,

p_{t}

.

Similar to the central-limit theorem, there is a limiting distribution for the normalized block-maximum of i.i.d. random variables (RVs). The Fisher-Tippet-Gnedenko (FTG) theorem states that the only non-degenerate limiting distribution for the block-maximum, is in the generalized extreme value (GEV) family [18] with shape parameter, or tail-index,

ξ

. That is, the cumulative distribution function (cdf) of the block-maximum,

y_{k}

, properly normalized, converges to a distribution

H_{ξ}

as

(N_{k} - N_{k - 1}) \to \infty

which has the Jenkinson-von Mises representation [19],

H_{ξ} (y) = exp (- {(1 + ξ y)}^{- 1 / ξ}) .

(2)

When the tail-index is positive (

ξ > 0

), the resulting distribution is the Fréchet distribution with

y \geq - 1 / ξ

. The Fréchet is the limiting distribution for many heavy-tailed underlying random variables [20]. Upon normalization, a three parameter family is obtained and, therefore, we asymptotically model each of the block-maximum

y_{k} ~ H_{ξ, β, μ} (y_{k}) = H_{ξ} (\frac{y_{k} - μ}{β})

, with

β > 0

and support

y \in [μ - β / ξ, \infty)

. We note that the FTG theorem’s i.i.d. assumption can be relaxed to include most strictly stationary processes [21].

To capture the non-stationary effects and clustering often witnessed in real-world data (see Figure 1), we propose to model the parameters of the GEV distribution as time-varying and, to insure positivity for the shape and scale parameters, we define the unknown state vector as

x_{k} = {[log (ξ_{k}), log (β_{k}), μ_{k}]}^{⊤}

(3)

and the state transition equation as

x_{k} = η + Θ x_{k - 1} + C^{1 / 2} u_{k},

(4)

where

x_{k} \in R^{3}

,

k \in N_{0}

,

u_{k} ~ N (0, I_{3})

,

η \in R^{3}

, and

Θ, C \in R^{3 \times 3}

. We assume the matrices

η, Θ,

and

C

are unknown and the Gaussian process noise,

u_{k}

, was chosen for its analytical tractability and maximal entropy property. Modeling the state as a first-order auto-regressive process, as in Equations (3) and (4), is similar to stochastic volatility models [22].

We specify the observation, or measurement, equation as

y_{k} = x_{3_{k}} + e^{x_{2_{k}}} w_{k},

(5)

where

e^{x_{2_{k}}} = β_{k}, x_{3_{k}} = μ_{k}

, and

w_{k}

is distributed according to (2) and, hence, a function of

e^{x_{1_{k}}} = ξ_{k}

. Specifically,

y_{k}

is a standard GEV RV that is scaled and translated and, combined, Equations (4) and (5) form a non-linear state-space model from which we wish to make Bayesian inferences.

The Bayesian inferences we wish to make come in the form of three probability density functions (pdf). The first is the state filtering density

p (x_{k} | y_{1 : k})

where

y_{1 : k} \equiv {y_{1}, y_{2}, \dots, y_{k}}

. The second is the state predictive density

p (x_{k + 1} | y_{1 : k})

. Lastly, and most germane to our goal, is the predictive distribution of the observation

p (y_{k + 1} | y_{1 : k}) = \int p (y_{k + 1} | x_{k + 1}) p (x_{k + 1} | y_{1 : k}) d x_{k + 1} .

(6)

Since our goal is to estimate extreme quantiles, we require the cdf of this predictive density,

F (y_{k + 1} | y_{1 : k}) = \int_{- \infty}^{y_{k + 1}} p (ζ | y_{1 : k}) d ζ,

(7)

which we can obtain under suitable conditions as

F (y_{k + 1} | y_{1 : k}) = E_{x_{k + 1} | y_{1 : k}} [H_{x_{k + 1}} (y_{k + 1})] .

(8)

In words, the cumulative predictive distribution is the expected GEV distribution conditioned on the state predictive density

p (x_{k + 1} | y_{1 : k})

and our goal is to forecast

η_{α} = F^{- 1} (α | y_{1 : k})

or the

α

%-quantile.

To accomplish our goal, we need to derive a recursive solution for the filtering density,

p (x_{k} | y_{1 : k})

, and the state predictive density,

p (x_{k + 1} | y_{1 : k})

. This can be done analytically for the case of known linear state-space models with Gaussian noise. However, with a non-linear state-space model, or non-Gaussian disturbances, a numerical approach is needed.

From standard Bayesian analysis, we can write

p (x_{0 : k} | y_{1 : k}) \propto p (y_{k} | x_{k}) p (x_{0 : k} | y_{1 : k - 1}),

(9)

where

x_{0 : k} \equiv {x_{0}, x_{1}, \dots, x_{k}}

is the state stream which are the state vectors from the initial state, at time 0, up to the current state, at time k. The likelihood function,

p (y_{k} | x_{k})

, highlights the assumption that the current observation depends only on the current value of the state vector. If we further assume that the state vector process is Markov, independent of the observations, we can write

p (x_{0 : k} | y_{1 : k}) \propto {p (y_{k} | x_{k}) p (x_{k} | x_{0 : k - 1})} p (x_{0 : k - 1} | y_{1 : k - 1}),

(10)

which is the classic Bayesian recursion equation. If we can further invoke that the state process is first-order Markov, so that

p (x_{k} | x_{0 : k - 1}) = p (x_{k} | x_{k - 1})

, then we get the integral form of the Bayesian recursion [23],

p (x_{k} | y_{1 : k}) \propto p (y_{k} | x_{k}) \int_{x_{k - 1}} p (x_{k} | x_{k - 1}) p (x_{k - 1} | y_{1 : k - 1}) d x_{k - 1},

(11)

which can be implemented with a standard particle filter [24].

In our case, where the static parameters in the state transition equation (4) are unknown, we can not directly implement Equations (10) or (11) since the current state depends on its complete history through the unknown parameters. The tact we take is to derive a Rao-Blackwellized particle filter via marginalizing

p (x_{k}, η, Θ, C | x_{0 : k - 1})

,

\int_{η, Θ, C} p (x_{k} | η, Θ, C, x_{0 : k - 1}) d P (η, Θ, C | x_{0 : k - 1}),

(12)

to obtain

p (x_{k} | T_{k - 1}

), where

T_{k - 1}

are a set of sufficient statistics dependent only on the state stream up to time

k - 1

. In doing so, we need an efficient, recursive solution to

T_{k - 1}

so that the filter can be implemented without the need for repeated, large matrix operations or the need to retain the complete state history.

To start, we note that Equation (4) can be written compactly as a general linear model for the complete state stream up to time k as

X_{k} = Φ H_{k} + U_{k},

(13)

where

X_{k} = [x_{k}, x_{k - 1}, \dots, x_{1}] \in R^{p \times k}

, with p being the number of state variables. The unknown matrix

Φ = [η Θ]

\in R^{p \times p + 1}

and

H_{k} = [h_{k}, h_{k - 1}, \dots, h_{1}] \in R^{q \times k}

has columns given by

h_{k}^{⊤} = [1 x_{k - 1}^{T}]

. The noise term

U_{k} \in R^{p \times k}

is a random Normal matrix whose columns are i.i.d.

N (0, C)

with

C

unknown. For the Rao-Blackwellized particle filter, the columns of

X_{k}

are formed by the particles from 1 to k and the columns of

H_{k}

include the past particles from 0 to

k - 1

. Therefore, Ref. (13) is the state transition equation that describes the evolution of the state stream and, in the filter, each particle stream operates under its own version of this multivariate regression model.

The predictive density for the general linear model, with a non-informative, Jeffreys prior [25], results in a multivariate Student-t distribution [26], which for a p-dimensional random vector is denoted as

t_{p} (v, μ, Σ)

\propto {[1 + {(x - μ)}^{T} Σ^{- 1} (x - μ) / v]}^{- (v + p) / 2}

with v degrees of freedom, mean

μ

, and scaling matrix

Σ

. Using the non-informative prior,

p (Θ, Θ, C^{- 1}) \propto {| C |}^{(p + 1) / 2}

, we can write the marginal predictive density for

x_{k + 1}

as a multivariate Student-t distribution with

v_{k} = k - 2 p

degrees of freedom, i.e.,

p (x_{k + 1} | X_{k}, H_{k}, h_{k + 1}) ~ t_{p} (v_{k}, {\hat{x}}_{k + 1}, {\hat{Σ}}_{k + 1}) .

The mean of this distribution is derived as

{\hat{x}}_{k + 1} = S_{X H}^{k} {(S_{H H}^{k})}^{- 1} h_{k + 1} = {\hat{Φ}}_{k} h_{k + 1}

(14)

and the scale matrix is

{\hat{Σ}}_{k + 1} = \frac{1 + h_{k + 1}^{⊤} {(S_{H H}^{k})}^{- 1} h_{k + 1}}{v_{k}} (S_{X X}^{k} - {\hat{Φ}}_{k} {S_{X H}^{k}}^{⊤}),

(15)

with the matrices defined as

S_{X X}^{k} = X_{k} X_{k}^{⊤}, S_{X H}^{k} = X_{k} H_{k}^{⊤}, S_{H H}^{k} = H_{k} H_{k}^{⊤} .

(16)

As is, Equations (14)–(16) can be used to compute the state predictive density and to extrapolate particles from time k to time

k + 1

. That said, the resulting matrix operations are expensive computationally and a recursive solution is desired. Letting

P_{k} = {(S_{H H}^{k})}^{- 1}

, we derived recursive expressions for the mean and the scale matrix as follows:

\begin{matrix} K_{k} & = h_{k}^{⊤} P_{k - 1} {[1 + h_{k}^{⊤} P_{k - 1} h_{k}]}^{- 1}, \end{matrix}

(17)

\begin{matrix} P_{k} & = P_{k - 1} [I - h_{k} K_{k}], \end{matrix}

(18)

\begin{matrix} {\hat{Φ}}_{k} & = {\hat{Φ}}_{k - 1} + (x_{k} - {\hat{x}}_{k}) K_{k}, \\ {\hat{Σ}}_{k + 1} & = \frac{v_{k} - 1}{v_{k}} \{\frac{1 + h_{k + 1}^{⊤} P_{k} h_{k + 1}}{1 + h_{k}^{⊤} P_{k - 1} h_{k}}\} \end{matrix}

(19)

\begin{matrix} \times \{{\hat{Σ}}_{k} + (x_{k} - {\hat{x}}_{k}) {(x_{k} - {\hat{x}}_{k})}^{⊤} / (v_{k} - 1)\}, \end{matrix}

(20)

\begin{matrix} {\hat{x}}_{k + 1} & = {\hat{Φ}}_{k} h_{k + 1} . \end{matrix}

(21)

These equations bear a resemblance to the recursive equations found in the Kalman filter [27] and are readily implemented with minimum computational and memory requirements. The specific algorithm we use is discussed in the next section and we note that the above procedure can likewise be used in the case of a linear observation equation with unknown observation matrix.

3. Results

To implement our solution, we first make a simplifying assumption that reduces our state vector to

x_{k} = {[log (ξ_{k}), log (β_{k})]}^{⊤}

. Recall that for the GEV distribution, with

ξ > 0

, the support for the observation is

y \in [μ - β / ξ, \infty)

, where we have dropped the subscript k for simplicity. To eliminate the parameter

μ

, we constrain

H_{ξ, β, μ} (y_{0}) = α_{0}

allowing us to solve for

μ = - H_{ξ, β, y_{0}}^{- 1} (α_{0})

. This is a generalization of the constraint used in [8] where the support was

y \in [0, \infty)

or

α_{0} = y_{0} = 0

resulting in

μ = β / ξ

. The main issue in the previous study was that small values for the parameter

ξ

resulted in a substantial portion of the left tail with negligible probability since

H_{ξ, β, β / ξ}^{- 1} (α) \to \infty

as

ξ \to 0^{+} \forall α > 0

resulting in sub-par prediction performance. While the parameter estimates in [8] were quite reasonable, and outperformed the maximum likelihood (ML) method, quantile estimates were biased.

Under our support constraint, we can write

μ = y_{0} - \frac{β}{ξ} [{(- ln (α_{0}))}^{- ξ} - 1]

(22)

and the now two-parameter GEV distribution is

H_{ξ, β} (y) = exp [- {(\frac{ξ}{β} (y - y_{0}) + {(- ln (α_{0}))}^{- ξ})}^{- 1 / ξ}]

(23)

with support

y \in [y_{0} - \frac{β}{ξ} {(- ln (α_{0}))}^{- ξ}, \infty)

. The likelihood function,

p (y | x)

, can then be written as

p (y | x) = \frac{d}{d y} H_{ξ, β} (y) = \frac{1}{β} H_{ξ, β} (y) {(\frac{ξ}{β} (y - y_{0}) + {(- ln (α_{0}))}^{- ξ})}^{- (1 + ξ^{- 1})},

(24)

where the substitutions

ξ = e^{x_{1}}

and

β = e^{x_{2}}

can readily be made.

Armed with the likelihood in Equation (24) and the sufficient statistics, computed recursively in Equations (17)–(21), we are ready to implement our version of the RBPF (see Algorithm 1).

Algorithm 1 RBPF recursive implementation.

Initialization (

k = 0

,

m = 1 : M

particles):

Generate $\tilde{p} > 2 p + 1$ random particles, $x_{0 : \tilde{p}}^{(m)}$ from (13) with random $Φ, C$ .
Compute $S_{X X}^{\tilde{p}}$ , $S_{X H}^{\tilde{p}}$ , and $S_{H H}^{\tilde{p}}$ from (16).
Initialize ${\hat{Φ}}_{0} = S_{X H}^{\tilde{p}} {(S_{H H}^{\tilde{p}})}^{- 1}$ , ${\hat{x}}_{1}$ from (14) and ${\hat{Σ}}_{1}$ from (15).
Let $P_{0} = {(S_{H H}^{\tilde{p}})}^{- 1}$ and $v_{k} = \tilde{p} - 2 p$ .
Set particle weights $w^{(m)} = 1 / M$ and $M_{1} = M$ .

Recursive Loop (

k = 1 : N

):

Sample $x_{k}^{(m) -} ~ t_{p} (v_{k}, {\hat{x}}_{k}, {\hat{Σ}}_{k})$ for $m = 1, \dots M_{k}$ . (particle extrapolation)
Compute predictive statistics: ${\hat{x_{k}}}^{-} = \sum_{m = 1}^{M} w^{(m)} x_{k}^{(m) -}$
Compute quantile estimates, p-values, and $M_{k}$ (see Algorithm 2)
Systematically resample particles, $x_{k}^{(m) -} \to x_{k}^{(m) +}$ , using $p (y_{k} | x_{k})$ in (24)
Compute contemporaneous statistics: ${\hat{x_{k}}}^{+} = \sum_{m = 1}^{M} w^{(m)} x_{k}^{(m) +}$
Compute $K_{k}$ from (17). Update $P_{k - 1} \to P_{k}$ via (18) and ${\hat{Φ}}_{k - 1} \to {\hat{Φ}}_{k}$ via (19)
Update to ${\hat{Σ}}_{k + 1}$ from (20) and ${\hat{x}}_{k + 1}$ from (21)
$k = k + 1$ , $v_{k} = v_{k} + 1$ , and repeat loop

In Algorithm 1, estimates for

ξ_{k}

and

β_{k}

are produced by a weighted sum of the particles

x_{k}^{(m) +}

available after processing the observation,

y_{k}

, at time k. To illustrate the performance of the algorithm, we simulated the stochastic GEV parameters from an uncoupled, mean-reverting, AR(1) process as follows

\begin{matrix} log ξ_{k} & = (1 - 0.95) \times log (0.45) + 0.95 log ξ_{k - 1} + 0.1 u_{1, k}, \end{matrix}

(25)

\begin{matrix} log β_{k} & = (1 - 0.9) \times log (0.012) + 0.9 log β_{k - 1} + 0.1 u_{2, k}, \end{matrix}

(26)

where

u_{1, k}

and

u_{2, k}

are independent

N (0, 1)

. The process was chosen with

\bar{ξ} = 0.45

and

\bar{β} = 1.25 %

to be representative of time-series encountered in heavy-tailed phenomena such as financial markets and we refer the reader to [8] for representative simulation examples.

We produced 225 samples for each simulation run and we initialized the filter with

M_{1} = 1024

particles. At each point in time we computed estimates for the GEV parameters

{\hat{ξ}}_{k}

and

{\hat{β}}_{k}

. We also computed the maximum likelihood (ML) estimates for the GEV parameters from (24) and used at least 25 observations (

k \geq 25

) before reporting estimation errors. For each of the 100 simulation runs, we computed a root-mean-square (RMS) error for the parameter estimates, across time, and shown in Figure 2 are the histograms for the RMS errors for both the RBPF and ML method. We can see the RBPF method outperforms with an average RMS error for

\hat{ξ}

of 0.11 vs. 0.17 for the ML method. Similarly, the average RMS error for

\hat{β}

was 0.32% vs. 0.41% for the ML method.

One of the improvements we made to our original algorithm was to use systematic resampling of the existing set of particles versus standard multinomial resampling which improves the filter’s performance. The other improvement was to adapt the number of particles based on an assessment of the filter’s predictive performance as in [28] although we modified their method so that new particles, if needed, are generated from the initial prior distribution. Finally, the quantile estimates,

\hat{η_{α}} = {\hat{F}}^{- 1} (α | y_{1 : k})

for

α \in [0, 1]

, based on averaging

H_{x_{k + 1}} (.)

over the particles that approximate the predictive state distribution, were improved. In particular, we computed an estimate of the predictive distribution, via approximating Equation (8), versus using

E_{x_{k + 1} | y_{1 : k}} [H_{x_{k + 1}}^{- 1} (α)]

. Algorithm 2 provides the details of the method used for quantile estimation, computing p-values, and to adapt the number of particles in the filter.

To test our approach, we examined the predictive performance of our method, outlined in Algorithm 2, versus the maximum likelihood. We applied a chi-square test as described in [29]. At each time k, we effectively partitioned the support of the predictive cdf estimate,

\hat{F} (y_{k + 1} | y_{1 : k})

into B equiprobable buckets. For the ML method, we did the same using the ML estimates of the GEV parameters as “truth”. We then compared the

(k + 1)

st observation,

y_{k + 1}

, to each bucket interval to find its location and to create an empirical frequency distribution. If the model is true, this frequency distribution will be a sample estimate of a uniform distribution and the statistic D is approximately chi-square with

B - 1

degrees of freedom (See Algorithm 2 for details).

For our test, we chose

B = 20

intervals and shown in Figure 3 are the frequency distributions for the entire 100 runs. We note that, in total, there were 20,000 sets of quantiles forecasted and each forecast was based solely on past data. More importantly, each prediction was based on a varied amount of past data. For example, in each simulation run, the first set of quantile forecasts were based on just 25 observations (

y_{1 : 25})

and the last forecasts were based all that run’s observations except the last (

y_{1 : 224})

. We see that the RBPF method has a more uniform frequency distribution compared to the ML. The ML, similar to our previous work in [8], tended to under-estimate the 5% quantile, leading to more observations than expected. More importantly, the extreme quantiles were overestimated leading to less observations exceeding those thresholds. While one might argue that the bias to higher thresholds, for those extreme quantiles, is at least being “conservative”, it is clearly more beneficial to having a more accurate cdf and quantile forecast, particularly since these quantile forecasts are typically used to compute exceedance probabilities over multi-block periods.

Algorithm 2 Compute quantile estimates, p-values, and

M_{k}

.

Initialization (prior to Recursive Loop)

Set $α_{j} = j / B$ for $j = 0, \dots, B$ (B = # buckets)
Create y-grid: $\tilde{y} = 0 : y_{s t e p} : y_{m a x}$ and set ${[{\hat{ξ}}_{k}^{-}, {\hat{β}}_{k}^{-}]}^{T} = exp ({\hat{x_{k}}}^{-})$ .
Initialize counter $C (j) = 0$ for $j = 1, \dots, B$

Within Recursive Loop (

k \geq N_{p r e d}

)

Compute predictive CDF: $\hat{F} (\tilde{y} | y_{1 : k}) = \sum_{m = 1}^{M} w^{(m)} H_{{\hat{ξ}}_{k}^{-}, {\hat{β}}_{k}^{-}} (\tilde{y})$ .
Invert predictive CDF: $η_{α_{j}} = {\hat{F}}^{- 1} (α_{j}) \forall j$
Find index $j^{*}$ such that $y_{k} \in [η_{α_{j^{*} - 1}}, η_{α_{j^{*}}})$ for $j^{*} \in {1, \dots, B}$
Increment counter $C (j^{*})$
If $k \geq 2 * N_{p r e d}$ (compute p-values)
−
$N_{c} = \sum_{j = 1}^{B} C (j)$
−
Compute $χ^{2}$ statistic: $D = B (N_{c} - 1) \sum_{j = 1}^{B} {(C (j) / N_{c} - 1 / B)}^{2}$
−
Compute p-value using $χ_{B - 1}^{2} (D)$ test
−
If p-value < $p_{l o w}$ (double # of particles)
*
$M_{n e w} = M_{k} - min (2 M_{k}, M_{m a x})$
*
Initialize $M_{n e w}$ new particles (See Initialization in Algorithm 1)
*
$M_{k} = M_{k} + M_{n e w}$
−
If p-value > $p_{h i g h}$ (halve # of particles)
*
Randomly discard $M_{k} - max (M_{k} / 2, M_{m i n})$ existing particles
*
$M_{k} = max (M_{k} / 2, M_{m i n})$

To further illustrate the improvement in the performance of our approach, we show the histogram of the final p-values, for the 100 simulation runs, in Figure 4. The ML method has about 35 of the p-values in the range of [0,0.1), compared to an expected amount of 10, and most of the p-values are less than 0.5. This indicates that one would reject the ML method as an alternative hypothesis in most instances. In our previous work, our simulated p-values were worse than the ML method due to the reasons cited previously. Our new RBPF method, derived herein, has p-values close to the expected number for each bucket (i.e., 10), and produces quite reasonable quantile forecasts. To further see the accuracy of forecasting extreme quantiles with the RBPF, we plotted the empirical quantile estimate versus the predicted in Figure 5.

As seen in Figure 5, at the predicted 99.9%-quantile, 99.87% of the observations were below the threshold. Stated differently, 26 out of the 20,000 predictions exceeded the forecasted 99.9%-quantile threshold, which is six more than expected. We find this to be excellent given that many of the forecasts are made with limited data in hand. At the 99%-quantile, 0.15% additional observations exceeded our forecasts. Out of the 20,000 predictions, 200 exceedances were expected versus the 230 were observed. Our simulation results bear out the view that our methodology not only outperforms traditional quantile forecast methods (e.g., ML) but also our previous work in [8].

4. Discussion

From our results, illustrated in the previous section, we see that our methodology offers improvements in forecasting extreme quantiles, particularly when the time-series is non-stationary. This is true in many applications, where extreme behaviour tends to cluster in time, such as in finance, or trend over time, such as in climatology. To illustrate the application of our approach to real-world data, we applied it the data shown in Figure 1 which are the block-maximums of daily losses, or negative returns expressed in percent, for the S&P 500 stock market index, a widely used proxy for the overall stock market.

For completeness, we show in Figure 6 the likelihood surface for all of the data from 1928–2020 (93 years/block-maximums). While the ML estimates, using all of the data, are

{\hat{ξ}}_{M L} = 0.48

and

{\hat{β}}_{M L} = 1.26 %

, one can see there is a large degree of uncertainty in the GEV parameters. For example, there is considerable probability that

ξ > 0.5

, which would imply an underlying time-series with infinite variance and this has implications for financial models that rely on finite second moments (e.g., Black-Scholes). The Bayesian approach is to use all of the information available from the data to produce posterior densities from which to produce parameter estimates and, more importantly, to forecast. This is particularly important when forecasting quantiles since they can be quite sensitive to point estimates given the highly non-linear form of the GEV cdf.

Shown in Figure 7 are the S&P 500 block-maximums along with quantile forecasts over time. For the 90% quantile forecast, 8 out of 92 observed stock-market returns exceeded the forecast, or 8.7% of the time. This is well within what one would expect. The current, forecasted 90% quantile is 7.1% and the 99% quantile forecast is 23.8%. We should expect to see this type of market "crash," or worse, about once a century. While the range of the forecasted 90% quantile, from 5.6% in the 1970s to 8% post financial crisis, may seem minimal, we do note that more extreme quantile estimates have significant variation. For example, the 99% quantile forecast went from 18% to 25% and the 99.5% quantile went from 25.8% to 35% over the same period. This would be of use to a financial institution when constructing risk scenarios for stress testing their portfolios. Last year, in 2020, the block-maximum was a 12% loss, during the covid pandemic crisis, which was at the 96% quantile. We should expect to see at least this 12% loss about once every 25 years and it should not be considered unusual highlighting the importance of quantile forecasting-bad things do happen!

It is worth pointing out that the p-value based on the forecasted quantiles was a robust 0.75 versus 0.01 for the ML method. While one must be wary in using p-values to accept a model, we feel that the results we have shown in this paper clearly indicate an improved method of forecasting. But much work needs to be done. For one, we simply used block-maximums in this paper, which one can argue wastes data. We would like to extend our approach further to use more higher-order statistics, or exceedances above a threshold. We would also like to broaden our simulation study and apply our results to other data sets, particularly in the area of climatology. Lastly, additional computational efficiencies should be explored to allow real-time implementation to high-frequency data.

Funding

This research was funded in part by a Farmingdale State College, Mathematics Department, Summer Research Grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Daily closing prices for the S&P 500 index were downloaded from Yahoo! finance, https://finance.yahoo.com, and are available upon request, access date: 1 September 2021.

Acknowledgments

The author thanks Farmingdale State College and Carlos Marques for their continued support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

cdf	Cumulative denisty function
EV	Extreme Value
FTG	Fisher-Tippet-Gnedenko
GEV	Generalized extreme value
ML	Maximum likelihood
pdf	Probability density function
PF	Particle filter
RBPF	Rao-Blackwellized particle filter
FMS	Root mean square
RV	Random variable
S&P	Standard and Poor
VaR	Value at Risk

References

Northrop, P.J.; Attalides, N.; Jonathan, P. Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. J. R. Stat. Soc. Appl. Stat. 2017, 66, 93–120. [Google Scholar] [CrossRef] [Green Version]
Johnston, D.E.; Djurić, P.M. The science behind risk management: A signal processing perspective. Signal Process. Mag. 2011, 28, 26–36. [Google Scholar] [CrossRef]
Broadwater, J.B.; Chellappa, R. Adaptive threshold estimation via extreme value theory. IEEE Trans. Signal Process. 2010, 58, 490–500. [Google Scholar] [CrossRef]
Resnick, S.I.; Rootzeń, H. Self-similar communication models and very heavy tails. Ann. Appl. Probab. 2000, 10, 753–778. [Google Scholar] [CrossRef]
Roberts, S.J. Extreme value statistics for novelty detection in biomedical data processing. IEEE Proc. Sci. Meas. Technol. 2000, 147, 363–367. [Google Scholar] [CrossRef] [Green Version]
Shenoy, S.; Gorinevsky, D. Estimating long tail models for risk trends. IEEE Signal Process. Lett. 2015, 22, 968–972. [Google Scholar] [CrossRef]
Anderson, S.C.; Branch, T.A.; Cooper, A.B.; Dulvy, N.K. Black-swan events in animal populations. Proc. Natl. Acad. Sci. USA 2017, 114, 3252–3257. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Johnston, D.E.; Djurić, P.M. A recursive Bayesian model for extreme values. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019. [Google Scholar]
Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer: London, UK, 2004. [Google Scholar]
Arulampalam, M.S.; Maskell, N.G.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [Google Scholar] [CrossRef] [Green Version]
Chen, R.; Wang, X.; Liu, J.S. Adaptive joint detection and decoding in flat fading channels via mixture Kalman filtering. IEEE Trans. Inf. Theory 2000, 46, 2079–2094. [Google Scholar] [CrossRef]
Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef] [Green Version]
Hundecha, Y.; St-Hilaire, A.; Quarda, T.; Adlouni, S.E. A nonstationary extreme value analysis for the assessment of changes in extreme annual wind speed over the Gulf of St. Lawrence, Canada. Environmetrics 2013, 24, 51–62. [Google Scholar] [CrossRef]
Huerta, G.; Sansó, B. Time-varying models for extreme values. Environ. Ecol. Stat. 2007, 14, 285–299. [Google Scholar] [CrossRef]
Toulemonde, G.; Guillou, A.; Naveau, P. Particle filtering for Gumbel-distributed daily maxima of methane and nitrous oxide. J. Appl. Meteorol. Climatol. 2008, 47, 2745–2759. [Google Scholar] [CrossRef]
Wei, Y.; Huerta, G. Dynamic generalized extreme value modeling via particle filters. Commmications Stat. 2017, 46, 6324–6341. [Google Scholar] [CrossRef]
Mao, G.; Zhang, Z. Stochastic tail index model for high frequency financial data with Bayesian analysis. J. Econom. 2018, 205, 470–487. [Google Scholar] [CrossRef]
Embrechts, P.; Kluppelberg, C.; Mikosch, T. Modelling Extremal Events; Springer: New York, NY, USA, 2003. [Google Scholar]
Smith, R.L. Statistics of extremes with applications in environment, insurance and finance. In Extreme Values in Finance, Telecommunications and the Environment; Finkenstädt, B., Rootzeń, H., Eds.; Chapman and Hall CRC: Boca Raton, FL, USA, 2003; pp. 1–78. [Google Scholar]
Resnick, S. Heavy Tail Phenomena: Probabilistic and Statistical Modeling; Springer: New York, NY, USA, 2008. [Google Scholar]
McNeil, A.; Frey, R.; Embrechts, P. Quantitative Risk Management; Princeton University Press: Princeton, NJ, USA, 2005. [Google Scholar]
Jacquier, E.; Polson, N.G.; Rossi, P.E. Bayesian analysis of stochastic volatility models. J. Bus. Econ. Stat. 1994, 12, 371–417. [Google Scholar]
Kramer, S.C.; Sorenson, H.W. Bayesian parameter estimation. IEEE Trans. Autom. Control. 1988, 33, 217–222. [Google Scholar] [CrossRef]
Djurić, P.M.; Bugallo, M.F. Particle filtering. In Adaptive Signal Processing; Adali, T., Haykin, S., Eds.; Wiley & Sons: Hoboken, NJ, USA, 2010; pp. 271–331. [Google Scholar]
Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241. [Google Scholar] [CrossRef]
Geisser, S. Bayesian estimation in multivariate analysis. Ann. Am. Stat. 1965, 36, 150–159. [Google Scholar] [CrossRef]
Meinhold, R.J.; Singpurwalla, N.D. Understanding the Kalman filter. Am. Stat. 1983, 37, 123–127. [Google Scholar]
Elvira, V.; Miǵuez, J.; Djurić, P.M. Adapting the number of particles in sequential Monte Carlo methods through an online scheme for convergence assessment. IEEE Trans. Signal Process. 2017, 65, 1781–1794. [Google Scholar] [CrossRef]
Djurić, P.M.; Khan, M.; Johnston, D.E. Particle filtering of stochastic volatility modeled with leverage. IEEE J. Sel. Top. Signal Process. 2012, 6, 327–336. [Google Scholar] [CrossRef]

Figure 1. Annual block maximum of S&P 500 return loss (1928–2020).

Figure 2. Histogram of RMS errors for RBPF vs ML for 100 simulation runs.

Figure 3. Empirical frequency distribution of observation predictions for RBPF vs ML simulation.

Figure 4. Distribution of final p-values for RBPF vs ML simulation.

Figure 5. Empirical extreme quantile estimates versus predicted for the RBPF.

Figure 6. Likelihood surface for GEV parameters using S&P 500 block-maximums.

Figure 7. S&P 500 max annual loss and predicted quantiles.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Johnston, D.E. Bayesian Forecasting of Dynamic Extreme Quantiles. Forecasting 2021, 3, 729-740. https://doi.org/10.3390/forecast3040045

AMA Style

Johnston DE. Bayesian Forecasting of Dynamic Extreme Quantiles. Forecasting. 2021; 3(4):729-740. https://doi.org/10.3390/forecast3040045

Chicago/Turabian Style

Johnston, Douglas E. 2021. "Bayesian Forecasting of Dynamic Extreme Quantiles" Forecasting 3, no. 4: 729-740. https://doi.org/10.3390/forecast3040045

Article Menu

Bayesian Forecasting of Dynamic Extreme Quantiles

Abstract

1. Introduction

2. Materials and Methods

3. Results

4. Discussion

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI