MScFE 610 Econometrics
Module 6: Compiled Notes
Consider two assets with expected returns $r_{1,t+1}$ and $r_{2,t+1}$; expected conditional variances $\sigma_{1,t+1}^2$ and $\sigma_{2,t+1}^2$; and expected conditional covariance $\sigma_{12,t+1}$.
As in the static case, a portfolio with weight $w$ on asset 1 will have an expected portfolio variance that depends on the weight, and is given by:

$$\sigma_{p,t+1}^2(w) = w^2 \sigma_{1,t+1}^2 + (1-w)^2 \sigma_{2,t+1}^2 + 2w(1-w)\sigma_{12,t+1}$$
By standard calculus results, we can minimize this variance by finding the first-order condition with respect to the portfolio weight, setting it to zero, and solving for the optimal weight $w^*$:

$$\frac{d\sigma_{p,t+1}^2(w)}{dw} = 2w\sigma_{1,t+1}^2 + (2w-2)\sigma_{2,t+1}^2 + (2-4w)\sigma_{12,t+1}$$
Solving $\frac{d\sigma_{p,t+1}^2(w^*)}{dw} = 0$ for $w^*$ yields:

$$w^* = \frac{\sigma_{2,t+1}^2 - \sigma_{12,t+1}}{\sigma_{1,t+1}^2 + \sigma_{2,t+1}^2 - 2\sigma_{12,t+1}}$$
Substituting this weight into the formulae above yields the expected return and conditional variance of the minimum variance portfolio.¹
¹ In theory, two perfectly correlated assets can be used to obtain a riskless portfolio, but in practice this never happens with real-world financial assets.
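To make the algebra concrete, the following is a minimal R sketch of the minimum-variance weight calculation. The forecast variances and covariance are purely illustrative numbers, standing in for one-step-ahead forecasts from, for example, a multivariate volatility model.

# Minimal sketch: minimum-variance weight from one-step-ahead forecasts.
# The forecast values below are purely illustrative.
sigma2_1 <- 0.00040   # forecast conditional variance of asset 1
sigma2_2 <- 0.00025   # forecast conditional variance of asset 2
sigma_12 <- 0.00010   # forecast conditional covariance

# Optimal weight on asset 1 from the first-order condition
w_star <- (sigma2_2 - sigma_12) / (sigma2_1 + sigma2_2 - 2 * sigma_12)

# Portfolio variance at the optimal weight
port_var <- w_star^2 * sigma2_1 + (1 - w_star)^2 * sigma2_2 +
  2 * w_star * (1 - w_star) * sigma_12

c(w_star = w_star, port_var = port_var)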
Of course, we applied this in a naïve way to the simplest possible investment situation, but the results extend easily to more complicated settings. The critical take-away from this section is that you should understand both (a) the approaches that capture empirical interdependencies across different assets, and (b) the properties of time-series econometrics, in order to develop the best investment approach for a specific investor.
One dimension that you should explore on your own is how the algorithm should be adjusted for an investor with a longer investment horizon, i.e. if $k > 1$. You will then have to decide which average of the predicted features of interest to use for the longer-horizon investment.
Unit 2: Copulas
In Module 2 we covered the basic ideas and characteristics of joint distributions, for example the bivariate normal and t-distributions. These readily extend to fully characterized behavior for $n$ normally distributed variables (or $n$ variables following a joint t-distribution).
In practice, however, it is unlikely that a single distribution, however precisely dynamically modeled,
would capture the characteristics of a diverse set of assets in a portfolio. What should we do if, say,
one asset is appropriately modeled by a normal distribution and another by a t-distribution (or some
other more heavy-tailed distribution)? It is very unlikely that there even exists a well-characterized
joint distribution of two very disparate marginal distributions, let alone a collection of more varied
distributions.
A copula can be viewed as a "distribution function" that contains all the information about the interdependencies among a set of jointly distributed variables, but none of the information about the marginal distribution of any of the constituent variables of the joint distribution. Remarkably, such a decomposition is always possible. That is, we can always decompose the joint distribution of any set of variables, regardless of the individual marginal distributions, into a copula and the individual marginal distributions. The copula is a function that takes the individual marginal distributions as arguments and yields the joint distribution function as output. This result is due to a theorem by Sklar (1959).²
Formulas
The copula of $(X_1, X_2, \dots, X_d)$ is defined as the joint cumulative distribution function of $(U_1, U_2, \dots, U_d)$:

$$C(u_1, u_2, \dots, u_d) = P[U_1 \leq u_1, U_2 \leq u_2, \dots, U_d \leq u_d]$$
² The original text is in French.
By applying the probability integral transform to each component, the random vector $(U_1, U_2, \dots, U_d) = \big(F_1(X_1), F_2(X_2), \dots, F_d(X_d)\big)$ has uniformly distributed marginals.
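As a quick illustration of the probability integral transform (assuming, purely for illustration, a standard normal marginal):

# If X ~ F, then F(X) is uniform on (0, 1). Here X is standard normal, so F is pnorm.
set.seed(1)
x <- rnorm(10000)
u <- pnorm(x)        # probability integral transform
hist(u, breaks = 20) # approximately flat: u is uniform on (0, 1)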
In probabilistic terms, the copula is the joint CDF of the transformed uniform variables defined above. In analytic terms, a copula is a function $C: [0,1]^d \to [0,1]$ with the following properties:
• $C(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_d) = 0$ (the copula is zero if one of the arguments is zero)
• $C(1, \dots, 1, u, 1, \dots, 1) = u$ (the copula is equal to $u$ if one argument is $u$ and all others are equal to 1)
• $C$ is $d$-increasing: for each hyperrectangle $B = \prod_{i=1}^{d}[x_i, y_i] \subseteq [0,1]^d$, the $C$-volume of $B$ is non-negative.
Sklar’s theorem
According to Sklar's theorem, every multivariate cumulative distribution function can be expressed in terms of its marginals and a copula. The joint distribution of a random vector $(X_1, X_2, \dots, X_d)$ can be written using only the marginals as:

$$F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big)$$

and, in terms of densities,

$$f(x_1, \dots, x_d) = c\big(F_1(x_1), \dots, F_d(x_d)\big) \cdot f_1(x_1) \cdots f_d(x_d)$$

where $C$ is the copula, $c$ is the density of the copula, $f$ is the joint density function, and $f_1, \dots, f_d$ are the marginal densities.
The copula is unique on $\mathrm{Ran}(F_1) \times \dots \times \mathrm{Ran}(F_d)$ (the Cartesian product of the ranges of the marginal cumulative distribution functions, CDFs). In particular, if the marginals $F_i$ are continuous, the copula is unique.
Let’s study the type of relationships generated by the normal and t-copulas. For each, we generate
random bivariate draws from the copula and consider their scatterplots to evaluate the
dependence:
library(copula)

# Bivariate normal (Gaussian) copula with zero correlation
norm.cop <- normalCopula(0.0)

# 200 random bivariate draws from the copula
U0 <- rCopula(200, norm.cop)
plot(U0, xlab = expression(u[1]), ylab = expression(u[2]))
When there is no correlation in the normal copula, the random draws are clearly unrelated and
uniform. They seem to “fill” the unit square with no apparent relationship.
When we use a normal copula with correlation 0.5, we begin to see a positive dependency. Note that each $U$ is still individually uniformly distributed; jointly, however, the points are closer to the positive diagonal.
When we use a normal copula with a strong negative correlation, $-0.95$, we see a clear negative tendency: the points are clustered around the negative diagonal. We can also see the elliptical nature of the distribution: both extreme ends of the distribution suggest tail dependencies: when $u_1$ is very low, $u_2$ tends to be very high, and when $u_1$ is very high, $u_2$ tends to be very low.
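A sketch reproducing these two cases with the same functions as the code above (the seed is an assumption, and the sample size of 200 follows the earlier example):

library(copula)
set.seed(1)  # assumed seed, for reproducibility only

# Moderate positive and strong negative dependence
U1 <- rCopula(200, normalCopula(0.5))
U2 <- rCopula(200, normalCopula(-0.95))

par(mfrow = c(1, 2))
plot(U1, xlab = expression(u[1]), ylab = expression(u[2]), main = "rho = 0.5")
plot(U2, xlab = expression(u[1]), ylab = expression(u[2]), main = "rho = -0.95")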
In the t-copula, we want to show the impact of the degrees of freedom (the tail index), so we take two sets of draws with the same high correlation but vary the tail index.
When the tail index is high, we get a picture similar to the normal copula, as expected. There is a clear positive dependence, but the tails are not especially tightly clustered. We can also see the elliptical nature of the clustering of the points.
When the tail index is very low, we see strong clustering at both tails, even though there are some points far off the diagonal. This would be a sensible model for two assets that sometimes, but not always, have strong tail dependencies in both tails.
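A sketch of these two t-copula cases; the correlation of 0.8 and the two tail-index (df) values are assumptions for illustration, since the exact values are not shown in the notes:

library(copula)
set.seed(1)

V_high_df <- rCopula(200, tCopula(0.8, df = 30))  # high tail index: close to the normal copula
V_low_df  <- rCopula(200, tCopula(0.8, df = 2))   # low tail index: strong clustering in both tails

par(mfrow = c(1, 2))
plot(V_high_df, xlab = expression(u[1]), ylab = expression(u[2]), main = "df = 30")
plot(V_low_df,  xlab = expression(u[1]), ylab = expression(u[2]), main = "df = 2")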
Turning to the Clayton copula, we vary the $\theta$ parameter to show how this changes the shape of the distribution:
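A sketch of the Clayton draws for the two parameter values discussed below (the seed and sample size are again assumptions):

library(copula)
set.seed(1)

W2 <- rCopula(200, claytonCopula(2))  # moderate lower tail dependence
W4 <- rCopula(200, claytonCopula(4))  # stronger lower tail dependence

par(mfrow = c(1, 2))
plot(W2, xlab = expression(u[1]), ylab = expression(u[2]), main = "theta = 2")
plot(W4, xlab = expression(u[1]), ylab = expression(u[2]), main = "theta = 4")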
As we increase $\theta$ from 2 to 4, we can clearly see the increase in lower tail dependency, with no obvious change in upper tail dependency (although there must be some; see below). This would thus be a copula that describes the dependency between two assets that tend to be highly correlated when returns are low but much less correlated when returns are high. We can also clearly observe that this is not an elliptical distribution like the normal or t-copulas. You should experiment to show that as $\theta$ grows very large (for example, 100) the distribution collapses to the positive diagonal; that is, $u_1$ and $u_2$ become essentially the same random variable.
As we increase $\theta$ from 2 to 4, we can clearly see the increase in upper tail dependency, with a much slower increase in lower tail dependency. This would thus be a copula that describes the dependency between two assets that tend to be highly correlated when returns are high but much less correlated when returns are low. We can also clearly observe that this is, again, not an elliptical distribution like the normal or t-copulas. You should experiment to show that as $\theta$ grows very large (for example, 100) this distribution also collapses to the positive diagonal. The limiting case is the co-monotonicity copula.
Copulas can be used for the simulation of loss distributions of credit portfolios. In this section, we show how to use correlated random variables in copulas, following these steps (a sketch of the simulation step, with assumed parameters, follows the list):
• Simulate a pair of correlated random variables using a Gaussian copula (we specify the correlation in advance).
• Simulate a pair of correlated random variables using a t-copula (we specify the correlation in advance).
• Estimate the Gaussian copula parameter using maximum likelihood estimation.
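A minimal sketch of the simulation step that produces normal_copula, v1 and v2 used below. The true correlation (0.6), sample size (500), degrees of freedom and seed are assumptions; the notes do not show the exact values used to produce Figure 1.

library(copula)
set.seed(123)

normal_copula <- normalCopula(0.6, dim = 2)
t_copula      <- tCopula(0.6, dim = 2, df = 4)

v1 <- rCopula(500, normal_copula)  # Gaussian-copula draws
v2 <- rCopula(500, t_copula)       # t-copula draws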
# Scatterplots of the two simulated pairs (shown in Figure 1)
plot(v1)
plot(v2)

# Fit a Gaussian copula to the Gaussian-copula draws by maximum likelihood
fit.ml <- fitCopula(normal_copula, v1, method = "ml")
fit.ml

# Result reported in the notes:
# the estimated correlation is about 0.59657
Figure 1: Scatter plot of random variable pairs generated by Gaussian copula (left). Scatter plot of random
variable pairs generated by t-copula (right).
Copulas in finance
• Risk management
• Credit scoring
• Default risk modelling
• Derivative pricing
• Asset allocation
• The 2008-2009 global financial crisis is said to have been driven by the extensive use of Gaussian copulas that did not correctly capture the risk of a collapse of the financial system. Experts criticized the simplicity of Gaussian copulas, which cannot adequately model the complex dependencies in a portfolio.
• During periods of upward movement, investors tend to buy riskier assets (equities,
derivatives or real estate assets), while in periods of financial crisis, investors tend to invest
more in cash or bonds (known as the “flight-to-quality effect”).
• Research shows that equities tend to be more correlated during a downward movement of
the market compared to an upward movement.
• Negative news has a more significant impact on stock prices than positive news. Copulas can be used to perform stress tests and robustness checks for periods of financial crisis, downward movements, or panic.
• The correlation coefficient cannot tell us the entire story behind asset interactions; copulas can therefore provide better information regarding asset dependencies.
• Gaussian and Student-t copulas only model elliptical dependence structures and do not allow for correlation asymmetries, where correlations differ between upside and downside regimes.
• Vine copulas (pair copulas) allow us to flexibly model the dependencies in large-dimensional portfolios.
• The panic copula, estimated by Monte Carlo simulation, quantifies the effect of panic in financial markets on portfolio losses; you will learn more about this in the Computational Finance course.
• Copulas are used on a large scale to model Collateralized Debt Obligations (CDOs).
Practice exercise
1 Simulate a pair of correlated random variables using a Gaussian copula (the correlation =
0.75, number of simulations = 600, seed = 110).
2 Simulate a pair of correlated random variables using a t-copula (the correlation = 0.75,
number of simulations = 600, seed = 110).
3 Estimate the Gaussian copula parameter using the Maximum Likelihood Estimation.
Introduction
Investing is risky, as financial markets are volatile by their very nature.
Due to this volatility, key questions that investors (and therefore financial engineers) should be
asking are:
• If I believe there is a 5% chance of a very bad shock to the assets in my portfolio over my
investment horizon, what am I likely to lose if it does occur?
• How likely is it that an unusually high (or low) return on a specific stock will occur over the
next 21 days?
• What is the probability that I will get a net return of at least $z\%$ by investing in a specific stock for a year?
The first question refers to an often-used quantification of risk known as "Value at Risk" (VaR). This
question and the second one are both about the statistical and probabilistic aspects of rare, extreme
events. These are often called “tail events”, referring to the tails of probability density functions.
The last question is about probabilities of exceeding thresholds, which is intimately related to
extreme events.
In this module we will be studying how to model these extreme events. The theoretical basis for this
analysis is called Extreme Value Theory.
Extreme events are rare by their very definition, so the statistical analysis we have to do is very different from that in Modules 2 to 5 of this course. In previous modules, we used the conditional expectation as a modeling device and, in doing so, used the densest part of the dataset for inference. In extreme value analysis, we will instead use the sparsest part of the dataset for inference.
To accomplish this, we can use strong results from probability theory that have been turned into
simple statistical procedures to provide good inference about rare events. This is the technique that
we will develop in this module.
For further reading, An Introduction to Statistical Modeling of Extreme Values provides a very
accessible and detailed analysis of the statistical treatment of extreme events with very broad
applications (Coles, 2001). Tsay (2010) provides applications to financial aspects exclusively with
examples in R, as well as more recent, simpler estimation techniques that we will return to in the
last module of this course.
The concept of "an order statistic" refers to the statistical description of the $k^{\text{th}}$ largest value in a sample of $n$ elements.

We will focus on the simplest order statistic: the maximum value that the random variable may take in a sample. The analysis of the other extreme values (e.g. the third-largest value in the sample) is very similar and not as relevant to the financial engineer. Note that it is trivial to use the same analysis we develop for the maximum element if we actually care about the minimum element of $\{x_i\}_{i=1}^{n}$, since $\min\{x_1, \dots, x_n\} = -\max\{-x_1, \dots, -x_n\}$.
Suppose the common distribution of the variable $x$ is given by $F(\cdot)$, i.e., $\Pr(x_i \leq q) = F(q)\ \forall i$. What is the probability that the largest value of this sample is smaller than or equal to $q$? This is the same as the probability that all of the $n$ values of $\{x_i\}_{i=1}^{n}$ are smaller than or equal to $q$. Since the variables are independent,³ the joint probability that all elements of $\{x_i\}_{i=1}^{n}$ are smaller than or equal to $q$ is simply the product of the individual probabilities that each element in $\{x_i\}_{i=1}^{n}$ is smaller than or equal to $q$:

$$\Pr\big(\max[\{x_i\}_{i=1}^{n}] \leq q\big) = \Pr(x_1 \leq q) \cdot \Pr(x_2 \leq q) \cdot \ldots \cdot \Pr(x_n \leq q) = [F(q)]^n$$

³ This can be extended to dependent but stationary variables under the mild restriction that they are "square summable", i.e., that the expectation of the square of the variable exists. This requires that the variable is not "too persistent". Typical log returns on an asset easily satisfy these criteria.
This suggests a modeling strategy that might initially seem sensible: estimate the common distribution of a single observation from the sample. For example, use all $n$ observations to get an estimate of the common distribution, $\hat{F}(q)$, and base inference about the maximum on $[\hat{F}(q)]^n$. Typical samples of daily returns can be large: the sample of S&P 500 returns we use in the empirical application below has more than 2,000 observations. The smallest mistake we make in our estimate $\hat{F}(q)$ will be enormously inflated in our estimate of the joint distribution $[\hat{F}(q)]^n$, especially for the extreme elements we might draw from this joint distribution. Our inference about the expected value of $[F(q)]^n$ might not be far off if we use $[\hat{F}(q)]^n$, since errors smaller than 1 in size are reduced when we raise them to higher powers. However, we are interested in inference about the tails of $[F(q)]^n$, and these are inflated when we raise them to higher powers.
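A small worked illustration of this inflation, with hypothetical numbers:

# Suppose the true one-day probability is F(q) = 0.9999 but we estimate 0.9995,
# a tiny absolute error, in a sample of n = 2000 daily returns.
n      <- 2000
F_true <- 0.9999
F_hat  <- 0.9995

c(true = F_true^n, estimated = F_hat^n)
# P(max <= q) is roughly 0.82 under the true F but only about 0.37 under the
# estimate: an error of 0.0004 in F(q) moves the implied probability of an
# exceedance from about 18% to about 63%.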
It turns out this problem is simply and elegantly solved by Extreme Value Theory.
GEV Distribution
In a specific empirical example, we will not know the fundamental distribution $F(q)$ of the random variables. How then do we know how to define the location and scale parameters? This turns out to be a problem we can accommodate. Since we need to estimate unknown parameters anyway, we may use the following approximation argument:
if

$$m_n^* = \frac{m_n - a_n}{b_n} \;\underset{n \to \infty}{\sim}\; G(q)$$

then

$$m_n \;\underset{\text{approx.}}{\sim}\; G^*(q)$$
where $G^*(q)$ is still a member of the generalized extreme value (GEV) distribution family, although with different location and scale parameters (but the same shape parameter). Thus, we can directly model the maximum of a sample as following a GEV distribution, and directly estimate its parameters via maximum likelihood.
Given that we estimate the three parameters directly, it is standard to use the notation:
• $\mu$, rather than $a$, for the location parameter (reminiscent of the symbol we use for the expectation of a distribution), and
• $\sigma$, rather than $b$, for the scale parameter (reminiscent of the symbol we use for the standard deviation of a distribution).
Thus, in our empirical strategy to gain a statistical description of the properties of extreme values, we will assume that the maximum from a sample of size $n$ follows the GEV distribution:

$$m_n \sim G(q) = \exp\left\{-\left[1 + \xi\left(\frac{q - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$
For concreteness, we assume that we are dealing with the daily log returns on the S&P 500 series and that we have a large number $T$ of observations:

$$\{x_t\}_{t=1}^{T}$$
First, we need several observations of maxima. For the financial engineer, it should be intuitive that we care about the likely extreme values over some investment horizon $\tau$, e.g. 21 days. Thus, we divide the total series into blocks of length $\tau$. For convenience, assume that there are exactly $n$ of these blocks. For each block $i$, we find the maximum value that $x_t$ attains and label it $m_{\tau,i}$. This yields a new dataset with $n$ values:

$$m_{\tau,1}, m_{\tau,2}, \dots, m_{\tau,n}$$
It is to these data that we will fit the GEV distribution to find our estimates of $\mu$, $\sigma$ and $\xi$, on which we will base our inference and projections about the future.
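A minimal sketch of the block-maxima construction; the simulated return series and the block length of 21 days are placeholders, standing in for the actual S&P 500 data:

# x stands for a vector of daily log returns (simulated here), tau is the block length
set.seed(42)
x   <- rnorm(2100, mean = 0, sd = 0.01)
tau <- 21

n_blocks <- floor(length(x) / tau)
blocks   <- rep(seq_len(n_blocks), each = tau)

m_tau <- tapply(x[seq_len(n_blocks * tau)], blocks, max)  # one maximum per block
head(m_tau)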
There is (as always in economics and econometrics) a trade-off to consider when choosing the size of the blocks we use in our estimation:
• If the blocks are too small (i.e. we choose many blocks over large blocks), the extrema identified in each might not be representative of the extrema in the whole sample. A simple example: suppose there are many blocks with no extreme values. This will bias our estimates of the parameters of the limiting distribution $G(q)$.
• If there are too few blocks (i.e. we choose large blocks over many blocks), the sample of extrema is too small to obtain precise estimates. Hence, the estimated parameters will have high variance, which means we cannot make strong inferences.
The correct balance between number of blocks and size of blocks will always be a matter of
judgment of the researcher, conditional on the sample size. To the financial engineer, the desired
window of investment would be a key consideration, but depending on how much data is available,
the financial engineer may have to make this call.
In practice, it is standard to consider a range of block lengths, and to study the stability of the
parameter estimates.
There are a number of suggestions for how to estimate the parameters of the GEV distribution. The most attractive of these, and the one we will discuss, is maximum likelihood.
There is a complication, however: for the standard maximum likelihood results to hold (that is, for the estimators to be asymptotically normally distributed around their true values), certain regularity conditions must hold. The GEV distributions in the Weibull and Fréchet cases do not automatically satisfy these conditions, because their endpoints depend on the parameter values.
i. When $\xi > -0.5$, maximum likelihood estimators have their usual asymptotic behavior.
ii. When $-1 < \xi < -0.5$, maximum likelihood estimates can be found, but they have non-standard asymptotic behavior.
iii. When $\xi < -1$, reliable maximum likelihood estimates cannot be obtained.
These problematic cases are, however, rarely of practical interest. Note that they all belong to the situation where the GEV distribution is a Weibull distribution, i.e. one with an upper limit. In financial applications this is unlikely to be of interest. Indeed, we will see that the GEV distribution fitted to financial data usually gives $\xi > 0$, i.e. a Fréchet distribution, which is unbounded above, and hence we can rely on the standard maximum likelihood results.
The maximum likelihood approach is particularly attractive here. Even though the approach was initially developed for the "central" or "most common" observations, given that we have a simple distribution function for the extremes, we can obtain estimates in exactly the same way. The log-likelihood function given the $n$ block maxima, $m_{\tau,i}$, is simple:
$$\ell(\mu, \sigma, \xi \neq 0) = -n \log \sigma - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \log\left[1 + \xi\left(\frac{m_{\tau,i} - \mu}{\sigma}\right)\right] - \sum_{i=1}^{n} \left[1 + \xi\left(\frac{m_{\tau,i} - \mu}{\sigma}\right)\right]^{-1/\xi}$$

$$\ell(\mu, \sigma, \xi = 0) = -n \log \sigma - \sum_{i=1}^{n} \left(\frac{m_{\tau,i} - \mu}{\sigma}\right) - \sum_{i=1}^{n} \exp\left[-\left(\frac{m_{\tau,i} - \mu}{\sigma}\right)\right]$$
However complicated these equations might look, standard numerical methods maximize them easily. The difference between the two functions near $\xi = 0$ is easily dealt with by a two-part process that uses $\ell(\mu, \sigma, \xi = 0)$ instead of $\ell(\mu, \sigma, \xi \neq 0)$ when the estimates of $\xi$ converge to some small vicinity of 0. These are issues you should explore if your applications are sensitive to this point.
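A minimal sketch of this maximum likelihood step using base R's optim() on the log-likelihoods above, assuming a vector of block maxima such as m_tau from the earlier sketch (dedicated R packages also provide ready-made GEV fitting routines; the direct approach simply makes the two-part treatment of xi explicit):

# Negative GEV log-likelihood, switching to the xi = 0 (Gumbel) form near zero
gev_negloglik <- function(par, m) {
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0) return(1e10)                 # enforce sigma > 0
  z <- 1 + xi * (m - mu) / sigma
  if (any(z <= 0)) return(1e10)                # observations outside the support
  if (abs(xi) < 1e-6) {                        # Gumbel limit
    s <- (m - mu) / sigma
    return(length(m) * log(sigma) + sum(s) + sum(exp(-s)))
  }
  length(m) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

m   <- as.numeric(m_tau)                        # block maxima from the earlier sketch
fit <- optim(c(mean(m), sd(m), 0.1), gev_negloglik, m = m, method = "Nelder-Mead")
fit$par                                         # estimates of mu, sigma and xi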
In most financial applications, however, the estimates imply a $\xi$ which is significantly positive (i.e. a Fréchet distribution, which has a lower bound but no upper bound). This makes intuitive sense: it would be strange if the largest return on a financial asset could be unboundedly negative, which is one of the implications of the Gumbel ($\xi = 0$) distribution.
A variety of other estimation approaches have also been developed, including non-parametric
approaches which we will cover in the empirical application.
We can evaluate the goodness of fit of this model by comparing it to the empirical distribution, which imposes no functional form or parameterization and merely uses the observed sample in the most direct way possible. The empirical distribution is defined over the ordered block maxima

$$m_{(1)} \leq m_{(2)} \leq \dots \leq m_{(n)}$$

where the new subscripts mean we ignore when a specific block maximum occurred, treat all of them as "random draws" from the true distribution of extreme values, and simply order them from small to large.
The empirical density function treats these as independent draws and hence assigns equal probability to having observed each one, i.e., we assume that in any arbitrary block of length $\tau$, the probability of the block maximum $m_i$ being any one of the observed block maxima is equal to the frequency of that block maximum. Since in a sample we would expect the block maxima to all be different, this boils down to assuming:⁴

$$\Pr\big(m_i = m_{(1)}\big) = \Pr\big(m_i = m_{(2)}\big) = \dots = \Pr\big(m_i = m_{(n)}\big) = \frac{1}{n+1}$$
The empirical distribution function, $\widetilde{G}(\cdot)$, is simply the cumulative sum of these probabilities, from the smallest to the largest block maximum. As this is now a discrete distribution with $n$ elements, each observed block maximum is also a quantile. Thus:

$$\widetilde{G}\big(m_{(i)}\big) = \frac{i}{n+1}$$

⁴ Using $\frac{1}{n+1}$ rather than $\frac{1}{n}$ is an adjustment that prevents the largest maximum from being an imposed upper end of the empirical distribution. This adjustment vanishes as $n \to \infty$.
From this we can construct several visual approaches to check the reliability of our estimated GEV distribution, by evaluating the fitted distribution at the estimated parameters $\hat{\mu}, \hat{\sigma}, \hat{\xi}$ and each of the observed block maxima, which we will denote:

$$\hat{G}\big(m_{(i)}\big) = \exp\left\{-\left[1 + \hat{\xi}\left(\frac{m_{(i)} - \hat{\mu}}{\hat{\sigma}}\right)\right]^{-1/\hat{\xi}}\right\}$$

Note again that $\hat{G}\big(m_{(i)}\big)$ imposes the GEV distributional form (an asymptotic argument), whereas the empirical $\widetilde{G}\big(m_{(i)}\big)$ does not.
Probability Plot
A probability plot compares the cumulative probability assigned to each observed block maximum by the fitted GEV distribution to the cumulative probability assigned to it by the empirical distribution. If the GEV distributional assumption is exactly correct, these probability assignments should agree, i.e., a scatter plot of $\hat{G}\big(m_{(i)}\big)$ against $\widetilde{G}\big(m_{(i)}\big)$ should trace out a unit-slope line through the origin.
Large deviations from the unit-slope line through the origin would indicate that the GEV distributional assumption is a poor approximation of the data-generating process at hand. We can also add confidence intervals based on the uncertainty around the estimates $\hat{\mu}, \hat{\sigma}, \hat{\xi}$ as an aid to inference about statistically significant deviations from the assumptions we make.
A problem with this plot is that both $\hat{G}\big(m_{(i)}\big)$ and $\widetilde{G}\big(m_{(i)}\big)$ converge to 1 at the largest block maxima. This is undesirable, as we care most about the largest values of $m_{(i)}$.
Quantile Plot
An alternative to the probability plot is the quantile plot. Rather than comparing the values of the empirical and modeled cumulative probability distribution functions, we compare:
• the quantiles that the modeled cumulative probability distribution function assigns to each empirical cumulative probability, to
• the actually observed quantiles (i.e. the observed block maxima).
That is, we define the quantiles corresponding to the modeled distribution function, evaluated at the points of the empirical distribution function, as:

$$\hat{m}_{(i)} = \hat{G}^{-1}\left(\frac{i}{n+1}\right) = \begin{cases} \hat{\mu} - \dfrac{\hat{\sigma}}{\hat{\xi}}\left[1 - \left(-\log\left(\dfrac{i}{n+1}\right)\right)^{-\hat{\xi}}\right], & \text{for } \hat{\xi} \neq 0 \\[2ex] \hat{\mu} - \hat{\sigma}\log\left[-\log\left(\dfrac{i}{n+1}\right)\right], & \text{for } \hat{\xi} = 0 \end{cases}$$
Again, if the GEV distributional assumption is perfect, the scatterplot will fall exactly on the unit-slope line through the origin, i.e. $\hat{m}_{(i)} = m_{(i)}$ for all $i$. Thus, the extent of the deviation from this line is a measure of the deviation of the data-generating process from the assumption that its extrema follow a GEV distribution.
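A minimal sketch of both diagnostic plots, assuming the block maxima and fitted parameters from the earlier sketches (and an estimate of $\xi$ that is not essentially zero):

# Ordered block maxima and ML estimates (fit$par from the earlier sketch)
m_sorted  <- sort(as.numeric(m_tau))
mu_hat    <- fit$par[1]; sigma_hat <- fit$par[2]; xi_hat <- fit$par[3]
n <- length(m_sorted)

G_emp   <- (1:n) / (n + 1)                                             # empirical CDF values
G_model <- exp(-(1 + xi_hat * (m_sorted - mu_hat) / sigma_hat)^(-1 / xi_hat))
m_model <- mu_hat - (sigma_hat / xi_hat) * (1 - (-log(G_emp))^(-xi_hat))

par(mfrow = c(1, 2))
plot(G_emp, G_model, xlab = "Empirical", ylab = "Model", main = "Probability plot")
abline(0, 1)
plot(m_model, m_sorted, xlab = "Model quantile", ylab = "Observed", main = "Quantile plot")
abline(0, 1)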
As previously described, in an empirical application the plot of return levels against the return
period gives an indication of how close the GEV assumption can be for the specific process being
modeled.
Comparing the predicted GEV density, given the parameter estimates $\hat{\mu}, \hat{\sigma}, \hat{\xi}$, to the histogram of observed block maxima is the last standard graphical test of estimation accuracy. This has the least power, as histograms are strongly dependent on the sample size and the bin size. It is difficult to distinguish histograms of Gumbel and Fréchet densities visually, as (in empirical applications) the lower tail of the Gumbel distribution (which is very close to zero) looks similar to the lower tail of the Fréchet distribution (which is exactly zero below the lower bound of its range).
In the following example, the GEV results have been extended to the generalized Pareto
distribution, which models the likelihood of a process exceeding some predefined threshold.
It is derived as follows (with the details in Coles (2001)): if the block maxima of a process follow the GEV distribution with parameters $\mu, \sigma, \xi$, then the distribution of $(x_t - u)$ for some upper threshold $u$, conditional on $x_t > u$, follows the generalized Pareto distribution (GPD). In other words, for the part of the distribution of the process where $(x_t - u) > 0$, the randomness in the extrema of $x_t$ above the threshold $u$ is described by:

$$(x_t - u) \sim H(x_t - u) = 1 - \left[1 + \frac{\xi(x_t - u)}{\sigma + \xi(u - \mu)}\right]^{-1/\xi}$$
Moreover, the parameters of the GPD are the same as those identified by the GEV distribution estimation. The key parameter that needs to be estimated, the shape parameter $\xi$, is in principle invariant to the block size; hence the GPD provides a more useful characterization of extrema than the GEV distribution. We will return to this feature in Module 7, where we consider risk management and its econometric counterparts more explicitly.
The key component in threshold modeling is, unsurprisingly, the choice of the threshold $u$. If we choose $u$ too low, there will be too many "threshold exceedances", and our asymptotic analysis will be biased. If we choose $u$ too high, there will be too few "threshold exceedances": our estimates of the generalized Pareto distribution will have too high a variance, and our inferences will not be informative about the actual process. In practice, the financial engineer will have to weigh the two sides of this trade-off against each other to make an appropriate call for the matter at hand.
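A minimal sketch of the threshold-exceedance step, fitting the GPD to exceedances by maximum likelihood with optim(); the threshold (the 95th percentile of the simulated returns from the earlier sketch) is an illustrative choice only, not a recommendation:

# Exceedances over an assumed threshold u (x: daily log returns from the earlier sketch)
u   <- quantile(x, 0.95)
exc <- x[x > u] - u

# Negative GPD log-likelihood (for xi not essentially zero)
gpd_negloglik <- function(par, y) {
  sigma_u <- par[1]; xi <- par[2]
  if (sigma_u <= 0) return(1e10)
  z <- 1 + xi * y / sigma_u
  if (any(z <= 0)) return(1e10)
  length(y) * log(sigma_u) + (1 + 1 / xi) * sum(log(z))
}

gpd_fit <- optim(c(sd(exc), 0.1), gpd_negloglik, y = exc)
gpd_fit$par  # estimated scale, i.e. sigma + xi * (u - mu), and shape xi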
Bibliography
Bensalah, Y. (2010). Steps in Applying Extreme Value Theory to Finance: A Review. [online] Bank of Canada. Available at: https://www.banqueducanada.ca/wp-content/uploads/2010/01/wp00-20.pdf.

Daróczi, G., et al. (2013). 'Chapter 8: Extreme Value Theory', in Introduction to R for Quantitative Finance. Birmingham: Packt Publishing, pp. 113-124.

Embrechts, P., Resnick, S. and Samorodnitsky, G. (2000). Extreme Value Theory as a Risk Management Tool. Derivatives Use, Trading & Regulation, [online] 6(1), pp. 30-41. Available at: https://www.researchgate.net/publication/2640395_Extreme_Value_Theory_Potential_And_Limitations_As_An_Integrated_Risk_Management_Tool.

Embrechts, P. (2009). Copulas: A Personal View. Journal of Risk and Insurance, [online] 76(3), pp. 639-650. Available at: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1539-6975.2009.01310.x.

Fernández, V. (2003). Extreme Value Theory: Value at Risk and Returns Dependence Around the World. Documentos de Trabajo, [online] 161. Available at: https://ideas.repec.org/p/edj/ceauch/161.html.

Gilli, M. and Këllezi, E. (2006). An Application of Extreme Value Theory for Measuring Financial Risk. Computational Economics, [online] 27(2-3), pp. 207-228. Available at: https://link.springer.com/article/10.1007/s10614-006-9025-7.

Marimoutou, V., Raggad, B. and Trabelsi, A. (2006). Extreme Value Theory and Value at Risk: Application to Oil Market. [online] Available at: https://halshs.archives-ouvertes.fr/halshs-00410746/document.

McNeil, A. (1997). Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory. ASTIN Bulletin, [online] 27(01), pp. 117-137. Available at: https://www.cambridge.org/core/journals/astin-bulletin-journal-of-the-iaa/article/estimating-the-tails-of-loss-severity-distributions-using-extreme-value-theory/745048C645338C953C522CB29C7D0FF9.

McNeil, A., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques, and Tools. New Jersey: Princeton University Press.

Ruppert, D. and Matteson, D. (2015). Statistics and Data Analysis for Financial Engineering. New York: Springer.