Extreme Risk Finance
Yannick Malevergne
Institut de Science Financière et d'Assurances
Université Claude Bernard Lyon 1
50 Avenue Tony Garnier
69366 Lyon Cedex 07
France

Didier Sornette
Institute of Geophysics and Planetary Physics
and Department of Earth and Space Science
University of California, Los Angeles
California 90095
USA
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations are
liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in The Netherlands
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Printed on acid-free paper
An error does not become truth by reason of multiplied propagation, nor does truth become error because nobody sees it.
M.K. Gandhi
Preface: Idiosyncratic
and Collective Extreme Risks
return potentials (accompanied by downturn risks). But for this view to
hold so as to promote economic development, fluctuations in values need to
be tamed to minimize the risk of losing a lifetime of savings, or to avoid
the risks of losing the investment potential of companies, or even to prevent
economic and social recessions in whole countries (consider the situation of
California after 2002 with a budget gap representing more than one-fourth of
the entire State budget resulting essentially from the losses of financial and
tax incomes following the collapse of the internet bubble). It is thus highly
desirable to have the tools for monitoring, understanding, and limiting the ex-
treme risks of financial markets. Fully aware of these problems, the worldwide
banking organizations have promoted a series of guidelines and norms, known as
the recommendations of the Basle committee [41, 42]. The Basle committee
has proposed models for the internal management of risks and the imposi-
tion of minimum margin requirements commensurate with the risk exposures.
However, some criticisms [117, 467] have found these recommendations to be
ill-adapted or even destabilizing. This controversy underlines the importance
of a better understanding of extreme risks, of their consequences and ways to
prevent or at least minimize them.
In our opinion, tackling this challenging problem requires decomposing it
into two main parts. First, it is essential to be able to accurately quantify
extreme risks. This calls for the development of novel statistical tools
going significantly beyond the Gaussian paradigm which underpins the stan-
dard framework of classical financial theory inherited from Bachelier [26],
Markowitz [347], and Black and Scholes [60] among others. Second, the ex-
istence of extreme risks must be considered in the context of the practice
of risk management itself, which leads one to ask whether extreme risks can
be diversified away similarly to standard risks according to the mean-variance
approach. If the answer to this question is negative, as can be surmised from
numerous pieces of concrete empirical evidence, it is necessary to develop new
concepts and tools for the construction of portfolios with minimum (but
unavoidable) exposure to extreme risks. One can think of mixing equities and derivatives,
as long as derivatives themselves do not add an extreme risk component and
can really provide an insurance against extreme moves, which has been far
from true in recent dramatic instances such as the crash of October 1987.
Another approach could involve mutualism as in insurance.
Risk management, and likewise portfolio management, thus requires
a precise and rigorous analysis of the distribution of the returns of the
portfolio of risks. Taking into account the moderate sizes of standard portfo-
lios (from tens to thousands of assets typically) and the non-Gaussian nature
of the distributions of the returns of assets constituting the portfolios, the
distributions of the returns of typical portfolios are far from Gaussian, in con-
tradiction with the expectation from a naive use of the central limit theorem
(see for instance Chap. 2 of [451] and other chapters for a discussion of the
deviations from the central limit theorem). This breakdown of universality
then requires a careful estimation of the specific case-dependent distribution
of the returns of a given portfolio. This can be done directly using the time
series of the returns of the portfolio for a given capital allocation. A more con-
structive approach consists in estimating the joint distribution of the returns
of all assets constituting the portfolio. The first approach is much simpler and
faster to implement since it requires solely the estimation of a univariate
distribution. However, it lacks generality and power by neglecting the
observable information available from the basket of all returns of the assets. Only
the multivariate distribution of the returns of the assets embodies the gen-
eral information of all risk components and their dependence across assets.
However, the two approaches become equivalent in the following sense: the
knowledge of the distribution of the returns for all possible portfolios for all
possible allocations of capital between assets is equivalent to the knowledge
of the multivariate distributions of the asset returns. All things considered,
the second approach appears preferable on a general basis and is the method
mobilizing the largest efforts both in academia and in the private sector.
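The breakdown of Gaussian universality invoked above is easy to observe numerically. The following minimal Python sketch (with hypothetical parameters: 30 assets, Student-t returns with three degrees of freedom) counts the 5-sigma days of an equally weighted portfolio and compares the count with the Gaussian expectation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_assets, n_days = 30, 100_000

# Student-t returns with 3 degrees of freedom: finite variance, power law tails.
asset_returns = stats.t.rvs(df=3, size=(n_days, n_assets), random_state=rng)
portfolio = asset_returns.mean(axis=1)  # equal-weight portfolio return

sigma = portfolio.std()
extreme_days = int(np.sum(np.abs(portfolio) > 5 * sigma))

# For a Gaussian portfolio of the same length, 5-sigma days are essentially absent.
gaussian_expectation = n_days * 2 * stats.norm.sf(5.0)
print(f"5-sigma days observed: {extreme_days}")
print(f"Gaussian expectation : {gaussian_expectation:.2f}")
```

Despite the aggregation over 30 assets, the portfolio retains dozens of 5-sigma days where the naive central limit argument would predict a small fraction of one.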
However, the frontal attack aiming at the determination of the multivari-
ate distribution of the asset returns is a challenging task and, in our opinion,
much less instructive and useful than the separate studies of the marginal
distributions of the asset returns on the one hand and the dependence struc-
ture of these assets on the other hand. In this book, we emphasize this second
approach, with the objective of characterizing as faithfully as possible the di-
verse origins of risks: the risks stemming from each individual asset and the
risks having a collective origin. This requires determining (i) the distributions
of returns at different time scales, or more generally, the stochastic process
underlying the asset price dynamics, and (ii) the nature and properties of
dependences between the different assets.
The present book offers an original and systematic treatment of these two
domains, focusing mainly on the concepts and tools that remain valid for
large and extreme price moves. Its originality lies in detailed and thorough
presentations of the state of the art on (i) the different distributions of finan-
cial returns for various applications (VaR, stress testing), and (ii) the most
important and useful measures of dependences, both unconditional and
conditional, and a study of the impact of conditioning on the size of large moves
on the measure of extreme dependences. A large emphasis is thus put on the
theory of copulas, their empirical testing and calibration, as they offer intrinsic
and complete measures of dependences. Many of the results presented here
are novel: they have either not yet been published or have been obtained only
recently by the authors and their colleagues. We would like to acknowledge, in particular, the
fruitful and inspiring discussions and collaborations with J.V. Andersen, U.
Frisch, J.-P. Laurent, J.-F. Muzy, and V.F. Pisarenko.
Chapter 1 describes a general framework to develop “coherent measures” of
risks. It also addresses the origins of risks and of dependence between assets in
financial markets, from the CAPM (capital asset pricing model) generalized to
the non-Gaussian case with heterogeneous agents, the APT (arbitrage pricing
theory), the factor models to the complex system view suggesting an emergent
nature for the risk-return trade-off.
Chapter 2 addresses the problem of the precise estimation of the probabil-
ity of extreme events, based on a description of the distribution of asset returns
endowed with heavy tails. The challenge is thus to specify accurately these
heavy tails, which are characterized by poor sampling (large events are rare).
A major difficulty is neither to underestimate (Gaussian error) nor overestimate
(heavy tail hubris) the extreme events. The quest for a precise quantification
opens the door to model errors, which can be partially circumvented by using
several families of distributions whose detailed comparisons allow one to dis-
cern the sources of uncertainty and errors. Chapter 2 thus discusses several
classes of heavy tailed distributions: regularly varying distributions (i.e., with
asymptotic power law tails), stretched-exponential distributions (also known
as Weibull or subexponentials) as well as log-Weibull distributions which ex-
trapolate smoothly between these different families.
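For the regularly varying class, a standard tool is Hill's estimator of the tail exponent. The sketch below (synthetic Pareto losses with a hypothetical tail index of 3; the sample size and the number k of order statistics are also hypothetical) illustrates the estimation:

```python
import numpy as np

def hill_estimator(losses, k):
    """Hill estimate of the tail index from the k largest observations."""
    order = np.sort(losses)[::-1]                    # descending order statistics
    logs = np.log(order[:k]) - np.log(order[k])      # log-excesses over the k-th
    return 1.0 / logs.mean()

rng = np.random.default_rng(1)
alpha_true = 3.0
# numpy's pareto draws Lomax samples; adding 1 gives a Pareto on [1, inf).
losses = rng.pareto(alpha_true, size=100_000) + 1.0

print(f"Hill tail index (k=2000): {hill_estimator(losses, 2000):.2f}")
```

The choice of k embodies the usual bias-variance trade-off of tail estimation: too few order statistics and the estimate is noisy, too many and observations from the bulk contaminate it. This poor-sampling issue is precisely the difficulty emphasized above.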
The second element of the construction of multivariate distributions of as-
set returns, addressed in Chaps. 3–6, is to quantify the dependence structure
of the asset returns. Indeed, large risks are not due solely to the heavy tails of
the distribution of returns of individual assets but may result from a collective
behavior. This collective behavior can be completely described by mathemat-
ical objects called copulas, introduced in Chap. 3, which fully embody the
dependence between asset returns.
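Although copulas are only defined in Chap. 3, their mechanics can already be sketched: a Gaussian copula sample is obtained by mapping correlated Gaussian variables through their cdf, after which arbitrary margins can be imposed without altering the dependence structure (all parameters below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]

# Step 1: draw correlated Gaussian pairs. Step 2: push each margin through
# the Gaussian cdf so that U and V are uniform; (U, V) follow the Gaussian copula.
z = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)
u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])

# Step 3: impose arbitrary margins by inverse cdf (here Student-t and
# exponential); the copula, i.e. the dependence, is untouched.
x, y = stats.t.ppf(u, df=3), stats.expon.ppf(v)
print(f"Spearman rho before: {stats.spearmanr(u, v)[0]:.2f}, "
      f"after: {stats.spearmanr(x, y)[0]:.2f}")
```

Since the inverse cdfs are strictly increasing, the ranks, and hence rank-based dependence measures such as Spearman's rho, are exactly preserved: the copula fully separates the dependence from the margins.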
Chapter 4 describes synthetic measures of dependences, contrasting and
linking them with the concept of copulas. It also presents an original estima-
tion method of the coefficient of tail dependence, defined, roughly speaking, as
the probability for an asset to lose a large amount knowing that another asset
or the market has also dropped significantly. This tail dependence is of great
interest because it addresses in a straightforward way the fundamental ques-
tion whether extreme risks can be diversified away or not by aggregation in
portfolios. Either the tail dependence coefficient is zero, in which case extreme
losses occur asymptotically independently, which opens the possibility of
diversifying them away; or the tail dependence coefficient is non-zero, extreme
losses are fundamentally dependent, and it is impossible to completely
remove extreme risks. The only remaining strategy is to develop portfolios that
minimize the collective extreme risks, thus generalizing the mean-variance to
a mean-extreme theory [332, 336, 333].
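An empirical counterpart of the tail dependence coefficient can be sketched by counting joint quantile exceedances. The example below (hypothetical parameters) contrasts a Gaussian dependence, whose coefficient vanishes asymptotically, with a Student-t dependence of the same correlation, whose coefficient does not:

```python
import numpy as np

def tail_dependence(x, y, q):
    """Empirical estimate of P(Y > VaR_q(Y) | X > VaR_q(X))."""
    x_thr, y_thr = np.quantile(x, q), np.quantile(y, q)
    above_x = x > x_thr
    return float(np.mean(y[above_x] > y_thr))

rng = np.random.default_rng(3)
n, rho = 200_000, 0.7
cov = [[1.0, rho], [rho, 1.0]]

# Gaussian dependence: the tail dependence coefficient tends to zero as q -> 1.
g = rng.multivariate_normal([0.0, 0.0], cov, size=n)
# Bivariate Student-t (3 dof, same correlation): joint extremes survive,
# obtained by dividing both Gaussian components by a common chi factor.
t = g / np.sqrt(rng.chisquare(3, size=n) / 3.0)[:, None]

for q in (0.95, 0.99):
    print(f"q={q}: gaussian {tail_dependence(g[:, 0], g[:, 1], q):.2f}, "
          f"student-t {tail_dependence(t[:, 0], t[:, 1], q):.2f}")
```

As q increases, the Gaussian estimate decays toward zero while the Student-t estimate stabilizes at a strictly positive level, which is the signature of non-diversifiable extreme risk discussed above.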
Chapter 5 presents the main methods for estimating copulas of financial
assets. It shows that the empirical determination of a copula is quite delicate
with significant risks of model errors, especially for extreme events. Specific
studies of the extreme dependence are thus required.
Chapter 6 presents a general and thorough discussion of different mea-
sures of conditional dependences (where the condition can be on the size(s)
of one or both returns for two assets). Chapter 6 thus sheds new light on the
variations of the strength of dependence between assets as a function of the
sizes of the analyzed events. As a startling concrete application of conditional
3 Notions of Copulas
3.1 What is Dependence?
3.2 Definition and Main Properties of Copulas
3.3 A Few Copula Families
3.3.1 Elliptical Copulas
3.3.2 Archimedean Copulas
3.3.3 Extreme Value Copulas
3.4 Universal Bounds for Functionals of Dependent Random Variables
3.5 Simulation of Dependent Data with a Prescribed Copula
3.5.1 Simulation of Random Variables Characterized by Elliptical Copulas
3.5.2 Simulation of Random Variables Characterized by Smooth Copulas
3.6 Application of Copulas
3.6.1 Assessing Tail Risk
3.6.2 Asymptotic Expression of the Value-at-Risk
3.6.3 Options on a Basket of Assets
3.6.4 Basic Modeling of Dependent Default Risks
Appendix
References
Index
1
On the Origin of Risks and Extremes
(a) Risk is embedded in the amplitude of the fluctuations of the returns. Its
simplest traditional measure is the standard deviation (square-root of the
variance).
(b) The dependence between the different assets of a portfolio of positions
is traditionally quantified by the correlations between the returns of all
pairs of assets.
Thus, in their most basic incarnations, both risk and dependence are thought
of, respectively, as one-dimensional quantities: the standard deviation of
the distribution of returns of a given asset and the correlation coefficient
of these returns with those of another asset of reference (the “market” for
instance). The standard deviation (or volatility) of portfolio returns provides
the simplest way to quantify its fluctuations and is at the basis of Markowitz’s
portfolio selection theory [347]. However, the standard deviation of a portfolio
offers only a limited quantification of incurred risks (seen as the statistical fluc-
tuations of the realized return around its expected – or anticipated – value).
This is because the empirical distributions of returns have “fat tails” (see
Chap. 2 and references therein), a phenomenon associated with the occur-
rence of non-typical realizations of the returns. In addition, the dependences
between assets are only imperfectly accounted for by the covariance matrix
[309].
The last few decades have seen two important extensions.
• First, it has become clear, as synthesized in Chap. 2, that the standard
deviation offers only a reductive view of the genuine full set of risks em-
bedded in the distribution of returns of a given asset. As distributions of
returns are in general far from Gaussian laws, one needs more than one
centered moment (the variance) to characterize them. In principle, an in-
finite set of centered moments is required to faithfully characterize the
potential for risks ranging from small all the way to extreme, because, in general, large
risks cannot be predicted from the knowledge of small risks quantified by
the standard deviation. Alternatively, the full space of risks needs to be
characterized by the full distribution function. It may also be that the dis-
tributions are so heavy-tailed that moments do not exist beyond a finite
order, which is the realm of asymptotic power law tails, of which the stable
Lévy laws constitute an extreme class. The Value-at-Risk (VaR) [257] and
many other measures of risks [19, 20, 73, 447, 453] have been developed to
account for the larger moves allowed by non-Gaussian distributions and
non-linear correlations.
• Second and more recently, the correlation coefficient (and its associated
covariance) has been shown to only be a partial measure of the full de-
pendence structure between assets. Similarly to risks, a full understanding
of the dependence between two or more assets requires, in principle, an
infinite number of quantifiers or a complete dependence function such as
the copulas, defined in Chap. 3.
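As an illustration of the first point, the historical Value-at-Risk and its coherent companion, the expected shortfall, can be estimated directly from return samples. The sketch below (hypothetical parameters) shows how heavy tails inflate the expected shortfall at fixed variance:

```python
import numpy as np

def var_and_es(returns, level=0.99):
    """Historical Value-at-Risk (VaR) and expected shortfall (ES).

    VaR is the loss quantile exceeded with probability 1 - level; ES is the
    average loss beyond the VaR threshold.
    """
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, level)
    es = losses[losses >= var].mean()
    return var, es

rng = np.random.default_rng(4)
gaussian = rng.normal(0.0, 0.01, size=100_000)
# Student-t returns (3 dof) rescaled to the same standard deviation.
heavy = 0.01 * rng.standard_t(3, size=100_000) / np.sqrt(3.0)

for name, r in (("gaussian ", gaussian), ("student-t", heavy)):
    var, es = var_and_es(r)
    print(f"{name}: VaR(99%) = {var:.4f}, ES(99%) = {es:.4f}")
```

Even with identical variances, the expected shortfall of the heavy-tailed sample is markedly larger than the Gaussian one: the variance alone is blind to exactly the losses that matter most.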
According to Artzner et al. [19, 20], the risk involved in the variations of the
values of a market position is measured by the amount of capital invested
in a risk-free asset, such that the market position can be prolonged in the
future. In other words, the potential losses should not endanger the future
actions of the fund manager of the company, or more generally, of the person
or structure which underwrites the position. In this sense, a risk measure
constitutes for Artzner et al. a measure of economic capital. The risk measure
ρ can be either positive, if the risk-free capital must be increased to guarantee
the risky position, or negative, if the risk-free capital can be reduced without
invalidating it.
A risk measure is said to be coherent in the sense of Artzner et al. [19, 20]
if it obeys the four properties or axioms that we now list. Let us call G the
space of risks. If the space Ω of all possible states of nature is finite, G is
isomorphic to R^N and a risky position X is nothing but a vector in R^N. A risk
measure ρ is then a map from R^N to R. A generalization to other spaces G
of risk has been proposed by Delbaen [123].
Let us consider a risky position with terminal value X and a capital α
invested in the risk-free asset at the beginning of the time period. At the end
of the time period, α becomes α · (1 + µ0 ), where µ0 is the risk-free interest
rate. Then,
Axiom 1 (Translational Invariance)
∀X ∈ G and ∀α ∈ R ,   ρ(X + α · (1 + µ0)) = ρ(X) − α .
These four axioms define the coherent measures of risks, which admit the
following general representation:
ρ(X) = sup_{P∈P} E_P[ −X / (1 + µ0) ] ,   (1.5)
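Representation (1.5) becomes concrete on a finite state space: the coherent risk measure is the worst expected discounted loss over a family of probability measures. A minimal sketch, with hypothetical payoffs and measures:

```python
import numpy as np

def coherent_risk(x, measures, mu0=0.0):
    """sup over P in `measures` of E_P[-X] / (1 + mu0): representation (1.5)."""
    x = np.asarray(x, dtype=float)
    return max(float(np.dot(p, -x)) / (1.0 + mu0) for p in measures)

# Three states of nature, a risky payoff, and a family of two measures (hypothetical).
x = np.array([-10.0, 2.0, 8.0])
measures = [np.array([0.2, 0.5, 0.3]),   # reference measure
            np.array([0.5, 0.3, 0.2])]   # stressed measure overweighting the loss state

rho = coherent_risk(x, measures)
print(f"rho(X) = {rho:.2f}")  # the stressed measure dominates: rho(X) = 2.80

# Translational invariance (Axiom 1): adding alpha * (1 + mu0) in the
# risk-free asset decreases the risk by exactly alpha.
alpha = 1.0
assert abs(coherent_risk(x + alpha, measures) - (rho - alpha)) < 1e-12
```

The stressed measure, which overweights the loss state, determines the supremum here; this is the sense in which coherent measures price a worst case over plausible scenarios.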
∀X ∈ G , ρ̃(X) ≥ 0 , (1.11)
where the equality holds if and only if X is certain. Let us now add to this
position a given amount α invested in the risk-free asset whose return is µ0
(with therefore no randomness in its price trajectory) and define the future
wealth of the new position Y = X + α(1 + µ0 ). Since µ0 is non-random,
the fluctuations of X and Y are the same. Thus, it is desirable that ρ̃ en-
joys a property of translational invariance, whatever X and the non-random
coefficient α may be:
ρ̃(X + α · (1 + µ0)) = ρ̃(X) .
We also require that the risk measure increases with the quantity of assets
held in the portfolio. This assumption reads
ρ̃(λ · X) = λ^ζ · ρ̃(X) ,  with ζ ≥ 1 .
Note that the case of liquid markets is recovered by ζ = 1 for which the risk is
directly proportional to the size of the position, as in the case of the coherent
risk measures.
These axioms, which define the so-called consistent measures of risk [333]
can easily be extended to the risk measures associated with the return on the
risky position.² Indeed, a one-period return is nothing but the variation of the
value of the position divided by its initial value X0. One can thus easily check
that the risk defined on the risky position is [X0]^ζ times the risk defined on
the return distribution. In the following, we will only consider the risk defined
on the return distribution and, to simplify the notation, the symbol X will be
used to denote both the asset price and its return in their respective contexts
without ambiguity.
² Using the trick ρ̃(λ1 · λ2 · X) = f(λ1) · ρ̃(λ2 · X) = f(λ1) · f(λ2) · ρ̃(X) = f(λ1 · λ2) · ρ̃(X),
leading to f(λ1 · λ2) = f(λ1) · f(λ2). The unique increasing convex solution of
this functional equation is f_ζ(λ) = λ^ζ with ζ ≥ 1.
Now, restricting to the case of a perfectly liquid market (ζ = 1) and adding
a sub-additivity assumption
Axiom 9 (Sub-additivity)
∀X, Y ∈ G ,   ρ̃(X + Y) ≤ ρ̃(X) + ρ̃(Y) ,
one obtains the so-called general deviation measures [407]. Again, this axiom is
open to controversy and its main raison d’être is to ensure the well-posedness
of optimization problems (such as minimizing portfolio risks). It could be
weakened along the lines used previously to derive the convex measures of
risk from the coherent measures of risk.
One can easily check that the deviation measures defined in (1.16) cor-
respond one-to-one to the expectation-bounded measures of risk defined in
(1.10) through the relation
ρ(X) = ρ̃(X) + E[−X]/(1 + µ0)  ⇐⇒  ρ̃(X) = ρ(X + E[−X]) .   (1.17)
The set of risk measures obeying Axioms 7–8 is huge since it includes all the
homogeneous functionals of (X − E[X]), for instance. The centered moments
(or moments about the mean) and the cumulants are two well-known classes
of semi-invariants. Then, a given value of ζ can be seen as nothing but a
specific choice of the order n of the centered moments or of the cumulants.4
In this case, the risk measure defined via these semi-invariants fulfills the two
following conditions:
ρ̃(X + µ) = ρ̃(X) ,   (1.20)
ρ̃(λ · X) = λ^n · ρ̃(X) .   (1.21)
⁴ The relevance of the moments of high order for the assessment of large risks is
discussed in Appendix 1.A.
In order to satisfy the positivity condition (Axiom 6), one needs to restrict
the set of values taken by n. By construction, the centered moments of even
order are always positive while the odd order centered moments can be neg-
ative. In addition, a vanishing value of an odd order moment does not mean
that the random variable, or risk, X ∈ G is certain in the sense of footnote 1,
since for instance any symmetric random variable has vanishing odd order
moments. Thus, only the even-order centered moments seem acceptable risk
measures. However, this restrictive constraint can be relaxed by first recalling
that, given any homogeneous function f(·) of order p, the function f(·)^q is
also homogeneous of order p · q. This allows one to decouple the order of the
moments to consider, which quantifies the impact of the large fluctuations,
from the influence of the size of the positions held, measured by the degree
of homogeneity of the measure ρ̃. Thus, considering any even-order centered
moment, we can build a risk measure ρ̃(X) = E[(X − E[X])^{2n}]^{ζ/2n}, which
accounts for the fluctuations measured by the centered moment of order 2n
but with a degree of homogeneity equal to ζ.
A further generalization is possible for odd-order moments. Indeed, the
absolute centered moments satisfy the three Axioms 6–8 for any odd or even
order. So, we can even go one step further and use non-integer order absolute
centered moments, and define the more general risk measure
ρ̃(X) = E[ |X − E[X]|^γ ]^{ζ/γ} ,   (1.22)
which ensures that aggregating two risky assets diversifies their risk. In fact,
in the special case γ = 1, these measures enjoy the stronger sub-additivity
property, and therefore belong to the class of general deviation measures.
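A minimal implementation of the risk measure (1.22), together with a numerical check of its positive homogeneity of degree ζ (sample and parameters hypothetical):

```python
import numpy as np

def generalized_moment_risk(x, gamma=4.0, zeta=2.0):
    """rho_tilde(X) = E[|X - E[X]|^gamma]^(zeta/gamma), per equation (1.22)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()) ** gamma) ** (zeta / gamma)

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)

# Homogeneity of degree zeta: rho(lambda * X) = lambda**zeta * rho(X),
# which holds algebraically, independently of the sample.
lam, zeta = 3.0, 2.0
r1 = generalized_moment_risk(lam * x, zeta=zeta)
r2 = lam ** zeta * generalized_moment_risk(x, zeta=zeta)
print(abs(r1 - r2) < 1e-8)
```

Here γ = 4 makes the measure sensitive to large fluctuations while ζ = 2 keeps the scaling in the position size quadratic, illustrating the decoupling of the two roles described above.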
More generally, any discrete or continuous (positive) sum of these risk
measures with the same degree of homogeneity is again a risk measure.
This allows us to define “spectral measures of fluctuations” in the spirit of
Acerbi [2]:
ρ̃(X) = ∫ dγ φ(γ) · E[ |X − E[X]|^γ ]^{ζ/γ} ,   (1.24)
The situation is not so clear for the cumulants, since the even-order
cumulants, as well as the odd-order ones, can be negative (although, for a large class
of distributions, the even-order cumulants remain positive, especially for fat-tailed
distributions, there exist simple but somewhat artificial counterexamples).
In addition, cumulants suffer from another problem with respect
to the positivity axiom. As for the odd-order centered moments, they can
vanish even when the random variable is not certain. Just think of the cu-
mulants of the Gaussian law. All but the first two (which represent the mean
and the variance) are equal to zero. Thus, the strict formulation of the posi-
tivity axiom cannot be fulfilled by the cumulants. Should we thus reject them
as useful measures of risks? It is important to emphasize that the cumulants
enjoy a property which can be considered as a natural requirement for a risk
measure. It can be desirable that the risk associated with a portfolio made of
independent assets is exactly the sum of the risk associated with each individ-
ual asset. Thus, given N independent assets {X1 , . . . , XN }, and the portfolio
SN = X1 + · · · + XN , we would like to have
ρ(SN) = ρ(X1) + · · · + ρ(XN) .   (1.25)
This property is verified by all cumulants, while it does not hold for centered
moments, except for the variance. In addition, as seen from their definition in
terms of the characteristic function
E[ e^{ik·X} ] = exp( ∑_{n=1}^{+∞} C_n · (ik)^n / n! ) ,   (1.26)
cumulants Cn of order larger than 2 quantify deviations from the Gaussian law
and therefore measure large risks beyond the variance (equal to the second-
order cumulant).
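The additivity of the cumulants over independent assets can be checked numerically with k-statistics, which are unbiased estimators of the cumulants. For unit exponential variables, C_n = (n − 1)!, so C4 = 6 for each asset (sample sizes hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 1_000_000

# Two independent skewed risks (unit exponential margins, hypothetical).
x = rng.exponential(size=n)
y = rng.exponential(size=n)

# scipy's kstat returns the k-statistic, an unbiased estimator of cumulant C_n.
c4_x = stats.kstat(x, 4)
c4_y = stats.kstat(y, 4)
c4_sum = stats.kstat(x + y, 4)

# For independent variables cumulants add: C4(X+Y) = C4(X) + C4(Y) = 6 + 6 = 12.
print(f"C4(X) + C4(Y) = {c4_x + c4_y:.2f} vs C4(X+Y) = {c4_sum:.2f}")
```

The two estimates agree within sampling error, whereas the fourth centered moment of the sum would not equal the sum of the fourth centered moments, precisely because of the cross term exhibited in (1.27).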
What are the implications of using the cumulants as almost consistent
measures of risks? In particular, what are the implications on the preferences
of the agents employing such measures? To address this question, it is infor-
mative to express the cumulants as a function of the centered moments. For
instance, let us consider the fourth-order cumulant:
C4 = µ4 − 3 · µ2² = µ4 − 3 · C2² ,   (1.27)
the position is certain) but that the agent is indifferent between the large risks
of this position measured by µ4 and the small risks quantified by µ2 .
To summarize, centered moments of even orders possess all the minimal
properties required for a suitable portfolio risk measure. Cumulants only par-
tially fulfill these requirements, but have an additional advantage compared
with the centered moments, that is, they fulfill the condition (1.25). For these
reasons, we think it is interesting to consider both the centered moments and
the cumulants in risk analysis and decision making. Finally, let us stress that
the variance, originally used in Markowitz's portfolio theory [347], is nothing
but the second centered moment, also equal to the second-order cumulant (the
first three cumulants and centered moments are equal). Therefore, a portfolio
theory based on the centered moments or on the cumulants automatically
contains Markowitz's theory as a special case, and thus offers a natural
generalization, encompassing large risks, of this masterpiece of financial science. It
also embodies several other generalizations where homogeneous measures of
risks are considered, as for instance in [241].
We should also mention the measure of attractiveness for risky invest-
ments, the gain–loss ratio, introduced by Bernardo and Ledoit [50]. The gain
(loss) of a portfolio is the expectation, under a benchmark risk-adjusted prob-
ability measure, of the positive (negative) part of the portfolio’s excess payoff.
The gain–loss ratio constitutes an improvement over the widely used Sharpe
ratio (average return over volatility). The advantage of the gain–loss ratio is
that it penalizes only downside risk (losses) and rewards all upside potential
(gains). The gain–loss ratio has been shown to yield useful bounds for asset
pricing in incomplete markets that give the modeler the flexibility to control
the trade-off between the precision of equilibrium models and the credibility
of no-arbitrage methods. The gain–loss approach is valuable in applications
where the security returns are not normally distributed. Bernardo and Ledoit
[50] cite the following domains of application: (i) valuing real options on non-
traded assets; (ii) valuing executive stock options when the executive cannot
trade the options or the underlying due to insider trading restrictions; (iii)
evaluating the performance of portfolio managers who invest in derivatives;
(iv) pricing options on a security whose price follows a jump-diffusion or a
fat-tailed Pareto–Lévy diffusion process; and (v) pricing fixed income derivatives
in the presence of default risk.
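The gain–loss ratio itself is elementary to compute once the benchmark risk-adjusted probabilities are given. A sketch with a hypothetical four-state excess payoff (uniform probabilities standing in for the benchmark measure):

```python
import numpy as np

def gain_loss_ratio(excess_payoffs, probs=None):
    """E[positive part] / E[negative part] of the excess payoff, following
    Bernardo and Ledoit; `probs` plays the role of the benchmark
    risk-adjusted measure (uniform if omitted)."""
    x = np.asarray(excess_payoffs, dtype=float)
    p = np.full(x.shape, 1.0 / x.size) if probs is None else np.asarray(probs)
    gain = np.dot(p, np.maximum(x, 0.0))
    loss = np.dot(p, np.maximum(-x, 0.0))
    return gain / loss

# Four equally likely states (hypothetical): gain = 5.5/4, loss = 3/4.
print(f"{gain_loss_ratio([-2.0, -1.0, 1.5, 4.0]):.2f}")
```

Unlike the Sharpe ratio, interchanging upside and downside here changes the measure: only the losses enter the denominator, which is exactly the asymmetry motivated above.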
The capacity to manage risk, and with it the appetite to take risk
and make forward-looking choices, are key elements [...] that drive the
economic system forward.
The concept of risks in economics and finance is elaborated in [52], starting
with the origins of the Cowles foundation as the consequence of Cowles’s
personal interest in the question: Are stock prices predictable? In the words
of J.L. McCauley (see his customer review on www.amazon.com),
this book is all about heroes and heroic ideas, and Bernstein’s heroes
are Adam Smith, Bachelier, Cowles, Markowitz (and Roy), Sharpe,
Arrow and Debreu, Samuelson, Fama, Tobin, Samuelson, Markowitz,
Miller and Modigliani, Treynor, Samuelson, Osborne, Wells-Fargo
Bank (McQuown, Vertin, Fouse and the origin of index funds), Ross,
Black, Scholes, and Merton. The final heroes (see Chap. 14, The Ul-
timate Invention) are the inventors of (synthetic) portfolio insurance
(replication/synthetic options).
One of these achievements is the capital asset pricing model (CAPM),
which is probably still the most widely used approach to relative asset val-
uation, although its empirical roots have been found to be weaker in recent
years [59, 160, 223, 287, 306, 401]. Its major idea is that only the risk which
cannot be diversified away or eliminated through portfolio aggregation is priced. This
asset valuation model describing the relationship between expected risk and
expected return for marketable assets is strongly entangled with the Mean-
Variance Portfolio Model of Markowitz. Indeed, both of them fundamentally
rely on the description of the probability distribution function (pdf) of as-
set returns in terms of Gaussian functions. The mean-variance description is
thus at the basis of the Markowitz portfolio theory and of the CAPM and its
inter-temporal generalization (see for instance [359]).
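In its classical form, the CAPM relation is estimated through the beta coefficient, β = Cov(r_stock, r_market)/Var(r_market), the expected excess return of the stock then being β times the expected market excess return. The sketch below recovers a known beta from simulated data (all parameters hypothetical, not the generalized model discussed next):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

# Simulated daily market excess returns and a stock obeying the classical
# CAPM with a true beta of 1.3 plus idiosyncratic noise.
market_excess = rng.normal(0.0003, 0.01, size=n)
true_beta = 1.3
stock_excess = true_beta * market_excess + rng.normal(0.0, 0.01, size=n)

# beta = Cov(stock, market) / Var(market).
beta_hat = (np.cov(stock_excess, market_excess)[0, 1]
            / np.var(market_excess, ddof=1))
print(f"estimated beta: {beta_hat:.2f}")
```

Both the covariance and the variance are Gaussian-motivated quantities; the generalizations discussed below replace them with measures adapted to large risks.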
The CAPM is based on the concept of economic equilibrium between ra-
tional expectation agents. Economic equilibrium is itself the self-organized
result of complex nonlinear feedback processes between competitive inter-
acting agents. Thus, while not describing the specific dynamics of how self-
organization makes the economy converge to a stable regime [10, 18, 280], the
concept of economic equilibrium describes the resulting state of this dynamic
self-organization and embodies all the hidden and complex interactions be-
tween agents with infinite loops of recurrence. This provides a reference base
for understanding risks.
We put some emphasis on the CAPM and its generalized versions because
the CAPM is a remarkable starting point for answering the question on the
origin of risks and returns: in economic equilibrium theory, the two are con-
ceived as intrinsically entangled. In the following, we expand on this class of
explanation before exploring briefly other directions.
Let us now show how an equilibrium model generalizing the original CAPM
[308, 364, 429] can be formulated on the basis of the coherent measures
adapted to large risks. This provides an "explanation" for risks from the
point of view of the non-separable interplay between agents’ preferences and
their collective organization. We should stress that many generalizations have
already been proposed to account for the fat-tailedness of the asset return
distributions, which led to the multimoment CAPM. For instance, Rubinstein
[421], Krauss and Litzenberger [278], Lim [306] and Harvey and Siddique [223]
have underlined and tested the role of the asymmetry in the risk premium
by accounting for the skewness of the distribution of returns. More recently,
Fang and Lai [162] and Hwang and Satchell [241] have introduced a four-moment CAPM to take into account the leptokurtic behavior of the asset return distributions. Many other extensions have been presented such as the
VaR-CAPM [3] or the Distributional-CAPM [389]. All these generalizations
become more complicated but unfortunately do not necessarily provide more
accurate predictions of the expected returns.
Let us assume that the relevant risk measure is given by any of the measures of fluctuations previously presented that obey Axioms 6–8 of Sect. 1.2.2.
We will also relax the usual assumption of a homogeneous market to give
to the economic agents the choice of their own risk measure: some of them
may choose a risk measure which puts the emphasis on the small fluctuations,
while others may prefer those which account for the larger ones. In such a heterogeneous market, we will recall how an equilibrium can still be reached
and why the excess returns of individual stocks remain proportional to the
market excess return, which is the fundamental tenet of CAPM.
For this, we need the following assumptions about the market:
• H1: We consider a one-period market, such that all the positions held at
the beginning of a period are cleared at the end of the same period.
• H2: The market is perfect, i.e., there are no transaction costs or taxes,
the market is efficient and the investors can lend and borrow at the same
risk-free rate µ0 .
Of course, these standard assumptions are to be taken with a grain of salt
and are made only with the goal of obtaining a normative reference theory.
We will now add another assumption that specifies the behavior of the agents
acting on the market, which will lead us to make the distinction between
homogeneous and heterogeneous markets.
The market is said to be homogeneous if all the agents acting on this market
aim at fulfilling the same objective. This means that:
• H3-1: All the agents want to maximize the expected return of their port-
folio at the end of the period under a given constraint of measured risk,
using the same measure of risks ρζ for all of them (the subscript ζ refers
to the degree of homogeneity of the risk measure, see Sect. 1.2).
16 1 On the Origin of Risks and Extremes
In the special case where ρζ denotes the variance, all the agents follow Markowitz's optimization procedure, which leads to the CAPM equilibrium,
as proved by Sharpe [429]. When ρζ represents the centered moments, this
leads to the market equilibrium described in [421]. Thus, this approach allows
for a generalization of the most popular asset pricing in equilibrium market
models.
When all the agents have the same risk function ρζ , whatever ζ may be, we can assert that they all have a fraction of their capital invested in the same portfolio Π (see, for instance, [333] for the derivation of the composition of
the portfolio), and the remaining in the risk-free asset. The amount of capital
invested in the risky fund only depends on their risk aversion and/or on the
legal margin requirement they have to fulfill.
Let us now assume that the market is at equilibrium, i.e., supply equals
demand. In such a case, since the optimal portfolios can be any linear combi-
nations of the risk-free asset and of the risky portfolio Π, it is straightforward
to show that the market portfolio, made of all traded assets in proportion
of their market capitalization, is nothing but the risky portfolio Π. Thus, as
shown in [333], we can state that, whatever the risk measure ρζ chosen by
the agents to perform their optimization, the excess return of any asset i over
the risk-free interest rate (µ(i) − µ0 ) is proportional to the excess return of
the market portfolio Π over the risk-free interest rate:
µ(i) − µ0 = β_ζ^i · (µ_Π − µ0) ,   (1.28)

where

β_ζ^i = [ ∂ ln ρ_ζ^{1/ζ} / ∂w_i ]_{w_1^∗, ..., w_N^∗} ,   (1.29)

where w_1^∗, . . . , w_N^∗ are the optimal allocations of the assets in the following sense:

inf_{w_i ∈ [0,1]} ρ_ζ({w_i})   subject to   Σ_{i≥0} w_i = 1   and   Σ_{i≥0} w_i µ(i) = µ ,   (1.30)
In other words, the set of normalized weights w_i^∗ defines the portfolio with minimum risk, as measured by any convex⁵ measure ρζ of risk obeying Axioms 6–8 of Sect. 1.2.2, for a given amount of expected return µ.
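As a concrete illustration of this optimization program, the following sketch (with invented numbers, not from the book) solves the minimum-variance version of (1.30) in closed form via Lagrange multipliers, dropping the [0, 1] bounds on the weights for simplicity:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical inputs: expected returns and covariance of 4 assets.
mu = np.array([0.05, 0.07, 0.06, 0.10])
A = rng.normal(size=(4, 4))
C = A @ A.T + 0.5 * np.eye(4)      # positive-definite covariance (risk measure = variance)
m = 0.07                           # required expected return

# Stationarity of the Lagrangian of  min w'Cw  s.t.  sum(w) = 1 and w'mu = m
# gives w* = C^{-1} (lam_1 * 1 + lam_2 * mu), with the two multipliers
# fixed by the two constraints.
ones = np.ones(4)
Cinv = np.linalg.inv(C)
B = np.array([[ones @ Cinv @ ones, ones @ Cinv @ mu],
              [mu @ Cinv @ ones,   mu @ Cinv @ mu]])
lam = np.linalg.solve(B, np.array([1.0, m]))
w_star = Cinv @ (lam[0] * ones + lam[1] * mu)

print(w_star.sum(), w_star @ mu)   # both constraints hold: 1.0 and 0.07
```

With a risk measure other than the variance, the same program has no closed form and must be solved numerically, but the structure of (1.30) is unchanged.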
When ρζ denotes the variance, we recover the usual β^i given by the mean-variance approach:

β^i = Cov(X_i , Π) / Var(Π) .   (1.31)
⁵ Convexity is necessary to ensure the existence and uniqueness of a minimum.
Thus, the relations (1.28) and (1.29) generalize the usual CAPM formula,
showing that the specific choice of the risk measure is not very important,
as long as it follows the Axioms 6–8 characterizing the fluctuations of the
distribution of asset returns.
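In the mean-variance special case, relation (1.31) is easy to verify on simulated data: the ratio Cov(X_i, Π)/Var(Π) recovers the assumed exposure of the asset to the market portfolio. A minimal sketch (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000                     # hypothetical number of return observations
beta_true = 1.3                 # assumed exposure of asset i to the market portfolio

r_market = rng.normal(0.0, 0.01, T)                         # returns of the market portfolio
r_asset = beta_true * r_market + rng.normal(0.0, 0.005, T)  # plus idiosyncratic noise

# beta^i = Cov(X_i, Pi) / Var(Pi), as in (1.31)
beta_hat = np.cov(r_asset, r_market, ddof=1)[0, 1] / np.var(r_market, ddof=1)
print(beta_hat)                 # close to the assumed value 1.3
```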
Does this result hold in the more realistic situation of a heterogeneous market? A market will be said to be heterogeneous if the agents seek to fulfill
different objectives. We thus consider the following assumption:
• H3-2: There exist N agents. Each agent n is characterized by her choice of
a risk measure ρζ (n) so that she invests only in the mean-ρζ (n) efficient
portfolios.
According to this hypothesis, an agent n invests a fraction of her wealth in
the risk-free asset and the remaining in Πn , the mean-ρζ (n) efficient portfolio,
only made of risky assets. Again, the fraction of wealth invested in the risky
fund depends on the risk aversion of each agent, which may vary from one
agent to another.
The composition of the market portfolio Π for such a heterogeneous mar-
ket is found to be nothing but the weighted sum of the mean-ρζ (n) optimal
portfolio Πn [333]:
Π = Σ_{n=1}^{N} γ_n Π_n ,   (1.32)
where γn is the fraction of the total wealth invested in the fund Πn by the
nth agent.
Moreover, for every asset i and for any mean-ρζ(n) efficient portfolio Π_n , for all n, the following equation holds:

µ(i) − µ0 = β_n^i · (µ_{Π_n} − µ0) ,

so that

µ(i) − µ0 = β^i · (µ_Π − µ0) ,

with

β^i = [ Σ_n γ_n / β_n^i ]^{−1} .   (1.37)
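Numerically, the aggregation rule (1.37) is a weighted harmonic mean of the single-fund betas. A tiny sketch with invented numbers:

```python
# Hypothetical heterogeneous market: three funds Pi_n with wealth fractions
# gamma_n, and the betas beta_n^i of one asset with respect to each fund.
gamma = [0.5, 0.3, 0.2]
beta_n = [1.2, 0.8, 1.5]

# beta^i = ( sum_n gamma_n / beta_n^i )^(-1), equation (1.37)
beta_i = 1.0 / sum(g / b for g, b in zip(gamma, beta_n))
print(round(beta_i, 3))   # → 1.081
```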
The empirical inadequacy of the CAPM has led to the development of more
general models of risk and return, such as Ross’s Arbitrage Pricing Theory
(APT) [418]. Quoting Sargent [427],
Ross posited a particular statistical process for asset returns, then de-
rived the restrictions on the process that are implied by the hypothesis
that there exist no arbitrage possibilities.
Like the CAPM, the APT assumes that only non-diversifiable risk is priced.
But it differs from the CAPM by accounting for multiple causes of such risks
and by assuming a sufficiently large number of such factors so that almost
riskless portfolios can be constructed. Reisman recently presented a general-
ization of the APT showing that, under the assumption that there exists no
asymptotic arbitrage (i.e., in the limit of a large number of factors, the market
risk can be decreased to almost zero), there exists an approximate multi-beta
pricing relationship relative to any admissible proxy of dimension equal to the
number of factors [402]. Unlike the CAPM which specifies returns as a linear
function of only systematic risk, the APT is based on the well-known obser-
vations that multiple factors affect the observed time series of returns, such as
industry factors, interest rates, exchange rates, real output, the money sup-
ply, aggregate consumption, investor confidence, oil prices, and many other
variables [414]. However, while observed asset prices respond to a wide variety
of factors, there is much weaker evidence that equities with larger sensitivity
to some factors give higher returns, as the APT requires.
their factor decomposition: they still see, as in the CAPM and APT, a large
return as a reward for taking a high risk. For instance suppose that returns are
found to increase with book/price. Then those stocks with a high book/price
ratio must be more risky than average. This is in a sense the opposite to the
traditional interpretation of a financial professional analyst, who would say
that high book/price indicates a buying opportunity because the stock looks
cheap. In contrast, according to the efficient market theory, a stock, which is
cheap, can only be so because investors think it is risky.
Actually, the relationship between return and risk is not automatically
positive. Diether et al. [124] have recently documented that firms with more
uncertain earnings (as measured by the dispersion of analysts’ forecasts) have
smaller stock returns. As stressed by Johnson [255], this finding is important
because it directly links asset returns with information, but the relation is
apparently in contradiction with standard economic wisdom: the larger the
risks, the smaller the return! Actually, Johnson proposes a simple explanation
reconciling this new anomaly with the standard asset pricing theory, which is
based on the following ingredients: (i) the equity value of the leveraged firm
(i.e., with non-zero debt) is equivalent to a call option on the firm’s value,
following Merton’s model of credit risk [358]; (ii) the dispersion of analysts’
forecasts is a measure of idiosyncratic risk, which is not priced. Then, by
the Black–Merton–Scholes price for the equity-call option, the firm expected
excess return (i.e., relative variation of the equity price) has its risk premium
amplified by a factor reflecting the effective exposure of the equity price to
the real firm value. This factor turns out to decrease with increasing volatility,
because more unpriced risk raises the option value, which has the consequence
of lowering its exposure to priced risks. It is important to stress that this
effect increases with the firm leverage and vanishes if the firm has no debt,
as verified empirically with impressive strength in [255]. This new anomaly is
thus fundamentally due to the impact of the volatility in the option pricing
of the firm equity value in the presence of debt, together with the existence
of a non-priced component of volatility.
The efficient market hypothesis (EMH) has a long history in finance and offers
a mechanism for the trade-off between risk and return [158, 159]. Similarly to
the concept of economic equilibrium, it must be understood as the result of
repetitive feedback interactions between investors, and thus provides a top–
down answer to the question on the origin of risk and return.
dividual in favor of the search for emerging collective behaviors. Three fields
of research highlight this idea and suggest a reconciliation, while enlarging
significantly the perspective of the EMH.
In statistical physics, the fight between order (through the interaction between
elementary constituents of matter) and disorder (modeled by thermal fluctu-
ations) gives rise to the occurrence of "spontaneous symmetry breaking", also called phase transitions in this context [451]. The understanding of the large-scale organization as well as the sudden macroscopic changes
of organization due to small variations of a control parameter has led to power-
ful concepts such as “emergence” [9]: the macroscopic organization has many
properties not shared by its constituents. For the market, this suggests that
its overall properties can only be understood through the study of the trans-
formation from the microscopic level of individual agents to the macroscopic
level of the global market. In statistical physics, this can often be performed
by the very powerful tool called the “renormalization group” [490, 489].
Biology has clearly demonstrated that an organism has greater abilities than
its constituent parts. This is true for multiorgan animals as well as for insect
societies for instance (see E. O. Wilson’s book [488]). More recently, this has
led to the concept of “swarm intelligence” [67, 68, 70, 135]: the collective be-
haviors of (unsophisticated) agents interacting locally with their environment
may cause coherent functional global patterns to emerge. Swarm intelligence
is being used to obtain collective (or distributed) problem solving without cen-
tralized control or the provision of a global model in many practical industrial
applications [69]. The importance of evolution, competition, and ecologies to
understand stock markets has been stressed by Farmer [164].
wealth, and have limited resources and time to dedicate to novel strategies,
and the minority mechanism is found in markets. For an introduction to the
Minority Game see [92, 251] and the Web page on the Minority Game by D.
Challet at www.unifr.ch/econophysics/minority/minority.html. An important
outcome of this work is the discovery of different market regimes, depending
on the value of a control parameter, roughly defined as the ratio of the num-
ber of effective strategies available to agents divided by the total number of
agents. In the minority game, agents choose their strategies according to the
condition of the market so as on average to minimize their chance of being
in the majority. When the “control” parameter is large, the recent history of
the game contains some useful information that strategies can exploit and the
market is not efficient. Below a critical value of the control parameter (i.e.,
for sufficiently many agents), reasonable measures of predictability suggest
that the market is efficient and cannot be predicted. These two phases are
characterized by different risks, which can be quantified as a function of the
control parameter. However, even in the “efficient market” phase, large and
extreme price moves occur, which may be preceded by distinct patterns that
allow agents in some cases to forecast them [289, 7].
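The mechanism can be illustrated with a bare-bones simulation of the game (a minimal sketch with illustrative parameters; the bookkeeping is simplified relative to the full model of [92, 251]). N agents each hold S fixed random strategies mapping the last m minority outcomes to an action ±1, and always play their currently best-scoring strategy:

```python
import numpy as np

rng = np.random.default_rng(1)

def minority_game(N=101, m=3, S=2, T=2000):
    """Return the attendance series A(t) = sum of the +/-1 actions."""
    P = 2 ** m                                        # number of possible histories
    strategies = rng.choice([-1, 1], size=(N, S, P))  # fixed random lookup tables
    scores = np.zeros((N, S))                         # virtual scores of the strategies
    history = int(rng.integers(P))                    # bit-encoded recent outcomes
    attendance = np.empty(T)
    for t in range(T):
        best = scores.argmax(axis=1)                  # each agent plays her best strategy
        actions = strategies[np.arange(N), best, history]
        A = actions.sum()
        minority = -np.sign(A)                        # minority side wins (N odd, so A != 0)
        scores += strategies[:, :, history] * minority  # reward would-be minority choices
        history = ((history << 1) | (minority > 0)) % P
        attendance[t] = A
    return attendance

A = minority_game()
# sigma^2/N, the standard volatility measure of the game; its dependence on the
# control parameter alpha = 2^m / N separates the two market regimes
print(A.var() / 101)
```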
deviations, especially those that occurred in October 1987 and wiped out
confidence in the methods of so-called portfolio insurance of Leland-O'Brien-Rubinstein Associates. Similarly, the assumptions of near-normal distributions and stable covariance broke down during the failure of LTCM (Long-Term Capital Management) in October 1998 [394].
As mentioned above, factor models are nowadays the approaches most often
used for extracting regularities in and for explaining the vagaries of stock mar-
ket prices. Factor models conceptually derive from and generalize the CAPM
and APT models. Factors, which are often invoked to explain prices, are the
overall market factor and the factors related to firm size, firm industry and
book-to-market equity, thought to embody most of the relevant dependence
structure between the studied time series [160, 161]. Indeed, there is no doubt
that observed equity prices respond to a wide variety of unanticipated fac-
tors, but there is much weaker evidence that expected returns are higher for
equities that are more sensitive to these factors, as required by Markowitz’s
mean-variance theory, by the CAPM and the APT [414]. This severe failure
of the most fundamental finance theories could conceivably be attributed to
an inappropriate proxy for the market portfolio, but nobody has been able
to show that this is really the correct explanation. This remark constitutes
the crux of the problem: the factors invoked to model the cross-sectional de-
pendence between assets are not known in general and are either postulated
based on economic intuition in financial studies, or obtained as black-box results in the recent analyses applying random matrix theory to large financial covariance matrices [392, 288]. In other words, explanatory factors emerge
endogenously.
Here, we follow [337] to show that the existence of factors has a natural bottom-up explanation: they can be seen to result from a collective effect of
the assets, similar to the emergence of a macroscopic self-organization of in-
teracting microscopic constituents. To show this, we unravel the origin of the
large eigenvalues of large covariance and correlation matrices and provide a
complete understanding of the coexistence of features resembling properties
of random matrices and of large “anomalous” eigenvalues. The main insight
here is that, in any large system possessing non-vanishing average correlations
between a finite fraction of all pairs of elements, a self-organized macroscopic
state generically exists. In other words, “explanatory” factors emerge endoge-
nously.
with multiplicity N − 1 and with ρ ∈ (0, 1) in order for the correlation ma-
trix to remain positive definite. Thus, in the large size limit N → ∞, even
for a weak positive correlation ρ → 0 (with ρN ≫ 1), a very large eigenvalue appears, associated with the "delocalized" (i.e., uniformly spread over all components) eigenvector v_1 = (1/√N)(1, 1, · · · , 1), which completely dominates the correlation structure of the system. This trivial example stresses
that the key point for the emergence of a large eigenvalue is not the strength
of the correlations, provided that they do not vanish, but the large size N of
the system.
This result (1.38) still holds qualitatively when the correlation coefficients are all distinct. To see this, it is convenient to use a perturbation approach. We thus add a small random component to each correlation coefficient:

C_ij = ρ + ε a_ij   for i ≠ j ,   (1.39)

where the coefficients a_ij = a_ji have zero mean, variance σ², and are independently distributed (there are additional constraints on the support of the distribution of the a_ij's in order for the matrix C_ij to remain positive definite with probability one). The determination of the eigenvalues and eigenvectors of C_ij is performed using perturbation theory up to second order in ε. We find that the largest eigenvalue satisfies

E[λ1] = (N − 1)ρ + 1 + [(N − 1)(N − 2)/N²] · (ε²σ²/ρ) + O(ε³) ,   (1.40)
while, at the same order, the corresponding eigenvector v1 remains unchanged.
The degeneracy of the eigenvalue λ = 1 − ρ is broken and leads to a complex
set of smaller eigenvalues described below.
In fact, this result (1.40) can be generalized to the non-perturbative do-
main of any correlation matrix with independent random coefficients Cij ,
provided that they have the same mean value ρ and variance σ 2 . Indeed, in
such a case, the expectations of the largest and second largest eigenvalues are
[180]
E[λ1] = (N − 1)ρ + 1 + σ²/ρ + o(1) ,   (1.41)

E[λ2] ≤ 2σ√N + O(N^{1/3} log N) .   (1.42)

Moreover, the statistical fluctuations of these two largest eigenvalues are asymptotically (for large fluctuations t > O(√N)) bounded by a Gaussian term according to the following large deviation result
P[ |λ_{1,2} − E[λ_{1,2}]| > t ] ≤ exp(−c_{1,2} t²) ,
for some positive constants c_{1,2} [279]. Numerical simulations of the distribution of eigenvalues of a random correlation matrix confirm that the largest eigenvalue is indeed proportional to N, while the bulk of the eigenvalues are much smaller and are described by a modified semicircle law [357] centered on λ = 1 − ρ, in the limit of large N.
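These predictions are straightforward to check numerically. The sketch below (parameter values are illustrative) draws a random correlation matrix with mean off-diagonal coefficient ρ and standard deviation σ, and compares its two largest eigenvalues with (1.41) and with the edge of the bulk:

```python
import numpy as np

rng = np.random.default_rng(2)
N, rho, sigma = 406, 0.14, 0.02   # illustrative; sigma small keeps C positive definite

# symmetric matrix with off-diagonal entries of mean rho and std sigma
noise = rng.normal(0.0, sigma / np.sqrt(2), (N, N))
C = rho + noise + noise.T
np.fill_diagonal(C, 1.0)

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
lam1_theory = (N - 1) * rho + 1 + sigma**2 / rho   # equation (1.41)
bulk_edge = (1 - rho) + 2 * sigma * np.sqrt(N)     # semicircle centered at 1 - rho

print(eig[0], lam1_theory)   # the largest eigenvalue scales like rho * N
print(eig[1], bulk_edge)     # all other eigenvalues stay inside the bulk
```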
This result is very different from that obtained when the mean value ρ
vanishes. In such a case, the distribution of eigenvalues of the random matrix
C is given by the semicircle law [357]. However, due to the presence of the ones
on the main diagonal of the correlation matrix C, the center of the circle is not
at the origin but at the point λ = 1. Thus, the distribution of the eigenvalues
of random correlation matrices with zero mean correlation coefficients is a semicircle of radius 2σ√N centered at λ = 1.
The result (1.41) is deeply related to the so-called friendship theorem
in mathematical graph theory, which states that, in any finite graph such
that any two vertices have exactly one common neighbor, there is one and
only one vertex adjacent to all other vertices [155]. A more heuristic but
equivalent statement is that, in a group of people such that any pair of persons
have exactly one common friend, there is always one person (the “politician”)
who is the friend of everybody. Consider the matrix C with its non-diagonal entries C_ij (i ≠ j) equal to Bernoulli random variables with parameter ρ, that is, Pr[C_ij = 1] = ρ and Pr[C_ij = 0] = 1 − ρ. Then, the matrix C_ij − I, where I is the unit matrix, becomes nothing but the adjacency matrix of the random graph G(N, ρ) [279]. The proof [155] of the "friendship theorem" indeed relies on the N-dependence of the largest eigenvalue and on the √N-dependence of the second largest eigenvalue of C_ij, as given by (1.41) and (1.42).
Figure 1.1 shows the distribution of eigenvalues of a random correlation
matrix. The inset shows the largest eigenvalue lying at the predicted size
ρN = 56.8, while the bulk of the eigenvalues are much smaller and are de-
scribed by a modified semicircle law centered on λ = 1 − ρ, in the limit of
large N . This result, on the largest eigenvalue emerging from the collective
effect of the cross-correlation between all N (N − 1)/2 pairs, provides a novel
perspective to the observation [40, 413] that the only reasonable explanation
for the simultaneous crash of 23 stock markets worldwide in October 1987
is the impact of a world market factor: according to the results (1.41) and
(1.42) and the view expounded by Fig. 1.1, the simultaneous occurrence of
significant correlations between the markets worldwide is bound to lead to
the existence of an extremely large eigenvalue, the world market factor con-
structed by... a linear combination of the 23 stock markets! What this result
shows is that invoking factors to explain the cross-sectional structure of stock
returns is cursed by the chicken-and-egg problem: factors exist because stocks
are correlated; stocks are correlated because of common factors impacting
them.
Fig. 1.1. Spectrum of eigenvalues of a random correlation matrix with average correlation coefficient ρ = 0.14 and standard deviation of the correlation coefficients σ = 0.345/√N: the ordinate is the number of eigenvalues in a bin with value given by the abscissa. One observes that all eigenvalues except the largest one are smaller than or equal to ≈ 1.5. The size N = 406 of the matrix is the same as in previous studies
[392] for the sake of comparison. The continuous curve is the theoretical translated
semicircle distribution of eigenvalues describing the bulk of the distribution which
passes the Kolmogorov test. The center value λ = 1 − ρ ensures the conservation
of the trace equal to N . There is no adjustable parameter. The inset represents
the whole spectrum with the largest eigenvalue whose size is in agreement with the
prediction ρN = 56.8. Reproduced from [337]
Empirically [392, 288], a few other eigenvalues below the largest one have an amplitude of the order of 5–10 and deviate significantly from the bulk of the distribution. The above analysis provides a very simple mechanism for them,
justifying the postulated model in [373]. The solution consists in considering,
as a first approximation, the block diagonal matrix C with diagonal elements made of the matrices A_1 , · · · , A_p of sizes N_1 , · · · , N_p with Σ_i N_i = N, constructed according to (1.39) such that each matrix A_i has the average correlation coefficient ρ_i . When the coefficients of the matrix C outside the matrices A_i are zero, the spectrum of C is given by the union of all the spectra of the A_i's, which are each dominated by a large eigenvalue λ_{1,i} ≃ ρ_i · N_i .
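The sketch below reproduces this block mechanism in its simplest form (inter-block couplings set exactly to zero, unlike the 0.1 used in Fig. 1.2): each block contributes one large eigenvalue (N_i − 1)ρ_i + 1 ≈ ρ_i N_i.

```python
import numpy as np

# three uncorrelated "sectors" with internal mean correlation 0.18, sizes as in Fig. 1.2
sizes = [130, 140, 136]
rho_i = 0.18

N = sum(sizes)
C = np.zeros((N, N))
start = 0
for n in sizes:                       # place each block A_i on the diagonal
    block = np.full((n, n), rho_i)
    np.fill_diagonal(block, 1.0)
    C[start:start + n, start:start + n] = block
    start += n

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
# the three block eigenvalues (N_i - 1) * 0.18 + 1 = 26.02, 25.30, 24.22
print(eig[:3])
```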
Fig. 1.2. Spectrum of eigenvalues estimated from the sample correlation matrix
of N = 406 time series of length T = 1309. The times series have been constructed
from a multivariate Gaussian distribution with a correlation matrix made of three
block-diagonal matrices of sizes respectively equal to 130, 140, and 136 and mean
correlation coefficients equal to 0.18 for all of them. The off-diagonal elements are
all equal to 0.1. The same results hold if the off-diagonal elements are random. The
inset shows the existence of three large eigenvalues, which result from the three-block
structure of the correlation matrix. Reproduced from [337]
posterior evolution. Such extreme events express more than anything else the
underlying forces usually hidden by almost perfect balance and thus pro-
vide the potential for a better scientific understanding of complex systems.
These crises have fundamental societal impacts and range from large nat-
ural catastrophes, catastrophic events of environmental degradation, to the
failure of engineering structures, crashes in the stock market, social unrest
leading to large-scale strikes and upheaval, economic drawdowns on national
and global scales, regional power blackouts, traffic gridlocks, diseases and epi-
demics, etc. An outstanding scientific question is how such large-scale patterns
of catastrophic nature might evolve from a series of interactions on the small-
est and increasingly larger scales. In complex systems, it has been found that
the organization of spatial and temporal correlations does not stem, in general, from a nucleation phase diffusing across the system. It results rather from a
progressive and more global cooperative process occurring over the whole sys-
tem by repetitive interactions, which is partially described by the distributed
correlations at the origin of a large eigenvalue as described above. An instance
would be the many occurrences of simultaneous scientific and technical discov-
eries signaling the global nature of the maturing process. Recent developments
suggest that non-traditional approaches, based on the concepts and methods
of statistical and nonlinear physics coupled with ideas and tools from computational intelligence could provide novel methods in complexity to direct the
numerical resolution of more realistic models and the identification of rele-
vant signatures of large and extreme risks. To address the challenge posed by
the identification and modeling of such outliers, the available theoretical tools
comprise in particular bifurcation and catastrophe theories, dynamical critical
phenomena and the renormalization group, nonlinear dynamical systems, and
the theory of partially (spontaneously or not) broken symmetries. This field
of research is presently very active and is expected to advance significantly
our understanding, quantification, and control of risks.
In the meantime, both practitioners and academics need reliable metrics to characterize risks and dependences. This is the purpose of the following
chapters, which expose powerful models and measures of large risks and com-
plex dependences between time series.
Appendix
Therefore, the whole pdf cannot provide an adequate measure of risk, which
should be embodied by a single variable. In order to perform a selection among
a basket of assets and construct optimal portfolios, one needs measures given
as real numbers, not functions, which can be ordered according to the natural
ordering of real numbers on the line.
In this vein, Markowitz [347] has proposed to summarize the risk of an
asset by the variance of its returns (or equivalently by the corresponding
standard deviation). It is clear that this description of risks is fully satisfying
only for assets with Gaussian pdfs. In any other case, the variance generally
provides a very poor estimate of the real risk. Indeed, it is a well-established
empirical fact that the pdfs of asset returns have fat tails (see Chap. 2),
so that the Gaussian approximation underestimates significantly the large
price movements frequently observed on stock markets (see Fig. 2.1). Conse-
quently, the variance cannot be taken as a suitable measure of risks, since it
only accounts for the smallest contributions to the fluctuations of the asset’s
returns.
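The size of this underestimation is striking even in a simple parametric comparison. The sketch below contrasts two unit-variance distributions, a Gaussian and a Student-t with three degrees of freedom (a common fat-tailed stand-in, rescaled to unit variance; its closed-form cdf keeps the example self-contained), at a "10-sigma" move:

```python
import math

x = 10.0                                  # a 10-standard-deviation return

# Gaussian: P(|Z| > x)
p_gauss = math.erfc(x / math.sqrt(2.0))

# Student-t, nu = 3, rescaled to unit variance: the variance of t3 is 3, so the
# threshold in t3 units is x*sqrt(3); with u = t/sqrt(3) the closed-form t3 cdf
# F(t) = 1/2 + (1/pi) * (atan(u) + u/(1+u^2)) gives the two-sided tail:
u = x                                     # u = (x*sqrt(3)) / sqrt(3) = x
p_t3 = 1.0 - (2.0 / math.pi) * (math.atan(u) + u / (1.0 + u * u))

print(p_gauss)   # ~ 1.5e-23
print(p_t3)      # ~ 4e-4: about nineteen orders of magnitude more likely
```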
The variance of the return X of an asset involves its second moment E[X²] and, more precisely, is equal to its second centered moment (or moment about the mean) E[(X − E[X])²]. Thus, the weight of a given fluctuation X contributing to the variance of the returns is proportional to its square. Due to the decay of the pdf of X for large X, bounded from above by ∼ 1/|X|^{1+µ} with µ > 2 (see Chap. 2), the largest fluctuations do not contribute significantly to this expectation. To increase their contributions, and in this way to account for the largest fluctuations, it is natural to invoke moments of order n higher
Fig. 1.3. This figure represents the function x^n · e^{−x} for n = 1, 2, and 4 and shows the typical size of the fluctuations involved in the moment of order n. Reproduced from [333]
than 2. The larger n is, the larger is the contribution of the rare and large returns in the tail of the pdf. This phenomenon is demonstrated in Fig. 1.3, where we can observe the evolution of the quantity x^n · f(x) for n = 1, 2, and 4, where f(x), in this example, denotes the density of the standard exponential distribution e^{−x}. The expectation E[X^n] is then simply represented geometrically as equal to the area below the curve x^n · f(x). These curves provide an intuitive illustration of the fact that the main contributions to the moment E[X^n] of order n come from values of X in the vicinity of the maximum of x^n · f(x), which increases fast with the order n of the moment we consider, all the more so the fatter the tail of the pdf of the returns X. In addition, the typical size of the return assessed by the moment of order n is given by λ_n = E[X^n]^{1/n} (which coincides with the L^n norm of X, for positive random variables). For the exponential distribution chosen to construct Fig. 1.3, the value of x corresponding to the maximum of x^n · f(x) is exactly equal to n, while λ_n = n/e + O(ln n). Thus, increasing the order of the moment allows one to sample larger fluctuations of the asset prices.
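For the exponential density f(x) = e^{−x}, these quantities have closed forms, since E[X^n] = n!, so the claims are easy to check numerically (a quick verification sketch):

```python
import math

# f(x) = exp(-x): E[X^n] = n!, hence lambda_n = (n!)^(1/n), and the maximum
# of x^n * exp(-x) sits at x = n (set the derivative to zero).
for n in (1, 2, 4, 20, 50):
    lam_n = math.factorial(n) ** (1.0 / n)
    print(n, lam_n, n / math.e)   # lambda_n grows like n/e, up to O(ln n)
```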
2 Marginal Distributions of Returns

2.1 Motivations
Fig. 2.1. Bivariate distribution of the daily annualized returns ri of the Swiss franc
(CHF) in US $ (i = 1) and of the Japanese yen (JPY) in US $ (i = 2) for the time
interval from January 1971 to October 1998. One-fourth of the data points are rep-
resented for clarity of the figure. The contour lines are obtained by smoothing the
empirical bivariate density distribution and represent equilevels. The outer (respec-
tively middle and inner) line is such that 90% (respectively 50% and 10%) of the
total number of data points fall within it. It is apparent that the data is not described
by an elliptically contoured pdf as it should be if the dependence was prescribed by
a Gaussian (or more generally by an elliptic) distribution. Instead, the contour line
takes the shape of a “bean”. Also shown are the price-time series and the marginal
distributions (in log-linear scales) in the panels at the top and on the side. The
parabolas in thick lines correspond to the best fits by Gaussian distributions. The thin lines correspond to the best fits by stretched exponentials ∼ exp[−(r_i/r_{0i})^{c_i}], with exponents c_1 = 1.14 for the CHF and c_2 = 0.8 for the JPY. Reproduced from [457]
asset risks which can then be combined with the help of copulas to build the
multivariate risk edifice. The emphasis is put on the determination of the pre-
cise shape of the tail of the distribution of returns of a given asset, which is a
major issue both from a practical and from an academic point of view. Indeed,
for practitioners, it is crucial to accurately estimate the high and low quan-
tiles of the distribution of returns (profit and loss) because they are involved
in almost all the modern risk management methods while from an academic
perspective, many economic and financial theories rely on a specific parame-
terization of the distributions whose parameters are intended to represent the
“macrovariables” influencing the agents.
For the purpose of practical market risk management, one typically needs
to assess tail risks associated with the distribution of returns or profit and
losses. Following the recommendations of the BIS,^1 one has to focus on risks associated with positions held for 10 days. This requires considering the distributions of 10-day returns. However, at such a large time scale, the
number of (non-overlapping) historical observations dramatically decreases.
Even over a century, one can only collect 2500 data points, or so, per asset.
Therefore, the assessment of risks associated with high quantiles is particularly
unreliable.
Recently, the use of high frequency data has allowed for an accurate es-
timation of the very far tails of the distributions of returns. Indeed, using
samples of one to 10 million points enables one to efficiently calibrate prob-
ability distributions up to probability levels of order 99.9995%. Then, one
can hope to reconstruct the distribution of returns at a larger time scale by
convolution. It is the stance taken by many researchers advocating the use of
Lévy processes to model the dynamics of asset prices [109, 196, and references
therein]. The recent study by Eberlein and Özkan [141] shows the relevance of
this approach, at least for fluctuations of moderate sizes. However, for large
fluctuations, this approach is not really accurate, as shown in Fig. 2.2, which
compares the probability density function (pdf) of raw 60-minute returns of
the Standard & Poor’s 500 index with the hypothetical pdf obtained by 60
convolution iterates of the pdf of the 1-minute returns; it is clear that the
former exhibits significantly fatter tails than the latter.
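The mechanics of this comparison can be sketched with a little code. The toy example below is our illustration, not the study's actual data: a hypothetical discretized 1-minute return distribution stands in for the empirical kernel density, and the n-fold convolution is computed with FFTs, which is how the iid benchmark distribution of Fig. 2.2 can be built.

```python
import numpy as np

def n_fold_convolution(pmf, n):
    """n-fold convolution of a discretized return distribution via FFT.
    `pmf` is a probability mass function on an evenly spaced return grid;
    the result lives on a grid n times wider, with the same spacing."""
    size = n * (len(pmf) - 1) + 1                 # support of the n-fold sum
    nfft = 2 ** int(np.ceil(np.log2(size)))       # zero-pad to avoid wraparound
    out = np.fft.irfft(np.fft.rfft(pmf, nfft) ** n)[:size]
    out[out < 0] = 0.0                            # clip tiny FFT round-off
    return out / out.sum()

# Hypothetical 1-minute return pmf on a centered grid:
pmf1 = np.array([0.05, 0.2, 0.5, 0.2, 0.05])
pmf60 = n_fold_convolution(pmf1, 60)              # iid "60-minute" distribution
```

Comparing `pmf60` with a direct estimate of the 60-minute distribution then reveals the excess of large events that the iid assumption misses.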
This phenomenon derives naturally from the fact that asset returns can-
not be merely described by independent random variables, as assumed when
prices are modeled by Lévy processes. In fact, independence is too strong an
assumption. For instance, the no free-lunch condition only implies the absence
of linear time dependence since the best linear predictor of future (discounted)
prices is then simply the current price. Volatility clustering, also called ARCH
effect [150], is a clear manifestation of the existence of nonlinear dependences
^1 Bank for International Settlements. The BIS is an international organization
which fosters cooperation among central banks and other agencies in pursuit
of monetary and financial stability. Its banking services are provided exclusively
to central banks and international organizations.
36 2 Marginal Distributions of Returns
Fig. 2.2. Kernel density estimates of the raw 60-minute returns and the density
obtained by 60 convolution iterates of the raw 1-minute returns kernel density for
the Standard & Poor’s 500
between returns observed at different lags. These dependences prevent the use
of convolution for estimating tail risks with sufficient accuracy. Figure 2.2 il-
lustrates the important observation that fat tails of asset return distributions
owe their origin, at least in part, to the existence of volatility correlations. In
the example of Fig. 2.2, a given 60-minute return is the sum of sixty 1-minute
returns. If there was no dependence between these sixty 1-minute returns,
the 60-minute return could be seen as the sum of 60 independent random
variables; hence, its probability density could be calculated exactly by taking
60 convolutions of the probability density of the 1-minute returns. Note that
this 60-fold convolution is equivalent to estimating the density of 60-minute
returns in which their sixty 1-minute returns have been reshuffled randomly
to remove any possible correlation. Figure 2.2 shows a faster decay of the pdf
of these reshuffled 60-minute returns compared with the pdf of the true em-
pirical 60-minute returns. Thus, assessing extreme risks at large time scales (1
or 10 days) by simple convolution of the distribution of returns at time scales
of 1 or of 5 minutes leads to crude approximations and to dramatic underes-
timations of the amount of risk really incurred. The role of the dependence
between successive returns is even more important in times of crashes: very
large drawdowns (amplitudes of runs of losses) have been shown to result from
anomalous transient dependences between a few successive days [249, 250]; as
a consequence, they cannot be explained or modeled by the distribution cal-
ibrated for the bulk (99%) of the rest of the sample of drawdowns. These
extreme events have been termed “outliers”, “kings” [286] or “black swans”
[470].
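The reshuffling comparison is easy to reproduce on synthetic data. The sketch below is our illustration, not the authors' code: "1-minute" returns are generated from an assumed log-volatility AR(1) model so as to exhibit volatility clustering, and the excess kurtosis of the aggregated 60-step returns is compared before and after reshuffling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "1-minute" returns with volatility clustering: the log-volatility
# follows a slowly mean-reverting AR(1) (an assumed toy model).
n = 60 * 10_000
phi, sig = 0.999, 0.05
h = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    h[t] = phi * h[t - 1] + sig * eps[t]
r = np.exp(h) * rng.standard_normal(n)

def excess_kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0

# Aggregate into 60-step "60-minute" returns, with and without reshuffling.
r60_raw = r.reshape(-1, 60).sum(axis=1)
r60_shuffled = rng.permutation(r).reshape(-1, 60).sum(axis=1)
# Destroying the volatility correlations thins the tails of the aggregate,
# so the raw aggregated returns show the larger excess kurtosis.
```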
2.2 A Brief History of Return Distributions 37
Table 2.1. Descriptive statistics for the daily Dow Jones Industrial Average index returns (from 27 May 1896 to 31 May 2000, sample size n = 28415), calculated over 1 day and 1 month, and for the Nasdaq Composite index returns (from 8 April 1997 to 29 May 1998, sample size n = 22123), calculated over 5 minutes and 1 hour
by Osborne [377] and Samuelson [425], the log-normal paradigm has been
the starting point of many financial theories such as Markowitz’s portfolio
selection method [347], Sharpe’s market equilibrium model (CAPM) [429] or
Black and Scholes rational option pricing theory [60]. However, for real finan-
cial data, the convergence in distribution to a Gaussian law is very slow (see
for instance [72, 88]), much slower than predicted for independent returns.
As shown in Table 2.1, the excess kurtosis (which is zero for a normal distribution) typically remains large even for monthly returns, testifying to (i) significant deviations from normality, (ii) the heavy-tailed behavior of the distribution of returns, and (iii) significant time dependences between asset returns [88].
Fig. 2.3. Complementary sample distribution function for the Standard & Poor’s
500 30-minute returns over the two decades 1980–1999. The solid (respectively dotted) line depicts the complementary distribution of the positive returns (respectively of the absolute value of the negative returns). Reproduced from [330]
if ζ(q) < 1 [27]. The calibration of λ² gives in general very small values in the range 0.01–0.04, leading to a tail index b in the range 15–50 [366]. This has led
previous workers to conclude that such a large tail exponent is unobservable
with available data sets, and may well be described by other effective laws.
However, Muzy et al. [365] have recently shown that empirical distributions
of log-returns do not give access to the unconditional prediction b ≈ 15–50
with (2.2). This is because the value of q determining the exponent b according
to (2.2) is itself associated with an α (through the Legendre transformation
(2.A.20) and (2.A.21)) for which the multifractal spectrum f (α) defined in
Sect. 2.A.2 is negative. But negative f(α)'s are unobservable. Indeed, from the definition (2.A.19) of f(α), only positive f(α)'s correspond to genuine fractal dimensions and are thus observable: this is because they correspond to more than a few points of observations in the limit ∆t ≪ T. The key remark of Muzy et al. [365] is therefore that the observable exponent b_obs for an infinite time series will be the largest positive q such that f(α) ≥ 0.
In the early 1960s, Mandelbrot [339] and Fama [157] presented evidence that
distributions of returns can be well approximated by a symmetric Lévy stable
law with tail index b about 1.7. These estimates of the tail index have recently
been supported by Mittnik et al. [362], and slightly different indices of the
stable law (b = 1.4) were suggested by Mantegna and Stanley [345, 346].
On the other hand, there is abundant evidence of a larger value of the tail index b ≈ 3 [217, 312, 320, 322, 367]. See also the various alternative
parameterizations in terms of the Student distribution [62, 275], or Pearson
Type-VII distributions [368], which all have an asymptotic power law tail and
are regularly varying. Thus, a general conclusion of this group of authors concerning tail fatness can be formulated as follows: the tails of the distribution of returns are heavier than a Gaussian tail and heavier than an exponential tail; they are compatible with the existence of a finite variance (b > 2), whereas the existence of the third (skewness) and fourth (kurtosis) moments is questionable.
These two classes of results are contradictory only on the surface, because
they actually do not apply to the same quantiles of the distributions of re-
turns. Indeed, Mantegna and Stanley [345] have shown that the distribution
of returns of the Standard & Poor’s 500 index can be described accurately
by a Lévy stable law only within a limited range up to about 5 standard
deviations, while a faster decay (approximated by an exponential or a power
law with larger exponent) of the distribution is observed beyond. This almost-
but-not-quite Lévy stable description could explain (at least, in part) the slow
convergence of the distribution of returns to the Gaussian law under time ag-
gregation [72, 451]; and it is precisely outside this range of up to 5 standard
deviations, where the Lévy law does not apply anymore that a tail index b ∼ =3
has been estimated. Indeed, most authors who have reported a tail index b ∼ =3
have used some optimality criteria for choosing the sample fractions (i.e., the
largest values) for the estimation of the tail index. Thus, unlike the authors
supporting stable laws, they have used only a fraction of the largest (positive
tail) and smallest (negative tail) sample values.
It would thus seem that all has been said on the distributions of returns.
However, there are still dissenting views in the literature. Indeed, the class
of regularly varying distributions is not the sole one able to account for the
large kurtosis and fat-tailness of the distributions of returns. Some recent
works suggest alternative descriptions for the distributions of returns. For in-
stance, Gouriéroux and Jasiak [208] claim that the distribution of returns on
the French stock market decays faster than any power law. Cont et al. [108]
2.3 Constraints from Extreme Value Theory 43
restrictive assumption that (i) financial time series are made of independent
and identically distributed returns, and (ii) the corresponding distributions
of returns belong to one of only three possible maximum domains of attrac-
tion.7 However, these assumptions are not fulfilled in general. While Smith’s
results [444] indicate that the dependence of the data does not constitute a major problem in the limit of large samples, so that volatility clustering of financial data does not undermine the reliability of EVT, we shall see that it can significantly bias standard statistical tools for samples of the sizes commonly used in studies of extreme tails. Moreover, the conclusions of many studies are
essentially based on an aggregation procedure which stresses the central part
of the distribution while smoothing, and possibly distorting, the characteristics of the tail, whose properties are precisely what one seeks to characterize.
The question then arises whether the limitations of these statistical tools
could have led to erroneous conclusions about the tail behavior of the distrib-
utions of returns. In this section, presenting tests performed on synthetic time
series with time dependence in the volatility with both Pareto and Stretched
Exponential (SE) distributions, and on two empirical time series (the daily
returns of the Dow Jones Industrial Average Index over a century (n = 28415
data points) and the 5-minute returns of the Nasdaq Composite index over 1
year from April 1997 to May 1998 (n = 22123 data points)), we exemplify the
fact that the standard generalized extreme value (GEV) estimators can be
quite inefficient due to the possibly slow convergence toward the asymptotic
theoretical distribution and the existence of biases in the presence of depen-
dence between data. Thus, one cannot reliably distinguish between rapidly
and regularly varying classes of distributions. The generalized Pareto distri-
bution (GPD) estimators work better, but still lack power in the presence of
strong dependence. Note that the two empirical data sets used in the illus-
tration below are justified by their similarity with (i) the data set of daily
returns used in [312] particularly, and (ii) the high frequency data used in
[217, 322, 367] among others.
^7 Extensions of the asymptotic theory of extreme values to correlated sequences
have been developed by Berman [48, 49] for Gaussian sequences and Loynes [318],
O’Brien [374], Leadbetter [293] and others [369] in the more general context of
stationary sequences satisfying mixing conditions. See also Kotz and Nadarajah
[277] for the limit distribution of extreme values of 2D correlated random variables. Recently, there has been growing interest in the extreme value theory of strongly
correlated random variables in many areas of science, including applications to
diffusing particles in correlated random potentials [89], to the understanding of
large deviations in spin glass ground state energies [13], to front propagation and
fluctuations [378], fragmentation, binary search tree problem in computer science
[325, 326], to maximal height of growing surfaces [399], to the Hopfield model of
brain learning [75], and so on.
2.3 Constraints from Extreme Value Theory 45
Two limit theorems, in two complementary forms, allow one to study the extremal properties of a distribution function and to determine its maximum domain of attraction (MDA).
First, consider a sample of N iid realizations X1, X2, . . . , XN of a random variable. Let X∧N denote the maximum of this sample.^8 Then, the Gnedenko theorem states that, if, after an adequate centering and normalization, the distribution of X∧N converges to a non-degenerate distribution as N goes to infinity, this limit distribution is necessarily the generalized extreme value (GEV) distribution defined by

Hξ(x) = exp[−(1 + ξ · x)^(−1/ξ)] , (2.4)
for some value of the centering parameter µN, scale factor ψN and form parameter ξN. The form parameter ξ is of paramount importance for the shape of
the limiting distribution. Its sign determines the three possible limiting forms
of the GEV distribution of maxima (2.4):
1. If ξ > 0, the limit distribution is the (shifted) Fréchet power-like distribution;
2. If ξ = 0, the limit distribution is the Gumbel (double-exponential) distri-
bution;
3. If ξ < 0, the limit distribution has a support bounded from above.
The determination of the parameter ξ is the central problem of extreme
value analysis. Indeed, it allows one to determine the maximum domain of
attraction of the underlying distribution and therefore its behavior in the tails.
When ξ > 0, the underlying distribution belongs to the Fréchet maximum
domain of attraction and is regularly varying (power-like tail). When ξ = 0, it
belongs to the Gumbel maximum domain of attraction and is rapidly varying
(exponential tail), while if ξ < 0 it belongs to the Weibull maximum domain
of attraction and has a finite right endpoint, which means that there exists a finite xF such that X ≤ xF with probability one.
^8 Similar results hold for the minimum X∨N = min{X1, . . . , XN}, since min{X1, . . . , XN} = −max{−X1, . . . , −XN}.
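These three cases can be illustrated numerically. The sketch below (our illustration) draws block maxima from a Pareto law (Fréchet MDA), an exponential law (Gumbel MDA) and a beta law (Weibull MDA, bounded support), and fits the GEV by maximum likelihood with scipy; note that scipy's `genextreme` uses the opposite sign convention for the form parameter.

```python
import numpy as np
from scipy.stats import beta, expon, genextreme, pareto

rng = np.random.default_rng(1)

def gev_form_parameter(sample, block_size=100):
    """MLE of the GEV form parameter xi from block maxima.
    scipy's genextreme shape is c = -xi, hence the sign flip."""
    usable = sample[: sample.size // block_size * block_size]
    maxima = usable.reshape(-1, block_size).max(axis=1)
    c, loc, scale = genextreme.fit(maxima)
    return -c

n = 200_000
xi_frechet = gev_form_parameter(pareto.rvs(b=3, size=n, random_state=rng))
xi_gumbel = gev_form_parameter(expon.rvs(size=n, random_state=rng))
xi_weibull = gev_form_parameter(beta.rvs(2, 4, size=n, random_state=rng))
# Expected signs: xi_frechet > 0 (true value 1/3), xi_gumbel close to 0,
# xi_weibull < 0 (true value -1/4 for the beta(2, 4) upper endpoint).
```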
The usefulness of formula (2.6) for risk assessment purposes seems obvious
as it provides a universal estimation of the Value-at-Risk. If X denotes the profit and loss, X∧N represents the largest among N losses. The Value-at-Risk at confidence level α, denoted by VaRα, is given by the unique solution of

VaRα ≃ µN + (ψN/ξN) · [(−N ln α)^(−ξN) − 1] . (2.10)
When the observations are not iid, one can generally replace N by θ ·N , where
θ ∈ [0, 1] is the so-called extremal index [146, 293, 313], related to the size of
the clusters of extremes which may appear when the data exhibit temporal
dependence. Indeed, generally speaking, one can write [146, p.419]:
Pr[X∧N < VaRα] ≃ Pr[X < VaRα]^(θ·N) = α^(θ·N) , (2.11)

so that

VaRα ≃ µ + (ψ/ξ) · [(−θ · N ln α)^(−ξ) − 1] . (2.12)
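Formula (2.12) is straightforward to implement once the GEV parameters have been fitted. The sketch below uses illustrative parameter values (not calibrated to any data set); the extremal index θ defaults to 1, the iid case.

```python
import numpy as np

def var_gev(alpha, mu, psi, xi, N, theta=1.0):
    """Value-at-Risk from fitted GEV parameters, following (2.10)-(2.12):
    VaR_alpha = mu + (psi/xi) * ((-theta*N*ln(alpha))**(-xi) - 1),
    with theta the extremal index (theta = 1 for iid observations)."""
    y = -theta * N * np.log(alpha)
    if xi == 0.0:                 # Gumbel limit of the formula
        return mu - psi * np.log(y)
    return mu + (psi / xi) * (y ** (-xi) - 1.0)

# Hypothetical parameters for the GEV of maxima over N = 250 observations:
var_99 = var_gev(alpha=0.99, mu=0.02, psi=0.01, xi=0.3, N=250)
```

One can check the consistency of the result with (2.11): plugging the computed VaR back into the GEV (2.4) must return α^N.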
The second limit theorem is named after Gnedenko, Pickands, Balkema and de Haan (GPBH) and its formulation is as follows [146, pp. 152–168] (see also [451, Chap. 1] for an intuitive exposition). In order to state the GPBH theorem, let us define the right endpoint xF of a distribution function F(x) as
xF = sup{x : F(x) < 1}. Let us call the function

G(x | ξ, s) = 1 + ln Hξ(x/s) = 1 − (1 + ξ · x/s)^(−1/ξ) (2.15)

the generalized Pareto distribution (GPD). By taking the limit ξ → 0, expression (2.15) leads to the exponential distribution. The support of the distribution function (2.15) is defined as follows:

0 ≤ x < ∞ , if ξ ≥ 0 ,
0 ≤ x ≤ −s/ξ , if ξ < 0 . (2.16)
There exist two main ways of estimating the form parameter ξ. First, if there is
a sample of maxima (taken from subsamples of sufficiently large size), then one
can fit to this sample the GEV distribution, thus estimating the parameters
by the maximum likelihood method, for instance. Alternatively, one can prefer
the distribution of exceedances over a large threshold given by the GPD (2.15),
whose tail index can be estimated with Pickands’ estimator or by maximum
^10 Recall that ξ is the inverse of the tail exponent.
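The second route — fitting the GPD (2.15) to the exceedances over a large threshold ("peaks over threshold") — can be sketched as follows, here on a synthetic Pareto sample with tail index b = 3, for which the fitted form parameter should come out close to ξ = 1/b.

```python
import numpy as np
from scipy.stats import genpareto, pareto

rng = np.random.default_rng(2)

# Synthetic regularly varying sample with tail index b = 3.
x = pareto.rvs(b=3, size=100_000, random_state=rng)

u = np.quantile(x, 0.95)            # high threshold (95% quantile)
exceedances = x[x > u] - u          # peaks over threshold

# scipy's genpareto shape parameter coincides with the form parameter xi;
# the location is pinned to 0 since exceedances start at the threshold.
xi, loc, scale = genpareto.fit(exceedances, floc=0.0)
# xi should be close to 1/3 here.
```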
generated samples respectively drawn from (i) an asymptotic power law distribution with tail index b = 3, (ii) a SE distribution, i.e., such that

ln Pr[X > x] ∝ −x^c , as x → ∞ , (2.21)

with fractional exponent c = 0.7, and (iii) a SE with fractional exponent
c = 0.3. Considering 1000 replications of each of these three samples (made of
10,000 data points each), they show that the estimates of ξ obtained from the distribution of maxima (2.6) are compatible (at the 95% confidence level) with the
theoretical value for the first two distributions (Pareto and SE with c = 0.7)
as soon as the size p of the subsamples, from which the maxima are drawn,
is larger than 10. For the SE with fractional exponent c = 0.3, an average
value ξ larger than 0.2 is obtained even for large subsample sizes (p = 200).
This value is reported to be significantly different from the theoretical value
ξ = 0.0. These results clearly show that the distribution of the maximum
drawn from a SE distribution with c = 0.7 converges quickly toward the the-
oretical asymptotic GEV distribution, while for c = 0.3 the convergence is
very slow. A fast convergence for c = 0.7 is not surprising since, for this value
of the fractional index c, the SE distribution remains close to the exponential
distribution, which is known to converge very quickly to the GEV distribu-
tion [220]. For c = 0.3, the SE distribution behaves, over a wide range, like
the power law (see page 59 hereafter for a theoretical formalization with an
exact embedding of the power law into the SE family). Thus, it is not sur-
prising to obtain an estimate of ξ which remains significantly positive for SE
distributions with small exponents c’s.
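This finite-sample effect is easy to reproduce. The sketch below is our simulation (scale fixed to 1, block sizes and replication counts chosen arbitrarily): SE samples with Pr[X > x] = exp(−x^c) are drawn by inverse transform and the GEV is fitted to their block maxima; although the true form parameter is ξ = 0 for any c > 0, the estimate stays strongly positive for c = 0.3.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)

def stretched_exponential(c, size):
    """Sample X with Pr[X > x] = exp(-x**c) by inverse transform."""
    return (-np.log(rng.uniform(size=size))) ** (1.0 / c)

def mean_gev_xi(c, block_size=100, n_blocks=1000, n_rep=20):
    """Average MLE of the GEV form parameter over replications of block
    maxima from SE samples (sign flipped from scipy's convention)."""
    estimates = []
    for _ in range(n_rep):
        x = stretched_exponential(c, block_size * n_blocks)
        maxima = x.reshape(n_blocks, block_size).max(axis=1)
        shape, loc, scale = genextreme.fit(maxima)
        estimates.append(-shape)
    return float(np.mean(estimates))

xi_03 = mean_gev_xi(0.3)   # strongly biased away from the true value 0
xi_07 = mean_gev_xi(0.7)   # much closer to 0
```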
Overall, the results reported in [329] are slightly better for the maximum
likelihood estimates obtained from the GPD. Indeed, the bias observed for
the SE with c = 0.3 seems smaller for large quantiles than the smallest biases
reached by the GEV method. Thus, it appears that the distribution of exceedances converges faster to its asymptotic distribution than the distribution of maxima. However, while in line with the theoretical values, the standard deviations are found to be almost always larger than in the previous case, which testifies to the higher variability of this estimator. Thus, for samples of size 10,000 or so – a typical size for most financial samples – the GEV and
GPD maximum likelihood estimates should be handled with care and their
results interpreted with caution due to possibly important bias and statistical
fluctuations. If a small value of ξ seems to allow one to reliably conclude in
favor of a rapidly varying distribution, a positive estimate does not appear
informative, and in particular does not allow one to reject the rapidly varying
behavior of a distribution. Pickands’ estimator does not perform better, in so
far as it is also unable to distinguish between a regularly varying distribution
and a SE with a low fractional exponent [329].
As another example illustrating the very slow convergence to the limit
distributions of the extreme value theory mentioned above, even with very
large samples, let us consider a simulated sample of iid random variables (we thus fulfill the most basic assumption of extreme value theory, i.e., iid-ness)
Fig. 2.4. Maximum likelihood estimates of the GPD form parameter ξ in (2.15)
as a function of the index k of the thresholds Uk defined in the text for stretched-
exponential samples of size 50,000 and their 95% confidence interval. Reproduced
from [329]
2.3 Constraints from Extreme Value Theory 51
Stronger deviations from the correct value ξ = 0 are found for the smaller
thresholds U1 , ..., U6 while the discrepancy abates for larger thresholds Uk ’s
for k > 7. These results occur notwithstanding the huge size of the implied data set; indeed, the probability Pr(X > U7) for c = 0.7 is about 10^−9, so that in order to obtain a data set of conditional samples from an unconditional data set of the size studied here (50,000 realizations above U7), the size of such an unconditional sample should be approximately 10^9 times larger than the number of "peaks over threshold." It is practically impossible to have such
a sample. For c = 0.3, the convergence to the theoretical value zero is much
slower and the discrepancy with the correct value ξ = 0 remains even for
the largest financial data sets: for a single asset, the largest data sets, drawn
from high frequency data, are no larger than or of the order of one million
points;^11 the situation does not improve for data sets one or two orders of magnitude larger, as considered in [211], obtained by aggregating thousands of stocks.^12 Thus, although the GPD form parameter should be theoretically
zero in the limit of a large sample for the Weibull distribution, this limit cannot
be reached for any available sample sizes. This is another clear illustration that
a rapidly varying distribution, like the SE distribution, can be mistaken for a
regularly varying distribution for any practical applications.
As we already mentioned, Kearns and Pagan [267] have reported how misleading Hill's and Pickands' estimators can be in the presence of dependence in the data. Focusing on IGARCH processes, they show that the estimated standard
deviations of these estimators increase significantly with respect to the theo-
retical standard deviations derived under the iid assumption. They also find
an important bias. Generalizing these results, the study by Malevergne et al.
[329] shows that the presence of simple Markovian time dependences is suffi-
cient to draw erroneous conclusions from GEV or GPD maximum likelihood
estimates and Pickands estimates as well. Considering Markovian processes
with different stationary distributions including a regularly varying distribu-
tion with the tail index b = 3 and two SEs with fractional exponents c = 0.3
and c = 0.7, they report the presence of a significant downward bias (with
respect to the iid case) in almost every situation for the GPD estimates: the
stronger the dependence (measured by the correlation time, varying from 20 to 100), the larger the bias. At the same time, the empirical values of the standard deviations remain comparable with those obtained for iid
^11 One year of data sampled at the 1-minute time scale gives approximately 1.2 × 10^5 data points.
^12 In this case, another issue arises concerning the fact that the aggregation of returns from different assets may distort the information and the very structure of the tails of the probability density functions (pdf), if they exhibit some intrinsic variability [351].
data. The downward bias can be ascribed to the dependence between data.
Indeed, positive dependence yields important clustering of extremes and ac-
cumulation of realizations around some values, which – for small samples –
could (misleadingly) appear as the consequence of the compactness of the
support of the underlying distribution. In other words, for finite samples, the
dependence prevents the full exploration of the tails and creates clusters that
mimic a thinner tail (even if the clusters are all occurring at large values since
the range of exploration of the tail controls the value of ξ).
The situation is different for the GEV estimates which exhibit biases which
can be either upward or downward (with respect to the iid case). For the GEV
estimates, two effects are competing. On the one hand, the dependence cre-
ates a downward bias, as explained above, while, on the other hand, the lack
of convergence of the distribution of maxima toward its GEV asymptotic dis-
tribution results in an upward bias, as observed on iid data (see the previous
section). This last phenomenon is strengthened by the existence of time dependence, which decreases the "effective" sample size (the actual size divided by the correlation time of the time series) and thus slows down the convergence rate toward the asymptotic distribution even more. Interestingly,
both the GEV and GPD estimators for the Pareto distribution may be utterly wrong in the presence of long-range dependence, whatever the cluster size.
The same kind of results are reported for Pickands’ estimator. However,
the estimated standard deviations reported in [329] remain of the same order as the theoretical ones, contrary to the results reported in [267] for IGARCH processes. Nonetheless, in both studies, a very significant bias, either positive or negative, is found, which can lead one to mistake a SE distribution for a regularly varying distribution. Thus, in the presence of dependence, Pickands' estimator becomes unreliable.
To summarize, the determination of the maximum domain of attraction
with usual estimators does not appear to be a very efficient way to study the
extreme properties of financial time series. Many studies on the tail behav-
ior of the distributions of asset returns have focused on these methods (see
the influential study [312] for instance) and may thus have led to spurious
conclusions. In particular, the fact that rapidly varying distribution functions
may be mistaken for regularly varying distribution functions casts doubts on
the strength of the seeming consensus according to which the distributions of
returns are regularly varying. It also casts doubts on the reliability of EVT
for risk assessment. If an accurate estimation of the shape parameter ξ is so difficult to achieve, how can one hope to obtain trustworthy estimates of the Value-at-Risk or of the expected shortfall by use of EVT?
As an illustration, let us apply the GEV and GPD estimators to the daily returns of the Dow Jones Industrial Average Index over the last century and
Fig. 2.5. Daily returns of the Dow Jones Industrial Average Index from 1900 to
2000 (left panel ) and 5-minute returns of the Nasdaq Composite index over 1 year
from April 1997 to May 1998 (right panel )
to the 5-minute returns of the Nasdaq Composite index over 1 year from April
1997 to May 1998. These two time series are depicted on Fig. 2.5.
For the intraday Nasdaq data, there are two caveats that must be addressed before any estimation can be made. First, in order to remove the effect of overnight price jumps, the intraday returns have to be determined separately for each of the 289 days contained in the Nasdaq data. Then, the union of all these 289 return data sets provides a better global return data set. Second, the volatility of intraday data is known to exhibit a U-shape, also called
“lunch effect”, that is, an abnormally high volatility at the beginning and the
end of the trading day compared with a low volatility around lunch time. Such an effect is present in this data set, and it is desirable to correct for it. Such a correction has been performed by renormalizing the 5-minute
returns at a given instant of the trading day by the corresponding average
absolute return at the same instant (when the average is performed over the
289 days). We shall refer to this time series as the corrected Nasdaq returns in
contrast with the raw (incorrect) Nasdaq returns and we shall examine both
data sets for comparison.
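A sketch of this renormalization (our illustration; it assumes the 5-minute returns are arranged as a days × intraday-bins array, e.g. 289 × 78):

```python
import numpy as np

def correct_intraday_seasonality(returns):
    """Divide each intraday return by the average absolute return observed
    at the same instant of the trading day (average taken over all days).
    `returns` is a 2-D array of shape (n_days, n_intraday_bins)."""
    avg_abs = np.abs(returns).mean(axis=0)   # one scale per intraday bin
    return returns / avg_abs                 # broadcasts across the days

# Usage on hypothetical data: corrected = correct_intraday_seasonality(raw),
# after which all days can be pooled into a single return sample.
```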
The daily returns of the Dow Jones also exhibit some non-stationarity.
Indeed, one can observe a clear excess volatility roughly covering the time of
the bubble ending in the October 1929 crash, followed by the Great Depression. To investigate the influence of this non-stationarity, the statistical study
presented below has been performed twice: first with the entire sample, and
then after having removed the period from 1927 to 1936 from the sample. The
results are somewhat different, but on the whole, the conclusions about the
nature of the tail are the same.
Although the distributions of positive and negative returns are known to be
very similar (see for instance [256]), we have chosen to treat them separately.
For the Dow Jones, this gives us 14949 positive and 13464 negative data points
while, for the Nasdaq index, we have 11241 positive and 10751 negative data
points.
54 2 Marginal Distributions of Returns
Table 2.2. Mean values and standard deviations of the maximum likelihood esti-
mates of the parameter ξ for the distribution of the maximum (cf. (2.6)) when data
are grouped in samples of size 20, 40, 200, and 400, and for the generalized Pareto distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99%, and 99.5%
(a)
GPD                                      GPD
quantile   0.9    0.95   0.99   0.995    quantile   0.9    0.95   0.99   0.995
ξ          0.248  0.247  0.174  0.349    ξ          0.214  0.204  0.250  0.345
Emp Std    0.036  0.053  0.112  0.194    Emp Std    0.041  0.062  0.156  0.223
Theor Std  0.032  0.046  0.096  0.156    Theor Std  0.033  0.046  0.108  0.164

(b)
GPD                                      GPD
quantile   0.9    0.95   0.99   0.995    quantile   0.9    0.95   0.99   0.995
ξ          0.200  0.289  0.389  0.470    ξ          0.143  0.202  0.229  0.242
Emp Std    0.040  0.058  0.120  0.305    Emp Std    0.040  0.057  0.143  0.205
Theor Std  0.036  0.054  0.131  0.196    Theor Std  0.035  0.052  0.118  0.169

(c)
GPD                                      GPD
quantile   0.9    0.95   0.99   0.995    quantile   0.9    0.95   0.99   0.995
ξ          0.209  0.229  0.307  0.344    ξ          0.165  0.160  0.210  0.054
Emp Std    0.039  0.052  0.111  0.192    Emp Std    0.039  0.052  0.150  0.209
Theor Std  0.036  0.052  0.123  0.180    Theor Std  0.036  0.050  0.116  0.143
Panel (a) gives the results for the Dow Jones index, panel (b) for the raw Nas-
daq index, and in panel (c) for the Nasdaq index corrected for the “lunch effect.”
Reproduced from [329]
Table 2.3. Pickand’s estimates (2.19) of the parameter ξ for the generalized Pareto
distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99% and
99.5% and two different values of the ratio N/k respectively equal to 4 and 10
(a)
N/k = 4                                       N/k = 10
quantile   0.9     0.95    0.99     0.995     quantile   0.9     0.95    0.99    0.995
ξ          0.3119  0.0890  –0.3452  0.9413    ξ          0.3462  0.3215  0.9111  –0.3873
emp. Std   0.1523  0.2219  0.8294   1.1352    emp. Std   0.1766  0.1929  0.6983  1.6038
th. Std    0.1883  0.2577  0.5537   0.9549    th. Std    0.1894  0.2668  0.6706  0.7816

(b)
N/k = 4                                       N/k = 10
quantile   0.9     0.95    0.99     0.995     quantile   0.9     0.95    0.99    0.995
ξ          0.2623  0.1583  –0.8781  0.8855    ξ          0.2885  0.1435  1.3734  –0.8395
emp. Std   0.1940  0.3085  0.9126   1.5711    emp. Std   0.2166  0.3220  0.7359  1.5087
th. Std    0.1868  0.2602  0.5543   0.9430    th. Std    0.1876  0.2596  0.7479  0.7824

(c)
N/k = 4                                       N/k = 10
quantile   0.9     0.95    0.99     0.995     quantile   0.9     0.95    0.99    0.995
ξ          –0.0878 0.4619  0.0329   0.3742    ξ          0.0877  0.3907  1.4680  0.1098
emp. Std   0.1882  0.2728  0.7561   1.1948    emp. Std   0.1935  0.2495  0.8045  1.2345
th. Std    0.1786  0.2734  0.5722   0.8512    th. Std    0.1822  0.2699  0.7655  0.8172
Panel (a) gives the results for the Dow Jones, panel (b) for the raw Nasdaq data, and
panel (c) for the Nasdaq corrected for the “lunch effect.” Reproduced from [329]
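Pickands’ estimator underlying Table 2.3 can be sketched in a few lines. This is a minimal illustration, assuming (2.19) is the standard Pickands construction based on the k-th, 2k-th, and 4k-th largest observations (all function and variable names are ours, not from [329]):

```python
import math

def pickands_estimate(sorted_desc, k):
    """Pickands' estimator of the GPD tail index xi, computed from the
    k-th, 2k-th and 4k-th largest observations (list in descending order)."""
    x_k = sorted_desc[k - 1]
    x_2k = sorted_desc[2 * k - 1]
    x_4k = sorted_desc[4 * k - 1]
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)

# Sanity check on exact GPD quantiles with xi = 0.3: on such an idealized
# "sample" the estimator is exact, because the two spacings combine into
# the ratio 2**xi.
xi, n = 0.3, 1000
ideal = [((i / n) ** (-xi) - 1.0) / xi for i in range(1, n + 1)]  # descending
estimate = pickands_estimate(ideal, k=50)
```

On real return samples the estimate fluctuates strongly with k, which is what the large empirical standard deviations reported in Table 2.3 quantify.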
2.4 Fitting Distributions of Returns with Parametric Densities 57
      f_u(x|b, c, d) = A(b, c, d, u) x^{−(b+1)} exp[−(x/d)^c]   if x ≥ u > 0 ,
      f_u(x|b, c, d) = 0                                        if x < u .        (2.23)
The parameter b ranges from minus infinity to infinity while c and d range
from zero to infinity. In the particular case where c = 0, the parameter b also
needs to be positive to ensure the normalization of the probability density
function. The family (2.23) includes several well-known pdfs often used in
different applications. We enumerate them.
1. The Pareto distribution, obtained for c = 0 (with b > 0):
      F_u(x) = 1 − (u/x)^b .                                      (2.26)
2. The Weibull (Stretched-Exponential) distribution, obtained for b = −c:
      F_u(x) = 1 − exp[(u/d)^c − (x/d)^c] .                       (2.27)
3. The Exponential distribution, obtained for c = 1 and b = −1:
      F_u(x) = 1 − exp[−(x − u)/d] .                              (2.28)
4. The incomplete Gamma distribution, obtained for c = 1 with arbitrary b:
      F_u(x) = 1 − Γ(−b, x/d) / Γ(−b, u/d) .                      (2.29)
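In code, these four tail CDFs can be sketched as follows. A crude numerical quadrature stands in for the upper incomplete Gamma function, whose first argument is negative here, and all names are our own:

```python
import math

def upper_gamma(a, z, width=40.0, steps=2000):
    """Gamma(a, z) = integral from z to infinity of t**(a-1) * exp(-t) dt,
    evaluated with a crude composite Simpson rule; this is fine even for
    a < 0, since z > 0 keeps the integrand bounded."""
    h = width / steps
    f = lambda t: t ** (a - 1.0) * math.exp(-t)
    total = 0.0
    for i in range(steps):
        t0 = z + i * h
        total += (f(t0) + 4.0 * f(t0 + 0.5 * h) + f(t0 + h)) * h / 6.0
    return total

# The four CDFs above the threshold u, cf. (2.26)-(2.29)
def pareto_cdf(x, u, b):
    return 1.0 - (u / x) ** b

def se_cdf(x, u, c, d):
    return 1.0 - math.exp((u / d) ** c - (x / d) ** c)

def exponential_cdf(x, u, d):
    return 1.0 - math.exp(-(x - u) / d)

def inc_gamma_cdf(x, u, b, d):
    return 1.0 - upper_gamma(-b, x / d) / upper_gamma(-b, u / d)
```

Note that the SE model with c = 1 coincides with the exponential model, one instance of the nesting relations exploited later in this section.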
Links between these different models reveal themselves under specific
asymptotic conditions. Particularly interesting is the behavior of the SE model
when c → 0 and u > 0. In this limit, and provided that

      c · (u/d)^c → β ,   as c → 0 ,                              (2.30)

where β is a positive constant, the SE model tends to the Pareto model.
Indeed, we can write

      f_u(x|c, d) = (c/d^c) x^{c−1} exp[−(x^c − u^c)/d^c]
                  = c (u/d)^c · x^{−1} (x/u)^c · exp[−(u/d)^c ((x/u)^c − 1)]
                  ≈ β · x^{−1} exp[−c (u/d)^c ln(x/u)] ,   as c → 0 ,
                  → β · x^{−1} exp[−β ln(x/u)]
                  = β u^β / x^{β+1} ,                             (2.31)
which is the pdf of the Pareto model with tail index β. The condition (2.30)
comes naturally from the properties of the maximum likelihood estimator of
the scale parameter d given by (2.B.53) in Appendix 2.B. It implies that, as
c → 0, the characteristic scale d of the SE model must also go to zero with c
to ensure the convergence of the SE model toward the Pareto model.
The Pareto model with exponent β can therefore be approximated with
any desired accuracy on any finite interval [u, U ], U > u > 0, by the SE
model with parameters (c, d) satisfying c (u/d)^c = β (cf. (2.30), where the
arrow is replaced by an equality). Although the value c = 0 does not give,
strictly speaking, a SE distribution, the limit c → 0 provides any desired
approximation to the Pareto distribution, uniformly on any finite interval [u, U ].
This deep relationship between the SE and PD models allows us to understand
why it can be very difficult to decide, on a statistical basis, which of
these models fits the data best.
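The convergence (2.30)-(2.31) is easy to verify numerically. Writing the SE survival function above u with the scale d eliminated through c (u/d)^c = β, it approaches the Pareto survival (u/x)^β as c decreases (a small sketch with our own naming):

```python
import math

def se_survival(x, u, c, beta):
    """SE survival probability above u, with the scale d chosen so that
    c * (u/d)**c = beta; then (x/d)**c - (u/d)**c = (beta/c) * ((x/u)**c - 1)."""
    return math.exp(-(beta / c) * ((x / u) ** c - 1.0))

def pareto_survival(x, u, beta):
    return (u / x) ** beta

u, beta, x = 1.0, 1.5, 3.0
exact = pareto_survival(x, u, beta)
# the approximation error shrinks roughly linearly in c
errors = [abs(se_survival(x, u, c, beta) - exact) for c in (0.1, 0.01, 0.001)]
```

Since the error decays with c at every fixed point of a finite interval [u, U], the two models indeed become statistically almost indistinguishable there.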
Another interesting behavior is obtained in the limit b → +∞, where the
Pareto model tends to the exponential model [72]. Indeed, provided that the
scale parameter u of the power law is simultaneously scaled as u^b = (b/α)^b,
i.e., u = b/α, we can write the tail of the cumulative distribution function of
the PD as u^b/(u + x)^b, which is indeed of the form u^b/x^b for large x. Then,

      u^b / (u + x)^b = (1 + αx/b)^{−b} → exp(−αx)   for b → +∞ .      (2.32)
This shows that the exponential model can be approximated with any desired
accuracy on intervals [u, u + A] by the PD model with parameters (b, u)
satisfying u^b = (b/α)^b, for any positive constant A. Although the value b →
+∞ does not give, strictly speaking, an exponential distribution, the limit u ∝
b → +∞ provides any desired approximation to the exponential distribution,
uniformly on any finite interval [u, u + A]. This limit is thus less general than
the correspondence between the SE and Pareto models discussed above.
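This limit, too, is easily checked numerically: with u = b/α, the Pareto tail u^b/(u + x)^b equals (1 + αx/b)^{−b}, which approaches exp(−αx) as b grows (a sketch, names ours):

```python
import math

def pd_tail(x, b, alpha):
    """Pareto survival u**b / (u + x)**b with the scale tied to b by u = b/alpha."""
    return (1.0 + alpha * x / b) ** (-b)

alpha, x = 2.0, 1.5
limit = math.exp(-alpha * x)
# the error shrinks roughly like 1/b
errors = [abs(pd_tail(x, b, alpha) - limit) for b in (10, 100, 1000)]
```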
Consider now the family of log-Weibull distributions, defined by the tail

      1 − F_u(x) = exp[−b (ln(x/u))^c ] ,   x ≥ u ,               (2.33)

whose density is

      f_u(x|b, c) = (b c / x) (ln(x/u))^{c−1} exp[−b (ln(x/u))^c ]   if x ≥ u > 0 ,
      f_u(x|b, c) = 0                                                if x < u .      (2.34)
This family of pdfs interpolates smoothly between the SE and the Pareto
classes. It recovers the Pareto family for c = 1, in which case the parameter b
is the tail exponent. For c larger than one, the tail of the log-Weibull is thinner
than any Pareto distribution but heavier than any Stretched-Exponential.14 In
particular, when c equals two, the log-normal distribution is retrieved (above
the threshold u). For c smaller than one, the tails of the log-Weibull distributions
are even heavier than those of any regularly varying distribution. It is interesting
to note that in this case the log-Weibull distributions do not belong to the
domain of attraction of any law of the maximum, so that standard extreme
value theory does not apply to them. If it turned out that log-Weibull
distributions with an index c < 1 provide a reasonable description of the tails
of distributions of returns, this would mean that risk management
methods based upon EVT are particularly unreliable (see below).
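A short sketch makes the interpolation explicit. We take (2.33) to be the survival form exp[−b (ln(x/u))^c]; the function names and parameter values are our own:

```python
import math

def logweibull_sf(x, u, b, c):
    """Log-Weibull survival probability above u: exp(-b * ln(x/u)**c)."""
    return math.exp(-b * math.log(x / u) ** c)

def logweibull_quantile(p, u, b, c):
    """Point x whose survival probability equals p (inverse of logweibull_sf)."""
    return u * math.exp((-math.log(p) / b) ** (1.0 / c))

# c = 1 degenerates to the Pareto tail (u/x)**b ...
pareto_check = abs(logweibull_sf(2.5, 1.0, 3.0, 1.0) - (1.0 / 2.5) ** 3)
# ... while c = 2 gives a log-normal-type tail; the quantile function
# inverts the survival function exactly
roundtrip = logweibull_sf(logweibull_quantile(0.05, 1.0, 3.0, 2.0), 1.0, 3.0, 2.0)
```

The closed-form quantile function also provides a direct way of simulating log-Weibull samples by inversion of a uniform variate.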
It is instructive to fit the two data sets used in Sect. 2.3.4 – i.e., the Dow Jones
daily returns and the Nasdaq 5-minute returns – in addition to a sample of
returns of the Standard & Poor’s 500 (see footnote 15) over the two decades
1980–1999, by the distributions enumerated above ((2.23), (2.26–2.29) and
(2.34)). We will show that no single parametric representation among any of
the cited pdfs fits the whole range of the data sets. Positive and negative
returns will be analyzed separately, the latter being converted to the positive
semi-axis. The analysis
14 A generalization of the log-Weibull distributions to the following three-parameter
family also contains the SE family in some formal limit. Consider indeed 1 −
F(x) = exp[−b (ln(1 + x/D))^c ] for x > 0, which has the same tail as expression
(2.33). Taking D → +∞ together with b = (D/d)^c, with d finite, yields
1 − F(x) = exp[−(x/d)^c ].
15 The returns on the Standard & Poor’s 500 are calculated at five different time
scales: 1 minute, 5 minutes, 30 minutes, 1 hour, and 1 day.
The figures within parentheses characterize the goodness of fit: they represent the
significance levels with which the considered model can be rejected. Note that these
significance levels are only lower bounds since one or two parameters are fitted.
Reproduced from [330]
The SE and log-Weibull distributions are the best, since they are only rejected
at the 1-minute time scale for the Standard & Poor’s 500. The Pareto
distribution provides a reliable description for time scales larger than or equal to
30 minutes. However, it remains less accurate than the log-Weibull and the SE distributions, on
average. Overall, it can be noted that the Nasdaq and the 60-minute returns
of the Standard & Poor’s 500 behave very similarly. Let us now analyze each
distribution in more detail.
Pareto Distribution
Figures 2.3 and 2.6 show the complementary sample distribution functions
1 − FN (x) for the Standard & Poor’s 500 index at the 30-minute time scale
and for the daily Dow Jones Industrial Average index, respectively. In Fig. 2.6,
the mismatch between the Pareto distribution and the data can be seen with
the naked eye: even in the tails, one observes a continuous downward curvature
in the double logarithmic diagram, instead of a straight line as would be the
case if the distribution ultimately behaved like a Pareto law. To formalize this
impression, we calculate the ML and AD estimators for each threshold u. For
the Pareto law, the ML estimator is well known to agree with Hill’s estimator.
Indeed, denoting x_1 ≥ x_2 ≥ · · · ≥ x_{N_u} the ordered subsample of values exceeding u,
where N_u is the size of this subsample, the Hill maximum likelihood estimate
of the parameter b is [233]
[Fig. 2.6 appears here: double-logarithmic plot of the complementary sample
distribution function versus the absolute log-return x.]
Fig. 2.6. Complementary sample distribution function for the daily returns of the
Dow Jones index over the time period from 1900–2000. The plain (resp. dotted) line
shows the complementary distribution for the positive (resp. the absolute value of
negative) returns. Reproduced from [330]
[Fig. 2.7 appears here: Hill estimate b̂_u plotted against the lower threshold u,
in two panels.]
Fig. 2.7. Hill estimate b̂_u as a function of the threshold u for the Dow Jones (left
panel) and for the Standard & Poor’s 1-minute returns (right panel)
      b̂_u = [ (1/N_u) Σ_{k=1}^{N_u} log(x_k /u) ]^{−1} .         (2.37)
The standard deviation of b̂_u can be estimated as b̂_u /√N_u under the assumption
of iid data, but this very severely underestimates the true standard deviation
when samples exhibit dependence, as reported by Kearns and Pagan [267]
(see the previous section of this chapter).
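A minimal implementation of (2.37), with our own naming:

```python
import math

def hill_estimate(data, u):
    """Hill / maximum likelihood estimate (2.37) of the tail exponent b,
    computed from the observations strictly exceeding the threshold u."""
    tail = [x for x in data if x > u]
    return len(tail) / sum(math.log(x / u) for x in tail)

# Check on an idealized Pareto "sample" built from exact quantiles (b = 3):
# the estimate is close to 3, up to a small discretization bias.
n = 10000
sample = [(i / n) ** (-1.0 / 3.0) for i in range(1, n + 1)]
b_hat = hill_estimate(sample, u=1.0)
```

The quoted standard deviation b̂_u /√N_u applies to iid samples only; dependent returns require the corrections discussed above.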
Figure 2.7 shows the Hill estimates b̂u as a function of u for the Dow Jones
and for the Standard & Poor’s 500 1-minute returns. Instead of an approx-
imately constant exponent (as would be the case for true Pareto samples),
the tail index estimator for the Dow Jones increases until u ≈ 0.04, beyond
which it seems to slow its growth and oscillates around a value ≈ 3–4 up to
the threshold u ≈ 0.08. It should be noted that the interval [0, 0.04] contains
99.12% of the sample whereas the interval [0.04, 0.08] contains only 0.64% of
the sample. The behavior of b̂u is very similar for the Nasdaq (not shown).
The behavior of b̂u for the Standard & Poor’s 500 shown on the right panel of
Fig. 2.7 is somewhat different: Hill’s estimate b̂_u slows its growth at u ≈ 0.006,
corresponding to the 95% quantile, then decays until u ≈ 0.05 (99.99% quantile)
and then strongly increases again. Are these slowdowns of the growth
of b̂u genuine signatures of a possible constant well-defined asymptotic value
that would qualify a regularly varying function?
To answer this question, let us have a look at Fig. 2.8 which shows the Hill
estimator b̂u for all data sets (positive and negative branches of the distribu-
tion of returns for the Dow Jones, the Nasdaq and the Standard & Poor’s 500
(SP)) as a function of the index n = 1, 2, . . . , 18 of the quantiles or standard
significance levels q1 , . . . , q18 . Similar results are obtained with the AD esti-
mates. The three branches of the distribution of returns for the Dow Jones
[Fig. 2.8 appears here: Hill estimator b̂_u plotted against the index n of the
quantile q_n.]
Fig. 2.8. Hill estimator b̂_u for all data sets (positive and negative branches of the
distribution of returns for the Dow Jones (DJ), Nasdaq (ND) and Standard &
Poor’s 500 (SP)) as a function of the index n = 1, . . . , 18 of the 18 quantiles or
standard significance levels q_1, . . . , q_18 given in Table 6.3. The two thick lines (in red)
show the 95% confidence bounds obtained from synthetic time series of 10000 data
points generated with a Student distribution with exponent b = 3.5. Reproduced
from [330]
and the negative tail of the Nasdaq suggest a continuous growth of the Hill
estimator b̂u as a function of n = 1, . . . , 18. However, it turns out that this
apparent growth may be explained solely on the basis of statistical fluctua-
tions and slow convergence to a moderate b-value. Indeed, the two thick lines
show the 95% confidence bounds obtained from synthetic time series of 10000
data points generated with a Student distribution with exponent b = 3.5. It is
clear that the growth of the upper bound can explain the observed behavior
of the b-value obtained for the Dow Jones and Nasdaq data. It would thus be
incorrect to extrapolate this apparent growth of the b-value. However, con-
versely, we cannot conclude with certainty that the growth of the b-value has
been exhausted and that we have access to the asymptotic value. Indeed, large
values of tail indices are for instance predicted by traditional GARCH models,
giving b ≈ 10–20 [153, 463].
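The construction behind the confidence band of Fig. 2.8 can be sketched as follows, using synthetic Student-t series with b = 3.5 as in the text; the replication count, seed, and quantile levels are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def hill(sample, u):
    """Hill estimate (2.37) on the exceedances of u."""
    tail = sample[sample > u]
    return tail.size / np.log(tail / u).sum()

levels = [0.90, 0.95, 0.99]      # thresholds taken as empirical quantiles
n_rep, n_pts = 100, 10000
est = np.empty((n_rep, len(levels)))
for i in range(n_rep):
    x = np.abs(rng.standard_t(df=3.5, size=n_pts))
    for j, q in enumerate(levels):
        est[i, j] = hill(x, np.quantile(x, q))

# 95% band across replications, analogous to the thick lines of Fig. 2.8
lower = np.percentile(est, 2.5, axis=0)
upper = np.percentile(est, 97.5, axis=0)
```

The band widens sharply toward the highest thresholds, which is why an apparently growing Hill plot remains compatible with a constant asymptotic b.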
Weibull Distributions
We now present the results of the fits of the same data with the SE distribution
(2.27). The corresponding Anderson-Darling statistics (ADS) are shown in
Table 2.4. The ML-estimates and AD-estimates of the form parameter c are
represented in Table 2.5. Table 2.4 shows that, for the highest quantiles, the
ADS for the SE is the smallest of all ADS, suggesting that the SE is the
best model of all. Moreover, for the lowest quantiles, it is the sole model not
systematically rejected at the 95% level.
The c-estimates are found to decrease when increasing the order q of the
threshold u(q) beyond which the estimations are performed. In addition, sev-
eral c-estimates are found very close to zero. However, this does not auto-
matically imply that the SE model is not the correct model for the data even
for these highest quantiles. Indeed, numerical simulations show that, even
for synthetic samples drawn from genuine SE distributions with exponent c
smaller than 0.5 and whose size is comparable with that of our data, in about
one case out of three (depending on the exact value of c) the estimated value
of c is zero. This a priori surprising result comes from condition (2.B.57) in
Appendix 2.B, which is not fulfilled with certainty even for samples drawn from
SE distributions.
Notwithstanding this cautionary remark, note that the c-estimate of the
positive tail of the Nasdaq data equals zero for all quantiles higher than
q14 = 97%. In fact, in every case, the estimated c is not significantly different
from zero – at the 95% significance level – for quantiles higher than q12 –q14 ,
except for quantile q21 of the negative tail of the Standard & Poor’s 500,
but this value is probably doubtful. In addition, the values of the estimated
scale parameter d, not reported here, are found very small, particularly for
the Nasdaq – beyond q12 = 95% – and the S&P 500 – beyond q10 = 90%. In
contrast, the Dow Jones keeps significant scale factors until q16 –q17 .
Taken together, these pieces of evidence provide a clear indication of the
existence of a change of behavior of the true pdf of these distributions: while the
bulks of the distributions seem rather well approximated by a SE model, a
distribution with a tail fatter than that of the SE model is required for the
highest quantiles. Actually, the fact that both c and d are extremely small may
be interpreted according to the asymptotic correspondence given by (2.30) and
(2.31) as the existence of a possible power law tail.
At this stage, we can make the following conservative statement: the true
distribution of returns is probably bracketed by a power law, as a lower bound,
and a SE as an upper bound. It is therefore particularly interesting to focus
on distributions such as log-Weibull distributions which interpolate between
these two classes in order to obtain – hopefully – a better description of the
data.
Table 2.5. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates of the form parameter c of the Weibull (Stretched-
Exponential) distribution
       S&P 500 (pos. tail)      S&P 500 (neg. tail)      Nasdaq (pos. tail)    Nasdaq (neg. tail)    Dow Jones (pos. tail)  Dow Jones (neg. tail)
       MLE (Std)      ADE       MLE (Std)      ADE       MLE (Std)     ADE     MLE (Std)     ADE     MLE (Std)     ADE      MLE (Std)     ADE
q10    0.152 (0.010)  0.394     0.159 (0.010)  0.387     0.359 (0.092) 0.631   0.522 (0.099) 0.688   0.304 (0.074) 0.346    0.403 (0.076) 0.387
q11    0.079 (0.012)  0.327     0.091 (0.012)  0.339     0.252 (0.110) 0.515   0.481 (0.120) 0.697   0.231 (0.087) 0.158    0.379 (0.091) 0.337
q12    < 10^−8        0.151     < 10^−8        0.169     0.039 (0.138) 0.177   0.273 (0.155) 0.275   0.269 (0.111) 0.207    0.357 (0.119) 0.288
q13    < 10^−8        0.0793    < 10^−8        0.084     0.057 (0.155) 0.233   0.255 (0.177) 0.274   0.253 (0.127) 0.147    0.428 (0.136) 0.465
q14    < 10^−8        0.008     < 10^−8        0.020     < 10^−8       0       0.215 (0.209) 0.194   0.290 (0.150) 0.174    0.448 (0.164) 0.641
q15    < 10^−8        0.008     < 10^−8        0.008     < 10^−8       0       0.103 (0.260) 0       0.379 (0.192) 0.407    0.451 (0.210) 0.863
q16    < 10^−8        0.008     < 10^−8        0.008     9.6 × 10^−8   0       0.064 (0.390) 0       0.398 (0.290) 0.382    0.022 (0.319) 0.110
q17    < 10^−8        0.008     < 10^−8        0.008     < 10^−8       0       0.158 (0.452) 0.224   0.307 (0.346) 0.255    0.178 (0.367) 0.703
q18    < 10^−8        0.008     < 10^−8        0.008     < 10^−8       0       < 10^−8       0       2 × 10^−8     0        < 10^−8       0
q19    0.035 (0.082)  0.007     0.009 (0.032)  0.007     –             –       –             –       –             –        –             –
q20    0.111 (0.119)  0.075     0.316 (0.117)  0.007     –             –       –             –       –             –        –             –
q21    < 10^−8        0.008     0.827 (0.393)  0.900     –             –       –             –       –             –        –             –
Log-Weibull Distributions
The parameters b and c of the log-Weibull distribution defined by (2.33) are
estimated with both the maximum likelihood and Anderson-Darling methods
for the 18 standard significance levels q1 , . . . , q18 (given on page 62) for the
Dow Jones and Nasdaq data and up to q21 for the Standard & Poor’s 500
data. The results for the Dow Jones and the Standard & Poor’s 500 are
given in Table 2.6. For both positive and negative tails of the Dow Jones, the
results are very stable for all quantiles lower than q10 : c = 1.09 ± 0.02 and
b = 2.71 ± 0.07. These results reject the Pareto distribution degeneracy c = 1
at the 95% confidence level. Only for the quantiles higher than or equal to
q16 , an estimated value c compatible with the Pareto distribution is found.
Moreover both for the positive and negative Dow Jones tails, one finds that
c ≈ 0.92 and b ≈ 3.6−3.8, suggesting either a possible change of regime or
a sensitivity to “outliers” or a lack of robustness due to a too small sample
size. For the positive Nasdaq tail, the exponent c is found compatible with
c = 1 (the Pareto value), at the 95% significance level, above q11 while b
remains almost stable at b
3.2. For the negative Nasdaq tail, we find that c
decreases almost systematically from 1.1 for q10 to 1 for q18 for both estimators
while b regularly increases from about 3.1 to about 4.2. The Anderson-Darling
distances are significantly better than for the SE and this statistics cannot be
used to conclude neither in favor of nor against the log-Weibull class.
The situation is different for the Standard & Poor’s 500 (1-min). For
the positive tail, the parameter c remains significantly smaller than 1 from
q14 = 97% to q21 except for q19 and q20 . Therefore, it seems that for very small
time scales, the tails of the distribution of returns might be even fatter than a
power law. As stressed in Sect. 2.4.1, when c is less than one, the log-Weibull
distribution does not belong to the domain of attraction of a law of the max-
imum. As a consequence, EVT cannot provide reliable results when applied
to such data, neither from a theoretical point of view nor from a practical
stance (e.g. extreme risk assessment). The conclusions are the same for the
5-minute time scale. For the 30-minute and 60-minute time scales, c remains
systematically less than one for the highest quantiles but this difference ceases
to be significant. In the negative tail, the situation is overall the same.
Table 2.6. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates of
the parameters c and b of the log-Weibull distribution
Dow Jones (1 day) Positive tail Dow Jones (1 day) Negative tail
MLE ADE MLE ADE
c b c b c b c b
q1 5.262 (0.005) 0.000 (0.000) 5.55 0.000 5.085 (0.005) 0.000 (0.000) 5.320 0.000
q2 2.140 (0.009) 0.241 (0.002) 2.25 0.220 2.125 (0.009) 0.211 (0.002) 2.240 0.191
q3 1.790 (0.010) 0.531 (0.005) 1.87 0.510 1.751 (0.010) 0.495 (0.005) 1.800 0.481
q4 1.616 (0.012) 0.830 (0.008) 1.65 0.820 1.593 (0.012) 0.744 (0.008) 1.630 0.735
q5 1.447 (0.012) 1.165 (0.012) 1.47 1.160 1.459 (0.013) 1.022 (0.011) 1.480 1.015
q6 1.339 (0.012) 1.472 (0.017) 1.36 1.473 1.353 (0.013) 1.311 (0.016) 1.370 1.311
q7 1.259 (0.013) 1.768 (0.023) 1.28 1.773 1.269 (0.014) 1.609 (0.022) 1.270 1.610
q8 1.173 (0.013) 2.097 (0.031) 1.17 2.096 1.188 (0.015) 1.885 (0.030) 1.190 1.887
q9 1.125 (0.015) 2.362 (0.043) 1.12 2.358 1.158 (0.017) 2.178 (0.042) 1.150 2.174
q10 1.090 (0.020) 2.705 (0.070) 1.08 2.695 1.087 (0.022) 2.545 (0.069) 1.090 2.545
q11 1.035 (0.022) 2.771 (0.083) 1.03 2.762 1.074 (0.024) 2.688 (0.085) 1.070 2.681
q12 1.047 (0.027) 2.867 (0.105) 1.04 2.857 1.068 (0.029) 2.880 (0.111) 1.050 2.857
q13 1.046 (0.030) 2.960 (0.121) 1.03 2.933 1.067 (0.032) 2.900 (0.125) 1.080 2.924
q14 1.044 (0.034) 3.000 (0.142) 1.03 2.976 1.132 (0.038) 3.171 (0.158) 1.120 3.155
q15 1.090 (0.043) 3.174 (0.184) 1.09 3.165 1.163 (0.047) 3.439 (0.209) 1.180 3.472
q16 1.085 (0.059) 3.424 (0.280) 1.09 3.425 1.025 (0.056) 3.745 (0.322) 1.010 3.731
q17 1.093 (0.066) 3.666 (0.345) 1.09 3.650 1.108 (0.069) 3.822 (0.380) 1.120 3.891
q18 0.935 (0.071) 3.556 (0.411) 0.902 3.484 0.921 (0.071) 3.804 (0.461) 0.933 3.846
S&P 500 (1 min) Positive tail S&P 500 (1 min) Negative tail
MLE ADE MLE ADE
c b c b c b c b
q1 3.261 (0.003) 0.029 (0.000) 3.298 0.027 3.232 (0.003) 0.030 (0.000) 3.264 0.028
q2 1.875 (0.002) 0.433 (0.001) 1.878 0.410 1.884 (0.002) 0.420 (0.001) 1.881 0.399
q3 1.645 (0.002) 0.723 (0.001) 1.642 0.690 1.647 (0.002) 0.707 (0.001) 1.641 0.676
q4 1.471 (0.002) 1.017 (0.001) 1.477 0.970 1.465 (0.002) 1.000 (0.001) 1.470 0.954
q5 1.414 (0.002) 1.277 (0.002) 1.405 1.233 1.411 (0.002) 1.251 (0.002) 1.401 1.208
q6 1.382 (0.002) 1.512 (0.002) 1.387 1.477 1.383 (0.002) 1.477 (0.002) 1.389 1.443
q7 1.233 (0.002) 1.862 (0.003) 1.234 1.811 1.232 (0.002) 1.823 (0.003) 1.239 1.776
q8 1.187 (0.002) 2.155 (0.005) 1.192 2.116 1.192 (0.002) 2.117 (0.005) 1.196 2.079
q9 1.112 (0.002) 2.508 (0.007) 1.111 2.470 1.113 (0.002) 2.455 (0.007) 1.112 2.415
q10 1.069 (0.003) 2.876 (0.011) 1.078 2.896 1.062 (0.003) 2.818 (0.011) 1.074 2.831
q11 1.048 (0.003) 2.961 (0.014) 1.066 3.016 1.055 (0.003) 2.927 (0.014) 1.069 2.972
q12 1.016 (0.004) 3.048 (0.018) 1.033 3.123 1.015 (0.004) 3.006 (0.017) 1.034 3.076
q13 1.002 (0.004) 3.063 (0.020) 1.021 3.151 1.001 (0.004) 3.033 (0.020) 1.020 3.115
q14 0.981 (0.005) 3.054 (0.023) 1.003 3.153 0.990 (0.005) 3.033 (0.023) 1.012 3.134
q15 0.961 (0.006) 3.015 (0.027) 0.985 3.133 0.978 (0.006) 3.004 (0.027) 1.003 3.132
q16 0.941 (0.008) 2.867 (0.036) 0.961 2.980 0.937 (0.008) 2.871 (0.037) 0.957 2.987
q17 0.937 (0.010) 2.798 (0.040) 0.951 2.899 0.927 (0.010) 2.780 (0.041) 0.947 2.887
q18 0.902 (0.011) 2.649 (0.046) 0.902 2.677 0.925 (0.012) 2.644 (0.046) 0.940 2.726
q19 0.994 (0.028) 2.256 (0.084) 0.971 2.201 0.962 (0.027) 2.134 (0.080) 0.923 2.063
q20 0.999 (0.039) 2.245 (0.118) 0.967 2.139 1.011 (0.040) 2.037 (0.107) 0.933 1.879
q21 0.949 (0.083) 2.686 (0.330) 0.957 2.801 1.288 (0.115) 3.387 (0.455) 1.234 3.272
The numbers in parenthesis give the standard deviations of the estimates. Repro-
duced from [330]
One can go further and ask which of these models are sufficient to describe
the data compared with the comprehensive distribution (2.23) encompassing
all of them. Here, the four distributions (2.26–2.29) are compared with the
comprehensive distribution (2.23) using Wilks’ theorem [485] on maximum
likelihood ratios, which allows the comparison of nested hypotheses. It will be shown
that the Pareto and the SE models are the most parsimonious. We then turn
to a direct comparison of the best two-parameter models (the SE and log-
Weibull models) with the best one-parameter model (the Pareto model), which
will require an extension of Wilks’ theorem derived in Appendix 2.D. This
extension allows us to directly test the SE model against the Pareto model.
According to Wilks’ theorem, the doubled log-likelihood ratio

      Λ = 2 log [ max L(CD, X, Θ) / max L(z, X, θ) ]              (2.39)
has asymptotically (as the size N of the sample X tends to infinity) the χ2 -
distribution. Here L denotes the likelihood function, θ and Θ are parametric
spaces corresponding to hypotheses z and CD (comprehensive distribution
defined in (2.23)) correspondingly (hypothesis z is one of the four hypotheses
(2.26–2.29) that are particular cases of the CD under some parameter re-
strictions recalled in Sect. 2.4.1). The statement of the theorem is valid under
the condition that the sample X obeys the hypothesis z for some particular
value of its parameter belonging to the space θ. The number of degrees of
freedom of the χ2 -distribution is equal to the difference of the dimensions
of the two spaces Θ and θ. We have dim(Θ) = 3, dim(θ) = 2 for the SE
and for the incomplete Gamma distributions, while dim(θ) = 1 for the Pareto
and the Exponential distributions. This leads to one degree of freedom of the
χ2-distribution for the two former cases and two degrees of freedom of the
χ2-distribution for the latter models. The maximum of the likelihood in the
numerator of (2.39) is taken over the space Θ, whereas the maximum of the
likelihood in the denominator of (2.39) is taken over the space θ. Since we
have always θ ⊂ Θ, the likelihood ratio is always larger than 1, and the log-
likelihood ratio is non-negative. If the observed value of Λ does not exceed
some high quantile (say, the 99% quantile) of this χ2-distribution, we then
reject the hypothesis CD in favor of the hypothesis z, considering the space Θ
redundant. Otherwise, we accept the hypothesis CD, considering the space θ
insufficient.
The double log-likelihood ratios (2.39) are shown for the positive and neg-
ative branches of the distribution of returns in Fig. 2.9 for the Nasdaq Com-
posite index. Similar results (not shown) are obtained for the Dow Jones and
the Standard & Poor’s 500 (1, 5, 30 and 60 minutes) indices.
[Fig. 2.9 appears here: two panels plotting the Wilks statistic (doubled
log-likelihood ratio) against the lower threshold u, in units of 10^−3.]
Fig. 2.9. Wilks statistic for the comprehensive distribution versus the four parametric
distributions: Pareto, Stretched-Exponential (∗), Exponential (◦), and
incomplete Gamma, for the Nasdaq 5-minute returns. The upper (lower) panel
refers to the positive (negative) returns. The horizontal lines represent the critical
values at the 95% confidence level of the test for the χ2-distribution with one (lower
line) and two (upper line) degrees of freedom. Reproduced from [330]
2.4 Fitting Distributions of Returns with Parametric Densities 73
For the Nasdaq data, Figure 2.9 clearly shows that the Exponential dis-
tribution is completely insufficient: for all lower thresholds, the Wilks log-
likelihood ratio exceeds the critical value corresponding to the 95% level of
the χ2-distribution with one degree of freedom. The Pareto distribution is
insufficient for thresholds corresponding to quantiles less than q11 = 92.5% and
becomes comparable with the
comprehensive distribution beyond. It is natural that the families with two
parameters, the incomplete Gamma and the SE, have higher goodness-of-fit
than the one-parameter Exponential and Pareto distributions. The incomplete
Gamma distribution is comparable with the comprehensive distribution be-
yond quantile q10 = 90%, whereas the SE is somewhat better beyond quantile
q8 = 70%. For the tails representing 7.5% of the data, all parametric families
except for the Exponential distribution fit the sample distribution with almost
the same efficiency according to this test.
The results obtained for the Dow Jones data are similar. The SE is com-
parable with the comprehensive distribution starting with q8 = 70%. On the
whole, one can say that the SE distribution performs better than the three
other parametric families.
The situation is somewhat different for the Standard & Poor’s 500 index.
For the positive tail, none of the four distributions is really sufficient in order
to accurately describe the data. The comprehensive distribution is overall
the best. In the negative tail, we retrieve a behavior more similar to that
observed in the two previous cases, except for the Exponential distribution
which also appears to be better than the comprehensive distribution. However,
it should be noted that the comprehensive distribution is only rejected in the
very far tail. The four models (2.26–2.29) are better than the comprehensive
distribution only for the two highest quantiles (q20 and q21 ) of the negative
tail. In contrast, the Pareto, SE and incomplete Gamma models are better
than the comprehensive distribution over the 10 highest quantiles (or so) for
the Nasdaq and the Dow Jones.
We should stress again that each log-likelihood ratio, so to speak, “acts on its
own ground”: that is, the corresponding χ2-distribution is valid under the
assumption of the validity of each particular hypothesis whose likelihood stands
in the numerator of the double log-likelihood (2.39). It would be desirable to
compare all combinations of pairs of hypotheses directly, in addition to com-
paring each of them with the comprehensive distribution. Unfortunately, the
Wilks theorem cannot be used in the case of pair-wise comparison because the
problem is no longer one of comparing nested hypotheses (i.e., one hypothesis
being a particular case of the comprehensive model). As a consequence, the
previous results on the comparison of the relative merits of each of the four
distributions using the generalized log-likelihood ratio should be interpreted
with care, in particular in the case of contradictory conclusions. Fortunately, the
main conclusion of the comparison (an advantage of the SE distribution over
the three other distributions) does not contradict the earlier results discussed
above.
Let us compare formally the descriptive power of the SE distribution and the
log-Weibull distribution (the two best two-parameter models qualified until
now) with that of the Pareto distribution (the best one-parameter model).
For the comparison of the log-Weibull model versus the Pareto model, Wilks’
theorem can still be applied since the log-Weibull distribution encompasses
the Pareto distribution. A contrario, the comparison of the SE versus the
Pareto distribution should in principle require that we use the methods for
testing non-nested hypotheses [209], such as the Wald encompassing test or
the Bayes factors [266]. Indeed, the Pareto model and the (SE) model are
not, strictly speaking, nested. However, as shown in Sect. 2.4.1, the Pareto
distribution is a limiting case of the SE distribution, as the fractional exponent
c goes to zero. Changing the parametric representation of the (SE) model into
      f(x|b, c) = b u^{−c} x^{c−1} exp[ −(b/c) ((x/u)^c − 1) ] ,   x > u ,      (2.40)
i.e., setting b = c (u/d)^c, where the parameter d refers to the former (SE)
representation (2.27), Appendix 2.D shows that the doubled log-likelihood
ratio
      W = 2 log [ max_{b,c} L_SE / max_b L_PD ]                   (2.41)
still follows Wilks’ statistic, namely it is asymptotically distributed according to
a χ2-distribution with one degree of freedom in the present case. Thus, even
in this case of non-nested hypotheses, Wilks’ statistic still allows us to test
the null hypothesis H0 according to which the Pareto model is sufficient to
describe the data.
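Concretely, with the SE scale concentrated out through its ML value d^c = mean(x^c) − u^c (cf. (2.B.53)), the statistic (2.41) reduces to a one-dimensional profile search over c. The following is a sketch with our own names and grid; deterministic quantile grids again stand in for samples:

```python
import math

def se_profile_loglik(xs, c, u=1.0):
    """SE log-likelihood with the ML scale d**c = mean(x**c) - u**c substituted;
    returns -inf when that quantity is not positive."""
    n = len(xs)
    s = sum(x ** c for x in xs) / n - u ** c
    if s <= 0.0:
        return -math.inf
    return (n * math.log(c) + (c - 1.0) * sum(math.log(x) for x in xs)
            - n * math.log(s) - n)

def pareto_loglik(xs, u=1.0):
    """Pareto log-likelihood with the Hill estimate substituted for b."""
    n = len(xs)
    s = sum(math.log(x / u) for x in xs)
    return n * math.log(n / s) - s - n

def w_statistic(xs, grid):
    """Doubled log-likelihood ratio (2.41) of the SE model versus the Pareto model."""
    return 2.0 * (max(se_profile_loglik(xs, c) for c in grid) - pareto_loglik(xs))

n = 2000
grid = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1.0]
x_pareto = [(i / n) ** (-1.0 / 3.0) for i in range(1, n + 1)]             # Pareto data, b = 3
x_se = [(1.0 - math.log(i / n)) ** (1.0 / 0.7) for i in range(1, n + 1)]  # SE data, c = 0.7

w_pareto = w_statistic(x_pareto, grid)  # small: Pareto not rejected
w_se = w_statistic(x_se, grid)          # large: Pareto rejected in favor of the SE
```

For the Pareto-type data the SE profile is driven toward the c → 0 boundary, gaining almost nothing over the Pareto fit, while for genuine SE data the statistic far exceeds the 95% χ2 critical value of 3.841.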
Concerning the comparison between the Pareto model and the SE one, the
null hypothesis is found to be more often rejected for the Dow Jones than for
the Nasdaq and the Standard & Poor’s 500 [330]. Indeed, beyond the quantile
q12 = 95%, the Pareto model cannot be rejected in favor of the SE model at
the 95% confidence level for the Nasdaq and the Standard & Poor’s 500 data.
For the Dow Jones, one must consider quantiles higher than q16 = 99% – at
least for the negative tail – in order not to reject H0 at the 95% significance
level. These results are in qualitative agreement with what we could expect
from the action of the central limit theorem: the power law regime (if it really
exists) is pushed back to higher quantiles due to time aggregation (recall that
the Dow Jones data is at the daily scale while the Nasdaq data is at the
5-minute time scale).
It is, however, more difficult to rationalize the fact reported in [330] that
the SE model is not rejected (at the 99% confidence level) for the two highest
quantiles (q20 = 99.95% and q21 = 99.99%) of the negative tail of the 1 minute
returns of the Standard & Poor’s 500 and for the quantiles q19 = 99.9% and
q20 = 99.95% for its positive tail. This might be ascribed to a lack of power
of the test, but recall that we have restricted our investigation to empirical
quantiles with more than a hundred points (or so). Therefore, invoking a lack
of power is not very convincing. In addition, for these high quantiles, the
fractional exponent c in the SE model becomes significantly different from
zero (see Table 2.5). It could be an empirical illustration of the existence of
a cut-off beyond which the power law regime is replaced by an exponential
(or stretched-exponential) decay of the distribution function as suggested by
Mantegna and Stanley [344] and by the recent model [493] based upon a pure
jump Lévy process, whose jump arrival rate obeys a power law dampened
by an exponential function. To strengthen this idea, it can be noted that the
exponential distribution is found sufficient to describe the distributions of the
1 minute returns of the Standard & Poor’s 500, while it is always rejected
(with respect to the comprehensive distribution) for the Nasdaq and the Dow
Jones. Thus, this non-rejection could really be the genuine signature of a cut-
off beyond which the decay of the distribution is faster than any power law.
However, this conclusion is only drawn from the one hundred most extreme
data points and, therefore, should be considered with caution. Larger samples
should be considered to obtain a confirmation of this intuition. Unfortunately,
samples with more than 10 million (non zero) data points (for a single asset)
are not yet accessible.
Based upon the study of [330], Wilks’ test for the Pareto distribution
versus the log-Weibull distribution shows that, for quantiles above q12 , the
Pareto distribution cannot be rejected in favor of the log-Weibull for the Dow
Jones, the Nasdaq and the Standard & Poor’s 500 30-minute returns. This
parallels the lack of rejection of the Pareto distribution against the SE beyond the quantile q12. The picture is different for the 1-minute returns of
the Standard & Poor’s 500. The Pareto model is almost always rejected. The
most interesting point is the following: in the negative tail, the Pareto model
is always strongly rejected except for the highest quantiles. Comparing with
Table 2.6, one clearly sees that between q15 and q18 the exponent c is signif-
icantly (at the 95% significance level) less than one, indicating a tail fatter
than any power law. On the contrary, for q21 , the exponent c is found signif-
icantly larger than one, indicating a change of regime and again an ultimate
decay of the tail of the distribution faster than any power law.
In summary, the null hypothesis that the true distribution is the Pareto
distribution is strongly rejected until quantiles 90–95% or so. Thus, within
this range, the Stretched-Exponential and log-Weibull models seem the best
and the Pareto model is insufficient to describe the data. But, for the very
highest quantiles (above 95%–98%), one can no longer reject the hypothesis that the Pareto model is sufficient compared with the SE and log-Weibull models. These two-parameter models can then be seen as a redundant parameterization for the extremes compared with the Pareto distribution, except for the returns calculated at the smallest time scales.
76 2 Marginal Distributions of Returns
2.5.1 Summary
This chapter has revisited the generally accepted fact that the tails of the
distributions of returns present a power-like behavior. Often, the conviction
of the existence of a power-like tail is based on the Gnedenko theorem stating
the existence of only three possible types of limit distributions of normalized
maxima (a finite maximum value, an exponential tail, and a power-like tail)
together with the exclusion of the first two types by empirical evidence. The
power-like character of the tails of the distribution of log-returns follows then
simply from the power-like distribution of maxima. However, in this chain
of arguments, the conditions needed for the fulfillment of the correspond-
ing mathematical theorems are often omitted and not discussed properly. In
addition, widely used arguments in favor of power law tails invoke the self-similarity of the data; but self-similarity is often an assumption rather than a fact established from empirical evidence or derived from economic and financial laws.
Sharpening and generalizing the results obtained by Kearns and Pagan
[267], Sect. 2.3.3 has recalled that standard statistical estimators of heavy
tails are much less efficient than often assumed and cannot in general clearly
distinguish between a power law tail and a SE tail (even in the absence of
long-range dependence in the volatility). So, in view of the stalemate reached
with the nonparametric approaches and in particular with the standard ex-
treme value estimators, resorting to a parametric approach appears essential.
The parametric approach is useful to decide which class of extreme value
distributions – rapidly versus regularly varying – accounts best for the empirical distributions of returns at different time scales. However, here again, the problem is not as straightforward as it appears. Indeed, in order to apply
statistical methods to the study of empirical distributions of returns and to de-
rive their resulting implication for risk management, it is necessary to keep in
mind the existence of necessary conditions that the empirical data must obey
for the conclusions of the statistical study to be valid. Maybe the most impor-
tant condition in order to speak meaningfully about distribution functions is
the stationarity of the data, a difficult issue that we have barely touched upon
here. In particular, the importance of regime switching is now well established
[14, 397] and its possible role should be assessed and accounted for.
The results that standard statistical estimators of heavy tails are much less efficient than often assumed, and cannot in general clearly distinguish between a power law tail and a SE tail, can be rationalized by the fact that, in a certain limit, the Stretched-Exponential pdf tends to the Pareto distribution (see (2.30–2.31) and Appendix 2.B). Thus, the Pareto (or power law) distribution can be approximated with any desired accuracy on an arbitrary interval by a suitable adjustment of the parameters (c, d) of the SE distribution.
2.5 Discussion and Conclusions 77
Table 2.7. Best parameters c and d of the Stretched-Exponential model and best parameter b of the Pareto model estimated beyond quantile q12 = 95% for the Dow Jones (DJ), the Nasdaq (ND) and the Standard & Poor's 500 (SP) indices. The apparent Pareto exponent c(u(q12)/d)^c (see expression (2.30)) is also shown

                           c              d             b     c(u(q12)/d)^c
SP pos. returns (5 min)    0.033 (0.031)  3.06 × 10^−59  2.95  2.95 (0.03)
SP neg. returns (5 min)    0.033 (0.031)  3.26 × 10^−56  2.87  2.86 (0.03)
Note also that the exponents c are larger for the daily Dow Jones data than for the 5-minute Nasdaq data and the 1-minute and 5-minute Standard & Poor's 500 data, in agreement with an expected (slow) convergence to the Gaussian law according to the central limit theorem.18 However, a t-test does
not allow one to reject the hypothesis that the exponents c remain the same for
the positive and negative tails of the Dow Jones data. This confirms previous
results, for instance [319, 256] according to which the extreme tails can be
considered as symmetric, at least for the Dow Jones data. In contrast, there is
a very strong asymmetry for the 5-minute sampled Nasdaq and the Standard
& Poor’s 500 data.
Such is the evidence in favor of the existence of an asymptotic power law tail. Balancing this view, many of the tests have shown that the power law model does not perform better than the SE and log-Weibull models, even arbitrarily far in the tail (as far as the available data allows us to probe). In addition, for the smallest time scales, the tail of the distribution of returns is, over a large range, well described by a log-Weibull distribution with an exponent c less than one, i.e., is fatter than any power law. A change of regime is ultimately observed and the very extreme tail decays faster than any power law. Both an SE and a log-Weibull model with exponent c > 1 provide a reasonable description.
Attempting to wrap up the different results obtained by the battery of
tests presented here, we can offer the following conservative conclusion: it
seems that the tails of the distributions examined here are decaying faster
than any (reasonable) power law but slower than any Stretched-Exponential.
Maybe log-normal distributions could offer a better effective description of
the distribution of returns,19 as suggested in [436].
In sum, in the most practical case, the Pareto distribution is sufficient
above quantiles q12 = 95% but is not stable enough to ascertain with strong
confidence an asymptotic power law nature of the pdf.
One often needs, however, to extrapolate the distributions of returns beyond the range provided by their empirical reconstruction. For risk management, the determination of the tail of the distribution is then crucial. Indeed, many risk measures, such as the Value-at-Risk or the expected-shortfall, are based on the properties of the tail of the distributions of returns. In order to assess risk at probability levels of 95% or so,
nonparametric methods have merits. However, in order to estimate risks at high probability levels such as 99% or larger, nonparametric estimations fail for lack of data and parametric models become unavoidable. This shift in strategy has a cost: it replaces sampling errors by model errors. The considered distribution can be too thin-tailed, as when using normal laws, and risk will then be underestimated; or it can be too fat-tailed, and risk will be overestimated, as with Lévy laws and possibly with regularly varying distributions. In each case, large amounts of money are at stake and can be lost due to a too conservative or too optimistic risk measurement.
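To make the stakes concrete, here is a toy computation of a far-tail quantile (a 99.9% VaR-type level) under two competing tail models that agree at the 95% threshold; every numerical value below (threshold u, Pareto exponent b, SE parameters c and v) is hypothetical, not fitted to the data discussed in this chapter:

```python
import math

# all numbers below are hypothetical illustrations
u, P_u, p = 0.02, 0.05, 0.001      # 95% threshold u, target tail probability 0.1%

# Pareto tail beyond u: P(X > x) / P(X > u) = (u/x)^b
b = 3.0
var_pareto = u * (P_u / p) ** (1.0 / b)

# Stretched-Exponential tail beyond u: P(X > x)/P(X > u) = exp(-v((x/u)^c - 1))
c, v = 0.9, 2.0
var_se = u * (1.0 + math.log(P_u / p) / v) ** (1.0 / c)

print(var_pareto, var_se)
```

With these particular numbers the Pareto extrapolation is the more conservative of the two; other parameter choices can reverse the ordering, which is precisely the model error at issue here.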
In order to bypass these problems, many authors [34, 313, 355, among
others] have proposed to estimate the extreme quantiles of the distributions
in a semiparametric way, which allows one (i) to avoid the model errors and (ii)
to limit the sampling errors with respect to nonparametric methods and thus
to keep a reasonable accuracy in the estimation procedure. To this aim, it has been suggested to use extreme value theory.20 However, as emphasized in Sect. 2.3.3, estimates of the parameters of such (GEV or GPD) distributions can be very unreliable in the presence of dependence, so that these methods turn out not to be very accurate and one cannot avoid a parametric approach for the estimation of the highest quantiles.
The above analysis suggests that the Paretian paradigm leads to an overes-
timation of the probability of large events and therefore leads to the adoption
of too conservative positions. Generalizing to larger time scales, the overly
pessimistic view of large risks deriving from the Paretian paradigm should be
all the more revised, due to the action of the central limit theorem. The above comparison between several models which turn out to be almost indistinguishable – the Stretched-Exponential, the Pareto and the log-Weibull distributions – offers the important possibility of developing scenarios that can test the sensitivity of risk assessment to errors in the determination of parameters and, even more interestingly, with respect to the choice of model, often referred to as model error.
Finally, an additional note of caution is in order. This chapter has fo-
cused on the marginal distributions of returns calculated at fixed time scales
and thus neglects the possible occurrence of runs of dependencies, such as
in cumulative drawdowns. In the presence of dependencies between returns, and especially if the dependence is nonstationary and increases in times of stress, the characterization of the marginal distributions of returns is not sufficient. As an example, Johansen and Sornette [249] (see also Chap. 3 of [450])
20 See, for instance, http://www.gloriamundi.org for an overview of the extensive application of EVT methods for VaR and expected-shortfall estimation.
have recently shown that the recurrence time of very large drawdowns cannot
be predicted from the sole knowledge of the distribution of returns and that
transient dependence effects occurring in times of stress make very large draw-
downs more frequent, qualifying them as abnormal “outliers” (other names
are “kings” or “black swans”).
Appendix
for some real-valued function ζ(·) such that ζ(0) = 0. Considering the relation
between the q-order moments of δl X(t) and δL X(t), (2.A.2) generalizes as
follows
M(q, l) = (l/L)^ζ(q) M(q, L) . (2.A.8)
Assuming that W follows a log-normal law with parameters (µ, λ2), the magnitude ω admits a density at scale l_n, Q_{l_n}(ω), which satisfies the simple equation
Q_{l_n}(ω) = [ϕ(µ, λ2)]^∗n ∗ Q_L(ω) , (2.A.12)
where ∗ is the convolution product and ϕ(µ, λ2) denotes the Gaussian density function with mean µ and variance λ2. Going back to the original variable δX, the previous equation provides us with the expression of the density function of δX at scale l_n:
P_{l_n}(x) = ∫ G_{l_n,L}(u) e^{−u} P_L(e^{−u} x) du , (2.A.13)
where
Cω (τ, l) = Cov(ω(t, l), ω(t + τ, l)) ∝ −λ2 ln(τ /T ) , for l < τ < T ,
(2.A.15)
which is proportional to the logarithm of the lag τ . The parameter T is called
the “integral time scale” and is such that Cω (τ, l) is exactly 0 for τ > T .
Mandelbrot et al. [341] have proposed a very simple way to obtain a multi-
fractal process with suitable properties for the modeling of asset returns. In
its simplest form, it is based upon the subordination of a Brownian motion
by a multifractal process. Indeed, considering the price process {P(t)}_{t≥0}, the logarithm of the price, X(t) = ln P(t) − ln P(0), is assumed to be defined by
X(t) = B[θ(t)] ,
where B is a Brownian motion and θ(t) is an independent multifractal time, given by the cdf of a multifractal measure.
Definition
The MRW is a stochastic volatility model which has exact multifractal prop-
erties, is invariant under continuous dilations, and possesses stationary incre-
ments. It is constructed so as to mimic the crucial logarithmic dependence
(2.A.15) of the magnitude correlation function, at the basis of multifractality
in cascade processes.
The MRW is constructed as the continuous limit for ∆t → 0 of the dis-
cretized version X∆t (using a time discretization step ∆t) defined by adding
up t/∆t random variables:
X_∆t(t) = Σ_{k=1}^{t/∆t} δX_∆t[k] , (2.A.27)
with
δX_∆t[k] = ε_∆t[k] e^{ω_∆t[k]} , (2.A.28)
where ω_∆t[k] is the logarithm of the stochastic variance and ε_∆t is a Gaussian white noise independent of ω and of variance σ2∆t.23
Following the cascade model, ω_∆t is a Gaussian stationary process whose covariance reads
Cov(ω_∆t[k], ω_∆t[l]) = λ2 ln ρ_∆t[|k − l|] , (2.A.29)
where ρ_∆t is chosen in order to mimic the correlation structure (2.A.15) observed in cascade models:
ρ_∆t[k] = T/((|k| + 1)∆t) for |k| ≤ T/∆t − 1 , and ρ_∆t[k] = 1 otherwise . (2.A.30)
In order for the variance of X_∆t(t) to converge when ∆t → 0, one must choose the mean of the process ω_∆t such that [27]
E[ω_∆t[k]] = −λ2 ln(T/∆t) = −Var(ω_∆t[k]) . (2.A.31)
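The discretized construction (2.A.27)–(2.A.30) can be simulated directly. The sketch below uses purely illustrative parameter values; the dense O(n³) Cholesky factorization and the tiny diagonal jitter are numerical conveniences, not part of the model (the log-covariance sequence is convex and decreasing, hence positive semidefinite):

```python
import math, random

random.seed(23)
n, dt, T_int = 256, 1.0, 128.0     # illustrative: n steps, integral scale T
lam2, sigma2 = 0.02, 1.0           # illustrative intermittency and variance

# covariance (2.A.29)-(2.A.30): Cov(w[i], w[j]) = lam2 * ln rho[|i-j|]
def rho(k):
    return T_int / ((k + 1) * dt) if k <= T_int / dt - 1 else 1.0

cov = [[lam2 * math.log(rho(abs(i - j))) + (1e-10 if i == j else 0.0)
        for j in range(n)] for i in range(n)]

# plain Cholesky factorization of the covariance matrix
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1):
        s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
        L[i][j] = math.sqrt(s) if i == j else s / L[j][j]

mu = -lam2 * math.log(T_int / dt)  # mean chosen so that Var(X(t)) converges
g = [random.gauss(0.0, 1.0) for _ in range(n)]
w = [mu + sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(n)]
dX = [random.gauss(0.0, math.sqrt(sigma2 * dt)) * math.exp(wi) for wi in w]
X = [0.0]
for d in dX:
    X.append(X[-1] + d)            # the discretized MRW path (2.A.27)
print(len(X), X[-1])
```

The resulting path exhibits the volatility clustering and intermittent bursts characteristic of multifractal models.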
Multifractal Spectrum
Since, by construction, the increments of the model are stationary, the pdf of
X∆t (t + l) − X∆t (t) does not depend on t and is the same as that of X∆t (l). In
23 Introducing an asymmetric dependence between ω and the noise allows one to account for the leverage effect [170] while preserving the scale invariance properties of the MRW, but forbids the existence of a limit as ∆t −→ 0 [388].
[27], it was proven that the moments of X(l) ≡ X_{∆t→0+}(l) can be expressed as
E[X(l)^{2p}] = (σ^{2p} (2p)! / (2^p p!)) ∫_0^l du_1 · · · ∫_0^l du_p Π_{i<j} ρ(u_i − u_j)^{4λ2} , (2.A.32)
where ρ is defined by
ρ(t) = T/|t| for |t| ≤ T , and ρ(t) = 1 otherwise . (2.A.33)
K_{2p} is nothing but the moment of order 2p of the random variable X(T) or equivalently of δ_T X(t). Expression (2.A.35) leads to ζ_{2p} = p − 2p(p − 1)λ2, and by analytical continuation, the corresponding full ζ_q spectrum is thus the parabola
ζ_q = (q/2)(1 + 2λ2) − (λ2/2) q2 .
Consider the returns at scale ∆t, defined by r_∆t(t) ≡ ln[p(t)/p(t − ∆t)]. Then, mapping the increments δX_∆t[k] defined in (2.A.28) onto r_∆t(t) makes the price p(t) a multifractal random walk in the continuous limit ∆t → 0. The discrete return r_∆t(t) can thus be written as
r_∆t(t) = ε(t) · e^{ω_∆t(t)} , (2.A.37)
where ε(t) is a standardized Gaussian white noise independent of ω_∆t(t) and ω_∆t(t) is a nearly Gaussian process (exactly Gaussian for ∆t → 0) with mean and covariance:
µ_∆t = (1/2) ln(σ2∆t) − C_∆t(0) , (2.A.38)
C_∆t(τ) = Cov[ω_∆t(t), ω_∆t(t + τ)]
        = λ2 ln[T/(|τ| + e^{−3/2}∆t)] if |τ| < T − e^{−3/2}∆t , and 0 if |τ| ≥ T − e^{−3/2}∆t . (2.A.39)
where W(t) denotes a standard Wiener process and the memory kernel K_∆t(·) is a causal function, ensuring that the system is not anticipative. The process W(t) can be seen as the cumulative information flow. Thus ω(t) represents the response of the price to incoming information up to the date t. At time t, the distribution of ω_∆t(t) is Gaussian with mean µ_∆t and variance V_∆t = ∫_0^∞ dτ K_∆t(τ)2 = λ2 ln(T e^{3/2}/∆t). Its covariance, which entirely specifies the random process, is given by
C_∆t(τ) = ∫_0^∞ dt K_∆t(t) K_∆t(t + |τ|) . (2.A.41)
f_u(x|b) = b u^b / x^{b+1} . (2.B.45)
Let us denote by
L_T^{PD}(b̂) = max_b Σ_{i=1}^T ln f_u(X_i|b) (2.B.46)
the maximum of the log-likelihood under the Pareto hypothesis. The maximum likelihood estimator b̂ is solution of
1/b + ln u − (1/T) Σ_{i=1}^T ln X_i = 0 , (2.B.47)
which yields
b̂ = [ (1/T) Σ_{i=1}^T ln X_i − ln u ]^{−1} , and (1/T) L_T^{PD}(b̂) = ln(b̂/u) − 1 − 1/b̂ . (2.B.48)
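The closed form in (2.B.48), as reconstructed here, can be checked numerically: at the maximum likelihood point the direct evaluation of the average log-likelihood must coincide, as an exact algebraic identity, with ln(b̂/u) − 1 − 1/b̂. A sketch on synthetic Pareto data (all parameter values illustrative):

```python
import math, random

random.seed(42)
u, b_true, T = 1.5, 3.0, 100000
# Pareto sample above u by inverse transform: X = u * U**(-1/b)
X = [u * random.random() ** (-1.0 / b_true) for _ in range(T)]

m = sum(math.log(x) for x in X) / T
b_hat = 1.0 / (m - math.log(u))                       # estimator in (2.B.48)

# closed form for the maximized average log-likelihood, (2.B.48)
L_closed = math.log(b_hat / u) - 1.0 - 1.0 / b_hat
# direct evaluation of (1/T) * sum log f_u(X_i | b_hat), density (2.B.45)
L_direct = sum(math.log(b_hat) + b_hat * math.log(u)
               - (b_hat + 1.0) * math.log(x) for x in X) / T
print(b_hat, L_closed, L_direct)
```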
L_T^{SE}(ĉ, d̂) = max_{c,d} Σ_{i=1}^T ln f_u(X_i|c, d) (2.B.51)
Thus, the maximum likelihood estimators (ĉ, d̂) are solution of
1/c = [ (1/T) Σ_{i=1}^T (X_i/u)^c ln(X_i/u) ] / [ (1/T) Σ_{i=1}^T (X_i/u)^c − 1 ] − (1/T) Σ_{i=1}^T ln(X_i/u) , (2.B.52)
d^c = u^c [ (1/T) Σ_{i=1}^T (X_i/u)^c − 1 ] . (2.B.53)
The maximum of the log-likelihood is then
(1/T) L_T^{SE}(ĉ, d̂) = ln(ĉ/d̂^ĉ) + (ĉ − 1) (1/T) Σ_{i=1}^T ln X_i − 1 . (2.B.54)
Since c > 0, the vector √T (ĉ − c, d̂ − d) is asymptotically normal, with a covariance matrix whose expression is given in Appendix 2.C.
It should be noted that the maximum likelihood equations (2.B.52–2.B.53) do not admit a solution with positive c for all possible samples (X1, . . . , XT). Indeed, the function
h(c) = 1/c + (1/T) Σ_{i=1}^T ln(X_i/u) − [ (1/T) Σ_{i=1}^T (X_i/u)^c ln(X_i/u) ] / [ (1/T) Σ_{i=1}^T (X_i/u)^c − 1 ] , (2.B.55)
whose zeros are the solutions of (2.B.52), may fail to vanish at any positive value of c.
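In practice, ĉ can be obtained by locating the zero of h(c) in (2.B.55) numerically; d̂ then follows from (2.B.53). A sketch on synthetic SE data (illustrative parameters u = d = 1, c = 0.7; the bisection bracket [10⁻³, 5] is an assumption):

```python
import math, random

random.seed(1)
u, c_true, d_true, T = 1.0, 0.7, 1.0, 20000
v = (u / d_true) ** c_true
# SE tail sample: (X/d)^c = (u/d)^c + E with E ~ Exp(1)
X = [d_true * (v - math.log(random.random())) ** (1.0 / c_true)
     for _ in range(T)]
logs = [math.log(x / u) for x in X]

def h(c):
    y = [math.exp(c * s) for s in logs]          # (X_i/u)^c
    num = sum(yi * si for yi, si in zip(y, logs)) / T
    den = sum(y) / T - 1.0
    return 1.0 / c + sum(logs) / T - num / den   # (2.B.55)

lo, hi = 1e-3, 5.0                               # assumed bracket: h > 0 at lo, < 0 at hi
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if h(mid) > 0:
        lo = mid
    else:
        hi = mid
c_hat = 0.5 * (lo + hi)
d_hat = u * (sum(math.exp(c_hat * s) for s in logs) / T - 1.0) ** (1.0 / c_hat)  # (2.B.53)
print(c_hat, d_hat)
```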
A finite sample may not automatically obey this condition even if it has been
generated by the SE distribution. However, the probability of occurrence of a
sample leading to a negative maximum likelihood estimate of c tends to zero
(under the SE Hypothesis with a positive c) as
Φ(−c√T/σ) ≃ (σ/(c√(2πT))) e^{−c2 T/(2σ2)} , (2.B.58)
d̂ = (1/T) Σ_{i=1}^T X_i − u , (2.B.61)
and is given by
(1/T) L_T^{ED}(d̂) = −(1 + ln d̂) . (2.B.62)
The random variable √T (d̂ − d) is asymptotically normally distributed with zero mean and variance d2.
f_u(x|b, d) = (d^b / Γ(−b, u/d)) · x^{−(b+1)} exp(−x/d) , x ≥ u . (2.B.63)
The maximum of the log-likelihood function is reached at the point (b̂, d̂) solution of
(1/T) Σ_{i=1}^T ln(X_i/d) = Ψ(−b, u/d) , (2.B.65)
(1/T) Σ_{i=1}^T X_i/d = (1/Γ(−b, u/d)) (u/d)^{−b} e^{−u/d} − b , (2.B.66)
where Ψ(a, x) = [∂Γ(a, x)/∂a] / Γ(a, x), and is equal to
(1/T) L_T^{IG}(b̂, d̂) = − ln d̂ − ln Γ(−b̂, u/d̂) − (b̂ + 1) · Ψ(−b̂, u/d̂) + b̂ − (1/Γ(−b̂, u/d̂)) (u/d̂)^{−b̂} e^{−u/d̂} . (2.B.67)
L_T^{SE}(b̂, ĉ) = max_{b,c} Σ_{i=1}^T ln f_u(X_i|b, c) . (2.B.69)
The solution of these equations is unique and it can be shown that the vector √T (b̂ − b, ĉ − c) is asymptotically Gaussian with a covariance which can be deduced from the matrix (2.C.88) given in Appendix 2.C.
F̄(x) = exp[ −v ((x/u)^c − 1) ] , x ≥ u . (2.C.73)
Here, the parameter v involves both the unknown parameters c, d and the known threshold u:
v = (u/d)^c . (2.C.74)
The log-likelihood L for a sample (X1, . . . , XN) has the form:
L = N ln v + N ln c + (c − 1) Σ_{i=1}^N ln(X_i/u) − v Σ_{i=1}^N [ (X_i/u)^c − 1 ] . (2.C.75)
and find
∂2L/∂v2 = −N/v2 , (2.C.77)
∂2L/∂v∂c = −N · (1/N) Σ_{i=1}^N (X_i/u)^c ln(X_i/u) −→ (N→∞) −N E[(X/u)^c ln(X/u)] , (2.C.78)
∂2L/∂c2 = −N/c2 − N v · (1/N) Σ_{i=1}^N (X_i/u)^c ln2(X_i/u) −→ (N→∞) −N/c2 − N v · E[(X/u)^c ln2(X/u)] . (2.C.79)
After some calculations, we obtain:
E[(X/u)^c ln(X/u)] = (1 + e^v E1(v)) / (c · v) , (2.C.80)
where E1(v) = ∫_v^∞ t^{−1} e^{−t} dt denotes the exponential integral.
Similarly we find:
E[(X/u)^c ln2(X/u)] = (2 e^v / (v · c2)) [ E1(v) + E2(v) − ln(v) E1(v) ] , (2.C.82)
where E2(v) is the partial derivative of the incomplete Gamma function:
E2(v) = ∫_v^∞ (ln t / t) e^{−t} dt = ∂/∂a [ ∫_v^∞ t^{a−1} e^{−t} dt ]_{a=0} = [ ∂Γ(a, v)/∂a ]_{a=0} . (2.C.83)
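The expectation (2.C.80) – with the factor e^v E1(v), as reconstructed here – is easy to check by Monte Carlo, since under (2.C.73) the variable v(X/u)^c − v is a unit exponential. The value of E1(1) below is the tabulated constant; all other parameters are illustrative:

```python
import math, random

random.seed(3)
u, c, v, n = 1.0, 2.0, 1.0, 400000

total = 0.0
for _ in range(n):
    e = -math.log(random.random())            # Exp(1)
    x = u * ((v + e) / v) ** (1.0 / c)        # draw from the tail (2.C.73)
    total += (x / u) ** c * math.log(x / u)
mc = total / n

E1_of_1 = 0.21938393439552  # tabulated value of the exponential integral E1(1)
exact = (1.0 + math.exp(v) * E1_of_1) / (c * v)   # right-hand side of (2.C.80)
print(mc, exact)
```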
The covariance matrix B of the ML-estimates (ṽ, c̃) is equal to the inverse of the Fisher matrix. Thus, inverting the Fisher matrix Φ in (2.C.84) provides the desired covariance matrix:
B = (1/(N H(v))) ×
    ( v2 [1 + 2e^v E1(v) + 2e^v E2(v) − ln(v) e^v E1(v)]    −c v [1 + e^v E1(v)] )
    ( −c v [1 + e^v E1(v)]                                   c2                  ) , (2.C.85)
After some calculations following the same steps as above, we find the covariance matrix B of the limit Gaussian distribution of ML-estimates (g̃, c̃):
B = (6/(N π2)) ×
    ( g2 [π2/6 + (γ + ln(g) − 1)2]    g · c [γ + ln(g) − 1] )
    ( g · c [γ + ln(g) − 1]           c2                    ) , (2.C.88)
where γ is the Euler constant: γ ≈ 0.577 215 . . .
This Appendix derives the statistic that allows one to test the SE hypothe-
sis f1 (x|c, b) versus the Pareto hypothesis f0 (x|β) on a semi-infinite interval
(u, ∞), u > 0. The following parameterization is used:
f1(x|c, b) = b u^{−c} x^{c−1} exp[ −(b/c) ((x/u)^c − 1) ] ; x ≥ u (2.D.89)
for the Stretched-Exponential distribution and
f0(x|β) = β u^β / x^{1+β} ; x ≥ u (2.D.90)
for the Pareto distribution.
for the Pareto distribution.
Theorem: Assuming that the sample X1 , . . . , XN is generated from the
Pareto distribution (2.D.90), and taking the supremum of the log-likelihoods
L0 and L1 of the Pareto and (SE) models respectively over the domains
(β > 0) for L0 and (b > 0, c > 0) for L1 , then Wilks’ log-likelihood ratio
W:
W_N = 2 ( sup_{b,c} L1 − sup_β L0 ) , (2.D.91)
and is equal to
sup_β L0 = −N [ 1 + log u + 1/β̂_N − log β̂_N ] . (2.D.94)
The log-likelihood L1 is
L1 = −N [ log u − (c − 1) (1/N) Σ_{i=1}^N log(X_i/u) − log b + (b/c) ( (1/N) Σ_{i=1}^N (X_i/u)^c − 1 ) ] . (2.D.95)
and is equal to
sup_{b,c} L1 = −N [ 1 + log u − (ĉ_N − 1) (1/N) Σ_{i=1}^N log(X_i/u) − log b̂_N ] . (2.D.97)
which yields
(1/N) Σ_{i=1}^N (X_i/u)^c ≅ 1 + c S1 + (c2/2) S2 + (c3/6) S3 , (2.D.103)
(1/N) Σ_{i=1}^N (X_i/u)^c log(X_i/u) ≅ S1 + c S2 + (c2/2) S3 , (2.D.104)
where
S1 = (1/N) Σ_{i=1}^N log(X_i/u) , (2.D.105)
S2 = (1/N) Σ_{i=1}^N log2(X_i/u) , (2.D.106)
S3 = (1/N) Σ_{i=1}^N log3(X_i/u) . (2.D.107)
Putting these expansions into (2.D.96) and (2.D.98) and keeping only terms in c up to second order, the solutions of these equations read
b̂_N ≅ S1^{−1} [ 1 − (S2/(2S1)) ĉ_N + ((3S2^2 − 2S1S3)/(12S1^2)) ĉ_N^2 ] , and ĉ_N ≅ (S2 − 2S1^2)/(S1S2 − (2/3)S3) . (2.D.108)
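The quantities above can be assembled into a numerical sketch of the Wilks test under the null: for synthetic Pareto data, ĉ_N from the expansion (2.D.108) – in the reading reconstructed here – should be close to zero, b̂_N close to the Pareto exponent, and W_N of order unity. All parameter values are illustrative:

```python
import math, random

random.seed(7)
u, beta, N = 1.0, 3.0, 20000
# Pareto sample above u = 1: log(X/u) ~ Exp(beta)
logs = [-math.log(random.random()) / beta for _ in range(N)]

S1 = sum(logs) / N
S2 = sum(s * s for s in logs) / N
S3 = sum(s ** 3 for s in logs) / N

# small-c expansion (2.D.108), as reconstructed from the garbled source
c_hat = (S2 - 2.0 * S1 ** 2) / (S1 * S2 - (2.0 / 3.0) * S3)
b_hat = (1.0 / S1) * (1.0 - S2 / (2.0 * S1) * c_hat
                      + (3.0 * S2 ** 2 - 2.0 * S1 * S3)
                      / (12.0 * S1 ** 2) * c_hat ** 2)

beta_hat = 1.0 / S1                                     # Pareto ML, since u = 1
supL0 = -N * (1.0 + math.log(u) + 1.0 / beta_hat - math.log(beta_hat))  # (2.D.94)
supL1 = -N * (1.0 + math.log(u) - (c_hat - 1.0) * S1 - math.log(b_hat)) # (2.D.97)
W = 2.0 * (supL1 - supL0)                               # Wilks statistic (2.D.91)
print(c_hat, b_hat, W)
```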
we can write
√N (β^4/4) ξ2 = ( √N (β^2/2) ξ1 )^2 + ε̂ , (2.D.116)
where ε̂ is a Gaussian variable with zero mean and unit variance, independent of ξ1. This implies that
In this chapter, we introduce the notion of copulas, which describes the depen-
dence between several random variables. These variables can be the returns
of different assets or the value of a given asset at different times, and more
generally, any set of economic variables. We present some examples of clas-
sical families of copulas and provide several illustrations of the usefulness of
copulas for actuarial,1 economic, and financial applications.
Until relatively recently, the correlation coefficient was the measure of
reference used to quantify the amplitude of dependence between two assets.
From the old hypothesis or belief that the marginal distribution of returns is
Gaussian, it was natural to extend this assumption of normality to the mul-
tivariate domain. Recall that only under the assumption of multivariate nor-
mality2 is the correlation coefficient necessary and sufficient to capture the
full dependence structure between asset returns. The growing attacks of the
past three decades and the now overwhelming evidence against the Gaussian
hypothesis also cast doubts on the relevance of the correlation coefficient as
an adequate measure of dependence. See for instance [404] for a specific test
of multivariate normality of asset returns. Actually, it is now clear that the
correlation coefficient is grossly insufficient to provide an accurate description
of the dependence between two assets [64, 148, 149] and that it is necessary
to characterize the full joint multivariate distribution of asset returns. This is
all the more important for rare large events whose deviations from normality
are the most extreme both in amplitude and dependence.
Consider for simplicity the problem of characterizing the bivariate distribution of the returns of only two assets. It is essential to realize that the bivari-
1 Actuarial science is a sister discipline of statistics. Actuaries play an important role in many of the financial plans that involve people, e.g., life insurance, pension plans, retirement benefits, car insurance, unemployment insurance, and so on.
2 To some extent, the correlation coefficient also adequately quantifies the dependence between elliptically distributed random variables, even if it may yield spurious conclusions – especially in the far tails – as we shall see in the next chapters.
100 3 Notions of Copulas
Pr [X ≤ x; Y ≤ y] = Pr [X ≤ x] · Pr [Y ≤ y] , (3.1)
or equivalently
Pr [X ≤ x | Y ] = Pr [X ≤ x] . (3.2)
At the opposite extreme, one of the random variables is a deterministic function of the other one, Y = f(X), which, as stressed in [270], implies the perfect predictability of one of the random variables from the other one. The mapping f is either strictly increasing or strictly decreasing. In the first case, the random variables are said to be comonotonic.
In a second stage of our investigation of the concept of dependence, let us ask what could be the meaning of the following statement:
The random variables X and Y exhibit the same dependence as the random variables X′ and Y′.
102 3 Notions of Copulas
It turns out that the function C defined by (3.9) is the only object obeying the
property of invariance under strictly increasing mapping and which entirely
captures the full dependence between X and Y .
The following properties follow from simple calculations:
• C(u, 1) = u and C(1, v) = v, ∀u, v ∈ [0, 1],
• C(u, 0) = C(0, v) = 0, ∀u, v ∈ [0, 1],
• C is 2-increasing, namely, for all u1 ≤ u2 and v1 ≤ v2:
C(u2, v2) − C(u2, v1) − C(u1, v2) + C(u1, v1) ≥ 0 .
As we shall see in the sequel, these three properties define the mathemati-
cal object called copula, which has been introduced by A. Sklar in the late
1950s [443] in order to describe the general dependence properties of random
variables.
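The invariance of the dependence structure under strictly increasing transformations of the margins – the key property behind the object defined in (3.9) – can be seen at the level of ranks: a monotone change of variable leaves the ranks, and hence the empirical copula, untouched. A small sketch with an illustrative sample:

```python
import math, random

random.seed(5)
n = 1000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.6 * xi + 0.8 * random.gauss(0.0, 1.0) for xi in x]

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

# strictly increasing transformations of each margin
x2 = [math.exp(xi) for xi in x]
y2 = [yi ** 3 for yi in y]

print(ranks(x) == ranks(x2), ranks(y) == ranks(y2))
```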
6 When this assumption fails, Sklar's theorem still holds, but in a weaker sense: a representation like (3.17) still exists but is not unique anymore.
7 The quantile function, or generalized inverse, F_i^{−1} of the distribution F_i can be defined by:
F_i^{−1}(u) = inf{x | F_i(x) ≥ u} , ∀u ∈ (0, 1) .
When the distribution function F_i is strictly increasing, F_i^{−1} denotes the usual inverse of F_i. In fact, any quantile function can be chosen. But, for noncontinuous margins, the copula (3.18) depends upon the precise quantile function which is selected.
3.2 Definition and Main Properties of Copulas 105
While the survival copula is indeed a true copula, the dual copula is not. However, it can be simply related to the probability that (at least) one of the X_i's is less than or equal to x_i. Indeed, one can easily check that:
Pr[ ∪_{i=1}^n {X_i ≤ x_i} ] = C^∗(F1(x1), . . . , Fn(xn)) . (3.21)
and
C(u, v) ≥ u + v − 1 . (3.26)
It is clear that these two bounds fulfill all the requirements of copulas, quali-
fying the functions max (u + v − 1, 0) and min(u, v) as genuine bivariate cop-
ulas. These two bounds are thus the tightest possible bounds. Generalization
to higher dimension is straightforward, so that we can state
Proposition 3.2.3 (Fréchet-Hoeffding Upper and Lower Bounds).
Given an n-copula C, for all u1, . . . , un ∈ [0, 1]:
max(u1 + · · · + un − n + 1, 0) ≤ C(u1, . . . , un) ≤ min(u1, . . . , un) .
Fig. 3.1. The Fréchet-Hoeffding lower (left panel ) and upper (right panel ) bounds
for bivariate copulas
These lower and upper bounds, which constitute the so-called Fréchet-Hoeffding bounds, are represented in Fig. 3.1 for the bivariate case. The upper bound is itself an n-copula, while the lower one is a copula only for n = 2. However, this lower bound remains the best possible insofar as, for any fixed point (u1, . . . , un) ∈ [0, 1]^n, there exists a copula C̃ such that, at this particular point:
C̃(u1, . . . , un) = max(u1 + · · · + un − n + 1, 0) .
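The Fréchet-Hoeffding bounds can be verified numerically for any given copula; the sketch below checks them on a grid for Frank's copula (3.45), with the illustrative value θ = 1:

```python
import math

theta = 1.0
def frank(u, v):
    # bivariate Frank copula (3.45)
    return -(1.0 / theta) * math.log(
        1.0 + (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
        / (math.exp(-theta) - 1.0))

grid = [i / 20.0 for i in range(21)]
ok = all(
    max(u + v - 1.0, 0.0) - 1e-12 <= frank(u, v) <= min(u, v) + 1e-12
    for u in grid for v in grid)
print(ok)
```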
Multiplicative factor models, which account for most of the stylized facts
observed on financial time series, generate distributions with elliptical copu-
las. Multiplicative factor models contain in particular multivariate stochastic
volatility models with a common stochastic volatility factor. They can be
formulated as
X=σ·Y , (3.31)
The Gaussian copula is the copula derived from the multivariate Gaussian
distribution. The Gaussian copula provides a natural setting for generaliz-
ing Gaussian multivariate distributions into so-called meta-Gaussian distrib-
utions. Meta-Gaussian distributions have been introduced in [283] (see [163]
for a generalization to meta-elliptical distributions) and have been applied in
many areas, from the analysis of experiments in high-energy particle physics
[265] to finance [453]. These meta-Gaussian distributions have exactly the
same dependence structure as the Gaussian distributions while differing in
their marginal distributions which can be arbitrary.
Let Φ denote the standard Normal (cumulative) distribution and Φ_{ρ,n} the n-dimensional standard Gaussian distribution with correlation matrix ρ. Then, the Gaussian n-copula with correlation matrix ρ is
C_{ρ,n}(u1, . . . , un) = Φ_{ρ,n}( Φ^{−1}(u1), . . . , Φ^{−1}(un) ) , (3.32)
8 See footnote 3 page 39.
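A meta-Gaussian distribution is easy to sample: draw correlated Gaussians, map them through Φ, and apply arbitrary quantile transforms. The sketch below (with illustrative exponential and Pareto margins) also checks that the rank (Spearman) correlation is unchanged by the change of margins:

```python
import math, random

random.seed(11)
rho, n = 0.8, 5000
def phi(z):                       # standard normal cdf via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

pairs = []
for _ in range(n):
    z1 = random.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0)
    u1, u2 = phi(z1), phi(z2)
    # arbitrary margins: exponential and Pareto quantile transforms
    x = -math.log(1.0 - u1)
    y = (1.0 - u2) ** (-1.0 / 3.0)
    pairs.append((z1, z2, x, y))

def spearman(a, b):
    m = len(a)
    ra = {v: i for i, v in enumerate(sorted(a))}
    rb = {v: i for i, v in enumerate(sorted(b))}
    d = [(ra[ai] - rb[bi]) ** 2 for ai, bi in zip(a, b)]
    return 1.0 - 6.0 * sum(d) / (m * (m * m - 1))

rs_gauss = spearman([p[0] for p in pairs], [p[1] for p in pairs])
rs_meta = spearman([p[2] for p in pairs], [p[3] for p in pairs])
print(rs_gauss, rs_meta)
```

Since the quantile transforms are strictly increasing, the two rank correlations coincide exactly: the dependence structure is that of the Gaussian, whatever the margins.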
3.3 A Few Copula Families 109
Fig. 3.2. Contour plot of the density (3.34) of the bivariate Gaussian copula with
a correlation coefficient ρ = 0.8 (left panel ) and ρ = −0.8 (right panel )
Student’s Copula
Fig. 3.3. Contour plot of the density (3.37) of a bivariate Student t copula with
a shape parameter ρ = 0.8 and ν = 2 degrees of freedom (left panel ) or ν = 10
degrees of freedom (right panel ). For small ν’s, the difference between the Student
copula and the Gaussian copula is striking on both diagonals. As ν increases, this
difference decreases on the second diagonal but remains large (for ν = 10) on the
main diagonal, as can be observed by comparing the above right with the left panel
of Fig. 3.2
c_{n,ρ,ν}(u1, . . . , un) = (1/√(det ρ)) · [ Γ((ν+n)/2) Γ(ν/2)^{n−1} / Γ((ν+1)/2)^n ] · [ Π_{k=1}^n (1 + y_k^2/ν)^{(ν+1)/2} ] / (1 + y^t ρ^{−1} y / ν)^{(ν+n)/2} , (3.37)
where y^t = (T_ν^{−1}(u1), . . . , T_ν^{−1}(un)). See also Fig. 3.3.
Since Student’s distribution tends to the normal distribution when ν goes
to infinity, Student’s copula tends to the Gaussian copula as ν → +∞ [350]:
(−1)^k d^k ϕ^{[−1]}(t)/dt^k ≥ 0 , ∀k = 0, 1, . . . , n . (3.42)
When this latter relation holds for all n ∈ N, ϕ^{[−1]} is said to be completely monotonic. In such a case, the bivariate Archimedean copula can be generalized to any dimension.
10 Lindskog et al. [307] have recently introduced a robust estimation technique for the calibration of the shape matrix of any elliptical copula, which is described in Chap. 5.
Fig. 3.4. Contour plot of Clayton’s copula (left panel ) and contour plot of its
density (right panel ) for parameter value θ = 1
• Clayton's copula:
C_θ^C(u, v) = (u^{−θ} + v^{−θ} − 1)^{−1/θ} , θ > 0 , (3.43)
with generator ϕ(t) = (t^{−θ} − 1)/θ ,
• Gumbel’s copula, which plays a special role in the description of depen-
dence using extreme value theory (see next Sect. 3.3.3):
1/θ
CθG (u, v) = exp − (− ln u)θ + (− ln v)θ , θ ∈ [1, ∞) (3.44)
θ
with generator ϕ(t) = (− ln t) ,
• Frank’s copula:
1 (e−θu − 1)(e−θv − 1)
CθF (u, v) = − ln 1 + , θ∈R (3.45)
θ e−θ − 1
e−θt − 1
with generator ϕ(t) = − ln .
e−θ − 1
Note that the bivariate Fréchet-Hoeffding lower bound is an Archimedean
copula, while the upper bound copula is not. For an overview of the members
of the Archimedean family, we refer to Table 4.1 in [370].
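The three families above can be implemented directly from their generators, via C(u, v) = ϕ^{[−1]}(ϕ(u) + ϕ(v)), and checked against the defining properties of a copula (boundary conditions and 2-increasingness) on a grid; θ = 2 is an illustrative value:

```python
import math

theta = 2.0
generators = {
    "clayton": (lambda t: (t ** -theta - 1.0) / theta,
                lambda x: (1.0 + theta * x) ** (-1.0 / theta)),
    "gumbel":  (lambda t: (-math.log(t)) ** theta,
                lambda x: math.exp(-x ** (1.0 / theta))),
    "frank":   (lambda t: -math.log((math.exp(-theta * t) - 1.0)
                                    / (math.exp(-theta) - 1.0)),
                lambda x: -math.log(1.0 + math.exp(-x)
                                    * (math.exp(-theta) - 1.0)) / theta),
}

def C(name, u, v):
    phi, inv = generators[name]
    return inv(phi(u) + phi(v))

grid = [i / 10.0 for i in range(1, 10)]
# boundary condition C(u, 1) = u (1 - 1e-12 avoids evaluating phi at 0 or 1 exactly)
max_boundary_err = max(abs(C(name, u, 1.0 - 1e-12) - u)
                       for name in generators for u in grid)
# 2-increasing: every rectangle must carry nonnegative probability mass
min_volume = min(
    C(name, u + 0.1, v + 0.1) - C(name, u + 0.1, v)
    - C(name, u, v + 0.1) + C(name, u, v)
    for name in generators for u in grid[:-1] for v in grid[:-1])
print(max_boundary_err, min_volume)
```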
Fig. 3.5. Contour plot of Frank’s copula (left panel ) and contour plot of its density
(right panel ) for parameter value θ = 1
where the bi (t)’s are the base-line hazard rates and β is the vector of regression
parameters (the same for all individuals).
Defining the frailty variable V = eβ·Z and integrating the conditional haz-
ard rates, one obtains the expression of the conditional survival distributions:
$$S_i(t \,|\, V = v) = e^{-v \cdot f_i(t)} , \quad \text{where} \quad f_i(t) = \int_0^t b_i(s) \, ds . \tag{3.50}$$
Then, assuming that V has the distribution function F with Laplace transform
φ (cf. (3.46)), the joint survival distribution of the Ti ’s is given by
$$\begin{aligned}
\Pr[T_1 > t_1, \ldots, T_n > t_n] &= \mathbb{E}_V\left[ S_1(t_1 | V) \cdots S_n(t_n | V) \right] \\
&= \mathbb{E}_V\left[ e^{-V \cdot (f_1(t_1) + \cdots + f_n(t_n))} \right] \\
&= \int_0^\infty e^{-v \cdot (f_1(t_1) + \cdots + f_n(t_n))} \, dF(v) \\
&= \varphi^{-1}\left( f_1(t_1) + \cdots + f_n(t_n) \right) .
\end{aligned} \tag{3.51}$$
Since the unconditional marginal survival function of a given Ti reads
Sklar’s theorem shows that the (survival) copula of all the Ti ’s is:
alone and the two last ones together. Therefore, if the dependence of the three
random variables is described by an Archimedean copula, this implies a strong
symmetry between the different variables in that they are exchangeable. As a
consequence, when there is no reason to expect a breaking of symmetry be-
tween the random variables, an Archimedean copula may be a good choice to
model their dependence. Such an assumption is often used in modeling large
credit baskets. A contrario, when the random variables play very different
roles, namely when they are not exchangeable, Archimedean copulas do not
provide valid models of their dependence.
Another interesting property of Archimedean copulas is that their values
C(u, u) on the first bisectrix verify the following inequality:
Reciprocally, one can demonstrate [370, Theorem 4.1.6] that any copula possessing these two properties (associativity and C(u, u) < u) is Archimedean.
This provides an intuitive understanding of the nature of Archimedean copu-
las. It also allows one to understand why the Fréchet–Hoeffding upper bound
copula is not Archimedean. Indeed, although it enjoys the associativity prop-
erty, the Fréchet-Hoeffding upper bound is such that C(u, u) = u for all
u ∈ [0, 1] (note that it is the only copula with this property).
Archimedean copulas obey an important limit theorem [260] of the type of the Gnedenko–Pickands–Balkema–de Haan (GPBH) theorem (see Chap. 2).
Consider two random variables, X and Y , distributed uniformly on [0, 1], and
whose dependence structure can be described by an Archimedean copula C.
Then, the copula associated with the distribution of left-ordered quantiles
tends, in most cases, to Clayton’s copula (3.43) in the limit where the prob-
ability level of the quantiles goes to zero. To be more specific, let us denote
by ϕ the generator of the copula C, assumed differentiable. Let us define the
conditional distribution
$$F_u(x) = \Pr[X \le x \,|\, X \le u, Y \le u] = \frac{C(x \wedge u, u)}{C(u, u)} , \qquad \forall x \in [0, 1] , \tag{3.57}$$
where x ∧ u means the minimum of x and u, and the conditional copula
$$C_u(x, y) = \Pr\left[ X \le F_u^{-1}(x), \, Y \le F_u^{-1}(y) \,\middle|\, X \le u, Y \le u \right] = \frac{C\left( F_u^{-1}(x), F_u^{-1}(y) \right)}{C(u, u)} . \tag{3.58}$$
One can first show that, provided that ϕ is a strict generator (that is, ϕ(0)
is infinite such that ϕ^[−1] = ϕ^−1), C_u is a strict Archimedean copula with
generator:
$$\varphi_u(t) = \varphi\left( F_u^{-1}(t) \right) - \varphi(u) \tag{3.59}$$
$$\phantom{\varphi_u(t)} = \varphi\left( t \cdot \varphi^{-1}(2\varphi(u)) \right) - 2\varphi(u) , \tag{3.60}$$
from which it follows that the limiting behavior of C_u, as u goes to zero, is:
For suitably chosen norming sequences $(a_{k,T}, b_{k,T})$, the limit distribution
$$\lim_{T \to \infty} \Pr\left[ \frac{M_{1,T} - b_{1,T}}{a_{1,T}} \le z_1 , \ldots , \frac{M_{n,T} - b_{n,T}}{a_{n,T}} \le z_n \right] ,$$
if it exists, is given by
11 See footnote 3, page 39.
Fig. 3.6. Contour plot of Gumbel's copula (left panel) and of its density (right panel) for the parameter value θ = 2
Definition 3.3.2 (Extreme Value Copula). Any copula which admits the representation:
$$C(u_1, \ldots, u_n) = \exp\left[ -V\left( -\frac{1}{\ln u_1}, \ldots, -\frac{1}{\ln u_n} \right) \right] , \tag{3.67}$$
with
$$V(x_1, \ldots, x_n) = \int_{\Pi_n} \max_i \frac{w_i}{x_i} \, dH(\boldsymbol{w}) , \tag{3.68}$$
where H is any positive finite measure such that $\int_{\Pi_n} w_i \, dH(\boldsymbol{w}) = 1$ and $\Pi_n$ is the (n−1)-dimensional unit simplex:
$$\Pi_n = \left\{ \boldsymbol{w} \in \mathbb{R}^n_+ : \sum_{i=1}^n w_i = 1 \right\} . \tag{3.69}$$
It is interesting to notice that this copula is the only associative extreme value
copula which is not Archimedean. Indeed, due to the relation (3.70), either
C(u, . . . , u) = u for all u ∈ [0, 1] or C(u, . . . , u) < u for all u ∈ (0, 1).12
Since the Fréchet–Hoeffding upper bound copula is the only copula such that
C(u, . . . , u) = u for all u ∈ [0, 1] [370], we can conclude that any other extreme
value copula which enjoys the associativity property is an Archimedean copula.
where
12 Assuming that there exists a number u∗ ∈ (0, 1) such that C(u∗, . . . , u∗) = u∗, and raising this equation to the power α, it follows that C(u∗^α, . . . , u∗^α) = u∗^α for any positive α, by (3.70). Note that u∗^α spans the entire interval (0, 1) when α ranges from zero to infinity. Thus, for all u ∈ (0, 1), C(u, . . . , u) = u, and since this equality still holds when u = (0, . . . , 0) and u = (1, . . . , 1), we have:
so that either C(u, . . . , u) = u for all u ∈ [0, 1] or C(u, . . . , u) < u for all u ∈ (0, 1).
13 Earlier results concerning the case where ψ is a sum of n terms or where ψ is an increasing continuous function can be found for instance in [327, 179, 486, 126].
3.4 Universal Bounds for Functionals of Dependent Random Variables 119
$$F_{\inf}(y) = \sup_{x_1, \ldots, x_{n-1} \in \mathbb{R}} C_{\inf}\left( F_1(x_1), \ldots, F_{n-1}(x_{n-1}), F_n(\xi(x_1, \ldots, x_{n-1}, y)) \right) , \tag{3.73}$$
$$F_{\sup}(y) = \inf_{x_1, \ldots, x_{n-1} \in \mathbb{R}} C^{*}_{\sup}\left( F_1(x_1), \ldots, F_{n-1}(x_{n-1}), F_n(\xi(x_1, \ldots, x_{n-1}, y)) \right) , \tag{3.74}$$
with
$$\xi(x_1, \ldots, x_{n-1}, y) = \sup\left\{ t \in \mathbb{R} \, ; \ \psi(x_1, \ldots, x_{n-1}, t) \le y \right\} . \tag{3.75}$$
A heuristic proof of this result can be found in Appendix 3.A.
In this theorem, Cinf and Csup can be copulas, but this is not necessary. In
particular, since any copula is larger than the Fréchet-Hoeffding lower bound,
in the absence of any information on the dependence between the random
variables, one can always resort to
$$C_{\sup}(u_1, \ldots, u_n) = C_{\inf}(u_1, \ldots, u_n) = \max(u_1 + \cdots + u_n - n + 1, \, 0) . \tag{3.76}$$
This allows one to derive a universal bound for the probability that
ψ(X1, . . . , Xn) is less than y. Obviously, when additional information on
the dependence is available, the bound can be improved. For instance, when
the random variables are known to be positive orthant dependent – we will
come back to this notion in Chap. 4 – we can choose the independence (or
product) copula14 for C_inf and C_sup.
The bound provided by Theorem 3.4.1 is point-wise the best possible.
Indeed, as shown in [145, Theorem 3.2], there always exists a copula C̃ for
X1 , . . . , Xn such that the distribution of ψ(X1 , . . . , Xn ) reaches the bound, at
least at one point. Therefore, on the entire set of distribution functions, it is
not possible to improve on this bound.
To conclude this section, let us state a straightforward bound implied
by Theorem 3.4.1 for expectations. Denoting by Xinf and Xsup two random
variables with distribution functions Finf and Fsup respectively, and a non-
decreasing function G, we obviously have:
$$\mathbb{E}\left[ G(X_{\sup}) \right] \le \mathbb{E}\left[ G(\psi(X_1, \ldots, X_n)) \right] \le \mathbb{E}\left[ G(X_{\inf}) \right] . \tag{3.78}$$
Similar bounds exist – mutatis mutandis – for any non-increasing function.
14 Recall that the independence (or product) copula is $C(u, v) = u \cdot v$, $\forall u, v \in [0, 1]$, so that
$$\Pr[X \le x, Y \le y] = F(x, y) = C(F_X(x), F_Y(y)) = F_X(x) \cdot F_Y(y) = \Pr[X \le x] \cdot \Pr[Y \le y] .$$
Fig. 3.7. Five thousand realizations of two random variables whose distribution function is given by the Gaussian copula with correlation coefficient ρ = 0.4 (left panel) and ρ = 0.8 (right panel)
Algorithm 1
1. Generate n independent standard Gaussian random variables u = (u1, . . . , un), using the Box–Muller algorithm [77], for instance,
2. find the Cholesky decomposition of ρ: ρ = A · Aᵗ, where A is a lower-triangular matrix,
3. set y = A · u,
4. and finally evaluate xi = Φ(yi), i = 1, . . . , n, where Φ denotes the univariate standard Gaussian distribution function.
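A minimal numerical transcription of Algorithm 1 might read as follows (the function and variable names are ours; any standard Gaussian sampler can replace the Box–Muller step):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_sample(rho, size, rng):
    """Draw `size` samples from the n-dimensional Gaussian copula with
    correlation matrix `rho`, following the four steps of Algorithm 1."""
    n = rho.shape[0]
    u = rng.standard_normal((size, n))   # step 1: independent N(0, 1) draws
    A = np.linalg.cholesky(rho)          # step 2: rho = A A^t, A lower-triangular
    y = u @ A.T                          # step 3: y = A u (row-wise)
    return norm.cdf(y)                   # step 4: x_i = Phi(y_i)

rng = np.random.default_rng(0)
rho = np.array([[1.0, 0.8], [0.8, 1.0]])
x = gaussian_copula_sample(rho, 5000, rng)   # margins are uniform on [0, 1]
```

The resulting point cloud reproduces the scatter plots of Fig. 3.7.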
To generate an n-dimensional random vector drawn from a more compli-
cated elliptical copula, it is useful to recall that any centered and elliptically
distributed random vector X admits the following stochastic representation
[252]:
X=R·N, (3.80)
Fig. 3.8. Five thousand realizations of two random variables whose distribution function is given by Student's copula with shape coefficient ρ = 0.4 (left panel) and ρ = 0.8 (right panel) and ν = 3 degrees of freedom
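The same Cholesky machinery extends to Student's copula. The sketch below is our own construction: it uses the standard normal-variance-mixture form of the multivariate Student vector (a Gaussian vector rescaled by an independent radial factor $\sqrt{\nu/\chi^2_\nu}$), in the spirit of the stochastic representation (3.80):

```python
import numpy as np
from scipy.stats import t as student_t

def student_copula_sample(rho, nu, size, rng):
    """Sample Student's copula with shape matrix `rho` and `nu` degrees
    of freedom, via the normal-variance-mixture representation."""
    A = np.linalg.cholesky(rho)
    g = rng.standard_normal((size, rho.shape[0])) @ A.T  # correlated Gaussians
    w = rng.chisquare(nu, size=(size, 1))                # one radial factor per draw
    t_vec = g * np.sqrt(nu / w)                          # multivariate Student vector
    return student_t.cdf(t_vec, df=nu)                   # probability transform

rng = np.random.default_rng(1)
rho = np.array([[1.0, 0.4], [0.4, 1.0]])
x = student_copula_sample(rho, 3, 5000, rng)
```

Note that the radial factor is shared by all components of a given draw; sampling it independently per component would destroy the Student copula.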
The second general method is based upon the simple fact that:
$$\Pr[U_1 \le u_1, \ldots, U_n \le u_n] = \Pr[U_n \le u_n \,|\, U_1 = u_1, \ldots, U_{n-1} = u_{n-1}] \times \Pr[U_1 \le u_1, \ldots, U_{n-1} \le u_{n-1}] ,$$
which gives
$$\begin{aligned}
\Pr[U_1 \le u_1, \ldots, U_n \le u_n] = {} & \Pr[U_n \le u_n \,|\, U_1 = u_1, \ldots, U_{n-1} = u_{n-1}] \\
& \times \Pr[U_{n-1} \le u_{n-1} \,|\, U_1 = u_1, \ldots, U_{n-2} = u_{n-2}] \\
& \quad \vdots \\
& \times \Pr[U_2 \le u_2 \,|\, U_1 = u_1] \cdot \Pr[U_1 \le u_1] \end{aligned} \tag{3.81}$$
by a straightforward recursion.
Therefore, applying this reasoning to the n-copula C, and denoting by Ck
the copula of the k first variables, this yields:
$$C(u_1, \ldots, u_n) = C_n(u_n | u_1, \ldots, u_{n-1}) \cdots C_2(u_2 | u_1) \cdot \underbrace{C_1(u_1)}_{= u_1} , \tag{3.82}$$
3.5 Simulation of Dependent Data with a Prescribed Copula 123
where we define:
$$C_k(u_k | u_1, \ldots, u_{k-1}) = \frac{\partial_{u_1} \cdots \partial_{u_{k-1}} C_k(u_1, \ldots, u_k)}{\partial_{u_1} \cdots \partial_{u_{k-1}} C_{k-1}(u_1, \ldots, u_{k-1})} . \tag{3.83}$$
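As an illustration of this conditional approach in the bivariate case, Clayton's copula admits a closed-form inverse of $C_2(u_2|u_1) = \partial C / \partial u_1$. The sketch below (our own derivation and names, not code from the text) draws $u_1$ uniformly and maps a second uniform draw through $C_2^{-1}(\cdot \,|\, u_1)$:

```python
import numpy as np

def clayton_conditional_sample(theta, size, rng):
    """Bivariate sampling by inversion of the conditional copula
    C_2(u2 | u1) = dC/du1, worked out for Clayton's copula, where
    the inverse is available in closed form:
    u2 = [1 + u1^{-theta} (p^{-theta/(1+theta)} - 1)]^{-1/theta}."""
    u1 = rng.uniform(size=size)
    p = rng.uniform(size=size)   # uniform draw mapped through C_2^{-1}
    u2 = (1.0 + u1**(-theta) * (p**(-theta / (1.0 + theta)) - 1.0))**(-1.0 / theta)
    return u1, u2

u1, u2 = clayton_conditional_sample(2.0, 4000, np.random.default_rng(7))
```

For copulas without a closed-form conditional inverse, the same scheme applies with a numerical root search at each step.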
The same scheme can also be used to simulate Clayton’s copula. However,
Devroye [129] has proposed a somewhat simpler method for Clayton’s copula
with positive parameter θ:
1. generate two standard exponential random variables v1, v2,
2. generate a random variable x following the distribution Γ(θ⁻¹, 1),
3. set u1 = (1 + v1/x)^(−1/θ) and u2 = (1 + v2/x)^(−1/θ).
This approach is in fact related to Marshall and Olkin’s work [348]. Indeed,
it is straightforward to check that, with the specification above, one has:
$$\Pr[U_i \le u_i \,|\, X = x] = e^{-x \cdot (u_i^{-\theta} - 1)} , \tag{3.84}$$
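The three steps of Devroye's gamma-frailty recipe can be transcribed as follows (a sketch; the function name is ours):

```python
import numpy as np

def clayton_frailty_sample(theta, size, rng):
    """Devroye's method for Clayton's copula with positive parameter theta:
    two standard exponentials divided by a common Gamma(1/theta, 1) frailty."""
    v1 = rng.exponential(size=size)          # step 1: standard exponentials
    v2 = rng.exponential(size=size)
    x = rng.gamma(1.0 / theta, 1.0, size=size)  # step 2: Gamma(1/theta, 1)
    u1 = (1.0 + v1 / x)**(-1.0 / theta)      # step 3: map to uniform margins
    u2 = (1.0 + v2 / x)**(-1.0 / theta)
    return u1, u2

u1, u2 = clayton_frailty_sample(1.0, 5000, np.random.default_rng(11))
```

Averaging (3.84) over the Gamma frailty X indeed recovers uniform margins and Clayton's dependence structure.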
Fig. 3.9. Five thousand realizations of two random variables whose distribution function is given by Clayton's copula with parameter θ = 1 (left panel) and by Frank's copula with parameter θ = 5 (right panel)
law with tail index 1/θ, since the inverse of the generator of such a copula
is $\phi(t) = e^{-t^{1/\theta}}$, t ≥ 0. For an overview and software to generate random
Lévy variables, see the Web pages of Professors J. Huston McCulloch (http://
economics.sbs.ohio-state.edu/jhm/jhm.html) and John P. Nolan (http:
//academic2.american.edu/∼jpnolan/stable/stable.html).
To conclude on the simulation of dependent random variables, the second
approach is sometimes more appropriate for n-copulas with n > 2, because
the algorithm based upon the inversion of the conditional copulas can rapidly
become intractable for large n.
One of the most important activities in the financial as well as in the actuarial
worlds consists in assessing the risk of uncertain aggregated positions. This
risk is often measured by the Value-at-Risk VaRα at probability level α. VaRα
is the lower α-quantile of the net risk position Y , as illustrated in Fig. 3.10:
3.6 Application of Copulas 125
Fig. 3.10. Value-at-Risk at probability level α for the loss Y with distribution function F_Y. We show the case where the distribution has a gap to exemplify how the Value-at-Risk is defined in such a degenerate case
with
$$F_{\min}(y) = \sup_{\boldsymbol{x} \in \mathcal{A}(y)} \max\left\{ \sum_{i=1}^n \tilde{F}_i^-(x_i) - (n-1), \, 0 \right\} \tag{3.88}$$
and
$$F_{\max}(y) = \inf_{\boldsymbol{x} \in \mathcal{A}(y)} \min\left\{ \sum_{i=1}^n \tilde{F}_i(x_i), \, 1 \right\} , \tag{3.89}$$
$$\mathrm{VaR}^{\min}_\alpha \le \mathrm{VaR}_\alpha(Y) \le \mathrm{VaR}^{\max}_\alpha , \tag{3.91}$$
with:
$$\mathrm{VaR}^{\min}_\alpha = \inf\left\{ t \in \mathbb{R} \, ; \ F_{\max}(t) \ge \alpha \right\} , \tag{3.92}$$
and
$$\mathrm{VaR}^{\max}_\alpha = \inf\left\{ t \in \mathbb{R} \, ; \ F_{\min}(t) \ge \alpha \right\} . \tag{3.93}$$
These two relations have a clear economic meaning: they represent respectively
the most optimistic and pessimistic outcomes one can expect in the absence
of any information on the actual dependence structure between the different
sources of risk.
A closed-form expression for Fmin and Fmax is almost impossible to obtain
in the general case where the marginal distributions of each of the assets
are different. However, when all the risks can be described by distributions
belonging to the same class, some general results have been obtained [126]. As
an example, let us consider the case of a portfolio made of n risks (with the
set of weights {wi , i = 1, . . . , n}) following shifted-Pareto distributions with
the same tail index β > 0:
$$\Pr[X_i \le x] = 1 - \left( \frac{\lambda_i}{\lambda_i + (x - \theta_i)} \right)^{\beta} , \qquad x \ge \theta_i , \tag{3.94}$$
and
$$\mathrm{VaR}^{\max}_\alpha = \sum_{i=1}^n w_i \cdot (\theta_i - \lambda_i) + \frac{\tilde{\lambda}}{(1-\alpha)^{1/\beta}} , \tag{3.96}$$
where
$$\tilde{\lambda} = \left[ \sum_{i=1}^n (w_i \cdot \lambda_i)^{\frac{\beta}{1+\beta}} \right]^{\frac{1+\beta}{\beta}} . \tag{3.97}$$
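A short numerical sketch of the bound (3.96)–(3.97) follows; the code and all parameter values are ours, chosen only for illustration:

```python
import numpy as np

def var_max_shifted_pareto(w, lam, theta, alpha, beta):
    """Upper VaR bound (3.96)-(3.97) for a portfolio of shifted-Pareto
    risks (3.94) sharing the same tail index beta."""
    w, lam, theta = map(np.asarray, (w, lam, theta))
    lam_tilde = (np.sum((w * lam)**(beta / (1.0 + beta))))**((1.0 + beta) / beta)
    return np.sum(w * (theta - lam)) + lam_tilde / (1.0 - alpha)**(1.0 / beta)

# For a single asset the bound reduces to the alpha-quantile of (3.94)
v = var_max_shifted_pareto([1.0], [2.0], [0.5], 0.99, 3.0)
```

For n = 1 the formula collapses to $\theta - \lambda + \lambda (1-\alpha)^{-1/\beta}$, which is exactly the α-quantile of the shifted-Pareto distribution (3.94), a convenient sanity check.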
and equivalently
$$F^{(n)}_{\min}(y) = \sup_{\boldsymbol{x} \in \mathcal{A}(y)} \max\left\{ \sum_{i=1}^n \tilde{F}_i^-(x_i) - (n-1), \, 0 \right\} , \tag{3.100}$$
$$\phantom{F^{(n)}_{\min}(y)} = \sup_{x \in \mathbb{R}} \max\left\{ F^{(n-1)}_{\min}(x) + \tilde{F}_n(y - x) - 1, \, 0 \right\} . \tag{3.101}$$
Unfortunately, this approach is efficient only as long as Fmax and Fmin remain
of the same class as the distributions Fi ’s, as occurs in the shifted-Pareto ex-
ample (3.94). In general, the Fi ’s are different and one has to rely on numerical
procedures to derive the bounds of real portfolio risks.
An efficient numerical algorithm has been proposed by Williamson and
Downs [486]. Starting with T − 1 observations of the risks X1 , . . . , Xn , one
first evaluates the upper and lower bounds for the VaRα of a portfolio made
of X1 and X2 . Let qi (k/T ) denote the empirical quantiles of order k/T of Xi .
Let us set −∞ < qi(0) < qi(1/T) and qi(1 − 1/T) < qi(1) < ∞. It can be shown that convergent estimators of VaR^min_α and VaR^max_α are given by:
$$\widehat{\mathrm{VaR}}^{\min}_{k/T} = \max_{0 \le j \le k} \left\{ q_1(j/T) + q_2((k-j)/T) \right\} , \tag{3.102}$$
$$\widehat{\mathrm{VaR}}^{\max}_{k/T} = \min_{k \le j \le T} \left\{ q_1(j/T) + q_2(1 - (j-k)/T) \right\} . \tag{3.103}$$
In practice, the convergence of $\widehat{\mathrm{VaR}}_{k/T}$ is very fast. Using the same kind
of arguments as in (3.98–3.100), this method can be applied
iteratively, making it possible to calculate the bounds for (reasonably)
large portfolios. An illustration of this method for three portfolios made of
large capitalization US stocks is depicted in Fig. 3.11. For a portfolio of ten
stocks and T = 1500, only a few seconds are required to obtain the Value-at-
Risk bounds.
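A sketch of the two-asset estimators (3.102)–(3.103) is given below (our own transcription; the index ranges skip the boundary quantiles q_i(0) = −∞ and q_i(1) = +∞, which can never attain the max or min):

```python
import numpy as np

def var_bounds_two_assets(x1, x2, k):
    """Empirical Williamson-Downs estimators (3.102)-(3.103) of the
    dependence-free bounds on the order-k/T quantile of X1 + X2,
    from T - 1 observations of each risk (2 <= k <= T - 2)."""
    T = len(x1) + 1                       # T - 1 observations
    q1, q2 = np.sort(x1), np.sort(x2)     # q_i(j/T) = q_i[j - 1], j = 1..T-1
    lo = max(q1[j - 1] + q2[k - j - 1] for j in range(1, k))            # (3.102)
    hi = min(q1[j - 1] + q2[T - (j - k) - 1] for j in range(k + 1, T))  # (3.103)
    return lo, hi

x1 = x2 = np.arange(1.0, 10.0)            # T - 1 = 9 observations each
bounds = var_bounds_two_assets(x1, x2, 5)
```

Iterating the routine, with the output quantiles of one pass feeding the next, reproduces the multi-asset scheme used for Fig. 3.11.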
Fig. 3.11. Upper and lower bounds for the VaR of a portfolio over the period from
25 January, 1995 to 29 December, 2000 made of two assets (Applied Materials Inc.
and Coca Cola Co: plain lines), five assets (the two above plus E.M.C Corp MA,
General Electric Co, General Motors Corp: dotted lines) and ten assets (the five
above plus Hewlett Packard Co, I.B.M Corp, Intel Corp, Medtronic Inc. and Merck
& Co Inc.: dash-dotted lines). We find practically identical results when exchanging
these assets with others from the largest capitalization stocks. The lower negative
bounds for portfolios of 5 and 10 assets correspond to the favourable situation where
diversification has removed the risks of losses
In several special cases, the tail risk of a portfolio, made of assets exhibiting nontrivial dependence, can be approximately calculated by a linear or
quadratic approximation [472] or by using an asymptotic expansion. Here, we
follow this latter approach and provide an example borrowed from [336].
Consider a portfolio of N assets whose dependence structure is given by
the Gaussian copula. We will discuss the relevance and the limits of this as-
sumption in Chap. 5. In addition, we assume that the returns of each asset
are distributed according to a so-called modified-Weibull distribution charac-
terized by its density
$$p(x) = \frac{1}{2\sqrt{\pi}} \, \frac{c}{\chi^{c/2}} \, |x|^{\frac{c}{2} - 1} \, e^{-\left( \frac{|x|}{\chi} \right)^{c}} , \tag{3.104}$$
or more generally
$$p(x) = \frac{1}{2\sqrt{\pi}} \, \frac{c_+}{\chi_+^{c_+/2}} \, |x|^{\frac{c_+}{2} - 1} \, e^{-\left( \frac{|x|}{\chi_+} \right)^{c_+}} \qquad \text{if } x \ge 0 , \tag{3.105}$$
$$p(x) = \frac{1}{2\sqrt{\pi}} \, \frac{c_-}{\chi_-^{c_-/2}} \, |x|^{\frac{c_-}{2} - 1} \, e^{-\left( \frac{|x|}{\chi_-} \right)^{c_-}} \qquad \text{if } x < 0 , \tag{3.106}$$
follows a standard Gaussian law. This offers a simple visual test of the hypoth-
esis that the returns are distributed according to the modified-Weibull distrib-
ution: starting from the empirical returns, one transforms them by converting
the empirical distribution into a Gaussian one. Then, plotting the transformed
variables as a function of the raw returns should give the power law (3.107)
if the modified-Weibull distribution is a good model. Figure 3.12 shows the
(negative) transformed returns of the Standard & Poor's 500 index as a function of the
raw returns over the time interval from 03 January, 1995 to 29 December,
Fig. 3.12. Graph of the normalized returns Y of the Standard & Poor's 500 index (as explained in the text) versus its raw returns X, from 03 January, 1995 to 29 December, 2000 for the negative tail of the distribution. The double logarithmic scales clearly show a straight line over an extended range of data, qualifying the power law relationship (3.107)
2000. The double logarithmic scales of Fig. 3.12 qualifies a power law with
exponent c/2 = 0.73 over an extended range of data.
For such a portfolio constituted of assets with returns distributed according
to modified-Weibull distributions with the same exponent c > 1, it can be
shown that the distribution of its returns is still given by a modified-Weibull
law, in the asymptotic regime of large losses (counted as negative). Specifically,
the distribution function Fπ of the portfolio losses is asymptotically equivalent
to a modified-Weibull distribution function FZ ,
where λ is a constant, with the same exponent c and with a scale factor χ̂
given by:
$$\hat{\chi} = \left[ \sum_i w_i \chi_i \sigma_i \right]^{\frac{c-1}{c}} , \tag{3.109}$$
and V is the correlation matrix of the Gaussian copula. The proof of this
result can be found in [336].
For two particular cases, the above equations allow us to retrieve simple
closed-form formulas. For independent assets, one has V = Id, so that the
solution of (3.110) is
$$\sigma_i = (w_i \chi_i)^{\frac{1}{c-1}} , \qquad \forall i = 1, \ldots, N , \tag{3.111}$$
and thus
$$\hat{\chi} = \left[ \sum_{i=1}^N (w_i \chi_i)^{\frac{c}{c-1}} \right]^{\frac{c-1}{c}} , \quad \text{and} \quad \lambda = \left[ \frac{c}{2(c-1)} \right]^{\frac{N-1}{2}} , \tag{3.112}$$
(see Appendix 3.B for a direct proof of this result). For comonotonic assets,
Vij = 1 for all i, j = 1, . . . N , which leads to
$$\sigma_i = \left[ \sum_{k=1}^N w_k \chi_k \right]^{\frac{1}{c-1}} , \qquad \forall i = 1, \ldots, N , \tag{3.113}$$
and thus
$$\hat{\chi} = \sum_{i=1}^N w_i \chi_i , \quad \text{and} \quad \lambda = 1 . \tag{3.114}$$
This result is obvious and can be directly retrieved from the comonotonicity
between the assets. In fact, in such a case the distribution of the portfolio is
a modified-Weibull law, not only asymptotically but exactly over the whole
range.
Denoting by W (0) the initial amount of money invested in the risky port-
folio, the asymptotic Value-at-Risk, at probability level α, can easily be com-
puted with the formula
$$\mathrm{VaR}_\alpha \simeq W(0) \, \frac{\hat{\chi}}{2^{1/c}} \left[ \Phi^{-1}\left( 1 - \frac{\alpha}{\lambda} \right) \right]^{2/c} , \tag{3.115}$$
$$\phantom{\mathrm{VaR}_\alpha} \simeq \xi(\alpha)^{2/c} \, W(0) \cdot \hat{\chi} , \tag{3.116}$$
where the function Φ(·) denotes the cumulative Normal distribution function and
$$\xi(\alpha) \equiv \frac{1}{\sqrt{2}} \, \Phi^{-1}\left( 1 - \frac{\alpha}{\lambda} \right) . \tag{3.117}$$
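Numerically, the asymptotic VaR formula (3.115)–(3.117) is a one-liner; in the sketch below (our code, with purely illustrative parameter values) Φ⁻¹ is the standard Gaussian quantile function:

```python
import numpy as np
from scipy.stats import norm

def var_modified_weibull(W0, chi_hat, c, lam, alpha):
    """Asymptotic Value-at-Risk (3.115)-(3.117) for a modified-Weibull
    portfolio: VaR_alpha ~ xi(alpha)^{2/c} W(0) chi_hat, with
    xi(alpha) = Phi^{-1}(1 - alpha/lam) / sqrt(2).  Requires alpha < lam."""
    xi = norm.ppf(1.0 - alpha / lam) / np.sqrt(2.0)   # (3.117)
    return xi**(2.0 / c) * W0 * chi_hat               # (3.116)

v1 = var_modified_weibull(1.0, 0.02, 2.0, 1.0, 0.01)  # illustrative inputs
v5 = var_modified_weibull(1.0, 0.02, 2.0, 1.0, 0.05)
```

As expected, the bound decreases when the probability level α increases.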
The example provided here for a portfolio made of assets whose dependence
is described by a Gaussian copula can easily be extended to more complex
cases. For instance, the same kind of asymptotic expansion can be performed
for Student's copula. This illustrates the simplification brought by the use
of copulas for some parametric calculations of tail risks.
As suggested in [99, 417], copulas offer a useful framework for pricing mul-
tivariate contingent claims. Indeed, they provide natural pricing kernels that
allow one to determine the price of options defined on a basket of assets by
simply gathering the prices of options written on each individual asset.
Following [99], let us consider a market with two risky assets S1 and S2
and a risk-free asset B. For simplicity – but without loss of generality – the
risk-free interest rate is set to zero. Let us assume the existence of two digital
options O1 on S1 and O2 on S2 respectively, with maturity T . They pay one
monetary unit at time T if the value Si (T ) of the underlying asset at time T
is more than K_i. Their price P_i is:
$$P_i = \mathbb{E}^{\mathbb{Q}}\left[ \mathbb{1}_{\{S_i(T) > K_i\}} \right] = \Pr^{\mathbb{Q}}\left[ S_i(T) > K_i \right] . \tag{3.118}$$
By Sklar’s Theorem (3.2.1), we can write the price of the bivariate digital
option as a function of the price of each individual digital option:
$$P = C^{\mathbb{Q}}(P_1, P_2) , \tag{3.120}$$
where
$$\mathbb{1}_{HH}(O) = \left. \mathbb{1}_{\{S_1(T) > K_1, \, S_2(T) > K_2\}} \right|_{HH} = 1 ,$$
$$\mathbb{1}_{HL}(O) = \left. \mathbb{1}_{\{S_1(T) > K_1, \, S_2(T) > K_2\}} \right|_{HL} = 0 ,$$
$$\mathbb{1}_{LH}(O) = \left. \mathbb{1}_{\{S_1(T) > K_1, \, S_2(T) > K_2\}} \right|_{LH} = 0 ,$$
$$\mathbb{1}_{LL}(O) = \left. \mathbb{1}_{\{S_1(T) > K_1, \, S_2(T) > K_2\}} \right|_{LL} = 0 ,$$
and $\mathbb{1}_{HH}(1) = \mathbb{1}_{HL}(1) = \mathbb{1}_{LH}(1) = \mathbb{1}_{LL}(1) = 1$ for the risk-free asset, and
In short, the matrix Π allows one to obtain the value of the four assets in
each of the four states of the world.
The absence of arbitrage opportunity amounts to the existence of a vec-
tor p̃ with positive components such that the vector p of prices can be written
as follows [103, 214]
p = Π · p̃ . (3.125)
Since, in the present case, the market is complete by construction, the matrix
Π can be inverted and we have
$$\tilde{p} = \Pi^{-1} \cdot p = \begin{pmatrix} P \\ P_1 - P \\ P_2 - P \\ 1 - P_1 - P_2 + P \end{pmatrix} . \tag{3.126}$$
This retrieves (3.121) except for the fact that the Fréchet-Hoeffding bounds
are now excluded. In fact, as recalled earlier, the Fréchet-Hoeffding upper
and lower bounds are associated with the comonotonicity and the counter-
monotonicity. These two situations are obviously excluded from the formu-
lation in terms of the pricing kernel since the market cannot be considered
as complete in those cases. Therefore, the prices associated with the Fréchet-
Hoeffding bounds are nothing but the static super-replication15 prices of the
bivariate digital option. Indeed, selling for instance the bivariate digital option
for the price P = min{P1 , P2 }, the trader can buy the least expensive of the
two digital options, say O1 if P1 ≤ P2 . Then, at maturity, she can pay one
monetary unit to the buyer of the bivariate digital option with certainty, since
the bivariate option generates a cash-flow of one monetary unit if and only if
the world is in the state HH, for which O1 also generates a cash-flow of one
monetary unit.
It is straightforward to extend the previous calculations to the case of
multivariate digital options written on a larger basket of underlying assets.
The restriction to bivariate digital options presented here is only for notational
convenience.
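To make the mechanics concrete, the sketch below (our own illustration) prices the bivariate digital option through (3.120) and recovers the state prices (3.126); the independence copula stands in for the unknown risk-neutral copula C^Q:

```python
import numpy as np

def digital_basket_price(p1, p2, copula):
    """Price P = C^Q(P1, P2) of the bivariate digital option (3.120), and
    the implied state prices (P, P1 - P, P2 - P, 1 - P1 - P2 + P) of (3.126)
    for the four states HH, HL, LH, LL."""
    p = copula(p1, p2)
    state_prices = np.array([p, p1 - p, p2 - p, 1.0 - p1 - p2 + p])
    return p, state_prices

# Independence copula C(u, v) = u v as a simple stand-in for C^Q
price, q = digital_basket_price(0.6, 0.5, lambda u, v: u * v)
```

With a zero risk-free rate the state prices sum to one, and absence of arbitrage requires each of them to be positive.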
More generally, let us consider an option written on a basket of N under-
lying assets S1 , . . . , SN . Let the pay-off of such an option be
where T still denotes the maturity. G is typically the univariate pay-off char-
acterizing the contract. For instance, for a European call with strike K, we
have:
$$G(x) = [x - K]^{+} . \tag{3.129}$$
The function ψ describes how the N underlying assets S_i determine the terminal cash-flow. For instance, one can consider an option on the minimum of the N assets,
$$\psi(S_1(T), \ldots, S_N(T)) = \min\{ S_1(T), \ldots, S_N(T) \} , \tag{3.130}$$
or a basket option,
$$\psi(S_1(T), \ldots, S_N(T)) = \sum_{i=1}^N w_i \cdot S_i(T) . \tag{3.131}$$
where S_inf and S_sup are two random variables with distribution functions
F_inf and F_sup respectively (see Theorem 3.4.1).
15 To super-replicate means to hedge with certainty.
$$P = \mathbb{E}^{\mathbb{Q}}\left[ \min\{ S_1(T), \ldots, S_N(T) \} - K \right]^{+} . \tag{3.134}$$
and
Thus, defining S_inf and S_sup as two random variables such that:
$$\Pr^{\mathbb{Q}}[S_{\inf} \le x] = 1 - \min\{ P_1(x), \ldots, P_N(x) \} , \tag{3.141}$$
$$\Pr^{\mathbb{Q}}[S_{\sup} \le x] = 1 - \max\{ P_1(x) + \cdots + P_N(x) - (N-1), \, 0 \} , \tag{3.142}$$
it follows from (3.133) that
$$\mathbb{E}^{\mathbb{Q}}\left[ S_{\sup} - K \right]^{+} \le P \le \mathbb{E}^{\mathbb{Q}}\left[ S_{\inf} - K \right]^{+} . \tag{3.143}$$
The quantitative values of these two bounds are obtained after calibration
and numerical integration.
To obtain more accurate information on the price of options defined on
a basket of assets, it is necessary to specify the nature of the risk-neutral
copula. The problem comes from the fact that there exists no general rela-
tion between the historical copula C P and the risk-neutral C Q . However, in
some special cases, one can obtain this relation. For instance, in the multivari-
ate Black-Scholes model, both the historical and the risk-neutral copulas are
Gaussian copulas, with the same correlation matrix. This result generalizes to
the case where asset prices follow diffusion processes with deterministic drifts
and volatilities [112].
In the more realistic case where one considers a stochastic volatility model
(under P) like
16 Rainbow options get their name from the fact that their underlying is two or more assets rather than one.
$$\frac{dS_i(t)}{S_i(t)} = \mu_i(t, \sigma_i(t)) \, dt + \sigma_i(t) \, dB_i(t) , \qquad i = 1, \ldots, N , \tag{3.144}$$
$$d\sigma_i(t) = a_i(t, \sigma_i(t)) \, dt + b_i(t, \sigma_i(t)) \, dW_i(t) , \tag{3.145}$$
for instance, where Bi (t) and Wi (t) denote standard Wiener processes and
where ai (·, ·) and bi (·, ·) are chosen such that the σi (t)’s remain positive almost
surely, one cannot express C^P and C^Q explicitly. In addition, since individual
volatilities are non-traded assets, the market is incomplete, and the choice
of a risk-neutral measure Q – which amounts to choosing the market prices of
volatility risks λi – is not unique. One has to set additional constraints in order
to select an appropriate Q. Many methods have been developed for univariate
stochastic volatility models, which can be extended to the multivariate case.
Let us mention the minimal martingale measure [176, 434, 435], the minimal
entropy measure [192, 403] or the variance-optimal measure [54, 177, 226, 292,
384], for instance. All these examples are, in fact, particular cases of q-optimal
measures (for q = 0, 1, and 2, respectively) [125, 234], i.e. measures which are
the closest to the objective (or historical) measure P in the sense of the qth
moment of their relative density. Such measures minimize the functional
$$H_q(\mathbb{P}, \mathbb{Q}) = \begin{cases} \mathbb{E}\left[ \dfrac{q}{q-1} \left( \dfrac{d\mathbb{Q}}{d\mathbb{P}} \right)^{q} \right] , & \text{if } \mathbb{Q} \ll \mathbb{P} , \\[2mm] +\infty , & \text{otherwise} , \end{cases} \tag{3.146}$$
for q ∉ {0, 1}. The symbol "≪" means absolutely continuous, i.e., the sets of
zero measure for P are also sets of zero measure for Q.
Such measures have the additional advantage of allowing an interpretation
in terms of utility maximizing agents. Indeed, asset prices obtained under q-
optimal measures represent the marginal utility indifferent prices for investors
with HARA17 utility functions [230].
Using the risk-neutral probability measure Q which amounts to taking a
vanishing market price of the volatility risk, and if in addition the rates of
return µ_i(t, σ_i(t)) do not depend on σ_i, then it can be shown that C^P =
C^Q (see Appendix 3.C). In such a case, the calibration of the copula on
historical data provides the risk-neutral copula.
Unfortunately, when these conditions are not met, or when one considers
more general diffusion models of the form
$$dS_i(t) = \mu_i(t, S_i(t)) \, dt + \sigma_i(t, S_i(t)) \, dW_i(t) , \qquad i = 1, \ldots, N , \tag{3.148}$$
it is in general impossible to obtain a relation between C^P and C^Q. In this
case, the risk-neutral copula can only be, and has to be, determined directly
17 Hyperbolic absolute risk aversion.
from options prices. In practice, when one deals with contracts which are not
actively traded, or contracts negotiated OTC,18 data may be rare, leading to
serious restrictions for the calibration of the risk-neutral copula and showing
the limit of the approach.
Default risk models are basically of two kinds. The first class contains models
which are close to many actuarial models. They rely on the assumption that,
conditional on a set of economic factors, the individual default probabilities
of each obligor are independent. Such models are known as mixture models
[248]. They include frailty models, presented on page 113, as well as professional
models like CreditRisk+ [114]. It is in general difficult to obtain an analytical
expression of their dependence structure.
The second class of default risk models is based on Merton's seminal work
on firm value [358]. In particular, industry standards like Moody's KMV [273]
and RiskMetrics [406] are extensions of this original model. They consider
that the default of an obligor occurs when a latent variable, which usually
represents the firm's asset value, goes below some level usually representing
the value of the firm's liabilities. In the more recent model by Li [303], the
latent variables account for the time-to-default of an obligor and the crossing
level represents the time horizon of interest. These approaches are equivalent
since, once a dynamics is specified for the assets, one can derive, in principle,
the law of the time-to-default.
These models assume the same dependence structure for the latent vari-
ables, characterized by a Gaussian copula. Hence, the joint probability of
default is closely related to the Gaussian copula. Indeed, let us consider N
obligators and let Di be the default indicator of obligator i. Di equals one if
obligator i has defaulted and zero otherwise. Let (X1 , . . . , XN ) denote the vec-
tor of latent variables and (T1 , . . . , TN ) the vector of thresholds below which
default occurs:
Di = 1 ⇐⇒ Xi ≤ Ti . (3.149)
In the KMV methodology, the variables {Xi } model the return processes
of the assets. They are assumed multivariate Gaussian, and their correlations
are set by a factor model representing the various underlying macroeconomic
variables impacting the dynamics of the asset returns. Each threshold Ti is
determined by an option technique applied to the historical data of the ith
firm.
CreditMetrics’ approach is also based upon the assumption that the Xi ’s
are multivariate Gaussian random variables. However, they do not represent
the evolution of the asset value itself but the evolution of the rating of the
firm. The range of each Xi is divided into classes which represent the possible
rating classes of the firm. The classes are determined so that they agree with
historical data. This procedure allows one to fix simultaneously all the values
of the thresholds {Ti }. Again, the correlations are calibrated by assuming a
factor model.
In Li's model, the latent variable X_i is interpreted as the time-to-default
of obligor i and the thresholds T_i are all equal to T, the time horizon over
which the credit portfolio is monitored. Here, the multivariate distribution of
the X_i's is not Gaussian anymore (since, now, the X_i's are positive random
variables). The marginal distribution of each X_i is exponential with parameter
λ_i: $\Pr[X_i \le t] = 1 - e^{-\lambda_i t}$, t ≥ 0,
while the copula remains Gaussian. Again, the correlations between the X_i's
can be determined from a factor model.
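A sketch of this construction follows (the code and the intensity values are our own illustration, not part of Li's original specification): exponential times-to-default are coupled through a Gaussian copula, and obligor i defaults before the horizon T whenever X_i ≤ T.

```python
import numpy as np
from scipy.stats import norm

def li_default_times(rho, lam, size, rng):
    """Li's model, sketched: times-to-default with exponential margins
    of intensities lam[i], coupled by a Gaussian copula with correlation
    matrix rho.  Default of obligor i before horizon T  <=>  X_i <= T."""
    A = np.linalg.cholesky(rho)
    z = rng.standard_normal((size, len(lam))) @ A.T   # correlated Gaussians
    u = norm.cdf(z)                                   # Gaussian copula sample
    return -np.log(1.0 - u) / np.asarray(lam)         # exponential quantile map

rng = np.random.default_rng(3)
rho = np.array([[1.0, 0.5], [0.5, 1.0]])
times = li_default_times(rho, [0.02, 0.05], 10000, rng)   # illustrative intensities
```

The mean time-to-default of obligor i is 1/λ_i, while the Gaussian copula alone governs the tendency of defaults to cluster.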
This recurrent use of a Gaussian factor model which is equivalent to de-
scribing the dependence between the latent variables in terms of a Gaussian
copula has been ratified by the recommendations of the BIS [42] concerning
credit risk modeling. However, there are many indications suggesting that
this Gaussian copula approach may be grossly inadequate to account for large
credit risk (see [186] for instance), since the Gaussian copula might – by con-
struction – underestimate the largest concomitant risks. We will come back
in more detail on this crucial point in the next chapter (Chap. 4) where we
will present and contrast the different available measures of dependence and
address more precisely how to assess the dependence in the tails of the distri-
bution.
Appendix
Fig. 3.13. The area hatched with plain lines represents the set of points (x, y) such that ψ(x, y) ≤ t. The area hatched with dashed lines represents the set of points (x′, y′) such that x′ ≤ x∗ and y′ ≤ y∗ for some (x∗, y∗) satisfying ψ(x∗, y∗) = t. By definition, the F-measure of this area is Pr[X ≤ x∗, Y ≤ y∗] = F(x∗, y∗)
where C denotes the copula of the random vector (X, Y ), we can write:
for all (x∗ , y ∗ ) such that ψ (x∗ , y ∗ ) = t, which finally allows us to assert that:
$$\sum_{i=1}^N w_i h_i = 0 . \tag{3.B.9}$$
$$\sum_{i=1}^N f(x_i) = \sum_{i=1}^N f(x_i^*) + \sum_{i=1}^N \int_{x_i^*}^{x_i^* + h_i} dt_i \int_{x_i^*}^{t_i} du_i \, f''(u_i) . \tag{3.B.10}$$
Thus $\exp\left( -\sum f(x_i) \right)$ can be rewritten as follows:
$$\exp\left[ -\sum_{i=1}^N f(x_i) \right] = \exp\left[ -\sum_{i=1}^N f(x_i^*) - \sum_{i=1}^N \int_{x_i^*}^{x_i^* + h_i} dt_i \int_{x_i^*}^{t_i} du_i \, f''(u_i) \right] . \tag{3.B.11}$$
Let us now define the compact set A_C = \{ h \in R^N : \sum_{i=1}^{N} f''(x_i^*)^2 \, h_i^2 \le C^2 \}
for any given positive constant C, and the set H = \{ h \in R^N : \sum_{i=1}^{N} w_i h_i = 0 \}.
We can thus write
P_S(S) = \int_{H} dh \; e^{ -\sum_{i=1}^{N} [ f(x_i) - \ln g(x_i) ] } ,   (3.B.12)

= \int_{A_C \cap H} dh \; e^{ -\sum_{i=1}^{N} [ f(x_i) - \ln g(x_i) ] } + \int_{\bar A_C \cap H} dh \; e^{ -\sum_{i=1}^{N} [ f(x_i) - \ln g(x_i) ] } ,   (3.B.13)

where \bar A_C denotes the complement of A_C in R^N.
Let us analyze in turn the two integrals on the right-hand side of (3.B.13).
Concerning the first integral, it can be shown that

\lim_{S \to \infty} \frac{ \int_{A_C \cap H} dh \; \exp\left( -\sum_{i=1}^{N} \left[ \int_{x_i^*}^{x_i^* + h_i} dt_i \int_{x_i^*}^{t_i} du_i \, f''(u_i) - \ln g(x_i^* + h_i) \right] \right) }{ (2\pi)^{\frac{N-1}{2}} \prod_{i=1}^{N} g(x_i^*) \Big/ \sqrt{ \prod_{i=1}^{N} f''(x_i^*) \cdot \sum_{j=1}^{N} \frac{w_j^2}{f''(x_j^*)} } } = 1 , \quad \text{for some positive } C.   (3.B.14)
The cumbersome proof of this assertion is found in [336]. It is based upon the
fact that
\forall h \in A_C , \quad (1 - \epsilon_i)^{\nu} \le \frac{g(x_i^* + h_i)}{g(x_i^*)} \le (1 + \epsilon_i)^{\nu} ,   (3.B.16)
by Assumptions 1 and 6.
Now, for the second integral on the right-hand side of (3.B.13), we have
to show that the contribution

\int_{\bar A_C \cap H} dh \; e^{ -\sum_{i=1}^{N} [ f(x_i^* + h_i) - \ln g(x_i^* + h_i) ] }   (3.B.17)

is negligible, being bounded by a term of order e^{-\alpha S} for some positive α. Thus, for S large enough, the density P_S(S) is asymptotically equal to
P_S(S) = \frac{ (2\pi)^{\frac{N-1}{2}} }{ \sqrt{ \prod_{i=1}^{N} f''(x_i^*) \cdot \sum_{j=1}^{N} \frac{w_j^2}{f''(x_j^*)} } } \; \prod_{i=1}^{N} g(x_i^*) \; \exp\left( -\sum_{i=1}^{N} f(x_i^*) \right) .   (3.B.19)
and

g(x) = \frac{c}{2\sqrt{\pi} \, \chi^{c/2}} \cdot |x|^{\frac{c}{2} - 1} ,   (3.B.21)
which satisfies our assumptions if and only if c > 1. In such a case, we obtain
x_i^* = \frac{ w_i^{\frac{1}{c-1}} }{ \sum_{j=1}^{N} w_j^{\frac{c}{c-1}} } \cdot S ,   (3.B.22)

and

\hat\chi = \left( \sum_{i=1}^{N} w_i^{\frac{c}{c-1}} \right)^{\frac{c-1}{c}} \cdot \chi .   (3.B.24)
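As a quick consistency check of these expressions (restated here under the assumption f(x) = (x/χ)^c), the saddle points x_i^* do satisfy the constraint ∑ w_i x_i^* = S, and the sum ∑ f(x_i^*) collapses to (S/χ̂)^c:

```latex
% with x_i^* = w_i^{1/(c-1)} S \big/ \sum_j w_j^{c/(c-1)}:
\sum_{i=1}^{N} w_i x_i^*
  = \frac{\sum_i w_i^{\frac{c}{c-1}}}{\sum_j w_j^{\frac{c}{c-1}}}\, S = S ,
\qquad
\sum_{i=1}^{N} \left(\frac{x_i^*}{\chi}\right)^{c}
  = \left(\frac{S}{\chi}\right)^{c}
    \left(\sum_i w_i^{\frac{c}{c-1}}\right)^{1-c}
  = \left(\frac{S}{\hat\chi}\right)^{c} .
```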
S_N = w_1 X_1 + w_2 X_2 + \cdots + w_N X_N ,   (3.B.25)

which yields

S_N \stackrel{d}{=} w_1 \chi_1 \cdot Y_1 + w_2 \chi_2 \cdot Y_2 + \cdots + w_N \chi_N \cdot Y_N .   (3.B.28)
where Si is the price of asset i, while ai (·, ·) and bi (·, ·) are chosen so that the
volatility σi (t) of each asset remains positive almost surely. As an example,
one can choose

a_i(t, \sigma_i) = \kappa_i \left( \frac{m_i}{\sigma_i} - \sigma_i \right) , \quad \text{and} \quad b_i(t, \sigma_i) = \beta_i .   (3.C.31)
This stochastic volatility model is equivalent to the Heston model [232] written
for the squared volatility instead of the volatility itself. In the present case,
the condition \kappa_i \cdot m_i \ge \beta_i^2, together with \kappa_i, m_i > 0, ensures the positivity of
\sigma_i(t), provided that \sigma_i(0) > 0.
The solution of (3.C.29) with Si (0) = Si0 is:
S_i(t) = S_i^0 \exp\left( \int_0^t \left[ \mu_i(s, \sigma_i(s)) - \frac{1}{2} \sigma_i(s)^2 \right] ds + \int_0^t \sigma_i(s) \, dB_i(s) \right) ,   (3.C.32)
so that, writing S_i(t) = S_i^0 \cdot \exp[Z_i(t)], we can assert that the copula C^P of
(S_1(t), \ldots, S_N(t)) is the same as the copula of (Z_1(t), \ldots, Z_N(t)), since each
S_i(t) is an increasing transform of the corresponding Z_i(t).
Assuming that the usual conditions are satisfied, Girsanov Theorem19 al-
lows us to assert that there exists a probability measure Q, equivalent to P
on FT , such that
\frac{dQ}{dP} = \exp\left( -\sum_{i=1}^{N} \left[ \int_0^t \frac{\mu_i(s, \sigma_i(s))}{\sigma_i(s)} \, dB_i(s) + \frac{1}{2} \int_0^t \left( \frac{\mu_i(s, \sigma_i(s))}{\sigma_i(s)} \right)^2 ds \right] - \sum_{i=1}^{N} \left[ \int_0^t \lambda_i(s, \sigma_i(s)) \, dW_i(s) + \frac{1}{2} \int_0^t \lambda_i(s, \sigma_i(s))^2 \, ds \right] \right) ,   (3.C.34)
for any suitable processes (λ1 , . . . , λN ), and that
\tilde B_i(t) = B_i(t) + \int_0^t \frac{\mu_i(s, \sigma_i(s))}{\sigma_i(s)} \, ds , \quad i = 1, \ldots, N ,   (3.C.35)

\tilde W_i(t) = W_i(t) + \int_0^t \lambda_i(s, \sigma_i(s)) \, ds , \quad i = 1, \ldots, N ,   (3.C.36)
19
In the theory of probability, the Girsanov Theorem specifies how stochastic
processes change under changes in measure. The theorem is especially important
in the theory of asset pricing as it allows one to convert the physical measure
which describes the probability that an underlying (such as a share price or in-
terest rate) will take a particular value into the risk-neutral measure used for
evaluating the derivatives on the underlying.
are Brownian motions under Q, with correlation matrix ρ. Since the volatility
is a non-traded asset, the problem of market incompleteness arises, so that
there is not a unique risk-neutral measure under which discounted asset prices
are martingales.
For simplicity, let us assume that the risk-free interest rate is vanishing so
that asset prices are directly discounted prices. Under any Q, using (3.C.35)
and (3.C.36), (3.C.29–3.C.30) can be written
\frac{dS_i(t)}{S_i(t)} = \sigma_i(t) \, d\tilde B_i(t) , \quad i = 1, \ldots, N ,   (3.C.37)
dσi (t) = [ai (t, σi (t)) − λi (t, σi (t)) · bi (t, σi (t))] dt
+ bi (t, σi (t)) dW̃i (t), (3.C.38)
which shows that Si (t) is a Q-martingale. The solution of (3.C.37) with
Si (0) = Si0 , under Q, is:
S_i(t) = S_i^0 \exp\left( -\frac{1}{2} \int_0^t \sigma_i(s)^2 \, ds + \int_0^t \sigma_i(s) \, d\tilde B_i(s) \right) ,   (3.C.39)
where σi (t) is now the solution of (3.C.38). Denoting by Z̃i (t) the random
variable
\tilde Z_i(t) = -\frac{1}{2} \int_0^t \sigma_i(s)^2 \, ds + \int_0^t \sigma_i(s) \, d\tilde B_i(s) ,   (3.C.40)

we can assert that the copula C^Q of (S_1(t), \ldots, S_N(t)) is the same as the
copula of (\tilde Z_1(t), \ldots, \tilde Z_N(t)).
Therefore, C^P = C^Q if and only if the copula of (\tilde Z_1(t), \ldots, \tilde Z_N(t)) is the
same as the copula of (Z_1(t), \ldots, Z_N(t)). In the general case, the Z_i(t)'s and
\tilde Z_i(t)'s are not simple increasing transforms of each other. Therefore, their
copulas are not identical and C^P \ne C^Q. But in the particular case where the
rates \mu_i are deterministic functions – i.e., independent of \sigma_i(t) – the copula
C^P is nothing but the copula of the random variables:
Z_i^*(t) = -\frac{1}{2} \int_0^t \sigma_i(s)^2 \, ds + \int_0^t \sigma_i(s) \, dB_i(s) , \quad i = 1, \ldots, N ,   (3.C.41)
When the risk premia \lambda_i's of volatility risks are vanishing, the vectors (Z_1(t), \ldots, Z_N(t)) and
(\tilde Z_1(t), \ldots, \tilde Z_N(t)) are equal in law, since (3.C.30) and (3.C.38) are then the
same. Thus, in this case, (Z_1^*(t), \ldots, Z_N^*(t)) and (\tilde Z_1(t), \ldots, \tilde Z_N(t)) have the
same copula, and therefore C^P = C^Q.
4
Measures of Dependences
In the previous chapter, we have shown how to describe with copulas the
general dependence structure of several random variables, with the goal of
modeling baskets of asset returns, or more generally, any multivariate financial
risk. However, the general framework provided by copulas does not exclude
more specific measures of dependences that can be useful to target particular
ranges of variations of the random variables.
This chapter presents and describes in detail the most important dependence
measures. Starting with the basic concept of linear dependence, quantified by
the linear correlation coefficient and the canonical N-correlation coefficients,
we then focus on concordance measures and on richer families of dependence
measures. We then turn to measures of extreme dependence. In each case, we
underline their relationship with copulas.
The linear correlation is probably still the most widespread measure of de-
pendence, both in finance and insurance. Given two random variables X and
Y , the linear correlation coefficient is defined as:
\rho(X, Y) = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X] \cdot \mathrm{Var}[Y]}} ,   (4.1)

provided that the variances Var[X] and Var[Y] exist; Cov[X, Y] is the covariance
of X and Y. The coefficient \rho(X, Y) is called a linear correlation
coefficient because its knowledge is equivalent to that of the coefficient β of
the linear regression Y = \beta X + \epsilon, where \epsilon is the residual, which is linearly
uncorrelated with X. We have indeed \rho = \beta \sqrt{\mathrm{Var}[X]/\mathrm{Var}[Y]}.
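This equivalence is easy to verify numerically. In the sketch below (illustrative data, arbitrary slope and seed), β is the ordinary least-squares slope Cov[X, Y]/Var[X]:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50_000)
y = 0.7 * x + rng.standard_normal(50_000)        # linear model with noise

beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # OLS slope Cov[X,Y]/Var[X]
rho = np.corrcoef(x, y)[0, 1]

# rho equals beta * sqrt(Var[X]/Var[Y]) up to floating-point error
lhs = rho
rhs = beta * np.sqrt(np.var(x) / np.var(y))
```

The identity is exact (both sides reduce to Cov[X, Y]/√(Var[X]·Var[Y])), so the two numbers agree to machine precision.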
Fig. 4.1. Graph of the variable V = sin ω versus U = cos ω for ω ∈ [0, 2π] (left
panel) and graph of the variable V = (U/θ) · 1_{U∈[0,θ]} + ((1 − U)/(1 − θ)) · 1_{U∈[θ,1]}
(right panel)
\hat\rho_T = \frac{ \frac{1}{T} \sum_{i=1}^{T} (X_i - \bar X)(Y_i - \bar Y) }{ \left[ \frac{1}{T} \sum_{i=1}^{T} (X_i - \bar X)^2 \cdot \frac{1}{T} \sum_{i=1}^{T} (Y_i - \bar Y)^2 \right]^{1/2} } .   (4.2)
It is easy to check that ρ(U, V ) = 0, even though the two random variables
are not independent, as shown in the left panel of Fig. 4.1 which plots the
variable V as a function of U .
More striking is the case where the knowledge of one of the variables com-
pletely determines the other one. As an example, consider a random variable
U, uniformly distributed on [0, 1] and the random variable V defined by:
V = U/\theta , \quad U \in [0, \theta] ,
V = (1 - U)/(1 - \theta) , \quad U \in [\theta, 1] ,   (4.4)
for some θ ∈ [0, 1] (see right panel of Fig. 4.1). One can easily show that V is
also uniformly distributed on [0, 1] and that
ρ(U, V ) = 2θ − 1 , (4.5)
so that U and V are uncorrelated for θ = 1/2 while V remains perfectly
predictable from U .
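A short simulation (illustrative sketch, arbitrary sample size and seed) confirms both claims: V is a deterministic function of U, yet the sample correlation tracks 2θ − 1 and vanishes at θ = 1/2:

```python
import numpy as np

def tent(u, theta):
    """V = U/theta on [0, theta], (1 - U)/(1 - theta) on [theta, 1] -- cf. (4.4)."""
    return np.where(u <= theta, u / theta, (1.0 - u) / (1.0 - theta))

rng = np.random.default_rng(2)
u = rng.uniform(size=500_000)

rho_half = np.corrcoef(u, tent(u, 0.5))[0, 1]    # theoretical value 2*0.5 - 1 = 0
rho_34 = np.corrcoef(u, tent(u, 0.75))[0, 1]     # theoretical value 2*0.75 - 1 = 0.5
```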
When two random variables, X and Y, are linearly dependent:

Y = \alpha + \beta \cdot X ,   (4.6)

the correlation coefficient \rho(X, Y) equals \pm 1, depending on whether β is positive
or negative (in the previous example, this corresponds to θ = 1 or 0,
respectively). Conversely, |\rho(X, Y)| = 1 implies such a linear dependence. This
derives from the representation:
\rho(X, Y)^2 = 1 - \min_{\alpha, \beta} \frac{ \mathrm{E}\left[ (Y - (\alpha + \beta \cdot X))^2 \right] }{ \mathrm{Var}[Y] } ,   (4.7)
where E [ ] denotes the expectation with respect to the joint distribution of
X and Y . ρ(X, Y )2 is called the coefficient of determination and gives the
proportion of the variance of one variable (Y ) that is predictable from the
other variable (X).
By the Cauchy–Schwarz inequality, (4.1) allows one to show that ρ ∈ [−1, 1].
But, given two random variables X and Y with fixed marginal distribution
functions FX and FY , it is not always possible for the correlation coefficient
to reach the bounds ±1. Indeed, Chap. 3 has shown that any bivariate distri-
bution function F is bracketed by the Fréchet-Hoeffding bounds:
max {FX (x) + FY (y) − 1, 0} ≤ F (x, y) ≤ min {FX (x), FY (y)} . (4.8)
Therefore, applying Hoeffding's identity [130]

\rho(X, Y) = \frac{1}{\sqrt{\mathrm{Var}[X] \cdot \mathrm{Var}[Y]}} \iint \left[ F(x, y) - F_X(x) \cdot F_Y(y) \right] dx \, dy ,   (4.9)
one can now conclude that, given FX and FY , the correlation coefficient ρ
lies between ρmin and ρmax , where ρmin is attained when X and Y are coun-
termonotonic random variables while ρmax is attained when X and Y are
comonotonic random variables.
Fig. 4.2. Graph of ρmin and ρmax given by (4.12) versus σ for two random variables
with log-normal marginal distributions: log N (0, 1) and log N (0, σ)
X' = a \cdot X + b , \quad a > 0 ,   (4.13)
Y' = c \cdot Y + d , \quad c > 0 ,   (4.14)

since \rho(X', Y') = \rho(X, Y). However, this property does not generalize to any
(nonlinear) increasing transformation. As a consequence, the correlation coef-
ficient does not give access to the dependence between two random variables
in the sense of Chap. 3. This lack of invariance with respect to nonlinear
changes of variables is due to the fact that the correlation coefficient aggre-
gates information on both the marginal behavior of each random variable and
on their true dependence structure given by the copula.
Instead of focusing on the overall correlation, one can look at the local linear
dependence between two random variables. This idea, introduced by Doksum
et al. [58, 134], enables one to probe the changes of the correlation strength as
a function of the value of the realizations of the random variables. It allows
one, for instance, to address the question of whether the correlation remains
constant or varies when the realizations of the random variables are typical or not. This
is particularly useful when dealing with contagions of crises (see Chap. 6)
or when investigating whether flight-to-quality actually occurs between stock
and bond markets, for instance.
The definition of the local correlation coefficient is quite natural. It starts
from the remark that, in a linear framework, if the two random variables X
and Y are related by
Y = \alpha + \beta X + \epsilon ,   (4.15)

the correlation coefficient is given by

\rho = \frac{\beta \, \sigma_X}{\sqrt{\beta^2 \sigma_X^2 + \sigma_\epsilon^2}} ,   (4.16)

where \sigma_X^2 and \sigma_\epsilon^2 denote respectively the variances of X and of the error term \epsilon.
Let us now assume that the more general relation
and, by analogy with (4.16), define the local linear correlation coefficient by
\rho(x_0) = \frac{ f'(x_0) \cdot \sigma(x_0) }{ \sqrt{ f'(x_0)^2 \cdot \sigma(x_0)^2 + \sigma_\epsilon(x_0)^2 } } .   (4.19)
S_{XX} = \frac{1}{L} \sum_{t=1}^{L} X(t) \cdot X(t)^T ,   (4.20)
S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} \, S_{X_i X_i}^{-1} \, S_{X_i \xi_i} ,   (4.21)

where
The matrices in formulas (4.21) and (4.22) are submatrices of the general
N × N covariance matrix S XX = Cov(X, X T ) (whose estimation is given
in (4.20)). Thus, replacing the matrix S XX (and its submatrices) in (4.21)
and (4.22) by its sample estimate (4.20) allows one to compute the vector φ
and the set of scalar values ζi for i = 1, . . . , N . One can call the maximum
eigenvalue of the matrix (4.21) the “canonical coefficient of N -correlation”
between the random variable Xi and the other N −1 variables, which captures
the common factors between Xi and all the other N − 1 variables. Performing
similar operations with all other components of the vector X, one thus obtains
an N-dimensional vector of canonical coefficients of N-correlation equal to the
largest eigenvalues of the matrices (4.21) for i = 1, . . . , N . For N = 2, the (N −
1)-dimensional matrix (4.21) reduces to the square of the standard correlation
coefficient between the N = 2 variables.
A slightly different but equivalent formulation is as follows. Consider the
regression of a random variable Xi on the (N − 1)-dimensional random vec-
tor ξ i (t) = [X1 (t), . . . , Xi−1 (t), Xi+1 (t), . . . , XN (t)]T , i.e., the evaluation of a
vector φ of regression coefficients in the linear formula:
X_i = \sum_{j \ne i} \phi_j X_j + \epsilon_i = \phi^T \cdot \xi_i + \epsilon_i ,   (4.23)
i.e., by minimizing the sum of squared residuals

\sum_{t=1}^{L} \left( \phi^T \cdot \xi_i(t) - X_i(t) \right)^2 ,   (4.24)

whose solution is

\hat\phi = S_{\xi_i \xi_i}^{-1} \cdot S_{\xi_i X_i} .   (4.25)
Let \hat\xi_i = \hat\phi^T \cdot \xi_i denote the predictor obtained from the regression (4.23) with the
estimate (4.25). Since

\mathrm{Cov}(X_i, \hat\xi_i) = \mathrm{Cov}\left( X_i , \; \xi_i^T \, S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} \right) = S_{X_i \xi_i} \, S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} ,   (4.26)
\rho_{iN}^2 = S_{X_i \xi_i} \, S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} \, S_{X_i X_i}^{-1}   (4.27)
can be determined from the solution of the regression problem (4.23, 4.24).
This correspondence between the two formulations is rooted in the equivalence
between linear correlation and the coefficient of linear regression, as pointed
out above.
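The equivalence between the matrix expression (4.21, 4.27) and the R² of the regression (4.23, 4.24) can be checked directly. The sketch below uses simulated data (arbitrary mixing matrix and sample size, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
L, N, i = 20_000, 4, 0
A = rng.standard_normal((N, N))
X = rng.standard_normal((L, N)) @ A.T          # correlated variables
X -= X.mean(axis=0)                            # center the sample

S = X.T @ X / L                                # sample covariance matrix, cf. (4.20)
idx = [j for j in range(N) if j != i]
S_xx = S[i, i]                                 # S_{X_i X_i}
S_xs = S[np.ix_([i], idx)]                     # S_{X_i xi_i}
S_ss = S[np.ix_(idx, idx)]                     # S_{xi_i xi_i}

# Canonical coefficient of N-correlation squared, cf. (4.27)
rho2_matrix = (S_xs @ np.linalg.solve(S_ss, S_xs.T)).item() / S_xx

# Same quantity from the R^2 of the regression of X_i on the other variables
phi, *_ = np.linalg.lstsq(X[:, idx], X[:, i], rcond=None)
resid = X[:, i] - X[:, idx] @ phi
rho2_regression = 1.0 - resid.var() / X[:, i].var()
```

On the same (centered) sample, the two quantities coincide exactly, up to floating-point error.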
Again, this canonical coefficient of N -correlation is, by construction, in-
variant under linear transformations of each Xi individually. However, it is
not left unchanged under nonlinear monotonic transformations. It is therefore
necessary to look for other measures of dependence which are only functions
of the copula. The concordance measures described below enjoy this property.
The left-most term in the r.h.s. (right-hand side) gives the probability of con-
cordance, i.e., the probability that X and Y move together upward or down-
ward. In contrast, the right-most term in the r.h.s. represents the probability
of discordance, i.e., the probability that the two random variables move in
opposite directions.
The expression (4.28) defines the population version of the so-called
Kendall’s τ . This quantity is invariant under increasing transformation of the
From this equation, one easily checks that Kendall’s τ varies between −1
and +1. The lower bound is reached if and only if the variables (X, Y ) are
countermonotonic, while the upper bound is attained if and only if (X, Y ) are
comonotonic. In addition, τ equals zero for independent random variables.
However, as for the (linear) correlation coefficient, τ may vanish even for
non-independent random variables.
In spite of its attractive structure, (4.32) is not always very useful for calcu-
lations and one often has to resort to numerical integration (by use of quadra-
ture, for instance). However, some more tractable expressions have been found
for particular families of copulas.
Archimedean Copulas
Genest and McKay [198] have shown that, for generators ϕ which are strictly
decreasing functions from [0, 1] onto [0, ∞] with ϕ(1) = 0, Kendall’s τ of the
Archimedean copula
is given by

\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)} \, dt .   (4.34)
This expression relies on the general fact that (4.32) can be rewritten as
τ = 4 · E [C(U, V )] − 1 , (4.35)
where U and V are uniform random variables with joint distribution function
C. Now, in the particular case of an Archimedean copula, one can show that
[370]
\Pr[C(U, V) \le t] = t - \frac{\varphi(t)}{\varphi'(t^+)} ,   (4.36)
which immediately yields the result given by (4.34). Table 4.1 provides closed
form expressions for Kendall’s τ ’s of Clayton’s copula, Gumbel’s copula and
Frank’s copula, which are shown in Fig. 4.3 as a function of their corresponding
form parameters θ.
Table 4.1. Generator ϕ(t), Kendall's τ, and parameter range for three Archimedean copulas

Copula    ϕ(t)                                    τ                       Range
Clayton   (t^{−θ} − 1)/θ                          θ/(θ + 2)               θ ∈ [−1, ∞]
Gumbel    (−ln t)^θ                               (θ − 1)/θ               θ ∈ [1, ∞]
Frank     −ln[(e^{−θt} − 1)/(e^{−θ} − 1)]         1 − (4/θ)[1 − D₁(θ)]    θ ∈ [−∞, ∞]

where D₁ denotes the first-order Debye function.
Fig. 4.3. Kendall's τ for the Clayton, Gumbel, and Frank copulas as a function of their form parameter θ
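The Clayton entry of Table 4.1 can be checked by simulation, using the standard gamma-frailty construction of the Clayton copula (illustrative sketch; sample size and seed are arbitrary):

```python
import numpy as np
from scipy.stats import kendalltau

def clayton_sample(n, theta, rng):
    """Sample (U, V) from a Clayton copula via a Gamma(1/theta) frailty."""
    v = rng.gamma(1.0 / theta, size=n)       # frailty variable
    e = rng.exponential(size=(n, 2))         # independent Exp(1) pairs
    return (1.0 + e / v[:, None]) ** (-1.0 / theta)

rng = np.random.default_rng(4)
u = clayton_sample(50_000, 2.0, rng)
tau_hat, _ = kendalltau(u[:, 0], u[:, 1])    # close to theta/(theta+2) = 0.5
```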
Elliptical Copulas
This particularly useful family of copulas also allows for tractable calculation
of Kendall’s τ . Generalizing the result originally obtained by Stieltjes [115] for
the Gaussian distribution, Lindskog et al. [307] have shown that the relation
2
τ= arcsin ρ (4.37)
π
holds for any pair of random variables whose dependence structure is given
by an elliptical copula. The parameter ρ denotes the shape coefficient (or
correlation coefficient, when it exists) of the elliptical distribution naturally
associated with the considered elliptical copula.
This result is particularly interesting because it provides a robust esti-
mation method for the shape parameter ρ. Of course, when the elliptical
distribution associated with the elliptical copula admits a second moment,
the correlation coefficient exists and ρ can be estimated from Pearson’s coeffi-
cient (4.2). However, when the elliptical distribution does not admit a second
moment, this approach fails. In this case, Kendall’s τ has the advantage of
always existing and of being easily estimated. In fact, its superiority is even
greater, as demonstrated by Fig. 4.4 which shows that estimates of τ yield
more robust estimates of ρ via (4.37). This is especially true when the tails
of the marginals associated with the elliptical distributions are heavy. In the
example depicted in Fig. 4.4, we have considered two Student’s distributions
with three and ten degrees of freedom respectively. While the estimates of
ρ provided by Kendall's τ (dashed curve) remain approximately equally efficient
in both cases, the efficiency of the estimates of ρ provided by Pearson's
coefficient deteriorates markedly for the heavier-tailed case.
Fig. 4.4. Estimates of the shape parameter ρ for Student's marginal distributions with three (left panel) and ten (right panel) degrees of freedom
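A minimal sketch of this robust estimator (illustrative parameters): sample a bivariate Student distribution with three degrees of freedom and recover ρ by inverting (4.37), i.e., ρ̂ = sin(πτ̂/2):

```python
import numpy as np
from scipy.stats import kendalltau

def bivariate_t(n, rho, nu, rng):
    """Sample a bivariate Student vector with shape parameter rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    chi2 = rng.chisquare(nu, size=n)
    return z / np.sqrt(chi2 / nu)[:, None]    # shared mixing variable

rng = np.random.default_rng(5)
x = bivariate_t(50_000, 0.6, 3, rng)

tau_hat, _ = kendalltau(x[:, 0], x[:, 1])
rho_kendall = np.sin(0.5 * np.pi * tau_hat)        # inversion of (4.37)
rho_pearson = np.corrcoef(x[:, 0], x[:, 1])[0, 1]  # moment estimator, fragile for nu = 3
```

The Kendall-based estimate stays close to the true ρ = 0.6 even with heavy tails, whereas the Pearson estimate fluctuates much more from sample to sample.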
with
Q (C1 , C2 ) = 4 C1 (u, v) dC2 (u, v) − 1 , (4.39)
[0,1]2
=4 C2 (u, v) dC1 (u, v) − 1 . (4.40)
[0,1]2
The analogy between these two expressions (4.40) and (4.32) therefore allows one to define
the notion of proximity between two copulas.
Since any copula is bounded by the Fréchet-Hoeffding upper and lower
bounds, we have
[max(u + v − 1, 0) − C2 (u, v)] dC2 (u, v)
[0,1]2
≤ [C1 (u, v) − C2 (u, v)] dC2 (u, v) (4.42)
[0,1]2
≤ [min(u, v) − C2 (u, v)] dC2 (u, v) ,
[0,1]2
Spearman’s Rho
while
1 1
1
u · v dudv = , (4.47)
0 0 4
so that the central fraction in (4.45) leads to define the so-called Spearman’s
rho:
ρs (C) = 12 C(u, v) dudv − 3 . (4.48)
[0,1]2
This makes clear that Spearman's rho is related to the linear correlation of
the ranks. Indeed, considering two random variables X and Y, with marginal
distributions F_X and F_Y, it is straightforward to check that

\rho_s = \frac{ \mathrm{Cov}(F_X(X), F_Y(Y)) }{ \sqrt{ \mathrm{Var}[F_X(X)] \cdot \mathrm{Var}[F_Y(Y)] } } .   (4.51)
Our introduction of Spearman’s rho, motivated from Kendall’s τ , shows
that they are closely related. In fact, given any copula C, Kruskal [281] has
shown that
\frac{3\tau - 1}{2} \le \rho_S \le \frac{1 + 2\tau - \tau^2}{2} , \quad \tau \ge 0 ,   (4.52)

\frac{\tau^2 + 2\tau - 1}{2} \le \rho_S \le \frac{3\tau + 1}{2} , \quad \tau \le 0 .   (4.53)
Figure 4.5 shows that the area of accessible values for the couple (τ, ρS ) repre-
sents a relatively narrow strip, reflecting the strong relation between Kendall’s
τ and Spearman’s rho.
Fig. 4.5. The shaded area represents the allowed values for the couple (τ, ρS )
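Formula (4.51) says that Spearman's rho is the Pearson correlation applied to the probability-transformed variables, i.e., to the ranks; the sketch below (illustrative data) checks this against `scipy.stats.spearmanr` and verifies Kruskal's bounds (4.52) on the same sample (with a small slack for sampling error):

```python
import numpy as np
from scipy.stats import rankdata, spearmanr, kendalltau

rng = np.random.default_rng(6)
x = rng.standard_normal(10_000)
y = np.exp(x) + 0.5 * rng.standard_normal(10_000)        # nonlinear, noisy relation

rho_s, _ = spearmanr(x, y)
rho_ranks = np.corrcoef(rankdata(x), rankdata(y))[0, 1]  # Pearson on the ranks

tau, _ = kendalltau(x, y)
lower = (3.0 * tau - 1.0) / 2.0                          # Kruskal's lower bound
upper = (1.0 + 2.0 * tau - tau ** 2) / 2.0               # Kruskal's upper bound
```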
Gini’s Gamma
and

\int_0^1 \int_0^1 C_2(u, v) \, dC_2(u, v) = \frac{1}{4} .   (4.56)

The central fraction in (4.45) then yields the so-called Gini's gamma:

\gamma(C) = 4 \left( \int_0^1 C(u, u) \, du + \int_0^1 C(u, 1 - u) \, du - \frac{1}{2} \right) .   (4.57)
Note that this measure of dependence only relies on the values taken by C on
its main diagonals. The alternative expression
\gamma(C) = 4 \left( \int_0^1 C(u, u) \, du - \int_0^1 \left[ u - C(u, 1 - u) \right] du \right)   (4.58)
shows that Gini’s gamma represents the difference of the area between the
values of C(u, v) and max(u + v − 1, 0) on the first diagonal and between the
value of C(u, v) and min(u, v) on the second diagonal (see the shaded areas
in Fig. 4.6).
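A direct numerical evaluation of (4.57) (illustrative sketch using midpoint quadrature) recovers the expected values γ(Π) = 0 for the product copula and γ(M) = 1 for the comonotonic Fréchet upper bound:

```python
import numpy as np

def gini_gamma(c_diag, c_anti, n=200_000):
    """Evaluate (4.57) by midpoint quadrature along both diagonals."""
    u = (np.arange(n) + 0.5) / n
    return 4.0 * (c_diag(u).mean() + c_anti(u).mean() - 0.5)

# Product copula Pi(u, v) = u*v: gamma should be 0
g_pi = gini_gamma(lambda u: u * u, lambda u: u * (1.0 - u))
# Frechet upper bound M(u, v) = min(u, v): gamma should be 1
g_m = gini_gamma(lambda u: u, lambda u: np.minimum(u, 1.0 - u))
```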
The three measures of dependence – Kendall’s tau, Spearman’s rho and Gini’s
gamma – presented in the previous paragraphs enjoy the same set of proper-
ties:
1. they are defined for any pair of continuous random variables X and Y,
2. they are symmetric: for any pair X and Y, τ(X, Y) = τ(Y, X), for instance,
3. they range from −1 to +1, and reach these bounds when X and Y are
countermonotonic and comonotonic respectively,
4. they vanish when X and Y are independent,
5. if the copula of the pair (X1, X2) dominates the copula of the pair (Y1, Y2)
pointwise,

Fig. 4.6. The shaded surface represents the area between the values of C(u, v) –
here the product copula Π(u, v) = u · v – and max(u + v − 1, 0) on the first diagonal
and between the value of C(u, v) and min(u, v) on the second diagonal

then the same ranking holds for any of these three measures; for instance,
τ(X1, X2) ≥ τ(Y1, Y2).
Any measure of dependence fulfilling these five properties is named a concordance
measure. The central fraction in (4.45), with any exchangeable copula
C2 such that condition (4.44) is fulfilled together with ρs(C2) = 3 τ(C2),
ensuring that the numerator of the central term of (4.45) vanishes for
C1(u, v) = u · v, provides a measure of dependence which satisfies the five
conditions above, and is thus a concordance measure.
can then show that S is equal to one-fourth of the symmetric relative entropy
of h and f · g for the 1/2-class entropy.
Dependence metrics such as S provide very useful tools to test the presence
of complicated serial dependences. This is particularly important not only to
analyze and forecast financial time series [214] but also to test the goodness-
of-fit in copula modeling, as we shall see in Chap. 5.
This defines two random variables as PQD if the probability that they are
simultaneously large or small is at least as large as it would be if these two
random variables were independent. This definition is relevant for risk management
purposes, since it amounts to asking whether large losses of individual
assets tend to occur more frequently together than they would if the assets
were independent.
Definition (4.65) implies that X and Y are PQD if and only if their copula
C satisfies

C(u, v) \ge u \cdot v \quad \text{for all } (u, v) \in [0, 1]^2 .

This ensures that the PQD property depends only on the dependence structure
of the random variables (and not on their marginals).
The PQD property and the concordance measures are intimately related.
Indeed, as recalled in Sect. 4.2.3, if the pair of random variables (X1 , X2 ) is
more dependent than the pair (Y1 , Y2 ), it is also more concordant. So, any
PQD pair of random variables is more concordant than independent pairs of
random variables. But, since any concordance measure equals zero for inde-
pendent random variables, we can assert that, given any concordance mea-
sure, any pair of PQD random variables has a positive concordance measure.
In particular, Kendall’s tau, Spearman’s rho or Gini’s gamma are necessarily
positive for PQD random variables. Besides, (4.48) shows that the Spearman’s
rho is a kind of averaged positive quadrant dependence.
To conclude this brief survey of the properties of PQD random variables,
let us stress that the same result holds for the usual linear correlation coeffi-
cient. Indeed, by Hoeffding identity (4.9), any PQD random variables exhibit
a nonnegative correlation coefficient. Unfortunately, the converse does not
hold. However, given two random variables X and Y such that the linear
correlation coefficient ρ (f (X), g(Y )) exists and is non-negative for any non-
decreasing functions f and g, then these two random variables are PQD [300].
Let us now generalize the bivariate concept of positive quadrant dependence
to the multivariate concept of positive orthant dependence. We will say
that the N random variables X_1, X_2, \ldots, X_N are Positive Lower Orthant
Dependent (PLOD) if

\Pr[X_1 \le x_1, \ldots, X_N \le x_N] \ge \prod_{i=1}^{N} \Pr[X_i \le x_i]   (4.69)

for all x_i's. As in the bivariate case, this equation simply means that the
probability that the N random variables X1 , . . . , XN are simultaneously small
is at least as large as it would be if these N random variables were independent.
Similarly, N random variables X_1, X_2, \ldots, X_N are Positive Upper Orthant
Dependent (PUOD) if

\Pr[X_1 > x_1, \ldots, X_N > x_N] \ge \prod_{i=1}^{N} \Pr[X_i > x_i]   (4.70)

for all x_i's. Again, this equation has a simple interpretation: the probability
that the N random variables X1 , . . . , XN are simultaneously large is at least as
large as it would be if these N random variables were independent. Note that
the two definitions (4.69) and (4.70) are not equivalent anymore for N > 2.
Finally, N random variables X1 , X2 , . . . , XN are Positive Orthant Depen-
dent (POD) if they are both PUOD and PLOD: the probability that the N
random variables X1 , . . . , XN are simultaneously small or large is at least as
large as it would be if these N random variables were independent.
In terms of copulas, these definitions can be expressed as follows. Given an
N-dimensional random vector X = (X_1, \ldots, X_N) with copula C,

X is PLOD \iff C(u_1, \ldots, u_N) \ge \prod_{i=1}^{N} u_i , \quad \forall u_i \in [0, 1] ,   (4.71)

and

X is PUOD \iff \bar C(u_1, \ldots, u_N) \ge \prod_{i=1}^{N} (1 - u_i) ,   (4.72)

which is equivalent to

C_\varphi(u_1, \ldots, u_N) \ge \prod_{i=1}^{N} u_i .   (4.76)
The proof that the subadditivity of ϕ (e−x ) is in fact a necessary and sufficient
condition for an Archimedean copula to be PLOD can be found in [147].
Let us remark that any completely monotonic generator fulfills the re-
quirement that (4.73) be concave. Therefore, any Archimedean copula which
admits a generalization to arbitrary dimension is PLOD. Archimedean cop-
ulas which exist in any dimension necessarily exhibit positive associations
and their bivariate marginals cannot have negative concordance measures. In
this respect, the bivariate Clayton or Frank copulas admit an n-dimensional
generalization for positive parameter value θ only.
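For instance, the PQD character of the bivariate Clayton copula with positive θ can be checked on a grid (illustrative sketch; the grid resolution and θ = 1 are arbitrary choices):

```python
import numpy as np

def clayton(u, v, theta):
    """Bivariate Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

u = np.linspace(0.01, 0.99, 99)
uu, vv = np.meshgrid(u, u)

# PQD condition: C(u, v) >= u*v everywhere, for theta > 0
gap = clayton(uu, vv, theta=1.0) - uu * vv
```

For θ = 1 the inequality can also be seen analytically: C(u, v)/(uv) = 1/(u + v − uv) ≥ 1 since (1 − u)(1 − v) ≥ 0.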
The property of POD is a reasonable assumption for most asset returns.
This allows us to sharpen the (universal) bound for the VaR of the portfolios
considered in Fig. 3.11. Instead of considering the Fréchet-Hoeffding lower
bound in (3.76), one can choose Cinf = Csup = Π, where Π(u, v) = u · v is
the product copula.
The concept of POD is also appealing for testing whether some trading
strategies are actually market neutral. Such strategies are very common in the
alternative investment industry. They aim at decoupling portfolio moves from
market moves, in order to ensure a better stability of the performance of port-
folios. Portfolio managers often focus solely on their fund’s beta, trying to keep
it as small as possible while raising their alpha (the market-independent part
of the expected return). However, if this approach allows them in principle
to remove any linear dependence between the portfolio and the market, it to-
tally neglects nonlinear and extreme dependences. Therefore, testing for POD
seems necessary in order to check whether a fund is actually market neutral.
Denuit and Scaillet [127] have proposed a nonparametric test for POD and,
considering the HFR and CSFB/Tremont market neutral hedge fund indices,
they have shown that both of them exhibit weak linear dependence with the
S&P 500 index – as expected – but that POD cannot be rejected between the
CSFB/Tremont market neutral index and the Standard & Poor's 500. Therefore,
some funds contributing to the composition of the CSFB/Tremont index
may exhibit nonlinear or extreme dependence with the Standard & Poor's 500.
This teaches us that focusing on beta is clearly not sufficient to ensure market
neutrality. We will come back to this problem at the end of this chapter when
constructing portfolios which minimize the impact of extreme market moves.
Indeed, let C be the copula of the variables X and Y . If their bivariate copula
C is such that
\lim_{u \to 1} \frac{1 - 2u + C(u, u)}{1 - u} = \lim_{u \to 1} \left( 2 - \frac{\log C(u, u)}{\log u} \right) = \lambda_U   (4.78)
exists, then C has an upper tail dependence coefficient λU (see [106, 149, 147]).
In a similar way, one can define the coefficient of lower tail dependence:
\lambda_L = \lim_{u \to 0^+} \Pr\{ X < F_X^{-1}(u) \mid Y < F_Y^{-1}(u) \} = \lim_{u \to 0^+} \frac{C(u, u)}{u} .   (4.79)
If λ > 0,3 the copula presents tail dependence and large events tend to
occur simultaneously, with (conditional) probability λ. On the contrary, when
λ = 0, the copula has no tail dependence and the variables X and Y are said
to be asymptotically independent. There is however a subtlety in this definition
(4.77) of tail dependence. To make it clear, first consider the case where, for
large X and Y, the distribution factorizes in the sense that

\lim_{x, y \to \infty} \frac{\bar F(x, y)}{\bar F_X(x) \, \bar F_Y(y)} = 1 ,   (4.80)

where \bar F(x, y) = \Pr[X > x, Y > y] denotes the joint survival function and \bar F_X, \bar F_Y
the survival functions of the margins of X and Y respectively. This means
that, for X and Y sufficiently large, these two variables can be considered as
independent. It is then easy to show that
\lim_{u \to 1} \Pr\{ X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u) \} = \lim_{u \to 1} \left[ 1 - F_X(F_X^{-1}(u)) \right]   (4.81)

= \lim_{u \to 1} (1 - u) = 0 ,   (4.82)
\bar\lambda_U = \lim_{u \to 1} \frac{2 \log(1 - u)}{\log[1 - 2u + C(u, u)]} - 1 .   (4.84)
3
In the sequel, λ without subscript will represent either λU or λL .
It can be shown that the coefficient λ̄U = 1 if and only if the coefficient of tail
dependence λU > 0, while λ̄U takes values in [−1, 1) when λU = 0, allowing
us to refine the nature of the dependence in the tail in the case when the tail
dependence coefficient is not sufficiently informative. It has been established
that, when λ̄ > 0, the variables X and Y are simultaneously large more
frequently than independent variables, while simultaneous large deviations of
X and Y occur less frequently than under independence when λ̄ < 0. In the
first case, the variables X and Y can be said to be locally PQD (positive
quadrant dependent) in the neighborhood of the point (0, 0) and/or (1, 1) in
probability space.
To summarize, independence (factorization of the bivariate distribution)
implies no tail dependence (λ = 0). But λ = 0 is not sufficient to imply factor-
ization and thus true independence. It also requires as a necessary condition
that λ̄ = 0.
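In practice, a crude empirical proxy for λ_U follows directly from (4.78): replace C by the empirical copula at a quantile level u close to 1. The sketch below (illustrative data and level u = 0.95) contrasts a comonotonic pair with an independent one:

```python
import numpy as np

def lambda_u_empirical(x, y, u=0.95):
    """Empirical version of (1 - 2u + C(u, u)) / (1 - u) at level u."""
    n = len(x)
    rx = np.argsort(np.argsort(x)) / n      # normalized ranks in [0, 1)
    ry = np.argsort(np.argsort(y)) / n
    c_uu = np.mean((rx <= u) & (ry <= u))   # empirical copula at (u, u)
    return (1.0 - 2.0 * u + c_uu) / (1.0 - u)

rng = np.random.default_rng(7)
x = rng.standard_normal(100_000)
lam_comonotonic = lambda_u_empirical(x, np.exp(x))         # Y increasing in X: close to 1
lam_independent = lambda_u_empirical(x, rng.standard_normal(100_000))  # close to 1 - u
```

Note that at a fixed finite level u this is only a pre-asymptotic diagnostic: for independent variables the proxy equals 1 − u, not exactly 0, and the u → 1 limit cannot be taken with a finite sample.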
We present several general results allowing for the calculation of the tail de-
pendence of Archimedean copulas, elliptical copulas and copulas derived from
factor models.
Archimedean Copulas
As a consequence, if (\varphi^{-1})'(0) > -\infty, the coefficient of upper tail dependence
is identically zero. For an Archimedean copula to present upper tail dependence, it
is necessary that \lim_{t \to 0} (\varphi^{-1})'(t) = -\infty.
Similarly, the coefficient of lower tail dependence is

\lambda_L = \lim_{t \to \infty} \frac{\varphi^{-1}(2t)}{\varphi^{-1}(t)} ,   (4.86)

so that \varphi^{-1}(\infty) must be equal to 0 in order for the Archimedean copula to
have a nonzero lower tail dependence.
Table 4.2. Expressions of the coefficient of upper and lower tail dependence for
three Archimedean copulas. Note that the usual range for the parameter θ of Clayton's
copula has to be restricted to [0, ∞] in order for the generator to be "strict"

Copula    ϕ^{−1}(t)                               λ_L         λ_U           Range
Clayton   (1 + θt)^{−1/θ}                         2^{−1/θ}    0             θ ∈ [0, ∞]
Gumbel    exp(−t^{1/θ})                           0           2 − 2^{1/θ}   θ ∈ [1, ∞]
Frank     −(1/θ) · ln[1 − (1 − e^{−θ}) e^{−t}]    0           0             θ ∈ [−∞, ∞]
and, since ϕ is regularly varying at zero with tail index −θ, ϕ^{−1} is regularly
varying at infinity with tail index −1/θ [57], so that

\lambda_L = \lim_{x \to \infty} \frac{\varphi^{-1}(2x)}{\varphi^{-1}(x)} = 2^{-1/\theta} ,   (4.88)
Fig. 4.7. Contour plot of the copula with generator (4.89) (left panel) and of its
density (right panel) for the parameter values α = 1 and β = 2
have a coefficient of upper tail dependence equal to 2 − 21/θ , since they lead
to generators which are regularly varying at 1, with tail index θ. Finally, to
obtain an Archimedean copula with both upper and lower tail dependence,
one just has to consider generators which are regularly varying at 0 and 1, or
alternatively to have frailty parameters with regular variation at zero and at
infinity.
An example is the following generator:

\varphi(t) = t^{-\alpha} \cdot (-\ln t)^{\beta} , \quad (\alpha, \beta) \in [0, \infty) \times [1, \infty) ,   (4.89)

with inverse

\varphi^{-1}(t) = \exp\left( -\frac{\beta}{\alpha} \cdot W\!\left( \frac{\alpha}{\beta} \, t^{1/\beta} \right) \right) ,   (4.90)

where W denotes the Lambert function, i.e., the inverse of x \mapsto x e^x.
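The inverse (4.90) can be verified numerically with the Lambert W function (illustrative sketch for the parameter values α = 1, β = 2 used in Fig. 4.7):

```python
import numpy as np
from scipy.special import lambertw

alpha, beta = 1.0, 2.0

def phi(t):
    """Generator (4.89): phi(t) = t^-alpha * (-ln t)^beta."""
    return t ** -alpha * (-np.log(t)) ** beta

def phi_inv(s):
    """Inverse (4.90), expressed with the Lambert W function."""
    w = lambertw((alpha / beta) * s ** (1.0 / beta)).real
    return np.exp(-(beta / alpha) * w)

s = np.array([0.1, 1.0, 10.0, 100.0])
roundtrip = phi(phi_inv(s))      # should reproduce s
```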
Elliptical Copulas
1
ν=3
0.9 ν=5
ν=10
0.8 ν=20
ν=50
0.7 ν=100
0.6
0.5
λ
0.4
0.3
0.2
0.1
0
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
ρ
Fig. 4.8. Coefficient of upper tail dependence as a function of the correlation co-
efficient ρ for various values of the number of degrees of freedom ν for Student’s
copula
which is greater than zero for all ρ > −1, and thus λ̄ = 1. Here, T_{ν+1} is the Student distribution with ν + 1 degrees of freedom and the bar denotes the complementary (survival) distribution. This result λ̄ = 1 proves that extremes appear more likely
together whatever the correlation coefficient may be, showing that, in fact,
there is no general relationship between the asymptotic dependence and the
linear correlation coefficient. Figure 4.8 shows the coefficient of upper tail de-
pendence as a function of the correlation coefficient ρ for various values of the
number of degrees of freedom ν.
These distinctive properties of the Gaussian and Student’s copulas, char-
acterized by the absence or presence of tail dependence, are illustrated in
Fig. 4.9 which shows the realizations of two random variables with identical
standard Gaussian marginals, with a Gaussian copula or a Student’s copula
with three degrees of freedom and the same correlation coefficient ρ = 0.8.
In the right panel for the Student’s copula, the realizations (dots) are found
to lie within a diamond-shaped domain with narrower and narrower tips as
more extreme values are considered. This phenomenon can be observed, not
only for the bottom-left and upper-right quadrants, but also for the upper-left
and bottom-right quadrants. This results from the fact that the tail dependence coefficient remains nonzero even for negative correlation coefficients, as illustrated in Fig. 4.8.
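The contrast between the two panels of Fig. 4.9 can be reproduced with a short simulation. The sketch below is our illustration (not from the original text; numpy and scipy assumed): it generates both samples with ρ = 0.8 and counts joint exceedances of the 99% quantile, which are typically far more frequent under the Student copula.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho, nu, T = 0.8, 3, 5000
cov = np.array([[1.0, rho], [rho, 1.0]])

# Gaussian copula with standard Gaussian marginals: a correlated Gaussian pair
g = rng.multivariate_normal([0.0, 0.0], cov, size=T)

# Student copula with nu degrees of freedom: divide a correlated Gaussian pair
# by a common chi-square factor, map to uniforms through the Student cdf, then
# back to standard Gaussian marginals through the normal quantile function
z = rng.multivariate_normal([0.0, 0.0], cov, size=T)
chi = np.sqrt(rng.chisquare(nu, size=T) / nu)
u = stats.t.cdf(z / chi[:, None], df=nu)
s = stats.norm.ppf(u)

# joint exceedances of the 99% quantile of each marginal
q = stats.norm.ppf(0.99)
print("Gaussian copula:", np.sum((g[:, 0] > q) & (g[:, 1] > q)))
print("Student copula :", np.sum((s[:, 0] > q) & (s[:, 1] > q)))
```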
The Gaussian and Student's distributions are two examples of elliptical distributions. More generally, the following result is known: elliptically distributed random variables present a nonzero tail dependence if and only if they are regularly varying, i.e., their distributions behave asymptotically like
174 4 Measures of Dependences
Fig. 4.9. Realizations of two random variables with Gaussian marginals and with
a Gaussian copula (left panel ) and a Student’s copula with three degrees of freedom
(right panel ) with the same correlation coefficient ρ = 0.8
power laws with some exponent ν > 0 [239]. In such a case, for every regularly varying pair of random variables which are elliptically distributed, the coefficient of tail dependence λ is given by expression (4.91). This result is natural since the correlation coefficient is an invariant quantity within the class of elliptical distributions and since the coefficient of tail dependence is determined only by the asymptotic behavior of the distribution, so that it does not matter whether the distribution is a Student's distribution with ν degrees of freedom or any other elliptical distribution, as long as they have the same asymptotic behavior in the tail.
λ = min{λ1 , λ2 } . (4.94)
To understand this result, note that the tail dependence between X1 and X2
is created only through the common factor Y . It is thus natural that the tail
dependence between X1 and X2 is bounded from above by the weakest tail
dependence between the Xi ’s and Y while deriving the equality requires more
work. The result (4.94) generalizes to an arbitrary number of random variables
and shows that the study of the tail dependence in linear factor models can
be reduced to the analysis of the tail dependence between each individual
Xi and the common factor Y. In the following, we thus omit the subscript i and consider, without loss of generality, one X linearly regressed on a factor Y according to X = β · Y + ε.
A general result concerning the tail dependence generated by factor models for any kind of factor and noise distributions is as follows [332, 335]: the coefficient of (upper) tail dependence between X and Y is given by

λ = ∫_{max{1, l/β}}^∞ dx f(x) ,   (4.95)

where l denotes the limit

l = lim_{u→1} F_X^{−1}(u) / F_Y^{−1}(u) ,   (4.96)

and

f(x) = lim_{t→∞} t · P_Y(t · x) / F̄_Y(t) ,   (4.97)

where F_X and F_Y are the marginal distribution functions of X and Y respectively, and P_Y is the density of Y.
As a direct consequence, one can show that any rapidly varying factor,
which encompasses the Gaussian, the exponential or the gamma distributed
factors for instance, leads to a vanishing coefficient of tail dependence, what-
ever the distribution of the idiosyncratic noise may be. This result is obvious
when both the factor and the idiosyncratic noise are normally distributed,
since then X and Y follow a bivariate Gaussian distribution, whose tail de-
pendence has been said to be zero.
On the contrary, regularly varying factors, like the Student’s distributed
factors, lead to a tail dependence, provided that the distribution of the idio-
syncratic noise does not become fatter-tailed than the factor distribution. One
can thus conclude that, in order to generate tail dependence, the factor must
have a sufficiently "wild" distribution. To present an explicit example, let us now assume that the factor Y and the idiosyncratic noise ε have centered Student's distributions with the same number ν of degrees of freedom and scale factors respectively equal to 1 and σ. The choice of the scale factor equal to 1 for Y is not restrictive but only provides a convenient normalization for σ.
Appendix 4.A shows that the tail dependence coefficient is given by
λ = 1 / (1 + (σ/β)^ν) .   (4.98)
This expression shows that the larger the typical scale σ of the fluctuations of ε and the weaker the coupling coefficient β, the smaller is the tail dependence, in accordance with intuition.
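Expression (4.98) translates into a one-line function whose behavior matches this intuition. The sketch below is our illustration (not from the original text):

```python
def tail_dependence_student_factor(beta, sigma, nu):
    """Coefficient of tail dependence (4.98) between X = beta*Y + eps and the
    factor Y, for Student factor and noise with nu degrees of freedom and
    scale factors 1 and sigma respectively."""
    return 1.0 / (1.0 + (sigma / beta) ** nu)

print(tail_dependence_student_factor(beta=1.0, sigma=1.0, nu=3))  # -> 0.5
# weak coupling and/or a large idiosyncratic scale kill the tail dependence
print(tail_dependence_student_factor(beta=0.2, sigma=1.0, nu=3))
```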
Surprisingly, λ does not go to zero for all ρ’s as ν goes to infinity, as could
be anticipated from the fact that the Student’s distribution converges to the
Gaussian distribution which is known to have zero tail dependence. Expression
(4.99) predicts that λ → 0 when ν → ∞ for all ρ smaller than 1/√2. But, and here lies the surprise, λ → 1 for all ρ larger than 1/√2 when ν → ∞. This
counterintuitive result is due to a non-uniform convergence which makes the
order of the two limits non-commutative: taking first the limit u → 1 and then
ν → ∞ is different from taking first the limit ν → ∞ and then u → 1. In a
sense, by taking first the limit u → 1, we always ensure the power law regime
even if ν is later taken to infinity. This is different from first “sitting” on
the Gaussian limit ν → ∞. This behavior reveals the sometimes paradoxical consequences of taking the limit u → 1 in the definition of the tail dependence.
As an illustration, Fig. 4.10 presents the coefficient of tail dependence for
the Student’s factor model as a function of ρ for various values of ν. It is
interesting to compare this figure with Fig. 4.8 depicting the coefficient of tail
dependence for the Student's copula. Note that λ vanishes for all negative ρ in the case of the factor model, while λ remains nonzero for negative values of the correlation coefficient for bivariate Student's variables.
If Y and ε have different numbers ν_Y and ν_ε of degrees of freedom, two cases occur. For ν_Y < ν_ε, ε is negligible asymptotically and λ = 1. For ν_Y > ν_ε, X becomes asymptotically identical to ε. Then, X and Y have the same tail dependence as ε and Y, which is zero by construction.
A straightforward generalization of this result can be derived for the mul-
tifactor model [72]:
X_1 = β_{1,1} · Y_1 + ··· + β_{1,n} · Y_n + ε_1 ,   (4.100)
X_2 = β_{2,1} · Y_1 + ··· + β_{2,n} · Y_n + ε_2 .   (4.101)
The following generalization of (4.98) gives the coefficient of tail dependence between X_1 and one of the Y_i as

λ_{1,i} = β_{1,i}^ν / (Σ_{j=1}^n β_{1,j}^ν + σ^ν) ,   (4.102)
Fig. 4.10. Coefficient of tail dependence for the Student's factor model as a function of the correlation coefficient ρ, for ν → 2+ and ν = 3, 5, 10, 20, 50
λ = Σ_{i=1}^n 1_{{β_{1,i} · β_{2,i} > 0}} · min(λ_{1,i}, λ_{2,i}) .   (4.103)
These results are of particular interest for portfolio analysis and risk man-
agement, as we shall see in the next section.
Table 4.3. This table presents the coefficients of lower and of upper tail dependence of the companies traded on the NYSE and listed in the first column with
the Standard & Poor’s 500. The returns used for the calculations are sampled in
the time interval from January 1991 to December 2000. The numbers within the
parentheses are the estimated standard deviations of the empirical coefficients of
tail dependence. Reproduced from [332]
λL λU
Bristol-Myers Squibb Co. 0.16 (0.03) 0.14 (0.01)
Chevron Corp. 0.05 (0.01) 0.03 (0.01)
Hewlett-Packard Co. 0.13 (0.01) 0.12 (0.01)
Coca-Cola Co. 0.12 (0.01) 0.09 (0.01)
Minnesota Mining & MFG Co. 0.07 (0.01) 0.06 (0.01)
Philip Morris Cos Inc. 0.04 (0.01) 0.04 (0.01)
Procter & Gamble Co. 0.12 (0.02) 0.09 (0.01)
Pharmacia Corp. 0.06 (0.01) 0.04 (0.01)
Schering-Plough Corp. 0.12 (0.01) 0.11 (0.01)
Texaco Inc. 0.04 (0.01) 0.03 (0.01)
Texas Instruments Inc. 0.17 (0.02) 0.12 (0.01)
Walgreen Co. 0.11 (0.01) 0.09 (0.01)
tail dependence between any two assets is then easily derived from (4.94). It
is interesting to observe that the coefficients of tail dependence seem almost identical in the lower and the upper tail. Nonetheless, the coefficient of lower tail dependence is always slightly larger than the upper one, showing that large losses are more likely to occur together than large gains.
Two clusters of assets clearly stand out: those with a tail dependence of
about 10% (or more) and those with a tail dependence of about 5%. Let
us exploit this observation and explore some consequences of the existence
of stocks with drastically different tail dependence coefficients with the in-
dex. These stocks offer the interesting possibility of constructing a prudential
portfolio which can be significantly less sensitive to the large market moves.
Figure 4.11 compares the daily returns of the Standard & Poor’s 500 with
those of two portfolios P1 and P2 : P1 is made of the four stocks (Chevron
Corp., Philip Morris Cos Inc., Pharmacia Corp., and Texaco Inc.,) with the
smallest λ's while P2 is made of the four stocks (Bristol-Myers Squibb Co.,
Hewlett-Packard Co., Schering-Plough Corp., and Texas Instruments Inc.,)
with the largest λ's. In fact, we have constructed two variants of P1 and two variants of P2. The first variant corresponds to choosing the same weight 1/4 for each asset in each class of assets (with small λ's for P1 and large λ's for P2). The second variant has asset weights in each class chosen so as to minimize, in addition, the variance of the resulting portfolio. We find that the results are almost the same for the equally weighted and minimum-variance portfolios. This makes sense since the tail dependence coefficient of a bivariate
random vector does not depend on the variances of the components, which
only account for the price moves of moderate amplitudes.
Figure 4.11 presents the results for the equally weighted portfolios gener-
ated from the two groups of assets. Observe that only one large drop occurs
simultaneously for P1 and for the Standard & Poor’s 500 in contrast with P2
for which several large drops are associated with the largest drops of the in-
dex and only a few occur desynchronized. The figure clearly shows an almost
circular scatter plot for the large moves of P1 and the index compared with a
rather narrow ellipse, whose long axis is approximately along the first diago-
nal, for the large returns of P2 and the index, illustrating that the small tail
dependence between the index and the four stocks in P1 automatically implies
that their mutual tail dependence is also very small, according to (4.94). As a
consequence, P1 offers a better diversification with respect to large drops than
P2. This effect, already quite significant for such small portfolios, should be overwhelming for large ones. The most interesting result stressed in Fig. 4.11
is that optimizing for minimum tail dependence automatically diversifies away
the large risks.
Fig. 4.11. Daily returns of two equally weighted portfolios P1 (made of four stocks
with small λ ≤ 0.06) and P2 (made of four stocks with large λ ≥ 0.12) as a function
of the daily returns of the Standard & Poor’s 500 over the period January 1991 to
December 2000. The straight (resp. dashed) line represents the regression of portfolio
P1 (resp. P2 ) on the Standard & Poor’s 500. Reproduced from [332]
Di = 1 ⇐⇒ Xi ≤ Ti , (4.104)
Table 4.4. Ratio of the 99% quantiles of the distribution of defaulting obligors
when the latent variables have a Student copula with ν degrees of freedom, nor-
malized with respect to the Gaussian copula, for portfolios of 10,000 homogeneous
credits with default probability πi and correlation ρ. The values of πi and ρ are the
same as in [186]
πi ρ ν = 50 ν = 10 ν=4
0.01% 2.58% 2.33 5.62 6.00
0.50% 3.80% 1.66 3.75 6.84
7.50% 9.21% 1.09 1.39 1.78
loss distribution of the credit portfolio. Therefore, even if the copula is not so important for derivative pricing, it may prove crucial for establishing hedging strategies. To our knowledge, this point has not yet been thoroughly explored, but it appears to be an important direction for future research on credit derivatives.
Appendix
X = βY + ε ,   (4.A.1)
Lemma 4.5.1. The probability that X is larger than F_X^{−1}(u), knowing that Y is larger than F_Y^{−1}(u), is given by:

Pr[X > F_X^{−1}(u) | Y > F_Y^{−1}(u)] = F̄_ε(η) + (β/(1 − u)) ∫_{F_Y^{−1}(u)}^∞ dy F̄_Y(y) · P_ε[β F_Y^{−1}(u) + η − β y] ,   (4.A.4)

with

η = F_X^{−1}(u) − β F_Y^{−1}(u) .   (4.A.5)
The proof of this lemma relies on a simple integration by parts and a change of variable, which are detailed in Appendix 4.A.1.
Introducing the notation
By definition,
Pr[X > F_X^{−1}(u), Y > F_Y^{−1}(u)] = ∫_{F_X^{−1}(u)}^∞ dx ∫_{F_Y^{−1}(u)}^∞ dy P_Y(y) · P_ε(x − β y)
= ∫_{F_Y^{−1}(u)}^∞ dy P_Y(y) · F̄_ε[F_X^{−1}(u) − β y] .
The factor Y and the idiosyncratic noise ε are distributed according to the Student's distributions with ν degrees of freedom given by (4.A.2) and (4.A.3) respectively. It follows that the survival distributions of Y and ε are:

F̄_Y(y) = ν^{(ν−1)/2} C_ν / y^ν + O(y^{−(ν+2)}) ,   (4.A.12)

F̄_ε(ε) = σ^ν ν^{(ν−1)/2} C_ν / ε^ν + O(ε^{−(ν+2)}) ,   (4.A.13)

and

F̄_X(x) = (β^ν + σ^ν) ν^{(ν−1)/2} C_ν / x^ν + O(x^{−(ν+2)}) .   (4.A.14)
Using the notation (4.A.6), (4.A.5) can be rewritten as
To obtain this equation, we have used the asymptotic expressions of F̄X and
F̄Y given in (4.A.14) and (4.A.12).
+ ∫_{x_0/ε}^∞ du (1/(1 + εu/x_0)^ν) · C_ν / (1 + u²/ν)^{(ν+1)/2} .
Consider the second integral in the right-hand side of the last equality. We have

u ≥ x_0/ε ,   (4.A.18)

which allows us to write

1 / (1 + u²/ν)^{(ν+1)/2} ≤ ν^{(ν+1)/2} ε^{ν+1} / x_0^{ν+1} ,   (4.A.19)
so that

∫_{x_0/ε}^∞ du (1/(1 + εu/x_0)^ν) · C_ν/(1 + u²/ν)^{(ν+1)/2} ≤ (ν^{(ν+1)/2} ε^{ν+1} C_ν / x_0^{ν+1}) ∫_{x_0/ε}^∞ du / (1 + εu/x_0)^ν   (4.A.20)
= (ν^{(ν+1)/2} ε^ν C_ν / x_0^ν) ∫_1^∞ dv / (1 + v)^ν   (4.A.21)
= O(ε^ν) .   (4.A.22)
The next step of the proof is to show that

∫_{(1−x_0)/ε}^{x_0/ε} du (1/(1 + εu/x_0)^ν) · C_ν/(1 + u²/ν)^{(ν+1)/2} −→ 1   as ε −→ 0 .   (4.A.23)
Let us calculate

| ∫_{(1−x_0)/ε}^{x_0/ε} du (1/(1 + εu/x_0)^ν) · C_ν/(1 + u²/ν)^{(ν+1)/2} − 1 |

= | ∫_{(1−x_0)/ε}^{x_0/ε} du (1/(1 + εu/x_0)^ν) · C_ν/(1 + u²/ν)^{(ν+1)/2} − ∫_{−∞}^∞ du C_ν/(1 + u²/ν)^{(ν+1)/2} |

= | ∫_{(1−x_0)/ε}^{x_0/ε} du [1/(1 + εu/x_0)^ν − 1] · C_ν/(1 + u²/ν)^{(ν+1)/2} − ∫_{−∞}^{(1−x_0)/ε} du C_ν/(1 + u²/ν)^{(ν+1)/2} − ∫_{x_0/ε}^∞ du C_ν/(1 + u²/ν)^{(ν+1)/2} |

≤ ∫_{(1−x_0)/ε}^{x_0/ε} du |1/(1 + εu/x_0)^ν − 1| · C_ν/(1 + u²/ν)^{(ν+1)/2} + ∫_{−∞}^{(1−x_0)/ε} du C_ν/(1 + u²/ν)^{(ν+1)/2} + ∫_{x_0/ε}^∞ du C_ν/(1 + u²/ν)^{(ν+1)/2} ,   (4.A.24)

where we have used the fact that C_ν/(1 + u²/ν)^{(ν+1)/2} is the density of the Student distribution with ν degrees of freedom, which integrates to one.
Fig. 4.12. The function 1/(1 + εu/x_0)^ν − 1 on the interval ((1 − x_0)/ε, x_0/ε), together with its linear bounds −((x_0^ν − 1)/(x_0 − 1)) · ε · u and (ν/x_0) · ε · u
The second and third integrals obviously behave like O(ε^ν) when ε goes to zero, since we have assumed x_0 > 1, which ensures that (1 − x_0)/ε → −∞ and x_0/ε → ∞ when ε → 0^+. For the first integral, we have
| ∫_{(1−x_0)/ε}^{x_0/ε} du [1/(1 + εu/x_0)^ν − 1] · C_ν/(1 + u²/ν)^{(ν+1)/2} |
≤ ∫_{(1−x_0)/ε}^{x_0/ε} du |1/(1 + εu/x_0)^ν − 1| · C_ν/(1 + u²/ν)^{(ν+1)/2} .
The function

1/(1 + εu/x_0)^ν − 1   (4.A.25)

is bounded by the linear functions depicted in Fig. 4.12, so that there are two constants A, B > 0 such that

1/(1 + εu/x_0)^ν − 1 ≤ −((x_0^ν − 1)/(x_0 − 1)) · ε · u = −A · ε · u ,   ∀u ∈ ((1 − x_0)/ε, 0) ,   (4.A.26)

|1/(1 + εu/x_0)^ν − 1| ≤ (ν/x_0) · ε · u = B · ε · u ,   ∀u ∈ (0, x_0/ε) .   (4.A.27)
We can thus conclude that

∫_{(1−x_0)/ε}^{x_0/ε} du |1/(1 + εu/x_0)^ν − 1| · C_ν/(1 + u²/ν)^{(ν+1)/2}
≤ −A · ε ∫_{(1−x_0)/ε}^0 du u · C_ν/(1 + u²/ν)^{(ν+1)/2} + B · ε ∫_0^{x_0/ε} du u · C_ν/(1 + u²/ν)^{(ν+1)/2}
= O(ε^α) ,   (4.A.28)
with α = min{ν, 1}. Indeed, the two integrals can be performed exactly, which shows that they behave as O(1) if ν > 1 and as O(ε^{ν−1}) otherwise. Thus, we finally obtain

| ∫_{(1−x_0)/ε}^{x_0/ε} du (1/(1 + εu/x_0)^ν) · C_ν/(1 + u²/ν)^{(ν+1)/2} − 1 | = O(ε^α) .   (4.A.29)
F̄_Y(y) = (ν^{(ν−1)/2} C_ν / y^ν) · (1 + O(y^{−2})) .   (4.A.31)
where

γ = β · (1 + (σ/β)^ν)^{1/ν} .   (4.A.33)
where the change of variable x = y/Ỹ_u has been performed in the last equation. We now apply Lemma 4.5.2 with x_0 = γ/β > 1 and ε = σ/(β · Ỹ_u), which goes to zero as u → 1. This gives

∫_{Ỹ_u}^∞ dy F̄_Y(y) · P_ε(β Ỹ_u + η − β y) ∼_{u→1} (ν^{(ν−1)/2} C_ν / (β Ỹ_u^ν)) · (β/γ)^ν .   (4.A.35)

Therefore

Pr[X > F_X^{−1}(u) | Y > F_Y^{−1}(u)] ∼_{u→1} (β/γ)^ν .   (4.A.36)
There are two general methods for estimating empirically the copula best describing the dependence structure of a basket of assets, and more generally of a portfolio made of different financial and/or actuarial risks: parametric and nonparametric. The latter class is by far the most general since it does not require the a priori specification of a model, and should thus avoid the problem of misspecification (model error). In contrast, the parametric approach has the advantage that, if the model is correctly specified, it leads to a much more precise estimation. In addition, the reduced number of parameters involved in the description of the selected copula can be interpreted as the relevant meaningful variables that summarize the dependence properties between the assets. Consider for instance the Gaussian representation, or more generally any representation in terms of elliptical distributions, whose dependence structure is, to a large extent (see Chap. 4), summarized by the set of linear coefficients of correlation. These coefficients of correlation thus play a pivotal role and it is tempting to interpret them as the macrovariables (or phenomenological variables) synthesizing all possible microstructural interactions between economic agents leading to the observed dependence. Let us recall that identifying the "correct variables" constitutes the critical first step in model building to obtain the best possible representation of observed phenomena. The usefulness of the parametric estimation is thus obvious from this point of view.
The first section of this chapter reviews the most representative methods to estimate copulas, with an emphasis on the description of parametric approaches. The following section focuses on the problem of model selection and on goodness-of-fit tests. Indeed, the estimation procedure is meaningless if the quality and the likelihood of the model are not assessed. Instead of reviewing the many available goodness-of-fit tests, we discuss how to best describe the dependence structure of asset returns and we compare the relative merits of the different models considered in the literature to address this question.
190 5 Description of Financial Dependences with Copulas
The very first copula estimation method dates back to the work by Deheuvels [121, 122]. It relies on a simple generalization of the usual estimator of a multivariate distribution. Indeed, considering an n-dimensional random vector X = (X_1, ..., X_n) whose copula is C and given a sample of size T, {(x_1(1), x_2(1), ..., x_n(1)), ..., (x_1(T), x_2(T), ..., x_n(T))}, a natural idea is to estimate the empirical distribution function F of X as

F̂(x) = (1/T) Σ_{k=1}^T 1_{{x_1(k) ≤ x_1, ..., x_n(k) ≤ x_n}} ,   (5.1)

and the empirical marginal distribution functions as

F̂_i(x_i) = (1/T) Σ_{k=1}^T 1_{{x_i(k) ≤ x_i}} .   (5.2)
where x_p(k; T) denotes the kth order statistic of the sample {x_p(1), ..., x_p(T)}. Following Deheuvels, one can define an empirical copula as any copula which satisfies the relation (5.3).
5.1 Estimation of Copulas 191
It is well known that the empirical distribution function F̂ converges almost surely, uniformly, to the underlying distribution function F from which the sample is drawn, as the sample size T goes to infinity.¹ This property still holds for the nonparametric estimator defined by the empirical copula:

sup_{u∈[0,1]^n} |Ĉ(u) − C(u)| −→ 0   almost surely.   (5.4)

¹ The same issue arises, of course, for the empirical estimation of marginal as well as multivariate distributions.
The following relation holds between the empirical copula Ĉ and the empirical copula density:

ĉ(i_1/T, ..., i_n/T) = Σ_{k_1=1}^2 ··· Σ_{k_n=1}^2 (−1)^{k_1+···+k_n} × Ĉ((i_1 − k_1 + 1)/T, ..., (i_n − k_n + 1)/T) .   (5.6)
A natural question arises: what is the estimated value of C(u) or c(u) when u does not belong to the lattice defined by the set of points (i_1/T, i_2/T, ..., i_n/T), with i_k ∈ {1, 2, ..., T}? It would seem that this is nothing but a straightforward interpolation problem, which could be solved by constructing a simple staircase function or applying spline functions, for instance. However, such methods of interpolation do not ensure that the function so obtained fulfills the requirements of a copula, according to Definition 3.2.1; in particular, the function must be n-increasing, which requires a multilinear interpolation scheme. In the bivariate case (for simplicity), given any point (u, v) ∈ [k_u/T, (k_u + 1)/T] × [k_v/T, (k_v + 1)/T], where k_u and k_v ∈ {0, 1, ..., T − 1} denote the integer parts of T·u and T·v respectively, the following interpolation

C̃(u, v) = Ĉ(k_u/T, k_v/T) · (k_u + 1 − T·u)(k_v + 1 − T·v)
        + Ĉ(k_u/T, (k_v + 1)/T) · (k_u + 1 − T·u)(T·v − k_v)
        + Ĉ((k_u + 1)/T, k_v/T) · (T·u − k_u)(k_v + 1 − T·v)
        + Ĉ((k_u + 1)/T, (k_v + 1)/T) · (T·u − k_u)(T·v − k_v) ,   (5.7)

defines a bona fide empirical copula. Indeed, by construction C̃ is a copula (see [370, p. 16]) and C̃(i/T, j/T) = Ĉ(i/T, j/T) for all i, j ∈ {1, ..., T}.
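In the bivariate case, the empirical copula on the grid (i/T, j/T) and the multilinear interpolation (5.7) can be sketched as follows. This is our illustration, not part of the original text (numpy assumed; ties in the sample are ignored, which is harmless for essentially continuous returns):

```python
import numpy as np

def empirical_copula_grid(x):
    """Empirical copula C_hat(i/T, j/T) of a (T, 2) sample, via ranks."""
    T = x.shape[0]
    r = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..T, no ties
    C = np.zeros((T + 1, T + 1))
    for i in range(1, T + 1):
        for j in range(1, T + 1):
            C[i, j] = np.mean((r[:, 0] <= i) & (r[:, 1] <= j))
    return C

def copula_interp(C, u, v):
    """Multilinear interpolation (5.7); preserves the copula property."""
    T = C.shape[0] - 1
    ku = min(int(T * u), T - 1)
    kv = min(int(T * v), T - 1)
    return (C[ku, kv] * (ku + 1 - T * u) * (kv + 1 - T * v)
            + C[ku, kv + 1] * (ku + 1 - T * u) * (T * v - kv)
            + C[ku + 1, kv] * (T * u - ku) * (kv + 1 - T * v)
            + C[ku + 1, kv + 1] * (T * u - ku) * (T * v - kv))
```

By construction, the interpolant agrees with Ĉ at the grid points and vanishes on the lower boundary of the unit square, as a copula must.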
Li et al. [304, 305] have provided some other insightful methods. One of them relies on the use of Bernstein polynomials,

P_{i,n}(x) = (n i) · x^i · (1 − x)^{n−i} ,   (5.8)

where (n i) denotes the binomial coefficient. Defining

Ĉ_B(u) = Σ_{i_1=1}^T ··· Σ_{i_n=1}^T P_{i_1,T}(u_1) ··· P_{i_n,T}(u_n) · Ĉ(i_1/T, i_2/T, ..., i_n/T) ,   (5.9)
C(F_1(X_1), ..., F_n(X_n)). The most commonly used kernel is probably the Gaussian kernel,

ϕ(x) = (1/√(2π)) · e^{−x²/2} .   (5.10)
We present the general procedure detailed in [168] on this particular example. Let us first estimate the joint distribution of X. Given the sample of size T, {(x_1(1), x_2(1), ..., x_n(1)), ..., (x_1(T), x_2(T), ..., x_n(T))}, the kernel estimates of F_i(x) and F(x) are

F̂_i(x_i) = (1/T) Σ_{t=1}^T Φ((x_i − x_i(t)) / h_i) ,   (5.11)

and

F̂(x) = (1/T) Σ_{t=1}^T Π_{i=1}^n Φ((x_i − x_i(t)) / h_i) ,   (5.12)

where

Φ(x) = ∫_{−∞}^x ϕ(t) dt .   (5.13)
In practice, one usually chooses h_i = σ̂_i · (4/(3T))^{1/5}, where σ̂_i denotes the sample standard deviation of {x_i(1), ..., x_i(T)}.
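The marginal estimator (5.11) with this bandwidth choice can be sketched in a few lines. This is our illustration (numpy and scipy assumed; we read the bandwidth as h_i = σ̂_i (4/(3T))^{1/5}, the usual rule of thumb):

```python
import numpy as np
from scipy.stats import norm

def kernel_marginal_cdf(sample, x):
    """Gaussian-kernel estimate (5.11) of a marginal cdf at the points x."""
    sample = np.asarray(sample, dtype=float)
    T = sample.size
    h = sample.std(ddof=1) * (4.0 / (3.0 * T)) ** 0.2  # rule-of-thumb bandwidth
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # average of smoothed step functions, one per observation
    return norm.cdf((x[:, None] - sample[None, :]) / h).mean(axis=1)

rng = np.random.default_rng(1)
data = rng.standard_normal(2000)
print(kernel_marginal_cdf(data, [0.0]))  # close to 0.5 for a centered sample
```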
Defining q̂, the vector whose ith component is the ui -quantile of F̂i ,
This asymptotic behavior holds even when the sample is not iid, provided
that the underlying process satisfies some strong mixing conditions, roughly
speaking (see [168] for details). Therefore, this method can be applied to
financial asset returns, which are known to exhibit volatility clustering among
other time dependence patterns.
By construction, the kernel estimator can be differentiated with respect to the u_i's. It is thus easy to obtain an estimator of a partial derivative of the copula with respect to one (or more) of the variables. For instance, the kernel estimator of the first order partial derivative of C with respect to u_i is

∂C(u)/∂u_i = ∂Ĉ(u)/∂u_i = ∂_i F̂(q̂(u)) / f̂_i(q̂_i(u_i)) .   (5.19)

Applying the same kind of arguments, one can estimate the higher order partial derivatives of the copula C:

∂C(u)/(∂u_{i_1} ··· ∂u_{i_k}) = ∂Ĉ(u)/(∂u_{i_1} ··· ∂u_{i_k}) = ∂_{i_1,...,i_k} F̂(q̂(u)) / (f̂_{i_1}(q̂_{i_1}(u_{i_1})) ··· f̂_{i_k}(q̂_{i_k}(u_{i_k}))) ,   (5.22)

and, in particular, the copula density

ĉ(u) = f̂(q̂(u)) / (f̂_1(q̂_1(u_1)) ··· f̂_n(q̂_n(u_n))) ,   (5.23)

with

f̂(x) = (1/(T · Π_i h_i)) Σ_{t=1}^T Π_{i=1}^n ϕ((x_i − x_i(t)) / h_i) .   (5.24)
Fig. 5.1. Contour plot of the copula density estimated by the kernel method for
the daily returns of the couple constituted of the German Mark (u variable) and the
Japanese Yen (v variable) over the time interval from 1 May, 1973 to 2 November,
2001 (left panel ) and for the couple made of General Motors (u variable) and Procter
& Gamble (v variable) over the time period from 3 July, 1962 to 29 December, 2000
(right panel ). The German Mark data has been reconstructed from the Euro data
after 31 December, 1999
Two examples of copula densities estimated by the kernel method are shown
in Fig. 5.1. Observe that the level curves of the left panel are rather similar to
those of Fig. 3.3, which depicts the contour plot of a Student copula. This is
suggestive of the relevance of a Student’s copula with a moderate number of
degrees of freedom as a possible candidate for modeling dependencies between
the returns of foreign exchange rates.4 For stock returns, the situation is less
clear, even if one could surmise that a Student copula with a large number of
degrees of freedom could be a reasonable model.
To sum up this paragraph on kernel estimators, let us stress that, notwithstanding their seeming attractiveness, they have a severe drawback as they
require a very large amount of data. As an illustration, in order to obtain the
two pictures of Fig. 5.1, we used between 7,000 and 10,000 data points. With
less than 2,500–5,000 points, one obtains unreliable estimates in most cases,
showing that the kernel estimators behave badly for small samples. Therefore,
with daily returns, an accurate non-parametric estimate of the copula requires
between 30 and 40 years of data. Over such a long time period, it is far from certain that the dependence structure remains stationary, which could invalidate the whole effort.
When the number of observations is not large enough and/or when one has
a sufficiently accurate idea of the true model, it is in general more profitable
4 Of course, this statement should be formally tested by using rigorous statistical techniques. See the following sections.
As an illustration, let us consider again the two samples of the daily returns
of the German Mark and the Japanese Yen on the one hand, and of General
Motors and Procter & Gamble on the other hand. Assuming that their copula
belongs to the class of elliptical copulas, we can apply this method to infer
the value of the shape parameter ρ for each pair of assets. For the first one
(German Mark/Japanese Yen), we obtain: τ̂T = 0.37 so that ρ̂T = 0.54, while
for the second one: τ̂T = 0.18 and therefore ρ̂T = 0.29. This shows that the
dependence is stronger between the pair of currencies than between the pair
of stocks.
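For elliptical copulas, the relation between Kendall's tau and the shape parameter is ρ = sin(πτ/2); assuming this is the relation (5.28) invoked here, the conversion is a one-liner. The sketch below is our illustration (scipy used to estimate τ̂ itself):

```python
import numpy as np
from scipy.stats import kendalltau

def rho_from_tau(tau):
    # shape parameter of an elliptical copula from Kendall's tau
    return np.sin(np.pi * tau / 2.0)

print(rho_from_tau(0.37))  # about 0.55; the text quotes 0.54 (unrounded tau)

# tau itself is estimated directly from the two return series, e.g.:
rng = np.random.default_rng(3)
y = rng.standard_normal(1000)
x1 = y + rng.standard_normal(1000)
x2 = y + rng.standard_normal(1000)
tau_hat, _ = kendalltau(x1, x2)
print(rho_from_tau(tau_hat))
```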
This method is particularly attractive due to its simplicity, but it is a bit naive. While it provides very simple and robust estimators, these estimators are not always very accurate. This justifies turning to more elaborate methods, such as that developed by Genest et al. [197], which relies on the maximization of a pseudo likelihood.
Let us still consider a sample of size T {(x1 (1), x2 (1), . . . , xn (1)), . . . , (x1 (T ),
x2 (T ), . . . , xn (T ))}, drawn from a common distribution F with copula C and
margins Fi . By definition of the copula, the random vector U whose ith com-
ponent is given by Ui = Fi (Xi ) has a distribution function equal to C. Assum-
ing that the copula C = C(·; θ 0 ) belongs to the family {C(u1 , . . . , un ; θ); θ ∈
Θ ⊂ Rp }, where θ denotes the vector parameterizing the copula, the function
ln L = Σ_{i=1}^T ln c(F_1(x_1(i)), ..., F_n(x_n(i)); θ) ,   (5.29)
where c(·; θ) denotes the density of C(·; θ), provides the likelihood of the sequence {(u_1(k) = F_1(x_1(k)), ..., u_n(k) = F_n(x_n(k)))}_{k=1}^T. Note that this sequence is independently and identically distributed provided that the x_i(k)'s are independent and identically distributed realizations.
Since the marginal distributions are generally unknown and when no para-
metric model seems available, it is reasonable to use the empirical marginal
distribution functions F̂i defined by (5.2) to obtain an estimator of U ,
Û = F̂1 (X1 ), . . . , F̂n (Xn ) . (5.30)
Then, one derives the pseudo-sample {(û_1(k), ..., û_n(k))}_{k=1}^T, where û_i(k) = F̂_i(x_i(k)), which is not iid even if the x_i(k)'s are. Hence, substituting the û(k)'s for the u(k)'s in the log-likelihood function (5.29), one obtains the pseudo log-likelihood of the model, based on the sample {(x_1(1), ..., x_1(T)), ..., (x_n(1), ..., x_n(T))}:
ln L̃ = Σ_{i=1}^T ln c(F̂_1(x_1(i)), ..., F̂_n(x_n(i)); θ) .   (5.31)
where

W_{ki}(U_k) = ∫_{u∈[0,1]^p} 1_{{U_k ≤ u_k}} [∂² ln c(u; θ) / ∂θ_i ∂u_i]_{θ=θ_0} dC(u; θ_0) .   (5.36)
Fig. 5.2. Contour plot of the Student copula maximizing the pseudo likeli-
hood (5.31) for the daily returns of the couple German Mark/Japanese Yen over
the time interval from 1 May, 1973 to 2 November, 2001 (left panel ) and for the
couple General Motors/Procter & Gamble over the time period from 3 July, 1962
to 29 December, 2000 (right panel )
Japanese Yen over the time interval from 1 May, 1973 to 2 November, 2001
and the daily returns of the couple of stocks (General Motors; Procter &
Gamble) over the time period from 3 July, 1962 to 29 December, 2000. As
aforementioned, the kernel estimates (see Fig. 5.1) of the copulas of these two
couples suggest that the Student copula could provide a reasonable description
of their dependence structure, at least for the pair of currencies. The pseudo
log-likelihood of these samples for a Student copula with ν degrees of freedom
and shape matrix ρ can be straightforwardly derived from (3.37) p. 110. No
closed form for ρ̂ and ν̂ can be obtained. One has to maximize the pseudo
log-likelihood with a numerical procedure. Figure 5.2 depicts the contour plot
of the Student copula maximizing the pseudo likelihood for each sample. For
the sample (German Mark; Japanese Yen), we obtain the following estimates
for the shape parameter and the number of degrees of freedom respectively:
ρ̂ = 0.54 and ν̂ = 5.82. Comparing the left panels of Figs. 5.1 and 5.2, we
observe that the Student copula estimated from the data seems reasonably
close to the kernel estimate of the copula, suggesting that the copula model
is realistic in this case. For the couple (General Motors; Procter & Gamble),
we find ρ̂ = 0.29 and ν̂ = 5.92. However, when comparing the right panels of
Figs. 5.1 and 5.2, one can observe a clear discrepancy between the two models
and it is doubtful that the Student copula provides a good representation
of the dependence in this case. Settling this question requires assessing the goodness-of-fit of the model, which will be discussed in Sect. 5.2.
In the meantime, let us stress two important points concerning the practical implementation of the pseudo maximum likelihood estimation method.
• It is convenient to replace the empirical distribution function F̂_i(·), defined by (5.2), by (T/(T + 1)) · F̂_i(·). These two quantities are asymptotically equivalent, but the use of the latter prevents a potential unboundedness of the pseudo log-likelihood when one of the û_i's tends to one.
• Any maximization algorithm requires an initialization. The choice of the
starting point is not innocuous, since the performance of the algorithm
can depend on it to a large extent. For any elliptical copula, assessing the
Kendall’s tau and applying relation (5.28) allows one to obtain a good
starting point. In fact, the estimation of ρij from τij often provides such
a good starting point that the pseudo maximum likelihood estimate of
ρ does not significantly improve on it [350]. Our examples confirm this
point: with both methods, we have obtained the same values (within their
confidence interval). In addition, the first estimation method is much faster
than the second one. These remarks are especially important when one deals
with large portfolios for which the numerical maximization of the pseudo
likelihood becomes particularly tricky (and time consuming). Therefore,
in such a case, the non-parametric estimation of ρ by relation (5.28) is
probably the best method. Then, one can obtain an accurate estimate
of the number ν of degrees of freedom by maximization of the pseudo
likelihood with respect to this single parameter only.
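In practice, this two-stage recipe (Kendall’s τ for ρ̂, then a one-dimensional pseudo-likelihood search for ν̂) is easy to implement. The sketch below assumes that relation (5.28) is the usual elliptical identity ρ = sin(πτ/2) and uses the standard bivariate Student copula density; all function names are ours, not the authors’.

```python
import numpy as np
from scipy import optimize, special, stats

def student_copula_logpdf(u, v, rho, nu):
    """Log-density of the bivariate Student copula at pseudo-observations (u, v)."""
    x, y = stats.t.ppf(u, nu), stats.t.ppf(v, nu)
    q = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    # log of the bivariate t density with shape rho and nu degrees of freedom ...
    log_joint = (special.gammaln((nu + 2.0) / 2.0) - special.gammaln(nu / 2.0)
                 - np.log(nu * np.pi) - 0.5 * np.log(1.0 - rho**2)
                 - (nu + 2.0) / 2.0 * np.log1p(q / nu))
    # ... minus the two univariate t log-densities of the margins
    return log_joint - stats.t.logpdf(x, nu) - stats.t.logpdf(y, nu)

def fit_student_copula(x, y):
    """rho from Kendall's tau (elliptical identity), nu by 1-D pseudo-ML."""
    T = len(x)
    # pseudo-observations rescaled by T/(T+1) to stay away from the boundary
    u = stats.rankdata(x) / (T + 1.0)
    v = stats.rankdata(y) / (T + 1.0)
    tau, _ = stats.kendalltau(x, y)
    rho = np.sin(np.pi * tau / 2.0)           # assumed form of relation (5.28)
    neg_ll = lambda nu: -np.sum(student_copula_logpdf(u, v, rho, nu))
    res = optimize.minimize_scalar(neg_ll, bounds=(2.1, 100.0), method="bounded")
    return rho, res.x
```

On simulated bivariate Student data, both parameters are recovered within their sampling error, in line with the observation above that the τ-based estimate of ρ is hardly improved by the full pseudo maximum likelihood.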
While many procedures exist, we will only focus on maximum likelihood meth-
ods. Among those, two main approaches can be distinguished: the one-step
maximum likelihood estimation and the two-step maximum likelihood esti-
mation.
Given a multivariate distribution function F (x; θ) depending on the vec-
tor of parameters θ ∈ Θ ⊂ Rp , which can be represented as F (x; θ) =
C (F1 (x1 ; θ), . . . , Fn (xn ; θ); θ), and given a sample of size T {(x1 (1), x2 (1),
. . . , xn (1)), . . . , (x1 (T ), x2 (T ), . . . , xn (T ))}, the log-likelihood of the model is
ln L ({x}; θ) = Σ_{i=1}^{T} ln c (F1(x1(i); θ), . . . , Fn(xn(i); θ); θ)
              + Σ_{i=1}^{T} ln f1(x1(i); θ) + · · · + Σ_{i=1}^{T} ln fn(xn(i); θ) ,   (5.38)
where, as usual, c(·; θ) denotes the density of C(·; θ) and the fi (·; θ)’s are
the densities of the marginal distribution function Fi (·; θ)’s. The one-step
maximum likelihood estimator of θ is then θ̂ ∈ arg max_{θ∈Θ} ln L({x}; θ).
where A^{−1}B(A^{−1})^t is the inverse of Godambe’s information matrix,⁶ with
A = [ E[∂β1,β1 ln f1 |β01]      · · ·            0                     0
             ⋮                   ⋱               ⋮                     ⋮
             0                  · · ·   E[∂βn,βn ln fn |β0n]           0
      E[∂β1,α ln c |θ0]         · · ·   E[∂βn,α ln c |θ0]      E[∂α,α ln c |θ0] ]   (5.47)
and
B = Cov[ ∂β1 ln f1 |β01 , · · · , ∂βn ln fn |β0n , ∂α ln c |θ0 ] .   (5.48)
While asymptotically less efficient than the one-step estimator, this approach
has the obvious advantage of reducing the dimensionality of the problem,
which is particularly useful when one has to resort to a numerical maximiza-
tion.
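A minimal sketch of the two-step procedure, with Student t margins chosen purely for illustration: each margin is fitted by maximum likelihood first, and the Gaussian-copula parameter is then estimated on the transformed data in the spirit of (5.60). Function names and distributional choices are ours, not the authors’.

```python
import numpy as np
from scipy import stats

def two_step_gaussian_copula(x1, x2):
    """Two-step ML sketch: (i) fit each margin separately, (ii) estimate the
    copula parameter on the transformed data.

    Student t margins and a Gaussian copula are illustrative choices; the
    second step mimics the moment estimator (5.60) rather than a full
    maximization of the copula likelihood."""
    # Step 1: maximum likelihood fit of each marginal distribution
    params1 = stats.t.fit(x1)
    params2 = stats.t.fit(x2)
    # Step 2: map through the *fitted* marginal cdfs, then to Gaussian scores
    y1 = stats.norm.ppf(stats.t.cdf(x1, *params1))
    y2 = stats.norm.ppf(stats.t.cdf(x2, *params2))
    rho = np.mean(y1 * y2) / np.sqrt(np.mean(y1**2) * np.mean(y2**2))
    return params1, params2, rho
```

With samples of different lengths, step 1 would use each full marginal sample and step 2 only their intersection, as discussed below.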
⁶ Godambe’s information matrix has been introduced in the context of inference functions (or estimating equations) [205, 354].

In practice, one often has to deal with samples of different lengths. This
may occur for instance when considering simultaneously mature and emerging
markets with different lifespans, or market returns together with the returns of
a company which has only recently been introduced on the stock exchange or
which has defaulted, or foreign exchange rates where one of the currencies of
interest has only a short history, such as the Euro. In such a case, the two-step
method is much better than the one-step method. The latter requires using a
data set which is the intersection of all the marginal samples, leading often to
a significant loss of efficiency in the estimation of the parameters of marginal
distributions. In contrast, the two-step method uses the whole set of samples
for the estimation of marginal parameters and restricts to the intersection of
the marginal samples only for the estimation of the parameters of the copula.
This two-step estimator is still consistent and asymptotically Gaussian. Its
asymptotic variance can be derived from (5.47–5.48), by accounting for the
different lengths of the marginal samples (see Patton [380]). While the one-
step estimator still remains asymptotically more efficient than the two-step
estimator, Patton reports that the accuracy of the two-step estimator is much
better than that of the one-step estimator, when the size of the intersection
of the marginal samples is small.
with
Ck(uk | u1, . . . , uk−1) = ∂u1 · · · ∂uk−1 Ck(u1, . . . , uk) / ∂u1 · · · ∂uk−1 Ck−1(u1, . . . , uk−1) ,   (5.50)
are identically, uniformly, and independently distributed. This property has
already been used in Sect. 3.5.2 to provide an algorithm for the generation of
random variables with a given copula C. Thus, testing the null hypothesis is
equivalent to testing that the sample of T vectors
{Cn (ûn(t) | û1(t), · · · , ûn−1(t)) , · · · , C2 (û2(t) | û1(t)) , û1(t)}_{t=1}^{T} ,   (5.51)
is made of independent vectors with uniformly distributed components.
Tests of this i.i.d. uniform hypothesis exist, but their statistical properties
are rather poor [169]. A simpler approach focuses on the discrepancy between
the fitted copula and the null copula on the main diagonal only, by use of the K function:
K(z) = z + Σ_{k=1}^{n−1} (−1)^k (ϕ(z)^k / k!) χk−1(z) ,   (5.54)

χk(z) = ∂z χk−1(z) / ∂z ϕ(z) ,  with  χ0(z) = [∂z ϕ(z)]^{−1} .   (5.55)
∂z ϕ(z)
Section 3.6 has discussed the importance of the Gaussian copula for financial
modeling. We now review the empirical tests of the hypothesis, denoted H0 ,
that the Gaussian copula is the correct description of the dependence between
financial assets. After summarizing the testing procedure developed in [334],
we describe the results.
Let us first derive the test statistics which will allow us to reject or not reject
the null hypothesis H0 . The following proposition, whose proof is given in
Appendix 5.A, can be stated.
Proposition 5.2.1. Assuming that the N -dimensional random vector X =
(X1 , . . . , XN ) with joint distribution function F and marginals Fi , satisfies
the null hypothesis H0, then the variable
Z² = Σ_{i,j=1}^{N} Φ^{−1}(Fi(Xi)) (ρ^{−1})ij Φ^{−1}(Fj(Xj)) ,   (5.56)
where
ρij = Cov[Φ^{−1}(Fi(Xi)), Φ^{−1}(Fj(Xj))] ,   (5.57)
follows a χ²-distribution with N degrees of freedom.
In practice, the marginal distribution functions Fi are estimated by their
empirical counterparts
F̂i(xi) = (1/T) Σ_{k=1}^{T} 1{xi(k) ≤ xi} ,   (5.58)
and the covariance matrix is estimated by
ρ̂ = (1/T) Σ_{i=1}^{T} ŷ(i) · ŷ(i)^t ,   (5.60)
Anderson-Darling:  d3 = max_z |Fz²(z²) − Fχ²(z²)| / √( Fχ²(z²)[1 − Fχ²(z²)] ) ,   (5.64)

average Anderson-Darling:  d4 = ∫ |Fz²(z²) − Fχ²(z²)| / √( Fχ²(z²)[1 − Fχ²(z²)] ) dFχ²(z²) .   (5.65)
The Kolmogorov distance d1 and its average d2 are more sensitive to the de-
viations occurring in the bulk of the distributions. In contrast, the Anderson-
Darling distance d3 and its average d4 are more accurate in the tails of the
distributions. Considering statistical tests based on these four distances is
important in order to cover their different sensitivities. The averaging
introduced in the distances d2 and d4 (which
are simply the average of d1 and d3 respectively) provides important infor-
mation. Indeed, the distances d1 and d3 are mainly controlled by the point
that maximizes the argument within the max(·) function. They can thus be
quite sensitive to the presence of an outlier. By averaging, d2 and d4 become
less sensitive to outliers, since the weight of such points is only of order 1/T
(where T is the size of the sample) while it equals one for d1 and d3 .
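Under the assumption that the empirical cdf is evaluated at the sorted sample points, with the averages d2 and d4 approximated by sample means, the four distances can be computed as follows (function name is ours):

```python
import numpy as np
from scipy import stats

def copula_test_distances(z2, dof):
    """Kolmogorov (d1), average Kolmogorov (d2), Anderson-Darling (d3) and
    average Anderson-Darling (d4) distances between the empirical law of a
    z^2 sample and the chi2(dof) distribution."""
    z2 = np.sort(np.asarray(z2))
    T = len(z2)
    F_emp = np.arange(1, T + 1) / (T + 1.0)   # empirical cdf at the data points
    F_chi = stats.chi2.cdf(z2, dof)
    gap = np.abs(F_emp - F_chi)
    weight = np.sqrt(F_chi * (1.0 - F_chi))   # Anderson-Darling weight
    d1 = gap.max()
    d3 = (gap / weight).max()
    # d2 and d4 are integrals against dF_chi2; under H0 the sample itself is
    # chi2-distributed, so the sample average is a crude approximation
    d2 = gap.mean()
    d4 = (gap / weight).mean()
    return d1, d2, d3, d4
```

By construction d2 ≤ d1 and d4 ≤ d3, which illustrates why the averaged distances downweight a single outlying observation.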
For the usual Kolmogorov and Anderson-Darling distances, the laws of the
empirical counterparts of d1 and d3 are known, at least asymptotically. In
addition, they are free from the underlying distribution. However, for such a result
to hold, one needs to know the exact value of the covariance matrix ρ and
the exact expression of the marginal distribution functions Fi . In the present
case, the variables ẑ 2 (k), given by (5.61), are only pseudo-observations since
their assessment requires the preliminary estimation of the covariance matrix
ρ̂ and the marginal distribution functions F̂i . And, as outlined in [200, 201],
when one considers the empirical process constructed from the pseudo-sample
{ẑ²(k)}_{k=1}^{T}, the limiting behavior is not the same as in the case where one
would actually observe z 2 (k), because there are two extra terms: one due to
the fact that Fi is replaced by F̂i and another one due to the fact that ρ
is replaced by ρ̂. Therefore, one cannot directly use the asymptotic results
known for these standard statistical tests.
As a very simple remedy, one can use a bootstrap method [143], whose
accuracy has been proved to be at least as good as that given by asymptotic
methods used to derive the theoretical distributions [97]. For the assessment
of the asymptotic laws of d2 and d4 , such a numerical study is compulsory
since, even for the true observations z 2 (k), one does not know the expression
of the asymptotic laws. Putting all this together, a possible implementation
of this testing procedure is the following:
1. Given the original sample {x(t)}Tt=1 , generate the pseudo-Gaussian vari-
ables ŷ(t), t ∈ {1, . . . , T } defined by (5.59).
2. Then, estimate the covariance matrix ρ̂ of the pseudo-Gaussian variables
ŷ, which allows one to compute the variables ẑ 2 and then measure the
distance of its estimated distribution to the χ2 -distribution.
3. Given this covariance matrix ρ̂, generate numerically a sample of T bi-
variate Gaussian random vectors with the same covariance matrix ρ̂.
4. For the sample of Gaussian vectors synthetically generated with covari-
ance matrix ρ̂, estimate its sample covariance matrix ρ̃ and its marginal
distribution functions F̃i .
5. To each of the T vectors of the synthetic Gaussian sample, associate the
corresponding realization of the random variable z 2 , called z̃ 2 (t).
6. Construct the empirical distribution for the variable z̃ 2 and measure the
distance between this distribution and the χ2 -distribution.
7. Repeat 10,000 times (for instance) the steps 3 to 6, and then obtain an
accurate estimate of the cumulative distribution of distances between the
distribution of the synthetic Gaussian variables and the theoretical χ2 -
distribution. This cumulative distribution represents the test statistic,
which allows one to reject or not the null hypothesis H0 at a given
significance level.
8. The significance of the distance obtained at step 2 for the true variables –
i.e., the probability to observe, at random and under H0 , a distance larger
than the empirically estimated distance – is finally obtained by simply
reading off the complementary cumulative distribution estimated at
step 7.
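A possible implementation of steps 1–8, restricted for brevity to the Kolmogorov distance d1 and to the T/(T+1)-rescaled empirical marginals; all names are ours and the bootstrap size is illustrative:

```python
import numpy as np
from scipy import stats

def z2_statistic(sample):
    """Steps 1-2 (and 4-5 for synthetic data): pseudo-Gaussian scores,
    estimated covariance matrix, and the z^2 variables."""
    T = len(sample)
    u = stats.rankdata(sample, axis=0) / (T + 1.0)   # empirical marginals
    y = stats.norm.ppf(u)                            # pseudo-Gaussian scores
    rho = (y.T @ y) / T                              # as in (5.60)
    return np.einsum('ti,ij,tj->t', y, np.linalg.inv(rho), y)

def kolmogorov_distance(z2, dof):
    z2 = np.sort(z2)
    F_emp = np.arange(1, len(z2) + 1) / (len(z2) + 1.0)
    return np.abs(F_emp - stats.chi2.cdf(z2, dof)).max()

def gaussian_copula_pvalue(x, n_boot=1000, seed=0):
    """Steps 3-8: bootstrap p-value for H0 (Gaussian copula)."""
    rng = np.random.default_rng(seed)
    T, N = x.shape
    d_obs = kolmogorov_distance(z2_statistic(x), N)
    scores = stats.norm.ppf(stats.rankdata(x, axis=0) / (T + 1.0))
    rho_hat = np.corrcoef(scores, rowvar=False)
    d_boot = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.multivariate_normal(np.zeros(N), rho_hat, size=T)
        d_boot[b] = kolmogorov_distance(z2_statistic(g), N)
    # step 8: fraction of synthetic distances exceeding the observed one
    return np.mean(d_boot >= d_obs)
```

Note that each synthetic sample is re-ranked and its covariance re-estimated, so that the bootstrap distribution reflects the extra variability induced by the pseudo-observations.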
Currencies
The Federal Reserve Board provides access to a large set of historical quotes
of spot foreign exchange rates. Following [334], let us focus on the Swiss Franc,
the German Mark, the Japanese Yen, the Malaysian Ringit, the Thai Baht
and the British Pound during the time interval of ten years from 25 January,
1989 to 31 December, 1998. All these exchange rates are expressed against
the US dollar.
At the 95% significance level, one observes that only 40% (according to d1
and d3 ) but 60% (according to d2 and d4 ) of the tested pairs of currencies are
compatible with the Gaussian copula hypothesis over the entire time interval.
During the first half-period from 25 January, 1989 to 11 January, 1994, 47%
(according to d3 ) and up to about 75% (according to d2 and d4 ) of the tested
currency pairs are compatible with the assumption of the Gaussian copula,
while during the second subperiod from 12 January, 1994 to 31 December,
1998, between 66% (according to d1 ) and about 75% (according to d2 , d3
and d4 ) of the currency pairs remain compatible with the Gaussian copula
hypothesis. These results call for several comments, from both a statistical
and an economic point of view.
We first have to stress that the most significant rejection of the Gaussian
copula hypothesis is obtained for the distance d3 , which is indeed the most
sensitive to the events in the tail of the distributions. The test statistics given
by this distance can indeed be very sensitive to the presence of a single large
event in the sample, so much so that the Gaussian copula hypothesis can
be rejected only because of the presence of this single event (outlier). The
difference between the results given by d3 and d4 (the averaged d3 ) is very
significant in this respect. The case of the German Mark and the Swiss Franc
provides a particularly startling example. Indeed, during the time interval
from 12 January, 1994 to 31 December, 1998, the probability p(d) of non-
rejection is rather high according to d1 , d2 and d4 (p(d) ≥ 31%) while it is
very low according to d3 : p(d) = 0.05%, which should lead to the rejection
of the Gaussian copula hypothesis on the basis of the distance d3 alone. This
discrepancy between the different distances suggests the presence of an outlier
in the sample.
To check this hypothesis, we show in the upper panel of Fig. 5.3 the func-
tion
f3(t) = |Fz²(z²(t)) − Fχ²(z²(t))| / √( Fχ²(z²(t))[1 − Fχ²(z²(t))] ) ,   (5.66)
Fig. 5.3. The upper panel represents the graph of the function f3 (t) defined in
(5.66) used in the definition of the distance d3 for the couple Swiss Franc/German
Mark as a function of time t, over the time intervals from 25 January, 1989 to
11 January, 1994 and from 12 January, 1994 to 31 December, 1998. The two lower
panels represent the scatter plot of the return of the German Mark versus the return
of the Swiss Franc during the two previous time periods. The circled dot, in each
figure, shows the pair of returns responsible for the largest deviation of f3 during
the considered time interval. Reproduced from [332]
Apart from a few large peaks, the statistical fluctuations measured by f3(t)
remain small and of the same order. Removing the contribution of these outlier events in the determination of
d3 , the new statistical significance derived according to d3 becomes similar to
that obtained with d1 , d2 and d4 on each subinterval. From the upper panel
of Fig. 5.3, it is clear that the Anderson-Darling distance d3 is equal to the
height of the largest peak corresponding to the event on 19 August, 1991 for
the first period and to the event on 10 September, 1997 for the second period.
These events are depicted by a circled dot in the two lower panels of Fig. 5.3,
which represent the return of the German Mark versus the return of the Swiss
Franc over the two considered time periods.
The event on 19 August, 1991 is associated with the coup against
Gorbachev in Moscow: the German mark (respectively the Swiss Franc) lost
3.37% (respectively 0.74%) against the US dollar. The 3.37% drop of the German
Mark is the largest daily move of this currency against the US dollar over
the whole first period. On 10 September, 1997, the German Mark appreciated
by 0.60% against the US dollar while the Swiss Franc lost 0.79%, which repre-
sents a moderate move for each currency, but a large joint move. This event is
related to the contradictory announcements of the Swiss National Bank about
its monetary policy, which put an end to a rally of the Swiss Franc along with
the German mark against the US dollar.
Thus, removing the large moves associated with major historical events or
events associated with unexpected incoming information7 – which cannot be
accounted for in a statistical study, unless one relies on a stress-test analy-
sis – we obtain, for d3 , significance levels compatible with those obtained with
the other distances. We can thus conclude that, according to the four dis-
tances, during the time interval from 12 January, 1994 to 31 December, 1998
the Gaussian copula hypothesis cannot be rejected for the couple German
Mark/Swiss Franc.
From an economic point of view, the impact of regulatory mechanisms
between currencies or monetary crises can be well identified by the rejection
or the absence of rejection of the null hypothesis. Indeed, consider the couple
German Mark/British Pound. During the first half period, their correlation
coefficient is very high (ρ = 0.82) and the Gaussian copula hypothesis is
strongly rejected according to the four distances. On the contrary, during
the second half period, the correlation coefficient decreases significantly (ρ =
0.56) and none of the four distances allows us to reject the null hypothesis.
Such non-stationarity can be easily explained. Indeed, on 1 January, 1990,
the British Pound entered the European Monetary System (EMS), so that
the exchange rate between the German Mark and the British Pound was not
allowed to fluctuate beyond a margin of 2.25%. However, due to a strong
speculative attack, the British Pound was devalued in September 1992 and
had to leave the EMS. Thus, between January 1990 and September 1992, the
exchange rate of the German Mark and the British Pound was confined within
a narrow spread, incompatible with the Gaussian copula description. After
1992, the British Pound exchange rate floated with respect to the German
Mark, and the dependence between the two currencies decreased, as shown
by their correlation coefficient. In this latter regime, one can no longer reject
the Gaussian copula hypothesis.
The impact of major crises on the copula can also be clearly identified.
An example is given by the Malaysian Ringit/Thai Baht couple. During the
period from January 1989 to January 1994, these two currencies have only
undergone moderate and weakly correlated fluctuations (ρ = 0.29), so that the
null hypothesis cannot be rejected at the 95% significance level. In contrast,
during the period from January 1994 to October 1998, the Gaussian copula
hypothesis is strongly rejected.

⁷ Modeling the volatility by a mean reverting stochastic process with long memory (the multifractal random walk (MRW)), Sornette et al. [456] have demonstrated the outlier nature of the event on 19 August, 1991.
Stocks
Let us now turn to the description of the dependence properties of the dis-
tributions of daily returns for a diversified set of stocks among the largest
companies quoted on the New York Stock Exchange. We report the results
presented in [334] concerning Appl. Materials, AT&T, Citigroup, Coca Cola,
EMC, Exxon-Mobil, Ford, General Electric, General Motors, Hewlett Packard,
IBM, Intel, MCI WorldCom, Medtronic, Merck, Microsoft, Pfizer, Procter &
Gamble, SBC Communication, Sun Microsystem, Texas Instruments, and Wal
Mart.
The dataset covers the time interval from 8 February, 1991 to 29 December,
2000. At the 95% significance level, 75% of the pairs of stocks are compatible
with the Gaussian copula hypothesis. Over the time subinterval from February
1991 to January 1996, this percentage becomes larger than 99% for d1 , d2
and d4 while it equals 94% according to d3 . Over the time subinterval from
February 1996 to December 2000, 92% of the pairs of stocks are compatible
with the Gaussian copula hypothesis according to d1 , d2 and d4 and more
than 79% according to d3 . Therefore, the Gaussian copula assumption is much
more widely accepted for stocks than it was for the currencies reported above.
In addition, the nonstationarity observed for currencies does not seem very
prominent for stocks.
For the sake of completeness, let us add a word concerning the results of
the tests performed for five stocks belonging to the computer sector: Hewlett
Packard, IBM, Intel, Microsoft, and Sun Microsystem. During the first half period
(from Feb. 1991 to Jan. 1996), all the pairs of stocks are compatible with the Gaussian
copula hypothesis at the 95% significance level. The results are rather differ-
ent for the second half period (from Feb. 1996 to Dec. 2000) since about 40%
of the pairs of stocks reject the Gaussian copula hypothesis according to d1 ,
d2 and d3 . This can certainly be ascribed to the existence of a few shocks,
notably associated with the crash of the “new economy” in March–April 2000
[450]. However, on the whole, it appears that there is no systematic rejection
of the Gaussian copula hypothesis for stocks within the same industrial sector,
notwithstanding the fact that one can expect correlations stronger than the
average between such stocks.
Fig. 5.4. Probability of non-rejection of the Gaussian copula hypothesis when the
true copula is given by a Student copula with ν degrees of freedom and a correlation
coefficient equal to ρ (error of type II: “false negative”), when the error of type I
(“false positive”) of the test is set equal to 5%, for the four distances d1 –d4
These results, however, point to the weak sensitivity of the test in the extreme
regions of the copula. It is therefore important to discuss the sensitivity of the
test presented in Sect. 5.2.1 and to review the other alternatives proposed in the literature.
The previous section has found that the Gaussian copula provides a reasonably
good model, in the sense that it cannot be rejected by a statistical test at the
95% significance level. However, could this be due to the lack of power of the
statistical test rather than to the goodness of the Gaussian copula?
Let us denote by Hν,ρ the hypothesis that the true copula of the data is
the Student copula with ν degrees of freedom and correlation coefficient ρ.
Considering the alternative hypothesis Hν,ρ , one needs to know the probability
that one cannot reject the null hypothesis H0 when the true model is Hν,ρ .
A complementary piece of information is the minimum p-value (significance
level) of the test allowing one to reject the Gaussian copula hypothesis, for
instance, 95 times out of 100 when the true copula is the Student copula.
Answering these questions on the power of the test requires a numerical study.
Figure 5.4 shows the minimum p-value of the test, denoted by p95% , as
a function of the (inverse of the) number of degrees of freedom ν and of the
correlation coefficient ρ of the true Student copula. Overall, the four tests asso-
ciated with the four different distances d1 –d4 behave similarly. As expected,
for large ν, namely ν ≳ 10–20 (1/ν ≲ 0.05–0.1), a very high p-value is
required to reject the Gaussian hypothesis. In such a case, it is almost impos-
sible to distinguish between the Gaussian hypothesis and a Student copula
for most realizations. If one leaves out distance d3 , the power of the tests
is almost independent of the value of the correlation coefficient. For d3 , the
power is clearly weaker for the smallest correlations.
In the light of these results on the performance of the tests, the previous
conclusion on the relevance of the Gaussian copula for the modeling of the
dependence between financial risks must be reconsidered. Concerning curren-
cies, the non-rejection of the Gaussian copula hypothesis does not exclude
at the 95% significance level that the dependence of the currency pairs may
be described by a Student copula with adequate values of ν and ρ. For the
German Mark/Swiss Franc pair, a Student copula with about five degrees of
freedom was found to yield the same p-values [334]. For the correlation coef-
ficient ρ = 0.92 of the German Mark/Swiss Franc pair, Student’s copula with
five degrees of freedom predicts a tail dependence coefficient λ5 (0.92) = 63%,
in contrast with a zero value for the Gaussian copula. Such a large value of
λ5 (0.92) implies that, when an extreme event occurs for the German Mark, it
also occurs for the Swiss Franc with a frequency of 63%. Therefore, a stress
scenario based on the assumption of a Gaussian copula would fail to account
for such coupled extreme events, which may represent as many as two-thirds
of all extreme events, if the true copula turned out to be Student’s
copula with five degrees of freedom. Note that, with such a large value of the
correlation coefficient, the tail dependence remains high even if the number
of degrees of freedom is as large as 20 or more (see Fig. 4.8).
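The values quoted here can be reproduced from the standard closed-form expression for the tail dependence coefficient of the bivariate Student copula, λν(ρ) = 2 t̄ν+1(√((ν+1)(1−ρ)/(1+ρ))), where t̄ν+1 denotes the survival function of the Student distribution with ν+1 degrees of freedom; the function name below is ours.

```python
import numpy as np
from scipy import stats

def student_tail_dependence(nu, rho):
    """Tail dependence coefficient of the bivariate Student copula:
    lambda_nu(rho) = 2 * S_{nu+1}( sqrt((nu+1)(1-rho)/(1+rho)) ),
    where S_{nu+1} is the Student survival function with nu+1 dof."""
    threshold = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.sf(threshold, nu + 1.0)
```

For instance, student_tail_dependence(5, 0.92) ≈ 0.63, consistent with the figure quoted above; as ν → ∞ the coefficient vanishes, recovering the Gaussian case.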
The Swiss Franc and Malaysian Ringit pair offers a very different case. For
instance, during the time period from January 1994 to December 1998, the test
statistics are so high that the description of the dependence with Student’s
copula would require it to have at least 7–10 degrees of freedom. In addition,
the correlation coefficient of the two currencies is only ρ = 0.16, so that, even
in the most pessimistic situation ν = 7, the choice of the Gaussian copula
would amount to neglecting the tail dependence coefficient λ7 (0.16) = 4%
predicted by Student’s copula. In this case, stress scenarios based on the
Gaussian copula would predict uncoupled extreme events, which would be
wrong only once in 25 times.
These two examples highlight the fact that the correlation coefficient is at
least as important a parameter as the number of degrees of freedom of
Student’s copula needed to describe the data.
Breymann and his co-authors [83] have shown that the dependence struc-
ture of the German Mark/Japanese Yen couple is better described by Stu-
dent’s copula with about six degrees of freedom (for daily returns) than with
a Gaussian copula, the latter being the second best copula among a set of
five copulas comprising Clayton’s, Gumbel’s, and Frank’s copulas. This re-
sult refines those obtained in [334] and is in line with the results obtained
by non-parametric and semiparametric estimation shown in Figs. 5.1–5.2. In
addition, Student’s copula is found to provide an even better description when
one considers FX returns calculated at smaller time scales [83]. Indeed, the
Student copula seems to provide a reliable model for FX returns calculated
for time scales larger than 2 hours. The number of degrees of freedom is found
to increase with the time scale, from four at the 2-hour time scale to six at
the daily time scale. Such a result is expected since, under time aggrega-
tion, the distribution of returns should converge to the Gaussian distribution
according to the central limit theorem, therefore the dependence structure
of the returns is expected to also converge toward the Gaussian copula at a
large time scale. At time scales smaller than 2 hours, the study by Breymann
et al. shows that neither the Gaussian nor the Student copulas are sufficient
to describe the dependence structure of the distributions of FX returns. At
these small time scales, microstructural effects probably come into play and
require more elaborate copulas to model the dependences observed at very
high frequencies.
In addition, for all time scales, the copula of the bivariate excess returns
for high (or low) threshold appears to be best described by Clayton’s (or by
the survival Clayton) copula. This result suggests the following interpretation
of the existence of concomitant extremes. Assume that, conditional on
a frailty random variable representing the information flow, the asset returns
are independent. The copula of the returns of the two assets then exhibits the
behavior reported in the study by Breymann et al. [83] if one assumes that
the random variable representing the information flow has a regularly varying
distribution – which means that pieces of information with great impact on
asset returns arrive relatively often.
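This frailty mechanism can be made concrete with the classical Marshall–Olkin construction: conditional on a Gamma-distributed frailty W (the “information flow”), the two uniform variables below are independent, yet unconditionally their copula is Clayton’s. The code is ours, for illustration only.

```python
import numpy as np

def clayton_via_frailty(theta, n, seed=0):
    """Simulate n pairs from Clayton's copula (parameter theta > 0) by the
    Marshall-Olkin frailty construction: conditional on W, the two
    coordinates are independent; marginalizing over the Gamma frailty W
    creates the Clayton dependence."""
    rng = np.random.default_rng(seed)
    w = rng.gamma(1.0 / theta, size=n)        # common frailty ("information flow")
    e = rng.exponential(size=(n, 2))          # idiosyncratic exponential shocks
    return (1.0 + e / w[:, None]) ** (-1.0 / theta)
```

A quick consistency check is Kendall’s τ, which equals θ/(θ + 2) for Clayton’s copula (e.g. τ = 1/2 for θ = 2).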
In contrast with the case of foreign exchange rates, the estimated number of
degrees of freedom of the Student copula best fitting the dependence between
stocks is larger. To test whether the Student copula significantly improves on
the Gaussian copula, one can consider the likelihood ratio statistic
ΛT = −2 ln [ L̃1(θ̂1) / L̃2(θ̂2) ] ,   (5.68)
which
is asymptotically distributed as a χ²-variable with one degree of freedom if dim θ1 =
dim θ 2 −1, up to a scale factor 1+γ larger than one due to the use of a pseudo
maximum-likelihood instead of the true maximum likelihood:
ΛT → (1 + γ) χ²1 in law, as T → ∞ .   (5.69)
The positive parameter γ depends on the choice of the model and can be
determined by numerical simulations. In more general cases where dim θ2 −
dim θ1 = m > 1, ΛT does not follow an asymptotic χ²m-distribution with m
degrees of freedom as in standard tests of nested hypotheses. This results from
the fact that the log-likelihood ratio statistic does not converge to a χ2 distri-
bution when the model is misspecified, which is the relevant situation when
using the pseudo likelihood instead of the true likelihood (see Appendix 5.B).
In such a case, the Wald or Lagrange multiplier tests are more appropriate
[209, 372].
While these results improve somewhat on the initial study [334], in con-
trast with the case of currencies, one can question the existence of a real
improvement brought by the Student copula to describe the dependence be-
tween stocks. Indeed, correlation coefficients between two stocks are hardly
greater than 0.4–0.5, so that the tail dependence of a Student copula with
11–12 degrees of freedom is about 2.5% or less. In view of all the differ-
ent sources of uncertainty during the estimation process in addition to the
possible non-stationarity of the data, one can doubt that such a description
eventually leads to concrete improvements for practical purposes. To highlight
this point, let us consider several portfolios made of 50% of the Standard &
Poor’s 500 index and 50% of one stock (whose name is indicated in the first
column of Table 5.1). Let us then estimate the probability Pr that this portfo-
lio incurs a loss larger than n times its standard deviation (n = 2, . . . , 5). For
the same portfolio, let us estimate the probability Pg (resp. Ps ) that it incurs
the same loss ( i.e., n times its standard deviation) when the dependence be-
tween the index and the stock is given by a Gaussian copula (resp. a Student
5.3 Limits of the Description in Terms of the Gaussian Copula 217
copula with ten degrees of freedom). The row named Pr /Pg/s gives the aver-
age values of Pr /Pg and Pr /Ps over the 20 portfolios. For shocks of two- and
three-standard deviations, the values of Pr /Pg close to 1 indicate that the
dependence structure is correctly captured by a Gaussian copula. For shocks
of four- and five-standard deviations, Pr /Pg becomes larger than 1, showing
that large shocks are more probable than predicted by the Gaussian depen-
dence, and all the more so, the larger the amplitude of the shocks. This occurs
notwithstanding the use of marginals with heavy tails, suggesting the effect of
a non-zero tail dependence in the true data. In contrast, the values of Pr /Ps
are significantly smaller than 1 showing that the Student copula overestimates
the frequency of large shocks. In addition, this overestimation is surprisingly
worse for larger shocks (by as much as a factor of 2.5) in the range in which the
Gaussian copula becomes less adequate. This suggests that the tail depen-
dence of the Student copula is too large to describe this data set. This simple
exercise illustrates that neither the Gaussian copula nor a Student copula with
a reasonable number of degrees of freedom provide an accurate description of
the dependence between stock returns.⁸ The discrepancies between these two
models and the real dependence structure become all the more important,
the more extreme the amplitude of the shock. And in fact, the situation is
worse for the Student copula. This suggests that, for practical applications,
Student’s copula may not provide a real improvement with respect to the
Gaussian copula for traditional portfolio management.
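The comparison between Pr, Pg and Ps can be reproduced in spirit by Monte Carlo: couple two heavy-tailed margins with either copula, form the half-and-half portfolio, and count exceedances of n standard deviations. All parameter choices below (t4 margins, ρ = 0.4, ν = 10 for the Student copula) are illustrative, not the calibration used in Table 5.1.

```python
import numpy as np
from scipy import stats

def loss_probability(copula, rho, n, nu_copula=10, nu_margin=4, seed=0):
    """P[R <= -k*sigma], k = 2..5, for a half-and-half two-asset portfolio
    with Student t(nu_margin) margins coupled by a Gaussian or Student copula."""
    rng = np.random.default_rng(seed)
    g = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    if copula == "student":
        # a common chi factor turns the Gaussian pair into a Student pair
        g = g / np.sqrt(rng.chisquare(nu_copula, size=n) / nu_copula)[:, None]
        u = stats.t.cdf(g, nu_copula)
    else:
        u = stats.norm.cdf(g)
    x = stats.t.ppf(u, nu_margin)              # heavy-tailed margins
    r = 0.5 * x[:, 0] + 0.5 * x[:, 1]
    r = (r - r.mean()) / r.std()               # express losses in units of sigma
    return {k: float(np.mean(r <= -k)) for k in (2, 3, 4, 5)}
```

With identical margins, the Student copula produces visibly more four- and five-sigma portfolio losses than the Gaussian copula, the mechanism behind the Ps > Pg pattern in Table 5.1.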
The aforementioned studies have taken into account only partially, if at all,
the well-known volatility clustering phenomenon, which certainly impacts
the dependence properties of asset returns. This issue has been addressed
by Patton [380], who has shown that the two-step maximum likelihood es-
timation can be extended to conditional copulas to account for the time-
varying nature of financial time series. Filtering marginal data by a GARCH
process, Patton has shown that the conditional dependence structure between
exchange rates (Japanese Yen against Euro) is better described by Clayton’s
copula than by the Gaussian copula. We also note that Muzy et al. [366] have
constructed a multivariate “multifractal” process to account for both volatil-
ity clustering and the dependence between assets. In this case, the conditional
copula is (nearly) Gaussian.
The main limitation of Patton’s approach comes from the fact that fil-
tering the data does not leave the dependence structure, i.e., the copula,
unchanged. Thus, the copula of the residuals is not the same as the cop-
ula of the raw returns. Moreover, the copula of the residuals changes with
8 This point confirms the doubts raised by the comparison of the nonparametric and the semiparametric estimates of the density of the copula of the daily returns of General Motors and Procter & Gamble, represented in Figs. 5.1-5.2.
Table 5.1. Portfolios made of 50% of the Standard & Poor's 500 index and 50% of one stock (whose name is indicated in the first column) are considered

                                          100 × Pr[R ≤ −n · σ]
                                     n = 2            n = 3            n = 4            n = 5
                                 Pr   Pg   Ps     Pr   Pg   Ps     Pr   Pg   Ps     Pr   Pg   Ps
Abbott Labs                     2.07 2.06 2.62   0.58 0.51 0.84   0.21 0.15 0.39   0.09 0.07 0.26
American Home Products Corp.    1.98 2.07 2.72   0.51 0.56 0.98   0.30 0.24 0.43   0.17 0.13 0.22
Boeing Co.                      2.03 1.96 2.50   0.53 0.51 0.95   0.21 0.18 0.44   0.13 0.09 0.19
Bristol-Myers Squibb Co.        1.56 1.81 2.33   0.55 0.48 0.98   0.26 0.22 0.81   0.11 0.10 0.42
Chevron Corp.                   1.94 1.99 2.26   0.40 0.42 0.88   0.13 0.15 0.55   0.08 0.07 0.30
Du Pont (E.I.) de Nemours & Co. 2.13 2.02 2.59   0.51 0.47 0.87   0.21 0.19 0.58   0.09 0.07 0.32
Disney (Walt) Co.               1.83 1.87 2.40   0.47 0.53 1.28   0.24 0.22 0.73   0.15 0.12 0.43
General Motors Corp.            1.73 1.95 2.12   0.45 0.42 0.76   0.21 0.13 0.59   0.08 0.06 0.36
Hewlett-Packard Co.             1.77 2.08 2.54   0.53 0.51 0.99   0.21 0.19 0.44   0.08 0.09 0.15
Coca-Cola Co.                   1.60 1.83 2.13   0.45 0.50 0.77   0.19 0.18 0.58   0.09 0.07 0.46
Minnesota Mining & MFG Co.      1.85 2.01 2.23   0.57 0.49 0.80   0.19 0.19 0.60   0.08 0.09 0.52
Philip Morris Cos Inc.          2.00 2.07 2.33   0.45 0.50 1.10   0.21 0.19 0.65   0.13 0.12 0.34
Pepsico Inc.                    1.92 2.08 2.50   0.51 0.49 0.83   0.15 0.18 0.39   0.15 0.07 0.22
Procter & Gamble Co.            1.51 1.67 2.05   0.45 0.48 0.95   0.24 0.21 0.82   0.13 0.09 0.67
Pharmacia Corp.                 1.81 1.94 2.69   0.53 0.54 1.06   0.23 0.25 0.80   0.11 0.12 0.45
Schering-Plough Corp.           1.85 1.94 2.01   0.49 0.44 0.73   0.11 0.14 0.58   0.08 0.06 0.31
Texaco Inc.                     1.90 1.94 2.77   0.55 0.55 1.01   0.28 0.23 0.41   0.11 0.11 0.21
Texas Instruments Inc.          1.87 2.02 2.09   0.49 0.50 0.89   0.21 0.15 0.66   0.06 0.07 0.16
United Technologies Corp        2.17 2.10 2.28   0.47 0.45 0.78   0.17 0.14 0.47   0.11 0.06 0.30
Walgreen Co.                    1.81 1.96 2.28   0.47 0.41 0.92   0.23 0.14 0.40   0.09 0.08 0.21
Mean Pr/Pg and Pr/Ps                 0.95 0.79        1.02 0.55        1.15 0.39        1.24 0.38

We estimate the probability Pr that each portfolio incurs a loss larger than n times its standard deviation (n = 2, . . . , 5). For each portfolio, we also estimate the probability Pg (resp. Ps) that it incurs the same loss (i.e., n times its standard deviation) when the dependence between the index and the stock is given by a Gaussian copula (resp. a Student copula with ten degrees of freedom). The last row gives the average values of Pr/Pg and Pr/Ps over the 20 portfolios.
the chosen filter. Residuals are not the same when one filters the data with
an ARCH, a GARCH or a Multifractal Random Walk. In addition, for an
arbitrage-free market, the (multivariate) log-price process can be expressed
as a time changed multivariate Brownian motion9 [264], so that conditional
on the (realized) volatility [8, 38], the log-price process is nothing but a mul-
tivariate Brownian motion. As a consequence, conditional on the volatility,
the multivariate distribution of returns should be Gaussian, and, therefore,
the copula of conditional returns should also be the Gaussian copula. Thus,
the estimation of the conditional copula does not really bring new insights. In
fine, the discrepancy between the Gaussian copula and the conditional copula
provided by some other model mainly highlights the weakness of the model
under consideration. This raises the question whether performing a model-free
analysis (without any pre-filtering process) is not a more satisfying alterna-
tive. Obviously, the price to pay for such a model-free approach is a weakening
of the power of the statistical test due to the presence of (temporal) depen-
dence between data. There is no free lunch, neither on financial markets, nor
in statistics.
5.4 Summary
The Gaussian paradigm has had a long life in finance. While it is now clear that
marginal distributions cannot be described by Gaussian laws, especially in
their tails (see Chap. 2), the dependence structure between two or more assets
is much less known, and nothing suggests rejecting a priori the Gaussian copula
as a correct description of the observed dependence structure. In addition,
the Gaussian copula can be derived in a very natural way from a principle of
maximum entropy [265, 453].10 The Gaussian copula also has the advantage
of being the simplest possible one in the class of elliptical copulas, since it
is entirely specified by the knowledge of the correlation coefficients while, for
instance, Student’s copula requires in addition the specification of the number
of degrees of freedom. This has led to taking the Gaussian copula as a logical
starting point for the study of the dependence structure between financial
assets.
However, as recalled in Chap. 3, if the Gaussian and Student copulas are
very similar in their bulk, they become significantly different in their tails.
9 More precisely, in an arbitrage-free market, any n-dimensional square-integrable log-price process ln p(t), with continuous sample paths, satisfies

   rτ(t) = ln p(t + τ) − ln p(t) = ∫_t^{t+τ} µ(s) ds + ∫_t^{t+τ} σ(s) dW(s) ,

where W denotes a (multivariate) Wiener process.
Concretely, the essential difference between the Gaussian and Student copu-
las is that the former has independent extremes (in the sense of the asymptotic
tail dependence; see Chap. 4), while the latter generates concomitant extremes
with a non-zero probability which is all the larger, the smaller is the number
of degrees of freedom and the larger is the correlation coefficient. Thus, by
providing a slight departure from the Gaussian copula in the bulk of the dis-
tributions, Student’s copula could also be a good candidate to model financial
dependencies. It turns out that it is indeed a good model for foreign exchange
rates. The situation is not so clear for stock returns, as Student's copula does
not seem to perform significantly better than the Gaussian copula, both be-
ing apparently approximations of the true copula. From a practical point of
view, there have been several efforts to find better copulas, but the obtained
gains are not clear. From an economic point of view, the reasons explaining
the difference between the dependence structure of the FX rates and that of the stock
returns remain to be found. The differences in the organization of stock markets and FX
markets can be seen as an obvious reason, but direct links between market organization and return distributions or copulas have not yet
been clearly articulated.
One of the motivations in introducing the tail dependence coefficient λ is
to quantify the potential risks incurred in modeling the dependence structure
between assets with Gaussian copulas, for which λ = 0. Indeed, for assets with
large correlation coefficients, it may be dangerous to use Gaussian copulas as
long as one does not have a better idea of the value of the tail dependence
coefficient. Parametric models do not provide readily this information since
they fix the tail dependence coefficient and therefore do not provide an inde-
pendent test of whether λ is small (and indistinguishable from 0) or large.
To get further insight, nonparametric methods could thus be useful.
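For the bivariate Student copula, λ is known in closed form (Chap. 4): λ = 2[1 − T_{ν+1}(√((ν+1)(1−ρ)/(1+ρ)))], where T_{ν+1} denotes the Student distribution function with ν+1 degrees of freedom. A quick numerical sketch of this formula, with illustrative parameter values:

```python
import numpy as np
from scipy import stats

def student_tail_dependence(rho: float, nu: float) -> float:
    """Tail dependence coefficient of a bivariate Student copula:
    lambda = 2 * (1 - T_{nu+1}(sqrt((nu + 1) * (1 - rho) / (1 + rho)))).
    It vanishes only in the Gaussian limit nu -> infinity."""
    z = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.sf(z, df=nu + 1.0)

# lambda increases when nu decreases or rho increases, as stated in the text
for nu in (3, 10, 100):
    print(nu, student_tail_dependence(0.5, nu))
```

This makes concrete the statement that a parametric model fixes λ: once ν and ρ are chosen, the strength of concomitant extremes is imposed rather than estimated.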
Nonparametric models have the advantage of being much more general
since, by construction, they do not assume a specific copula and might thus
allow for an independent determination of the tail dependence coefficient.
Some of these methods have the advantage of leading to estimated copulas
which are smooth and differentiable everywhere, which is convenient for the
generation of random variables having the estimated copula, for sensitivity
analysis and for the generation of synthetic scenarios [149]. However, this
advantage comes with the main drawback that the tail dependence coefficient
vanishes by construction. In sum, all methods mentioned until now suffer from
the same problem of neglecting concomitant extremes. It thus seems that the
use of copulas is not the easiest path to calibrate extreme events. We address
this problem in the next chapter, in particular by describing direct methods
for estimating extreme concomitant events.
Appendix
5.A Proof of the Existence of a χ2 -Statistic
for Testing Gaussian Copulas
To prove proposition 5.2.1, we first consider an n-dimensional random vector
X = (X1 , . . . , Xn ). Let us denote by F its distribution function and by Fi
the marginal distribution of each Xi . Let us now assume that the distribution
function F satisfies H0 , so that F has a Gaussian copula with correlation ma-
trix ρ while the Fi ’s can be any distribution functions. According to Theorem
3.2.1, the distribution F can be represented as:

   F(x1, . . . , xn) = Φρ,n(Φ⁻¹(F1(x1)), . . . , Φ⁻¹(Fn(xn))) .  (5.A.1)
Let us now transform the Xi ’s into Normal random variables Yi ’s:
   Yi = Φ⁻¹(Fi(Xi)) .  (5.A.2)

Since the mapping Φ⁻¹(Fi(·)) is increasing, the invariance Theorem 3.2.2 allows us to conclude that the copula of the variables Yi's is identical to the
copula of the variables Xi ’s. Therefore, the variables Yi ’s have Normal mar-
ginal distributions and a Gaussian copula with correlation matrix ρ. Thus, by
definition, the multivariate distribution of the Yi ’s is the multivariate Gaussian
distribution with correlation matrix ρ:
G(y) = Φρ,n (Φ−1 (F1 (x1 )), . . . , Φ−1 (Fn (xn ))) (5.A.3)
= Φρ,n (y1 , . . . , yn ) , (5.A.4)
and Y is a Gaussian random vector. From (5.A.3–5.A.4), we have
ρij = Cov[Φ−1 (Fi (Xi )), Φ−1 (Fj (Xj ))] . (5.A.5)
Consider now the random variable

   Z² = Yᵗ ρ⁻¹ Y = Σ_{i,j=1}^n Yi (ρ⁻¹)ij Yj ,  (5.A.6)
where (·)ᵗ denotes the transpose operator. It is well known that the variable
Z 2 follows a χ2 -distribution with n degrees of freedom. Indeed, since Y is
a Gaussian random vector with covariance matrix11 ρ, it follows that the
components of the vector
Ỹ = AY , (5.A.7)
are independent Normal random variables. Here, A denotes the square root of the matrix ρ⁻¹, obtained by the Cholesky decomposition, so that AᵗA = ρ⁻¹. Thus, the sum ỸᵗỸ = Z² is the sum of the squares of n independent
Normal random variables, which follows a χ2 -distribution with n degrees of
freedom.
11 Up to now, the matrix ρ was called the correlation matrix. But in fact, since the variables Yi's have unit variance, their correlation matrix is also their covariance matrix.
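The construction above is easy to check numerically. The sketch below (an illustrative 3 × 3 correlation matrix and exponential margins, both assumptions of this example) simulates data with a Gaussian copula, maps each coordinate through Φ⁻¹(Fi(·)), and verifies that Z² = Yᵗρ⁻¹Y has the first two moments of a χ² distribution with n degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, T = 3, 100_000
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Data with a Gaussian copula (correlation rho) and exponential margins
y = rng.multivariate_normal(np.zeros(n), rho, size=T)
x = stats.expon.ppf(stats.norm.cdf(y))        # X_i = F_i^{-1}(Phi(Y_i))

# Mapping back through the margins, Y_i = Phi^{-1}(F_i(X_i)), recovers
# Gaussian variables with correlation matrix rho
y_rec = stats.norm.ppf(stats.expon.cdf(x))

# Z^2 = Y^t rho^{-1} Y should follow a chi2 with n degrees of freedom
z2 = np.einsum('ti,ij,tj->t', y_rec, np.linalg.inv(rho), y_rec)
print(z2.mean(), z2.var())   # chi2_n has mean n and variance 2n
```

Note that the margins Fi play no role in the distribution of Z², which is the content of Proposition 5.2.1: the statistic tests the copula only.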
Let us consider the iid sample {(x1 (1), x2 (1), . . . , xn (1)), . . . , (x1 (T ), x2 (T ),
. . . , xn (T ))} drawn from the n-dimensional distribution F with copula C and
margins Fi . We aim at estimating the unknown copula C by use of the semi-
parametric method presented in Sect. 5.1.2. Its pseudo likelihood reads
   ln L̃T = Σ_{i=1}^T ln c(û1(i), . . . , ûn(i); θ) ,  (5.B.8)
with ûk (i) = F̂k (xk (i)), where the F̂i ’s are the empirical estimates of the
marginal distribution functions Fi ’s, and c(·; θ) denotes the copula density
Cθ , θ ∈ Θ ⊂ Rp . The parameter vector θ can be estimated by maximization
of this pseudo log-likelihood, so that
where

   Wki(Uk) = ∫_{u∈[0,1]^{dim θ}} 1{Uk ≤ uk} [∂² ln c(u; θ)/∂θi ∂ui]_{θ=θ0} dC(u; θ0) .  (5.B.13)
where

   hT(θ) = (1/T) Σ_{i=1}^T ∇θ ln c(û(i); θ) .  (5.B.15)
   [ÃT(θ)]ij = (1/T) Σ_{k=1}^T ∂²_{θiθj} ln c(û(k); θ) .  (5.B.17)
Proposition A.1 in [197] provides a generalized form of the law of large num-
bers for functionals of rank statistics, so that
   ÃT(θ) →a.s. E[∂²_{θiθj} ln c(U; θ)]|_{θ=θ0} = −I(θ0) ,  (5.B.18)
where I(θ0) denotes Fisher's information matrix (5.B.11). Evaluating (5.B.16) at θ = θ̂T, one finally obtains

   √T · hT(θ0) = √T · I(θ0)(θ̂T − θ0) + op(1) ,  (5.B.19)
as usual.
Proposition A.1 in [197] also states a generalized form of the central limit
theorem for functionals of rank statistics, which allows one to write
   √T · hT(θ0) → N(0, Γ(θ0)) ,  (5.B.20)

where Γ(θ0) = I(θ0) + Ω. Then, (5.B.19-5.B.20) allow us to conclude that
   √T · (θ̂T − θ0) → N(0, Σ²) ,  (5.B.21)

where Σ² stands for I(θ0)⁻¹ + I(θ0)⁻¹ Ω I(θ0)⁻¹.
Since Ω is a positive definite matrix, the variance of the estimator θ̂ T is
larger than it would be, were the marginal distributions Fi perfectly known.
Indeed, in such a case, the variance of the estimator would be nothing but the inverse of Fisher's information matrix, I(θ0)⁻¹.
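As a concrete illustration of the two-step procedure, the sketch below estimates the correlation parameter of a Gaussian copula by pseudo maximum likelihood: empirical margins first (pseudo-observations built from ranks), then the copula parameter. For the Gaussian copula, the pseudo-likelihood maximization is well approximated by the sample correlation of the normal scores, a common simplification adopted here; all numerical settings are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, rho_true = 5000, 0.6
cov = [[1.0, rho_true], [rho_true, 1.0]]

# Simulate "returns" with a Gaussian copula and heavy-tailed (t3) margins
z = rng.multivariate_normal([0.0, 0.0], cov, size=T)
x = stats.t.ppf(stats.norm.cdf(z), df=3)

# Step 1: pseudo-observations u_hat(i) = rank / (T + 1) (empirical margins)
u = stats.rankdata(x, axis=0) / (T + 1.0)

# Step 2: Gaussian-copula pseudo ML, approximated by the correlation
# of the normal scores Phi^{-1}(u_hat)
y = stats.norm.ppf(u)
rho_hat = float(np.corrcoef(y.T)[0, 1])
print(rho_hat)  # close to rho_true despite the unknown, fat-tailed margins
```

The extra variance term I(θ0)⁻¹ Ω I(θ0)⁻¹ in (5.B.21) is the statistical price of replacing the true margins Fi by their empirical counterparts in Step 1.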
Now, let us write the vector θ of parameters as follows:
   θ = (θ1, θ2)ᵗ ,  (5.B.22)
with dim θ1 = d and dim θ2 = p − d. We would like to test the null hypothesis according to which θ1 = θ01, i.e., H0 = {θ ∈ Θ : θ1 = θ01}. In Mashal and Zeevi's approach [350], this amounts to testing H0 = {(ν, Σ); ν = ∞}, for which standard likelihood ratio theory would assert

   ΛT = 2 [L̃T(θ̂T) − L̃T(θ̂0T)] → χ²d ,  (5.B.23)

where χ²d denotes the χ² distribution with d degrees of freedom (see Chap. 2, Sect. 2.4.4).
Unfortunately, this test does not apply with the pseudo likelihood, as pre-
viously assumed [209, 372, 491]. Actually, expanding the pseudo log-likelihood
(5.B.8) around θ0 and accounting for (5.B.19), we obtain

   L̃T(θ̂T) = L̃T(θ0) + (T/2) (θ̂T − θ0)ᵗ I(θ0) (θ̂T − θ0) + op(1) .  (5.B.24)

Denoting by θ̂0T the pseudo maximum likelihood estimator under the null hypothesis (i.e., assuming θ1 = θ01):
   θ̂0T = arg max_{θ∈H0} Σ_{i=1}^T ln c(û(i); θ) ,  (5.B.25)
The notation

   θ̂0T = (θ01, θ̂02,T)ᵗ ,  (5.B.28)
where the last equality uses the fact that each term is a scalar and is thus
equal to its transpose.
Substituting (5.B.19) in (5.B.26) yields

   (1/√T) · (∇θ1 L̃T(θ̂0T), 0)ᵗ = √T · I(θ0)(θ̂T − θ̂0T) + op(1)  (5.B.29)

and, left-multiplying by √T · (θ̂0T − θ0)ᵗ = √T · (0, θ̂02,T − θ02)ᵗ, one shows that

   T (θ̂0T − θ0)ᵗ I(θ0)(θ̂T − θ̂0T) = op(1) ,  (5.B.30)
we have:

   ΛT = (1/T) (∇θ1 L̃T(θ̂0T), 0)ᵗ I(θ0)⁻¹ (∇θ1 L̃T(θ̂0T), 0) + op(1)  (5.B.33)
      = T⁻¹ · ∇θ1 L̃T(θ̂0T)ᵗ [I⁻¹]₁₁ ∇θ1 L̃T(θ̂0T) + op(1) ,  (5.B.34)

where [I⁻¹]₁₁ denotes the d × d submatrix made of the d first rows and columns of the inverse of I(θ0).
From (5.B.29) again, we have

   (1/√T) · [I⁻¹]₁₁ ∇θ1 L̃T(θ̂0T) = √T (θ̂1,T − θ01) + op(1) ,  (5.B.35)

so that
   ΛT = T⁻¹ · ∇θ1 L̃T(θ̂0T)ᵗ [I⁻¹]₁₁ ∇θ1 L̃T(θ̂0T)
      = T · (θ̂1,T − θ01)ᵗ ([I⁻¹]₁₁)⁻¹ (θ̂1,T − θ01) + op(1) .  (5.B.36)
As a consequence,
   ΛT ↛ χ²d , as T → ∞ ,  (5.B.40)

unless

   B ([I⁻¹]₁₁)⁻¹ B = Idd ,  (5.B.41)
which holds when Ω = 0, for instance. Therefore, when one resorts to the pseudo likelihood instead of the actual likelihood, the asymptotic distribution of ΛT is not a simple χ² distribution and the log-likelihood ratio test becomes impracticable. In the particular case where dim θ1 = 1, as in [350], B ([I⁻¹]₁₁)⁻¹ B is a scalar, so that ΛT follows a χ² distribution with one degree of freedom, up to a scale factor.
6
Measuring Extreme Dependences
should be used with caution since conditioning alone induces a change in the
dependence structure which has nothing to do with a genuine change of un-
conditional dependence. In this respect, owing to its stability, the coefficient of tail
dependence should be preferred to the conditional correlations. Moreover,
the various measures of dependence exhibit different and sometimes opposite
behaviors, showing that extreme dependence properties possess a multidimen-
sional character that can be revealed in various ways.
As an illustration, the theoretical results and their interpretation presented
below are applied to the controversial contagion problem across Latin Amer-
ican markets during the turmoil periods associated with the Mexican crisis
in 1994 and with the Argentinean crisis that started in 2001. The analysis of
several measures of dependence between the Argentinean, Brazilian, Chilean
and Mexican markets shows that the above conditioning effect does not fully
explain the behavior of the Latin American stock indexes, confirming the ex-
istence of a possible genuine contagion. Our analysis below suggests that the
1994 Mexican crisis has spread over to Argentina and Brazil through conta-
gion mechanisms and to Chile only through co-movements. Concerning the
recent Argentinean crisis that started in 2001, no evidence of contagion to the
other Latin American countries (except perhaps in the direction of Brazil)
can be found but significant co-movements are identified.
The chapter is organized as follows. Sect. 6.1 motivates the whole chap-
ter by presenting a number of historically important cases which suggested
to previous authors that, “during major market events, correlations change
dramatically” [71]. This section then offers a review of the different existing
view points on conditional dependences.
Section 6.2 describes three conditional correlation coefficients:
• the correlation ρ+v conditioned on signed exceedance of one variable,
• the correlation ρu conditioned on exceedance of both variables, and
• the correlation ρsv conditioned on the exceedance of the absolute value of one variable (amounting to a conditioning on large values of the volatility).
Boyer et al. [78] have provided the general expression of ρ+v and ρsv for the Gaussian bivariate model, which we use to derive their v dependence for large
thresholds v. This analysis shows that, for a given distribution, the condi-
tional correlation coefficient changes even if the unconditional correlation is
left unchanged, and the nature of this change depends on the conditioning set.
We then give the general expression of ρ+v and ρsv for the Student's bivariate model with ν degrees of freedom and for the factor model X = βY + ε, for arbitrary distributions of Y and ε. By comparison with the Gaussian model, these expressions exemplify that, for a fixed conditioning set, the behavior of the conditional correlation changes dramatically from one distribution to an-
other one. Conditioning on both variables, we give the asymptotic dependence
of ρu for the bivariate Gaussian model and show that it essentially behaves like ρ+v. Applying these results to the Latin American stock indexes, we find
that one cannot entirely explain the behavior of the conditional correlation
coefficient for these markets by the conditioning effect, suggesting the exis-
tence of a possible genuine contagion as mentioned above.
In Sect. 6.3, to account for several deficiencies of the correlation coefficient,
we study an alternative measure of dependence, the conditional rank corre-
lation (Spearman’s rho) which, in its unconditional form, is related to the
probability of concordance and discordance of several events drawn from the
same probability distribution, as recalled in Chap. 4. This measure provides
an important improvement with respect to the correlation coefficient since it
only takes into account the dependence structure of the variable and is not
sensitive to the marginal behavior of each variable. Numerical computations
allow us to derive the behavior of the conditional Spearman’s rho, denoted
by ρs(v). This allows us to prove that there is no direct relation between the
Spearman’s rho conditioned on large values and the correlation coefficient
conditioned on the same values. Therefore, each of these coefficients quanti-
fies a different kind of extreme dependence. Then, calibrating the models on
the Latin American market data confirms that the conditional effect cannot
fully explain the observed dependence and that contagion can therefore be in-
voked. These results are much clearer for the conditional Spearman’s rho than
for the condition (linear) correlation coefficient, due to the greater impact of
large statistical fluctuations in the later.
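The conditional Spearman's rho is straightforward to estimate empirically. The sketch below (a bivariate Gaussian example with illustrative parameters, not the calibration performed in the text) computes the rank correlation on subsamples where one variable exceeds a threshold; for the Gaussian model it decreases with the threshold, echoing the behavior of ρ+v:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
rho, T = 0.8, 500_000
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=T).T

def conditional_spearman(v):
    """Spearman's rho of (X, Y) restricted to the subsample where Y > v."""
    mask = y > v
    return float(stats.spearmanr(x[mask], y[mask])[0])

cs = {v: conditional_spearman(v) for v in (0.0, 1.0, 2.0)}
print(cs)  # decreases as the conditioning threshold v increases
```

Because ranks are invariant under monotonic transformations of each margin, this estimator probes the copula only, in contrast with the conditional linear correlation coefficient.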
Section 6.4 discusses the tail-dependence parameters λ and λ̄, introduced
in Chap. 4. Applying the procedure of [390], we estimate nonparametrically
the tail dependence coefficients. We find them significant and thus conclude
that, with or without contagion mechanism, extreme co-movements must nat-
urally occur on the various Latin American markets as soon as one of them
undergoes a crisis.
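A naive empirical proxy for λ (simpler than, and not identical to, the estimator of [390]) is the conditional exceedance probability P[X > F_X⁻¹(q) | Y > F_Y⁻¹(q)] evaluated at quantile levels q close to 1. The sketch below, with illustrative parameters, contrasts a Gaussian pair, whose proxy decays toward zero as q → 1, with a Student pair (ν = 3), for which it stabilizes at a nonzero level:

```python
import numpy as np

rng = np.random.default_rng(4)
T, rho, nu = 200_000, 0.5, 3.0
cov = [[1.0, rho], [rho, 1.0]]

def empirical_lambda(x, y, q):
    """Empirical P(X > F_X^{-1}(q) | Y > F_Y^{-1}(q)): a finite-sample
    proxy for the upper tail dependence coefficient (q -> 1)."""
    xq, yq = np.quantile(x, q), np.quantile(y, q)
    return float(np.mean(x[y > yq] > xq))

zg = rng.multivariate_normal([0.0, 0.0], cov, size=T)   # Gaussian pair
w = rng.chisquare(nu, size=(T, 1)) / nu
zs = zg / np.sqrt(w)                                    # Student pair (nu = 3)

lam_g = {q: empirical_lambda(zg[:, 0], zg[:, 1], q) for q in (0.95, 0.99)}
lam_s = {q: empirical_lambda(zs[:, 0], zs[:, 1], q) for q in (0.95, 0.99)}
print(lam_g, lam_s)
```

The estimator uses only ranks of the data through the empirical quantiles, so it is insensitive to the marginal distributions, which is the property that makes nonparametric determinations of λ attractive.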
Section 6.5 provides a comparison between these different results and a
synthesis. A first important message is that there is no unique measure of
extreme dependence. Each of the coefficients of extreme dependence that we
have presented provides a specific quantification that is sensitive to a certain
combination of the marginals and of the copula of the two random variables.
Similarly to risks whose adequate characterization requires an extension be-
yond the restricted one-dimensional measure in terms of the variance (volatil-
ity) to include the knowledge of the full distribution, tail-dependence has
also a multidimensional character. A second important message is that the
increase of some of the conditional coefficients of extreme dependence when
weighting more and more the extreme tail range does not necessarily signal
a genuine increase of the unconditional correlation or dependence between
the two variables. The calculations presented here firmly confirm that this
increase is a general and unavoidable result of the statistical properties of
many multivariate models of dependence. From the standpoint of the con-
tagion across Latin American markets, the theoretical and empirical results
suggest an asymmetric contagion phenomenon from Chile and Mexico towards
Argentina and Brazil: large moves of the Chilean and Mexican markets tend
to propagate to Argentina and Brazil through contagion mechanisms, i.e.,
with a change in the dependence structure, while the converse does not hold.
As a consequence, this seems to prove that the 1994 Mexican crisis had spread
over to Argentina and Brazil through contagion mechanisms and to Chile only
through co-movements. Concerning the more recent Argentinean crisis start-
ing in 2001, no evidence of contagion to the other Latin American countries
is found (except perhaps in the direction of Brazil) and only co-movements
can be identified.
6.1 Motivations
the volatility increases (many papers on contagion unfortunately use the con-
ditional correlation coefficient as a probe to detect changes of dependence).
We then present an empirical illustration of the evolution of the correlation
between several stock indexes of Latin American markets.
6.2.1 Definition
   ρA = Cov(X, Y | Y ∈ A) / [Var(X | Y ∈ A) · Var(Y | Y ∈ A)]^{1/2} .  (6.1)
Note that ρ and ρA have the same sign, that ρA = 0 if and only if ρ = 0
and that ρA does not depend directly on Var(X). Note also that ρA can be
either greater or smaller than ρ since Var(Y | Y ∈ A) can be either greater
or smaller than Var(Y ). Let us illustrate this property in the two following
examples, with a conditioning on large positive (or negative) returns and a
conditioning on large volatility. The difference comes from the fact that in the
first case, one accounts for the trend while one neglects this information in
the second case.
These two simple examples will show that, in the case of two Gaussian random variables, the two conditional correlation coefficients ρ+v and ρsv exhibit opposite behaviors: the conditional correlation coefficient ρ+v is a decreasing function of the conditioning threshold v (and goes to zero as v → +∞), while the conditional correlation coefficient ρsv is an increasing function of v and goes to one as v → ∞. These opposite behaviors seem very general and
do not depend on the particular choice of the joint distribution of X and Y ,
namely the Gaussian distribution studied until now, as it will be seen in the
sequel.
Let us first consider the conditioning set A = [v, +∞), with v ∈ R+ . Thus
ρA is the correlation coefficient conditioned on the returns Y larger than a
given positive threshold v. It will be denoted by ρ+v in the sequel. Assuming for
simplicity, but without loss of generality that Var(Y ) = 1, an exact calculation
given below shows that, for large v,
   ρ+v ∼v→∞ [ρ / √(1 − ρ²)] · (1/|v|) ,  (6.3)
Proof. We start with the calculation of the first and the second moments of
Y conditioned on Y larger than v:
   E(Y | Y > v) = √(2/π) e^{−v²/2} / erfc(v/√2) = v + 1/v − 2/v³ + O(1/v⁵) ,  (6.4)

   E(Y² | Y > v) = 1 + √(2/π) v e^{−v²/2} / erfc(v/√2) = v² + 2 − 2/v² + O(1/v⁴) .  (6.5)
Let now the conditioning set be A = (−∞, −v] ∪ [v, +∞), with v ∈ R+ .
Thus ρA is the correlation coefficient conditioned on |Y | larger than v, i.e.,
it is conditioned on a large volatility of Y . Still assuming Var(Y ) = 1, this
correlation coefficient is denoted by ρsv and, for large v
   ρsv ∼v→∞ ρ / √(ρ² + (1 − ρ²)/(2 + v²)) ∼v→∞ sgn(ρ) · [1 − (1 − ρ²)/(2ρ²v²)] ,  (6.7)
Expression (6.10) is the same as (6.6), as it should be. This gives the following conditional variance:

   Var(Y | |Y| > v) = 1 + √(2/π) v e^{−v²/2} / erfc(v/√2) = v² + 2 + O(1/v²) ,  (6.11)
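These opposite asymptotic behaviors are easy to confirm by simulation. The sketch below (a bivariate Gaussian with illustrative ρ = 0.8) estimates ρ+v and ρsv empirically for a few thresholds:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, T = 0.8, 1_000_000
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=T).T

def cond_corr(mask):
    """Correlation coefficient of (X, Y) restricted to the conditioning set."""
    return float(np.corrcoef(x[mask], y[mask])[0, 1])

rho_plus = {v: cond_corr(y > v) for v in (0.0, 1.0, 2.0)}        # rho_v^+
rho_s = {v: cond_corr(np.abs(y) > v) for v in (0.0, 1.0, 2.0)}   # rho_v^s
print(rho_plus)  # decreases toward 0 as v grows
print(rho_s)     # increases toward 1 as v grows
```

Both sequences start from samples with the same unconditional correlation; only the conditioning set differs, which is the point made in the text: conditioning alone reshapes the measured dependence.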
Intuitive Meaning
Let us provide an intuitive explanation (see also [315]). As seen from (6.2), ρ+v is controlled by Var(Y | Y > v) ∝ 1/v², derived in example 1. In contrast, as seen from (6.8), ρsv is controlled by Var(Y | |Y| > v) ∝ v², given in example 2. The difference between ρ+v and ρsv can thus be traced back to that between Var(Y | Y > v) ∝ 1/v² and Var(Y | |Y| > v) ∝ v² for large v.
This results from the following effect. For Y > v, one can picture the possible realizations of Y as those of a random particle on the line, which is strongly attracted to the origin by a spring (the Gaussian distribution prevents Y from performing significant fluctuations beyond a few standard deviations). One can recover ρ+v ∼v→∞ 1/v directly by the following intuitive argument. Using the picture
of particles, X and Y can be visualized as the positions of two particles which
fluctuate randomly. Their joint bivariate Gaussian distribution with nonzero
unconditional correlation amounts to the existence of a spring that ties them
together. Their Gaussian marginals also exert a spring-like force attaching
them to the origin. When Y > v, the X-particle is torn between two extremes, 0 and v. When the unconditional correlation ρ is less than
1, the spring attracting to the origin is stronger than the spring attracting
to the wall at v. The particle X thus undergoes tiny fluctuations around the
origin that are relatively less and less attracted by the Y -particle, hence the
result ρ+v ∼v→∞ 1/v → 0. In contrast, for |Y| > v, notwithstanding the still strong attraction of the X-particle to the origin, it can follow the sign of the Y-particle without paying too much cost in matching its amplitude |v|. Relatively tiny fluctuations of the X-particle, but of the same sign as Y ≈ ±v, will result in a strong ρsv, thus justifying that ρsv → 1 for v → +∞.
Both ρ+v and ρsv converge, at infinity, to nonvanishing constants (except for ρ = 0). Moreover, for ν larger than νc ≈ 2.839, this constant is smaller than the unconditional correlation coefficient ρ, for all values of ρ, in the case of ρ+v, while for ρsv it is always larger than ρ, whatever ν (larger than two) may be. These results show that, conditioned on large returns, ρ+v is a decreasing function of the threshold v (at least when ν ≥ 2.839), while, conditioned on large volatilities, ρsv is an increasing function of v.
To give another example, let us now assume that X and Y are two random variables following the equation:

   X = βY + ε ,  (6.15)

where ε is a noise, independent of Y, with variance σε². The unconditional correlation coefficient then reads

   ρ = β σy / √(β² σy² + σε²) ,  (6.17)

while the correlation coefficients conditioned on large values of Y or of |Y| behave asymptotically as

   ρ+,sv ∼v→∞ sgn(β) / √(1 + K/v²) ,  (6.18)

for some constant K determined by the distributions of Y and ε.
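A quick Monte Carlo check of (6.17), and of the drift of the conditional correlation toward sgn(β), under the assumption of a Gaussian factor and Gaussian noise with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)
T, beta, s_y, s_eps = 1_000_000, 0.8, 1.0, 1.0

yf = s_y * rng.standard_normal(T)                  # the common factor Y
x = beta * yf + s_eps * rng.standard_normal(T)     # X = beta * Y + eps

rho_emp = float(np.corrcoef(x, yf)[0, 1])
rho_th = beta * s_y / np.sqrt(beta**2 * s_y**2 + s_eps**2)   # eq. (6.17)

# Conditioning on a large volatility of Y pushes the correlation toward sgn(beta)
mask = np.abs(yf) > 2.5
rho_cond = float(np.corrcoef(x[mask], yf[mask])[0, 1])
print(rho_emp, rho_th, rho_cond)
```

On the conditioning set, the factor dominates the idiosyncratic noise, so the conditional correlation exceeds the unconditional one and approaches sgn(β) as the threshold grows.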
   ρA,B = Cov(X, Y | X ∈ A, Y ∈ B) / [Var(X | X ∈ A, Y ∈ B) · Var(Y | X ∈ A, Y ∈ B)]^{1/2} .  (6.19)
In this case, it is much more difficult to obtain general results for any
specified class of distributions compared with the previous case of conditioning
on a single variable. Here, we give the asymptotic behavior for a Gaussian
Let us consider four national stock markets in Latin America, namely Ar-
gentina (MERVAL index), Brazil (IBOV index), Chile (IPSA index) and Mex-
ico (MEXBOL index). We are particularly interested in the contagion effects
which may have occurred across these markets. We will study this question
for the market indexes expressed in US Dollar to emphasize the effect of the
devaluations of local currencies and to account for monetary crises. Doing so,
we follow the same methodology as in most contagion studies (see [178], for
instance). Our sample contains the daily (log) returns of each stock in local
currency and US dollar during the time interval from 15 January, 1992 to 15
June, 2002 and thus encompasses both the Mexican crisis as well as the more
recent Argentinean crisis.
Before applying the theoretical results derived above, we need to test
whether the distributions of the returns are not too fat-tailed so that the
correlation coefficient exists. Recall that this is the case if and only if the tail
of the distribution decays faster than a power law with tail index µ = 2, and
its estimator given by the Pearson’s coefficient is well behaved if at least the
fourth moment of the distribution is finite.
Figure 6.1 shows the complementary distribution of the positive and neg-
ative tails of the index returns of four Latin American countries in US dollars.
The positive tail clearly decays faster than a power law with tail index µ = 2.
In fact, Hill’s estimator provides a value ranging between 3 and 4 for the four
indexes. The situation for the negative tail is slightly different, particularly for
the Brazilian index. For the Argentinean, the Chilean, and the Mexican indexes,
[Figure 6.1 here: two log-log panels, "Positive Tail" and "Negative Tail", showing the complementary distributions for Argentina, Brazil, Chile and Mexico, with a reference line of slope µ = 2.]
Fig. 6.1. The upper (respectively lower) panel graphs the complementary distribution of the positive returns (respectively of minus the negative returns) in US dollars of the
indices of four countries (Argentina, Brazil, Chile and Mexico). The straight line
represents the slope of a power law with tail exponent µ = 2
the negative tail behaves almost like the positive one, but for the Brazilian
index, the negative tail exponent is hardly larger than two, as confirmed by
Hill’s estimator. This means that, in the Brazilian case, the estimates of the
correlation coefficient will be particularly noisy and thus of weak statistical
value.
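Hill's estimator, used above to gauge the tail index µ, can be sketched in a few lines. The estimator is standard; the synthetic Pareto sample and the cutoff k below are illustrative choices for a sanity check, not the calibration performed on the index data.

```python
import numpy as np

def hill_estimator(data, k):
    """Hill estimator of the tail index mu from the k largest observations:
    1 / mu_hat = (1/k) * sum_{i=1..k} ln(X_(i) / X_(k+1)),
    with X_(1) >= X_(2) >= ... the sample sorted in descending order."""
    x = np.sort(np.asarray(data))[::-1]
    return 1.0 / float(np.mean(np.log(x[:k] / x[k])))

# Sanity check on synthetic Pareto data with known tail index mu = 3
rng = np.random.default_rng(5)
sample = rng.pareto(3.0, size=200_000) + 1.0   # P(X > x) = x^{-3}, x >= 1
print(hill_estimator(sample, k=2000))          # close to 3
```

In practice the estimate depends on the cutoff k, which trades bias (k too large, leaving the tail region) against variance (k too small); plotting the estimate against k is the usual diagnostic.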
We have checked that the fat-tailedness of the indexes expressed in US dollars comes from the impact of the exchange rates. Thus, an alternative would be to consider the indexes in local currency, following the methodology of [314] and [315], but this would lead us to focus on the linkages between markets only and
to neglect the impact of the devaluations, which is precisely the main concern
of studies on contagion.
Figures 6.2, 6.3 and 6.4 give the conditional correlation coefficient ρv+,−
(plain thick line) for the pairs (Argentina/Brazil), (Brazil/Chile) and (Chile/
Mexico) while the Figs. 6.5, 6.6 and 6.7 show the conditional correlation co-
efficient ρsv for the same pairs. For each figure, the thick dashed line gives the
theoretical curve obtained under the bivariate Gaussian assumption whose
analytical expressions can be found in Sect. 6.2.2. The unconditional corre-
Fig. 6.2. In the upper panel, the thick plain curve depicts the correlation coeffi-
cient between the daily returns of the Argentinean and the Brazilian stock indices
conditional on the Brazilian stock index daily returns larger than (smaller than) a
given positive (negative) value v (after normalization by the standard deviation).
The thick dashed curve represents the theoretical conditional correlation coefficient
ρv+,− calculated for a bivariate Gaussian model, while the two thin dashed curves
define the area within which we cannot consider at the 95% confidence level that
the estimated correlation coefficient is significantly different from its Gaussian theo-
retical value. The dotted curves provide the same information under the assumption
of a bivariate Student’s model with ν = 3 degrees of freedom. The lower panel is
the same as the upper panel but the conditioning is on the Argentinean stock index
daily returns larger than (smaller than) a given positive (negative) value v (after
normalization by the standard deviation)
Fig. 6.3. Same as Fig. 6.2 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index
The thick dotted line gives the theoretical curve obtained under the assumption of a bivariate Student's model with three degrees of freedom (see Appendices 6.B.3 and 6.B.4) and the two thin dotted lines are its 95% confidence level. Here, Fisher's statistic cannot be applied, since it requires at least that the fourth moment of the distribution exists. In fact, Meerschaert and Scheffler have shown that, for ν = 3, the distribution of the sample correlation converges to a stable law with index 3/2 [356]. This explains why the confidence interval for the Student's model with three degrees of freedom is much larger than the confidence interval for the Gaussian model. In the present case, we have used a bootstrap method to derive this confidence interval, since the scale factor of the stable law is difficult to calculate.
In Figs. 6.2, 6.3 and 6.4, the changes in the conditional correlation coef-
ficients ρv+,− are not significantly different, at the 95% confidence level, from
those obtained with a bivariate Student’s model with three degrees of free-
dom. In contrast, the Gaussian model is almost always rejected as expected,
since marginal returns distributions are not Gaussian (as shown by Fig. 6.1).
In fact, similar results hold (but are not depicted here) for the three other pairs (Argentina/Chile), (Argentina/Mexico) and (Brazil/Mexico). Since these results are compatible with a Student's model with constant correlation,
Fig. 6.4. Same as Fig. 6.2 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index
Fig. 6.5. In the upper panel, the thick plain curve gives the correlation coeffi-
cient between the daily returns of the Argentinean and the Brazilian stock indices
conditioned on the daily volatility of the Brazilian stock index being larger than
a given value v (after normalization by the standard deviation). The thick dashed curve represents the theoretical conditional correlation coefficient ρ_v^s calculated
for a bivariate Gaussian model, while the two thin dashed curves delineate the area
within which we cannot consider at the 95% confidence level that the estimated
correlation coefficient is significantly different from its Gaussian theoretical value.
The dotted curves provide the same information using a bivariate Student’s model
with ν = 3 degrees of freedom. The lower panel is the same as the upper panel but
the conditioning is on the Argentinean stock index
and Mexico on the other hand, when the volatility or the returns exhibit large moves. In contrast, in periods of high volatility, the Chilean and Mexican markets seem to have a genuine impact on the Argentinean and Brazilian markets.
A priori, this should confirm the existence of a contagion across these mar-
kets. However, this conclusion is based only on two theoretical models. One
should thus remain cautious before concluding positively on the existence of
contagion on the sole basis of these results, in particular in view of the use of
theoretical models which are all symmetric in their positive and negative tails.
Such a symmetry is crucial for the derivation of the theoretical expressions of
ρsv . However, the empirical sample distributions are certainly not symmetric,
Fig. 6.6. Same as Fig. 6.5 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index
as shown in Fig. 6.1. Using univariate and bivariate switching volatility models, Edwards and Susmel [142] have found strong volatility co-movements in Latin America but no clear evidence of contagion.
6.2.6 Summary
The previous sections have shown that the conditional correlation coefficients
can exhibit all possible types of behavior, depending on their conditioning set
and the underlying distributions of returns. More precisely, we have shown
that the correlation coefficients, conditioned on large returns or volatility
above a threshold v, can be either increasing or decreasing functions of the
threshold, can go to any value between zero and one when the threshold goes
to infinity and can produce contradictory results in the sense that accounting
for a trend or not can lead to conclude on an absence of linear correlation or
on a perfect linear correlation. Moreover, due to the large statistical fluctua-
tions of the empirical estimates, one should be very careful when concluding
on an increase or decrease of the genuine correlations.
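These pitfalls are easy to reproduce numerically. The following sketch is our own illustration, with arbitrary sample size and ρ = 0.5: it draws a bivariate Gaussian pair with constant correlation and shows that the correlation conditioned on the unsigned exceedance |Y| > v increases with the threshold, even though the underlying dependence never changes:

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 1_000_000, 0.5
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # corr(x, y) = rho

def conditional_corr(x, y, v):
    """Correlation of (x, y) restricted to the unsigned exceedance |y| > v."""
    mask = np.abs(y) > v
    return np.corrcoef(x[mask], y[mask])[0, 1]

rho_0 = conditional_corr(x, y, 0.0)  # recovers the unconditional rho = 0.5
rho_2 = conditional_corr(x, y, 2.0)  # markedly larger, with unchanged dependence
print(rho_0, rho_2)
```

An analyst comparing calm and turbulent periods with such an estimator would thus "discover" an increase of correlation that is a pure conditioning effect.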
Thus, from the general standpoint of the study of extreme dependences, and more particularly for the specific problem of contagion across countries, the use of conditional correlation does not seem very informative.
6.3 Conditional Concordance Measures
Fig. 6.7. Same as Fig. 6.5 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index
6.3.1 Definition
Recall that Spearman’s rho, denoted ρs in the sequel, measures the difference
between the probability of concordance and the probability of discordance
for the two pairs of random variables (X1 , Y1 ) and (X2 , Y3 ), where the pairs
(X1 , Y1 ), (X2 , Y2 ) and (X3 , Y3 ) are three independent realizations drawn from
the same distribution:
ρs = 3 (Pr[(X1 − X2 )(Y1 − Y3 ) > 0] − Pr[(X1 − X2 )(Y1 − Y3 ) < 0]) . (6.21)
Thus, setting U = FX(X) and V = FY(Y), we have seen that ρs is nothing but the (linear) correlation coefficient of the uniform random variables U and V:

ρs = Cov(U, V) / √(Var(U) Var(V)) ,   (6.22)
which justifies its name as a correlation coefficient of the rank, and shows that
it can easily be estimated.
An attractive feature of Spearman's rho is that it is independent of the margins, as we can see in equation (6.22). Thus, contrary to the linear correlation coefficient, which aggregates the marginal properties of the variables with their collective behavior, the rank correlation coefficient takes into account only the dependence structure of the variables.
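This margin independence is easy to check numerically: applying strictly increasing transformations to each variable leaves Spearman's rho unchanged, while the linear (Pearson) coefficient is distorted. The data below are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(1)
y = rng.standard_normal(5_000)
x = 0.6 * y + 0.8 * rng.standard_normal(5_000)  # corr(x, y) = 0.6

rho_s = spearmanr(x, y)[0]
rho_s_warped = spearmanr(np.exp(x), y**3)[0]  # monotonic maps: ranks unchanged
rho_p = pearsonr(x, y)[0]
rho_p_warped = pearsonr(np.exp(x), y**3)[0]   # linear correlation is distorted
print(rho_s, rho_s_warped, rho_p, rho_p_warped)
```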
Using expression (6.22), a natural definition of the conditional rank correlation, conditioned on V larger than a given threshold ṽ, can be proposed:

ρs(ṽ) = Cov(U, V | V ≥ ṽ) / √(Var(U | V ≥ ṽ) Var(V | V ≥ ṽ)) ,   (6.23)
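A sample version of (6.23) follows directly by replacing U and V with normalized ranks; the function below is our sketch, and the simulated Gaussian pair merely stands in for the actual return series:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def conditional_spearman(x, y, v_tilde):
    """Sample counterpart of (6.23): Spearman's rho restricted to V >= v_tilde,
    with U, V the empirical copula coordinates (normalized ranks)."""
    n = len(x)
    u = rankdata(x) / (n + 1.0)
    v = rankdata(y) / (n + 1.0)
    keep = v >= v_tilde
    uu, vv = u[keep], v[keep]
    return np.mean((uu - uu.mean()) * (vv - vv.mean())) / (uu.std() * vv.std())

rng = np.random.default_rng(2)
y = rng.standard_normal(200_000)
x = 0.5 * y + np.sqrt(0.75) * rng.standard_normal(200_000)
# at v_tilde = 0 the estimator coincides with the usual Spearman's rho
print(conditional_spearman(x, y, 0.0), spearmanr(x, y)[0])
print(conditional_spearman(x, y, 0.8))  # decays with the threshold for a Gaussian pair
```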
6.3.2 Example
Fig. 6.8. Conditional Spearman’s rho for a bivariate Gaussian copula (left panel )
and a Student’s copula with three degrees of freedom (right panel ), with an uncon-
ditional linear correlation coefficient ρ = 0.1, 0.3, 0.5, 0.7, 0.9, as a function of the
constraint level v
Figures 6.9, 6.10 and 6.11 give the conditional Spearman's rho respectively for the (Argentinean/Brazilian), the (Brazilian/Chilean), and the (Chilean/Mexican) stock markets. As previously, the plain thick line refers to the estimated correlation, while the dashed lines refer to the Gaussian copula and its 95% confidence levels and the dotted lines to the Student's copula with three degrees of freedom and its 95% confidence levels.
Contrary to the cases of the conditional (linear) correlation coefficient exhibited in Figs. 6.2, 6.3 and 6.4, the empirical conditional Spearman's ρ does not always comply with the Student's model (nor with the Gaussian
Fig. 6.9. In the upper panel, the thick curve shows Spearman’s rho between the
Argentinean stock index daily returns and the Brazilian stock index daily returns.
Above the quantile v = 0.5, Spearman’s rho is conditioned on the Brazilian index
daily returns whose quantiles are larger than v, while below the quantile v = 0.5 it is
conditioned on the Brazilian index daily returns whose quantiles are smaller than v.
As in the above figures for the correlation coefficients, the dashed lines refer to the
prediction of the Gaussian copula and its 95% confidence levels and the dotted lines
to Student’s copula with three degrees of freedom and its 95% confidence levels. The
lower panel is the same as the upper panel but with the conditioning done on the
Argentinean index daily returns
Fig. 6.10. Same as Fig. 6.9 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index
one), thus confirming the discrepancies observed in Figs. 6.5, 6.6 and 6.7. In all cases, for thresholds v larger than the quantile 0.5, corresponding to the positive returns, the Student's model with three degrees of freedom is almost always sufficient to explain the data. In contrast, for the negative returns, and thus for thresholds v lower than the quantile 0.5, only the interaction between the Chilean and the Mexican markets is well described by the Student's copula and does not require invoking a contagion mechanism. For all the other pairs, none of these models explains the data satisfactorily. Therefore, for these cases and from the perspective of these models, the contagion hypothesis seems to be needed.
There are however several caveats. First, even though we have considered
the most natural financial models, there may be other models with constant
dependence structure, that we have ignored, which could account for the ob-
served evolutions of the conditional Spearman’s ρ. If this is the case, then
the contagion hypothesis would not be needed. Second, the main discrepancy
between the empirical conditional Spearman’s ρ and the prediction of Stu-
dent’s model does not occur in the tails of the distribution, i.e for large and
extreme movements, but in the bulk. Thus, during periods of turmoil, the
Fig. 6.11. Same as Fig. 6.9 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index
Student’s model with three degrees of freedom seems to remain a good model
of co-movements. Third, the contagion effect is never necessary for upwards
moves. Indeed, we observe the same asymmetry or trend dependence as found
in [315] for five major equity markets. This was apparent in Figs. 6.2, 6.3 and
6.4 for ρv+,− , and is strongly confirmed on the conditional Spearman’s ρ.
Interestingly, there is also an asymmetry or directivity in the mutual influence between markets. For instance, the Chilean and Mexican markets have an influence on the Argentinean and Brazilian markets, but the latter do not have any impact on the Mexican and Chilean markets. Chile and Mexico have no contagion effect on each other, while Argentina and Brazil do.
These empirical results on the conditional Spearman's rho are different from, and often opposite to, the conclusions derived from the conditional correlation coefficients ρ_v^{+,−}. This highlights the difficulty in obtaining reliable, unambiguous and sensitive estimations of conditional correlation measures. In particular, Pearson's coefficient, usually employed to estimate the correlation coefficient between two variables, is known to be not very efficient when the variables are fat-tailed and when the estimation is performed on a small sample.
6.4 Extreme Co-movements
For the sake of completeness, and since it is directly related to the multivari-
ate extreme value theory, we study the coefficient of tail dependence λ, which
has been defined in Sect. 4.5. It would seem that the coefficient of tail depen-
dence could provide a useful measure of the extreme dependence between two
random variables for the analysis of contagion between markets. Two possi-
bilities can occur. Either the whole data set does not exhibit tail dependence,
and a contagion mechanism seems necessary to explain the occurrence of con-
comitant large movements during turmoil periods. Or, the data set exhibits
tail dependence which by itself is enough to produce concomitant extremes
(and contagion is not needed).
Unfortunately, the empirical estimation of the coefficient of tail dependence is a strenuous task. Indeed, a direct estimation of the conditional probability Pr{X > F_X^{−1}(u) | Y > F_Y^{−1}(u)}, which should tend to λ when u → 1, is very difficult to implement in practice due to the combination of the curse of dimensionality and the drastic decrease of the number of realizations as u becomes close to one. A better approach consists in using kernel methods, which
generally provide smooth and accurate estimators [168, 284, 305]. However,
these smooth estimators lead to copulas which are differentiable. This auto-
matically gives vanishing tail dependence, as already mentioned in Chap. 5.
Indeed, in order to obtain a nonvanishing coefficient of tail dependence, it is
necessary for the corresponding copula to be nondifferentiable at the point
(1, 1) (or at (0, 0)). An alternative is then the fully parametric approach. One can choose to model the dependence via a specific copula, and thus to determine the associated tail dependence [315, 334, 380]. The problem with such a method is that the choice of the parameterization of the copula amounts to choosing a priori whether or not the data presents tail dependence.
In fact, there are three ways of estimating the tail dependence coefficient. The first two methods are specific to a class of copulas or of models, while the last one is very general, but less accurate. The first method is only reliable when the underlying copula is known to be Archimedean. In such a case, the limit theorem established by Juri and Wüthrich [260] (see Chap. 3) allows one to estimate the tail dependence. The problem is that it is not obvious that the Archimedean copulas provide a good representation of the dependence structure of financial assets. For instance, the Archimedean copulas are generally inconsistent with a representation of assets by linear factor models. A second method – based upon the results of Sect. 4.5.3 – offers good results by allowing one to estimate the tail dependence in a semiparametric way, which solely relies on the estimation of the marginal distributions, when the data can be explained by a factor model [332, 335].
When none of these situations occur, or when the factors are too difficult
to extract, a third and fully nonparametric method exists, which is based upon
the mathematical results of Ledford and Tawn [294, 295] and Coles et al. [106]
and has recently been applied by Poon et al. [390]. The method consists in
transforming the original random variables X and Y into Fréchet random
variables denoted by S and T respectively. Then, considering the variable
Z = min{S, T}, its survival distribution is, for large z,

Pr{Z > z} ≃ d · z^{−1/η} .

The coefficient of tail dependence λ and the coefficient λ̄, defined by (4.84),
are simple functions of d and η: λ̄ = 2 · η − 1 with λ = 0 if η < 1, or λ̄ = 1
and λ = d otherwise. The parameters η and d can be estimated by maximum
likelihood, and deriving their asymptotic statistics allows one to test whether
the hypothesis λ̄ = 1 can be rejected or not, and consequently, whether the
data present tail dependence or not.
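A minimal version of this nonparametric procedure can be sketched as follows. The function name, the rank-based estimation of the margins and the threshold quantile q = 0.95 are our own illustrative choices:

```python
import numpy as np

def eta_ledford_tawn(x, y, q=0.95):
    """Estimate eta from Z = min(S, T) after mapping x, y to unit-Frechet margins;
    returns (eta, lambda_bar = 2 * eta - 1)."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1.0) / (n + 1.0)  # empirical CDF values in (0, 1)
    v = (np.argsort(np.argsort(y)) + 1.0) / (n + 1.0)
    s, t = -1.0 / np.log(u), -1.0 / np.log(v)          # unit-Frechet transform
    z = np.minimum(s, t)
    thresh = np.quantile(z, q)
    eta = np.mean(np.log(z[z > thresh] / thresh))      # log-excesses of a power law are exponential
    return eta, 2.0 * eta - 1.0

rng = np.random.default_rng(5)
a, b = rng.standard_normal(100_000), rng.standard_normal(100_000)
eta_indep, lbar_indep = eta_ledford_tawn(a, b)  # independence: eta = 1/2, lambda_bar = 0
eta_dep, lbar_dep = eta_ledford_tawn(a, a)      # comonotonicity: eta = 1, lambda_bar = 1
print(eta_indep, eta_dep)
```

In the full procedure, η and d are estimated by maximum likelihood and the asymptotic statistics of the estimators serve to test whether λ̄ = 1 can be rejected; the simple moment estimate above only conveys the idea.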
Let us implement this procedure on the four previously considered Latin
American markets (Argentina, Brazil, Chile and Mexico). The results for the
estimated values of the coefficient of tail dependence are given in Table 6.1
both for the positive and the negative tails. The tests show that one cannot
reject the hypothesis of tail dependence between the four considered Latin
American markets. Notice that the positive tail dependence is almost always
slightly smaller than the negative one, which could be linked with the existence
of trend asymmetry [315], but it turns out that these differences are not sta-
tistically significant. These results indicate that, according to this analysis of
the extreme dependence coefficient, the propensity of extreme co-movements
is almost the same for each pair of stock markets: even if the transmission mechanisms of a crisis differ from one country to another, the propagation occurs with the same probability overall. Thus, the subsequent
Table 6.1. Coefficients of tail dependence between pairs among four Latin American markets. The figures within parentheses give the standard deviations of the estimated values, derived under the assumption of asymptotic normality of the estimators. Only the coefficients above the diagonal are indicated, since the matrix is symmetric
Table 6.2. Coefficients of tail dependence between pairs among four Latin Amer-
ican markets derived under the assumption of a Student copula with three degrees
of freedom
Student's hypothesis, ν = 3

            Argentina   Brazil   Chile   Mexico
Argentina       –        0.24     0.25    0.27
Brazil                    –       0.24    0.27
Chile                             –       0.28
Mexico                                     –
risks are the same. Table 6.2 also gives the coefficients of tail dependence es-
timated under the Student’s copula (or in fact any copula derived from an
elliptical distribution – see Chap. 4) with three degrees of freedom, given by
expression (4.91). One can observe a remarkable agreement between these
values and the nonparametric estimates given in Table 6.1. This is consis-
tent with the results given by the conditional Spearman’s rho, for which we
have remarked that the Student’s copula seems to reasonably account for the
extreme dependence.
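The tail-dependence formula of the Student copula (given in the text by expression (4.91)) is immediate to evaluate; the helper below, whose name is ours, shows how the entries of Table 6.2 follow from ρ and ν alone:

```python
import numpy as np
from scipy.stats import t as student

def student_tail_dependence(rho, nu):
    """lambda = 2 * T_bar_{nu+1}( sqrt((nu + 1)(1 - rho)/(1 + rho)) )."""
    arg = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * student.sf(arg, df=nu + 1)

lam_mid = student_tail_dependence(0.5, 3)   # moderate correlation, nu = 3
lam_zero = student_tail_dependence(0.0, 3)  # nonzero even at zero correlation
print(lam_mid, lam_zero)
```

Note that λ > 0 even for ρ = 0: the Student copula couples extremes through the common fluctuation scale of the two variables, which is why concomitant extremes do not, by themselves, require a contagion mechanism.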
variables for the bivariate Gaussian, the Student’s model, the Gaussian factor
model and the Student’s factor model. These results provide a quantitative
proof that conditioning on exceedance leads to conditional correlation coef-
ficients that may be very different from the unconditional correlation. This
provides a straightforward mechanism for fluctuations or changes of correla-
tions, based on fluctuations of volatility or changes of trends. In other words,
the many reported variations of correlation structure might be in large part
attributed to changes in volatility (and statistical uncertainty).
The distinct dependences as a function of exceedance v and u of the condi-
tional correlation coefficients offer novel tools for characterizing the statistical
multivariate distributions of extreme events. Since their direct characteriza-
tion is in general restricted by the curse of dimensionality and the scarcity of
data, the conditional correlation coefficients provide reduced statistics which
can be estimated with reasonable accuracy and reliability at least when the
pdf of the data decays faster than any hyperbolic function with tail index
equal to 2. In this respect, the empirical results suggest that a Student’s cop-
ula, or more generally an elliptical copula, with a tail index of about three
accounts for the main extreme dependence properties investigated here. This
result is not really surprising since Chap. 5 has shown that Student’s cop-
ula is a reasonable choice to account for the dependence structure between
foreign exchange rates. In the present case, since the value of any domestic
stock index has been converted into the US dollar, the influence of the depen-
dence structure of foreign exchange rates can be considered as dominant in
comparison with the dependence structure between each domestic stock index
expressed in local currency. This dominance of the dependence structure of
foreign exchange rates seems particularly true during turmoil periods.
Table 6.4 gives the asymptotic values of ρ_v^+, ρ_v^s and ρ_u for v → +∞ and u → +∞. An important observation is that the implication

(λ = 0) =⇒ (ρ_{v→∞}^+ = 0)

does not hold in general. A counterexample is offered by the Student's factor model in the case where νY > ν (the tail of the distribution of the idiosyncratic noise ε is fatter than that of the distribution of the factor Y). In this case, X and Y have the same tail dependence as ε and Y, which is zero by construction. But ρ_{v=∞}^+ and ρ_{v=∞}^s are both one because, conditioned on a large Y, X = βY + ε is dominated by βY and is therefore also large.
Table 6.3. Large v and u dependence of the conditional correlations ρ_v^+ (signed condition), ρ_v^s (unsigned condition) and ρ_u (condition on both variables) for the different models discussed in this chapter, described in the first column. The numbers in parentheses give the equation numbers from which the formulas are derived. The factor model is defined by (6.15), i.e., X = βY + ε. ρ is the unconditional correlation coefficient

                        ρ_v^+                             ρ_v^s                                          ρ_u
Bivariate Gaussian      (ρ/√(1−ρ²)) · 1/v   (6.3)         sgn(ρ) · [1 − ((1−ρ²)/(2ρ²)) · 1/v²]   (6.7)   ρ ((1+ρ)/(1−ρ)) · 1/u²   (6.20)
Bivariate Student's     ρ/√(ρ² + (ν−1)(1−ρ²))   (6.13)    ρ/√(ρ² + (1−ρ²)/(ν−1))   (6.14)                –

Table 6.4. Asymptotic values of the conditional correlation coefficients for v → +∞ and u → +∞, together with the coefficients of tail dependence λ and λ̄

                         ρ_{v=∞}^+       ρ_{v=∞}^s       ρ_{u=∞}   λ                                        λ̄
Bivariate Gaussian       0               sgn(ρ)          0         0                                        ρ
Bivariate Student's      see Table 6.3   see Table 6.3   –         2 T̄_{ν+1}(√((ν+1)(1−ρ)/(1+ρ)))          1
Student's factor model   sgn(β)          sgn(β)          –         (ρ^ν/(ρ^ν + (1−ρ²)^{ν/2})) · 1_{β>0}     1

6.5 Synthesis and Consequences
major equity markets. It has also highlighted the asymmetry in the contagion effects: Mexico and Chile can be potential sources of contagion toward
Argentina and Brazil, while the reverse does not seem to hold. This phenom-
enon has been observed during the 1994 Mexican crisis and appears to remain
true in the recent Argentinean crisis, for which only Brazil seems to exhibit
the signature of a possible contagion.
The origin of the discovered asymmetry may lie in the difference between the more market-oriented countries and the more state-interventionist economies, giving rise either to floating currency regimes adapted to an important manufacturing sector, which tend to deliver more competitive real exchange rates (Chile and Mexico), or to fixed rate pegs (Argentina until the 2001 crisis and Brazil until the early 1999 crisis) [187, 188, 189]. The asymmetry of the contagion is compatible with the view that fixed exchange rates tie an economy and its stock market more strictly to external shocks (case of Argentina and Brazil), while a more flexible exchange rate seems to provide a cushion allowing a decoupling between the stock market and external influences.
Finally, the absence of contagion does not necessarily imply the absence of contamination. Indeed, the study of the coefficient of tail dependence has
proven that with or without contagion mechanisms (i.e., increase in the link-
age between markets during crisis) the probability of extreme co-movements
during the crisis (i.e., the contamination) is almost the same for all pairs of
markets. Thus, whatever the propagation mechanism may be – historically
strong relationship or irrational fear and herd behavior – the observed effects
are the same: the propagation of the crisis. From the practical perspective of
risk management or regulatory policy, this last point is perhaps more impor-
tant than the real knowledge of the occurrence or not of contagion.
Appendix
6.A Correlation Coefficient for Gaussian Variables Conditioned
on Both X and Y Larger Than u
Using the proposition A.1 of [15] or the expressions in [252, p. 113], we can assert that

m10 L(u, u; ρ) = (1 + ρ) ϕ(u) [1 − Φ(√((1−ρ)/(1+ρ)) u)] ,   (6.A.3)

m20 L(u, u; ρ) = (1 + ρ²) u ϕ(u) [1 − Φ(√((1−ρ)/(1+ρ)) u)] + (ρ √(1−ρ²)/√(2π)) ϕ(√(2/(1+ρ)) u) + L(u, u; ρ) ,   (6.A.4)

m11 L(u, u; ρ) = 2ρ u ϕ(u) [1 − Φ(√((1−ρ)/(1+ρ)) u)] + (√(1−ρ²)/√(2π)) ϕ(√(2/(1+ρ)) u) + ρ² L(u, u; ρ) ,   (6.A.5)

where L(·, ·; ·) denotes the bivariate Gaussian survival (or complementary cumulative) distribution:

L(h, k; ρ) = (1/(2π √(1−ρ²))) ∫_h^∞ dx ∫_k^∞ dy exp[−(x² − 2ρxy + y²)/(2(1−ρ²))] .   (6.A.6)
Let us focus on the asymptotic behavior of L(u, u; ρ), where L(h, k; ρ) is defined by (6.A.6), for large u. Performing the change of variables x′ = x − u and y′ = y − u, we can write

L(u, u; ρ) = (e^{−u²/(1+ρ)}/(2π √(1−ρ²))) ∫_0^∞ dx′ ∫_0^∞ dy′ exp[−u (x′ + y′)/(1+ρ)] · exp[−(x′² − 2ρx′y′ + y′²)/(2(1−ρ²))] .   (6.A.9)

Using the fact that

exp[−(x′² − 2ρx′y′ + y′²)/(2(1−ρ²))] = 1 − (x′² − 2ρx′y′ + y′²)/(2(1−ρ²)) + (x′² − 2ρx′y′ + y′²)²/(8(1−ρ²)²) − (x′² − 2ρx′y′ + y′²)³/(48(1−ρ²)³) + ··· ,   (6.A.10)
and applying Theorem 3.1.1 in [247, p. 68] (Laplace's method), (6.A.9) and (6.A.10) yield

L(u, u; ρ) = ((1+ρ)² e^{−u²/(1+ρ)}/(2π √(1−ρ²) u²)) · [1 − ((2−ρ)(1+ρ)/(1−ρ)) · 1/u² + ((2ρ² − 6ρ + 7)(1+ρ)²/(1−ρ)²) · 1/u⁴ − 3 ((12 − 13ρ + 8ρ² − 2ρ³)(1+ρ)³/(1−ρ)³) · 1/u⁶ + O(1/u⁸)] ,   (6.A.11)

and

1/L(u, u; ρ) = (2π √(1−ρ²) u²/(1+ρ)²) e^{u²/(1+ρ)} · [1 + ((2−ρ)(1+ρ)/(1−ρ)) · 1/u² − ((3 − 2ρ + ρ²)(1+ρ)²/(1−ρ)²) · 1/u⁴ + ((16 − 13ρ + 10ρ² − 3ρ³)(1+ρ)³/(1−ρ)³) · 1/u⁶ + O(1/u⁸)] .   (6.A.12)
The first moment m10 = E[X | X > u, Y > u] is given by (6.A.3). For large u,

1 − Φ(√((1−ρ)/(1+ρ)) u) = ½ erfc(√((1−ρ)/(2(1+ρ))) u)   (6.A.13)
= √((1+ρ)/(1−ρ)) · (e^{−(1−ρ)u²/(2(1+ρ))}/(√(2π) u)) · [1 − ((1+ρ)/(1−ρ)) · 1/u² + 3 ((1+ρ)/(1−ρ))² · 1/u⁴ − 15 ((1+ρ)/(1−ρ))³ · 1/u⁶ + O(1/u⁸)] ,   (6.A.14)

so that, multiplying by (1 + ρ) ϕ(u), we obtain

m10 L(u, u; ρ) = ((1+ρ)²/√(1−ρ²)) · (e^{−u²/(1+ρ)}/(2π u)) · [1 − ((1+ρ)/(1−ρ)) · 1/u² + 3 ((1+ρ)/(1−ρ))² · 1/u⁴ − 15 ((1+ρ)/(1−ρ))³ · 1/u⁶ + O(1/u⁸)] .   (6.A.15)
Using the result given by equation (6.A.11), we can conclude that

m10 = u + (1 + ρ) · 1/u − ((1+ρ)²(2−ρ)/(1−ρ)) · 1/u³ + ((10 − 8ρ + 3ρ²)(1+ρ)³/(1−ρ)²) · 1/u⁵ + O(1/u⁷) .   (6.A.16)
Putting these two expressions together and factorizing the term (1+ρ)/(1+ρ²) gives

m20 L(u, u; ρ) = ((1+ρ)²/√(1−ρ²)) · (e^{−u²/(1+ρ)}/(2π)) · [1 − ((1+ρ²)/(1−ρ)) · 1/u² + 3 ((1+ρ²)(1+ρ)/(1−ρ)²) · 1/u⁴ − 15 ((1+ρ²)(1+ρ)²/(1−ρ)³) · 1/u⁶ + O(1/u⁸)] + L(u, u; ρ) ,   (6.A.20)

which finally yields

m20 = u² + 2 (1 + ρ) − 2 ((1+ρ)²/(1−ρ)) · 1/u² + 2 ((5 + 4ρ + ρ²)(1+ρ)³/(1−ρ)²) · 1/u⁴ + O(1/u⁶) .   (6.A.21)
Moreover,

(√(1−ρ²)/√(2π)) ϕ(√(2/(1+ρ)) u) = √(1−ρ²) · e^{−u²/(1+ρ)}/(2π) ,   (6.A.23)

and, by the symmetry between X and Y (so that m01 = m10 and m02 = m20), the conditional correlation coefficient reads

ρu = (m11 − m10²) / (m20 − m10²) .   (6.A.26)
Putting together the previous results, we have

m20 − m10² = (1+ρ)²/u² − 2 ((4 − ρ + 3ρ² + 3ρ³)(1+ρ)²/(1−ρ)) · 1/u⁴ + O(1/u⁶) ,   (6.A.27)

m11 − m10² = ρ ((1+ρ)³/(1−ρ)) · 1/u⁴ + O(1/u⁶) ,   (6.A.28)

which proves that

ρu = ρ ((1+ρ)/(1−ρ)) · 1/u² + O(1/u⁴) ,   for ρ ∈ [−1, 1) .   (6.A.29)
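The decay of ρu predicted by (6.A.29) can be checked qualitatively by Monte Carlo; the sample size and parameters below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 2_000_000, 0.8
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

def rho_u(u):
    """Correlation of (x, y) conditioned on both variables exceeding u."""
    m = (x > u) & (y > u)
    return np.corrcoef(x[m], y[m])[0, 1]

r0, r15 = rho_u(0.0), rho_u(1.5)
print(r0, r15)  # both well below rho = 0.8, and decreasing in u
```

Even at u = 0, requiring both variables to lie in the same quadrant removes a large part of the measured correlation, and the decay accelerates as u grows, consistent with the 1/u² behavior derived here.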
6.B.1 Proposition
Let us consider a pair of Student's random variables (X, Y) with ν > 2 degrees of freedom and unconditional correlation coefficient ρ. Let A be a subset of R such that Pr{Y ∈ A} > 0. The correlation coefficient of (X, Y), conditioned on Y ∈ A and defined by

ρA = Cov(X, Y | Y ∈ A) / √(Var(X | Y ∈ A) Var(Y | Y ∈ A)) ,   (6.B.30)

can be expressed as

ρA = ρ / √(ρ² + E[E(X² | Y) − ρ²Y² | Y ∈ A] / Var(Y | Y ∈ A)) ,   (6.B.31)

with

Var(Y | Y ∈ A) = ν [ ((ν−1)/(ν−2)) · Pr{√(ν/(ν−2)) Y ∈ A | ν−2} / Pr{Y ∈ A | ν} − 1 ] − ( ∫_{y∈A} dy y tν(y) / Pr{Y ∈ A | ν} )² ,   (6.B.32)

where Pr{· | ν} denotes the probability under the Student distribution with ν degrees of freedom.
Let the variables X and Y have a bivariate Student distribution with ν > 2 degrees of freedom and a correlation coefficient ρ:

P(x, y) = (Γ((ν+2)/2)/(Γ(ν/2) νπ √(1−ρ²))) · [1 + (x² − 2ρxy + y²)/(ν(1−ρ²))]^{−(ν+2)/2} ,

whose marginal density is

tν(x) = (Γ((ν+1)/2)/(Γ(ν/2) (νπ)^{1/2})) · (1 + x²/ν)^{−(ν+1)/2} = Cν · (1 + x²/ν)^{−(ν+1)/2} .   (6.B.36)
so that

Var(Y | Y ∈ A) = ν [ ((ν−1)/(ν−2)) · Pr{√(ν/(ν−2)) Y ∈ A | ν−2} / Pr{Y ∈ A | ν} − 1 ] − ( ∫_{y∈A} dy y tν(y) / Pr{Y ∈ A | ν} )² .   (6.B.45)
E[E(X² | Y) − ρ²Y² | Y ∈ A] = (ν/(ν−1)) (1−ρ²) + ((1−ρ²)/(ν−1)) E[Y² | Y ∈ A] ,   (6.B.47)
and applying the result given in equation (6.B.44), we finally obtain

E[E(X² | Y) − ρ²Y² | Y ∈ A] = (1 − ρ²) · (ν/(ν−2)) · Pr{√(ν/(ν−2)) Y ∈ A | ν−2} / Pr{Y ∈ A | ν} .   (6.B.48)
For the conditioning set A = [v, +∞), we have

Pr{√(ν/(ν−p)) Y ∈ A | ν−p} = T̄_{ν−p}(√((ν−p)/ν) v) = (ν/(ν−p))^{(ν−p)/2} · C_{ν−p} / ((ν−p) v^{ν−p}) + O(v^{−(ν−p+2)}) ,   (6.B.50)

∫_{y∈A} dy y tν(y) = √(ν/(ν−2)) t_{ν−2}(√((ν−2)/ν) v) = C_{ν−2} ν^{ν/2} / (√(ν−2) v^{ν−1}) + O(v^{−(ν+1)}) ,   (6.B.51)

where tν(·) and T̄ν(·) denote respectively the density and the survival distribution of the Student distribution with ν degrees of freedom and Cν is defined in (6.B.36).
Using equation (6.B.31), one can thus give the exact expression of ρ_v^+. Since it is very cumbersome, we will not write it explicitly. We will only give the asymptotic expression of ρ_v^+:

Var(Y | Y ∈ A) = (ν/((ν−2)(ν−1)²)) v² + O(1) ,   (6.B.52)

E[E(X² | Y) − ρ²Y² | Y ∈ A] = (ν/(ν−2)) · ((1−ρ²)/(ν−1)) v² + O(1) .   (6.B.53)
The conditioning set is now A = (−∞, −v] ∪ [v, +∞), with v ∈ R+. Thus, the right-hand sides of equations (6.B.49) and (6.B.50) have to be multiplied by two, while

∫_{y∈A} dy y tν(y) = 0 .   (6.B.55)
The results (6.B.54) and (6.B.57) are valid for ν > 2, as one can expect, since the second moment must exist for the correlation coefficient to be defined. Contrary to the Gaussian case, the choice of conditioning set is not really important: with both conditioning sets, ρ_v^+ and ρ_v^s go to constants different from zero and from (plus or minus) one when v goes to infinity. This striking difference with the Gaussian case can be explained by the large fluctuations allowed by the Student's distribution, and can be related to the fact that the coefficient of tail dependence of this distribution does not vanish even when the variables are anticorrelated (see Sect. 4.5.3).
Contrary to the Gaussian distribution, which binds the fluctuations of the variables near the origin, the Student's distribution allows for "wild" fluctuations. These properties are thus responsible for the result that, contrary to the Gaussian case, for which the conditional correlation coefficient goes to zero when conditioned on large signed values and to (plus or minus) one when conditioned on large unsigned values, the conditional correlation coefficient for Student's variables has a similar behavior in both cases. Intuitively, the large fluctuations of X for large v dominate and control the asymptotic dependence.
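This contrast is easy to visualize with a short simulation (parameters arbitrary): a Gaussian pair and a Student pair with ν = 3 degrees of freedom, built on the same Gaussian core so that they share the same unconditional correlation, are conditioned on large values of Y:

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, nu = 2_000_000, 0.5, 3

z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

# bivariate Student pair: the Gaussian core divided by a common chi factor
chi = np.sqrt(rng.chisquare(nu, n) / nu)
xs, ys = z1 / chi, z2 / chi

def cond_corr(x, y, v):
    """Correlation of (x, y) conditioned on y exceeding v sample standard deviations."""
    m = y > v * y.std()
    return np.corrcoef(x[m], y[m])[0, 1]

gauss_c = cond_corr(z1, z2, 3.0)    # decays toward zero as the threshold grows
student_c = cond_corr(xs, ys, 3.0)  # stays near its nonzero limit
print(gauss_c, student_c)
```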
7 Summary and Outlook
7.1 Synthesis
A common theme underlying the chapters of this book is that many important applications of risk management rely on the assessment of the positive or negative outcomes of uncertain positions. Probability theory and statistics, together with the valuation of the losses incurred for a given exposure to various risk factors, take a predominant place in this process. However, they are not, by far, the sole ingredients needed in an efficient risk management system. Quoting Andrew Lo [311], one can assert that
[Although most] current risk-management practices are based on prob-
abilities of extreme dollar losses (e.g., measures like Value-at-Risk),
[. . . ] these measures capture only part of the story. Any complete risk
management system must address two other important factors: prices
and preferences. Together with probabilities, these comprise the three
P's of Total Risk Management. [Understanding] how the three P's interact [allows one] to determine sensible risk profiles for corporations and
for individuals, guidelines for how much risk to bear and how much
to hedge. By synthesizing existing research in economics, psychology,
and decision sciences, and through an ambitious research agenda to
extend this synthesis into other disciplines, a complete and systematic
approach to rational decision making in an uncertain world is within
reach.
Among the three P's, Probability constitutes today, in our opinion, the most solid pillar of risk management, because it has reached the highest level of maturity. Compared with Price and Preference, Probability theory is
clearly the most developed in terms of its mathematical formulation, providing
important and accurate quantitative results.
Asset valuation – and therefore Price assessment – is also very developed
quantitatively, but it remains, for a large part, subordinated to the quality of
the estimation of the probabilities. Indeed, a cornerstone of modern finance
theory holds that the (fair) value of a given investment vehicle is nothing
but the mathematical expectation – under a suitable probability measure –
of the future discounted cash-flows generated by this investment vehicle. The
assessment of future cash-flows for complex investment vehicles is an exercise in pure financial analysis but, without a correct probability
measure, this exercise has little value. This indubitably shows that Prices and
Probabilities are inextricably entangled and that an accurate price assessment
requires an accurate determination of the probabilities.
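The valuation principle just recalled can be made concrete with a toy Monte Carlo computation: the fair price of a European call as the discounted expectation of its payoff under a risk-neutral geometric Brownian motion. All parameters below are illustrative assumptions, not taken from the text; the Black–Scholes closed form serves only as a consistency check.

```python
import math

import numpy as np

# Toy parameters (assumptions for illustration, not from the text).
s0, k, r, sigma, t = 100.0, 100.0, 0.05, 0.2, 1.0
n = 200_000

# Terminal prices under risk-neutral geometric Brownian motion.
rng = np.random.default_rng(42)
z = rng.standard_normal(n)
st = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)

# Fair price = discounted expectation of the payoff under the measure Q.
price_mc = math.exp(-r * t) * float(np.mean(np.maximum(st - k, 0.0)))


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Black-Scholes closed form for the same call, as a consistency check.
d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
d2 = d1 - sigma * math.sqrt(t)
price_bs = s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
```

The point of the exercise is the one made in the text: the whole computation is only as good as the probability measure under which the expectation is taken.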
Preferences are also of crucial importance – in fact the most important of
the three P’s, according to Lo – since under this term is embodied the entire
human decision making process. But here, in contrast to the two other P’s,
our knowledge is still in its infancy. The pioneering theoretical work by von Neumann and Morgenstern [482] laid the foundations of a rational decision theory. However, this theory has been undermined over the years by several paradoxes and deficiencies [4, 5] when tested against real human preferences.
Most of the recent theories, notably those directly inspired by psychological studies [204, 263, 352, and references therein], attempt to cure the original rational decision theory of its inconsistencies. But one should recognize that, while significant qualitative progress has been made, there is not yet a satisfying, fully operational theory of decision making.
For all these reasons, Probability still plays a dominant role in current
risk management practice. And we firmly believe that this supremacy will
extend well into the future, in view of the still large remaining potential for
improvement. Of course, the modern science of human psychology and decision making is progressing constantly and accounts increasingly well for the many anomalies observed in financial markets. However, its fusion with finance,
which has given birth to the field of “behavioral finance,” will provide useful
practical tools only with the development of accurate quantitative predictions.
Until then, behavioral finance will continue to be mostly the playground of
academic research. We thus believe that, in the next few years, the most
important improvements in applied risk management will occur through more
elaborate modeling of financial markets and more generally of the economic
environment.
In spite of the key role of Price and Preference, this book has mainly
focused on the role of Probability in the risk assessment and management
processes. The different probabilistic concepts presented in the core chapters
of this book should provide a better understanding and modeling of the various
sources of uncertainty and therefore of risk factors that investors are facing.
Our presentation has been organized around the key idea that the risk of a
set of positions can be decomposed into two major components:
(i) the marginal risks associated with the variations of wealth of each risky
position,
(ii) the cross-dependence between the changes in the wealth of the different positions.
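In the simplest (variance-based) setting, this two-component decomposition is exact: the portfolio variance splits into a marginal part and a cross-dependence part. The numbers below are purely illustrative, not taken from the text:

```python
import numpy as np

# Illustrative numbers (assumptions, not from the text): three positions
# with weights w, marginal volatilities sigma and correlation matrix R.
w = np.array([0.5, 0.3, 0.2])
sigma = np.array([0.20, 0.15, 0.30])
R = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
cov = np.outer(sigma, sigma) * R

# Total portfolio variance.
total_var = float(w @ cov @ w)
# (i) marginal component: the variance the portfolio would have if the
#     positions were uncorrelated.
marginal_var = float(np.sum((w * sigma) ** 2))
# (ii) cross-dependence component: the off-diagonal (covariance) terms.
dependence_var = total_var - marginal_var
```

Beyond the Gaussian world treated here for simplicity, the book replaces the correlation matrix by the full copula, but the split into marginal risks and dependence survives.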
In its conclusion, Chap. 2 notes the existence of "outliers" (also called "kings" or "black swans") in the distribution of financial risks measured at variable
time scales such as with drawdowns. These outliers are identified only with
metrics adapted to take into account transient increases of the time depen-
dence in the time series of returns of individual financial assets [249] (see also
Chap. 3 of [450]). These outliers seem to belong to a statistical population different from the bulk of the distribution and to require additional amplification mechanisms that are active only at special times.
Chapter 5 shows that two exceptional events in the period from January
1989 to December 1998 stand out in statistical tests determining the relevance
of the Gaussian copula to describe the dependence between the German Mark
and the Swiss Franc. The first of the two events is the coup against Gorbachev
in Moscow on 19 August, 1991, for which the German Mark (respectively the
Swiss Franc) lost 3.37% (respectively 0.74%) against the US dollar. The second
event occurred on 10 September, 1997, and corresponds to an appreciation of
the German Mark of 0.60% against the US dollar while the Swiss Franc lost
0.79% which represents a moderate move for each currency, but a large joint
move.
The presence of such outliers both in marginal distributions and in con-
comitant moves, together with the strong impact of crises and of crashes,
suggests the need for novel measures of dependence between drawdowns and
other time-varying metrics across different assets. This program is part of
the more general need for a joint multi-time-scale and multi-asset approach
to dependence. Examples of efforts in this direction include multidimensional
GARCH models [23, 45, 46, 154, 296, 400, 477] and the multivariate multi-
fractal random walk [366]. It also epitomizes the need for new multi-period
risk measures, which would account for this class of events. Several avenues
of research have recently been opened by attempting to generalize the no-
tions of Value-at-Risk and of coherent measures of risk within a multi-period
framework [21, 405, 483].
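A toy computation (with illustrative parameters, not taken from the text) shows why such multi-period extensions are nontrivial: for heavy-tailed Student's returns, the empirical multi-period Value-at-Risk differs from the naive square-root-of-time scaling of the one-period VaR.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (an assumption, not from the text): i.i.d. Student's
# daily returns with 3 degrees of freedom -- heavy tails, finite variance.
n_paths, horizon, alpha = 100_000, 10, 0.99
daily = rng.standard_t(3, size=(n_paths, horizon))

# One-period VaR: the alpha-quantile of the one-day loss distribution.
var_1d = float(np.quantile(-daily[:, 0], alpha))
# Multi-period VaR from the empirical distribution of 10-day returns.
var_10d = float(np.quantile(-daily.sum(axis=1), alpha))
# Naive "square-root-of-time" scaling of the one-period VaR.
var_scaled = var_1d * np.sqrt(horizon)
```

Even in this i.i.d. case the scaling rule is biased; with the volatility dependence and outliers discussed above, the discrepancy is compounded, which motivates genuine multi-period risk measures.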
The presence of outliers such as those mentioned in the previous section poses
the problem of exogeneity versus endogeneity. An event identified as anom-
alous could perhaps be cataloged as resulting from exogenous influences.¹
The same issue has been investigated in Chap. 6 when testing for contagion
versus contamination in the Latin American crises. Contamination refers to
an endogenous dependence described by an approximately constant copula.
In contrast, contagion is by definition the concept that the dependence has
¹ However, outliers may also have an endogenous origin, as described for financial crashes [250, 449, 450].
7.2 Outlook and Future Directions
continuous stream of news gets incorporated into market prices for instance,
it has recently been shown how one can distinguish the effects of events like
the 11 September, 2001 attack or the coup against Gorbachev on 19 August,
1991 from events like financial crashes such as October, 1987 as well as smaller
volatility bursts. Based on a stochastic volatility model with long range de-
pendence (the so-called “multifractal random walk”, whose main properties
are given in Appendix 2.A), Sornette et al. [456] have predicted different response functions of the volatility to large external shocks compared with what we term endogenous shocks, i.e., shocks which result from the cooperative accumulation of many small pieces of news. This theory, which has been successfully tested
against empirical data with no adjustable parameters, suggests a general clas-
sification into two classes of events (endogenous and exogenous) with specific
signatures and characteristic precursors for the endogenous class. It also pro-
poses a simple origin for endogenous shocks as the accumulation, in certain circumstances, of tiny pieces of bad news that add up coherently due to their persistence.
Another example supporting the existence of specific signatures distin-
guishing endogenous and exogenous events has been provided by a recent in-
vestigation concerning the origin of the success of best sellers [128, 455]. The
question is whether the latest best seller is simply the product of a clever marketing campaign or whether it has truly permeated society. In other words, can one determine whether a book's popularity will wane as quickly as it appeared, or whether it will become a classic for future generations? The study in [455, 128]
describes a simple and generic method that distinguishes exogenous shocks
(e.g., a very large news impact) from endogenous shocks (e.g., a book that becomes a best seller by word of mouth) within the network of online buyers.
An endogenous shock appears slowly but results in a long-lived growth and
decay of sales due to small but very extensive interactions in the network of
buyers. In contrast, while an exogenous shock appears suddenly and propels a book to best seller status, its sales typically decline rapidly as a power law with an exponent larger than that for endogenous shocks. These results suggest
that the network of human acquaintances is close to “critical,” with informa-
tion neither propagating nor disappearing but spreading marginally between
people. These results have interesting potential for marketing agencies, which
could measure and maximize the impact of their publicity on the network of
potential buyers, for instance.
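This word-of-mouth picture can be caricatured by a subcritical branching (epidemic) process with a power-law transmission kernel. The deterministic sketch below (illustrative parameters, not taken from the cited studies) computes the mean activity following a single exogenous impulse and recovers a power-law relaxation of sales, slower than the bare kernel because of the network amplification:

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text): branching ratio
# nbar < 1 (subcritical, "close to critical") and a bare power-law memory
# kernel phi(t) ~ t^-(1 + theta) for word-of-mouth transmission times.
tmax, nbar, theta = 2000, 0.9, 0.3
t = np.arange(1, tmax + 1, dtype=float)
phi = t ** -(1.0 + theta)
phi /= phi.sum()  # normalize the kernel to a probability distribution

# Mean activity a(t) following a unit exogenous impulse at t = 0:
# a(t) = nbar * sum_{s < t} a(s) phi(t - s), seeded by a(0) = 1.
a = np.zeros(tmax + 1)
a[0] = 1.0
for i in range(1, tmax + 1):
    a[i] = nbar * np.sum(a[:i] * phi[i - 1::-1])

# Log-log slope of the relaxation over an intermediate time window:
# a power-law decay, slower than the bare kernel exponent 1 + theta.
window = slice(10, 1000)
slope = float(np.polyfit(np.log(t[window]), np.log(a[1:][window]), 1)[0])
```

The closer the branching ratio is to one, the longer the cascades persist, which is the "marginal spreading" regime invoked in the text.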
These two examples show that the concepts of endogeneity and exogene-
ity should have many applications including the modeling and prediction of
financial crashes [250, 458], Initial Public Offerings (IPO) [245], the movie
industry [119] and many other domains related to marketing [452], for which
the mechanism of information cascade derives from the fact that agents can observe box office revenues and communicate by word of mouth about the quality of the movies they have seen. The formulation of a comprehensive theory of (time) dependence, allowing one to characterize endogeneity and exogeneity and to distinguish between them, is thus of great importance for future developments.
mechanisms predict the existence of random abrupt changes. For the future,
it would be interesting to combine both mechanisms as they are arguably
present together in real markets, in order to clarify their relative importance
and interplay. Another important field of research is to combine these micro-
economic models with the tools developed to detect regime switching.
Determining the arrow of causality between two time series X(t) and Y(t) has a long history, especially in economics, econometrics, and finance, where it is often asked which economic variable might influence other economic phenomena
[93, 199]. This question is raised in particular for the relationships between
respectively inflation and GDP, inflation and growth rate, interest rate and
stock market returns, exchange rate and stock prices, bond yields and stock
prices, returns and volatility [95], advertising and consumption and so on. One
simple naive measure is the lagged cross-correlation function
Cov [X(t)Y (t + τ )]
CX,Y (τ ) = .
Var[X]Var[Y ]
The need for the rather sophisticated statistical methods described in this book, as well as the developments suggested in this concluding chapter, reflects, in our opinion, the absence of a genuine fundamental economic understanding. To make a comparison with the natural sciences, the need for such statistical methods has been less pressing there, probably because most of the fundamental equations are known (at least at the macroscopic level) and the challenge lies more in understanding the emergence of complex solutions from seemingly simple mathematical formulations. In physics, for instance, the issues of dependence raised in this book are better and more simply attacked from a study of the fundamental dynamical equations. In contrast, we lack a deep
underpinning for understanding the mechanisms at the origin of the dynam-
ical behavior of financial markets. It is thus possible that the emerging field
of behavioral finance, with its sister fields of neuroeconomics and evolution-
ary psychology, and their exploration of the impact on decision making of
imperfect bounded subjective probability perceptions [36, 206, 437, 439, 474],
may provide a fundamental shift in our understanding and therefore in the
formulation of dependence between assets. This will have major impacts on
risk assessment and its optimization.
References
16. Arneodo, A., E. Bacry and J.F. Muzy (1998) Random cascades on wavelet
dyadic trees. Journal of Mathematical Physics 39, 4142–4164. 82
17. Arneodo, A., J.F. Muzy and D. Sornette (1998) Direct causal cascade in the
stock market. European Physical Journal B 2, 277–282. 82
18. Arthur, W.B., S.N. Durlauf and D.A. Lane (1997) The Economy As an Evolv-
ing Complex System II. Santa Fe Institute Studies in the Sciences of Complexity
27 Westview Press, Addison-Wesley, Redwood City CA. 14
19. Artzner, P., F. Delbaen, J.M. Eber and D. Heath (1997) Thinking coherently.
Risk 10 (November), 68–71. 2, 4, 5
20. Artzner, P., F. Delbaen, J.M. Eber and D. Heath (1999) Coherent measures of
risk. Mathematical Finance 9, 203–228. 2, 4, 5
21. Artzner, P., F. Delbaen, J.M. Eber, D. Heath and H. Ku (2004) Coherent
multiperiod risk adjusted values and Bellman’s principle. Working Paper. 276
22. Ashley, R., C.W.J. Granger and R. Schmalensee (1980) Advertising and ag-
gregate consumption: An analysis of causality. Econometrica 48, 1149–1167.
280
23. Audrino, F. and G. Barone-Adesi (2004) Average conditional correlation and
tree structures for multivariate GARCH models. Working Paper. Available at
http://papers.ssrn.com/paper.taf?abstract id=553821 276
24. Axelrod, R. (1997) The Complexity of Cooperation. Princeton University Press,
Princeton, NJ. 22
25. Axtell, R. (2001) Zipf distribution of U.S. firm sizes. Science 293, 1818–1820.
41
26. Bachelier, L. (1900) Théorie de la spéculation. Annales Scientifiques de l’Ecole
Normale Supérieure 17, 21–86. VIII, 37, 80
27. Bacry, E., J. Delour and J.-F. Muzy (2001) Multifractal random walk. Physical
Review E 64, 026103. 39, 40, 84, 85, 86
28. Bacry, E. and J.-F. Muzy (2003) Log-infinitely divisible multifractal processes.
Communications in Mathematical Physics 236, 449–475. 41, 84
29. Baig, T. and I. Goldfajn (1999) Financial market contagion in the Asian
crisis. IMF Staff Papers 46(2), 167–195. Available at http://www.imf.org/
external/Pubs/FT/staffp/1999/06-99/baig.htm 211, 238
30. Baillie, R.T. (1996) Long memory processes and fractional integration in econo-
metrics, Journal of Econometrics 73, 5–59. 87
31. Baillie, R.T., T. Bollerslev and H.O. Mikelsen (1996) Fractionally integrated
generalized autoregressive conditional heteroskedasticity. Journal of Economet-
rics 74, 3–30. 37, 80
32. Bak, P. (1996) How Nature Works: The Science of Self-organized Criticality.
Copernicus, New York. 277
33. Bak, P. and M. Paczuski (1995) Complexity, contingency and criticality. Pro-
ceedings of the National Academy of Science USA 92, 6689–6696. 277
34. Bali, T.G. (2003) An extreme value approach to estimating volatility and
Value-at-Risk. Journal of Business 76, 83–108. 79
35. Barbe, P., C. Genest, K. Ghoudi and B. Rémillard (1996) On Kendall’s process.
Journal of Multivariate Analysis 58, 197–229. 204
36. Barberis, N. and R. Thaler (2003) A survey of behavioral finance. In Handbook
of the Economics of Finance, 1(B), G.M. Constantinides, M. Harris and R.M.
Stulz, eds. Elsevier, Amsterdam, 1053–1123. 4, 281
37. Barndorff-Nielsen, O.E. (1997) Normal inverse Gaussian distributions and the
modeling of stock returns. Scandinavian Journal of Statistics 24, 1–13. 43
59. Black, F., M.C. Jensen and M.S. Scholes (1972) The capital asset pricing model:
Some empirical tests. In Studies in the Theory of Capital Markets, M.C. Jensen,
ed. Praeger, New York, 79–121. 14
60. Black, F. and M. Scholes (1973) The pricing of options and corporate liabilities.
Journal of Political Economy 81, 637–653. VIII, 38
61. Blanchard, O.J. and M.W. Watson (1982) Bubbles, rational expectations and
speculative markets. In Crisis in Economic and Financial Structure: Bubbles,
Bursts, and Shocks, P. Wachtel, ed. Lexington Books, Lexington. 39
62. Blattberg, R. and N. Gonedes (1974) A comparison of stable and Student distributions as statistical models for stock prices. Journal of Business 47, 244–280. 42
63. Blum, A. and A. Kalai (1999) Universal portfolios with and without transaction
costs. Machine Learning 35, 193–205. 275
64. Blum, P., A. Dias and P. Embrechts (2002) The ART of dependence modeling:
The latest advances in correlation analysis. In Alternative Risk Strategies. Risk
Books, Morton Lane, London, 339–356. 99
65. Bollerslev, T. (1986) Generalized autoregressive conditional heteroscedasticity.
Journal of Econometrics 31, 307–327. 43
66. Bollerslev, T., R.F. Engle and D.B. Nelson (1994) ARCH models. In Hand-
book of Econometrics, 4, R.F. Engle and D.L. McFadden, eds. North-Holland,
Amsterdam, 2959–3038. 43
67. Bonabeau, E., M. Dorigo and G. Théraulaz (1999) Swarm Intelligence: From
Natural to Artificial Systems. Oxford University Press, Oxford. 22
68. Bonabeau, E., J. Kennedy and R.C. Eberhart (2001) Swarm Intelligence. Aca-
demic Press, New York. 22
69. Bonabeau, E. and C. Meyer (2001) Swarm intelligence: A whole new way to think about business. Harvard Business Review (May), 106–114. 22
70. Bonabeau, E. and G. Théraulaz (2000) Swarm smarts. Scientific American
(March), 72–79. 22
71. Bookstaber, R. (1997) Global risk management: Are we missing the point?
Journal of Portfolio Management 23, 102–107. 228, 231
72. Bouchaud, J.-P. and M. Potters (2003) Theory of Financial Risks: From Sta-
tistical Physics to Risk Management, 2nd edition. Cambridge University Press,
Cambridge, New York. 38, 42, 59, 176
73. Bouchaud, J.-P., D. Sornette, C. Walter and J.-P. Aguilar (1998) Taming large
events: Optimal portfolio theory for strongly fluctuating assets. International
Journal of Theoretical & Applied Finance 1, 25–41. 2
74. Bouyé, E., V. Durrleman, A. Nikeghbali, G. Riboule and T. Roncalli (2000)
Copulas for finance: A reading guide and some applications. Technical Docu-
ment, Groupe de Recherche Opérationelle, Crédit Lyonnais. 103
75. Bovier, A. and D.M. Mason (2001) Extreme value behavior in the Hopfield
model. Annals of Applied Probability 11, 91–120. 44
76. Bowyer, K. and P.J. Phillips (1998) Empirical Evaluation Techniques in Com-
puter Vision. IEEE Computer Society, Los Alamos, CA. 3
77. Box, G.E.P. and M.E. Muller (1958) A note on the generation of random normal deviates. Annals of Mathematical Statistics 29, 610–611. 121
78. Boyer, B.H., M.S. Gibson and M. Lauretan (1997) Pitfalls in tests for changes
in correlations. Board of the Governors of the Federal Reserve System, Inter-
national Finance Discussion Paper 597. 228, 231, 232, 233, 234, 260
79. Bracewell, R. (1999) The Hilbert transform. In The Fourier Transform and Its
Applications, 3rd edition. McGraw-Hill, New York, 267–272. 279
80. Bradley, B. and M. Taqqu (2004) Framework for analyzing spatial contagion
between financial markets. Finance Letters 2(6), 8–15. 232
81. Bradley, B. and M. Taqqu (2005) Empirical evidence on spatial contagion be-
tween financial markets. Finance Letters 3(1), 64–76. 232
82. Bradley, B. and M. Taqqu (2005) How to estimate spatial contagion between
financial markets. Finance Letters 3(1), 77–86. 232
83. Breymann, W., A. Dias and P. Embrechts (2003) Dependence structures for
multivariate high-frequency data in finance. Quantitative Finance 3, 1–14. 215
84. Brock, W.A., W.D. Dechert, J.A. Scheinkman and B. Le Baron (1996) A test
for independence based on the correlation dimension. Econometric Reviews 15,
197–235. 38
85. Brockwell, P.J. and R.A. Davis (1996) Introduction to Time Series and Fore-
casting. Springer Series in Statistics, Springer, New York. 3, 80
86. Buhmann, M.D. (2003) Radial Basis Functions: Theory and Implementations.
Cambridge University Press, Cambridge, New York. 3
87. Calvo, S. and C.M. Reinhart (1995) Capital flows to Latin America: Is there
evidence of contagion effects? In Private Capital Flows to Emerging Market
After the Mexican Crisis, G.A. Calvo, M. Goldstein and E. Hochreiter, eds.
Institute for International Economics, Washington, DC. 247, 260
88. Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997) The Econometrics of
Financial Markets. Princeton University Press, Princeton, NJ. 38
89. Carpentier, D. and P. Le Doussal (2001) Glass transition of a particle in a
random potential, front selection in nonlinear renormalization group, and en-
tropic phenomena in Liouville and Sinh–Gordon models. Physical Review E
63, 026110. 44
90. Carr, P., H. Geman, D.B. Madan and M. Yor (2002) The fine structure of
assets returns: An empirical investigation. Journal of Business 75, 305–332.
43
91. Challet, D. and M. Marsili (2003) Criticality and market efficiency in a simple
realistic model of the stock market. Physical Review E 68, 036132. 57
92. Challet, D., M. Marsili and Y.-C. Zhang (2004) The Minority Game. Oxford
University Press, Oxford. 23
93. Chamberlain, G. (1982) The general equivalence of Granger and Sims causality.
Econometrica 50, 569–582. 280
94. Champenowne, D.G. (1953) A model of income distribution. Economic Journal
63, 318–351. 39
95. Chan, K.C., L.T.W. Cheng and P.P. Lung (2001) Implied volatility and equity
returns: Impact of market microstructure and cause–effect relation. Working
Paper. 280
96. Charpentier, A. (2004) Multivariate risks and copulas. Ph.D. thesis, University
of Paris IX. 249
97. Chen, K. and S.-H. Lo (1997) On a mapping approach to investigating the
bootstrap accuracy. Probability Theory & Related Fields 107, 197–217. 207
98. Chen, Y., G. Rangarajan, J. Feng and M. Ding (2004) Analyzing multiple
nonlinear time series with extended Granger causality. Physics Letters A 324,
26–35. 280
99. Cherubini, U. and Luciano, E. (2002) Bivariate option pricing with copulas.
Applied Mathematical Finance 9, 69–86. 100, 124, 131
100. Cherubini, U., E. Luciano and W. Vecchiato (2004) Copula Methods for Finance. Wiley, New York. 124
101. Cizeau, P., M. Potters and J.P. Bouchaud (2001) Correlation structure of ex-
treme stock returns. Quantitative Finance 1, 217–222. 232
102. Claessens, S., R.W. Dornbusch and Y.C. Park (2001) Contagion: Why crises spread and how this can be stopped. In International Financial Contagion, S. Claessens and K.J. Forbes, eds. Kluwer Academic Press, Dordrecht, Boston. 231
103. Clayton, D.G. (1978) A model for association in bivariate life tables and its
application in epidemiological studies of familial tendency in chronic disease
incidence. Biometrika 65, 141–151. 113
104. Cochrane, J.H. (2001) Asset Pricing. Princeton University Press, Princeton, NJ.
105. Cohen, E., R.F. Riesenfeld and G. Elber (2001) Geometric Modeling with
Splines: An Introduction. AK Peters, Natick, MA. 3
106. Coles, S., J. Heffernan and J. Tawn (1999) Dependence measures for extreme
value analyses. Extremes 2, 339–365. 169, 255
107. Coles, S. and J.A. Tawn (1991) Modeling extreme multivariate events. Journal
of the Royal Statistical Society, Series B 53, 377–392. 116
108. Cont, R., Potters, M. and J.-P. Bouchaud (1997) Scaling in stock market
data: Stable laws and beyond. In Scale Invariance and Beyond, B. Dubrulle,
F. Graner and D. Sornette, eds. Springer, Berlin. 42
109. Cont, R. and P. Tankov (2003) Financial Modeling with Jump Processes. Chap-
man & Hall, London. 35
110. Cossette, H., P. Gaillardetz, E. Marceau and J. Rioux (2002) On two dependent
individual risk models. Insurance: Mathematics & Economics 30, 153–166. 100
111. Cox, D.R. (1972) Regression models in life tables (with discussion). Journal of
the Royal Statistical Society, Series B 34, 187–220. 113
112. Coutant, S., V. Durrleman, G. Rapuch and T. Roncalli (2001) Copulas, multi-
variate risk-neutral distributions and implied dependence functions. Technical
Document, Groupe de Recherche Opérationelle, Crédit Lyonnais. 100, 135
113. Cover, T.M. (1991) Universal portfolios. Mathematical Finance 1, 1–29. 275
114. Credit-Suisse-Financial-Products (1997) CreditRisk+ : A credit risk manage-
ment framework. Technical Document. Available at http://www.csfb.com/
creditrisk. 137
115. Cramér, H. (1946) Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ. 157
116. Cromwell, J.B., W.C. Labys and M. Terraza (1994) Univariate Tests for Time
Series Models. Sage, Thousand Oaks, CA, 20–22. 38
117. Danielsson, J., P. Embrechts, C. Goodhart, C. Keating, F. Muennich, O. Re-
nault and H.-S. Shin (2001) An academic response to Basel II. FMG and ESRC,
130 (London). VIII
118. Dawkins, R. (1989) The Selfish Gene, 2nd edition. Oxford University Press,
Oxford. 21
119. De Vany, A. and C. Lee (2001) Quality signals in information cascades and the dynamics of the distribution of motion picture box office revenues. Journal of Economic Dynamics & Control 25, 593–614. 278
120. DeGregori, T.R. (2002) The zero risk fiction. April 12, American Council on
Science and Health. Available at http://www.acsh.org VII
140. Eberlein, E., U. Keller and K. Prause (1998) New insights into smile, mispricing and value at risk: The hyperbolic model. Journal of Business 71, 371–405. 43, 58
141. Eberlein, E. and F. Özkan (2003) Time consistency of Lévy models. Quantita-
tive Finance 3, 40–50. 35
142. Edwards, S. and R. Susmel (2001) Volatility dependence and contagion in
emerging equity markets. Working Paper. Available at http://papers.ssrn.
com/sol3/papers.cfm?abstract id=285631 246
143. Efron, B. and R.J. Tibshirani (1986) Bootstrap methods for standard errors, confidence intervals and other measures of statistical accuracy. Statistical Science 1, 54–77. 207
144. Efron, B. and R.J. Tibshirani (1993) An Introduction to the Bootstrap. Chap-
man & Hall, CRC. 3
145. Embrechts, P., A. Hoeing and A. Juri (2003) Using copulae to bound the Value-
at-Risk for functions of dependent risk. Finance & Stochastics 7, 145–167. 100,
118, 119, 124
146. Embrechts, P., C.P. Klüppelberg and T. Mikosh (1997) Modelling Extremal
Events. Springer-Verlag, Berlin. 46, 58
147. Embrechts, P., F. Lindskog and A. McNeil (2003) Modelling dependence with
copulas and applications to risk management. In Handbook of Heavy Tailed
Distributions in Finance, S. Rachev, ed. Elsevier, Amsterdam, 329–384. 167,
169, 232
148. Embrechts, P., A.J. McNeil and D. Straumann (1999) Correlation: Pitfalls and
alternatives. Risk 12(May), 69–71. 99
149. Embrechts, P., A.J. McNeil and D. Straumann (2002) Correlation and depen-
dence in risk management: Properties and pitfalls. In Risk Management: Value
at Risk and Beyond, M.A.H. Dempster, ed. Cambridge University Press, Cam-
bridge, 176–223. 99, 150, 169, 172, 220, 232
150. Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1008. 35
151. Engle, R.F. (1984) Wald, likelihood ratio, and Lagrange multiplier tests in
econometrics. In Handbook of Econometrics, II, Z. Griliches and M.D. Intrili-
gator, eds. North-Holland, Amsterdam. 38
152. Engle, R.F., D.F. Hendry and J.-F. Richard (1983) Exogeneity. Econometrica
51, 277–304. 277
153. Engle, R.F. and A.J. Patton (2001) What good is a volatility model? Quanti-
tative Finance 1, 237–245. 66
154. Engle, R.F. and K. Sheppard (2001) Theoretical and empirical properties of
dynamic conditional correlation multivariate GARCH. NBER Working Papers
number 8554. Available at http://www.nber.org/papers/w8554.pdf 276
155. Erdős, P., A. Rényi and V.T. Sós (1966) On a problem of graph theory. Studia Scientiarum Mathematicarum Hungarica 1, 215–235. 26
156. Ericsson, N. and J.S. Irons (1994) Testing Exogeneity: Advanced Texts in
Econometrics. Oxford University Press, Oxford. 277
157. Fama, E.F. (1965) The behavior of stock market prices. Journal of Business
38, 34–105. 42
158. Fama, E.F. (1970) Efficient capital markets: A review of theory and empirical
work. Journal of Finance 25, 383–417. 20
159. Fama, E.F. (1991) Efficient capital markets II. Journal of Finance 46, 1575–
1617. 20
160. Fama, E.F. and K.R. French (1992) The cross-section of expected stock returns.
Journal of Finance 47, 427–465. 3, 14, 18, 19, 24
161. Fama E.F. and K.R. French (1996) Multifactor explanations of asset pricing
anomalies. Journal of Finance 51, 55–84. 3, 24, 37
162. Fang, H.B. and T. Lai (1997) Co-kurtosis and capital asset pricing. Financial
Review 32, 293–307. 15, 58
163. Fang, H.B., K.T. Fang and S. Kotz (2002) The meta-elliptical distributions
with given marginals. Journal of Multivariate Analysis 82, 1–16. 108, 109
164. Farmer, J.D. (2002) Market force, ecology and evolution. Industrial & Corpo-
rate Change 11(5), 895–953. 22
165. Farmer, J.D. and Lillo, F. (2004) On the origin of power-law tails in price
fluctuations. Quantitative Finance 4, 11–15. 41
166. Farmer, J.D., P. Patelli and I. I. Zovko (2005) The predictive power of zero
intelligence models in financial markets. Proceedings of the National Academy
of Sciences 102(6), 2254–2259. 21
167. Feller, W. (1971) An Introduction to Probability Theory and its Applications,
II. Wiley, New York. 58
168. Fermanian, J.D. and O. Scaillet (2003) Nonparametric estimation of copulas
for time series. Journal of Risk 5(4), 25–54. 192, 193, 194, 254
169. Fermanian, J.D. and O. Scaillet (2005) Some statistical pitfalls in copula model-
ing for financial applications. In Capital Formation, Governance and Banking,
E. Klein, ed. Nova Publishers, Hauppauge, NY. 204
170. Figlewski, S. and X. Wang (2001) Is the “Leverage Effect” a Leverage Effect?
Unpublished working paper, New York University. 85
171. Fishman, G.S. (1996) Monte Carlo. Springer-Verlag, New York. 120
172. Flood, P. and P.M. Garber (1994) Speculative Bubbles, Speculative Attacks,
and Policy Switching. MIT Press, Cambridge, MA. 3
173. Flury, B. (1997) A First Course in Multivariate Statistics. Springer, New York.
3
174. Föllmer, H. and A. Schied (2002) Convex measures of risk and trading con-
straints. Finance & Stochastics 6, 429–447. 4, 7
175. Föllmer, H. and A. Schied (2003) Robust preferences and convex measures
of risk. In Advances in Finance and Stochastics, Essays in Honour of Dieter
Sondermann, K. Sandmann and P.J. Schonbucher, eds. Springer-Verlag, New
York. 4, 7
176. Föllmer, H. and M. Schweizer (1991) Hedging of contingent claims under in-
complete information. In Applied Stochastic Analysis, Stochastic Monographs,
5, M.H.A. Davis and R.J. Elliot, eds. Gordon and Breach, New York. 136
177. Föllmer, H. and D. Sondermann (1986) Hedging of non-redundant contingent
claims. In Contributions to Mathematical Economics, W. Hildenbrand and A.
Mascolell, eds. North-Holland, Amsterdam. 136
178. Forbes, K.J. and R. Rigobon (2002) No contagion, only interdependence: Mea-
suring stock market co-movements. Journal of Finance 57, 2223–2261. 232,
238, 240, 260
179. Frank, M.J., R.B. Nelsen and B. Schweizer (1987) Best-possible bounds for
the distribution of a sum – A problem of Kolmogorov. Probability Theory &
Related Fields 74, 199–211. 118, 124, 127
180. Franses, P.H. and D. van Dijk (2000) Nonlinear Time Series Models in Em-
pirical Finance. Cambridge University Press, Cambridge, New York. 3
292 References
181. Füredi, Z. and J. Komlós (1981) The eigenvalues of random symmetric matrices.
Combinatorica 1, 233–241.
182. Frees, W.E., J. Carriere and E.A. Valdez (1996) Annuity valuation with de-
pendent mortality. Journal of Risk & Insurance 63, 229–261. 113
183. Frees, W.E. and E.A. Valdez (1998) Understanding relationships using copulas.
North American Actuarial Journal 2, 1–25. 100, 103, 111, 124, 201
184. Frey, R. and A. McNeil (2001) Modelling dependent defaults. ETH E-
Collection. Available at http://e-collection.ethbib.ethz.ch/show?type=
bericht&nr=273 100, 124
185. Frey, R. and A. McNeil (2002) VaR and expected shortfall in portfolios of
dependent credit risks: Conceptual and practical insights. Journal of Banking
& Finance 26, 1317–1334. 181
186. Frey, R., A. McNeil and M. Nyfeler (2001) Credit risk and copulas. Risk 14,
111–114. 100, 138, 181
187. Frieden, J.A. (1992) Debt, Development, and Democracy: Modern Political
Economy and Latin America, 1965–1985. Princeton University Press, Prince-
ton, NJ. 261
188. Frieden, J.A., P. Ghezzi and E. Stein (2001) Politics and exchange rates: A
cross-country approach. In The Currency Game: Exchange Rate Politics in
Latin America, J.A. Frieden, P. Ghezzi, E. Stein, eds. Inter-American Devel-
opment Bank, Washington, DC. 261
189. Frieden, J.A. and E. Stein (2000) The political economy of exchange rate policy
in Latin America: An analytical overview. Working Paper, Harvard University.
261
190. Frisch, U. (1995) Turbulence. Cambridge University Press, Cambridge. 82
191. Frisch, U. and D. Sornette (1997) Extreme deviations and applications. Journal
de Physique I, France 7, 1155–1171. 57
192. Frittelli, M. (2000) The minimal entropy martingale measure and the valuation
problem in incomplete markets. Mathematical Finance 10, 39–52. 136
193. Gabaix, X. (1999) Zipf’s law for cities: An explanation. Quarterly Journal of
Economics 114, 739–767. 39
194. Gabaix, X., P. Gopikrishnan, V. Plerou and H.E. Stanley (2003) A theory of
power-law distributions in financial market fluctuations. Nature 423, 267–270.
41
195. Gabaix, X., P. Gopikrishnan, V. Plerou and H.E. Stanley (2003) A theory
of large fluctuations in stock market activity. MIT Department of Economics
Working Paper 03-30. Available at http://ssrn.com/abstract=442940 41
196. Geman, H. (2002) Pure jump Lévy processes for asset price modelling. Journal
of Banking & Finance 26, 1297–1316. 35
197. Genest, C., K. Ghoudi and L.P. Rivest (1995) A semiparametric estimation
procedure of dependence parameters in multivariate families of distributions.
Biometrika 82, 543–552. 197, 198, 200, 222, 223
198. Genest, C. and J. MacKay (1986) The joy of copulas: Bivariate distributions
with uniform marginals. American Statistician 40, 280–283. 123, 155
199. Geweke, J. (1984) Inference and causality in economic time series models. In
Handbook of Econometrics, Vol. II, Z. Griliches and M.D. Intriligator, eds. Elsevier
Science Publishers BV, Amsterdam, 1101–1144. 280
200. Ghoudi, K. and B. Remillard (1998) Empirical processes based on pseudo-
observations. In Asymptotic Methods in Probability and Statistics: A Volume
in Honour of Miklós Csörgő, B. Szyszkowicz, ed. Elsevier, Amsterdam. 206
221. Hamilton, J.D. (1989) A new approach to the economic analysis of non-
stationary time series and the business cycle. Econometrica 57, 357–384. 279
222. Hamilton, J.D. (1994) Time Series Analysis. Princeton University Press,
Princeton, NJ. 3, 80
223. Harvey, C.R. and A. Siddique (2000) Conditional skewness in asset pricing
tests. Journal of Finance 55, 1263–1295. 14, 15
224. Hauksson, H.A., M.M. Dacorogna, T. Domenig, U.A. Müller and G. Samorod-
nitsky (2001) Multivariate extremes, aggregation and risk estimation. Quanti-
tative Finance 1, 79–95. 232
225. Havrda, J. and F. Charvát (1967) Quantification method of classification
processes: Concept of structural α-entropy. Kybernetika 3(1), 30–34. 163
226. Heath, D., E. Platen and M. Schweizer (2001) A comparison of two quadratic
approaches to hedging in incomplete markets. Mathematical Finance 11, 385–
413. 136
227. Helmbold, D.P., R.E. Schapire, Y. Singer and M.K. Warmuth (1998) On-line
portfolio selection using multiplicative updates. Mathematical Finance 8, 325–
347. 275
228. Hennessy, D. and H.E. Lapan (2002) The use of Archimedean copulas to model
portfolio allocations. Mathematical Finance 12, 143–154. 124
229. Heffernan, J.E. (2000) A directory of coefficients of tail dependence. Extremes 3, 279–290.
172
230. Henderson, V., D. Hobson, S. Howison and T. Kluge (2005) A comparison of
option prices under different pricing measures in a stochastic volatility model
with correlation. Review of Derivatives Research 8, 5–25. 136
231. Hergarten, S. (2002) Self-organized Criticality in Earth Systems. Springer-
Verlag, Heidelberg. 277
232. Heston, S.L. (1993) A closed-form solution for options with stochastic volatility
with applications to bond and currency options. Review of Financial Studies
6, 327–343. 37, 144
233. Hill, B.M. (1975) A simple general approach to inference about the tail of a
distribution. Annals of Statistics 3, 1163–1174. 64
234. Hobson, D. (2004) Stochastic volatility models, correlation and the q-optimal
measure. Mathematical Finance 14, 537–556. 136
235. Hotelling, H. (1936) Relations between two sets of variates. Biometrika 28,
321–377. 152
236. Hougaard, P. (1984) Life table methods for heterogeneous populations: Distri-
butions describing the heterogeneity. Biometrika 71, 75–83. 113
237. Hougaard, P., B. Harvald and N.V. Holm (1992) Measuring the similarities
between the lifetimes of adult Danish twins born between 1881–1930. Journal
of the American Statistical Association 87, 17–24. 113
238. Huber, P.J. (2003) Robust Statistics. Wiley-Interscience, New York. 275
239. Hult, H. and F. Lindskog (2001) Multivariate extremes, aggregation and depen-
dence in elliptical distributions. Advances in Applied Probability 34, 587–609.
174
240. Hwang, S. and M. Salmon (2002) An analysis of performance measures using
copulae. In Performance Measurement in Finance: Firms, Funds and Man-
agers, J. Knight and S. Satchell, eds. Butterworth-Heinemann, London. 124
241. Hwang, S. and S. Satchell (1999) Modelling emerging market risk premia using
higher moments. International Journal of Finance & Economics 4, 271–296.
13, 15, 58
242. Ingersoll, J.E. (1987) The Theory of Financial Decision Making. Rowman &
Littlefield, Totowa, NJ.
243. Iman, R.L. and W.J. Conover (1982) A distribution-free approach to inducing
rank correlation among input variables. Communications in Statistics Simula-
tion & Computation 11, 311–334. 120
244. Jäckel, P. (2002) Monte Carlo Methods in Finance. Wiley, New York. 120
245. Jenkinson, T. and A. Ljungqvist (2001) Going Public: The Theory and Evi-
dence on How Companies Raise Equity Finance, 2nd edition. Oxford University
Press, Oxford. 278
246. Jensen, H.J. (1998) Self-organized Criticality: Emergent Complex Behavior in
Physical and Biological Systems, Cambridge Lecture Notes in Physics. Cam-
bridge University Press, Cambridge. 277
247. Jensen, J.L. (1995) Saddlepoint Approximations. Oxford University Press, Ox-
ford. 263
248. Joe, H. (1997) Multivariate Models and Dependence Concepts. Chapman &
Hall, London. 103, 117, 137, 202
249. Johansen, A. and D. Sornette (2001) Large stock market price drawdowns are
outliers. Journal of Risk 4(2), 69–110. 3, 23, 36, 79, 276
250. Johansen, A. and D. Sornette (2004) Endogenous versus exogenous crashes
in financial markets. In Contemporary Issues in International Finance. Nova
Science Publishers. 3, 36, 276, 278
251. Johnson, N.F., P. Jefferies and P.M. Hui (2003) Financial Market Complexity.
Oxford University Press, Oxford. 23
252. Johnson, N.L. and S. Kotz (1972) Distributions in Statistics: Continuous Mul-
tivariate Distributions. Wiley, New York. 107, 121, 240, 262
253. Johnson, N.L., S. Kotz and N. Balakrishnan (1997) Discrete Multivariate Dis-
tributions. Wiley, New York. 3
254. Johnson, R.A. and D.W. Wichern (2002) Applied Multivariate Statistical
Analysis, 5th edition. Prentice Hall, Upper Saddle River, NJ. 3
255. Johnson, T.C. (2004) Forecast dispersion and the cross section of expected
returns. Journal of Finance 59, 1957–1978. 20
256. Jondeau, E. and M. Rockinger (2003) Testing for differences in the tails of
stock-market returns. Journal of Empirical Finance 10, 559–581. 53, 78
257. Jorion, P. (1997) Value-at-Risk: The New Benchmark for Controlling Deriva-
tives Risk. Irwin Publishing, Chicago, IL. 2
258. Jouini, M.N. and R.T. Clemen (1996) Copula models for aggregating expert
opinions. Operations Research 44, 444–457. 124
259. Jurczenko, E. and B. Maillet (2005) The 4-CAPM: In between asset pricing
and asset allocation. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurczenko, eds. Springer. Forthcoming. 58
260. Juri, A. and M.V. Wüthrich (2002) Copula convergence theorem for tail events.
Insurance: Mathematics & Economics 30, 405–420. 115, 232, 255
261. Kalai, A. and S. Vempala (2000) Efficient algorithms for universal portfolios. In
Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer
Science, 486–491. 275
262. Kaminsky, G.L. and S.L. Schmukler (1999) What triggers market jitters? A
chronicle of the Asian crisis. Journal of International Money & Finance 18,
537–560. 211
263. Kahneman, D. (2002) Maps of bounded rationality: A perspective on intuitive
judgment and choice. Nobel Prize Lecture. Available at http://nobelprize.
org/economics/laureates/2002/kahnemann-lecture.pdf 272
264. Karatzas, I. and S.E. Shreve (1991) Brownian Motion and Stochastic Calculus.
Springer-Verlag, New York. 219
265. Karlen, D. (1998) Using projection and correlation to approximate probability
distributions. Computers in Physics 12, 380–384. 108, 219
266. Kass, R.E. and A.E. Raftery (1995) Bayes factors. Journal of the American
Statistical Association 90, 773–795. 74
267. Kearns, P. and A. Pagan (1997) Estimating the density tail index for financial
time series. Review of Economics & Statistics 79, 171–175. 43, 48, 51, 52, 65,
76
268. Kesten, H. (1973) Random difference equations and renewal theory for prod-
ucts of random matrices. Acta Mathematica 131, 207–248. 39
269. Kim, C.-J. and C.R. Nelson (1999) State-Space Models with Regime Switch-
ing: Classical and Gibbs-Sampling Approaches with Applications. MIT Press,
Cambridge, MA. 3
270. Kimeldorf, G. and A. Sampson (1978) Monotone dependence. Annals of Sta-
tistics 6, 895–903. 101
271. King, M. and S. Wadhwani (1990) Transmission of volatility between stock
markets. Review of Financial Studies 3, 5–33. 231, 247
272. Klugman, S.A. and R. Parsa (1999) Fitting bivariate loss distributions with
copulas. Insurance: Mathematics & Economics 24, 139–148. 124, 201, 203
273. KMV Corporation (1997) Modelling default risk. Technical Document. Avail-
able at http://www.kmv.com 137
274. Knopoff, L. and D. Sornette (1995) Earthquake death tolls. Journal de Physique
I, France 5, 1681–1688. 126
275. Kon, S. (1984) Models of stock returns: A comparison. Journal of Finance 39,
147–165. 42
276. Kotz, S. (2000) Continuous Multivariate Distributions, 2nd edition. Wiley, New
York. 3
277. Kotz, S. and S. Nadarajah (2000) Extreme Value Distributions: Theory and
Applications. Imperial College Press, London. 44
278. Kraus, A. and R. Litzenberger (1976) Skewness preference and the valuation
of risk assets. Journal of Finance 31, 1085–1099. 15
279. Krivelevich, M. and V.H. Vu (2002) On the concentration of eigenvalues of
random symmetric matrices. Israel Journal of Mathematics 131, 259–268. 26
280. Krugman, P. (1996) The Self-Organizing Economy. Blackwell Publishers,
Cambridge, MA, and Oxford. 14
281. Kruskal, W.H. (1958) Ordinal measures of association. Journal of the American
Statistical Association 53, 814–861. 160
282. Krzanowski, W.J. (2000) Principles of Multivariate Analysis: A User’s Perspec-
tive, revised edition. Clarendon Press, Oxford; Oxford University Press, New
York. 3
283. Krzysztofowicz, R. and K.S. Kelly (1996) A meta-Gaussian distribution with
specified marginals. Technical Document, University of Virginia. 108
303. Li, D.X. (2000) On default correlation: A copula approach. Journal of Fixed
Income 9, 43–54. 124, 137
304. Li, D.X., P. Mikusinski, H. Sherwood and M.D. Taylor (1997) On approxima-
tion of copulas. In Distributions with Given Marginals and Moments Problems,
V. Benes and J. Stephan, eds. Kluwer Academic Publisher, Dordrecht, Boston.
191
305. Li, D.X., P. Mikusinski and M.D. Taylor (1998) Strong approximation of
copulas. Journal of Mathematical Analysis & Applications 225, 608–623. 191,
254
306. Lim, K.G. (1989) A new test for the three-moment capital asset pricing model.
Journal of Financial & Quantitative Analysis 24, 205–216. 14, 15
307. Lindskog, F., A.J. McNeil and U. Schmock (2003) Kendall’s tau for elliptical
distributions. In Credit Risk – Measurement, Evaluation and Management, G.
Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder and K.-H. Vollmer, eds. Physica-
Verlag, Heidelberg. 111, 157
308. Lintner, J. (1965) The valuation of risk assets and the selection of risky invest-
ments in stock portfolios and capital budgets. Review of Economics & Statistics
47, 13–37. 14
309. Litterman, R. and K. Winkelmann (1998) Estimating Covariance Matrices.
Risk Management Series, Goldman Sachs. 2
310. Little, R.J.A. and D.B. Rubin (1987) Statistical Analysis with Missing Data.
Wiley, New York. 3
311. Lo, A.W. (1999) The three P’s of total risk management. Financial Analysts
Journal 55 (January/February), 13–26. 271
312. Longin, F.M. (1996) The asymptotic distribution of extreme stock market re-
turns. Journal of Business 69, 383–408. 42, 43, 44, 52, 61
313. Longin, F.M. (2000) From value at risk to stress testing: The extreme value
approach. Journal of Banking & Finance 24, 1097–1130. 43, 46, 79
314. Longin, F.M. and B. Solnik (1995) Is the correlation in international equity
returns constant: 1960–1990? Journal of International Money & Finance 14,
3–26. 231, 241
315. Longin F.M. and B. Solnik (2001) Extreme correlation of international equity
markets. Journal of Finance 56, 649–676. 231, 232, 236, 241, 253, 254, 255,
260
316. Loretan, M. (2000) Evaluating changes in correlations during periods of high
market volatility. Global Investor 135, 65–68. 231, 232
317. Loretan, M. and W.B. English (2000) Evaluating “correlation breakdowns”
during periods of market volatility. BIS Quarterly Review (June), 29–36. 231,
232
318. Loynes, R.M. (1965) Extreme values in uniformly mixing stationary stochastic
processes. Annals of Mathematical Statistics 36, 993–999. 44
319. Lux, T. (1996) The stable Paretian hypothesis and the frequency of large re-
turns: An examination of major German stocks. Applied Financial Economics
6, 463–475. 78
320. Lux, T. (2000) On moment condition failure in German stock returns: An
application of recent advances in extreme value statistics. Empirical Economics
25, 641–652. 42
321. Lux, T. (2003) The multifractal model of asset returns: Its estimation via GMM
and its use for volatility forecasting. Working Paper, University of Kiel. 40
322. Lux, T. (2001) The limiting extreme behavior of speculative returns: An analy-
sis of intra-daily data from the Frankfurt stock exchange. Journal of Economic
Behavior & Organization 46, 327–342. 42, 44
323. Lux, T. and D. Sornette (2002) On rational bubbles and fat tails. Journal of
Money Credit & Banking 34, 589–610. 39
324. Lyubushin Jr., A.A. (2002) Robust wavelet-aggregated signal for geophysical
monitoring problems. Izvestiya, Physics of the Solid Earth 38, 1–17. 152
325. Majumdar, S.N. and P.L. Krapivsky (2002) Extreme value statistics and trav-
eling fronts: Application to computer science. Physical Review E 65, 036127.
44
326. Majumdar, S.N. and P.L. Krapivsky (2003) Extreme value statistics and trav-
eling fronts: Various applications. Physica A 318, 161–170. 44
327. Makarov, G.D. (1981) Estimates for the distribution function of a sum of two
random variables when the marginal distributions are fixed. Theory of Proba-
bility & its Applications 26, 803–806. 118
328. Malamud, B.D., G. Morein and D.L. Turcotte (1998) Forest fires – An example
of self-organized critical behavior. Science 281, 1840–1842. 126
329. Malevergne, Y., V.F. Pisarenko and D. Sornette (2003) On the power of gen-
eralized extreme value (GEV) and generalized Pareto distribution (GPD) esti-
mators for empirical distributions of log-returns. Applied Financial Economics.
Forthcoming. 38, 48, 49, 50, 51, 52, 55, 56
330. Malevergne, Y., V.F. Pisarenko and D. Sornette (2005) Empirical distribution
of log-returns: Between the stretched-exponential and the power law? Quanti-
tative Finance 5. Forthcoming. 40, 54, 63, 64, 66, 68, 70, 72, 74, 75
331. Malevergne, Y. and D. Sornette (2001) Multi-dimensional rational bubbles and
fat tails. Quantitative Finance 1, 533–541. 39
332. Malevergne, Y. and D. Sornette (2002) Minimizing extremes. Risk 15 (11),
129–132. X, 174, 175, 178, 179, 255
333. Malevergne, Y. and D. Sornette (2005) Multi-moment methods for portfo-
lio management: Generalized capital asset pricing model in homogeneous and
heterogeneous markets. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurzenko, eds. Springer. Forthcoming. X,
8, 16, 17, 31, 58
334. Malevergne, Y. and Sornette, D. (2003) Testing the Gaussian copula hypothesis
for financial assets dependences. Quantitative Finance 3, 231–250. 204, 207,
208, 211, 212, 214, 215, 216, 232, 254
335. Malevergne, Y. and D. Sornette (2004) How to account for extreme co-
movements between individual stocks and the market. Journal of Risk 6(3),
71–116. 174, 175, 255
336. Malevergne, Y. and D. Sornette (2004) VaR-efficient portfolios for a class of
super- and sub-exponentially decaying assets return distributions. Quantitative
Finance 4, 17–36. X, 124, 128, 130, 141
337. Malevergne, Y. and D. Sornette (2004) Collective origin of the coexistence of
apparent RMT noise and factors in large sample correlation matrices. Physica
A 331, 660–668. 24, 27, 28
338. Malevergne, Y. and D. Sornette (2005) Higher-moment portfolio theory (Cap-
italizing on behavioral anomalies of stock markets). Journal of Portfolio Man-
agement 31(4), 49–55. 124
339. Mandelbrot, B.B. (1963) The variation of certain speculative prices. Journal
of Business 36, 392–417. 42
340. Mandelbrot, B.B. (1997) Fractals and Scaling in Finance: Discontinuity, Con-
centration, Risk. Springer, New York. 82
341. Mandelbrot, B.B., A. Fisher and L. Calvet (1997) A multifractal model of asset
returns. Cowles Foundation Discussion Paper #1164. 37, 41, 82, 84
342. Mansilla, R. (2001) Algorithmic complexity of real financial markets. Physica
A 301, 483–492. 232
343. Mantegna, R.N. (1999) Hierarchical structure in financial markets. European
Physical Journal B 11, 193–197. 28
344. Mantegna, R.N. and H.E. Stanley (1994) Stochastic process with ultra slow
convergence to a Gaussian: The truncated Lévy flight. Physical Review Letters
73, 2946–2949. 75
345. Mantegna, R.N. and H.E. Stanley (1995) Scaling behavior of an economic
index. Nature 376, 46–49. 42
346. Mantegna, R.N. and H.E. Stanley (2000) An Introduction to Econophysics,
Correlations and Complexity in Finance. Cambridge University Press, Cam-
bridge. 42
347. Markowitz, H. (1959) Portfolio Selection: Efficient Diversification of Invest-
ments. Wiley, New York. VIII, 1, 2, 13, 31, 38
348. Marshall, A. and I. Olkin (1988) Families of multivariate distributions. Journal
of the American Statistical Association 83, 834–841. 113, 123
349. Marsili, M. (2002) Dissecting financial markets: Sectors and states. Quantita-
tive Finance 2, 297–302. 28
350. Mashal, R. and A.J. Zeevi (2002) Beyond correlation: Extreme co-movements
between financial assets. Working Paper, Columbia Business School. 110, 200,
212, 215, 216, 224, 226
351. Matia, K., L.A.N. Amaral, S.P. Goodwin and H.E. Stanley (2002) Different
scaling behaviors of commodity spot and future prices. Physical Review E 66,
045103. 51
352. McClure, S.M., D.I. Laibson, G. Loewenstein and J.D. Cohen (2004) Separate
neural systems value immediate and delayed monetary rewards. Science 306,
503–507. 272
353. McDonald, J.B. and W.K. Newey (1988) Partially adaptive estimation of re-
gression models via the generalized t-distribution. Econometric Theory 4, 428–
457. 54
354. McLeish, D.L. and C.G. Small (1988) The Theory and Application of Statistical
Inference Functions. Springer-Verlag, Berlin. 202
355. McNeil, A. and R. Frey (2000) Estimation of tail-related risk measures for
heteroscedastic financial time series: An extreme value approach. Journal of
Empirical Finance 7, 271–300. 79
356. Meerschaert, M.M. and H. Scheffler (2001) Sample cross-correlations for mov-
ing averages with regularly varying tails. Journal of Time Series Analysis 22,
481–492. 58, 148, 243, 254
357. Mehta, M.L. (1991) Random Matrices, 2nd edition. Academic Press, Boston.
26
358. Merton, R.C. (1974) On the pricing of corporate debt: The risk structure of
interest rates. Journal of Finance 29, 449–470. 20, 137
359. Merton, R.C. (1990) Continuous-Time Finance. Blackwell, Cambridge. 14
360. Mézard, M., G. Parisi and M.A. Virasoro (1987) Spin Glass Theory and Be-
yond, World Scientific Lecture Notes in Physics, Vol. 9. World Scientific, Sin-
gapore. 164
361. Mills, T.C. (1993) The Econometric Modelling of Financial Time Series. Cam-
bridge University Press, Cambridge, New York. 3
362. Mittnik, S., S.T. Rachev and M.S. Paolella (1998) Stable Paretian modeling in
finance: Some empirical and theoretical aspects. In A Practical Guide to Heavy
Tails, R.J. Adler, R.E. Feldman and M.S. Taqqu, eds. Birkhauser, Boston,
79–110. 42
363. Morrison, D.F. (1990) Multivariate Statistical Methods, 3rd edition. McGraw-
Hill, New York. 3
364. Mossin, J. (1966) Equilibrium in a capital market. Econometrica 34, 768–783.
14
365. Muzy, J.-F., A. Kozhemyak and E. Bacry (2005) Extreme values and fat tails
of multifractal fluctuations. Working Paper. 40, 41
366. Muzy, J.F., D. Sornette, J. Delour and A. Arnéodo (2001) Multifractal returns
and hierarchical portfolio theory. Quantitative Finance 1, 131–148. 40, 217, 276
367. Müller, U.A., M.M. Dacorogna and O.V. Pictet (1998) Heavy tails in high-
frequency financial data. In A Practical Guide to Heavy Tails, R.J. Adler,
R.E. Feldman and M.S. Taqqu, eds. Birkhauser, Boston, 55–78. 42, 44
368. Nagahara, Y. and G. Kitagawa (1999) A non-Gaussian stochastic volatility
model. Journal of Computational Finance 2, 33–47. 42
369. Naveau, P. (2003) Almost sure relative stability of the maximum of a stationary
sequence. Advances in Applied Probability 35, 721–736. 44
370. Nelsen, R.B. (1998) An Introduction to Copulas, Lecture Notes in Statistics
139. Springer-Verlag, New York. 103, 112, 115, 118, 155, 191
371. Neudecker, H., R. Heijmans, D.S.G. Pollock and A. Satorra (2000) Innovations
in Multivariate Statistical Analysis. Kluwer Academic Press, Dordrecht, Boston.
3
372. Newey, W.K. and D. McFadden (1994) Large sample estimation and hypothesis
testing. In Handbook of Econometrics, 4, R.F. Engle and D. McFadden, eds.
North-Holland, Amsterdam. 202, 216, 224
373. Noh, J.D. (2000) Model for correlations in stock market. Physical Review E
61, 5981–5982. 27
374. O’Brien, G.L. (1987) Extreme values for stationary and Markov sequences.
Annals of Probability 15, 281–291. 44
375. Oakes, D. (1982) A model for association in bivariate survival data. Journal of
the Royal Statistical Society, Series B 44, 414–422. 196
376. Okuyama, K., M. Takayasu and H. Takayasu (1999) Zipf’s law in income dis-
tribution of companies. Physica A 269, 125–131. 41
377. Osborne, M.F.M. (1959) Brownian motion in the stock market. Operations
Research 7, 145–173. Reprinted in The Random Character of Stock Market
Prices, P. Cootner ed. MIT Press, Cambridge, MA (1964), 100–128. 38, 80
378. Panja, D. (2004) Fluctuating fronts as correlated extreme value problems: An
example of Gaussian statistics. Physical Review E 70, 036101. 44
379. Papoulis, A. (1962) Hilbert transforms. In The Fourier Integral and Its Appli-
cations. McGraw-Hill, New York, 198–201. 279
380. Patton, A.J. (2005) Estimation of multivariate models for time series of possibly
different lengths. Journal of Applied Econometrics. Forthcoming. 100, 203, 217,
254
381. Patton, A.J. (2005) Modelling asymmetric exchange rate dependence. Inter-
national Economic Review. Forthcoming. 203
403. Rheinländer, T. (2005) An entropy approach to the Stein and Stein model with
correlation. Finance & Stochastics 9, 399–413. 136
404. Richardson, M. and T. Smith (1993) A test for multivariate normality in stocks.
Journal of Business 66, 295–321. 99
405. Riedel, F. (2004) Dynamic coherent risk measures. Stochastic Processes & their
Applications 112, 185–200. 276
406. RiskMetrics Group (1997) CreditMetrics. Technical Document. Available at
http://www.riskmetrics.com/research 137
407. Rockafellar, R.T., S. Uryasev and M. Zabarankin (2005) Generalized deviations
in risk analysis. Finance & Stochastics 9. Forthcoming. 4, 7, 8, 9
408. Rockafellar, R.T., S. Uryasev and M. Zabarankin (2005) Master funds in port-
folio analysis with general deviation measures. Journal of Banking & Finance
29. Forthcoming. 4, 18
409. Rockinger, M. (2005) Modeling the dynamics of conditional dependency be-
tween financial series. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurczenko, eds. Springer. Forthcoming.
100
410. Rockinger, M. and E. Jondeau (2002) Entropy densities with an application
to autoregressive conditional skewness and kurtosis. Journal of Econometrics
106, 119–142. 219
411. Rodkin, M.V. and V.F. Pisarenko (2001) Earthquake losses and casualties: A
statistical analysis. In Problems in Dynamics and Seismicity of the Earth: Coll.
Sci. Proc. Moscow. Computational Seismology 31, 242–272 (in Russian). 126
412. Rodriguez, J.C. (2003) Measuring financial contagion: A copula approach.
Working Paper, Eurandom. 100
413. Roll, R. (1988) The international crash of October 1987. Financial Analysts
Journal 44(5), 19–35. 26
414. Roll, R. (1994) What every CFO should know about scientific progress in
financial economics: What is known and what remains to be resolved. Financial
Management 23(2), 69–75. 1, 18, 19, 20, 24
415. Romer, D. (1996) Advanced Macroeconomics. McGraw-Hill, New York. 277
416. Rootzen, H., M.R. Leadbetter and L. de Haan (1998) On the distribution of
tail array sums for strongly mixing stationary sequences. Annals of Applied
Probability 8, 868–885. 43
417. Rosenberg, J.V. (2003) Nonparametric pricing of multivariate contingent
claims. Journal of Derivatives, Spring. 124, 131
418. Ross, S. (1976) The arbitrage theory of capital asset pricing. Journal of Eco-
nomic Theory 13, 341–360. 19
419. Rothschild, M. and J.E. Stiglitz (1970) Increasing risk I: A definition. Journal
of Economic Theory 2, 225–243. 4
420. Rothschild, M. and J.E. Stiglitz (1971) Increasing risk II: Its economic conse-
quences. Journal of Economic Theory 3, 66–84. 4
421. Rubinstein, M. (1973) The fundamental theorem of parameter-preference se-
curity valuation. Journal of Financial & Quantitative Analysis 8, 61–69. 15,
16, 58
422. Rüschendorf, L. (1974) Asymptotic distributions of multivariate rank order
statistics. Annals of Mathematical Statistics 4, 912–923. 198, 222
423. Ruymgaart, F.H. (1974) Asymptotic normality of nonparametric tests for in-
dependence. Annals of Mathematical Statistics 2, 892–910. 198, 222
424. Ruymgaart, F.H., G.R. Shorack and W.R. van Zwet (1972) Asymptotic nor-
mality of nonparametric tests for independence. Annals of Statistics 44, 1122–
1135. 198, 222
425. Samuelson, P.A. (1965) Rational theory of warrant pricing. Industrial Manage-
ment Review 6(Spring), 13–31. 38, 80
426. Sarfraz, M. (2003) Advances in Geometric Modeling. Wiley, Hoboken. 3
427. Sargent, T.J. (1987) Dynamic Macroeconomic Theory. Harvard University
Press, Cambridge, MA. 19
428. Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and
Visualization. Wiley, New York. 3
429. Sharpe, W. (1964) Capital asset prices: A theory of market equilibrium under
conditions of risk. Journal of Finance 19, 425–442. 14, 16, 38
430. Schloegl, L. and D. O’Kane (2004) Pricing and risk-managing CDO tranches.
Quantitative Credit Research, Lehman Brothers. 181
431. Schmeidler, D. (1986) Integral representation without additivity. Proceedings
of the American Mathematical Society 97, 255–261. 7
432. Schumpeter, J.A. (1939) Business Cycles: A Theoretical, Historical and Sta-
tistical Analysis of the Capitalist Process. McGraw-Hill, New York. 277
433. Schürmann, J. (1996) Pattern Classification: A Unified View of Statistical and
Neural Approaches. Wiley, New York. 3
434. Schweizer, M. (1995) On the minimal martingale measure and the Föllmer-
Schweizer decomposition. Stochastic Analysis & Applications 13, 573–599. 136
435. Schweizer, M. (1999) A minimality property of the minimal martingale mea-
sure. Statistics & Probability Letters 42, 27–31. 136
436. Serva, M., U.L. Fulco, M.L. Lyra and G.M. Viswanathan (2002) Kinematics of
stock prices. Preprint at http://arXiv.org/abs/cond-mat/0209103. 78
437. Shefrin, H. (2000) Beyond Greed and Fear: Understanding Behavioral Finance
and the Psychology of Investing. Harvard Business School Press, Boston, MA.
4, 281
438. Shiller, R.J. (2000) Irrational Exuberance. Princeton University Press, Prince-
ton, NJ. 23
439. Shleifer, A. (2000) Inefficient Markets: An Introduction to Behavioral Finance,
Clarendon Lectures in Economics. Oxford University Press, Oxford. 4, 281
440. Silvapulle, P. and C.W.J. Granger (2001) Large returns, conditional correlation
and portfolio diversification: A Value-at-Risk approach. Quantitative Finance
1, 542–551. 231
441. Simon, H.A. (1957) Models of Man: Social and Rational; Mathematical Essays
on Rational Human Behavior in a Social Setting. Wiley, New York. 39
442. Skaug, H. and D. Tjostheim (1996) Testing for serial independence using mea-
sures of distance between densities. In Athens Conference on Applied Probability
and Time Series, P. Robinson and M. Rosenblatt, eds. Springer, New York.
163
443. Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Pub-
lications de l’Institut de Statistique de l’Université de Paris 8, 229–231. 103,
104
444. Smith, R.L. (1985) Maximum likelihood estimation in a class of non-regular
cases. Biometrika 72, 67–90. 44, 48
445. Sobehart, J. and R. Farengo (2003) A dynamical model of market under- and
overreaction. Journal of Risk 5(4). 57
446. Sornette, D. (1998) Linear stochastic dynamics with nonlinear fractal proper-
ties. Physica A 250, 295–314. 39
447. Sornette, D. (1998) Large deviations and portfolio optimization. Physica A
256, 251–283. 2
448. Sornette, D. (2002) Predictability of catastrophic events: Material rupture,
earthquakes, turbulence, financial crashes and human birth. Proceedings of the
National Academy of Science USA 99 (Suppl), 2522–2529. 277
449. Sornette, D. (2003) Critical market crashes. Physics Reports 378, 1–98. 276
450. Sornette, D. (2003) Why Stock Markets Crash, Critical Events in Complex
Financial Systems. Princeton University Press, Princeton, NJ. VII, 23, 79,
211, 276
451. Sornette, D. (2004) Critical Phenomena in Natural Sciences, Chaos, Frac-
tals, Self-organization and Disorder: Concepts and Tools, 2nd enlarged edition.
Springer Series in Synergetics, Heidelberg. VIII, 22, 42, 46, 58, 78, 277
452. Sornette, D. (2005) Endogenous versus exogenous origins of crises. In Extreme
Events in Nature and Society, S. Albeverio, V. Jentsch and H. Kantz, eds.
Springer, Heidelberg. 278
453. Sornette, D., J.V. Andersen and P. Simonetti (2000) Portfolio theory for “fat
tails”. International Journal of Theoretical & Applied Finance 3, 523–535. 2,
58, 78, 108, 219, 233
454. Sornette, D. and R. Cont (1997) Convergent multiplicative processes repelled
from zero: Power laws and truncated power laws. Journal de Physique I France
7, 431–444. 39
455. Sornette, D., F. Deschatres, T. Gilbert and Y. Ageon (2004) Endogenous versus
exogenous shocks in complex networks: An empirical test. Physical Review
Letters 93(22), 228701. 278
456. Sornette, D., Y. Malevergne and J.F. Muzy (2003) What causes crashes? Risk
16 (2), 67–71. http://arXiv.org/abs/cond-mat/0204626. 87, 210, 278
457. Sornette, D., P. Simonetti and J.V. Andersen (2000) φq-field theory for portfo-
lio optimization: “Fat-tails” and non-linear correlations. Physics Reports 335,
19–92. 34
458. Sornette, D. and W.-X. Zhou (2005) Predictability of large future changes in
complex systems. International Journal of Forecasting. Forthcoming. Available
at http://arXiv.org/abs/cond-mat/0304601. 278
459. Sornette, D. and W.-X. Zhou (2004) Non-parametric determination of real-
time lag structure between two time series: The “optimal thermal causal path”
method. Working Paper. Available at http://arXiv.org/abs/cond-mat/
0408166. 281
460. Sornette, D. and W.-X. Zhou (2005) Importance of positive feedbacks and over-
confidence in a self-fulfilling Ising model of financial markets. Working Paper.
Available at http://arxiv.org/abs/cond-mat/0503607. 279
461. Srivastava, M.S. (2002) Methods of Multivariate Statistics. Wiley-Interscience,
New York. 3
462. Starica, C. (1999) Multivariate extremes for models with constant conditional
correlations. Journal of Empirical Finance 6, 515–553. 232
463. Starica, C. and O. Pictet (1999) The tales the tails of GARCH(1,1) processes
tell. Working Paper, University of Pennsylvania. 66
464. Stollnitz, E.J., T.D. DeRose and D.H. Salesin (1996) Wavelets for Computer
Graphics: Theory and Applications. Morgan Kaufmann Publishers, San Fran-
cisco, CA. 3
465. Stuart, A. and K. Ord (1994) Kendall’s Advanced Theory of Statistics. Wiley,
New York. 10, 58
466. Stulz, R.M. (1982) Options on the minimum or the maximum of two risky
assets: Analysis and applications. Journal of Financial Economics 10, 161–
185. 135
467. Szegö, G. (1999) A critique to Basel regulation, or how to enhance (im)moral
hazards. In Proceedings of the International Conference on Risk Management
and Regulation in Banking. Bank of Israel, Kluwer Academic Press, Dordrecht,
Boston. VIII
468. Tabachnick, B.G. and L.S. Fidell (2000) Using Multivariate Statistics, 4th
edition. Pearson Allyn & Bacon, Boston and New York. 3
469. Taleb, N.N. (2004) Fooled by Randomness: The Hidden Role of Chance in Life
and in the Markets, 2nd edition. Texere, New York. 21
470. Taleb, N. (2004) Learning to expect the unexpected. The New York Times,
April 8. 36
471. Tasche, D. (2002) Expected shortfall and beyond. Journal of Banking & Finance
26, 1519–1533. 6
472. Tasche, D. and L. Tibiletti (2001) Approximations for the Value-at-Risk ap-
proach to risk-return analysis. The ICFAI Journal of Financial Risk Manage-
ment 1(4), 44–61. 128
473. Taylor, S.J. (1994) Modeling stochastic volatility: A review and comparative
study. Mathematical Finance 2, 183–204. 37
474. Thaler, R.H. (1993) Advances in Behavioral Finance. Russell Sage Foundation,
New York. 4, 281
475. Toulouse, G. (1977) Theory of the frustration effect in spin glasses. Communi-
cations on Physics 2, 115–119. 164
476. Tsallis, C. (1988) Possible generalization of Boltzmann–Gibbs statistics. Jour-
nal of Statistical Physics 52, 479–487. For an updated bibliography on this
subject, see http://tsallis.cat.cbpf.br/biblio.htm 163
477. Tsui, A.K. and Q. Yu (1999) Constant conditional correlation in a bivariate
GARCH model: Evidence from the stock markets of China. Mathematics &
Computers in Simulation 48, 503–509. 231, 276
478. Valdez, E.A. (2001) Bivariate analysis of survivorship and persistency. Insur-
ance: Mathematics & Economics 29, 357–373. 100
479. van der Vaart, A.W. (2000) Asymptotic Statistics. Cambridge University Press,
Cambridge. 275
480. Vaupel, J.W., K.G. Manton and E. Stallard (1979) The impact of heterogeneity
in individual frailty on the dynamics of mortality. Demography 16, 439–454.
113
481. Vannimenus, J. and G. Toulouse (1977) Theory of the frustration effect: II.
Ising spins on a square lattice. Journal of Physics C: Solid State Physics 10,
537–541. 164
482. Von Neumann, J. and O. Morgenstern (1944) Theory of Games and Economic
Behavior. Princeton University Press, Princeton, NJ. 4, 272
483. Wang, T. (2002) A class of dynamic risk measures. Working Paper, Faculty of
Commerce and Business Administration, U.B.C. 276
484. Wilcox, R.R. (2004) Introduction to Robust Estimation and Hypothesis Testing,
2nd edition. Academic Press, New York. 275
485. Wilks, S.S. (1938) The large sample distribution of the likelihood ratio for
testing composite hypotheses. Annals of Mathematical Statistics 9, 60–62. 71
Index

Akaike information criterion 215
ALAE 201
Anderson-Darling distance 61
Arbitrage 35, 132, 133
Arbitrage pricing theory X, 19
ARCH see GARCH
Archimedean copula 111, 204, 255
  Clayton 112, 114
  Frank 112
  Gumbel 112
  Kendall’s tau 155
  orthant dependence 166
  tail dependence 170
Asian crisis 211, 230
Associativity 114
Asymptotic independence 169

Bank for International Settlements VIII, 35
Bernstein polynomial 191
Bhattacharya-Matusita-Hellinger dependence metric 163, 203
Black swan see Outlier
Black-Merton-Scholes’ option pricing model VIII, 20, 38
Book-to-market 18
Bootstrap 62, 192, 207
British Pound 208, 210

Canonical coefficient of N-correlation 153, 154
Capital asset pricing model IX, 14, 38
Central limit theorem VIII
Clayton’s copula 112, 114, 200, 217, 249
  Kendall’s tau 156
  simulation 123
  tail dependence 171
Coefficient of tail dependence see Tail dependence
Coherent measures of risk 4, 276
Comonotonicity 101, 102, 107, 149, 155, 160
Complete market 133, 136
Complete monotonicity 111
Concordance measure 154–162, 165
Conditional correlation coefficient 233
Consistent measures of risk 7
Contagion XI, 231, 260
Contingent claim see Option
Convex measure of risk 7
Copula X, 34, 35, 103, 273
  Archimedean see Archimedean copula
  dual 104, 118
  elliptical see Elliptical copula
  extreme value see Extreme value copula
  Fréchet-Hoeffding bounds 106
  survival 104, 114, 132, 140, 166, 215
Correlation coefficient 2, 24, 99, 105, 147–154, 165, 173, 174, 189, 219, 220
  Hoeffding identity 149
Countermonotonicity 107, 149, 155, 160, 164
Gaussian distribution 2, 37, 148, 169, 175, 233
General deviation measures 8
Generalized Extreme Value distribution 45, 47, 116
Generalized Pareto distribution 39, 44–47, 116
German Mark 197–199, 208–210, 214, 215, 276
Gini’s gamma 161, 165
Girsanov theorem 144
Gnedenko theorem 45, 76
Goodness of fit 61, 164, 189
GPBH theorem 115
Great Depression 53
Gumbel distribution 45, 48
Gumbel’s copula 112, 117
  Kendall’s tau 156
  simulation 123
  tail dependence 171

Heavy tail 2, 12, 15, 36, 38, 42, 57, 157
Heteroscedasticity see Volatility clustering
High frequency data 35, 37, 44
Hill estimator 43, 48, 64, 240
Hoeffding identity 149

Ibov index 240
Incomplete gamma function 57
Inflation VII
Information matrix
  Fisher 198, 201, 222
  Godambe 202
Internet bubble VII, VIII
Invariance theorem 105
Ipsa index 240

Japanese Yen 197, 199, 208, 215, 217

Kendall’s tau 154, 165, 196, 200, 249
  Archimedean copula 155
  elliptical copula 157
Kernel estimator 192
King see Outlier
KMV 138
Kolmogorov distance 61
Kullback-Leibler divergence 61, 163

Lévy process 35
Lévy stable law 2, 39, 42, 148, 243
Lambert function 172
Laplace transform 113
Latin American crisis 228, 260
  Argentinean crisis 231, 233, 240, 261
  Mexican crisis 230, 233, 240, 247, 261
Linear dependence see Correlation coefficient
Local correlation coefficient 151
Log infinitely divisible process 41
Log-normal distribution 37, 60, 78, 150
Log-Weibull distribution 60, 69, 91
LTCM 24
Lunch effect 53

Malaysian Ringit 208, 210, 214
Market crash VII, 23, 36, 38
  April 2000 230
  October 1987 VII, VIII, 26, 230, 247
Market index
  CSFB/Tremont 167
  Dow Jones Industrial Average 44, 53, 62, 78
  Ibov 240
  Ipsa 240
  Merval 240
  Mexbol 240
  Nasdaq Composite 53, 77, 230
  Standard & Poor’s 500 39, 60, 129, 167, 177, 216
Market liquidity 6, 41
Market trend 231, 234, 253, 255
Markowitz’ portfolio selection see Mean-variance portfolio theory
Maximum domain of attraction 45, 46
Mean-variance portfolio theory VIII, 33, 38, 58
Merton model of credit risk 20, 137
Merval index 240
Meta-elliptical distribution 109
Meta-Gaussian distribution 108
Mexbol index 240
Micro-structure 189, 215
Minimum option 135
Minority game 22
Mixture model 137
Modified-Weibull distribution 128, 131
Monte Carlo 120, 192