
Extreme Financial Risks
From Dependence to Risk Management

Yannick Malevergne
Institut de Science Financière et d'Assurances
Université Claude Bernard Lyon 1
50 Avenue Tony Garnier
69366 Lyon Cedex 07
France
and
EM Lyon Business School
23 Avenue Guy de Collongue
69134 Ecully Cedex
France
E-mail: yannick.malevergne@univ-lyon1.fr

Didier Sornette
Institute of Geophysics and Planetary Physics
and Department of Earth and Space Science
University of California, Los Angeles
California 90095
USA
and
Laboratoire de Physique de la Matière Condensée, CNRS UMR6622
and Université des Sciences
Parc Valrose
06108 Nice Cedex 2
France
E-mail: sornette@moho.ess.ucla.edu

Library of Congress Control Number: 2005930885

ISBN-10 3-540-27264-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-27264-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations are
liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in The Netherlands
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: by the authors and TechBooks using a Springer LaTeX macro package
Cover design: design & production GmbH, Heidelberg
Printed on acid-free paper SPIN: 10939901 54/TechBooks 543210
An error does not become truth by reason of multiplied propagation, nor does truth become error because nobody sees it.

M.K. Gandhi
Preface: Idiosyncratic and Collective Extreme Risks

Modern western societies have a paradoxical relationship with risks. On the one hand, there is the utopian quest for a zero-risk society [120]. On the other hand, human activities may increase risks of all kinds, from the collateral effects of new technologies to global impacts on the planet. The characteristic multiplication of major risks in modern society and its reflexive impact on its development is at the core of the concept of the "Risk Society" [47]. Correlatively, our perception of risk has evolved, so that catastrophic events (earthquakes, floods, droughts, storms, hurricanes, volcanic eruptions, and so on) are no longer systematically perceived as unfair outcomes of an implacable destiny. Catastrophes may also result from our own technological developments, whose complexity may engender major industrial disasters such as Bhopal, Chernobyl, and AZF, as well as irreversible global changes such as global warming leading to climatic disruptions, or epidemics from new bacterial and viral mutations. The proliferation of new sources of risks imposes new responsibilities concerning their determination, understanding, and management. Government organizations, as well as private institutions such as industrial companies, insurance companies, and banks which have to face such risks in their role of regulators or of risk bearers, must ensure that the consequences of extreme risks remain bearable without endangering the institutions in charge of bearing these risks.
In the financial sector, crashes probably represent the most striking events
among all possible extreme phenomena, with an impact and frequency that
has been increasing in the last two decades [450]. Consider the worldwide crash of October 1987, which wiped out more than one trillion dollars in a few days, or the more recent collapse of the internet bubble, in which more than one-third of the 1999 world market capitalization disappeared after March 2000. Finance and stock markets are based on the fluid convertibility of stocks into money and vice versa. Thus, to work well, money must be a reliable standard of value, that is, an effective store of value; hence the concerns with the negative impacts of inflation. Similarly, investors look at the
various financial assets as carriers of value, like money, but with additional
return potentials (accompanied with downturn risks). But for this view to
hold so as to promote economic development, fluctuations in values need to
be tamed to minimize the risk of losing a lifetime of savings, or to avoid
the risks of losing the investment potential of companies, or even to prevent
economic and social recessions in whole countries (consider the situation of
California after 2002 with a budget gap representing more than one-fourth of
the entire State budget resulting essentially from the losses of financial and
tax incomes following the collapse of the internet bubble). It is thus highly
desirable to have tools for monitoring, understanding, and limiting the extreme risks of financial markets. Fully aware of these problems, the worldwide banking organizations have promoted a series of guidelines and standards, known as the recommendations of the Basle committee [41, 42]. The Basle committee
has proposed models for the internal management of risks and the imposi-
tion of minimum margin requirements commensurate with the risk exposures.
However, some critics [117, 467] have judged these recommendations to be
ill-adapted or even destabilizing. This controversy underlines the importance
of a better understanding of extreme risks, of their consequences and ways to
prevent or at least minimize them.
In our opinion, tackling this challenging problem requires decomposing
it into two main parts. First, it is essential to be able to accurately quan-
tify extreme risks. This calls for the development of novel statistical tools
going significantly beyond the Gaussian paradigm which underpins the stan-
dard framework of classical financial theory inherited from Bachelier [26],
Markowitz [347], and Black and Scholes [60] among others. Second, the ex-
istence of extreme risks must be considered in the context of the practice
of risk management itself, which leads one to ask whether extreme risks can be
diversified away similarly to standard risks according to the mean-variance
approach. If the answer to this question is negative, as can be surmised from numerous pieces of concrete empirical evidence, it is necessary to develop new concepts
and tools for the construction of portfolios with minimum (but unavoidable) exposure to extreme risks. One can think of mixing equities and derivatives,
as long as derivatives themselves do not add an extreme risk component and
can really provide an insurance against extreme moves, which has been far
from true in recent dramatic instances such as the crash of October 1987.
Another approach could involve mutualization, as in insurance.
Risk management, and to the same extent portfolio management, thus re-
quires a precise and rigorous analysis of the distribution of the returns of the
portfolio of risks. Taking into account the moderate sizes of standard portfo-
lios (from tens to thousands of assets typically) and the non-Gaussian nature
of the distributions of the returns of assets constituting the portfolios, the
distributions of the returns of typical portfolios are far from Gaussian, in con-
tradiction with the expectation from a naive use of the central limit theorem
(see for instance Chap. 2 of [451] and other chapters for a discussion of the
deviations from the central limit theorem). This breakdown of universality
then requires a careful estimation of the specific case-dependent distribution
of the returns of a given portfolio. This can be done directly using the time
series of the returns of the portfolio for a given capital allocation. A more con-
structive approach consists in estimating the joint distribution of the returns
of all assets constituting the portfolio. The first approach is much simpler and faster to implement, since it requires solely the estimation of a univariate distribution. However, it lacks generality and power because it neglects the observable information available from the basket of all asset returns. Only
the multivariate distribution of the returns of the assets embodies the gen-
eral information of all risk components and their dependence across assets.
However, the two approaches are equivalent in the following sense: the
knowledge of the distribution of the returns for all possible portfolios for all
possible allocations of capital between assets is equivalent to the knowledge
of the multivariate distributions of the asset returns. All things considered,
the second approach appears generally preferable and is the one mobilizing the largest efforts, both in academia and in the private sector.
However, the frontal attack aiming at the determination of the multivari-
ate distribution of the asset returns is a challenging task and, in our opinion,
much less instructive and useful than the separate studies of the marginal
distributions of the asset returns on the one hand and the dependence struc-
ture of these assets on the other hand. In this book, we emphasize this second
approach, with the objective of characterizing as faithfully as possible the di-
verse origins of risks: the risks stemming from each individual asset and the
risks having a collective origin. This requires determining (i) the distributions
of returns at different time scales, or more generally, the stochastic process
underlying the asset price dynamics, and (ii) the nature and properties of
dependences between the different assets.
The present book offers an original and systematic treatment of these two
domains, focusing mainly on the concepts and tools that remain valid for
large and extreme price moves. Its originality lies in detailed and thorough
presentations of the state of the art on (i) the different distributions of finan-
cial returns for various applications (VaR, stress testing), and (ii) the most
important and useful measures of dependence, both unconditional and conditional, together with a study of how conditioning on the size of large moves affects the measure of extreme dependence. A large emphasis is thus put on the
theory of copulas, their empirical testing and calibration, as they offer intrin-
sic and complete measures of dependence. Many of the results presented here are novel and have either not yet been published or have been obtained only recently by the authors or their colleagues. We would like to acknowledge, in particular, the
fruitful and inspiring discussions and collaborations with J.V. Andersen, U.
Frisch, J.-P. Laurent, J.-F. Muzy, and V.F. Pisarenko.
Chapter 1 describes a general framework to develop “coherent measures” of
risks. It also addresses the origins of risks and of dependence between assets in
financial markets, from the CAPM (capital asset pricing model) generalized to
the non-Gaussian case with heterogeneous agents, the APT (arbitrage pricing
theory) and factor models, to the complex system view suggesting an emergent
nature for the risk-return trade-off.
Chapter 2 addresses the problem of the precise estimation of the probabil-
ity of extreme events, based on a description of the distribution of asset returns
endowed with heavy tails. The challenge is thus to specify accurately these
heavy tails, which are characterized by poor sampling (large events are rare).
A major difficulty is neither to underestimate (Gaussian error) nor to overestimate
(heavy tail hubris) the extreme events. The quest for a precise quantification
opens the door to model errors, which can be partially circumvented by using
several families of distributions whose detailed comparisons allow one to dis-
cern the sources of uncertainty and errors. Chapter 2 thus discusses several
classes of heavy tailed distributions: regularly varying distributions (i.e., with
asymptotic power law tails), stretched-exponential distributions (also known
as Weibull or subexponentials), as well as log-Weibull distributions, which extrapolate smoothly between these different families.
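To make these three tail families concrete, here is a small sketch of their survival functions P(X > x); the parameter values are illustrative choices for this example, not calibrated ones. Note that the log-Weibull form reduces exactly to a power law when its exponent c equals 1, which is the sense in which it interpolates between the families.

```python
import numpy as np

# Survival functions P(X > x) for the heavy-tail families discussed in
# Chap. 2 (parameter values are illustrative, not calibrated):

def pareto_tail(x, b=3.0, x0=1.0):
    # regularly varying: asymptotic power law P(X > x) ~ (x0 / x)^b
    return (x0 / x) ** b

def stretched_exp_tail(x, c=0.7, d=1.0):
    # stretched exponential (Weibull-like): P(X > x) = exp(-(x/d)^c)
    return np.exp(-((x / d) ** c))

def log_weibull_tail(x, b=2.0, c=1.5, x0=1.0):
    # log-Weibull: P(X > x) = exp(-b (ln(x/x0))^c); with c = 1 this is
    # exactly the power law (x0/x)^b, and c > 1 gives faster-decaying tails
    return np.exp(-b * np.log(x / x0) ** c)

x = np.logspace(0.01, 2, 5)  # a few points beyond x0 = 1
for f in (pareto_tail, stretched_exp_tail, log_weibull_tail):
    print(f.__name__, f(x))
```

The design choice here is to compare the families through their survival functions rather than densities, since it is the survival function that directly controls the probability of extreme events.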
The second element of the construction of multivariate distributions of as-
set returns, addressed in Chaps. 3–6, is to quantify the dependence structure
of the asset returns. Indeed, large risks are not due solely to the heavy tails of
the distribution of returns of individual assets but may result from a collective
behavior. This collective behavior can be completely described by mathemat-
ical objects called copulas, introduced in Chap. 3, which fully embody the
dependence between asset returns.
Chapter 4 describes synthetic measures of dependences, contrasting and
linking them with the concept of copulas. It also presents an original estima-
tion method of the coefficient of tail dependence, defined, roughly speaking, as
the probability for an asset to lose a large amount knowing that another asset
or the market has also dropped significantly. This tail dependence is of great
interest because it addresses in a straightforward way the fundamental question of whether extreme risks can be diversified away by aggregation in
portfolios. Either the tail dependence coefficient is zero, in which case extreme losses occur asymptotically independently, opening the possibility of diversifying them away; or the tail dependence coefficient is non-zero, in which case extreme losses are fundamentally dependent and it is impossible to remove extreme risks completely. The only remaining strategy is then to develop portfolios that minimize the collective extreme risks, thus generalizing the mean-variance approach to a mean-extreme theory [332, 336, 333].
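As an illustration of the concept (not of the estimation method developed in Chap. 4), the tail dependence coefficient can be approximated nonparametrically by counting joint exceedances below a low quantile; the function name and quantile level below are choices made for this sketch.

```python
import numpy as np

def empirical_tail_dependence(x, y, u=0.05):
    """Naive empirical estimate of lower tail dependence between two
    return series: the probability that x falls below its u-quantile
    given that y falls below its own u-quantile.  The true coefficient
    is the limit u -> 0, which a finite sample can only approximate."""
    qx, qy = np.quantile(x, u), np.quantile(y, u)
    below_y = y <= qy
    if below_y.sum() == 0:
        return np.nan
    return np.mean(x[below_y] <= qx)

# Synthetic illustration: for independent Gaussians the estimate stays
# close to u itself (asymptotic independence), while a series paired with
# itself gives exactly 1.
rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)
print(empirical_tail_dependence(a, b, u=0.05))  # close to u = 0.05
print(empirical_tail_dependence(a, a, u=0.05))  # 1.0
```

In practice such a raw counting estimator is very noisy precisely in the regime of interest (u small), which is why the book develops more structured estimators of the coefficient.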
Chapter 5 presents the main methods for estimating copulas of financial
assets. It shows that the empirical determination of a copula is quite delicate
with significant risks of model errors, especially for extreme events. Specific
studies of the extreme dependence are thus required.
Chapter 6 presents a general and thorough discussion of different mea-
sures of conditional dependences (where the condition can be on the size(s)
of one or both returns for two assets). Chapter 6 thus sheds new light on the
variations of the strength of dependence between assets as a function of the
sizes of the analyzed events. As a startling concrete application of conditional
dependences, the phenomenon of contagion during financial crises is discussed in detail.
Chapter 7 presents a synthesis of the six previous chapters and then offers
suggestions for future work on dependence and risk analysis, including time-
varying measures of extreme events, endogeneity versus exogeneity, regime
switching, time-varying lagged dependence and so on.
This book has been written with the ambition to be useful to (a) the
student looking for a general and in-depth introduction to the field, (b) fi-
nancial engineers, economists, econometricians, actuarial professionals and re-
searchers, and mathematicians looking for a synoptic view comparing the pros
and cons of different modeling strategies, and (c) quantitative practitioners
for the insights offered on the subtleties and multidimensional components of
both risk and dependence. The content of this book will also be useful to the
broader scientific community in the natural sciences, interested in quantifying
the complexity of many physical, geophysical, biophysical, and other processes, with
a mounting emphasis on the role and importance of extreme phenomena and
their non-standard dependences.

Lyon, Nice, and Los Angeles
August 2005

Yannick Malevergne
Didier Sornette
Contents

1 On the Origin of Risks and Extremes
  1.1 The Multidimensional Nature of Risk and Dependence
  1.2 How to Rank Risks Coherently?
    1.2.1 Coherent Measures of Risks
    1.2.2 Consistent Measures of Risks and Deviation Measures
    1.2.3 Examples of Consistent Measures of Risk
  1.3 Origin of Risk and Dependence
    1.3.1 The CAPM View
    1.3.2 The Arbitrage Pricing Theory (APT) and the Fama–French Factor Model
    1.3.3 The Efficient Market Hypothesis
    1.3.4 Emergence of Dependence Structures in the Stock Markets
    1.3.5 Large Risks in Complex Systems
  Appendix
    1.A Why Do Higher Moments Allow us to Assess Larger Risks?

2 Marginal Distributions of Returns
  2.1 Motivations
  2.2 A Brief History of Return Distributions
    2.2.1 The Gaussian Paradigm
    2.2.2 Mechanisms for Power Laws in Finance
    2.2.3 Empirical Search for Power Law Tails and Possible Alternatives
  2.3 Constraints from Extreme Value Theory
    2.3.1 Main Theoretical Results on Extreme Value Theory
    2.3.2 Estimation of the Form Parameter and Slow Convergence to Limit Generalized Extreme Value (GEV) and Generalized Pareto (GPD) Distributions
    2.3.3 Can Long Memory Processes Lead to Misleading Measures of Extreme Properties?
    2.3.4 GEV and GPD Estimators of the Distributions of Returns of the Dow Jones and Nasdaq Indices
  2.4 Fitting Distributions of Returns with Parametric Densities
    2.4.1 Definition of Two Parametric Families
    2.4.2 Parameter Estimation Using Maximum Likelihood and Anderson-Darling Distance
    2.4.3 Empirical Results on the Goodness-of-Fits
    2.4.4 Comparison of the Descriptive Power of the Different Families
  2.5 Discussion and Conclusions
    2.5.1 Summary
    2.5.2 Is There a Best Model of Tails?
    2.5.3 Implications for Risk Assessment
  Appendix
    2.A Definition and Main Properties of Multifractal Processes
    2.B A Survey of the Properties of Maximum Likelihood Estimators
    2.C Asymptotic Variance–Covariance of Maximum Likelihood Estimators of the SE Parameters
    2.D Testing the Pareto Model versus the Stretched-Exponential Model

3 Notions of Copulas
  3.1 What is Dependence?
  3.2 Definition and Main Properties of Copulas
  3.3 A Few Copula Families
    3.3.1 Elliptical Copulas
    3.3.2 Archimedean Copulas
    3.3.3 Extreme Value Copulas
  3.4 Universal Bounds for Functionals of Dependent Random Variables
  3.5 Simulation of Dependent Data with a Prescribed Copula
    3.5.1 Simulation of Random Variables Characterized by Elliptical Copulas
    3.5.2 Simulation of Random Variables Characterized by Smooth Copulas
  3.6 Application of Copulas
    3.6.1 Assessing Tail Risk
    3.6.2 Asymptotic Expression of the Value-at-Risk
    3.6.3 Options on a Basket of Assets
    3.6.4 Basic Modeling of Dependent Default Risks
  Appendix
    3.A Simple Proof of a Theorem on Universal Bounds for Functionals of Dependent Random Variables
    3.B Sketch of a Proof of a Large Deviation Theorem for Portfolios Made of Weibull Random Variables
    3.C Relation Between the Objective and the Risk-Neutral Copula

4 Measures of Dependences
  4.1 Linear Correlations
    4.1.1 Correlation Between Two Random Variables
    4.1.2 Local Correlation
    4.1.3 Generalized Correlations Between N > 2 Random Variables
  4.2 Concordance Measures
    4.2.1 Kendall's Tau
    4.2.2 Measures of Similarity Between Two Copulas
    4.2.3 Common Properties of Kendall's Tau, Spearman's Rho and Gini's Gamma
  4.3 Dependence Metric
  4.4 Quadrant and Orthant Dependence
  4.5 Tail Dependence
    4.5.1 Definition
    4.5.2 Meaning and Refinement of Asymptotic Independence
    4.5.3 Tail Dependence for Several Usual Models
    4.5.4 Practical Implications
  Appendix
    4.A Tail Dependence Generated by Student's Factor Model

5 Description of Financial Dependences with Copulas
  5.1 Estimation of Copulas
    5.1.1 Nonparametric Estimation
    5.1.2 Semiparametric Estimation
    5.1.3 Parametric Estimation
    5.1.4 Goodness-of-Fit Tests
  5.2 Description of Financial Data in Terms of Gaussian Copulas
    5.2.1 Test Statistics and Testing Procedure
    5.2.2 Empirical Results
  5.3 Limits of the Description in Terms of the Gaussian Copula
    5.3.1 Limits of the Tests
    5.3.2 Sensitivity of the Method
    5.3.3 The Student Copula: An Alternative?
    5.3.4 Accounting for Heteroscedasticity
  5.4 Summary
  Appendix
    5.A Proof of the Existence of a χ2-Statistic for Testing Gaussian Copulas
    5.B Hypothesis Testing with Pseudo Likelihood

6 Measuring Extreme Dependences
  6.1 Motivations
    6.1.1 Suggestive Historical Examples
    6.1.2 Review of Different Perspectives
  6.2 Conditional Correlation Coefficient
    6.2.1 Definition
    6.2.2 Influence of the Conditioning Set
    6.2.3 Influence of the Underlying Distribution for a Given Conditioning Set
    6.2.4 Conditional Correlation Coefficient on Both Variables
    6.2.5 An Example of Empirical Implementation
    6.2.6 Summary
  6.3 Conditional Concordance Measures
    6.3.1 Definition
    6.3.2 Example
    6.3.3 Empirical Evidence
  6.4 Extreme Co-movements
  6.5 Synthesis and Consequences
  Appendix
    6.A Correlation Coefficient for Gaussian Variables Conditioned on Both X and Y Larger Than u
    6.B Conditional Correlation Coefficient for Student's Variables
    6.C Conditional Spearman's Rho

7 Summary and Outlook
  7.1 Synthesis
  7.2 Outlook and Future Directions
    7.2.1 Robust and Adaptive Estimation of Dependences
    7.2.2 Outliers, Kings, Black Swans and Their Dependence
    7.2.3 Endogeneity Versus Exogeneity
    7.2.4 Nonstationarity and Regime Switching in Dependence
    7.2.5 Time-Varying Lagged Dependence
    7.2.6 Toward a Dynamical Microfoundation of Dependences

References

Index
1 On the Origin of Risks and Extremes

1.1 The Multidimensional Nature of Risk and Dependence
In finance, the fundamental variable is the return that an investor accrues from
his investment in a basket of assets over a certain time period. In general, an
investor is interested in maximizing his gains while minimizing uncertainties
(“risks”) on the expected value of the returns on his investment, at possibly
multiple time scales – depending upon the frequency with which the manager
monitors the portfolio – and time periods – depending upon the investment
horizon. From a general standpoint, the return-risk pair is the unavoidable du-
ality underlying all human activities. The relationship between return and risk
constitutes one of the most important unresolved questions in finance. This
question permeates practically all financial engineering applications, and in
particular the selection of investment portfolios. There is a general consensus
among academic researchers that risk and return should be related, but the
exact quantitative specification is still beyond our comprehension [414].
Uncertainties come in several forms, which we list in order of increasing
aversion for most human beings:
(i) stochastic occurrences of events quantified by known probabilities;
(ii) stochastic occurrences of events with poorly quantified or unknown prob-
abilities;
(iii) random events that are “surprises,” i.e., that were previously thought
to be impossible or unthinkable until they happened and revealed their
existence.
Here we address the first form, using the mathematical tools of probability
theory.
Within this class of uncertainties, one must still distinguish several
branches. In the simplest traditional theory exemplified by Markowitz [347],
the uncertainties underlying a given set of positions (portfolio) result from
the interplay of two components: risk and dependence.
(a) Risk is embedded in the amplitude of the fluctuations of the returns. Its simplest traditional measure is the standard deviation (the square root of the variance).
(b) The dependence between the different assets of a portfolio of positions
is traditionally quantified by the correlations between the returns of all
pairs of assets.
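In their most basic form, (a) and (b) can be computed in a few lines; the returns below are synthetic and the mean and covariance values are made up for the illustration.

```python
import numpy as np

# Daily returns of three hypothetical assets (rows = days, columns = assets);
# in practice these would come from price data, r_t = log(p_t / p_{t-1}).
rng = np.random.default_rng(42)
returns = rng.multivariate_normal(
    mean=[0.0005, 0.0003, 0.0004],
    cov=[[4.00e-4, 1.00e-4, 0.00e-4],
         [1.00e-4, 2.25e-4, 0.50e-4],
         [0.00e-4, 0.50e-4, 1.00e-4]],
    size=1000,
)

# (a) Risk of each asset: the standard deviation of its returns.
volatilities = returns.std(axis=0, ddof=1)

# (b) Dependence between assets: the matrix of pairwise linear correlations.
correlations = np.corrcoef(returns, rowvar=False)

print(volatilities)           # one number per asset
print(correlations.round(2))  # symmetric matrix with 1s on the diagonal
```

These two summaries are exactly the one-dimensional quantifiers whose limitations the rest of the chapter discusses.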
Thus, in their most basic incarnations, both risk and dependence are thought
of, respectively, as one-dimensional quantities: the standard deviation of
the distribution of returns of a given asset and the correlation coefficient
of these returns with those of another asset of reference (the “market” for
instance). The standard deviation (or volatility) of portfolio returns provides
the simplest way to quantify their fluctuations and is at the basis of Markowitz's
portfolio selection theory [347]. However, the standard deviation of a portfolio
offers only a limited quantification of incurred risks (seen as the statistical fluc-
tuations of the realized return around its expected – or anticipated – value).
This is because the empirical distributions of returns have “fat tails” (see
Chap. 2 and references therein), a phenomenon associated with the occur-
rence of non-typical realizations of the returns. In addition, the dependences
between assets are only imperfectly accounted for by the covariance matrix
[309].
The last few decades have seen two important extensions.
• First, it has become clear, as synthesized in Chap. 2, that the standard
deviation offers only a reductive view of the genuine full set of risks em-
bedded in the distribution of returns of a given asset. As distributions of
returns are in general far from Gaussian laws, one needs more than one
centered moment (the variance) to characterize them. In principle, an in-
finite set of centered moments is required to faithfully characterize the
potential for small all the way to extreme risks because, in general, large
risks cannot be predicted from the knowledge of small risks quantified by
the standard deviation. Alternatively, the full space of risks needs to be
characterized by the full distribution function. It may also be that the dis-
tributions are so heavy-tailed that moments do not exist beyond a finite
order, which is the realm of asymptotic power law tails, of which the stable
Lévy laws constitute an extreme class. The Value-at-Risk (VaR) [257] and
many other measures of risks [19, 20, 73, 447, 453] have been developed to
account for the larger moves allowed by non-Gaussian distributions and
non-linear correlations.
• Second and more recently, the correlation coefficient (and its associated
covariance) has been shown to only be a partial measure of the full de-
pendence structure between assets. Similarly to risks, a full understanding
of the dependence between two or more assets requires, in principle, an
infinite number of quantifiers or a complete dependence function such as
the copulas, defined in Chap. 3.
1.1 The Multidimensional Nature of Risk and Dependence
These two fundamental extensions from one-dimensional measures of risk
and dependence to infinite-dimensional measures of risk and dependence
constitute the core of this book. Chapter 2 reviews our present knowledge
and the open challenges in the characterization of distribution of returns.
Chapter 3 introduces the notion of copulas which are applied later in Chap. 5
to financial dependences. Chapter 4 describes the main properties of the most
important and varied measures of dependence, and underlines their connec-
tions with copulas. Finally, Chap. 6 expands on the best methods to capture
the dependence between extreme returns.
Understanding the risks of a portfolio of N assets involves the characteriza-
tion of both the marginal distributions of asset returns and their dependence.
In principle, this requires the knowledge of the full (time-dependent) mul-
tivariate distribution of returns, which is the joint probability of any given
realization of the N asset returns at a given time. This remark entails the
two major problems of portfolio theory: (1) to determine the multivariate
distribution function of asset returns; (2) to derive from it useful measures
of portfolio risks and use them to analyze and optimize the performance of
the portfolios. There is a large literature on multivariate distributions and
multivariate statistical analysis [363, 468, 282]. This literature includes:
• the use of the multivariate normal distribution on density estimation [428];
• the corresponding random vectors treated with matrix algebra, and thus
on matrix methods and multivariate statistical analysis [173, 371];
• the robust determination of multivariate means and covariances [297, 298];
• the use of multivariate linear regression and factor models [160, 161];
• principal component analysis, with excursions in clustering and classifica-
tion techniques [276, 254];
• methods for data analysis in cases with missing observations [133, 310];
• detecting outliers [249, 250];
• bootstrap methods and handling of multicollinearity [461];
• methods of estimation using the plug-in principles and maximum likeli-
hood [144];
• hypothesis testing using likelihood ratio tests and permutation tests [398];
• discrete multivariate distributions [253];
• computer-aided geometric design, geometric modeling, geodesic applica-
tions, and image analysis [464, 105, 426];
• radial basis functions [86], scattered data on spheres, and shift-invariant
spaces [139, 433];
• non-uniform spline wavelets [139];
• scalable algorithms in computer graphics [76];
• reverse engineering [139], and so on.
The growing literature on (1) non-stationary processes [85, 210, 222, 361]
and (2) regime-switching [172, 180, 215, 269] is not covered here. Nor do
we address the more complex issues of embedding financial modeling within
economics and social sciences. We do not cover either the consequences for risk
assessment coming from the important emerging field of behavioral finance,
with its exploration of the impact on decision-making of imperfect bounded
subjective probability perceptions [36, 206, 437, 439, 474]. Our book thus uses
objective probabilities which can be estimated (with quantifiable errors) from
suitable analysis of available data.

1.2 How to Rank Risks Coherently?


The question on how to rank risks, so as to make optimal decisions, is recur-
rent in finance (and in many other fields) but has not yet received a general
solution.
Since the middle of the twentieth century, several paths have been
explored. The pioneering work by Von Neumann and Morgenstern [482] has given
birth to the mathematical definition of the expected utility function, which
provides interesting insights on the behavior of a rational economic agent
and has formalized the concept of risk aversion. Based upon the properties
of the utility function, Rothschild and Stiglitz [419, 420] have attempted to
define the notion of increasing risks. But, as revealed by Allais [4, 5],
empirical investigations have proven that the postulates chosen by Von Neumann
and Morgenstern are actually often violated by humans. Many generalizations
have been proposed for curing the so-called Allais’ Paradox, but until now,
no generally accepted procedure has been found.
Recently, a theory due to Artzner et al. [19, 20] and its generalization by
Föllmer and Schied [174, 175] have appeared. Based on a series of postulates
that are quite natural, this theory allows one to build coherent (resp., convex)
measures of risks that provide tools to compare and rank risks [383]. While
this theory seems well adapted to the assessment of the needed economic
capital, that is, of the fraction of capital a company must keep as risk-free
assets in order to face its commitments and thus avoid ruin, it seems less
natural for the purpose of quantifying the fluctuations of the asset returns
or equivalently the deviation from a predetermined objective. In fact, as will
be shown in this section, it turns out that the two approaches consisting in
assessing the risk in terms of economic capital on the one hand, and in terms
of deviations from an objective on the other hand, are actually the two sides
of the same coin as recently shown in [407, 408].

1.2.1 Coherent Measures of Risks

According to Artzner et al. [19, 20], the risk involved in the variations of the
values of a market position is measured by the amount of capital invested
in a risk-free asset, such that the market position can be prolonged in the
future. In other words, the potential losses should not endanger the future
actions of the fund manager of the company, or more generally, of the person
or structure which underwrites the position. In this sense, a risk measure
constitutes for Artzner et al. a measure of economic capital. The risk measure
ρ can be either positive, if the risk-free capital must be increased to guarantee
the risky position, or negative, if the risk-free capital can be reduced without
invalidating it.
A risk measure is said to be coherent in the sense of Artzner et al. [19, 20]
if it obeys the four properties or axioms that we now list. Let us call G the
space of risks. If the space Ω of all possible states of nature is finite, G is
isomorphic to RN and a risky position X is nothing but a vector in RN . A risk
measure ρ is then a map from RN onto R. A generalization to other spaces G
of risk has been proposed by Delbaen [123].
Let us consider a risky position with terminal value X and a capital α
invested in the risk-free asset at the beginning of the time period. At the end
of the time period, α becomes α · (1 + µ0 ), where µ0 is the risk-free interest
rate. Then,
Axiom 1 (Translational Invariance)

∀X ∈ G and ∀α ∈ R, ρ(X + α · (1 + µ0 )) = ρ(X) − α . (1.1)

This simply means that an investment of amount α in the risk-free asset
decreases the risk by the same amount α. In particular, for any risky position
X, ρ(X + ρ(X) · (1 + µ0 )) = 0, which expresses that investing an amount ρ(X) in
the risk-free asset enables one to exactly make up for the risk of the position
X.
Let us now consider two risky investments X1 and X2 , corresponding to
the positions of two traders of an investment house. It is important for the
supervisor that the aggregated risk of all traders be less than the sum of risks
incurred by all traders. In particular, the risk associated with the position
(X1 + X2 ) should be smaller than or equal to the sum of the separated risks
associated with the two positions X1 and X2 .
Axiom 2 (Sub-additivity)

∀(X1 , X2 ) ∈ G × G, ρ(X1 + X2 ) ≤ ρ(X1 ) + ρ(X2 ) . (1.2)

The condition of sub-additivity encourages a portfolio manager to aggregate
her different positions by diversification to minimize her overall risk. This
axiom is probably the most debated among the four axioms underlying the
theory of coherent measures of risk (see [131] and references therein). As an
example, the VaR is well known to lack sub-additivity. At the same time, VaR
is comonotonically additive, which means that the VaR of two comonotonic
assets equals the sum of the VaR of each individual asset. But, since the
comonotonicity represents the strongest kind of dependence (see Chap. 3),
it is particularly disturbing to imagine that one can find situations where a
portfolio made of two comonotonic assets is less risky than a portfolio with
assets whose marginal risks are the same as in the previous situation but with
a weaker dependence. Here is the rub with sub-additivity.
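The failure of sub-additivity for VaR can be exhibited with a standard textbook-style construction; the defaultable-bond figures below are hypothetical, chosen only to expose the effect. Two independent positions, each appearing risk-free at the 95% level in isolation, combine into a portfolio whose 95% VaR is large.

```python
from itertools import product

def var(pmf, alpha):
    """VaR_alpha(X) = inf {v : P(-X <= v) >= alpha}, for a discrete
    distribution given as a map from outcomes to probabilities."""
    losses = sorted((-x, p) for x, p in pmf.items())
    cum = 0.0
    for loss, p in losses:
        cum += p
        if cum >= alpha:
            return loss
    return losses[-1][0]

# One defaultable bond (hypothetical figures): profit +2 with
# probability 0.96, loss of 100 (default) with probability 0.04.
bond = {2.0: 0.96, -100.0: 0.04}

# Portfolio of two independent such bonds: convolve the distributions.
portfolio = {}
for (x1, p1), (x2, p2) in product(bond.items(), repeat=2):
    portfolio[x1 + x2] = portfolio.get(x1 + x2, 0.0) + p1 * p2

alpha = 0.95
v1 = var(bond, alpha)        # -2.0: each bond alone looks risk-free at 95%
v12 = var(portfolio, alpha)  # 98.0: one default now lies inside the 95% level
assert v12 > v1 + v1         # sub-additivity is violated
print(v1, v12)
```

Each bond defaults with probability 4%, below the 5% threshold, so its 95% VaR is negative; but the probability that the pair suffers at least one default is 1 − 0.96² ≈ 7.8%, above the threshold, so diversification appears to *increase* the measured risk.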
Axiom 3 (Positive Homogeneity)

∀X ∈ G and ∀λ ≥ 0, ρ(λ · X) = λ · ρ(X) . (1.3)

This third axiom stresses the importance of homogeneity. Indeed, it means
that the risk associated with a given position increases with its size, here
proportionally with it. Again, this axiom is controversial. Obviously, one can
assert that the risk associated with the position 2 · X is naturally twice as
large as the risk of X. This is true as long as we can consider that a large
position can be cleared as easily as a smaller one. However, it is not realistic
because of the limited liquidity of real markets; a large position in a given
asset is more risky than the sum of the risks associated with the many smaller
positions which add up to the large position.
Finally, if it is true that, for all possible states of nature, the risk of X
leads to a loss larger than that of Y (i.e., all components of the vector X in
RN are always less than or equal to those of the vector Y ), the risk measure
ρ(X) must be larger than or equal to ρ(Y ) :
Axiom 4 (Monotony)

∀X, Y ∈ G such that X ≤ Y, ρ(X) ≥ ρ(Y ) . (1.4)

These four axioms define the coherent measures of risks, which admit the
following general representation:
 
ρ(X) = sup_{P∈P} EP [−X/(1 + µ0 )] , (1.5)

where P denotes a family of probability measures. Thus, any coherent measure
of risk appears as the expectation of the maximum loss over a given set of
scenarios (the different probability measures P ∈ P). It is then obvious that
the larger the set of scenarios, the larger the value of ρ(X) and thus, the more
conservative the risk measure.
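The worst-case-scenario reading of representation (1.5) can be sketched numerically. In this toy example, the states, payoffs, and scenario probabilities are all invented for illustration; enlarging the family of probability measures can only increase ρ(X).

```python
mu0 = 0.02  # illustrative risk-free rate

def rho(payoffs, scenarios):
    """Coherent risk per representation (1.5): the worst expected
    discounted loss over a family of probability measures."""
    return max(sum(p * (-x) for p, x in zip(P, payoffs)) / (1 + mu0)
               for P in scenarios)

# Three states of nature; X is the terminal value of the position.
X = [-30.0, 5.0, 20.0]

small_set = [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3]]
large_set = small_set + [[0.6, 0.3, 0.1]]  # one extra pessimistic scenario

print(rho(X, small_set))  # ~0.98
print(rho(X, large_set))  # ~14.2: a larger scenario set is more conservative
```

One can also check on this example that shifting X by α(1 + µ0) decreases ρ by α, as required by the translational invariance axiom.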
It is particularly interesting that expression (1.5) is very similar to the
result obtained in the theory of utility with non-additive probabilities [202,
203]. Indeed, in such a case, the utility of position X is given by

U (X) = inf_{P∈P} EP [u(X)] , (1.6)

where u(·) is a usual utility function.
When the coherent risk measure is invariant in law and comonotonically
additive, an alternative representation holds in terms of the spectral measure
of risk [285, 471]
ρ(X) = p ∫_0^1 VaRα (X) dF (α) + (1 − p) VaR1 (X) , (1.7)
where F is a continuous convex distribution function on [0, 1], p is any real
in [0, 1] and VaRα is the Value-at-Risk defined in (3.85) page 125. Therefore,
most coherent measures of risk appear as a convex sum of VaRα (a non-
coherent risk measure) at different probability levels. The weighting function
F can be interpreted as a distortion of the objective probabilities, as under-
lined in the non-expected utility context [431, 495].
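A familiar special case of such a convex combination of VaRs is the expected shortfall, obtained by averaging VaR_u uniformly over the tail levels u ∈ (α, 1). The following sketch, on a simulated return sample with occasional large losses, illustrates that this tail average always dominates the VaR at the same level; the sample and all parameters are illustrative choices of ours.

```python
import random

random.seed(0)
# Illustrative P&L sample: Gaussian core plus occasional large losses.
sample = [random.gauss(0.0, 1.0) - (8.0 if random.random() < 0.02 else 0.0)
          for _ in range(50_000)]
losses = sorted(-x for x in sample)

def var(u):
    """Empirical VaR at level u: the u-quantile of the losses."""
    return losses[min(int(u * len(losses)), len(losses) - 1)]

def expected_shortfall(alpha, steps=200):
    """Equally weighted average of VaR_u over u in (alpha, 1): the
    simplest convex combination of VaRs in the spirit of (1.7)."""
    return sum(var(alpha + (1.0 - alpha) * (k + 0.5) / steps)
               for k in range(steps)) / steps

alpha = 0.95
print(var(alpha), expected_shortfall(alpha))
# The tail average always dominates the quantile at the same level.
```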
Coherent measures of risk can be generalized to define the so-called con-
vex measures of risk by replacing the controversial axioms 2–3, by a single
axiom of convexity of the risk measure [174, 175]. In the case where the risk
measure is still positively homogeneous, this requirement is equivalent to the
sub-additivity, but it becomes less restrictive when Axiom 3 is discarded.
Then, one obtains the following representation of the convex risk measures:
 
ρ(X) = sup_{P∈M} { EP [−X/(1 + µ0 )] − α(P) } , (1.8)
where M is the set of all probability measures on (Ω, F), F denotes a σ-
algebra on the state space Ω. More generally, M is the set of all finitely
additive and non-negative set functions P on F satisfying P(Ω) = 1 and the
functional
 
α(P) = sup_{X∈G | ρ(X)≤0} EP [−X/(1 + µ0 )] (1.9)
is a penalty function that fully characterizes the convex measure of risk. In
the case of a coherent risk measure, the set P (in (1.5)) is in fact the class of
set functions P in M such that the penalty function vanishes: α(P) = 0.
Another alternative leads one to replace Axiom 4 by the following:
Axiom 5 (Expectation-Boundedness)
∀X ∈ G , ρ(X) ≥ E[−X]/(1 + µ0 ) , (1.10)
where the equality holds if and only if X is certain.1 Then, together with
axioms 1–3, it allows one to define the expectation-bounded risk measures
[407]. They are particularly interesting insofar as they enable one to capture
the inherent relationship existing between the assessment of risk in terms of
economic capital and the measure of risk in terms of deviations from a target
objective, as we shall see hereafter.

1.2.2 Consistent Measures of Risks and Deviation Measures


We now present a slightly different approach, which we think offers a suitable
complement to coherent (and/or convex) risk measures for financial invest-
ments, and in particular for portfolio risk assessments. These measures are
1
We say that X is certain if X(ω) = a, for some a ∈ R, for all ω ∈ Ω, such that
P(ω) = 0, where P denotes a probability measure on (Ω, F ) and F is a σ-algebra
so that (Ω, F , P) is a usual probability space.
called “consistent measures of risks” in [333] and “general deviation measures”
in [407]. As before, we consider the future value of a risky position denoted
by X, and we call G the space of risks.
Let us first require that the risk measure ρ̃(·), which is a functional on G,
always remains positive:
Axiom 6 (Positivity)

∀X ∈ G , ρ̃(X) ≥ 0 , (1.11)

where the equality holds if and only if X is certain. Let us now add to this
position a given amount α invested in the risk-free asset whose return is µ0
(with therefore no randomness in its price trajectory) and define the future
wealth of the new position Y = X + α(1 + µ0 ). Since µ0 is non-random,
the fluctuations of X and Y are the same. Thus, it is desirable that ρ̃ en-
joys a property of translational invariance, whatever X and the non-random
coefficient α may be:

∀X ∈ G, ∀α ∈ R , ρ̃(X + (1 + µ0 ) · α) = ρ̃(X) . (1.12)

This relation is obviously true for all µ0 and α; therefore, we set


Axiom 7 (Translational Invariance)

∀X ∈ G, ∀κ ∈ R , ρ̃(X + κ) = ρ̃(X) . (1.13)

We also require that the risk measure increases with the quantity of assets
held in the portfolio. This assumption reads

∀X ∈ G, ∀λ ∈ R+ , ρ̃(λ · X) = f (λ) · ρ̃(X) , (1.14)

where the function f : R+ −→ R+ is increasing and convex to account for
liquidity risk, as previously discussed. In fact, it is straightforward to show2
that the only functions satisfying this requirement are the functions fζ (λ) =
λζ with ζ ≥ 1, so that Axiom 3 can be reformulated in terms of positive
homogeneity of degree ζ:
Axiom 8 (Positive Homogeneity)

∀X ∈ G, ∀λ ∈ R+ , ρ̃(λ · X) = λζ · ρ̃(X). (1.15)

Note that the case of liquid markets is recovered by ζ = 1 for which the risk is
directly proportional to the size of the position, as in the case of the coherent
risk measures.
These axioms, which define the so-called consistent measures of risk [333]
can easily be extended to the risk measures associated with the return on the
2
Using the trick ρ̃(λ1 λ2 ·X) = f (λ1 )·ρ̃(λ2 ·X) = f (λ1 )·f (λ2 )·ρ̃(X) = f (λ1 ·λ2 )·ρ̃(X)
leading to f (λ1 · λ2 ) = f (λ1 ) · f (λ2 ). The unique increasing convex solution of
this functional equation is fζ (λ) = λζ with ζ ≥ 1.
risky position. Indeed, a one-period return is nothing but the variation of the
value of the position divided by its initial value X0 . One can thus easily check
that the risk defined on the risky position is [X0 ]ζ times the risk defined on
the return distribution. In the following, we will only consider the risk defined
on the return distribution and, to simplify the notations, the symbol X will be
used to denote both the asset price and its return in their respective context
without ambiguity.
Now, restricting to the case of a perfectly liquid market (ζ = 1) and adding
a sub-additivity assumption
Axiom 9 (Sub-additivity)

∀(X, Y ) ∈ G × G , ρ̃(X + Y ) ≤ ρ̃(X) + ρ̃(Y ) , (1.16)

one obtains the so-called general deviation measures [407]. Again, this axiom is
open to controversy and its main raison d’être is to ensure the well-posedness
of optimization problems (such as minimizing portfolio risks). It could be
weakened along the lines used previously to derive the convex measures of
risk from the coherent measures of risk.
One can easily check that the deviation measures defined in (1.16) cor-
respond one-to-one to the expectation-bounded measures of risk defined in
(1.10) through the relation

ρ(X) = ρ̃(X) + E[−X]/(1 + µ0 ) ⇐⇒ ρ̃(X) = ρ (X + E [−X]) . (1.17)

It follows straightforwardly that minimizing the risk of a portfolio (measured
either by ρ or by ρ̃) under constraints on the expected return is equivalent,
as long as the constraints on the expected return are active. Indeed, in such a
case, searching for the minimum of ρ̃ or of ρ̃(X) + E[−X]/(1 + µ0 ) is the same problem
since the value of the expected return is fixed by the constraints.
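The correspondence (1.17) can be checked numerically on the prototype deviation measure, the standard deviation. The sketch below uses a toy discrete distribution and an illustrative risk-free rate; it verifies that ρ(X) = ρ̃(X) + E[−X]/(1 + µ0) satisfies the translational invariance of Axiom 1, and that ρ̃ is recovered as ρ(X + E[−X]). Note that the ρ built from the standard deviation is expectation-bounded but not necessarily coherent, since condition (1.18) can fail.

```python
from math import isclose, sqrt

mu0 = 0.02  # illustrative risk-free rate

# A discrete risky payoff: outcome -> probability (toy numbers).
X = {-10.0: 0.1, 2.0: 0.6, 8.0: 0.3}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def std(pmf):
    """Standard deviation: the prototype deviation measure rho_tilde."""
    m = mean(pmf)
    return sqrt(sum(p * (x - m) ** 2 for x, p in pmf.items()))

def rho(pmf):
    """Expectation-bounded risk measure built from the standard
    deviation through relation (1.17)."""
    return std(pmf) + (-mean(pmf)) / (1 + mu0)

def shift(pmf, c):
    """Distribution of X + c for a non-random c."""
    return {x + c: p for x, p in pmf.items()}

# Axiom 1: alpha invested in the risk-free asset lowers rho by alpha.
alpha = 3.0
assert isclose(rho(shift(X, alpha * (1 + mu0))), rho(X) - alpha)

# Converse direction of (1.17): rho_tilde(X) = rho(X + E[-X]).
assert isclose(std(X), rho(shift(X, -mean(X))))
print(std(X), rho(X))
```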
Additionally, it can be shown that the expectation-bounded measure of
risk ρ defined by (1.17) is coherent if (and only if) the deviation measure ρ̃
satisfies [407]

∀X ∈ G , ρ̃(X) ≤ E [X] − inf X . (1.18)

The general representation of the deviation measures satisfying this restric-
tion can be easily derived from the representation of coherent risk measures.
When such a requirement is not fulfilled, one can still have the following rep-
resentation:3
3
Strictly speaking, this representation only holds for lower semicontinous deviation
measures, i.e., deviation measures such that the sets {X|ρ̃(X) ≤ ε} are closed in
L2 (Ω), for all ε > 0. This condition is fulfilled by most of the deviation measures
of common use: the standard deviation, the semi-standard deviation, the absolute
deviation, and so on.
ρ̃(X) = sup_{Y∈Y} E [Y · (E [X] − X)] = sup_{Y∈Y} Cov(−X, Y ) , (1.19)

where Y is a closed and convex subset of L2 (Ω) such that
1. 1 ∈ Y,
2. ∀Y ∈ Y, E [Y ] = 1,
3. ∀X ∈ L2 (Ω), ∃Y ∈ Y, such that E [Y · X] < E [X].
When the random variables in Y are all positive, they can be interpreted as
density functions relative to some reference probability measure P0 on (Ω, F)
(the objective probability measure). Thus, the term E [Y · X] is nothing but
the expectation of X under the probability measure P, such that its Radon
dP
density dP 0
= Y . Therefore, one obtains a deviation measure associated with
a coherent measure of risk.
These derivations show that deviation measures of risk on the one hand and
coherent (or convex/expectation-bounded) measures of risk on the other hand
are inextricably entangled. In fact, they are the two sides of the same coin,
as mentioned in the introduction to this section. The various representation
theorems show that, in most cases, these risk measures can be interpreted as
worst-case scenarios, which rationalizes the use of stress-testing procedures as
a sound practice for risk management.
In the more general case when the exponent ζ defined in Axiom 8 is no
more equal to 1, and more precisely, when we only require that Axioms 6–8
hold, there is no general representation for the consistent risk measures to
the best of our knowledge. The risk measures ρ̃ obeying Axioms 7 and 8 are
known as the semi-invariants of the distribution of returns of X (see [465,
pp. 86–87]). Among the large family of semi-invariants, we can cite the well-
known centered moments and cumulants of X (including the usual variance).
They are interesting cases that we discuss further below.

1.2.3 Examples of Consistent Measures of Risk

The set of risk measures obeying Axioms 7–8 is huge since it includes all the
homogeneous functionals of (X − E[X]), for instance. The centered moments
(or moments about the mean) and the cumulants are two well-known classes
of semi-invariants. Then, a given value of ζ can be seen as nothing but a
specific choice of the order n of the centered moments or of the cumulants.4
In this case, the risk measure defined via these semi-invariants fulfills the two
following conditions:
ρ̃(X + µ) = ρ̃(X) , (1.20)
ρ̃(λ · X) = λ^n · ρ̃(X) . (1.21)
4
The relevance of the moments of high order for the assessment of large risks is
discussed in Appendix 1.A.
In order to satisfy the positivity condition (Axiom 6), one needs to restrict
the set of values taken by n. By construction, the centered moments of even
order are always positive while the odd order centered moments can be neg-
ative. In addition, a vanishing value of an odd order moment does not mean
that the random variable, or risk, X ∈ G is certain in the sense of footnote 1,
since for instance any symmetric random variable has vanishing odd order
moments. Thus, only the even-order centered moments seem acceptable risk
measures. However, this restrictive constraint can be relaxed by first recalling
that, given any homogeneous function f (·) of order p, the function f (·)q is
also homogeneous of order p · q. This allows one to decouple the order of the
moments to consider, which quantifies the impact of the large fluctuations,
from the influence of the size of the positions held, measured by the degree
of homogeneity of the measure ρ̃. Thus, considering any even-order centered
moments, we can build a risk measure ρ̃(X) = E[(X − E[X])^{2n}]^{ζ/2n} , which
accounts for the fluctuations measured by the centered moment of order 2n
but with a degree of homogeneity equal to ζ.
A further generalization is possible for odd-order moments. Indeed, the
absolute centered moments satisfy the three Axioms 6–8 for any odd or even
order. So, we can even go one step further and use non-integer order absolute
centered moments, and define the more general risk measure
ρ̃(X) = E [|X − E[X]|^γ]^{ζ/γ} , (1.22)

where γ denotes any positive real number.
Due to the Minkowski inequality, these risk measures are convex for any
ζ and γ larger than 1 (and for 0 ≤ u ≤ 1) :

ρ̃(u · X + (1 − u) · Y ) ≤ u · ρ̃(X) + (1 − u) · ρ̃(Y ) , (1.23)

which ensures that aggregating two risky assets diversifies their risk. In fact,
in the special case γ = 1, these measures enjoy the stronger sub-additivity
property, and therefore belong to the class of general deviation measures.
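The defining Axioms 6–8 of the measure (1.22) can be verified directly on a toy discrete distribution; all numbers below are illustrative, with γ = 3 and ζ = 2 so that ρ̃ is homogeneous of degree 2.

```python
from math import isclose

def rho_tilde(pmf, gamma=3.0, zeta=2.0):
    """Consistent risk measure (1.22): E[|X - E[X]|^gamma]^(zeta/gamma)."""
    m = sum(x * p for x, p in pmf.items())
    return sum(p * abs(x - m) ** gamma
               for x, p in pmf.items()) ** (zeta / gamma)

X = {-4.0: 0.2, 1.0: 0.5, 6.0: 0.3}  # toy discrete distribution

# Axiom 7 (translational invariance): shifting X changes nothing.
assert isclose(rho_tilde({x + 5.0: p for x, p in X.items()}), rho_tilde(X))

# Axiom 8 (positive homogeneity of degree zeta = 2).
assert isclose(rho_tilde({3.0 * x: p for x, p in X.items()}),
               3.0 ** 2 * rho_tilde(X))

# Axiom 6 (positivity): zero only for a certain position.
assert rho_tilde(X) > 0.0 and rho_tilde({7.0: 1.0}) == 0.0
print(rho_tilde(X))
```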
More generally, any discrete or continuous (positive) sum of these risk
measures with the same degree of homogeneity is again a risk measure.
This allows us to define “spectral measures of fluctuations” in the spirit of
Acerbi [2]:

ρ̃(X) = ∫ dγ φ(γ) E [|X − E[X]|^γ]^{ζ/γ} , (1.24)
where φ is a positive real-valued function defined on any subinterval of [1, ∞),


such that the integral in (1.24) remains finite. It is sufficient to restrict the
construction of ρ̃(X) to normalized functions φ, such that ∫ dγ φ(γ) = 1,
since the risk measures are defined up to a global normalization factor. Then,
φ(γ) represents the relative weight of the fluctuations measured by a given
moment order and can be considered as a measure of the risk aversion of the
risk manager with respect to large fluctuations.
The situation is not so clear for the cumulants, since the even-order cumu-
lants, as well as the odd-order ones, can be negative (even if, for a large class
of distributions, even-order cumulants remain positive, especially for fat-tailed
distributions – even though there are simple but somewhat artificial counter-
examples). In addition, cumulants suffer from another problem with respect
to the positivity axiom. As for the odd-order centered moments, they can
vanish even when the random variable is not certain. Just think of the cu-
mulants of the Gaussian law. All but the first two (which represent the mean
and the variance) are equal to zero. Thus, the strict formulation of the posi-
tivity axiom cannot be fulfilled by the cumulants. Should we thus reject them
as useful measures of risks? It is important to emphasize that the cumulants
enjoy a property which can be considered as a natural requirement for a risk
measure. It can be desirable that the risk associated with a portfolio made of
independent assets is exactly the sum of the risk associated with each individ-
ual asset. Thus, given N independent assets {X1 , . . . , XN }, and the portfolio
SN = X1 + · · · + XN , we would like to have

ρ̃(SN ) = ρ̃(X1 ) + · · · + ρ̃(XN ) . (1.25)

This property is verified for all cumulants, while it does not hold for centered
moments, except for the variance. In addition, as seen from their definition in
terms of the characteristic function
E[e^{ik·X}] = exp( Σ_{n=1}^{+∞} Cn (ik)^n / n! ) , (1.26)

cumulants Cn of order larger than 2 quantify deviations from the Gaussian law
and therefore measure large risks beyond the variance (equal to the second-
order cumulant).
What are the implications of using the cumulants as almost consistent
measures of risks? In particular, what are the implications on the preferences
of the agents employing such measures? To address this question, it is infor-
mative to express the cumulants as a function of the centered moments. For
instance, let us consider the fourth-order cumulant:

C4 = µ4 − 3 · µ2^2 = µ4 − 3 · C2^2 , (1.27)

where µn is the centered moment of order n. An agent assessing the fluctua-
tions of an asset with respect to C4 exhibits an aversion for the fluctuations
quantified by the fourth central moment µ4 – since C4 increases with µ4 – but
is attracted by the fluctuations measured by the variance – since C4 decreases
with µ2 . This behavior is not irrational because it remains globally risk-averse.
Indeed, it depicts an agent which tries to avoid the larger risks but is ready to
accept the smallest ones. This kind of behavior is characteristic of any agent
using the cumulants as risk measures. In such a case, having C4 = 0 does not
mean that the agent considers that the position is not risky (in the sense that
the position is certain) but that the agent is indifferent between the large risks
of this position measured by µ4 and the small risks quantified by µ2 .
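The additivity property (1.25) of cumulants across independent risks, and its failure for the fourth centered moment, can be checked exactly on small discrete distributions (chosen arbitrarily for illustration), computing C4 = µ4 − 3µ2² as in (1.27).

```python
from itertools import product
from math import isclose

def central_moment(pmf, n):
    m = sum(x * p for x, p in pmf.items())
    return sum(p * (x - m) ** n for x, p in pmf.items())

def c4(pmf):
    """Fourth cumulant: C4 = mu4 - 3 mu2^2, as in (1.27)."""
    return central_moment(pmf, 4) - 3.0 * central_moment(pmf, 2) ** 2

def convolve(pmf1, pmf2):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for (x, p), (y, q) in product(pmf1.items(), pmf2.items()):
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out

X = {-1.0: 0.3, 0.0: 0.4, 2.0: 0.3}  # arbitrary toy distributions
Y = {-2.0: 0.1, 1.0: 0.9}
S = convolve(X, Y)

# Cumulants add across independent risks, as in (1.25) ...
assert isclose(c4(S), c4(X) + c4(Y))
# ... while the fourth centered moment does not: the cross term
# 6 mu2(X) mu2(Y) spoils the additivity.
assert not isclose(central_moment(S, 4),
                   central_moment(X, 4) + central_moment(Y, 4))
```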
To summarize, centered moments of even orders possess all the minimal
properties required for a suitable portfolio risk measure. Cumulants only par-
tially fulfill these requirements, but have an additional advantage compared
with the centered moments, that is, they fulfill the condition (1.25). For these
reasons, we think it is interesting to consider both the centered moments and
the cumulants in risk analysis and decision making. Finally let us stress that
the variance, originally used in Markowitz’s portfolio theory [347], is nothing
but the second centered moment, also equal to the second-order cumulant (the
first three cumulants and centered moments are equal). Therefore, a
portfolio theory based on the centered moments or on the cumulants automatically
contains Markowitz’s theory as a special case, and thus offers a natural gen-
eralization encompassing large risks of this masterpiece of financial science. It
also embodies several other generalizations where homogeneous measures of
risks are considered, as for instance in [241].
We should also mention the measure of attractiveness for risky invest-
ments, the gain–loss ratio, introduced by Bernardo and Ledoit [50]. The gain
(loss) of a portfolio is the expectation, under a benchmark risk-adjusted prob-
ability measure, of the positive (negative) part of the portfolio’s excess payoff.
The gain–loss ratio constitutes an improvement over the widely used Sharpe
ratio (average return over volatility). The advantage of the gain–loss ratio is
that it penalizes only downside risk (losses) and rewards all upside potential
(gains). The gain–loss ratio has been shown to yield useful bounds for asset
pricing in incomplete markets, which give the modeler the flexibility to control
the trade-off between the precision of equilibrium models and the credibility
of no-arbitrage methods. The gain–loss approach is valuable in applications
where the security returns are not normally distributed. Bernardo and Ledoit
[50] cite the following domains of application: (i) valuing real options on non-
traded assets; (ii) valuing executive stock options when the executive cannot
trade the options or the underlying due to insider trading restrictions; (iii)
evaluating the performance of portfolio managers who invest in derivatives;
(iv) pricing options on a security whose price follows a jump-diffusion or a
fat-tailed Pareto–Lévy diffusion process; and (v) pricing fixed income derivatives
in the presence of default risk.
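A minimal sketch of the gain–loss ratio follows; for simplicity the benchmark risk-adjusted measure is replaced here by a plain discrete probability measure, and the payoffs are purely hypothetical.

```python
def gain_loss_ratio(excess_payoffs, probs):
    """Gain-loss ratio: E[max(X, 0)] / E[max(-X, 0)], with the benchmark
    risk-adjusted measure simplified here to the given probabilities."""
    gain = sum(p * max(x, 0.0) for x, p in zip(excess_payoffs, probs))
    loss = sum(p * max(-x, 0.0) for x, p in zip(excess_payoffs, probs))
    return gain / loss

# Two hypothetical strategies with the same expected excess payoff (1.0):
a = gain_loss_ratio([-1.0, 3.0], [0.5, 0.5])   # shallow downside
b = gain_loss_ratio([-9.0, 11.0], [0.5, 0.5])  # same mean, deep downside
print(a, b)  # a > b: the ratio penalizes downside risk only
```

Unlike the Sharpe ratio, which would treat both strategies through their first two moments, the gain–loss ratio discriminates against the strategy with the deeper downside.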

1.3 Origin of Risk and Dependence


1.3.1 The CAPM View
Our purpose is not to review the huge literature on the origin of risks and their
underlying mechanisms, but to suggest guidelines for further understanding.
For enticing introductions and synopses, we refer to the very readable books of
Bernstein [51, 52]. In [51], Bernstein reviews the history, since ancient times,
of those thinkers who showed how to quantify risk:
The capacity to manage risk, and with it the appetite to take risk
and make forward-looking choices, are key elements [...] that drive the
economic system forward.
The concept of risks in economics and finance is elaborated in [52], starting
with the origins of the Cowles foundation as the consequence of Cowles’s
personal interest in the question: Are stock prices predictable? In the words
of J.L. McCauley (see his customer review on www.amazon.com),
this book is all about heroes and heroic ideas, and Bernstein’s heroes
are Adam Smith, Bachelier, Cowles, Markowitz (and Roy), Sharpe,
Arrow and Debreu, Samuelson, Fama, Tobin, Samuelson, Markowitz,
Miller and Modigliani, Treynor, Samuelson, Osborne, Wells-Fargo
Bank (McQuown, Vertin, Fouse and the origin of index funds), Ross,
Black, Scholes, and Merton. The final heroes (see Chap. 14, The Ul-
timate Invention) are the inventors of (synthetic) portfolio insurance
(replication/synthetic options).
One of these achievements is the capital asset pricing model (CAPM),
which is probably still the most widely used approach to relative asset val-
uation, although its empirical roots have been found to be weaker in recent
years [59, 160, 223, 287, 306, 401]. Its major idea was that priced risk cannot
be diversified and cannot be eliminated through portfolio aggregation. This
asset valuation model describing the relationship between expected risk and
expected return for marketable assets is strongly entangled with the Mean-
Variance Portfolio Model of Markowitz. Indeed, both of them fundamentally
rely on the description of the probability distribution function (pdf) of as-
set returns in terms of Gaussian functions. The mean-variance description is
thus at the basis of the Markowitz portfolio theory and of the CAPM and its
inter-temporal generalization (see for instance [359]).
The CAPM is based on the concept of economic equilibrium between ra-
tional expectation agents. Economic equilibrium is itself the self-organized
result of complex nonlinear feedback processes between competitive inter-
acting agents. Thus, while not describing the specific dynamics of how self-
organization makes the economy converge to a stable regime [10, 18, 280], the
concept of economic equilibrium describes the resulting state of this dynamic
self-organization and embodies all the hidden and complex interactions be-
tween agents with infinite loops of recurrence. This provides a reference base
for understanding risks.
We put some emphasis on the CAPM and its generalized versions because
the CAPM is a remarkable starting point for answering the question on the
origin of risks and returns: in economic equilibrium theory, the two are con-
ceived as intrinsically entangled. In the following, we expand on this class of
explanation before exploring briefly other directions.
Let us now show how an equilibrium model generalizing the original CAPM
[308, 364, 429] can be formulated on the basis of the coherence measures

adapted to large risks. This provides an “explanation” for risks from the
point of view of the non-separable interplay between agents’ preferences and
their collective organization. We should stress that many generalizations have
already been proposed to account for the fat-tailness of the assets return dis-
tributions, which led to the multimoments CAPM. For instance, Rubinstein
[421], Krauss and Litzenberger [278], Lim [306] and Harvey and Siddique [223]
have underlined and tested the role of the asymmetry in the risk premium
by accounting for the skewness of the distribution of returns. More recently,
Fang and Lai [162] and Hwang and Satchell [241] have introduced a four-
moments CAPM to take into account the leptokurtic behavior of the assets
return distributions. Many other extensions have been presented such as the
VaR-CAPM [3] or the Distributional-CAPM [389]. All these generalizations
become more complicated but unfortunately do not necessarily provide more
accurate predictions of the expected returns.
Let us assume that the relevant risk measure is given by any measure of
fluctuations previously presented that obey the Axioms 6–8 of Sect. 1.2.2.
We will also relax the usual assumption of a homogeneous market to give
to the economic agents the choice of their own risk measure: some of them
may choose a risk measure which puts the emphasis on the small fluctuations,
while others may prefer those which account for the larger ones. In such an
heterogeneous market, we will recall how an equilibrium can still be reached
and why the excess returns of individual stocks remain proportional to the
market excess return, which is the fundamental tenet of CAPM.
For this, we need the following assumptions about the market:
• H1: We consider a one-period market, such that all the positions held at
the beginning of a period are cleared at the end of the same period.
• H2: The market is perfect, i.e., there are no transaction costs or taxes,
the market is efficient and the investors can lend and borrow at the same
risk-free rate µ0 .
Of course, these standard assumptions are to be taken with a grain of salt
and are made only with the goal of obtaining a normative reference theory.
We will now add another assumption that specifies the behavior of the agents
acting on the market, which will lead us to make the distinction between
homogeneous and heterogeneous markets.

Equilibrium in a Homogeneous Market

The market is said to be homogeneous if all the agents acting on this market
aim at fulfilling the same objective. This means that:
• H3-1: All the agents want to maximize the expected return of their port-
folio at the end of the period under a given constraint of measured risk,
using the same measure of risks ρζ for all of them (the subscript ζ refers
to the degree of homogeneity of the risk measure, see Sect. 1.2).

In the special case where ρζ denotes the variance, all the agents follow a
Markowitz’s optimization procedure, which leads to the CAPM equilibrium,
as proved by Sharpe [429]. When ρζ represents the centered moments, this
leads to the market equilibrium described in [421]. Thus, this approach allows
for a generalization of the most popular asset pricing in equilibrium market
models.
When all the agents have the same risk function ρζ , whatever ζ may be,
we can assert that they have all a fraction of their capital invested in the same
portfolio Π (see, for instance [333] for the derivation of the composition of
the portfolio), and the remaining in the risk-free asset. The amount of capital
invested in the risky fund only depends on their risk aversion and/or on the
legal margin requirement they have to fulfill.
Let us now assume that the market is at equilibrium, i.e., supply equals
demand. In such a case, since the optimal portfolios can be any linear combi-
nations of the risk-free asset and of the risky portfolio Π, it is straightforward
to show that the market portfolio, made of all traded assets in proportion
of their market capitalization, is nothing but the risky portfolio Π. Thus, as
shown in [333], we can state that, whatever the risk measure ρζ chosen by
the agents to perform their optimization, the excess return of any asset i over
the risk-free interest rate (µ(i) − µ0 ) is proportional to the excess return of
the market portfolio Π over the risk-free interest rate:

µ(i) − µ0 = βζi · (µΠ − µ0 ), (1.28)

where

\[
\beta^i_\zeta = \left. \frac{\partial \ln \rho_\zeta^{1/\zeta}}{\partial w_i} \right|_{w_1^*,\dots,w_N^*} , \tag{1.29}
\]

where w1∗, . . . , wN∗ are the optimal allocations of the assets in the following sense:

\[
\begin{cases}
\inf_{\{w_i \in [0,1]\}} \rho_\zeta(\{w_i\}) \\
\sum_{i \ge 0} w_i = 1 \\
\sum_{i \ge 0} w_i \, \mu(i) = \mu ,
\end{cases} \tag{1.30}
\]

In other words, the set of normalized weights wi∗ defines the portfolio with minimum risk as measured by any convex5 measure ρζ of risk obeying Axioms 6–8
of Sect. 1.2.2 for a given amount of expected return µ.
When ρζ denotes the variance, we recover the usual β i given by the mean-variance approach:

\[
\beta^i = \frac{\mathrm{Cov}(X_i, \Pi)}{\mathrm{Var}(\Pi)} . \tag{1.31}
\]
5 Convexity is necessary to ensure the existence and the uniqueness of a minimum.

Thus, the relations (1.28) and (1.29) generalize the usual CAPM formula,
showing that the specific choice of the risk measure is not very important,
as long as it follows the Axioms 6–8 characterizing the fluctuations of the
distribution of asset returns.
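As a numerical consistency check of (1.29) and (1.31), one can verify that, when ρζ is the portfolio variance (ζ = 2), the beta obtained by differentiating ln ρζ^(1/ζ) coincides with the usual covariance-over-variance expression. The sketch below uses an arbitrary illustrative covariance matrix and weight vector, not data from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative positive definite covariance matrix and portfolio weights
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
w = np.array([0.1, 0.2, 0.3, 0.4])

def rho(weights):
    """Risk measure rho_2: the portfolio variance."""
    return weights @ Sigma @ weights

def beta_from_gradient(i, h=1e-7):
    """Beta in the sense of (1.29): d ln(rho^(1/2)) / d w_i, by central finite differences."""
    wp, wm = w.copy(), w.copy()
    wp[i] += h
    wm[i] -= h
    return (np.log(np.sqrt(rho(wp))) - np.log(np.sqrt(rho(wm)))) / (2 * h)

# Beta in the sense of (1.31): Cov(X_i, Pi) / Var(Pi) = (Sigma w)_i / (w' Sigma w)
beta_classic = (Sigma @ w) / (w @ Sigma @ w)
beta_gradient = np.array([beta_from_gradient(i) for i in range(4)])
print(beta_gradient)   # matches beta_classic
```

The identity holds at any weight vector, since ∂ ln(w'Σw)^(1/2)/∂w_i = (Σw)_i/(w'Σw), which is exactly Cov(X_i, Π)/Var(Π) for the portfolio Π with weights w.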

Equilibrium in a Heterogeneous Market

Does this result hold in the more realistic situation of an heterogeneous mar-
ket? A market will be said to be heterogeneous if the agents seek to fulfill
different objectives. We thus consider the following assumption:
• H3-2: There exist N agents. Each agent n is characterized by her choice of
a risk measure ρζ (n) so that she invests only in the mean-ρζ (n) efficient
portfolios.
According to this hypothesis, an agent n invests a fraction of her wealth in
the risk-free asset and the remaining in Πn , the mean-ρζ (n) efficient portfolio,
only made of risky assets. Again, the fraction of wealth invested in the risky
fund depends on the risk aversion of each agent, which may vary from one
agent to another.
The composition of the market portfolio Π for such a heterogeneous mar-
ket is found to be nothing but the weighted sum of the mean-ρζ (n) optimal
portfolio Πn [333]:


\[
\Pi = \sum_{n=1}^{N} \gamma_n \Pi_n , \tag{1.32}
\]

where γn is the fraction of the total wealth invested in the fund Πn by the
nth agent.
Moreover, for every asset i and for any mean-ρζ (n) efficient portfolio Πn ,
for all n, the following equation holds

µ(i) − µ0 = βni · (µΠn − µ0 ) , (1.33)

where βni is defined in (1.29). Multiplying these equations by γn /βni , we get


\[
\frac{\gamma_n}{\beta_n^i} \cdot (\mu(i) - \mu_0) = \gamma_n \cdot (\mu_{\Pi_n} - \mu_0) , \tag{1.34}
\]

for all n, and summing over the different agents, we obtain


 
\[
\left( \sum_n \frac{\gamma_n}{\beta_n^i} \right) \cdot (\mu(i) - \mu_0) = \sum_n \gamma_n \cdot (\mu_{\Pi_n} - \mu_0) , \tag{1.35}
\]

so that

µ(i) − µ0 = β i · (µΠ − µ0 ) , (1.36)



with
\[
\beta^i = \left( \sum_n \frac{\gamma_n}{\beta_n^i} \right)^{-1} . \tag{1.37}
\]

This allows us to conclude that, even in a heterogeneous market, the expected excess return of each individual stock is directly proportional to the expected
excess return of the market portfolio, showing that the homogeneity of the
market is not required for observing a linear relationship between individual
excess asset returns and the market excess return.
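The algebra leading from (1.33) to (1.36) and (1.37) can be checked numerically. In this sketch, the risk-free rate, the asset's expected return, the efficient portfolio returns, and the wealth fractions γn are arbitrary illustrative values:

```python
import numpy as np

mu0 = 0.02            # risk-free rate
mu_i = 0.07           # expected return of asset i
mu_Pi_n = np.array([0.05, 0.08, 0.11])   # returns of three agents' efficient portfolios
gamma = np.array([0.5, 0.3, 0.2])        # wealth fractions, summing to 1

# Betas of asset i with respect to each portfolio Pi_n, from (1.33)
beta_n = (mu_i - mu0) / (mu_Pi_n - mu0)

# Market portfolio return, from (1.32)
mu_Pi = gamma @ mu_Pi_n

# Aggregated beta, from (1.37)
beta_i = 1.0 / np.sum(gamma / beta_n)

# The CAPM-like relation (1.36) then holds exactly
print(mu_i - mu0, beta_i * (mu_Pi - mu0))
```

The agreement is exact by construction: summing γn/βn over agents telescopes to (µΠ − µ0)/(µ(i) − µ0), which is the content of the passage from (1.34) to (1.36).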
The above calculations miss the possibility stressed by Rockafellar et al.
[408] that two kinds of efficient portfolios Πn may exist in a heterogeneous
market: long optimal portfolios which correspond to a net long position, and
short optimal portfolios which correspond to a net short position. If the exis-
tence of the second kind of portfolio is not compatible with an equilibrium in
a homogeneous market,6 their existence is not precluded in a heterogeneous
market. Indeed, the net short positions of a certain class of agents can be
compensated by the net long position of another class of agents. Thus, as long
as a market portfolio Π corresponding to an overall long position exists, an
equilibrium can be reached, and the results derived in this section still hold.

1.3.2 The Arbitrage Pricing Theory (APT) and the Fama–French Factor Model

The CAPM proposed a solution for what Roll [414] called


perhaps the most important unresolved problem in finance, because
it influences so many other problems, (which) is the relation between
risk and return. Almost everyone agrees that there should be some
relation, but its precise quantification has proven to be a conundrum
that has haunted us for years, embarrassed us in print, and caused
business practitioners to look askance at our scientific squabbling and
question our relevance.
Indeed, past and recent tests cast strong doubts on the validity of the CAPM.
The recent Fama–French analysis [160] shows basically no support for the
CAPM’s central result of a positive relation between expected return and
global market risk (quantified by the so-called beta parameter). In contrast,
other variables, such as market capitalization and the book-to-market ratio,7
present some weak explanatory power.
6 An equilibrium cannot be reached if all investors want to sell stocks.
7 Ratio of the book value of a firm to its market value. Typically, the book-to-market ratio is used to identify undervalued companies: if the ratio is less than one, the stock is overvalued; otherwise, it is undervalued.

The Arbitrage Pricing Theory (APT)

The empirical inadequacy of the CAPM has led to the development of more
general models of risk and return, such as Ross’s Arbitrage Pricing Theory
(APT) [418]. Quoting Sargent [427],
Ross posited a particular statistical process for asset returns, then de-
rived the restrictions on the process that are implied by the hypothesis
that there exist no arbitrage possibilities.
Like the CAPM, the APT assumes that only non-diversifiable risk is priced.
But it differs from the CAPM by accounting for multiple causes of such risks
and by assuming a sufficiently large number of such factors so that almost
riskless portfolios can be constructed. Reisman recently presented a general-
ization of the APT showing that, under the assumption that there exists no
asymptotic arbitrage (i.e., in the limit of a large number of factors, the market
risk can be decreased to almost zero), there exists an approximate multi-beta
pricing relationship relative to any admissible proxy of dimension equal to the
number of factors [402]. Unlike the CAPM which specifies returns as a linear
function of only systematic risk, the APT is based on the well-known obser-
vations that multiple factors affect the observed time series of returns, such as
industry factors, interest rates, exchange rates, real output, the money sup-
ply, aggregate consumption, investor confidence, oil prices, and many other
variables [414]. However, while observed asset prices respond to a wide variety
of factors, there is much weaker evidence that equities with larger sensitivity
to some factors give higher returns, as the APT requires.

The Fama–French Three Factor Model

This empirical weakness in the APT has led to further generalizations of factor models, such as the Fama–French three-factor model [160], which does
not use an arbitrage condition anymore. Fama and French started with the
observation that two classes of stocks show better returns than the average
market: (1) stocks with small market capitalization (“small caps”) and (2)
stocks with a high book-value-to-price ratio (often “value” stocks as opposed
to “growth” stocks). They added the overall market return to obtain the
three factors: (i) the overall market return (Rm), (ii) the performance of small
stocks relative to big stocks (SMB, small minus big), and (iii) the performance
of value stocks relative to growth stocks (HML, high minus low). See the
website of Professor K.R. French8 which updates every quarter the benchmark
factors and also presents the performance of several benchmark portfolios
using different combinations of weights on the three factors. An important
observation must be made concerning Fama and French’s approach to risk in
8 http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

their factor decomposition: they still see, as in the CAPM and APT, a large
return as a reward for taking a high risk. For instance, suppose that returns are
found to increase with book/price. Then those stocks with a high book/price
ratio must be more risky than average. This is in a sense the opposite to the
traditional interpretation of a financial professional analyst, who would say
that high book/price indicates a buying opportunity because the stock looks
cheap. In contrast, according to the efficient market theory, a stock, which is
cheap, can only be so because investors think it is risky.
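To make the three-factor decomposition concrete, here is a hedged sketch of estimating the loadings on the Rm, SMB, and HML factors by ordinary least squares. The factor series below are synthetic stand-ins chosen for illustration; in practice they would be downloaded from K.R. French's data library:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1000   # number of periods

# Synthetic stand-ins for the Rm (market), SMB and HML factor series
factors = rng.normal(0.0, 0.01, size=(T, 3))

# Synthetic excess returns of one stock with known factor loadings
true_loadings = np.array([1.1, 0.4, -0.3])
excess = factors @ true_loadings + rng.normal(0.0, 0.005, size=T)

# OLS regression: excess_t = alpha + b*Rm_t + s*SMB_t + h*HML_t + eps_t
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha, loadings = coef[0], coef[1:]
print("estimated loadings:", loadings)   # close to the true loadings
```

A statistically significant non-zero alpha in such a regression is precisely what the three-factor model interprets as performance unexplained by exposure to the three risk factors.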
Actually, the relationship between return and risk is not automatically
positive. Diether et al. [124] have recently documented that firms with more
uncertain earnings (as measured by the dispersion of analysts’ forecasts) have
smaller stock returns. As stressed by Johnson [255], this finding is important
because it directly links asset returns with information, but the relation is
apparently in contradiction with standard economic wisdom: the larger the
risks, the smaller the return! Actually, Johnson proposes a simple explanation
reconciling this new anomaly with the standard asset pricing theory, which is
based on the following ingredients: (i) the equity value of the leveraged firm
(i.e., with non-zero debt) is equivalent to a call option on the firm’s value,
following Merton’s model of credit risk [358]; (ii) the dispersion of analysts’
forecasts is a measure of idiosyncratic risk, which is not priced. Then, by
the Black–Merton–Scholes price for the equity-call option, the firm expected
excess return (i.e., relative variation of the equity price) has its risk premium
amplified by a factor reflecting the effective exposure of the equity price to
the real firm value. This factor turns out to decrease with increasing volatility,
because more unpriced risk raises the option value, which has the consequence
of lowering its exposure to priced risks. It is important to stress that this
effect increases with the firm leverage and vanishes if the firm has no debt,
as verified empirically with impressive strength in [255]. This new anomaly is
thus fundamentally due to the impact of the volatility in the option pricing
of the firm equity value in the presence of debt, together with the existence
of a non-priced component of volatility.

1.3.3 The Efficient Market Hypothesis

The efficient market hypothesis (EMH) has a long history in finance and offers
a mechanism for the trade-off between risk and return [158, 159]. Similarly to
the concept of economic equilibrium, it must be understood as the result of
repetitive feedback interactions between investors, and thus provides a top–
down answer to the question on the origin of risk and return.

Origin of Possible Efficiency of Stock Markets

Roll uses an illuminating biological analogy to explain the principle leading to the EMH, in terms of the model of the hawks and the doves [414], which has
been introduced to illustrate the concept of an evolutionary stable equilibrium

(see also [118] for a seminal presentation of the concept of an evolutionary stable equilibrium in the genetic and biological context):
Biologists note that competition for food results in a stable evolution-
ary equilibrium characterized by multiple strategies. When competi-
tors meet at a food site, they can either fight over the prize and risk
injury – the “hawk” strategy – or withdraw and lose the food – the
“dove” strategy. If every individual fights, a mutant who withdraws
would eventually have a greater probability of procreating than the
average fighter because of the risk of injury and the fact that only one
fighter can win. (The dove occasionally finds uncontested food.) On
the other hand, if every individual followed the dove strategy, a single
fighter would gain a lot of food. The evolutionary equilibrium can be
shown to involve either (a) part of the population always follows the
hawk strategy and the complementary part follows the dove strategy
or (b) every individual follows a randomized strategy, sometimes be-
having as a hawk and sometimes as a dove. We can definitely rule out
a world in which everyone follows the same fixed strategy.
The analogy to market efficiency is immediate: investors compete for
the most “undervalued” asset. The hawk strategy is conducting se-
curity analysis. The dove strategy is passive investing: expending no
effort on information analysis. Clearly, if everyone analyses securities,
the benefits will be less than the costs. If everyone is passive, the
benefits of analysis will be tremendous. The equilibrium is that some
analyze, some don’t. Does it sound familiar? Note that the final equi-
librium is characterized by a situation in which it is not worthwhile for
the marginal passive investor to begin analyzing nor for the marginal
active investor to cease conducting security analysis.
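The mixed equilibrium that Roll alludes to can be computed explicitly in the standard hawk–dove payoff matrix from evolutionary game theory (the resource value V and injury cost C below are illustrative choices, not from Roll's text): at the evolutionary stable equilibrium, the hawk strategy is played with probability p* = V/C.

```python
import numpy as np

V, C = 2.0, 5.0   # illustrative resource value and injury cost, with C > V

# Standard hawk-dove payoff matrix (row player's expected payoff)
payoff = np.array([[(V - C) / 2, V],       # hawk vs (hawk, dove)
                   [0.0,         V / 2]])  # dove vs (hawk, dove)

def fitness(strategy, p_hawk):
    """Expected payoff of a pure strategy in a population playing hawk with prob p_hawk."""
    return payoff[strategy] @ np.array([p_hawk, 1.0 - p_hawk])

p_star = V / C   # mixed evolutionary stable equilibrium

# At p*, hawks and doves fare equally well; away from p*, the rarer strategy does better
assert np.isclose(fitness(0, p_star), fitness(1, p_star))
assert fitness(0, p_star - 0.1) > fitness(1, p_star - 0.1)   # too few hawks
assert fitness(1, p_star + 0.1) > fitness(0, p_star + 0.1)   # too many hawks
```

The last two inequalities are the stability property emphasized in the quote: it pays neither the marginal passive investor to start analyzing nor the marginal active investor to stop.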
The EMH is an idealization of a self-consistent dynamical state of the
market resulting from the incessant actions of the traders (arbitragers). It
is not the out-of-fashion equilibrium approximation sometimes described but
rather embodies a very subtle cooperative organization of the market. A grow-
ing number of academic studies and many practitioners have questioned the
EMH on the basis of the non-rationality of individuals. Studies in psychology
and behavioral sciences show indeed that people cannot be represented faith-
fully by the Von Neumann/Morgenstern axioms of expected utility, especially
in their limited intelligence, partial memory of the past and finite processing
abilities, in their overconfidence and their biased assessments of probabilities
[469]. However, interestingly, there are many works that demonstrate that
“zero-intelligence” agents (to use the term of Farmer et al. [166]), who are
very inefficient individually, often collectively provide efficient solutions. In-
deed, the relevant question for understanding stock markets is not so much
to focus on these irrationalities but rather to study how they aggregate in
the complex, long-lasting, repetitive, and subtle environment of the market.
This extension requires abandoning the emphasis on the description of the individual in favor of the search for emerging collective behaviors. Three fields
of research highlight this idea and suggest a reconciliation, while enlarging
significantly the perspective of the EMH.

Collective Phenomena in Statistical Physics

In statistical physics, the fight between order (through the interaction between
elementary constituents of matter) and disorder (modeled by thermal fluctuations) gives rise to the occurrence of “spontaneous symmetry breaking,” also called phase transitions in this context [451].
ing of the large-scale organization as well as the sudden macroscopic changes
of organization due to small variations of a control parameter has led to power-
ful concepts such as “emergence” [9]: the macroscopic organization has many
properties not shared by its constituents. For the market, this suggests that
its overall properties can only be understood through the study of the trans-
formation from the microscopic level of individual agents to the macroscopic
level of the global market. In statistical physics, this can often be performed
by the very powerful tool called the “renormalization group” [490, 489].

Collective Phenomena in Biological Systems

Biology has clearly demonstrated that an organism has greater abilities than
its constituent parts. This is true for multiorgan animals as well as for insect
societies for instance (see E. O. Wilson’s book [488]). More recently, this has
led to the concept of “swarm intelligence” [67, 68, 70, 135]: the collective be-
haviors of (unsophisticated) agents interacting locally with their environment
may cause coherent functional global patterns to emerge. Swarm intelligence
is being used to obtain collective (or distributed) problem solving without cen-
tralized control or the provision of a global model in many practical industrial
applications [69]. The importance of evolution, competition, and ecologies to
understand stock markets has been stressed by Farmer [164].

Collective Phenomena in Agent-Based Models

Agent-based models (also called multi-agent games) are composed of collections of synthetic, autonomous, interacting entities. They are used to explore
how structure and interactions control the emergence of macroscopic behav-
iors [24]. Ultimately, the goal is to produce faithful synthetic models of reality
by capturing the salient structure and strategies of real agents. The so-called
minority game is perhaps the simplest in the class of multi-agent games of
interacting inductive agents with limited abilities competing for scarce re-
sources. Many published works on the minority game have motivated their study
by their relevance to financial markets, because investors exhibit a large het-
erogeneity of investment strategies, investment horizons, risk aversions and

wealth, and have limited resources and time to dedicate to novel strategies,
and the minority mechanism is found in markets. For an introduction to the
Minority Game see [92, 251] and the Web page on the Minority Game by D.
Challet at www.unifr.ch/econophysics/minority/minority.html. An important
outcome of this work is the discovery of different market regimes, depending
on the value of a control parameter, roughly defined as the ratio of the num-
ber of effective strategies available to agents divided by the total number of
agents. In the minority game, agents choose their strategies according to the
condition of the market so as on average to minimize their chance of being
in the majority. When the “control” parameter is large, the recent history of
the game contains some useful information that strategies can exploit and the
market is not efficient. Below a critical value of the control parameter (i.e.,
for sufficiently many agents), reasonable measures of predictability suggest
that the market is efficient and cannot be predicted. These two phases are
characterized by different risks, which can be quantified as a function of the
control parameter. However, even in the “efficient market” phase, large and
extreme price moves occur, which may be preceded by distinct patterns that
allow agents in some cases to forecast them [289, 7].
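A minimal version of the minority game described above can be sketched in a few lines. This is a standard bare-bones implementation, not any of the specific variants cited; the parameter values are illustrative, and the control parameter discussed in the text is roughly α = 2^m/N:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, S, T = 101, 3, 2, 2000   # odd number of agents, memory, strategies per agent, steps

P = 2 ** m                                        # number of distinct histories
strategies = rng.choice([-1, 1], size=(N, S, P))  # each strategy maps a history to an action
scores = np.zeros((N, S))                         # virtual points of each strategy
history = 0                                       # current history, encoded as an integer
attendance = []

for t in range(T):
    best = scores.argmax(axis=1)                  # each agent plays her best-scoring strategy
    actions = strategies[np.arange(N), best, history]
    A = actions.sum()                             # aggregate action ("attendance")
    attendance.append(A)
    winning = -np.sign(A)                         # the minority side wins (N odd, so A != 0)
    scores += strategies[:, :, history] * winning # reward strategies that chose the minority
    history = ((history << 1) | int(winning > 0)) % P

sigma2_per_agent = np.var(attendance) / N         # standard volatility measure of the game
print(f"sigma^2/N = {sigma2_per_agent:.2f}")
```

Sweeping m (and hence α = 2^m/N) while monitoring σ²/N reproduces the transition between the predictable and the "efficient" regimes discussed above.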

Self-Organization During Bubbles and Crashes

A particular type of organization which requires special mention in this book is found in the occurrence of crashes. Market crashes exemplify in a dramatic
way the spontaneous emergence of extreme events in self-organizing systems.
Stock market crashes are indeed remarkable vehicles of important ideas needed
to deal and cope with our risky world, as explained in [450]. By studying the
frequency distribution of drawdowns, or runs of successive losses, Johansen
and Sornette have shown that large financial crashes are “outliers” [249]: they
form a class of their own which is characterized by its specific statistical signa-
tures. An important consequence derives from this property: if large financial
crashes are “outliers,” they are special and thus require a special explanation,
a specific model, a theory of their own. In addition, their special properties
may perhaps be used for their prediction. The main mechanism at work in
bubbles and then in their destabilization during crashes is the existence of pos-
itive feedbacks, i.e., self-reinforcement. Positive feedbacks have many sources
both technical and behavioral, a dominant one being imitative behavior and
herding between investors [450], which has been associated with behavioral
“irrational exuberance” [438]. Positive feedbacks provide the fuel for the de-
velopment of speculative bubbles, preparing the instability for a major crash.
The understanding of financial bubbles and crashes requires a synthesis be-
tween the theory of collective behavior combined with the economic theory
of anticipating agents who can change the future by their forecasts and the
actions based on them. During a time of market instabilities, the tools of eco-
nomic and financial theory break down; for instance, the idea of portfolio in-
surance breaks down as no portfolio can be perfectly insured against extreme

deviations, especially those that occurred in October 1987 and wiped out
confidence in the methods of so-called portfolio insurance of Leland-O’Brien-Rubinstein Associates. Similarly, the assumptions of near-normal distributions and stable covariance broke down during the failure of LTCM (Long-Term Capital Management) in October 1998 [394].
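The drawdowns studied by Johansen and Sornette, understood as runs of successive losses, are straightforward to extract from a return series. A minimal sketch (the return series below is purely illustrative):

```python
import numpy as np

def drawdowns(returns):
    """Cumulative losses over runs of successive negative returns."""
    runs, current = [], 0.0
    for r in returns:
        if r < 0:
            current += r          # extend the current run of losses
        elif current < 0:
            runs.append(current)  # the run just ended
            current = 0.0
    if current < 0:               # close a run that reaches the end of the series
        runs.append(current)
    return np.array(runs)

# Illustrative daily return series
rets = np.array([0.01, -0.02, -0.03, 0.015, -0.01, 0.02, -0.005, -0.01, -0.02])
dd = drawdowns(rets)
print(dd)   # three runs of successive losses
```

Ranking such drawdowns by magnitude and comparing their frequency distribution with an extrapolation of the bulk is the test by which the largest crashes appear as "outliers."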

1.3.4 Emergence of Dependence Structures in the Stock Markets

Factors and Large Eigenvalues of Correlation Matrices

As mentioned above, factor models are nowadays the approaches most often
used for extracting regularities in and for explaining the vagaries of stock mar-
ket prices. Factor models conceptually derive from and generalize the CAPM
and APT models. Factors, which are often invoked to explain prices, are the
overall market factor and the factors related to firm size, firm industry and
book-to-market equity, thought to embody most of the relevant dependence
structure between the studied time series [160, 161]. Indeed, there is no doubt
that observed equity prices respond to a wide variety of unanticipated fac-
tors, but there is much weaker evidence that expected returns are higher for
equities that are more sensitive to these factors, as required by Markowitz’s
mean-variance theory, by the CAPM and the APT [414]. This severe failure
of the most fundamental finance theories could conceivably be attributed to
an inappropriate proxy for the market portfolio, but nobody has been able
to show that this is really the correct explanation. This remark constitutes
the crux of the problem: the factors invoked to model the cross-sectional de-
pendence between assets are not known in general and are either postulated
based on the economic intuition in financial studies, or obtained as black-box
results in the recent analyses applying random matrix theory to large financial covariance matrices [392, 288]. In other words, explanatory factors emerge endogenously.
Here, we follow [337] to show that the existence of factors has a natural
bottom-up explanation: they can be seen to result from a collective effect of
the assets, similar to the emergence of a macroscopic self-organization of in-
teracting microscopic constituents. To show this, we unravel the origin of the
large eigenvalues of large covariance and correlation matrices and provide a
complete understanding of the coexistence of features resembling properties
of random matrices and of large “anomalous” eigenvalues. The main insight
here is that, in any large system possessing non-vanishing average correlations
between a finite fraction of all pairs of elements, a self-organized macroscopic
state generically exists. In other words, “explanatory” factors emerge endogenously.

Derivation of the Largest Eigenvalues

Let us first consider a large basket of N assets with correlation matrix C in which every non-diagonal pair of elements exhibits the same correlation coefficient Cij = ρ for i ≠ j and Cii = 1. Its eigenvalues are

λ1 = 1 + (N − 1)ρ and λi≥2 = 1 − ρ (1.38)

with multiplicity N − 1 and with ρ ∈ (0, 1) in order for the correlation ma-
trix to remain positive definite. Thus, in the large size limit N → ∞, even
for a weak positive correlation ρ → 0 (with ρN ≫ 1), a very large eigenvalue appears, associated with the “delocalized” (i.e., uniformly spread over all components) eigenvector v1 = (1/√N)(1, 1, · · · , 1), which dominates completely the correlation structure of the system. This trivial example stresses
that the key point for the emergence of a large eigenvalue is not the strength
of the correlations, provided that they do not vanish, but the large size N of
the system.
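The result (1.38) is easy to reproduce numerically for an equicorrelated matrix (N and ρ below are arbitrary illustrative values):

```python
import numpy as np

N, rho = 500, 0.05
C = np.full((N, N), rho)
np.fill_diagonal(C, 1.0)

eigvals = np.linalg.eigvalsh(C)   # sorted in ascending order
lam_max = eigvals[-1]

# (1.38): one large eigenvalue 1 + (N-1)*rho, all the others equal to 1 - rho
print(lam_max, 1 + (N - 1) * rho)

# The associated eigenvector is delocalized: (1/sqrt(N))(1, ..., 1)
_, vecs = np.linalg.eigh(C)
v1 = vecs[:, -1]
print(np.abs(v1).min(), np.abs(v1).max())   # all components equal to 1/sqrt(N)
```

Even with the weak correlation ρ = 0.05, the largest eigenvalue (here 25.95) towers over the degenerate bulk at 1 − ρ, illustrating that the effect comes from the system size N rather than from the strength of the correlations.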
This result (1.38) still holds qualitatively when the correlation coefficients
are all distinct. To see this, it is convenient to use a perturbation approach.
We thus add a small random component to each correlation coefficient:

\[
C_{ij} = \rho + \epsilon \cdot a_{ij} \quad \text{for } i \neq j , \tag{1.39}
\]

where the coefficients aij = aji have zero mean, variance σ 2 and are inde-
pendently distributed (there are additional constraints on the support of the
distribution of the aij ’s in order for the matrix Cij to remain positive definite
with probability one). The determination of the eigenvalues and eigenvectors of Cij is performed using perturbation theory up to second order in ε. We find that the largest eigenvalue satisfies

\[
\mathrm{E}[\lambda_1] = (N-1)\rho + 1 + \frac{(N-1)(N-2)}{N^2} \cdot \frac{\epsilon^2 \sigma^2}{\rho} + O(\epsilon^3) \tag{1.40}
\]
while, at the same order, the corresponding eigenvector v1 remains unchanged.
The degeneracy of the eigenvalue λ = 1 − ρ is broken and leads to a complex
set of smaller eigenvalues described below.
In fact, this result (1.40) can be generalized to the non-perturbative do-
main of any correlation matrix with independent random coefficients Cij ,
provided that they have the same mean value ρ and variance σ 2 . Indeed, in
such a case, the expectations of the largest and second largest eigenvalues are
[180]
\[
\mathrm{E}[\lambda_1] = (N-1) \cdot \rho + 1 + \sigma^2/\rho + o(1) , \tag{1.41}
\]
\[
\mathrm{E}[\lambda_2] \le 2\sigma\sqrt{N} + O(N^{1/3} \log N) . \tag{1.42}
\]
Moreover, the statistical fluctuations of these two largest eigenvalues are asymptotically (for large fluctuations t > O(√N)) bounded by a Gaussian term according to the following large deviation result

\[
\Pr\{|\lambda_{1,2} - \mathrm{E}[\lambda_{1,2}]| \ge t\} \le e^{-c_{1,2} t^2} , \tag{1.43}
\]

for some positive constant c1,2 [279]. Numerical simulations of the distribution
of eigenvalues of a random correlation matrix confirm that the largest eigenvalue is indeed proportional to N , while the bulk of the eigenvalues are
much smaller and are described by a modified semicircle law [357] centered
on λ = 1 − ρ, in the limit of large N .
This result is very different from that obtained when the mean value ρ
vanishes. In such a case, the distribution of eigenvalues of the random matrix
C is given by the semicircle law [357]. However, due to the presence of the ones
on the main diagonal of the correlation matrix C, the center of the circle is not
at the origin but at the point λ = 1. Thus, the distribution of the eigenvalues
of random correlation matrices with zero mean correlation coefficients is a
semicircle of radius 2σ√N centered at λ = 1.
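These predictions are easy to check numerically. The sketch below (parameter values are purely illustrative, and numpy is assumed available) draws a random correlation matrix with mean off-diagonal correlation ρ and small dispersion σ, then compares its largest eigenvalue with the prediction (1.41) and the bulk with the center 1 − ρ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: N assets, mean off-diagonal correlation rho,
# dispersion sigma kept small so that the matrix stays positive definite
# with overwhelming probability.
N, rho, sigma = 406, 0.14, 0.017

U = np.triu(rng.normal(rho, sigma, size=(N, N)), k=1)
C = U + U.T                 # symmetric off-diagonal part with sd sigma
np.fill_diagonal(C, 1.0)    # unit diagonal: C is a correlation matrix

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
lam1_pred = (N - 1) * rho + 1 + sigma**2 / rho    # prediction (1.41)

print(eig[0], lam1_pred)          # largest eigenvalue, of order rho*N
print(eig[1:].mean(), 1 - rho)    # bulk centered near 1 - rho
```

The dominance of the largest eigenvalue comes entirely from the nonzero mean ρ, in line with the discussion above: rerunning the sketch with ρ close to zero makes it merge back into the semicircle bulk.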
The result (1.41) is deeply related to the so-called friendship theorem
in mathematical graph theory, which states that, in any finite graph such
that any two vertices have exactly one common neighbor, there is one and
only one vertex adjacent to all other vertices [155]. A more heuristic but
equivalent statement is that, in a group of people such that any pair of persons
have exactly one common friend, there is always one person (the “politician”)
who is the friend of everybody. Consider the matrix C with its non-diagonal
entries Cij (i ≠ j) equal to Bernoulli random variables with parameter ρ, that
is, Pr[Cij = 1] = ρ and Pr[Cij = 0] = 1 − ρ. Then, the matrix Cij − I,
where I is the unit matrix, becomes nothing but the adjacency matrix of the
random graph G(N, ρ) [279]. The proof in [155] of the “friendship theorem”
indeed relies on the N -dependence of the largest eigenvalue and on the √N -
dependence of the second largest eigenvalue of Cij as given by (1.41) and
(1.42).
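As a sanity check of this connection, one can draw the Bernoulli matrix just described and compare its two largest eigenvalues with (1.41) and (1.42). A minimal sketch (N and ρ are arbitrary illustrative values; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

N, rho = 500, 0.2
# Off-diagonal entries are Bernoulli(rho): mean rho, variance rho*(1-rho);
# C - I is then the adjacency matrix of the random graph G(N, rho).
U = np.triu((rng.random((N, N)) < rho).astype(float), k=1)
C = U + U.T + np.eye(N)

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
var = rho * (1 - rho)

lam1_pred = (N - 1) * rho + 1 + var / rho   # (1.41): grows like N
lam2_scale = 2 * np.sqrt(var * N)           # (1.42): grows like sqrt(N)
print(eig[0], lam1_pred)
print(eig[1], lam2_scale)
```

The gap between an O(N) top eigenvalue and an O(√N) second eigenvalue is exactly the spectral separation the friendship-theorem proof exploits.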
Figure 1.1 shows the distribution of eigenvalues of a random correlation
matrix. The inset shows the largest eigenvalue lying at the predicted size
ρN = 56.8, while the bulk of the eigenvalues are much smaller and are de-
scribed by a modified semicircle law centered on λ = 1 − ρ, in the limit of
large N . This result, on the largest eigenvalue emerging from the collective
effect of the cross-correlation between all N (N − 1)/2 pairs, provides a novel
perspective to the observation [40, 413] that the only reasonable explanation
for the simultaneous crash of 23 stock markets worldwide in October 1987
is the impact of a world market factor: according to the results (1.41) and
(1.42) and the view expounded by Fig. 1.1, the simultaneous occurrence of
significant correlations between the markets worldwide is bound to lead to
the existence of an extremely large eigenvalue, the world market factor con-
structed by... a linear combination of the 23 stock markets! What this result
shows is that invoking factors to explain the cross-sectional structure of stock
returns is cursed by the chicken-and-egg problem: factors exist because stocks
are correlated; stocks are correlated because of common factors impacting
them.

Fig. 1.1. Spectrum of eigenvalues of a random correlation matrix with average
correlation coefficient ρ = 0.14 and standard deviation of the correlation coefficients
σ = 0.345/√N : the ordinate is the number of eigenvalues in a bin with value given by
the abscissa. One observes that all eigenvalues except the largest one are smaller than
or equal to ≈ 1.5. The size N = 406 of the matrix is the same as in previous studies
[392] for the sake of comparison. The continuous curve is the theoretical translated
semicircle distribution of eigenvalues describing the bulk of the distribution, which
passes the Kolmogorov test. The center value λ = 1 − ρ ensures the conservation
of the trace equal to N . There is no adjustable parameter. The inset represents
the whole spectrum with the largest eigenvalue, whose size is in agreement with the
prediction ρN = 56.8. Reproduced from [337]

Generalization to a Segmented Market with Different Coupled Industries

Empirically [392, 288], a few other eigenvalues below the largest one have an
amplitude of the order of 5–10 that deviate significantly from the bulk of the
distribution. The above analysis provides a very simple mechanism for them,
justifying the postulated model in [373]. The solution consists in considering,
as a first approximation, the block diagonal matrix C′ with diagonal elements
made of the matrices A1 , · · · , Ap of sizes N1 , · · · , Np with Σi Ni = N ,
constructed according to (1.39) such that each matrix Ai has the average
correlation coefficient ρi . When the coefficients of the matrix C′ outside the
matrices Ai are zero, the spectrum of C′ is given by the union of all the spectra
of the Ai ’s, which are each dominated by a large eigenvalue λ1,i ≈ ρi · Ni .

The spectrum of C′ then exhibits p large eigenvalues. Each block Ai can be
interpreted as a sector of the economy, including all the companies belonging
to the same industrial branch, and the eigenvector associated with each largest
eigenvalue represents the main factor driving this sector of activity [343, 349].
For similar sector sizes Ni and average correlation coefficients ρi , the largest
eigenvalues are of the same order of magnitude. In addition, a very large
unique eigenvalue is obtained by introducing some coupling constants outside
the block diagonal matrices. A well-known result of perturbation theory
states that such coupling leads to a “repulsion” between the eigenvalues,
which can be observed in Fig. 1.2, where C′ has been constructed with three
block matrices A1 , A2 , and A3 and non-zero off-diagonal coupling described
in the figure caption. These values allow one to quantitatively replicate the empirical
finding of Laloux et al. in [392], where the first three eigenvalues are
approximately λ1 ≈ 57, λ2 ≈ 10, and λ3 ≈ 8.
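This mechanism can be reproduced with a deterministic toy version of the three-block matrix of Fig. 1.2 (block sizes and correlation levels taken from the figure caption; the absence of sampling noise is the only simplification; numpy assumed):

```python
import numpy as np

# Toy block-correlation matrix inspired by the caption of Fig. 1.2:
# three sectors with intra-sector mean correlation 0.18 and inter-sector
# coupling 0.1, unit diagonal.
sizes = [130, 140, 136]
N = sum(sizes)
C = np.full((N, N), 0.1)          # inter-sector coupling
pos = 0
for n in sizes:
    C[pos:pos + n, pos:pos + n] = 0.18   # intra-sector correlation
    pos += n
np.fill_diagonal(C, 1.0)

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
print(eig[:4])   # one very large eigenvalue, two intermediate ones, then the bulk
```

With the coupling switched off (replace 0.1 by 0), the three large eigenvalues are of comparable magnitude; switching it on produces the "repulsion" that detaches one market-wide eigenvalue from the two sector eigenvalues, while the bulk sits at 1 − 0.18 = 0.82.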


Fig. 1.2. Spectrum of eigenvalues estimated from the sample correlation matrix
of N = 406 time series of length T = 1309. The time series have been constructed
from a multivariate Gaussian distribution with a correlation matrix made of three
block-diagonal matrices of sizes respectively equal to 130, 140, and 136 and mean
correlation coefficients equal to 0.18 for all of them. The off-diagonal elements are
all equal to 0.1. The same results hold if the off-diagonal elements are random. The
inset shows the existence of three large eigenvalues, which result from the three-block
structure of the correlation matrix. Reproduced from [337]

Expressions (1.40) and (1.41) and numerical tests for a large variety of correlation
matrices show that the equally weighted eigenvector v1 = (1/√N )(1, 1, . . . , 1),
associated with the largest eigenvalue, is extremely robust and remains (on
average) the same for any large system. Thus, even for time-varying correlation
matrices, which is the result of heteroscedastic effects, the composition of
the main factor remains almost the same. This can be seen as a generalized
limit theorem reflecting the bottom-up organization of broadly correlated time
series.
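This robustness can also be checked directly: for a random correlation matrix drawn as in (1.39), the overlap between the top eigenvector and the equally weighted vector remains very close to one. A short numerical sketch (illustrative parameters; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

N, rho, sigma = 400, 0.14, 0.02
U = np.triu(rng.normal(rho, sigma, (N, N)), k=1)
C = U + U.T
np.fill_diagonal(C, 1.0)

w, V = np.linalg.eigh(C)      # eigh returns eigenvalues in ascending order
v1 = V[:, -1]                 # eigenvector of the largest eigenvalue
u = np.ones(N) / np.sqrt(N)   # equally weighted vector (1,...,1)/sqrt(N)
overlap = abs(v1 @ u)
print(overlap)                # stays very close to 1
```

Because the admixture of other modes is suppressed by the spectral gap of order Nρ between the top eigenvalue and the bulk, the overlap deviates from one only at order (σ/(Nρ))² per mode.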

1.3.5 Large Risks in Complex Systems

These calculations show that an endogenous small positive correlation between


all stock-pairs gives rise to large eigenvalues which can then be associated with
“market factors.” It seems that earlier research has promoted the reverse
view: existing market factors (stock indices, news agencies, etc.) introduce
an exogenous market impact which affects different stocks similarly, thereby
introducing positive correlation and thus large eigenvalues. This is clear from
the general formulation of (linear) factor models such as the CAPM, APT,
and Fama–French approaches, in which the returns of all stocks are regressed
against the same set of factors. Actually, we propose that the two chains of
cause and effect may be intrinsically coupled: the correlation structure between
stocks is a stable attractor of a self-organized dynamics with positive
and negative feedbacks in which factors exist because correlations exist, and
correlations exist because factors exist. It would suggest the development of
dynamical factor models, in which agents form anticipations on correlations
based on their calibration of the past behavior of the regression to factors,
in order to study the possible types of attractors (single or multiple equilib-
ria) in the correlation structure of stocks. This may cast new light on the
major unsolved problem stated in the introduction of this chapter concerning
the relationship between return and risks: perhaps, the concept of return as
the remuneration of risk which is so fundamental in financial theory should
be replaced by the concept of the emergence of the risk-return duality, in
which their relationship can be negative or positive, depending upon circum-
stances that remain to be worked out. Moreover, simulations of complex self-
organizing systems show that large fluctuations and extreme variations are
the rule rather than the exception.
The complex system approach, which involves seeing interconnections and
relationships, i.e., the whole picture as well as the component parts, is nowa-
days pervasive in modern control of engineering devices and business manage-
ment. A central property of a complex system is the possible occurrence of
coherent large-scale collective behaviors with a very rich structure, resulting
from the repeated non-linear interactions among its constituents: the whole
turns out to be much more than the sum of its parts. Most complex systems
around us do exhibit rare and sudden transitions that occur over time in-
tervals that are short compared with the characteristic time scales of their

posterior evolution. Such extreme events express more than anything else the
underlying forces usually hidden by almost perfect balance and thus pro-
vide the potential for a better scientific understanding of complex systems.
These crises have fundamental societal impacts and range from large nat-
ural catastrophes, catastrophic events of environmental degradation, to the
failure of engineering structures, crashes in the stock market, social unrest
leading to large-scale strikes and upheaval, economic drawdowns on national
and global scales, regional power blackouts, traffic gridlocks, diseases and epi-
demics, etc. An outstanding scientific question is how such large-scale patterns
of catastrophic nature might evolve from a series of interactions on the small-
est and increasingly larger scales. In complex systems, it has been found that
the organization of spatial and temporal correlations does not stem, in general,
from a nucleation phase diffusing across the system. It results rather from a
progressive and more global cooperative process occurring over the whole sys-
tem by repetitive interactions, which is partially described by the distributed
correlations at the origin of a large eigenvalue as described above. An instance
would be the many occurrences of simultaneous scientific and technical discov-
eries signaling the global nature of the maturing process. Recent developments
suggest that non-traditional approaches, based on the concepts and methods
of statistical and nonlinear physics coupled with ideas and tools from
computational intelligence could provide novel methods in complexity to direct the
numerical resolution of more realistic models and the identification of rele-
vant signatures of large and extreme risks. To address the challenge posed by
the identification and modeling of such outliers, the available theoretical tools
comprise in particular bifurcation and catastrophe theories, dynamical critical
phenomena and the renormalization group, nonlinear dynamical systems, and
the theory of partially (spontaneously or not) broken symmetries. This field
of research is presently very active and is expected to advance significantly
our understanding, quantification, and control of risks.
In the meantime, both practitioners and academics need reliable metrics
to characterize risks and dependences. This is the purpose of the following
chapters, which expose powerful models and measures of large risks and com-
plex dependences between time series.

Appendix

1.A Why Do Higher Moments Allow us to Assess Larger Risks?

As asserted in the main body of this chapter, the complete description of


the fluctuations of an asset or a portfolio at a fixed time scale is given by
the knowledge of the probability density function (pdf) of its return. The pdf
encompasses all the risk dimensions associated with this asset. Unfortunately,
it is impossible to classify or order the risks described by the entire pdf,
except in special cases where the concept of stochastic dominance applies.

Therefore, the whole pdf cannot provide an adequate measure of risk, which
should be embodied by a single variable. In order to perform a selection among
a basket of assets and construct optimal portfolios, one needs measures given
as real numbers, not functions, which can be ordered according to the natural
ordering of real numbers on the line.
In this vein, Markowitz [347] has proposed to summarize the risk of an
asset by the variance of its returns (or equivalently by the corresponding
standard deviation). It is clear that this description of risks is fully satisfying
only for assets with Gaussian pdfs. In any other case, the variance generally
provides a very poor estimate of the real risk. Indeed, it is a well-established
empirical fact that the pdfs of asset returns have fat tails (see Chap. 2),
so that the Gaussian approximation underestimates significantly the large
price movements frequently observed on stock markets (see Fig. 2.1). Conse-
quently, the variance cannot be taken as a suitable measure of risks, since it
only accounts for the smallest contributions to the fluctuations of the asset’s
returns.
The variance of the return X of an asset involves its second moment E[X²]
and, more precisely, is equal to its second centered moment (or moment about
the mean) E[(X − E[X])²]. Thus, the weight of a given fluctuation X
contributing to the variance of the returns is proportional to its square. Due to
the decay of the pdf of X for large X, bounded from above by ∼ 1/|X|^(1+µ) with
µ > 2 (see Chap. 2), the largest fluctuations do not contribute significantly to
this expectation. To increase their contributions, and in this way to account
for the largest fluctuations, it is natural to invoke moments of order n higher

Fig. 1.3. This figure represents the function xn · e−x for n = 1, 2, and 4 and shows
the typical size of the fluctuations involved in the moment of order n. Reproduced
from [333]

than 2. The larger n is, the larger is the contribution of the rare and large
returns in the tail of the pdf. This phenomenon is demonstrated in Fig. 1.3,
where we can observe the evolution of the quantity xn ·f (x) for n = 1, 2, and 4,
where f (x), in this example, denotes the density of the standard exponential
distribution e−x . The expectation E[X n ] is then simply represented geomet-
rically as equal to the area below the curve xn · f (x). These curves provide an
intuitive illustration of the fact that the main contributions to the moment
E[X n ] of order n come from values of X in the vicinity of the maximum of
xn · f (x), which increases fast with the order n of the moment we consider,
all the more so, the fatter is the tail of the pdf of the returns X. In addition,
the typical size of the return assessed by the moment of order n is given by
λn = E[X^n]^(1/n) (which coincides with the L^n norm of X, for positive random
variables). For the exponential distribution chosen to construct Fig. 1.3, the
value of x corresponding to the maximum of x^n · f (x) is exactly equal to n,
while λn = n/e + O(ln n). Thus, increasing the order of the moment allows one
to sample larger fluctuations of the asset prices.
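Both statements can be verified with a few lines of arithmetic, since E[X^n] = n! for the standard exponential distribution (a quick numerical sketch):

```python
import math

# For f(x) = exp(-x): the integrand x**n * f(x) peaks at x = n
# (set its derivative n*x**(n-1) - x**n to zero), and the typical scale
# lambda_n = E[X**n]**(1/n) = (n!)**(1/n) grows like n/e.
for n in (1, 2, 4, 8, 16):
    lam_n = math.factorial(n) ** (1.0 / n)
    print(n, lam_n, n / math.e)
```

The printed values show λn tracking n/e up to the O(ln n) correction: higher-order moments indeed probe ever larger fluctuations.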
2 Marginal Distributions of Returns

2.1 Motivations

As discussed in Chap. 1, the risks of a portfolio of N assets are fully charac-


terized by the (possibly time-dependent) multivariate distribution of returns,
which is the joint probability of any given realization of the N asset returns.
For Gaussian models, this requires only the estimation of the average returns
and of their covariance matrix. However, there is no doubt anymore that the
Gaussian model is an inadequate description of real financial data (see for
instance Fig. 2.1): the tails of the distributions are much fatter than Gaussian
and the dependence between assets is not fully captured by the sole covariance
matrix. The calibration and tests of multivariate models as well as their use
for derivative pricing, portfolio analysis, and optimization are thus daunting
tasks, characterized by the “curse of dimensionality.”
The present book is constructed upon the foundation offered by the math-
ematical theory of copulas: as shown in Chap. 3 and used in subsequent chap-
ters, any multivariate distribution can be uniquely decomposed into a part
(the copula) capturing the intrinsic dependence between the assets and an-
other part quantifying the risks embodied in the marginal distributions. In
this representation, the information contained in a multivariate distribution
of asset returns is thus decomposed in two sets: the intrinsic dependence and
the marginals. Portfolio risks result from the multivariate composition of both
the risks embedded in the marginals and the risks due to dependence, as well-
known since Markowitz’s mean-variance portfolio theory. Diversification of
risks may then result from two mechanisms (working independently or in con-
junction): (i) the law of large numbers (the larger the number of assets, the
smaller the relative amplitude of the fluctuations of the total value relative to
its mean) and (ii) anticorrelations (two assets whose prices tend to move in
opposite directions give a lower risk when combined in a portfolio). The former
mechanism often dominates in large portfolios while the second mechanism is
at the basis of derivative hedging.


Fig. 2.1. Bivariate distribution of the daily annualized returns ri of the Swiss franc
(CHF) in US $ (i = 1) and of the Japanese yen (JPY) in US $ (i = 2) for the time
interval from January 1971 to October 1998. One-fourth of the data points are rep-
resented for clarity of the figure. The contour lines are obtained by smoothing the
empirical bivariate density distribution and represent equilevels. The outer (respec-
tively middle and inner) line is such that 90% (respectively 50% and 10%) of the
total number of data points fall within it. It is apparent that the data is not described
by an elliptically contoured pdf as it should be if the dependence was prescribed by
a Gaussian (or more generally by an elliptic) distribution. Instead, the contour line
takes the shape of a “bean”. Also shown are the price-time series and the marginal
distributions (in log-linear scales) in the panels at the top and on the side. The
parabolas in thick lines correspond to the best fits by Gaussian distributions. The
thin lines correspond to the best fits by stretched exponentials ∼ exp[−(ri /r0i )ci ]
with exponents c1 = 1.14 for CHF and c2 = 0.8 for JPY. Reproduced from [457]

In the present chapter, we review the knowledge accumulated on the char-


acterization of marginal distributions of asset returns. This knowledge com-
bined with adequate representations of the dependence structure between as-
sets described in the following chapters can then be used to fully define the
multivariate risks. The present chapter thus reviews the bricks of individual

asset risks which can then be combined with the help of copulas to build the
multivariate risk edifice. The emphasis is put on the determination of the pre-
cise shape of the tail of the distribution of returns of a given asset, which is a
major issue both from a practical and from an academic point of view. Indeed,
for practitioners, it is crucial to accurately estimate the high and low quan-
tiles of the distribution of returns (profit and loss) because they are involved
in almost all the modern risk management methods while from an academic
perspective, many economic and financial theories rely on a specific parame-
terization of the distributions whose parameters are intended to represent the
“macrovariables” influencing the agents.
For the purpose of practical market risk management, one typically needs
to assess tail risks associated with the distribution of returns or profit and
losses. Following the recommendations of the BIS,1 one has to focus on risks
associated with positions held for 10 days. Therefore, this requires to consider
the distributions of 10-day returns. However, at such a large time scale, the
number of (non-overlapping) historical observations dramatically decreases.
Even over a century, one can only collect 2500 data points, or so, per asset.
Therefore, the assessment of risks associated with high quantiles is particularly
unreliable.
Recently, the use of high frequency data has allowed for an accurate es-
timation of the very far tails of the distributions of returns. Indeed, using
samples of one to 10 million points enables one to efficiently calibrate prob-
ability distributions up to probability levels of order 99.9995%. Then, one
can hope to reconstruct the distribution of returns at a larger time scale by
convolution. It is the stance taken by many researchers advocating the use of
Lévy processes to model the dynamics of asset prices [109, 196, and references
therein]. The recent study by Eberlein and Özkan [141] shows the relevance of
this approach, at least for fluctuations of moderate sizes. However, for large
fluctuations, this approach is not really accurate, as shown in Fig. 2.2, which
compares the probability density function (pdf) of raw 60-minute returns of
the Standard & Poor’s 500 index with the hypothetical pdf obtained by 60
convolution iterates of the pdf of the 1-minute returns; it is clear that the
former exhibits significantly fatter tails than the latter.
This phenomenon derives naturally from the fact that asset returns can-
not be merely described by independent random variables, as assumed when
prices are modeled by Lévy processes. In fact, independence is too strong an
assumption. For instance, the no free-lunch condition only implies the absence
of linear time dependence since the best linear predictor of future (discounted)
prices is then simply the current price. Volatility clustering, also called ARCH
effect [150], is a clear manifestation of the existence of nonlinear dependences
1 Bank for International Settlements. The BIS is an international organization
which fosters cooperation among central banks and other agencies in pursuit
of monetary and financial stability. Its banking services are provided exclusively
to central banks and international organizations.


Fig. 2.2. Kernel density estimates of the raw 60-minute returns and the density
obtained by 60 convolution iterates of the raw 1-minute returns kernel density for
the Standard & Poor’s 500

between returns observed at different lags. These dependences prevent the use
of convolution for estimating tail risks with sufficient accuracy. Figure 2.2 il-
lustrates the important observation that fat tails of asset return distributions
owe their origin, at least in part, to the existence of volatility correlations. In
the example of Fig. 2.2, a given 60-minute return is the sum of sixty 1-minute
returns. If there was no dependence between these sixty 1-minute returns,
the 60-minute return could be seen as the sum of 60 independent random
variables; hence, its probability density could be calculated exactly by taking
60 convolutions of the probability density of the 1-minute returns. Note that
this 60-fold convolution is equivalent to estimating the density of 60-minute
returns in which the sixty constituent 1-minute returns have been reshuffled randomly
to remove any possible correlation. Figure 2.2 shows a faster decay of the pdf
of these reshuffled 60-minute returns compared with the pdf of the true em-
pirical 60-minute returns. Thus, assessing extreme risks at large time scales (1
or 10 days) by simple convolution of the distribution of returns at time scales
of 1 or of 5 minutes leads to crude approximations and to dramatic underes-
timations of the amount of risk really incurred. The role of the dependence
between successive returns is even more important in times of crashes: very
large drawdowns (amplitudes of runs of losses) have been shown to result from
anomalous transient dependences between a few successive days [249, 250]; as
a consequence, they cannot be explained or modeled by the distribution cal-
ibrated for the bulk (99%) of the rest of the sample of drawdowns. These
extreme events have been termed “outliers”, “kings” [286] or “black swans”
[470].

The only way to reliably aggregate high-frequency data is to have a con-


sistent model at one’s disposal. By consistent model is meant a model that
accounts for the complex time structure of asset returns. (FI)-GARCH [31],
α-ARCH [132], multifractal2 models [39, 341] or any other stochastic volatil-
ity model [232, 473] can be used for this purpose, but none of them is yet
universally recognized since they do not rely on well-established foundations in
economic principles. As a consequence, one is exposed to model error: for
instance, a simple GARCH model still underestimates the tail risks since it
underestimates the long-range dependence of the volatility.
In this context, the most pragmatic approach may be to let the data speak
by themselves, which is the stance taken in this chapter. For each different
horizon, we discuss the possible parametric distributions that fit the data
best. As we shall see, three main scales should be distinguished: small scale (a
few minutes), intermediate scale (about an hour) and a large scale (1 day or
more). At the smallest time scale, we will see that the tails of distributions are
probably decaying more slowly than any power law. At the medium scale, reg-
ularly varying distributions provide a reasonable model, while at time scales
of 1 day or more, rapidly varying distributions – like Weibull distributions –
seem to accurately describe the tails of the distributions of asset returns, at
least in the range of quantiles useful for risk management.

2.2 A Brief History of Return Distributions

The distribution of returns is one of the most basic characteristics of the


markets and many academic studies have been devoted to it. Contrary to
the average or expected return, for which economic theory provides guidelines
to assess them in relation to risk premium, firm size, or book-to-market (see
Chap. 1 and [161] for instance), the functional form of the distribution of
returns, and especially of extreme returns, is much less constrained and still
a topic of active debate.

2.2.1 The Gaussian Paradigm

Generally, the central limit theorem would lead to a Gaussian distribution


for sufficiently large time intervals over which the return is estimated. Tak-
ing the continuous time limit, such that any finite time interval is seen as
the sum of an infinite number of increments thus leads to the paradigm of
log-normal distributions of prices and equivalently of Gaussian distributions
of returns. Based on the pioneering work of Bachelier [26] and later improved
2 While fractal objects, processes, or measures enjoy a global scale invariance prop-
erty – i.e., look similar at any (time) scale – multifractals only enjoy this property
locally, i.e., they can be conceived as a fractal superposition of infinitely many
local fractals.

Table 2.1. Descriptive statistics for the daily Dow Jones Industrial Average index
returns (from 27, May 1896 to 31, May 2000, sample size n = 28415) calculated
over 1 day and 1 month and for Nasdaq Composite Index returns calculated over 5
minutes and 1 hour (from 8, April 1997 to 29, May 1998, sample size n = 22123)

Mean St. dev. Skewness Ex. Kurtosis Jarque-Bera


Nasdaq (5 minutes) † 1.80 × 10−6 6.61 ×10−4 0.0326 11.8535 1.30 × 105 (0.00)
Nasdaq (1 hour) † 2.40 × 10−5 3.30 ×10−3 1.3396 23.7946 4.40 × 104 (0.00)
Nasdaq (5 minutes) ‡ −6.33 × 10−9 3.85 × 10−4 −0.0562 6.9641 4.50 × 104 (0.00)
Nasdaq (1 hour) ‡ 1.05 × 10−6 1.90 × 10−3 −0.0374 4.5250 1.58 × 103 (0.00)
Dow Jones (1 day) 8.96 × 10−5 4.70 × 10−3 −0.6101 22.5443 6.03 × 105 (0.00)
Dow Jones (1 month) 1.80 × 10−3 2.54 × 10−2 −0.6998 5.3619 1.28 × 103 (0.00)
(†) Raw data, (‡) data corrected for the U -shape of the intraday volatility due to the

opening, lunch, and closing effects.


The Dow Jones exhibits a significantly negative skewness, which can probably be ascribed
to the impact of the market crashes. The raw Nasdaq returns are significantly positively
skewed while the returns corrected for the “lunch effect” are negatively skewed, showing
that the lunch effect plays an important role in the shaping of the distribution of the
intraday returns. Note also the important decrease of the kurtosis after correction of the
Nasdaq returns for the lunch effect, confirming the strong impact of the lunch effect. In
all cases, the excess-kurtosis are high and remain significant even after a time aggregation
of one month. The numbers within parentheses represent the p-value of Jarque-Bera’s
normality test, a joint statistic using skewness and kurtosis coefficients [116]: the normality
assumption is rejected for these time series. The Lagrange multiplier test proposed in [151]
allows one to test for heteroscedasticity. It leads to the T · R2 test statistic, where T denotes
the sample size and R2 is the determination coefficient of the regression of the squared
centered returns xt on a constant and on q of their lags xt−1 , xt−2 , . . . , xt−q . Under the
null hypothesis of homoscedastic time series, T · R2 follows a χ2 -statistic with q degrees
of freedom. The test – performed up to lag q = 10 – shows that, in every case, the null
hypothesis is strongly rejected at any usual significance level.Thus, the time series are
heteroscedastic and exhibit volatility clustering.The BDS test [84], which allows one to
detect not only volatility clustering, as in the previous test, but also departure from iid-
ness due to non-linearities confirms that the null-hypothesis of iid data is strongly rejected
at any usual significance level. Reproduced from [329].

by Osborne [377] and Samuelson [425], the log-normal paradigm has been
the starting point of many financial theories such as Markowitz’s portfolio
selection method [347], Sharpe’s market equilibrium model (CAPM) [429] or
Black and Scholes rational option pricing theory [60]. However, for real finan-
cial data, the convergence in distribution to a Gaussian law is very slow (see
for instance [72, 88]), much slower than predicted for independent returns.
As shown in Table 2.1, the excess kurtosis (which is zero for a normal dis-
tribution) typically remains large even for monthly returns, testifying to (i)
significant deviations from normality, (ii) the heavy tail behavior of the
distribution of returns, and (iii) significant time dependences between asset
returns [88].
2.2 A Brief History of Return Distributions 39
2.2.2 Mechanisms for Power Laws in Finance

Another approach rooted in economic theory, which can be invoked to derive
the distribution of financial returns, consists in applying the “Gibrat principle”
[441], initially introduced to account for the growth of cities and of wealth
through a mechanism combining stochastic multiplicative and additive noises
[55, 207, 268, 446, 454] leading to a Pareto distribution of sizes [94, 193]. Ra-
tional bubble models à la Blanchard [61] can also be cast in this mathematical
framework of stochastic recurrence equations and lead to distributions with
regularly varying tails, albeit with a strong constraint on the tail exponent
(see [323] for the monovariate case and [331] for the multivariate case). These
works suggest that an alternative and natural way to capture the heavy tail
character of the distributions of returns is to use distributions with power-like
tails (Pareto, generalized Pareto, Lévy stable laws) or more generally, reg-
ularly varying distributions [57],3 the latter encompassing all the former
ones. At first glance, Fig. 2.3, which depicts the complementary sample dis-
tribution function for the 30-minute returns of the Standard & Poor’s 500,
seems to substantiate this thesis.
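The mechanism combining stochastic multiplicative and additive noises can be illustrated with a minimal simulation of a stochastic recurrence ("Kesten") equation, whose stationary law has a regularly varying tail with exponent µ solving E[a^µ] = 1. The parameter values and the Hill estimation below are our illustrative choices, not taken from the cited references:

```python
import numpy as np

# Stochastic recurrence ("Kesten") equation X_{t+1} = a_t * X_t + b_t.
# With E[ln a] < 0 the process is stationary, and its stationary law has a
# power law tail Pr[X > x] ~ C x^{-mu}, where mu solves E[a^mu] = 1.
# For log-normal multipliers a = exp(m + s*Z), this gives mu = -2m / s^2;
# m = -0.75 and s^2 = 0.5 are illustrative values chosen so that mu = 3.
rng = np.random.default_rng(1)
m, s2 = -0.75, 0.5
mu_theory = -2.0 * m / s2                      # = 3.0
n = 400_000
a = np.exp(m + np.sqrt(s2) * rng.standard_normal(n))
x = np.empty(n)
x[0] = 1.0
for t in range(1, n):
    x[t] = a[t] * x[t - 1] + 1.0               # constant additive term
x = x[50_000:]                                  # discard the burn-in

# Rough Hill estimate of the tail exponent from the k largest values.
srt = np.sort(x)
k = 2000
mu_hill = 1.0 / np.mean(np.log(srt[-k:] / srt[-(k + 1)]))
```

The Hill estimate is only a rough check here: the temporal dependence of the process and the slow onset of the power law regime broaden its fluctuations, a point taken up in Sect. 2.3.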
Other mechanisms involving the existence of a long memory of the volatil-
ity have been recently found to describe many of the stylized facts of mono-
variate financial returns. In particular, the multifractal random walk (MRW)
is a process constructed with a very long memory in the volatility, such that
it has a bona fide continuous limit with exact multifractal properties on the
absolute values of the returns [27]. Appendix 2.A defines random cascade
models from which the MRW derives and summarizes their main proper-
ties. For random cascade models exhibiting multifractality, it has been shown
exactly that the random variables, defined in Sect. 2.A.1 as the increments
δ∆t X(t) = X(t) − X(t − ∆t) corresponding to the log-returns calculated over
the horizon ∆t, are distributed in the tail according to a power law

Pr [δ∆t X ≥ x] = L(x) · x−b , (2.1)

where the exponent b is given by

b = sup{q, q > 1, ζ(q) > 1} , (2.2)

and ζ(q), defined in (2.A.7), is the spectrum of exponents of the moments of
the absolute value of the log-returns (see Appendix 2.A). For the MRW, we
have ζ(q) = (q − q(q − 2)λ2 )/2 according to (2.A.36), where λ2 is the so-called
multifractal parameter. Condition (2.2) then yields b = 1/λ2 to leading order.
Another equivalent way to arrive at the same result is to use the moments
defined by (2.A.34) with (2.A.35), which can be shown to diverge (Kq = +∞)
3 The general representation of a regularly varying distribution F is given by
1 − F (x) = L(x) · x^−α , where L(·) is a slowly varying function, that is, a function
such that limx→∞ L(tx)/L(x) = 1 for any finite t. The parameter α is usually
called the tail index or the tail exponent.
40 2 Marginal Distributions of Returns

[Figure 2.3 is a log-log plot: complementary sample DF (y-axis, from 10^−5 to 10^0) versus absolute log-return x (x-axis, from 10^−5 to 10^0)]
Fig. 2.3. Complementary sample distribution function for the Standard & Poor’s
500 30-minute returns over the two decades 1980–1999. The plain (resp. dotted)
line depicts the complementary distribution for the positive (the absolute value of
negative) returns. Reproduced from [330]

if ζ(q) < 1 [27]. The calibration4 of λ2 gives in general very small values in the
range 0.01–0.04 leading to a tail index b in the range 15–50 [366]. This has led
previous workers to conclude that such a large tail exponent is unobservable
with available data sets, and may well be described by other effective laws.
However, Muzy et al. [365] have recently shown that empirical distributions
of log-returns do not give access to the unconditional prediction b ≈ 15–50
with (2.2). This is because the value of q determining the exponent b according
to (2.2) is itself associated with an α (through the Legendre transformation
(2.A.20) and (2.A.21)) for which the multifractal spectrum f (α) defined in
Sect. 2.A.2 is negative. But negative f (α)’s are unobservable.5 Indeed, from
the definition (2.A.19) of f (α), only positive f (α)’s correspond to genuine
fractal dimensions and are thus observable: this is because they correspond to
more than a few points of observations in the limit ∆t ≪ T . The key remark of
Muzy et al. [365] is therefore that the observable exponent bobs for an infinite
time series will be the largest positive q such that f (α) ≥ 0:

bobs = sup{q, q > 1, f (α) > 0} . (2.3)


4
From the correlation function of the log-volatility, from the scaling approach using
the multifractal spectrum or from the generalized method of moment [321].
5
Mandelbrot has shown that they can be interpreted in terms of singularities of
large deviations.

Using the form (2.A.36) together with (2.A.20), we obtain bobs = √2/λ. For
a financial time series of finite length L, Muzy et al. [365] have shown in
addition that the observed exponent is further reduced as a function of the
ratio ln NL / ln NT of the logarithm of the number NL = L/T of integral scales
over the logarithm of the number NT = T /∆t of data points per integral scale.
This makes a huge difference: rather than tail indices in the range 15–50, this
gives observable tail indices in the range 3–5, as observed empirically.
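These statements can be checked numerically under the parabolic spectrum ζ(q) = (q − q(q − 2)λ²)/2 of (2.A.36); the function name and the sample values of λ² below are our own illustrative choices:

```python
import math

def mrw_tail_exponents(lam2):
    """Tail exponents implied by the parabolic MRW spectrum
    zeta(q) = (q - q*(q - 2)*lam2) / 2 (see (2.A.36)).
    Returns (b, b_obs): b is the larger root of zeta(q) = 1, per (2.2),
    and b_obs = sqrt(2)/lambda follows from the condition f(alpha) = 0."""
    # zeta(q) = 1  <=>  lam2*q^2 - (1 + 2*lam2)*q + 2 = 0
    disc = (1.0 + 2.0 * lam2) ** 2 - 8.0 * lam2
    b = ((1.0 + 2.0 * lam2) + math.sqrt(disc)) / (2.0 * lam2)
    b_obs = math.sqrt(2.0 / lam2)
    return b, b_obs

b_004, bobs_004 = mrw_tail_exponents(0.04)   # lam2 = 0.04
b_002, bobs_002 = mrw_tail_exponents(0.02)   # lam2 = 0.02
```

For the parabolic spectrum the larger root is in fact exactly 1/λ² (the discriminant is a perfect square), consistent with the leading-order statement in the text, while the observable exponent b_obs is markedly smaller, before any further finite-length reduction.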
See also [28] for extensions of the MRW to log infinitely divisible processes
and [39, 341] for other applications of the multifractal process to the modeling
of asset returns dynamics.
A different point of view on the underlying mechanism for the power law
tails of price fluctuations has been proposed by Gabaix et al. [194, 195]. In
essence, their proposal is that price variations are driven by fluctuations in
the volume of transactions, V , whose cumulative distribution function FV has
a regularly-varying tail with a universal exponent γ ≈ 1.5. The fluctuations
in the volume of transactions are argued in addition to be modulated by a
deterministic market impact function, which describes the response of prices
to transactions of the form r = kV β , where r is the change in the logarithm
of price resulting from a transaction of volume V , k is a constant and β = 0.5.
This relationship can be derived from the assumption that agents are profit
maximizers. These two ingredients FV (V ) ∼ 1 − 1/V γ and r = kV β imply
that large price returns r have also a power law distribution with exponent
µ = γ/β ≈ 3. Gabaix et al. find that their theory is consistent with the
data. It is important to stress that these results are obtained by using ag-
gregated data over a fixed time interval. Farmer and Lillo [165] argue that
aggregating the data in time complicates the discussion, since the functional
form of the market impact generally depends on the length of the time in-
terval. They find that the same analysis based on individual (rather than
time-aggregated) transactions does not confirm Gabaix et al.’s results and
they suggest that the tail of price changes is driven by fluctuations in liq-
uidity rather than in the volume of transactions. However, Plerou et al. [387]
make the important point that individual transactions do not reflect true or-
ders, especially the large ones, since the large orders of a large fund, say, are
generally split in several transactions. Thus, the correct observable seems in-
deed to be the time-aggregated volume (albeit with variations in timespan),
rather than individual trades. It is probable, however, that both mechanisms of
fluctuations in the volume of transactions and in liquidity play a role in deter-
mining the statistics of price changes. In addition, the mechanism proposed by
Gabaix et al. transfers the question of the origin of the power law distribution
of the returns to the open question of the origin of the power law distribution
of the volumes of transactions, which could reflect the power law distribu-
tion of the fund sizes, since the larger a fund is, the larger its orders can
be expected to be. However, the distributions of the top 10% mutual funds
[194] and of firm sizes [25, 376] are found to be regularly varying with a tail
index close to 1, significantly smaller than the value 1.5 of the exponent of

the distribution of the volumes of transactions. Unraveling the origin of this
exponent 1.5 thus requires an understanding of the strategies of investors and
how they organize, fragment, and delay their orders.
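The Gabaix et al. mechanism can be sketched with a toy simulation; the scale constant and the Hill estimation below are illustrative choices of ours, not the authors' calibration:

```python
import numpy as np

# Pareto-distributed trade volumes, Pr[V > v] = v^{-gamma} for v >= 1,
# passed through a square-root price impact r = k * V^beta, yield returns
# with tail exponent mu = gamma / beta = 3 (k_impact is an arbitrary scale).
rng = np.random.default_rng(2)
gamma, beta, k_impact = 1.5, 0.5, 1e-3
V = rng.uniform(size=200_000) ** (-1.0 / gamma)   # inverse-transform Pareto
r = k_impact * V ** beta

# Hill estimate of the tail exponent of the simulated returns.
srt = np.sort(r)
k = 2000
mu_hill = 1.0 / np.mean(np.log(srt[-k:] / srt[-(k + 1)]))
```

The estimated exponent of the simulated returns is close to γ/β = 3, the value reported empirically for price returns.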

2.2.3 Empirical Search for Power Law Tails
and Possible Alternatives

In the early 1960s, Mandelbrot [339] and Fama [157] presented evidence that
distributions of returns can be well approximated by a symmetric Lévy stable
law with tail index b about 1.7. These estimates of the tail index have recently
been supported by Mittnik et al. [362], and slightly different indices of the
stable law (b = 1.4) were suggested by Mantegna and Stanley [345, 346].
On the other hand, there is considerable evidence of a larger value of the
tail index b ∼= 3 [217, 312, 320, 322, 367]. See also the various alternative
parameterizations in terms of the Student distribution [62, 275], or Pearson
Type-VII distributions [368], which all have an asymptotic power law tail and
are regularly varying. Thus, a general conclusion of this group of authors con-
cerning tail fatness can be formulated as follows: the tails of the distribution
of returns are heavier than a Gaussian tail and heavier than an exponential
tail; they certainly admit the existence of a finite variance (b > 2), whereas
the existence of the third (skewness) and the fourth (kurtosis) moments is
questionable.
These two classes of results are contradictory only on the surface, because
they actually do not apply to the same quantiles of the distributions of re-
turns. Indeed, Mantegna and Stanley [345] have shown that the distribution
of returns of the Standard & Poor’s 500 index can be described accurately
by a Lévy stable law only within a limited range up to about 5 standard
deviations, while a faster decay (approximated by an exponential or a power
law with larger exponent) of the distribution is observed beyond. This almost-
but-not-quite Lévy stable description could explain (at least, in part) the slow
convergence of the distribution of returns to the Gaussian law under time ag-
gregation [72, 451]; and it is precisely outside this range of up to 5 standard
deviations, where the Lévy law does not apply anymore, that a tail index b ∼= 3
has been estimated. Indeed, most authors who have reported a tail index b ∼= 3
have used some optimality criteria for choosing the sample fractions (i.e., the
largest values) for the estimation of the tail index. Thus, unlike the authors
supporting stable laws, they have used only a fraction of the largest (positive
tail) and smallest (negative tail) sample values.
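The sensitivity of tail index estimates to the chosen sample fraction can be illustrated with a Student distribution with 3 degrees of freedom, whose true tail index is 3; this is a sketch, with sample size and fractions chosen arbitrarily by us:

```python
import numpy as np
from scipy import stats

def hill(sample, k):
    """Hill estimator of the tail exponent using the k largest values."""
    srt = np.sort(sample)
    return 1.0 / np.mean(np.log(srt[-k:] / srt[-(k + 1)]))

# |t|-distributed sample with 3 degrees of freedom: regularly varying
# with true tail index 3, but only asymptotically so.
rng = np.random.default_rng(3)
x = np.abs(stats.t.rvs(df=3, size=200_000, random_state=rng))

# The estimate depends on the fraction of extreme data retained.
estimates = {k: hill(x, k) for k in (500, 2000, 20000)}
```

Estimates based on a small fraction of the largest values fluctuate around 3, while estimates based on a large fraction are contaminated by the central part of the distribution, which is why optimality criteria for the sample fraction matter.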
It would thus seem that all has been said on the distributions of returns.
However, there are still dissenting views in the literature. Indeed, the class
of regularly varying distributions is not the sole one able to account for the
large kurtosis and fat-tailness of the distributions of returns. Some recent
works suggest alternative descriptions for the distributions of returns. For in-
stance, Gouriéroux and Jasiak [208] claim that the distribution of returns on
the French stock market decays faster than any power law. Cont et al. [108]

have proposed to use exponentially truncated stable distributions, Barndorff-
Nielsen [37], Eberlein et al. [140] and Prause [393] have respectively considered
normal inverse Gaussian and (generalized) hyperbolic distributions, which as-
ymptotically decay as x^α · exp(−βx), while Laherrère and Sornette [286] sug-
gest to fit the distributions of stock returns by the Stretched-Exponential
law.6 Of the same type are the marginal distributions of the so-called CGMY
model proposed by Carr et al. [90]. These results, challenging the traditional
hypothesis of a power-like tail, offer a new representation of the returns dis-
tributions.
In addition, real financial time series exhibit (G)ARCH effects [65, 66] lead-
ing to heteroscedasticity and to clustering of high threshold exceedances due
to a long memory of the volatility. These rather complex dependent structures
make difficult, if not questionable, the blind application of standard statistical
tools for data analysis. In particular, the existence of significant dependence in
the return volatility leads to the existence of a significant bias and an increase
of the true standard deviation of the statistical estimators of tail indices.
Indeed, there are now many examples showing that dependences and long
memories as well as non-linearities mislead standard statistical tests (see for
instance [12, 216]). Consider the Hill’s and Pickands’ estimators, which play an
important role in the study of the tails of distributions. It is often overlooked
that, for dependent time series, Hill’s estimator remains only consistent but
not asymptotically efficient [416]. Moreover, for financial time series with a de-
pendence structure described by an IGARCH process, it has been shown that
the standard deviation of Hill’s estimator obtained by a bootstrap method
can be seven to eight times larger than the standard deviation derived under
the asymptotic normality assumption [267]. These figures are even worse for
Pickands’ estimator.

2.3 Constraints from Extreme Value Theory


The application of extreme value theory (EVT) to the investigation of the
properties of the distributions of asset returns has grown rapidly during the
last decade. Longin [312] was one of the main promoters of this method and
has advocated its use for risk management purposes [313], particularly for
Value-at-Risk assessment and stress testing. The conclusions drawn from the
various studies of empirical distributions of log-returns, based on the extreme
value theory, show that they should belong to the maximum domain of at-
traction of the Fréchet distribution, so that they are necessarily regularly
varying laws. However, most of these studies have been performed under the
6 Picoli et al. [385] have also presented fits comparing the relative merits of
Stretched-Exponential and so-called q-exponentials (which are similar to Student
distributions with power law tails) for the description of the frequency distribu-
tions of basketball baskets, cyclone victims, brand-name drugs by retail sales, and
highway lengths.

restrictive assumption that (i) financial time series are made of independent
and identically distributed returns, and (ii) the corresponding distributions
of returns belong to one of only three possible maximum domains of attrac-
tion.7 However, these assumptions are not fulfilled in general. While Smith’s
results [444] indicate that the dependence of the data does not constitute a
major problem in the limit of large samples, so that volatility clustering of
financial data does not undermine the reliability of EVT, we shall see that it
can significantly bias standard statistical tools for samples of size commonly
used in extreme tails studies. Moreover, the conclusions of many studies are
essentially based on an aggregation procedure which stresses the central part
of the distribution while smoothing and possibly distorting the characteristics
of the tail (whose properties are obviously essential in characterizing the tail
behavior).
The question then arises whether the limitations of these statistical tools
could have led to erroneous conclusions about the tail behavior of the distrib-
utions of returns. In this section, presenting tests performed on synthetic time
series with time dependence in the volatility with both Pareto and Stretched
Exponential (SE) distributions, and on two empirical time series (the daily
returns of the Dow Jones Industrial Average Index over a century (n = 28415
data points) and the 5-minute returns of the Nasdaq Composite index over 1
year from April 1997 to May 1998 (n = 22123 data points)), we exemplify the
fact that the standard generalized extreme value (GEV) estimators can be
quite inefficient due to the possibly slow convergence toward the asymptotic
theoretical distribution and the existence of biases in the presence of depen-
dence between data. Thus, one cannot reliably distinguish between rapidly
and regularly varying classes of distributions. The generalized Pareto distri-
bution (GPD) estimators work better, but still lack power in the presence of
strong dependence. Note that the two empirical data sets used in the illus-
tration below are justified by their similarity with (i) the data set of daily
returns used in [312] particularly, and (ii) the high frequency data used in
[217, 322, 367] among others.
7 Extensions of the asymptotic theory of extreme values to correlated sequences
have been developed by Berman [48, 49] for Gaussian sequences and Loynes [318],
O’Brien [374], Leadbetter [293] and others [369] in the more general context of
stationary sequences satisfying mixing conditions. See also Kotz and Nadarajah
[277] for the limit distribution of extreme values of 2D correlated random vari-
ables. Recently, there is a growing interest in the extreme value theory of strongly
correlated random variables in many areas of science, including applications to
diffusing particles in correlated random potentials [89], to the understanding of
large deviations in spin glass ground state energies [13], to front propagation and
fluctuations [378], fragmentation, binary search tree problem in computer science
[325, 326], to maximal height of growing surfaces [399], to the Hopfield model of
brain learning [75], and so on.

2.3.1 Main Theoretical Results on Extreme Value Theory

Two limit theorems allow one to study the extremal properties of a distribution
function and to determine its maximum domain of attraction (MDA).
First, consider a sample of N iid realizations X1 , X2 , . . . , XN of a random
variable. Let X∧N denote the maximum of this sample.8 Then, the Gnedenko
theorem states that, if, after an adequate centering and normalization, the
distribution of X∧N converges to a non-degenerate distribution as N goes to
infinity, this limit distribution is then necessarily the generalized extreme value
(GEV) distribution defined by

Hξ (x) = exp[ −(1 + ξ · x)^−1/ξ ] , (2.4)

with x ∈ [−1/ξ, ∞) if ξ > 0 and x ∈ (−∞, −1/ξ] if ξ < 0. When ξ = 0, Hξ (x)
should be understood as

H0 (x) = exp[− exp(−x)], x ∈ R. (2.5)

Thus, for N large enough,

Pr [X∧N < x] ≈ HξN ((x − µN )/ψN ) , (2.6)

for some value of the centering parameter µN , scale factor ψN and form para-
meter ξN . The form parameter ξ is of paramount importance for the shape of
the limiting distribution. Its sign determines the three possible limiting forms
of the GEV distribution of maxima (2.4):
1. If ξ > 0 the limit distribution is the (shifted) Fréchet power-like distribu-
tion;
2. If ξ = 0, the limit distribution is the Gumbel (double-exponential) distri-
bution;
3. If ξ < 0, the limit distribution has a support bounded from above.
The determination of the parameter ξ is the central problem of extreme
value analysis. Indeed, it allows one to determine the maximum domain of
attraction of the underlying distribution and therefore its behavior in the tails.
When ξ > 0, the underlying distribution belongs to the Fréchet maximum
domain of attraction and is regularly varying (power-like tail). When ξ = 0, it
belongs to the Gumbel maximum domain of attraction and is rapidly varying
(exponential tail), while if ξ < 0 it belongs to the Weibull maximum domain
of attraction and has a finite right endpoint, which means that there exists a
finite xF such that X ≤ xF with probability one.
8 Similar results hold for X∨N = min {X1 , . . . , XN } since min {X1 , . . . , XN } =
− max {−X1 , . . . , −XN }.
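The GEV distribution (2.4)-(2.5) and its three regimes can be coded directly; this is a minimal sketch and the function name is ours:

```python
import math

def gev_cdf(x, xi):
    """Generalized extreme value distribution H_xi(x) of (2.4)-(2.5)."""
    if xi == 0.0:                       # Gumbel (double-exponential) case
        return math.exp(-math.exp(-x))
    z = 1.0 + xi * x
    if z <= 0.0:
        # Below the lower endpoint -1/xi (Frechet case, xi > 0) the cdf is
        # 0; above the upper endpoint -1/xi (Weibull case, xi < 0) it is 1.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / xi))
```

Whatever the sign of ξ, Hξ(0) = e^−1; the Fréchet case (ξ > 0) puts more mass in the upper tail than the Gumbel case, and the Weibull case (ξ < 0) has a finite right endpoint at −1/ξ.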
46 2 Marginal Distributions of Returns

The usefulness of formula (2.6) for risk assessment purposes seems obvious
as it provides a universal estimation of the Value-at-Risk. If X denotes the
profit and loss, X∧N represents the largest among N losses. The Value-at-Risk
at confidence level α, denoted by VaRα , is given by the unique solution of:

FX (VaRα ) = Pr [X < VaRα ] = α , (2.7)

provided that FX is increasing.9 For N iid observations of the profits and
losses, we have
Pr [X∧N < VaRα ] = (Pr [X < VaRα ])^N = α^N , (2.8)

so that VaRα is (asymptotically) solution of

HξN ((VaRα − µN )/ψN ) = α^N , (2.9)

which with (2.4) yields

VaRα ≈ µN + (ψN /ξN ) [(−N ln α)^−ξN − 1] . (2.10)
When the observations are not iid, one can generally replace N by θ ·N , where
θ ∈ [0, 1] is the so-called extremal index [146, 293, 313], related to the size of
the clusters of extremes which may appear when the data exhibit temporal
dependence. Indeed, generally speaking, one can write [146, p.419]:
Pr [X∧N < VaRα ] ≈ (Pr [X < VaRα ])^θ·N = α^θ·N , (2.11)

so that

VaRα ≈ µ + (ψ/ξ) [(−θ · N ln α)^−ξ − 1] . (2.12)
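Formula (2.12) translates directly into code. The sketch below also checks, for illustrative parameter values of our choosing, that the computed VaR satisfies the defining relation (2.11):

```python
import math

def var_gev(mu, psi, xi, N, alpha, theta=1.0):
    """Value-at-Risk implied by (2.12), with extremal index theta."""
    return mu + (psi / xi) * ((-theta * N * math.log(alpha)) ** (-xi) - 1.0)

# Consistency check for illustrative parameters: the computed VaR must
# satisfy H_xi((VaR - mu)/psi) = alpha^(theta * N), i.e. relation (2.11).
mu, psi, xi, N, alpha, theta = 0.0, 1.0, 0.3, 250, 0.99, 0.7
v = var_gev(mu, psi, xi, N, alpha, theta)
lhs = math.exp(-(1.0 + xi * (v - mu) / psi) ** (-1.0 / xi))
rhs = alpha ** (theta * N)
```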
The second limit theorem is named after Gnedenko-Pickands-Balkema-de
Haan (GPBH) and is formulated as follows [146, pp. 152–168] (see also
[451, Chap. 1] for an intuitive exposition). In order to state the GPBH the-
orem, let us define the right endpoint xF of a distribution function F (x) as
xF = sup{x : F (x) < 1}. Let us call the function

Pr{X − u ≥ x | X > u} ≡ F̄u (x) (2.13)

the excess distribution function. Then, this (survival) distribution function
F̄u (x) belongs to the maximum domain of attraction of Hξ (x) defined by
(2.4) if and only if there exists a positive scale-function s(u), depending on
the threshold u, such that
9 For a more general definition of VaR, see (3.85) page 125.

lim_{u→xF } sup_{0≤x≤xF −u} |F̄u (x) − Ḡ(x | ξ, s(u))| = 0 , (2.14)

where

G(x | ξ, s) = 1 + ln Hξ (x/s) = 1 − (1 + ξ · x/s)^−1/ξ (2.15)
is called the generalized Pareto distribution (GPD). By taking the limit ξ → 0,
expression (2.15) leads to the exponential distribution. The support of the
distribution function (2.15) is defined as follows:

0 ≤ x < ∞, if ξ ≥ 0 ,
0 ≤ x ≤ −s/ξ, if ξ < 0 . (2.16)

Thus, the GPD has a finite support for ξ < 0.


Again, this theorem has important practical implications for risk man-
agement, since it provides a general assessment of the expected-shortfall of
a position X associated with a given distribution of profits and losses. The
expected-shortfall, at confidence level α, is given by:

ESα = E [X|X ≥ VaRα ] , (2.17)

which can be evaluated with the help of relation (2.15):


ESα = VaRα + s/(1 − ξ) , (2.18)

with ξ < 1, in order for the expectation to exist.10
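Relations (2.15), (2.17) and (2.18) can be illustrated as follows; the Monte Carlo check of the mean excess s/(1 − ξ) uses inverse-transform sampling of the GPD, with parameter values chosen by us for illustration:

```python
import numpy as np

def gpd_sf(x, xi, s):
    """Survival function 1 - G(x | xi, s) of the GPD (2.15), x >= 0."""
    return (1.0 + xi * x / s) ** (-1.0 / xi)

def expected_shortfall(var_alpha, xi, s):
    """Expected shortfall of (2.18): ES = VaR + s/(1 - xi), for xi < 1."""
    return var_alpha + s / (1.0 - xi)

# Monte Carlo check that the mean excess of a GPD(xi, s) is s/(1 - xi),
# using inverse-transform sampling: x = (s/xi) * ((1 - u)^(-xi) - 1).
rng = np.random.default_rng(4)
xi, s = 0.25, 1.0
u = rng.uniform(size=500_000)
excesses = (s / xi) * ((1.0 - u) ** (-xi) - 1.0)
mean_excess = excesses.mean()          # should approach s/(1 - xi) = 4/3
```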


As a note of caution, it should be underlined that the existence of a non-
degenerate limit distribution of properly centered and normalized maxima
X∧N or of peaks over threshold X − u | X > u is a rather strong requirement.
Many distribution functions do not satisfy this condition, e.g., distribution
functions alternating indefinitely between a power-like and an exponential
behavior.

2.3.2 Estimation of the Form Parameter and Slow Convergence
to Limit Generalized Extreme Value (GEV)
and Generalized Pareto (GPD) Distributions

There exist two main ways of estimating the form parameter ξ. First, if there is
a sample of maxima (taken from subsamples of sufficiently large size), then one
can fit to this sample the GEV distribution, thus estimating the parameters
by the maximum likelihood method, for instance. Alternatively, one can prefer
the distribution of exceedances over a large threshold given by the GPD (2.15),
whose tail index can be estimated with Pickands’ estimator or by maximum
10
Recall that ξ is the inverse of the tail exponent.

likelihood, as previously. Hill's estimator cannot be used in the present case
since it assumes ξ > 0, while the essence of extreme value analysis is, as we
said, to test for all the classes of limit distributions without excluding any
possibility, and not only to determine the quantitative value of an exponent.
Each of these methods has its advantages and drawbacks, especially when one
has to study dependent data, as we show below.
Given a sample of size N , one can consider the q-maxima drawn from
q subsamples of size p (such that p · q = N ) to estimate the parameters
(µ, ψ, ξ) in (2.6) by maximum likelihood. This procedure yields consistent and
asymptotically efficient Gaussian estimators, provided that ξ > −1/2 [444].
The properties of the estimators still hold approximately for dependent data,
provided that the interdependence remains weak. However, it is difficult to
choose the optimal value q of the number of subsamples as it depends both on
the size N of the entire sample and on the underlying distribution: the maxima
drawn from an exponential distribution are known to converge very quickly
to Gumbel’s distribution [220], while for the Gaussian law, convergence is
particularly slow [219].
The second possibility is to estimate the parameter ξ from the distrib-
ution of exceedances (i.e., from the GPD). For this, one can use either the
maximum likelihood estimator or Pickands’ estimator. Maximum Likelihood
estimators are well known to be asymptotically the most efficient ones (at least
for ξ > −1/2 and for independent data) but, in this particular case, Pickands’
estimator works reasonably well. Given a sample x1 ≥ x2 ≥ · · · ≥ xN of size
N ordered in descending order (so that xk is the k-th largest value), Pickands'
estimator is given by

ξ̂k,N = (1/ ln 2) · ln [(xk − x2k )/(x2k − x4k )] . (2.19)
For independent and identically distributed data, this estimator is consistent
provided that k is chosen so that k −→ ∞ and k/N −→ 0 as N −→ ∞.
Moreover, ξ̂k,N is asymptotically normal with variance

σ(ξ̂k,N )^2 · k −→ ξ^2 (2^(2ξ+1) + 1) / (2 (2^ξ − 1) ln 2)^2 , as N −→ ∞ . (2.20)
In the presence of dependence between data, one can expect an increase of
the standard deviation, as reported by Kearns and Pagan [267]. For time
dependence of the GARCH class, they have indeed demonstrated a signifi-
cant increase of the standard deviation of the tail index estimator, such as
Hill’s estimator, by a factor more than seven with respect to their asymptotic
properties for iid samples. This leads to very inaccurate index estimates for
time series with this kind of temporal dependence. Another problem lies in
the determination of the optimal threshold u of the GPD, which is in fact
related to the optimal determination of the subsample size p in the case of
the estimation of the parameters of the distribution of the maximum.
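Pickands' estimator (2.19) and the asymptotic standard deviation implied by (2.20) can be coded as follows; the test case uses a synthetic Pareto sample with tail index b = 3 (hence ξ = 1/3), with sample size and k chosen by us for illustration:

```python
import math
import numpy as np

def pickands(sample, k):
    """Pickands' estimator (2.19); x[0] >= x[1] >= ... are the sample
    values in descending order, so x[k-1] is the k-th largest."""
    x = np.sort(sample)[::-1]
    return math.log((x[k - 1] - x[2 * k - 1]) /
                    (x[2 * k - 1] - x[4 * k - 1])) / math.log(2.0)

def pickands_std(xi, k):
    """Asymptotic standard deviation implied by (2.20), for iid data."""
    num = xi * xi * (2.0 ** (2.0 * xi + 1.0) + 1.0)
    den = (2.0 * (2.0 ** xi - 1.0) * math.log(2.0)) ** 2
    return math.sqrt(num / den / k)

# Pareto sample with tail index b = 3, i.e. xi = 1/3, by inverse transform.
rng = np.random.default_rng(5)
sample = rng.uniform(size=200_000) ** (-1.0 / 3.0)
xi_hat = pickands(sample, k=5000)
```

Even for this iid sample of 200,000 points, the estimate fluctuates around 1/3 with a standard deviation of a few percent, which gives a sense of how noisy the estimator becomes once dependence further inflates its variance.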
In order to compare the performance of the various estimators of the tail
index ξ for iid data, Malevergne et al. [329] have considered several numerically

generated samples respectively drawn from (i) an asymptotic power law dis-
tribution with tail index b = 3, (ii) a SE distribution, i.e., such that
ln Pr [X > x] ∝ −x^c , as x −→ ∞ , (2.21)
with fractional exponent c = 0.7 and (iii) a SE with fractional exponent
c = 0.3. Considering 1000 replications of each of these three samples (made of
10, 000 data each), they show that the estimates of ξ obtained from the distri-
bution of maxima (2.6) are compatible (at the 95% confidence level) with the
theoretical value for the first two distributions (Pareto and SE with c = 0.7)
as soon as the size p of the subsamples, from which the maxima are drawn,
is larger than 10. For the SE with fractional exponent c = 0.3, an average
value of ξ larger than 0.2 is obtained even for large subsample sizes (p = 200).
This value is reported to be significantly different from the theoretical value
ξ = 0.0. These results clearly show that the distribution of the maximum
drawn from a SE distribution with c = 0.7 converges quickly toward the the-
oretical asymptotic GEV distribution, while for c = 0.3 the convergence is
very slow. A fast convergence for c = 0.7 is not surprising since, for this value
of the fractional index c, the SE distribution remains close to the exponential
distribution, which is known to converge very quickly to the GEV distribu-
tion [220]. For c = 0.3, the SE distribution behaves, over a wide range, like
the power law (see page 59 hereafter for a theoretical formalization with an
exact embedding of the power law into the SE family). Thus, it is not sur-
prising to obtain an estimate of ξ which remains significantly positive for SE
distributions with small exponents c’s.
Overall, the results reported in [329] are slightly better for the maximum
likelihood estimates obtained from the GPD. Indeed, the bias observed for
the SE with c = 0.3 seems smaller for large quantiles than the smallest biases
reached by the GEV method. Thus, it appears that the distribution of ex-
ceedances converges faster to its asymptotic distribution than the distribution
of the maximum. However, while in line with the theoretical values, the standard
deviations are found to be almost always larger than in the previous case,
which testifies to the higher variability of this estimator. Thus, for samples
of size 10,000 or so – a typical size for most financial samples – the GEV and
GPD maximum likelihood estimates should be handled with care and their
results interpreted with caution due to possibly important bias and statistical
fluctuations. If a small value of ξ seems to allow one to reliably conclude in
favor of a rapidly varying distribution, a positive estimate does not appear
informative, and in particular does not allow one to reject the rapidly varying
behavior of a distribution. Pickands’ estimator does not perform better, in so
far as it is also unable to distinguish between a regularly varying distribution
and a SE with a low fractional exponent [329].
As another example illustrating the very slow convergence to the limit
distributions of the extreme value theory mentioned above, even with very
large samples, let us consider a simulated sample of iid random variables (we
thus fulfill the most basic assumption of extreme values theory, i.e, iid-ness)

with Weibull distribution defined by

F (x) = 1 − exp [−(x/d)^c ] , (2.22)
with parameter set (c > 0, d > 0), for x ≥ 0. This distribution belongs
to the class of Stretched-Exponential distributions when the exponent c is
smaller than one, namely when the distribution decays more slowly than an
exponential distribution (but still faster than any power law). We consider
two values for the exponent of the Weibull distribution: c = 0.7 and c = 0.3,
with d = 1. Theoretically, using for instance the GPD of exceedances should
give estimated values of ξ close to zero in the limit of large N , since the SE
distribution belongs to the basin of attraction of the Gumbel distribution. In
order to use the GPD, we construct the conditional Weibull distribution under
the condition X > Uk , k = 1, . . . , 15, where the thresholds Uk are chosen as:
U1 = 0.1; U2 = 0.3; U3 = 1; U4 = 3; U5 = 10; U6 = 30; U7 = 100; U8 =
300; U9 = 1000; U10 = 3000; U11 = 10^4 ; U12 = 3 · 10^4 ; U13 = 10^5 ; U14 =
3 · 10^5 and U15 = 10^6 .
For each simulation, the size of the sample above a given threshold Uk is
set equal to 50,000 in order to get small standard deviations. The maximum-
likelihood estimates of the GPD form parameter ξ are shown in Fig. 2.4 as a
function of the index k of Uk . For c = 0.7, the threshold U7 gives an estimate
ξ = 0.0123 with standard deviation equal to 0.0045, i.e., the estimate for ξ
differs significantly from zero (recall that ξ = 0 is the theoretical limit value).

[Figure 2.4 is a plot of the maximum likelihood estimate of ξ (y-axis, from −0.2 to 1.8) versus the index k of the lower threshold (x-axis, from 0 to 16), with one curve for c = 0.3 (upper) and one for c = 0.7 (lower)]
Fig. 2.4. Maximum likelihood estimates of the GPD form parameter ξ in (2.15)
as a function of the index k of the thresholds Uk defined in the text for stretched-
exponential samples of size 50,000 and their 95% confidence interval. Reproduced
from [329]
2.3 Constraints from Extreme Value Theory 51

Stronger deviations from the correct value ξ = 0 are found for the smaller
thresholds U1 , ..., U6 while the discrepancy abates for larger thresholds Uk ’s
for k > 7. These results occur notwithstanding the huge size of the implied
data set; indeed, the probability Pr (X > U7) for c = 0.7 is about 10^-9, so
that in order to obtain a data set of conditional samples from an unconditional
data set of the size studied here (50,000 realizations above U7 ), the size of such
an unconditional sample should be approximately 10^9 times larger than the
number of “peaks over threshold.” It is practically impossible to have such
a sample. For c = 0.3, the convergence to the theoretical value zero is much
slower and the discrepancy with the correct value ξ = 0 remains even for
the largest financial data sets: for a single asset, the largest data sets, drawn
from high frequency data, are no larger than or of the order of one million
points;11 the situation does not improve for data sets one or two orders of
magnitudes larger as considered in [211], obtained by aggregating thousands
of stocks.12 Thus, although the GPD form parameter should be theoretically
zero in the limit of a large sample for the Weibull distribution, this limit cannot
be reached for any available sample sizes. This is another clear illustration that
a rapidly varying distribution, like the SE distribution, can be mistaken for a
regularly varying distribution for any practical applications.

2.3.3 Can Long Memory Processes Lead to Misleading Measures of Extreme Properties?

As we already mentioned, Kearns and Pagan [267] have reported how misleading Hill's and Pickands' estimators can be in the presence of dependence in the data. Focusing on IGARCH processes, they show that the estimated standard
deviations of these estimators increase significantly with respect to the theo-
retical standard deviations derived under the iid assumption. They also find
an important bias. Generalizing these results, the study by Malevergne et al.
[329] shows that the presence of simple Markovian time dependences is suffi-
cient to draw erroneous conclusions from GEV or GPD maximum likelihood
estimates and Pickands estimates as well. Considering Markovian processes
with different stationary distributions including a regularly varying distribu-
tion with the tail index b = 3 and two SEs with fractional exponents c = 0.3
and c = 0.7, they report the presence of a significant downward bias (with
respect to the iid case) in almost every situation for the GPD estimates: the
stronger the dependence (measured by the correlation time varying from 20
to 100), the more important is the bias. At the same time, the empirical val-
ues of the standard deviations remain comparable with those obtained for iid
11
One year of data sampled at the 1-minute time scale gives approximately 1.2 · 10^5
data points.
12
In this case, another issue arises concerning the fact that the aggregation of
returns from different assets may distort the information and the very structure
of the tails of the probability density functions (pdf), if they exhibit some intrinsic
variability [351].

data. The downward bias can be ascribed to the dependence between data.
Indeed, positive dependence yields important clustering of extremes and ac-
cumulation of realizations around some values, which – for small samples –
could (misleadingly) appear as the consequence of the compactness of the
support of the underlying distribution. In other words, for finite samples, the
dependence prevents the full exploration of the tails and creates clusters that
mimic a thinner tail (even if the clusters are all occurring at large values since
the range of exploration of the tail controls the value of ξ).
The situation is different for the GEV estimates which exhibit biases which
can be either upward or downward (with respect to the iid case). For the GEV
estimates, two effects are competing. On the one hand, the dependence cre-
ates a downward bias, as explained above, while, on the other hand, the lack
of convergence of the distribution of maxima toward its GEV asymptotic dis-
tribution results in an upward bias, as observed on iid data (see the previous
section). This last phenomenon is strengthened by the existence of time de-
pendence which leads to decrease the “effective” sample size (the actual size
divided by the correlation time of the time series) and thus slows down the
convergence rate toward the asymptotic distribution even more. Interestingly,
both the GEV and GPD estimators for the Pareto distribution may be utterly wrong in the presence of long-range dependence, for any cluster size.
The same kind of results is reported for Pickands' estimator. However, the estimated standard deviations reported in [329] remain of the same order as the theoretical ones, contrary to the results reported in [267] for IGARCH processes. Nonetheless, in both studies, a very significant bias, either positive or negative, is found, which can lead one to misclassify an SE distribution as a regularly varying distribution. Thus, in the presence of dependence, Pickands' estimator becomes unreliable.
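For concreteness, here is a minimal sketch of Pickands' estimator (2.19), built from upper order statistics; the function name and the synthetic Pareto test case are our own choices. Under iid sampling it behaves as expected, while the discussion above describes how dependence degrades it.

```python
# Pickands' estimator of the tail index xi, built from the k-th, 2k-th and
# 4k-th upper order statistics (a sketch; the function name is ours).
import numpy as np

def pickands(sample, k):
    xs = np.sort(sample)[::-1]            # descending order statistics
    num = xs[k - 1] - xs[2 * k - 1]
    den = xs[2 * k - 1] - xs[4 * k - 1]
    return np.log(num / den) / np.log(2.0)

rng = np.random.default_rng(1)
b = 3.0                                    # Pareto tail index, so xi = 1/b
sample = rng.pareto(b, size=200_000) + 1.0 # classical Pareto, support x >= 1
xi_hat = pickands(sample, k=2_000)
print(xi_hat)                              # close to 1/3 for iid data
```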
To summarize, the determination of the maximum domain of attraction
with usual estimators does not appear to be a very efficient way to study the
extreme properties of financial time series. Many studies on the tail behav-
ior of the distributions of asset returns have focused on these methods (see
the influential study [312] for instance) and may thus have led to spurious
conclusions. In particular, the fact that rapidly varying distribution functions
may be mistaken for regularly varying distribution functions casts doubts on
the strength of the seeming consensus according to which the distributions of
returns are regularly varying. It also casts doubts on the reliability of EVT
for risk assessment. If an accurate estimation of the shape parameter ξ is
so difficult to reach, how can one hope to obtain trustful estimates of the
Value-at-Risk or expected-shortfall by use of EVT?

2.3.4 GEV and GPD Estimators of the Distributions of Returns of the Dow Jones and Nasdaq Indices

As an illustration, let us apply the GEV and GPD estimators to the daily
returns of the Dow Jones Industrial Average Index over the last century and

Fig. 2.5. Daily returns of the Dow Jones Industrial Average Index from 1900 to
2000 (left panel ) and 5-minute returns of the Nasdaq Composite index over 1 year
from April 1997 to May 1998 (right panel )

to the 5-minute returns of the Nasdaq Composite index over 1 year from April
1997 to May 1998. These two time series are depicted on Fig. 2.5.
For the intraday Nasdaq data, there are two caveats that must be addressed before any estimation can be made. First, in order to remove the effect of overnight price jumps, the intraday returns have to be determined separately for each of the 289 days contained in the Nasdaq data. The union of these 289 return data sets then provides the global return data set. Second, the volatility of intraday data is known to exhibit a U-shape, also called
“lunch effect”, that is, an abnormally high volatility at the beginning and the
end of the trading day compared with a low volatility at the approximate time
of lunch. Such an effect is present in this data set and it is desirable to cor-
rect it. Such a correction has been performed by renormalizing the 5-minute
returns at a given instant of the trading day by the corresponding average
absolute return at the same instant (when the average is performed over the
289 days). We shall refer to this time series as the corrected Nasdaq returns in
contrast with the raw (incorrect) Nasdaq returns and we shall examine both
data sets for comparison.
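The correction can be sketched as follows on synthetic data; the array shape, the number of intraday slots and the U-shaped profile below are illustrative assumptions, not the actual Nasdaq data.

```python
# Sketch of the "lunch effect" correction: divide each 5-minute return by
# the average absolute return observed at the same intraday time slot.
import numpy as np

rng = np.random.default_rng(2)
n_days, n_slots = 289, 78                # 289 days, 78 five-minute slots

# Synthetic intraday returns with a U-shaped volatility profile
t = np.linspace(0.0, 1.0, n_slots)
u_shape = 1.0 + 2.0 * (t - 0.5) ** 2     # high at open and close, low at lunch
raw = rng.standard_normal((n_days, n_slots)) * u_shape

# Renormalize each slot by its cross-day average absolute return
avg_abs = np.abs(raw).mean(axis=0)       # one scale factor per intraday slot
corrected = raw / avg_abs

# The corrected returns have a flat intraday volatility profile by construction
print(np.abs(corrected).mean(axis=0))
```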
The daily returns of the Dow Jones also exhibit some non-stationarity.
Indeed, one can observe a clear excess volatility roughly covering the time of
the bubble ending in the October 1929 crash followed by the Great Depres-
sion. To investigate the influence of this non-stationarity, the statistical study
presented below has been performed twice: first with the entire sample, and
then after having removed the period from 1927 to 1936 from the sample. The
results are somewhat different, but on the whole, the conclusions about the
nature of the tail are the same.
Although the distributions of positive and negative returns are known to be
very similar (see for instance [256]), we have chosen to treat them separately.
For the Dow Jones, this gives us 14949 positive and 13464 negative data points
while, for the Nasdaq index, we have 11241 positive and 10751 negative data
points.

Given these precautionary measures, the analysis of the previous section has been applied to the Dow Jones and Nasdaq (raw and corrected) returns. In order to estimate the standard deviations of Pickands' estimator for
the GPD derived from the upper quantiles of these distributions, and of the
Maximum Likelihood estimators of the distribution of the maximum and of
the GPD, we have randomly generated 1000 subsamples, each subsample consisting of 10,000 data points in the positive or negative parts of the
samples respectively (with replacement). It should be noted that the Maxi-
mum Likelihood estimates themselves were derived from the full samples. The
results are given in Tables 2.2 and 2.3.
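The resampling scheme can be sketched as follows; this is our own illustration on synthetic GPD data, and the number of subsamples is reduced from the 1000 used in the text to keep the run fast.

```python
# Bootstrap estimate of the standard deviation of the GPD maximum
# likelihood shape estimate: subsamples of 10,000 points drawn with
# replacement, each refitted separately.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
returns = genpareto.rvs(0.3, scale=1.0, size=15_000, random_state=rng)

estimates = []
for _ in range(100):                       # 1000 subsamples in the text
    sub = rng.choice(returns, size=10_000, replace=True)
    xi, _, _ = genpareto.fit(sub, floc=0.0)
    estimates.append(xi)

emp_std = float(np.std(estimates))
print(f"bootstrap standard deviation of xi: {emp_std:.4f}")
```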
These results confirm the difficulties in obtaining a clear conclusion con-
cerning the nature of the tail behavior of the distributions of returns. In
particular, it seems impossible to exclude a rapidly varying behavior of their
tails. Even the estimations obtained with the maximum likelihood of the GPD
tail index do not allow one to reject clearly the hypothesis that the tails of the
empirical distributions of returns are rapidly varying, in particular for large
quantile values. For the Nasdaq data set, accounting for the lunch effect does
not yield any significant change in the estimations.

2.4 Fitting Distributions of Returns with Parametric Densities
Since it is particularly difficult to conclude with enough certainty on the reg-
ularly or rapidly varying behavior of the tails of distributions of asset returns
by using the nonparametric methods of the extreme value theory, it may be
more appropriate to consider a parametric approach. However, in order to
avoid – or at least to lower – the risk of misspecification inherent in any para-
metric approach, it is mandatory to use models as versatile as possible. In
particular, it is necessary to consider models which encompass both regularly
and rapidly varying distributions. Many examples of such models have been
described in the literature, such as the generalized t-distribution of McDonald and Newey [353] or the q-exponential and q-Weibull distributions [385].
In the remainder of this section, relying on the results presented in [330],
we introduce two versatile families to characterize the behavior of the tails of
asset return distributions. The implications of the choice of these parametric
families for the assessment of tail risk will be discussed at the end of this
chapter.

2.4.1 Definition of Two Parametric Families


A General Three-Parameter Family of Distributions

We consider a general three-parameter family of distributions and its particular restrictions corresponding to fixed values of one or two of its parameters. This family is defined by its density function given by:

Table 2.2. Mean values and standard deviations of the maximum likelihood esti-
mates of the parameter ξ for the distribution of the maximum (cf. (2.6)) when data
are grouped in samples of size 20, 40, 200, and 400 and for the generalized Pareto
distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99%, and
99.5%

(a) Dow Jones


Positive Tail Negative Tail
GEV GEV
cluster 20 40 200 400 cluster 20 40 200 400
ξ 0.273 0.280 0.304 0.322 ξ 0.262 0.295 0.358 0.349
Emp Std 0.029 0.039 0.085 0.115 Emp Std 0.030 0.045 0.103 0.143

GPD GPD
quantile 0.9 0.95 0.99 0.995 quantile 0.9 0.95 0.99 0.995
ξ 0.248 0.247 0.174 0.349 ξ 0.214 0.204 0.250 0.345
Emp Std 0.036 0.053 0.112 0.194 Emp Std 0.041 0.062 0.156 0.223
Theor Std 0.032 0.046 0.096 0.156 Theor Std 0.033 0.046 0.108 0.164

(b) Nasdaq (raw data)


GEV GEV
cluster 20 40 200 400 cluster 20 40 200 400
ξ 0.209 0.193 0.388 0.516 ξ 0.191 0.175 0.292 0.307
Emp Std 0.031 0.115 0.090 0.114 Emp Std 0.030 0.038 0.094 0.162

GPD GPD
quantile 0.9 0.95 0.99 0.995 quantile 0.9 0.95 0.99 0.995
ξ 0.200 0.289 0.389 0.470 ξ 0.143 0.202 0.229 0.242
Emp Std 0.040 0.058 0.120 0.305 Emp Std 0.040 0.057 0.143 0.205
Theor Std 0.036 0.054 0.131 0.196 Theor Std 0.035 0.052 0.118 0.169

(c) Nasdaq (corrected data)


GEV GEV
cluster 20 40 200 400 cluster 20 40 200 400
ξ 0.090 0.175 0.266 0.405 ξ 0.099 0.132 0.138 0.266
Emp Std 0.029 0.039 0.085 0.187 Emp Std 0.030 0.041 0.079 0.197

GPD GPD
quantile 0.9 0.95 0.99 0.995 quantile 0.9 0.95 0.99 0.995
ξ 0.209 0.229 0.307 0.344 ξ 0.165 0.160 0.210 0.054
Emp Std 0.039 0.052 0.111 0.192 Emp Std 0.039 0.052 0.150 0.209
Theor Std 0.036 0.052 0.123 0.180 Theor Std 0.036 0.050 0.116 0.143

Panel (a) gives the results for the Dow Jones index, panel (b) for the raw Nas-
daq index, and in panel (c) for the Nasdaq index corrected for the “lunch effect.”
Reproduced from [329]

Table 2.3. Pickand’s estimates (2.19) of the parameter ξ for the generalized Pareto
distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99% and
99.5% and two different values of the ratio N/k respectively equal to 4 and 10

(a) Dow Jones


Negative Tail Positive Tail
quantile 0.9 0.95 0.99 0.995 quantile 0.9 0.95 0.99 0.995
N/k 4 N/k 4
ξ 0.2314 0.2944 –0.1115 0.3314 ξ 0.2419 0.4051 –0.3752 0.5516
emp. Std 0.1073 0.1550 0.3897 0.6712 emp. Std 0.0915 0.1274 0.3474 0.5416
th. Std 0.1176 0.1680 0.3563 0.5344 th. Std 0.1178 0.1712 0.3497 0.5562

N/k 10 N/k 10
ξ 0.3119 0.0890 –0.3452 0.9413 ξ 0.3462 0.3215 0.9111 –0.3873
emp. Std 0.1523 0.2219 0.8294 1.1352 emp. Std 0.1766 0.1929 0.6983 1.6038
th. Std 0.1883 0.2577 0.5537 0.9549 th. Std 0.1894 0.2668 0.6706 0.7816

(b) Nasdaq (raw data)


N/k 4 N/k 4
ξ 0.0493 0.0539 –0.0095 0.4559 ξ 0.0238 0.1511 0.1745 1.1052
emp. Std 0.1129 0.1928 0.4393 0.6205 emp. Std 0.1003 0.1599 0.4980 0.6180
th. Std 0.1147 0.1623 0.3601 0.5462 th. Std 0.1143 0.1644 0.3688 0.6272

N/k 10 N/k 10
ξ 0.2623 0.1583 –0.8781 0.8855 ξ 0.2885 0.1435 1.3734 –0.8395
emp. Std 0.1940 0.3085 0.9126 1.5711 emp. Std 0.2166 0.3220 0.7359 1.5087
th. Std 0.1868 0.2602 0.5543 0.9430 th. Std 0.1876 0.2596 0.7479 0.7824

(c) Nasdaq (Corrected data)


N/k 4 N/k 4
ξ 0.2179 0.0265 0.3977 0.1073 ξ 0.2545 –0.0402 –0.0912 1.3915
emp. Std 0.1211 0.1491 0.4585 0.7206 emp. Std 0.1082 0.1643 0.4317 0.6220
th. Std 0.1174 0.1617 0.3822 0.5167 th. Std 0.1180 0.1605 0.3570 0.6720

N/k 10 N/k 10
ξ –0.0878 0.4619 0.0329 0.3742 ξ 0.0877 0.3907 1.4680 0.1098
emp. Std 0.1882 0.2728 0.7561 1.1948 emp. Std 0.1935 0.2495 0.8045 1.2345
th. Std 0.1786 0.2734 0.5722 0.8512 th. Std 0.1822 0.2699 0.7655 0.8172

Panel (a) gives the results for the Dow Jones, panel (b) for the raw Nasdaq data and
panel (c) for the Nasdaq corrected for the “lunch effect.” Reproduced from [329]
fu(x|b, c, d) = A(b, c, d, u) x^(−(b+1)) exp[−(x/d)^c]   if x ≥ u > 0 ,
fu(x|b, c, d) = 0   if x < u .    (2.23)

Here, b, c, and d are unknown parameters, u is a known lower threshold that will be varied for the purposes of analysis, and A(b, c, d, u) is a normalizing constant given by the expression:

   A(b, c, d, u) = d^b c / Γ(−b/c, (u/d)^c) ,    (2.24)
where Γ(a, x) denotes the (non-normalized) incomplete Gamma function:

   Γ(a, x) = ∫_x^∞ t^(a−1) e^(−t) dt .    (2.25)

The parameter b ranges from minus infinity to infinity while c and d range
from zero to infinity. In the particular case where c = 0, the parameter b also
needs to be positive to ensure the normalization of the probability density
function. The family (2.23) includes several well-known pdfs often used in
different applications. We enumerate them.
1. The Pareto distribution:

      Fu(x) = 1 − (u/x)^b ,    (2.26)

   which corresponds to the set of parameters (b > 0, c = 0) with
   A(b, c, d, u) = b · u^b. Several works have attempted to derive or justify the
existence of a power tail of the distribution of returns from agent-based
models [91], from optimal trading of large funds with sizes distributed ac-
cording to the Zipf law, as recalled in Sect. 2.2.2, or from ad hoc stochastic
processes [55, 445].
2. The Weibull distribution:

      Fu(x) = 1 − exp[−(x/d)^c + (u/d)^c] ,    (2.27)

   with parameter set (b = −c, c > 0, d > 0) and normalization constant
   A(b, c, d, u) = (c/d^c) exp[(u/d)^c]. Recall that this distribution is said to be a
Stretched-Exponential distribution when the exponent c is smaller than
one, namely when the distribution decays more slowly than an exponential
distribution. Stationary distributions exhibiting this kind of tails arise, for
instance, from the so called α-ARCH processes introduced in [132].
From a theoretical viewpoint, this class of distributions is motivated in
part by the fact that the large deviations of multiplicative processes
are generically distributed with Stretched-Exponential distributions [191].
Stretched-Exponential distributions are also parsimonious examples of the
important subset of subexponentials, that is, of the general class of distri-
butions decaying slower than an exponential [487]. This class of subexpo-
nentials share several important properties of heavy-tailed distributions
58 2 Marginal Distributions of Returns

[146], not shared by exponentials or distributions decreasing faster than exponentials: for instance, they have “fat tails” in the sense of the asymptotic probability weight of the maximum compared with the sum of large samples [167] (see also [451], Chaps. 1 and 6).
Notwithstanding their fat-tailness, SE distributions have all their mo-
ments finite,13 in contrast with regularly varying distributions for which
moments of order equal to or larger than the tail index b are not defined.
This property may provide a substantial advantage to exploit in general-
izations of the mean-variance portfolio theory using higher-order moments
(see for instance [6, 162, 241, 259, 333, 421, 453] among many others). In
addition, the existence of all moments is an important property allowing
for an efficient estimation of any high-order moment, since it ensures that
the estimators are asymptotically Gaussian. In particular, for Stretched-
Exponentially distributed random variables, the variance, skewness and
kurtosis can be accurately estimated, contrarily to random variables with
regularly varying distribution with tail index in the range 3–5 [356].
3. The Exponential distribution:

      Fu(x) = 1 − exp(−x/d + u/d) ,    (2.28)

   with parameter set (b = −1, c = 1, d > 0) and normalization constant
   A(b, c, d, u) = (1/d) exp(u/d). For sufficiently high quantiles, the exponential
behavior can, for instance, derive from the hyperbolic model introduced
by Eberlein et al. [140] or from a simple model where stock price dynam-
ics is governed by a diffusion with stochastic volatility. Dragulescu and
Yakovenko [136] have found an excellent fit of the Dow Jones index for
time lags from 1 to 250 trading days with a model exhibiting an asymp-
totic exponential tail of the distribution of log-returns.
4. The incomplete Gamma distribution:

      Fu(x) = 1 − Γ(−b, x/d) / Γ(−b, u/d) ,    (2.29)

   with parameter set (b, c = 1, d > 0) and normalization
   A(b, c, d, u) = d^b / Γ(−b, u/d). Such an asymptotic tail behavior can, for instance, be observed
for the generalized hyperbolic models, whose description can be found in
[393].
The Pareto distribution (PD) and Exponential distribution (ED) are
one-parameter families, whereas the Weibull/Stretched-exponential (SE) and
the incomplete Gamma distribution (IG) are two-parameter families. The
comprehensive distribution (CD) given by (2.23) contains three unknown
parameters.
13
However, they do not admit an exponential moment, which leads to problems in
the reconstruction of the distribution from the knowledge of their moments [465].
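As a sanity check of (2.23)-(2.24), the normalization can be verified numerically; in the sketch below (our own code, with arbitrary illustrative parameter values), the incomplete Gamma function is computed by direct quadrature since its first argument −b/c may be negative.

```python
# Numerical check that A(b, c, d, u) of (2.24) normalizes the density (2.23).
import numpy as np
from scipy.integrate import quad

def inc_gamma(a, x):
    # Non-normalized upper incomplete Gamma function, valid also for a < 0
    val, _ = quad(lambda t: t ** (a - 1.0) * np.exp(-t), x, np.inf)
    return val

b, c, d, u = 1.5, 0.7, 1.0, 0.5            # arbitrary illustrative parameters
A = d ** b * c / inc_gamma(-b / c, (u / d) ** c)

density = lambda x: A * x ** (-(b + 1.0)) * np.exp(-(x / d) ** c)
total, _ = quad(density, u, np.inf)
print(total)                               # very close to 1
```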

Links between these different models reveal themselves under specific asymptotic conditions. Very interesting is the behavior of the SE model when c → 0 and u > 0. In this limit, and provided that

   c · (u/d)^c → β ,  as c → 0 ,    (2.30)
where β is a positive constant, the SE model tends to the Pareto model.
Indeed, we can write

   (c/d^c) x^(c−1) exp[−(x^c − u^c)/d^c]
      = c (u/d)^c · x^(−1) (x/u)^c · exp{−c (u/d)^c · [(x/u)^c − 1]/c}
      ≃ β · x^(−1) exp[−c (u/d)^c · ln(x/u)] ,  as c → 0 ,
      ≃ β · x^(−1) exp[−β · ln(x/u)]
      = β u^β / x^(β+1) ,    (2.31)
which is the pdf of the Pareto model with tail index β. The condition (2.30)
comes naturally from the properties of the maximum likelihood estimator of
the scale parameter d given by (2.B.53) in Appendix 2.B. It implies that, as
c → 0, the characteristic scale d of the SE model must also go to zero with c
to ensure the convergence of the SE model toward the Pareto model.
The Pareto model with exponent β can therefore be approximated with any desired accuracy on any finite interval [u, U], U > u > 0, by the SE model with parameters (c, d) satisfying c (u/d)^c = β (cf. (2.30), where the arrow is replaced by an equality). Although the value c = 0 does not give,
strictly speaking, a SE distribution, the limit c −→ 0 provides any desired ap-
proximation to the Pareto distribution, uniformly on any finite interval [u, U ].
This deep relationship between the SE and PD models allows us to under-
stand why it can be very difficult to decide, on a statistical basis, which of
these models fits the data best.
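This near-degeneracy is easily seen numerically. The sketch below (our own code, with illustrative parameter values) compares the SE survival function above u, with d tied to c through c(u/d)^c = β, to the Pareto survival function (u/x)^β.

```python
# Numerical illustration of the SE -> Pareto limit (2.30)-(2.31).
import numpy as np

beta, u = 1.5, 1.0
c = 0.01                                   # small SE exponent
d = u / (beta / c) ** (1.0 / c)            # solves c * (u/d)^c = beta

x = np.linspace(u, 10.0 * u, 200)
se_tail = np.exp(-(x / d) ** c + (u / d) ** c)   # SE survival above u, cf. (2.27)
pareto_tail = (u / x) ** beta                    # Pareto survival above u, cf. (2.26)

print(np.max(np.abs(se_tail - pareto_tail)))     # uniformly small on [u, 10u]
```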
Another interesting behavior is obtained in the limit b → +∞, where the Pareto model tends to the exponential model [72]. Indeed, provided that the scale parameter u of the power law is simultaneously scaled as u^b = (b/α)^b, we can write the tail of the cumulative distribution function of the PD as u^b/(u + x)^b, which is indeed of the form u^b/x^b for large x. Then,

   u^b/(u + x)^b = (1 + αx/b)^(−b) → exp(−αx)  for b → +∞ .    (2.32)
This shows that the exponential model can be approximated with any desired
accuracy on intervals [u, u + A] by the PD model with parameters (b, u) satisfying u^b = (b/α)^b, for any positive constant A. Although the value b →
+∞ does not give, strictly speaking, an exponential distribution, the limit u ∝
b −→ +∞ provides any desired approximation to the exponential distribution,
uniformly on any finite interval [u, u + A]. This limit is thus less general than

the SE → PD limit since it is valid only asymptotically for u −→ +∞, while u can be finite in the SE → PD limit.
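The PD → ED convergence (2.32) can be illustrated in the same spirit (our own sketch, with arbitrary values of α and b):

```python
# Numerical illustration of the Pareto -> Exponential limit (2.32):
# with u = b/alpha, the Pareto tail u^b/(u+x)^b approaches exp(-alpha*x).
import numpy as np

alpha = 1.0
x = np.linspace(0.0, 5.0, 100)
errs = []
for b in (10.0, 100.0, 1000.0):
    pareto_tail = (1.0 + alpha * x / b) ** (-b)   # = u^b/(u+x)^b with u = b/alpha
    err = float(np.max(np.abs(pareto_tail - np.exp(-alpha * x))))
    errs.append(err)
    print(b, err)                                  # error shrinks as b grows
```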

The Log-Weibull Family of Distributions

Another interesting family is the two-parameter log-Weibull family:

   Fu(x) = 1 − exp[−b (ln(x/u))^c] ,  for x ≥ u ,    (2.33)

whose density is

   fu(x|b, c) = (b · c / x) (ln(x/u))^(c−1) exp[−b (ln(x/u))^c]   if x ≥ u > 0 ,
   fu(x|b, c) = 0   if x < u .    (2.34)

This family of pdfs interpolates smoothly between the SE and the Pareto
classes. It recovers the Pareto family for c = 1, in which case the parameter b
is the tail exponent. For c larger than one, the tail of the log-Weibull is thinner
than any Pareto distribution but heavier than any Stretched-Exponential.14 In
particular, when c equals two, the log-normal distribution is retrieved (above
threshold u). For c smaller than one, the tails of the log-Weibull distributions
are even heavier than any regularly varying distribution. It is interesting to
note that in this case the log-Weibull distributions do not belong to the domain of attraction of a law of the maximum. Therefore, the standard extreme value theory cannot apply to such distributions. If it appeared that the log-Weibull distributions with an index c < 1 provide a reasonable description of the tails of distributions of returns, this would mean that risk management methods based upon EVT are particularly unreliable (see below).
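Sampling from (2.33) is straightforward by inversion, which also makes the nesting of the family explicit: for c = 1 one recovers exactly a Pareto sample (our own sketch; the function name is ours).

```python
# Inverse-CDF sampling from the log-Weibull family (2.33); for c = 1 the
# sample is Pareto with tail exponent b, i.e. P(X > x) = (u/x)^b.
import numpy as np

def log_weibull_rvs(b, c, u, size, rng):
    v = rng.uniform(size=size)
    return u * np.exp((-np.log1p(-v) / b) ** (1.0 / c))

rng = np.random.default_rng(4)
b, u = 3.0, 1.0
x = log_weibull_rvs(b, c=1.0, u=u, size=100_000, rng=rng)

emp = float(np.mean(x > 2.0 * u))          # empirical tail at x = 2u
print(emp, (u / (2.0 * u)) ** b)           # both close to 1/8
```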

2.4.2 Parameter Estimation Using Maximum Likelihood and Anderson-Darling Distance

It is instructive to fit the two data sets used in Sect. 2.3.4 – i.e. the Dow Jones
daily returns and the Nasdaq 5-minute returns – in addition to a sample of
returns of the Standard & Poor’s 50015 over the two decades 1980–1999 by the
distributions enumerated above (2.23), (2.26–2.29) and (2.34). We will show
that no single parametric representation among any of the cited pdfs fits the
whole range of the data sets. Positive and negative returns will be analyzed
separately, the later being converted to the positive semi-axis. The analysis
14
A generalization of the log-Weibull distributions to the following three-parameter
family also contains the SE family in some formal limit. Consider indeed 1 −
F(x) = exp(−b (ln(1 + x/D))^c) for x > 0, which has the same tail as expression
(2.33). Taking D → +∞ together with b = (D/d)^c with d finite yields 1 − F(x) =
exp(−(x/d)^c).
15
The returns on the Standard & Poor’s 500 are calculated at five different time
scales: 1 minute, 5 minutes, 30 minutes, an hour and 1 day.

uses a movable lower threshold u, restricting by this threshold the study to the observations satisfying the condition x > u.
In addition to estimating the parameters involved in each representation
(2.23, 2.26–2.29, 2.34) by maximum likelihood16 for each particular threshold
u, it is important to characterize the goodness-of-fit. There are many mea-
sures of goodness-of-fit; a natural class consists in the distances between the
estimated distribution and the sample distribution. Many distances can be
used: mean-squared error, Kullback-Leibler distance,17 Kolmogorov distance,
Sherman distance (as in [312]) or Anderson-Darling distance, to cite a few.
The parameters of each pdf can also be determined according to the criterion
of minimizing the distance between the estimated distribution and the sample
distribution. The chosen distance is thus useful both for characterizing and
for estimating the parametric pdf. In this case, once an estimation of the pa-
rameters of a particular distribution family has been obtained according to
the selected distance, the quantification of the statistical significance of the
fit requires to derive the statistics associated with the chosen distance. These
statistics are known for most of the examples cited above, in the limit of large
sample.
In addition to the maximum likelihood method (which is associated as we
said with the Kullback-Leibler distance), it is instructive to use the Anderson-
Darling distance to estimate the parameters and perform the tests of goodness-
of-fit. The Anderson-Darling distance between a theoretical distribution func-
tion F (x) and its empirical analog FN (x), estimated from a sample of N
realizations, is defined by

   ADS = N · ∫ [FN(x) − F(x)]^2 / [F(x)(1 − F(x))] dF(x)    (2.35)

and evaluated as

   ADS = −N − 2 Σ_{k=1}^{N} { wk log(F(xk)) + (1 − wk) log(1 − F(xk)) } ,    (2.36)

where wk = 2k/(2N + 1), k = 1 . . . N and x1 ≤ . . . ≤ xN is its ordered sample. If the sample is drawn from a population with distribution function
F (x), the Anderson-Darling statistic (ADS) has a standard AD-distribution
free of the theoretical distribution function F(x) [11], similarly to the χ^2 distribution for the χ^2-statistic, or the Kolmogorov distribution for the Kolmogorov statistic. It should be noted that the ADS weights [FN(x) − F(x)]^2 in (2.35) by N/[F(x)(1 − F(x))], which is nothing but the inverse of its variance. Thus, the
AD distance emphasizes more the tails of the distribution than, say, the Kol-
mogorov distance which is determined by the maximum absolute deviation of
16
The estimators and their asymptotic properties are summarized in Appendix 2.B.
17
This distance (or divergence) is the natural distance associated with the maxi-
mum likelihood estimation since it is for the maximum likelihood values that the
distance between the true model and the assumed model reaches its minimum.

FN(x) from F(x) or the mean-squared error, which is mostly controlled by the middle range of the distribution.
Since we have to insert the estimated parameters into the ADS, this statis-
tic does not obey any more the standard AD-distribution: the ADS decreases
because the use of the fitting parameters ensures a better fit to the sample dis-
tribution (we will come back later, with more details, on this topic in Chap. 5).
However, we can still use the standard quantiles of the AD-distribution as up-
per boundaries of the ADS. If the observed ADS is larger than the standard
quantile with a high significance level (1 − ε), we can then conclude that the
null hypothesis F (x) is rejected with a significance level larger than (1 − ε). If
one wishes to estimate the real significance level of the ADS in the case where
it does not exceed the standard quantile of a high significance level, one is
forced to use some other method, such as the bootstrap method.
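The statistic (2.36) itself is one line of code. The sketch below (our own code, on synthetic exponential data with illustrative parameters) evaluates it for a correctly specified and for a misspecified model:

```python
# Anderson-Darling statistic as evaluated in (2.36), with w_k = 2k/(2N+1).
import numpy as np
from scipy.stats import expon

def anderson_darling_stat(sample, cdf):
    xs = np.sort(sample)                    # ordered sample x_1 <= ... <= x_N
    n = len(xs)
    k = np.arange(1, n + 1)
    w = 2.0 * k / (2.0 * n + 1.0)
    F = cdf(xs)
    return -n - 2.0 * np.sum(w * np.log(F) + (1.0 - w) * np.log(1.0 - F))

rng = np.random.default_rng(5)
sample = expon.rvs(size=5_000, random_state=rng)

ads_good = anderson_darling_stat(sample, expon.cdf)                      # true model
ads_bad = anderson_darling_stat(sample, lambda x: expon.cdf(x, scale=2.0))
print(ads_good, ads_bad)                    # small for the true model only
```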
In the following, the estimates minimizing the Anderson-Darling distance
will be referred to as AD-estimates. The maximum likelihood estimates (ML-
estimates) are asymptotically more efficient than AD-estimates for indepen-
dent data and under the condition that the null hypothesis (given by one of
the four distributions (2.26–2.29), for instance) corresponds to the true data-
generating model. When this is not the case, the AD-estimates can provide
a better practical tool for approximating sample distributions compared with
the ML-estimates. These estimates will be reported for the thresholds u(qk )
determined by the probability levels q1 = 0, q2 = 0.1, q3 = 0.2, q4 = 0.3,
q5 = 0.4, q6 = 0.5, q7 = 0.6, q8 = 0.7, q9 = 0.8, q10 = 0.9, q11 = 0.925,
q12 = 0.95, q13 = 0.96, q14 = 0.97, q15 = 0.98, q16 = 0.99, q17 = 0.9925,
q18 = 0.995, q19 = 0.999, q20 = 0.9995 and q21 = 0.9999.
Despite the fact that threshold u(qk ) varies from sample to sample, it
always corresponds to the same fixed probability level qk which allows one
to compare the goodness-of-fit for samples of different sizes. In the statistics
presented below, only subsamples with at least 100 data points or so are con-
sidered, in order to allow for a sufficiently accurate assessment of the quantile
under consideration.

2.4.3 Empirical Results on the Goodness-of-Fits

The Anderson-Darling statistics (ADS) for four parametric distributions (Weibull or Stretched-Exponential, Exponential, Pareto and Log-Weibull) are
shown in Table 2.4 for two quantile ranges, the first top half of the table cor-
responding to the 90% lowest thresholds while the second bottom half corre-
sponds to the 10% highest ones. For the lowest thresholds, the ADS rejects
all distributions at the 95% confidence level, except the SE for the nega-
tive tail of the Standard & Poor’s 500 for the 60-minute returns and for the
Nasdaq. Thus, none of the considered distributions is adequate to model the
data over such large ranges. For the 10% highest quantiles, the exponential
model is rejected at the 95% confidence level except for the negative tails of
the Dow Jones (daily returns) and the Nasdaq. The Log-Weibull and the SE
Table 2.4. Mean Anderson-Darling distances in the range of thresholds u(q1)–u(q9)
(90% lowest thresholds) and in the range u(qi) ≥ u(q10) (10% highest thresholds)

Mean AD-statistics for u1–u9
                 S&P 500 1 min                     S&P 500 5 min                  S&P 500 30 min
                 Pos. tail        Neg. tail        Pos. tail       Neg. tail      Pos. tail       Neg. tail
Weibull          292.85 (100%)    299.46 (100%)    36.62 (100%)    41.04 (100%)   7.36 (100%)     4.84 (100%)
Exponential      771.70 (100%)    718.56 (100%)    86.79 (100%)    108.17 (100%)  17.47 (100%)    16.36 (100%)
Pareto           23998.94 (100%)  23337.60 (100%)  6834.06 (100%)  6563.26 (100%) 1847.40 (100%)  1298.47 (100%)
Log-Weibull      1559.70 (100%)   1470.11 (100%)   360.18 (100%)   331.45 (100%)  60.03 (100%)    67.22 (100%)

Mean AD-statistics for ui ≥ u10
Weibull          6.80 (100%)      5.80 (100%)      1.81 (88%)      1.93 (90%)     0.67 (42%)      0.79 (51%)
Exponential      143.97 (100%)    136.66 (100%)    28.12 (100%)    30.88 (100%)   8.19 (100%)     9.75 (100%)
Pareto           19.97 (100%)     19.24 (100%)     8.10 (100%)     7.61 (100%)    1.63 (85%)      1.77 (88%)
Log-Weibull      3.60 (99%)       4.10 (99%)       1.20 (73%)      1.55 (84%)     0.64 (39%)      0.42 (17%)

Mean AD-statistics for u1–u9
                 S&P 500 60 min                    Nasdaq                         Dow Jones
                 Pos. tail        Neg. tail        Pos. tail       Neg. tail      Pos. tail       Neg. tail
Weibull          3.58 (99%)       2.36 (94%)       1.37 (80%)      0.85 (55%)     4.96 (100%)     3.86 (99%)
Exponential      8.12 (100%)      12.20 (100%)     5.41 (100%)     3.33 (98%)     16.48 (100%)    10.30 (100%)
Pareto           1001.68 (100%)   702.47 (100%)    475.00 (100%)   441.40 (100%)  691.30 (100%)   607.30 (100%)
Log-Weibull      34.44 (100%)     36.55 (100%)     35.90 (100%)    30.92 (100%)   32.30 (100%)    28.27 (100%)

Mean AD-statistics for ui ≥ u10
Weibull          0.66 (41%)       0.68 (42%)       0.67 (42%)      0.50 (29%)     0.38 (13%)      0.35 (10%)
Exponential      4.99 (100%)      4.89 (100%)      3.06 (97%)      1.97 (90%)     3.06 (97%)      1.89 (89%)
Pareto           1.12 (70%)       1.28 (76%)       1.30 (78%)      1.33 (78%)     0.78 (50%)      1.26 (75%)
Log-Weibull      0.48 (23%)       0.57 (32%)       0.46 (29%)      0.49 (30%)     0.38 (13%)      0.69 (43%)

The figures within parenthesis characterize the goodness of fit: they represent
the significance levels with which the considered model can be rejected. Note
that these significance levels are only lower bounds since one or two parameters
are fitted. Reproduced from [330]
2.4 Fitting Distributions of Returns with Parametric Densities 63
64 2 Marginal Distributions of Returns

distributions are the best since they are only rejected at the 1-minute time
scale for the Standard & Poor’s 500. The Pareto distribution provides a reli-
able description for time scales larger than or equal to 30 minutes. However,
it remains less accurate than the log-Weibull and the SE distributions, on
average. Overall, it can be noted that the Nasdaq and the 60-minute returns
of the Standard & Poor’s 500 behave very similarly. Let us now analyze each
distribution in more detail.

Pareto Distribution

Figures 2.3 and 2.6 show the complementary sample distribution functions
1 − FN (x) for the Standard & Poor’s 500 index at the 30-minute time scale
and for the daily Dow Jones Industrial Average index, respectively. In Fig. 2.6,
the mismatch between the Pareto distribution and the data can be seen with
the naked eye: even in the tails, one observes a continuous downward curvature
in the double logarithmic diagram, instead of a straight line as would be the
case if the distribution ultimately behaved like a Pareto law. To formalize this
impression, we calculate the ML and AD estimators for each threshold u. For
the Pareto law, the ML estimator is well known to agree with Hill’s estimator.
Indeed, denoting x1 ≤ . . . ≤ xNu the ordered subsample of values exceeding u
where Nu is the size of this subsample, the Hill maximum likelihood estimate
of the parameter b is [233]

[Figure 2.6 here: log–log plot of the complementary sample distribution function
(vertical axis, 10^0 down to 10^−5) versus the absolute log-return x
(horizontal axis, 10^−5 to 10^0).]
Fig. 2.6. Complementary sample distribution function for the daily returns of the
Dow Jones index over the time period from 1900 to 2000. The plain (resp. dotted) line
shows the complementary distribution for the positive (resp. the absolute value of
negative) returns. Reproduced from [330]
[Figure 2.7 here: Hill's estimate b̂u (vertical axes, 0 to 5 and 0 to 6) versus the
lower threshold u (horizontal axes, 10^−4 to 10^−1, log scale), two panels.]

Fig. 2.7. Hill estimate b̂u as a function of the threshold u for the Dow Jones (left
panel) and for the Standard & Poor’s 500 1-minute returns (right panel)

b̂u = [ (1/Nu) Σ_{k=1}^{Nu} log(xk /u) ]^{−1} .    (2.37)

Its standard deviation can be asymptotically estimated as

Std(b̂u) = b̂u / √Nu ,    (2.38)

under the assumption of iid data, but this very severely underestimates the true
standard deviation when samples exhibit dependence, as reported by Kearns
and Pagan [267] (see the previous section of this chapter).
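For concreteness, the Hill estimate (2.37) and its asymptotic standard deviation (2.38) can be coded in a few lines. The sketch below (ours, not the authors' code) checks them on a synthetic Pareto sample, for which the estimator should be roughly constant in u, up to fluctuations of order b̂u/√Nu:

```python
import math
import random

def hill(sample, u):
    """Hill maximum likelihood estimate (2.37) of the tail exponent b
    above threshold u, with its asymptotic standard deviation (2.38),
    valid for iid data."""
    exc = [x for x in sample if x > u]
    n_u = len(exc)
    b_hat = n_u / sum(math.log(x / u) for x in exc)
    return b_hat, b_hat / math.sqrt(n_u), n_u

# Synthetic Pareto sample with exponent b = 3: P(X > x) = x^(-3) for x > 1
random.seed(1)
b_true = 3.0
data = [(1.0 - random.random()) ** (-1.0 / b_true) for _ in range(50000)]

for u in (1.0, 2.0, 5.0):
    b_hat, std, n_u = hill(data, u)
    print(f"u = {u}: b_hat = {b_hat:.3f} +/- {std:.3f} ({n_u} exceedances)")
```

As the threshold u increases, Nu shrinks and the estimated standard deviation grows accordingly, which is precisely why (2.38) widens for the highest quantiles.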
Figure 2.7 shows the Hill estimates b̂u as a function of u for the Dow Jones
and for the Standard & Poor’s 500 1-minute returns. Instead of an approx-
imately constant exponent (as would be the case for true Pareto samples),
the tail index estimator, for the Dow Jones, increases until u ≃ 0.04, beyond
which it seems to slow its growth and oscillates around a value ≈ 3–4 up to
the threshold u ≃ 0.08. It should be noted that the interval [0, 0.04] contains
99.12% of the sample whereas the interval [0.04, 0.08] contains only 0.64% of
the sample. The behavior of b̂u is very similar for the Nasdaq (not shown).
The behavior of b̂u for the Standard & Poor’s 500 shown on the right panel of
Fig. 2.7 is somewhat different: Hill’s estimate b̂u slows its growth at u ≃ 0.006,
corresponding to the 95% quantile, then decays until u ≃ 0.05 (99.99% quantile)
and then strongly increases again. Are these slowdowns of the growth
of b̂u genuine signatures of a possible constant well-defined asymptotic value
that would qualify a regularly varying function?
To answer this question, let us have a look at Fig. 2.8 which shows the Hill
estimator b̂u for all data sets (positive and negative branches of the distribu-
tion of returns for the Dow Jones, the Nasdaq and the Standard & Poor’s 500
(SP)) as a function of the index n = 1, 2, . . . , 18 of the quantiles or standard
significance levels q1 , . . . , q18 . Similar results are obtained with the AD esti-
mates. The three branches of the distribution of returns for the Dow Jones

[Figure 2.8 here: Hill estimator b versus the number n of the quantile qn (0 to 20),
for the positive (>) and negative (<) tails of SP (30 min), SP (60 min), DJ (daily)
and ND (5 min), with the 95% confidence bounds for b = 3.5 and N = 10000
(sample size).]

Fig. 2.8. Hill estimator b̂u for all sets (positive and negative branches of the
distribution of returns for the Dow Jones (DJ), Nasdaq (ND) and Standard &
Poor’s 500 (SP)) as a function of the index n = 1, . . . , 18 of the 18 quantiles or
standard significance levels q1 , . . . , q18 given in Table 6.3. The two thick lines (in red)
show the 95% confidence bounds obtained from synthetic time series of 10000 data
points generated with a Student distribution with exponent b = 3.5. Reproduced
from [330]

and the negative tail of the Nasdaq suggest a continuous growth of the Hill
estimator b̂u as a function of n = 1, . . . , 18. However, it turns out that this
apparent growth may be explained solely on the basis of statistical fluctua-
tions and slow convergence to a moderate b-value. Indeed, the two thick lines
show the 95% confidence bounds obtained from synthetic time series of 10000
data points generated with a Student distribution with exponent b = 3.5. It is
clear that the growth of the upper bound can explain the observed behavior
of the b-value obtained for the Dow Jones and Nasdaq data. It would thus be
incorrect to extrapolate this apparent growth of the b-value. However, con-
versely, we cannot conclude with certainty that the growth of the b-value has
been exhausted and that we have access to the asymptotic value. Indeed, large
values of tail indices are for instance predicted by traditional GARCH models
giving b ∼ 10–20 [153, 463].

Weibull Distributions

We now present the results of the fits of the same data with the SE distribution
(2.27). The corresponding Anderson-Darling statistics (ADS) are shown in
Table 2.4. The ML-estimates and AD-estimates of the form parameter c are
represented in Table 2.5. Table 2.4 shows that, for the highest quantiles, the
ADS for the SE is the smallest of all ADS, suggesting that the SE is the
best model of all. Moreover, for the lowest quantiles, it is the sole model not
systematically rejected at the 95% level.
The c-estimates are found to decrease when increasing the order q of the
threshold u(q) beyond which the estimations are performed. In addition, sev-
eral c-estimates are found very close to zero. However, this does not auto-
matically imply that the SE model is not the correct model for the data even
for these highest quantiles. Indeed, numerical simulations show that, even
for synthetic samples drawn from genuine SE distributions with exponent c
smaller than 0.5 and whose size is comparable with that of our data, in about
one case out of three (depending on the exact value of c) the estimated value
of c is zero. This a priori surprising result comes from condition (2.B.57) in
Appendix 2.B, which is not fulfilled with certainty even for samples drawn from
SE distributions.
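Such simulations only require sampling SE exceedances, which can be done by inverting the conditional survival function. The sketch below assumes the usual Stretched-Exponential tail S(x) = exp[−(x/d)^c], conditioned on x > u (equation (2.27) is not reproduced in this excerpt, so this form, like the parameter values, is our assumption):

```python
import math
import random

def se_exceedances(n, u, c, d, rng):
    """Sample n points from a Stretched-Exponential distribution
    conditioned to exceed u, assuming the survival function
    S(x) = exp[-(x/d)^c]; inverse-transform method."""
    a = (u / d) ** c
    return [d * (a - math.log(1.0 - rng.random())) ** (1.0 / c)
            for _ in range(n)]

rng = random.Random(7)
u, c, d = 0.01, 0.3, 0.005   # small c and d, as for the highest quantiles
xs = se_exceedances(20000, u, c, d, rng)

# Empirical check of the conditional survival probability at x0 = 3u
x0 = 3 * u
theory = math.exp((u / d) ** c - (x0 / d) ** c)
empirical = sum(1 for x in xs if x > x0) / len(xs)
print(f"P(X > {x0} | X > {u}): theory {theory:.4f}, empirical {empirical:.4f}")
```

Feeding such synthetic exceedances to an SE maximum likelihood fit is exactly the experiment described above, in which the estimated c hits zero in a sizable fraction of trials when the true c is small.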
Notwithstanding this cautionary remark, note that the c-estimate of the
positive tail of the Nasdaq data equals zero for all quantiles higher than q14 =
0.97. In fact, in every case, the estimated c is not significantly different
from zero – at the 95% significance level – for quantiles higher than q12 –q14 ,
except for quantile q21 of the negative tail of the Standard & Poor’s 500,
but this value is probably doubtful. In addition, the values of the estimated
scale parameter d, not reported here, are found very small, particularly for
the Nasdaq – beyond q12 = 95% – and the S&P 500 – beyond q10 = 90%. In
contrast, the Dow Jones keeps significant scale factors until q16 –q17 .
These pieces of evidence, taken together, provide a clear indication of the
existence of a change of behavior of the true pdf of these distributions: while the
bulks of the distributions seem rather well approximated by a SE model, a
distribution with a tail fatter than that of the SE model is required for the
highest quantiles. Actually, the fact that both c and d are extremely small may
be interpreted according to the asymptotic correspondence given by (2.30) and
(2.31) as the existence of a possible power law tail.
At this stage, we can make the following conservative statement: the true
distribution of returns is probably bracketed by a power law, as a lower bound
and a SE as an upper bound. It is therefore particularly interesting to focus
on distributions such as log-Weibull distributions which interpolate between
these two classes in order to obtain – hopefully – a better description of the
data.
Table 2.5. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates of
the form parameter c of the Weibull (Stretched-Exponential) distribution

    S&P 500 (1 min)                     Nasdaq                       Dow Jones
    Pos. tail      Neg. tail           Pos. tail     Neg. tail       Pos. tail     Neg. tail
    MLE      ADE   MLE      ADE        MLE     ADE   MLE     ADE     MLE     ADE   MLE     ADE
q1 1.065 (0.001) 1.175 1.051 (0.001) 1.158 1.007 (0.008) 1.053 0.987 (0.008) 1.017 1.040 (0.007) 1.104 0.975 (0.007) 1.026
q2 0.927 (0.002) 1.049 0.915 (0.002) 1.035 0.983 (0.011) 1.051 0.953 (0.011) 0.993 0.973 (0.010) 1.075 0.910 (0.010) 0.989
q3 0.8754 (0.002) 1.0196 0.8634 (0.002) 1.0027 0.944 (0.014) 1.031 0.912 (0.014) 0.955 0.931 (0.013) 1.064 0.856 (0.012) 0.948
q4 0.813 (0.002) 0.970 0.799 (0.002) 0.947 0.896 (0.018) 0.995 0.876 (0.018) 0.916 0.878 (0.015) 1.038 0.821 (0.015) 0.933
q5 0.763 (0.003) 0.952 0.752 (0.003) 0.932 0.857 (0.021) 0.978 0.861 (0.021) 0.912 0.792 (0.019) 0.955 0.767 (0.018) 0.889
q6 0.733 (0.003) 0.985 0.727 (0.003) 0.971 0.790 (0.026) 0.916 0.833 (0.026) 0.891 0.708 (0.023) 0.873 0.698 (0.022) 0.819
q7 0.593 (0.004) 0.799 0.590 (0.004) 0.791 0.732 (0.033) 0.882 0.796 (0.033) 0.859 0.622 (0.028) 0.788 0.612 (0.028) 0.713
q8 0.504 (0.005) 0.740 0.502 (0.005) 0.730 0.661 (0.042) 0.846 0.756 (0.042) 0.834 0.480 (0.035) 0.586 0.531 (0.035) 0.597
q9 0.337 (0.007) 0.537 0.342 (0.007) 0.531 0.509 (0.058) 0.676 0.715 (0.059) 0.865 0.394 (0.047) 0.461 0.478 (0.047) 0.527
q10 0.152 (0.010) 0.394 0.159 (0.010) 0.387 0.359 (0.092) 0.631 0.522 (0.099) 0.688 0.304 (0.074) 0.346 0.403 (0.076) 0.387
q11 0.079 (0.012) 0.327 0.091 (0.012) 0.339 0.252 (0.110) 0.515 0.481 (0.120) 0.697 0.231 (0.087) 0.158 0.379 (0.091) 0.337
q12 < 10−8 0.151 < 10−8 0.169 0.039 (0.138) 0.177 0.273 (0.155) 0.275 0.269 (0.111) 0.207 0.357 (0.119) 0.288
q13 < 10−8 0.0793 < 10−8 0.084 0.057 (0.155) 0.233 0.255 (0.177) 0.274 0.253 (0.127) 0.147 0.428 (0.136) 0.465
q14 < 10−8 0.008 < 10−8 0.020 < 10−8 0 0.215 (0.209) 0.194 0.290 (0.150) 0.174 0.448 (0.164) 0.641
q15 < 10−8 0.008 < 10−8 0.008 < 10−8 0 0.103 (0.260) 0 0.379 (0.192) 0.407 0.451 (0.210) 0.863
q16 < 10−8 0.008 < 10−8 0.008 9.6 × 10−8 0 0.064 (0.390) 0 0.398 (0.290) 0.382 0.022 (0.319) 0.110
q17 < 10−8 0.008 < 10−8 0.008 < 10−8 0 0.158 (0.452) 0.224 0.307 (0.346) 0.255 0.178 (0.367) 0.703
q18 < 10−8 0.008 < 10−8 0.008 < 10−8 0 < 10−8 0 2 × 10−8 0 < 10−8 0
q19 0.035 (0.082) 0.007 0.009 (0.032) 0.007 – – – – – – – –
q20 0.111 (0.119) 0.075 0.316 (0.117) 0.007 – – – – – – – –
q21 < 10−8 0.008 0.827 (0.393) 0.900 – – – – – – – –

Reproduced from [330]



Log-Weibull Distributions
The parameters b and c of the log-Weibull distribution defined by (2.33) are
estimated with both the maximum likelihood and Anderson-Darling methods
for the 18 standard significance levels q1 , . . . , q18 (given on page 62) for the
Dow Jones and Nasdaq data and up to q21 for the Standard & Poor’s 500
data. The results for the Dow Jones and the Standard & Poor’s 500 are
given in Table 2.6. For both positive and negative tails of the Dow Jones, the
results are very stable for all quantiles lower than q10 : c = 1.09 ± 0.02 and
b = 2.71 ± 0.07. These results reject the Pareto distribution degeneracy c = 1
at the 95% confidence level. Only for the quantiles higher than or equal to
q16 , an estimated value c compatible with the Pareto distribution is found.
Moreover both for the positive and negative Dow Jones tails, one finds that
c ≈ 0.92 and b ≈ 3.6−3.8, suggesting either a possible change of regime or
a sensitivity to “outliers”, or a lack of robustness due to too small a sample
size. For the positive Nasdaq tail, the exponent c is found compatible with
c = 1 (the Pareto value), at the 95% significance level, above q11 while b
remains almost stable at b ≈ 3.2. For the negative Nasdaq tail, we find that c
decreases almost systematically from 1.1 for q10 to 1 for q18 for both estimators
while b regularly increases from about 3.1 to about 4.2. The Anderson-Darling
distances are significantly better than for the SE, and this statistic cannot be
used to conclude either in favor of or against the log-Weibull class.
The situation is different for the Standard & Poor’s 500 (1-min). For
the positive tail, the parameter c remains significantly smaller than 1 from
q14 = 97% to q21 except for q19 and q20 . Therefore, it seems that for very small
time scales, the tails of the distribution of returns might be even fatter than a
power law. As stressed in Sect. 2.4.1, when c is less than one, the log-Weibull
distribution does not belong to the domain of attraction of a law of the max-
imum. As a consequence, EVT cannot provide reliable results when applied
to such data, neither from a theoretical point of view nor from a practical
stance (e.g. extreme risk assessment). The conclusions are the same for the
5-minute time scale. For the 30-minute and 60-minute time scales, c remains
systematically less than one for the highest quantiles but this difference ceases
to be significant. In the negative tail, the situation is overall the same.
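Since (2.33) is not reproduced in this excerpt, the following sketch assumes the tail form S(x) = exp[−b (ln(x/u))^c] for x > u, which indeed degenerates to a Pareto tail of index b when c = 1, consistent with the degeneracy tested above; sampling by inversion then allows one to verify this numerically (the parameter values are ours, chosen to echo the Dow Jones estimates):

```python
import math
import random

def log_weibull_exceedances(n, u, b, c, rng):
    """Sample exceedances above u, assuming the log-Weibull tail
    S(x) = exp[-b (ln(x/u))^c]; for c = 1 this reduces to a Pareto
    tail of index b."""
    return [u * math.exp((-math.log(1.0 - rng.random()) / b) ** (1.0 / c))
            for _ in range(n)]

rng = random.Random(3)
u, b = 0.01, 2.7
xs = log_weibull_exceedances(50000, u, b, 1.0, rng)

# With c = 1, the Hill estimate should recover the Pareto index b
hill = len(xs) / sum(math.log(x / u) for x in xs)
print(f"Hill estimate for c = 1: {hill:.3f} (true b = {b})")
```

Taking c < 1 in the same sampler produces tails fatter than any power law, which is the regime found for the 1-minute Standard & Poor's 500 returns.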

2.4.4 Comparison of the Descriptive Power of the Different Families
The previous sections have shown that none of the considered distributions
(2.26–2.29) and (2.34) fits the data over the entire range, which is not a surprise.
For the highest quantiles, several models seem to be able to represent the data,
including the Pareto model, the SE model and the log-Weibull model discussed
above. The last two models seem to be the most reasonable models among
the models compatible with the data. For all the samples, their Anderson-
Darling statistics remain so close to each other for the highest quantiles that
the descriptive power of these two models cannot be distinguished.
Table 2.6. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates
of the parameters b and c of the log-Weibull distribution defined by (2.33)

Dow Jones (1 day) Positive tail Dow Jones (1 day) Negative tail
MLE ADE MLE ADE
c b c b c b c b
q1 5.262 (0.005) 0.000 (0.000) 5.55 0.000 5.085 (0.005) 0.000 (0.000) 5.320 0.000
q2 2.140 (0.009) 0.241 (0.002) 2.25 0.220 2.125 (0.009) 0.211 (0.002) 2.240 0.191
q3 1.790 (0.010) 0.531 (0.005) 1.87 0.510 1.751 (0.010) 0.495 (0.005) 1.800 0.481
q4 1.616 (0.012) 0.830 (0.008) 1.65 0.820 1.593 (0.012) 0.744 (0.008) 1.630 0.735
q5 1.447 (0.012) 1.165 (0.012) 1.47 1.160 1.459 (0.013) 1.022 (0.011) 1.480 1.015
q6 1.339 (0.012) 1.472 (0.017) 1.36 1.473 1.353 (0.013) 1.311 (0.016) 1.370 1.311
q7 1.259 (0.013) 1.768 (0.023) 1.28 1.773 1.269 (0.014) 1.609 (0.022) 1.270 1.610
q8 1.173 (0.013) 2.097 (0.031) 1.17 2.096 1.188 (0.015) 1.885 (0.030) 1.190 1.887
q9 1.125 (0.015) 2.362 (0.043) 1.12 2.358 1.158 (0.017) 2.178 (0.042) 1.150 2.174
q10 1.090 (0.020) 2.705 (0.070) 1.08 2.695 1.087 (0.022) 2.545 (0.069) 1.090 2.545
q11 1.035 (0.022) 2.771 (0.083) 1.03 2.762 1.074 (0.024) 2.688 (0.085) 1.070 2.681
q12 1.047 (0.027) 2.867 (0.105) 1.04 2.857 1.068 (0.029) 2.880 (0.111) 1.050 2.857
q13 1.046 (0.030) 2.960 (0.121) 1.03 2.933 1.067 (0.032) 2.900 (0.125) 1.080 2.924
q14 1.044 (0.034) 3.000 (0.142) 1.03 2.976 1.132 (0.038) 3.171 (0.158) 1.120 3.155
q15 1.090 (0.043) 3.174 (0.184) 1.09 3.165 1.163 (0.047) 3.439 (0.209) 1.180 3.472
q16 1.085 (0.059) 3.424 (0.280) 1.09 3.425 1.025 (0.056) 3.745 (0.322) 1.010 3.731
q17 1.093 (0.066) 3.666 (0.345) 1.09 3.650 1.108 (0.069) 3.822 (0.380) 1.120 3.891
q18 0.935 (0.071) 3.556 (0.411) 0.902 3.484 0.921 (0.071) 3.804 (0.461) 0.933 3.846
S&P 500 (1 min) Positive tail S&P 500 (1 min) Negative tail
MLE ADE MLE ADE
c b c b c b c b
q1 3.261 (0.003) 0.029 (0.000) 3.298 0.027 3.232 (0.003) 0.030 (0.000) 3.264 0.028
q2 1.875 (0.002) 0.433 (0.001) 1.878 0.410 1.884 (0.002) 0.420 (0.001) 1.881 0.399
q3 1.645 (0.002) 0.723 (0.001) 1.642 0.690 1.647 (0.002) 0.707 (0.001) 1.641 0.676
q4 1.471 (0.002) 1.017 (0.001) 1.477 0.970 1.465 (0.002) 1.000 (0.001) 1.470 0.954
q5 1.414 (0.002) 1.277 (0.002) 1.405 1.233 1.411 (0.002) 1.251 (0.002) 1.401 1.208
q6 1.382 (0.002) 1.512 (0.002) 1.387 1.477 1.383 (0.002) 1.477 (0.002) 1.389 1.443
q7 1.233 (0.002) 1.862 (0.003) 1.234 1.811 1.232 (0.002) 1.823 (0.003) 1.239 1.776
q8 1.187 (0.002) 2.155 (0.005) 1.192 2.116 1.192 (0.002) 2.117 (0.005) 1.196 2.079
q9 1.112 (0.002) 2.508 (0.007) 1.111 2.470 1.113 (0.002) 2.455 (0.007) 1.112 2.415
q10 1.069 (0.003) 2.876 (0.011) 1.078 2.896 1.062 (0.003) 2.818 (0.011) 1.074 2.831
q11 1.048 (0.003) 2.961 (0.014) 1.066 3.016 1.055 (0.003) 2.927 (0.014) 1.069 2.972
q12 1.016 (0.004) 3.048 (0.018) 1.033 3.123 1.015 (0.004) 3.006 (0.017) 1.034 3.076
q13 1.002 (0.004) 3.063 (0.020) 1.021 3.151 1.001 (0.004) 3.033 (0.020) 1.020 3.115
q14 0.981 (0.005) 3.054 (0.023) 1.003 3.153 0.990 (0.005) 3.033 (0.023) 1.012 3.134
q15 0.961 (0.006) 3.015 (0.027) 0.985 3.133 0.978 (0.006) 3.004 (0.027) 1.003 3.132
q16 0.941 (0.008) 2.867 (0.036) 0.961 2.980 0.937 (0.008) 2.871 (0.037) 0.957 2.987
q17 0.937 (0.010) 2.798 (0.040) 0.951 2.899 0.927 (0.010) 2.780 (0.041) 0.947 2.887
q18 0.902 (0.011) 2.649 (0.046) 0.902 2.677 0.925 (0.012) 2.644 (0.046) 0.940 2.726
q19 0.994 (0.028) 2.256 (0.084) 0.971 2.201 0.962 (0.027) 2.134 (0.080) 0.923 2.063
q20 0.999 (0.039) 2.245 (0.118) 0.967 2.139 1.011 (0.040) 2.037 (0.107) 0.933 1.879
q21 0.949 (0.083) 2.686 (0.330) 0.957 2.801 1.288 (0.115) 3.387 (0.455) 1.234 3.272

The numbers in parenthesis give the standard deviations of the estimates.
Reproduced from [330]

One can go further and ask which of these models are sufficient to describe
the data compared with the comprehensive distribution (2.23) encompassing
all of them. Here, the four distributions (2.26–2.29) are compared with the
comprehensive distribution (2.23) using Wilks’ theorem [485] on maximum
likelihood ratios, which allows one to compare nested hypotheses. It will be shown
that the Pareto and the SE models are the most parsimonious. We then turn
to a direct comparison of the best two-parameter models (the SE and log-
Weibull models) with the best one-parameter model (the Pareto model), which
will require an extension of Wilks’ theorem derived in Appendix 2.D. This
extension allows us to directly test the SE model against the Pareto model.

Comparison Between the Four Parametric Families (2.26–2.29) and the Comprehensive Distribution (2.23)

According to Wilks’ theorem, the doubled log-likelihood ratio Λ:

Λ = 2 log [ max L(CD, X, Θ) / max L(z, X, θ) ] ,    (2.39)

has asymptotically (as the size N of the sample X tends to infinity) the χ2 -
distribution. Here L denotes the likelihood function, θ and Θ are parametric
spaces corresponding to hypotheses z and CD (comprehensive distribution
defined in (2.23)) respectively (hypothesis z is one of the four hypotheses
(2.26–2.29) that are particular cases of the CD under some parameter re-
strictions recalled in Sect. 2.4.1). The statement of the theorem is valid under
the condition that the sample X obeys the hypothesis z for some particular
value of its parameter belonging to the space θ. The number of degrees of
freedom of the χ2 -distribution is equal to the difference of the dimensions
of the two spaces Θ and θ. We have dim(Θ) = 3, dim(θ) = 2 for the SE
and for the incomplete Gamma distributions while dim(θ) = 1 for the Pareto
and the Exponential distributions. This leads to one degree of freedom of the
χ2 -distribution for the two former cases and two degrees of freedom of the
χ2 -distribution for the latter models. The maximum of the likelihood in the
numerator of (2.39) is taken over the space Θ, whereas the maximum of the
likelihood in the denominator of (2.39) is taken over the space θ. Since we
always have θ ⊂ Θ, the likelihood ratio is always larger than 1, and the log-
likelihood ratio is non-negative. If the observed value of Λ does not exceed
some high-confidence level (say, 99% confidence level) of the χ2 , we then re-
ject the hypothesis CD in favor of the hypothesis z, considering the space Θ
redundant. Otherwise, we accept the hypothesis CD, considering the space θ
insufficient.
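The decision rule is easy to implement, since the χ2 survival function has closed forms for the one and two degrees of freedom needed here; the following is a generic sketch (ours, not the authors' code):

```python
import math

def chi2_sf(x, dof):
    """Survival function P(X > x) of the chi-square distribution,
    in closed form for the two cases needed here."""
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only dof = 1 or 2 are handled in this sketch")

def needs_comprehensive(lambda_stat, dof, level=0.95):
    """True when the restricted model z must be abandoned in favor of
    the comprehensive distribution, i.e. when the doubled
    log-likelihood ratio (2.39) exceeds the chi-square critical value."""
    return chi2_sf(lambda_stat, dof) < 1.0 - level

# Critical values at the 95% level: about 3.84 (1 dof) and 5.99 (2 dof)
for lam, dof in ((3.0, 1), (5.0, 1), (5.0, 2), (7.0, 2)):
    print(f"Lambda = {lam}, dof = {dof}: p = {chi2_sf(lam, dof):.4f}, "
          f"restricted model rejected: {needs_comprehensive(lam, dof)}")
```

Note that the same value of Λ can lead to opposite conclusions for a one-parameter and a two-parameter restricted model, since the critical value grows with the number of restricted parameters.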
The double log-likelihood ratios (2.39) are shown for the positive and neg-
ative branches of the distribution of returns in Fig. 2.9 for the Nasdaq Com-
posite index. Similar results (not shown) are obtained for the Dow Jones and
the Standard & Poor’s 500 (1, 5, 30 and 60 minutes) indices.

[Figure 2.9 here: two panels showing the Wilks statistic (doubled log-likelihood
ratio), 0 to about 100, versus the lower threshold u, 0 to 3 × 10^−3.]

Fig. 2.9. Wilks statistic for the comprehensive distribution versus the four para-
metric distributions: Pareto (), Stretched-Exponential (∗), Exponential, (◦) and
incomplete Gamma () for the Nasdaq 5-minute returns. The upper (lower) panel
refers to the positive (negative) returns. The horizontal lines represent the critical
values at the 95% confidence level of the test for the χ2 -distribution with one (lower
line) and two (upper line) degrees of freedom. Reproduced from [330]

For the Nasdaq data, Figure 2.9 clearly shows that the Exponential dis-
tribution is completely insufficient: for all lower thresholds, the Wilks log-
likelihood ratio exceeds the critical value corresponding to the 95% level of
the χ2 -distribution with one degree of freedom. The Pareto distribution is insufficient for thresholds corre-
sponding to quantiles less than q11 = 92.5% and becomes comparable with the
comprehensive distribution beyond. It is natural that the families with two
parameters, the incomplete Gamma and the SE, have higher goodness-of-fit
than the one-parameter Exponential and Pareto distributions. The incomplete
Gamma distribution is comparable with the comprehensive distribution be-
yond quantile q10 = 90%, whereas the SE is somewhat better beyond quantile
q8 = 70%. For the tails representing 7.5% of the data, all parametric families
except for the Exponential distribution fit the sample distribution with almost
the same efficiency according to this test.
The results obtained for the Dow Jones data are similar. The SE is com-
parable with the comprehensive distribution starting with q8 = 70%. On the
whole, one can say that the SE distribution performs better than the three
other parametric families.
The situation is somewhat different for the Standard & Poor’s 500 index.
For the positive tail, none of the four distributions is really sufficient in order
to accurately describe the data. The comprehensive distribution is overall
the best. In the negative tail, we retrieve a behavior more similar to that
observed in the two previous cases, except for the Exponential distribution
which also appears to be better than the comprehensive distribution. However,
it should be noted that the comprehensive distribution is only rejected in the
very far tail. The four models (2.26–2.29) are better than the comprehensive
distribution only for the two highest quantiles (q20 and q21 ) of the negative
tail. In contrast, the Pareto, SE and incomplete Gamma models are better
than the comprehensive distribution over the 10 highest quantiles (or so) for
the Nasdaq and the Dow Jones.
We should stress again that each log-likelihood ratio, so to speak, “acts on its
own ground”: that is, the corresponding χ2 -distribution is valid under the as-
sumption of the validity of each particular hypothesis whose likelihood stands
in the numerator of the double log-likelihood (2.39). It would be desirable to
compare all combinations of pairs of hypotheses directly, in addition to com-
paring each of them with the comprehensive distribution. Unfortunately, the
Wilks theorem cannot be used in the case of pair-wise comparison because the
problem is no longer that of comparing nested hypotheses (i.e., one hypothesis
is a particular case of the comprehensive model). As a consequence, the
previous results on the comparison of the relative merits of each of the four
distributions using the generalized log-likelihood ratio should be interpreted
with care, in particular, in a case of contradictory conclusions. Fortunately, the
main conclusion of the comparison (an advantage of the SE distribution over
the three other distributions) does not contradict the earlier results discussed
above.

Pair-Wise Comparison of the Pareto Model with the Stretched-Exponential and Log-Weibull Models

Let us compare formally the descriptive power of the SE distribution and the
log-Weibull distribution (the two best two-parameter models qualified until
now) with that of the Pareto distribution (the best one-parameter model).
For the comparison of the log-Weibull model versus the Pareto model, Wilks’
theorem can still be applied since the log-Weibull distribution encompasses
the Pareto distribution. A contrario, the comparison of the SE versus the
Pareto distribution should in principle require that we use the methods for
testing non-nested hypotheses [209], such as the Wald encompassing test or
the Bayes factors [266]. Indeed, the Pareto model and the (SE) model are
not, strictly speaking, nested. However, as shown in Sect. 2.4.1, the Pareto
distribution is a limiting case of the SE distribution, as the fractional exponent
c goes to zero. Changing the parametric representation of the (SE) model into

 
b

x c
f (x|b, c) = b u−c xc−1 exp − −1 , x>u, (2.40)
c u
 c
i.e., setting b = c · ud , where the parameter d refers to the former (SE)
representation (2.27), Appendix 2.D shows that the doubled log-likelihood
ratio

W = 2 log [ max_{b,c} L_SE / max_b L_PD ]    (2.41)
still follows Wilks’ statistic, namely is asymptotically distributed according to
a χ2 -distribution, with one degree of freedom in the present case. Thus, even
in this case of non-nested hypotheses, Wilks’ statistic still allows us to test
the null hypothesis H0 according to which the Pareto model is sufficient to
describe the data.
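As an illustration of how W can be computed, note that in the representation (2.40) the ML estimate of b at fixed c is explicit, so the SE likelihood can be profiled over a grid of c values and compared with the Pareto (c → 0) likelihood. The sketch below (ours, for illustration only) does this on a synthetic Pareto sample, for which H0 holds and W should be small:

```python
import math
import random

def pareto_fit(exc, u):
    """Hill ML fit of the Pareto density f(x) = b u^b x^(-(b+1)), x > u,
    returning the estimate of b and the maximal log-likelihood."""
    n = len(exc)
    b = n / sum(math.log(x / u) for x in exc)
    ll = n * math.log(b) + n * b * math.log(u) - (b + 1.0) * sum(math.log(x) for x in exc)
    return b, ll

def se_profile_ll(exc, u, c):
    """Log-likelihood of the SE representation (2.40) at fixed c,
    maximized over b (the optimum b = n/s is explicit)."""
    n = len(exc)
    s = sum(((x / u) ** c - 1.0) / c for x in exc)
    b = n / s
    return (n * math.log(b) - n * c * math.log(u)
            + (c - 1.0) * sum(math.log(x) for x in exc) - b * s)

# Synthetic Pareto exceedances (b = 3) above u = 1, so that H0 holds
random.seed(5)
u = 1.0
exc = [u * (1.0 - random.random()) ** (-1.0 / 3.0) for _ in range(2000)]

b_pd, ll_pd = pareto_fit(exc, u)
ll_se = max(se_profile_ll(exc, u, c) for c in [k / 100.0 for k in range(1, 151)])
W = 2.0 * (max(ll_se, ll_pd) - ll_pd)   # the c -> 0 limit recovers the Pareto model
print(f"b_PD = {b_pd:.3f}, W = {W:.3f} (chi-square 95% critical value: 3.84)")
```

Under H0, W is asymptotically χ2-distributed with one degree of freedom, so values below about 3.84 leave the Pareto model standing at the 95% level.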
Concerning the comparison between the Pareto model and the SE one, the
null hypothesis is found to be more often rejected for the Dow Jones than for
the Nasdaq and the Standard & Poor’s 500 [330]. Indeed, beyond the quantile
q12 = 95%, the Pareto model cannot be rejected in favor of the SE model at
the 95% confidence level for the Nasdaq and the Standard & Poor’s 500 data.
For the Dow Jones, one must consider quantiles higher than q16 = 99% – at
least for the negative tail – in order not to reject H0 at the 95% significance
level. These results are in qualitative agreement with what we could expect
from the action of the central limit theorem: the power law regime (if it really
exists) is pushed back to higher quantiles due to time aggregation (recall that
the Dow Jones data is at the daily scale while the Nasdaq data is at the
5-minute time scale).
It is, however, more difficult to rationalize the fact reported in [330] that
the SE model is not rejected (at the 99% confidence level) for the two highest
quantiles (q20 = 99.95% and q21 = 99.99%) of the negative tail of the 1 minute

returns of the Standard & Poor’s 500 and for the quantiles q19 = 99.9% and
q20 = 99.95% for its positive tail. This might be ascribed to a lack of power
of the test, but recall that we have restricted our investigation to empirical
quantiles with more than a hundred points (or so). Therefore, invoking a lack
of power is not very convincing. In addition, for these high quantiles, the
fractional exponent c in the SE model becomes significantly different from
zero (see Table 2.5). It could be an empirical illustration of the existence of
a cut-off beyond which the power law regime is replaced by an exponential
(or stretched-exponential) decay of the distribution function as suggested by
Mantegna and Stanley [344] and by the recent model [493] based upon a pure
jump Lévy process, whose jump arrival rate obeys a power law dampened
by an exponential function. To strengthen this idea, it can be noted that the
exponential distribution is found sufficient to describe the distributions of the
1 minute returns of the Standard & Poor’s 500, while it is always rejected
(with respect to the comprehensive distribution) for the Nasdaq and the Dow
Jones. Thus, this non-rejection could really be the genuine signature of a cut-
off beyond which the decay of the distribution is faster than any power law.
However, this conclusion is only drawn from the one hundred most extreme
data points and, therefore, should be considered with caution. Larger samples
should be considered to obtain a confirmation of this intuition. Unfortunately,
samples with more than 10 million (non zero) data points (for a single asset)
are not yet accessible.
Based upon the study of [330], Wilks’ test for the Pareto distribution
versus the log-Weibull distribution shows that, for quantiles above q12 , the
Pareto distribution cannot be rejected in favor of the log-Weibull for the Dow
Jones, the Nasdaq and the Standard & Poor’s 500 30-minute returns. This
parallels the lack of rejection of the Pareto distribution against the SE beyond
the significance level q12 . The picture is different for the 1-minute returns of
the Standard & Poor’s 500. The Pareto model is almost always rejected. The
most interesting point is the following: in the negative tail, the Pareto model
is always strongly rejected except for the highest quantiles. Comparing with
Table 2.6, one clearly sees that between q15 and q18 the exponent c is signif-
icantly (at the 95% significance level) less than one, indicating a tail fatter
than any power law. On the contrary, for q21 , the exponent c is found signif-
icantly larger than one, indicating a change of regime and again an ultimate
decay of the tail of the distribution faster than any power law.
In summary, the null hypothesis that the true distribution is the Pareto
distribution is strongly rejected until quantiles 90–95% or so. Thus, within
this range, the Stretched-Exponential and log-Weibull models seem the best
and the Pareto model is insufficient to describe the data. But, for the very
highest quantiles (above 95%–98%), one cannot reject any more the hypothe-
sis that the Pareto model is sufficient compared with the SE and log-Weibull
models. These two-parameter models can then be seen as a redundant para-
meterization for the extremes compared with the Pareto distribution, except
for the returns calculated at the smallest time scales.
76 2 Marginal Distributions of Returns

2.5 Discussion and Conclusions

2.5.1 Summary

This chapter has revisited the generally accepted fact that the tails of the
distributions of returns present a power-like behavior. Often, the conviction
of the existence of a power-like tail is based on the Gnedenko theorem stating
the existence of only three possible types of limit distributions of normalized
maxima (a finite maximum value, an exponential tail, and a power-like tail)
together with the exclusion of the first two types by empirical evidence. The
power-like character of the tails of the distribution of log-returns follows then
simply from the power-like distribution of maxima. However, in this chain
of arguments, the conditions needed for the fulfillment of the correspond-
ing mathematical theorems are often omitted and not discussed properly. In
addition, widely used arguments in favor of power law tails invoke the self-
similarity of the data but are often assumptions rather than experimental
evidence or consequences of economic and financial laws.
Sharpening and generalizing the results obtained by Kearns and Pagan
[267], Sect. 2.3.3 has recalled that standard statistical estimators of heavy
tails are much less efficient than often assumed and cannot in general clearly
distinguish between a power law tail and a SE tail (even in the absence of
long-range dependence in the volatility). So, in view of the stalemate reached
with the nonparametric approaches and in particular with the standard ex-
treme value estimators, resorting to a parametric approach appears essential.
The parametric approach is useful to decide which class of extreme value
distributions – rapidly versus regularly varying – accounts best for the em-
pirical distributions of returns at different time scales. However, here again,
the problem is not as straightforward as it appears. Indeed, in order to apply
statistical methods to the study of empirical distributions of returns and to de-
rive their resulting implication for risk management, it is necessary to keep in
mind the existence of necessary conditions that the empirical data must obey
for the conclusions of the statistical study to be valid. Maybe the most impor-
tant condition in order to speak meaningfully about distribution functions is
the stationarity of the data, a difficult issue that we have barely touched upon
here. In particular, the importance of regime switching is now well established
[14, 397] and its possible role should be assessed and accounted for.

2.5.2 Is There a Best Model of Tails?

The finding that standard statistical estimators of heavy tails are much less
efficient than often assumed and cannot in general clearly distinguish between
a power law tail and a SE tail can be rationalized by the fact that, in a cer-
tain limit, the Stretched-Exponential pdf tends to the Pareto distribution (see
(2.30–2.31) and Appendix 2.B). Thus, the Pareto (or power law) distribution
can be approximated with any desired accuracy on an arbitrary interval by a


suitable adjustment of the pair (c, d) of the parameters of the Stretched Ex-
ponential pdf. The parametric tests presented above indicate that the class of
SE and log-Weibull distributions provide a significantly better fit to empirical
returns than the Pareto, the exponential or the incomplete Gamma distribu-
tions. All these tests are consistent with the conclusion that these two models
provide the best effective apparent and parsimonious models to account for
the empirical data on the largest possible range of returns.
However, this does not mean that the Stretched Exponential or the log-
Weibull model is the correct description of the tails of empirical distributions
of returns. Again, as already mentioned, the strength of these models comes
from the fact that they encompass the Pareto model in the tail and offer a
better description in the bulk of the distribution. To see where the problem
arises, Table 2.7 summarizes the best ML-estimates for the SE parameters c
(form parameter) and d (scale parameter) restricted to the quantiles beyond
q12 = 95%, which offers a good compromise between a sufficiently large sample
size and a restricted tail range leading to an accurate approximation in this
range.
One can see that c is very small (and all the more so for the scale parameter
d) for the tail of the distribution of positive returns of the Nasdaq data,
suggesting a convergence to a power law tail. The exponents c for the negative
returns of the Nasdaq data and for both positive and negative returns of the
Dow Jones data are an order of magnitude larger but the statistical tests show
that they are not incompatible with an asymptotic power tail either. Indeed,
Sect. 2.4.4 has shown that, for the very highest quantiles (above 95–98%),
one cannot reject the hypothesis that the Pareto model is sufficient compared
with the SE model. The values of c and d are even smaller for the Standard
& Poor’s 500 data both at the 1-minute and 5-minute time scales.

Table 2.7. Best parameters c and d of the Stretched-Exponential model and best
parameter b of the Pareto model estimated beyond quantile q12 = 95% for the Dow
Jones (DJ), the Nasdaq (ND) and the Standard & Poor’s 500 (SP) indices. The
apparent Pareto exponent c(u(q12 )/d)c (see expression (2.30)) is also shown

Sample                        c               d            c(u(q12)/d)^c      b

DJ pos. returns          0.274 (0.111)   4.81 × 10^−6        2.68        2.79 (0.10)
DJ neg. returns          0.362 (0.119)   1.02 × 10^−4        2.57        2.77 (0.11)
ND pos. returns          0.039 (0.138)   4.54 × 10^−52       3.03        3.23 (0.14)
ND neg. returns          0.273 (0.155)   1.90 × 10^−7        3.10        3.35 (0.15)
SP pos. returns (1min)        –               –              3.01        3.02 (0.02)
SP neg. returns (1min)        –               –              2.97        2.97 (0.02)
SP pos. returns (5min)   0.033 (0.031)   3.06 × 10^−59       2.95        2.95 (0.03)
SP neg. returns (5min)   0.033 (0.031)   3.26 × 10^−56       2.87        2.86 (0.03)
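The apparent Pareto exponent in the penultimate column follows directly from the fitted pair (c, d) through expression (2.30). A minimal sketch in Python (the threshold u = u(q12) is not reported in the table; the value u = 0.02 used below is a hypothetical 95% quantile, chosen for illustration only):

```python
import math

def apparent_pareto_exponent(c, d, u):
    """Apparent Pareto exponent c*(u/d)**c of a Stretched-Exponential
    tail evaluated at the lower threshold u (expression (2.30))."""
    return c * (u / d) ** c

# Dow Jones positive returns: c = 0.274, d = 4.81e-6 (from Table 2.7);
# u = 0.02 is hypothetical, chosen for illustration only.
print(apparent_pareto_exponent(0.274, 4.81e-6, 0.02))   # ~2.69 with this u
```

With this hypothetical threshold the sketch roughly recovers the printed value 2.68; the exact figure depends on the actual quantile u(q12) of the sample.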
Note also that the exponents c are larger for the daily Dow Jones data
than for the 5-minute Nasdaq data and the 1-minute and 5-minute Standard
& Poor’s 500 data, in agreement with an expected (slow) convergence to the
Gaussian law according to the central limit theorem.18 However, a t-test does
not allow one to reject the hypothesis that the exponents c remain the same for
the positive and negative tails of the Dow Jones data. This confirms previous
results, for instance [319, 256] according to which the extreme tails can be
considered as symmetric, at least for the Dow Jones data. In contrast, there is
a very strong asymmetry for the 5-minute sampled Nasdaq and the Standard
& Poor’s 500 data.
Such is the evidence in favor of the existence of an asymptotic power
law tail. Balancing this view, many of the tests have shown that the power
law model does not perform as well as the SE and log-Weibull models,
even arbitrarily far in the tail (as far as the available data allows us to probe).
In addition, for the smallest time scales, the tail of the distribution of returns
is, over a large range, well described by a log-Weibull distribution with an
exponent c less than one, i.e., is fatter than any power law. A change of
regime is ultimately observed and the very extreme tail decays faster than any
power law. Both a SE or a log-Weibull model with exponent c > 1 provide a
reasonable description.
Attempting to wrap up the different results obtained by the battery of
tests presented here, we can offer the following conservative conclusion: it
seems that the tails of the distributions examined here are decaying faster
than any (reasonable) power law but slower than any stretched exponential.
Maybe log-normal distributions could offer a better effective description of
the distribution of returns,19 as suggested in [436].
In sum, for most practical purposes, the Pareto distribution is sufficient
above the quantile q12 = 95%, but the fit is not stable enough to ascertain
with strong confidence an asymptotic power law nature of the pdf.

2.5.3 Implications for Risk Assessment

The correct description of the distribution of returns has important impli-


cations for the assessment of large risks not yet sampled by historical time
series. Indeed, the whole purpose of a characterization of the functional form
of the distribution of returns is to extrapolate currently available historical
18
See [453] and Figs. 3.6–3.9 pp. 81–82 of [451] where it is shown that SE distri-
butions are approximately stable in family and the effect of aggregation can be
seen to slowly increase the exponent c. See also [137] which studies specifically
this convergence to a Gaussian law as a function of the time scale.
19
Let us stress that we are speaking of a log-normal distribution of returns, not of
price! Indeed, the standard Black and Scholes model of a log-normal distribution
of prices is equivalent to a Gaussian distribution of returns. Thus, a log-normal
distribution of returns is much more fat-tailed, and in fact bracketed by power
law tails and Stretched-Exponential tails.
time series beyond the range provided by the empirical reconstruction of the
distributions. For risk management, the determination of the tail of the dis-
tribution is crucial. Indeed, many risk measures, such as the Value-at-Risk or
the expected-shortfall, are based on the properties of the tail of the distrib-
utions of returns. In order to assess risk at probability levels of 95% or so,
nonparametric methods have merits. However, in order to estimate risks at
high probability level such as 99% or larger, nonparametric estimations fail by
lack of data and parametric models become unavoidable. This shift in strat-
egy has a cost and replaces sampling errors by model errors. The considered
distribution can be too thin-tailed, as when using normal laws, and risk will be
underestimated, or it can be too fat-tailed and risk will be overestimated, as
with Lévy laws and possibly with regularly varying distributions. In each case,
large amounts of money are at stake and can be lost due to a too conservative
or too optimistic risk measurement.
In order to bypass these problems, many authors [34, 313, 355, among
others] have proposed to estimate the extreme quantiles of the distributions
in a semiparametric way, which allows one (i) to avoid the model errors and (ii)
to limit the sampling errors with respect to nonparametric methods and thus
to keep a reasonable accuracy in the estimation procedure. To this aim, it has
been suggested to use extreme value theory.20 However, as emphasized in
Sect. 2.3.3, estimates of the parameters of such (GEV or GPD) distributions
can be very unreliable in the presence of dependence, so that these methods
finally appear to be not very accurate and one cannot avoid a parametric
approach for the estimation of the highest quantiles.
The above analysis suggests that the Paretian paradigm leads to an overes-
timation of the probability of large events and therefore leads to the adoption
of too conservative positions. Generalizing to larger time scales, the overly
pessimistic view of large risks deriving from the Paretian paradigm should be
revised all the more, owing to the action of the central limit theorem. The above
comparison between several models, which turn out to be almost indistin-
guishable, such as the Stretched-Exponential, the Pareto and the log-Weibull
distributions, offers the important possibility of developing scenarios that can
test the sensitivity of risk assessment to errors in the determination of para-
meters and, even more interestingly, with respect to the choice of models, often
referred to as model errors.
Finally, an additional note of caution is in order. This chapter has fo-
cused on the marginal distributions of returns calculated at fixed time scales
and thus neglects the possible occurrence of runs of dependencies, such as
in cumulative drawdowns. In the presence of dependencies between returns,
and especially if the dependence is nonstationary and increases in times of
stress, the characterization of the marginal distributions of returns is not suf-
ficient. As an example, Johansen and Sornette [249] (see also Chap. 3 of [450])
20
See, for instance, http://www.gloriamundi.org for an overview of the extensive
application of EVT methods for VaR and expected-shortfall estimation.
have recently shown that the recurrence time of very large drawdowns cannot
be predicted from the sole knowledge of the distribution of returns and that
transient dependence effects occurring in times of stress make very large draw-
downs more frequent, qualifying them as abnormal “outliers” (other names
are “kings” or “black swans”).

Appendix

2.A Definition and Main Properties of Multifractal Processes

The traditional description of the dynamics of asset prices initiated by Bache-


lier [26] was based upon the Brownian motion and then the geometric Brown-
ian motion [377, 425]. But it is now widely recognized that these descriptions
suffer from two major discrepancies with empirical data. As shown in this chapter, the stationary
distribution of asset returns is far from the Gaussian law (it exhibits fat tails)
and, in addition, the volatility of asset returns has long range dependence (or
long memory), which is characterized by the alternation of periods of small
price changes and periods of large price changes.
In the mathematical literature, stochastic processes are said to exhibit
a long memory when their autocovariance function decays hyperbolically
[85, 213, 222]. Fractionally integrated processes like ARFIMA21 and FI-
GARCH22 [31] processes are discrete time processes that enjoy this property.
The first class is not suitable for the modeling of financial asset returns
insofar as it yields long memory in the returns themselves. In contrast, the
second class leads to long memory properties in the squared returns, which
is more appropriate. However, an important question remains open concern-
ing FIGARCH processes: is this kind of representation time consistent? That
is, given that the daily returns of an asset can be modeled by a FIGARCH
process, can we still model the monthly returns of this asset by a FIGARCH
process? In other words, is the class of FIGARCH processes closed under time
aggregation?
If time-consistency is not obeyed, the comparison of the discrete-time
model with empirical data at all time-scales simultaneously imposes strong
additional restrictions on the model. It is thus highly desirable that a suitable
discrete-time model be time-consistent. Note that continuous-time models are
time-consistent by construction, justifying the emphasis on continuous-time
stochastic processes with long memory.
This Appendix presents useful results on a family of continuous-time sto-
chastic processes which enjoy the property of long memory, the so-called mul-
tifractal process, born from the generalization of the seminal works by Man-
delbrot on the notions of self-similarity and fractality.
21
Fractionally integrated autoregressive moving average.
22
Fractionally integrated GARCH.
2.A.1 Self-similar Processes, Multiplicative Cascades


and Multifractal Processes

Before presenting two examples of multifractal processes with suitable prop-


erties for the modeling of financial asset prices, it is useful to describe their
underpinning. First, let us recall that given the filtered space (Ω, {Ft }t≥0 , P),
the stochastic process {X(t)} (with X(0) = 0) is self-similar with exponent
H > 0 if, by definition, for all λ, k, t1 , . . . , tk ≥ 0
{X(λt_1), …, X(λt_k)} =^{law} {λ^H · X(t_1), …, λ^H · X(t_k)} .   (2.A.1)
The most famous self-similar stochastic process is obviously the Brownian mo-
tion whose exponent H = 1/2. It belongs to the family of self-similar processes
with stationary Gaussian increments, namely the Fractional Brownian Mo-
tions whose exponent H lies in ]0, 1[; when 0 < H < 1/2, the autocorrelation
of the increments is negative (antipersistence) while it is positive (persistence)
when 1/2 < H < 1. In the latter case, the Fractional Brownian Motion exhibits
long memory.
Let us consider the law of the increments δ_l X(t) = X(t) − X(t − l). Assum-
ing the stationarity of these increments, the law of δ_l X(t) is the same as the
law of X(l) (since X(0) = 0). Thus, if X(t) is self-similar with an exponent
H, it is easy to prove that the q-order moments of δ_l X(t) and δ_L X(t), denoted
by M(q, l) and M(q, L) respectively, are related by
M(q, l) = (l/L)^{qH} M(q, L) .   (2.A.2)
This is called a “monofractal” process characterized by a linear dependence
of the moment exponent ζ(q) = qH as a function of the moment order q.
A multifractal process is obtained by using a weaker form of self-similarity.
Instead of the simple scaling rule
X(λt) =^{law} λ^H · X(t)   (2.A.3)

induced by (2.A.1), multifractal processes enjoy the more general property

X(λt) =^{law} K(λ) · X(t)   (2.A.4)
where X and K are independent random variables. This generalized scaling
rule induces strong restrictions on the distribution of the stochastic process.
For instance, it is straightforward to show that for all λ, k, t1 , . . . , tk > 0
X(λt_1)/X(t_1) =^{law} X(λt_2)/X(t_2) =^{law} ··· =^{law} X(λt_k)/X(t_k) ,   (2.A.5)

and that K(µ · ν) =^{law} K′(µ) · K″(ν), where K′ and K″ are two independent
copies of K. This last relation implies (provided that the expectations are
finite)
E[|K(µ · ν)|^q] = E[|K(µ)|^q] · E[|K(ν)|^q] ,   (2.A.6)

which immediately yields

E[|K(λ)|^q] = λ^{ζ(q)} ,  ∀λ > 0 ,   (2.A.7)

for some real-valued function ζ(·) such that ζ(0) = 0. Considering the relation
between the q-order moments of δl X(t) and δL X(t), (2.A.2) generalizes as
follows
M(q, l) = (l/L)^{ζ(q)} M(q, L) .   (2.A.8)

The function ζ(q) defines the multifractal spectrum of the process.


Processes enjoying this scaling property can be derived from so-called mul-
tiplicative cascades [17, 190, 340, 341]. It is convenient to present multiplica-
tive cascades with discrete scales ln = 2−n L. A multiplicative cascade for the
increments δX is defined by relating the local variation of the process δln X
at scale ln to the variation at scale L according to
δ_{l_n} X(t) = (∏_{i=1}^{n} W_i) δ_L X(t) ,   (2.A.9)

where Wi are i.i.d. random positive factors. Realizations of such processes


can be constructed using orthonormal wavelet bases [16]. If one defines the
magnitude ω(t, l) at time t and scale l as [17],
ω(t, l) = (1/2) ln(|δ_l X(t)|²) ,   (2.A.10)
then the cascade (2.A.9) becomes a simple random walk as a function of the
logarithm of scales, at a fixed time t:

ω(t, ln+1 ) = ω(t, ln ) + ln(Wn+1 ) . (2.A.11)
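The random walk (2.A.11) of the magnitude across scales is easy to simulate. The sketch below builds a dyadic log-normal cascade and checks that the cross-sectional variance of ω grows roughly like nλ² (the values of µ and λ are arbitrary, for illustration only):

```python
import random

random.seed(1)

# Dyadic log-normal cascade: after n levels the interval [0, L) is cut
# into 2**n pieces, and each piece carries the product of the n weights
# W met along its branch (2.A.9).  mu and lam are arbitrary values.
mu, lam = -0.05, 0.3
n_levels = 10

log_weights = [0.0]                    # ln of the cumulated weight
for _ in range(n_levels):
    children = []
    for lw in log_weights:
        # each interval splits in two, each child drawing its own ln W
        children.append(lw + random.gauss(mu, lam))
        children.append(lw + random.gauss(mu, lam))
    log_weights = children

# By (2.A.10)-(2.A.11) the magnitude omega performs a random walk in
# log-scale: its variance across branches grows roughly like n * lam**2
# (slightly less, since branches share ancestors).
m = sum(log_weights) / len(log_weights)
var = sum((x - m) ** 2 for x in log_weights) / len(log_weights)
print(var, n_levels * lam ** 2)
```

The shared ancestry of nearby branches is precisely what produces the logarithmic magnitude correlations (2.A.15) below.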

Assuming that W follows a log-normal law with parameters (µ, λ2 ), the magni-
tude ω admits a density at scale ln , Qln (ω), which satisfies the simple equation

 
Q_{l_n}(ω) = [φ(µ, λ²)]^{∗n} ∗ Q_L(ω)   (2.A.12)

where ∗ is the convolution product and ϕ(µ, λ2 ) denotes the Gaussian density
function with mean µ and variance λ2 . Going back to the original variable δX,
the previous equation provides us with the expression of the density function
of δX at scale ln

P_{l_n}(x) = ∫ G_{l_n,L}(u) e^{−u} P_L(e^{−u} x) du ,   (2.A.13)

where

G_{l_n,L} = φ(µ, λ²)^{∗n} = φ(nµ, nλ²) .


Conversely, a process that satisfies (2.A.13) with a normal kernel G can
be written as
law
δl X(t) = W · δ2l X(t) (2.A.14)
where W is distributed according to a log-normal law with mean µ and vari-
ance λ2 . By iterating this equation n times, one thus recovers the cascade
(2.A.9). Therefore, the cascade picture across scales constitutes a paradigm
of multifractal self-similar processes. The log-normal cascade model on the
dyadic tree associated to the orthonormal wavelet representation leads to a
magnitude correlation function given by

C_ω(τ, l) = Cov(ω(t, l), ω(t + τ, l)) ∝ −λ² ln(τ/T) ,  for l < τ < T ,   (2.A.15)
which is proportional to the logarithm of the lag τ . The parameter T is called
the “integral time scale” and is such that Cω (τ, l) is exactly 0 for τ > T .

2.A.2 The Multifractal Spectrum


We have introduced in (2.A.8) the q-order moment M(q, l) of the increment
δ_l X(t) at scale l, which follows the scaling law

M(q, l) ∼ l^{ζ(q)} ,   (2.A.16)
and allows us to explore the multifractal properties of the multifractal
processes.
Let us define the Hölder exponent α(t0 ) at time t0 as
δ_l X(t_0) ∼_{l→0} l^{α(t_0)} .   (2.A.17)
The multifractal spectrum f(α) is the fractal (Hausdorff) dimension of the
iso-Hölder exponent sets:
f (α) = Dim{t, α(t) = α} . (2.A.18)
Roughly speaking, this means that, at scale l ≪ T, the number of times where
δ_l X(t) ∼ l^α is

N(t, α) ∼ l^{−f(α)} .   (2.A.19)
The multifractal formalism obtains that f(α) and ζ(q) are Legendre transforms
of each other:

f(α) = 1 + min_q (qα − ζ(q)) ,   (2.A.20)
ζ(q) = 1 + min_α (qα − f(α)) .   (2.A.21)

Therefore, in the multifractal formalism, q is nothing but the value of the
derivative of f(α) and conversely α is the value of the derivative of ζ(q):

q(α*) = ∂f/∂α |_{α*} .   (2.A.22)
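The Legendre pair (2.A.20)-(2.A.21) can be evaluated numerically on a grid. The sketch below uses an illustrative concave parabolic spectrum (the value of λ² is an arbitrary choice) and checks that the two transforms invert each other:

```python
# Numerical Legendre transforms (2.A.20)-(2.A.21) for an illustrative
# concave parabolic spectrum zeta(q); lam2 is an arbitrary choice.
lam2 = 0.02

def zeta(q):
    return (q - q * (q - 2.0) * lam2) / 2.0

qs = [-40.0 + 0.02 * i for i in range(4001)]       # q grid
alphas = [0.3 + 0.002 * i for i in range(201)]     # alpha grid

def f_of_alpha(a):
    # (2.A.20):  f(alpha) = 1 + min_q (q*alpha - zeta(q))
    return 1.0 + min(q * a - zeta(q) for q in qs)

def zeta_back(q):
    # (2.A.21):  zeta(q) = 1 + min_alpha (q*alpha - f(alpha))
    return 1.0 + min(a * q - f_of_alpha(a) for a in alphas)

# On the concave part the two transforms are inverse of each other:
print(zeta(2.0), zeta_back(2.0))   # both equal 1.0 (up to grid error)
```

The brute-force minimization over a grid is enough here because both functions are smooth and concave/convex; the residual discrepancy is set by the grid steps.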
2.A.3 The Multifractal Model of Asset Returns


of Mandelbrot et al.

Mandelbrot et al. [341] have proposed a very simple way to obtain a multi-
fractal process with suitable properties for the modeling of asset returns. In
its simplest form, it is based upon the subordination of a Brownian motion
by a multifractal process. Indeed, considering the price process {P (t)}t≥0 , the
logarithm of the price

X(t) = ln P (t) − ln P (0) (2.A.23)

is assumed to be defined by

X(t) = B[θ(t)] , (2.A.24)

where B(t) denotes the standard Brownian motion assumed independent of


the stochastic process θ(t) which is a multifractal process with continuous,
nondecreasing paths and stationary increments.
It is easy to check that X(t) enjoys the multifractal property and it is
straightforward to show that its multifractal spectrum ζX is related to the
multifractal spectrum ζθ of θ by

ζ_X(q) = ζ_θ(q/2) .   (2.A.25)
In addition, as long as B(t) is a Brownian motion without drift, the stochastic
process {X(t)}_{t≥0} is a martingale with respect to its natural filtration pro-
vided that E[θ^{1/2}] < ∞. Besides, if E[θ] is finite, the autocovariance function
of the price return process δl X(t) vanishes for all lag larger than l. The co-
variance of the absolute values of the price returns (raised to the power 2q)
satisfies
 
Cov(|δ_l X(t)|^{2q}, |δ_l X(t + τ)|^{2q}) = κ(q) · Cov(|δ_l θ(t)|^q, |δ_l θ(t + τ)|^q) ,   (2.A.26)

with κ(q) = E[|B(1)|^{2q}]², so that the volatility of asset returns exhibits
long memory if the volatility of θ itself exhibits long memory, i.e., if periods
of intense trading activity alternate with periods of weak activity.
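A sketch of the subordination (2.A.24) in pure Python. For illustration, the trading time θ is driven here by a persistent log-normal activity (an AR(1) stand-in for a genuine multifractal measure, not the construction of [341]):

```python
import math, random

random.seed(2)

# Subordination X(t) = B[theta(t)] (2.A.24): Gaussian increments run
# at a stochastic clock.  The persistent log-normal activity below is
# a crude AR(1) stand-in for a true multifractal measure.
N = 20000
omega, activity = 0.0, []
for _ in range(N):
    omega = 0.98 * omega + random.gauss(0.0, 0.2)
    activity.append(math.exp(omega))          # d theta > 0 on each step

# increment of X over one step: Gaussian with variance d theta
returns = [random.gauss(0.0, 1.0) * math.sqrt(a) for a in activity]

def autocorr(x, lag):
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / sum((xi - m) ** 2 for xi in x)

ac_signed = autocorr(returns, 1)
ac_abs = autocorr([abs(r) for r in returns], 1)
print(ac_signed, ac_abs)   # ~0 for signed returns, clearly > 0 for |returns|
```

The signed returns are (nearly) uncorrelated, as befits a martingale, while the absolute returns inherit the persistence of the activity, which is the qualitative signature discussed above.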

2.A.4 The Multifractal Random Walk (MRW)

An alternative approach for modeling the dynamics of asset returns in terms


of multifractal processes has been introduced by Bacry et al. [27, 28] with
the aim of constructing a stationary process with continuous scale invariance
inspired from the standard hierarchical models presented in Sect. 2.A.1.
Definition

The MRW is a stochastic volatility model which has exact multifractal prop-
erties, is invariant under continuous dilations, and possesses stationary incre-
ments. It is constructed so as to mimic the crucial logarithmic dependence
(2.A.15) of the magnitude correlation function, at the basis of multifractality
in cascade processes.
The MRW is constructed as the continuous limit for ∆t → 0 of the dis-
cretized version X∆t (using a time discretization step ∆t) defined by adding
up t/∆t random variables:


X_∆t(t) = Σ_{k=1}^{t/∆t} δX_∆t[k] .   (2.A.27)

The process {δX∆t [k]}k is a noise whose variance is stochastic, i.e.,

δX_∆t[k] = ε_∆t[k] e^{ω_∆t[k]} ,   (2.A.28)

where ω_∆t[k] is the logarithm of the stochastic variance. ε_∆t is a Gaussian
white noise independent of ω and of variance σ²∆t.23
Following the cascade model, ω∆t is a Gaussian stationary process whose
covariance reads

Cov(ω_∆t[k], ω_∆t[l]) = λ² ln ρ_∆t[|k − l|] ,   (2.A.29)

where ρ∆t is chosen in order to mimic the correlation structure (2.A.15) ob-
served in cascade models:
ρ_∆t[k] = T/((|k| + 1)∆t)  for |k| ≤ T/∆t − 1 ,  and  ρ_∆t[k] = 1  otherwise.   (2.A.30)

In order for the variance of X∆t (t) to converge when ∆t → 0, one must choose
the mean of the process ω∆t such that [27]

E(ω_∆t[k]) = −Var(ω_∆t[k]) = −λ² ln(T/∆t) ,   (2.A.31)

for which Var(X_∆t(t)) = σ²t .
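Equations (2.A.27)-(2.A.31) can be simulated directly at small sizes by a Cholesky factorization of the log-volatility covariance. A pure-Python sketch with illustrative parameter values (the covariance (2.A.29)-(2.A.30) is positive semidefinite, so a tiny jitter is enough to stabilize the factorization):

```python
import math, random

random.seed(3)

# Discretized MRW (2.A.27)-(2.A.31), small-scale for illustration.
dt, T_int, lam2, sigma2, N = 1.0, 32.0, 0.02, 1.0, 128

def rho(k):
    # correlation kernel (2.A.30)
    return T_int / ((abs(k) + 1) * dt) if abs(k) <= T_int / dt - 1 else 1.0

mean_omega = -lam2 * math.log(T_int / dt)                  # (2.A.31)
C = [[lam2 * math.log(rho(i - j)) for j in range(N)] for i in range(N)]

# Cholesky factorization of the log-volatility covariance; a tiny
# jitter keeps the square roots numerically safe.
L = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1):
        s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
        L[i][j] = math.sqrt(max(s, 0.0) + 1e-9) if i == j else s / L[j][j]

def mrw_endpoint():
    g = [random.gauss(0.0, 1.0) for _ in range(N)]
    omega = [mean_omega + sum(L[i][k] * g[k] for k in range(i + 1))
             for i in range(N)]
    # noise (2.A.28) summed into the path (2.A.27)
    return sum(random.gauss(0.0, math.sqrt(sigma2 * dt)) * math.exp(w)
               for w in omega)

endpoints = [mrw_endpoint() for _ in range(300)]
var = sum(x * x for x in endpoints) / len(endpoints)
print(var)   # close to sigma2 * N * dt = 128, as (2.A.31) guarantees
```

The choice of mean in (2.A.31) makes E[e^{2ω}] = 1, which is what keeps the empirical variance of the path equal to σ²t in this check.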

Multifractal Spectrum

Since, by construction, the increments of the model are stationary, the pdf of
X∆t (t + l) − X∆t (t) does not depend on t and is the same as that of X∆t (l). In
23
Introducing an asymmetric dependence between ω and the noise  allows one
to account for the Leverage effect [170] while preserving the scale invariance
properties of the MRW, but forbids the existence of a limit as ∆t −→ 0 [388].
[27], it was proven that the moments of X(l) ≡ X∆t→0+ (l) can be expressed
as
E(X(l)^{2p}) = (σ^{2p}(2p)!)/(2^p p!) ∫_0^l du_1 ⋯ ∫_0^l du_p ∏_{i<j} ρ(u_i − u_j)^{4λ²} ,   (2.A.32)

where ρ is defined by

ρ(t) = T/|t| for |t| ≤ T ,  and ρ(t) = 1 otherwise.   (2.A.33)
1 otherwise

Using this expression in the multiple integrals in (2.A.32), a straightforward


scaling argument leads to
M(2p, l) = K_{2p} (l/T)^{p − 2p(p−1)λ²} ,   (2.A.34)

where

K_{2p} = T^p σ^{2p} (2p − 1)!! ∫_0^1 du_1 ⋯ ∫_0^1 du_p ∏_{i<j} |u_i − u_j|^{−4λ²} .   (2.A.35)

K_{2p} is nothing but the moment of order 2p of the random variable X(T) or
equivalently of δ_T X(t). Expression (2.A.35) leads to ζ_{2p} = p − 2p(p − 1)λ²,
and by analytical continuation, the corresponding full ζ_q spectrum is thus the
parabola

ζ_q = (q − q(q − 2)λ²)/2 .   (2.A.36)

Approximate Form in Terms of a Long Memory Kernel


in the Discrete Time Approximation

Consider the returns at scale ∆t, defined by r∆t (t) ≡ ln[p(t)/p(t − ∆t)]. Then,
mapping the increments δX∆t [k] defined in (2.A.28) onto r∆t (t) makes the
price p(t) a multifractal random walk in the continuous limit ∆t → 0. The
discrete return r∆t (t) can thus be written as

r_∆t(t) = ε(t) · σ_∆t(t) = ε(t) · e^{ω_∆t(t)} ,   (2.A.37)

where ε(t) is a standardized Gaussian white noise independent of ω_∆t(t) and
ω_∆t(t) is a nearly Gaussian process (exactly Gaussian for ∆t → 0) with mean
and covariance:

µ_∆t = (1/2) ln(σ²∆t) − C_∆t(0)   (2.A.38)

C_∆t(τ) = Cov[ω_∆t(t), ω_∆t(t + τ)]
        = λ² ln(T/(|τ| + e^{−3/2}∆t))  if |τ| < T − e^{−3/2}∆t ,
        = 0                             if |τ| ≥ T − e^{−3/2}∆t .   (2.A.39)
σ 2 ∆t is the variance of the returns at scale ∆t and T is the “integral” (cor-


relation) time scale. Typical values for T and λ2 are respectively 1 year and
0.02.
The MRW model can be expressed in a more familiar form, in which the
log-volatility ω∆t (t) obeys an autoregressive equation whose solution reads
[456]
ω_∆t(t) = µ_∆t + ∫_{−∞}^{t} dW(τ) K_∆t(t − τ) ,   (2.A.40)

where W (t) denotes a standard Wiener process and the memory kernel K∆t (·)
is a causal function, ensuring that the system is not anticipative. The process
W (t) can be seen as the cumulative information flow. Thus ω(t) represents
the response of the price to incoming information up to the date t. At time
t, the distribution of ω_∆t is Gaussian with mean µ_∆t and variance

V_∆t = ∫_0^∞ dτ K_∆t(τ)² = λ² ln(Te^{3/2}/∆t) .

Its covariance, which entirely specifies the random process, is given by

C_∆t(τ) = ∫_0^∞ dt K_∆t(t) K_∆t(t + |τ|) .   (2.A.41)

Performing a Fourier transform, we obtain


 
K̂_∆t(f)² = Ĉ_∆t(f) = 2λ² f^{−1} ∫_0^{Tf} dt sin(t)/t + O(f∆t ln(f∆t)) ,   (2.A.42)

which shows that, for τ small enough,

K_∆t(τ) ∼ K_0 √(λ²T/τ)  for ∆t ≪ τ ≪ T .   (2.A.43)
This slow inverse square root power law decay (2.A.43) of the memory ker-
nel in (2.A.40) ensures the long-range dependence and multifractality of the
stochastic volatility process (2.A.37). Note that (2.A.40) for the log-volatility
ω∆t (t) takes a form similar to but simpler than the ARFIMA models often
used to account for the very slow decay of the sample ACF of the log-volatility
of asset returns [30].

2.B A Survey of the Properties


of Maximum Likelihood Estimators
This appendix summarizes the expressions of the maximum likelihood esti-
mators derived from the distributions (2.26–2.29). In the following, we
consider an iid sample X1 , . . . , XT drawn from one of the distributions under
consideration, namely the Pareto, the Weibull, the Exponential, the incom-
plete Gamma and the log-Weibull distributions.
2.B.1 The Pareto Distribution

According to expression (2.26), the Pareto distribution is given by

F_u(x) = 1 − (u/x)^b ,  x ≥ u   (2.B.44)
and its density is

f_u(x|b) = b u^b / x^{b+1} .   (2.B.45)
Let us denote by

L_T^{PD}(b̂) = max_b Σ_{i=1}^{T} ln f_u(X_i|b)   (2.B.46)

the maximum of the log-likelihood function derived under hypothesis (PD).


b̂ is the maximum likelihood estimator of the tail index b under the PD hy-
pothesis.
The maximum of the likelihood function is solution of

1/b + ln u − (1/T) Σ_{i=1}^{T} ln X_i = 0 ,   (2.B.47)

which yields

b̂ = ( (1/T) Σ_{i=1}^{T} ln X_i − ln u )^{−1} ,  and  (1/T) L_T^{PD}(b̂) = ln(b̂/u) − 1 + 1/b̂ .   (2.B.48)

Moreover, one easily shows that b̂ is asymptotically normally distributed:

√T (b̂ − b) ∼ N(0, b²) .   (2.B.49)
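In code, the estimator (2.B.48) and its asymptotic standard error read as follows (checked on synthetic Pareto data; the sample size and the value b = 3 are arbitrary choices):

```python
import math, random

random.seed(4)

def pareto_mle(sample, u):
    """Tail-index estimator (2.B.48) for the Pareto model above u,
    with its asymptotic standard error b_hat/sqrt(T)."""
    T = len(sample)
    b_hat = 1.0 / (sum(math.log(x) for x in sample) / T - math.log(u))
    return b_hat, b_hat / math.sqrt(T)

# Check on synthetic Pareto data with b = 3, u = 1 (inverse-CDF draws):
sample = [random.random() ** (-1.0 / 3.0) for _ in range(10000)]
b_hat, se = pareto_mle(sample, 1.0)
print(b_hat, se)   # close to 3.0 and 0.03
```

Since ln(X/u) is exponentially distributed under the Pareto hypothesis, b̂ is just the reciprocal of a sample mean, which is what makes its asymptotic behavior so easy to derive.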

2.B.2 The Weibull Distribution

The Weibull distribution is given by (2.27) and its density is

f_u(x|c, d) = (c/d^c) e^{(u/d)^c} x^{c−1} exp(−(x/d)^c) ,  x ≥ u .   (2.B.50)
The maximum of the log-likelihood function is

L_T^{SE}(ĉ, d̂) = max_{c,d} Σ_{i=1}^{T} ln f_u(X_i|c, d) .   (2.B.51)
Thus, the maximum likelihood estimators (ĉ, d̂) are solution of

1/c = [ (1/T) Σ_{i=1}^{T} (X_i/u)^c ln(X_i/u) ] / [ (1/T) Σ_{i=1}^{T} (X_i/u)^c − 1 ] − (1/T) Σ_{i=1}^{T} ln(X_i/u) ,   (2.B.52)

(d/u)^c = (1/T) Σ_{i=1}^{T} (X_i/u)^c − 1 .   (2.B.53)

Equation (2.B.52) depends on c only and must be solved numerically. Then,
the resulting value of c can be put in (2.B.53) to get d. The maximum of the
log-likelihood function is

(1/T) L_T^{SE}(ĉ, d̂) = ln(ĉ/d̂^ĉ) + (ĉ − 1) (1/T) Σ_{i=1}^{T} ln X_i − 1 .   (2.B.54)

Since c > 0, the vector √T (ĉ − c, d̂ − d) is asymptotically normal, with a
covariance matrix whose expression is given in Appendix 2.C.
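A numerical solution of (2.B.52)-(2.B.53) by simple bisection (a pure-Python sketch; it assumes that the existence condition discussed below holds, and checks the estimator on synthetic Weibull data truncated at u):

```python
import math, random

random.seed(5)

def se_mle(sample, u):
    """Numerical solution of the likelihood equations (2.B.52)-(2.B.53)
    by bisection on c (assumes the existence condition h(0) > 0)."""
    T = len(sample)
    y = [math.log(x / u) for x in sample]        # ln(X_i/u) >= 0

    def h(c):
        # equation (2.B.52) rewritten as h(c) = 0, h decreasing in c
        num = sum(math.exp(c * yi) * yi for yi in y) / T
        den = sum(math.exp(c * yi) for yi in y) / T - 1.0
        return 1.0 / c - num / den + sum(y) / T

    lo, hi = 1e-6, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    c = 0.5 * (lo + hi)
    d = u * (sum(math.exp(c * yi) for yi in y) / T - 1.0) ** (1.0 / c)  # (2.B.53)
    return c, d

# Check on synthetic Weibull data (c = 0.7, d = 1) truncated at u = 0.5:
c0, d0, u = 0.7, 1.0, 0.5
sample = []
while len(sample) < 5000:
    x = d0 * (-math.log(random.random())) ** (1.0 / c0)   # Weibull draw
    if x >= u:
        sample.append(x)
c_hat, d_hat = se_mle(sample, u)
print(c_hat, d_hat)   # close to (0.7, 1.0)
```

Bisection suffices because, as noted below, the profile likelihood is concave in c, so h is monotonically decreasing.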
It should be noted that the maximum likelihood equations (2.B.52–2.B.53) do not
admit a solution with positive c for all possible samples (X_1, …, X_T). Indeed,
the function

h(c) = 1/c − [ (1/T) Σ_{i=1}^{T} (X_i/u)^c ln(X_i/u) ] / [ (1/T) Σ_{i=1}^{T} (X_i/u)^c − 1 ] + (1/T) Σ_{i=1}^{T} ln(X_i/u) ,   (2.B.55)

which is the total derivative of L_T^{SE}(c, d̂(c)), is a decreasing function of c. This
means, as one can expect, that the likelihood function is concave. Thus, a
necessary and sufficient condition for (2.B.52) to admit a solution is that h(0)
is positive. After some calculations, we find
h(0) = [ 2 ((1/T) Σ ln(X_i/u))² − (1/T) Σ (ln(X_i/u))² ] / [ 2 (1/T) Σ ln(X_i/u) ] ,   (2.B.56)

which is positive if and only if

2 ((1/T) Σ ln(X_i/u))² − (1/T) Σ (ln(X_i/u))² > 0 .   (2.B.57)

A finite sample may not automatically obey this condition even if it has been
generated by the SE distribution. However, the probability of occurrence of a
sample leading to a negative maximum likelihood estimate of c tends to zero
(under the SE Hypothesis with a positive c) as
 √
c T σ c2 T
Φ − √ e− 2σ2 , (2.B.58)
σ 2πT c
i.e., exponentially with respect to T. Here, σ² is the variance of the limit
Gaussian distribution of the maximum likelihood c-estimator, which can be
derived explicitly. If h(0) is negative, $L_T^{SE}$ reaches its maximum at c = 0 and
in such a case

$$\frac{1}{T} L_T^{SE}(c = 0) = -\ln\left[\frac{1}{T}\sum_{i=1}^T \ln\frac{X_i}{u}\right] - \frac{1}{T}\sum_{i=1}^T \ln X_i - 1 . \quad (2.B.59)$$

In contrast, if the maximum likelihood estimation based on the SE assumption
is applied to samples distributed differently from the SE, negative c-estimates
can then be obtained with some positive probability not tending to zero as
T → ∞. If the sample is distributed according to the Pareto distribution,
for instance, then the suitably rescaled maximum-likelihood c-estimate converges
in distribution to a Gaussian random variable with zero mean, and thus the
probability of negative c-estimates converges to 0.5.
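The numerical procedure described above can be sketched as follows (a toy Python illustration on a synthetic SE sample; all parameter values are arbitrary choices). It first checks the existence condition (2.B.57), then solves h(c) = 0 by bisection, exploiting the fact that h is decreasing:

```python
import math
import random

random.seed(1)
c_true, d_true, u, T = 0.7, 1.0, 0.5, 20_000

# inverse-transform sampling of the SE tail P(X > x) = exp(-(x/d)^c + (u/d)^c)
xs = [d_true * ((u / d_true) ** c_true - math.log(1.0 - random.random()))
      ** (1.0 / c_true) for _ in range(T)]
ls = [math.log(x / u) for x in xs]

m1 = sum(ls) / T
m2 = sum(l * l for l in ls) / T
assert 2 * m1 ** 2 - m2 > 0        # condition (2.B.57): h(0) > 0

def h(c):
    # total derivative of the profile log-likelihood, equation (2.B.55)
    p = [math.exp(c * l) for l in ls]                 # (X_i/u)^c
    num = sum(pi * li for pi, li in zip(p, ls)) / T
    den = sum(p) / T - 1.0
    return 1.0 / c - num / den + m1

# h is decreasing, so the root of h(c) = 0 is found by bisection
lo, hi = 1e-8, 5.0                  # h(lo) > 0 and h(hi) < 0 for this sample
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
c_hat = 0.5 * (lo + hi)

# scale parameter from (2.B.53): (d/u)^c = mean((X_i/u)^c) - 1
d_hat = u * (sum(math.exp(c_hat * l) for l in ls) / T - 1.0) ** (1.0 / c_hat)
print(c_hat, d_hat)
```

Both estimates come out close to the values used to generate the sample, as expected for a sample size of this order.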

2.B.3 The Exponential Distribution

The Exponential distribution function is given by (2.28), and its density is

$$f_u(x|d) = \frac{e^{u/d}}{d}\, \exp\left(-\frac{x}{d}\right) , \quad x \ge u . \quad (2.B.60)$$
The maximum of the log-likelihood function is reached at

$$\hat d = \frac{1}{T}\sum_{i=1}^T X_i - u , \quad (2.B.61)$$

and is given by

$$\frac{1}{T} L_T^{ED}(\hat d) = -(1 + \ln\hat d) . \quad (2.B.62)$$

The random variable $\sqrt{T}\,(\hat d - d)$ is asymptotically normally distributed with
zero mean and variance d².
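The Exponential case is fully explicit; a minimal Python sketch (arbitrary parameters, synthetic data) confirming (2.B.61)–(2.B.62):

```python
import math
import random

random.seed(2)
u, d_true, T = 1.0, 2.0, 100_000

# X - u ~ Exponential(d), sampled by inverse transform
xs = [u - d_true * math.log(1.0 - random.random()) for _ in range(T)]

d_hat = sum(xs) / T - u                 # estimator (2.B.61)
ll_max = -(1.0 + math.log(d_hat))       # (1/T) L_T^ED(d_hat), equation (2.B.62)
print(d_hat, ll_max)
```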

2.B.4 The Incomplete Gamma Distribution

The expression of the incomplete Gamma distribution function is given by
(2.29) and its density is

$$f_u(x|b,d) = \frac{d^{\,b}}{\Gamma\left(-b, \frac{u}{d}\right)}\; x^{-(b+1)} \exp\left(-\frac{x}{d}\right) , \quad x \ge u . \quad (2.B.63)$$

Let us introduce the partial derivative of the logarithm of the incomplete
Gamma function:

$$\Psi(a, x) = \frac{\partial}{\partial a} \ln\Gamma(a, x) = \frac{1}{\Gamma(a, x)} \int_x^{\infty} \ln t \; t^{a-1} e^{-t}\, dt . \quad (2.B.64)$$

The maximum of the log-likelihood function is reached at the point $(\hat b, \hat d)$
solution of

$$\frac{1}{T}\sum_{i=1}^T \ln\frac{X_i}{\hat d} = \Psi\left(-\hat b, \frac{u}{\hat d}\right) , \quad (2.B.65)$$

$$\frac{1}{T}\sum_{i=1}^T \frac{X_i}{\hat d} = \frac{1}{\Gamma\left(-\hat b, \frac{u}{\hat d}\right)}\left(\frac{u}{\hat d}\right)^{-\hat b} e^{-u/\hat d} - \hat b , \quad (2.B.66)$$

and is equal to

$$\frac{1}{T} L_T^{IG}(\hat b, \hat d) = -\ln\hat d - \ln\Gamma\left(-\hat b, \frac{u}{\hat d}\right) - (\hat b + 1)\,\Psi\left(-\hat b, \frac{u}{\hat d}\right) + \hat b - \frac{1}{\Gamma\left(-\hat b, \frac{u}{\hat d}\right)}\left(\frac{u}{\hat d}\right)^{-\hat b} e^{-u/\hat d} . \quad (2.B.67)$$
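The function Ψ(a, x) has no elementary closed form, but it is straightforward to evaluate numerically. The sketch below (pure-Python trapezoidal integration, with an ad hoc truncation point and step count) also checks that the integral representation in (2.B.64) agrees with a finite-difference derivative of ln Γ(a, x):

```python
import math

def upper_gamma(a, x, n=100_000, t_max=50.0):
    # Gamma(a, x) = int_x^inf t^(a-1) e^(-t) dt, trapezoidal rule on [x, t_max]
    h = (t_max - x) / n
    s = 0.5 * (x ** (a - 1) * math.exp(-x) + t_max ** (a - 1) * math.exp(-t_max))
    for i in range(1, n):
        t = x + i * h
        s += t ** (a - 1) * math.exp(-t)
    return s * h

def psi(a, x, n=100_000, t_max=50.0):
    # Psi(a, x) of (2.B.64): (1/Gamma(a,x)) int_x^inf ln(t) t^(a-1) e^(-t) dt
    h = (t_max - x) / n
    s = 0.5 * (math.log(x) * x ** (a - 1) * math.exp(-x)
               + math.log(t_max) * t_max ** (a - 1) * math.exp(-t_max))
    for i in range(1, n):
        t = x + i * h
        s += math.log(t) * t ** (a - 1) * math.exp(-t)
    return s * h / upper_gamma(a, x, n, t_max)

# Psi is also the a-derivative of ln Gamma(a, x): central finite-difference check
a, x, eps = -1.2, 0.8, 1e-4
fd = (math.log(upper_gamma(a + eps, x))
      - math.log(upper_gamma(a - eps, x))) / (2 * eps)
print(psi(a, x), fd)
```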

2.B.5 The Log-Weibull Distribution

The Log-Weibull distribution is given by (2.34) and its density is

$$f_u(x|b,c) = \frac{b \cdot c}{x} \left(\ln\frac{x}{u}\right)^{c-1} \exp\left[-b \left(\ln\frac{x}{u}\right)^{c}\right] , \quad x \ge u . \quad (2.B.68)$$
The maximum of the log-likelihood function is

$$L_T^{LW}(\hat b, \hat c) = \max_{b,c} \sum_{i=1}^T \ln f_u(X_i|b,c) . \quad (2.B.69)$$

Thus, the maximum likelihood estimators $(\hat b, \hat c)$ are solution of

$$\hat b^{-1} = \frac{1}{T}\sum_{i=1}^T \left(\ln\frac{X_i}{u}\right)^{\hat c} , \quad (2.B.70)$$

$$\frac{1}{\hat c} = \frac{\frac{1}{T}\sum_{i=1}^T \left(\ln\frac{X_i}{u}\right)^{\hat c} \ln\ln\frac{X_i}{u}}{\frac{1}{T}\sum_{i=1}^T \left(\ln\frac{X_i}{u}\right)^{\hat c}} - \frac{1}{T}\sum_{i=1}^T \ln\ln\frac{X_i}{u} . \quad (2.B.71)$$

The solution of these equations is unique and it can be shown that the vector
$\sqrt{T}\,(\hat b - b, \hat c - c)$ is asymptotically Gaussian with a covariance which can be
deduced from the matrix (2.C.88) given in Appendix 2.C.

2.C Asymptotic Variance–Covariance of Maximum Likelihood
Estimators of the SE Parameters

We consider the Stretched-Exponential (SE) parametric family with complementary
distribution function

$$\bar F(x) = 1 - F(x) = \exp\left[-\left(\frac{x}{d}\right)^{c} + \left(\frac{u}{d}\right)^{c}\right] , \quad x \ge u , \quad (2.C.72)$$

where c, d are unknown parameters and u is a known lower threshold.

Let us take a new parameterization of the SE distribution, more appropriate
for the derivation of asymptotic variances. It should be noted that this
change of parameters does not affect the asymptotic variance of the form
parameter c. In the new parameterization, the complementary distribution
function has the form:

$$\bar F(x) = \exp\left[-v\left(\left(\frac{x}{u}\right)^{c} - 1\right)\right] , \quad x \ge u . \quad (2.C.73)$$

Here, the parameter v involves both the unknown parameters c, d and the
known threshold u:

$$v = \left(\frac{u}{d}\right)^{c} . \quad (2.C.74)$$
The log-likelihood L for the sample $(X_1, \ldots, X_N)$ has the form (up to an
additive constant independent of the parameters):

$$L = N \ln v + N \ln c + (c-1) \sum_{i=1}^N \ln\frac{X_i}{u} - v \sum_{i=1}^N \left[\left(\frac{X_i}{u}\right)^{c} - 1\right] . \quad (2.C.75)$$

Now, we derive the Fisher matrix Φ:

$$\Phi = \begin{pmatrix} E\left[-\partial_v^2 L\right] & E\left[-\partial_{v,c}^2 L\right] \\ E\left[-\partial_{c,v}^2 L\right] & E\left[-\partial_c^2 L\right] \end{pmatrix} , \quad (2.C.76)$$

and find

$$\frac{\partial^2 L}{\partial v^2} = -\frac{N}{v^2} , \quad (2.C.77)$$

$$\frac{\partial^2 L}{\partial v\,\partial c} = -N \cdot \frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{c} \ln\frac{X_i}{u} \;\overset{N\to\infty}{\longrightarrow}\; -N\, E\left[\left(\frac{X}{u}\right)^{c} \ln\frac{X}{u}\right] , \quad (2.C.78)$$

$$\frac{\partial^2 L}{\partial c^2} = -\frac{N}{c^2} - N v \cdot \frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{c} \ln^2\frac{X_i}{u} \;\overset{N\to\infty}{\longrightarrow}\; -\frac{N}{c^2} - N v\, E\left[\left(\frac{X}{u}\right)^{c} \ln^2\frac{X}{u}\right] . \quad (2.C.79)$$
After some calculations, we obtain:

$$E\left[\left(\frac{X}{u}\right)^{c} \ln\frac{X}{u}\right] = \frac{1 + e^{v} E_1(v)}{c \cdot v} , \quad (2.C.80)$$

where $E_1(v)$ is the exponential integral function:

$$E_1(v) = \int_v^{\infty} \frac{e^{-t}}{t}\, dt . \quad (2.C.81)$$

Similarly we find:

$$E\left[\left(\frac{X}{u}\right)^{c} \ln^2\frac{X}{u}\right] = \frac{2 e^{v}}{v \cdot c^2} \left[E_1(v) + E_2(v) - \ln(v)\, E_1(v)\right] , \quad (2.C.82)$$

where $E_2(v)$ is the partial derivative of the incomplete Gamma function:

$$E_2(v) = \int_v^{\infty} \frac{\ln t}{t}\, e^{-t}\, dt = \left.\frac{\partial}{\partial a} \int_v^{\infty} t^{a-1} e^{-t}\, dt\right|_{a=0} = \left.\frac{\partial}{\partial a} \Gamma(a, v)\right|_{a=0} . \quad (2.C.83)$$

The Fisher matrix (multiplied by N) then reads:

$$N\Phi = \begin{pmatrix} \frac{1}{v^2} & \frac{1 + e^{v} E_1(v)}{c \cdot v} \\[6pt] \frac{1 + e^{v} E_1(v)}{c \cdot v} & \frac{1}{c^2}\left(1 + 2 e^{v}\left[E_1(v) + E_2(v) - \ln(v) E_1(v)\right]\right) \end{pmatrix} . \quad (2.C.84)$$

The covariance matrix B of the ML-estimates $(\tilde v, \tilde c)$ is equal to the inverse
of the Fisher matrix. Thus, inverting the Fisher matrix in (2.C.84) provides
the desired covariance matrix:

$$B = \begin{pmatrix} \frac{v^2\left[1 + 2 e^{v} E_1(v) + 2 e^{v} E_2(v) - 2\ln(v)\, e^{v} E_1(v)\right]}{N H(v)} & -\frac{c\,v\left[1 + e^{v} E_1(v)\right]}{N H(v)} \\[6pt] -\frac{c\,v\left[1 + e^{v} E_1(v)\right]}{N H(v)} & \frac{c^2}{N H(v)} \end{pmatrix} , \quad (2.C.85)$$

where H(v) has the form:

$$H(v) = 2 e^{v} E_2(v) - 2 \ln(v)\, e^{v} E_1(v) - \left(e^{v} E_1(v)\right)^2 . \quad (2.C.86)$$
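The algebra of (2.C.84)–(2.C.86) can be verified mechanically: the sketch below evaluates E₁(v) and E₂(v) by crude numerical integration (arbitrary truncation and step count, not a production-quality quadrature) and checks that the covariance matrix (2.C.85) inverts N times the per-observation matrix displayed in (2.C.84):

```python
import math

def E1(v, n=100_000, t_max=50.0):
    # E1(v) = int_v^inf e^(-t)/t dt, equation (2.C.81), trapezoidal sketch
    h = (t_max - v) / n
    s = 0.5 * (math.exp(-v) / v + math.exp(-t_max) / t_max)
    for i in range(1, n):
        t = v + i * h
        s += math.exp(-t) / t
    return s * h

def E2(v, n=100_000, t_max=50.0):
    # E2(v) = int_v^inf (ln t / t) e^(-t) dt, equation (2.C.83)
    h = (t_max - v) / n
    s = 0.5 * (math.log(v) * math.exp(-v) / v
               + math.log(t_max) * math.exp(-t_max) / t_max)
    for i in range(1, n):
        t = v + i * h
        s += math.log(t) * math.exp(-t) / t
    return s * h

v, c, N = 0.5, 0.7, 1000            # arbitrary test values
e1, e2, ev = E1(v), E2(v), math.exp(v)
H = 2 * ev * e2 - 2 * math.log(v) * ev * e1 - (ev * e1) ** 2    # (2.C.86)

A = 1 + ev * e1
m = [[1 / v**2, A / (c * v)],
     [A / (c * v), (1 + 2 * ev * (e1 + e2 - math.log(v) * e1)) / c**2]]  # (2.C.84)
B = [[v**2 * (1 + 2*ev*e1 + 2*ev*e2 - 2*math.log(v)*ev*e1) / (N * H),
      -c * v * A / (N * H)],
     [-c * v * A / (N * H), c**2 / (N * H)]]                              # (2.C.85)

# B should invert the full-sample Fisher matrix, i.e. N times the matrix m
prod = [[sum(B[i][k] * N * m[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(H, prod)
```

The product comes out as the identity matrix to machine precision, confirming that (2.C.85) and (2.C.86) are consistent with (2.C.84).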

We also present here the covariance matrix of the limit distribution of
ML-estimates for the SE distribution on the whole semi-axis (0, ∞):

$$1 - F(x) = \exp\left(-g \cdot x^{c}\right) , \quad x \ge 0 . \quad (2.C.87)$$

After some calculations following the same steps as above, we find the covariance
matrix B of the limit Gaussian distribution of ML-estimates $(\tilde g, \tilde c)$:

$$B = \frac{6}{N \pi^2} \begin{pmatrix} g^2\left[\frac{\pi^2}{6} + \left(\gamma + \ln(g) - 1\right)^2\right] & g \cdot c\left[\gamma + \ln(g) - 1\right] \\[6pt] g \cdot c\left[\gamma + \ln(g) - 1\right] & c^2 \end{pmatrix} , \quad (2.C.88)$$

where γ is the Euler–Mascheroni constant: γ ≈ 0.577 215…

2.D Testing the Pareto Model versus the Stretched-Exponential Model

This Appendix derives the statistic that allows one to test the SE hypothesis
$f_1(x|c, b)$ versus the Pareto hypothesis $f_0(x|\beta)$ on a semi-infinite interval
(u, ∞), u > 0. The following parameterization is used:

$$f_1(x|c,b) = b\, u^{-c}\, x^{c-1} \exp\left[-\frac{b}{c}\left(\left(\frac{x}{u}\right)^{c} - 1\right)\right] ; \quad x \ge u \quad (2.D.89)$$

for the Stretched-Exponential distribution and

$$f_0(x|\beta) = \frac{\beta\, u^{\beta}}{x^{1+\beta}} ; \quad x \ge u \quad (2.D.90)$$

for the Pareto distribution.
Theorem: Assuming that the sample $X_1, \ldots, X_N$ is generated from the
Pareto distribution (2.D.90), and taking the supremum of the log-likelihoods
$L_0$ and $L_1$ of the Pareto and (SE) models respectively over the domains
(β > 0) for $L_0$ and (b > 0, c > 0) for $L_1$, then Wilks' log-likelihood ratio $W_N$:

$$W_N = 2\left[\sup_{b,c} L_1 - \sup_{\beta} L_0\right] , \quad (2.D.91)$$

is distributed according to the χ²-distribution with one degree of freedom, in
the limit N → ∞.
Proof
The log-likelihood $L_0$ reads

$$L_0 = -\sum_{i=1}^N \log X_i + N \log(\beta) - \beta \sum_{i=1}^N \log\frac{X_i}{u} . \quad (2.D.92)$$

The supremum over β of $L_0$ given by (2.D.92) is reached at

$$\hat\beta_N = \left[\frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u}\right]^{-1} , \quad (2.D.93)$$

and is equal to

$$\sup_{\beta} L_0 = -N\left[1 + \log u + \frac{1}{\hat\beta_N} - \log\hat\beta_N\right] . \quad (2.D.94)$$
The log-likelihood $L_1$ is

$$L_1 = -N\left\{\log u - (c-1)\,\frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u} - \log b + \frac{b}{c}\,\frac{1}{N}\sum_{i=1}^N \left[\left(\frac{X_i}{u}\right)^{c} - 1\right]\right\} . \quad (2.D.95)$$

The supremum over b of $L_1$ given by (2.D.95) is reached at

$$\hat b_N = c\left[\frac{1}{N}\sum_{i=1}^N \left(\left(\frac{X_i}{u}\right)^{c} - 1\right)\right]^{-1} , \quad (2.D.96)$$

and is equal to

$$\sup_{b} L_1 = -N\left[1 + \log u - (c-1)\,\frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u} - \log\hat b_N\right] . \quad (2.D.97)$$

Taking the derivative of expression (2.D.97) with respect to c, we obtain the
maximum likelihood equation for the SE parameter c:

$$\frac{1}{\hat c_N} = \frac{\frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{\hat c_N} \log\frac{X_i}{u}}{\frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{\hat c_N} - 1} - \frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u} . \quad (2.D.98)$$

If the sample $X_1, \ldots, X_N$ is generated by the Pareto distribution (2.D.90),
then by the strong law of large numbers, we have with probability 1 as N → +∞:

$$\frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u} \longrightarrow E_0\left[\log\frac{X}{u}\right] = \frac{1}{\beta} , \quad (2.D.99)$$

$$\frac{1}{N}\sum_{i=1}^N \left[\left(\frac{X_i}{u}\right)^{c} - 1\right] \longrightarrow E_0\left[\left(\frac{X}{u}\right)^{c} - 1\right] = \frac{c}{\beta - c} , \quad (2.D.100)$$

$$\frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{c} \log\frac{X_i}{u} \longrightarrow E_0\left[\left(\frac{X}{u}\right)^{c} \log\frac{X}{u}\right] = \frac{\beta}{(\beta - c)^2} , \quad (2.D.101)$$

where $E_0[\cdot]$ denotes the expectation with respect to $f_0(\cdot|\beta)$.

Inserting these limit values into (2.D.98), the only limit solution of this
equation is c = 0. Thus, the solution of (2.D.98) for finite N, denoted as $\hat c_N$,
converges with probability 1 to zero as N → +∞.
Expanding $(X_i/u)^c$ in power series in the neighborhood of c = 0 gives

$$\left(\frac{X_i}{u}\right)^{c} \simeq 1 + c \cdot \log\frac{X_i}{u} + \frac{c^2}{2} \cdot \log^2\frac{X_i}{u} + \frac{c^3}{6} \cdot \log^3\frac{X_i}{u} + \cdots , \quad (2.D.102)$$

which yields

$$\frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{c} \simeq 1 + c \cdot S_1 + \frac{c^2}{2} \cdot S_2 + \frac{c^3}{6} \cdot S_3 , \quad (2.D.103)$$

$$\frac{1}{N}\sum_{i=1}^N \left(\frac{X_i}{u}\right)^{c} \log\frac{X_i}{u} \simeq S_1 + c \cdot S_2 + \frac{c^2}{2} \cdot S_3 , \quad (2.D.104)$$

where
$$S_1 = \frac{1}{N}\sum_{i=1}^N \log\frac{X_i}{u} , \quad (2.D.105)$$

$$S_2 = \frac{1}{N}\sum_{i=1}^N \log^2\frac{X_i}{u} , \quad (2.D.106)$$

$$S_3 = \frac{1}{N}\sum_{i=1}^N \log^3\frac{X_i}{u} . \quad (2.D.107)$$

Putting these expansions into (2.D.96) and (2.D.98) and keeping only
terms in c up to second order, the solutions of these equations read

$$\hat b_N \simeq S_1^{-1}\left[1 - \frac{S_2}{2 S_1}\,\hat c_N + \frac{3 S_2^2 - 2 S_1 S_3}{12 S_1^2}\,\hat c_N^2\right] , \quad \text{and} \quad \hat c_N \simeq \frac{S_2 - 2 S_1^2}{\frac{S_2^2}{2 S_1} - \frac{2}{3} S_3} . \quad (2.D.108)$$
Inserting these solutions into (2.D.97) and (2.D.94) gives

$$\sup_{b,c} L_1 = -N\left[1 + \log u - (\hat c_N - 1)\,S_1 + \log S_1 + \frac{\hat c_N}{2} \cdot \frac{S_2}{S_1} - \frac{3 S_2^2 - 4 S_1 S_3}{24 S_1^2}\,\hat c_N^2\right] , \quad (2.D.109)$$

up to the second order in $\hat c_N$, and

$$\sup_{\beta} L_0 = -N\left[1 + \log u + S_1 + \log S_1\right] , \quad (2.D.110)$$

which yields the explicit formula

$$W_N = 2\left[\sup_{b,c} L_1 - \sup_{\beta} L_0\right] \quad (2.D.111)$$

$$\simeq 2 N \hat c_N^2 \cdot \left[\frac{S_3}{6 S_1} - \frac{S_2}{2} + \frac{1}{8}\left(\frac{S_2}{S_1}\right)^{2}\right] . \quad (2.D.112)$$
Now by the law of large numbers, $S_1$ converges to 1/β, $S_2$ converges to
2/β² and $S_3$ converges to 6/β³ with probability 1 as N goes to infinity. Thus,
by the continuous mapping theorem

$$\frac{S_3}{6 S_1} - \frac{S_2}{2} + \frac{1}{8}\left(\frac{S_2}{S_1}\right)^{2} \;\overset{a.s.}{\longrightarrow}\; \frac{1}{2\beta^2} , \quad (2.D.113)$$

so that, defining the variables $\xi_1 = S_1 - \beta^{-1}$ and $\xi_2 = S_2 - 2\beta^{-2}$, we can
assert that

$$W_N = \beta^2 N \left(2\xi_1 - \frac{\beta}{2}\,\xi_2\right)^{2} + o_p(1) . \quad (2.D.114)$$

Now, accounting for the fact that

$$\sqrt{N}\,\beta \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} \;\overset{law}{\longrightarrow}\; \mathcal{N}\left(0, \begin{pmatrix} 1 & 4\beta^{-1} \\ 4\beta^{-1} & 20\beta^{-2} \end{pmatrix}\right) , \quad (2.D.115)$$

we can write

$$\sqrt{N}\,\beta\,\xi_2 = \frac{4}{\beta}\left(\sqrt{N}\,\beta\,\xi_1\right) + \frac{2}{\beta}\,\hat\varepsilon , \quad (2.D.116)$$

where $\hat\varepsilon$ is a Gaussian variable with zero mean and unit variance, independent
of $\xi_1$. This implies that

$$W_N = \hat\varepsilon^2 + o_p(1) , \quad (2.D.117)$$

which means that Wilks' statistic $W_N$ converges to a χ²-random variable with
one degree of freedom.
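The theorem lends itself to a direct Monte Carlo check: for Pareto samples, the statistic assembled from (2.D.108) and (2.D.112) should be approximately χ² with one degree of freedom. A minimal Python sketch (β, N and the number of replications are arbitrary choices):

```python
import math
import random

random.seed(3)
beta, N, reps = 1.5, 2000, 400
ws = []
for _ in range(reps):
    # under the Pareto hypothesis, log(X_i/u) is exponential with rate beta
    ls = [-math.log(1.0 - random.random()) / beta for _ in range(N)]
    S1 = sum(ls) / N
    S2 = sum(l * l for l in ls) / N
    S3 = sum(l ** 3 for l in ls) / N
    c_hat = (S2 - 2 * S1**2) / (S2**2 / (2 * S1) - 2 * S3 / 3)           # (2.D.108)
    w = 2 * N * c_hat**2 * (S3 / (6 * S1) - S2 / 2 + (S2 / S1)**2 / 8)   # (2.D.112)
    ws.append(w)

mean_w = sum(ws) / reps
frac_below = sum(1 for w in ws if w < 3.841) / reps   # chi2(1) 95% quantile
print(mean_w, frac_below)
```

The sample mean of the statistic is close to 1 (the mean of χ²₁) and about 95% of the replications fall below the 95% quantile 3.841, as the theorem predicts.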
3 Notions of Copulas

In this chapter, we introduce the notion of copulas, which describes the depen-
dence between several random variables. These variables can be the returns
of different assets or the value of a given asset at different times, and more
generally, any set of economic variables. We present some examples of clas-
sical families of copulas and provide several illustrations of the usefulness of
copulas for actuarial,1 economic, and financial applications.
Until relatively recently, the correlation coefficient was the measure of
reference used to quantify the amplitude of dependence between two assets.
From the old hypothesis or belief that the marginal distribution of returns is
Gaussian, it was natural to extend this assumption of normality to the mul-
tivariate domain. Recall that only under the assumption of multivariate nor-
mality2 is the correlation coefficient necessary and sufficient to capture the
full dependence structure between asset returns. The growing attacks of the
past three decades and the now overwhelming evidence against the Gaussian
hypothesis also cast doubts on the relevance of the correlation coefficient as
an adequate measure of dependence. See for instance [404] for a specific test
of multivariate normality of asset returns. Actually, it is now clear that the
correlation coefficient is grossly insufficient to provide an accurate description
of the dependence between two assets [64, 148, 149] and that it is necessary
to characterize the full joint multivariate distribution of asset returns. This is
all the more important for rare large events whose deviations from normality
are the most extreme both in amplitude and dependence.
Consider for simplicity the problem of characterizing the bivariate distribution
of the returns of only two assets. It is essential to realize that the bivariate
distribution embodies two qualitatively different pieces of information on
the two assets. On the one hand, it contains the two marginal distributions;
on the other hand, it contains information on the dependence between the
two assets irrespective of their individual (marginal) distributions. Only the
introduction of the copula allows one to operate a clean dissection between
these two pieces of information. The role of the copula of two random variables
is precisely to offer a complete and unique description of the dependence
structure existing between them, excluding all (parasitic) information on the
marginal distribution of the random variables.

1 Actuarial science is a sister discipline of statistics. Actuaries play an important
role in many of the financial plans that involve people, e.g., life insurance, pension
plans, retirement benefits, car insurance, unemployment insurance, and so on.
2 To some extent, the correlation coefficient also adequately quantifies the dependence
between elliptically distributed random variables, even if it may yield spurious
conclusions – especially in the far tails – as we shall see in the next chapters.
structure existing between them, excluding all (parasiting) information on the
marginal distribution of the random variables.
Such an approach in terms of copulas has witnessed a recent burst of in-
terest and of activity spurred by its practical and theoretical implications.
From an applied view point, determining the dependence between assets is at
the core of risk management: the dependence governs (i) the optimization of
diversification of risks by aggregation in portfolios, (ii) the hedging strategies
based on derivatives, and (iii) the securitization3 of different risky instru-
ments to sell them to third parties. Specifically, the advantage of the copula
formulation is to provide a better understanding and quantification of the
interactions between assets by determining the diverse dependence structures
between the various sources of risk. Applications to finance include the calcu-
lation of VaR (Value-at-Risk) and portfolio optimization [145], the calculation
of option prices [99, 112], and credit risk [184, 186]. For various applications
to insurance, see [110, 183, 478].
From a fundamental viewpoint, it is reasonable to think that the structure
of dependence between assets reflects the underlying mechanisms at work in
financial markets. In particular, the dependence between assets is in part the
result of the interactions between the agents investing in the stock market4 .
Not only are investors responsible for the individual variations and fluctua-
tions of assets but, by their asset allocation choices (buying or selling such
or such security rather than another), they also create dependence between
assets. It can thus be hoped that the study of the dependence between assets
may complement the understanding of the important mechanisms at work
in stock markets and therefore of the interactions between agents. It should
also help in narrowing down the relevant macroscopic parameters influencing
investors in their asset allocation.
Before presenting copulas and their fundamental properties, we should
stress that this body of results applies also when the structure of dependence
is time-varying. This remark is important since there is a priori no principle
or reason for the dependence to be constant [380, 409, 412]. One should thus
study its dynamics in addition. However, such a study of the time-dynamics
of the multivariate dependence structure between assets is extremely delicate
both from an empirical and theoretical point of view. In addition, as we
show in Chap. 6, some apparent time-varying dependence may appear as a
spurious consequence of conditioning the measures of dependence on market
phases with large volatility, for instance. This mechanism appears to explain
a large part of the empirical observations on time-varying dependence,
suggesting that it would be sufficient to model the time-dependent properties of
volatility alone. We thus make the simplifying assumption that any possible
time-dependence of the statistical properties of assets is entirely embedded in
the evolution of the marginal distributions of their returns, while the dependence
structure between assets remains invariant.

3 The process of aggregating similar instruments, such as loans or mortgages, into
a negotiable security.
4 Of course, the observed dependence between assets has also other inputs than just
the action of economic agents on financial markets. The macroeconomic variables
also play an important role, especially for assets belonging to the same economic
sector, which are collectively sensitive to the same variations of the macroeconomic
landscape. This is the stance taken by factor models described in Chap. 1.

3.1 What is Dependence?


The notion of independence of random variables is very easy to define. From
elementary probability theory, two random variables X and Y are independent
if and only if, for any x and y in the supports of the distributions,

Pr [X ≤ x; Y ≤ y] = Pr [X ≤ x] · Pr [Y ≤ y] , (3.1)

or equivalently

Pr [X ≤ x | Y ] = Pr [X ≤ x] . (3.2)

In other words, two random variables are independent if the knowledge of a


piece of information about one of the random variables does not bring any
new insight on the other one.
The notion of dependence is much more subtle to define, or at least to
quantify. Let us start with the concept of mutual complete dependence [290].
It seems natural that two real random variables X and X' are mutually completely
dependent if the knowledge of X implies the knowledge of X', and
reciprocally. This statement simply means that there exists a one-to-one
mapping f such that:

$$X' = f(X), \quad \text{almost everywhere} , \quad (3.3)$$

which, as stressed in [270], implies the perfect predictability of one of the random
variables from the other one. The mapping f is either strictly increasing
or strictly decreasing. In the first case, the random variables are said to be
comonotonic.
In a second stage of our investigation of the concept of dependence, let us
ask what could be the meaning of the following statement:

    The random variables X and Y exhibit the same dependence as the
    random variables X' and Y'.

A possible interpretation, explored in this chapter, is that the random variables
X and X', on the one hand, and Y and Y', on the other hand, are
comonotonic. In this case, all variables or functions describing the dependence
between two (and more generally several) random variables should enjoy the
property of invariance under an arbitrary increasing mapping. Let us assume
that there exists a function C describing the dependence of the random variables
X and Y and a function C' describing the dependence of the random
variables X' and Y'. Writing that X and X' (respectively Y and Y') are
comonotonic,

$$X' = h_1(X) , \quad (3.4)$$
$$Y' = h_2(Y) , \quad (3.5)$$

where $h_1$ and $h_2$ are increasing functions on R (if we consider real-valued
random variables), the property of invariance under strictly increasing mapping
reads C = C'.
Let us now show how to build C. Does the usual correlation coefficient
qualify? While the correlation coefficient measures some kind of dependence,
it is only able to account for a linear dependence.5 Therefore, it does not fulfill
the requirement for a general concept of dependence which should involve any
nonlinear monotonic structure. Thus, we must look for something else.
Let us consider the two random variables X and Y and their joint distribution
function denoted by H:

$$H(x, y) = \Pr[X \le x;\, Y \le y] . \quad (3.6)$$

The marginal distributions of X and Y are respectively:

$$F(x) = \Pr[X \le x] = \lim_{t \to \infty} H(x, t) , \quad (3.7)$$
$$G(y) = \Pr[Y \le y] = \lim_{t \to \infty} H(t, y) . \quad (3.8)$$

For simplicity, let us assume that F and G are continuous and increasing, so
that the usual inverses $F^{-1}$ and $G^{-1}$ exist. Then, let us define

$$C(u, v) = H\left(F^{-1}(u), G^{-1}(v)\right) , \quad \forall u, v \in [0, 1] . \quad (3.9)$$

Let us now focus on the random variables X' and Y' given by (3.4)–(3.5) above.
It is clear that their joint distribution function is

$$H'(x, y) = \Pr[X' \le x;\, Y' \le y] = \Pr\left[X \le h_1^{-1}(x);\, Y \le h_2^{-1}(y)\right] = H\left(h_1^{-1}(x), h_2^{-1}(y)\right) , \quad (3.10)$$
5 Indeed, consider the linear regression Y = βX + ε where β is a constant and X
and ε are two independently distributed centered random variables with variances
Var(X) and Var(ε) respectively. Then, the knowledge of the covariance Cov(X, Y)
and of the variance Var(X) of X is equivalent to the knowledge of the linear
dependence between X and Y: Cov(X, Y) = β Var(X). The correlation coefficient,
$\mathrm{Corr}(X, Y) \equiv \mathrm{Cov}(X, Y)/\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)} = \left[1 + \mathrm{Var}(\varepsilon)/(\beta^2\,\mathrm{Var}(X))\right]^{-1/2}$,
involves in addition an information on Var(ε).

while their marginal distributions are:

$$F'(x) = \Pr[X' \le x] = F\left(h_1^{-1}(x)\right) , \quad (3.11)$$
$$G'(y) = \Pr[Y' \le y] = G\left(h_2^{-1}(y)\right) . \quad (3.12)$$

Now, considering

$$C'(u, v) = H'\left(F'^{-1}(u), G'^{-1}(v)\right) , \quad \forall u, v \in [0, 1] , \quad (3.13)$$

elementary algebraic manipulations show that

$$C(u, v) = C'(u, v), \quad \forall u, v \in [0, 1] . \quad (3.14)$$

It turns out that the function C defined by (3.9) is the only object obeying the
property of invariance under strictly increasing mapping and which entirely
captures the full dependence between X and Y .
The following properties follow from simple calculations:
• C(u, 1) = u and C(1, v) = v, ∀u, v ∈ [0, 1],
• C(u, 0) = C(0, v) = 0, ∀u, v ∈ [0, 1],
• C is 2-increasing, namely, for all u1 ≤ u2 and v1 ≤ v2 :

C(u2 , v2 ) − C(u2 , v1 ) − C(u1 , v2 ) + C(u1 , v1 ) ≥ 0 . (3.15)

This last property is a simple translation of the nonnegativity of probabilities,


specifically of the following expression:

$$\Pr\left[F^{-1}(u_1) \le X \le F^{-1}(u_2);\; G^{-1}(v_1) \le Y \le G^{-1}(v_2)\right] \ge 0 . \quad (3.16)$$

As we shall see in the sequel, these three properties define the mathemati-
cal object called copula, which has been introduced by A. Sklar in the late
1950s [443] in order to describe the general dependence properties of random
variables.

3.2 Definition and Main Properties of Copulas


This section provides a brief survey of the main properties of copulas, empha-
sizing the most important definitions and theorems useful in the following. For
exhaustive and general presentations, we refer to [248, 370] and to [74, 183]
for introductions oriented to financial and actuarial applications.
The definition of a copula of n random variables generalizes the intuitive
definition (3.9) presented above for the bivariate copula.
Definition 3.2.1 (Copula). A function C : [0, 1]n −→ [0, 1] is a n-copula
if it enjoys the following properties :
• ∀u ∈ [0, 1], C(1, . . . , 1, u, 1 . . . , 1) = u ,
• ∀ui ∈ [0, 1], C(u1 , . . . , un ) = 0 if at least one of the ui ’s equals zero ,

• C is grounded and n-increasing, i.e., the C-volume of every box whose


vertices lie in [0, 1]n is positive.

It is clear from this definition that a copula is nothing but a multivariate


distribution with support in [0,1]n and with uniform marginals. It immedi-
ately follows that a convex sum of copulas remains a copula. The fact that
such mathematical objects can be very useful for representing multivariate
distributions with arbitrary marginals has been suggested in the previous in-
troductory section and is stated more formally in the following result [443].

Theorem 3.2.1 (Sklar’s Theorem). Given a n-dimensional distribution


function F with continuous6 (cumulative) marginal distributions F1 , . . . , Fn ,
there exists a unique n-copula C : [0, 1]n −→ [0, 1] such that:
F (x1 , . . . , xn ) = C(F1 (x1 ), . . . , Fn (xn )) . (3.17)
Thus, the copula combines the marginals to form the multivariate distri-
bution. This theorem provides both a parameterization of multivariate distri-
butions and a construction scheme for copulas. Indeed, given a multivariate
distribution F with marginals F1 , . . . , Fn , the function
 
C(u1 , . . . , un ) = F F1−1 (u1 ), . . . , Fn−1 (un ) (3.18)
is automatically an n-copula.7 This copula is the copula of the multivariate
distribution F . We will use this method in the sequel to derive the expressions
of standard copulas such as the Gaussian copula or the Student’s copula.
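Sklar's theorem also gives a practical recipe for building such "meta" distributions: draw from the chosen copula, then push each uniform coordinate through the desired marginal quantile function. A minimal Python sketch using a Gaussian copula (correlation 0.6, an arbitrary choice) with exponential and Cauchy marginals:

```python
import math
import random

random.seed(5)
rho, n = 0.6, 5_000

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

pairs = []
for _ in range(n):
    # a correlated Gaussian pair, mapped to uniforms: a Gaussian copula sample
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    u1, u2 = norm_cdf(z1), norm_cdf(z2)
    # arbitrary marginals via quantile functions (Sklar's construction)
    x1 = -math.log(1.0 - u1)                  # Exponential(1)
    x2 = math.tan(math.pi * (u2 - 0.5))       # standard Cauchy
    pairs.append((x1, x2))

# marginals changed, dependence kept: both medians still move together
same_side = sum(1 for a, b in pairs
                if (a > math.log(2.0)) == (b > 0.0)) / n
print(same_side)   # for a Gaussian copula: 1/2 + arcsin(rho)/pi ~ 0.705
```

The marginals of the resulting sample are exponential and Cauchy, yet the dependence structure is still that of the underlying Gaussian pair.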
In addition to the copula itself, it is often very useful to consider the two
following quantities:
Definition 3.2.2. Given n random variables X1 , . . . , Xn with marginal sur-
vival distributions F̄1 , . . . , F̄n and joint survival distribution F̄ , the survival
copula C̄ is such that:
 
$$\bar C\left(\bar F_1(x_1), \ldots, \bar F_n(x_n)\right) = \bar F(x_1, \ldots, x_n) . \quad (3.19)$$
The dual copula C ∗ of the copula C of X1 , . . . , Xn is defined by:

C ∗ (u1 , . . . , un ) = 1 − C̄ (1 − u1 , . . . , 1 − un ) , ∀u1 , . . . , un ∈ [0, 1] .


(3.20)

6
When this assumption fails, Sklar’s theorem still holds, but in a weaker sense: a
representation like (3.17) still exists but is not unique anymore.
7
The quantile function, or generalized inverse, Fi−1 of the distribution Fi can be
defined by:
Fi−1 (u) = inf{x | Fi (x) ≥ u}, ∀u ∈ (0, 1).
When the distribution function Fi is strictly increasing, Fi−1 denotes the usual
inverse of Fi . In fact, any quantile function can be chosen. But, for noncontinuous
margins, the copula (3.18) depends upon the precise quantile function which is
selected.

While the survival copula is indeed a true copula, the dual copula is not.
However, it can be simply related to the probability that (at least) one of the
Xi's is less than or equal to xi. Indeed, one can easily check that:

$$\Pr\left[\bigcup_{i=1}^n \{X_i \le x_i\}\right] = C^*\left(F_1(x_1), \ldots, F_n(x_n)\right) . \quad (3.21)$$

A very powerful property shared by all copulas is their invariance under


arbitrary increasing mapping of the random variables (this has been shown
for the case of the bivariate copulas in the derivation ending with (3.14)):
Theorem 3.2.2 (Invariance Theorem). Consider n continuous random
variables X1 , . . . , Xn with copula C. Then, if h1 (X1 ), . . . , hn (Xn ) are increas-
ing on the ranges of X1 , . . . , Xn , the random variables Y1 = h1 (X1 ), . . . ,
Yn = hn (Xn ) have exactly the same copula C.
Let us stress again that this result demonstrates that the full dependence
between the n random variables is completely captured by the copula, in-
dependent of the shape of the marginal distributions. In other words, the
Invariance Theorem shows that the copula is an intrinsic measure of depen-
dence between random variables. Under a monotonic change of variable from
an old variable to a new variable, these two variables are comonotonic by
definition. Intuitively, as explained in the previous section, it is natural that
a measure of dependence between two random variables should be insensitive
to the substitution of one of the variables by a comonotonic variable: if X and
X' are two comonotonic variables, one expects the same dependence structure
for the pair (X, Y) and for the pair (X', Y). This is precisely the content
of the Invariance Theorem on copulas. In contrast, a measure of dependence
such as the correlation coefficient which is function of both the copula and
the marginal distribution is not invariant under a monotonic change of vari-
able. It does not constitute an intrinsic measure of dependence (we will come
back in detail on this point in Chap. 4). The benefit of using copulas is the
decoupling between the marginal distribution and the dependence structure,
which justifies the separate study of marginal distributions on the one hand
and of the dependence on the other hand.
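The Invariance Theorem can be observed directly on data: any statistic computed from ranks only, such as Spearman's rho, depends on the sample solely through its copula and is therefore strictly unchanged by increasing transformations of the coordinates. A small self-contained Python sketch on synthetic data:

```python
import math
import random

random.seed(4)
n = 4_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]      # dependent pair

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(a, b):
    # Pearson correlation of the ranks: a functional of the copula only
    ra, rb = ranks(a), ranks(b)
    m = (len(a) + 1) / 2
    num = sum((x - m) * (y - m) for x, y in zip(ra, rb))
    den = math.sqrt(sum((x - m) ** 2 for x in ra)
                    * sum((y - m) ** 2 for y in rb))
    return num / den

rho = spearman(xs, ys)
# strictly increasing maps (exp and cube) change the marginals, not the copula
rho_t = spearman([math.exp(x) for x in xs], [y ** 3 for y in ys])
print(rho, rho_t)
```

The two values coincide exactly, while the ordinary linear correlation coefficient would change under the same transformations.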
Let us now state several useful properties enjoyed by copulas. First, any
copula is uniformly continuous:
Proposition 3.2.1. Given an n-copula C, for all u1 , . . . , un ∈ [0, 1] and all
v1 , . . . , vn ∈ [0, 1]:

|C (v1 , . . . , vn ) − C (u1 , . . . , un )| ≤ |v1 − u1 | + · · · + |vn − un | . (3.22)

This result is a direct consequence of the property that copulas are n-


increasing. Indeed, restricting ourselves to the bivariate case for the simplicity
of the exposition, the triangle inequality implies

|C (v1 , v2 ) − C (u1 , u2 )| = |C (v1 , v2 ) − C (u1 , v2 ) + C (u1 , v2 ) − C (u1 , u2 )|


≤ |C (v1 , v2 ) − C (u1 , v2 )| + |C (u1 , v2 ) − C (u1 , u2 )| ,
and by (3.15), with some of the arguments put equal to 0 or 1, we have

|C (v1 , v2 ) − C (u1 , v2 )| ≤ |v1 − u1 | , (3.23)

and

|C (u1 , v2 ) − C (u1 , u2 )| ≤ |v2 − u2 | , (3.24)

which leads to the expected result.


Besides, it follows that a copula is differentiable almost everywhere:
Proposition 3.2.2. Let C be an n-copula. For almost all (u1 , . . . , un ) ∈
[0, 1]n , the partial derivative of C with respect to ui exists and:
$$0 \le \frac{\partial C}{\partial u_i}(u_1, \ldots, u_n) \le 1 . \quad (3.25)$$
These two properties show that copulas enjoy nice regularity (or smoothness)
conditions. In fact, the latter one will turn out to be very useful for numerical
simulations, as we shall see in Sect. 3.5.
Due to the property that copulas are n-increasing, we can find an upper
and a lower bound for any copula. Choosing u2 = v2 = 1 in (3.15), we obtain
that any bivariate copula satisfies

C(u, v) ≥ u + v − 1 . (3.26)

Since, in addition, a copula is non-negative, we obtain a lower bound for any


bivariate copula:

C(u, v) ≥ max (u + v − 1, 0) . (3.27)

Similarly, choosing alternatively (u1 = 0, v2 = 1) and (u2 = 1, v1 = 0), we get


an upper bound for any bivariate copula

C(u, v) ≤ min(u, v) . (3.28)

It is clear that these two bounds fulfill all the requirements of copulas, quali-
fying the functions max (u + v − 1, 0) and min(u, v) as genuine bivariate cop-
ulas. These two bounds are thus the tightest possible bounds. Generalization
to higher dimension is straightforward, so that we can state
Proposition 3.2.3 (Fréchet-Hoeffding Upper and Lower Bounds).
Given an n-copula C, for all u1 , . . . , un ∈ [0, 1]:

max (u1 + . . . + un − n + 1, 0) ≤ C (u1 , . . . , un ) ≤ min (u1 , . . . , un ) .


(3.29)

[Figure: two 3-D surface plots over (u, v) ∈ [0, 1]²]

Fig. 3.1. The Fréchet-Hoeffding lower (left panel) and upper (right panel) bounds
for bivariate copulas

These lower and upper bounds, which constitute the so-called Fréchet-
Hoeffding bounds, are represented in Fig. 3.1 for the bivariate case. The upper
bound is itself an n-copula, while the lower one is a copula only for n = 2.
However, this lower bound remains the best possible insofar as, for any fixed
point (u1 , . . . , un ) ∈ [0, 1]n , there exists a copula C̃ such that, at this particular
point:

C̃ (u1 , . . . , un ) = max (u1 + · · · + un − n + 1, 0) . (3.30)

The Fréchet-Hoeffding upper bound represents the strongest form of depen-


dence that several random variables can exhibit. In fact, it is nothing but the
copula associated with comonotonicity. Similarly, when n = 2, the Fréchet-
Hoeffding lower bound is nothing but the copula of countermonotonicity.
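The Fréchet-Hoeffding bounds are easy to verify numerically for any candidate copula. The sketch below checks (3.29) on a grid for the independence copula C(u, v) = uv and for a Frank copula, a classical one-parameter family used here purely as an illustration (it is not discussed in the excerpt above):

```python
import math

def frank(u, v, theta=5.0):
    # Frank copula (assumed parameterization, theta != 0)
    return -math.log(1.0 + (math.expm1(-theta * u) * math.expm1(-theta * v))
                     / math.expm1(-theta)) / theta

grid = [i / 20 for i in range(21)]
for u in grid:
    for v in grid:
        for C in (lambda a, b: a * b, frank):   # independence and Frank copulas
            val = C(u, v)
            assert max(u + v - 1.0, 0.0) - 1e-12 <= val <= min(u, v) + 1e-12
print("Frechet-Hoeffding bounds hold on the grid")
```

Any genuine copula must pass this test; a function violating either bound cannot be a copula.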

3.3 A Few Copula Families


As shown from Sklar’s theorem 3.2.1, for each multivariate distribution, one
can easily derive a copula. Notwithstanding their formidable number, a few
copula families play a more important role.

3.3.1 Elliptical Copulas

Elliptical copulas derive from multivariate elliptical distributions [252]. Here,


we give the two most important examples, the Gaussian and Student’s copulas.
By construction, these two copulas are close to each other in their central part,
and become closer and closer in their tail only when the number of degrees of
freedom of the Student’s copula increases. As a consequence, it is sometimes
difficult to distinguish between them, even with sensitive tests. However, as we
shall see in Chap. 4, these two copulas may have drastically different behaviors
with respect to the dependence between extremes.

Multiplicative factor models, which account for most of the stylized facts
observed on financial time series, generate distributions with elliptical copu-
las. Multiplicative factor models contain in particular multivariate stochastic
volatility models with a common stochastic volatility factor. They can be
formulated as

X=σ·Y , (3.31)

where σ is a positive random variable modeling the volatility, Y is a Gaussian


random vector, independent of σ and X is the vector of assets returns. In this
framework, the multivariate distribution of asset returns X is an elliptical
multivariate distribution. For instance, if the inverse 1/σ 2 of the square of the
volatility σ is a constant times a χ2 -distributed random variable with ν degrees
of freedom, the distribution of asset returns will be the Student distribution
with ν degrees of freedom. When the volatility follows ARCH or GARCH
processes, then the asset returns are also elliptically distributed with fat-tailed
marginal distributions. Such an elliptical multivariate distribution ensures that
each asset Xi is asymptotically distributed according to a regularly varying
distribution:8 Pr{|Xi | > x} ∼ L(x) · x^{−ν} – where L(·) denotes a slowly varying
function – with the same exponent ν for all assets.
Elliptical copulas have the advantage of being easily synthesized numer-
ically, which makes their use convenient for numerical simulations and for
the study of scenarios. This results from the fact that it is easy to generate
Gaussian or Student’s distributed random variables which, upon appropriate
monotonic changes of variables, give the correct marginal distributions while
conserving the copula unchanged.

The Gaussian Copula

The Gaussian copula is the copula derived from the multivariate Gaussian
distribution. The Gaussian copula provides a natural setting for generaliz-
ing Gaussian multivariate distributions into so-called meta-Gaussian distrib-
utions. Meta-Gaussian distributions have been introduced in [283] (see [163]
for a generalization to meta-elliptical distributions) and have been applied in
many areas, from the analysis of experiments in high-energy particle physics
[265] to finance [453]. These meta-Gaussian distributions have exactly the
same dependence structure as the Gaussian distributions while differing in
their marginal distributions which can be arbitrary.
Let Φ denote the standard Normal (cumulative) distribution and Φρ,n
the n-dimensional standard Gaussian distribution with correlation matrix ρ.
Then, the Gaussian n-copula with correlation matrix ρ is

Cρ,n (u1 , . . . , un ) = Φρ,n (Φ−1 (u1 ), . . . , Φ−1 (un )) , (3.32)

8 See footnote 3, page 39.
Fig. 3.2. Contour plot of the density (3.34) of the bivariate Gaussian copula with
a correlation coefficient ρ = 0.8 (left panel ) and ρ = −0.8 (right panel )

whose density (see Fig. 3.2)


cρ,n (u1 , . . . , un ) = ∂^n Cρ,n (u1 , . . . , un ) / ∂u1 . . . ∂un (3.33)

reads

cρ,n (u1 , . . . , un ) = (1/√det ρ) exp( −(1/2) y t (u) (ρ−1 − Id) y(u) ) (3.34)
with y t (u) = (Φ−1 (u1 ), . . . , Φ−1 (un )). Note that Theorem 3.2.1 and equation
(3.18) ensure that Cρ,n (u1 , . . . , un ) in (3.32) is a copula.
The Gaussian copula is completely determined by the knowledge of the cor-
relation matrix ρ. The parameters involved in the description of the Gaussian
copula are simple to estimate, as we shall see in Chap. 5.
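As a concrete illustration of (3.32) and (3.34), the bivariate density can be evaluated numerically and cross-checked against the equivalent ratio form c(u1 , u2 ) = φρ (y1 , y2 )/[φ(y1 ) φ(y2 )], where φρ and φ denote the bivariate and univariate Gaussian densities. A minimal sketch, assuming numpy and scipy are available (the function name, the evaluation point and ρ = 0.8 are our own illustrative choices):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_density(u, rho):
    """Bivariate Gaussian copula density (3.34) at a point u in (0,1)^2."""
    y = norm.ppf(u)                               # y_i = Phi^{-1}(u_i)
    rho_mat = np.array([[1.0, rho], [rho, 1.0]])
    quad = y @ (np.linalg.inv(rho_mat) - np.eye(2)) @ y   # y^t (rho^{-1} - Id) y
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(rho_mat))

# Sanity check against the ratio form c(u) = phi_rho(y) / [phi(y1) phi(y2)]
u, rho = np.array([0.3, 0.7]), 0.8
y = norm.ppf(u)
ratio = (multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).pdf(y)
         / np.prod(norm.pdf(y)))
assert np.isclose(gaussian_copula_density(u, rho), ratio)
```

At the central point u = (0.5, 0.5), where y = 0, the density reduces to 1/√det ρ.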

Student’s Copula

Student’s copula is derived from Student’s multivariate distribution. It pro-


vides a natural generalization of Student’s multivariate distributions, in the
form of meta-elliptical distributions [163]. These meta-elliptical distributions
have exactly the same dependence structure as the Student’s distributions
while differing in their marginal distributions which can be arbitrary.
Given an n-dimensional Student distribution Tn,ρ,ν with ν degrees of freedom and a shape matrix ρ9

Tn,ρ,ν (x) = (1/√det ρ) [Γ ((ν + n)/2) / (Γ (ν/2) (πν)^{n/2})] ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} dx / [1 + x^t ρ−1 x/ν]^{(ν+n)/2} , (3.35)
9 Note that the shape matrix ρ is nothing but the correlation matrix when the number of degrees of freedom ν is larger than 2, namely when the second moments of the variables Xi ’s exist.
Fig. 3.3. Contour plot of the density (3.37) of a bivariate Student t copula with
a shape parameter ρ = 0.8 and ν = 2 degrees of freedom (left panel ) or ν = 10
degrees of freedom (right panel ). For small ν’s, the difference between the Student
copula and the Gaussian copula is striking on both diagonals. As ν increases, this
difference decreases on the second diagonal but remains large (for ν = 10) on the
main diagonal, as can be observed by comparing the above right with the left panel
of Fig. 3.2

the corresponding Student’s copula reads:

Cn,ρ,ν (u1 , . . . , un ) = Tn,ρ,ν (Tν−1 (u1 ), . . . , Tν−1 (un )) , (3.36)

where Tν is the univariate Student’s distribution with ν degrees of freedom.
The density of the Student’s copula is thus

cn,ρ,ν (u1 , . . . , un ) = (1/√det ρ) [Γ ((ν + n)/2) Γ (ν/2)^{n−1} / Γ ((ν + 1)/2)^n ] ∏_{k=1}^{n} (1 + yk^2 /ν)^{(ν+1)/2} / [1 + y t ρ−1 y/ν]^{(ν+n)/2} , (3.37)

where y t = (Tν−1 (u1 ), . . . , Tν−1 (un )). See also Fig. 3.3.
Since Student’s distribution tends to the normal distribution when ν goes
to infinity, Student’s copula tends to the Gaussian copula as ν → +∞ [350]:

sup_{u∈[0,1]^n} |Cn,ρ,ν (u) − Cρ,n (u)| −→ 0, as ν → +∞ . (3.38)

The description of a Student copula relies on two parameters: the shape


matrix ρ, as in the Gaussian case, and in addition the number of degrees
of freedom ν. An accurate estimation of the parameter ν is rather difficult
and this can have an important impact on the estimated value of the shape
matrix.10 As a consequence, the Student’s copula may be more difficult to


calibrate and to use than the Gaussian copula.

3.3.2 Archimedean Copulas

The importance of this class of copulas lies in the fact that it encompasses a very


large number of copulas while enjoying a certain number of interesting prop-
erties. In addition, as pointed out by Frees and Valdez [183], a large number
of models developed to account for the dependence between various sources
of risks in the theory of insurance lead to Archimedean copulas. The factor
models constitute, however, a notable exception. While linear factor models
play a fundamental role in the phenomenological description of interactions
between financial assets, Archimedean copulas are not adequate to describe
their corresponding dependence structure. In the same vein, the Gaussian and
Student’s copulas, as well as any elliptical copula, are not Archimedean.
An Archimedean copula is defined as follows:
Definition 3.3.1 (Archimedean Copula). Let ϕ be a continuous strictly
decreasing, convex, function from [0, 1] onto [0, ∞] and such that ϕ(1) = 0.
Let ϕ[−1] be the pseudo-inverse of ϕ :

ϕ[−1] (t) = ϕ−1 (t) if 0 ≤ t ≤ ϕ(0), and ϕ[−1] (t) = 0 if t ≥ ϕ(0) , (3.39)

then the function

C(u, v) = ϕ[−1] (ϕ(u) + ϕ(v)) (3.40)

is an Archimedean copula with generator ϕ.


The generalization to an n-copula seems straightforward:

Cn (u1 , . . . , un ) = ϕ[−1] (ϕ(u1 ) + · · · + ϕ(un )) . (3.41)

However, this formulation holds – i.e., Cn is actually an n-Archimedean copula – if and only if ϕ[−1] is n-monotonic:

(−1)^k d^k ϕ[−1] (t)/dt^k ≥ 0, ∀k = 0, 1, . . . , n . (3.42)

When this latter relation holds for all n ∈ N, ϕ[−1] is said to be completely monotonic.
In such a case, the bivariate Archimedean copula can be generalized to any
dimension.
10 Lindskog et al. [307] have recently introduced a robust estimation technique for
the calibration of the shape matrix of any elliptical copula, which is described in
Chap. 5.

Fig. 3.4. Contour plot of Clayton’s copula (left panel ) and contour plot of its
density (right panel ) for parameter value θ = 1

The complexity of the dependence structure between n variables usu-


ally described by a function of n variables is reduced and embedded, for
Archimedean copulas, into the function of a single variable, the generator
ϕ. This transforms a multidimensional formulation into a much simpler one-
dimensional one.
Among the large number of copulas in the Archimedean family, the fol-
lowing copulas can be mentioned:
• Clayton’s copula, which plays the role of a limit copula (see (3.61)):

CθCl (u, v) = [max (u^{−θ} + v^{−θ} − 1, 0)]^{−1/θ} , θ ∈ [−1, ∞) (3.43)

with generator ϕ(t) = (1/θ) (t^{−θ} − 1) ,
θ
• Gumbel’s copula, which plays a special role in the description of dependence using extreme value theory (see next Sect. 3.3.3):

CθG (u, v) = exp{ −[(− ln u)^θ + (− ln v)^θ ]^{1/θ} } , θ ∈ [1, ∞) (3.44)

with generator ϕ(t) = (− ln t)^θ ,
• Frank’s copula:

CθF (u, v) = −(1/θ) ln[ 1 + (e^{−θu} − 1)(e^{−θv} − 1)/(e^{−θ} − 1) ] , θ ∈ R (3.45)

with generator ϕ(t) = − ln[ (e^{−θt} − 1)/(e^{−θ} − 1) ] .
Note that the bivariate Fréchet-Hoeffding lower bound is an Archimedean
copula, while the upper bound copula is not. For an overview of the members
of the Archimedean family, we refer to Table 4.1 in [370].
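The bivariate members (3.43)–(3.45) are simple closed-form expressions; a minimal numpy sketch (the function names are our own illustrative choices; note that, for θ > 0, the max in Clayton’s formula is never binding, since u^{−θ} , v^{−θ} ≥ 1 on (0, 1]):

```python
import numpy as np

def clayton(u, v, theta):
    """Clayton copula (3.43); theta in [-1, inf), theta != 0."""
    return np.maximum(u**(-theta) + v**(-theta) - 1.0, 0.0)**(-1.0/theta)

def gumbel(u, v, theta):
    """Gumbel copula (3.44); theta in [1, inf)."""
    return np.exp(-((-np.log(u))**theta + (-np.log(v))**theta)**(1.0/theta))

def frank(u, v, theta):
    """Frank copula (3.45); theta != 0."""
    num = (np.exp(-theta*u) - 1.0) * (np.exp(-theta*v) - 1.0)
    return -np.log(1.0 + num / (np.exp(-theta) - 1.0)) / theta

# Boundary condition C(u, 1) = u holds for each family
for c in (clayton, gumbel, frank):
    assert np.isclose(c(0.3, 1.0, 2.0), 0.3)
```

The boundary check C(u, 1) = u is a quick sanity test for any copula implementation.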
Fig. 3.5. Contour plot of Frank’s copula (left panel ) and contour plot of its density
(right panel ) for parameter value θ = 1

A general procedure for constructing generators of Archimedean copulas has been proposed by Marshall and Olkin [348]. They have proved that, given a distribution function F defined on R+ such that F (0) = 0, the inverse ϕ(t) = φ−1 (t) of the Laplace transform of F

φ(t) = ∫_0^∞ e^{−t·x} dF (x) (3.46)

is the generator of an Archimedean copula.


This suggests that frailty models [236, 480] can provide a natural mecha-
nism for generating random variables with Archimedean copulas. Such models
are common in actuarial science, because they offer a simple way to study the
joint mortality of a group of individuals sharing common risk factors (see
[103, 182, 237] among many others). In finance, they can also model the joint
distribution of defaults of different obligors subjected to the same set of
economic factors.
In each case, one focuses on the continuous random variables Ti represent-
ing the survival time of the ith individual or company, i.e., the time before
death or default. Their individual survival distributions are defined by

Si (t) = Pr [Ti > t] , (3.47)

with hazard rate:

hi (t) = −(d/dt) ln Si (t) . (3.48)
Conditional on a p-dimensional random vector Z representing the risk factors,
one can use a proportional hazard model [111], with the ith conditional hazard
rate given by

hi (t|Z) = eβ·Z bi (t) , (3.49)



where the bi (t)’s are the base-line hazard rates and β is the vector of regression
parameters (the same for all individuals).
Defining the frailty variable V = e^{β·Z} and integrating the conditional hazard rates, one obtains the expression of the conditional survival distributions:

Si (t|V = v) = e^{−v·fi (t)} , where fi (t) = ∫_0^t bi (s) ds . (3.50)

Then, assuming that V has the distribution function F with Laplace transform
φ (cf. (3.46)), the joint survival distribution of the Ti ’s is given by
Pr [T1 > t1 , . . . , Tn > tn ] = EV [S1 (t1 |V ) · · · Sn (tn |V )] ,
= EV [ e^{−V ·(f1 (t1 )+···+fn (tn ))} ] ,
= ∫_0^∞ e^{−v·(f1 (t1 )+···+fn (tn ))} dF (v),
= ϕ−1 (f1 (t1 ) + · · · + fn (tn )) . (3.51)
Since the unconditional marginal survival function of a given Ti reads

Si (ti ) = EV [Si (ti |V )] = ϕ−1 (fi (ti )) , (3.52)

Sklar’s theorem shows that the (survival) copula of all the Ti ’s is:

C̄(u1 , . . . , un ) = ϕ−1 (ϕ(u1 ) + · · · + ϕ(un )) , (3.53)

which is Archimedean, as expected.


As an example, let us consider Clayton’s copula. Equation (3.43) shows that its generator is ϕ(t) = t^{−θ} − 1, so that φ(t) = (1 + t)^{−1/θ} , which is precisely the Laplace transform of a Gamma distribution Γ (θ−1 , 1) with parameter 1/θ, θ > 0. As a consequence, considering a frailty variable V following a Gamma distribution with parameters (1/θ, 1), θ > 0, and n conditionally independent random variables Ui |V , with conditional law:

Pr [Ui ≤ ui |V = v] = e^{−v·(ui^{−θ} − 1)} , ui ∈ [0, 1] , (3.54)

one obtains n uniformly distributed random variables Ui , whose dependence


structure is the Clayton copula with parameter θ.
Archimedean copulas enjoy the important property of associativity:

C3 (u, v, w) = C2 (u, C2 (v, w)) = C2 (C2 (u, v), w) , (3.55)

where C2 and C3 respectively denote the bivariate and trivariate form of


the copula under consideration. This property derives straightforwardly from
(3.41). In other words, given three random variables U, V and W , the depen-
dence between the first two random variables taken together and the third one
alone is the same as the dependence between the first random variable taken
alone and the two last ones together. Therefore, if the dependence of the three
random variables is described by an Archimedean copula, this implies a strong
symmetry between the different variables in that they are exchangeable. As a
consequence, when there is no reason to expect a breaking of symmetry be-
tween the random variables, an Archimedean copula may be a good choice to
model their dependence. Such an assumption is often used in modeling large
credit baskets. A contrario, when the random variables play very different
roles, namely when they are not exchangeable, Archimedean copulas do not
provide valid models of their dependence.
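The exchangeability just discussed rests on the associativity property (3.55), which is easy to verify numerically on any member of the family, for instance Frank’s copula (3.45). A standard-library sketch (θ = 3 and the test point are arbitrary choices):

```python
import math

def frank2(u, v, theta=3.0):
    """Bivariate Frank copula (3.45)."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta

u, v, w = 0.2, 0.5, 0.9
# Associativity (3.55): C2(u, C2(v, w)) == C2(C2(u, v), w)
left = frank2(u, frank2(v, w))
right = frank2(frank2(u, v), w)
assert abs(left - right) < 1e-12
```

Analytically, both sides equal ϕ[−1] (ϕ(u) + ϕ(v) + ϕ(w)), so only rounding noise separates them.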
Another interesting property of Archimedean copulas is that their values
C(u, u) on the first bisectrix verify the following inequality:

C(u, u) < u, for all u ∈ (0, 1) . (3.56)

Reciprocally, one can demonstrate [370, Theorem 4.1.6] that any copula possessing these two properties (associativity and C(u, u) < u) is Archimedean.
This provides an intuitive understanding of the nature of Archimedean copu-
las. It also allows one to understand why the Fréchet–Hoeffding upper bound
copula is not Archimedean. Indeed, although it enjoys the associativity prop-
erty, the Fréchet-Hoeffding upper bound is such that C(u, u) = u for all
u ∈ [0, 1] (note that it is the only copula with this property).
Archimedean copulas obey an important limit theorem [260] of the type of the Gnedenko-Pickands-Balkema-de Haan (GPBH) theorem (see Chap. 2).
Consider two random variables, X and Y , distributed uniformly on [0, 1], and
whose dependence structure can be described by an Archimedean copula C.
Then, the copula associated with the distribution of left-ordered quantiles
tends, in most cases, to Clayton’s copula (3.43) in the limit where the prob-
ability level of the quantiles goes to zero. To be more specific, let us denote
by ϕ the generator of the copula C, assumed differentiable. Let us define the
conditional distribution
Fu (x) = Pr[X ≤ x|X ≤ u, Y ≤ u] = C(x ∧ u, u)/C(u, u) , ∀x ∈ [0, 1] , (3.57)
where x ∧ u means the minimum of x and u, and the conditional copula
Cu (x, y) = Pr[X ≤ Fu−1 (x), Y ≤ Fu−1 (y)|X ≤ u, Y ≤ u]
= C (Fu−1 (x), Fu−1 (y)) / C(u, u) . (3.58)
One can first show that, provided that ϕ is a strict generator (that is, ϕ(0) is infinite, so that ϕ[−1] = ϕ−1 ), Cu is a strict Archimedean copula with generator:

ϕu (t) = ϕ (Fu−1 (t)) − ϕ(u), (3.59)
= ϕ (t · ϕ−1 (2ϕ(u))) − 2ϕ(u) , (3.60)
from which, it follows that the limiting behavior of Cu , as u goes to zero, is:

lim_{u→0} Cu (x, y) = CθCl (x, y), ∀(x, y) ∈ [0, 1] × [0, 1] , (3.61)

provided that ϕ is regularly varying11 at zero, with index θ ∈ R+ . When θ = 0,


Cu tends to the independent copula while it tends to the Fréchet-Hoeffding
upper bound copula when θ = ∞.
Thus, Clayton’s copula plays, in some sense, a role similar in n dimensions
to the generalized Pareto distribution in one dimension:
Gξ (x) = 1 − (1 + ξ · x)^{−1/ξ} . (3.62)

This result is of particular relevance in the study of multivariate statistics of


extremes.

3.3.3 Extreme Value Copulas

Another family of copulas which is of common use is that of extreme value


copulas. These copulas are derived from the dependence structure of mul-
tivariate generalized extreme value (GEV) distributions, which provide the
limit distributions of the component-wise maxima of n-dimensional random
vectors, after a suitable normalization.
Consider T iid n-dimensional random vectors Xk = (Xk,1 , . . . , Xk,n ),
k = 1, . . . , T with distribution function F , and their component-wise maxima

Mj,T = max_{1≤k≤T} Xk,j . (3.63)

For suitably chosen norming sequences (ak,T , bk,T ), the limit distribution

lim_{T→∞} Pr [ (M1,T − b1,T )/a1,T ≤ z1 , . . . , (Mn,T − bn,T )/an,T ≤ zn ]
= lim_{T→∞} F^T (a1,T · z1 + b1,T , . . . , an,T · zn + bn,T ) , (3.65)

if it exists, is given by

C (Hξ1 (z1 ), . . . , Hξn (zn )) , (3.66)

where Hξ is a GEV distribution (see Chap. 2), and C is – by definition – an


extreme value copula. Therefore, accounting for the general representation of
multivariate extreme value (MEV) distributions (see [107]), we can state that:

11 See footnote 3, page 39.
Fig. 3.6. Contour plot of Gumbel’s copula (left panel ) and of its density (right
panel ) for the parameter value θ = 2

Definition 3.3.2 (Extreme Value Copula). Any copula which admits the representation:

C(u1 , . . . , un ) = exp[ −V (−1/ln u1 , . . . , −1/ln un ) ] , (3.67)

with

V (x1 , . . . , xn ) = ∫_{Πn} max_i (wi /xi ) dH(w) , (3.68)

where H is any positive finite measure such that ∫_{Πn} wi dH(w) = 1, and Πn is the (n − 1)-dimensional unit simplex:

Πn = { w ∈ R_+^n : Σ_{i=1}^{n} wi = 1 } , (3.69)

is an extreme value copula.


One immediately observes that V is a homogeneous function of degree −1.
Thus, any extreme value copula satisfies [248]:
C (u1^α , . . . , un^α ) = [C (u1 , . . . , un )]^α , (3.70)

for all u ∈ [0, 1]n and all α > 0.
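Relation (3.70) lends itself to a direct numerical check, for instance on Gumbel’s copula (3.44); a standard-library sketch (θ = 1.5, the test point and the exponent α are arbitrary choices):

```python
import math

def gumbel2(u, v, theta=1.5):
    """Bivariate Gumbel copula (3.44)."""
    s = (-math.log(u))**theta + (-math.log(v))**theta
    return math.exp(-s**(1.0 / theta))

u, v, alpha = 0.3, 0.6, 2.7
# Max-stability (3.70): C(u^alpha, v^alpha) == C(u, v)^alpha
assert abs(gumbel2(u**alpha, v**alpha) - gumbel2(u, v)**alpha) < 1e-12

# The Frechet-Hoeffding upper bound satisfies the same relation, cf. (3.71)
assert min(u**alpha, v**alpha) == min(u, v)**alpha
```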


It is now easy to check that Gumbel’s copula (3.44) belongs to the class of
extreme value copulas. It is depicted in Fig. 3.6: apart from a slight asymmetry
with respect to the second bisectrix, it looks similar to a Student’s copula.
The Fréchet-Hoeffding upper bound copula is also an extreme value copula
since
min (u1^α , . . . , un^α ) = [min (u1 , . . . , un )]^α . (3.71)

It is interesting to notice that this copula is the only associative extreme value
copula which is not Archimedean. Indeed, due to the relation (3.70), either
C(u, . . . , u) = u for all u ∈ [0, 1] or C(u, . . . , u) < u for all u ∈ (0, 1).12
Since the Fréchet-Hoeffding upper bound copula is the only copula such that
C(u, . . . , u) = u for all u ∈ [0, 1] [370], we can conclude that any other associative extreme value copula is an Archimedean copula.

3.4 Universal Bounds for Functionals of Dependent Random Variables
In many situations, one has to consider various non-linear operations on sets
of dependent random variables. For instance, one has to aggregate several
risky positions in a portfolio, or to evaluate the pay-off of a derivative on a
basket of several underlying assets. Very often, the actual dependence of the
random variables under consideration is not known with sufficient accuracy. It
is therefore interesting to ask whether it would be possible to obtain (sharp)
bounds for the distribution of aggregated losses of a portfolio or of the pay-
offs of a derivative constructed on a basket of assets. We will discuss in detail
these two important examples in Sect. 3.6. For the time being, let us focus on
the following mathematical result.
Consider n random variables X1 , . . . , Xn with margins F1 , . . . , Fn and un-
known copula C. Let ψ : Rn −→ R and let Y = ψ (X1 , . . . , Xn ). The most
general result on bounds for Pr [Y ≤ y] 13 has been recently derived by Em-
brechts, Höing and Juri [145]:
Theorem 3.4.1. Let X1 , . . . , Xn be n random variables with margins
F1 , . . . , Fn and copula C. Let ψ : Rn −→ R be an increasing function, left
continuous in its last argument. Provided that there exist two functions Cinf and Csup , increasing in each of their arguments, such that C ≥ Cinf and C∗ ≤ C∗sup (where the expression of the dual copula C∗ is given in Definition 3.2.2), then

Finf (y) ≤ Pr [ψ (X1 , . . . , Xn ) ≤ y] ≤ Fsup (y) , (3.72)

where
12 Assuming that there exists a number u∗ ∈ (0, 1) such that C(u∗ , . . . , u∗ ) = u∗ ,
and raising this equation to the power α, it follows that C (u∗α , . . . , u∗α ) = u∗α
for any positive α, by (3.70). Note that u∗α spans the entire interval (0, 1) when
α ranges from zero to infinity. Thus, for all u ∈ (0, 1), C(u, . . . , u) = u, and since
this equality still holds when u = (0, . . . , 0) and u = (1, . . . , 1), we have:

∃u∗ ∈ (0, 1), C(u∗ , . . . , u∗ ) = u∗ =⇒ C(u, . . . , u) = u, ∀u ∈ [0, 1],

so that either C(u, . . . , u) = u for all u ∈ [0, 1] or C(u, . . . , u) < u for all u ∈ (0, 1).
13 Former results concerning the case where ψ is a sum of n terms or where ψ is an
increasing continuous function can be found for instance in [327, 179, 486, 126].

Finf (y) = sup_{x1 ,...,xn−1 ∈R} Cinf (F1 (x1 ), . . . , Fn−1 (xn−1 ), Fn (ξ(x1 , . . . , xn−1 , y))) , (3.73)

Fsup (y) = inf_{x1 ,...,xn−1 ∈R} C∗sup (F1 (x1 ), . . . , Fn−1 (xn−1 ), Fn (ξ(x1 , . . . , xn−1 , y))) , (3.74)
with
ξ(x1 , . . . , xn−1 , y) = sup {t ∈ R; ψ (x1 , . . . , xn−1 , t) ≤ y} . (3.75)
A heuristic proof of this result can be found in Appendix 3.A.
In this theorem, Cinf and Csup can be copulas, but this is not necessary. In
particular, since any copula is larger than the Fréchet-Hoeffding lower bound,
in the absence of any information on the dependence between the random
variables, one can always resort to
Csup (u1 , . . . , un ) = Cinf (u1 , · · · , un ) = max(u1 +. . .+un −n+1, 0) . (3.76)
This allows one to derive a universal bound for the probability that
ψ (X1 , . . . , Xn ) be less than y. Obviously, when additional information on
the dependence is available, the bound can be improved. For instance, when
the random variables are known to be positive orthant dependent – we will
come back in Chap. 4 on this notion – we can choose the independence (or
product) copula14 for Cinf and Csup .
The bound provided by Theorem 3.4.1 is point-wise the best possible.
Indeed, as shown in [145, Theorem 3.2], there always exists a copula C̃ for
X1 , . . . , Xn such that the distribution of ψ(X1 , . . . , Xn ) reaches the bound, at
least at one point. Therefore, on the entire set of distribution functions, it is
not possible to improve on this bound.
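For the important special case of the sum of two risks, ψ(x1 , x2 ) = x1 + x2 gives ξ(x, y) = y − x, and the choice (3.76) yields the classical bounds sup_x max (F1 (x) + F2 (y − x) − 1, 0) ≤ Pr [X1 + X2 ≤ y] ≤ inf_x min (F1 (x) + F2 (y − x), 1). A minimal numerical sketch for two standard Gaussian margins, assuming numpy and scipy are available (the search grid and its resolution are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

def sum_bounds(y, grid):
    """Bounds on Pr[X1 + X2 <= y] for two standard Gaussian margins,
    obtained from Theorem 3.4.1 with the Frechet-Hoeffding choice (3.76)."""
    f1 = norm.cdf(grid)          # F1(x) on the search grid
    f2 = norm.cdf(y - grid)      # F2(xi(x, y)) with xi(x, y) = y - x
    lower = np.max(np.maximum(f1 + f2 - 1.0, 0.0))   # sup_x of the lower bound
    upper = np.min(np.minimum(f1 + f2, 1.0))         # inf_x of the upper bound
    return lower, upper

grid = np.linspace(-10.0, 10.0, 20001)
lo, hi = sum_bounds(3.0, grid)
# The independent case Pr[X1 + X2 <= 3] = Phi(3/sqrt(2)) must lie in between
assert lo <= norm.cdf(3.0 / np.sqrt(2.0)) <= hi
```

For Gaussian margins the supremum in the lower bound is reached at x = y/2, so that the lower bound equals 2Φ(y/2) − 1.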
To conclude this section, let us state a straightforward bound implied
by Theorem 3.4.1 for expectations. Denoting by Xinf and Xsup two random
variables with distribution functions Finf and Fsup respectively, and a non-
decreasing function G, we obviously have:
E [G (Xsup )] ≤ E [G (ψ (X1 , . . . , Xn ))] ≤ E [G (Xinf )] . (3.78)
Similar bounds exist – mutatis mutandis – for any non-increasing function.
14 Recall that the independence (or product) copula is:
C(u, v) = u · v, ∀u, v ∈ [0, 1] ,
so that:
Pr [X ≤ x, Y ≤ y] = F (x, y) = C (FX (x), FY (y))
= FX (x) · FY (y) = Pr [X ≤ x] · Pr [Y ≤ y] .

3.5 Simulation of Dependent Data with a Prescribed Copula
An important practical application of copulas consists in the simulation of
random variables with prescribed margins and various dependence structures
in order to perform Monte-Carlo studies [171, 244], to generate scenarios for
stress-testing investigations or to analyze the sensitivity of portfolio alloca-
tions to various parameters. We will come back to these various applications
in the next section.
Here, we present several algorithms for the simulation of random variables
with copulas characterizing a large class of dependences. The conceptually
simplest approach is the acceptance-rejection method [218, 243]. However,
this method is relatively slow in large dimensions and therefore becomes unreliable, because only small statistical samples can be obtained in practice.
In addition, it does not lend itself well to the study of the impact of the
dependence structure on the optimal allocation of assets, for instance. As a
consequence, another approach is desirable.
In fact, Sklar’s theorem shows that the generation of n random variables
X1 , . . . , Xn with margins F1 , . . . , Fn and copula C can be performed as follows:
1. Generate n random variables u1 , . . . , un with uniform margins and copula
C.
2. Apply the inversion method to each ui , in order to generate each xi :
xi = Fi−1 (ui ) , (3.79)
where Fi−1 denotes the (generalized) inverse of Fi .
Therefore, the main difficulty in generating n random variables following the joint distribution H (x1 , . . . , xn ) = C (F1 (x1 ) , . . . , Fn (xn )) lies in the generation of n auxiliary random variables with uniform margins and dependence
structure given by the copula C. We will now present two methods to simu-
late n-dimensional random vectors: the first one is specific to elliptical copulas
while the second one applies to a wide range of copulas.

3.5.1 Simulation of Random Variables Characterized by Elliptical Copulas

The simulation of random variables whose dependence structure is given by


an elliptical copula is particularly simple. This is one of the many appeals of
this family of copulas.
By virtue of the invariance Theorem 3.2.2, the simulation of random vari-
ables with an elliptical copula is equivalent to the problem of the simula-
tion of elliptically distributed random variables. Therefore, simulating an n-
dimensional vector X = (X1 , . . . , Xn ) following an n-Gaussian copula with
correlation matrix ρ is particularly easy. Indeed, one has just to use the fol-
lowing algorithm.

Fig. 3.7. Five thousand realizations of two random variables whose distribution
function is given by the Gaussian copula with correlation coefficient ρ = 0.4 (left
panel ) and ρ = 0.8 (right panel )

Algorithm 1
1. Generate n independent standard Gaussian random variables: u = (u1 ,
. . . , un ) using the Box-Muller algorithm [77], for instance,
2. find the Cholesky decomposition of ρ: ρ = A · At , where A is a lower-triangular matrix,
3. set y = A · u,
4. and finally evaluate xi = Φ (yi ), i = 1, . . . , n, where Φ denotes the univari-
ate standard Gaussian distribution function.
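Algorithm 1 translates line by line into code; a minimal sketch assuming numpy and scipy are available (the function name, correlation value, sample size and seed are our own illustrative choices):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_sample(rho_matrix, size, seed=None):
    """Algorithm 1: uniform samples whose dependence structure is the
    Gaussian copula with correlation matrix rho_matrix."""
    rng = np.random.default_rng(seed)
    n = rho_matrix.shape[0]
    u = rng.standard_normal((size, n))     # step 1: iid N(0, 1) variables
    a = np.linalg.cholesky(rho_matrix)     # step 2: rho = A A^t
    y = u @ a.T                            # step 3: y = A u (per sample)
    return norm.cdf(y)                     # step 4: x_i = Phi(y_i)

rho = np.array([[1.0, 0.8], [0.8, 1.0]])
x = gaussian_copula_sample(rho, size=5000, seed=0)
assert x.shape == (5000, 2) and (x > 0.0).all() and (x < 1.0).all()
```

Mapping the resulting uniforms through any marginal quantile functions then produces the desired meta-Gaussian sample.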
To generate an n-dimensional random vector drawn from a more compli-
cated elliptical copula, it is useful to recall that any centered and elliptically
distributed random vector X admits the following stochastic representation
[252]:

X=R·N, (3.80)

where N is a centered Gaussian vector with covariance matrix Σ 2 and R is


a positive random variable independent of N. As an example, to generate an
n-dimensional random vector drawn from a Student copula with ν degrees of
freedom and shape matrix ρ, one has to follow
Algorithm 2
1. Generate n independent standard Gaussian random variables:
u = (u1 , . . . , un ),
2. find the Cholesky decomposition of ρ: ρ = A · At ,
3. set z = A · u,
4. generate a random variable r, independent of z = (z1 , . . . , zn ) and following
a χ2 -distribution with ν degrees of freedom,
5. set y = √(ν/r) · z,

Fig. 3.8. Five thousand realizations of two random variables whose distribution
function is given by Student’s copula with shape coefficient ρ = 0.4 (left panel ) and
ρ = 0.8 (right panel ) and ν = 3 degrees of freedom

6. and finally, evaluate xi = Tν (yi ), i = 1, . . . , n, where Tν denotes the uni-


variate standard Student’s distribution function with ν degrees of freedom.
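Algorithm 2 can be sketched in the same fashion, again assuming numpy and scipy are available (the shape matrix, ν = 3, sample size and seed are arbitrary choices):

```python
import numpy as np
from scipy.stats import t as student_t

def student_copula_sample(rho_matrix, nu, size, seed=None):
    """Algorithm 2: uniform samples whose dependence structure is the
    Student copula with nu degrees of freedom and shape matrix rho_matrix."""
    rng = np.random.default_rng(seed)
    n = rho_matrix.shape[0]
    u = rng.standard_normal((size, n))      # step 1
    a = np.linalg.cholesky(rho_matrix)      # step 2
    z = u @ a.T                             # step 3
    r = rng.chisquare(nu, size)             # step 4: r ~ chi^2 with nu dof
    y = np.sqrt(nu / r)[:, None] * z        # step 5: y = sqrt(nu/r) * z
    return student_t.cdf(y, df=nu)          # step 6: x_i = T_nu(y_i)

rho = np.array([[1.0, 0.4], [0.4, 1.0]])
x = student_copula_sample(rho, nu=3, size=5000, seed=1)
assert x.shape == (5000, 2) and (x > 0.0).all() and (x < 1.0).all()
```

Dividing the Gaussian vector by the square root of an independent χ2 /ν variable is exactly the stochastic representation (3.80) for the Student case.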
When the representation (3.80) is known explicitly, as in the example
involving the Gaussian or Student copulas, the generation of the n-dimensional
random vector by Algorithm 2 is straightforward. However, the law of the
random variable R is difficult to derive for most elliptical distributions. In
that case, the general algorithm described in the next paragraph is much
more useful.

3.5.2 Simulation of Random Variables Characterized by Smooth Copulas

The second general method is based upon the simple fact that:
Pr [U1 ≤ u1 , . . . , Un ≤ un ] = Pr [Un ≤ un |U1 = u1 , . . . , Un−1 = un−1 ]
× Pr [U1 ≤ u1 , . . . , Un−1 ≤ un−1 ] ,
which gives
Pr [U1 ≤ u1 , . . . , Un ≤ un ] = Pr [Un ≤ un |U1 = u1 , . . . , Un−1 = un−1 ]
× Pr [Un−1 ≤ un−1 |U1 = u1 , . . . , Un−2 = un−2 ]
...
× Pr [U2 ≤ u2 |U1 = u1 ] · Pr [U1 ≤ u1 ] (3.81)
by a straightforward recursion.
Therefore, applying this reasoning to the n-copula C, and denoting by Ck
the copula of the k first variables, this yields:
C (u1 , . . . , un ) = Cn (un |u1 , . . . , un−1 ) . . . C2 (u2 |u1 ) · C1 (u1 ) , (3.82)
(with C1 (u1 ) = u1 ),

where we define:
Ck (uk |u1 , . . . , uk−1 ) = ∂u1 . . . ∂uk−1 Ck (u1 , . . . , uk ) / ∂u1 . . . ∂uk−1 Ck−1 (u1 , . . . , uk−1 ) . (3.83)

As a consequence, in order to simulate n random variables with copula C,


one just has to
1. generate n uniform and independent random variables: v1 , . . . , vn ,
2. set u1 = v1 ,
3. set u2 = C2 −1 (v2 |u1 ),
...
n+1. set un = Cn −1 (vn |u1 , . . . , un−1 ).
This algorithm is particularly efficient when one considers the Archimedean
copula. Genest and MacKay [198] have shown that, in such a case, it is very
simple to generate pairs of random variables whose distribution function is
given by the copula C with generator ϕ. Indeed, the previous algorithm sim-
ply leads to
1. generate two uniform and independent random variables: v1 , v2 ,
2. set u1 = v1 ,
3. set u2 = ϕ[−1] (ϕ (ϕ′−1 (ϕ′ (u1 )/v2 )) − ϕ(u1 )), where ϕ′ denotes the derivative of the generator ϕ.

Applying this simplified algorithm to simulate Frank’s copula leads to the


following algorithm:
1. generate two uniform and independent random variables: v1 , v2 ,
2. set u1 = v1 ,
3. set u2 = −(1/θ) ln[ 1 + v2 (e^{−θ} − 1)/(v2 + (1 − v2 ) · e^{−θ·v1 }) ] .
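The three steps above can be written out directly; a standard-library sketch (θ = 5, the sample size and the seed are arbitrary choices):

```python
import math
import random

def frank_pair(theta):
    """One pair (u1, u2) from Frank's copula, by conditional inversion."""
    v1, v2 = random.random(), random.random()                   # step 1
    u1 = v1                                                     # step 2
    u2 = -math.log(1.0 + v2 * (math.exp(-theta) - 1.0)
                   / (v2 + (1.0 - v2) * math.exp(-theta * v1))) / theta  # step 3
    return u1, u2

random.seed(42)
pairs = [frank_pair(theta=5.0) for _ in range(5000)]
assert all(0.0 < u < 1.0 and 0.0 < v < 1.0 for u, v in pairs)
```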

The same scheme can also be used to simulate Clayton’s copula. However,
Devroye [129] has proposed a somewhat simpler method for Clayton’s copula
with positive parameter θ:
1. generate two standard exponential random variables: v1 , v2 ,
2. generate a random variable x following the distribution Γ (θ−1 , 1),
3. set u1 = (1 + v1 /x)^{−1/θ} and u2 = (1 + v2 /x)^{−1/θ} .
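Devroye's recipe is equally short; a standard-library sketch for θ > 0 (θ = 1, the sample size and the seed are arbitrary choices):

```python
import random

def clayton_pair(theta):
    """One pair (u1, u2) from Clayton's copula (theta > 0), via the
    Gamma frailty construction."""
    v1 = random.expovariate(1.0)                 # step 1: standard exponentials
    v2 = random.expovariate(1.0)
    x = random.gammavariate(1.0 / theta, 1.0)    # step 2: Gamma(1/theta, 1)
    u1 = (1.0 + v1 / x)**(-1.0 / theta)          # step 3
    u2 = (1.0 + v2 / x)**(-1.0 / theta)
    return u1, u2

random.seed(7)
pairs = [clayton_pair(theta=1.0) for _ in range(5000)]
assert all(0.0 < u < 1.0 and 0.0 < v < 1.0 for u, v in pairs)
```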
This approach is in fact related to Marshall and Olkin’s work [348]. Indeed,
it is straightforward to check that, with the specification above, one has:
Pr [Ui ≤ ui |X = x] = e^{−x·(ui^{−θ} − 1)} , (3.84)

as in (3.54). Figure 3.9 provides an example of the realizations obtained by


the use of these two algorithms.
A similar algorithm works for Gumbel’s copula with parameter θ > 1,
but it requires the generation of a random variable following a positive stable
Fig. 3.9. Five thousand realizations of two random variables whose distribution
function is given by Clayton’s copula with parameter θ = 1 (left panel ) and by
Frank’s copula with parameter θ = 5 (right panel )

law with tail index 1/θ, since the inverse of the generator of such a copula is φ(t) = e^{−t^{1/θ}} , t ≥ 0. For an overview and software to generate random Lévy variables, see the Web pages of Professors J. Huston McCulloch (http://economics.sbs.ohio-state.edu/jhm/jhm.html) and John P. Nolan (http://academic2.american.edu/~jpnolan/stable/stable.html).
To conclude on the question concerning the simulation of dependent ran-
dom variables, the second approach is sometimes more appropriate for n-
copulas, with n > 2, because the algorithm based upon the inversion of the
conditional copulas can rapidly become intractable for large n.

3.6 Application of Copulas


This section reviews several applications of copulas to risk assessment, in par-
ticular to tail risks in the presence of dependence [145, 179, 336], to option
pricing [99, 417] and also to default risks [184, 302, 303]. In view of the growing
importance of copulas in financial applications [100], an exhaustive presenta-
tion is not realistic. We thus restrict our discussion to examples that we have
found particularly illustrative. For many other examples, see the following
references concerning portfolio theory [228, 338], performance measurements
[240], insurance applications [183, 272] or decision theory [258], among many
others.

3.6.1 Assessing Tail Risk

One of the most important activities in the financial as well as in the actuarial
worlds consists in assessing the risk of uncertain aggregated positions. This
risk is often measured by the Value-at-Risk VaRα at probability level α. VaRα
is the lower α-quantile of the net risk position Y , as illustrated in Fig. 3.10:
Fig. 3.10. Value-at-Risk at probability level α for the loss Y with distribution
function FY . We show the case where the distribution has a gap to exemplify how
the Value-at-Risk is defined in such a degenerate case

VaRα = inf {t ∈ R; Pr [Y ≤ t] ≥ α} .   (3.85)

In this definition of the Value-at-Risk, we take the convention of counting


losses as positive. In this section, we show how to bound the Value-at-Risk of
a portfolio using copulas.
Considering n risky investments or insurance losses X1 , . . . , Xn , the net
risk of the position is:

Y = Σ_{i=1}^{n} wi · Xi ,   (3.86)

where wi denotes the weight of position i in the portfolio. It is convenient to


define X̃i = wi · Xi , so that Y simply becomes the sum of the X̃i ’s. If Fi is the distribution function of each Xi , the distribution function of X̃i is F̃i (·) = Fi (·/wi ). Now, applying Theorem 3.4.1 with ψ(X̃1 , . . . , X̃n ) = X̃1 + · · · + X̃n ,
we obtain – using slightly different notations:

Fmin (y) ≤ Pr [Y ≤ y] ≤ Fmax (y) , (3.87)

with
Fmin (y) = sup_{x ∈ A(y)} max{ Σ_{i=1}^{n} F̃i^− (xi ) − (n − 1), 0 }   (3.88)

and
Fmax (y) = inf_{x ∈ A(y)} min{ Σ_{i=1}^{n} F̃i (xi ), 1 } ,   (3.89)

where F̃i− denotes the left limit of F̃i and


A(y) = { x = (x1 , . . . , xn ) ∈ R^n ; Σ_{i=1}^{n} xi = y } .   (3.90)

Therefore, a tight bound for the Value-at-Risk of the aggregated position Y


is:

VaR_α^{min} ≤ VaRα (Y ) ≤ VaR_α^{max} ,   (3.91)

with:

VaR_α^{min} = inf {t ∈ R; Fmax (t) ≥ α} ,   (3.92)

and

VaR_α^{max} = inf {t ∈ R; Fmin (t) ≥ α} .   (3.93)

These two relations have a clear economic meaning: they represent respectively
the most optimistic and pessimistic outcomes one can expect in the absence
of any information on the actual dependence structure between the different
sources of risk.
A closed-form expression for Fmin and Fmax is almost impossible to obtain
in the general case where the marginal distributions of each of the assets
are different. However, when all the risks can be described by distributions
belonging to the same class, some general results have been obtained [126]. As
an example, let us consider the case of a portfolio made of n risks (with the
set of weights {wi , i = 1, . . . , n}) following shifted-Pareto distributions with
the same tail index β > 0:
Pr [Xi ≤ x] = 1 − ( λi / (λi + (x − θi )) )^β ,   x ≥ θi .   (3.94)

This model provides a reasonable description of the tails of the distribution


of returns of financial assets, such as stocks returns or FX (foreign exchange)
rates as discussed in Chap. 2, with β ≈ 3–4. Shifted-Pareto distributions are
also relevant for modeling insurance claims associated with industrial [496,
497] as well as natural disasters like earthquakes [274, 411], floods [386] or
fires [328]. Using (3.94), the upper and lower bounds for the Value-at-Risk are
given by:

VaR_α^{min} = Σ_{i=1}^{n} wi · θi + max_i {wi · λi } · ( 1/(1 − α)^{1/β} − 1 ) ,   (3.95)
and

VaR_α^{max} = Σ_{i=1}^{n} wi · (θi − λi ) + λ̃ / (1 − α)^{1/β} ,   (3.96)

where

λ̃ = [ Σ_{i=1}^{n} (wi · λi )^{β/(1+β)} ]^{(1+β)/β} .   (3.97)
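These closed forms are straightforward to evaluate numerically; a possible Python sketch (NumPy assumed, function name ours) is:

```python
import numpy as np

def pareto_var_bounds(w, lam, theta, beta, alpha):
    """Bounds (3.95)-(3.96) on the VaR of a portfolio of shifted-Pareto
    risks sharing the tail index beta, in the absence of any information
    on the dependence structure between the risks."""
    w, lam, theta = (np.asarray(a, dtype=float) for a in (w, lam, theta))
    q = (1.0 - alpha) ** (-1.0 / beta)          # (1 - alpha)^(-1/beta)
    var_min = np.sum(w * theta) + np.max(w * lam) * (q - 1.0)
    lam_tilde = np.sum((w * lam) ** (beta / (1.0 + beta))) ** ((1.0 + beta) / beta)
    var_max = np.sum(w * (theta - lam)) + lam_tilde * q
    return var_min, var_max
```

For a single risk the two bounds coincide, which provides a quick sanity check of the formulas.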

These relations have been obtained by recursion. Indeed, as emphasized by


Frank et al. [179], (3.88)–(3.89) involve searching an extremum over the hyperplane A(y). Such an extremum can be found recursively according to
F_max^{(n)} (y) = inf_{x ∈ A(y)} min{ Σ_{i=1}^{n} F̃i (xi ), 1 } ,   (3.98)
             = inf_{x ∈ R} min{ F_max^{(n−1)} (x) + F̃n (y − x), 1 } ,   (3.99)

and equivalently

F_min^{(n)} (y) = sup_{x ∈ A(y)} max{ Σ_{i=1}^{n} F̃i^− (xi ) − (n − 1), 0 } ,   (3.100)
             = sup_{x ∈ R} max{ F_min^{(n−1)} (x) + F̃n (y − x) − 1, 0 } .   (3.101)

Unfortunately, this approach is efficient only as long as Fmax and Fmin remain
of the same class as the distributions Fi ’s, as occurs in the shifted-Pareto ex-
ample (3.94). In general, the Fi ’s are different and one has to rely on numerical
procedures to derive the bounds of real portfolio risks.
An efficient numerical algorithm has been proposed by Williamson and
Downs [486]. Starting with T − 1 observations of the risks X1 , . . . , Xn , one
first evaluates the upper and lower bounds for the VaRα of a portfolio made
of X1 and X2 . Let qi (k/T ) denote the empirical quantiles of order k/T of Xi .
Let us set −∞ < qi (0) < qi (1/T ) and qi (1 − 1/T ) < qi (1) < ∞. It can be
shown that convergent estimators of VaRmin α and VaRmax
α are given by:
VaR̂^{min}_{k/T} = max_{0≤j≤k} {q1 (j/T ) + q2 ((k − j)/T )} ,   (3.102)
VaR̂^{max}_{k/T} = min_{k≤j≤T} {q1 (j/T ) + q2 (1 − (j − k)/T )} .   (3.103)

In practice, the convergence of VaR̂_{k/T} is very fast. Using the same kind
of arguments as in (3.98–3.100), it appears that this method can be used
iteratively, making possible the calculation of the bounds for (reasonably)
large portfolios. An illustration of this method for three portfolios made of
large capitalization US stocks is depicted in Fig. 3.11. For a portfolio of ten
stocks and T = 1500, only a few seconds are required to obtain the Value-at-
Risk bounds.
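The two-asset step (3.102)–(3.103) can be sketched as follows (Python with NumPy; the finite values chosen for q_i(0) and q_i(1) are a free convention, as noted in the text):

```python
import numpy as np

def wd_var_bounds(x1, x2, k):
    """Williamson-Downs bounds (3.102)-(3.103) for the quantile of order
    k/T of X1 + X2, given T - 1 observations of each risk (losses counted
    as positive).  The array index j stands for the quantile order j/T."""
    T = len(x1) + 1
    s1, s2 = np.sort(np.asarray(x1)), np.sort(np.asarray(x2))
    # end-point conventions: arbitrary finite values below the minimum
    # and above the maximum of the sample
    q1 = np.concatenate(([s1[0] - 1.0], s1, [s1[-1] + 1.0]))
    q2 = np.concatenate(([s2[0] - 1.0], s2, [s2[-1] + 1.0]))
    var_min = max(q1[j] + q2[k - j] for j in range(0, k + 1))
    var_max = min(q1[j] + q2[T - (j - k)] for j in range(k, T + 1))
    return var_min, var_max
```

Applied iteratively, as in (3.98)–(3.101), the same routine handles larger portfolios by treating the previously obtained bound as a new marginal.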
Fig. 3.11. Upper and lower bounds for the VaR of a portfolio over the period from
25 January, 1995 to 29 December, 2000 made of two assets (Applied Materials Inc.
and Coca Cola Co: plain lines), five assets (the two above plus E.M.C Corp MA,
General Electric Co, General Motors Corp: dotted lines) and ten assets (the five
above plus Hewlett Packard Co, I.B.M Corp, Intel Corp, Medtronic Inc. and Merck
& Co Inc.: dash-dotted lines). We find practically identical results when exchanging
these assets with others from the largest capitalization stocks. The lower negative
bounds for portfolios of 5 and 10 assets correspond to the favourable situation where
diversification has removed the risks of losses

3.6.2 Asymptotic Expression of the Value-at-Risk

In several special cases, the tail risk of a portfolio, made of assets exhibit-
ing nontrivial dependence, can be approximately calculated by a linear or
quadratic approximation [472] or by using an asymptotic expansion. Here, we follow the latter approach and provide an example borrowed from [336].
Consider a portfolio of N assets whose dependence structure is given by
the Gaussian copula. We will discuss the relevance and the limits of this as-
sumption in Chap. 5. In addition, we assume that the returns of each asset
are distributed according to a so-called modified-Weibull distribution charac-
terized by its density
p(x) = (1/(2√π)) · (c/χ^{c/2}) · |x|^{c/2−1} · e^{−(|x|/χ)^c} ,   (3.104)

or more generally
p(x) = (1/(2√π)) · (c_+ /χ_+^{c_+ /2}) · |x|^{c_+ /2−1} · e^{−(|x|/χ_+ )^{c_+}}   if x ≥ 0 ,   (3.105)
p(x) = (1/(2√π)) · (c_− /χ_−^{c_− /2}) · |x|^{c_− /2−1} · e^{−(|x|/χ_− )^{c_−}}   if x < 0 ,   (3.106)

when it is desirable to take into account a possible asymmetry between nega-


tive and positive values (thus leading to possible nonzero mean and skewness
of the returns). This parameterization has the remarkable property that if the
random variable X follows a modified-Weibull law with exponent c, then the
variable
Y = sgn(X) · √2 · (|X|/χ)^{c/2}   (3.107)

follows a standard Gaussian law. This offers a simple visual test of the hypoth-
esis that the returns are distributed according to the modified-Weibull distribution: starting from the empirical returns, one transforms them by converting
the empirical distribution into a Gaussian one. Then, plotting the transformed
variables as a function of the raw returns should give the power law (3.107)
if the modified-Weibull distribution is a good model. Figure 3.12 shows the
(negative) transformed returns of the S&P’s 500 index as a function of the
raw returns over the time interval from 03 January, 1995 to 29 December,

Fig. 3.12. Graph of the Normalized returns Y of the Standard & Poor’s 500 index
(as explained in the text) versus its raw returns X, from 03 January, 1995 to 29
December, 2000 for the negative tail of the distribution. The double logarithmic
scales clearly show a straight line over an extended range of data, qualifying the
power law relationship (3.107)
2000. The double logarithmic scales of Fig. 3.12 qualify a power law with exponent c/2 = 0.73 over an extended range of data.
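The visual test rests on the one-to-one map (3.107); a minimal Python sketch of the transform and its inverse (our function names, NumPy assumed) is:

```python
import numpy as np

def weibull_to_gaussian(x, chi, c):
    """Transform (3.107): maps a modified-Weibull variable with scale chi
    and exponent c onto a standard Gaussian one."""
    return np.sign(x) * np.sqrt(2.0) * (np.abs(x) / chi) ** (c / 2.0)

def gaussian_to_weibull(g, chi, c):
    """Inverse map: generates modified-Weibull samples from Gaussian
    draws, useful to synthesize test data."""
    return np.sign(g) * chi * (np.abs(g) / np.sqrt(2.0)) ** (2.0 / c)
```

In a double logarithmic plot of |Y| versus |X|, the relation (3.107) appears as a straight line of slope c/2, which is the signature tested in Fig. 3.12.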
For such a portfolio constituted of assets with returns distributed according
to modified-Weibull distributions with the same exponent c > 1, it can be
shown that the distribution of its returns is still given by a modified-Weibull
law, in the asymptotic regime of large losses (counted as negative). Specifically,
the distribution function Fπ of the portfolio losses is asymptotically equivalent
to a modified-Weibull distribution function FZ ,

Fπ (x) ∼ λ · FZ (x), as x −→ −∞ , (3.108)

where λ is a constant, with the same exponent c and with a scale factor χ̂
given by:
χ̂ = ( Σ_i wi χi σi )^{(c−1)/c} ,   (3.109)

where χi is the scale factor of asset i, wi ≥ 0 is its relative weight in the


portfolio, σi is the solution of

Σ_{k=1}^{N} Vik wk χk σk^{1−c/2} = σi^{c/2} ,   ∀i = 1, . . . , N ,   (3.110)

and V is the correlation matrix of the Gaussian copula. The proof of this
result can be found in [336].
For two particular cases, the above equations allow us to retrieve simple
closed-form formulas. For independent assets, one has V = Id, so that the
solution of (3.110) is
σi = (wi χi )^{1/(c−1)} ,   ∀i = 1, . . . , N   (3.111)

and thus
χ̂ = [ Σ_{i=1}^{N} (wi χi )^{c/(c−1)} ]^{(c−1)/c} ,   and   λ = [ c/(2(c − 1)) ]^{(N−1)/2} ,   (3.112)

(see Appendix 3.B for a direct proof of this result). For comonotonic assets,
Vij = 1 for all i, j = 1, . . . N , which leads to
σi = ( Σ_{k=1}^{N} wk χk )^{1/(c−1)} ,   ∀i = 1, . . . , N   (3.113)

and thus

χ̂ = Σ_{i=1}^{N} wi χi ,   and   λ = 1 .   (3.114)
This result is obvious and can be directly retrieved from the comonotonicity
between the assets. In fact, in such a case the distribution of the portfolio is
a modified-Weibull law, not only asymptotically but exactly over the whole
range.
Denoting by W (0) the initial amount of money invested in the risky port-
folio, the asymptotic Value-at-Risk, at probability level α, can easily be com-
puted with the formula
VaRα ≃ W (0) · (χ̂/2^{1/c}) · [ Φ^{−1}(1 − α/λ) ]^{2/c}   (3.115)
     ≃ ξ(α)^{2/c} W (0) · χ̂ ,   (3.116)

where the function Φ(·) denotes the cumulative Normal distribution function and

ξ(α) ≡ (1/√2) Φ^{−1}(1 − α/λ) .   (3.117)
The example provided here for a portfolio made of assets whose dependence
is described by a Gaussian copula can be easily extended to more complex
cases. For instance, the same kind of asymptotic expansion can be performed
for the Student’s copula. This illustrates the simplification brought by the use of copulas for some parametric calculations of tail risks.
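For instance, in the independent case, (3.112) and (3.115)–(3.117) combine into a few lines (a sketch assuming NumPy and SciPy; valid when α/λ < 1 and c > 1):

```python
import numpy as np
from scipy.stats import norm

def asymptotic_var_independent(w, chi, c, alpha, W0=1.0):
    """Asymptotic VaR of a portfolio of independent modified-Weibull assets
    with a common exponent c > 1, from the closed forms (3.112) and
    (3.115)-(3.117)."""
    w, chi = np.asarray(w, dtype=float), np.asarray(chi, dtype=float)
    N = len(w)
    chi_hat = np.sum((w * chi) ** (c / (c - 1.0))) ** ((c - 1.0) / c)
    lam = (c / (2.0 * (c - 1.0))) ** ((N - 1.0) / 2.0)
    xi = norm.ppf(1.0 - alpha / lam) / np.sqrt(2.0)
    return xi ** (2.0 / c) * W0 * chi_hat
```

For a single asset the asymptotic formula reduces to the exact quantile of the modified-Weibull law, and spreading a given scale factor over several independent assets lowers the tail risk, as expected from diversification when c > 1.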

3.6.3 Options on a Basket of Assets

As suggested in [99, 417], copulas offer a useful framework for pricing mul-
tivariate contingent claims. Indeed, they provide natural pricing kernels that
allow one to determine the price of options defined on a basket of assets by
simply gathering the prices of options written on each individual asset.
Following [99], let us consider a market with two risky assets S1 and S2
and a risk-free asset B. For simplicity – but without loss of generality – the
risk-free interest rate is set to zero. Let us assume the existence of two digital
options O1 on S1 and O2 on S2 respectively, with maturity T . They pay one
monetary unit at time T if the value Si (T ) of the underlying asset at time T
is more than Ki . Their price Pi is:
 
Pi = EQ [ 1_{Si (T )>Ki } ] = PrQ [Si (T ) > Ki ] ,   (3.118)

where Q denotes a risk-neutral probability measure, equivalent to the histor-


ical probability measure P. Q is unique when the market is complete.
Now, consider the bivariate digital option O which pays one monetary unit
at time T if the value S1 (T ) is larger than K1 and the value S2 (T ) is larger
than K2 . The price of such an option on a basket of two assets is
 
P = EQ [ 1_{S1 (T )>K1 , S2 (T )>K2 } ] = PrQ [S1 (T ) > K1 , S2 (T ) > K2 ] .   (3.119)
By Sklar’s Theorem (3.2.1), we can write the price of the bivariate digital
option as a function of the price of each individual digital option:

P = C Q (P1 , P2 ) , (3.120)

where C Q is a risk-neutral (survival) copula. Just as the individual risk-neutral


density embodies traders’ expectations on future asset prices and therefore
represents a forward-looking indicator of market risk [44], the risk-neutral
copula contains the expectations on future co-movements of the basket of
assets [56].
Accounting for Fréchet-Hoeffding bounds, we can assert that the price of
any bivariate digital option must satisfy

max{P1 + P2 − 1, 0} ≤ P ≤ min{P1 , P2 } . (3.121)

This relation can be interpreted as a direct consequence of the no-arbitrage


principle, as we now show. The considered market exhibits four states, denoted
by HH, HL, LH and LL. In the first state, both S1 (T ) and S2 (T ) are larger
than K1 and K2 respectively. In the second state, only S1 (T ) is larger than K1 ,
while in the third state, only S2 (T ) is larger than K2 . In the fourth state, both
S1 (T ) and S2 (T ) are smaller than K1 and K2 respectively. It is convenient to
introduce the vector p whose components are the price of the bivariate digital
option, of the risk-free asset, and of the two digital options,
p = ( P, 1, P1 , P2 )^T .   (3.122)

Let us introduce the indicator matrix defined by

     ⎛ 1HH (O)   1HL (O)   1LH (O)   1LL (O)  ⎞
Π =  ⎜ 1HH (1)   1HL (1)   1LH (1)   1LL (1)  ⎟ ,   (3.123)
     ⎜ 1HH (O1 ) 1HL (O1 ) 1LH (O1 ) 1LL (O1 ) ⎟
     ⎝ 1HH (O2 ) 1HL (O2 ) 1LH (O2 ) 1LL (O2 ) ⎠

where
1HH (O) = 1{S1 (T )>K1 ,S2 (T )>K2 }|HH = 1 ,
1HL (O) = 1{S1 (T )>K1 ,S2 (T )>K2 }|HL = 0 ,
1LH (O) = 1{S1 (T )>K1 ,S2 (T )>K2 }|LH = 0 ,
1LL (O) = 1{S1 (T )>K1 ,S2 (T )>K2 }|LL = 0 ,
1HH (1) = 1HL (1) = 1LH (1) = 1LL (1) = 1 for the risk-free asset, and
1HH (O1 ) = 1{S1 (T )>K1 }|HH = 1 ,
1HL (O1 ) = 1{S1 (T )>K1 }|HL = 1 ,
1LH (O1 ) = 1{S1 (T )>K1 }|LH = 0 ,
1LL (O1 ) = 1{S1 (T )>K1 }|LL = 0 ,
1HH (O2 ) = 1{S2 (T )>K2 }|HH = 1 ,
1HL (O2 ) = 1{S2 (T )>K2 }|HL = 0 ,
1LH (O2 ) = 1{S2 (T )>K2 }|LH = 1 ,
1LL (O2 ) = 1{S2 (T )>K2 }|LL = 0 .
The first row of Π in (3.123) corresponds to the bivariate digital option, the
second row to the risk-free asset, the third row to the option on S1 and the
fourth row to the option on S2 . The first column corresponds to state HH,
the second column to state HL, the third column to state LH and the fourth
column to state LL. This yields
     ⎛ 1 0 0 0 ⎞
Π =  ⎜ 1 1 1 1 ⎟ .   (3.124)
     ⎜ 1 1 0 0 ⎟
     ⎝ 1 0 1 0 ⎠

In short, the matrix Π allows one to obtain the value of the four assets in
each of the four states of the world.
The absence of arbitrage opportunity amounts to the existence of a vector p̃ with positive components such that the vector p of prices can be written
as follows [103, 214]

p = Π · p̃ . (3.125)

Since, in the present case, the market is complete by construction, the matrix
Π can be inverted and we have
p̃ = Π^{−1} · p = ( P, P1 − P, P2 − P, P − P1 − P2 + 1 )^T .   (3.126)

Writing that all the components of p̃ are positive is equivalent to:

max{P1 + P2 − 1, 0} < P < min{P1 , P2 } . (3.127)

This retrieves (3.121) except for the fact that the Fréchet-Hoeffding bounds
are now excluded. In fact, as recalled earlier, the Fréchet-Hoeffding upper
and lower bounds are associated with the comonotonicity and the counter-
monotonicity. These two situations are obviously excluded from the formu-
lation in terms of the pricing kernel since the market cannot be considered
as complete in those cases. Therefore, the prices associated with the Fréchet-
Hoeffding bounds are nothing but the static super-replication15 prices of the
bivariate digital option. Indeed, selling for instance the bivariate digital option
for the price P = min{P1 , P2 }, the trader can buy the least expensive of the
two digital options, say O1 if P1 ≤ P2 . Then, at maturity, she can pay one
monetary unit to the buyer of the binary digital option with certainty since
the binary option generates a cash-flow of one monetary unit if and only if
the world is in the state HH for which O1 also generates a cash-flow of one
monetary unit.
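The state-price argument (3.124)–(3.126) can be checked mechanically (Python/NumPy sketch; the prices used in the example are illustrative):

```python
import numpy as np

# Payoff matrix of (3.124): rows = (bivariate digital, risk-free bond,
# digital O1, digital O2), columns = states (HH, HL, LH, LL).
PI = np.array([[1.0, 0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0, 1.0],
               [1.0, 1.0, 0.0, 0.0],
               [1.0, 0.0, 1.0, 0.0]])

def state_prices(P, P1, P2):
    """Solve p = Pi . p_tilde for the state-price vector; absence of
    arbitrage requires every component of p_tilde to be positive, which
    reproduces the strict bounds (3.127)."""
    return np.linalg.solve(PI, np.array([P, 1.0, P1, P2]))
```

With P1 = 0.5 and P2 = 0.4, any bivariate digital price strictly between 0 and 0.4 yields a strictly positive state-price vector, in agreement with (3.127).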
It is straightforward to extend the previous calculations to the case of
multivariate digital options written on a larger basket of underlying assets.
The restriction to bivariate digital options presented here is only for notational
convenience.
More generally, let us consider an option written on a basket of N under-
lying assets S1 , . . . , SN . Let the pay-off of such an option be

G [ψ (S1 (T ), . . . , SN (T ))] , (3.128)

where T still denotes the maturity. G is typically the univariate pay-off char-
acterizing the contract. For instance, for a European call with strike K, we
have:
G(x) = [x − K]^+ .   (3.129)

The function ψ describes how the N underlying assets Si determine the ter-
minal cash-flow. For instance, one can consider an option on the minimum of
the N assets

ψ (S1 (T ), . . . , SN (T )) = min{S1 (T ), . . . , SN (T )} , (3.130)

or on a weighed sum (a portfolio) of these assets


ψ (S1 (T ), . . . , SN (T )) = Σ_{i=1}^{N} wi · Si (T ) .   (3.131)

The fair price of such a contract is, as usual, given by

P = EQ [G (ψ (S1 (T ), . . . , SN (T )))] . (3.132)

Using Theorem 3.4.1 and (3.78), we can assert that

EQ [G (Ssup )] ≤ P ≤ EQ [G (Sinf )] , (3.133)

where Sinf and Ssup are two random variables with distribution functions
Finf and Fsup respectively (see Theorem. 3.4.1).
15
To super-replicate means to hedge with certainty.
As an example, let us consider a rainbow call16 on the minimum of the


N assets S1 , . . . , SN , with strike K and maturity T [466]. For simplicity, we
assume a zero interest rate. The value of such a contract is:

P = EQ [ (min{S1 (T ), . . . , SN (T )} − K)^+ ] .   (3.134)

Denoting ψ = min{S1 (T ), . . . , SN (T )}, we have


PrQ [ψ ≤ x] = PrQ [min{S1 (T ), . . . , SN (T )} ≤ x] ,   (3.135)
            = 1 − PrQ [min{S1 (T ), . . . , SN (T )} > x] ,   (3.136)
            = 1 − PrQ [S1 (T ) > x, . . . , SN (T ) > x] ,   (3.137)
            = 1 − C^Q (P1 (x), . . . , PN (x)) ,   (3.138)

where Pi (x) = PrQ [Si (T ) > x] is the price of a digital option written on the
underlying asset Si , which pays one monetary unit if Si (T ) is larger than x.
This immediately yields

1 − min{P1 (x), . . . , PN (x)} ≤ PrQ [ψ ≤ x] (3.139)

and

PrQ [ψ ≤ x] ≤ 1 − max{P1 (x) + · · · + PN (x) − (N − 1), 0} . (3.140)

Thus, defining Sinf and Ssup as two random variables such that:
PrQ [Sinf ≤ x] = 1 − min{P1 (x), . . . , PN (x)}, (3.141)
Q
Pr [Ssup ≤ x] = 1 − max{P1 (x) + · · · + PN (x) − (N − 1), 0} (3.142)
it follows from (3.133) that

EQ [ (Ssup − K)^+ ] ≤ P ≤ EQ [ (Sinf − K)^+ ] .   (3.143)

The quantitative values of these two bounds are obtained after calibration
and numerical integration.
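Such an integration only requires the digital price curves Pi(x) together with the identity E_Q[(S − K)^+] = ∫_K^∞ Pr_Q[S > x] dx applied to (3.141)–(3.142); a Python sketch (NumPy assumed; the curves in the example are illustrative) is:

```python
import numpy as np

def rainbow_min_call_bounds(digital_prices, K, x_grid):
    """Bounds (3.143) on a call on the minimum of N assets, from sampled
    digital prices P_i(x) = Pr_Q[S_i(T) > x], given as an array of shape
    (N, len(x_grid)); x_grid must be a NumPy array."""
    P = np.asarray(digital_prices, dtype=float)
    surv_sup = np.maximum(P.sum(axis=0) - (P.shape[0] - 1), 0.0)  # Pr[Ssup > x]
    surv_inf = P.min(axis=0)                                      # Pr[Sinf > x]
    mask = x_grid >= K

    def trapezoid(y, x):
        # plain trapezoidal rule, to stay independent of NumPy's version
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

    lower = trapezoid(surv_sup[mask], x_grid[mask])
    upper = trapezoid(surv_inf[mask], x_grid[mask])
    return lower, upper
```

For two assets uniformly distributed on [0, 2] and K = 0, the bounds evaluate to 1/2 and 1 respectively, which can be recovered analytically from the two Fréchet-Hoeffding survival functions.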
To obtain more accurate information on the price of options defined on
a basket of assets, it is necessary to specify the nature of the risk-neutral
copula. The problem comes from the fact that there exists no general rela-
tion between the historical copula C P and the risk-neutral C Q . However, in
some special cases, one can obtain this relation. For instance, in the multivari-
ate Black-Scholes model, both the historical and the risk-neutral copulas are
Gaussian copulas, with the same correlation matrix. This result generalizes to
the case where asset prices follow diffusion processes with deterministic drifts
and volatilities [112].
In the more realistic case where one considers a stochastic volatility model
(under P) like
16
Rainbow options get their name from the fact that their underlying is two or
more assets rather than one.
dSi (t)/Si (t) = µi (t, σi (t)) dt + σi (t) dBi (t) ,   i = 1, . . . , N ,   (3.144)
dσi (t) = ai (t, σi (t)) dt + bi (t, σi (t)) dWi (t) ,   (3.145)
for instance, where Bi (t) and Wi (t) denote standard Wiener processes and
where ai (·, ·) and bi (·, ·) are chosen such that the σi (t)’s remain positive almost
surely, one cannot express C P and C Q explicitly. In addition, since individual volatilities are non-traded assets, the market is incomplete, and the choice
of a risk-neutral measure Q – which amounts to choosing the market prices of
volatility risks λi – is not unique. One has to set additional constraints in order
to select an appropriate Q. Many methods have been developed for univariate
stochastic volatility models, which can be extended to the multivariate case.
Let us mention the minimal martingale measure [176, 434, 435], the minimal
entropy measure [192, 403] or the variance-optimal measure [54, 177, 226, 292,
384], for instance. All these examples are, in fact, particular cases of q-optimal measures (for q = 0, 1 and 2, respectively) [125, 234], i.e. measures which are
the closest to the objective (or historical) measure P in the sense of the qth
moment of their relative density. Such measures minimize the functional
Hq (P, Q) = E[ (dQ/dP)^q / (q(q − 1)) ]  if Q ≪ P , and +∞ otherwise ,   (3.146)

for q ∈ R \ {0, 1} and


Hq (P, Q) = E[ (−1)^{q+1} (dQ/dP)^q · ln(dQ/dP) ]  if Q ≪ P , and +∞ otherwise ,   (3.147)

for q ∈ {0, 1}. The symbol “≪” means absolutely continuous, i.e., the sets of zero measure for P are also sets of zero measure for Q.
Such measures have the additional advantage of allowing an interpretation
in terms of utility maximizing agents. Indeed, asset prices obtained under q-
optimal measures represent the marginal utility indifferent prices for investors
with HARA17 utility functions [230].
Using the risk-neutral probability measure Q which amounts to taking a
vanishing market price of the volatility risk, and if in addition the rates of
return µi (t, σi (t)) do not depend on σi , then it can be shown that C P =
C Q (see Appendix 3.C). In such a case, the calibration of the copula under
historical data provides the risk-neutral copula.
Unfortunately, when these conditions are not met, or when one considers
more general diffusion models of the form
dSi (t) = µi (t, Si (t)) dt + σi (t, Si (t)) dWi (t), i = 1, . . . , N , (3.148)
it is in general impossible to obtain a relation between C P and C Q . In this
case, the risk-neutral copula can only be and has to be determined directly
17
Hyperbolic absolute risk aversion.
from options prices. In practice, when one deals with contracts which are not
actively traded, or contracts negotiated OTC,18 data may be rare, leading to
serious restrictions for the calibration of the risk-neutral copula and showing
the limit of the approach.

3.6.4 Basic Modeling of Dependent Default Risks

Default risk models are basically of two kinds. The first class contains models
which are close to many actuarial models. They rely on the assumption that,
conditional on a set of economic factors, the individual default probabilities
of each obligator are independent. Such models are known as mixture models
[248]. They include frailty models, presented page 113, as well as professional
models like CreditRisk+ [114]. It is in general difficult to obtain an analytical
expression of their dependence structure.
The second class of default risk models are based on Merton’s seminal work
on firm value [358]. In particular, industry standards like Moody’s KMV [273]
and RiskMetrics [406] are extensions of this original model. They consider
that the default of an obligator occurs when a latent variable, which usually
represents the firm’s asset value, goes below some level usually representing
the value of the firm’s liabilities. In the more recent model by Li [303], the
latent variables account for the time-to-default of an obligator and the crossing
level represents the time horizon of interest. These approaches are equivalent
since, once a dynamics is specified for the assets, one can derive, in principle,
the law of the time-to-default.
These models assume the same dependence structure for the latent vari-
ables, characterized by a Gaussian copula. Hence, the joint probability of
default is closely related to the Gaussian copula. Indeed, let us consider N
obligators and let Di be the default indicator of obligator i. Di equals one if
obligator i has defaulted and zero otherwise. Let (X1 , . . . , XN ) denote the vec-
tor of latent variables and (T1 , . . . , TN ) the vector of thresholds below which
default occurs:

Di = 1 ⇐⇒ Xi ≤ Ti . (3.149)

The joint probability that obligators i1 , . . . , ik (k ≤ N ) default is


Pr [Di1 = 1, Di2 = 1, . . . , Dik = 1] = Pr [Xi1 ≤ Ti1 , . . . Xik ≤ Tik ] ,
= C (Pr [Xi1 ≤ Ti1 ] , . . . , Pr [Xik ≤ Tik ]) ,
= C (πi1 , . . . , πik ) , (3.150)
where C denotes the (Gaussian) copula of the latent variables Xi1 , . . . , Xik
and πi1 , . . . , πik are the individual default probabilities of obligators i1 , . . . , ik .
18
Over-the-counter: a market for securities made up of dealers who may or may
not be members of a formal securities exchange. The over-the-counter market is
conducted over the telephone and is a negotiated market rather than an auction
market such as the NYSE.
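For two obligators, (3.150) with a Gaussian copula reduces to a bivariate normal probability, which can be evaluated with SciPy (a sketch; the default probabilities and correlation below are illustrative):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def joint_default_prob(pi1, pi2, rho):
    """Joint default probability under a Gaussian copula with correlation
    rho between the latent variables, as in (3.150):
    Pr[D1 = 1, D2 = 1] = Phi_2(Phi^{-1}(pi1), Phi^{-1}(pi2); rho)."""
    point = [norm.ppf(pi1), norm.ppf(pi2)]
    cov = [[1.0, rho], [rho, 1.0]]
    return float(multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(point))
```

With rho = 0 the joint probability factorizes into the product of the individual default probabilities, while a positive correlation raises it, illustrating how the dependence structure, rather than the margins alone, drives joint default risk.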
In the KMV methodology, the variables {Xi } model the return processes
of the assets. They are assumed multivariate Gaussian, and their correlations
are set by a factor model representing the various underlying macroeconomic
variables impacting the dynamics of the asset returns. Each threshold Ti is
determined by an option technique applied to the historical data of the ith
firm.
CreditMetrics’ approach is also based upon the assumption that the Xi ’s
are multivariate Gaussian random variables. However, they do not represent
the evolution of the asset value itself but the evolution of the rating of the
firm. The range of each Xi is divided into classes which represent the possible
rating classes of the firm. The classes are determined so that they agree with
historical data. This procedure allows one to fix simultaneously all the values
of the thresholds {Ti }. Again, the correlations are calibrated by assuming a
factor model.
In Li’s model, the latent variable Xi is interpreted as the time-to-default
of obligator i and the thresholds Ti ’s are all equal to T , the time horizon over
which the credit portfolio is monitored. Here, the multivariate distribution of
the Xi ’s is not Gaussian anymore (since, now, the Xi ’s are positive random
variables). The marginal distribution of each Xi is exponential with parameter
λi :

Pr [Xi ≤ xi ] = 1 − e−λi ·xi , (3.151)

while the copula remains Gaussian. Again, the correlations between the Xi ’s
can be determined from a factor model.
This recurrent use of a Gaussian factor model which is equivalent to de-
scribing the dependence between the latent variables in terms of a Gaussian
copula has been ratified by the recommendations of the BIS [42] concerning
credit risk modeling. However, there are many indications suggesting that
this Gaussian copula approach may be grossly inadequate to account for large
credit risk (see [186] for instance), since the Gaussian copula might – by con-
struction – underestimate the largest concomitant risks. We will come back
in more detail on this crucial point in the next chapter (Chap. 4) where we
will present and contrast the different available measures of dependence and
address more precisely how to assess the dependence in the tails of the distri-
bution.

Appendix

3.A Simple Proof of a Theorem on Universal Bounds for Functionals of Dependent Random Variables

Here, we provide a simple heuristic proof of Theorem 3.4.1. For simplicity, we restrict ourselves to the bivariate case: we consider a random vector (X, Y )
Fig. 3.13. The area hatched with plain lines represents the set of points (x, y) such
that ψ(x, y) ≤ t. The area hatched with dashed lines represents the set of points (x′ , y ′ ) such that x′ ≤ x∗ and y ′ ≤ y ∗ for some (x∗ , y ∗ ) satisfying ψ (x∗ , y ∗ ) = t. By
definition, the F -measure of this area is Pr [X ≤ x∗ , Y ≤ y ∗ ] = F (x∗ , y ∗ )

with joint distribution function F , and continuous margins FX and FY , respectively. In addition, we assume that the function ψ is continuous and increasing
in each argument. In such a case, provided that t belongs to the range of ψ,
the set of points (x, y) such that ψ(x, y) is less than t has a typical shape
represented by the area hatched with plain lines in Fig. 3.13.
By definition, Pr [ψ(X, Y ) ≤ t] is the F -measure of this hatched area:

Pr [ψ(X, Y ) ≤ t] = ∫∫_{ψ(x,y)≤t} dF (x, y) .   (3.A.1)

For any couple (x∗ , y ∗ ) such that ψ (x∗ , y ∗ ) = t,

Pr [ψ(X, Y ) ≤ t] ≥ Pr [X ≤ x∗ , Y ≤ y ∗ ] = F (x∗ , y ∗ ) , (3.A.2)



since F (x∗ , y ∗ ) = ∫∫_{x≤x∗ , y≤y ∗ } dF (x, y) is the F -measure of the area hatched
with dashed lines in Fig. 3.13, which is included within the area representing
the set of points {(x, y) : ψ(x, y) ≤ t}. Given any copula Cinf such that

Cinf (u, v) ≤ C(u, v), ∀(u, v) ∈ [0, 1]2 , (3.A.3)

where C denotes the copula of the random vector (X, Y ), we can write:

Pr [ψ(X, Y ) ≤ t] ≥ Cinf (FX (x∗ ) , FY (y ∗ )) , (3.A.4)

for all (x∗ , y ∗ ) such that ψ (x∗ , y ∗ ) = t, which finally allows us to assert that:

Pr [ψ(X, Y ) ≤ t] ≥ sup_{ψ(x∗ ,y ∗ )=t} Cinf (FX (x∗ ) , FY (y ∗ )) .   (3.A.5)
This equation is equivalent to (3.73) under the restrictive assumptions retained in this simple proof.
The proof of the second inequality of Theorem 3.4.1 follows the same line
of reasoning. One has just to consider Pr [ψ(X, Y ) ≥ t], which leads, mutatis
mutandis, to undervalue the survival copula of (X, Y ).

3.B Sketch of a Proof of a Large Deviation Theorem for Portfolios Made of Weibull Random Variables

Let X1 , X2 , . . . , XN be N i.i.d. random variables with density p(·). Let us denote by f (·) and g(·) two positive functions such that p(·) = g(·)e^{−f (·)} . Let w1 , w2 , . . . , wN be N real (positive) non-random coefficients, and S = Σ_{i=1}^{N} wi xi . Let χ = {x ∈ R^N : Σ_{i=1}^{N} wi xi = S}. The density of the variable S is given by
 N
PS (S) = dx e− i=1 [f (xi )−ln g(xi )] . (3.B.6)
χ

We will assume the following conditions on the function f :
1. f (·) is three times continuously differentiable and four times differentiable,
2. f″(x) > 0, for |x| large enough,
3. lim_{x→±∞} f^{(3)}(x) / f″(x)² = 0,
4. f^{(3)} is asymptotically monotonic,
5. there is a constant β > 1 such that f^{(3)}(β·x)/f^{(3)}(x) remains bounded as x goes to infinity,
6. there exist C1, C2 > 0 and some ν > 0 such that C1 · x^ν ≤ g(x) ≤ C2 · x^ν, as x goes to infinity.
Under the assumptions stated above, the leading order expansion of PS (S) for large S and finite N > 1 is obtained by a generalization of Laplace’s method which assumes that the set of x∗i ’s that maximize the integrand in (3.B.6) are a solution of

f′(x∗i) = σ(S) wi , (3.B.7)

where σ(S) is nothing but a Lagrange multiplier introduced to minimize the expression Σ_{i=1}^N f (xi) under the constraint Σ_{i=1}^N wi xi = S. This constraint shows that at least one xi, for instance x1, goes to infinity as S → ∞. Since f′(x1) is an increasing function by Assumption 2, which goes to infinity as x1 → +∞ (Assumption 3), expression (3.B.7) shows that σ(S) goes to infinity with S, as long as the weight of the asset 1 is not zero. Putting the divergence of σ(S) with S in expression (3.B.7) for i = 2, . . . , N ensures that each x∗i increases when S increases and goes to infinity when S goes to infinity.

Expanding f (xi) around x∗i yields

f (xi) = f (x∗i) + f′(x∗i) · hi + ∫_{x∗i}^{x∗i+hi} dt ∫_{x∗i}^{t} du f″(u) , (3.B.8)

where the set of hi = xi − x∗i obey the condition

Σ_{i=1}^N wi hi = 0 . (3.B.9)

Summing (3.B.8) over i in the presence of relation (3.B.9), we obtain

Σ_{i=1}^N f (xi) = Σ_{i=1}^N f (x∗i) + Σ_{i=1}^N ∫_{x∗i}^{x∗i+hi} dti ∫_{x∗i}^{ti} dui f″(ui) . (3.B.10)


Thus exp(−Σ f (xi)) can be rewritten as follows:

exp( −Σ_{i=1}^N f (xi) ) = exp( −Σ_{i=1}^N f (x∗i) − Σ_{i=1}^N ∫_{x∗i}^{x∗i+hi} dti ∫_{x∗i}^{ti} dui f″(ui) ) . (3.B.11)
Let us now define the compact set AC = {h ∈ R^N : Σ_{i=1}^N f″(x∗i)² · hi² ≤ C²} for any given positive constant C and the set H = {h ∈ R^N : Σ_{i=1}^N wi hi = 0}. We can thus write

PS (S) = ∫_H dh e^{−Σ_{i=1}^N [f (xi) − ln g(xi)]} , (3.B.12)
       = ∫_{AC∩H} dh e^{−Σ_{i=1}^N [f (xi) − ln g(xi)]} + ∫_{ĀC∩H} dh e^{−Σ_{i=1}^N [f (xi) − ln g(xi)]} , (3.B.13)

where ĀC denotes the complement of AC in R^N.

Let us analyze in turn the two integrals of the right-hand side of (3.B.13). Concerning the first integral, it can be shown that

lim_{S→∞} [ ∫_{AC∩H} dh e^{−Σ_{i=1}^N [ ∫_{x∗i}^{x∗i+hi} dt ∫_{x∗i}^{t} du f″(u) − ln g(x∗i+hi) ]} ] / [ (2π)^{(N−1)/2} Π_i g(x∗i) / √( Π_{i=1}^N f″(x∗i) · Σ_{j=1}^N wj² / f″(x∗j) ) ] = 1 , for some positive C. (3.B.14)

The cumbersome proof of this assertion is found in [336]. It is based upon the fact that

1. by Assumptions 1, 3, 4 and 5, for all h ∈ AC and all εi > 0,

   sup_{ξ∈Gi} |f^{(3)}(ξ)| / f″(x∗i)² ≤ εi , for x∗i large enough, (3.B.15)

   where Gi = [ x∗i − C/f″(x∗i) , x∗i + C/f″(x∗i) ],
2. for all εi > 0 and x∗i large enough:

   ∀h ∈ AC , (1 − εi)^ν ≤ g(x∗i + hi) / g(x∗i) ≤ (1 + εi)^ν , (3.B.16)

   by Assumptions 1 and 6.
Now, for the second integral on the right-hand side of (3.B.13), we have to show that

∫_{ĀC∩H} dh e^{−Σ_i [ f (x∗i+hi) − ln g(x∗i+hi) ]} (3.B.17)

can be neglected. This is obvious since, by Assumptions 2 and 6, the function f (x) − ln g(x) remains convex for x large enough, which ensures that f (x) − ln g(x) ≥ C1 |x| for some positive constant C1 and x large enough. Thus, choosing the constant C in AC large enough, we have

∫_{ĀC∩H} dh e^{−Σ_{i=1}^N [ f (xi) − ln g(xi) ]} ≤ ∫_{ĀC∩H} dh e^{−C1 Σ_{i=1}^N |x∗i + hi|} ∼ O( e^{−α f′(x∗)} ) (3.B.18)

for some positive α. Thus, for S large enough, the density PS (S) is asymptotically equal to

PS (S) = [ Π_i g(x∗i) ] · (2π)^{(N−1)/2} / √( Π_{i=1}^N f″(x∗i) · Σ_{j=1}^N wj² / f″(x∗j) ) · exp( −Σ_{i=1}^N f (x∗i) ) . (3.B.19)

In the case of the modified Weibull variables, we have

f (x) = ( |x| / χ )^c , (3.B.20)

and

g(x) = c / (2 √π χ^{c/2}) · |x|^{c/2 − 1} , (3.B.21)
which satisfies our assumptions if and only if c > 1. In such a case, we obtain

x∗i = wi^{1/(c−1)} / Σ_{j=1}^N wj^{c/(c−1)} · S , (3.B.22)

which, after some simple algebraic manipulations, yields

P (S) ∼ ( c / (2(c − 1)) )^{(N−1)/2} · c / (2 √π χ̂^{c/2}) · |S|^{c/2 − 1} e^{−(|S|/χ̂)^c} (3.B.23)

with

χ̂ = ( Σ_{i=1}^N wi^{c/(c−1)} )^{(c−1)/c} · χ . (3.B.24)

Let us now consider N independent random variables X1, X2, . . . , XN with modified-Weibull pdfs with the same exponent c > 1 but different scale factors χi. Let w1, w2, . . . , wN be N non-random real coefficients. Then, the variable

SN = w1 X1 + w2 X2 + · · · + wN XN (3.B.25)

follows asymptotically a modified-Weibull with scale factor

χ̂ = ( Σ_{i=1}^N |wi χi|^{c/(c−1)} )^{(c−1)/c} , c > 1 . (3.B.26)

Indeed, let Y1, Y2, . . . , YN be N independent and identically distributed random variables with modified-Weibull pdfs with same exponent c > 1 and scale factor χ = 1. Then,

(X1, X2, . . . , XN) = (χ1 Y1, χ2 Y2, . . . , χN YN) in law, (3.B.27)

which yields

SN = w1 χ1 · Y1 + w2 χ2 · Y2 + · · · + wN χN · YN in law. (3.B.28)

Thus, (3.B.26) immediately follows from (3.B.24).
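As a sanity check on this appendix (our own illustration, with arbitrarily chosen weights and parameters), one can verify numerically that the saddle point (3.B.22) satisfies both the budget constraint Σ wi xi = S and the stationarity condition (3.B.7), and that it reproduces the scale factor (3.B.24):

```python
import numpy as np

# Saddle point (3.B.22) for f(x) = (x/chi)^c; the parameter values below are
# arbitrary illustrative choices, not taken from the text.
c, chi, S = 2.5, 1.0, 10.0
w = np.array([0.5, 0.3, 0.2])

x_star = w**(1.0/(c - 1.0)) / np.sum(w**(c/(c - 1.0))) * S   # (3.B.22)

# 1) The constraint sum_i w_i x_i = S is satisfied.
assert np.isclose(np.dot(w, x_star), S)

# 2) Stationarity (3.B.7): f'(x_i*) = sigma(S) w_i, i.e. f'(x_i*)/w_i is the
#    same Lagrange multiplier sigma(S) for every i.
f_prime = c * x_star**(c - 1.0) / chi**c
ratios = f_prime / w
assert np.allclose(ratios, ratios[0])

# 3) sum_i f(x_i*) equals (S/chi_hat)^c with chi_hat given by (3.B.24).
chi_hat = np.sum(w**(c/(c - 1.0)))**((c - 1.0)/c) * chi
assert np.isclose(np.sum((x_star/chi)**c), (S/chi_hat)**c)
```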

3.C Relation Between the Objective
and the Risk-Neutral Copula

Assuming that we have a filtered probability space (Ω, F, (Ft)0≤t≤T , P) – P denotes the objective or historical probability measure – generated by a 2N-dimensional Brownian motion (B1, W1, . . . , BN, WN) with (constant) correlation matrix ρ, let us consider the N-dimensional stochastic volatility model:

dSi(t)/Si(t) = µi (t, σi(t)) dt + σi(t) dBi(t), i = 1, . . . , N (3.C.29)
dσi(t) = ai (t, σi(t)) dt + bi (t, σi(t)) dWi(t), (3.C.30)

where Si is the price of asset i, while ai (·, ·) and bi (·, ·) are chosen so that the volatility σi(t) of each asset remains positive almost surely. As an example, one can choose

ai (t, σi) = κi ( mi/σi − σi ) , and bi (t, σi) = βi . (3.C.31)

This stochastic volatility model is equivalent to the Heston model [232] written for the squared volatility instead of the volatility itself. In the present case, the condition κi · mi ≥ βi², together with κi, mi > 0, ensures the positivity of σi(t), provided that σi(0) > 0.
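To make the dynamics concrete, here is a minimal Euler-Maruyama simulation of one asset under (3.C.29)-(3.C.31). This is only a sketch under our own assumptions: all numerical values (drift, κ, m, β, time step) are arbitrary illustrative choices, the two Brownian motions are taken independent for simplicity (the model allows correlation ρ), and since the continuous-time positivity guarantee κm ≥ β² does not strictly survive discretization, the volatility is clipped at a small floor.

```python
import numpy as np

def simulate_heston_like(s0=100.0, sigma0=0.2, mu=0.05, kappa=2.0,
                         m=0.2, beta=0.1, dt=1.0/252, n_steps=252, seed=0):
    """Crude Euler-Maruyama discretization of (3.C.29)-(3.C.31), one asset:
       dS/S   = mu dt + sigma dB
       dsigma = kappa (m/sigma - sigma) dt + beta dW
    """
    rng = np.random.default_rng(seed)
    s, sigma = np.empty(n_steps + 1), np.empty(n_steps + 1)
    s[0], sigma[0] = s0, sigma0
    for t in range(n_steps):
        dB, dW = rng.normal(0.0, np.sqrt(dt), size=2)   # independent here
        sigma[t+1] = sigma[t] + kappa*(m/sigma[t] - sigma[t])*dt + beta*dW
        sigma[t+1] = max(sigma[t+1], 1e-6)   # discretization safeguard
        s[t+1] = s[t] * (1.0 + mu*dt + sigma[t]*dB)
    return s, sigma

s, sigma = simulate_heston_like()
assert np.all(s > 0) and np.all(sigma > 0)
```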
The solution of (3.C.29) with Si(0) = Si0 is:

Si(t) = Si0 exp( ∫_0^t [ µi (s, σi(s)) − ½ σi(s)² ] ds + ∫_0^t σi(s) dBi(s) ) , (3.C.32)

where σi(t) is solution of (3.C.30). Denoting by Zi(t) the random variable

∫_0^t [ µi (s, σi(s)) − ½ σi(s)² ] ds + ∫_0^t σi(s) dBi(s) , (3.C.33)

we can assert that the copula C P of (S1 (t), . . . , SN (t)) is the same as the
copula of (Z1 (t), . . . , ZN (t)), since each Si (t) = Si0 · exp [Zi (t)] is an increasing
transform of the corresponding Zi (t).
Assuming that the usual conditions are satisfied, Girsanov Theorem19 allows us to assert that there exists a probability measure Q, equivalent to P on FT , such that

dQ/dP = exp( − Σ_{i=1}^N [ ∫_0^t (µi (s, σi(s)) / σi(s)) dBi(s) + ½ ∫_0^t (µi (s, σi(s)) / σi(s))² ds ]
             − Σ_{i=1}^N [ ∫_0^t λi (s, σi(s)) dWi(s) + ½ ∫_0^t λi (s, σi(s))² ds ] ) , (3.C.34)

for any suitable processes (λ1, . . . , λN), and that

B̃i(t) = Bi(t) + ∫_0^t (µi (s, σi(s)) / σi(s)) ds, i = 1, . . . , N (3.C.35)
W̃i(t) = Wi(t) + ∫_0^t λi (s, σi(s)) ds, i = 1, . . . , N (3.C.36)
19 In the theory of probability, the Girsanov Theorem specifies how stochastic processes change under changes in measure. The theorem is especially important in the theory of asset pricing as it allows one to convert the physical measure which describes the probability that an underlying (such as a share price or interest rate) will take a particular value into the risk-neutral measure used for evaluating the derivatives on the underlying.

are Brownian motions under Q, with correlation matrix ρ. Since the volatility is a non-traded asset, the problem of market incompleteness arises, so that there is not a unique risk-neutral measure under which discounted asset prices are martingales.
For simplicity, let us assume that the risk-free interest rate is vanishing so that asset prices are directly discounted prices. Under any Q, using (3.C.35) and (3.C.36), (3.C.29–3.C.30) can be written

dSi(t)/Si(t) = σi(t) dB̃i(t), i = 1, . . . , N (3.C.37)
dσi(t) = [ ai (t, σi(t)) − λi (t, σi(t)) · bi (t, σi(t)) ] dt + bi (t, σi(t)) dW̃i(t), (3.C.38)

which shows that Si(t) is a Q-martingale. The solution of (3.C.37) with Si(0) = Si0, under Q, is:

Si(t) = Si0 exp( −½ ∫_0^t σi(s)² ds + ∫_0^t σi(s) dB̃i(s) ) , (3.C.39)

where σi(t) is now the solution of (3.C.38). Denoting by Z̃i(t) the random variable

−½ ∫_0^t σi(s)² ds + ∫_0^t σi(s) dB̃i(s) , (3.C.40)

we can assert that the copula C^Q of (S1(t), . . . , SN(t)) is the same as the copula of (Z̃1(t), . . . , Z̃N(t)).
Therefore, C^P = C^Q if and only if the copula of (Z̃1(t), . . . , Z̃N(t)) is the same as the copula of (Z1(t), . . . , ZN(t)). In the general case, the Zi(t)’s and Z̃i(t)’s are not simple increasing transforms of each other. Therefore, their copulas are not identical and C^P ≠ C^Q. But in the particular case where the rates µi are deterministic functions – i.e., independent of σi(t) – the copula C^P is nothing but the copula of the random variables:

Z∗i(t) = −½ ∫_0^t σi(s)² ds + ∫_0^t σi(s) dBi(s), i = 1, . . . , N , (3.C.41)

where σi(t) is the solution of (3.C.30), since the maps

x −→ Si0 e^x (3.C.42)

are monotonous increasing functions of their argument. If the market prices λi’s of volatility risks are vanishing, the vectors (Z∗1(t), . . . , Z∗N(t)) and (Z̃1(t), . . . , Z̃N(t)) are equal in law, since (3.C.30) and (3.C.38) are then the same. Thus, in this case, (Z∗1(t), . . . , Z∗N(t)) and (Z̃1(t), . . . , Z̃N(t)) have the same copula, and therefore C^P = C^Q.
4 Measures of Dependences

In the previous chapter, we have shown how to describe with copulas the
general dependence structure of several random variables, with the goal of
modeling baskets of asset returns, or more generally, any multivariate financial
risk. However, the general framework provided by copulas does not exclude
more specific measures of dependences that can be useful to target particular
ranges of variations of the random variables.
This chapter presents and describes in detail the most important dependence measures. Starting with the description of the basic concept of linear dependence, through linear correlation and canonical N-correlation coefficients, we then focus on concordance measures and on more interesting families of dependence measures. We then turn to measures of extreme dependence. In each case, we underline their relationship with copulas.

4.1 Linear Correlations

4.1.1 Correlation Between Two Random Variables

The linear correlation is probably still the most widespread measure of dependence, both in finance and insurance. Given two random variables X and Y, the linear correlation coefficient is defined as:

ρ(X, Y ) = Cov [X, Y ] / √( Var [X] · Var [Y ] ) , (4.1)

provided that the variances Var [X] and Var [Y ] exist. Cov [X, Y ] is the covariance of X and Y. The coefficient ρ(X, Y ) is called a linear correlation coefficient because its knowledge is equivalent to that of the coefficient β of the linear regression Y = βX + ε, where ε is the residual which is linearly uncorrelated with X. We have indeed ρ = β √( Var[X] / Var[Y ] ).
Fig. 4.1. Graph of the variable V = sin ω versus U = cos ω for ω ∈ [0, 2π] (left panel) and graph of the variable V = (U/θ) · 1_{U∈[0,θ]} + ((1 − U)/(1 − θ)) · 1_{U∈[θ,1]} (right panel)

Regularly varying random variables (power-like random variables) with a tail index less than two do not have finite variances; they thus do not admit a correlation coefficient. In addition, when the tail index belongs to the interval (2, 4], the correlation coefficient exists but its Pearson estimator, based on a sample {(Xi, Yi)}_{i=1}^T of size T :

ρ̂T = [ (1/T) Σ_{i=1}^T (Xi − X̄) · (Yi − Ȳ) ] / √( (1/T) Σ_{i=1}^T (Xi − X̄)² · (1/T) Σ_{i=1}^T (Yi − Ȳ)² ) , (4.2)

where X̄ and Ȳ denote the sample means of X and Y respectively, performs rather poorly, insofar as its asymptotic distribution is not Gaussian but Lévy stable [356]. Therefore, a sample correlation coefficient may exhibit large deviations from its true value, providing very inaccurate estimates. This is particularly problematic for financial purposes since, as recalled in Chap. 2, the existence of the fourth moment for the distribution of stock returns is still a topic of active debate.
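The estimator (4.2) is straightforward to code. The sketch below (our illustration) implements it and checks it against NumPy's built-in estimate on synthetic Gaussian data with a known correlation of 0.6:

```python
import numpy as np

def pearson_rho(x, y):
    """Sample correlation coefficient, estimator (4.2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.mean(dx * dy) / np.sqrt(np.mean(dx**2) * np.mean(dy**2)))

rng = np.random.default_rng(42)
z = rng.standard_normal((2, 10_000))
x = z[0]
y = 0.6 * z[0] + np.sqrt(1.0 - 0.6**2) * z[1]   # true correlation 0.6
assert np.isclose(pearson_rho(x, y), np.corrcoef(x, y)[0, 1])
```

On heavy-tailed data (tail index in (2, 4]) the same estimator would fluctuate wildly from sample to sample, as discussed above.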
Considering two independent random variables, it is well known that their
correlation coefficient equals zero. However, the converse does not hold. In-
deed, given a random variable ω uniformly distributed in [0, 2π], let us define
the couple of random variables:

(U, V ) = (cos ω, sin ω) . (4.3)



It is easy to check that ρ(U, V ) = 0, even though the two random variables
are not independent, as shown in the left panel of Fig. 4.1 which plots the
variable V as a function of U .
More striking is the case where the knowledge of one of the variables completely determines the other one. As an example, consider a random variable U, uniformly distributed on [0, 1] and the random variable V defined by:

V = U/θ ,            U ∈ [0, θ] ,
V = (1 − U)/(1 − θ) , U ∈ [θ, 1] ,     (4.4)

for some θ ∈ [0, 1] (see right panel of Fig. 4.1). One can easily show that V is also uniformly distributed on [0, 1] and that

ρ(U, V ) = 2θ − 1 , (4.5)

so that U and V are uncorrelated for θ = 1/2 while V remains perfectly predictable from U.
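Both counterexamples are easy to reproduce by Monte Carlo simulation; the following sketch (ours) checks that the sample correlation essentially vanishes for (cos ω, sin ω) and matches (4.5) for the tent-map construction (4.4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Example (4.3): (U, V) = (cos w, sin w) is dependent but uncorrelated.
w = rng.uniform(0.0, 2.0*np.pi, n)
r1 = np.corrcoef(np.cos(w), np.sin(w))[0, 1]
assert abs(r1) < 0.02

# Example (4.4): tent map with theta = 0.5; V is a deterministic function
# of U, yet rho(U, V) = 2*theta - 1 = 0.
theta = 0.5
u = rng.uniform(0.0, 1.0, n)
v = np.where(u <= theta, u/theta, (1.0 - u)/(1.0 - theta))
r2 = np.corrcoef(u, v)[0, 1]
assert abs(r2 - (2*theta - 1)) < 0.02

# theta = 0.8 gives rho = 0.6 although V is still fully determined by U.
v8 = np.where(u <= 0.8, u/0.8, (1.0 - u)/0.2)
assert abs(np.corrcoef(u, v8)[0, 1] - 0.6) < 0.02
```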
When two random variables, X and Y, are linearly dependent:

Y = α + β · X , (4.6)

the correlation coefficient ρ(X, Y ) equals ±1, depending on whether β is positive or negative (in the previous example, this corresponds to θ = 1 or 0, respectively). Here, the converse holds. This derives from the representation:

ρ(X, Y )² = 1 − min_{α,β} E[ (Y − (α + β · X))² ] / Var [Y ] , (4.7)

where E [·] denotes the expectation with respect to the joint distribution of X and Y. ρ(X, Y )² is called the coefficient of determination and gives the proportion of the variance of one variable (Y ) that is predictable from the other variable (X).
By the Cauchy-Schwarz inequality, (4.1) allows one to show that ρ ∈ [−1, 1]. But, given two random variables X and Y with fixed marginal distribution functions FX and FY, it is not always possible for the correlation coefficient to reach the bounds ±1. Indeed, Chap. 3 has shown that any bivariate distribution function F is bracketed by the Fréchet-Hoeffding bounds:

max {FX (x) + FY (y) − 1, 0} ≤ F (x, y) ≤ min {FX (x), FY (y)} . (4.8)

Therefore, applying Hoeffding's identity [130]

ρ(X, Y ) = 1/√( Var [X] · Var [Y ] ) ∫∫ [F (x, y) − FX (x) · FY (y)] dx dy , (4.9)

one can now conclude that, given FX and FY, the correlation coefficient ρ lies between ρmin and ρmax, where ρmin is attained when X and Y are countermonotonic random variables while ρmax is attained when X and Y are comonotonic random variables.

As an illustration, let us consider the following example from Embrechts et al. [149]. Given two random variables with log-normal marginal distributions: X ∼ log N (0, 1) and Y ∼ log N (0, σ), the upper and lower bounds for ρ(X, Y ) are given by

ρmin = ρ( e^Z , e^{−σZ} )  and  ρmax = ρ( e^Z , e^{σZ} ) , (4.10)

where Z is a standard Gaussian random variable. A straightforward calculation, based upon the fact that

E[ e^{α·Z} ] = e^{α²/2} , (4.11)

gives

ρmin = (e^{−σ} − 1) / √( (e − 1)(e^{σ²} − 1) )  and  ρmax = (e^{σ} − 1) / √( (e − 1)(e^{σ²} − 1) ) . (4.12)

Figure 4.2 represents these two bounds as a function of σ. As σ becomes


of the order of or larger than 3, ρmin becomes extremely close to zero, so
that in this case an (almost) vanishing correlation coefficient corresponds to
a countermonotonic relation between the two random variables. For σ larger
than 4, both the lower and the upper bounds can hardly be distinguished
from zero. Thus, a very small value of the correlation coefficient cannot (must
not) be always considered as the signature of a weak dependence between two
random variables.
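Formula (4.12) is easily evaluated; the sketch below (our illustration) confirms the behavior just described, with ρmin already indistinguishable from zero for σ = 3 although the variables are countermonotonic:

```python
import math

def rho_bounds(sigma):
    """Attainable correlation bounds (4.12) for log N(0,1) versus log N(0,sigma)."""
    denom = math.sqrt((math.e - 1.0) * (math.exp(sigma**2) - 1.0))
    rho_min = (math.exp(-sigma) - 1.0) / denom
    rho_max = (math.exp(sigma) - 1.0) / denom
    return rho_min, rho_max

lo, hi = rho_bounds(3.0)
assert -0.01 < lo < 0.0     # countermonotonic, yet almost zero correlation
assert 0.0 < hi < 0.2       # comonotonic, yet far from +1
# For sigma = 1 the marginals are identical and rho_max = 1 is attainable.
assert abs(rho_bounds(1.0)[1] - 1.0) < 1e-9
```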
Fig. 4.2. Graph of ρmin and ρmax given by (4.12) versus σ for two random variables with log-normal marginal distributions: log N (0, 1) and log N (0, σ)

The correlation coefficient is invariant under an increasing affine change of variable of the form

X′ = a · X + b ,  a > 0 , (4.13)
Y′ = c · Y + d ,  c > 0 , (4.14)

since ρ (X′, Y′) = ρ (X, Y ). However, this property does not generalize to any (nonlinear) increasing transformation. As a consequence, the correlation coefficient does not give access to the dependence between two random variables in the sense of Chap. 3. This lack of invariance with respect to nonlinear changes of variables is due to the fact that the correlation coefficient aggregates information on both the marginal behavior of each random variable and on their true dependence structure given by the copula.

4.1.2 Local Correlation

Instead of focusing on the overall correlation, one can look at the local linear
dependence between two random variables. This idea, introduced by Doksum
et al. [58, 134], enables one to probe the changes of the correlation strength as
a function of the value of the realizations of the random variables. It allows one, for instance, to address the question of whether the correlation remains constant or varies when the realizations of the random variables are typical or not. This
is particularly useful when dealing with contagions of crises (see Chap. 6)
or when investigating whether flight-to-quality actually occurs between stock
and bond markets, for instance.
The definition of the local correlation coefficient is quite natural. It starts from the remark that, in a linear framework, if the two random variables X and Y are related by

Y = α + βX + ε , (4.15)

where ε is independent from (or at least uncorrelated with) X, the correlation coefficient reads

ρ = β · σX / √( β² · σX² + σε² ) , (4.16)

where σX² and σε² denote respectively the variance of X and of the error term ε.
Let us now assume that the more general relation

Y = f (X) + σ(X) · ε (4.17)

holds between X and Y, with σε = 1 and f differentiable. In the neighborhood of X = x0, one can linearize the relation above as follows:

Y = [ f (x0) − x0 · f′(x0) ] + f′(x0) X + σ(x0) ε (4.18)

and, by analogy with (4.16), define the local linear correlation coefficient by

ρ(x0) = f′(x0) · σX / √( f′(x0)² · σX² + σ(x0)² ) . (4.19)

It is straightforward to check that the local correlation coefficient reduces to the usual linear correlation coefficient when f is an affine mapping and σ(x) remains constant. In addition, the local correlation coefficient ρ(x) fulfills the same main properties as the linear correlation coefficient ρ:
1. ρ(x) ∈ [−1, 1],
2. ρ(x) is invariant under (increasing) linear mappings in both X and Y,
3. ρ(x) = 0 for all x if X and Y are independent.
Beside and in contrast with the linear correlation coefficient, the local correlation coefficient equals ±1 only if σ(x) is zero (the sign depends on that of the derivative of f ), so that Y = f (X). Thus, the local correlation coefficient avoids the drawback of the linear correlation coefficient that a vanishing value can be found even when X and Y are deterministically related to each other.

4.1.3 Generalized Correlations Between N > 2 Random Variables

The (overall) correlation coefficient ρ is a linear measure of dependence between two random variables. We now present a natural generalization to N random variables, whose exposition borrows from [324].
Let us denote by X (t) a random vector of N components, for instance the
vector of returns of N assets in a portfolio. The mean values of the components
of X (t) are first estimated and then subtracted from each vector X (t) for t =
1, . . . , L, where L denotes the sample size, equal for instance to the chosen
length of the time interval used for the estimations. For ease of notation, we
keep X (t) to represent the now centered vectors. The sample estimate of the
covariance matrix of these N random variables over some interval of length L
is

S_XX = (1/L) Σ_{t=1}^L X (t) · X (t)^T , (4.20)

where T denotes the transpose.


Let us now divide the N components of the vectors X (t) into two parts: a
scalar Xi (t) constituted of one of the components and an (N − 1)-dimensional
column vector ξ i (t) = [X1 (t), . . . , Xi−1 (t), Xi+1 (t), . . . , XN (t)]T made of the
other components. By multiplying (scalar product) each vector ξ i by some
still unknown vector φ, we obtain a set of scalar values ζi = φT · ξ i . Let us
now search for the vector φ which makes the square of the correlation coeffi-
cient between the two random variables Xi and ζi maximum. This procedure
constitutes an example of the implementation of the classical solution devel-
oped by Hotelling [235, 398] on canonical correlations: the vector φ is defined
as the eigenvector corresponding to the maximal eigenvalue (which is equal

to the maximal correlation coefficient between the two random variables Xi


and ζi ) of the following matrix of size (N − 1) × (N − 1):

S_{ξiξi}^{−1} S_{ξiXi} S_{XiXi}^{−1} S_{Xiξi} (4.21)

where

S_{XiXi} = Cov(Xi, Xi) ,  S_{Xiξi} = S_{ξiXi}^T = Cov(Xi, ξi^T) ,  S_{ξiξi} = Cov(ξi, ξi) . (4.22)

The matrices in formulas (4.21) and (4.22) are submatrices of the general
N × N covariance matrix S XX = Cov(X, X T ) (whose estimation is given
in (4.20)). Thus, replacing the matrix S XX (and its submatrices) in (4.21)
and (4.22) by its sample estimate (4.20) allows one to compute the vector φ
and the set of scalar values ζi for i = 1, . . . , N . One can call the maximum
eigenvalue of the matrix (4.21) the “canonical coefficient of N -correlation”
between the random variable Xi and the other N −1 variables, which captures
the common factors between Xi and all the other N − 1 variables. Performing
similar operations with all other components of the vector X, one thus obtains
a N -dimensional vector of canonical coefficients of N -correlation equal to the
largest eigenvalues of the matrices (4.21) for i = 1, . . . , N . For N = 2, the (N −
1)-dimensional matrix (4.21) reduces to the square of the standard correlation
coefficient between the N = 2 variables.
A slightly different but equivalent formulation is as follows. Consider the regression of a random variable Xi on the (N − 1)-dimensional random vector ξi(t) = [X1(t), . . . , Xi−1(t), Xi+1(t), . . . , XN(t)]^T, i.e., the evaluation of a vector φ of regression coefficients in the linear formula:

Xi = Σ_{j≠i} φj Xj + εi = φ^T · ξi + εi , (4.23)

where εi is a regression residual. If the vector φ is defined by the least-squares method of minimizing

Σ_{t=1}^L ( φ^T · ξi − Xi )² (4.24)

with respect to φ, then its estimate is easily obtained as

φ̂ = S_{ξiξi}^{−1} · S_{ξiXi} . (4.25)

Let ξ̂i = φ̂^T · ξi denote the contribution to the regression (4.23) for this estimate (4.25). Since

Cov(Xi, ξ̂i) = Cov(Xi, S_{ξiξi}^{−1} · S_{ξiXi} · ξi) = S_{Xiξi} · S_{ξiξi}^{−1} · S_{ξiXi} , (4.26)

it follows that the correlation coefficient between ξ̂i given by (4.25) and Xi is equal to the scalar S_{Xiξi} · S_{ξiξi}^{−1} · S_{ξiXi} · S_{XiXi}^{−1}, which is nothing but the maximum eigenvalue of the matrix (4.21) [398]. This shows that the canonical coefficient of N-correlation:

ρ_{iN} = S_{Xiξi} · S_{ξiξi}^{−1} · S_{ξiXi} · S_{XiXi}^{−1} (4.27)

can be determined from the solution of the regression problem (4.23, 4.24). This correspondence between the two formulations is rooted in the equivalence between linear correlation and the coefficient of linear regression, as pointed out above.
Again, this canonical coefficient of N -correlation is, by construction, in-
variant under linear transformations of each Xi individually. However, it is
not left unchanged under nonlinear monotonic transformations. It is therefore
necessary to look for other measures of dependence which are only functions
of the copula. The concordance measures described below enjoy this property.

4.2 Concordance Measures

4.2.1 Kendall’s Tau

A fundamental question for financial risk management is the following: “Do the prices of two (or more) assets tend to rise or fall together?” If the answer is affirmative, the diversification of risks will probably be difficult, since diversification is based upon the fact that the fall of an asset is statistically balanced by the rise of another one. A natural way to quantify the propensity of assets to move together is to compare the probability that they rise (or fall) together with the probability that one of the two assets rises (respectively falls) while the other one falls (respectively rises). This can be translated mathematically as follows. Starting with two independent realizations (X1, Y1) and (X2, Y2) of the same pair of random variables (X, Y ), let us consider the quantity

τ = Pr [(X1 − X2) · (Y1 − Y2) > 0] − Pr [(X1 − X2) · (Y1 − Y2) < 0] . (4.28)

The left-most term in the r.h.s. (right-hand side) gives the probability of concordance, i.e., the probability that X and Y move together upward or downward. In contrast, the right-most term in the r.h.s. represents the probability of discordance, i.e., the probability that the two random variables move in opposite directions.
The expression (4.28) defines the population version of the so-called
Kendall’s τ . This quantity is invariant under increasing transformation of the

marginal distributions. Indeed, given any increasing mapping GX and GY , we


have
X1 ≥ X2 ⇐⇒ GX (X1 ) ≥ GX (X2 ) , (4.29)
Y1 ≥ Y2 ⇐⇒ GY (Y1 ) ≥ GY (Y2 ) . (4.30)
As a consequence, Kendall’s τ depends only on the copula of (X, Y ). For
continuous random variables, expression (4.28) can be transformed into

τ = 2 Pr [(X1 − X2 ) · (Y1 − Y2 ) > 0] − 1 , (4.31)

which yields the following expression in terms of a functional of the copula C of the two random variables:

τ(C) = 4 ∫∫_{[0,1]²} C(u, v) dC(u, v) − 1 . (4.32)
From this equation, one easily checks that Kendall’s τ varies between −1
and +1. The lower bound is reached if and only if the variables (X, Y ) are
countermonotonic, while the upper bound is attained if and only if (X, Y ) are
comonotonic. In addition, τ equals zero for independent random variables.
However, as for the (linear) correlation coefficient, τ may vanish even for
non-independent random variables.
In spite of its attractive structure, (4.32) is not always very useful for calculations and one often has to resort to numerical integration (by use of quadrature, for instance). However, some more tractable expressions have been found for particular families of copulas.

Archimedean Copulas

Genest and MacKay [198] have shown that, for generators ϕ which are strictly decreasing functions from [0, 1] onto [0, ∞] with ϕ(1) = 0, Kendall’s τ of the Archimedean copula

C(u, v) = ϕ^{−1} (ϕ(u) + ϕ(v)) (4.33)

is given by

τ = 1 + 4 ∫_0^1 ϕ(t)/ϕ′(t) dt . (4.34)

This expression relies on the general fact that (4.32) can be rewritten as

τ = 4 · E [C(U, V )] − 1 , (4.35)

where U and V are uniform random variables with joint distribution function
C. Now, in the particular case of an Archimedean copula, one can show that
[370]

Pr [C(U, V ) ≤ t] = t − ϕ(t)/ϕ′(t⁺) , (4.36)

which immediately yields the results given by (4.34). Table 4.1 provides closed form expressions for Kendall’s τ ’s of Clayton’s copula, Gumbel’s copula and Frank’s copula, which are shown in Fig. 4.3 as a function of their corresponding form parameters θ.

Table 4.1. Expression of Kendall’s τ for three Archimedean copulas. D1 denotes the Debye function D1(x) = (1/x) ∫_0^x t/(e^t − 1) dt

Copula    ϕ(t)                                    Kendall’s τ              Range
Clayton   (1/θ)(t^{−θ} − 1)                       θ/(θ + 2)                θ ∈ [−1, ∞]
Gumbel    (− ln t)^θ                              (θ − 1)/θ                θ ∈ [1, ∞]
Frank     − ln[ (e^{−θt} − 1)/(e^{−θ} − 1) ]      1 − (4/θ)[1 − D1(θ)]     θ ∈ [−∞, ∞]

Fig. 4.3. Graph of Kendall’s τ ’s as a function of the form parameter θ defined in Table 4.1, for Clayton’s copula (dotted line), Gumbel’s copula (dashed line) and Frank’s copula (plain line). Kendall’s τ for Frank’s copula is symmetric with respect to the origin
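The closed forms of Table 4.1 can be recovered by integrating (4.34) numerically. The following sketch (ours) does so with a simple midpoint rule for the Clayton and Gumbel generators:

```python
import math

def kendall_tau_archimedean(phi, phi_prime, n=100_000):
    """Kendall's tau via (4.34): tau = 1 + 4 * int_0^1 phi(t)/phi'(t) dt
    (midpoint rule; the integrand is bounded for these generators)."""
    h = 1.0 / n
    s = sum(phi(h*(k + 0.5)) / phi_prime(h*(k + 0.5)) for k in range(n))
    return 1.0 + 4.0 * s * h

theta = 2.0
# Clayton: phi(t) = (t^-theta - 1)/theta  ->  tau = theta/(theta + 2)
tau_c = kendall_tau_archimedean(lambda t: (t**-theta - 1.0)/theta,
                                lambda t: -t**(-theta - 1.0))
assert abs(tau_c - theta/(theta + 2.0)) < 1e-4

# Gumbel: phi(t) = (-ln t)^theta  ->  tau = (theta - 1)/theta
tau_g = kendall_tau_archimedean(lambda t: (-math.log(t))**theta,
                                lambda t: -theta*(-math.log(t))**(theta - 1.0)/t)
assert abs(tau_g - (theta - 1.0)/theta) < 1e-4
```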

Elliptical Copulas
This particularly useful family of copulas also allows for tractable calculation
of Kendall’s τ . Generalizing the result originally obtained by Stieltjes [115] for
the Gaussian distribution, Lindskog et al. [307] have shown that the relation
τ = (2/π) arcsin ρ (4.37)
holds for any pair of random variables whose dependence structure is given
by an elliptical copula. The parameter ρ denotes the shape coefficient (or
correlation coefficient, when it exists) of the elliptical distribution naturally
associated with the considered elliptical copula.
This result is particularly interesting because it provides a robust estimation method for the shape parameter ρ. Of course, when the elliptical distribution associated with the elliptical copula admits a second moment, the correlation coefficient exists and ρ can be estimated from Pearson’s coefficient (4.2). However, when the elliptical distribution does not admit a second moment, this approach fails. In this case, Kendall’s τ has the advantage of always existing and of being easily estimated. In fact, its superiority is even greater, as demonstrated by Fig. 4.4 which shows that estimates of τ yield more robust estimates of ρ via (4.37). This is especially true when the tails of the marginals associated with the elliptical distributions are heavy. In the example depicted in Fig. 4.4, we have considered two Student’s distributions with three and ten degrees of freedom respectively. While the estimates of ρ provided by Kendall’s τ (dashed curve) remain approximately equally efficient in both cases, the efficiency of the estimates of ρ provided by Pearson’s

Fig. 4.4. Probability density function of the correlation coefficient ρ estimated from synthetic realizations generated with a Student distribution with three degrees of freedom (left panel) and ten degrees of freedom (right panel), both with a true value of ρ = 0.6 for a sample size equal to 100. The continuous curve represents the pdf obtained from Pearson’s estimator while the dashed curve gives the pdf obtained when estimating Kendall’s τ and applying (4.37)

coefficient (continuous curve) drops dramatically as the number of degrees


of freedom of the Student’s distributions decreases. This phenomenon can be
ascribed to the fact that, with three degrees of freedom, the correlation co-
efficient still exists but the asymptotic distribution of Pearson’s coefficient is
not Gaussian but has a heavy tail, as recalled in Sect. 4.1.
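The robust estimation route described here, namely estimating Kendall's τ and inverting (4.37), is easy to reproduce. The sketch below (our illustration, on bivariate Gaussian data so that the true ρ is known) compares it with Pearson's estimator; on heavy-tailed Student samples the latter would degrade as discussed above.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample version of (4.28): concordant minus discordant pairs, O(n^2)."""
    n = len(x)
    s = 0.0
    for i in range(n):
        s += np.sum(np.sign((x[i] - x[i+1:]) * (y[i] - y[i+1:])))
    return 2.0 * s / (n * (n - 1))

def rho_from_tau(tau):
    """Invert relation (4.37), tau = (2/pi) arcsin(rho)."""
    return np.sin(np.pi * tau / 2.0)

rng = np.random.default_rng(1)
rho_true = 0.6
cov = [[1.0, rho_true], [rho_true, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=2000).T

rho_kendall = rho_from_tau(kendall_tau(x, y))
rho_pearson = np.corrcoef(x, y)[0, 1]
assert abs(rho_kendall - rho_true) < 0.1
assert abs(rho_pearson - rho_true) < 0.1
```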

4.2.2 Measures of Similarity Between Two Copulas

Consider two copulas C1 , C2 and the copula C = w · C1 + (1 − w) · C2 , with


w ∈ [0, 1]. Chapter 3 has recalled that the convex sum of several copulas
remains a copula. Kendall’s τ of copula C can be written as

τC = w2 · τC1 + 2w(1 − w) · Q (C1 , C2 ) + (1 − w)2 · τC2 , (4.38)

with
 
Q (C1, C2) = 4 ∫∫_{[0,1]²} C1(u, v) dC2(u, v) − 1 , (4.39)
           = 4 ∫∫_{[0,1]²} C2(u, v) dC1(u, v) − 1 . (4.40)

To provide an intuitive interpretation of Q (C1 , C2 ), let us consider two copulas


C1 and C2 with identical Kendall’s τ : τC1 = τC2 = τ . This means that,
through the prism of Kendall’s τ , these two copulas C1 and C2 have the same
dependence. Kendall’s τ of the copula C formed by their convex sum will also
be equal to τ , for all values of w, if and only if Q (C1 , C2 ) = τ , that is, if
expression (4.40) is equal to expression (4.32) obtained for either C1 or C2 .
The difference
 
4 ∫_{[0,1]²} [C1(u, v) − C2(u, v)] dC2(u, v)   (4.41)

between these two expressions (4.40) and (4.32) therefore allows one to define
the notion of proximity between two copulas.
Since any copula is bounded by the Fréchet-Hoeffding upper and lower
bounds, we have
 
∫_{[0,1]²} [max(u + v − 1, 0) − C2(u, v)] dC2(u, v)
   ≤ ∫_{[0,1]²} [C1(u, v) − C2(u, v)] dC2(u, v)   (4.42)
   ≤ ∫_{[0,1]²} [min(u, v) − C2(u, v)] dC2(u, v) ,

which can be rewritten as


∫_0^1 C2(u, 1 − u) du − (τ + 1)/4
   ≤ ∫_{[0,1]²} [C1(u, v) − C2(u, v)] dC2(u, v)
   ≤ ∫_0^1 C2(u, u) du − (τ + 1)/4 .   (4.43)
The left-most term is always negative while the right-most one is positive. In
the case where the left-most term is the opposite of the right-most one,
∫_0^1 C2(u, u) du − (τ + 1)/4 = − ( ∫_0^1 C2(u, 1 − u) du − (τ + 1)/4 ) > 0 ,   (4.44)

one can renormalize expression (4.43) to obtain



−1 ≤ ( ∫_{[0,1]²} [C1(u, v) − C2(u, v)] dC2(u, v) ) / ( ∫_0^1 C2(u, u) du − (τ + 1)/4 ) ≤ 1 .   (4.45)

Choosing a fixed copula C2 as a reference, this provides a new dependence measure, allowing one to assess the similarity between any copula C1 and the reference copula C2. Two particular choices of C2 have been studied in the literature:

Spearman’s Rho

Let us choose C2 as the product copula Π(u, v) = u · v, describing indepen-


dence. One easily checks that
∫_0^1 u² du − (τ + 1)/4 = − ( ∫_0^1 u · (1 − u) du − (τ + 1)/4 ) = 1/12 ,   (4.46)

while
∫_0^1 ∫_0^1 u · v du dv = 1/4 ,   (4.47)

so that the central fraction in (4.45) leads to the definition of the so-called Spearman's rho:

ρs(C) = 12 ∫_{[0,1]²} C(u, v) du dv − 3 .   (4.48)

This equation can be interpreted as the difference between the probability of


concordance and the probability of discordance for the two pairs of random
variables (X1 , Y1 ) and (X2 , Y3 ), where the pairs (X1 , Y1 ), (X2 , Y2 ) and (X3 , Y3 )
are three independent realizations drawn from the same distribution:

ρs = 3 (Pr[(X1 − X2 )(Y1 − Y3 ) > 0] − Pr[(X1 − X2 )(Y1 − Y3 ) < 0]) . (4.49)

By definition, Spearman’s rho equals zero for independent random vari-


ables while the lower (resp. upper) bound is reached if and only if the random
variables are countermonotonic (resp. comonotonic). An alternative expres-
sion is
 
ρs(C) = 12 ∫_{[0,1]²} u · v dC(u, v) − 3 .   (4.50)

It highlights the fact that Spearman's rho is related to the linear correlation of the ranks. Indeed, considering two random variables X and Y, with marginal distributions FX and FY, it is straightforward to check that

ρs = Cov(FX(X), FY(Y)) / √( Var FX(X) · Var FY(Y) ) .   (4.51)
Our introduction of Spearman’s rho, motivated from Kendall’s τ , shows
that they are closely related. In fact, given any copula C, Kruskal [281] has
shown that
(3τ − 1)/2 ≤ ρS ≤ −(τ² − 2τ − 1)/2 ,   τ ≥ 0 ,   (4.52)

(τ² + 2τ − 1)/2 ≤ ρS ≤ (3τ + 1)/2 ,   τ ≤ 0 .   (4.53)
Figure 4.5 shows that the area of accessible values for the couple (τ, ρS ) repre-
sents a relatively narrow strip, reflecting the strong relation between Kendall’s
τ and Spearman’s rho.
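Both properties are easy to check numerically. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, following (4.51), and verifies that the estimated pair (τ, ρS) falls inside Kruskal's strip (4.52); the simulated sample is an arbitrary choice of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.5 * x + rng.standard_normal(500)

# Spearman's rho as the linear correlation of the ranks, cf. (4.51)
rho_ranks = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
rho_s = stats.spearmanr(x, y)[0]
tau = stats.kendalltau(x, y)[0]

# Kruskal's bounds (4.52), valid here since tau >= 0
lower = (3 * tau - 1) / 2
upper = -(tau**2 - 2 * tau - 1) / 2
```

For this positively dependent sample, τ is well inside (0, 1) and ρS sits comfortably within the narrow strip depicted in Fig. 4.5.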

Fig. 4.5. The shaded area represents the allowed values for the couple (τ, ρS )

Gini’s Gamma

Instead of choosing the reference copula C2 (u, v) = Π(u, v) = u · v, one can


consider an equally weighted mixture of the two copulas expressing monotonic
dependence:
C2(u, v) = (1/2) · min(u, v) + (1/2) · max(u + v − 1, 0) .   (4.54)
The central fraction in (4.45) then measures how far a given copula C1 is from monotonic dependence. Simple algebraic manipulations show that

∫_0^1 C2(u, u) du − (τ + 1)/4 = − ( ∫_0^1 C2(u, 1 − u) du − (τ + 1)/4 ) = 1/8 ,   (4.55)

and
∫_0^1 ∫_0^1 C2(u, v) dC2(u, v) = 1/4 .   (4.56)
The central fraction in (4.45) then yields the so-called Gini’s gamma:
γ(C) = 4 ( ∫_0^1 C(u, u) du + ∫_0^1 C(u, 1 − u) du − 1/2 ) .   (4.57)
Note that this measure of dependence only relies on the values taken by C on
its main diagonals. The alternative expression
γ(C) = 4 ( ∫_0^1 C(u, u) du − ∫_0^1 [u − C(u, 1 − u)] du )   (4.58)

shows that Gini’s gamma represents the difference of the area between the
values of C(u, v) and max(u + v − 1, 0) on the first diagonal and between the
value of C(u, v) and min(u, v) on the second diagonal (see the shaded areas
in Fig. 4.6).
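Since (4.57) involves only the two diagonals, Gini's gamma is straightforward to evaluate numerically. The sketch below checks the three reference cases: γ = 0 for the product copula, γ = +1 for the comonotonic copula min(u, v) and γ = −1 for the countermonotonic copula max(u + v − 1, 0).

```python
import numpy as np

u = np.linspace(0.0, 1.0, 20001)

def _trap(vals):
    # trapezoidal rule on the uniform grid u
    return float(np.sum((vals[:-1] + vals[1:]) * np.diff(u)) / 2.0)

def gini_gamma(C):
    # Eq. (4.57): gamma = 4 * (int_0^1 C(u,u) du + int_0^1 C(u,1-u) du - 1/2)
    return 4.0 * (_trap(C(u, u)) + _trap(C(u, 1.0 - u)) - 0.5)

gamma_indep = gini_gamma(lambda a, b: a * b)                         # product copula
gamma_upper = gini_gamma(lambda a, b: np.minimum(a, b))              # comonotonic bound
gamma_lower = gini_gamma(lambda a, b: np.maximum(a + b - 1.0, 0.0))  # countermonotonic bound
```

The values 0, +1 and −1 are recovered to within the quadrature error of the grid.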

4.2.3 Common Properties of Kendall’s Tau, Spearman’s Rho


and Gini’s Gamma

The three measures of dependence – Kendall’s tau, Spearman’s rho and Gini’s
gamma – presented in the previous paragraphs enjoy the same set of proper-
ties:
1. they are defined for any pair of continuous random variables X and Y ,
2. they are symmetric: for any pair X and Y , τ (X, Y ) = τ (Y, X), for instance,
3. they range from −1 to +1, and reach these bounds when X and Y are
countermonotonic and comonotonic respectively,

Fig. 4.6. The shaded surface represents the area between the values of C(u, v) –
here the product copula Π(u, v) = u · v – and max(u + v − 1, 0) on the first diagonal
and between the value of C(u, v) and min(u, v) on the second diagonal

4. they equal zero for independent random variables,


5. if the pair of random variables (X1 , X2 ) is more dependent than the pair
(Y1 , Y2 ) in the following sense:

CX (u, v) ≥ CY (u, v), ∀u, v ∈ [0, 1] , (4.59)

then the same ranking holds for any of these three measures; for instance,
τ(X1, X2) ≥ τ(Y1, Y2).
Any measure of dependence fulfilling these five properties is named a con-
cordance measure. The central fraction in (4.45), with any exchangeable cop-
ula C2 such that condition (4.44) is fulfilled together with ρs (C2 ) = 3 τ (C2 ),
ensuring that the numerator of the central term of (4.45) vanishes for
C1(u, v) = u · v, provides a measure of dependence which satisfies the five
conditions above, and is thus a concordance measure.

4.3 Dependence Metric

Concordance measures fulfill most of the requirements expected from a mea-


sure of dependence. Following Granger et al. [214], one can impose slightly
more demanding properties for a functional measure F[X, Y ] of dependence
between two random variables X and Y , which strengthen properties 1–4 of
concordance measures as follows:
1. F is well defined for both continuous and discrete random variables,

2. F is invariant under continuous and strictly increasing transformations of


the random variables, i.e., F depends only on the copula of X and Y ,
3. F is a distance,
4. F equals 0 if X and Y are independent, and varies between 0 and 1,
5. F equals 1 (or, at least, reaches a maximum) if there exists a measurable
   mapping between the random variables X and Y : X = f (Y ),
6. F has a simple relationship with the (linear) correlation coefficient in the
case of a bivariate normal distribution.
Dependence measures satisfying all these requirements are named Dependence
metrics.
As an example, one can consider the measure introduced by Bhattacharya,
Matusita and Hellinger:
S = (1/2) ∫_{R²} [ 1 − ( f(x) g(y) / h(x, y) )^{1/2} ]² dH(x, y) ,   (4.60)

where f and g denote the marginal densities of X and Y respectively, while h


and H are the bivariate density and distribution functions of (X, Y ). Simple
algebraic manipulations give
  
S = 1 − ∫_{[0,1]²} [c(u, v)]^{1/2} du dv ,   (4.61)

where c is the density of the copula of X and Y , showing that S agrees


with the second requirement. Properties 3–5 are easy to check while the last
requirement has been established in [442]. Indeed, for two random variables
with Gaussian copula and shape coefficient ρ, one has
S = 1 − (1 − ρ²)^{5/4} / (1 − ρ²/2)^{3/2} .   (4.62)

This dependence metric is in fact related to a generalized relative en-


tropy between the joint density h and the product density f · g. Consider
the generalized Kullback-Leibler distance (obtained by symmetrization of the
Kullback-Leibler divergence) for the k-class entropy family [225] defined by
Hk(f) = [1/(k − 1)] · ( 1 − E[f^{k−1}] ) ,   k ≠ 1 ,   (4.63)
      = −E[ln f] ,   k = 1 ,   (4.64)
where f is the density of the random variable (or vector) under consideration.1
In the particular case k = 1, one retrieves the usual Shannon entropy. One
¹ This k-class entropy is also known as Tsallis entropy of order k in the physical
literature [476] and has many applications to characterize complex systems with
nonseparable long-range space/time dependences.

can then show that S is equal to one-fourth of the symmetric relative entropy
of h and f · g for the 1/2-class entropy.
Dependence metrics such as S provide very useful tools to test the presence
of complicated serial dependences. This is particularly important not only to
analyze and forecast financial time series [214] but also to test the goodness-
of-fit in copula modeling, as we shall see in Chap. 5.
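The metric S can be evaluated numerically from its definition. Expanding the square in (4.60) and using the fact that h and f · g both integrate to one gives S = 1 − ∫ √(f(x) g(y) h(x, y)) dx dy, a Hellinger-type affinity. The sketch below evaluates this integral by quadrature for a bivariate Gaussian distribution and checks the expected qualitative behavior (S vanishes under independence and grows with |ρ|); the truncation at ±8 standard deviations is an arbitrary numerical choice.

```python
import numpy as np
from scipy import integrate

def S_gaussian(rho):
    # S = 1 - int sqrt(f(x) g(y) h(x,y)) dx dy, h bivariate normal with correlation rho
    def integrand(y, x):
        f = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
        g = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)
        q = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
        h = np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(1 - rho**2))
        return np.sqrt(f * g * h)
    affinity, _ = integrate.dblquad(integrand, -8, 8, lambda x: -8, lambda x: 8)
    return 1.0 - affinity

s0, s5, s9 = S_gaussian(0.0), S_gaussian(0.5), S_gaussian(0.9)
```

Since S depends on (X, Y) only through their copula, the same values would be obtained for any pair with a Gaussian copula, whatever the margins.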

4.4 Quadrant and Orthant Dependence


In practice, it is often useful to characterize the dependence of more than
two variables. For instance, risk management deals with portfolios made of
dozens up to tens of thousands of assets. The analysis of the risks associated
with such portfolios requires the assessment of the dependence between many
(N ) variables. It is in general not true that the genuine multivariate depen-
dence between the N variables can be adequately quantified by N (N − 1)/2
dependence measures between all possible pairs. A first approach to define
generalized correlations between N > 2 random variables has been already
described in Sect. 4.1.3. We now discuss other measures which can be shown
to be pure copula properties.
First, note that the previous concept of concordance cannot be easily ex-
tended to more than two random variables. The intuition behind this state-
ment can be obtained by taking the example of Kendall’s τ . One could think
of generalizing the integral expression (4.32) of Kendall’s τ to higher dimen-
sions. However, this straightforward generalization loses several nice properties
of the concordance measures. In particular, the concept of countermonotonic-
ity cannot be used for more than two random variables. Indeed, consider
three random variables X, Y and Z, such that (X, Y ) and (Y, Z) are counter-
monotonic; then (X, Z) are necessarily comonotonic.2 As a consequence, even
if the extension to higher dimensions of Kendall’s τ remains bounded by −1
(by Fréchet-Hoeffding inequality), it is not ascertained that this bound can
still be reached. Therefore, the interpretation of a negative value for such a
generalized Kendall’s τ would not be obvious.
In order to provide measures of dependences which do not suffer from this
problem, let us first introduce the notion of positive quadrant dependence
[300]. Two random variables X and Y are positive quadrant dependent (PQD)
if
Pr[X ≤ x, Y ≤ y] ≥ Pr[X ≤ x] · Pr[Y ≤ y] , ∀x, y . (4.65)
This inequality means that the probability that the two random variables X
and Y are simultaneously small is at least as large as it would be if these two
² This effect is related to the concept of “frustration” introduced in statistical
physics to describe situations in which constraints tending to create opposite
states in two interacting variables cannot be all obeyed in systems of three or
more elements [475, 481]. Frustration leads in general to multiple equilibria [360].

random variables were independent. If X and Y represent the returns of two


PQD assets, the probability that they undergo simultaneous large losses is not
less than it would be if they were independent. As a consequence, one expects
(and it can be shown [130]) that risk-averse investors prefer a portfolio X̃ + Ỹ made of independent replications of assets X and Y to a portfolio X + Y made of the actual PQD assets. This means that, for any increasing concave utility function U,

E[U(X + Y)] ≤ E[U(X̃ + Ỹ)] .   (4.66)

Inequality (4.65) can be rewritten as

Pr[X > x, Y > y] ≥ Pr[X > x] · Pr[Y > y] , ∀x, y . (4.67)

This defines two random variables as PQD if the probability that they are
simultaneously large or small is at least as large as it would be if these two
random variables were independent. This definition is relevant for risk management purposes, since it amounts to asking whether large losses of individual
assets tend to occur more frequently together than they would if the assets
were independent.
Definition (4.65) implies that X and Y are PQD if and only if their copula
C satisfies

C(u, v) ≥ Π(u, v) = u · v , ∀u, v ∈ [0, 1] . (4.68)

This ensures that the PQD property depends only on the dependence structure
of the random variables (and not on their marginals).
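Condition (4.68) can be checked directly on any parametric family. The sketch below verifies it on a grid for the Clayton copula with θ > 0, which is known to be PQD; the grid resolution and the value of θ are arbitrary choices of this sketch.

```python
import numpy as np

theta = 2.0
grid = np.linspace(0.01, 0.99, 99)
U, V = np.meshgrid(grid, grid)

# Clayton copula C(u,v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0
C = (U**-theta + V**-theta - 1.0) ** (-1.0 / theta)
pqd_holds = bool(np.all(C >= U * V))
```

The inequality C(u, v) ≥ u · v holds at every grid point, confirming the PQD property for this family.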
The PQD property and the concordance measures are intimately related.
Indeed, as recalled in Sect. 4.2.3, if the pair of random variables (X1 , X2 ) is
more dependent than the pair (Y1 , Y2 ), it is also more concordant. So, any
PQD pair of random variables is more concordant than independent pairs of
random variables. But, since any concordance measure equals zero for inde-
pendent random variables, we can assert that, given any concordance mea-
sure, any pair of PQD random variables has a positive concordance measure.
In particular, Kendall’s tau, Spearman’s rho or Gini’s gamma are necessarily
positive for PQD random variables. Besides, (4.48) shows that the Spearman’s
rho is a kind of averaged positive quadrant dependence.
To conclude this brief survey of the properties of PQD random variables,
let us stress that the same result holds for the usual linear correlation coeffi-
cient. Indeed, by Hoeffding identity (4.9), any PQD random variables exhibit
a nonnegative correlation coefficient. Unfortunately, the converse does not
hold. However, given two random variables X and Y such that the linear
correlation coefficient ρ (f (X), g(Y )) exists and is non-negative for any non-
decreasing functions f and g, then these two random variables are PQD [300].
Let us now generalize the bivariate concept of positive quadrant depen-
dence to the multivariate concept of positive orthant dependence. We will say

that N random variables X1 , X2 , . . . , XN are Positive Lower Orthant Depen-


dent (PLOD) if

Pr[X1 ≤ x1 , . . . , XN ≤ xN ] ≥ Pr[X1 ≤ x1 ] · · · Pr[XN ≤ xN ] , (4.69)

for all xi ’s. As in the bivariate case, this equation simply means that the
probability that the N random variables X1 , . . . , XN are simultaneously small
is at least as large as it would be if these N random variables were independent.
Similarly, N random variables X1 , X2 , . . . , XN are Positive Upper Orthant
Dependent (PUOD) if

Pr[X1 > x1 , . . . , XN > xN ] ≥ Pr[X1 > x1 ] · · · Pr[XN > xN ] , (4.70)

for all xi ’s. Again, this equation has a simple interpretation: the probability
that the N random variables X1 , . . . , XN are simultaneously large is at least as
large as it would be if these N random variables were independent. Note that
the two definitions (4.69) and (4.70) are not equivalent anymore for N > 2.
Finally, N random variables X1 , X2 , . . . , XN are Positive Orthant Depen-
dent (POD) if they are both PUOD and PLOD: the probability that the N
random variables X1 , . . . , XN are simultaneously small or large is at least as
large as it would be if these N random variables were independent.
In terms of copulas, these definitions can be expressed as follows. Given an N-random vector X = (X1, . . . , XN) with copula C,

X is PLOD ⇐⇒ C(u1, . . . , uN) ≥ ∏_{i=1}^{N} ui ,   ∀ui ∈ [0, 1] ,   (4.71)

and

X is PUOD ⇐⇒ C̄(u1, . . . , uN) ≥ ∏_{i=1}^{N} (1 − ui) ,   (4.72)

where C̄ denotes the survival copula of C.


For Archimedean copulas, the PLOD property can easily be related to the shape of its generator ϕ. In fact, in order for an Archimedean copula Cϕ to be PLOD, it is sufficient that the mapping

x ∈ R+ −→ ϕ(e^{−x})   (4.73)

be convex, or at least super-additive (the former implying the latter, since this mapping vanishes at x = 0). Indeed, for any ui ∈ [0, 1], the assumption that ϕ(e^{−x}) is super-additive allows us to write that

ϕ(exp[−(− ln u1) − · · · − (− ln uN)]) ≥ ϕ(e^{ln u1}) + · · · + ϕ(e^{ln uN})
                                     = ϕ(u1) + · · · + ϕ(uN) ,   (4.74)

so that, the pseudo-inverse ϕ^{[−1]} being decreasing,

exp[−(− ln u1) − · · · − (− ln uN)] ≤ ϕ^{[−1]}(ϕ(u1) + · · · + ϕ(uN)) ,   (4.75)

which is equivalent to

Cϕ(u1, . . . , uN) ≥ ∏_{i=1}^{N} ui .   (4.76)

The proof that the super-additivity of ϕ(e^{−x}) is in fact a necessary and sufficient condition for an Archimedean copula to be PLOD can be found in [147].
Let us remark that any completely monotonic generator fulfills the requirement that (4.73) be convex. Therefore, any Archimedean copula which admits a generalization to arbitrary dimension is PLOD. Archimedean copulas which exist in any dimension necessarily exhibit positive associations and their bivariate marginals cannot have negative concordance measures. In this respect, the bivariate Clayton or Frank copulas admit an n-dimensional generalization for positive parameter value θ only.
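For instance, the N-dimensional Clayton copula with θ > 0 provides a concrete check of the PLOD inequality (4.76); the sketch below verifies it on a random set of points of the unit cube in dimension N = 3 (sample size and θ are arbitrary choices of this sketch).

```python
import numpy as np

rng = np.random.default_rng(2)
theta, N = 1.5, 3
u = rng.uniform(0.01, 0.99, size=(10000, N))

# N-dimensional Clayton copula: C(u) = (sum_i u_i^-theta - N + 1)^(-1/theta)
C = (np.sum(u**-theta, axis=1) - N + 1.0) ** (-1.0 / theta)
plod_holds = bool(np.all(C >= np.prod(u, axis=1)))
```

The copula value dominates the product of the coordinates at every sampled point, as the PLOD property requires.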
The property of POD is a reasonable assumption for most asset returns.
This allows us to sharpen the (universal) bound for the VaR of the portfolios
considered in Fig. 3.11. Instead of considering the Fréchet-Hoeffding lower
bound in (3.76), one can choose Cinf = Csup = Π, where Π(u, v) = u · v is
the product copula.
The concept of POD is also appealing for testing whether some trading
strategies are actually market neutral. Such strategies are very common in the
alternative investment industry. They aim at decoupling portfolio moves from
market moves, in order to ensure a better stability of the performance of port-
folios. Portfolio managers often focus solely on their fund’s beta, trying to keep
it as small as possible while raising their alpha (the market-independent part
of the expected return). However, if this approach allows them in principle
to remove any linear dependence between the portfolio and the market, it to-
tally neglects nonlinear and extreme dependences. Therefore, testing for POD
seems necessary in order to check whether a fund is actually market neutral.
Denuit and Scaillet [127] have proposed a nonparametric test for POD and,
considering the HRF and CSFB/Tremont market neutral hedge fund indices,
they have shown that both of them exhibit weak linear dependence with the
S&P 500 index – as expected – but that POD cannot be rejected between the
CSFB/Tremont market neutral index and the Standard & Poor’s 500. There-
fore, some funds contributing in the composition of the CSFB/Tremont index
may exhibit nonlinear or extreme dependence with the Standard & Poor’s 500.
This teaches us that focusing on beta is clearly not sufficient to ensure market
neutrality. We will come back to this problem at the end of this chapter when
constructing portfolios which minimize the impact of extreme market moves.

4.5 Tail Dependence


4.5.1 Definition

Positive quadrant (and more generally orthant) dependence is a very strong


property. It requires that the relation (4.68) holds for every point on the
unit square for two variables (in the hypercube for more than two variables).
It could be interesting to weaken this definition to focus on properties of
local positive quadrant dependences. For instance, one could wish to focus on
the lower left corner only, in order to assess whether joint losses occurring
with (marginal) probability level less than u, say, appear more likely together
than one could expect from statistically independent losses. Recall that the
smaller the value of u, the more extreme are the losses. In this vein, the
notion of tail dependence, aiming at quantifying the propensity of two random
variables to exhibit concomitant extreme movements, has been introduced as
a particularly interesting measure of extreme risks.
The concept of tail dependence is appealing in its simplicity. By definition,
the (upper) tail dependence coefficient is
λU = lim_{u→1} Pr[ X > F_X^{−1}(u) | Y > F_Y^{−1}(u) ] ,   (4.77)

and quantifies the probability to observe a large X, assuming that Y is large


itself. In other words, given that Y is very large (at some level of probability
u), the probability that X is very large at the same probability level u defines
asymptotically the tail dependence coefficient λ. As an example, if X and Y
represent the volatilities of two different national markets, their coefficient of
tail dependence λ gives the probability that both markets exhibit together
very high volatilities.
One can also interpret this expression (4.77) in terms of a Value-at-Risk.
Indeed, the quantiles F_X^{−1}(u) and F_Y^{−1}(u) are nothing but the Values-at-Risk
of assets (or portfolios) X and Y at the confidence level u, if we count losses
as positive. Thus, the coefficient λU simply provides the probability that X
exceeds the VaR at level u, assuming that Y has exceeded the VaR at the
same probability level u, when this level goes to one. As a consequence, the
probability that both X and Y exceed their VaR at the level u is asymptot-
ically given by λU · (1 − u) as u → 1. As an example, consider a daily VaR
calculated at the 99% confidence level. Then, the probability that both X
and Y undergo a loss larger than their VaR at the 99% level is approximately
given by λU /100. Thus, when λU is about 0.1, the typical recurrence time
between such concomitant large losses is about 4 years, while for λU ≈ 0.5 it
is less than 10 months.
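The back-of-the-envelope conversion between λU and recurrence times quoted above can be scripted directly; the figure of 250 trading days per year used to convert to calendar time is an assumption of this sketch.

```python
# Probability that X and Y jointly exceed their VaR at confidence level u is
# approximately lambda_U * (1 - u); its inverse is the recurrence time in days.
def recurrence_years(lambda_u, u=0.99, days_per_year=250):
    p_joint_per_day = lambda_u * (1.0 - u)
    return 1.0 / p_joint_per_day / days_per_year

t_weak = recurrence_years(0.1)    # 1000 trading days, i.e. about 4 years
t_strong = recurrence_years(0.5)  # 200 trading days, i.e. less than 10 months
```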

4.5.2 Meaning and Refinement of Asymptotic Independence

One of the appeals of this definition (4.77) of tail dependence is that it is


a pure copula property, i.e., it is independent of the margins of X and Y .

Indeed, let C be the copula of the variables X and Y . If their bivariate copula
C is such that
λU = lim_{u→1} [1 − 2u + C(u, u)]/(1 − u) = lim_{u→1} [ 2 − log C(u, u)/log u ]   (4.78)

exists, then C has an upper tail dependence coefficient λU (see [106, 149, 147]).
In a similar way, one can define the coefficient of lower tail dependence:

λL = lim_{u→0+} Pr{ X < F_X^{−1}(u) | Y < F_Y^{−1}(u) } = lim_{u→0+} C(u, u)/u .   (4.79)

If λ > 0,3 the copula presents tail dependence and large events tend to
occur simultaneously, with (conditional) probability λ. On the contrary, when
λ = 0, the copula has no tail dependence and the variables X and Y are said
to be asymptotically independent. There is however a subtlety in this definition
(4.77) of tail dependence. To make it clear, first consider the case where, for large X and Y, the joint survival distribution F̄(x, y) = Pr[X > x, Y > y] factorizes such that

lim_{x,y→∞} F̄(x, y) / [ F̄X(x) · F̄Y(y) ] = 1 ,   (4.80)

where F̄X(x) and F̄Y(y) are the marginal survival distributions of X and Y respectively. This means
that, for X and Y sufficiently large, these two variables can be considered as
independent. It is then easy to show that
lim_{u→1} Pr{ X > F_X^{−1}(u) | Y > F_Y^{−1}(u) } = lim_{u→1} [ 1 − FX(F_X^{−1}(u)) ]   (4.81)
   = lim_{u→1} (1 − u) = 0 ,   (4.82)

so that independent variables really have no tail dependence λ = 0, as one


can expect.
However, the result λ = 0 does not imply that the multivariate distribu-
tion can be automatically factorized asymptotically, as shown by the Gaussian
example. Indeed, the Gaussian bivariate distribution cannot be factorized,
even asymptotically for extreme values, since the non-diagonal term of the
quadratic form in the exponential function does not become negligible in gen-
eral as X and Y go to infinity together. Therefore, in a weaker sense, there
may still be a dependence in the tail even when λ = 0.
To make this statement more precise, following [106], let us introduce the
coefficient
λ̄U = lim_{u→1} [ 2 log Pr{X > F_X^{−1}(u)} / log Pr{X > F_X^{−1}(u), Y > F_Y^{−1}(u)} ] − 1   (4.83)

    = lim_{u→1} [ 2 log(1 − u) / log(1 − 2u + C(u, u)) ] − 1 .   (4.84)

³ In the sequel, λ without subscript will represent either λU or λL.

It can be shown that the coefficient λ̄U = 1 if and only if the coefficient of tail
dependence λU > 0, while λ̄U takes values in [−1, 1) when λU = 0, allowing
us to refine the nature of the dependence in the tail in the case when the tail
dependence coefficient is not sufficiently informative. It has been established
that, when λ̄ > 0, the variables X and Y are simultaneously large more
frequently than independent variables, while simultaneous large deviations of
X and Y occur less frequently than under independence when λ̄ < 0. In the
first case, the variables X and Y can be said to be locally PQD (positive
quadrant dependent) in the neighborhood of the point (0, 0) and/or (1, 1) in
probability space.
To summarize, independence (factorization of the bivariate distribution)
implies no tail dependence (λ = 0). But λ = 0 is not sufficient to imply factor-
ization and thus true independence. It also requires as a necessary condition
that λ̄ = 0.

4.5.3 Tail Dependence for Several Usual Models

We present several general results allowing for the calculation of the tail de-
pendence of Archimedean copulas, elliptical copulas and copulas derived from
factor models.

Archimedean Copulas

The generator of an Archimedean copula fully embodies the properties of


dependence (and therefore of extreme dependence). As a consequence, the
coefficient of tail dependence of an Archimedean copula can be expressed solely
in terms of its generator. A simple application of L’Hospital’s rule shows that
any Archimedean copula, with a strict generator ϕ (that is, such that ϕ(0) is
infinite so that ϕ[−1] = ϕ−1 ), has a coefficient of upper tail dependence given
by

λU = 2 − 2 lim_{t→0} [ϕ^{−1}]′(2t) / [ϕ^{−1}]′(t) .   (4.85)


As a consequence, if [ϕ^{−1}]′(0) > −∞, the coefficient of upper tail dependence is identically zero. For an Archimedean copula to present tail dependence, it is necessary that lim_{t→0} [ϕ^{−1}]′(t) = −∞.
t→0
Similarly, the coefficient of lower tail dependence is

λL = 2 lim_{t→∞} [ϕ^{−1}]′(2t) / [ϕ^{−1}]′(t) ,   (4.86)

so that [ϕ^{−1}]′(∞) must be equal to 0 in order for the Archimedean copula to have a nonzero lower tail dependence.

Table 4.2. Expressions of the coefficients of upper and lower tail dependence for three Archimedean copulas. Note that the usual range for the parameter θ of Clayton's copula has to be restricted to [0, ∞] in order for the generator to be “strict”

Copula    [ϕ^{−1}]′(t)                                               λL          λU           Range
Clayton   −(1 + θt)^{−1/θ−1}                                         2^{−1/θ}    0            θ ∈ [0, ∞]
Gumbel    −(t^{1/θ−1}/θ) · exp(−t^{1/θ})                             0           2 − 2^{1/θ}  θ ∈ [1, ∞]
Frank     −(1/θ) · (1 − e^{−θ}) e^{−t} / [1 − (1 − e^{−θ}) e^{−t}]  0           0            θ ∈ [−∞, ∞]

Table 4.2 gives the coefficients of tail dependence of several Archimedean


copulas. It illustrates the fact that some copulas have an upper tail depen-
dence but no lower tail dependence (the Gumbel copula) or, on the contrary,
some copulas have no upper tail dependence but have a lower tail dependence
(Clayton copula). More precisely, the coefficient of lower tail dependence of
Clayton's copula equals 2^{−1/θ} while the coefficient of upper tail dependence of Gumbel's copula is 2 − 2^{1/θ}. In addition, the generator of Clayton's copula is regularly varying at t = 0 (see Table 4.1), with a tail index −θ, while the generator of Gumbel's copula is regularly varying at t = 1, with tail index θ.
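These closed forms can be cross-checked against the diagonal-limit definitions (4.78) and (4.79): the sketch below evaluates C(u, u) near the corners for the Clayton and Gumbel copulas and compares with the Table 4.2 values 2^{−1/θ} and 2 − 2^{1/θ} (θ = 2 is an arbitrary choice of this sketch).

```python
import numpy as np

theta = 2.0

# Clayton: lambda_L = lim_{u->0+} C(u,u)/u, with C(u,u) = (2 u^-theta - 1)^(-1/theta)
u = 1e-9
lam_L_num = (2.0 * u**-theta - 1.0) ** (-1.0 / theta) / u

# Gumbel: lambda_U = lim_{u->1-} (1 - 2u + C(u,u))/(1 - u),
# with C(u,u) = exp(-(2 (-ln u)^theta)^(1/theta)) = u^(2^(1/theta))
v = 1.0 - 1e-8
C_vv = np.exp(-(2.0 * (-np.log(v)) ** theta) ** (1.0 / theta))
lam_U_num = (1.0 - 2.0 * v + C_vv) / (1.0 - v)
```

Both numerical limits agree with the closed forms to high accuracy.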
In fact, one can show that any Archimedean copula, with a generator regularly varying at zero and tail index −θ (with θ ≥ 0), has a coefficient of lower tail dependence equal to 2^{−1/θ}. Indeed, by (4.79),

λL = lim_{u→0+} ϕ^{−1}(2ϕ(u)) / u = lim_{x→∞} ϕ^{−1}(2x) / ϕ^{−1}(x) ,   (4.87)

and, since ϕ is regularly varying at zero with tail index −θ, ϕ^{−1} is regularly varying at infinity with tail index −1/θ [57], so that

λL = lim_{x→∞} ϕ^{−1}(2x) / ϕ^{−1}(x) = 2^{−1/θ} .   (4.88)

Similarly, any Archimedean copula with a generator regularly varying at 1 and


with tail index θ (with θ ≥ 1, in order to fulfill the convexity requirement),
has a coefficient of upper tail dependence equal to 2 − 2^{1/θ}.
These results also apply to the frailty model with frailty parameters having
a distribution regularly varying at zero with tail index 1/θ. Using the proper-
ties of the Laplace transform, one can conclude that the copulas generated by
such frailty models have generators which are regularly varying at zero with
tail index −θ. Therefore, they have a coefficient of lower tail dependence equal
to 2^{−1/θ}. Similarly, copulas generated by frailty models with frailty parameters with distribution regularly varying at infinity, with tail index 1/θ (θ > 1),
Fig. 4.7. Contour plot of the copula with generator (4.89) (left panel) and of its density (right panel) for the parameter values α = 1 and β = 2

have a coefficient of upper tail dependence equal to 2 − 2^{1/θ}, since they lead
to generators which are regularly varying at 1, with tail index θ. Finally, to
obtain an Archimedean copula with both upper and lower tail dependence,
one just has to consider generators which are regularly varying at 0 and 1, or
alternatively to have frailty parameters with regular variation at zero and at
infinity.
An example is the following generator:

ϕ(t) = t^{−α} · (− ln t)^β ,   (α, β) ∈ [0, ∞) × [1, ∞) ,   (4.89)

with inverse

ϕ^{−1}(t) = exp[ −(β/α) · W( (α/β) t^{1/β} ) ] ,   (4.90)

where W(·) denotes the Lambert function, solution of W(x) · e^{W(x)} = x. It allows for upper and lower tail dependence with λL = 2^{−1/α} and λU = 2 − 2^{1/β}. Figure 4.7 shows this copula for α = 1 and β = 2, corresponding to λL = 0.5 and λU = 2 − √2 ≈ 0.6.
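The pair (4.89)–(4.90) can be verified with scipy's Lambert function: the sketch below checks that ϕ^{−1} ∘ ϕ is the identity and estimates both tail dependence coefficients from the diagonal C(u, u) = ϕ^{−1}(2ϕ(u)). Convergence of the lower-tail limit is only logarithmic, hence the loose tolerance used there; the evaluation points are arbitrary choices of this sketch.

```python
import numpy as np
from scipy.special import lambertw

alpha, beta = 1.0, 2.0

def phi(t):
    return t**-alpha * (-np.log(t)) ** beta             # generator (4.89)

def phi_inv(t):
    z = (alpha / beta) * t ** (1.0 / beta)
    return np.exp(-(beta / alpha) * lambertw(z).real)   # inverse (4.90)

def diag(u):
    return phi_inv(2.0 * phi(u))                        # C(u,u) = phi_inv(2 phi(u))

inv_err = abs(phi_inv(phi(0.3)) - 0.3)
lam_L_num = diag(1e-12) / 1e-12                         # slowly -> 2^(-1/alpha) = 0.5
u = 1.0 - 1e-6
lam_U_num = (1.0 - 2.0 * u + diag(u)) / (1.0 - u)       # -> 2 - 2^(1/beta)
```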

Elliptical Copulas

Assuming that (X, Y ) are normally distributed with correlation coefficient ρ,


it can be shown that, for all ρ ∈ [−1, 1), λ = 0, while λ̄ = ρ [149, 229]. This
later result expresses, as one can expect, that – despite the absence of tail
dependence – extremes appear more likely together for positively correlated
variables.
In contrast, if (X, Y ) have a Student’s copula, one can show that the tail
dependence coefficient is
  
λ = 2 · T̄_{ν+1}( √( (ν + 1)(1 − ρ)/(1 + ρ) ) ) ,   (4.91)

Fig. 4.8. Coefficient of upper tail dependence as a function of the correlation coefficient ρ for various values of the number of degrees of freedom ν (ν = 3, 5, 10, 20, 50, and 100) for Student's copula

which is greater than zero for all ρ > −1, and thus λ̄ = 1. Here T̄_{ν+1} denotes the complementary cumulative distribution function of the Student distribution with ν + 1 degrees of freedom. This result λ̄ = 1 proves that extremes appear more likely
together whatever the correlation coefficient may be, showing that, in fact,
there is no general relationship between the asymptotic dependence and the
linear correlation coefficient. Figure 4.8 shows the coefficient of upper tail de-
pendence as a function of the correlation coefficient ρ for various values of the
number of degrees of freedom ν.
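Expression (4.91) is directly computable with scipy's Student distribution; the sketch below reproduces the qualitative behavior of Fig. 4.8 (λ increases with ρ, decreases with ν, and stays positive even for negative ρ).

```python
import numpy as np
from scipy import stats

def lambda_student(rho, nu):
    # Eq. (4.91): lambda = 2 * survival_{nu+1}( sqrt((nu+1)(1-rho)/(1+rho)) )
    arg = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.sf(arg, df=nu + 1)

lam_neg = lambda_student(-0.5, 3)   # strictly positive even for rho < 0
lam_3, lam_10, lam_100 = (lambda_student(0.8, nu) for nu in (3, 10, 100))
```

As ν grows, λ shrinks toward the Gaussian limit λ = 0, while the Student copula with few degrees of freedom keeps a substantial tail dependence.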
These distinctive properties of the Gaussian and Student’s copulas, char-
acterized by the absence or presence of tail dependence, are illustrated in
Fig. 4.9 which shows the realizations of two random variables with identical
standard Gaussian marginals, with a Gaussian copula or a Student’s copula
with three degrees of freedom and the same correlation coefficient ρ = 0.8.
In the right panel for the Student’s copula, the realizations (dots) are found
to lie within a diamond-shaped domain with narrower and narrower tips as
more extreme values are considered. This phenomenon can be observed, not
only for the bottom-left and upper-right quadrants, but also for the upper-left
and bottom-right quadrants. This results from the fact that the tail depen-
dence coefficient remains nonzero even for negative correlation coefficients as
illustrated in Fig. 4.8.
The Gaussian and Student’s distributions are two examples of elliptical
distributions. More generally, the following result is known: elliptically dis-
tributed random variables present a nonzero tail dependence if and only if
they are regularly varying, i.e., their distributions behave asymptotically like
174 4 Measures of Dependences

Fig. 4.9. Realizations of two random variables with Gaussian marginals and with
a Gaussian copula (left panel ) and a Student’s copula with three degrees of freedom
(right panel ) with the same correlation coefficient ρ = 0.8

power laws with some exponent ν > 0 [239]. In such a case, for every regu-
larly varying pair of random variables which are elliptically distributed, the
coefficient of tail dependence λ is given by expression (4.91). This result is
natural since the correlation coefficient is an invariant quantity within the
class of elliptical distributions and since the coefficient of tail dependence is
only determined by the asymptotic behavior of the distribution, so that it
does not matter whether the distribution is a Student's distribution with ν
degrees of freedom or any other elliptical distribution, as long as they have
the same asymptotic behavior in the tail.

Linear Factor Models

Consider the one-factor model

X1 = β1 · Y + ε1 ,   (4.92)
X2 = β2 · Y + ε2 ,   (4.93)

where the εi's are random variables independent of Y and the βi's are nonrandom positive coefficients.
The tail dependence λ of X1 and X2 can be simply expressed as the min-
imum of the tail dependence coefficients λ1 and λ2 between the two random
variables X1 and Y , on the one hand and X2 and Y , on the other hand
[332, 335]:

λ = min{λ1 , λ2 } . (4.94)

To understand this result, note that the tail dependence between X1 and X2
is created only through the common factor Y . It is thus natural that the tail
dependence between X1 and X2 is bounded from above by the weakest tail
4.5 Tail Dependence 175

dependence between the Xi ’s and Y while deriving the equality requires more
work. The result (4.94) generalizes to an arbitrary number of random variables
and shows that the study of the tail dependence in linear factor models can
be reduced to the analysis of the tail dependence between each individual
Xi and the common factor Y . In the following, we thus omit the subscript i
and consider without loss of generality one X linearly regressed on a factor Y
according to X = β · Y + ε.
A general result concerning the tail dependence generated by factor models
for any kind of factor and noise distributions is as follows [332, 335]: the
coefficient of (upper) tail dependence between X and Y is given by

λ = ∫_{max{1, l/β}}^{∞} f(x) dx ,   (4.95)

where, provided that they exist,

l = lim_{u→1} FX^{−1}(u) / FY^{−1}(u) ,   (4.96)

f(x) = lim_{t→∞} t · PY(t · x) / F̄Y(t) ,   (4.97)

where FX and FY are the marginal distribution functions of X and Y respectively, and PY is the density of Y .
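As a concrete check, for the Student's factor model treated in the sequel one finds f(x) = ν x^{−ν−1} for x > 0 and l = (β^ν + σ^ν)^{1/ν}, so that (4.95) can be evaluated by direct quadrature; a sketch assuming SciPy (function name ours):

```python
import math
from scipy.integrate import quad

def tail_dep_student_factor(beta, sigma, nu):
    """Evaluate (4.95) for a Student(nu) factor and Student(nu) noise of
    scale sigma: f(x) = nu * x**(-nu - 1) for x > 0 and
    l = (beta**nu + sigma**nu)**(1/nu)."""
    l = (beta ** nu + sigma ** nu) ** (1.0 / nu)
    lower = max(1.0, l / beta)
    val, _ = quad(lambda x: nu * x ** (-nu - 1.0), lower, math.inf)
    return val
```

The quadrature reproduces the closed form (4.98), λ = 1/(1 + (σ/β)^ν), derived below.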
As a direct consequence, one can show that any rapidly varying factor,
which encompasses the Gaussian, the exponential or the gamma distributed
factors for instance, leads to a vanishing coefficient of tail dependence, what-
ever the distribution of the idiosyncratic noise may be. This result is obvious
when both the factor and the idiosyncratic noise are normally distributed,
since then X and Y follow a bivariate Gaussian distribution, whose tail de-
pendence has been said to be zero.
On the contrary, regularly varying factors, like the Student’s distributed
factors, lead to a tail dependence, provided that the distribution of the idio-
syncratic noise does not become fatter-tailed than the factor distribution. One
can thus conclude that, in order to generate tail dependence, the factor must
have a sufficiently “wild” distribution. To present an explicit example, let us
assume now that the factor Y and the idiosyncratic noise ε have centered Student's
distributions with the same number ν of degrees of freedom and scale
factors respectively equal to 1 and σ. The choice of the scale factor equal to
1 for Y is not restrictive but only provides a convenient normalization for σ.
Appendix 4.A shows that the tail dependence coefficient is given by

λ = 1 / [1 + (σ/β)^ν] .   (4.98)

This expression shows that the larger the typical scale σ of the fluctuations of ε and the weaker the coupling coefficient β, the smaller is the tail dependence, in accordance with intuition.

The linear correlation coefficient ρ for the one-factor model is given by
ρ = (1 + σ²/β²)^{−1/2}, which allows us to rewrite the coefficient of upper tail
dependence in terms of ρ > 0 and ν > 2:

λ = ρ^ν / [ρ^ν + (1 − ρ²)^{ν/2}] = 1 / [1 + ((1 − ρ²)/ρ²)^{ν/2}] .   (4.99)

Surprisingly, λ does not go to zero for all ρ’s as ν goes to infinity, as could
be anticipated from the fact that the Student’s distribution converges to the
Gaussian distribution which is known to have zero tail dependence. Expression

(4.99) predicts that λ → 0 when ν → ∞ for all ρ’s smaller √ than 1/ 2. But,
and here lies the surprise, λ → 1 for all ρ larger than 1/ 2 when ν → ∞. This
counterintuitive result is due to a non-uniform convergence which makes the
order of the two limits non-commutative: taking first the limit u → 1 and then
ν → ∞ is different from taking first the limit ν → ∞ and then u → 1. In a
sense, by taking first the limit u → 1, we always ensure the power law regime
even if ν is later taken to infinity. This is different from first “sitting” on
the Gaussian limit ν → ∞. This paradoxical behavior reveals the sometimes
paradoxical consequences of taking the limit u → 1 in the definition of the
tail dependence.
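The non-commutativity of the two limits is easily observed numerically on expression (4.99) (function name ours):

```python
def tail_dep_vs_rho(rho, nu):
    """Coefficient of tail dependence (4.99) of the Student's one-factor
    model, for 0 < rho < 1 and nu > 2."""
    return 1.0 / (1.0 + ((1.0 - rho ** 2) / rho ** 2) ** (nu / 2.0))
```

For ν as large as 500, λ is already numerically indistinguishable from 0 below ρ = 1/√2 and from 1 above it, while λ = 1/2 exactly at ρ = 1/√2.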
As an illustration, Fig. 4.10 presents the coefficient of tail dependence for
the Student’s factor model as a function of ρ for various values of ν. It is
interesting to compare this figure with Fig. 4.8 depicting the coefficient of tail
dependence for the Student’s copula. Note that λ is vanishing for all negative
ρ’s in the case of the factor model, while λ remains nonzero for negative values
of the correlation coefficient for bivariate Student’s variables.
If Y and ε have different numbers νY and νε of degrees of freedom, two
cases occur. For νY < νε, ε is negligible asymptotically and λ = 1. For νY > νε,
X becomes asymptotically identical to ε. Then, X and Y have the same tail
dependence as ε and Y , which is zero by construction.
A straightforward generalization of this result can be derived for the multifactor model [72]:

X1 = β1,1 · Y1 + · · · + β1,n · Yn + ε1 ,   (4.100)
X2 = β2,1 · Y1 + · · · + β2,n · Yn + ε2 .   (4.101)

The following generalization of (4.98) gives the coefficient of tail dependence between X1 and one of the Yi as

λ1,i = β1,i^ν / [ Σ_{j=1}^{n} β1,j^ν + σ^ν ] ,   (4.102)

provided that the Yi's remain independent factors. For simplicity, we have assumed that all the factors are standardized, i.e., their scale factors are all equal to one. Generalizing expression (4.94), the coefficient of tail dependence between X1 and X2 is


Fig. 4.10. Coefficient of upper tail dependence as a function of the correlation coefficient ρ for various values of the number of degrees of freedom ν for the Student's factor model


λ = Σ_{i=1}^{n} 1{β1,i · β2,i > 0} · min(λ1,i , λ2,i) .   (4.103)

These results are of particular interest for portfolio analysis and risk man-
agement, as we shall see in the next section.
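Expressions (4.102) and (4.103) translate directly into code; the sketch below assumes positive loadings and standardized independent Student factors with ν degrees of freedom, as in the text (function names ours):

```python
def tail_dep_asset_factor(betas, i, sigma, nu):
    """(4.102): tail dependence between an asset with (positive) loadings
    `betas` on independent standardized Student(nu) factors and factor i;
    sigma is the scale of the asset's idiosyncratic Student noise."""
    return betas[i] ** nu / (sum(b ** nu for b in betas) + sigma ** nu)

def tail_dep_assets(betas1, betas2, sigma1, sigma2, nu):
    """(4.103): tail dependence between two assets; only factors loaded
    with the same sign by both assets contribute."""
    total = 0.0
    for i in range(len(betas1)):
        if betas1[i] * betas2[i] > 0:
            total += min(tail_dep_asset_factor(betas1, i, sigma1, nu),
                         tail_dep_asset_factor(betas2, i, sigma2, nu))
    return total
```

With a single common factor and identical loadings, the expression collapses back to the one-factor formula (4.98).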

4.5.4 Practical Implications


Let us now give two straightforward applications of the tail dependence for
financial purposes. We also refer the reader to [390] for other financial appli-
cations.

Portfolio Tail Risk Management


Table 4.3 presents the results obtained on the estimations of the upper and
lower coefficients of tail dependence between several major stocks and the
market represented here by the Standard & Poor’s 500, over the last decade.
The estimation has been performed under the assumption that (4.92–4.93)
hold, in which the factor is represented by the Standard & Poor’s 500. Using
the market index as the factor is reasonable since, according to standard
financial theory, the market’s return is well-known to be the most important
explanatory factor for the return of each individual asset.4 The coefficient of
4
In a situation where the common factor cannot be easily identified or estimated,
the results for elliptic distributions obtained in Sect. 4.5.3 may provide a conve-
nient alternative.

Table 4.3. This table presents the coefficients of lower and of upper tail depen-
dence of the companies traded on the NYSE and listed in the first column with
the Standard & Poor’s 500. The returns used for the calculations are sampled in
the time interval from January 1991 to December 2000. The numbers within the
parentheses are the estimated standard deviations of the empirical coefficients of
tail dependence. Reproduced from [332]

λL λU
Bristol-Myers Squibb Co. 0.16 (0.03) 0.14 (0.01)
Chevron Corp. 0.05 (0.01) 0.03 (0.01)
Hewlett-Packard Co. 0.13 (0.01) 0.12 (0.01)
Coca-Cola Co. 0.12 (0.01) 0.09 (0.01)
Minnesota Mining & MFG Co. 0.07 (0.01) 0.06 (0.01)
Philip Morris Cos Inc. 0.04 (0.01) 0.04 (0.01)
Procter & Gamble Co. 0.12 (0.02) 0.09 (0.01)
Pharmacia Corp. 0.06 (0.01) 0.04 (0.01)
Schering-Plough Corp. 0.12 (0.01) 0.11 (0.01)
Texaco Inc. 0.04 (0.01) 0.03 (0.01)
Texas Instruments Inc. 0.17 (0.02) 0.12 (0.01)
Walgreen Co. 0.11 (0.01) 0.09 (0.01)

tail dependence between any two assets is then easily derived from (4.94). It
is interesting to observe that the coefficients of tail dependence seem almost
identical in the lower and the upper tail. Nonetheless, the coefficient of lower
tail dependence is always slightly larger than the upper one, showing that large
losses are more likely to come together compared with large gain occurrences.
Two clusters of assets clearly stand out: those with a tail dependence of
about 10% (or more) and those with a tail dependence of about 5%. Let
us exploit this observation and explore some consequences of the existence
of stocks with drastically different tail dependence coefficients with the in-
dex. These stocks offer the interesting possibility of constructing a prudential
portfolio which can be significantly less sensitive to the large market moves.
Figure 4.11 compares the daily returns of the Standard & Poor’s 500 with
those of two portfolios P1 and P2 : P1 is made of the four stocks (Chevron
Corp., Philip Morris Cos Inc., Pharmacia Corp., and Texaco Inc.,) with the
smallest λ’s while P2 is made of the four stocks (Bristol-Meyer Squibb Co.,
Hewlett-Packard Co., Schering-Plough Corp., and Texas Instruments Inc.,)
with the largest λ’s. In fact, we have constructed two variants of P1 and two
variants of P2 . The first variant corresponds to choose the same weight 1/4
of each asset in each class of assets (with small λ’s for P1 and large λ’s for
P2 ). The second variant has asset weights in each class chosen in addition to
minimize the variance of the resulting portfolio. We find that the results are
almost the same between the equally weighted and minimum-variance port-
folios. This makes sense since the tail dependence coefficient of a bivariate

random vector does not depend on the variances of the components, which
only account for the price moves of moderate amplitudes.
Figure 4.11 presents the results for the equally weighted portfolios gener-
ated from the two groups of assets. Observe that only one large drop occurs
simultaneously for P1 and for the Standard & Poor’s 500 in contrast with P2
for which several large drops are associated with the largest drops of the in-
dex and only a few occur desynchronized. The figure clearly shows an almost
circular scatter plot for the large moves of P1 and the index compared with a
rather narrow ellipse, whose long axis is approximately along the first diago-
nal, for the large returns of P2 and the index, illustrating that the small tail
dependence between the index and the four stocks in P1 automatically implies
that their mutual tail dependence is also very small, according to (4.94). As a
consequence, P1 offers a better diversification with respect to large drops than
P2 . This effect, already quite significant for such small portfolios, should be
overwhelming for large ones. The most interesting result stressed in Fig. 4.11
is that optimizing for minimum tail dependence automatically diversifies away
the large risks.
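The λ's of Table 4.3 were obtained within the factor framework (4.92–4.93), but even a naive nonparametric proxy — the empirical conditional exceedance probability at a finite quantile level q < 1 — conveys how assets can be ranked by tail dependence with the index (a sketch, assuming NumPy; function name ours):

```python
import numpy as np

def empirical_tail_dep(x, y, q=0.95):
    """Finite-quantile proxy for the upper tail dependence:
    lambda_hat(q) = Pr[X > F_X^{-1}(q) | Y > F_Y^{-1}(q)],
    estimated by counting joint exceedances of the empirical q-quantiles."""
    x, y = np.asarray(x), np.asarray(y)
    xq, yq = np.quantile(x, q), np.quantile(y, q)
    above = y > yq
    return float(np.mean(x[above] > xq)) if above.any() else 0.0
```

Note that λ̂(q) only converges to λ as q → 1 jointly with the sample size, so it should be read as a biased but useful screening statistic.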

Fig. 4.11. Daily returns of two equally weighted portfolios P1 (made of four stocks
with small λ ≤ 0.06) and P2 (made of four stocks with large λ ≥ 0.12) as a function
of the daily returns of the Standard & Poor’s 500 over the period January 1991 to
December 2000. The straight (resp. dashed) line represents the regression of portfolio
P1 (resp. P2 ) on the Standard & Poor’s 500. Reproduced from [332]

These advantages of portfolio P1 with small tail dependence compared


with portfolio P2 with large tail dependence with respect to the Standard &
Poor’s 500 index come at almost no cost in terms of the daily Sharpe ratio,
equal respectively to 0.058 and 0.061 for the equally weighted and minimum
variance P1 and to 0.069 and 0.071 for the equally weighted and minimum
variance P2 .
The straight lines in Fig. 4.11 represent the linear regressions of the returns
of the two portfolios on the index returns, and show that there is significantly
less linear correlation between P1 and the index (correlation coefficient of 0.52
for both the equally weighted and the minimum variance P1 ) compared with
P2 and the index (correlation coefficient of 0.73 for the equally weighted P2
and of 0.70 for the minimum variance P2 ). Theoretically, it is possible to con-
struct two random variables with small correlation coefficient and large λ and
vice-versa. Recall that the correlation coefficient and the tail dependence coef-
ficient are two opposite end-members of dependence measures: the correlation
coefficient quantifies the dependence between relatively small moves while the
tail dependence coefficient measures the dependence during extreme events.
The finding that P1 comes with both the smallest correlation and the smallest
tail dependence coefficients suggests that they are not independent properties
of assets. This intuition is in fact explained and encompassed by the factor
model since the larger β is, the larger is the correlation coefficient and the
larger is the tail dependence. Diversifying away extreme shocks may provide
a useful diversification tool for less extreme dependences, thus improving the
potential usefulness of a strategy of portfolio management based on the tail
dependence proposed here.

Impact on Dependent Default Modeling

Consider N obligators with individual default probability πi , i = 1, . . . , N and


default indicator

Di = 1 ⇐⇒ Xi ≤ Ti , (4.104)

where (X1 , . . . , XN ) denotes the vector of latent variables and (T1 , . . . , TN )


the vector of thresholds below which default occurs. Sect. 3.6.4 has shown
that the probability that k obligators – labeled i1 , . . . , ik – among N default
is given by

Pr [Di1 = 1, . . . , Dik = 1] = C (πi1 , . . . , πik ) , (4.105)

where C is the copula of the latent variables Xi under consideration.


This equation emphasizes the key role of the copula in credit risk mod-
eling. Since default probabilities are generally very low, specifically for very
high quality obligators, the behavior of the copula in the extreme is crucial.
As a consequence, it could seem natural that the presence or the absence of

Table 4.4. Ratio of the 99% quantiles of the distribution of defaulting obligators
when the latent variables have a Student copula with ν degrees of freedom, nor-
malized with respect to the Gaussian copula, for portfolios of 10,000 homogeneous
credits with default probability πi and correlation ρ. The values of πi and ρ are the
same as in [186]

πi ρ ν = 50 ν = 10 ν=4
0.01% 2.58% 2.33 5.62 6.00
0.50% 3.80% 1.66 3.75 6.84
7.50% 9.21% 1.09 1.39 1.78

tail dependence between the latent variables Xi would be of particular impor-


tance. When latent variables are asymptotically independent – as assumed in
traditional models exposed in Sect. 3.6.4 – one can reasonably guess that such
models would underestimate the actual occurrence of concomitant defaults.
This view has been advocated by Frey et al. [185, 186], among others.
Considering large credit portfolios, they investigate the evolution of the total
number of defaulting obligators when the dependence structure describing
their interaction changes. Table 4.4 gives the ratios of the 99% quantiles of
the distribution of defaulting obligators for a Student’s copula normalized with
respect to the Gaussian copula, for three credit groups of different quality. The
ratio of the quantiles increases when the number of degrees of freedom of the
Student’s copula decreases, since the dependence between extremes becomes
stronger. We also observe that the ratio of the quantiles decreases with the
quality of the obligator, i.e., when the default probability increases. Indeed,
in such a case, the tail dependence has a weaker impact on the portfolio loss,
and therefore the exact shape of the copula in the neighborhood of (0, 0) or
of (1, 1) is less important.
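The qualitative effect reported in Table 4.4 can be reproduced with a small latent-variable simulation (a sketch; the one-factor equicorrelation structure and the default parameter values, echoing the middle row of Table 4.4 with πi = 0.50% and ρ = 3.80%, are our illustrative choices). The Student copula is obtained by dividing the Gaussian latent vector by a common χ²/ν mixing variable:

```python
import numpy as np
from scipy.stats import norm, t

def default_quantile(pi=0.005, rho=0.038, n_credits=1000, n_scen=4000,
                     nu=None, q=0.99, seed=0):
    """q-quantile of the number of defaults in a homogeneous portfolio whose
    latent variables have a one-factor Gaussian copula (nu=None) or a
    Student's copula with nu degrees of freedom."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_scen)                     # common factor
    eps = rng.standard_normal((n_scen, n_credits))      # idiosyncratic parts
    x = np.sqrt(rho) * w[:, None] + np.sqrt(1.0 - rho) * eps
    if nu is None:
        thresh = norm.ppf(pi)
    else:
        chi2 = rng.chisquare(nu, size=n_scen)           # shared mixing variable
        x = x / np.sqrt(chi2 / nu)[:, None]
        thresh = t.ppf(pi, df=nu)
    return float(np.quantile((x <= thresh).sum(axis=1), q))
```

The ratio of the Student to the Gaussian quantile comes out well above one, in line with Table 4.4.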
Overall, these simulations tend to give substance to the assertion that the
choice of the copula is fundamental. However, some recent studies support the
opposite point of view. For the practical purpose of pricing credit derivatives,
several authors [291, 430] have shown that the choice of the copula has in fact
only a weak impact on the value of such contracts. As an example, Laurent
and Gregory [291] show that the premium for the first-to-default swap in bas-
ket default swaps is almost the same for a Gaussian copula and for a Clayton
copula. Such results also hold for CDO5 tranches. Schloegl and O’Kane con-
firm these results for the Student’s copula [430], for which they find that it
does not provide significant improvement with respect to the Gaussian copula.
To sum up, for credit derivative pricing, the choice of the copula does not
appear to be crucial. However, taking into account the simulation results in
[186], we clearly see that the dependence structure has a real impact on the
5
Collateralized Debt Obligation

loss distribution of the credit portfolio. Therefore, even if the copula is not so
important for derivative pricing, it could be really crucial to establish hedging
strategies. This point has not yet been really explored to our knowledge,
but appears as an important future development of the research on credit
derivatives.

Appendix

4.A Tail Dependence Generated by Student’s Factor Model

We consider two random variables X and Y , related by

X = βY + ε ,   (4.A.1)

where ε is a random variable independent of Y and β a nonrandom positive coefficient. Let us assume that Y and ε have Student's distributions with densities

PY(y) = Cν / (1 + y²/ν)^{(ν+1)/2} ,   (4.A.2)

Pε(ε) = Cν / { σ [1 + ε²/(νσ²)]^{(ν+1)/2} } ,   (4.A.3)

where Cν denotes the normalization constant of the Student's density with ν degrees of freedom.

Lemma 4.5.1. The probability that X is larger than FX^{−1}(u), knowing that Y is larger than FY^{−1}(u), is given by:

Pr[X > FX^{−1}(u) | Y > FY^{−1}(u)] = F̄ε(η) + [β/(1 − u)] ∫_{FY^{−1}(u)}^{∞} dy F̄Y(y) · Pε[βFY^{−1}(u) + η − βy] ,   (4.A.4)

with

η = FX^{−1}(u) − βFY^{−1}(u) .   (4.A.5)

The proof of this lemma relies on a simple integration by parts and a change
of variable, which are detailed in Appendix 4.A.1.
Introducing the notation

Ỹu = FY^{−1}(u) ,   (4.A.6)

Appendix 4.A.2 shows that

η = β { [1 + (σ/β)^ν]^{1/ν} − 1 } Ỹu + O(Ỹu^{−1}) .   (4.A.7)
Appendix 183

This allows us to conclude that η → +∞ as u → 1. Thus, F̄ε(η) → 0 as u → 1 and

λ = lim_{u→1} [β/(1 − u)] ∫_{Ỹu}^{∞} dy F̄Y(y) · Pε(βỸu + η − βy) .   (4.A.8)

Now, using the following result:

Lemma 4.5.2. Assuming ν > 0 and x0 > 1,

lim_{ε→0} (1/ε) ∫_{1}^{∞} dx (1/x^ν) · Cν / [1 + (1/ν)((x − x0)/ε)²]^{(ν+1)/2} = 1/x0^ν ,   (4.A.9)

whose proof is given in Appendix 4.A.3, it is straightforward to show that

λ = 1 / [1 + (σ/β)^ν] .   (4.A.10)

The final steps of this derivation are given in Appendix 4.A.4.

4.A.1 Proof of Lemma 4.5.1

By definition,

Pr[X > FX^{−1}(u), Y > FY^{−1}(u)] = ∫_{FX^{−1}(u)}^{∞} dx ∫_{FY^{−1}(u)}^{∞} dy PY(y) · Pε(x − βy)
  = ∫_{FY^{−1}(u)}^{∞} dy PY(y) · F̄ε[FX^{−1}(u) − βy] .

Let us perform an integration by parts:

Pr[X > FX^{−1}(u), Y > FY^{−1}(u)]
  = [ −F̄Y(y) · F̄ε(FX^{−1}(u) − βy) ]_{FY^{−1}(u)}^{∞} + β ∫_{FY^{−1}(u)}^{∞} dy F̄Y(y) · Pε(FX^{−1}(u) − βy)
  = (1 − u) F̄ε(FX^{−1}(u) − βFY^{−1}(u)) + β ∫_{FY^{−1}(u)}^{∞} dy F̄Y(y) · Pε(FX^{−1}(u) − βy) .

Defining η = FX^{−1}(u) − βFY^{−1}(u) (see (4.A.5)), and dividing each term by

Pr[Y > FY^{−1}(u)] = 1 − u ,   (4.A.11)

we obtain the result given in (4.A.4).



4.A.2 Derivation of Equation (4.A.7)

The factor Y and the idiosyncratic noise ε are distributed according to the Student's distributions with ν degrees of freedom given by (4.A.2) and (4.A.3) respectively. It follows that the survival distributions of Y and ε are

F̄Y(y) = ν^{(ν−1)/2} Cν / y^ν + O(y^{−(ν+2)}) ,   (4.A.12)

F̄ε(ε) = σ^ν ν^{(ν−1)/2} Cν / ε^ν + O(ε^{−(ν+2)}) ,   (4.A.13)

and

F̄X(x) = (β^ν + σ^ν) ν^{(ν−1)/2} Cν / x^ν + O(x^{−(ν+2)}) .   (4.A.14)

Using the notation (4.A.6), (4.A.5) can be rewritten as

F̄X(η + βỸu) = F̄Y(Ỹu) = 1 − u ,   (4.A.15)

whose solution for large Ỹu (or equivalently as u goes to 1) is

η = β { [1 + (σ/β)^ν]^{1/ν} − 1 } Ỹu + O(Ỹu^{−1}) .   (4.A.16)

To obtain this equation, we have used the asymptotic expressions of F̄X and F̄Y given in (4.A.14) and (4.A.12).

4.A.3 Proof of Lemma 4.5.2

The change of variable

u = (x − x0)/ε ,   (4.A.17)

gives

(1/ε) ∫_{1}^{∞} dx (1/x^ν) · Cν / [1 + (1/ν)((x − x0)/ε)²]^{(ν+1)/2}
  = ∫_{(1−x0)/ε}^{∞} du Cν / [ (εu + x0)^ν (1 + u²/ν)^{(ν+1)/2} ]
  = (1/x0^ν) ∫_{(1−x0)/ε}^{∞} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2}
  = (1/x0^ν) ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2}
    + (1/x0^ν) ∫_{x0/ε}^{∞} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2} .
ν

Consider the second integral in the right-hand side of the last equality. We have

u ≥ x0/ε ,   (4.A.18)

which allows us to write

1/(1 + u²/ν)^{(ν+1)/2} ≤ ν^{(ν+1)/2} ε^{ν+1} / x0^{ν+1} ,   (4.A.19)

so that

∫_{x0/ε}^{∞} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2}
  ≤ [ν^{(ν+1)/2} ε^{ν+1} Cν / x0^{ν+1}] ∫_{x0/ε}^{∞} du / (1 + εu/x0)^ν   (4.A.20)
  = [ν^{(ν+1)/2} ε^ν Cν / x0^ν] ∫_{1}^{∞} dv / (1 + v)^ν   (4.A.21)
  = O(ε^ν) .   (4.A.22)
The next step of the proof is to show that

∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2} −→ 1   as ε −→ 0 .   (4.A.23)

Let us calculate

| ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2} − 1 |
  = | ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2} − ∫_{−∞}^{∞} du Cν / (1 + u²/ν)^{(ν+1)/2} |
  = | ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν − 1] · Cν / (1 + u²/ν)^{(ν+1)/2}
      − ∫_{−∞}^{(1−x0)/ε} du Cν / (1 + u²/ν)^{(ν+1)/2} − ∫_{x0/ε}^{∞} du Cν / (1 + u²/ν)^{(ν+1)/2} |
  ≤ ∫_{(1−x0)/ε}^{x0/ε} du | 1/(1 + εu/x0)^ν − 1 | · Cν / (1 + u²/ν)^{(ν+1)/2}
      + ∫_{−∞}^{(1−x0)/ε} du Cν / (1 + u²/ν)^{(ν+1)/2} + ∫_{x0/ε}^{∞} du Cν / (1 + u²/ν)^{(ν+1)/2} ,   (4.A.24)

where the first equality uses the fact that the Student's density Cν / (1 + u²/ν)^{(ν+1)/2} integrates to one over (−∞, ∞).


Fig. 4.12. The graph of the function (4.A.25) (thick solid line), the chord which gives an upper bound of the function within [(1 − x0)/ε, 0] (dashed line) and the tangent at 0+ which gives an upper bound of the function in the interval [0, x0/ε] (dash-dotted line)

The second and third integrals obviously behave like O(ε^ν) when ε goes
to zero, since we have assumed x0 > 1, which ensures that (1 − x0)/ε → −∞ and
x0/ε → ∞ when ε → 0+. For the first integral, we have

| ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν − 1] · Cν / (1 + u²/ν)^{(ν+1)/2} |
  ≤ ∫_{(1−x0)/ε}^{x0/ε} du | 1/(1 + εu/x0)^ν − 1 | · Cν / (1 + u²/ν)^{(ν+1)/2} .

The function

1/(1 + εu/x0)^ν − 1   (4.A.25)

vanishes at u = 0, is convex for u ∈ [(1 − x0)/ε, 0) and concave for u ∈ (0, x0/ε] (see Fig. 4.12), so that there are two constants A, B > 0 such that
1/(1 + εu/x0)^ν − 1 ≤ −[(x0^ν − 1)/(x0 − 1)] · ε · u = −A · ε · u ,   ∀u ∈ [(1 − x0)/ε, 0] ,   (4.A.26)

1 − 1/(1 + εu/x0)^ν ≤ (ν/x0) · ε · u = B · ε · u ,   ∀u ∈ [0, x0/ε] .   (4.A.27)

We can thus conclude that

∫_{(1−x0)/ε}^{x0/ε} du | 1/(1 + εu/x0)^ν − 1 | · Cν / (1 + u²/ν)^{(ν+1)/2}
  ≤ −A · ε ∫_{(1−x0)/ε}^{0} du (u · Cν) / (1 + u²/ν)^{(ν+1)/2} + B · ε ∫_{0}^{x0/ε} du (u · Cν) / (1 + u²/ν)^{(ν+1)/2}
  = O(ε^α) ,   (4.A.28)

with α = min{ν, 1}. Indeed, the two integrals can be performed exactly, which shows that they behave as O(1) if ν > 1 and as O(ε^{ν−1}) otherwise. Thus, we finally obtain

| ∫_{(1−x0)/ε}^{x0/ε} du [1/(1 + εu/x0)^ν] · Cν / (1 + u²/ν)^{(ν+1)/2} − 1 | = O(ε^α) .   (4.A.29)

Putting together (4.A.22) and (4.A.29) gives

| (1/ε) ∫_{1}^{∞} dx (1/x^ν) · Cν / [1 + (1/ν)((x − x0)/ε)²]^{(ν+1)/2} − 1/x0^ν | = O(ε^{min{ν,1}}) ,   (4.A.30)

which concludes the proof.

4.A.4 Derivation of Equation (4.A.10)

From (4.A.12), we can deduce

F̄Y(y) = [ν^{(ν−1)/2} Cν / y^ν] · (1 + O(y^{−2})) .   (4.A.31)

Using (4.A.3) and (4.A.7), we obtain

Pε(βỸu + η − βy) = Pε(γỸu − βy) · (1 + O(Ỹu^{−2})) ,   (4.A.32)

where

γ = β [1 + (σ/β)^ν]^{1/ν} .   (4.A.33)
188 4 Measures of Dependences

Putting together these results yields, to leading order,

∫_{Ỹu}^{∞} dy F̄Y(y) · Pε(βỸu + η − βy)
  = ∫_{Ỹu}^{∞} dy [ν^{(ν−1)/2} Cν / y^ν] · Cν / { σ [1 + (γỸu − βy)²/(νσ²)]^{(ν+1)/2} }
  = [ν^{(ν−1)/2} Cν / (β Ỹu^ν)] · (βỸu/σ) ∫_{1}^{∞} dx (1/x^ν) · Cν / [1 + (1/ν)((x − γ/β)/(σ/(βỸu)))²]^{(ν+1)/2} ,   (4.A.34)

where the change of variable x = y/Ỹu has been performed in the last equation.
We now apply Lemma 4.5.2 with x0 = γ/β > 1 and ε = σ/(βỸu), which goes to
zero as u → 1. This gives

∫_{Ỹu}^{∞} dy F̄Y(y) · Pε(βỸu + η − βy) ∼_{u→1} [ν^{(ν−1)/2} Cν / (β Ỹu^ν)] · (β/γ)^ν ,   (4.A.35)

which shows that

Pr[X > FX^{−1}(u), Y > FY^{−1}(u)] ∼_{u→1} (β/γ)^ν F̄Y(Ỹu) = (β/γ)^ν (1 − u) .

Therefore

Pr[X > FX^{−1}(u) | Y > FY^{−1}(u)] ∼_{u→1} (β/γ)^ν ,   (4.A.36)

which finally yields

λ = 1 / [1 + (σ/β)^ν] .   (4.A.37)
5
Description of Financial Dependences
with Copulas

There are two general methods for estimating empirically the copula best de-
scribing the dependence structure of a basket of assets, and more generally of
a portfolio made of different financial and/or actuarial risks: parametric and
nonparametric. The latter class is by far the most general since it does not
require the a priori specification of a model, and should thus avoid the prob-
lem of misspecification (model error). In contrast, the parametric approach
has the advantage that, if a model is correctly specified, it leads to a much
more precise parametric estimation. In addition, the reduced number of para-
meters involved in the description of the selected copula can be interpreted as
being the relevant meaningful variables that summarize the dependence prop-
erties between the assets. Consider for instance the Gaussian representation,
or more generally any presentation in terms of elliptical distributions, whose
dependence structure is, to large extent (see Chap. 4), summarized by the
set of linear coefficients of correlation. These coefficients of correlation thus
play a pivotal role and it is tempting to interpret them as the macrovariables
(or phenomenological variables) synthesizing all possible microstructural in-
teractions between economic agents leading to the observed dependence. Let
us recall that identifying the “correct variables” constitutes the critical first
step in model building to obtain the best possible representation of observed
phenomena. The usefulness of the parametric estimation is thus obvious from
this point of view.
The first section of this chapter reviews the most representative methods
to estimate copulas, with an emphasis on the description of parametric ap-
proaches. The following section focuses on the problem of model selection and
on goodness-of-fit tests. Indeed, the estimation procedure has no sense if the
quality and the likelihood of the model are not assessed. Instead of reviewing
the many available goodness-of-fit tests, we discuss how to best describe the
dependence structure of asset returns and we compare the relative merits of
the different models considered in the literature to address this question.
190 5 Description of Financial Dependences with Copulas

5.1 Estimation of Copulas


There is a significant body of literature on the estimation of copulas. This
section aims at summarizing some of the most popular techniques which have
appeared in the statistical literature and which are now of common use in
modeling financial and economic variables as well as actuarial risks.

5.1.1 Nonparametric Estimation

The Empirical Copula

The very first copula estimation method dates back to the work by Deheuvels
[121, 122]. It relies on a simple generalization of the usual estimator
of a multivariate distribution. Indeed, considering an n-dimensional random
vector X = (X1, . . . , Xn) whose copula is C and given a sample of size T ,
{(x1(1), x2(1), . . . , xn(1)), . . . , (x1(T), x2(T), . . . , xn(T))}, a natural idea is to
estimate the empirical distribution function F of X as

F̂(x) = (1/T) Σ_{k=1}^{T} 1{x1(k)≤x1, ..., xn(k)≤xn} ,   (5.1)

and the empirical marginal distribution functions of the Xi's as

F̂i(xi) = (1/T) Σ_{k=1}^{T} 1{xi(k)≤xi} .   (5.2)

One would then be tempted to apply Sklar's theorem to obtain a nonparametric
estimation of the copula C. Unfortunately, even if the margins of F
are continuous, their empirical counterparts are not. Therefore, one cannot
determine a unique estimated copula Ĉ.1 Following this approach, one can,
however, obtain a unique nonparametric estimator of C defined at the discrete
points (i1/T, i2/T, . . . , in/T), with ik ∈ {1, 2, . . . , T}. Inverting the empirical
marginal distribution functions, we obtain

Ĉ(i1/T, i2/T, . . . , in/T) = (1/T) Σ_{k=1}^{T} 1{x1(k)≤x1(i1;T), ..., xn(k)≤xn(in;T)} ,   (5.3)

where xp(k; T) denotes the k-th order statistic of the sample {xp(1), . . . , xp(T)}. Following Deheuvels, one can define an empirical copula as any copula which satisfies the relation (5.3).
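Relation (5.3) amounts to counting joint rank statistics; a bivariate sketch (assuming NumPy and a sample without ties; function name ours):

```python
import numpy as np

def empirical_copula_2d(x, y, u, v):
    """Deheuvels' empirical copula at a grid point (i/T, j/T):
    C_hat(i/T, j/T) = (1/T) * #{k : rank(x_k) <= i and rank(y_k) <= j},
    with (i, j) the grid indices closest to (u*T, v*T)."""
    x, y = np.asarray(x), np.asarray(y)
    T = len(x)
    rx = np.argsort(np.argsort(x)) + 1   # ranks 1..T (no ties assumed)
    ry = np.argsort(np.argsort(y)) + 1
    i, j = int(round(u * T)), int(round(v * T))
    return float(np.mean((rx <= i) & (ry <= j)))
```

Comonotonic data reproduce min(u, v) and countermonotonic data max(u + v − 1, 0), the two Fréchet–Hoeffding bounds.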
It is well-known that the empirical distribution function F̂ converges uniformly,
almost surely, to the underlying distribution function F from which
the sample is drawn, as the sample size T goes to infinity. This property still
holds for the nonparametric estimator defined by the empirical copula:

sup_{u∈[0,1]^n} | Ĉ(u) − C(u) | −→ 0   almost surely.   (5.4)

1 The same issue arises, of course, for the empirical estimation of marginal as well as multivariate distributions.

Similarly, the empirical copula density ĉ can be estimated by

ĉ(i1/T, . . . , in/T) = 1/T if (x1(i1; T), . . . , xn(in; T)) belongs to the sample, and 0 otherwise.   (5.5)

The following relation holds between the empirical copula Ĉ and the empirical copula density:

ĉ(i1/T, . . . , in/T) = Σ_{k1=1}^{2} · · · Σ_{kn=1}^{2} (−1)^{k1+···+kn} × Ĉ((i1 − k1 + 1)/T, . . . , (in − kn + 1)/T) .   (5.6)
A natural question arises: what is the estimated value of C(u) or c(u) when
u does not belong to the lattice defined by the set of points (i1/T, i2/T, . . . , in/T),
with ik ∈ {1, 2, . . . , T}? It would seem that this is nothing but a straightforward
interpolation problem, which could be solved by constructing a simple
staircase function or applying spline functions, for instance. However,
such methods of interpolation do not ensure that the function so obtained
fulfills the requirements for a copula, according to Definition 3.2.1; in particular,
the function must be n-increasing, which requires a multilinear interpolation
scheme. In the bivariate case (for simplicity), given any point
(u, v) ∈ [ku/T, (ku + 1)/T] × [kv/T, (kv + 1)/T], where ku, kv ∈ {0, 1, . . . , T − 1} denote the
integer parts of T · u and T · v respectively, the following interpolation

C̃(u, v) = Ĉ(ku/T, kv/T) · (ku + 1 − T · u)(kv + 1 − T · v)
  + Ĉ(ku/T, (kv + 1)/T) · (ku + 1 − T · u)(T · v − kv)
  + Ĉ((ku + 1)/T, kv/T) · (T · u − ku)(kv + 1 − T · v)
  + Ĉ((ku + 1)/T, (kv + 1)/T) · (T · u − ku)(T · v − kv) ,   (5.7)

defines a bona fide empirical copula. Indeed, by construction C̃ is a copula (see [370, p. 16]) and C̃(i/T, j/T) = Ĉ(i/T, j/T) for all i, j ∈ {1, . . . , T}.
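The bilinear interpolation (5.7) is immediate to implement; a sketch (function name ours; `C_hat(i, j)` is assumed to return the empirical copula at the grid point (i/T, j/T)):

```python
def interpolated_copula(C_hat, T, u, v):
    """Bilinear interpolation (5.7) of an empirical copula defined on the
    grid {0, 1/T, ..., 1}; exact at the grid points by construction."""
    ku, kv = min(int(T * u), T - 1), min(int(T * v), T - 1)
    au, av = ku + 1 - T * u, kv + 1 - T * v   # weights in [0, 1]
    return (C_hat(ku, kv) * au * av
            + C_hat(ku, kv + 1) * au * (1 - av)
            + C_hat(ku + 1, kv) * (1 - au) * av
            + C_hat(ku + 1, kv + 1) * (1 - au) * (1 - av))
```

Because the interpolation is multilinear in each cell, the product (independence) copula is reproduced exactly.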
Li et al. [304, 305] have provided some other insightful methods. One of
them relies on the use of Bernstein polynomials,
192 5 Description of Financial Dependences with Copulas
 
n i
Pi,n (x) = x (1 − x)n−i . (5.8)
i

Defining

Ĉ_B(u) = Σ_{i1=1}^{T} ⋯ Σ_{in=1}^{T} P_{i1,T}(u1) ⋯ P_{in,T}(un) · Ĉ(i1/T, i2/T, …, in/T) ,      (5.9)

one obtains a copula which converges uniformly to C, almost surely as T goes


to infinity (a weak form of the Stone-Weierstrass theorem).2
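In the bivariate case, the Bernstein smoothing (5.9) reduces to weighting the empirical copula grid by binomial probabilities. The sketch below is a hypothetical helper illustrating this; note that the terms with i = 0 contribute nothing since Ĉ(0, ·) = 0.

```python
import numpy as np
from scipy.stats import binom

def bernstein_copula(u, C_grid):
    """Bernstein-polynomial smoothing (5.9) of a bivariate empirical copula.
    C_grid is the (T+1) x (T+1) array of values C_hat(i/T, j/T)."""
    T = C_grid.shape[0] - 1
    p1 = binom.pmf(np.arange(T + 1), T, u[0])  # Bernstein weights P_{i,T}(u1)
    p2 = binom.pmf(np.arange(T + 1), T, u[1])  # Bernstein weights P_{j,T}(u2)
    return float(p1 @ C_grid @ p2)
```

Since the Bernstein operator reproduces (bi)linear functions exactly, feeding it the independence copula grid i·j/T² returns u1·u2 exactly.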
While the method provides a smooth, infinitely differentiable copula, it comes with two severe drawbacks:
• First, it is easy to show [138] that any differentiable copula in the neigh-
borhood of (1, 1) (or of (0, 0)) has a vanishing coefficient of tail dependence
λ (see Chap. 4). Indeed, a necessary condition for λ not to vanish is that
the copula be non-differentiable in the neighborhood of (1, 1).3 Thus, by
construction, a nonparametric estimation of copulas using the interpola-
tion method described above automatically forbids a correct estimation of
the tail dependence parameter. Such an estimation amounts to projecting the
copula onto the set of copulas with vanishing tail dependence.
• Second, the convergence of the derivatives of ĈB toward the derivatives
of C is not a priori ensured. As a consequence, it is not possible to use
these estimates to generate simulated data enjoying the same dependence
structure as that of the sample (see Sect. 3.5). This is particularly harmful
since one often has to resort to Monte Carlo simulations and bootstrap
methods to assess portfolio risk or to value derivative assets. It is thus
necessary to look for nonparametric estimators of both the copula and its
derivatives.

Kernel Copula Estimator

Smooth joint estimates of a copula and of its derivatives can be obtained


by using a kernel-based approach [168]. Still considering an n-dimensional
random vector X with copula C, let us call its joint distribution function
F and its marginal distribution functions Fi such that F (X1 , . . . , Xn ) =
2
According to the Weierstrass approximation theorem, any continuous function
defined on an interval [a, b] can be uniformly approximated as closely as desired by
a polynomial function. The Stone-Weierstrass theorem generalizes the Weierstrass
approximation theorem in two directions by considering an arbitrary compact
Hausdorff space instead of a compact interval [a, b], and approximations with
elements from more general sets than polynomials.
3
Note that this condition is necessary but not sufficient as shown for instance
by the Gaussian copula which is not differentiable at (1, 1) but nevertheless has
vanishing tail dependence.
5.1 Estimation of Copulas 193

C (F1 (X1 ) , . . . , Fn (Xn )). The most commonly used kernel is probably the
Gaussian kernel,
ϕ(x) = (1/√(2π)) · e^{−x²/2} .      (5.10)

We present the general procedure detailed in [168] on this particular example.
Let us first estimate the joint distribution of X. Given the sample of size T, {(x1(1), x2(1), …, xn(1)), …, (x1(T), x2(T), …, xn(T))}, the kernel estimates of Fi(x) and F(x) are

F̂_i(x_i) = (1/T) Σ_{t=1}^{T} Φ((x_i − x_i(t))/h_i) ,      (5.11)

and

F̂(x) = (1/T) Σ_{t=1}^{T} ∏_{i=1}^{n} Φ((x_i − x_i(t))/h_i) ,      (5.12)

where

Φ(x) = ∫_{−∞}^{x} ϕ(t) dt      (5.13)

and (h1, …, hn) is the bandwidth, a function of T with values in R^n satisfying

h_i(T) > 0 ,   ∀T, i ∈ {1, …, n} ,      (5.14)

Σ_{i=1}^{n} h_i(T) + (T · ∏_{i=1}^{n} h_i(T))^{−1} → 0 ,   as T → ∞ .      (5.15)

In practice, one usually chooses h_i = σ̂_i · (4/(3T))^{1/5} , where σ̂_i denotes the
sample standard deviation of {xi (1), . . . , xi (T )}.
Defining q̂, the vector whose ith component is the u_i-quantile of F̂_i,

q̂_i = inf_{x∈R} {x : F̂_i(x) ≥ u_i} ,   u_i ∈ (0, 1) ,      (5.16)

the kernel estimator of the copula C is simply given by

Ĉ(u_1, …, u_n) = F̂(q̂) .      (5.17)
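The chain (5.11)-(5.17) — smoothed margins, quantiles, smoothed joint cdf — can be sketched as follows. This is a minimal illustration assuming the Gaussian kernel and the rule-of-thumb bandwidth quoted above; the grid search used to locate the quantile is a crude stand-in for a proper root finder, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def kernel_copula(x, u):
    """Gaussian-kernel copula estimator: C_hat(u) = F_hat(q_hat(u)).
    x is a (T, n) sample; u a point in (0, 1)^n."""
    T, n = x.shape
    h = x.std(axis=0, ddof=1) * (4.0 / (3.0 * T)) ** 0.2  # h_i = sigma_i (4/(3T))^(1/5)
    q = np.empty(n)
    for i in range(n):
        # q_i = inf{x : F_hat_i(x) >= u_i}, located on a fine grid
        grid = np.linspace(x[:, i].min() - 5 * h[i], x[:, i].max() + 5 * h[i], 2000)
        Fi = norm.cdf((grid[:, None] - x[None, :, i]) / h[i]).mean(axis=1)
        q[i] = grid[min(np.searchsorted(Fi, u[i]), len(grid) - 1)]
    # smoothed joint cdf (5.12) evaluated at the vector of quantiles (5.17)
    return float(np.prod(norm.cdf((q[None, :] - x) / h[None, :]), axis=1).mean())
```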

Under mild regularity conditions, this kernel estimator is asymptotically Gaussian. From Proposition 1 in [168], one can show that

(T · ∏_{i=1}^{n} h_i)^{1/2} · (Ĉ(u) − C(u)) → N(0, C(u)) .      (5.18)

This asymptotic behavior holds even when the sample is not iid, provided
that the underlying process satisfies some strong mixing conditions, roughly
speaking (see [168] for details). Therefore, this method can be applied to
financial asset returns, which are known to exhibit volatility clustering among
other time dependence patterns.
By construction of the kernel estimator, it can be differentiated with re-
spect to the ui ’s. It is thus easy to obtain an estimator of a partial derivative
of the copula with respect to one (or more) of the variables. For instance, the
kernel estimator of the first order partial derivative of C with respect to ui is


∂Ĉ(u)/∂u_i = (1/f̂_i(q̂_i(u_i))) · ∂_i F̂(q̂(u)) ,      (5.19)

where fˆi denotes the kernel estimate of the marginal density of Xi ,


 
f̂_i(x_i) = (1/(T·h_i)) Σ_{t=1}^{T} ϕ((x_i − x_i(t))/h_i) ,      (5.20)

and ∂i F̂ is the partial derivative of F̂ with respect to its ith variable.


Again, under mild regularity conditions, it can be shown that this estimator is asymptotically Gaussian, so that

(T · ∏_{i=1}^{n} h_i)^{1/2} · (∂Ĉ(u)/∂u_i − ∂C(u)/∂u_i) → N(0, (1/f_i(q_i(u_i))) · ∂C(u)/∂u_i) .      (5.21)

Applying the same kind of arguments, one can estimate the higher order
partial derivatives of the copula C:


∂Ĉ(u)/(∂u_{i1} ⋯ ∂u_{ik}) = ∂_{i1,…,ik} F̂(q̂(u)) / (f̂_{i1}(q̂_{i1}(u_{i1})) ⋯ f̂_{ik}(q̂_{ik}(u_{ik}))) ,      (5.22)

where all the ij ’s are assumed different and k ≤ n. As a consequence, it


becomes possible to simulate random variables with copula Ĉ, by using the
algorithm detailed in Sect. 3.5.2.
When k = n, one obtains the kernel estimator of the copula density:

ĉ(u) = f̂(q̂(u)) / (f̂_1(q̂_1(u_1)) ⋯ f̂_n(q̂_n(u_n))) ,      (5.23)

where f̂ denotes the kernel estimate of the joint density of X,

f̂(x) = (1/(T·∏_{i} h_i)) Σ_{t=1}^{T} ∏_{i=1}^{n} ϕ((x_i − x_i(t))/h_i) .      (5.24)
Fig. 5.1. Contour plot of the copula density estimated by the kernel method for the daily returns of the couple constituted of the German Mark (u variable) and the Japanese Yen (v variable) over the time interval from 1 May, 1973 to 2 November, 2001 (left panel) and for the couple made of General Motors (u variable) and Procter & Gamble (v variable) over the time period from 3 July, 1962 to 29 December, 2000 (right panel). The German Mark data has been reconstructed from the Euro data after 31 December, 1999

Two examples of copula densities estimated by the kernel method are shown
in Fig. 5.1. Observe that the level curves of the left panel are rather similar to
those of Fig. 3.3, which depicts the contour plot of a Student copula. This is
suggestive of the relevance of a Student’s copula with a moderate number of
degrees of freedom as a possible candidate for modeling dependencies between
the returns of foreign exchange rates.4 For stock returns, the situation is less
clear, even if one could surmise that a Student copula with a large number of
degrees of freedom could be a reasonable model.
To sum up this paragraph on kernel estimators, let us stress that, notwith-
standing their seeming attractiveness, they have a severe drawback as they
require a very large amount of data. As an illustration, in order to obtain the
two pictures of Fig. 5.1, we used between 7,000 and 10,000 data points. With
less than 2,500–5,000 points, one obtains unreliable estimates in most cases,
showing that the kernel estimators behave badly for small samples. Therefore,
with daily returns, an accurate non-parametric estimate of the copula requires
between 30 and 40 years of data. Over such a long time period, it is far from
given that the dependence structure remains stationary, thus possibly blowing
up the whole effort.

5.1.2 Semiparametric Estimation

When the number of observations is not large enough and/or when one has
a sufficiently accurate idea of the true model, it is in general more profitable
4
Of course, this statement should be formally tested by using rigorous statistical
techniques. See the following sections.
196 5 Description of Financial Dependences with Copulas

to apply a parametric or semiparametric estimation method. By parametric,


we mean a method based on an entirely parametric model: in such a case,
we assume that the true model belongs to a given family of multivariate distributions, i.e., a family of copulas plus a family of univariate distributions for each individual marginal law. Such a modeling approach requires a very accurate knowledge of the true distribution and can lead to bad estimations of
the copula parameters if the marginals are misspecified. Thus, when in doubt
concerning the univariate marginal distributions of the data (see in this re-
spect the cautionary study presented in Chap. 2), a semiparametric approach
may be preferable. Indeed, in contrast with fully parametric methods, semi-
parametric techniques use a parametric representation only for the copula. No
assumption is made concerning the marginal distributions, which may either
be estimated nonparametrically or not even come into play at all, as we shall
see now.

Estimation Based on Concordance Measures

Basically, two kinds of semiparametric methods exist. The simplest one is


based upon the nonparametric estimation of parameters which only depend on
the copula. Concordance measures, such as Kendall’s tau and Spearman’s rho
for instance, provide good examples. They can be easily estimated and, once
a parametric family of copulas has been retained, one just has to express the
parameters of the copula as functions of these estimated quantities. It is the
stance taken by Oakes [375] to estimate the parameter θ of a Clayton copula
(see (3.43)). Table 4.1 gives the following relation between the parameter θ
and Kendall’s tau:

θ = 2τ/(1 − τ) .      (5.25)
Therefore, a natural estimator of θ is:

θ̂_T = 2τ̂_T/(1 − τ̂_T) ,      (5.26)
where τ̂T denotes the sample version of Kendall’s tau, based on the bivariate
sample of size T : {(x1 , y1 ), . . . , (xT , yT )}. Let us recall that
τ̂_T = 2 · (C − D)/(T·(T − 1)) ,      (5.27)

where C (resp. D) denotes the number of concordant (resp. discordant) pairs,


i.e., such that (xi − xj ) · (yi − yj ) > 0 (resp. < 0).
Based on relation (4.37), a similar approach can be applied to estimate
the shape parameter ρ of an elliptical copula. One then obtains

ρ̂_T = sin(π·τ̂_T/2) .      (5.28)
5.1 Estimation of Copulas 197

As an illustration, let us consider again the two samples of the daily returns
of the German Mark and the Japanese Yen on the one hand, and of General
Motors and Procter & Gamble on the other hand. Assuming that their copula
belongs to the class of elliptical copulas, we can apply this method to infer
the value of the shape parameter ρ for each pair of assets. For the first one
(German Mark/Japanese Yen), we obtain: τ̂T = 0.37 so that ρ̂T = 0.54, while
for the second one: τ̂T = 0.18 and therefore ρ̂T = 0.29. This shows that the
dependence is stronger between the pair of currencies than between the pair
of stocks.
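This moment-style estimation is a one-liner once Kendall's tau is computed. The sketch below (a hypothetical helper relying on scipy's kendalltau) implements the ρ̂_T = sin(π·τ̂_T/2) recipe of (5.28):

```python
import numpy as np
from scipy.stats import kendalltau

def elliptical_rho_from_tau(x, y):
    """Nonparametric estimate of the shape parameter of an elliptical
    copula from the sample Kendall's tau, via rho = sin(pi * tau / 2)."""
    tau, _ = kendalltau(x, y)
    return float(np.sin(np.pi * tau / 2.0))
```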
This method is particularly attractive due to its simplicity but is a bit
naive. While it provides very simple and robust estimators, these estimators are not always very accurate. This justifies turning to more elaborate methods, such as that developed by Genest et al. [197], which relies on the
maximization of a pseudo likelihood.

Pseudo Maximum Likelihood Estimation

Let us still consider a sample of size T {(x1 (1), x2 (1), . . . , xn (1)), . . . , (x1 (T ),
x2 (T ), . . . , xn (T ))}, drawn from a common distribution F with copula C and
margins Fi . By definition of the copula, the random vector U whose ith com-
ponent is given by Ui = Fi (Xi ) has a distribution function equal to C. Assum-
ing that the copula C = C(·; θ 0 ) belongs to the family {C(u1 , . . . , un ; θ); θ ∈
Θ ⊂ Rp }, where θ denotes the vector parameterizing the copula, the function


ln L = Σ_{i=1}^{T} ln c(F_1(x_1(i)), …, F_n(x_n(i)); θ) ,      (5.29)

where c(·; θ) denotes the density of C(·; θ), provides the likelihood of the sequence {(u_1(k) = F_1(x_1(k)), …, u_n(k) = F_n(x_n(k)))}_{k=1}^{T}. Note that this sequence is independently and identically distributed provided that the x_i(k)'s are independent and identically distributed realizations.
Since the marginal distributions are generally unknown and when no para-
metric model seems available, it is reasonable to use the empirical marginal
distribution functions F̂i defined by (5.2) to obtain an estimator of U ,


Û = (F̂_1(X_1), …, F̂_n(X_n)) .      (5.30)

Then, one derives the pseudo-sample {(û1 (k), . . . , ûn (k))}Tk=1 , where ûi (k) =
F̂i (xi (k)), which is not iid even if the xi (k)’s are. Hence, substituting the
u(k)’s for the û(k)’s in the log-likelihood function (5.29), one obtains the
pseudo log-likelihood of the model, based on the sample {(x1 (1), . . . , x1 (T )),
. . . , (xn (1), . . . , xn (T ))}:
198 5 Description of Financial Dependences with Copulas


ln L̃ = Σ_{i=1}^{T} ln c(F̂_1(x_1(i)), …, F̂_n(x_n(i)); θ) .      (5.31)

Finally, the parameter vector θ is estimated by maximizing the pseudo log-likelihood, so that

θ̂_T = arg max_θ ln L̃({x_1(i), …, x_n(i)}; θ) .      (5.32)
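For a concrete instance of the pseudo-ML program (5.31)-(5.32), the sketch below fits the single parameter θ of a Clayton copula, with the margins replaced by rescaled empirical ranks. The Clayton density used in the comment is standard; the helper name and the bounded search interval are choices of this illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def clayton_pseudo_mle(x, y):
    """Pseudo-ML estimate of the Clayton parameter theta > 0, the margins
    being replaced by the rescaled empirical ranks T/(T+1) * F_hat."""
    T = len(x)
    u = (np.argsort(np.argsort(x)) + 1.0) / (T + 1.0)
    v = (np.argsort(np.argsort(y)) + 1.0) / (T + 1.0)

    def neg_loglik(theta):
        # Clayton density: (1+theta)(uv)^(-1-theta)(u^-theta + v^-theta - 1)^(-2-1/theta)
        s = u ** (-theta) + v ** (-theta) - 1.0
        ll = (np.log1p(theta) - (1.0 + theta) * (np.log(u) + np.log(v))
              - (2.0 + 1.0 / theta) * np.log(s))
        return -ll.sum()

    return minimize_scalar(neg_loglik, bounds=(1e-3, 20.0), method="bounded").x
```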

Under the usual technical regularity conditions ensuring the consistency and asymptotic normality of maximum likelihood estimators, Genest et al. [197] have shown that θ̂_T is a consistent estimator of θ^0 and that it is asymptotically Gaussian (see also Appendix 5.B):

√T · (θ̂_T − θ^0) → N(0, Σ²)   (in law)      (5.33)

with Σ² = I(θ^0)^{−1} + I(θ^0)^{−1} Ω I(θ^0)^{−1}, where I(θ^0) denotes Fisher's information matrix at θ^0:

[I(θ^0)]_{ij} = E[ ∂ln c(U; θ)/∂θ_i · ∂ln c(U; θ)/∂θ_j ]_{θ=θ^0} ,      (5.34)

and, with p = dim θ, Ω is the p × p matrix whose entries are

Ω_{ij} = Cov( Σ_{k=1}^{n} W_{ki}(U_k) , Σ_{k=1}^{n} W_{kj}(U_k) ) ,      (5.35)

where

W_{ki}(U_k) = ∫_{u∈[0,1]^n} 1_{{U_k ≤ u_k}} [ ∂² ln c(u; θ)/∂θ_i ∂u_k ]_{θ=θ^0} dC(u; θ^0) .      (5.36)

These results rely on a straightforward application of the consistency and


asymptotic normality of functionals of multivariate rank statistics derived by
Ruymgaart et al. [423, 424] and Rüschendorf [422]. Since Ω is a positive
definite matrix, the covariance matrix of the estimator θ̂ T is larger5 than
it would be, were the marginal distributions Fi perfectly known. Indeed, in
such a case, the covariance matrix of the estimator would be nothing but the
inverse of Fisher's information matrix, I(θ^0)^{−1}.
As an illustration, let us fit the two samples considered in Sect. 5.1.1,
namely the daily returns of the FX rate of the German Mark and of the
5
We say that a matrix A is larger than a matrix B if their difference A − B is a
positive definite matrix. In particular, it implies that each diagonal term of A is larger than the corresponding diagonal term of B. In the present case, this means
that the variance of each component of the pseudo maximum likelihood estimator
is larger than the variance of each component of the actual maximum likelihood
estimator, yielding less accurate estimates of the parameter θ 0 .
Fig. 5.2. Contour plot of the Student copula maximizing the pseudo likelihood (5.31) for the daily returns of the couple German Mark/Japanese Yen over the time interval from 1 May, 1973 to 2 November, 2001 (left panel) and for the couple General Motors/Procter & Gamble over the time period from 3 July, 1962 to 29 December, 2000 (right panel)

Japanese Yen over the time interval from 1 May, 1973 to 2 November, 2001
and the daily returns of the couple of stocks (General Motors; Procter &
Gamble) over the time period from 3 July, 1962 to 29 December, 2000. As
aforementioned, the kernel estimates (see Fig. 5.1) of the copulas of these two
couples suggest that the Student copula could provide a reasonable description
of their dependence structure, at least for the pair of currencies. The pseudo
log-likelihood of these samples for a Student copula with ν degrees of freedom
and shape matrix ρ can be straightforwardly derived from (3.37) p. 110. No
closed form for ρ̂ and ν̂ can be obtained. One has to maximize the pseudo
log-likelihood with a numerical procedure. Figure 5.2 depicts the contour plot
of the Student copula maximizing the pseudo likelihood for each sample. For
the sample (German Mark; Japanese Yen), we obtain the following estimates
for the shape parameter and the number of degrees of freedom respectively:
ρ̂ = 0.54 and ν̂ = 5.82. Comparing the left panels of Figs. 5.1 and 5.2, we
observe that the Student copula estimated from the data seems reasonably
close to the kernel estimate of the copula, suggesting that the copula model
is realistic in this case. For the couple (General Motors; Procter & Gamble),
we find ρ̂ = 0.29 and ν̂ = 5.92. However, when comparing the right panels of
Figs. 5.1 and 5.2, one can observe a clear discrepancy between the two models
and it is doubtful that the Student copula provides a good representation
of the dependence in this case. Settling this question requires qualifying the
goodness-of-fit of the model, which will be discussed in Sect. 5.2.
In the meantime, let us stress two important points concerning the practical implementation of the pseudo maximum likelihood estimation method.
• It is convenient to replace the empirical distribution function F̂_i(·), defined by (5.2), by (T/(T+1))·F̂_i(·). These two quantities are asymptotically equivalent but the use of the latter prevents a potential unboundedness of the pseudo log-likelihood when one of the û_i's tends to one.
• Any maximization algorithm requires an initialization. The choice of the
starting point is not innocuous since the performance of the algorithm
can depend, for a large part, on it. For any elliptical copula, assessing the
Kendall’s tau and applying relation (5.28) allows one to obtain a good
starting point. In fact, the estimation of ρij from τij often provides such
a good starting point that the pseudo maximum likelihood estimate of
ρ does not significantly improve on it [350]. Our examples confirm this
point: with both methods, we have obtained the same values (within their
confidence interval). In addition, the first estimation method is much faster
than the second one. These remarks are especially important when one deals
with large portfolios for which the numerical maximization of the pseudo
likelihood becomes particularly tricky (and time consuming). Therefore,
in such a case, the non-parametric estimation of ρ by relation (5.28) is
probably the best method. Then, one can obtain an accurate estimate
of the number ν of degrees of freedom by maximization of the pseudo
likelihood with respect to this single parameter only,

ν̂_T = arg max_ν L̃({x}; ρ̂_T, ν) ,      (5.37)

where ρ̂T denotes the non-parametric estimator of ρ obtained from (5.28).


To conclude on the two semiparametric estimation methods that we have
presented, both methods have their pros and cons. For low dimensional prob-
lems, the pseudo maximum likelihood estimator is probably the best. Its vari-
ance is usually lower: Genest et al. [197] report that the variance of this
estimator is smaller than the variance of the estimator based on Kendall’s tau
by 10–40% for Clayton’s copula (depending on the value of parameter θ). In
contrast, when the dimension of the problem is large, the pseudo maximum
likelihood method becomes time consuming and less efficient.

5.1.3 Parametric Estimation

While many procedures exist, we will only focus on maximum likelihood meth-
ods. Among those, two main approaches can be distinguished: the one-step
maximum likelihood estimation and the two-step maximum likelihood esti-
mation.
Given a multivariate distribution function F (x; θ) depending on the vec-
tor of parameters θ ∈ Θ ⊂ Rp , which can be represented as F (x; θ) =
C (F1 (x1 ; θ), . . . , Fn (xn ; θ); θ), and given a sample of size T {(x1 (1), x2 (1),
. . . , xn (1)), . . . , (x1 (T ), x2 (T ), . . . , xn (T ))}, the log-likelihood of the model is
ln L({x}; θ) = Σ_{i=1}^{T} ln c(F_1(x_1(i); θ), …, F_n(x_n(i); θ); θ)
             + Σ_{i=1}^{T} ln f_1(x_1(i); θ) + ⋯ + Σ_{i=1}^{T} ln f_n(x_n(i); θ) ,      (5.38)

where, as usual, c(·; θ) denotes the density of C(·; θ) and the fi (·; θ)’s are
the densities of the marginal distribution function Fi (·; θ)’s. The one-step
maximum likelihood estimator of θ is then

θ̂_T = arg max_θ ln L({x}; θ) .      (5.39)

Under the usual regularity conditions, it enjoys the properties of consistency


and asymptotic normality, with its asymptotic covariance matrix given by the
inverse of Fisher’s information matrix.
Consider the dependence structure for a sample, supplied by the Insur-
ance Service Office, of indemnity claims of insurance companies consisting of
indemnity payment (or loss) and allocated loss adjustment expenses (ALAE).
Applying the one-step maximum likelihood method leading to (5.39), Frees
and Valdez [183] and Klugman and Parsa [272] have shown that the depen-
dence of this sample can be reasonably modeled by Gumbel’s or Frank’s cop-
ula. We should stress that, for this procedure to work properly, the choice of
the marginals is crucial. It is thus appropriate to model each marginal distri-
bution function and perform a first maximum likelihood estimation of their
corresponding parameters. Then, together with the choice of a suitable copula,
these preliminary estimates of the parameters of the marginal distributions
provide useful starting points to globally maximize (5.38) numerically.
Pushing further this reasoning, consider the situation where one can split
the parameter vector θ under the form θ = (α, β 1 , . . . , β n ) so that

F (x; θ) = C (F1 (x1 ; β 1 ), . . . , Fn (xn ; β n ); α, β 1 , . . . , β n ) , (5.40)

i.e., the marginal distributions are functions of independent sets of parameters. As a consequence, the log-likelihood reads

ln L({x}; α, β_1, …, β_n) = Σ_{i=1}^{T} ln c(F_1(x_1(i); β_1), …, F_n(x_n(i); β_n); θ)
                          + Σ_{i=1}^{T} ln f_1(x_1(i); β_1)
                          + ⋯
                          + Σ_{i=1}^{T} ln f_n(x_n(i); β_n) .      (5.41)

Thus, instead of looking for the global maximum

(α̂_T, β̂_{1,T}, …, β̂_{n,T}) = arg max_{α,β_1,…,β_n} ln L({x}; α, β_1, …, β_n) ,      (5.42)

over (α, β_1, …, β_n), one can perform a two-step (in fact, (n+1)-step) maximization of the likelihood:

β̂_{1,T} = arg max_{β_1} Σ_{i=1}^{T} ln f_1(x_1(i); β_1) ,      (5.43)
⋮
β̂_{n,T} = arg max_{β_n} Σ_{i=1}^{T} ln f_n(x_n(i); β_n) ,      (5.44)

α̂_T = arg max_{α} Σ_{i=1}^{T} ln c(F_1(x_1(i); β̂_{1,T}), …, F_n(x_n(i); β̂_{n,T}); α, β̂_{1,T}, …, β̂_{n,T}) .      (5.45)
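As an illustration of this two-step logic, the sketch below fits Student-t margins first, then a Gaussian copula. All names are hypothetical, and the second step is approximated by the empirical correlation of the normal scores, a standard shortcut for the exact copula ML step in the Gaussian case.

```python
import numpy as np
from scipy.stats import norm, t as student_t

def ifm_two_step(x1, x2):
    """Two-step (margins first, copula second) estimation with
    Student-t margins and a Gaussian copula."""
    # Step 1: ML fit of each margin separately
    m1 = student_t.fit(x1)  # (df, loc, scale)
    m2 = student_t.fit(x2)
    # Step 2: transform through the fitted margins; for the Gaussian
    # copula, the ML step is well approximated by the correlation
    # of the normal scores
    y1 = norm.ppf(student_t.cdf(x1, *m1))
    y2 = norm.ppf(student_t.cdf(x2, *m2))
    rho = float(np.corrcoef(y1, y2)[0, 1])
    return m1, m2, rho
```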


One can prove that the two-step estimator θ̂_T = (α̂_T, β̂_{1,T}, …, β̂_{n,T}) is consistent and asymptotically Gaussian [248, 372, 492],

√T · (θ̂_T − θ^0) → N(0, A^{−1}B(A^{−1})^t)   (in law),      (5.46)

where A^{−1}B(A^{−1})^t is the inverse of Godambe's information matrix,6 with
    ⎛ E[∂_{β_1,β_1} ln f_1 |_{β_1^0}]    0       0                                  0 ⎞
A = ⎜ 0                                  ⋱       0                                  0 ⎟ ,      (5.47)
    ⎜ 0                                  0       E[∂_{β_n,β_n} ln f_n |_{β_n^0}]    0 ⎟
    ⎝ E[∂_{β_1,α} ln c |_{θ^0}]          ⋯       E[∂_{β_n,α} ln c |_{θ^0}]          E[∂_{α,α} ln c |_{θ^0}] ⎠

and


B = Cov( ∂_{β_1} ln f_1 |_{β_1^0} , ⋯ , ∂_{β_n} ln f_n |_{β_n^0} , ∂_α ln c |_{θ^0} ) .      (5.48)

While asymptotically less efficient than the one-step estimator, this approach
has the obvious advantage of reducing the dimensionality of the problem,
which is particularly useful when one has to resort to a numerical maximiza-
tion.
In practice, one has often to deal with samples of different lengths. This
may occur for instance when considering simultaneously mature and emerging
6
Godambe’s information matrix has been introduced in the context of inference
functions (or estimating equations) [205, 354].

markets with different lifespans, or market returns together with the returns of
a company which has only been recently introduced in the stock exchange or
which has defaulted, or foreign exchange rates where one of the currencies of
interest has only a short history, such as the Euro. In such a case, the two-step
method is much better than the one-step method. The latter requires using a
data set which is the intersection of all the marginal samples, leading often to
a significant loss of efficiency in the estimation of the parameters of marginal
distributions. In contrast, the two-step method uses the whole set of samples
for the estimation of marginal parameters and restricts to the intersection of
the marginal samples only for the estimation of the parameters of the copula.
This two-step estimator is still consistent and asymptotically Gaussian. Its
asymptotic variance can be derived from (5.47–5.48), by accounting for the
different lengths of the marginal samples (see Patton [380]). While the one-
step estimator still remains asymptotically more efficient than the two-step
estimator, Patton reports that the accuracy of the two-step estimator is much
better than that of the one-step estimator, when the size of the intersection
of the marginal samples is small.

5.1.4 Goodness-of-Fit Tests

Many different tests have been developed to check the goodness of fit of a


copula. Basically, the simplest approach uses the property that, under the
null hypothesis that C (u1 , . . . , un ) is the right copula, the set of random
variables

Cn (Un |U1 , . . . , Un−1 ) , . . . , C2 (U2 |U1 ) , U1 , (5.49)

with
C_k(u_k | u_1, …, u_{k−1}) = ∂_{u_1}⋯∂_{u_{k−1}} C_k(u_1, …, u_k) / ∂_{u_1}⋯∂_{u_{k−1}} C_{k−1}(u_1, …, u_{k−1}) ,      (5.50)
are identically, uniformly, and independently distributed. This property has
already been used in Sect. 3.5.2 to provide an algorithm for the generation of
random variables with a given copula C. Thus, testing the null hypothesis is
equivalent to testing that the sample of T vectors
T
{Cn (ûn (t)|û1 (t), · · · , ûn−1 (t)) , · · · , C2 (û2 (t)|û1 (t)) , û1 (t)}t=1 , (5.51)

is drawn from a population of uniform random vectors with independent com-


ponents. Such tests date back to [121, 122]. In the same vein, the more recent
Bhattacharya-Matusita-Hellinger dependence metric discussed in Chap. 4 can
also be used [214]. It allows in particular to account for censored data [272],
which is particularly useful when dealing with insurance data. One can also
focus on a restricted area of the unit hypercube by using hit tests [381, 380].
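For the Gaussian copula, the conditional distribution in (5.50) has the closed form C_2(u_2|u_1) = Φ((Φ^{−1}(u_2) − ρ Φ^{−1}(u_1))/√(1 − ρ²)), so the transformation test can be sketched directly. The helper below is a hypothetical illustration: under the null hypothesis, the two returned components are iid uniform and independent, and can be fed to a Kolmogorov test.

```python
import numpy as np
from scipy.stats import norm

def rosenblatt_gaussian(u1, u2, rho):
    """Rosenblatt transform under a bivariate Gaussian copula with
    shape parameter rho: returns (u1, C2(u2 | u1))."""
    y1 = norm.ppf(u1)
    y2 = norm.ppf(u2)
    z = (y2 - rho * y1) / np.sqrt(1.0 - rho ** 2)
    return u1, norm.cdf(z)
```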
Other alternatives consist in testing the significance of the distance in Lp ,
for some p, between the null copula C and the estimated copula Ĉ:
D_p = ∫_{u∈[0,1]^n} |C(u) − Ĉ(u)|^p du ,      (5.52)

but the statistical properties of such tests are rather poor [169]. A simpler
approach focuses on the discrepancy between the fitted copula and the null
copula on the main diagonal only by use of the K function:

K(z) = Pr [C (U1 , . . . , Un ) ≤ z] . (5.53)

Since K is a univariate distribution function, Kolmogorov or Anderson-


Darling tests can be applied.
In fact, this last approach is particularly interesting when one deals with
Archimedean copulas since K can be shown to admit the simple closed-form
expression [35]


K(z) = z + Σ_{k=1}^{n−1} (−1)^k · (ϕ^k(z)/k!) · χ_{k−1}(z) ,      (5.54)

where ϕ(·) denotes the generator of the copula and:

χ_k(z) = ∂_z χ_{k−1}(z) / ∂_z ϕ(z) ,   with   χ_0(z) = [∂_z ϕ(z)]^{−1} .      (5.55)
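In the bivariate case, (5.54) collapses to K(z) = z − ϕ(z)/ϕ'(z); with the Clayton generator ϕ(z) = (z^{−θ} − 1)/θ this gives the closed form coded below. The helper is a hypothetical illustration: under the null hypothesis, K applied to the pseudo-observations C(u_1(t), u_2(t)) is uniform, so a Kolmogorov test applies directly.

```python
def K_clayton(z, theta):
    """Kendall function of the bivariate Clayton copula:
    K(z) = z - phi(z)/phi'(z) = z + z * (1 - z**theta) / theta."""
    return z + z * (1.0 - z ** theta) / theta
```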

5.2 Description of Financial Data in Terms of Gaussian Copulas

Section 3.6 has discussed the importance of the Gaussian copula for financial
modeling. We now review the empirical tests of the hypothesis, denoted H0 ,
that the Gaussian copula is the correct description of the dependence between
financial assets. After summarizing the testing procedure developed in [334],
we describe the results.

5.2.1 Test Statistics and Testing Procedure

Let us first derive the test statistics which will allow us to reject or not reject
the null hypothesis H0 . The following proposition, whose proof is given in
Appendix 5.A, can be stated.
Proposition 5.2.1. Assuming that the N -dimensional random vector X =
(X1 , . . . , XN ) with joint distribution function F and marginals Fi , satisfies
the null hypothesis H0 , then, the variable


Z² = Σ_{i,j=1}^{N} Φ^{−1}(F_i(X_i)) (ρ^{−1})_{ij} Φ^{−1}(F_j(X_j)) ,      (5.56)

where the matrix ρ is

ρij = Cov[Φ−1 (Fi (Xi )), Φ−1 (Fj (Xj ))] , (5.57)

follows a χ2 -distribution with N degrees of freedom.


The testing procedure based on this result is now described for N = 2
assets. This case N = 2 is not as restrictive as it would appear at first sight
since, for portfolio analysis and risk management purposes, larger baskets of
assets should be considered. The testing procedure described here can indeed
be applied to any number of assets, and it is only for the sake of simplicity of
the exposition that the presentation is restricted to the bivariate case.
Let us consider two financial time series of size T : {x1 (1), . . . , x1 (t), . . . ,
x1 (T )} and {x2 (1), . . . , x2 (t), . . . , x2 (T )}. We assume that the vectors x(t) =
(x1 (t), x2 (t)), t ∈ {1, . . . , T } are independent and identically distributed with
distribution F , which implies that the variables x1 (t) (respectively x2 (t)),
t ∈ {1, . . . , T }, are also independent and identically distributed, with dis-
tribution F1 (respectively F2 ). We immediately note that this assumption of
independently distributed data is not very realistic. It is well-known that daily
returns are uncorrelated but that their volatility exhibits long-range depen-
dence. A natural approach would then be to filter the data with an ARCH or
GARCH process and then apply the testing procedure to the residuals. This
approach will be discussed in Sect. 5.3.4 and we do not pursue this further
here.
The empirical cumulative distribution F̂i of each variable Xi is given by

F̂_i(x_i) = (1/T) Σ_{k=1}^{T} 1_{{x_i(k) ≤ x_i}} .      (5.58)

We use these estimated cumulative distributions to obtain the nearly Gaussian


variables ŷi as


ŷ_i(k) = Φ^{−1}(F̂_i(x_i(k))) ,   k ∈ {1, …, T} .      (5.59)

The sample covariance matrix ρ̂ is estimated by the expression

ρ̂ = (1/T) Σ_{i=1}^{T} ŷ(i) · ŷ(i)^t ,      (5.60)

which allows us to calculate the variable



ẑ²(k) = Σ_{i,j=1}^{2} ŷ_i(k) (ρ̂^{−1})_{ij} ŷ_j(k) ,      (5.61)

as defined in (5.A.6) for k ∈ {1, . . . , T }. This variable ẑ 2 (k) should be dis-


tributed according to a χ2 -distribution if the Gaussian copula hypothesis is
correct.
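The procedure (5.58)-(5.61) can be sketched end-to-end as follows (a hypothetical helper, with the empirical cdf rescaled by T/(T+1) to avoid infinities). Note that, by construction, the sample mean of the ẑ²(k) equals the number of assets exactly, since ρ̂ is estimated from the same normal scores.

```python
import numpy as np
from scipy.stats import norm

def z2_statistics(x1, x2):
    """Pseudo-observations z2(k) of (5.61): empirical-cdf transform,
    normal scores, sample covariance, quadratic form; under H0 they
    should be approximately chi-square with 2 degrees of freedom."""
    T = len(x1)
    y = np.empty((T, 2))
    for j, x in enumerate((x1, x2)):
        ranks = np.argsort(np.argsort(x)) + 1.0
        y[:, j] = norm.ppf(ranks / (T + 1.0))   # (5.58)-(5.59)
    rho = (y.T @ y) / T                          # (5.60)
    rho_inv = np.linalg.inv(rho)
    return np.einsum("ti,ij,tj->t", y, rho_inv, y)
```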

As recalled in Chap. 2, a standard way for comparing an empirical with a


theoretical distribution is to measure the distance between these two distrib-
utions and to perform the Kolmogorov test or the Anderson-Darling test (for
a better accuracy in the tails of the distribution). The Kolmogorov distance
is the maximum local distance among all quantiles, which is most often real-
ized in the bulk of the distribution, while the Anderson-Darling distance puts
the emphasis on the tails of the two distributions by a suitable normalization
(which is nothing but the local standard deviation of the fluctuations of the
distance). These two distances can be complemented by two additional mea-
sures which are defined as averages of the Kolmogorov distance and of the
Anderson-Darling distance respectively,
Kolmogorov:                d_1 = max_z |F_{z²}(z²) − F_{χ²}(z²)| ,      (5.62)

average Kolmogorov:        d_2 = ∫ |F_{z²}(z²) − F_{χ²}(z²)| dF_{χ²}(z²) ,      (5.63)

Anderson-Darling:          d_3 = max_z |F_{z²}(z²) − F_{χ²}(z²)| / √(F_{χ²}(z²)[1 − F_{χ²}(z²)]) ,      (5.64)

average Anderson-Darling:  d_4 = ∫ |F_{z²}(z²) − F_{χ²}(z²)| / √(F_{χ²}(z²)[1 − F_{χ²}(z²)]) dF_{χ²}(z²) .      (5.65)
The Kolmogorov distance d1 and its average d2 are more sensitive to the de-
viations occurring in the bulk of the distributions. In contrast, the Anderson-
Darling distance d3 and its average d4 are more accurate in the tails of the
distributions. Considering statistical tests for these four distances is important
in order to be as complete as possible with respect to the different sensitiv-
ity of the tests. The averaging introduced in the distances d2 and d4 (which
are simply the average of d1 and d3 respectively) provides important infor-
mation. Indeed, the distances d1 and d3 are mainly controlled by the point
that maximizes the argument within the max(·) function. They can thus be
quite sensitive to the presence of an outlier. By averaging, d2 and d4 become
less sensitive to outliers, since the weight of such points is only of order 1/T
(where T is the size of the sample) while it equals one for d1 and d3 .
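Definitions (5.62)–(5.65) translate directly into code. The following sketch (ours, not from the text) evaluates d1 and d3 at the sample points and approximates the integrals in d2 and d4 via the substitution u = Fχ2 (z 2 ), which turns them into averages over a uniform grid:

```python
import numpy as np
from scipy.stats import chi2

def copula_test_distances(z2, df=2, n_grid=10_000):
    """Distances (5.62)-(5.65) between the empirical CDF of z^2
    and the chi2(df) CDF."""
    z2 = np.sort(np.asarray(z2))
    T = len(z2)
    # Empirical CDF just after / just before each jump point.
    F_emp_hi = np.arange(1, T + 1) / T
    F_emp_lo = np.arange(0, T) / T
    F_th = chi2.cdf(z2, df)
    gap = np.maximum(np.abs(F_emp_hi - F_th), np.abs(F_emp_lo - F_th))
    d1 = gap.max()                                    # (5.62)
    w = np.sqrt(np.clip(F_th * (1.0 - F_th), 1e-12, None))
    d3 = (gap / w).max()                              # (5.64)
    # (5.63), (5.65): with u = F_chi2(z^2) the integrals w.r.t. dF_chi2
    # become plain averages over u uniform on (0, 1).
    u = (np.arange(n_grid) + 0.5) / n_grid
    q = chi2.ppf(u, df)
    F_emp_at_q = np.searchsorted(z2, q, side='right') / T
    d2 = np.abs(F_emp_at_q - u).mean()                # (5.63)
    d4 = (np.abs(F_emp_at_q - u) / np.sqrt(u * (1.0 - u))).mean()  # (5.65)
    return d1, d2, d3, d4
```

Since the weight √(u(1 − u)) never exceeds 1/2, one always has d3 ≥ 2 d1, while d2 ≤ d1, which illustrates the different sensitivities discussed above.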
For the usual Kolmogorov and Anderson-Darling distances, the law of the empirical counterparts of d1 and d3 is known, at least asymptotically. In ad-
dition, it is free from the underlying distribution. However, for such a result
to hold, one needs to know the exact value of the covariance matrix ρ and
the exact expression of the marginal distribution functions Fi . In the present
case, the variables ẑ 2 (k), given by (5.61), are only pseudo-observations since
their assessment requires the preliminary estimation of the covariance matrix
ρ̂ and the marginal distribution functions F̂i . And, as outlined in [200, 201],
when one considers the empirical process constructed from the pseudo-sample
{ẑ 2 (k), k = 1, . . . , T }, the limiting behavior is not the same as in the case where one
would actually observe z 2 (k), because there are two extra terms: one due to
the fact that Fi is replaced by F̂i and another one due to the fact that ρ
is replaced by ρ̂. Therefore, one cannot directly use the asymptotic results
known for these standard statistical tests.
As a very simple remedy, one can use a bootstrap method [143], whose
accuracy has been proved to be at least as good as that given by asymptotic
methods used to derive the theoretical distributions [97]. For the assessment
of the asymptotic laws of d2 and d4 , such a numerical study is compulsory
since, even for the true observations z 2 (k), one does not know the expression
of the asymptotic laws. Putting all this together, a possible implementation
of this testing procedure is the following:
1. Given the original sample {x(t), t = 1, . . . , T }, generate the pseudo-Gaussian variables ŷ(t), t ∈ {1, . . . , T } defined by (5.59).
2. Then, estimate the covariance matrix ρ̂ of the pseudo-Gaussian variables
ŷ, which allows one to compute the variables ẑ 2 and then measure the
distance of its estimated distribution to the χ2 -distribution.
3. Given this covariance matrix ρ̂, generate numerically a sample of T bi-
variate Gaussian random vectors with the same covariance matrix ρ̂.
4. For the sample of Gaussian vectors synthetically generated with covari-
ance matrix ρ̂, estimate its sample covariance matrix ρ̃ and its marginal
distribution functions F̃i .
5. To each of the T vectors of the synthetic Gaussian sample, associate the
corresponding realization of the random variable z 2 , called z̃ 2 (t).
6. Construct the empirical distribution for the variable z̃ 2 and measure the
distance between this distribution and the χ2 -distribution.
7. Repeat 10,000 times (for instance) the steps 3 to 6, and then obtain an
accurate estimate of the cumulative distribution of distances between the
distribution of the synthetic Gaussian variables and the theoretical χ2 -
distribution. This cumulative distribution represents the test statistic,
which allows one to reject or not the null hypothesis H0 at a given
significance level.
8. The significance of the distance obtained at step 2 for the true variables –
i.e., the probability to observe, at random and under H0 , a distance larger
than the empirically estimated distance – is finally obtained by a sim-
ple reading on the complementary cumulative distribution estimated at
step 7.
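Steps 1–8 above can be condensed into the following sketch (an illustrative implementation of ours; for brevity only the Kolmogorov distance d1 is used, and the number of bootstrap replications should be raised to 10,000 in practice, as indicated in step 7):

```python
import numpy as np
from scipy.stats import chi2, norm

def d1_distance(z2, df=2):
    # Kolmogorov distance (5.62) between the empirical CDF of z^2 and chi2(df).
    z2 = np.sort(z2)
    T = len(z2)
    F_th = chi2.cdf(z2, df)
    return np.maximum(np.abs(np.arange(1, T + 1) / T - F_th),
                      np.abs(np.arange(0, T) / T - F_th)).max()

def z2_from_sample(x):
    # (5.58)-(5.61): ranks -> pseudo-Gaussian variables -> z^2.
    T, n = x.shape
    y = norm.ppf((x.argsort(0).argsort(0) + 1.0) / (T + 1.0))
    rho = y.T @ y / T
    return np.einsum('ki,ij,kj->k', y, np.linalg.inv(rho), y), rho

def gaussian_copula_pvalue(x, n_boot=200, seed=0):
    """Bootstrap p-value of the Gaussian copula hypothesis H0 (steps 1-8)."""
    rng = np.random.default_rng(seed)
    T, n = x.shape
    z2, rho = z2_from_sample(x)               # steps 1-2
    d_obs = d1_distance(z2, df=n)
    d_boot = np.empty(n_boot)
    for b in range(n_boot):                   # steps 3-7
        g = rng.multivariate_normal(np.zeros(n), rho, size=T)
        # Steps 4-5: marginals and covariance are re-estimated on the
        # synthetic sample, reproducing the pseudo-observation effect.
        z2_b, _ = z2_from_sample(g)
        d_boot[b] = d1_distance(z2_b, df=n)
    return (d_boot >= d_obs).mean()           # step 8
```

With x a (T × 2) matrix of returns, a p-value below 0.05 leads to the rejection of H0 at the 95% significance level; a complete implementation would repeat this for all four distances d1–d4.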

5.2.2 Empirical Results

Empirical tests implementing the previous procedure have been performed on securities, exchange rates, and commodities (metals) in [334]. Focusing on
securities and exchange rates, we summarize some of the most striking features
concerning the results obtained in this study.

Currencies

The Federal Reserve Board provides access to a large set of historical quotes
of spot foreign exchange rates. Following [334], let us focus on the Swiss Franc,
the German Mark, the Japanese Yen, the Malaysian Ringgit, the Thai Baht
and the British Pound during the time interval of ten years from 25 January,
1989 to 31 December, 1998. All these exchange rates are expressed against
the US dollar.
At the 95% significance level, one observes that only 40% (according to d1
and d3 ) but 60% (according to d2 and d4 ) of the tested pairs of currencies are
compatible with the Gaussian copula hypothesis over the entire time interval.
During the first half-period from 25 January, 1989 to 11 January, 1994, 47%
(according to d3 ) and up to about 75% (according to d2 and d4 ) of the tested
currency pairs are compatible with the assumption of the Gaussian copula,
while during the second subperiod from 12 January, 1994 to 31 December,
1998, between 66% (according to d1 ) and about 75% (according to d2 , d3
and d4 ) of the currency pairs remain compatible with the Gaussian copula
hypothesis. These results raise several comments both from a statistical and
from an economic point of view.
We first have to stress that the most significant rejection of the Gaussian
copula hypothesis is obtained for the distance d3 , which is indeed the most
sensitive to the events in the tail of the distributions. The test statistics given
by this distance can indeed be very sensitive to the presence of a single large
event in the sample, so much so that the Gaussian copula hypothesis can
be rejected only because of the presence of this single event (outlier). The
difference between the results given by d3 and d4 (the averaged d3 ) are very
significant in this respect. The case of the German Mark and the Swiss Franc
provides a particularly startling example. Indeed, during the time interval
from 12 January, 1994 to 31 December, 1998, the probability p(d) of non-
rejection is rather high according to d1 , d2 and d4 (p(d) ≥ 31%) while it is
very low according to d3 : p(d) = 0.05%, which should lead to the rejection
of the Gaussian copula hypothesis on the basis of the distance d3 alone. This
discrepancy between the different distances suggests the presence of an outlier
in the sample.
To check this hypothesis, we show in the upper panel of Fig. 5.3 the function

f3 (t) = |Fz2 (z 2 (t)) − Fχ2 (z 2 (t))| / √(Fχ2 (z 2 (t))[1 − Fχ2 (z 2 (t))]) ,   (5.66)

used in the definition of the Anderson-Darling distance d3 = max_z f3 (z) (see definition (5.64)), expressed in terms of time t rather than z 2 . The functions
have been computed over the two time subintervals separately.
Apart from three extreme peaks occurring on 20 June, 1989, 19 August,
1991 and 16 September, 1992 within the first time subinterval and one ex-
treme peak on 10 September, 1997 within the second time subinterval, the

Fig. 5.3. The upper panel represents the graph of the function f3 (t) defined in
(5.66) used in the definition of the distance d3 for the couple Swiss Franc/German
Mark as a function of time t, over the time intervals from 25 January, 1989 to
11 January, 1994 and from 12 January, 1994 to 31 December, 1998. The two lower
panels represent the scatter plot of the return of the German Mark versus the return
of the Swiss Franc during the two previous time periods. The circled dot, in each
figure, shows the pair of returns responsible for the largest deviation of f3 during
the considered time interval. Reproduced from [332]

statistical fluctuations measured by f3 (t) remain small and of the same or-
der. Removing the contribution of these outlier events in the determination of
d3 , the new statistical significance derived according to d3 becomes similar to
that obtained with d1 , d2 and d4 on each subinterval. From the upper panel
of Fig. 5.3, it is clear that the Anderson-Darling distance d3 is equal to the
height of the largest peak corresponding to the event on 19 August, 1991 for
the first period and to the event on 10 September, 1997 for the second period.
These events are depicted by a circled dot in the two lower panels of Fig. 5.3,
which represent the return of the German Mark versus the return of the Swiss
Franc over the two considered time periods.
The event on 19 August, 1991 is associated with the coup against
Gorbachev in Moscow: the German mark (respectively the Swiss Franc) lost
3.37% (respectively 0.74%) against the US dollar. The 3.37% drop of the Ger-
man Mark is the largest daily move of this currency against the US dollar over
the whole first period. On 10 September, 1997, the German Mark appreciated
by 0.60% against the US dollar while the Swiss Franc lost 0.79%, which repre-
sents a moderate move for each currency, but a large joint move. This event is
related to the contradictory announcements of the Swiss National Bank about
its monetary policy, which put an end to a rally of the Swiss Franc along with
the German mark against the US dollar.
Thus, removing the large moves associated with major historical events or
events associated with unexpected incoming information7 – which cannot be
accounted for in a statistical study, unless one relies on a stress-test analy-
sis – we obtain, for d3 , significance levels compatible with those obtained with
the other distances. We can thus conclude that, according to the four dis-
tances, during the time interval from 12 January, 1994 to 31 December, 1998
the Gaussian copula hypothesis cannot be rejected for the couple German
Mark/Swiss Franc.
From an economic point of view, the impact of regulatory mechanisms
between currencies or monetary crises can be well identified by the rejection
or the absence of rejection of the null hypothesis. Indeed, consider the couple
German Mark/British Pound. During the first half period, their correlation
coefficient is very high (ρ = 0.82) and the Gaussian copula hypothesis is
strongly rejected according to the four distances. On the contrary, during
the second half period, the correlation coefficient decreases significantly (ρ =
0.56) and none of the four distances allows us to reject the null hypothesis.
Such non-stationarity can be easily explained. Indeed, on 1 January, 1990,
the British Pound entered the European Monetary System (EMS), so that
the exchange rate between the German Mark and the British Pound was not
allowed to fluctuate beyond a margin of 2.25%. However, due to a strong
speculative attack, the British Pound was devalued in September 1992 and
had to leave the EMS. Thus, between January 1990 and September 1992, the
exchange rate of the German Mark and the British Pound was confined within
a narrow spread, incompatible with the Gaussian copula description. After
1992, the British Pound exchange rate floated with respect to the German
Mark, and the dependence between the two currencies decreased, as shown
by their correlation coefficient. In this latter regime, one can no longer reject
the Gaussian copula hypothesis.
The impact of major crises on the copula can also be clearly identified.
An example is given by the Malaysian Ringgit/Thai Baht couple. During the
period from January 1989 to January 1994, these two currencies have only
undergone moderate and weakly correlated fluctuations (ρ = 0.29), so that the
null hypothesis cannot be rejected at the 95% significance level. In contrast,
during the period from January 1994 to October 1998, the Gaussian copula
7 Modeling the volatility by a mean reverting stochastic process with long memory
(the multifractal random walk (MRW)), Sornette et al. [456] have demonstrated
the outlier nature of the event on 19 August, 1991.
hypothesis is strongly rejected. This rejection is obviously due to the persistent and dependent (ρ = 0.44) shocks incurred by the Asian financial and monetary
markets during the 7 months of the Asian Crisis from July 1997 to January
1998 [29, 262].
These two cases show that the Gaussian copula hypothesis can be consid-
ered reasonable for currencies in the absence of regulatory mechanisms and of
strong and persistent crises. They also provide an understanding of why the
results of the test over the entire sample are so much weaker than the results
obtained for the two subintervals: the time series are strongly nonstationary.

Stocks

Let us now turn to the description of the dependence properties of the dis-
tributions of daily returns for a diversified set of stocks among the largest
companies quoted on the New York Stock Exchange. We report the results
presented in [334] concerning Appl. Materials, AT&T, Citigroup, Coca Cola,
EMC, Exxon-Mobil, Ford, General Electric, General Motors, Hewlett Packard,
IBM, Intel, MCI WorldCom, Medtronic, Merck, Microsoft, Pfizer, Procter &
Gamble, SBC Communication, Sun Microsystem, Texas Instruments, and Wal
Mart.
The dataset covers the time interval from 8 February, 1991 to 29 December,
2000. At the 95% significance level, 75% of the pairs of stocks are compatible
with the Gaussian copula hypothesis. Over the time subinterval from February
1991 to January 1996, this percentage becomes larger than 99% for d1 , d2
and d4 while it equals 94% according to d3 . Over the time subinterval from
February 1996 to December 2000, 92% of the pairs of stocks are compatible
with the Gaussian copula hypothesis according to d1 , d2 and d4 and more
than 79% according to d3 . Therefore, the Gaussian copula assumption is much
more widely accepted for stocks than it was for the currencies reported above.
In addition, the nonstationarity observed for currencies does not seem very
prominent for stocks.
For the sake of completeness, let us add a word concerning the results of
the tests performed for five stocks belonging to the computer sector: Hewlett Packard, IBM, Intel, Microsoft, and Sun Microsystem. During the first half period (from Feb. 1991 to Jan. 1996), all the pairs of stocks are compatible with the Gaussian
copula hypothesis at the 95% significance level. The results are rather differ-
ent for the second half period (from Feb. 1996 to Dec. 2000) since about 40%
of the pairs of stocks reject the Gaussian copula hypothesis according to d1 ,
d2 and d3 . This can certainly be ascribed to the existence of a few shocks,
notably associated with the crash of the “new economy” in March–April 2000
[450]. However, on the whole, it appears that there is no systematic rejection
of the Gaussian copula hypothesis for stocks within the same industrial sector,
notwithstanding the fact that one can expect correlations stronger than the
average between such stocks.

5.3 Limits of the Description in Terms of the Gaussian Copula

5.3.1 Limits of the Tests

A severe limitation of existing tests applied to Gaussian copulas [334, 350] is their inability to clearly distinguish between Gaussian and some relatively
close alternative models such as the Student’s copulas when these latter cop-
ulas have a sufficiently large number of degrees of freedom, typically of the
order of or larger than 10–20. As recalled in Chap. 3, the Student copula be-
comes very close to the Gaussian copula in its bulk when it has a large number
of degrees of freedom. In contrast, these two copulas still differ significantly
in the corners of the unit square (see Figs. 3.2 and 3.3). This difference has
no serious consequences for “normal” events but leads to important implica-
tions for extremes. Indeed, as discussed in Sect. 4.5.3, an alternative model to
the Gaussian copula, such as the Student’s copula, presents a significant tail
dependence, even for moderately large numbers of degrees of freedom, while
the Gaussian copula has absolutely no asymptotic tail dependence; these tail
dependences are controlled mathematically by the behavior of the copulas in
the corners of the unit square. Therefore, if the tests previously described are
unable to distinguish between a Student’s and a Gaussian copula, Occam’s
razor (simplicity and parsimony) suggests choosing the Gaussian copula and,
as a consequence, one may underestimate severely the dependence between
extreme events if the correct description turns out to be the Student’s cop-
ula. This may have catastrophic consequences in risk assessment and portfolio
management.
Figure 4.8 p. 173 provides a quantification of the dangers incurred by
mistaking a Student copula for a Gaussian one. Consider the case of a Student
copula with ν = 20 degrees of freedom with a correlation coefficient ρ lower
than 0.3 ∼ 0.4; its tail dependence λν (ρ) turns out to be less than 0.7%, i.e.,
the probability that one variable becomes extreme knowing that the other
one is extreme is less than 0.7%. In this case, the Gaussian copula with a zero
probability of simultaneous extreme events is not a bad approximation of the
Student’s copula. In contrast, consider a Student copula with a correlation
ρ larger than 0.7–0.8, corresponding to a tail dependence larger than 10%,
which is a nonnegligible probability for simultaneous extreme events. The
effect of tail dependence becomes of course much stronger as the number ν of
degrees of freedom decreases.
These examples stress the importance of determining whether the previous
testing procedure distinguishes between a Student copula with ν = 20 (or less)
degrees of freedom and a given correlation coefficient of the order of ρ = 0.5,
for instance, and a Gaussian copula with an appropriate correlation coefficient
ρ′. Due to the strikingly different behavior of these two models in the extremes,
the non-rejection of the Gaussian copula hypothesis previously found for most
assets can lead to an underestimation of the extreme risks, as a result of the

Fig. 5.4. Probability of non-rejection of the Gaussian copula hypothesis when the
true copula is given by a Student copula with ν degrees of freedom and a correlation
coefficient equal to ρ (error of type II: “false positive”), when the error of type I
(“false negative”) of the test is set equal to 5%, for the four distances d1 –d4

weak sensitivity of the test in the extreme regions of the copula. It is therefore
important to discuss the sensitivity of the test presented in Sect. 5.2.1 and to
review the other alternatives proposed in the literature.

5.3.2 Sensitivity of the Method

The previous section has found that the Gaussian copula provides a reasonably
good model, in the sense that it cannot be rejected by a statistical test at the
95% significance level. However, could this be due to the lack of power of the
statistical test rather than to the goodness of the Gaussian copula?
Let us denote by Hν,ρ the hypothesis that the true copula of the data is
the Student copula with ν degrees of freedom with the correlation coefficient
ρ. Considering the alternative hypothesis Hν,ρ , one needs to know what is
the probability that one cannot reject the null hypothesis H0 when the true
model is Hν,ρ . A complementary piece of information is: what is the minimum p-
value (significance level) of the test allowing us to reject the Gaussian copula
hypothesis for instance 95 times out of 100 when the true copula is the Student
copula. Answering these questions on the power of the test requires a numerical
study.
Figure 5.4 shows the minimum p-value of the test, denoted by p95% , as
a function of the (inverse of the) number of degrees of freedom ν and of the
correlation coefficient ρ of the true Student copula. Overall, the four tests asso-
ciated with the four different distances d1 –d4 behave similarly. As expected,
for large ν, namely ν ≥ 10 − 20 (1/ν ≤ 0.05 − 0.1), a very high p-value is
required to reject the Gaussian hypothesis. In such a case, it is almost impos-
sible to distinguish between the Gaussian hypothesis and a Student copula
for most realizations. If one leaves out distance d3 , the power of the tests
is almost independent of the value of the correlation coefficient. For d3 , the
power is clearly weaker for the smallest correlations.
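A power study of this kind requires simulating samples from a Student copula with prescribed (ν, ρ); a standard way to do so (our own sketch) is to draw from a bivariate Student distribution and map each margin through its t-CDF:

```python
import numpy as np
from scipy.stats import t as student_t

def student_copula_sample(T, rho, nu, seed=0):
    """Draw T points from the bivariate Student copula with correlation
    rho and nu degrees of freedom (the margins are uniform on (0, 1))."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    g = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    # Dividing the Gaussian pair by a common chi-distributed factor yields
    # a bivariate Student vector; the t-CDF then uniformizes each margin.
    s = np.sqrt(rng.chisquare(nu, size=T) / nu)
    x = g / s[:, None]
    return student_t.cdf(x, df=nu)
```

Feeding such samples, margin by margin, into the testing procedure of Sect. 5.2.1 and recording how often H0 is not rejected produces surfaces of the type shown in Fig. 5.4.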
In the light of these results on the performance of the tests, the previous
conclusion on the relevance of the Gaussian copula for the modeling of the
dependence between financial risks must be reconsidered. Concerning curren-
cies, the non-rejection of the Gaussian copula hypothesis does not exclude
at the 95% significance level that the dependence of the currency pairs may
be described by a Student copula with adequate values of ν and ρ. For the
German Mark/Swiss Franc pair, a Student copula with about five degrees of
freedom was found to obtain the same p-values [334]. For the correlation coef-
ficient ρ = 0.92 of the German Mark/Swiss Franc pair, Student’s copula with
five degrees of freedom predicts a tail dependence coefficient λ5 (0.92) = 63%,
in contrast with a zero value for the Gaussian copula. Such a large value of
λ5 (0.92) implies that, when an extreme event occurs for the German Mark, it
also occurs for the Swiss Franc with a frequency of 63%. Therefore, a stress
scenario based on the assumption of a Gaussian copula would fail to account
for such coupled extreme events, which may represent as many as two-thirds
of all extreme events, if it would turn out that the true copula was Student’s
copula with five degrees of freedom. Note that, with such a large value of the
correlation coefficient, the tail dependence remains high even if the number
of degrees of freedom is as large as 20 or more (see Fig. 4.8).
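The tail dependence values quoted here can be checked from the standard closed-form expression for the Student copula (discussed in Sect. 4.5.3), λν (ρ) = 2 T̄ν+1 (√((ν + 1)(1 − ρ)/(1 + ρ))), where T̄ν+1 denotes the complementary CDF of the univariate Student distribution with ν + 1 degrees of freedom:

```python
import numpy as np
from scipy.stats import t as student_t

def student_tail_dependence(nu, rho):
    """lambda_nu(rho) = 2 * Tbar_{nu+1}( sqrt((nu+1)(1-rho)/(1+rho)) )."""
    x = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * student_t.sf(x, df=nu + 1)

print(round(student_tail_dependence(5, 0.92), 2))   # about 0.63, i.e., 63%
```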
The Swiss Franc and Malaysian Ringgit pair offers a very different case. For
instance, during the time period from January 1994 to December 1998, the test
statistics are so high that the description of the dependence with Student’s
copula would require it to have at least 7–10 degrees of freedom. In addition,
the correlation coefficient of the two currencies is only ρ = 0.16, so that, even
in the most pessimistic situation ν = 7, the choice of the Gaussian copula
would amount to neglecting the tail dependence coefficient λ7 (0.16) = 4%
predicted by Student’s copula. In this case, stress scenarios based on the
Gaussian copula would predict uncoupled extreme events, which would be
wrong only once in 25 times.
These two examples highlight the fact that, as much as the number of
degrees of freedom of Student’s copula which is necessary to describe the
data, the correlation coefficient remains an important parameter.

5.3.3 The Student Copula: An Alternative?


The tests performed in [83, 350] show that the Student copula can provide
a significantly better description of the data than the Gaussian copula, par-
ticularly for foreign exchange (FX) rates, for which the number of degrees of
freedom of the Student copula is about 5–6 for daily returns. In both cases,
the testing procedure is based on the pseudo likelihood estimation method
detailed in Sect. 5.1.2.
Using the Akaike information criterion defined by the formula

AIC = −2 ln L̃({x1 (i), · · · , xn (i)}; θ̂) + 2 dim θ ,   (5.67)
Breymann and his co-authors [83] have shown that the dependence struc-
ture of the German Mark/Japanese Yen couple is better described by Stu-
dent’s copula with about six degrees of freedom (for daily returns) than with
a Gaussian copula, the latter being the second best copula among a set of
five copulas comprising Clayton’s, Gumbel’s, and Frank’s copulas. This re-
sult refines those obtained in [334] and is in line with the results obtained
by non-parametric and semiparametric estimation shown in Figs. 5.1–5.2. In
addition, Student’s copula is found to provide an even better description when
one considers FX returns calculated at smaller time scales [83]. Indeed, the
Student copula seems to provide a reliable model for FX returns calculated
for time scales larger than 2 hours. The number of degrees of freedom is found
to increase with the time scale: it increases from 4 at the 2 hours time scale to
6 at the daily time scale. Such a result is expected since, under time aggrega-
tion, the distribution of returns should converge to the Gaussian distribution
according to the central limit theorem, therefore the dependence structure
of the returns is expected to also converge toward the Gaussian copula at a
large time scale. At time scales smaller than 2 hours, the study by Breymann
et al. shows that neither the Gaussian nor the Student copulas are sufficient
to describe the dependence structure of the distributions of FX returns. At
these small time scales, microstructural effects probably come into play and
require more elaborated copulas to model the dependences observed at very
high frequencies.
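Criterion (5.67) simply penalizes the maximized pseudo log-likelihood by the number of copula parameters, the best model being the one with the smallest AIC. The selection step can be sketched as follows (the log-likelihood values below are purely hypothetical, chosen for illustration only):

```python
def aic(log_likelihood, n_params):
    # (5.67): AIC = -2 ln L + 2 dim(theta); smaller is better.
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical maximized pseudo log-likelihoods for five candidate copulas:
candidates = {
    "Student":  (412.0, 2),   # (ln L, dim theta): rho and nu
    "Gaussian": (401.0, 1),   # rho only
    "Clayton":  (388.0, 1),
    "Gumbel":   (385.0, 1),
    "Frank":    (380.0, 1),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])   # -> Student -820.0
```

Note that the Student copula wins here despite its extra parameter, because its log-likelihood gain exceeds the penalty of 2 per parameter.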
In addition, for all time scales, the copula of the bivariate excess returns
for high (or low) threshold appears to be best described by Clayton’s (or by
the survival Clayton) copula. This result can lead us to the following interpre-
tation on the existence of concomitant extremes. Assume that, conditional on
a frailty random variable representing the information flow, the asset returns are independent. The copula of the returns of the two assets then exhibits the
behavior reported in the study by Breymann et al. [83] if one assumes that
the random variable representing the information flow has a regularly varying
distribution – which means that pieces of information with great impact on
asset returns arrive relatively often.
In contrast with the case of foreign exchange rates, the estimated number of
degrees of freedom of the Student copula best fitting the dependence between
stocks is rather high, so that the probability of concomitant extreme risks remains weak for their usual level of correlation. In the study by Mashal
and Zeevi [350], the dependence between stocks is claimed to be significantly
better accounted for by a Student copula with 11–12 degrees of freedom than
by a Gaussian copula. Their conclusions are drawn from a generalization of
the log-likelihood ratio (or Wilks) test, based on the fact that the Gaussian
copula is nothing but a Student copula with an infinite number of degrees of
freedom. This allows them to compare directly the relevance of the Student
copula with respect to the Gaussian copula. Indeed, given two nested copulas
C1 and C2 , i.e., two copulas such that the space of parameter vectors θ 1 of
C1 is a subspace of the space of parameter vectors θ 2 of C2 , then the statistic

ΛT = −2 ln [ L̃1 (θ̂ 1 ) / L̃2 (θ̂ 2 ) ]   (5.68)

is asymptotically distributed as a χ21 -distribution with one degree of freedom if dim θ 1 = dim θ 2 − 1, up to a scale factor 1 + γ larger than one due to the use of a pseudo maximum likelihood instead of the true maximum likelihood:

ΛT −→ (1 + γ) χ21  in law, as T −→ ∞ .   (5.69)
The positive parameter γ depends on the choice of the model and can be
determined by numerical simulations. In more general cases where dim θ 2 −
dim θ 1 = m > 1, ΛT does not follow an asymptotic χ2m -distribution with m
degrees of freedom as in standard tests of nested hypotheses. This results from
the fact that the log-likelihood ratio statistic does not converge to a χ2 distri-
bution when the model is misspecified, which is the relevant situation when
using the pseudo likelihood instead of the true likelihood (see Appendix 5.B).
In such a case, the Wald or Lagrange multiplier tests are more appropriate
[209, 372].
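The one-parameter case (5.68)–(5.69) translates directly into code (our sketch; as noted above, the scale factor γ must be calibrated by numerical simulation, and γ = 0 recovers the standard Wilks test):

```python
from scipy.stats import chi2

def nested_copula_test(lnL_gaussian, lnL_student, gamma=0.0):
    """P-value for H0 'Gaussian copula' against the nested Student
    alternative (dim theta_2 - dim theta_1 = 1), per (5.68)-(5.69)."""
    # (5.68): Lambda_T = -2 ln(L1 / L2) = 2 (ln L2 - ln L1).
    lambda_T = 2.0 * (lnL_student - lnL_gaussian)
    # (5.69): Lambda_T ~ (1 + gamma) chi2_1 asymptotically under H0.
    return chi2.sf(lambda_T / (1.0 + gamma), df=1)
```

For instance, a pseudo log-likelihood gain of 11 for the Student copula gives ΛT = 22, far in the tail of the χ21 -distribution, so that H0 would be rejected; a larger γ mechanically makes the test more conservative.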
While these results improve somewhat on the initial study [334], in con-
trast with the case of currencies, one can question the existence of a real
improvement brought by the Student copula to describe the dependence be-
tween stocks. Indeed, correlation coefficients between two stocks are hardly
greater than 0.4–0.5, so that the tail dependence of a Student copula with
11–12 degrees of freedom is about 2.5% or less. In view of all the differ-
ent sources of uncertainty during the estimation process in addition to the
possible non-stationarity of the data, one can doubt that such a description
eventually leads to concrete improvements for practical purposes. To highlight
this point, let us consider several portfolios made of 50% of the Standard &
Poor’s 500 index and 50% of one stock (whose name is indicated in the first
column of Table 5.1). Let us then estimate the probability Pr that this portfo-
lio incurs a loss larger than n times its standard deviation (n = 2, . . . , 5). For
the same portfolio, let us estimate the probability Pg (resp. Ps ) that it incurs
the same loss ( i.e., n times its standard deviation) when the dependence be-
tween the index and the stock is given by a Gaussian copula (resp. a Student
copula with ten degrees of freedom). The row named Pr /Pg/s gives the aver-
age values of Pr /Pg and Pr /Ps over the 20 portfolios. For shocks of two- and
three-standard deviations, the values of Pr /Pg close to 1 indicate that the
dependence structure is correctly captured by a Gaussian copula. For shocks
of four- and five-standard deviations, Pr /Pg becomes larger than 1, showing
that large shocks are more probable than predicted by the Gaussian depen-
dence, and all the more so, the larger the amplitude of the shocks. This occurs
notwithstanding the use of marginals with heavy tails, suggesting the effect of
a non-zero tail dependence in the true data. In contrast, the values of Pr /Ps
are significantly smaller than 1 showing that the Student copula overestimates
the frequency of large shocks. In addition, this overestimation is surprisingly
worse for larger shocks (by as much as a factor 2.5) in the range in which the
Gaussian copula becomes less adequate. This suggests that the tail depen-
dence of the Student copula is too large to describe this data set. This simple
exercise illustrates that neither the Gaussian copula nor a Student copula with
a reasonable number of degrees of freedom provide an accurate description of
the dependence between stock returns.8 The discrepancies between these two
models and the real dependence structure become all the more important, the more extreme the amplitude of the shock. And in fact, the situation is
worse for the Student copula. This suggests that, for practical applications,
Student’s copula may not provide a real improvement with respect to the
Gaussian copula for traditional portfolio management.

5.3.4 Accounting for Heteroscedasticity

The aforementioned studies have not taken into account, or only partially,
the well-known volatility clustering phenomenon, which certainly impacts the dependence properties of asset returns. This issue has been addressed
by Patton [380], who has shown that the two-step maximum likelihood es-
timation can be extended to conditional copulas to account for the time-
varying nature of financial time series. Filtering marginal data by a GARCH
process, Patton has shown that the conditional dependence structure between
exchange rates (Japanese Yen against Euro) is better described by Clayton’s
copula than by the Gaussian copula. We also note that Muzy et al. [366] have
constructed a multivariate “multifractal” process to account for both volatil-
ity clustering and the dependence between assets. In this case, the conditional
copula is (nearly) Gaussian.
The main limitation of Patton’s approach comes from the fact that fil-
tering the data does not leave the dependence structure, i.e., the copula,
unchanged. Thus, the copula of the residuals is not the same as the cop-
ula of the raw returns. Moreover, the copula of the residuals changes with
8
This point confirms the doubts raised by the comparison of the nonparametric
and the semiparametric estimates of the density of the copula of the daily returns
of General Motors and Procter & Gamble, represented in Figs. 5.1–5.2.
Table 5.1. Portfolios made of 50% of the Standard & Poor's 500 index and 50% of one stock (whose name is indicated in the first column) are considered

                                              100 × Pr[R ≤ −n · σ]
                                      n=2              n=3              n=4              n=5
                                  Pr   Pg   Ps    Pr   Pg   Ps    Pr   Pg   Ps    Pr   Pg   Ps
Abbott Labs                     2.07 2.06 2.62  0.58 0.51 0.84  0.21 0.15 0.39  0.09 0.07 0.26
American Home Products Corp.    1.98 2.07 2.72  0.51 0.56 0.98  0.30 0.24 0.43  0.17 0.13 0.22
Boeing Co.                      2.03 1.96 2.50  0.53 0.51 0.95  0.21 0.18 0.44  0.13 0.09 0.19
Bristol-Myers Squibb Co.        1.56 1.81 2.33  0.55 0.48 0.98  0.26 0.22 0.81  0.11 0.10 0.42
Chevron Corp.                   1.94 1.99 2.26  0.40 0.42 0.88  0.13 0.15 0.55  0.08 0.07 0.30
Du Pont (E.I.) de Nemours & Co. 2.13 2.02 2.59  0.51 0.47 0.87  0.21 0.19 0.58  0.09 0.07 0.32
Disney (Walt) Co.               1.83 1.87 2.40  0.47 0.53 1.28  0.24 0.22 0.73  0.15 0.12 0.43
General Motors Corp.            1.73 1.95 2.12  0.45 0.42 0.76  0.21 0.13 0.59  0.08 0.06 0.36
Hewlett-Packard Co.             1.77 2.08 2.54  0.53 0.51 0.99  0.21 0.19 0.44  0.08 0.09 0.15
Coca-Cola Co.                   1.60 1.83 2.13  0.45 0.50 0.77  0.19 0.18 0.58  0.09 0.07 0.46
Minnesota Mining & MFG Co.      1.85 2.01 2.23  0.57 0.49 0.80  0.19 0.19 0.60  0.08 0.09 0.52
Philip Morris Cos Inc.          2.00 2.07 2.33  0.45 0.50 1.10  0.21 0.19 0.65  0.13 0.12 0.34
Pepsico Inc.                    1.92 2.08 2.50  0.51 0.49 0.83  0.15 0.18 0.39  0.15 0.07 0.22
Procter & Gamble Co.            1.51 1.67 2.05  0.45 0.48 0.95  0.24 0.21 0.82  0.13 0.09 0.67
Pharmacia Corp.                 1.81 1.94 2.69  0.53 0.54 1.06  0.23 0.25 0.80  0.11 0.12 0.45
Schering-Plough Corp.           1.85 1.94 2.01  0.49 0.44 0.73  0.11 0.14 0.58  0.08 0.06 0.31
Texaco Inc.                     1.90 1.94 2.77  0.55 0.55 1.01  0.28 0.23 0.41  0.11 0.11 0.21
Texas Instruments Inc.          1.87 2.02 2.09  0.49 0.50 0.89  0.21 0.15 0.66  0.06 0.07 0.16
United Technologies Corp        2.17 2.10 2.28  0.47 0.45 0.78  0.17 0.14 0.47  0.11 0.06 0.30
Walgreen Co.                    1.81 1.96 2.28  0.47 0.41 0.92  0.23 0.14 0.40  0.09 0.08 0.21
Pr/Pg/s                         0.95    0.79    1.02    0.55    1.15    0.39    1.24    0.38

We estimate the probability Pr that each portfolio incurs a loss larger than n times its standard deviation (n = 2, . . . , 5). For each portfolio, we also estimate the probability Pg (resp. Ps) that it incurs the same loss (i.e., n times its standard deviation) when the dependence between the index and the stock is given by a Gaussian copula (resp. a Student copula with ten degrees of freedom). The row named Pr/Pg/s gives the average values of Pr/Pg and Pr/Ps over the 20 portfolios.

the chosen filter. Residuals are not the same when one filters the data with
an ARCH, a GARCH or a Multifractal Random Walk. In addition, for an
arbitrage-free market, the (multivariate) log-price process can be expressed
as a time changed multivariate Brownian motion9 [264], so that conditional
on the (realized) volatility [8, 38], the log-price process is nothing but a mul-
tivariate Brownian motion. As a consequence, conditional on the volatility,
the multivariate distribution of returns should be Gaussian, and, therefore,
the copula of conditional returns should also be the Gaussian copula. Thus,
the estimation of the conditional copula does not really bring new insights. In
fine, the discrepancy between the Gaussian copula and the conditional copula
provided by some other model mainly highlights the weakness of the model
under consideration. This raises the question whether performing a model-free
analysis (without any pre-filtering process) is not a more satisfying alterna-
tive. Obviously, the price to pay for such a model-free approach is a weakening
of the power of the statistical test due to the presence of (temporal) depen-
dence between data. There is no free lunch, neither on financial markets, nor
in statistics.
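The time-change argument can be illustrated with a toy simulation. The sketch below assumes (as an illustration only, not as a model of any particular market) that returns equal a bivariate Gaussian vector multiplied by a common inverse-gamma stochastic variance, which makes them bivariate Student distributed, hence tail dependent; conditional on the volatility, the copula is exactly Gaussian, and de-volatilized returns lose their extreme concomitance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Returns = Gaussian vector times a common stochastic volatility:
# unconditionally a bivariate Student (nu = 3) with tail dependence,
# but, conditional on the volatility, an exact Gaussian copula.
T, rho, nu = 200_000, 0.5, 3
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=T)
V = nu / rng.chisquare(nu, size=T)          # inverse-gamma variance factor
r = np.sqrt(V)[:, None] * z                 # "observed" returns

def joint_exceed(x, y, q=0.99):
    """P[Y > F_Y^{-1}(q) | X > F_X^{-1}(q)], a finite-size proxy for the
    upper tail dependence."""
    qx, qy = np.quantile(x, q), np.quantile(y, q)
    return np.mean(y[x > qx] > qy)

print(f"raw returns (Student copula):     {joint_exceed(r[:, 0], r[:, 1]):.3f}")
print(f"de-volatilized (Gaussian copula): {joint_exceed(z[:, 0], z[:, 1]):.3f}")
```

The joint exceedance frequency of the raw returns is markedly larger than that of the de-volatilized returns, although the latter generated the former through a mere volatility rescaling.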

5.4 Summary
The Gaussian paradigm has had a long life in finance. While it is now clear that
marginal distributions cannot be described by Gaussian laws, especially in
their tails (see Chap. 2), the dependence structure between two or more assets
is much less known and nothing suggests rejecting a priori the Gaussian copula
as a correct description of the observed dependence structure. In addition,
the Gaussian copula can be derived in a very natural way from a principle of
maximum entropy [265, 453].10 The Gaussian copula has also the advantage
of being the simplest possible one in the class of elliptical copulas, since it
is entirely specified by the knowledge of the correlation coefficients while, for
instance, Student’s copula requires in addition the specification of the number
of degrees of freedom. This has led to taking the Gaussian copula as a logical
starting point for the study of the dependence structure between financial
assets.
However, as recalled in Chap. 3, if the Gaussian and Student copulas are
very similar in their bulk, they become significantly different in their tails.
9 More precisely, in an arbitrage-free market, any n-dimensional square-integrable
log-price process ln p(t), with continuous sample path, satisfies

    r_τ(t) = ln p(t + τ) − ln p(t) = ∫_t^{t+τ} µ(s) ds + ∫_t^{t+τ} σ(s) dW(s) ,

where µ is a predictable n-dimensional vector and σ is an n-by-n matrix. W
denotes an n-dimensional standard Brownian motion.
10 For other examples of the determination of distributions using the principle of
maximum entropy, see [410].

Concretely, the essential difference between the Gaussian and Student copu-
las is that the former has independent extremes (in the sense of the asymptotic
tail dependence; see Chap. 4), while the latter generates concomitant extremes
with a non-zero probability which is all the larger, the smaller is the number
of degrees of freedom and the larger is the correlation coefficient. Thus, by
providing a slight departure from the Gaussian copula in the bulk of the dis-
tributions, Student’s copula could also be a good candidate to model financial
dependencies. It turns out that it is indeed a good model for foreign exchange
rates. The situation is not so clear for stock returns, as Student's copula does
not seem to perform significantly better than the Gaussian copula, both be-
ing apparently approximations of the true copula. From a practical point of
view, there have been several efforts to find better copulas, but the obtained
gains are not clear. From an economic point of view, the reasons explaining
the difference between the dependence structure of the FX rate and the stock
returns remain to be found. The differences between stock market and FX
market organizations can be seen as an obvious reason, but direct links be-
tween market organization and return distributions or copulas have not yet
been clearly articulated.
One of the motivations in introducing the tail dependence coefficient λ is
to quantify the potential risks incurred in modeling the dependence structure
between assets with Gaussian copulas, for which λ = 0. Indeed, for assets with
large correlation coefficients, it may be dangerous to use Gaussian copulas as
long as one does not have a better idea of the value of the tail dependence
coefficient. Parametric models do not readily provide this information since
they fix the tail dependence coefficient and therefore do not provide an inde-
pendent test of whether λ is small (and indistinguishable from 0) or large.
To get further insight, nonparametric methods could thus be useful.
Nonparametric models have the advantage of being much more general
since, by construction, they do not assume a specific copula and might thus
allow for an independent determination of the tail dependence coefficient.
Some of these methods have the advantage of leading to estimated copulas
which are smooth and differentiable everywhere, which is convenient for the
generation of random variables having the estimated copula, for sensitivity
analysis and for the generation of synthetic scenarios [149]. However, this
advantage comes with the main drawback that the tail dependence coefficient
vanishes by construction. In sum, all methods mentioned until now suffer from
the same problem of neglecting concomitant extremes. It thus seems that the
use of copulas is not the easiest path to calibrate extreme events. We address
this problem in the next chapter, in particular by describing direct methods
for estimating extreme concomitant events.

Appendix
5.A Proof of the Existence of a χ²-Statistic
for Testing Gaussian Copulas

To prove Proposition 5.2.1, we first consider an n-dimensional random vector
X = (X_1, . . . , X_n). Let us denote by F its distribution function and by F_i
the marginal distribution of each X_i. Let us now assume that the distribution
function F satisfies H_0, so that F has a Gaussian copula with correlation ma-
trix ρ while the F_i's can be any distribution functions. According to Theorem
3.2.1, the distribution F can be represented as:

    F(x_1, . . . , x_n) = Φ_{ρ,n}(Φ^{-1}(F_1(x_1)), . . . , Φ^{-1}(F_n(x_n))) .    (5.A.1)

Let us now transform the X_i's into Normal random variables Y_i's:

    Y_i = Φ^{-1}(F_i(X_i)) .    (5.A.2)

Since the mapping Φ^{-1}(F_i(·)) is increasing, the invariance Theorem 3.2.2 al-
lows us to conclude that the copula of the variables Y_i's is identical to the
copula of the variables X_i's. Therefore, the variables Y_i's have Normal mar-
ginal distributions and a Gaussian copula with correlation matrix ρ. Thus, by
definition, the multivariate distribution of the Y_i's is the multivariate Gaussian
distribution with correlation matrix ρ:

    G(y) = Φ_{ρ,n}(Φ^{-1}(F_1(x_1)), . . . , Φ^{-1}(F_n(x_n)))    (5.A.3)
         = Φ_{ρ,n}(y_1, . . . , y_n) ,    (5.A.4)

and Y is a Gaussian random vector. From (5.A.3–5.A.4), we have

    ρ_{ij} = Cov[Φ^{-1}(F_i(X_i)), Φ^{-1}(F_j(X_j))] .    (5.A.5)

Consider now the random variable

    Z² = Y^t ρ^{-1} Y = ∑_{i,j=1}^n Y_i (ρ^{-1})_{ij} Y_j ,    (5.A.6)

where ·^t denotes the transpose operator. It is well known that the variable
Z² follows a χ²-distribution with n degrees of freedom. Indeed, since Y is
a Gaussian random vector with covariance matrix^11 ρ, it follows that the
components of the vector

    Ỹ = A Y ,    (5.A.7)

are independent Normal random variables. Here, A denotes the square root
of the matrix ρ^{-1}, obtained by the Cholesky decomposition, so that A^t A =
ρ^{-1}. Thus, the sum Ỹ^t Ỹ = Z² is the sum of the squares of n independent
Normal random variables, which follows a χ²-distribution with n degrees of
freedom.
11 Up to now, the matrix ρ was named correlation matrix. But in fact, since the
variables Y_i's have unit variance, their correlation matrix is also their covariance
matrix.
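The construction above translates directly into a test procedure: map each margin to Normal scores through its empirical distribution function (5.A.2), estimate ρ by (5.A.5), and check that the statistics z² of (5.A.6) are compatible with a χ² distribution with n degrees of freedom. The following is a minimal sketch; the use of rank-based empirical margins and of a Kolmogorov–Smirnov check are illustrative choices, not the book's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def z2_statistics(X):
    """Map each margin to Normal scores (5.A.2), estimate rho (5.A.5)
    and return the statistics z^2 = y' rho^{-1} y of (5.A.6), one per
    observation. Under H0 (Gaussian copula) they are chi^2 with n dof."""
    T, n = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1.0
    Y = stats.norm.ppf(ranks / (T + 1.0))     # Y_i = Phi^{-1}(F_i(X_i))
    rho_inv = np.linalg.inv(np.corrcoef(Y, rowvar=False))
    return np.einsum("ti,ij,tj->t", Y, rho_inv, Y)

# Example: Gaussian copula but heavy-tailed (Student t3) margins; the
# monotonic marginal map leaves the copula unchanged.
T, r = 2000, 0.6
g = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=T)
X = stats.t.ppf(stats.norm.cdf(g), df=3)
z2 = z2_statistics(X)
ks = stats.kstest(z2, stats.chi2(df=2).cdf)
print(f"mean z^2 = {z2.mean():.3f} (2 expected), KS p-value = {ks.pvalue:.3f}")
```

The KS p-value is typically large under H0; note, however, that the z² are computed from estimated (empirical) margins, so the χ² law only holds approximately at finite T.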

5.B Hypothesis Testing with Pseudo Likelihood

Let us consider the iid sample {(x_1(1), x_2(1), . . . , x_n(1)), . . . , (x_1(T), x_2(T),
. . . , x_n(T))} drawn from the n-dimensional distribution F with copula C and
margins F_i. We aim at estimating the unknown copula C by use of the semi-
parametric method presented in Sect. 5.1.2. Its pseudo likelihood reads

    ln L̃_T = ∑_{i=1}^T ln c(û_1(i), . . . , û_n(i); θ) ,    (5.B.8)

with û_k(i) = F̂_k(x_k(i)), where the F̂_i's are the empirical estimates of the
marginal distribution functions F_i's, and c(·; θ) denotes the density of the
copula C_θ, θ ∈ Θ ⊂ R^p. The parameter vector θ can be estimated by maximization
of this pseudo log-likelihood, so that

    θ̂_T = arg max_θ ln L̃({û_1(i), . . . , û_n(i)}; θ) .    (5.B.9)

Under usual regularity conditions, it can be shown that θ̂_T is a consistent
estimator of θ^0, which is asymptotically Gaussian [197],

    √T (θ̂_T − θ^0) −→(law) N(0, Σ²)    (5.B.10)

with Σ² = I(θ^0)^{-1} + I(θ^0)^{-1} Ω I(θ^0)^{-1}, where I(θ^0) represents Fisher's
information matrix at θ^0,

    I(θ^0)_{ij} = E[ ∂_{θ_i} ln c(U; θ) · ∂_{θ_j} ln c(U; θ) ]_{θ=θ^0} ,    (5.B.11)

where U denotes an n-dimensional random vector with distribution function C,
and with

    Ω_{ij} = Cov[ ∑_{k=1}^n W_{ki}(U_k), ∑_{k=1}^n W_{kj}(U_k) ] ,    (5.B.12)

where

    W_{ki}(U_k) = ∫_{u ∈ [0,1]^n} 1_{{U_k ≤ u_k}} ∂²_{θ_i u_k} ln c(u; θ)|_{θ=θ^0} dC(u; θ^0) .    (5.B.13)
These results come from a straightforward application of the consistency and
asymptotic normality of functionals of multivariate rank statistics derived
by Ruymgaart et al. [423, 424] and Rüschendorf [422]. Indeed, concerning
asymptotic normality, the maximum pseudo likelihood estimator θ̂_T satisfies:

    h_T(θ̂_T) = 0 ,    (5.B.14)

where

    h_T(θ) = (1/T) ∑_{i=1}^T ∇_θ ln c(û(i); θ) .    (5.B.15)

Now, expanding h_T around θ^0, we have

    h_T(θ) = h_T(θ^0) + Ã_T(θ^0)(θ − θ^0) + · · ·    (5.B.16)

where Ã_T(θ) is the Hessian matrix of the normalized pseudo log-likelihood,
i.e., the Jacobian of h_T(θ),

    [Ã_T(θ)]_{ij} = (1/T) ∑_{k=1}^T ∂²_{θ_i θ_j} ln c(û(k); θ) .    (5.B.17)

Proposition A.1 in [197] provides a generalized form of the law of large num-
bers for functionals of rank statistics, so that

    Ã_T(θ) −→(a.s.) E[ ∂²_{θ_i θ_j} ln c(U; θ) ]_{θ=θ^0} = −I(θ^0) ,    (5.B.18)

where I(θ^0) denotes Fisher's information matrix (5.B.11). Evaluating (5.B.16)
at θ = θ̂_T, one finally obtains

    √T · h_T(θ^0) = √T · I(θ^0) (θ̂_T − θ^0) + o_p(1) ,    (5.B.19)

as usual.
Proposition A.1 in [197] also states a generalized form of the central limit
theorem for functionals of rank statistics, which allows one to write

    √T · h_T(θ^0) −→ N(0, Γ(θ^0)) ,    (5.B.20)

where Γ(θ^0) = I(θ^0) + Ω. Then, (5.B.19–5.B.20) allow us to conclude that

    √T · (θ̂_T − θ^0) −→ N(0, Σ²) ,    (5.B.21)

where Σ² stands for I(θ^0)^{-1} + I(θ^0)^{-1} Ω I(θ^0)^{-1}.
Since Ω is a positive definite matrix, the variance of the estimator θ̂_T is
larger than it would be, were the marginal distributions F_i perfectly known.
Indeed, in such a case, the variance of the estimator would be nothing but the
inverse of Fisher's information matrix, I(θ^0)^{-1}.
Now, let us write the vector θ of parameters as follows:

    θ = (θ_1 ; θ_2)    (5.B.22)

with dim θ_1 = d and dim θ_2 = p − d. We would like to test the null hypothesis
according to which θ_1 = θ_1^0, i.e., H_0 = {θ ∈ Θ : θ_1 = θ_1^0}. In Mashal and
Zeevi's approach [350], this amounts to testing H_0 = {(ν, Σ²) : ν = ∞}, where
ν denotes the number of degrees of freedom of the Student's copula and Σ²
its shape matrix.
If the likelihood L̃ were the actual likelihood, and not a pseudo likelihood,
the log-likelihood ratio test would allow us to test such a null hypothesis.
Indeed, under the null, Wilks' theorem would hold and one would have

    Λ_T = 2 · ( L̃_T(θ̂_T) − sup_{θ ∈ H_0} L̃(θ) ) −→ χ²_d ,    (5.B.23)

where χ²_d denotes the χ² distribution with d degrees of freedom (see Chap. 2,
Sect. 2.4.4).
Unfortunately, this test does not apply with the pseudo likelihood, as pre-
viously assumed [209, 372, 491]. Actually, expanding the pseudo log-likelihood
(5.B.8) around θ^0 and accounting for (5.B.19), we obtain

    L̃_T(θ̂_T) = L̃_T(θ^0) + (T/2) (θ̂_T − θ^0)^t I(θ^0) (θ̂_T − θ^0) + o_p(1) .    (5.B.24)

Denoting by θ̂_T^0 the pseudo maximum likelihood estimator under the null
hypothesis (i.e., assuming θ_1 = θ_1^0):

    θ̂_T^0 = arg max_{θ ∈ H_0} ∑_{i=1}^T ln c(û(i); θ) ,    (5.B.25)

and expanding h_T = T^{-1} ∇_θ L̃_T around θ^0, which yields

    (1/√T) · ( ∇_{θ_1} L̃_T(θ̂_T^0) ; 0 ) = √T · h_T(θ^0) − √T · I(θ^0) ( 0 ; θ̂_{2,T}^0 − θ_2^0 ) + o_p(1) ,    (5.B.26)

the expansion of the pseudo likelihood around θ^0 reads

    L̃_T(θ̂_T^0) = L̃_T(θ^0) + (T/2) (θ̂_T^0 − θ^0)^t I(θ^0) (θ̂_T^0 − θ^0) + o_p(1) .    (5.B.27)

The notation

    θ̂_T^0 = ( θ_1^0 ; θ̂_{2,T}^0 )    (5.B.28)

has been used in (5.B.26).


Λ_T, defined in (5.B.23), is now obtained by taking the difference between
(5.B.24) and (5.B.27):

    Λ_T = T (θ̂_T − θ^0)^t I(θ^0) (θ̂_T − θ^0) − T (θ̂_T^0 − θ^0)^t I(θ^0) (θ̂_T^0 − θ^0) + o_p(1)
        = T (θ̂_T − θ̂_T^0)^t I(θ^0) (θ̂_T − θ̂_T^0) + 2T (θ̂_T^0 − θ^0)^t I(θ^0) (θ̂_T − θ̂_T^0) + o_p(1) ,

where the last equality uses the fact that each term is a scalar and is thus
equal to its transpose.
Substituting (5.B.19) in (5.B.26) yields

    (1/√T) · ( ∇_{θ_1} L̃_T(θ̂_T^0) ; 0 ) = √T · I(θ^0) (θ̂_T − θ̂_T^0) + o_p(1)    (5.B.29)

and, left-multiplying by √T (θ̂_T^0 − θ^0)^t = √T ( 0 ; θ̂_{2,T}^0 − θ_2^0 )^t, shows that

    T (θ̂_T^0 − θ^0)^t I(θ^0) (θ̂_T − θ̂_T^0) = o_p(1) ,    (5.B.30)

which allows us to conclude that

    Λ_T = T (θ̂_T − θ̂_T^0)^t I(θ^0) (θ̂_T − θ̂_T^0) + o_p(1) .    (5.B.31)

Now, since (5.B.29) is equivalent to

    √T · (θ̂_T − θ̂_T^0) = (1/√T) · I(θ^0)^{-1} ( ∇_{θ_1} L̃_T(θ̂_T^0) ; 0 ) + o_p(1) ,    (5.B.32)

we have:

    Λ_T = (1/T) ( ∇_{θ_1} L̃_T(θ̂_T^0) ; 0 )^t I(θ^0)^{-1} ( ∇_{θ_1} L̃_T(θ̂_T^0) ; 0 ) + o_p(1)    (5.B.33)
        = T^{-1} · ∇_{θ_1} L̃_T(θ̂_T^0)^t [I^{-1}]_{11} ∇_{θ_1} L̃_T(θ̂_T^0) + o_p(1) ,    (5.B.34)

where [I^{-1}]_{11} denotes the d × d submatrix of the d first rows and columns of
the inverse of I(θ^0).
From (5.B.29) again, we have

    (1/√T) · [I^{-1}]_{11} ∇_{θ_1} L̃_T(θ̂_T^0) = √T (θ̂_{1,T} − θ_1^0) + o_p(1) ,    (5.B.35)

so that

    Λ_T = T^{-1} · ∇_{θ_1} L̃_T(θ̂_T^0)^t [I^{-1}]_{11} ∇_{θ_1} L̃_T(θ̂_T^0)
        = T · (θ̂_{1,T} − θ_1^0)^t ([I^{-1}]_{11})^{-1} (θ̂_{1,T} − θ_1^0) + o_p(1) .    (5.B.36)

Now, by (5.B.21), we have

    √T · (θ̂_{1,T} − θ_1^0) −→ N(0, [Σ²]_{11}) .    (5.B.37)

Denoting by B a symmetric positive definite matrix such that

    B · B = [Σ²]_{11}    (5.B.38)

and by ξ_d a d-dimensional standard Gaussian vector, we obtain:

    Λ_T = ξ_d^t · B ([I^{-1}]_{11})^{-1} B · ξ_d + o_p(1) .    (5.B.39)

As a consequence,

    Λ_T −↛ χ²_d , as T → ∞    (5.B.40)

unless

    B ([I^{-1}]_{11})^{-1} B = Id_d ,    (5.B.41)

which holds when Ω = 0, for instance. Therefore, when one resorts to the
pseudo likelihood instead of the actual likelihood, the asymptotic distribu-
tion of Λ_T is not a simple χ² distribution and the log-likelihood ratio test
becomes impracticable. In the particular case where dim θ_1 = 1, as in [350],
B ([I^{-1}]_{11})^{-1} B is a scalar, so that Λ_T follows a χ² distribution with one
degree of freedom, up to a scale factor.
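The two-step procedure of (5.B.8)–(5.B.9) — rank-based pseudo-observations followed by maximization of the copula pseudo log-likelihood — can be sketched for a bivariate Student copula as follows. This is a minimal illustration on synthetic data; the starting point, parameter bounds and choice of optimizer are ad hoc, and no attempt is made to compute the corrected asymptotic variance Σ²:

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import gammaln

rng = np.random.default_rng(3)

def pseudo_obs(X):
    """Rank-based pseudo-observations, the hat-u of (5.B.8)."""
    T = X.shape[0]
    return (np.argsort(np.argsort(X, axis=0), axis=0) + 1.0) / (T + 1.0)

def student_copula_nll(params, u):
    """Negative pseudo log-likelihood of a bivariate Student copula."""
    rho, nu = params
    if not (-0.99 < rho < 0.99 and 2.1 < nu < 100.0):
        return np.inf
    x = stats.t.ppf(u, df=nu)                 # t quantiles of the pseudo-obs
    det = 1.0 - rho**2
    q = (x[:, 0]**2 - 2*rho*x[:, 0]*x[:, 1] + x[:, 1]**2) / det
    # log of the bivariate t density divided by the two univariate margins
    logc = (gammaln((nu + 2)/2) + gammaln(nu/2) - 2*gammaln((nu + 1)/2)
            - 0.5*np.log(det) - (nu + 2)/2 * np.log1p(q/nu)
            + (nu + 1)/2 * (np.log1p(x[:, 0]**2/nu) + np.log1p(x[:, 1]**2/nu)))
    return -np.sum(logc)

# Synthetic data with a Student copula (rho = 0.5, nu = 5); the margins
# are arbitrary since the rank transform discards them.
T, rho0, nu0 = 3000, 0.5, 5
z = rng.multivariate_normal([0, 0], [[1, rho0], [rho0, 1]], size=T)
w = np.sqrt(rng.chisquare(nu0, size=T) / nu0)
X = np.exp(z / w[:, None])                    # lognormal-like margins
res = optimize.minimize(student_copula_nll, x0=[0.3, 8.0],
                        args=(pseudo_obs(X),), method="Nelder-Mead")
print("estimated (rho, nu):", res.x)
```

As the appendix shows, confidence statements about the fitted parameters must use the sandwich variance Σ² of (5.B.10), not the naive inverse Fisher information, and the likelihood-ratio statistic of (5.B.23) is not χ² distributed in this semiparametric setting.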
6
Measuring Extreme Dependences

In this chapter, we investigate the relative information content of several mea-


sures of dependence between two random variables X and Y in various models
of financial series. We consider measures of dependence especially defined for
large and extreme events. These measures of dependence are of two types: (i)
unconditional such as with the coefficient of tail dependence already intro-
duced in Chap. 4 and (ii) conditional such as with the correlation coefficient
conditional over a given threshold. The introduction of conditioning over val-
ues of one or both variables reaching above some threshold is a natural ap-
proach to discriminate the dependence in the tails. It explodes the concept
of dependence into a multidimensional set of measures, each adapted to cer-
tain ranges spanned by the random variables. We present explicit analytical
formulas as well as numerical and empirical estimations for these measures of
dependence. The main overall insight is that conditional measures of depen-
dence may be very different from the unconditional ones and can often lead
to paradoxical interpretations, whose origins are explained in detail.
When the dependence properties are studied as a function of time, one can
often observe that conditional measures vary with time. Such time variation
has initiated a vigorous discussion in the literature on its possible economic
meaning. We review the mechanism by which conditioning provides a straight-
forward and general mechanism for explaining changes of correlations based
on changes of volatility or of trends: for a given conditional threshold, if the
volatility of one or both time series changes in some time interval, then the
corresponding quantiles sampled in the conditional measure will also change;
as a result, the conditional measure will not sample the same part of the tails
of the distributions, effectively changing the definition of the conditional mea-
sure. In this explanation, the variation with time of conditional measures of
dependence results solely from a change of volatility but does not reflect a gen-
uine change of dependence. In other words, a constant dependence structure
together with time-varying volatility may give rise to changing conditional
measures of dependence, which would be incorrectly interpreted as reflecting
genuine changes of dependence. Thus, tools based upon conditional quantities

should be used with caution since conditioning alone induces a change in the
dependence structure which has nothing to do with a genuine change of un-
conditional dependence. In this respect, for its stability, the coefficient of tail
dependence should be preferred to the conditional correlations. Moreover,
the various measures of dependence exhibit different and sometimes opposite
behaviors, showing that extreme dependence properties possess a multidimen-
sional character that can be revealed in various ways.
As an illustration, the theoretical results and their interpretation presented
below are applied to the controversial contagion problem across Latin Amer-
ican markets during the turmoil periods associated with the Mexican crisis
in 1994 and with the Argentinean crisis that started in 2001. The analysis of
several measures of dependence between the Argentinean, Brazilian, Chilean
and Mexican markets shows that the above conditioning effect does not fully
explain the behavior of the Latin American stock indexes, confirming the ex-
istence of a possible genuine contagion. Our analysis below suggests that the
1994 Mexican crisis has spread over to Argentina and Brazil through conta-
gion mechanisms and to Chile only through co-movements. Concerning the
recent Argentinean crisis that started in 2001, no evidence of contagion to the
other Latin American countries (except perhaps in the direction of Brazil)
can be found but significant co-movements are identified.
The chapter is organized as follows. Sect. 6.1 motivates the whole chap-
ter by presenting a number of historically important cases which suggested
to previous authors that, “during major market events, correlations change
dramatically” [71]. This section then offers a review of the different existing
view points on conditional dependences.
Section 6.2 describes three conditional correlation coefficients:
• the correlation ρ_v^+ conditioned on signed exceedance of one variable,
• or on both variables (ρ_u), and
• the correlation ρ_v^s conditioned on the exceedance of the absolute value of
one variable (amounting to a conditioning on large values of the volatility).
Boyer et al. [78] have provided the general expression of ρ_v^+ and ρ_v^s for the
Gaussian bivariate model, which we use to derive their v dependence for large
thresholds v. This analysis shows that, for a given distribution, the condi-
tional correlation coefficient changes even if the unconditional correlation is
left unchanged, and the nature of this change depends on the conditioning set.
We then give the general expression of ρ_v^+ and ρ_v^s for the Student's bivariate
model with ν degrees of freedom and for the factor model X = βY + ε, for
arbitrary distributions of Y and ε. By comparison with the Gaussian model,
these expressions exemplify that, for a fixed conditioning set, the behavior of
the conditional correlation changes dramatically from one distribution to an-
other. Conditioning on both variables, we give the asymptotic dependence
of ρ_u for the bivariate Gaussian model and show that it essentially behaves
like ρ_v^+. Applying these results to the Latin American stock indexes, we find
that one cannot entirely explain the behavior of the conditional correlation

coefficient for these markets by the conditioning effect, suggesting the exis-
tence of a possible genuine contagion as mentioned above.
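The conditioning effect is easy to reproduce numerically: for a bivariate Gaussian with fixed unconditional correlation ρ, the correlation measured on the subsample {Y > v} depends strongly on the threshold v, with no change whatsoever of the underlying dependence. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

# Bivariate Gaussian with fixed unconditional correlation rho = 0.8:
# the correlation conditioned on {Y > v} varies with the threshold v
# even though the dependence structure never changes.
rho, T = 0.8, 500_000
y = rng.standard_normal(T)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(T)

def cond_corr(v):
    """rho_v^+ = Corr[X, Y | Y > v], estimated on the subsample."""
    m = y > v
    return np.corrcoef(x[m], y[m])[0, 1]

for v in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"v = {v:.1f}:  rho_v^+ = {cond_corr(v):.3f}")
```

The estimated ρ_v^+ decays with v and always lies below the unconditional ρ, illustrating why a threshold-dependent correlation must not, by itself, be read as a change of dependence.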
In Sect. 6.3, to account for several deficiencies of the correlation coefficient,
we study an alternative measure of dependence, the conditional rank corre-
lation (Spearman’s rho) which, in its unconditional form, is related to the
probability of concordance and discordance of several events drawn from the
same probability distribution, as recalled in Chap. 4. This measure provides
an important improvement with respect to the correlation coefficient since it
only takes into account the dependence structure of the variable and is not
sensitive to the marginal behavior of each variable. Numerical computations
allow us to derive the behavior of the conditional Spearman’s rho, denoted
by ρ_s(v). This allows us to prove that there is no direct relation between the
Spearman's rho conditioned on large values and the correlation coefficient
conditioned on the same values. Therefore, each of these coefficients quanti-
fies a different kind of extreme dependence. Then, calibrating the models on
the Latin American market data confirms that the conditioning effect cannot
fully explain the observed dependence and that contagion can therefore be in-
voked. These results are much clearer for the conditional Spearman's rho than
for the conditional (linear) correlation coefficient, due to the greater impact of
large statistical fluctuations in the latter.
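A rank-based analogue of the previous computation can be sketched as follows: the conditional Spearman's rho is here estimated on the pairs whose first coordinate exceeds the quantile level v (the precise conditioning scheme used in Chap. 6 may differ; this is only an illustration of the idea):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Conditional Spearman's rho: rank correlation on the pairs whose first
# coordinate exceeds the quantile level v. Being rank-based, it depends
# only on the copula, not on the marginal distributions.
def conditional_spearman(x, y, v):
    u = stats.rankdata(x) / (len(x) + 1.0)    # probability-integral ranks
    m = u > v
    return stats.spearmanr(x[m], y[m])[0]

rho, T = 0.5, 200_000
yv = rng.standard_normal(T)
xv = rho * yv + np.sqrt(1 - rho**2) * rng.standard_normal(T)
for v in (0.0, 0.5, 0.9, 0.99):
    print(f"v = {v}:  rho_s(v) = {conditional_spearman(xv, yv, v):.3f}")
```

Because the marginals drop out, ρ_s(v) computed on raw returns and on any monotonically transformed version of them coincide, which is precisely the robustness advantage over the conditional linear correlation.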
Section 6.4 discusses the tail-dependence parameters λ and λ̄, introduced
in Chap. 4. Applying the procedure of [390], we estimate nonparametrically
the tail dependence coefficients. We find them significant and thus conclude
that, with or without contagion mechanism, extreme co-movements must nat-
urally occur on the various Latin American markets as soon as one of them
undergoes a crisis.
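Simple empirical proxies for λ and λ̄ can be computed from the joint exceedance frequency. The sketch below uses λ(u) = P[U > u, V > u]/(1 − u), whose limit as u → 1 is the upper tail dependence coefficient, together with the companion quantity λ̄(u) = 2 ln(1 − u)/ln P[U > u, V > u] − 1, which tends to 1 exactly when λ > 0; this is only a rough finite-sample illustration, the nonparametric estimator of [390] being more elaborate:

```python
import numpy as np

rng = np.random.default_rng(6)

def tail_measures(x, y, u):
    """Empirical lambda(u) and lambda_bar(u) at quantile level u."""
    px = x > np.quantile(x, u)
    py = y > np.quantile(y, u)
    pjoint = np.mean(px & py)                 # P[U > u, V > u]
    lam = pjoint / (1.0 - u)
    lam_bar = 2.0 * np.log(1.0 - u) / np.log(pjoint) - 1.0
    return lam, lam_bar

# Bivariate Student (nu = 3, rho = 0.5): genuinely tail dependent.
T, rho, nu = 500_000, 0.5, 3
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=T)
w = np.sqrt(rng.chisquare(nu, size=T) / nu)
x, y = (z / w[:, None]).T
for u in (0.95, 0.99, 0.995):
    lam, lam_bar = tail_measures(x, y, u)
    print(f"u = {u}: lambda(u) = {lam:.3f}  lambda_bar(u) = {lam_bar:.3f}")
```

For a tail-dependent model such as this one, λ(u) stabilizes at a positive value as u → 1, while for a Gaussian copula it would slowly drift toward zero.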
Section 6.5 provides a comparison between these different results and a
synthesis. A first important message is that there is no unique measure of
extreme dependence. Each of the coefficients of extreme dependence that we
have presented provides a specific quantification that is sensitive to a certain
combination of the marginals and of the copula of the two random variables.
Similarly to risks whose adequate characterization requires an extension be-
yond the restricted one-dimensional measure in terms of the variance (volatil-
ity) to include the knowledge of the full distribution, tail-dependence has
also a multidimensional character. A second important message is that the
increase of some of the conditional coefficients of extreme dependence when
weighting more and more the extreme tail range does not necessarily signal
a genuine increase of the unconditional correlation or dependence between
the two variables. The calculations presented here firmly confirm that this
increase is a general and unavoidable result of the statistical properties of
many multivariate models of dependence. From the standpoint of the con-
tagion across Latin American markets, the theoretical and empirical results
suggest an asymmetric contagion phenomenon from Chile and Mexico towards
Argentina and Brazil: large moves of the Chilean and Mexican markets tend
to propagate to Argentina and Brazil through contagion mechanisms, i.e.,

with a change in the dependence structure, while the converse does not hold.
As a consequence, this seems to prove that the 1994 Mexican crisis had spread
over to Argentina and Brazil through contagion mechanisms and to Chile only
through co-movements. Concerning the more recent Argentinean crisis start-
ing in 2001, no evidence of contagion to the other Latin American countries
is found (except perhaps in the direction of Brazil) and only co-movements
can be identified.

6.1 Motivations

6.1.1 Suggestive Historical Examples

The 19 October, 1987, stock-market crash stunned Wall Street professionals,


hacked about $1 trillion off the value of all U.S. stocks, and elicited predictions
of another Great Depression. On “Black Monday,” the Dow Jones industrial
average plummeted 508 points, or 22.6 percent, to 1,738.74. Contrary to com-
mon belief, the US was not the first to decline sharply. Non-Japanese Asian
markets began a severe decline on 19 October, 1987, their time, and this
decline was echoed first on a number of European markets, then in North
American, and finally in Japan. However, most of the same markets had ex-
perienced significant but less severe declines in the latter part of the previous
week. With the exception of the US and Canada, other markets continued
downward through the end of October, and some of these declines were as
large as the great crash on 19 October.
On 19 December, 1994, the Mexican government, facing a solvency crisis,
chose to devaluate the peso and abandoned its exchange rate parity with
the dollar. This devaluation plunged the country into a major financial crisis
which quickly propagated to the rest of the Latin American countries.
From July 1997 to December 1997, several East Asian markets crashed,
starting with the Thai market on 2 July, 1997 and ending with the Hong Kong
market on 17 October, 1997. After this regional event, the turmoil spread over
to the American and European markets.
The “slow” crash and in particular the turbulent behavior of the stock
markets worldwide starting mid-August 1998 are widely associated with and
even attributed to the plunge of the Russian financial markets, the devaluation
of its currency and the default of the government on its debt obligations.
The Nasdaq Composite index dropped precipitously with a low of 3227 on
17 April, 2000, corresponding to a cumulative loss of 37% counted from its
all-time high of 5133 reached on 10 March, 2000. The drop was mostly driven
by the so-called “New Economy” stocks which have risen nearly four-fold over
1998 and 1999 compared to a gain of only 50% for the Standard & Poor’s 500
index. And without technology, this benchmark would be flat.
All these events epitomize the observation often reported by market profes-
sionals that, “during major market events, correlations change dramatically”

[71], as mentioned above. The possible existence of changes of correlation, or


more precisely of changes of dependence, between assets and between markets
in different market phases has obvious implications in risk assessment, port-
folio management and in the way policy and regulation should be formulated.
Concerning portfolio management, these questions related to state-varying-
dependence are important for practical applications since in such a case the
optimal portfolio will also become state-dependent. Neglecting this effect can
lead to very inefficient asset allocations [14, 15]. In this spirit, the Argentinean
crisis in 2001 has triggered fears of a contagion to other Latin American mar-
kets. Also, the Enron financial scandal at the end of 2001 seems to have opened
a flux of similar bankruptcies in other “new economy” companies.

6.1.2 Review of Different Perspectives

In the academic world, all these manifestations of propagating crises have


given birth to an intense activity concerning the notion of contagion (see
[102] for a review). According to the most commonly accepted definition, con-
tagion is characterized by as an increase in the correlation (or, more generally,
dependence) across markets during periods of turmoil. In fact, as we shall see,
there are two distinct classes of mechanisms for understanding “changes of
correlations,” not necessarily mutually exclusive.
• It is possible that there are genuine changes with time of the uncondi-
tional (with respect to amplitudes) correlations and thus of the underly-
ing structure of the dynamical processes, as observed by identifying shifts
in ARMA-ARCH/GARCH processes [440], in regime-switching models
[14, 15] or in contagion models [395, 396]. Many workers (see for instance
[314, 477]) have shown that the hypothesis of a constant conditional cor-
relation for stock returns or international equity returns must be rejected.
In fact, there is strong evidence that the correlations are not only time-
dependent but also state-dependent. Indeed, as shown in [271, 397], the
correlations increase in periods of large volatility. Moreover, Longin and
Solnik [315] have proved that the correlations across international equity
markets are also trend-dependent.
• A second class of explanation is that correlations between two variables
conditioned on signed exceedance (one-sided) or on absolute value (volatil-
ity) exceedance of one or both variables may deviate significantly from the
unconditional correlation [78, 316, 317]. In other words, with a fixed un-
conditional correlation ρ, the measured correlation conditional on a given
bullish trend, bearish trend, high or low market volatility, may in general
differ from ρ and can be viewed as a function of the specific market phase.
According to this explanation, changes of correlation may be only a fal-
lacious appearance that stems from a change of volatility or a change of
trend of the market and not from a real change of unconditional correlation
or dependence.
The existence of the second class of explanation is appealing by its parsi-
mony, as it posits that observed “changes of correlation” may simply result
from the way the measure of dependence is performed. This approach has
been followed by several authors but is often open to misinterpretation, as
stressed in [178]. In addition, it may also be misleading since it does not
provide a signature or procedure for identifying the existence of a genuine
contagion phenomenon, if any. Therefore, in order to clarify the situation and
eventually develop more adequate tools for probing the dependences between
assets and between markets, it is highly desirable to characterize the different
possible ways with which higher or lower conditional dependence can occur in
models with constant unconditional dependence. In order to make progress, it
is necessary to first distinguish between the different measures of dependence
between two variables for large or extreme events that have been introduced
in the literature. This is because the conclusions that one can draw about
the variability of dependence are sensitive to the choice of its measure. These
measures include the following.
1. The correlation conditioned on signed exceedance of one or both variables
[101, 78, 316, 317], which we call respectively ρ_v^+ and ρ_u, where u and v
denote the thresholds above which the exceedances are calculated.
2. The correlation conditioned on absolute value exceedance (or large volatil-
ity), above the threshold v, of one or both variables [101, 78, 316, 317],
which we call ρ_v^s (for a condition of exceedance on one variable).
3. The local correlation (whose definition is given in Sect. 4.1.2), which is
immune to the biases associated with the two aforementioned conditional
correlation coefficients. Bradley and Taqqu have used it to introduce a
new diagnostic of contagion: contagion from market X to market Y is
qualified if there is more dependence between X and Y when X is doing
badly than when X exhibits typical performance, that is, if there is more
dependence at the loss tail distribution of X than at its center [80, 81, 82].
4. The tail-dependence parameter λ, which has a simple analytical expression
when using copulas [149, 147] such as the Gumbel copula [315], and whose
estimation provides useful information about the occurrence of extreme
co-movements [260, 334, 390].
5. The spectral measure associated with the tail index (assumed to be the
same for all assets) of extreme value multivariate distributions [43, 224,
462].
6. Tail indices of extremal correlations defined as the upper or lower corre-
lation of exceedances of ordered log-values [395].
7. Confidence weighted forecast correlations [53] or algorithmic complexity
measures [342].
The contribution of this chapter is both methodological and empirical. On
the methodological front, first of all, we review the existing tools available
for probing the dependence between large or extreme events for several mod-
els of interest for financial time series; second, we provide explicit analytical
expressions for these measures of dependence between two variables; third,
this allows us to quantify the misleading interpretations of certain conditional
coefficients commonly used for exploring the evolution of the dependence as-
sociated with a change in the market conditions (an increase of the volatility,
for instance). On the empirical front, the theoretical results are applied to the
controversial problem of the occurrence or absence of a contagion phenomenon
across Latin American markets during the turmoil period associated with the
Mexican crisis in 1994 or with the recent Argentinean crisis in 2001. For this
purpose, the novel insight derived from the analysis of several measures of
dependence is applied to the question of a possible evolution of the depen-
dence between the Argentinean, Brazilian, Chilean and Mexican markets with
respect to the market conditions.
The dependence measures discussed below are the conditional correlation
coefficients ρ_v^+, ρ_v^s, ρ_u, the conditional Spearman’s rho ρ_s(v) and the tail de-
pendence coefficients λ and λ̄, whose properties have been summarized in
Chap. 4, for several models among which are the bivariate Gaussian distribu-
tion, the bivariate Student’s distribution, and the one-factor model for various
distributions of the factor. A priori, one could hope for the existence of logical
links between some of these measures, such as a vanishing tail-dependence
parameter λ implying vanishing asymptotic conditional correlation coefficients.
In fact, this turns out to be wrong and one can construct simple examples
for which all possible combinations occur. Therefore, each of these measures
probes a different quality of the dependence between two variables for large or
extreme events. In addition, even if the conditional correlation coefficients are
asymptotically zero, they decay in general extremely slowly, as inverse powers
of the value of the threshold, and may thus remain significant for most practi-
cal applications. Somewhat similarly to risks, whose adequate characterization
requires an extension beyond the restricted one-dimensional measure in terms
of the variance to include all higher-order cumulants or, more generally, the
knowledge of the full distribution [453, 6], these results suggest that large
and/or extreme dependences also have a multidimensional character.

6.2 Conditional Correlation Coefficient


In this section, we discuss the properties of the correlation coefficient con-
ditioned on one variable. We study the difference between conditioning on
the signed values or on absolute values of the variable (conditioning on the
absolute value of the variable of interest is only meaningful when its distrib-
ution is symmetric). This allows us to conclude that conditioning on signed
values generally provides more information than conditioning on absolute val-
ues. Moreover, as already underlined for instance by Boyer et al. [78], the
conditional correlation coefficient is shown to suffer from a bias which forbids
its use as a measure of change in the correlation between two assets when
the volatility increases (many papers on contagion unfortunately use the con-
ditional correlation coefficient as a probe to detect changes of dependence).
We then present an empirical illustration of the evolution of the correlation
between several stock indexes of Latin American markets.

6.2.1 Definition

Let us consider the correlation coefficient ρA of two real random variables X
and Y conditioned on Y ∈ A, where A is a subset of R such that Pr{Y ∈
A} > 0. By definition, the conditional correlation coefficient ρA is given by

ρ_A = Cov(X, Y | Y ∈ A) / √[Var(X | Y ∈ A) · Var(Y | Y ∈ A)] .   (6.1)

This general expression of the conditional correlation coefficient can be trans-
formed into a closed formula for several standard distributions and models. This
will allow us to investigate the influence of the conditioning set and the un-
derlying model on the behavior of ρ_A.

6.2.2 Influence of the Conditioning Set

Let the variables X and Y have a multivariate Gaussian distribution with
(unconditional) correlation coefficient ρ. The following result has been proved
[78]:
ρ_A = ρ / √[ρ² + (1 − ρ²) Var(Y) / Var(Y | Y ∈ A)] .   (6.2)

Note that ρ and ρA have the same sign, that ρA = 0 if and only if ρ = 0
and that ρA does not depend directly on Var(X). Note also that ρA can be
either greater or smaller than ρ since Var(Y | Y ∈ A) can be either greater
or smaller than Var(Y ). Let us illustrate this property in the two following
examples, with a conditioning on large positive (or negative) returns and a
conditioning on large volatility. The difference comes from the fact that in the
first case, one accounts for the trend while one neglects this information in
the second case.
These two simple examples will show that, in the case of two Gaussian ran-
dom variables, the two conditional correlation coefficients ρ_v^+ and ρ_v^s exhibit
opposite behaviors: the conditional correlation coefficient ρ_v^+ is a decreas-
ing function of the conditioning threshold v (and goes to zero as v → +∞),
while the conditional correlation coefficient ρ_v^s is an increasing function of v
and goes to one as v → ∞. These opposite behaviors seem very general and
do not depend on the particular choice of the joint distribution of X and Y,
namely the Gaussian distribution studied until now, as will be seen in the
sequel.
This result underlines the importance of the choice of the conditioning
set, with the following two caveats that we stress again. First, as already
stressed by many authors, the conditional correlations ρ_v^+ and ρ_v^s change with
the value of the threshold v even if the unconditional correlation ρ remains
unchanged. Thus, the observation of a change in the conditional correlation
does not provide a reliable signature of a change in the true (unconditional)
correlation. Second, the conditional correlations can exhibit really opposite
behaviors depending on the conditioning sets. Specifically, accounting for a
signed trend or only for its amplitude may yield a decrease or an increase
of the conditional correlation with respect to the unconditional one, so that
these changes cannot be interpreted as a strengthening or a weakening of the
correlations.
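These opposite behaviors are easy to reproduce numerically. The following Monte Carlo sketch (ours, not from the book; pure standard-library Python, with arbitrary parameter values) simulates a bivariate Gaussian pair with unconditional correlation ρ = 0.8 and estimates both conditional coefficients at increasing thresholds:

```python
import math
import random

# Monte Carlo sketch (not from the book): bivariate Gaussian pair with
# unconditional correlation rho; conditional correlations estimated at
# thresholds v = 0, 1, 2.
random.seed(42)
rho, n = 0.8, 400_000

pairs = []
for _ in range(n):
    y = random.gauss(0.0, 1.0)
    x = rho * y + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0)
    pairs.append((x, y))

def corr(sample):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    m = len(sample)
    mx = sum(x for x, _ in sample) / m
    my = sum(y for _, y in sample) / m
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    syy = sum((y - my) ** 2 for _, y in sample)
    return sxy / math.sqrt(sxx * syy)

# rho_v^+ conditions on Y > v; rho_v^s conditions on |Y| > v.
rho_plus = [corr([p for p in pairs if p[1] > v]) for v in (0.0, 1.0, 2.0)]
rho_s = [corr([p for p in pairs if abs(p[1]) > v]) for v in (0.0, 1.0, 2.0)]
print(rho_plus)  # decreasing with v
print(rho_s)     # increasing with v, above the unconditional rho
```

With these parameters, ρ_v^+ falls from about 0.63 at v = 0 toward about 0.41 at v = 2, while ρ_v^s rises above the unconditional value 0.8, in line with the behaviors derived below.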

Example 1: Conditioning on Large (Positive) Returns

Let us first consider the conditioning set A = [v, +∞), with v ∈ R+. Thus
ρ_A is the correlation coefficient conditioned on the returns Y larger than a
given positive threshold v. It will be denoted by ρ_v^+ in the sequel. Assuming for
simplicity, but without loss of generality, that Var(Y) = 1, an exact calculation
given below shows that, for large v,

ρ_v^+ ∼_{v→∞} (ρ / √(1 − ρ²)) · (1/|v|) ,   (6.3)

which slowly goes to zero as v goes to infinity. Obviously, by symmetry, the
conditional correlation coefficient ρ_v^−, conditioned on Y smaller than v, obeys
the same formula.

Proof. We start with the calculation of the first and the second moments of
Y conditioned on Y larger than v:

E(Y | Y > v) = √2 / [√π e^{v²/2} erfc(v/√2)] = v + 1/v − 2/v³ + O(1/v⁵) ,   (6.4)

E(Y² | Y > v) = 1 + √2 v / [√π e^{v²/2} erfc(v/√2)] = v² + 2 − 2/v² + O(1/v⁴) ,   (6.5)

which allow us to obtain the variance of Y conditioned on Y larger than v:

Var(Y | Y > v) = 1 + √2 v / [√π e^{v²/2} erfc(v/√2)] − {√2 / [√π e^{v²/2} erfc(v/√2)]}²
             = 1/v² + O(1/v⁴) ,   (6.6)

which together with (6.2) yields (6.3) for large v.
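The closed forms and expansions above are easy to check numerically, since erfc is available in Python's standard library. A small sketch (ours, not the book's):

```python
import math

def m1(v):
    """E(Y | Y > v) = sqrt(2) / (sqrt(pi) * exp(v^2/2) * erfc(v/sqrt(2)))."""
    return math.sqrt(2.0) / (math.sqrt(math.pi) * math.exp(0.5 * v * v)
                             * math.erfc(v / math.sqrt(2.0)))

for v in (3.0, 5.0, 10.0):
    exact = m1(v)
    approx = v + 1.0 / v - 2.0 / v ** 3     # right-hand expansion of (6.4)
    # E(Y^2 | Y > v) = 1 + v * m1(v) for a standard Gaussian, so:
    var = 1.0 + v * m1(v) - m1(v) ** 2      # Var(Y | Y > v), cf. (6.6)
    print(v, exact - approx, var * v * v)   # error shrinks; v^2 * Var tends slowly to 1
```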

Example 2: Conditioning on Large Volatilities

Let now the conditioning set be A = (−∞, −v] ∪ [v, +∞), with v ∈ R+.
Thus ρ_A is the correlation coefficient conditioned on |Y| larger than v, i.e.,
it is conditioned on a large volatility of Y. Still assuming Var(Y) = 1, this
correlation coefficient is denoted by ρ_v^s and, for large v,

ρ_v^s ∼_{v→∞} ρ / √[ρ² + (1 − ρ²)/(2 + v²)] ∼_{v→∞} sgn(ρ) · [1 − (1 − ρ²)/(2ρ²v²)] ,   (6.7)

which goes to (plus or minus) 1 as v goes to infinity according to
1 − |ρ_v^s| ∼_{v→∞} ((1 − ρ²)/(2ρ²)) v⁻².

Proof. The correlation coefficient conditioned on |Y| larger than v can be
written

ρ_v^s = ρ / √[ρ² + (1 − ρ²)/Var(Y | |Y| > v)] .   (6.8)

The first and second moments of Y conditioned on |Y| larger than v can be
easily calculated:

E(Y | |Y| > v) = 0 ,   (6.9)

E(Y² | |Y| > v) = 1 + √2 v / [√π e^{v²/2} erfc(v/√2)] = v² + 2 − 2/v² + O(1/v⁴) .   (6.10)

Expression (6.10) is the same as (6.5), as it should be by symmetry. This gives the
following conditional variance:

Var(Y | |Y| > v) = 1 + √2 v / [√π e^{v²/2} erfc(v/√2)] = v² + 2 + O(1/v²) ,   (6.11)

and finally yields (6.7) for large v.
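A companion check (ours, not the book's) compares the exact volatility-conditioned coefficient obtained from (6.8) and (6.11) with the asymptote in (6.7):

```python
import math

rho = 0.6  # any nonzero unconditional correlation will do

def var_abs(v):
    """Var(Y | |Y| > v) = 1 + sqrt(2)*v / (sqrt(pi)*exp(v^2/2)*erfc(v/sqrt(2)))."""
    return 1.0 + math.sqrt(2.0) * v / (math.sqrt(math.pi)
            * math.exp(0.5 * v * v) * math.erfc(v / math.sqrt(2.0)))

def rho_s_exact(v):
    # Equation (6.8) with the exact conditional variance (6.11).
    return rho / math.sqrt(rho ** 2 + (1.0 - rho ** 2) / var_abs(v))

def rho_s_asympt(v):
    # Large-v asymptote from (6.7).
    return math.copysign(1.0, rho) * (1.0 - (1.0 - rho ** 2)
                                      / (2.0 * rho ** 2 * v ** 2))

for v in (2.0, 4.0, 8.0):
    print(v, rho_s_exact(v), rho_s_asympt(v))  # both tend to 1
```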

Intuitive Meaning

Let us provide an intuitive explanation (see also [315]). As seen from (6.2), ρ_v^+
is controlled by Var(Y | Y > v) ∝ 1/v² derived in Example 1. In contrast,
as seen from (6.8), ρ_v^s is controlled by Var(Y | |Y| > v) ∝ v² given in
Example 2. The difference between ρ_v^+ and ρ_v^s can thus be traced back to that
between Var(Y | Y > v) ∝ 1/v² and Var(Y | |Y| > v) ∝ v² for large v.

This results from the following effect. For Y > v, one can picture the
possible realizations of Y as those of a random particle on the line, which is
strongly attracted to the origin by a spring (the Gaussian distribution that
prevents Y from performing significant fluctuations beyond a few standard
deviations) while being forced to stay to the right of a wall at Y = v. It is
clear that the fluctuations of the position of this particle are very small as
it is strongly glued to the impenetrable wall by the restoring spring, hence
the result Var(Y | Y > v) ∝ 1/v². In contrast, for the condition |Y| > v, by
the same argument, the fluctuations of the particle are constrained to be very
close to |Y| = v, i.e., very close to Y = +v or Y = −v. Thus, the fluctuations
of Y typically flip from −v to +v and vice versa. It is thus not surprising to
find Var(Y | |Y| > v) ∝ v².
This argument makes intuitive the results Var(Y | Y > v) ∝ 1/v² and
Var(Y | |Y| > v) ∝ v² for large v, and thus the results for ρ_v^+ and for ρ_v^s
if we use (6.2) and (6.8). We now attempt to justify ρ_v^+ ∼_{v→∞} 1/v and
1 − ρ_v^s ∼_{v→∞} 1/v² directly by the following intuitive argument. Using the picture
of particles, X and Y can be visualized as the positions of two particles which
fluctuate randomly. Their joint bivariate Gaussian distribution with nonzero
unconditional correlation amounts to the existence of a spring that ties them
together. Their Gaussian marginals also exert a spring-like force attaching
them to the origin. When Y > v, the X-particle is torn between two
extremes, 0 and v. When the unconditional correlation ρ is less than
1, the spring attracting it to the origin is stronger than the spring attracting
it to the wall at v. The particle X thus undergoes tiny fluctuations around the
origin that are relatively less and less attracted by the Y-particle, hence the
result ρ_v^+ ∼_{v→∞} 1/v → 0. In contrast, for |Y| > v, notwithstanding the still
strong attraction of the X-particle to the origin, it can follow the sign of
the Y-particle without paying too much cost in matching its amplitude |v|.
Relatively tiny fluctuations of the X-particle, of the same sign as Y ≈ ±v,
will result in a strong ρ_v^s, thus justifying that ρ_v^s → 1 for v → +∞.

6.2.3 Influence of the Underlying Distribution for a Given Conditioning Set

For a fixed conditioning set defining a specific conditional correlation coef-
ficient like ρ_v^+ or ρ_v^s, the behavior of these coefficients can be dramatically
different from one pair of random variables to another, depending on their
underlying joint distribution. As an example, let the variables X and Y have
a multivariate Student's distribution with ν degrees of freedom and an (un-
conditional) correlation coefficient ρ. According to the proposition stated in
Appendix 6.B.1, we have the exact formula

ρ_A = ρ / √[ρ² + E[E(X² | Y) − ρ²Y² | Y ∈ A] / Var(Y | Y ∈ A)] .   (6.12)

Explicit formulas for E[E(X² | Y) − ρ²Y² | Y ∈ A] and Var(Y | Y ∈ A) are
also given in Appendix 6.B.1. The proof of (6.12) is presented in Appendix
6.B.2.

Expression (6.12) is the analog for a Student bivariate distribution of (6.2)
derived above for the Gaussian bivariate distribution. Again, ρ and ρA share
the following properties: they have the same sign, ρA equals zero if and only
if ρ equals zero and ρA can be either greater or smaller than ρ. Applying this
general formula (6.12) to the computation of ρ_v^+ and ρ_v^s, we find (see Appendices
6.B.3 and 6.B.4) that, conditioning on large returns,

ρ_v^+ −→_{v→+∞} ρ / √[ρ² + (ν − 1)² ((ν − 2)/ν) (1 − ρ²)] ,   (6.13)

while, when conditioning on large volatility,

ρ_v^s −→_{v→+∞} ρ / √[ρ² + (1/(ν − 1)) ((ν − 2)/ν) (1 − ρ²)] .   (6.14)

Both ρ_v^+ and ρ_v^s converge, at infinity, to nonvanishing constants (except for
ρ = 0). Moreover, for ν larger than ν_c ≈ 2.839, this constant is smaller than
the unconditional correlation coefficient ρ, for all values of ρ, in the case of ρ_v^+,
while for ρ_v^s it is always larger than ρ, whatever ν (larger than two) may be.
These results show that, conditioned on large returns, ρ_v^+ is a decreasing
function of the threshold v (at least when ν ≥ 2.839), while, conditioned on
large volatilities, ρ_v^s is an increasing function of v.
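The critical value ν_c can be recovered numerically: the asymptotic ρ_v^+ lies below ρ for every ρ exactly when the factor multiplying (1 − ρ²) in the denominator of (6.13) exceeds one. A short root-finding sketch (ours, written under the assumption that this factor reads (ν − 1)²(ν − 2)/ν, as quoted in (6.13)):

```python
def lam_plus(nu):
    """Factor multiplying (1 - rho^2) in the denominator of (6.13) (assumed)."""
    return (nu - 1.0) ** 2 * (nu - 2.0) / nu

# The limit of rho_v^+ is rho / sqrt(rho^2 + lam_plus(nu) * (1 - rho^2)), which
# lies below rho for every rho precisely when lam_plus(nu) > 1.  On [2, 5]
# lam_plus grows from 0, so we can bisect for lam_plus(nu) = 1:
lo, hi = 2.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if lam_plus(mid) < 1.0:
        lo = mid
    else:
        hi = mid
nu_c = 0.5 * (lo + hi)
print(round(nu_c, 3))  # close to 2.839
```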
To give another example, let us now assume that X and Y are two random
variables following the equation:

X = βY + ε ,   (6.15)

where β is a nonrandom real coefficient and ε an idiosyncratic noise indepen-
dent of Y, whose distribution admits a centered moment of second order σ_ε².
Let us also denote by σ_y² the second centered moment of the variable Y. This
relation between X and Y corresponds to the so-called one factor model. This
one factor model with independence between Y and  is of course naive for
concrete applications, as it neglects the potential influence of other factors in
the determination of X. However, it has been argued to be a useful model in
the context of contagion, and several studies have been based upon it (see [29]
or [178], for instance). Moreover, it provides a simple illustrative model with
rich and somewhat surprising results.
One can straightforwardly show that the conditional correlation coefficient
of X and Y is
ρ_A = ρ / √[ρ² + (1 − ρ²) Var(Y) / Var(Y | Y ∈ A)] ,   (6.16)

where

ρ = β σ_y / √(β² σ_y² + σ_ε²)   (6.17)

denotes the unconditional correlation coefficient of X and Y. Note that the
term σ_ε² in the expression (6.17) of ρ is the only place where the influence of
the idiosyncratic noise is felt.
Expression (6.16) is the same as (6.2) for the bivariate Gaussian situation
studied in Sect. 6.2.2. This is not surprising since, in the case where Y and ε
have univariate Gaussian distributions, the joint distribution of X and Y is
a bivariate Gaussian distribution. The new fact is that this expression (6.16)
remains true whatever the distributions of Y and ε, provided that their second
moments exist.
We now present the asymptotic expression of ρA for Y with a Gaussian
or a Student’s distribution. Note that the expression of ρA is simple enough
to allow for exact calculations for a larger class of distributions, but for illus-
tration, these two simple cases will be sufficient.
Assuming that Y has a Gaussian distribution, while the distribution of ε
can be anything (provided that E[ε²] < ∞), allows us to show that the same
results as those given by (6.3) and (6.7) still hold, so that ρ_v^+ goes to zero,
while ρ_v^s goes to one.
In contrast, assuming that Y has a Student's distribution yields, for both
ρ_v^+ and ρ_v^s,

ρ_v^{+,s} ∼ sgn(β) / √(1 + K/v²) ,   (6.18)

where K is a positive constant. ρ_v^{+,s} thus goes to ±1 as v goes to infinity,
with 1 − |ρ_v^{+,s}| ∝ 1/v², which shows that they can have similar behaviors.
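The distribution-free character of (6.16) can be illustrated by a quick Monte Carlo sketch (ours; the parameter values are arbitrary): the factor Y is taken uniform and the idiosyncratic noise ε Laplace, neither of them Gaussian, and the empirical conditional correlation is compared with the prediction of (6.16)–(6.17).

```python
import math
import random

random.seed(1)
beta, n = 1.5, 400_000
a = math.sqrt(3.0)          # Y uniform on [-a, a] has unit variance
b = 0.5                     # Laplace scale: Var(eps) = 2 b^2
rho = beta / math.sqrt(beta ** 2 + 2 * b * b)   # unconditional rho, eq. (6.17)

ys = [random.uniform(-a, a) for _ in range(n)]
xs = [beta * y + random.choice((-1.0, 1.0)) * random.expovariate(1.0 / b)
      for y in ys]          # Laplace noise built as a signed exponential

sel = [(x, y) for x, y in zip(xs, ys) if y > 1.0]   # conditioning set A
m = len(sel)
mx = sum(x for x, _ in sel) / m
my = sum(y for _, y in sel) / m
cov = sum((x - mx) * (y - my) for x, y in sel) / m
vx = sum((x - mx) ** 2 for x, _ in sel) / m
vy = sum((y - my) ** 2 for _, y in sel) / m
emp = cov / math.sqrt(vx * vy)                      # empirical rho_A

pred = rho / math.sqrt(rho ** 2 + (1 - rho ** 2) * 1.0 / vy)  # eq. (6.16)
print(emp, pred)  # agree up to sampling noise
```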

6.2.4 Conditional Correlation Coefficient on Both Variables

Since the exploration of the behavior of the correlation coefficient conditioned
on only one variable clearly indicates that it can exhibit any kind of behavior,
it is natural to look for the effect of a more constraining conditioning. To
this aim, let us consider two random variables X and Y and define their
conditional correlation coefficient ρA,B , conditioned upon X ∈ A and Y ∈ B,
where A and B are two subsets of R such that Pr{X ∈ A, Y ∈ B} > 0, by

ρ_{A,B} = Cov(X, Y | X ∈ A, Y ∈ B) / √[Var(X | X ∈ A, Y ∈ B) · Var(Y | X ∈ A, Y ∈ B)] .   (6.19)

In this case, it is much more difficult to obtain general results for any
specified class of distributions compared with the previous case of conditioning
on a single variable. Here, we give the asymptotic behavior for a Gaussian
distribution in the situation detailed below, using the expressions in [252,
page 113], or proposition A.1 of [15].
Let us assume that the pair of random variables (X,Y) has a Normal
distribution with unit unconditional variance and unconditional correlation
coefficient ρ. The subsets A and B are both chosen equal to [u, +∞), with
u ∈ R+ , so that we focus on the correlation coefficient conditional on the
returns of both X and Y larger than the threshold u. Denoting by ρu the
correlation coefficient conditional on this particular choice for the subsets A
and B, Appendix 6.A shows that, for large u,
ρ_u ∼_{u→∞} ρ ((1 + ρ)/(1 − ρ)) · (1/u²) ,   (6.20)

which goes to zero. This decay is faster than the decay ρ_v^+ ∼_{v→+∞} 1/v given
by (6.3) resulting from the conditioning on a single variable. However, unfortunately,
there is no qualitative change. Thus, the correlation coefficient conditioned
on both variables does not yield new significant information and does not
provide any special improvement with respect to the correlation coefficient
conditioned on a single variable.
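A simulation sketch (ours, with arbitrary parameter values) of the double conditioning for a bivariate Gaussian pair with ρ = 0.5; both conditional estimates fall below the unconditional correlation:

```python
import math
import random

random.seed(7)
rho, n, u = 0.5, 600_000, 1.0
pairs = []
for _ in range(n):
    y = random.gauss(0.0, 1.0)
    x = rho * y + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0)
    pairs.append((x, y))

def corr(sample):
    """Pearson correlation of a list of (x, y) pairs."""
    m = len(sample)
    mx = sum(x for x, _ in sample) / m
    my = sum(y for _, y in sample) / m
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    syy = sum((y - my) ** 2 for _, y in sample)
    return sxy / math.sqrt(sxx * syy)

rho_both = corr([p for p in pairs if p[0] > u and p[1] > u])  # rho_u
rho_one = corr([p for p in pairs if p[1] > u])                # rho_v^+
print(rho_both, rho_one)  # both lie below the unconditional rho = 0.5
```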

6.2.5 An Example of Empirical Implementation

Let us consider four national stock markets in Latin America, namely Ar-
gentina (MERVAL index), Brazil (IBOV index), Chile (IPSA index) and Mex-
ico (MEXBOL index). We are particularly interested in the contagion effects
which may have occurred across these markets. We will study this question
for the market indexes expressed in US Dollar to emphasize the effect of the
devaluations of local currencies and to account for monetary crises. Doing so,
we follow the same methodology as in most contagion studies (see [178], for
instance). Our sample contains the daily (log) returns of each stock in local
currency and US dollar during the time interval from 15 January, 1992 to 15
June, 2002 and thus encompasses both the Mexican crisis as well as the more
recent Argentinean crisis.
Before applying the theoretical results derived above, we need to test
whether the distributions of the returns are not too fat-tailed so that the
correlation coefficient exists. Recall that this is the case if and only if the tail
of the distribution decays faster than a power law with tail index µ = 2, while
its estimator, the Pearson coefficient, is well behaved only if at least the
fourth moment of the distribution is finite.
Figure 6.1 shows the complementary distribution of the positive and neg-
ative tails of the index returns of four Latin American countries in US dollars.
The positive tail clearly decays faster than a power law with tail index µ = 2.
In fact, Hill’s estimator provides a value ranging between 3 and 4 for the four
indexes. The situation for the negative tail is slightly different, particularly for
the Brazilian index. For the Argentinean, the Chilean and the Mexican indexes,


Fig. 6.1. The upper (respectively lower) panel graphs the complementary distri-
bution of the positive (respectively minus the negative) returns in US dollar of the
indices of four countries (Argentina, Brazil, Chile and Mexico). The straight line
represents the slope of a power law with tail exponent µ = 2

the negative tail behaves almost like the positive one, but for the Brazilian
index, the negative tail exponent is hardly larger than two, as confirmed by
Hill’s estimator. This means that, in the Brazilian case, the estimates of the
correlation coefficient will be particularly noisy and thus of weak statistical
value.
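For reference, the Hill estimator quoted above admits a very compact implementation; the sketch below (ours, not from the book) checks it on a synthetic Pareto sample whose true tail index is µ = 2:

```python
import math
import random

random.seed(3)
n, mu = 50_000, 2.0
# Pareto sample with survival function x^(-mu) for x >= 1, by inversion.
sample = [random.random() ** (-1.0 / mu) for _ in range(n)]

def hill(data, k):
    """Hill estimate of the tail index from the k largest order statistics."""
    tail = sorted(data)[-k:]
    threshold = tail[0]
    return k / sum(math.log(x / threshold) for x in tail)

print(hill(sample, 1000))  # close to mu = 2
```

As usual with Hill's estimator, the result depends on the number k of order statistics retained; in practice one inspects its stability over a range of k.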
We have checked that the fat-tailness of the indexes expressed in US dollar
comes from the impact of the exchange rates. Thus, an alternative would be
to consider the indexes in local currency, following the methodology of [314]
and [315], but it would lead us to focus on the linkages between markets only and
to neglect the impact of the devaluations, which is precisely the main concern
of studies on contagion.
Figures 6.2, 6.3 and 6.4 give the conditional correlation coefficient ρ_v^{+,−}
(plain thick line) for the pairs (Argentina/Brazil), (Brazil/Chile) and (Chile/
Mexico) while the Figs. 6.5, 6.6 and 6.7 show the conditional correlation co-
efficient ρ_v^s for the same pairs. For each figure, the thick dashed line gives the
theoretical curve obtained under the bivariate Gaussian assumption whose
analytical expressions can be found in Sect. 6.2.2. The unconditional corre-


Fig. 6.2. In the upper panel, the thick plain curve depicts the correlation coeffi-
cient between the daily returns of the Argentinean and the Brazilian stock indices
conditional on the Brazilian stock index daily returns larger than (smaller than) a
given positive (negative) value v (after normalization by the standard deviation).
The thick dashed curve represents the theoretical conditional correlation coefficient
ρ_v^{+,−} calculated for a bivariate Gaussian model, while the two thin dashed curves
define the area within which we cannot consider at the 95% confidence level that
the estimated correlation coefficient is significantly different from its Gaussian theo-
retical value. The dotted curves provide the same information under the assumption
of a bivariate Student’s model with ν = 3 degrees of freedom. The lower panel is
the same as the upper panel but the conditioning is on the Argentinean stock index
daily returns larger than (smaller than) a given positive (negative) value v (after
normalization by the standard deviation)

lation coefficient of the Gaussian model is set to the empirically estimated
unconditional correlation coefficient. The two thin dashed lines represent the
interval within which we cannot reject, at the 95% confidence level, the hy-
pothesis according to which the estimated conditional correlation coefficient is
equal to the theoretical one. This confidence interval has been estimated using
Fisher's statistic. Similarly, the thick dotted curve graphs the theoreti-
cal conditional correlation coefficient obtained under the bivariate Student’s
assumption with ν = 3 degrees of freedom (whose expressions are given in


Fig. 6.3. Same as Fig. 6.2 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index

Appendices 6.B.3 and 6.B.4) and the two thin dotted lines are its 95% confi-
dence level. Here, Fisher's statistic cannot be applied, since it requires at
least that the fourth moment of the distribution exists. In fact, Meerschaert
and Scheffler have shown that, for ν = 3, the distribution of the sample cor-
relation converges to a stable law with index 3/2 [356]. This explains why
the confidence interval for the Student’s model with three degrees of free-
dom is much larger than the confidence interval for the Gaussian model. In
the present case, we have used a bootstrap method to derive this confidence
interval since the scale factor of the stable law is difficult to calculate.
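For completeness, the Gaussian confidence bands based on Fisher's variance-stabilizing transformation can be sketched as follows (our code, not the authors'):

```python
import math

def fisher_ci(r, n, z95=1.959964):
    """95% confidence interval for a correlation r estimated from n points,
    via z = atanh(r), approximately normal with variance 1/(n - 3)."""
    z = math.atanh(r)
    half = z95 / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = fisher_ci(0.5, 100)
print(round(lo, 3), round(hi, 3))  # roughly (0.337, 0.634)
```

Note that the interval is asymmetric around r, a consequence of the nonlinearity of the atanh transformation.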
In Figs. 6.2, 6.3 and 6.4, the changes in the conditional correlation coef-
ficients ρ_v^{+,−} are not significantly different, at the 95% confidence level, from
those obtained with a bivariate Student’s model with three degrees of free-
dom. In contrast, the Gaussian model is almost always rejected as expected,
since marginal returns distributions are not Gaussian (as shown by Fig. 6.1).
In fact, similar results hold (but are not depicted here) for the three oth-
ers pairs (Argentina/Chile), (Argentina/Mexico) and (Brazil/Mexico). Since
these results are compatible with a Student’s model with constant correlation,


Fig. 6.4. Same as Fig. 6.2 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index

this suggests that no change in the correlations, and therefore no contagion
mechanism, needs to be invoked to explain the data.
Let us now discuss the results obtained for the correlation coefficient con-
ditioned on the volatility. Figures 6.5 and 6.6 show that the estimated cor-
relation coefficients conditioned on volatility remain consistent with the Stu-
dent's model with three degrees of freedom, while they still reject the Gaussian
model. In contrast, Fig. 6.7 shows that the increase of the correlation cannot be
explained by either the Gaussian or the Student model when conditioning on the
Mexican index volatility. Indeed, when the Mexican index volatility becomes
Mexican index volatility. Indeed, when the Mexican index volatility becomes
larger than 2.5 times its standard deviation, none of these models can ac-
count for the increase of the correlation. The same discrepancy is observed for
the pairs (Argentina/Chile), (Argentina/Mexico) and (Brazil/Mexico) which
are not shown here. In each case, the Chilean and the Mexican markets have
an impact on the Argentinean and the Brazilian markets which cannot be
accounted for by neither the Gaussian model nor the Student model with
constant correlation.
To conclude this empirical part, there is no significant increase in the real
correlation between Argentina and Brazil on the one hand and between Chile


Fig. 6.5. In the upper panel, the thick plain curve gives the correlation coeffi-
cient between the daily returns of the Argentinean and the Brazilian stock indices
conditioned on the daily volatility of the Brazilian stock index being larger than
a given value v (after normalization by the standard deviation). The thick dashed
curve represents the theoretical conditional correlation coefficient ρ_v^s calculated
for a bivariate Gaussian model, while the two thin dashed curves delineate the area
within which we cannot consider at the 95% confidence level that the estimated
correlation coefficient is significantly different from its Gaussian theoretical value.
The dotted curves provide the same information using a bivariate Student’s model
with ν = 3 degrees of freedom. The lower panel is the same as the upper panel but
the conditioning is on the Argentinean stock index

and Mexico on the other hand, when the volatility or the returns exhibit large
moves. In contrast, in periods of high volatility, the Chilean and Mexican
markets seem to have a genuine impact on the Argentinean and Brazilian markets.
A priori, this should confirm the existence of a contagion across these markets.
However, this conclusion is based on only two theoretical models. One
should thus remain cautious before concluding positively on the existence of
contagion on the sole basis of these results, in particular in view of the use of
theoretical models which are all symmetric in their positive and negative tails.
Such a symmetry is crucial for the derivation of the theoretical expressions of
ρ_v^s. However, the empirical sample distributions are certainly not symmetric,
[Figure: Brazil−Chile, two panels showing the conditional correlation coefficient as a function of the volatility threshold v; see the caption of Fig. 6.6]

Fig. 6.6. Same as Fig. 6.5 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index

as shown in Fig. 6.1. Using univariate and bivariate switching volatility models,
Edwards and Susmel [142] have found strong volatility co-movements in
Latin America but no clear evidence of contagion.

6.2.6 Summary

The previous sections have shown that the conditional correlation coefficients
can exhibit all possible types of behavior, depending on their conditioning set
and the underlying distributions of returns. More precisely, we have shown
that the correlation coefficients, conditioned on large returns or volatility
above a threshold v, can be either increasing or decreasing functions of the
threshold, can go to any value between zero and one when the threshold goes
to infinity, and can produce contradictory results, in the sense that accounting
for a trend or not can lead to the conclusion of either an absence of linear
correlation or a perfect linear correlation. Moreover, due to the large statistical
fluctuations of the empirical estimates, one should be very careful before
concluding on an increase or decrease of the genuine correlations.
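One of these behaviors, the decay of the signed conditional correlation with the threshold even though the unconditional ρ is constant, can be illustrated with a minimal simulation (ours, not from the book; the sample size and the value ρ = 0.8 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 400_000, 0.8
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

def signed_conditional_corr(x, y, v):
    """Correlation of (x, y) on the subsample where y exceeds the threshold v."""
    m = y > v
    return np.corrcoef(x[m], y[m])[0, 1]

for v in (0.0, 1.0, 2.0):
    print(f"v = {v}: conditional correlation = {signed_conditional_corr(x, y, v):.3f}")
```

The conditional correlation is already below ρ at v = 0 and keeps decreasing as v grows, in agreement with the asymptotic 1/v decay listed for the Gaussian case in Table 6.3.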
Thus, from the general standpoint of the study of extreme dependences,
but more particularly for the specific problem of the contagion across coun-
tries, the use of conditional correlation does not seem very informative and
[Figure: Chile−Mexico, two panels showing the conditional correlation coefficient as a function of the volatility threshold v; see the caption of Fig. 6.7]

Fig. 6.7. Same as Fig. 6.5 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index

is sometimes misleading, since it leads to spurious changes in the observed
correlations: even when the unconditional correlation remains constant,
conditional correlations yield artificial changes. Since one of the most commonly
accepted and used definitions of contagion is the detection of an increase of the
conditional correlations during a period of turmoil, namely when the volatility
increases, these results cast serious doubt on previous studies. In this
respect, the conclusions of Calvo and Reinhart [87], about the occurrence of
contagion across Latin American markets during the 1994 Mexican crisis, but
more generally also the results of [271] or [299], on the effect of the October
1987 crash on the linkage of national markets, must be considered with some
caution. It is quite desirable to find a more reliable tool for studying extreme
dependences.

6.3 Conditional Concordance Measures

The (conditional) correlation coefficients, which have just been investigated,
suffer from several theoretical as well as empirical deficiencies. From the
theoretical point of view, they constitute just linear measures of dependence.
Thus, as recalled in Chap. 4, they are fully satisfactory only for the description
of the dependence of variables with elliptical distributions. Moreover,
we have seen that the correlation coefficient aggregates the information contained
both in the marginal and in the collective behavior. The correlation
coefficient is also not invariant under an increasing change of variable, a
transformation which is known to leave the dependence structure unchanged. From
the empirical standpoint, we have seen that, for some data, the correlation
coefficient may not always exist, and even when it exists, it cannot always be
accurately estimated, due to sometimes “wild” statistical fluctuations. Thus,
it is desirable to find another measure of the dependence between two assets or,
more generally, between two random variables which, contrary to the linear
correlation coefficient, is always well-defined and depends only on the copula
properties. This ensures that this measure is not affected by a change in the
marginal distributions (provided that the mapping is increasing). It turns out
that this desirable property is shared by all measures of concordance. Among
these measures are the well-known Kendall's tau, Spearman's rho and Gini's
beta (see Sect. 4.2).
However, these concordance measures are not well-adapted, as such, to the
study of extreme dependence, because they are functions of the whole distrib-
ution, including the moderate and small returns. A simple idea to investigate
the extreme concordance properties of two random variables is to calculate
these quantities conditioned on values larger than a given threshold and let
this threshold go to infinity.
In the sequel, we will only focus on the rank correlation which can be easily
estimated empirically. It offers a natural generalization of the (linear) corre-
lation coefficient. Indeed, Spearman’s rho quantifies the degree of functional
dependence, whatever the functional dependence between the two random
variables may be. This represents a very interesting improvement. Perfect
correlation (respectively anticorrelation) gives a value 1 (respectively −1) both
for the standard correlation coefficient and for Spearman's rho. Otherwise,
there is no general relation allowing us to deduce the Spearman’s rho from
the correlation coefficient and vice-versa.
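The invariance of the rank correlation under increasing changes of variable, which the linear correlation coefficient does not share, can be checked directly. In the following numpy-only sketch (ours; the increasing maps exp(x) and y³ + 5y are arbitrary choices for illustration), Spearman's rho is unchanged by the transformations while Pearson's coefficient is not:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(20_000)
y = 0.6 * x + 0.8 * rng.standard_normal(20_000)

def spearman(a, b):
    """Spearman's rho as the linear correlation of the ranks (no ties here)."""
    def rank(u):
        r = np.empty(len(u))
        r[np.argsort(u)] = np.arange(1, len(u) + 1)
        return r
    return np.corrcoef(rank(a), rank(b))[0, 1]

rho_s = spearman(x, y)
rho_s_transformed = spearman(np.exp(x), y**3 + 5 * y)  # ranks unchanged: identical

rho_p = np.corrcoef(x, y)[0, 1]
rho_p_transformed = np.corrcoef(np.exp(x), y**3 + 5 * y)[0, 1]  # not identical

print(rho_s, rho_s_transformed)
print(rho_p, rho_p_transformed)
```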

6.3.1 Definition

Recall that Spearman’s rho, denoted ρs in the sequel, measures the difference
between the probability of concordance and the probability of discordance
for the two pairs of random variables (X1 , Y1 ) and (X2 , Y3 ), where the pairs
(X1 , Y1 ), (X2 , Y2 ) and (X3 , Y3 ) are three independent realizations drawn from
the same distribution:
ρs = 3 (Pr[(X1 − X2 )(Y1 − Y3 ) > 0] − Pr[(X1 − X2 )(Y1 − Y3 ) < 0]) . (6.21)
Thus, setting U = FX (X) and V = FY (Y ), we have seen that ρs is nothing
but the (linear) correlation coefficient of the uniform random variables U and
V (see Chap. 4):

    ρs = Cov(U, V) / √(Var(U) · Var(V)) ,    (6.22)

which justifies its name as a correlation coefficient of the rank, and shows that
it can easily be estimated.
An attractive feature of Spearman's rho is that it is independent of the
margins, as can be seen in equation (6.22). Thus, contrary to the linear
correlation coefficient, which aggregates the marginal properties of the variables
with their collective behavior, the rank correlation coefficient takes into
account only the dependence structure of the variables.
Using expression (6.22), a natural definition of the conditional rank cor-
relation, conditioned on V larger than a given threshold ṽ, can be proposed:

    ρs(ṽ) = Cov(U, V | V ≥ ṽ) / √(Var(U | V ≥ ṽ) · Var(V | V ≥ ṽ)) ,    (6.23)

whose expression in terms of the copula C(·, ·) is given in Appendix 6.C.
Obviously, ρs(ṽ) is not a true concordance measure, as defined at the end
of Sect. 4.2. An alternative definition of the conditional Spearman's rho [96] –
and more generally of any conditional concordance measure – which preserves
all the properties of concordance measures, can be obtained by considering
the concordance measures of the conditional copula defined by (3.58). As
an example, the conditional Kendall's tau would be defined by the Kendall's
tau of the conditional copula. This idea has several advantages. In particular,
when one focuses on the conditional Kendall's tau, asymptotic results can
be straightforwardly derived for Archimedean copulas, in relation with result
(3.61). Indeed, considering an Archimedean copula with a regularly varying
generator φ (with tail index θ), the conditional copula (3.58) converges to
Clayton's copula with parameter θ as the threshold u goes to zero. Therefore,
Kendall's tau τ_u of the conditional copula converges to Kendall's tau of
Clayton's copula, so that

    lim_{u→0} τ_u = θ/(θ + 2) ,    (6.24)
according to Table 4.1.
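Definition (6.23) can be turned directly into an empirical estimator based on normalized ranks. The following sketch (ours; the Gaussian-copula sample and the threshold values are chosen purely for illustration) also exhibits the same kind of conditioning bias as the conditional correlation coefficient: ρs(ṽ) varies with the threshold even though the unconditional dependence is constant:

```python
import numpy as np

def conditional_spearman(x, y, v_tilde):
    """Empirical version of (6.23): linear correlation of the rank-based
    'copula observations' (u, v), restricted to the subsample where v >= v_tilde."""
    n = len(x)
    u = np.empty(n)
    u[np.argsort(x)] = np.arange(1, n + 1) / n
    v = np.empty(n)
    v[np.argsort(y)] = np.arange(1, n + 1) / n
    m = v >= v_tilde
    return np.corrcoef(u[m], v[m])[0, 1]

# Gaussian-copula sample with constant rho = 0.7
rng = np.random.default_rng(4)
n, rho = 100_000, 0.7
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

for vt in (0.0, 0.25, 0.5, 0.75):
    print(f"v = {vt}: conditional Spearman rho = {conditional_spearman(x, y, vt):.3f}")
```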

6.3.2 Example

Contrary to the conditional correlation coefficient, it is difficult to obtain
analytical expressions for the conditional Spearman's rho for the Gaussian
and Student distributions. Obviously, for many families of copulas known
in closed form, equation (6.23) allows for an explicit calculation of ρs(v).
[Figure: conditional Spearman's rho ρs(v) as a function of v, for ρ = 0.1, 0.3, 0.5, 0.7, 0.9, with a Gaussian copula (left panel) and a Student's copula with three degrees of freedom (right panel); see the caption of Fig. 6.8]

Fig. 6.8. Conditional Spearman’s rho for a bivariate Gaussian copula (left panel )
and a Student’s copula with three degrees of freedom (right panel ), with an uncon-
ditional linear correlation coefficient ρ = 0.1, 0.3, 0.5, 0.7, 0.9, as a function of the
constraint level v

However, most copulas of interest in finance have no simple closed form, so
that it is necessary to resort to numerical computations.
As an example, let us consider the bivariate Gaussian distribution (or
copula) with unconditional correlation coefficient ρ. It is well-known that its
unconditional Spearman's rho is given by

    ρs = (6/π) · arcsin(ρ/2) .    (6.25)
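Relation (6.25) can be checked by a quick Monte Carlo experiment (a sketch of ours; the sample size and the value ρ = 0.7 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 200_000, 0.7
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

def rank(u):
    r = np.empty(len(u))
    r[np.argsort(u)] = np.arange(1, len(u) + 1)
    return r

rho_s_empirical = np.corrcoef(rank(x), rank(y))[0, 1]
rho_s_theory = (6.0 / np.pi) * np.arcsin(rho / 2.0)
print(rho_s_empirical, rho_s_theory)  # both close to 0.68
```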
The left panel of Fig. 6.8 shows the conditional Spearman’s rho ρs (v) defined
by (6.23) obtained from a numerical integration. We observe the same bias
as for the conditional correlation coefficient, namely the conditional rank cor-
relation changes with v even though the unconditional correlation is fixed to
a constant value. Nonetheless, this conditional Spearman’s rho seems more
sensitive than the conditional correlation coefficient since one can observe in
the left panel of Fig. 6.8 that, as v goes to one, the conditional Spearman’s rho
ρs (v) does not go to zero for all values of ρ (at the precision of our bootstrap
estimates), as previously observed with the conditional correlation coefficient
(see (6.3)).
The right panel of Fig. 6.8 depicts the conditional Spearman's rho of
Student's copula with three degrees of freedom. The biases are qualitatively the
same as for the Gaussian copula, but ρs(v) goes in this case to zero for all
values of ρ when v goes to one. Thus, here again, several different behaviors can be
observed depending on the underlying copula of the random variables. Moreover,
these two examples show that the quantification of extreme dependence
is a function of the tools used to quantify this dependence. Here, the conditional
Spearman's ρ goes to a nonvanishing constant for the Gaussian model,
while the conditional (linear) correlation coefficient goes to zero, contrary to
the Student's distribution for which the situation is exactly the opposite.

6.3.3 Empirical Evidence

Figures 6.9, 6.10 and 6.11 give the conditional Spearman’s rho respectively for
the (Argentinean/Brazilian), the (Brazilian/Chilean), and the (Chilean/Mex-
ican) stock markets. As previously, the plain thick line refers to the estimated
correlation, while the dashed lines refer to the Gaussian copula and its 95%
confidence levels, and the dotted lines to Student's copula with three degrees
of freedom and its 95% confidence levels.
Contrarily to the cases of the conditional (linear) correlation coefficient
exhibited in Figs. 6.2, 6.3 and 6.4, the empirical conditional Spearman’s ρ
does not always comply with the Student’s model (neither with the Gaussian

[Figure: Argentina−Brazil, two panels showing the conditional Spearman's rho as a function of the quantile v; see the caption of Fig. 6.9]

Fig. 6.9. In the upper panel, the thick curve shows Spearman’s rho between the
Argentinean stock index daily returns and the Brazilian stock index daily returns.
Above the quantile v = 0.5, Spearman’s rho is conditioned on the Brazilian index
daily returns whose quantiles are larger than v, while below the quantile v = 0.5 it is
conditioned on the Brazilian index daily returns whose quantiles are smaller than v.
As in the above figures for the correlation coefficients, the dashed lines refer to the
prediction of the Gaussian copula and its 95% confidence levels and the dotted lines
to Student’s copula with three degrees of freedom and its 95% confidence levels. The
lower panel is the same as the upper panel but with the conditioning done on the
Argentinean index daily returns
[Figure: Brazil−Chile, two panels showing the conditional Spearman's rho as a function of the quantile v; see the caption of Fig. 6.10]

Fig. 6.10. Same as Fig. 6.9 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively Brazilian)
stock market index

one), and thus confirm the discrepancies observed in Figs. 6.5, 6.6 and 6.7. In
all cases, for thresholds v larger than the quantile 0.5 corresponding to the
positive returns, the Student model with three degrees of freedom is almost
always sufficient to explain the data. In contrast, for the negative returns,
and thus thresholds v lower than the quantile 0.5, only the interaction between
the Chilean and the Mexican markets is well described by the Student copula
and does not require invoking a contagion mechanism. For all other pairs,
none of these models explains the data satisfactorily. Therefore, for these cases
and from the perspective of these models, the contagion hypothesis seems to
be needed.
There are, however, several caveats. First, even though we have considered
the most natural financial models, there may be other models with constant
dependence structure, which we have ignored, that could account for the
observed evolutions of the conditional Spearman's ρ. If this is the case, then
the contagion hypothesis would not be needed. Second, the main discrepancy
between the empirical conditional Spearman's ρ and the prediction of
Student's model does not occur in the tails of the distribution, i.e., for large and
extreme movements, but in the bulk. Thus, during periods of turmoil, the
[Figure: Chile−Mexico, two panels showing the conditional Spearman's rho as a function of the quantile v; see the caption of Fig. 6.11]
Fig. 6.11. Same as Fig. 6.9 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index

Student’s model with three degrees of freedom seems to remain a good model
of co-movements. Third, the contagion effect is never necessary for upwards
moves. Indeed, we observe the same asymmetry or trend dependence as found
in [315] for five major equity markets. This was apparent in Figs. 6.2, 6.3 and
6.4 for ρv+,− , and is strongly confirmed on the conditional Spearman’s ρ.
Interestingly, there is also an asymmetry or directivity in the mutual influ-
ence between markets. For instance, the Chilean and Mexican markets have
an influence on the Argentinean and Brazilian markets, but the later do not
have any impact on the Mexican and Chile markets. Chile and Mexico have
no contagion effect on each other while Argentina and Brazil have.
These empirical results on the conditional Spearman's rho are different
from, and often opposite to, the conclusions derived from the conditional
correlation coefficients ρ_v^{+,−}. This highlights the difficulty in obtaining reliable,
unambiguous and sensitive estimations of conditional correlation measures.
In particular, Pearson's coefficient, usually employed to estimate the
correlation coefficient between two variables, is known not to be very efficient when
the variables are fat-tailed and when the estimation is performed on a small
sample. Indeed, with small samples, Pearson's coefficient is very sensitive to
the largest value, which can lead to an important bias in the estimation.
Moreover, even with large sample sizes, Meerschaert and Scheffler [356] have shown
that the nature of the convergence of the Pearson coefficient of two time series
with tail index µ toward the theoretical correlation, as the sample size T tends
to infinity, is sensitive to the existence and strength of the theoretical
correlation. If there is no theoretical correlation between the two time series, the
sample correlation tends to zero with Gaussian fluctuations. If the theoretical
correlation is nonzero, the difference between the sample correlation and the
theoretical correlation times T^{1−2/µ} converges in distribution to a stable law
with index µ/2. These large statistical fluctuations are responsible for the lack
of accuracy of the estimated conditional correlation coefficient encountered in
the previous section. Thus, we think that the conditional Spearman's ρ provides
a good alternative, both from a theoretical and an empirical viewpoint.

6.4 Extreme Co-movements

For the sake of completeness, and since it is directly related to multivariate
extreme value theory, we study the coefficient of tail dependence λ, which
has been defined in Sect. 4.5. It would seem that the coefficient of tail
dependence could provide a useful measure of the extreme dependence between two
random variables for the analysis of contagion between markets. Two possibilities
can occur. Either the whole data set does not exhibit tail dependence,
and a contagion mechanism seems necessary to explain the occurrence of
concomitant large movements during turmoil periods. Or the data set exhibits
tail dependence, which by itself is enough to produce concomitant extremes
(and contagion is not needed).
Unfortunately, the empirical estimation of the coefficient of tail dependence
is a strenuous task. Indeed, a direct estimation of the conditional probability
Pr{X > F_X^{−1}(u) | Y > F_Y^{−1}(u)}, which should tend to λ when u → 1, is
very difficult to implement in practice due to the combination of the curse of
dimensionality and the drastic decrease of the number of realizations as u
becomes close to one. A better approach consists in using kernel methods, which
generally provide smooth and accurate estimators [168, 284, 305]. However,
these smooth estimators lead to copulas which are differentiable. This
automatically gives vanishing tail dependence, as already mentioned in Chap. 5.
Indeed, in order to obtain a nonvanishing coefficient of tail dependence, it is
necessary for the corresponding copula to be nondifferentiable at the point
(1, 1) (or at (0, 0)). An alternative is then the fully parametric approach. One
can choose to model dependence via a specific copula, and thus to determine
the associated tail dependence [315, 334, 380]. The problem with such
a method is that the choice of the parameterization of the copula amounts to
choosing a priori whether or not the data presents tail dependence.

In fact, there are three ways of estimating the tail dependence coefficient.
The first two methods are specific to a class of copulas or of models, while
the last one is very general, but less accurate. The first method is only
reliable when the underlying copula is known to be Archimedean. In such a
case, the limit theorem established by Juri and Wüthrich [260] (see Chap. 3)
allows one to estimate the tail dependence. The problem is that it is not
obvious that the Archimedean copulas provide a good representation of the
dependence structure for financial assets. For instance, the Archimedean copulas
are generally inconsistent with a representation of assets by linear factor
models. A second method – based upon the results of Sect. 4.5.3 – offers good
results by allowing one to estimate the tail dependence in a semiparametric way,
which relies solely on the estimation of the marginal distributions, when the data
can be explained by a factor model [332, 335].
When none of these situations occurs, or when the factors are too difficult
to extract, a third and fully nonparametric method exists, which is based upon
the mathematical results of Ledford and Tawn [294, 295] and Coles et al. [106]
and has recently been applied by Poon et al. [390]. The method consists in
transforming the original random variables X and Y into Fréchet random
variables denoted by S and T respectively. Then, considering the variable
Z = min{S, T}, its survival distribution is:

    Pr{Z > z} = L(z) · z^{−1/η}   as z → ∞ ,    (6.26)

where L denotes a slowly varying function. Now, assuming that

    lim_{z→∞} L(z) = d ∈ (0, 1] ,    (6.27)

the coefficient of tail dependence λ and the coefficient λ̄, defined by (4.84),
are simple functions of d and η: λ̄ = 2η − 1 with λ = 0 if η < 1, or λ̄ = 1
and λ = d otherwise. The parameters η and d can be estimated by maximum
likelihood, and deriving their asymptotic statistics allows one to test whether
the hypothesis λ̄ = 1 can be rejected or not, and consequently, whether the
data present tail dependence or not.
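A minimal version of this nonparametric procedure can be sketched as follows (our simplified implementation: the maximum likelihood step is replaced by a Hill-type estimator of η, which coincides with the likelihood estimate under an exact power-law tail; the function names and the number k of upper order statistics are our choices):

```python
import numpy as np

def eta_estimate(x, y, k=500):
    """Ledford-Tawn style estimate: map each margin to a unit Frechet variable
    through its empirical distribution function, set Z = min(S, T) and fit
    Pr{Z > z} ~ d * z**(-1/eta) on the k largest values of Z."""
    n = len(x)

    def to_frechet(a):
        r = np.empty(n)
        r[np.argsort(a)] = np.arange(1.0, n + 1.0)
        u = r / (n + 1.0)            # empirical quantiles in (0, 1)
        return -1.0 / np.log(u)      # unit Frechet marginal: F(s) = exp(-1/s)

    z = np.sort(np.minimum(to_frechet(x), to_frechet(y)))[::-1]
    threshold = z[k]                 # the (k+1)-th largest value of Z
    eta = np.mean(np.log(z[:k] / threshold))       # Hill-type estimator of eta
    return eta, eta / np.sqrt(k), 2.0 * eta - 1.0  # (eta, std. err., lambda_bar)

rng = np.random.default_rng(5)
n = 50_000
indep = eta_estimate(rng.standard_normal(n), rng.standard_normal(n))
w = rng.standard_normal(n)
dep = eta_estimate(w, w)  # perfectly dependent pair
print(indep)  # eta near 1/2, i.e., lambda_bar near 0: asymptotic independence
print(dep)    # eta near 1, i.e., lambda_bar near 1: tail dependence
```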
Let us implement this procedure on the four previously considered Latin
American markets (Argentina, Brazil, Chile and Mexico). The results for the
estimated values of the coefficient of tail dependence are given in Table 6.1,
both for the positive and the negative tails. The tests show that one cannot
reject the hypothesis of tail dependence between the four Latin
American markets considered. Notice that the positive tail dependence is almost always
slightly smaller than the negative one, which could be linked with the existence
of trend asymmetry [315], but it turns out that these differences are not
statistically significant. These results indicate that, according to this analysis of
the extreme dependence coefficient, the propensity for extreme co-movements
is almost the same for each pair of stock markets: even if the transmission
mechanisms of a crisis differ from one country to another, the
propagation occurs with the same probability overall. Thus, the subsequent
Table 6.1. Coefficients of tail-dependence between pairs among four Latin American
markets. The figure within parentheses gives the standard deviation of the estimated
value derived under the assumption of asymptotic normality of the estimators.
Only the coefficients above the diagonal are indicated since they are symmetric

Negative tail   Argentina   Brazil        Chile         Mexico
Argentina       –           0.28 (0.04)   0.25 (0.04)   0.25 (0.05)
Brazil                      –             0.19 (0.03)   0.25 (0.05)
Chile                                     –             0.24 (0.07)
Mexico                                                  –

Positive tail   Argentina   Brazil        Chile         Mexico
Argentina       –           0.21 (0.06)   0.20 (0.04)   0.22 (0.04)
Brazil                      –             0.28 (0.04)   0.19 (0.04)
Chile                                     –             0.19 (0.03)
Mexico                                                  –

Table 6.2. Coefficients of tail dependence between pairs among four Latin American
markets derived under the assumption of a Student copula with three degrees
of freedom

Student hypothesis ν = 3
            Argentina   Brazil   Chile   Mexico
Argentina   –           0.24     0.25    0.27
Brazil                  –        0.24    0.27
Chile                            –       0.28
Mexico                                   –

risks are the same. Table 6.2 also gives the coefficients of tail dependence es-
timated under the Student’s copula (or in fact any copula derived from an
elliptical distribution – see Chap. 4) with three degrees of freedom, given by
expression (4.91). One can observe a remarkable agreement between these
values and the nonparametric estimates given in Table 6.1. This is consis-
tent with the results given by the conditional Spearman’s rho, for which we
have remarked that the Student’s copula seems to reasonably account for the
extreme dependence.
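The values in Table 6.2 follow from expression (4.91), λ = 2 T̄_{ν+1}(√((ν+1)(1−ρ)/(1+ρ))), where T̄_{ν+1} denotes the survival function of the univariate Student distribution with ν + 1 degrees of freedom. A small sketch of ours evaluates it; the survival function is computed here by direct numerical integration of the Student density rather than taken from a statistics library:

```python
import numpy as np
from math import gamma, sqrt, pi

def student_sf(x, df):
    """Pr{T > x} for a Student distribution with df degrees of freedom,
    by trapezoidal integration of the density (valid for x > 0)."""
    c = gamma((df + 1.0) / 2.0) / (sqrt(df * pi) * gamma(df / 2.0))
    grid = np.linspace(x, x + 200.0, 200_001)  # the neglected far tail is ~1e-9
    pdf = c * (1.0 + grid**2 / df) ** (-(df + 1.0) / 2.0)
    dx = grid[1] - grid[0]
    return float(np.sum(pdf[:-1] + pdf[1:]) * dx / 2.0)

def student_tail_dependence(rho, nu):
    """Expression (4.91): lambda = 2 * Tbar_{nu+1}(sqrt((nu+1)(1-rho)/(1+rho)))."""
    return 2.0 * student_sf(sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho)), nu + 1.0)

# With nu = 3, correlations of the order of those measured between these markets
# give lambda around 0.2-0.3, the order of magnitude reported in Table 6.2
for rho in (0.3, 0.4, 0.5):
    print(f"rho = {rho}: lambda = {student_tail_dependence(rho, 3):.3f}")
```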

6.5 Synthesis and Consequences


Table 6.3 summarizes the asymptotic dependences for large v and u of the
signed conditional correlation coefficient ρ_v^+, the unsigned conditional
correlation coefficient ρ_v^s and the correlation coefficient ρ_u conditioned on both
variables for the bivariate Gaussian, the Student’s model, the Gaussian factor
model and the Student’s factor model. These results provide a quantitative
proof that conditioning on exceedance leads to conditional correlation coef-
ficients that may be very different from the unconditional correlation. This
provides a straightforward mechanism for fluctuations or changes of correla-
tions, based on fluctuations of volatility or changes of trends. In other words,
the many reported variations of correlation structure might be in large part
attributed to changes in volatility (and statistical uncertainty).
The distinct dependences as a function of exceedance v and u of the condi-
tional correlation coefficients offer novel tools for characterizing the statistical
multivariate distributions of extreme events. Since their direct characteriza-
tion is in general restricted by the curse of dimensionality and the scarcity of
data, the conditional correlation coefficients provide reduced statistics which
can be estimated with reasonable accuracy and reliability at least when the
pdf of the data decays faster than any hyperbolic function with tail index
equal to 2. In this respect, the empirical results suggest that a Student’s cop-
ula, or more generally an elliptical copula, with a tail index of about three
accounts for the main extreme dependence properties investigated here. This
result is not really surprising since Chap. 5 has shown that Student’s cop-
ula is a reasonable choice to account for the dependence structure between
foreign exchange rates. In the present case, since the value of any domestic
stock index has been converted into the US dollar, the influence of the depen-
dence structure of foreign exchange rates can be considered as dominant in
comparison with the dependence structure between each domestic stock index
expressed in local currency. This dominance of the dependence structure of
foreign exchange rates seems particularly true during turmoil periods.
Table 6.4 gives the asymptotic values of ρ_v^+, ρ_v^s and ρ_u for v → +∞ and
u → ∞ in order to compare them with the tail-dependence λ.
u → ∞ in order to compare them with the tail-dependence λ.


These two tables only scratch the surface of the rich sets of measures of
tail and extreme dependences. We have already stressed that complete inde-
pendence implies the absence of tail dependence: λ = 0, but that λ = 0 does
not imply independence, at least in the intermediate range, since it is only an
asymptotic property. Conversely, a nonzero tail dependence λ implies the
absence of asymptotic independence. Nonetheless, it does not necessarily imply
that the conditional correlation coefficients ρ_{v=∞}^+ and ρ_{v=∞}^s are nonzero, as
one could have a priori expected.
Note that the examples of Table 6.4 are such that λ = 0 seems to go
hand-in-hand with ρ_{v→∞}^+ = 0. However, the logical implication

    (λ = 0) =⇒ (ρ_{v→∞}^+ = 0)

does not hold in general. A counter-example is offered by the Student's factor
model in the case where νY > ν (the tail of the distribution of the
idiosyncratic noise is fatter than that of the distribution of the factor). In this
case, X and Y have the same tail-dependence as ε and Y, which is zero by
construction. But ρ_{v=∞}^+ and ρ_{v=∞}^s are both one because a large Y almost
Table 6.3. Large v and u dependence of the conditional correlations ρ_v^+ (signed
condition), ρ_v^s (unsigned condition) and ρ_u (on both variables) for the different
models discussed in this chapter, described in the first column. The numbers in
parentheses give the equation numbers from which the formulas are derived. The factor
model is defined by (6.15), i.e., X = βY + ε. ρ is the unconditional correlation
coefficient

                          ρ_v^+                                     ρ_v^s                                     ρ_u
Bivariate Gaussian        (ρ/√(1−ρ²)) · 1/v  (6.3)                  sgn(ρ) · [1 − ((1−ρ²)/(2ρ²)) · 1/v²]  (6.7)   ρ · ((1+ρ)/(1−ρ)) · 1/u²  (6.20)
Bivariate Student's       ρ/√(ρ² + (ν−1)((ν−2)/ν)(1−ρ²))  (6.13)    ρ/√(ρ² + (1/(ν−1))((ν−2)/ν)(1−ρ²))  (6.14)   –
Gaussian factor model     same as (6.3)                             same as (6.7)                             same as (6.20)
Student's factor model    sgn(β) · (1 − K/(2v²))  (6.18)            sgn(β) · (1 − K/(2v²))  (6.18)            –

Table 6.4. Asymptotic values of ρ_v^+, ρ_v^s and ρ_u for v → +∞ and u → ∞ and
comparison with the tail-dependence λ and λ̄ for the four models indicated in the
first column. The factor model is defined by (6.15), i.e., X = βY + ε. ρ is the
unconditional correlation coefficient. For Student's factor model, Y and ε have centered
Student's distributions with the same number ν of degrees of freedom and their scale
factors are respectively equal to 1 and σ, so that ρ = (1 + σ²/β²)^{−1/2}. For the
bivariate Student's distribution, we refer to Table 6.3 for the constant values of
ρ_{v=∞}^+ and ρ_{v=∞}^s

                          ρ_{v=∞}^+        ρ_{v=∞}^s        ρ_{u=∞}    λ                                       λ̄
Bivariate Gaussian        0                sgn(ρ)           0          0                                       ρ
Bivariate Student's       see Table 6.3    see Table 6.3    –          2 · T̄_{ν+1}(√((ν+1)(1−ρ)/(1+ρ)))       1
Gaussian factor model     0                sgn(ρ)           0          0                                       ρ
Student's factor model    sgn(β)           sgn(β)           –          (ρ^ν/(ρ^ν + (1−ρ²)^{ν/2})) · 1_{β>0}    1
always gives a large X and the simultaneous occurrence of a large Y and a
large ε can be neglected. The reason for this absence of tail dependence (in the
sense of λ) coming together with asymptotically strong conditional correlation
coefficients stems from two facts:
• First, the conditional correlation coefficients put much less weight on the
extreme tails than the tail-dependence parameter λ. In other words, ρ_{v=∞}^+
and ρ_{v=∞}^s are sensitive to the marginals, i.e., they are determined by the
full bivariate distribution, while, as we said, λ is a pure copula property
independent of the marginals. Since ρ_{v=∞}^+ and ρ_{v=∞}^s are measures of
extreme dependence weighted by the specific shapes of the marginals, it is
natural that they may behave differently.
• Second, the tail dependence λ probes the extreme dependence property
of the original copula of the random variables X and Y. On the contrary,
when conditioning on Y, one changes the copula of X and Y, so that the
extreme dependence properties investigated by the conditional correlations
are not exactly those of the original copula. This last remark explains
clearly what Boyer et al. [78] call a “bias” in the conditional correlations.
Indeed, changing the dependence between two random variables obviously
changes their correlations.
There are important consequences to these facts. Consider a situation in
which one measure (λ) would conclude on asymptotic tail-independence while
the other measures ρ_{v=∞}^+ and ρ_{v=∞}^s would conclude the opposite. Therefore,
before concluding on a change in the dependence structure with respect to
a given parameter – the volatility or the trend, for instance – one should
check that this change does not result from the tool used to probe the depen-
dence. These results shed new light on recent controversial results about the
occurrence or absence of contagion during the Latin American crises. As in
every previous work, the analysis reported in this chapter finds no evidence of
contagion between Chile and Mexico, but contrarily to [178], it is difficult to
ignore the possibility of contagion toward Argentina and Brazil, in agreement
with [87].
In fact, most of the discrepancies between these different studies probably
stem from the fact that the conditional correlation coefficient does not provide
an accurate tool for probing the potential changes of dependence. Indeed, even when the bias has been accounted for, the fat-tailness of the distributions of returns is such that Pearson's coefficient is subject to very strong statistical fluctuations which forbid an accurate estimation of the correlation.
Moreover, when studying the dependence properties, it is interesting to free
oneself from the marginal behavior of each random variable. This is why the
conditional Spearman’s rho seems a good tool: it only depends on the copula
and is statistically well-behaved.
The conditional Spearman’s rho has identified a change in the dependence
structure during downward trends in Latin American markets, similar to that
found by Longin and Solnik [315] in their study of the contagion across five
major equity markets. It has also brought to light the asymmetry in the contagion effects: Mexico and Chile can be potential sources of contagion toward
Argentina and Brazil, while the reverse does not seem to hold. This phenom-
enon has been observed during the 1994 Mexican crisis and appears to remain
true in the recent Argentinean crisis, for which only Brazil seems to exhibit
the signature of a possible contagion.
The origin of the discovered asymmetry may lie in the difference between
the more market-oriented countries and the more state-intervention oriented
economies, giving rise to either currency floating regimes adapted to an im-
portant manufacturing sector which tend to deliver more competitive real
exchange rates (Chile and Mexico) or to fixed rate pegs (Argentina until the
2001 crisis and Brazil until the early 1999 crisis) [187, 188, 189]. The asym-
metry of the contagion is compatible with the view that fixed exchange rates
tighten more strictly an economy and its stock market to external shocks (case
of Argentina and Brazil) while a more flexible exchange rate seems to pro-
vide a cushion allowing a decoupling between the stock market and external
influences.
Finally, the absence of contagion does not imply necessarily the absence
of contamination. Indeed, the study of the coefficient of tail dependence has
proven that with or without contagion mechanisms (i.e., increase in the link-
age between markets during crisis) the probability of extreme co-movements
during the crisis (i.e., the contamination) is almost the same for all pairs of
markets. Thus, whatever the propagation mechanism may be – historically
strong relationship or irrational fear and herd behavior – the observed effects
are the same: the propagation of the crisis. From the practical perspective of
risk management or regulatory policy, this last point is perhaps more impor-
tant than the real knowledge of the occurrence or not of contagion.

Appendix

6.A Correlation Coefficient for Gaussian Variables Conditioned on Both X and Y Larger Than u
Let us consider a pair of Normal random variables (X, Y) ∼ N(0, Σ), where Σ is their covariance matrix with unconditional correlation coefficient ρ. Without loss of generality, and for simplicity, we shall assume that Σ has unconditional variances equal to 1. By definition, the conditional correlation coefficient ρ_u, conditioned on both X and Y larger than u, is
\[
\rho_u = \frac{\operatorname{Cov}[X, Y \mid X>u,\, Y>u]}{\sqrt{\operatorname{Var}[X \mid X>u,\, Y>u]}\,\sqrt{\operatorname{Var}[Y \mid X>u,\, Y>u]}} \,, \qquad (6.A.1)
\]
\[
\rho_u = \frac{m_{11} - m_{10}\, m_{01}}{\sqrt{m_{20} - m_{10}^2}\,\sqrt{m_{02} - m_{01}^2}} \,, \qquad (6.A.2)
\]
where m_{ij} denotes E[X^i · Y^j | X > u, Y > u].
Using the proposition A.1 of [15] or the expressions in [252, p.113], we can assert that
\[
m_{10}\, L(u,u;\rho) = (1+\rho)\, \varphi(u)\left[1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right)\right] , \qquad (6.A.3)
\]
\[
m_{20}\, L(u,u;\rho) = (1+\rho^2)\, u\, \varphi(u)\left[1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right)\right] + \frac{\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}}\; \varphi\!\left(\sqrt{\frac{2}{1+\rho}}\, u\right) + L(u,u;\rho) , \qquad (6.A.4)
\]
\[
m_{11}\, L(u,u;\rho) = 2\rho\, u\, \varphi(u)\left[1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right)\right] + \frac{\sqrt{1-\rho^2}}{\sqrt{2\pi}}\; \varphi\!\left(\sqrt{\frac{2}{1+\rho}}\, u\right) + \rho\, L(u,u;\rho) , \qquad (6.A.5)
\]
where L(·, ·; ·) denotes the bivariate Gaussian survival (or complementary cumulative) distribution:
\[
L(h,k;\rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \int_h^\infty \! dx \int_k^\infty \! dy\; \exp\!\left(-\frac{x^2 - 2\rho x y + y^2}{2\,(1-\rho^2)}\right) , \qquad (6.A.6)
\]
φ(·) is the Gaussian density:
\[
\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} , \qquad (6.A.7)
\]
and Φ(·) is the cumulative Gaussian distribution:
\[
\Phi(x) = \int_{-\infty}^x du\; \varphi(u) . \qquad (6.A.8)
\]
6.A.1 Asymptotic Behavior of L(u, u; ρ)

Let us focus on the asymptotic behavior of L(u, u; ρ), where L(h, k; ρ) is defined by (6.A.6), for large u. Performing the change of variables x′ = x − u and y′ = y − u, we can write
\[
L(u,u;\rho) = \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi\sqrt{1-\rho^2}} \int_0^\infty \! dx' \int_0^\infty \! dy'\; \exp\!\left(-u\, \frac{x'+y'}{1+\rho}\right) \exp\!\left(-\frac{1}{2}\, \frac{x'^2 - 2\rho x'y' + y'^2}{1-\rho^2}\right) . \qquad (6.A.9)
\]
Using the fact that
\[
\exp\!\left(-\frac{1}{2}\, \frac{x'^2 - 2\rho x'y' + y'^2}{1-\rho^2}\right) = 1 - \frac{x'^2 - 2\rho x'y' + y'^2}{2\,(1-\rho^2)} + \frac{(x'^2 - 2\rho x'y' + y'^2)^2}{8\,(1-\rho^2)^2} - \frac{(x'^2 - 2\rho x'y' + y'^2)^3}{48\,(1-\rho^2)^3} + \cdots , \qquad (6.A.10)
\]
and applying Theorem 3.1.1 in [247, p. 68] (Laplace's method), (6.A.9) and (6.A.10) yield
\[
L(u,u;\rho) = \frac{(1+\rho)^2\, e^{-\frac{u^2}{1+\rho}}}{2\pi\sqrt{1-\rho^2}\; u^2} \left[ 1 - \frac{(2-\rho)(1+\rho)}{1-\rho} \cdot \frac{1}{u^2} + \frac{(2\rho^2 - 6\rho + 7)(1+\rho)^2}{(1-\rho)^2} \cdot \frac{1}{u^4} - 3\, \frac{(12 - 13\rho + 8\rho^2 - 2\rho^3)(1+\rho)^3}{(1-\rho)^3} \cdot \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] , \qquad (6.A.11)
\]
and
\[
\frac{1}{L(u,u;\rho)} = \frac{2\pi\, u^2\, \sqrt{1-\rho^2}}{(1+\rho)^2}\; e^{\frac{u^2}{1+\rho}} \left[ 1 + \frac{(2-\rho)(1+\rho)}{1-\rho} \cdot \frac{1}{u^2} - \frac{(3 - 2\rho + \rho^2)(1+\rho)^2}{(1-\rho)^2} \cdot \frac{1}{u^4} + \frac{(16 - 13\rho + 10\rho^2 - 3\rho^3)(1+\rho)^3}{(1-\rho)^3} \cdot \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] . \qquad (6.A.12)
\]
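The expansion (6.A.11) can be checked numerically. The following Python sketch (our own illustration, not part of the original text; the grid span and step are ad hoc choices) compares the truncated series with a direct trapezoidal evaluation of the double integral (6.A.6):

```python
import numpy as np

def L_numeric(u, rho, span=4.0, n=1601):
    # Direct 2D trapezoidal integration of (6.A.6) over [u, u+span]^2;
    # the neglected mass beyond u+span is exponentially small here.
    x = np.linspace(u, u + span, n)
    w = np.full(n, x[1] - x[0]); w[0] *= 0.5; w[-1] *= 0.5
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*(1 - rho**2)))
    f /= 2*np.pi*np.sqrt(1 - rho**2)
    return float(np.einsum("i,j,ij->", w, w, f))

def L_asympt(u, rho):
    # Laplace expansion (6.A.11), truncated after the 1/u^6 term.
    t = (1 + rho) / (1 - rho)
    pref = (1 + rho)**2*np.exp(-u**2/(1 + rho))/(2*np.pi*np.sqrt(1 - rho**2)*u**2)
    series = (1 - (2 - rho)*t/u**2
                + (2*rho**2 - 6*rho + 7)*t**2/u**4
                - 3*(12 - 13*rho + 8*rho**2 - 2*rho**3)*t**3/u**6)
    return pref * series

print(abs(L_numeric(8.0, 0.3) / L_asympt(8.0, 0.3) - 1))  # small, well below 1e-2
```

The residual discrepancy is of the order of the first neglected term, O(1/u^8).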

6.A.2 Asymptotic Behavior of the First Moment m10

The first moment m10 = E[X | X > u, Y > u] is given by (6.A.3). For large u,
\[
1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right) = \frac{1}{2}\, \operatorname{erfc}\!\left(\sqrt{\frac{1-\rho}{2\,(1+\rho)}}\, u\right) \qquad (6.A.13)
\]
\[
= \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{1-\rho}{2(1+\rho)} u^2}}{\sqrt{2\pi}\, u} \left[ 1 - \frac{1+\rho}{1-\rho} \cdot \frac{1}{u^2} + 3 \left(\frac{1+\rho}{1-\rho}\right)^2 \frac{1}{u^4} - 15 \left(\frac{1+\rho}{1-\rho}\right)^3 \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] , \qquad (6.A.14)
\]
so that, multiplying by (1 + ρ) φ(u), we obtain
\[
m_{10}\, L(u,u;\rho) = \frac{(1+\rho)^2}{\sqrt{1-\rho^2}}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi\, u} \left[ 1 - \frac{1+\rho}{1-\rho} \cdot \frac{1}{u^2} + 3 \left(\frac{1+\rho}{1-\rho}\right)^2 \frac{1}{u^4} - 15 \left(\frac{1+\rho}{1-\rho}\right)^3 \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] . \qquad (6.A.15)
\]
Using the result given by equation (6.A.11), we can conclude that
\[
m_{10} = u + (1+\rho)\, \frac{1}{u} - \frac{(1+\rho)^2 (2-\rho)}{1-\rho} \cdot \frac{1}{u^3} + \frac{(10 - 8\rho + 3\rho^2)(1+\rho)^3}{(1-\rho)^2} \cdot \frac{1}{u^5} + O\!\left(\frac{1}{u^7}\right) . \qquad (6.A.16)
\]
In the sequel, we will also need the behavior of m10²:
\[
m_{10}^2 = u^2 + 2\,(1+\rho) - \frac{(1+\rho)^2 (3-\rho)}{1-\rho} \cdot \frac{1}{u^2} + 2\, \frac{(8 - 5\rho + 2\rho^2)(1+\rho)^3}{(1-\rho)^2} \cdot \frac{1}{u^4} + O\!\left(\frac{1}{u^6}\right) . \qquad (6.A.17)
\]
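The tail expansion (6.A.14) is the standard asymptotic series of the Gaussian survival function (Mills' ratio) evaluated at z = sqrt((1−ρ)/(1+ρ)) u. A minimal check of that series, using only the standard library (our own illustration):

```python
import math

def gauss_tail(z):
    # 1 - Phi(z), computed via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def gauss_tail_series(z):
    # Asymptotic series underlying (6.A.14):
    # 1 - Phi(z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4 - 15/z^6)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return phi / z * (1 - 1/z**2 + 3/z**4 - 15/z**6)

for z in (6.0, 8.0, 12.0):
    print(z, abs(gauss_tail_series(z) / gauss_tail(z) - 1))
# the relative error shrinks like 105/z^8, the first neglected term
```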

6.A.3 Asymptotic Behavior of the Second Moment m20

The second moment m20 = E[X² | X > u, Y > u] is given by expression (6.A.4). The first term in the right-hand side of (6.A.4) yields
\[
(1+\rho^2)\, u\, \varphi(u) \left[ 1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right) \right] = (1+\rho^2)\, \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \times \left[ 1 - \frac{1+\rho}{1-\rho} \cdot \frac{1}{u^2} + 3 \left(\frac{1+\rho}{1-\rho}\right)^2 \frac{1}{u^4} - 15 \left(\frac{1+\rho}{1-\rho}\right)^3 \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] \qquad (6.A.18)
\]
while the second term gives
\[
\frac{\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}}\; \varphi\!\left(\sqrt{\frac{2}{1+\rho}}\, u\right) = \rho\sqrt{1-\rho^2}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \,. \qquad (6.A.19)
\]
Putting these two expressions together and factorizing gives
\[
m_{20}\, L(u,u;\rho) = \frac{(1+\rho)^2\, e^{-\frac{u^2}{1+\rho}}}{2\pi\sqrt{1-\rho^2}} \left[ 1 - \frac{1+\rho^2}{1-\rho} \cdot \frac{1}{u^2} + 3\, \frac{(1+\rho^2)(1+\rho)}{(1-\rho)^2} \cdot \frac{1}{u^4} - 15\, \frac{(1+\rho^2)(1+\rho)^2}{(1-\rho)^3} \cdot \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] + L(u,u;\rho) , \qquad (6.A.20)
\]
which finally yields
\[
m_{20} = u^2 + 2\,(1+\rho) - 2\, \frac{(1+\rho)^2}{1-\rho} \cdot \frac{1}{u^2} + 2\, \frac{(5 - \rho + \rho^2)(1+\rho)^3}{(1-\rho)^2} \cdot \frac{1}{u^4} + O\!\left(\frac{1}{u^6}\right) . \qquad (6.A.21)
\]
6.A.4 Asymptotic Behavior of the Cross Moment m11

The cross moment m11 = E[X · Y | X > u, Y > u] is given by expression (6.A.5). The first and second terms in the right-hand side of (6.A.5) respectively give
\[
2\rho\, u\, \varphi(u) \left[ 1 - \Phi\!\left(\sqrt{\frac{1-\rho}{1+\rho}}\, u\right) \right] = 2\rho\, \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \times \left[ 1 - \frac{1+\rho}{1-\rho} \cdot \frac{1}{u^2} + 3 \left(\frac{1+\rho}{1-\rho}\right)^2 \frac{1}{u^4} - 15 \left(\frac{1+\rho}{1-\rho}\right)^3 \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] , \qquad (6.A.22)
\]
\[
\frac{\sqrt{1-\rho^2}}{\sqrt{2\pi}}\; \varphi\!\left(\sqrt{\frac{2}{1+\rho}}\, u\right) = \sqrt{1-\rho^2}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \,, \qquad (6.A.23)
\]
which, after factorization, yields
\[
m_{11}\, L(u,u;\rho) = \frac{(1+\rho)^2\, e^{-\frac{u^2}{1+\rho}}}{2\pi\sqrt{1-\rho^2}} \left[ 1 - 2\, \frac{\rho}{1-\rho} \cdot \frac{1}{u^2} + 6\, \frac{\rho\,(1+\rho)}{(1-\rho)^2} \cdot \frac{1}{u^4} - 30\, \frac{\rho\,(1+\rho)^2}{(1-\rho)^3} \cdot \frac{1}{u^6} + O\!\left(\frac{1}{u^8}\right) \right] + \rho\, L(u,u;\rho) , \qquad (6.A.24)
\]
and finally
\[
m_{11} = u^2 + 2\,(1+\rho) - \frac{(1+\rho)^2 (3-\rho)}{1-\rho} \cdot \frac{1}{u^2} + \frac{(16 - 9\rho + 3\rho^2)(1+\rho)^3}{(1-\rho)^2} \cdot \frac{1}{u^4} + O\!\left(\frac{1}{u^6}\right) . \qquad (6.A.25)
\]

6.A.5 Asymptotic Behavior of the Correlation Coefficient

The conditional correlation coefficient conditioned on both X and Y larger than u is defined by (6.A.2). Using the symmetry between X and Y, we have m10 = m01 and m20 = m02, which allows us to rewrite (6.A.2) as follows:
\[
\rho_u = \frac{m_{11} - m_{10}^2}{m_{20} - m_{10}^2} \,. \qquad (6.A.26)
\]
Putting together the previous results, we have
\[
m_{20} - m_{10}^2 = \frac{(1+\rho)^2}{u^2} - 2\, \frac{(3-\rho)(1+\rho)^3}{1-\rho} \cdot \frac{1}{u^4} + O\!\left(\frac{1}{u^6}\right) , \qquad (6.A.27)
\]
\[
m_{11} - m_{10}^2 = \rho\, \frac{(1+\rho)^3}{1-\rho} \cdot \frac{1}{u^4} + O\!\left(\frac{1}{u^6}\right) , \qquad (6.A.28)
\]
which proves that
\[
\rho_u = \rho\, \frac{1+\rho}{1-\rho} \cdot \frac{1}{u^2} + O\!\left(\frac{1}{u^4}\right) , \quad \text{for } \rho \in [-1, 1) . \qquad (6.A.29)
\]
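A rough numerical illustration of the decay (6.A.29): the sketch below (our own, with ad hoc integration parameters) evaluates ρ_u by deterministic quadrature and exhibits its decrease toward zero, roughly like ρ(1+ρ)/((1−ρ)u²):

```python
import numpy as np

def rho_cond(u, rho, n=801):
    # rho_u = Corr(X, Y | X>u, Y>u) for the standard bivariate Gaussian,
    # by deterministic trapezoidal integration on [u, u+span]^2.
    span = 4.0 + 8.0/u
    x = np.linspace(u, u + span, n)
    w = np.full(n, x[1] - x[0]); w[0] *= 0.5; w[-1] *= 0.5
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*(1 - rho**2)))
    I = lambda g: float(np.einsum("i,j,ij->", w, w, g))
    mass = I(f)
    m10, m20, m11 = I(X*f)/mass, I(X**2*f)/mass, I(X*Y*f)/mass
    return (m11 - m10**2) / (m20 - m10**2)   # (6.A.26), by X <-> Y symmetry

rho = 0.5
for u in (2.0, 4.0, 8.0, 14.0):
    print(u, rho_cond(u, rho), rho*(1 + rho)/((1 - rho)*u**2))
```

The conditional correlation falls well below the unconditional ρ and decays toward zero, with the predicted 1/u² scaling emerging only for fairly large u (the subleading corrections in (6.A.27)–(6.A.28) are sizable at moderate thresholds).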
6.B Conditional Correlation Coefficient for Student's Variables

6.B.1 Proposition

Let us consider a pair of Student's random variables (X, Y) with ν > 2 degrees of freedom and unconditional correlation coefficient ρ. Let A be a subset of R such that Pr{Y ∈ A} > 0. The correlation coefficient of (X, Y), conditioned on Y ∈ A, defined by
\[
\rho_A = \frac{\operatorname{Cov}(X, Y \mid Y \in A)}{\sqrt{\operatorname{Var}(X \mid Y \in A)\,\operatorname{Var}(Y \mid Y \in A)}} \qquad (6.B.30)
\]
can be expressed as
\[
\rho_A = \frac{\rho}{\sqrt{\rho^2 + \dfrac{E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,]}{\operatorname{Var}(Y \mid Y \in A)}}} \,, \qquad (6.B.31)
\]
with
\[
\operatorname{Var}(Y \mid Y \in A) = \nu \left[ \frac{\nu-1}{\nu-2} \cdot \frac{\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu-2\right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right] - \left[ \frac{\int_{y \in A} dy\; y\, t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \right]^2 , \qquad (6.B.32)
\]
where t_ν(y) is given below by (6.B.36) and
\[
E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,] = (1-\rho^2)\, \frac{\nu}{\nu-2} \cdot \frac{\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu-2\right\}}{\Pr\{Y \in A \mid \nu\}} \,. \qquad (6.B.33)
\]
Here Pr{· | ν} denotes a probability evaluated under the Student distribution with ν degrees of freedom.
6.B.2 Proof of the Proposition

Let the variables X and Y have a multivariate Student distribution with ν > 2 degrees of freedom and a correlation coefficient ρ:
\[
P_{XY}(x,y) = \frac{\Gamma\!\left(\frac{\nu+2}{2}\right)}{\nu\pi\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{1-\rho^2}} \left( 1 + \frac{x^2 - 2\rho x y + y^2}{\nu\,(1-\rho^2)} \right)^{-\frac{\nu+2}{2}} , \qquad (6.B.34)
\]
\[
\phantom{P_{XY}(x,y)} = \left(\frac{\nu+1}{\nu+y^2}\right)^{1/2} \frac{1}{\sqrt{1-\rho^2}}\; t_\nu(y) \cdot t_{\nu+1}\!\left( \left(\frac{\nu+1}{\nu+y^2}\right)^{1/2} \frac{x - \rho y}{\sqrt{1-\rho^2}} \right) , \qquad (6.B.35)
\]
where t_ν(·) denotes the univariate Student density with ν degrees of freedom
\[
t_\nu(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)(\nu\pi)^{1/2}} \cdot \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}} = \frac{C_\nu}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}} \,. \qquad (6.B.36)
\]
Let us evaluate Cov(X, Y | Y ∈ A):
\[
\operatorname{Cov}(X, Y \mid Y \in A) = E(X \cdot Y \mid Y \in A) - E(X \mid Y \in A) \cdot E(Y \mid Y \in A) = E(E(X \mid Y) \cdot Y \mid Y \in A) - E(E(X \mid Y) \mid Y \in A) \cdot E(Y \mid Y \in A) . \qquad (6.B.37)
\]
As can be seen in equation (6.B.35), E(X | Y) = ρY, which gives
\[
\operatorname{Cov}(X, Y \mid Y \in A) = \rho \cdot E(Y^2 \mid Y \in A) - \rho \cdot E(Y \mid Y \in A)^2 \qquad (6.B.38)
\]
\[
\phantom{\operatorname{Cov}(X, Y \mid Y \in A)} = \rho \cdot \operatorname{Var}(Y \mid Y \in A) . \qquad (6.B.39)
\]
Thus, we have
\[
\rho_A = \rho\, \sqrt{\frac{\operatorname{Var}(Y \mid Y \in A)}{\operatorname{Var}(X \mid Y \in A)}} \,. \qquad (6.B.40)
\]
Using the same method as for the calculation of Cov(X, Y | Y ∈ A), we find
\[
\operatorname{Var}(X \mid Y \in A) = E[\,E(X^2 \mid Y) \mid Y \in A\,] - E[\,E(X \mid Y) \mid Y \in A\,]^2 = E[\,E(X^2 \mid Y) \mid Y \in A\,] - \rho^2 \cdot E[Y \mid Y \in A]^2 \qquad (6.B.41)
\]
\[
\phantom{\operatorname{Var}(X \mid Y \in A)} = E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,] + \rho^2 \cdot \operatorname{Var}[Y \mid Y \in A] ,
\]
which yields (6.B.31).
To go one step further, we have to evaluate the three terms E(Y | Y ∈ A), E(Y² | Y ∈ A), and E[E(X² | Y) | Y ∈ A].
The first one is trivial to calculate:
\[
E(Y \mid Y \in A) = \frac{\int_{y \in A} dy\; y\, t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \,. \qquad (6.B.42)
\]
The second one gives
\[
E(Y^2 \mid Y \in A) = \frac{\int_{y \in A} dy\; y^2\, t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \qquad (6.B.43)
\]
\[
= \nu \left[ \frac{\nu-1}{\nu-2} \cdot \frac{\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu-2\right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right] , \qquad (6.B.44)
\]
so that
\[
\operatorname{Var}(Y \mid Y \in A) = \nu \left[ \frac{\nu-1}{\nu-2} \cdot \frac{\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu-2\right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right] - \left[ \frac{\int_{y \in A} dy\; y\, t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \right]^2 . \qquad (6.B.45)
\]
To calculate the third term, we first need to evaluate E(X² | Y). Using equation (6.B.35) and the results given in [1], we find
\[
E(X^2 \mid Y) = \int dx\; x^2 \left(\frac{\nu+1}{\nu+y^2}\right)^{1/2} \frac{1}{\sqrt{1-\rho^2}}\; t_{\nu+1}\!\left( \left(\frac{\nu+1}{\nu+y^2}\right)^{1/2} \frac{x - \rho y}{\sqrt{1-\rho^2}} \right) = (1-\rho^2)\, \frac{\nu + y^2}{\nu - 1} + \rho^2 y^2 , \qquad (6.B.46)
\]
which yields
\[
E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,] = \frac{\nu}{\nu-1}\,(1-\rho^2) + \frac{1-\rho^2}{\nu-1}\, E[Y^2 \mid Y \in A] , \qquad (6.B.47)
\]
and applying the result given in equation (6.B.44), we finally obtain
\[
E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,] = (1-\rho^2)\, \frac{\nu}{\nu-2} \cdot \frac{\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu-2\right\}}{\Pr\{Y \in A \mid \nu\}} \,, \qquad (6.B.48)
\]
which concludes the proof.
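The key identity (6.B.46) can be verified directly: according to (6.B.35), conditionally on Y = y, X is distributed as ρy + s T_{ν+1} with s² = (1−ρ²)(ν+y²)/(ν+1). A sketch with hypothetical parameter values (our own illustration, trapezoidal quadrature on a truncated range):

```python
import numpy as np
from math import gamma, sqrt, pi

def t_density(x, nu):
    # Univariate Student density (6.B.36)
    C = gamma((nu + 1)/2) / (gamma(nu/2) * sqrt(nu*pi))
    return C * (1 + x*x/nu) ** (-(nu + 1)/2)

def ex2_given_y(y, nu, rho, xmax=400.0, n=400001):
    # E(X^2 | Y=y), integrating the conditional density implied by (6.B.35):
    # (X | Y=y) = rho*y + s*T_{nu+1}, with s^2 = (1-rho^2)(nu+y^2)/(nu+1)
    s = sqrt((1 - rho**2) * (nu + y*y) / (nu + 1))
    x = np.linspace(-xmax, xmax, n)
    w = np.full(n, x[1] - x[0]); w[0] *= 0.5; w[-1] *= 0.5
    return float(np.sum(w * x**2 * t_density((x - rho*y)/s, nu + 1) / s))

nu, rho, y = 4.0, 0.5, 3.0
exact = (1 - rho**2)*(nu + y*y)/(nu - 1) + (rho*y)**2   # r.h.s. of (6.B.46)
print(ex2_given_y(y, nu, rho), exact)  # both close to 5.5
```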

6.B.3 Conditioning on Y Larger Than v

The conditioning set is A = [v, +∞), thus
\[
\Pr\{Y \in A \mid \nu\} = \bar{T}_\nu(v) = \frac{\nu^{\frac{\nu-1}{2}}\, C_\nu}{v^\nu} + O\!\left(v^{-(\nu+2)}\right) , \qquad (6.B.49)
\]
\[
\Pr\!\left\{\sqrt{\tfrac{\nu}{\nu-p}}\, Y \in A \mid \nu-p\right\} = \bar{T}_{\nu-p}\!\left(\sqrt{\tfrac{\nu-p}{\nu}}\, v\right) = \frac{\nu^{\frac{\nu-p}{2}}\, C_{\nu-p}}{(\nu-p)^{\frac{1}{2}}\, v^{\nu-p}} + O\!\left(v^{-(\nu-p+2)}\right) , \qquad (6.B.50)
\]
\[
\int_{y \in A} dy\; y\, t_\nu(y) = \sqrt{\frac{\nu}{\nu-2}}\; t_{\nu-2}\!\left(\sqrt{\tfrac{\nu-2}{\nu}}\, v\right) = \frac{\nu^{\frac{\nu}{2}}\, C_{\nu-2}}{\sqrt{\nu-2}\; v^{\nu-1}} + O\!\left(v^{-(\nu+1)}\right) , \qquad (6.B.51)
\]
where t_ν(·) and T̄_ν(·) denote respectively the density and the survival distribution of the Student distribution with ν degrees of freedom and C_ν is defined in (6.B.36).

Using equation (6.B.31), one can thus give the exact expression of ρ+_v. Since it is very cumbersome, we will not write it explicitly. We will only give the asymptotic expression of ρ+_v:
\[
\operatorname{Var}(Y \mid Y \in A) = \frac{\nu}{(\nu-2)(\nu-1)^2}\, v^2 + O(1) , \qquad (6.B.52)
\]
\[
E[\,E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A\,] = \frac{\nu}{\nu-2}\, \frac{1-\rho^2}{\nu-1}\, v^2 + O(1) . \qquad (6.B.53)
\]
Thus, for large v,
\[
\rho^+_v \longrightarrow \frac{\rho}{\sqrt{\rho^2 + (\nu-1)(1-\rho^2)}} \,. \qquad (6.B.54)
\]

6.B.4 Conditioning on |Y| Larger Than v

The conditioning set is now A = (−∞, −v] ∪ [v, +∞), with v ∈ R+. Thus, the right-hand sides of equations (6.B.49) and (6.B.50) have to be multiplied by two, while
\[
\int_{y \in A} dy\; y\, t_\nu(y) = 0 \qquad (6.B.55)
\]
for symmetry reasons. So, equation (6.B.53) still holds while
\[
\operatorname{Var}(Y \mid Y \in A) = \frac{\nu}{\nu-2}\, v^2 + O(1) . \qquad (6.B.56)
\]
Thus, for large v,
\[
\rho^s_v \longrightarrow \frac{\rho}{\sqrt{\rho^2 + \dfrac{1-\rho^2}{\nu-1}}} \,. \qquad (6.B.57)
\]

6.B.5 Conditioning on Y > v Versus on |Y | > v

The results (6.B.54) and (6.B.57) are valid for ν > 2, as one can expect since the second moment must exist for the correlation coefficient to be defined. Contrarily to the Gaussian case, the choice of the conditioning set is not really important. Indeed, with both conditioning sets, ρ+_v and ρs_v go to constants different from zero and from (plus or minus) one when v goes to infinity. This striking difference with the Gaussian case can be explained by the large fluctuations allowed by the Student's distribution, and can be related to the fact that the coefficient of tail dependence for this distribution does not vanish even when the variables are anticorrelated (see Sect. 4.5.3).
Contrarily to the Gaussian distribution which binds the fluctuations of the variables near the origin, the Student's distribution allows for "wild" fluctuations. These properties are thus responsible for the result that, contrarily to the Gaussian case for which the conditional correlation coefficient goes to zero when conditioned on large signed values and goes to one when conditioned on large unsigned values, the conditional correlation coefficient for Student's variables has a similar behavior in both cases. Intuitively, the large fluctuations of X for large v dominate and control the asymptotic dependence.
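The limits (6.B.54) and (6.B.57) can be illustrated by evaluating ρ_A from (6.B.31) and (6.B.47) at finite v, with the tail moments of t_ν computed by quadrature after the change of variable y = v/t (an ad hoc numerical device, not from the original text):

```python
import numpy as np
from math import gamma, sqrt, pi

def t_density(y, nu):
    # Univariate Student density (6.B.36)
    C = gamma((nu + 1)/2) / (gamma(nu/2) * sqrt(nu*pi))
    return C * (1 + y*y/nu) ** (-(nu + 1)/2)

def tail_moments(v, nu, n=400001):
    # Pr{Y>v}, E[Y|Y>v], E[Y^2|Y>v] for Y ~ t_nu, via y = v/t, t in (0, 1]
    t = np.linspace(0.0, 1.0, n)
    w = np.full(n, t[1] - t[0]); w[0] *= 0.5; w[-1] *= 0.5
    ts = np.where(t > 0, t, 1.0)       # dummy value at t = 0
    y = v / ts
    base = t_density(y, nu) * v / ts**2
    base[0] = 0.0                      # integrand vanishes at t = 0 for nu > 3
    p = float(np.sum(w * base))
    m1 = float(np.sum(w * y * base)) / p
    m2 = float(np.sum(w * y**2 * base)) / p
    return p, m1, m2

def rho_cond(v, nu, rho, signed=True):
    # rho_A from (6.B.31) and (6.B.47); A = [v, oo) if signed, else |Y| > v
    p, m1, m2 = tail_moments(v, nu)
    var_y = m2 - m1**2 if signed else m2   # E[Y | |Y|>v] = 0 by symmetry
    e_term = (1 - rho**2) * (nu + m2) / (nu - 1)
    return rho / sqrt(rho**2 + e_term / var_y)

nu, rho = 5.0, 0.5
lim_plus = rho / sqrt(rho**2 + (nu - 1)*(1 - rho**2))    # (6.B.54)
lim_sym = rho / sqrt(rho**2 + (1 - rho**2)/(nu - 1))     # (6.B.57)
print(rho_cond(100.0, nu, rho, True), lim_plus)
print(rho_cond(100.0, nu, rho, False), lim_sym)
```

Already at v = 100 the two conditional correlations are within about one percent of their nonzero limits, in sharp contrast with the Gaussian case.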
6.C Conditional Spearman's Rho

To obtain the conditional Spearman's rho defined in (6.23), we need a few intermediate calculations. We have
\[
E[\,\cdot \mid V \ge \tilde{v}\,] = \frac{\int_{\tilde{v}}^1 \int_0^1 \cdot\; dC(u,v)}{\int_{\tilde{v}}^1 \int_0^1 dC(u,v)} = \frac{1}{1-\tilde{v}} \int_{\tilde{v}}^1 \int_0^1 \cdot\; dC(u,v) . \qquad (6.C.58)
\]
Performing a simple integration by parts, we obtain
\[
E[U \mid V \ge \tilde{v}] = \frac{1}{1-\tilde{v}} \left[ \frac{1}{2} - \tilde{v} + \int_0^1 du\; C(u,\tilde{v}) \right] , \qquad (6.C.59)
\]
\[
E[V \mid V \ge \tilde{v}] = \frac{1+\tilde{v}}{2} , \qquad (6.C.60)
\]
\[
E[U^2 \mid V \ge \tilde{v}] = \frac{1}{1-\tilde{v}} \left[ \frac{1}{3} - \tilde{v} + 2 \int_0^1 du\; u\, C(u,\tilde{v}) \right] , \qquad (6.C.61)
\]
\[
E[V^2 \mid V \ge \tilde{v}] = \frac{\tilde{v}^2 + \tilde{v} + 1}{3} , \qquad (6.C.62)
\]
\[
E[U \cdot V \mid V \ge \tilde{v}] = \frac{1+\tilde{v}}{2} + \frac{1}{1-\tilde{v}} \left[ \int_{\tilde{v}}^1 dv \int_0^1 du\; C(u,v) + \tilde{v} \int_0^1 du\; C(u,\tilde{v}) - \frac{1}{2} \right] , \qquad (6.C.63)
\]
which yields
\[
\operatorname{Cov}(U, V \mid V \ge \tilde{v}) = \frac{1}{1-\tilde{v}} \int_{\tilde{v}}^1 dv \int_0^1 du\; C(u,v) - \frac{1}{2} \int_0^1 du\; C(u,\tilde{v}) - \frac{1}{4} ,
\]
\[
\operatorname{Var}(U \mid V \ge \tilde{v}) = \frac{1-4\tilde{v}}{12\,(1-\tilde{v})^2} + \frac{2}{1-\tilde{v}} \int_0^1 du\; u\, C(u,\tilde{v}) + \frac{2\tilde{v}-1}{(1-\tilde{v})^2} \int_0^1 du\; C(u,\tilde{v}) - \frac{1}{(1-\tilde{v})^2} \left[ \int_0^1 du\; C(u,\tilde{v}) \right]^2 ,
\]
\[
\operatorname{Var}(V \mid V \ge \tilde{v}) = \frac{(1-\tilde{v})^2}{12} ,
\]
so that
\[
\rho_s(\tilde{v}) = \frac{\dfrac{12}{1-\tilde{v}} \int_{\tilde{v}}^1 dv \int_0^1 du\; C(u,v) - 6 \int_0^1 du\; C(u,\tilde{v}) - 3}{\sqrt{D}} \,, \qquad (6.C.64)
\]
with
\[
D = 1 - 4\tilde{v} + 24\,(1-\tilde{v}) \int_0^1 du\; u\, C(u,\tilde{v}) + 12\,(2\tilde{v}-1) \int_0^1 du\; C(u,\tilde{v}) - 12 \left[ \int_0^1 du\; C(u,\tilde{v}) \right]^2 .
\]
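Formula (6.C.64) can be exercised on simple closed-form copulas: for the independence copula C(u, v) = uv one must find ρ_s(ṽ) = 0 for all ṽ, and for the Farlie–Gumbel–Morgenstern (FGM) copula at ṽ = 0 one must recover the unconditional value θ/3. A sketch (our own illustration; the quadrature parameters are ad hoc):

```python
import numpy as np

def rho_s(C, vtil, n=2001):
    # Conditional Spearman's rho (6.C.64), all integrals by trapezoid rule
    u = np.linspace(0.0, 1.0, n)
    wu = np.full(n, u[1] - u[0]); wu[0] *= 0.5; wu[-1] *= 0.5
    I1 = float(np.sum(wu * C(u, vtil)))        # int_0^1 C(u, vtil) du
    Iu = float(np.sum(wu * u * C(u, vtil)))    # int_0^1 u C(u, vtil) du
    v = np.linspace(vtil, 1.0, n)
    wv = np.full(n, v[1] - v[0]); wv[0] *= 0.5; wv[-1] *= 0.5
    U, V = np.meshgrid(u, v, indexing="ij")
    J = float(np.einsum("i,j,ij->", wu, wv, C(U, V)))
    num = 12.0*J/(1.0 - vtil) - 6.0*I1 - 3.0
    D = (1.0 - 4.0*vtil + 24.0*(1.0 - vtil)*Iu
         + 12.0*(2.0*vtil - 1.0)*I1 - 12.0*I1**2)
    return num / np.sqrt(D)

indep = lambda u, v: u * v
theta = 0.5
fgm = lambda u, v: u*v*(1.0 + theta*(1.0 - u)*(1.0 - v))   # FGM copula

print(rho_s(indep, 0.3))   # ~0 (independence)
print(rho_s(fgm, 0.0))     # ~theta/3, the unconditional Spearman's rho of FGM
```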
7 Summary and Outlook

7.1 Synthesis
A common theme underlying the chapters of this book is that many impor-
tant applications of risk management rely on the assessment of the positive
or negative outcomes of uncertain positions. The probability theory and sta-
tistics, together with the valuation of losses incurred for a given exposition to
various risk factors, take a predominant place in this process. However, they
are not, by far, the sole ingredients needed in an efficient risk management
system. Quoting Andrew Lo [311], one can assert that
[Although most] current risk-management practices are based on prob-
abilities of extreme dollar losses (e.g., measures like Value-at-Risk),
[. . . ] these measures capture only part of the story. Any complete risk
management system must address two other important factors: prices
and preferences. Together with probabilities, these comprise the three
P’s of Total Risk Management. [Understanding] how the three P’s in-
teract [allows] to determine sensible risk profiles for corporations and
for individuals, guidelines for how much risk to bear and how much
to hedge. By synthesizing existing research in economics, psychology,
and decision sciences, and through an ambitious research agenda to
extend this synthesis into other disciplines, a complete and systematic
approach to rational decision making in an uncertain world is within
reach.
Among the three P’s, Probability constitutes today, in our opinion, the
most solid pillar of risk management, because it has reached the highest level
of maturation. Compared with Price and Preference, the Probability theory is
clearly the most developed in terms of its mathematical formulation, providing
important and accurate quantitative results.
Asset valuation – and therefore Price assessment – is also very developed
quantitatively, but it remains, for a large part, subordinated to the quality of
the estimation of the probabilities. Indeed, a cornerstone of modern finance
theory holds that the (fair) value of a given investment vehicle is nothing
but the mathematical expectation – under a suitable probability measure –
of the future discounted cash-flows generated by this investment vehicle. The
assessment of future cash-flows for complex investment vehicles is an exercise of pure financial analysis but, without a correct probability
measure, this exercise has little value. This indubitably shows that Prices and
Probabilities are inextricably entangled and that an accurate price assessment
requires an accurate determination of the probabilities.
Preferences are also of crucial importance – in fact the most important of
the three P’s, according to Lo – since under this term is embodied the entire
human decision making process. But here, in contrast to the two other P’s,
our knowledge is still in its infancy. The pioneering theoretical work by von Neumann and Morgenstern [482] has laid the foundations of a rational decision
theory. However, this theory has been undermined over the years by several
paradoxes and deficiencies [4, 5], when tested against real human preferences.
Most of the recent theories, notably those directly inspired by psychological
studies [204, 263, 352, and references therein], attempt to cure the original
rational decision theory from its inconsistencies. But, one should recognize
that, while significant qualitative progress has been obtained, there is not yet
a satisfying fully operational theory of decision making.
For all these reasons, Probability still plays a dominant role in current
risk management practice. And we firmly believe that this supremacy will
extend well into the future, in view of the still large remaining potential for
improvement. Of course, the modern science of human psychology and decision
making is in constant progression and accounts better and better for the many
anomalies observed on financial markets. However, its fusion with finance,
which has given birth to the field of “behavioral finance,” will provide useful
practical tools only with the development of accurate quantitative predictions.
Until then, behavioral finance will continue to be mostly the playground of
academic research. We thus believe that, in the next few years, the most
important improvements in applied risk management will occur through more
elaborate modeling of financial markets and more generally of the economic
environment.
In spite of the key role of Price and Preference, this book has mainly
focused on the role of Probability in the risk assessment and management
processes. The different probabilistic concepts presented in the core chapters
of this book should provide a better understanding and modeling of the various
sources of uncertainty and therefore of risk factors that investors are facing.
Our presentation has been organized around the key idea that the risk of a
set of positions can be decomposed into two major components:
(i) the marginal risks associated with the variations of wealth of each risky
position,
(ii) the cross-dependence between the change in the wealth of each position.
This decomposition has been justified in Chap. 3 by the introduction of the notion of copula, which is the pivot of the book.
With respect to the marginal risks, Chap. 2 has highlighted the weaknesses
of traditional methods used to assess large downside risks, which are generally
based upon tools derived from the extreme value theory. As an alternative, we
have advocated using comprehensive parametric distributions which provide a
good compromise between weak model errors – inherent to any parametric risk
measurement – and accurate risk estimates. Accounting for model error is of
vital importance for risk management purposes. This aspect is often forgotten,
but it really plays a prominent role in the risk assessment process. Consider
for instance the VaR estimates obtained under the Gaussian hypothesis. We
have seen that, for large confidence levels, they have only little value and, yet,
this class of models is still promoted by regulating institutions such as the
Bank for International Settlements [42].
Concerning the cross-dependence between assets, we have emphasized that
copulas are the most fundamental concept and tool. They should therefore
constitute a cornerstone of modern risk management practices. Among many
properties, this results in particular from the unique and optimal separation
between individual risks and collective dependence that they provide. As a
consequence, copulas have been shown to exhibit an unsurpassed flexibility
and versatility for the elaboration of scenarios. This comes, however, at the
cost of the preliminary calibration of the best copula, a problem which has not
yet received a fully satisfying answer, as reviewed in Chap. 5. Nevertheless,
the few families of copulas, which have been surveyed in Chap. 3, appear to
be reasonable candidates for modeling the dependence structures of arbitrary
baskets of financial assets and therefore allow for a relatively easy and useful
generation of case studies.
Chapters 5 and 6 have unearthed an important and somewhat surprising
difference in the dependence between currencies and between stocks. Overall,
the analyses which have been presented find a much weaker dependence be-
tween stocks than between currencies. This is reflected quantitatively by the
fact that the dependence between currencies can be described by Student’s
copulas with a low number of degrees of freedom (typically 4 to 6). In contrast,
the dependence between stock returns requires a larger number of degrees of
freedom (10 or more). This observation raises important questions for interna-
tionally diversified portfolios. Indeed, if the dependence between the returns
of foreign exchange rates is much stronger than the dependence between stock
returns as reported in Chap. 5, the benefits of international diversification can
vanish. It is true that various national stock markets exhibit anticorrelations
between some stocks and at some epochs, which a priori would justify holding
stocks from different national markets. However, if the corresponding curren-
cies are positively associated with strong tail dependences, as in the Latin
American markets (see Chap. 6), the diversification effect mostly disappears
once the gains or losses of the stocks are translated into the same monetary
unit. In particular, once expressed in the domestic currency of the holder of
an internationally allocated portfolio, the diversification effect seems to disappear at times of turmoil, that is, exactly when the investor needs it the most.
Chapter 6 has examined in depth this question of possible changes of de-
pendence during financial crises. Two possible explanations have been consid-
ered: (i) contamination operating via the sensitivity of dependence measures
to the changing volatility level, or (ii) contagion reflecting a genuine change
of dependence. The latter case leads to inefficient allocations when neglected.
However, the most important conclusion for risk management is not so much
the distinction between contamination or contagion but the presence of a
strong tail dependence that may exist between national markets, because it
destroys the benefits of diversification across countries when one market goes
in crisis.

7.2 Outlook and Future Directions
Our exposition has mainly focused on the concept of cross-sectional depen-
dence between several random variables, but there are many aspects of the
question, such as time dependencies, which have been only barely touched
or which have been actually neglected. It may be useful to discuss them so
as to provide a better appreciation of the limits of the methods proposed in
this book and consequently of their domains of application. It is also useful
to delineate exciting directions for future improvements of risk management practice.

7.2.1 Robust and Adaptive Estimation of Dependences

A major concern, especially for practitioners, is whether this whole mathematical edifice, its algorithmic implementations and its rigorous statistical
tests are relevant and useful for the really important risks, such as global
market moves and crashes. It is indeed a common experience that the depen-
dence estimated and predicted by standard models change dramatically at
certain times, not only during crashes, but also when the market exhibits a
collective downward plunge. A quite common observation is that investment
strategies, which have some moderate β (coefficient of regression to the mar-
ket) for normal times, can see their β jumps to a much larger value (close to 1
or larger depending on the leverage of the investment) at certain times when
the market collectively dives. However, investments which are thought to be
hedged against negative global market trends may actually lose as much or
more than the global market, at certain times when a large majority of stocks
plunge simultaneously.
This question has been touched upon in Chap. 4 when discussing the
possible strategies for preventing the large downward moves of a portfolio,
based upon its tail dependence with the market. We will also revisit this
problem in the context of the occurrence of “outliers” and of time-varying dependence. But it contains two other components: (1) Is the estimation of
the dependence meaningful and really robust? (2) If not, what can be done?
The first question deals with the development of robust estimation tech-
niques, defined as methods which are insensitive to small departures from the
idealized assumptions which have been used to optimize the algorithm. Such
techniques include M-estimates (which follow from maximum likelihood con-
siderations), L-estimates (which are linear combinations of order statistics),
and R-estimates (based on statistical rank tests) [238, 479, 484].
The second question requires novel approaches. A possible one is inspired
from Herbert A. Simon, the famous economist and cognitive scientist studying
how people make real-world decisions, who observed that they seldom opti-
mize. “Rather people seek strategies that will work well enough, that include
hedges against various potential outcomes and that are adaptive. Tomorrow
will bring information unavailable today; therefore, people plan on revising
their plans” summarize Popper et al. [391]. In this spirit, people have de-
veloped an approach to look not for optimal strategies but for robust ones,
defined as strategies which perform well when compared with the alterna-
tives across a wide range of plausible futures. “It need not be the optimal
strategy in any future; it will, however, yield satisfactory outcomes in both
easy-to-envision futures and hard-to-anticipate contingencies. This approach
replicates the way people often reason about complicated and uncertain de-
cisions in everyday life” says Popper et al. The process of decision-making
under conditions of deep uncertainty requires first to consider ensembles of
scenarios, then to seek robust and adaptive strategies, and finally to combine
machine and human capabilities interactively. Outstanding questions involve
the compromise between near-term objectives and long-term sustainability,
and the characterization of irreducible risks and of “surprises” [301].
In the same vein, the approach in terms of universal portfolios initiated by
Cover [113] a decade ago has opened the way to many studies [63, 227, 261].
Assuming for instance that one invests in a constantly rebalanced strategy,
the question amounts to determining the best weights (the fraction of wealth
invested in each stock) of this strategy for an investment horizon T . Since
the optimal weights can only be assessed ex-ante – once the time T has been
reached – it seems impossible to design ex-ante a strategy whose performance
will compare with the performance of the best strategy with hindsight. How-
ever, it turns out that the universal approach promoted by Cover circumvents
this problem. It consists in investing uniformly in all constantly rebalanced
portfolio strategies. This results in a strategy that is nearly optimal in the
sense that, for any sequence of stock market outcomes, this particular invest-
ment strategy has a performance comparable to the best constantly rebalanced
portfolio in the long run. More generally, any universal strategy is such that
its average logarithmic performance over horizon T approaches the best ex-
post average logarithmic performance over the same horizon T , in the limit of
long horizon T , irrespective of the market price sequence from now to time T .
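The mechanics of Cover's universal portfolio can be illustrated with a toy two-asset example (entirely synthetic numbers; the uniform mixture below is a grid approximation of the integral over all constantly rebalanced strategies):

```python
import numpy as np

# Toy example of Cover's universal portfolio with two assets and an
# alternating sequence of price relatives (purely synthetic numbers).
T = 40
x = np.tile([[2.0, 0.5], [0.5, 2.0]], (T // 2, 1))   # shape (T, 2)

def crp_wealth(b, x):
    # Final wealth of the constantly rebalanced portfolio (CRP) holding a
    # constant fraction b in asset 1 and 1-b in asset 2
    return float(np.prod(b*x[:, 0] + (1.0 - b)*x[:, 1]))

grid = np.linspace(0.0, 1.0, 1001)
wealth = np.array([crp_wealth(b, x) for b in grid])
S_best, best_b = wealth.max(), grid[wealth.argmax()]
S_univ = wealth.mean()    # uniform mixture over all CRPs (universal portfolio)

print(best_b)                                   # 0.5 for this alternating sequence
print(np.log(S_best)/T, np.log(S_univ)/T)       # per-period log growth rates
```

For two assets with a uniform prior, the universal wealth is known to satisfy S_univ ≥ S_best/(T+1), so the gap between the per-period logarithmic growth rates is at most log(T+1)/T, which vanishes as the horizon T grows.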
7.2.2 Outliers, Kings, Black Swans and Their Dependence

In its conclusion, Chap. 2 notes the existence of “outliers” (also called “kings”
or “black swans”), in the distribution of financial risks measured at variable
time scales such as with drawdowns. These outliers are identified only with
metrics adapted to take into account transient increases of the time depen-
dence in the time series of returns of individual financial assets [249] (see also
Chap. 3 of [450]). These outliers seem to belong to a statistical population
which is different from the bulk of the distribution and require some additional
amplification mechanisms active only at special times.
Chapter 5 shows that two exceptional events in the period from January
1989 to December 1998 stand out in statistical tests determining the relevance
of the Gaussian copula to describe the dependence between the German Mark
and the Swiss Franc. The first of the two events is the coup against Gorbachev
in Moscow on 19 August, 1991 for which the German mark (respectively the
Swiss Franc) lost 3.37% (respectively 0.74%) against the US dollar. The second
event occurred on 10 September, 1997, and corresponds to an appreciation of
the German Mark of 0.60% against the US dollar while the Swiss Franc lost
0.79% which represents a moderate move for each currency, but a large joint
move.
The presence of such outliers both in marginal distributions and in con-
comitant moves, together with the strong impact of crises and of crashes,
suggests the need for novel measures of dependence between drawdowns and
other time-varying metrics across different assets. This program is part of
the more general need for a joint multi-time-scale and multi-asset approach
to dependence. Examples of efforts in this direction include multidimensional
GARCH models [23, 45, 46, 154, 296, 400, 477] and the multivariate multi-
fractal random walk [366]. It also epitomizes the need for new multi-period
risk measures, which would account for this class of events. Several avenues
of research have recently been opened by attempting to generalize the no-
tions of Value-at-Risk and of coherent measures of risk within a multi-period
framework [21, 405, 483].

7.2.3 Endogeneity Versus Exogeneity

The presence of outliers such as those mentioned in the previous section poses
the problem of exogeneity versus endogeneity. An event identified as anomalous
could perhaps be cataloged as resulting from exogenous influences.1
The same issue has been investigated in Chap. 6 when testing for contagion
versus contamination in the Latin American crises. Contamination refers to
an endogenous dependence described by an approximately constant copula.
In contrast, contagion is by definition the concept that the dependence has
changed, either transiently or with lasting effects, due to some influence or
mechanism which is exogenous (in a sense that needs to be defined precisely)
to the previous regime.
1 However, outliers may also have an endogenous origin, as described for financial
crashes [250, 449, 450].
The concept of exogeneity2 is fundamental in empirical econometric mod-
eling and statistical estimation (see for instance [152, 156]). Here, we refer
to the question of exogeneity versus endogeneity in the broader context of
self-organized criticality3 [32, 231, 246, 451], inspired in particular from the
physical and natural sciences. According to self-organized criticality, extreme
events are seen to be endogenous, in contrast with previous prevailing views
(see for instance the discussion in [33, 448]). But, how can one assert with
100% confidence that a given extreme event is really due to an endogenous
self-organization of the system, rather than to the response to an external
shock? Most natural and social systems are indeed continuously subjected
to external stimulations, noises, shocks, solicitations, and forcing, which can
widely vary in amplitude. It is thus not clear a priori if a given large event
is due to a strong exogenous shock, to the internal dynamics of the system,
or maybe to a combination of both. Addressing this question is fundamental
for understanding the relative importance of self-organization versus external
forcing in complex systems and underpins much of the problem of dependence
between variables.
The question of whether distinguishing properties characterize endogenous
versus exogenous shocks permeates many systems, for instance, biological
extinctions such as the Cretaceous/Tertiary KT boundary (meteorite versus
extreme volcanic activity versus self-organized critical extinction cascades),
commercial successes (progressive reputation cascade versus the result of a
well-orchestrated advertisement), immune system deficiencies (external vi-
ral/bacterial infections versus internal cascades of regulatory breakdowns),
the aviation industry recession (9/11 versus structural endogenous problems),
discoveries (serendipity versus the outcome of slow endogenous maturation
processes), cognition and brain learning processes (role of external inputs ver-
sus internal self-organization and reinforcements) and recovery after wars (in-
ternally generated – i.e., civil wars – versus imported from the outside) and
so on. In economics, endogeneity versus exogeneity has been hotly debated
for decades. A prominent example is the theory of Schumpeter [432] on the
importance of technological discontinuities in economic history. Schumpeter
argued that “evolution is lopsided, discontinuous, disharmonious by nature...
studded with violent outbursts and catastrophes... more like a series of
explosions than a gentle, though incessant, transformation”. Endogeneity versus
exogeneity is also paramount in economic growth theory [415].
2 In a nutshell, conditioning on an exogenous variable does not decrease the
amount of information in parameter estimation [152].
3 Self-organized criticality is part of the theory of complex systems.
Several pieces of evidence for quantitative signatures distinguishing exogenous
from endogenous shocks have recently been described. Concerning the way the
continuous stream of news gets incorporated into market prices for instance,
it has recently been shown how one can distinguish the effects of events like
the 11 September, 2001 attack or the coup against Gorbachev on 19 August,
1991 from events like financial crashes such as October, 1987 as well as smaller
volatility bursts. Based on a stochastic volatility model with long range de-
pendence (the so-called “multifractal random walk”, whose main properties
are given in Appendix 2.A), Sornette et al. [456] have predicted different re-
sponse functions of the volatility to large external shocks compared with what
we term endogenous shocks, i.e., shocks which result from the cooperative
accumulation of many small pieces of news. This theory, which has been successfully tested
against empirical data with no adjustable parameters, suggests a general clas-
sification into two classes of events (endogenous and exogenous) with specific
signatures and characteristic precursors for the endogenous class. It also pro-
poses a simple origin for endogenous shocks as the accumulation, in certain
circumstances, of tiny pieces of bad news that add coherently due to their persistence.
Another example supporting the existence of specific signatures distin-
guishing endogenous and exogenous events has been provided by a recent in-
vestigation concerning the origin of the success of best sellers [128, 455]. The
question is whether the latest best seller is simply the product of a clever
marketing campaign or whether it has truly permeated society. In other words,
can one determine whether a book’s popularity will wane as quickly as it
appeared or whether it will become a classic for future generations? The study in [455, 128]
describes a simple and generic method that distinguishes exogenous shocks
(e.g., very large news impact) from endogenous shocks (e.g., book that be-
comes a best seller by word of mouth) within the network of online buyers.
An endogenous shock appears slowly but results in a long-lived growth and
decay of sales due to small but very extensive interactions in the network of
buyers. In contrast, an exogenous shock appears suddenly and propels
a book to best seller status, but its sales typically decline rapidly as a power
law with an exponent larger than for endogenous shocks. These results suggest
that the network of human acquaintances is close to “critical,” with informa-
tion neither propagating nor disappearing but spreading marginally between
people. These results have interesting potential for marketing agencies, which
could measure and maximize the impact of their publicity on the network of
potential buyers, for instance.
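In both studies, the classification rests on the exponent of the power-law relaxation of activity after its peak. A minimal sketch of such an exponent fit, by linear regression in log-log coordinates on synthetic decays (illustrative only; it is not the estimation procedure of the works cited above):

```python
import numpy as np

def relaxation_exponent(activity):
    """Fit activity(t) ~ t**(-p) after the peak; return the exponent p."""
    peak = int(np.argmax(activity))
    decay = np.asarray(activity[peak:], dtype=float)
    t = np.arange(1, len(decay) + 1)       # time since the peak
    slope, _ = np.polyfit(np.log(t), np.log(decay), 1)
    return -slope

t = np.arange(1, 200)
exo = np.concatenate(([0.1, 0.2], 5.0 * t**-0.8))   # sharp peak, fast decay
endo = np.concatenate(([0.5, 2.0], 5.0 * t**-0.4))  # slower decay
print(relaxation_exponent(exo), relaxation_exponent(endo))
```

The exponent values 0.8 and 0.4 are arbitrary here; the point is only that a faster relaxation (larger exponent) would be classified as exogenous and a slower one as endogenous.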
These two examples show that the concepts of endogeneity and exogene-
ity should have many applications including the modeling and prediction of
financial crashes [250, 458], Initial Public Offerings (IPO) [245], the movie
industry [119] and many other domains related to marketing [452], for which
the mechanism of information cascade derives from the fact that agents can
observe box office revenues and communicate word of mouth about the quality
of the movies they have seen. The formulation of a comprehensive theory of
(time) dependence allowing one to characterize endogeneity and exogeneity and to
distinguish between them is thus of great importance for future developments.

7.2.4 Nonstationarity and Regime Switching in Dependence

The general problem of the application of mathematical statistics to
nonstationary data (including nonstationary time series) is very important, but alas,
not much can be done. There are only a few approaches which may be used
and only in specific conditions:
• Use of algorithms and methods which are robust with respect to possible
nonstationarity in data, such as normalization procedures or the use of
quantile samples instead of initial samples.
• Model nonstationarity by some low-frequency random processes, such as,
e.g., a narrow-band random process X(t) = A(t) cos(ωt + φ(t)), where ω ≪ 1
and the amplitude A(t) and the phase φ(t) are slowly varying.
In this case, the Hilbert transform can be very useful to characterize φ(t)
nonparametrically [79, 379].
• The estimation of the parameters of a low-frequency process based on a
“short” realization is often hopeless. In this case, the only quantity which
can be evaluated is the uncertainty (or scatter) of the results due to the
nonstationarity.
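For the narrow-band model of the second item, the Hilbert transform gives the analytic signal, from which the slowly varying amplitude and phase can be read off nonparametrically. A numpy-only sketch (scipy.signal.hilbert implements the same construction):

```python
import numpy as np

def analytic_signal(x):
    """x + i*H[x] via the FFT (len(x) assumed even)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0     # keep the DC and Nyquist bins
    h[1:n // 2] = 2.0          # double positive frequencies...
    return np.fft.ifft(X * h)  # ...which zeroes the negative ones

# Narrow-band signal with slowly varying amplitude and phase.
n = 2048
t = np.arange(n)
omega = 2 * np.pi * 64 / n                 # carrier frequency
A = 1.0 + 0.3 * np.sin(2 * np.pi * t / n)  # slow amplitude modulation
phi = 0.5 * np.sin(4 * np.pi * t / n)      # slow phase modulation
x = A * np.cos(omega * t + phi)

z = analytic_signal(x)
A_est = np.abs(z)                     # recovered envelope A(t)
phase_est = np.unwrap(np.angle(z))    # recovered total phase omega*t + phi(t)
```

Because the modulations are slow compared with the carrier, the modulus and unwrapped angle of the analytic signal recover A(t) and ωt + φ(t) essentially exactly on this synthetic example.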
Regime switching, popularized by Hamilton [221] for autoregressive time
series models, is a special case of nonstationarity which can be handled with
specific methods. Regime switching has been extensively used in business cycle
analysis in order to describe the economic fluctuations in a rigorous statistical
framework. The key idea is that the parameters of a model may switch between
two (or more) regimes, where the switching is governed by a time-dependent
state variable St which typically takes two values, 0 or 1. When St = 0, the
parameters of the model are different from those when St = 1. Clearly, if St
were an observed variable, the parameters could simply be estimated using
dummy variable methods. Regime-switching methods rely on the observation
by Hamilton that, even when the state is unobservable, the parameters of the
model in each state can be estimated provided that restrictions are placed
on the probability process governing St . The simplest such restriction is to
assume that St obeys the dynamics of a first-order Markov chain, which means
that any persistence in the state is completely embodied in the value of the
state in the previous period. Many generalizations are under development.
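A minimal simulation sketch of this two-state setup, with illustrative parameter values; the state is kept observable here, so the dummy-variable estimation mentioned above applies, while Hamilton's filter is what handles the unobservable case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of the first-order Markov chain S_t in {0, 1}:
# P[i, j] = Prob(S_t = j | S_{t-1} = i). Values are illustrative.
P = np.array([[0.98, 0.02],
              [0.05, 0.95]])
mu = np.array([0.0005, -0.001])    # state-dependent mean return
sig = np.array([0.005, 0.02])      # state-dependent volatility

T = 50_000
S = np.empty(T, dtype=int)
S[0] = 0
for t in range(1, T):
    # persistence in the state is entirely embodied in S_{t-1}
    S[t] = rng.random() < P[S[t - 1], 1]
r = mu[S] + sig[S] * rng.standard_normal(T)

# With S_t observed, the parameters follow from dummy-variable splits:
for s in (0, 1):
    print(f"state {s}: mean {r[S == s].mean():.5f}, vol {r[S == s].std():.5f}")
```

Splitting the sample on the observed state recovers the two volatility regimes; when S_t is latent, the same parameters are recovered by maximizing the likelihood implied by the chain's transition probabilities.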
In recent studies, many workers have extended this idea to model regime
switches in the dependence structure of financial assets. We refer to [382, 498]
for a perspective on recent efforts and for further references. Developing these
ideas in the framework of copulas is a promising avenue for future research.
Regime switching is also appealing from a microeconomic viewpoint as it
may reflect the changing conventions used by investors. Conventions can be
formed by the belief of agents in the existence of correlations between information
and returns, for instance. Following this belief, agents try to estimate
this correlation from past time series and act on it, thus creating it [494]. An-
other mechanism for conventions is based on imitation and moods [460]. Both
mechanisms predict the existence of random abrupt changes. For the future,
it would be interesting to combine both mechanisms as they are arguably
present together in real markets, in order to clarify their relative importance
and interplay. Another important field of research is to combine these micro-
economic models with the tools developed to detect regime switching.

7.2.5 Time-Varying Lagged Dependence

Determining the arrow of causality between two time series X(t) and Y(t) has
a long history, especially in economics, econometrics and finance and it is often
asked which economic variable might influence other economic phenomena
[93, 199]. This question is raised in particular for the relationships between
respectively inflation and GDP, inflation and growth rate, interest rate and
stock market returns, exchange rate and stock prices, bond yields and stock
prices, returns and volatility [95], advertising and consumption and so on. One
simple naive measure is the lagged cross-correlation function

C_{X,Y}(τ) = Cov[X(t), Y(t + τ)] / √(Var[X] Var[Y]) .

Then, a maximum of C_{X,Y}(τ) at some nonzero positive time lag τ implies
that the knowledge of X at time t gives some information on the future real-
ization of Y at the later time t + τ. However, such correlations do not
necessarily imply causality in a strict sense, as a correlation may be mediated by a
common source influencing the two time series at different times. The concept
of Granger causality bypasses this problem by taking a pragmatic approach
based on predictability: if the knowledge of X(t) and of its past values im-
proves the prediction of Y (t + τ ) for some τ > 0, then it is said that X
Granger causes Y [22, 199] (see [98] for a recent extension to nonlinear time
series). Such a definition does not address the fundamental philosophical and
epistemological question of the real causality links between X and Y but has
been found useful in practice.
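A minimal sketch of this naive measure on synthetic series (ours), scanning the lagged correlation over positive lags and reading off the maximizing lag; a Granger test would instead compare nested predictive regressions:

```python
import numpy as np

def lagged_corr(x, y, max_lag):
    """C_{X,Y}(tau) = Corr[X(t), Y(t + tau)] for tau = 0..max_lag."""
    out = []
    for tau in range(max_lag + 1):
        xs, ys = x[:len(x) - tau], y[tau:]
        out.append(np.corrcoef(xs, ys)[0, 1])
    return np.array(out)

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = 0.8 * np.roll(x, 3) + 0.6 * rng.standard_normal(5000)  # Y lags X by 3 steps

c = lagged_corr(x, y, max_lag=10)
print("best lag:", int(np.argmax(c)))  # knowledge of X helps predict Y 3 steps ahead
```

On this stationary toy example the maximizing lag is unambiguous; the difficulty discussed next arises when the lag structure itself drifts over time.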
However, most economic and financial time series are not strictly sta-
tionary and the lagged correlation/dependence and/or causality between two
time series may be changing as a function of time, for instance reflecting regime
switches and/or changing agent expectations. It is thus important to define
tests of causality or of lagged dependence which are sufficiently reactive to such
regime switches, allowing one to follow almost in real time the evolving structure
of the causality. Cross-correlation methods and Granger causality tests require
a rather substantial amount of data in order to obtain reliable conclusions. In
addition, cross-correlation techniques are fundamentally linear measures of de-
pendence and may miss important nonlinear dependence properties. Granger
causality tests are most often formulated using linear parametric autoregres-
sive models. It may thus be that many of the paradoxes in macroeconomics
concerning the causal relationship between such variables as inflation, inflation
change, GDP growth rate and unemployment rate could result from an
inadequate description of the time-varying lag cross-sectional dependence.
Recently, a new method, called “Optimal thermal causal path”, has been
introduced [459]. It is both nonparametric and sufficiently general so as to
detect a priori arbitrary nonlinear dependence structures. Moreover, it is
specifically conceived so as to adapt to the time evolution of the causality
structure. The “Optimal thermal causal path” can be viewed as an extension
of the “time distance” measure, which amounts to comparing the trend lines
of two time series via their horizontal differences [212].
The development of generalized dependence measures using such time-
adaptive lag structure seems to be another promising domain of future devel-
opments.

7.2.6 Toward a Dynamical Microfoundation of Dependences

The need for the rather sophisticated statistical methods described in this
book, as well as the developments suggested in this concluding chapter, reflects
in our opinion the absence of a genuine fundamental economic understanding.
By comparison, in the natural sciences the need for such statistical
methods has been less pressing, probably because most of the fundamental
equations are known (at least at the macroscopic level) and the challenge lies
more in understanding the emergence of complex solutions from seemingly
simple mathematical formulations. In physics, for instance, the issues of de-
pendences raised in this book are better and more simply attacked from a
study of the fundamental dynamical equations. In contrast, we lack a deep
underpinning for understanding the mechanisms at the origin of the dynam-
ical behavior of financial markets. It is thus possible that the emerging field
of behavioral finance, with its sister fields of neuroeconomics and evolution-
ary psychology, and their exploration of the impact on decision making of
imperfect bounded subjective probability perceptions [36, 206, 437, 439, 474],
may provide a fundamental shift in our understanding and therefore in the
formulation of dependence between assets. This will have major impacts on
risk assessment and its optimization.
References

1. Abramowitz, M. and I.A. Stegun (1972) Handbook of Mathematical Functions.
Dover Publications, New York. 267
2. Acerbi, C. (2002) Spectral measures of risk: A coherent representation of sub-
jective risk aversion. Journal of Banking & Finance 26, 1505–1518. 11
3. Alexander, G.J. and A.M. Baptista (2002) Economic implications of using a
mean-VaR model for portfolio selection: A comparison with mean-variance
analysis. Journal of Economic Dynamics & Control 26, 1159–1193. 15
4. Allais, M. (1953) Le comportement de l’homme rationel devant le risque, cri-
tique des postulats de l’école américaine. Econometrica 21, 503–546. 4, 272
5. Allais, M. (1990) Allais Paradox. In The New Palgrave, Utility and Probability.
Macmillan, 3–9. 4, 272
6. Andersen, J.V. and D. Sornette (2001) Have your cake and eat it too: Increasing
returns while lowering large risks! Journal of Risk Finance 2, 70–82. 58, 233
7. Andersen, J.V. and D. Sornette (2005) A Mechanism for Pockets of Predictabil-
ity in Complex Adaptive Systems, Europhys. Lett., 70(5), 697–703. 23
8. Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001) The distrib-
ution of realized exchange rate volatility. Journal of the American Statistical
Association 96, 42–55. 219
9. Anderson, P.W. (1972) More is different (Broken symmetry and the nature of
the hierarchical structure of science). Science 177, 393–396. 22
10. Anderson, P.W., K.J. Arrow and D. Pines (1988) The Economy as an Evolving
Complex System. Santa Fe Institute Studies in the Sciences of Complexity 5.
Westview Press, Addison-Wesley, Redwood City CA, 1988. 14
11. Anderson, T.W. and D.A. Darling (1952) Asymptotic theory of certain “good-
ness of fit” criteria. Annals of Mathematical Statistics 23, 193–212. 61
12. Andersson, M., B. Eklund and J. Lyhagen (1999) A simple linear time series
model with misleading nonlinear properties. Economics Letters 65, 281–285.
43
13. Andreanov, A., F. Barbieri and O.C. Martin (2004) Large deviations in spin
glass ground state energies. European Physical Journal B 41, 365–375. 44
14. Ang, A. and G. Bekaert (2002) International asset allocation with regime shifts.
Review of Financial Studies 15, 1137–1187. 76, 231
15. Ang, A. and J. Chen (2002) Asymmetric correlations of equity portfolios. Jour-
nal of Financial Economics 63, 443–494. 231, 240, 262
16. Arneodo, A., E. Bacry and J.F. Muzy (1998) Random cascades on wavelet
dyadic trees. Journal of Mathematical Physics 39, 4142–4164. 82
17. Arneodo, A., J.F. Muzy and D. Sornette (1998) Direct causal cascade in the
stock market. European Physical Journal B 2, 277–282. 82
18. Arthur, W.B., S.N. Durlauf and D.A. Lane (1997) The Economy As an Evolv-
ing Complex System II. Santa Fe Institute Studies in the Sciences of Complexity
27 Westview Press, Addison-Wesley, Redwood City CA. 14
19. Artzner, P., F. Delbaen, J.M. Eber and D. Heath (1997) Thinking coherently.
Risk 10 (November), 68–71. 2, 4, 5
20. Artzner, P., F. Delbaen, J.M. Eber and D. Heath (1999) Coherent measures of
risk. Mathematical Finance 9, 203–228. 2, 4, 5
21. Artzner, P., F. Delbaen, J.M. Eber, D. Heath and H. Ku (2004) Coherent
multiperiod risk adjusted values and Bellman’s principle. Working Paper. 276
22. Ashley, R., C.W.J. Granger and R. Schmalensee (1980) Advertising and ag-
gregate consumption: An analysis of causality. Econometrica 48, 1149–1167.
280
23. Audrino, F. and G. Barone-Adesi (2004) Average conditional correlation and
tree structures for multivariate GARCH models. Working Paper. Available at
http://papers.ssrn.com/paper.taf?abstract id=553821 276
24. Axelrod, R. (1997) The Complexity of Cooperation. Princeton University Press,
Princeton, NJ. 22
25. Axtell, R. (2001) Zipf distribution of U.S. firm sizes. Science 293, 1818–1820.
41
26. Bachelier, L. (1900) Théorie de la spéculation. Annales Scientifiques de l’Ecole
Normale Supérieure 17, 21–86. VIII, 37, 80
27. Bacry, E., J. Delour and J.-F. Muzy (2001) Multifractal random walk. Physical
Review E 64, 026103. 39, 40, 84, 85, 86
28. Bacry, E. and J.-F. Muzy (2003) Log-infinitely divisible multifractal processes.
Communications in Mathematical Physics 236, 449–475. 41, 84
29. Baig, T. and I. Goldfajn (1999) Financial market contagion in the Asian
crisis. IMF Staff Papers 46(2), 167–195. Available at http://www.imf.org/
external/Pubs/FT/staffp/1999/06-99/baig.htm 211, 238
30. Baillie, R.T. (1996) Long memory processes and fractional integration in econo-
metrics, Journal of Econometrics 73, 5–59. 87
31. Baillie, R.T., T. Bollerslev and H.O. Mikelsen (1996) Fractionally integrated
generalized autoregressive conditional heteroskedasticity. Journal of Economet-
rics 74, 3–30. 37, 80
32. Bak, P. (1996) How Nature Works: The Science of Self-organized Criticality.
Copernicus, New York. 277
33. Bak, P. and M. Paczuski (1995) Complexity, contingency and criticality. Pro-
ceedings of the National Academy of Science USA 92, 6689–6696. 277
34. Bali, T.G. (2003) An extreme value approach to estimating volatility and
Value-at-Risk. Journal of Business 76, 83–108. 79
35. Barbe, P., C. Genest, K. Ghoudi and B. Rémillard (1996) On Kendall’s process.
Journal of Multivariate Analysis 58, 197–229. 204
36. Barberis, N. and R. Thaler (2003) A survey of behavioral finance. In Handbook
of the Economics of Finance, 1(B), G.M. Constantinides, M. Harris and R.M.
Stulz, eds. Elsevier, Amsterdam, 1053–1123. 4, 281
37. Barndorff-Nielsen, O.E. (1997) Normal inverse Gaussian distributions and the
modeling of stock returns. Scandinavian Journal of Statistics 24, 1–13. 43
38. Barndorff-Nielsen, O.E. and N. Shephard (2002) Econometric analysis of
realised volatility and its use in estimating stochastic volatility models. Journal
of the Royal Statistical Society, Series B 64, 253–280. 219
39. Barral, J. and B.B. Mandelbrot (2002) Multifractal products of cylindrical
pulses. Probability Theory & Related Fields 124, 409–430. 37, 41
40. Barro, R.J., E.F. Fama, D.R. Fischel, A.H. Meltzer, R. Roll and L.G. Telser
(1989) Black Monday and the Future of Financial Markets, R.W. Kamphuis Jr.,
R.C. Kormendi and J.W.H. Watson, eds. Mid American Institute for Public
Policy Research, Inc. and Dow Jones-Irwin, Inc. 26
41. Basle Committee on Banking Supervision (1996) Amendment to the Capital
Accord to Incorporate Market Risks. Available at http://www.bis.org/publ/
bcbs24.pdf VIII
42. Basle Committee on Banking Supervision (2001) The New Basel Capital Ac-
cord. Available at http://www.bis.org/publ/bcbsca.htm VIII, 138, 273
43. Basrak, B., R.A. Davis and T. Mikosch (1999) The sample ACF of a simple
bilinear process. Stochastic Processes & their Application 83, 1–14. 232
44. Bates, D.S. (1991) The crash of 87: Was it expected? The evidence from options
markets. Journal of Finance 46, 1009–1044. 132
45. Baur, D. (2005) A flexible dynamic correlation model. In Advances in Econo-
metrics: Econometric Analysis of Economic and Financial Time Series 20(A),
T.B. Fomby and D. Terrell eds. Elsevier, Kidlington, Oxford 276
46. Bauwens, L., S. Laurent and J. Rombouts (2003) Multivariate GARCH models:
A survey. Working Paper. Available at http://ssrn.com/abstract=411062
276
47. Beck, U. (1992) Risk Society: Toward a New Modernity. Sage Publications,
Thousand Oaks, CA. VII
48. Berman, S.M. (1962) A law of large numbers for the maximum in a stationary
Gaussian sequence. Annals of Mathematical Statistics 33, 93–97. 44
49. Berman, S.M. (1992) Sojourns and Extremes of Stochastic Processes.
Wadsworth. Chapman & Hall/CRC. 44
50. Bernardo, A. and O. Ledoit (2001) Gain, loss and asset pricing. Journal of
Political Economy 108, 144–172. 13
51. Bernstein, P.L. (1998) Against the Gods: The Remarkable Story of Risk, New
edition. Wiley, New York. 13
52. Bernstein, P.L. (1993) Capital Ideas: The Improbable Origins of Modern Wall
Street, Reprint edition. Free Press. 13, 14
53. Bhansali, V. and M.B. Wise (2001) Forecasting portfolio risk in normal and
stressed markets. The Journal of Risk 4(1), 91–106. 232
54. Biagini F., P. Guasoni and M. Prattelli (2000) Mean-variance hedging for sto-
chastic volatility models. Mathematical Finance 10, 109–123. 136
55. Biham, O., O. Malcai, M. Levy and S. Solomon (1998) Generic emergence of
power law distributions and Levy–Stable intermittent fluctuations in discrete
logistic systems. Physical Review E 58, 1352–1358. 39, 57
56. Bikos, A. (2000) Bivariate FX PDFs: A sterling ERI application. Working
Paper, Bank of England. 132
57. Bingham, N.H., C.M. Goldie and J.L. Teugel (1987) Regular variation. Cam-
bridge University Press, Cambridge. 39, 171
58. Bjerve, S. and K. Doksum (1993) Correlation curves: Measures of association
as functions of covariate values. Annals of Statistics 21, 890–902. 151
59. Black, F., M.C. Jensen and M.S. Scholes (1972) The capital asset pricing model:
Some empirical tests. In Studies in the Theory of Capital Markets, M.C. Jensen,
ed. Praeger, New York, 79–121. 14
60. Black, F. and M. Scholes (1973) The pricing of options and corporate liabilities.
Journal of Political Economy 81, 637–653. VIII, 38
61. Blanchard, O.J. and M.W. Watson (1982) Bubbles, rational expectations and
speculative markets. In Crisis in Economic and Financial Structure: Bubbles,
Bursts, and Shocks, P. Wachtel, ed. Lexington Books, Lexington. 39
62. Blattberg, R. and Gonnedes, N. (1974) A comparison of stable and Student
distribution as statistical models for stock prices. Journal of Business 47, 244–
280. 42
63. Blum, A. and A. Kalai (1999) Universal portfolios with and without transaction
costs. Machine Learning 35, 193–205. 275
64. Blum, P., A. Dias and P. Embrechts (2002) The ART of dependence modeling:
The latest advances in correlation analysis. In Alternative Risk Strategies. Risk
Books, Morton Lane, London, 339–356. 99
65. Bollerslev, T. (1986) Generalized autoregressive conditional heteroscedasticity.
Journal of Econometrics 31, 307–327. 43
66. Bollerslev, T., R.F. Engle and D.B. Nelson (1994) ARCH models. In Hand-
book of Econometrics, 4, R.F. Engle and D.L. McFadden, eds. North-Holland,
Amsterdam, 2959–3038. 43
67. Bonabeau, E., M. Dorigo and G. Théraulaz (1999) Swarm Intelligence: From
Natural to Artificial Systems. Oxford University Press, Oxford. 22
68. Bonabeau, E., J. Kennedy and R.C. Eberhart (2001) Swarm Intelligence. Aca-
demic Press, New York. 22
69. Bonabeau, E. and C. Meyer (2001) Swarm intelligence: A whole new way to
think about business. Havard Business Review (May), 106–114. 22
70. Bonabeau, E. and G. Théraulaz (2000) Swarm smarts. Scientific American
(March), 72–79. 22
71. Bookstaber, R. (1997) Global risk management: Are we missing the point?
Journal of Portfolio Management 23, 102–107. 228, 231
72. Bouchaud, J.-P. and M. Potters (2003) Theory of Financial Risks: From Sta-
tistical Physics to Risk Management, 2nd edition. Cambridge University Press,
Cambridge, New York. 38, 42, 59, 176
73. Bouchaud, J.-P., D. Sornette, C. Walter and J.-P. Aguilar (1998) Taming large
events: Optimal portfolio theory for strongly fluctuating assets. International
Journal of Theoretical & Applied Finance 1, 25–41. 2
74. Bouyé, E., V. Durrleman, A. Nikeghbali, G. Riboule and T. Roncalli (2000)
Copulas for finance: A reading guide and some applications. Technical Docu-
ment, Groupe de Recherche Opérationelle, Crédit Lyonnais. 103
75. Bovier, A. and D.M. Mason (2001) Extreme value behavior in the Hopfield
model. Annals of Applied Probability 11, 91–120. 44
76. Bowyer, K. and P.J. Phillips (1998) Empirical Evaluation Techniques in Com-
puter Vision. IEEE Computer Society, Los Alamos, CA. 3
77. Box, G.E.P. and M.E. Muller (1958) A note on the generation of random
normal deviates. Annals of Mathematical Statistics 29, 610–611. 121
78. Boyer, B.H., M.S. Gibson and M. Lauretan (1997) Pitfalls in tests for changes
in correlations. Board of the Governors of the Federal Reserve System, Inter-
national Finance Discussion Paper 597. 228, 231, 232, 233, 234, 260
79. Bracewell, R. (1999) The Hilbert transform. In The Fourier Transform and Its
Applications, 3rd edition. McGraw-Hill, New York, 267–272. 279
80. Bradley, B. and M. Taqqu (2004) Framework for analyzing spatial contagion
between financial markets. Finance Letters 2(6), 8–15. 232
81. Bradley, B. and M. Taqqu (2005) Empirical evidence on spatial contagion be-
tween financial markets. Finance Letters 3(1), 64–76. 232
82. Bradley, B. and M. Taqqu (2005) How to estimate spatial contagion between
financial markets. Finance Letters 3(1), 77–86. 232
83. Breymann, W., A. Dias and P. Embrechts (2003) Dependence structures for
multivariate high-frequency data in finance. Quantitative Finance 3, 1–14. 215
84. Brock, W.A., W.D. Dechert, J.A. Scheinkman and B. Le Baron (1996) A test
for independence based on the correlation dimension. Econometric Reviews 15,
197–235. 38
85. Brockwell, P.J. and R.A. Davis (1996) Introduction to Time Series and Fore-
casting. Springer Series in Statistics, Springer, New York. 3, 80
86. Buhmann, M.D. (2003) Radial Basis Functions: Theory and Implementations.
Cambridge University Press, Cambridge, New York. 3
87. Calvo, S. and C.M. Reinhart (1995) Capital flows to Latin America: Is there
evidence of contagion effects? In Private Capital Flows to Emerging Markets
After the Mexican Crisis, G.A. Calvo, M. Goldstein and E. Hochreiter, eds.
Institute for International Economics, Washington, DC. 247, 260
88. Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997) The Econometrics of
Financial Markets. Princeton University Press, Princeton, NJ. 38
89. Carpentier, D. and P. Le Doussal (2001) Glass transition of a particle in a
random potential, front selection in nonlinear renormalization group, and en-
tropic phenomena in Liouville and Sinh–Gordon models. Physical Review E
63, 026110. 44
90. Carr, P., H. Geman, D.B. Madan and M. Yor (2002) The fine structure of
assets returns: An empirical investigation. Journal of Business 75, 305–332.
43
91. Challet, D. and M. Marsili (2003) Criticality and market efficiency in a simple
realistic model of the stock market. Physical Review E 68, 036132. 57
92. Challet, D., M. Marsili and Y.-C. Zhang (2004) The Minority Game. Oxford
University Press, Oxford. 23
93. Chamberlain, G. (1982) The general equivalence of Granger and Sims causality.
Econometrica 50, 569–582. 280
94. Champernowne, D.G. (1953) A model of income distribution. Economic Journal
63, 318–351. 39
95. Chan, K.C., L.T.W. Cheng and P.P. Lung (2001) Implied volatility and equity
returns: Impact of market microstructure and cause–effect relation. Working
Paper. 280
96. Charpentier, A. (2004) Multivariate risks and copulas. Ph.D. thesis, University
of Paris IX. 249
97. Chen, K. and S.-H. Lo (1997) On a mapping approach to investigating the
bootstrap accuracy. Probability Theory & Related Fields 107, 197–217. 207
98. Chen, Y., G. Rangarajan, J. Feng and M. Ding (2004) Analyzing multiple
nonlinear time series with extended Granger causality. Physics Letters A 324,
26–35. 280
99. Cherubini, U. and Luciano, E. (2002) Bivariate option pricing with copulas.
Applied Mathematical Finance 9, 69–86. 100, 124, 131
100. Cherubini, U., E. Luciano and W. Vecchiato (2004) Copula Methods for Finance.
Wiley, New York. 124
101. Cizeau, P., M. Potters and J.P. Bouchaud (2001) Correlation structure of ex-
treme stock returns. Quantitative Finance 1, 217–222. 232
102. Claessen, S., R.W. Dornbush and Y.C. Park (2001) Contagion: Why crises
spread and how this can be stopped. In International Financial Contagion, S.
Cleassens and K.J. Forbes, eds. Kluwer Academic Press, Dordrecht, Boston.
231
103. Clayton, D.G. (1978) A model for association in bivariate life tables and its
application in epidemiological studies of familial tendency in chronic disease
incidence. Biometrika 65, 141–151. 113
104. Cochrane, J.H. (2001) Asset Pricing. Princeton University Press, Princeton,
NJ.
105. Cohen, E., R.F. Riesenfeld and G. Elber (2001) Geometric Modeling with
Splines: An Introduction. AK Peters, Natick, MA. 3
106. Coles, S., J. Heffernan and J. Tawn (1999) Dependence measures for extreme
value analyses. Extremes 2, 339–365. 169, 255
107. Coles, S. and J.A. Tawn (1991) Modeling extreme multivariate events. Journal
of the Royal Statistical Society, Series B 53, 377–392. 116
108. Cont, R., M. Potters and J.-P. Bouchaud (1997) Scaling in stock market
data: Stable laws and beyond. In Scale Invariance and Beyond, B. Dubrulle,
F. Graner and D. Sornette, eds. Springer, Berlin. 42
109. Cont, R. and P. Tankov (2003) Financial Modeling with Jump Processes. Chap-
man & Hall, London. 35
110. Cossette, H., P. Gaillardetz, E. Marceau and J. Rioux (2002) On two dependent
individual risk models. Insurance: Mathematics & Economics 30, 153–166. 100
111. Cox, D.R. (1972) Regression models in life tables (with discussion). Journal of
the Royal Statistical Society, Series B 34, 187–220. 113
112. Coutant, S., V. Durrleman, G. Rapuch and T. Roncalli (2001) Copulas, multi-
variate risk-neutral distributions and implied dependence functions. Technical
Document, Groupe de Recherche Opérationelle, Crédit Lyonnais. 100, 135
113. Cover, T.M. (1991) Universal portfolios. Mathematical Finance 1, 1–29. 275
114. Credit-Suisse-Financial-Products (1997) CreditRisk+ : A credit risk manage-
ment framework. Technical Document. Available at http://www.csfb.com/
creditrisk. 137
115. Cramér, H. (1946) Mathematical Methods of Statistics. Princeton University
Press, Princeton, NJ. 157
116. Cromwell, J.B., W.C. Labys and M. Terraza (1994) Univariate Tests for Time
Series Models. Sage, Thousand Oaks, CA, 20–22. 38
117. Danielsson, J., P. Embrechts, C. Goodhart, C. Keating, F. Muennich, O. Re-
nault and H.-S. Shin (2001) An academic response to Basel II. FMG and ESRC,
130 (London). VIII
118. Dawkins, R. (1989) The Selfish Gene, 2nd edition. Oxford University Press,
Oxford. 21
119. De Vany, A. and C. Lee (2001) Quality signals in information cascades and the
dynamics of the distribution of motion picture box office revenues. Journal of
Economic Dynamics & Control 25, 593–614. 278
120. DeGregori, T.R. (2002) The zero risk fiction. April 12, American Council on
Science and Health. Available at http://www.acsh.org VII
121. Deheuvels, P. (1979) La fonction de dépendance empirique et ses propriétés –
Un test non paramétrique d’indépendance. Académie Royale de Belgique –
Bulletin de la Classe des Sciences, 5ème Série 65, 274–292. 190, 203
122. Deheuvels, P. (1981) A non parametric test for independence. Publications de
l’Institut de Statistique de l’Université de Paris 26, 29–50. 190, 203
123. Delbaen, F. (2002) Coherent risk measures on general probability spaces. In
Advances in Finance and Stochastics, Essays in Honour of Dieter Sondermann,
K. Sandmann and P.J. Schonbucher, eds. Springer, New York. 5
124. Diether, K.B., C.J. Malloy and A. Scherbina (2002) Differences of opinion and
the cross-section of stock returns. Journal of Finance 57, 2113–2141. 20
125. Delbaen, F., P. Monat, W. Schachermayer, M. Schweizer and C. Stricker (1997)
Weighted norm inequalities and hedging in incomplete markets. Finance &
Stochastics 1, 181–227. 136
126. Denuit, M., C. Genest and E. Marceau (1999) Stochastic bounds on sums of
dependent risks. Insurance: Mathematics & Economics 25, 85–104. 118, 126
127. Denuit, M. and O. Scaillet (2004) Non parametric tests for positive quadrant
dependence. Journal of Financial Econometrics 2, 422–450. 167
128. Deschatres, F. and D. Sornette (2005) The dynamics of book sales: endogenous
versus exogenous shocks in complex networks. Physical Review E 72, 016112.
278
129. Devroye, L. (1986) Non-uniform Random Variate Generation. Springer-Verlag,
New York. 123
130. Dhaene, J. and M.J. Goovaerts (1996) Dependency of risks and stop-loss order.
ASTIN Bulletin 26, 201–212. 149, 165
131. Dhaene, J., R.J.A., Laeven, S. Vanduffel, G. Darkiewicz and M.J. Goovaerts
(2004) Can a coherent risk measure be too subadditive? Working Paper. 5
132. Diebolt, J. and D. Guegan (1991) Le modèle de série chronologique
autorégressive β-ARCH. Comptes rendus de l’Académie des Sciences 312,
625–630. 37, 57
133. Dodge, Y. (1985) Analysis of Experiments with Missing Data. Wiley, New York.
3
134. Doksum, K., S. Blyth, E. Bradlow, X. Meng and H. Zhao (1994) Correlation
curves as local measures of variance explained by regression. Journal of the
American Statistical Association 89, 571–582. 151
135. Dorigo, M., G. Di Caro, and M. Sampels (2002) Ant Algorithms. Springer,
Heidelberg. 22
136. Dragulescu, A.A. and V.M. Yakovenko (2002) Probability distribution of re-
turns in the Heston model with stochastic volatility. Quantitative Finance 2,
443–453. 58
137. Drozdz, S., J. Kwapien, F. Gruemmer, F. Ruf and J. Speth (2003) Are the
contemporary financial fluctuations sooner converging to normal? Acta
Physica Polonica B 34, 4293–4306. 78
138. Durrleman, V., A. Nikeghbali and T. Roncalli (2000) Copulas approximation
and new families. Technical Document, Groupe de Recherche Opérationelle,
Crédit Lyonnais. 192
139. Dyn, N., D. Leviatan, D. Levin and A. Pinkus (2001) Multivariate Approxi-
mation and Applications. Cambridge University Press, Cambridge, New York.
3
140. Eberlein, E., U. Keller and K. Prause (1998) New insights into smile,
mispricing and value at risk: The hyperbolic model. Journal of Business 71,
371–405. 43, 58
141. Eberlein, E. and F. Özkan (2003) Time consistency of Lévy models. Quantita-
tive Finance 3, 40–50. 35
142. Edwards, S. and R. Susmel (2001) Volatility dependence and contagion in
emerging equity markets. Working Paper. Available at http://papers.ssrn.
com/sol3/papers.cfm?abstract id=285631 246
143. Efron, B. and R.J. Tibshirani (1986) Bootstrap method for standard errors,
confidence intervals and other measures of statistical accuracy. Statistical Sci-
ence 1, 54–77. 207
144. Efron, B. and R.J. Tibshirani (1993) An Introduction to the Bootstrap. Chap-
man & Hall, CRC. 3
145. Embrechts, P., A. Hoeing and A. Juri (2003) Using copulae to bound the Value-
at-Risk for functions of dependent risk. Finance & Stochastics 7, 145–167. 100,
118, 119, 124
146. Embrechts, P., C.P. Klüppelberg and T. Mikosh (1997) Modelling Extremal
Events. Springer-Verlag, Berlin. 46, 58
147. Embrechts, P., F. Lindskog and A. McNeil (2003) Modelling dependence with
copulas and applications to risk management. In Handbook of Heavy Tailed
Distributions in Finance, S. Rachev, ed. Elsevier, Amsterdam, 329–384. 167,
169, 232
148. Embrechts, P., A.J. McNeil and D. Straumann (1999) Correlation: Pitfalls and
alternatives. Risk 12(May), 69–71. 99
149. Embrechts, P., A.J. McNeil and D. Straumann (2002) Correlation and depen-
dence in risk management: Properties and pitfalls. In Risk Management: Value
at Risk and Beyond, M.A.H. Dempster, ed. Cambridge University Press, Cam-
bridge, 176–223. 99, 150, 169, 172, 220, 232
150. Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estima-
tion of the variance of UK inflation. Econometrica 50, 987–1008. 35
151. Engle, R.F. (1984) Wald, likelihood ratio, and lagrange multiplier tests in
econometrics. In Handbook of Econometrics, II, Z. Griliches and M.D. Intrili-
gator, eds. North-Holland, Amsterdam. 38
152. Engle, R.F., D.F. Hendry and J.-F. Richard (1983) Exogeneity. Econometrica
51, 277–304. 277
153. Engle, R.F. and A.J. Patton (2001) What good is a volatility model? Quanti-
tative Finance 1, 237–245. 66
154. Engle, R.F. and K. Sheppard (2001) Theoretical and empirical properties of
dynamic conditional correlation multivariate GARCH. NBER Working Papers
number 8554. Available at http://www.nber.org/papers/w8554.pdf 276
155. Erdős, P., A. Rényi and V.T. Sós (1966) On a problem of graph theory. Studia
Scientiarum Mathematicarum Hungarica 1, 215–235. 26
156. Ericsson, N. and J.S. Irons (1994) Testing Exogeneity: Advanced Texts in
Econometrics. Oxford University Press, Oxford. 277
157. Fama, E.F. (1965) The behavior of stock market prices. Journal of Business
38, 34–105. 42
158. Fama, E.F. (1970) Efficient capital markets: A review of theory and empirical
work. Journal of Finance 25, 383–417. 20
159. Fama, E.F. (1991) Efficient capital markets II. Journal of Finance 46, 1575–
1617. 20
160. Fama, E.F. and K.R. French (1992) The cross-section of expected stock returns.
Journal of Finance 47, 427–465. 3, 14, 18, 19, 24
161. Fama E.F. and K.R. French (1996) Multifactor explanations of asset pricing
anomalies. Journal of Finance 51, 55–84. 3, 24, 37
162. Fang, H.B. and T. Lai (1997) Co-kurtosis and capital asset pricing. Financial
Review 32, 293–307. 15, 58
163. Fang, H.B., K.T. Fang and S. Kotz (2002) The meta-elliptical distributions
with given marginals. Journal of Multivariate Analysis 82, 1–16. 108, 109
164. Farmer, J.D. (2002) Market force, ecology and evolution. Industrial & Corpo-
rate Change 11(5), 895–953. 22
165. Farmer, J.D. and F. Lillo (2004) On the origin of power-law tails in price
fluctuations. Quantitative Finance 4, 11–15. 41
166. Farmer, J.D., P. Patelli and I. I. Zovko (2005) The predictive power of zero
intelligence models in financial markets. Proceedings of the National Academy
of Sciences 102(6), 2254–2259. 21
167. Feller, W. (1971) An Introduction to Probability Theory and its Applications,
II. Wiley, New York. 58
168. Fermanian, J.D. and O. Scaillet (2003) Nonparametric estimation of copulas
for time series. Journal of Risk 5(4), 25–54. 192, 193, 194, 254
169. Fermanian, J.D. and O. Scaillet (2005) Some statistical pitfalls in copula model-
ing for financial applications. In Capital Formation, Governance and Banking,
E. Klein, ed. Nova Publishers, Hauppauge, NY. 204
170. Figlewski, S. and X. Wang (2001) Is the “Leverage Effect” a Leverage Effect?
Unpublished working paper, New York University. 85
171. Fishman, G.S. (1996) Monte Carlo. Springer-Verlag, New York. 120
172. Flood, P. and P.M. Garber (1994) Speculative Bubbles, Speculative Attacks,
and Policy Switching. MIT Press, Cambridge, MA. 3
173. Flury, B. (1997) A First Course in Multivariate Statistics. Springer, New York.
3
174. Föllmer, H. and A. Schied (2002) Convex measures of risk and trading con-
straints. Finance & Stochastics 6, 429–447. 4, 7
175. Föllmer, H. and A. Schied (2003) Robust preferences and convex measures
of risk. In Advances in Finance and Stochastics, Essays in Honour of Dieter
Sondermann, K. Sandmann and P.J. Schonbucher, eds. Springer-Verlag, New
York. 4, 7
176. Föllmer, H. and M. Schweizer (1991) Hedging of contingent claims under in-
complete information. In Applied Stochastic Analysis, Stochastic Monographs,
5, M.H.A. Davis and R.J. Elliot, eds. Gordon and Breach, New York. 136
177. Föllmer, H. and D. Sondermann (1986) Hedging of non-redundant contingent
claims. In Contributions to Mathematical Economics, W. Hildenbrand and A.
Mascolell, eds. North-Holland, Amsterdam. 136
178. Forbes, K.J. and R. Rigobon (2002) No contagion, only interdependence: Mea-
suring stock market co-movements. Journal of Finance 57, 2223–2261. 232,
238, 240, 260
179. Frank, M.J., R.B. Nelsen and B. Schweizer (1987) Best-possible bounds for
the distribution of a sum – A problem of Kolmogorov. Probability Theory &
Related Fields 74, 199–211. 118, 124, 127
180. Franses, P.H. and D. van Dijk (2000) Nonlinear Time Series Models in Em-
pirical Finance. Cambridge University Press, Cambridge, New York. 3
181. Füredi, Z. and J. Komlós (1981) The eigenvalues of random symmetric matrices.
Combinatorica 1, 233–241.
182. Frees, W.E., J. Carriere and E.A. Valdez (1996) Annuity valuation with de-
pendent mortality. Journal of Risk & Insurance 63, 229–261. 113
183. Frees, W.E. and E.A. Valdez (1998) Understanding relationships using copulas.
North American Actuarial Journal 2, 1–25. 100, 103, 111, 124, 201
184. Frey, R. and A. McNeil (2001) Modelling dependent defaults. ETH E-
Collection. Available at http://e-collection.ethbib.ethz.ch/show?type=
bericht&nr=273 100, 124
185. Frey, R. and A. McNeil (2002) VaR and expected shortfall in portfolios of
dependent credit risks: Conceptual and practical insights. Journal of Banking
& Finance 26, 1317–1334. 181
186. Frey, R., A. McNeil and M. Nyfeler (2001) Credit risk and copulas. Risk 14,
111–114. 100, 138, 181
187. Frieden, J.A. (1992) Debt, Development, and Democracy: Modern Political
Economy and Latin America, 1965–1985. Princeton University Press, Prince-
ton, NJ. 261
188. Frieden, J.A., P. Ghezzi and E. Stein (2001) Politics and exchange rates: A
cross-country approach. In The Currency Game: Exchange Rate Politics in
Latin America, J.A. Frieden, P. Ghezzi, E. Stein, eds. Inter-American Devel-
opment Bank, Washington, DC. 261
189. Frieden, J.A. and E. Stein (2000) The political economy of exchange rate policy
in Latin America: An analytical overview. Working Paper, Harvard University.
261
190. Frisch, U. (1995) Turbulence. Cambridge University Press, Cambridge. 82
191. Frisch, U. and D. Sornette (1997) Extreme deviations and applications. Journal
de Physique I, France 7, 1155–1171. 57
192. Frittelli, M. (2000) The minimal entropy martingale measure and the valuation
problem in incomplete markets. Mathematical Finance 10, 39–52. 136
193. Gabaix, X. (1999) Zipf’s law for cities: An explanation. Quarterly Journal of
Economics 114, 739–767. 39
194. Gabaix, X., P. Gopikrishnan, V. Plerou and H.E. Stanley (2003) A theory of
power-law distributions in financial market fluctuations. Nature 423, 267–270.
41
195. Gabaix, X., P. Gopikrishnan, V. Plerou and H.E. Stanley (2003) A theory
of large fluctuations in stock market activity. MIT Department of Economics
Working Paper 03-30. Available at http://ssrn.com/abstract=442940 41
196. Geman, H. (2002) Pure jump Lévy process for asset price modelling. Journal
of Banking & Finance 26, 1297–1316. 35
197. Genest, C., K. Ghoudi and L.P. Rivest (1995) A semiparametric estimation
procedure of dependence parameters in multivariate families of distributions.
Biometrika 82, 543–552. 197, 198, 200, 222, 223
198. Genest, C. and J. MacKay (1986) The joy of copulas: Bivariate distributions
with uniform marginals. American Statistician 40, 280–283. 123, 155
199. Geweke, J. (1984) Inference and causality in economic time series models. In
Handbook of Econometrics, Vol. II, Z. Griliches and M.D. Intriligator, eds. Elsevier
Science Publisher BV, Amsterdam, 1101–1144. 280
200. Ghoudi, K. and B. Remillard (1998) Empirical processes based on pseudo-
observations. In Asymptotic Methods in Probability and Statistics: A Volume
in Honour of Miklós Csörgő, B. Szyszkowicz, ed. Elsevier, Amsterdam. 206
201. Ghoudi, K. and B. Remillard (2004) Empirical processes based on pseudo-
observations II: The multivariate case. Fields Institute Communications Series
44, 381–406. 206
202. Gilboa, I. (1987) Expected utility with purely subjective non-additive proba-
bilities. Journal of Mathematical Economics 16, 65–88. 6
203. Gilboa, I. and D. Schmeidler (1989) Maxmin expected utility with non-unique
prior. Journal of Mathematical Economics 18, 141–153. 6
204. Glimcher, P.W. and A. Rustichini (2004) Neuroeconomics: The consilience of
brain and decision. Science 306, 447–452. 272
205. Godambe, V.P. (1960) An optimum property of regular maximum likelihood
estimation. Annals of Mathematical Statistics 31, 1208–1211. 202
206. Goldberg, J. and R. von Nitzsch (2001) Behavioral Finance, translated by A.
Morris. Wiley, New York. 4, 281
207. Goldie, C.M. (1991) Implicit renewal theory and tails of solutions of random
equations. Annals of Applied Probability 1, 126–166. 39
208. Gouriéroux, C. and J. Jasiak (1998) Truncated maximum likelihood, goodness
of fit tests and tail analysis. Working Paper, CREST. 42
209. Gouriéroux, C. and A. Monfort (1994) Testing non-nested hypothesis. In Hand-
book of Econometrics, 4, R.F. Engle and D. McFadden, eds. North-Holland,
Amsterdam, 2585–2637. 74, 216, 224
210. Gouriéroux, C. and A. Monfort (1997) Time Series and Dynamic Models,
translated and edited by G.M. Gallo. Cambridge University Press, Cambridge,
New York. 3
211. Gopikrishnan, P., V. Plerou, L.A.N. Amaral, M. Meyer and H.E. Stanley (1999)
Scaling of the distributions of fluctuations of financial market indices. Physical
Review E 60, 5305–5316. 51
212. Granger, C.W.J. and Y. Jeon (1997) Measuring lag structure in forecasting
models – The introduction of Time Distance. UCSD Economics Discussion
Paper 97-24. Available at http://ssrn.com/abstract=56515 281
213. Granger, C.W.J. and R. Joyeux (1980) An introduction to long memory time
series models and fractional differencing. Journal of Time Series Analysis 1,
15–29. 80
214. Granger, C.W.J., E. Maasoumi and J. Racine (2004) A dependence metric
for possibly nonlinear processes. Journal of Time Series Analysis 25, 649–669.
162, 164, 203
215. Granger, C.W.J. and T. Teräsvirta (1993) Modelling Nonlinear Economic Rela-
tionships (Advanced Texts in Econometrics). Oxford University Press, Oxford.
3
216. Granger, C.W.J. and T. Teräsvirta (1999) A simple nonlinear time series model
with misleading linear properties. Economics Letters 62, 161–165. 43
217. Guillaume, D.M., M.M. Dacorogna, R.R. Davé, U.A. Müller, R.B. Olsen and
O.V. Pictet (1997) From the bird’s eye to the microscope: A survey of new
stylized facts of the intra-day foreign exchange markets. Finance & Stochastics
1, 95–130. 42, 44
218. Haas, C.N. (1999) On modeling correlated random variables in risk assessment.
Risk Analysis 19, 1205–1214. 120
219. Hall, P.G. (1979) On the rate of convergence of normal extremes. Journal of
Applied Probability 16, 433–439. 48
220. Hall, W.J. and J.A. Wellner (1979) The rate of convergence in law of the
maximum of an exponential sample. Statistica Neerlandica 33, 151–154. 48, 49
221. Hamilton, J.D. (1989) A new approach to the economic analysis of non-
stationary time series and the business cycle. Econometrica 57, 357–384. 279
222. Hamilton, J.D. (1994) Time Series Analysis. Princeton University Press,
Princeton, NJ. 3, 80
223. Harvey, C.R. and A. Siddique (2000) Conditional skewness in asset pricing
tests. Journal of Finance 55, 1263–1295. 14, 15
224. Hauksson, H.A., M.M. Dacorogna, T. Domenig, U.A. Müller and G. Samorod-
nitsky (2001) Multivariate extremes, aggregation and risk estimation. Quanti-
tative Finance 1, 79–95. 232
225. Havrda, J. and F. Charvát (1967) Quantification method of classification
processes: Concept of structural α-entropy. Kybernetika 3(1), 30–34. 163
226. Heath, D., E. Platen and M. Schweizer (2001) A comparison of two quadratic
approaches to hedging in incomplete markets. Mathematical Finance 11, 385–
413. 136
227. Helmbold, D.P., R.E. Schapire, Y. Singer and M.K. Warmuth (1998) On-line
portfolio selection using multiplicative updates. Mathematical Finance 8, 325–
347, 275
228. Hennessy, D. and H.E. Lapan (2002) The use of Archimedean copulas to model
portfolio allocations. Mathematical Finance 12, 143–154. 124
229. Heffernan, J.E. (2000) A directory of tail dependence. Extremes 3, 279–290.
172
230. Henderson, V., D. Hobson, S. Howison and T. Kluge (2005) A comparison of
option prices under different pricing measures in a stochastic volatility model
with correlation. Review of Derivatives Research 8, 5–25. 136
231. Hergarten, S. (2002) Self-organized Criticality in Earth Systems. Springer-
Verlag, Heidelberg. 277
232. Heston, S.L. (1993) A closed-form solution for options with stochastic volatility
with applications to bond and currency options. Review of Financial Studies
6, 327–343. 37, 144
233. Hill, B.M. (1975) A simple general approach to inference about the tail of a
distribution. Annals of Statistics 3, 1163–1174. 64
234. Hobson, D. (2004) Stochastic volatility models, correlation and the q-optimal
measure. Mathematical Finance 14, 537–556. 136
235. Hotelling, H. (1936) Relations between two sets of variates. Biometrika 28,
321–377. 152
236. Hougaard, P. (1984) Life table methods for heterogeneous populations:
Distributions describing the heterogeneity. Biometrika 71, 75–83. 113
237. Hougaard, P., B. Harvald and N.V. Holm (1992) Measuring the similarities
between the lifetimes of adult Danish twins born between 1881–1930. Journal
of the American Statistical Association 87, 17–24. 113
238. Huber, P.J. (2003) Robust Statistics. Wiley-Interscience, New York. 275
239. Hult, H. and F. Lindskog (2001) Multivariate extremes, aggregation and depen-
dence in elliptical distributions. Advances in Applied Probability 34, 587–609.
174
240. Hwang, S. and M. Salmon (2002) An analysis of performance measures using
copulae. In Performance Measurement in Finance: Firms, Funds and Man-
agers, J. Knight and S. Satchell, eds. Butterworth-Heinemann, London. 124
241. Hwang, S. and S. Satchell (1999) Modelling emerging market risk premia using
higher moments. International Journal of Finance & Economics 4, 271–296.
13, 15, 58
242. Ingersoll, J.E. (1987) The Theory of Financial Decision Making. Rowman &
Littlefield, Totowa, NJ.
243. Iman, R.L. and W.J. Conover (1982) A distribution-free approach to inducing
rank correlation among input variables. Communications in Statistics Simula-
tion & Computation 11, 311–334. 120
244. Jäckel, P. (2002) Monte Carlo Methods in Finance. Wiley, New York. 120
245. Jenkinson, T. and A. Ljungqvist (2001) Going Public: The Theory and
Evidence on How Companies Raise Equity Finance, 2nd edition. Oxford University
Press, Oxford. 278
246. Jensen, H.J. (1998) Self-organized Criticality: Emergent Complex Behavior in
Physical and Biological Systems, Cambridge Lecture Notes in Physics. Cam-
bridge University Press, Cambridge. 277
247. Jensen, J.L. (1995) Saddlepoint Approximations. Oxford University Press, Ox-
ford. 263
248. Joe, H. (1997) Multivariate Models and Dependence Concepts. Chapman &
Hall, London. 103, 117, 137, 202
249. Johansen, A. and D. Sornette (2001) Large stock market price Drawdowns are
outliers. Journal of Risk 4, 69–110, Hauppauge, NY. 3, 23, 36, 79, 276
250. Johansen, A. and D. Sornette (2004) Endogenous versus exogenous crashes
in financial markets. In Contemporary Issues in International Finance. Nova
Science Publishers. 3, 36, 276, 278
251. Johnson, N.F., P. Jefferies and P.M. Hui (2003) Financial Market Complexity.
Oxford University Press, Oxford. 23
252. Johnson, N.L. and S. Kotz (1972) Distributions in Statistics: Continuous Mul-
tivariate Distributions. Wiley, New York. 107, 121, 240, 262
253. Johnson, N.L., S. Kotz and N. Balakrishnan (1997) Discrete Multivariate Dis-
tributions. Wiley, New York. 3
254. Johnson, R.A. and D.W. Wichern (2002) Applied Multivariate Statistical
Analysis, 5th edition. Prentice Hall, Upper Saddle River, NJ. 3
255. Johnson, T.C. (2004) Forecast dispersion and the cross section of expected
returns. Journal of Finance 59, 1957–1978. 20
256. Jondeau, E. and M. Rockinger (2003) Testing for differences in the tails of
stock-market returns. Journal of Empirical Finance 10, 559–581. 53, 78
257. Jorion, P. (1997) Value-at-Risk: The New Benchmark for Controlling Deriva-
tives Risk. Irwin Publishing, Chicago, IL. 2
258. Jouini, M.N. and R.T. Clemen (1996) Copula models for aggregating expert
opinions. Operations Research 44, 444–457. 124
259. Jurczenko, E. and B. Maillet (2005) The 4-CAPM: In between asset pricing
and asset allocation. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurczenko, eds. Springer. Forthcoming 58
260. Juri, A. and M.V. Wüthrich (2002) Copula convergence theorem for tail events.
Insurance: Mathematics & Economics 30, 405–420. 115, 232, 255
261. Kalai, A. and S. Vempala (2000) Efficient algorithms for universal portfolios. In
Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer
Science, 486–491. 275
262. Kaminsky, G.L. and S.L. Schmukler (1999) What triggers market jitters? A
chronicle of the Asian crisis. Journal of International Money & Finance 18,
537–560. 211
263. Kahneman, D. (2002) Maps of bounded rationality: A perspective on intuitive
judgment and choice. Nobel Prize Lecture. Available at http://nobelprize.
org/economics/laureates/2002/kahnemann-lecture.pdf 272
264. Karatzas, I. and S.E. Shreve (1991) Brownian Motion and Stochastic Calculus.
Springer-Verlag, New York. 219
265. Karlen, D. (1998) Using projection and correlation to approximate probability
distributions. Computers in Physics 12, 380–384. 108, 219
266. Kass, R.E. and A.E. Raftery (1995) Bayes factors. Journal of the American
Statistical Association 90, 773–795. 74
267. Kearns, P. and A. Pagan (1997) Estimating the density tail index for financial
time series. Review of Economics & Statistics 79, 171–175. 43, 48, 51, 52, 65,
76
268. Kesten, H. (1973) Random difference equations and renewal theory for prod-
ucts of random matrices. Acta Mathematica 131, 207–248. 39
269. Kim, C.-J. and C.R. Nelson (1999) State-Space Models with Regime Switch-
ing: Classical and Gibbs-Sampling Approaches with Applications. MIT Press,
Cambridge, MA. 3
270. Kimeldorf, G. and A. Sampson (1978) Monotone dependence. Annals of Sta-
tistics 6, 895–903. 101
271. King, M. and S. Wadhwani (1990) Transmission of volatility between stock
markets. Review of Financial Studies 3, 5–33. 231, 247
272. Klugman, S.A. and R. Parsa (1999) Fitting bivariate loss distributions with
copulas. Insurance: Mathematics & Economics 24, 139–148. 124, 201, 203
273. KMV Corporation (1997) Modelling default risk. Technical Document. Avail-
able at http://www.kmv.com 137
274. Knopoff, L. and D. Sornette (1995) Earthquake death tolls. Journal de
Physique I, France 5, 1681–1688. 126
275. Kon, S. (1984) Models of stock returns: A comparison. Journal of Finance 39,
147–165. 42
276. Kotz, S. (2000) Continuous Multivariate Distributions, 2nd edition. Wiley, New
York. 3
277. Kotz, S. and S. Nadarajah (2000) Extreme Value Distribution: Theory and
Applications. Imperial College Press, London. 44
278. Kraus, A. and R. Litzenberger (1976) Skewness preference and the valuation
of risk assets. Journal of Finance 31, 1085–1099. 15
279. Krivelevich, M. and V.H. Vu (2002) On the concentration of eigenvalues of
random symmetric matrices. Israel Journal of Mathematics 131, 259–268. 26
280. Krugman, P. (1996) The Self-Organizing Economy. Blackwell Publishers,
Cambridge, MA, and Oxford. 14
281. Kruskal, W.H. (1958) Ordinal measures of association. Journal of the American
Statistical Association 53, 814–861. 160
282. Krzanowski, W.J. (2000) Principles of Multivariate Analysis: A User’s Perspec-
tive, revised edition. Clarendon Press, Oxford; Oxford University Press, New
York. 3
283. Krzysztofowicz, R. and K.S. Kelly (1996) A meta-Gaussian distribution with
specified marginals. Technical Document, University of Virginia. 108
284. Kulpa, T. (1999) On approximations of copulas. International Journal of
Mathematics & Mathematical Sciences 22, 259–269. 254
285. Kusuoka, S. (2001) On law invariant coherent risk measures. In Advances in
Mathematical Economics, 3, S. Kusuoka, T. Maruyama, R. Anderson, C. Cas-
taing, F.H. Clarke, G. Debreu, E. Dierker, D. Duffie, L.C. Evans, T. Fujimoto,
J.-M. Grandmont, N. Hirano, L. Hurwicz, eds. Springer, Tokyo, 83–95. 6
286. Laherrère, J. and D. Sornette (1999) Stretched exponential distributions in
nature and economy: Fat tails with characteristic scales. European Physical
Journal B 2, 525–539. 36, 43
287. Lakonishok, J. and A.C. Shapiro (1986) Systematic risk, total risk and size
as determinants of stock market returns. Journal of Banking & Finance 10,
115–132. 14
288. Laloux, L., P. Cizeau, M. Potters and J.P. Bouchaud (2000) Random matrix
theory and financial correlations. International Journal of Theoretical & Ap-
plied Finance 3, 391–397. 24, 27
289. Lamper, D., S.D. Howison and N.F. Johnson (2001) Predictability of large
future changes in a competitive evolving population. Physical Review Letters
88, 017902. 23
290. Lancaster, H.O. (1963) Correlation and complete dependence of random vari-
ables. Annals of Mathematical Statistics 34, 1315–1321. 101
291. Laurent, J.P. and J. Gregory (2004) In the core of correlation. Risk 17(10),
87–91. 181
292. Laurent, J.P. and H. Pham (1999) Dynamic programming and mean-variance
hedging. Finance & Stochastics 3, 83–101. 136
293. Leadbetter, M.R. (1974) On extreme values in stationary processes. Zeitschrift
für Wahrscheinlichkeitstheorie und Verwandte Gebiete 28, 289–303. 44, 46
294. Ledford, A.W. and J.A. Tawn (1996) Statistics for near independence in mul-
tivariate extreme values. Biometrika 83, 169–187. 255
295. Ledford, A.W. and J.A. Tawn (1998) Concomitant tail behavior for extremes.
Advances in Applied Probability 30, 197–215. 255
296. Ledoit, O., P. Santa-Clara and M. Wolf (2003) Flexible multivariate GARCH
modeling with an application to international stock markets. Review of Eco-
nomics & Statistics 85, 735–747. 276
297. Ledoit, O. and M. Wolf (2004) Honey, I shrunk the sample covariance matrix
– Problems in mean-variance optimization. Journal of Portfolio Management
30(4), 110–119. 3
298. Ledoit, O. and M. Wolf (2004) A well-conditioned estimator for large-
dimensional covariance matrices. Journal of Multivariate Analysis 88, 365–411.
3
299. Lee, S.B. and K.J. Kim (1993) Does the October 1987 crash strengthen the
co-movements among national stock markets? Review of Financial Economics
3, 89–102. 247
300. Lehmann, E. (1966) Some concepts of dependence. Annals of Mathematical
Statistics 37, 1137–1153. 164, 165
301. Lempert, R.J., S.W. Popper, and S.C. Bankes (2003) Shaping the next one
hundred years: New methods for quantitative, long-term policy analysis. RAND
Pardee Center Publications MR-1626-CR. 275
302. Li, D.X. (1999) The valuation of basket credit derivatives. CreditMetrics Mon-
itor (April), 34–50. 124
303. Li, D.X. (2000) On default correlation: A copula approach. Journal of Fixed
Income 9, 43–54. 124, 137
304. Li, D.X., P. Mikusinski, H. Sherwood and M.D. Taylor (1997) On
approximation of copulas. In Distributions with Given Marginals and Moments
Problems, V. Benes and J. Stephan, eds. Kluwer Academic Publisher, Dordrecht,
Boston. 191
305. Li, D.X., P. Mikusinski and M.D. Taylor (1998) Strong approximation of
copulas. Journal of Mathematical Analysis & Applications 225, 608–623. 191,
254
306. Lim, K.G. (1989) A new test for the three-moment capital asset pricing model.
Journal of Financial & Quantitative Analysis 24, 205–216. 14, 15
307. Lindskog, F., A.J. McNeil and U. Schmock (2003) Kendall’s tau for elliptical
distributions. In Credit Risk – Measurement, Evaluation and Management, G.
Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder and K.-H. Vollmer, eds. Physica-
Verlag, Heidelberg. 111, 157
308. Lintner, J. (1965) The valuation of risk assets and the selection of risky
investments in stock portfolios and capital budgets. Review of Economics &
Statistics 47, 13–37. 14
309. Litterman, R. and K. Winkelmann (1998) Estimating Covariance Matrices.
Risk Management Series, Goldman Sachs. 2
310. Little, R.J.A. and D.B. Rubin (1987) Statistical Analysis with Missing Data.
Wiley, New York. 3
311. Lo, A.W. (1999) The three P’s of total risk management. Financial Analysts
Journal 55 (January/February), 13–26. 271
312. Longin, F.M. (1996) The asymptotic distribution of extreme stock market
returns. Journal of Business 69, 383–408. 42, 43, 44, 52, 61
313. Longin, F.M. (2000) From value at risk to stress testing: The extreme value
approach. Journal of Banking & Finance 24, 1097–1130. 43, 46, 79
314. Longin, F.M. and B. Solnik (1995) Is the correlation in international equity
returns constant: 1960–1990? Journal of International Money & Finance 14,
3–26. 231, 241
315. Longin F.M. and B. Solnik (2001) Extreme correlation of international equity
markets. Journal of Finance 56, 649–676. 231, 232, 236, 241, 253, 254, 255,
260
316. Loretan, M. (2000) Evaluating changes in correlations during periods of high
market volatility. Global Investor 135, 65–68. 231, 232
317. Loretan, M. and W.B. English (2000) Evaluating “correlation breakdowns”
during periods of market volatility. BIS Quarterly Review (June), 29–36. 231,
232
318. Loynes, R.M. (1965) Extreme values in uniformly mixing stationary stochastic
processes. Annals of Mathematical Statistics 36, 993–999. 44
319. Lux, T. (1996) The stable Paretian hypothesis and the frequency of large re-
turns: An examination of major German stocks. Applied Financial Economics
6, 463–475. 78
320. Lux, T. (2000) On moment condition failure in German stock returns: An
application of recent advances in extreme value statistics. Empirical Economics
25, 641–652. 42
321. Lux, T. (2003) The multifractal model of asset returns: Its estimation via GMM
and its use for volatility forecasting. Working Paper, University of Kiel. 40
322. Lux, T. (2001) The limiting extreme behavior of speculative returns: An analysis of intra-daily data from the Frankfurt stock exchange. Journal of Economic
Behavior & Organization 46, 327–342. 42, 44
323. Lux, T. and D. Sornette (2002) On rational bubbles and fat tails. Journal of
Money Credit & Banking 34, 589–610. 39
324. Lyubushin Jr., A.A. (2002) Robust wavelet-aggregated signal for geophysical
monitoring problems. Izvestiya, Physics of the Solid Earth 38, 1–17. 152
325. Majumdar, S.N. and P.L. Krapivsky (2002) Extreme value statistics and trav-
eling fronts: Application to computer science. Physical Review E 65, 036127.
44
326. Majumdar, S.N. and P.L. Krapivsky (2003) Extreme value statistics and trav-
eling fronts: Various applications. Physica A 318, 161–170. 44
327. Makarov, G.D. (1981) Estimates for the distribution function of a sum of two
random variables when the marginal distributions are fixed. Theory of Proba-
bility & its Applications 26, 803–806. 118
328. Malamud, B.D., G. Morein and D.L. Turcotte (1998) Forest fires – An example
of self-organized critical behavior. Science 281, 1840–1842. 126
329. Malevergne, Y., V.F. Pisarenko and D. Sornette (2003) On the power of generalized extreme value (GEV) and generalized Pareto distribution (GPD) estimators for empirical distributions of log-returns. Applied Financial Economics.
Forthcoming. 38, 48, 49, 50, 51, 52, 55, 56
330. Malevergne, Y., V.F. Pisarenko and D. Sornette (2005) Empirical distribution
of log-returns: Between the stretched-exponential and the power law? Quanti-
tative Finance 5. Forthcoming. 40, 54, 63, 64, 66, 68, 70, 72, 74, 75
331. Malevergne, Y. and D. Sornette (2001) Multi-dimensional rational bubbles and
fat tails. Quantitative Finance 1, 533–541. 39
332. Malevergne, Y. and D. Sornette (2002) Minimizing extremes. Risk 15 (11),
129–132. X, 174, 175, 178, 179, 255
333. Malevergne, Y. and D. Sornette (2005) Multi-moment methods for portfo-
lio management: Generalized capital asset pricing model in homogeneous and
heterogeneous markets. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurzenko, eds. Springer. Forthcoming. X,
8, 16, 17, 31, 58
334. Malevergne, Y. and D. Sornette (2003) Testing the Gaussian copula hypothesis
for financial assets dependences. Quantitative Finance 3, 231–250. 204, 207,
208, 211, 212, 214, 215, 216, 232, 254
335. Malevergne, Y. and D. Sornette (2004) How to account for extreme co-
movements between individual stocks and the market. Journal of Risk 6(3),
71–116. 174, 175, 255
336. Malevergne, Y. and D. Sornette (2004) VaR-efficient portfolios for a class of
super- and sub-exponentially decaying assets return distributions. Quantitative
Finance 4, 17–36. X, 124, 128, 130, 141
337. Malevergne, Y. and D. Sornette (2004) Collective origin of the coexistence of
apparent RMT noise and factors in large sample correlation matrices. Physica
A 331, 660–668. 24, 27, 28
338. Malevergne, Y. and D. Sornette (2005) Higher-moment portfolio theory (Cap-
italizing on behavioral anomalies of stock markets). Journal of Portfolio Man-
agement 31(4), 49–55. 124
339. Mandelbrot, B.B. (1963) The variation of certain speculative prices. Journal
of Business 36, 392–417. 42
340. Mandelbrot, B.B. (1997) Fractals and Scaling in Finance: Discontinuity, Con-
centration, Risk. Springer, New York. 82
341. Mandelbrot, B.B., A. Fisher and L. Calvet (1997) A multifractal model of asset
returns. Cowles Foundation Discussion Paper #1164. 37, 41, 82, 84
342. Mansilla, R. (2001) Algorithmic complexity of real financial markets. Physica
A 301, 483–492. 232
343. Mantegna, R.N. (1999) Hierarchical structure in financial markets. European
Physical Journal B 11, 193–197. 28
344. Mantegna, R.N. and H.E. Stanley (1994) Stochastic process with ultra slow
convergence to a Gaussian: The truncated Lévy flight. Physical Review Letters
73, 2946–2949. 75
345. Mantegna, R.N. and H.E. Stanley (1995) Scaling behavior of an economic
index. Nature 376, 46–49. 42
346. Mantegna, R.N. and H.E. Stanley (2000) An Introduction to Econophysics,
Correlations and Complexity in Finance. Cambridge University Press, Cam-
bridge. 42
347. Markowitz, H. (1959) Portfolio Selection: Efficient Diversification of Invest-
ments. Wiley, New York. VIII, 1, 2, 13, 31, 38
348. Marshall, A. and I. Olkin (1988) Families of multivariate distributions. Journal
of the American Statistical Association 83, 834–841. 113, 123
349. Marsili, M. (2002) Dissecting financial markets: Sectors and states. Quantita-
tive Finance 2, 297–302. 28
350. Mashal, R. and A.J. Zeevi (2002) Beyond correlation: Extreme co-movements
between financial assets. Working Paper, Columbia Business School. 110, 200,
212, 215, 216, 224, 226
351. Matia, K., L.A.N. Amaral, S.P. Goodwin and H.E. Stanley (2002) Different
scaling behaviors of commodity spot and future prices. Physical Review E 66,
045103. 51
352. McClure, S.M., D.I. Laibson, G. Loewenstein and J.D. Cohen (2004) Separate
neural systems value immediate and delayed monetary rewards. Science 306,
503–507. 272
353. McDonald, J.B. and W.K. Newey (1988) Partially adaptive estimation of re-
gression models via the generalized t-distribution. Econometric Theory 4, 428–
457. 54
354. McLeish, D.L. and C.G. Small (1988) The Theory and Application of Statistical
Inference Functions. Springer-Verlag, Berlin. 202
355. McNeil, A. and R. Frey (2000) Estimation of tail-related risk measures for
heteroscedastic financial time series: An extreme value approach. Journal of
Empirical Finance 7, 271–300. 79
356. Meerschaert, M.M. and H. Scheffler (2001) Sample cross-correlations for mov-
ing averages with regularly varying tails. Journal of Time Series Analysis 22,
481–492. 58, 148, 243, 254
357. Mehta, M.L. (1991) Random Matrices, 2nd edition. Academic Press, Boston.
26
358. Merton, R.C. (1974) On the pricing of corporate debt: The risk structure of
interest rates. Journal of Finance 29, 449–470. 20, 137
359. Merton, R.C. (1990) Continuous-Time Finance. Blackwell, Cambridge. 14
360. Mézard, M., G. Parisi and M.A. Virasoro (1987) Spin Glass Theory and Be-
yond, World Scientific Lecture Notes in Physics, Vol. 9. World Scientific, Sin-
gapore. 164
361. Mills, T.C. (1993) The Econometric Modelling of Financial Time Series. Cam-
bridge University Press, Cambridge, New York. 3
362. Mittnik S., S.T. Rachev and M.S. Paolella (1998) Stable Paretian modeling in
finance: Some empirical and theoretical aspects. In A Practical Guide to Heavy
Tails, R.J. Adler, R.E. Feldman and M.S. Taqqu, eds. Birkhauser, Boston,
79–110. 42
363. Morrison, D.F. (1990) Multivariate Statistical Methods, 3rd edition. McGraw-
Hill, New York. 3
364. Mossin, J. (1966) Equilibrium in a capital market. Econometrica 34, 768–783.
14
365. Muzy, J.-F., A. Kozhemyak and E. Bacry (2005) Extreme values and fat tails
of multifractal fluctuations. Working Paper. 40, 41
366. Muzy, J.F., D. Sornette, J. Delour and A. Arnéodo, (2001) Multifractal returns
and hierarchical portfolio theory. Quantitative Finance 1, 131–148. 40, 217, 276
367. Müller, U.A., M.M. Dacorogna, O.V. Pictet (1998) Heavy tails in high-
frequency financial data. In A Practical Guide to Heavy Tails, R.J. Adler,
R.E. Feldman and M.S. Taqqu, eds. Birkhauser, Boston, 55–78. 42, 44
368. Nagahara, Y. and G. Kitagawa (1999) A non-Gaussian stochastic volatility
model. Journal of Computational Finance 2, 33–47. 42
369. Naveau, P. (2003) Almost sure relative stability of the maximum of a stationary
sequence. Advances in Applied Probability 35, 721–736. 44
370. Nelsen, R.B. (1998) An Introduction to Copulas, Lecture Notes in Statistics 139. Springer-Verlag, New York. 103, 112, 115, 118, 155, 191
371. Neudecker, H., R. Heijmans, D.S.G. Pollock, A. Satorra (2000) Innovations in
Multivariate Statistical Analysis. Kluwer Academic Press, Dordrecht, Boston.
3
372. Newey, W.K. and D. McFadden (1994) Large sample estimation and hypothesis
testing. In Handbook of Econometrics, 4, R.F. Engle and D. McFadden, eds.
North-Holland, Amsterdam. 202, 216, 224
373. Noh, J.D. (2000) Model for correlations in stock market. Physical Review E
61, 5981–5982. 27
374. O’Brien, G.L. (1987) Extreme values for stationary and Markov sequences.
Annals of Probability 15, 281–291. 44
375. Oakes, D. (1982) A model for association in bivariate survival data. Journal of
the Royal Statistical Society, Series B 44, 414–422. 196
376. Okuyama, K., M. Takayasu and H. Takayasu (1999) Zipf’s law in income dis-
tribution of companies. Physica A 269, 125–131. 41
377. Osborne, M.F.M. (1959) Brownian motion in the stock market. Operations
Research 7, 145–173. Reprinted in The Random Character of Stock Market
Prices, P. Cootner ed. MIT Press, Cambridge, MA (1964), 100–128. 38, 80
378. Panja, D. (2004) Fluctuating fronts as correlated extreme value problems: An
example of Gaussian statistics. Physical Review E 70, 036101. 44
379. Papoulis, A. (1962) Hilbert transforms. In The Fourier Integral and Its Appli-
cations. McGraw-Hill, New York, 198–201. 279
380. Patton, J.A. (2005) Estimation of multivariate models for time series of possibly
different lengths. Journal of Applied Econometrics. Forthcoming. 100, 203, 217,
254
381. Patton, J.A. (2005) Modelling asymmetric exchange rate dependence. Inter-
national Economic Review. Forthcoming. 203
382. Pelletier, D. (2005) Regime switching for dynamic correlations. Journal of Econometrics. Forthcoming. 279
383. Pfingsten, A., P. Wagner and C. Wolferink (2004) An empirical investigation
of the rank correlation between different risk measures. Journal of Risk 6(4),
55–74. 4
384. Pham, H., T. Rheinländer and M. Schweizer (1998) Mean-variance hedging
for continuous processes: New proofs and examples. Finance & Stochastics 2,
173–198. 136
385. Picoli Jr., S., R.S. Mendes and L.C. Malacarne (2003) q-exponential, Weibull,
and q-Weibull distributions: An empirical analysis. Physica A 324, 678–688.
43, 54
386. Pisarenko, V.F. (1998) Non-linear growth of cumulative flood losses with time.
Hydrological Processes 12, 461–470. 126
387. Plerou, V., P. Gopikrishnan, X. Gabaix and H.E. Stanley (2004) On the origin
of power-law fluctuations in stock prices. Quantitative Finance 4, 7–11. 41
388. Pochart, B. and J.-P. Bouchaud (2002) The skewed multifractal random walk
with applications to option smiles. Quantitative Finance 2, 303–314. 85
389. Polimenis, V. (2002) The distributional CAPM: Connecting risk premia to
return distributions. Working Paper. 15
390. Poon, S.H., M. Rockinger and J. Tawn (2004) Extreme-value dependence in
financial markets: Diagnostics, models and financial implications. Review of
Financial Studies 17, 581–610. 177, 229, 232, 255
391. Popper, S.W., R.J. Lempert and S.C. Bankes (2005) Shaping the future. Sci-
entific American (April) 28. 275
392. Potters, M., J.-P. Bouchaud, L. Laloux and P. Cizeau (1999) Random matrix
theory. Risk 12(3), 69. 24, 27, 28
393. Prause, K. (1998) The generalized hyperbolic model, Ph.D. dissertation, Uni-
versity of Freiburg. 43, 58
394. President’s Working Group on Financial Markets (1999) Hedge Funds, Lever-
age, and the Lessons of Long-Term Capital Management, Report of the US
Department of Treasury. 24
395. Quintos, C.E. (2004) Dating breaks in the extremal behavior of financial time
series. Working Paper. Available at www.ssb.rochester.edu/fac/quintos/
231, 232
396. Quintos, C.E., Z.H. Fan and P.C.B. Phillips (2001) Structural change tests in
tail behaviour and the Asian crisis. Review of Economic Studies 68, 633–663.
231
397. Ramchand, L. and R. Susmel (1998) Volatility and cross correlation across
major stock markets. Journal of Empirical Finance 5, 397–416. 76, 231
398. Rao, C.R. (1965) Linear Statistical Inference and Its Applications. Wiley, New
York. 3, 152, 154
399. Raychaudhuri, S., Cranston, M., Przybyla, C., Shapir, Y. (2001) Maximal
height scaling of kinetically growing surfaces. Physical Review Letters 87,
136101. 44
400. Reese, C. and B. Rosenow (2003) Predicting multivariate volatility. Preprint
at http://arXiv.org/abs/cond-mat/0304082 276
401. Reinganum, M.R. (1981) A new empirical perspective on the CAPM. Journal
of Financial & Quantitative Analysis 16, 439–462. 14
402. Reisman, H. (2002) Some comments on the APT. Quantitative Finance 2,
378–386. 19
403. Rheinländer, T. (2005) An entropy approach to the Stein and Stein model with
correlation. Finance & Stochastics 9, 399–413. 136
404. Richardson, M. and T. Smith (1993) A test for multivariate normality in stocks.
Journal of Business 66, 295–321. 99
405. Riedel, F. (2004) Dynamic coherent risk measures. Stochastic Processes & their
Applications 112, 185–200. 276
406. RiskMetrics Group (1997) CreditMetrics. Technical Document. Available at
http://www.riskmetrics.com/research 137
407. Rockafellar, R.T., S. Uryasev and M. Zabarankin (2005) Generalized Devia-
tions in Risk Analysis. Finance & Stochastics 9. Forthcoming. 4, 7, 8, 9
408. Rockafellar, R.T., S. Uryasev and M. Zabarankin (2005) Master Funds in Port-
folio Analysis with General Deviation Measures. Journal of Banking & Finance
29. Forthcoming. 4, 18
409. Rockinger, M. (2005) Modeling the Dynamics of Conditional Dependency be-
tween Financial Series. In Multi-Moment Capital Pricing Models and Related
Topics, C. Adcock, B. Maillet and E. Jurzenko, eds. Springer. Forthcoming.
100
410. Rockinger, M. and E. Jondeau (2002) Entropy densities with an application
to autoregressive conditional skewness and kurtosis. Journal of Econometrics
106, 119–142. 219
411. Rodkin, M.V. and V.F. Pisarenko (2001) Earthquake losses and casualties: A
statistical analysis. In Problems in Dynamics and Seismicity of the Earth: Coll.
Sci. Proc. Moscow. Computational Seismology 31, 242–272 (in Russian). 126
412. Rodriguez, J.C. (2003) Measuring financial contagion: A copula approach.
Working Paper, Eurandom. 100
413. Roll, R. (1988) The international crash of October 1987. Financial Analysts
Journal 4(5), 19–35. 26
414. Roll, R. (1994) What every CFO should know about scientific progress in
financial economics: What is known and what remains to be resolved. Financial
Management 23(2), 69–75. 1, 18, 19, 20, 24
415. Romer, D. (1996) Advanced Macroeconomics. McGraw-Hill, New York. 277
416. Rootzen, H., M.R. Leadbetter and L. de Haan (1998) On the distribution of
tail array sums for strongly mixing stationary sequences. Annals of Applied
Probability 8, 868–885. 43
417. Rosenberg, J.V. (2003) Nonparametric pricing of multivariate contingent
claims. Journal of Derivatives, Spring. 124, 131
418. Ross, S. (1976) The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360. 19
419. Rothschild, M. and J.E. Stiglitz (1970) Increasing risk I: A definition. Journal
of Economic Theory 2, 225–243. 4
420. Rothschild, M. and J.E. Stiglitz (1971) Increasing risk II: Its economic conse-
quences. Journal of Economic Theory 3, 66–84. 4
421. Rubinstein, M. (1973) The fundamental theorem of parameter-preference se-
curity valuation. Journal of Financial & Quantitative Analysis 8, 61–69. 15,
16, 58
422. Rüschendorf, L. (1974) Asymptotic distributions of multivariate rank order
statistics. Annals of Mathematical Statistics 4, 912–923. 198, 222
423. Ruymgaart, F.H. (1974) Asymptotic normality of nonparametric tests for in-
dependence. Annals of Mathematical Statistics 2, 892–910. 198, 222
424. Ruymgaart, F.H., G.R. Shorack and W.R. van Zwet (1972) Asymptotic nor-
mality of nonparametric tests for independence. Annals of Statistics 44, 1122–
1135. 198, 222
425. Samuelson, P.A. (1965) Rational theory of warrant pricing. Industrial Manage-
ment Review 6(Spring), 13–31. 38, 80
426. Sarfraz, M. (2003) Advances in Geometric Modeling. Wiley, Hoboken. 3
427. Sargent, T.J. (1987) Dynamic Macroeconomic Theory. Harvard University
Press, Cambridge, MA. 19
428. Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and
Visualization. Wiley, New York. 3
429. Sharpe, W. (1964) Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance 19, 425–442. 14, 16, 38
430. Schloegl, L. and D. O’Kane (2004) Pricing and risk-managing CDO tranches.
Quantitative Credit Research, Lehman Brothers. 181
431. Schmeidler, D. (1986) Integral representation without additivity. Proceedings
of the American Mathematical Society 97, 255–261. 7
432. Schumpeter, J.A. (1939) Business Cycles: A Theoretical, Historical and Sta-
tistical Analysis of the Capitalist Process. McGraw-Hill, New York. 277
433. Schürmann, J. (1996) Pattern Classification: A Unified View of Statistical and
Neural Approaches. Wiley, New York. 3
434. Schweizer, M. (1995) On the minimal martingale measure and the Föllmer-
Schweizer decomposition. Stochastic Analysis & Applications 13, 573–599. 136
435. Schweizer, M. (1999) A minimality property of the minimal martingale mea-
sure. Statistics & Probability Letters 42, 27–31. 136
436. Serva, M., U.L. Fulco, M.L. Lyra and G.M. Viswanathan (2002) Kinematics of
stock prices. Preprint at http://arXiv.org/abs/cond-mat/0209103. 78
437. Shefrin, H. (2000) Beyond Greed and Fear: Understanding Behavioral Finance
and the Psychology of Investing. Harvard Business School Press, Boston, MA.
4, 281
438. Shiller, R.J. (2000) Irrational Exuberance. Princeton University Press, Prince-
ton, NJ. 23
439. Shleifer, A. (2000) Inefficient Markets: An Introduction to Behavioral Finance,
Clarendon Lectures in Economics. Oxford University Press, Oxford. 4, 281
440. Silvapulle, P. and C.W.J. Granger (2001) Large returns, conditional correlation
and portfolio diversification: A Value-at-Risk approach. Quantitative Finance
1, 542–551. 231
441. Simon, H.A. (1957) Models of Man: Social and Rational; Mathematical Essays on Rational Human Behavior in a Social Setting. Wiley, New York. 39
442. Skaug, H. and D. Tjostheim (1996) Testing for serial independence using mea-
sures of distance between densities. In Athen Conference on Applied Probability
and Time Series, P. Robinson and M. Rosenblatt, eds. Springer, New York.
163
443. Sklar, A. (1959) Fonctions de répartition à n dimensions et leurs marges. Publication de l’Institut de Statistique de l’Université de Paris 8, 229–231. 103,
104
444. Smith, R.L. (1985) Maximum likelihood estimation in a class of non-regular
cases. Biometrika 72, 67–90. 44, 48
445. Sobehart, J. and R. Farengo (2003) A dynamical model of market under- and
overreaction. Journal of Risk 5(4). 57
446. Sornette, D. (1998) Linear stochastic dynamics with nonlinear fractal proper-
ties. Physica A 250, 295–314. 39
447. Sornette, D. (1998) Large deviations and portfolio optimization. Physica A
256, 251–283. 2
448. Sornette, D. (2002) Predictability of catastrophic events: Material rupture,
earthquakes, turbulence, financial crashes and human birth. Proceedings of the
National Academy of Science USA 99 (Suppl), 2522–2529. 277
449. Sornette, D. (2003) Critical market crashes. Physics Reports 378, 1–98. 276
450. Sornette, D. (2003) Why Stock Markets Crash, Critical Events in Complex
Financial Systems. Princeton University Press, Princeton, NJ. VII, 23, 79,
211, 276
451. Sornette, D. (2004) Critical Phenomena in Natural Sciences, Chaos, Frac-
tals, Self-organization and Disorder: Concepts and Tools, 2nd enlarged edition.
Springer Series in Synergetics, Heidelberg. VIII, 22, 42, 46, 58, 78, 277
452. Sornette, D. (2005) Endogenous versus exogenous origins of crises. In Extreme
Events in Nature and Society, S. Albeverio, V. Jentsch and H. Kantz, eds.
Springer, Heidelberg. 278
453. Sornette, D., J.V. Andersen and P. Simonetti (2000) Portfolio theory for “fat
tails”. International Journal of Theoretical & Applied Finance 3, 523–535. 2,
58, 78, 108, 219, 233
454. Sornette, D. and R. Cont (1997) Convergent multiplicative processes repelled
from zero: Power laws and truncated power laws. Journal de Physique I France
7, 431–444. 39
455. Sornette, D., F. Deschatres, T. Gilbert and Y. Ageon (2004) Endogenous versus
exogenous shocks in complex networks: An empirical test. Physical Review
Letters 93(22), 228701. 278
456. Sornette, D., Y. Malevergne and J.F. Muzy (2003) What causes crashes? Risk
16 (2), 67–71. http://arXiv.org/abs/cond-mat/0204626. 87, 210, 278
457. Sornette, D., P. Simonetti and J.V. Andersen (2000) φq-field theory for portfolio optimization: “Fat-tails” and non-linear correlations. Physics Reports 335, 19–92. 34
458. Sornette D. and W.-X. Zhou (2005) Predictability of large future changes in
complex systems. International Journal of Forecasting. Forthcoming. Available
at http://arXiv.org/abs/cond-mat/0304601. 278
459. Sornette, D. and W.-X. Zhou (2004) Non-parametric determination of real-
time lag structure between two time series: The “optimal thermal causal path”
method. Working Paper. Available at http://arXiv.org/abs/cond-mat/
0408166 281
460. Sornette, D. and W.-X. Zhou (2005) Importance of positive feedbacks and over-
confidence in a self-fulfilling Ising model of financial Markets. Working Paper.
Available at http://arxiv.org/abs/cond-mat/0503607. 279
461. Srivastava, M.S. (2002) Methods of Multivariate Statistics. Wiley-Interscience,
New York. 3
462. Starica, C. (1999) Multivariate extremes for models with constant conditional
correlations. Journal of Empirical Finance 6, 515–553. 232
463. Starica, C. and O. Pictet (1999) The tales the tails of GARCH(1,1) process
tell. Working Paper, University of Pennsylvania. 66
464. Stollnitz, E.J., T.D. DeRose and D.H. Salesin (1996) Wavelets for Computer
Graphics: Theory and Applications. Morgan Kaufmann Publishers, San Fran-
cisco, CA. 3
465. Stuart, A. and K. Ord (1994) Kendall’s Advanced Theory of Statistics. Wiley,
New York. 10, 58
466. Stulz, R.M. (1982) Options on the minimum or the maximum of two risky
assets: Analysis and applications. Journal of Financial Economics 10, 161–
185. 135
467. Szegö, G. (1999) A critique to Basel regulation, or how to enhance (im)moral
hazards. In Proceedings of the International Conference on Risk Management
and Regulation in Banking. Bank of Israel, Kluwer Academic Press, Dordrecht,
Boston. VIII
468. Tabachnick, B.G. and L.S. Fidell (2000) Using Multivariate Statistics, 4th
edition. Pearson Allyn & Bacon, Boston and New York. 3
469. Taleb, N.N. (2004) Fooled by Randomness: The Hidden Role of Chance in Life
and in the Markets, 2nd edition. Texere, New York. 21
470. Taleb, N. (2004) Learning to expect the unexpected. The New York Times,
April 8. 36
471. Tasche, D. (2002) Expected shortfall and beyond. Journal of Banking & Finance
26, 1519–1533. 6
472. Tasche, D. and L. Tibiletti (2001) Approximations for the Value-at-Risk ap-
proach to risk-return analysis. The ICFAI Journal of Financial Risk Manage-
ment 1(4), 44–61. 128
473. Taylor, S.J. (1994) Modeling stochastic volatility: A review and comparative
study. Mathematical Finance 2, 183–204. 37
474. Thaler, R.H. (1993) Advances in Behavioral Finance. Russell Sage Foundation,
New York. 4, 281
475. Toulouse, G. (1977) Theory of the frustration effect in spin glasses. Communi-
cations in Physics 2, 115–119. 164
476. Tsallis, C. (1988) Possible generalization of Boltzmann–Gibbs statistics. Jour-
nal of Statistical Physics 52, 479–487. For an updated bibliography on this
subject, see http://tsallis.cat.cbpf.br/biblio.htm 163
477. Tsui, A.K. and Q. Yu (1999) Constant conditional correlation in a bivariate
GARCH model: Evidence from the stock markets of China. Mathematics &
Computers in Simulation 48, 503–509. 231, 276
478. Valdez, E.A. (2001) Bivariate analysis of survivorship and persistency. Insur-
ance: Mathematics & Economics 29, 357–373. 100
479. van der Vaart, A.W., R. Gill, B.D. Ripley, S. Ross, B. Silverman and M. Stein
(2000) Asymptotic Statistics. Cambridge University Press, Cambridge. 275
480. Vaupel, J.W., K.G. Manton and E. Stallard (1979) The impact of heterogeneity
in individual frailty on the dynamics of mortality. Demography 16, 439–454.
113
481. Vannimenus, J. and G. Toulouse (1977) Theory of the frustration effect: II.
Ising spins on a square lattice. Journal of Physics C: Solid State Physics 10,
537–541. 164
482. Von Neumann, J. and O. Morgenstern (1944) Theory of Games and Economic
Behavior. Princeton University Press, Princeton, NJ. 4, 272
483. Wang, T. (2002) A class of dynamic risk measures. Working Paper, Faculty of
Commerce and Business Administration, U.B.C. 276
484. Wilcox, R.R. (2004) Introduction to Robust Estimation and Hypothesis Testing,
2nd edition. Academic Press, New York. 275
485. Wilks, S.S. (1938) The large sample distribution of the likelihood ratio for
testing composite hypotheses. Annals of Mathematical Statistics 9, 60–62. 71
486. Williamson, R.C. and T. Downs (1990) Probabilistic arithmetic: Numerical methods for calculating convolutions and dependency bounds. Journal of Approximate Reasoning 4, 89–158. 118, 127
487. Willekens, E. (1988) The structure of the class of subexponential distributions.
Probability Theory & Related Fields 77, 567–581. 57
488. Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA. 22
489. Wilson, K.G. (1979) Problems in physics with many scales of length. Scientific
American 241 (August), 158–179. 22
490. Wilson, K.G. and J. Kogut (1974) The renormalization group and the ε expansion. Physics Reports 12, 75–200. 22
491. White, H. (1982) Maximum likelihood estimation of misspecified models.
Econometrica 50, 1–25. 224
492. White, H. (1994) Estimation, Inference and Specification Analysis, Econo-
metric Society Monograph, 22. Cambridge University Press, Cambridge, New
York. 202
493. Wu, L. (2006) Dampened power law: Reconciling the tail behavior of financial
security returns. Journal of Business 79(6). 75
494. Wyart, M. and J.-P. Bouchaud (2003) Self-referential behavior, overreaction,
and conventions in financial markets. Working Paper. Available at http://
arxiv.org/abs/cond-mat/0303584. 279
495. Yaari, M.E. (1987) The dual theory of choice under risk. Econometrica 55,
95–115. 7
496. Zajdenweber, D. (1996) Extreme values in business interruption insurance.
Journal of Risk & Insurance 63, 95–110. 126
497. Zajdenweber, D. (2000) Economie des Extrêmes. Flammarion, Paris. 126
498. Zhang, J. and R.A. Stine (2001) Autocovariance structure of Markov regime
switching models and model selection. Journal of Time Series Analysis 22,
107–124. 279
Index

Akaike information criterion 215
ALAE 201
Anderson-Darling distance 61
Arbitrage 35, 132, 133
Arbitrage pricing theory X, 19
ARCH see GARCH
Archimedean copula 111, 204, 255
  Clayton 112, 114
  Frank 112
  Gumbel 112
  Kendall’s tau 155
  orthant dependence 166
  tail dependence 170
Asian crisis 211, 230
Associativity 114
Asymptotic independence 169
Bank for International Settlements VIII, 35
Bernstein polynomial 191
Bhattacharya-Matusita-Hellinger dependence metric 163, 203
Black swan see Outlier
Black-Merton-Scholes’ option pricing model VIII, 20, 38
Book-to-market 18
Bootstrap 62, 192, 207
British Pound 208, 210
Canonical coefficient of N-correlation 153, 154
Capital asset pricing model IX, 14, 38
Central limit theorem VIII
Clayton’s copula 112, 114, 200, 217, 249
  Kendall’s tau 156
  simulation 123
  tail dependence 171
Coefficient of tail dependence see Tail dependence
Coherent measures of risk 4, 276
Comonotonicity 101, 102, 107, 149, 155, 160
Complete market 133, 136
Complete monotonicity 111
Concordance measure 154–162, 165
Conditional correlation coefficient 233
Consistent measures of risk 7
Contagion XI, 231, 260
Contingent claim see Option
Convex measure of risk 7
Copula X, 34, 35, 103, 273
  Archimedean see Archimedean copula
  dual 104, 118
  elliptical see Elliptical copula
  extreme value see Extreme value copula
  Fréchet-Hoeffding bounds 106
  survival 104, 114, 132, 140, 166, 215
Correlation coefficient 2, 24, 99, 105, 147–154, 165, 173, 174, 189, 219, 220
  Hoeffding identity 149
Countermonotonicity 107, 149, 155, 160, 164
Coup against Gorbachev 209, 276
Covariance matrix 3, 24, 33
CreditMetrics 138
CreditRisk+ 137
CSFB/Tremont index 167
Currency
  British Pound 208, 210
  Euro 195, 203, 217
  German Mark 197–199, 208–210, 214, 215, 276
  Japanese Yen 197, 199, 208, 215, 217
  Malaysian Ringit 208, 210, 214
  Swiss Franc 208–210, 214, 276
  Thai Baht 208, 210
  US Dollar 208–210, 240, 276
Default risk 100, 137, 180
Dependence 2, 101
  mutual complete 101
  positive orthant 119, 164
Dependence measure 147, 161
Dependence metric 162
Dependence structure see copula
Derivative see Option
Digital option 131
Distribution function
  Exponential 58, 90, 175
  Fréchet 45, 255
  Gamma 58, 90, 175
  Gaussian 2, 37, 148, 169, 175, 233
  Generalized Pareto 39, 44–47, 116
  GEV 45, 47, 116
  Gumbel 45, 48
  Lévy stable law 2, 39, 42, 148, 243
  Log-normal 37, 60, 78, 150
  Log-Weibull 60, 69, 91
  Meta-elliptical 109
  Meta-Gaussian 108
  Modified-Weibull 128
  Pareto 39, 57, 64, 88
  Pearson type-VII 42
  Shifted-Pareto 126
  Stretched exponential 43, 50, 57
  Student t 42, 108, 157, 175, 233
  Weibull 50, 57, 67, 88
Diversification 19, 180
Dow Jones Industrial Average Index 44, 53, 62, 78
Drawdown 23, 36, 79, 276
Dual copula 104, 118
Efficient market hypothesis 20
Elliptical copula 107, 196, 200
  Gaussian copula 108
  Kendall’s tau 157
  simulation 120
  Student’s copula 109
  tail dependence 172
Empirical copula 190
Endogeneity 276
Euro 195, 203, 217
European Monetary System 210
Evolutionary stable equilibrium 20
Exogeneity 277
Expectation-bounded measures of risk 7
Expected utility 4, 21, 165
Expected-Shortfall 47, 79
Exponential distribution 58, 90, 175
Extremal index 46
Extreme value copula 116, 117
Extreme value theory 43, 45, 254, 273
Factor model 3, 19, 24, 29, 111, 138, 174, 233, 238, 255
Fat tail see Heavy tail
Federal Reserve Board 208
Firm size 37
Foreign exchange rate 215
Fréchet distribution 45, 255
Fréchet-Hoeffding bounds 106, 115, 117, 119
Fractality 80
Fractional Brownian motion 81
Frailty model 113
Frank’s copula 112
  Kendall’s tau 156
  simulation 123
  tail dependence 171
Friendship theorem 26
Gain–loss ratio 13
Gamma distribution 58, 90, 175
GARCH 35, 37, 43, 108, 205, 217, 219, 231
Gaussian copula 108, 128, 130, 131, 135, 137, 204, 212, 217
Gaussian distribution 2, 37, 148, 169, 175, 233
General deviation measures 8
Generalized Extreme Value distribution 45, 47, 116
Generalized Pareto distribution 39, 44–47, 116
German Mark 197–199, 208–210, 214, 215, 276
Gini’s gamma 161, 165
Girsanov theorem 144
Gnedenko theorem 45, 76
Goodness of fit 61, 164, 189
GPBH theorem 115
Great Depression 53
Gumbel distribution 45, 48
Gumbel’s copula 112, 117
  Kendall’s tau 156
  simulation 123
  tail dependence 171
Heavy tail 2, 12, 15, 36, 38, 42, 57, 157
Heteroscedasticity see Volatility clustering
High frequency data 35, 37, 44
Hill estimator 43, 48, 64, 240
Hoeffding identity 149
Ibov index 240
Incomplete gamma function 57
Inflation VII
Information matrix
  Fisher 198, 201, 222
  Godambe 202
Internet bubble VII, VIII
Invariance theorem 105
Ipsa index 240
Japanese Yen 197, 199, 208, 215, 217
Kendall’s tau 154, 165, 196, 249
  Archimedean copula 155
  elliptical copula 157
Kernel estimator 192
King see Outlier
KMV 138
Kolmogorov distance 61
Kullback-Leibler divergence 61, 163
Lévy process 35
Lévy stable law 2, 39, 42, 148, 243
Lambert function 172
Laplace transform 113
Latin American crisis 228, 260
  Argentinean crisis 231, 233, 240, 261
  Mexican crisis 230, 233, 240, 247, 261
Linear dependence see Correlation coefficient
Local correlation coefficient 151
Log infinitely divisible process 41
Log-normal distribution 37, 60, 78, 150
Log-Weibull distribution 60, 69, 91
LTCM 24
Lunch effect 53
Malaysian Ringit 208, 210, 214
Market crash VII, 23, 36, 38
  April 2000 230
  October 1987 VII, VIII, 26, 230, 247
Market index
  CSFB/Tremont 167
  Dow Jones Industrial Average 44, 53, 62, 78
  Ibov 240
  Ipsa 240
  Merval 240
  Mexbol 240
  Nasdaq Composite 53, 77, 230
  Standard & Poor’s 500 39, 60, 129, 167, 177, 216
Market liquidity 6, 41
Market trend 231, 234, 253, 255
Markowitz’ portfolio selection see Mean-variance portfolio theory
Maximum domain of attraction 45, 46
Mean-variance portfolio theory VIII, 33, 38, 58
Merton model of credit risk 20, 137
Merval index 240
Meta-elliptical distribution 109
Meta-Gaussian distribution 108
Mexbol index 240
Micro-structure 189, 215
Minimum option 135
Minority game 22
Mixture model 137
Modified-Weibull distribution 128, 131
Monte Carlo 120, 192
Multifractal Random Walk 39, 84, 210, 219, 278
Mutual complete dependence 101
Nasdaq Composite index 53, 77, 230
New economy 230
Normal law see Gaussian distribution
Occam’s razor 212
Option 33, 100, 131, 192
  digital 131
  minimum 135
  rainbow 135
Outlier 23, 30, 36, 80, 206, 209, 210, 276
Pareto distribution 39, 57, 64, 88
Pearson estimator 148, 157
Pearson type-VII distribution 42
Pickands estimator 43, 47–49
Portfolio 3, 100, 179, 189, 200, 216
  analysis 205
  insurance 14, 23
  management VIII, 180, 212, 217, 231
  risk 3, 33, 124, 127, 128, 177, 192
  theory 3, 33
Positive orthant dependence 119, 164
  Archimedean copula 166
Pseudo likelihood 197, 215, 222
Pseudo-sample 197, 206
Quantile 35
Rainbow option 135
Regular variation 39, 171
Risk VII, VIII, 1
  analysis 13
  assessment 78, 124, 212, 231
  aversion 4, 16, 17
  management VIII, 35, 79, 100, 205, 271
  measure 10
    coherent measures of risk 4, 276
    consistent measures of risk 7
    expectation-bounded measures of risk 7
    general deviation measures 8
    spectral measures of risk 6
  premium 37
Russian crisis 230
Securization 100
Self-organized criticality 277
Self-similarity 80
Semi-invariant 10
Shannon entropy 163
Sharpe’s market equilibrium model see Capital asset pricing model
Shifted-Pareto distribution 126
Sklar’s theorem 104, 107, 120, 190
Spearman’s rho 159, 165, 196, 248
Spectral measures of risk 6
Standard & Poor’s 500 39, 60, 129, 167, 177, 216
Stone-Weierstrass theorem 192
Stress testing 10, 43, 210, 214
Student t distribution 42, 108, 157, 175, 233
Student’s copula 109, 117, 131, 195, 212, 256
Survival copula 104, 114, 132, 140, 166, 215
Swarm intelligence 22
Swiss Franc 208–210, 214, 276
Swiss National Bank 210
Tail dependence 168, 192, 212, 220, 233, 254
  Archimedean copula 170
  elliptical copula 172
  factor model 174
Tail risk 124, 131, 177, 216
Thai Baht 208, 210
Theorem
  central limit VIII
  friendship 26
  Girsanov 144
  Gnedenko 45, 76
  GPBH 46, 115
  invariance 105
  Sklar 104, 107, 190
  Stone-Weierstrass 192
  Wilks 71, 93, 216, 224
US Dollar 208–210, 240, 276
Value-at-Risk 2, 5, 43, 46, 79, 100, 124, 128, 131, 168, 271, 276
Volatility 2, 231
  clustering 35, 38, 39, 43, 194, 217
Volume of transactions 41
Weibull distribution 50, 57, 67, 88
Wilks theorem 71, 93, 216, 224