Multivariate Distributions
We will study multivariate distributions in these notes, focusing in particular on multivariate normal,
normal-mixture, spherical and elliptical distributions. In addition to studying their properties, we will also discuss
techniques for simulating and, very briefly, estimating these distributions. Familiarity with these classes of
multivariate distributions is important for many aspects of risk management. We will defer the study
of copulas until later in the course.
1 Preliminary Definitions
Let X = (X1, . . . , Xn) be an n-dimensional vector of random variables. We have the following definitions and
statements.
Definition 1 (Joint CDF) For all x = (x1 , . . . , xn )> ∈ Rn , the joint cumulative distribution function (CDF)
of X satisfies
FX (x) = FX (x1 , . . . , xn ) = P (X1 ≤ x1 , . . . , Xn ≤ xn ).
It is straightforward to generalize the previous definition to joint marginal distributions. For example, the joint
marginal distribution of Xi and Xj satisfies Fij (xi , xj ) = FX (∞, . . . , ∞, xi , ∞, . . . , ∞, xj , ∞, . . . ∞). If the
joint CDF is absolutely continuous, then it has an associated probability density function (PDF) so that
FX(x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f(u1, . . . , un) du1 · · · dun.
Similar statements also apply to the marginal CDF’s. A collection of random variables is independent if the joint
CDF (or PDF if it exists) can be factored into the product of the marginal CDFs (or PDFs). If
X1 = (X1 , . . . , Xk )> and X2 = (Xk+1 , . . . , Xn )> is a partition of X then the conditional CDF satisfies

FX2|X1(x2 | x1) = P(X2 ≤ x2 | X1 = x1) = (1 / fX1(x1)) ∫_{−∞}^{x_{k+1}} · · · ∫_{−∞}^{xn} f(x1, u_{k+1}, . . . , un) du_{k+1} · · · dun

where fX1 (·) is the joint marginal PDF of X1 . Assuming it exists, the mean vector of X is given by
E[X] := (E[X1], . . . , E[Xn])>
Similarly, and again assuming it exists, the covariance matrix of X is given by

Σ := Cov(X) := E[(X − E[X])(X − E[X])>]

so that the (i, j)th element of Σ is simply the covariance of Xi and Xj . Note that the covariance matrix is
symmetric so that Σ> = Σ, its diagonal elements satisfy Σi,i ≥ 0, and it is positive semi-definite so that
x>Σx ≥ 0 for all x ∈ Rn. The correlation matrix, ρ(X), has as its (i, j)th element ρij := Corr(Xi, Xj). It is
also symmetric, positive semi-definite and has 1’s along the diagonal. For any matrix A ∈ Rk×n and vector
a ∈ Rk we have

E[AX + a] = A E[X] + a    (1)
Cov(AX + a) = A Σ A>.    (2)

The characteristic function of X is given by

φX(s) := E[exp(i s>X)] for s ∈ Rn    (3)

and, if it exists, the moment-generating function (MGF) is given by (3) with s replaced by −i s.
2 The Multivariate Normal Distribution
If X has a multivariate normal distribution with mean vector µ and covariance matrix Σ, then we write
X ∼ MNn(µ, Σ). The standard multivariate normal has µ = 0 and Σ = In , the n × n identity matrix. The PDF of X is given by
f(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2) (x − µ)>Σ^{−1}(x − µ)).    (4)
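For concreteness, the density in (4) can be evaluated directly. The following Matlab sketch uses an assumed two-dimensional µ, Σ and evaluation point x chosen purely for illustration; if the Statistics Toolbox is available, the result can be compared with mvnpdf.

% Evaluate the multivariate normal PDF (4) at a point x.
% mu, Sigma and x are illustrative choices, not values from the notes.
mu    = [0; 0];
Sigma = [1.0 0.3; 0.3 2.0];
x     = [0.5; -0.2];
n = length(mu);
d = x - mu;
f = exp(-0.5 * (d' / Sigma) * d) / ((2*pi)^(n/2) * sqrt(det(Sigma)))
% With the Statistics Toolbox installed, compare with mvnpdf(x', mu', Sigma).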
Recall again our partition of X into X1 = (X1 , . . . , Xk )> and X2 = (Xk+1 , . . . , Xn )> . If we extend this
notation naturally so that
µ = [ µ1 ]   and   Σ = [ Σ11  Σ12 ]
    [ µ2 ]             [ Σ21  Σ22 ]
then we obtain the following results regarding the marginal and conditional distributions of X.
Marginal Distribution
The marginal distribution of a multivariate normal random vector is itself multivariate normal. In particular,
Xi ∼ MN(µi , Σii ), for i = 1, 2.
Conditional Distribution
Assuming Σ is positive definite, the conditional distribution of a multivariate normal distribution is also a
multivariate normal distribution. In particular,
X2 | X1 = x1 ∼ MN(µ2.1 , Σ2.1 )

where µ2.1 = µ2 + Σ21 Σ11^{−1} (x1 − µ1) and Σ2.1 = Σ22 − Σ21 Σ11^{−1} Σ12.
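As a concrete illustration, the conditional mean and covariance can be computed with a few lines of Matlab. The partition, µ, Σ and observed value x1 below are assumed values chosen only for this example.

% Conditional distribution of X2 = (X2, X3) given X1 = x1 for an assumed
% 3-dimensional example; mu, Sigma and x1 are illustrative choices.
mu    = [0; 1; -1];
Sigma = [2.0 0.6 0.4; 0.6 1.5 0.3; 0.4 0.3 1.0];
i1 = 1;  i2 = [2 3];          % indices defining the partition
x1 = 0.8;                     % observed value of X1
mu1 = mu(i1);       mu2 = mu(i2);
S11 = Sigma(i1,i1); S12 = Sigma(i1,i2);
S21 = Sigma(i2,i1); S22 = Sigma(i2,i2);
mu_cond    = mu2 + S21 * (S11 \ (x1 - mu1))   % mean of X2 | X1 = x1
Sigma_cond = S22 - S21 * (S11 \ S12)          % covariance of X2 | X1 = x1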
Linear Combinations
Linear combinations of multivariate normal random vectors remain normally distributed with mean vector and
covariance matrix given by (1) and (2), respectively.
It is well known that any symmetric positive-definite matrix M can be written as

M = U>DU

where U is an upper triangular matrix and D is a diagonal matrix with positive diagonal elements. Since Σ is
symmetric positive-definite, we can therefore write
Σ = U>DU = (U>√D)(√DU) = (√DU)>(√DU).

The matrix C := √D U therefore satisfies C>C = Σ. It is called the Cholesky Decomposition of Σ.
We must be very careful in Matlab and R to pre-multiply Z by C> and not C: in Matlab, for example, the
command chol(Sigma) returns the upper triangular factor C satisfying C>C = Σ. We have the following
algorithm for generating correlated normal random vectors, X.
Generating Correlated Normal Random Variables
generate Z ∼ MN(0, I)
/∗ Now compute the Cholesky Decomposition ∗/
compute C such that C> C = Σ
set X = C> Z
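In Matlab the algorithm might be implemented as follows; the particular µ, Σ and sample size are illustrative choices, and the sample moments at the end simply provide a sanity check.

% Simulate N correlated normal vectors X ~ MN(mu, Sigma) via the Cholesky
% decomposition; mu and Sigma are illustrative choices.
mu    = [1; -1; 0];
Sigma = [1.0 0.5 0.5; 0.5 2.0 0.3; 0.5 0.3 1.5];
N = 1e5;
C = chol(Sigma);                  % upper triangular with C'*C = Sigma
Z = randn(3, N);                  % columns are independent MN(0, I) vectors
X = repmat(mu, 1, N) + C'*Z;      % note the transpose: pre-multiply by C'
mean(X, 2)                        % should be close to mu
cov(X')                           % should be close to Sigma

Using C instead of C> would produce vectors whose covariance matrix is CC> rather than Σ.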
3 Normal-Mixture Models
Normal-mixture models are a class of models generated by introducing randomness into the covariance matrix
and / or the mean vector. Following the development of MFE we have the following definition of a normal
variance mixture:
X := µ + √W A Z    (5)

where
(i) Z ∼ MNk (0, Ik )
(ii) W ≥ 0 is a scalar random variable independent of Z and
(iii) A ∈ Rn×k and µ ∈ Rn are a matrix and vector of constants, respectively.
Note that if we condition on W , then X is multivariate normally distributed. This observation also leads to an
obvious simulation algorithm for generating samples of X: first simulate a value of W and then simulate X
conditional on the generated value of W . We are typically interested in the case where rank(A) = n ≤ k and
Σ = AA> is a full-rank positive definite matrix. In this case we obtain a non-singular normal variance mixture.
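As a sketch of this two-step approach, the following Matlab fragment simulates from a normal variance mixture with W = ν/χ²ν, a choice made here purely for illustration (it produces a multivariate t distribution with ν degrees of freedom); µ, A and ν are likewise assumed example values.

% Two-step simulation of X = mu + sqrt(W)*A*Z for an assumed mixing
% variable W = nu / chi-squared(nu); mu, A and nu are illustrative choices.
nu = 5;  N = 1e5;
mu = [0; 0];
A  = [1.0 0.0; 0.4 0.9];                     % dispersion matrix Sigma = A*A'
Z  = randn(2, N);                            % Z ~ MN(0, I)
W  = nu ./ sum(randn(nu, N).^2, 1);          % chi-squared(nu) via sums of squares
X  = repmat(mu, 1, N) + repmat(sqrt(W), 2, 1) .* (A*Z);
cov(X') * (nu - 2) / nu                      % should be close to Sigma = A*A'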
Assuming W is integrable, we immediately see that

E[X] = µ   and   Cov(X) = E[W] Σ.
We refer to µ and Σ as the location vector and dispersion matrix of the distribution. It is also clear that the
correlation matrices of X and AZ are the same as long as W is integrable. This means that if A = Ik then the
components of X are uncorrelated, though they are not in general independent. The following result
(Lemma 3.5 in MFE) emphasizes this point.
Lemma 1 Let X = (X1 , X2 ) have a normal mixture distribution with A = I2 , µ = 0 and E[W ] < ∞ so that
Cov(X1 , X2 ) = 0. Then X1 and X2 are independent if and only if W is a constant with probability 1. (If W is
constant then X1 and X2 are IID N(0, W ).)
Proof: If W is a constant then it immediately follows from the independence of Z1 and Z2 that X1 and X2 are
also independent. Suppose now that X1 and X2 are independent. Note that
E[|X1| |X2|] = E[W |Z1| |Z2|] = E[W] E[|Z1| |Z2|]
             ≥ E[√W]^2 E[|Z1| |Z2|] = E[|X1|] E[|X2|],
where the inequality is Jensen's inequality applied to √W and holds
with equality only if W is a constant. But the independence of X1 and X2 implies that we must have equality
and so W is indeed constant almost surely.
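A quick simulation illustrates the lemma: with A = I2, µ = 0 and a non-constant W, the components of X are uncorrelated, yet |X1| and |X2| are clearly positively correlated. The two-point distribution used for W below is an assumption made only for this illustration.

% Uncorrelated but dependent components: X = sqrt(W).*Z with A = I2, mu = 0.
N = 1e5;
Z = randn(2, N);
W = 0.25 + 2.25 * (rand(1, N) < 0.5);        % W equals 0.25 or 2.5, each w.p. 1/2
X = repmat(sqrt(W), 2, 1) .* Z;
corrcoef(X(1,:), X(2,:))                     % off-diagonal entries close to 0
corrcoef(abs(X(1,:)), abs(X(2,:)))           % off-diagonal entries clearly positive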
We can easily calculate the characteristic function of a normal variance mixture. Using (5), we obtain
φX(s) = E[exp(i s>X)] = E[ E[exp(i s>X) | W] ]
      = E[exp(i s>µ − (1/2) W s>Σs)]
      = exp(i s>µ) Ŵ((1/2) s>Σs)    (6)

where Ŵ(·) is the Laplace transform of W. As a result, we sometimes use the notation X ∼ Mn(µ, Σ, Ŵ) for
normal variance mixtures. We have the following proposition (Proposition 3.9 in MFE) showing that affine
transformations of normal variance mixtures remain normal variance mixtures.
Proposition 1 If X ∼ Mn(µ, Σ, Ŵ) and Y = BX + b for B ∈ Rk×n and b ∈ Rk, then Y ∼ Mk(Bµ + b, BΣB>, Ŵ).
The proof is straightforward using (6). This result is useful in the following setting: suppose a collection of risk
factors has a normal variance mixture distribution. Then the usual linear approximation to the loss distribution
will also have a (1-dimensional) normal variance mixture distribution.
4 Spherical and Elliptical Distributions
Theorem 2 For a random vector X = (X1, . . . , Xn)>, the following statements are equivalent:
1. X has a spherical distribution, meaning that for every orthogonal matrix U ∈ Rn×n,
   UX ∼ X.    (7)
2. There exists a function ψ(·) of a scalar variable such that, for all s ∈ Rn,
   φX(s) = ψ(s>s).    (8)
3. For all a ∈ Rn,
   a>X ∼ ||a|| X1
   where ||a||^2 = a>a = a1^2 + · · · + an^2.
Proof: The proof is straightforward but see Section 3.3 of MFE for details.
Part (2) of Theorem 2 shows that the characteristic function of a spherical distribution is completely determined
by a function of a scalar variable. This function, ψ(·), is known as the generator of the distribution and it is
common to write X ∼ Sn (ψ).
For example, if X ∼ MNn(0, In) then φX(s) = exp(−(1/2) s>s), and so it follows from (8) that X is spherical with generator ψ(s) = exp(−(1/2) s).
It is worth noting that there are also spherical distributions that are not normal variance mixture distributions.
Another important and insightful result regarding spherical distributions is given in the following theorem. A
proof may be found in Section 3.4 of MFE.
Theorem 3 The random vector X = (X1 , . . . , Xn ) has a spherical distribution if and only if it has the
representation
X ∼ R S
where S is uniformly distributed on the unit sphere S^{n−1} := {s ∈ Rn : s>s = 1} and R ≥ 0 is a random
variable independent of S.
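The representation in Theorem 3 also suggests a simple way to simulate spherical random vectors: draw S by normalizing a standard normal vector and draw R independently from any non-negative distribution. The exponential choice of R in the Matlab sketch below is purely illustrative.

% Simulate spherical vectors X = R*S; S is uniform on the unit sphere and
% R is an independent radial variable (Exp(1) here, an illustrative choice).
n = 3;  N = 1e5;
G = randn(n, N);
S = G ./ repmat(sqrt(sum(G.^2, 1)), n, 1);   % uniformly distributed on the sphere
R = -log(rand(1, N));                        % R ~ Exp(1), independent of S
X = repmat(R, n, 1) .* S;
cov(X')                                      % approximately proportional to the identity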
The random vector X = (X1, . . . , Xn)> is said to have an elliptical distribution if it satisfies

X ∼ µ + AY

where Y ∼ Sk(ψ), and A ∈ Rn×k and µ ∈ Rn are a matrix and vector of constants, respectively.
We therefore see that elliptical distributions are obtained via multivariate affine transformations of spherical
distributions. It is easy to calculate the characteristic function of an elliptical distribution. We obtain
φX(s) = E[exp(i s>(µ + AY))]
      = exp(i s>µ) E[exp(i (A>s)>Y)]
      = exp(i s>µ) ψ(s>Σs)
where as before Σ = AA> . It is common to write X ∼ En (µ, Σ, ψ) and we refer to µ and Σ as the location
vector and dispersion matrix, respectively. It is worth mentioning, however, that Σ and ψ are only uniquely
determined up to a positive constant.
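Putting the pieces together, elliptical vectors can be simulated by applying an affine transformation to a spherical vector constructed as in Theorem 3. The radial distribution, µ and A in the sketch below are again assumed example values.

% Simulate elliptical vectors X = mu + A*Y from a spherical Y = R*S;
% the Exp(1) radial part, mu and A are illustrative choices.
k = 2;  N = 1e5;
G = randn(k, N);
S = G ./ repmat(sqrt(sum(G.^2, 1)), k, 1);   % uniform on the unit sphere
R = -log(rand(1, N));                        % radial variable, independent of S
Y = repmat(R, k, 1) .* S;                    % spherical Y
mu = [1; -1];
A  = [1.0 0.0; 0.8 0.6];
X  = repmat(mu, 1, N) + A*Y;                 % elliptical with dispersion Sigma = A*A'
cov(X')                                      % proportional to Sigma = A*A'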
As mentioned earlier, the elliptical distributions form a rich class of distributions, including both heavy- and
light-tailed distributions. Their importance is due to this richness as well as to their general tractability. For
example, elliptical distributions are closed under linear operations. Moreover, the marginal and conditional
distributions of elliptical distributions are elliptical distributions. They may be estimated using maximum
likelihood methods such as the EM algorithm or other iterative techniques. Additional information and
references may be found in MFE but we note that software applications such as R or Matlab will often fit these
distributions for you.