Multivariate Normal Distribution

2 Definition

A random k-vector x = (X1, …, Xk)ᵀ has a multivariate normal distribution if it satisfies the following equivalent conditions:

- There exists a random ℓ-vector z, whose components are independent standard normal random variables, a k-vector μ, and a k × ℓ matrix A, such that x = Az + μ. Here ℓ is the rank of the covariance matrix Σ = AAᵀ. Especially in the case of full rank, see the section below on Geometric interpretation.

- There is a k-vector μ and a symmetric, nonnegative-definite k × k matrix Σ, such that the characteristic function of x is

    φ_x(u) = exp( i uᵀμ − (1/2) uᵀΣu ).

The multivariate normal distribution is written

    x ∼ N(μ, Σ),

or, to make it explicitly known that x is k-dimensional,

    x ∼ N_k(μ, Σ).
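The first condition gives a direct recipe for sampling. A minimal NumPy sketch (the particular μ and A below are our own illustrative choices, not from the article): draws of x = Az + μ should have sample mean near μ and sample covariance near AAᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mu and A (our own choices); Cov[x] should be A @ A.T.
mu = np.array([1.0, -2.0])
A = np.array([[2.0, 0.0],
              [1.0, 0.5]])

z = rng.standard_normal((100_000, 2))   # rows: independent draws of z
x = z @ A.T + mu                        # x = A z + mu, vectorized over rows

sample_mean = x.mean(axis=0)            # close to mu
sample_cov = np.cov(x, rowvar=False)    # close to A @ A.T
```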
3 Properties

3.1 Density function

The multivariate normal distribution is said to be non-degenerate when the symmetric covariance matrix

    Σ = [ Cov[Xi, Xj] ],   i = 1, 2, …, k;  j = 1, 2, …, k,

is positive definite. In this case the distribution has density[2]

    f_x(x1, …, xk) = (1 / √( (2π)^k |Σ| )) exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) ),

where |Σ| is the determinant of Σ.
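The density formula above is straightforward to evaluate numerically. A NumPy sketch (the function name mvn_pdf is ours, not from the article); for k = 1 it reduces to the familiar univariate normal density, which the final comparison checks:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, non-degenerate case, per the formula above."""
    k = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# For k = 1 the formula reduces to the univariate normal density.
val = mvn_pdf(np.array([0.5]), np.array([0.0]), np.array([[1.0]]))
uni = np.exp(-0.5 * 0.5 ** 2) / np.sqrt(2 * np.pi)
```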
The density above reduces to that of the univariate normal distribution if Σ is a 1 × 1 matrix (i.e. a single real number).

[Figure: bivariate normal joint density.]

Note that the circularly-symmetric version of the complex normal distribution has a slightly different form.

Each iso-density locus (the locus of points in k-dimensional space each of which gives the same particular value of the density) is an ellipse or its higher-dimensional generalization; hence the multivariate normal is a special case of the elliptical distributions.

The descriptive statistic (x − μ)ᵀ Σ⁻¹ (x − μ) in the non-degenerate multivariate normal distribution equation is known as the square of the Mahalanobis distance, which represents the distance of the test point x from the mean μ. Note that in the case k = 1, the distribution reduces to a univariate normal distribution and the Mahalanobis distance reduces to the standard score.

3.1.1 Bivariate case

In the two-dimensional nonsingular case, the probability density function is

    f(x, y) = (1 / (2π σ_X σ_Y √(1 − ρ²))) exp( −(1 / (2(1 − ρ²))) [ (x − μ_X)²/σ_X² + (y − μ_Y)²/σ_Y² − 2ρ(x − μ_X)(y − μ_Y)/(σ_X σ_Y) ] ),

where ρ is the correlation between X and Y and where σ_X > 0 and σ_Y > 0. In this case,

    μ = ( μ_X, μ_Y )ᵀ,    Σ = [ σ_X²       ρ σ_X σ_Y
                                ρ σ_X σ_Y  σ_Y²      ].

As |ρ| approaches 1, the distribution concentrates around the line

    y(x) = sgn(ρ) (σ_Y / σ_X) (x − μ_X) + μ_Y.

This is because the above expression, but with ρ in place of sgn(ρ), is the best linear unbiased prediction of Y given a value of X.[4]

3.1.2 Degenerate case

If the covariance matrix Σ is not full rank, then the multivariate normal distribution is degenerate and does not have a density. More precisely, it does not have a density with respect to k-dimensional Lebesgue measure (which is the usual measure assumed in calculus-level probability courses). Only random vectors whose distributions are absolutely continuous with respect to a measure are said to have densities (with respect to that measure). To talk about densities but avoid dealing with measure-theoretic complications, it can be simpler to restrict attention to a subset of rank(Σ) of the coordinates of x such that the covariance matrix for this subset is positive definite; then the other coordinates may be thought of as an affine function of the selected coordinates.

To talk about densities meaningfully in the singular case, then, we must select a different base measure. Using the disintegration theorem we can define a restriction of Lebesgue measure to the rank(Σ)-dimensional affine subspace of R^k where the Gaussian distribution is supported, i.e. { μ + Σ^{1/2} v : v ∈ R^k }. With respect to this measure the distribution has density

    f(x) = ( det*(2πΣ) )^{−1/2} exp( −(1/2) (x − μ)ᵀ Σ⁺ (x − μ) ),

where Σ⁺ is the generalized inverse and det* is the pseudo-determinant.[5]

3.3 Higher moments

Main article: Isserlis' theorem

The kth-order moments of x are defined by

    μ_{1,…,N}(x) ≝ μ_{r1,…,rN}(x) ≝ E[ ∏_{j=1}^{N} X_j^{r_j} ],

where r1 + r2 + ⋯ + rN = k.

The central kth-order moments vanish when k is odd; when k is even, with k = 2λ, they are given by

    μ_{1,…,2λ}(x − μ) = Σ ( σ_ij σ_kl ⋯ σ_XZ ),

where the sum is taken over all allocations of the set {1, …, 2λ} into λ (unordered) pairs.
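The pairing formula can be spot-checked by simulation. A NumPy sketch (our own illustrative covariance, not from the article) verifying the fourth-order identity E[x_i² x_j²] = σ_ii σ_jj + 2σ_ij² for a zero-mean Gaussian pair:

```python
import numpy as np

rng = np.random.default_rng(3)

# Our own illustrative covariance; the pairing formula predicts
# E[x0^2 * x1^2] = s00 * s11 + 2 * s01^2 = 1*1 + 2*0.25 = 1.5.
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(sigma)
x = rng.standard_normal((500_000, 2)) @ L.T   # zero-mean draws with Cov = sigma

empirical = np.mean(x[:, 0] ** 2 * x[:, 1] ** 2)
predicted = sigma[0, 0] * sigma[1, 1] + 2 * sigma[0, 1] ** 2
```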
That is, when computing a kth (k = 2λ = 6) central moment, you will be summing the products of λ = 3 covariances (the μ notation has been dropped in the interests of parsimony):

    E[x1 x2 x3 x4 x5 x6]
      = E[x1 x2]E[x3 x4]E[x5 x6] + E[x1 x2]E[x3 x5]E[x4 x6] + E[x1 x2]E[x3 x6]E[x4 x5]
      + E[x1 x3]E[x2 x4]E[x5 x6] + E[x1 x3]E[x2 x5]E[x4 x6] + E[x1 x3]E[x2 x6]E[x4 x5]
      + E[x1 x4]E[x2 x3]E[x5 x6] + E[x1 x4]E[x2 x5]E[x3 x6] + E[x1 x4]E[x2 x6]E[x3 x5]
      + E[x1 x5]E[x2 x3]E[x4 x6] + E[x1 x5]E[x2 x4]E[x3 x6] + E[x1 x5]E[x2 x6]E[x3 x4]
      + E[x1 x6]E[x2 x3]E[x4 x5] + E[x1 x6]E[x2 x4]E[x3 x5] + E[x1 x6]E[x2 x5]E[x3 x4].

This yields (2λ − 1)!/(2^{λ−1} (λ − 1)!) terms in the sum (15 in the above case), each being the product of λ (in this case 3) covariances. For fourth-order moments (four variables) there are three terms. For sixth-order moments there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7 = 105 terms.

The covariances are then determined by replacing the terms of the list [1, …, 2λ] by the corresponding terms of the list consisting of r1 ones, then r2 twos, etc. To illustrate this, examine the following 4th-order central moment case:

    E[x_i⁴] = 3 σ_ii²
    E[x_i³ x_j] = 3 σ_ii σ_ij
    E[x_i² x_j²] = σ_ii σ_jj + 2 (σ_ij)²
    E[x_i² x_j x_k] = σ_ii σ_jk + 2 σ_ij σ_ik

3.4 Entropy

The differential entropy of the multivariate normal distribution is[7]

    h(f) = (1/2) ln( (2πe)^n |Σ| ),

where the bars denote the matrix determinant.

3.5 Kullback–Leibler divergence

The Kullback–Leibler divergence from N0(μ0, Σ0) to N1(μ1, Σ1), for non-singular matrices Σ0 and Σ1, is:[8]

    D_KL(N0 ‖ N1) = (1/2) { tr( Σ1⁻¹ Σ0 ) + (μ1 − μ0)ᵀ Σ1⁻¹ (μ1 − μ0) − K + ln( det Σ1 / det Σ0 ) },

where K is the dimension of the vector space.

3.6 Likelihood function

If the mean and covariance matrix are known, the log-likelihood of an observed vector x is simply the log of the probability density function:

    ln L = −(1/2) ( ln |Σ| + (x − μ)ᵀ Σ⁻¹ (x − μ) + k ln(2π) ).

The circularly-symmetric version of the complex case, where z is a vector of complex numbers, is

    ln L = −( ln |Σ| + (z − μ)† Σ⁻¹ (z − μ) + k ln π ),

i.e. with the conjugate transpose (indicated by †) replacing the normal transpose (indicated by T). This is slightly different than in the real case, because the circularly-symmetric version of the complex normal distribution has a slightly different form. A similar notation is used for multiple linear regression.[6]
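The Kullback–Leibler divergence formula above translates directly into code. A NumPy sketch (the function name kl_mvn is ours); the divergence from a distribution to itself is zero, and it is positive for distinct distributions:

```python
import numpy as np

def kl_mvn(mu0, s0, mu1, s1):
    """D_KL(N0 || N1) for non-singular covariance matrices, per the formula above."""
    k = len(mu0)
    d = mu1 - mu0
    s1_inv = np.linalg.inv(s1)
    return 0.5 * (np.trace(s1_inv @ s0) + d @ s1_inv @ d - k
                  + np.log(np.linalg.det(s1) / np.linalg.det(s0)))

same = kl_mvn(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))          # 0
diff = kl_mvn(np.zeros(2), np.eye(2), np.array([1.0, 0.0]), 2 * np.eye(2))
```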
3.7 Prediction interval

The prediction interval for the multivariate normal distribution yields a region consisting of those vectors x satisfying

    (x − μ)ᵀ Σ⁻¹ (x − μ) ≤ χ²_k(p),

where χ²_k(p) is the quantile function for probability p of the chi-squared distribution with k degrees of freedom.

4 Joint normality

4.1 Normally distributed and independent

If X and Y are normally distributed and independent, this implies they are "jointly normally distributed", i.e., the pair (X, Y) must have a multivariate normal distribution. In general, random variables may be uncorrelated but statistically dependent. But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are independent. This implies that any two or more of its components that are pairwise independent are independent.

4.2 Two normally distributed random variables need not be jointly bivariate normal

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is one in which X has a normal distribution with expected value 0 and variance 1, and Y = X if |X| > c and Y = −X if |X| < c, where c > 0. But it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. Two random variables that are normally distributed may fail to be jointly normally distributed, i.e., the vector whose components they are may fail to have a multivariate normal distribution. In the preceding example, clearly X and Y are not independent, yet choosing c to be 1.54 makes them uncorrelated.

5 Conditional distributions

If N-dimensional x is partitioned as

    x = [ x1 ; x2 ]  with sizes  [ q × 1 ; (N − q) × 1 ],

and accordingly μ and Σ are partitioned as

    μ = [ μ1 ; μ2 ],    Σ = [ Σ11  Σ12
                              Σ21  Σ22 ],

then the distribution of x1 conditional on x2 = a is multivariate normal (x1 | x2 = a) ∼ N(μ̄, Σ̄) with mean

    μ̄ = μ1 + Σ12 Σ22⁻¹ (a − μ2)

and covariance matrix[12]

    Σ̄ = Σ11 − Σ12 Σ22⁻¹ Σ21.

The matrix Σ12 Σ22⁻¹ is known as the matrix of regression coefficients.
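The conditional mean and covariance formulas above can be sketched as follows (a NumPy illustration with our own numbers, partitioning off the first coordinate):

```python
import numpy as np

# Our own illustrative mean and (positive-definite) covariance.
mu = np.array([0.0, 1.0, 2.0])
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
q = 1  # x1 is the first coordinate; x2 is the remaining two

mu1, mu2 = mu[:q], mu[q:]
s11, s12 = sigma[:q, :q], sigma[:q, q:]
s21, s22 = sigma[q:, :q], sigma[q:, q:]

a = np.array([1.5, 1.0])                 # observed value of x2
reg = s12 @ np.linalg.inv(s22)           # matrix of regression coefficients
mu_bar = mu1 + reg @ (a - mu2)           # conditional mean
sigma_bar = s11 - reg @ s21              # conditional covariance (Schur complement)
```

Conditioning never increases the variance: the Schur complement is smaller than the unconditional Σ11.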
5.1 Bivariate case

In the bivariate case where x is partitioned into the scalars X1 and X2, the formulas above specialize to

    X1 | X2 = a  ∼  N( μ1 + (σ12/σ22)(a − μ2),  σ11 − σ12²/σ22 ).

5.2 Bivariate conditional expectation

5.2.1 In the general case

Let

    ( X1, X2 )ᵀ ∼ N( ( μ1, μ2 )ᵀ, [ σ1²      ρ σ1 σ2
                                    ρ σ1 σ2  σ2²     ] ).

The conditional expectation of X1 given X2 is

    E(X1 | X2 = x2) = μ1 + ρ (σ1/σ2) (x2 − μ2).

Proof: the result is simply obtained taking the expectation of the conditional distribution X1 | X2 above.

In the special case of zero means and unit variances, where E(X1 | X2 = x2) = ρ x2, one also has the truncated conditional expectations

    E(X1 | X2 < z) = −ρ φ(z) / Φ(z),

    E(X1 | X2 > z) = ρ φ(z) / (1 − Φ(z)),

where the final ratio here is called the inverse Mills ratio. Proof: the last two results are obtained using the result E(X1 | X2 = x2) = ρ x2, so that E(X1 | X2 < z) = ρ E(X2 | X2 < z), and then using the properties of the expectation of a truncated normal distribution.

6 Marginal distributions

To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables from the mean vector and the covariance matrix. For example, for x = (x1, x2, x3)ᵀ, the marginal distribution of (x1, x3)ᵀ is multivariate normal with mean (μ1, μ3)ᵀ and covariance matrix

    Σ = [ σ11  σ13
          σ31  σ33 ].

7 Affine transformation

If y = c + Bx is an affine transformation of x ∼ N(μ, Σ), where c is an M × 1 vector of constants and B is a constant M × N matrix, then y has a multivariate normal distribution with expected value c + Bμ and variance BΣBᵀ, i.e., y ∼ N(c + Bμ, BΣBᵀ). In particular, any subset of the xi has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset (x1, x2, x4)ᵀ, use

    B = [ 1 0 0 0 0 … 0
          0 1 0 0 0 … 0
          0 0 0 1 0 … 0 ],

which extracts the desired elements directly. Another corollary is that the distribution of Z = b · x, where b is a constant vector with the same number of elements as x, is univariate normal with Z ∼ N(b · μ, bᵀΣb). This result follows by using

    B = [ b1 b2 … bn ] = bᵀ.
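The subset-extraction example above can be checked numerically: with a selection matrix B, the transformed mean c + Bμ and covariance BΣBᵀ are exactly the matching entries of μ and Σ. A NumPy sketch (the covariance below is our own illustrative choice, with N = 4):

```python
import numpy as np

# Our own illustrative parameters: a positive-definite 4x4 covariance.
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.5   # diagonal plus constant, still PD

# Selection matrix extracting (x1, x2, x4), as in the example above.
B = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
c = np.zeros(3)

mean_y = c + B @ mu        # picks out (mu1, mu2, mu4)
cov_y = B @ sigma @ B.T    # picks out the matching rows/columns of Sigma
```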
8 Geometric interpretation

See also: Confidence region

The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean; the directions of their principal axes are given by the eigenvectors of the covariance matrix Σ.

11 Estimation of parameters

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is straightforward. In short, the probability density function is

    f(x) = (1 / √( (2π)^k |Σ| )) exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) ),

and, given n observations x1, …, xn, the maximum-likelihood estimates of the mean and covariance are the sample mean

    x̄ = (1/n) Σ_{i=1}^{n} xi

and

    Σ̂ = (1/n) Σ_{j=1}^{n} (xj − x̄)(xj − x̄)ᵀ.

The expected value of this estimator is

    E[Σ̂] = ((n − 1)/n) Σ,

so Σ̂ is biased. An unbiased sample covariance is

    Σ̂ = (1/(n − 1)) Σ_{j=1}^{n} (xj − x̄)(xj − x̄)ᵀ.

The Fisher information matrix for estimating the parameters of a multivariate normal distribution has a closed-form expression. This can be used, for example, to compute the Cramér–Rao bound for parameter estimation in this setting. See Fisher information for more details.

In Bayesian statistics, a conjugate prior for the mean and covariance factors as

    p(μ, Σ) = p(μ | Σ) p(Σ),

where

    p(μ | Σ) ∼ N(μ0, m⁻¹ Σ)

and

    p(Σ) ∼ W⁻¹(Ψ, n0).

Then, given observations X = {x1, …, xn},

    p(μ | Σ, X) ∼ N( (n x̄ + m μ0)/(n + m), (1/(n + m)) Σ ),

    p(Σ | X) ∼ W⁻¹( Ψ + n S + (n m/(n + m)) (x̄ − μ0)(x̄ − μ0)ᵀ, n + n0 ),

where

    x̄ = (1/n) Σ_{i=1}^{n} xi,
    S = (1/n) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)ᵀ.
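The maximum-likelihood and unbiased covariance estimates above differ only by the factor n/(n − 1). A NumPy sketch on our own simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
x = rng.standard_normal((n, k))   # our own simulated sample

x_bar = x.mean(axis=0)                       # sample mean
diff = x - x_bar
sigma_ml = (diff.T @ diff) / n               # biased ML estimate, E = (n-1)/n * Sigma
sigma_unbiased = (diff.T @ diff) / (n - 1)   # unbiased sample covariance
```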
12 Multivariate normality tests

Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. Mardia's test is based on multivariate extensions of skewness and kurtosis measures. For a sample {x1, …, xn} of k-dimensional vectors, with Σ̂ the maximum-likelihood estimate of the covariance matrix, the test statistics are

    A = (1/(6n)) Σ_{i=1}^{n} Σ_{j=1}^{n} [ (xi − x̄)ᵀ Σ̂⁻¹ (xj − x̄) ]³,

    B = √( n / (8 k (k + 2)) ) { (1/n) Σ_{i=1}^{n} [ (xi − x̄)ᵀ Σ̂⁻¹ (xi − x̄) ]² − k(k + 2) }.

Under the null hypothesis of multivariate normality, the statistic A will have approximately a chi-squared distribution with (1/6) k (k + 1)(k + 2) degrees of freedom, and B will be approximately standard normal N(0, 1).

The BHEP test computes the norm of the distance between the empirical characteristic function of the data and the characteristic function of the normal distribution. For a smoothing parameter β, the test statistic is

    T_β = ∫_{R^k} | (1/n) Σ_{j=1}^{n} e^{i tᵀ Σ̂^{−1/2} (xj − x̄)} − e^{−|t|²/2} |² φ_β(t) dt
        = (1/n²) Σ_{i,j=1}^{n} e^{−(β²/2) (xi − xj)ᵀ Σ̂⁻¹ (xi − xj)}
          − (2 / (n (1 + β²)^{k/2})) Σ_{i=1}^{n} e^{−(β²/(2(1 + β²))) (xi − x̄)ᵀ Σ̂⁻¹ (xi − x̄)}
          + (1 + 2β²)^{−k/2},

where φ_β(t) = (2πβ²)^{−k/2} e^{−|t|²/(2β²)} is a Gaussian weighting function.
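Mardia's statistics A and B can be computed directly from the formulas above. A NumPy sketch on our own simulated normal data (sample size and dimension are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
x = rng.standard_normal((n, k))   # our own simulated sample

x_bar = x.mean(axis=0)
diff = x - x_bar
sigma_hat = (diff.T @ diff) / n                 # ML covariance estimate
g = diff @ np.linalg.inv(sigma_hat) @ diff.T    # g[i,j] = (x_i - xbar)^T S^{-1} (x_j - xbar)

A = (g ** 3).sum() / (6 * n)
B = np.sqrt(n / (8 * k * (k + 2))) * ((np.diag(g) ** 2).mean() - k * (k + 2))
```

Note that A is always nonnegative: the double sum of cubes equals a sum of squared third-moment terms.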
13 See also

- Chi distribution, the pdf of the 2-norm (or Euclidean norm) of a multivariate normally distributed vector (centered at zero).
- Complex normal distribution, for the generalization to complex-valued random variables.
- Copula, for the definition of the Gaussian or normal copula model.
- Multivariate stable distribution, an extension of the multivariate normal distribution when the index (exponent in the characteristic function) is between zero and two.
- Mahalanobis distance
- Wishart distribution
14 References

[1] Gut, Allan (2009). An Intermediate Course in Probability. Springer. ISBN 9781441901613. (Chapter 5)
[2] UIUC, Lecture 21. The Multivariate Normal Distribution, 21.5: "Finding the Density".
[3] Hamedani, G. G.; Tata, M. N. (1975). "On the determination of the bivariate normal distribution from distributions of linear combinations of the variables". The American Mathematical Monthly 82 (9): 913–915. doi:10.2307/2318494.
[4] Wyatt, John. "Linear least mean-squared error estimation" (PDF). Lecture notes course on applied probability. Retrieved 23 January 2012.
[5] Rao, C. R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley. pp. 527–528.
[6] Tong, T. (2010). Multiple Linear Regression: MLE and Its Distributional Results. Lecture Notes.
[7] Gokhale, DV; Ahmed, NA; Res, BC; Piscataway, NJ (May 1989). "Entropy Expressions and Their Estimators for Multivariate Distributions". Information Theory, IEEE Transactions on 35 (3): 688–692. doi:10.1109/18.30996.
14.1 Literature
Rencher, A.C. (1995). Methods of Multivariate
Analysis. New York: Wiley.