Data Reduction or Structural Simplification
4. Prediction
Relationships between variables must be determined for the purpose of predicting the
values of one or more variables on the basis of observations on the other variables.
5. Hypothesis construction and testing
Specific statistical hypotheses, formulated in terms of the parameters of multivariate
populations are tested. This may be done to validate assumptions.
1.2. Areas of application
The applications of multivariate techniques were originally concentrated in the behavioral and
biological sciences. However, interest in multivariate methods has now spread to numerous other
fields of investigation. Many organizations today are faced with the same challenge: too much
data. These fields include:
Business - customer transactions
Communications - website use
Government – intelligence /news
Industry - process data, etc.
1.3. Organizing multivariate data
The values of the variables are all recorded for each distinct item, individual, or experimental
unit. We can display these data as a rectangular array, called X, of n rows and p columns:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np} \end{bmatrix}$$
The array X contains the data consisting of all of the observations on all of the variables.
1.4. Descriptive Statistics
Descriptive statistics are summary numbers used to assess the information contained in the data.
Basic descriptive statistics are the sample mean, sample variance, sample standard deviation,
sample covariance, and sample correlation coefficient.
Let $x_{11}, x_{21}, \ldots, x_{n1}$ be $n$ measurements on the first variable. Then the arithmetic average of these measurements is
$$\bar{x}_1 = \frac{1}{n}\sum_{j=1}^{n} x_{j1} \qquad (1.1)$$
The sample mean can be computed from the n measurements on each of the p variables, so
that, in general, there will be p sample means:
$$\bar{x}_k = \frac{1}{n}\sum_{j=1}^{n} x_{jk}, \qquad k = 1, 2, \ldots, p \qquad (1.2)$$
A measure of spread is provided by the sample variance, defined for n measurements on the
first variable as
$$s_1^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{j1} - \bar{x}_1\right)^2 \qquad (1.3)$$
where $\bar{x}_1$ is the sample mean of the $x_{j1}$'s. In general, for $p$ variables, we have
$$s_k^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2, \qquad k = 1, 2, \ldots, p \qquad (1.4)$$
Note:
1. The sample variance may also be defined with a divisor of n rather than n-1. Later, we shall
see that there are theoretical reasons for using n-1, and this is particularly appropriate when
the number of measurements, n, is small. The two versions of the sample variance will always be
differentiated by displaying the appropriate expression.
2. The sample variance $s^2$ is generally never equal to the population variance $\sigma^2$ (the
probability of such an occurrence is zero), but it is an unbiased estimator for $\sigma^2$; that
is, $E(s^2) = \sigma^2$. The notation $E(s^2)$ indicates the mean of all possible sample
variances. The square root of either the population variance or the sample variance is called
the standard deviation.
In this situation, it is convenient to use double subscripts on the variances in order to indicate
their positions in an array. Therefore, we introduce the notation $s_{kk}$ to denote the variance
computed from measurements on the kth variable:
$$s_k^2 = s_{kk} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2, \qquad k = 1, 2, \ldots, p \qquad (1.5)$$
The sample covariance between the ith and kth variables is
$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right), \qquad i, k = 1, 2, \ldots, p$$
The sample correlation coefficient ($r_{ik}$) measures the linear association between two
variables and does not depend on the units of measurement. To find a measure of linear
relationship that is invariant to changes of scale, we standardize the covariance by dividing by
the standard deviations of the two variables. This standardized covariance is called a
correlation. The sample correlation coefficient for the ith and kth variables is defined as:
$$r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = \frac{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2}\,\sqrt{\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2}} \qquad \text{for } i = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, p \qquad (1.7)$$
Note: $r_{ik} = r_{ki}$ for all $i$ and $k$. The sample correlation coefficient is a standardized
version of the sample covariance, where the product of the square roots of the sample variances
provides the standardization. Notice that $r_{ik}$ has the same value whether $n$ or $n-1$ is
chosen as the common divisor for $s_{ii}$, $s_{kk}$, and $s_{ik}$.
Properties of Sample Correlation Coefficient
Its value is between -1 and 1.
Its magnitude measures the strength of the linear association: $r = 0$ implies a lack of linear
association between the components; $r < 0$ implies a tendency for one value in the pair to be
larger than its average when the other is smaller than its average; and $r > 0$ implies a
tendency for one value in the pair to be large when the other is large, and also for both values
to be small together.
Its sign indicates the direction of the association.
Its value remains unchanged if all $x_{ji}$'s and $x_{jk}$'s are changed to $y_{ji} = a x_{ji} + b$
and $y_{jk} = c x_{jk} + d$, respectively, provided that the constants $a$ and $c$ have the same
sign.
The descriptive statistics computed from n measurements on p variables can also be organized
into arrays.
Arrays of Basic Descriptive Statistics
Sample means:
$$\bar{x} = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \vdots\\ \bar{x}_p\end{bmatrix}$$
Sample variances and covariances:
$$S = \begin{bmatrix}s_{11} & s_{12} & \cdots & s_{1p}\\ s_{12} & s_{22} & \cdots & s_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ s_{1p} & s_{2p} & \cdots & s_{pp}\end{bmatrix}$$
Therefore, for the example data ($n = 5$ observations on $p = 3$ variables), the sample mean vector is
$$\bar{X} = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \bar{x}_3\end{bmatrix} = \begin{bmatrix}5.6\\ 8\\ 2\end{bmatrix}$$
The sample variances and covariances are
$$s_{11} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j1} - \bar{x}_1\right)^2 = \frac{(9-5.6)^2 + (2-5.6)^2 + 2(6-5.6)^2 + (5-5.6)^2}{4} = 6.3$$
$$s_{22} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j2} - \bar{x}_2\right)^2 = \frac{(12-8)^2 + (6-8)^2 + (4-8)^2 + (10-8)^2}{4} = 10$$
$$s_{33} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j3} - \bar{x}_3\right)^2 = \frac{(3-2)^2 + (4-2)^2 + (0-2)^2 + (1-2)^2}{4} = 2.5$$
$$s_{12} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j1} - \bar{x}_1\right)\left(x_{j2} - \bar{x}_2\right) = \frac{(9-5.6)(12-8) + \cdots + (5-5.6)(10-8)}{4} = 5$$
$$s_{13} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j1} - \bar{x}_1\right)\left(x_{j3} - \bar{x}_3\right) = \frac{(9-5.6)(3-2) + \cdots + (5-5.6)(1-2)}{4} = -1.75$$
$$s_{23} = \frac{1}{4}\sum_{j=1}^{5}\left(x_{j2} - \bar{x}_2\right)\left(x_{j3} - \bar{x}_3\right) = \frac{(3-2)(12-8) + \cdots + (1-2)(10-8)}{4} = 1.5$$
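In practice these summary arrays are rarely computed by hand. The following is a minimal sketch (not part of the original notes) of how a sample mean vector, covariance matrix, and correlation matrix could be obtained with NumPy; the data matrix `X` below is a hypothetical arrangement consistent with the example's column means, not the original data table.

```python
import numpy as np

# Illustrative n x p data matrix: n = 5 observations on p = 3 variables
# (any numeric array with observations in rows and variables in columns works).
X = np.array([[9.0, 12.0, 3.0],
              [2.0,  6.0, 4.0],
              [6.0,  4.0, 0.0],
              [6.0, 10.0, 1.0],
              [5.0,  8.0, 2.0]])

x_bar = X.mean(axis=0)              # sample mean vector (length p)
S = np.cov(X, rowvar=False)         # sample covariance matrix, divisor n-1
R = np.corrcoef(X, rowvar=False)    # sample correlation matrix

print("mean vector:", x_bar)
print("covariance matrix:\n", S)
print("correlation matrix:\n", R)
```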
Chapter Two
Review of Matrix Algebra and Random Vectors
2.1. Basic concepts
Multivariate data can easily be displayed as an array of numbers. In general, a rectangular
array of numbers with n rows and p columns is called a matrix of dimension n×p . The
study of multivariate methods is greatly facilitated by the use of matrix algebra. The matrix
algebra results presented in this chapter will enable us to concisely state statistical models.
2.2. Vector and matrix
2.2.1. Vectors
A vector is a matrix with a single column or row. An array $x$ of $n$ real numbers
$x_1, x_2, \ldots, x_n$ is called a vector, and it is written as
$$x = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$$
which is a vector of length $n$; that is, $x \sim n\times 1$.
$x' = [x_1, x_2, \cdots, x_n]$ is the transpose of $x$; that is, $x' \sim 1\times n$.
A set of vectors $x_1, x_2, \ldots, x_k$ is said to be linearly dependent if there exist $k$
constants $c_1, c_2, \ldots, c_k$, not all zero, such that
$$c_1 x_1 + c_2 x_2 + \cdots + c_k x_k = 0$$
Otherwise, the set of vectors is said to be linearly independent. That is, $\sum_{j=1}^{k} c_j x_j = 0$
implies $c_j = 0$ for all $j$, where $c_1, c_2, \ldots, c_k$ are scalars.
Note: Linear dependence implies that at least one vector in the set can be written as a linear
combination of the other vectors.
Example 2.1: Consider the vectors
$$x_1 = \begin{bmatrix}1\\ 2\\ 1\end{bmatrix}, \qquad x_2 = \begin{bmatrix}1\\ 0\\ -1\end{bmatrix}, \qquad x_3 = \begin{bmatrix}1\\ -2\\ 1\end{bmatrix}$$
and determine whether they are linearly independent.
Setting
$$c_1 x_1 + c_2 x_2 + c_3 x_3 = 0$$
implies that
$$c_1 + c_2 + c_3 = 0, \qquad 2c_1 - 2c_3 = 0, \qquad c_1 - c_2 + c_3 = 0$$
with the unique solution $c_1 = c_2 = c_3 = 0$. As we cannot find three constants $c_1$, $c_2$,
and $c_3$, not all zero, such that $c_1 x_1 + c_2 x_2 + c_3 x_3 = 0$, the vectors $x_1$, $x_2$,
and $x_3$ are linearly independent.
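As a quick computational check (a sketch, not part of the original example), linear independence can be verified by confirming that the matrix whose columns are $x_1, x_2, x_3$ has full column rank:

```python
import numpy as np

# Columns are the vectors x1, x2, x3 from Example 2.1.
V = np.array([[1,  1,  1],
              [2,  0, -2],
              [1, -1,  1]])

# Full column rank (rank 3 here) means the only solution of V c = 0 is c = 0,
# i.e. the vectors are linearly independent.
print(np.linalg.matrix_rank(V))   # prints 3
```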
2.2.2. Matrix
For example, a $3\times 2$ matrix
$$A = \begin{bmatrix}2 & -1\\ 4 & 8\\ 5 & \cdot\end{bmatrix}$$
has transpose
$$A' = \begin{bmatrix}2 & 4 & 5\\ -1 & 8 & \cdot\end{bmatrix}$$
The identity matrix $I_p$ is the $p\times p$ matrix with ones on the main diagonal and zeros
elsewhere.
The determinant of the $3\times 3$ matrix
$$A = \begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}$$
is given by
$$|A| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{32}a_{21} - a_{31}a_{22}a_{13} - a_{32}a_{23}a_{11} - a_{33}a_{12}a_{21}$$
If the square matrix A is singular, its determinant is 0: i.e det( A )=0 if A is singular.
If A is near singular, then there exists a linear combination of the columns that is close
to 0, and det( A ) is also close to 0.
If A is nonsingular, then its determinant is nonzero: i.e det( A )≠0 if A is
nonsingular.
If A is positive definite, then its determinant is positive: i.e det( A )>0 if A is
positive definite.
Inverse of Square Matrix
If the determinant of a matrix A is nonzero ($\det(A) \neq 0$), then A is said to be an
invertible matrix.
If the determinant of A is zero ($\det(A) = 0$), then A has no regular inverse.
The inverse of any $2\times 2$ matrix
$$A = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}$$
is given by
$$A^{-1} = \frac{1}{|A|}\begin{bmatrix}a_{22} & -a_{12}\\ -a_{21} & a_{11}\end{bmatrix}$$
In general, $A^{-1}$ has $(j,i)$th entry $\left[(-1)^{i+j}\,\dfrac{|A_{ij}|}{|A|}\right]$, where
$A_{ij}$ is the matrix obtained from $A$ by deleting the $i$th row and $j$th column.
For example, consider
$$A = \begin{bmatrix}3 & 2\\ 4 & 1\end{bmatrix}$$
Then $|A| = 3(1) - 2(4) = -5$ and
$$A^{-1} = \frac{1}{-5}\begin{bmatrix}1 & -2\\ -4 & 3\end{bmatrix} = \begin{bmatrix}-0.2 & 0.4\\ 0.8 & -0.6\end{bmatrix}$$
You may verify that $A^{-1}A = I$; hence
$$\begin{bmatrix}-0.2 & 0.4\\ 0.8 & -0.6\end{bmatrix}$$
is $A^{-1}$.
We note that
$$c_1\begin{bmatrix}3\\ 4\end{bmatrix} + c_2\begin{bmatrix}2\\ 1\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}$$
implies that $c_1 = c_2 = 0$, so the columns of A are linearly independent.
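As a quick numerical sanity check (a sketch, not part of the original notes), the determinant, the inverse, and the identity $A^{-1}A = I$ for this 2x2 example can be verified as follows:

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [4.0, 1.0]])

det_A = np.linalg.det(A)      # -5.0 (nonzero, so A is invertible)
A_inv = np.linalg.inv(A)      # [[-0.2, 0.4], [0.8, -0.6]]

# A^{-1} A should be the 2x2 identity matrix (up to floating-point error).
print(det_A)
print(A_inv)
print(np.allclose(A_inv @ A, np.eye(2)))   # True
```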
Eigenvalues and eigenvectors of a matrix
Let A be a $k\times k$ matrix and $I_k$ be the $k\times k$ identity matrix. The scalars
$\lambda_1, \lambda_2, \ldots, \lambda_k$ satisfying the characteristic equation
$$\det(A - \lambda I_k) = 0$$
are called the eigenvalues of the matrix A. An eigenvector associated with an eigenvalue
$\lambda$ is a nonzero vector $x$ such that $Ax = \lambda x$.
Example 2.4: Consider the matrices
$$A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix} \qquad\text{and}\qquad I_2 = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$$
Find the eigenvalues and associated eigenvectors
Solution:
Eigenvalues: the characteristic equation is
$$\det(A - \lambda I_2) = (1-\lambda)^2 - 4 = \lambda^2 - 2\lambda - 3 = 0$$
which gives $\lambda_1 = 3$ and $\lambda_2 = -1$.
For $\lambda_1 = 3$, $Ax = \lambda_1 x$ gives
$$x_1 + 2x_2 = 3x_1, \qquad 2x_1 + x_2 = 3x_2$$
so $x_1 = x_2$. Let $x_1 = 1 \Rightarrow x_2 = 1$; thus an eigenvector corresponding to $\lambda_1 = 3$ is
$$X_1 = \begin{bmatrix}1\\ 1\end{bmatrix}$$
Similarly, the eigenvectors corresponding to $\lambda_2 = -1$ satisfy $Ax = (-1)x$:
$$x_1 + 2x_2 = -x_1 \;\Rightarrow\; x_1 = -x_2, \qquad 2x_1 + x_2 = -x_2 \;\Rightarrow\; x_1 = -x_2$$
Thus, an eigenvector corresponding to $\lambda_2 = -1$ is
$$X_2 = \begin{bmatrix}1\\ -1\end{bmatrix}$$
Note that the eigenvectors are not unique, so we often normalize them; that is, we standardize
them so that they have unit length.
The norm of $X_1$ is $\|X_1\| = \sqrt{x_1^2 + x_2^2} = \sqrt{(1)^2 + (1)^2} = \sqrt{2}$.
Thus, the normalized eigenvector corresponding to $\lambda_1 = 3$ is
$$e_1 = \begin{bmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix}$$
The norm of $e_1$ is $\|e_1\| = \sqrt{(1/\sqrt{2})^2 + (1/\sqrt{2})^2} = 1$.
Similarly, the normalized eigenvector corresponding to $\lambda_2 = -1$ is
$$e_2 = \begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix}$$
The norm of $e_2$ is $\|e_2\| = \sqrt{(1/\sqrt{2})^2 + (-1/\sqrt{2})^2} = 1$.
Let A be a $k\times k$ symmetric matrix having k eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$
with associated normalized eigenvectors $e_1, e_2, \ldots, e_k$. Then A can be written as the
spectral decomposition
$$A = P\Lambda P' = \sum_{i=1}^{k}\lambda_i e_i e_i'$$
where $P = (e_1, e_2, \ldots, e_k)$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.
For
$$A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$$
the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = -1$, so
$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2) = \begin{bmatrix}3 & 0\\ 0 & -1\end{bmatrix}$$
The normalized eigenvectors are
$$e_1 = \begin{bmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix} \qquad\text{and}\qquad e_2 = \begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix}$$
so
$$P = (e_1, e_2) = \begin{bmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & -1/\sqrt{2}\end{bmatrix}$$
and
$$A = P\Lambda P' = [e_1, e_2]\begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}\begin{bmatrix}e_1'\\ e_2'\end{bmatrix}$$
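The eigendecomposition above can be reproduced numerically; the following sketch (not part of the original notes) uses NumPy's `eigh` routine for symmetric matrices and reconstructs A from $P\Lambda P'$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# eigh is for symmetric matrices; eigenvalues are returned in ascending order,
# and the eigenvectors are the (already normalized) columns of P.
eigvals, P = np.linalg.eigh(A)
Lam = np.diag(eigvals)

print(eigvals)                            # [-1.  3.]
print(P)                                  # columns are eigenvectors (up to sign)
print(np.allclose(P @ Lam @ P.T, A))      # True: A = P Lambda P'
```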
2.3. Positive Definite Matrix
The symmetric matrix A is said to be positive definite if $x'Ax > 0$ for all vectors $x \neq 0$.
Similarly, A is positive semi-definite if $x'Ax \geq 0$ for all $x$.
The eigenvalues and eigenvectors of positive definite and positive semidefinite matrices have
the following properties.
1. The eigenvalues of a positive definite matrix are all positive.
2. The eigenvalues of a positive semidefinite matrix are positive or zero, with the
number of positive eigenvalues equal to the rank of the matrix.
Example: Consider the matrix
$$A = \begin{bmatrix}9 & -2\\ -2 & 6\end{bmatrix}$$
a. Is A symmetric?
b. Show that A is positive definite.
Solution
a. Since $A = A'$, A is symmetric.
b. For any $x = (x_1, x_2)' \neq 0$,
$$x'Ax = [x_1, x_2]\begin{bmatrix}9 & -2\\ -2 & 6\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = 9x_1^2 - 4x_1x_2 + 6x_2^2 = (2x_1 - x_2)^2 + 5x_1^2 + 5x_2^2 > 0$$
so A is positive definite.
For a positive definite matrix A with spectral decomposition $A = P\Lambda P'$, the square root
matrix is defined as $A^{1/2} = P\Lambda^{1/2}P' = \sum_{i=1}^{k}\sqrt{\lambda_i}\, e_i e_i'$.
The square root matrix $A^{1/2}$ is symmetric and serves as the square root of A:
$$A^{1/2}A^{1/2} = \left(A^{1/2}\right)^2 = A$$
The spectral decomposition also gives the inverse as
$$A^{-1} = P\Lambda^{-1}P' = \sum_{i=1}^{k}\frac{1}{\lambda_i}\, e_i e_i'$$
The square root matrix of a positive definite matrix A has the following properties:
1. $\left(A^{1/2}\right)' = A^{1/2}$ (that is, $A^{1/2}$ is symmetric).
2. $A^{1/2}A^{1/2} = A$.
3. $\left(A^{1/2}\right)^{-1} = \sum_{i=1}^{k}\dfrac{1}{\sqrt{\lambda_i}}\, e_i e_i' = P\Lambda^{-1/2}P'$, where $\Lambda^{-1/2}$ is a diagonal matrix with $1/\sqrt{\lambda_i}$ as the ith diagonal element.
4. $A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I$ and $A^{-1/2}A^{-1/2} = A^{-1}$, where $A^{-1/2} = \left(A^{1/2}\right)^{-1}$.
Example 2.7: Consider the matrix
$$A = \begin{bmatrix}9 & -2\\ -2 & 6\end{bmatrix}$$
a. Determine the eigenvalues and eigenvectors of A.
b. Write the spectral decomposition of A.
c. Find the square root matrix $A^{1/2}$.
d. Find $A^{-1}$.
e. Find the eigenvalues and eigenvectors of $A^{-1}$.
Solution
a. Eigenvalues: solve $|A - \lambda I_2| = 0$:
$$\begin{vmatrix}9-\lambda & -2\\ -2 & 6-\lambda\end{vmatrix} = (9-\lambda)(6-\lambda) - 4 = \lambda^2 - 15\lambda + 50 = 0$$
Therefore, $\lambda_1 = 10$ and $\lambda_2 = 5$.
Eigenvectors: using $Ax_i = \lambda_i x_i$, we can find the eigenvectors; the normalized
eigenvectors are
$$e_1 = \begin{bmatrix}2/\sqrt{5}\\ -1/\sqrt{5}\end{bmatrix} \qquad\text{and}\qquad e_2 = \begin{bmatrix}1/\sqrt{5}\\ 2/\sqrt{5}\end{bmatrix}$$
b. The spectral decomposition of A is
$$A = \begin{bmatrix}9 & -2\\ -2 & 6\end{bmatrix} = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' = 10\begin{bmatrix}2/\sqrt{5}\\ -1/\sqrt{5}\end{bmatrix}\begin{bmatrix}2/\sqrt{5} & -1/\sqrt{5}\end{bmatrix} + 5\begin{bmatrix}1/\sqrt{5}\\ 2/\sqrt{5}\end{bmatrix}\begin{bmatrix}1/\sqrt{5} & 2/\sqrt{5}\end{bmatrix}$$
c. The square root matrix of A is
$$A^{1/2} = \sum_{i=1}^{2}\sqrt{\lambda_i}\, e_i e_i' = \sqrt{10}\begin{bmatrix}2/\sqrt{5}\\ -1/\sqrt{5}\end{bmatrix}\begin{bmatrix}2/\sqrt{5} & -1/\sqrt{5}\end{bmatrix} + \sqrt{5}\begin{bmatrix}1/\sqrt{5}\\ 2/\sqrt{5}\end{bmatrix}\begin{bmatrix}1/\sqrt{5} & 2/\sqrt{5}\end{bmatrix}$$
d. The inverse of matrix A is
$$A^{-1} = \frac{\text{adjoint of } A}{|A|} = \frac{1}{9(6) - (-2)(-2)}\begin{bmatrix}6 & 2\\ 2 & 9\end{bmatrix} = \begin{bmatrix}0.12 & 0.04\\ 0.04 & 0.18\end{bmatrix}$$
e. The eigenvalues of $A^{-1}$ are obtained by taking the reciprocals of the eigenvalues of A and
then ordering them: $\lambda_1 = 1/5 = 0.2$ and $\lambda_2 = 1/10 = 0.1$. The eigenvectors of
$A^{-1}$ are the same as those of A.
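A short numerical check of Example 2.7 (a sketch, not part of the original notes): the eigendecomposition, the matrix square root, and the inverse can all be computed from NumPy's symmetric eigendecomposition.

```python
import numpy as np

A = np.array([[9.0, -2.0],
              [-2.0, 6.0]])

lam, P = np.linalg.eigh(A)                    # eigenvalues [5., 10.] (ascending) and eigenvectors
A_sqrt = P @ np.diag(np.sqrt(lam)) @ P.T      # A^{1/2} = P Lambda^{1/2} P'
A_inv = P @ np.diag(1.0 / lam) @ P.T          # A^{-1}  = P Lambda^{-1} P'

print(lam)                                    # [ 5. 10.]
print(np.allclose(A_sqrt @ A_sqrt, A))        # True
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```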
A random vector is a vector whose elements are random variables (a random variable is a function
that associates a real number with each element in the sample space). Similarly, a random matrix
is a matrix whose elements are random variables. The expected value of a random matrix (or
vector) is the matrix (vector) consisting of the expected values of each of its elements; the
expected value of an $n\times p$ random matrix X, denoted by $E(X)$, is the $n\times p$ matrix
whose $(i,j)$ entry is $E(X_{ij})$.
If $X_{ij}$ is a continuous random variable with probability density function $f_{ij}(x_{ij})$,
then its expected value is given by
$$E(X_{ij}) = \int_{-\infty}^{\infty} x_{ij}\, f_{ij}(x_{ij})\, dx_{ij}$$
2.6. Mean Vectors and Covariance Matrices
Suppose $X = (X_1, X_2, \ldots, X_p)'$ is a $p\times 1$ random vector; then each element of X is
a random variable with its own marginal probability distribution. The marginal means $\mu_i$ and
variances $\sigma_i^2$ are defined as $\mu_i = E(X_i)$ and $\sigma_i^2 = E(X_i - \mu_i)^2$,
$i = 1, 2, \ldots, p$, respectively. Specifically:
If $X_i$ is a discrete random variable with probability mass function $p_i(x_i)$, then its
marginal mean is given by
$$\mu_i = \sum_{\text{all } x_i} x_i\, p_i(x_i)$$
If $X_i$ is a continuous random variable with probability density function $f_i(x_i)$, then its
marginal mean is given by
$$\mu_i = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i$$
If $X_i$ is a discrete random variable with probability mass function $p_i(x_i)$, then its
marginal variance is given by
$$\sigma_i^2 = \sum_{\text{all } x_i} (x_i - \mu_i)^2\, p_i(x_i)$$
If $X_i$ is a continuous random variable with probability density function $f_i(x_i)$, then its
marginal variance is given by
$$\sigma_i^2 = \int_{-\infty}^{\infty} (x_i - \mu_i)^2\, f_i(x_i)\, dx_i$$
It will be convenient in later sections to denote the marginal variances by $\sigma_{ii}$ rather
than the more traditional $\sigma_i^2$, and consequently, we shall adopt this notation.
The behavior of any pair of random variables $X_i$ and $X_k$ is described by their joint
probability function, and a measure of the linear association between them is provided by their
covariance.
If $X_i$ and $X_k$ are discrete random variables with joint probability function $p_{ik}(x_i, x_k)$,
then their covariance is given by
$$\sigma_{ik} = E(X_i - \mu_i)(X_k - \mu_k) = \sum_{\text{all } x_i}\sum_{\text{all } x_k} (x_i - \mu_i)(x_k - \mu_k)\, p_{ik}(x_i, x_k)$$
If $X_i$ and $X_k$ are continuous random variables with joint density function $f_{ik}(x_i, x_k)$,
then their covariance is given by
$$\sigma_{ik} = E(X_i - \mu_i)(X_k - \mu_k) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_i - \mu_i)(x_k - \mu_k)\, f_{ik}(x_i, x_k)\, dx_i\, dx_k$$
where $\mu_i$ and $\mu_k$, $i, k = 1, 2, \ldots, p$, are the marginal means.
Note: when $i = k$, the covariance becomes the marginal variance.
When $X_i$ and $X_k$ are continuous random variables with joint density $f_{ik}(x_i, x_k)$ and
marginal densities $f_i(x_i)$ and $f_k(x_k)$, the independence condition becomes
$$f_{ik}(x_i, x_k) = f_i(x_i)\, f_k(x_k) \qquad\text{for all pairs } (x_i, x_k)$$
The p continuous random variables $X_1, X_2, \ldots, X_p$ are mutually statistically independent
if their joint density can be factored as
$$f_{1,2,\ldots,p}(x_1, x_2, \ldots, x_p) = f_1(x_1)\, f_2(x_2)\cdots f_p(x_p) \qquad\text{for all p-tuples } (x_1, x_2, \ldots, x_p)$$
Statistical independence has an important implication for covariance: statistical independence
implies that $\mathrm{Cov}(X_i, X_k) = 0$. Thus, $\mathrm{Cov}(X_i, X_k) = 0$ if $X_i$ and $X_k$
are independent.
The means and covariances of the $p\times 1$ random vector X can be set out as matrices. The
expected value of each element is contained in the vector of means $\mu = E(X)$, and the p
variances $\sigma_{ii}$ and the $p(p-1)/2$ distinct covariances $\sigma_{ik}$ ($i < k$) are
contained in the symmetric variance-covariance matrix $\Sigma = \mathrm{Cov}(X)$. Specifically,
$$E(X) = \begin{bmatrix}E(X_1)\\ E(X_2)\\ \vdots\\ E(X_p)\end{bmatrix} = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix} = \mu$$
and
$$\Sigma = E(X - \mu)(X - \mu)' = E\begin{bmatrix}(X_1-\mu_1)^2 & (X_1-\mu_1)(X_2-\mu_2) & \cdots & (X_1-\mu_1)(X_p-\mu_p)\\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)^2 & \cdots & (X_2-\mu_2)(X_p-\mu_p)\\ \vdots & \vdots & \ddots & \vdots\\ (X_p-\mu_p)(X_1-\mu_1) & (X_p-\mu_p)(X_2-\mu_2) & \cdots & (X_p-\mu_p)^2\end{bmatrix}$$
$$= \begin{bmatrix}E(X_1-\mu_1)^2 & E(X_1-\mu_1)(X_2-\mu_2) & \cdots & E(X_1-\mu_1)(X_p-\mu_p)\\ E(X_2-\mu_2)(X_1-\mu_1) & E(X_2-\mu_2)^2 & \cdots & E(X_2-\mu_2)(X_p-\mu_p)\\ \vdots & \vdots & \ddots & \vdots\\ E(X_p-\mu_p)(X_1-\mu_1) & E(X_p-\mu_p)(X_2-\mu_2) & \cdots & E(X_p-\mu_p)^2\end{bmatrix}$$
or
$$\Sigma = \mathrm{Cov}(X) = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}\end{bmatrix}$$
Example 2.8: Find the covariance matrix for the two random variables X 1 and X2 when their
joint probability function given below
                 x2
  x1          0        1      p1(x1)
  -1        0.24     0.06      0.3
   0        0.16     0.14      0.3
   1        0.40     0.00      0.4
  p2(x2)    0.8      0.2       1
We have already shown that $\mu_1 = E(X_1) = 0.1$ and $\mu_2 = E(X_2) = 0.2$. In addition,
$$\sigma_{11} = E(X_1 - \mu_1)^2 = \sum_{\text{all } x_1}(x_1 - 0.1)^2 p_1(x_1) = (-1-0.1)^2(0.3) + (0-0.1)^2(0.3) + (1-0.1)^2(0.4) = 0.69$$
$$\sigma_{22} = E(X_2 - \mu_2)^2 = \sum_{\text{all } x_2}(x_2 - 0.2)^2 p_2(x_2) = (0-0.2)^2(0.8) + (1-0.2)^2(0.2) = 0.16$$
Also,
$$\sigma_{12} = E(X_1 - \mu_1)(X_2 - \mu_2) = \sum_{\text{all } x_1}\sum_{\text{all } x_2}(x_1 - 0.1)(x_2 - 0.2)\, p_{12}(x_1, x_2) = -0.08$$
Therefore,
$$\mu = E(X) = \begin{bmatrix}E(X_1)\\ E(X_2)\end{bmatrix} = \begin{bmatrix}0.1\\ 0.2\end{bmatrix}$$
and
$$\Sigma = E(X - \mu)(X - \mu)' = \begin{bmatrix}E(X_1-\mu_1)^2 & E(X_1-\mu_1)(X_2-\mu_2)\\ E(X_2-\mu_2)(X_1-\mu_1) & E(X_2-\mu_2)^2\end{bmatrix} = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix} = \begin{bmatrix}0.69 & -0.08\\ -0.08 & 0.16\end{bmatrix}$$
We note that the computation of means, variances, and covariances for discrete random
variables involves summation, while analogous computations for continuous random
variables involve integration.
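For the discrete case, the summations in Example 2.8 are easy to carry out programmatically; the sketch below (not part of the original notes) computes μ and Σ directly from the joint probability table.

```python
import numpy as np

x1_vals = np.array([-1.0, 0.0, 1.0])          # support of X1 (rows)
x2_vals = np.array([0.0, 1.0])                # support of X2 (columns)
p12 = np.array([[0.24, 0.06],                 # joint pmf p_{12}(x1, x2)
                [0.16, 0.14],
                [0.40, 0.00]])

p1 = p12.sum(axis=1)                          # marginal pmf of X1
p2 = p12.sum(axis=0)                          # marginal pmf of X2
mu1 = (x1_vals * p1).sum()                    # 0.1
mu2 = (x2_vals * p2).sum()                    # 0.2

s11 = ((x1_vals - mu1) ** 2 * p1).sum()       # 0.69
s22 = ((x2_vals - mu2) ** 2 * p2).sum()       # 0.16
s12 = ((x1_vals - mu1)[:, None] * (x2_vals - mu2)[None, :] * p12).sum()  # -0.08

Sigma = np.array([[s11, s12], [s12, s22]])
print(mu1, mu2)
print(Sigma)
```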
Because $\sigma_{ik} = E(X_i - \mu_i)(X_k - \mu_k) = \sigma_{ki}$, the matrix $\Sigma$ is
symmetric. The population correlation coefficient between $X_i$ and $X_k$ is defined in terms of
the covariance $\sigma_{ik}$ and the variances $\sigma_{ii}$ and $\sigma_{kk}$ as
$$\rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}}}$$
The correlation coefficient measures the amount of linear association between the random
variables $X_i$ and $X_k$. Let the population correlation matrix $\rho$ be the $p\times p$
symmetric matrix defined as:
$$\rho = \begin{bmatrix}\dfrac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \dfrac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \cdots & \dfrac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}}\\[1mm] \dfrac{\sigma_{21}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{11}}} & \dfrac{\sigma_{22}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{22}}} & \cdots & \dfrac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}}\\ \vdots & \vdots & \ddots & \vdots\\ \dfrac{\sigma_{p1}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{11}}} & \dfrac{\sigma_{p2}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{22}}} & \cdots & \dfrac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}}\end{bmatrix} = \begin{bmatrix}1 & \rho_{12} & \cdots & \rho_{1p}\\ \rho_{21} & 1 & \cdots & \rho_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{p1} & \rho_{p2} & \cdots & 1\end{bmatrix}$$
And let the $p\times p$ standard deviation matrix be
$$V^{1/2} = \begin{bmatrix}\sqrt{\sigma_{11}} & 0 & \cdots & 0\\ 0 & \sqrt{\sigma_{22}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}}\end{bmatrix}$$
Example: Suppose
$$\Sigma = \begin{bmatrix}4 & 1 & 2\\ 1 & 9 & -3\\ 2 & -3 & 25\end{bmatrix}$$
Obtain $V^{1/2}$ and $\rho$.
Solution
$$V^{1/2} = \begin{bmatrix}2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{bmatrix}$$
$$\rho = \left(V^{1/2}\right)^{-1}\Sigma\left(V^{1/2}\right)^{-1} = \begin{bmatrix}1/2 & 0 & 0\\ 0 & 1/3 & 0\\ 0 & 0 & 1/5\end{bmatrix}\begin{bmatrix}4 & 1 & 2\\ 1 & 9 & -3\\ 2 & -3 & 25\end{bmatrix}\begin{bmatrix}1/2 & 0 & 0\\ 0 & 1/3 & 0\\ 0 & 0 & 1/5\end{bmatrix} = \begin{bmatrix}1 & 1/6 & 1/5\\ 1/6 & 1 & -1/5\\ 1/5 & -1/5 & 1\end{bmatrix}$$
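A quick numerical check of this example (a sketch, not part of the original notes); it assumes the covariance matrix as written above:

```python
import numpy as np

Sigma = np.array([[4.0,  1.0,  2.0],
                  [1.0,  9.0, -3.0],
                  [2.0, -3.0, 25.0]])

V_half = np.diag(np.sqrt(np.diag(Sigma)))    # standard deviation matrix V^{1/2}
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv        # correlation matrix

print(V_half)
print(rho)    # [[1., 1/6, 0.2], [1/6, 1., -0.2], [0.2, -0.2, 1.]]
```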
Chapter Three
3. Multivariate Normal Distribution
3.1. Multivariate normal density
A probability distribution that plays a pivotal role in much of multivariate analysis is the
multivariate normal distribution. Since the multivariate normal density is an extension of the
univariate normal density and shares many of its features, we first review the univariate normal
density function.
Review of the Univariate Normal Density
A random variable $x$ is said to follow a univariate normal distribution with mean $\mu$ and
variance $\sigma^2$ if $x$ has the density
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty \qquad (3.1)$$
We write $x \sim N(\mu, \sigma^2)$, or simply say that $x$ is distributed as $N(\mu, \sigma^2)$.
The p-dimensional multivariate normal distribution has mean vector
$$\mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix}$$
and covariance matrix $\Sigma$.
The simplest multivariate normal distribution is the bivariate (2-dimensional) normal
distribution, which has the density function
$$f(x) = \frac{1}{(2\pi)^{2/2}|\Sigma|^{1/2}}\exp\left\{-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right\}, \qquad -\infty < x_i < \infty,\; i = 1, 2$$
where
$$X = \begin{bmatrix}X_1\\ X_2\end{bmatrix}, \qquad \mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}, \qquad \Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{bmatrix}$$
We can easily find the inverse of the covariance matrix:
$$\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\begin{bmatrix}\sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11}\end{bmatrix}$$
Substituting this inverse (and using $\sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}$) gives
$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}$$
If $\sigma_{12} = 0$, or equivalently $\rho_{12} = 0$, then $X_1$ and $X_2$ are uncorrelated. For
the bivariate normal, $\sigma_{12} = 0$ implies that $X_1$ and $X_2$ are statistically
independent, and the joint density can then be factored as
$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}}}\exp\left\{-\frac{1}{2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right]\right\}$$
$$= \frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left[-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2\right]\cdot\frac{1}{\sqrt{2\pi\sigma_{22}}}\exp\left[-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right] = f(x_1)\, f(x_2)$$
Slices of multivariate normal density
All points of equal density are called a contour. The multivariate normal density is
′ −1
constant on surfaces where the square of the distance ( x−μ ) ( x−μ ) is constant. ∑
Contours of constant density for the p- dimensional normal distribution are ellipsoids
defined by x such that
−1
( x−μ )′ ∑ ( x−μ )≤c 2
−1
has probability ( 1−α ). That is
[ ]
p ( x−μ )′ ∑ ( x−μ )≤ χ 2p ( α ) =1−α
where
χ 2p ( α ) th
is the (1−α ) 100 % point of the chi-square distribution with p- degrees of
freedom.
Example: For the bivariate normal distribution $x \sim N_2(\mu, \Sigma)$ with
$$\mu = \begin{bmatrix}5\\ 10\end{bmatrix} \qquad\text{and}\qquad \Sigma = \begin{bmatrix}9 & 16\\ 16 & 64\end{bmatrix}$$
find the major and minor axes of the constant-density ellipses,
and we want the 95% probability contour. The upper 5% point of the chi-square distribution with
2 degrees of freedom is $\chi_2^2(0.05) = 5.9915$, so $c = \sqrt{5.9915} = 2.4478$.
Axes: $\mu \pm c\sqrt{\lambda_i}\, e_i$, where $(\lambda_i, e_i)$ is the ith ($i = 1, 2$)
eigenvalue/eigenvector pair of $\Sigma$. Here
$$\lambda_1 = 68.316, \quad e_1' = (0.2604,\ 0.9655), \qquad \lambda_2 = 4.684, \quad e_2' = (0.9655,\ -0.2604)$$
Major Axis
Using the largest eigenvalue and corresponding eigenvector:
$$\begin{bmatrix}5\\ 10\end{bmatrix} \pm 2.4478\sqrt{68.316}\begin{bmatrix}0.2604\\ 0.9655\end{bmatrix} = \begin{bmatrix}5\\ 10\end{bmatrix} \pm \begin{bmatrix}5.27\\ 19.53\end{bmatrix}$$
giving endpoints $(10.27,\ 29.53)$ and $(-0.27,\ -9.53)$.
Minor Axis
Same process, but now use $\lambda_2$ and $e_2$, the smallest eigenvalue and corresponding
eigenvector:
$$\begin{bmatrix}5\\ 10\end{bmatrix} \pm 2.4478\sqrt{4.684}\begin{bmatrix}0.9655\\ -0.2604\end{bmatrix} = \begin{bmatrix}5\\ 10\end{bmatrix} \pm \begin{bmatrix}5.12\\ -1.38\end{bmatrix}$$
giving endpoints $(10.12,\ 8.62)$ and $(-0.12,\ 11.38)$.
Graph of 95% probability contour
Equation for Contour
$$(x - \mu)'\Sigma^{-1}(x - \mu) \leq 5.99$$
$$\left((x_1-5),\ (x_2-10)\right)\begin{bmatrix}9 & 16\\ 16 & 64\end{bmatrix}^{-1}\begin{bmatrix}x_1-5\\ x_2-10\end{bmatrix} \leq 5.99$$
$$0.2(x_1-5)^2 + 0.028(x_2-10)^2 - 0.1(x_1-5)(x_2-10) \leq 5.99$$
$(x - \mu)'\Sigma^{-1}(x - \mu)$ is a quadratic form, so the boundary of the contour is a
second-degree polynomial equation (an ellipse) in $x_1$ and $x_2$.
Points inside or outside
Are the following points inside or outside the 95% probability contour?
Is the point (10, 20) inside or outside the 95% probability contour?
$$(10, 20):\quad 0.2(10-5)^2 + 0.028(20-10)^2 - 0.1(10-5)(20-10) = 0.2(25) + 0.028(100) - 0.1(50) = 2.8$$
Since $2.8 \leq 5.99$, the point $(10, 20)$ is inside the 95% probability contour.
Is the point (16, 20) inside or outside the 95% probability contour?
$$(16, 20):\quad 0.2(16-5)^2 + 0.028(20-10)^2 - 0.1(16-5)(20-10) = 0.2(121) + 0.028(100) - 0.1(11)(10) = 16$$
Since $16 > 5.99$, the point $(16, 20)$ is outside the 95% probability contour.
Graph of inside or outside the 95% probability contour
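The whole example (the axes of the ellipse and the inside/outside check) can be reproduced numerically; the sketch below (not part of the original notes) uses the μ and Σ given above.

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([5.0, 10.0])
Sigma = np.array([[9.0, 16.0],
                  [16.0, 64.0]])

c2 = chi2.ppf(0.95, df=2)          # 5.9915, the 95% chi-square quantile with 2 df
lam, E = np.linalg.eigh(Sigma)     # eigenvalues (ascending) and eigenvectors (columns)

# Half-lengths of the minor and major axes of the 95% contour: c * sqrt(lambda_i)
print(np.sqrt(c2 * lam))

Sigma_inv = np.linalg.inv(Sigma)
for point in [np.array([10.0, 20.0]), np.array([16.0, 20.0])]:
    d2 = (point - mu) @ Sigma_inv @ (point - mu)   # squared statistical distance
    print(point, d2, "inside" if d2 <= c2 else "outside")
```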
Example: The general form of the contours for a bivariate normal probability distribution where
the variables have equal variance ($\sigma_{11} = \sigma_{22}$) is relatively easy to derive.
First we need the eigenvalues of $\Sigma$:
$$|\Sigma - \lambda I| = 0 \;\Rightarrow\; 0 = \begin{vmatrix}\sigma_{11}-\lambda & \sigma_{12}\\ \sigma_{12} & \sigma_{11}-\lambda\end{vmatrix} = (\sigma_{11}-\lambda)^2 - \sigma_{12}^2 = (\lambda - \sigma_{11} - \sigma_{12})(\lambda - \sigma_{11} + \sigma_{12})$$
Consequently, the eigenvalues are $\lambda_1 = \sigma_{11} + \sigma_{12}$ and $\lambda_2 = \sigma_{11} - \sigma_{12}$.
Next we need the eigenvectors of $\Sigma$. From $\Sigma e_i = \lambda_i e_i$, for $\lambda_1 = \sigma_{11} + \sigma_{12}$,
$$\begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{11}\end{bmatrix}\begin{bmatrix}e_1\\ e_2\end{bmatrix} = (\sigma_{11}+\sigma_{12})\begin{bmatrix}e_1\\ e_2\end{bmatrix}$$
or
$$\sigma_{11}e_1 + \sigma_{12}e_2 = (\sigma_{11}+\sigma_{12})e_1, \qquad \sigma_{12}e_1 + \sigma_{11}e_2 = (\sigma_{11}+\sigma_{12})e_2$$
which implies $e_1 = e_2$; after normalization, the first eigenvector is $e_1' = [1/\sqrt{2},\ 1/\sqrt{2}]$.
Similarly, $\lambda_2 = \sigma_{11} - \sigma_{12}$ yields the eigenvector $e_2' = [1/\sqrt{2},\ -1/\sqrt{2}]$.
When the covariance $\sigma_{12}$ (or correlation $\rho_{12}$) is positive, $\lambda_1 = \sigma_{11} + \sigma_{12}$
is the largest eigenvalue, and its associated eigenvector $e_1' = [1/\sqrt{2},\ 1/\sqrt{2}]$ lies
along the $45^\circ$ line through the point $\mu' = [\mu_1, \mu_2]$. This is true for any positive
value of the covariance (correlation). Since the axes of the constant-density ellipses are given
by $\pm c\sqrt{\lambda_1}\, e_1$ and $\pm c\sqrt{\lambda_2}\, e_2$, and the eigenvectors each have
length unity, the major axis will be associated with the largest eigenvalue. For positively
correlated normal random variables, then, the major axis of the constant-density ellipses will be
along the $45^\circ$ line through $\mu$.
To summarize, the axes of the ellipses of constant density for a bivariate normal distribution
with $\sigma_{11} = \sigma_{22}$ are determined by
$$\pm c\sqrt{\sigma_{11}+\sigma_{12}}\begin{bmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix} \qquad\text{and}\qquad \pm c\sqrt{\sigma_{11}-\sigma_{12}}\begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix}$$
Properties of the Multivariate Normal Distribution
For any multivariate normal random vector X
1. The density
$$f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left\{-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right\}$$
has its maximum value at
$$x = \mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix}$$
2. The density is symmetric along its constant-density contours and is centered at $\mu$; i.e.,
the mean is equal to the median.
3. If $X \sim N_p(\mu, \Sigma)$, then linear combinations of the components of X are
(multivariate) normally distributed.
4. If $X \sim N_p(\mu, \Sigma)$, then all subsets of the components of X have a (multivariate)
normal distribution.
5. If $X \sim N_p(\mu, \Sigma)$, then zero covariance implies that the corresponding components
of X are independently distributed.
6. If $X \sim N_p(\mu, \Sigma)$, then conditional distributions of the components of X are
(multivariate) normal.
Some Important Results Regarding the Multivariate Normal Distribution
1. If $X \sim N_p(\mu, \Sigma)$, then any linear combination of the variables
$a'X = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ is distributed as $N(a'\mu,\ a'\Sigma a)$. Also, if A is
a $q\times p$ matrix of constants, then
$$AX = \begin{bmatrix}\sum_{i=1}^{p}a_{1i}X_i\\ \sum_{i=1}^{p}a_{2i}X_i\\ \vdots\\ \sum_{i=1}^{p}a_{qi}X_i\end{bmatrix} \sim N_q(A\mu,\ A\Sigma A')$$
Furthermore, if d is a vector of constants, then $X + d \sim N_p(\mu + d,\ \Sigma)$.
2.-3. If $X \sim N_p(\mu, \Sigma)$ is partitioned as
$$X_{(p\times 1)} = \begin{bmatrix}X_{1\,(q\times 1)}\\ X_{2\,((p-q)\times 1)}\end{bmatrix}, \qquad \mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}, \qquad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$$
then the subvectors are themselves (multivariate) normal: $X_1 \sim N_q(\mu_1, \Sigma_{11})$ and
$X_2 \sim N_{p-q}(\mu_2, \Sigma_{22})$.
4. If $X_1 \sim N_{q_1}(\mu_1, \Sigma_{11})$ and $X_2 \sim N_{q_2}(\mu_2, \Sigma_{22})$ are
independent, then $\mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0$; and if
$$\begin{bmatrix}X_1\\ X_2\end{bmatrix} \sim N_{q_1+q_2}\left(\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\ \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}\right)$$
then $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$. Moreover, if
$X_1 \sim N_{q_1}(\mu_1, \Sigma_{11})$ and $X_2 \sim N_{q_2}(\mu_2, \Sigma_{22})$ are independent,
then
$$\begin{bmatrix}X_1\\ X_2\end{bmatrix} \sim N_{q_1+q_2}\left(\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\ \begin{bmatrix}\Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{bmatrix}\right)$$
5. If $X \sim N_p(\mu, \Sigma)$ with $|\Sigma| > 0$, then $(x - \mu)'\Sigma^{-1}(x - \mu) \sim \chi_p^2$,
and the $N_p(\mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid
$$\left\{x : (x - \mu)'\Sigma^{-1}(x - \mu) \leq \chi_p^2(\alpha)\right\}$$
3.2. Sampling from multivariate normal distribution
Let the observation vectors $X_1, X_2, \ldots, X_n$ denote a sample of independent observations
from a p-variate normal distribution with mean vector μ and covariance matrix Σ. Then the joint
density function of the $X_i$ is
$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n}\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left\{-\tfrac{1}{2}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}$$
Trace
Let A be a $k\times k$ symmetric matrix and x be a $k\times 1$ vector. Then
a. $x'Ax = \mathrm{tr}(x'Ax) = \mathrm{tr}(Axx')$
b. $\mathrm{tr}(A) = \sum_{i=1}^{k}\lambda_i$, where the $\lambda_i$ are the eigenvalues of A.
Now the exponent in the joint density can be simplified as
$$(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \mathrm{tr}\left[(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right] = \mathrm{tr}\left[\Sigma^{-1}(x_i - \mu)(x_i - \mu)'\right]$$
Next,
$$\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \sum_{i=1}^{n}\mathrm{tr}\left[\Sigma^{-1}(X_i - \mu)(X_i - \mu)'\right] = \mathrm{tr}\left[\Sigma^{-1}\sum_{i=1}^{n}(X_i - \mu)(X_i - \mu)'\right]$$
since the trace of a sum of matrices is equal to the sum of the traces of the matrices.
Writing $\bar{x} = \sum_{i=1}^{n}x_i/n$, we have
$$\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)'$$
since the cross-product terms $\sum_{i=1}^{n}(x_i - \bar{x})(\bar{x} - \mu)'$ and
$\sum_{i=1}^{n}(\bar{x} - \mu)(x_i - \bar{x})'$ are both matrices of zeros. We can therefore
write the joint density of a random sample from a multivariate normal population as
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)'\right)\right]\right\}$$
3.3. Maximum likelihood estimation
When the numerical values of the observations become available, they may be substituted for
the $x_i$ in the joint density. The resulting expression, now considered as a function of μ and Σ
for the fixed set of observations $x_1, x_2, \ldots, x_n$, is called the likelihood. One meaning of
best is to select the parameter values that maximize the joint density evaluated at the
observations. This technique is called maximum likelihood estimation, and the maximizing
parameter values are called maximum likelihood estimates.
The likelihood function is
$$L(\mu, \Sigma) = \prod_{i=1}^{n}\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left\{-\tfrac{1}{2}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}$$
and the log-likelihood function is
$$l(\mu, \Sigma) = \ln L(\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)$$
To find the maximum likelihood estimators of μ and Σ, we need to find $\hat{\mu}$ and
$\hat{\Sigma}$ that maximize $L(\mu, \Sigma)$, or equivalently maximize $l(\mu, \Sigma)$.
Note:
$$\sum_{i=1}^{n}(X_i - \mu)'\Sigma^{-1}(X_i - \mu) = \sum_{i=1}^{n}X_i'\Sigma^{-1}X_i - 2\left(\sum_{i=1}^{n}X_i\right)'\Sigma^{-1}\mu + n\mu'\Sigma^{-1}\mu$$
Thus,
$$\frac{\partial l(\mu, \Sigma)}{\partial \mu} = -\frac{1}{2}\frac{\partial}{\partial \mu}\left(\sum_{i=1}^{n}X_i'\Sigma^{-1}X_i - 2\left(\sum_{i=1}^{n}X_i\right)'\Sigma^{-1}\mu + n\mu'\Sigma^{-1}\mu\right) = 0$$
$$\Rightarrow\quad \Sigma^{-1}\left(\sum_{i=1}^{n}X_i\right) - n\Sigma^{-1}\mu = 0 \quad\Rightarrow\quad \sum_{i=1}^{n}X_i = n\mu$$
Hence,
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}X_i = \bar{X}$$
Now, maximizing $l(\hat{\mu}, \Sigma)$ with respect to Σ,
$$\frac{\partial l(\hat{\mu}, \Sigma)}{\partial \Sigma} = -\frac{n}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left(\sum_{i=1}^{n}(X_i - \hat{\mu})(X_i - \hat{\mu})'\right)\Sigma^{-1} = 0$$
$$\Rightarrow\quad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})(X_i - \hat{\mu})' = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})' = \frac{n-1}{n}S$$
where S is the sample covariance matrix.
In general, the maximum likelihood estimators of μ and Σ are
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}X_i = \bar{X} \qquad\text{and}\qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})' = \frac{n-1}{n}S$$
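Computationally, the maximum likelihood estimates are just the sample mean vector and the covariance matrix with divisor n instead of n-1; a minimal sketch (not part of the original notes, using simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, 2.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=200)

n = X.shape[0]
mu_hat = X.mean(axis=0)                      # MLE of mu: the sample mean vector
Sigma_hat = np.cov(X, rowvar=False, ddof=0)  # MLE of Sigma: divisor n, i.e. ((n-1)/n) S
S = np.cov(X, rowvar=False, ddof=1)          # unbiased sample covariance matrix S

print(mu_hat)
print(Sigma_hat)
print(np.allclose(Sigma_hat, (n - 1) / n * S))   # True
```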
Sufficient Statistics
A sufficient statistic is one that, from a certain perspective, contains all the necessary
information for making inferences about the unknown parameters in a given model. By making
inferences, we mean the usual conclusions about parameters, such as estimators, confidence
intervals, and tests of hypotheses.
The importance of sufficient statistics for normal populations is that all of the
information about μ and Σ in the data matrix X is contained in $\bar{X}$ and S,
regardless of the sample size n.
This generally is not true for non-normal populations.
Since many multivariate techniques begin with sample means and covariances, it is
prudent to check on the adequacy of the multivariate normal assumption.
If the data cannot be regarded as multivariate normal, techniques that depend solely on
X̄ and S may be ignoring other useful sample information.
The distribution of $\bar{X}$ is $N_p\left(\mu, \frac{1}{n}\Sigma\right)$.
The matrix $(n-1)S = \sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'$ is distributed as a Wishart
random matrix, $W_{n-1}(\cdot \mid \Sigma)$, with n − 1 degrees of freedom.
$\bar{X}$ and S are independent.
Wishart Distribution:
The sampling distribution of the sample covariance matrix is called the Wishart distribution,
after its discoverer; it is defined as the distribution of a sum of independent (outer) products
of multivariate normal random vectors. Specifically,
$$W_m(\cdot \mid \Sigma) = \text{Wishart distribution with m degrees of freedom} = \text{the distribution of } \sum_{i=1}^{m}Z_iZ_i'$$
where the $Z_i$ are each independently distributed as $N_p(0, \Sigma)$.
The Wishart distribution is the multivariate analogue of the chi-square distribution.
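The defining construction (a sum of m outer products of independent $N_p(0, \Sigma)$ vectors) is easy to simulate; the sketch below (not part of the original notes) draws one Wishart-distributed matrix with m degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
m = 10                                             # degrees of freedom

# Z has m rows, each an independent N_p(0, Sigma) draw; the sum of Z_i Z_i' is Z'Z.
Z = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=m)
W = Z.T @ Z                                        # one W_m(. | Sigma) realization

print(W)          # a symmetric positive (semi)definite 2x2 matrix
```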
Properties of the Wishart Distribution
1. If $A_1$ is distributed as $W_{m_1}(A_1 \mid \Sigma)$ independently of $A_2$, which is
distributed as $W_{m_2}(A_2 \mid \Sigma)$, then $A_1 + A_2$ is distributed as
$W_{m_1+m_2}(A_1 + A_2 \mid \Sigma)$. That is, the degrees of freedom add.
2. If A is distributed as $W_m(A \mid \Sigma)$, then $CAC'$ is distributed as
$W_m(CAC' \mid C\Sigma C')$.
Assignment 1
1. Find the major and minor axes for a bivariate normal probability distribution where the
variables have equal variance ($\sigma_{11} = \sigma_{22}$), with
$$\mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} \qquad\text{and}\qquad \Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{11}\end{bmatrix}$$
2. Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a univariate normal distribution
with mean μ and variance $\sigma^2$, which has probability density function
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[(x-\mu)/\sigma\right]^2}, \qquad -\infty < x < \infty$$
Show that $\hat{\mu} = \bar{x} = \sum_{i=1}^{n}x_i/n$ and $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$,
and also give your comment.
3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a p-variate normal distribution
with mean μ and covariance matrix Σ, which has the joint density function
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right\}$$
Show that
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}X_i = \bar{X} \qquad\text{and}\qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})' = \frac{n-1}{n}S$$
4. Inference about a Multivariate Mean Vector
4.1. Inference about a mean vector
A large part of any analysis is concerned with inference. That is, reaching valid conclusions
concerning a population on the basis of information from a sample.
At this point, we shall concentrate on inferences about a population mean vector and its
component parts. One of the central messages of multivariate analysis is that p correlated
variables must be analyzed jointly.
4.2. Hypothesis testing
A hypothesis is a conjecture about the value of a parameter, in this section a population mean
or means. Hypothesis testing assists in making a decision under uncertainty.
4.2.1. Univariate case
We are interested in the mean of a population, and we have a random sample of n observations
$X_1, X_2, \ldots, X_n$ from the population.
Assumption: $X_1, X_2, \ldots, X_n$ is a random sample from a normal population. To test
$H_0: \mu = \mu_0$, the appropriate test statistic is
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
where
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \qquad\text{and}\qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Sampling Distribution: If Ho and assumptions are true, then the sampling distribution of t is
Student’s - t distribution with n-1 degrees of freedom.
Decision: Reject H0 when t is "large" (i.e., the p-value is small); that is, we reject the null
hypothesis at level α when $|t| > t_{\alpha/2}(n-1)$. If we fail to reject H0, then we conclude
that $\mu_0$ is close to $\bar{X}$.
Confidence Interval: a region or range of plausible μ's (given the observations/data); the set of
all $\mu_0$ such that
$$\left|\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\right| \leq t_{\alpha/2}(n-1)$$
where $t_{\alpha/2}(n-1)$ is the upper $(\alpha/2)\cdot 100$th percentile of Student's
t-distribution with n−1 degrees of freedom. Equivalently,
$$\left\{\mu_0 \text{ such that } \bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}} \leq \mu_0 \leq \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right\}$$
A $100(1-\alpha)\%$ confidence interval or region for μ is
$$\bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}} \leq \mu_0 \leq \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}$$
Remark: Before the sample is selected, the ends of the interval depend on the random variables
$\bar{X}$ and s; this is a random interval. $100(1-\alpha)$ percent of the time, such intervals
will contain the "true" mean μ.
Example 4.1: Suppose we had the following fifteen sample observations on some random
variable X1:
5.76 6.68 6.79 7.88 2.46 2.48 2.97 4.47
1.62 1.43 7.46 8.92 6.61 4.03 9.42
At a significance level of α=0 .10 , do these data support the assertion that they were drawn
from a population with a mean of 4.0?
In other words, test the null hypothesis $H_0: \mu_1 = \mu_0 = 4.0$ against $H_1: \mu_1 \neq \mu_0 = 4.0$.
Solution
Let’s use the five steps of hypothesis testing to assess the potential validity of this conjecture:
1. State the null and alternative hypotheses
H 0 : μ1 =4 .0 vs H 1 : μ1 ≠4 . 0
2. State the desired level of significance α=0 .10
3. Select the appropriate test statistic: n = 15 < 30 but the data appear normal, so use
t- distribution and calculate the test statistic
We have $\bar{X}_1 = 5.26$, $\mu_0 = 4.0$, and $s_1^2 = s_{11} = 7.12 \Rightarrow s_1 = 2.669 = \sqrt{s_{11}}$.
So
37
X̄ 1 −μ 0 X̄ 1 −μ0 5. 26−4 . 0
t= = = =1. 84
s x̄ s/√n 2 . 669/ √ 15
4. Find the critical value(s) and state the decision rule
Critical Value: We have a two-tailed test, so $t_{\alpha/2}(n-1) = t_{0.05}(14) = \pm 1.761$.
Decision rule: Do not reject H0 if $-1.761 \leq t \leq 1.761$; otherwise reject H0.
Since the observed (calculated) value of t is 1.84, which falls in the rejection region,
we reject H0 at the 10% level.
5. Conclusion: At α=0 .10 , the sample evidence does not support the claim that the
mean of X1 is 4.0.
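Example 4.1 can be checked in a few lines; the sketch below (not part of the original notes) uses SciPy's one-sample t-test on the fifteen observations listed above.

```python
import numpy as np
from scipy import stats

x1 = np.array([5.76, 6.68, 6.79, 7.88, 2.46, 2.48, 2.97, 4.47,
               1.62, 1.43, 7.46, 8.92, 6.61, 4.03, 9.42])

t_stat, p_value = stats.ttest_1samp(x1, popmean=4.0)
t_crit = stats.t.ppf(1 - 0.10 / 2, df=len(x1) - 1)    # two-sided critical value at alpha = 0.10

print(round(x1.mean(), 2), round(x1.var(ddof=1), 2))  # 5.26 and about 7.12
print(round(t_stat, 2), round(p_value, 3))            # t is about 1.84
print(abs(t_stat) > t_crit)                           # True: reject H0 at the 10% level
```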
Example 4.2: Suppose we had the following fifteen sample observations on some random
variable X2:
-3.97 -3.24 -3.56 -1.87 -1.13 -5.20 -6.39 -7.88
-5.00 -0.69 1.61 -6.60 2.32 2.87 -7.64
At a significance level of a = 0.10, do these data support the assertion that they were drawn
from a population with a mean of -1.5?
In other words, test the null hypothesis $H_0: \mu_2 = \mu_0 = -1.5$ against $H_1: \mu_2 \neq \mu_0 = -1.5$.
Solution:
Let’s use the five steps of hypothesis testing to assess the potential validity of this conjecture:
1. State the null and alternative hypotheses
H 0 : μ2 =−1 .5 vs H 1 : μ2 ≠−1. 5
2. State the desired level of significance α=0 .10
3. Select the appropriate test statistic: n = 15 < 30 but the data appear normal, so use
t- distribution and calculate the test statistic
We have $\bar{X}_2 = -3.09$, $\mu_0 = -1.5$, and $s_2^2 = s_{22} = 12.43 \Rightarrow s_2 = 3.526 = \sqrt{s_{22}}$.
So
$$t = \frac{\bar{X}_2 - \mu_0}{s_{\bar{x}}} = \frac{\bar{X}_2 - \mu_0}{s/\sqrt{n}} = \frac{-3.09 - (-1.5)}{3.526/\sqrt{15}} = -1.748$$
4. Find the critical value(s) and state the decision rule
Critical Value: We have a two-tailed test, so $t_{\alpha/2}(n-1) = t_{0.05}(14) = \pm 1.761$.
Decision rule: Do not reject H0 if $-1.761 \leq t \leq 1.761$; otherwise reject H0.
Since the observed (calculated) value of t is −1.748, which does not fall in the rejection
region, we do not reject H0 at the 10% level. That is, $-1.761 \leq -1.748 \leq 1.761$.
5. Conclusion: At α=0 .10 , the sample evidence does not refute the claim that the mean
of X2 is -1.5.
Square the test statistic t:
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)$$
So $t^2$ is a squared statistical distance between the sample mean $\bar{x}$ and the hypothesized
value $\mu_0$.
Remember that $t_{df}^2 = F_{1,df}$. That is, the sampling distribution of
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)$$
is $F(1, n-1)$.
We can use this to test $H_0: \mu = \mu_0$, assuming that the observations are a random sample
from $N_p(\mu, \Sigma)$.
We can compute $T^2$ and compare it to
$$\frac{(n-1)p}{n-p}F_{p,\,n-p}(\alpha)$$
or use the fact that
$$\frac{n-p}{(n-1)p}T^2 \sim F_{p,\,n-p}$$
Compute $T^2$ as
$$T^2 = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)$$
and the p-value is
$$\text{p-value} = P\left\{F_{p,\,n-p} \geq \frac{n-p}{(n-1)p}T^2\right\}$$
Reject H0 when the p-value is small (i.e., when $T^2$ is large).
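Hotelling's T² and its F-based p-value are straightforward to compute; the sketch below (not part of the original notes) implements the formulas above for an arbitrary n x p data matrix `X` and hypothesized mean `mu0`; the data used in the usage line are simulated, purely for illustration.

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(X, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0 for an n x p data matrix X."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                  # sample covariance matrix (divisor n-1)
    diff = xbar - mu0
    T2 = n * diff @ np.linalg.solve(S, diff)     # T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    F = (n - p) / ((n - 1) * p) * T2             # scaled statistic ~ F(p, n-p) under H0
    p_value = stats.f.sf(F, p, n - p)
    return T2, p_value

# Hypothetical usage with simulated bivariate data:
rng = np.random.default_rng(2)
X = rng.multivariate_normal([4.0, -1.5], [[7.0, -4.0], [-4.0, 12.0]], size=15)
print(hotelling_t2_test(X, np.array([4.0, -1.5])))
```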
In the univariate case,
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)$$
Since $\bar{X} \sim N\left(\mu, \frac{1}{n}\sigma^2\right)$,
$$\sqrt{n}(\bar{X} - \mu_0) \sim N\left(\sqrt{n}(\mu - \mu_0), \sigma^2\right)$$
which is a linear function of $\bar{X}$, a random variable.
We also know that
$$(n-1)s^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 \sim \sigma^2\chi^2_{(n-1)} \qquad\text{because}\qquad \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{(n-1)}$$
So
$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \sigma^2\,\frac{\text{(chi-square random variable)}}{\text{(degrees of freedom)}}$$
Putting this all together, we find that
$$t^2 = \begin{pmatrix}\text{univariate normal}\\ \text{random variable}\end{pmatrix}\begin{pmatrix}\text{scaled chi-square random variable}\\ \text{divided by its degrees of freedom}\end{pmatrix}^{-1}\begin{pmatrix}\text{univariate normal}\\ \text{random variable}\end{pmatrix}$$
Now we will go through the same thing but with the multivariate case
Analogously, in the multivariate case,
$$T^2 = \sqrt{n}(\bar{X} - \mu_0)'\,S^{-1}\,\sqrt{n}(\bar{X} - \mu_0)$$
Since $\bar{X} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right)$ and $\sqrt{n}(\bar{X} - \mu_0)$ is a
linear combination of $\bar{X}$,
$$\sqrt{n}(\bar{X} - \mu_0) \sim N_p\left(\sqrt{n}(\mu - \mu_0), \Sigma\right)$$
Also,
$$S = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'}{n-1} = \frac{\sum_i Z_iZ_i'}{n-1} = \frac{\text{Wishart random matrix with } df = n-1}{n-1}$$
where $Z_i \sim N_p(0, \Sigma)$ if H0 is true.
Recall that the Wishart distribution is a matrix generalization of the chi-square distribution;
the sampling distribution of $(n-1)S$ is Wishart, $W_{n-1}(\cdot \mid \Sigma)$. So,
$$T^2 = \begin{pmatrix}\text{multivariate normal}\\ \text{random vector}\end{pmatrix}'\begin{pmatrix}\text{Wishart random matrix}\\ \text{divided by its degrees of freedom}\end{pmatrix}^{-1}\begin{pmatrix}\text{multivariate normal}\\ \text{random vector}\end{pmatrix}$$
Example: Consider the following fifteen paired observations on $X_1$ and $X_2$:
x j 1 1.43 1.62 2.46 2.48 2.97 4.03 4.47 5.76 6.61 6.68 6.79 7.46 7.88 8.88 8.92
x j 2 -0.69 -5.0 -1.13 -5.2 -6.39 2.87 -7.88 -3.56 2.32 -3.24 -3.56 1.61 -1.87 -6.6 -7.64
At a significance level of α=0 .10 , do these data support the assertion that they were drawn
from a population with a centroid (4.0, -1.5)?
In other words, test the null hypothesis
$$H_0: \mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} = \begin{bmatrix}4.0\\ -1.5\end{bmatrix} \qquad\text{against}\qquad H_1: \mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} \neq \begin{bmatrix}4.0\\ -1.5\end{bmatrix}$$
Let’s go through the five steps of hypothesis testing to assess the potential validity of our
assertion.
1. State the null and alternative hypotheses:
$$H_0: \mu = \begin{bmatrix}4.0\\ -1.5\end{bmatrix} \qquad\text{vs}\qquad H_1: \mu \neq \begin{bmatrix}4.0\\ -1.5\end{bmatrix}$$
2. State the level of significance α=0 .10
3. Select the appropriate test statistic: n – p = 15 – 2 = 13 is not very large, but the data
appear relatively bivariate normal, so use
$$T^2 = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)$$
We have
$$\bar{X} = \begin{bmatrix}5.26\\ -3.09\end{bmatrix} \qquad\text{and}\qquad \mu_0 = \begin{bmatrix}4.0\\ -1.5\end{bmatrix}$$
4. Calculate the test statistic and compare it with the critical value
$\dfrac{(n-1)p}{n-p}F_{p,\,n-p}(0.10)$.
5. Conclusion: At α = 0.10, the sample evidence supports the claim that the mean vector differs
from
$$\mu_0 = \begin{bmatrix}4.0\\ -1.5\end{bmatrix}$$
Likelihood Ratio Test and Hotelling's T2
Compare the maximum value of the multivariate normal likelihood function under no restrictions
against the restricted maximized value with the mean vector held at $\mu_0$. The hypothesized
value $\mu_0$ will be plausible if it produces a likelihood value almost as large as the
unrestricted maximum.
To test $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, we construct the ratio
$$\Lambda = \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu,\Sigma} L(\mu, \Sigma)}$$
where the numerator is the likelihood at the MLE of Σ given that $\mu = \mu_0$, and the
denominator is the likelihood at the unrestricted MLEs of both μ and Σ.
Since
$$\hat{\mu} = n^{-1}\sum_{i=1}^{n}x_i = \bar{x}, \qquad \hat{\Sigma} = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})', \qquad \hat{\Sigma}_0 = n^{-1}\sum_{i=1}^{n}(x_i - \mu_0)(x_i - \mu_0)'$$
then, under the assumption of multivariate normality,
$$\Lambda = \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu,\Sigma} L(\mu, \Sigma)} = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}$$
Derivation of Likelihood Ratio Test
Under H0, maximizing $L(\mu_0, \Sigma)$ over Σ gives
$$\max_{\Sigma} L(\mu_0, \Sigma) = \frac{1}{(2\pi)^{np/2}|\hat{\Sigma}_0|^{n/2}}\, e^{-np/2}$$
while the unrestricted maximum is
$$\max_{\mu,\Sigma} L(\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}|\hat{\Sigma}|^{n/2}}\, e^{-np/2}$$
Therefore,
$$\Lambda = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}$$
μ0 is a plausible value for μ if Λ is close to one.
Relationship between Λ and T²
$$\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \left(1 + \frac{T^2}{n-1}\right)^{-1}$$
For large T², the likelihood ratio Λ is small, and both lead to rejection of H0.
From the previous equation,
$$T^2 = (n-1)\left(\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1\right) = (n-1)\left(\Lambda^{-2/n} - 1\right)$$
which provides another way to compute T² that does not require inverting a covariance matrix.
When $H_0: \mu = \mu_0$ is true, the exact distribution of the likelihood ratio test statistic is
obtained from that of T², since
$$T^2 = (n-1)\left(\Lambda^{-2/n} - 1\right) \qquad\text{and}\qquad \frac{n-p}{(n-1)p}T^2 \sim F_{p,\,n-p}$$
4.4. Confidence regions and simultaneous comparison of components
4.5. Large sample inference about the mean vector