SIXTH EDITION
Applied Multivariate
Statistical Analysis
RICHARD A. JOHNSON
University of Wisconsin—Madison
DEAN W. WICHERN
Texas A&M University
Upper Saddle River, New Jersey 07458
The preceding descriptions offer glimpses into the use of multivariate methods
in widely diverse fields.
1.3 The Organization of Data
Throughout this text, we are going to be concerned with analyzing measurements
made on several variables or characteristics. These measurements (commonly called
data) must frequently be arranged and displayed in various ways. For example,
graphs and tabular arrangements are important aids in data analysis. Summary numbers, which quantitatively portray certain features of the data, are also necessary to
any description.
We now introduce the preliminary concepts underlying these first steps of data
organization.
Arrays
Multivariate data arise whenever an investigator, seeking to understand a social or
physical phenomenon, selects a number $p \ge 1$ of variables or characters to record.
The values of these variables are all recorded for each distinct item, individual, or
experimental unit.

We will use the notation $x_{jk}$ to indicate the particular value of the $k$th variable
that is observed on the $j$th item, or trial. That is,

$$
x_{jk} = \text{measurement of the } k\text{th variable on the } j\text{th item}
$$
Consequently, measurements on p variables can be displayed as follows:
$$
\begin{array}{lcccccc}
 & \text{Variable 1} & \text{Variable 2} & \cdots & \text{Variable } k & \cdots & \text{Variable } p \\
\text{Item 1:} & x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
\text{Item 2:} & x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } j\text{:} & x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n\text{:} & x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{array}
$$
Or we can display these data as a rectangular array, called $\mathbf{X}$, of $n$ rows and $p$
columns:

$$
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
$$

The array $\mathbf{X}$, then, contains the data consisting of all of the observations on all of
the variables.
Thus,
$$
\mathbf{A}^{-1} = \sum_{i=1}^{k} \frac{1}{\lambda_i}\, \mathbf{e}_i \mathbf{e}_i' = \mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}' \tag{2-21}
$$

since $(\mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}')\mathbf{P}\boldsymbol{\Lambda}\mathbf{P}' = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}'(\mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}') = \mathbf{P}\mathbf{P}' = \mathbf{I}$.

Next, let $\boldsymbol{\Lambda}^{1/2}$ denote the diagonal matrix with $\sqrt{\lambda_i}$ as the $i$th diagonal element.

The matrix $\sum_{i=1}^{k} \sqrt{\lambda_i}\, \mathbf{e}_i \mathbf{e}_i' = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{P}'$ is called the square root of $\mathbf{A}$ and is denoted by
$\mathbf{A}^{1/2}$.
The square-root matrix of a positive definite matrix $\mathbf{A}$,

$$
\mathbf{A}^{1/2} = \sum_{i=1}^{k} \sqrt{\lambda_i}\, \mathbf{e}_i \mathbf{e}_i' = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{P}' \tag{2-22}
$$

has the following properties:

1. $(\mathbf{A}^{1/2})' = \mathbf{A}^{1/2}$ (that is, $\mathbf{A}^{1/2}$ is symmetric).
2. $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$.
3. $(\mathbf{A}^{1/2})^{-1} = \sum_{i=1}^{k} \dfrac{1}{\sqrt{\lambda_i}}\, \mathbf{e}_i \mathbf{e}_i' = \mathbf{P}\boldsymbol{\Lambda}^{-1/2}\mathbf{P}'$, where $\boldsymbol{\Lambda}^{-1/2}$ is a diagonal matrix with $1/\sqrt{\lambda_i}$ as the $i$th diagonal element.
4. $\mathbf{A}^{1/2}\mathbf{A}^{-1/2} = \mathbf{A}^{-1/2}\mathbf{A}^{1/2} = \mathbf{I}$, and $\mathbf{A}^{-1/2}\mathbf{A}^{-1/2} = \mathbf{A}^{-1}$, where $\mathbf{A}^{-1/2} = (\mathbf{A}^{1/2})^{-1}$.
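As a computational aside, the square-root matrix and its properties can be checked numerically. The following sketch is our own, assuming NumPy; the 2 × 2 matrix A is an arbitrary positive definite example, not one from the text. It builds A^{1/2} from the spectral decomposition as in (2-22) and checks properties 1, 2, and 4.

```python
import numpy as np

def sqrt_pd(A):
    """Square root of a symmetric positive definite matrix A, built from its
    spectral decomposition A = P diag(lambda) P' as in (2-22)."""
    lam, P = np.linalg.eigh(A)                    # eigenvalues and orthogonal P
    return P @ np.diag(np.sqrt(lam)) @ P.T

A = np.array([[2.0, 1.0],                         # illustrative positive definite A
              [1.0, 2.0]])
A_half = sqrt_pd(A)
A_neg_half = np.linalg.inv(A_half)                # A^{-1/2} = (A^{1/2})^{-1}

print(np.allclose(A_half, A_half.T))              # property 1: A^{1/2} is symmetric
print(np.allclose(A_half @ A_half, A))            # property 2: A^{1/2} A^{1/2} = A
print(np.allclose(A_half @ A_neg_half, np.eye(2)))             # property 4
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A)))  # A^{-1/2} A^{-1/2} = A^{-1}
```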
2.5 Random Vectors and Matrices
A random vector is a vector whose elements are random variables. Similarly, a
random matrix is a matrix whose elements are random variables. The expected value
of a random matrix (or vector) is the matrix (vector) consisting of the expected
values of each of its elements. Specifically, let $\mathbf{X} = \{X_{ij}\}$ be an $n \times p$ random
matrix. Then the expected value of $\mathbf{X}$, denoted by $E(\mathbf{X})$, is the $n \times p$ matrix of
numbers (if they exist)

$$
E(\mathbf{X}) =
\begin{bmatrix}
E(X_{11}) & E(X_{12}) & \cdots & E(X_{1p}) \\
E(X_{21}) & E(X_{22}) & \cdots & E(X_{2p}) \\
\vdots & \vdots & & \vdots \\
E(X_{n1}) & E(X_{n2}) & \cdots & E(X_{np})
\end{bmatrix} \tag{2-23}
$$
where, for each element of the matrix,$^2$

$$
E(X_{ij}) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} x_{ij}\, f_{ij}(x_{ij})\, dx_{ij} & \text{if } X_{ij} \text{ is a continuous random variable with probability density function } f_{ij}(x_{ij}) \\[2ex]
\displaystyle\sum_{\text{all } x_{ij}} x_{ij}\, p_{ij}(x_{ij}) & \text{if } X_{ij} \text{ is a discrete random variable with probability function } p_{ij}(x_{ij})
\end{cases}
$$
Example 2.12 (Computing expected values for discrete random variables) Suppose
$p = 2$ and $n = 1$, and consider the random vector $\mathbf{X}' = [X_1, X_2]$. Let the discrete
random variable $X_1$ have the following probability function:

$$
\begin{array}{c|ccc}
x_1 & -1 & 0 & 1 \\
\hline
p_1(x_1) & .3 & .3 & .4
\end{array}
$$

Then $E(X_1) = \displaystyle\sum_{\text{all } x_1} x_1 p_1(x_1) = (-1)(.3) + (0)(.3) + (1)(.4) = .1$.

Similarly, let the discrete random variable $X_2$ have the probability function

$$
\begin{array}{c|cc}
x_2 & 0 & 1 \\
\hline
p_2(x_2) & .8 & .2
\end{array}
$$

Then $E(X_2) = \displaystyle\sum_{\text{all } x_2} x_2 p_2(x_2) = (0)(.8) + (1)(.2) = .2$.

Thus,

$$
E(\mathbf{X}) = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} .1 \\ .2 \end{bmatrix}
$$
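The same expectations can be reproduced in a few lines of NumPy; the sketch below (the array names are ours) simply evaluates the discrete sums of Example 2.12.

```python
import numpy as np

# Probability functions of X1 and X2 from Example 2.12.
x1, p1 = np.array([-1, 0, 1]), np.array([.3, .3, .4])
x2, p2 = np.array([0, 1]),     np.array([.8, .2])

E_X1 = np.sum(x1 * p1)    # (-1)(.3) + (0)(.3) + (1)(.4) = 0.1
E_X2 = np.sum(x2 * p2)    # (0)(.8) + (1)(.2) = 0.2
print(E_X1, E_X2)         # E(X) = [.1, .2]'
```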
Two results involving the expectation of sums and products of matrices follow
directly from the definition of the expected value of a random matrix and the univariate
properties of expectation, $E(X_1 + Y_1) = E(X_1) + E(Y_1)$ and $E(cX_1) = cE(X_1)$.

Let $\mathbf{X}$ and $\mathbf{Y}$ be random matrices of the same dimension, and let $\mathbf{A}$ and $\mathbf{B}$ be
conformable matrices of constants. Then (see Exercise 2.40)

$$
E(\mathbf{X} + \mathbf{Y}) = E(\mathbf{X}) + E(\mathbf{Y}) \tag{2-24}
$$
$$
E(\mathbf{A}\mathbf{X}\mathbf{B}) = \mathbf{A}E(\mathbf{X})\mathbf{B}
$$
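A quick Monte Carlo check of (2-24) is sketched below. Everything in it is our own construction: the 2 × 3 mean matrices, the conformable constant matrices A and B, and the normal noise are arbitrary choices used only to illustrate that the sample averages of X + Y and AXB settle near E(X) + E(Y) and A E(X) B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2 x 3 random matrices: X and Y are their mean matrices plus
# independent standard normal noise.  A (2 x 2) and B (3 x 3) are constants.
mu_X = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]])
mu_Y = np.array([[2.0, 0.0, 1.0], [1.0,  1.0, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.diag([1.0, 2.0, 0.5])

reps = 200_000
X = mu_X + rng.standard_normal((reps, 2, 3))
Y = mu_Y + rng.standard_normal((reps, 2, 3))

# Sample averages approximate the expectations in (2-24).
print(np.allclose((X + Y).mean(axis=0), mu_X + mu_Y, atol=0.05))
print(np.allclose((A @ X @ B).mean(axis=0), A @ mu_X @ B, atol=0.05))
```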
$^2$If you are unfamiliar with calculus, you should concentrate on the interpretation of the expected
value and, eventually, variance. Our development is based primarily on the properties of expectation
rather than its particular evaluation for continuous or discrete random variables.
2.6 Mean Vectors and Covariance Matrices
Suppose $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$ is a $p \times 1$ random vector. Then each element of $\mathbf{X}$ is a
random variable with its own marginal probability distribution. (See Example 2.12.) The
marginal means $\mu_i$ and variances $\sigma_i^2$ are defined as $\mu_i = E(X_i)$ and $\sigma_i^2 = E(X_i - \mu_i)^2$,
$i = 1, 2, \ldots, p$, respectively. Specifically,

$$
\mu_i =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} x_i f_i(x_i)\, dx_i & \text{if } X_i \text{ is a continuous random variable with probability density function } f_i(x_i) \\[2ex]
\displaystyle\sum_{\text{all } x_i} x_i p_i(x_i) & \text{if } X_i \text{ is a discrete random variable with probability function } p_i(x_i)
\end{cases}
$$

$$
\sigma_i^2 =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} (x_i - \mu_i)^2 f_i(x_i)\, dx_i & \text{if } X_i \text{ is a continuous random variable with probability density function } f_i(x_i) \\[2ex]
\displaystyle\sum_{\text{all } x_i} (x_i - \mu_i)^2 p_i(x_i) & \text{if } X_i \text{ is a discrete random variable with probability function } p_i(x_i)
\end{cases}
\tag{2-25}
$$

It will be convenient in later sections to denote the marginal variances by $\sigma_{ii}$ rather
than the more traditional $\sigma_i^2$, and consequently, we shall adopt this notation.
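For a concrete check, the discrete case of (2-25) applied to the random variable X1 of Example 2.12 can be evaluated directly; the short sketch below (assuming NumPy) gives the marginal mean .1 and marginal variance .69, the value that reappears later as the (1, 1) entry of the covariance matrix in this section.

```python
import numpy as np

# Marginal mean and variance of the discrete X1 of Example 2.12, via (2-25).
x1, p1 = np.array([-1, 0, 1]), np.array([.3, .3, .4])

mu_1    = np.sum(x1 * p1)                   # 0.1
sigma11 = np.sum((x1 - mu_1) ** 2 * p1)     # 0.69
print(mu_1, sigma11)
```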
The behavior of any pair of random variables, such as $X_i$ and $X_k$, is described by
their joint probability function, and a measure of the linear association between
them is provided by the covariance

$$
\begin{aligned}
\sigma_{ik} &= E(X_i - \mu_i)(X_k - \mu_k) \\[1ex]
&=
\begin{cases}
\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_i - \mu_i)(x_k - \mu_k)\, f_{ik}(x_i, x_k)\, dx_i\, dx_k & \text{if } X_i, X_k \text{ are continuous random variables with the joint density function } f_{ik}(x_i, x_k) \\[2ex]
\displaystyle\sum_{\text{all } x_i}\sum_{\text{all } x_k} (x_i - \mu_i)(x_k - \mu_k)\, p_{ik}(x_i, x_k) & \text{if } X_i, X_k \text{ are discrete random variables with joint probability function } p_{ik}(x_i, x_k)
\end{cases}
\end{aligned}
\tag{2-26}
$$

and $\mu_i$ and $\mu_k$, $i, k = 1, 2, \ldots, p$, are the marginal means. When $i = k$, the covariance
becomes the marginal variance.
More generally, the collective behavior of the $p$ random variables $X_1, X_2, \ldots, X_p$
or, equivalently, the random vector $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$, is described by a joint
probability density function $f(x_1, x_2, \ldots, x_p) = f(\mathbf{x})$. As we have already noted in
this book, $f(\mathbf{x})$ will often be the multivariate normal density function. (See Chapter 4.)

If the joint probability $P[X_i \le x_i \text{ and } X_k \le x_k]$ can be written as the product of
the corresponding marginal probabilities, so that

$$
P[X_i \le x_i \text{ and } X_k \le x_k] = P[X_i \le x_i]\, P[X_k \le x_k] \tag{2-27}
$$

for all pairs of values $x_i, x_k$, then $X_i$ and $X_k$ are said to be statistically independent.
When $X_i$ and $X_k$ are continuous random variables with joint density $f_{ik}(x_i, x_k)$ and
marginal densities $f_i(x_i)$ and $f_k(x_k)$, the independence condition becomes

$$
f_{ik}(x_i, x_k) = f_i(x_i) f_k(x_k)
$$

for all pairs $(x_i, x_k)$.

The $p$ continuous random variables $X_1, X_2, \ldots, X_p$ are mutually statistically
independent if their joint density can be factored as

$$
f_{12 \cdots p}(x_1, x_2, \ldots, x_p) = f_1(x_1) f_2(x_2) \cdots f_p(x_p) \tag{2-28}
$$

for all $p$-tuples $(x_1, x_2, \ldots, x_p)$.
Statistical independence has an important implication for covariance. The
factorization in (2-28) implies that $\operatorname{Cov}(X_i, X_k) = 0$. Thus,

$$
\operatorname{Cov}(X_i, X_k) = 0 \quad \text{if } X_i \text{ and } X_k \text{ are independent} \tag{2-29}
$$

The converse of (2-29) is not true in general; there are situations where
$\operatorname{Cov}(X_i, X_k) = 0$, but $X_i$ and $X_k$ are not independent. (See [5].)
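A standard illustration of this point, not taken from the present text, is sketched below: X is uniform on {−1, 0, 1} and Y = X², so the two variables are clearly dependent, yet the covariance computed from the discrete case of (2-26) is zero.

```python
import numpy as np

# Illustration (our own, not from the text): X uniform on {-1, 0, 1} and
# Y = X^2 are dependent, yet Cov(X, Y) = 0.
x  = np.array([-1, 0, 1])
px = np.array([1/3, 1/3, 1/3])
y  = x ** 2                                  # Y is completely determined by X

mu_x = np.sum(x * px)                        # 0
mu_y = np.sum(y * px)                        # 2/3
cov  = np.sum((x - mu_x) * (y - mu_y) * px)  # discrete case of (2-26): 0.0
print(cov)

# Yet P[X = 1 and Y = 1] = 1/3, while P[X = 1] P[Y = 1] = (1/3)(2/3) = 2/9,
# so the factorization required for independence fails.
```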
The means and covariances of the $p \times 1$ random vector $\mathbf{X}$ can be set out as
matrices. The expected value of each element is contained in the vector of means
$\boldsymbol{\mu} = E(\mathbf{X})$, and the $p$ variances $\sigma_{ii}$ and the $p(p-1)/2$ distinct covariances
$\sigma_{ik}$ ($i < k$) are contained in the variance-covariance matrix $\boldsymbol{\Sigma}$ of (2-32) below.

$$
\begin{aligned}
\sigma_{12} &= E(X_1 - \mu_1)(X_2 - \mu_2) = \sum_{\text{all pairs } (x_1, x_2)} (x_1 - .1)(x_2 - .2)\, p_{12}(x_1, x_2) \\
&= (-1 - .1)(0 - .2)(.24) + (-1 - .1)(1 - .2)(.06) \\
&\quad + \cdots + (1 - .1)(1 - .2)(.00) = -.08
\end{aligned}
$$

$$
\sigma_{21} = E(X_2 - \mu_2)(X_1 - \mu_1) = E(X_1 - \mu_1)(X_2 - \mu_2) = \sigma_{12} = -.08
$$
Consequently, with $\mathbf{X}' = [X_1, X_2]$,

$$
\boldsymbol{\mu} = E(\mathbf{X}) = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} .1 \\ .2 \end{bmatrix}
$$

and

$$
\begin{aligned}
\boldsymbol{\Sigma} &= E(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' \\
&= E\begin{bmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 \end{bmatrix} \\
&= \begin{bmatrix} E(X_1 - \mu_1)^2 & E(X_1 - \mu_1)(X_2 - \mu_2) \\ E(X_2 - \mu_2)(X_1 - \mu_1) & E(X_2 - \mu_2)^2 \end{bmatrix} \\
&= \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}
 = \begin{bmatrix} .69 & -.08 \\ -.08 & .16 \end{bmatrix}
\end{aligned}
$$
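The hand computation above can be verified numerically. In the sketch below (assuming NumPy), the joint probability table p_12(x1, x2) uses the entries visible in the calculation above (.24, .06, .00) together with the two remaining entries forced by the marginal probabilities of Example 2.12; the resulting mean vector and covariance matrix match μ = [.1, .2]' and Σ above.

```python
import numpy as np

# Joint probability table p_12(x1, x2): rows index x1 in {-1, 0, 1}, columns
# index x2 in {0, 1}.  The entries .16 and .14 are the ones implied by the
# marginals p_1 = (.3, .3, .4) and p_2 = (.8, .2) of Example 2.12.
x1 = np.array([-1, 0, 1])
x2 = np.array([0, 1])
p12 = np.array([[.24, .06],
                [.16, .14],
                [.40, .00]])

p1, p2 = p12.sum(axis=1), p12.sum(axis=0)           # marginal probabilities
mu1, mu2 = np.sum(x1 * p1), np.sum(x2 * p2)         # .1 and .2

sig11 = np.sum((x1 - mu1) ** 2 * p1)                # .69
sig22 = np.sum((x2 - mu2) ** 2 * p2)                # .16
sig12 = np.sum(np.outer(x1 - mu1, x2 - mu2) * p12)  # -.08

print(np.array([[sig11, sig12],
                [sig12, sig22]]))
```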
We note that the computation of means, variances, and covariances for discrete
random variables involves summation (as in Examples 2.12 and 2.13), while analogous computations for continuous random variables involve integration.
Because $\sigma_{ik} = E(X_i - \mu_i)(X_k - \mu_k) = \sigma_{ki}$, it is convenient to write the
matrix appearing in (2-31) as

$$
\boldsymbol{\Sigma} = E(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp}
\end{bmatrix} \tag{2-32}
$$
We shall refer to $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ as the population mean (vector) and population
variance-covariance (matrix), respectively.

The multivariate normal distribution is completely specified once the mean
vector $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$ are given (see Chapter 4), so it is not
surprising that these quantities play an important role in many multivariate
procedures.
It is frequently informative to separate the information contained in variances
$\sigma_{ii}$ from that contained in measures of association and, in particular, the
measure of association known as the population correlation coefficient $\rho_{ik}$. The
correlation coefficient $\rho_{ik}$ is defined in terms of the covariance $\sigma_{ik}$ and variances
$\sigma_{ii}$ and $\sigma_{kk}$ as

$$
\rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}}} \tag{2-33}
$$

The correlation coefficient measures the amount of linear association between the
random variables $X_i$ and $X_k$. (See, for example, [5].)
Let the population correlation matrix be the $p \times p$ symmetric matrix

$$
\boldsymbol{\rho} =
\begin{bmatrix}
\dfrac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \dfrac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \cdots & \dfrac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} \\[2ex]
\dfrac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \dfrac{\sigma_{22}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{22}}} & \cdots & \dfrac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} & \dfrac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} & \cdots & \dfrac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}}
\end{bmatrix}
=
\begin{bmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{12} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & & \vdots \\
\rho_{1p} & \rho_{2p} & \cdots & 1
\end{bmatrix} \tag{2-34}
$$

and let the $p \times p$ standard deviation matrix be

$$
\mathbf{V}^{1/2} =
\begin{bmatrix}
\sqrt{\sigma_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{\sigma_{22}} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sqrt{\sigma_{pp}}
\end{bmatrix} \tag{2-35}
$$

Then it is easily verified (see Exercise 2.23) that

$$
\mathbf{V}^{1/2}\,\boldsymbol{\rho}\,\mathbf{V}^{1/2} = \boldsymbol{\Sigma} \tag{2-36}
$$

and

$$
\boldsymbol{\rho} = (\mathbf{V}^{1/2})^{-1}\,\boldsymbol{\Sigma}\,(\mathbf{V}^{1/2})^{-1} \tag{2-37}
$$

That is, $\boldsymbol{\Sigma}$ can be obtained from $\mathbf{V}^{1/2}$ and $\boldsymbol{\rho}$, whereas $\boldsymbol{\rho}$ can be obtained from $\boldsymbol{\Sigma}$.
Moreover, the expression of these relationships in terms of matrix operations allows
the calculations to be conveniently implemented on a computer.
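A minimal sketch of that computation, assuming NumPy (the function name corr_from_cov is ours): it forms (V^{1/2})^{-1} from the diagonal of Σ and applies (2-37). Run on the covariance matrix of Example 2.14 below, it reproduces the correlation matrix obtained there.

```python
import numpy as np

def corr_from_cov(Sigma):
    """Population correlation matrix rho = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1},
    as in (2-37)."""
    v_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))   # (V^{1/2})^{-1}
    return v_inv @ Sigma @ v_inv

# Covariance matrix of Example 2.14; the result has 1/6, 1/5, and -1/5
# as its off-diagonal correlations.
Sigma = np.array([[4.0,  1.0,  2.0],
                  [1.0,  9.0, -3.0],
                  [2.0, -3.0, 25.0]])
print(corr_from_cov(Sigma))
```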
Example 2.14 (Computing the correlation matrix from the covariance matrix)
Suppose

$$
\boldsymbol{\Sigma} =
\begin{bmatrix}
4 & 1 & 2 \\
1 & 9 & -3 \\
2 & -3 & 25
\end{bmatrix}
=
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_{33}
\end{bmatrix}
$$

Obtain $\mathbf{V}^{1/2}$ and $\boldsymbol{\rho}$.
Here

$$
\mathbf{V}^{1/2} =
\begin{bmatrix}
\sqrt{\sigma_{11}} & 0 & 0 \\
0 & \sqrt{\sigma_{22}} & 0 \\
0 & 0 & \sqrt{\sigma_{33}}
\end{bmatrix}
=
\begin{bmatrix}
2 & 0 & 0 \\
0 & 3 & 0 \\
0 & 0 & 5
\end{bmatrix}
$$

and

$$
(\mathbf{V}^{1/2})^{-1} =
\begin{bmatrix}
\frac{1}{2} & 0 & 0 \\
0 & \frac{1}{3} & 0 \\
0 & 0 & \frac{1}{5}
\end{bmatrix}
$$

Consequently, from (2-37), the correlation matrix $\boldsymbol{\rho}$ is given by

$$
(\mathbf{V}^{1/2})^{-1}\,\boldsymbol{\Sigma}\,(\mathbf{V}^{1/2})^{-1} =
\begin{bmatrix}
\frac{1}{2} & 0 & 0 \\
0 & \frac{1}{3} & 0 \\
0 & 0 & \frac{1}{5}
\end{bmatrix}
\begin{bmatrix}
4 & 1 & 2 \\
1 & 9 & -3 \\
2 & -3 & 25
\end{bmatrix}
\begin{bmatrix}
\frac{1}{2} & 0 & 0 \\
0 & \frac{1}{3} & 0 \\
0 & 0 & \frac{1}{5}
\end{bmatrix}
=
\begin{bmatrix}
1 & \frac{1}{6} & \frac{1}{5} \\[0.5ex]
\frac{1}{6} & 1 & -\frac{1}{5} \\[0.5ex]
\frac{1}{5} & -\frac{1}{5} & 1
\end{bmatrix}
$$
Partitioning the Covariance Matrix
Often, the characteristics measured on individual trials will fall naturally into two
or more groups. As examples, consider measurements of variables representing
consumption and income or variables representing personality traits and physical
characteristics. One approach to handling these situations is to let the characteristics
defining the distinct groups be subsets of the total collection of characteristics. If the
total collection is represented by a ($p \times 1$)-dimensional random vector $\mathbf{X}$, the subsets
can be regarded as components of $\mathbf{X}$ and can be sorted by partitioning $\mathbf{X}$.

In general, we can partition the $p$ characteristics contained in the $p \times 1$ random
vector $\mathbf{X}$ into, for instance, two groups of size $q$ and $p - q$, respectively. For example,
we can write