Basic Matrix Theory
c1 y1 + c2 y2 = A(c1 x1 + c2 x2), which implies that c1 y1 + c2 y2 ∈ R(A) for any linear combination of y1, y2. Moreover, if x1, x2 ∈ N(A), then we have Ax1 = Ax2 = 0, and therefore, A(c1 x1 + c2 x2) = 0, which implies that c1 x1 + c2 x2 ∈ N(A) for any linear combination of x1, x2. It is easy to see that the range of A is the space spanned by the column vectors of A. Note that Ax yields a linear combination of the column vectors of A with weights given by the components of x. We define the rank of A, denoted by rank(A), to be the dimension of the range of A. On the other hand, the dimension of the null space of A is often referred to as the nullity of A, which is written as nullity(A). Recall that the dimension of a subspace of a vector space is defined to be the number of linearly independent vectors we need to span it. It is well expected that rank(A) = m − nullity(A), or equivalently, rank(A) + nullity(A) = m, since we have m − nullity(A) linearly independent vectors whose span intersects the null space of A only at the origin, and they are mapped into a linearly independent set of vectors due to the linearity of A. We denote by A′ the transpose of a matrix A. Moreover, for a subspace M of R^n, we define

    M⊥ = { x | x′y = 0 for all y ∈ M },

which is commonly referred to as the orthogonal complement of M and read as "M perp". For instance, the orthogonal complement of the xy-plane is the z-axis in R^3. Note that we have dim M⊥ = n − dim M, where dim denotes the dimension. It is easy to deduce that R(A)⊥ = N(A′). This just states that a vector x is orthogonal to every column vector of A if and only if A′x = 0, which is almost tautological.
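As a quick numerical sketch of rank, nullity, and the identity R(A)⊥ = N(A′), one can use NumPy; the matrix below is a hypothetical example chosen only for illustration.

```python
import numpy as np

# A hypothetical 3x4 matrix viewed as a map from R^4 to R^3;
# its third row is the sum of the first two, so rank(A) = 2.
A = np.array([[1., 0., 2., 1.],
              [0., 1., 1., 0.],
              [1., 1., 3., 1.]])

m = A.shape[1]
rank = np.linalg.matrix_rank(A)   # dim R(A)
nullity = m - rank                # rank(A) + nullity(A) = m
print(rank, nullity)              # 2 2

# R(A)-perp = N(A'): the left singular vectors of A associated with
# zero singular values span N(A'), and every column of A is
# orthogonal to them.
U, s, Vt = np.linalg.svd(A)
perp_basis = U[:, rank:]
print(np.allclose(A.T @ perp_basis, 0.0))  # True
```

The SVD route is one convenient way to get a basis of N(A′); any other left-null-space computation would do.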
to a constant multiplication, since if xi is an eigenvector associated with eigenvalue λi then any constant multiple of xi also becomes an eigenvector associated with the same eigenvalue λi. Naturally, there may be multiple eigenvectors associated with a single eigenvalue λi. In this case, they are not individually identified even up to a constant multiplication. If, for instance, x1i and x2i are two eigenvectors associated with λi, then any linear combination of them is also an eigenvector associated with λi, since A(c1 x1i + c2 x2i) = λi (c1 x1i + c2 x2i) for any constants c1 and c2. In fact, it is easy to see that eigenvectors associated with any eigenvalue λi are identified only up to the space spanned by them, which we call the eigenspace associated with eigenvalue λi. The null space, for instance, can be regarded as the eigenspace associated with the zero eigenvalue. The dimension of the eigenspace associated with eigenvalue λi is sometimes called the geometric multiplicity of eigenvalue λi. It is known that the geometric multiplicity cannot exceed the algebraic multiplicity. For instance, if the eigenspace associated with eigenvalue λi is 2-dimensional, then λi is a root of the determinantal equation that is repeated at least 2 times. There are two commonly used functionals of a matrix, trace and determinant, whose values are solely determined by its eigenvalues. For a matrix A with eigenvalues λ1, . . . , λn, we define the trace and determinant of A as

    tr(A) = Σ_{i=1}^n λi   and   det(A) = Π_{i=1}^n λi,

respectively. The trace and determinant can also be obtained directly from the entries of A using the relationships between the coefficients a0, . . . , an and the roots λ1, . . . , λn of the determinantal equation. To see this more clearly, we consider a 2-dimensional square matrix A = (aij), i, j = 1, 2, whose determinantal equation is given by

    λ^2 − (a11 + a22) λ + (a11 a22 − a12 a21) = 0,

and note that we have λ1 + λ2 = a11 + a22 and λ1 λ2 = a11 a22 − a12 a21 if we set λ1 and λ2 to be the roots of the determinantal equation.
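These relationships are easy to check numerically; the sketch below compares the eigenvalue formulas for trace and determinant against the entry-based values, using a hypothetical 2x2 matrix.

```python
import numpy as np

# A hypothetical 2x2 matrix chosen for illustration.
A = np.array([[4., 1.],
              [2., 3.]])

lam = np.linalg.eigvals(A)  # roots of the determinantal equation

# tr(A) = lambda_1 + lambda_2 = a11 + a22
print(np.isclose(lam.sum(), np.trace(A)))         # True
# det(A) = lambda_1 * lambda_2 = a11*a22 - a12*a21
print(np.isclose(lam.prod(), np.linalg.det(A)))   # True
```

Here the determinantal equation is λ^2 − 7λ + 10 = 0, so the eigenvalues are 5 and 2, with sum 7 = tr(A) and product 10 = det(A).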
3. Projections
Let P be an n-dimensional square matrix, which we may view as a linear transformation on R^n. We say that a matrix P is idempotent if and only if P^2 = P. For an idempotent matrix P, we have P(Px) = P^2 x = Px for all x ∈ R^n, which implies that P is an identity map if restricted to the range R(P) of P. Note that any vector in R(P) can be written as Px for some x ∈ R^n. As for any other linear transformation on R^n, we have rank(P) + nullity(P) = n, i.e., dim R(P) + dim N(P) = n. Moreover, R(P) ∩ N(P) = {0}, since any nonzero vector in R(P) is mapped to itself and in particular does not belong to N(P). Consequently, for any vector x ∈ R^n we may write x = y + z uniquely for some y ∈ R(P) and z ∈ N(P). For an n-dimensional idempotent matrix P, we have y = Px if x ∈ R^n is given by x = y + z with y ∈ R(P) and z ∈ N(P). Therefore, y can be obtained by projecting x on R(P) along N(P). For this reason, we call the transformation given by an idempotent matrix a projection. An idempotent matrix itself is also often called a projection. If an idempotent matrix P is also symmetric, i.e., P′ = P, then we have R(P) ⊥ N(P), in which case the transformation given by P becomes an orthogonal projection. We also call a matrix P itself an orthogonal projection if it is idempotent and symmetric. In what follows, we say that P is an m-dimensional projection or orthogonal projection on R^n if R(P) is an m-dimensional subspace of R^n. The identity matrix is the only n-dimensional projection on R^n. Obviously, any projection P has no eigenvalues other than 0 and 1. For all x ∈ N(P), we have Px = 0, and therefore, N(P) is the eigenspace associated with eigenvalue 0. On the other hand, for all x ∈ R(P), we have Px = x, and therefore,
R(P) is the eigenspace associated with eigenvalue 1. Let dim R(P) = m. Then the geometric multiplicity of eigenvalue 1 is m, and 1 must be a root of the determinantal equation repeated at least m times. Likewise, the geometric multiplicity of eigenvalue 0 is n − m, and 0 must be a root of the determinantal equation repeated at least (n − m) times. However, since there cannot be more than n roots to the determinantal equation, the algebraic multiplicities of eigenvalues 1 and 0 are exactly m and n − m. It follows, in particular, that tr(P) = m, where m is the dimension of the projection P. We may easily see that if P is idempotent, so is I − P. This implies that if P is a projection, so is I − P. In fact, it is clear that I − P is the projection on N(P) along R(P). Note that R(I − P) = N(P) and N(I − P) = R(P) for any projection P. Therefore, if P is an m-dimensional projection on R^n, then I − P is an (n − m)-dimensional projection on R^n. It follows straightforwardly that tr(I − P) = n − m if P is m-dimensional. Clearly, we have P(I − P) = (I − P)P = 0 for any projection P. If, in particular, P is the orthogonal projection on an m-dimensional subspace M of R^n, then I − P is the (n − m)-dimensional orthogonal projection on the orthogonal complement M⊥ of M. Let A be an n × m matrix of full column rank, i.e., rank(A) = m and A has m linearly independent column vectors. Now we construct the orthogonal projection P on the range R(A) of A. We choose an arbitrary x ∈ R^n and orthogonally project it on R(A), which we denote by Px. We may set Px = Ab for some b ∈ R^m, since it is in R(A). To determine b ∈ R^m, we note that A′(x − Ab) = 0, which yields b = (A′A)^{-1} A′x. Consequently, we have Px = A(A′A)^{-1} A′x, from which it follows that P = A(A′A)^{-1} A′ since the choice of x ∈ R^n was arbitrary. As expected, P is idempotent and symmetric. If, in particular, the column vectors of A are orthonormal, then we have P = AA′, since A′A = I.
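The construction P = A(A′A)^{-1}A′ can be sketched numerically as follows; the matrix A is a hypothetical full-column-rank example.

```python
import numpy as np

# A hypothetical 4x2 matrix of full column rank.
A = np.array([[1., 0.],
              [1., 1.],
              [0., 1.],
              [1., 1.]])

# Orthogonal projection on R(A): P = A (A'A)^{-1} A'.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))        # idempotent: P^2 = P
print(np.allclose(P, P.T))          # symmetric: P' = P
print(np.isclose(np.trace(P), 2.0)) # tr(P) = dim R(A) = m = 2

# I - P is the orthogonal projection on R(A)-perp, and P(I - P) = 0.
Q = np.eye(4) - P
print(np.allclose(P @ Q, 0.0))      # True
```

In numerical practice one would solve the normal equations or use a QR factorization rather than form the explicit inverse, but the formula above mirrors the derivation in the text.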
4. Spectral Representation
Throughout this section, we assume that A is an n-dimensional symmetric matrix of real numbers. There are three important facts known for symmetric matrices, listed below. (a) All eigenvalues are real. (b) Eigenvectors associated with distinct eigenvalues are orthogonal. (c) Geometric multiplicities of all eigenvalues are identical to their algebraic multiplicities. One immediate consequence of these facts is that there are orthogonal eigenspaces M1, . . . , Mm associated with the real distinct eigenvalues λ1, . . . , λm of A, the sum of whose dimensions is exactly n. Notice that Mi ∩ Mj = {0} for all i ≠ j, since they are orthogonal. Of course, the number of distinct eigenvalues is generally smaller than n, since some roots of the determinantal equation are repeated. For any n-dimensional symmetric matrix A, we may therefore partition R^n into the eigenspaces M1, . . . , Mm of A, so that we may write any x ∈ R^n uniquely as x = x1 + · · · + xm with xi ∈ Mi for i = 1, . . . , m. Intuitively, it is clear that xi = Pi x, if we denote by Pi the orthogonal projection on Mi, for i = 1, . . . , m, since M1, . . . , Mm are orthogonal. It follows that we have x = P1 x + · · · + Pm x for all x ∈ R^n, i.e., P1 + · · · + Pm = I. Consequently, we may deduce that

    Ax = A( Σ_{i=1}^m Pi x ) = Σ_{i=1}^m A(Pi x) = Σ_{i=1}^m λi (Pi x) = ( Σ_{i=1}^m λi Pi ) x

for all x ∈ R^n, and therefore,

    A = Σ_{i=1}^m λi Pi,
which is called the spectral representation of A. Note that, if restricted on each of the Mi, the transformation given by A is extremely simple and reduces to a scalar multiplication by λi, i.e., A(Pi x) = λi (Pi x), i = 1, . . . , m.
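The spectral representation is easy to verify numerically; the sketch below decomposes a hypothetical symmetric matrix into Σ λi Pi over its distinct eigenvalues.

```python
import numpy as np

# A hypothetical symmetric matrix with distinct eigenvalues 1, 3, 5.
A = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 5.]])

lam, U = np.linalg.eigh(A)  # real eigenvalues, orthonormal eigenvectors

# Sum lambda_i * P_i over the distinct eigenvalues, where P_i is the
# orthogonal projection on the eigenspace M_i spanned by the
# orthonormal eigenvectors associated with lambda_i.
recon = np.zeros_like(A)
P_sum = np.zeros_like(A)
for l in np.unique(lam.round(10)):
    H = U[:, np.isclose(lam, l)]  # orthonormal basis of M_i
    P = H @ H.T                   # orthogonal projection on M_i
    recon += l * P
    P_sum += P

print(np.allclose(recon, A))          # A = sum_i lambda_i P_i
print(np.allclose(P_sum, np.eye(3)))  # P_1 + ... + P_m = I
```

The rounding before `np.unique` is a crude way to group numerically equal eigenvalues; it suffices for this well-separated example.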
For i = 1, . . . , m, let Hi be a matrix whose column vectors consist of orthonormal eigenvectors associated with eigenvalue λi. Then we may write Pi more explicitly as Pi = Hi Hi′. If we further let xi1, . . . , xiℓ be the column vectors of Hi, where ℓ is the dimension of Mi, we have Pi = Hi Hi′ = Σ_{j=1}^ℓ xij xij′. Therefore, we may write the spectral representation of A generally as

    A = Σ_{i=1}^n λi xi xi′
with eigenvalues λ1, . . . , λn and their corresponding orthonormal eigenvectors x1, . . . , xn of A, where we allow any eigenvalue λi to be repeated an arbitrary number of times. As a consequence, we may represent A as A = UΛU′, where U is an orthogonal matrix having xi in its i-th column and Λ is a diagonal matrix with λi in its i-th diagonal entry. Note that U is a nonsingular matrix such that U′U = I, and therefore, U′ = U^{-1}. We call such a matrix orthogonal. The matrix version A = UΛU′ of the spectral representation of A is extremely useful in many different contexts. Note that A^2 = (UΛU′)(UΛU′) = UΛ^2 U′, which can be easily extended to A^n = UΛ^n U′ for an arbitrary nonnegative integer n. More generally, the spectral representation of A allows us to define a wide class of functions f(A) with the matrix argument A by

    f(A) = U diag( f(λ1), . . . , f(λn) ) U′.

For instance, we may define √A or A^{1/2} as above with f(λ) = √λ, as long as A ≥ 0, i.e., λi ≥ 0 for all i = 1, . . . , n. Likewise, log A is defined as above with f(λ) = log λ, which of course requires A > 0, i.e., λi > 0 for all i = 1, . . . , n. It is also possible to define A^{-1} as above with f(λ) = 1/λ if A is invertible, so that none of λi, i = 1, . . . , n, is zero. Other functions of A, such as e^A, may also be defined similarly with f(λ) = e^λ. Note in particular that, for A = (aij), f(A) is in general not defined as f(A) = (f(aij)).
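Under the stated assumptions on the eigenvalues, f(A) = U diag(f(λ1), . . . , f(λn)) U′ is straightforward to implement; the sketch below uses a hypothetical positive definite matrix.

```python
import numpy as np

# A hypothetical positive definite symmetric matrix (eigenvalues 1 and 3).
A = np.array([[2., 1.],
              [1., 2.]])

lam, U = np.linalg.eigh(A)

def f_of_A(f):
    # f(A) = U diag(f(lambda_1), ..., f(lambda_n)) U'
    return U @ np.diag(f(lam)) @ U.T

sqrt_A = f_of_A(np.sqrt)            # needs A >= 0
log_A  = f_of_A(np.log)             # needs A > 0
inv_A  = f_of_A(lambda l: 1.0 / l)  # needs A invertible
exp_A  = f_of_A(np.exp)

print(np.allclose(sqrt_A @ sqrt_A, A))    # (A^{1/2})^2 = A
print(np.allclose(inv_A @ A, np.eye(2)))  # A^{-1} A = I
# f(A) is NOT the entrywise map (f(a_ij)):
print(np.allclose(sqrt_A, np.sqrt(A)))    # False
```

The last line illustrates the closing remark of the text: applying `np.sqrt` entrywise gives a different matrix from the spectral square root.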
Above we introduced the matrix inequalities A ≥ 0 and A > 0 for a symmetric matrix A, in which case we say that A is positive semi-definite and positive definite, respectively. It follows from

    x′Ax = x′( Σ_{i=1}^m λi Pi ) x = Σ_{i=1}^m λi ( x′Pi x )

that A ≥ 0 if and only if x′Ax ≥ 0 for all x ∈ R^n, and that A > 0 if and only if x′Ax > 0 for all x ≠ 0 in R^n. Note that for any orthogonal projection P we have x′Px = (Px)′(Px) = ‖Px‖^2, and ‖Px‖^2 ≥ 0 for all x ∈ R^n. Clearly, we have P ≥ 0 for any orthogonal projection P. For symmetric matrices A and B of the same dimension, we write A ≥ B and A > B if and only if A − B ≥ 0 and A − B > 0, respectively.
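The characterization of A ≥ 0 through the quadratic form x′Ax, and the fact that orthogonal projections are positive semi-definite, can be sketched as follows; the vector a below is a hypothetical choice.

```python
import numpy as np

# Orthogonal projection on the span of a hypothetical vector a in R^3.
a = np.array([[1.0], [2.0], [2.0]])
P = (a @ a.T) / (a.T @ a)  # P = a (a'a)^{-1} a'

# x'Px = ||Px||^2 >= 0 for every x, so P >= 0.
rng = np.random.default_rng(0)
ok = all(x @ P @ x >= -1e-12 for x in rng.standard_normal((100, 3)))
print(ok)  # True

# Equivalently, every eigenvalue of P is nonnegative (here 0, 0, 1).
print(np.all(np.linalg.eigvalsh(P) >= -1e-12))  # True
```

The small negative tolerance guards against floating-point round-off in the quadratic form.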
5. Exercises
1. Let A and B be matrices of dimensions n × m and m × ℓ, respectively, and define ai to be the i-th column of A and bi to be the i-th row of B. Show that

    AB = Σ_{i=1}^m ai bi.

Apply this to the case of B = b being a vector with ℓ = 1, and show that R(A) becomes the space spanned by the column vectors a1, . . . , am of A.

2. Show that rank(AB) = rank(B) if and only if N(A) ∩ R(B) = {0} for any matrices A and B of conformable dimensions, and use this result to deduce that rank(A′A) = rank(A) for any matrix A.

3. Let A and B be n × m matrices of full column rank such that R(A) ∩ R(B)⊥ = {0}. Show that the projection on R(A) along R(B)⊥ in R^n is given by P = A(B′A)^{-1}B′. Hint: Choose an arbitrary x ∈ R^n and write Px = Ab for some b ∈ R^m, and obtain b from the condition x − Ab ∈ R(B)⊥ = N(B′).
4. Define x = (2, 1)′ and y = (1, 1)′.
(a) Find the orthogonal projection on the span of x.

(b) Find the projection on the span of x along the span of y.

5. For a matrix A defined as
    A = ( 3/2  1/2 )
        ( 1/2  3/2 ),

find A^10, √A, log A, A^{-1} and e^A.
6. On matrix inequality, answer the following:

(a) Show that A ≥ 0 implies B′AB ≥ 0 for any matrix B of conformable dimension, and use this result to deduce that A ≥ B implies C′AC ≥ C′BC for any matrix C of conformable dimension.

(b) Show that A ≥ I implies A^{-1} ≤ I, and use this result to deduce that A ≥ B > 0 implies 0 < A^{-1} ≤ B^{-1}. Hint: Note that, if A has the spectral representation A = Σ_{i=1}^m λi Pi, then we have A − I = Σ_{i=1}^m (λi − 1) Pi, since Σ_{i=1}^m Pi = I.