Lec 6 Ginverse

San José State University
Math 253: Mathematical Methods for Data Visualization
Lecture 6: Generalized inverse and pseudoinverse
Dr. Guangliang Chen

Outline
• Matrix generalized inverse
• Pseudoinverse
• Application to solving linear systems of equations

Generalized inverse and pseudoinverse
Recall
... that a square matrix $A \in \mathbb{R}^{n \times n}$ is invertible if there exists a square matrix $B$ of the same size such that
$$AB = BA = I.$$
In this case, $B$ is called the matrix inverse of $A$ and denoted as $B = A^{-1}$.
We already know two equivalent ways of characterizing a square, invertible matrix $A$:
• $A$ has full rank, i.e., $\mathrm{rank}(A) = n$
• $A$ has nonzero determinant: $\det(A) \neq 0$
Dr. Guangliang Chen | Mathematics & Statistics, San José State University
Remark. For any invertible matrix $A \in \mathbb{R}^{n \times n}$ and any vector $b \in \mathbb{R}^n$, the linear system $Ax = b$ has a unique solution $x^* = A^{-1} b$.
MATLAB command for solving a linear system Ax = b
A\b % recommended
inv(A) * b % avoid (especially when A is large)
What about general matrices?
Let $A \in \mathbb{R}^{m \times n}$. We would like to address the following questions:
• Is there some kind of inverse?
• Given a vector $b \in \mathbb{R}^m$, how can we solve the linear system $Ax = b$?
More motivation
In many practical tasks such as multiple linear regression, the least squares problem arises naturally:
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|^2 \quad (\text{where } A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m \text{ are fixed})$$
If $A$ has full column rank (i.e., $\mathrm{rank}(A) = n \le m$), then the above problem has a unique solution
$$x^* = (A^T A)^{-1} A^T b.$$
We want to better understand the matrices:
• $(A^T A)^{-1} A^T$ (pseudoinverse): the optimal solution is $x^* = (A^T A)^{-1} A^T b$;
• $A (A^T A)^{-1} A^T$ (projection matrix): the closest approximation of $b$ is $Ax^* = A (A^T A)^{-1} A^T b$.
Generalized inverse
Def 0.1. Let $A \in \mathbb{R}^{m \times n}$ be any matrix. We call the matrix $G \in \mathbb{R}^{n \times m}$ a generalized inverse of $A$ if it satisfies
$$AGA = A.$$
Remark. If $A$ is square and invertible, then it has one and only one generalized inverse, which must coincide with the ordinary inverse $A^{-1}$. To see this, first observe that $A^{-1}$ clearly satisfies the definition (since $A A^{-1} A = A$) and thus is a generalized inverse. Conversely, if $G$ is any generalized inverse of $A$, then from the equation $AGA = A$ we get
$$G = A^{-1} (AGA) A^{-1} = A^{-1} (A) A^{-1} = A^{-1}.$$
This justifies the term “generalized inverse”.
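Remark. For a general matrix $A \in \mathbb{R}^{m \times n}$, a generalized inverse always exists but might not be unique.
For example, let $A = [1, 2] \in \mathbb{R}^{1 \times 2}$. Its generalized inverse is a matrix $G = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{2 \times 1}$ satisfying
$$[1, 2] = A = AGA = [1, 2] \begin{bmatrix} x \\ y \end{bmatrix} [1, 2] = (x + 2y) \cdot [1, 2].$$
This shows that any $G = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{2 \times 1}$ with $x + 2y = 1$ is a generalized inverse of $A$, e.g., $G = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ or $G = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$.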
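The condition $x + 2y = 1$ can be checked numerically. The following is a minimal sketch in Python (standard library only); the helper names `matmul` and `is_generalized_inverse` are illustrative and not part of the lecture, and matrices are stored as lists of rows.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def is_generalized_inverse(A, G):
    """Return True when A G A == A, the defining condition of Def 0.1."""
    return matmul(matmul(A, G), A) == A


A = [[1, 2]]

# Three candidates with x + 2y = 1, and one with x + 2y = 3.
candidates = {
    "(1, 0)": [[1], [0]],
    "(3, -1)": [[3], [-1]],
    "(1/5, 2/5)": [[Fraction(1, 5)], [Fraction(2, 5)]],
    "(1, 1)": [[1], [1]],
}
results = {name: is_generalized_inverse(A, G) for name, G in candidates.items()}
print(results)
```

Only the last candidate fails, which is consistent with the non-uniqueness noted in the remark.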
The following theorem indicates a way to find a generalized inverse of any matrix.
Theorem 0.1. Let $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in \mathbb{R}^{m \times n}$ be a matrix of rank $r$, with $A_{11} \in \mathbb{R}^{r \times r}$. If $A_{11}$ is invertible, then $G = \begin{bmatrix} A_{11}^{-1} & O \\ O & O \end{bmatrix} \in \mathbb{R}^{n \times m}$ is a generalized inverse of $A$.
Remark. Any matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ can be rearranged through row and column permutations to have the above partitioned form with an invertible $r \times r$ submatrix in the top-left corner. This theorem essentially establishes the existence of a generalized inverse for any matrix.
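Remark. We skip the proof but illustrate the theorem with an example:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
Since $\mathrm{rank}(A) = 2$ and the top-left $2 \times 2$ block is invertible, we can easily find a generalized inverse
$$G = \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
To verify:
$$AGA = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = A$$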
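The construction in Theorem 0.1 can be sketched in a few lines of Python with exact rational arithmetic. This is an illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, the caller supplies the rank $r$, and the leading $r \times r$ block is assumed to be invertible already (no row or column permutations are attempted).

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse of a square matrix using exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        # Find a nonzero pivot; raises StopIteration if M is singular.
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def block_generalized_inverse(A, r):
    """Place the inverse of the top-left r x r block of A into an n x m zero matrix."""
    m, n = len(A), len(A[0])
    A11_inv = inverse([row[:r] for row in A[:r]])
    G = [[Fraction(0)] * m for _ in range(n)]
    for i in range(r):
        for j in range(r):
            G[i][j] = A11_inv[i][j]
    return G


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
G = block_generalized_inverse(A, 2)
print(G)
print(matmul(matmul(A, G), A) == A)
```

For the matrix above with $r = 2$, the result agrees with the $G$ shown in the example.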
The generalized inverse can also be used to find a solution to a consistent linear system (i.e., one that has at least one solution).
Theorem 0.2. Consider the linear system $Ax = b$. Suppose $b \in \mathrm{Col}(A)$, so that the system is consistent. Let $G$ be a generalized inverse of $A$, i.e., $AGA = A$. Then $x^* = Gb$ is a particular solution to the system.
Proof. Let $x$ be any solution of the system. Multiplying both sides of $Ax = b$ by $AG$ gives
$$(AG)b = (AG)Ax = (AGA)x = Ax = b.$$
That is, $A(Gb) = b$, which shows that $x^* = Gb$ is a particular solution to the linear system.
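Example 0.1. Consider the linear system $Ax = b$, where
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad b = \begin{bmatrix} 6 \\ 15 \\ 24 \end{bmatrix}.$$
According to the theorem (with the generalized inverse $G$ of $A$ found earlier), a particular solution to the system is
$$x^* = Gb = \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 6 \\ 15 \\ 24 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix}$$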
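The role of the consistency assumption in Theorem 0.2 can be seen with a small check. The sketch below (Python, standard library only; `matvec` is an illustrative helper) applies the same $G$ to the consistent right-hand side from the example and to a hypothetical inconsistent one, `b_bad`, chosen here only for illustration.

```python
from fractions import Fraction as Fr


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
G = [[Fr(-5, 3), Fr(2, 3), 0], [Fr(4, 3), Fr(-1, 3), 0], [0, 0, 0]]

# Consistent system: b lies in Col(A).
b = [6, 15, 24]
x_star = matvec(G, b)
solves = matvec(A, x_star) == b
print(x_star, solves)

# Inconsistent system (assumed for illustration): Gb is still defined,
# but it does not solve Ax = b because no solution exists.
b_bad = [6, 15, 25]
x_bad = matvec(G, b_bad)
solves_bad = matvec(A, x_bad) == b_bad
print(x_bad, solves_bad)
```

The second case shows why the theorem requires $b \in \mathrm{Col}(A)$: the formula $Gb$ alone does not detect an inconsistent system.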
Projection matrices
Def 0.2. A square matrix $P$ is called a projection matrix if $P = P^2$.
Example 0.2. The following are some projection matrices (but not all):
$$I, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad O$$
Remark. Projection matrices must have a determinant of 0 or 1, because
$$\det(P) = \det(P^2) = [\det(P)]^2.$$
Remark. The following statements explain what a projection matrix does:
• A projection matrix $P \in \mathbb{R}^{n \times n}$ projects any vector in $\mathbb{R}^n$ onto its range (column space). To see this, let $x \in \mathbb{R}^n$. Then
$$Px = [p_1 \ \dots \ p_n] \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i=1}^{n} x_i p_i \in \mathrm{Col}(P) \equiv \mathrm{Range}(P)$$
• A projection matrix keeps all points from its range (when applied to them) in their original places. To see this, let $v \in \mathrm{Range}(P)$. Then there exists some $x \in \mathbb{R}^n$ such that $v = Px$. It follows that
$$Pv = P(Px) = P^2 x = Px = v.$$
Theorem 0.3. Let $A \in \mathbb{R}^{m \times n}$ with a generalized inverse $G \in \mathbb{R}^{n \times m}$. Then $AG \in \mathbb{R}^{m \times m}$ is a projection matrix.
Proof. From $AGA = A$, we obtain
$$(AG)(AG) = (AGA)G = AG.$$
This shows that $AG$ is a projection matrix.
Remark. Similarly, we can show that $GA \in \mathbb{R}^{n \times n}$ is also a projection matrix:
$$(GA)(GA) = G(AGA) = GA.$$
We’ll focus on $AG$ below.
Remark. $AG$ and $A$ must have the same column space. To see this,
(1) For any $y \in \mathrm{Col}(AG)$, there exists some $x \in \mathbb{R}^m$ such that $y = (AG)x$. It follows that $y = A(Gx) \in \mathrm{Col}(A)$. This shows that $\mathrm{Col}(AG) \subseteq \mathrm{Col}(A)$.
(2) For any $y \in \mathrm{Col}(A)$, there exists some $x \in \mathbb{R}^n$ such that $y = Ax$. Write $y = (AGA)x = (AG)(Ax)$. This shows that $y \in \mathrm{Col}(AG)$. Thus, $\mathrm{Col}(A) \subseteq \mathrm{Col}(AG)$.
Therefore, $AG$ is a projection matrix onto the column space of $A$.
Similarly, we can show that $GA$ is a projection matrix with the same row space as $A$. When $GA$ is symmetric (as it is for the pseudoinverse), $GA$ projects onto the row space of $A$.
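Example 0.3. Consider the matrix $A$ and its generalized inverse $G$:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad G = \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
We have
$$AG = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 0 \end{bmatrix}$$
According to the preceding remark,
• $AG$ and $A$ have the same column space.
• $AG$ is a projection matrix onto the column space of $A$.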
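Both properties can be checked directly for this example. The following is a minimal sketch (Python, standard library only; `matmul` is an illustrative helper): idempotence is tested as $P^2 = P$, and the fact that $P = AG$ leaves every column of $A$ in place is tested as $PA = A$.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
G = [[Fr(-5, 3), Fr(2, 3), 0], [Fr(4, 3), Fr(-1, 3), 0], [0, 0, 0]]

P = matmul(A, G)  # candidate projection onto Col(A)
Q = matmul(G, A)  # the companion projection GA

p_is_idempotent = matmul(P, P) == P
q_is_idempotent = matmul(Q, Q) == Q
p_fixes_columns_of_a = matmul(P, A) == A  # P a_j = a_j for every column a_j of A

print(P)
print(p_is_idempotent, q_is_idempotent, p_fixes_columns_of_a)
```

Since $P$ fixes every column of $A$ and its own columns are combinations of the columns of $A$, its range is exactly $\mathrm{Col}(A)$.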
Pseudoinverse
Briefly speaking, the matrix pseudoinverse is a generalized inverse with more
constraints.
Def 0.3. Let $A \in \mathbb{R}^{m \times n}$. We call the matrix $B \in \mathbb{R}^{n \times m}$ the pseudoinverse of $A$ if it satisfies all four conditions below:
(1) $ABA = A$ ← $B$ is a generalized inverse of $A$
(2) $BAB = B$ ← $A$ is a generalized inverse of $B$
(3) $(AB)^T = AB$ ← $AB$ is symmetric
(4) $(BA)^T = BA$ ← $BA$ is symmetric
Remark.
• If $B$ satisfies Condition (1), it is known as a generalized inverse of $A$; if $B$ satisfies Conditions (1) and (2), it is called a reflexive generalized inverse. Only when $B$ satisfies all four conditions is it called the pseudoinverse of $A$.
• It can be shown that for any matrix $A \in \mathbb{R}^{m \times n}$, the pseudoinverse always exists and is unique. We denote the pseudoinverse of $A$ as $A^\dagger$.
• A pseudoinverse is sometimes called the Moore–Penrose inverse, after the pioneering works by E. H. Moore and Roger Penrose.
• The symmetric form of the definition implies that if $B = A^\dagger$, then $A = B^\dagger$, and thus $A = (A^\dagger)^\dagger$.
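Example 0.4. Consider $A = [1, 2] \in \mathbb{R}^{1 \times 2}$ again. We showed that any matrix $G = (x, y)^T \in \mathbb{R}^{2 \times 1}$ with $x + 2y = 1$ is a generalized inverse of $A$:
$$[1, 2] = A = AGA = [1, 2] \begin{bmatrix} x \\ y \end{bmatrix} [1, 2] = (x + 2y) \cdot [1, 2].$$
To find its pseudoinverse, we need to write down three more equations:
$$\begin{bmatrix} x \\ y \end{bmatrix} = G = GAG = \begin{bmatrix} x \\ y \end{bmatrix} [1, 2] \begin{bmatrix} x \\ y \end{bmatrix} = (x + 2y) \cdot \begin{bmatrix} x \\ y \end{bmatrix}$$
$$x + 2y = (AG)^T = AG = [1, 2] \begin{bmatrix} x \\ y \end{bmatrix} = x + 2y$$
$$\begin{bmatrix} x & y \\ 2x & 2y \end{bmatrix} = (GA)^T = GA = \begin{bmatrix} x \\ y \end{bmatrix} [1, 2] = \begin{bmatrix} x & 2x \\ y & 2y \end{bmatrix} \longrightarrow 2x = y$$
Solving the two equations $x + 2y = 1$ and $2x = y$ together gives $x = \frac{1}{5}$, $y = \frac{2}{5}$. Thus, the pseudoinverse of $A$ is
$$A^\dagger = \begin{bmatrix} \frac{1}{5} & \frac{2}{5} \end{bmatrix}^T.$$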
Example 0.5. Let
$$A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.$$
Verify that
$$A^\dagger = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix}$$
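One way to carry out the verification is to compute the two products once and then check the four conditions of Def 0.3. In the worked sketch below, $B$ is shorthand (introduced here, not in the lecture) for the claimed pseudoinverse.

```latex
\[
AB = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}
     \begin{bmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{bmatrix}
   = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{bmatrix},
\qquad
BA = \begin{bmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{bmatrix}
     \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}
   = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\]
\[
ABA = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{bmatrix}
      \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}
    = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} = A,
\qquad
BAB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
      \begin{bmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{bmatrix}
    = \begin{bmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{bmatrix} = B
\]
```

Both $AB$ and $BA$ are symmetric, so all four conditions hold and $B = A^\dagger$.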
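Example 0.6 (Cont’d). Consider again the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
which has the following generalized inverse
$$G = \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
That is, $AGA = A$. It can be verified that $A$ is also a generalized inverse of $G$:
$$GAG = G$$
Thus, $G$ must be at least a reflexive generalized inverse of $A$.
However, neither $AG$ nor $GA$ is symmetric:
$$AG = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 0 \end{bmatrix}$$
$$GA = \begin{bmatrix} -\frac{5}{3} & \frac{2}{3} & 0 \\ \frac{4}{3} & -\frac{1}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}$$
Therefore, $G$ is not the pseudoinverse of $A$.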
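The four conditions of Def 0.3 lend themselves to a mechanical check. The sketch below (Python, standard library only) uses a hypothetical helper `penrose_conditions` that reports conditions (1) to (4) as a tuple of booleans, and applies it to the row vector $[1, 2]$ with two of its generalized inverses and to the $3 \times 3$ example above.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def penrose_conditions(A, B):
    """Return (ABA == A, BAB == B, AB symmetric, BA symmetric)."""
    AB, BA = matmul(A, B), matmul(B, A)
    return (
        matmul(AB, A) == A,
        matmul(BA, B) == B,
        transpose(AB) == AB,
        transpose(BA) == BA,
    )


row = [[1, 2]]
row_pinv = [[Fr(1, 5)], [Fr(2, 5)]]  # pseudoinverse of [1, 2]
row_ginv = [[1], [0]]                # another generalized inverse of [1, 2]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
G = [[Fr(-5, 3), Fr(2, 3), 0], [Fr(4, 3), Fr(-1, 3), 0], [0, 0, 0]]

check_pinv = penrose_conditions(row, row_pinv)
check_ginv = penrose_conditions(row, row_ginv)
check_block = penrose_conditions(A, G)
print(check_pinv, check_ginv, check_block)
```

Only the first pair passes all four tests. The other two are reflexive generalized inverses that fail at least one symmetry condition.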
Orthogonal projection matrices

Since the matrix pseudoinverse is still a generalized inverse, it automatically inherits the properties of the matrix generalized inverse. Nevertheless, in many cases, stronger results can be obtained for a matrix pseudoinverse.
Def 0.4. A square matrix $P$ is called an orthogonal projection matrix if $P = P^T$ and $P = P^2$.
Example 0.7. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$ are both orthogonal projection matrices, but $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 0 \end{bmatrix}$ is not (it is just a projection matrix).
Theorem 0.4. For any matrix $A \in \mathbb{R}^{m \times n}$, $AA^\dagger$ is an orthogonal projection matrix (onto the column space of $A$).
Proof. First, $A^\dagger$ is still a generalized inverse. Thus, $AA^\dagger$ is a projection matrix (onto the column space of $A$).
Secondly, since $A^\dagger$ is the pseudoinverse of $A$, $AA^\dagger$ must be symmetric.
Therefore, by definition, $AA^\dagger$ is an orthogonal projection matrix.
Remark. Similarly, $A^\dagger A$ is also an orthogonal projection matrix (onto the row space of $A$).
Remark. For any projection matrix $P \in \mathbb{R}^{n \times n}$ and vector $x \in \mathbb{R}^n$, we have
$$x = Px + (I - P)x$$
If $P$ is an orthogonal projection (i.e., $P = P^T$), then the two components are orthogonal to each other:
$$(Px)^T (I - P)x = x^T P^T (I - P)x = x^T P(I - P)x = x^T (P - P^2)x = 0.$$
This implies that orthogonal projections produce orthogonal decompositions of vectors.
[Figure: a vector $x$, its projection $Px$ in $\mathrm{Range}(P)$, and the remaining component $(I - P)x$.]
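The difference between an orthogonal and a non-orthogonal projection shows up in the inner product of the two components. The sketch below (Python, standard library only) uses two projection matrices that appear in this lecture; the helper names and the test vectors are chosen here for illustration.

```python
from fractions import Fraction as Fr


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def split(P, x):
    """Return the pair (Px, (I - P)x)."""
    Px = matvec(P, x)
    rest = [xi - pi for xi, pi in zip(x, Px)]
    return Px, rest


P_orth = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 2), Fr(1, 2)]]  # symmetric and idempotent
P_oblique = [[1, 0, 0], [0, 1, 0], [-1, 2, 0]]         # idempotent but not symmetric

u, v = split(P_orth, [3, 1])
s, t = split(P_oblique, [1, 0, 0])

print(u, v, dot(u, v))
print(s, t, dot(s, t))
```

For the symmetric matrix the inner product vanishes, while for the non-symmetric one it does not, so only the former yields an orthogonal decomposition.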
Finding matrix pseudoinverse

Let $A \in \mathbb{R}^{m \times n}$. Our goal is to find $A^\dagger$ (which exists and is unique).
We first consider the following two special settings:
• $A$ is a tall matrix with full column rank (i.e., $\mathrm{rank}(A) = n \le m$). Note that in this case, $A^T A \in \mathbb{R}^{n \times n}$ is invertible.
• $A$ is a “diagonal” matrix (i.e., $a_{ij} = 0$ whenever $i \neq j$).
Afterwards, we present a theorem to show how to find the pseudoinverse of a general matrix via its SVD.
Theorem 0.5. Let $A \in \mathbb{R}^{m \times n}$ be any tall matrix with full column rank (i.e., $\mathrm{rank}(A) = n \le m$). Then the pseudoinverse of $A$ is
$$A^\dagger = (A^T A)^{-1} A^T.$$
Proof. It suffices to verify the four conditions for being a pseudoinverse:
$$A A^\dagger A = A \cdot (A^T A)^{-1} A^T \cdot A = A$$
$$A^\dagger A A^\dagger = (A^T A)^{-1} A^T \cdot A \cdot (A^T A)^{-1} A^T = A^\dagger$$
$$A A^\dagger = A (A^T A)^{-1} A^T \quad \text{(symmetric)}$$
$$A^\dagger A = (A^T A)^{-1} A^T \cdot A = I_n \quad \text{(symmetric)}$$
Therefore, $A^\dagger = (A^T A)^{-1} A^T$ is the pseudoinverse of $A$.
Remark. The theorem implies that for any tall matrix $A \in \mathbb{R}^{m \times n}$ with full column rank (i.e., $\mathrm{rank}(A) = n \le m$), the following is an orthogonal projection matrix (onto the column space of $A$):
$$A A^\dagger = A (A^T A)^{-1} A^T.$$
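Remark. Let $U \in \mathbb{R}^{m \times n}$ be a tall matrix with orthonormal columns (e.g., an orthonormal basis matrix). Then it has full column rank, and
$$U^T U = \begin{bmatrix} u_1^T \\ \vdots \\ u_n^T \end{bmatrix} [u_1 \ \dots \ u_n] = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} = I_n$$
It follows that
$$U^\dagger = U^T \ \text{(pseudoinverse)}, \quad \text{and} \quad U U^\dagger = U U^T \ \text{(orthogonal projection matrix)}$$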
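Example 0.8. Find the pseudoinverse of
$$X = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Solution: Observe that this matrix has full column rank (i.e., 2). We first compute
$$X^T X = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$
It follows that
$$X^\dagger = (X^T X)^{-1} X^T = \frac{1}{3} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 1 & 1 & 2 \\ -1 & 2 & 1 \end{bmatrix}$$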
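The same computation can be reproduced with exact fractions. This is a minimal sketch restricted, for brevity, to matrices with exactly two columns, so that the inverse of $X^T X$ has a closed form; `inverse_2x2` and `pinv_two_columns` are illustrative names, not part of the lecture.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fr(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular, so X lacks full column rank")
    return [[d / det, -b / det], [-c / det, a / det]]


def pinv_two_columns(X):
    """Compute (X^T X)^{-1} X^T for an m x 2 matrix X with full column rank."""
    Xt = transpose(X)
    return matmul(inverse_2x2(matmul(Xt, X)), Xt)


X = [[1, -1], [0, 1], [1, 0]]
X_pinv = pinv_two_columns(X)
left_identity = matmul(X_pinv, X)  # should equal the 2 x 2 identity
print(X_pinv)
print(left_identity)
```

As a sanity check, the product of the result with $X$ is the identity $I_2$, matching the fourth line of the proof of Theorem 0.5.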
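Theorem 0.6. Let $A \in \mathbb{R}^{m \times n}$ be a diagonal matrix, i.e., all of its entries are zero except some of those along its diagonal. Then the pseudoinverse of $A$ is another diagonal matrix $B \in \mathbb{R}^{n \times m}$ such that
$$b_{ii} = \begin{cases} \frac{1}{a_{ii}}, & \text{if } a_{ii} \neq 0 \\ 0, & \text{if } a_{ii} = 0 \end{cases}$$
Proof. We verify this result using an example. Let
$$A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{3} \\ 0 & 0 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad BA = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
both of which are symmetric. Furthermore,
$$ABA = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} = A$$
$$BAB = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{3} \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{3} \\ 0 & 0 \end{bmatrix} = B.$$
Thus, $B$ is the pseudoinverse of $A$.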
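The rule in Theorem 0.6 amounts to transposing the shape and inverting the nonzero diagonal entries. A minimal sketch in Python follows (standard library only; `pinv_diagonal` is an illustrative name, and the input is assumed to be zero off the diagonal).

```python
from fractions import Fraction as Fr


def pinv_diagonal(A):
    """Pseudoinverse of an m x n "diagonal" matrix, returned as an n x m matrix."""
    m, n = len(A), len(A[0])
    B = [[Fr(0)] * m for _ in range(n)]
    for i in range(min(m, n)):
        if A[i][i] != 0:
            B[i][i] = 1 / Fr(A[i][i])  # reciprocal of a nonzero diagonal entry
    return B


A = [[0, 0, 0], [0, 3, 0]]
B = pinv_diagonal(A)
print(B)
```

Zero diagonal entries stay zero, which is what distinguishes this from an ordinary inverse.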
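Theorem 0.7. Let $A \in \mathbb{R}^{m \times n}$ be any matrix. Suppose its full SVD is $A = U \Sigma V^T$. Then the pseudoinverse of $A$ is
$$A^\dagger = V \Sigma^\dagger U^T$$
where $\Sigma^\dagger$ is the pseudoinverse of the diagonal matrix $\Sigma$ (given by Theorem 0.6).
Proof. We verify the four conditions directly, using $U^T U = I$, $V^T V = I$, and the fact that $\Sigma \Sigma^\dagger$ and $\Sigma^\dagger \Sigma$ are diagonal (hence symmetric):
$$A A^\dagger A = U \Sigma V^T \cdot V \Sigma^\dagger U^T \cdot U \Sigma V^T = U \Sigma \Sigma^\dagger \Sigma V^T = U \Sigma V^T = A$$
$$A^\dagger A A^\dagger = V \Sigma^\dagger U^T \cdot U \Sigma V^T \cdot V \Sigma^\dagger U^T = V \Sigma^\dagger \Sigma \Sigma^\dagger U^T = V \Sigma^\dagger U^T = A^\dagger$$
$$A A^\dagger = U \Sigma V^T \cdot V \Sigma^\dagger U^T = U \Sigma \Sigma^\dagger U^T \quad \text{(symmetric)}$$
$$A^\dagger A = V \Sigma^\dagger U^T \cdot U \Sigma V^T = V \Sigma^\dagger \Sigma V^T \quad \text{(symmetric)}$$
This completes the proof.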
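Example 0.9. Consider again the matrix
$$X = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
We have previously found its SVD:
$$X = \begin{bmatrix} \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{bmatrix} \cdot \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}^T$$
By the last theorem,
$$X^\dagger = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \cdot \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{bmatrix}^T = \frac{1}{3} \begin{bmatrix} 1 & 1 & 2 \\ -1 & 2 & 1 \end{bmatrix}$$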
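The SVD route can be replayed in floating point to confirm that it reproduces both $X$ and the pseudoinverse obtained from the normal-equations formula. The sketch below (Python, standard library only) hard-codes the factors from this example rather than computing an SVD; the helper `close` and the tolerance `1e-12` are assumptions made for the comparison.

```python
import math


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def close(M, N, tol=1e-12):
    """Entrywise comparison of two equally sized matrices up to a tolerance."""
    return all(abs(a - b) <= tol for rm, rn in zip(M, N) for a, b in zip(rm, rn))


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# SVD factors of X as given in the example.
U = [[2 / s6, 0, 1 / s3], [-1 / s6, 1 / s2, 1 / s3], [1 / s6, 1 / s2, -1 / s3]]
S = [[s3, 0], [0, 1], [0, 0]]
V = [[1 / s2, 1 / s2], [-1 / s2, 1 / s2]]

# Pseudoinverse of the "diagonal" factor: transpose the shape, invert nonzero entries.
S_pinv = [[1 / s3, 0, 0], [0, 1, 0]]

X_rebuilt = matmul(matmul(U, S), transpose(V))
X_pinv = matmul(matmul(V, S_pinv), transpose(U))

expected = [[1 / 3, 1 / 3, 2 / 3], [-1 / 3, 2 / 3, 1 / 3]]
print(close(X_rebuilt, [[1, -1], [0, 1], [1, 0]]), close(X_pinv, expected))
```

In practice the factors would come from a numerical SVD routine, and small singular values would be thresholded before inversion, as the MATLAB `pinv` documentation describes.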
MATLAB function for computing pseudoinverse

pinv Pseudoinverse.
X = pinv(A) produces a matrix X of the same dimensions as A' so that A*X*A = A, X*A*X = X and A*X and X*A are Hermitian. The computation is based on SVD(A) and any singular values less than a tolerance are treated as zero.
pinv(A, TOL) treats all singular values of A that are less than TOL as zero. By default, TOL = max(size(A)) * eps(norm(A)).
Applications of matrix pseudoinverse
• Linear least squares
• Minimum norm solution to a consistent linear system
Linear least squares
Consider a system of linear equations $Ax = b$ where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. In general, a vector $x$ that solves the system may not exist, or if one does exist, it may not be unique.
In either case, we seek a least squares solution instead by solving the following least squares problem
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|$$
This problem always has a solution, as the next theorem shows.
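Theorem 0.8. A minimizer of the above least squares problem is
$$x^* = A^\dagger b.$$
Proof. Since $Ax \in \mathrm{Col}(A)$, the optimal $x$ should be such that $Ax$ is the orthogonal projection of $b$ onto $\mathrm{Col}(A)$. By Theorem 0.4, this means
$$Ax = (A A^\dagger) b$$
Obviously, $x^* = A^\dagger b$ solves this equation and thus is a solution of the least squares problem (but it might not be the only solution).
[Figure: the vector $b$, its projection $Ax$ onto $\mathrm{Col}(A)$, and the columns $a_1, a_2, \dots, a_n$ of $A$ spanning $\mathrm{Col}(A)$.]
Remark. If $A$ has full column rank (i.e., $\mathrm{rank}(A) = n \le m$), then the least squares solution exists and is unique:
$$x^* = A^\dagger b = (A^T A)^{-1} A^T b.$$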
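A small numerical check illustrates the theorem. The sketch below (Python, standard library only) reuses the matrix $X$ of Example 0.8 as $A$ together with its pseudoinverse, picks a right-hand side `b` outside the column space (an assumption made for illustration), and confirms that the residual at $x^* = A^\dagger b$ is orthogonal to the columns of $A$ and that no point on a small grid around $x^*$ does better.

```python
from fractions import Fraction as Fr


def matvec(M, v):
    """Multiply a matrix (sequence of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


A = [[1, -1], [0, 1], [1, 0]]
A_pinv = [[Fr(1, 3), Fr(1, 3), Fr(2, 3)], [Fr(-1, 3), Fr(2, 3), Fr(1, 3)]]
b = [1, 0, 0]  # not in Col(A), so Ax = b has no exact solution


def squared_residual(x):
    """Return ||Ax - b||^2 exactly."""
    return sum((ai - bi) ** 2 for ai, bi in zip(matvec(A, x), b))


x_star = matvec(A_pinv, b)
best = squared_residual(x_star)

# Normal equations: A^T (A x* - b) should be the zero vector.
residual = [ai - bi for ai, bi in zip(matvec(A, x_star), b)]
normal_eq = matvec(list(zip(*A)), residual)

# Compare against a grid of nearby points (step 1/4 in each coordinate).
grid = [[x_star[0] + Fr(i, 4), x_star[1] + Fr(j, 4)]
        for i in range(-2, 3) for j in range(-2, 3)]
no_better = all(squared_residual(x) >= best for x in grid)

print(x_star, best, normal_eq, no_better)
```

The grid comparison is only a spot check. The orthogonality of the residual to the columns of $A$ is the condition that characterizes the minimizer.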
Minimum-norm solution to a consistent linear system
For linear systems $Ax = b$ with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution with minimum Euclidean norm among all solutions.
Theorem 0.9. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. If the linear system $Ax = b$ has solutions, then $x^* = A^\dagger b$ is an exact solution and has the smallest possible norm, i.e., $\|x^*\| \le \|x\|$ for all solutions $x$.
Proof. First, since $A^\dagger$ is a generalized inverse of $A$, $x^* = A^\dagger b$ must be a solution to $Ax = b$ (by Theorem 0.2).
To show that it has the smallest possible norm, for any solution $x \in \mathbb{R}^n$, consider its orthogonal decomposition via $A^\dagger A \in \mathbb{R}^{n \times n}$:
$$x = (A^\dagger A)x + (I - A^\dagger A)x = A^\dagger b + (I - A^\dagger A)x$$
Since $A^\dagger A$ is an orthogonal projection matrix, the two components are orthogonal, and it follows that
$$\|x\|^2 = \|A^\dagger b\|^2 + \|(I - A^\dagger A)x\|^2 \ge \|A^\dagger b\|^2$$
This shows that $\|x\| \ge \|A^\dagger b\|$.
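For a concrete instance, take the row vector $A = [1, 2]$ from Example 0.4 with the scalar right-hand side $b = 5$ (a value chosen here for illustration). Every solution of $x_1 + 2x_2 = 5$ has the form $x^* + t\,(2, -1)^T$, and the sketch below (Python, standard library only) compares the squared norms along this line.

```python
from fractions import Fraction as Fr

A = [1, 2]                    # the 1 x 2 matrix, stored as its single row
A_pinv = [Fr(1, 5), Fr(2, 5)]  # its 2 x 1 pseudoinverse, stored as a flat list
b = 5

x_star = [p * b for p in A_pinv]  # candidate minimum-norm solution
null_dir = [2, -1]                # spans Null(A), since 1*2 + 2*(-1) = 0


def norm_sq(x):
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


# A family of exact solutions x* + t * null_dir for integer t.
solutions = [[x_star[0] + t * null_dir[0], x_star[1] + t * null_dir[1]]
             for t in range(-3, 4)]
all_solve = all(A[0] * x[0] + A[1] * x[1] == b for x in solutions)
norms = [norm_sq(x) for x in solutions]

print(x_star, all_solve, norms)
```

The squared norms equal $5 + 5t^2$, so the smallest value is attained at $t = 0$, i.e., at $x^* = A^\dagger b$.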
Summary
• Generalized inverse $G \in \mathbb{R}^{n \times m}$ for a matrix $A \in \mathbb{R}^{m \times n}$:
– Definition: $AGA = A$
– Existence: $G$ always exists but might not be unique
– Computing: How to find a generalized inverse (see Theorem 0.1 for a formula)
– Property: $AG$ is a projection matrix onto $\mathrm{Col}(A)$
– Application: $x = Gb$ is a solution to $Ax = b$ (if consistent)
• Pseudoinverse $A^\dagger \in \mathbb{R}^{n \times m}$ for a matrix $A \in \mathbb{R}^{m \times n}$:
– Definition: $A A^\dagger A = A$, and $A^\dagger A A^\dagger = A^\dagger$, and both $A A^\dagger$, $A^\dagger A$ are symmetric
– Existence: $A^\dagger$ always exists and is unique
– Computing: How to find the pseudoinverse:
∗ If $A$ has full column rank: $A^\dagger = (A^T A)^{-1} A^T$
∗ If $A$ is “diagonal”: $A^\dagger \in \mathbb{R}^{n \times m}$ is also “diagonal” with reciprocals of nonzero diagonals of $A$
∗ In general: $A^\dagger = V \Sigma^\dagger U^T$ (if $A = U \Sigma V^T$)
– Property: $A A^\dagger$ is an orthogonal projection matrix onto $\mathrm{Col}(A)$
– Application: For any $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $A^\dagger b$ solves the least squares problem
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|$$
If $Ax = b$ has exact solutions, then $A^\dagger b$ is the minimum-norm solution.
