Remark 6.1
(i) Observe that we have not mentioned whether V is a real vector space or a complex vector
space. The above definition includes both cases. The only difference is that if K = R then
the conjugation is just the identity. Thus for real vector spaces, (1) becomes the 'symmetric
property', since for a real number c we have c̄ = c.
(ii) Combining (1) and (2) we obtain
(2') ⟨αu + βv, w⟩ = ᾱ⟨u, w⟩ + β̄⟨v, w⟩.
In the special case when K = R, (2) and (2') together are called 'bilinearity'.
Example 6.1
(i) The dot product in Rn is an inner product. This is often called the standard inner product.
(ii) Let x = (x1, x2, . . . , xn)ᵗ and y = (y1, y2, . . . , yn)ᵗ ∈ Cⁿ. Define ⟨x, y⟩ = Σ_{i=1}^{n} x̄_i y_i = x*y.
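As a quick numerical illustration (not part of the notes), the two standard inner products can be computed with NumPy; note that np.vdot conjugates its first argument, which matches the convention ⟨x, y⟩ = x*y above. The vectors below are arbitrary examples.

```python
# A small numerical check of Example 6.1, assuming NumPy is available.
import numpy as np

# (i) Standard inner product on R^n: the dot product.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 2.0])
print(np.dot(u, v))            # 1*4 + 2*(-1) + 3*2 = 8.0

# (ii) Standard inner product on C^n: <x, y> = x* y (conjugate in the first slot).
x = np.array([1 + 1j, 2j])
y = np.array([3.0, 1 - 1j])
print(np.vdot(x, y))           # conj(x) . y
print(x.conj() @ y)            # identical value
```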
6.2 Norm Associated to an Inner Product
Definition 6.2
Let V be an inner product space. For any v ∈ V, the norm of v, denoted by ‖v‖, is the
positive square root of ⟨v, v⟩: ‖v‖ = √⟨v, v⟩.
For the standard inner product in Rⁿ, ‖v‖ is the usual length of the vector v.
Proposition 6.1 Let V be an inner product space. Let u, v ∈ V and c be a scalar. Then
(i) ‖cu‖ = |c| ‖u‖;
(ii) ‖u‖ > 0 for u ≠ 0;
(iii) |⟨u, v⟩| ≤ ‖u‖ ‖v‖;
(iv) ‖u + v‖ ≤ ‖u‖ + ‖v‖; equality holds in this triangle inequality iff one of u, v is a
nonnegative real multiple of the other.
Proof: (i) and (ii) are immediate from the definition. For (iii), we may assume u ≠ 0 (otherwise both sides vanish). Put
w = v − au.
Then 0 ≤ ⟨w, w⟩ = ⟨v, v⟩ − a⟨v, u⟩ − ā⟨u, v⟩ + |a|²⟨u, u⟩. Substituting a = ⟨u, v⟩/⟨u, u⟩ and multiplying by ⟨u, u⟩, we get |⟨u, v⟩|² ≤ ⟨u, u⟩⟨v, v⟩.
This proves (iii).
(iv) Using (iii), we get ℜ(⟨u, v⟩) ≤ ‖u‖ ‖v‖. Therefore,
‖u + v‖² = ‖u‖² + 2ℜ(⟨u, v⟩) + ‖v‖² ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)²,
which proves the triangle inequality. ♠
Definition 6.3 Let V be an inner product space. The distance between two vectors u, v ∈ V is defined by
d(u, v) = ‖u − v‖.
Definition 6.4 Let V be a real inner product space. The angle between vectors u, v is defined
to be the angle θ, 0 ≤ θ ≤ π, so that
cos θ = ⟨u, v⟩ / (‖u‖ ‖v‖).
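The inequalities of Proposition 6.1 and the angle formula are easy to check numerically. The following is a minimal sketch, assuming NumPy; the vectors are arbitrary examples.

```python
# Numerical illustration of Proposition 6.1 and Definition 6.4.
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

norm_u = np.sqrt(np.dot(u, u))        # ||u|| = sqrt(<u, u>) = 3
norm_v = np.linalg.norm(v)            # same thing via the library = 5

# Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
assert abs(np.dot(u, v)) <= norm_u * norm_v + 1e-12

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
assert np.linalg.norm(u + v) <= norm_u + norm_v + 1e-12

# Distance and angle
d = np.linalg.norm(u - v)
theta = np.arccos(np.dot(u, v) / (norm_u * norm_v))
print(d, np.degrees(theta))
```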
Proposition 6.2
(i) d(u, v) ≥ 0 and d(u, v) = 0 iff u = v;
(ii) d(u, v) = d(v, u);
(iii) d(u, w) ≤ d(u, v) + d(v, w).
Definition 6.5 Two vectors u, v of an inner product space V are called orthogonal to each
other, or perpendicular to each other, if ⟨u, v⟩ = 0. We express this symbolically by u ⊥ v. A
set S consisting of nonzero elements of V is called orthogonal if ⟨u, v⟩ = 0 for every pair of
distinct elements u, v of S. If, in addition, every element of S is of unit norm, then S is called
an orthonormal set. Further, if S is a basis for V, then it is called an orthonormal basis.
One of the biggest advantages of an orthogonal set is that the Pythagoras theorem and its
general form are valid, which makes many computations easy.
Proof: Suppose
c1u1 + c2u2 + . . . + cnun = 0.
Taking the inner product of both sides with ui and using orthogonality, we get
ci⟨ui, ui⟩ = 0.
Since ui ≠ 0, we have ⟨ui, ui⟩ > 0, and hence ci = 0 for each i. Thus an orthogonal set is linearly independent. ♠
Definition 6.6 If u and v are vectors in an inner product space V and v ≠ 0, then the
vector
(⟨v, u⟩/‖v‖²) v
is called the orthogonal projection of u along v.
Remark 6.2 The order in which you take u and v matters here when you are working over
the complex numbers, because the inner product is linear in the second slot and conjugate linear
in the first slot. Over the real numbers, however, this is not a problem.
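A small NumPy sketch of Definition 6.6 (the vectors are illustrative): over C we must place v in the first slot, and the residual u minus the projection is orthogonal to v.

```python
# Orthogonal projection of u along v, using <v, u> (conjugate-linear first slot).
import numpy as np

def proj_along(u, v):
    """Orthogonal projection of u along a nonzero vector v."""
    return (np.vdot(v, u) / np.vdot(v, v)) * v

u = np.array([1 + 2j, 3.0, 1j])
v = np.array([1.0, 1j, 0.0])

p = proj_along(u, v)
# The residual u - p is orthogonal to v (up to rounding):
print(np.vdot(v, u - p))   # approximately 0
```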
Proof: The construction of the orthonormal set is algorithmic and, of course, inductive. First
we construct an intermediate orthogonal set {w1, . . . , wn} with the same property for each
k. Then we simply 'normalize' each element, viz., take
vi = wi/‖wi‖.
That will give us an orthonormal set as required. The last part of the theorem follows if we
begin with {u1, . . . , um} as any basis for V.
Take w1 := u1. So the construction is over for k = 1. To construct w2, subtract from
u2 its orthogonal projection along w1. Thus
w2 = u2 − ⟨w1, u2⟩ w1/‖w1‖².
(Figure: the vector u2, its orthogonal projection along w1, and the resulting w2.)
Check that ⟨w2, w1⟩ = 0. Thus {w1, w2} is an orthogonal set. Check also that L({u1, u2}) =
L({w1, w2}). Now suppose we have constructed wi for i ≤ k < n as required. Put
w_{k+1} = u_{k+1} − Σ_{j=1}^{k} ⟨wj, u_{k+1}⟩ wj/‖wj‖².
Now check that w_{k+1} is orthogonal to wj for j ≤ k, and that L({w1, w2, . . . , w_{k+1}}) = L({u1, . . . , u_{k+1}}).
By induction, this completes the proof. ♠
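The proof is already an algorithm, so it translates directly into code. Below is a minimal NumPy sketch; the helper name gram_schmidt and the sample basis are our own choices, not from the notes.

```python
# Gram-Schmidt, transcribed from the construction in the proof above.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal list v_i = w_i/||w_i|| built from a linearly
    independent list of vectors."""
    ws = []
    for u in vectors:
        w = u.astype(complex)
        for prev in ws:
            # subtract the orthogonal projection of u along each earlier w_j
            w = w - (np.vdot(prev, u) / np.vdot(prev, prev)) * prev
        ws.append(w)
    return [w / np.linalg.norm(w) for w in ws]

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
onb = gram_schmidt(basis)
# Check orthonormality: the Gram matrix should be the identity.
G = np.array([[np.vdot(a, b) for b in onb] for a in onb])
print(np.allclose(G, np.eye(3)))
```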
Example 6.2 Let V = P3[−1, 1] denote the real vector space of polynomials of degree
at most 3 defined on [−1, 1], together with the zero polynomial. V is an inner product space
under the inner product
⟨f, g⟩ = ∫_{−1}^{1} f(t)g(t) dt.
To find an orthonormal basis, we begin with the basis {1, x, x², x³}. Set w1 = 1. Then

w2 = x − ⟨x, 1⟩ (1/‖1‖²)
   = x − (1/2) ∫_{−1}^{1} t dt = x,

w3 = x² − ⟨x², 1⟩ (1/2) − ⟨x², x⟩ x/(2/3)
   = x² − (1/2) ∫_{−1}^{1} t² dt − (3/2) x ∫_{−1}^{1} t³ dt
   = x² − 1/3,

w4 = x³ − ⟨x³, 1⟩ (1/2) − ⟨x³, x⟩ x/(2/3) − ⟨x³, x² − 1/3⟩ (x² − 1/3)/‖x² − 1/3‖²
   = x³ − (3/2) x ∫_{−1}^{1} t⁴ dt
   = x³ − (3/5) x.
Thus {1, x, x² − 1/3, x³ − (3/5)x} is an orthogonal basis. We divide these by their respective norms to
get an orthonormal basis:
1/√2,  x √(3/2),  (x² − 1/3) (3√5)/(2√2),  (x³ − (3/5)x) (5√7)/(2√2).
You will meet these polynomials while studying differential equations.
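Assuming SymPy is available, the computation of Example 6.2 can be verified symbolically; the helper name ip below is our own.

```python
# Symbolic check of Example 6.2.
import sympy as sp

x = sp.symbols('x')
w = [sp.Integer(1), x, x**2 - sp.Rational(1, 3), x**3 - sp.Rational(3, 5) * x]

def ip(f, g):
    # the inner product <f, g> = integral of f*g over [-1, 1]
    return sp.integrate(f * g, (x, -1, 1))

# Pairwise orthogonality of the w_i:
print(all(ip(w[i], w[j]) == 0 for i in range(4) for j in range(i + 1, 4)))

# Norms, which give the constants used for the orthonormal basis:
print([sp.sqrt(ip(f, f)) for f in w])   # sqrt(2), sqrt(2/3), sqrt(8/45), sqrt(8/175)
```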
Proof: Fix an orthonormal basis {w1, . . . , wk} for W. Define w = Σ_{i=1}^{k} ⟨wi, v⟩wi. Take
w′ = v − w. Then ⟨wi, w′⟩ = ⟨wi, v⟩ − ⟨wi, w⟩ = 0 for each i, so w′ ∈ W⊥. Thus v = w + w′
with w ∈ W and w′ ∈ W⊥. ♠
Remark 6.3
(i) The element w so obtained is called the orthogonal projection of v on W. One can easily
check that the assignment v ↦ w itself defines a linear map P_W : V −→ W. (Use the formula
P_W(v) = Σ_i ⟨wi, v⟩wi.) Also observe that P_W is the identity on W and hence P_W² = P_W.
(ii) Observe that if V itself is finite dimensional, then the above proposition is
applicable to all subspaces of V. In this case it easily follows that dim W + dim W⊥ = dim V
(see the exercise at the end of the section on Linear Dependence).
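A short NumPy sketch of the formula P_W(v) = Σ_i ⟨wi, v⟩wi (the subspace and vectors below are illustrative):

```python
# Orthogonal projection onto a subspace W spanned by an orthonormal list.
import numpy as np

def project_onto(v, onb):
    """Project v onto the subspace spanned by the orthonormal list onb."""
    return sum(np.vdot(w, v) * w for w in onb)

# W = span of two orthonormal vectors in R^3 (the xy-plane here):
onb_W = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v = np.array([2.0, -1.0, 5.0])

w = project_onto(v, onb_W)
print(w)                 # [ 2. -1.  0.]
print(v - w)             # lies in W-perp: [0. 0. 5.]
```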
Answer: Step 1. Apply GJEM to find a basis for the null-space W of A and a particular
solution p of the system.
Step 2. Apply the Gram-Schmidt process to the basis of W to get an orthonormal basis
{v1, . . . , vk} for W.
Step 3. Let P_W(v − p) = Σ_{j=1}^{k} ⟨(v − p), vj⟩vj denote the orthogonal projection of v − p
on W. Then p + P_W(v − p) is the solution of the system nearest to v.
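The three steps can also be carried out numerically. The sketch below uses library routines (np.linalg.lstsq and scipy.linalg.null_space) in place of GJEM and Gram-Schmidt, and the system shown is purely illustrative, not the one from the exercise.

```python
# Sketch of Steps 1-3 with NumPy/SciPy; the system below is only an illustration.
import numpy as np
from scipy.linalg import null_space   # orthonormal basis for the null space

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([3.0, 3.0])
v = np.array([1.0, 6.0, 2.0])

# Step 1: particular solution p and basis of the null space W.
p = np.linalg.lstsq(A, b, rcond=None)[0]
N = null_space(A)                      # columns form an orthonormal basis of W

# Step 2 is done for us: null_space already returns an orthonormal basis.
# Step 3: the nearest solution is p + P_W(v - p).
x_nearest = p + N @ (N.T @ (v - p))
print(x_nearest, np.allclose(A @ x_nearest, b))
```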
(a) Write down a basis for the space of solutions of the associated homogeneous system.
(b) Given the vector v = (1, 6, 2, 1, 1)ᵗ, find the solution to the system which is nearest to v.
7 Eigenvalues and Eigenvectors
7.1 Introduction
The simplest of matrices are the diagonal ones. Thus a linear map will also be easy to
handle if its associated matrix is a diagonal matrix. On the other hand, we have seen that the
associated matrix depends, to some extent, upon the choice of the bases. This naturally leads
us to the problem of investigating the existence and construction of a suitable basis with
respect to which the matrix associated to a given linear transformation is diagonal.
Definition 7.1 An n × n matrix A is called diagonalizable if there exists an invertible n × n
matrix M such that M⁻¹AM is a diagonal matrix. A linear map f : V −→ V is called
diagonalizable if the matrix associated to f with respect to some basis is diagonal.
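As a sketch of Definition 7.1 (assuming NumPy): if the columns of M are eigenvectors of A, then M⁻¹AM is diagonal. The matrix A below is an arbitrary example.

```python
# Diagonalizing a matrix whose eigenvectors form a basis.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, M = np.linalg.eig(A)          # columns of M are eigenvectors
D = np.linalg.inv(M) @ A @ M
print(np.round(D, 10))                  # diag(1, 3) up to ordering
print(np.allclose(D, np.diag(eigvals)))
```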
Remark 7.1
(i) Clearly, f is diagonalizable iff the matrix associated to f with respect to some basis (any
basis) is diagonalizable.
(ii) Let {v1, . . . , vn} be a basis. The matrix M_f of a linear transformation f w.r.t. this basis
is diagonal iff f(vi) = λi vi, 1 ≤ i ≤ n, for some scalars λi. Naturally a subquestion here is:
does there exist such a basis for a given linear transformation?
Definition 7.2 A scalar λ is called an eigenvalue of an n × n matrix A if there exists a nonzero
vector v such that Av = λv; such a v is called an eigenvector of A corresponding to λ. The
subspace E_A(λ) = {v : Av = λv} is called the eigenspace of λ. Eigenvalues and eigenvectors of
a linear map f : V −→ V are defined in the same way, with f(v) = λv in place of Av = λv.
Remark 7.2
(i) It is easy to see that the eigenvalues and eigenvectors of a linear transformation are the same as
those of the associated matrix.
(ii) Even if a linear map is not diagonalizable, the existence of eigenvectors and eigenvalues
itself throws some light on the nature of the linear map. Thus the study of eigenvalues becomes
extremely important. They arise naturally in the study of differential equations. Here we shall
use them to address the problem of diagonalization and then see some geometric applications
of diagonalization itself.
Definition 7.3 For any square matrix A, the polynomial χA (λ) = det (A−λI) in λ is called
the characteristic polynomial of A.
Hence the eigenvalues of A are 3 and 6. The eigenvalue λ = 3 is a double root of the charac-
teristic polynomial of A. We say that λ = 3 has algebraic multiplicity 2. Let us find the
eigenspaces E(3) and E(6).
λ = 3 : A − 3I =
[ 0  0  0 ]
[−2  1  2 ]
[−2  1  2 ].
Hence rank(A − 3I) = 1. Thus nullity(A − 3I) = 2. By solving the system (A − 3I)v = 0, we can
find a basis of E(3) consisting of two vectors.
The dimension of E_A(λ) is called the geometric multiplicity of λ. Hence the geometric mul-
tiplicity of λ = 3 is 2.
λ = 6 : A − 6I =
[−3  0  0 ]
[−2 −2  2 ]
[−2  1 −1 ].
Hence rank(A − 6I) = 2. Thus dim E_A(6) = 1. (It can be shown that {(0, 1, 1)ᵗ} is a basis of
E_A(6).) Thus both the algebraic and geometric multiplicities of the eigenvalue 6 are equal to 1.
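The matrix A of this example is not displayed above; from the matrices A − 3I and A − 6I shown, it appears to be A = [[3, 0, 0], [−2, 4, 2], [−2, 1, 5]]. Assuming that reconstruction, the multiplicities can be checked numerically:

```python
# Numerical check of this example; A is reconstructed from A - 3I and A - 6I.
import numpy as np

A = np.array([[ 3.0, 0.0, 0.0],
              [-2.0, 4.0, 2.0],
              [-2.0, 1.0, 5.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.round(eigvals, 10))           # 3, 3, 6 (in some order)

# Geometric multiplicities = nullity of A - lambda*I
for lam in (3.0, 6.0):
    rank = np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, 3 - rank)               # 3 -> 2, 6 -> 1
```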
(3) A =
[ 1  1 ]
[ 0  1 ].
Then det(A − λI) = (1 − λ)². Thus λ = 1 has algebraic multiplicity 2.
A − I =
[ 0  1 ]
[ 0  0 ].
Hence nullity(A − I) = 1 and E_A(1) = L({e1}). In this case the
geometric multiplicity is less than the algebraic multiplicity of the eigenvalue 1.
Remark 7.3
(i) Observe that χ_A(λ) = χ_{M⁻¹AM}(λ) for any invertible matrix M. Thus the characteristic
polynomial is an invariant of similarity. Hence the characteristic polynomial of any linear map
f : V −→ V (where V is finite dimensional) is also defined, by choosing some basis for V and
then taking the characteristic polynomial of the associated matrix M(f) of f. This definition
does not depend upon the choice of the basis.
(ii) If we expand det(A − λI) we see that there is a term
(a11 − λ)(a22 − λ) · · · (ann − λ).
This is the only term which contributes to λⁿ and λⁿ⁻¹. It follows that the degree of the
characteristic polynomial is exactly equal to n, the size of the matrix; moreover, the coefficient
of the top degree term is equal to (−1)n . Thus in general, it has n complex roots, some of
which may be repeated, some of them real, and so on. All these patterns are going to influence
the geometry of the linear map.
(iii) If A is a real matrix then of course χA (λ) is a real polynomial. That however, does
not allow us to conclude that it has real roots. So while discussing eigenvalues we should
consider even a real matrix as a complex matrix and keep in mind the associated linear
map Cn −→ Cn . The problem of existence of real eigenvalues and real eigenvectors will be
discussed soon.
(iv) Next, the above observation also shows that the coefficient of λⁿ⁻¹ is equal to
(−1)ⁿ⁻¹(a11 + a22 + · · · + ann) = (−1)ⁿ⁻¹ tr(A).
Lemma 7.1 Suppose A is a real matrix with a real eigenvalue λ. Then there exists a real
column vector v ≠ 0 such that Av = λv.
Proof: Start with Aw = λw, where w is a nonzero column vector with complex entries.
Write w = v + ıv′, where both v, v′ are real vectors. We then have
Av + ıAv′ = λv + ıλv′.
Comparing the real and imaginary parts (A and λ are real), we get Av = λv and Av′ = λv′. Since
w ≠ 0, at least one of the two v, v′ must be a nonzero vector and we are done. ♠
(−1)ⁿλⁿ + (−1)ⁿ⁻¹λⁿ⁻¹(a11 + . . . + ann) + . . .   (48)
Comparing (49) and (51), we get that the constant term of det(A − λI) is equal to λ1λ2 . . . λn =
det A and that tr(A) = a11 + a22 + . . . + ann = λ1 + λ2 + . . . + λn. ♠
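These two identities are easy to check numerically; the sketch below reuses the matrix reconstructed in the earlier example (an assumption, as noted there).

```python
# det A = product of eigenvalues, tr A = sum of eigenvalues.
import numpy as np

A = np.array([[ 3.0, 0.0, 0.0],
              [-2.0, 4.0, 2.0],
              [-2.0, 1.0, 5.0]])
eigvals = np.linalg.eigvals(A)

print(np.isclose(np.prod(eigvals), np.linalg.det(A)))   # True: 3*3*6 = 54
print(np.isclose(np.sum(eigvals), np.trace(A)))         # True: 3+3+6 = 12
```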
λ1 (c1 v1 + c2 v2 + . . . + ck vk ) − (λ1 c1 v1 + λ2 c2 v2 + . . . + λk ck vk )
= (λ1 − λ2 )c2 v2 + (λ1 − λ3 )c3 v3 + . . . + (λ1 − λk )ck vk = 0
C⁻¹AC = D(λ1, . . . , λn) = D
7.3 Relation Between Algebraic and Geometric Multiplicities
Recall that the algebraic multiplicity of an eigenvalue μ of A is its multiplicity as a root of the
characteristic polynomial χ_A(λ), and the geometric multiplicity of μ is dim E_A(μ).
Proposition 7.5 Both the algebraic and the geometric multiplicities of an eigenvalue are invariants
of similarity.
Proof: We have already seen that for any invertible matrix C, χ_A(λ) = χ_{C⁻¹AC}(λ). Thus
the invariance of the algebraic multiplicity is clear. On the other hand, check that E_{C⁻¹AC}(λ) =
C⁻¹(E_A(λ)). Therefore, dim(E_{C⁻¹AC}(λ)) = dim C⁻¹(E_A(λ)) = dim(E_A(λ)), the last equality
being a consequence of the invertibility of C.
♠
We have observed in a few examples that the geometric multiplicity of an eigenvalue is
at most its algebraic multiplicity. This is true in general.
Proposition 7.6 Let A be an n×n matrix. Then the geometric multiplicity of an eigenvalue
µ of A is less than or equal to the algebraic multiplicity of µ.
Proof: Put a_A(μ) = k. Then (λ − μ)ᵏ divides det(A − λI) but (λ − μ)ᵏ⁺¹ does not.
Let g_A(μ) = g be the geometric multiplicity of μ. Then E_A(μ) has a basis consisting
of g eigenvectors v1, v2, . . . , vg. We can extend this basis of E_A(μ) to a basis of Cⁿ, say
{v1, v2, . . . , vg, . . . , vn}. Let B be the matrix whose jᵗʰ column is Bʲ = vj. Then B is an
invertible matrix and
B⁻¹AB =
[ μI_g  X ]
[ 0     Y ]
where X is a g × (n − g) matrix and Y is an (n − g) × (n − g) matrix. Therefore,
det(A − λI) = det(B⁻¹AB − λI) = (μ − λ)ᵍ det(Y − λI_{n−g}),
so (λ − μ)ᵍ divides det(A − λI). Thus g ≤ k. ♠
Remark 7.4 We will now be able to say something about the diagonalizability of a given
matrix A. Assuming that there exists B such that B⁻¹AB = D(λ1, . . . , λn), it follows, as seen in
the previous proposition, that AB = BD, and hence ABⁱ = λᵢBⁱ, where Bⁱ denotes the
iᵗʰ column vector of B. Thus we need not hunt for B anywhere: we should look for eigenvectors of
A. Of course the Bⁱ are linearly independent, since B is invertible. Now the problem turns into
the question whether we have n linearly independent eigenvectors of A, so that they can be
chosen for the columns of B. The previous proposition took care of one such case, viz., when
the eigenvalues are distinct. In general, this condition is not forced on us. Observe that the
geometric multiplicity and the algebraic multiplicity of an eigenvalue coincide for a diagonal
matrix. Since these concepts are similarity invariants, it is necessary that the same is true
for any matrix which is diagonalizable. This turns out to be sufficient also. The following
theorem gives the correct condition for diagonalization.
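The criterion of Remark 7.4 can be turned into a rough numerical test: compare, for each eigenvalue, the nullity of A − λI with the multiplicity of λ as a root of the characteristic polynomial. The sketch below (the function name and tolerances are our own choices) flags the matrix from example (3) as non-diagonalizable and the reconstructed 3 × 3 matrix as diagonalizable.

```python
# A rough diagonalizability test: geometric vs algebraic multiplicities.
import numpy as np
from collections import Counter

def is_diagonalizable(A, tol=1e-9):
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    # group (approximately) equal eigenvalues to get algebraic multiplicities
    alg = Counter(np.round(eigvals, 6))
    for lam, k in alg.items():
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if geo != k:
            return False
    return True

print(is_diagonalizable(np.array([[1.0, 1.0], [0.0, 1.0]])))   # False
print(is_diagonalizable(np.array([[ 3.0, 0.0, 0.0],
                                  [-2.0, 4.0, 2.0],
                                  [-2.0, 1.0, 5.0]])))          # True
```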
Proof: We have already seen the necessity of the condition. To prove the converse, suppose
that the two multiplicities coincide for each eigenvalue. Suppose that λ1 , λ2 , . . . , λk are all
the eigenvalues of A with algebraic multiplicities n1 , n2 , . . . , nk . Let