
Indian Institute of Technology Kharagpur

Centre of Excellence in Artificial Intelligence


AI61003 Linear Algebra for AI and ML
Assignment 2, Due on: October 20, 2021

ANSWER ALL THE QUESTIONS

1. Let A, B ∈ Rn×n . Prove that ∥AB∥2 ⩽ ∥A∥2 ∥B∥2 . This property of the 2-norm
is called the sub-multiplicativity property. Does this property also hold for the
Frobenius norm?
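Before attempting the proof, the inequality can be sanity-checked numerically. The following NumPy sketch (random matrices, one sample, illustrative only and not a proof) tests sub-multiplicativity for both the 2-norm and the Frobenius norm:

```python
import numpy as np

# Numerical check (not a proof): ||AB|| <= ||A|| ||B|| for the spectral
# (2-) norm and for the Frobenius norm, on one random sample.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

lhs_2 = np.linalg.norm(A @ B, 2)
rhs_2 = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)

lhs_f = np.linalg.norm(A @ B, 'fro')
rhs_f = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')

print(lhs_2 <= rhs_2, lhs_f <= rhs_f)
```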

2. Let A ∈ Rn×n be an invertible matrix. Define max mag(A), min mag(A) and
cond(A). Show that

   (a) max mag(A) = 1 / min mag(A−1 )

   (b) cond(A) = max mag(A) / min mag(A)
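For the 2-norm, max mag(A) and min mag(A) are the largest and smallest singular values of A, so both identities can be sanity-checked numerically before proving them (a sketch, assuming NumPy; the SVD interpretation is the standard one but is stated here as background, not as part of the problem):

```python
import numpy as np

# Numerical sanity check of (a) and (b) via singular values:
# max mag(A) = sigma_max(A), min mag(A) = sigma_min(A) for the 2-norm.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

s = np.linalg.svd(A, compute_uv=False)                  # singular values of A
s_inv = np.linalg.svd(np.linalg.inv(A), compute_uv=False)

max_mag, min_mag = s[0], s[-1]
check_a = np.isclose(max_mag, 1.0 / s_inv[-1])          # (a)
check_b = np.isclose(np.linalg.cond(A, 2), max_mag / min_mag)  # (b)
print(check_a, check_b)
```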
3. In each of the following cases, consider the matrix A ∈ Rm×n as a linear
function from Rn to Rm . Plot the unit sphere in Rn . Plot the ellipsoid obtained
in Rm as the image of the unit sphere in Rn . Compute the condition number of
A (using an inbuilt command). Further, if m = n, check whether the matrix is
invertible. Compute the determinant of A as well. Is there any relationship
between determinant and condition number?
 1 
−√ 0
 2 
(a) A =  0
 1 
− √ 
2
 
−1 1
 
−2 1 2
(b) A =
0 2 0
 
1 0.9
(c) A =
0.9 0.8
 
1 0
(d) A =
0 −10
 
1 1
(e) A = , where ε = 10, 5, 1, 10−1 , 10−2 , 10−4 , 0.
1 ε
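The numerical parts above can be sketched as follows (plotting of the sphere and ellipsoid is omitted; this assumes NumPy, and `np.linalg.cond` with `p=2` also accepts non-square matrices):

```python
import numpy as np

# Condition number (2-norm) for each matrix; determinant when square.
mats = {
    'a': np.array([[-1/np.sqrt(2), 0], [0, -1/np.sqrt(2)], [-1, 1]]),
    'b': np.array([[-2, 1, 2], [0, 2, 0]]),
    'c': np.array([[1, 0.9], [0.9, 0.8]]),
    'd': np.array([[1, 0], [0, -10]]),
}
conds = {}
for name, A in mats.items():
    conds[name] = np.linalg.cond(A, 2)
    if A.shape[0] == A.shape[1]:                 # square: also check invertibility
        print(name, conds[name], np.linalg.det(A))
    else:
        print(name, conds[name])

# Part (e): one condition number per value of epsilon.
for eps in [10, 5, 1, 1e-1, 1e-2, 1e-4, 0]:
    print('e', eps, np.linalg.cond(np.array([[1.0, 1.0], [1.0, eps]]), 2))
```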
4. For a matrix A with the property that the columns of A are linearly indepen-
dent, give the geometrical interpretation of the least squares solution to the
problem Ax = b and justify the name normal equations. In case the matrix
A does not have linearly independent columns, comment on the nature of the
least squares solution.
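The geometric picture can be checked numerically: the LS residual Ax̂ − b is orthogonal ("normal") to the column space of A, i.e. A⊤ (Ax̂ − b) = 0. A small sketch, assuming NumPy and a generic random A with independent columns:

```python
import numpy as np

# Verify the orthogonality ("normality") of the LS residual to range(A).
rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))   # generic case: linearly independent columns
b = rng.standard_normal(8)

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
normal_resid = A.T @ (A @ x_hat - b)   # should be ~0 in each component
print(normal_resid)
```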

5. Consider the system of linear equations Ax = b where A ∈ Rn×n is an invertible
matrix and b ∈ Rn is a given vector. Discuss the advantages in the case when
A is orthogonal.
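One advantage can be illustrated directly: when A is orthogonal, A⁻¹ = A⊤, so Ax = b is solved by a single matrix-vector product with no factorization, and cond(A) = 1. A sketch assuming NumPy (the orthogonal matrix is generated via QR for illustration):

```python
import numpy as np

# When Q is orthogonal, x = Q^T b solves Qx = b, and cond_2(Q) = 1.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
b = rng.standard_normal(5)

x = Q.T @ b                                        # no factorization needed
resid = np.linalg.norm(Q @ x - b)
cond_Q = np.linalg.cond(Q, 2)
print(resid, cond_Q)
```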

6. Bi-linear interpolation: We are given a scalar value at each of the MN grid
points of a grid in R2 , with a typical grid point represented as Pij = (xi , yj )
where i = 1, 2, . . . , M and j = 1, 2, . . . , N , and x1 < x2 < · · · < xM and
y1 < y2 < · · · < yN . Let the scalar value at the grid point Pij be referred to
as Fij for i = 1, 2, . . . , M and j = 1, 2, . . . , N . A bi-linear interpolation is a
function of the form

f (u, v) = θ1 + θ2 u + θ3 v + θ4 uv

where θ1 , θ2 , θ3 , θ4 are the coefficients. This function further satisfies f (Pij ) =
Fij for i = 1, 2, . . . , M and j = 1, 2, . . . , N .

(a) Express these interpolation conditions as a system of linear equations of the
form Aθ = b, where b is an MN-vector consisting of the Fij values. Write
clearly all the entries of A, θ and b and their sizes.
(b) What are the minimum values of M and N so that you may expect a
unique solution to the system of equations Aθ = b?
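The system of part (a) can be sketched on a small hypothetical grid (M = N = 2, with made-up grid coordinates and values): each condition f(xi , yj ) = Fij contributes one row [1, xi , yj , xi yj ] of A, so A is MN × 4 and θ ∈ R4.

```python
import numpy as np

# Build A theta = b for a 2x2 grid (assumed example data, not from the problem).
x = np.array([0.0, 1.0])          # x_1 < x_2
y = np.array([0.0, 2.0])          # y_1 < y_2
F = np.array([[1.0, 2.0],         # F_ij = value at (x_i, y_j)
              [3.0, 5.0]])

rows, b = [], []
for i in range(len(x)):
    for j in range(len(y)):
        rows.append([1.0, x[i], y[j], x[i] * y[j]])   # one row per condition
        b.append(F[i, j])
A = np.array(rows)
b = np.array(b)

theta = np.linalg.solve(A, b)     # 4x4 system: unique when A is invertible
print(theta)
```

With M = N = 2 the system is square (4 equations, 4 unknowns), which is the smallest case where a unique solution can be expected.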

7. Iterative LS : Let A ∈ Rm×n have linearly independent columns and let b ∈ Rm
be a given vector. Further, let x̂ denote the LS solution to the problem Ax = b.
Define x(1) = 0 and, for k = 1, 2, . . . ,

       x(k+1) = x(k) − (1/∥A∥²) A⊤ (Ax(k) − b)

(a) Show that the sequence {x(k) } converges to x̂ as k → ∞.
(b) Discuss the computational complexity of computing {x(k) } for any k ⩾ 1.
(c) Generate a 30 × 10 random matrix A and a 30 × 1 random vector b.
Check that the matrix has full column rank! Run the algorithm for 100
steps. Verify numerically that the algorithm converges to x̂.
(d) Do you think this iterative method may be computationally beneficial
over the direct methods of computing the LS solution?
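Part (c) can be sketched as follows, assuming NumPy and taking ∥A∥ to be the spectral norm (the choice of norm is an assumption; any norm with ∥A∥ ⩾ ∥A∥2 gives a convergent step size):

```python
import numpy as np

# Run the stated iteration and compare with the direct LS solution.
rng = np.random.default_rng(42)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
assert np.linalg.matrix_rank(A) == 10          # full column rank check

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]   # direct LS solution

mu = 1.0 / np.linalg.norm(A, 2) ** 2           # step size 1/||A||^2
x = np.zeros(10)                               # x^(1) = 0
for _ in range(100):
    x = x - mu * (A.T @ (A @ x - b))           # gradient step on ||Ax - b||^2 / 2

err = np.linalg.norm(x - x_hat)
print(err)                                     # shrinks as the steps accumulate
```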

8. Suppose that z1 , z2 , . . . , z100 is observed time series data. An autoregressive
model for this data has the following form:

       ẑt+1 = θ1 zt + · · · + θM zt−M +1 ,    t = M, M + 1, . . . , 100

where M is the memory or the lag of the model. This model can be used to
predict the next observation in the time series.

(a) Set up a least squares problem to estimate the parameters in the model.
(b) Clearly write down the matrices A and b in the least squares formulation.
(c) What is the special structure that one can observe in A?

(d) Is there any relation of rank of A with M ?
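The construction of parts (a)-(c) can be sketched on synthetic data (a random stand-in for the observed series, with an assumed lag M = 5): each row of A holds M consecutive past values, which produces the Toeplitz-like structure asked about in (c).

```python
import numpy as np

# Least squares setup for the AR model on a synthetic series z_1..z_100.
rng = np.random.default_rng(7)
z = rng.standard_normal(100)      # stand-in data (the real z's come from the problem)
M = 5                             # assumed lag

# Row for time t uses z_t, z_{t-1}, ..., z_{t-M+1}; the target is z_{t+1}.
# With 0-based indexing, usable rows correspond to t = M-1, ..., 98.
A = np.array([[z[t - k] for k in range(M)] for t in range(M - 1, 99)])
b = z[M:100]

theta = np.linalg.lstsq(A, b, rcond=None)[0]
print(A.shape, theta.shape)
```

Shifting one row down and one column right reproduces the same entry, which is the Toeplitz structure of A.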

9. Polynomial Classifier: Generate 500 random vectors x(i) ∈ R2 for i = 1, 2, . . . , 500
from a standard normal distribution. Define, for i = 1, 2, . . . , 500,

       y(i) = +1 if x1(i) x2(i) ⩾ 0, and y(i) = −1 otherwise.

Fit a polynomial least squares classifier of degree 2 to the data set using the
polynomial

f̃(x) = θ1 + θ2 x1 + θ3 x2 + θ4 x1 x2 + θ5 x1² + θ6 x2²

(a) Give the error rate of the classifier using the confusion matrix.
(b) Show the regions in the R2 plane where the classifier model gives f̂(x) = 1
and where it gives f̂(x) = −1.
(c) Does the second degree polynomial g = x1 x2 classify the generated points
with zero error? Compare the parameters of the polynomial model estimated
from the data with those of g.
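The fitting and evaluation steps (without the plot of part (b)) can be sketched as follows, assuming NumPy and a fixed seed for reproducibility:

```python
import numpy as np

# Degree-2 polynomial LS classifier on the +/-1 labels; the prediction is
# the sign of the fitted polynomial, tabulated in a 2x2 confusion matrix.
rng = np.random.default_rng(3)
X = rng.standard_normal((500, 2))
y = np.where(X[:, 0] * X[:, 1] >= 0, 1.0, -1.0)

# Feature map [1, x1, x2, x1*x2, x1^2, x2^2], matching the given polynomial.
x1, x2 = X[:, 0], X[:, 1]
Phi = np.column_stack([np.ones(500), x1, x2, x1 * x2, x1**2, x2**2])

theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
pred = np.where(Phi @ theta >= 0, 1.0, -1.0)

confusion = np.array([[np.sum((y == s) & (pred == p)) for p in (1.0, -1.0)]
                      for s in (1.0, -1.0)])
error_rate = 1.0 - np.trace(confusion) / 500
print(confusion, error_rate)
```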

10. MNIST dataset: For each of the digits 0, 1, . . . , 9, randomly select 1000 images
to generate a training data set of 10000 images. Similarly, generate a test
data set of 1000 images. Fit a linear least squares classifier to classify the
data set into 10 classes and test the prediction accuracy of the model using
the 10 × 10 confusion matrix. Do not use any inbuilt functions for fitting
the model.
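The fitting step can be sketched on synthetic stand-in data (loading MNIST itself is left to the reader; the sizes below are hypothetical, smaller than the real 10000 × 784 problem): a one-vs-all least squares classifier with ±1 indicator targets, solved via the normal equations rather than an inbuilt fitter.

```python
import numpy as np

# One-vs-all linear LS classifier: one theta column per class, prediction
# by argmax, accuracy read off a 10x10 confusion matrix.
rng = np.random.default_rng(10)
n_train, n_feat = 1000, 50          # stand-ins for 10000 images x 784 pixels
X = rng.standard_normal((n_train, n_feat))
labels = rng.integers(0, 10, n_train)

A = np.column_stack([np.ones(n_train), X])   # prepend a bias column
Y = -np.ones((n_train, 10))
Y[np.arange(n_train), labels] = 1.0          # +1 for the true class, -1 otherwise

Theta = np.linalg.solve(A.T @ A, A.T @ Y)    # normal equations, no inbuilt fitter
pred = np.argmax(A @ Theta, axis=1)

confusion = np.zeros((10, 10), dtype=int)
for true, p in zip(labels, pred):
    confusion[true, p] += 1
print(confusion.trace() / n_train)
```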

****************** THE END ******************

