Module 2 - DS I
Best-Fit Subspaces and Singular Value Decomposition (SVD)
Prerequisites….
Orthogonal and Orthonormal Vectors
Linear Independence of Vectors
The smallest subspace containing a finite set of vectors of a vector space is called the linear span of the set; that is, the set (the basis vectors) is said to span the subspace. For example, the span of {(1,0), (0,1)} is the whole plane R².
Eigenvalues and Eigenvectors
• Here the dependent variable is the house price and there are
a large number of independent variables (features) on which
the house price depends.
Recall
[Figure: the vectors (0,2), (2,1), (3,1), and (3,5) plotted in the plane]
What do matrices do to vectors?
[Figure: the same vectors (0,2), (2,1), (3,1), (3,5) shown before and after multiplication by a matrix]
• The new vector is:
  1) rotated
  2) scaled
Are there any special vectors that only get scaled?
• Try (1,1): multiplying by the matrix gives (3,3) = 3 · (1,1).
• So (1,1) is not rotated at all; it is only scaled, by a factor of 3.
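A minimal numerical check of this idea. The slides do not state the matrix explicitly, so the sketch below assumes A = [[2, 1], [1, 2]], which is consistent with (1,1) being mapped to (3,3):

```python
import numpy as np

# Hypothetical matrix for illustration; the slides do not state A explicitly,
# but A = [[2, 1], [1, 2]] is consistent with (1, 1) being mapped to (3, 3).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

v = np.array([1.0, 1.0])   # candidate "special" vector
w = np.array([0.0, 2.0])   # an ordinary vector for comparison

print(A @ v)   # [3. 3.]  -> exactly 3 * v: only scaled, not rotated
print(A @ w)   # [2. 4.]  -> not a multiple of w: rotated and scaled
```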
• Or, equivalently: Ax = λx, i.e. (A - λI)x = 0.
• For a square matrix A = [aij], this homogeneous system has a non-zero solution x only when det(A - λI) = 0; solving this characteristic equation gives the eigenvalues λ, and substituting each λ back gives the corresponding eigenvectors.
Finding Eigenvalues and Eigenvectors
Example 1:
1. Ax = λx requires A to be square.
2. Eigenvectors are generally not orthogonal.
3. There are not always enough eigenvectors to construct P for diagonalization.
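A small numpy illustration of points 2 and 3, using example matrices chosen here purely for illustration (they are not from the slides):

```python
import numpy as np

# Point 2: a non-symmetric matrix usually has non-orthogonal eigenvectors.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(A)
print(vals)                         # eigenvalues: [2. 3.]
print(vecs[:, 0] @ vecs[:, 1])      # dot product != 0 -> not orthogonal

# Point 3: a defective matrix has too few independent eigenvectors
# to build an invertible P for diagonalization.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
vals_B, vecs_B = np.linalg.eig(B)
# Numerically (with default tolerance) the eigenvector matrix has rank 1 < 2,
# i.e. only one independent eigenvector, so B cannot be diagonalized.
print(np.linalg.matrix_rank(vecs_B))
```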
A = U Σ Vᵀ, where U is m×m, Σ is m×n, and V is n×n.
• The diagonal values in the Sigma matrix are known as the singular
values of the original matrix A.
• The columns of the U matrix are called the left-singular vectors of A.
• The columns of V are called the right-singular vectors of A.
• The left-singular vectors are pairwise orthogonal, and so are the right-singular vectors: the columns of U form an orthonormal set, and so do the columns of V.
Singular Value Decomposition (SVD)
A = U Σ Vᵀ, where Σ = diag(σ1, …, σr)
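A quick numpy sketch of the decomposition on a small example matrix (chosen arbitrarily for illustration):

```python
import numpy as np

# Small example matrix (chosen arbitrarily for illustration).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # shape (m, n) = (2, 3)

U, s, Vt = np.linalg.svd(A)               # full SVD: U is 2x2, Vt is 3x3
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)      # embed singular values in an m x n matrix

print(np.allclose(A, U @ Sigma @ Vt))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(2)))    # True: columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: columns of V are orthonormal
print(s)                                  # singular values, largest first
```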
Properties of SVD
• Exercise
1. Find the SVD of
2. Find the SVD of
3. Find the SVD of
Low-rank approximation of a matrix
• Often a data matrix A is close to a low-rank matrix, and the SVD is useful for finding a good low-rank approximation to A.
• For any rank k, the SVD of A gives the best rank-k approximation to A: keep only the top k singular values and the corresponding singular vectors.
• Low-rank approximations also have compact representations: you do not need to store all m·n entries of the matrix.
• Since the SVD gives the best low-rank approximation of a matrix A, it can be used to find the best-fitting k-dimensional subspace, for k = 1, 2, 3, …, for a set of n data points.
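A minimal sketch of a rank-k approximation via truncated SVD, using a randomly generated matrix purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a matrix that is close to low rank: rank-3 signal plus small noise.
m, n, true_rank = 100, 40, 3
A = rng.normal(size=(m, true_rank)) @ rng.normal(size=(true_rank, n))
A += 0.01 * rng.normal(size=(m, n))

def rank_k_approx(A, k):
    """Best rank-k approximation of A (Eckart-Young): keep top k singular triples."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 2, 3, 4):
    A_k = rank_k_approx(A, k)
    err = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
    print(f"k = {k}: relative error = {err:.4f}")
# The error drops sharply once k reaches the underlying rank (3 here).
```

Note that storing U[:, :k], s[:k] and Vt[:k, :] needs only k(m + n + 1) numbers instead of m·n.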
**Argmax is an operation that finds the argument (index) that gives the maximum value from a target
function. Argmax is most commonly used in machine learning for finding the class with the largest
predicted probability.
Finding the best-fit subspace
Finding the best-fit subspace: Greedy algorithm
The following theorem establishes that the greedy algorithm finds the best-fit subspace of every dimension.
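A minimal numpy sketch of the greedy procedure, assuming well-separated singular values so that plain power iteration finds each maximizing direction; the function name and test matrix are illustrative only, not from the slides:

```python
import numpy as np

def greedy_singular_vectors(A, k, n_iter=500, seed=0):
    """Greedily pick unit vectors v_1, ..., v_k, each maximizing |A v|
    subject to being orthogonal to the previously chosen vectors."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    V = []
    B = A.copy()
    for _ in range(k):
        # Power iteration on B^T B approximates argmax_{|v|=1} |B v|.
        v = rng.normal(size=n)
        for _ in range(n_iter):
            v = B.T @ (B @ v)
            v /= np.linalg.norm(v)
        V.append(v)
        # Deflate: remove the component along v so the next vector
        # is found in the orthogonal complement.
        B = B - np.outer(B @ v, v)
    return np.array(V).T   # columns are v_1, ..., v_k

# Compare with numpy's SVD on a random matrix (illustration only).
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8))
V_greedy = greedy_singular_vectors(A, k=3)
_, _, Vt = np.linalg.svd(A)
for i in range(3):
    # Same direction up to sign, so the absolute dot product is close to 1.
    print(abs(V_greedy[:, i] @ Vt[i]))
```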
If one wants statistical information relative to the mean of the data, one needs to
center the data. If one wants the best low rank approximation, one would not
center the data.
• It can be shown that the line minimizing the sum of squared
distances to a set of points, if not restricted to go through the
origin, must pass through the centroid of the points.
• One hypothesizes that there are only k underlying basic factors that determine how much a given customer will like a given movie, where k is much smaller than the number of customers or movies.
PCA
• Principal components analysis (PCA) is a technique that can be
used to simplify a dataset.
Are they correlated?
PCA
• Properties
– It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
– New axes are orthogonal and represent the directions with
maximum variability
PCA
• PCA is performed by finding the eigenvalues and eigenvectors
of the covariance matrix.
[Figure: data plotted against Original Variable A and Original Variable B, with the principal component axes PC 1 and PC 2 overlaid]
• Let Y = PᵀX be the transformed data, where the columns of P are the new axes.
• Then Cov(Y) = Pᵀ Cov(X) P, and we require Cov(Y) to be diagonal.
• Why? Because only then is the transformed data Y completely decorrelated!
• Hence the columns of P should diagonalize the Cov(X) matrix; that is, the columns of P are the eigenvectors of Cov(X).
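A minimal numpy sketch of this, assuming rows of X are samples and using randomly generated correlated data purely for illustration: compute the eigenvectors of Cov(X), project the centered data onto them, and check that the transformed data is decorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data (rows = samples), generated only for illustration.
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=n)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)                 # center the data
C = np.cov(Xc, rowvar=False)            # covariance matrix of the original variables

eigvals, P = np.linalg.eigh(C)          # columns of P = eigenvectors (principal directions)
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
eigvals, P = eigvals[order], P[:, order]

Y = Xc @ P                              # transformed (decorrelated) data

print(np.round(np.cov(Y, rowvar=False), 4))  # ~diagonal: off-diagonal entries ~0
print(np.round(eigvals, 4))                  # variances along PC 1 and PC 2
```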
• Using SVD we can find the subspace that contains the centers.
• We discussed in Chapter 2 distances between two sample points from the same Gaussian as well as the distance between two sample points from two different Gaussians.
• Recall from Module-1 that if
Best-fit k-dimensional subspace
• It can be shown that the top k singular vectors produced by the SVD span the space of the k centers. The argument runs as follows.
• Recall that for a set of points, the best-fit line is the line passing through the origin that maximizes the sum of squared lengths of the projections of the points onto the line.
• First, for a single spherical Gaussian whose center is not the origin, the best-fit 1-dimensional subspace is the line through the center of the Gaussian and the origin.
• Next, the best-fit k-dimensional subspace for a single Gaussian whose center is not the origin is any k-dimensional subspace containing the line through the Gaussian's center and the origin.
• Thus, the SVD finds the subspace that contains the centers.
• For an infinite set of points drawn according to the mixture, the
k-dimensional SVD subspace gives exactly the space of the
centers.
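A small numerical illustration of this statement (a finite sample, so the containment is only approximate): draw points from a mixture of two spherical Gaussians in high dimension, take the top-2 right singular vectors of the data matrix, and check that the two centers lie close to the spanned subspace. The centers and dimensions below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 50, 5000                           # ambient dimension, points per Gaussian
mu1 = np.zeros(d); mu1[0] = 5.0           # two centers (arbitrary, for illustration)
mu2 = np.zeros(d); mu2[1] = 5.0

# Sample from a mixture of two unit-variance spherical Gaussians.
A = np.vstack([mu1 + rng.normal(size=(n, d)),
               mu2 + rng.normal(size=(n, d))])

# Top-2 right singular vectors span the (approximate) best-fit 2-D subspace.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V2 = Vt[:2].T                             # d x 2, orthonormal columns

def dist_to_subspace(x, V):
    """Distance from x to the subspace spanned by the columns of V."""
    return np.linalg.norm(x - V @ (V.T @ x))

for mu in (mu1, mu2):
    # Relative distance is small: each center lies nearly in the SVD subspace.
    print(dist_to_subspace(mu, V2) / np.linalg.norm(mu))
```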