
Algebraic Methods in Data Science: Lesson 1

Faculty of Industrial Engineering and Management


Technion - Israel Institute of Technology

Dan Garber
https://dangar.net.technion.ac.il/

Winter Semester 2020-2021


Introduction

Course staff:
1 Lecturer: Dan Garber (dangar@technion.ac.il)
2 TA in charge: Ido Botzer (idobotzer@campus.technion.ac.il)
3 TA: Or Markovetzki (ormar@campus.technion.ac.il)

Grade:
1 Homework: 15% of grade, binding ("takef": counts toward the final grade regardless of exam score)

7 assignments (best 6 out of 7)
Mostly theoretical questions, but also some programming in Python

2 Final exam: 85% of grade



Introduction
Topics:
1 Complementary material in linear algebra
2 The Singular Value Decomposition, algorithms and applications
3 Linear Systems and the Least Squares Problem, algorithms and applications

Importance: the material in this course is fundamental to data science,
from some of the most basic models and algorithms to the most advanced
ones. It is hard to overestimate its importance to modern DS/ML/AI.

Word of advice: this is a very challenging mathematical / algorithmic
course. You will have to be proficient in linear algebra to succeed.
Make as much effort as possible to keep up with it during the semester.
WORK HARD ON YOUR HOMEWORK.

Part I - Complementary material in linear algebra

Notions such as distance between points, angles between lines, or the
concept of two lines being perpendicular (orthogonal) to each other are
familiar from plane geometry learned in high school.

We will first develop similar notions for the more general and more
abstract linear (vector) spaces, and in particular for $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$.

This will in turn lead to the theory of eigenvalues and eigenvectors for real
matrices and (basically) to everything we will do in this course.



Norms

Norms generalize the notion of distance from plane geometry to linear
spaces.

A norm is a function that assigns a strictly positive length (or size) to each
vector in a linear space, except for the zero vector, which is assigned a
length of zero. In the following, let $\mathcal{X}$ be a linear space.

Definition
A function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ is a norm if
1 $\forall x \in \mathcal{X}$: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathcal{X}$: $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathcal{X}$: $\|\alpha x\| = |\alpha| \cdot \|x\|$ (homogeneity)

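To make the definition concrete, here is a short Python sketch (an addition to these notes, not part of the original slides) that spot-checks the three axioms numerically on random vectors. The helper name `check_norm_axioms`, the tolerances, and the use of NumPy are choices made here for illustration; a finite random check is evidence, not a proof.

```python
import numpy as np

def check_norm_axioms(norm, dim=5, trials=1000, seed=0):
    """Numerically spot-check the three norm axioms on random vectors."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        a = rng.standard_normal()
        assert norm(x) >= 0                                 # positivity
        assert norm(x + y) <= norm(x) + norm(y) + 1e-12     # triangle inequality
        assert np.isclose(norm(a * x), abs(a) * norm(x))    # homogeneity
    assert norm(np.zeros(dim)) == 0                         # ||0|| = 0
    return True

# Example: the Euclidean norm passes all checks.
check_norm_axioms(lambda x: np.sqrt(np.sum(x ** 2)))
```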

Norms - Example (p-norms)


Consider $\mathcal{X} = \mathbb{R}^n$. The family of $\ell_p$ norms is defined as follows:
$$\|x\|_p := \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \qquad 1 \le p \le \infty.$$
In particular, for $p = 2$ we get the standard Euclidean norm
$$\|x\|_2 := \sqrt{\sum_{i=1}^n x_i^2}.$$
For $p = 1$ we obtain the sum-of-absolute-values length (Manhattan distance)
$$\|x\|_1 := \sum_{i=1}^n |x_i|.$$
The limit $p = \infty$ exists; in this case we get the max-absolute-value norm
$$\|x\|_\infty := \lim_{p \to \infty} \|x\|_p = \max_{i \in \{1, \dots, n\}} |x_i|.$$
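As a quick illustration (added here, not from the original slides), the sketch below computes several $\ell_p$ norms directly from the definition and compares them against NumPy's built-in `numpy.linalg.norm`; the helper name `lp_norm` is an assumption of this sketch.

```python
import numpy as np

def lp_norm(x, p):
    """Compute the l_p norm of x from the definition; p = np.inf gives the max norm."""
    if np.isinf(p):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
for p in [1, 2, 3, np.inf]:
    # numpy.linalg.norm implements the same family of vector norms.
    assert np.isclose(lp_norm(x, p), np.linalg.norm(x, p))
    print(f"p = {p}: ||x||_p = {lp_norm(x, p):.4f}")
```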
Norms - Example (p-norms)
Theorem
Fix $\mathcal{X} = \mathbb{R}^n$. For any $p \in [1, \infty]$, $\|x\|_p := \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ is a norm.

Recall that in order to prove the theorem we need to show:
1 $\forall x \in \mathbb{R}^n$: $\|x\|_p \ge 0$, and $\|x\|_p = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathbb{R}^n$: $\|x + y\|_p \le \|x\|_p + \|y\|_p$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathbb{R}^n$: $\|\alpha x\|_p = |\alpha| \cdot \|x\|_p$ (homogeneity)

Note positivity holds trivially. Similarly, homogeneity holds since
$$\|\alpha x\|_p = \left( \sum_{i=1}^n |\alpha x_i|^p \right)^{1/p} = \left( |\alpha|^p \sum_{i=1}^n |x_i|^p \right)^{1/p} = |\alpha| \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} = |\alpha| \|x\|_p.$$

It remains to prove that the triangle inequality holds.



Proof of triangle inequality for p-norms


We begin with a warmup.
The case $p = 1$: for any $x, y \in \mathbb{R}^n$ we have
$$\|x + y\|_1 = \sum_{i=1}^n |x_i + y_i| \overset{(1)}{\le} \sum_{i=1}^n (|x_i| + |y_i|) = \|x\|_1 + \|y\|_1,$$
where (1) follows from the usual triangle inequality for scalars.
The case $p = 2$: simply the Euclidean norm (basic geometry).
The case $p = \infty$:
$$\|x + y\|_\infty = \max_{i \in [n]} |x_i + y_i| \overset{(1)}{\le} \max_{i \in [n]} (|x_i| + |y_i|) \le \max_i |x_i| + \max_j |y_j| = \|x\|_\infty + \|y\|_\infty,$$
where again, (1) follows from the triangle inequality for scalars.

Let's get to proving the general case, i.e., for all $p \in [1, \infty]$.

Proof of triangle inequality for p-norms

First, note that if $x = 0$ or $y = 0$ then the proof is trivial, since if w.l.o.g.
$x = 0$ we have $\|x + y\|_p = \|y\|_p = \|y\|_p + 0 = \|y\|_p + \|x\|_p$.

Consider now the case that $\|x\|_p + \|y\|_p = 1$. It suffices to show that
$\|x + y\|_p^p \le 1 = (\|x\|_p + \|y\|_p)^p$.

Definition: a function $g : \mathbb{R} \to \mathbb{R}$ is convex on an interval $(a, b)$ if for any
$x, y \in (a, b)$, $\lambda \in [0, 1]$ we have $g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y)$.

Fact: the scalar function $f(x) = |x|^p$ is convex on $(-\infty, \infty)$. That is, for
any $x, y \in \mathbb{R}$ and $\lambda \in [0, 1]$ it holds that
$$|\lambda x + (1 - \lambda) y|^p \le \lambda |x|^p + (1 - \lambda) |y|^p.$$



Proof of triangle inequality for p-norms
We assume $x, y \ne 0$, $\|x\|_p + \|y\|_p = 1$, and need to prove $\|x + y\|_p^p \le 1$.
Convexity of $|x|^p$: $\forall x, y \in \mathbb{R}, \lambda \in [0, 1]$: $|\lambda x + (1 - \lambda) y|^p \le \lambda |x|^p + (1 - \lambda) |y|^p$.
Let us denote $\lambda = \|x\|_p$. Note that $0 < \lambda < 1$ and that $\|y\|_p = 1 - \lambda$
(since $x, y \ne 0$ and $\|x\|_p < \|x\|_p + \|y\|_p = 1$).
$$\|x + y\|_p^p = \sum_{i=1}^n |x_i + y_i|^p = \sum_{i=1}^n \left| \lambda \frac{x_i}{\lambda} + (1 - \lambda) \frac{y_i}{1 - \lambda} \right|^p.$$
Applying convexity of $|x|^p$ for every $i \in [n]$ we have that
$$\|x + y\|_p^p \le \sum_{i=1}^n \left( \lambda \left| \frac{x_i}{\lambda} \right|^p + (1 - \lambda) \left| \frac{y_i}{1 - \lambda} \right|^p \right) = \lambda^{1-p} \sum_{i=1}^n |x_i|^p + (1 - \lambda)^{1-p} \sum_{i=1}^n |y_i|^p$$
$$= \lambda^{1-p} \|x\|_p^p + (1 - \lambda)^{1-p} \|y\|_p^p = \|x\|_p^{1-p} \|x\|_p^p + \|y\|_p^{1-p} \|y\|_p^p = \|x\|_p + \|y\|_p = 1.$$

And so we have proved the claim for the case $x, y \ne 0$, $\|x\|_p + \|y\|_p = 1$.

Proof of triangle inequality for p-norms

Finally, we need to consider the case $x, y \ne 0$ and $\|x\|_p + \|y\|_p \ne 1$.

In this case let us denote $M = \|x\|_p + \|y\|_p$. Using the homogeneity of
$\|\cdot\|_p$ (which we already proved) we have
$$\|x + y\|_p^p \le (\|x\|_p + \|y\|_p)^p \iff M^p \left\| \frac{x}{M} + \frac{y}{M} \right\|_p^p \le M^p \iff \left\| \frac{x}{M} + \frac{y}{M} \right\|_p^p \le 1.$$
Observe also that $\left\| \frac{x}{M} \right\|_p + \left\| \frac{y}{M} \right\|_p = \frac{1}{M} (\|x\|_p + \|y\|_p) = \frac{M}{M} = 1$.

Thus, we are back at the previous case.



Inner Product Spaces
An inner product is a function that associates any two vectors in a linear
space with a scalar value. It will be important for generalizing familiar
concepts from plane geometry, such as angles or orthogonality, and much
more, to abstract linear spaces.
Definition
An inner product on a (real) vector space $\mathcal{X}$ is a function which maps any
pair $x, y \in \mathcal{X}$ into a real scalar denoted by $\langle x, y \rangle$, which satisfies the
following axioms for any $x, y, z \in \mathcal{X}$ and scalar $\alpha \in \mathbb{R}$:
1 $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$ (positivity)
2 $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
3 $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ (homogeneity)
4 $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)

A vector space equipped with an inner product is called an inner product
space.

Example - the standard inner product defined in $\mathbb{R}^n$

The standard inner product defined in $\mathbb{R}^n$ is the "row-column" product of
two vectors:
$$\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i.$$

It is not difficult to show (try it for yourself) that it indeed satisfies the
inner product properties:
1 $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$ (positivity)
2 $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
3 $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ (homogeneity)
4 $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)

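As a sanity check (an added illustration, not from the slides), the following sketch verifies the four inner-product axioms numerically for the standard inner product on random vectors; the names and tolerances are choices made here.

```python
import numpy as np

rng = np.random.default_rng(42)
inner = lambda x, y: x @ y  # the standard "row-column" inner product on R^n

for _ in range(1000):
    x, y, z = rng.standard_normal((3, 4))
    a = rng.standard_normal()
    assert inner(x, x) >= 0                                        # positivity
    assert np.isclose(inner(x + y, z), inner(x, z) + inner(y, z))  # additivity
    assert np.isclose(inner(a * x, y), a * inner(x, y))            # homogeneity
    assert np.isclose(inner(x, y), inner(y, x))                    # symmetry
assert inner(np.zeros(4), np.zeros(4)) == 0                        # <0, 0> = 0
```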


The Cauchy-Schwarz Inequality

Theorem (Cauchy-Schwarz inequality)
For any $x, y \in \mathcal{X}$: $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$.

Proof: First, consider the case $\langle x, x \rangle = \langle y, y \rangle = 1$.

Using the inner-product properties:

$$0 \le \langle x - y, x - y \rangle = \langle x, x - y \rangle + \langle -y, x - y \rangle \quad \text{// positivity, additivity}$$
$$= \langle x - y, x \rangle - \langle x - y, y \rangle \quad \text{// symmetry, homogeneity}$$
$$= \langle x, x \rangle - \langle y, x \rangle - \langle x, y \rangle + \langle y, y \rangle \quad \text{// additivity, homogeneity}$$
$$= \langle x, x \rangle - 2\langle x, y \rangle + \langle y, y \rangle \quad \text{// symmetry}$$
$$= 2 - 2\langle x, y \rangle \quad \text{// assumption that } \langle x, x \rangle = \langle y, y \rangle = 1$$

Rearranging we indeed get: $\langle x, y \rangle \le 1 = \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$. Applying the same
argument with $-y$ in place of $y$ gives $-\langle x, y \rangle \le 1$, and hence $|\langle x, y \rangle| \le 1$.


The Cauchy-Schwarz Inequality


Recall we want to prove: for any $x, y \in \mathcal{X}$: $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$.
Proof cont.: we have proved the theorem for the case $\langle x, x \rangle = \langle y, y \rangle = 1$.
Let us now get to the remaining cases.
First, in case either $x = 0$ or $y = 0$, the theorem holds trivially, because
$\langle 0, y \rangle = \langle 0 \cdot 0, y \rangle = 0 \cdot \langle 0, y \rangle = 0$, and $\langle 0, 0 \rangle = 0$.

Assume now that both $x \ne 0$ and $y \ne 0$. Consider the normalized vectors:
$$\bar{x} = \frac{x}{\sqrt{\langle x, x \rangle}}, \qquad \bar{y} = \frac{y}{\sqrt{\langle y, y \rangle}}.$$
Clearly, $\langle \bar{x}, \bar{x} \rangle = \langle \bar{y}, \bar{y} \rangle = 1$. Then, using our result we have
$$\left| \left\langle \frac{x}{\sqrt{\langle x, x \rangle}}, \frac{y}{\sqrt{\langle y, y \rangle}} \right\rangle \right| = |\langle \bar{x}, \bar{y} \rangle| \le 1.$$
Since $\frac{1}{\sqrt{\langle x, x \rangle}} \cdot \frac{1}{\sqrt{\langle y, y \rangle}} \cdot |\langle x, y \rangle| = \left| \left\langle \frac{x}{\sqrt{\langle x, x \rangle}}, \frac{y}{\sqrt{\langle y, y \rangle}} \right\rangle \right|$,
rearranging we get the result.
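A short numerical illustration of the theorem (added here, not from the slides): the check below confirms the Cauchy-Schwarz inequality on many random pairs under the standard inner product.

```python
import numpy as np

rng = np.random.default_rng(7)
for _ in range(10_000):
    x, y = rng.standard_normal((2, 6))
    # |<x, y>| <= sqrt(<x, x> <y, y>), with a tiny tolerance for rounding.
    assert abs(x @ y) <= np.sqrt((x @ x) * (y @ y)) + 1e-12
print("Cauchy-Schwarz held on all sampled pairs.")
```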
Inner Products Induce Norms

Theorem
Let $\mathcal{X}$ be an inner product space. Then, the function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ given
by $\|x\| := \sqrt{\langle x, x \rangle}$ is a norm.

Recall we need to show:
1 $\forall x \in \mathcal{X}$: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathcal{X}$: $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathcal{X}$: $\|\alpha x\| = |\alpha| \cdot \|x\|$ (homogeneity)

The fact that $\forall x$: $\|x\| \ge 0$ and $\|x\| = 0 \Leftrightarrow x = 0$ follows directly from
the first property of inner products.
To prove homogeneity, fix some $x \in \mathcal{X}$ and scalar $\alpha \in \mathbb{R}$. We have
$$\|\alpha x\| = \sqrt{\langle \alpha x, \alpha x \rangle} \overset{(1)}{=} \sqrt{\alpha^2 \langle x, x \rangle} = |\alpha| \sqrt{\langle x, x \rangle} = |\alpha| \|x\|,$$
where (1) follows from homogeneity and symmetry of the inner product.
It remains to prove the triangle inequality.

To show $\|\cdot\|$ satisfies the triangle inequality, take $x, y \in \mathcal{X}$.

Using properties of the inner product we have:
$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle.$$

Using the Cauchy-Schwarz inequality we have $\langle x, y \rangle \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$. Thus,
$$\|x + y\|^2 \le \langle x, x \rangle + 2\sqrt{\langle x, x \rangle \langle y, y \rangle} + \langle y, y \rangle = \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

Hence, $\|\cdot\|$ satisfies the triangle inequality.
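As an added illustration beyond the standard case (not an example discussed in these slides), the sketch below uses a weighted inner product $\langle x, y \rangle_w = \sum_i w_i x_i y_i$ with strictly positive weights $w_i$, a standard example of an inner product on $\mathbb{R}^n$, and checks numerically that the induced norm indeed satisfies the triangle inequality; the names `inner_w` and `norm_w` are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.uniform(0.5, 2.0, size=5)  # strictly positive weights

def inner_w(x, y):
    """Weighted inner product <x, y>_w = sum_i w_i * x_i * y_i."""
    return np.sum(w * x * y)

def norm_w(x):
    """The norm induced by the weighted inner product."""
    return np.sqrt(inner_w(x, x))

for _ in range(1000):
    x, y = rng.standard_normal((2, 5))
    assert norm_w(x + y) <= norm_w(x) + norm_w(y) + 1e-12  # triangle inequality
```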

Standard Inner Product in $\mathbb{R}^n$ and Angles Between Vectors

The standard inner product in $\mathbb{R}^n$ ($\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i$) is related
to the notion of the angle between two vectors.
For any two non-zero vectors $x, y \in \mathbb{R}^n$, consider the triangle whose
vertices are the points $0, x, y$, and denote by $\theta$ the angle between the
edges $x - 0$ and $y - 0$.
Recall the law of cosines from plane geometry:
$$\|x - y\|_2^2 = \|x - 0\|_2^2 + \|y - 0\|_2^2 - 2\|x - 0\|_2 \|y - 0\|_2 \cos\theta = \|x\|_2^2 + \|y\|_2^2 - 2\|x\|_2 \|y\|_2 \cos\theta.$$
Also,
$$\|x - y\|_2^2 = (x - y)^\top (x - y) = x^\top x - x^\top y - y^\top x + y^\top y = \|x\|_2^2 + \|y\|_2^2 - 2 x^\top y.$$
Combining, we have $x^\top y = \|x\|_2 \|y\|_2 \cos\theta$. The angle between $x$ and $y$
is therefore given by $\cos\theta = \frac{x^\top y}{\|x\|_2 \|y\|_2}$.
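A minimal Python sketch (an addition for illustration) that computes the angle between two vectors from this formula; the clipping guards against floating-point values of $\cos\theta$ falling slightly outside $[-1, 1]$.

```python
import numpy as np

def angle_between(x, y):
    """Angle (in radians) between non-zero vectors x and y in R^n."""
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to [-1, 1] to guard against tiny floating-point overshoots.
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.degrees(angle_between(x, y)))  # 45.0 (up to rounding)
```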
Standard Inner Product in $\mathbb{R}^n$ and Angles Between Vectors

We have seen that for the standard inner product in $\mathbb{R}^n$ it holds for any
two vectors $x, y \in \mathbb{R}^n$ with angle $\theta$ between them that
$$x^\top y = \|x\|_2 \|y\|_2 \cos\theta, \qquad \cos\theta = \frac{x^\top y}{\|x\|_2 \|y\|_2}.$$

Since $\cos\theta \in [-1, 1]$, this implies that
$$|x^\top y| \le \|x\|_2 \|y\|_2.$$

Thus, we have reproved the Cauchy-Schwarz inequality for the special
case of the standard inner product in $\mathbb{R}^n$.


Orthogonality
Orthogonality generalizes the notion of two perpendicular lines from plane
geometry to abstract inner product spaces. It will be central to everything
we will do in this course.

Definition
Given an inner product space $\mathcal{X}$ and vectors $x, y \in \mathcal{X}$, we say that $x, y$
are orthogonal if $\langle x, y \rangle = 0$, and we write $x \perp y$.

Theorem (Pythagorean theorem)
Let $\mathcal{X}$ be an inner product space and let $x, y \in \mathcal{X}$ such that $x \perp y$. Then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2,$$
where $\|\cdot\|$ is the norm induced by the inner product.

Proof: Using properties of the inner product we have
$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 = \|x\|^2 + \|y\|^2,$$
where the last equality follows since $x \perp y$.
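A quick numeric illustration (added here, not from the slides) of the Pythagorean theorem for the standard inner product and its induced norm, the Euclidean norm:

```python
import numpy as np

x = np.array([3.0, 0.0, 0.0])
y = np.array([0.0, 4.0, 0.0])
assert np.isclose(x @ y, 0.0)  # x and y are orthogonal
# ||x + y||^2 = ||x||^2 + ||y||^2  (here 25 = 9 + 16)
assert np.isclose(np.linalg.norm(x + y) ** 2,
                  np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)
```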

Orthogonality

Definition
Given an inner product space $\mathcal{X}$ and non-zero vectors $x^{(1)}, \dots, x^{(n)}$ in $\mathcal{X}$,
we say that $x^{(1)}, \dots, x^{(n)}$ are mutually orthogonal if
$\langle x^{(i)}, x^{(j)} \rangle = 0$ for all $i \ne j$.

Theorem
Given an inner product space $\mathcal{X}$, any mutually orthogonal vectors
$x^{(1)}, \dots, x^{(n)}$ are linearly independent.

Recall $x^{(1)}, \dots, x^{(n)}$ are linearly independent if and only if
$$\sum_{i=1}^n \alpha_i x^{(i)} = 0 \;\Leftrightarrow\; \alpha_1 = \alpha_2 = \dots = \alpha_n = 0.$$


Proof: Suppose by contradiction that $x^{(1)}, \dots, x^{(n)}$ are linearly
dependent. Assume w.l.o.g. that $x^{(1)} = \sum_{i=2}^n \alpha_i x^{(i)}$, and that $\alpha_j \ne 0$ for
some $j \in \{2, \dots, n\}$. Then, since $x^{(1)}, x^{(j)}$ are orthogonal, we have that
$$0 = \langle x^{(1)}, x^{(j)} \rangle = \left\langle \sum_{i=2}^n \alpha_i x^{(i)}, x^{(j)} \right\rangle = \sum_{i=2}^n \alpha_i \langle x^{(i)}, x^{(j)} \rangle$$
$$= \sum_{i=2, i \ne j}^n \alpha_i \langle x^{(i)}, x^{(j)} \rangle + \alpha_j \langle x^{(j)}, x^{(j)} \rangle = \alpha_j \langle x^{(j)}, x^{(j)} \rangle \ne 0. \quad \{x^{(i)} \perp x^{(j)} \text{ for } i \ne j, \; \alpha_j \ne 0\}$$

We have arrived at a contradiction, and the vectors must be linearly independent.


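To see the theorem in action (an added sketch, using QR orthonormalization, a technique not covered in this lesson), the code below builds mutually orthogonal vectors and confirms linear independence by checking that the matrix stacking them as columns has full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
# Orthonormalize random vectors: the columns of Q are mutually orthogonal (and non-zero).
Q, _ = np.linalg.qr(rng.standard_normal((6, 4)))

# Mutual orthogonality: Q^T Q is the identity, so off-diagonal inner products vanish.
assert np.allclose(Q.T @ Q, np.eye(4))

# Linear independence: a matrix with linearly independent columns has full column rank.
assert np.linalg.matrix_rank(Q) == 4
```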
