
Algebraic Methods in Data Science: Lesson 1

Faculty of Industrial Engineering and Management


Technion - Israel Institute of Technology

Dan Garber
https://dangar.net.technion.ac.il/

Winter Semester 2020-2021


Introduction

Course staff:
1 Lecturer: Dan Garber (dangar@technion.ac.il)
2 TA in charge: Ido Botzer (idobotzer@campus.technion.ac.il)
3 TA: Or Markovetzki (ormar@campus.technion.ac.il)

Grade:
1 Homework: 15% of grade, binding ("takef": counts toward the final grade regardless of exam score)

7 assignments (best 6 out of 7)
Mostly theoretical questions, but also some programming in Python

2 Final exam: 85% of grade



Introduction
Topics:
1 Complementary material in linear algebra
2 The Singular Value Decomposition, algorithms and applications
3 Linear Systems and the Least Squares Problem, algorithms and applications

Importance: the material in this course is fundamental to data science,
from some of the most basic models and algorithms to the most advanced
ones. It is hard to overestimate its importance to modern DS/ML/AI.

Word of advice: this is a very challenging mathematical / algorithmic
course. You will have to be proficient in linear algebra to succeed.
Make as much effort as possible to keep up with it during the semester.
WORK HARD ON YOUR HOMEWORK.

Part I - Complementary material in linear algebra

Notions such as distance between points, angles between lines, or the
concept of two lines being perpendicular (orthogonal) to each other are
familiar from plane geometry learned in high school.

We will first develop similar notions for the more general and more
abstract linear (vector) spaces, and in particular for $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$.

This will in turn lead to the theory of eigenvalues and eigenvectors for real
matrices and (basically) to everything we will do in this course.



Norms

Norms generalize the notion of distance from plane geometry to linear
spaces.

A norm is a function that assigns a strictly positive length (or size) to each
vector in a linear space, except for the zero vector, which is assigned a
length of zero. In the following, let $\mathcal{X}$ be a linear space.

Definition
A function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ is a norm if
1 $\forall x \in \mathcal{X}$: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathcal{X}$: $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathcal{X}$: $\|\alpha x\| = |\alpha| \cdot \|x\|$ (homogeneity)

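To make the definition concrete, here is a short Python sketch (an addition to these notes, not part of the original slides) that spot-checks the three axioms numerically on random vectors. The helper name `check_norm_axioms`, the tolerances, and the use of NumPy are choices made here for illustration; a finite random check is evidence, not a proof.

```python
import numpy as np

def check_norm_axioms(norm, dim=5, trials=1000, seed=0):
    """Numerically spot-check the three norm axioms on random vectors."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        a = rng.standard_normal()
        assert norm(x) >= 0                                 # positivity
        assert norm(x + y) <= norm(x) + norm(y) + 1e-12     # triangle inequality
        assert np.isclose(norm(a * x), abs(a) * norm(x))    # homogeneity
    assert norm(np.zeros(dim)) == 0                         # ||0|| = 0
    return True

# Example: the Euclidean norm passes all checks.
check_norm_axioms(lambda x: np.sqrt(np.sum(x ** 2)))
```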

Norms - Example (p-norms)


Consider $\mathcal{X} = \mathbb{R}^n$. The family of $\ell_p$ norms is defined as follows:
$$\|x\|_p := \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \qquad 1 \le p \le \infty.$$
In particular, for $p = 2$ we get the standard Euclidean norm
$$\|x\|_2 := \sqrt{\sum_{i=1}^n x_i^2}.$$
For $p = 1$ we obtain the sum-of-absolute-values length (Manhattan distance)
$$\|x\|_1 := \sum_{i=1}^n |x_i|.$$
The limit $p = \infty$ exists; in this case we get the max-absolute-value norm
$$\|x\|_\infty := \lim_{p \to \infty} \|x\|_p = \max_{i \in \{1, \dots, n\}} |x_i|.$$
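As a quick illustration (added here, not from the original slides), the sketch below computes several $\ell_p$ norms directly from the definition and compares them against NumPy's built-in `numpy.linalg.norm`; the helper name `lp_norm` is an assumption of this sketch.

```python
import numpy as np

def lp_norm(x, p):
    """Compute the l_p norm of x from the definition; p = np.inf gives the max norm."""
    if np.isinf(p):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
for p in [1, 2, 3, np.inf]:
    # numpy.linalg.norm implements the same family of vector norms.
    assert np.isclose(lp_norm(x, p), np.linalg.norm(x, p))
    print(f"p = {p}: ||x||_p = {lp_norm(x, p):.4f}")
```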
Norms - Example (p-norms)
Theorem
Fix $\mathcal{X} = \mathbb{R}^n$. For any $p \in [1, \infty]$, $\|x\|_p := \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ is a norm.

Recall that in order to prove the theorem we need to show:
1 $\forall x \in \mathbb{R}^n$: $\|x\|_p \ge 0$, and $\|x\|_p = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathbb{R}^n$: $\|x + y\|_p \le \|x\|_p + \|y\|_p$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathbb{R}^n$: $\|\alpha x\|_p = |\alpha| \cdot \|x\|_p$ (homogeneity)

Note positivity holds trivially. Similarly, homogeneity holds since
$$\|\alpha x\|_p = \left( \sum_{i=1}^n |\alpha x_i|^p \right)^{1/p} = \left( |\alpha|^p \sum_{i=1}^n |x_i|^p \right)^{1/p} = |\alpha| \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} = |\alpha| \|x\|_p.$$

It remains to prove that the triangle inequality holds.



Proof of triangle inequality for p-norms


We begin with a warmup.
The case $p = 1$: for any $x, y \in \mathbb{R}^n$ we have
$$\|x + y\|_1 = \sum_{i=1}^n |x_i + y_i| \overset{(1)}{\le} \sum_{i=1}^n (|x_i| + |y_i|) = \|x\|_1 + \|y\|_1,$$
where (1) follows from the usual triangle inequality for scalars.
The case $p = 2$: simply the Euclidean norm (basic geometry).
The case $p = \infty$:
$$\|x + y\|_\infty = \max_{i \in [n]} |x_i + y_i| \overset{(1)}{\le} \max_{i \in [n]} (|x_i| + |y_i|) \le \max_i |x_i| + \max_j |y_j| = \|x\|_\infty + \|y\|_\infty,$$
where again, (1) follows from the triangle inequality for scalars.

Let's get to proving the general case, i.e., for all $p \in [1, \infty]$.

Proof of triangle inequality for p-norms

First, note that if $x = 0$ or $y = 0$ then the proof is trivial, since if w.l.o.g.
$x = 0$ we have $\|x + y\|_p = \|y\|_p = \|y\|_p + 0 = \|y\|_p + \|x\|_p$.

Consider now the case that $\|x\|_p + \|y\|_p = 1$. It suffices to show that
$\|x + y\|_p^p \le 1 = (\|x\|_p + \|y\|_p)^p$.

Definition: a function $g : \mathbb{R} \to \mathbb{R}$ is convex on an interval $(a, b)$ if for any
$x, y \in (a, b)$, $\lambda \in [0, 1]$ we have $g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y)$.

Fact: the scalar function $f(x) = |x|^p$ is convex on $(-\infty, \infty)$. That is, for
any $x, y \in \mathbb{R}$ and $\lambda \in [0, 1]$ it holds that
$$|\lambda x + (1 - \lambda) y|^p \le \lambda |x|^p + (1 - \lambda) |y|^p.$$



Proof of triangle inequality for p-norms
We assume $x, y \ne 0$, $\|x\|_p + \|y\|_p = 1$, and need to prove $\|x + y\|_p^p \le 1$.
Convexity of $|x|^p$: $\forall x, y \in \mathbb{R}, \lambda \in [0, 1]$: $|\lambda x + (1 - \lambda) y|^p \le \lambda |x|^p + (1 - \lambda) |y|^p$.
Let us denote $\lambda = \|x\|_p$. Note that $0 < \lambda < 1$ and that $\|y\|_p = 1 - \lambda$
(since $x, y \ne 0$ and $\|x\|_p < \|x\|_p + \|y\|_p = 1$).
$$\|x + y\|_p^p = \sum_{i=1}^n |x_i + y_i|^p = \sum_{i=1}^n \left| \lambda \frac{x_i}{\lambda} + (1 - \lambda) \frac{y_i}{1 - \lambda} \right|^p.$$
Applying convexity of $|x|^p$ for every $i \in [n]$ we have that
$$\|x + y\|_p^p \le \sum_{i=1}^n \left( \lambda \left| \frac{x_i}{\lambda} \right|^p + (1 - \lambda) \left| \frac{y_i}{1 - \lambda} \right|^p \right) = \lambda^{1-p} \sum_{i=1}^n |x_i|^p + (1 - \lambda)^{1-p} \sum_{i=1}^n |y_i|^p$$
$$= \lambda^{1-p} \|x\|_p^p + (1 - \lambda)^{1-p} \|y\|_p^p = \|x\|_p^{1-p} \|x\|_p^p + \|y\|_p^{1-p} \|y\|_p^p = \|x\|_p + \|y\|_p = 1.$$

And so we have proved the claim for the case $x, y \ne 0$, $\|x\|_p + \|y\|_p = 1$.

Proof of triangle inequality for p-norms

Finally, we need to consider the case $x, y \ne 0$ and $\|x\|_p + \|y\|_p \ne 1$.

In this case let us denote $M = \|x\|_p + \|y\|_p$. Using the homogeneity of
$\|\cdot\|_p$ (which we already proved) we have
$$\|x + y\|_p^p \le (\|x\|_p + \|y\|_p)^p \iff M^p \left\| \frac{x}{M} + \frac{y}{M} \right\|_p^p \le M^p \iff \left\| \frac{x}{M} + \frac{y}{M} \right\|_p^p \le 1.$$
Observe also that $\left\| \frac{x}{M} \right\|_p + \left\| \frac{y}{M} \right\|_p = \frac{1}{M} (\|x\|_p + \|y\|_p) = \frac{M}{M} = 1$.

Thus, we are back at the previous case.



Inner Product Spaces
An inner product is a function that associates any two vectors in a linear
space with a scalar value. It will be important for generalizing familiar
concepts from plane geometry, such as angles or orthogonality, and much
more, to abstract linear spaces.
Definition
An inner product on a (real) vector space $\mathcal{X}$ is a function which maps any
pair $x, y \in \mathcal{X}$ into a real scalar denoted by $\langle x, y \rangle$, which satisfies the
following axioms for any $x, y, z \in \mathcal{X}$ and scalar $\alpha \in \mathbb{R}$:
1 $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$ (positivity)
2 $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
3 $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ (homogeneity)
4 $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)

A vector space equipped with an inner product is called an inner product
space.

Example - the standard inner product defined in $\mathbb{R}^n$

The standard inner product defined in $\mathbb{R}^n$ is the "row-column" product of
two vectors:
$$\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i.$$

It is not difficult to show (try it for yourself) that it indeed satisfies the
inner product properties:
1 $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$ (positivity)
2 $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
3 $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ (homogeneity)
4 $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)

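As a sanity check (an added illustration, not from the slides), the following sketch verifies the four inner-product axioms numerically for the standard inner product on random vectors; the names and tolerances are choices made here.

```python
import numpy as np

rng = np.random.default_rng(42)
inner = lambda x, y: x @ y  # the standard "row-column" inner product on R^n

for _ in range(1000):
    x, y, z = rng.standard_normal((3, 4))
    a = rng.standard_normal()
    assert inner(x, x) >= 0                                        # positivity
    assert np.isclose(inner(x + y, z), inner(x, z) + inner(y, z))  # additivity
    assert np.isclose(inner(a * x, y), a * inner(x, y))            # homogeneity
    assert np.isclose(inner(x, y), inner(y, x))                    # symmetry
assert inner(np.zeros(4), np.zeros(4)) == 0                        # <0, 0> = 0
```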


The Cauchy-Schwarz Inequality

Theorem (Cauchy-Schwarz inequality)
For any $x, y \in \mathcal{X}$: $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$.

Proof: First, consider the case $\langle x, x \rangle = \langle y, y \rangle = 1$.

Using the inner-product properties:

$$0 \le \langle x - y, x - y \rangle = \langle x, x - y \rangle + \langle -y, x - y \rangle \quad \text{// positivity, additivity}$$
$$= \langle x - y, x \rangle - \langle x - y, y \rangle \quad \text{// symmetry, homogeneity}$$
$$= \langle x, x \rangle - \langle y, x \rangle - \langle x, y \rangle + \langle y, y \rangle \quad \text{// additivity, homogeneity}$$
$$= \langle x, x \rangle - 2\langle x, y \rangle + \langle y, y \rangle \quad \text{// symmetry}$$
$$= 2 - 2\langle x, y \rangle \quad \text{// assumption that } \langle x, x \rangle = \langle y, y \rangle = 1$$

Rearranging we indeed get: $\langle x, y \rangle \le 1 = \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$. Applying the same
argument with $-y$ in place of $y$ gives $-\langle x, y \rangle \le 1$, and hence $|\langle x, y \rangle| \le 1$.


The Cauchy-Schwarz Inequality


Recall we want to prove: for any $x, y \in \mathcal{X}$: $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \cdot \langle y, y \rangle}$.
Proof cont.: we have proved the theorem for the case $\langle x, x \rangle = \langle y, y \rangle = 1$.
Let us now get to the remaining cases.
First, in case either $x = 0$ or $y = 0$, the theorem holds trivially, because
$\langle 0, y \rangle = \langle 0 \cdot 0, y \rangle = 0 \cdot \langle 0, y \rangle = 0$, and $\langle 0, 0 \rangle = 0$.

Assume now that both $x \ne 0$ and $y \ne 0$. Consider the normalized vectors:
$$\bar{x} = \frac{x}{\sqrt{\langle x, x \rangle}}, \qquad \bar{y} = \frac{y}{\sqrt{\langle y, y \rangle}}.$$
Clearly, $\langle \bar{x}, \bar{x} \rangle = \langle \bar{y}, \bar{y} \rangle = 1$. Then, using our result we have
$$\left| \left\langle \frac{x}{\sqrt{\langle x, x \rangle}}, \frac{y}{\sqrt{\langle y, y \rangle}} \right\rangle \right| = |\langle \bar{x}, \bar{y} \rangle| \le 1.$$
Since $\frac{1}{\sqrt{\langle x, x \rangle}} \cdot \frac{1}{\sqrt{\langle y, y \rangle}} \cdot |\langle x, y \rangle| = \left| \left\langle \frac{x}{\sqrt{\langle x, x \rangle}}, \frac{y}{\sqrt{\langle y, y \rangle}} \right\rangle \right|$,
rearranging we get the result.
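A short numerical illustration of the theorem (added here, not from the slides): the check below confirms the Cauchy-Schwarz inequality on many random pairs under the standard inner product.

```python
import numpy as np

rng = np.random.default_rng(7)
for _ in range(10_000):
    x, y = rng.standard_normal((2, 6))
    # |<x, y>| <= sqrt(<x, x> <y, y>), with a tiny tolerance for rounding.
    assert abs(x @ y) <= np.sqrt((x @ x) * (y @ y)) + 1e-12
print("Cauchy-Schwarz held on all sampled pairs.")
```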
Inner Products Induce Norms

Theorem
Let $\mathcal{X}$ be an inner product space. Then, the function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ given
by $\|x\| := \sqrt{\langle x, x \rangle}$ is a norm.

Recall we need to show:
1 $\forall x \in \mathcal{X}$: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity)
2 $\forall x, y \in \mathcal{X}$: $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
3 $\forall \alpha \in \mathbb{R}, x \in \mathcal{X}$: $\|\alpha x\| = |\alpha| \cdot \|x\|$ (homogeneity)

The fact that $\forall x$: $\|x\| \ge 0$ and $\|x\| = 0 \Leftrightarrow x = 0$ follows directly from
the first property of inner products.
To prove homogeneity, fix some $x \in \mathcal{X}$ and scalar $\alpha \in \mathbb{R}$. We have
$$\|\alpha x\| = \sqrt{\langle \alpha x, \alpha x \rangle} \overset{(1)}{=} \sqrt{\alpha^2 \langle x, x \rangle} = |\alpha| \sqrt{\langle x, x \rangle} = |\alpha| \|x\|,$$
where (1) follows from homogeneity and symmetry of the inner product.
It remains to prove the triangle inequality.

To show $\|\cdot\|$ satisfies the triangle inequality, take $x, y \in \mathcal{X}$.

Using properties of the inner product we have:
$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle.$$

Using the Cauchy-Schwarz inequality we have $\langle x, y \rangle \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$. Thus,
$$\|x + y\|^2 \le \langle x, x \rangle + 2\sqrt{\langle x, x \rangle \langle y, y \rangle} + \langle y, y \rangle = \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

Hence, $\|\cdot\|$ satisfies the triangle inequality.
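As an added illustration beyond the standard case (not an example discussed in these slides), the sketch below uses a weighted inner product $\langle x, y \rangle_w = \sum_i w_i x_i y_i$ with strictly positive weights $w_i$, a standard example of an inner product on $\mathbb{R}^n$, and checks numerically that the induced norm indeed satisfies the triangle inequality; the names `inner_w` and `norm_w` are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.uniform(0.5, 2.0, size=5)  # strictly positive weights

def inner_w(x, y):
    """Weighted inner product <x, y>_w = sum_i w_i * x_i * y_i."""
    return np.sum(w * x * y)

def norm_w(x):
    """The norm induced by the weighted inner product."""
    return np.sqrt(inner_w(x, x))

for _ in range(1000):
    x, y = rng.standard_normal((2, 5))
    assert norm_w(x + y) <= norm_w(x) + norm_w(y) + 1e-12  # triangle inequality
```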

Standard Inner Product in $\mathbb{R}^n$ and Angles Between Vectors

The standard inner product in $\mathbb{R}^n$ ($\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i$) is related
to the notion of the angle between two vectors.
For any two non-zero vectors $x, y \in \mathbb{R}^n$, consider the triangle whose
vertices are the points $0, x, y$, and denote by $\theta$ the angle between the
edges $x - 0$ and $y - 0$.
Recall the law of cosines from plane geometry:
$$\|x - y\|_2^2 = \|x - 0\|_2^2 + \|y - 0\|_2^2 - 2\|x - 0\|_2 \|y - 0\|_2 \cos\theta = \|x\|_2^2 + \|y\|_2^2 - 2\|x\|_2 \|y\|_2 \cos\theta.$$
Also,
$$\|x - y\|_2^2 = (x - y)^\top (x - y) = x^\top x - x^\top y - y^\top x + y^\top y = \|x\|_2^2 + \|y\|_2^2 - 2 x^\top y.$$
Combining, we have $x^\top y = \|x\|_2 \|y\|_2 \cos\theta$. The angle between $x$ and $y$
is therefore given by $\cos\theta = \frac{x^\top y}{\|x\|_2 \|y\|_2}$.
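A minimal Python sketch (an addition for illustration) that computes the angle between two vectors from this formula; the clipping guards against floating-point values of $\cos\theta$ falling slightly outside $[-1, 1]$.

```python
import numpy as np

def angle_between(x, y):
    """Angle (in radians) between non-zero vectors x and y in R^n."""
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to [-1, 1] to guard against tiny floating-point overshoots.
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.degrees(angle_between(x, y)))  # 45.0 (up to rounding)
```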
Standard Inner Product in $\mathbb{R}^n$ and Angles Between Vectors

We have seen that for the standard inner product in $\mathbb{R}^n$ it holds for any
two vectors $x, y \in \mathbb{R}^n$ with angle $\theta$ between them that
$$x^\top y = \|x\|_2 \|y\|_2 \cos\theta, \qquad \cos\theta = \frac{x^\top y}{\|x\|_2 \|y\|_2}.$$

Since $\cos\theta \in [-1, 1]$, this implies that
$$|x^\top y| \le \|x\|_2 \|y\|_2.$$

Thus, we have reproved the Cauchy-Schwarz inequality for the special
case of the standard inner product in $\mathbb{R}^n$.


Orthogonality
Orthogonality generalizes the notion of two perpendicular lines from plane
geometry to abstract inner product spaces. It will be central to everything
we will do in this course.

Definition
Given an inner product space $\mathcal{X}$ and vectors $x, y \in \mathcal{X}$, we say that $x, y$
are orthogonal if $\langle x, y \rangle = 0$, and we write $x \perp y$.

Theorem (Pythagorean theorem)
Let $\mathcal{X}$ be an inner product space and let $x, y \in \mathcal{X}$ such that $x \perp y$. Then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2,$$
where $\|\cdot\|$ is the norm induced by the inner product.

Proof: Using properties of the inner product we have
$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 = \|x\|^2 + \|y\|^2,$$
where the last equality follows since $x \perp y$.
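A quick numeric illustration (added here, not from the slides) of the Pythagorean theorem for the standard inner product and its induced norm, the Euclidean norm:

```python
import numpy as np

x = np.array([3.0, 0.0, 0.0])
y = np.array([0.0, 4.0, 0.0])
assert np.isclose(x @ y, 0.0)  # x and y are orthogonal
# ||x + y||^2 = ||x||^2 + ||y||^2  (here 25 = 9 + 16)
assert np.isclose(np.linalg.norm(x + y) ** 2,
                  np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)
```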

Orthogonality

Definition
Given an inner product space $\mathcal{X}$ and non-zero vectors $x^{(1)}, \dots, x^{(n)}$ in $\mathcal{X}$,
we say that $x^{(1)}, \dots, x^{(n)}$ are mutually orthogonal if
$\langle x^{(i)}, x^{(j)} \rangle = 0$ for all $i \ne j$.

Theorem
Given an inner product space $\mathcal{X}$, any mutually orthogonal vectors
$x^{(1)}, \dots, x^{(n)}$ are linearly independent.

Recall $x^{(1)}, \dots, x^{(n)}$ are linearly independent if and only if
$$\sum_{i=1}^n \alpha_i x^{(i)} = 0 \;\Leftrightarrow\; \alpha_1 = \alpha_2 = \dots = \alpha_n = 0.$$


Proof: Suppose by contradiction that $x^{(1)}, \dots, x^{(n)}$ are linearly
dependent. Assume w.l.o.g. that $x^{(1)} = \sum_{i=2}^n \alpha_i x^{(i)}$, and that $\alpha_j \ne 0$ for
some $j \in \{2, \dots, n\}$. Then, since $x^{(1)}, x^{(j)}$ are orthogonal, we have that
$$0 = \langle x^{(1)}, x^{(j)} \rangle = \left\langle \sum_{i=2}^n \alpha_i x^{(i)}, x^{(j)} \right\rangle = \sum_{i=2}^n \alpha_i \langle x^{(i)}, x^{(j)} \rangle$$
$$= \sum_{i=2, i \ne j}^n \alpha_i \langle x^{(i)}, x^{(j)} \rangle + \alpha_j \langle x^{(j)}, x^{(j)} \rangle = \alpha_j \langle x^{(j)}, x^{(j)} \rangle \ne 0. \quad \{x^{(i)} \perp x^{(j)} \text{ for } i \ne j, \; \alpha_j \ne 0\}$$

We have arrived at a contradiction, and the vectors must be linearly independent.


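To see the theorem in action (an added sketch, using QR orthonormalization, a technique not covered in this lesson), the code below builds mutually orthogonal vectors and confirms linear independence by checking that the matrix stacking them as columns has full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
# Orthonormalize random vectors: the columns of Q are mutually orthogonal (and non-zero).
Q, _ = np.linalg.qr(rng.standard_normal((6, 4)))

# Mutual orthogonality: Q^T Q is the identity, so off-diagonal inner products vanish.
assert np.allclose(Q.T @ Q, np.eye(4))

# Linear independence: a matrix with linearly independent columns has full column rank.
assert np.linalg.matrix_rank(Q) == 4
```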
