
Lecture 2.1: Vector Calculus
CSC 84020 - Machine Learning
Andrew Rosenberg

February 5, 2009

Today

Last Time
Probability Review

Today
Vector Calculus

Background

Let's talk.
Linear Algebra
Vectors
Matrices
Basis Spaces
Eigenvectors/values?
Inversion and transposition

Calculus
Derivation
Integration

Vector Calculus
Gradients
Derivation w.r.t. a vector

Linear Algebra Basics

What is a vector?
What is a matrix?
Transposition
Adding matrices and vectors
Multiplying matrices.

Definitions

A vector is a one-dimensional array.

We denote vectors as either $\vec{x}$ or $\mathbf{x}$.
If we don't specify otherwise, assume x is a column vector:

$$ x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix} $$

Definitions

A matrix is a higher-dimensional array.

We typically denote matrices with capital letters, e.g., A.
If A is an n-by-m matrix, it has the following structure:

$$ A = \begin{pmatrix} a_{0,0} & a_{0,1} & \dots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \dots & a_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \dots & a_{n-1,m-1} \end{pmatrix} $$
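The indexing convention above maps directly onto array libraries. A minimal sketch in NumPy (the sizes and values here are illustrative, not from the slides):

```python
import numpy as np

n, m = 3, 4                          # illustrative sizes
x = np.arange(n).reshape(n, 1)       # column vector, shape (n, 1)
A = np.arange(n * m).reshape(n, m)   # n-by-m matrix with entries a_{i,j}

print(x.shape)   # (3, 1)
print(A.shape)   # (3, 4)
print(A[1, 2])   # entry a_{1,2}
```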

Matrix transposition

Transposing a matrix or vector swaps rows and columns.

A column vector becomes a row vector:

$$ x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}, \qquad x^T = \begin{pmatrix} x_0 & x_1 & \dots & x_{n-1} \end{pmatrix} $$

Matrix transposition
Transposing a matrix or vector swaps rows and columns.

$$ A = \begin{pmatrix} a_{0,0} & a_{0,1} & \dots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \dots & a_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \dots & a_{n-1,m-1} \end{pmatrix} $$

$$ A^T = \begin{pmatrix} a_{0,0} & a_{1,0} & \dots & a_{n-1,0} \\ a_{0,1} & a_{1,1} & \dots & a_{n-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,m-1} & a_{1,m-1} & \dots & a_{n-1,m-1} \end{pmatrix} $$

If A is n-by-m, then A^T is m-by-n.
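A quick NumPy check of transposition (an illustrative sketch, not part of the slides):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # 3-by-4 matrix
At = A.T                          # 4-by-3 matrix

print(A.shape, At.shape)          # (3, 4) (4, 3)
print(A[1, 2] == At[2, 1])        # True: (A^T)_{j,i} = a_{i,j}
```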

Adding Matrices

Matrices can only be added if they have the same dimensions.

$$ A + B = \begin{pmatrix} a_{0,0}+b_{0,0} & a_{0,1}+b_{0,1} & \dots & a_{0,m-1}+b_{0,m-1} \\ a_{1,0}+b_{1,0} & a_{1,1}+b_{1,1} & \dots & a_{1,m-1}+b_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0}+b_{n-1,0} & a_{n-1,1}+b_{n-1,1} & \dots & a_{n-1,m-1}+b_{n-1,m-1} \end{pmatrix} $$
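Elementwise matrix addition is one line in NumPy; the shapes must match (illustrative sketch):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[10., 20.], [30., 40.]])

print(A + B)            # [[11. 22.] [33. 44.]]
# Adding mismatched shapes raises an error (setting aside NumPy broadcasting,
# which the slides do not cover):
# A + np.zeros((3, 2))  # ValueError
```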

Multiplying matrices
To multiply two matrices, the inner dimensions must match.
An n-by-m matrix can be multiplied by an n'-by-m' matrix iff m = n'.

$$ AB = C, \qquad c_{ij} = \sum_{k=0}^{m-1} a_{ik} b_{kj} $$

That is, multiply the i-th row of A by the j-th column of B.

[Figure: matrix multiplication illustration, image from Wikipedia.]
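A sketch verifying the entry-wise definition c_ij = sum_k a_ik b_kj against NumPy's built-in matrix product (sizes and data are illustrative):

```python
import numpy as np

n, m, p = 2, 3, 4
A = np.random.rand(n, m)
B = np.random.rand(m, p)

# Triple loop, directly from the definition of c_ij
C = np.zeros((n, p))
for i in range(n):
    for j in range(p):
        for k in range(m):
            C[i, j] += A[i, k] * B[k, j]

print(np.allclose(C, A @ B))  # True
```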

Useful matrix operations

Inversion
Norm
Eigenvector decomposition

Matrix Inversion

The inverse of an n-by-m matrix A is denoted A^{-1}, and has the following property:

$$ A A^{-1} = I $$

where I is the identity matrix, an n-by-n matrix with I_{ij} = 1 iff i = j and 0 otherwise.
If A is a square matrix (i.e., n = m), then also

$$ A^{-1} A = I $$

What is the inverse of a vector? x^{-1} = ?

Some useful Matrix Inversion Properties

$$ (A^{-1})^{-1} = A $$
$$ (kA)^{-1} = k^{-1} A^{-1} $$
$$ (A^T)^{-1} = (A^{-1})^T $$
$$ (AB)^{-1} = B^{-1} A^{-1} $$
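These identities are easy to sanity-check numerically. A sketch, using a random well-conditioned (hence invertible) matrix as an illustrative example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)   # well-conditioned, invertible
B = rng.random((3, 3)) + 3 * np.eye(3)
k = 2.5

inv = np.linalg.inv
print(np.allclose(inv(inv(A)), A))               # (A^-1)^-1 = A
print(np.allclose(inv(k * A), inv(A) / k))       # (kA)^-1 = k^-1 A^-1
print(np.allclose(inv(A.T), inv(A).T))           # (A^T)^-1 = (A^-1)^T
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))  # (AB)^-1 = B^-1 A^-1
```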

The norm of a vector


The norm of a vector x is written ||x||.
The norm represents the Euclidean length of the vector:

$$ ||x|| = \sqrt{\sum_{i=0}^{n-1} x_i^2} = \sqrt{x_0^2 + x_1^2 + \dots + x_{n-1}^2} $$
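Three equivalent ways to compute the Euclidean norm (an illustrative sketch):

```python
import numpy as np

x = np.array([3.0, 4.0])
print(np.sqrt(np.sum(x ** 2)))   # 5.0, directly from the definition
print(np.linalg.norm(x))         # 5.0, built-in
print(np.sqrt(x @ x))            # 5.0, since ||x||^2 = x^T x
```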

Positive Definite/Semi-Definite

A positive definite matrix M has the property that

$$ x^T M x > 0 \quad \text{for all } x \neq 0. $$

A positive semi-definite matrix M has the property that

$$ x^T M x \geq 0 \quad \text{for all } x. $$
Why might we care about these matrices?
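One common way to check definiteness of a symmetric matrix is via its eigenvalues: all positive means positive definite, all non-negative means positive semi-definite. A minimal sketch (matrix chosen for illustration):

```python
import numpy as np

M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])          # symmetric
eigvals = np.linalg.eigvalsh(M)      # eigvalsh is for symmetric matrices
print(eigvals)                       # [1. 3.]
print(np.all(eigvals > 0))           # True -> positive definite

# Equivalently, x^T M x > 0 for any nonzero x; spot-check with a random x:
x = np.random.randn(2)
print(x @ M @ x > 0)                 # True
```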

Eigenvectors

For a square matrix A, an eigenvector is defined by

$$ A u_i = \lambda_i u_i $$

where u_i is an eigenvector and \lambda_i is its corresponding eigenvalue.
In general, eigenvalues are complex numbers, but if A is symmetric, they are real.
Eigenvectors describe how a matrix transforms a vector, and can
be used to define a basis space, namely the eigenspace.
Who cares? The eigenvectors of a covariance matrix have some
very interesting properties.
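An illustrative check of A u = \lambda u with NumPy's eigendecomposition, using a symmetric matrix so the eigenvalues come out real:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # columns of eigvecs are eigenvectors

for i in range(2):
    u = eigvecs[:, i]
    lam = eigvals[i]
    print(np.allclose(A @ u, lam * u))  # True
```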

Basis Spaces

Basis spaces allow vectors to be represented in different coordinate systems.

Our normal 2-dimensional basis space is generated by the vectors [0, 1] and [1, 0].
Any 2-d vector can be expressed as a linear combination of these two basis vectors.
However, any two non-colinear vectors can generate a 2-d basis space. In the coordinates of that basis space, the generating vectors act as the axes.

Basis Spaces

[Figure slides illustrating basis spaces.]

Why do we care?
Dimensionality reduction.
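A small sketch of expressing a vector in a different (non-standard) basis: solve B c = x for the coefficients c of x in the basis given by B's columns. The basis vectors here are illustrative:

```python
import numpy as np

b0 = np.array([1.0, 1.0])        # two non-colinear basis vectors
b1 = np.array([1.0, -1.0])
B = np.column_stack([b0, b1])

x = np.array([3.0, 1.0])
c = np.linalg.solve(B, x)        # coordinates of x in the new basis
print(c)                         # [2. 1.]
print(np.allclose(c[0] * b0 + c[1] * b1, x))  # True: x is a linear combination
```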

Calculus Basics

What is a derivative?
What is an integral?

Derivatives

A derivative, d/dx f(x), can be thought of as defining the slope of a function f(x). This is sometimes also written as f'(x).

Derivative Example
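A minimal numerical sketch of a derivative as a slope, approximating f'(x) with a centered finite difference for the illustrative choice f(x) = x^2:

```python
import numpy as np

def f(x):
    return x ** 2

def derivative(f, x, h=1e-6):
    # slope of the secant through (x-h, f(x-h)) and (x+h, f(x+h))
    return (f(x + h) - f(x - h)) / (2 * h)

print(derivative(f, 3.0))   # ~ 6.0, matching d/dx x^2 = 2x
```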

Integrals

Integrals are an inverse operation of the derivative (plus a constant):

$$ \int f(x)\,dx = F(x) + c, \qquad F'(x) = f(x) $$

An integral can be thought of as a calculation of the area under the curve defined by f(x).
A definite integral evaluates the area over a finite region. An indefinite integral is calculated over the range $(-\infty, \infty)$.

Integration Example
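A minimal numerical sketch of an integral as area under a curve, using a simple trapezoid rule for the illustrative choice f(x) = x^2 on [0, 1] (exact value 1/3):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)
ys = xs ** 2

# Trapezoid rule: sum of trapezoid areas between consecutive grid points
area = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))
print(area)                      # ~ 0.33333
print(abs(area - 1/3) < 1e-6)    # True
```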

Useful calculus operations

Product, quotient, summation rules for derivatives.


Useful integration and derivative identities.
Chain rule
Integration by parts
Variable substitution (don't forget the Jacobian!)

Calculus Identities
Summation rule
$$ g(x) = f_0(x) + f_1(x) \quad\Rightarrow\quad g'(x) = f_0'(x) + f_1'(x) $$
Product Rule
$$ g(x) = f_0(x) f_1(x) \quad\Rightarrow\quad g'(x) = f_0'(x) f_1(x) + f_0(x) f_1'(x) $$
Quotient Rule
$$ g(x) = \frac{f_0(x)}{f_1(x)} \quad\Rightarrow\quad g'(x) = \frac{f_0'(x) f_1(x) - f_0(x) f_1'(x)}{f_1^2(x)} $$

Calculus Identities

Constant multipliers
$$ g(x) = c f(x) \quad\Rightarrow\quad g'(x) = c f'(x) $$
Exponent Rule
$$ g(x) = f(x)^k \quad\Rightarrow\quad g'(x) = k f(x)^{k-1} f'(x) $$
Chain Rule
$$ g(x) = f_0(f_1(x)) \quad\Rightarrow\quad g'(x) = f_0'(f_1(x))\, f_1'(x) $$

Calculus Identities
Exponent Rule
$$ g(x) = e^x \quad\Rightarrow\quad g'(x) = e^x $$
$$ g(x) = k^x \quad\Rightarrow\quad g'(x) = \ln(k)\, k^x $$
Logarithm Rule
$$ g(x) = \ln(x) \quad\Rightarrow\quad g'(x) = \frac{1}{x} $$
$$ g(x) = \log_b(x) \quad\Rightarrow\quad g'(x) = \frac{1}{x \ln b} $$
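A quick symbolic spot-check of a few of the identities above, assuming SymPy is available (a sketch, not part of the original slides):

```python
import sympy as sp

x, k = sp.symbols('x k', positive=True)
f0, f1 = sp.Function('f0'), sp.Function('f1')

# Product rule: f0'(x)*f1(x) + f0(x)*f1'(x), in SymPy's Derivative notation
print(sp.diff(f0(x) * f1(x), x))
# Chain rule on a concrete composition
print(sp.diff(sp.sin(x ** 2), x))   # 2*x*cos(x**2)
# Exponentials and logarithms
print(sp.diff(k ** x, x))           # k**x*log(k)
print(sp.diff(sp.log(x), x))        # 1/x
```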

Calculus Operations

Integration by Parts
$$ \int f(x)\,\frac{dg(x)}{dx}\,dx = f(x)\,g(x) - \int g(x)\,\frac{df(x)}{dx}\,dx $$
Variable Substitution
$$ \int_a^b f(g(x))\, g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du $$
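A numerical check of the variable-substitution identity for a concrete, illustrative choice f(u) = cos(u), g(x) = x^2 on [a, b], using SciPy's quadrature:

```python
import numpy as np
from scipy.integrate import quad

f = np.cos
g = lambda x: x ** 2
g_prime = lambda x: 2 * x
a, b = 0.0, 2.0

lhs, _ = quad(lambda x: f(g(x)) * g_prime(x), a, b)   # left-hand side
rhs, _ = quad(f, g(a), g(b))                          # right-hand side
print(np.isclose(lhs, rhs))   # True; both equal sin(4) - sin(0)
```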

Vector Calculus

Derivation with respect to a vector or matrix.


Gradient of a vector.
Change of variables with a vector.

Derivation with respect to a vector

Given a vector x = (x_0, x_1, ..., x_{n-1})^T and a function f(x): R^n -> R, how can we find ∂f(x)/∂x?

$$ \frac{\partial f(x)}{\partial x} = \begin{pmatrix} \frac{\partial f(x)}{\partial x_0} \\ \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n-1}} \end{pmatrix} $$

This is also called the gradient of the function, and is often written ∇f(x) or ∇f.

Why might this be useful?
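A sketch of the gradient as a vector of partial derivatives, approximated with finite differences for the illustrative choice f(x) = x^T x (whose gradient is 2x):

```python
import numpy as np

def f(x):
    return float(x @ x)           # f(x) = sum_i x_i^2

def gradient(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)   # ∂f/∂x_i
    return g

x = np.array([1.0, -2.0, 3.0])
print(gradient(f, x))             # ~ [ 2. -4.  6.] = 2x
```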

Useful Vector Calculus identities

Given a vector x with |x| = n and a scalar variable y:

$$ \frac{\partial x}{\partial y} = \begin{pmatrix} \frac{\partial x_0}{\partial y} \\ \frac{\partial x_1}{\partial y} \\ \vdots \\ \frac{\partial x_{n-1}}{\partial y} \end{pmatrix} $$

Useful Vector Calculus identities

Given a vector x with |x| = n and a vector y with |y| = m:

$$ \frac{\partial x}{\partial y} = \begin{pmatrix} \frac{\partial x_0}{\partial y_0} & \frac{\partial x_0}{\partial y_1} & \dots & \frac{\partial x_0}{\partial y_{m-1}} \\ \frac{\partial x_1}{\partial y_0} & \frac{\partial x_1}{\partial y_1} & \dots & \frac{\partial x_1}{\partial y_{m-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_{n-1}}{\partial y_0} & \frac{\partial x_{n-1}}{\partial y_1} & \dots & \frac{\partial x_{n-1}}{\partial y_{m-1}} \end{pmatrix} $$

This n-by-m matrix of partial derivatives is the Jacobian.
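A sketch of this matrix of partials computed by finite differences, for the illustrative map x(y) = (y_0 + y_1, y_0 y_1, y_1^2):

```python
import numpy as np

def x_of_y(y):
    return np.array([y[0] + y[1], y[0] * y[1], y[1] ** 2])

def jacobian(fn, y, h=1e-6):
    n = len(fn(y))
    J = np.zeros((n, len(y)))
    for j in range(len(y)):
        e = np.zeros_like(y)
        e[j] = h
        J[:, j] = (fn(y + e) - fn(y - e)) / (2 * h)   # ∂x_i/∂y_j
    return J

y = np.array([2.0, 3.0])
print(jacobian(x_of_y, y))
# ~ [[1. 1.]
#    [3. 2.]
#    [0. 6.]]
```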

Vector Calculus Identities


Similar to the scalar multiplication rule:

$$ \frac{\partial}{\partial x}(x^T a) = \frac{\partial}{\partial x}(a^T x) = a $$

Similar to the product rule:

$$ \frac{\partial}{\partial x}(AB) = \frac{\partial A}{\partial x} B + A \frac{\partial B}{\partial x} $$

Derivative of a matrix inverse:

$$ \frac{\partial}{\partial x}\left(A^{-1}\right) = -A^{-1} \frac{\partial A}{\partial x} A^{-1} $$
Change of Variable in an Integral

$$ \int f(x)\,dx = \int f(u)\,\frac{\partial x}{\partial u}\,du $$
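A numerical spot-check of the first identity, ∂(a^T x)/∂x = a, reusing the finite-difference gradient idea from above (the vectors here are illustrative):

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])

def f(x):
    return float(a @ x)           # f(x) = a^T x

x0 = np.array([0.3, 0.7, -1.1])
h = 1e-6
grad = np.array([(f(x0 + h * np.eye(3)[i]) - f(x0 - h * np.eye(3)[i])) / (2 * h)
                 for i in range(3)])
print(np.allclose(grad, a))       # True
```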

Calculating the Expectation of a Gaussian

Now we have enough tools to calculate the expectation of a variable given a Gaussian distribution.

Recall:

$$ E[x \mid \mu, \sigma^2] = \int p(x \mid \mu, \sigma^2)\, x\, dx = \int \mathcal{N}(x \mid \mu, \sigma^2)\, x\, dx = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x\, dx $$

Calculating the Expectation of a Gaussian

Substituting u = x − μ, so du = dx:

$$ \begin{aligned} E[x \mid \mu, \sigma^2] &= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x\, dx \\ &= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) (u + \mu)\, du \\ &= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) u\, du + \mu \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) du \end{aligned} $$

Calculating the Expectation of a Gaussian

Since a probability density integrates to 1,

$$ \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) du = 1, $$

we have

$$ E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) u\, du + \mu. $$

Aside: A function is Odd iff f(−x) = −f(x).
Odd functions have the property ∫_{−∞}^{∞} f(x) dx = 0 (when the integral converges).

A function is Even iff f(−x) = f(x).

The product of an odd function and an even function is an odd function.

Calculating the Expectation of a Gaussian



$$ E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) u\, du + \mu $$

exp(−u²/(2σ²)) is even and u is odd, so their product exp(−u²/(2σ²)) u is odd. Therefore

$$ \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u^2\right) u\, du = 0 $$

and so

$$ E[x \mid \mu, \sigma^2] = \mu. $$
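A quick numerical check that E[x | μ, σ²] = μ, by integrating x · N(x | μ, σ²) with SciPy's quadrature (the values of μ and σ are illustrative):

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.7, 0.8

def gaussian(x):
    # N(x | mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

expectation, _ = quad(lambda x: gaussian(x) * x, -np.inf, np.inf)
print(expectation)                    # ~ 1.7
print(np.isclose(expectation, mu))    # True
```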

Why does Machine Learning need these tools?

Calculus
We need to find maximum likelihoods or minimum risks. This optimization is accomplished with derivatives.
Integration allows us to marginalize continuous probability density functions.

Linear Algebra
We will be working in high-dimensional spaces.
Vectors and matrices allow us to refer to high-dimensional points (groups of features) as vectors.
Matrices allow us to describe the feature space.

Why does machine learning need these tools?

Vector Calculus
We need to do all of the calculus operations in high-dimensional feature spaces.
We will want to optimize multiple values simultaneously: gradient descent.
We will need to take marginals over high-dimensional distributions: Gaussians.

Broader Context

What we have so far:


Entities in the world are represented as feature vectors and
maybe a label.
We want to construct statistical models of the feature vectors.
Finding the most likely model is an optimization problem.
Since the feature vectors may have more than one dimension,
linear algebra can help us work with them.

Bye

Next
Linear Regression
