Lecture 2.1: Vector Calculus
CSC 84020 - Machine Learning
Andrew Rosenberg
February 5, 2009
Today
Last Time
Probability Review
Today
Vector Calculus
Background
Let's talk.
Linear Algebra
Vectors
Matrices
Basis Spaces
Eigenvectors/values?
Inversion and transposition
Calculus
Differentiation
Integration
Vector Calculus
Gradients
Differentiation w.r.t. a vector
What is a vector?
What is a matrix?
Transposition
Adding matrices and vectors
Multiplying matrices.
Definitions
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}
Definitions
A = \begin{pmatrix}
  a_{0,0}   & a_{0,1}   & \dots  & a_{0,m-1}   \\
  a_{1,0}   & a_{1,1}   & \dots  & a_{1,m-1}   \\
  \vdots    & \vdots    & \ddots & \vdots      \\
  a_{n-1,0} & a_{n-1,1} & \dots  & a_{n-1,m-1}
\end{pmatrix}
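As a minimal NumPy sketch of these definitions (the particular sizes and values are illustrative, not from the slides):

import numpy as np

# A vector x with n = 3 entries (x_0, x_1, x_2), stored as a 1-D array.
x = np.array([1.0, 2.0, 3.0])

# An n-by-m matrix A (here 2-by-3); A[i, j] corresponds to a_{i,j}.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(x.shape)  # (3,)
print(A.shape)  # (2, 3)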
Matrix transposition
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}
\qquad
x^T = \begin{pmatrix} x_0 & x_1 & \dots & x_{n-1} \end{pmatrix}
Matrix transposition
Transposing a matrix or vector swaps rows and columns.
A column-vector becomes a row-vector
A = \begin{pmatrix}
  a_{0,0}   & a_{0,1}   & \dots  & a_{0,m-1}   \\
  a_{1,0}   & a_{1,1}   & \dots  & a_{1,m-1}   \\
  \vdots    & \vdots    & \ddots & \vdots      \\
  a_{n-1,0} & a_{n-1,1} & \dots  & a_{n-1,m-1}
\end{pmatrix}
\qquad
A^T = \begin{pmatrix}
  a_{0,0}   & a_{1,0}   & \dots  & a_{n-1,0}   \\
  a_{0,1}   & a_{1,1}   & \dots  & a_{n-1,1}   \\
  \vdots    & \vdots    & \ddots & \vdots      \\
  a_{0,m-1} & a_{1,m-1} & \dots  & a_{n-1,m-1}
\end{pmatrix}
If A is n-by-m, then A^T is m-by-n.
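A quick NumPy illustration of transposition (the matrix used is an arbitrary example):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2-by-3

# Transposition swaps rows and columns: A.T[j, i] == A[i, j].
print(A.T.shape)                  # (3, 2)
print(A.T[2, 1] == A[1, 2])       # True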
Adding Matrices
A + B = \begin{pmatrix}
  a_{0,0}+b_{0,0}     & a_{0,1}+b_{0,1}     & \dots  & a_{0,m-1}+b_{0,m-1}     \\
  a_{1,0}+b_{1,0}     & a_{1,1}+b_{1,1}     & \dots  & a_{1,m-1}+b_{1,m-1}     \\
  \vdots              & \vdots              & \ddots & \vdots                  \\
  a_{n-1,0}+b_{n-1,0} & a_{n-1,1}+b_{n-1,1} & \dots  & a_{n-1,m-1}+b_{n-1,m-1}
\end{pmatrix}
Multiplying matrices
To multiply two matrices, the inner dimensions must match.
An n-by-m matrix can be multiplied by an n'-by-m' matrix iff m = n'.
AB = C, \qquad c_{ij} = \sum_{k=0}^{m-1} a_{ik} b_{kj}
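A small NumPy sketch of matrix multiplication and the inner-dimension requirement (the matrices are arbitrary examples):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2-by-3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])             # 3-by-2: inner dimensions match (3 == 3)

C = A @ B                              # 2-by-2 result

# Check one entry against c_{ij} = sum_k a_{ik} b_{kj}
print(C[0, 1], sum(A[0, k] * B[k, 1] for k in range(3)))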
Inversion
Norm
Eigenvector decomposition
Matrix Inversion
(A^{-1})^{-1} = A
(kA)^{-1} = k^{-1} A^{-1}
(A^T)^{-1} = (A^{-1})^T
(AB)^{-1} = B^{-1} A^{-1}
Norm
\|x\| = \sqrt{x_0^2 + x_1^2 + \dots + x_{n-1}^2}
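A hedged numerical check of two of the inversion identities and of the norm (random matrices chosen only for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))      # True

# (A^T)^{-1} = (A^{-1})^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))   # True

# Euclidean norm: sqrt(x_0^2 + ... + x_{n-1}^2)
x = np.array([3.0, 4.0])
print(np.linalg.norm(x))                                     # 5.0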
Positive Definite/Semi-Definite
Eigenvectors
Basis Spaces
Why do we care?
Dimensionality reduction.
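The slides only name these topics, but here is a minimal sketch of why eigenvectors matter for dimensionality reduction: eigendecompose a symmetric covariance matrix and keep the top eigenvectors as a new basis (the data and the choice of two components are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))      # 100 points in a 3-dimensional feature space
X = X - X.mean(axis=0)                 # center the data

C = X.T @ X / X.shape[0]               # 3-by-3 covariance matrix (symmetric)
vals, vecs = np.linalg.eigh(C)         # eigenvalues ascending, eigenvectors as columns

# Keep the 2 eigenvectors with the largest eigenvalues as a new basis,
# and project the data onto it (3-D -> 2-D dimensionality reduction).
basis = vecs[:, -2:]
X_reduced = X @ basis
print(X_reduced.shape)                 # (100, 2)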
Calculus Basics
What is a derivative?
What is an integral?
Derivatives
A derivative, \frac{d}{dx} f(x), can be thought of as defining the slope of a function f(x). This is sometimes also written as f'(x).
Derivative Example
Integrals
Integration Example
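The worked examples do not survive in this transcript; as a purely numerical sketch of both ideas, assuming the toy function f(x) = x^2:

import numpy as np

f = lambda x: x ** 2

# Derivative as slope: a central finite difference approximates d/dx f(x).
h = 1e-5
x0 = 3.0
slope = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(slope)                           # ~6.0, matching f'(x) = 2x

# Integral as area under the curve, via the trapezoidal rule on [0, 1].
xs = np.linspace(0.0, 1.0, 1001)
area = np.trapz(f(xs), xs)
print(area)                            # ~1/3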
Calculus Identities
Summation rule
g(x) = f_0(x) + f_1(x)
g'(x) = f_0'(x) + f_1'(x)
Product Rule
g(x) = f_0(x) f_1(x)
g'(x) = f_0'(x) f_1(x) + f_0(x) f_1'(x)
Quotient Rule
g(x) = \frac{f_0(x)}{f_1(x)}
g'(x) = \frac{f_0'(x) f_1(x) - f_0(x) f_1'(x)}{f_1(x)^2}
Calculus Identities
Constant multipliers
g(x) = c f(x)
g'(x) = c f'(x)
Exponent Rule
g(x) = f(x)^k
g'(x) = k f(x)^{k-1} f'(x)
Chain Rule
g(x) = f_0(f_1(x))
g'(x) = f_0'(f_1(x)) f_1'(x)
Calculus Identities
Exponent Rule
g(x) = e^x
g'(x) = e^x
g(x) = k^x
g'(x) = \ln(k) k^x
Logarithm Rule
g(x) = \ln(x)
g'(x) = \frac{1}{x}
g(x) = \log_b(x)
g'(x) = \frac{1}{x \ln b}
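A sanity-check sketch of the product and chain rules by finite differences, assuming the arbitrary choices f_0 = sin and f_1 = exp:

import numpy as np

f0, d_f0 = np.sin, np.cos
f1, d_f1 = np.exp, np.exp
x0, h = 0.7, 1e-6

# Product rule: (f0 f1)'(x) = f0'(x) f1(x) + f0(x) f1'(x)
g = lambda x: f0(x) * f1(x)
numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
analytic = d_f0(x0) * f1(x0) + f0(x0) * d_f1(x0)
print(np.isclose(numeric, analytic))   # True

# Chain rule: (f0(f1))'(x) = f0'(f1(x)) f1'(x)
g = lambda x: f0(f1(x))
numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
analytic = d_f0(f1(x0)) * d_f1(x0)
print(np.isclose(numeric, analytic))   # True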
Calculus Operations
Integration by Parts
\int f(x) \frac{dg(x)}{dx} \, dx = f(x) g(x) - \int g(x) \frac{df(x)}{dx} \, dx
Variable Substitution
\int_a^b f(g(x)) g'(x) \, dx = \int_{g(a)}^{g(b)} f(x) \, dx
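A numerical check of variable substitution, assuming the illustrative choices f(x) = cos(x), g(x) = x^2, and [a, b] = [0, 1.5]:

import numpy as np

f = np.cos
g = lambda x: x ** 2
dg = lambda x: 2 * x
a, b = 0.0, 1.5

# Left side: integrate f(g(x)) g'(x) over [a, b].
xs = np.linspace(a, b, 10001)
lhs = np.trapz(f(g(xs)) * dg(xs), xs)

# Right side: integrate f(u) over [g(a), g(b)].
us = np.linspace(g(a), g(b), 10001)
rhs = np.trapz(f(us), us)

print(np.isclose(lhs, rhs, atol=1e-6))  # True; both equal sin(2.25)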
Vector Calculus
The derivative of a scalar function with respect to a vector x is the vector of partial derivatives:
\frac{\partial f(x)}{\partial x} = \begin{pmatrix} \frac{\partial f(x)}{\partial x_0} \\ \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n-1}} \end{pmatrix}
\qquad
\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y}{\partial x_0} \\ \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_{n-1}} \end{pmatrix}
For a vector-valued y = (y_0, \dots, y_{m-1}), the derivative is the matrix of partial derivatives:
\frac{\partial y}{\partial x} = \begin{pmatrix}
  \frac{\partial y_0}{\partial x_0}     & \frac{\partial y_1}{\partial x_0}     & \dots  & \frac{\partial y_{m-1}}{\partial x_0}     \\
  \frac{\partial y_0}{\partial x_1}     & \frac{\partial y_1}{\partial x_1}     & \dots  & \frac{\partial y_{m-1}}{\partial x_1}     \\
  \vdots                                & \vdots                                & \ddots & \vdots                                    \\
  \frac{\partial y_0}{\partial x_{n-1}} & \frac{\partial y_1}{\partial x_{n-1}} & \dots  & \frac{\partial y_{m-1}}{\partial x_{n-1}}
\end{pmatrix}
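A finite-difference sketch of the derivative with respect to a vector, assuming the example function f(x) = x^T A x, whose gradient is (A + A^T) x:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ A @ v                  # scalar function of a vector

# Gradient: vector of partial derivatives df/dx_i, via central differences.
h = 1e-6
grad = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = h
    grad[i] = (f(x + e) - f(x - e)) / (2 * h)

print(np.allclose(grad, (A + A.T) @ x))  # True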
Derivative of a matrix product
\frac{\partial}{\partial x}(AB) = \frac{\partial A}{\partial x} B + A \frac{\partial B}{\partial x}
Derivative of a matrix inverse
\frac{\partial}{\partial x}(A^{-1}) = -A^{-1} \frac{\partial A}{\partial x} A^{-1}
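A numerical check of the matrix-inverse identity, assuming the illustrative parameterization A(t) = A_0 + t A_1 so that \partial A / \partial t = A_1:

import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # keep A(t) well-conditioned
A1 = rng.standard_normal((3, 3))
A = lambda t: A0 + t * A1                          # dA/dt = A1
t0, h = 0.2, 1e-6

# Finite-difference derivative of A(t)^{-1} at t0.
numeric = (np.linalg.inv(A(t0 + h)) - np.linalg.inv(A(t0 - h))) / (2 * h)

# Analytic identity: d(A^{-1})/dt = -A^{-1} (dA/dt) A^{-1}
Ainv = np.linalg.inv(A(t0))
analytic = -Ainv @ A1 @ Ainv

print(np.allclose(numeric, analytic, atol=1e-5))   # True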
Change of Variable in an Integral
\int f(x) \, dx = \int f(u) \frac{\partial x}{\partial u} \, du
Example: Expectation of a Gaussian
E[x \mid \mu, \sigma^2] = \int p(x \mid \mu, \sigma^2) \, x \, dx
                        = \int N(x \mid \mu, \sigma^2) \, x \, dx
                        = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x \, dx
Substitute u = x - \mu, so du = dx:
E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) (u + \mu) \, du
                        = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) u \, du + \mu \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) du
The density integrates to one:
\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) du = 1
so
E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) u \, du + \mu
\exp\left(-\frac{1}{2\sigma^2}u^2\right) is even and u is odd, so their product \exp\left(-\frac{1}{2\sigma^2}u^2\right) u is odd, and
\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}u^2\right) u \, du = 0
Therefore
E[x \mid \mu, \sigma^2] = \mu
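A quick numerical sanity check of this result, assuming the illustrative values \mu = 1.5 and \sigma = 2.0:

import numpy as np

mu, sigma = 1.5, 2.0
rng = np.random.default_rng(0)

# Monte Carlo estimate of E[x] under N(x | mu, sigma^2).
samples = rng.normal(loc=mu, scale=sigma, size=1_000_000)
print(samples.mean())                  # ~1.5

# Numerical integral of N(x | mu, sigma^2) * x over a wide range.
xs = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
pdf = np.exp(-0.5 * ((xs - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)
print(np.trapz(pdf * xs, xs))          # ~1.5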
Calculus
We need to find maximum likelihoods or minimum risks. This
optimization is accomplished with derivatives.
Integration allows us to marginalize continuous probability
density functions.
Linear Algebra
We will be working in high-dimensional spaces.
Vectors and matrices allow us to refer to high-dimensional points (groups of features) as vectors.
Matrices allow us to describe the feature space.
Vector Calculus
We need to do all of the calculus operations in high-dimensional feature spaces.
We will want to optimize multiple values simultaneously: Gradient Descent.
We will need to take a marginal over high-dimensional distributions: Gaussians.
Broader Context
Bye
Next
Linear Regression