01 - Lab Notes
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor(s).
This recitation serves as a recap of useful mathematical definitions and tools for VNAV 2020, and for research
in robotics and computer vision in general. It covers basic linear algebra (Section 1.1) and matrix calculus
(Section 1.2). These notes have benefited from [2, 1].
Notations. We use lowercase characters (e.g., s ∈ R, C) to denote real and complex scalars, bold lowercase
characters (e.g., v ∈ R^n, C^n) for real and complex vectors, and bold uppercase characters (e.g., M ∈
R^{m×n}, C^{m×n}) for real and complex matrices. v_i denotes the i-th scalar entry of vector v, and M_{ij} denotes
the scalar entry of matrix M at the i-th row and j-th column. For a square matrix M ∈ R^{n×n}, tr(M) = \sum_{i=1}^{n} M_{ii}
denotes the trace of M, and det(M) denotes the determinant of M. For any matrix M ∈ R^{m×n}, denote
vec(M) ∈ R^{mn} as the column-wise vectorization of M obtained by vertically stacking its columns. We use S^n to
denote the set of real symmetric matrices of size n × n. For any vector v ∈ R^n, diag(v) creates a diagonal
matrix V ∈ S^n with diagonal entries V_{ii} = v_i, i = 1, . . . , n.
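For readers who want to experiment numerically, here is a minimal sketch (assuming NumPy, which is not otherwise used in these notes) of how the notation above maps to standard routines; note that the column-wise vec(·) corresponds to flattening in column-major ("F") order.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Trace and determinant of a square matrix.
print(np.trace(M))         # tr(M) = 1 + 4 = 5
print(np.linalg.det(M))    # det(M) = 1*4 - 2*3 = -2

# Column-wise vectorization vec(M): stack the columns vertically.
vec_M = M.flatten(order="F")   # [1, 3, 2, 4]
print(vec_M)

# diag(v) builds a diagonal (hence symmetric) matrix from a vector.
v = np.array([1.0, 2.0, 3.0])
print(np.diag(v))
```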
1.1.1 Norms
Inner Product. The standard inner product on Rn , the set of n-dimensional real vectors, is defined as:
⟨x, y⟩ = x^T y = \sum_{i=1}^{n} x_i y_i.  (1.1)
The standard inner product on Rm×n , the set of m × n real matrices, is defined as:
⟨X, Y⟩ = tr(X^T Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} Y_{ij}.  (1.2)
Vector Norms. Let us first introduce the definition of a general vector norm in R^n.
Definition 1 (Vector Norm). A function f : R^n → R is a vector norm if: (i) f(x) ≥ 0 for all x ∈ R^n, and
f(x) = 0 if and only if x = 0; (ii) f(αx) = |α| f(x) for all α ∈ R and x ∈ R^n; (iii) f(x + y) ≤ f(x) + f(y)
for all x, y ∈ R^n (triangle inequality).
When f satisfies Definition 1, f is called a norm function and is typically denoted as ‖·‖. We use ‖x‖_p to
denote the ℓ_p norm of a vector x ∈ R^n. When p ≥ 1, ‖x‖_p is defined as:
‖x‖_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.  (1.3)
For applications of the ℓ_∞ norm in computer vision, one can refer to [5, 4] and the CVPR 2018 tutorial.
Exercise: Verify the three norms (ℓ_1, ℓ_2, ℓ_∞) satisfy the properties in Definition 1.
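The following sketch (again assuming NumPy) computes the ℓ_1, ℓ_2, and ℓ_∞ norms and checks them against the definition in eq. (1.3); lp_norm is a hypothetical helper written only for illustration.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l_p norms from NumPy ...
l1 = np.linalg.norm(x, 1)         # sum of absolute values = 8
l2 = np.linalg.norm(x, 2)         # Euclidean norm = sqrt(26)
linf = np.linalg.norm(x, np.inf)  # max absolute value = 4

# ... and directly from eq. (1.3) for finite p.
def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

assert np.isclose(l1, lp_norm(x, 1))
assert np.isclose(l2, lp_norm(x, 2))
# The l_inf norm is the limit of the l_p norm as p grows.
print(l1, l2, linf, lp_norm(x, 100))  # lp_norm(x, 100) is approximately 4
```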
Angle. The angle between two nonzero vectors x, y ∈ R^n is defined as:

∠(x, y) = arccos\left( \frac{x^T y}{‖x‖_2 ‖y‖_2} \right),  (1.4)

where we take arccos(·) ∈ [0, π]. We say x and y are orthogonal when x^T y = 0. In machine learning, cosine
similarity, i.e., the cosine of the angle ∠(x, y), is often used to measure the similarity of two vectors x and y.
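A minimal sketch, assuming NumPy, of the angle and cosine similarity in eq. (1.4); the clip call only guards against round-off pushing the cosine slightly outside [−1, 1].

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 1.0])

# Cosine of the angle between x and y, as in eq. (1.4).
cos_angle = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))

print(np.degrees(angle))  # 60 degrees for these two vectors
print(np.isclose(x @ np.array([0.0, 1.0, 0.0]), 0.0))  # x is orthogonal to e_2
```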
Frobenius Norm. The most common norm on R^{m×n} is the Frobenius norm. For X ∈ R^{m×n}, its Frobenius
norm is defined as:

‖X‖_F = \sqrt{tr(X^T X)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij}^2} = ‖vec(X)‖_2.  (1.5)
Operator Norm. Suppose ‖·‖_p (p ≥ 1) is a norm on R^n and R^m; then we can define the operator norm
(induced norm) of X ∈ R^{m×n} as:

‖X‖_p = \sup_{v ∈ R^n, ‖v‖_p ≤ 1} ‖Xv‖_p.  (1.6)

A special case of the operator norm is ‖X‖_2 = σ_max(X), where σ_max(X) denotes the maximum singular
value of X. In general, ‖X‖_p is NP-hard to compute for p ∉ {1, 2, ∞}.
Exercise: Verify the matrix operator norm defined in eq. (1.6) satisfies Definition 1.
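As a numerical sanity check (a sketch assuming NumPy), the snippet below evaluates the three equivalent expressions for the Frobenius norm in eq. (1.5) and verifies that the operator 2-norm equals the maximum singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

# Frobenius norm: three equivalent expressions from eq. (1.5).
fro1 = np.linalg.norm(X, "fro")
fro2 = np.sqrt(np.trace(X.T @ X))
fro3 = np.linalg.norm(X.flatten(order="F"), 2)
assert np.allclose([fro1, fro2], fro3)

# Operator 2-norm equals the maximum singular value.
op2 = np.linalg.norm(X, 2)
sigma_max = np.linalg.svd(X, compute_uv=False)[0]
assert np.isclose(op2, sigma_max)
print(fro1, op2)
```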
Kronecker Product. If A ∈ R^{m×n} and B ∈ R^{p×q}, then the Kronecker product A ⊗ B is defined as:

A ⊗ B = \begin{bmatrix} A_{11}B & \cdots & A_{1n}B \\ \vdots & \ddots & \vdots \\ A_{m1}B & \cdots & A_{mn}B \end{bmatrix} ∈ R^{mp×nq}.  (1.9)
Useful Equalities. The following equalities can be useful when manipulating mathematical equations:
(i) If A, B ∈ R^{n×n}, then tr(A^T B) = vec(A)^T vec(B).
(iv) Let X ∈ R^{m×n}; then its (i, j)-th entry can be written as:

X_{ij} = e_i^T X e_j = tr(e_i^T X e_j) = tr(X e_j e_i^T),  (1.13)

where e_i ∈ R^m is the i-th standard basis vector (1 at the i-th entry and 0 everywhere else), and e_j ∈ R^n
is the j-th standard basis vector (1 at the j-th entry and 0 everywhere else).
The interested reader can refer to the supplementary material of [3] for an application of the equalities above
to solving a problem in computer vision.
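A short sketch (assuming NumPy) that verifies equalities (i) and (iv) numerically and checks the dimensions of a Kronecker product; vec is a hypothetical helper matching the column-wise vectorization used in these notes.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

def vec(M):
    """Column-wise vectorization of a matrix."""
    return M.flatten(order="F")

# Equality (i): tr(A^T B) = vec(A)^T vec(B).
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))

# Equality (iv): X_ij = e_i^T X e_j = tr(X e_j e_i^T).
X = rng.standard_normal((4, 5))
i, j = 2, 3
e_i = np.eye(4)[:, i]
e_j = np.eye(5)[:, j]
assert np.isclose(X[i, j], e_i @ X @ e_j)
assert np.isclose(X[i, j], np.trace(X @ np.outer(e_j, e_i)))

# Kronecker product shape check: (3x3) kron (4x5) gives (12x15).
print(np.kron(A, X).shape)
```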
An orthogonal matrix is a real square matrix whose rows and columns are orthonormal vectors (orthogonal
and unit norm). Formally, let Q ∈ R^{n×n}; then Q is an orthogonal matrix if and only if:

Q^T Q = Q Q^T = I_n.  (1.14)

We use O(n), the n-dimensional orthogonal group, to denote the set of orthogonal matrices with size n × n.
An orthogonal matrix has determinant equal to either +1 or −1, which can be easily seen from:

det(Q^T Q) = (det(Q))^2 = det(I_n) = 1  =⇒  det(Q) = ±1.  (1.15)
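A hedged sketch (assuming NumPy) that builds a random orthogonal matrix from a QR factorization and checks the properties above.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random orthogonal matrix from the QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

assert np.allclose(Q.T @ Q, np.eye(4))  # columns are orthonormal
assert np.allclose(Q @ Q.T, np.eye(4))  # rows are orthonormal
print(np.linalg.det(Q))                 # +1 or -1, as in eq. (1.15)
```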
Given a square matrix A ∈ Cn×n , if there exists a scalar λ ∈ C and a nonzero vector v ∈ Cn such that:
Av = λv, (1.18)
then v is called a right eigenvector of A and λ is the associated eigenvalue. If there exists a scalar κ ∈ C
and a nonzero vector u ∈ Cn such that:
uT A = κuT , (1.19)
then u is called a left eigenvector of A with associated eigenvalue κ. If A has n linearly independent right
eigenvectors, then denoting V = [v1, . . . , vn] and Λ = diag([λ1, . . . , λn]), we have:
AV = V Λ =⇒ V −1 AV = Λ, (1.20)
i.e., A is diagonalizable by V .
As an exercise, try to prove the following lemmas.
Lemma 2 (Eigenvalues and Characteristic Polynomial). The left and right eigenvalues of any matrix A ∈ C^{n×n}
coincide, and they are the roots of the characteristic polynomial f(λ) = det(A − λI_n).
Lemma 3 (Real Symmetric Matrices have Real Eigenvalues). The eigenvalues of any real symmetric matrix
A ∈ S n are all real. Hence, the eigenvalues can be sorted: λ1 ≥ . . . ≥ λn .
Lemma 4 (Real Symmetric Matrices have Orthogonal Eigenvectors). Let A ∈ S n be a real symmetric
matrix and let λi ≠ λj be any two distinct eigenvalues with associated eigenvectors vi and vj; then vi^T vj = 0.
Moreover, if λi is a repeated eigenvalue with multiplicity m ≥ 2, then there exist m orthonormal eigenvectors
corresponding to λi .
Corollary 5 (Real Symmetric Matrices are Diagonalizable). Any real symmetric matrix A ∈ S n can be
diagonalized as:
A = U ΛU T , (1.21)
where U = [u1 , . . . , un ] ∈ O(n) is an orthogonal matrix whose columns ui are the eigenvectors of A, and
Λ = diag ([λ1 , . . . , λn ]) is a diagonal matrix containing the eigenvalues of A. The factorization in eq. (1.21)
is called the eigendecomposition or spectral decomposition of A, and is unique (up to permutation of ui and
λi ) when all the eigenvalues of A are distinct.
The minimizer and maximizer of the quadratic form x^T Ax over unit-norm vectors x ∈ R^n are attained at
the minimum and maximum eigenvalue/eigenvector pairs (λ_min, v_min) and (λ_max, v_max). As a result,
we have λ_min ‖x‖_2^2 ≤ x^T Ax ≤ λ_max ‖x‖_2^2 for any x ∈ R^n.
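The following sketch (assuming NumPy) computes the spectral decomposition in eq. (1.21) with numpy.linalg.eigh and checks the Rayleigh-quotient bounds stated above.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                       # real symmetric matrix

# Spectral decomposition A = U diag(lambda) U^T, eq. (1.21).
lam, U = np.linalg.eigh(A)              # eigenvalues in ascending order
assert np.allclose(A, U @ np.diag(lam) @ U.T)
assert np.allclose(U.T @ U, np.eye(5))  # eigenvectors are orthonormal

# Bounds: lambda_min ||x||^2 <= x^T A x <= lambda_max ||x||^2.
x = rng.standard_normal(5)
quad = x @ A @ x
assert lam[0] * (x @ x) - 1e-9 <= quad <= lam[-1] * (x @ x) + 1e-9
```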
The singular value decomposition (SVD) of any real matrix M ∈ R^{m×n} is:

M = U S V^T,  (1.23)

where U ∈ O(m) and V ∈ O(n) are orthogonal matrices, and S ∈ R^{m×n} is a rectangular diagonal matrix
with nonnegative diagonal entries. The diagonal entries of S are called the singular values of M. The number
of nonzero singular values in S is equal to the rank of M. The SVD in eq. (1.23) is equivalent to:

M V = U S,  (1.24)

which implies that M v_i = S_{ii} u_i, i = 1, . . . , min{m, n}, where u_i ∈ R^m and v_i ∈ R^n are the i-th columns of U
and V, and they are called the left and right singular vectors of M, respectively.
Exercise: Is the SVD of a matrix unique? What is an SVD of an orthogonal matrix and a rotation matrix?
Relationship to Spectral Decomposition. Consider matrices M T M and M M T :
M^T M = V S^T U^T U S V^T = V (S^T S) V^T,  (1.25)
M M^T = U S V^T V S^T U^T = U (S S^T) U^T.  (1.26)
Therefore, the columns of V are eigenvectors of M T M , while the columns of U are eigenvectors of M M T .
The nonzero singular values of M are the square roots of the nonzero eigenvalues of M T M and M M T .
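A minimal sketch (assuming NumPy) of the SVD and its relationship to the spectral decomposition of M^T M.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 3))

# Full SVD: M = U S V^T.
U, s, Vt = np.linalg.svd(M)             # s holds the singular values (descending)
S = np.zeros((4, 3))
S[:3, :3] = np.diag(s)
assert np.allclose(M, U @ S @ Vt)

# M v_i = S_ii u_i for each singular pair.
for i in range(3):
    assert np.allclose(M @ Vt[i], s[i] * U[:, i])

# Singular values are the square roots of the eigenvalues of M^T M.
eig_MtM = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
assert np.allclose(s, np.sqrt(np.clip(eig_MtM, 0.0, None)))

# The rank equals the number of nonzero singular values.
print(np.linalg.matrix_rank(M), np.sum(s > 1e-12))
```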
The following statements about a symmetric matrix A ∈ S^n are equivalent, and each characterizes A being
positive semidefinite (PSD), written A ⪰ 0: (i) x^T Ax ≥ 0 for all x ∈ R^n; (ii) all eigenvalues of A are
nonnegative; (iii) A = B^T B for some matrix B.
The following statements about a symmetric matrix A ∈ S^n are equivalent, and each characterizes A being
positive definite (PD), written A ≻ 0: (i) x^T Ax > 0 for all nonzero x ∈ R^n; (ii) all eigenvalues of A are
strictly positive; (iii) A = B^T B for some matrix B with full column rank.
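As a hedged illustration (assuming NumPy), definiteness of a symmetric matrix is commonly checked numerically via its eigenvalues, or via a Cholesky factorization, which exists exactly when the matrix is positive definite.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = B.T @ B                      # A = B^T B is PSD (here PD almost surely)

# Check definiteness via the eigenvalues of the symmetric matrix A.
eigs = np.linalg.eigvalsh(A)
print("PSD:", np.all(eigs >= -1e-10))
print("PD: ", np.all(eigs > 0))

# For a PD matrix the Cholesky factorization exists; it raises an error otherwise.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
```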
Matrix Congruence. If P ∈ Rn×n is invertible (nonsingular), then A, B ∈ Rn×n are congruent if:
P^T A P = B.  (1.27)

If both A and B are symmetric, then A and B have the same numbers of positive, negative, and zero
eigenvalues. Therefore, A ⪰ 0 ⇐⇒ P^T A P ⪰ 0.
Matrix Similarity. If P ∈ Rn×n is invertible (nonsingular), then A, B ∈ Rn×n are similar if:
P^{-1} A P = B.  (1.28)

Similar matrices have the same characteristic polynomials, hence, the same eigenvalues. (Exercise: prove
this.) Therefore, A ⪰ 0 ⇐⇒ P^{-1} A P ⪰ 0. See [6] for an application of this.
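A short numerical sketch (assuming NumPy) of the two statements above: congruence preserves the signs of the eigenvalues of a symmetric matrix, while similarity preserves the eigenvalues themselves.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                   # make A symmetric
P = rng.standard_normal((n, n))     # a generic P is invertible with probability 1

# Congruence: P^T A P has the same signs of eigenvalues (inertia) as A.
eig_A = np.linalg.eigvalsh(A)                       # ascending order
eig_congruent = np.linalg.eigvalsh(P.T @ A @ P)     # ascending order
assert np.array_equal(np.sign(eig_A), np.sign(eig_congruent))

# Similarity: P^{-1} A P has exactly the same eigenvalues as A.
eig_similar = np.sort(np.linalg.eigvals(np.linalg.inv(P) @ A @ P).real)
assert np.allclose(np.sort(eig_A), eig_similar)
```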
Schur Complement. Consider the following block matrix:

X = \begin{bmatrix} A & B \\ C & D \end{bmatrix} ∈ R^{(m+n)×(m+n)},  A ∈ R^{m×m}, B ∈ R^{m×n}, C ∈ R^{n×m}, D ∈ R^{n×n}.  (1.29)

If A is invertible, the Schur complement of the block A in X is:

X/A = D − C A^{-1} B,  (1.30)

and, if D is invertible, the Schur complement of the block D in X is:

X/D = A − B D^{-1} C.  (1.31)

In the case that A or D is singular, replacing A^{-1} and D^{-1} with generalized inverses^3 yields the generalized
Schur complement.
The Schur complement is one of the most important tools for analyzing the positive semidefiniteness and positive
definiteness of symmetric matrices. Consider the following symmetric matrix:

X = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix} ∈ S^{m+n},  A ∈ R^{m×m}, B ∈ R^{m×n}, D ∈ R^{n×n},  (1.32)
then we have:
(i) X ≻ 0 ⇐⇒ A ≻ 0 and X/A = D − B^T A^{-1} B ≻ 0;
(ii) X ≻ 0 ⇐⇒ D ≻ 0 and X/D = A − B D^{-1} B^T ≻ 0;
(iii) if A ≻ 0, then X ⪰ 0 ⇐⇒ X/A ⪰ 0 (and similarly, if D ≻ 0, then X ⪰ 0 ⇐⇒ X/D ⪰ 0);
(iv) a sufficient and necessary condition for X ⪰ 0 can be described using generalized Schur complements [7].
3 https://en.wikipedia.org/wiki/Generalized_inverse
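A minimal sketch (assuming NumPy, with hypothetical block sizes) of the Schur complement test: when A ≻ 0, X is positive definite exactly when X/A is.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 2
A = rng.standard_normal((m, m)); A = A.T @ A + np.eye(m)  # A is PD
B = rng.standard_normal((m, n))
D = rng.standard_normal((n, n)); D = D.T @ D + np.eye(n)  # D is PD

X = np.block([[A, B],
              [B.T, D]])

# Schur complement of A in X: X/A = D - B^T A^{-1} B.
schur_A = D - B.T @ np.linalg.solve(A, B)

# With A > 0, X is PD if and only if X/A is PD.
X_is_pd = np.linalg.eigvalsh(X).min() > 0
schur_is_pd = np.linalg.eigvalsh(schur_A).min() > 0
assert X_is_pd == schur_is_pd
print(X_is_pd, schur_is_pd)
```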
Consider a vector-valued function f : R^n → R^m:

f(x) = [f_1(x), . . . , f_m(x)]^T,  (1.36)

where each f_i, i = 1, . . . , m, is differentiable. Then the Jacobian (or derivative) of f w.r.t. x, denoted as
Df(x), is an m × n matrix whose (i, j)-th entry is:

[Df(x)]_{ij} = \frac{∂f_i(x)}{∂x_j},  i = 1, . . . , m;  j = 1, . . . , n.  (1.37)
In other words, the i-th row of the Jacobian is the derivative of f_i w.r.t. x:

Df(x) = \begin{bmatrix} Df_1(x) \\ \vdots \\ Df_m(x) \end{bmatrix} ∈ R^{m×n},  Df_i(x) ∈ R^{1×n},  i = 1, . . . , m.  (1.38)
In the case when f is a real-valued function, i.e., f : R^n → R (e.g., each f_i in eq. (1.36)), the gradient
of f w.r.t. x is the transpose of Df(x):

∇f(x) = Df(x)^T ∈ R^n.  (1.39)
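A hedged sketch (assuming NumPy) that approximates the Jacobian in eq. (1.37) by finite differences for a hypothetical test function and compares it against the analytic Jacobian.

```python
import numpy as np

def f(x):
    """Example map f : R^2 -> R^3 (a hypothetical test function)."""
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def numerical_jacobian(f, x, eps=1e-6):
    """Forward-difference approximation of the m x n Jacobian in eq. (1.37)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        x_pert = x.copy()
        x_pert[j] += eps
        J[:, j] = (f(x_pert) - fx) / eps
    return J

x = np.array([1.0, 2.0])
J_analytic = np.array([[x[1], x[0]],
                       [np.cos(x[0]), 0.0],
                       [0.0, 2.0 * x[1]]])
assert np.allclose(numerical_jacobian(f, x), J_analytic, atol=1e-4)
```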
Chain Rule. Suppose f : R^n → R^m and g : R^m → R^p are both differentiable; then the composition
h : R^n → R^p defined by h(x) = g(f(x)) is differentiable at x, with the derivative computed by the chain
rule as:

Dh(x) = Dg(f(x)) Df(x) ∈ R^{p×n}.  (1.40)
Let f : R^n → R be a twice differentiable function; then the second derivative, i.e., the Hessian, of f w.r.t.
x, denoted as ∇^2 f(x), is:

[∇^2 f(x)]_{ij} = \frac{∂^2 f(x)}{∂x_i ∂x_j},  i = 1, . . . , n;  j = 1, . . . , n.  (1.42)
By definition, the Hessian ∇2 f (x) ∈ S n is a symmetric matrix. The Hessian can be interpreted as the
derivative of the gradient: ∇2 f (x) = D∇f (x).
Using the gradient and Hessian of f, the second-order approximation of f at x can be written as:

f(z) ≈ f(x) + ∇f(x)^T (z − x) + \frac{1}{2} (z − x)^T ∇^2 f(x) (z − x).  (1.43)
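A minimal sketch (assuming NumPy) that checks the expansion in eq. (1.43) on a quadratic function, for which the second-order approximation is exact.

```python
import numpy as np

# f(x) = x^T Q x / 2 + b^T x has gradient Qx + b and constant Hessian Q.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b
hess = lambda x: Q

x = np.array([0.3, -0.7])
z = x + np.array([0.05, -0.02])

# Second-order expansion, eq. (1.43); exact here because f is quadratic.
approx = f(x) + grad(x) @ (z - x) + 0.5 * (z - x) @ hess(x) @ (z - x)
assert np.isclose(f(z), approx)
```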
Exercise: Let f : R^n → R and g : R → R be two twice differentiable functions; what is the Hessian of
h(x) = g(f(x))?
References
[1] Grigoriy Blekherman, Pablo A Parrilo, and Rekha R Thomas. Semidefinite optimization and convex
algebraic geometry. SIAM, 2012.
[2] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[3] Jesus Briales, Laurent Kneip, and Javier Gonzalez-Jimenez. A certifiably globally optimal solution to
the non-minimal relative pose problem. In IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2018.
[4] Olof Enqvist and Fredrik Kahl. Robust optimal pose estimation. In European Conf. on Computer Vision
(ECCV), pages 141–153. Springer, 2008.
[5] F. Kahl and R. Hartley. Multiple-view geometry under the ℓ_∞-norm. IEEE Trans. Pattern Anal. Machine
Intell., 30(9):1603–1617, 2008.
[6] H. Yang and L. Carlone. A quaternion-based certifiably optimal solution to the Wahba problem with
outliers. In Intl. Conf. on Computer Vision (ICCV), 2019.
[7] Fuzhen Zhang. The Schur complement and its applications, volume 4. Springer Science & Business
Media, 2006.