
COMS 6998: Algebraic Techniques in TCS (Fall’21) Sep 14, 2021

Lecture 1: Introduction, Graph algorithms using MM


Instructor: Josh Alman. Scribe notes by: Shunhua Jiang.

Disclaimer: This draft may be incomplete or have errors. Consult the course webpage for the most
up-to-date version.

1 Logistics
Prerequisites.

• Mathematical maturity: read and write formal math proofs.

• Design and analysis of algorithms.

• Linear algebra.

Grading.

• Scribe notes (15%): Scribe for one lecture. Draft due two days after class.

• Problem sets (35%): 3 or 4 in total. Each is due two weeks after it is released.

• Final project (50%): Choose between (1) a reading-based project surveying one or more papers, or (2) a research project.
Final report of 5-15 pages, plus a presentation.

2 Overview
We will study applications of algebraic techniques in TCS. Here “TCS” mainly means algorithms and
complexity. We will briefly mention other areas as well, e.g., using the polynomial method in learning
theory. This course mainly covers four topics:

• Algebraic graph algorithms.

• The polynomial method.

• Matrix rigidity.

• Matrix multiplication.

2.1 Algebraic Graph Algorithms
We study algebraic tools for faster graph algorithms. Some examples:

• Graph problems: All-pairs shortest paths (APSP), subgraph isomorphism, maximum matching,
longest path.

• Algebraic tools: Polynomial identity testing, fast matrix multiplication (FMM), algorithms for
determinant/inverse.

2.2 The Polynomial Method


The polynomial method represents the target Boolean function as a polynomial, and then derives prop-
erties of the Boolean function by studying the polynomial.
Two ways that we will measure the complexity of a polynomial are:

1. Sparsity: the number of monomials in the polynomial.

2. Degree: the maximum degree over all monomials.

2.2.1 Examples of Polynomials Representing Boolean Functions


Exact representation of the AND function. The Boolean AND function $AND : \{0,1\}^n \to \{0,1\}$ is defined as
$$AND(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } x_1 = x_2 = \cdots = x_n = 1, \\ 0 & \text{otherwise.} \end{cases}$$
Define a polynomial $p : \mathbb{R}^n \to \mathbb{R}$ as $p(x_1, x_2, \ldots, x_n) = x_1 \cdot x_2 \cdots x_n$. It is easy to see that
$$\forall x \in \{0,1\}^n, \quad p(x) = AND(x).$$
The polynomial $p$ has $\mathrm{sparsity}(p) = 1$ and $\deg(p) = n$.


Question: Does there exist a polynomial that equals the AND function on all $x \in \{0,1\}^n$ and has degree $< n$? The answer is no; see the homework.
Instead, we can loosen the restrictions on what it means for a polynomial to compute a Boolean function, in order to try to achieve lower degree.

Polynomial threshold function. We say a polynomial $q : \mathbb{R}^n \to \mathbb{R}$ is a polynomial threshold function for $AND$ if it satisfies the following: $\forall x \in \{0,1\}^n$,
$$\text{if } AND(x) = 1 \text{, then } q(x) \geq 0; \qquad \text{if } AND(x) = 0 \text{, then } q(x) < 0.$$
For example, the following polynomial $q$ is a polynomial threshold function for the AND function:
$$q(x) = x_1 + x_2 + \cdots + x_n - \left(n - \frac{1}{2}\right).$$

This is because for $x_1 = x_2 = \cdots = x_n = 1$, $q(x) = n - (n - \frac{1}{2}) = \frac{1}{2} \geq 0$, and if some $x_i = 0$, then $q(x) \leq (n-1) - (n - \frac{1}{2}) = -\frac{1}{2} < 0$.
Note that the degree of $q$ is $\deg(q) = 1$. This is much smaller than the degree of the previous polynomial $p(x) = x_1 \cdot x_2 \cdots x_n$.
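As a sanity check, here is a brute-force verification of both representations above for small $n$ (a minimal sketch; the function names are ours):

```python
from itertools import product
from math import prod

def AND(x):
    return int(all(x))

n = 6
for x in product([0, 1], repeat=n):
    # Exact representation: p(x) = x_1 * x_2 * ... * x_n.
    assert prod(x) == AND(x)
    # Threshold representation: q(x) = x_1 + ... + x_n - (n - 1/2).
    q = sum(x) - (n - 0.5)
    assert (q >= 0) if AND(x) == 1 else (q < 0)
print("p equals AND exactly; q is a polynomial threshold function for AND")
```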

Approximate polynomial. We say a polynomial $q : \mathbb{R}^n \to \mathbb{R}$ is an approximate polynomial for a Boolean function $f : \{0,1\}^n \to \{0,1\}$ if it satisfies the following: $\forall x \in \{0,1\}^n$,
$$\text{if } f(x) = 1 \text{, then } q(x) = 1; \qquad \text{if } f(x) = 0 \text{, then } -\frac{1}{3} \leq q(x) \leq \frac{1}{3}.$$

We can design an approximate polynomial for the AND function with degree $\Theta(\sqrt{n})$ using Chebyshev
polynomials. See the homework.
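One standard route (not necessarily the construction intended in the homework) applies the degree-$d$ Chebyshev polynomial $T_d$ to $s = x_1 + \cdots + x_n$: the polynomial $q(x) = T_d(\frac{s}{n-1}) / T_d(\frac{n}{n-1})$ has degree $d$, satisfies $q(1^n) = 1$, and satisfies $|q(x)| \leq 1/T_d(\frac{n}{n-1})$ whenever some $x_i = 0$, since $|T_d| \leq 1$ on $[-1,1]$. The sketch below numerically finds the smallest $d$ with $T_d(\frac{n}{n-1}) \geq 3$ and shows it grows like $\sqrt{n}$:

```python
import math
from numpy.polynomial import chebyshev as C

def cheb_T(d, x):
    # Evaluate the degree-d Chebyshev polynomial T_d at the point x.
    return C.chebval(x, [0] * d + [1])

def min_degree(n):
    # Smallest d with T_d(n/(n-1)) >= 3; then q(x) = T_d(s/(n-1)) / T_d(n/(n-1))
    # satisfies q(1^n) = 1 and |q(x)| <= 1/3 whenever AND(x) = 0.
    for d in range(1, n + 1):
        if cheb_T(d, n / (n - 1)) >= 3:
            return d

for n in [16, 64, 256, 1024]:
    d = min_degree(n)
    print(f"n = {n:4d}: degree {d:3d} (~ {d / math.sqrt(n):.2f} * sqrt(n))")
```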

Other types of approximate polynomials. Another definition of approximate polynomial: $q(x) = f(x)$ for at least a $\frac{3}{4}$ fraction of $x \in \{0,1\}^n$. The polynomial $q(x) = 0$ is such an approximation for the AND function, since $AND(x) = 0$ for all $x$ except the all-ones vector.
What if we further require that $q(x)$ equal $1$ whenever $f(x) = 1$? The polynomial $q(x) = x_1 \cdot x_2$ satisfies this stronger approximation condition for the AND function. This is because $q(1^n) = 1$, and only a $\frac{1}{4}$ fraction of $x \in \{0,1\}^n$ has $x_1 = x_2 = 1$.

2.2.2 Applications
1. Lower bounds. The polynomial method has been used to prove lower bounds in circuit complexity
and communication complexity, e.g., circuit lower bounds for $AC^0$.
The high-level idea is to reduce from known lower bounds on the degree of polynomials.

2. Faster algorithms. The polynomial method is also useful for designing faster algorithms, e.g.,
nearest neighbor search (NNS), all-pairs shortest paths (APSP).
The high-level idea behind these algorithms is to first convert the original problem into a Boolean
function, then approximate the Boolean function using a polynomial with good properties, and
finally design algorithms to evaluate the polynomial efficiently.

2.3 Matrix Rigidity


Definition 1 (Matrix rigidity). Given an integer $r$ and a matrix $M$ of size $N \times N$, the rank-$r$ rigidity of
$M$, denoted $R_M(r)$, is the minimum number of entries of $M$ one must change to make its rank $\leq r$.

2.3.1 Examples
Identity matrix. The first example is the simplest one, the identity matrix:
$$I_N = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} I_r & \\ & 0_{N-r} \end{pmatrix},$$
where the right-hand matrix keeps $r$ ones on the diagonal and zeroes out the other $N - r$.
We can change $N - r$ diagonal entries of $I_N$ to zero to decrease its rank to $r$. Thus $R_{I_N}(r) \leq N - r$.
In fact, $R_{I_N}(r) = N - r$. This is because changing one entry of a matrix can decrease its rank by at
most 1, so we need to change at least $N - r$ entries of $I_N$ before its rank decreases to $r$.

Upper triangular matrix. A more rigid example is the upper triangular all-ones matrix:
$$U_N = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ & 1 & \cdots & 1 \\ & & \ddots & \vdots \\ & & & 1 \end{pmatrix}.$$
A naive upper bound is $R_{U_N}(r) \leq \sum_{i=1}^{N-r} i = O((N-r)^2)$, where we change all the ones in the bottom
$N - r$ rows to zero.
Here is a more efficient way to decrease the rank. We divide the rows of $U_N$ into $r$ groups of size $\frac{N}{r}$,
and in each group we change the $\sum_{i=1}^{N/r-1} i = O((\frac{N}{r})^2)$ zeroes in the bottom-left corner to ones.
In this way all the rows in one group become the same, so the rank of the modified matrix is $r$. Below
is an illustration for $N = 6$ and $r = 2$, where $*$ marks the changed entries:
$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ * & 1 & 1 & 1 & 1 & 1 \\ * & * & 1 & 1 & 1 & 1 \\ & & & 1 & 1 & 1 \\ & & & * & 1 & 1 \\ & & & * & * & 1 \end{pmatrix} \qquad r \text{ groups of size } \frac{N}{r}.$$
We get a better upper bound $R_{U_N}(r) \leq O(r \cdot (\frac{N}{r})^2) = O(\frac{N^2}{r})$. This bound is actually tight.
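Here is a short sketch of this construction (the function names are ours), which builds the modified matrix and confirms that the rank drops to $r$ while only $O(N^2/r)$ entries change:

```python
import numpy as np

def reduce_rank(N, r):
    """Apply the group trick to the upper triangular all-ones matrix U_N.

    Assumes r divides N for simplicity. Returns the modified matrix and
    the number of entries changed.
    """
    U = np.triu(np.ones((N, N), dtype=int))  # U_N: upper triangular all-ones
    b = N // r  # group size
    changed = 0
    for g in range(r):
        top = g * b  # first row (and column) of this group's block
        for i in range(top + 1, top + b):
            for j in range(top, i):
                U[i, j] = 1  # fill a zero in the group's bottom-left corner
                changed += 1
    return U, changed

N, r = 60, 4
M, changed = reduce_rank(N, r)
print("entries changed:", changed)           # r * (N/r)(N/r - 1)/2 = O(N^2 / r)
print("rank:", np.linalg.matrix_rank(M))     # r
```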

2.3.2 Valiant Rigidity


Is the upper triangular matrix rigid enough? What is the right standard of rigidity? A gold standard is
Valiant rigidity [Val77]: a matrix $M$ of size $N \times N$ is Valiant-rigid if
$$R_M\!\left(\frac{N}{\log \log N}\right) \geq N^{1+\varepsilon} \quad \text{for some constant } \varepsilon > 0.$$

The upper triangular matrix is far from being Valiant-rigid:
$$R_{U_N}\!\left(\frac{N}{\log \log N}\right) = O\!\left(\frac{N^2}{N / \log \log N}\right) = O(N \log \log N) \ll N^{1+\varepsilon}.$$

In [Val77], Valiant showed that if there exists an explicit construction of a family of Valiant-rigid matrices,
then we can prove a major breakthrough result in circuit complexity. Currently we still do
not have any explicit construction of matrices that are Valiant-rigid.
Meanwhile, it is known that a uniformly random $\{0,1\}$-matrix $M$ satisfies
$$R_M\!\left(\frac{N}{2}\right) \geq \Omega(N^2)$$
with high probability, which is even stronger than Valiant rigidity.


Thus, even though most $\{0,1\}$-matrices satisfy our desired property, we still do not have any
explicit construction. This is a common phenomenon in TCS, often referred to as “finding hay in a
haystack”.

2.4 Matrix Multiplication


We are given two matrices $A, B \in \mathbb{F}^{N \times N}$ over some field $\mathbb{F}$ as input, and the goal is to compute $C = A \times B$,
i.e., $C_{ij} = \sum_{k=1}^{N} A_{ik} \cdot B_{kj}$.
Note that naively computing the matrix product takes $O(N^3)$ operations.
We always measure the complexity of matrix multiplication by the number of arithmetic operations:
rather than saying $O(\cdots)$ time, we will say $O(\cdots)$ operations. This is because for different fields $\mathbb{F}$, the
time to multiply or add two entries of $A$ and $B$ can be very different. The time to multiply two matrices is
then the number of operations multiplied by the time to perform each operation over the field.
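To make the operation-counting viewpoint concrete, here is the naive cubic algorithm written against an abstract field supplied as add/mul callbacks (a sketch of ours; it counts 2 field operations per inner step):

```python
def matmul_naive(A, B, add, mul, zero):
    """Multiply two N x N matrices using only the field operations add/mul."""
    N = len(A)
    C = [[zero] * N for _ in range(N)]
    ops = 0
    for i in range(N):
        for j in range(N):
            acc = zero
            for k in range(N):
                acc = add(acc, mul(A[i][k], B[k][j]))  # one mul + one add
                ops += 2
            C[i][j] = acc
    return C, ops

# Example: arithmetic over GF(5).
p = 5
C, ops = matmul_naive(
    [[1, 2], [3, 4]], [[0, 1], [1, 0]],
    add=lambda a, b: (a + b) % p,
    mul=lambda a, b: (a * b) % p,
    zero=0,
)
print(C, "using", ops, "field operations")  # 2 * N^3 = 16 operations
```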

2.4.1 Strassen’s Algorithm


In 1969, Strassen [Str69] published the first matrix multiplication algorithm that runs in $o(N^3)$ time. His
algorithm runs in $O(N^{2.81})$ time.

Computing 2 × 2 matrix multiplication faster. Let $A$ and $B$ be two matrices of size $2 \times 2$:
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.$$
When computing $A \times B$, the naive algorithm uses 4 additions and 8 multiplications.


Strassen showed a way to compute $A \times B$ using 18 additions and only 7 multiplications: his algorithm
reduces the number of multiplications at the cost of more additions.

For completeness we include Strassen’s algorithm for $2 \times 2$ matrices here. First compute $M_1$ to $M_7$:
$$\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), \\
M_2 &= (A_{21} + A_{22})B_{11}, \\
M_3 &= A_{11}(B_{12} - B_{22}), \\
M_4 &= A_{22}(B_{21} - B_{11}), \\
M_5 &= (A_{11} + A_{12})B_{22}, \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}), \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}).
\end{aligned}$$
The matrix $C = A \times B$ is then computed as:
$$\begin{aligned}
C_{11} &= M_1 + M_4 - M_5 + M_7, \\
C_{12} &= M_3 + M_5, \\
C_{21} &= M_2 + M_4, \\
C_{22} &= M_1 - M_2 + M_3 + M_6.
\end{aligned}$$

The recursive algorithm. For larger matrices, Strassen’s algorithm divides each matrix into 4 blocks
of size $\frac{N}{2} \times \frac{N}{2}$, and computes the multiplications of the submatrices recursively. In order to
multiply two $N \times N$ matrices, the algorithm plugs these submatrices into Strassen’s identities above, so it
computes 18 additions of $\frac{N}{2} \times \frac{N}{2}$ submatrices and 7 multiplications of $\frac{N}{2} \times \frac{N}{2}$ submatrices. Fortunately,
even though we are doing many additions, we can very quickly add matrices in just $O(N^2)$ time. Thus
we get the following recurrence for the running time:
$$T(N) = 7 \cdot T\!\left(\frac{N}{2}\right) + 18 \cdot O(N^2) \;\Longrightarrow\; T(N) = O(N^{\log_2 7}) \leq O(N^{2.81}).$$
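A compact recursive implementation of this scheme (a sketch of ours, assuming $N$ is a power of 2; a practical version would pad the matrices and switch to the naive algorithm below some cutoff size):

```python
import numpy as np

def strassen(A, B):
    """Multiply two N x N matrices with Strassen's recursion (N a power of 2)."""
    N = A.shape[0]
    if N == 1:
        return A * B
    h = N // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The 7 recursive multiplications (Strassen's identities above).
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

A = np.random.randint(0, 10, (8, 8))
B = np.random.randint(0, 10, (8, 8))
assert np.array_equal(strassen(A, B), A @ B)
```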

Current fast matrix multiplication algorithms. Currently, the fastest matrix multiplication algo-
rithm runs in $O(N^{2.373})$ time.
While Strassen’s algorithm is used in practice, the later, theoretically faster algorithms have
exceedingly large constant factors, and they cannot be used in practice.

2.4.2 Applications
1. Matrix multiplication is used in all three of the previous topics.
In this course, we will first use matrix multiplication as a black box to design algorithms for other
problems, and at the end we will cover fast matrix multiplication algorithms themselves.

2. Many other linear algebra tasks can be performed in the same time as matrix multiplication, including
computing determinants and inverses, solving linear systems, and solving some linear programs.

3 Graph Algorithms Using MM
Now we delve into our first topic (algebraic graph algorithms). In this section we consider designing
graph algorithms using the algebraic tool of matrix multiplication.

3.1 Finding triangles in a graph


Input: An undirected graph $G$ on $N$ nodes.
Output: Are there nodes $a, b, c$ such that $(a,b), (b,c), (c,a) \in E(G)$?
We can trivially solve this problem in $O(N^3)$ time by enumerating all triples $(a, b, c)$. Next we show
an algorithm that runs in $O(N^{2.373})$ time (the matrix multiplication time).

Algorithm. The algorithm first forms the adjacency matrix $A \in \{0,1\}^{N \times N}$ of $G$:
$$A_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E(G), \\ 0 & \text{otherwise.} \end{cases}$$

The algorithm then computes the matrix $A^2$, and checks if there exists a pair $(a, b)$ that satisfies
$$(a,b) \in E(G) \quad \text{and} \quad A^2[a,b] > 0.$$
If so, the algorithm outputs “yes” (there exists a triangle), and otherwise the algorithm outputs “no”.

Analysis. We first give the combinatorial interpretation of the matrix $A^2$:
$$A^2[i,j] = \sum_{k=1}^{N} A[i,k] \cdot A[k,j] = \#\{k \text{ s.t. } (i,k) \text{ and } (k,j) \in E(G)\} = \# \text{ of length-2 paths from } i \text{ to } j.$$
$A^2[a,b] > 0$ means there exists a length-2 path from $a$ to $b$, i.e., there exists another node $c$ where
$(a,c), (b,c) \in E(G)$. Thus $(a,b,c)$ is a triangle iff $(a,b) \in E(G)$ and $A^2[a,b] > 0$.
The bottleneck of this algorithm is computing $A^2$, which takes $O(N^{2.373})$ time. All other compu-
tation can be performed in $O(N^2)$ time.
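A direct implementation of this algorithm (a sketch of ours; numpy's dense matmul here is cubic, but any fast matrix multiplication routine could be plugged into the squaring step):

```python
import numpy as np

def has_triangle(A):
    """Given the {0,1} adjacency matrix A of an undirected graph,
    return True iff the graph contains a triangle."""
    A2 = A @ A  # A2[a, b] = number of length-2 paths from a to b
    # A triangle exists iff some edge (a, b) also has a length-2 path a-c-b.
    return bool(np.any((A == 1) & (A2 > 0)))

# Example: a 4-cycle has no triangle; adding a chord creates one.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
print(has_triangle(cycle))   # False
chord = cycle.copy(); chord[0, 2] = chord[2, 0] = 1
print(has_triangle(chord))   # True
```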

3.2 Finding $K_4$ minus an edge in a graph


In this section we study another induced subgraph isomorphism problem. The target subgraph, which we
denote by $H$ ($K_4$ minus an edge), is shown in Figure 1.
Input: An undirected graph $G$ on $N$ nodes.
Output: Are there nodes $a, b, c, d$ such that $(a,b), (b,c), (c,a), (a,d), (b,d) \in E(G)$ and $(c,d) \notin E(G)$?
An incorrect algorithm. Inspired by the triangle algorithm, a straightforward algorithm computes
$A^2$ and checks if there exists $(a, b)$ that satisfies
$$(a,b) \in E(G) \quad \text{and} \quad A^2[a,b] \geq 2.$$

[Figure 1: The target subgraph $H$: nodes $a, b, c, d$, with edges $(a,b), (b,c), (c,a), (a,d), (b,d)$ and no edge between $c$ and $d$.]

However, this naive algorithm doesn’t work because it doesn’t rule out the existence of the edge $(c,d)$.
A correct algorithm must distinguish our target subgraph $H$ from the 4-clique $K_4$.

The correct algorithm. We make the following observation:
$$\sum_{(a,b) \in E(G)} \binom{A^2[a,b]}{2} = \#(H) + \#(K_4) \cdot 6. \qquad (1)$$
This is because $\sum_{(a,b) \in E(G)} \binom{A^2[a,b]}{2}$ counts the total number of pairs of “parallel” length-2 paths. A copy
of the target subgraph $H$ contributes 1 pair (the pair $a{-}c{-}b$ and $a{-}d{-}b$), while a 4-clique contributes
$\binom{4}{2} = 6$ pairs (one pair between any two nodes in $\{a, b, c, d\}$).
Define $R(G) := \sum_{(a,b) \in E(G)} \binom{A^2[a,b]}{2}$. Eq. (1) implies the following:

• If $R(G)$ is not a multiple of 6, then the graph $G$ must contain $H$.

• If $R(G)$ is a multiple of 6, then it is not clear whether $G$ contains no copy of $H$, or whether the number
of copies of $H$ in $G$ is a multiple of 6.

In order to truly determine whether $G$ contains $H$, we design a randomized algorithm. The algorithm
first samples a subgraph $G'$ of $G$, where $G'$ keeps each node of $G$ independently with probability $\frac{1}{2}$. In the next lecture,
we will show that if $G$ contains $H$, then with high probability $R(G')$ is not a multiple of 6.
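For concreteness, here is a sketch of computing $R(G)$ and of the subsampling test (our code; the repetition count is a placeholder, since the success probability of a single sample is only analyzed next lecture):

```python
import numpy as np

def R(A):
    """R(G) = sum over undirected edges (a,b) of C(A^2[a,b], 2)."""
    A2 = A @ A
    counts = A2[A == 1]            # A^2[a,b] at every (ordered) edge position
    pairs = counts * (counts - 1) // 2
    return int(pairs.sum()) // 2   # each undirected edge was counted twice

def probably_contains_H(A, repetitions=100):
    """Randomized test for a copy of H (K_4 minus an edge); one-sided error."""
    rng = np.random.default_rng()
    N = A.shape[0]
    for _ in range(repetitions):
        keep = rng.random(N) < 0.5         # keep each node with probability 1/2
        sub = A[np.ix_(keep, keep)]        # adjacency matrix of G'
        if R(sub) % 6 != 0:
            return True                    # by Eq. (1), G' (hence G) contains H
    return False                           # no copy of H was detected
```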

References
[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354-356,
1969.

[Val77] Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In International Symposium
on Mathematical Foundations of Computer Science, pages 162-176. Springer, 1977.
