0% found this document useful (0 votes)

8 views

Fast Parallel Matrix GCD

The document presents parallel algorithms for computing the determinant, characteristic polynomial of matrices, and the GCD of polynomials, utilizing Las Vegas algorithms that operate over arbitrary fields. These algorithms achieve a parallel time complexity of O(log² n) and require a polynomial number of processors. The work emphasizes the importance of fast parallel methods for algebraic computations, with applications in solving linear equations and matrix rank determination.

Uploaded by

besnik

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views

Fast Parallel Matrix GCD

Uploaded by

besnik

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

INFORMATION AND CONTROL 52, 241--256 (1982)

Fast Parallel Matrix and GCD Computations

ALLAN BORODIN,* JOACHIM VON ZUR Q A T H E N , * AND

JOHN HOPCROFT t

*University of Toronto, Toronto, Canada, and

+Cornell University, Ithaca, New York

Parallel algorithms to compute the determinant and characteristic polynomial of

matrices and the gcd of polynomials are presented. The rank of matrices and
solutions of arbitrary systems of linear equations are computed by parallel Las
Vegas algorithms. All algorithms work over arbitrary fields. They run in parallel
time O(log ~ n) (where n is the number of inputs) and use a polynomial number of
processors.

1. INTRODUCTION

Today's technology has motivated recent activity concerning parallel

programs. Much of this activity has focussed on combinatorial questions
(sorting, graph theoretic algorithms, etc.) and on questions relating to the
parallel architecture itself (routing, queuing, etc.). It is also of recognized
importance to investigate algebraic questions, and to this end we present
algorithms for some basic problems such as computing the determinant and
the rank of matrices or the gcd of polynomials.
There are two basically different approaches to what constitutes a "fast
parallel algorithm." One is to start from a good sequential algorithm and try
to parallelize it with a near-optimal speed-up, i.e., try to achieve parallel time
close to (sequential time)/(number of processors). The second approach is to
attempt to make the parallel time as small as possible, allowing an almost
arbitrary (e.g., polynomially bounded) number of processors. While the first
approach seems to be appropriate for the present technology, where in effect
only a rather limited amount of parallelism is available, it is not
unreasonable to expect that some time in the future the "asymptotically fast
algorithms" of the second approach will play an important role. The
situation is not unlike the dual approach to sequential algorithms, where one
is interested in both constant speed-up of known algorithms (say, by

The work of the first two authors was partially supported by NSERC Grant A7631, and
that of the third author by ONR Grant N00014-76-C-0018.
241
0019-9958/82 $2.00
Copyright © 1982 by AcademicPress, Inc.
All rights of reproductionin any form reserved.
242 BORODIN, VON ZUR GATHEN, AND HOPCROFT

programming optimization) and the construction of asymptotically fast

algorithms (even though the hidden constants for the computing time may be
large). Perhaps the reader has guessed by now that here we pursue the
second approach to parallel programming.
In this paper we discuss two basic problems: solving linear equations and
simplifying rational expressions. Both have nice sequential
solutions--Gaussian elimination and Euclid's algorithm--and it is an
intriguing question if there also exist fast parallel methods. While Csanky
(1976) has given a fast determinant algorithm over fields of characteristic
zero, applications such as factoring polynomials require an algorithm that
works over arbitrary fields, in particular finite fields. We present such an
algorithm below, based on the general parallelization result by Valiant-
Skyum-Berkowitz-Rackoff (1981).
As direct corollaries we get fast methods for inverting matrices and
solving nonsingular systems of linear equations. Further applications are the
characteristic polynomial of a matrix and the gcd of polynomials.
Some interesting combinatorial problems--maximal matchings, maximal
flow--translate into the problem of computing the rank of matrices. We
present a fast parallel Las Vegas method that either returns the rank of the
input matrix or reports that it failed; the latter with small probability.
Applications include finding a basis for the nullspace of a matrix, finding a
maximal linearly independent subset of a given set of vectors, and the
solution of a general (possibly singular) system of linear equations, all this
again over arbitrary fields.

2. THE MODEL

The algorithms described in this paper can be implemented on a

synchronous shared-memory model of parallel computation such as the
PRAM (Fortune-Wyllie, 1978), with arithmetic and tests in the ground field
F as basic operations, or on an algebraic circuit. The algorithms all use
O(log 2 n) parallel time (=depth for circuits) and n °u) processors (=gates for
circuits), when n is the number of inputs. In particular, it follows that the
determinant and gcd problems are in the appropriate analog of uniform N C
(Pippenger, 1979), and the rank problem is in the Monte Carlo or
nbnuniform analog of NC.
When the ground field F is Q or a finite field, we can represent the inputs
as strings over a finite alphabet and ask for a (say) PRAM with bit
instructions solving the problem. Our algorithms show that the determinant
and gcd problems are in the corresponding Boolean class Arc, and the rank
problem is in the Monte Carlo or nonuniform Boolean version of N C (in
fact, for F = @ in NC). Here the essential point is that (for F = Q) according
FAST PARALLEL MATRIX 243

to Edmonds (1967) the intermediate values in Gaussian elimination are

reasonably small (see also Bareiss, 1968).
The basic stepping stone for the whole theory is the result by Valiant-
Skyum-Berkowitz-Rackoff (1981). It says that any sequential program
computing a polynomial of degree n with t steps can be converted to a
parallel program with parallel time O0ogn(logn + l o g t)) using O(t3n 6)
processors.
3. DETERMINANT AND GCD
In this section we discuss the following problems: DETERMINANT(n)
(=computing the determinant of an n × n matrix), CHARACTERISTIC
POLYNOMIAL(n), N O N S I N G U L A R EQUATIONS(n) (=computing the
solution of a nonsingular n × n system of linear equations), INVERSION(n)
(=computing the inverse of an n × n matrix, if it is nonsingular), GCD(n)
(=computing the monic gcd(f, g), where f , g E F[x] have degree ~<n). We
write, e.g., INVERSION for the sequence (INVERSION(n)).
The following result was proved by Csanky (1976), but only for fields of
characteristic zero.

THEOREM 1. For any field F, DETERMINANT, INVERSION,

NONSINGULAR EQUATIONS, and CHARACTERISTIC
POLYNOMIAL can be eomputed in parallel time O(log 2 n) using a
polynomially bounded number of processors. For D E T E R M I N A N T and
CHARACTERISTIC POLYNOMIAL, neither branching nor division
occurs.
Proof. We consider ordinary Gaussian elimination performed on an
n × n matrix X = (xij) of indeterminates, with pivots chosen on the diagonal.
This yields a (sequential) computation of d e t X in time O(n3), using +, - ,
• , /. Strassen (1973) gives a general recipe for avoiding divisions which in
our case works as follows: When we execute the above Gaussian elimination
on the n × n identity matrix/, the only divisions that occur are by 1. In other
words, if we shift the input matrix X by the negative of the identity matrix
and thus consider new indeterminates Yo"= xi] - g;j-, then all divisions in the
algorithm (with xij replaced by Yij + g0") are by rational functions in the yij's
which are 1 for Yll = YI2 . . . . . y , , = O, since this corresponds to setting
X=L These rational functions are invertible in the ring R =
FIY11,Y~2 ..... Y,,I of formal power series in the yifs. Also, d e t X =
det((yij + gi])ij) is a polynomial in R of degree n.
We now modify our Gaussian elimination as follows: Replace each
division f / ( 1 - g), where g has constant term 0, by a multiplication f *
(1 + g + g 2 + ...). Also, instead of computing a power series h = h o +
h 1 + .-., where h t C R is homogeneous of degree/, compute only the
homogeneous parts ho,...,h .. This can be done at a cost of ~<(n + 1) 2
244 BORODIN, VON ZUR GATHEN, AND HOPCROFT

operations per original operation. All this leads to a sequential computation

of d e t X in R with sequential time O(nS), using 1, +, - , . only. (This timing
estimate can be improved; see Strassen, 1973.) Substituting Yo = xo - 6tJ in
this algorithm, we get a division-free straight-line algorithm in
F[Xal, x~2 ..... x , , ] that computes d e t X in time O(nS).
We can now apply the parallelization method of Valiant-Skyum-
Berkowitz-Rackoff (1981) to obtain a parallel algorithm for the determinant
using O(log z n) time and a polynomial number of processors. Neither
division nor branching occurs.
Obviously INVERSION and N O N S I N G U L A R EQUATIONS are not
harder than DETERMINANT, but they require a division step at the end of
the computation. For the characteristic polynomial, we execute the sequential
division-free determinant algorithm on X - t I . Each step computes a
polynomial in t, and we split the step into ~<(n + 1) 2 operations in
F[Xll,X~z ..... x , , ] by computing the coefficients of t °, tl,...,t ~ separately.
Parallelization applies again to yield the result. |

R e m a r k 1. The above two-step conversion process applies to any

sequential computation that computes a polynomial f C F [ x 1..... Xm] of
degree n in time t.
The first step is to get rid of divisions ~ la Strassen (1973). For this, we
have to find values a~,..., a m E F such that h(al ,..., am) 4= 0 for each division
g/h in the computation. Then we shift the inputs by the negative of the ai's
and thus consider new indeterminates Yz = xi - ai. Now each division in the
algorithm (where every occurrence of xi is replaced by Yt + at) is by a
rational function in Yl,.-., Ym which has a nonzero value for y~ . . . . .
Ym = O. Such a rational function is invertible in the ring R = F l y ~. . . . . Y m ~ ,
and f = f ( Y l + al . . . . . Ym + a,,) E R is a polynomial of degree n.
As above, we replace every division by a multiplication in R, and for all
operations compute the homogeneous parts of degree ~n. This yields a
sequential computation in R for f with time O(tn2), using only +, --, , , and
constants. Back-substituting we get an O(tn2)-algorithm in F i x I ..... Xm] that
computes f without divisions.
The second step now is to apply the parallelization technique of Valiant-
Skyum-Berkowitz-Rackoff (1981) to obtain a parallel algorithm with
parallel time O(log2(tn)) using a polynomial (in t and n) number of
processors.
How can we find the a~,..., a m required in the first step? Every rational
function g computed by the given algorithm can be written as a quotient
g = b/c of polynomials in F [ x 1..... Xm] with deg b, deg c ~< U. The product d
of all such denominators c has degree <~t2t. For every subset P ___F with
# P > rot2 t there exists a = (al ..... am) C pm such that d(a I ..... am) 4= O. Such
an a satisfies the requirements.
FAST PARALLEL MATRIX 245

In fact, there is a Monte Carlo algorithm to find such an a: Take P with

#P>~ 2mt2 t and choose a ~ pm at random. Then run the algorithm with
input a, and report "failure" if a division by 0 is attempted. By Lemma 1, the
probability that this will occur is ~<½. Note that if F = Q, say, and we have a
"random bit generator," then we can choose P : {-mt2t,...,mt2t t and
generate the required a's in time polynomial in t and m.
If F is finite, say with p elements, then we can take an irreducible
polynomial u ~ F[z] of degree 1 such that pt>l 2mt2 t, and consider the
extension field G : F [ z ] / ( u ) ofF. We can find such a u in random
polynomial (in m and t) time (Rabin, 1980). We now view the algorithm as a
computation over G[x I ..... Xm]. The field G has >/2mt2 t elements, so that we
can apply the above Monte Carlo procedure to find an appropriate a ~ Gm.
Each operation in G can be simulated by O(I 2) operations inF. Thus the
"sequential to parallel" conversion also works over finite fields.
One important property that is lost in the conversion is uniformity: given
a uniform family of sequential algorithms, it is not clear how to make the
resulting family of parallel algorithms uniform. In order to conserve
uniformity, it would be sufficient to know the existence of a "test"
polynomial d as above, with degree polynomial (not exponential as the d
above) in tn. (The family of parallel algorithms in Theorem 1 is, of course,
uniform, because we could explicitly write down al,..., am.)
Remark 2. The algorithm of Theorem l uses O(n ~5) processors.
Recently, Berkowitz (1982) has given an algorithm that computes n × n
determinants in parallel time O(log 2 n) using O(n 3"5) processors.

THEOREM 2. For any field F, G C D ean be computed in parallel time

O(log 2 n).
Proof Let f, g C F [ x ] be nonzero, d e g f : m ~ < n = d e g g . Let h :
gcd(f, g) be the monic greatest common divisor o f f and g, and d = deg h.
There exist polynomials u, v ~ F[x] such that h = uf + vg, deg u < n - d,
and deg v < m - d . For any k / > 0 and polynomials s = ~ s i x i and t =
t~x i, the conditions "sf + tg is monic of degree k, and deg s < n -- k" tran-
slate into the following (n + m -- 2k) X (n + m - 2k) systems S k of linear
equations in the coefficients of s and t:

fm
fm-I
gn
gn-1 gn
-~s._~_~] -0].,::
°.

g~
l SO [
f. ! tm--k-I ]

A..-A
go.
• go "'" g k _1/
o,
246 BORODIN, VON ZUR GATHEN, AND HOPCROFT

So for f , g as above, the following algorithm computes gcd(f, g) in parallel

time O(log z n):

(1) Compute in parallel a o..... a,., where a k = d e t A k and A k is the

coefficient matrix of S k.
(2) Set d = rain{k: a k 4= 0}.
(3) Compute a solution (s, t) of S d. (Note that S d is a nonsingular
system.)
(4) Compute gcd(f, g) = sf + tg. |
It would be important to have a similar result for the gcd of two integers,
and we ask the
Open Question 1. Is I N T E G E R G C D C NC?

4. RANK OF MATRICES

For algorithms computing the rank of matrices, i.e., the maximal size of
nonsingular submatrices, we can restrict attention to square matrices (by
padding with zeroes if necessary). We denote by Mn(S ) the set of n × n
matrices with entries from a set S. The rank cannot in general be considered
as an element of the ground field, and we have to make some output
convention such as the following: we require the algorithm to compute
fl ..... f~ ~ F(xl~, xlz ..... x~n) such that for any A E M.(F)
rank(A) = max{i: i = 0 or f/(A) 4= 0}.

We remark that this convention is inadequate over C, since then the n

equations fl(A ) . . . . . fn(A) = 0 in n ~ variables should have only A = 0 as
solution. If n > 1, then this is impossible over any algebraically closed field.
Ibarra-Moran-Rosier (1980) have the following result: If F is a subfield
of ~, then rank(A)=rank(AtA) and the matrix AtA is symmetric and
diagonalizable. Hence its rank can be read off from the coefficients of its
characteristic polynomial c = Y~0<i<n ci xi as

rank(A) -- max{j: e._j 4= 0}.

Thus R A N K can be computed in parallel time O(log 2 n) for such a field.

They also show that this is true if F is a subfield of C and one is allowed to
use complex conjugation in the algorithm.
The rank question has some nice applications in combinatorial
complexity. Consider a bipartite graph G on 2n nodes. We associate to it an
n × n matrix X over Q(x11, xlz ..... xn,) which has x u in the (i, j ) position if
node i is connected to node j, and zero otherwise. Then the maximal size of a
FAST PARALLEL MATRIX 247

matching in G is equal to rank(X) (Edmonds, 1967). By substituting

randomly chosen integers (from a fixed finite range, see Lemma 1) for the
indeterminates and computing the rank of such matrices over Q, we get a
Monte Carlo algorithm to determine the maximal size of matchings in G,
and hence a (nonuniform) algorithm for this problem whose parallel time is
O(log 2 n). It would be interesting to have an algorithm of this type that
actually constructs a maximal matching.
Next consider a directed graph with each edge being assigned an integer
capacity that is polynomially bounded by the number of nodes of the graph.
Feather (1981) reduces the problem of finding the maximum flow in such a
graph to the above maximal matching problem, and thus obtains an
O(log 2 n) algorithm.
It is interesting to note that without the restriction on the capacities the
problem is log space complete for P (Goldschlager-Shaw-Staples, 1982),
and hence we do not expect it to have a solution in parallel time O(log k n)
for some k.
The rank algorithm by Ibarra-Moran-Rosier provides a nice solution for
fields like • and ~. For the general case, in particular for finite fields, we
have the following result:

THEOREM 3. For any field F, we can compute the rank of n X n matrices

over F with a Monte Carlo algorithm with error probability ~<0.95 using
parallel time O(log z n) and a polynomially bounded total number of
processors. No division occurs.
Proof Let F be a field, n C N and A ~ Mn(F ). We assume some finite
subset P ___F of cardinality p such that either p ) 3n z or P = F. Furthermore,
we assume a "random element generator" for P that produces in one step of
a processor a randomly chosen element from P with respect to the uniform
distribution on P. (Thus if # F >~ 3n 2, we can take any large enough subset
for P, and otherwise we take P = F.)
For a matrix M ~ M n ( F ) and l ~ i ~ < n let Pi(M) be the principal i × i
minor of M, i.e., the square submatrix of M consisting of the first i rows and
columns of M, and pi(M) = det(Pi(M)). We perform the following algorithm
to compute rank(A):
(1) Choose two matrices B, C C MR(P) at random.
(2) For all i, 1 ~< i ~ n, compute f / = pi(BAC).
The output s can be read off in the manner described above:

s = max{i: i = 0 or f. 4= 0}.

It is clear from Theorem I that this algorithm uses parallel time O(log 2 n), a
polynomially bounded number of processors, and no division. Also
248 BORODIN, VON ZUR GATHEN, AND HOPCROFT

s ~ rank(A). The remainder of this proof is devoted to establishing the

estimate
Prob(s = rank(A)) ~> 0.05.

Let r = rank(A)• We consider

g~ = Pr(QAR ) ~ F[QI~, Q12,..., Rn,]

as a polynomial in the indeterminate entries Qis, Ru of the n × n-matrices
Q, R. Thus gA is linear in each of its 2n 2 indeterminates. It is nonzero, since
by permuting the rows and columns of A one can obtain a matrix with
nonsingular principal r X r minor; in other words, there exist permutation
matrices S, T E M , ( F ) such that Pr(SAT) is nonsingular, and hence
gA(S, T) ~ O. We also have
Prob(s < r) = Prob(ga(B, C) = 0)
-~ p-2"2 #{(B, C) C M,(P)a: gA(B, C) = 0}.

It remains to show that

Prob(g~(B, C) = 0) ~ 0.95.

We now distinguish two cases.

Case 1. p >~3n 2. By Lemma l,
Prob(gA(B, C) = O) <~2na/p ~ 0.95.
Case 2. P = F. Let
1 "..

1
D = M.(F)
0

be a diagonal matrix with r a n k ( D ) = r. Then

Pr(BDC) = Pr(B) Pr(C), Pr(BDC) = Pr(B)pr(C).

By Lemma 2(ii), we have
Prob(pr(B ) = 0) < 3,

and similarly for C. Since the entries of B and C are chosen independently,
pr(B) and p~(C) are independent random variables• Hence we get
FAST PARALLEL MATRIX 249

Prob(pr(BDC ) = O) = Prob(pr(B) p~(C) = O)

= Prob(pr(B ) = 0) + Prob(pr(C ) = O)

-- Prob(flr(B ) = Pr(C) = 0) ~< ~~- < 0.95,

using the monotonicity of the function 2 x - x 2 for 0 ~< x ~< 1.

Getting back to our input matrix A, we note that there exist nonsingular
matrices S, T ~ M , ( F ) such that A = SDT. Then

Prob(gA(B, C) = 0) = Prob(p~(BAC) = O)

= Prob(pr(BSDTC ) = O) = Prob(Pr(BDC ) = 0) ~< 0.95.

The third equality follows from the fact that the mapping

Mn(F ) × M , ( F ) -t Mn(F) X M , ( F )
(B, C ) ~ (BS-', T-1C)
is a bijection. |
For the two lemmas that establish the probabilistic estimates, we assume
the following sebup: A field F, P _~ F finite with p ~> 2 elements, and d ~ N.

LEMMA I. Let m E N, f C F[x, ..... Xm] be nonzero and of degree <.d in

each variable. Then
P r o b ( f = 0) ~ md/p.
Proof Let

q = V r o b ( f = 0) = p - m # { a ~ pro: f ( a ) = 0}.

W e s h o w by induction on m that q <~md/p. The case m = 1 is obvious. For

m > 1 we can a s s u m e that x m occurs in f (otherwise, apply the induction
hypothesis), and write f as a polynomial in x m

f= ~P giXrn, g o , . . . , g e E F [ x l ..... X m _ l ] , ge=/=O, O < e < . d .

O<~i~e
Then

#{a c P m : f ( a ) = 0} ~<#{a c P m : ge(a) = 0}

+#
la ~Pm: g~(a)@ 0 and ~ gi(a) a m
i
<~ pm(m -- 1)diP + d p m- ' = pmmd/p.

Hence q <~md/p. |
250 BORODIN, VON ZUR GATHEN, AND HOPCROFT

u(t, n)

3/4

1/2

1/4

t
1/4 1/2 3/4 1

FIG. 1. The upper bound u(t, n) on the probability that an n X n matrix is singular. The
entries of the matrix are nonconstant polynomials of degree ~<d over an arbitrary field F
whose arguments are randomly chosen from a subset of F with p elements, and t = dip. The
values n = 1, 2, 3, 4 are shown, and n = oe shows an upper bound for any n E N.

Lemma 2 gives a sharp estimate of the probability that a matrix of

polynomials is singular. We consider indeterminates X~l, x ~ ..... x,n over F,
and an n X n matrix g of univariate polynomials gu C F[xu] of degree ~<d.
We assume that no go is constant. Furthermore, we introduce the function
u(t, n ) = 1 - I - I i < i < n ( 1 - t i) for 0 ~ t ~ < 1 and n@ N. u is monotonically
increasing in both arguments (see Fig. 1).

LEMMA 2. In the above set-up, let q be the probability that det(g) is

zero. Then
(i) q <~u(d/p, n),
(ii) if d = 1, then q < ~,
(iii) if d = 1 and P is afield, then q = u ( 1 / p , n ) .
Proof. Let t = diP.
(i) We have to show that

q = p - " 2 # { A ~ M,(P): det(g(A)) = 0} ~< u(t, n),

where g(A) is the matrix with (i,j) entry gu(Au). For 1 ~< r~< n, let Tr be the
set of all r × n matrices with entries from P, and S t = {A E Tr:
rank(g(A)) = r}. Now let A C S~_ 1 and 1~_ {1 ..... n} such that # I = r - 1
FAST PARALLEL MATRIX 251

and the square minor of g(A) with rows 1..... r - - 1 and columns i ~ I is
nonsingular. A vector y C F" which is linearly dependent on the row vectors
of g(A) is determined by its entries Yi for i E I.
For a vector z C T~ such that y = (gn(z0,..., g~,(z,)) is linearly dependent
on the rows of g(A), we can in general choose the entries z i for i E I
arbitrarily; for the other entries we then have at most d choices each, deter-
mined by the fixed value for gri(zi). In particular, there are at most
pr-~d ~-~+1 such vectors z E T 1. Thus
#Sr>/ ( p n - pr-ld"-r+~)#Sr_ l= p n ( 1 - t " r+,)#Sr_~,

and by induction on r we get

#S~ >~ p r , 1-I (1 -- t;).

n-r<i<~n

Thus

q=P n2(p,2_#Sn)~<l_ I1 (1--t')=u(t,n).

l<i<n

(ii) For 0 ~ t < ~ l , let

v(t)= lira u(t,n)= l - ]-I ( 1 - t i ) .

n~oo 1<i

The product in v is uniformly convergent on every interval [0, a] with a < 1

(see Henrici (1977), Sect. 8.2, for a discussion and the relation of v with
combinatorial questions), and v is a monotonically increasing function. In
particular, for 0 ~< t ~ ½ and n ~ N we have

0 <. u(t, n) < v(t) ~< v(½) = 0.7112... < 3.

This proves (ii), since then t = lip ~ ½.
(iii) Under the hypotheses of (iii), we can replace "at most
pr-ld"-r+l" by "exactly pr-l,, in the argument for (i), and
#St = pr, 1~ (1 - t i)
n-r<i<~n

follows. This implies q =- u(t, n). This fact has been discovered several times,
see, e.g., Jordan (1911), F i n e - N i v e n (1944), and R o t m a n (1973,
Theorem 8.8). |
Statement (iii) shows that the estimate in (i) is sharp in a certain sense.
We remark that the estimate of (i) also holds when the determinant is
replaced by the permanent; the proof is slightly different.
It is now clear what the estimates in the proof of Theorem 3 really should
252 BORODIN, VON ZUR GATHEN, AND HOPCROFT

be: if c = 2 v ( ½ ) - v(½)2 = 0.9166..., then we should assume either p >/2n2/c

or P = F, and the error probability then is ~<e.
When interpreted as the decision problem "is A in U r = {B ~ M ~ ( F ) :
rank(B) ~< r}?", the above Monte Carlo algorithm always answers correctly
if A C U r, and correctly with probability ~>@0 if A ~ U r. We can improve
this behaviour to get a Las Vegas algorithm that always answers correctly
and with high probability has a low parallel running time.

THEOREM 4. For any field F, there exists a Las Vegas algorithm f o r the
rank of n × n matrices that uses parallel time O(log z n), no divisions and a
polynomial number o f processors and that either returns the rank o f the input
matrix (and a maximal nonsingular minor) or reports "failure." The
probability o f the second ease is ~ 2 - " .
Proof Let A ~ Mn(F ). For 1 ~< i ~< n let A i C Mn(F) consist of the first i
rows of A and zero rows otherwise, and r i = rank(Ai). Apply the algorithm
of Theorem 3 independently 28n times in parallel to each A i, and let s i be the
maximum of the numbers computed forAi. Thus s; ~< r t, and for each i
[ 19 "~28n
Prob(si < ri) <. ~ j .

Let I = { i : - l ~ < i < ~ n and si_ l < s i } (with s 0 = 0 ) , and A ' C M , ( F ) be the
matrix whose ith row is the ith row of A if i ~ 1 and zero otherwise. Perform
the computation that was done above for the rows of A now for the columns
of A' to obtain a set J~_ {1,..., n}. The parallel time for all this is O(log 2 n),
and
Prob(the I × J minor of A is not a maximal (square)
nonsingular minor) <. zntT¢)
~ " 19 x28n < 2 - "

If #I4= g~4r, then report failure. Else apply the determinant algorithm of
Theorem 1 to compute AIj and each AKL with I t _ K , J c L , # K = # L =
# I + 1 in parallel time O(log 2 n). Here AKL denotes the determinant of the
minor of A with rows from K and columns from L. If either A u = 0 or some
such A/~L is nonzero, then report failure. Otherwise, return s n = rank(A) and
the I × J minor as a maximal nonsingular minor. II
Using the rank algorithm over ~ by Ibarra-Moran-Rosier and the above
construction, we get a deterministic algorithm that uses parallel time
O(log 2 n) and computes a maximal nonsingular minor. Can we have similar
improvement for arbitrary fields? Of course our Monte Carlo algorithm
yields nonuniform algorithms using parallel time O(log 2 n). It would be
particularly interesting to have an answer to the following:
Open Question 2. Can R A N K be computed in parallel time O(log 2 n)
over finite fields, using a uniform family of deterministic algorithms?
FAST PARALLEL MATRIX 253

In connection with the maximal matching problem we remark that our

algorithm gives an O(log 2 n) method to determine a set of nodes on which a
matching of maximal size will take place; not, however, the actual matching.

5. REDUCTIONS

We have considered a number of algebraic problems whose parallel

complexity is now known to be O(log 2 n). Whether or not the parallel
complexity for any of these (and some closely related) problems can be
reduced to O(log n) remains a fundamental challenge for all of complexity
theory (see, for example, Borodin, 1982, and Valiant, 1982). It is natural
then to study the relative complexity of these problems.
Valiant (1982) introduced the notion of algebraic projections as a strong
type of reducibility between polynomials. For functions P = (P(n)) and P' =
(P'(n)), we write P <~P' to informally denote the fact that P can be reduced
to P' essentially by multiple use of projections and possibly some simple
(i.e., of O(log n) parallel complexity) arithmetic operations. A precise
definition for <~ will not be developed here. Given any reasonable definition
of reducibility P ~<P', and certainly for the specific reductions given in this
section, it follows that T(P) and T(P') satisfy T(P)--O(T(P')), using the
fact that for all interesting functions P, log n = O(T(P(n))). We write P ~ P'
to denote P <~P' and P' <~P, and we write P <~P' + P" to denote that both
P' and P" are used in the reduction.
Csanky (1976) has the following reductions:
(a) Over any field,
INVERSION ~ NONSINGULAR EQUATIONS
DETERMINANT ~< CHARACTERISTIC POLYNOMIAL.

(b) The characteristic polynomial can be computed by evaluating it at

several points (using a determinant algorithm) and interpolating. If the field
contains the necessary roots of unity to support a fast Fourier transform,
then the interpolation can be performed in O(log n), using the roots of unity
as interpolation points. We then have
CHARACTERISTIC POLYNOMIAL ~< DETERMINANT.

(c) If the field has characteristic zero, then

DETERMINANT <~ NONSINGULAR EQUATIONS.

Thus, over C all four problems are equivalent.

Next, we consider the problems BASIS (=computing a maximal linearly
independent subset of a given set of vectors), EQUATIONS (=deciding
254 BORODIN~ VON ZUR GATHEN, AND HOPCROFT

whether a (possibly singular) system of linear equations has a solution, and

in the affirmative case, computing one solution), and N U L L S P A C E
(=computing a basis for the nullspace of a matrix).

THEOREM 5. For an arbitrary field, we have the following reductions:

(i) BASIS ~ RANK ~< NULLSPACE <~ R A N K + INVERSION,

(ii) EQUATIONS ~ RANK + INVERSION.
Proof (i) We note that the rank is implied directly by BASIS or
NULLSPACE. The reduction BASIS ~< R A N K is obtained by including v i
in the basis iff rank(v 1..... Vi_l) < rank(v~ ..... vi). For the remaining
reduction, we can compute a maximal nonsingular minor M for an input
matrixA, by applying BASIS (say) first to the columns of A and then to the
rows of the selected columns. Without loss of generality let M be the upper
r × r submatrix of A. For each i satisfying r < i~<n, we can solve the
nonsingular system M x i = y i , where Yi denotes the first r rows of the ith
column ofA. A basis for the nullspace of A then consists of the vectors

[iJ
1 ' r<i<~n,

that is, x i is extended by 0's except for an - i in the ith position.

(ii) For EQUATIONS, we again compute a maximal nonsingular
minor M of the input matrix, solve the corresponding nonsingular system of
linear equations, set all indeterminates not corresponding to columns of M
equal to zero, and check whether this constitutes a solution. If it does not,
then the input system has no solution at all. |
Allowing nonuniform reductions, and allowing the concept of reducibility
to use linear transformations, it follows (by Theorem 3 and Csanky's
reductions) that RANK ~< CHARACTERISTIC POLYNOMIAL ~<
D E T E R M I N A N T and hence EQUATIONS <~ DETERMINANT. We also
know (from Theorem 2) that GCD ~< DETERMINANT.
We are left with a number of potential reductions which are either
completely unresolved or hold only in the nonuniform case. In particular, we
ask the following:
Open Question 3. (a) Are any of the above problems ~<GCD?
(b) Is D E T E R M I N A N T ~ N O N S I N G U L A R EQUATIONS for
arbitrary fields?
FAST PARALLEL MATRIX 255

The latter question is also open in the sequential setting (see Baur-Strassen,
1983).

6. CONCLUSIONS

We have laid the foundation for what might be called a "theoretical

package for parallel symbolic manipulation." The problems investigated
include gcd of polynomials, solution of linear equations, determinant and
rank of matrices. They all can be solved in parallel time O(log2(input size));
for rank a Las Vegas method is used.
Further important routines for this "theoretical package" include factoring
polynomials over finite fields (which provided the original motivation for this
paper; consider the critical role of the GCD and NULLSPACE in the
Berlekamp (1970) factoring algorithm), continued fractions and partial
fraction expansion of rational functions, Pad6 approximation of power series,
and interpolation. They will be discussed in a forthcoming paper (von zur
Gathen, 1983).
Finally, as in sequential computation, we remark that integer problems are
often more difficult to understand than the corresponding polynomial
problem. For example, it remains an open problem as to whether or not a
fast (e.g., O(log z n)) parallel algorithm exists for the gcd of two n-bit
numbers. An even more challenging open problem concerns the parallel
complexity of determining primality.

ACKNOWLEDGMENTS

We are very grateful to Steve Cook for an observation which led to Theorem 2, to both
Steve Cook and Les Valiant for pointing out relations with combinatorial problems, and to
Volker Strassen for an improvement in the algorithm for Theorem 3. Thanks also go to
Yaacov Yesha for pointing out one of the references.

REFERENCES

BAREISS, E. H. (1968), Sylvester's identity and multistep integer-preserving Gaussian

elimination, Math, Cornput. 22, 565-578
BAUR, W. AND STRASSEN, V. (1983), The computational complexity of partial derivatives,
Theor. Comput. Sci. 22, 317-330.
BERLEKAMP, E. R. (1970), Factoring polynomials over large finite fields, Math. Comput. 24,
713-735.
BERKOWITZ, S. J. (1982), On computing the determinant in small parallel time using a small
number of processors, manuscript, October 1982.

643/52/3-2
256 BORODIN, VON ZUR GATHEN, AND HOPCROFT

BORODIN, A. (1982), Structured vs. general models in computational complexity, in "Logic

and Algorithmic, Symposium in Honour of Ernst Specker," Monographie No. 30 de FEn-
seignement Math6matique, pp. 47-65.
CSANKY, L. (1976), Fast parallel matrix inversion algorithms, S I A M J. Comput. 5, 618-623.
EDMONDS, J. (1967), Systems of distinct representatives and linear algebra, J. Res. Nat. Bur.
Stand. 71B (4), 241-245.
FEATHER, T. (1981), Private communication, November 1981.
FINE, N. J. AND NIVEN, I. (1944), The probability that a determinant be congruent to a
(mod m), Bull. Amer. Math. Soe. 50, 89-93.
FORTUNE, S. AND WVLLIE, J. (1978), Parallelism in random access machines, in "Proc. 10th
ACM Symposium on Theory of Computing," pp. 114-118, San Diego, California.
GATHEN, J. VON ZUR (1983), Parallel algorithms for algebraic problems, in "Proc. 15th
ACM Symp. Theory of Computing, Boston."
GOLDSCHLAGER,L. M., SHAW, R. A., AND STAPLES,J. (1982), The maximum flow problem is
log space complete for P, Theor. Comput. Sci. 21, 105-111.
HENRICI, P. (1977), "Applied and Computational Complex Analysis," Vol. 2, Wiley,
New York.
IBARRA, O., MORAY, S., AND ROSIER, L. E. (1980), A note on the parallel complexity of
computing the rank of order n matrices, Inf. Process. Lett. I 1, 162.
JORDAN, C. (1911), Sur le nombre des solutions de la congruence laikl -~A (mod M), J. Math.
Pures Appl. (6), 7, 409-416.
PIPPENGER, N. (1979), On simultaneous resource bounds, in "Proc. IEEE 20th Annual
FOCS," October 1979, 307-311.
RA~rN, M. O. (1980), Probabilistic algorithms in finite fields, S I A M J. Comput. 9, 273-280.
ROTMAN, J. J. (1973), "The Theory of Groups," (2nd ed.), Allyn & Bacon, Boston.
STRASSEN, V. (1973), Vermeidung von Divisionen, J. Reine Angew. Math. 264, 184-202.
VALIANT, L., SKYUM,S., BERKOWITZ,S., AND RACKOFF, C. (1981), Fast parallel computation
of polynomials using few processors, S I A M J. Comput., to appear.
VALIANT, L. (1982), Reducibility by algebraic projections, in "Logic and Algorithmic,
Symposium in Honour of Ernst Specker," Monographie No. 30 de l'Enseignement
Math6matique, pp. 365-380.