Parallel DP

Efficient Parallel Dynamic Programming¹
(Revised)
Phillip G. Bradford
Indiana University
Department of Computer Science
215 Lindley Hall
Bloomington, IN 47405
(812) 855-3609
bradford@cs.indiana.edu
October 27, 1994
¹ An extended abstract of this paper is in the Proceedings of the 30th Annual Allerton Conference on
Communication, Control and Computing, University of Illinois, 185-194, 1992. This is Technical Report
# 352 (Revised), Indiana University. Hardcopy is available or a postscript file is obtainable via internet:
cs.indiana.edu:/pub/techreports/TR352.ps.Z. Tech-Report revision posted: Oct. 1994.
Abstract
In 1983, Valiant, Skyum, Berkowitz and Rackoff showed that many problems with simple $O(n^3)$
sequential dynamic programming solutions are in the class NC. They used straight line programs
to show that these problems can be solved in $O(\lg^2 n)$ time with $n^9$ processors. In 1988, Rytter used
pebbling games to show that these same problems can be solved on a CREW PRAM in $O(\lg^2 n)$
time with $n^6/\lg n$ processors. Recently, Huang, Liu and Viswanathan [23] and Galil and Park [15]
gave algorithms that improve this processor complexity by polylog factors.
Using a graph structure that is analogous to the classical dynamic programming table, this
paper improves these results. First, this graph characterization leads to a polylog time and $n^6/\lg n$
processor algorithm that solves these problems. Second, there follows a subpolylog time and sublinear
processor parallel approximation algorithm for the matrix chain ordering problem. Finally,
this paper presents an $n^3/\lg n$ processor and $O(\lg^3 n)$ time algorithm that solves the optimal matrix
chain ordering problem and the problem of finding an optimal triangulation of a convex polygon.
1.1 Introduction
Dynamic programming has long been useful in designing efficient sequential algorithms for a variety
of combinatorial optimization problems. However, the standard sequential construction of
dynamic programming tables has prevented the development of efficient parallel algorithms. With
straightforward parallelization, many elementary dynamic programming algorithms require linear
time at best. This paper gives a graph characterization that models dynamic programming tables.
These graphs lead directly to polylog time algorithms for optimal matrix chain ordering, the optimal
construction of binary search trees and the optimal triangulation of convex polygons. Further,
tree decomposition of these graphs gives efficient polylog time parallel algorithms for the matrix
chain ordering problem. The matrix chain ordering problem is the primary focus of this paper.
The dynamic programming paradigm is based on the principle of optimality. This principle
is that for a structure to be optimal all of its well-formed substructures must also be optimal.
Hence, the dynamic programming paradigm is essentially a top down algorithm design method.
Conversely, the greedy principle basically is that if a substructure is optimal then it is in some
optimal superstructure. In some sense this is a bottom up design method. Intuitively, the main
results of the paper rely on finding applications of the greedy principle that enhance the efficiency
of applying the principle of optimality.
This paper first shows how to solve several problems by finding a shortest path in an associated
digraph. Next, it focuses on solving the matrix chain ordering problem building on work by Hu
and Shing, and Berkman et al. Given a graph characterizing a dynamic programming table and by
isolating monotonic and non-monotonic lists of adjacent matrix dimensions, we can ignore certain
subgraphs while finding a shortest path.
1.1.1 Main Results of this Paper
This paper lays a foundation for new parallel algorithms that efficiently solve many combinatorial
optimization problems. From this foundation, several algorithms that solve a variety of problems
follow. The following problems are representative of those to which these methods can be applied
(for formal definitions of these problems see [3, 11]):
optimal matrix chain ordering: Find an optimal ordering to multiply a matrix chain together,
where the matrices are pairwise compatible but of varying dimensions.
optimal convex polygon triangulation: Find an optimal triangulation of a convex polygon
given a triangle cost metric.
optimal binary search tree construction: Build a binary search tree with minimal average
lookup time given a totally ordered set of elements and their access probabilities.
The final sections of this paper focus on the matrix chain ordering problem and the convex polygon
triangulation problem.
These three problems are representative of many that can be solved sequentially in $O(n^3)$ time
with elementary dynamic programming algorithms, although there are faster algorithms that are
more complex.
This paper shows how to transform all three problems to a minimum cost parenthesization
problem on a weighted semigroupoid. This problem is then transformed to a shortest path problem
on a special weighted digraph. Applying a shortest path algorithm based on matrix multiplication
then gives an $O(\lg^2 n)$ time and $n^6/\lg n$ processor solution.
The rest of the paper is about the matrix chain ordering problem. Next follows a subpolylogarithmic
time and sublinear processor algorithm that provides an approximate solution that is
guaranteed to be within 15.5% of optimality. This algorithm follows from the work of Chin [10], and
Hu and Shing [20] combined with that of Berkman et al. [4]. Finally, exploiting special properties
of the digraphs that model the semigroupoids, an $O(\log^3 n)$ time algorithm using $n^3/\log n$ processors
is given. In some sense this work extends the results of [1, 2, 26]. The work of Hu and Shing
[19, 21, 22] supplies a basis for these algorithms, although they are built in a different framework.
This paper is an update of [6] and the full version of [7]. Much subsequent work has appeared;
see for example [13, 29, 8, 30]. In addition, [12] reports very similar results to those in section 6
of this paper.
This paper takes all logarithms as base 2 and assumes the common-CRCW PRAM parallel
model.
1.1.2 Previous Results
Using straight line arithmetic programs Valiant et al. [32] showed that many classical optimization
problems with efficient sequential dynamic programming solutions are in NC. However, the algorithms
they suggest appear to require $\Theta(\log^2 n)$ time and $n^9$ processors. Using pebbling games, Rytter [31]
describes a general method to generate more efficient parallel algorithms for a class of optimization
problems with dynamic programming solutions. He solves the three problems mentioned in the
previous subsection in $O(\log^2 n)$ time with $n^6/\log n$ processors on a CRCW PRAM. Huang, Liu
and Viswanathan [23] and Galil and Park [15] give algorithms that solve this problem in $O(\lg^2 n)$
time using $n^6/\lg^5 n$ and $n^6/\lg^6 n$ processors, respectively.
The three problems in Subsection 1.1.1 can be solved sequentially in $O(n^3)$ time with elementary
dynamic programming algorithms, and in $O(n^2)$ time by more complex algorithms [33]. But the best
serial solution of the matrix chain ordering problem known is Hu and Shing's $O(n \lg n)$ algorithm
[19, 21, 22]. It has been conjectured that Hu and Shing's algorithm is optimal [28].
Finally, variations of the string editing problem are often solved with dynamic programming
algorithms [11], and parallel polylog time solutions of these problems use $O(n^2)$ processors [1, 2, 26].
However, these string edit problems differ from the three in Subsection 1.1.1 and have elementary
$O(n^2)$ sequential dynamic programming solutions. In particular, Apostolico et al. [1] use divide
and conquer techniques to find shortest paths in special planar digraphs in $O(\lg n)$ time using
$n^2/\lg n$ processors. Ibarra, Pong and Sohn [26] also use a graph characterization to solve such
problems achieving similar results. Aggarwal and Park [2] also employ graph characterizations of
these problems, but in addition they use properties of certain monotone arrays to find shortest
paths.
1.1.3 Structure of this Paper
Section 2 gives the computational assumptions of this paper.
Section 3 defines the balanced minimum cost parenthesization problem (BPP). This problem
models variations of the string edit problem. Finding shortest paths in planar weighted digraphs
solves the BPP.
Section 4 generalizes the BPP to the minimum cost parenthesization problem (MPP). The MPP
models problems such as the three in Subsection 1.1.1 and it can be solved by finding a shortest
path in special non-planar weighted digraphs.
Section 5 focuses on the matrix chain ordering problem (MCOP), a special case of the MPP.
Here, a list of weights represents the dimensions of the matrices and combinations of these weights
make up the edge weights of the corresponding graph. The convex polygon triangulation problem
is also a special case of the MPP. Alternatively, for the convex polygon triangulation problem a list
of weights represents vertices of a convex polygon. Throughout this paper we focus on the MCOP
since solutions to the MCOP are often given as standard examples of the dynamic programming
paradigm. We model these weights and related graph nodes by the nesting levels of parentheses,
using the all nearest smaller value problem of Berkman et al. [4, 5].
Section 6 contains a parallel approximation algorithm for the MCOP. This algorithm does a
linear amount of work. Intuitively, this algorithm works by removing relatively heavy weights so
that the remaining weight list closely approximates an optimal solution.
Section 7 gives the polylog time parallel algorithm for the MCOP and the convex polygon
triangulation problem. It works in $O(\lg^3 n)$ time using $n^3/\lg n$ processors. This algorithm works
by isolating atomic subgraphs and then connecting them to form a tree. Finally, a tree contraction
algorithm computes a shortest path.
2.1 Definitions
This section contains definitions of structures that model associative products. With these definitions,
problems including the three in Subsection 1.1.1 can be suitably addressed.
2.1.1 Computational Assumptions
A weighted semigroupoid $(S, R, \bullet, pc)$ is a semigroupoid $(S, R, \bullet)$ with a non-negative product cost
function $pc$ such that if $(a_i, a_k) \in R$ then $pc(a_i, a_k)$ is the cost of evaluating $a_i \bullet a_k$. The minimal
cost of evaluating an associative product $a_i \bullet a_{i+1} \bullet \cdots \bullet a_k$ is denoted by $mp(i, k)$. That is,
$$mp(i, k) = \min_{i \le j < k} \{\, mp(i, j) + mp(j + 1, k) + f(i, j, k) \,\}$$
where $f(i, j, k) = pc(a_i \bullet a_{i+1} \bullet \cdots \bullet a_j,\; a_{j+1} \bullet a_{j+2} \bullet \cdots \bullet a_k)$.
Given a weighted semigroupoid and an associative product $a_1 \bullet a_2 \bullet \cdots \bullet a_n$ the problem of
finding $mp(1, n)$ is the minimum parenthesization problem on a weighted semigroupoid (MPP).
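The recurrence for $mp(i, k)$ translates directly into the elementary $O(n^3)$ sequential algorithm. The following is a minimal sketch of that sequential baseline, not of the parallel method developed in this paper. The names `min_parenthesization` and `matrix_chain_cost` are hypothetical, the base case $mp(i, i) = 0$ is assumed, and the matrix chain cost $f(i, j, k) = w_i w_{j+1} w_{k+1}$ is used only as an example cost function.

```python
from functools import lru_cache


def min_parenthesization(n, f):
    """Return mp(1, n) for a cost function f(i, j, k) with 1-based indices.

    Hypothetical helper: evaluates
        mp(i, k) = min over i <= j < k of mp(i, j) + mp(j + 1, k) + f(i, j, k)
    top-down with memoization, assuming mp(i, i) = 0.
    """
    @lru_cache(maxsize=None)
    def mp(i, k):
        if i == k:
            return 0  # a single element needs no product
        return min(mp(i, j) + mp(j + 1, k) + f(i, j, k) for j in range(i, k))

    return mp(1, n)


def matrix_chain_cost(w):
    """Build f(i, j, k) = w_i * w_{j+1} * w_{k+1} from the weight list w_1..w_{n+1}."""
    return lambda i, j, k: w[i - 1] * w[j] * w[k]


if __name__ == "__main__":
    # Four matrices of dimensions 5x10, 10x3, 3x20 and 20x6.
    print(min_parenthesization(4, matrix_chain_cost([5, 10, 3, 20, 6])))  # 600
```

Passing the cost function as a parameter mirrors the role of the array $f$ in the text: any weighted semigroupoid can be plugged in without changing the recurrence.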
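Assume a globally accessible 4-tuple $(S, R, \bullet, pc)$ represents a weighted semigroupoid. Represent
$pc$, $R$, and $\bullet$ by the array $f(1..n, 1..n, 1..n)$, where
$$f(i, j, k) = \begin{cases} pc(a_i \bullet \cdots \bullet a_j,\; a_{j+1} \bullet \cdots \bullet a_k) & \text{if } (a_i \bullet \cdots \bullet a_j) \bullet (a_{j+1} \bullet \cdots \bullet a_k) \in S \\ \infty & \text{otherwise} \end{cases}$$
Given $a_i$ and $a_j$, assume that both $a_i \bullet a_j$ and $pc(a_i, a_j)$ can be computed in constant time.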
The paper assumes the standard elementary matrix multiplication algorithm. In addition, it
assumes the cost of multiplying a matrix of dimension $w_i \times w_j$ by one of dimension $w_j \times w_k$ is
$w_i w_j w_k$ operations. Also, the convex polygon triangulation problem assumes that a triangle with
the vertices $w_i$, $w_j$ and $w_k$ has cost $w_i w_j w_k$.
This paper is about finding the minimum number of operations necessary to solve optimization
problems. It does not address solving the optimization problems themselves. For instance, in
solving the MCOP we derive the minimal number of operations needed to multiply n matrices. (In
addition, we compute the actual optimal ordering of the matrices.) But we do not consider whether
these matrices are to be multiplied sequentially or in parallel.
3.1 The Balanced Minimum Cost Parenthesization Problem
This section defines a subproblem of the MPP called the balanced parenthesization problem on
a weighted semigroupoid (BPP). We use this problem to develop the MPP and to discuss some
previous results.
$S_{1,n}$ is the start symbol
$S_{i,i+1} \rightarrow a_i\, a_j$ iff $i + 1 = j$ and $i + 1 \le n$
$S_{i,j} \rightarrow a_i\, (S_{i+1,j}) \mid (S_{i,j-1})\, a_j$ iff $i < j - 1$
Figure 3.1: The Grammar $L_1$

Any string derived from L1 is called a balanced parenthesization of n elements.
For every weighted semigroupoid with a balanced associative product of n elements we can
construct a corresponding weighted digraph. As we shall see, finding a shortest path in such a
graph solves the BPP for the given associative product.
Denote vertices by $(i, j)$, where $1 \le i \le j \le n$, and edges by $\rightarrow$, $\uparrow$ or $\nearrow$. Edge $(i, j) \uparrow (i - 1, j)$
represents the product $a_{i-1} \bullet (a_i \bullet \cdots \bullet a_j)$, therefore it weighs $f(i - 1, i - 1, j)$. Similarly, $(i, j) \rightarrow (i, j + 1)$
represents the product $(a_i \bullet \cdots \bullet a_j) \bullet a_{j+1}$ and it weighs $f(i, j, j + 1)$. Also, $\nearrow$ represents
an edge from $(0, 0)$ to $(i, i)$ for all $i$, $1 \le i \le n$.
Definition 1 Given an $n$-element weighted semigroupoid, the graph $G_n = (V, E)$ has vertices,
$$V = \{(i, j) : 1 \le i \le j \le n\} \cup \{(0, 0)\}$$
and unit edges,
$$E = \{(i, j) \rightarrow (i, j + 1) : 1 \le i \le j < n\} \cup \{(i, j) \uparrow (i - 1, j) : 1 < i \le j \le n\} \cup \{(0, 0) \nearrow (i, i) : 1 \le i \le n\}$$
and a weight function $W$ where
$$W((i, j) \rightarrow (i, j + 1)) = f(i, j, j + 1), \qquad 1 \le i \le j < n$$
$$W((i, j) \uparrow (i - 1, j)) = f(i - 1, i - 1, j), \qquad 1 < i \le j \le n$$
$$W((0, 0) \nearrow (i, i)) = 0, \qquad 1 \le i \le n$$
For example, see the graph G4 in Figure 3.2.

Given a weighted semigroupoid, $n^2$ processors can construct a corresponding $G_n$ graph in constant
time, since each vertex has indegree and outdegree of at most two.
The following restricted instance of the matrix chain ordering problem is a special case of the
BPP: Let "$\bullet$" denote matrix multiplication in the four-matrix instance, $M_1 \bullet M_2 \bullet M_3 \bullet M_4$, of the
BPP. The problem is to minimize the cost of multiplying this chain while excluding the product
$(M_1 \bullet M_2) \bullet (M_3 \bullet M_4)$, since it is not a balanced parenthesization.
Given a vertex $(i, j)$ of $G_n$, finding a shortest path from $(0, 0)$ to $(i, j)$ solves the minimum cost
parenthesization problem for the balanced associative product $a_i \bullet a_{i+1} \bullet \cdots \bullet a_j$.
[Figure 3.2: The Balanced Weighted Digraph $G_4$, with vertices (0,0), (1,1), (1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,3), (3,4) and (4,4).]
Theorem 1 Finding a shortest path from the vertex $(0, 0)$ to vertex $(1, n)$ in $G_n$ solves the minimal
cost balanced parenthesization problem for an associative product of $n$ elements.
A proof is by induction on $n$.
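Because $G_n$ is acyclic and every vertex $(i, k)$ with $i < k$ has exactly two incoming unit edges, the shortest path of Theorem 1 can be computed sequentially by a simple relaxation over increasing $k - i$. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the matrix chain cost $f(i, j, k) = w_i w_{j+1} w_{k+1}$ from Section 5 is used as the example cost.

```python
def balanced_parenthesization(n, f):
    """Cost of a shortest path from (0,0) to (1,n) in G_n (illustrative, sequential).

    sp[i, k] is the cheapest way to reach vertex (i, k) using only unit edges:
      (i, k-1) -> (i, k)   with weight f(i, k-1, k)
      (i+1, k) ^  (i, k)   with weight f(i, i, k)
    """
    sp = {(i, i): 0 for i in range(1, n + 1)}  # zero-weight edges from (0,0)
    for length in range(1, n):
        for i in range(1, n - length + 1):
            k = i + length
            sp[i, k] = min(sp[i, k - 1] + f(i, k - 1, k),
                           sp[i + 1, k] + f(i, i, k))
    return sp[1, n]


def matrix_chain_cost(w):
    """f(i, j, k) = w_i * w_{j+1} * w_{k+1} for the weight list w_1..w_{n+1}."""
    return lambda i, j, k: w[i - 1] * w[j] * w[k]


if __name__ == "__main__":
    # Matrices 5x10, 10x3, 3x20, 20x6: the best balanced parenthesization.
    print(balanced_parenthesization(4, matrix_chain_cost([5, 10, 3, 20, 6])))  # 840
```

For this instance the balanced optimum (840) is larger than the unrestricted optimum, which is obtained by the split product $(M_1 \bullet M_2) \bullet (M_3 \bullet M_4)$ that the BPP excludes.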
Since $G_n$ has $O(n^2)$ vertices, computing a minimum path in $O(\lg^2 n)$ time by a parallel matrix
multiplication based minimum path algorithm takes $n^6/\lg n$ processors. With specialized minimum
path algorithms the processor complexity improves dramatically [1, 2, 26]. These methods can
compute a shortest path in $O(\lg^2 n)$ time with $n^2/\lg n$ processors on a CREW PRAM.
In [1, 2, 26] dynamic programming problems are also transformed into graph search problems;
these variations of the BPP are used to solve string edit problems.
4.1 The Minimum Cost Parenthesization Problem

To represent these split parenthesizations we add edges to $G_n$ called jumpers. Denote a horizontal
jumper by $\Rightarrow$, and a vertical jumper by $\Uparrow$. The vertical jumper $(i, j) \Uparrow (s, j)$ is $i - s$ units long
and the horizontal jumper $(i, j) \Rightarrow (i, t)$ is $t - j$ units long, where all non-jumper edges are of
length 1. See Figure 4.3.
[Figure 4.3: A Horizontal Jumper with its Associated Weight. The jumper spans the unit path (i,j), (i,j+1), ..., (i,k-1), (i,k) and is labeled mp(j+1,k) + f(i,j,k).]
The jumper $(i, j) \Rightarrow (i, k)$ represents the product $(a_i \bullet \cdots \bullet a_j) \bullet (a_{j+1} \bullet \cdots \bullet a_k)$ and this jumper
weighs $sp(j + 1, k)$ plus $f(i, j, k)$. When finding a shortest path to $(i, k)$ we take it on faith that a
shortest path to $(j + 1, k)$ will be computed in time for $sp(j + 1, k)$ to be added to $f(i, j, k)$. Similar
observations hold for vertical jumpers.
Definition 2 The weighted digraph $D_n = (V, E \cup E')$ is a weighted digraph $G_n = (V, E)$ together
with the jumpers,
$$E' = \{(i, j) \Rightarrow (i, t) : 1 \le i < j < t \le n\} \cup \{(s, t) \Uparrow (i, t) : 1 \le i < s < t \le n\}$$
and each jumper has weight
$$W((i, j) \Rightarrow (i, t)) = sp(j + 1, t) + f(i, j, t), \qquad 1 \le i < j < t \le n$$
$$W((s, t) \Uparrow (i, t)) = sp(i, s - 1) + f(i, s - 1, t), \qquad 1 \le i < s < t \le n$$
For example, see the graph D4 in Figure 4.4.
The following instance of the matrix chain ordering problem is a special case of the MPP: Let
"$\bullet$" denote matrix multiplication in the four-matrix instance, $M_1 \bullet M_2 \bullet M_3 \bullet M_4$, of the MCOP.
Say these matrices are of dimensions $5 \times 10$, $10 \times 3$, $3 \times 20$, and $20 \times 6$, respectively. In this case,
the optimal product of matrices $M_1$, $M_2$, and $M_3$ is $(M_1 \bullet M_2) \bullet M_3$. But this is not a well-formed
subproduct of the optimal matrix product of all four matrices, that is $(M_1 \bullet M_2) \bullet (M_3 \bullet M_4)$. This
apparent lack of greediness seems to make techniques such as those of [1, 2, 26] fail to work for the
MCOP.
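The costs behind this example can be worked out from the cost model of Section 2.1.1, where multiplying a $p \times q$ matrix by a $q \times r$ matrix costs $pqr$ operations. The figures below are computed here for illustration and are not quoted from the original text:

```latex
\begin{align*}
\mathrm{cost}\big((M_1 \bullet M_2) \bullet M_3\big)
  &= 5\cdot10\cdot3 + 5\cdot3\cdot20 = 150 + 300 = 450,\\
\mathrm{cost}\big(M_1 \bullet (M_2 \bullet M_3)\big)
  &= 10\cdot3\cdot20 + 5\cdot10\cdot20 = 600 + 1000 = 1600,\\
\mathrm{cost}\big((M_1 \bullet M_2) \bullet (M_3 \bullet M_4)\big)
  &= 150 + 3\cdot20\cdot6 + 5\cdot3\cdot6 = 150 + 360 + 90 = 600,\\
\mathrm{cost}\big(((M_1 \bullet M_2) \bullet M_3) \bullet M_4\big)
  &= 450 + 5\cdot20\cdot6 = 450 + 600 = 1050.
\end{align*}
```

So extending the optimal three-matrix product by $M_4$ costs 1050, while the optimal four-matrix product costs 600 and does not contain $(M_1 \bullet M_2) \bullet M_3$ as a subproduct.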
Note the similarity of a $D_n$ graph and a classical dynamic programming table, $T$, for the matrix
chain ordering problem. The value of $sp(i, k)$ in a $D_n$ graph is the same as $T[i, k]$ in the equivalent
dynamic programming table.
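Calculating a shortest path to $(i, k)$ gives the minimum cost parenthesization of $a_i \bullet \cdots \bullet a_k$, for
$1 \le i < k \le n$. So finding a shortest path from $(i, k)$ to $(1, n)$ gives a minimal parenthesization of
$a_1 \bullet \cdots \bullet a_{i-1} \bullet P \bullet a_{k+1} \bullet \cdots \bullet a_n$ where $P = (a_i \bullet \cdots \bullet a_k)$.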
Theorem 2 Finding a shortest path from $(0, 0)$ to $(1, n)$ in $D_n$ solves the minimal cost parenthesization
problem for an associative product of $n$ elements.
There is a straightforward proof by induction on jumper length assuming Theorem 1.
The converse of this theorem also holds. That is, for any optimal parenthesization of a weighted
semigroupoid there is a shortest path in a corresponding $D_n$ graph.
It is not clear how to generalize the shortest path algorithms in [1, 2, 26] to $D_n$ graphs. This is
at least partially due to the jumpers' weights; computing these weights seems difficult.
4.1.1 Constructing a Dn Graph
Constructing a $D_n$ graph can be done by starting with a weighted digraph $G_n$ and then performing
incremental path relaxation [11] while adding new jumpers. A shortest path in $D_n$ is found simultaneously.
We accomplish these two goals by using a variation of a matrix multiplication based all
pairs shortest path algorithm. We refer to the matrix multiplication based all pairs shortest path
algorithm as a $(\min, +)$ matrix multiplication algorithm.
Jumper lengths are the basis of the arguments in this subsection.
Although $G_n$ can be constructed in constant time with $n^2$ processors, it does not seem possible
to construct $D_n$ in constant time with as few processors. This is because the weight of a jumper
$(i, j) \Rightarrow (i, k)$ cannot be computed until $sp(j + 1, k)$ becomes available.
[Figure 4.4: The Weighted Graph $D_4$, with vertices (0,0), (1,1), (1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,3), (3,4) and (4,4).]
Lemma 1 For all vertices $(i, k)$ in a $D_n$ graph, $sp(i, k)$ can be computed by a path having edges of
length no larger than $\lceil (k - i)/2 \rceil$.
Proof: Suppose that $(i, j) \Rightarrow (i, k)$ is in a shortest path to $(i, k)$ and $k - j > \lceil (k - i)/2 \rceil$. Hence,
$$sp(i, k) = sp(i, j) + W((i, j) \Rightarrow (i, k)) = sp(i, j) + sp(j + 1, k) + f(i, j, k).$$
But $W((j + 1, k) \Uparrow (i, k)) = sp(i, j) + f(i, j, k)$, so
$$sp(i, k) = sp(j + 1, k) + W((j + 1, k) \Uparrow (i, k)).$$
The jumper $(j + 1, k) \Uparrow (i, k)$ is of length $j + 1 - i$. Therefore, since $(j + 1 - i) + (k - j) = k - i + 1$
and $k - j > \lceil (k - i)/2 \rceil$, we know that $j + 1 - i \le \lceil (k - i)/2 \rceil$.
On the other hand, a shortest path to $(j + 1, k)$ cannot contain a jumper longer than $k - (j + 1)$.
Since $k - (j + 1) < k - j$ this lemma follows inductively. □
The proof of this last lemma leads directly to the following theorem.
Theorem 3 (Duality Theorem) If a shortest path from $(0, 0)$ to $(i, k)$ contains the jumper
$(i, j) \Rightarrow (i, k)$, then there is a dual shortest path containing the jumper $(j + 1, k) \Uparrow (i, k)$.
Now we will show how to construct a $D_n$ graph starting with a $G_n$ graph. For any vertex $(i, j)$
the value $j - i$ is the distance from $(0, 0)$ to $(i, j)$. So starting with a $G_n$ graph, one $(\min, +)$ matrix
multiplication computes all $sp(i, j)$ for any $(i, j)$ where $j - i \le 2^1 - 1$. Also, the minimum distances
between all pairs of nodes in $G_n$ up to 2 nodes apart are now available. With this, construct all
jumpers of length 2. For length 2 horizontal jumpers, this is done by relaxing [11] them with the
two horizontal edges they are directly above. In particular, when there is more than one path from a
source to a destination, relaxing is the process of finding the minimum of these paths. In Figure
4.5, the min operations are path relaxations.
Another $(\min, +)$ matrix multiplication gives $sp(i, j)$ for all $(i, j)$ such that $j - i \le 2^2 - 1$.
Continue this process inductively. Suppose that for all $(i, j)$, where $j - i \le 2^r - 1$, the values
$sp(i, j)$ have been computed. At the same time the minimum distances between all pairs of nodes
in $G_n$ up to $2^r$ nodes apart have been computed. Now we can compute all jumpers of lengths
ranging from $2^{r-1} + 1$ through $2^r$ and then relax them with the appropriate straight paths of
lengths from $2^{r-1} + 1$ through $2^r$. Performing another matrix multiplication makes the values for
$sp(i, j)$, for all $(i, j)$ where $j - i \le 2^{r+1} - 1$, become available.
Lemma 2 Assume all shortest paths have been calculated between each pair of vertices $2^{r-1}$ units
apart. Suppose, for all $(j, k)$ where $k - j \le 2^{r-1} - 1$, the value $sp(j, k)$ is available. Then we can
calculate $sp(i, t)$ with two $(\min, +)$ matrix multiplications for all $(i, t)$ such that $t - i \le 2^r - 1$.
Proof: Suppose $sp(j, k)$ is available for all $(j, k)$ where $k - j < 2^{r-1}$. Also, assume that the all pairs
shortest paths have been calculated for all pairs of vertices of distance up to $2^{r-1}$.
Now construct every jumper in $D_n$ of length $2^{r-1}$ or smaller. Placing these jumpers, at the
same time relaxing these paths, in $D_n$ and performing one $(\min, +)$ matrix multiplication supplies
$sp(i, t)$ for all $t - i < 2^r$ by Lemma 1. □
The algorithm in Figure 4.5 is a modified $(\min, +)$ matrix multiplication all pairs shortest
path algorithm. It is modified in that straight minimum paths are dynamically relaxed with new
jumpers. This algorithm is basically the same as Rytter's algorithm [31].
for all $1 \le i, j, u, v \le n$ in parallel do
    $W[(i, j), (u, v)] \leftarrow \infty$
    $W[(0, 0), (i, i)] \leftarrow 0$
for all $1 \le i, j \le n$ in parallel do
    if $j + 1 \le n$ then $W[(i, j), (i, j + 1)] \leftarrow f(i, j, j + 1)$
    if $i - 1 \ge 1$ then $W[(i, j), (i - 1, j)] \leftarrow f(i - 1, i - 1, j)$
loop $2\lceil \lg n \rceil$ times
    for all $(0, 0) \le (i, j) \le (s, t) \le (u, v) \le (n, n)$ in parallel do
        $W[(i, j), (u, v)] \leftarrow \min\{ W[(i, j), (s, t)] + W[(s, t), (u, v)],\; W[(i, j), (u, v)] \}$
        $W[(i, j), (u, j)] \leftarrow \min\{ W[(i, j), (u, j)],\; W[(0, 0), (u, i - 1)] + f(u, i - 1, j) \}$
        $W[(i, j), (i, v)] \leftarrow \min\{ W[(i, j), (i, v)],\; W[(0, 0), (j + 1, v)] + f(i, j, v) \}$
Figure 4.5: Modified $(\min, +)$ All Pairs Shortest Path Algorithm
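The scheme of Figure 4.5 can be simulated sequentially for small $n$ to see how the jumper weights settle over the rounds. The sketch below is an illustration under stated assumptions, not the parallel implementation: the function names are hypothetical, the unit vertical edge uses the weight $f(i - 1, i - 1, j)$ from Definition 1, each round performs one full $(\min, +)$ product followed by the two jumper relaxations, and the matrix chain cost of Section 5 serves as the example cost function.

```python
import math


def modified_min_plus(n, f):
    """Sequentially simulate the modified (min,+) scheme on the vertices of D_n.

    Returns W[(0,0),(1,n)], the cost of a shortest path from (0,0) to (1,n).
    f(i, j, k) is the product cost with 1-based indices.
    """
    verts = [(0, 0)] + [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    W = {(a, b): math.inf for a in verts for b in verts}

    # Zero-weight edges from (0,0) to the diagonal, then the unit edges of G_n.
    for i in range(1, n + 1):
        W[(0, 0), (i, i)] = 0
    for (i, j) in verts[1:]:
        if j + 1 <= n:
            W[(i, j), (i, j + 1)] = f(i, j, j + 1)
        if i - 1 >= 1:
            W[(i, j), (i - 1, j)] = f(i - 1, i - 1, j)

    rounds = 2 * (n - 1).bit_length()  # 2 * ceil(lg n)
    for _ in range(rounds):
        # One (min,+) product of W with itself, computed from the old values.
        W = {
            (a, c): min(W[a, c], min(W[a, b] + W[b, c] for b in verts))
            for a in verts
            for c in verts
        }
        # Relax unit paths against jumpers built from the current sp values.
        for (i, j) in verts[1:]:
            for u in range(1, i):  # vertical jumper from (i,j) up to (u,j)
                cand = W[(0, 0), (u, i - 1)] + f(u, i - 1, j)
                W[(i, j), (u, j)] = min(W[(i, j), (u, j)], cand)
            for v in range(j + 1, n + 1):  # horizontal jumper from (i,j) to (i,v)
                cand = W[(0, 0), (j + 1, v)] + f(i, j, v)
                W[(i, j), (i, v)] = min(W[(i, j), (i, v)], cand)
    return W[(0, 0), (1, n)]


def matrix_chain_cost(w):
    """f(i, j, k) = w_i * w_{j+1} * w_{k+1} for the weight list w_1..w_{n+1}."""
    return lambda i, j, k: w[i - 1] * w[j] * w[k]


if __name__ == "__main__":
    print(modified_min_plus(4, matrix_chain_cost([5, 10, 3, 20, 6])))  # 600
```

The dictionary `W` has one entry per ordered pair of vertices, so this simulation is only practical for small $n$; it is meant to show the order of operations rather than to be efficient.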
Theorem 4 Immediately after iteration $r$, for $1 \le r \le 2\lceil \lg n \rceil$, the algorithm in Figure 4.5 has
computed $sp(i, t)$ for all $t - i \le 2^r - 1$.
Proof: The correctness of the first two loops is straightforward, so consider the third loop.
The outer loop iterates $2\lceil \lg n \rceil$ times because of Lemma 2.
The first line of the inner loop is the standard $(\min, +)$ matrix multiplication all pairs shortest
path algorithm; provided we do not violate the adjacency matrix, its correctness follows from the
correctness of such shortest path algorithms. An adjacency matrix has been violated when it can
no longer be used to correctly determine shortest paths using the same $(\min, +)$ shortest path
algorithm.
Justification of the next two lines follows by induction on jumper length. We only consider
horizontal jumpers here; vertical jumpers follow immediately.
After the first iteration of the algorithm $sp(j, k)$ has been correctly calculated for all $(j, k)$ where
$k - j < 2$. After iteration $r$, for some $r \ge 1$, $sp(j, k)$ has been calculated for all $(j, k)$ such that
$k - j < 2^r$. At the beginning of iteration $r + 1$, $sp(j, k)$ is calculated for all $(j, k)$ such that $k - j < 2^r$
by the inductive hypothesis. During iteration $r + 1$ the algorithm computes shortest paths of length
between $2^r + 1$ and $2^{r+1}$. By Lemma 2, this gives $sp(i, t)$, for all $(i, t)$ such that $t - i < 2^{r+1}$.
Next in line 2, during iteration $r + 1$ the relaxation of straight vertical unit paths of length
ranging from $2^r + 1$ to $2^{r+1}$ with vertical jumpers occurs. This is done by replacing $W[(i, j), (i, t)]$
with
$$\min\{ W[(i, j), (i, t)],\; W[(0, 0), (j + 1, t)] + f(i, j, t) \}$$
for all $(j + 1, t)$ such that $t - (j + 1) < 2^{r+1}$. This does not violate the adjacency matrix since
any new distance cost may be replaced by smaller, but positive, values because they have not been
used yet in the calculation of other shortest paths. So relax the paths of length from $2^{r-1} + 1$ to
$2^r$ with the appropriate jumpers of the same lengths. □
The algorithm in Figure 4.5 solves the three problems of Section 1.1.1 in $O(\lg^2 n)$ time with
$n^6/\lg n$ processors.
Any cost function $f(i, j, k)$ can be used to solve the MPP as long as the shortest paths in the
appropriate $D_n$ graph remain the same.
5.1 The Matrix Chain Ordering Problem

This section discusses a subproblem of the MPP. In this subproblem the associative product costs
are computed as $f(i, j, k) = w_i w_{j+1} w_{k+1}$ where $w_s = w_t$ iff $s = t$. All weights are positive integers
and they are distinct only for convenience. With these associative product costs the MPP models
the matrix chain ordering problem and the minimum cost triangulation of convex polygons. We derive
the intuition of this section from the MCOP.
From here on $D_n$ graphs are the central focus since the balanced version of this problem has
been solved efficiently in [1, 2, 26].
Letting $\sigma$ be a cyclic rotation function,
Theorem 5 (Deimel and Lampe [14]; Hu and Shing [21]) Given an instance of the MCOP
with the weight list $l_1 = w_1, w_2, \ldots, w_{n+1}$ and cyclically rotating it getting $l_2 =
w_{\sigma(1)}, w_{\sigma(2)}, \ldots, w_{\sigma(n+1)}$, then finding an optimal parenthesization with $l_2$ provides an optimal solution
to the original instance of the MCOP with $l_1$.
Directly from this last theorem and Theorem 2 we have the next corollary.
Corollary 1 Finding a shortest path in a $D_n$ graph whose edge weights are constructed from either
a weight list $l$ or a cyclic rotation of $l$ gives an optimal solution to the MCOP.
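The rotation invariance stated in Theorem 5 can be checked numerically on a small instance. The sketch below is only a sanity check under stated assumptions: `mcop_cost` is a hypothetical helper implementing the elementary $O(n^3)$ dynamic program with cost $w_i w_{j+1} w_{k+1}$, and the weight list is the four-matrix example used earlier.

```python
def mcop_cost(w):
    """Optimal matrix chain cost for the weight list w = [w_1, ..., w_{n+1}].

    Illustrative elementary O(n^3) dynamic program; mp[i][k] is the optimal
    cost of multiplying matrices i..k (1-based), with mp[i][i] = 0.
    """
    n = len(w) - 1
    if n < 1:
        return 0
    mp = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n):
        for i in range(1, n - length + 1):
            k = i + length
            mp[i][k] = min(
                mp[i][j] + mp[j + 1][k] + w[i - 1] * w[j] * w[k]
                for j in range(i, k)
            )
    return mp[1][n]


def rotations(w):
    """All cyclic rotations of the list w."""
    return [w[s:] + w[:s] for s in range(len(w))]


if __name__ == "__main__":
    weights = [5, 10, 3, 20, 6]
    # Every rotation should report the same optimal cost.
    print({tuple(r): mcop_cost(r) for r in rotations(weights)})
```

Each rotation describes the same weighted convex polygon with a different base edge, which is why the optimal cost does not change.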
In the rest of this paper let $w_1$ denote the smallest weight in any weight list.
5.1.1 Matrix Dimensions as Nesting Levels of Matching Parentheses
This subsection discusses an algorithmic technique that is central to the rest of the paper. In-
tuitively, this technique allows matrix dimensions to approximate the nesting level of matched
parentheses.
Given an associative product where the level of each parenthesis in an optimal product is known,
we can compute the parenthesization of this associative product by solving the following all nearest
smaller value (ANSV) problem [4, 5]: Given $w_1, w_2, \ldots, w_n$ drawn from a totally ordered set, for
each $w_i$ find the largest $j$, where $1 \le j < i$, and smallest $k$, where $i < k \le n$, so that $w_j < w_i$ and
$w_k < w_i$ if such values exist.
Note that a weight list may contain weights without any matches. For example, in a monotone
weight list there are no ANSV matches for any weight.
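For reference, the ANSV problem has a simple sequential solution using a stack. The sketch below is that standard sequential technique, not the parallel algorithm of Berkman et al.; the function name, the 0-based indices and the use of `None` for a missing neighbor are choices made here for illustration.

```python
def all_nearest_smaller_values(w):
    """For each position i return (left[i], right[i]): the 0-based indices of the
    nearest strictly smaller value to the left and to the right, or None."""
    n = len(w)
    left, right = [None] * n, [None] * n

    stack = []  # indices whose values are strictly increasing
    for i in range(n):
        while stack and w[stack[-1]] >= w[i]:
            stack.pop()
        left[i] = stack[-1] if stack else None
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and w[stack[-1]] >= w[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)

    return left, right


if __name__ == "__main__":
    left, right = all_nearest_smaller_values([5, 10, 3, 20, 6])
    # A weight has a match only when both neighbors exist.
    print([i for i in range(5) if left[i] is not None and right[i] is not None])
```

In a monotone list every weight is missing at least one of the two neighbors, which agrees with the remark that such lists have no matches.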
Now we investigate the effect monotonicity has on weight lists. This is interesting because we
can solve any instance of the MCOP very efficiently if its weight list is monotonic.
As in [19] let $\|w_i : w_k\| = \sum_{j=i}^{k-1} w_j w_{j+1}$, where all such pair-wise sums can be computed by
performing one parallel partial prefix.
Theorem 6 (Hu and Shing [19]) The vector $F[i] = \|w_1 : w_i\|$ can be computed by a parallel
partial prefix.
As suggested by this last theorem, we can compute $\|w_i : w_k\|$ by performing one subtraction
$\|w_i : w_k\| = F[k] - F[i]$. Therefore we can calculate the cost of any horizontal or vertical unit path
in $D_n$ by multiplying such a sum by the appropriate weight.
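A sequential stand-in for the parallel partial prefix shows how the vector $F$ is used. This is a minimal sketch: `itertools.accumulate` replaces the parallel prefix computation, and the names `prefix_pair_sums` and `pair_sum` are hypothetical.

```python
from itertools import accumulate


def prefix_pair_sums(w):
    """Return F (1-based, F[0] unused) with F[i] = ||w_1 : w_i||,
    i.e. the sum of w_j * w_{j+1} for j = 1..i-1, for w = [w_1, ..., w_{n+1}]."""
    products = [a * b for a, b in zip(w, w[1:])]  # w_j * w_{j+1}
    return [0, 0] + list(accumulate(products))


def pair_sum(F, i, k):
    """||w_i : w_k|| obtained with one subtraction."""
    return F[k] - F[i]


if __name__ == "__main__":
    w = [5, 10, 3, 20, 6]
    F = prefix_pair_sums(w)
    # Cost of the unit path along row 1 to (1, 4): w_1 * ||w_2 : w_5||.
    print(w[0] * pair_sum(F, 2, 5))  # 1050
```

The value 1050 equals the sum of the three unit edge weights $f(1,1,2) + f(1,2,3) + f(1,3,4) = 150 + 300 + 600$ for this weight list.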
Let the $i$th row of $D_n$ be all vertices of the form $(i, k)$ for $1 \le k \le n$ and the $k$th column be
vertices of the form $(i, k)$ for $1 \le i \le k$. So, the unit path along the $i$th row costs $w_i \|w_{i+1} : w_{n+1}\|$
where the unit path in the $k$th column costs $w_{k+1} \|w_1 : w_k\|$.
Lemma 3 Given a $D_n$ graph with a weight list containing a sublist of increasing weights $w_i <
w_{i+1} < \cdots < w_{k+1}$, in $D_n$ the unit path $(0, 0) \nearrow (i, i) \rightarrow \cdots \rightarrow (i, k)$ is cheaper than the unit path
$(0, 0) \nearrow (k, k) \uparrow (k - 1, k) \uparrow \cdots \uparrow (i, k)$.
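Proof: Column $k$ costs $w_{k+1} \|w_i : w_k\| = w_{k+1} \sum_{j=i}^{k-1} w_j w_{j+1}$ and row $i$ costs $w_i \sum_{j=i+1}^{k} w_j w_{j+1}$,
which is $w_i \sum_{j=i}^{k-1} w_{j+1} w_{j+2}$. Clearly, $w_{k+1} > w_i$ and $\sum_{j=i}^{k-1} w_{j+1} w_{j+2} > \sum_{j=i}^{k-1} w_j w_{j+1}$.
Comparing the two costs term by term, $w_i w_{j+1} w_{j+2} \le w_{k+1} w_j w_{j+1}$ for $i \le j \le k - 1$, because
$w_i \le w_j$ and $w_{j+2} \le w_{k+1}$ in an increasing sublist, hence the
lemma follows. □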
Given a sublist of decreasing product weights $w_i > w_{i+1} > \cdots > w_{k+1}$ then the horizontal unit
path $(0, 0) \nearrow (i, i) \rightarrow \cdots \rightarrow (i, k)$ is more expensive than $(0, 0) \nearrow (k, k) \uparrow (k - 1, k) \uparrow \cdots \uparrow (i, k)$.
The two unit paths, $A = (i, i) \rightarrow \cdots \rightarrow (i, k)$ and $B = (i + 1, i + 1) \rightarrow \cdots \rightarrow (i + 1, k) \uparrow (i, k)$
are adjacent horizontal unit paths.
Lemma 4 (Row Trade Off Lemma) Given two adjacent horizontal unit paths $A$ and $B$ going
to $(i, k)$, which cost $w_i \|w_{i+1} : w_{k+1}\|$ and $w_{i+1} \|w_{i+2} : w_{k+1}\| + w_i w_{i+1} w_{k+1}$ respectively, one of the
following conditions may hold:
if $w_i < w_{i+1}$ and $w_{i+2} < w_{k+1}$ then $A$ is cheaper
if $w_i > w_{i+1}$ and $w_{i+2} > w_{k+1}$ then $B$ is cheaper
Proof: Since $A$ costs $w_i \|w_{i+1} : w_{k+1}\|$ and $B$ costs $w_{i+1} \|w_{i+2} : w_{k+1}\| + w_i w_{i+1} w_{k+1}$, the difference
in their costs is
$$d = w_i \|w_{i+1} : w_{k+1}\| - w_{i+1} \|w_{i+2} : w_{k+1}\| - w_i w_{i+1} w_{k+1}.$$
That is, $d = (w_i - w_{i+1}) \|w_{i+2} : w_{k+1}\| + (w_{i+2} - w_{k+1}) w_i w_{i+1}$ and when $w_i > w_{i+1}$ and $w_{i+2} > w_{k+1}$
then $d$ is positive, hence $B$ is cheaper.
The other case follows similarly. □
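The regrouping of $d$ used in this proof can be spelled out as follows. This expansion is a reconstruction from the definition of $\|w_i : w_k\|$, not part of the original proof; it uses $\|w_{i+1} : w_{k+1}\| = w_{i+1} w_{i+2} + \|w_{i+2} : w_{k+1}\|$.

```latex
\begin{align*}
d &= w_i \|w_{i+1} : w_{k+1}\| - w_{i+1} \|w_{i+2} : w_{k+1}\| - w_i w_{i+1} w_{k+1} \\
  &= w_i \big( w_{i+1} w_{i+2} + \|w_{i+2} : w_{k+1}\| \big)
     - w_{i+1} \|w_{i+2} : w_{k+1}\| - w_i w_{i+1} w_{k+1} \\
  &= (w_i - w_{i+1}) \|w_{i+2} : w_{k+1}\| + (w_{i+2} - w_{k+1})\, w_i w_{i+1}.
\end{align*}
```

Both terms are negative when $w_i < w_{i+1}$ and $w_{i+2} < w_{k+1}$, which is the case in which $A$ is cheaper.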
There is a similar Column Trade Off Lemma, and its proof is almost identical.
5.1.2 Critical Nodes in Dn
This subsection relates the ANSV problem to the Row and Column Trade Off Lemmas. By solving
the ANSV problem it is possible to isolate certain nodes that are central to finding shortest paths
in $D_n$ graphs.
The next definition is originally due to Hu and Shing [19], although they present it in a different
framework: In $D_n$, a critical node is a node $(i, k)$ such that $w_j > \max\{w_i, w_{k+1}\}$ for $i < j \le k$.
Note that there are no critical nodes of the form $(j, j)$ for $1 \le j \le n$.
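The definition can be checked directly on small weight lists. The sketch below tests every candidate node against the definition by brute force; it is an illustration only and not the ANSV-based method discussed in this subsection. The function name `critical_nodes` is hypothetical, and only nodes $(i, k)$ with $i < k$ are examined, following the note above.

```python
def critical_nodes(w):
    """Return the critical nodes (i, k), 1 <= i < k <= n, of the D_n graph built
    from the weight list w = [w_1, ..., w_{n+1}], straight from the definition:
    w_j > max(w_i, w_{k+1}) for every j with i < j <= k."""
    n = len(w) - 1
    wt = [None] + list(w)  # shift to 1-based indexing
    nodes = []
    for i in range(1, n + 1):
        for k in range(i + 1, n + 1):
            bound = max(wt[i], wt[k + 1])
            if all(wt[j] > bound for j in range(i + 1, k + 1)):
                nodes.append((i, k))
    return nodes


if __name__ == "__main__":
    print(critical_nodes([5, 10, 3, 20, 6]))   # [(1, 2), (3, 4)]
    print(critical_nodes([1, 2, 3, 4, 5]))     # []
```

For the four-matrix example the two critical nodes correspond to the subproducts $(M_1 \bullet M_2)$ and $(M_3 \bullet M_4)$ of the optimal parenthesization, and a monotone list yields none.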
The row and column equations are,
$$R = (w_i - w_{i+1}) \|w_{i+2} : w_{k+1}\| + (w_{i+2} - w_{k+1}) w_i w_{i+1}, \qquad k + 1 \ne i + 2$$
$$C = (w_{k+1} - w_k) \|w_{k-1} : w_i\| + (w_{k-1} - w_i) w_{k+1} w_k, \qquad k - 1 \ne i$$
Lemma 4 elucidates a key observation about the row and column equations. In particular, the
order relationships among four weights are enough to determine whether equation $R$ will be necessarily
positive or necessarily negative. A similar observation holds for $C$. This cannot be done for either
$C$ or $R$ when both conditions $w_i < w_j$ and $w_j > w_{k+1}$, for $i < j \le k$, hold. Generalizing for the
possibility of an edge-connected path of critical nodes and associated row and column equations we
say that $C$ and $R$ are order indeterminate when $w_j > \max\{w_i, w_{k+1}\}$ for all $j$, $i < j \le k$. Critical
nodes determine where both the row and column equations fail to provide any information. Further,
critical nodes indicate where magnitude can overtake order in the row and column equations.
By solving the ANSV problem we can compute all critical nodes of a $D_n$ graph. Although
Berkman et al. proved the next theorem for ANSV matches, it follows as a direct result of the
relationship of matches in the weight list and critical nodes in $D_n$ graphs.
Theorem 7 (Berkman et al. [4]) All critical nodes can be computed in $O(\lg \lg n)$ and $O(\lg n)$
time using $n/\lg \lg n$ and $n/\lg n$ processors, respectively.
Critical nodes $(i, s)$ and $(j, t)$ are not compatible when $s - i = t - j$ and there exists some
vertex other than $(0, 0)$ that can reach both $(i, s)$ and $(j, t)$ by a unit path. The next theorem
was originally proved by Hu and Shing in a different framework and follows from the properties of
ANSV matches.
Theorem 8 (Hu and Shing [21]) In any instance of the matrix chain ordering problem all critical
nodes are compatible.
Proof: Suppose $D_n$ represents an instance of the matrix chain ordering problem with two non-compatible
critical nodes $(i, s)$ and $(j, t)$. Since these are critical nodes, it must be that either
$j < s < t$ or $i < j < s$. In either case we get a contradiction since both $w_k > \max\{w_i, w_{s+1}\}$, for
$i < k < s + 1$, and $w_r > \max\{w_j, w_{t+1}\}$, for $j < r < t + 1$, cannot hold. □
A $D_n$ graph can have as many as $n - 1$ and as few as zero critical nodes. For example, a monotone
weight list does not have any critical nodes.
The next lemma follows from the ANSV characterization of critical nodes.
Lemma 5 (Hu and Shing [21]) Any Dn graph has at most n , 1 critical nodes.
Proof: Take a list of n + 1 weights w1; w2; : : :; wn+1 making up the edge weights of some Dn graph.
A representative weight for the critical node (j; k) is the weight wj with match [wi; wk+1]. Each
unique critical node has a unique representative weight. Further, a list of n + 1 weights can have at
most n , 1 representative weights, since neither w1 nor wn+1 can be representative weights. Hence,
there can be at most n , 1 critical nodes.
2
A jumper $(i, j) \Longrightarrow (i, v)$ is said to include all critical nodes that are in some path $r$ from $(0, 0)$
to $(j + 1, v)$, where $r$ may contain jumpers too. Suppose there is a path $r$ from $(0, 0)$ to $(j + 1, v)$
that includes all critical nodes in the set $\{(k, u)\}$, for all $j + 1 \le k < u \le v$. Then we say any
path $q$, containing only this one jumper $(i, j) \Longrightarrow (i, v)$, includes all critical nodes that are vertices
of $q$ and all critical nodes in $r$. Now generalize this notion recursively to paths with more than one
jumper. Going even further, we can prove the next theorem.
Theorem 9 In a $D_n$ graph there is at least one path from $(0, 0)$ to $(1, n)$ that includes all critical
nodes.
A proof of this theorem follows from the fact that all ANSV matches are compatible and each match
represents a pair of parentheses. If a solution of the ANSV problem gives a parenthesization of all
associative elements, then we are done. So consider the case where a solution of the ANSV problem
only provides a partial parenthesization of the associative product. In this case, any arbitrary but
legal completion of the parenthesization describes a path in the associated Dn graph.
5.1.3 Canonical Subgraphs of Dn
This subsection investigates the interaction between monotonicity and critical nodes. Weight lists
can be broken into monotone sublists and sublists that have ANSV matches. Such sublists naturally
lead to subgraphs that are useful for finding shortest paths.
Figure 5.6: A $D^{(i,t)}_{(j,k)}$ Graph without $(0, 0)$ and with No Jumpers Shown
A subgraph $D(i, t)$ of $D_n$ is all vertices and edges of $D_n$ that can reach $(i, t)$ by a unit path; this
includes $(0, 0)$. Formally, $D(i, t)$ is all vertices $(j, k) \in V[D_n]$ such that $i \le j \le k \le t$ in addition to
the associated edges. We also include $(0, 0)$ in each subgraph. Also, when a graph $D(i, j)$ is such
that $w_i, \ldots, w_{j+1}$ is monotonic, we say the graph $D(i, j)$ is monotonic.
Critical nodes may form a unit edge-connected maximal path $p$ that isolates a subgraph. Suppose
that $p$ forms a contiguous maximal path starting at some critical node $(j, k)$, where $k - j \ge 1$,
and terminating at the critical node $(i, t)$; then $p$ isolates $D^{(i,t)}_{(j,k)}$. Saying the unit path $p$ is maximal
means that if there is any unit edge-connected path $q$ of critical nodes that $p$ is a subpath of, then
$p = q$.
Figure 5.7: Several Canonical Subgraphs and Their Weight List

A canonical subgraph $D^{(i,t)}_{(j,k)}$ is the subgraph containing the maximal contiguous edge-connected
path of critical nodes that begins at critical node $(j, k)$ and terminates at critical node $(i, t)$. The
canonical subgraph $D^{(i,t)}_{(j,k)}$ has vertex set $V[D(i, t)] - V[D(j + 1, k - 1)] \cup \{(0, 0)\}$ and the associated
edges, see Figure 5.6. We write $D^{(i,t)}$ for a canonical subgraph of the form $D^{(i,t)}_{(j,j+1)}$, and $p$ denotes
the path of critical nodes in a canonical subgraph.
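The two vertex sets just defined are easy to enumerate, which may help when checking small cases by hand. The following is a minimal sketch with hypothetical function names; it only builds vertex sets and ignores edges and jumpers.

```python
def subgraph_vertices(i, t):
    """Vertex set of D(i, t): all (j, k) with i <= j <= k <= t, plus (0, 0)."""
    return {(0, 0)} | {(j, k) for j in range(i, t + 1) for k in range(j, t + 1)}


def canonical_vertices(i, t, j, k):
    """Vertex set V[D(i,t)] - V[D(j+1,k-1)] together with (0, 0)."""
    inner = {(a, b) for a in range(j + 1, k) for b in range(a, k)}
    return subgraph_vertices(i, t) - inner


# D(1, 4) has 4 * 5 / 2 = 10 table entries plus the source (0, 0).
print(len(subgraph_vertices(1, 4)))          # 11
# Removing D(3, 3) from D(1, 5) leaves 16 - 1 = 15 vertices.
print(len(canonical_vertices(1, 5, 2, 4)))   # 15
```

When $k = j + 1$ the removed set is empty, so the canonical subgraph $D^{(i,t)}$ has the same vertices as $D(i, t)$.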
For instance, a canonical graph $D^{(i,t)}$ is a graph that in some sense isolates the weight list
$w_i, w_{i+1}, \ldots, w_{t+1}$ where $w_k > \max\{w_j, w_s\}$ for all $k$, such that $i \le j < k < s \le t + 1$.
In Figure 5.7 the weight list is represented as the contour below the Dn graph. In this contour,
four key ANSV matches are represented by dotted lines. The four corresponding critical nodes are
circled in the related Dn graph.
Canonical subgraphs are easily distinguishable because of properties of their critical nodes. So
by Theorem 7, we can find all canonical subgraphs in $O(\lg \lg n)$ time using $n/\lg \lg n$ processors.
Notice that there are only two basic kinds of canonical subgraphs.
Theorem 10 (Monotonicity Theorem) Given a $D(i, u)$ graph with a monotone list of weights
$w_i < w_{i+1} < \cdots < w_{u+1}$, the shortest path from $(0, 0)$ to $(i, u)$ is along the straight unit path
$(0, 0) \nearrow (i, i) \to (i, i + 1) \to \cdots \to (i, u)$.
Proof:
The proof is in two cases. In the first case, the theorem is shown to be true for $G_n$ graphs while
the second case shows that the theorem also holds for $D_n$ graphs.
Case i: A unit path in $G(i, u)$
Let $(0, 0) \nearrow (i, i) \to (i, i + 1) \to \cdots \to (i, u)$ be a horizontal path with the associated
vertical path $(0, 0) \nearrow (u, u) \uparrow (u - 1, u) \uparrow \cdots \uparrow (i, u)$.
By Lemma 3, the horizontal path is cheaper. Inductive application of the Row and
Column Trade-Off Lemmas (see Lemma 4) completes this case.
Case ii: A path with jumpers in $D(i, u)$
This case only addresses horizontal jumpers, since vertical jumpers follow by symmetry
and Lemma 3.
Given an arbitrary path $r$ from $(0, 0)$ to $(i, u)$ in $D(i, u)$ that includes one jumper, the
unit path from $(i, i)$ to $(i, u)$ is shown to be cheaper than $r$. Suppose the jumper in this
path is $(k, s) \Longrightarrow (k, t)$, for $i < k < s < t < u$. Since this is the only jumper in $r$, the
path to $(k, s)$ is $(k, k) \to \cdots \to (k, s)$.
Therefore, this problem has been reduced to finding a shortest path to $(k, t)$. For, if the
shortest path to $(k, t)$ is a unit path then $r$ cannot be a shortest path by case i, because
$(k, s) \Longrightarrow (k, t)$ is the only jumper in $r$.
Still, since $W((k, s) \Longrightarrow (k, t)) = sp(s + 1, t) + f(k, s, t)$ we must consider jumpers in
the shortest path to $(s + 1, t)$. Say a shortest path to $(s + 1, t)$ is a unit path, then by
case i it is the straight unit path $(s + 1, s + 1) \to \cdots \to (s + 1, t)$. Since $f(k, s, t) =
w_k w_{s+1} w_{t+1}$ and $W((k, s) \to (k, s + 1)) = w_k w_{s+1} w_{s+2}$, it must be that $t > s + 1$ so
$f(k, s, t) > W((k, s) \to (k, s + 1))$. Intuitively, we are trading the associative product
cost $f(k, s, t)$ for the first edge of $(k, s) \to \cdots \to (k, t)$. This first unit edge is cheaper
than the associative product cost. With this, because $(k, s + 1) \to \cdots \to (k, t)$ costs
$w_k \|w_{s+2} : w_{t+1}\|$ and the path $(s + 1, s + 1) \to \cdots \to (s + 1, t)$ costs $w_{s+1} \|w_{s+2} : w_{t+1}\|$
and since $w_k < w_{s+1}$ the theorem holds.
Otherwise, say there is a horizontal jumper in $r$ to $(s + 1, t)$. Apply this case again
inductively, until there is a jumper that derives its weight from a straight unit path.
We are trading each jumper's associative product cost for the cost of the first unit
edge in the associated unit path. These first unit edges are always cheaper than the
related associative product cost. Eventually, the shortest path to $(k, t)$ is shown to be
$(k, k) \to \cdots \to (k, t)$. Hence, $r$ is a unit path, but this means that it must be the
straight unit path in row $i$ by case i.
Handling a path with more than one jumper is straightforward. Inductively applying these two
cases to jumpers successively farther away from $(0, 0)$ completes the proof.
$\Box$
This theorem also holds for a monotone list of weights having the relation $w_i > w_{i+1} > \cdots >
w_{j+1}$ where the shortest path is $(0, 0) \nearrow (j, j) \uparrow (j - 1, j) \uparrow \cdots \uparrow (i, j)$. Therefore, if the list of
weights $w_i, w_{i+1}, \ldots, w_{j+1}$ is monotone then we do not have to construct any jumpers in $D(i, j)$.
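The Monotonicity Theorem can be checked numerically on small instances. The sketch below compares the classical $O(n^3)$ dynamic programming optimum with the cost of the straight unit path, $w_1 \|w_2 : w_{n+1}\|$, for a whole monotone weight list. The function names are ours, and the check is an illustration on examples, not a proof.

```python
from functools import lru_cache


def mcop_cost(w):
    """Optimal matrix chain cost for dimensions w[0..n] by the classical DP."""
    w = tuple(w)

    @lru_cache(maxsize=None)
    def sp(i, k):
        # cheapest way to multiply matrices i..k (0-based, matrix i is w[i] x w[i+1])
        if i == k:
            return 0
        return min(sp(i, j) + sp(j + 1, k) + w[i] * w[j + 1] * w[k + 1]
                   for j in range(i, k))

    return sp(0, len(w) - 2)


def straight_unit_path_cost(w):
    """w_1 * ||w_2 : w_{n+1}||, the cost of (0,0) -> (1,1) -> ... -> (1,n)."""
    return w[0] * sum(w[j] * w[j + 1] for j in range(1, len(w) - 1))


increasing = [2, 3, 5, 7, 11]
print(mcop_cost(increasing), straight_unit_path_cost(increasing))  # 254 254

# A decreasing list is handled by symmetry: reverse it and use the same formula.
decreasing = increasing[::-1]
print(mcop_cost(decreasing), straight_unit_path_cost(decreasing[::-1]))  # 254 254
```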
We say that $D(i, t)$ intersects $D(j, v)$ iff $V[D(i, t)] \cap V[D(j, v)] - \{(0, 0)\} \ne \emptyset$. From here on when
we refer to monotone subgraphs we assume that they do not intersect any canonical subgraphs.
Theorem 11 If $D(i, t)$ does not intersect any canonical graphs and has no critical nodes, then the
weight list $w_i, w_{i+1}, \ldots, w_{t+1}$ is monotonic.
Proof: Say there are no critical nodes in $D(i, t)$. Then there is at most one weight $w_{j+1}$ where
$j + 1 \ne 1$ such that $w_i > w_j > w_{j+1} < w_{j+2} < \cdots < w_{t+1}$, otherwise $D(i, t)$ would contain
critical nodes. But in this case, since $w_1 = \min_{1 \le i \le n+1} \{w_i\}$ it must be that $D(1, j + 1)$ contains
the critical node $(1, j)$, which means that $D(i, t)$ would intersect with a canonical subgraph.
On the other hand, this means both the row and column equations begin and remain indeterminate
so the following fact holds.
Fact 1: For all $(j, j + 2) \in V[D(i, t)]$, either $w_j < w_{j+1} < w_{j+2} < w_{j+3}$ or $w_j > w_{j+1} >
w_{j+2} > w_{j+3}$.
Applying the row or column equations to the nodes $(j, j + 2)$, for all $j$, $i < j \le t$, establishes this
fact. For instance, let $R = (w_j - w_{j+1}) \|w_{j+2} : w_{j+3}\| + (w_{j+2} - w_{j+3}) w_j w_{j+1}$, then R is order
determinate so it must be either $w_j < w_{j+1}$ and $w_{j+2} < w_{j+3}$, or $w_j > w_{j+1}$ and $w_{j+2} > w_{j+3}$, but
not both.
Suppose $w_j < w_{j+1}$ and $w_{j+2} < w_{j+3}$, and assume that $w_{j+1} > w_{j+2}$; otherwise, if $w_{j+1} < w_{j+2}$
then $w_j < w_{j+1} < w_{j+2} < w_{j+3}$ so we are done. This means $w_j < w_{j+1}$ and $w_{j+1} > w_{j+2}$, therefore
$w_{j+1} > \max\{w_j, w_{j+2}\}$ and this indicates $(j, j + 1)$ is a critical node. This is a contradiction, hence
it must be that $w_j < w_{j+1} < w_{j+2} < w_{j+3}$.
Using Fact 1, the theorem follows inductively.
$\Box$
A lack of critical nodes implies the existence of a monotone subgraph. Just the same, a lack
of ANSV matches in a section of a weight list indicates a monotone sublist. Assuming that $D(i, t)$
does not intersect with any canonical graphs we have the next corollary.
Corollary 2 If $D(i, t)$ contains no critical nodes, then there are no jumpers in a shortest path from
$(0, 0)$ to $(i, t)$. Moreover, if $D(i, t)$ contains no critical nodes then the shortest path from $(0, 0)$ to
$(i, t)$ is a straight unit path.
A proof of this follows immediately from Theorems 10 and 11.
Take a canonical subgraph $D^{(1,m)}$, where all critical nodes have been found and form a unit
edge-connected path $p$. Removing the nodes and adjacent edges of $p$ splits $D^{(1,m)}$ in two. These
two pieces of $D^{(1,m)}$ are U for upper and L for lower. See Figure 5.8.
Figure 5.8: A $D_n$ Graph Split by a Path of Critical Nodes, Arrows Point Toward Smaller Weights
Let $D(1, s)$ be the maximal well-formed subgraph of U and let $D(s + 1, m)$ be the maximal
well-formed subgraph of L. By the maximality of $D(i, t)$ in U we mean that for any other well-formed
subgraph $D(j, k)$, if $D(j, k) \subseteq U$ then $D(j, k) \subseteq D(i, t)$.
Theorem 12 (Modality Theorem) If $D(1, s) \subseteq U$ and $D(s + 1, m) \subseteq L$, where both $D(1, s)$ and
$D(s + 1, m)$ are maximal, then $w_1 < w_2 < \cdots < w_{s+1}$ and $w_{s+2} > w_{s+3} > \cdots > w_m > w_{m+1}$.
Proof: The path $p$ splits $D^{(1,m)}$ into U and L where $D(1, s)$ and $D(s + 1, m)$ are maximal, so $(s, s + 1)$
is a critical node. Therefore, we know that $w_{s+1} > \max\{w_s, w_{s+2}\}$, so it must be that $w_s < w_{s+1}$
and $w_{s+1} > w_{s+2}$.
By Theorem 11, the weight lists $w_1, w_2, \ldots, w_s, w_{s+1}$ and $w_{s+1}, w_{s+2}, \ldots, w_m, w_{m+1}$ are both
monotonic. Thus $w_s < w_{s+1}$, so it must be that $w_1 < w_2 < \cdots < w_s < w_{s+1}$. In addition, because
$w_{s+1} > w_{s+2}$, we know that $w_{s+1} > w_{s+2} > \cdots > w_m > w_{m+1}$.
$\Box$
Take a $D^{(i,t)}_{(j,k)}$ canonical graph, then both $w_i < w_{i+1} < \cdots < w_j$ and $w_k > w_{k+1} > \cdots > w_{t+1}$
follow from Theorem 12.
6.1 A Parallel Approximation Algorithm for the MCOP

Combining results of Chin [10], and Hu and Shing [20] with those of Berkman et al. [4] this section
develops an $O(\lg \lg n)$ time and $n/\lg \lg n$ processor approximation algorithm for the MCOP. This
algorithm approximates the MCOP to within 15.5% of optimality. In addition, the processor time
product of this algorithm is linear.
The approximation algorithm in this section consists of two stages. The first stage isolates
relatively heavy weights by finding matrix products that must be in an optimal parenthesization. The
isolation of such heavy weights provides optimal substructures that are in optimal superstructures,
essentially giving a converse to the principle of optimality. The second stage is simply a greedy
approach for finding a parenthesization once we have applied the first stage of the algorithm.
In this section, as before, by Corollary 1 rotate any given weight list so that $w_1$ is the smallest
weight. For the next theorem let $w_i$, $w_{i+1}$, and $w_{i+2}$ be three adjacent weights in a weight list
of an instance of the MCOP where $w_i < w_{i+1}$ and $w_{i+1} > w_{i+2}$, which together means that
$w_{i+1} > \max\{w_i, w_{i+2}\}$.
Theorem 13 (Hu and Shing [20]) If
$$w_1 w_i w_{i+2} + w_i w_{i+1} w_{i+2} < w_1 w_i w_{i+1} + w_1 w_{i+1} w_{i+2} \qquad (6.1)$$
then the product $(a_i \bullet a_{i+1})$ is in an optimal parenthesization.
The proof of this theorem is left to the literature; see [10] and [20] for different proofs. When
$w_{i+1} > \max\{w_i, w_{i+2}\}$ fails to hold, Equation 6.1 cannot hold, so there is no gain in assuming
$w_{i+1} > \max\{w_i, w_{i+2}\}$.
Corollary 3 If Equation 6.1 holds, then $w_{i+1} > \max\{w_i, w_{i+2}\}$.
A proof follows from Equation 6.1 with $w_1 = \min_{1 \le i \le n+1} \{w_i\}$, and
$w_1 w_i (w_{i+2} - w_{i+1}) + w_{i+1} w_{i+2} (w_i - w_1) < 0$ so it must be that $w_{i+2} - w_{i+1} < 0$.
Also, starting with Equation 6.1 again and reassociating, we get
$w_1 w_{i+2} (w_i - w_{i+1}) + w_i w_{i+1} (w_{i+2} - w_1) < 0$ so $w_i - w_{i+1} < 0$.
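Equation 6.1 and Corollary 3 can be exercised directly. The sketch below writes the inequality as a predicate and then checks the implication of Corollary 3 exhaustively over small positive integer weights with $w_1$ minimal. The function names and the search limit are our own choices; the exhaustive check only illustrates the corollary on a finite range.

```python
from itertools import product


def eq_6_1(w1, wi, wi1, wi2):
    """Equation 6.1 for the minimum weight w1 and adjacent weights w_i, w_{i+1}, w_{i+2}."""
    return w1 * wi * wi2 + wi * wi1 * wi2 < w1 * wi * wi1 + w1 * wi1 * wi2


def corollary_3_holds(limit=8):
    """Check on all weights in 1..limit that Equation 6.1 forces w_{i+1} to be a local maximum."""
    for w1, wi, wi1, wi2 in product(range(1, limit + 1), repeat=4):
        if w1 <= min(wi, wi1, wi2) and eq_6_1(w1, wi, wi1, wi2):
            if not wi1 > max(wi, wi2):
                return False
    return True


print(eq_6_1(10, 11, 100, 12))  # True: 14520 < 23000, so multiply a_i and a_{i+1} first
print(eq_6_1(1, 2, 10, 3))      # False: 66 is not less than 50
print(corollary_3_holds())      # True
```

The second example shows that a local maximum alone is not enough: when $w_1$ is much smaller than its neighbours' weights, Equation 6.1 fails.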
Unfortunately, the converse of this last corollary is not true. An ANSV match may not represent
a minimal parenthesization in the MCOP. But any product that is in a minimal parenthesization
by way of Equation 6.1 has been isolated by some match. Therefore, using the ANSV problem, the
values in the weight list approximate the optimal level of parentheses.
A list of weights is reduced iff for all weights, say $w_{i+1}$, with ANSV match $[w_i, w_{i+2}]$, Equation
6.1 fails to hold [10]. A reduced weight list may be non-monotonic.
Now we generalize Equation 6.1. Suppose by Theorem 13 that $(a_i \bullet a_{i+1})$ is in an optimal
parenthesization. Now we can apply Theorem 13 to the list $l = w_1, \ldots, w_{i-1}, w_i, w_{i+2}, w_{i+3}, \ldots, w_{n+1}$
in the same way. That is, if $w_i > \max\{w_{i-1}, w_{i+2}\}$ and $w_1 w_{i-1} w_{i+2} + w_{i-1} w_i w_{i+2} < w_1 w_{i-1} w_i +
w_1 w_i w_{i+2}$, then the parenthesization given by the solution of the ANSV problem on $l$ indicates that
$(a_{i-1} \bullet \cdots \bullet a_{i+1})$ is optimal.
Given the weight list $l_1 = w_1, w_2, \ldots, w_{n+1}$ the approximation algorithm is [10, 18, 20]:
1. Reduce the weight list $l_1$ giving the weight list $l_2$, renumbering $l_2$ to be $l_2 = w_1, w_2, \ldots, w_{r+1}$
where $w_1 = \min_{1 \le i \le r+1} \{w_i\}$.
2. If $l_2$ has more than two weights, then compute the depths of the parentheses for the linear
product $((\cdots (a_1 \bullet a_2) \cdots) \bullet a_r)$ of cost $w_1 \|w_2 : w_{r+1}\|$. With this, make the parentheses
discovered in Step 1 the appropriate amount deeper.
The depth of the parentheses determines the order to multiply the matrices.
Next we develop techniques to run this algorithm efficiently in parallel. Intuitively, by Theorem
13, if the match $[w_{i-1}, w_{i+1}]$ represents the nesting level of two parentheses in an optimal product,
then we have characterized $w_i$'s influence. Remove $w_i$ from the weight list and recursively apply
Theorem 13.
Suppose in solving the ANSV problem the weight $w_j$ has the match $[w_i, w_k]$. Then, if
$$w_1 w_i w_k + w_i w_j w_k < w_1 w_j w_k + w_1 w_i w_j \qquad (6.2)$$
and products $(a_i \bullet \cdots \bullet a_{j-1})$ and $(a_j \bullet \cdots \bullet a_{k-1})$ are both in an optimal parenthesization, then
the product $(a_i \bullet \cdots \bullet a_{j-1}) \bullet (a_j \bullet \cdots \bullet a_{k-1})$ is also in an optimal parenthesization. Certainly, by
Theorem 13 this is true when $i = j - 1$ and $j = k - 1$. In addition, Corollary 3 generalizes to suit
Equation 6.2.
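The two stages of the approximation algorithm can be sketched sequentially. The code below is a stand-in for illustration only: it replaces the parallel ANSV-based reduction by a single left-to-right stack sweep that removes a weight whenever it is a local maximum with respect to its current neighbours and Equation 6.2 holds, and it then charges the linear product cost $w_1 \|w_2 : w_{r+1}\|$ for what remains. It assumes the list has already been rotated so that its first weight is the minimum; the function names are ours.

```python
def eq_6_2(w1, wi, wj, wk):
    """Equation 6.2 for the minimum weight w1 and a weight w_j with match [w_i, w_k]."""
    return w1 * wi * wk + wi * wj * wk < w1 * wj * wk + w1 * wi * wj


def reduce_weights(w):
    """One sweep; returns (remaining weights, cost of the products removed)."""
    w1 = w[0]  # assumption: w[0] is the smallest weight
    stack, removed_cost = [], 0
    for wk in w:
        while (len(stack) >= 2 and stack[-1] > max(stack[-2], wk)
               and eq_6_2(w1, stack[-2], stack[-1], wk)):
            wj = stack.pop()
            removed_cost += stack[-1] * wj * wk  # cost w_i * w_j * w_k of the isolated product
        stack.append(wk)
    return stack, removed_cost


def approximate_cost(w):
    """Cost of the removed products plus the linear product on the remaining list."""
    reduced, removed_cost = reduce_weights(w)
    linear = reduced[0] * sum(reduced[j] * reduced[j + 1]
                              for j in range(1, len(reduced) - 1))
    return removed_cost + linear


print(reduce_weights([10, 11, 100, 12, 30]))    # ([10, 11, 12, 30], 13200)
print(approximate_cost([10, 11, 100, 12, 30]))  # 18120
```

For this small instance the value 18120 coincides with the optimum obtained from the classical dynamic programming table; in general the text only claims a bound relative to the optimum.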
A weight list can be reduced in $O(\lg \lg n)$ time with $n/\lg \lg n$ processors on the common-CRCW
PRAM model. We can reduce a weight list with two applications of the ANSV problem as follows.
Given the weight list $l_1$ the next algorithm outputs a reduced weight list. Let $A[1 \ldots n + 1]$ be
an array of $n + 1$ integers all initialized to zero.
1. Solve the ANSV problem on the weight list $l_1$. Next check whether any weights
satisfy the condition described by Equation 6.1. If there are none, then output $l_1$, since it
is reduced, and stop.
2. For all weights $w_j$ in $l_1$ that have matches, say $[w_i, w_k]$, if $w_j$ and $w_i, w_k$ satisfy Equation 6.2,
then assign a 1 to $A[j]$.
3. Now solve the ANSV problem on $A[1 \ldots n + 1]$. If the nearest smaller values of $A[j]$ are in the
match $[A[i], A[k]]$, then $(a_i \bullet \cdots \bullet a_{k-1})$ is in an optimal parenthesization. Removing all of the
weights isolated by optimal parenthesizations gives a reduced weight list, which is output.
This algorithm produces a reduced weight list and optimal parenthesizations that have been
isolated by the conditions of Equation 6.1.
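Steps 2 and 3 above can be sketched as follows. This is a sequential illustration with hypothetical names, assuming distinct weights and a list whose first weight is the minimum. It marks $A[j]$ using the ANSV match of $w_j$ and Equation 6.2, and then reads off each maximal run of ones, whose neighbouring zeros $A[i]$ and $A[k]$ give the isolated product $(a_i \bullet \cdots \bullet a_{k-1})$.

```python
def nearest_smaller_values(w):
    """0-based indices of the nearest strictly smaller value on each side."""
    n = len(w)
    left, right = [None] * n, [None] * n
    stack = []
    for k, wk in enumerate(w):
        while stack and w[stack[-1]] > wk:
            right[stack.pop()] = k
        left[k] = stack[-1] if stack else None
        stack.append(k)
    return left, right


def eq_6_2(w1, wi, wj, wk):
    return w1 * wi * wk + wi * wj * wk < w1 * wj * wk + w1 * wi * wj


def mark_and_isolate(w):
    """Return the 0/1 array A and the isolated products as 1-based
    (first matrix, last matrix) pairs."""
    left, right = nearest_smaller_values(w)
    a = [0] * len(w)
    for j in range(len(w)):
        if (left[j] is not None and right[j] is not None
                and eq_6_2(w[0], w[left[j]], w[j], w[right[j]])):
            a[j] = 1
    products, j = [], 0
    while j < len(a):
        if a[j] == 1:
            i, k = j - 1, j          # a[i] is the nearest 0 on the left
            while a[k] == 1:         # the last entry of a is always 0
                k += 1               # a[k] becomes the nearest 0 on the right
            products.append((i + 1, k))  # matrices a_{i+1} .. a_k in 1-based terms
            j = k
        else:
            j += 1
    return a, products


print(mark_and_isolate([10, 11, 100, 12, 30]))  # ([0, 0, 1, 0, 0], [(2, 3)])
```

In the example the weight 100 is isolated by the match $[11, 12]$, so the product of the second and third matrices is reported.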
The first step of this algorithm is correct by Theorem 13 and Corollary 3. The next theorem
establishes the correctness of the last two steps of the algorithm.
Theorem 14 If the ANSV match of $A[j]$ is $[A[i], A[k]]$ where $i < k$, then the product $(a_i \bullet \cdots \bullet a_{k-1})$
is in an optimal parenthesization.
Proof: The array $A$ contains values from the set $\{0, 1\}$, so if $A[j] = 0$ then $A[j]$ does not have an
ANSV match. On the other hand, if $A[j] = 1$ then $A[j]$ must have an ANSV match since $A[1] = 0$
and $A[n + 1] = 0$.
Now consider the case where $A[j]$ has match $[A[i], A[k]]$. This means that for all $t$ such that
$i < t < k$, $A[t]$ also has match $[A[i], A[k]]$. All of these matches are compatible, consequently all
$A[t] = 1$ for $i < t < k$ are nested ANSV matches. This means there must be at least one list
of three consecutive weights, say $w_t$, $w_{t+1}$, and $w_{t+2}$, that satisfy Equation 6.1. Now remove the
middle such weight, $w_{t+1}$, and recursively continue this argument knowing that Equation 6.2 has
marked the other such weights.
$\Box$
By Theorem 7, the three steps of this algorithm cost $O(\lg \lg n)$ time using $n/\lg \lg n$ processors
on the common-CRCW PRAM, or $O(\lg n)$ time using $n/\lg n$ processors on the EREW PRAM
and the common-CRCW PRAM.
Assume that $r + 1$ weights remain after reduction. Then renumbering and rotating the list
of remaining weights gives $w_1, w_2, \ldots, w_{r+1}$ where $w_1 = \min_{1 \le i \le r+1} \{w_i\}$. The second step of
the approximation algorithm requires that we now form the appropriate linear product with the
remaining matrices.
The depth of the parentheses provides an approximation to within 15.5% of optimal for the
MCOP. This is due to Chandra [9], Chin [10], and Hu and Shing [20].
Theorem 15 (Hu and Shing [20]) If a weight list $w_1, w_2, \ldots, w_{r+1}$ is reduced, then the MCOP
can be solved to within a multiplicative factor of 1.155 from optimal in constant time using $n$
processors. This is done by choosing the linear parenthesization $((\cdots (a_1 \bullet a_2) \cdots) \bullet a_{r-1}) \bullet a_r$.
From this theorem we see that, after a weight list is reduced, choosing a linear parenthesization
with cost $w_1 \|w_2 : w_{r+1}\|$ gives the matrix product within a multiplicative factor of 1.155 from
optimal.
The matrix chain ordering problem can be solved to within 15.5% of optimal in $O(\lg \lg n)$ time
with $n/\lg \lg n$ processors.
The approximation algorithm given here is another problem whose solution is built on the ANSV
problem. This algorithm also shows that only a linear number of entries of a dynamic programming
table give a nice approximate solution to the MCOP. That is, the path of critical nodes in the
canonical subgraphs supply a good approximate solution for the matrix chain ordering problem.
This algorithm is built on the greedy principle more than the dynamic programming paradigm.
In terms of the processor time product, this algorithm is optimal.
7.1 Solving the Matrix Chain Ordering Problem in Parallel

This section gives a polylog time algorithm for solving the matrix chain ordering problem that
uses $n^3/\lg n$ processors. Throughout this section canonical subgraphs of the form $D^{(i,j)}$ are written
as $D^{(1,m)}$ where $1 \le m \le n$. In addition, assume that $D_n$ contains critical nodes, otherwise by
Corollary 2 there is an immediate exact solution.
Canonical subgraphs can be treated atomically while finding a shortest path in a $D_n$ graph.
Further, using $(\min, +)$ matrix multiplication we can join these subgraphs together to form a
shortest path in an entire $D_n$ graph.
7.1.1 Shortest Paths Containing No Critical Nodes in Canonical Subgraphs
This subsection culminates in Theorem 18 which states that in a $D^{(1,m)}$ graph shortest paths have
a very rigid structure; this result supplies the first step for finding shortest paths in canonical
subgraphs. All results in this subsection apply to shortest paths from $(0, 0)$ to $(1, m)$ in $D^{(1,m)}$
graphs and to shortest paths from $(j, k)$ to $(i, v)$ in $D^{(i,v)}_{(j,k)}$ graphs.
A path $q$ with one jumper contains no critical nodes iff there are no critical nodes in $q$ and there
are no critical nodes in $q$'s dual path. In particular, a jumper $(i, j) \Longrightarrow (i, k)$ contains no critical
nodes if both $(i, j)$ and $(i, k)$ are not critical nodes and there are no critical nodes in a shortest
path contributing to this jumper's weight. This generalizes to paths with more than one jumper.
In our terminology we have the following result of Hu and Shing [21, Corollary 3].
Theorem 16 (Hu and Shing [21]) In any canonical graph, the sum of the two products
$w_i w_{j+1} w_{k+1} + w_{j+1} w_{j+2} w_{k+1}$ where $i < j < k$, cannot contribute to the weight of any shortest
path iff $w_{k+1} > w_{j+1} > w_i$.
Next is a useful technical lemma.
Lemma 6 In a $D^{(1,m)}$ graph if $(i, t) \in V[U]$ then $w_i < w_{t+1}$.
Proof: Since $(i, t) \in V[U]$ and $i < t$, there must be some critical node $(i, u) \in V[p]$ where $t < u$.
This means that $w_j > \max\{w_i, w_{u+1}\}$, for all $j$, $i < j \le u$. Since $i < t < u$ it must be that
$w_i < w_{t+1}$.
$\Box$
A symmetric argument to that of Lemma 6 shows that $w_i > w_{j+1}$ for all nodes $(i, j) \in V[L]$.
Theorem 17 Any shortest path $q$ to some vertex $(i, j)$ in $D^{(1,m)}$ where $q$ contains no critical nodes,
except possibly $(i, j)$, is a straight unit path.

A proof of this theorem follows from an inductive application of Lemma 4 and Theorem 12.
The last theorem and all previous results of this section also apply to shortest paths from $(j, k)$
to $(i, v)$ in $D^{(i,v)}_{(j,k)}$ canonical subgraphs.
Jumpers of the form $(i, j) \Longrightarrow (i, k)$ such that $(j + 1, k) \in V[p]$ are very important. Such a
jumper contains at least one critical node, namely $(j + 1, k)$.
Lemma 7 If a horizontal shortest path $q$ to $(i, u) \in V[U] \cup V[p]$ is such that $q$ has one jumper
$(i, j) \Longrightarrow (i, k)$ where $(j + 1, k) \in V[p]$, then $q$ is equivalent to a shortest path to $(j + 1, k)$ followed
by $(j + 1, k) \Uparrow (i, k) \to \cdots \to (i, u)$.
The same holds for any such vertical path. A proof of this lemma follows from Lemma 1 and
Theorem 3.
Suppose $(j, k)$ and $(i, t)$ are two critical nodes in a canonical graph $D^{(1,m)}$, where $i \le j \le k \le t$.
Then there is a unit path of critical nodes from $(j, k)$ to $(i, t)$. With this, there are two important
symmetric paths between $(j, k)$ and $(i, t)$:
The upper angular path of $(j, k)$ and $(i, t)$ is
$(j, k) \Uparrow (i, k) \to \cdots \to (i, t)$
and the lower angular path of $(j, k)$ and $(i, t)$ is
$(j, k) \Longrightarrow (j, t) \uparrow \cdots \uparrow (i, t)$
A trivial angular path has unit edges replacing the jumpers. That is, if there are three unit-edge-connected
critical nodes $(j, k) \to (j, k + 1) \uparrow (j - 1, k + 1)$, then there is a trivial angular path $(j, k) \uparrow (j - 1, k) \to
(j - 1, k + 1)$ where $(j - 1, k)$ is not a critical node. For examples of angular paths see Figure 7.9.
In a canonical subgraph the shortest path between any two critical nodes that contains no other
critical nodes is an angular path. Angular paths are central to the rest of this section.
Figure 7.9: Two Angular Paths

A central result of this section is that a shortest path to a critical node $(i, j)$ in a canonical
graph may be along a path of critical nodes, then over an angular path, then back to a subpath of
critical nodes, then over an angular path and back to a subpath of critical nodes, etc.
Theorem 18 (Hu and Shing [21]) A shortest path to a critical node $(i, j)$ in a $D^{(1,m)}$ graph is
either along a straight unit path to $(i, j)$, along the path of critical nodes to $(i, j)$, or along subpaths
of critical nodes connected together by angular paths and finally to $(i, j)$.
A proof of this theorem follows inductively from Theorem 17 and the following. Say that
there is some jumper $(i, j) \Longrightarrow (i, k)$ such that $(j + 1, k) \notin V[p]$. Then $W((i, j) \Longrightarrow (i, k)) =
f(i, j, k) + sp(j + 1, k)$, and assume without loss that $(j + 1, k) \in L$, see Figure 7.10a. Now, by Theorem
17 we know the shortest path to $(j + 1, k)$ includes the unit edge $(j + 2, k) \uparrow (j + 1, k)$ which has cost
$w_{j+1} w_{j+2} w_{k+1}$. Additionally, assume that $(j + 2, k) \notin V[p]$. But we know $f(i, j, k) = w_i w_{j+1} w_{k+1}$
and by Lemma 6, since $(i, j)$ and $(i, k)$ are in U, we know $w_i < w_{j+1} < w_{k+1}$. Therefore we can
apply Theorem 16, showing that we cannot have jumper $(i, j) \Longrightarrow (i, k)$ in a minimal path in $D_n$.
A similar case holds for Figure 7.10b.
Figure 7.10: Two Jumpers and their Complementary Paths, panels (a) and (b)
By the Duality Theorem and the compatibility of critical nodes, any jumpers over p have dual
paths containing no jumpers over p giving the last case we discussed.
Figure 7.11: Two Jumpers Over the Path p

This last theorem and the next corollary also hold for shortest paths from $(j, k)$ to $(i, v)$ in
$D^{(i,v)}_{(j,k)}$ graphs.
Corollary 4 A shortest path from $(0, 0)$ to a non-critical node $(s, t)$ in a $D^{(1,m)}$ graph is either a
straight unit path, or it is a minimal path to some critical node $(i, j)$ and from $(i, j)$ to $(s, t)$ by an
angular path.
7.1.2 Combining the Canonical Graphs for an Efficient Parallel Algorithm

This subsection gives a parallel divide and conquer tool for finding minimal paths in $D_n$ graphs
based on work of Hu and Shing. This is done by isolating an underlying tree structure connecting the
canonical subgraphs, so we can find shortest paths in these subgraphs individually while essentially
ignoring the effect of the monotone subgraphs. This divide and conquer tool allows the computation
of a shortest path in a $D_n$ graph by applying variations of tree contraction techniques. These
techniques incorporate special "leaf pruning" operations.
By Corollary 1, from here on assume
$$w_1 = \min_{1 \le i \le n+1} \{w_i\}$$
Next we present, without proof, a central result of Hu and Shing. In essence, this result gives
some of the power of the greedy principle together with the principle of optimality. That is, with
this result we can isolate some substructures that are necessarily in an optimal superstructure.
Theorem 19 (Hu and Shing [21]) Given a weight list $w_1, \ldots, w_{n+1}$ with the three smallest
weights $w_1 < w_{i+1} < w_{v+1}$, the products $w_1 w_{i+1}$ and $w_1 w_{v+1}$ are in some associative product(s) in
an optimal parenthesization.
There may be one or two $f$'s that contain the products $w_1 w_{i+1}$ and $w_1 w_{v+1}$.
The next corollary is central to the results of this section. It essentially guarantees that certain
easily computable nodes are a part of a shortest path in a given Dn graph. Where Hu and Shing
use the results of the last theorem as a sequential divide and conquer tool, here it is made into a
parallel tool.
Corollary 5 (Atomicity Corollary) Given a weight list $w_1, \ldots, w_{n+1}$ with the three smallest
weights $w_1 < w_{i+1} < w_{v+1}$ and $i + 1 < v$, the critical nodes $(1, i)$ and $(1, v)$ are in a shortest path
from $(0, 0)$ to $(1, n)$ in $D_n$.
A proof follows directly from Theorem 19.
Suppose $w_1 < w_{i+1} < w_{v+1}$ are the three smallest weights in $D_n$ and both $D^{(1,i)}$ and $D^{(i+1,v)}$
are canonical subgraphs. Then clearly there is a shortest path from $(0, 0)$ to $(1, i)$ in $D^{(1,i)}$. Also,
applying the Atomicity Corollary (Corollary 5) to the subgraph $D(1, v)$, which also has the three
smallest weights $w_1 < w_{i+1} < w_{v+1}$, shows that $(1, i)$ is in a minimal path to $(1, v)$. Therefore by
the structure of $D_n$ graphs the only contribution that $D^{(i+1,v)}$ can make to a shortest path to $(1, v)$
is by providing $sp$ values for jumpers along the unit path $(1, i) \to \cdots \to (1, v)$. That is, there may
be some jumper $(1, j) \Longrightarrow (1, k)$ such that $(j + 1, k) \in V[D^{(i+1,v)}]$ and $(1, i) \to \cdots \to (1, j) \Longrightarrow
(1, k) \to \cdots \to (1, v)$ is cheaper than $(1, i) \to \cdots \to (1, v)$. Shortly, in Lemma 9, we will see that we
only have to consider jumpers $(1, j) \Longrightarrow (1, k)$ such that $(j + 1, k)$ is a critical node in $D^{(i+1,v)}$.
Canonical Trees
Dividing a $D_n$ graph into a tree of canonical subgraphs using the Atomicity Corollary (Corollary
5) is easily done in $O(\lg n)$ time with $n/\lg n$ processors by solving the ANSV problem.
In a $D_n$ graph the structure joining all of the critical nodes is a canonical tree, see also Hu
and Shing [19, 21, 22]. Define the leaves, edges, and internal nodes of a canonical tree as follows.
Initially, in every canonical subgraph $D^{(1,m)}$ the critical node $(1, m)$ is a leaf and is denoted by
$(1, m)$ to distinguish it from other critical nodes. A $D^{(1,m)}$ canonical subgraph only contains one
tree node, namely $(1, m)$. On the other hand, an isolated critical node is a critical node with no
critical nodes that are one unit edge away. The internal tree nodes are isolated critical nodes or
$(i, v)$ and $(j, k)$ for $D^{(i,v)}_{(j,k)}$ canonical graphs, where $k \ne j + 1$. A $D^{(i,v)}_{(j,k)}$ graph only contains the
two tree nodes $(i, v)$ and $(j, k)$. Notice that all tree nodes are also critical nodes, thus they are
compatible.
The edges of a canonical tree are the straight unit paths that connect tree nodes. Jumpers may
reduce the cost of tree edges. An edge from $(i, j)$ to $(i, k)$ is denoted by $(i, j) \to \cdots \to (i, k)$ and
all edges are directed towards $(1, n)$.
Since tree nodes are critical nodes with easily discernible properties, we can efficiently distinguish
them in parallel. In addition, we can discard all monotone subgraphs since they have no influence
on a shortest path to $(1, n)$, except if $D(1, i)$ is monotone and for some $i$, $w_1 = \min_{1 \le i \le n+1} \{w_i\}$.
That is, since $w_1 = \min_{1 \le i \le n+1} \{w_i\}$, if $D(1, i)$ is monotone for some $i$, then a shortest path to
$(1, n)$ will travel along the path $(1, 1) \to \cdots \to (1, i)$. But this is the only case when a monotone
graph contains a piece of a minimal path from $(0, 0)$ to $(1, n)$.
For the next lemma, assume $w_1 = \min_{1 \le i \le n+1} \{w_i\}$.
Lemma 8 In a monotone subgraph $D(i, k)$ of $D_n$, the cost of a shortest path to $(i, k)$ for $i > 1$
plus the associative weight $f(1, i - 1, k)$ is more than the unit path $(1, i - 1) \to \cdots \to (1, k)$.
Proof: Since $D(i, k)$ is monotone, we either have $w_i < \cdots < w_{k+1}$ or $w_i > \cdots > w_{k+1}$. So by
Theorem 10, the shortest path to $(i, k)$ in $D(i, k)$ is either $(i, i) \to \cdots \to (i, k)$ or $(k, k) \uparrow \cdots \uparrow (i, k)$.
Therefore, taking the jumper $(1, i - 1) \Longrightarrow (1, k)$ with weight $sp(i, k) + f(1, i - 1, k)$ we have two
cases.
Case i: The ordering of the weight list is $w_i < \cdots < w_{k+1}$.
In this case, the shortest path to $(i, k)$ is $(i, i) \to \cdots \to (i, k)$. Clearly, $f(1, i - 1, k) =
w_1 w_i w_{k+1}$ and $W((1, i - 1) \to (1, i)) = w_1 w_i w_{i+1}$. But since $w_{i+1} \le w_{k+1}$, it must be
$f(1, i - 1, k) \ge W((1, i - 1) \to (1, i))$. Along the same lines, $W((1, j) \to (1, j + 1)) <
W((i, j) \to (i, j + 1))$ holds for all $j$, $i \le j < k$.
Case ii: The ordering of the weight list is $w_i > \cdots > w_{k+1}$.
In this case, the shortest path to $(i, k)$ is $(k, k) \uparrow \cdots \uparrow (i, k)$. Since $f(1, i - 1, k) =
w_1 w_i w_{k+1}$, $W((1, k - 1) \to (1, k)) = w_1 w_k w_{k+1}$, and $w_i \ge w_k$, we know that $f(1, i -
1, k) \ge W((1, k - 1) \to (1, k))$. Along the same lines, $W((1, i + j - 1) \to (1, i + j)) <
W((i + j + 1, k) \uparrow (i + j, k))$ for all $j$, $0 \le j < k - i$.
$\Box$
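The two sides of Lemma 8 can be evaluated for concrete monotone sublists. The sketch below assumes $1 < i < k$, that $w_1$ is the strict minimum, and that $w_i, \ldots, w_{k+1}$ is monotone, so that by Theorem 10 the value $sp(i, k)$ is the cost of a straight unit path. The helper `norm` implements $\|w_a : w_b\| = \sum_{j=a}^{b-1} w_j w_{j+1}$ as this notation is used for unit path costs; all names are ours.

```python
def norm(w, a, b):
    """||w_a : w_b|| = sum of w_j * w_{j+1} for j = a .. b-1 (1-based indices)."""
    return sum(w[j - 1] * w[j] for j in range(a, b))


def lemma_8_sides(w, i, k):
    """Return (jumper cost, unit path cost) for a monotone D(i, k), 1 < i < k."""
    wi, wk1 = w[i - 1], w[k]        # w_i and w_{k+1}
    if wi < wk1:
        # increasing: shortest path is the row (i,i) -> ... -> (i,k)
        sp = wi * norm(w, i + 1, k + 1)
    else:
        # decreasing: shortest path is the column (k,k) up to (i,k)
        sp = wk1 * norm(w, i, k)
    jumper = sp + w[0] * wi * wk1   # sp(i,k) + f(1, i-1, k)
    unit = w[0] * norm(w, i, k + 1) # (1,i-1) -> ... -> (1,k)
    return jumper, unit


print(lemma_8_sides([1, 2, 3, 4], 2, 3))  # (32, 18)
print(lemma_8_sides([1, 9, 7, 5], 2, 3))  # (360, 98)
```

In both examples the jumper is strictly more expensive than the unit path in row 1, as the lemma states.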
Suppose $D_n$ has fewer than $n - 1$ critical nodes. Then $D_n$ may have several disconnected
canonical trees and one or more monotone subgraphs by Corollary 2. But, by Theorem 9 there
is at least one path from $(0, 0)$ to $(1, n)$ joining these canonical subtrees. From Lemma 8, after
discarding irrelevant monotone subgraphs and for the moment ignoring $D^{(i,v)}_{(j,k)}$ graphs, there are
several structures that $D^{(1,m)}$ graphs may form together. The relationships of tree nodes are the
basis of all of these structures.
Let $(i, j), (j + 1, k), \ldots, (u + 1, v)$ be neighboring leaves in $D(i, v)$ such that they are all in a
canonical tree rooted at $(i, v)$ or in no canonical tree at all. All of these leaves together can have
the next relationships, or combinations of them.
Case 1: The leaves are not joined together by internal tree nodes. For instance, this case
occurs if $w_i < w_{j+1} < \cdots < w_{u+1} < w_{v+1}$.
Case 2: The leaves form a binary canonical tree, see Figure 7.12. In this case, there are
internal tree nodes in $D(i, v)$ that connect the leaves together.
Figure 7.12: A Canonical Tree of $D^{(1,m)}$ Graphs, the Circles Denote Tree Nodes
We can solve Case 1 by creating a surrogate canonical tree and treating it as in Case 2. The
viability of treating these situations as Case 2 is now shown. Take a list of $r$ monotone leaves,
none of which are in a canonical tree. Label the leaves $(i, j), (j + 1, k), (k + 1, t), \ldots, (u + 1, v)$ and
without loss assume that $w_i < w_{j+1} < \cdots < w_{u+1} < w_{v+1}$ so these leaves are all in $D(i, v)$. Now,
by the Atomicity Corollary we know that tree node $(i, j)$ must be in a shortest path from $(0, 0)$ to
$(i, v)$. Therefore, applying the Duality Theorem (Theorem 3) we know that a shortest path from
$(0, 0)$ to $(i, v)$ goes from $(0, 0)$ to $(j + 1, v)$ and then over the tree edge $(j + 1, v) \uparrow \cdots \uparrow (i, v)$. Now,
we can complete this argument by induction.
7.1.3 Finding Shortest Paths to All Critical Nodes in Canonical Subgraphs
This section discusses an algorithm for finding shortest paths in $D^{(1,m)}$ and $D^{(i,v)}_{(j,k)}$ canonical graphs.
As before, $p$ denotes the continuous path of critical nodes in a $D^{(1,m)}$ or $D^{(i,v)}_{(j,k)}$ graph.
First an $m^3/\lg m$ processor and polylog time algorithm for finding a shortest path to all critical
nodes in a $D^{(1,m)}$ canonical subgraph is given. This is done by treating angular paths as edges so
$p$ is now an $(m+1)$-node graph with $\Theta(m^2)$ edges. That is, any angular path
$(j, k) \Uparrow (i, k) \to \cdots \to (i, t)$ connecting the critical nodes $(j, k)$ and $(i, t)$ becomes one edge
from $(j, k)$ to $(i, t)$ costing $W((j, k) \Uparrow (i, k)) + w_i \|w_{k+1} : w_{t+1}\|$. Of course, since $(j, k)$ is a
critical node but $(i, k)$ is not a critical node and both are in the same canonical graph, it must be that
$W((j, k) \Uparrow (i, k)) = w_i \|w_{i+1} : w_j\| + f(i, j-1, k)$. Now by Theorem 17, this also holds for
$D^{(i,v)}_{(j,k)}$ canonical graphs.
Further, using a $(\min, +)$ matrix multiplication shortest path algorithm, finding shortest paths in
such an $(m+1)$-node graph can be done in $O(\lg^2 m)$ time with $m^3/\lg m$ processors. This results
in a polylog time and $n^3/\lg n$ processor MCOP algorithm in the last subsections of the paper.
Next is the $O(\lg^2 m)$ time and $m^3/\lg m$ processor algorithm for finding a shortest path to each
critical node in a $D^{(1,m)}$ graph. First compute all of the unit paths to nodes in $p$ in constant time
using $m$ processors. These paths are perhaps best viewed as edges from $(0, 0)$ to the nodes in
$p$. Further, the cost of each of the $\Theta(m^2)$ angular paths can be computed in constant time using
$m^2$ processors, with preprocessing costing $O(\lg m)$ time and $m/\lg m$ processors. Now compute the
shortest path to each node in $p$ by treating every angular path as a weighted edge and applying
a parallel $(\min, +)$ matrix multiplication all pairs shortest path algorithm to the nodes in $p$. This
algorithm costs $O(\lg^2 m)$ time and $m^3/\lg m$ processors, and provides a shortest path from $(0, 0)$ to
every critical node in a $D^{(1,m)}$ graph. Compute the shortest paths from $(0, 0)$ to all critical nodes
in a $D^{(i,v)}_{(j,k)}$ canonical graph in the same way.
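The all pairs step can be illustrated with a small sequential sketch of repeated $(\min, +)$ squaring. This is only an illustration under stated assumptions: the function names and the toy graph are hypothetical, node 0 stands in for $(0, 0)$, and the remaining nodes stand in for critical nodes whose connecting angular paths have already been collapsed into weighted edges. In a parallel setting each product is the step that would be spread over the processors, and about $\lg m$ squarings suffice.

```python
import math

INF = math.inf

def min_plus_product(a, b):
    """(min, +) product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_shortest_paths(weights):
    """Repeatedly square the weight matrix under (min, +).

    weights[i][j] is the cost of the edge from i to j (INF if absent).
    After t squarings, every path with at most 2**t edges is accounted for.
    """
    n = len(weights)
    dist = [[0 if i == j else weights[i][j] for j in range(n)]
            for i in range(n)]
    covered = 1                      # longest path length (in edges) covered so far
    while covered < n - 1:
        dist = min_plus_product(dist, dist)
        covered *= 2
    return dist

# Hypothetical toy instance: node 0 plays the role of (0,0), nodes 1..3 are
# stand-ins for critical nodes; the edge costs are made up for illustration.
w = [
    [INF, 4, 9, INF],
    [INF, INF, 3, 10],
    [INF, INF, INF, 2],
    [INF, INF, INF, INF],
]
dist = all_pairs_shortest_paths(w)
print(dist[0])  # shortest-path costs from node 0 to every node
```

Only row 0 of the result is needed when all paths start at $(0, 0)$; the full matrix is kept here to mirror the all pairs formulation in the text.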
Theorem 20 Given a $D^{(1,m)}$ graph we can compute a shortest path from $(0, 0)$ to all nodes in $p$
in $O(\lg^2 m)$ time using $m^3/\lg m$ processors.
The results of this theorem also hold for finding shortest paths from all critical nodes in a
$D^{(i,v)}_{(j,k)}$ canonical graph to $(i, v)$.
Leaf Pruning and Band Merging
Here a basic technique for joining canonical subgraphs quickly in parallel is given. This technique
is based on edge minimization and builds shortest paths in Dn graphs.
Take the two jumpers $(i, j) \Longrightarrow (i, t)$ and $(i, k) \Longrightarrow (i, u)$ in row $i$ and without loss say $j < k$.
Then they are not compatible iff $k < t < u$. If $(j+1, t) \in V[p]$ and $(k+1, u) \in V[p]$, then $(j+1, t)$
and $(k+1, u)$ are compatible. Consequently, any two jumpers in row $i$ such as $(i, j) \Longrightarrow (i, t)$ and
$(i, k) \Longrightarrow (i, u)$, where $(j+1, t) \in V[p]$ and $(k+1, u) \in V[p']$, must be compatible, where $p$ and
$p'$ are possibly distinct paths of critical nodes. Notice that if $p$ and $p'$ are distinct, then they are
still compatible. Given a $D_n$ graph with the jumpers $(i, j) \Longrightarrow (i, t)$ and $(i, k) \Longrightarrow (i, u)$, let $p$ be a
minimal path from $(0, 0)$ to $(1, n)$ in $D_n$. Then all jumpers $(i, j) \Longrightarrow (i, t)$ and $(i, k) \Longrightarrow (i, u)$ such
that $(j+1, t) \in V[p]$ and $(k+1, u) \in V[p]$ are compatible.
Minimizing the cost of a straight unit edge path in a canonical tree by using jumpers is edge
minimizing, and we will show that the jumpers only have to get their sp values from critical nodes.
We will only edge minimize tree edges or straight unit paths in $D^{(i,v)}_{(j,k)}$ graphs. For example, let $p$ be
the path of critical nodes in $D(k, t)$ and consider the straight unit path $(i, j) \to \cdots \to (i, v)$. If the
jumper $(i, k) \Longrightarrow (i, t)$ is such that $(k+1, t) \in V[p]$ and $(i, j) \to \cdots \to (i, k) \Longrightarrow (i, t) \to \cdots \to (i, v)$
is cheaper than $(i, j) \to \cdots \to (i, v)$, then we edge minimize $(i, j) \to \cdots \to (i, v)$ with $(i, k) \Longrightarrow (i, t)$.
The next procedure edge minimizes the unit path along the $i$th row to the critical node $(i, v)$
with all jumpers that get their sp values from the critical nodes $V[p]$,
$$L = w_i \|w_{i+1} : w_{v+1}\|$$
$$A[i, v] = \min_{\forall (k+1, u) \in V[p]} \{\, L,\; L - w_i \|w_{k+1} : w_{u+1}\| + W((i, k) \Longrightarrow (i, u)) \,\}$$
and the same can be done for straight vertical unit paths. The minimal cost along the tree edge in
row $i$ to $(i, v)$ is in $A[i, v]$, assuming only one connected path of critical nodes $p$. Notice by Theorem 6
that we can compute the cost of the straight unit (sub)paths in constant time with one processor.
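A minimal sequential sketch of this edge minimization follows. It rests on several assumptions that are not stated in this section and should be checked against the earlier definitions: $\|w_a : w_b\|$ is taken to be $\sum_{s=a}^{b-1} w_s w_{s+1}$, $f(a, b, c)$ is taken to be $w_a w_{b+1} w_{c+1}$, and the jumper weight $W((i, k) \Longrightarrow (i, u))$ is taken to be $sp(k+1, u) + f(i, k, u)$. Each jumper is read as replacing the unit-path segment between $(i, k)$ and $(i, u)$. The function names and the sample weights are hypothetical.

```python
def adjacent_product_prefix(w):
    """prefix[t] = sum of w[s] * w[s+1] for 1 <= s < t (w is 1-indexed, w[0] unused)."""
    prefix = [0] * len(w)
    for t in range(2, len(w)):
        prefix[t] = prefix[t - 1] + w[t - 1] * w[t]
    return prefix

def norm(prefix, a, b):
    """Assumed meaning of ||w_a : w_b||: sum of w_s * w_{s+1} for a <= s < b."""
    return prefix[b] - prefix[a]

def edge_minimize_row(w, prefix, i, v, sp):
    """Return A[i, v] for the unit path (i, i) -> ... -> (i, v).

    sp maps critical nodes (k+1, u) to their shortest-path cost from (0, 0).
    """
    full = w[i] * norm(prefix, i + 1, v + 1)           # L, the plain unit path
    best = full
    for (k_plus_1, u), cost in sp.items():
        k = k_plus_1 - 1
        if not (i <= k < u <= v):
            continue                                    # jumper does not fit in this row segment
        jumper = cost + w[i] * w[k + 1] * w[u + 1]      # assumed W((i,k) => (i,u))
        segment = w[i] * norm(prefix, k + 1, u + 1)     # unit-path segment the jumper replaces
        best = min(best, full - segment + jumper)
    return best

# Hypothetical weights w_1..w_5 for a chain of four matrices (4x50, 50x5, 5x50, 50x6).
w = [0, 4, 50, 5, 50, 6]
prefix = adjacent_product_prefix(w)
sp = {(3, 4): 5 * 50 * 6}                               # cost of multiplying matrices 3 and 4
print(edge_minimize_row(w, prefix, 1, 4, sp))
```

The prefix array plays the role of the preprocessing mentioned in the text: once it is available, each candidate jumper is evaluated with a constant number of arithmetic operations.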
Lemma 9 When edge minimizing a tree edge $(i, j) \to \cdots \to (i, v)$ in a canonical subgraph we only
have to consider jumpers $(i, k) \Longrightarrow (i, t)$ such that $(k+1, t) \in V[p]$.
Proof: Take row $i$, and $(i, i) \to \cdots \to (i, v)$ assuming that some jumper $(i, s) \Longrightarrow (i, t)$ minimizes
row $i$ where $(s+1, t) \notin V[p]$ and $i < s < t \le u$. Without loss, say $(s+1, t) \in V[U]$, so a shortest
path to $(s+1, t)$ is either the straight unit path $(s+1, s+1) \to \cdots \to (s+1, t)$ or this unit path
with jumpers by Corollary 4.
All jumpers getting their sp value from a critical node in $p$ must be compatible. Therefore,
there can be only one jumper getting its sp value from a critical node in a shortest path from
$(s+1, s+1)$ to $(s+1, t)$. Now there are three cases to consider,
Case i: A minimal path to $(s+1, t)$ is the straight unit path $(s+1, s+1) \to \cdots \to (s+1, t)$.
By the Duality Theorem the minimal path along $(i, i) \to \cdots \to (i, s) \Longrightarrow (i, t)$ is
equivalent to the path $(s+1, s+1) \to \cdots \to (s+1, t) \Uparrow (i, t)$. But
$(s+1, s+1) \to \cdots \to (s+1, t) \Uparrow (i, t)$ is not an angular path, a straight unit path, or a path
of critical nodes intermixed with angular paths getting their sp value from $p$. Therefore, we arrive at a
contradiction of Corollary 4.
Case ii: A shortest path to $(s+1, t)$ is along the unit path from $(s+1, s+1)$ to $(s+1, t)$
and contains one or more jumpers whose sp values are not from $V[p]$.
This case would imply that shortest paths from $(0, 0)$ to nodes in $U$ are not angular
paths or straight line unit paths, contradicting Corollary 4.
Case iii: A shortest path to $(s+1, t)$ is along the unit path from $(s+1, s+1)$ to $(s+1, t)$
and contains one jumper that gets its sp value from a critical node in $p$.
Let $(s+1, j) \Longrightarrow (s+1, k)$ be a jumper such that $(j+1, k) \in V[p]$. Thus by the Duality
Theorem (Theorem 3) we know that the path $(s+1, s+1) \to \cdots \to (s+1, j) \Longrightarrow
(s+1, k) \to \cdots \to (s+1, t)$ followed by the jumper $(s+1, t) \Uparrow (i, t)$ costs the same as
the path $(i, i) \to \cdots \to (i, s) \Longrightarrow (i, t)$. At the same time, the path $(s+1, s+1) \to \cdots \to
(s+1, j) \Longrightarrow (s+1, k) \to \cdots \to (s+1, t)$ is equivalent to a shortest path to $(j+1, k)$
followed by the angular path $(j+1, k) \Uparrow (s+1, k) \to \cdots \to (s+1, t)$, by the Duality
Theorem. This means a shortest path from $(0, 0)$ to $(i, t)$ is from $(0, 0)$ to $(j+1, k)$
then $(j+1, k) \Uparrow (s+1, k) \to \cdots \to (s+1, t) \Uparrow (i, t)$, but this is a contradiction by
Corollary 4, since this path is not an angular path.
Cases i and ii can occur at the same time, though the above arguments still hold.
□
This lemma easily generalizes to the case where $p$ is a path of critical nodes in a conglomerate
of canonical subgraphs. Lemma 9 also highlights the role well-formed subsolutions play in the
dynamic programming solution of the MCOP.
Let $\bar{p}$ be a minimal path from $(0, 0)$ to $(k, t)$ in $D(k, t)$. Then we can extend the result of
Lemma 9 to the case where we only have to consider jumpers $(i, k) \Longrightarrow (i, t)$ along tree edges
$(i, j) \to \cdots \to (i, v)$ such that $(k+1, t) \in V[\bar{p}]$.
Theorem 21 When edge minimizing a tree edge $(i, j) \to \cdots \to (i, v)$ in a canonical subgraph we
only have to consider jumpers $(i, k) \Longrightarrow (i, t)$ such that $(k+1, t) \in V[\bar{p}]$.
Proof: Since $(i, j) \to \cdots \to (i, v)$ is a tree edge it must be that $\max\{w_i, w_{v+1}\} < w_s$ for all
$s$, $i < s < v+1$.
By contradiction, say the jumper $(i, k) \Longrightarrow (i, t)$ such that $(k+1, t) \notin V[\bar{p}]$ but $(k+1, t) \in V[p]$
is the jumper that minimizes the tree edge $(i, j) \to \cdots \to (i, v)$ more than any other jumper.
But since $(k+1, t) \notin V[\bar{p}]$, we can assume that the angular path $(r, s) \Uparrow (q, s) \to \cdots \to (q, u)$ is
the angular path in $\bar{p}$ that goes around $(k+1, t)$, so $(r, s) \in V[\bar{p}]$ and $(q, u) \in V[\bar{p}]$. In this case,
by the Duality Theorem (Theorem 3) a minimal path to $(q, u)$ is $(q, q) \to \cdots \to (q, r-1) \Longrightarrow
(q, s) \to \cdots \to (q, u)$ such that $(r, s) \in V[\bar{p}]$. In particular, notice that the jumper $(q, k) \Longrightarrow (q, t)$
saves no more than any other jumper by edge minimizing the unit path $(q, q) \to \cdots \to (q, u)$ and,
since $(i, j) \to \cdots \to (i, v)$ is a tree edge, we know that $w_i < w_q$ and $f(i, k, t) < f(q, k, t)$. Thus, if
$(q, k) \Longrightarrow (q, t)$ saves no more than the jumper $(q, r-1) \Longrightarrow (q, s)$ in row $q$, then $(i, k) \Longrightarrow (i, t)$
saves no more than any other jumper in row $i$.
□
This last theorem is very important. It says that once a minimal path $\bar{p}$ is discovered in a subgraph
$D(j, k)$, then during edge minimization we only have to consider jumpers with sp values from $\bar{p}$. Of
course $V[\bar{p}] \subseteq V[p]$, where $p$ is the path of critical nodes.
Theorem 22 Given a tree node $(i, u)$ where the graph $D(i, u)$ contains the leaves $D(i, j)$ and
$D(j+1, u)$ so $(i, j)$ and $(j+1, u)$ are tree nodes, if $D(j+1, u)$ is a canonical subgraph, then a
shortest path from $(i, j)$ to $(i, u)$ can be found in $O(u - j)$ operations.
Proof: Since $(i, j)$, $(i, u)$ and $(j+1, u)$ are all critical nodes, the three smallest weights in $w_i, \ldots, w_{u+1}$
are $w_i$, $w_{j+1}$, and $w_{u+1}$. Now assume without loss that $w_i < w_{u+1} < w_{j+1}$. Hence, by Corollary 5,
tree node $(i, j)$ must be in a shortest path from $(0, 0)$ to $(i, u)$. Therefore we will edge minimize
the unit path $(i, j) \to \cdots \to (i, u)$. Otherwise, if $w_{u+1} < w_i < w_{j+1}$ then $(j+1, u)$ is in a
shortest path from $(0, 0)$ to $(i, u)$. Therefore by Theorem 3 we can edge minimize the path
$(i, j) \to \cdots \to (i, u)$.
There could be a quadratic number of jumpers of the form $(i, k) \Longrightarrow (i, t)$ such that $j \le k <
t \le u$. But by Theorem 21 we only have to consider jumpers along row $i$ that get their sp values
from $p$. That is, only jumpers such as $(i, k) \Longrightarrow (i, t)$ where $(k+1, t) \in V[p]$. The appropriate value
of sp, which has been computed for each node in $p$, can be retrieved and added to the appropriate
value of $f$ in constant time. Therefore we can find the minimal values for the paths between nodes
$(i, j)$ and $(i, u)$ in $O(u - j)$ operations, since there are $O(u - j)$ such compatible jumpers.
□
The $O(u - j)$ operations are easily done in $O(\lg(u - j))$ time using $(u - j)/\lg(u - j)$ processors.
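One standard way to meet such a bound, sketched below as a sequential simulation, is to give each processor a block of about $\lg m$ candidate values to scan, and then combine the block minima with a balanced binary reduction. This is an illustrative assumption about how the minimization could be scheduled, not a description taken from the paper; the function name and the sample values are hypothetical, and the input is assumed to be nonempty.

```python
import math

def blocked_min(values):
    """Simulate a minimum over m values with about m / lg m processors.

    Phase 1: each simulated processor scans one block of about lg m values.
    Phase 2: the block minima are combined pairwise in a balanced tree.
    Returns the minimum and the number of tree rounds used in phase 2.
    """
    m = len(values)
    block = max(1, math.ceil(math.log2(m)))
    # Phase 1: O(lg m) sequential work per processor.
    partial = [min(values[s:s + block]) for s in range(0, m, block)]
    # Phase 2: O(lg m) rounds, each halving the number of candidates.
    rounds = 0
    while len(partial) > 1:
        partial = [min(partial[s:s + 2]) for s in range(0, len(partial), 2)]
        rounds += 1
    return partial[0], rounds

# Hypothetical candidate path costs, one per compatible jumper.
costs = [7, 3, 9, 4, 8, 1, 6, 5]
print(blocked_min(costs))
```

Both phases take a number of steps logarithmic in the number of candidates, which is the shape of the bound quoted above.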
The last theorem holds for leaves in the canonical tree that have been combined and become
conglomerates of other leaves and internal nodes. In this situation, jumpers derived from critical
nodes in different subtrees are independent and compatible, so minimizing tree edges with them is
done simultaneously. But canonical subgraphs of the form $D^{(i,v)}_{(j,k)}$ must be considered. In this case,
take the leaf $D(j, k)$ that must be pruned in $D^{(i,v)}_{(j,k)}$, where $D(j, k)$ consists of the pruned subgraphs
$D(j, t)$ and $D(t+1, k)$.
If there is a shortest path through $(j, k)$ to $(i, v)$, then by the Atomicity Corollary (Corollary
5) and depending on whether $w_j < w_{k+1}$ or $w_{k+1} < w_j$ either $(j, t)$ or $(t+1, k)$, respectively, is in
a shortest path to $(j, k)$. Therefore, by Theorem 22 we would be done. But we must address the
possibility that the cost of paths from $(0, 0)$ to critical nodes throughout $D(j, k)$ can contribute
to shortest paths to critical nodes from $(j, k)$ to $(i, v)$. In fact, we must combine the information
about shortest paths in $D(j, k)$ with information about shortest paths in $D^{(i,v)}_{(j,k)}$. The combination
of this shortest path information is done by edge minimizing unit paths in $D^{(i,v)}_{(j,k)}$ with sp values
from critical nodes of $D(j, k)$.
The next theorem is another parallel divide and conquer tool, but it is for merging a $D(j, k)$
leaf into a $D^{(i,v)}_{(j,k)}$ graph.
Theorem 23 In a $D^{(i,v)}_{(j,k)}$ graph with a shortest path $p$ from $(j, k)$ to $(i, v)$, suppose the four
smallest weights are $w_i < w_{v+1} < w_{i+1} < w_v$. Then $p$ goes through one of
1. Either $(i+1, v)$ or $(i+1, v-1)$ or both
2. $(i, i+1)$ and $(i, v-1)$
3. $(v-1, v)$ and $(i+1, v)$
Proof: Since $w_i < w_{v+1} < w_{i+1} < w_v$ and considering a $D^{(i,v)}_{(j,k)}$ graph, it must be that $(i, v)$, $(i+1, v)$
and $(i+1, v-1)$ are all critical nodes, so $p$ may go through them. Clearly, if $p$ goes through
$(i+1, v-1)$, then $p$ cannot go through $(i, i+1)$ or $(v-1, v)$.
If the shortest path $p$ has a jumper up to row $i$, then $(i, i+1)$ will be in $p$ in the sense of Theorem
9. Of course, if $p$ goes through $(i, i+1)$, then $p$ cannot go through either $(v-1, v)$ or $(i+1, v)$.
Finally, if $(v-1, v)$ is in a shortest path to $(i, v)$, then $(i+1, v)$ is also and both $(i, i+1)$ and
$(i+1, v-1)$ are not in this minimal path.
□
This last theorem also holds for $D^{(1,m)}$ graphs.
By Theorem 23 take a shortest path from $(0, 0)$ to $(i, v)$ that goes through $(i, i+1)$. Then there
might be a straight unit path $(i, i+1) \to \cdots \to (i, v)$ connecting $(i, i+1)$ and $(i, v)$. On the other
hand, there may be some jumper $(i, j) \Longrightarrow (i, k)$ such that
$$(i, i+1) \to \cdots \to (i, j) \Longrightarrow (i, k) \to \cdots \to (i, v)$$
is cheaper than $(i, i+1) \to \cdots \to (i, v)$.
The only ways to merge canonical graphs together are given in Figure 7.13. Merging the two
leaf canonical graphs as in Figure 7.13a is leaf pruning, and Figures 7.13b,c are band merging. Leaf
pruning can be accomplished by edge minimizing alone after all shortest paths to critical nodes in
the leaves have been found.
Figure 7.13: The Variations of Band Merging or Leaf Pruning (panels a, b and c; figure labels: A, B, Min-A, Min-B)
In Figure 7.13c, contracted trees A and B are used to edge minimize the unit paths marked by
"Min-A" and "Min-B." Edge minimizing the unit paths in the outer band with the contracted
trees gives an instance of Figure 7.13b.
Merging $D^{(i,v)}_{(j,t)}$ with $D^{(j,t)}_{(k,s)}$ can be done in $O(\lg^2 n)$ time using $n^3/\lg n$ processors. Let $p_1$ be a
minimal path from $(i, v)$ back to $(0, 0)$ only through $D^{(i,v)}_{(j,t)}$, and let $p_2$ be a minimal path from $(j, t)$
back to $(0, 0)$ only through $D^{(j,t)}_{(k,s)}$. Now, merge these two graphs by finding all the angular paths
from critical nodes in $p_2$ forward to any critical nodes in $D^{(i,v)}_{(j,t)}$. Next, applying an all pairs shortest
path algorithm combines these bands, giving a shortest path from $(i, v)$ back to $(0, 0)$ through $D^{(i,v)}_{(k,s)}$.
7.1.4 Contracting a Canonical Tree

This subsection shows how to use edge minimization and band merging as the prune operations for
a standard tree contraction algorithm. Together these algorithms complete an efficient solution to
the MCOP.
In a tree that contains only $D^{(1,m)}$ canonical subgraphs, each prune operation is an edge
minimization that joins leaves together, until the entire canonical tree is one leaf. Every critical node
$(i, j)$ has an associated variable $sp(i, j)$ where $sp(i, j)$ denotes the cost of a shortest path from $(0, 0)$
to $(i, j)$. Further, a canonical tree has at most $n - 1$ critical nodes, so only $O(n)$ such variables are
necessary.
Initially, all internal tree nodes have $sp(i, j) = \infty$ and all nodes in $p$ in tree leaves have the
minimum value from $(0, 0)$ to $(i, j)$ stored in $sp(i, j)$. That is, in the tree leaves the minimal paths $p$
have been computed. In addition, for all $D^{(i,v)}_{(j,k)}$ graphs compute and store the value of the shortest
path from all critical nodes in $D^{(i,v)}_{(j,k)}$ to $(i, v)$. Compute these initial values using the methods of
Subsection 7.1.3. Also, each tree edge $(i, j) \to \cdots \to (i, k)$ initially has weight $w_i \|w_{j+1} : w_{k+1}\|$.
After the appropriate preprocessing, by Theorem 6, we can compute the initial cost of each of these
$O(n)$ tree edges in constant time with one processor each.
Assume the standard tree contraction algorithm [24] except for the prune operation. Along
with the new prune operation, there is an ordering of the leaves that prevents the simultaneous
pruning of two adjacent leaves. Take two neighboring canonical graphs $D^{(i,j)}$ and $D^{(j+1,k)}$ with the
two leaves $(i, j)$ and $(j+1, k)$ and the internal node $(i, k)$, say $w_i < w_{k+1} < w_{j+1}$. Then prune leaf
$(j+1, k)$ since $(i, j)$ is in a shortest path from $(0, 0)$ to $(i, k)$ by the Atomicity Corollary (Corollary
5). Otherwise, say $w_{k+1} < w_i < w_{j+1}$, then prune leaf $(i, j)$. While standard tree contraction
algorithms use the Euler Tour Technique [24, 27] to number the leaves appropriately, as we have
just seen the canonical nodes often provide a natural prune ordering. The viability of this natural
leaf numbering follows by induction. Thus we apply the Euler Tour Technique in case the leaf
ordering is arbitrary, otherwise take the natural pruning order. Take the tree made by connecting
the two leaves $(i, j)$, $(j+1, k)$ to the internal node $(i, k)$, then the pruning order is arbitrary if
$w_i = w_{j+1} = w_{k+1}$.
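The natural prune ordering described above can be written as a small decision function. This is only a sketch of the two rules stated in the text: the function name is hypothetical, weights are stored 1-indexed in a list, and any weight pattern not covered by the two rules (including the all-equal case) is reported as having no natural order, which is where a numbering such as the Euler Tour Technique would take over.

```python
def leaf_to_prune(w, i, j, k):
    """Pick which sibling leaf, (i, j) or (j+1, k), to prune under parent (i, k).

    w is a 1-indexed weight list (w[0] is unused).
    Returns the leaf to prune, or None when the two rules give no natural order.
    """
    wi, wj1, wk1 = w[i], w[j + 1], w[k + 1]
    if wi < wk1 < wj1:
        return (j + 1, k)    # (i, j) lies on a shortest path, so prune its sibling
    if wk1 < wi < wj1:
        return (i, j)        # the symmetric case from the text
    return None              # e.g. w_i = w_{j+1} = w_{k+1}: order is arbitrary

# Hypothetical weights w_1..w_5 with leaves (1, 2) and (3, 4) under parent (1, 4).
print(leaf_to_prune([0, 2, 9, 7, 8, 5], 1, 2, 4))
print(leaf_to_prune([0, 5, 9, 7, 8, 2], 1, 2, 4))
print(leaf_to_prune([0, 3, 9, 3, 8, 3], 1, 2, 4))
```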
In addition, the numbering of tree nodes by the Euler Tour Technique is used to prevent a
band graph from being merged simultaneously with a band inside of it and a band around it. See
standard tree collapsing techniques for details of this use of the Euler Tour Technique.
Figure 7.14: A Small Canonical Tree (node labels: [i,j], [i,k], [i,u], [j+1,k])
As Figure 7.14 depicts, take a canonical tree rooted at $(i, k)$ which is the parent of tree nodes
$(i, j)$ and $(j+1, k)$, which are siblings representing $D(i, j)$ and $D(j+1, k)$, respectively. Assume
that $D(j+1, k)$ has been pruned into a leaf, and if $w_{k+1} < w_i < w_{j+1}$ then compute what
contribution, if any, $(j+1, k)$ makes to the shortest path to $(i, k)$. This is determined by edge minimizing
the tree edge $(i, j) \to \cdots \to (i, k)$ with all nodes in a minimal path from $(0, 0)$ to $(j+1, k)$. After
this edge minimization, inactivate $(j+1, k)$ and its parent $(i, k)$, so $(j+1, k)$ will not be pruned
again.
Notice that the tree leaves depicted in Figure 7.15 can be pruned in any order. They are "pruned
into" the edge joining them by first finding minimal paths to all critical nodes in each leaf and then
edge minimizing the edge joining them all with jumpers that get their sp values from the critical
nodes in the leaves.
A linear list of leaves such as those in Figure 7.15 can be pruned in any order. Therefore, pruning
them simultaneously makes the most sense. But any number of nested bands must be merged
in a way that avoids conflicts. This is easily done using the Euler Tour Technique to number the
nodes appropriately for tree contraction, see [27].
The next lemma shows the correctness of pruning canonical subgraphs of the form $D^{(1,m)}$. This
is necessary for canonical subtrees as in Figure 7.12.
Lemma 10 Tree contraction of a tree of $D^{(1,m)}$ graphs with isolated internal tree nodes correctly
computes a shortest path from $(0, 0)$ to $(1, n)$ in $D_n$.

Proof: The proof is by induction on the tree node depth, where a leaf is of depth 1.
Figure 7.15: A Linear List of Tree Leaves
The case where the depth $d = 1$ is trivial, so consider when the depth is $d = 2$. Without loss
say the canonical tree of depth 2 has nodes $(i, j)$, $(j+1, k)$ and $(i, k)$ (see the subgraph $D(i, k)$ in
Figure 7.14). Now by the Atomicity Corollary (Corollary 5) and since $w_i$, $w_{j+1}$, and $w_{k+1}$ are the
three smallest weights in the list $w_i, \ldots, w_{k+1}$ we know that $(i, j)$ or $(j+1, k)$ is in a shortest path
from $(0, 0)$ to $(i, k)$. Thus, the prune operation above processes this properly.
Suppose the prune operation is correct for all trees $T_d$ of depth $d$, and take the tree $T_{d+1}$ of
depth $d + 1$. Without loss, say $T_{d+1}$ contains two subtrees of depth $d$ and the root of $T_{d+1}$ is
$(i, k)$. In addition, if the two depth $d$ subtrees of $T_{d+1}$ have roots $(i, j)$ and $(j+1, k)$, then, by the
properties of critical nodes, we know that the three smallest weights in $w_i, \ldots, w_{k+1}$ are $w_i$, $w_{j+1}$,
and $w_{k+1}$. Without loss, say $w_{k+1} < w_i < w_{j+1}$, so that by the Atomicity Corollary tree node $(i, j)$
must be in a shortest path from $(0, 0)$ to $(i, k)$.
By the inductive hypothesis, we know that if all the subtrees rooted at $(i, j)$ and $(j+1, k)$
have been pruned, then we have a shortest path from $(0, 0)$ to $(i, j)$ and another to $(j+1, k)$.
Pruning leaf $(j+1, k)$ finds all critical nodes that are in each of the two subtrees rooted at $(i, j)$
and $(j+1, k)$. The pruning operation is just an edge minimization to see which combinations of
critical nodes may form a shortest path from $(0, 0)$ to $(i, k)$.
□
Lemma 10 shows that pruning allows us to build a shortest path from $(0, 0)$ to $(1, n)$ in a tree
made of leaf canonical graphs and isolated critical nodes. We can implement such tree pruning in
$O(\lg n)$ time using $n/\lg n$ processors following standard tree contraction techniques. Specifically,
either the Atomicity Corollary gives the leaf pruning ordering, or the leaves can be pruned arbitrarily,
in which case we would choose an ordering like the one specified by the Euler Tour Technique.
Lemma 11 Given two nested canonical graphs $D^{(i,v)}_{(j,u)}$ and $D^{(k,t)}_{(r,s)}$, where $j \le k < t \le u$, with no
such canonical subgraph between them, we can join them with one merge operation.
A proof of this follows the proof of Lemma 10 in a straightforward manner.
Assume that the shortest paths in all canonical subgraphs are computed first at a cost of $O(\lg^2 n)$
time and $n^3/\lg n$ processors. Independently, compute the shortest paths of all of these canonical
subgraphs. Then the pruning algorithm takes $O(\lg^2 n)$ time using $n^3/\lg n$ processors.
Now considering Lemmas 10 and 11 we can solve the MCOP by performing tree contraction
with the prune operation.
Analyzing this algorithm gives the following theorem.
Theorem 24 We can solve the MCOP in $O(\lg^3 n)$ time using $n^3/\lg n$ processors.
This algorithm uses $O(n)$ nodes in a $D_n$ graph to solve the MCOP. Thus we can solve the
MCOP by using only $O(n)$ elements of a classical dynamic programming table.
Theorem 24 also applies to the optimal convex triangulation problem with the standard triangle
cost metrics [11, 21].
7.2 Acknowledgments
Greg Shannon has been very supportive of this work; in particular he made me aware of reference
[4]. Gregory J. E. Rawlins has been extremely helpful with the presentation, as were Sudhir Rao
and Andrea Rafael. Alok Aggarwal has been an inspiration for this work. Artur Czumaj was also
helpful. The comments of one of the anonymous referees contributed greatly to this paper.
Bibliography
[1] A. Apostolico, M. J. Atallah, L. L. Larmore and S. H. McFaddin: "Efficient Parallel Algorithms for String Editing and Related Problems", SIAM Journal on Computing, Vol. 19, No. 5, 968-988, Oct. 1990.
[2] A. Aggarwal and J. Park: "Notes on Searching Multidimensional Monotone Arrays", Proceedings of the 29th Annual IEEE Symposium on the Foundations of Computer Science, 497-512, 1988.
[3] S. Baase: Computer Algorithms, Second Edition, Addison-Wesley, 1988.
[4] O. Berkman, D. Breslauer, Z. Galil, B. Schieber and U. Vishkin: "Highly Parallelizable Problems", Symposium on the Theory of Computing, 309-319, 1989.
[5] O. Berkman, B. Schieber and U. Vishkin: "Optimal Doubly Logarithmic Parallel Algorithms Based on Finding All Nearest Smaller Values", J. of Algorithms, Vol. 14, 344-370, 1993.
[6] P. G. Bradford: "Efficient Parallel Dynamic Programming", Technical Report # 352, Indiana University, April 1992.
[7] P. G. Bradford: "Efficient Parallel Dynamic Programming", Extended Abstract in the Proceedings of the 30th Allerton Conference on Communication, Control and Computation, University of Illinois at Urbana-Champaign, 185-194, 1992.
[8] P. G. Bradford, G. J. E. Rawlins and G. E. Shannon: "Matrix Chain Ordering in Polylog Time with n/lg n Processors", Technical Report # 360, Indiana University, December 1992.
[9] A. K. Chandra: "Computing Matrix Chain Products in Near Optimal Time", IBM Research Report RC-5625, Oct. 1975.
[10] F. Y. Chin: "An O(n) Algorithm for Determining Near-Optimal Computation Order of Matrix Chain Products", Communications of the ACM, Vol. 21, No. 7, 544-549, July 1978.
[11] T. H. Cormen, C. E. Leiserson and R. L. Rivest: Introduction to Algorithms, McGraw Hill, 1990.
[12] A. Czumaj: "An Optimal Parallel Algorithm for Computing a Near-Optimal Order of Matrix Multiplications", SWAT, Springer Verlag, LNCS # 621, 62-72, 1992.
[13] A. Czumaj: "Parallel algorithm for the matrix chain product and the optimal triangulation problem (Extended Abstract)", STACS 93, Springer Verlag, LNCS # 665, 294-305, 1993.
[14] L. E. Deimel, Jr. and T. A. Lampe: "An Invariance Theorem Concerning Optimal Computation of Matrix Chain Products", North Carolina State Univ. Tech Report # TR79-14.
[15] Z. Galil and K. Park: "Parallel Dynamic Programming", 1991, Submitted.
[16] A. Gibbons and W. Rytter: Efficient Parallel Algorithms, Cambridge University Press, 1988.
[17] D. S. Hirschberg and L. L. Larmore: "The Least Weight Subsequence Problem", SIAM J. on Computing, Vol. 16, No. 4, 628-638, 1987.
[18] T. C. Hu: Combinatorial Algorithms, Addison-Wesley, 1982.
[19] T. C. Hu and M. T. Shing: "Some Theorems about Matrix Multiplication", Proceedings of the 21st Annual IEEE Symposium on the Foundations of Computer Science, 28-35, 1980.
[20] T. C. Hu and M. T. Shing: "An O(n) Algorithm to Find a Near-Optimum Partition of a Convex Polygon", J. of Algorithms, Vol. 2, 122-138, 1981.
[21] T. C. Hu and M. T. Shing: "Computation of Matrix Product Chains. Part I", SIAM J. on Computing, Vol. 11, No. 3, 362-373, 1982.
[22] T. C. Hu and M. T. Shing: "Computation of Matrix Product Chains. Part II", SIAM J. on Computing, Vol. 13, No. 2, 228-251, 1984.
[23] S.-H. S. Huang, H. Liu, V. Viswanathan: "Parallel Dynamic Programming", Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing, 497-500, 1990.
[24] J. JaJa: An Introduction to Parallel Algorithms, Addison-Wesley, 1992.
[25] L. L. Larmore and W. Rytter: "Efficient Sublinear Time Parallel Algorithms for the Recognition of Context-Free Languages", Proceedings of 2nd Scandinavian Workshop on Algorithm Theory 1992, Springer Verlag, LNCS # 577, 1992.
[26] O. H. Ibarra, T.-C. Pong and S. M. Sohn: "Hypercube Algorithms for Some String Comparison Problems", Proceedings of the IEEE International Conference on Parallel Processing, 190-193, 1988.
[27] R. M. Karp and V. Ramachandran: "Parallel Algorithms for Shared Memory Machines", Chapter 17 in Handbook of Theoretical Computer Science, Vol. A, Algorithms and Complexity, J. van Leeuwen, editor, Elsevier, 1990.
[28] P. Ramanan: "A New Lower Bound Technique and its Application: Tight Lower Bounds for a Polygon Triangularization Problem", Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, 281-290, 1991.
[29] P. Ramanan: "An Efficient Parallel Algorithm for Finding an Optimal Order of Computing a Matrix Chain Product", Technical Report, WSUCS-92-2, Wichita State University, June, 1992.
[30] P. Ramanan: "An Efficient Parallel Algorithm for the Matrix Chain Product Problem", Technical Report, WSUCS-93-1, Wichita State University, January, 1993.
[31] W. Rytter: "On Efficient Parallel Computation for Some Dynamic Programming Problems", Theoretical Computer Science, Vol. 59, 297-307, 1988.
[32] L. G. Valiant, S. Skyum, S. Berkowitz and C. Rackoff: "Fast Parallel Computation of Polynomials Using Few Processors", SIAM J. on Computing, Vol. 12, No. 4, 641-644, Nov. 1983.
[33] F. F. Yao: "Speed-Up in Dynamic Programming", SIAM J. on Algebraic and Discrete Methods, Vol. 3, No. 4, 532-540, 1982.
