We study a class of simple algorithms for concurrently computing the connected components of an n-vertex, m-edge graph. Our algorithms are easy to implement in either the COMBINING CRCW PRAM or the MPC computing model. For two related algorithms in this class, we obtain \(\Theta (\lg n)\) step and \(\Theta (m \lg n)\) work bounds. For two others, we obtain \(O(\lg ^2 n)\) step and \(O(m \lg ^2 n)\) work bounds, which are tight for one of them. All our algorithms are simpler than related algorithms in the literature. We also point out some gaps and errors in the analysis of previous algorithms. Our results show that even a basic problem like connected components still has secrets to reveal.
1 Introduction
The problem of finding the connected components of an undirected graph with n vertices and m edges is fundamental in algorithmic graph theory. Any kind of graph search, such as depth-first or breadth-first, solves it sequentially in linear time, which is best possible. The problem becomes more interesting in a concurrent model of computation. In the heyday of the theoretical study of PRAM (parallel random-access machine) algorithms, many more, and more efficient, algorithms for the problem were discovered, culminating in the \(O(\lg n)\)-step, \(O(m)\)-work randomized algorithms of Halperin and Zwick [10, 11], the second of which computes spanning trees of the components. The goal of most of the work in the PRAM model was to obtain the best asymptotic bounds, not the simplest algorithms.
With the growth of the internet, the world-wide web, and cloud computing, finding connected components on huge graphs has become commercially important, and practitioners have put versions of the PRAM algorithms into use. Many of the PRAM algorithms are complicated, and even some of the simpler ones have been further simplified when implemented. Experiments suggest that the resulting algorithms perform well in practice, but some of the published claims about their theoretical performance are incorrect or unjustified.
Given this situation, our goal here is to develop and analyze the simplest possible efficient algorithms for the problem and to rigorously analyze their efficiency. In exchange for algorithmic simplicity, we are willing to allow analytic complexity. Our algorithms are easy to implement in either the COMBINING CRCW PRAM model [1] in which write conflicts are resolved in favor of the smallest value, or in the MPC (massive parallel computing) model [4].
The COMBINING CRCW PRAM model is stronger than the more standard ARBITRARY CRCW PRAM model, in which write conflicts are resolved arbitrarily (one of the writes succeeds, but the algorithm has no control over which one), but is weaker than the MPC model.
We consider a class of simple deterministic algorithms for the problem. We study in detail four algorithms in the class, all of which are simpler than corresponding existing algorithms. For two of them we prove a step bound of \(\Theta (\lg n)\) and a work bound of \(\Theta (m \lg n)\). For the other two, we prove a step bound of \(O(\lg ^2 n)\) and a work bound of \(O(m \lg ^2 n)\). These bounds are tight for one of the two. We also show that one of our \(O(\lg ^2 n)\)-step algorithms takes \(O(d)\) steps where d is the largest diameter of a component, but the others do not.
Our paper is a revised and expanded version of a conference paper [17]. We have deleted an incorrect analysis of one algorithm (algorithm P, Section 4), removed another algorithm for which we had an incorrect analysis, expanded our analysis of a third algorithm (algorithm RA, Section 3), expanded our discussion of related work, and made a number of stylistic changes. The paper contains five sections in addition to this introduction. Section 2 presents our algorithmic framework and a general lower bound in the MPC model. Section 3 presents our algorithms. Section 4 proves upper and lower step bounds. Section 5 discusses related work. Section 6 contains final remarks and open problems.
2 Algorithmic Framework
Given an undirected graph with vertex set \([n] = \lbrace 1, 2, \ldots , n\rbrace\) and m edges, we wish to compute its connected components via a concurrent algorithm. More precisely, for each component we want to label all its vertices with a unique vertex in the component, so that two vertices are in the same component if and only if they have the same label. To state bounds simply, we assume \(n \gt 2\) and \(m \gt 0\). We denote an edge by the unordered pair of its ends.
As our computing model, we use a COMBINING CRCW (concurrent read, concurrent write) PRAM (parallel random-access machine) [1]. Such a machine consists of a large common memory and a number of processes, each with a small private memory. In one step, each process can do one unit of local computation, read one word of common memory, or write into one word of common memory. The processes operate in lockstep. Concurrent reads and writes are allowed, with write conflicts resolved in favor of the smallest value written. We discuss weaker variants of the PRAM model in Section 5. We measure the efficiency of an algorithm primarily by the number of concurrent steps and secondarily by the work, defined to be the number of steps times the number of processes.
Our algorithms are also easy to implement on the MPC (massively parallel computing) model [4]. This is a model of distributed computation based on the BSP (bulk synchronous parallel) model [28]. The MPC model is more powerful than our PRAM model but is a realistic model of cloud computing platforms. An MPC machine consists of a number of processes, each with a private memory. There is no common global memory; the processes communicate with each other by sending messages. Computation proceeds in globally synchronized steps. In one step, each process receives the messages sent to it in the previous step, does some amount of local computation, and then sends a message or messages to one or more other processes.
We specialize the MPC model to the connected components problem as follows. There is one process per edge and one per vertex. A process can only send a message to another process once it knows about that process. Initially a vertex knows only about itself, and an edge knows only about its two ends. Thus, in the first concurrent step only edges can send messages, and only to their ends. A vertex or edge knows about another vertex or edge once it has received a message containing the vertex or edge. This model ignores contention resulting from many messages being sent to the same process, and it allows a process to send many messages in one step.
The MPC model is quite powerful, but even in this model there is a non-constant lower bound on the number of steps needed to compute connected components.
It is easy to solve the problem in \(O(\lg d)\) steps if messages can be arbitrarily large: send each edge end to the other end, and then repeatedly send from each vertex all the vertices it knows to all its incident vertices [21]. If there is a large component, however, this algorithm is not practical, for at least two reasons: it requires huge memory at each vertex, and the number of messages sent in the last step can be quadratic in n. Hence, we restrict the local memory of a vertex or edge to hold only a small constant number of vertices and edges. We also restrict messages to hold only a small constant number of vertices and edges, along with an indication of the message type, such as a label request or a label update. Our goal is a simple algorithm with a step bound of \(O(\lg n)\). (We discuss the harder goal of achieving a step bound of \(O(\lg d)\) in Section 5.)
We consider algorithms that maintain a label for each vertex u, initially u itself. The algorithm updates labels step-by-step until none changes, after which all vertices in a component have the same label, which is one of the vertices in the component. At any given time the current labels define a digraph (directed graph) of out-degree one whose arcs lead from vertices to their labels. We call this the label digraph. If these arcs form no cycles other than loops (arcs of the form \((u, u)\)), then this digraph is a forest of trees rooted at the self-labeled vertices: the parent of u is its label unless this label is u, in which case u is a root. We call this the label forest. Each tree in the forest is a label tree.
All our algorithms maintain the label digraph as a forest; that is, they maintain acyclicity except for self-labels. (We know of only two previous algorithms that do not maintain the label digraph as a forest: see Section 5.) Henceforth, we call the label of a vertex u its parent and denote it by \(u.p\), and we call a label tree just a tree. A non-root vertex is a child of its parent. A vertex is a leaf if it is a child but it has no children of its own. A tree is flat if the root is the parent of every child in the tree, and is a singleton if the root is the only vertex. (Some authors call a flat tree a star.) The depth of a tree vertex x is the number of arcs on the path from x to the root of the tree: a root has depth zero, a child of a root has depth one. The depth of a tree is the maximum of the depths of its vertices.
When changing labels, a simple way to guarantee acyclicity is to replace a parent only by a smaller vertex. We call this minimum labeling. (An equivalent alternative, maximum labeling, is to replace a parent only by a larger vertex). A minimum labeling algorithm stops with each vertex labeled by the smallest vertex in its component. All our algorithms do minimum labeling. They also maintain the invariant that all vertices in a tree are in the same component, as do all algorithms known to us. That is, they never create a tree containing vertices from two or more components. Our specialization of the MPC model to the connected components problem maintains this invariant. At the end of the computation there is one flat tree per component, whose root is the minimum vertex in the component.
3 Algorithms
We consider algorithms that consist of initialization followed by a main loop that updates parents and repeats until no parent changes. Initialization consists of setting the parent of each vertex equal to itself. The following pseudocode does initialization:
initialize:
for each vertex v do \(v.p = v\)
Since initialization is the same for all our algorithms, we omit it in the descriptions below and focus on the main loop. Each iteration of the loop does a connect step, which updates parents using current edges, one or more shortcut steps, each of which updates parents using old parents, and possibly an alter step, which alters the edges of the graph. When discussing and analyzing our algorithms, we use the following terminology. By the graph we mean the input graph, or the graph formed from the input graph by the alter steps done so far, if the algorithm does alter steps. An edge is an edge of the graph; a component is a connected component of the graph. All alter steps preserve components. By the forest we mean the forest whose child-parent arcs are the pairs \((v, v.p)\) with \(v \ne v.p\); a tree is a tree in this forest. The terms parent, child, ancestor, descendant, root, and leaf refer to the forest.
Our algorithms maintain the following connectivity invariant: v and \(v.p\) are in the same component, as are v and w if \(\lbrace v, w\rbrace\) is an edge. If \(\lbrace v, w\rbrace\) is an edge, replacing the parent of v by any ancestor of w preserves the invariant, as does replacing the parent of w by any ancestor of v. This gives us many ways of doing a connect step. We focus on two. The first is direct-connect, which for each edge \(\lbrace v, w\rbrace\), uses the minimum of v and w as a candidate for the new parent of the other. The other is parent-connect, which uses the minimum of the old parents of v and w as a candidate for the new parent of the old parent of the other. We express these methods in pseudocode below. We have written all our pseudocode so that it is correct and produces unambiguous results even if the loops run sequentially rather than concurrently and the vertices and edges are processed in arbitrary order. We say more about this issue below.
direct-connect:
for each edge \(\lbrace v, w\rbrace\) do
    if \(v \gt w\) then
        \(v.p = \min \lbrace v.p, w\rbrace\)
    else \(w.p = \min \lbrace w.p, v\rbrace\)
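As a concrete illustration, here is a minimal sequential Python sketch of direct-connect (the function and variable names are ours, not the paper's). Because each update takes a minimum, the result is independent of the order in which the edges are processed, matching the combining-write semantics in which the smallest written value wins.

```python
def direct_connect(p, edges):
    """One direct-connect step: for each edge {v, w}, the smaller end is a
    candidate for the new parent of the larger end.  Since min-updates
    commute, sequential edge order does not affect the result."""
    for v, w in edges:
        if v > w:
            p[v] = min(p[v], w)
        else:
            p[w] = min(p[w], v)

p = {v: v for v in [1, 2, 3]}          # initialize: v.p = v
direct_connect(p, [(1, 3), (2, 3)])
print(p)  # {1: 1, 2: 2, 3: 1}: vertex 3 gets parent min(1, 2) = 1
```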
The pseudocode for parent-connect begins by computing \(v.o\), the old parent of v, for each vertex v. It then uses these old parents to compute the new parent \(v.p\) of each vertex v. The new parent of a vertex x is the minimum \(w.o \lt x\) such that there is an edge \(\lbrace v, w\rbrace\) with \(v.o = x\), if there is such an edge; if not, the parent of x does not change.
parent-connect:
for each vertex v do \(v.o = v.p\)
for each edge \(\lbrace v, w\rbrace\) do
    if \(v.o \gt w.o\) then
        \(v.o.p = \min \lbrace v.o.p, w.o\rbrace\)
    else \(w.o.p = \min \lbrace w.o.p, v.o\rbrace\)
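A sequential Python sketch of parent-connect follows (names ours). Saving the old parents first makes the result unambiguous even when the edges are processed one at a time in arbitrary order, as the text explains.

```python
def parent_connect(p, edges):
    """One parent-connect step.  Old parents are saved first (v.o = v.p);
    then each edge {v, w} proposes the smaller old parent as the new
    parent of the larger old parent, and the smallest proposal wins."""
    o = dict(p)                                # v.o = v.p for every vertex
    for v, w in edges:
        if o[v] > o[w]:
            p[o[v]] = min(p[o[v]], o[w])       # v.o.p = min{v.o.p, w.o}
        else:
            p[o[w]] = min(p[o[w]], o[v])       # w.o.p = min{w.o.p, v.o}

p = {v: v for v in range(1, 6)}                # initialize: v.p = v
parent_connect(p, [(4, 5), (2, 4), (1, 3)])
print(p)  # {1: 1, 2: 2, 3: 1, 4: 2, 5: 4}
```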
If all reads and comparisons occur before all writes, and all writes occur concurrently, the following simpler pseudocode has the same semantics as that for parent-connect:
for each edge \(\lbrace v, w\rbrace\) do
    if \(v.p \gt w.p\) then
        \(v.p.p = \min \lbrace v.p.p, w.p\rbrace\)
    else \(w.p.p = \min \lbrace w.p.p, v.p\rbrace\)
On the other hand, if this simpler loop is executed sequentially, then in general the results depend on the order in which the edges are processed. Suppose for example there are two edges \(\lbrace x, y\rbrace\) and \(\lbrace v, w\rbrace\) such that \(x.p = v\) and \(v.p = z\). In parent-connect, \(w.p\) is a candidate to be the new parent of z. That is, after the connect, the new parent of z will be no greater than the old parent of w. But in the simpler loop, if \(\lbrace x, y\rbrace\) is processed before \(\lbrace v, w\rbrace\), the processing of \(\lbrace x, y\rbrace\) might change the parent of v to a vertex other than z, thereby making \(w.p\) no longer a candidate for the new parent of z. Even though we are primarily interested in global concurrency, we want our algorithms and bounds to be correct in the more realistic setting in which the edges are processed one group at a time, with the group size determined by the number of available processes.
On a COMBINING CRCW PRAM, we can use the simple loop for parent-connect, since there is global concurrency. Each process for an edge \(\lbrace v, w\rbrace\) reads \(v.p\) and \(w.p\). If \(v.p \gt w.p\), it reads \(v.p.p\), tests whether \(w.p \lt v.p.p\), and, if so, writes \(w.p\) to \(v.p.p\); if \(v.p \le w.p\), it reads \(w.p.p\), tests whether \(v.p \lt w.p.p\), and, if so, writes \(v.p\) to \(w.p.p\). All the reads occur before all the writes, and all the writes occur concurrently, with a write of the smallest value succeeding if there is a conflict.
In the MPC model, each vertex stores its parent. To execute parent-connect, each process for an edge \(\lbrace v, w\rbrace\) requests \(v.p\) and \(w.p\). If \(v.p \gt w.p\), it sends \(w.p\) to \(v.p\); otherwise, it sends \(v.p\) to \(w.p\). Each vertex then updates its parent to be the minimum of its old value and the smallest of the received values. All our other loops can be similarly implemented on a COMBINING CRCW PRAM or in the MPC model.
A connect step of either kind can move a subtree from one tree to another. We can prevent this by restricting connection so that it only updates parents of roots. The following pseudocode implements such restrictions of direct-connect and parent-connect, which we call direct-root-connect and parent-root-connect, respectively:
direct-root-connect:
for each vertex v do \(v.o = v.p\)
for each edge \(\lbrace v, w\rbrace\) do
    if \(v \gt w\) and \(v = v.o\) then
        \(v.p = \min \lbrace v.p, w\rbrace\)
    else if \(w = w.o\) then
        \(w.p = \min \lbrace w.p, v\rbrace\)
parent-root-connect:
for each vertex v do \(v.o = v.p\)
for each edge \(\lbrace v, w\rbrace\) do
    if \(v.o \gt w.o\) and \(v.o = v.o.o\) then
        \(v.o.p = \min \lbrace v.o.p, w.o\rbrace\)
    else if \(w.o = w.o.o\) then
        \(w.o.p = \min \lbrace w.o.p, v.o\rbrace\)
In direct-root-connect (as in parent-connect) we need to save the old parents to get a correct sequential implementation, so that the root test is correct even if the parent has been changed by processing another edge during the same iteration of the loop over the edges. If we truly have global concurrency, simpler pseudocode suffices, as for parent-connect.
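A sequential Python sketch of direct-root-connect (names ours) makes the role of the saved old parents concrete: the root test \(v = v.o\) refers to the state at the start of the step, so it remains correct even if another edge has already changed a parent during the same loop.

```python
def direct_root_connect(p, edges):
    """Direct-root-connect: like direct-connect, but only the parent of a
    root may change.  The root test uses the saved old parents o."""
    o = dict(p)                        # v.o = v.p for every vertex
    for v, w in edges:
        if v > w and o[v] == v:        # v was a root at the start of the step
            p[v] = min(p[v], w)
        elif o[w] == w:                # w was a root at the start of the step
            p[w] = min(p[w], v)

p = {1: 1, 2: 2, 3: 3}                 # all singletons
direct_root_connect(p, [(1, 3), (2, 3)])
print(p)  # {1: 1, 2: 2, 3: 1}: only the root 3 changes parent
```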
Shortcutting is the key to obtaining a logarithmic step bound. Shortcutting replaces the parent of each vertex by its grandparent. The following pseudocode implements shortcutting:
shortcut:
for each vertex v do
    \(v.o = v.p\)
for each vertex v do
    \(v.p = v.o.o\)
In the case of shortcut, the simpler loop “for each vertex v do \(v.p = v.p.p\)” produces correct results and preserves our time bounds, even though sequential execution of the code produces different results depending on the order in which the vertices are processed. All we need is that the new parent of a vertex is no greater than its old grandparent. Thus, the simpler loop might well be a better choice in practice.
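A minimal Python sketch of shortcut (names ours), using the two-phase version with saved old parents so the result is order-independent:

```python
def shortcut(p):
    """One shortcut step: replace each vertex's parent by its grandparent.
    Old parents are saved first so the result does not depend on the
    order in which the vertices are processed."""
    o = dict(p)                        # v.o = v.p for every vertex
    for v in p:
        p[v] = o[o[v]]                 # v.p = v.o.o

p = {1: 1, 2: 1, 3: 2, 4: 3}           # a path of depth 3 rooted at 1
shortcut(p)
print(p)  # {1: 1, 2: 1, 3: 1, 4: 2}: every depth roughly halves
```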
Edge alteration deletes each edge \(\lbrace v, w\rbrace\) and replaces it by \(\lbrace v.p, w.p\rbrace\) if \(v.p \ne w.p\). The following pseudocode implements alteration:
alter:
for each edge \(\lbrace v, w\rbrace\) do
    if \(v.p = w.p\) then
        delete \(\lbrace v, w\rbrace\)
    else replace \(\lbrace v, w\rbrace\) by \(\lbrace v.p, w.p\rbrace\)
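A Python sketch of alter (names ours), representing the edge set as a list of pairs:

```python
def alter(p, edges):
    """One alter step: delete each edge whose ends share a parent, and
    replace every other edge {v, w} by {v.p, w.p}."""
    return [(p[v], p[w]) for v, w in edges if p[v] != p[w]]

p = {1: 1, 2: 1, 3: 3, 4: 3}
print(alter(p, [(1, 2), (2, 3), (2, 4)]))  # [(1, 3), (1, 3)]
```

Note that alteration can produce parallel edges, as in the example; this is harmless for the algorithms, which only take minima over edge ends.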
We shall study in detail four algorithms, whose main loops are given below:
Algorithm S: repeat {parent-connect; repeat shortcut until no \(v.p\) changes} until no \(v.p\) changes
Algorithm R: repeat {parent-root-connect; shortcut} until no \(v.p\) changes
Algorithm RA: repeat {direct-root-connect; shortcut; alter} until no \(v.p\) changes
Algorithm A: repeat {direct-connect; shortcut; alter} until no \(v.p\) changes
In algorithm S, the inner loop “repeat shortcut until no \(v.p\) changes” terminates after a shortcut that changes no parents. Similarly, in each of the algorithms the outer repeat loop terminates after an iteration that changes no parents.
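Putting the pieces together, the following Python sketch simulates algorithm A sequentially (the function name is ours). Concurrency and combining writes are emulated by min-updates, so the final labels, though not the step count, match what the concurrent algorithm computes.

```python
def algorithm_a(n, edges):
    """Sequential simulation of algorithm A:
    repeat {direct-connect; shortcut; alter} until no v.p changes."""
    p = {v: v for v in range(1, n + 1)}            # initialize: v.p = v
    while True:
        old = dict(p)
        for v, w in edges:                         # direct-connect
            big, small = (v, w) if v > w else (w, v)
            p[big] = min(p[big], small)
        o = dict(p)                                # shortcut
        for v in p:
            p[v] = o[o[v]]
        edges = [(p[v], p[w])                      # alter
                 for v, w in edges if p[v] != p[w]]
        if p == old:                               # no v.p changed: done
            return p

labels = algorithm_a(6, [(1, 3), (2, 3), (5, 6)])
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 5, 6: 5}
```

Each vertex ends up labeled by the smallest vertex in its component: 1 for the component {1, 2, 3}, 5 for {5, 6}, and 4 for the singleton {4}.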
Many algorithms fall within our framework. We focus on these four because they are simple and natural and we can prove good bounds for them. Algorithm S simplifies the algorithm of Hirschberg, Chandra, and Sarwate [12]. Theirs was the first algorithm to run in a polylogarithmic number of steps, specifically \(O(\lg ^2 n)\). Algorithm R simplifies the algorithm of Shiloach and Vishkin, which runs in \(O(\lg n)\) steps. We discuss in detail the relationship of our algorithms to theirs, as well as other related work, in Section 5.
Some observations guided our choice of algorithms to study. There is a tension between connect steps and shortcut steps: the former combine trees but generally produce deeper trees; the latter make trees shallower. Algorithm S completely flattens all the existing trees between each connect step. This guarantees monotonicity: the parent-connect steps in S change the parents only of roots. As we shall see, completely flattening the trees also guarantees that each tree is connected to another tree in at most two iterations of the outer loop. Algorithm S has an \(O(\lg ^2 n)\) step bound, one log factor coming from the at most \(\lg n\) iterations of the inner loop needed to flatten all trees, the other coming from the at most \(2 \lg n\) iterations of the outer loop needed to reduce the number of trees in each component to 1. Unfortunately, the \(O(\lg ^2 n)\) bound is tight to within a constant factor.
To obtain a step bound of \(O(\lg n)\), we must reduce the number of shortcuts per round to \(O(1)\). Algorithm R does this. It uses parent-root-connect, making it monotonic. Algorithm RA is a related algorithm that does edge alteration, allowing it to use direct-root-connect instead of the more complicated parent-root-connect. Both R and RA have an \(O(\lg n)\) step bound.
It is natural to wonder whether monotonicity is needed to get a polylogarithmic step bound. Algorithm A answers this question negatively. It is algorithm RA with direct-connect replacing direct-root-connect. We shall prove an \(O(\lg ^2 n)\) step bound for algorithm A. We do not know if this bound is tight.
Each of our four algorithms is distinct: for each pair of algorithms, there is a graph on which the two algorithms make different parent changes, as one can verify case-by-case. Use of direct-connect or direct-root-connect requires edge alteration to obtain a correct algorithm. (A counterexample is the graph whose vertices are \(1, 2, 3\) and whose edges are \(\lbrace 1, 3\rbrace , \lbrace 2, 3\rbrace\): the parent of vertex 2 remains 2 but should become 1.) Although different, algorithms R and RA behave similarly, and we use the same techniques to obtain bounds for both of them. Algorithm S is equivalent to the algorithm formed by replacing parent-connect by parent-root-connect, and to the algorithm formed by replacing parent-connect by direct-connect and adding alter to the end of the main loop: all three algorithms make the same parent changes.
We conclude this section with a proof that our algorithms are correct. We begin by establishing some properties of edge alteration. We call an iteration of the main loop (the outer loop in algorithm S) a round. We call a vertex bare if it is not an edge end and clad if it is. Only alter steps can change vertices from clad to bare or vice-versa. A clad vertex becomes bare if it loses all its incident edges during an alter step. A bare leaf cannot become clad, nor can a bare vertex all of whose descendants are bare, because no connect step can give it a new (clad) descendant.
To prove correctness, we need the following key result.
4 Efficiency
On a graph that consists of a path of n vertices, all our algorithms take \(\Omega (\lg n)\) steps, since this graph has one component of diameter \(n-1\), and the general lower bound of Theorem 2.1 applies. We prove worst-case step bounds of \(O(\min \lbrace d, \lg n\rbrace \lg n)\) for S, \(O(\min \lbrace d, \lg ^2 n\rbrace)\) for A, and \(O(\lg n)\) for R and RA. We also show that algorithm S can take \(\Omega (\lg ^2 n)\) steps, and algorithms R and RA can take \(\Omega (\lg n)\) steps even on graphs with constant diameter d.
4.1 Analysis of S
We show by example that S can take \(\Omega (\lg ^2 n)\) steps.
If there is a tree of depth \(2^k\) just before the inner loop in a round, this loop will take at least k steps. If every round produces a new tree of depth \(2^k\), and the algorithm takes k rounds, the total number of steps will be \(\Omega (k^2)\). Our bad example is based on this observation. It consists of k components such that running algorithm S for i rounds on the i-th component produces a tree that contains a path of length \(2^k\).
At the beginning of a round of the algorithm, we define the implied graph to be the graph whose vertices are the roots and whose edges are the pairs \(\lbrace v.p, w.p\rbrace\) such that \(\lbrace v, w\rbrace\) is a graph edge and \(v.p \ne w.p\). Restricted to the roots, the subsequent behavior of algorithm S on the original graph is the same as its behavior on the implied graph. We describe a way to produce a given implied graph in a given number of rounds.
To produce a given implied graph G with vertex set \([n]\) in one round, we start with the generator \(g(G)\) of G, defined to be the graph with vertex set \([2n]\), an edge \(\lbrace v, v+n\rbrace\) for each \(v \in [n]\), and an edge \(\lbrace v+n, w+n\rbrace\) for each edge \(\lbrace v, w\rbrace\) in G. A round of algorithm S on \(g(G)\) does the following: the connect step makes \(v+n\) a child of v for \(v \in [n]\), and the shortcuts do nothing. The resulting implied graph has vertex set \([n]\) and an edge \(\lbrace v, w\rbrace\) for each edge \(\lbrace v, w\rbrace\) in G; that is, it is G.
To produce a given implied graph G in i rounds, we start with \(g^i(G)\). An induction on the number of rounds shows that after i rounds of algorithm S, G is the implied graph.
Let k be a positive integer, and let P be the path of vertices \(1, 2, \ldots , 2^k + 1\) and edges \(\lbrace i, i+1\rbrace\) for \(i \in [2^k]\). Our bad example is the disjoint union of \(P, g(P), g^2(P), \ldots , g^{k-1}(P)\), with the vertices renumbered so that each component has distinct vertices and the order within each component is preserved. Algorithm S takes \(\Omega (k^2)\) steps on this graph. The number of vertices is \(n = (2^k + 1) (2^k - 1) = 2^{2k} - 1\). The number of edges is \(2^{2k} - k - 1\), since the graph is a set of k trees. Thus, the number of steps is \(\Omega (\lg ^2 n)\).
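The vertex and edge counts of the bad example can be checked mechanically. The following Python sketch (names ours) builds the components \(P, g(P), \ldots, g^{k-1}(P)\) without renumbering, which does not affect the counts:

```python
def generator(n, edges):
    """g(G): vertex set [2n]; an edge {v, v+n} for each v in [n], plus an
    edge {v+n, w+n} for each edge {v, w} of G."""
    return 2 * n, ([(v, v + n) for v in range(1, n + 1)]
                   + [(v + n, w + n) for v, w in edges])

k = 4
n = 2**k + 1                                  # the path P on 2^k + 1 vertices
edges = [(i, i + 1) for i in range(1, 2**k + 1)]
total_v = total_e = 0
for i in range(k):                            # components P, g(P), ..., g^{k-1}(P)
    total_v += n
    total_e += len(edges)
    n, edges = generator(n, edges)
assert total_v == 2**(2 * k) - 1              # (2^k + 1)(2^k - 1) vertices
assert total_e == 2**(2 * k) - k - 1          # a forest of k trees
```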
4.2 Analysis of A
In analyzing A, R, and RA, we assume that the graph is connected: this is without loss of generality since A, R, and RA operate independently and concurrently on each component, and each round does \(O(1)\) steps.
A simple example shows that Theorem 4.2 is false for S, R, and RA. Consider the graph whose edges are \(\lbrace i, i+1\rbrace\) for \(i \in [n-1]\) and \(\lbrace i, n\rbrace\) for \(i \in [n-1]\). After the first connect step, there is one tree: 1 is the parent of 2 and n, and i is the parent of \(i+1\) for \(i \in [2, n-2]\). Subsequent connect steps do nothing; the tree only becomes flat after \(\Omega (\lg n)\) shortcuts.
To obtain an \(O(\lg ^2 n)\) step bound for algorithm A, we show that \(O(\lg n)\) rounds reduce the number of non-leaf vertices by at least a factor of 2, from which an overall \(O(\lg ^2 n)\) step bound follows.
It is convenient to shift our attention from rounds to passes. A pass is the interval from the beginning of one shortcut to the beginning of the next. Pass 1 begins with the shortcut in round 1 and ends with the connect in round 2. We need one additional definition. A vertex is deep if it is a non-root with at least one child and all its children are leaves.
We need one more idea, which we borrow from the analysis of path halving, a method used in disjoint set union algorithms that shortcuts a single path [27]. For any vertex v, we define the level of v to be \(v.l = \lfloor \lg (v - v.p) \rfloor\) unless \(v.p = v\), in which case \(v.l = 0\). The level of a vertex is at least 0, less than \(\lg n\), and non-decreasing. The following lemma quantifies the effect of a shortcut on a sufficiently long path in a tree.
We combine Lemmas 4.3, 4.4, and 4.5 to obtain the desired result.
We do not know whether the bound in Theorem 4.7 is tight. We conjecture that it is not, and that algorithm A takes \(O(\lg n)\) steps. We are able to prove an \(O(\lg n)\) bound for the monotone algorithms R and RA, which we do in the next section.
An algorithm similar to algorithm A is algorithm P, which replaces direct-connect in A by parent-connect and deletes alter. The following pseudocode implements the main loop of this algorithm:
Algorithm P: repeat {parent-connect; shortcut} until no \(v.p\) changes
We conjecture that algorithm P, too, has an \(O(\lg n)\) step bound, but we are unable to prove even an \(O(\lg ^2 n)\) bound, the problem being that this algorithm can leave flat trees unchanged for a non-constant number of rounds.
4.3 Analysis of R and RA
Algorithms R and RA can also leave flat trees unchanged for a non-constant number of rounds, so the analysis of Section 4.2 also fails for these algorithms. But we can obtain an even better bound than that of Theorem 4.7 by using a different analytical technique, that of Awerbuch and Shiloach [3], extended to cover a constant number of rounds rather than just one. This analysis requires the algorithm to be monotonic.
We call a tree passive in a round if it exists both at the beginning and at the end of the round; that is, the round does not change it. A passive tree is flat, but a flat tree need not be passive. We call a tree active in a round if it exists at the end of the round but not at the beginning. An active tree contains at least two vertices and has depth at least one.
We say a connect links trees T and \(T^{\prime }\) if it makes the root of one of them a child of a vertex in the other. If the connect makes the root of T a child of a vertex in \(T^{\prime }\), we say the connect links T to \(T^{\prime }\).
If T exists at the end of round k, its constituent trees at the end of round \(j \le k\) are the trees existing at the end of round j whose vertices are in T. Since algorithms R and RA are monotone, these trees partition the vertices of T.
Since algorithm RA is slightly simpler to analyze than R, we first analyze RA, and then discuss the changes needed to make the analysis apply to R. We measure progress using the potential function of Awerbuch and Shiloach, modified so that it is non-increasing and passive trees have zero potential. In RA, we define the individual potential of a tree T at the end of round k to be zero if T is passive in round k, or two plus the maximum of zero and the maximum depth of an edge end in T if T is active in round k. If T exists at the end of round k and \(j \le k\), we define the potential \(\Phi _j(T)\) of T at the end of round j to be the sum of the individual potentials of its constituent trees at the end of round j. We define the total potential at the end of round k to be the sum of the potentials of the trees existing at the end of round k.
We shall prove that the total potential decreases by a constant factor in a constant number of rounds. This will give us an \(O(\lg n)\) bound on the number of rounds. It suffices to consider each active tree individually.
Lemma 4.10 gives a potential drop for any active tree T such that \(\Phi _{k-1}(T) \ge 4\). To obtain a potential drop if \(\Phi _{k-1}(T) \lt 4\), we need to consider two rounds if \(\Phi _{k-1}(T) = 3\) and three rounds if \(\Phi _{k-1}(T) = 2\).
Having covered all the cases, we are ready to put them together. Let \(a = (3/2)^{1/3} = 1.1447+\), and let \(|T|\) be the number of vertices in tree T.
We can use the same approach to prove an \(O(\lg n)\) step bound for algorithm R, but the details are more complicated. Algorithm RA has the advantage over R that the alter step in effect does extra flattening. If T is active in round k and \(T_1\) is the only active constituent tree of T in round \(k-1\), it is possible for the depth of \(T_1\) to be one and that of T to be two. This is not a problem in RA, because the alter decreases the depth of each edge end by one. But in R we need to give extra potential to trees of depth one to make the potential function non-increasing.
In R we define the individual potential of a tree T at the end of round k to be zero if T is passive in round k, or the depth of T if T is active in round k and not flat, or two if T is active in round k and flat. As in RA, if T exists at the end of round k and \(j \le k\), we define the potential \(\Phi _j(T)\) of T at the end of round j to be the sum of the potentials of its constituent trees at the end of round j, and we define the total potential at the end of round k to be the sum of the potentials of the trees existing at the end of round k. We prove analogues of Lemmas 4.10–4.13 and Theorem 4.14 for R.
Lemma 4.15 gives a potential drop for any active tree T such that \(\Phi _{k-1}(T) \ge 4\). To obtain a potential drop if \(\Phi _{k-1}(T) \lt 4\), we need to consider two rounds if \(\Phi _{k-1}(T) = 3\) and five rounds if \(\Phi _{k-1}(T) = 2\).
Let \(b = (3/2)^{1/5} = 1.0844+\).
For the variants of R and RA that do two shortcuts in each round instead of just one, we can simplify the analysis and improve the constants. For RA, we let the potential of an active tree be the maximum depth of an edge end (or zero if there are no edge ends) plus one. The potential of an active tree drops from the previous round by at least a factor of two unless it has only one active constituent tree in the previous round and that tree has potential one. The proof of Lemma 4.12 gives a potential reduction of at least a factor of two in at most three rounds in this case. Lemma 4.13 holds with a replaced by \(2^{1/3} = 1.2599+\). For R, we let the potential of an active tree be its depth. The potential of an active tree drops from the previous round by at least a factor of two unless it has only one active constituent tree in the previous round and that tree is flat. The proof of Lemma 4.17 gives a potential reduction of at least a factor of two in at most three rounds, and Lemma 4.18 holds with b replaced by \(2^{1/3}\), the same constant as for the two-shortcut variant of RA.
This analysis suggests that doing two shortcuts per round rather than one might improve the practical performance of R and RA, especially since a shortcut needs only n processes, but a connect step needs m. Exactly how many shortcuts to do per round is a question for experiments to resolve. Our analysis of algorithm S suggests that doing a non-constant number of shortcuts per round is likely to degrade performance. We have analyzed the one-shortcut-per-round algorithms R and RA, even though their analysis is more complicated than the corresponding two-shortcut-per-round algorithms, because this analysis applies to the corresponding algorithms with any constant number of shortcuts per round, and our goal is to determine the simplest algorithms with an \(O(\lg n)\) step bound.
5 Related Work
In this section we review previous work related to ours. We have presented our results first, since they provide insights into the related work. As far as we can tell, all our algorithms are novel and simpler than previous algorithms, although they are based on some of the previous algorithms.
Two different communities have worked on concurrent connected components algorithms, in two overlapping eras. First, theoretical computer scientists developed provably efficient algorithms for various versions of the PRAM model. This work began in the late 1970’s and reached a natural conclusion in the work of Halperin and Zwick [10, 11], who gave \(O(\lg n)\)-step, \(O(m)\)-work randomized algorithms for the EREW (exclusive read, exclusive write) PRAM. Their second algorithm finds spanning trees of the components. The EREW PRAM is the weakest variant of the PRAM model, and finding connected components in this model requires \(\Omega (\lg n)\) steps [7]. To solve the problem sequentially takes \(O(m)\) time, so the Halperin-Zwick algorithms minimize both the number of steps and the total work (number of steps times the number of processes). Whether there is a deterministic EREW PRAM algorithm with the same efficiency remains an open problem.
Halperin and Zwick’s paper [11] contains a table listing results preceding theirs, and we refer the reader to their paper for these results. Our interest is in simple algorithms for a more powerful computational model, so we content ourselves here with discussing simple labeling algorithms related to ours. (The Halperin-Zwick algorithms and many of the preceding ones are not simple.) First, we review variants of the PRAM model and how they relate to our algorithmic framework.
The three main variants of the PRAM model, in increasing order of strength, are EREW, CREW (concurrent read, exclusive write), and CRCW (concurrent read, concurrent write). The CRCW PRAM has four standard versions that differ in how they handle write conflicts: (i) COMMON: all writes to the same location at the same time must be of the same value; (ii) ARBITRARY: among concurrent writes to the same location, an arbitrary one succeeds; (iii) PRIORITY: among concurrent writes to the same location, the one done by the highest-priority process succeeds; (iv) COMBINING: values written concurrently to a given location are combined using some symmetric function. As discussed in Section 2, our algorithms can be implemented on a COMBINING CRCW PRAM, with minimization as the combining function.
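As a concrete illustration (the helper name and sequential simulation are ours, not from the model definitions), a COMBINING write step with min can be simulated by buffering all writes issued in one step and committing, per location, the minimum of the values written:

```python
def combining_write_step(memory, writes):
    """Apply one step of concurrent writes to shared memory, resolving
    write conflicts with the COMBINING rule, using minimization as the
    combining function (as our algorithms assume).
    `writes` is a list of (location, value) pairs issued in one step."""
    pending = {}
    for loc, val in writes:
        # All values written concurrently to the same location are combined.
        pending[loc] = min(pending.get(loc, val), val)
    memory.update(pending)
    return memory

# Three processes write concurrently to the same location; the minimum wins.
mem = combining_write_step({"x.p": 9}, [("x.p", 5), ("x.p", 3), ("x.p", 7)])
```

Under this rule the outcome of a step is independent of the order in which the concurrent writes are issued, which is what makes min-resolution of parent updates deterministic.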
An early and important theoretical result is the \(O(\lg ^2 n)\)-step CREW PRAM algorithm of Hirschberg, Chandra, and Sarwate [12]. Algorithm S is a simplification of their algorithm. They represent the graph by an adjacency matrix, but it is easy to translate their basic algorithm into our framework. Their algorithm alternates connect steps with repeated shortcuts. To do connection, they use a variant of parent-connect that we call strong-parent-connect. It concurrently sets \(x.p\) for each vertex x equal to the minimum \(w.p \ne x\) such that there is an edge \(\lbrace v, w\rbrace\) with \(v.p = x\); if there is no such edge, \(x.p\) does not change. The following pseudocode implements this method in our framework:
strong-parent-connect:
    for each vertex v do
        \(v.n = \infty\)
    for each edge \(\lbrace v, w\rbrace\) do
        if \(v.p \ne w.p\) then
            \(v.p.n = \min \lbrace v.p.n, w.p\rbrace\)
            \(w.p.n = \min \lbrace w.p.n, v.p\rbrace\)
    for each vertex v do
        if \(v.n \ne \infty\) then
            \(v.p = v.n\)
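A sequential Python transliteration of this step (a sketch; `p` maps each vertex to its parent, and min-combining of concurrent writes is simulated by taking minima in turn) is:

```python
INF = float("inf")

def strong_parent_connect(p, edges):
    """One strong-parent-connect step: each current parent x with an
    incident inter-tree edge gets as new parent the minimum w.p != x
    over edges {v, w} with v.p = x."""
    n = {v: INF for v in p}
    for v, w in edges:
        if p[v] != p[w]:
            n[p[v]] = min(n[p[v]], p[w])
            n[p[w]] = min(n[p[w]], p[v])
    for v in p:
        if n[v] != INF:
            p[v] = n[v]
    return p

# On a path 1-2-3 with every vertex its own parent, the result is
# {1: 2, 2: 1, 3: 2}: vertices 1 and 2 become each other's parents,
# a parent cycle of length two.
strong_parent_connect({1: 1, 2: 2, 3: 3}, [(1, 2), (2, 3)])
```

The small example already exhibits the length-two parent cycles that the cleanup step must remove.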
This version of connection can make a larger vertex the parent of a smaller one. Thus, their algorithm does not do minimum labeling. Furthermore it can create parent cycles of length two, which Hirschberg et al. eliminate in a cleanup step at the end of each round. To do the cleanup it suffices to concurrently set \(v.p = v\) for each vertex such that \(v.p \gt v\) and \(v.p.p = v\). Their algorithm is one of two we have found in the literature that can create parent cycles.
Although they do not say this, we think the reason Hirschberg et al. used strong-parent-connect was to guarantee that each tree links with another tree in each round. This gives them an \(O(\lg n)\) bound on the number of rounds and an \(O(\lg ^2 n)\) bound on the number of steps, since there are \(O(\lg n)\) shortcuts per round. Our simpler algorithm S uses parent-connect in place of strong-parent-connect, making it a minimum labeling algorithm and eliminating the cleanup step. Although parent-connect does not guarantee that each tree links with another tree every round, it does guarantee such linking every two rounds, giving us the same \(O(\lg ^2 n)\) step bound as Hirschberg et al. See the proof of Theorem 4.1.
The first \(O(\lg n)\)-step PRAM algorithm was that of Shiloach and Vishkin [23]. It runs on an ARBITRARY CRCW PRAM, as do the other algorithms we discuss, except as noted. The following is a version of their algorithm SV in our framework:
Algorithm SV:
    repeat
        {shortcut; arb-parent-root-connect; stagnant-parent-root-connect; shortcut}
    until no \(v.p\) changes
where arb-parent-root-connect is parent-root-connect with write conflicts resolved arbitrarily instead of by minimum, and stagnant-parent-root-connect is the following variant:
stagnant-parent-root-connect:
    for each vertex v do
        \(v.o = v.p\)
    for each edge \(\lbrace v, w\rbrace\) do
        if \(v.o \ne w.o\) then
            if \(v.o\) is a stagnant root then
                \(v.o.p = w.o\)
            else if \(w.o\) is a stagnant root then
                \(w.o.p = v.o\)
Whereas parent-root-connect updates the parent of each root x to be the minimum y such that there is an edge \(\lbrace v, w\rbrace\) with \(x = v.o\) and \(y = w.o \lt v.o\) if there is such an edge, arb-parent-root-connect replaces the parent of each such root by an arbitrary such y. Arbitrary resolution of write conflicts suffices to implement the latter method, but not the former.
Shiloach and Vishkin define a root to be stagnant if its tree is not changed by the first two steps of the main loop (the first shortcut and the arb-parent-root-connect). Their algorithm has additional steps to keep track of stagnant roots. Method stagnant-parent-root-connect updates the parent of each stagnant root x to be an arbitrary y such that there is an edge \(\lbrace v, w\rbrace\) with \(x = v.o\) and \(y = w.o \ne v.o\) if there is such an edge. The definition of “stagnant” implies that no two stagnant trees are connected by an edge.
Algorithm SV does not do minimum labeling, since stagnant-parent-root-connect can make a larger vertex the parent of a smaller one. Nevertheless, the algorithm creates no cycles, although the proof of this is not straightforward, nor is the efficiency analysis.
Algorithm R is algorithm SV with the third and fourth steps of the main loop deleted and the second step modified to resolve concurrent writes by minimum value instead of arbitrarily. Shiloach and Vishkin state that one shortcut can be deleted from their algorithm without affecting its asymptotic efficiency. They included the third step for two reasons: (i) their analysis examines one round at a time, requiring that every tree change in every round, and (ii) if the third step is deleted, the algorithm can take \(\Omega (n)\) steps on a graph that is a tree with edges \(\lbrace i, n\rbrace\) for \(i \in [n-1]\). This example strongly suggests that to obtain a simpler algorithm one needs to use a more powerful model of computation, as we have done by using minimization to resolve write conflicts.
Awerbuch and Shiloach presented a slightly simpler \(O(\lg n)\)-step algorithm and gave a simpler efficiency analysis [3]. Our analysis of algorithms R and RA in Section 4.3 uses a variant of their potential function. Their algorithm is algorithm SV with the first shortcut deleted and the two connect steps modified to update only parents of roots of flat trees. The computation needed to keep track of flat tree roots is simpler than that needed in algorithm SV to keep track of stagnant roots.
An even simpler but randomized \(O(\lg n)\)-step algorithm was proposed by Reif [22]:
Algorithm Reif:
repeat
{for each vertex flip a coin; random-parent-connect; shortcut}
until no \(v.p\) changes
where random-parent-connect is:
random-parent-connect:
    for each vertex v do
        \(v.o = v.p\)
    for each edge \(\lbrace v, w\rbrace\) do
        if \(v.o\) flipped heads and \(w.o\) flipped tails then
            \(v.o.p = w.o\)
        else if \(w.o\) flipped heads and \(v.o\) flipped tails then
            \(w.o.p = v.o\)
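With the coin flips made explicit as an argument (our choice, to keep the sketch deterministic), one connect step can be simulated sequentially as:

```python
def random_parent_connect(p, coins, edges):
    """One random-parent-connect step: across each edge, an old parent
    that flipped heads ("H") links to an adjacent old parent that
    flipped tails ("T")."""
    o = dict(p)  # snapshot of old parents, as in the pseudocode
    for v, w in edges:
        if coins[o[v]] == "H" and coins[o[w]] == "T":
            # Write conflicts are resolved arbitrarily; here, last write wins.
            p[o[v]] = o[w]
        elif coins[o[w]] == "H" and coins[o[v]] == "T":
            p[o[w]] = o[v]

# Vertex 1 flips heads, vertex 2 flips tails, so root 1 links under 2.
p = {1: 1, 2: 2}
random_parent_connect(p, {1: "H", 2: "T"}, [(1, 2)])
```

Since only roots that flipped heads link to roots that flipped tails, no round can create a cycle, which is why the trees stay flat after the following shortcut.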
Reif’s algorithm keeps the trees flat, making the algorithm monotone, although it does not do minimum labeling. Although it is randomized, Reif’s algorithm is simpler than those of Shiloach and Vishkin, but R and RA are even simpler and are deterministic.
We know of one algorithm other than that of Hirschberg et al. [12] that does not maintain acyclicity. This is the algorithm of Johnson and Metaxas [16]. Their algorithm runs in \(O((\lg n)^{3/2})\) steps on a CREW PRAM. It uses a form of shortcutting to eliminate any cycles created by connection steps.
Algorithms that run on a more restricted form of PRAM, or that use fewer processes (and thereby do less work), use various kinds of edge alteration, edge addition, and edge deletion, along with techniques to resolve read and write conflicts. Such algorithms are much more complicated than those we have considered. Again, we refer the reader to [10, 11] for results and references.
The second era of concurrent connected components algorithms was that of the experimentalists. It began in the 1990’s and continues to the present. Experimentation has expanded greatly with the growing importance of huge graphs representing the internet, the world-wide web, friendship connections, and other symmetric relations, as well as the development of cloud computing frameworks. These trends make concurrent algorithms for connected components both practical and useful. The general approach of the experimentalists has been to take one or more existing algorithms, possibly simplify or modify them, implement the resulting suite of algorithms on one or more computing platforms, and report the results of experiments done on some collection of graphs. Examples of such studies include [8, 9, 13, 14, 20, 26, 29, 30].
Some of these papers make claims about the theoretical efficiency of algorithms they propose, but several of these claims are incorrect or unjustified. We give some examples. The first is a paper by Greiner [9] in which he claims an \(O(\lg ^2 n)\) step bound for his “hybrid” algorithm.
Greiner’s description of this algorithm is incomplete. The algorithm is a modification of the algorithm of Hirschberg et al. [12]. Each round does a form of direct connect followed by repeated shortcuts followed by an alteration. Since repeated shortcuts guarantee that all trees are flat at the beginning of each round, this is equivalent to using a version of parent-connect and not doing alteration. The main novelty in his algorithm is that alternate rounds use maximization instead of minimization in the connect step. He does not specify exactly how the connect step works. There are at least two possibilities. One is to use direct-connect, but in alternate rounds replace min by max. The resulting algorithm is a min-max version of algorithm S.
The second possibility is to use the following strong version of direct-connect, but in alternate rounds replace min by max and \(\infty\) by \(-\infty\):
strong-direct-connect:
    for each vertex v do
        \(v.n = \infty\)
    for each edge \(\lbrace v, w\rbrace\) do
        \(v.n = \min \lbrace v.n, w\rbrace\)
        \(w.n = \min \lbrace w.n, v\rbrace\)
    for each vertex v do
        if \(v.n \ne \infty\) then
            \(v.p = v.n\)
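In Python, a sketch of this step in the same style as before (`p` maps vertices to parents; min-combining of concurrent writes is simulated by taking minima in turn):

```python
INF = float("inf")

def strong_direct_connect(p, edges):
    """One strong-direct-connect step: each vertex with at least one
    neighbor takes its minimum neighbor as its new parent."""
    n = {v: INF for v in p}
    for v, w in edges:
        n[v] = min(n[v], w)
        n[w] = min(n[w], v)
    for v in p:
        if n[v] != INF:
            p[v] = n[v]
    return p

# On a path 1-2-3, vertices 1 and 2 again become each other's parents,
# a parent cycle of length two that a cleanup step must remove.
strong_direct_connect({1: 1, 2: 2, 3: 3}, [(1, 2), (2, 3)])
```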
The resulting algorithm is a min-max version of the Hirschberg et al. algorithm. Greiner claims an \(O(\lg n)\) bound on the number of rounds and an \(O(\lg ^2 n)\) bound on the number of steps. But these bounds do not hold for the algorithm that uses the min-max version of direct-connect: on the bad example of Shiloach and Vishkin consisting of an unrooted tree with vertex n adjacent to vertices 1 through \(n-1\), the algorithm takes \(\Omega (n)\) steps. This example has a high-degree vertex, but there is a simple example whose vertices are of degree at most three, consisting of a path of odd vertices \(1, 3, 5, \dots , n\) with each even vertex i adjacent to \(i + 1\).
On the other hand, the algorithm that uses the min-max version of strong-direct-connect can create parent cycles of length two, which must be eliminated by a cleanup as in the Hirschberg et al. algorithm. Greiner says nothing about eliminating cycles. We conclude that either his step bound is incorrect or his algorithm is incorrect.
At least one other work reproduces Greiner’s error: Soman et al. [24, 25] propose a modification of Greiner’s algorithm intended for implementation on a GPU model. Their algorithm is the inefficient version of Greiner’s algorithm, modified to use parent-connect instead of direct-connect and without alteration.
Their specific implementation of the connect step is as follows:
alternate-connect:
    for each edge \(\lbrace v, w\rbrace\) do
        if \(v.p \ne w.p\) then
            \(x = \min \lbrace v.p, w.p\rbrace\)
            \(y = \max \lbrace v.p, w.p\rbrace\)
            if round is even then
                \(y.p = x\)
            else \(x.p = y\)
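A sequential sketch of this step (a real concurrent step reads all parents before any write; here write conflicts resolve by last-write-wins, one of the arbitrary resolutions discussed below):

```python
def alternate_connect(p, edges, round_number):
    """Soman et al.'s connect step: link the larger parent under the
    smaller in even rounds and the smaller under the larger in odd rounds."""
    for v, w in edges:
        if p[v] != p[w]:
            x, y = min(p[v], p[w]), max(p[v], p[w])
            if round_number % 2 == 0:
                p[y] = x   # even round: larger parent links under smaller
            else:
                p[x] = y   # odd round: smaller parent links under larger
    return p
```

For example, on a single edge \(\lbrace 1, 2\rbrace\), an even round makes 1 the parent of 2, while an odd round makes 2 the parent of 1.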
Soman et al. say nothing about how to resolve concurrent writes. If this resolution is arbitrary, or by minimum in the even rounds and by maximum in the odd rounds, then the algorithm takes \(\Omega (n)\) steps on the examples mentioned above.
Algorithm S, and the equivalent algorithm that uses direct-connect and alteration, are simpler than the algorithms of Greiner and Soman et al. and have guaranteed \(O(\lg ^2 n)\) step bounds. We conclude that alternating minimization and maximization adds complication without improving efficiency, at least in theory.
Another paper that has an invalid efficiency bound as a result of not handling concurrent writes carefully is that of Yan et al. [29]. They consider algorithms in the PREGEL framework [19], which is a graph-processing platform designed on top of the MPC model. All the algorithms they consider can be expressed in our framework. They give an algorithm obtained from algorithm SV by deleting the first shortcut and replacing the second connect step by the first connect step of Awerbuch and Shiloach’s algorithm. In fact, the second connect step does nothing, since any parent update it would do has already been done by the first connect step. That is, this algorithm is equivalent to algorithm SV with the first shortcut and the second connect step deleted. Their termination condition, that all trees are flat, is incorrect, since there could be two or more flat trees in the same component. They claim an \(O(\lg n)\) bound on steps, but since they assume arbitrary resolution of write conflicts, the actual step bound is \(\Theta (n)\) by the example of Shiloach and Vishkin.
A third paper with an analysis gap is that of Stergiou, Rughwani, and Tsioutsiouliklis [26]. They present an algorithm that we call SRT, whose main loop expressed in our framework is the following:
Algorithm SRT:
    repeat
        for each vertex v do
            \(v.o = v.p\)
            \(v.n = v.p\)
        for each edge \(\lbrace v, w\rbrace\) do
            if \(v.o \gt w.o\) then
                \(v.n = \min \lbrace v.n, w.o\rbrace\)
            else \(w.n = \min \lbrace w.n, v.o\rbrace\)
        for each vertex v do
            \(v.o.p = \min \lbrace v.o.p, v.n\rbrace\)
        for each vertex v do
            \(v.p = \min \lbrace v.p, v.n.o\rbrace\)
    until no \(v.p\) changes
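One round of this loop can be transliterated sequentially as follows (a sketch; `p` maps vertices to parents, and min-combining of concurrent writes is simulated by taking minima in turn):

```python
def srt_round(p, edges):
    """One round of SRT: compute candidate parents n from the old
    parents o across the edges, then update both v.o's parent and
    v's parent using minima."""
    o = dict(p)
    n = dict(p)
    for v, w in edges:
        if o[v] > o[w]:
            n[v] = min(n[v], o[w])
        else:
            n[w] = min(n[w], o[v])
    for v in p:
        p[o[v]] = min(p[o[v]], n[v])   # v.o.p = min{v.o.p, v.n}
    for v in p:
        p[v] = min(p[v], o[n[v]])      # v.p = min{v.p, v.n.o}
    return p
```

On a path 1-2-3 with every vertex its own parent, one round already links 2 under 1 and 3 under 2.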
This algorithm does an extended form of connection combined with a variant of shortcutting that combines old and new parents. It is not monotone. Stergiou et al. implemented this algorithm on the Hronos computing platform and successfully solved problems with trillions of edges. They claimed an \(O(\lg n)\) step bound for the algorithm, but we are unable to make sense of their analysis. Their paper motivated our work.
A recently proposed algorithm similar to SRT is FastSV, due to Zhang, Azad, and Hu [30]. This algorithm is not monotone. It combines connecting and shortcutting. In each round it computes the grandparent \(v.g = v.p.p\) of each vertex v and uses \(v.g\) as a candidate for \(v.p\), and for \(w.p.p\) and \(w.p\) for each edge \(\lbrace v, w\rbrace\). They did not analyze FastSV but instead compared it experimentally to a simplified version of Awerbuch and Shiloach’s algorithm called LACC and to other variants of SV. In their experiments FastSV was fastest.
We have been unable to prove a worst-case polylogarithmic step bound for SRT, nor for FastSV, nor indeed for any non-monotone algorithm that does not use edge alteration. But we do not have bad examples for these algorithms either.
A final paper with an interesting algorithm but incorrect analysis is that of Burkhardt [6]. The main novelty in Burkhardt’s algorithm is to replace each edge \(\lbrace v, w\rbrace\) by a pair of oppositely directed arcs \((v, w)\) and \((w, v)\) and to use asymmetric alteration: he replaces \((v, w)\) by \((w, v.p)\) instead of \((v.p, w.p)\) (unless \(w = v.p\)). This idea allows him to combine connecting and shortcutting in a natural way. (Burkhardt claims that his algorithm does not do shortcutting, but it does, implicitly.) Burkhardt does not give an explicit stopping rule, saying only, “This is repeated until all labels converge.” An iteration can alter arcs without changing any parents, so one must specify the stopping rule carefully. The following is a version of the main loop of Burkhardt’s algorithm with the parent updates and the arc alterations disentangled, and which stops when there is one root and all other vertices are leaves:
Algorithm B:
    repeat
        for each arc \((v, w)\) do
            if \(v \gt w\) then
                \(v.p = \min \lbrace v.p, w\rbrace\)
        for each arc \((v, w)\) do
            if \(v.p \ne w\) then
                replace \((v, w)\) by \((w, v.p)\)
            else delete \((v, w)\)
        for each vertex v do
            if \(v.p \ne v\) then
                add arc \((v.p, v)\)
    until every arc \((v, w)\) has \(v.p = w.p\) and \(v.p \in \lbrace v, w\rbrace\)
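One round of this main loop can be simulated sequentially as follows (a sketch; `arcs` is the current list of arcs and `p` the parent map, with min-combining of concurrent writes simulated by taking minima in turn):

```python
def b_round(p, arcs):
    """One round of Algorithm B: connect along arcs (v, w) with v > w,
    then alter each surviving arc asymmetrically, then add an arc from
    each parent to its child."""
    for v, w in arcs:
        if v > w:
            p[v] = min(p[v], w)          # connect step, min-combining
    new_arcs = []
    for v, w in arcs:
        if p[v] != w:
            new_arcs.append((w, p[v]))   # replace (v, w) by (w, v.p)
        # else: delete (v, w)
    for v in p:
        if p[v] != v:
            new_arcs.append((p[v], v))   # add arc (v.p, v)
    return p, new_arcs
```

On a path 1-2-3 represented by arc pairs, one round links 2 under 1 and 3 under 2 and rewrites the arc set accordingly.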
Burkhardt claims that his algorithm takes \(O(\lg d)\) steps, which would be remarkable if true. Unfortunately, a long skinny grid is a counterexample, as shown in [2]. Burkhardt also claimed that the number of arcs existing at any given time is at most \(2m + n\). The version above has a 2m upper bound on the number of arcs. Two small changes in the algorithm reduce the upper bound on the number of arcs to m and make the shortcutting more efficient: replace each original edge \(\lbrace v, w\rbrace\) by one arc \((\max \lbrace v, w\rbrace , \min \lbrace v, w\rbrace)\), and in the loop over the vertices replace “add arc \((v.p, v)\)” by “add arc \((v, v.p.p)\).” We call the resulting algorithm AA, for asymmetric alteration.
Applying these two changes to Algorithm B gives the following pseudocode:
Algorithm AA:
    repeat
        for each arc \((v, w)\) do
            if \(v \gt w\) then
                \(v.p = \min \lbrace v.p, w\rbrace\)
        for each arc \((v, w)\) do
            if \(v.p \ne w\) then
                replace \((v, w)\) by \((w, v.p)\)
            else delete \((v, w)\)
        for each vertex v do
            if \(v.p \ne v\) then
                add arc \((v, v.p.p)\)
    until every arc \((v, w)\) has \(v.p = w.p\) and \(v.p \in \lbrace v, w\rbrace\)
A version of Algorithm AA was proposed to us by Pei-Duo Yu [private communication, 2018]. The techniques of Section 4.2 extend to give an \(O(\lg ^2 n)\) step bound for AA, B, and Burkhardt’s original algorithm. We omit the details.
Very recently, theoreticians have become interested in concurrent algorithms for connected components again, with the aim of obtaining a step bound logarithmic in d rather than n, for a suitably powerful model of computation. The first breakthrough result in this direction was that of Andoni et al. [2]. They gave a randomized algorithm that takes \(O(\lg d \lg \log _{m/n} n)\) steps in the MPC model. Their algorithm uses graph densification based on the distance-doubling technique of [21], controlled to keep the number of edges linearly bounded. Behnezhad et al. [5] improved the result of Andoni et al. by reducing the number of steps to \(O(\lg d + \lg \log _{m/n} n)\). Their algorithm can be implemented in the MPC model or on a very powerful version of the CRCW PRAM that supports a “multiprefix” operation. In recent work [18], we show that this algorithm and that of Andoni et al. can be simplified and implemented on an ARBITRARY CRCW PRAM.
6 Remarks
We have presented several very simple label-update algorithms to compute connected components concurrently. Our best bounds, of \(O(\lg n)\) steps and \(O(m \lg n)\) work, are for two related monotone algorithms, R and RA. For two other algorithms, A, which is non-monotone, and S, which keeps all trees flat by doing repeated shortcuts, our bounds are \(O(\lg ^2 n)\) steps and \(O(m \lg ^2 n)\) work, which are tight for S but maybe not for A. We have also pointed out errors in previous analyses of similar algorithms.
Our analysis of these algorithms is novel in that it extends over several rounds of the main loop, unlike previous analyses that consider only one round at a time. Our analysis of A combines new ideas with an idea from the analysis of disjoint set union algorithms. As mentioned in Section 5, this analysis extends to give an analysis of another algorithm in the literature (B) and to a variant of this algorithm (AA).
Our results illustrate the subtleties of even simple algorithms. A number of theoretical questions remain open, notably determining tight asymptotic step bounds for algorithms P, A, H, SRT, FastSV, AA, and B. For P, SRT, and FastSV we know nothing interesting: our techniques seem too weak to derive a polylogarithmic step bound for algorithms such as these, which are non-monotone and in which trees can be passive for an indefinite number of rounds. For A, AA, and B we have a bound of \(O(\lg ^2 n)\) steps but the lower bound is \(\Omega (\lg n)\).
All our algorithms are simple enough to merit experiments. Indeed, at least one recent experimental study using GPUs [13] has included our algorithms. In this study, our algorithms were faster than SV and much faster than simple label propagation (no shortcuts), but algorithms using disjoint set union were faster, and random sampling of the graph edges sped up all algorithms.
There is a natural way to convert each of the deterministic algorithms we have considered into a randomized algorithm: number the vertices from 1 to n uniformly at random and identify the vertices by number. In an application, one may get such randomization for free, for example if a hash table stores the vertex identifiers. It is natural to study the efficiency that results from such randomization. Working with Eitan Zlatin, we have obtained a high-probability \(O(\lg n)\) or \(O(\lg ^2 n)\) step bound for the randomized version of most of the algorithms we have presented, and in particular an \(O(\lg n)\) bound for algorithm A, improving our \(O(\lg ^2 n)\) worst-case bound. We shall report on these results in the future.
We have assumed global synchronization. The problem becomes much more challenging in an asynchronous setting. One of the authors and a colleague have developed algorithms for asynchronous concurrent disjoint set union [15], the incremental version of the connected components problem. Their algorithms can be used to find connected components asynchronously.
An interesting extension of the connected components problem is to construct a spanning tree of each component. It is easy to extend algorithms R, RA, and S to do this: when an edge causes a root to become a child, add the corresponding original edge to the spanning forest. Extending non-monotone algorithms such as A to construct spanning trees seems a much bigger challenge.
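As an illustrative sketch of this extension (the function name and representation are ours), a parent-root-connect-style step can record, for each root that becomes a child, the original edge that supplied its winning candidate; the recorded witness edges then form a spanning forest:

```python
def parent_root_connect_with_witness(p, edges, forest):
    """parent-root-connect extended to record witness edges: when a
    root's parent changes, the original edge that supplied its new
    (minimum) candidate parent is added to the spanning forest."""
    best = {}  # root -> (minimum candidate parent, witness edge)
    for v, w in edges:
        for a, b in ((v, w), (w, v)):
            x, y = p[a], p[b]
            # Only roots link, and only to a smaller candidate; keep the
            # edge that produced the minimum candidate (min-combining).
            if p[x] == x and y < x and (x not in best or y < best[x][0]):
                best[x] = (y, (v, w))
    for x, (y, e) in best.items():
        p[x] = y
        forest.append(e)
    return p, forest
```

On a path 1-2-3, one step links roots 2 and 3 under smaller neighbors and records both original edges as forest edges.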
Acknowledgments
We thank Dipen Rughwani, Kostas Tsioutsiouliklis, and Yunhong Zhou for telling us about [26], for extensive discussions about the problem and our algorithms, and for insightful comments on our early results.
In the preliminary version of this work [17], we claimed an \(O(\lg ^2 n)\) step bound for algorithm P and for a related algorithm E (which we have omitted from the current paper). We thank Pei-Duo Yu for discovering that Lemma 6 in [17] does not hold for P and E, invalidating our proof of Theorem 13 in [17] for these algorithms.
Footnote
1. We denote by \(\lg\) the base-two logarithm.
References
[1] Selim G. Akl. 1989. Design and Analysis of Parallel Algorithms. Prentice Hall.
Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. 2018. Parallel graph connectivity in log diameter rounds. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7–9, 2018. 674–685.
Baruch Awerbuch and Yossi Shiloach. 1987. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Computers 36, 10 (1987), 1258–1263.
Stephen A. Cook, Cynthia Dwork, and Rüdiger Reischuk. 1986. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM J. Comput. 15, 1 (1986), 87–97.
Steve Goddard, Subodh Kumar, and Jan F. Prins. 1994. Connected components algorithms for mesh-connected parallel computers. In Parallel Algorithms, Proceedings of a DIMACS Workshop, Brunswick, New Jersey, USA, October 17–18, 1994. 43–58.
Shay Halperin and Uri Zwick. 1996. An optimal randomised logarithmic time connectivity algorithm for the EREW PRAM. J. Comput. Syst. Sci. 53, 3 (1996), 395–416.
Daniel S. Hirschberg, Ashok K. Chandra, and Dilip V. Sarwate. 1979. Computing connected components on parallel computers. Commun. ACM 22, 8 (1979), 461–464.
Changwan Hong, Laxman Dhulipala, and Julian Shun. 2020. Exploring the design space of static and incremental graph connectivity algorithms on GPUs. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. 55–69.
Tsan-Sheng Hsu, Vijaya Ramachandran, and Nathaniel Dean. 1997. Parallel implementation of algorithms for finding connected components in graphs. Parallel Algorithms: Third DIMACS Implementation Challenge, October 17–19, 1994 30 (1997), 20.
Siddhartha V. Jayanti and Robert E. Tarjan. 2016. A randomized concurrent algorithm for disjoint set union. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25–28, 2016. 75–82.
Donald B. Johnson and Panagiotis Takis Metaxas. 1997. Connected components in \(O(\lg^{3/2} n)\) parallel time for the CREW PRAM. J. Comput. Syst. Sci. 54, 2 (1997), 227–242.
S. Cliff Liu and Robert E. Tarjan. 2019. Simple concurrent labeling algorithms for connected components. In 2nd Symposium on Simplicity in Algorithms, SOSA@SODA 2019, January 8–9, 2019 - San Diego, CA, USA. 3:1–3:20.
S. Cliff Liu, Robert E. Tarjan, and Peilin Zhong. 2020. Connected components on a PRAM in log diameter time. CoRR abs/2003.00614 (2020). https://arxiv.org/abs/2003.00614.
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6–10, 2010. 135–146.
Frank McSherry, Michael Isard, and Derek Gordon Murray. 2015. Scalability! But at what COST? In 15th Workshop on Hot Topics in Operating Systems, HotOS XV, Kartause Ittingen, Switzerland, May 18–20, 2015.
Vibhor Rastogi, Ashwin Machanavajjhala, Laukik Chitnis, and Anish Das Sarma. 2013. Finding connected components in map-reduce in logarithmic rounds. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8–12, 2013. 50–61.
Jyothish Soman, Kishore Kothapalli, and P. J. Narayanan. 2010. A fast GPU algorithm for graph connectivity. In 24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19–23 April 2010 - Workshop Proceedings. 1–8.
Jyothish Soman, Kishore Kothapalli, and P. J. Narayanan. 2010. Some GPU algorithms for graph connected components and spanning tree. Parallel Processing Letters 20, 4 (2010), 325–339.
Stergios Stergiou, Dipen Rughwani, and Kostas Tsioutsiouliklis. 2018. Shortcutting label propagation for distributed connected components. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5–9, 2018. 540–546.
Da Yan, James Cheng, Kai Xing, Yi Lu, Wilfred Ng, and Yingyi Bu. 2014. Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7, 14 (2014), 1821–1832.
Yongzhe Zhang, Ariful Azad, and Zhenjiang Hu. 2020. FastSV: A distributed-memory connected component algorithm with fast convergence. In Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing. SIAM, 46–57.