This article presents I/O-efficient algorithms for topologically sorting a directed acyclic graph and for the more general problem of identifying and topologically sorting the strongly connected components of a directed graph \(G = (V, E)\). Both algorithms are randomized and have I/O cost \(O(\mathit{sort}(E) \cdot \mathrm{poly}(\log V))\), with high probability, where \(\mathit{sort}(E) = O(\frac{E}{B}\log_{M/B}(E/B))\) is the I/O cost of sorting an |E|-element array on a machine with size-B blocks and size-M cache/internal memory. These are the first algorithms for these problems that do not incur at least one I/O per vertex, and as such they are the first I/O-efficient algorithms for sparse graphs. By applying the technique of time-forward processing, these algorithms also imply I/O-efficient algorithms for most problems on directed acyclic graphs, such as shortest paths, as well as the single-source reachability problem on arbitrary directed graphs.
1 Introduction
Ullman and Yannakakis [25] and Chiang et al. [10] initiated the study of graph algorithms in the I/O model [2] over 20 years ago. Despite decades of research and many efficient algorithms for undirected graphs, there are essentially no I/O-efficient algorithms known for even the most basic problems on sparse directed graphs. Perhaps the most coveted is an algorithm for topologically sorting a directed acyclic graph (DAG). A topological sort of a DAG \(G=(V,E)\) is an ordering of the vertices such that for every edge \((u,v)\in E\), u precedes v in the ordering.
This article presents the first algorithm for topologically sorting a DAG that is I/O efficient even for sparse graphs. Not only is topologically sorting a fundamental problem on DAGs, but it is also a key subroutine in another general I/O-efficient technique known as time-forward processing [4, 10]. Due to the lack of a good general-purpose algorithm for topological sort, time-forward processing has only generated provably good results for restricted graph classes such as planar graphs [7, 17, 19].
1.1 The I/O Model and Common Subroutines
The I/O model [2], also called the external-memory model or disk-access-machine model, is a standard theoretical model for understanding the performance of algorithms on large datasets by capturing some notion of locality. The model comprises a two-level memory hierarchy: a size-M cache (also called internal memory) and an external memory of unbounded size. All data, both in cache and in external memory, is organized in size-B chunks called blocks, so the cache consists of \(M/B\ge 1\) blocks. Computation may only occur on data residing in the cache, meaning that data must be transferred from external memory to cache when needed. These transfers are performed at the granularity of blocks; each block transfer is called an I/O. The cost of an algorithm in the I/O model, often called the I/O cost, is the number of I/Os performed. Computation itself is free.
The following are common examples of bounds in the I/O model. Iterating over a size-N array in order (assuming \(N\gt M\)) has I/O cost \(\mathit {scan}(N) = \Theta (N/B)\). Sorting [2] a size-N array has I/O cost \(\mathit {sort}(N) = \Theta (\frac{N}{B}\log _{M/B}(N/B))\). A key separation between the RAM model and the I/O model is the cost of permuting. In the RAM model, permuting an array is as cheap as scanning. In the I/O model, permuting has I/O cost \(\Theta (\min \left\lbrace N, \mathit {sort}(N)\right\rbrace)\) [2], which for typical machine parameters resolves to the sort bound. (The N term corresponds to foregoing an I/O-efficient algorithm entirely—simply run the RAM algorithm and pay an I/O for every operation.) The cost of sorting thus often serves as a lower bound on the I/O cost for problems that can be solved in linear time in the RAM model. Many basic graph problems on sparse graphs (directed or undirected), including topological sort, have \(\Omega (\mathit {sort}(V))\) lower bounds in the I/O model [10].
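For concreteness, the following back-of-the-envelope sketch evaluates these bounds in Python. The parameter values are hypothetical but representative choices of ours, not from the article; the point is how small \(\log _{M/B}(N/B)\) is in practice, so that the sort bound vastly undercuts one I/O per operation.

```python
import math

# Hypothetical but representative parameters, all measured in records:
# N = 2^37 records (~1 TB of 8-byte items), B = 2^13 records per block,
# M = 2^27 records of cache (~1 GB).
N, B, M = 2**37, 2**13, 2**27

scan_N = N / B                                # scan(N) = Theta(N/B)
sort_N = (N / B) * math.log(N / B, M / B)     # sort(N) = Theta((N/B) log_{M/B}(N/B))
permute_N = min(N, sort_N)                    # permute(N) = Theta(min{N, sort(N)})

print(f"scan(N)  ~ {scan_N:.2e} I/Os")        # ~1.7e+07
print(f"sort(N)  ~ {sort_N:.2e} I/Os")        # ~2.9e+07: log_{M/B}(N/B) < 2 here
print(f"one I/O per op = {N:.2e} I/Os")       # ~1.4e+11: the sort bound wins easily
```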
Topological sort. There are two classic linear-time algorithms for topological sort in the RAM model: repeatedly peeling off the vertices with in-degree 0, or performing a depth-first search and outputting the vertices in reverse order of finish time [14]. The best known I/O algorithms are based on the depth-first-search approach, and there are two of them; neither is efficient for sparse graphs. Chiang et al. [10] provide an algorithm with I/O cost \(O(V + \mathit {sort}(E) + \frac{VE}{MB})\), and Buchsbaum et al. [9] give an algorithm with I/O cost \(O((V+\frac{E}{B})\log (V/B))\). Both of these bounds include at least a cost of \(\left|V\right|\), indicating that the algorithm may have to perform a random access, and hence an I/O, for each vertex. For sparse graphs, i.e., \(\left|E\right| = \Theta (V)\), both of these algorithms are worse than simply running an ordinary RAM DFS and paying an I/O for every operation.
Time-forward processing. Time-forward processing, originally described by Chiang et al. [10], is a technique that allows for efficient evaluation of circuit-like problems on DAGs. Each vertex (or edge) starts with some value \(w(v)\) or \(w(u,v)\). The goal is to compute some label \(L(v)\) on each vertex, where \(L(v)\) depends only on the initial values w and the labels \(\left\lbrace L(u) | (u,v)\in E\right\rbrace\) on v’s immediate predecessors. If the graph is topologically sorted, and certain technical restrictions are met on the function being computed, then the DAG circuit evaluation can be performed in \(O(\mathit {sort}(E))\) I/Os by time-forward processing [4, 10]. The first solution [10] has additional restrictions on the relative size of the cache, but Arge’s [4] solution removes those restrictions by solving the problem with an I/O-efficient priority queue called a Buffer Tree.
One challenging aspect about graph problems in the I/O model is that vertices cannot generally be processed one by one without sacrificing I/O efficiency. Instead, vertices must be processed (roughly speaking) in parallel by applying various sort and scan steps. Time-forward processing is useful in part because it simulates the effect of processing vertices one by one. Thus, information can propagate arbitrarily far in the graph, provided that the graph is topologically sorted.
1.2 Results
This article gives the following results, all having I/O cost \(O(\mathit {sort}(E) \cdot \log ^5 V)\), with high probability, on a graph \(G=(V,E)\). For conciseness, we assume throughout that \(\left|E\right| = \Omega (V)\).
•
(Sections 3 and 4) A randomized algorithm for topologically sorting a DAG.
•
(Section 5) A randomized algorithm for identifying and topologically sorting the strongly connected components (SCCs). Although this result subsumes topologically sorting a DAG, the algorithm includes additional complications and is thus presented separately.
•
Using the topological sort algorithm coupled with time-forward processing [4, 10] yields efficient solutions to other problems on DAGs, such as shortest paths, with the same I/O cost.
•
Again applying time-forward processing [4, 10], the SCC algorithm implies a solution to the single-source reachability problem on directed graphs. Specifically, given a directed graph (not necessarily acyclic) and a source vertex s, the set of vertices reachable from s can be identified in \(O(\mathit {sort}(E)\cdot \log ^5 V)\) I/Os, with high probability.
1.3 Overview of the Approach
The general approach adopted here for a topological sort, loosely based on the IterTS algorithm described by Ajwani et al. [3], is as follows. Initially assign each vertex v a label \(L(v)\). Those labels induce an ordering over vertices. (For both our algorithm and IterTS, the labels correspond to a permutation of vertices, but in principle there could be ties.) Adopting the terminology from [3], an edge \((u,v)\) is satisfied if \(L(u) \lt L(v)\) and violated otherwise. The goal is to adjust labels over time such that eventually all edges are satisfied.
To understand what makes the problem difficult, consider the following naive realization of the general strategy. Use \(L_i(v)\) to denote the label of v in round i. Initially assign all vertices v the label \(L_0(v) = 0\). In each round i, update every vertex v’s label to \(L_i(v) = \max \left\lbrace L_{i-1}(u) + 1 | (u,v) \in E\right\rbrace\). (This type of update can be implemented by standard techniques, obtaining the updated label for all vertices via a constant number of sorts.) Although v’s label increases to ensure that \(L_i(v) \gt L_{i-1}(u)\), the edge \((u,v)\) only becomes satisfied if \(L_i(v) \gt L_i(u)\); if u’s label also increases during this round, then the edge may not be satisfied. In fact, with this algorithm the edge \((u,v)\) would only become satisfied during the round \(\ell\) for \(\ell\) equal to the length of the longest path to u. The end result is an algorithm with \(O(V \cdot \mathit {sort}(E))\) worst-case I/O cost. Granted, this realization is particularly naive, but it seems difficult to beat. Indeed, IterTS [3], which applies heuristics to achieve good performance in practice, encounters this bottleneck.
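For illustration, the naive round can be sketched as follows. This is a RAM-level rendering of the update just described, not the sort-based I/O implementation; keeping a vertex's old label when no incoming edge improves it is equivalent here, since the labels never decrease.

```python
def naive_round(labels, edges):
    """One round of the naive update: each v takes
    L_i(v) = max over incoming edges (u,v) of L_{i-1}(u) + 1,
    keeping its old label if that maximum does not exceed it."""
    new_labels = dict(labels)
    for (u, v) in edges:
        new_labels[v] = max(new_labels[v], labels[u] + 1)
    return new_labels

# The edge (u,v) only becomes satisfied once u's label stops changing,
# i.e., after roughly as many rounds as the longest path ending at u.
```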
Note that it is trivial to satisfy roughly half the edges immediately by randomly permuting all the vertices and labeling vertices by their rank in the permutation. The challenge is in improving the labeling beyond that point.
1.3.1 Algorithm Overview.
An issue with the naive algorithm is that, in some sense, its label updates are too aggressive. Perhaps counter-intuitively, directly ensuring that \(L_i(v) \gt L_{i-1}(u)\) for all edges does not seem to lead to efficient algorithms. Instead, our algorithm temporarily gives up on satisfying certain edges, which makes it easier to satisfy the other edges.
Our algorithm (described more fully in Section 3) performs the following type of recursive partitioning of the graph, somewhat inspired by [12]. Each vertex chooses a random priority value. That priority is propagated some (random) distance forward in the graph. Each vertex adopts the highest priority it has seen, with potentially many vertices adopting the same priority. (This step is performed in a way that ensures that the endpoints of already-satisfied edges remain in priority order.) Next, vertices are collected into groups where all vertices in the group have equal priority. The groups are ordered in increasing order of priority, and finally the algorithm recurses on the vertex-induced subgraphs for each group.
The analysis considers any particular violated edge \((u,v)\). The main claim is that in one full execution of this recursive algorithm, \((u,v)\) has at least a constant probability of becoming satisfied. Repeating the recursive algorithm a logarithmic number of times gives the high-probability result for all edges.
The proof itself is counter-intuitive but also simple in hindsight. Consider a particular violated edge \((u,v)\). Initially, both u and v are in the same recursive subproblem. Ties on priority are good in the sense that they keep u and v in the same recursive subproblem. Eventually, at some recursive step, u and v adopt different priorities and are placed in different recursive subproblems, which fixes the status of \((u,v)\) for the remainder of the execution; the edge becomes satisfied if u’s subproblem is ordered before v’s, and the edge is said to be broken if v’s subproblem is ordered before u’s. The two key components of the analysis are the following: (1) at each level of recursion, the probability that the edge becomes broken is proportional to \(1/K\), where K is the number of distances selected from, and (2) after enough levels of recursion, the edge is very likely to cross subproblem boundaries. By selecting distances randomly from a large-enough range, the probability of an edge becoming broken is low enough that the edge is likely to cross subproblem boundaries before it has too many chances of becoming broken. If the edge crosses subproblem boundaries but is not broken, then it must be satisfied.
Extension to strongly connected components. The extended algorithm propagates priorities both backwards and forwards, contracting groups of vertices reached in both directions. The analysis follows a similar framework, but the presence of cycles complicates various aspects of the algorithm and analysis.
Roadmap.
The remainder of the article is organized as follows. Section 2 presents some related work on I/O-efficient graph algorithms as well as non-I/O algorithms that use similar techniques. Section 3 presents the algorithm for topological sort, and Section 4 analyzes that algorithm. Section 5 gives the algorithm for strongly connected components and its analysis.
2 Related Work
There is a large body of work, e.g., [1, 5, 6, 9, 10, 11, 18, 20, 21, 22, 23] on graph algorithms in the I/O model. See [26] or [27] for good surveys on the topic. For undirected graphs, many problems can be solved in \(O(\mathit {sort}(E))\) I/Os. (In fact, for dense graphs the logarithmic term in the sort bound can be improved slightly through sparsification [15].) For example, connectivity problems such as connected components, minimum spanning forest, biconnectivity, and so on, can all be solved in \(O(\mathit {sort}(E))\) I/Os [10], with high probability. If randomization is not allowed, there are several deterministic algorithms [1, 5, 10, 18, 23], which tend to be within at worst logarithmic factors of the sort bound.
The directed analogs of the connectivity problems are the reachability problems, such as single-source reachability, topological sort, and strongly connected components. The best known bounds for these problems are significantly worse than for their undirected counterparts. Specifically, all of the best existing algorithms [9, 10] have a \(\left|V\right|\) term in their I/O cost, which is not I/O efficient in general. If the graphs are restricted to be planar graphs, however, many of these problems and more can be solved in \(O(\mathit {sort}(E))\) I/Os [5, 8, 19].
Due to the lack of provably efficient algorithms for topological sort, some research has focused on engineering practically efficient algorithms [3, 28]. For example, Ajwani et al. [3] use an iterative approach that follows the same general strategy as our algorithm.
Related work beyond I/O algorithms. In the RAM model, SCCs can be identified in linear time [14, 24] by performing depth-first search.
Our algorithm shares some similarities with other topological-sort or SCC algorithms that perform recursive decompositions of the graph [12, 13, 16] instead of depth-first search. Coppersmith et al. [13] describe a randomized divide-and-conquer algorithm for computing the strongly connected components that runs in \(O(E \log V)\) expected time in the RAM model. Cohen et al. [12] use a labeling scheme, which has a similar recursive structure, to solve an incremental topological sort where edges are added to the graph over time. Fineman’s [16] parallel algorithm, which also starts from similar ideas, solves the static reachability problems with \(O(E \cdot \rm {poly}(\log V))\) work and \(O(V^{2/3} \cdot \rm {poly}(\log V))\) span/depth, with high probability.
The recursive structure of our topological-sort algorithm is most similar to that of Cohen et al. [12] in that the subproblems are defined by performing forward searches from each vertex. Like Fineman’s algorithm [16] but unlike the others, our algorithm performs the label propagation/graph search to a bounded distance, but the specific notions of distance are different. Many of the specific details, such as how distances are chosen, also resemble features in Fineman’s algorithm [16]. This fact should not be surprising given that there are relationships between parallel algorithms and I/O algorithms (see, e.g., [10] for discussion).
Though there are some similarities in the details between the parallel algorithm [16] and the I/O algorithm presented herein, these similarities are somewhat superficial; the primary challenges in each setting are actually quite different. Notably, our I/O-efficient algorithm leverages time-forward processing, which is not efficient in the parallel model. In contrast, the parallel algorithm strongly exploits random accesses, which are not efficient in the I/O model.
3 Topological Sort Algorithm
This section describes the algorithm for topologically sorting a directed acyclic graph \(G=(V,E)\). The algorithm is analyzed in Section 4.
The graph is initially provided with the vertices in arbitrary order. As is typical for I/O algorithms, the graph representation is an array V of vertices and an array E of edges. The records of vertices and edges are as follows. Each vertex is represented by a unique ID, and each edge is represented by the IDs of its endpoints. Because the algorithm will sort the edge array many times, there need not be any assumption on the initial ordering or grouping of edges.
The goal of the algorithm is to gradually reorder vertices such that all edges are eventually satisfied, defined next. For each vertex, \(\mathit {index}(v)\) denotes the index of v in the vertex array, i.e., \(v = V[i]\) means \(\mathit {index}(v) = i\). An edge \((u,v)\) is satisfied if \(\mathit {index}(u) \lt \mathit {index}(v)\), and violated otherwise.
The algorithm is designed to ensure that once an edge becomes satisfied, it remains satisfied for the rest of the execution.
Algorithm 1 presents a high-level description of the algorithm, ignoring the low-level details necessary to transform the algorithm to an I/O-efficient realization. The main algorithm topologically sorts the graph by performing a sequence of executions of a recursive algorithm, called RecurTS. The goal with each execution of the recursive algorithm is to reorder vertices to satisfy some, ideally a constant fraction, of the violated edges. The main algorithm terminates when all edges have been satisfied, i.e., when the vertices in V are in topological-sort order. Section 3.1 describes RecurTS in more detail, and Section 3.2 briefly describes how to make RecurTS I/O efficient.
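In outline, the main loop has the following shape. This is a sketch: `recur_ts` stands for the recursive pass of Section 3.1, and the satisfied-edge check, written here with a dictionary, would be realized with sorts and scans in the I/O model.

```python
def topological_sort(V, E, K, lam):
    """Main loop: re-run the recursive pass until every edge is satisfied.
    recur_ts reorders the vertex array V in place (see Section 3.1)."""
    while not all_satisfied(V, E):
        recur_ts(V, E, 0, len(V) - 1, 0, K, lam)
    return V

def all_satisfied(V, E):
    """An edge (u, v) is satisfied iff index(u) < index(v)."""
    index = {v: i for i, v in enumerate(V)}
    return all(index[u] < index[v] for (u, v) in E)
```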
3.1 The Recursive Algorithm
At a high level, the recursive algorithm chooses random priorities for each vertex, propagates the priorities some distance in the graph, reorders vertices according to the highest priority observed, and finally recurses on subgraphs induced by vertices of the same priority. Before describing the algorithm in more detail, we first clarify the notion of distance adopted by the algorithm.
Distances. All distances discussed in this article are with respect to the number of violated edges, i.e., interpreting violated edges as having weight 1 and satisfied edges as having weight 0. If there exists a path from u to v that includes at most d violated edges, then we say u can reach v at distance d, denoted \(u\preceq _d v\). We also say that u is a d-hop predecessor of v.
The relation \(\preceq _\infty\) is the standard notion of reachability, and when the vertices are in topological-sort order \(\preceq _0\) and \(\preceq _\infty\) are equivalent. Note that unlike \(\preceq _\infty\), when d is a finite fixed distance \(\preceq _d\) is not transitive; however, \(x \preceq _{d_1} y\) and \(y \preceq _{d_2} z\) implies \(x \preceq _{d_1+d_2} z\).
Vertex-induced subgraphs. Each recursive call operates on a contiguous subarray of vertices and the subgraph induced by those vertices. We use \(G[i \dots j]\) to denote the subgraph of G induced by the vertices in \(V[i \dots j]\).
Global parameters. The algorithm is parameterized by K and \(\lambda\). The value \(\lambda\) specifies the maximum recursion depth, which will be discussed at the end of this subsection. The value K specifies the number of possible distances from which to select a random distance. There is a tradeoff here. Choosing larger K decreases the probability of an edge becoming broken, thereby increasing the number of edges that become satisfied. On the other hand, larger K also leads to higher I/O cost. A good value is selected in Section 4.
The algorithm. For now, ignore the recursion depth, \(\lambda\), and the specific range of distances. The algorithm RecurTS(\(G,i,j,\mathit {depth}\)) operates on the induced subgraph \(G[i \dots j]\) as follows. Choose a distance d uniformly at random from a contiguous range of K possible distances. Assign each vertex v a distinct random priority \(\rho (v)\). For each v, let \(l(v)\) denote the highest priority from among v’s d-hop predecessors, i.e., \(l(v) = \max \left\lbrace \rho (u) : u \preceq _d v\text{ in $G[i \dots j]$}\right\rbrace\). Sort the vertices by \(l(v)\) using a stable sort. This is the only place in the algorithm where vertices are reordered and edges may become satisfied. At this point, vertices with the same label l are grouped together into contiguous subarrays, and the groups are sorted by label. Finally, recurse on each group.
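Putting the steps together, one call of RecurTS might be sketched as follows, with Python's stable sort standing in for the stable I/O sort. Here `propagate_priorities` is a placeholder for the \(2d+1\)-step label computation described in Section 3.2.

```python
import random
from itertools import groupby

def recur_ts(V, E, i, j, depth, K, lam):
    """One call on the induced subgraph G[i..j]; reorders V[i..j] in place."""
    if depth >= lam:
        return
    d_max = (lam - depth) * K
    d = random.randrange(d_max - K, d_max)       # d uniform in [d_min, d_max)
    sub = V[i:j + 1]
    pri = random.sample(range(1, len(sub) + 1), len(sub))
    rho = dict(zip(sub, pri))                    # distinct random priorities
    l = propagate_priorities(sub, E, rho, d)     # l(v) = max rho over d-hop predecessors
    sub.sort(key=lambda v: l[v])                 # stable sort by label
    V[i:j + 1] = sub
    start = i                                    # recurse on equal-label groups
    for _, group in groupby(sub, key=lambda v: l[v]):
        size = len(list(group))
        recur_ts(V, E, start, start + size - 1, depth + 1, K, lam)
        start += size
```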
Figure 1 illustrates an example of a single level of recursion with distance \(d=2\). The three subfigures illustrate (Figure 1(a)) the initial graph and vertex ordering, (Figure 1(b)) the labels assigned to vertices after propagating priorities to a distance of \(d=2\), and (Figure 1(c)) the new ordering on vertices and the recursive subproblems. After reordering vertices according to label, two previously violated edges, namely, \((D,C)\) and \((J,I)\), become satisfied. Notice that only some of the edges crossing subproblem boundaries transition from violated to satisfied, and no edges change from satisfied to violated.
Fig. 1. One level of recursion with distance \(d=2\): (a) the initial graph and vertex ordering; (b) the labels assigned after propagating priorities to distance \(d=2\); (c) the new vertex ordering and the recursive subproblems.
Distance ranges and maximum recursion depth. One component of the analysis (Section 4) is that the number of d-hop predecessors of v decreases with each level of recursion. This progress argument, however, is with respect to the specific distance, and it seems difficult to argue anything about the number of \(d^{\prime }\)-hop predecessors for \(d^{\prime } \gt d\). On the other hand, to argue that edges are unlikely to be broken, distances need to be selected randomly from K possibilities. To reconcile these two issues, the range of distances depends on the level of recursion, decreasing with each level. Moreover, since distances should always be positive, the distance used at recursion depth 0 places a limit on the number of levels of recursion that the algorithm can support.
Putting these ideas together, we have the following. If a call is made with recursion depth \(\mathit {depth} \ge \lambda\), then the algorithm simply returns. Otherwise, the distance d is selected uniformly at random from the range \([d_{\min },d_{\max })\), where \(d_{\min } = d_{\max }-K\) and \(d_{\max } = (\lambda -\mathit {depth}) \cdot K\).
3.2 Achieving I/O Efficiency
This section describes how to make RecurTS I/O efficient, all of which is fairly straightforward. We first describe the implementation with respect to the initial call to RecurTS on the entire graph. We later describe how to implement the recursion.
Each vertex and edge record is augmented with a constant amount of additional information, so that the total space of the vertex and edge arrays is still \(O(V)\) and \(O(E)\), respectively. The standard technique for transferring information along edges is by performing sorts and scans of the vertex and edge arrays. These sorts should be viewed as transient, unlike the sort explicitly given in Algorithm 1 whose goal is to produce the topological sort.
First, tag each vertex v with its index \(\mathit {index}(v)\), which can be achieved by a single scan (i.e., iterating in order) of the vertex array. Next, tag each edge with the indices of its endpoints. This edge-tagging step can be accomplished by sorting the vertex array by vertex ID and the edge array by the ID of one endpoint. Then, perform simultaneous scans of the vertex array and edge array, synchronizing the scans on ID and copying the index of the vertex to the edges with matching endpoint ID. To store the index of the other endpoint, sort by the other endpoint and repeat. The cost of these steps is \(\Theta (\mathit {sort}(V)+\mathit {sort}(E))\) for sorting the arrays and \(O(\mathit {scan}(V)+\mathit {scan}(E))\) for iterating over them; the sort bound dominates.
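The following sketch mirrors this sort-and-synchronized-scan pattern in Python, with ordinary in-memory sorts standing in for the I/O-efficient ones:

```python
def tag_edges(V, E):
    """Attach index(u) and index(v) to every edge via two sort+scan passes.
    V is the vertex array of IDs; E is a list of (u, v) ID pairs."""
    verts = sorted((vid, idx) for idx, vid in enumerate(V))   # V sorted by ID
    for side in (0, 1):                     # pass 1 tags u, pass 2 tags v
        E = sorted(E, key=lambda e: e[side])
        out, k = [], 0
        for e in E:                         # synchronized scan over verts and E
            while verts[k][0] != e[side]:
                k += 1
            out.append(e + (verts[k][1],))  # append this endpoint's index
        E = out
    return E            # each edge is now (u, v, index(u), index(v))
```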
To assign a permutation of priorities, simply select random numbers in the range \(\lbrace 1,2,\ldots ,\left|V\right|^c\rbrace\) for each vertex, where \(c \ge 2\) is a constant that controls failure probability. Sort the vertices by priority and perform a scan to verify that all priorities are distinct. Repeat this process until the priorities are distinct.
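A minimal sketch of the retry loop; in the I/O model the distinctness check is the sort and scan just described, and for \(c \ge 2\) a constant expected number of attempts suffices.

```python
import random

def assign_priorities(V, c=2):
    """Draw each priority uniformly from {1, ..., |V|^c}; retry on collision."""
    n = len(V)
    while True:
        rho = {v: random.randint(1, n ** c) for v in V}
        if len(set(rho.values())) == n:     # all priorities distinct
            return rho
```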
Propagating priorities.
The most difficult aspect is implementing the label \(l(v) = \max \lbrace \rho (u) : u\preceq _d v\rbrace\). This is achieved incrementally through a sequence of propagation steps. Initially, set \(l(v) = \rho (v)\) and perform an update called satisfied-edge propagation. Next, perform d rounds, each including violated-edge propagation followed by satisfied-edge propagation. There are thus \(2d+1\) propagation steps in total.
Before describing how to implement the two types of propagation steps, let us first discuss the goal of each type of update. Let \(l(v)\) and \(l^{\prime }(v)\) denote v’s labels at the start and end, respectively, of a single propagation step (satisfied or violated). The goal of satisfied-edge propagation is to update the label to \(l^{\prime }(v) = \max \left\lbrace l(u) : u \preceq _0 v\right\rbrace\), i.e., propagate the label arbitrarily far but along satisfied edges only. The goal of violated-edge propagation is to update the label to \(l^{\prime }(v) = \max \left\lbrace l(v),\max \left\lbrace l(u) : (u,v) \in E\right\rbrace \right\rbrace\), i.e., propagate the label along a single hop that is allowed to be violated.
We now argue that the sequence of propagation steps gives each vertex the intended label. By induction on the number of rounds: after the initial satisfied-edge propagation, \(l(v) = \max \left\lbrace \rho (u) : u \preceq _0 v\right\rbrace\). If the labels satisfy \(l(v) = \max \left\lbrace \rho (u) : u \preceq _{i-1} v\right\rbrace\) entering round i, then the round’s violated-edge propagation followed by satisfied-edge propagation extends this to \(l(v) = \max \left\lbrace \rho (u) : u \preceq _{i} v\right\rbrace\), since any path with at most i violated edges decomposes into a prefix with at most \(i-1\) violated edges, one violated edge, and a suffix of satisfied edges only. After d rounds, each vertex thus has the intended label \(l(v) = \max \left\lbrace \rho (u) : u \preceq _d v\right\rbrace\).
Implementing satisfied-edge propagation. The satisfied edges \(E_{\mathit {sat}}\) can be identified by scanning through the edge set and identifying those edges \((u,v)\) with \(\mathit {index}(u) \lt \mathit {index}(v)\). Note that when V is sorted by index, the graph \(G_{\mathit {sat}} = (V,E_{\mathit {sat}})\) is topologically sorted, which is important as we shall apply time-forward processing.
In more detail, performing the update \(l^{\prime }(v) = \max \left\lbrace l(u) : u\preceq _0 v \text{ in $G$}\right\rbrace\) is equivalent to computing \(l^{\prime }(v) = \max \left\lbrace l(u) : u\preceq _\infty v \text{ in $G_{\mathit {sat}}$}\right\rbrace\). The following is a simple sequential algorithm for computing the updated label with regard to \(G_{\mathit {sat}}\). Consider the vertices v in \(G_{\mathit {sat}}\) in topological-sort order; update v’s label to \(\max \left\lbrace l(v), \max \left\lbrace l(u) : (u,v) \in E_{\mathit {sat}}\right\rbrace \right\rbrace\), i.e., the maximum of its old value and the value on all immediate predecessors. This local-update rule is exactly the kind that can be implemented I/O-efficiently using time-forward processing [4, 10], assuming \(G_{\mathit {sat}}\) is topologically sorted.
Time-forward processing. This section describes the algorithm for time-forward processing from [4, 10]. The algorithm takes as input a DAG with the vertices in topologically sorted order. We assume the edges are given in a list in no particular order to start. Each vertex v also has a priority \(\rho (v)\). The goal is to compute \(f(v)\) for each vertex, where \(f(v) = \max (\rho (v), \max \left\lbrace f(u) : (u,v) \in E\right\rbrace)\), with \(f(u)\) defined recursively. We use an external-memory priority queue, which supports insert, delete, and extract-min at an amortized cost of \(O(\frac{1}{B} \log _{M/B} (N/B))\) I/Os per operation [4].
The vertices are processed in topological order. To process a vertex v, first compute \(f(v)\), as described below. Then, for each outgoing edge \((v,w) \in E\), insert an element into the priority queue keyed on w, i.e., on the position of w in topologically sorted order, and augmented with \(f(v)\). Each edge is inserted into the priority queue exactly once, so each vertex has one element in the priority queue per incoming edge. To compute \(f(v)\), perform one extract-min for each incoming edge of v, and set \(f(v)\) to the maximum of \(\rho (v)\) and the values \(f(w)\) carried by the extracted elements for the incoming edges \((w,v)\).
Because vertices are processed in topological order, edges are inserted in topological order of their source vertices; by sorting the edge list by source vertex, a single scan of the edge list suffices to drive all the insertions. The number of inserts into the priority queue is \(O(E)\), as is the number of extract-mins. In total, this is \(O(\frac{E}{B} \log _{M/B} (E/B))\) I/Os.
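The following sketch renders the procedure in Python, with heapq standing in for the I/O-efficient buffer-tree priority queue and in-memory maps standing in for the sorted vertex and edge arrays:

```python
import heapq

def time_forward_max(V, E, rho):
    """V: vertices in topological order. E: list of (u, v) edges.
    Computes f(v) = max(rho(v), max{f(u) : (u,v) in E}) for every v."""
    pos = {v: i for i, v in enumerate(V)}  # rank in topological order
    out = {v: [] for v in V}               # out-edges (built by sorting E by source)
    indeg = {v: 0 for v in V}
    for (u, v) in E:
        out[u].append(v)
        indeg[v] += 1
    pq, f = [], {}
    for v in V:                            # process vertices in topological order
        best = rho[v]
        for _ in range(indeg[v]):          # one extract-min per incoming edge;
            _, fu = heapq.heappop(pq)      # all entries keyed pos(v) are now minimal
            best = max(best, fu)
        f[v] = best
        for w in out[v]:                   # send f(v) forward along each out-edge
            heapq.heappush(pq, (pos[w], f[v]))
    return f
```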
Implementing violated-edge propagation. This step can be accomplished by sorting and scanning. In particular, first sort the edges \((u,v)\) by \(\mathit {index}(u)\) so that all outgoing edges for a particular vertex u are consecutive. Then scan through the vertices and edges simultaneously, synchronizing on the vertex index. Attach to the edge \((u,v)\) the priority \(l(u,v) = l(u)\). Next, sort the edges \((u,v)\) by \(\mathit {index}(v)\). Now the incoming edges for each vertex v are consecutive. Finally, scan through the edges and vertices simultaneously, and for each v update \(l^{\prime }(v) = \max \left\lbrace l(v),\max \left\lbrace l(u) : (u,v) \in E\right\rbrace \right\rbrace\).
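A sketch of one violated-edge propagation step, with Python sorts and dictionary lookups standing in for the I/O-efficient sorts and synchronized scans:

```python
def violated_edge_propagation(E, l):
    """One step: l'(v) = max(l(v), max{l(u) : (u,v) in E}).
    E is a list of (u, v) pairs; l maps each vertex to its current label."""
    # Sort edges by source and attach l(u) to each edge (u, v).
    tagged = [(u, v, l[u]) for (u, v) in sorted(E, key=lambda e: e[0])]
    # Re-sort by destination so each vertex's in-edges are consecutive,
    # then fold the attached labels into the destination's label.
    l_new = dict(l)
    for (_, v, lu) in sorted(tagged, key=lambda e: e[1]):
        l_new[v] = max(l_new[v], lu)
    return l_new
```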
Implementing the recursion. To slightly simplify the analysis of the I/O cost, it is convenient to reason about the algorithm as performing the recursion level by level.1 That is, do not actually make multiple recursive calls. Instead, perform the algorithm as described for the entire level at once. The only additional bookkeeping necessary is to delimit the boundaries between each recursive subproblem in the vertex array. Each level of recursion can then be implemented by first scanning through the vertex array and tagging each vertex with a subproblem ID (increasing by one when crossing each subproblem boundary) and similarly tagging the edges with the subproblem IDs of its endpoints. All edges whose endpoints have different subproblems should be ignored in all steps. Whenever sorting vertices by label or priority, the subproblem ID should also be taken into account as the most significant feature in the sort. (That is, sort lexicographically by subproblem ID, then label/priority.) The other details are unchanged.
We can now analyze the I/O cost of the recursive algorithm. A cache of at least some constant size is assumed, as this is necessary to implement a constant number of synchronized scans in \(O(\mathit {scan}(N))\) I/Os. Similar cache-size assumptions appear in time-forward processing [4] (though they are not highlighted in those theorem statements) and carry over to this setting. The following theorem comes from Theorem 3 and Section 4.1 in [4].
4 Topological Sort Analysis
This section analyzes the topological-sort algorithm given in Section 3. The goal is to show that, with high probability, the main algorithm completes after \(O(\log V)\) executions of RecurTS. The key component toward achieving this goal is to show that in each execution, each violated edge has a constant probability of becoming satisfied. The bulk of this section is devoted to proving this claim. Given the claim, it is simple to show that \(O(\log V)\) executions suffice.
Consider any violated edge \((u,v)\) and an execution of RecurTS. The most important point of the execution is the moment, if any, that u and v receive different priorities and are hence placed in different recursive subproblems. This step is the only time during the execution that the relative order of u and v may change. If u is ordered before v, then the edge becomes satisfied. If u remains ordered after v, however, then the edge \((u,v)\) cannot become satisfied for the remainder of the execution (i.e., until the next execution of RecurTS). The following definition captures this bad outcome: the edge \((u,v)\) is broken if, at the moment u and v are placed in different recursive subproblems, v’s subproblem is ordered before u’s.
In Figure 1(c), the broken edges are \((L,F)\), \((I,H)\), \((O,A)\), and \((O,N)\).
As outlined in Section 1.3.1, our analysis consists of two main components. First, we argue that, for large enough K, an edge \((u,v)\) has at most constant probability of becoming broken during an execution of RecurTS. Second, we argue that for large enough \(\lambda\), the execution is likely to terminate with u and v in different subproblems. If u and v are in different subproblems, and the edge is not broken, then it must be satisfied. The remainder of the section focuses on proving each of these claims.
Predecessors. Throughout the analysis, it is useful to refer to the set of predecessors of a particular vertex. Let \(G=(V,E)\) be a graph, let \(v \in V\) be a vertex in the graph, and let d be a distance. We define the d-hop predecessors of v, denoted by \(A(G,v,d)\), as \(A(G,v,d) = \left\lbrace x : x \preceq _d v \text{ in $G$}\right\rbrace\).
4.1 Bounding the Probability of an Edge Becoming Broken
We argue that in any level of recursion, a particular violated edge \((u,v)\) has probability at most \(O(\log V/ K)\) of becoming broken. Taking a union bound across all \(\lambda\) levels of recursion gives a probability of at most \(O(\lambda \log V / K)\) that the edge becomes broken across the entire execution of RecurTS. Setting \(K=\Omega (\lambda \log V)\) and tuning constants appropriately, the probability that \((u,v)\) becomes broken is upper bounded by a constant.
We begin by considering how an edge can become broken. The following lemma implies that an edge \((u,v)\) can become broken only if u’s highest-priority d-hop predecessor is located exactly d violated hops away.
We next bound the probability that an edge \((u,v)\) becomes broken in a particular recursive call. Consider the following random process. First choose a random distance d. Then identify which vertex, from among the d-hop predecessors of u, has the highest priority. Specifically, determine whether the highest-priority predecessor is also a \((d-1)\)-hop predecessor; if so, by the previous lemma the edge does not break. The probability of the edge breaking thus depends on the relative sizes of \(A(G^{\prime },u,d)\) and \(A(G^{\prime },u,d-1)\). The main idea is therefore to characterize distances by relative neighborhood sizes.
The argument is roughly as follows, but the following lemma provides a tighter bound. A distance d is “bad” if at least a \(1/\log V\)-fraction of the d-hop predecessors are at distance exactly d, i.e., not also \((d-1)\)-hop predecessors. If a bad distance is selected, the probability of the edge breaking may be high. Fortunately, due to the expansion implied by bad distances, there cannot be too many bad distances—specifically only \(O(\log ^2 V)\) of them. If a good distance is selected, the probability that the edge breaks is at most \(O(1/\log V)\). Putting these together, the probability that the edge breaks is \(O(\log ^2 V / K + 1/\log V)\). The next lemma improves this to \(O(\log V / K)\) by more carefully accounting for how bad each distance is.
Proof. Let B denote the event that the edge \((u,v)\) is broken. Let d denote the random distance chosen, and let x be the vertex in \(A(G^{\prime },u,d)\) with highest priority. By Lemma 4.2, \({\mathbf {Pr}\left[\,{B}\,\right]} \le {\mathbf {Pr}\left[\,{x \not\in A(G^{\prime },u,d-1)}\,\right]}\), so it suffices to bound the latter.
For each possible d in \([d_{\min }-1,d_{\max })\), let \(s_d = \left|A(G^{\prime },u,d)\right|\) denote the number of d-hop predecessors of u in \(G^{\prime }\). Define \(\gamma _d = s_{d-1}/s_d\) to be the fraction of u’s d-hop predecessors that are also \((d-1)\)-hop predecessors.
Let \(E_d\) denote the event that distance d is chosen. Once d is fixed, we trivially have \({\mathbf {Pr}\left[\,{B | E_d}\,\right]} \le 1-\gamma _d\). Since the distance is chosen uniformly at random from K possibilities, we have
\[ {\mathbf {Pr}\left[\,{B}\,\right]} = \frac{1}{K}\sum _{d=d_{\min }}^{d_{\max }-1} {\mathbf {Pr}\left[\,{B \mid E_d}\,\right]} \le \frac{1}{K}\sum _{d=d_{\min }}^{d_{\max }-1} (1-\gamma _d). \]
The vertex u is a d-hop predecessor of itself, and at most every vertex is a d-hop predecessor of u, so \(1\le s_d \le \left|V\right|\). We therefore have \(\left|V\right| \ge s_{d_{\max }-1} \ge s_{d_{\max }-1} / s_{d_{\min }-1} = \prod _{d=d_{\min }}^{d_{\max }-1}(1/\gamma _d)\). By monotonicity of the \(\lg\) function, \(\lg (\left|V\right|) \ge \sum _{d=d_{\min }}^{d_{\max }-1} \lg (1/\gamma _d)\). Finally, \(\gamma _d \in (0,1]\), and for this range \(\lg (1/\gamma _d) \ge 1-\gamma _d\). We therefore have \(\lg (\left|V\right|) \ge \sum _{d=d_{\min }}^{d_{\max }-1} (1-\gamma _d)\).
Substituting back for the probability of B, we have
\[ {\mathbf {Pr}\left[\,{B}\,\right]} \le \frac{1}{K}\sum _{d=d_{\min }}^{d_{\max }-1}(1-\gamma _d) \le \frac{\lg \left|V\right|}{K} = O\!\left(\frac{\log V}{K}\right). \]
4.2 Bounding the Probability that an Edge Crosses Subproblems
The second key component of the analysis is to argue that at the end of an execution, the edge \((u,v)\) is likely to cross subproblem boundaries. To achieve this goal, we argue that with each level of recursion, v is likely to lose a constant fraction of its nearby predecessors. Thus, with \(\Omega (\log V)\) levels of recursion, it is very likely that v has no predecessors. If v has no predecessors, then u must be in a different subproblem.
Definitions. More formally, consider the calls RecurTS(\(G, i, j,\ell\)) arising during the execution of the recursive algorithm, where \(\ell\) here denotes the level or depth of the recursion. If \(v \in G[i \dots j]\), we call \(G_v^\ell = G[i \dots j]\) the level-\(\ell\) graph of v. Notice that v belongs to at most one subproblem at each level of recursion. If v does not belong to any level-\(\ell\) subproblem (i.e., if the base case was reached early), then \(G_v^\ell\) is the empty graph. Thus, v has a corresponding sequence \(G_v^0, G_v^1,\ldots , G_v^\lambda\) of level-\(0,1,\ldots ,\lambda\) graphs, where \(G_v^0 \supseteq G_v^1 \supseteq \cdots \supseteq G_v^\lambda\).
For this subsection, the important feature is the number of nearby, proper predecessors v has at each level of recursion. A vertex x is a level-\(\ell\) active predecessor of v if \(x \ne v\) and \(x \preceq _{d_{\max }} v\) in \(G_v^\ell\), where \(d_{\max } = K(\lambda - \ell)\) is the maximum distance for this level of recursion. Notice that v is not an active predecessor of itself.
Reducing the number of active predecessors.
We start with a simple observation, captured by the first lemma: no new relationships are created between vertices as the algorithm recurses. Thus, we need not worry about the set of active predecessors growing—the only challenge is to show that a significant fraction of the predecessors are likely to be knocked out.
We are now ready to argue that the number of active predecessors is likely to decrease at each level of recursion. This proof leverages only the random priorities—the fact that distances are chosen randomly is not important. The proof (notably the second claim therein) lifts some ideas from [16, Lemma 3.4].
Lemma 4.5 indicates that with each level of recursion, the number of active predecessors is likely to decrease by a constant factor. The following lemma says that after enough levels of recursion, v is likely to have no remaining active predecessors. The implication is that all of its incoming edges cross subproblem boundaries.
4.3 Edges are Likely to Become Satisfied
Thus far, we have argued that edges are unlikely to become broken at any particular level of recursion, and that edges are likely to cross subproblem boundaries by the time the recursive algorithm terminates. This section combines those pieces to conclude that in a single execution of the recursive algorithm, a violated edge is likely to become satisfied.
Before getting to the main claim, we first observe that satisfied edges stay satisfied. This fact is important both to argue that a violated edge is likely to become satisfied in a single execution, and to argue that multiple executions lead to monotonic progress.
4.4 Bounds on the Main Algorithm
Finally, we analyze the main algorithm, which repeatedly executes RecurTS until the graph is topologically sorted.
5 Strongly Connected Components
This section describes our algorithm for strongly connected components. Given a graph \(G=(V,E)\), vertices \(u,v \in V\) are strongly connected if there exist directed paths both from u to v and from v to u. A strongly connected component is a maximal set of vertices such that every pair of vertices therein is strongly connected. The condensation H of a graph G is the DAG of strongly connected components, i.e., the graph formed if each strongly connected component is contracted. The goal is to identify for each vertex the strongly connected component to which it belongs and to topologically sort the condensation.
At a high level, the main intent of the algorithm is similar to Algorithm 1—reorder vertices to satisfy more edges. But it would, of course, be impossible to simultaneously satisfy all edges on a cycle. Our algorithm for strongly connected components therefore performs a little extra work to identify strongly connected vertices, notably those falling on short cycles, and contracts them into a single supervertex. The graph is thus gradually transformed into its condensation; with each iteration, the number of violated edges may decrease both through the removal of contracted edges from the graph and through the reordering of the remaining supervertices.
Aside from the contraction, component maintenance, and extra bookkeeping, the main difference between the algorithms for topological sort and strongly connected components is that the former propagates priorities in only the forward direction, whereas the latter propagates priorities both forwards and backwards. This two-directional propagation facilitates the discovery of cycles.
5.1 Algorithm
Algorithm 2 presents a conceptual version of the algorithm for topologically sorting the condensation H of the graph \(G=(V,E)\). Section 5.2 provides implementation details for mapping this algorithm to the I/O model.
The algorithm maintains three types of information: (1) a mapping from vertices in the original graph to (partial) components, where each partial component is a subset of vertices in a strongly connected component; (2) a graph \(H=(V_H,E_H)\) on the partial components, corresponding to the graph formed by contracting each partial component in G; and (3) an ordering of the vertices \(V_H\) in the component graph. As the algorithm progresses, components are merged together through contraction steps. When the algorithm terminates, H is the condensation, and the vertex ordering represents a topological sort of the condensation.
As before, the top-level algorithm consists of multiple iterations. But now each iteration consists of not only an execution of RecurSCC, but also a contraction step following the execution. Each execution of RecurSCC is analogous to RecurTS, except that some vertices are flagged for contraction. Specifically, the output of RecurSCC is an updated ordering of the vertices \(V_H\) in the component graph H as before, but unlike RecurTS some contiguous sets of vertices are flagged for contraction. Any initially satisfied edges between unflagged vertices remain satisfied, as before, and ideally some violated edges become satisfied. During the contraction step, sets of vertices identified as being strongly connected are contracted, removing any edges between them.
The Recursive Subroutine
The recursive subroutine RecurSCC is parameterized by global values \(\lambda\) and K, denoting the maximum recursion depth and the range of distances to choose from, respectively. RecurSCC takes as input an induced subgraph \(H[i \dots j]\) of the graph on partial components, and the current recursion depth \(\mathit {depth}\).
RecurSCC proceeds as follows. Much of the algorithm is similar to RecurTS of Section 3. Firstly, check if the recursion depth is exceeded (i.e., \(\mathit {depth} \ge \lambda\)), and if so simply return. Otherwise, choose a distance d uniformly at random from the range \([d_{\min },d_{\max })\), where \(d_{\min } = d_{\max } - K\). As in Section 3, the offset for the range is chosen according to the recursion depth, with \(d_{\max } = (\lambda - \mathit {depth}) \cdot K\).
Next, assign each vertex a priority \(\rho (v)\), with the priorities forming a uniformly random permutation. Unlike RecurTS, RecurSCC propagates the priorities in both the forward direction and the backward direction. Specifically, define \(f(v) = \max \left\lbrace \rho (u) : u\preceq _d v\right\rbrace\) and \(b(v) = \max \left\lbrace \rho (w) : v \preceq _d w\right\rbrace\). Assign a label \(l(v)\) to each vertex v based on the results of the forward and backward searches. There are three cases. If the priority from the forward search dominates, i.e., \(f(v) \gt b(v)\), then \(l(v) = f(v)\) as in RecurTS. If the priority from the backward search dominates, i.e., \(b(v) \gt f(v)\), then \(l(v) = -b(v)\). These two cases are symmetric—vertices dominated by larger priorities in the forward direction are pushed later in the ordering, and vertices dominated by larger priorities in the backward direction are pushed earlier in the ordering. The third case is that the priorities in the two directions are equal, i.e., \(f(v) = b(v)\). In this case, set \(l(v) = -b(v) + 1/2\).
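The three-way labeling rule, written out as a sketch over a single vertex's propagated maxima; since priorities are at least 1, the third case's label satisfies \(-b(v) \lt l(v) \lt f(v)\), as required below.

```python
def scc_label(f_v, b_v):
    """Compute l(v) from the forward maximum f(v) and backward maximum b(v)."""
    if f_v > b_v:
        return f_v          # forward search dominates: push v later
    if b_v > f_v:
        return -b_v         # backward search dominates: push v earlier
    return -b_v + 0.5       # f(v) == b(v): non-integer label flags contraction
```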
Finally, sort the vertices by \(l(v)\), with ties broken according to the current ordering. After sorting, partition the vertices into groups of vertices having the same label, as in RecurTS. Recurse on those groups with integer labels, i.e., \(f(v) \ne b(v)\). The groups with non-integer labels are instead flagged for contraction.
Note that the specific choice of label \(-b(v)+1/2\) for the third case is not particularly important. The only truly important aspect is that \(-b(v) \lt l(v) \lt f(v)\) to ensure that satisfied edges remain satisfied. In fact, the ordering across groups of vertices that fall in this third case, for different priorities, does not matter. It is, however, easier to implement the subsequent contraction if each such group of vertices is contiguous. To achieve that, we include the dominating priority in the label, e.g., choosing \(l(v) = -b(v)+1/2\); many other choices would also suffice.
The main theorem, proved in Section 5.4, is the following.
5.2 I/O-Efficient Details
This section provides details on making the algorithm I/O efficient. The original vertices of graph G are stored in an array V. A second array \(V_H\) stores the vertices of graph H. Each vertex in H corresponds to a partial component in G that has been contracted, identified by the ID of a representative vertex. The edges between components are stored in an array \(E_H\), with each edge storing the component IDs of its endpoints.
All vertex records \(u \in V\) for the original graph are tagged with the ID \(c(u)\) of their component’s representative, which corresponds to the ID of a vertex in \(V_H\). For convenience, the vertex records \(v \in V_H\) are also tagged with \(c(v)\). Initially, \(c(u) = u\) for all \(u \in V\), and \(V_H = V\). In general, between iterations, \(c(v) = v\) if and only if \(v \in V_H\). When the algorithm terminates, the vertices representing each strongly connected component are topologically ordered in \(V_H\), and for each vertex \(u\in V\), \(c(u)\) specifies the representative of u’s strongly connected component.
The details for the recursive algorithm are similar to those for RecurTS in Section 3.2. The main difference is that priorities must be propagated in two directions, computing \(b(v)\) in addition to the \(f(v)\) already computed in RecurTS. The steps for \(b(v)\) are symmetric, operating on the transpose graph, which can be computed in \(O(\mathit {sort}(E))\) I/Os.
The only significant difference is implementing the main loop, namely, in the contraction step.
Contraction. By design, when RecurSCC returns, groups of vertices that are to be contracted are contiguous in the vertex array \(V_H\). When a group is flagged for contraction, the boundaries of the group should also be marked.
The first step of the contraction is to update \(c(v)\) for all vertices \(v\in V_H\) to be contracted. Specifically, the first vertex x (in array order) in each group is the representative for the group. All other vertices in the group update \(c(u) = x\). This step can be accomplished by a scan of the array \(V_H\).
The next step is to update the components for vertices stored in V. Specifically, each vertex \(u \in V\) has some component ID \(c(u) = v\), where \(v \in V_H\). The goal is to update \(c(u) = c(v)\). To do so, sort V by component IDs \(c(u)\) and sort \(V_H\) by vertex IDs v. Thus, both V and \(V_H\) are sorted according to their original components. Moreover, the vertices \(v \in V_H\) already know their new component \(c(v)\). Next, perform synchronized scans of V and \(V_H\), and for each vertex \(u \in V\), update \(c(u) = c(v)\).
Any vertices \(v \in V_H\) with \(c(v) \ne v\) can now be removed from \(V_H\). This step can be accomplished with a scan. At this point, all vertices have the correct component IDs, and only representatives are stored in \(V_H\).
The final step is to update the edges \(E_H\). Specifically, any edge \((u,v)\) should be updated to reflect the component IDs of its endpoints, i.e., \((c(u),c(v))\). To do so, sort V by ID, and sort the edges \((u,v) \in E_H\) by the ID of u. Next, perform synchronized scans of \(E_H\) and V, updating u to \(c(u)\) for each edge \((u,v)\). Then sort the edges by the ID of the other endpoint v and perform a similar update. Finally, self-loops can be removed by scanning through all the edges one last time and checking for any edges of the form \((u,u)\). Optionally, duplicate edges can also be removed by sorting the edges one last time (by both endpoints) and scanning through to remove duplicates.
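In outline, the whole contraction step can be sketched as follows; dictionary lookups and Python sorts stand in for the sorts and synchronized scans just described, and the `groups` argument (a list of flagged index ranges in \(V_H\)) is our hypothetical representation of the contraction flags.

```python
def contract(V, V_H, E_H, c, groups):
    """Contract each flagged group of V_H; c maps every original vertex
    to the ID of its component's representative."""
    # 1. The first vertex of each flagged group becomes the representative.
    for (s, e) in groups:
        rep = V_H[s]
        for v in V_H[s:e + 1]:
            c[v] = rep
    # 2. Update the original vertices: c(u) <- c(c(u)).
    #    (I/O version: sort V by c(u), sort V_H by ID, scan both together.)
    for u in V:
        c[u] = c[c[u]]
    # 3. Keep only representatives in V_H.
    V_H[:] = [v for v in V_H if c[v] == v]
    # 4. Rewrite edges onto component IDs; drop self-loops and duplicates.
    E_H[:] = sorted({(c[u], c[v]) for (u, v) in E_H if c[u] != c[v]})
```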
5.3 I/O Complexity of Strongly Connected Components
Assuming Theorem 5.1, we now bound the I/O cost of the algorithm.
5.4 Strongly Connected Components Analysis
The goal of this section is to prove Theorem 5.1, i.e., that \(O(\log V)\) executions of the main loop suffice, with high probability. The analysis follows a similar structure to the analysis of topological sort in Section 4. The main goal is to show that any violated edge \((u,v) \in E_H\) has a constant probability of either becoming satisfied in an execution of RecurSCC or being contracted away thereafter. Since any cycle in the graph must have at least one violated edge, satisfying all remaining edges implies that all cycles have been contracted, and the condensation of the graph is topologically sorted. Given that claim, it is easy to show that \(O(\log V)\) iterations suffice.
As before, the analysis consists of two main components applied to graph H. A minor difference is that groups of vertices to be contracted are technically not part of a recursive subproblem. Insofar as definitions are concerned (e.g., being broken), when we say “subproblem” we mean each group of vertices produced by the partitioning step in the algorithm, either corresponding to a group to be contracted or a recursive call.
The first component of the analysis is to show that an edge is unlikely to break. This component is largely similar to the corresponding component in Section 4, except that edges may break due to searches in either direction. Note that, conveniently, edges within a group to be contracted are never broken as these edges do not cross subproblem boundaries.
The goal of the second component is now to show that for any edge \((u,v) \in E_H\), the execution of RecurSCC is likely to end either with u and v in different subproblems, or with u and v marked for contraction with each other. If the edge is not broken, and u and v are in different subproblems, then the edge becomes satisfied. If u and v are contracted, then the edge is removed from the graph entirely because the contraction step removes self-loops.
Successors. The main differences in the analysis arise from the fact that priorities are propagated in two directions. It is thus no longer sufficient to focus just on the d-hop predecessors. We must also consider the successors. We define the d-hop successors of v, denoted by \(D(G,v,d)\), as \(D(G,v,d) = \left\lbrace x : v \preceq _d x \text{ in $G$}\right\rbrace\).
The presence of backward propagation impacts that analysis in various places, most of which are minor. The most difficult is in arguing progress with respect to the number of nearby vertices. Rather than argue progress on just the number of nearby (active) predecessors, Lemma 5.8 argues progress on both nearby predecessors and successors.
5.4.1 Bounding the Probability of an Edge Becoming Broken.
The following lemmas are analogous to Lemmas 4.2 and 4.3. Notably, an edge cannot be broken unless either its d-hop predecessor or successor is exactly d hops away, which is unlikely to occur.
5.4.2 Bounding the Probability that an Edge Crosses Subproblems.
The analysis here is analogous to Section 4.2. The main difference is that here we consider the number of active vertices in both directions, not just predecessors. For this section, we adopt the same notion of level-\(\ell\) graphs as in Section 4.2, except applied to the contracted graph H instead of the original graph G. Note that \(H_v^\ell\) is the empty graph if v is no longer part of a recursive subproblem, which can now also occur if v is marked for contraction before the \(\ell\)-th level of recursion.
This first lemma says that vertices do not get closer together, i.e., no new relationships are created, when recursing. Consequently, it is sufficient to argue that a constant fraction of related vertices are likely to be knocked out.
The remainder of the section focuses on the number of active vertices, except we now consider both predecessors and successors.
Consider any edge \((u,v)\) that is violated at the start of the recursive algorithm. Observe that if v has no level-\(\ell\) active predecessors, then either \((u,v)\) falls within a group marked for contraction, or \((u,v)\) crosses a subproblem boundary.
The analysis differs here from topological sort because the subproblem a vertex lands in derives from both predecessors and successors. In particular, the label \(l(v)\) of a vertex v is based on whether the forward or backward search dominates. Our claim here is that the sum of the number of active predecessors and active successors decreases by a constant factor in each level of recursion, with constant probability.
The preceding lemma indicates that each level of recursion is likely to reduce the total number of active vertices by a constant factor. The following lemma applies this lemma across \(\lambda\) levels of recursion to conclude that v is likely to be in its own subproblem, or marked for contraction, before the recursion bottoms out.
5.4.3 Edges are Likely to Become Satisfied.
We have argued that in each execution of the recursive algorithm, a particular edge \((u,v)\) is unlikely to become broken, and moreover that v is likely to either be in its own subproblem or marked for contraction. The implication is that if both favorable outcomes occur, the edge \((u,v)\) either crosses subproblem boundaries and becomes satisfied, or both u and v are contracted with each other and the edge disappears. Completing this argument again requires monotonic progress on satisfied edges. The following lemma says that satisfied edges never become violated later.
5.4.4 Bounds on the Main Algorithm.
We next prove Theorem 5.1, which states that the algorithm topologically sorts the condensation of the graph after \(\left\lceil (c+2)\lg V \right\rceil\) executions with failure probability at most \(1/V^c\), for any \(c \gt 0\).
Proof of Theorem 5.1. By Lemma 5.12, the algorithm never performs any erroneous contractions. If the algorithm terminates, it must therefore be the case that the graph is topologically sorted, which is only possible if there are no cycles, i.e., if all strongly connected components have been contracted.
Lemma 5.10 says that violated edges are never introduced. Moreover, by Lemma 5.11, each violated edge has a constant probability of being removed or becoming satisfied. The rest of the proof is the same as Theorem 4.9.
6 Conclusions
This article has presented the first algorithm for topological sort and related problems that is I/O efficient even for sparse graphs.
The main question remaining is whether the algorithm can be improved to achieve an I/O cost \(O(\mathit {sort}(E)\cdot \log ^x V)\), for \(x \lt 5\). One of the logarithmic factors arises from the fact that the number of distances K is large (i.e., \(K=\Theta (\log ^2 V)\)). Another arises from the fact that distance ranges do not overlap at each level of recursion. We suspect that at least one of these logarithmic factors can be removed, yielding \(x=4\) or potentially even \(x=3\). Achieving \(x \lt 3\), however, seems difficult. Inherent in the approach are at least two logarithmic factors: the number of iterations and the number of levels of recursion. Moreover, reducing the number of distances to \(K=O(1)\), which would be necessary to get \(x\lt 3\), would require some significant new ideas.
Another interesting question is whether randomization is necessary for these problems. Randomization plays a key role in our algorithm, but there may be alternative approaches.
Footnote
1. The issue is that there are no bounds on the relative sizes of a problem and its recursive subproblems: a graph that fits in cache may be partitioned into subgraphs that are much smaller than a block. If each recursive subproblem were considered one at a time, the analysis would have to account carefully for these small subproblems.
References
James Abello, Adam L. Buchsbaum, and Jeffery Westbrook. 1998. A functional approach to external graph algorithms. In Proceedings of the 6th Annual European Symposium on Algorithms. 332–343. http://dl.acm.org/citation.cfm?id=647908.740141.
Alok Aggarwal and Jeffrey Vitter. 1988. The input/output complexity of sorting and related problems. Communications of the ACM 31, 9 (1988), 1116–1127.
Deepak Ajwani, Adan Cosgaya-Lozano, and Norbert Zeh. 2011. Engineering a topological sorting algorithm for massive graphs. In Proceedings of the Meeting on Algorithm Engineering & Experiments. 139–150.
Lars Arge. 1995. The buffer tree: A new technique for optimal I/O-algorithms. In Proceedings of the Workshop on Algorithms and Data Structures. 334–345.
Lars Arge, Gerth Stølting Brodal, and Laura Toma. 2004. On external-memory MST, SSSP and multi-way planar graph separation. Journal of Algorithms 53, 2 (2004), 186–206. DOI:https://doi.org/10.1016/j.jalgor.2004.04.001
Lars Arge, Ulrich Meyer, and Laura Toma. 2004. External memory algorithms for diameter and all-pairs shortest-paths on sparse graphs. In Proceedings of the 31st International Colloquium on Automata, Languages and Programming. 146–157.
Lars Arge and Morten Revsbæk. 2009. I/O-efficient contour tree simplification. In Proceedings of the International Symposium on Algorithms and Computation. 1155–1165.
Lars Arge, Laura Toma, and Norbert Zeh. 2003. I/O-efficient topological sorting of planar DAGs. In Proceedings of the 15th Annual ACM Symposium on Parallel Algorithms and Architectures. 85–93.
Adam L. Buchsbaum, Michael Goldwasser, Suresh Venkatasubramanian, and Jeffery R. Westbrook. 2000. On external memory graph traversal. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms. 859–860. http://dl.acm.org/citation.cfm?id=338219.338650.
Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, and Jeffrey Scott Vitter. 1995. External-memory graph algorithms. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms. 139–149. http://dl.acm.org/citation.cfm?id=313651.313681.
Rezaul Alam Chowdhury and Vijaya Ramachandran. 2005. External-memory exact and approximate all-pairs shortest-paths in undirected graphs. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. 735–744. http://dl.acm.org/citation.cfm?id=1070432.1070536
Don Coppersmith, Lisa Fleischer, Bruce Hendrickson, and Ali Pinar. 2005. A divide-and-conquer algorithm for identifying strongly connected components. Technical Report RC23744. IBM Research.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms (2nd ed.). MIT Press, Cambridge, MA, USA.
David Eppstein, Zvi Galil, Giuseppe F. Italiano, and Amnon Nissenzweig. 1997. Sparsification—A technique for speeding up dynamic graph algorithms. Journal of the ACM 44, 5 (1997), 669–696. DOI:https://doi.org/10.1145/265910.265914
Jeremy T. Fineman. 2018. Nearly work-efficient parallel algorithm for digraph reachability. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 457–470.
Jelle Hellings, George H. L. Fletcher, and Herman Haverkort. 2012. Efficient external-memory bisimulation on DAGs. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 553–564.
Vijay Kumar and Eric J. Schwabe. 1996. Improved algorithms and data structures for solving graph problems in external memory. In Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing. 169–176. http://dl.acm.org/citation.cfm?id=829517.830723.
Kurt Mehlhorn and Ulrich Meyer. 2002. External-memory breadth-first search with sublinear I/O. In Proceedings of the 10th Annual European Symposium on Algorithms. 723–735. http://dl.acm.org/citation.cfm?id=647912.740673.
Ulrich Meyer and Norbert Zeh. 2003. I/O-efficient undirected shortest paths. In Proceedings of the 11th Annual European Symposium on Algorithms. 434–445.
Ulrich Meyer and Norbert Zeh. 2006. I/O-efficient undirected shortest paths with unbounded edge lengths. In Proceedings of the 14th Annual European Symposium on Algorithms. 540–551.
Kameshwar Munagala and Abhiram Ranade. 1999. I/O-complexity of graph algorithms. In Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms. 687–694. http://dl.acm.org/citation.cfm?id=314500.314891.
Jeffrey D. Ullman and Mihalis Yannakakis. 1991. The input/output complexity of transitive closure. Annals of Mathematics and Artificial Intelligence 3, 2 (1991), 331–360.
Jeffrey Scott Vitter. 2008. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science 2, 4 (2008), 305–474.