Performance guarantees for distributed reachability queries

W Fan, X Wang, Y Wu - arXiv preprint arXiv:1208.0091, 2012 - arxiv.org
W Fan, X Wang, Y Wu
arXiv preprint arXiv:1208.0091, 2012arxiv.org
In the real world a graph is often fragmented and distributed across different sites. This
highlights the need for evaluating queries on distributed graphs. This paper proposes
distributed evaluation algorithms for three classes of queries: reachability for determining
whether one node can reach another, bounded reachability for deciding whether there
exists a path of a bounded length between a pair of nodes, and regular reachability for
checking whether there exists a path connecting two nodes such that the node labels on the …
In the real world a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there exists a path of a bounded length between a pair of nodes, and regular reachability for checking whether there exists a path connecting two nodes such that the node labels on the path form a string in a given regular expression. We develop these algorithms based on partial evaluation, to explore parallel computation. When evaluating a query Q on a distributed graph G, we show that these algorithms possess the following performance guarantees, no matter how G is fragmented and distributed: (1) each site is visited only once; (2) the total network traffic is determined by the size of Q and the fragmentation of G, independent of the size of G; and (3) the response time is decided by the largest fragment of G rather than the entire G. In addition, we show that these algorithms can be readily implemented in the MapReduce framework. Using synthetic and real-life data, we experimentally verify that these algorithms are scalable on large graphs, regardless of how the graphs are distributed.
arxiv.org