Search | arXiv e-print repository

Parallel Graph Coloring Algorithms for Distributed GPU Environments

Authors: Ian Bogle, Erik G Boman, Karen D Devine, Sivasankaran Rajamanickam, George M Slota

Abstract: Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but to the best of our knowledge, hybrid MPI+GPU algorithms have been unexplored until this work. We present several MPI+GP… ▽ More Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but to the best of our knowledge, hybrid MPI+GPU algorithms have been unexplored until this work. We present several MPI+GPU coloring approaches based on the distributed coloring algorithms of Gebremedhin et al. and the shared-memory algorithms of Deveci et al. . The on-node parallel coloring uses implementations in KokkosKernels, which provide parallelization for both multicore CPUs and GPUs. We further extend our approaches to compute distance-2 and partial distance-2 colorings, giving the first known distributed, multi-GPU algorithm for these problems. In addition, we propose a novel heuristic to reduce communication for recoloring in distributed graph coloring. Our experiments show that our approaches operate efficiently on inputs too large to fit on a single GPU and scale up to graphs with 76.7 billion edges running on 128 GPUs. △ Less

Submitted 30 June, 2021; originally announced July 2021.

Comments: Submitted to Parallel Computing

arXiv:1701.00503 [pdf, other]

Distributed Graph Layout for Scalable Small-world Network Analysis

Authors: George M Slota, Sivasankaran Rajamanickam, Kamesh Madduri

Abstract: The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and r… ▽ More The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and reordering vertex and edge identifiers. In this work, we present DGL, a fast, parallel, and memory-efficient distributed graph layout strategy that is specifically designed for small-world networks (low-diameter graphs with skewed vertex degree distributions). Label propagation-based partitioning and a scalable BFS-based ordering are the main steps in the layout strategy. We show that the DGL layout can significantly improve end-to-end performance of five challenging graph analytics workloads: PageRank, a parallel subgraph enumeration program, tuned implementations of breadth-first search and single-source shortest paths, and RDF3X-MPI, a distributed SPARQL query processing engine. Using these benchmarks, we additionally offer a comprehensive analysis on how graph layout affects the performance of graph analytics with variable computation and communication characteristics. △ Less

Submitted 2 January, 2017; originally announced January 2017.

arXiv:1610.07220 [pdf, other]

Partitioning Trillion-edge Graphs in Minutes

Authors: George M Slota, Sivasankaran Rajamanickam, Karen Devine, Kamesh Madduri

Abstract: We introduce XtraPuLP, a new distributed-memory graph partitioner designed to process trillion-edge graphs. XtraPuLP is based on the scalable label propagation community detection technique, which has been demonstrated as a viable means to produce high quality partitions with minimal computation time. On a collection of large sparse graphs, we show that XtraPuLP partitioning quality is comparable… ▽ More We introduce XtraPuLP, a new distributed-memory graph partitioner designed to process trillion-edge graphs. XtraPuLP is based on the scalable label propagation community detection technique, which has been demonstrated as a viable means to produce high quality partitions with minimal computation time. On a collection of large sparse graphs, we show that XtraPuLP partitioning quality is comparable to state-of-the-art partitioning methods. We also demonstrate that XtraPuLP can produce partitions of real-world graphs with billion+ vertices in minutes. Further, we show that using XtraPuLP partitions for distributed-memory graph analytics leads to significant end-to-end execution time reduction. △ Less

Submitted 23 October, 2016; originally announced October 2016.

Showing 1–3 of 3 results for author: Slota, G M