research-article

A scalable distributed graph partitioner

Published: 01 August 2015

Abstract

We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high-quality edge partitions an order of magnitude faster than both state-of-the-art offline partitioners (e.g., METIS) and streaming partitioners (e.g., Fennel). Sheep's partitions are independent of how the input graph is distributed, which means that graph elements can be assigned to processing nodes arbitrarily without affecting partition quality.
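
To make "edge partition quality" concrete: in an edge partition every edge is placed on exactly one partition, and a vertex whose edges land on several partitions must be replicated there and kept in sync. The sketch below scores an edge partition by counting those extra replicas, the vertex-cut cost model used by systems such as PowerGraph [14]. It assumes this definition of communication volume; the paper's exact metric may differ in detail, and the function names are hypothetical.

```python
from collections import defaultdict


def vertex_spans(edge_parts):
    """Map each vertex to the set of partitions holding at least one of its edges.

    edge_parts: dict mapping an edge (u, v) to its partition id.
    """
    spans = defaultdict(set)
    for (u, v), p in edge_parts.items():
        spans[u].add(p)
        spans[v].add(p)
    return spans


def communication_volume(edge_parts):
    """Assumed metric: a vertex present on p partitions costs p - 1 syncs."""
    return sum(len(ps) - 1 for ps in vertex_spans(edge_parts).values())


def replication_factor(edge_parts):
    """Average number of partitions (replicas) per vertex; 0.0 if there are none."""
    spans = vertex_spans(edge_parts)
    if not spans:
        return 0.0
    return sum(len(ps) for ps in spans.values()) / len(spans)


# Triangle 0-1-2 on partition 0, plus edge 2-3 on partition 1.
parts = {(0, 1): 0, (1, 2): 0, (0, 2): 0, (2, 3): 1}
print(communication_volume(parts), replication_factor(parts))  # 1 1.25
```

Only vertex 2 has edges on both partitions, so the volume is 1 and the average vertex lives on 5/4 partitions. A partitioner that kept all four edges together would score 0 and 1.0, at the cost of balance.
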
Sheep transforms the input graph into a strictly smaller elimination tree via a distributed map-reduce operation. By partitioning this tree, Sheep obtains a partitioning of the original graph whose communication volume is upper-bounded.
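
The elimination-tree idea can be illustrated sequentially. The sketch below builds the elimination tree of a graph for a caller-supplied vertex order using the classic path-compression construction from sparse direct solvers (see [30]), then cuts the tree greedily bottom-up and maps the cut back to an edge partition. It is not Sheep's distributed map-reduce implementation. Several choices are assumptions: each edge is charged to its earlier-eliminated endpoint, the greedy cut is a simple stand-in for a proper tree partitioner such as [21], and it may produce more than k parts. All names are hypothetical.

```python
from collections import defaultdict


def elimination_tree(n, edges, order):
    """Parent array of the elimination tree of an undirected graph.

    Vertices are 0..n-1 and `order` is a permutation giving the elimination
    order. parent[v] is always eliminated after v; roots have parent None.
    """
    pos = {v: i for i, v in enumerate(order)}
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = [None] * n
    ancestor = [None] * n  # compressed pointer toward the current subtree root
    for v in order:
        for u in adj[v]:
            if pos[u] >= pos[v]:
                continue  # only neighbours eliminated before v matter
            r = u
            while r is not None and r != v:
                nxt = ancestor[r]
                ancestor[r] = v   # path compression: shortcut straight to v
                if nxt is None:   # r was a root, so v becomes its parent
                    parent[r] = v
                r = nxt
    return parent


def tree_edge_partition(n, edges, order, k):
    """Hypothetical greedy bottom-up tree cut, mapped back to an edge partition."""
    pos = {v: i for i, v in enumerate(order)}
    parent = elimination_tree(n, edges, order)

    def owner(u, v):
        # Assumption: an edge is charged to its earlier-eliminated endpoint.
        return u if pos[u] < pos[v] else v

    acc = [0] * n  # edge weight accumulated in each vertex's open subtree
    for u, v in edges:
        acc[owner(u, v)] += 1
    cap = len(edges) / k  # target weight per part
    pending = {v: [v] for v in range(n)}  # vertices not yet assigned a part
    part, next_id = {}, 0
    for v in order:  # elimination order visits children before parents
        p = parent[v]
        if acc[v] >= cap or p is None:
            # Close a part: v's open subtree is heavy enough, or v is a root.
            for x in pending[v]:
                part[x] = next_id
            next_id += 1
        else:
            # Too light: merge v's open subtree into its parent's.
            acc[p] += acc[v]
            pending[p].extend(pending[v])
    return {(u, v): part[owner(u, v)] for u, v in edges}
```

For the path 0-1-2-3 eliminated in that order, the tree is the path itself, and with k = 2 the edges (0, 1) and (1, 2) land on part 0 while (2, 3) lands on part 1. The order matters: eliminating a star's leaves before its centre yields a shallow tree in which every leaf is a child of the centre, whereas eliminating the centre first yields a chain.
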
We describe the Sheep algorithm and analyze its space and time requirements, its partition quality, and its intuitive characteristics and limitations. We compare Sheep to contemporary partitioners and demonstrate that Sheep creates competitive partitions, scales to larger graphs, and runs faster.

References

[1] R. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature, 406(6794):378--382, 2000.
[2] S. Arifuzzaman, M. Khan, and M. Marathe. PATRIC: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management, 2013.
[3] C. Ashcraft and J. W. Liu. Robust ordering of sparse matrices using multisection. SIAM Journal on Matrix Analysis and Applications, 19(3):816--832, 1998.
[4] C. Avery. Giraph: Large-scale graph processing infrastructure on Hadoop. Hadoop Summit, 2011.
[5] H. L. Bodlaender, F. V. Fomin, A. M. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3):420--432, 2012.
[6] H. L. Bodlaender, J. R. Gilbert, H. Hafsteinsson, and T. Kloks. Approximating treewidth, pathwidth, frontsize, and shortest elimination tree. Journal of Algorithms, 18(2):238--255, 1995.
[7] P. Boldi, M. Santini, and S. Vigna. A large time-aware graph. SIGIR Forum, 42(2):33--38, 2008.
[8] F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In 20th ACM International Conference on Knowledge Discovery and Data Mining, pages 1456--1465. ACM, 2014.
[9] R. Chen, J. Shi, Y. Chen, and H. Chen. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In 10th ACM SIGOPS European Conference on Computer Systems. ACM, 2015.
[10] S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. K-core organization of complex networks. Physical Review Letters, 96(4):040601, 2006.
[11] M. Fredman and M. Saks. The cell probe complexity of dynamic data structures. In 21st ACM Symposium on Theory of Computing, pages 345--354. ACM, 1989.
[12] A. George. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis, 10(2):345--363, 1973.
[13] A. George and J. W. Liu. The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1--19, 1989.
[14] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Operating Systems Design and Implementation, volume 12, page 2, 2012.
[15] P. Heggernes. Minimal triangulations of graphs: A survey. Discrete Mathematics, 306(3):297--317, 2006.
[16] S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In Conference on Innovative Data Systems Research, volume 3, pages 1--8, 2007.
[17] S. Iyer, T. Killingback, B. Sundaram, and Z. Wang. Attack robustness and centrality of complex networks. PLoS ONE, 8(4):e59613, 2013.
[18] G. Karypis. A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. University of Minnesota, Department of Computer Science and Engineering, Minneapolis, MN, 2013.
[19] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359--392, 1998.
[20] G. Karypis and V. Kumar. A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. Journal of Parallel and Distributed Computing, 48(1):71--95, Jan. 1998.
[21] S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM Journal on Computing, 6(1):151--154, 1977.
[22] A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In Operating Systems Design and Implementation, volume 12, pages 31--46, 2012.
[23] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[24] P. Macko, V. J. Marathe, D. W. Margo, and M. I. Seltzer. LLAMA: Efficient graph analytics using large multiversioned arrays. In International Conference on Data Engineering. IEEE, 2015.
[25] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In 2010 ACM SIGMOD International Conference on Management of Data, pages 135--146. ACM, 2010.
[26] H. Miao, X. Liu, B. Huang, and L. Getoor. A hypergraph-partitioned vertex programming approach for large-scale consensus optimization. In International Conference on Big Data, pages 563--568. IEEE, 2013.
[27] R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the Graph 500. Cray User's Group (CUG), 2010.
[28] M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404--409, 2001.
[29] S. Parter. The use of linear graphs in Gauss elimination. SIAM Review, 3(2):119--130, 1961.
[30] A. Pothen and S. Toledo. Elimination structures in scientific computing. Handbook on Data Structures and Applications, pages 59--1, 2004.
[31] V. Prabhakaran, M. Wu, X. Weng, F. McSherry, L. Zhou, and M. Haradasan. Managing large graphs on multi-cores with graph awareness. In USENIX Annual Technical Conference, pages 41--52, 2012.
[32] A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric graph processing using streaming partitions. In 24th ACM Symposium on Operating Systems Principles, pages 472--488. ACM, 2013.
[33] S. Salihoglu and J. Widom. GPS: A graph processing system. In 25th International Conference on Scientific and Statistical Database Management. ACM, 2013.
[34] N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In ACM SIGMOD International Conference on Management of Data, pages 979--990. ACM, 2014.
[35] I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1222--1230. ACM, 2012.
[36] C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In 7th ACM International Conference on Web Search and Data Mining, pages 333--342. ACM, 2014.

Cited By

  • (2024) Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing. Proceedings of the ACM on Management of Data 2:6 (1-26). DOI: 10.1145/3698804. Online publication date: 20-Dec-2024.
  • (2024) PECC: parallel expansion based on clustering coefficient for efficient graph partitioning. Distributed and Parallel Databases 42:4 (447-467). DOI: 10.1007/s10619-024-07442-8. Online publication date: 1-Dec-2024.
  • (2024) Distributed k-Hop Query Powered by an Asynchronous Framework. Web Information Systems Engineering – WISE 2024 (304-319). DOI: 10.1007/978-981-96-0579-8_22. Online publication date: 2-Dec-2024.

Information

Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 12
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
August 2015
728 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 8, Issue 12

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 0
Reflects downloads up to 11 Jan 2025

Citations

  • (2023) NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (466-482). DOI: 10.1145/3582016.3582025. Online publication date: 25-Mar-2023.
  • (2023) An unsupervised learning-guided multi-node failure-recovery model for distributed graph processing systems. The Journal of Supercomputing 79:9 (9383-9408). DOI: 10.1007/s11227-022-05028-8. Online publication date: 13-Jan-2023.
  • (2022) GraphFly. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.5555/3571885.3571944. Online publication date: 13-Nov-2022.
  • (2022) Metamodel driven acceleration of actor-based simulation. Proceedings of the International Workshop on Big Data in Emergent Distributed Environments (1-8). DOI: 10.1145/3530050.3532921. Online publication date: 12-Jun-2022.
  • (2022) Application-driven graph partitioning. The VLDB Journal (The International Journal on Very Large Data Bases) 32:1 (149-172). DOI: 10.1007/s00778-022-00736-2. Online publication date: 11-Apr-2022.
  • (2022) An Efficient Data Distribution Strategy for Distributed Graph Processing System. Computer Information Systems and Industrial Management (360-373). DOI: 10.1007/978-3-031-10539-5_26. Online publication date: 15-Jul-2022.
  • (2021) Local Graph Edge Partitioning. ACM Transactions on Intelligent Systems and Technology 12:5 (1-25). DOI: 10.1145/3466685. Online publication date: 23-Sep-2021.
