DOI: 10.1145/3299869.3300076
research-article

Experimental Analysis of Streaming Algorithms for Graph Partitioning

Published: 25 June 2019

Abstract

We report a systematic performance study of streaming graph partitioning algorithms. Graph partitioning plays a crucial role in overall system performance because it has a significant impact on both load balancing and inter-machine communication. The streaming model for graph partitioning has recently gained attention because it can scale to very large graphs with limited resources. The main objective of this study is to understand how the choice of graph partitioning algorithm affects system performance, resource usage, and scalability. We focus on both offline graph analytics and online graph query workloads, and consider both edge-cut and vertex-cut approaches. Our results show that no single partitioning algorithm performs best in all cases, and that the choice of graph partitioning algorithm depends on: (i) the type and degree distribution of the graph, (ii) the characteristics of the workloads, and (iii) specific application requirements.
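
To make the edge-cut versus vertex-cut distinction concrete, here is a minimal sketch that is not taken from the paper. It shows one well-known streaming heuristic per model: Linear Deterministic Greedy (LDG) for edge-cut, which places each arriving vertex, and a simplified greedy edge placement in the spirit of PowerGraph's for vertex-cut, which places each arriving edge. It also shows the usual quality measures that map to the abstract's two concerns: fraction of edges cut or replication factor for communication, and max-over-mean load for balance. The function names, tie-breaking rules, and toy graph are assumptions for illustration; the implementations evaluated in the study may differ.

```python
from collections import defaultdict

def ldg_edge_cut(vertex_stream, k, capacity):
    """Streaming edge-cut: Linear Deterministic Greedy (LDG) vertex placement.

    vertex_stream yields (vertex, neighbours); each vertex is placed once, on
    arrival, using only the placements made so far.
    """
    assignment = {}   # vertex -> partition id
    sizes = [0] * k   # vertices per partition
    for v, neighbours in vertex_stream:
        best, best_score = 0, None
        for p in range(k):
            # Neighbours of v already placed in p, damped by how full p is.
            placed = sum(1 for u in neighbours if assignment.get(u) == p)
            score = placed * (1 - sizes[p] / capacity)
            # Higher score wins; ties go to the emptier partition (assumption).
            if best_score is None or score > best_score or (
                score == best_score and sizes[p] < sizes[best]
            ):
                best, best_score = p, score
        assignment[v] = best
        sizes[best] += 1
    return assignment, sizes

def greedy_vertex_cut(edge_stream, k):
    """Streaming vertex-cut: simplified greedy edge placement (hypothetical).

    Prefer a partition already holding both endpoints, then one holding
    either endpoint, else any partition; break ties by fewest edges.
    """
    replicas = defaultdict(set)  # vertex -> partitions holding a copy
    load = [0] * k               # edges per partition
    for u, v in edge_stream:
        both = replicas[u] & replicas[v]
        either = replicas[u] | replicas[v]
        candidates = both or either or set(range(k))
        p = min(candidates, key=lambda q: (load[q], q))
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    return replicas, load

def edge_cut_fraction(edges, assignment):
    """Share of edges whose endpoints sit in different partitions."""
    return sum(assignment[u] != assignment[v] for u, v in edges) / len(edges)

def replication_factor(replicas):
    """Average number of partitions each vertex is copied to."""
    return sum(len(ps) for ps in replicas.values()) / len(replicas)

def imbalance(loads):
    """Max load over mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

if __name__ == "__main__":
    # Toy graph (assumption): two triangles joined by the bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    assignment, sizes = ldg_edge_cut(sorted(adj.items()), k=2, capacity=3)
    print(assignment, edge_cut_fraction(edges, assignment), imbalance(sizes))
    replicas, load = greedy_vertex_cut(edges, k=2)
    print(load, replication_factor(replicas), imbalance(load))
```

On this toy graph, LDG keeps each triangle together, so only the bridge edge is cut and the two partitions hold three vertices each. The vertex-cut placement cuts no edges but copies vertex 3 to both partitions, giving a replication factor of 7/6 with edge loads of 4 and 3. Which trade-off is preferable depends on the graph and the workload.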

Cited By

  • (2025) Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph Embedding. IEEE Transactions on Knowledge and Data Engineering 37:1 (408-422). DOI: 10.1109/TKDE.2024.3424333. Online publication date: Jan-2025
  • (2024) Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639269. Online publication date: 26-Mar-2024
  • (2024) GraphCube: Interconnection Hierarchy-aware Graph Processing. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (160-174). DOI: 10.1145/3627535.3638498. Online publication date: 2-Mar-2024


Published In

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph partitioning
  2. graph processing
  3. streaming algorithms

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '19
SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 92
  • Downloads (last 6 weeks): 11
Reflects downloads up to 12 Jan 2025

Citations

  • (2024) Resource Management of Automotive Engine Control Units. 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC) (1-4). DOI: 10.1109/VLSI-SoC62099.2024.10767819. Online publication date: 6-Oct-2024
  • (2024) Serving Graph Neural Networks With Distributed Fog Servers for Smart IoT Services. IEEE/ACM Transactions on Networking 32:1 (550-565). DOI: 10.1109/TNET.2023.3293052. Online publication date: Feb-2024
  • (2024) DKWS: A Distributed System for Keyword Search on Massive Graphs. IEEE Transactions on Knowledge and Data Engineering (1-16). DOI: 10.1109/TKDE.2023.3313726. Online publication date: 2024
  • (2023) Distributed Graph Embedding with Information-Oriented Random Walks. Proceedings of the VLDB Endowment 16:7 (1643-1656). DOI: 10.14778/3587136.3587140. Online publication date: 8-May-2023
  • (2023) Knowledge Graphs Querying. ACM SIGMOD Record 52:2 (18-29). DOI: 10.1145/3615952.3615956. Online publication date: 11-Aug-2023
  • (2023) The Evolution of Distributed Systems for Graph Neural Networks and Their Origin in Graph Processing and Deep Learning: A Survey. ACM Computing Surveys 56:1 (1-37). DOI: 10.1145/3597428. Online publication date: 28-Aug-2023
  • (2023) A Mixed-State Streaming Edge Partitioning based on Combinatorial Design. 2023 IEEE International Conference on Data Mining (ICDM) (868-877). DOI: 10.1109/ICDM58522.2023.00096. Online publication date: 1-Dec-2023
