DOI: 10.1145/3183713.3196915
research-article

TurboGraph++: A Scalable and Fast Graph Analytics System

Published: 27 May 2018

Abstract

Existing distributed graph analytics systems fall into two main groups: those that focus on efficiency at the risk of out-of-memory errors, and those that focus on scale-up with a fixed memory budget at the cost of performance. The former group keeps a partitioned graph resident in the memory of each machine and uses in-memory processing, whereas the latter stores the partitioned graph in the external memory of each machine and uses streaming processing. Gemini and Chaos are the state-of-the-art distributed graph systems in these two groups, respectively.
We present TurboGraph++, a scalable and fast graph analytics system that efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction that efficiently supports neighborhood analytics, which require processing multi-hop neighborhoods of vertices (for example, triangle counting and local clustering coefficient computation), with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme that ensures balanced workloads across machines at reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing to fully utilize three hardware resources in a cluster: CPU, disk, and network. Extensive experiments show that TurboGraph++ scales well to very large graphs, like Chaos, while its performance is comparable to that of Gemini.
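
To make the term "neighborhood analytics" concrete, here is a minimal single-machine Python sketch of the two example workloads named in the abstract: triangle counting and the local clustering coefficient. It is not TurboGraph++'s abstraction, partitioning scheme, or implementation. It assumes the whole graph fits in memory as an undirected adjacency map, and all function names are hypothetical. The point is only that each vertex must inspect the adjacency lists of its neighbors, which is what makes these queries multi-hop.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}.

    Self-loops are dropped; duplicate edges collapse because sets are used.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_vertex(adj):
    """Count, for every vertex, the triangles it participates in.

    For vertex v, a pair of neighbors (a, b) closes a triangle when a and b
    are themselves adjacent, so v needs its neighbors' adjacency lists.
    """
    counts = {}
    for v, nbrs in adj.items():
        closed = 0
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
        counts[v] = closed
    return counts


def total_triangles(adj):
    """Each triangle is counted once at each of its three corners."""
    return sum(triangles_per_vertex(adj).values()) // 3


def local_clustering_coefficient(adj):
    """LCC(v) = 2 * T(v) / (d(v) * (d(v) - 1)); defined as 0.0 when d(v) < 2."""
    tri = triangles_per_vertex(adj)
    lcc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        lcc[v] = 0.0 if d < 2 else 2.0 * tri[v] / (d * (d - 1))
    return lcc


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    graph = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(total_triangles(graph))
    print(local_clustering_coefficient(graph))
```

The pairwise neighbor check costs time quadratic in a vertex's degree, so this sketch is suitable only for small graphs. Running such queries on graphs that exceed main memory, under a fixed memory budget, is the problem the paper addresses.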

Information & Contributors

Information

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Author Tags

  1. disk-based system
  2. distributed system
  3. graph analytics system
  4. scalability

Qualifiers

  • Research-article

Funding Sources

  • Samsung Research Funding Center of Samsung Electronics

Conference

SIGMOD/PODS '18

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 79
  • Downloads (last 6 weeks): 13
Reflects downloads up to 03 Oct 2024.

Citations

Cited By

  • (2024) CAVE: Concurrency-Aware Graph Processing on SSDs. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654928. Online publication date: 30-May-2024
  • (2024) Graph Computation with Adaptive Granularity. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2123-2136). DOI: 10.1109/ICDE60146.2024.00169. Online publication date: 13-May-2024
  • (2023) Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines. Proceedings of the VLDB Endowment 16:11 (2728-2741). DOI: 10.14778/3611479.3611483. Online publication date: 24-Aug-2023
  • (2023) GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing. ACM Transactions on Architecture and Code Optimization 20:3 (1-24). DOI: 10.1145/3600091. Online publication date: 19-Jul-2023
  • (2023) Hypergraph motifs and their extensions beyond binary. The VLDB Journal 33:3 (625-665). DOI: 10.1007/s00778-023-00827-8. Online publication date: 26-Dec-2023
  • (2022) Parallel Query Processing: To Separate Communication from Computation. Proceedings of the 2022 International Conference on Management of Data (1447-1461). DOI: 10.1145/3514221.3526164. Online publication date: 10-Jun-2022
  • (2022) Gzippo. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (1-9). DOI: 10.1145/3508352.3549372. Online publication date: 30-Oct-2022
  • (2022) vGraph: Memory-Efficient Multicore Graph Processing for Traversal-Centric Algorithms. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.1109/SC41404.2022.00068. Online publication date: Nov-2022
  • (2022) EndGraph: An Efficient Distributed Graph Preprocessing System. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS) (111-121). DOI: 10.1109/ICDCS54860.2022.00020. Online publication date: Jul-2022
  • (2022) A Comparative Study of Large Automata Distributed Processing. 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT) (1-6). DOI: 10.1109/AICT55583.2022.10013506. Online publication date: 12-Oct-2022
