research-article

TurboGraph++: A Scalable and Fast Graph Analytics System

Authors:

Wook-Shin HanAuthors Info & Claims

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Pages 395 - 410

https://doi.org/10.1145/3183713.3196915

Published: 27 May 2018 Publication History

Abstract

Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of-memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the former group keeps a partitioned graph resident in memory of each machine and uses an in-memory processing technique, the latter stores the partitioned graph in external memory of each machine and exploits a streaming processing technique. Gemini and Chaos are the state-of-the-art distributed graph systems in each group, respectively.

We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction for efficiently supporting neighborhood analytics that requires processing multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation, with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme for ensuring balanced workloads across machines with reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing for fully utilizing three hardware resources, CPU, disk, and network, in a cluster. Extensive experiments show that TurboGraph++ is designed to scale well to very large graphs, like Chaos, while its performance is comparable to Gemini.

References

[1]

2002. The Yahoo Webscope Program: yahoo! altavista web page hyperlink connectivity graph. https://webscope.sandbox.yahoo.com/.

[2]

2009. The lemur project: Clueweb09 web graph. http://www.lemurproject.org/clueweb09.

[3]

2012. The lemur project: Clueweb12 web graph. http://www.lemurproject.org/clueweb12.

[4]

2012. MPI: A Message-Passing Interface Standard Version 3.0. http://mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf .

[5]

Foto N Afrati, Dimitris Fotakis, and Jeffrey D Ullman. 2013. Enumerating subgraph instances using map-reduce. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 62--73.

Digital Library

[6]

Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, and Erfan Zamanian. 2016. The end of slow networks: It's time for a redesign. Proceedings of the VLDB Endowment 9, 7 (2016), 528--539.

Digital Library

[7]

John Adrian Bondy, Uppaluri Siva Ramachandra Murty, et al. 1976. Graph theory with applications. Vol. 290. Citeseer.

[8]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.

[9]

Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015 Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the Tenth European Conference on Computer Systems. ACM, 1.

Digital Library

[10]

Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis,and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815.

Digital Library

[11]

Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, and Chao Tian. 2017. Parallelizing sequential graph computations. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 495--510.

Digital Library

[12]

Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In OSDI, Vol. 12. 2.

Digital Library

[13]

Joseph E Gonzalez, Reynold S Xin, Ankur Dave, Daniel Crankshaw, Michael J Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, Vol. 14. 599--613.

Digital Library

[14]

Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, and Hwanjo Yu. 2013. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 77--85.

Digital Library

[15]

Nilesh Jain, Guangdeng Liao, and Theodore L Willke. 2013 Graphbuilder: scalable graph etl framework. In First International Workshop on Graph Data Management Experiences and Systems. ACM, 4.

Digital Library

[16]

Arijit Khan. 2017. Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly. Proceedings of the 20th International Conference on Extending Database Technology (2017).

[17]

Ajay D Kshemkalyani and Mukesh Singhal. 2011. Distributed computing: principles, algorithms, and systems. Cambridge University Press.

Digital Library

[18]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th international conference on World wide web . ACM, 591--600.

Digital Library

[19]

Aapo Kyrola, Guy E Blelloch, and Carlos Guestrin. 2012. Graphchi: Large-scale graph computation on just a pc. USENIX.

Digital Library

[20]

Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang. 2015. Scalable subgraph enumeration in mapreduce. Proceedings of the VLDB Endowment 8, 10 (2015), 974--985.

Digital Library

[21]

Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473.

Digital Library

[22]

Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716--727.

Digital Library

[23]

Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. 2014. Large-scale distributed graph computing systems: An experimental evaluation. Proceedings of the VLDB Endowment 8, 3 (2014), 281--292.

Digital Library

[24]

Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 135--146.

Digital Library

[25]

R Meusel, O Lehmberg, C Bizer, and S Vigna. 2014. Web data commons-hyperlink graphs. http://webdatacommons.org/hyperlinkgraph/.

[26]

Himchan Park and Min-Soo Kim. 2017. TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 913--928.

Digital Library

[27]

Ha-Myung Park, Sung-Hyon Myaeng, and U Kang. 2016. Pte: Enumerating trillion triangles on distributed systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1115--1124.

Digital Library

[28]

Todd Plantenga. 2013. Inexact subgraph isomorphism in MapReduce. J. Parallel and Distrib. Comput. 73, 2 (2013), 164--175.

Digital Library

[29]

Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2014. NScale: neighborhood-centric analytics on large graphs. Proceedings of the VLDB Endowment 7, 13 (2014), 1673--1676.

Digital Library

[30]

Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 410--424.

Digital Library

[31]

Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 472--488.

Digital Library

[32]

Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S Lam. 2013. Distributed socialite: a datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment 6, 14 (2013), 1906--1917.

Digital Library

[33]

Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, and Ning Xu.2014. Parallel subgraph listing in a large-scale graph. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 625--636.

Digital Library

[34]

Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1222--1230.

Digital Library

[35]

Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From think like a vertex to think like a graph. Proceedings of the VLDB Endowment 7, 3 (2013), 193--204.

Digital Library

[36]

Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 333--342.

Digital Library

[37]

Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, and Jeffrey Xu Yu. 2016. Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing. In Proceedings of the 2016 International Conference on Management of Data. ACM, 479--494.

Digital Library

[38]

Wenlei Xie, Guozhang Wang, David Bindel, Alan Demers, and Johannes Gehrke. 2013. Fast iterative graph computation with block updates. Proceedings of the VLDB Endowment 6, 14 (2013), 2014--2025.

Digital Library

[39]

Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big graph analytics systems. In Proceedings of the 2016 International Conference on Management of Data. ACM, 2241--2243.

Digital Library

[40]

Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment 7, 14 (2014), 1981--1992.

Digital Library

[41]

Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2015. Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web. ACM, 1307--1317.

Digital Library

[42]

Makoto Yui, Jun Miyazaki, Shunsuke Uemura, and Hayato Yamana. 2010. Nb-GCLOCK: A non-blocking buffer management based on the generalized CLOCK. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 745--756.

[43]

Chang Zhou, Jun Gao, Binbin Sun, and Jeffrey Xu Yu. 2014. MOCgraph: Scalable distributed graph processing using message online computing. Proceedings of the VLDB Endowment 8, 4 (2014), 377--388.

Digital Library

[44]

Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)(Savannah, GA.

Digital Library

[45]

Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. In USENIX Annual Technical Conference. 375--386

Digital Library

Cited By

Papon TChen TZhang SAthanassoulis M(2024)CAVE: Concurrency-Aware Graph Processing on SSDsProceedings of the ACM on Management of Data10.1145/36549282:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654928
Xu RWang YXiao X(2024)Graph Computation with Adaptive Granularity2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00169(2123-2136)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00169
Zhang YKumar A(2023)Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics EnginesProceedings of the VLDB Endowment10.14778/3611479.361148316:11(2728-2741)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611483
Show More Cited By

Index Terms

TurboGraph++: A Scalable and Fast Graph Analytics System
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Recommendations

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase ...
A Cohesive Structure Based Bipartite Graph Analytics System
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Bipartite graphs arise naturally when modeling two different types of entities such as user-item, author-paper, and director-board. In recent years, driven by numerous real-world applications in these networks, mining cohesive structures in bipartite ...
PFFS: a scalable flash memory file system for the hybrid architecture of phase-change RAM and NAND flash
SAC '08: Proceedings of the 2008 ACM symposium on Applied computing

In this paper, we present the scalable and efficient flash file system using the combination of NAND and Phase-change RAM (PRAM). Until now, several flash file systems have been developed considering the physical characteristics of NAND flash. However, ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

May 2018

1874 pages

ISBN:9781450347037

DOI:10.1145/3183713

General Chairs:
Gautam Das
University of Texas at Arlington, USA
,
Christopher Jermaine
Rice University, USA
,
Philip Bernstein
Microsoft Research, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Samsung Research Funding Center of Samsung Electronics

Conference

SIGMOD/PODS '18

Sponsor:

SIGMOD

SIGMOD/PODS '18: International Conference on Management of Data

June 10 - 15, 2018

TX, Houston, USA

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

35
Total Citations
View Citations
1,820
Total Downloads

Downloads (Last 12 months)79
Downloads (Last 6 weeks)13

Reflects downloads up to 03 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Papon TChen TZhang SAthanassoulis M(2024)CAVE: Concurrency-Aware Graph Processing on SSDsProceedings of the ACM on Management of Data10.1145/36549282:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654928
Xu RWang YXiao X(2024)Graph Computation with Adaptive Granularity2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00169(2123-2136)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00169
Zhang YKumar A(2023)Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics EnginesProceedings of the VLDB Endowment10.14778/3611479.361148316:11(2728-2741)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611483
Zhao JZhang YHe LLi QZhang XJiang XYu HLiao XJin HGu LLiu HHe BZhang JSong XWang LZhou J(2023)GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph ProcessingACM Transactions on Architecture and Code Optimization10.1145/360009120:3(1-24)Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3600091
Lee GYoon SKo JKim HShin K(2023)Hypergraph motifs and their extensions beyond binaryThe VLDB Journal10.1007/s00778-023-00827-833:3(625-665)Online publication date: 26-Dec-2023
https://doi.org/10.1007/s00778-023-00827-8
Zhang HYu JZhang YZhao KIves ZBonifati AEl Abbadi A(2022)Parallel Query Processing: To Separate Communication from ComputationProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526164(1447-1461)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526164
Li XAusavarungnirun RLiu XLiu XZhang XLu HSong ZJing NLiang XMitra TYoung EXiong J(2022)GzippoProceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design10.1145/3508352.3549372(1-9)Online publication date: 30-Oct-2022
https://dl.acm.org/doi/10.1145/3508352.3549372
Jia MZhang YGan XLi DXu EWang RLu K(2022)vGraph: Memory-Efficient Multicore Graph Processing for Traversal-Centric AlgorithmsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00068(1-14)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00068
Liu TLi D(2022)EndGraph: An Efficient Distributed Graph Preprocessing System2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS54860.2022.00020(111-121)Online publication date: Jul-2022
https://doi.org/10.1109/ICDCS54860.2022.00020
A C(2022)A Comparative Study of Large Automata Distributed Processing2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT)10.1109/AICT55583.2022.10013506(1-6)Online publication date: 12-Oct-2022
https://doi.org/10.1109/AICT55583.2022.10013506
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents