Article

GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning

Authors:

Wenguang Chen

USENIX ATC '15: Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference

Pages 375 - 386

Published: 08 July 2015

Abstract

In this paper, we present GridGraph, a system for processing large-scale graphs on a single machine. GridGraph breaks graphs into 1D-partitioned vertex chunks and 2D-partitioned edge blocks using a first, fine-grained level of partitioning in preprocessing. A second, coarse-grained level of partitioning is applied at runtime. Through a novel dual sliding windows method, GridGraph can stream the edges and apply on-the-fly vertex updates, thus reducing the amount of I/O required for computation. The partitioning of edges also enables selective scheduling, so that some of the blocks can be skipped to avoid unnecessary I/O. This is very effective when the active vertex set shrinks with convergence.

Our evaluation results show that GridGraph scales seamlessly with memory capacity and disk bandwidth, and outperforms state-of-the-art out-of-core systems, including GraphChi and X-Stream. Furthermore, we show that the performance of GridGraph is competitive even with distributed systems, and that it provides significant cost efficiency in cloud environments.
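The partitioning and selective scheduling described in the abstract can be illustrated with a small in-memory sketch. This is not GridGraph's implementation: it assumes equal-sized contiguous vertex chunks, keeps every edge block in a Python dictionary rather than on disk, and omits the second, coarse-grained partitioning level. The names `chunk_of`, `build_grid`, and `stream_edges` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def chunk_of(vertex, num_vertices, num_chunks):
    """Map a vertex ID to its 1D chunk (assumption: equal-sized contiguous chunks)."""
    chunk_size = -(-num_vertices // num_chunks)  # ceiling division
    return vertex // chunk_size

def build_grid(edges, num_vertices, num_chunks):
    """Place each edge (src, dst) in block (chunk of src, chunk of dst)."""
    grid = defaultdict(list)
    for src, dst in edges:
        i = chunk_of(src, num_vertices, num_chunks)
        j = chunk_of(dst, num_vertices, num_chunks)
        grid[(i, j)].append((src, dst))
    return grid

def stream_edges(grid, num_vertices, num_chunks, active, update):
    """Stream edge blocks, skipping blocks whose source chunk has no active vertex.

    Returns the number of blocks actually visited.
    """
    active_chunks = {chunk_of(v, num_vertices, num_chunks) for v in active}
    visited = 0
    for j in range(num_chunks):        # destination chunk
        for i in range(num_chunks):    # source chunk
            if i not in active_chunks or (i, j) not in grid:
                continue               # selective scheduling: skip this block
            visited += 1
            for src, dst in grid[(i, j)]:
                if src in active:
                    update(src, dst)   # on-the-fly vertex update
    return visited

# Usage: one expansion step from vertex 0 on a hypothetical 4-vertex graph.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
grid = build_grid(edges, num_vertices=4, num_chunks=2)
reached = set()
visited = stream_edges(grid, 4, 2, active={0},
                       update=lambda src, dst: reached.add(dst))
```

In this toy run only the blocks whose source chunk contains vertex 0 are visited; the blocks holding edges out of vertices 2 and 3 are skipped entirely, which is the effect the abstract attributes to selective scheduling when the active vertex set is small.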




Information

Published In

USENIX ATC '15: Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference

July 2015

625 pages

ISBN: 9781931971225

Program Chairs: Shan Lu (University of Chicago), Erik Riedel (EMC)

Sponsors

VMware
NetApp
Google Inc.
Facebook
HP

Publisher

USENIX Association

United States


Qualifiers

Article


Bibliometrics

Total Citations: 91
Total Downloads: 0 (last 12 months: 0; last 6 weeks: 0)

Reflects downloads up to 30 Jan 2025

Cited By

Wang R, Zong W, He S, Chen X, Li Z, Dang Z, Bagchi S, Zhang Y (2024). Efficient large graph processing with chunk-based graph representation model. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 1239-1255. DOI: 10.5555/3691992.3692067. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.5555/3691992.3692067
Xu D, Ryu J, Baek J, Shin K, Su P, Li D, Bagchi S, Zhang Y (2024). FlexMem. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 817-833. DOI: 10.5555/3691992.3692042. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.5555/3691992.3692042
Yang T, Chen Y, Liang Y, Yang M, Ma X, Won Y (2024). Seraph. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pp. 373-387. DOI: 10.5555/3650697.3650719. Online publication date: 27-Feb-2024.
https://dl.acm.org/doi/10.5555/3650697.3650719
Zhan Y, Hu H, Yang X, Wang S, Cao Q, Jiang H, Yao J (2024). RomeFS: A CXL-SSD Aware File System Exploiting Synergy of Memory-Block Dual Paths. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 720-736. DOI: 10.1145/3698038.3698539. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3698038.3698539
Huang K, Zhai J, Zheng L, Wang H, Jin Y, Zhang Q, Zhang R, Zheng Z, Yi Y, Shen X (2024). WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1-17. DOI: 10.1145/3627703.3650063. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3627703.3650063
Yang T, England C, Li Y, Li B, Yang M, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph Processing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 467-481. DOI: 10.1145/3620665.3640409. Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3620665.3640409
Haddadi A, Black-Schaffer D, Park C (2023). Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping. Proceedings of the International Symposium on Memory Systems, pp. 1-11. DOI: 10.1145/3631882.3631884. Online publication date: 2-Oct-2023.
https://dl.acm.org/doi/10.1145/3631882.3631884
Wang S, Zhang M, Yang K, Chen K, Ma S, Jiang J, Wu Y, Aamodt T, Jerger N, Swift M (2023). NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 466-482. DOI: 10.1145/3582016.3582025. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3582016.3582025
Liu C, Choi W, Khadirsharbiyani S, Kandemir M, Mohror K, Arnold D, Badia R (2023). MBFGraph: An SSD-based External Graph System for Evolving Graphs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607070. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607070
Zhang P, Kannan R, Prasanna V, Mohror K, Arnold D, Badia R (2023). Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3581784.3607043. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607043
