
GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning

Published: 08 July 2015

Abstract

In this paper, we present GridGraph, a system for processing large-scale graphs on a single machine. GridGraph breaks graphs into 1D-partitioned vertex chunks and 2D-partitioned edge blocks using a first, fine-grained level of partitioning during preprocessing. A second, coarse-grained level of partitioning is applied at runtime. Through a novel dual sliding windows method, GridGraph can stream the edges and apply on-the-fly vertex updates, thus reducing the amount of I/O required for computation. The partitioning of edges also enables selective scheduling, so that some of the blocks can be skipped to reduce unnecessary I/O. This is very effective when the active vertex set shrinks with convergence.
Our evaluation results show that GridGraph scales seamlessly with memory capacity and disk bandwidth, and outperforms state-of-the-art out-of-core systems, including GraphChi and X-Stream. Furthermore, we show that the performance of GridGraph is even competitive with distributed systems, and that it provides significant cost efficiency in cloud environments.
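
The partitioning and selective scheduling described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names (`chunk_of`, `build_grid`, `bfs_levels`), the equal-sized contiguous vertex ranges, the destination-major loop order, and the choice of breadth-first search as the workload are all assumptions made for illustration. The sketch shows only the first-level grid and block skipping; the second-level partitioning and on-disk streaming are omitted.

```python
from collections import defaultdict


def chunk_of(vertex, num_vertices, num_chunks):
    """Map a vertex ID to its 1D chunk (assumed: equal-sized contiguous ranges)."""
    chunk_size = (num_vertices + num_chunks - 1) // num_chunks
    return vertex // chunk_size


def build_grid(edges, num_vertices, num_chunks):
    """Place each edge (src, dst) in block (chunk of src, chunk of dst)."""
    grid = defaultdict(list)
    for src, dst in edges:
        i = chunk_of(src, num_vertices, num_chunks)
        j = chunk_of(dst, num_vertices, num_chunks)
        grid[(i, j)].append((src, dst))
    return grid


def bfs_levels(edges, num_vertices, num_chunks, root):
    """Level-synchronous BFS over edge blocks, skipping inactive blocks.

    Returns (levels, skipped), where skipped counts the blocks that were
    never read because their source chunk held no active vertex.
    """
    grid = build_grid(edges, num_vertices, num_chunks)
    levels = [-1] * num_vertices
    levels[root] = 0
    active = {root}
    depth = 0
    skipped = 0
    while active:
        active_chunks = {chunk_of(v, num_vertices, num_chunks) for v in active}
        next_active = set()
        for j in range(num_chunks):      # destination chunk (assumed outer loop)
            for i in range(num_chunks):  # source chunk
                if i not in active_chunks:
                    skipped += 1         # selective scheduling: skip this block
                    continue
                # Stream the block's edges and update destinations on the fly.
                for src, dst in grid.get((i, j), ()):
                    if src in active and levels[dst] == -1:
                        levels[dst] = depth + 1
                        next_active.add(dst)
        active = next_active
        depth += 1
    return levels, skipped


if __name__ == "__main__":
    example_edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)]
    levels, skipped = bfs_levels(example_edges, num_vertices=6, num_chunks=3, root=0)
    print(levels, skipped)
```

In this toy setting a block is skipped whenever no vertex in its source chunk is active, which is why skipping pays off most once the active set becomes small.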


Cited By

  • (2024) Efficient large graph processing with chunk-based graph representation model. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 1239-1255. DOI: 10.5555/3691992.3692067. Online publication date: 10-Jul-2024
  • (2024) FlexMem. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 817-833. DOI: 10.5555/3691992.3692042. Online publication date: 10-Jul-2024
  • (2024) Seraph. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pp. 373-387. DOI: 10.5555/3650697.3650719. Online publication date: 27-Feb-2024

Information & Contributors

Information

Published In

USENIX ATC '15: Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference
July 2015
625 pages
ISBN:9781931971225

Sponsors

  • VMware
  • NetApp
  • Google Inc.
  • Facebook
  • HP

Publisher

USENIX Association

United States

Publication History

Published: 08 July 2015

Qualifiers

  • Article

Citations

  • (2024) RomeFS: A CXL-SSD Aware File System Exploiting Synergy of Memory-Block Dual Paths. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 720-736. DOI: 10.1145/3698038.3698539. Online publication date: 20-Nov-2024
  • (2024) WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1-17. DOI: 10.1145/3627703.3650063. Online publication date: 22-Apr-2024
  • (2024) Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph Processing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 467-481. DOI: 10.1145/3620665.3640409. Online publication date: 27-Apr-2024
  • (2023) Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping. Proceedings of the International Symposium on Memory Systems, pp. 1-11. DOI: 10.1145/3631882.3631884. Online publication date: 2-Oct-2023
  • (2023) NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 466-482. DOI: 10.1145/3582016.3582025. Online publication date: 25-Mar-2023
  • (2023) MBFGraph: An SSD-based External Graph System for Evolving Graphs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607070. Online publication date: 12-Nov-2023
  • (2023) Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3581784.3607043. Online publication date: 12-Nov-2023