Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2621934.2621940acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
tutorial

Asymmetry in Large-Scale Graph Analysis, Explained

Published: 22 June 2014 Publication History

Abstract

Iterative computations are in the core of large-scale graph processing. In these applications, a set of parameters is continuously refined, until a fixed point is reached. Such fixed point iterations often exhibit non-uniform computational behavior, where changes propagate with different speeds throughout the parameter set, making them active or inactive during iterations. This asymmetrical behavior can lead to a many redundant computations, if not exploited. Many specialized graph processing systems and APIs exist that run iterative algorithms efficiently exploiting this asymmetry. However, their functionality is sometimes vaguely defined and due to their different programming models and terminology used, it is often challenging to derive equivalence between them.
We describe an optimization framework for iterative graph processing, which utilizes dataset dependencies. We explain several optimization techniques that exploit asymmetrical behavior of graph algorithms. We formally specify the conditions under which, an algorithm can use a certain technique. We also design template execution plans, using a canonical set of dataflow operators and we evaluate them using real-world datasets and applications. Our experiments show that optimized plans can significantly reduce execution time, often by an order of magnitude. Based on our experiments, we identify a trade-off that can be easily captured and could serve as the basis for automatic optimization of large-scale graph-processing applications.

References

[1]
SNAP: Stanford Network Analysis Platform. http://snap.stanford.edu/index.html. {Online; Last accessed January 2014}.
[2]
Stratosphere. http://getstratosphere.org. {Online; Last accessed January 2014}.
[3]
D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke. Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In Proceedings of the 1st ACM symposium on Cloud computing, pages 119--130, 2010.
[4]
P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th international conference on World Wide Web, 2011.
[5]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595--601, Manhattan, USA, 2004.
[6]
S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl. Spinning fast iterative data flows. Proc. VLDB Endow., 5(11):1268--1279, 2012.
[7]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pages 17--30, Berkeley, CA, USA, 2012. USENIX Association.
[8]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, 2012.
[9]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 135--146, 2010.
[10]
S. R. Mihaylov, Z. G. Ives, and S. Guha. Rex: Recursive, delta-based data-centric computation. Proc. VLDB Endow., 5(11):1280--1291, 2012.
[11]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439--455, 2013.
[12]
R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. Graphx: A resilient distributed graph system on spark. In First International Workshop on Graph Data Management Experiences and Systems, New York, NY, USA, 2013. ACM.
[13]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, 2012.

Cited By

View all
  • (2021)An analysis of the graph processing landscapeJournal of Big Data10.1186/s40537-021-00443-98:1Online publication date: 9-Apr-2021
  • (2018)High-Level Programming Abstractions for Distributed Graph ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.276229430:2(305-324)Online publication date: 1-Feb-2018

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
GRADES'14: Proceedings of Workshop on GRAph Data management Experiences and Systems
June 2014
79 pages
ISBN:9781450329828
DOI:10.1145/2621934
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2014

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

SIGMOD/PODS'14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 29 of 61 submissions, 48%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)An analysis of the graph processing landscapeJournal of Big Data10.1186/s40537-021-00443-98:1Online publication date: 9-Apr-2021
  • (2018)High-Level Programming Abstractions for Distributed Graph ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.276229430:2(305-324)Online publication date: 1-Feb-2018

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media