Abstract
Popular distributed graph processing frameworks, such as Pregel and GraphLab, are based on the vertex-centric computation model, in which users write a customized Compute function for each vertex to process data iteratively. Vertices are evenly partitioned among the compute nodes, and the granularity of parallelism of a graph algorithm is normally tuned by adjusting the number of compute nodes. The vertex-centric model splits the computation into phases. Within a phase, the computation proceeds as an embarrassingly parallel process, because no communication between compute nodes occurs. By default, current graph engines handle only one iteration of the algorithm per phase. In this paper, however, we find that the granularity of parallelism can also be tuned by aggregating the computation of multiple iterations into one phase, which has a significant impact on the performance of the graph algorithm. In the ideal case, if all computation is handled in one phase, the whole algorithm becomes embarrassingly parallel and the benefit of parallelism is maximized. Based on this observation, we propose two approaches, a function-based approach and a parameter-based approach, to automatically transform a Pregel algorithm into a new one with tunable granularity of parallelism. We study the cost of such transformations and the trade-off between the granularity of parallelism and performance, providing a new direction for tuning the performance of parallel algorithms. Finally, we implement both approaches in our graph processing system, N2, and evaluate their performance on popular graph algorithms.
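To make the model concrete, the following is a minimal sketch of the synchronous vertex-centric (Pregel-style) computation the abstract describes, not the paper's actual N2 API; all function and variable names here are illustrative. Within each superstep (phase), every vertex runs its Compute function independently, so the phase itself is embarrassingly parallel; messages produced in one superstep are delivered only at the phase boundary before the next one begins. The example Compute function propagates the maximum vertex value through the graph, a classic Pregel illustration.

```python
def run_supersteps(graph, values, compute, max_supersteps=30):
    """Synchronous superstep loop: each phase is embarrassingly parallel,
    and communication happens only at the barrier between phases."""
    inbox = {v: [] for v in graph}   # messages delivered at the phase boundary
    active = set(graph)              # vertices that have not voted to halt
    step = 0
    while active and step < max_supersteps:
        outbox = {v: [] for v in graph}
        next_active = set()
        for v in graph:              # conceptually parallel across compute nodes
            if v in active or inbox[v]:   # a halted vertex wakes on a message
                halted = compute(v, graph[v], values, inbox[v], outbox, step)
                if not halted:
                    next_active.add(v)
        inbox = outbox               # barrier: deliver messages for next phase
        active = next_active
        step += 1
    return values

def max_value_compute(v, neighbors, values, messages, outbox, step):
    """Example Compute function: propagate the maximum value in the graph."""
    new_val = max([values[v]] + messages)
    if step == 0 or new_val != values[v]:
        values[v] = new_val
        for u in neighbors:          # send updated value to all neighbors
            outbox[u].append(new_val)
        return False                 # stay active
    return True                      # value stable: vote to halt

graph = {1: [2], 2: [1, 3], 3: [2]}
values = {1: 3, 2: 1, 3: 6}
print(run_supersteps(graph, values, max_value_compute))  # {1: 6, 2: 6, 3: 6}
```

In this default form, one iteration of the algorithm runs per phase, and a global barrier separates phases; the paper's contribution is to tune how many iterations are aggregated into a single phase, shrinking the number of barriers at the cost of extra per-phase work.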
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant No. 61661146001), the National Key Basic Research Program of China (973 Program, No. 2015CB352400), and the National High Technology Research and Development Program of China (863 Program, No. 2014AA015205).
Cite this article
Luo, X., Wu, S., Wang, W. et al. Tuning the granularity of parallelism for distributed graph processing. Distrib Parallel Databases 35, 117–148 (2017). https://doi.org/10.1007/s10619-017-7195-z