
Parallel processing of large graphs

Authors: Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk

Future Generation Computer Systems, Volume 32, Issue C

Pages 324 - 337

https://doi.org/10.1016/j.future.2013.08.007

Published: 01 March 2014

Abstract

More and more large data collections are gathered worldwide in various IT systems. Many of them have a networked nature and need to be processed and analysed as graph structures. Because of their size, they very often require a parallel paradigm for efficient computation. Three parallel techniques are compared in the paper: MapReduce, its map-side join extension, and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results show that iterative graph processing with the BSP implementation consistently and significantly outperforms MapReduce, by up to a factor of 10, especially for algorithms with many iterations and sparse communication. The MapReduce extension based on map-side join is usually more efficient than the original MapReduce, although its gain is smaller than that of BSP. Nevertheless, MapReduce remains a good alternative for enormous networks whose data structures do not fit in local memories.

We compared three parallel computing techniques in terms of large graph processing.

MapReduce, map-side join and Bulk Synchronous Parallel were tested for two distinct problems.

Iterative graph processing with the BSP implementation significantly outperforms MapReduce.

The map-side join design pattern may improve the original MapReduce performance.
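To make the BSP approach to SSSP concrete, the following is a minimal single-process sketch of vertex-centric supersteps. It is an illustration only, not the authors' implementation: the function name `bsp_sssp`, the adjacency-dict input format, and the assumption of non-negative edge weights are all hypothetical choices made for this example. In each superstep every vertex with incoming messages takes the smallest proposed distance, updates its own value if that improves it, and sends new proposals to its neighbours. The loop ends when no messages remain.

```python
import math
from collections import defaultdict


def bsp_sssp(edges, source):
    """Simulate BSP supersteps for single source shortest paths.

    edges: dict mapping vertex -> list of (neighbour, weight) pairs,
           with non-negative weights (an assumption of this sketch).
    Returns (distances, number_of_supersteps).
    """
    # Collect every vertex, including those that only appear as targets.
    vertices = set(edges)
    for neighbours in edges.values():
        vertices.update(v for v, _ in neighbours)
    dist = {v: math.inf for v in vertices}

    # Messages to be delivered at the start of the next superstep.
    inbox = {source: [0]}
    supersteps = 0
    while inbox:  # halt when no vertex has pending messages
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Propose improved distances to all out-neighbours.
                for neighbour, weight in edges.get(v, []):
                    outbox[neighbour].append(best + weight)
        # Synchronisation barrier: messages become visible only now.
        inbox = outbox
        supersteps += 1
    return dist, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)],
        "d": [],
    }
    distances, steps = bsp_sssp(graph, "a")
    print(distances, steps)
```

Because graph state stays in memory between supersteps and only messages move, an iteration in this style does not need to re-read and re-write the whole graph, which is one plausible reading of why the abstract reports an advantage for algorithms with many iterations and sparse communication.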





Published In


Future Generation Computer Systems Volume 32, Issue C

March 2014

347 pages

ISSN:0167-739X


Copyright © The Authors.

Publisher

Elsevier Science Publishers B. V.

Netherlands


Cited By

Zare Hosseini Z, Kolahdouz Rahimi S, Forouzan E, Baraani A (2023) Distributed RMI-DBG model. Expert Systems with Applications: An International Journal 233:C. doi:10.1016/j.eswa.2023.120859. Online publication date: 15-Dec-2023
https://dl.acm.org/doi/10.1016/j.eswa.2023.120859
Gómez-Romero J, Molina-Solana M, Oehmichen A, Guo Y (2022) Visualizing large knowledge graphs. Future Generation Computer Systems 89:C (224-238). doi:10.1016/j.future.2018.06.015. Online publication date: 21-Apr-2022
https://dl.acm.org/doi/10.1016/j.future.2018.06.015
Brighen A, Slimani H, Rezgui A, Kheddouci H (2019) Listing all maximal cliques in large graphs on vertex-centric model. The Journal of Supercomputing 75:8 (4918-4946). doi:10.1007/s11227-019-02770-4. Online publication date: 1-Aug-2019
https://dl.acm.org/doi/10.1007/s11227-019-02770-4
Colomo-Palacios R, Stantchev V, Rodríguez-González A (2018) Special issue on exploiting semantic technologies with particularization on linked data over grid and cloud architectures. Future Generation Computer Systems 32:C (260-262). doi:10.5555/2748143.2748374. Online publication date: 30-Dec-2018
https://dl.acm.org/doi/10.5555/2748143.2748374
Corbellini A, Godoy D, Mateos C, Schiaffino S, Zunino A (2018) DPM. Future Generation Computer Systems 78:P1 (474-480). doi:10.1016/j.future.2017.02.025. Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1016/j.future.2017.02.025
Shi Z, Li J, Guo P, Li S, Feng D, Su Y (2017) Partitioning dynamic graph asynchronously with distributed FENNEL. Future Generation Computer Systems 71:C (32-42). doi:10.1016/j.future.2017.01.014. Online publication date: 1-Jun-2017
https://dl.acm.org/doi/10.1016/j.future.2017.01.014
Baldominos A, Calle J, Cuadra D (2017) Beyond social graphs. Pattern Analysis & Applications 20:1 (269-285). doi:10.1007/s10044-016-0550-2. Online publication date: 1-Feb-2017
https://dl.acm.org/doi/10.1007/s10044-016-0550-2
Erlandsson F, Borg A, Johnson H, Bródka P (2016) Predicting User Participation in Social Media. Proceedings of the 12th International Conference and School on Advances in Network Science - Volume 9564 (126-135). doi:10.1007/978-3-319-28361-6_10. Online publication date: 11-Jan-2016
https://dl.acm.org/doi/10.1007/978-3-319-28361-6_10
McCune R, Weninger T, Madey G (2015) Thinking Like a Vertex. ACM Computing Surveys 48:2 (1-39). doi:10.1145/2818185. Online publication date: 12-Oct-2015
https://dl.acm.org/doi/10.1145/2818185
