research-article

Parallel processing of large graphs

Published: 01 March 2014

Abstract

Increasingly large data collections are being gathered worldwide in various IT systems. Many of them are networked in nature and need to be processed and analysed as graph structures. Because of their size, they often require a parallel paradigm for efficient computation. This paper compares three parallel techniques: MapReduce, its map-side join extension, and Bulk Synchronous Parallel (BSP). Each is implemented for two different graph problems: calculation of single-source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets that differ in size and structural profile and originate from three domains: telecommunication, multimedia and microblogging. The results reveal that iterative graph processing with the BSP implementation outperforms MapReduce in all tested cases, significantly and by up to a factor of 10, especially for algorithms with many iterations and sparse communication. The map-side join extension of MapReduce is usually more efficient than the original MapReduce, although less so than BSP. Nevertheless, MapReduce remains a good alternative for enormous networks whose data structures do not fit in local memory.

Highlights

  • We compared three parallel computing techniques for large graph processing.
  • MapReduce, map-side join and Bulk Synchronous Parallel were tested on two distinct problems.
  • Iterative graph processing with the BSP implementation significantly outperforms MapReduce.
  • The map-side join design pattern may improve the original MapReduce performance.
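The abstract contrasts how MapReduce and BSP handle iterative graph algorithms such as SSSP. The following minimal single-process Python sketch expresses one SSSP computation both ways. It is not the authors' Hadoop-based or BSP-framework code, and the function names and toy graph are hypothetical. The MapReduce version assumes the common design pattern of passing each vertex's adjacency list through every job, so the whole graph structure goes through the shuffle on each iteration. The BSP version keeps each adjacency list with its vertex and exchanges only distance messages between supersteps. The map-side join variant, which roughly avoids shuffling the structure by joining it with the current distances inside the mappers, is not shown.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: every vertex is a key (sinks map to []),
# and each adjacency entry is (neighbour, edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def sssp_mapreduce(graph: Graph, source: str) -> Tuple[Dict[str, float], int]:
    """SSSP as a chain of simulated MapReduce jobs, one job per iteration.

    Each job re-emits every adjacency list, so the whole graph structure
    passes through the shuffle on every iteration."""
    records = {v: (0.0 if v == source else math.inf, adj) for v, adj in graph.items()}
    iterations = 0
    while True:
        iterations += 1
        # Map phase: pass the structure through and emit tentative distances.
        shuffled = defaultdict(list)
        for v, (dist, adj) in records.items():
            shuffled[v].append(("node", dist, adj))
            if dist < math.inf:
                for u, w in adj:
                    shuffled[u].append(("dist", dist + w, None))
        # Reduce phase: keep the minimum distance and reattach the structure.
        new_records = {}
        for v, values in shuffled.items():
            adj = next(a for kind, _, a in values if kind == "node")
            new_records[v] = (min(d for _, d, _ in values), adj)
        # Driver: stop once an iteration changes no distance.
        changed = any(new_records[v][0] != records[v][0] for v in records)
        records = new_records
        if not changed:
            return {v: d for v, (d, _) in records.items()}, iterations


def sssp_bsp(graph: Graph, source: str) -> Tuple[Dict[str, float], int]:
    """Vertex-centric SSSP in simulated BSP supersteps (Pregel-style).

    Adjacency lists stay with their vertex; only distance messages
    cross the superstep barrier."""
    dist = {v: math.inf for v in graph}
    inbox: Dict[str, List[float]] = {source: [0.0]}
    supersteps = 0
    while inbox:  # halt when no vertex received a message
        supersteps += 1
        outbox: Dict[str, List[float]] = defaultdict(list)
        for v, messages in inbox.items():
            candidate = min(messages)
            if candidate < dist[v]:
                dist[v] = candidate
                for u, w in graph[v]:
                    outbox[u].append(candidate + w)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return dist, supersteps
```

On a four-vertex toy graph (edges a→b 4, a→c 1, c→b 2, b→d 1, c→d 5), both functions return a=0, b=3, c=1, d=4 after four rounds. The results are identical by construction. The efficiency gap the paper reports comes from what each round moves across the cluster and from the cost of running each iteration as a separate job, which a single-process sketch cannot show.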


Published In

Future Generation Computer Systems, Volume 32, Issue C
March 2014
347 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

  1. Big data
  2. Bulk Synchronous Parallel
  3. Cloud computing
  4. Collective classification
  5. Large graph processing
  6. MapReduce
  7. Networked data
  8. Parallel processing
  9. Shortest path

Cited By

  • (2023) Distributed RMI-DBG model. Expert Systems with Applications: An International Journal, 233:C. DOI: 10.1016/j.eswa.2023.120859. Online publication date: 15-Dec-2023.
  • (2022) Visualizing large knowledge graphs. Future Generation Computer Systems, 89:C (224-238). DOI: 10.1016/j.future.2018.06.015. Online publication date: 21-Apr-2022.
  • (2019) Listing all maximal cliques in large graphs on vertex-centric model. The Journal of Supercomputing, 75:8 (4918-4946). DOI: 10.1007/s11227-019-02770-4. Online publication date: 1-Aug-2019.
  • (2018) Special issue on exploiting semantic technologies with particularization on linked data over grid and cloud architectures. Future Generation Computer Systems, 32:C (260-262). DOI: 10.5555/2748143.2748374. Online publication date: 30-Dec-2018.
  • (2018) DPM. Future Generation Computer Systems, 78:P1 (474-480). DOI: 10.1016/j.future.2017.02.025. Online publication date: 1-Jan-2018.
  • (2017) Partitioning dynamic graph asynchronously with distributed FENNEL. Future Generation Computer Systems, 71:C (32-42). DOI: 10.1016/j.future.2017.01.014. Online publication date: 1-Jun-2017.
  • (2017) Beyond social graphs. Pattern Analysis & Applications, 20:1 (269-285). DOI: 10.1007/s10044-016-0550-2. Online publication date: 1-Feb-2017.
  • (2016) Predicting User Participation in Social Media. Proceedings of the 12th International Conference and School on Advances in Network Science, Volume 9564 (126-135). DOI: 10.1007/978-3-319-28361-6_10. Online publication date: 11-Jan-2016.
  • (2015) Thinking Like a Vertex. ACM Computing Surveys, 48:2 (1-39). DOI: 10.1145/2818185. Online publication date: 12-Oct-2015.