Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing

Published: 12 October 2015

Abstract

The vertex-centric programming model is an established computational paradigm recently incorporated into distributed processing frameworks to address challenges in large-scale graph processing. Billion-node graphs that exceed the memory capacity of commodity machines are not well supported by popular Big Data tools like MapReduce, which perform notoriously poorly on iterative graph algorithms such as PageRank. In response, a new type of framework challenges one to “think like a vertex” (TLAV) and implements user-defined programs from the perspective of a vertex rather than a graph. Such an approach improves locality, demonstrates linear scalability, and provides a natural way to express and compute many iterative graph algorithms. These frameworks are simple to program and widely applicable but, like an operating system, are composed of several intricate, interdependent components, a thorough understanding of which is necessary to elicit top performance at scale. To this end, the first comprehensive survey of TLAV frameworks is presented. In this survey, the vertex-centric approach to graph processing is overviewed, TLAV frameworks are deconstructed into four main components, each of which is analyzed, and TLAV implementations are reviewed and categorized.
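To make the "think like a vertex" idea concrete, the following is a minimal single-machine sketch of PageRank written from a vertex's point of view. It is an illustration only, not the API of any surveyed framework: the names `vertex_program` and `pagerank`, the fixed superstep count, the damping factor of 0.85, and the assumption that every vertex has at least one out-edge are all choices made for this example.

```python
from collections import defaultdict


def vertex_program(rank_sum, num_vertices, damping=0.85):
    """Hypothetical per-vertex update: new rank from the sum of incoming messages."""
    return (1.0 - damping) / num_vertices + damping * rank_sum


def pagerank(out_edges, supersteps=20, damping=0.85):
    """Run a synchronous, vertex-centric PageRank on one machine.

    out_edges maps each vertex to the list of vertices it links to.
    Assumption: every vertex appears as a key and has at least one
    out-edge (dangling vertices are not handled in this sketch).
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        inbox = defaultdict(float)
        # Message phase: each vertex sends an equal share of its rank
        # along each of its out-edges.
        for v, neighbors in out_edges.items():
            share = rank[v] / len(neighbors)
            for u in neighbors:
                inbox[u] += share
        # Update phase: after all messages are delivered, each vertex
        # updates its value using only its own inbox.
        rank = {v: vertex_program(inbox[v], n, damping) for v in out_edges}
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Each vertex reads only its own incoming messages and writes only its own value, which is the locality property the abstract refers to. A distributed framework would partition the vertices across workers and deliver the messages over the network.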

References

[1]
Amine Abou-Rjeili and George Karypis. 2006. Multilevel algorithms for partitioning power-law graphs. In Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS’06). IEEE Computer Society, Washington, DC, 124. http://dl.acm.org/citation.cfm?id=1898953.1899055.
[2]
Charu Aggarwal and Karthik Subbian. 2014. Evolutionary network analysis: A survey. ACM Comput. Surv. 47, 1, Article 10 (May 2014), 36 pages.
[3]
Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. 2013. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). International World Wide Web Conferences Steering Committee, Geneva, Switzerland, 37--48. http://dl.acm.org/citation.cfm?id=2488388.2488393.
[4]
Deepak Ajwani, Marcel Karnstedt, and Alessandra Sala. 2015. Processing large graphs: Representations, storage, systems and algorithms. In Proceedings of the 24th International Conference on World Wide Web Companion (WWW’15 Companion). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1545--1545.
[5]
Réka Albert, Hawoong Jeong, and Albert-László Barabási. 2000. Error and attack tolerance of complex networks. Nature 406, 6794 (2000), 378--382.
[6]
Konstantin Andreev and Harald Racke. 2006. Balanced graph partitioning. Theory Comput. Syst. 39, 6, 929--939.
[7]
Ching Avery. 2011. Giraph: Large-scale graph processing infrastructure on Hadoop. In Proceedings of Hadoop Summit. Santa Clara, CA.
[8]
Nguyen Thien Bao and Toyotaro Suzumura. 2013. Towards highly scalable pregel-based graph processing platform with x10. In Proceedings of the 22nd International Conference on World Wide Web Companion (WWW’13 Companion). International World Wide Web Conferences Steering Committee, Geneva, Switzerland, 501--508. http://dl.acm.org/citation.cfm?id=2487788.2487984.
[9]
Scott Beamer, Krste Asanović, and David Patterson. 2013. Direction-optimizing breadth-first search. Sci. Program. 21, 3--4 (July 2013), 137--148. http://dl.acm.org/citation.cfm?id=2590251.2590258.
[10]
Richard Bellman. 1958. On a routing problem. Quart. Appl. Math. 16 (1958), 87--90.
[11]
Michael A. Bender, Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob, and Elias Vicari. 2007. Optimal sparse matrix dense vector multiplication in the I/O-model. In Proceedings of the 19th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’07). ACM, New York, NY, 61--70.
[12]
Una Benlic and Jin-Kao Hao. 2013. Breakout local search for the vertex separator problem. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI’13). AAAI Press, 461--467. http://dl.acm.org/citation.cfm?id=2540128.2540196.
[13]
Dimitri P. Bertsekas, Francesca Guerriero, and Roberto Musmanno. 1996. Parallel asynchronous label-correcting methods for shortest paths. J. Optim. Theory Appl. 88, 2 (Feb. 1996), 297--320.
[14]
Dimitri P. Bertsekas and John N. Tsitsiklis. 1989. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Upper Saddle River, NJ.
[15]
Florian Bourse, Marc Lelarge, and Milan Vojnovic. 2014. Balanced graph edge partition. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, New York, NY, 1456--1465.
[16]
Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 1--7 (April 1998), 107--117.
[17]
Yingyi Bu, Bill Howe, Magdalena Balazinska, and Michael D. Ernst. 2012. The haloop approach to large-scale iterative data analysis. VLDB J. 21, 2 (April 2012), 169--190.
[18]
Aydín Bulu&ctilde;c and John R Gilbert. 2011. The combinatorial BLAS: Design, implementation, and applications. Int. J. High Perform. Comput. Appl. 25, 4 (Nov. 2011), 496--509.
[19]
Hojung Cha and Dongho Lee. 2001. H-BSP: A hierarchical BSP computation model. J. Supercomput. 18, 2 (Feb. 2001), 179--200.
[20]
Kanianthra Mani Chandy and Leslie Lamport. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63--75.
[21]
Kanianthra Mani Chandy and Jayadev Misra. 1984. The drinking philosophers problem. ACM Trans. Program. Lang. Syst. 6, 4 (Oct. 1984), 632--646.
[22]
Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. 2005. X10: An object-oriented approach to non-uniform cluster computing. SIGPLAN Not. 40, 10 (Oct. 2005), 519--538.
[23]
Qun Chen, Song Bai, Zhanhuai Li, Zhiying Gou, Bo Suo, and Wei Pan. 2014a. GraphHP: A hybrid platform for iterative graph processing. Retrieved July 17, 2014, from http://wowbigdata.net.cn/paper/GraphHP %EF%BC%9AA%20Hybrid%20Platform%20for%20Iterative%20Graph%20Processing.pdf.
[24]
Rong Chen, Xin Ding, Peng Wang, Haibo Chen, Binyu Zang, and Haibing Guan. 2014b. Computation and communication efficient graph processing with distributed immutable view. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC’14). ACM, New York, NY, 215--226.
[25]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the 10th European Conference on Computer Systems (EuroSys’15). ACM, New York, NY, Article 1, 15 pages.
[26]
Rong Chen, Jiaxin Shi, Binyu Zang, and Haibing Guan. 2014. Bipartite-oriented distributed graph partitioning for big learning. In Proceedings of 5th Asia-Pacific Workshop on Systems (APSys’14). ACM, Article 14, 7 pages.
[27]
Rishan Chen, Mao Yang, Xuetian Weng, Byron Choi, Bingsheng He, and Xiaoming Li. 2012. Improving large graph processing on partitioned graphs in the cloud. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC’12). ACM, New York, NY, Article 3, 13 pages.
[28]
Yen-Yu Chen, Qingqing Gan, and Torsten Suel. 2002. I/O-efficient techniques for computing pagerank. In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM’02). ACM, New York, NY, 549--557.
[29]
Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. 2012. Kineograph: Taking the pulse of a fast-changing and connected world. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys’12). ACM, New York, NY, 85--98.
[30]
Sun Chung and Anne Condon. 1996. Parallel implementation of Bouvka’s minimum spanning tree algorithm. In Proceedings of the 10th International Parallel Processing Symposium (IPPS’06). IEEE Computer Society, Washington, DC, 302--308.
[31]
Jonathan Cohen. 2009. Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11, 4 (July 2009), 29--41.
[32]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107--113.
[33]
Mohamed Didi Biha and Marie-Jean Meurs. 2011. An exact algorithm for solving the vertex separator problem. J. Global Optim. 49, 3 (March 2011), 425--434.
[34]
Edsger Wybe Dijkstra. 1959. A note on two problems in connection with graphs. Numer. Math. 1, 1 (1959), 269--271.
[35]
Edsger Wybe Dijkstra. 1971. Hierarchical ordering of sequential processes. Acta Inf. 1, 2 (June 1971), 115--138.
[36]
Nicholas Edmonds. 2013. Active Messages as a Spanning Model for Parallel Graph Computation. Ph.D. Dissertation. Indiana University.
[37]
Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. 2010. Twister: A runtime for iterative MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC’10). ACM, New York, NY, 810--818.
[38]
Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel. 2015. The case against specialized graph analytics engines. In Online Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR’15), Asilomar, CA, January 4--7, 2015. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf.
[39]
Uriel Feige, MohammadTaghi Hajiaghayi, and James R. Lee. 2008. Improved approximation algorithms for minimum weight vertex separators. SIAM J. Comput. 38, 2 (May 2008), 629--657.
[40]
Linton C. Freeman. 1977. A set of measures of centrality based on betweenness. Sociometry 40, 1, 35--41.
[41]
Joachim Gehweiler and Henning Meyerhenke. 2010. A distributed diffusive heuristic for clustering a virtual P2P supercomputer. In Proceedings of the 2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and PhD Forum (IPDPSW’10). 1--8.
[42]
Joseph Gonzalez, Yucheng Low, Arthur Gretton, and Carlos Guestrin. 2011. Parallel gibbs sampling: From colored fields to thin junction trees. In International Conference on Artificial Intelligence and Statistics. 324--332.
[43]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 17--30. http://dl.acm.org/citation.cfm?id=2387880.2387883.
[44]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, 599--613. http://dl.acm.org/citation.cfm?id=2685048.2685096
[45]
Douglas Gregor and Andrew Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. In Proceedings of the Parallel Object-Oriented Scientific Computing (POOSC’14).
[46]
Alessio Guerrieri and Alberto Montresor. 2014. Distributed edge partitioning for graph processing. arXiv preprint arXiv:1403.6270. http://arxiv.org/abs/1403.6270
[47]
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at twitter. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). International World Wide Web Conferences Steering Committee, Geneva, Switzerland, 505--514. http://dl.acm.org/citation.cfm?id=2488388.2488433
[48]
William W. Hager, James T. Hungerford, and Ilya Safro. 2014. A multilevel bilinear programming algorithm for the vertex separator problem. arXiv preprint arXiv:1410.4885. http://arxiv.org/abs/1410.4885.
[49]
Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer Ozsu, Xingfang Wang, and Tianqi Jin. 2014. An experimental comparison of pregel-like graph processing systems. Proceedings of the VLDB Endowment 7, 12 (2014), 1047--1058.
[50]
Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, and Hwanjo Yu. 2013. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, 77--85.
[51]
Wentao Hant, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. 2014. Chronos: A graph engine for temporal graph analysis. In Proceedings of the 9th European Conference on Computer Systems (EuroSys’14). ACM, New York, NY, Article 1, 14 pages.
[52]
Harshvardhan, Adam Fidel, Nancy M. Amato, and Lawrence Rauchwerger. 2014. KLA: A new algorithmic paradigm for parallel graph computations. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT’14). ACM, New York, NY, 27--38.
[53]
Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun. 2011. Efficient parallel graph exploration on multi-core CPU and GPU. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT’11). IEEE Computer Society, Washington, DC, 78--88.
[54]
Imranul Hoque and Indranil Gupta. LFGraph: Simple and fast distributed graph analytics. In Proceedings of the ACM Symposium on Timely Results in Operating Systems.
[55]
Borislav Iordanov. 2010. HyperGraphDB: A generalized graph database. In Proceedings of the 2010 International Conference on Web-Age Information Management. Springer-Verlag, Berlin, 25--36. http://dl.acm.org/citation.cfm?id=1927585.1927589
[56]
Nilesh Jain, Guangdeng Liao, and Theodore L. Willke. 2013. GraphBuilder: Scalable graph ETL framework. In 1st International Workshop on Graph Data Management Experiences and Systems (GRADES’13). ACM, New York, NY, Article 4, 6 pages.
[57]
Tomasz Kajdanowicz, Przemyslaw Kazienko, and Wojciech Indyk. 2014. Parallel processing of large graphs. Future Gener. Comput. Syst. 32 (March 2014), 324--337.
[58]
U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos. 2011. GBASE: A scalable and general graph management system. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, New York, NY, 1091--1099.
[59]
U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. 2009. PEGASUS: A peta-scale graph mining system implementation and observations. In Proceedings of the 2009 9th IEEE International Conference on Data Mining (ICDM’09). IEEE Computer Society, Washington, DC, 229--238.
[60]
George Karypis and Vipin Kumar. 1995. Multilevel graph partitioning schemes. In Proceedings of the International Conference on Parallel Processing (ICPP’95). 113--122.
[61]
George Karypis and Vipin Kumar. 1996. Parallel multilevel k-way partitioning scheme for irregular graphs. In Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (Supercomputing’96). IEEE Computer Society, Washington, DC, Article 35.
[62]
Brian W. Kernighan and Shen Lin. 1970. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49, 2 (1970), 291--307.
[63]
Arijit Khan and Sameh Elnikety. 2014. Systems for big-graphs. Proc. VLDB Endow. 7, 13 (Aug. 2014), 1709--1710. http://dl.acm.org/citation.cfm?id=2733004.2733067
[64]
Zuhair Khayyat, Karim Awara, Amani Alonazi, Hani Jamjoom, Dan Williams, and Panos Kalnis. 2013. Mizan: A system for dynamic load balancing in large-scale graph processing. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys’13). ACM, New York, NY, 169--182.
[65]
Gang-Hoon Kim, Silvana Trimi, and Ji-Hyong Chung. 2014. Big-data applications in the government sector. Commun. ACM 57, 3 (March 2014), 78--85.
[66]
Mijung Kim and K. Selçuk Candan. 2012. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data Knowl. Eng. 72 (Feb. 2012), 285--303.
[67]
Fabian Kuhn, Thomas Moscibroda, and Roger Wattenhofer. 2010. Local computation: Lower and upper bounds. arXiv preprint arXiv:1011.5470.
[68]
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2009. Optimistic parallelism requires abstractions. Commun. ACM 52, 9 (Sept. 2009), 89--97.
[69]
Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, 31--46. http://dl.acm.org/citation.cfm?id=2387880.2387884
[70]
Aapo Kyrola and Carlos Guestrin. 2014. GraphChi-DB: Simple design for a scalable graph database system--on just a PC. arXiv preprint arXiv:1403.0701. http://arxiv.org/abs/1403.0701
[71]
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1 (2009), 29--123.
[72]
Jimmy Lin. 2013. Mapreduce is good enough? If all you have is a hammer, throw away everything that’s not a nail! Big Data 1, 1 (2013), 28--37.
[73]
Jimmy Lin and Michael Schatz. 2010. Design patterns for efficient graph algorithms in MapReduce. In Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG’10). ACM, New York, NY, 78--85.
[74]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (April 2012), 716--727.
[75]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990.
[76]
Honghui Lu, Sandhya Dwarkadas, Alan L. Cox, and Willy Zwaenepoel. 1995. Message passing versus distributed shared memory on networks of workstations. In Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (Supercomputing’95). ACM, New York, NY, Article 37.
[77]
Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. 2014. Large-scale distributed graph computing systems: An experimental evaluation. Proc. VLDB Endow. 8 (2014), 3.
[78]
Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan Berry. 2007. Challenges in parallel graph processing. Parallel Process. Lett. 17, 01 (2007), 5--20.
[79]
Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann.
[80]
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, 135--146.
[81]
Jasmina Malicevic, Laurent Bindschaedler, Amitabha Roy, and Willy Zwaenepoel. 2014. X-Stream. http://labos.epfl.ch/x-stream.
[82]
Urlich Meyer and Peter Sanders. 2003. Δ-stepping: A parallelizable shortest path algorithm. J. Algorithms 49, 1 (2003), 114--152. 1998 European Symposium on Algorithms.
[83]
Henning Meyerhenke, Peter Sanders, and Christian Schulz. 2014. Parallel graph partitioning for complex networks. arXiv preprint arXiv:1404.4797.
[84]
Hui Miao, Xiangyang Liu, Bert Huang, and Lise Getoor. 2013. A hypergraph-partitioned vertex programming approach for large-scale consensus optimization. In Proceedings of the 2013 IEEE International Conference on Big Data. 193--198.
[85]
Kameshwar Munagala and Abhiram Ranade. 1999. I/O-complexity of graph algorithms. In Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99). Society for Industrial and Applied Mathematics, Philadelphia, PA, 687--694. http://dl.acm.org/citation.cfm?id=314500.314891.
[86]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, 456--471.
[87]
Karthik Nilakant, Valentin Dalibard, Amitabha Roy, and Eiko Yoneki. 2014. PrefEdge: SSD prefetcher for large-scale graph traversal. In Proceedings of International Conference on Systems and Storage (SYSTOR’14). ACM, New York, NY, Article 4, 12 pages.
[88]
M. Usman Nisar, Arash Fard, and John A. Miller. 2013. Techniques for graph analytics on big data. In Proceedings of the 2013 IEEE International Congress on Big Data (BIGDATACONGRESS’13). IEEE Computer Society, Washington, DC, 255--262.
[89]
Joel Nishimura and Johan Ugander. 2013. Restreaming graph partitioning: Simple versatile algorithms for advanced balancing. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, 1106--1114.
[90]
Bill Nitzberg and Virginia Lo. 1991. Distributed shared memory: A survey of issues and algorithms. Computer 24, 8 (Aug. 1991), 52--60.
[91]
Matthew Felice Pace. 2012. {BSP} vs MapReduce. Procedia Comput. Sci. 9, (2012), 246--255. Proceedings of the International Conference on Computational Science, {ICCS} 2012.
[92]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[93]
Roger Pearce, Maya Gokhale, and Nancy M. Amato. 2010. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’10). IEEE Computer Society, Washington, DC, 1--11.
[94]
David Peleg. 2000. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[95]
Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’11). 12--25.
[96]
Ivanilton Polato, Reginaldo R. Alfredo Goldman, and Fabio Kon. 2014. A comprehensive view of Hadoop research -- A systematic literature review. J. Network Comput. Appl. 46, (2014), 1--25.
[97]
Russell Power and Jinyang Li. 2010. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, Vol. 10. 1--14.
[98]
Vijayan Prabhakaran, Ming Wu, Xuetian Weng, Frank McSherry, Lidong Zhou, and Maya Haridasan. 2012. Managing large graphs on multi-cores with graph awareness. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (USENIX ATC’12). USENIX Association, Berkeley, CA, 4. http://dl.acm.org/citation.cfm?id=2342821.2342825.
[99]
Robert Preis. 1999. Linear time 1/2 -approximation algorithm for maximum weighted matching in general graphs. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science (STACS’99). Springer-Verlag, Berlin, 259--269. http://dl.acm.org/citation.cfm?id=1764891.1764924.
[100]
Jelica Protic, Milo Tomasevic, and Veljko Milutinovic (Eds.). 1997. Distributed Shared Memory: Concepts and Systems. IEEE Computer Society Press, Los Alamitos, CA.
[101]
Louise Quick, Paul Wilkinson, and David Hardcastle. 2012. Using pregel-like large scale graph processing frameworks for social network analysis. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’12). IEEE Computer Society, Washington, DC, 457--463.
[102]
Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 3 (2007), 036106. http://dx.doi.org/10.1103/PhysRevE.76.036106.
[103]
Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, and Seif Haridi. 2014. Distributed vertex-cut partitioning. In Distributed Applications and Interoperable Systems, Kostas Magoutis and Peter Pietzuch (Eds.). Springer, Berlin, 186--200.
[104]
Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, and Seif Haridi. 2013. JA-BE-JA: A distributed algorithm for balanced graph partitioning. In Proceedings of the 2013 IEEE 7th International Conference on Self-Adaptive and Self-Organizing Systems (SASO’13). IEEE Computer Society, Washington, DC, 51--60.
[105]
Lakshmish Ramaswamy, Bugra Gedik, and Ling Liu. 2005. A distributed approach to node clustering in decentralized peer-to-peer networks. IEEE Trans. Parallel Distrib. Syst. 16, 9 (Sept. 2005), 814--829.
[106]
Mark Redekopp, Yogesh Simmhan, and Viktor K. Prasanna. 2013. Optimizations and analysis of BSP graph processing models on public clouds. In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS’13). IEEE Computer Society, Washington, DC, 203--214.
[107]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, 472--488.
[108]
Sherif Sakr. 2013. Processing large-scale graph data: A guide to current technology. IBM Developerworks (June 2013), 15.
[109]
Semih Salihoglu and Jennifer Widom. 2013. GPS: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management (SSDBM’13). ACM, New York, NY, Article 22, 12 pages.
[110]
Semih Salihoglu and Jennifer Widom. 2014. Optimizing Graph Algorithms on Pregel-Like Systems. Technical Report. Stanford InfoLab.
[111]
Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming Yin, and Pradeep Dubey. 2014. Navigating the maze of graph analytics frameworks using massive graph datasets. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD'14). ACM, New York, NY, 979--990.
[112]
Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, and Seungryoul Maeng. 2010. HAMA: An efficient matrix computation with the MapReduce framework. In Proceedings of the 2010 IEEE 2nd International Conference on Cloud Computing Technology and Science (CLOUDCOM’10). IEEE Computer Society, Washington, DC, 721--726.
[113]
Tina Beseri Sevim, Hakan Kutucu, and Murat Ersen Berberler. 2012. New mathematical model for finding minimum vertex cut set. In 2012 IV International Conference on Problems of Cybernetics and Informatics (PCI’12). 1--2.
[114]
Zechao Shang and Jeffrey Xu Yu. 2013. Catch the wind: Graph workload balancing on cloud. In Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE’13). IEEE Computer Society, Washington, DC, 553--564.
[115]
Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A distributed graph engine on a memory cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 505--516.
[116]
Yanyan Shen, Gang Chen, H. V. Jagadish, Wei Lu, Beng Chin Ooi, and Bogdan Marius Tudor. 2014. Fast failure recovery in distributed graph processing systems. Proc. VLDB Endow. 8, 4 (Dec. 2014), 437--448. http://dl.acm.org/citation.cfm?id=2735496.2735506.
[117]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13). ACM, New York, NY, 135--146.
[118]
Julian Shun, Laxman Dhulipala, and Guy Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Proceedings of the IEEE Data Compression Conference (DCC’15).
[119]
Yogesh Simmhan, Alok Kumbhare, Charith Wickramaarachchi, Soonil Nagarkar, Santosh Ravi, Cauligi Raghavendra, and Viktor Prasanna. 2014. GoFFish: A sub-graph centric framework for large-scale graph analytics. In Euro-Par 2014 Parallel Processing, Fernando Silva, Inłs Dutra, and Vtor Santos Costa (Eds.). Lecture Notes in Computer Science, Vol. 8632. Springer International Publishing, 451--462.
[120]
George M. Slota, Kamesh Madduri, and Sivasankaran Rajamanickam. 2014. PULP: Scalable multi-objective multi-constraint partitioning for small-world networks. In 2014 IEEE International Conference on Big Data. IEEE, Washington, DC, 481--490.
[121]
Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’14). SIAM, 1287--1301. http://dl.acm.org/citation.cfm?id=2634074.2634169.
[122]
Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). ACM, New York, NY, 1222--1230.
[123]
Philip Stutz, Abraham Bernstein, and William Cohen. 2010. Signal/Collect: Graph algorithms for the (semantic) web. In Proceedings of the 9th International Semantic Web Conference on the Semantic Web - Volume I. Springer-Verlag, Berlin, 764--780. http://dl.acm.org/citation.cfm?id=1940281.1940330.
[124]
Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, New York, NY, 607--614.
[125]
Serafettin Tasci and Murat Demirbas. 2013. Giraphx: Parallel yet serializable large-scale graph processing. In Proceedings of the 19th International Conference on Parallel Processing. Springer-Verlag, Berlin, 458--469.
[126]
Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From “think like a vertex” to “think like a graph.” Proc. VLDB Endow. 7 (2013), 3.
[127]
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, 104--112.
[128]
Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. FENNEL: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining (WSDM’14). ACM, New York, NY, 333--342.
[129]
Johan Ugander and Lars Backstrom. 2013. Balanced label propagation for partitioning massive graphs. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM’13). ACM, New York, NY, 507--516.
[130]
Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103--111.
[131]
Luis Vaquero, Félix Cuadrado, Dionysios Logothetis, and Claudio Martella. 2013. xdgp: A dynamic graph processing system with adaptive partitioning. arXiv preprint arXiv:1309.1049. http://arxiv.org/abs/1309.1049
[132]
Luis Vaquero, Felix Cuadrado, and Matei Ripeanu. 2014. Systems for near real-time analysis of large-scale dynamic graphs. arXiv preprint arXiv:1410.1903. http://arxiv.org/abs/1410.1903
[133]
Luis M. Vaquero, Felix Cuadrado, Dionysios Logothetis, and Claudio Martella. 2014. Adaptive partitioning for large-scale dynamic graphs. In Proceedings of the 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS’14). IEEE Computer Society, Washington, DC, 144--153.
[134]
Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. 1992. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th Annual International Symposium on Computer Architecture (ISCA’92). ACM, New York, NY, 256--266.
[135]
Guozhang Wang, Wenlei Xie, Alan J. Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR’13).
[136]
Lu Wang, Yanghua Xiao, Bin Shao, and Haixun Wang. 2014. How to partition a billion-node graph. In 2014 IEEE 30th International Conference on Data Engineering (ICDE’14). 568--579.
[137]
Peng Wang, Kaiyuan Zhang, Rong Chen, Haibo Chen, and Haibing Guan. 2014. Replication-based fault-tolerance for large-scale graph processing. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’14). 562--573.
[138]
Rui Wang and K. Chiu. 2013. A stream partitioning approach to processing large scale distributed graph datasets. In 2013 IEEE International Conference on Big Data. 537--542.
[139]
Jim Webber. 2012. A programmatic introduction to Neo4j. In Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity (SPLASH’12). ACM, New York, NY, 217--218.
[140]
Jeremiah James Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, and Andrew Lumsdaine. 2011. Active pebbles: Parallel programming for data-driven applications. In Proceedings of the International Conference on Supercomputing (ICS’11). ACM, New York, NY, 235--244.
[141]
Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang, and Haibo Chen. 2015. Sync or async: Time to fuse for distributed graph-parallel computation. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15).
[142]
Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. 2014. Distributed power-law graph computing: Theoretical and empirical analysis. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, 1673--1681. http://papers.nips.cc/paper/5396-distributed-power-law-graph-computing-theoretical-and-empirical-analysis.pdf
[143]
Wenlei Xie, Guozhang Wang, David Bindel, Alan Demers, and Johannes Gehrke. 2013. Fast iterative graph computation with block updates. Proc. VLDB Endow. 6, 14 (Sept. 2013), 2014--2025.
[144]
Ning Xu, Lei Chen, and Bin Cui. 2014. LogGP: A log-based dynamic graph partitioning method. Proc. VLDB Endow. 7, 14 (2014).
[145]
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014a. Blogel: A block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endow. 7, 14 (2014).
[146]
Da Yan, James Cheng, Kai Xing, Li Lu, Wilfred Ng, and Yingyi Bu. 2014b. Pregel algorithms for graph connectivity problems with performance guarantees. Proc. VLDB Endow. 7 (2014).
[147]
Eiko Yoneki and Amitabha Roy. 2013. Scale-up graph processing: A storage-centric view. In 1st International Workshop on Graph Data Management Experiences and Systems (GRADES’13). ACM, New York, NY, Article 8, 6 pages.
[148]
Pingpeng Yuan, Wenya Zhang, Changfeng Xie, Hai Jin, Ling Liu, and Kisung Lee. 2014. Fast iterative graph computation: A path centric approach. In Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, 401--412.
[149]
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10). USENIX Association, Berkeley, CA, 10--10. http://dl.acm.org/citation.cfm?id=1863103.1863113
[150]
ZengFeng Zeng, Bin Wu, and Haoyu Wang. 2012. A parallel graph partitioning algorithm to speed up the large-scale distributed graph mining. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine’12). ACM, New York, NY, 61--68.
[151]
Kaiyuan Zhang, Rong Chen, and Haibo Chen. 2015. NUMA-aware graph-structured analytics. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). ACM, New York, NY, 183--193.
[152]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2012a. Accelerate large-scale iterative computation through asynchronous accumulative updates. In Proceedings of the 3rd Workshop on Scientific Cloud Computing (ScienceCloud’12). ACM, New York, NY, 13--22.
[153]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2012b. iMapReduce: A distributed computing framework for iterative computation. J. Grid Comput. 10, 1 (March 2012), 47--68.
[154]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2013. PrIter: A distributed framework for prioritizing iterative computations. IEEE Trans. Parallel Distrib. Syst. 24, 9 (Sept. 2013), 1884--1893.
[155]
Yue Zhao, Kenji Yoshigoe, Mengjun Xie, Suijian Zhou, Remzi Seker, and Jiang Bian. 2014. LightGraph: Lighten communication in distributed graph-parallel processing. In Proceedings of the 2014 IEEE International Congress on Big Data (BIGDATACONGRESS’14). IEEE Computer Society, Washington, DC, 717--724.
[156]
Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST’15). USENIX Association, Berkeley, CA, 45--58. http://dl.acm.org/citation.cfm?id=2750482.2750486
[157]
Xianke Zhou, Pengfei Chang, and Gang Chen. 2014. An efficient graph processing system. In Web Technologies and Applications, Lei Chen, Yan Jia, Timos Sellis, and Guanfeng Liu (Eds.). Lecture Notes in Computer Science, Vol. 8709. Springer International Publishing, 401--412.
[158]
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative filtering for the Netflix prize. In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management. Springer-Verlag, Berlin, 337--348.

Published In

ACM Computing Surveys, Volume 48, Issue 2
November 2015
615 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2830539
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 October 2015
Accepted: 01 August 2015
Revised: 01 July 2015
Received: 01 January 2015
Published in CSUR Volume 48, Issue 2

Author Tags

  1. Big Data
  2. Graph processing
  3. distributed algorithms
  4. distributed systems
  5. Pregel

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • AFOSR
  • University of Notre Dame Department of Computer Science and Engineering
  • Department of Education GAANN Fellowship

Cited By

  • (2025) Lock-Free Triangle Counting on GPU. IEEE Transactions on Computers 74:3 (1040-1052). DOI: 10.1109/TC.2024.3504295. Online publication date: 1-Mar-2025
  • (2024) CUTTANA: Scalable Graph Partitioning for Faster Distributed Graph Databases and Analytics. Proceedings of the VLDB Endowment 18:1 (14-27). DOI: 10.14778/3696435.3696437. Online publication date: 1-Sep-2024
  • (2024) Improving Graph Compression for Efficient Resource-Constrained Graph Analytics. Proceedings of the VLDB Endowment 17:9 (2212-2226). DOI: 10.14778/3665844.3665852. Online publication date: 1-May-2024
  • (2024) Automating Vectorized Distributed Graph Computation. Proceedings of the ACM on Management of Data 2:6 (1-27). DOI: 10.1145/3698833. Online publication date: 20-Dec-2024
  • (2024) SpeedCore: Space-efficient and Dependency-aware GPU Parallel Framework for Core Decomposition. Proceedings of the 53rd International Conference on Parallel Processing (555-564). DOI: 10.1145/3673038.3673111. Online publication date: 12-Aug-2024
  • (2024) IMESH: A DSL for Mesh Processing. ACM Transactions on Graphics 43:5 (1-17). DOI: 10.1145/3662181. Online publication date: 25-Jun-2024
  • (2024) TLPGNN: A Lightweight Two-level Parallelism Paradigm for Graph Neural Network Computation on Single and Multiple GPUs. ACM Transactions on Parallel Computing 11:2 (1-28). DOI: 10.1145/3644712. Online publication date: 8-Jun-2024
  • (2024) A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and Interaction. Proceedings of the ACM on Management of Data 2:1 (1-29). DOI: 10.1145/3639315. Online publication date: 26-Mar-2024
  • (2024) GraphScope Flex: LEGO-like Graph Computing Stack. Companion of the 2024 International Conference on Management of Data (386-399). DOI: 10.1145/3626246.3653383. Online publication date: 9-Jun-2024
  • (2024) Towards Efficient Graph Processing in Geo-Distributed Data Centers. IEEE Transactions on Parallel and Distributed Systems 35:11 (2147-2160). DOI: 10.1109/TPDS.2024.3453872. Online publication date: Nov-2024
