research-article
Open access

DisGCo: A Compiler for Distributed Graph Analytics

Published: 30 September 2020

Abstract

Graph algorithms are widely used in various applications. Their programmability and performance have garnered considerable interest among researchers. Being able to run these graph analytics programs on distributed systems is an important requirement. Green-Marl is a popular Domain Specific Language (DSL) for coding graph algorithms and is known for its simplicity. However, the existing Green-Marl compiler for distributed systems (Green-Marl to Pregel) can compile only a limited class of Green-Marl programs (those in Pregel canonical form). This severely restricts the types of parallel Green-Marl programs that can be executed on distributed systems. We present DisGCo, the first compiler to translate any general Green-Marl program to an equivalent MPI program that can run on distributed systems.
Translating Green-Marl programs to MPI (SPMD/MPMD style of computation, distributed memory) presents many challenges beyond differences in syntax, because Green-Marl gives the programmer a unified view of the whole memory and allows parallel and serial code to be intermixed. We first present the set of challenges involved in translating Green-Marl programs to MPI and then present a systematic approach to the translation. We also present a few optimization techniques to improve the performance of our generated programs. DisGCo is the first graph DSL compiler that can handle all syntactic capabilities of a practical graph DSL like Green-Marl and generate code that can run on distributed systems. Our preliminary evaluation of DisGCo shows that our generated programs are scalable. Further, compared to the state-of-the-art DH-Falcon compiler that translates a subset of Falcon programs to MPI, our generated codes exhibit a geomean speedup of 17.32×.
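
To make the gap between a unified memory view and SPMD execution concrete, the following is a minimal illustrative sketch in plain Python. It is not DisGCo output and does not use MPI: the block partitioning, the `owner` helper, and the simulated ranks are all assumptions introduced here for illustration. It contrasts a single loop that sees every vertex with a version in which each simulated rank touches only the vertices it owns and a final sum stands in for a reduction across processes.

```python
# Hypothetical sketch: not DisGCo output and not real MPI.
# Vertices are assumed to be numbered 0..n-1.
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: vertex -> out-neighbours


def owner(v: int, num_vertices: int, num_ranks: int) -> int:
    """Rank owning vertex v under a simple block partition (an assumption)."""
    block = (num_vertices + num_ranks - 1) // num_ranks  # ceiling division
    return v // block


def unified_degree_sum(graph: Graph) -> int:
    """Unified-memory view: one loop sees every vertex."""
    return sum(len(nbrs) for nbrs in graph.values())


def spmd_degree_sum(graph: Graph, num_ranks: int) -> int:
    """SPMD view: every rank runs the same code on the vertices it owns."""
    n = len(graph)
    partials = []
    for rank in range(num_ranks):  # stands in for num_ranks separate processes
        local = sum(
            len(graph[v]) for v in graph if owner(v, n, num_ranks) == rank
        )
        partials.append(local)
    # Stands in for a sum reduction across all ranks.
    return sum(partials)


graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [], 4: [0]}
assert unified_degree_sum(graph) == spmd_degree_sum(graph, num_ranks=2)
```

Even in this small case the distributed version needs an ownership rule and an explicit combining step that the unified version never mentions. A real translation must also handle reads and writes to vertices owned by other ranks, which this sketch deliberately avoids.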


Information

Published In

ACM Transactions on Architecture and Code Optimization  Volume 17, Issue 4
December 2020
430 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3427420
© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2020
Accepted: 01 July 2020
Revised: 01 June 2020
Received: 01 December 2019
Published in TACO Volume 17, Issue 4

Author Tags

  1. Graph analytics
  2. GreenMarl
  3. distributed programming

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • SERB CRG
  • NSM research
