research-article
Open access

Grafs: declarative graph analytics

Published: 19 August 2021

Abstract

Graph analytics elicits insights from large graphs to inform critical decisions for business, safety and security. Several large-scale graph processing frameworks feature efficient runtime systems; however, they often provide programming models that are low-level and subtly different from each other. As a result, end users can find implementing, and especially optimizing, graph analytics error-prone and time-consuming. This paper regards the abstract interface of the graph processing frameworks as the instruction set for graph analytics, and presents Grafs, a high-level declarative specification language for graph analytics and a synthesizer that automatically generates efficient code for five high-performance graph processing frameworks. It features novel semantics-preserving fusion transformations that optimize the specifications and reduce them to three primitives: reduction over paths, mapping over vertices and reduction over vertices. Reductions over paths are commonly calculated based on push or pull models that iteratively apply kernel functions at the vertices. This paper presents conditions, parametric in terms of the kernel functions, for the correctness and termination of the iterative models, and uses these conditions as specifications to automatically synthesize the kernel functions. Experimental results show that the generated code matches or outperforms handwritten code, and that fusion accelerates execution.
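To make the three primitives concrete, the following is a minimal Python sketch, not Grafs syntax and not the code the synthesizer emits. It assumes a push model in which each active vertex applies a propagation kernel along its out-edges and a reduction kernel at the target vertex. All names (`push_path_reduction`, `map_vertices`, `reduce_vertices`, and the kernel parameters) are hypothetical and chosen only for illustration. The instance shown is single-source shortest paths, which terminates here because the edge weights are non-negative and the reduction is `min`.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Vertex = Hashable
# Adjacency list: vertex -> [(successor, edge weight)]. Every vertex must be a key.
Graph = Dict[Vertex, List[Tuple[Vertex, float]]]


def push_path_reduction(
    graph: Graph,
    source: Vertex,
    init: float,
    identity: float,
    propagate: Callable[[float, float], float],
    reduce_: Callable[[float, float], float],
) -> Dict[Vertex, float]:
    """Reduction over paths, push model (illustrative sketch).

    Active vertices push `propagate(value, weight)` to their successors;
    each successor merges the pushed value with `reduce_`. A vertex becomes
    active again only when its value changes, so the loop stops at a fixpoint.
    """
    value = {v: identity for v in graph}
    value[source] = init
    active = {source}
    while active:
        next_active = set()
        for u in active:
            for v, w in graph[u]:
                merged = reduce_(value[v], propagate(value[u], w))
                if merged != value[v]:
                    value[v] = merged
                    next_active.add(v)
        active = next_active
    return value


def map_vertices(value: Dict[Vertex, float], f: Callable[[float], object]) -> dict:
    """Mapping over vertices: apply `f` to each vertex value."""
    return {v: f(x) for v, x in value.items()}


def reduce_vertices(value: dict, op: Callable, identity):
    """Reduction over vertices: fold all vertex values with `op`."""
    acc = identity
    for x in value.values():
        acc = op(acc, x)
    return acc


# Hypothetical example graph with non-negative weights.
graph: Graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 6.0)],
    "c": [("d", 1.0)],
    "d": [],
}

# Shortest paths from "a": propagate adds the edge weight, reduce keeps the minimum.
dist = push_path_reduction(
    graph, "a", init=0.0, identity=math.inf,
    propagate=lambda d, w: d + w, reduce_=min,
)
reachable = map_vertices(dist, lambda d: d < math.inf)
farthest = reduce_vertices(dist, max, 0.0)
```

In this sketch the correctness and termination of the loop depend on the chosen kernels: with `min` and non-negative weights the values only decrease and are bounded below, whereas an arbitrary pair of kernels could oscillate forever. That dependence is the kind of condition on kernel functions that the abstract refers to.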

Supplementary Material

Auxiliary Presentation Video (icfp21main-p119-p-video.mp4)
This is a presentation video for the article "Grafs: Declarative Graph Analytics" presented at ICFP 2021.

Published In

Proceedings of the ACM on Programming Languages  Volume 5, Issue ICFP
August 2021
1011 pages
EISSN:2475-1421
DOI:10.1145/3482883
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Fusion
  2. Program Synthesis

Qualifiers

  • Research-article
