research-article

Open access

Falcon: A Graph Manipulation Language for Heterogeneous Systems

Authors: Unnikrishnan Cheramangalath, Y. N. Srikant

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4

Article No.: 54, Pages 1 - 27

https://doi.org/10.1145/2842618

Published: 22 December 2015

Abstract

Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy—even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
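The abstract separates local computation algorithms, which update vertex or edge attributes without changing the graph structure, from morph algorithms, which do change it. Dynamic SSSP illustrates both. Below is a minimal sequential Python sketch of that idea, assuming an edge-list graph with non-negative weights. It is not Falcon syntax and not the code Falcon generates. The names `Graph`, `relax_to_fixpoint`, `insert_edge_and_update`, and `sssp` are hypothetical. Bellman-Ford-style relaxation to a fixpoint stands in for the local computation, and edge insertion followed by re-relaxation stands in for the morph step.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical edge-list graph; not Falcon's internal representation."""
    num_vertices: int
    edges: list = field(default_factory=list)  # (src, dst, weight) triples


def relax_to_fixpoint(g, dist):
    """Local computation: relax every edge until no distance changes.

    Only the vertex attribute `dist` is updated; the edge set is untouched.
    Assumes no negative-weight cycles (otherwise the loop never terminates).
    Returns the number of sweeps over the edge list.
    """
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for src, dst, weight in g.edges:
            candidate = dist[src] + weight  # inf + weight stays inf
            if candidate < dist[dst]:
                dist[dst] = candidate
                changed = True
    return sweeps


def insert_edge_and_update(g, dist, edge):
    """Morph step: add an edge, then re-relax starting from the old distances.

    Inserting an edge can only shorten paths, so the old distances remain
    valid upper bounds and a new fixpoint from them gives the new distances.
    """
    g.edges.append(edge)
    return relax_to_fixpoint(g, dist)


def sssp(g, source):
    """Shortest distances from `source`, computed from scratch."""
    dist = [float("inf")] * g.num_vertices
    dist[source] = 0
    relax_to_fixpoint(g, dist)
    return dist
```

For example, on the path 0 -> 1 -> 2 -> 3 with weights 4, 3, and 2, `sssp(g, 0)` returns `[0, 4, 7, 9]`. Inserting the edge `(0, 2, 1)` with `insert_edge_and_update` then lowers the distances to `[0, 4, 1, 3]` without recomputing from scratch. Reusing the old distances is valid only for insertions and weight decreases. Deletions can lengthen shortest paths, so they need a different update strategy.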

Supplementary Material

TACO1204-54: slide deck associated with this paper (taco1204-54.pdf, 751.36 KB)



Index Terms

Falcon: A Graph Manipulation Language for Heterogeneous Systems
1. Software and its engineering
  1. Software notations and tools
    1. Compilers


Information

Published In

ACM Transactions on Architecture and Code Optimization, Volume 12, Issue 4

January 2016

848 pages

ISSN: 1544-3566

EISSN: 1544-3973

DOI: 10.1145/2836331

Editor: Koen De Bosschere, Ghent University

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2015

Accepted: 01 October 2015

Revised: 01 October 2015

Received: 01 June 2015

Published in TACO Volume 12, Issue 4


Qualifiers

Research-article
Research
Refereed

Funding Sources

IMPECS project of DST (Government of India)
MPI-SWS (Germany)


Article Metrics

Total citations: 22

Total downloads: 985

Downloads (last 12 months): 131

Downloads (last 6 weeks): 25

Reflects downloads up to 09 Nov 2024.


Cited By

Boukham H, Wachsmuth G, Dwars M, Chiadmi D, Fischer B, Burgueño L, Cazzola W (2022). A Multi-target, Multi-paradigm DSL Compiler for Algorithmic Graph Processing. Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering, 2-15. DOI: 10.1145/3567512.3567513. Online publication date: 29-Nov-2022.
https://dl.acm.org/doi/10.1145/3567512.3567513

Ko S, Lee T, Hong K, Lee W, Seo I, Seo J, Han W, Li G, Li Z, Idreos S, Srivastava D (2021). iTurboGraph. Proceedings of the 2021 International Conference on Management of Data, 977-990. DOI: 10.1145/3448016.3457243. Online publication date: 9-Jun-2021.
https://dl.acm.org/doi/10.1145/3448016.3457243

Zheng R, Pai S, Lee J (2021). Efficient execution of graph algorithms on CPU with SIMD extensions. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, 262-276. DOI: 10.1109/CGO51591.2021.9370326. Online publication date: 27-Feb-2021.
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370326

Chen X, Dathathri R, Gill G, Pingali K (2020). Pangolin. Proceedings of the VLDB Endowment 13:8, 1190-1205. DOI: 10.14778/3389133.3389137. Online publication date: 3-May-2020.
https://dl.acm.org/doi/10.14778/3389133.3389137

Gogoi B, Cheramangalath U, Nasre R, Jog A, Kayiran O, Pattnaik A (2020). Custom code generation for a graph DSL. Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 51-60. DOI: 10.1145/3366428.3380772. Online publication date: 23-Feb-2020.
https://dl.acm.org/doi/10.1145/3366428.3380772

Srikant Y (2020). Distributed Graph Analytics. Distributed Computing and Internet Technology, 3-20. DOI: 10.1007/978-3-030-36987-3_1. Online publication date: 9-Jan-2020.
https://dl.acm.org/doi/10.1007/978-3-030-36987-3_1

Panja R, Vadhiyar S (2019). HyPar: A divide-and-conquer model for hybrid CPU–GPU graph processing. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2019.05.014. Online publication date: Jun-2019.
https://doi.org/10.1016/j.jpdc.2019.05.014

Wang G, Wada K, Yamagiwa S (2019). Optimization in the parallelism extraction algorithm with spanning tree on a multi-GPU environment. IEEJ Transactions on Electrical and Electronic Engineering 14:6, 862-869. DOI: 10.1002/tee.22875. Online publication date: 13-Mar-2019.
https://doi.org/10.1002/tee.22875

Mansinghka V, Schaechtle U, Handa S, Radul A, Chen Y, Rinard M (2018). Probabilistic programming with programmable inference. ACM SIGPLAN Notices 53:4, 603-616. DOI: 10.1145/3296979.3192409. Online publication date: 11-Jun-2018.
https://dl.acm.org/doi/10.1145/3296979.3192409

Wang D, Hoffmann J, Reps T (2018). PMAF: an algebraic framework for static analysis of probabilistic programs. ACM SIGPLAN Notices 53:4, 513-528. DOI: 10.1145/3296979.3192408. Online publication date: 11-Jun-2018.
https://dl.acm.org/doi/10.1145/3296979.3192408
