A compiler for throughput optimization of graph algorithms on GPUs

Authors:

Keshav Pingali

OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

Pages 1 - 19

https://doi.org/10.1145/2983990.2984015

Published: 19 October 2016

Abstract

Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand.

To address this problem, we have implemented these optimizations in a compiler that produces CUDA code from an intermediate-level program representation called IrGL. Compared to state-of-the-art handwritten CUDA implementations of eight graph applications, code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never more than 30% slower for the others. Throughput optimizations contribute an improvement of up to 4.16x (median 1.4x) over the performance of unoptimized IrGL code.

Index Terms

A compiler for throughput optimization of graph algorithms on GPUs
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features

Published In

OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

October 2016

915 pages

ISBN:9781450344449

DOI:10.1145/2983990

General Chair: Eelco Visser, Delft University of Technology, Netherlands
Program Chair: Yannis Smaragdakis, University of Athens, Greece

ACM SIGPLAN Notices Volume 51, Issue 10
OOPSLA '16
October 2016
915 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3022671
Editor: Matthew Fluet

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

SIGAda: ACM Special Interest Group on Ada Programming Language

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPLASH '16

Sponsor:

SIGPLAN

SPLASH '16: Conference on Systems, Programming, Languages, and Applications: Software for Humanity

November 2 - 4, 2016

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Article Metrics

Total Citations: 72
Total Downloads: 1,892
Downloads (last 12 months): 251
Downloads (last 6 weeks): 26

Reflects downloads up to 04 Sep 2024

Cited By

Schmitz A, Miller J, Burak S, Müller M (2024). Parallel Pattern Language Code Generation. Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, pages 32-41. Online publication date: 3-Mar-2024.
https://dl.acm.org/doi/10.1145/3649169.3649245
Olgu K, Kenter T, Nunez-Yanez J, Mcintosh-Smith S (2024). Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures. Proceedings of the 12th International Workshop on OpenCL and SYCL, pages 1-11. Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1145/3648115.3648134
Lee H, Dathathri R, Pingali K, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). Kimbap: A Node-Property Map System for Distributed Graph Analytics. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 566-581. Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3620665.3640421
Wu C, Li J, Li Z, Zhang J, Tang P (2024). Accelerating Maximal Bicliques Enumeration with GPU on large scale network. Future Generation Computer Systems. Online publication date: Jul-2024.
https://doi.org/10.1016/j.future.2024.07.021
Fallin A, Gonzalez A, Seo J, Burtscher M, Mohror K, Arnold D, Badia R (2023). A High-Performance MST Implementation for GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607093
Alabandi G, Sands W, Biros G, Burtscher M, Mohror K, Arnold D, Badia R (2023). A GPU Algorithm for Detecting Strongly Connected Components. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607071
Zhang X, Chang Y, Lu T, Zhang K, Chen M (2023). Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 25-35. Online publication date: May-2023.
https://doi.org/10.1109/CCGrid57682.2023.00013
Jeong S, Lee Y, Lee J, Choi H, Song S, Lee J, Kim Y, Kim H, Kloeckner A, Moreira J (2022). Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 198-210. Online publication date: 8-Oct-2022.
https://dl.acm.org/doi/10.1145/3559009.3569686
Xia Y, Jiang P, Agrawal G, Ramnath R (2022). Scaling and Selecting GPU Methods for All Pairs Shortest Paths (APSP) Computations. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 190-200. Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00027
Brahmakshatriya A, Amarasinghe S (2022). GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 53-65. Online publication date: 2-Apr-2022.
https://doi.org/10.1109/CGO53902.2022.9741280
