DOI: 10.1145/2983990.2984015

A compiler for throughput optimization of graph algorithms on GPUs

Published: 19 October 2016

Abstract

Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand.
To address this problem, we have implemented these optimizations in a compiler that produces CUDA code from an intermediate-level program representation called IrGL. Compared to state-of-the-art handwritten CUDA implementations of eight graph applications, code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never more than 30% slower for the others. Throughput optimizations contribute an improvement of up to 4.16x (median 1.4x) to the performance of unoptimized IrGL code.


      Published In

      OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
      October 2016
      915 pages
      ISBN: 9781450344449
      DOI: 10.1145/2983990
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. GPUs
      2. Graph applications
      3. amorphous data-parallelism
      4. compilers
      5. optimization
      6. throughput

      Qualifiers

      • Research-article

      Conference

      SPLASH '16
      Acceptance Rates

      Overall Acceptance Rate 268 of 1,244 submissions, 22%

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 251
      • Downloads (last 6 weeks): 26
      Reflects downloads up to 04 Sep 2024

      Cited By

      • (2024) Parallel Pattern Language Code Generation. Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, pages 32-41. DOI: 10.1145/3649169.3649245. Online publication date: 3-Mar-2024.
      • (2024) Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures. Proceedings of the 12th International Workshop on OpenCL and SYCL, pages 1-11. DOI: 10.1145/3648115.3648134. Online publication date: 8-Apr-2024.
      • (2024) Kimbap: A Node-Property Map System for Distributed Graph Analytics. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 566-581. DOI: 10.1145/3620665.3640421. Online publication date: 27-Apr-2024.
      • (2024) Accelerating Maximal Bicliques Enumeration with GPU on large scale network. Future Generation Computer Systems. DOI: 10.1016/j.future.2024.07.021. Online publication date: Jul-2024.
      • (2023) A High-Performance MST Implementation for GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. DOI: 10.1145/3581784.3607093. Online publication date: 12-Nov-2023.
      • (2023) A GPU Algorithm for Detecting Strongly Connected Components. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. DOI: 10.1145/3581784.3607071. Online publication date: 12-Nov-2023.
      • (2023) Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 25-35. DOI: 10.1109/CCGrid57682.2023.00013. Online publication date: May-2023.
      • (2022) Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 198-210. DOI: 10.1145/3559009.3569686. Online publication date: 8-Oct-2022.
      • (2022) Scaling and Selecting GPU Methods for All Pairs Shortest Paths (APSP) Computations. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 190-200. DOI: 10.1109/IPDPS53621.2022.00027. Online publication date: May-2022.
      • (2022) GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 53-65. DOI: 10.1109/CGO53902.2022.9741280. Online publication date: 2-Apr-2022.
