research-article

Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview

Author:

Andreas Herten

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

Pages 1019 - 1026

https://doi.org/10.1145/3624062.3624178

Published: 12 November 2023

Abstract

In recent years, GPUs have become a key driver of compute performance in HPC. With the installation of the Frontier supercomputer, they became the enablers of the Exascale era; further largest-scale installations are in progress (Aurora, El Capitan, JUPITER). But the early dominance of NVIDIA and its CUDA programming model has changed: the current HPC GPU landscape features three vendors (AMD, Intel, NVIDIA), each with native and derived programming models. The choices are ample, but not all models are supported on all platforms, especially where Fortran support is needed; in addition, some restrictions may apply. It is hard for scientific programmers to navigate this abundance of choices and limits. This paper offers a guide by matching the GPU platforms with their supported programming models, presented in a concise table and elaborated in detailed comments. For each model on each platform, the level of support is assessed.
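The compatibility table described in the abstract is, in essence, a mapping from (vendor, programming model) pairs to an ordinal support level. The following is a minimal Python sketch of such a structure and of one query a programmer might run against it: which models reach at least a given level of support on every platform of interest. It assumes a hypothetical four-step scale (`NONE` < `LIMITED` < `INDIRECT` < `FULL`) that is not the paper's own legend, and the model names and ratings in the usage lines are placeholders, not the paper's assessments.

```python
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical ordinal support levels; NOT the paper's own legend."""
    NONE = 0
    LIMITED = 1
    INDIRECT = 2
    FULL = 3


# (vendor, programming model) -> assessed support level
Table = dict[tuple[str, str], Support]


def portable_models(table: Table, vendors: list[str], minimum: Support) -> list[str]:
    """Return models rated at least `minimum` on every vendor in `vendors`.

    Pairs missing from the table are treated as Support.NONE.
    """
    models = {model for (_vendor, model) in table}
    return sorted(
        model
        for model in models
        if all(table.get((vendor, model), Support.NONE) >= minimum for vendor in vendors)
    )


# Usage with placeholder entries (illustrative only, not real assessments)
toy: Table = {
    ("AMD", "Model A"): Support.FULL,
    ("Intel", "Model A"): Support.INDIRECT,
    ("NVIDIA", "Model A"): Support.FULL,
    ("Intel", "Model B"): Support.LIMITED,
    ("NVIDIA", "Model B"): Support.FULL,
}

print(portable_models(toy, ["AMD", "Intel", "NVIDIA"], Support.INDIRECT))  # ['Model A']
print(portable_models(toy, ["NVIDIA"], Support.FULL))  # ['Model A', 'Model B']
```

Treating absent pairs as `NONE` keeps the table sparse, and using an `IntEnum` turns "at least this level of support" into a plain comparison.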




Information

Published In

SC-W '23 Workshops proceedings, November 2023, 2180 pages

ISBN: 9798400707858

DOI: 10.1145/3624062

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

SC-W 2023

SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

November 12 - 17, 2023

Denver, CO, USA
