Analytical Performance Estimation during Code Generation on Modern GPUs

Ernst, Dominik; Holzer, Markus; Hager, Georg; Knorr, Matthias; Wellein, Gerhard

doi:10.1016/j.jpdc.2022.11.003

arXiv:2204.14242 (cs)

[Submitted on 29 Apr 2022]

Abstract: Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. We propose an alternative to time-intensive autotuning, scenario-specific performance models, or black-box machine learning to select the best-performing configuration.
This paper identifies the relevant performance-defining mechanisms for memory-intensive GPU applications through a performance model coupled with an analytic hardware metric estimator. This enables a quick exploration of large configuration spaces to identify highly efficient code candidates with high accuracy.
We examine how the A100 GPU architecture differs from its predecessor, the V100, and address the challenge of modeling the data transfer volumes through the new memory hierarchy.
We show how our method can be coupled to the pystencils stencil code generator, which is used to generate kernels for a range-four 3D-25pt stencil and a complex two-phase fluid solver based on the Lattice Boltzmann Method. For both, it delivers a ranking that can be used to select the best-performing candidate.
The method is not limited to stencil kernels but can be integrated into any code generator that can generate the required address expressions.
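
To make the idea of estimating a hardware metric from address expressions concrete, the following is a minimal sketch, not the authors' estimator. It assumes that the range-four 3D-25pt stencil is star-shaped (one center point plus four neighbors in each of six directions, which gives 1 + 6·4 = 25 points), that the grid stores 8-byte values with x as the fastest index, and that memory is fetched in 32-byte sectors. It counts the unique sectors read by one thread block and ranks candidate block shapes by sectors per thread. All function names, the grid size, and the ranking metric are hypothetical choices for illustration; caches, stores, and data reuse between blocks are ignored.

```python
from itertools import product


def stencil_offsets(radius=4):
    """Offsets of a star-shaped 3D stencil: the center plus `radius`
    neighbors along each of the six axis directions."""
    offsets = [(0, 0, 0)]
    for axis in range(3):
        for dist in range(1, radius + 1):
            for sign in (-1, 1):
                off = [0, 0, 0]
                off[axis] = sign * dist
                offsets.append(tuple(off))
    return offsets


def address(x, y, z, nx, ny, elem_bytes=8):
    """Address expression: byte address of grid point (x, y, z),
    assuming an x-fastest layout (an assumption of this sketch)."""
    return ((z * ny + y) * nx + x) * elem_bytes


def sectors_per_thread(block, nx=512, ny=512, radius=4, sector_bytes=32):
    """Unique sectors read by one thread block, divided by its thread count."""
    bx, by, bz = block
    offsets = stencil_offsets(radius)
    sectors = set()
    for tx, ty, tz in product(range(bx), range(by), range(bz)):
        # Shift by `radius` so that every neighbor index is non-negative.
        x, y, z = tx + radius, ty + radius, tz + radius
        for dx, dy, dz in offsets:
            sectors.add(address(x + dx, y + dy, z + dz, nx, ny) // sector_bytes)
    return len(sectors) / (bx * by * bz)


def rank_blocks(candidates):
    """Order candidate block shapes from fewest to most sectors per thread."""
    return sorted(candidates, key=sectors_per_thread)


if __name__ == "__main__":
    for block in rank_blocks([(512, 1, 1), (32, 4, 4), (8, 8, 8), (1, 1, 512)]):
        print(block, round(sectors_per_thread(block), 3))
```

Because the estimate only evaluates address expressions, it needs no GPU and no compiled kernel, which is what makes scanning a large configuration space cheap. A realistic model would also have to account for the cache hierarchy and for write traffic.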

Comments:	arXiv admin note: substantial text overlap with arXiv:2107.01143
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2204.14242 [cs.DC]
	(or arXiv:2204.14242v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2204.14242
Related DOI:	https://doi.org/10.1016/j.jpdc.2022.11.003

Submission history

From: Dominik Ernst
[v1] Fri, 29 Apr 2022 17:11:37 UTC (2,129 KB)
