
An adaptive performance modeling tool for GPU architectures

Published: 09 January 2010

Abstract

This paper presents an analytical model to predict the performance of
general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and to assist it in narrowing the search to the more promising implementations. It can also be incorporated into a tool that helps programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how it exercises major GPU microarchitecture features. To identify the performance bottlenecks accurately, we introduce an abstract interpretation of a GPU kernel, the work flow graph, from which we estimate the kernel's execution time. We validated our performance model on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). For this purpose, we used data-parallel benchmarks that stress different GPU microarchitecture events, such as uncoalesced memory accesses, scratch-pad memory bank conflicts, and control flow divergence, which must be modeled accurately but pose challenges to analytical performance models. The proposed model captures full system complexity and shows high accuracy in predicting the performance trends of different optimized kernel implementations. We also describe our approach to extracting the performance model automatically from the kernel code.
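The three microarchitecture events named above can be illustrated with a back-of-envelope sketch. The code below is not the paper's model; it is a simplified illustration under assumed G80-era parameters (16 shared-memory banks, 16-thread half-warps, strict compute-capability-1.0 coalescing rules), and all three cost formulas are first-order approximations chosen for illustration only.

```python
# Illustrative sketch (NOT the paper's model): first-order cost estimates
# for the three GPU microarchitecture events the abstract names.
# Assumed constants follow G80-class hardware: 16 shared-memory banks and
# 16-thread half-warps, per the CUDA 1.0-era programming guide.
from math import gcd

NUM_BANKS = 16   # shared-memory banks (assumed G80-class value)
HALF_WARP = 16   # threads whose accesses are coalesced together

def bank_conflict_degree(stride_words: int) -> int:
    """Max number of half-warp threads hitting the same shared-memory
    bank when thread t reads word t * stride_words (a common pattern).
    Degree 1 means conflict-free; degree k means k-way serialization."""
    if stride_words == 0:
        return 1  # broadcast: every thread reads the same word
    return gcd(NUM_BANKS, stride_words)

def memory_transactions(addresses: list, word: int = 4) -> int:
    """Global-memory transactions a half-warp needs for the given byte
    addresses under strict (compute-1.0 style) coalescing: a single
    transaction only if thread k accesses word k of one aligned segment,
    otherwise one transaction per thread."""
    base = addresses[0]
    coalesced = base % (HALF_WARP * word) == 0 and all(
        a == base + k * word for k, a in enumerate(addresses))
    return 1 if coalesced else len(addresses)

def divergent_cost(path_cycles: list) -> int:
    """SIMD cost of a divergent branch: the hardware serializes the
    taken paths, so the warp pays the sum of the path lengths rather
    than the length of the longest path."""
    return sum(path_cycles)
```

For example, under these assumptions a stride-1 shared-memory access is conflict-free (degree 1) while a stride-8 access serializes into an 8-way conflict, and a misaligned half-warp load costs 16 transactions instead of 1.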



Published In

ACM SIGPLAN Notices, Volume 45, Issue 5 (PPoPP '10), May 2010, 346 pages.
ISSN: 0362-1340  EISSN: 1558-1160  DOI: 10.1145/1837853
  • PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2010, 372 pages.
    ISBN: 9781605588773  DOI: 10.1145/1693453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. analytical model
  2. gpu
  3. parallel programming
  4. performance estimation

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)141
  • Downloads (Last 6 weeks)18
Reflects downloads up to 18 Aug 2024

Cited By

  • (2024) MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access, 12:34354-34377. DOI: 10.1109/ACCESS.2024.3372990
  • (2023) Analysis of the analytical performance models for GPUs and extracting the underlying Pipeline model. Journal of Parallel and Distributed Computing, 173:32-47, March 2023. DOI: 10.1016/j.jpdc.2022.11.002
  • (2023) Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniques. Journal of Parallel and Distributed Computing, 171:66-78, January 2023. DOI: 10.1016/j.jpdc.2022.09.002
  • (2022) A Survey of Performance Tuning Techniques and Tools for Parallel Applications. IEEE Access, 10:15036-15055. DOI: 10.1109/ACCESS.2022.3147846
  • (2022) Symbolic identification of shared memory based bank conflicts for GPUs. Journal of Systems Architecture, 127:102518, June 2022. DOI: 10.1016/j.sysarc.2022.102518
  • (2022) Using hardware performance counters to speed up autotuning convergence on GPUs. Journal of Parallel and Distributed Computing, 160:16-35, February 2022. DOI: 10.1016/j.jpdc.2021.10.003
  • (2021) CoCoPeLia: Communication-Computation Overlap Prediction for Efficient Linear Algebra on GPUs. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 36-47, March 2021. DOI: 10.1109/ISPASS51385.2021.00015
  • (2021) Modular Functional Testing: Targeting the Small Embedded Memories in GPUs. In VLSI-SoC: Design Trends, pages 205-233, July 2021. DOI: 10.1007/978-3-030-81641-4_10
  • (2021) Performance modeling of graphics processing unit application using static and dynamic analysis. Concurrency and Computation: Practice and Experience, 34(3), September 2021. DOI: 10.1002/cpe.6602
  • (2020) A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels. ACM Transactions on Architecture and Code Optimization, 18(1):1-25, December 2020. DOI: 10.1145/3431731
