DOI: 10.1109/ICPADS.2011.29
Article

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

Published: 07 December 2011

Abstract

The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains difficult: it is a multi-dimensional problem that requires deep technical knowledge of the GPU architecture. Although a substantial literature exists on how to map and optimize code for the more mature NVIDIA CUDA architecture, comparatively little exists for OpenCL on AMD GPUs such as the 1600-core AMD Radeon HD 5870. Consequently, we present and evaluate architecture-aware mapping and optimizations for the AMD GPU, the most prominent of which are (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU mapping and optimizations by applying each in isolation, as well as in concert, to a large-scale molecular modeling application called GEM. With these AMD-specific GPU optimizations, our optimized OpenCL implementation on an AMD Radeon HD 5870 delivers more than a four-fold improvement in performance over the basic OpenCL implementation. In addition, it outperforms our optimized CUDA version on an NVIDIA GTX280 by 12%. Overall, we achieve a 371-fold speedup over a serial but hand-tuned SSE version of our molecular modeling application and, in turn, a 46-fold speedup over ideal scaling on an 8-core CPU.
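As a rough illustration of optimizations (ii) and (iii) only, here is a minimal sketch in portable C rather than OpenCL C. The computation is hypothetical (a cutoff-filtered sum of q/r terms), not GEM's actual kernel, and the names `vec4`, `load4`, `masked_term`, `accumulate_branchy`, and `accumulate_vec4` are illustrative assumptions. In OpenCL C one would more likely use the built-in `float4` type and the `select()` built-in; register use (i) and image memory (iv) are GPU-specific and are not modeled here.

```c
#include <assert.h>

/* Hypothetical 4-wide type standing in for OpenCL C's float4.
 * Plain C has no built-in vector type, so a struct is used here. */
typedef struct { float x, y, z, w; } vec4;

/* Load four consecutive floats into a vec4 (avoids pointer casts). */
static vec4 load4(const float *p) {
    vec4 v = { p[0], p[1], p[2], p[3] };
    return v;
}

/* Baseline: a data-dependent branch guards each contribution. */
float accumulate_branchy(const float *r, const float *q, int n, float cutoff) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (r[i] < cutoff) {
            sum += q[i] / r[i];
        }
    }
    return sum;
}

/* Branch-free term: the comparison evaluates to 1 or 0 and scales the
 * contribution instead of guarding it. Assumes r > 0 so q / r is finite. */
static float masked_term(float r, float q, float cutoff) {
    return (float)(r < cutoff) * (q / r);
}

/* Branch-free and 4-wide: each iteration consumes one vec4 of inputs and
 * keeps four independent partial sums, which are reduced at the end. */
float accumulate_vec4(const float *r, const float *q, int n, float cutoff) {
    assert(n % 4 == 0); /* sketch only: no remainder handling */
    vec4 acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < n; i += 4) {
        vec4 rv = load4(r + i);
        vec4 qv = load4(q + i);
        acc.x += masked_term(rv.x, qv.x, cutoff);
        acc.y += masked_term(rv.y, qv.y, cutoff);
        acc.z += masked_term(rv.z, qv.z, cutoff);
        acc.w += masked_term(rv.w, qv.w, cutoff);
    }
    return acc.x + acc.y + acc.z + acc.w;
}
```

Both versions return the same sum when every r is positive (up to floating-point reordering). The branch-free form replaces a data-dependent `if` with an unconditional multiply, so all work-items in a SIMD group would follow the same instruction path, which is the usual motivation for branch removal on GPUs.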
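The two headline speedups are consistent with each other if "ideal scaling on an 8-core CPU" is read as the serial SSE version running exactly 8 times faster on eight cores (an assumption about the authors' definition). Writing T for run time:

```latex
\frac{T_{\text{serial SSE}}}{T_{\text{GPU}}} = 371
\quad\Longrightarrow\quad
\frac{T_{\text{serial SSE}}/8}{T_{\text{GPU}}} = \frac{371}{8} \approx 46.4
```

which rounds to the reported 46-fold figure.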

Published In

ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
December 2011
1069 pages
ISBN:9780769545769

Publisher

IEEE Computer Society

United States

Author Tags

  1. AMD
  2. CUDA
  3. GPU
  4. NVIDIA
  5. OpenCL
  6. performance evaluation

Qualifiers

  • Article

Cited By

  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys 55:11 (1-81). DOI: 10.1145/3570638. Online publication date: 16-Mar-2023
  • (2019) On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (1-8). DOI: 10.1145/3293320.3293338. Online publication date: 14-Jan-2019
  • (2014) Accelerating a hydrological uncertainty ensemble model using graphics processing units (GPUs). Computers & Geosciences 62:C (178-186). DOI: 10.5555/2745549.2745661. Online publication date: 1-Jan-2014
  • (2014) CoreTSAR. Proceedings of the 29th International Conference on Supercomputing - Volume 8488 (172-186). DOI: 10.1007/978-3-319-07518-1_11. Online publication date: 22-Jun-2014
  • (2013) Evaluating the acceleration of typical scientific problems on the GPU. Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (17-26). DOI: 10.1145/2513456.2513473. Online publication date: 7-Oct-2013
  • (2013) Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator. Parallel Computing 39:12 (769-786). DOI: 10.1016/j.parco.2013.09.003. Online publication date: 1-Dec-2013
  • (2013) Improving application behavior on heterogeneous manycore systems through kernel mapping. Parallel Computing 39:12 (867-878). DOI: 10.1016/j.parco.2013.08.011. Online publication date: 1-Dec-2013

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media