Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures

Published: 01 January 2011

Abstract

The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.



Published In

IEEE Transactions on Parallel and Distributed Systems  Volume 22, Issue 1
January 2011
191 pages

Publisher

IEEE Press

Author Tags

  1. GPU computing
  2. General-purpose computation on GPUs (GPGPUs)
  3. data parallelism
  4. data-parallel architectures
  5. memory access pattern
  6. memory coalescing
  7. memory optimization
  8. memory selection
  9. vectorization

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) Combining Weight Approximation, Sharing and Retraining for Neural Network Model Compression. ACM Transactions on Embedded Computing Systems 23(6), 1-23. DOI: 10.1145/3687466. Online publication date: 11-Sep-2024.
  • (2024) Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, 83-94. DOI: 10.1145/3629526.3645034. Online publication date: 7-May-2024.
  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys 55(11), 1-81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023.
  • (2023) On the Effects of Transaction Data Access Patterns on Performance in Lock-Based Concurrency Control. IEEE Transactions on Computers 72(6), 1718-1732. DOI: 10.1109/TC.2022.3222084. Online publication date: 1-Jun-2023.
  • (2022) Flatfish: A Reinforcement Learning Approach for Application-Aware Address Mapping. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41(11), 4758-4770. DOI: 10.1109/TCAD.2022.3146204. Online publication date: 1-Nov-2022.
  • (2022) Low occupancy high performance elemental products in assembly free FEM on GPU. Engineering with Computers 38(Suppl 3), 2189-2204. DOI: 10.1007/s00366-021-01350-6. Online publication date: 1-Aug-2022.
  • (2020) Data-parallel query processing on non-uniform data. Proceedings of the VLDB Endowment 13(6), 884-897. DOI: 10.14778/3380750.3380758. Online publication date: 11-Mar-2020.
  • (2020) Evaluating Gather and Scatter Performance on CPUs and GPUs. Proceedings of the International Symposium on Memory Systems, 209-222. DOI: 10.1145/3422575.3422794. Online publication date: 28-Sep-2020.
  • (2020) Intelligent Data Placement on Discrete GPU Nodes with Unified Memory. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 139-151. DOI: 10.1145/3410463.3414651. Online publication date: 30-Sep-2020.
  • (2020) PAC: Paged Adaptive Coalescer for 3D-Stacked Memory. Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, 137-148. DOI: 10.1145/3369583.3392670. Online publication date: 23-Jun-2020.
