
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Authors:

Katherine Yelick

SIAM Review, Volume 51, Issue 1

Pages 129 - 159

https://doi.org/10.1137/070693199

Published: 01 February 2009

Abstract

Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. In this paper, we explore the impact of trends in memory subsystems on a variety of stencil optimization techniques and develop performance models to analytically guide our optimizations. Our work targets cache reuse methodologies across single and multiple stencil sweeps, examining cache-aware algorithms as well as cache-oblivious techniques on the Intel Itanium2, AMD Opteron, and IBM Power5. Additionally, we consider stencil computations on the heterogeneous multicore design of the Cell processor, a machine with an explicitly managed memory hierarchy. Overall our work represents one of the most extensive analyses of stencil optimizations and performance modeling to date. Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations. We also show that a cache-aware implementation is significantly faster than a cache-oblivious approach, while the explicitly managed memory on Cell enables the highest overall efficiency: Cell attains 88% of algorithmic peak while the best competing cache-based processor achieves only 54% of algorithmic peak performance.
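To make the idea of cache blocking concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a 7-point, out-of-place (Jacobi-style) stencil on a 3D grid stored as a flat list with `k` as the unit-stride index. The names `update_point`, `sweep_naive`, `sweep_blocked`, the coefficients `ALPHA` and `BETA`, and the block sizes `bi` and `bj` are all hypothetical and chosen only for illustration. The blocked version tiles the two non-unit-stride loops and leaves the unit-stride loop uncut, which is one common way to tile such a sweep.

```python
ALPHA = 0.5          # illustrative weight on the centre point (assumption)
BETA = 1.0 / 12.0    # illustrative weight on each of the six neighbours (assumption)


def make_grid(nx, ny, nz):
    """Build a flat nx*ny*nz grid with deterministic test values."""
    return [float((i * 7 + j * 3 + k) % 11)
            for i in range(nx) for j in range(ny) for k in range(nz)]


def update_point(src, dst, i, j, k, ny, nz):
    """Apply the 7-point stencil at interior point (i, j, k)."""
    c = (i * ny + j) * nz + k          # flat index; k is unit stride
    dst[c] = ALPHA * src[c] + BETA * (
        src[c - ny * nz] + src[c + ny * nz]   # i - 1, i + 1
        + src[c - nz] + src[c + nz]           # j - 1, j + 1
        + src[c - 1] + src[c + 1]             # k - 1, k + 1
    )


def sweep_naive(src, nx, ny, nz):
    """One sweep over the interior in plain i, j, k order."""
    dst = list(src)                    # boundary values are copied unchanged
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                update_point(src, dst, i, j, k, ny, nz)
    return dst


def sweep_blocked(src, nx, ny, nz, bi=4, bj=4):
    """Same sweep, with the i and j loops tiled into bi-by-bj blocks."""
    dst = list(src)
    for ii in range(1, nx - 1, bi):            # loop over blocks in i
        for jj in range(1, ny - 1, bj):        # loop over blocks in j
            for i in range(ii, min(ii + bi, nx - 1)):
                for j in range(jj, min(jj + bj, ny - 1)):
                    for k in range(1, nz - 1):  # unit-stride loop left uncut
                        update_point(src, dst, i, j, k, ny, nz)
    return dst


if __name__ == "__main__":
    grid = make_grid(10, 9, 8)
    same = sweep_naive(grid, 10, 9, 8) == sweep_blocked(grid, 10, 9, 8, bi=3, bj=4)
    print("blocked sweep matches naive sweep:", same)
```

Both functions perform identical arithmetic at every point, so they produce the same grid; only the traversal order differs. In pure Python the reordering brings no measurable speedup. The sketch shows the loop structure only, and any benefit on real hardware depends on the cache sizes and prefetch behaviour that the abstract discusses.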

Index Terms

General and reference
  Cross-computing tools and techniques

Information

Published In: SIAM Review, Volume 51, Issue 1, February 2009, 228 pages

ISSN: 0036-1445

Publisher: Society for Industrial and Applied Mathematics, United States

Qualifiers: Article

Article Metrics (reflecting downloads up to 01 Jan 2025)

Total Citations: 72
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0
Cited By

Heroux M, Lakshminarasimhan M, Antepara O, Zhao T, Sepanski B, Basu P, Johansen H, Hall M, Williams S (2024). Bricks. International Journal of High Performance Computing Applications 38:6 (549-567). Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1177/10943420241268288

Qu L, Abdelkhalak R, Ltaief H, Said I, Keyes D (2023). Exploiting temporal data reuse and asynchrony in the reverse time migration. International Journal of High Performance Computing Applications 37:2 (132-150). Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1177/10943420221128529

Antepara O, Williams S, Johansen H, Zhao T, Hirsch S, Goyal P, Hall M (2023). Performance Portability Evaluation of Blocked Stencil Computations on GPUs. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (1007-1018). Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624177

Jacquelin M, Araya-Polo M, Meng J, Wolf F, Shende S, Culhane C, Alam S, Jagode H (2022). Scalable distributed high-order stencil computations. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-13). Online publication date: 13-Nov-2022.
https://dl.acm.org/doi/10.5555/3571885.3571924

Wichmann K, Kronbichler M, Löhner R, Wall W (2021). A runtime based comparison of highly tuned lattice Boltzmann and finite difference solvers. International Journal of High Performance Computing Applications 35:4 (370-390). Online publication date: 1-Jul-2021.
https://dl.acm.org/doi/10.1177/10943420211006169

Wyrzykowski R, Deelman E, Pieper A, Hager G, Fehske H (2021). A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials. International Journal of High Performance Computing Applications 35:1 (60-77). Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1177/1094342020959423

Ahmad Z, Chowdhury R, Das R, Ganapathi P, Gregory A, Zhu Y, Agrawal K, Azar Y (2021). Fast Stencil Computations using Fast Fourier Transforms. Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (8-21). Online publication date: 6-Jul-2021.
https://dl.acm.org/doi/10.1145/3409964.3461803

Zhang K, Su H, Dou Y (2021). Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures. The Journal of Supercomputing 77:11 (13584-13600). Online publication date: 1-Nov-2021.
https://dl.acm.org/doi/10.1007/s11227-021-03823-3

Geiser G, Schröder W (2020). Structured multi-block grid partitioning using balanced cut trees. Journal of Parallel and Distributed Computing 138:C (139-152). Online publication date: 1-Apr-2020.
https://dl.acm.org/doi/10.1016/j.jpdc.2019.12.010

Liu Y, Liu L, Hu M, Wang W, Xue W, Zhu Q (2020). Performance Modeling of Stencil Computation on SW26010 Processors. Algorithms and Architectures for Parallel Processing (386-400). Online publication date: 2-Oct-2020.
https://dl.acm.org/doi/10.1007/978-3-030-60245-1_27
