Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Published: 01 February 2009

Abstract

Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. In this paper, we explore the impact of trends in memory subsystems on a variety of stencil optimization techniques and develop performance models to analytically guide our optimizations. Our work targets cache reuse methodologies across single and multiple stencil sweeps, examining cache-aware algorithms as well as cache-oblivious techniques on the Intel Itanium2, AMD Opteron, and IBM Power5. Additionally, we consider stencil computations on the heterogeneous multicore design of the Cell processor, a machine with an explicitly managed memory hierarchy. Overall, our work represents one of the most extensive analyses of stencil optimizations and performance modeling to date. Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations. We also show that a cache-aware implementation is significantly faster than a cache-oblivious approach, while the explicitly managed memory on Cell enables the highest overall efficiency: Cell attains 88% of algorithmic peak while the best competing cache-based processor achieves only 54% of algorithmic peak performance.
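
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a single stencil sweep and a cache-blocked variant of the same sweep. It is not taken from the paper: the 7-point stencil shape, the coefficients `alpha` and `beta`, the block sizes `bj` and `bk`, and all function names are assumptions chosen for illustration. Pure Python will not show any cache effect; the point is only the loop structure, in which blocking reorders the traversal into tiles without changing the result.

```python
# Illustrative sketch only: the stencil shape, coefficients, block sizes,
# and function names are assumptions, not the paper's implementation.

def make_grid(n, fill=0.0):
    """Allocate an n x n x n grid as nested lists."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]


def update_point(src, dst, i, j, k, alpha, beta):
    """Apply a 7-point stencil: the center point plus its six face neighbors."""
    dst[i][j][k] = alpha * src[i][j][k] + beta * (
        src[i - 1][j][k] + src[i + 1][j][k]
        + src[i][j - 1][k] + src[i][j + 1][k]
        + src[i][j][k - 1] + src[i][j][k + 1]
    )


def sweep_naive(src, dst, alpha, beta):
    """One out-of-place sweep over all interior points in plain loop order."""
    n = len(src)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update_point(src, dst, i, j, k, alpha, beta)


def sweep_blocked(src, dst, alpha, beta, bj=4, bk=4):
    """Same sweep, with the j and k loops tiled into bj x bk blocks.

    Each tile is carried through the full i range before moving on, so the
    planes of src touched by neighboring i values cover a smaller footprint.
    """
    n = len(src)
    for jj in range(1, n - 1, bj):
        for kk in range(1, n - 1, bk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + bj, n - 1)):
                    for k in range(kk, min(kk + bk, n - 1)):
                        update_point(src, dst, i, j, k, alpha, beta)


# Small demonstration on a linear field f(i, j, k) = i + 2j + 3k.
n = 8
src = make_grid(n)
for i in range(n):
    for j in range(n):
        for k in range(n):
            src[i][j][k] = float(i + 2 * j + 3 * k)

dst_naive = make_grid(n)
dst_blocked = make_grid(n)
sweep_naive(src, dst_naive, alpha=0.4, beta=0.1)
sweep_blocked(src, dst_blocked, alpha=0.4, beta=0.1)

# Both traversal orders perform identical per-point arithmetic.
print(dst_naive == dst_blocked)
```

Because the sweep writes to a separate destination grid, every point depends only on the source grid, which is what makes it safe to visit the points in any order, tiled or not.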

    Published In

    SIAM Review  Volume 51, Issue 1
    February 2009
    228 pages

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Author Tags

    1. AMD Opteron
    2. IBM Power5
    3. Intel Itanium2
    4. STI Cell
    5. cache blocking
    6. cache-oblivious algorithms
    7. performance evaluation
    8. performance modeling
    9. stencil computations
    10. time skewing

    Qualifiers

    • Article
