Article

Tiling stencil computations to maximize parallelism

Authors:

SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis

Pages 1 - 11

https://doi.org/10.1109/SC.2012.107

Published: 10 November 2012

Abstract

Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then provide an approach to find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that our code is able to outperform a tuned domain-specific stencil code generator by 4% to 27%, and previous compiler techniques by a factor of 2x to 10.14x.
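The difference between pipelined and concurrent start can be made concrete with a small experiment. The sketch below is not the paper's algorithm. It assumes a 1D three-point stencil over a (time, space) grid, picks two hand-written hyperplane sets (a time-skewed pair and a diamond-shaped pair), and counts by brute force how many tiles have no predecessor tile and could therefore begin immediately. The names `DEPS`, `is_valid`, `tile_of`, `start_tiles`, `SKEWED`, and `DIAMOND`, as well as the grid and tile sizes, are illustrative assumptions.

```python
# Hypothetical sketch: every name and size here is an illustrative assumption,
# not something taken from the paper.

# Dependence distance vectors (dt, di) of a 1D three-point stencil:
# point (t, i) reads (t-1, i-1), (t-1, i) and (t-1, i+1).
DEPS = [(1, 1), (1, 0), (1, -1)]


def is_valid(hyperplanes, deps=DEPS):
    """Standard tiling legality test: h . d >= 0 for every hyperplane and dependence."""
    return all(h[0] * d[0] + h[1] * d[1] >= 0 for h in hyperplanes for d in deps)


def tile_of(t, i, hyperplanes, size):
    """Tile coordinates of point (t, i); Python's // floors for negative values too."""
    return tuple((h[0] * t + h[1] * i) // size for h in hyperplanes)


def start_tiles(hyperplanes, steps, width, size):
    """Return the set of tiles that read no value produced by another tile."""
    tiles, has_pred = set(), set()
    for t in range(steps):
        for i in range(width):
            dst = tile_of(t, i, hyperplanes, size)
            tiles.add(dst)
            for dt, di in DEPS:
                src_t, src_i = t - dt, i - di
                # Reads that fall outside the grid are boundary values, not dependences.
                if src_t >= 0 and 0 <= src_i < width:
                    if tile_of(src_t, src_i, hyperplanes, size) != dst:
                        has_pred.add(dst)
    return tiles - has_pred


SKEWED = [(1, 0), (1, 1)]    # time-skewed tiling: one hyperplane is the time axis
DIAMOND = [(1, 1), (1, -1)]  # diamond-shaped tiling: symmetric about the time axis

skewed_start = start_tiles(SKEWED, steps=8, width=16, size=4)
diamond_start = start_tiles(DIAMOND, steps=8, width=16, size=4)

print("both tilings legal:", is_valid(SKEWED) and is_valid(DIAMOND))
print("skewed start tiles: ", sorted(skewed_start))
print("diamond start tiles:", sorted(diamond_start))
```

On this 8 x 16 grid with tile size 4, both hyperplane sets pass the legality test, but the skewed set leaves a single tile free to start, so the remaining tiles fill in as a pipeline, while the diamond set leaves four tiles along the initial time face free to start together. The brute-force count is only meant to show the effect; it says nothing about how such hyperplanes are found automatically.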

Index Terms

Tiling stencil computations to maximize parallelism
1. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Index terms have been assigned to the content through auto-classification.

Information

Published In

SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis

November 2012

1139 pages

ISBN:9781467308069

Publisher

IEEE Computer Society

United States

Cited By

Zhu Q. (2024). FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs. Proceedings of the 53rd International Conference on Parallel Processing, 1022-1031. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673076
Plotnitskii P., Beaurepaire L., Qu L., Akbudak K., Ltaief H., Keyes D., Evans K., Schenk O. (2024). Leveraging the High Bandwidth of Last-Level Cache for HPC Seismic Imaging Applications. Proceedings of the Platform for Advanced Scientific Computing Conference, 1-13. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3659914.3659936
Del Sozzo E., Conficconi D., Sano K. (2024). Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs. ACM Transactions on Reconfigurable Technology and Systems 17:2, 1-33. Online publication date: 30-Apr-2024.
https://dl.acm.org/doi/10.1145/3634920
Ahmad Z., Browne R., Chowdhury R., Das R., Huang Y., Zhu Y., Lee I., Chabbi M., Steuwer M. (2024). Fast American Option Pricing using Nonlinear Stencils. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 316-332. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1145/3627535.3638506
Chen Y., Li K., Wang Y., Bai D., Wang L., Ma L., Yuan L., Zhang Y., Cao T., Yang M., Lee I., Chabbi M., Steuwer M. (2024). ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 333-347. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1145/3627535.3638476
Sundararajah K., Saumya C., Kulkarni M. (2022). UniRec: a unimodular-like framework for nested recursions and loops. Proceedings of the ACM on Programming Languages 6:OOPSLA2, 1264-1290. Online publication date: 31-Oct-2022.
https://dl.acm.org/doi/10.1145/3563333
Li K., Yuan L., Zhang Y., Yue Y., de Supinski B., Hall M., Gamblin T. (2021). Reducing redundancy in data organization and arithmetic calculation for stencil computations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. Online publication date: 14-Nov-2021.
https://dl.acm.org/doi/10.1145/3458817.3476154
Abdelaal K., Kong M., Zhou H., Moreira J., Mueller F., Etsion Y. (2021). Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing, 13-26. Online publication date: 3-Jun-2021.
https://dl.acm.org/doi/10.1145/3447818.3460369
Vatai E., Singhal U., Suda R. (2020). Diamond matrix powers kernels. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 102-113. Online publication date: 15-Jan-2020.
https://dl.acm.org/doi/10.1145/3368474.3368494
Seyfari Y., Lotfi S., Karimpour J. (2018). Optimizing inter-nest data locality in imperfect stencils based on loop blocking. The Journal of Supercomputing 74:10, 5432-5460. Online publication date: 1-Oct-2018.
https://dl.acm.org/doi/10.5555/3288339.3288365
