Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org
Proceedings of Annual IEEE/ACM International Symposium on Code Generation …, 2014dl.acm.org
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward
dependences along space dimensions. Existing techniques trade temporal data reuse for
inefficiencies in other areas, such as load imbalance, redundant computations, or increased
control flow overhead, therefore making it challenging for use with GPUs. We propose a time-
tiling method for iterative stencil computations on GPUs. Our method does not involve …
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along space dimensions. Existing techniques trade temporal data reuse for inefficiencies in other areas, such as load imbalance, redundant computations, or increased control flow overhead, therefore making it challenging for use with GPUs.
We propose a time-tiling method for iterative stencil computations on GPUs. Our method does not involve redundant computations. It favors coalesced global-memory accesses, data reuse in local/shared-memory or cache, avoidance of thread divergence, and concurrency, combining hexagonal tile shapes along the time and one spatial dimension with classical tiling along the other spatial dimensions. Hexagonal tiles expose multi-level parallelism as well as data reuse. Experimental results demonstrate significant performance improvements over existing stencil compilers.
ACM Digital Library