Keyword: stencil computations : Search

21 Results for: Keyword: stencil computations

Searched The ACM Guide to Computing Literature (3,728,963 records)

Showing 1 - 20 of 21 Results

research-article
Open Access
April 2024
A shared compilation stack for distributed-memory parallelism in stencil DSLs
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 38–56, https://doi.org/10.1145/3620666.3651344

Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target ...
Metrics: Total Citations 0, Total Downloads 226, Last 12 Months 226, Last 6 weeks 79

research-article
Open Access
September 2021
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4, Article No.: 51, Pages 1–23, https://doi.org/10.1145/3469030

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on ...
Metrics: Total Citations 19, Total Downloads 3,016, Last 12 Months 1,829, Last 6 weeks 112

research-article
Open Access
August 2020
FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation
ACM Transactions on Architecture and Code Optimization (TACO), Volume 17, Issue 3, Article No.: 19, Pages 1–27, https://doi.org/10.1145/3402451

We present FPDetect, a low-overhead approach for detecting logical errors and soft errors affecting stencil computations without generating false positives. We develop an offline analysis that tightly estimates the number of floating-point bits ...
Metrics: Total Citations 3, Total Downloads 591, Last 12 Months 106, Last 6 weeks 19
Supplementary Material: a19-das-suppl.pdf

research-article
Open Access
December 2019
Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation
- Jie Zhao,
- Albert Cohen
ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 4, Article No.: 47, Pages 1–25, https://doi.org/10.1145/3369382

Loop tiling to exploit data locality and parallelism plays an essential role in a variety of general-purpose and domain-specific compilers. Affine transformations in polyhedral frameworks implement classical forms of rectangular and parallelogram tiling,...
Metrics: Total Citations 12, Total Downloads 1,277, Last 12 Months 243, Last 6 weeks 41

research-article
November 2018
Stencil codes on a vector length agnostic architecture
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, November 2018, Article No.: 13, Pages 1–12, https://doi.org/10.1145/3243176.3243192

Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual ...
Metrics: Total Citations 15, Total Downloads 273, Last 12 Months 53, Last 6 weeks 11

short-paper
June 2017
Stencil Autotuning with Ordinal Regression: Extended Abstract
SCOPES '17: Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, June 2017, Pages 72–75, https://doi.org/10.1145/3078659.3078664

The increasing performance of today's computer architecture comes with an unprecedented augment of hardware complexity. Unfortunately this results in difficult-to-tune software and consequentially in a gap between the potential peak performance and the ...
Metrics: Total Citations 2, Total Downloads 77, Last 12 Months 2, Last 6 weeks 0

research-article
Public Access
January 2017
Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations
ACM Transactions on Parallel Computing (TOPC), Volume 3, Issue 1, Article No.: 3, Pages 1–47, https://doi.org/10.1145/2897188

This article derives trade-offs between three basic costs of a parallel algorithm: synchronization, data movement, and computational cost. These trade-offs are lower bounds on the execution time of the algorithm that are independent of the number of ...
Metrics: Total Citations 13, Total Downloads 739, Last 12 Months 92, Last 6 weeks 13

research-article
Public Access
September 2016
Resource Conscious Reuse-Driven Tiling for GPUs
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, September 2016, Pages 99–111, https://doi.org/10.1145/2967938.2967967

Computations involving successive application of 3D stencil operators are widely used in many application domains, such as image processing, computational electromagnetics, seismic processing, and climate modeling. Enhancement of temporal and spatial ...
Metrics: Total Citations 24, Total Downloads 383, Last 12 Months 56, Last 6 weeks 20

research-article
Public Access
March 2016
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs
GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, March 2016, Pages 92–102, https://doi.org/10.1145/2884045.2884047

GPUs are an attractive target for data parallel stencil computations prevalent in scientific computing and image processing applications. Many tiling schemes, such as overlapped tiling and split tiling, have been proposed in past to improve the ...
Metrics: Total Citations 16, Total Downloads 695, Last 12 Months 83, Last 6 weeks 12

research-article
June 2015
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
- Mohamed Wahib,
- Naoya Maruyama
HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, June 2015, Pages 259–270, https://doi.org/10.1145/2749246.2749255

This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels ...
Metrics: Total Citations 15, Total Downloads 292, Last 12 Months 21, Last 6 weeks 1

research-article
Public Access
June 2015
Parameterized Diamond Tiling for Stencil Computations with Chapel parallel iterators
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, June 2015, Pages 197–206, https://doi.org/10.1145/2751205.2751226

Stencil computations figure prominently in the core kernels of many scientific computations, such as partial differential equation solvers. Parallel scaling of stencil computations can be significantly improved on multicore processors using advanced ...
Metrics: Total Citations 26, Total Downloads 553, Last 12 Months 47, Last 6 weeks 6

Article
May 2015
Energy Modeling and Optimization for Tiled Nested-Loop Codes
IPDPSW '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, May 2015, Pages 888–895, https://doi.org/10.1109/IPDPSW.2015.94

We develop a methodology for modeling the energy efficiency of tiled nested-loop codes running on a graphics processing unit (GPU) and use it for energy efficiency optimization. We assume that a highly optimized and ...
Metrics: Total Citations 0

article
April 2015
Optimizing the computation of a parallel 3D finite difference algorithm for graphics processing units
Concurrency and Computation: Practice & Experience (CCOMP), Volume 27, Issue 6, April 2015, Pages 1591–1602, https://doi.org/10.1002/cpe.3351

This paper explores the possibilities of using a graphics processing unit for complex 3D finite difference computation via MUSTA-FORCE and WENO algorithms. We propose a novel algorithm based on the new properties of CUDA surface memory optimized for 2D ...
Metrics: Total Citations 1

research-article
January 2015
PLUTO+: near-complete modeling of affine transformations for parallelism and locality
- Aravind Acharya,
- Uday Bondhugula
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2015, Pages 54–64, https://doi.org/10.1145/2688500.2688512

Affine transformations have proven to be very powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multi-dimensional affine function can represent a long and complex sequence of simpler ...
Also Published in: ACM SIGPLAN Notices: Volume 50 Issue 8, August 2015
Metrics: Total Citations 18, Total Downloads 491, Last 12 Months 31, Last 6 weeks 5

research-article
January 2015
Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates
- T. Malas,
- G. Hager,
- H. Ltaief,
- H. Stengel,
- G. Wellein,
- D. Keyes
SIAM Journal on Scientific Computing (SISC), Volume 37, Issue 4, 2015, Pages C439–C464, https://doi.org/10.1137/140991133

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to ...
Metrics: Total Citations 13

tutorial
October 2014
WOSC 2014: second workshop on optimizing stencil computations
SPLASH '14: Proceedings of the companion publication of the 2014 ACM SIGPLAN conference on Systems, Programming, and Applications: Software for Humanity, October 2014, Pages 89–90, https://doi.org/10.1145/2660252.2662138

The second Workshop on Optimizing Stencil Computations is held in Portland, Oregon, USA on October 20, 2014, as part of the 2014 ACM SIGPLAN conference on Systems, Programming Languages, and Applications: Software for Humanity (SPLASH). The workshop's ...
Metrics: Total Citations 0, Total Downloads 83, Last 12 Months 0, Last 6 weeks 0

Article
May 2012
Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters
IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, May 2012, Pages 544–556, https://doi.org/10.1109/IPDPS.2012.57

Overlapping computations and communication is a key to accelerating stencil applications on parallel computers, especially for GPU clusters. However, such programming is a time-consuming part of the stencil application development. To address this ...
Metrics: Total Citations 2

Article
July 2009
Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization
COMPSAC '09: Proceedings of the 2009 33rd Annual IEEE International Computer Software and Applications Conference - Volume 01, July 2009, Pages 579–586, https://doi.org/10.1109/COMPSAC.2009.82

We present a pipelined wavefront parallelization approach for stencil-based computations. Within a fixed spatial domain successive wavefronts are executed by threads scheduled to a multicore processor chip with a shared outer level cache. By re-using ...
Metrics: Total Citations 29

article
February 2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
SIAM Review (SIREV), Volume 51, Issue 1, February 2009, Pages 129–159, https://doi.org/10.1137/070693199

Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory ...
Metrics: Total Citations 71

article
January 2009
Writing productive stencil codes with overlapped tiling
Concurrency and Computation: Practice & Experience (CCOMP), Volume 21, Issue 1, January 2009, Pages 25–39

Stencil computations constitute the kernel of many scientific applications. Tiling is often used to improve the performance of stencil codes for data locality and parallelism. However, tiled stencil codes typically require shadow regions, whose management ...
Metrics: Total Citations 9
