DOI: 10.1145/1111583.1111589
Article

Impact of modern memory subsystems on cache optimizations for stencil computations

Published: 12 June 2005

Abstract

In this work we investigate the impact of evolving memory system features, such as large on-chip caches, automatic prefetch, and the growing distance to main memory, on 3D stencil computations. These calculations form the basis for a wide range of scientific applications, from simple Jacobi iterations to complex multigrid and block-structured adaptive PDE solvers. First, we develop a simple benchmark to evaluate the effectiveness of prefetching in cache-based memory systems. Next, we present a small parameterized probe and validate its use as a proxy for general stencil computations on three modern microprocessors. We then derive an analytical memory cost model for quantifying cache-blocking behavior and demonstrate its effectiveness in predicting stencil-computation performance. Overall, the results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations.
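For readers unfamiliar with the terminology, the following is a minimal sketch of what a 3D stencil sweep and its cache-blocked variant look like. It is not the benchmark, probe, or cost model from the paper: the 7-point Jacobi update, the function names (`idx`, `update`, `jacobi_naive`, `jacobi_blocked`), and the tile parameters `bj` and `bk` are all assumptions chosen for illustration. Pure Python does not expose real cache behavior, so the code only shows the loop restructuring, not any performance effect.

```python
def idx(i, j, k, n):
    """Flatten (i, j, k) into an index of an n*n*n row-major grid (k is unit stride)."""
    return (i * n + j) * n + k


def update(src, dst, i, j, k, n):
    """7-point Jacobi update: average of the six face neighbours (illustrative choice)."""
    dst[idx(i, j, k, n)] = (
        src[idx(i - 1, j, k, n)] + src[idx(i + 1, j, k, n)]
        + src[idx(i, j - 1, k, n)] + src[idx(i, j + 1, k, n)]
        + src[idx(i, j, k - 1, n)] + src[idx(i, j, k + 1, n)]
    ) / 6.0


def jacobi_naive(src, n):
    """One unblocked sweep over the interior points; boundary values are copied."""
    dst = list(src)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update(src, dst, i, j, k, n)
    return dst


def jacobi_blocked(src, n, bj, bk):
    """Same sweep with the j and k loops tiled into bj x bk blocks.

    The two fastest-varying dimensions are tiled and the slowest (i) is
    streamed through, so the planes touched by one tile form a smaller
    working set than full n x n planes.
    """
    dst = list(src)
    for jj in range(1, n - 1, bj):
        for kk in range(1, n - 1, bk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + bj, n - 1)):
                    for k in range(kk, min(kk + bk, n - 1)):
                        update(src, dst, i, j, k, n)
    return dst


if __name__ == "__main__":
    n = 8
    grid = [float((3 * p) % 7) for p in range(n ** 3)]
    # Blocking only reorders independent updates, so both sweeps must agree.
    print(jacobi_naive(grid, n) == jacobi_blocked(grid, n, 2, 3))
```

Because a Jacobi sweep reads only from the source grid and writes only to the destination grid, every update is independent, and tiling changes the traversal order without changing the result. Whether the reordering pays off on a given machine is the question the paper examines.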

Published In

MSP '05: Proceedings of the 2005 workshop on Memory system performance
June 2005
74 pages
ISBN: 1595931473
DOI: 10.1145/1111583

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache blocking
  2. performance modeling
  3. prefetch
  4. stencil

Qualifiers

  • Article

Conference

MSP05: Memory Systems Performance Workshop
June 12, 2005
Chicago, Illinois

Acceptance Rates

Overall acceptance rate: 6 of 20 submissions (30%)

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2

Reflects downloads up to 16 Oct 2024.

Cited By

  • (2023) A Fast Algorithm for Aperiodic Linear Stencil Computation using Fast Fourier Transforms. ACM Transactions on Parallel Computing 10:4 (1-34). DOI: 10.1145/3606338. Online publication date: 24-Jul-2023
  • (2023) Merchandiser. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (204-217). DOI: 10.1145/3572848.3577497. Online publication date: 25-Feb-2023
  • (2023) Casper: Accelerating Stencil Computations Using Near-Cache Processing. IEEE Access 11 (22136-22154). DOI: 10.1109/ACCESS.2023.3252002. Online publication date: 2023
  • (2022) An Efficient Vectorization Scheme for Stencil Computation. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (650-660). DOI: 10.1109/IPDPS53621.2022.00069. Online publication date: May-2022
  • (2021) Fast Stencil Computations using Fast Fourier Transforms. Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (8-21). DOI: 10.1145/3409964.3461803. Online publication date: 6-Jul-2021
  • (2021) Optimizing Stencil Codes with Exploiting Data Reuse. 2021 International Conference on Information Control, Electrical Engineering and Rail Transit (ICEERT) (45-54). DOI: 10.1109/ICEERT53919.2021.00018. Online publication date: Oct-2021
  • (2021) Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+. Journal of Computer Science and Technology 36:1 (33-43). DOI: 10.1007/s11390-020-0741-6. Online publication date: 30-Jan-2021
  • (2021) FPGA-based HPC accelerators: An evaluation on performance and energy efficiency. Concurrency and Computation: Practice and Experience 34:20. DOI: 10.1002/cpe.6570. Online publication date: 22-Aug-2021
  • (2020) The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing. 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) (8-19). DOI: 10.1109/PMBS51919.2020.00007. Online publication date: Nov-2020
  • (2019) Reproducible stencil compiler benchmarks using prova! Future Generation Computer Systems 92:C (933-946). DOI: 10.1016/j.future.2018.05.023. Online publication date: 1-Mar-2019
