Effective padding of multidimensional arrays to avoid cache conflict misses

Authors: Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, P. Sadayappan

ACM SIGPLAN Notices, Volume 51, Issue 6

Pages 129 - 144

https://doi.org/10.1145/2980983.2908123

Published: 02 June 2016

Abstract

Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
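The conflict-miss effect described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only counts how many distinct cache lines of a tile fall into the same cache set, before and after padding the row length. The cache geometry (64-byte lines, 64 sets, as in a 32 KiB 8-way cache), the 8-byte element size, the base address of 0, and the function name `max_lines_per_set` are all assumptions chosen for illustration.

```python
from collections import Counter

def max_lines_per_set(row_len, tile_rows, tile_cols,
                      elem_bytes=8, line_bytes=64, num_sets=64):
    """Largest number of distinct cache lines of one tile that map to a single set.

    Assumes a row-major 2D array starting at address 0 and a cache whose
    set index is (address // line_bytes) % num_sets. Hypothetical helper.
    """
    lines = set()
    for i in range(tile_rows):
        for j in range(tile_cols):
            addr = (i * row_len + j) * elem_bytes
            lines.add(addr // line_bytes)
    per_set = Counter(line % num_sets for line in lines)
    return max(per_set.values())

# 32 x 8 tile of doubles in an array with 512 elements per row:
# each row is 4096 bytes = 64 lines, so every tile row lands in the same set.
unpadded = max_lines_per_set(512, 32, 8)

# Padding each row by 8 elements makes a row 65 lines long,
# so consecutive tile rows land in consecutive sets.
padded = max_lines_per_set(512 + 8, 32, 8)

print(unpadded, padded)  # 32 1
```

Under these assumptions the unpadded tile places 32 lines in one set, which exceeds an associativity of 8 even though the tile occupies only 2 KiB, while the padded layout places at most one line in any set.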

Index Terms

Effective padding of multidimensional arrays to avoid cache conflict misses
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ACM SIGPLAN Notices Volume 51, Issue 6

PLDI '16

June 2016

726 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2980983

Editor:
Andy Gill
University of Kansas, Lawrence, KS

PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016
726 pages
ISBN:9781450342612
DOI:10.1145/2908080
General Chair:
Chandra Krintz
University of California at Santa Barbara, USA
Program Chair:
Emery Berger
University of Massachusetts at Amherst, USA

Copyright © 2016 ACM.

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States
