Effective padding of multidimensional arrays to avoid cache conflict misses

Published: 02 June 2016

Abstract

Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
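
The conflict-miss effect described in the abstract can be illustrated with a small sketch. This is not the paper's padding algorithm. It only counts how many cache lines land in the same set when walking down one column of a row-major 2D array, before and after padding the row length. The cache geometry (32 KiB, 8-way, 64-byte lines, hence 64 sets), the 8-byte element size, the base address of 0, and the function name `max_lines_per_set` are all assumptions chosen for illustration.

```python
from collections import Counter

def max_lines_per_set(rows, row_len, elem_size=8, line_size=64, num_sets=64):
    """Largest number of distinct cache lines that map to a single set
    when reading column 0 of a row-major array with `rows` rows and
    `row_len` elements per row (array assumed to start at address 0)."""
    per_set = Counter()
    for i in range(rows):
        addr = i * row_len * elem_size   # byte address of element [i][0]
        line = addr // line_size         # cache line holding that element
        per_set[line % num_sets] += 1    # set index = line number mod number of sets
    return max(per_set.values())

# Unpadded: a row of 1024 doubles spans 128 lines, a multiple of the 64 sets,
# so every row's first element maps to the same set.
unpadded = max_lines_per_set(rows=64, row_len=1024)

# Padded by 8 doubles (one cache line): the row stride becomes 129 lines,
# so consecutive rows fall into consecutive sets.
padded = max_lines_per_set(rows=64, row_len=1032)

print(unpadded, padded)
```

Under these assumed parameters, the unpadded column places far more lines in one set than an 8-way cache can hold, even though the 64 lines touched are a small fraction of the cache capacity. Padding each row by a single cache line spreads the same accesses across all sets.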




    Published In

    ACM SIGPLAN Notices  Volume 51, Issue 6
    PLDI '16
    June 2016
    726 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2980983
    Editor: Andy Gill

    PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2016
    726 pages
    ISBN:9781450342612
    DOI:10.1145/2908080
    General Chair: Chandra Krintz
    Program Chair: Emery Berger
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 June 2016
    Published in SIGPLAN Volume 51, Issue 6

    Author Tags

    1. Array padding
    2. conflict misses
    3. direct-mapped cache
    4. set-associative cache
    5. tiling
