New tiling techniques to improve cache temporal locality

Authors:

Zhiyuan Li

ACM SIGPLAN Notices, Volume 34, Issue 5

Pages 215 - 228

https://doi.org/10.1145/301631.301668

Published: 01 May 1999

Abstract

Tiling is a well-known loop transformation to improve temporal locality of nested loops. Current compiler algorithms for tiling are limited to loops which are perfectly nested or can be transformed, in trivial ways, into a perfect nest. This paper presents a number of program transformations to enable tiling for a class of nontrivial imperfectly-nested loops such that cache locality is improved. We define a program model for such loops and develop compiler algorithms for their tiling. We propose to adopt odd-even variable duplication to break anti- and output dependences without unduly increasing the working-set size, and to adopt speculative execution to enable tiling of loops which may terminate prematurely due to, e.g., convergence tests in iterative algorithms. We have implemented these techniques in a research compiler, Panorama. Initial experiments with several benchmark programs were performed on SGI workstations based on MIPS R5K and R10K processors. Overall, the transformed programs run faster by 9% to 164%.
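The abstract names odd-even variable duplication without showing it. The sketch below is one possible reading of that idea, not the paper's algorithm and not output of the Panorama compiler: a one-dimensional relaxation loop keeps exactly two copies of the array, and time step t reads one copy and writes the other, so no step overwrites values that the same step still needs. The function names, the three-point averaging stencil, and the fixed step count are all assumptions made for illustration.

```python
# Illustrative sketch only: a two-buffer (odd/even) relaxation loop.
# The stencil, names, and step count are hypothetical, not taken from the paper.

def relax_with_fresh_copy(a, steps):
    """Reference version: allocate a fresh temporary array on every time step."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        tmp = list(a)  # new array per step; boundary values carried over
        for i in range(1, n - 1):
            tmp[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        a = tmp
    return a


def relax_odd_even(a, steps):
    """Two-copy version: step t reads buf[t % 2] and writes buf[(t + 1) % 2]."""
    buf = [list(a), list(a)]  # both copies start with the same boundary values
    n = len(a)
    for t in range(steps):
        src, dst = buf[t % 2], buf[(t + 1) % 2]
        for i in range(1, n - 1):
            # Reads touch only src and writes touch only dst, so within a
            # step no write can clobber a value that is still to be read.
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]  # the copy written by the last step


if __name__ == "__main__":
    x = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
    print(relax_odd_even(x, 5) == relax_with_fresh_copy(x, 5))
```

Both functions perform the same arithmetic in the same order, so their results agree exactly; the two-copy version simply bounds the storage at two arrays regardless of how many time steps run.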

Index Terms

New tiling techniques to improve cache temporal locality
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
        Concurrent programming structures

Published In

ACM SIGPLAN Notices Volume 34, Issue 5

May 1999

304 pages

ISSN: 0362-1340

EISSN: 1558-1160

DOI: 10.1145/301631

Editors:
Barbara G. Ryder, Rutgers Univ., New Brunswick, NJ
Benjamin G. Zorn, Univ. of Colorado, Boulder
A. Michael Berman, Rowan Univ., Glassboro, NJ

PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
May 1999
304 pages
ISBN:1581130945
DOI:10.1145/301618
Chairmen:
Barbara G. Ryder, Rutgers Univ., New Brunswick, NJ
Benjamin G. Zorn, Univ. of Colorado, Boulder, and Microsoft Research

Copyright © 1999 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
