Abstract
Linear loop transformations and tiling are known to be very effective for enhancing locality of reference in perfectly-nested loops. However, they cannot be applied directly to imperfectly-nested loops. Some compilers attempt to convert imperfectly-nested loops into perfectly-nested loops by using statement sinking, loop fusion, etc., and then apply locality-enhancing transformations to the resulting perfectly-nested loops, but the approaches used are fairly ad hoc and may fail even for simple programs. In this paper, we present a systematic approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of each statement into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques such as statement sinking and loop fusion, which current compilers use in ad hoc ways to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques, however, our embeddings are chosen carefully to enhance locality. The product space can itself be transformed to increase locality further, after which fully permutable loops can be tiled. The final code generation step may produce imperfectly-nested loops as output if that is desirable. We present experimental evidence for the effectiveness of this approach, using dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.
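To make the product-space idea concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' implementation. It assumes a column-oriented forward-substitution loop as the imperfectly-nested input: statement S1 is surrounded by one loop and S2 by two. The product space is taken to be the Cartesian product of the two statement iteration spaces, giving points (j1, j2, i2), and the embedding functions F1 and F2 are hand-picked for this example. Choosing such embeddings systematically to enhance locality is the subject of the paper and is not attempted here. The function names are hypothetical.

```python
from itertools import product


def forward_subst_imperfect(L, b):
    """Original imperfectly-nested loop: column-oriented forward substitution for L x = b."""
    n = len(b)
    x = list(b)  # copy so the caller's b is left untouched
    for j in range(n):
        x[j] = x[j] / L[j][j]  # S1: surrounded by one loop (index j)
        for i in range(j + 1, n):
            x[i] -= L[i][j] * x[j]  # S2: surrounded by two loops (indices j, i)
    return x


def forward_subst_product_space(L, b):
    """Same computation driven by a single perfectly-nested loop over a product space.

    Product space = S1's space (j1) x S2's space (j2, i2), i.e. points (j1, j2, i2).
    Hand-picked embeddings (hypothetical, for illustration only):
      F1(j)    = (j, j, j)  for S1 instance j
      F2(j, i) = (j, j, i)  for S2 instance (j, i) with i > j
    """
    n = len(b)
    x = list(b)
    # itertools.product with repeat=3 yields points in lexicographic order,
    # which is exactly the order a triple perfectly-nested loop would visit them.
    for j1, j2, i2 in product(range(n), repeat=3):
        if j1 == j2 == i2:
            # The point is F1(j1): execute S1 here.
            x[j1] = x[j1] / L[j1][j1]
        if j1 == j2 and i2 > j2:
            # The point is F2(j2, i2): execute S2 here.
            x[i2] -= L[i2][j2] * x[j2]
    return x


L = [[2.0, 0.0, 0.0], [1.0, 4.0, 0.0], [3.0, -1.0, 5.0]]
b = [4.0, 6.0, 13.0]
print(forward_subst_imperfect(L, b))      # [2.0, 1.0, 1.6]
print(forward_subst_product_space(L, b))  # [2.0, 1.0, 1.6]
```

Because these embeddings place S1(j) ahead of every S2(j, i) in lexicographic order, the triple loop executes the statement instances in the original order, so both functions perform the same floating-point operations and return identical results; this particular choice amounts to statement sinking. Enumerating all n³ points with guards is only for clarity; a compiler would instead generate tight loop bounds for the embedded points.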
Cite this article
Ahmed, N., Mateev, N. & Pingali, K. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests. International Journal of Parallel Programming 29, 493–544 (2001). https://doi.org/10.1023/A:1012293814832