Abstract
This paper presents an approach to modeling loop transformations using linear algebra, in which compound transformations are modeled as integer matrices. The nonsingular linear transformations presented here subsume the class of unimodular transformations. The loop transformations included are the unimodular transformations (reversal, skewing, and permutation) and a new transformation, namely stretching. Nonunimodular transformations (whose determinant has absolute value greater than 1) create “holes” in the transformed iteration space, which makes code generation difficult. We solve this problem by suitably changing the step sizes of loops so that these holes are “skipped” when the transformed iteration space is traversed. For the class of nonunimodular loop transformations, we present algorithms for deriving the loop bounds, the array access expressions, and the step sizes of the loops in the nest. To derive the step sizes, we compute the Hermite normal form of the transformation matrix; the step sizes are the diagonal entries of this Hermite normal form. We then use the theory of Hessenberg matrices to derive exact loop bounds for nonunimodular transformations. We illustrate the approach on several problems, such as the generation of tile sets and distributed-memory code generation. The approach provides a framework for optimizing programs for a variety of architectures.
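The Hermite-normal-form step can be illustrated with a small sketch. The Python below is a minimal illustration under stated assumptions, not the paper's algorithm: it uses the lower-triangular (column-style) HNF convention, it assumes stretching scales one loop index by an integer factor, and the helper names (`matmul`, `hermite_normal_form`, `traverse_2d`) are hypothetical. Composing skewing with stretching gives a matrix of determinant 2; the diagonal of its HNF yields step sizes 1 and 2, and stepping the inner loop by 2 from the correct lattice offset visits only points that are images of original iterations. The loop bounds here are a bounding box plus a guard, a simpler substitute for the exact Hessenberg-based bounds described above.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Integer matrix product; a @ b means 'apply b first, then a'."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hermite_normal_form(t: Matrix) -> Matrix:
    """Lower-triangular Hermite normal form H = T*U (U unimodular) of a
    nonsingular square integer matrix, built from integer column operations,
    so H generates the same lattice as the columns of T."""
    n = len(t)
    h = [row[:] for row in t]

    def sub_col(dst: int, src: int, q: int) -> None:
        for r in range(n):  # column dst -= q * column src
            h[r][dst] -= q * h[r][src]

    def swap_cols(a: int, b: int) -> None:
        for r in range(n):
            h[r][a], h[r][b] = h[r][b], h[r][a]

    for i in range(n):
        # Euclid's algorithm along row i clears every entry right of the diagonal.
        for j in range(i + 1, n):
            while h[i][j] != 0:
                sub_col(i, j, h[i][i] // h[i][j])
                swap_cols(i, j)
        if h[i][i] == 0:
            raise ValueError("transformation matrix is singular")
        if h[i][i] < 0:
            for r in range(n):
                h[r][i] = -h[r][i]
        # Reduce the entries left of the diagonal into [0, h[i][i]).
        for k in range(i):
            sub_col(k, i, h[i][k] // h[i][i])
    return h


def traverse_2d(t: Matrix, n1: int, n2: int) -> List[Tuple[int, int]]:
    """Visit j = T*i for 0 <= i1 < n1, 0 <= i2 < n2 in lexicographic order of j.
    Loops step by the HNF diagonal entries, so holes are never visited.
    Bounds are a bounding box plus a guard, not the paper's exact bounds."""
    h = hermite_normal_form(t)
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    corners = [(t[0][0] * a + t[0][1] * b, t[1][0] * a + t[1][1] * b)
               for a in (0, n1 - 1) for b in (0, n2 - 1)]
    lo1, hi1 = min(c[0] for c in corners), max(c[0] for c in corners)
    lo2, hi2 = min(c[1] for c in corners), max(c[1] for c in corners)
    visited = []
    start1 = -(-lo1 // h[0][0]) * h[0][0]  # first multiple of h11 >= lo1
    for j1 in range(start1, hi1 + 1, h[0][0]):
        z1 = j1 // h[0][0]
        offset = (h[1][0] * z1) % h[1][1]  # lattice forces j2 == h21*z1 (mod h22)
        start2 = lo2 + (offset - lo2) % h[1][1]
        for j2 in range(start2, hi2 + 1, h[1][1]):
            # i = T^{-1} j via the adjugate; exact because (j1, j2) is a lattice point.
            num1 = t[1][1] * j1 - t[0][1] * j2
            num2 = t[0][0] * j2 - t[1][0] * j1
            assert num1 % det == 0 and num2 % det == 0
            i1, i2 = num1 // det, num2 // det
            if 0 <= i1 < n1 and 0 <= i2 < n2:  # guard against the bounding box
                visited.append((j1, j2))
    return visited


# Elementary 2-D transformations (stretching is assumed to scale one index).
REVERSAL = [[-1, 0], [0, 1]]
PERMUTATION = [[0, 1], [1, 0]]
SKEW = [[1, 0], [1, 1]]
STRETCH = [[1, 0], [0, 2]]

T = matmul(STRETCH, SKEW)  # skew first, then stretch
H = hermite_normal_form(T)
steps = [H[k][k] for k in range(len(H))]
print(T, H, steps)  # [[1, 0], [2, 2]] [[1, 0], [0, 2]] [1, 2]
print(traverse_2d(T, 3, 3))
# [(0, 0), (0, 2), (0, 4), (1, 2), (1, 4), (1, 6), (2, 4), (2, 6), (2, 8)]
```

Because unimodular matrices such as `REVERSAL` and `PERMUTATION` have the identity as their HNF, every step size is 1 for them, which is consistent with the observation that only nonunimodular transformations introduce holes.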
Additional information
Supported in part by an NSF Young Investigator Award CCR-9457768, an NSF grant CCR-9210422, and by the Louisiana Board of Regents through contract LEQSF (1991–94)-RD-A-09.
About this article
Cite this article
Ramanujam, J. Beyond unimodular transformations. J Supercomput 9, 365–389 (1995). https://doi.org/10.1007/BF01206273