Abstract
This paper presents a new unified method for simultaneously tiling the register and cache levels of the memory hierarchy. We will only focus on the code transformation phase of tiling. Our algorithm uses strip-mining and loop interchange on all memory hierarchy levels to determine the tiles as usual, and, afterwards, and due to the special characteristics of the register level, we apply index set splitting, unrolling and scalar replacement to this level. After applying strip-mining, the iteration space is non-convex. To perform in a single step the loop interchange in non-convex iteration spaces, we use non-unimodular matrices. The order proposed to perform index set splitting to the loops guarantees that each loop in the nest has to be processed only once and also avoids code explosion.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
D. Callahan, S. Carr, K. Kennedy. Improving Register Allocation for Subscripted Variables. Int. Conf. on Programming Language Design and Implementation, June 1990, pp. 53–65
S. Carr. Memory-Hierarchy Management. Ph.D. Dissertation, Rice University, Feb 1993.
S. Carr, K. McKinley, C-W. Tseng. Compiler Optimizations for Improving Data Locality. Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Aug 1994, pp.252–262
A. Fernández, J.M. Llabería, M. Valero-García. Loop Transformation using non-unimodular matrices. IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 8, Aug 1995, pp. 832–840
M. Jiménez, J.M. Llabería, A. Fernández, E. Morancho. A Unified Transformation Technique for Multilevel Blocking. TR. UPC-DAC-1995-51, Dept. of Computer Architecture, Polytechnic University of Catalonia, Dec 1995.
M. Lam, E.Rothberg, M. Wolf. The Cache Performance and Optimizations of Blocked Algorithms. Int. Conf. on Architectural Support for Programming Languages and Operating Systems, 1991, pp. 63–74
J.J Navarro, T. Juan, T. Lang. MOB Forms: A Class of Multilevel Block Algorithms for Dense Linear Algebra Operations. Int. Conf. on Supercomputing, July 1994, pp. 354–363
M. Wolf. Improving Locality and Parallelism in Nested Loops. Technical Report CSL-TR-92-538, Stanford University, Aug 1992.
M. Wolf, M. Lam. A Data Optimizing Algorithm. Int. Conf. on Programming Language Design and Implementation, June 1991, pp. 30–44
M. Wolf, M. Lam. A Loop Transformation Theory and an Algorithm to Maximize Parallelism. IEEE Trans. on Parallel and Distributed System, Vol. 2, No. 4, October 1991, pp. 452–471
M. Wolfe. More Iteration Space Tiling. Int. Conf. on Supercomputing, 1989, pp. 655–664
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiménez, M., Llabería, J.M., Fernández, A., Morancho, E. (1996). A unified transformation technique for multilevel blocking. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61626-8_53
Download citation
DOI: https://doi.org/10.1007/3-540-61626-8_53
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61626-9
Online ISBN: 978-3-540-70633-5
eBook Packages: Springer Book Archive