Abstract
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by program optimization. The potential benefits are significant, since both energy dissipation and performance depend heavily on the traffic between memory levels. But modeling this traffic is difficult, an observation that has led to the use of heuristic methods for steering program transformations. In this paper, we propose another approach: we simplify the cache model and organize the target program in such a way that an asymptotic evaluation of the memory traffic is possible. This information is used by our optimization algorithm to find the best reordering of the program operations, at least in an asymptotic sense. Our method optimizes both temporal and spatial locality. It can be applied to any static control program with arbitrary dependences. The optimizer has been partially implemented and applied to non-trivial programs. We present experimental evidence that the number of cache misses is drastically reduced, with corresponding performance improvements.
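To make concrete why reordering operations changes memory traffic, the following minimal sketch counts misses for two traversal orders of the same array. It is not the cache model or the optimization algorithm of the paper: it assumes a small, fully associative LRU cache, and the names `count_misses`, `row_major_trace` and `column_major_trace` as well as the parameters `num_lines` and `line_size` are hypothetical choices made only for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines=8, line_size=4):
    """Count misses of a fully associative LRU cache over an address trace.

    num_lines and line_size are illustrative assumptions, not values
    taken from the paper.
    """
    cache = OrderedDict()  # keys are cache-line tags, ordered from LRU to MRU
    misses = 0
    for addr in addresses:
        tag = addr // line_size  # all addresses in one line share a tag
        if tag in cache:
            cache.move_to_end(tag)  # hit: mark the line as most recently used
        else:
            misses += 1
            cache[tag] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def row_major_trace(n):
    """Visit an n x n array, stored row by row, along its rows."""
    return [i * n + j for i in range(n) for j in range(n)]


def column_major_trace(n):
    """Visit the same array along its columns (stride-n accesses)."""
    return [i * n + j for j in range(n) for i in range(n)]


n = 32
row_misses = count_misses(row_major_trace(n))
col_misses = count_misses(column_major_trace(n))
print("row-major misses:", row_misses)
print("column-major misses:", col_misses)
```

Under these assumed parameters the row-major order misses once per cache line (256 misses for 1024 accesses), while the column-major order misses on every access (1024 misses), because each column touches 32 distinct lines and the cache holds only 8. Both traces perform the same operations; only their order differs.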
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bastoul, C., Feautrier, P. (2003). Improving Data Locality by Chunking. In: Hedin, G. (eds) Compiler Construction. CC 2003. Lecture Notes in Computer Science, vol 2622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36579-6_23
DOI: https://doi.org/10.1007/3-540-36579-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00904-7
Online ISBN: 978-3-540-36579-2
eBook Packages: Springer Book Archive