Open access

Optimizing memory usage in the polyhedral model

Published: 01 September 2000

Abstract

The polyhedral model provides a single unified foundation for systolic array synthesis and automatic parallelization of loop programs. We investigate the problem of memory reuse when compiling Alpha (a functional language based on this model). Direct compilation would require unacceptably large memory (for example, O(n³) for matrix multiplication). Researchers have previously addressed the problem of memory reuse, and the analysis that this entails for projective memory allocations. This paper addresses, for a given schedule, the choice of the projections so as to minimize the volume of the residual memory. We prove tight bounds on the number of linearly independent projection vectors. Our method is constructive, yielding an optimal memory allocation. We extend the method to modular functions, and deal with the subsequent problems of code generation. Our ideas are illustrated on a number of examples generated by the current version of the Alpha compiler.
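As a hypothetical illustration (my own sketch, not the Alpha compiler's generated code), consider the matrix-multiplication example from the abstract: the single-assignment form materializes every partial sum in a three-dimensional array of O(n³) cells, whereas projecting the accumulation dimension away reuses one cell per output element, leaving only the O(n²) result array:

```c
#include <assert.h>

#define N 4  /* illustrative problem size */

/* Single-assignment view: a table S[i][j][k] would hold every partial
 * sum, requiring O(n^3) cells. Projecting along the k axis reuses a
 * single accumulator per (i, j), so only the O(n^2) result survives. */
static double A[N][N], B[N][N], C[N][N];

void matmul_contracted(void) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double acc = 0.0;            /* replaces the k-indexed dimension */
            for (int k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
    }
}
```

Which projections are valid for a given schedule, and how to choose them optimally, is precisely what the paper formalizes.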


Cited By

  • (2022) Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration. ACM Transactions on Architecture and Code Optimization 20(1), 1–26. DOI: 10.1145/3566054
  • (2022) Lightweight Array Contraction by Trace-Based Polyhedral Analysis. High Performance Computing: ISC High Performance 2022 International Workshops, 20–32. DOI: 10.1007/978-3-031-23220-6_2
  • (2021) Monoparametric Tiling of Polyhedral Programs. International Journal of Parallel Programming 49(3), 376–409. DOI: 10.1007/s10766-021-00694-2


Reviews

Herbert G. Mayer

Quilleré and Rajopadhye address the problem of memory size optimization when compiling systems of affine recurrence equations (SAREs), defined over polyhedral index domains. The formalism for doing this, detailed in the paper, is referred to as the polyhedral model. It is presented as being useful for loop parallelization and systolic synthesis. Alpha, a functional language developed by researchers addressing similar problems of parallel execution, is explained and used throughout the paper to illustrate key points and to support feasibility claims.

Section 2 characterizes Alpha by showing the source-to-source transformations from a functional program to imperative C output. Section 3 presents the mathematical foundation for the analysis and proofs that appear later in the text, focusing on the lifetime function. Section 4 defines the constraints that a memory function must satisfy, preparing for the pseudo-projections of section 5, which allow significant memory savings. The proof of tight bounds on the number of linearly independent projection vectors appears in section 6, while section 7 discusses further, secondary-level memory savings. Section 8 includes examples, and section 9 is intended to show the applicability to loop parallelization. Related work is summarized in section 10, which points out weaknesses in SISAL and Haskell, and section 11 contains the conclusions.

Although direct compilation of functional languages such as Alpha requires unacceptably large storage, the authors show how this prohibitive requirement can be reduced. For example, the typical 2-D storage requirement for a local object can be scaled down to a small memory space, often a single cell or machine register; this is exemplified with the Fibonacci function. The core contribution of the paper is a generalization of this memory reduction and the accompanying mathematical formalization of how and when the reduction is possible.

The authors contend that "the primary criterion for optimality is the number of linearly independent projection vectors." There are notable weaknesses in the paper. For one, it is far too long for the volume and kind of findings it describes. It is also heavily loaded with mathematical derivations, even when the point to be proven is neither especially difficult nor essential. More importantly, section 9 should have shown how the polyhedral model applies to parallel code generation, but that promise is not fulfilled. After all, the real goal of multiprocessing research is the simultaneous use of multiple computing resources for the benefit of fast execution, not saving memory that was wasted by some (single-assignment or functional-language) compute model. Several key ideas announced in the introduction are not expanded in the paper: for example, the relevance to systolic synthesis and the adaptability to other areas of computing are only sketched. Automatic code emission for systolic systems is hard, so some principles for parallel code emission would have been welcome and would have fit well with the subject matter. The greatest disappointment is the authors' assumption, from the start, that the parallel schedule for an algorithm is determined a priori, before Alpha's transformations begin.

Despite these shortcomings, the paper is a must-read for researchers and developers of parallel compute systems that use single-assignment or functional language models. However, given the enormous hunger for memory in functional languages, the paper actually scares me away and directs me toward more conventional, imperative programming environments.

Online Computing Reviews Service
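The Fibonacci contraction mentioned in the review can be sketched in a few lines of C (an illustrative sketch of my own, not code from the paper): the single-assignment table fib[0..n] of O(n) cells shrinks, under the modular mapping i → i mod 2, to two cells, because fib(i) reads only fib(i−1) and fib(i−2).

```c
#include <assert.h>

/* Sketch of a modular memory allocation: the value fib(i) is stored at
 * f[i % 2]. Writing f[i % 2] overwrites fib(i - 2), which is exactly
 * the value that dies at step i, so two cells suffice. */
long fib_modular(int n) {
    long f[2] = {0, 1};                      /* f[0] = fib(0), f[1] = fib(1) */
    for (int i = 2; i <= n; i++)
        f[i % 2] = f[(i - 1) % 2] + f[(i - 2) % 2];
    return f[n % 2];
}
```

The same idea, generalized to arbitrary polyhedral domains and arbitrary affine schedules, is the subject of the paper's modular-allocation sections.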


Published In

ACM Transactions on Programming Languages and Systems, Volume 22, Issue 5 (September 2000), 200 pages
ISSN: 0164-0925, EISSN: 1558-4593, DOI: 10.1145/365151

Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags

  1. affine recurrence equations
  2. applicative (functional) languages
  3. automatic parallelization
  4. data-parallel languages
  5. dataflow analysis
  6. dependence analysis
  7. lifetime analysis
  8. memory management
  9. parallel code generation
  10. polyhedral model
  11. scheduling


