Open access

Optimizing memory usage in the polyhedral model

Published: 01 September 2000

Abstract

The polyhedral model provides a single unified foundation for systolic array synthesis and automatic parallelization of loop programs. We investigate the problem of memory reuse when compiling Alpha (a functional language based on this model). Direct compilation would require unacceptably large memory (for example, O(n³) for matrix multiplication). Researchers have previously addressed the problem of memory reuse, and the analysis that this entails for projective memory allocations. This paper addresses, for a given schedule, the choice of the projections so as to minimize the volume of the residual memory. We prove tight bounds on the number of linearly independent projection vectors. Our method is constructive, yielding an optimal memory allocation. We extend the method to modular functions, and deal with the subsequent problems of code generation. Our ideas are illustrated on a number of examples generated by the current version of the Alpha compiler.
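As a hypothetical illustration (my own sketch, not the Alpha compiler's generated code), consider the matrix-multiplication example from the abstract: the single-assignment form materializes every partial sum in a three-dimensional array of O(n³) cells, whereas projecting the accumulation dimension away reuses one cell per output element, leaving only the O(n²) result array:

```c
#include <assert.h>

#define N 4  /* illustrative problem size */

/* Single-assignment view: a table S[i][j][k] would hold every partial
 * sum, requiring O(n^3) cells. Projecting along the k axis reuses a
 * single accumulator per (i, j), so only the O(n^2) result survives. */
static double A[N][N], B[N][N], C[N][N];

void matmul_contracted(void) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double acc = 0.0;            /* replaces the k-indexed dimension */
            for (int k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
    }
}
```

Which projections are valid for a given schedule, and how to choose them optimally, is precisely what the paper formalizes.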


Cited By

  • (2022) Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration. ACM Transactions on Architecture and Code Optimization 20(1), 1–26. DOI: 10.1145/3566054
  • (2022) Lightweight Array Contraction by Trace-Based Polyhedral Analysis. High Performance Computing: ISC High Performance 2022 International Workshops, 20–32. DOI: 10.1007/978-3-031-23220-6_2
  • (2021) Monoparametric Tiling of Polyhedral Programs. International Journal of Parallel Programming 49(3), 376–409. DOI: 10.1007/s10766-021-00694-2


Reviews

Herbert G. Mayer

Quilleré and Rajopadhye address the problem of memory size optimization when compiling systems of affine recurrence equations (SAREs), defined over polyhedral index domains. The formalism for doing this, detailed in the paper, is referred to as the polyhedral model. It is presented as being useful for loop parallelization and systolic synthesis. Alpha, a functional language developed by researchers addressing similar problems of parallel execution, is explained and used throughout the paper to illustrate key points and to support feasibility claims.

Section 2 characterizes Alpha by showing the source-to-source transformations from a functional program to imperative C output. Section 3 presents the mathematical foundation for the analysis and proofs that appear later in the text, focusing on the lifetime function. Section 4 defines the constraints that a memory function must satisfy, preparing for the pseudo-projections of section 5, which allow significant memory savings. The proof of tight bounds on the number of linearly independent projection vectors appears in section 6, while section 7 discusses further, secondary-level memory savings. Section 8 includes examples, and section 9 is intended to show the applicability to loop parallelization. Related work is summarized in section 10, which points out weaknesses in SISAL and Haskell, and section 11 contains the conclusions.

Although direct compilation of functional languages such as Alpha requires unacceptably large storage, the authors show how this prohibitive requirement can be reduced. For example, the typical 2-D storage requirement for a local object can be scaled down to a small memory space, often a single cell or machine register; this is exemplified with the Fibonacci function. The core contribution of the paper is a generalization of this memory reduction and the accompanying mathematical formalization of how and when the reduction is possible.

The authors contend that "the primary criterion for optimality is the number of linearly independent projection vectors." There are notable weaknesses in the paper. For one, it is far too long for the volume and kind of findings it describes. It is also heavily loaded with mathematical derivations, even when the point to be proven is neither especially difficult nor essential. More importantly, section 9 should have shown how the polyhedral model applies to parallel code generation, but that promise is not fulfilled. After all, the real goal of multiprocessing research is the simultaneous use of multiple computing resources for the benefit of fast execution, not saving memory that was wasted by some (single-assignment or functional-language) compute model. Several key ideas announced in the introduction are not expanded in the paper: for example, the relevance to systolic synthesis and the adaptability to other areas of computing are only sketched. Automatic code emission for systolic systems is hard, so some principles for parallel code emission would have been welcome and would have fit well with the subject matter. The greatest disappointment is the authors' assumption, from the start, that the parallel schedule for an algorithm is determined a priori, before Alpha's transformations begin.

Despite these shortcomings, the paper is a must-read for researchers and developers of parallel compute systems that use single-assignment or functional language models. However, given the enormous hunger for memory in functional languages, the paper actually scares me away and directs me toward more conventional, imperative programming environments.

Online Computing Reviews Service
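The Fibonacci contraction mentioned in the review can be sketched in a few lines of C (an illustrative sketch of my own, not code from the paper): the single-assignment table fib[0..n] of O(n) cells shrinks, under the modular mapping i → i mod 2, to two cells, because fib(i) reads only fib(i−1) and fib(i−2).

```c
#include <assert.h>

/* Sketch of a modular memory allocation: the value fib(i) is stored at
 * f[i % 2]. Writing f[i % 2] overwrites fib(i - 2), which is exactly
 * the value that dies at step i, so two cells suffice. */
long fib_modular(int n) {
    long f[2] = {0, 1};                      /* f[0] = fib(0), f[1] = fib(1) */
    for (int i = 2; i <= n; i++)
        f[i % 2] = f[(i - 1) % 2] + f[(i - 2) % 2];
    return f[n % 2];
}
```

The same idea, generalized to arbitrary polyhedral domains and arbitrary affine schedules, is the subject of the paper's modular-allocation sections.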


Published In

ACM Transactions on Programming Languages and Systems, Volume 22, Issue 5 (September 2000), 200 pages
ISSN: 0164-0925, EISSN: 1558-4593, DOI: 10.1145/365151

Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags

  1. affine recurrence equations
  2. applicative (functional) languages
  3. automatic parallelization
  4. data-parallel languages
  5. dataflow analysis
  6. dependence analysis
  7. lifetime analysis
  8. memory management
  9. parallel code generation
  10. polyhedral model
  11. scheduling


