Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time

Authors: Louis-Noël Pouchet, Cédric Bastoul, John Cavazos

PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 90 - 100

https://doi.org/10.1145/1375581.1375594

Published: 07 June 2008

Abstract

High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, the compiler must accurately model the target architecture as well as the effects of complex code restructuring.

However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Although it quickly converges to good solutions for small kernels, larger benchmarks containing higher-dimensional spaces are more challenging, and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
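To make the idea of a genetic search restricted to legal versions more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a schedule can be represented as a short vector of integer coefficients and that legality reduces to a set of affine inequalities over those coefficients. The names `CONSTRAINTS`, `TARGET`, `is_legal`, `cost`, and `genetic_search` are hypothetical, the cost function is a toy stand-in for a measured execution time, and legality is preserved here by simply rejecting illegal offspring, which is a simplification of the specialized operators the abstract mentions.

```python
import random

COEFFS = (-1, 0, 1)   # assumed bounds on each schedule coefficient
DIM = 4               # assumed number of schedule coefficients
# Toy legality constraints (assumption): each (a, b) means dot(a, x) >= b.
CONSTRAINTS = [((1, 1, 0, 0), 0), ((0, 0, 1, -1), 0)]
TARGET = (1, 0, 1, 0)  # toy optimum used only by the stand-in cost


def is_legal(x):
    """Check every affine inequality dot(a, x) >= b."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) >= b
               for a, b in CONSTRAINTS)


def cost(x):
    """Stand-in for running the generated code and timing it."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def random_legal(rng):
    """Draw random coefficient vectors until one is legal."""
    while True:
        x = tuple(rng.choice(COEFFS) for _ in range(DIM))
        if is_legal(x):
            return x


def mutate(x, rng):
    """Change one coefficient; keep the parent if the result is illegal."""
    i = rng.randrange(DIM)
    y = x[:i] + (rng.choice(COEFFS),) + x[i + 1:]
    return y if is_legal(y) else x


def crossover(p, q, rng):
    """One-point crossover; fall back to the first parent if illegal."""
    cut = rng.randrange(1, DIM)
    child = p[:cut] + q[cut:]
    return child if is_legal(child) else p


def genetic_search(generations=30, pop_size=12, seed=0):
    """Elitist genetic search that only ever holds legal individuals."""
    rng = random.Random(seed)
    pop = [random_legal(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=cost)


if __name__ == "__main__":
    best = genetic_search()
    print(best, cost(best), is_legal(best))
```

Because the elite half is carried over unchanged, the best cost in the population never gets worse from one generation to the next. In a real setting the `cost` call would be replaced by compiling and running each candidate version, which is the feedback step the abstract refers to.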

Index Terms

Software and its engineering → Software notations and tools → Compilers

Published In

PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2008, 396 pages

ISBN: 9781595938602

DOI: 10.1145/1375581

General Chair: Rajiv Gupta (University of California, Riverside, USA)

Program Chair: Saman Amarasinghe (Massachusetts Institute of Technology, USA)

ACM SIGPLAN Notices, Volume 43, Issue 6 (PLDI '08), June 2008, 382 pages

ISSN: 0362-1340

EISSN: 1558-1160

DOI: 10.1145/1379022

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '08: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 7 - 13, 2008

Tucson, AZ, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
