Loop transformations: convexity, pruning and optimization

Authors:

Louis-Noël Pouchet,

Uday Bondhugula,

Cédric Bastoul,

Nicolas Vasilache

POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Pages 549 - 562

https://doi.org/10.1145/1926385.1926449

Published: 26 January 2011

Abstract

High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit the resources in modern processor architectures. Nevertheless, selecting required compositions of loop transformations to achieve this remains a significantly challenging task; current compilers may be off by orders of magnitude in performance compared to hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. Our framework has been implemented and validated experimentally on a representative set of benchmarks running on state-of-the-art multi-core platforms.

Index Terms

Loop transformations: convexity, pruning and optimization
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

January 2011

652 pages

ISBN:9781450304900

DOI:10.1145/1926385

General Chair: Thomas Ball (Microsoft Research, USA)
Program Chair: Mooly Sagiv (Tel Aviv University, Israel)

ACM SIGPLAN Notices Volume 46, Issue 1
POPL '11
January 2011
624 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1925844

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2011

Qualifiers

Research-article

Conference

POPL '11

Sponsor:

SIGPLAN

POPL '11: The 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

January 26 - 28, 2011

Austin, Texas, USA

Acceptance Rates

Overall Acceptance Rate 824 of 4,130 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 118
Total Downloads: 1,016
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 5

Reflects downloads up to 12 Sep 2024

Cited By

Zhao J, Xu J, Di P, Nie W, Hu J, Yi Y, Yang S, Geng Z, Zhang R, Li B, Gan Z, Jin X (2024). Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations. ACM Transactions on Computer Systems 41:1-4, pages 1-45. Online publication date: 15-Jan-2024
https://dl.acm.org/doi/10.1145/3635305
Xu J, Song G, Zhou B, Li F, Hao J, Zhao J, Lee I, Chabbi M, Steuwer M (2024). A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 55-67. Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638484
Jayaweera M, Kong M, Wang Y, Kaeli D (2024). Energy-Aware Tile Size Selection for Affine Programs on GPUs. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 13-27. Online publication date: 2-Mar-2024
https://doi.org/10.1109/CGO57630.2024.10444795
Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L, Patru D (2023). Cache Programming for Scientific Loops Using Leases. ACM Transactions on Architecture and Code Optimization 20:3, pages 1-25. Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3600090
Ding Y, Wu J, Gao Y, Wang M, So H (2023). Model-Platform Optimized Deep Neural Network Accelerator Generation through Mixed-Integer Geometric Programming. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 83-93. Online publication date: May-2023
https://doi.org/10.1109/FCCM57271.2023.00018
Liu S, Wan X, Zhang Z, Zhao B, Wu W (2023). TurboStencil: You only compute once for stencil computation. Future Generation Computer Systems 146, pages 260-272. Online publication date: Sep-2023
https://doi.org/10.1016/j.future.2023.04.019
Bastoul C, Zhang Z, Razanajato H, Lossing N, Susungi A, de Juan J, Filhol E, Jarry B, Consolaro G, Zhang R (2022). Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 313-324. Online publication date: 2-Apr-2022
https://doi.org/10.1109/CGO53902.2022.9741260
Kong M (2021). On the Impact of Affine Loop Transformations in Qubit Allocation. ACM Transactions on Quantum Computing 2:3, pages 1-40. Online publication date: 30-Sep-2021
https://dl.acm.org/doi/10.1145/3465409
Kao H, San Miguel J, Enright Jerger N (2021). Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications. 50th International Conference on Parallel Processing Workshop, pages 1-10. Online publication date: 9-Aug-2021
https://dl.acm.org/doi/10.1145/3458744.3474045
Abdelaal K, Kong M, Zhou H, Moreira J, Mueller F, Etsion Y (2021). Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing, pages 13-26. Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3460369
