DOI: 10.1145/1926385.1926449
research-article

Loop transformations: convexity, pruning and optimization

Published: 26 January 2011

Abstract

High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit the resources in modern processor architectures. Nevertheless, selecting required compositions of loop transformations to achieve this remains a significantly challenging task; current compilers may be off by orders of magnitude in performance compared to hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. Our framework has been implemented and validated experimentally on a representative set of benchmarks running on state-of-the-art multi-core platforms.
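
As a small illustration of what "semantics-preserving" means for an affine loop transformation, the sketch below applies the textbook legality test for a linear transformation of a perfectly nested loop: every dependence distance vector must remain lexicographically positive after the transformation. This is not the convex characterization presented in the paper, only a minimal example of the underlying legality notion. The kernel, the distance vector, and all names (`transform`, `lex_positive`, `is_legal`, the matrices) are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]
Matrix = Sequence[Sequence[int]]


def transform(t: Matrix, d: Vector) -> Vector:
    """Apply the linear transformation t to the distance vector d."""
    return tuple(sum(a * x for a, x in zip(row, d)) for row in t)


def lex_positive(v: Vector) -> bool:
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the all-zero vector is not lexicographically positive


def is_legal(t: Matrix, distances: Sequence[Vector]) -> bool:
    """A transformation is legal if it keeps every dependence
    distance vector lexicographically positive."""
    return all(lex_positive(transform(t, d)) for d in distances)


# Hypothetical kernel (an assumption, not taken from the paper):
#   for i: for j: A[i][j] = A[i-1][j+1] + 1
# The value written at (i-1, j+1) is read at (i, j): distance (1, -1).
DISTANCES = [(1, -1)]

IDENTITY = [(1, 0), (0, 1)]               # (i, j) -> (i, j)
INTERCHANGE = [(0, 1), (1, 0)]            # (i, j) -> (j, i)
SKEW_THEN_INTERCHANGE = [(1, 1), (1, 0)]  # (i, j) -> (i + j, i)

if __name__ == "__main__":
    for name, t in [
        ("identity", IDENTITY),
        ("interchange", INTERCHANGE),
        ("skew then interchange", SKEW_THEN_INTERCHANGE),
    ]:
        print(name, "legal" if is_legal(t, DISTANCES) else "illegal")
```

Under these assumptions, plain interchange maps the distance (1, -1) to (-1, 1) and is rejected, while skewing the inner loop by the outer one before interchanging maps it to (0, 1) and is accepted. That a composition can be legal where one of its parts alone is not is one reason selecting compositions of transformations is hard.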

Supplementary Material

MP4 File (50-mpeg-4.mp4)

    Published In

    POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
    January 2011
    652 pages
    ISBN: 9781450304900
    DOI: 10.1145/1926385
    • ACM SIGPLAN Notices, Volume 46, Issue 1
      POPL '11
      January 2011
      624 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1925844
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. affine scheduling
    2. compilation
    3. compiler optimization
    4. loop transformations
    5. parallelism

    Qualifiers

    • Research-article

    Conference

    POPL '11

    Acceptance Rates

    Overall Acceptance Rate 824 of 4,130 submissions, 20%

    Cited By

    • (2024) Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations. ACM Transactions on Computer Systems 41:1-4 (1-45). DOI: 10.1145/3635305. Online publication date: 15-Jan-2024
    • (2024) A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (55-67). DOI: 10.1145/3627535.3638484. Online publication date: 2-Mar-2024
    • (2024) Energy-Aware Tile Size Selection for Affine Programs on GPUs. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (13-27). DOI: 10.1109/CGO57630.2024.10444795. Online publication date: 2-Mar-2024
    • (2023) Cache Programming for Scientific Loops Using Leases. ACM Transactions on Architecture and Code Optimization 20:3 (1-25). DOI: 10.1145/3600090. Online publication date: 19-Jul-2023
    • (2023) Model-Platform Optimized Deep Neural Network Accelerator Generation through Mixed-Integer Geometric Programming. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (83-93). DOI: 10.1109/FCCM57271.2023.00018. Online publication date: May-2023
    • (2023) TurboStencil: You only compute once for stencil computation. Future Generation Computer Systems 146 (260-272). DOI: 10.1016/j.future.2023.04.019. Online publication date: Sep-2023
    • (2022) Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (313-324). DOI: 10.1109/CGO53902.2022.9741260. Online publication date: 2-Apr-2022
    • (2021) On the Impact of Affine Loop Transformations in Qubit Allocation. ACM Transactions on Quantum Computing 2:3 (1-40). DOI: 10.1145/3465409. Online publication date: 30-Sep-2021
    • (2021) Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications. 50th International Conference on Parallel Processing Workshop (1-10). DOI: 10.1145/3458744.3474045. Online publication date: 9-Aug-2021
    • (2021) Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing (13-26). DOI: 10.1145/3447818.3460369. Online publication date: 3-Jun-2021
