Automatic creation of tile size selection models

Authors:

Lakshminarayanan Renganarayanan,

Sanjay Rajopadhye,

Charles Anderson,

Alexandre E. Eichenberger,

Kevin O'Brien

CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Pages 190 - 199

https://doi.org/10.1145/1772954.1772982

Published: 24 April 2010

Abstract

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Using tiling effectively requires selecting and tuning the tile sizes. This is usually done by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are then selected either by using the TSS model directly or by combining it with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that, for a specific class of programs, fairly accurate TSS models can be created automatically by combining simple program features, synthetic kernels, and standard machine learning techniques. The same automatic generation scheme can be used directly to adapt a model or keep it up to date. We evaluate our scheme on six architecture-compiler combinations (chosen from three architectures and four compilers). The models learned by our method consistently show near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.
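
To make the terms concrete, here is a minimal sketch of loop tiling and of the purely empirical tile size search that a TSS model is meant to replace or guide. It is not the paper's method. It assumes square tiles and a matrix-multiply kernel, and every name in it (`matmul_tiled`, `empirical_search`, `CANDIDATES`, and so on) is hypothetical. In pure Python the timings mostly reflect interpreter overhead rather than cache behaviour, so the tile size it picks is illustrative only.

```python
import time


def matmul_naive(a, b):
    """Reference product C = A * B with untiled i, k, j loops."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a, b, tile):
    """Same product with the i, k, and j loops tiled by `tile`.

    Assumption for illustration: one square tile size for all three loops.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tile origins.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Inner loops stay inside one tile; min() clips partial tiles.
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c


def empirical_search(a, b, candidates, repeats=3):
    """Time every candidate tile size and return (best_tile, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_tiled(a, b, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the fastest of the repeated runs
    return min(timings, key=timings.get), timings


def make_matrix(rows, cols, seed):
    """Deterministic matrix of small integer-valued floats."""
    return [[float((seed + 3 * r + 5 * c) % 7) for c in range(cols)]
            for r in range(rows)]


A = make_matrix(24, 20, 1)
B = make_matrix(20, 28, 2)
CANDIDATES = [2, 4, 8, 16]  # hypothetical search space
REFERENCE = matmul_naive(A, B)
BEST_TILE, TIMINGS = empirical_search(A, B, CANDIDATES)
```

A TSS model, whether hand-crafted or learned, would stand in for the timing loop in `empirical_search` by predicting the cost of each candidate, or would narrow `CANDIDATES` before the search runs.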

Index Terms

Automatic creation of tile size selection models
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

April 2010

300 pages

ISBN:9781605586359

DOI:10.1145/1772954

General Chairs: Andreas Moshovos (University of Toronto), Greg Steffan (University of Toronto)

Program Chairs: Kim Hazelwood (University of Virginia), David Kaeli (Northeastern University)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

IEEE CS uArch

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CGO '10: 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization

April 24 - 28, 2010

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Article Metrics

Total Citations: 46
Total Downloads: 533
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 1

Reflects downloads up to 16 Oct 2024

Cited By

Jayaweera M, Kong M, Wang Y, Kaeli D, Grosser T, Dubach C, Steuwer M, Xue J, Ottoni G, Quintão Pereira F (2024). Energy-Aware Tile Size Selection for Affine Programs on GPUs. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 13-27. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795
Liu H, Xu J, Chen S, Guo T (2022). Compiler Optimization Parameter Selection Method Based on Ensemble Learning. Electronics, 11:15 (2452). Online publication date: 6-Aug-2022.
https://doi.org/10.3390/electronics11152452
Kelefouras V, Djemame K, Keramidas G, Voros N (2022). A Methodology for Efficient Tile Size Selection for Affine Loop Kernels. International Journal of Parallel Programming, 50:3-4, pp. 405-432. Online publication date: 23-May-2022.
https://doi.org/10.1007/s10766-022-00734-5
Meng K, Norris B (2022). Guiding Code Optimizations with Deep Learning-Based Code Matching. Languages and Compilers for Parallel Computing, pp. 20-28. Online publication date: 16-Feb-2022.
https://doi.org/10.1007/978-3-030-95953-1_2
Mammadli R, Selakovic M, Wolf F, Pradel M, Samanta R, Dillig I (2021). Learning to make compiler optimizations more effective. Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming, pp. 9-20. Online publication date: 21-Jun-2021.
https://dl.acm.org/doi/10.1145/3460945.3464952
Olivry A, Iooss G, Tollenaere N, Rountev A, Sadayappan P, Rastello F, Freund S, Yahav E (2021). IOOpt: automatic derivation of I/O complexity bounds for affine programs. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 1187-1202. Online publication date: 19-Jun-2021.
https://dl.acm.org/doi/10.1145/3453483.3454103
Narasimhan K, Acharya A, Baid A, Bondhugula U, Zhou H, Moreira J, Mueller F, Etsion Y (2021). A practical tile size selection model for affine loop nests. Proceedings of the 35th ACM International Conference on Supercomputing, pp. 27-39. Online publication date: 3-Jun-2021.
https://dl.acm.org/doi/10.1145/3447818.3462213
Abdelaal K, Kong M, Zhou H, Moreira J, Mueller F, Etsion Y (2021). Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing, pp. 13-26. Online publication date: 3-Jun-2021.
https://dl.acm.org/doi/10.1145/3447818.3460369
Li R, Xu Y, Sukumaran-Rajam A, Rountev A, Sadayappan P, Sherwood T, Berger E, Kozyrakis C (2021). Analytical characterization and design space exploration for optimization of CNNs. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 928-942. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3445814.3446759
Kurt S, Sukumaran-Rajam A, Rastello F, Sadayyapan P, Cuicchi C, Qualters I, Kramer W (2020). Efficient tiled sparse matrix multiplication through matrix signatures. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. Online publication date: 9-Nov-2020.
https://dl.acm.org/doi/10.5555/3433701.3433816
