DOI: 10.1145/1772954.1772982
research-article

Automatic creation of tile size selection models

Published: 24 April 2010

Abstract

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Effective use of tiling requires selecting and tuning the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of the tile sizes. The best tile sizes are then selected either directly from the TSS model or by combining the TSS model with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that, for a specific class of programs, fairly accurate TSS models can be created automatically by combining simple program features, synthetic kernels, and standard machine learning techniques. The same automatic TSS model generation scheme can also be used directly to adapt a model or to keep it up to date. We evaluate our scheme on six architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method consistently show near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.
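To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the paper's method. It shows a tiled matrix multiplication whose tile size is a parameter, plus a toy selector that optionally prunes candidate tile sizes with a model and then times the survivors (the "model together with an empirical search" style of selection). The names `matmul_tiled`, `pick_tile_size`, `cost_model`, and `keep` are assumptions introduced here for illustration; in particular, `cost_model` is a hypothetical stand-in for a TSS model, and pure Python timings say little about real cache behavior.

```python
import time


def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles of size `tile`."""
    c = [[0.0] * n for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay inside one tile.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def pick_tile_size(n, candidates, cost_model=None, keep=2):
    """Toy selector: optionally prune candidates with a model, then time the rest.

    `cost_model` is a hypothetical callable mapping a tile size to a predicted
    cost (lower is better). It is not the learned model described in the paper.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    if cost_model is not None:
        # Keep only the `keep` candidates the model ranks best.
        candidates = sorted(candidates, key=cost_model)[:keep]
    timings = {}
    for t in candidates:
        start = time.perf_counter()
        matmul_tiled(a, b, n, t)
        timings[t] = time.perf_counter() - start
    # Return the fastest measured tile size and all measurements.
    return min(timings, key=timings.get), timings
```

Calling `pick_tile_size(64, [4, 8, 16, 32])` times every candidate, while passing a `cost_model` restricts the timing runs to the model's top picks. A sufficiently accurate model could in principle replace the timing step entirely, which corresponds to the "directly using the TSS model" option mentioned in the abstract.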


    Published In

    CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
    April 2010
    300 pages
    ISBN:9781605586359
    DOI:10.1145/1772954

    In-Cooperation

    • IEEE CS uArch

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. machine learning
    2. neural network
    3. performance modeling
    4. tiling

    Qualifiers

    • Research-article

    Conference

    CGO '10

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Cited By

    • (2024) Energy-Aware Tile Size Selection for Affine Programs on GPUs. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pages 13-27. DOI: 10.1109/CGO57630.2024.10444795. Online publication date: 2-Mar-2024.
    • (2022) Compiler Optimization Parameter Selection Method Based on Ensemble Learning. Electronics, 11:15, article 2452. DOI: 10.3390/electronics11152452. Online publication date: 6-Aug-2022.
    • (2022) A Methodology for Efficient Tile Size Selection for Affine Loop Kernels. International Journal of Parallel Programming, 50:3-4, pages 405-432. DOI: 10.1007/s10766-022-00734-5. Online publication date: 23-May-2022.
    • (2022) Guiding Code Optimizations with Deep Learning-Based Code Matching. Languages and Compilers for Parallel Computing, pages 20-28. DOI: 10.1007/978-3-030-95953-1_2. Online publication date: 16-Feb-2022.
    • (2021) Learning to make compiler optimizations more effective. Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming, pages 9-20. DOI: 10.1145/3460945.3464952. Online publication date: 21-Jun-2021.
    • (2021) IOOpt: automatic derivation of I/O complexity bounds for affine programs. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 1187-1202. DOI: 10.1145/3453483.3454103. Online publication date: 19-Jun-2021.
    • (2021) A practical tile size selection model for affine loop nests. Proceedings of the 35th ACM International Conference on Supercomputing, pages 27-39. DOI: 10.1145/3447818.3462213. Online publication date: 3-Jun-2021.
    • (2021) Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Proceedings of the 35th ACM International Conference on Supercomputing, pages 13-26. DOI: 10.1145/3447818.3460369. Online publication date: 3-Jun-2021.
    • (2021) Analytical characterization and design space exploration for optimization of CNNs. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 928-942. DOI: 10.1145/3445814.3446759. Online publication date: 19-Apr-2021.
    • (2020) Efficient tiled sparse matrix multiplication through matrix signatures. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. DOI: 10.5555/3433701.3433816. Online publication date: 9-Nov-2020.
