DOI: 10.1145/1654059.1654105
research-article

Compact multi-dimensional kernel extraction for register tiling

Published: 14 November 2009
    Abstract

    To achieve high performance on multi-cores, modern loop optimizers apply long sequences of transformations that produce complex loop structures. Downstream optimizations such as register tiling (unroll-and-jam plus scalar promotion) typically provide a significant performance improvement, but typical register tilers deliver it only when applied to simple loop structures. They often fail to operate on complex loop structures, leaving a significant amount of performance on the table. We present a technique called compact multi-dimensional kernel extraction (COMDEX), which enables register tilers to operate on arbitrarily complex loop structures and deliver these performance benefits. COMDEX extracts compact unrollable kernels from complex loops. We show that by using COMDEX as a pre-processing step for register tiling we can (i) enable register tiling on complex loop structures and (ii) realize a significant performance improvement on a variety of codes.
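
    As a rough illustration of what the abstract calls register tiling (unroll-and-jam plus scalar promotion), the sketch below applies both steps by hand to a simple matrix-vector product in C. It is not COMDEX and is not taken from the paper: the function names `matvec_simple` and `matvec_regtiled`, the unroll factor of 2, and the row-major layout are all assumptions chosen for the example. It shows only the "simple loop structure" case that, according to the abstract, conventional register tilers already handle.

```c
#include <stddef.h>

/* Baseline: y = A * x, with A stored row-major as rows x cols.
   (Hypothetical example kernel, not from the paper.) */
void matvec_simple(size_t rows, size_t cols,
                   const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < rows; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < cols; j++)
            y[i] += a[i * cols + j] * x[j];
    }
}

/* Register-tiled variant (assumed unroll factor: 2).
   Unroll-and-jam: the outer i loop is unrolled by 2 and the two
   copies of the inner j loop are fused (jammed) into one.
   Scalar promotion: y[i], y[i+1] and x[j] are held in local
   scalars so the compiler can keep them in registers. */
void matvec_regtiled(size_t rows, size_t cols,
                     const double *a, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 1 < rows; i += 2) {
        double acc0 = 0.0, acc1 = 0.0;   /* promoted y[i], y[i+1] */
        for (size_t j = 0; j < cols; j++) {
            double xj = x[j];            /* loaded once, used by both rows */
            acc0 += a[i * cols + j] * xj;
            acc1 += a[(i + 1) * cols + j] * xj;
        }
        y[i] = acc0;
        y[i + 1] = acc1;
    }
    /* Cleanup loop for the leftover row when rows is odd. */
    for (; i < rows; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < cols; j++)
            acc += a[i * cols + j] * x[j];
        y[i] = acc;
    }
}
```

    Both functions accumulate each output element in the same order, so they should produce identical results; the tiled version simply reuses each loaded `x[j]` for two rows. Even this rectangular case needs a cleanup loop, which hints at why loop nests with more irregular bounds are harder to unroll and jam.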




    Published In

    SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
    November 2009
    778 pages
    ISBN:9781605587448
    DOI:10.1145/1654059
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Conference

    SC '09

    Acceptance Rates

    SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


    Cited By

    • (2023) Optimizing Depthwise Convolutions on ARMv8 Architecture. Parallel and Distributed Computing, Applications and Technologies, pages 441-452. DOI: 10.1007/978-3-031-29927-8_34. Online publication date: 8-Apr-2023.
    • (2020) Polyhedral Compilation for Racetrack Memories. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1-1. DOI: 10.1109/TCAD.2020.3012266. Online publication date: 2020.
    • (2019) Multi-dimensional Vectorization in LLVM. Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, pages 1-8. DOI: 10.1145/3303117.3306172. Online publication date: 16-Feb-2019.
    • (2015) Energy-efficient Algorithms for Ultrascale Systems. Supercomputing Frontiers and Innovations: an International Journal, 2:2, pages 77-104. DOI: 10.14529/jsfi150205. Online publication date: 6-Apr-2015.
    • (2012) Parameterized loop tiling. ACM Transactions on Programming Languages and Systems, 34:1, pages 1-41. DOI: 10.1145/2160910.2160912. Online publication date: 4-May-2012.
    • (2010) A model for fusion and code motion in an automatic parallelizing compiler. Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 343-352. DOI: 10.1145/1854273.1854317. Online publication date: 11-Sep-2010.
