Research article, DOI: 10.1145/2628071.2628106

Tiling and optimizing time-iterated computations on periodic domains

Published: 24 August 2014
  • Abstract

    This paper addresses the optimization of time-iterated computations on periodic data domains. Such computations are prevalent in the computational sciences, particularly in partial differential equation solvers. We propose a fully automatic technique suitable for implementation in a compiler or in a domain-specific code generator for such computations. Dependence patterns on periodic data domains prevent existing algorithms from finding tiling opportunities. Our approach augments a state-of-the-art parallelization and locality-enhancing algorithm from the polyhedral framework to allow time-tiling of stencil computations on periodic domains. Experimental results on the swim SPEC CPU2000fp benchmark show speedups of 5× and 4.2× over the highest SPEC performance achieved by native compilers on Intel Xeon and AMD Opteron multicore SMP systems, respectively. On other representative stencil computations, our scheme provides performance similar to that achieved with no periodicity, and a very high speedup is obtained over the native compiler. We also report a mean speedup of about 1.5× over a domain-specific stencil compiler supporting limited cases of periodic boundary conditions. To the best of our knowledge, it has been infeasible to manually reproduce such optimizations on swim or any other periodic stencil, especially on a data grid of two or more dimensions.
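
As a rough illustration of why periodicity hinders tiling (a sketch under our own assumptions, not the paper's algorithm), the Python snippet below uses a hypothetical 1-D three-point stencil. The names `periodic_step`, `dependence_distances`, and `min_skew` are invented for this example. It enumerates the dependence distance vectors with and without wrap-around, then computes the smallest skew factor s for which the transformation i' = i + s*t makes every spatial distance component non-negative, which is the usual legality condition for rectangular tiling of the skewed iteration space.

```python
def periodic_step(u):
    """One Jacobi-style sweep of a 3-point stencil with wrap-around neighbours."""
    n = len(u)
    return [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def dependence_distances(n, periodic):
    """Distance vectors (dt, di) from each value read at time t-1 to its use at time t."""
    dists = set()
    for i in range(n):
        for off in (-1, 0, 1):
            src = i + off
            if periodic:
                src %= n  # wrap around the ends of the grid
            elif not 0 <= src < n:
                continue  # non-periodic case: no neighbour outside the grid
            dists.add((1, i - src))
    return dists


def min_skew(dists):
    """Smallest integer s >= 0 such that di + s*dt >= 0 for every distance (dt, di).

    Assumes dt == 1 for all distances, as produced by dependence_distances.
    """
    return max(0, max(-di for _dt, di in dists))


if __name__ == "__main__":
    n = 8
    print("non-periodic skew:", min_skew(dependence_distances(n, periodic=False)))
    print("periodic skew:    ", min_skew(dependence_distances(n, periodic=True)))
```

For an 8-point grid, the non-periodic stencil needs a skew of 1, whereas the wrap-around dependences push the required skew to n - 1 = 7. Because that value grows with the problem size, a constant time skew is not sufficient in this toy setting, which is one way to see the obstacle the abstract refers to.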

      Published In

      PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
      August 2014
      514 pages
      ISBN:9781450328098
      DOI:10.1145/2628071

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. automatic parallelization
      2. periodic
      3. polyhedral model
      4. stencils
      5. tiling

      Qualifiers

      • Research-article

      Conference

      PACT '14
      Sponsor:
      • IFIP WG 10.3
      • SIGARCH
      • IEEE CS TCPP
      • IEEE CS TCAA

      Acceptance Rates

      PACT '14 Paper Acceptance Rate 54 of 144 submissions, 38%;
      Overall Acceptance Rate 121 of 471 submissions, 26%

      Cited By

      • (2024) A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 55-67. DOI: 10.1145/3627535.3638484. Online publication date: 2-Mar-2024
      • (2022) Occam: Optimal Data Reuse for Convolutional Neural Networks. ACM Transactions on Architecture and Code Optimization, 20:1, pages 1-25. DOI: 10.1145/3566052. Online publication date: 16-Dec-2022
      • (2021) Revisiting split tiling for stencil computations in polyhedral compilation. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03835-z. Online publication date: 27-May-2021
      • (2020) Pencil: A Pipelined Algorithm for Distributed Stencils. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-16. DOI: 10.1109/SC41405.2020.00089. Online publication date: Nov-2020
      • (2019) Flextended Tiles. ACM Transactions on Architecture and Code Optimization, 16:4, pages 1-25. DOI: 10.1145/3369382. Online publication date: 17-Dec-2019
      • (2019) Tessellating Star Stencils. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337835. Online publication date: 5-Aug-2019
      • (2019) Transitioning Spiking Neural Network Simulators to Heterogeneous Hardware. Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pages 115-126. DOI: 10.1145/3316480.3322893. Online publication date: 29-May-2019
      • (2019) Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling. ACM Transactions on Mathematical Software, 45:2, pages 1-30. DOI: 10.1145/3302256. Online publication date: 3-May-2019
      • (2018) Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 276-285. DOI: 10.1109/IPDPS.2018.00037. Online publication date: May-2018
      • (2018) Automatic runtime calculation of communications for data‐parallel expressions with periodic conditions. Concurrency and Computation: Practice and Experience, 31:5. DOI: 10.1002/cpe.4430. Online publication date: 31-Jan-2018
