Tiling and optimizing time-iterated computations on periodic domains

Authors:

Uday Bondhugula, Vinayaka Bandishti, Guillain Potron, and Nicolas Vasilache

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

August 2014

Pages 39 - 50

https://doi.org/10.1145/2628071.2628106

Published: 24 August 2014

Abstract

This paper deals with optimizing time-iterated computations on periodic data domains. These computations are prevalent in computational sciences, particularly in partial differential equation solvers. We propose a fully automatic technique suitable for implementation in a compiler or in a domain-specific code generator for such computations. Dependence patterns on periodic data domains prevent existing algorithms from finding tiling opportunities. Our approach augments a state-of-the-art parallelization and locality-enhancing algorithm from the polyhedral framework to allow time-tiling of stencil computations on periodic domains. Experimental results on the swim SPEC CPU2000fp benchmark show speedups of 5× and 4.2× over the highest SPEC performance achieved by native compilers on Intel Xeon and AMD Opteron multicore SMP systems, respectively. On other representative stencil computations, our scheme provides performance similar to that achieved with no periodicity, and a very high speedup is obtained over the native compiler. We also report a mean speedup of about 1.5× over a domain-specific stencil compiler supporting limited cases of periodic boundary conditions. To the best of our knowledge, it has been infeasible to manually reproduce such optimizations on swim or any other periodic stencil, especially on a data grid of two dimensions or higher.
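
To illustrate why periodic boundaries obstruct time-tiling, the following is a minimal sketch in Python and not the paper's algorithm. It assumes a hypothetical 1D three-point averaging stencil and uses illustrative helper names (`stencil_step`, `dependence_distances`, `min_skew_for_tiling`) that do not come from the paper. It enumerates the (time, space) dependence distances and computes the smallest skew `s` for which the transformation `i' = i + s*t` makes every spatial distance non-negative, which is the usual legality condition for rectangular tiling of the skewed space. Without periodicity a constant skew of 1 suffices; with wraparound accesses the required skew grows with the grid size, so no fixed skew works.

```python
def stencil_step(a):
    """One time step of a 3-point averaging stencil on a periodic 1D grid."""
    n = len(a)
    return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3.0 for i in range(n)]


def dependence_distances(n, periodic):
    """Distances (dt, di) from producer (t, j) to consumer (t + 1, i)."""
    dists = set()
    for i in range(n):
        for off in (-1, 0, 1):
            j = i + off
            if periodic:
                j %= n              # wraparound access to the other end
            elif not 0 <= j < n:
                continue            # no neighbour outside the grid
            dists.add((1, i - j))
    return dists


def min_skew_for_tiling(dists):
    """Smallest s >= 0 such that di + s * dt >= 0 for all distances.

    Every distance here has dt == 1, so s is the largest negated
    spatial component.
    """
    return max(0, max(-di for _, di in dists))
```

For a grid of 8 points, `min_skew_for_tiling(dependence_distances(8, False))` evaluates to 1, whereas the periodic case evaluates to 7, that is, the grid size minus one. This dependence on the problem size is one way to see the difficulty the abstract describes.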

Index Terms

Tiling and optimizing time-iterated computations on periodic domains
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

August 2014

514 pages

ISBN:9781450328098

DOI:10.1145/2628071

General Chair:
J. Nelson Amaral
University of Alberta, Canada
,
Program Chair:
Josep Torrellas
University of Illinois, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

ARTEMIS COPCAMS
Seventh Framework Programme

Conference

PACT '14: International Conference on Parallel Architectures and Compilation

August 24 - 27, 2014

Edmonton, AB, Canada

Acceptance Rates

PACT '14 Paper Acceptance Rate 54 of 144 submissions, 38%;

Overall Acceptance Rate 121 of 471 submissions, 26%
