A framework for enhancing data reuse via associative reordering

Published: 09 June 2014

Abstract

The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and to a more limited extent in optimizing compilers.
In this paper, we develop a novel framework that uses the associativity and commutativity of operations in regular loop computations to enhance register reuse. Stencils are an important class of computations to which the optimization framework can be applied to improve performance. We show how stencil operations can be implemented to better exploit register reuse and reduce loads and stores. We develop a multi-dimensional retiming formalism to characterize the space of valid implementations in conjunction with other program transformations. Experimental results demonstrate the effectiveness of the framework on a collection of high-order stencils.
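
As a rough illustration of the kind of reordering the abstract describes, the sketch below computes the same 1D stencil in two ways: a "gather" form, where each output reads all of its inputs, and a "scatter" form, where each input is read once and its weighted contribution is added to every output it feeds. This is not the paper's implementation. The function names `stencil_gather` and `stencil_scatter`, the weight layout, and the boundary handling are assumptions made for the example, and Python shows only the reordering itself, not the register-level performance effect.

```python
def stencil_gather(a, w):
    """Gather form: out[i] = sum over k of w[k + r] * a[i + k].

    Every output element re-reads all 2r + 1 inputs in its neighborhood.
    Boundary outputs (fewer than r neighbors on a side) are left at 0.0.
    """
    r = len(w) // 2          # stencil radius; w is assumed to have 2r + 1 entries
    n = len(a)
    out = [0.0] * n
    for i in range(r, n - r):
        acc = 0.0
        for k in range(-r, r + 1):
            acc += w[k + r] * a[i + k]
        out[i] = acc
    return out


def stencil_scatter(a, w):
    """Scatter form: the same sums, reordered.

    Each input a[j] is read once and its contribution is accumulated into
    every output that uses it. The reordering is legal because addition is
    associative and commutative (floating-point rounding may differ in
    general when the order of additions changes).
    """
    r = len(w) // 2
    n = len(a)
    out = [0.0] * n
    for j in range(n):
        x = a[j]             # single read of the input element
        for k in range(-r, r + 1):
            i = j - k        # output index that consumes a[j] with weight w[k + r]
            if r <= i < n - r:
                out[i] += w[k + r] * x
    return out
```

Calling both functions with the same input, for example `a = [1, 4, 9, 16, 25, 36, 49]` and `w = [1, -2, 1]`, should give identical lists, since each output receives the same terms in both forms. In a compiled language the scatter form would let a loaded input stay in a register while it contributes to several partial sums, which is the reuse opportunity the abstract refers to.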

    Published In

    ACM SIGPLAN Notices, Volume 49, Issue 6
    PLDI '14
    June 2014
    598 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2666356
    Editor: Andy Gill

    PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2014
    619 pages
    ISBN: 9781450327848
    DOI: 10.1145/2594291
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article

    Cited By

    • (2024) ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 333-347. DOI: 10.1145/3627535.3638476. Online publication date: 2-Mar-2024
    • (2022) Source-to-Source Automatic Differentiation of OpenMP Parallel Loops. ACM Transactions on Mathematical Software, 48:1, pages 1-32. DOI: 10.1145/3472796. Online publication date: 16-Feb-2022
    • (2021) Spray: Sparse Reductions of Arrays in OpenMP. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 475-484. DOI: 10.1109/IPDPS49936.2021.00056. Online publication date: May-2021
    • (2019) Automatic Differentiation for Adjoint Stencil Loops. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337906. Online publication date: 5-Aug-2019
    • (2019) Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-44. DOI: 10.1145/3295500.3356210. Online publication date: 17-Nov-2019
    • (2018) SODA: Stencil with Optimized Dataflow Architecture. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1-8. DOI: 10.1145/3240765.3240850. Online publication date: 5-Nov-2018
    • (2018) SIMD code generation for stencils on brick decompositions. ACM SIGPLAN Notices, 53:1, pages 423-424. DOI: 10.1145/3200691.3178537. Online publication date: 10-Feb-2018
    • (2018) SIMD code generation for stencils on brick decompositions. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 423-424. DOI: 10.1145/3178487.3178537. Online publication date: 10-Feb-2018
    • (2018) Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks. 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pages 59-70. DOI: 10.1109/P3HPC.2018.00009. Online publication date: Nov-2018
    • (2018) Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations. Proceedings of the IEEE, 106:11, pages 1902-1920. DOI: 10.1109/JPROC.2018.2862896. Online publication date: Nov-2018
