Flattening and parallelizing irregular, recurrent loop nests

Published: 01 August 1995

Abstract

Irregular loop nests in which the loop bounds are determined dynamically by indexed arrays are difficult to compile into expressive parallel constructs such as segmented scans and reductions. In this paper, we describe a suite of transformations that automatically parallelize such irregular loop nests, even in the presence of recurrences. We describe a simple, general loop flattening transformation, along with new optimizations that make it a viable compiler transformation. A robust recurrence parallelization technique is coupled to the loop flattening transformation, allowing parallelization of segmented reductions, scans, and combining-sends over arbitrary associative operators. We discuss the implementation of the transformations in a parallelizing Fortran 77 compiler for the Cray C90 supercomputer and present performance results. In particular, we focus on important sparse matrix-vector multiplication kernels; for one of them, we automatically derive an algorithm used by one of the fastest available library routines.
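To make the terms above concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the paper's Fortran 77 transformation or its generated C90 code. It assumes compressed sparse row (CSR) storage, and every name in it (`rowptr`, `col`, `val`, `spmv_nested`, `spmv_flattened`, `seg_combine`, `blocked_inclusive_scan`) is hypothetical. `spmv_nested` is an irregular nest whose inner loop bounds are read from the indexed array `rowptr`. `spmv_flattened` replaces it with a single loop over all nonzeros followed by a segmented sum. The segmented sum lifts an associative operator to (flag, value) pairs, a standard construction that remains associative for any associative base operator, so the same scan code also computes, for example, a segmented `max`. The blocked scan only imitates how the flattened iterations could be split across processors; it runs sequentially here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[bool, T]  # (segment-start flag, value)


def seg_combine(op: Callable[[T, T], T]) -> Callable[[Pair, Pair], Pair]:
    """Lift an associative op to (flag, value) pairs. A set flag on the
    right operand starts a new segment, so the left value is dropped."""
    def combine(a: Pair, b: Pair) -> Pair:
        fa, va = a
        fb, vb = b
        return (fa or fb, vb if fb else op(va, vb))
    return combine


def blocked_inclusive_scan(xs: Sequence[T], op: Callable[[T, T], T],
                           nblocks: int) -> List[T]:
    """Two-phase inclusive scan, valid for any associative op. The blocks
    stand in for processors (hypothetical split; runs sequentially here)."""
    if not xs:
        return []
    size = -(-len(xs) // nblocks)  # ceiling division
    blocks = [list(xs[i:i + size]) for i in range(0, len(xs), size)]
    # Phase 1: independent local scans, one per block.
    for b in blocks:
        for j in range(1, len(b)):
            b[j] = op(b[j - 1], b[j])
    # Phase 2: fold each block's running total into the following blocks.
    out: List[T] = list(blocks[0])
    carry = blocks[0][-1]
    for b in blocks[1:]:
        out.extend(op(carry, v) for v in b)
        carry = op(carry, b[-1])
    return out


def spmv_nested(rowptr: List[int], col: List[int], val: List[float],
                x: List[float]) -> List[float]:
    """Irregular nest: the inner loop bounds are read from rowptr."""
    n = len(rowptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += val[k] * x[col[k]]
    return y


def spmv_flattened(rowptr: List[int], col: List[int], val: List[float],
                   x: List[float], nblocks: int = 4) -> List[float]:
    """Flattened form: one loop over all nonzeros, then a segmented sum."""
    n, nnz = len(rowptr) - 1, len(val)
    # Segment-start flags come from rowptr; an empty row starts no segment.
    flags = [False] * nnz
    for i in range(n):
        if rowptr[i] < rowptr[i + 1]:
            flags[rowptr[i]] = True
    # The flattened loop body has no cross-iteration dependence.
    prods = [val[k] * x[col[k]] for k in range(nnz)]
    scanned = blocked_inclusive_scan(list(zip(flags, prods)),
                                     seg_combine(lambda a, b: a + b), nblocks)
    # A row's sum is the value at its segment's last element; empty rows get 0.0.
    return [scanned[rowptr[i + 1] - 1][1] if rowptr[i] < rowptr[i + 1] else 0.0
            for i in range(n)]


if __name__ == "__main__":
    rowptr = [0, 2, 2, 5, 6]  # row 1 is empty
    col = [0, 2, 1, 2, 3, 0]
    val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [1.0, 2.0, 3.0, 4.0]
    print(spmv_nested(rowptr, col, val, x))     # [7.0, 0.0, 38.0, 6.0]
    print(spmv_flattened(rowptr, col, val, x))  # [7.0, 0.0, 38.0, 6.0]
```

Because the blocked scan reassociates additions across block boundaries, the flattened version can differ from the nested loop in the last bits of floating-point results for general data; the example values are small integers, so both orders are exact here.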

Cited By

  • (2023) Batchman and Robin: Batched and Non-batched Branching for Interactive ZK. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 1452-1466. DOI: 10.1145/3576915.3623169. Online publication date: 15-Nov-2023.
  • (2019) Mapping Imperfect Loops to Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35(7), pp. 1092-1104. DOI: 10.1109/TCAD.2015.2504918. Online publication date: 4-Jan-2019.
  • (2017) Automatic Collapsing of Non-Rectangular Loops. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 778-787. DOI: 10.1109/IPDPS.2017.34. Online publication date: May-2017.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 1995
Published in ACM SIGPLAN Notices, Volume 30, Issue 8

Article Metrics

  • Downloads (Last 12 months)155
  • Downloads (Last 6 weeks)30
Reflects downloads up to 26 Sep 2024

Citations

  • (2017) Source code transformations and optimizations. Embedded Computing for High Performance, pp. 137-183. DOI: 10.1016/B978-0-12-804189-5.00005-3. Online publication date: 2017.
  • (2023) Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics. Proceedings of the ACM on Programming Languages 7(OOPSLA2), pp. 173-203. DOI: 10.1145/3622803. Online publication date: 16-Oct-2023.
  • (2022) Loop rolling for code size reduction. Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, pp. 217-229. DOI: 10.1109/CGO53902.2022.9741256. Online publication date: 2-Apr-2022.
  • (2020) Supporting the Analysis of Safety Critical User Interfaces. ACM Transactions on Computer-Human Interaction 27(5), pp. 1-48. DOI: 10.1145/3404199. Online publication date: 3-Aug-2020.
  • (2020) Location-based Sentiment Analyses and Visualization of Twitter Election Data. Digital Government: Research and Practice 1(2), pp. 1-19. DOI: 10.1145/3339909. Online publication date: 9-Apr-2020.
  • (2018) vRAM: Faster Verifiable RAM with Program-Independent Preprocessing. 2018 IEEE Symposium on Security and Privacy (SP), pp. 908-925. DOI: 10.1109/SP.2018.00013. Online publication date: May-2018.
  • (2016) GPUnet. ACM Transactions on Computer Systems 34(3), pp. 1-31. DOI: 10.1145/2963098. Online publication date: 17-Sep-2016.