research-article

Run-Time Parallelization and Scheduling of Loops

Authors:

Ravi Mirchandaney,

Kay Crowley

IEEE Transactions on Computers, Volume 40, Issue 5

Pages 603 - 612

https://doi.org/10.1109/12.88484

Published: 01 May 1991

Abstract

The authors study run-time methods to automatically parallelize and schedule the iterations of a do loop in certain cases where compile-time information is inadequate. The methods involve execution-time preprocessing of the loop. At compile time, they set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified, and this wavefront information is used to reorder loop iterations for increased parallelism. The authors use symbolic transformation rules to produce inspector procedures, which perform the execution-time preprocessing, and executors, which are transformed versions of the source-code loop structures. The executors carry out the calculations planned by the inspector procedures. The authors present performance results from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
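The inspector/executor split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' generated code: the names `inspector`, `executor`, `deps`, and `b` are hypothetical, the loop body is an illustrative stand-in, and the sketch assumes every dependence points to an earlier iteration. The inspector groups iterations into wavefronts from dependence data that is only known at run time. The executor then walks the wavefronts in order. Here each wavefront is run serially, although its iterations are mutually independent and could be dispatched concurrently.

```python
from collections import defaultdict


def inspector(deps):
    """Group loop iterations into wavefronts.

    deps[i] lists the earlier iterations (j < i) whose results iteration i
    reads. This dependence data is assumed to be known only at run time.
    """
    wave = [0] * len(deps)
    for i, preds in enumerate(deps):
        for j in preds:
            if j >= i:
                # Assumption of this sketch: no dependences on later iterations.
                raise ValueError("dependence must point to an earlier iteration")
            # An iteration runs one wavefront after its latest predecessor.
            wave[i] = max(wave[i], wave[j] + 1)
    fronts = defaultdict(list)
    for i, w in enumerate(wave):
        fronts[w].append(i)
    return [fronts[w] for w in sorted(fronts)]


def executor(fronts, deps, b):
    """Execute the reordered loop one wavefront at a time."""
    x = list(b)
    for front in fronts:
        # Iterations inside one wavefront do not depend on each other,
        # so this inner loop is the part that could run in parallel.
        for i in front:
            x[i] = b[i] + sum(x[j] for j in deps[i])
    return x


deps = [[], [], [0], [0, 1], [2], []]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fronts = inspector(deps)
x = executor(fronts, deps, b)
```

For this input the inspector places iterations 0, 1, and 5 in the first wavefront, 2 and 3 in the second, and 4 in the third, and the executor produces the same values as running the loop in its original order.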

Cited By

Park Y, Min S, Lee J (2022). Ginex. Proceedings of the VLDB Endowment 15:11 (2626-2639). doi:10.14778/3551793.3551819. Online publication date: 29-Sep-2022
https://dl.acm.org/doi/10.14778/3551793.3551819
Rolinger T, Krieger C, Sussman A (2022). Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs. Languages and Compilers for Parallel Computing (3-21). doi:10.1007/978-3-031-31445-2_1. Online publication date: 12-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-31445-2_1
Gundabolu S, Vijaykumar T, Thottethodi M, de Supinski B, Hall M, Gamblin T (2021). FastZ. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-13). doi:10.1145/3458817.3476202. Online publication date: 14-Nov-2021
https://dl.acm.org/doi/10.1145/3458817.3476202

Index Terms

Run-Time Parallelization and Scheduling of Loops
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
  2. Parallel computing methodologies
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language features
        Recursion

Information & Contributors

Information

Published In

IEEE Transactions on Computers Volume 40, Issue 5

May 1991

99 pages

ISSN:0018-9340

Editor:
Earl Swartzlander
Univ. of Texas at Austin, Austin

Copyright © 1991 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 109
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Oct 2024

Citations

Cited By

Wang Q, Zheng L, Zhao J, Liao X, Jin H, Xue J (2020). A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs. ACM Transactions on Architecture and Code Optimization 17:2 (1-26). doi:10.1145/3390523. Online publication date: 29-May-2020
https://dl.acm.org/doi/10.1145/3390523
Kobeissi S, Ketterlin A, Clauss P (2020). Rec2Poly: Converting Recursions to Polyhedral Optimized Loops Using an Inspector-Executor Strategy. Embedded Computer Systems: Architectures, Modeling, and Simulation (96-109). doi:10.1007/978-3-030-60939-9_7. Online publication date: 5-Jul-2020
https://dl.acm.org/doi/10.1007/978-3-030-60939-9_7
Hückelheim J, Hovland P, Strout M, Müller J (2019). Reverse-mode algorithmic differentiation of an OpenMP-parallel compressible flow solver. International Journal of High Performance Computing Applications 33:1 (140-154). doi:10.1177/1094342017712060. Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1177/1094342017712060
Bak S, Guo Y, Balaji P, Sarkar V (2019). Optimized Execution of Parallel Loops via User-Defined Scheduling Policies. Proceedings of the 48th International Conference on Parallel Processing (1-10). doi:10.1145/3337821.3337913. Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3337821.3337913
Mohammadi M, Yuki T, Cheshmi K, Davis E, Hall M, Dehnavi M, Nandy P, Olschanowsky C, Venkat A, Strout M, McKinley K, Fisher K (2019). Sparse computation data dependence simplification for efficient compiler-generated inspectors. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (594-609). doi:10.1145/3314221.3314646. Online publication date: 8-Jun-2019
https://dl.acm.org/doi/10.1145/3314221.3314646
Luporini F, Lange M, Jacobs C, Gorman G, Ramanujam J, Kelly P (2019). Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling. ACM Transactions on Mathematical Software 45:2 (1-30). doi:10.1145/3302256. Online publication date: 3-May-2019
https://dl.acm.org/doi/10.1145/3302256
Venkat A, Mohammadi M, Park J, Rong H, Barik R, Strout M, Hall M, West J (2016). Automating wavefront parallelization for sparse matrix computations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). doi:10.5555/3014904.3014959. Online publication date: 13-Nov-2016
https://dl.acm.org/doi/10.5555/3014904.3014959