Run-Time Parallelization and Scheduling of Loops

Published: 01 May 1991

Abstract

The authors study run-time methods to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution-time preprocessing of the loop. At compile time, these methods set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. The authors utilize symbolic transformation rules to produce inspector procedures, which perform execution-time preprocessing, and executors, which are transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. The authors present performance results from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
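The inspector/executor split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`inspector`, `build_wavefronts`, `executor`), the `deps` representation, and the sample loop body are all assumptions made for illustration, and the sketch assumes each iteration depends only on earlier iterations. The inspector assigns every iteration a wavefront number at run time, the iterations are grouped by wavefront, and the executor runs the loop body wavefront by wavefront.

```python
def inspector(n, deps):
    """Assign a wavefront number to each of n iterations.

    deps[i] lists the earlier iterations (j < i) whose results iteration i
    reads. This dependence format is an assumption of the sketch.
    """
    wave = [0] * n
    for i in range(n):
        for j in deps[i]:
            # Iteration i must run at least one wavefront after each producer.
            wave[i] = max(wave[i], wave[j] + 1)
    return wave


def build_wavefronts(wave):
    """Group iteration indexes by wavefront number (the reordered schedule)."""
    fronts = [[] for _ in range(max(wave, default=-1) + 1)]
    for i, w in enumerate(wave):
        fronts[w].append(i)
    return fronts


def executor(fronts, body):
    """Run the loop body wavefront by wavefront.

    Iterations within one wavefront are mutually independent, so a parallel
    runtime could distribute them across processors; here they run serially.
    """
    for front in fronts:
        for i in front:
            body(i)


# Hypothetical loop: x[i] = 1 + sum of x[j] over the iterations j it reads.
deps = [[], [], [0], [1], [2, 3], [0]]
x = [0.0] * len(deps)


def body(i):
    x[i] = 1.0 + sum(x[j] for j in deps[i])


wave = inspector(len(deps), deps)
fronts = build_wavefronts(wave)
executor(fronts, body)
```

For this dependence pattern the inspector produces three wavefronts, `[0, 1]`, `[2, 3, 5]`, and `[4]`, so iteration 5 is moved ahead of iteration 4 without changing the values computed by the original sequential loop.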

Published In

IEEE Transactions on Computers, Volume 40, Issue 5
May 1991
99 pages
ISSN: 0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. Encore Multimax
  2. automatic parallelization
  3. automatic scheduling
  4. compile-time information
  5. concurrently executable loop iterations
  6. do loop
  7. execution time preprocessing
  8. executors
  9. inspector procedures
  10. loop dependency analysis
  11. loop indexes
  12. parallel programming
  13. run-time methods
  14. run-time reordering
  15. scheduling
  16. source code loop structures
  17. symbolic transformation rules
  18. transformed versions
  19. wavefronts
