research-article

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Authors:

Lawrence Rauchwerger and

David A. Padua

IEEE Transactions on Parallel and Distributed Systems, Volume 10, Issue 2

Pages 160 - 180

https://doi.org/10.1109/71.752782

Published: 01 February 1999

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because these loops have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is reexecuted serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition that goes beyond syntactic pattern matching: it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks, which substantiate our claim that these techniques can yield significant speedups that are often superior to those obtainable by inspector/executor methods.
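The speculate-then-test idea described in the abstract can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the names `TracedArray` and `run_speculatively` are hypothetical, iterations are simulated one after another instead of running on real processors, reduction recognition is omitted, and the dependence check is deliberately conservative. Each iteration works on a private view of the array, the accesses are recorded, and the loop is re-executed serially if any element written by one iteration is read (before being written) by a different iteration.

```python
class TracedArray:
    """Per-iteration private view of a shared array that records accesses.

    Hypothetical helper for illustration only.
    """

    def __init__(self, shared):
        self._shared = shared
        self.private = {}           # index -> value written by this iteration
        self.exposed_reads = set()  # indices read before any write here

    def __getitem__(self, i):
        if i in self.private:
            return self.private[i]  # read of a privately written value
        self.exposed_reads.add(i)   # value comes from outside the iteration
        return self._shared[i]

    def __setitem__(self, i, value):
        self.private[i] = value     # writes never touch the shared array


def run_speculatively(body, n_iters, data):
    """Run body(i, array) for i in range(n_iters) as if it were a doall.

    Returns True if speculation succeeded, False if the loop had to be
    re-executed serially. In both cases `data` holds the serial result.
    """
    # Speculative phase: every iteration sees the original data plus its
    # own writes, so the iterations could run in any order.
    views = []
    for it in range(n_iters):
        view = TracedArray(data)
        body(it, view)
        views.append(view)

    # Dependence test (conservative): collect the writers of each element.
    writers = {}
    for it, view in enumerate(views):
        for i in view.private:
            writers.setdefault(i, set()).add(it)

    for it, view in enumerate(views):
        for i in view.exposed_reads:
            if writers.get(i, set()) - {it}:
                # Cross-iteration dependence: discard the speculative
                # results and fall back to serial execution.
                for k in range(n_iters):
                    body(k, data)
                return False

    # Test passed: commit private writes in iteration order, so the last
    # writing iteration supplies the final value of each element.
    for view in views:
        for i, value in view.private.items():
            data[i] = value
    return True


def scale(i, a):
    a[i] = a[i] * 2            # fully independent iterations


def prefix_sum(i, a):
    a[i + 1] = a[i] + a[i + 1]  # flow dependence between iterations


def scratch(i, a):
    a[0] = i                   # a[0] is written before it is read,
    a[i + 1] = a[0] * 10       # so it behaves like a privatizable temporary
```

In this sketch, `scale` and `scratch` pass the test (the latter only because every iteration writes `a[0]` before reading it, which is the situation privatization exploits), while `prefix_sum` fails it and is re-executed serially. Because speculative writes go to private storage, no separate checkpoint of the array is needed before the fallback.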


Index Terms

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        1. Parallel programming languages


Published In

IEEE Transactions on Parallel and Distributed Systems Volume 10, Issue 2

February 1999

96 pages

ISSN:1045-9219

Editor:
John A. Stankovic
Univ. of Virginia, Charlottesville

Copyright © 1999 IEEE. All Rights Reserved.

Publisher

IEEE Press

Qualifiers

Research-article

Bibliometrics

Total Citations: 99
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Cited By

Xu Z, Chon Y, Su Y, Tan Z, Apostolakis S, Campanoni S, August D (2024). PROMPT: A Fast and Extensible Memory Profiling Framework. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (449-473). DOI: 10.1145/3649827. Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649827
Bhosale A, Eigenmann R, Lee I, Chabbi M, Steuwer M (2024). Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (80-93). DOI: 10.1145/3627535.3638493. Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638493
Dewald F, Rohde J, Hochberger C, Mantel H (2022). Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS. ACM Transactions on Reconfigurable Technology and Systems 15:3 (1-31). DOI: 10.1145/3501801. Online publication date: 4-Feb-2022
https://dl.acm.org/doi/10.1145/3501801
Rolinger T, Krieger C, Sussman A (2022). Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs. Languages and Compilers for Parallel Computing (3-21). DOI: 10.1007/978-3-031-31445-2_1. Online publication date: 12-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-31445-2_1
Morihata A, Sato S, Freund S, Yahav E (2021). Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (820-834). DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3454079
Bhosale A, Eigenmann R, Zhou H, Moreira J, Mueller F, Etsion Y (2021). On the automatic parallelization of subscripted subscript patterns using array property analysis. Proceedings of the 35th ACM International Conference on Supercomputing (392-403). DOI: 10.1145/3447818.3460424. Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3460424
Qiu J, Sun X, Sabet A, Zhao Z, Sherwood T, Berger E, Kozyrakis C (2021). Scalable FSM parallelization via path fusion and higher-order speculation. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (887-901). DOI: 10.1145/3445814.3446705. Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3445814.3446705
Yang C, Atkinson E, Carbin M (2021). Simplifying dependent reductions in the polyhedral model. Proceedings of the ACM on Programming Languages 5:POPL (1-33). DOI: 10.1145/3434301. Online publication date: 4-Jan-2021
https://dl.acm.org/doi/10.1145/3434301
Qiu J, Jiang L, Zhao Z, Larus J, Ceze L, Strauss K (2020). Challenging Sequential Bitstream Processing via Principled Bitwise Speculation. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (607-621). DOI: 10.1145/3373376.3378461. Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378461
Apostolakis S, Xu Z, Chan G, Campanoni S, August D, Larus J, Ceze L, Strauss K (2020). Perspective. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (351-367). DOI: 10.1145/3373376.3378458. Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378458
