research-article

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Published: 01 February 1999
    Abstract

    Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because the access patterns of these loops are complex or cannot be fully determined statically. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine whether it had any cross-iteration dependences; if the test fails, the loop is re-executed serially. Since, in our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition that goes beyond syntactic pattern matching: it detects at run-time whether the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks, which substantiate our claim that these techniques can yield significant speedups that are often superior to those obtainable by inspector/executor methods.
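
    The speculate, test, and fall back cycle described in the abstract can be illustrated with a much-simplified sketch. It is not the LRPD test as the paper defines it: it omits privatization and reduction recognition, simulates the doall sequentially, and treats any element written by two iterations as a failure. The loop shape (`a[writes[i]] = f(a[reads[i]])`), the function name `speculative_doall`, and the dictionary-based shadow structures are assumptions made for illustration only.

```python
def speculative_doall(a, reads, writes, f):
    """Hypothetical sketch: run `a[writes[i]] = f(a[reads[i]])` speculatively.

    Returns (result, ran_as_doall). Assumes len(reads) == len(writes).
    """
    n = len(writes)
    checkpoint = list(a)   # saved pre-loop state, used for reads and rollback
    spec = list(a)         # speculative result
    writer = {}            # shadow: element -> iteration that wrote it
    readers = {}           # shadow: element -> set of iterations that read it
    conflict = False

    # Speculative phase. This sequential loop stands in for a parallel doall:
    # every iteration reads the pre-loop state, as independent iterations would.
    for i in range(n):
        r, w = reads[i], writes[i]
        readers.setdefault(r, set()).add(i)
        if w in writer:
            conflict = True          # two iterations wrote the same element
        writer[w] = i
        spec[w] = f(checkpoint[r])

    # Test phase: an element written by one iteration and read by a different
    # one is a cross-iteration dependence, so the speculation is invalid.
    for elem, wi in writer.items():
        if readers.get(elem, set()) - {wi}:
            conflict = True

    if not conflict:
        return spec, True

    # Fallback: discard the speculative state and re-execute serially.
    serial = list(checkpoint)
    for i in range(n):
        serial[writes[i]] = f(serial[reads[i]])
    return serial, False


if __name__ == "__main__":
    add10 = lambda x: x + 10
    # Independent iterations: the speculative result is kept.
    print(speculative_doall([1, 2, 3, 4], [0, 1], [2, 3], add10))
    # Iteration 1 reads what iteration 0 wrote: the loop is re-run serially.
    print(speculative_doall([1, 2, 3, 4], [0, 2], [2, 3], add10))
```

    In this sketch the shadow structures are inspected only after the speculative phase has finished, which mirrors the abstract's description of applying the dependence test to a loop that has already been executed as a doall. Extending the test so that a loop passes when a privatization or reduction transformation removes the conflict is the subject of the paper itself and is not attempted here.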

    Information

    Published In

    IEEE Transactions on Parallel and Distributed Systems  Volume 10, Issue 2
    February 1999
    96 pages
    ISSN:1045-9219

    Publisher

    IEEE Press

    Author Tags

    1. Compilers
    2. DOALL
    3. parallel processing
    4. privatization
    5. reduction
    6. run-time
    7. speculative

    Cited By

    • (2024) PROMPT: A Fast and Extensible Memory Profiling Framework. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (449-473). DOI: 10.1145/3649827. Online publication date: 29-Apr-2024
    • (2024) Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (80-93). DOI: 10.1145/3627535.3638493. Online publication date: 2-Mar-2024
    • (2022) Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS. ACM Transactions on Reconfigurable Technology and Systems 15:3 (1-31). DOI: 10.1145/3501801. Online publication date: 4-Feb-2022
    • (2022) Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs. Languages and Compilers for Parallel Computing (3-21). DOI: 10.1007/978-3-031-31445-2_1. Online publication date: 12-Oct-2022
    • (2021) Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (820-834). DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
    • (2021) On the automatic parallelization of subscripted subscript patterns using array property analysis. Proceedings of the 35th ACM International Conference on Supercomputing (392-403). DOI: 10.1145/3447818.3460424. Online publication date: 3-Jun-2021
    • (2021) Scalable FSM parallelization via path fusion and higher-order speculation. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (887-901). DOI: 10.1145/3445814.3446705. Online publication date: 19-Apr-2021
    • (2021) Simplifying dependent reductions in the polyhedral model. Proceedings of the ACM on Programming Languages 5:POPL (1-33). DOI: 10.1145/3434301. Online publication date: 4-Jan-2021
    • (2020) Challenging Sequential Bitstream Processing via Principled Bitwise Speculation. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (607-621). DOI: 10.1145/3373376.3378461. Online publication date: 9-Mar-2020
    • (2020) Perspective. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (351-367). DOI: 10.1145/3373376.3378458. Online publication date: 9-Mar-2020
