DOI: 10.5555/509058.509070

Techniques for speculative run-time parallelization of loops

Published: 07 November 1998

    Abstract

    This paper presents a set of new run-time tests for speculative parallelization of loops that defy parallelization based on static analysis alone. It presents a novel method for speculative array privatization that is not only more efficient than previous methods when the speculation is correct, but also does not require rolling back the computation if the variable is found not to be privatizable. We present another method for speculative parallelization that can overcome all loop-carried anti and output dependences, with even lower overhead than previous techniques that could not break such dependences. To reduce the heavy penalty paid for speculatively parallelizing loops that turn out to be serial, we also present a technique that enables early detection of loop-carried dependences. Our experimental results from a preliminary implementation of these tests on an IBM G30 SMP machine show a significant reduction in the penalty paid for mis-speculation, from roughly 50% to between 2% and 18% of the serial execution time. For parallel loops, we obtain about the same, and often even better, performance relative to the previous methods, making our techniques extremely attractive.
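The abstract does not spell out the tests themselves, so the following is only an illustrative sketch of the general idea behind a run-time dependence test with early detection, not the authors' algorithm. It assumes a hypothetical access trace per iteration and a single shadow array (`last_writer`); the marking is simulated sequentially rather than run on multiple processors. Only loop-carried flow dependences (a read of a value written by an earlier iteration) are reported, since anti and output dependences can in principle be removed by giving each iteration its own copy of the data.

```python
from typing import Sequence, Tuple

# An access is (kind, index): kind is "r" for a read or "w" for a write.
# This trace format is a hypothetical stand-in for instrumented loop code.
Access = Tuple[str, int]


def find_flow_dependence(trace: Sequence[Sequence[Access]], size: int) -> int:
    """Return the first iteration that reads an element written by an
    earlier iteration (a loop-carried flow dependence), or -1 if none.

    trace[i] lists the array accesses of iteration i in program order.
    """
    # Shadow array: the last iteration that wrote each element (-1 = never).
    last_writer = [-1] * size
    for it, accesses in enumerate(trace):
        for kind, idx in accesses:
            if kind == "r":
                writer = last_writer[idx]
                if writer != -1 and writer != it:
                    # Early detection: stop as soon as the dependence is seen,
                    # instead of finishing the whole speculative pass.
                    return it
            else:
                last_writer[idx] = it
    return -1
```

For instance, `find_flow_dependence([[("w", 0)], [("r", 0), ("w", 1)]], 2)` returns 1, because iteration 1 reads element 0 after iteration 0 wrote it. A trace in which later iterations only overwrite or write elements that earlier iterations read returns -1, which corresponds to the anti and output dependences that the abstract says can be overcome.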

    Published In

    SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing
    November 1998
    894 pages
    ISBN: 089791984X

    Publisher

    IEEE Computer Society

    United States

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Cited By

    • (2020) Challenging Sequential Bitstream Processing via Principled Bitwise Speculation. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 607-621. DOI: 10.1145/3373376.3378461. Online publication date: 9-Mar-2020.
    • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys, 49:2, pages 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016.
    • (2016) Performance implications of transient loop-carried data dependences in automatically parallelized loops. Proceedings of the 25th International Conference on Compiler Construction, pages 23-33. DOI: 10.1145/2892208.2892214. Online publication date: 17-Mar-2016.
    • (2016) New Data Structures to Handle Speculative Parallelization at Runtime. International Journal of Parallel Programming, 44:3, pages 407-426. DOI: 10.1007/s10766-014-0347-0. Online publication date: 1-Jun-2016.
    • (2015) Compiler-Driven Software Speculation for Thread-Level Parallelism. ACM Transactions on Programming Languages and Systems, 38:2, pages 1-45. DOI: 10.1145/2821505. Online publication date: 22-Dec-2015.
    • (2012) Dynamically dispatching speculative threads to improve sequential execution. ACM Transactions on Architecture and Code Optimization, 9:3, pages 1-31. DOI: 10.1145/2355585.2355586. Online publication date: 5-Oct-2012.
    • (2012) Analysis of pure methods using garbage collection. Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 48-57. DOI: 10.1145/2247684.2247694. Online publication date: 16-Jun-2012.
    • (2011) Parallelizing irregular algorithms. Proceedings of the 18th Conference on Pattern Languages of Programs, pages 1-18. DOI: 10.1145/2578903.2579141. Online publication date: 21-Oct-2011.
    • (2011) Enhanced speculative parallelization via incremental recovery. ACM SIGPLAN Notices, 46:8, pages 189-200. DOI: 10.1145/2038037.1941580. Online publication date: 12-Feb-2011.
    • (2011) Exclusive squashing for thread-level speculation. Proceedings of the 20th international symposium on High performance distributed computing, pages 275-276. DOI: 10.1145/1996130.1996172. Online publication date: 8-Jun-2011.
