Techniques for speculative run-time parallelization of loops

Authors: Manish Gupta and Rahul Nim

SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

November 1998

Pages 1 - 12

Published: 07 November 1998

Abstract

This paper presents a set of new run-time tests for speculative parallelization of loops that defy parallelization based on static analysis alone. It presents a novel method for speculative array privatization that is not only more efficient than previous methods when the speculation is correct, but also does not require rolling back the computation if the variable is found not to be privatizable. We present another method for speculative parallelization that can overcome all loop-carried anti and output dependences, with even lower overhead than previous techniques that could not break such dependences. To reduce the heavy penalty paid for speculatively parallelizing loops that turn out to be serial, we also present a technique that enables early detection of loop-carried dependences. Our experimental results from a preliminary implementation of these tests on an IBM G30 SMP machine show a significant reduction in the penalty paid for mis-speculation, from roughly 50% to between 2% and 18% of the serial execution time. For parallel loops, we obtain about the same, and often even better, performance relative to the previous methods, making our techniques extremely attractive.
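As a rough illustration of what a run-time test of this kind has to decide, the sketch below classifies a loop from a recorded access trace. It is a generic, conservative access-marking check and not the tests proposed in the paper; the function name `classify_loop`, the trace format, and the three labels are assumptions made for this example. Unlike the methods described in the abstract, it reports a loop as serial whenever a read could observe another iteration's write, so it does not break loop-carried anti dependences.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace format: trace[i] is the ordered list of accesses made by
# iteration i to one shared array, each access being ("r" or "w", element index).
Access = Tuple[str, int]


def classify_loop(trace: List[List[Access]]) -> str:
    """Return "doall", "privatizable", or "serial" for the traced loop."""
    writers: Dict[int, Set[int]] = {}  # element -> iterations that write it
    readers: Dict[int, Set[int]] = {}  # element -> iterations that read it
    exposed: Dict[int, Set[int]] = {}  # element -> iterations reading it before writing it

    for it, accesses in enumerate(trace):
        written_here: Set[int] = set()
        for kind, idx in accesses:
            if kind == "w":
                written_here.add(idx)
                writers.setdefault(idx, set()).add(it)
            else:
                readers.setdefault(idx, set()).add(it)
                if idx not in written_here:
                    exposed.setdefault(idx, set()).add(it)

    needs_privatization = False
    for idx, w_its in writers.items():
        e_its = exposed.get(idx, set())
        # A read not covered by a write in its own iteration, on an element some
        # other iteration writes, may carry a value across iterations.
        if any(w != e for w in w_its for e in e_its):
            return "serial"
        # Several iterations touch a written element, but every read is covered
        # by an earlier write in the same iteration: private copies suffice.
        if len(w_its | readers.get(idx, set())) > 1:
            needs_privatization = True

    return "privatizable" if needs_privatization else "doall"
```

Under these assumptions, a loop in which every iteration writes and then reads the same temporary element is reported as `privatizable`, while a loop in which one iteration reads an element that a different iteration writes is reported as `serial`.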

Index Terms

Techniques for speculative run-time parallelization of loops
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language features
        Control structures
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Published In

SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing
November 1998
894 pages
ISBN: 089791984X
General Chairs: Richard Brent, Dennis Duke
Sponsors: SIGARCH (ACM Special Interest Group on Computer Architecture), IEEE-CS (Computer Society)
Publisher: IEEE Computer Society, United States

Conference

SC '98: International Conference for High Performance Computing, Networking, Storage and Analysis
November 7 - 13, 1998
San Jose, CA
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Cited By

Qiu J, Jiang L, Zhao Z, Larus J, Ceze L, Strauss K (2020). Challenging Sequential Bitstream Processing via Principled Bitwise Speculation. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 607-621. DOI: 10.1145/3373376.3378461. Online publication date: 9-Mar-2020.
https://dl.acm.org/doi/10.1145/3373376.3378461
Estebanez A, Llanos D, Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys, 49:2, pages 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016.
https://dl.acm.org/doi/10.1145/2938369
Murphy N, Jones T, Mullins R, Campanoni S, Zaks A, Hermenegildo M (2016). Performance implications of transient loop-carried data dependences in automatically parallelized loops. Proceedings of the 25th International Conference on Compiler Construction, pages 23-33. DOI: 10.1145/2892208.2892214. Online publication date: 17-Mar-2016.
https://dl.acm.org/doi/10.1145/2892208.2892214
Estebanez A, Llanos D, Gonzalez-Escribano A (2016). New Data Structures to Handle Speculative Parallelization at Runtime. International Journal of Parallel Programming, 44:3, pages 407-426. DOI: 10.1007/s10766-014-0347-0. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.1007/s10766-014-0347-0
Yiapanis P, Brown G, Luján M (2015). Compiler-Driven Software Speculation for Thread-Level Parallelism. ACM Transactions on Programming Languages and Systems, 38:2, pages 1-45. DOI: 10.1145/2821505. Online publication date: 22-Dec-2015.
https://dl.acm.org/doi/10.1145/2821505
Luo Y, Zhai A (2012). Dynamically dispatching speculative threads to improve sequential execution. ACM Transactions on Architecture and Code Optimization, 9:3, pages 1-31. DOI: 10.1145/2355585.2355586. Online publication date: 5-Oct-2012.
https://dl.acm.org/doi/10.1145/2355585.2355586
Österlund E, Löwe W, Zhang L, Mutlu O (2012). Analysis of pure methods using garbage collection. Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 48-57. DOI: 10.1145/2247684.2247694. Online publication date: 16-Jun-2012.
https://dl.acm.org/doi/10.1145/2247684.2247694
Monteiro P, Monteiro M, Pingali K, Hvatum L (2011). Parallelizing irregular algorithms. Proceedings of the 18th Conference on Pattern Languages of Programs, pages 1-18. DOI: 10.1145/2578903.2579141. Online publication date: 21-Oct-2011.
https://dl.acm.org/doi/10.1145/2578903.2579141
Tian C, Lin C, Feng M, Gupta R (2011). Enhanced speculative parallelization via incremental recovery. ACM SIGPLAN Notices, 46:8, pages 189-200. DOI: 10.1145/2038037.1941580. Online publication date: 12-Feb-2011.
https://dl.acm.org/doi/10.1145/2038037.1941580
García-Yágüez Á, Llanos D, González-Escribano A, Maccabe A, Thain D (2011). Exclusive squashing for thread-level speculation. Proceedings of the 20th international symposium on High performance distributed computing, pages 275-276. DOI: 10.1145/1996130.1996172. Online publication date: 8-Jun-2011.
https://dl.acm.org/doi/10.1145/1996130.1996172
