research-article

A lightweight in-place implementation for software thread-level speculation

Authors:

Cosmin E. Oancea,

Tim Harris

SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures

Pages 223 - 232

https://doi.org/10.1145/1583991.1584050

Published: 11 August 2009

Abstract

Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures that the parallel program's behaviour remains true to the language's original sequential semantics; for example, it allows multiple iterations of a loop to run in parallel if there are no conflicts between them.
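To make the notion of a conflict between loop iterations concrete, here is a minimal Python sketch. It assumes each iteration is summarised by the set of locations it reads and the set it writes. The name `has_conflict` and this set-based representation are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (reads, writes) performed by one loop iteration; a hypothetical summary format
Access = Tuple[Set[int], Set[int]]


def has_conflict(iterations: List[Access]) -> bool:
    """Return True if two iterations touch the same location and at least one writes it."""
    for i, (reads_i, writes_i) in enumerate(iterations):
        for reads_j, writes_j in iterations[i + 1:]:
            if writes_i & (reads_j | writes_j) or writes_j & reads_i:
                return True
    return False


# a[i] = a[i] + 1: every iteration touches only its own element
independent = [({i}, {i}) for i in range(4)]
# a[i] = a[i - 1] + 1: iteration i reads what iteration i - 1 wrote
chained = [({i - 1}, {i}) for i in range(1, 5)]

print(has_conflict(independent), has_conflict(chained))
```

In this sketch the first loop could run its iterations in parallel, while the second could not without violating sequential semantics. A TLS system discovers this at run time rather than from precomputed sets.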

Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).
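The buffering and look-aside costs described above can be illustrated with a small sketch of a generic buffered design. The class `BufferedIteration` and its methods are hypothetical. They model no specific system from the literature, and the sketch is single-threaded for clarity.

```python
class BufferedIteration:
    """One speculative iteration that buffers its writes (hypothetical model)."""

    def __init__(self, heap):
        self.heap = heap          # the "real" heap, untouched until commit
        self.write_buffer = {}    # look-aside table: location -> speculative value
        self.lookups = 0          # counts the table lookups that reads pay for

    def read(self, loc):
        self.lookups += 1         # every read must consult the look-aside table
        if loc in self.write_buffer:
            return self.write_buffer[loc]
        return self.heap[loc]

    def write(self, loc, value):
        self.write_buffer[loc] = value   # storage overhead grows with each new location

    def commit(self):
        # In a buffered design, commits are typically applied one iteration at a time.
        for loc, value in self.write_buffer.items():
            self.heap[loc] = value


heap = [10, 20]
spec = BufferedIteration(heap)
spec.write(0, 11)
seen_by_iteration = spec.read(0)   # served from the buffer
heap_before_commit = heap[0]       # the real heap is still unchanged
spec.commit()
```

The sketch shows the three costs in miniature: the extra dictionary per iteration, the lookup on every read, and the commit step that copies buffered values back to the heap.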

We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. Speculative work is performed in place, which avoids buffering and lets reads naturally see earlier writes. We avoid needing a serial-commit protocol, and we avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. For fully parallel loops, we achieve on average 77% of the speed-up of manually parallelized versions of the benchmarks. We achieve a maximum speed-up of 5.8x on an 8-core machine.
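The abstract does not give the algorithm's details, so the following is only a simplified sketch of the general idea of in-place speculation, not the authors' method. It assumes per-location metadata that records the highest iteration to have read or written each location, plus an undo log for rollback. It simulates out-of-order execution in a single thread, so it says nothing about the CAS-free and fence-free properties claimed above. All names (`InPlaceSpec`, `Conflict`, `run_speculatively`) are hypothetical.

```python
class Conflict(Exception):
    """Raised when an access would violate sequential order."""


class InPlaceSpec:
    def __init__(self, data):
        self.data = data          # the real heap, updated in place
        self.last_reader = {}     # location -> highest iteration that read it
        self.last_writer = {}     # location -> highest iteration that wrote it
        self.undo_log = []        # (location, old value) pairs for rollback

    def read(self, it, loc):
        # A later iteration already overwrote this location in place.
        if self.last_writer.get(loc, -1) > it:
            raise Conflict(loc)
        self.last_reader[loc] = max(self.last_reader.get(loc, -1), it)
        return self.data[loc]     # earlier writes are visible with no lookup table

    def write(self, it, loc, value):
        # A later iteration already read or wrote this location.
        if max(self.last_reader.get(loc, -1), self.last_writer.get(loc, -1)) > it:
            raise Conflict(loc)
        self.undo_log.append((loc, self.data[loc]))
        self.last_writer[loc] = it
        self.data[loc] = value

    def rollback(self):
        for loc, old in reversed(self.undo_log):
            self.data[loc] = old
        self.undo_log.clear()
        self.last_reader.clear()
        self.last_writer.clear()


def run_speculatively(data, body, n, order):
    """Run iterations in `order`; on conflict, undo and rerun sequentially."""
    spec = InPlaceSpec(data)
    try:
        for it in order:
            body(spec, it)
        return True
    except Conflict:
        spec.rollback()
        sequential = InPlaceSpec(data)
        for it in range(n):       # in-order execution cannot raise Conflict
            body(sequential, it)
        return False


def increment(spec, it):          # a[i] = a[i] + 1
    spec.write(it, it, spec.read(it, it) + 1)


def prefix_chain(spec, it):       # a[i] = a[i - 1] + 1
    if it > 0:
        spec.write(it, it, spec.read(it, it - 1) + 1)


a = [0, 0, 0, 0]
ok_a = run_speculatively(a, increment, 4, order=[2, 0, 3, 1])
b = [0, 0, 0, 0]
ok_b = run_speculatively(b, prefix_chain, 4, order=[3, 2, 1, 0])
```

In this sketch the independent loop completes speculatively, while the dependent loop is detected, rolled back, and rerun in order. Because writes land directly in `data`, no commit step is needed when speculation succeeds; the price is the undo log and the per-location metadata.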

Index Terms

A lightweight in-place implementation for software thread-level speculation
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages


Published In

SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures

August 2009

370 pages

ISBN: 9781605586069

DOI: 10.1145/1583991

General Chair: Friedhelm Meyer auf der Heide (University of Paderborn, Germany)

Program Chair: Michael A. Bender (Stony Brook University and Tokutek, Inc., USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SPAA '09: 21st ACM Symposium on Parallelism in Algorithms and Architectures

August 11 - 13, 2009

Calgary, AB, Canada

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

