DOI: 10.1145/1583991.1584050
Research article

A lightweight in-place implementation for software thread-level speculation

Published: 11 August 2009

Abstract

Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures that the parallel program's behaviour remains true to the language's original sequential semantics. For example, TLS can allow multiple iterations of a loop to run in parallel provided there are no conflicts between them.
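As a hedged illustration of what "no conflicts between them" means, the following sketch (not taken from the paper) checks a recorded per-iteration access trace for cross-iteration dependences. The function `cross_iteration_conflicts` and the trace format, one read set and one write set of array indices per iteration, are assumptions made for illustration. A real software-TLS system detects such conflicts dynamically while the iterations run, rather than from a complete trace after the fact.

```python
from typing import List, Set, Tuple

# Hypothetical trace format: for each loop iteration, the set of array
# indices it reads and the set of array indices it writes.
Access = Tuple[Set[int], Set[int]]  # (reads, writes)


def cross_iteration_conflicts(trace: List[Access]) -> List[Tuple[int, int, int]]:
    """Return (earlier, later, location) triples for pairs of iterations
    that could not safely run in parallel under sequential semantics.

    A pair conflicts when the later iteration reads or writes a location
    the earlier one writes (read-after-write, write-after-write), or writes
    a location the earlier one reads (write-after-read).
    """
    conflicts = []
    for i, (reads_i, writes_i) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            reads_j, writes_j = trace[j]
            shared = (writes_i & (reads_j | writes_j)) | (reads_i & writes_j)
            for loc in sorted(shared):
                conflicts.append((i, j, loc))
    return conflicts
```

For a loop such as `a[i] = a[i] * 2`, each iteration touches only its own index and no conflicts are reported. For `a[i] = a[i - 1] + 1`, each iteration reads the value written by its predecessor, so every adjacent pair of iterations conflicts.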
Conventional software-TLS algorithms detect conflicts dynamically, and they suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. They can offer disappointing scalability if threads can only commit speculative work back to the "real" heap sequentially. They can also be slow, either because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. compare-and-swap (CAS) or memory fences).
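The look-aside and serial-commit costs described above can be seen in a minimal sketch of a buffered (not in-place) design. It is a generic, single-threaded illustration rather than a model of any particular system, and the class `BufferedTask` and its methods are hypothetical. Each task keeps a private write buffer, a read may have to scan its own buffer and those of earlier tasks before reaching the heap, and `commit` has to be called in iteration order.

```python
from typing import Dict, List


class BufferedTask:
    """Hypothetical buffered speculative task: writes go to a private
    buffer instead of the real heap until the task commits."""

    def __init__(self, heap: List[int], earlier: List["BufferedTask"]):
        self.heap = heap
        self.earlier = earlier              # tasks for earlier iterations, oldest first
        self.buffer: Dict[int, int] = {}    # private copy of every written location

    def read(self, addr: int) -> int:
        # Look-aside: check this task's buffer, then earlier tasks' buffers
        # (newest first, so the latest earlier write wins), then the heap.
        if addr in self.buffer:
            return self.buffer[addr]
        for task in reversed(self.earlier):
            if addr in task.buffer:
                return task.buffer[addr]
        return self.heap[addr]

    def write(self, addr: int, value: int) -> None:
        self.buffer[addr] = value

    def commit(self) -> None:
        # Copy buffered writes back to the real heap. Tasks must commit in
        # iteration order, which serialises this step.
        for addr, value in self.buffer.items():
            self.heap[addr] = value
        self.buffer.clear()
```

Every speculative read pays for the buffer lookups even when no earlier task wrote the location, which is the overhead an in-place scheme tries to remove.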
We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. Speculative work is performed in place, so no buffering is needed and reads naturally see earlier writes. We avoid the need for a serial-commit protocol, and we avoid CAS and memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. For fully parallel loops, we achieve on average 77% of the speed-up of manually parallelized versions of the benchmarks. We achieve a maximum speed-up of 5.8x on an 8-core machine.
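In-place speculation can be contrasted with buffering using a minimal sketch that pairs direct writes with an undo log. This is an assumption about one common way to support roll-back, not the authors' implementation: the class `InPlaceTask` is hypothetical, the sketch is single-threaded, and it omits conflict detection and the memory-ordering concerns the paper addresses. Writes update the shared heap directly, so later reads see them without any look-aside table, and the log of overwritten values lets a mis-speculated task be undone.

```python
from typing import List, Tuple


class InPlaceTask:
    """Hypothetical in-place speculative task: writes update the shared
    heap directly, and an undo log records each overwritten value so the
    task can be rolled back if a conflict is detected."""

    def __init__(self, heap: List[int]):
        self.heap = heap
        self.undo_log: List[Tuple[int, int]] = []  # (address, old value)

    def read(self, addr: int) -> int:
        # No look-aside table: in-place writes are already in the heap.
        return self.heap[addr]

    def write(self, addr: int, value: int) -> None:
        # Save the old value before overwriting it in place.
        self.undo_log.append((addr, self.heap[addr]))
        self.heap[addr] = value

    def commit(self) -> None:
        # Nothing to copy back: the heap already holds the task's writes,
        # so the log is simply discarded.
        self.undo_log.clear()

    def roll_back(self) -> None:
        # Restore old values newest-first, so each address ends up holding
        # the value it had before the task's first write to it.
        for addr, old in reversed(self.undo_log):
            self.heap[addr] = old
        self.undo_log.clear()
```

Because commit only discards the log, it needs no copying step; the cost moves to roll-back, which is expected to be rare for mostly-parallel loops.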

Cited By

  • (2024) Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 80-93. DOI: 10.1145/3627535.3638493. Online publication date: 2-Mar-2024.
  • (2022) On the choice of the best chunk size for the speculative execution of loops. PLOS ONE 17(5): e0267602. DOI: 10.1371/journal.pone.0267602. Online publication date: 17-May-2022.
  • (2019) Processing transactions in a predefined order. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pp. 120-132. DOI: 10.1145/3293883.3295730. Online publication date: 16-Feb-2019.

Information

Published In

SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
August 2009
370 pages
ISBN:9781605586069
DOI:10.1145/1583991
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2009

Author Tags

  1. roll-back
  2. thread-level speculation (tls)

Qualifiers

  • Research-article

Conference

SPAA '09

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Feb 2025.
