
Combining thread level speculation, helper threads, and runahead execution

Authors:

Polychronis Xekalakis,

Nikolas Ioannou,

Marcelo Cintra

ICS '09: Proceedings of the 23rd international conference on Supercomputing

Pages 410 - 420

https://doi.org/10.1145/1542275.1542333

Published: 08 June 2009


Abstract

With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), but, instead, via multithreaded execution. Generating thread-parallel programs is hard, and thread-level speculation (TLS) has been suggested as an execution model that can speculatively exploit thread-level parallelism (TLP) even when thread independence cannot be guaranteed by the programmer/compiler. Alternatively, the helper threads (HT) execution model has been proposed, where subordinate threads are executed in parallel with a main thread in order to improve the execution efficiency (i.e., ILP) of the latter. Yet another execution model, runahead execution (RA), has also been proposed, where subordinate versions of the main thread are dynamically created especially to cope with long-latency operations, again with the aim of improving the execution efficiency of the main thread.

Each one of these multithreaded execution models works best for different applications and application phases. In this paper we combine these three models into a single execution model and a single hardware infrastructure such that the system can dynamically adapt to find the most appropriate multithreaded execution model. More specifically, TLS is favored whenever successful parallel execution of instructions in multiple threads (i.e., TLP) is possible, and the system can seamlessly transition at run-time to the other models otherwise. In order to understand the tradeoffs involved, we also develop a performance model that allows one to quantitatively attribute overall performance gains to either TLP or ILP in such a combined multithreaded execution model.
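
One way to picture attributing an overall gain to ILP and TLP is a multiplicative split of the speedup. The sketch below is illustrative only and is not the performance model developed in the paper: the function name, the three timing inputs, and the log-based shares are all assumptions made for this example. It supposes one can measure the sequential time, a hypothetical time in which the same work runs serially but at the improved per-thread efficiency, and the combined multithreaded time.

```python
import math


def attribute_speedup(t_seq, t_serialized, t_combined):
    """Split an overall speedup into an ILP factor and a TLP factor.

    Hypothetical inputs (assumptions, not taken from the paper):
      t_seq        -- execution time of the sequential baseline
      t_serialized -- time if all threads' work ran back to back, but at
                      the improved per-thread efficiency (ILP effect only)
      t_combined   -- execution time under the combined multithreaded model
    """
    if min(t_seq, t_serialized, t_combined) <= 0:
        raise ValueError("all times must be positive")

    s_ilp = t_seq / t_serialized       # gain from faster individual threads
    s_tlp = t_serialized / t_combined  # gain from overlapping threads
    s_total = t_seq / t_combined       # equals s_ilp * s_tlp by construction

    # Because the factors multiply, their logarithms add, so each factor's
    # share of the total gain can be expressed as a fraction of log(s_total).
    log_total = math.log(s_total)
    if log_total == 0.0:
        ilp_share = tlp_share = 0.0    # no net gain to attribute
    else:
        ilp_share = math.log(s_ilp) / log_total
        tlp_share = math.log(s_tlp) / log_total

    return {
        "s_ilp": s_ilp,
        "s_tlp": s_tlp,
        "s_total": s_total,
        "ilp_share": ilp_share,
        "tlp_share": tlp_share,
    }


if __name__ == "__main__":
    # Made-up timings purely for illustration.
    print(attribute_speedup(t_seq=100.0, t_serialized=80.0, t_combined=50.0))
```

With these made-up timings the ILP factor is 1.25 and the TLP factor is 1.6, which multiply to the overall speedup of 2.0. A share can be negative when one factor is below 1, for example when speculation overheads slow individual threads down while overlap still yields a net gain.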

Experimental results show that our unified execution model achieves speedups of up to 41.2%, with an average of 10.2%, over an existing state-of-the-art TLS system and speedups of up to 35.2%, with an average of 18.3%, over a flavor of runahead execution for a subset of the SPEC2000 Int benchmark suite.
