Thread-Level Speculation on a CMP can be energy efficient

Authors:

Smruti Sarangi,

Josep Torrellas

ICS '05: Proceedings of the 19th annual international conference on Supercomputing

Pages 219 - 228

https://doi.org/10.1145/1088149.1088178

Published: 20 June 2005

Abstract

Chip Multiprocessors (CMPs) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute this claim. To do so, we first identify the main sources of dynamic energy consumption in TLS. Then, we present simple energy-saving optimizations that cut the energy cost of TLS by over 60% on average with minimal performance impact. The resulting TLS CMP, populated with four 3-issue cores, speeds up full SPECint 2000 codes by a factor of 1.27 on average, while keeping the fraction of the chip's energy consumption due to TLS to only 20%. Compared to a 6-issue superscalar at the same frequency, the TLS CMP is on average faster, while consuming only 85% of the superscalar's total on-chip power.
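
As a rough illustration of how the abstract's figures relate, the sketch below back-computes the share of chip energy that TLS might have accounted for before the optimizations. It is not taken from the paper: it treats "over 60%" as exactly 60% (a lower bound), assumes the non-TLS energy is unchanged by the optimizations, and uses a hypothetical helper name, `tls_fraction_before`.

```python
def tls_fraction_before(frac_after: float, cut: float) -> float:
    """Estimate the TLS share of chip energy before the optimizations.

    frac_after: TLS share of total chip energy after optimization
                (0.20 in the abstract).
    cut:        fractional reduction in TLS energy (0.60 here, taken as a
                lower bound for the abstract's "over 60%").
    Assumption: the non-TLS energy is the same before and after.
    """
    # Normalize the optimized chip's total energy to 1.0.
    non_tls = 1.0 - frac_after
    # Undo the reduction to recover the unoptimized TLS energy.
    tls_before = frac_after / (1.0 - cut)
    return tls_before / (non_tls + tls_before)


if __name__ == "__main__":
    share = tls_fraction_before(0.20, 0.60)
    print(f"Estimated TLS share before optimization: {share:.1%}")
```

Under these assumptions the unoptimized TLS share comes out at roughly 38% of chip energy; a larger actual reduction would imply a larger starting share.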

Published In

ICS '05: Proceedings of the 19th annual international conference on Supercomputing

June 2005

414 pages

ISBN:1595931678

DOI:10.1145/1088149

General Chair: Arvind (MIT)

Program Chair: Larry Rudolph (MIT)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICS05: International Conference on Supercomputing 2005

Sponsor: SIGARCH

June 20 - 22, 2005

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
