DOI: 10.1145/1088149.1088178

Thread-Level Speculation on a CMP can be energy efficient

Published: 20 June 2005

    Abstract

    Chip Multiprocessors (CMPs) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute this claim. To do so, we first identify the main sources of dynamic energy consumption in TLS. Then, we present simple energy-saving optimizations that cut the energy cost of TLS by over 60% on average with minimal performance impact. The resulting TLS CMP, populated with four 3-issue cores, speeds up full SPECint 2000 codes by a factor of 1.27 on average, while keeping the fraction of the chip's energy consumption due to TLS to only 20%. Compared to a 6-issue superscalar at the same frequency, the TLS CMP is on average faster, while consuming only 85% of the superscalar's total on-chip power.
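
The power and speed comparison against the 6-issue superscalar can be turned into a rough energy comparison, since energy is power multiplied by execution time. The sketch below is only an illustration of that arithmetic, not a calculation from the paper: the 0.85 power ratio comes from the abstract, but the abstract does not state how much faster the TLS CMP is than the superscalar, so the speedup values swept here are hypothetical, and the function name `relative_energy` is an assumption made for this example.

```python
def relative_energy(power_ratio: float, speedup: float) -> float:
    """Energy of design A relative to design B.

    power_ratio: A's average power divided by B's average power.
    speedup:     B's execution time divided by A's execution time.
    Since energy = power * time, the ratio is power_ratio / speedup.
    """
    if power_ratio <= 0 or speedup <= 0:
        raise ValueError("power_ratio and speedup must be positive")
    return power_ratio / speedup


# 0.85 is the power ratio quoted in the abstract.
POWER_RATIO = 0.85

# Hypothetical speedups of the TLS CMP over the superscalar.
# The abstract only says "on average faster", i.e. speedup > 1.
for speedup in (1.0, 1.1, 1.25):
    ratio = relative_energy(POWER_RATIO, speedup)
    print(f"speedup {speedup:.2f} -> relative energy {ratio:.3f}")
```

Under these assumptions, any speedup above 1.0 puts the TLS CMP's total energy below 85% of the superscalar's, which is consistent with the abstract's claim that TLS can be energy efficient.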



    Published In

    ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing
    June 2005
    414 pages
    ISBN: 1595931678
    DOI: 10.1145/1088149
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ICS05: International Conference on Supercomputing 2005
    June 20-22, 2005
    Cambridge, Massachusetts

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%


    Cited By

    • (2023) Accelerating RTL Simulation with Hardware-Software Co-Design. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 153-166. DOI: 10.1145/3613424.3614257. Online publication date: 28-Oct-2023.
    • (2022) A scalable architecture for reprioritizing ordered parallelism. Proceedings of the 49th Annual International Symposium on Computer Architecture, pages 437-453. DOI: 10.1145/3470496.3527387. Online publication date: 18-Jun-2022.
    • (2020) Chronos: Efficient Speculative Parallelism for Accelerators. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 1247-1262. DOI: 10.1145/3373376.3378454. Online publication date: 9-Mar-2020.
    • (2020) T4. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, pages 159-172. DOI: 10.1109/ISCA45697.2020.00024. Online publication date: 30-May-2020.
    • (2018) Harmonizing speculative and non-speculative execution in architectures for ordered parallelism. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pages 217-230. DOI: 10.1109/MICRO.2018.00026. Online publication date: 20-Oct-2018.
    • (2017) SAM: Optimizing Multithreaded Cores for Speculative Parallelism. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 64-78. DOI: 10.1109/PACT.2017.37. Online publication date: Sep-2017.
    • (2016) Data-centric execution of speculative parallel programs. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1-13. DOI: 10.5555/3195638.3195644. Online publication date: 15-Oct-2016.
    • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2, pages 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016.
    • (2016) Data-centric execution of speculative parallel programs. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1-13. DOI: 10.1109/MICRO.2016.7783708. Online publication date: Oct-2016.
    • (2015) A scalable architecture for ordered parallelism. Proceedings of the 48th International Symposium on Microarchitecture, pages 228-241. DOI: 10.1145/2830772.2830777. Online publication date: 5-Dec-2015.
