Energy efficient co-adaptive instruction fetch and issue

Published: 01 May 2003

Abstract

Front-end instruction delivery accounts for a significant fraction of the energy consumed in a dynamic superscalar processor. The issue queue in these processors serves two crucial roles: it bridges the front and back ends of the processor, and it serves as the window of instructions for the out-of-order engine. Two mismatches, one between the front-end producer rate and the back-end consumer rate, and one between the instruction window supplied by the front end and the window required to exploit the application's available parallelism, result in additional front-end energy and increased issue queue utilization. The former increases overall processor energy consumption; the latter aggravates the issue queue hot spot problem.

We propose a complementary combination of fetch gating and issue queue adaptation to address both of these issues. We introduce an issue-centric fetch gating scheme based on issue queue utilization and application parallelism characteristics. Our scheme attempts to provide an instruction window size that matches the current parallelism characteristics of the application while maintaining enough queue entries to avoid back-end starvation. Compared to a conventional fetch gating scheme based on flow-rate matching, we demonstrate 20% better overall energy-delay with a 44% additional reduction in issue queue energy. We identify Icache energy savings as the largest contributor to the overall savings and quantify the sources of savings in this structure. We then couple this issue-driven fetch gating approach with an issue queue adaptation scheme based on queue utilization. While the fetch gating scheme provides a window of issue queue instructions appropriate to the level of program parallelism, the issue queue adaptation approach shuts down the remaining underutilized issue queue entries. Used in tandem, these complementary techniques yield 20% greater issue queue energy savings than the sum of the savings from each technique applied in isolation. The combined approach achieves a 6% overall energy-delay savings coupled with a 54% reduction in issue queue energy.
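
The abstract does not give the decision rules themselves, so the following is only a minimal sketch of how the two mechanisms might interact, not the authors' implementation. Every name and constant here is hypothetical: the `IssueQueue` fields, the 32-entry capacity, the 8-entry bank granularity, and the `min_window` and `headroom` parameters are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IssueQueue:
    # All sizes below are illustrative assumptions, not values from the paper.
    capacity: int = 32   # total physical entries
    bank_size: int = 8   # entries that are powered on or off together
    active: int = 32     # entries currently powered on
    occupancy: int = 0   # entries currently holding instructions


def should_gate_fetch(queue, recent_issue_rate, min_window=4, headroom=2.0):
    """Return True when fetch should be stalled this cycle.

    Hypothetical rule: the desired window scales with the recently observed
    issue rate (a stand-in for application parallelism), but never drops
    below min_window so the back end is not starved.
    """
    target_window = max(min_window, int(headroom * recent_issue_rate))
    # The window can never exceed the entries that are powered on.
    limit = min(target_window, queue.active)
    return queue.occupancy >= limit


def resize_queue(queue, peak_occupancy):
    """Power down banks that the last interval's peak occupancy did not need.

    Hypothetical rule: keep the smallest whole number of banks that covers
    the peak, with at least one bank always on.
    """
    banks_needed = max(1, -(-peak_occupancy // queue.bank_size))  # ceiling division
    queue.active = min(queue.capacity, banks_needed * queue.bank_size)
    return queue.active
```

Under these assumptions, a queue holding 10 instructions with a recent issue rate of 1 gates fetch, since the target window is only 4 entries, and resizing for a peak occupancy of 10 keeps two 8-entry banks (16 entries) powered on. The sketch illustrates the division of labor described above: gating bounds how many instructions enter the queue, and resizing turns off the entries that gating leaves unused.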

Published In

ACM SIGARCH Computer Architecture News, Volume 31, Issue 2
ISCA 2003
May 2003
422 pages
ISSN: 0163-5964
DOI: 10.1145/871656

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture
June 2003
432 pages
ISBN: 0769519458
DOI: 10.1145/859618
Conference Chair: Allan Gottlieb
Program Chair: Kai Li

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2015) Customizable Computing. Synthesis Lectures on Computer Architecture 10:3 (1-118). DOI: 10.2200/S00650ED1V01Y201505CAC033. Online publication date: 6-Jul-2015
  • (2008) Power optimization of embedded real-time systems and their adaptability. Automatic Control and Computer Sciences 42:3 (153-162). DOI: 10.3103/S0146411608030073. Online publication date: 27-Jul-2008
  • (2008) Energy reduction of the fetch mechanism through dynamic adaptation. IET Computers & Digital Techniques 2:2 (94). DOI: 10.1049/iet-cdt:20060179. Online publication date: 2008
  • (2007) Exploiting Operand Availability for Efficient Simultaneous Multithreading. IEEE Transactions on Computers 56:2 (208-223). DOI: 10.1109/TC.2007.28. Online publication date: 1-Feb-2007
  • (2006) Control Speculation for Energy-Efficient Next-Generation Superscalar Processors. IEEE Transactions on Computers 55:3 (281-291). DOI: 10.1109/TC.2006.32. Online publication date: 1-Mar-2006
  • (2005) Energy-aware fetch mechanism. Proceedings of the 2005 international symposium on Low power electronics and design (42-47). DOI: 10.1145/1077603.1077615. Online publication date: 8-Aug-2005
  • (2021) Extending Performance-Energy Trade-offs Via Dynamic Core Scaling. IEEE Transactions on Computers 70:11 (1875-1886). DOI: 10.1109/TC.2020.3029306. Online publication date: 1-Nov-2021
  • (2020) Recurrent Attention Network with Reinforced Generator for Visual Dialog. ACM Transactions on Multimedia Computing, Communications, and Applications 16:3 (1-16). DOI: 10.1145/3390891. Online publication date: 5-Jul-2020
  • (2020) Multi-View Graph Matching for 3D Model Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 16:3 (1-20). DOI: 10.1145/3387920. Online publication date: 5-Jul-2020
  • (2020) Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 16:3 (1-23). DOI: 10.1145/3387164. Online publication date: 14-Jul-2020
