DOI: 10.1109/MICRO.2004.18
Article

Dynamically Trading Frequency for Complexity in a GALS Microprocessor

Published: 04 December 2004

Abstract

Microprocessors are traditionally designed to provide "best overall" performance across a wide range of applications and operating environments. Several groups have proposed hardware techniques that save energy by "downsizing" hardware resources that are underutilized by the current application phase. Others have proposed a different energy-saving approach: dividing the processor into domains and dynamically changing the clock frequency and voltage within each domain during phases when the full domain frequency is not required. What has not been studied to date is how to exploit the adaptive nature of these approaches to improve performance rather than to save energy. In this paper, we describe an adaptive globally asynchronous, locally synchronous (GALS) microprocessor with a fixed global voltage and four independently clocked domains. Each domain is streamlined with modest hardware structures for very high clock frequency. Key structures can then be upsized on demand to exploit more distant parallelism, improve branch prediction, or increase cache capacity. Although doing so requires decreasing the associated domain frequency, other domain frequencies are unaffected. Our approach, therefore, is to maximize the throughput of each domain by finding the proper balance between the number of clock periods and the clock frequency for each application phase. To achieve this objective, we use novel hardware-based control techniques that accurately and efficiently capture the performance of all possible cache and queue configurations within a single interval, without having to resort to exhaustive online exploration or expensive offline profiling. Measuring across a broad suite of application benchmarks, we find that configuring our adaptive GALS processor just once per application yields 17.6% better performance, on average, than that of the "best overall" fully synchronous design. By adapting automatically to application phases, we can increase this advantage to more than 20%.
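The balance the abstract describes, between the number of clock periods and the clock frequency, can be illustrated with a small sketch. This is not the paper's hardware control mechanism; it only shows the objective, assuming that a phase's run time in one domain is its clock periods divided by that domain's frequency. The `Config` type, the function names, and every number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Config:
    """One hypothetical size/frequency pairing for a single clock domain."""
    name: str
    freq_ghz: float  # domain clock frequency at this structure size (assumed)
    cycles: int      # clock periods the phase needs at this size (assumed)


def run_time_ns(cfg: Config) -> float:
    """Run time of the phase: clock periods divided by frequency (cycles / GHz = ns)."""
    return cfg.cycles / cfg.freq_ghz


def best_config(configs: Sequence[Config]) -> Config:
    """Return the configuration with the shortest run time for this phase."""
    return min(configs, key=run_time_ns)


# Illustrative numbers only; they are not taken from the paper.
# Upsizing lowers the frequency but can reduce the cycle count.
CACHE_BOUND_PHASE = [
    Config("small", freq_ghz=2.0, cycles=1_000_000),
    Config("medium", freq_ghz=1.8, cycles=800_000),
    Config("large", freq_ghz=1.5, cycles=750_000),
]
COMPUTE_BOUND_PHASE = [
    Config("small", freq_ghz=2.0, cycles=600_000),
    Config("medium", freq_ghz=1.8, cycles=595_000),
    Config("large", freq_ghz=1.5, cycles=590_000),
]

if __name__ == "__main__":
    phases = [("cache-bound", CACHE_BOUND_PHASE), ("compute-bound", COMPUTE_BOUND_PHASE)]
    for label, phase in phases:
        choice = best_config(phase)
        print(f"{label}: {choice.name} ({run_time_ns(choice):.0f} ns)")
```

With these made-up numbers, the phase that benefits from extra capacity selects the medium structure, while the phase that barely benefits stays with the small, fastest configuration. Upsizing pays off only when the saving in clock periods outweighs the loss in frequency.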

Published In

MICRO 37: Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
December 2004
345 pages
ISBN: 0769521266

Publisher

IEEE Computer Society
United States

Conference

MICRO37

Acceptance Rates

MICRO 37 Paper Acceptance Rate: 29 of 158 submissions, 18%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%

Cited By

• (2015) Dynamic MIPS Rate Stabilization for Complex Processors. ACM Transactions on Architecture and Code Optimization 12:1 (1-25). DOI: 10.1145/2714575. Online publication date: 2-Apr-2015
• (2011) A phase adaptive cache hierarchy for SMT processors. Microprocessors & Microsystems 35:8 (683-694). DOI: 10.1016/j.micpro.2011.08.008. Online publication date: 1-Nov-2011
• (2010) Simulating a LAGS processor to consider variable latency on L1 D-Cache. Proceedings of the 2010 Summer Computer Simulation Conference (56-63). DOI: 10.5555/1999416.1999421. Online publication date: 11-Jul-2010
• (2009) Improving SMT performance. Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers (2029-2034). DOI: 10.1145/1570256.1570271. Online publication date: 8-Jul-2009
• (2009) Dynamic MIPS rate stabilization in out-of-order processors. ACM SIGARCH Computer Architecture News 37:3 (46-56). DOI: 10.1145/1555815.1555763. Online publication date: 20-Jun-2009
• (2009) Dynamic MIPS rate stabilization in out-of-order processors. Proceedings of the 36th annual international symposium on Computer architecture (46-56). DOI: 10.1145/1555754.1555763. Online publication date: 20-Jun-2009
• (2007) Dynamic capacity-speed tradeoffs in SMT processor caches. Proceedings of the 2nd international conference on High performance embedded architectures and compilers (136-150). DOI: 10.5555/1762146.1762160. Online publication date: 28-Jan-2007
• (2007) Architectural contesting. ACM SIGARCH Computer Architecture News 35:3 (28-35). DOI: 10.1145/1294313.1294321. Online publication date: 1-Jun-2007
