
Dynamically Trading Frequency for Complexity in a GALS Microprocessor

Authors:

Michael L. Scott

MICRO 37: Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Pages 157 - 168

https://doi.org/10.1109/MICRO.2004.18

Published: 04 December 2004

Abstract

Microprocessors are traditionally designed to provide "best overall" performance across a wide range of applications and operating environments. Several groups have proposed hardware techniques that save energy by "downsizing" hardware resources that are underutilized by the current application phase. Others have proposed a different energy-saving approach: dividing the processor into domains and dynamically changing the clock frequency and voltage within each domain during phases when the full domain frequency is not required. What has not been studied to date is how to exploit the adaptive nature of these approaches to improve performance rather than to save energy. In this paper, we describe an adaptive globally asynchronous, locally synchronous (GALS) microprocessor with a fixed global voltage and four independently clocked domains. Each domain is streamlined with modest hardware structures for very high clock frequency. Key structures can then be upsized on demand to exploit more distant parallelism, improve branch prediction, or increase cache capacity. Although doing so requires decreasing the associated domain frequency, other domain frequencies are unaffected. Our approach, therefore, is to maximize the throughput of each domain by finding the proper balance between the number of clock periods and the clock frequency for each application phase. To achieve this objective, we use novel hardware-based control techniques that accurately and efficiently capture the performance of all possible cache and queue configurations within a single interval, without having to resort to exhaustive online exploration or expensive offline profiling. Measuring across a broad suite of application benchmarks, we find that configuring our adaptive GALS processor just once per application yields 17.6% better performance, on average, than that of the "best overall" fully synchronous design. By adapting automatically to application phases, we can increase this advantage to more than 20%.
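The trade-off between the number of clock periods and the clock frequency can be illustrated with a small sketch. It is not the paper's hardware control mechanism. It only assumes that the time a domain spends on a phase is its cycle count divided by its clock frequency, and picks the candidate configuration with the smallest time. The `Config` type, the configuration names, and every number below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate structure size for a domain (all values hypothetical)."""
    name: str
    freq_ghz: float    # assumed domain clock frequency at this structure size
    est_cycles: float  # assumed cycles needed for the phase at this size


def exec_time_ns(cfg: Config) -> float:
    # cycles / (cycles per nanosecond) = nanoseconds
    return cfg.est_cycles / cfg.freq_ghz


def best_config(configs: list[Config]) -> Config:
    # Lowest execution time is equivalent to highest throughput for the phase.
    return min(configs, key=exec_time_ns)


# Larger structures need fewer cycles but force a slower domain clock.
candidates = [
    Config("small", freq_ghz=1.6, est_cycles=1_200_000),
    Config("medium", freq_ghz=1.4, est_cycles=1_000_000),
    Config("large", freq_ghz=1.2, est_cycles=950_000),
]

choice = best_config(candidates)
print(choice.name, round(exec_time_ns(choice)))
```

With these made-up numbers the middle configuration wins: upsizing from "small" saves enough cycles to pay for the slower clock, while upsizing again to "large" does not.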

