
Open access

Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters

Authors: Pierre Michaud, Andrea Mondelli, André Seznec

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 3

Article No.: 28, Pages 1 - 22

https://doi.org/10.1145/2800787

Published: 31 August 2015


Abstract

During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this benefits only some applications and requires rewriting and/or recompiling them. A more general way to accelerate applications is to increase the IPC, the number of instructions executed per cycle. Although the focus of academic microarchitecture research has moved away from IPC techniques, the IPC of commercial processors has improved continuously during these years.
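The case for raising IPC follows from the standard relation between execution time $T$, dynamic instruction count $N$, cycle count $C$, and clock frequency $f$ (symbols introduced here for illustration). When $f$ is fixed and the binary is not rewritten or recompiled, so that $N$ is unchanged, the speedup is exactly the ratio of IPCs:

```latex
\mathrm{IPC} = \frac{N}{C}, \qquad
T = \frac{C}{f} = \frac{N}{\mathrm{IPC} \cdot f}, \qquad
\frac{T_{\text{old}}}{T_{\text{new}}} = \frac{\mathrm{IPC}_{\text{new}}}{\mathrm{IPC}_{\text{old}}} \quad (N,\ f \text{ fixed})
```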

We argue that some of the benefits of technology scaling should be used to raise the IPC of future superscalar cores further. Starting from microarchitecture parameters similar to those of recent commercial high-end cores, we show that an effective way to increase the IPC is to allow the out-of-order engine to issue more micro-ops per cycle. This must be done without lengthening the clock cycle. We propose combining two techniques: clustering and register write specialization. Past research on clustered microarchitectures focused on narrow-issue clusters, as the emphasis at that time was on allowing high clock frequencies.
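Register write specialization is not detailed in this abstract. The sketch below shows one common reading of the idea, stated as an assumption rather than as the paper's exact design: the physical registers are split into one subset per cluster, each cluster's execution units write only their own subset, and the renamer therefore allocates a micro-op's destination register from the subset of the cluster the micro-op is steered to. Each subset then needs write ports only for its own cluster's results. `WriteSpecializedRenamer`, `rename_dest`, `owner`, and `write_ports_per_subset` are hypothetical names, and register freeing is omitted.

```python
from collections import deque


class WriteSpecializedRenamer:
    """Hypothetical renamer with one physical-register subset per cluster.

    Subset c holds physical registers [c * regs_per_cluster, (c + 1) * regs_per_cluster)
    and, under the assumption above, is written only by cluster c's execution units.
    """

    def __init__(self, num_clusters: int, regs_per_cluster: int) -> None:
        self.regs_per_cluster = regs_per_cluster
        # One free list per subset, initially holding every register of that subset.
        self.free_lists = [
            deque(range(c * regs_per_cluster, (c + 1) * regs_per_cluster))
            for c in range(num_clusters)
        ]
        self.rat = {}  # architectural register name -> physical register number

    def owner(self, preg: int) -> int:
        """Cluster whose execution units are allowed to write preg."""
        return preg // self.regs_per_cluster

    def rename_dest(self, arch_reg: str, cluster: int) -> int:
        """Map arch_reg to a free register from the executing cluster's own subset."""
        if not self.free_lists[cluster]:
            # Hardware would presumably stall renaming here; this sketch just raises.
            raise RuntimeError(f"no free register in cluster {cluster}'s subset")
        preg = self.free_lists[cluster].popleft()
        self.rat[arch_reg] = preg
        return preg


def write_ports_per_subset(result_writes_per_cycle: int, num_clusters: int) -> int:
    """Write ports each subset needs if result writes are split evenly across clusters."""
    return result_writes_per_cycle // num_clusters
```

For example, under this assumption, two clusters that together write 8 results per cycle need only 4 write ports per subset instead of 8 on every register.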

Instead, in this study, we consider wide-issue clusters, with the goal of increasing the IPC at a constant clock frequency. We show that on a wide-issue dual-cluster core, a very simple steering policy that sends 64 consecutive instructions to the same cluster, the next 64 instructions to the other cluster, and so forth, permits the core to tolerate an intercluster delay of three cycles. We also propose a method for decreasing the energy cost of sending results from one cluster to the other.
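The steering policy described above can be sketched as follows, assuming each instruction (or micro-op) carries a dynamic sequence number in program order when it is steered. `steer` and `intercluster_fraction` are hypothetical helpers, not code from the paper; the second one counts how many producer-to-consumer dependences land in different clusters and would therefore pay the intercluster delay.

```python
def steer(seq: int, chunk: int = 64, num_clusters: int = 2) -> int:
    """Return the cluster for the instruction with dynamic sequence number seq.

    Instructions 0..chunk-1 go to cluster 0, the next chunk to cluster 1,
    and so on, wrapping around after num_clusters chunks.
    """
    return (seq // chunk) % num_clusters


def intercluster_fraction(deps, chunk: int = 64, num_clusters: int = 2) -> float:
    """Fraction of (producer_seq, consumer_seq) pairs steered to different clusters."""
    if not deps:
        return 0.0
    crossing = sum(
        1
        for producer, consumer in deps
        if steer(producer, chunk, num_clusters) != steer(consumer, chunk, num_clusters)
    )
    return crossing / len(deps)


if __name__ == "__main__":
    # Serial chain: instruction i + 1 consumes the result of instruction i.
    chain = [(i, i + 1) for i in range(255)]
    print(intercluster_fraction(chain))           # 3 of 255 dependences cross
    print(intercluster_fraction(chain, chunk=1))  # 1.0: every dependence crosses
```

On this serial chain of 256 instructions, 64-instruction chunks place only 3 of the 255 dependences across clusters, whereas alternating clusters on every instruction (`chunk=1`) makes every dependence cross. This is only an intuition for why coarse-grained steering keeps most short-range communication inside one cluster; it does not replace the paper's cycle-level evaluation.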

Supplementary Material

TACO1203-28 (taco1203-28.pdf): slide deck associated with this paper (1.05 MB)

