A study on polymorphing superscalar processor dynamically to improve power efficiency

S Srinivasan, R Rodrigues, A Annamalai… - 2013 IEEE Computer …, 2013 - ieeexplore.ieee.org
2013 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2013
Asymmetric Multicore Processors (AMPs) have emerged as likely candidates to solve the performance/power conundrum in the current generation of processors. Most recent work in this area evaluates such multicores by considering large (usually out-of-order (OOO)) and small (usually in-order (InO)) cores on the same chip. Dynamic online swapping of threads between these cores is then facilitated whenever deemed beneficial. However, if threads are swapped too often, the overheads may outweigh the benefits of swapping. Hence, in most recent work, thread swapping decisions are made at coarse-grained instruction granularities, leaving many opportunities unexploited. In this paper, we propose a scheme to mitigate the penalty imposed by thread swapping and yet achieve all the benefits of AMPs. Here, a single superscalar OOO core morphs itself into an InO core at runtime, whenever this is determined to be performance/Watt efficient. Certain Intel processors already have a similar mechanism to statically morph an OOO core into an InO core to facilitate debugging. We extend this existing capability to perform dynamic core morphing at runtime with the orthogonal objective of improving power efficiency. Results indicate that, on average, performance/Watt benefits of 10% can be extracted by our proposed morphing scheme at a very small performance penalty of 3.8%. Since this scheme is based on existing mechanisms readily available in current microprocessors, it incurs no hardware overheads.
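As a rough reading of the two headline figures, the sketch below works out the power ratio they would jointly imply. It is a back-of-envelope calculation, not a result from the paper: it assumes both averages are measured against the same OOO baseline and that they can be combined directly, which is only approximately true for figures averaged across benchmarks. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(perf_per_watt_gain: float, perf_penalty: float) -> float:
    """Return (morphed power / baseline power) implied by the two ratios.

    Assumption: both inputs are relative to the same baseline, so that
    performance/Watt = performance / power holds for the relative values.
    """
    perf_ratio = 1.0 - perf_penalty        # relative performance after morphing
    ppw_ratio = 1.0 + perf_per_watt_gain   # relative performance/Watt after morphing
    # perf/Watt = perf / power  =>  power = perf / (perf/Watt)
    return perf_ratio / ppw_ratio


# Figures quoted in the abstract: 10% performance/Watt gain, 3.8% performance penalty.
ratio = implied_power_ratio(0.10, 0.038)
print(f"implied power ratio: {ratio:.3f}")  # prints "implied power ratio: 0.875"
```

Under these assumptions the quoted figures would correspond to roughly a 12.5% reduction in average power, which is indicative only.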