Hyperscalar: A novel dynamically reconfigurable multi-core architecture
JC Chiu, YL Chou, PK Chen - 2010 39th International …, 2010 - ieeexplore.ieee.org
2010 39th International Conference on Parallel Processing, 2010 • ieeexplore.ieee.org
This paper proposes a reconfigurable multi-core architecture, called hyperscalar, that enables many scalar cores to be united dynamically into a larger superscalar processor to accelerate a thread. To accomplish this, we propose virtual shared register files (VSRF), which allow the instructions of a thread executing on the united cores to logically see a uniform set of register files. We also propose an instruction analyzer (IA) that detects dependences and tags newly fetched instructions with the dependence information. According to the tags, instructions in the united cores can issue requests to obtain their remote operands via the VSRF. The reconfigurable nature of hyperscalar covers a wide spectrum of workloads, providing high single-thread performance when thread-level parallelism (TLP) is low and high throughput when TLP is high. Simulation results show that the 2-, 4-, and 8-core-united configurations of an 8-core hyperscalar chip multiprocessor achieve 94%, 90%, and 83% of the performance of monolithic 2-, 4-, and 8-issue out-of-order superscalar processors, respectively, with lower area costs and better support for software diversity.
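As a rough illustration of the dependence tagging described in the abstract, the Python sketch below marks each source operand of an instruction as coming from the same core, from another united core, or from the architectural register state. It is not the paper's design: the round-robin assignment of instructions to cores, the tag strings (`local`, `remote:<core>`, `regfile`), and the `Instr` type are all assumptions made for illustration, since the abstract does not say how the IA encodes its tags or how instructions are steered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]
    core: int = -1                                       # filled in by the analyzer
    tags: Dict[str, str] = field(default_factory=dict)   # source register -> tag


def tag_dependences(instrs: List[Instr], num_cores: int) -> List[Instr]:
    """Tag each source operand with where its producer ran.

    Assumption: instructions are steered to cores round-robin in fetch order.
    """
    last_writer: Dict[str, int] = {}  # register -> core of most recent producer
    for i, ins in enumerate(instrs):
        ins.core = i % num_cores
        for reg in ins.srcs:
            if reg not in last_writer:
                # No earlier producer in this window: read committed state.
                ins.tags[reg] = "regfile"
            elif last_writer[reg] == ins.core:
                ins.tags[reg] = "local"
            else:
                # Producer ran on another united core: operand must be
                # requested remotely (through the VSRF in the paper's terms).
                ins.tags[reg] = f"remote:{last_writer[reg]}"
        if ins.dest is not None:
            last_writer[ins.dest] = ins.core
    return instrs


# Two united cores, four instructions in fetch order.
program = [
    Instr("r1", ()),            # core 0, produces r1
    Instr("r2", ("r1",)),       # core 1, needs r1 from core 0
    Instr("r3", ("r1", "r2")),  # core 0, r1 is local, r2 comes from core 1
    Instr("r4", ("r5",)),       # core 1, r5 has no producer in this window
]
tagged = tag_dependences(program, num_cores=2)
for ins in tagged:
    print(ins.core, ins.dest, ins.tags)
```

In this toy model, only operands tagged `remote:<core>` would need a cross-core request, which hints at why performance relative to a monolithic superscalar might drop as more cores are united: a larger share of operands tends to be produced on a different core.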