Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on …, 2000•dl.acm.org
Abstract
High-performance microprocessors are currently designed to exploit the inherent instruction-level parallelism (ILP) available in most applications. The techniques used in their design, together with the aggressive scheduling techniques used to exploit this ILP, tend to increase the register requirements of loops. If more registers are required than the architecture provides, some action (such as spill code insertion) has to be taken to reduce this pressure, at the expense of some performance degradation. This degradation could be avoided if a high-capacity register file were included without a negative impact on the cycle time of the processor.
In this paper we propose a two-level hierarchical register file organization for VLIW architectures that combines high capacity and low access time. For the configuration proposed in this paper, the new organization achieves a speedup of 10–14% over a monolithic organization with 64 registers, together with a 43% reduction in area and a 40% reduction in peak power dissipation. Compared to a monolithic file with 32 registers, the speedup is as much as 38%, with just a 14% increase in area and a 4% increase in peak power dissipation.