# A Low-Power Register File with Dual-V<sub>t</sub> Dynamic Bit-Lines driven by CMOS Bootstrapped Circuit

Hyoung-Wook Lee, Hyunjoong Lee, Jong-Kwan Woo, Woo-Yeol Shin, and Suhwan Kim

Abstract—Recent CMOS technology scaling has seriously eroded the bit-line noise immunity of register files because of the resulting increase in active bit-line leakage currents. To restore noise immunity while maintaining performance, we propose and evaluate a $256 \times 40$-bit register file incorporating dual-V<sub>t</sub> bit-lines with a boosted gate overdrive voltage in 65 nm bulk CMOS technology. Simulation results show that the proposed bootstrapping scheme lowers leakage current by a factor of 450 without a performance penalty.

*Index Terms*—Leakage, sub-threshold, dual-threshold, register file, deep sub-micron

### I. INTRODUCTION

Register files are performance-critical memory components in general-purpose microprocessors. They usually require multiple read/write ports so that several execution units in a superscalar architecture can access them simultaneously. This requirement, coupled with the demand for a large number of word entries per port, forces the use of wired-OR style dynamic circuits for their local and global bit-lines [1].

As CMOS technology continues to scale, the supply voltage ($V_{DD}$) falls with each generation, which reduces power consumption. The transistor threshold voltage ($V_t$) must then be reduced at the same rate to maintain adequate gate overdrive, which in turn allows circuit performance to improve by about 30% per generation. Lower $V_t$, however, causes transistor sub-threshold leakage currents to increase exponentially; hence the active bit-line leakage currents of register files also increase exponentially, and the noise immunity of the bit-lines is dramatically reduced. Alternative bit-line circuit techniques are required to curtail this trend and combine good noise immunity with high performance [2].
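The exponential dependence can be illustrated with the standard first-order sub-threshold model, $I_{sub} = I_0 \cdot 10^{-V_t/S}$ at $V_{GS} = 0$, where $S$ is the sub-threshold swing. This is a sketch only: $I_0$ and $S = 100$ mV/decade are hypothetical illustration values, not data from this work, and the function names are ours.

```python
def subthreshold_leakage(v_t, i_0=1e-6, swing=0.1):
    """Off-state leakage (A) at V_GS = 0 from the first-order model
    I = i_0 * 10**(-v_t / swing).

    i_0 (A) and swing (V/decade) are hypothetical illustration values.
    """
    return i_0 * 10.0 ** (-v_t / swing)


def leakage_ratio(v_t_old, v_t_new, swing=0.1):
    """Factor by which leakage grows when V_t drops from v_t_old to v_t_new."""
    return subthreshold_leakage(v_t_new, swing=swing) / subthreshold_leakage(v_t_old, swing=swing)


# At an assumed 100 mV/decade, every 100 mV of V_t reduction costs 10x leakage.
for dv in (0.05, 0.10, 0.20):
    print(f"V_t lowered by {dv * 1000:.0f} mV -> {leakage_ratio(0.40, 0.40 - dv):.1f}x leakage")
```

The reference current cancels in the ratio, so only the assumed swing matters for how fast leakage grows as $V_t$ is scaled down.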

## **II. REGISTER FILE ORGANIZATION**

Fig. 1 shows the architecture of the $256 \times 40$-bit register file with four read ports and two write ports. A complete read operation takes two clock cycles. In the first cycle, the 8-bit read/write address for each port is decoded and the resulting read/write select signals are delivered to the register file array. The decoder is not timing-critical and is therefore implemented in conventional static CMOS. In the second cycle, the performance-critical bit-line read operation is performed.

**Fig. 1.** Organization of a $256 \times 40$-bit register file with four read- and two write-ports.

Manuscript received May 15, 2009; revised Aug. 13, 2009.

<sup>\*</sup> School of Electrical Engineering, Seoul National University, San 56-1, Shinlim-dong, Kwanak-gu, Seoul, 151-742, Korea E-mail: Suhwan@snu.ac.kr
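Functionally, the first-cycle decoder maps each port's 8-bit address onto one of $2^8 = 256$ one-hot select lines. The behavioral sketch below models only that logic, not the static CMOS gate structure; the function names are hypothetical.

```python
def decode_address(addr, n_bits=8):
    """One-hot decode: return 2**n_bits select signals with only `addr` high."""
    n_words = 1 << n_bits
    if not 0 <= addr < n_words:
        raise ValueError(f"address {addr} is out of range for {n_bits} bits")
    return [1 if i == addr else 0 for i in range(n_words)]


def decode_ports(addresses, n_bits=8):
    """Decode one address per port; each port drives its own select lines."""
    return [decode_address(a, n_bits) for a in addresses]


# Four read ports, two of which happen to address the same entry.
read_selects = decode_ports([0, 17, 255, 17])
print([len(s) for s in read_selects], [s.index(1) for s in read_selects])
# [256, 256, 256, 256] [0, 17, 255, 17]
```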

Fig. 2 shows the four full-swing local bit-lines, which are completely independent of each other and share only the bit-cells. Each local bit-line (LBL) serves 16 bit-cells, and pairs of LBLs are merged two-way via a static NMOS gate that drives a global bit-line (GBL). Each bit-cell has four read ports and two write ports, and both reading and writing are single-ended.
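The read hierarchy can be summarized with a logic-level model of one bit column: 256 entries split into 16-cell LBLs, with pairs of LBLs merged onto the GBL. This is a sketch under assumptions: the polarity (a selected cell storing '1' discharges its LBL), the single GBL per column, and all function names are hypothetical.

```python
CELLS_PER_LBL = 16  # bit-cells per local bit-line (from the text)
ENTRIES = 256       # register-file depth (from the text)


def local_bit_line(selects, data):
    """Precharged dynamic wired-OR: stays 1 unless a selected cell discharges it.

    Assumed polarity: a selected cell storing 1 pulls the LBL low.
    """
    return 0 if any(s and d for s, d in zip(selects, data)) else 1


def global_bit_line(lbl_values):
    """Two-way merge: each LBL pair pulls the precharged GBL low
    if either LBL of the pair has discharged."""
    pairs = zip(lbl_values[0::2], lbl_values[1::2])
    return 0 if any(a == 0 or b == 0 for a, b in pairs) else 1


def read_column(column_bits, word):
    """Read the bit of entry `word` from a 256-entry column."""
    selects = [1 if i == word else 0 for i in range(ENTRIES)]
    lbls = [
        local_bit_line(selects[k:k + CELLS_PER_LBL], column_bits[k:k + CELLS_PER_LBL])
        for k in range(0, ENTRIES, CELLS_PER_LBL)
    ]
    return 1 - global_bit_line(lbls)  # a discharged GBL reads as a stored 1


bits = [1 if (i * 7) % 3 == 0 else 0 for i in range(ENTRIES)]
print(all(read_column(bits, w) == bits[w] for w in range(ENTRIES)))
```

With 16 LBLs per column, the two-way merge leaves eight pull-down paths on the GBL in this model.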

The LBL and GBL dynamic ORs are susceptible to noise because of high active leakage, which occurs during evaluation when the precharged domino node is meant to stay high. The LBL is more sensitive than the GBL because less charge is stored on its domino node and its dynamic OR structure is wider. The dual-$V_t$ LBL uses high-$V_t$ read-select transistors and low-$V_t$ bit-cell data transistors. The high-$V_t$ transistors limit the bit-line leakage, but at the cost of lower performance, because their drive current is reduced.
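The relative sensitivity of the LBL and GBL can be illustrated with a first-order estimate: a precharged node of capacitance $C$ that should stay high loses $\Delta V = N_{off} I_{off} t_{eval} / C$ when $N_{off}$ off pull-down paths each leak $I_{off}$ over an evaluation window $t_{eval}$ (no keeper). The capacitances, currents, timing, and the assumed 10x leakage cut from high-$V_t$ devices below are hypothetical; the path counts assume 16 cells per LBL and 8 merged pull-downs on the GBL (256/16 LBLs merged two-way).

```python
def leakage_droop(n_off, i_off, t_eval, c_node):
    """First-order voltage droop (V) on a precharged dynamic node with no keeper:
    n_off leaking paths, each sinking i_off (A) for t_eval (s), into c_node (F)."""
    return n_off * i_off * t_eval / c_node


T_EVAL = 100e-12  # hypothetical evaluation window (s)
I_OFF = 1e-6      # hypothetical per-path leakage of a low-Vt stack (A)

lbl = leakage_droop(16, I_OFF, T_EVAL, 10e-15)  # 16 cells, small hypothetical node cap
gbl = leakage_droop(8, I_OFF, T_EVAL, 40e-15)   # 8 merged pull-downs, larger hypothetical cap
lbl_dual_vt = leakage_droop(16, I_OFF / 10, T_EVAL, 10e-15)  # assumed 10x cut from high-Vt
print(f"LBL {lbl:.3f} V, GBL {gbl:.3f} V, dual-Vt LBL {lbl_dual_vt:.3f} V")
```

The wider OR and smaller stored charge both push the LBL droop up, which is why leakage control is applied at the LBL.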



Fig. 2. Four full-swing local bit-lines.



Fig. 3. Pseudo-static low-V<sub>t</sub> LBL.

Fig. 3 shows Intel's pseudo-static, leakage-tolerant LBL structure [3], which results from the following modifications to a conventional dynamic bit-line topology. 1) The read-select input and bit-cell data locations on the bit-line stack are swapped, so that read-select signals feed the lower (M<sub>2</sub>) transistors of the LBL. 2) Static precharge transistors ($P_x$), actively driven by the read-select signals, are introduced. These transistors anchor the bit-line static nodes ($V_s$) at $V_{DD}$ when the read-selects are at GND. 3) Static 2-input NOR gates are introduced, whose inputs are the stack node and the bit-cell data. The NOR-gate outputs drive the upper ($M_1$) transistors of the LBL. When the read-select inputs are at GND, the NOR-gate outputs force the input of the leakage-limiting transistor M<sub>1</sub> to GND. This effectively cuts off the bit-line sub-threshold active leakage path: both the drain and the source of M<sub>1</sub> are held at $V_{DD}$, and the resulting source-body bias further elevates its $V_t$. The result is an increase in bit-line noise immunity. However, this pseudo-static technique reduces performance because of the additional NOR gate and the increased sub-threshold leakage through P<sub>x</sub> and M<sub>2</sub>.
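The gating described in modifications 1) to 3) can be checked with a small truth-table model of one LBL stack. This is a behavioral sketch under stated assumptions, not Intel's circuit: it assumes $P_x$ pulls $V_s$ high when the read-select is '0', that M<sub>2</sub> pulls $V_s$ low when the read-select is '1', and that M<sub>1</sub> conducts when its gate is '1'.

```python
from itertools import product


def nor2(a, b):
    """Static 2-input NOR."""
    return int(not (a or b))


def pseudo_static_stack(rs, data):
    """Return (v_s, m1_gate, discharges_lbl) for one stack in Fig. 3.

    Assumed behavior: P_x holds V_s at 1 when rs == 0; M2 pulls V_s to 0
    when rs == 1. M1's gate is NOR(V_s, data), and the LBL discharges
    only when both M1 and M2 conduct.
    """
    v_s = 0 if rs else 1
    m1_gate = nor2(v_s, data)
    discharges_lbl = bool(rs and m1_gate)
    return v_s, m1_gate, discharges_lbl


for rs, data in product((0, 1), repeat=2):
    v_s, m1_gate, dis = pseudo_static_stack(rs, data)
    print(f"RS={rs} data={data}: Vs={v_s} M1 gate={m1_gate} LBL discharges={dis}")
```

In the unselected rows (RS = 0), M<sub>1</sub>'s gate is forced to '0' regardless of the stored data, which is the leakage cut-off described above. Under these assumptions a selected cell discharges the LBL when it stores '0', so the LBL carries the inverted data.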

Fig. 4 shows the dual-V<sub>t</sub> local bit-line used with the proposed boosting technique. In this scheme, leakage current is reduced by making $M_{10}$ a high-threshold-voltage transistor. The drive current of $M_{10}$ is then restored by raising RWL to $V_{DD} + \Delta V$, above the supply voltage.
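The benefit of boosting RWL can be estimated with the Sakurai-Newton alpha-power law, $I_D \propto (V_{GS} - V_t)^{\alpha}$. The sketch below compares a low-$V_t$ device, an unboosted high-$V_t$ M<sub>10</sub>, and a boosted high-$V_t$ M<sub>10</sub>. The threshold voltages, $V_{DD}$, and $\alpha = 1.3$ are hypothetical; $\Delta V = 0.5$ V is the boost level given in Section III. It treats M<sub>10</sub> in isolation and ignores the series data transistor and the bit-line swing.

```python
def relative_drive(v_gs, v_t, alpha=1.3):
    """Drive current in arbitrary units from the alpha-power law,
    I ~ (V_GS - V_t)**alpha; returns 0 for a device that is off."""
    overdrive = v_gs - v_t
    return overdrive ** alpha if overdrive > 0 else 0.0


V_DD = 1.0      # hypothetical supply (V)
VT_LOW = 0.30   # hypothetical low-Vt (V)
VT_HIGH = 0.45  # hypothetical high-Vt of M10 (V)
DELTA_V = 0.5   # boost level V_t,M5 from Section III (V)

reference = relative_drive(V_DD, VT_LOW)
cases = {
    "low-Vt": (V_DD, VT_LOW),
    "high-Vt": (V_DD, VT_HIGH),
    "high-Vt, boosted": (V_DD + DELTA_V, VT_HIGH),
}
for name, (v_gs, v_t) in cases.items():
    print(f"{name:>16}: {relative_drive(v_gs, v_t) / reference:.2f}x low-Vt drive")
```

Because the gate overdrive enters as a power, even a moderate $\Delta V$ more than offsets the loss from a higher $V_t$ in this simplified model.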

When a bit-cell is not selected by the decoder, both RS and RWL are '0'. The low RWL turns $M_5$ on, and the MOS capacitor $M_6$ forms a channel under its gate and stores charge. When RS is selected by the decoder and pulled up, RWL is charged directly from the supply voltage through $M_5$ and $M_7$, so the voltage at RWL begins to rise. Once the RWL voltage exceeds the threshold voltage of buffer $B_x$, the buffer output switches to '1' after a delay $\tau$ and turns $M_5$ off. $M_6$ then turns off, and the charge stored on $M_6$ causes a rapid increase in the current through $M_7$. This charge transfer makes RWL rise faster than it would if RWL were driven only by conventional CMOS circuits, and the rise continues until RWL settles at $V_{DD} + \Delta V$. This approach increases the gate overdrive of $M_{10}$, effectively compensating for the performance loss caused by its high threshold voltage.

Fig. 4. Dual-Vt LBL driven by a CMOS bootstrapped circuit.

The value of $\Delta V$ after the charge transfer is complete can be expressed as a function of the parasitic capacitance $C_B$ of $M_6$, the capacitance $C_0$ loaded onto RWL, the delay $\tau$ of buffer $B_x$, the threshold voltage $V_{t,Bx}$ of the buffer, and the average current $I_{M7}$ through $M_7$ while it is on. The charge stored on $C_0$ before $M_5$ is turned off by the buffer is $(V_{t,Bx} \cdot C_0 + \tau \cdot I_{M7})$, the charge stored at the drain-source node of $M_5$ is $C_B \cdot V_{DD}$, and the charge supplied to RWL when the buffer output switches is $C_B \cdot V_{DD}$. After these charges are redistributed, the boosted voltage at RWL is

$$V_{RWL} = \frac{\left(V_{t,Bx} \cdot C_0 + \tau \cdot I_{M7}\right) + 2\left(C_B \cdot V_{DD}\right)}{C_0 + C_B}$$
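The charge-sharing expression can be evaluated numerically to see how each term contributes to the boost. The sketch below implements the equation as written (before any clamping by M<sub>5</sub>); every component value in it is a hypothetical placeholder, not a value from the design.

```python
def boosted_rwl_voltage(v_t_bx, c_0, tau, i_m7, c_b, v_dd):
    """Unclamped RWL voltage (V) after charge redistribution.

    Implements V_RWL = ((V_t,Bx*C0 + tau*I_M7) + 2*C_B*V_DD) / (C0 + C_B).
    Units: volts, farads, seconds, amperes.
    """
    q_before_cutoff = v_t_bx * c_0 + tau * i_m7  # charge on C0 before M5 turns off
    q_from_cb = 2.0 * c_b * v_dd                 # C_B*V_DD stored plus C_B*V_DD injected
    return (q_before_cutoff + q_from_cb) / (c_0 + c_b)


# Hypothetical values chosen only for illustration.
V_DD = 1.0      # supply (V)
C_0 = 100e-15   # RWL load (F)
C_B = 200e-15   # M6 capacitance (F)
TAU = 20e-12    # buffer delay (s)
I_M7 = 1e-3     # average M7 current (A)
V_T_BX = 0.5    # buffer switching threshold (V)

v_rwl = boosted_rwl_voltage(V_T_BX, C_0, TAU, I_M7, C_B, V_DD)
print(f"V_RWL = {v_rwl:.3f} V, unclamped dV = {v_rwl - V_DD:.3f} V")
# V_RWL = 1.567 V, unclamped dV = 0.567 V
```

With these placeholder numbers the unclamped boost exceeds $V_{t,M5} = 0.5$ V, which is the regime in which M<sub>5</sub> limits $\Delta V$.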

From the expression for $V_{RWL}$, the desired value of $\Delta V = V_{RWL} - V_{DD}$ can be obtained by tuning the circuit parameters in it. However, $\Delta V$ cannot exceed the threshold voltage of M<sub>5</sub>, $V_{t,M5}$, because M<sub>5</sub> turns on and discharges RWL as soon as $\Delta V$ exceeds $V_{t,M5}$. This clamping compensates for a weakness of the design, namely the sensitivity of $\Delta V$ to variations in C<sub>0</sub>, $\tau$, and I<sub>M7</sub>. In other words, if $C_B$ is larger than a certain value, $\Delta V = V_{t,M5}$ is reached reliably regardless of variations in C<sub>0</sub>, $\tau$, and I<sub>M7</sub>. RWL can likewise be raised to a well-defined level regardless of changes in process, supply voltage, and temperature, which also prevents the transistor M<sub>10</sub> from being damaged by an excessively high gate voltage.
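The "certain value" of $C_B$ follows directly from the expression for $V_{RWL}$. Requiring $V_{RWL} \ge V_{DD} + V_{t,M5}$ and solving for $C_B$ (valid when $V_{DD} > V_{t,M5}$) gives $C_B \ge \frac{(V_{DD} + V_{t,M5})\,C_0 - (V_{t,Bx} C_0 + \tau I_{M7})}{V_{DD} - V_{t,M5}}$. The sketch below encodes that bound and the clamp as an idealized reading of the equation (M<sub>5</sub> treated as an ideal clamp at exactly $V_{t,M5}$); all numbers are hypothetical except $V_{t,M5} = 0.5$ V from Section III.

```python
def min_boost_capacitance(v_t_bx, c_0, tau, i_m7, v_dd, v_t_m5):
    """Smallest C_B (F) for which the unclamped V_RWL reaches V_DD + V_t,M5."""
    if v_dd <= v_t_m5:
        raise ValueError("bound requires V_DD > V_t,M5")
    q_before_cutoff = v_t_bx * c_0 + tau * i_m7
    return max(0.0, ((v_dd + v_t_m5) * c_0 - q_before_cutoff) / (v_dd - v_t_m5))


def clamped_delta_v(v_t_bx, c_0, tau, i_m7, c_b, v_dd, v_t_m5):
    """Boost dV = V_RWL - V_DD, limited to V_t,M5 by M5 turning on."""
    v_rwl = (v_t_bx * c_0 + tau * i_m7 + 2.0 * c_b * v_dd) / (c_0 + c_b)
    return min(v_rwl - v_dd, v_t_m5)


V_DD, V_T_M5 = 1.0, 0.5  # V_t,M5 from Section III; V_DD hypothetical
c_b_min = min_boost_capacitance(0.5, 100e-15, 20e-12, 1e-3, V_DD, V_T_M5)
print(f"C_B >= {c_b_min * 1e15:.0f} fF")

# Above the bound, dV stays pinned at V_t,M5 as the RWL load C0 varies
# (the bound grows with C0, so C_B must cover the largest expected C0).
for c_0 in (60e-15, 80e-15, 100e-15):
    dv = clamped_delta_v(0.5, c_0, 20e-12, 1e-3, 200e-15, V_DD, V_T_M5)
    print(f"C0 = {c_0 * 1e15:.0f} fF -> dV = {dv:.3f} V")
```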

### **III. SIMULATION RESULTS**

Fig. 5 shows how $V_{RWL}$ changes when the RS signal is applied. $\Delta V$ is designed to rise to its maximum value, $V_{t,M5}$ (0.5 V).

Fig. 5. RWL boosted during a READ operation.

Fig. 6(a) shows the voltage waveform of the local bit-line. With the boosting scheme, the point at which the transistor turns on is similar to that of a high-threshold-voltage design, but the bit-line voltage subsequently drops to a level similar to that of a low-threshold-voltage design as the current increases rapidly. Fig. 6(b) indicates that, compared with a conventional design in which all transistors are low-$V_t$, the difference in delay when the boosting scheme is used is small enough to be ignored. The boosting scheme also reduces the leakage current of the entire bit-line by a factor of about 450 compared with a conventional bit-line.

## **IV. CONCLUSIONS**

By combining read-signal boosting through charge injection with dual-$V_t$ local bit-lines, it is possible to implement a 256×40-bit register file without any performance penalty while reducing leakage current by a factor of 450.

#### ACKNOWLEDGMENTS

The authors would like to thank the Nano-Systems Institute (NSI-NCRC) program, sponsored by the Korea Science and Engineering Foundation (KOSEF), and, in part, the Nano IP/SoC Promotion Group of the Seoul R&BD Program for financial support.

#### REFERENCES

- [1] N. Nintunze and G. Pham, "A register file with 8.4GHz throughput for efficient instruction scheduling in a Pentium 4 processor," *In Proceedings of the 2006 Symposium on VLSI Circuits*, pp. 188-189, 2006.
- [2] S. Heo, K. Barr, M. Hampton, and K. Asanović, "Dynamic fine-grain leakage reduction using leakage-biased bitlines," *In Proceedings of the International Symposium on Computer Architecture*, pp. 137-147, May 2002.
- [3] R. K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. R. Shanbhag, K. Soumyanath, and S. Y. Borkar, "A 130-nm 6-GHz 256×32 bit leakage-tolerant register file," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 5, pp. 624-632, May 2002.
- [4] H.-W. Lee, H. Lee, J.-K. Woo, W.-Y. Shin, M. Kim, and S. Kim, "Low-power 256×40-bit register file with dual-V<sub>t</sub> dynamic bit-lines," *In Proceedings of the Korean Conference on Semiconductor*, pp. 681-682, February 2008.



**Hyoung-Wook Lee** received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 2006. He is currently working for Samsung, Korea. His research interests include low-power and high-speed CMOS digital circuits.



**Hyunjoong Lee** received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 2005 and 2007, respectively. He is currently pursuing the Ph.D. degree at Seoul National University, Seoul, Korea. His research interests include sensor interfaces for MEMS and bio-applications, data converters, and analog techniques in CMOS circuits.



**Jong-Kwan Woo** received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2000. He is currently pursuing the Ph.D. degree at Seoul National University, Seoul, Korea. His research interests include high-speed I/O, clock distribution, data converters, and low-power analog CMOS circuits.



**Woo-Yeol Shin** received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2005. He is currently working toward the Ph.D. degree in electrical engineering at Seoul National University, Seoul, Korea. His research interests include high-speed I/O circuits and high-speed memory interfaces.



**Suhwan Kim** received the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 2001. From 1993 to 1999, he was with LG Electronics, Seoul. From 2001 to 2004, he was a Research Staff Member with the IBM T.J. Watson Research Center, Yorktown Heights, NY. In 2004, he joined Seoul National University, Seoul, where he is currently an associate professor of electrical engineering. His research interests include analog and mixed-signal circuits and device/circuit co-design opportunities.