Evaluation of Energy-Recovering Interconnects for Low-Power 3D Stacked ICs

P. Asimakopoulos¹, G. Van der Plas², A. Yakovlev¹ and P. Marchal²
¹ School of EECE, Newcastle University, UK
² IMEC, Belgium
Abstract: Energy-recovering schemes have been proposed in the literature as an alternative approach to low-power design, and their performance has been demonstrated to be extremely promising when driving large capacitive loads, such as clock distribution networks [1]. This work investigates the potential of the energy-recovering methodology for improving the energy efficiency of through-silicon via (TSV) interconnects in 3D ICs.
978-1-4244-4512-7/09/$25.00 © 2009 IEEE

I. INTRODUCTION

Energy dissipation is a major concern for battery-powered mobile systems. In 3D stacked ICs, TSV interconnects enable low-parasitic direct connections between tiers and can allow for considerable energy savings when compared to traditional PCB chip-to-chip interconnections [2]. However, TSV parasitic capacitance can still become an important source of energy dissipation in large, densely interconnected 3D SoCs, since the combined capacitance, and thus the energy required to drive the TSVs, will increase linearly with the number of tiers and interconnections.

Energy-recovering logic has demonstrated great potential when driving large capacitive loads, and circuits utilizing this technique have been successfully implemented in the past [1], [5]. Energy-recovering designs can break the $C V_{DD}^2$ energy limit of conventional static CMOS by spreading out charge transfer more evenly over an entire switching period, thus making energy dissipation proportional to the operating frequency [3]. The result is very low energy dissipation, which can asymptotically approach zero at low operating frequencies [4].

In this paper, an energy-recovering configuration for 3D ICs is presented and analyzed using theoretical models. The proposed circuit is evaluated against conventional static CMOS, and the dependence of its energy efficiency on design parameters is extracted.

II. PROPOSED CONFIGURATION

In energy-recovering systems, load capacitances ($C_L$) are typically driven by resonant sinusoidal waves, which charge nodes and recover part of the charge in the falling half-period of the wave. It can be proven [6] that if a resistive load $R$ is present in the current flow path, the energy dissipated on that load during a full charge/discharge cycle ($1/f$) would be:

$$E_{DISS} = \frac{\pi^2}{2} (R C_L f)\, C_L V_{DD}^2 \qquad (1)$$

If an equivalent capacitive load were driven by conventional CMOS logic while switching at the same frequency $f$, the energy dissipation per cycle would be:

$$E_{CONV} = \frac{1}{2} C_L V_{DD}^2 \qquad (2)$$

Therefore, as long as the switching frequency is $f < \frac{1}{\pi^2 R C_L}$, energy-recovering designs can save energy when compared to conventional CMOS logic.

In a 3D IC, TSVs will provide an interconnection path for signals crossing adjacent tiers. Since TSVs can have an appreciable parasitic capacitance [7], which will increase as additional TSVs are connected in series, ratioed buffer stages will be required to ensure a sharp-rising signal at the output of the TSVs. To reduce energy dissipation, an energy-recovering configuration could replace the required buffering stages, allowing only sinusoidal signals to cross the TSVs and thus saving energy according to (1). In the proposed 3D interconnecting configuration (Figure 1), each tier is clocked with the assistance of conventional buffer stages, whereas data signals, prior to crossing a TSV, are converted to sinusoidal waves with the use of adiabatic drivers (Figure 2).
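As a quick numerical check of the break-even condition between (1) and (2), the following minimal sketch evaluates both energies and the crossover frequency $1/(\pi^2 R C_L)$. The path resistance and the function names are illustrative assumptions, not values from this work; the 160fF load and 1.2V supply match the evaluation setup described later.

```python
import math

def e_diss(r_ohm, c_load, f_hz, vdd):
    """Energy per cycle dissipated in R under sinusoidal charging, as in (1)."""
    return (math.pi ** 2 / 2) * (r_ohm * c_load * f_hz) * c_load * vdd ** 2

def e_conv(c_load, vdd):
    """Energy per cycle of a conventional CMOS driver, as in (2)."""
    return 0.5 * c_load * vdd ** 2

def breakeven_frequency(r_ohm, c_load):
    """Frequency below which (1) is smaller than (2)."""
    return 1.0 / (math.pi ** 2 * r_ohm * c_load)

if __name__ == "__main__":
    R = 200.0        # assumed path resistance in ohms (hypothetical value)
    C_L = 160e-15    # load capacitance in farads
    VDD = 1.2        # supply voltage in volts
    f_be = breakeven_frequency(R, C_L)
    print(f"break-even frequency: {f_be / 1e9:.2f} GHz")
    for f in (100e6, 300e6, 500e6):
        ratio = e_diss(R, C_L, f, VDD) / e_conv(C_L, VDD)
        print(f"f = {f / 1e6:.0f} MHz: E_DISS / E_CONV = {ratio:.3f}")
```

A ratio below 1 indicates that sinusoidal charging dissipates less than the conventional driver for the assumed resistance.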
Figure 1. Proposed configuration. (Diagram labels: Tier1 with CLK IN, CLK BUFFER, DATA IN, f-f and ADIABATIC DRIVER; Tier2 with P2LC; Tier3 with P2LC, f-f and DATA OUT; TSVs between tiers.)
The excellent energy efficiency of sinusoidal charging and energy recovery can considerably reduce energy dissipation for interconnecting signals. At the same time, compatibility with digital logic is retained by converting, locally in each tier, the resulting sinusoidal dual-rail pulses back to level signals through Pulse-to-Level converters (P2LC) (Figure 3).

Figure 2. Adiabatic driver.

Assuming that both nFET and pFET transistors are equally sized ($W_n$), the cross-coupled pFETs reduce the conventionally driven input capacitance; however, their gate capacitance ($C_n$) will appear as an additional capacitive load to the driver. Furthermore, drain/source diffusion capacitance ($C_D$) can be an important portion of the load, since in each cycle $6C_D$ will be present in the current flow path (4 contributed by the ON T-gate and 2 by the OFF T-gate). Therefore the combined load capacitance will be:

$$C_L = C_{TSV} + C_n + 6C_D \qquad (4)$$
The resistance of the T-gate ($R_{TG}$) can be related to the gate capacitance by a device technology factor ($\kappa_{TG}$) [11], which we can define for our convenience:

$$\kappa_{TG} = R_{TG} C_n \;\Rightarrow\; R_{TG} = \frac{\kappa_{TG}}{C_n} \qquad (5)$$

Combining (1), (4) and (5) gives the total dissipated energy per cycle in the adiabatic driver:

$$E_{AD} = C_n V_{DD}^2 + \frac{\pi^2 \kappa_{TG} f}{2 C_n} \left[ C_{TSV} + C_n + 6C_D \right]^2 V_{DD}^2 \qquad (6)$$

Figure 3. A Pulse-to-Level Converter implementation.
III. ANALYSIS

Sinusoidal waveforms can be efficiently generated with the use of a single resonant-tank inductor [8], which, when combined with the adiabatic driver's resistance ($R_{TG}$) and the TSV capacitance ($C_{TSV}$), forms an RLC oscillator (Figure 4) resonating at:

$$f = \frac{1}{2\pi \sqrt{L_{ind} C_{TSV}}} \qquad (3)$$
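Equation (3) can be inverted to size the resonant inductor for a target operating frequency. The sketch below does this for the 160fF TSV load assumed in the evaluation; the function names are illustrative, and the additional driver capacitances of (4) are ignored here as a simplifying assumption.

```python
import math

def resonant_frequency(l_ind, c_tsv):
    """Resonant frequency of the tank, as in (3)."""
    return 1.0 / (2 * math.pi * math.sqrt(l_ind * c_tsv))

def inductance_for(f_hz, c_tsv):
    """Inductance that makes the tank resonate at f_hz (inverse of (3))."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * c_tsv)

if __name__ == "__main__":
    C_TSV = 160e-15  # TSV capacitance in farads
    for f in (100e6, 300e6, 500e6):
        l_ind = inductance_for(f, C_TSV)
        print(f"f = {f / 1e6:.0f} MHz -> L_ind = {l_ind * 1e6:.2f} uH")
```

Because the required inductance scales with $1/f^2$, lower operating frequencies call for considerably larger inductors for the same load.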
The second term of (6) contributes to the energy dissipation on every cycle, while the first term depends on the data switching activity ($D$) and is weighted by $D$ in what follows. We can further simplify this equation by defining the diffusion capacitance as a fraction of the input capacitance,

$$C_D = b\,C_n \qquad (7)$$

and equating the term $\frac{\pi^2 \kappa_{TG} f}{2}$ to a variable,

$$y = \frac{\pi^2 \kappa_{TG} f}{2} \qquad (8)$$
Equation (6) then becomes:

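$$\begin{aligned} E_{AD} &= D\,C_n V_{DD}^2 + \frac{y}{C_n}\left[C_{TSV} + (6b+1)C_n\right]^2 V_{DD}^2 \\ &= \left[\frac{D}{y} + (6b+1)^2\right] C_n\, y V_{DD}^2 + \frac{1}{C_n} C_{TSV}^2\, y V_{DD}^2 + (12b+2)\, C_{TSV}\, y V_{DD}^2 \end{aligned} \qquad (9)$$

Figure 4. Resonant pulse generator. (Schematic labels: L, VDD/2, Rind, RTGATE, Adiabatic Driver, CTSV, M1, T=1/f.)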
Since in (9) $C_n$ is the free parameter, the first two terms of (9) vary inversely with each other (one grows with $C_n$ while the other shrinks), and $E_{AD}$ is minimized when they become equal. The value of the gate capacitance at that point is calculated as:

$$C_{n(opt)} = \sqrt{\left[\frac{D}{y} + (6b+1)^2\right]^{-1}}\; C_{TSV} \qquad (10)$$
The bulk of the energy dissipation in the proposed configuration will occur on the adiabatic driver, the inductor's parasitic resistance ($R_{ind}$) and transistor M1. Since the sinusoidal pulses will have to be converted back to level signals after crossing a TSV, the P2L converters will contribute to the total energy dissipation as well.

A. Adiabatic driver

The adiabatic driver used in this configuration (Figure 2) is based on adiabatic circuits presented in [9], [10], with the load capacitance replaced by the parasitic capacitance of TSVs ($C_{TSV}$). Energy in the adiabatic driver is dissipated both on the transmission-gate resistance ($R_{TG}$) and for driving its input capacitance.
Combining (9) and (10) results in the minimum energy dissipation for the adiabatic driver, which is:

$$E_{AD(min)} = \left[\sqrt{\frac{D}{y} + (6b+1)^2} + (6b+1)\right] 2y\, C_{TSV} V_{DD}^2 \qquad (11)$$

Replacing variable $y$ in (11), we can observe the dependence of the energy dissipation on parameters $f$ and $\kappa_{TG}$:

$$E_{AD} = \left[\sqrt{\frac{2D}{\pi^2 \kappa_{TG} f} + (6b+1)^2} + (6b+1)\right] \pi^2 \kappa_{TG} f\, C_{TSV} V_{DD}^2 \qquad (12)$$
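The optimum in (10) and the minimum in (11) can be cross-checked against a brute-force sweep of (9). In this sketch the T-gate technology factor (`KAPPA_TG`) and the diffusion ratio (`B`) are hypothetical placeholders, since the extracted values are not listed in the text, and the function names are illustrative.

```python
import math

def y_factor(kappa_tg, f_hz):
    """Variable y of (8): pi^2 * kappa_TG * f / 2."""
    return math.pi ** 2 * kappa_tg * f_hz / 2

def e_ad(c_n, c_tsv, b, d, y, vdd):
    """Adiabatic driver energy per cycle, as in (9)."""
    c_load = c_tsv + (6 * b + 1) * c_n
    return d * c_n * vdd ** 2 + (y / c_n) * c_load ** 2 * vdd ** 2

def c_n_opt(c_tsv, b, d, y):
    """Optimum gate capacitance, as in (10)."""
    return c_tsv / math.sqrt(d / y + (6 * b + 1) ** 2)

def e_ad_min(c_tsv, b, d, y, vdd):
    """Minimum driver energy, as in (11)."""
    k = 6 * b + 1
    return (math.sqrt(d / y + k ** 2) + k) * 2 * y * c_tsv * vdd ** 2

if __name__ == "__main__":
    KAPPA_TG = 5e-12   # hypothetical R*C product in seconds
    B = 0.5            # hypothetical C_D / C_n ratio
    C_TSV, VDD, D, F = 160e-15, 1.2, 1.0, 300e6
    y = y_factor(KAPPA_TG, F)
    best = c_n_opt(C_TSV, B, D, y)
    # Brute-force sweep from about 0.2x to 4.2x the closed-form optimum.
    sweep = [best * (0.2 + 0.001 * i) for i in range(1, 4000)]
    numeric = min(sweep, key=lambda c: e_ad(c, C_TSV, B, D, y, VDD))
    print(f"C_n(opt) closed form: {best * 1e15:.2f} fF, sweep: {numeric * 1e15:.2f} fF")
    print(f"E_AD(min): {e_ad_min(C_TSV, B, D, y, VDD) * 1e15:.2f} fJ")
```

Agreement between the swept and closed-form optimum is a convenient sanity check when the equations are re-implemented for a different process.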
B. Switch M1

Correspondingly, the energy dissipation on M1, which is switched on briefly to recover the energy dissipated each cycle on $R_{total} = R_{TG} + R_{ind}$, is a trade-off between dissipation on its on-resistance $R_{M1}$ and its input capacitance $C_{M1}$. Since M1 is a fairly large transistor, the preceding ratioed stages will have a significant energy consumption. For that reason a factor $m$ is used to account for the additional losses. $I_{M1(rms)}$ is the rms current passing through the transistor while turned on and $V_{GM1}$ is the peak gate voltage. A methodology for deriving optimum values for both these parameters is proposed in [8]. The total dissipated energy on M1 can be calculated as:
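$$E_{M1} = m\,C_{M1} V_{GM1}^2 + \frac{I_{M1(rms)}^2 R_{M1}}{f} \qquad (13)$$

where the conduction term is expressed per cycle of the operating frequency $f$.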
D. Total energy dissipation

The total dissipated energy for the adiabatic driver, switch M1 and the inductor's resistance can be calculated by combining (12), (17) and (20):

$$E_{total} = E_{AD} + E_{M1} + E_{ind} \qquad (21)$$

Nevertheless, for (21) to be complete, the energy contribution of the P2L converters has to be included as well; this cannot be derived theoretically and is addressed in the following sections.

IV. EVALUATION

For the evaluation of the proposed method, the $\kappa_{TG}$, $\kappa_{M1}$ and $C_D$ parameters were extracted using simulation models for a 130nm process. The TSV capacitance was assumed to be 160fF, a value which can result either from a single TSV or from a series combination of TSVs distributed among subsequent tiers in a 3D IC. The supply voltage was 1.2V and the data switching activity was assumed to be equal to 1. All parameters were inserted into (21) and the energy dissipation per bit per cycle was calculated, with the operating frequency and Q factor as free variables. An identical circuit configuration was also simulated in a commercial SPICE program, and the simulation data showed good correlation with the theoretical estimations.

The inductor's quality factor can have a considerable effect on the energy dissipation, and this can be observed in Figure 5, where the estimated energy dissipation is plotted for various operating frequencies.
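For transistor M1 we can also define a device technology factor:

$$\kappa_{M1} = R_{M1} C_{M1} \;\Rightarrow\; R_{M1} = \frac{\kappa_{M1}}{C_{M1}} \qquad (14)$$

Substituting $R_{M1}$ into (13):

$$E_{M1} = m\,C_{M1} V_{GM1}^2 + \frac{1}{C_{M1}} \frac{I_{M1(rms)}^2 \kappa_{M1}}{f} \qquad (15)$$

Minimum energy consumption will occur when the two terms of (15) are equal:

$$C_{M1(opt)} = \frac{I_{M1(rms)}}{V_{GM1}} \sqrt{\frac{\kappa_{M1}}{m f}} \qquad (16)$$

$$E_{M1(min)} = 2 I_{M1(rms)} V_{GM1} \sqrt{\frac{m\, \kappa_{M1}}{f}} \qquad (17)$$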
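The sizing trade-off for M1 can be checked numerically in the same way. This sketch assumes that the conduction loss is expressed per cycle (that is, divided by $f$), which is the form that yields the frequency dependence of (16) and (17); all numeric values below are hypothetical and the function names are illustrative.

```python
import math

def e_m1(c_m1, m, v_gm1, i_rms, kappa_m1, f_hz):
    """Per-cycle energy on M1: gate-drive term plus conduction term."""
    return m * c_m1 * v_gm1 ** 2 + i_rms ** 2 * kappa_m1 / (f_hz * c_m1)

def c_m1_opt(m, v_gm1, i_rms, kappa_m1, f_hz):
    """Optimum input capacitance of M1, as in (16)."""
    return (i_rms / v_gm1) * math.sqrt(kappa_m1 / (m * f_hz))

def e_m1_min(m, v_gm1, i_rms, kappa_m1, f_hz):
    """Minimum per-cycle energy on M1, as in (17)."""
    return 2 * i_rms * v_gm1 * math.sqrt(m * kappa_m1 / f_hz)

if __name__ == "__main__":
    M, V_GM1 = 1.5, 1.2              # hypothetical overhead factor and peak gate voltage (V)
    I_RMS, KAPPA_M1 = 50e-6, 3e-12   # hypothetical rms current (A) and R*C product (s)
    F = 300e6                        # operating frequency in hertz
    c_opt = c_m1_opt(M, V_GM1, I_RMS, KAPPA_M1, F)
    print(f"C_M1(opt) = {c_opt * 1e15:.2f} fF")
    print(f"E_M1(min) = {e_m1_min(M, V_GM1, I_RMS, KAPPA_M1, F) * 1e15:.2f} fJ")
    # At the optimum, the gate-drive and conduction terms are equal.
    print(f"E_M1 at optimum = {e_m1(c_opt, M, V_GM1, I_RMS, KAPPA_M1, F) * 1e15:.2f} fJ")
```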
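C. Inductor's parasitic resistance

The parasitic resistance $R_{ind}$ of the inductor ($L_{ind}$) is inversely proportional to its quality factor $Q_{ind}$, which is typically dependent on the implementation technology. For the purposes of this analysis it can be estimated as:

$$Q_{ind} = \frac{1}{R_{ind}} \sqrt{\frac{L_{ind}}{C_L}} \;\Rightarrow\; R_{ind} = \frac{1}{Q_{ind}} \sqrt{\frac{L_{ind}}{C_L}} = \frac{1}{Q_{ind}\, C_L\, 2\pi f} \qquad (18)$$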
Figure 5. Estimated energy dissipation at 500, 300 and 100 MHz. TSV=160fF. (Plot: Energy Dissipation - 130nm - 160fF Load; energy (fJ) versus Q factor.)
V. COMPARISON

The proposed configuration was compared to a conventional CMOS buffer with ratioed stages driving an equivalent TSV load capacitance. Since the conventional buffer, when transmitting data, will produce just one charging event per 2 cycles of the operating frequency, its energy dissipation per cycle is calculated as in (2). If we also include the data switching activity, then:

$$E_{CONV} = D\,\frac{1}{2} C_L V_{DD}^2 \qquad (22)$$
Energy dissipation on $R_{ind}$ can be estimated using (1) as:

$$E_{ind} = \frac{\pi^2}{2} (R_{ind} C_L f)\, C_L V_{DD}^2 \qquad (19)$$

Combining (18) and (19), we calculate the dependence of the inductor's energy dissipation on the $Q_{ind}$ factor:

$$E_{ind} = \frac{\pi}{4} \frac{C_L}{Q_{ind}} V_{DD}^2 \qquad (20)$$
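For a sense of scale, (20) can be evaluated over a range of Q factors. This minimal sketch takes $C_L$ equal to the 160fF TSV capacitance, a simplifying assumption that neglects the driver's own capacitance in (4); the function names are illustrative.

```python
import math

def e_ind(c_load, q_ind, vdd):
    """Per-cycle energy lost in the inductor's parasitic resistance, as in (20)."""
    return math.pi * c_load * vdd ** 2 / (4 * q_ind)

def r_ind(c_load, q_ind, f_hz):
    """Parasitic resistance implied by (18) at resonance."""
    return 1.0 / (q_ind * c_load * 2 * math.pi * f_hz)

if __name__ == "__main__":
    C_L, VDD, F = 160e-15, 1.2, 300e6
    for q in range(5, 20, 2):
        print(f"Q = {q:2d}: R_ind = {r_ind(C_L, q, F):7.1f} ohm, "
              f"E_ind = {e_ind(C_L, q, VDD) * 1e15:5.2f} fJ")
```

Note that the per-cycle inductor loss in (20) does not depend on the operating frequency, only on the load, the supply and $Q_{ind}$.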
The estimated energy performance improvement of the proposed configuration over the conventional buffer, for $D = 1$, can be seen in Figure 6.
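The improvement percentage can be expressed as a simple ratio of (21) to (22). In the sketch below the driver and switch energies are passed in as plain numbers; the example values are hypothetical, the function names are illustrative, and the P2L converter energy is not included.

```python
import math

def e_conv(c_load, vdd, d=1.0):
    """Conventional buffer energy per cycle, as in (22)."""
    return d * 0.5 * c_load * vdd ** 2

def e_ind(c_load, q_ind, vdd):
    """Inductor loss per cycle, as in (20)."""
    return math.pi * c_load * vdd ** 2 / (4 * q_ind)

def improvement_percent(e_ad, e_m1, c_load, q_ind, vdd, d=1.0):
    """Relative saving of E_total in (21) over E_CONV in (22), in percent."""
    e_total = e_ad + e_m1 + e_ind(c_load, q_ind, vdd)
    reference = e_conv(c_load, vdd, d)
    return 100.0 * (reference - e_total) / reference

if __name__ == "__main__":
    E_AD, E_M1 = 40e-15, 10e-15   # hypothetical driver and switch energies (J)
    for q in (5, 11, 19):
        pct = improvement_percent(E_AD, E_M1, 160e-15, q, 1.2)
        print(f"Q = {q:2d}: improvement = {pct:5.1f} %")
```

A negative result means the energy-recovering path dissipates more than the conventional buffer for the chosen parameters.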
Figure 6. Energy performance improvement of the proposed energy-recovering configuration over a conventional buffer driving an equivalent TSV load capacitance. (Plot: Energy Improvement (%) - 130nm - 160fF Load; improvement (%) versus Q factor; legend entries 500MHz, 300MHz, 100MHz.)

Figure 7. Energy dissipation reduction at 300MHz when P2LC is included and the TSV load is variable. (Plot: Energy Reduction - 130nm - with P2L - 300MHz; improvement (%) versus Q factor; legend entries 80fF, 120fF, 160fF.)

Figure 8. Improvement in energy dissipation when P2LC is included and switching activity is variable. f=300MHz, TSV=160fF. (Plot: Energy Dissipation - 130nm - with P2L - 300MHz, 160fF; improvement (%) versus Q factor; legend entries 0.6, 0.8, 1.)

Figure 9. Improvement in energy dissipation when P2LC is included and process technology is variable. f=300MHz, TSV=160fF. (Plot: Energy Dissipation - with P2L - 300MHz - 160fF Load; improvement (%) versus Q factor; legend entries 130nm, 65nm.)
Varying the value of the TSV load capacitance has no effect on the estimated energy improvement percentage, as the energy dissipation for both the proposed method and the conventional buffer is linearly related to the load capacitance. However, the data plotted in Figures 5 and 6 can be considered the theoretical maximum performance attainable by the energy-recovering method, since the energy dissipation of the P2L converters is not included in these estimations.

To include the effect of the P2L converters in the energy calculations, a simple circuit topology is chosen (Figure 3) and its energy dissipation is extracted from simulation data. Since the P2L converter has constant energy dissipation regardless of the value of the TSV capacitance, the linear relation of the total dissipated energy to the load capacitance value is no longer valid for the energy-recovering circuit. This can be observed in Figure 7, where the estimated energy improvement is plotted at a 300MHz operating frequency and the TSV load capacitance is variable. It can be expected that as the load capacitance increases, energy performance would approach the estimations in Figure 6.

Switching activity can also be a significant factor affecting energy performance. Since in the energy-recovering circuit the sine-wave oscillation cannot be halted, all capacitances in the current flow path will charge and discharge on each cycle regardless of data activity. In contrast, static CMOS ideally dissipates energy only when switching, and thus the energy-recovering circuit can compare favorably only at high switching rates. In Figure 8, the estimated effect of the switching activity on energy performance is plotted for an operating frequency of 300MHz.

Since the technology factors $\kappa_{TG}$ and $\kappa_{M1}$ were extracted for the 130nm process, reducing their value could also provide us with an estimation of the circuit's energy performance for the 65nm node. The result is plotted in Figure 9 and, as can be observed, technology scaling has a positive effect on energy dissipation when compared to the conventional buffer.
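The switching-activity argument above follows directly from (12) and (22): the conventional buffer energy scales linearly with $D$, whereas part of the driver energy is spent on every cycle. The sketch below illustrates this for the adiabatic driver alone, using a hypothetical technology factor and diffusion ratio and leaving out M1, the inductor and the P2L converter; the function names are illustrative.

```python
import math

def e_ad_min(c_tsv, b, d, kappa_tg, f_hz, vdd):
    """Minimum adiabatic driver energy per cycle, as in (12)."""
    k = 6 * b + 1
    x = math.pi ** 2 * kappa_tg * f_hz
    return (math.sqrt(2 * d / x + k ** 2) + k) * x * c_tsv * vdd ** 2

def e_conv(c_load, vdd, d):
    """Conventional buffer energy per cycle, as in (22)."""
    return d * 0.5 * c_load * vdd ** 2

if __name__ == "__main__":
    KAPPA_TG, B = 5e-12, 0.5          # hypothetical device parameters
    C_TSV, VDD, F = 160e-15, 1.2, 300e6
    for d in (1.0, 0.8, 0.6, 0.2):
        ratio = e_ad_min(C_TSV, B, d, KAPPA_TG, F, VDD) / e_conv(C_TSV, VDD, d)
        print(f"D = {d:.1f}: E_AD / E_CONV = {ratio:.2f}")
```

The ratio grows as $D$ falls, which is consistent with the energy-recovering circuit comparing favorably only at high switching rates.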
VI. CONCLUSIONS

In this work, a theoretical analysis was developed to investigate the potential of the energy-recovering methodology, as used in adiabatic logic and resonant clock distribution networks, for reducing the energy dissipation of 3D IC interconnects. The total energy dissipation per cycle and optimum device sizing were extracted for the proposed method using the theoretical models. Simulation data showed good correlation with the theoretical estimations on a 130nm process. The proposed configuration was compared against a conventional CMOS buffer driving an equivalent TSV load capacitance, and its energy performance was evaluated. The analysis revealed an energy dependence on the $Q$, $f$, $D$ and TSV capacitance parameters, and the results demonstrated favorable energy performance for high Q factors, high switching activities, high TSV capacitances and low operating frequencies. Furthermore, an estimate was provided of the energy performance of the energy-recovering design in a more advanced technology node, where improved efficiency was demonstrated.
REFERENCES

[1] H. Mahmoodi, V. Tirumalashetty, M. Cooke and K. Roy, "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 1, pp. 33-44, Jan. 2009.
[2] M. Facchini et al., "System-level Power/performance Evaluation of 3D stacked DRAMs for Mobile Applications," Design, Automation, and Test in Europe (DATE) Conference, 2009.
[3] C. L. Seitz et al., "Hot Clock nMOS," Technical Report, California Institute of Technology, 1985, [CaltechCSTR:1985.5177-tr-85].
[4] S. Kim, C. H. Ziesler and M. C. Papaefthymiou, "Charge-Recovery Computing on Silicon," IEEE Trans. Computers, vol. 54, no. 6, pp. 651-659, 2005.
[5] V. S. Sathe, J. C. Kao and M. C. Papaefthymiou, "Resonant-Clock Latch-Based Design," IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 864-873, April 2008.
[6] B. Voss and M. Glesner, "A low power sinusoidal clock," The 2001 IEEE International Symposium on Circuits and Systems, vol. 4, pp. 108-111, 6-9 May 2001.
[7] P. Marchal et al., "3D Technology Assessment: Path-finding the Technology/Design Sweet-spot," Proceedings of the IEEE, vol. 97, pp. 96-107, January 2009.
[8] D. Maksimovic and V. G. Oklobdzija, "Integrated power clock generators for low energy logic," 26th Annual IEEE Power Electronics Specialists Conference, vol. 1, pp. 61-67, 18-22 Jun. 1995.
[9] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis and E. Y.-C. Chou, "Low-power digital systems based on adiabatic-switching principles," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 4, pp. 398-407, April 1994.
[10] C. C. Yeh, J. H. Lou and J. B. Kuo, "1.5V CMOS Full-Swing Energy Efficient Logic (EEL) Circuit Suitable for Low-Voltage and Low-Power VLSI Application," Elec. Lett., vol. 33, no. 16, pp. 1375-1376, 1997.
[11] D. Maksimovic, "A MOS gate drive with resonant transitions," 22nd Annual IEEE Power Electronics Specialists Conference, pp. 527-532, 24-27 Jun. 1991.
