INTEGRATION, The VLSI Journal: Fang Tang, Amine Bermak, Zhouye Gu
Low power dynamic logic circuit design using a pseudo dynamic buffer
Fang Tang *, Amine Bermak, Zhouye Gu
The Hong Kong University of Science and Technology, ECE Department, Clear Water Bay, Kowloon, Hong Kong
Article info

Article history:
Received 17 January 2011
Received in revised form 30 August 2011
Accepted 30 August 2011
Available online 20 October 2011

Keywords:
Dynamic logic
Pseudo dynamic buffer
Precharge pulse
Low power domino logic

Abstract

In this paper, we propose a pseudo dynamic buffer (PDB) for footed domino logic circuit implementation. Using the proposed PDB structure, the output pulse during the precharge process is prevented from propagating to the output stage, as happens in the conventional design. As a result, up to half of the power is saved compared to a conventional domino gate, while the sampling window of the dynamic gate is improved. This PDB structure is applicable not only to pull-down network (N-type) dynamic logic, but also to pull-up networks (P-type). Simulation results illustrate improved performance using the proposed scheme compared to conventional dynamic logic for different loading conditions, clock frequencies and logic functions. In addition, our proposed design reduces the clock loading from the conventional three transistors to two. As a result, the proposed scheme significantly saves power due to lower load capacitance on the clock bus. Test structures are fabricated in 0.35 μm CMOS technology. Measurement results validate the proposed concept and illustrate power saving as compared to the conventional design.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

Dynamic logic has been very widely used in a large number of applications such as high speed digital logic [1–3], memory [4–6] as well as high performance microprocessor design [7,8]. This logic family offers a number of interesting features compared to static logic, namely reduced transistor count (almost half compared to static complementary logic) as well as reduced load capacitance and hence improved speed. The operation of a dynamic logic gate is controlled by a clock signal and can be implemented in either pull-up (P-type) or pull-down (N-type) configurations [9]. The voltage at the output of the dynamic circuit is stored on a parasitic capacitance, which is typically buffered before it is sent to the next stage. This temporary voltage is affected not only by charge sharing of the internal parasitic capacitances [10], but also by the subsequent dynamic circuit. Normally, a buffer at the output of the dynamic logic is required to drive the next stage. A typical domino gate [9] consists of a P-type or N-type network followed by a static inverter. In general, a Keeper (or bleeder) is added in order to alleviate charge sharing problems [11]. Since the output of the dynamic gate is sampled on parasitic capacitances, periodic precharge phases of the output node are required. Although dynamic logic circuits benefit from smaller area and higher speed, significant power is wasted due to these periodic precharge phases. Additionally, these precharge pulses introduce extra noise in the gate, and the propagation of these pulses through the static buffer results in extra power consumption [12]. The TSPC-based dynamic logic circuit proposed in [13] can significantly reduce the propagated noise, since the buffer is disabled during the precharge phase by an extra stacked clock transistor. However, this additional clock transistor increases the load capacitance of the clock signal, and extra power is eventually consumed due to the larger clock loading. In this paper, we propose a pseudo dynamic buffer (PDB) for the footed clock-controlled dynamic logic circuit. Using this PDB structure, the precharge pulse is blocked at the input of the buffer and is prevented from being propagated to the output of the dynamic gate. As a consequence, power typically consumed in the buffer during the precharge phase is saved. Additionally, compared to TSPC-based dynamic logic, in our scheme the clock transistor count is reduced from 3 to 2. As a result, the power consumption of our proposed scheme is significantly reduced due to lower load capacitance on the clock bus.

This paper is organized as follows. In Section 2, the conventional domino logic gate is reviewed and the proposed dynamic logic gate based on the pseudo dynamic buffer is introduced. Section 3 presents a thorough performance analysis of the proposed gate in terms of power consumption as well as cascading and charge sharing properties. Section 4 presents simulation and experimental results of the proposed logic structure. Section 4 also provides a performance comparison of the proposed dynamic gate against the conventional structure for different logic gates and under different loading and frequency assumptions. Comparison results are based on both simulation as well as

* Corresponding author.
E-mail address: bermak@ieee.org (F. Tang).

0167-9260/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.vlsi.2011.08.003
396 F. Tang et al. / INTEGRATION, the VLSI journal 45 (2012) 395–404
Fig. 2. Timing diagram of the conventional domino logic circuit, when input logic A is '1'. Note that the precharge pulse is propagated through the buffer. (Traces: clk, Z, B and F over the evaluate and precharge phases.)

During the evaluation phase, node Z is discharged to Gnd, as is node B, which enables the PMOS transistor M4 and pulls the output F up to Vdd.

During the precharge phase, node Z is charged up to Vdd, followed by the voltage at node B. Since the NMOS evaluation transistor M2 is disabled, the output node F is held high (same value as the previous evaluation phase).

The timing diagram of the proposed circuit for the case when the input A is high is illustrated in Fig. 5. It is also worth mentioning that a voltage drop at node F is assumed with an amplitude of V'pp (ideally V'pp = 0). It is important to note that
during the precharge phase, the output node F is isolated from Gnd. In other words, the precharge pulse at node Z cannot propagate through the buffer to the output node F.

Fig. 4. Domino logic circuit using the proposed pseudo dynamic buffer. The source of the buffer's NMOS transistor M5 is connected to node B instead of Gnd.

Fig. 5. Timing diagram of the proposed domino logic circuit. Note that there is an expected small voltage drop V'pp at the output node F during the precharge phase. (Traces: A, clk, Z, B and F; F holds its value with no precharge pulse propagation.)

3. Performance analysis

3.1. Power consumption analysis

In the proposed domino logic, the precharge pulse is prevented from propagating to the output node of the buffer, resulting in decreased current consumption in the output stage of the domino gate. Ideally, if the precharge pulse propagation is completely prevented and the input logic is fixed to '1', the power saving η of the proposed scheme compared to a conventional domino gate is

$$\eta = \frac{P_{tot} - P'_{tot}}{P_{tot}} = \frac{C_{Load}}{C_p + C_{Load}} \qquad (1)$$

... and Cload are not charged. When the logic activity is increased, the output stage consumes more power and, as a result, the power saving of the proposed scheme is reduced. The second non-ideal restriction is due to the internal capacitance C_B. In the conventional scheme, the source of the output-stage NMOS transistor is always connected to ground. Therefore, the parasitic capacitor at this source node does not consume power. However, in the proposed domino scheme, the capacitance of node B consists of the parasitic capacitors of both the first stage NMOS transistors and the output stage NMOS transistor. As a result, the capacitive load in the proposed scheme is increased, leading to larger power consumption. The third limit is charge sharing, which is mainly caused by a finite clock slew rate. During the clock transition from '1' to '0', node B is charged from both node Z and node F, leading to output charge sharing.

As discussed above, when compared with the conventional domino logic, the power saving of the proposed scheme comes from the reduced activity of the buffer output. Unlike the conventional domino logic, TSPC does not propagate the precharge pulse, which is similar to the proposed scheme. Therefore, the power saving discussed above is not applicable to TSPC. As shown in Figs. 3 and 4, TSPC has three clock transistors, while the proposed scheme has only two. This extra clock transistor leads to a larger clock load and a slower clock slew rate. When both the PMOS and NMOS clock transistors are turned on due to the slow clock slew rate, a large extra short-circuit current is consumed if the input logic is high. Thus, the clock slew rate improvement can be simply estimated as

$$\frac{SR_{PRO} - SR_{TSPC}}{SR_{TSPC}} = \frac{C_N}{2C_N + C_P} \qquad (2)$$

where C_N and C_P are the gate parasitic capacitances of the NMOS and PMOS transistors, respectively. If C_P = 2C_N, to account for the larger PMOS transistor size, the clock slew rate improvement is 25%. In Section 4.2, a detailed simulation of the power saving is presented.

3.2. Charge sharing analysis

Similarly to any dynamic logic circuit, the proposed scheme also suffers from charge sharing if the input logic is high during the precharge phase. This charge sharing is mainly introduced by the parasitic capacitance at node B, indicated in Fig. 6. To ensure correct operation, the voltage drop Vdrop at node Z should be within the noise margin. This voltage drop could be propagated to
the next stage and causes more serious charge sharing problems requiring special attention.

[Figure: waveforms of Clk, the output without keeper, and the output with keeper versus time (ns).]

The charge sharing problem could be alleviated by using a number of solutions, namely:

Solution A: using dual power supply techniques. Vdd2, used for supplying the pseudo dynamic buffer, has a higher voltage than Vdd1, by which the voltage drop at the output node due to charge sharing could be compensated.
Solution B: increasing the channel length of M5, resulting in more current flowing to node B through path I instead of path II. This will result in a smaller voltage drop at node F.
Solution C: increasing the load capacitance at node F. Obviously, if C_F ≫ C_B, the charge sharing effect could be minimized.
Solution D: using the Keeper design.
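
Solution C can be illustrated with an idealized two-capacitor charge-conservation model. The sketch below is a simplification under stated assumptions, not the paper's own analysis: it treats node F as a capacitor C_F precharged to Vdd that fully equalizes with an initially discharged C_B, and it ignores both the cut-off of M5 and any charge supplied from node Z. The function names and the capacitance values are hypothetical.

```python
def shared_voltage(v_f, v_b, c_f, c_b):
    """Common voltage after two capacitors share charge (charge conservation)."""
    return (c_f * v_f + c_b * v_b) / (c_f + c_b)


def voltage_drop_at_f(vdd, c_f, c_b, v_b_initial=0.0):
    """Drop at node F when C_F (at vdd) equalizes with C_B (at v_b_initial)."""
    return vdd - shared_voltage(vdd, v_b_initial, c_f, c_b)


if __name__ == "__main__":
    VDD = 1.8    # supply voltage quoted for the 0.18 um simulations
    C_B = 2e-15  # hypothetical parasitic capacitance at node B, in farads
    for ratio in (1, 10, 100):
        drop = voltage_drop_at_f(VDD, ratio * C_B, C_B)
        print(f"C_F/C_B = {ratio:>3}: modelled drop = {drop * 1e3:.1f} mV")
```

In this model the drop shrinks toward zero as C_F grows relative to C_B, which is the trend Solution C relies on.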
Fig. 10. The critical path of the proposed domino logic when the input logic A is changed from '0' to '1' during the precharge phase.

Fig. 11. The timing response of the proposed cascading domino logic. Results of later stages in the cascade can be generated in the precharge phase and as a result the evaluation window can be extended.

later stages (for example: O4) cannot generate correct output values.

For the proposed PDB-based domino logic circuit, evaluated output results are held during the precharge phase. For later stages, the output value can be generated in the precharge phase by enabling transistors M3 and M5 (Fig. 10). The critical signal path is shown in Fig. 10. During the precharge phase in cycle 1, O4 is finally pulled high to Vdd − Vth. This feature enables the result for a long cascade of domino gates to settle even during the precharge phase (Fig. 11). Although unconventional, this feature makes it possible to retrieve the correct output in the subsequent clock cycle. For the proposed structure, as for conventional domino logic, an input logic change from '1' to '0' can only be sensed during the next clock cycle.

3.4. P-type implementation

A domino logic chain consists of alternating N-type and P-type modules. The pseudo dynamic buffer proposed in Fig. 4 is only valid for the N-type domino logic. For P-type logic, the proposed circuit structure is also feasible, as depicted in Fig. 12. For P-type domino logic, the buffer configuration is very similar to the N-type design, and is obtained by connecting the source of the buffer's PMOS transistor M4 to the drain of the PMOS evaluation transistor M1.

Fig. 12. P-type domino circuit using the proposed pseudo dynamic buffer. The source of transistor M4 is connected to node B, instead of Vdd.

4. Simulation and experimental results

4.1. Simulation and comparison with conventional dynamic logic

For a 1:1 duty ratio of the clock signal, the power saving for a 1-bit adder built with minimum-feature-size transistors and a load capacitance of 25 fF is evaluated in 0.18 μm technology for different clock frequencies, and results are summarized in Table 1. The higher the operating frequency, the higher the power saving. However, even ideally, the maximum power saving is normally limited to within 45%, since the power consumptions of the first stage for the proposed scheme and the conventional domino logic are similar. The carry output delay time in this simulation is provided in the table. It should be noted that domino logic can only transit input logic from 0 to 1 in the evaluation phase. During this transition, the proposed scheme shows a faster settling speed than the conventional domino logic. This faster speed comes from an extra charge transfer path from node B to node F through M5. As a result, the charge in node Z can leak faster. A smaller peak voltage at node B can be observed in the proposed scheme, which also reflects this faster settling.

Similarly, higher load capacitance leads to increased power saving using our proposed architecture. For a 1:1 duty ratio of the clock signal, the power saving for the same 1-bit adder with a clock frequency of 500 MHz is evaluated for different load capacitances, and results are summarized in Table 2.

In order to validate the claimed power saving of our proposed domino logic, extensive simulations were performed using a set of logic circuits as illustrated in Table 3. In this comparison, the clock frequency, the supply voltage and the load capacitance were set to 500 MHz, 1.8 V and 10 fF, respectively. Two different scenarios of input logic activity are simulated: when the input logic activity is 50% of the clock signal, and for the case when the
Table 2
Power and delay savings comparison for different load capacitances in 0.18 μm technology.

Power saving                  42%     46%     48%     49%
Delay (ns)   Conventional     0.38    0.58    0.9     1.5
             This work        0.3     0.48    0.8     1.4

Table 3
Power savings comparison with different logic functions in 0.18 μm technology (Vdd = 1.8 V, 500 MHz).
input is static (k = 0). The results for both cases are summarized in

Assuming all the clock transistors contribute the same load capacitance on the clock bus and the power consumed by the clock buffer can be ignored (N is a large value), the clock load capacitance of our proposed dynamic logic structure is about 2/3 of that of the TSPC-based scheme.

It should, however, be noted that, during the evaluation phase, an NMOS clock transistor is stacked in the inverter for both the proposed and TSPC-based schemes, which could increase the evaluation delay when the output is pulled down. In order to obtain the same delay as the conventional domino logic circuit, the NMOS transistor's channel width should be increased. Although increasing the transistor size leads to more power consumption, and up to 10% of the power saving is eventually lost due to the larger parasitic capacitance, the TSPC-based scheme suffers even more from this issue due to its larger clock load capacitance. Some of the conventional circuit techniques such as progressive sizing of the stacked transistors can help improve the delay by about 20% [17]. The static power dissipation is simulated by fixing the input logic to '0' and feeding a 4 GHz clock signal. When N is 64 using a 45 nm CMOS technology, the static power for the TSPC-based dynamic logic circuit is about 397 μW while it is 344 μW for the proposed circuit, resulting in about 13% power saving. It should be mentioned that the static power in this section means the power consumption when the input logic is fixed to zero. Unlike the leakage current in a static combinational logic circuit, this leakage is mainly due to the activity of the clock. Therefore, this leakage is much larger than that in a static logic circuit.

A real ripple carry adder is also simulated using 0.18 μm technology. The adder widths in this benchmark are 4-bit, 8-bit, 16-bit, 32-bit and 64-bit. The power consumption and delay comparison is summarized in Table 4, indicating about 30% power saving and a slightly shorter delay. The load capacitance is 10 fF, the clock frequency is 50 MHz and the input signal activity is 15 MHz. The delay of both 64-bit full adders from the least significant bit (LSB) input to the most significant bit (MSB) output

Fig. 16. Schematic of an example of an adder cell implemented in our test chip.
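
The clock-load and static-power figures quoted in this section can be checked with simple arithmetic. The sketch below assumes, following the denominator of Eq. (2), that the TSPC-based gate loads the clock with two NMOS gates and one PMOS gate while the proposed gate uses one of each; this split and all function names are assumptions made for illustration only.

```python
def clock_load(n_nmos, n_pmos, c_n, c_p):
    """Total gate capacitance seen by the clock for a given transistor mix."""
    return n_nmos * c_n + n_pmos * c_p


def load_ratio(c_n, c_p):
    """Proposed clock load over TSPC clock load (assumed 1N+1P versus 2N+1P)."""
    return clock_load(1, 1, c_n, c_p) / clock_load(2, 1, c_n, c_p)


def slew_rate_improvement(c_n, c_p):
    """Right-hand side of Eq. (2): C_N / (2*C_N + C_P)."""
    return c_n / (2 * c_n + c_p)


def relative_saving(reference, proposed):
    """Fractional reduction of `proposed` with respect to `reference`."""
    return (reference - proposed) / reference


if __name__ == "__main__":
    # Equal per-transistor load, as assumed in the text for the 2/3 figure.
    print(f"clock load ratio (C_P = C_N):    {load_ratio(1.0, 1.0):.3f}")
    # C_P = 2*C_N, the case used in the text for Eq. (2).
    print(f"Eq. (2) value (C_P = 2*C_N):     {slew_rate_improvement(1.0, 2.0):.2%}")
    # Static power values quoted in the text (same unit for both).
    print(f"static power saving (397 -> 344): {relative_saving(397.0, 344.0):.1%}")
```

With equal per-transistor loads the ratio is 2/3, Eq. (2) evaluates to 25% for C_P = 2C_N, and the two quoted static power values differ by roughly 13%, in line with the numbers stated in the text.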
Table 4
Full adder power consumption and delay comparison between the proposed scheme and TSPC in 0.18 μm technology.

Fig. 15. The building block for power comparison between (A) the proposed dynamic logic circuit and (B) the TSPC-based logic circuit.
Fig. 18. (Upper) The measurement setup and (lower) the microphotograph of the chip.

Fig. 19. Experimental results showing the transient response of an adder cell implemented using conventional domino logic.

Fig. 20. Experimental results showing the transient response of an adder cell implemented using the proposed PDB-based domino logic.

Fig. 21. Comparison of the measured power consumption for the conventional versus proposed adder circuits as a function of the clock frequency for an external load capacitance of 100 fF. (Axes: clock frequency, 0 to 1000 MHz; power consumption, 0 to 2500 μW. Curves: conventional domino logic, proposed domino logic.)

Fig. 22. Measured versus simulated power consumption for the conventional domino logic as a function of the load capacitance, when the input logic is high and the clock frequency is 1 MHz. (Axes: load capacitance, 0 to 5000 fF; power consumption, 0 to 60 μW. Curves: measurement results, simulation results.)
(lower picture). The test structures were successfully tested, and Figs. 19 and 20 illustrate the silicon output transient waveforms from the conventional and proposed adder cells, respectively. In this measurement, the clock frequency is set to 1 MHz, the switching frequency of the input logic is set to 1 kHz and the supply voltage is set to 3.3 V. For the conventional domino logic, in each clock cycle the precharge pulse is propagated to the output node when the input logic is high. In the proposed design, the precharge pulse propagation is avoided, as depicted in Fig. 20.

Fig. 21 illustrates the relationship between the clock frequency and the power consumption for both the conventional and the proposed domino adder circuit for an external load capacitance of 100 fF and a 1:1 clock duty ratio. The power consumption is proportional to the clock frequency and a consistent power saving of up to 45% is obtained. The leakage current is only about 17 nA, which can be ignored.

In Figs. 22 and 23, the power consumption of the conventional design versus the proposed one is shown for different loading conditions. The consumed power is a linear function of the load capacitance. Note that the proposed structure has much lower power consumption, benefiting from the absence of precharge pulse propagation at the output node. As a result, the power saving is roughly proportional to the output load capacitance, which is consistent with Eq. (1). One can note from Fig. 23 that the power does not increase linearly as a function of the output load capacitance. This is explained by the fact that the output activity is reduced, since the proposed buffer isolates the large capacitive node from the dynamic node of the gate.

Fig. 24 shows the power consumption for both the conventional and the proposed designs for different supply voltages. The clock frequency is set to 1 MHz and the load capacitance is 0.5 pF. As expected, the power consumption is proportional to V_dd^2 and the power saving is roughly independent of the supply voltage. In this 0.35 μm CMOS technology, the transistors enter the subthreshold region when V_dd < 0.5 V. Therefore, the proposed domino logic can operate properly even in the subthreshold region with considerable power saving.
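
The supply-voltage behaviour described above follows from the standard switching-power relation P = α·C·V_dd²·f. The sketch below is a minimal illustration rather than the paper's measurement model: the activity factor and the effective switched capacitance of the proposed gate (C_PROP) are hypothetical values, while the 1 MHz clock and 0.5 pF load are taken from the text.

```python
import math


def dynamic_power(c_farads, vdd, f_hz, activity=1.0):
    """Switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_farads * vdd ** 2 * f_hz


def saving(c_conventional, c_proposed, vdd, f_hz):
    """Fractional power saving when less capacitance is switched per cycle."""
    p_conv = dynamic_power(c_conventional, vdd, f_hz)
    p_prop = dynamic_power(c_proposed, vdd, f_hz)
    return (p_conv - p_prop) / p_conv


if __name__ == "__main__":
    F_CLK = 1e6        # 1 MHz clock, as in the measurement
    C_CONV = 0.5e-12   # 0.5 pF load, as in the measurement
    C_PROP = 0.3e-12   # hypothetical effective switched capacitance
    for vdd in (1.0, 2.0, 3.3):
        p = dynamic_power(C_CONV, vdd, F_CLK)
        print(f"Vdd = {vdd} V: sqrt(P)/Vdd = {math.sqrt(p) / vdd:.3e}, "
              f"saving = {saving(C_CONV, C_PROP, vdd, F_CLK):.0%}")
```

Because V_dd² cancels in the ratio, the modelled saving does not depend on the supply voltage, and sqrt(P) is linear in V_dd, which is the form plotted in Fig. 24.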
Fig. 24. Square root of the power consumption as a function of the supply voltage. (Axes: supply voltage in V, 0 to 3.5; square root of the power in nW^1/2. Curves: simulation results, measurement results, proposed design.)

5. Conclusion

In a conventional footed domino logic circuit, the precharge pulse is propagated to the output stage, which substantially increases the power and limits the cascading performance of the gate.

Acknowledgment

The work described in this paper is supported by a research grant from the research grant council of Hong Kong, RGC Grant

References

[1] M. Anders, S. Mathew, B. Bloechel, S. Thompson, R. Krishnamurthy, K. Soumyanath, S. Borkar, A 6.5 GHz 130 nm single-ended dynamic ALU and instruction-scheduler loop, IEEE ISSCC (2002) 410–411.
[2] Xu-guang Sun, Zhi-gang Mao, Feng-chang Lai, A 64 bit parallel CMOS adder for high performance processors, in: Proceedings of the IEEE Asia-Pacific Conference on ASIC, 2002, pp. 205–208.
[3] R.H. Krambeck, C.M. Lee, H.-F.S. Law, High-speed compact circuits with CMOS, IEEE Journal of Solid-State Circuits SC-17 (3) (1982) 614–619.
[4] Hwang Wei, R.V. Joshi, W.H. Henkels, A 500-MHz, 32-word × 64-bit, eight-port self-resetting CMOS register file, IEEE Journal of Solid-State Circuits Jan. (1999) 56–67.
[5] B. Amrutur, M. Horowitz, Fast low-power decoders for RAMs, IEEE Journal of Solid-State Circuits 36 (Oct.) (2001) 1506–1515.
[6] A. Bhavnagarwala, S.V. Kosonocky, S.P. Kowalczyk, R.V. Joshi, A transregional CMOS SRAM with single, logic VDD and dynamic power rails, in: IEEE Symposium on VLSI Circuits, 2004, pp. 291–293 (June).
[7] K.J. Nowka, T. Galambos, Circuit design techniques for a gigahertz integer microprocessor, in: IEEE International Conference on Computer Design, 1998, pp. 11–16, Aug.
[8] R. Heald, et al., A third-generation SPARC V9 64-b microprocessor, IEEE Journal of Solid-State Circuits 5 (2000) 1526–1538.
[9] Neil H.E. Weste, David Harris, Principles of CMOS VLSI Design: A System Perspective, 3rd ed., Addison-Wesley, 2004.
[10] Tyler Thorp, Dean Liu, Pradeep Trivedi, Analysis of blocking dynamic circuits, IEEE Transactions on VLSI Systems (2003) 744–749.
[11] Yolin Lih, Nestoras Tzartzanis, William W. Walker, A leakage current replica Keeper for dynamic circuits, IEEE Journal of Solid-State Circuits 42 (1) (2007) 48–55.
[12] F. Mendoza-Hernandez, M. Linares-Aranda, V. Champac, Noise-tolerance improvement in dynamic CMOS logic circuits, in: Proceedings of the IEE Circuits, Devices and Systems, vol. 153, 2006, pp. 565–573 No. 6, Dec.
[13] Y. Ji-Ren, I. Karlsson, C. Svensson, A true single-phase-clock dynamic CMOS circuit technique, IEEE Journal of Solid-State Circuits 22 (Oct.) (1987) 899–901.
[14] W.S. Song, M.M. Vai, H.T. Nguyen, High-performance low-power bit-level systolic array signal processor with low-threshold dynamic logic circuits, in: Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, 2001, pp. 144–147.
[15] Jinn-Shyan Wang, Ching-Rong Chang, Chingwei Yeh, Analysis and design of high-speed and low-power CMOS PLAs, IEEE Journal of Solid-State Circuits 36 (8) (2001) 1250–1262.
[16] V. Kursun, E.G. Friedman, Domino logic with variable threshold voltage keeper, IEEE Transactions on VLSI Systems 11 (6) (2003) 1080–1093.
[17] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.