
INTEGRATION, The VLSI Journal: Fang Tang, Amine Bermak, Zhouye Gu


INTEGRATION, the VLSI journal 45 (2012) 395–404

Contents lists available at SciVerse ScienceDirect

INTEGRATION, the VLSI journal


journal homepage: www.elsevier.com/locate/vlsi

Low power dynamic logic circuit design using a pseudo dynamic buffer
Fang Tang *, Amine Bermak, Zhouye Gu
The Hong Kong University of Science and Technology, ECE Department, Clear Water Bay, Kowloon, Hong Kong

Article info

Article history:
Received 17 January 2011
Received in revised form 30 August 2011
Accepted 30 August 2011
Available online 20 October 2011

Keywords:
Dynamic logic
Pseudo dynamic buffer
Precharge pulse
Low power domino logic

Abstract

In this paper, we propose a pseudo dynamic buffer (PDB) for footed domino logic circuit implementation. Using the proposed PDB structure, the output pulse during the precharge process is prevented from propagating to the output stage, as happens in the conventional case. As a result, up to half of the power is saved compared to a conventional domino gate, while the sampling window of the dynamic gate is improved. This PDB structure is applicable not only to pull-down network (N-type) dynamic logic, but also to pull-up networks (P-type). Simulation results illustrate improved performance of the proposed scheme compared to conventional dynamic logic for different loading conditions, clock frequencies and logic functions. In addition, our proposed design reduces the clock loading from the conventional three transistors to two. As a result, the proposed scheme saves significant power due to the lower load capacitance on the clock bus. Test structures are fabricated in 0.35 μm CMOS technology. Measurement results validate the proposed concept and illustrate the power saving compared to the conventional design.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Dynamic logic has been very widely used in a large number of applications such as high speed digital logic [1–3], memory [4–6] as well as high performance microprocessor design [7,8]. This logic family offers a number of interesting features compared to static logic, namely reduced transistor count (almost half compared to static complementary logic) as well as reduced load capacitance and hence improved speed. The operation of a dynamic logic gate is controlled by a clock signal and can be implemented in either Pull-up (P-type) or Pull-down (N-type) configurations [9]. The voltage at the output of the dynamic circuit is stored on a parasitic capacitance, which is typically buffered before it is sent to the next stage. This temporary voltage is affected not only by charge sharing of the internal parasitic capacitances [10], but also by the subsequent dynamic circuit. Normally, a buffer at the output of the dynamic logic is required to drive the next stage. A typical domino gate [9] consists of a P-type or N-type network followed by a static inverter. In general, a Keeper (or bleeder) is added in order to alleviate charge sharing problems [11]. Since the output of the dynamic gate is sampled on parasitic capacitances, periodic precharge phases of the output node are required. Although dynamic logic circuits benefit from smaller area and higher speed, significant power is wasted due to these periodic precharge phases. Additionally, these precharge pulses introduce extra noise in the gate and the propagation of these pulses through the static buffer results in extra power consumption [12]. The TSPC-based dynamic logic circuit proposed in [13] can significantly reduce the propagated noise, since the buffer is disabled during the precharge phase by an extra stacked clock transistor. However, this additional clock transistor increases the load capacitance of the clock signal and eventually, extra power is consumed due to larger clock loading. In this paper, we propose a pseudo dynamic buffer (PDB) for the footed clock controlled dynamic logic circuit. Using this PDB structure, the precharge pulse is blocked at the input of the buffer and is prevented from being propagated to the output of the dynamic gate. As a consequence, power typically consumed in the buffer during the precharge phase is saved. Additionally, compared to TSPC-based dynamic logic, in our scheme the clock transistor count is reduced from 3 to 2. As a result, the power consumption of our proposed scheme is significantly reduced due to lower load capacitance on the clock bus.

This paper is organized as follows. In Section 2, the conventional domino logic gate is reviewed and the proposed dynamic logic gate based on the pseudo dynamic buffer is introduced. Section 3 presents a thorough performance analysis of the proposed gate in terms of power consumption as well as cascading and charge sharing properties. Section 4 presents simulation and experimental results of the proposed logic structure. Section 4 also provides a performance comparison of the proposed dynamic gate against the conventional structure for different logic gates and under different loading and frequency assumptions. Comparison results are based on both simulation and experimental measurements from the fabricated test structures. Section 5 presents a conclusion.

* Corresponding author.
E-mail address: bermak@ieee.org (F. Tang).

0167-9260/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.vlsi.2011.08.003
396 F. Tang et al. / INTEGRATION, the VLSI journal 45 (2012) 395–404
2. PDB-based domino logic

2.1. Conventional domino logic

Fig. 1A shows the schematic of a conventional footed clock controlled domino logic circuit, which consists of a dynamic N-type gate (Pull-down network, PDN) followed by a static inverter. Fig. 1B illustrates the implementation of a domino logic buffer. The gate operates in two phases, namely the precharge and evaluation phases. During the precharge phase the clock signal clk is pulled low, thus turning on the PMOS transistor M1, which precharges the dynamic node Z. During the evaluation phase, the clock signal clk is pulled high, thus turning on the NMOS transistor M2. When the input A is low, the logic at node Z is kept high regardless of the operating phase. However, when the input A is high, the two phases (evaluation and precharge) should be discussed as depicted in Fig. 2. During the precharge phase, node Z is charged up to Vdd as well as node B. The voltage at node F drops down to '0', resulting in a propagation of the precharge pulse to the output of the buffer.

Fig. 1. (A) A typical domino logic circuit using the conventional buffer. The buffer is a static inverter, which connects the source of M5 to Gnd. (B) Domino buffer implementation example.

Fig. 2. Timing diagram of the conventional domino logic circuit, when input logic A is '1'. Note that the precharge pulse is propagated through the buffer.

The propagation of the precharge pulse from node Z through the static buffer results in increased power consumption. In addition, the output logic is unstable during the precharge phase and as a result the cascading performance is limited [8]. A number of solutions were proposed to deal with this issue of precharge pulse propagation [12,13]. Amongst other solutions, the TSPC dynamic logic scheme proposed by Ji-Ren et al. [13] overcomes such a problem using NC²MOS or PC²MOS, as shown in Fig. 3, but at the expense of an extra transistor as compared to a domino gate. In this gate, the dynamic node Z is precharged high and M6 is disabled. As a result, the output F holds its previous value [13]. However, the TSPC design suffers from the drawback of doubling the load capacitance for the clock signal, which results in increased power induced in the clock distribution network. This power constitutes a very significant component in modern digital processor design [14]. Large clock loading will cause a slow clock edge and therefore, extra power is consumed when transistors M1 and M2 are turned on at the same time, leading to an increase in short circuit current during the clock signal transition.

Fig. 3. TSPC dynamic logic circuit using NC²MOS output buffer. The output node F holds its previous value and is isolated from the precharge pulse at node Z during the precharge phase.

2.2. Proposed PDB for domino logic

The previous section illustrates the issue of performance degradation due to the propagation of the precharge pulse inherent in domino logic gates. The proposed PDB-based implementation overcomes this problem using the circuit structure shown in Fig. 4. In the proposed implementation of the buffer, the source of the buffer's NMOS transistor M5 is connected to node B instead of Gnd. Using such a circuit topology, the value at node Z cannot propagate to the output F during the precharge phase of the gate since during this phase, the evaluation transistor M2 is turned off. For our proposed gate, when the input logic A is low, the floating node Z is always high and then, the output node F is kept low regardless of the operating phase. On the other hand, if the input A is high, the precharge and evaluation phases will lead to the following situation:

• During the evaluation phase, node Z is discharged to Gnd as well as node B, resulting in enabling the PMOS transistor M4, while pulling up the output F to Vdd.
• During the precharge phase, node Z is charged up to Vdd, followed by the voltage at node B. Since the NMOS evaluation transistor M2 is disabled, the output node F is held high (same value as the previous evaluation phase).

The timing diagram of the proposed circuit for the case when the input A is high is illustrated in Fig. 5. It is also worth mentioning that a voltage drop at node F is assumed with an amplitude of V'pp (ideally V'pp = 0). It is important to note that

during the precharge phase, the output node F is isolated from Gnd. In other words, the precharge pulse at node Z cannot propagate through the buffer to the output node F.

Fig. 4. Domino logic circuit using the proposed pseudo dynamic buffer. The source of the buffer's NMOS transistor M5 is connected to node B instead of Gnd.

Fig. 5. Timing diagram of the proposed domino logic circuit. Note that there is an expected small voltage drop V'pp at the output node F during the precharge phase.

3. Performance analysis

3.1. Power consumption analysis

In the proposed domino logic, the precharge pulse is prevented from propagating to the output node of the buffer, resulting in decreased current consumption in the output stage of the domino gate. Ideally, if the precharge pulse propagation is completely prevented and the input logic is fixed to '1', the power saving η of the proposed scheme compared to a conventional domino logic can be approximately given as

$$\eta = \frac{P_{tot} - P'_{tot}}{P_{tot}} = \frac{C_{Load}}{C_p + C_{Load}} \qquad (1)$$

The power saving of the proposed scheme comes from reducing the output node activity. Typically there are three limits which can degrade the power saving. The first one is the logic activity rate. The power saving is maximized when the input logic is fixed to '1', because the output node F is also fixed to '1' and capacitors Cbuf and Cload are not charged. When the logic activity is increased, the output stage consumes more power and as a result, the power saving of the proposed scheme is reduced. The second non-ideal restriction is due to the internal capacitance CB. In the conventional scheme, the source of the output stage NMOS transistor is always connected to ground. Therefore, the parasitic capacitor at this source node does not consume power. However, in the proposed domino scheme, the capacitance of node B consists of the parasitic capacitors of both the first stage NMOS transistors and the output stage NMOS transistor. As a result, the capacitive load in the proposed scheme is increased, leading to a larger power consumption. The third limit is charge sharing, which is mainly caused by a finite clock slew rate. During the clock transition from '1' to '0', node B will be charged from both node Z and node F, leading to output charge sharing.

As discussed above, when comparing with the conventional domino logic, the power saving of the proposed scheme comes from the reduced activity of the buffer output. Unlike the conventional domino logic, TSPC does not propagate the precharge pulse, which is similar to the proposed scheme. Therefore, the power saving discussed above is not applicable to TSPC. As shown in Figs. 3 and 4, TSPC has three clock transistors, while the proposed scheme has only two. This extra clock transistor leads to a larger clock load and a slower clock slew rate. When both the PMOS and NMOS clock transistors are turned on due to the slow clock slew rate, an extra large short circuit current will be consumed if the input logic is high. Thus, the clock slew rate improvement can be simply estimated as

$$\frac{SR_{PRO} - SR_{TSPC}}{SR_{TSPC}} = \frac{C_N}{2C_N + C_P} \qquad (2)$$

where C_N and C_P are the gate parasitic capacitances of the NMOS and PMOS transistors, respectively. If C_P = 2C_N when considering a larger PMOS transistor size, the clock slew rate improvement is 25%. In Section 4.2, a detailed simulation of the power saving will be presented.

3.2. Charge sharing analysis

Fig. 6. Charge sharing of the proposed scheme.

Similarly to any dynamic logic circuit, the proposed scheme also suffers from charge sharing if the input logic is high during the precharge phase. This charge sharing is mainly introduced by the parasitic capacitance at node B, indicated in Fig. 6. To ensure an accurate operation, the voltage drop Vdrop at node Z should be within the noise margin. This voltage drop could be propagated to

the next stage and cause more serious charge sharing problems requiring special attention.

The charge sharing problem could be alleviated by using a number of solutions, namely:

• Solution A: using dual power supply techniques. Vdd2, used for supplying the pseudo dynamic buffer, has a higher voltage than Vdd1, by which the voltage drop at the output node due to charge sharing could be compensated.
• Solution B: increasing the channel length of M5, resulting in more current flowing to node B through path I instead of path II. This will result in a smaller voltage drop at node F.
• Solution C: increasing the load capacitance at node F. Obviously, if C_F ≫ C_B, the charge sharing effect could be minimized.
• Solution D: using the Keeper design.

Solution A uses a dual power supply to compensate for the voltage drop at the output node. Using such an approach, no extra parasitic or load capacitance is required and ideally, the charge sharing can be completely eliminated. However, because the PMOS transistor of the buffer cannot be fully turned off (|V_gsp| > V_dd2 − V_dd1 > 0), a large leakage current is an issue that needs to be addressed in the case of the dual supply solution, particularly in deep submicron CMOS technologies. For solution B, by increasing the channel length of M5, a larger amount of charge is shared through M3 instead of M5 and as a result, the charge sharing at the output node can be alleviated. However, this method cannot completely solve the charge sharing issue and it also increases the delay. Solution C increases the load capacitance at the output node and involves extra power consumed proportionally to the load capacitance. Similarly to solution B, this approach also suffers from the charge sharing issue. Solution D, which relies on using a Keeper transistor, as shown in Fig. 7, is commonly used to solve the charge sharing problem in conventional dynamic logic circuits [16]. This solution is potentially interesting as there is no static short circuit current consumed, compared to solution A. In addition, compared to solutions B and C, the voltage drop at the output node can be completely eliminated in the case of solution D, while having less effect on the delay since fewer extra parasitic capacitances are involved.

Fig. 7. A Keeper design scheme solving charge sharing. A short circuit path is introduced during the evaluation phase.

The simulation results for charge sharing with and without a Keeper are shown in Fig. 8. It is clear that, when using a Keeper at the output node, the output voltage returns to Vdd after a very short oscillation due to the Keeper feedback. This oscillation is very short and small, which can guarantee a good stability. When using 0.18 μm technology and setting Vdd = 1.8 V, clock frequency fclk = 200 MHz, input activity fin = 100 MHz, Cload = 1 fF, the power consumption of a buffer without Keeper is 2.9 μW while the power consumption with a Keeper is 5.5 μW, indicating a 90% power consumption overhead.

Fig. 8. A transient simulation of the proposed domino logic with/without a Keeper.

3.3. Cascading analysis

Fig. 9. The timing diagram for a cascade of gates in the conventional domino logic. Later stages cannot generate a correct output value, due to a limited evaluation window.

Fig. 9 presents the output timing diagram for the conventional cascading domino logic without using the clock delay technique as required in conventional domino logic circuits [15]. In conventional schemes, only half of the clock cycle is used for the evaluation phase and the output voltage will be reset to Gnd during the precharge phase. This results in a limited evaluation window as illustrated in Fig. 9. If the clock frequency is very high,

Fig. 10. The critical path of the proposed domino logic when the input logic A is changed from '0' to '1' during the precharge phase.

Fig. 12. P-type domino circuit using the proposed pseudo dynamic buffer. The source of transistor M4 is connected to node B, instead of Vdd.

Table 1
Power and delay savings comparison for different clock frequencies in 0.18 μm technology (25 fF load, Vdd = 1.8 V).

Clock frequency (MHz)        500     250     150     100     Delay
Power (μW), conventional     63      29.6    20      13.8    0.58 ns
Power (μW), this work        34      16.56   11.6    8.3     0.48 ns
Saving                       46%     44%     42%     40%     17%
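The saving row of Table 1 can be reproduced from its power and delay entries. The following is a minimal sketch, assuming the saving is defined as (conventional − proposed) / conventional, in line with the definition of η in Eq. (1); the constant and function names are illustrative and do not come from the paper.

```python
# Values transcribed from Table 1 (25 fF load, Vdd = 1.8 V).
# Power units follow the table; only ratios are used here.
CLOCK_MHZ = [500, 250, 150, 100]
POWER_CONVENTIONAL = [63.0, 29.6, 20.0, 13.8]
POWER_PROPOSED = [34.0, 16.56, 11.6, 8.3]
DELAY_NS = {"conventional": 0.58, "proposed": 0.48}


def relative_saving(conventional: float, proposed: float) -> float:
    """Fractional reduction of `proposed` relative to `conventional` (assumed definition)."""
    return (conventional - proposed) / conventional


def table1_savings() -> dict:
    """Map each clock frequency (MHz) to the power saving in percent."""
    return {
        freq: 100.0 * relative_saving(conv, prop)
        for freq, conv, prop in zip(CLOCK_MHZ, POWER_CONVENTIONAL, POWER_PROPOSED)
    }


if __name__ == "__main__":
    for freq, saving in table1_savings().items():
        print(f"{freq} MHz: {saving:.1f}% power saving")
    delay_saving = 100.0 * relative_saving(
        DELAY_NS["conventional"], DELAY_NS["proposed"]
    )
    print(f"carry delay: {delay_saving:.1f}% shorter")
```

Rounded to whole percent, the computed values agree with the saving row of the table for all four frequencies and for the delay column.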
later stages (for example: O4) cannot generate correct output values.

For the proposed PDB-based domino logic circuit, evaluated output results are held during the precharge phase. For later stages, the output value can be generated in the precharge phase by enabling transistors M3 and M5 (Fig. 10). The critical signal path is demonstrated in Fig. 10. During the precharge phase in cycle 1, O4 is finally pulled high to Vdd − Vth. This feature enables the result for a long cascade of domino gates to be settled even during the precharge phase (Fig. 11). This feature, although unconventional, allows the correct output to be retrieved in the subsequent clock cycle. For the proposed structure and similarly to conventional domino logic, an input logic change from '1' to '0' can only be sensed during the next clock cycle.

Fig. 11. The timing response of the proposed cascading domino logic. Results of later stages in the cascade can be generated in the precharge phase and as a result the evaluation window can be extended.

3.4. P-type implementation

A domino logic chain consists of alternating N-type and P-type modules. The pseudo dynamic buffer proposed in Fig. 4 is only valid for N-type domino logic. For P-type logic, the proposed circuit structure is also feasible, as depicted in Fig. 12. For P-type domino logic, the buffer configuration is very similar to the N-type design, and is obtained by connecting the source of the buffer's PMOS transistor M4 to the drain of the PMOS evaluation transistor M1.

4. Simulation and experimental results

4.1. Simulation and comparison with conventional dynamic logic

For a 1:1 duty ratio of the clock signal, the power saving for a 1-bit adder using minimum-feature-size transistors with a load capacitance of 25 fF is evaluated in 0.18 μm technology for different clock frequencies and the results are summarized in Table 1. The results show that the higher the operating frequency, the higher the power saving. However, ideally, the maximum power saving is normally limited to within 45%, since the power consumptions of the first stage for the proposed scheme and the conventional domino logic are similar. The carry output delay time in this simulation is provided in the table. It should be noted that domino logic inputs can only transition from 0 to 1 in the evaluation phase. During this transition, the proposed scheme shows a faster settling speed than the conventional domino logic. This faster speed comes from an extra charge transfer path from node B to node F through M5. As a result, the charge in node Z can leak faster. A smaller peak voltage at node B can be observed in the proposed scheme, which also reflects this faster settling.

Similarly, higher load capacitance leads to increased power saving using our proposed architecture. For a 1:1 duty ratio of the clock signal, the power saving for the same 1-bit adder with a clock frequency of 500 MHz is evaluated for different load capacitances and the results are summarized in Table 2.

In order to validate the claimed power saving of our proposed domino logic, extensive simulations were performed using a set of logic circuits as illustrated in Table 3. In this comparison, the clock frequency, the supply voltage and the load capacitance were set to 500 MHz, 1.8 V and 10 fF, respectively. Two different scenarios of input logic activity are simulated when the input logic activity is 50% of the clock signal and for the case when the

Table 2
Power and delay savings comparison for different load capacitances in 0.18 μm technology (500 MHz clock, Vdd = 1.8 V).

Load capacitance (fF)        10      25      50      100
Power (μW), conventional     56.2    63      68.5    76.5
Power (μW), this work        32.6    34      35.6    39
Power saving                 42%     46%     48%     49%
Delay (ns), conventional     0.38    0.58    0.9     1.5
Delay (ns), this work        0.3     0.48    0.8     1.4
Fig. 13. The transient simulation waveforms of the clock signals for the proposed scheme and the TSPC-based logic circuit when N equals 64.

Table 3
Power savings comparison with different logic functions in 0.18 μm technology (Vdd = 1.8 V, 500 MHz).

Logic function       Power (conv.)   Power (pro.)   Power saving       Power saving
                     (μW)            (μW)           (when k = 0) (%)   (when k = 50%) (%)
Z = AB               18.4            9              51                 24
Z = A + B            16.1            7.74           52                 24
Z = ABC              22              12.3           44                 21
Z = A + B + C        17.5            9.6            45                 25
Z = ABCD             29.1            15.7           46                 24
Z = A + B + C + D    21.9            11.6           47                 22
Z = ABC + D          20.2            11.3           44                 20
Z = AB + CD          20.9            11.9           43                 24
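The entries of Table 3 can be cross-checked numerically. The sketch below assumes the saving is (conventional − proposed) / conventional; under that assumption the two power columns reproduce the k = 0 saving column, which suggests the listed powers correspond to the static-input case. The average of each saving column is also computed to compare the two activity scenarios. The data layout and function names are illustrative only.

```python
# Rows transcribed from Table 3:
# (logic function, conventional power, proposed power,
#  reported saving at k = 0 in %, reported saving at k = 50% in %)
TABLE3 = [
    ("AB", 18.4, 9.0, 51, 24),
    ("A+B", 16.1, 7.74, 52, 24),
    ("ABC", 22.0, 12.3, 44, 21),
    ("A+B+C", 17.5, 9.6, 45, 25),
    ("ABCD", 29.1, 15.7, 46, 24),
    ("A+B+C+D", 21.9, 11.6, 47, 22),
    ("ABC+D", 20.2, 11.3, 44, 20),
    ("AB+CD", 20.9, 11.9, 43, 24),
]


def saving_percent(conventional: float, proposed: float) -> float:
    """Power saving in percent relative to the conventional gate (assumed definition)."""
    return 100.0 * (conventional - proposed) / conventional


def summarize(rows):
    """Return (recomputed savings, mean reported saving at k=0, mean at k=50%)."""
    recomputed = [round(saving_percent(conv, prop)) for _, conv, prop, _, _ in rows]
    mean_static = sum(row[3] for row in rows) / len(rows)
    mean_active = sum(row[4] for row in rows) / len(rows)
    return recomputed, mean_static, mean_active


if __name__ == "__main__":
    recomputed, mean_static, mean_active = summarize(TABLE3)
    for (name, *_), value in zip(TABLE3, recomputed):
        print(f"Z = {name}: recomputed saving {value}%")
    print(f"mean reported saving, k = 0:   {mean_static:.1f}%")
    print(f"mean reported saving, k = 50%: {mean_active:.1f}%")
```

Across the eight functions, the reported saving averages 46.5% for a static input and 23.0% at 50% input activity, so the saving roughly halves when the output stage toggles more often.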
input is static (k = 0). The results for both cases are summarized in Table 3.

4.2. Comparison with TSPC-based dynamic logic

Since the functionality of the proposed dynamic logic circuit is similar to TSPC, the power comparison between these two structures is very important in order to evaluate the advantage of our proposed scheme. For the sake of comparison, the circuit used is illustrated in Fig. 15. For both design schemes, the clock signal is propagated through the clock buffer and then drives N dynamic logic gates (N dynamic buffers). The clock buffer has two inverter stages with transistor sizes of 10 μm/3 μm and 30 μm/10 μm, respectively. The other transistors are designed with the minimum size. The input signal A is fixed to '1' to maximize the power consumption. In order to estimate the power of the clock distribution network, the overall power estimated here includes not only the power consumed by the dynamic logic circuit, but also the power dissipated by the clock buffer. The value of N equivalently represents the load capacitance of the clock signal. For each TSPC dynamic logic gate, there are three transistors driven by the clock, compared with only two clock transistors for our proposed scheme. As a result, the total load capacitance of the clock signal for the TSPC-based logic circuit is much larger. Fig. 13 illustrates a transient simulation showing the slew rate difference between these two schemes, when N equals 64.

When the clock signal is within the range from Vth to Vdd − Vth, there exists a short circuit path when the input signal A is high, resulting in a large short-circuit current. Since the clock slew rate determines how long the short circuit lasts, the TSPC unit will consume more power compared to our proposed dynamic logic circuit.

Fig. 14. Power consumption comparison between the proposed PDB-based logic unit and the TSPC logic unit using CMOS 90 nm process (A), CMOS 45 nm process (B). Curve A is the power saving; curve B is the power consumption of the TSPC-based dynamic logic circuit and curve C is the power consumption of the proposed scheme. Axes: power consumption (μW) and power saving (%) versus logic unit count N.

In Fig. 14, the simulated power consumption as a function of N is represented, based on more advanced CMOS processes, namely 45 nm and 90 nm CMOS, and for a clock frequency of 4 GHz with a 1.2 V Vdd. It should be mentioned that when N is small, the power consumed by the clock buffer is significant in the total power consumption. As N is increased, the power consumed by the clock buffer can be ignored and thus, the power ratio between our proposed scheme and the TSPC-based scheme approximately equals the ratio of their clock load capacitance values.
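The trend described in this paragraph can be illustrated with a simple clock-load count. The sketch below takes the clock transistor counts stated earlier in this section (three per TSPC gate, two per proposed gate) and assumes, purely as a simplification that is not a result from the paper, that every clock transistor presents the same gate capacitance and that the clock buffer behaves like a fixed equivalent load. The function names and the `buffer_load` parameter are hypothetical.

```python
# Clock transistors per gate, as stated in Section 4.2.
CLOCK_TRANSISTORS = {"tspc": 3, "proposed": 2}


def clock_load(scheme: str, n_gates: int, buffer_load: float = 0.0) -> float:
    """Total clock load in units of one clock-transistor gate capacitance.

    `buffer_load` is a hypothetical fixed term standing in for the clock
    buffer, expressed in the same unit.
    """
    return buffer_load + CLOCK_TRANSISTORS[scheme] * n_gates


def load_ratio(n_gates: int, buffer_load: float = 0.0) -> float:
    """Ratio of proposed-scheme clock load to TSPC clock load."""
    return clock_load("proposed", n_gates, buffer_load) / clock_load(
        "tspc", n_gates, buffer_load
    )


if __name__ == "__main__":
    # buffer_load = 30 is an arbitrary illustrative value, not a measured one.
    for n in (1, 8, 64, 1024):
        print(f"N = {n:5d}: load ratio = {load_ratio(n, buffer_load=30.0):.3f}")
```

With a non-zero fixed term the ratio starts close to 1 for small N and decreases towards 2/3 as N grows, which mirrors the statement that the clock buffer's contribution becomes negligible for large N.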

Assuming all the clock transistors contribute the same load capacitance on the clock bus and the power consumed by the clock buffer can be ignored (N is a large value), the clock load capacitance of our proposed dynamic logic structure is about 2/3 of that of the TSPC-based scheme.

It should, however, be noted that, during the evaluation phase, an NMOS clock transistor is stacked in the inverter for both the proposed and TSPC-based schemes, which could increase the evaluation delay when the output is pulled down. In order to obtain the same delay as the conventional domino logic circuit, the NMOS transistor's channel width should be increased. Although increasing the transistor size leads to more power consumption, and eventually up to 10% of the power saving is lost due to a larger parasitic capacitance, the TSPC-based scheme suffers even more from such an issue due to a larger clock load capacitance. Some of the conventional circuit techniques such as progressive sizing of the stacked transistors can help improve the delay by about 20% [17]. The static power dissipation is simulated by fixing the input logic to '0' and feeding a 4 GHz clock signal. When N is 64 using a 45 nm CMOS technology, the static power for the TSPC-based dynamic logic circuit is about 397 μW while it is 344 μW for the proposed circuit, resulting in about 13% power saving. It should be mentioned that the static power in this section means the power consumption when the input logic is fixed to zero. Unlike the leakage current in a static combinational logic circuit, this leakage is mainly due to the activity of the clock. Therefore, this leakage is much larger than the one in a static logic circuit.

Fig. 16. Schematic of an example of an adder cell implemented in our test chip.

A real ripple carry adder is also simulated using 0.18 μm technology. The adder widths in this benchmark are 4bit, 8bit, 16bit, 32bit and 64bit. The power consumption and delay comparison is summarized in Table 4, indicating about 30% power saving and a slightly shorter delay. The load capacitance is 10 fF, the clock frequency is 50 MHz and the input signal activity is 15 MHz. The delay of both 64bit full adders from the least significant bit (LSB) input to the most significant bit (MSB) output

Table 4
Full adder power consumption and delay comparison between the proposed scheme and TSPC in 0.18 μm technology.

Ripple carry adder width (bit)    64     32     16     8      4      1-bit delay
Power (μW, 1.8 V), TSPC-based     41     20     9.8    4.6    2      410 ps
Power (μW, 1.8 V), this work      27     13     6.5    3.1    1.3    400 ps
Power saving                      34%    34%    33%    33%    36%

Fig. 17. Layout of the test circuit using AMIS 0.35 μm technology.

Fig. 15. The building block for power comparison between (A) the proposed dynamic logic circuit and (B) the TSPC-based logic circuit.

is about 25 ns, when considering a critical path A = 1 -> 0, B[63:1] = 0, B[0] = 1 -> 0 and C = 0.
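As a rough cross-check of the quoted 64-bit delay, the worst-case delay of a ripple carry adder can be approximated as the per-bit carry delay multiplied by the number of bits. The sketch below uses the 1-bit delays listed in Table 4 (410 ps for TSPC, 400 ps for this work); the linear model and the function name are simplifying assumptions and not the paper's simulation method.

```python
# 1-bit delays taken from Table 4, in picoseconds.
PER_BIT_DELAY_PS = {"tspc": 410.0, "proposed": 400.0}


def ripple_carry_delay_ns(n_bits: int, per_bit_ps: float) -> float:
    """Approximate LSB-to-MSB delay of an n-bit ripple carry adder.

    Assumes the carry ripples through every bit with the same delay,
    which ignores input/output stage differences.
    """
    return n_bits * per_bit_ps / 1000.0


if __name__ == "__main__":
    for scheme, per_bit in PER_BIT_DELAY_PS.items():
        delay = ripple_carry_delay_ns(64, per_bit)
        print(f"{scheme}: about {delay:.2f} ns for 64 bits")
```

Under this linear model both schemes land at roughly 26 ns for 64 bits, the same order as the approximately 25 ns quoted above.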

4.3. Experimental results

In order to validate the proposed concept, a prototype chip including a set of test structures was designed using the proposed domino logic as well as the conventional logic style. The test structures were fabricated in AMIS 0.35 μm technology. Figs. 16 and 17 illustrate the schematic and layout of an example of an adder cell implemented in our test chip. Fig. 18 illustrates the device under test (upper picture) and the chip microphotograph

Fig. 18. (Upper) The measurement setup and (lower) the microphotograph of the chip.

Fig. 19. Experimental results showing the transient response of an adder cell implemented using conventional domino logic.

Fig. 20. Experimental results showing the transient response of an adder cell implemented using the proposed PDB-based domino logic.

Fig. 21. Comparison of the measured power consumption (μW) for the conventional versus proposed adder circuits as a function of the clock frequency (MHz) for an external load capacitance of 100 fF.

Fig. 22. Measured versus simulated power consumption (μW) for the conventional domino logic as a function of the load capacitance (fF), when the input logic is high and the clock frequency is 1 MHz.

(lower picture). The test structures were successfully tested, and Figs. 19 and 20 illustrate the silicon output transient waveforms from the conventional and proposed adder cells, respectively. In this measurement, the clock frequency is set to 1 MHz, the switching frequency of the input logic is set to 1 kHz and the supply voltage is set to 3.3 V. For the conventional domino logic, the precharge pulse is propagated to the output node in each clock cycle when the input logic is high. In the proposed design, the precharge pulse propagation is avoided, as depicted in Fig. 20.

Fig. 21 illustrates the relationship between the clock frequency and the power consumption for both the conventional and the proposed domino adder circuits, for an external load capacitance of 100 fF and a 1:1 clock duty ratio. The power consumption is proportional to the clock frequency, and a consistent power saving of up to 45% is obtained. The leakage current is only about 17 nA, which can be ignored.

Figs. 22 and 23 show the power consumption of the conventional design and the proposed one for different loading conditions. For the conventional design, the consumed power is a linear function of the load capacitance. Note that the proposed structure has much lower power consumption, benefiting from the absence of precharge pulse propagation at the output node. As a result, the power saving is roughly proportional to the output load capacitance, which is consistent with Eq. (1). One can note from Fig. 23 that, for the proposed design, the power does not increase linearly as a function of the output load capacitance. This is explained by the fact that the output activity is reduced, since the proposed buffer isolates the large capacitive node from the dynamic node of the gate.

Fig. 24 shows the power consumption for both the conventional and the proposed designs for different supply voltages. The clock frequency is set to 1 MHz and the load capacitance is 0.5 pF. As expected, the power consumption is proportional to Vdd² and the power saving is roughly independent of the supply voltage. In this 0.35 µm CMOS technology, the transistors enter the subthreshold region when Vdd < 0.5 V. Therefore, the proposed domino logic can operate properly even in the subthreshold region with considerable power saving.
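
The choice of plotting the square root of the power against the supply voltage in Fig. 24 follows from the quadratic supply dependence. As a worked step, assume the textbook switching-power form; the symbols α (activity factor), C (switched capacitance), f (clock frequency) and the slopes k_c and k_p are introduced here for illustration and are not taken from Eq. (1), which lies outside this section:

```latex
P = \alpha\, C\, V_{dd}^{2}\, f
\quad\Longrightarrow\quad
\sqrt{P} = \underbrace{\sqrt{\alpha\, C\, f}}_{k}\; V_{dd},
\qquad
\text{saving} = 1 - \frac{P_{p}}{P_{c}} = 1 - \left(\frac{k_{p}}{k_{c}}\right)^{2}
```

Under this assumption, at fixed frequency and load each design appears as a straight line through the origin with slope k, and the relative saving depends only on the ratio of the two slopes (k_p for the proposed design, k_c for the conventional one), not on Vdd. This agrees with the observation that the power saving is roughly independent of the supply voltage.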

[Plot: power consumption (µW, 1.1 to 1.35) versus load capacitance (fF, 0 to 5000); curves: measurement results and simulation results.]

Fig. 23. Measured versus simulated power consumption for the proposed domino logic as a function of the load capacitance, when the input logic is high and the clock frequency is 1 MHz. One can note that the power does not increase linearly for this case as the proposed buffer isolates the large capacitive node from the dynamic node of the gate.

5. Conclusion

In a conventional footed domino logic circuit, the precharge pulse is propagated to the output stage, which substantially increases the power and limits the cascading performance of the gate. This paper proposes a footed pseudo dynamic buffer scheme, which can eliminate the precharge pulse propagation. The proposed technique is feasible not only for N-type network domino logic but also for the P-type one, while significantly reducing power and improving the cascading capability. Moreover, compared to the TSPC scheme, the proposed structure does not require an extra clock-controlled transistor in the buffer, which results in a more compact implementation and lower clock load capacitance. The performance of the proposed logic structure is compared to the traditional domino logic for different clock frequencies and under different loading conditions. The power consumption of our proposed scheme also shows an obvious advantage compared to the TSPC-based dynamic logic in 90 nm and 45 nm CMOS processes. A test chip including test structures is fabricated using AMIS 0.35 µm technology. Simulation and experimental results show proper logic functionality and a power saving of up to 45% compared to the conventional footed domino implementation.

Acknowledgment

The work described in this paper is supported by a research grant from the Research Grant Council of Hong Kong, RGC Grant reference 610507.

[Plot: square root of the power (nW^(1/2), 0 to 90) versus supply voltage (V, 0 to 3.5); simulation and measurement results for the conventional and proposed designs.]

Fig. 24. Square root of the power consumption as a function of the supply voltage.