VLSI Design SoC CH 6

Chapter 6
Designing Combinational Logic

Gates in CMOS
There are numerous circuit styles to implement a
given logic function. As with the inverter, the
common design metrics by which a gate is
evaluated are area, speed, energy and power.
Depending on the application, the emphasis will
be on different metrics. For instance, the switching
speed of digital circuits is the primary metric in a
high-performance processor, while it is energy
dissipation in a battery operated circuit. In
addition to these metrics, robustness to noise and
reliability are also very important considerations.
Combinational vs. Sequential Logic
[Figure: a combinational circuit computes Output = f(In); a sequential
circuit adds a State feedback loop, so Output = f(In, Previous In).]
A static CMOS gate is a combination of two networks, called the
pull-up network (PUN) and the pull-down network (PDN) (see the next
figure). The figure shows a generic N-input logic gate where all
inputs are distributed to both the pull-up and pull-down networks.
The function of the PUN is to provide a connection between the output
and VDD any time the output of the logic gate is meant to be 1 (based
on the inputs).
Similarly, the function of the PDN is to connect the output to VSS
when the output of the logic gate is meant to be 0.
The PUN and PDN networks are constructed in a mutually exclusive
fashion such that one and only one of the networks is conducting in
steady state.
The primary advantages of the CMOS
structure are robustness (i.e., low
sensitivity to noise), good
performance, and low power
consumption with no static power
dissipation.
Static CMOS Circuit
At every point in time (except during the switching
transients) each gate output is connected to either
VDD or VSS via a low-resistance path.
The outputs of the gates assume at all times the value
of the Boolean function, implemented by the circuit
(ignoring, once again, the transient effects during
switching periods).
This is in contrast to the dynamic circuit class, which
relies on temporary storage of signal values on the
capacitance of high impedance circuit nodes.
Complementary logic gate as a
combination of a PUN (pull-up network)
and a PDN (pull-down network).
The PDN is constructed using NMOS devices, while
PMOS transistors are used in the PUN. The primary
reason for this choice is that NMOS transistors produce
“strong zeros,” and PMOS devices generate “strong
ones”.
A set of construction rules can be derived to construct logic functions.
NMOS devices connected in series correspond to an AND function.
With all the inputs high, the series combination conducts and the value at
one end of the chain is transferred to the other end. Similarly, NMOS
transistors connected in parallel represent an OR function.
Using similar arguments, construction rules for PMOS networks
can be formulated. A series connection of PMOS devices conducts if both
inputs are low, representing a NOR function, while PMOS
transistors in parallel implement a NAND.
PMOS Transistors
in Series/Parallel Connection
PMOS switch closes when switch control input is low.
Series (A, B): Y = X if A` AND B` = (A + B)`
Parallel (A, B): Y = X if A` OR B` = (AB)`
PMOS transistors pass a “strong” 1 but a “weak” 0.
Using De Morgan’s theorems,
it can be shown that the pull-up and pull-down networks

of a complementary CMOS structure are dual
networks. This means that a parallel connection of
transistors in the pull-up network corresponds to a series
connection of the corresponding devices in the pull-down
network, and vice versa.
Therefore, to construct a CMOS gate, one of the networks

(e.g., PDN) is implemented using combinations of series
and parallel devices. The other network (i.e., PUN) is
obtained using duality principle by walking the hierarchy,
replacing series sub-nets with parallel sub-nets, and parallel
sub-nets with series sub-nets. The complete CMOS gate is
constructed by combining the PDN with the PUN.
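The duality walk described above can be sketched in a few lines of Python. This is only an illustration: the nested-tuple encoding ("S" for series, "P" for parallel) and the function names are assumptions made for this sketch, not notation from the slides. The sketch derives the PUN from a given PDN and checks that exactly one of the two networks conducts for every input pattern.

```python
from itertools import product

# Hypothetical encoding: ("S", a, b, ...) is a series sub-net,
# ("P", a, b, ...) is a parallel sub-net, a string is one transistor input.

def dual(net):
    """Swap series and parallel at every level of the hierarchy."""
    if isinstance(net, str):
        return net
    kind, *subnets = net
    return ("P" if kind == "S" else "S", *[dual(s) for s in subnets])

def conducts(net, inputs, on_level):
    """True if the network conducts; a device is on when its input equals on_level."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *subnets = net
    results = [conducts(s, inputs, on_level) for s in subnets]
    return all(results) if kind == "S" else any(results)

def mutually_exclusive(pdn, pun, names):
    """Check that exactly one network conducts for every input pattern."""
    for values in product([0, 1], repeat=len(names)):
        inputs = dict(zip(names, values))
        down = conducts(pdn, inputs, on_level=1)  # NMOS devices turn on with a 1
        up = conducts(pun, inputs, on_level=0)    # PMOS devices turn on with a 0
        if down == up:
            return False
    return True

# PDN of a two-input NAND: two NMOS devices in series.
nand_pdn = ("S", "A", "B")
nand_pun = dual(nand_pdn)

# PDN of a complex gate whose pull-down function is D + A.(B + C).
complex_pdn = ("P", "D", ("S", "A", ("P", "B", "C")))
complex_pun = dual(complex_pdn)
```

For the NAND, the derived pull-up network is two PMOS devices in parallel, as expected from the series/parallel rules.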
The complementary gate is naturally inverting,
implementing only functions such as NAND, NOR, and
XNOR. The realization of a non-inverting Boolean
function (such as AND, OR, or XOR) in a single stage is
not possible, and requires the addition of an extra
inverter stage.
The number of transistors required to implement an N-input logic gate
is 2N.
Two-input NAND gate in complementary static
CMOS style and the corresponding truth table
Example 6.1
Synthesis of complex CMOS Gate Using complementary
CMOS logic
F = (D + A · (B + C))`
Deriving the pull-up network
hierarchically by identifying sub-nets
Example 6.2
Noise Margins are input-pattern dependent.
For the above example, a glitch on only one
of the two inputs has a larger chance of
creating a false transition at the output than
when the glitch would occur on both inputs
simultaneously. Therefore, the former
condition has a lower low noise margin.
The propagation delay depends upon the input
patterns
Figure below shows the two-input NAND gate and its

equivalent RC switch level model. Note that the internal
node capacitance Cint —attributable to the source/drain
regions and the gate overlap capacitance of M2/M1— is
included.
Ignoring the effect of the internal capacitance Cint,
consider, for instance, the low-to-high transition. Three
possible input scenarios can be identified for charging
the output to VDD. If both inputs are driven low, the
two PMOS devices are on. The delay in this case is
0.69 x (Rp/2) x CL, since the two resistors are in
parallel. This is not the worst-case low-to-high
transition, which occurs when only one device turns
on, and is given by 0.69 x Rp x CL. For the pull-down
path, the output is discharged only if both A and B are
switched high, and the delay is given by 0.69 x (2RN)
x CL to a first order.
In other words, adding devices in series slows down
the circuit, and devices must be made wider to avoid a
performance penalty. When sizing the transistors in a
gate with multiple inputs, we should pick the
combination of inputs that triggers the worst-case
conditions.
For example, for a two-input NAND gate to have the same
pull-down delay (tpHL) as a minimum-sized inverter, the
NMOS devices in the NAND stack must be made
twice as wide, so that the equivalent resistance of the
NAND pull-down is the same as that of the inverter. The
PMOS devices can remain unchanged.
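As a rough numerical companion to the first-order expressions above, the following sketch evaluates the three NAND delays and shows the effect of doubling the NMOS width. The resistance and capacitance values are made-up illustrative numbers, and the assumption that on-resistance scales inversely with width is the usual first-order model, not a statement from the slides.

```python
def rc_delay(r_eq, c_load):
    """First-order propagation delay: 0.69 x R x C."""
    return 0.69 * r_eq * c_load

def nand2_delays(r_p, r_n, c_load):
    """Delays of a two-input NAND for the input patterns discussed above."""
    return {
        "tpLH_both_low": rc_delay(r_p / 2, c_load),   # two PMOS in parallel
        "tpLH_one_low": rc_delay(r_p, c_load),        # single PMOS, worst case
        "tpHL_both_high": rc_delay(2 * r_n, c_load),  # two NMOS in series
    }

# Hypothetical unit-width device resistances and load.
R_P, R_N, C_L = 30e3, 13e3, 10e-15

unsized = nand2_delays(R_P, R_N, C_L)
# Doubling the NMOS width halves each NMOS resistance (first-order assumption).
sized = nand2_delays(R_P, R_N / 2, C_L)
inverter_tphl = rc_delay(R_N, C_L)
```

With the doubled NMOS width, the series pull-down resistance equals that of the single inverter NMOS, so `sized["tpHL_both_high"]` matches `inverter_tphl`. The extra diffusion and gate capacitance of the wider devices is ignored here.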
Example 6.3
Sizing of a NOR gate to produce the same delay as an
inverter with size
The output of this network is high, if and only if both inputs A and B
are low. The worst-case pull-down transition happens when only one
of the NMOS devices turns on (i.e., if either A or B is high). Assume
that the goal is to size the NOR gate such that it has approximately
the same delay as an inverter with the following device sizes:
NMOS 0.5µm/0.25µm and PMOS 1.5µm/0.25µm.
Since the pull-down path in the worst case is a
single device, the NMOS devices (M1 and M2) can
have the same device widths as the NMOS device in
the inverter. For the output to be pulled high, both
devices must be turned on. Since the resistances add,
the devices must be made two times larger compared to
the PMOS in the inverter (i.e., M3 and M4 must have a
size of 3µm/0.25µm). Since PMOS devices have a
lower mobility relative to NMOS devices, stacking
devices in series must be avoided as much as
possible. A NAND implementation is clearly
preferred over a NOR implementation for
implementing generic logic.
When sizing gates, worst-case conditions are considered.
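Transistor Sizing
Inverter has size ratio of 2:1
[Figure: switch-level RC models with relative device sizes. Two-input
NAND: PMOS 2 and 2 in parallel, NMOS 2 and 2 in series. Two-input NOR:
PMOS 4 and 4 in series, NMOS 1 and 1 in parallel. Each model includes
CL and the internal node capacitance Cint.]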
Transistor Sizing a Complex CMOS Gate
OUT = (D + A • (B + C))`
[Figure: relative device sizes. PMOS: A = 4, B = 8, C = 8, D = 4.
NMOS: A = 2, B = 2, C = 2, D = 1.]
Ignoring the internal node capacitances is a reasonable
assumption for a first-order analysis of the propagation
delay. However, in more complex logic gates that have a
large fan-in, the internal node capacitances can become
significant. Consider a 4-input NAND gate, as shown in the
figure below.
The internal capacitances consist of the junction
capacitance of the transistors, as well as the gate-to-source
and gate-to-drain capacitances. The latter are turned into
capacitances to ground using the Miller equivalence.
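The propagation delay can be computed using the Elmore
delay model and is approximated as:
tpHL = 0.69 x (R1 x C1 + (R1 + R2) x C2 + (R1 + R2 + R3) x C3
       + (R1 + R2 + R3 + R4) x CL)
(considering the ground node as the source node; C1 to C3 are the
internal node capacitances, counted from the ground side)
Notice that the resistance of M1 appears in all the terms,
which makes this device especially important when
attempting to minimize delay.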
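The Elmore estimate for a series NMOS stack is easy to evaluate numerically. The sketch below is a minimal illustration under the assumption that M1 is the device nearest ground and that each capacitance sits at the node directly above its device; the function name and the unit values are hypothetical.

```python
def nand_tphl(resistances, capacitances):
    """Elmore estimate of tpHL for a series NMOS stack.

    resistances[i]  : on-resistance of M(i+1), with M1 nearest ground.
    capacitances[i] : capacitance at the node above M(i+1);
                      the last entry is the output load CL.
    """
    total = 0.0
    path_r = 0.0
    for r, c in zip(resistances, capacitances):
        path_r += r            # resistance between this node and ground
        total += path_r * c    # each node sees all devices below it
    return 0.69 * total

# Unit values, purely for illustration.
unit_c = [1.0, 1.0, 1.0, 1.0]
base = nand_tphl([1.0, 1.0, 1.0, 1.0], unit_c)
faster_m1 = nand_tphl([0.5, 1.0, 1.0, 1.0], unit_c)  # halve R1
faster_m4 = nand_tphl([1.0, 1.0, 1.0, 0.5], unit_c)  # halve R4
```

Halving R1 reduces every term of the sum, while halving R4 only reduces the last one, which is why M1 matters most.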
Effect of Fan-in and Fan-out on delay
While complementary CMOS is a very robust and simple

approach for implementing logic gates, there are two major
problems associated with using this style as the complexity
of the gate (i.e., fan-in) increases. First, the number of
transistors required to implement an N fan-in gate is 2N.
This can result in significant implementation area.
The second problem is that propagation delay of a

complementary CMOS gate deteriorates rapidly as a
function of the fan-in. The large number of transistors (2N)
increases the overall capacitance of the gate.
For an N-input NAND gate, the output capacitance increases
linearly with the fan-in since the number of PMOS devices
connected to the output node increases linearly with the fan-
in. Also, a series connection of transistors in either the PUN
or PDN slows the gate as well, because the effective
(dis)charging resistance is increased.
For the same N-input NAND gate, the effective resistance of
the PDN path increases linearly with the fan-in. Since the
output capacitance increases linearly and the pull-down
resistance increases linearly, the high-to-low delay can
increase in a quadratic fashion.
The fan-out has a large impact on the delay of complementary

CMOS logic as well. Each input to a CMOS gate connects to
both an NMOS and a PMOS device, and presents a load to the
driving gate equal to the sum of the gate capacitances.
Propagation delay, of CMOS NAND Gate, for both transitions as a
function of fan-in assuming a fixed fan-out (NMOS: 0.5µm and
PMOS: 1.5µm). As predicted above, the tpLH increases linearly due
to the linearly-increasing value of the output capacitance. The
simultaneous increase in the pull-down resistance and the load
capacitance results in an approximately quadratic relationship for
tpHL. Gates with a fan-in greater than or equal to 4 become
excessively slow and must be avoided.
Design Techniques for Large Fan-in
Several approaches may be used to reduce delays in large
fan-in circuits:
1. Transistor Sizing
The most obvious solution is to increase the overall
transistor size. This lowers the resistance of devices in
series and lowers the time constant. However, increasing the
transistor size results in larger parasitic capacitors, which
not only affect the propagation delay of the gate in
question, but also present a larger load to the preceding
gate. This technique should, therefore, be used with caution.
2. Progressive Transistor Sizing
From the Elmore delay equation of the N-input NAND
gate, we see that the resistance of M1 (R1) appears N times
in the delay equation, the resistance of M2 (R2) appears
N-1 times, etc. From the equation, it is clear that R1 should be
made the smallest, R2 the next smallest, etc. Consequently,
a progressive scaling of the transistors is beneficial:
M1 > M2 > M3 > ... > MN. Basically, in this approach, the most
important resistance is reduced while the capacitance is also
reduced.
3. Input Re-Ordering
Not all inputs of a gate arrive at the same time (due, for
instance, to the propagation delays of
the preceding logical gates). An input signal to a gate is
called critical if it is the last signal of
all inputs to assume a stable value. The path through the
logic which determines the ultimate
speed of the structure is called the critical path.
Fast Complex Gates:
Design Technique 2
• Transistor ordering
[Figure: three-input NMOS stack with In1 on the critical path.
(a) In1 (0->1) drives M1 at the bottom of the stack, with In2 = 1 on M2
and In3 = 1 on M3; CL, C2 and C1 are charged. Delay determined by the
time to discharge CL, C1 and C2.
(b) In1 (0->1) drives M3 next to the output, with In2 = 1 on M2 and
In3 = 1 on M1; CL is charged, C2 and C1 are discharged. Delay determined
by the time to discharge CL.]
Putting the critical-path transistors closer to the output of
the gate can result in a speedup.
This is demonstrated in the figure above. Signal In1 is
assumed to be a critical signal. Suppose further that In2
and In3 are high and that In1 undergoes a 0->1 transition.
Assume also that CL is initially charged high. In case (a),
no path to GND exists until M1 is turned on, which is
unfortunately the last event to happen. The delay between
the arrival of In1 and the output is therefore determined
by the time it takes to discharge CL, C1 and C2. In the
second case, C1 and C2 are already discharged when In1
changes. Only CL still has to be discharged, resulting in a
smaller delay.
4. Logic Restructuring
Manipulating the logic equations can reduce the fan-in
requirements and hence reduce the gate delay, as
illustrated in the figure below. The quadratic dependency of
the gate delay on fan-in makes the six-input NOR gate
extremely slow. Partitioning the NOR gate into two
three-input gates results in a significant speed-up, which
offsets by far the extra delay incurred by turning the
inverter into a two-input NAND gate.
Minimising Delay in Combinational networks
Similar to the delay of the inverter, the delay of a
complex gate can be written as
tp = tp0 x (p + g x f / γ),
with tp0 still representing the intrinsic delay of an inverter, and
f the ratio between the external load and the input capacitance
of the gate. In this context, f is often called the electrical
effort. p represents the ratio of the intrinsic (or unloaded)
delays of the complex gate and the simple inverter. g is the logical
effort, which is the ratio of the input capacitance of the gate to the
input capacitance of an inverter for the same output current. γ is
the technology-dependent proportionality factor from the inverter
delay model.
h = fg = gate effort

Equivalently, logical effort is how
much more input capacitance a gate
presents to deliver the same output
current as an inverter.
Logical effort of 2-input NAND and NOR gates.
We size the 2-input NAND and NOR such that their equivalent
resistances equal the resistance of the inverter
gNAND = 4/3
gNOR = 5/3
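The two values above follow from counting input capacitance. As a sketch, assuming the 2:1 PMOS-to-NMOS inverter ratio used earlier in these slides and gates sized for inverter-equivalent resistance, the logical effort of N-input NAND and NOR gates can be computed as follows (the function names are illustrative):

```python
from fractions import Fraction

def logical_effort_nand(n_inputs, pmos_to_nmos=2):
    """NAND: the n series NMOS are widened n times, the PMOS keep inverter width."""
    gate_input_cap = n_inputs + pmos_to_nmos
    inverter_input_cap = 1 + pmos_to_nmos
    return Fraction(gate_input_cap, inverter_input_cap)

def logical_effort_nor(n_inputs, pmos_to_nmos=2):
    """NOR: the n series PMOS are widened n times, the NMOS keep inverter width."""
    gate_input_cap = 1 + n_inputs * pmos_to_nmos
    inverter_input_cap = 1 + pmos_to_nmos
    return Fraction(gate_input_cap, inverter_input_cap)

g_nand2 = logical_effort_nand(2)
g_nor2 = logical_effort_nor(2)
```

The NOR gate comes out worse because its series stack is built from the slower PMOS devices, which have to be widened more.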
Example 6.5
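The total delay of a path through a combinational
logic block can now be expressed as
tp = sum over j = 1..N of tp0 x (pj + fj x gj / γ),
where γ is the technology-dependent factor of the inverter
delay model.
Using a procedure similar to the one used for the
inverter to determine the minimum delay of the
path, i.e., finding N-1 partial derivatives (w.r.t. the input
gate capacitances) and setting them to zero, we find
that each stage should bear the same ‘effort’:
f1 g1 = f2 g2 = … = fN gN
With the path logical effort G = g1 g2 … gN and the path
electrical effort F = CL/Cg1, the path effort can then be
defined as the product of the two, or H = FG.
The gate effort that minimizes the path delay is
found to equal
h = H^(1/N)
And the minimum delay through the path is
D = tp0 x (p1 + p2 + … + pN + N x H^(1/N) / γ)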

Sizing Combinational logic for
minimum delay
F = CL/Cg1 = 5
From the entries of the table (above),
H = FG = 125/9, and the optimal stage effort ‘h’ is
(125/9)^(1/4) = 1.93.
We derive the fanout factors: f1 = 1.93; f2 =
1.93 x (3/5) = 1.16; f3 = 1.16; f4 = 1.93
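The numbers in this example can be reproduced with a short script. The stage logical efforts of 1, 5/3, 5/3 and 1 used below are an assumption inferred from H = 125/9 and the quoted fan-out factors, since the table itself is not reproduced here; the function name is illustrative.

```python
from fractions import Fraction
from math import prod

def stage_fanouts(logical_efforts, path_electrical_effort):
    """Return the optimal stage effort h and the per-stage fan-outs f_i = h / g_i."""
    n_stages = len(logical_efforts)
    path_effort = path_electrical_effort * prod(logical_efforts)  # H = F x G
    h = float(path_effort) ** (1.0 / n_stages)                    # h = H^(1/N)
    return path_effort, h, [h / float(g) for g in logical_efforts]

# Assumed stage logical efforts (inferred, see lead-in) and F = CL/Cg1 = 5.
g_stages = [Fraction(1), Fraction(5, 3), Fraction(5, 3), Fraction(1)]
H, h, fanouts = stage_fanouts(g_stages, 5)
```

Rounded to two decimals, `h` and `fanouts` match the values quoted above.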
Example 6.6
If there are branches, e.g., a gate is driving
more than one gate, as shown below, we
have to consider the branching effort
‘b’, i.e.,
b = (Con−path + Coff−path) / Con−path
where Con−path is the load capacitance along the path we are
analyzing and Coff−path is the capacitance of connections that
lead off the path. The path is A to B.
‘B’ is the total branching effort of the path: the product of the
branching efforts at each of the stages along the path.
E.g., the branching effort at the output of the first stage
(of the figure above) is (y+y)/y = 2. At the output of the
second stage it is (z+z+z)/z = 3.
When branching is considered, the path effort is
H = FGB,
and the optimised stage effort is
h = H^(1/N)
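A minimal sketch of the branching bookkeeping follows. The unit capacitances stand in for the y and z loads of the figure, and the function names are illustrative rather than taken from the slides.

```python
from math import prod

def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path for one stage."""
    return (c_on_path + c_off_path) / c_on_path

def path_effort(f_path, g_path, stage_branching):
    """H = F x G x B, with B the product of the per-stage branching efforts."""
    return f_path * g_path * prod(stage_branching)

def optimal_stage_effort(h_path, n_stages):
    """h = H^(1/N)."""
    return h_path ** (1.0 / n_stages)

b1 = branching_effort(1.0, 1.0)  # first stage:  (y + y) / y
b2 = branching_effort(1.0, 2.0)  # second stage: (z + z + z) / z
B = b1 * b2
```

Stages without off-path load have b = 1 and do not change the product.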
Properties of Complementary CMOS Gates Snapshot
High noise margins:

VOH and VOL are at VDD and GND, respectively.
No static power consumption:
There never exists a direct path between VDD and
VSS (GND) in steady-state mode.
Comparable rise and fall times:
(under appropriate sizing conditions)
The dynamic power dissipation is given by
α0->1 x CL x VDD^2 x f, where α0->1 is the switching activity, which
has two components: a static component that is only a
function of the topology of the logic network, and a
dynamic one that results from the timing behavior of the
circuit; the latter factor is also called glitching.
The transition activity is a strong function of the logic
function being implemented. For static CMOS gates with
statistically independent inputs, the static transition
probability is the probability p0 that the output will be in
the zero state in one cycle, multiplied by the probability p1
that the output will be in the one state in the next cycle:
α0->1 = p0 x p1 = p0 x (1 - p0)
Transition probability
Assuming that the inputs are independent and
uniformly distributed, any N-input static gate
has a transition probability that corresponds to
α0->1 = (N0 / 2^N) x (N1 / 2^N) = N0 x (2^N - N0) / 2^(2N)
where N0 is the number of zero entries and N1 is
the number of one entries in the output column of
the truth table of the function.
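The counting rule above can be checked by enumerating a truth table. The sketch below assumes independent, uniformly distributed inputs, as stated in the text; the helper name and the example gate functions are illustrative.

```python
from fractions import Fraction
from itertools import product

def transition_probability(func, n_inputs):
    """0->1 transition probability N0 x N1 / 2^(2N) from the truth table of func."""
    outputs = [func(*bits) for bits in product([0, 1], repeat=n_inputs)]
    n1 = sum(outputs)            # number of one entries in the output column
    n0 = len(outputs) - n1       # number of zero entries
    return Fraction(n0 * n1, 4 ** n_inputs)

def nor2(a, b):
    return int(not (a or b))

def and2(a, b):
    return a & b

def xor2(a, b):
    return a ^ b

alpha_nor = transition_probability(nor2, 2)
alpha_and = transition_probability(and2, 2)
alpha_xor = transition_probability(xor2, 2)
```

A two-input NOR has N0 = 3 and N1 = 1, giving 3/16, while the XOR reaches the maximum of 1/4 because its output column is evenly split.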
Signal Statistics
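Consider once again a 2-input static NOR gate, and let pa
and pb be the probabilities that the inputs A and B are one.
Assume further that the inputs are not correlated. The probability
that the output is one is then p1 = (1 - pa) x (1 - pb), so that
α0->1 = p0 x p1 = (1 - (1 - pa)(1 - pb)) x (1 - pa)(1 - pb)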
For an AND gate, Z equals 1 if and only if B

and C are equal to 1,
Dynamic or Glitching Transitions
The finite propagation delay from one logic block (gates) to the next can
cause spurious transitions, called glitches, critical races, or dynamic
hazards, to occur: a node can exhibit multiple transitions in a single
clock cycle before settling to the correct logic level.
Initially, all the outputs are 1 since one of the inputs was 0. For this
particular transition (i.e. ‘1’ at the other input simultaneously), all the
odd bits must transition to 0 while the even bits remain at the value of 1.
However, due to the finite propagation delay, the higher order even
outputs start to discharge and the voltage drops. When the correct input
ripples through the network, the output goes high. The glitch on the
even bits causes extra power dissipation beyond what is required to
strictly implement the logic function.
Glitching in a chain of NAND
Gates
Reducing Switching Activity
1. Logic Restructuring
Chain implementation will have an overall lower switching
activity than the tree implementation for random inputs.
However the tree topology will have lower (no) glitching
activity since the signal paths are balanced to all the gates.
Below are two alternate implementations of
F= A.B.C.D
Assume that all primary inputs (A,B,C,D) are
uncorrelated and uniformly distributed (i.e.,
p1 (a,b,c,d)= 0.5)
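The chain-versus-tree comparison can be made concrete with a short calculation. This sketch only covers the static component of the switching activity and ignores glitching; the gate topology (three two-input AND gates in either a chain or a tree) is an assumption about the two implementations the text refers to.

```python
from fractions import Fraction

def and_gate(p_a, p_b):
    """Return (probability the output is 1, 0->1 transition probability)."""
    p1 = p_a * p_b
    return p1, (1 - p1) * p1

def chain_activity(p):
    """F = ((A.B).C).D built as a chain of two-input AND gates."""
    p_o1, a1 = and_gate(p, p)       # O1 = A.B
    p_o2, a2 = and_gate(p_o1, p)    # O2 = O1.C
    _, a3 = and_gate(p_o2, p)       # F  = O2.D
    return a1 + a2 + a3

def tree_activity(p):
    """F = (A.B).(C.D) built as a balanced tree."""
    p_o1, a1 = and_gate(p, p)       # O1 = A.B
    p_o2, a2 = and_gate(p, p)       # O2 = C.D
    _, a3 = and_gate(p_o1, p_o2)    # F  = O1.O2
    return a1 + a2 + a3

half = Fraction(1, 2)
chain_total = chain_activity(half)
tree_total = tree_activity(half)
```

With all input probabilities at 0.5, the chain sums to 91/256 and the tree to 111/256, consistent with the statement that the chain has the lower static switching activity.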
2. Input ordering
Reordering of inputs affects the circuit activity.

Consider the two static logic circuits of Figure below. The
probabilities of A, B and C being 1 are listed in the Figure.
Since both circuits implement identical logic functionality, it
is clear that the activity at the output node Z is equal in both
cases. The difference is in the activity at the intermediate
node. In the first circuit, this activity equals (1 - 0.5 x 0.2)
(0.5 x 0.2) = 0.09. In the second case, the probability that a 0
-> 1 transition occurs equals (1 – 0.2 x 0.1) (0.2 x 0.1) =
0.0196. This is substantially lower.
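The two figures above are easy to verify, and the same calculation can pick the input pair that should feed the first gate. The sketch assumes the first gate is a two-input AND with uncorrelated inputs, which matches the products used in the text; the helper names are illustrative.

```python
from itertools import combinations

def and_activity(p_x, p_y):
    """0->1 transition probability at the output of a two-input AND gate."""
    p1 = p_x * p_y
    return (1 - p1) * p1

def best_first_pair(probs):
    """Pick the input pair with the lowest activity at the intermediate node."""
    return min(
        combinations(sorted(probs), 2),
        key=lambda pair: and_activity(probs[pair[0]], probs[pair[1]]),
    )

# Probabilities of A, B and C being 1, as used in the text.
probs = {"A": 0.5, "B": 0.2, "C": 0.1}

first_circuit = and_activity(probs["A"], probs["B"])
second_circuit = and_activity(probs["B"], probs["C"])
pair = best_first_pair(probs)
```

Feeding the two least active signals into the first gate keeps the intermediate node quiet, which is what `best_first_pair` selects here.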
3. Time-multiplexing resources
If the data being transmitted are random, it makes no difference which
architecture is used. However, if the data signals have some distinct
properties (called temporal correlation), the power dissipation of the
time-multiplexed solution can be significantly higher. Suppose, for
instance, that A is always (or mostly) 1 and B is (mostly) 0. In the
parallel solution, the switched capacitance is very low since there are
very few transitions on the data bits. However, in the time-multiplexed
solution, the bus toggles between 0 and 1.
4. Glitch Reduction by balancing signal paths
The occurrence of glitching in a circuit is mainly due to a mismatch

in the path lengths in the network. If all input signals of a gate
change simultaneously, no glitching occurs. On the other hand, if
input signals change at different times, a dynamic hazard might
develop. Such a mismatch in signal timing is typically the result of
different path lengths with respect to the primary inputs of the
network. This is illustrated in Figure below. Assume that the XOR
gate has a unit delay. The first network (a) suffers from glitching as a
result of the wide disparity between the arrival times of the input
signals for a gate. For example, for gate F3, one input settles at time
0, while the second one only arrives at time 2. Redesigning the
network so that all arrival times are identical can dramatically reduce
the number of superfluous transitions (network b)
Ratioed Logic
Ratioed logic is an attempt to reduce the number of transistors

required to implement a given logic function, at the cost of
reduced robustness and extra power dissipation. The purpose of
the PUN in complementary CMOS is to provide a conditional
path between VDD and the output when the PDN is turned off. In
ratioed logic, the entire PUN is replaced with a single
unconditional load device that pulls up the output for a high
output.
Figure below shows an example of ratioed logic, which uses a
grounded PMOS load and is referred to as a pseudo-NMOS gate.
The clear advantage of pseudo-NMOS is the reduced
number of transistors (N+1 versus 2N for complementary
CMOS).
The nominal high output voltage (VOH) for this gate is VDD
since the pull-down devices are turned off when the output is
pulled high (assuming that VOL is below VTn). On the other
hand, the nominal low output voltage is not 0 V since there is
a fight between the devices in the PDN and the grounded
PMOS load device. This results in reduced noise margins and
more importantly static power dissipation.
Since the voltage swing on the output and the overall
functionality of the gate depend upon the ratio between the
NMOS and PMOS sizes, the circuit is called ratioed. This is
in contrast to the ratioless logic styles, such as
complementary CMOS, where the low and high levels do not
depend upon transistor sizes.
The value of VOL is obtained by equating the
currents through the driver and load devices for Vin = VDD.
At this operation point, it is reasonable to assume
that the NMOS device resides in linear
mode (since the output should ideally be close to
0V), while the PMOS load is saturated.
Assuming that VOL is small relative to the gate
drive (VDD-VT) and that VTn is equal to VTp in
magnitude, VOL can be approximated as:
VOL ≈ (μp x Wp) / (μn x Wn) x |VDSATp|
In order to make VOL as small as possible, the PMOS
device should be sized much smaller than the NMOS
pull-down devices. Unfortunately, this has a negative
impact on the propagation delay for charging up the
output node since the current provided by the PMOS
device is limited.
A major disadvantage of the pseudo-NMOS gate is the

static power that is dissipated when the output is low
through the direct current path that exists between
VDD and GND.
The static power consumption in the low-output mode is
Plow = VDD x Ilow,
where Ilow is the current drawn through the PMOS load while the
output is low.
The static power dissipation of pseudo-NMOS limits
its use. However, pseudo-NMOS still finds use in
large fan-in circuits. When area is most important, the
reduced transistor count compared to complementary
CMOS is quite attractive.
Differential Cascode Voltage Switch Logic (or DCVSL)
It is possible to create a ratioed logic style that completely

eliminates static currents and provides rail-to-rail swing.
A differential gate requires that each input is provided in
complementary format, and produces complementary
outputs in turn. The feedback mechanism ensures that the
load device is turned off when not needed.
The pull-down networks PDN1 and PDN2 use NMOS
devices and are mutually exclusive (that is, when PDN1
conducts, PDN2 is off, and when PDN1 is off, PDN2
conducts), such that the required logic function and its
inverse are simultaneously implemented.
Assume now that, for a given set of inputs, PDN1
conducts while PDN2 does not, and that Out and Out` are
initially high and low, respectively. Turning on PDN1
causes Out to be pulled down, although there is still a
fight between M1 and PDN1. Out` is in a high impedance
state, as M2 and PDN2 are both turned off. PDN1 must be
strong enough to bring Out below
VDD - |VTp|, the point at which M2 turns on and starts
charging Out` to VDD, eventually turning off M1. This in
turn enables Out to discharge all the way to GND.
Figure below shows an example of an XOR/XNOR gate
Out= (AB + A`B`)` = XOR

Out` = (AB` + A`B)` = XNOR
The resulting circuit exhibits a rail-to-rail swing, and the

static power dissipation is eliminated.
This circuit style still has a

power-dissipation problem that
is due to cross-over currents.
During the transition, there is a
period of time when PMOS and
PDN are turned on
simultaneously, producing a
short circuit path.
It is possible to share transistors among
the two pull down networks, which
reduces the implementation overhead.
DCVSL Transient Response
Example 6.8
DCVSL properties
This approach prevents some of the time-differential problems

introduced by additional inverters. For example, in logic design
it often happens that both a signal and its complement are needed
simultaneously. When the complementary signal is generated
using an inverter, the inverted signal is delayed with respect to
the original (Figure a). This causes timing problems, especially
in very high-speed designs. The differential output capability
avoids this problem (Figure b).
Pass-Transistor Logic
A popular and widely-used alternative to

complementary CMOS is pass-transistor logic,
which attempts to reduce the number of transistors
required to implement logic by allowing the primary
inputs to drive gate terminals as well as source/drain
terminals. This is in contrast to logic families that we
have studied so far, which only allow primary inputs
to drive the gate terminals of MOSFETS.
Previous figure shows an implementation of
the AND function constructed that way, using
only NMOS transistors. In this gate, if the B
input is high, the top transistor is turned on
and copies the input A to the output F. When
B is low, the bottom pass transistor is turned
on and passes a 0.
When the pass transistor pulls a node high, the output
only charges up to VDD -VTn. In fact, the situation is worsened
by the fact that the devices experience body effect, as there
exists a significant source-to-body voltage when pulling high.
Consider the case when the pass transistor is charging up a
node with the gate and drain terminals set at VDD. Let the
source of the NMOS pass transistor be labeled x. Node x will
charge up to VDD-VTn(Vx):
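Because the threshold voltage itself depends on Vx through the body effect, the final node voltage has to be solved iteratively. The sketch below uses the standard body-effect expression VTn = VTn0 + γ(√(2φf + Vx) − √(2φf)) with the source-to-body voltage equal to Vx; the expression and all parameter values are assumptions chosen for illustration and are not taken from the slides.

```python
from math import sqrt

def pass_transistor_high(vdd, vt0, gamma, two_phi_f, tol=1e-9, max_iter=200):
    """Solve Vx = VDD - VTn(Vx) by fixed-point iteration.

    VTn(Vx) = vt0 + gamma * (sqrt(two_phi_f + Vx) - sqrt(two_phi_f)),
    i.e. the source-to-body voltage equals Vx (body tied to ground).
    """
    vx = vdd - vt0  # initial guess: no body effect
    for _ in range(max_iter):
        vt = vt0 + gamma * (sqrt(two_phi_f + vx) - sqrt(two_phi_f))
        new_vx = vdd - vt
        if abs(new_vx - vx) < tol:
            return new_vx
        vx = new_vx
    return vx

# Hypothetical process parameters (volts, sqrt-volts, volts).
VDD, VT0, GAMMA, TWO_PHI_F = 2.5, 0.43, 0.4, 0.6

vx_high = pass_transistor_high(VDD, VT0, GAMMA, TWO_PHI_F)
```

With these example values the node settles a little below 1.8 V, noticeably lower than VDD − VTn0 = 2.07 V, which illustrates how the body effect worsens the threshold drop.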
Example 6.9
Pass transistor output (Drain/Source)
terminal should not drive other gate
terminals to avoid multiple threshold drops.
Pass-transistors require lower switching energy
to charge up a node due to the reduced voltage
swing.
But it may consume static power when the
output is high, because the reduced voltage level may
be insufficient to turn off the PMOS transistor
of the subsequent CMOS inverter.
This equation corresponds to when both drain and gate

of pass transistor are high
Differential Pass Transistor Logic
For high performance design, a differential pass-

transistor logic family, called CPL or DPL, is commonly
used. The basic idea (similar to DCVSL) is to accept
true and complementary inputs and produce true and
complementary outputs. A number of CPL gates
(AND/NAND, OR/NOR, and XOR/NXOR) are shown
in Figure below
E.g. the simultaneous implementation of 4-
input AND/ NAND gate in CPL leads to
reduced transistor count due to sharing of
logic terms.
4-input AND/NAND Gate in CPL
Example 6.11
Robust and Efficient Pass-Transistor Design
Unfortunately, differential pass-transistor logic, like single-ended pass-

transistor logic, suffers from static power dissipation and reduced noise
margins, since the high input to the signal-restoring inverter only charges
up to VDD-VTn.
Solution 1: Level Restoration. A common solution to the voltage drop

problem is the use of a level restorer, which is a single PMOS configured
in a feedback path.
Assuming B= VDD , if input A makes a 0 to VDD transition, Mn
only charges up node X to VDD-VTn. This is, however, enough
to switch the output of the inverter low, turning on the feedback
device Mr and pulling node X all the way to VDD. This
eliminates any static power dissipation in the inverter.
While this solution is appealing in terms of eliminating static

power dissipation, it adds complexity since the circuit is now
ratioed. The problem arises during the transition of node X
from high to low. The pass transistor network attempts to pull
down node X while the level restorer pulls node X up to VDD.
Therefore, the pull-down device must be stronger than the pull-
up device in order to switch node X and the output. Some
careful transistor sizing is necessary to make the circuit
function correctly. The resistance of Mn and Mr must be such
that the voltage at node X drops below the threshold of the
inverter.
NMOS-only Switch
[Figure: NMOS pass transistor Mn with gate C = 2.5 V and input
A = 2.5 V driving node B (load CL), which feeds an inverter formed
by M1 and M2.]
VB does not pull up to 2.5 V, but to 2.5 V - VTN.
Threshold voltage loss causes
static power consumption.
NMOS has higher threshold than PMOS (body effect)
Level Restoration
Another concern is the influence of the level

restorer on the switching speed of the device.
Adding the restoring device increases the
capacitance at the internal node X, slowing
down the gate. The rise time of the gate is further
negatively affected, since, the level restoring
transistor Mr fights the decrease in voltage at node
X before being switched off.
On the other hand, the level restorer reduces the fall
time, since the PMOS transistor, once turned on,
speeds the pull-up action.
Level Restoration
A modification of the level-restorer, applicable in differential

networks and known as swing-restored pass transistor logic, is
shown in Figure below. Instead of a simple inverter or half-latch
at the output of the pass transistor network, two back-to-back
inverters, configured in a cross-coupled fashion, are used for
level restoration and performance improvement.
Solution 2: Multiple threshold and zero threshold devices.
The use of zero-threshold transistors can be dangerous due

to the subthreshold currents that can flow through the
pass-transistors, even if VGS is slightly below VT. This is
demonstrated in Figure below
Solution 3: Transmission Gate Logic
The most widely-used solution to deal with the voltage-
drop problem is the use of transmission gates. It builds
on the complementary properties of NMOS and PMOS
transistors: NMOS devices pass a strong 0 but a weak 1,
while PMOS transistors pass a strong 1 but a weak 0.
The ideal approach is to use an NMOS to pull down
and a PMOS to pull up. The transmission gate
combines the best of both device flavors by placing an
NMOS device in parallel with a PMOS device.
A = B when C = 1.
When C = 0, the transmission gate is open.
Consider the case of charging node B to VDD for the transmission
gate circuit in Figure a (below). Node A is driven to VDD and the
transmission gate is enabled (C = 1 and C` = 0). If only the NMOS
pass-device were present, node B would charge up to VDD-VTn, at which
point the NMOS device turns off. However, since the PMOS device
is present and turned on (VGSp = -VDD), charging continues all the
way up to VDD. Figure b shows the opposite case, that is, discharging
node B to 0. B is initially at VDD when node A is driven low. The
PMOS transistor by itself can only pull down node B to |VTp|, at which
point it turns off. The parallel NMOS device, however, stays turned
on (since its VGS = VDD) and pulls node B all the way to GND.
A simple inverting two-input
multiplexer using transmission
gates
Transmission Gate XOR
When B=1, F= A`B.

When B=0, F= AB`.
Overall, F= A`B + AB`
This implementation of XOR has fewer transistors (6) as

compared to complementary implementation (12)
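The two cases above can be checked with a small behavioural model. This is a logic-level sketch only: it assumes that with B = 1 the output is driven with A` and with B = 0 the transmission gate passes A, and it says nothing about voltage levels or device sizing.

```python
from itertools import product

def tg_xor(a, b):
    """Logic-level model of the transmission-gate XOR described above."""
    if b == 1:
        return 1 - a  # B = 1: the output is driven with A` (F = A`B)
    return a          # B = 0: the transmission gate passes A (F = AB`)

# Truth table over all input combinations, keyed by (A, B).
truth_table = {(a, b): tg_xor(a, b) for a, b in product([0, 1], repeat=2)}
```

Every entry of `truth_table` agrees with A`B + AB`, that is, with the XOR of A and B.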
Dynamic CMOS Design
The pseudo-NMOS logic style requires only N + 1 transistors
to implement an N-input logic gate, but unfortunately it has
static power dissipation. In this section, an alternate logic
style called dynamic logic is presented that obtains a similar
result, while avoiding static power consumption.
Precharge
When CLK = 0, the output node Out is precharged to
VDD by the PMOS transistor Mp. During that time, the
evaluate NMOS transistor Me is off, so that the pull-down
path is disabled. The evaluation FET eliminates
any static power that would be consumed during the
precharge period (that is, static current would flow
between the supplies if both the pull-down and the
precharge device were turned on simultaneously).
Evaluation
For CLK = 1, the precharge transistor Mp is off, and the
evaluation transistor Me is turned on. The output is
conditionally discharged based on the input values and the
pull-down topology. If the inputs are such that the PDN conducts,
then a low resistance path exists between Out and GND and the
output is discharged to GND. If the PDN is turned off, the
precharged value remains stored on the output capacitance CL,
which is a combination of junction capacitances, the wiring
capacitance, and the input capacitance of the fan-out gates.
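The precharge and evaluation phases can be captured in a small behavioral sketch. It is a purely logical model (no voltages, leakage, or timing), and the class name and the example pull-down function are assumptions made for illustration.

```python
class DynamicGate:
    """Behavioral model of an n-type dynamic gate (logic levels only)."""

    def __init__(self, pdn_conducts):
        # pdn_conducts(*inputs) returns True when the pull-down network conducts.
        self.pdn_conducts = pdn_conducts
        self.out = 1

    def step(self, clk, *inputs):
        if clk == 0:
            # Precharge: Mp is on, Me is off, Out is charged to VDD.
            self.out = 1
        elif self.pdn_conducts(*inputs):
            # Evaluation: Me is on and the PDN conducts, so Out discharges.
            self.out = 0
        # Otherwise the value stored on CL is held.
        return self.out

# Example: PDN with A and B in series, so Out = (AB)`.
gate = DynamicGate(lambda a, b: bool(a and b))
print(gate.step(0, 1, 1))  # precharge
print(gate.step(1, 1, 1))  # evaluate, PDN conducts
print(gate.step(1, 0, 0))  # inputs change, but Out cannot recharge
```

The last call illustrates that once the output is discharged during evaluation, it stays low until the next precharge phase.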
Dynamic logic is non-ratioed. The sizing of the PMOS
precharge device is not important for realizing proper
functionality of the gate.
It has a reduced transistor count.

A number of important properties can be
derived for the dynamic logic gate:
• The logic function is implemented by the
NMOS pull-down network. The
construction of the PDN proceeds just as it
does for static CMOS.
• The number of transistors (for complex
gates) is substantially lower than in the
static case: N + 2 versus 2N.
• It is non-ratioed.
• It only consumes dynamic power. Ideally, no static
current path ever exists between VDD and GND. The
overall power dissipation, however, can be
significantly higher compared to a static logic gate.
• The logic gates have faster switching speeds. There
are two main reasons for this.
The first (obvious) reason is the reduced load
capacitance attributed to the lower number of
transistors per gate and the single-transistor load per
fan-in. Second, the dynamic gate does not have short
circuit current, and all the current provided by the
pull-down devices goes towards discharging the load
capacitance.
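The transistor counts quoted in this chapter (2N for static complementary CMOS, N + 1 for pseudo-NMOS, N + 2 for dynamic logic) can be tabulated with a short sketch. The function name is illustrative.

```python
def transistor_counts(n_inputs):
    """Transistor count of an N-input gate in each logic style."""
    return {
        "static_cmos": 2 * n_inputs,   # N NMOS + N PMOS
        "pseudo_nmos": n_inputs + 1,   # N NMOS + one PMOS load
        "dynamic": n_inputs + 2,       # N NMOS + precharge + evaluate devices
    }

for n in (2, 4, 8):
    print(n, transistor_counts(n))
```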
When evaluating the power dissipation of a dynamic gate,
it would appear that dynamic logic presents a significant
advantage. There are three reasons for this. First, the
physical capacitance is lower since dynamic logic uses
fewer transistors to implement a given function. Also, the
load seen for each fanout is one transistor instead of two.
Second, dynamic logic gates by construction can at most
have one transition per clock cycle. Glitching (or dynamic
hazards) does not occur in dynamic logic. Finally,
dynamic gates do not exhibit short circuit power since the
pull-up path is not turned on when the gate is evaluating.
Dynamic logic has higher switching activity:
α0->1 = p0
where p0 is the probability that the output is
zero. For a static gate, α0->1 = p0 p1, and
p0 >= p0 p1 because p1 <= 1.
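A small numerical sketch makes the comparison concrete. It assumes independent, uniformly distributed inputs (an assumption of this example, not of the text) and uses a 2-input NOR gate; the function names are illustrative.

```python
from itertools import product

def output_probabilities(fn, n_inputs):
    """Return (p0, p1) for a Boolean function with uniform random inputs."""
    outputs = [fn(*bits) for bits in product((0, 1), repeat=n_inputs)]
    p1 = sum(outputs) / len(outputs)
    return 1.0 - p1, p1

def alpha_static(fn, n_inputs):
    """0 -> 1 transition probability of a static gate: p0 * p1."""
    p0, p1 = output_probabilities(fn, n_inputs)
    return p0 * p1

def alpha_dynamic(fn, n_inputs):
    """0 -> 1 transition probability of a dynamic gate: p0."""
    p0, _ = output_probabilities(fn, n_inputs)
    return p0

def nor2(a, b):
    return int(not (a or b))

print("static :", alpha_static(nor2, 2))
print("dynamic:", alpha_dynamic(nor2, 2))
```

For the NOR gate the output is zero for three of the four input combinations, so the dynamic gate recharges its output in three out of four cycles on average.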
Example 6.16
Issues in Dynamic Design
1) Charge Leakage
If the pull-down network is off, the output should ideally remain
at the precharged state of VDD during the evaluation phase.
However, this charge gradually leaks away due to leakage
currents, eventually resulting in a malfunctioning of the gate.
Sources of leakage are the reverse-biased
diodes and the sub-threshold leakage of
the devices M1 and Mp.
The solution to leakage is to use static bleeders as
leakage compensators. The bleeder transistor is made
small (weak) so that the pull-down devices can still
lower the output node.
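To get a feel for how long a dynamic node can hold its value without a bleeder, a rough estimate treats the net leakage as a constant current discharging CL. This is a first-order sketch; the capacitance, voltage droop, and leakage current below are hypothetical values, not figures from the text.

```python
def hold_time(c_load, delta_v, i_leak):
    """Time (s) for a constant leakage current to shift the node by delta_v.

    Uses t = C * dV / I, i.e. it assumes the leakage current is constant.
    """
    return c_load * delta_v / i_leak

# Hypothetical values: 10 fF load, 0.5 V tolerable droop, 1 nA net leakage.
t = hold_time(10e-15, 0.5, 1e-9)
print(f"hold time: {t * 1e6:.1f} us")
```

An estimate of this kind suggests why a dynamic gate needs a minimum clock rate when no bleeder is present.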
2) Charge Sharing
During the precharge phase, the output node is precharged to VDD.
Assume that all inputs are set to 0 during precharge, and that the capacitance
Ca is discharged.
Assume further that input B remains at 0 during evaluation, while input A
makes a 0 -> 1 transition, turning transistor Ma on. The charge stored
originally on capacitor CL is redistributed over CL and Ca. This causes a drop
in the output voltage, which cannot be recovered due to the dynamic nature
of the circuit.
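The size of the voltage drop can be estimated from charge conservation. The sketch below is a first-order model that ignores the body effect and assumes Ca starts at 0 V: if the two nodes can fully equalize, they share the charge; otherwise Ma turns off once the internal node reaches VDD - VTn. The capacitance and voltage values are hypothetical.

```python
def charge_sharing_vout(c_load, c_a, vdd, vtn):
    """Final output voltage after charge redistribution between CL and Ca."""
    # Voltage if the charge on CL is shared completely with Ca.
    v_equalized = vdd * c_load / (c_load + c_a)
    if v_equalized <= vdd - vtn:
        # Ma stays on until both nodes reach the same voltage.
        return v_equalized
    # Ma turns off when the internal node reaches VDD - VTn; only the
    # charge needed to lift Ca to that level is taken from CL.
    return vdd - (c_a / c_load) * (vdd - vtn)

# Hypothetical values: CL = 50 fF, VDD = 2.5 V, VTn = 0.4 V.
for c_a in (5e-15, 15e-15):
    v = charge_sharing_vout(50e-15, c_a, 2.5, 0.4)
    print(f"Ca = {c_a * 1e15:.0f} fF -> Out = {v:.3f} V")
```

The larger the internal capacitance relative to CL, the larger the drop, which is why the ratio of the two capacitances matters for noise immunity.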
The most common and effective approach to deal with
charge redistribution is to also precharge critical
internal nodes, as shown in the figure below. This
solution comes at the expense of increased area and
capacitance.
3) Backgate Coupling (Output-to-Input Capacitive Coupling)
A transition in the input In of the static gate
may cause the output of the gate (Out2) to
go low. This output transition
couples capacitively to the other input of
the gate, the dynamic node Out1, through
the gate-source and gate-drain capacitances
of transistor M4. Due to this the output of
the dynamic gate can drop significantly.
Backgate Coupling Effect
[Figure: voltage versus time (ns) for Clk, In, Out1 and Out2.]
4) Clock Feedthrough
Coupling between Out and the Clk input of the precharge device
occurs due to the gate-to-drain capacitance, so the voltage of Out
can rise above VDD. The fast rising (and falling) edges of the
clock couple to Out.
[Figure: dynamic gate with precharge device Mp (Clk), inputs A and B, evaluation device Me (Clk), and load CL on Out.]
Clock Feedthrough
A special case of capacitive coupling is clock
feedthrough, an effect caused by the capacitive
coupling between the clock input of the precharge
device and the dynamic output node. The coupling
capacitance consists of the gate-to-drain capacitance
of the precharge device, and includes both the overlap
and the channel capacitances. This capacitive
coupling causes the output of the dynamic node to
rise above VDD on the low-to-high transition of the
clock, assuming that the pull-down network is turned
off. Subsequently, the fast rising and falling edges of
the clock couple onto the signal node.
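A rough feel for the size of this effect comes from treating the coupling capacitance and the load as a capacitive divider on a floating output node. This is a first-order sketch only: it ignores clamping by the junction diodes and the interval during which the precharge device still conducts, and the capacitance values are hypothetical.

```python
def feedthrough_step(delta_clk, c_gd, c_load):
    """Voltage step coupled onto a floating node by a clock edge.

    Capacitive divider: dV = dVclk * Cgd / (Cgd + CL).
    """
    return delta_clk * c_gd / (c_gd + c_load)

# Hypothetical values: 2.5 V clock swing, Cgd = 1 fF, CL = 24 fF.
dv = feedthrough_step(2.5, 1e-15, 24e-15)
print(f"coupled step: {dv * 1e3:.0f} mV")
```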
[Figure: Clock feedthrough. Voltage versus time (ns) for "In & Clk" and Out of a dynamic gate with inputs In1 to In4.]
Straightforward cascading of
dynamic gates to create more
complex structures does not work.
Cascading Dynamic Gates
During the precharge phase (i.e., CLK = 0), the outputs of
both inverters are precharged to VDD. Assume that the
primary input In makes a 0 -> 1 transition (Figure b). On the
rising edge of the clock, output Out1 starts to discharge.
The second output should remain in the
precharged state of VDD as its expected value is
1 (Out1 transitions to 0 during evaluation).
However, there is a finite propagation delay for
the input to discharge Out1 to GND. Therefore,
the second output also starts to discharge.
Domino Logic
The cascading problem in dynamic logic arises because the outputs of
each gate, and hence the inputs to the next stages, are precharged to 1.
This may cause inadvertent discharge in the beginning of the evaluation
cycle. Setting all the inputs to 0 during precharge addresses that concern.
A Domino logic module consists of an n-type dynamic logic block
followed by a static inverter. During precharge, the output of the n-type
dynamic gate is charged up to VDD, and the output of the inverter is set to
0.
Since each dynamic gate has a static inverter, only
non-inverting logic can be implemented, which is a major
limitation of domino logic.
One approach to this problem is to reorganize
the logic using simple Boolean transforms such as De
Morgan's law.
Differential Domino logic gate
Used to obtain inverting as well as non-inverting logic functions.
The function of transistors Mf1 and Mf2 (bleeders) is to keep the circuit static when the clock
is high for extended periods of time.
Multiple-Output Domino for Reduced Area
Compound Domino Logic
Instead of each dynamic gate driving a static inverter, it is possible to
combine the outputs of multiple dynamic gates with the aid of a complex
static CMOS gate, as shown in the figure below. The outputs of three
dynamic structures, implementing O1 = (ABC)`, O2 = (DEF)` and O3 =
(GH)`, are combined using a single complex CMOS static gate that
implements O = ((O1 + O2) O3)`. The total logic function realized this way
equals O = ABCDEF + GH. This reduces the number of transistors.
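The claimed overall function can be confirmed by exhaustive enumeration of the eight inputs. The sketch below is a Boolean check only; the function names are illustrative.

```python
from itertools import product

def compound_domino(a, b, c, d, e, f, g, h):
    """O = ((O1 + O2) O3)` with O1 = (ABC)`, O2 = (DEF)`, O3 = (GH)`."""
    o1 = 1 - (a & b & c)
    o2 = 1 - (d & e & f)
    o3 = 1 - (g & h)
    return 1 - ((o1 | o2) & o3)

def target(a, b, c, d, e, f, g, h):
    """O = ABCDEF + GH."""
    return (a & b & c & d & e & f) | (g & h)

# Compare the two expressions on all 256 input combinations.
matches = all(
    compound_domino(*bits) == target(*bits)
    for bits in product((0, 1), repeat=8)
)
print("functions match:", matches)
```

The equivalence follows from De Morgan's law: ((O1 + O2) O3)` = O1` O2` + O3`.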
np-CMOS
The Domino logic presented in the previous section has the
disadvantage that each dynamic gate requires an extra
static inverter in the critical path to make the circuit
functional.
np-CMOS provides an alternate approach to cascading
dynamic logic by using two flavors (n-tree and p-tree) of
dynamic logic.
np-CMOS logic exploits the duality between n-tree and
p-tree logic gates to eliminate the cascading problem. If the
n-tree gates are controlled by CLK, and p-tree gates are
controlled using CLK`, n-tree gates can directly drive
p-tree gates, and vice versa. Similar
to Domino, n-tree outputs must go through an inverter
when connecting to another n-tree gate.
When inverting logic is required, a p-tree (PUN)
gate can follow an n-tree (PDN) gate, or vice
versa. When non-inverting logic is required,
use either n-tree or p-tree gates.
np-CMOS logic circuit style
In a p-tree logic gate, PMOS devices are
used to build a pull-up logic network,
including a PMOS evaluation transistor
(Figure above). The NMOS predischarge
transistor drives the output
low during precharge. The output
conditionally makes a 0 -> 1 transition
during evaluation depending on its
inputs.
Problem 6.21
