MOS Transistor, CMOS logic, Inverter, Pass Transistor, Transmission gate, Layout Design Rules,
Gate Layouts, Stick Diagrams, Long-Channel I-V Characteristics, C-V Characteristics, Non ideal I-V
Effects, DC Transfer characteristics, RC Delay Model, Elmore Delay, Linear Delay Model, Logical
effort, Parasitic Delay, Delay in Logic Gate, Scaling.
INTRODUCTION: (VLSI)
In 1958, Jack Kilby built the first integrated circuit flip-flop at Texas Instruments.
Bell Labs developed the bipolar junction transistor. Bipolar transistors were more reliable,
less noisy and more power-efficient.
In the 1960s, Metal Oxide Semiconductor Field Effect Transistors (MOSFETs) began to enter production.
MOSFETs offer the compelling advantage that they draw almost zero control current while idle.
They come in two flavors: nMOS and pMOS, using n-type and p-type silicon respectively.
In 1963, Frank Wanlass at Fairchild described the first logic gates using MOSFETs.
Fairchild's gates used both nMOS and pMOS transistors, and the technology was named Complementary Metal Oxide Semiconductor (CMOS).
Power consumption became a major issue in the 1980s as hundreds of thousands of
transistors were integrated onto a single die.
CMOS processes were widely adopted and replaced nMOS and bipolar processes for all
digital logic applications.
In 1965, Gordon Moore observed that plotting the number of transistors that can be most economically manufactured on a chip gives a straight line on a semilogarithmic scale.
Explain the basic concept of nMOS and pMOS transistor with relevant symbol.
A Metal-Oxide-Semiconductor (MOS) structure is created by superimposing layers of
conducting and insulating materials.
CMOS technology provides two types of transistors. They are n-type transistor (nMOS) and p-
type transistor (pMOS).
As transistor operation is controlled by electric fields, the devices are also called Metal Oxide
Semiconductor Field Effect Transistors (MOSFETs).
The transistor consists of a stack of the conducting gate, an insulating layer of silicon dioxide (SiO2) and the silicon wafer, also called the substrate, body or bulk.
A pMOS transistor consists of p-type source and drain region with an n-type body.
An nMOS transistor consists of n-type source and drain region with a p-type body.
Explain the accumulation mode, depletion layer and inversion layer of MOS transistor
with diagram.
The MOS transistor is a majority-carrier device, in which the current in a conducting channel
is controlled by gate voltage.
In an nMOS transistor, the majority carriers are electrons.
In a pMOS transistor, the majority carriers are holes.
Figure 2 shows a simple MOS structure. The top layer of the structure is a good conductor
called the gate.
Transistor gate is polysilicon, i.e., silicon formed from many small crystals. The middle layer
is a very thin insulating film of SiO2, called the gate oxide. The bottom layer is the doped
silicon body.
The figure 2 shows a p-type body, in which the carriers are holes. The body is grounded and
voltage is applied to the gate.
The gate oxide is a good insulator, so almost zero current flows from the gate to the body.
Accumulation mode:
In Figure 2(a), when a negative voltage is applied to the gate, negative charges are formed on
the gate.
The positively charged holes are attracted to the region under the gate. This is called the
accumulation mode.
Depletion mode:
In Figure 2(b), when a small positive voltage is applied to the gate, positive charges are
formed on the gate.
The holes in the body are repelled from the region directly under the gate, resulting in a
depletion region forming below the gate.
Inversion layer:
In Figure 2(c), when a higher positive potential greater than threshold voltage (Vt) is applied,
more positive charges are attracted to the gate.
The holes are repelled and some free electrons in the body are attracted to the region under the
gate. This conductive layer of electrons in the p-type body is called the inversion layer.
The threshold voltage depends on the number of dopants in the body and the thickness tox of
the oxide.
Figure 2: MOS structure demonstrating (a) accumulation, (b) depletion, and (c) inversion layer
Draw the small signal model of device during cut-off, linear and saturation. (April 2018)
Discuss the cutoff, linear and saturation region operation of MOS transistor. (Nov 2009)
The MOS transistor operates in cutoff region, linear region and saturation region.
Cutoff region:
In Figure 3(a), the gate-to-source voltage (Vgs) is less than the threshold voltage (Vt) and
source is grounded.
Junctions between the body and the source or drain are reverse biased, so no current flows.
Thus, the transistor is said to be OFF and this mode of operation is called cutoff.
If Vgs < Vt , the transistor is cutoff (OFF).
Linear Region:
In Figure 3(b), the gate voltage is greater than the threshold voltage.
An inversion region of electrons, called the channel connects the source and drain, creating a
conductive path and making the transistor ON.
If Vgs > Vt , the transistor turns ON. If Vds is small, the transistor acts as a linear resistor, in
which the current flow is proportional to Vds.
The number of carriers and the conductivity increase with the gate voltage.
Figure 3: nMOS transistor demonstrating cutoff, linear, and saturation regions of operation
The voltage between drain and source is Vds = Vgs - Vgd. If Vds = 0 (i.e., Vgs = Vgd), there is
no electric field to push current from drain to source.
When a small positive voltage Vds is applied to the drain (Figure 3(c)), current Ids flows
through the channel from drain to source.
This mode of operation is termed as linear, resistive, triode, nonsaturated, or
unsaturated.
Saturation region:
The current increases with increase in both the drain voltage and gate voltage.
If Vds becomes sufficiently large that Vgd < Vt , the channel is no longer inverted near the
drain and becomes pinched off (Figure 3(d)).
As electrons reach the end of the channel, they are injected into the depletion region near the
drain and accelerated toward the drain.
Above this drain voltage, the current Ids is controlled only by the gate voltage. This mode is called saturation.
If Vgs > Vt and Vds is large, the transistor acts as a current source, in which the current
flow becomes independent of Vds.
The pMOS transistor in Figure 4 operates in just the opposite fashion. The n-type body is tied
to high potential, junctions of p-type source and drains are normally reverse-biased.
When the gate has high potential, no current flows between drain and source.
When the gate voltage is lowered by at least a threshold |Vt| below the source, holes are attracted to form a p-type channel beneath the gate, allowing current to flow between drain and source.
Derive an expression to show the drain current of MOS for various operating region.
Explain one non-ideality for each operating region that changes the drain current. (NOV
2018)
Explain the dynamic behavior of MOSFET transistor with neat diagram. (April 2018)
Explain the electrical properties CMOS. (Nov 2017)
Explain in detail about the ideal I-V characteristics of a NMOS and PMOS device. (MAY
2013)
Discuss in detail with necessary equations the operation of MOSFET and its current-voltage
characteristics. (April/May 2011, May 2016).
Derive drain current of MOS device in different operating regions. (Nov/Dec
2014)(May/June 2013) (Nov 2012, Nov 2016)
Explain in detail about the ideal I-V characteristics and non-ideal I-V characteristics of a
NMOS and PMOS device. (May/June 2013)
Derive expressions for the drain-to-source current in the nonsaturated and saturated regions
of operation of an nMOS transistor. (Nov 2007, Nov 2008)
MOS transistor has three regions of operation:
Cutoff or sub threshold region
Linear region
Saturation region
The current through an OFF transistor is zero. When a transistor turns ON (Vgs > Vt), the gate attracts electrons to form a channel.
Current is measured from the amount of charge in the channel.
The charge on each plate of a capacitor is Q = CV. Thus, the charge in the channel Qchannel is
Qchannel = Cg (Vgc - Vt)
where Cg : Capacitance of the gate to the channel
Vgc - Vt : Amount of voltage attracting charge to the channel.
If the source is at Vs and the drain is at Vd ,
Average channel voltage is Vc = (Vs + Vd)/2 = Vs + Vds /2.
The gate-to-channel voltage is Vgc = Vg – Vc = Vgs – Vds/2.
The gate capacitance is Cg = εox W L / tox. The εox/tox term is called Cox, the capacitance of the gate oxide per unit area.
Average velocity (v) of carrier is proportional to the lateral electric field (field between source and
drain). The constant of proportionality µ is called the mobility.
v = µE --------------(2)
The electric field E is the voltage difference between drain and source (Vds) divided by the
channel length (L).
E = Vds / L -------------(3)
The time required for carriers to cross the channel is L divided by v.
The current between source and drain is the total amount of charge in the channel divided by the time required to cross it:
Ids = Qchannel / (L/v) = µ Cox (W/L) (Vgs − Vt − Vds/2) Vds = β (VGT − Vds/2) Vds ----------- (4)
where β = k' (W/L), k' is called k prime, k' = µ Cox, and VGT = Vgs − Vt.
Equation (4) describes the linear or resistive region, because when Vds << VGT, Ids increases linearly with Vds, like an ideal resistor.
If Vds > Vdsat = VGT, the channel is no longer inverted in the drain region. Channel is pinched
off.
Beyond this point (called the drain saturation voltage), increasing the drain voltage has no
further effect on current.
Substituting Vds = Vdsat in Eq (4), we can find an expression for the saturation current (Ids)
that is independent of Vds.
Ids = (β/2) VGT² --------------------- (5)
This expression is valid for Vgs > Vt and Vds > Vdsat.
The current in the three regions is summarized as:
Ids = 0 (cutoff: Vgs < Vt)
Ids = β (VGT − Vds/2) Vds (linear: Vgs > Vt, Vds < Vdsat)
Ids = (β/2) VGT² (saturation: Vgs > Vt, Vds > Vdsat)
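This three-region model can be checked numerically. The following Python sketch assumes illustrative values for µ, Cox, W, L and Vt (not tied to any particular process):

```python
# Minimal sketch of the long-channel (Shockley) model summarized above.
# All parameter values are illustrative, not taken from any particular process.

def ids_long_channel(vgs, vds, vt=0.5, mu=350e-4, cox=8.6e-3, w=1e-6, l=100e-9):
    """Drain current (A) of an ideal long-channel nMOS transistor."""
    beta = mu * cox * (w / l)        # beta = mu * Cox * (W/L)
    vgt = vgs - vt                   # gate overdrive VGT = Vgs - Vt
    if vgt <= 0:                     # cutoff: Vgs < Vt
        return 0.0
    if vds < vgt:                    # linear/triode: Vds < Vdsat = VGT
        return beta * (vgt - vds / 2) * vds
    return 0.5 * beta * vgt ** 2     # saturation: Vds >= Vdsat

# Example sweep at Vgs = 1.0 V
for vds in (0.1, 0.3, 0.5, 1.0):
    print(f"Vds = {vds:.1f} V -> Ids = {ids_long_channel(1.0, vds):.3e} A")
```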
C – V CHARACTERISTICS OF MOS TRANSISTOR (AC characteristics)
Explain the dynamic behavior of MOSFET transistor with neat diagram. (April 2018)
Discuss the CV characteristics of the CMOS. (Nov 2012, May 2014, Nov
2015, Nov 2016)
Explain the electrical properties CMOS. (Nov 2017)
Each terminal of an MOS transistor has capacitance to the other terminals.
Capacitances are nonlinear and voltage dependent (C-V).
SIMPLE MOS CAPACITANCES MODEL:
The gate of an MOS transistor is a good capacitor. Its capacitance is necessary to attract
charge to invert the channel, so high gate capacitance is required to obtain high Ids.
The gate capacitor can be viewed as a parallel plate capacitor with the gate on top, channel on
bottom and the thin oxide dielectric between.
The capacitance is Cg = Cox WL. ----------------(1)
Cg = Cpermicron W --------------- (2)
In addition to the gate, the source and drain also have capacitances. These capacitances are
called parasitic capacitors.
The source and drain capacitances arise from the p–n junctions between the source or drain
diffusion and the body. These capacitances are called diffusion capacitance Csb and Cdb.
The depletion region acts as an insulator between the conducting p- and n-type regions,
creating capacitance across the junction.
The capacitance of junctions depends on the area and perimeter of the source and drain
diffusion, the depth of the diffusion, the doping levels and the voltage.
As diffusion has both high capacitance and high resistance, it is generally made as small as
possible in the layout.
The capacitance depends on both the area AS and sidewall perimeter PS of the source
diffusion region. The area is AS = WD.
The perimeter is PS = 2W +2D.
The source capacitance is Csb = AS·Cjbs + PS·Cjbssw, where Cjbs is the capacitance of the junction between the body and the bottom of the source, and Cjbssw is the capacitance of the junction between the body and the sidewalls of the source.
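As a rough numerical illustration of how the area and sidewall terms combine, the short sketch below assumes placeholder values for Cjbs and Cjbssw:

```python
# Sketch: source/drain diffusion (junction) capacitance from area and perimeter.
# Cjbs and Cjbssw below are arbitrary placeholders, not process data.

def diffusion_cap(w_um, d_um, cjbs, cjbssw):
    """Junction capacitance (F) of a W x D diffusion region."""
    area = w_um * d_um                 # AS = W * D   (um^2)
    perimeter = 2 * w_um + 2 * d_um    # PS = 2W + 2D (um)
    return area * cjbs + perimeter * cjbssw

# Example: 1 um x 0.5 um source, 1 fF/um^2 bottom, 0.1 fF/um sidewall
csb = diffusion_cap(1.0, 0.5, cjbs=1e-15, cjbssw=0.1e-15)
print(f"Csb ~ {csb * 1e15:.2f} fF")
```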
In summary, MOS transistor can be viewed as a four-terminal device with capacitances
between each terminal pair, as shown in Figure 10.
Table 2: Relationships between voltages for the three regions of operation of a CMOS
inverter
Figure 12(a), shows Idsn and Idsp in terms of Vdsn and Vdsp for various values of Vgsn and Vgsp.
Figure 12(b), shows the same plot of Idsn and |Idsp| in terms of Vout for various values of Vin.
Operating points are plotted on Vout vs. Vin axes in Figure 12(c) to show the inverter DC
transfer characteristics.
The supply current IDD = Idsn = |Idsp| is plotted against Vin in Figure 12(d), showing that both transistors are momentarily ON as Vin passes through voltages near VDD/2.
The operation of the CMOS inverter can be divided into five regions as indicated on figure
12(c).
Derive the noise margins for a CMOS inverter. (May 2010, Nov2016)
(iii) Noise Margins:
Noise margin (Noise immunity) is related to the DC voltage characteristics.
Noise margin allows determining the allowable noise voltage on the input of a gate, so that the output will not be corrupted.
The two parameters of the noise margin are the LOW noise margin (NML) and the HIGH noise margin (NMH).
NML is defined as the difference in maximum LOW input voltage VIL and the maximum
LOW output voltage VOL. NML = VIL - VOL
The value of NMH is the difference between the minimum HIGH output voltage VOH and the
minimum HIGH input voltage VIH. i.e., NMH = VOH - VIH
Inputs between VIL and VIH are said to be in the indeterminate region or forbidden zone.
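A quick numerical check of these definitions, using illustrative voltage levels rather than values from any particular device:

```python
# Sketch: noise margins from the four critical voltages of a gate.
# The voltage levels below are illustrative only.

VOH, VOL = 4.9, 0.1   # minimum HIGH output, maximum LOW output (V)
VIH, VIL = 3.5, 1.5   # minimum HIGH input, maximum LOW input (V)

NMH = VOH - VIH       # HIGH noise margin
NML = VIL - VOL       # LOW noise margin
print(f"NMH = {NMH:.1f} V, NML = {NML:.1f} V")
# Inputs between VIL and VIH lie in the indeterminate (forbidden) region.
```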
(iv) Pass Transistor DC Characteristics:
The nMOS transistors pass 0’s well but 1’s poorly. Figure 15(a), shows an nMOS transistor
with the gate and drain tied to VDD.
Initially, Vs = 0 and Vgs > Vtn, so the transistor is ON and current flows.
Therefore, nMOS transistors attempting to pass a 1 never pull the source above VDD – Vtn.
This loss is called a threshold drop.
The pMOS transistors pass 1’s well but 0’s poorly.
If the pMOS source drops below |Vtp|, the transistor cuts off.
Hence, pMOS transistors only pull down to a threshold above GND, as shown
in Figure 15(b).
Explain in detail about the non ideal I-V characteristics of a CMOS device. (MAY
2013)
Explain channel length modulation and body effect. (Nov 2009, May 2013)
MOS characteristics degrade with temperature. It is useful to have a qualitative
understanding of non ideal effects to predict their impact on circuit behavior.
(i) Mobility Degradation and Velocity Saturation:
Current is proportional to the lateral electric field Elat = Vds /L between source and drain.
A high voltage at the gate of the transistor attracts the carriers to the edge of the channel,
causing carriers collision with the oxide interface that slows the carriers. This is called
mobility degradation.
Carriers approach a maximum velocity (vsat) when high fields are applied. This
phenomenon is called velocity saturation.
(ii) Channel Length Modulation:
As Vds increases, the depletion region around the drain widens and the effective channel length becomes shorter, so the saturation current increases slightly with Vds. This is modeled as
Ids = (β/2) VGT² (1 + Vds / VA)
where VA is the Early voltage.
Hence, VA is proportional to channel length. This channel length modulation model is a gross oversimplification of nonlinear behavior.
Threshold Voltage (Vt) Effects
Explain in detail about Threshold Voltage effect and its effect in MOS device. (May 2016)
Threshold voltage Vt increases with the source voltage, decreases with the body voltage,
decreases with the drain voltage and increases with channel length.
Body Effect:
When a voltage Vsb is applied between the source and body, it increases the amount of
charge required to invert the channel. Hence, it increases the threshold voltage.
The threshold voltage can be modeled as
Vt = Vt0 + γ (√(φs + Vsb) − √φs)
where Vt0 is the threshold voltage when the source is at the body potential, φs is the surface potential at threshold and γ is the body effect coefficient.
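A small sketch of this body-effect model; Vt0, φs and γ are assumed illustrative values, not process parameters:

```python
# Sketch of the body-effect model Vt = Vt0 + gamma*(sqrt(phi_s + Vsb) - sqrt(phi_s)).
# Vt0, phi_s and gamma are illustrative values, not process parameters.
from math import sqrt

def vt_body_effect(vsb, vt0=0.4, phi_s=0.9, gamma=0.4):
    """Threshold voltage (V) as a function of source-to-body voltage Vsb."""
    return vt0 + gamma * (sqrt(phi_s + vsb) - sqrt(phi_s))

for vsb in (0.0, 0.5, 1.0):
    print(f"Vsb = {vsb:.1f} V -> Vt = {vt_body_effect(vsb):.3f} V")
```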
(iv) Leakage:
Even when transistors are OFF, transistors leak small amounts of current.
Leakage mechanisms include subthreshold conduction between source and drain, gate
leakage from the gate to body and junction leakage from source to body and drain to body.
Subthreshold conduction is caused by thermal emission of carriers over the potential
barrier set by the threshold.
Gate leakage is a quantum-mechanical effect caused by tunneling through the extremely
thin gate dielectric.
Junction leakage is caused by current through the p-n junction between the source/drain
diffusions and the body.
SCALING
Discuss the scaling principles and its limits. (MAY 2013, Nov 2017, Nov 2018)
Discuss the principle of constant field and lateral scaling. Write the effects of the above
scaling methods on the device characteristics. (Nov 2012, Dec 2011, Nov 2015, May 2016)
Explain need of scaling, scaling principles and fundamental units of CMOS inverter. (May 2017)
In VLSI design, the transistor size has reduced by 30% every two to three years. Scaling
is reducing feature size of transistor.
With each generation, transistors become smaller, switch faster, dissipate less power and are cheaper to manufacture.
Designers need to predict the effect of feature size scaling on chip performance to plan future products and to migrate existing products for cost reduction.
Transistor scaling:
Dennard’s Scaling Law predicts that the basic operational characteristics of a MOS
transistor can be preserved and the performance can be improved.
Parameters of a device are scaled by a dimensionless factor S.
These parameters include the following:
All dimensions (in the x, y, and z directions)
Device voltages
Doping concentration densities
Constant field scaling (Full Scaling):
In constant field scaling, electric fields remain the same as both voltage and distance
shrink.
1/S scaling is applied to all dimensions, device voltages and concentration densities.
Ids per transistor is scaled by 1/S.
Number of transistors per unit area is scaled by S².
Current density is scaled by S and power density remains constant,
e.g., power density ∝ (1/S)(1/S)(S²) = 1.
Lateral scaling (gate-shrink):
Another approach is lateral scaling, in which only the gate length is scaled.
This is commonly called as gate shrink, because it can be done easily to an existing mask
database for a design.
Ids per transistor is scaled by S.
Number of transistors per unit area is scaled by S.
Current density is scaled by S² and power density is scaled by S².
The industry generally scales process generations with 30% shrink.
It reduces the cost (area) of a transistor by a factor of two.
A 5% gate shrink (S = 1.05) is commonly applied as a process becomes mature, to boost the speed of components in that process.
Constant voltage scaling (Fixed scaling) offers quadratic delay improvement as well
as cost reduction.
It also maintains continuity in I/O voltage standards. However, constant voltage scaling increases the electric fields in devices.
Ids per transistor is scaled by S.
Number of transistors per unit area is scaled by S².
Current density is scaled by S³ and power density is scaled by S³.
A 30% shrink with Dennard scaling improves clock frequency by 40% and cuts power
consumption per gate by a factor of 2.
Maintaining a constant field has the further benefit that many nonlinear factors and wear-out mechanisms are unaffected.
From the 90 nm generation onward, voltage scaling has slowed dramatically due to leakage. This may ultimately limit CMOS scaling.
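The scaling factors listed above can be tabulated with a short sketch; the factor S = 1.4 used in the example corresponds roughly to a 30% shrink:

```python
# Sketch: scaling of key quantities with the dimensionless factor S (> 1)
# for the three scaling styles discussed above.

def scaling_factors(S):
    # style: (Ids per transistor, transistors per area, current density, power density)
    return {
        "constant field":        (1 / S, S ** 2, S,      1),
        "lateral (gate shrink)": (S,     S,      S ** 2, S ** 2),
        "constant voltage":      (S,     S ** 2, S ** 3, S ** 3),
    }

S = 1.4  # a 30% shrink corresponds to S ~ 1 / 0.7 ~ 1.4
for style, (ids, density, jden, pden) in scaling_factors(S).items():
    print(f"{style:22s}: Ids x{ids:.2f}, devices/area x{density:.2f}, "
          f"J x{jden:.2f}, power density x{pden:.2f}")
```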
DELAY ESTIMATION:
The gate that charges or discharges a node is called the driver. The gates and wires being driven are called the load. Propagation delay is usually simply called delay.
Arrival times and propagation delays are defined separately for rising and falling transitions.
The delay of a gate may be different for different inputs. Earliest arrival times can also be computed based on contamination delays.
The expression for the delay of a rising output is tPLH = 0.69 RP CL, where RP is the effective resistance of the pMOS transistor and CL is the load capacitance of the CMOS inverter.
The expression for the delay of a falling output is tPHL = 0.69 RN CL, where RN is the effective resistance of the nMOS transistor.
Propagation delay of CMOS inverter is tP = (tPLH + tPHL) / 2
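A minimal sketch of these delay expressions, assuming illustrative values for RP, RN and CL:

```python
# Sketch of the first-order inverter delay from the 0.69*R*C expressions above.
# The resistance and capacitance values are illustrative only.

def inverter_tp(rp, rn, cl):
    """Average propagation delay tP = (tPLH + tPHL) / 2."""
    tplh = 0.69 * rp * cl   # rising-output delay through the pMOS
    tphl = 0.69 * rn * cl   # falling-output delay through the nMOS
    return (tplh + tphl) / 2

# Example: Rp = 20 kOhm, Rn = 10 kOhm, CL = 10 fF
print(f"tP ~ {inverter_tp(20e3, 10e3, 10e-15) * 1e12:.1f} ps")
```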
RC Delay Model:
Discuss in detail about the resistive and capacitive delay estimation of a CMOS inverter
circuit.
(MAY 2013) (or)
Briefly explain about the RC delay model.
RC delay model approximates the nonlinear transistor I-V and C-V characteristics with an
average resistance and capacitance over the switching range of the gate.
Effective Resistance:
Figure shows equivalent RC circuit models for nMOS and pMOS transistors of
width k with contacted diffusion on both source and drain.
The pMOS transistor has approximately twice the resistance of the nMOS
transistor, because holes have lower mobility than electrons.
Stick diagram
A stick diagram is a cartoon of a chip layout: a paper-and-pencil tool used to plan the layout of a cell.
The stick diagram resembles the actual layout, but uses "sticks" or lines to represent the
devices and conductors. Figure 17, shows a stick diagram for an inverter.
The stick diagram represents the rectangles with lines, which represent wires and
component symbols.
The stick diagram does not represent all the details of a layout, but it makes some
relationship much clearer and it is simple to draw.
Layouts are constructed from rectangles, but stick diagrams are built from cartoon symbols
for components and wires.
Stick diagram Rules:
Rule 1: When two or more ‘sticks’ of the same type cross or touch each other, that
represents electrical contact.
Rule 2: When two or more ‘sticks’ of the different type cross or touch each other, there
is no electrical contact. If electrical contact is needed, we have to show the connection
explicitly.
Rule 3: When a poly crosses diffusion, it represents a transistor. If a contact is shown,
then it is not a transistor. A transistor exists where a polysilicon (red) stick crosses
either an n-diffusion (green) stick or a p-diffusion (yellow) stick.
Rule 4: In CMOS, a demarcation line is drawn to avoid touching of p-diff with n-diff.
All pMOS must lie on one side of the line and all nMOS will have to be on the other
side.
Drawing stick diagrams in color: Red for poly, green for n-diffusion, yellow for p-
diffusion, and shades of blue for metal are typical colors.
A few simple rules for constructing wires from straight-line segments ensure that, the stick
diagram corresponds to a feasible layout.
Wires cannot be drawn at arbitrary angles. Only horizontal and vertical wire segments are
allowed.
Two wire segments on the same layer, which cross are electrically connected.
Vias to connect wires, which do not normally interact, are drawn as black dots.
Figure 19, shows the stick figures for transistors.
Each type of transistor is represented as poly and diffusion crossings, much as in the layout.
Draw and explain briefly the n-well CMOS design rules. (NOV 2007, April 2008, MAY 2014)
Discuss in detail with a neat layout, the design rules for a CMOS inverter.
Write the layout design rules and draw diagram for four input NAND and NOR. (Nov
2016) (April 2018)
Layout rules, also referred to as design rules, can be considered a prescription for preparing the photomasks which are used in the fabrication of integrated circuits.
The rules are defined in terms of feature sizes (widths), separations and overlaps.
The main objective of the layout rules is to build reliable functional circuits in as small an area as possible.
Layout design rules describe how small features can be and how closely they can be
reliably packed in a particular manufacturing process.
Design rules are a set of geometrical specifications that dictate the design of the layout
masks.
A design rule set provides numerical values for minimum dimensions and line spacing.
Scalable design rules are based on a single parameter (λ), which characterizes the
resolution of the process. λ is generally half of the minimum drawn transistor channel
length.
This length is the distance between the source and drain of a transistor and is set by the
minimum width of a polysilicon wire.
Lambda based rule (Scalable design rule):
Lambda-based rules round dimensions up to an integer multiple of λ.
Lambda rules make it simple to scale a layout: the same layout can be moved to a new process simply by specifying a new value of λ.
The minimum feature size of a technology is characterized as 2λ.
Micron Design Rules (Absolute dimensions):
The MOSIS rules are expressed in terms of lambda.
These rules allow some degree of scaling between processes.
Only need to reduce the value of lambda and the designs will be valid in the next process
down in size.
These processes rarely shrink uniformly.
Thus, industry usually uses the actual micron design rules for layouts.
There is a set of micron design rules for a hypothetical 65 nm process.
These rules differ slightly, but not immensely, from lambda-based rules with λ = 0.035 µm.
Upper level metal rules are highly variable depending on the metal thickness. Thicker
wires require greater widths, spacing and bigger vias.
Two metal layers in an n-well process has the following:
Metal and diffusion have minimum width and spacing of 4 λ.
Contacts are 2 λ × 2 λ and must be surrounded by 1 λ on the layers above and below.
Polysilicon uses a width of 2 λ.
Polysilicon overlaps diffusion by 2 λ where a transistor is desired and has a
spacing of 1 λ away where no transistor is desired.
Figure: Simplified λ -based design rules with CMOS inverter layout diagram
Design Rule:
Well Rules:
The n-well is usually a deeper implant than the transistor source/drain implants.
Therefore, it is necessary to provide sufficient clearance between the n-well edges and the
adjacent n+ diffusions.
Transistor Rules:
CMOS transistors are generally defined by at least four physical masks.
There are active (also called diffusion, diff, thinox, OD, or RX), n-select (also called n-
implant, n-imp, or nplus), p-select (also called p-implant, pimp, or pplus) and polysilicon
(also called poly, polyg, PO, or PC).
The active mask defines all areas, where n- or p-type diffusion is to be placed or where the
gates of transistor are to be placed.
Contact Rules:
There are several generally available contacts:
Metal to p-active (p-diffusion)
Metal to n-active (n-diffusion)
Metal to polysilicon
Metal to well or substrate
Metal Rules:
Metal spacing may vary with the width of the metal line.
The minimum spacing of wide metal wires may need to be increased, due to the etch characteristics of large metal wires.
Via Rules:
Processes may allow vias to be placed over polysilicon and diffusion regions.
Some processes allow vias to be placed within these areas, but do not allow vias to cross the boundary of polysilicon or diffusion.
Example: NAND3
Draw the layout diagram of NAND. (May 2017)
Latchup:
The npn transistor is formed between the grounded n-diffusion source of the nMOS transistor, the p-type substrate and the n-well.
The resistors are due to the resistance through the substrate or well to the nearest
substrate and well taps.
The cross-coupled transistors form a bistable silicon-controlled rectifier (SCR). Normally, both parasitic bipolar transistors are OFF.
Latchup can be triggered, when transient currents flow through the substrate during
normal chip power-up.
Latchup prevention is easily accomplished by
Minimizing Rsub and Rwell.
Use of guard rings
SOI processes avoid latchup entirely, because they have no parasitic bipolar structures.
Process parameters for MOS and CMOS:
CMOS TECHNOLOGIES:
The four main CMOS technologies are:
n-well process
p-well process
Twin-tub process
Silicon on Insulator (SOI)
Explain the different steps involved in CMOS fabrication / manufacturing process with neat
diagrams.
(Nov 2007, Nov 2009, Nov 2016, NOV 2018)
Describe with neat diagram the n-well and channel formation in CMOS process. (Nov/Dec
2014)(Nov/Dec 2011) (April/May 2011) (Nov/Dec 2012)
n-WELL PROCESS:
Step 1: Start with blank wafer
First step will be to form the n-well
– Cover wafer with protective layer of SiO2 (oxide)
– Remove layer where n-well should be built.
Step 2: Oxidation
Grow SiO2 on top of the Si wafer at 900–1200 °C with H2O or O2 in an oxidation furnace.
Step 3: Photoresist
• Spin on photoresist
– Photoresist is a light-sensitive organic polymer.
– Softens, where exposed to light.
Step 4: Lithography
• Expose photoresist through n-well mask.
• Strip off exposed photoresist.
Step 5: Etch
• Etch oxide with hydrofluoric acid (HF).
• HF attacks only the oxide where the resist has been removed.
Step 7: n-well
n-well is formed with diffusion or ion implantation.
Step 9: Polysilicon
• Deposit a thin layer of gate oxide. Use CVD to form polysilicon and dope it heavily to increase conductivity.
Step 16: Metallization
• Sputter on aluminum over whole wafer.
• Pattern to remove excess metal, leaving wires.
********
P-WELL PROCESS:
A common approach to p-well CMOS fabrication is to start with moderately doped n-type
substrate (wafer), create the p-type well for the n-channel devices and build the p-channel
transistor in the native n-substrate.
Explain the twin tub process with a neat diagram. (Nov 2007, April
2008)
Twin-tub process:
Step 1:
An n-type substrate is taken initially, as shown in the figure.
Step 2:
The next step is epitaxial layer deposition. A lightly doped epitaxial layer is deposited above the n-substrate.
Step 3:
The next step is tub formation. Two wells are formed namely n-well and p-well.
Polysilicon layer is formed above overall substrate.
Step 4:
Polysilicon gates are formed for n-well and p-well by using photo-etching process.
Step 5:
n+ diffusion is formed in the n-well and p+ diffusion is formed in the p-well. These are used for the VDD contact and VSS contact, and this step is known as substrate (tap) contact formation.
Step 6:
Then, contact cuts are defined as in the n-well process, and metallization is carried out.
Elmore’s Delay
What is meant by Elmore’s delay and give expression for Elmore’s delay?
The Elmore delay model estimates the delay from a switching source to one of the leaf nodes.
Delay is the sum, over each node i, of the capacitance Ci on the node multiplied by the effective resistance Ri-to-source on the path from that node back to the source.
Propagation delay time:
tpd = Σ (over nodes i) Ci Ri-to-source
Linear Delay Model:
In the linear delay model, the normalized delay of a gate is d = g h + p, where g is the logical effort, p is the parasitic delay and h = Cout / Cin is the electrical effort. Cout is the capacitance of the external load being driven and Cin is the input capacitance of the gate.
Normalized delay vs. electrical effort for an idealized inverter and a 3-input NAND gate is shown in the diagram.
The y-intercepts indicate the parasitic delay. The slope of the lines is the logical effort.
The inverter has a slope of 1. The NAND gate has a slope of 5/3.
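The linear delay model can be evaluated with a short sketch. The logical efforts below follow the text (1 for the inverter, 5/3 for the 3-input NAND); the parasitic delays of 1 and 3 used here are assumed placeholder values in the usual normalized units:

```python
# Sketch of the linear delay model d = g*h + p (normalized delay).
# Logical efforts follow the text (inverter g = 1, 3-input NAND g = 5/3);
# the parasitic delays of 1 and 3 assumed here are placeholder values.

def normalized_delay(g, h, p):
    """Normalized delay: logical effort * electrical effort + parasitic delay."""
    return g * h + p

for h in (1, 2, 4, 8):                         # electrical effort h = Cout / Cin
    d_inv = normalized_delay(1.0, h, 1.0)      # inverter
    d_nand3 = normalized_delay(5 / 3, h, 3.0)  # 3-input NAND
    print(f"h = {h}: inverter d = {d_inv:.2f}, NAND3 d = {d_nand3:.2f}")
```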
Design a four input NAND gate and obtain its delay during the transition from high to
low. (April 2018)
Figure shows a model of an n-input NAND gate in which the upper inputs were all 1
and the bottom input rises. The gate must discharge the diffusion capacitances of all of the
internal nodes as well as the output.
The Elmore delay is
tpd = (n/2 + 5/2) n RC = (n²/2 + 5n/2) RC
Delay for a 4-input NAND gate: (4²/2 + 5·4/2) RC = (16/2 + 20/2) RC = 18 RC
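The same result can be obtained by summing the Elmore terms directly. The sketch below assumes, as in the derivation above, that each series nMOS has width n (resistance R/n), each internal node carries nC of diffusion capacitance and the output node carries 3nC:

```python
# Sketch: Elmore delay of an n-input NAND gate whose bottom input rises last.
# Assumptions (matching the derivation above): each series nMOS has width n
# (resistance R/n), each internal node carries n*C of diffusion capacitance,
# and the output node carries 3n*C.

def nand_elmore_delay(n, R=1.0, C=1.0):
    """Elmore delay in units of RC (when R = C = 1)."""
    r_unit = R / n                                            # resistance of each series nMOS
    delay = sum((i * r_unit) * (n * C) for i in range(1, n))  # internal nodes
    delay += (n * r_unit) * (3 * n * C)                       # output node: R total, 3nC
    return delay

for n in (2, 3, 4):
    print(f"{n}-input NAND: {nand_elmore_delay(n):.1f} RC "
          f"(formula (n^2 + 5n)/2: {(n * n + 5 * n) / 2:.1f} RC)")
```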
Obtain the logical effort and path efforts of the given circuit. (April 2018)
Delay in Multistage Logic Networks:
The figure shows the logical and electrical efforts of each stage in a multistage path
as a function of the sizes of each stage.
The path of interest (the only path in this case) is marked with the dashed blue line. Observe
that logical effort is independent of size, while electrical effort depends on sizes.
The path logical effort G can be expressed as the products of the logical efforts of each
stage along the path.
G = Π gi
The path electrical effort H can be given as the ratio of the output capacitance the path must drive to the input capacitance presented by the path: H = Cout(path) / Cin(path).
The path effort F is the product of the stage efforts of each stage.
F = Π fi = Π (gi hi)
Introduce an effort to account for branching between stages of a path. This branching
effort b is the ratio of the total capacitance seen by a stage to the capacitance on the path.
b = (Conpath + Coffpath) / Conpath
The path branching effort B is the product of the branching efforts between stages.
B = Π bi
The path effort (F) is defined as the product of the logical, electrical, and
branching efforts of the path. The product of the electrical efforts of the stages is actually
BH, not just H.
F = G B H
Compute the delay of a multistage network. The path delay D is the sum of the delays of each stage. It can also be written as the sum of the path effort delay DF and the path parasitic delay P:
D = Σ di = DF + P, where DF = Σ fi and P = Σ pi
The product of the stage efforts is F, independent of gate sizes. The path effort
delay is the sum of the stage efforts. The sum of a set of numbers whose product is constant
is minimized by choosing all the numbers to be equal.
The path delay is minimized when each stage bears the same effort. If a path has N
stages and each bears the same effort, that effort must be
f̂ = gi hi = F^(1/N)
Thus, the minimum possible delay of an N-stage path with path effort F and
path parasitic delay P is
D = N F^(1/N) + P
It shows that the minimum delay of the path can be estimated knowing only
the number of stages, path effort, and parasitic delays without the need to assign transistor
sizes.
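The sizing-independent minimum-delay estimate can be computed with a short sketch. The example stage data (an inverter, a 2-input NAND with the textbook logical effort 4/3, another inverter, H = 12, no branching, parasitics 1, 2, 1) are assumed for illustration only:

```python
# Sketch: minimum path delay using the method of logical effort.
# The example stage data are assumed for illustration only.

def min_path_delay(g_list, b_list, H, p_list):
    """Return (F, f_hat, D_min) for a path described by per-stage g, b and p."""
    N = len(g_list)
    G = 1.0
    for g in g_list:
        G *= g                    # path logical effort G = product of g_i
    B = 1.0
    for b in b_list:
        B *= b                    # path branching effort B = product of b_i
    F = G * B * H                 # path effort F = G*B*H
    f_hat = F ** (1.0 / N)        # optimum effort per stage
    D = N * f_hat + sum(p_list)   # minimum delay D = N*F^(1/N) + P
    return F, f_hat, D

F, f_hat, D = min_path_delay([1, 4 / 3, 1], [1, 1, 1], H=12, p_list=[1, 2, 1])
print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, Dmin = {D:.2f} (normalized)")
```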
Bubble pushing
CMOS stages are inherently inverting, so AND and OR functions must be built from NAND
and NOR gates.
DeMorgan's law helps with this conversion:
(A·B)' = A' + B'
(A + B)' = A' · B'
A NAND gate is equivalent to an OR of inverted inputs.
A NOR gate is equivalent to an AND of inverted inputs.
The same relationship applies to gates with more inputs.
Switching between these representations is easy and is often called bubble pushing.
Compound Gates
Static CMOS also efficiently handles compound gates, which compute various inverting combinations of AND/OR functions in a single stage.
The function F = AB + CD can be computed with an AND-OR-INVERT-22 (AOI22) gate and an inverter, as shown in Figure.
The logical effort of compound gates can be different for different inputs.
Figure shows how logical efforts can be estimated for the AOI21, AOI22 and a more complex compound AOI gate.
The logical effort and parasitic delay of different gate inputs are different.
Consider the falling output transition occurring when one input holds a stable 1 value and the other rises from 0 to 1.
If input B rises last, node x will initially be at VDD – Vt, because it was pulled up through the nMOS transistor on input A.
The Elmore delay is (R/2)(2C) + R(6C) = 7RC = 2.33 τ.
If input A rises last, node x will initially be at 0 V, because it was discharged through the nMOS transistor on input B.
No charge must be delivered to node x, so the Elmore delay is simply R(6C) =6RC =2τ.
We define the outer input to be the input closer to the supply rail (e.g., B) and the
inner input to be the input closer to the output (e.g., A).
Therefore, if one signal is known to arrive later than the others, the gate is faster when
that signal is connected to the inner input.
d. Asymmetric gates
When one input is far less critical than another, even symmetric gates can be made
asymmetric to favor the late input at the expense of the early one.
In a series network, this involves connecting the early input to the outer transistor and making the transistor wider, so that it offers less series resistance when the critical input arrives.
In a parallel network, the early input is connected to a narrower transistor to reduce
the parasitic capacitance.
Consider the path in Figure (a). Under ordinary conditions, the path acts as a buffer between A and Y.
When reset is asserted, the path forces the output low.
If reset only occurs under exceptional circumstances and takes place slowly, the circuit should be optimized for input-to-output delay at the expense of reset.
This can be done with the asymmetric NAND gate in Figure (b).
What is meant by skewed gate and give functions of skewed gate with schematic diagrams?
In some gates, one output transition is more important than the other. HI-skew gates favor the rising output transition.
Alternative CMOS logic configurations (ratioed circuits, dynamic circuits and pass-transistor circuits) are called circuit families.
nMOS transistors provide more current than pMOS for the same size and capacitance, so
nMOS networks are preferred.
Examples of combinational circuits
(i) CMOS inverter:
*************************************************************************************************
Briefly discuss about the classification of circuit families and comparison of the circuit
families. (May 2014, APRIL-2015)
Draw the CMOS logic circuit for the Boolean expression Z= A( B C ) DE and explain.
(April 2018)
2.4.1: Differential Pass Transistor Logic
For high performance design, a differential pass-transistor logic family, called CPL or DPL, is
commonly used.
The basic idea is to accept true and complementary inputs and produce true and
complementary outputs.
A number of CPL gates (AND/NAND, OR/NOR, and XOR/NXOR) are shown in Figure.
Since the circuits are differential, complementary data inputs and outputs are always available.
The availability of both polarities of every signal eliminates the need for extra inverters, as is often required in static CMOS or pseudo-nMOS.
CPL belongs to the class of static gates, because the output-defining nodes are always
connected to either VDD or GND through a low resistance path.
This is an advantage for noise resilience.
Discuss in detail the characteristics of CMOS Transmission gates.(May 2016, May 2017, Nov 2017)
Explain Transmission gates with neat sketches. (April 2008, April 2018)
List out limitations of pass transistor logic. Explain any two techniques used to overcome
limitations. (NOV 2018)
A transmission gate used in conjunction with simple static CMOS logic is called CMOS with transmission gates.
A transmission gate is a parallel pair of nMOS and pMOS transistors.
A single nMOS or pMOS pass transistor suffers from a threshold drop.
Transmission gates solve the threshold drop but require two transistors in parallel.
The resistance of a unit-sized transmission gate can be estimated as R for the purpose of
delay estimation.
Current flows through the parallel combination of the nMOS and pMOS transistors. One of the transistors passes the value well and the other passes it poorly.
A logic-1 is passed well through the pMOS but poorly through the nMOS.
Estimate the effective resistance of a unit transistor passing a value in its poor direction as
twice the usual value: 2R for nMOS and 4R for pMOS.
2.5.1:Ratioed Circuits:
The ratioed gate consists of an nMOS pulldown network and pullup device called the static
load.
When the pulldown network is OFF, the static load pulls the output to 1.
When the pulldown network turns ON, it fights the static load.
The static load must be weak enough that, the output pulls down to an acceptable 0. Hence,
there is a ratio constraint between the static load and pulldown network.
Explain the detail about pseudo-nMOS gates with neat circuit diagram. (April/May 2011)
(Nov/Dec 2013)
Figure: Symmetric NOR gate.
When one input is 0 and the other 1, the gate can be viewed as a pseudo-nMOS circuit
with appropriate ratio constraints.
When both inputs are 0, both pMOS transistors turn on in parallel, pulling the output
high faster than they would, in an ordinary pseudo nMOS gate.
When both inputs are 1, both pMOS transistors turn OFF, saving static power dissipation.
Cascode Voltage Switch Logic (CVSL) seeks the benefits of ratioed circuits without the static
power consumption.
It uses both true and complementary input signals and computes both true and complementary
outputs using a pair of nMOS pulldown networks, as shown in Figure (a).
The pulldown network f implements the logic function as in a static CMOS gate, while the complementary pulldown network f' uses inverted inputs feeding transistors arranged in the conduction complement.
For any given input pattern, one of the pulldown networks will be ON and the other OFF.
The pulldown network that is ON will pull that output low.
This low output turns ON the pMOS transistor to pull the opposite output high.
When the opposite output rises, the other pMOS transistor turns OFF, so no static power
dissipation occurs.
Figure (b) shows a CVSL AND/NAND gate.
Advantage:
CVSL has a potential speed advantage because all of the logic is performed with nMOS
transistors, thus reducing the input capacitance.
Describe the basic principle of operation of dynamic CMOS, domino and NP domino logic
with neat diagrams. (NOV 2011)
Dynamic Circuits:
Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to
the inputs with a single resistive pullup.
The drawbacks of ratioed circuits include
o Slow rising transitions,
o Contention on the falling transitions,
o Static power dissipation and a nonzero VOL.
Dynamic circuits avoid these drawbacks by using a clocked pullup transistor rather than a
pMOS that is always ON.
Figure compares (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters.
Figure: Comparison of (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters
Dynamic circuit operation is divided into two modes, as shown in Figure.
(i) During precharge, the clock ф is 0, so the clocked pMOS is ON and initializes the output Y
high.
(ii) During evaluation, the clock is 1 and the clocked pMOS turns OFF. The output may
remain high or may be discharged low through the pulldown network.
Foot transistor:
In Figure (c), if the input A is 1 during precharge, contention will take place because both the
pMOS and nMOS transistors will be ON.
When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation
transistor can be added to the bottom of the nMOS stack.
This extra clocked transistor, shown in the figure below, is called the foot.
The figure below also gives the falling logical effort of both footed and unfooted dynamic gates.
Footed gates have higher logical effort than their unfooted counterparts but are still an improvement over static logic.
The parasitic delay does increase with the number of inputs, because there is more diffusion
capacitance on the output node.
A fundamental difficulty with dynamic circuits is the monotonicity requirement. While a
dynamic gate is in evaluation, the inputs must be monotonically rising.
That is, the input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and
remain HIGH, but not start HIGH and fall LOW.
Figure shows waveforms for a footed dynamic inverter in which the input violates
monotonicity.
Explain the domino logic families with neat diagrams. (NOV 2012, APRIL-2015, Nov 2017)
Dual-rail domino gates encode each signal with a pair of wires. The input and output signal
pairs are denoted with _h and _l, respectively.
Table summarizes the encoding. The _h wire is asserted to indicate that the output of the gate
is “high” or 1. The _l wire is asserted to indicate that the output of the gate is “low” or 0.
When the gate is precharged, neither _h nor _l is asserted. The pair of lines should never be
both asserted simultaneously during correct operation.
Dual-rail domino gates accept both true and complementary inputs and compute both true and
complementary outputs, as shown in Figure (a).
This is identical to static CVSL circuits except that the cross-coupled pMOS transistors are
instead connected to the precharge clock.
Therefore, dual-rail domino can be viewed as a dynamic form of CVSL, sometimes called
DCVS.
Figure (b) shows a dual-rail AND/NAND gate and Figure (c) shows a dual-rail XOR/XNOR
gate. The gates are shown with clocked evaluation transistors, but can also be unfooted.
Dynamic circuits also suffer from charge leakage on the dynamic node.
If a dynamic node is precharged high and then left floating, the voltage on the
dynamic node will drift over time due to subthreshold, gate and junction leakage.
Dynamic circuits have poor input noise margins.
If the input rises above Vt while the gate is in evaluation, the input transistors will turn ON weakly and can incorrectly discharge the output.
Both leakage and noise margin problems can be addressed by adding a keeper circuit.
Figure shows a conventional keeper on a domino buffer. The keeper is a weak
transistor that holds, or staticizes, the output at the correct level when it would
otherwise float.
When the dynamic node X is high, the output Y is low and the keeper is ON to prevent
X from floating.
When X falls, the keeper initially opposes the transition, so it must be much weaker
than the pulldown network.
Eventually Y rises, turning the keeper OFF and avoiding static power dissipation.
Also suppose that the intermediate node x had a low value from a previous cycle.
During evaluation, input A rises, but input B remains low, so the output Y should remain high.
However, charge is shared between CX and CY, as shown in Figure (b). This behaves as a capacitive voltage divider and the voltages equalize at V = VDD CY / (CX + CY).
Charge sharing is serious when the output is lightly loaded (small CY ) and the internal
capacitance is large.
If the charge-sharing noise is small, the keeper will eventually restore the dynamic output to
VDD.
If the charge-sharing noise is large, the output may flip and turn off the keeper, leading to
incorrect results.
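A small sketch of this charge-sharing voltage divider, with illustrative capacitance ratios:

```python
# Sketch of the charge-sharing voltage divider described above: dynamic output Y
# (capacitance CY, precharged to VDD) shares charge with internal node x
# (capacitance Cx, initially at 0 V). Capacitance ratios are illustrative.

def charge_sharing_voltage(vdd, cy, cx):
    """Final voltage on the dynamic output node after charge sharing."""
    return vdd * cy / (cy + cx)

VDD = 1.0
for cx_over_cy in (0.1, 0.5, 1.0):
    v = charge_sharing_voltage(VDD, cy=1.0, cx=cx_over_cy)
    print(f"Cx/CY = {cx_over_cy:.1f} -> output droops to {v:.2f} V")
```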
Charge sharing can be overcome by precharging some or all of the internal nodes with
secondary precharge transistors.
These transistors should be small, because they only charge the small internal capacitances
and their diffusion capacitance slows the evaluation.
It is sufficient to precharge every other node in a tall stack.
The HI-skew inverting static gates are replaced with predischarged dynamic gates using
pMOS logic.
A footed dynamic p-logic NAND gate is shown in Figure (b). When ф is 0, the first and third
stages precharge high while the second stage predischarges low.
When ф rises, all the stages evaluate. Domino connections are possible, as shown in Figure
(c).
In an ordinary dynamic gate, the input has a low noise margin (about Vt ), but is strongly
driven by a static CMOS gate.
The floating dynamic output is more prone to noise from coupling and charge sharing, but
drives another static CMOS gate with a larger noise margin.
In NORA, however, the sensitive dynamic inputs are driven by noise prone dynamic outputs.
Given these drawbacks and the extra clock phase requirement, there is little reason to use NORA.
Zipper domino is a closely related technique that leaves the precharge transistors slightly ON during evaluation by using reduced-swing precharge clocks. These clocks swing between 0 and VDD – |Vtp| for the pMOS precharge transistors and between Vtn and VDD for the nMOS precharge transistors.
Figure : NP Domino
**********************************************************************************
Explain the static and dynamic power dissipation in CMOS circuits with necessary
diagrams and expressions. (DEC 2011, Nov 2015, NOV 2016, May 2017, May 2010)
What are the sources of power dissipation in CMOS and discuss various design
techniques to reduce power dissipation in CMOS? (Nov 2012, May 2013, Nov
2014, May 2016)
The instantaneous power P (t) consumed by a circuit element is the product of the current and
the voltage of the element
P (t ) = I (t )V (t )
The energy consumed over a time interval T is the integral of the instantaneous power:
E = ∫0..T P(t) dt
The average power is:
Pavg = E / T = (1/T) ∫0..T P(t) dt
Power is expressed in units of Watts (W). Energy is usually expressed in Joules (J).
By Ohm's Law, V = IR, so the instantaneous power dissipated in a resistor is
PR(t) = VR²(t) / R = IR²(t) R
This power is converted from electricity to heat. VDD supplies power proportional to its current: PVDD(t) = IDD(t) VDD
When the capacitor is charged from 0 to VC, it stores energy EC = (1/2) C VC².
Figure shows a CMOS inverter driving a load capacitance.
When the input switches from 1 to 0, the pMOS transistor turns ON and charges the load to
VDD.
According to the EC equation, the energy stored in the capacitor is (1/2) C VDD²; the energy delivered from the power supply is C VDD², so the other half is dissipated as heat in the pMOS transistor.
The gate switches at some average frequency fsw.
Over some interval T, the load will be charged and discharged T·fsw times.
Then, the average power dissipation is
Pswitching = C VDD² fsw
Because most gates do not switch every clock cycle, it is often more convenient to express the switching frequency fsw as an activity factor α times the clock frequency f.
The dynamic power dissipation may then be rewritten as
Pswitching = α C VDD² f
The activity factor is the probability that the circuit node transitions from 0 to 1, because that
is the only time the circuit consumes power.
A clock has an activity factor of α = 1 because it rises and falls every cycle.
The total power of a circuit is calculated as,
Pdynamic = Pswitching + Pshort circuit
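A short numerical sketch of the switching-power expression above; the activity factor, switched capacitance, supply voltage and clock frequency are illustrative values only:

```python
# Sketch of the switching-power expression P = alpha * C * VDD^2 * f.
# The activity factor, switched capacitance, supply voltage and clock
# frequency below are illustrative values only.

def switching_power(alpha, c_load, vdd, f_clk):
    """Dynamic (switching) power in watts."""
    return alpha * c_load * vdd ** 2 * f_clk

P = switching_power(alpha=0.1, c_load=1e-9, vdd=1.0, f_clk=1e9)
print(f"P_switching ~ {P:.2f} W")
```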
Explain various ways to minimize the static and dynamic power dissipation. (Nov 2013, May 2015)
Discuss the low power design principles in detail. (Nov 2017)
Low power design involves considering and reducing each of the terms in switching power.
i. As VDD is a quadratic term, it is good to select the minimum VDD.
ii. Choose the lowest frequency.
iii. The activity factor is reduced by putting unused blocks to sleep.
iv. Finally, the circuit may be optimized to reduce the overall load capacitance.
Switching power is consumed by delivering energy to charge a load capacitance, then
dumping this energy to GND.
Activity factor:
If a circuit can be turned OFF entirely, the activity factor and dynamic power go to zero.
Blocks are typically turned OFF, by stopping the clock called as clock gating.
The activity factor of a logic gate can be estimated by calculating the switching probability.
(a)Clock gating:
Clock gating ANDs a clock signal with an enable to turn OFF the clock to idle blocks.
The clock enable must be stable while the clock is active.
Figure shows how an enable latch can be used to ensure the enable does not change
before the clock falls.
Capacitance:
Switching capacitance comes from the wires and transistors in a circuit.
Wire capacitance is minimized through good floor planning and placement.
Device-switching capacitance is reduced by choosing smaller transistors.
Voltage:
Voltage has a quadratic effect on dynamic power.
Therefore, choosing a lower power supply significantly reduces power
consumption.
The chip may be divided into multiple voltage domains, where each domain is
optimized for the needs of certain circuits.
a. Voltage domains:
Using voltage domains requires selecting which circuits belong in which domain and routing power supplies to the multiple domains.
Figure (Voltage domain crossing) shows direct connection of inverters in two
domains using high and low supplies, VDDH and VDDL, respectively.
1. Subthreshold leakage:
Subthreshold conduction causes an OFF transistor to leak; to first order it can be modeled as
Isub ≈ Ioff · 10^(Vgs / S)
where Ioff is the subthreshold current at Vgs = 0 and Vds = VDD, and S is the subthreshold slope.
2. Gate leakage:
Gate leakage occurs when carriers tunnel through a thin gate dielectric, when a voltage is
applied across the gate (e.g., when the gate is ON).
Gate leakage is a strong function of the dielectric thickness.
3. Junction leakage:
Junction leakage occurs when a source or drain diffusion region is at a different potential
from the substrate.
Leakage of reverse-biased diodes is usually negligible.
4. Contention current:
Static CMOS circuits have no contention current. However, certain alternative circuits
inherently draw current even while quiescent.
2.7.2.2: Methods of reducing static power:
Power gating:
One way to reduce static current during sleep mode is to turn OFF the power supply to the sleeping blocks. This technique is called power gating.
The logic block receives its power from a virtual VDD rail, VDDV.
When the block is active, the header switch transistors are ON, connecting VDDV to
VDD.
When the block goes to sleep, the header switch turns OFF, allowing VDDV to float
and gradually sink toward 0.
Multiple threshold voltage and oxide thickness:
Selective application of multiple threshold voltages can maintain performance on
critical paths with low-Vt transistors, while reducing leakage on other paths with high-
Vt transistors.
Variable threshold voltage:
A method to achieve high Ion in active mode and low Ioff in sleep mode is to adjust the threshold voltage of the transistor by applying a body bias.
This technique is sometimes called variable threshold CMOS (VTCMOS).
Figure shows a schematic of an inverter using body bias.
**********************************************************************
VLSI circuit design for low power:
The growing market of portables such as cellular phones, gaming consoles and battery-powered electronic systems demands microelectronic circuit design with ultra-low power dissipation.
As the integration, size and complexity of chips continue to increase, the difficulty in providing adequate cooling might either add significant cost or limit the functionality of the computing systems which make use of those integrated circuits.
As the technology node scales down to 65 nm, there is not much increase in dynamic power dissipation. However, the static or leakage power reaches or exceeds the dynamic power level beyond the 65 nm technology node.
Hence, techniques to reduce power dissipation are not limited to dynamic power. In this section, we discuss circuit and logic design approaches to minimize dynamic and leakage power. The total power dissipated in a CMOS circuit is the sum of dynamic power, short-circuit power and static or leakage power.
Design for low-power implies the ability to reduce all three components of power consumption in
CMOS circuits during the development of a low power electronic product.
In the sections to follow we summarize the most widely used circuit techniques to reduce each of
these components of power in a standard CMOS design.
Dynamic/Switching power is due to charging and discharging of load capacitors driven by the
circuit. Supply voltage scaling has been the most adopted approach to power optimization, since it
normally yields considerable power savings due to the quadratic dependence of switching/dynamic
power Pswitching on supply voltage VDD.
However, lowering the supply voltage reduces circuit speed, which is the major shortcoming of this approach. Both design and technological solutions must therefore be applied to compensate for the decrease in circuit performance introduced by the reduced voltage. Some of the techniques often used to reduce dynamic power are described below.
Logic Design for Low Power:
Choices between static versus dynamic topologies, conventional CMOS versus pass-transistor logic styles
and synchronous versus asynchronous timing styles have to be made during the design of a circuit.
In static CMOS circuits, the component of power due to short circuit current is about 10% of
the total power consumption.
However, in dynamic circuits this problem does not arise, since there is no direct DC path from the supply voltage to ground.
Only in domino-logic circuits is there such a path (introduced to reduce charge sharing), hence there is a small amount of short-circuit power dissipation.
During logic optimization for low power, technology parameters such as supply voltage are fixed,
and the degrees of freedom are in selecting the functionality and sizing the gates.
Path equalization with buffer insertion is one of the techniques which ensure that signal propagation
from inputs to outputs of a logic network follows paths of similar length to overcome glitches.
When paths are equalized, most gates have aligned transitions at their inputs, thereby minimizing
spurious switching activity/glitches (which is created by misaligned input transitions).
Figure: Logic Remapping for Low Power
Other logic-level power minimization techniques include local transformations as shown in figure
above.
A re-mapping transformation is shown, in which a high-activity node (marked with x) is eliminated
by re-mapping the logic onto an AND-OR gate.
Standby Mode Leakage Suppression:
Static/leakage power originates from substrate currents and subthreshold leakage. For
technologies of 1 µm and above, Pswitching was predominant.
However, for deep-submicron processes below 180 nm, Pleakage becomes the dominant factor. Leakage
power is a major concern in recent technologies, as it impacts battery lifetime.
CMOS technology has been extremely power-efficient when transistors are not switching or in
stand-by mode, and system designers expect low leakage from CMOS chips.
To meet leakage power constraints, multiple-threshold and variable threshold circuit techniques are
often used.
In multiple-threshold CMOS, the process provides transistors with two different threshold voltages.
Low-threshold transistors are fast but leaky, and they are employed in speed-critical sub-circuits.
High-threshold transistors are slower but exhibit low sub-threshold leakage, and they are employed
in noncritical/slow paths of the chip.
As more transistors become timing-critical multiple-threshold techniques tend to lose effectiveness.
Variable Body Biasing:
Variable-threshold circuits dynamically control the threshold voltage of transistors through substrate
biasing and hence overcome shortcoming associated with multi-threshold design.
When a variable-threshold circuit is in standby, the substrate of NMOS transistors is negatively
biased, and their threshold increases because of the body-bias effect.
Similarly the substrate of PMOS transistors is biased by positive body bias to increase their Vt in
stand-by. Variable-threshold circuits can, in principle, solve the quiescent/static leakage problem,
but they require control circuits that modulate substrate voltage in stand-by.
Fast and accurate body-bias control with control circuit is quite challenging, and requires carefully
designed closed-loop control.
When the circuit is in standby mode the bulk/body of both PMOS and NMOS are biased by third
supply voltage to increase the Vt of the MOSFET as shown in the Figure.
However during normal operation they are switched back to reduce the Vt.
Figure: Variable Body Biasing
Sleep Transistors:
Sleep transistors are high-Vt transistors connected in series with the low-Vt logic, as shown below.
When the main circuit consisting of low-Vt devices is ON, the sleep transistors are also ON,
resulting in normal operation of the circuit.
When the circuit is in standby mode, the high-Vt sleep transistors are turned OFF.
Since High Vt devices appear in series with Low Vt circuit the leakage current is determined by
High Vt devices and is very low.
So the net static power dissipation is reduced.
In dynamic threshold CMOS (DTMOS), the threshold voltage is altered dynamically to suit the
operating state of the circuit.
A high threshold voltage in the standby mode gives low leakage current, while a low threshold
voltage allows for higher current drives in the active mode of operation.
Dynamic threshold CMOS can be achieved by tying the gate and body together.
The supply voltage of DTMOS is limited by the diode built-in potential in bulk silicon technology.
The PN diode between source and body should be reverse biased.
Hence, this technique is only suitable for ultralow voltage (0.6V and below) circuits in bulk CMOS.
Short-circuit power is caused by the short-circuit currents that arise when pairs of PMOS/NMOS
transistors conduct simultaneously.
In static CMOS circuits, a short-circuit path exists for direct current flow from VDD to ground when
VTn < Vin < VDD − |VTp|
UNIT – III
Static latches and Registers Dynamic latches and Registers, Pulse Registers, Sense Amplifier Based
Register, Pipelining, Schmitt Trigger, Monostable Sequential Circuits, Astable Sequential Circuits.
Timing Issues: Timing Classification of Digital System, Synchronous Design.
The cross-coupled inverter pair is biased at point C. Any small deviation is amplified and regenerated
around the circuit loop.
The bias point moves away from C until one of the operation points A or B is reached.
C is an unstable operation point. Every deviation causes the operation point to run away
from its original bias. Operation points with this property are termed as metastable.
A bistable circuit has two stable states. In absence of any triggering, the circuit remains in a
single state.
A trigger pulse must be applied to change the state of the circuit.
Common name for a bistable circuit is flip-flop.
3.1.2 SR Flip-Flops
The SR or set-reset flip-flop implementation is shown in Figure (a) below.
This circuit is similar to the cross-coupled inverter pair with NOR gates replacing the
inverters.
The second input of the NOR gates is connected to the trigger inputs (S and R), which make
it possible to force the outputs Q and Qbar.
These outputs are complementary (except for the SR = 11 state).
When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their
value.
If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state
(with Qbar going to 0).
Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
Figure 3.3
When both S and R are high, both Q and Q bar are forced to zero. This input mode is
considered to be forbidden.
An SR flip-flop can be implemented using a cross-coupled NAND structure as shown in
Figure 3.4
Clocked SR flip-flop:
Clocked SR flip-flop (a level-sensitive positive latch) is shown in Figure 3.5.
It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive the flip-flop
from one state to another and to provide clocked operation.
Consider the case where Q is high and an R pulse is applied.
The combination of transistors M4, M7, and M8 forms a ratioed inverter.
In order to make the latch switch, we must succeed in bringing Q below the switching
threshold of the inverter M1-M2.
Once this is achieved, the positive feedback causes the flip-flop to invert states. This
requirement forces an increase in the sizes of transistors M5, M6, M7, and M8.
Figure 3.7 Transistor level implementation of a positive latch built using transmission gates.
To reduce the clock load, implement a multiplexer based NMOS latch using two pass
transistors as shown in Figure 3.8.
The advantage of this approach is the reduced clock load of only two NMOS devices.
When CLK is high, the latch samples the D input, while a low clock-signal enables the
feedback- loop and puts the latch in the hold mode.
Figure 3.8 Multiplexer-based NMOS latch using NMOS-only pass transistors for the multiplexers.
Explain the operation of master-slave based edge triggered register. (May 2016)
Draw and explain the operation of conventional CMOS, pulsed and resettable latches.
(Nov 2012)
Discuss about CMOS register concept and design master-slave triggered register, explain
its operation with overlapping periods. (April 2018, Nov 2018)
An edge-triggered register is to use a master-slave configuration as shown in Figure 3.9.
The register consists of cascading a negative latch (master stage) with a positive latch
(slave stage).
A multiplexer based latch is used to realize the master and slave stages.
On the low phase of the clock, the master stage is transparent and the D input is passed to
the master stage output, Q M .
During this period, the slave stage is in the hold mode, keeping its previous value.
On the rising edge of the clock, the master stage stops sampling the input and the slave
stage starts sampling.
During the high phase of the clock, the slave stage samples the output of the master stage
(QM), while the master stage remains in hold mode.
A negative edge-triggered register can be constructed using the same principle by simply
switching the order of the positive and negative latch (i.e., placing the positive latch first).
Figure 3.13: One solution for the leakage problem in low-voltage operation using MTCMOS.
************************** ************************************** **************
Discuss about the design of sequential dynamic circuits. (Nov 2012, Nov 2017)
Explain the methodology of sequential circuit design of flip-flop. (May 2014)
A stored value remains valid as long as the supply voltage is applied to the circuit, hence
the name static.
The major disadvantage of the static gate is its complexity.
In registers used in computational structures that are constantly clocked, such as pipelined
datapaths, the requirement that the memory should hold state for extended periods of time can be relaxed.
This results in circuits based on temporary storage of charge on parasitic capacitors.
The principle is identical to that of dynamic logic: the logic signal is a charge stored on a capacitor.
The absence of charge denotes logic 0 and the presence of charge denotes logic 1.
A stored value can be kept for a limited amount of time (range of milliseconds).
A periodic refresh of its value is necessary.
3.2.2 C²MOS Dynamic Register:
The C²MOS Register
Figure 3.16 shows a positive edge-triggered register based on the master-slave concept,
which is insensitive to clock overlap. This circuit is called the C²MOS (Clocked CMOS) register.
Figure 3.17 C²MOS D FF during overlap periods.
Explain the operation of True Single Phase Clocked Register. (Nov 2016, April 2017)
In the two-phase clocking schemes, care must be taken in routing the two clock signals to
ensure that overlap is minimized.
While the C²MOS provides a skew-tolerant solution, it is possible to design registers that
only use a single-phase clock.
Explain in detail about timing issues needed for a logic operation. (April 2017)
Explain the timing basics in synchronous design in detail. (Nov 2017)
(A)Sequencing methods:-
Three methods of sequencing block of combinational logic are possible, as shown in
figure below.
In a flip-flop based system, one flip-flop is used on each cycle boundary.
A token (data) advances from one cycle to the next on the rising edge. If a token arrives too
early, it waits at the flip-flop until the next cycle.
In a 2-phase system, the phases may be separated by tnonoverlap (tnonoverlap > 0).
In a pulsed system, the pulse width is tpw.
In a 2-phase system, a full cycle of combinational logic is divided into two phases,
sometimes called "half-cycles". The two latch clocks are called φ1 and φ2.
A flip-flop can be viewed as a pair of back-to-back latches using clk and its complement.
Table shows delay and timing notations of combinational and sequencing elements.
These delays may differ for rising (with suffix 'r') and falling (with suffix 'f') transitions.
The delay with timing diagram for all three sequencing elements are, as shown in
figure below.
In combinational logic, when input A changes to another value, output Y cannot change
instantaneously. After the contamination delay tcd, Y may begin to change (or glitch).
Output Y settles to its final value within the propagation delay tpd.
The input D of a flip-flop must have settled by some setup time tsetup before the rising edge
of the clock, and should not change again until a hold time thold after the clock edge.
The output begins to change after the clock-to-Q contamination delay tccq and completely
settles after the clock-to-Q propagation delay tpcq.
For a flip-flop based system, the clock period must satisfy
Tc ≥ tpcq + tpd + tsetup, i.e. tpd ≤ Tc − (tpcq + tsetup)
and the hold-time (minimum-delay) constraint is
tcd ≥ thold − tccq
For a 2-phase latch-based system, tpd1 + tpd2 ≤ Tc − 2tpdq.
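As an illustration (not from the notes), the Python sketch below checks the flip-flop max-delay (setup) and min-delay (hold) constraints listed above; the function name and the delay values are assumptions made for the example.

    # Illustrative sketch: checking flip-flop sequencing constraints.
    # All delay values (in ps) are assumed.
    def flipflop_timing_ok(Tc, t_pcq, t_ccq, t_pd, t_cd, t_setup, t_hold):
        """Return (setup_ok, hold_ok) for a flip-flop based system."""
        setup_ok = t_pd <= Tc - (t_pcq + t_setup)   # max-delay (setup) constraint
        hold_ok = t_cd >= t_hold - t_ccq            # min-delay (hold) constraint
        return setup_ok, hold_ok

    print(flipflop_timing_ok(Tc=1000, t_pcq=80, t_ccq=40,
                             t_pd=820, t_cd=30, t_setup=60, t_hold=50))
    # (True, True): the logic fits in the cycle and no hold violation occurs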
Explain in detail about the pipelining structure needed for a logic operation. (April 2017, Nov 2017)
Discuss in detail various pipelining approaches to optimize sequential circuits. (May 2013, 2016)
Pipelining is a design technique used to accelerate the operation of the datapaths in digital
processors.
The idea is explained with Figure 3.22a.
The goal of the circuit is to compute log(|a - b|), where both a and b represent streams of
numbers.
The minimal clock period Tmin necessary to ensure correct evaluation is given as:
Tmin = tc-q + tpd,logic + tsu
Where, tc-q and tsu are the propagation delay and the set-up time of the register respectively.
Registers are edge-triggered D registers.
The term tpd,logic stands for the worst-case delay path through the combinatorial network,
which consists of the adder, absolute value and logarithm functions.
In conventional systems, the delay is larger than the delays associated with the registers
and dominates the circuit performance.
Assume that each logic module has an equal propagation delay.
Each logic module is then, active for only 1/3 of the clock period.
Pipelining is a technique to improve the resource utilization and increase the functional
throughput.
Introduce registers between the logic blocks, as shown in Figure 3.22b.
This causes the computation for one set of input data to spread over a number of clock
periods, as shown in Table 1.
The result for the data set (a1, b1) only appears at the output after three clock-periods.
Suppose all logic blocks have the same propagation delay and that the register overhead is
small with respect to the logic delays.
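As an illustration (not from the notes), the sketch below compares Tmin = tc-q + tpd,logic + tsu before and after pipelining, assuming the three logic blocks (adder, absolute value, logarithm) have equal delay; all delay values are assumed.

    # Illustrative sketch: minimum clock period before and after pipelining.
    # Delays are in ns and are assumed values.
    t_cq, t_su = 0.2, 0.1
    stage_delays = {"add": 1.0, "abs": 1.0, "log": 1.0}

    t_min_unpipelined = t_cq + sum(stage_delays.values()) + t_su
    t_min_pipelined = t_cq + max(stage_delays.values()) + t_su  # one block per stage

    print(t_min_unpipelined, t_min_pipelined)   # 3.3 ns versus 1.3 ns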
Figure 3.24 Pipelined datapath using C²MOS latches
The only way, a signal can race from stage to stage under this condition is, when the logic
function F is inverting, as in Figure 3.25.
Here F is replaced by a single, static CMOS inverter. Similar considerations are valid for
the (1-1) overlap.
It combines C²MOS pipeline registers and NORA dynamic logic function blocks.
Each module consists of a block of combinational logic, which can be a mixture of static and
dynamic logic, followed by a C²MOS latch.
Choosing the right clocking scheme affects the functionality, speed and power of a circuit.
The simple clocking scheme is the two-phase master-slave design.
The predominant approach is to use the multiplexer-based register and to generate the two
clock phases locally, by simply inverting the clock.
High-performance CMOS VLSI designs tend to use simple clocking schemes, even at the
expense of some performance.
Flip-flop: Flip-flop has high sequencing overhead. It is simple and easy to understand
the operation of flip-flop.
Pulsed latches:-
Faster than flip- flop.
Provides some time borrowing option.
Consumes low power.
Transparent Latch:-
It has low sequencing overhead compared with flip-flop.
It allows almost half cycle of time borrowing and it is good choice.
************************** ************************************** *
3.7 Clock-Distribution Techniques
Explain the clock distribution techniques in synchronous design in detail. (Nov 2017)
Design a clock distribution network based on H tree model for 16 nodes. (April 2018)
Clock skew and jitter are major issues in digital circuits and they limit the performance
of a digital system.
It is necessary to design a clock network that minimizes skew and jitter.
Another important consideration in clock distribution is the power dissipation.
In most high-speed digital processors, a majority of the power is dissipated in the clock
network.
To reduce power dissipation, clock networks must support clock conditioning, the ability to
shut down parts of the clock network.
Unfortunately, clock gating results in additional clock uncertainty.
Fabrics for clocking:
Clock networks include a network that is used to distribute a global reference to
various parts of the chip.
A final stage is responsible for local distribution of the clock, while considering the local
load variations.
Most clock distribution schemes rely on the fact that the absolute delay from a central clock source to the
clocking elements matters less than the relative skew between them.
Therefore one common approach to distributing a clock is to use balanced paths (also called
trees).
The most common type of clock primitive is the H-tree network (named for the physical
structure of the network), shown in the figure for a 4x4 array.
In this scheme, the clock is routed to a central point on the chip, and balanced paths distribute it from there to the leaf nodes.
Latch-Based Clocking:
The use of a latch based methodology (in Figure) enables more flexible timing, allowing
one stage to pass slack to or steal time from following stages.
This flexibility allows an overall performance increase.
In this configuration, a stable input is available to the combinational logic block A
(CLB_A) on the falling edge of CLK1 (at edge2).
On the falling edge of CLK2 (at edge 3), the output of CLB_A is latched and the computation
of CLB_B is launched.
CLB_B computes on the low phase of CLK2 and the output is available on the falling edge
of CLK1 (at edge4).
This timing appears equivalent to having an edge-triggered system where CLB_A and
CLB_B are cascaded and between two edge-triggered registers.
In both cases, it appears that the time available to perform the combined computation of CLB_A and
CLB_B is TCLK.
Figure: Latch-based design in which transparent latches are separated by combinational logic.
************************** ************************************** **************
A more reliable and robust technique is the self- timed approach, which presents a local
solution to the timing problem.
Figure uses a pipelined datapath to illustrate how this can be accomplished.
The computation of a logic block is initiated by asserting a Start signal.
The combinational logic block computes on the input data.
This signaling ensures the logical ordering of the events and can be achieved with the
aid of an extra Acknowledge and Req(uest) signal.
In the case of the pipelined datapath, the scenario could proceed as follows.
1. An input word arrives, and a Req(uest) to the block F1 is raised. If F1 is inactive at that
time, it transfers the data and acknowledges this fact to the input buffer.
2. F1 is enabled by raising the Start signal. After a certain amount of time, dependent upon the
data values, the done signal goes high indicating the completion of the computation.
3. A Req(uest) is issued to the F2 module. If this function is free, an Ack(nowledge) is raised,
the output value is transferred, and F1 can go ahead with its next computation.
How is the metastability problem eliminated in sequential circuits? Explain.
Figure: Arbiter
Figure (b) shows an arbiter built from an SR latch and a four-transistor metastability
filter.
If one of the request inputs arrives well before the other, the latch will respond
appropriately.
If they arrive at nearly the same time, the latch may be driven into metastability, as
shown in Figure (c).
The filter keeps both acknowledge signals low, until the voltage difference between the
internal nodes n1 and n2 exceeds Vt , indicating that a decision has been made.
Such an asynchronous arbiter will never produce metastable outputs.
The self- timed approach offers a potential solution to the growing clock-distribution
problem.
It translates the global clock signal into a number of local synchronization problems.
Handshaking logic is needed to ensure the logical ordering of the circuit events and to
avoid race conditions.
The circuit uses a precharged front-end amplifier that samples the differential input signal on
the rising edge of the clock signal.
The outputs of front-end are fed into a NAND cross-coupled SR FF that holds the data and
guarantees that the differential outputs switch only once per clock cycle.
The differential inputs in this implementation don't have to have rail-to-rail swing, and hence this
register can be used as a receiver for a reduced-swing differential bus.
In the preceding sections, we have focused on one single type of sequential element, this is the latch (and
its sibling the register). The most important property of such a circuit is that it has two stable states, and is
hence called bistable. The bistable element is not the only sequential circuit of interest. Other
regenerative circuits can be catalogued as astable and monostable. The former act as oscillators and can,
for instance, be used for on-chip clock generation. The latter serve as pulse generators, also called one-
shot circuits. Another interesting regenerative circuit is the Schmitt trigger. This component has the useful
property of showing hysteresis in its dc characteristics: its switching threshold is variable and depends
upon the direction of the transition (low-to-high or high-to-low). This peculiar feature can come in handy
in noisy environments.
Definition
A Schmitt trigger [Schmitt38] is a device with two important properties:
It responds to a slowly changing input waveform with a fast transition time at the output.
The voltage-transfer characteristic of the device displays different switching thresholds for positive- and
negative-going input signals. This is demonstrated in Figure 7.46, where a typical voltage-transfer
characteristic of the Schmitt trigger is shown (and its schematics symbol). The switching thresholds for
the low-to-high and high-to-low transitions are called VM+ and VM− , respectively. The hysteresis
voltage is defined as the difference between the two.
One of the main uses of the Schmitt trigger is to turn a noisy or slowly varying input signal into a clean
digital output signal. This is illustrated in Figure 7.47. Notice how the hysteresis suppresses the ringing on
the signal. At the same time, the fast low-to-high (and high-to-low) transitions of the output signal should
be observed. For instance, steep signal slopes are beneficial in reducing power consumption by
suppressing direct-path currents. The "secret" behind the Schmitt trigger concept is the use of positive
feedback.
CMOS Implementation
One possible CMOS implementation of the Schmitt trigger is shown in Figure 7.48. The idea behind this
circuit is that the switching threshold of a CMOS inverter is determined by the (kn/kp) ratio between the
NMOS and PMOS transistors. Increasing the ratio results in a reduction of the threshold, while decreasing
it results in an increase in VM.
Adapting the ratio depending upon the direction of the transition results in a shift in the switching
threshold and a hysteresis effect. This adaptation is achieved with the aid of feedback.
Suppose that Vin is initially equal to 0, so that Vout = 0 as well. The feedback loop biases the PMOS
transistor M4 in the conductive mode while M3 is off. The input signal effectively connects to an inverter
consisting of two PMOS transistors in parallel (M2 and M4) as a pull-up network, and a single NMOS
transistor (M1) in the pull-down chain.
This modifies the effective transistor ratio of the inverter to kM1/(kM2+kM4), which moves the
switching threshold upwards.
Once the inverter switches, the feedback loop turns off M4, and the NMOS device M3 is activated.
This extra pull-down device speeds up the transition and produces a clean output signal with steep slopes.
A similar behavior can be observed for the high-to-low transition. In this case, the pull-down network
originally consists of M1 and M3 in parallel, while the pull-up network is formed by M2.
This reduces the value of the switching threshold to VM–.
CMOS Schmitt Trigger
Consider the Schmitt trigger with the following device sizes. Devices M1 and M2 are 1µm/0.25µm and
3µm/0.25µm, respectively. The inverter is sized such that the switching threshold is around VDD/2 (=
1.25 V). Figure 7.49a shows the simulation of the Schmitt trigger assuming that devices M3 and M4 are
0.5µm/0.25µm and 1.5µm/0.25µm, respectively. As is apparent from the plot, the circuit exhibits
hysteresis. The high-to-low switching point (VM− = 0.9 V) is lower than VDD/2, while the low-to-high
switching threshold (VM+ = 1.6 V) is larger than VDD/2.
It is possible to shift the switching point by changing the sizes of M3 and M4. For example, to modify the
low-to-high transition, we need to vary the PMOS device. The high-to-low threshold is kept constant by
keeping the device width of M3 at 0.5 µm. The device width of M4 is varied as k x 0.5 µm. Figure 7.49b
demonstrates how the switching threshold increases with increasing values of k.
A monostable element is a circuit that generates a pulse of a predetermined width every time the quiescent circuit is
triggered by a pulse or transition event. It is called monostable because it has only one stable state (the
quiescent one). A trigger event, which is either a signal transition or a pulse, causes the circuit to go
temporarily into another quasi-stable state.
This means that it eventually returns to its original state after a time period determined by the circuit
parameters. This circuit, also called a one-shot, is useful in generating pulses of a known length. This
functionality is required in a wide range of applications. We have already seen the use of a one-shot in the
construction of glitch registers.
This circuit detects a change in a signal, or group of signals, such as the address or data bus, and produces
a pulse to initialize the subsequent circuitry.
The most common approach to the implementation of one-shots is the use of a simple delay element to
control the duration of the pulse.
The concept is illustrated in Figure 7.51. In the quiescent state, both inputs to the XOR are identical, and
the output is low.
A transition on the input causes the XOR inputs to differ temporarily and the output to go high. After a
delay td (of the delay element), this disruption is removed, and the output goes low again.
A pulse of length td is created. The delay circuit can be realized in many different ways, such as an RC-
network or a chain of basic gates.
Figure 7.51: One-shot circuit built from a delay element and an XOR gate; a transition on In produces an output pulse of width td on Out.
The ring oscillator is a simple example of an astable circuit. It consists of an odd number of inverters
connected in a circular chain. Due to the odd number of inversions, no stable operation point exists, and
the circuit oscillates with a period equal to 2 × tp × N, with N the number of inverters in the chain and tp
the propagation delay of each inverter.
The simulated response of a ring oscillator with five stages is shown in Figure 7.52 (all gates use
minimum-size devices). The observed oscillation period approximately equals 0.5 nsec, which
corresponds to a gate propagation delay of 50 psec. By tapping the chain at various points, different
phases of the oscillating waveform are obtained (phases 1, 3, and 5 are shown).
Figure 7.52: Simulated waveforms of a five-stage ring oscillator; the outputs of stages 1, 3, and 5 (v1, v3, v5) are shown as a function of time (nsec).
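As an illustration (not from the notes), the sketch below evaluates the oscillation period T = 2 * tp * N for the five-stage example, assuming the 50 ps gate delay quoted above; the function name is an assumption.

    # Illustrative sketch: period and frequency of an N-stage ring oscillator.
    def ring_oscillator(n_stages, t_p):
        period = 2 * n_stages * t_p   # the transition travels around the ring twice per cycle
        return period, 1.0 / period

    period, freq = ring_oscillator(n_stages=5, t_p=50e-12)
    print(period, freq)               # 5e-10 s (0.5 ns), i.e. a 2 GHz oscillation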
The ring oscillator composed of cascaded inverters produces a waveform with a fixed oscillating frequency
determined by the delay of an inverter in the CMOS process. In many applications, it is necessary to
control the frequency of the oscillator.
An example of such a circuit is the voltage-controlled oscillator (VCO), whose oscillation frequency is a
function (typically non-linear) of a control voltage.
The standard ring oscillator can be modified into a VCO by replacing the standard inverter with a current-
starved inverter as shown in Figure 7.53 [Jeong87]. The mechanism for controlling the delay of each
inverter is to limit the current available to discharge the load capacitance of the gate.
Figure 7.53: Current-starved inverter (input In, output Out, inverter devices M1 and M2 between VDD and ground, with the discharge current set by the reference current Iref).
In this modified inverter circuit, the maximal discharge current of the inverter is limited by adding an
extra series device. Note that the low-to-high transition on the inverter can also be controlled by adding a
PMOS device in series with M2.
The added NMOS transistor M3 is controlled by an analog control voltage Vcntl, which determines the
available discharge current. Lowering Vcntl reduces the discharge current and, hence, increases tpHL.
The ability to alter the propagation delay per stage allows us to control the frequency of the ring
structure. The control voltage is generally set using feedback techniques. Under low operating current
levels, the current-starved inverter suffers from slow fall times at its output. This can result in significant
short-circuit current.
This is resolved by feeding its output into a CMOS inverter or better yet a Schmitt trigger. An extra
inverter is needed at the end to ensure that the structure oscillates.
UNIT – IV
DESIGNING ARCHITECTURE BUILDING BLOCKS
Arithmetic Building Blocks: Data Paths, Adders, Multipliers, Shifters, ALUs, power and speed tradeoffs,
Case Study: Design as a tradeoff.
Designing Memory and Array structures: Memory Architectures and Building Blocks, Memory Core,
Memory Peripheral Circuitry.
Datapath circuits are meant for passing data from one segment to another segment
for processing or storage.
The datapath is the core of a processor, where all computations are performed.
It is generally defined in the context of a general-purpose digital processor, as shown in the figure.
In this, data is applied at one port and data output is obtained at second port.
Data path block consists of arithmetic operation, logical operation, shift operation and
temporary storage of operands.
Data paths are arranged in a bit sliced organization.
Instead of operating on single bit digital signals, the data in a processor are arranged in a
word based fashion.
Bit slices are either identical or resemble a similar structure for all bits.
The data path consists of the number of bit slices (equal to the word length), each
operating on a single bit. Hence the term is bit-sliced.
Draw the structure of ripple carry adder and explain its operation. (Nov 2017)
Explain the operation of a basic 4 bit adder. (Nov 2016)
Figure: Transistor-level CMOS full adder; inputs A, B and Ci produce the sum S and carry-out Co.
Explain the operation and design of Carry lookahead adder (CLA). (May 2017, Nov 2016)
How is the drawback of the ripple carry adder overcome by the carry lookahead adder? Discuss. (Nov 2017)
Explain the concept of carry lookahead adder and discuss its types. (April 2018)
A carry-lookahead adder (CLA) is a type of adder used in digital circuits.
A carry-lookahead adder improves speed by reducing the amount of time required to
determine the carry bits.
In a ripple carry adder, the carry bit is calculated along with the sum bit, and each bit must wait
until the previous carry has been calculated before it can begin calculating its own sum and
carry bits.
The carry-lookahead adder calculates one or more carry bits before the sum, which reduces
the wait time to calculate the result of the larger value bits.
A ripple-carry adder works starting at the rightmost (LSB) digit position: the two
corresponding digits are added and a result is obtained, possibly with a carry out of this digit
position.
Accordingly, every digit position other than the LSB needs to take into account the possibility of
adding an extra 1 from a carry that has come in from the next position to the right.
Carry lookahead depends on two things:
Calculating, for each digit position, whether that position is going to propagate a carry
if one comes in from the right.
Combining these calculated values to be able to realize quickly whether, for each
group of digits, that group is going to propagate a carry.
Theory of operation:
Carry lookahead logic uses the concept of generating and propagating carry.
The addition of two 1-digit inputs A and B is said to generate if the addition will
carry, regardless of whether there is an input carry.
Generate:
In binary addition, A + B generates if and only if both A and B are 1.
If we write G(A,B) to represent the binary predicate that is true if and only if A + B
generates, we have:
G(A,B) = A . B
Propagate:
The addition of two 1-digit inputs A and B is said to propagate if the addition will carry
whenever there is an input carry.
In binary addition, A + B propagates if and only if at least one of A or B is 1.
If we write P(A,B) to represent the binary predicate that is true if and only if A + B
propagates, we have:
P(A,B) = A + B
These adders are used to overcome the latency which is introduced by the rippling effect
of carry bits.
Write the carry-lookahead expressions in terms of the generate gi and propagate pi signals.
The general form of the carry signal thus becomes
ci+1 = ai·bi + ci·(ai + bi) = gi + ci·pi
If ai·bi = 1, then ci+1 = 1, so the generate term is written as gi = ai·bi
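As an illustration (not from the notes), the Python sketch below computes all carries of an n-bit addition directly from the generate and propagate signals using ci+1 = gi + pi·ci; the function names and the example operands are assumptions.

    # Illustrative sketch: carry computation from generate/propagate signals.
    # Inputs are lists of bits, least-significant bit first.
    def lookahead_carries(a_bits, b_bits, c0=0):
        g = [a & b for a, b in zip(a_bits, b_bits)]   # generate: gi = ai.bi
        p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate: pi = ai + bi
        carries = [c0]
        for gi, pi in zip(g, p):
            carries.append(gi | (pi & carries[-1]))   # c(i+1) = gi + pi.ci
        return carries                                # [c0, c1, ..., cn]

    # 6 + 3 with bits listed LSB first: a = 0110, b = 0011
    print(lookahead_carries([0, 1, 1, 0], [1, 1, 0, 0]))   # [0, 0, 1, 1, 0]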
The Manchester carry chain is a variation of the carry-lookahead adder that uses shared
logic to lower the transistor count.
A Manchester carry chain generates the intermediate carries by tapping off nodes in the
gate that calculates the most significant carry value.
Dynamic logic can support shared logic, as can transmission-gate logic.
One of the major drawbacks of the Manchester carry chain is increased propagation delay.
A Manchester-carry-chain section generally won't exceed 4 bits.
In this adder, the basic equation is ci+1 = gi + ci·pi
where pi = ai ⊕ bi and gi = ai·bi
Table
******************************************************************************
V. HIGH SPEED ADDERS:
Design a carry bypass adder and discuss its features. (May 2016)
Figure: Carry-bypass (carry-skip) adder; the adder is divided into groups of M bits, each producing its own sum bits while the group carry can bypass the group.
Design a carry select adder and discuss its features. (May 2016)
Carry save adder is similar to the full adder. It is used when adding multiple numbers.
All the bits of a carry save adder work in parallel.
In carry save adder, the carry does not propagate. So, it is faster than carry propagate adder.
It has three inputs and produces two outputs; the carry-out is saved and is not immediately used to
find the final sum value.
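As an illustration (not from the notes), the sketch below shows one level of carry-save addition (a 3:2 compressor per bit): three words are reduced to a sum word and a saved carry word with no carry propagation. The helper names and operands are assumptions.

    # Illustrative sketch: carry-save addition of three words (bits listed LSB first).
    def carry_save(a, b, c):
        sum_bits = [x ^ y ^ z for x, y, z in zip(a, b, c)]                       # full-adder sum
        carry_bits = [(x & y) | (y & z) | (x & z) for x, y, z in zip(a, b, c)]   # saved carries
        return sum_bits, carry_bits      # the carry word is weighted one position higher

    def to_int(bits):
        return sum(bit << i for i, bit in enumerate(bits))

    s, c = carry_save([1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0])   # 5 + 3 + 6
    print(to_int(s) + (to_int(c) << 1))                           # 14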
Explain the design and operation of 4 x 4 multiplier circuit. (Apr. 2016, 2017, Nov 2016, 2018)
Design a multiplier for 5 bit by 3 bit. Explain its operation and summarize the numbers
of adders. Discuss it over Wallace multiplier. (Nov 2017, April 2018)
A study of computer arithmetic processes will reveal that the most common requirements
are for addition and subtraction.
There is also a significant need for a multiplication capability.
Basic operations in multiplication are given below.
0 x 0 = 0, 0 x 1 = 0, 1 x 0 = 0, 1x1=1
          1 0 1 0 1 0       Multiplicand
        x     1 0 1 1       Multiplier
    -------------------
          1 0 1 0 1 0
        1 0 1 0 1 0
      0 0 0 0 0 0           Partial products
    1 0 1 0 1 0
    -------------------
    1 1 1 0 0 1 1 1 0       Result
If two different 4-bit numbers (x3 x2 x1 x0 and y3 y2 y1 y0) are multiplied, then each multiplier bit yj produces a row of partial products xi·yj that must be shifted and added.
Multiplication by shifting:
If x = (0010)2 = (2)10 and it is to be multiplied by 2, then we can shift x one place to the left: x = (0100)2 = (4)10.
If it is to be divided by 2, then we can shift one place to the right: x = (0001)2 = (1)10.
So, shift register can be used for multiplication or division by 2.
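As an illustration (not from the notes), the short Python sketch below reproduces the shift-based multiply and divide example.

    # Illustrative sketch: multiply and divide by 2 using shifts.
    x = 0b0010            # (0010)2 = 2
    print(bin(x << 1))    # 0b100 -> (0100)2 = 4, multiplication by 2
    print(bin(x >> 1))    # 0b1   -> (0001)2 = 1, division by 2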
Figure: 4 x 4 array multiplier using full adders (FA), half adders (HA) and AND gates; each row adds the partial products Xi·Yj and produces the product bits Z7–Z0.
But in Booth multiplication, partial-product generation is done based on a recoding scheme,
e.g. radix-2 encoding.
Bits of the multiplicand (Y) are grouped from left to right and the corresponding operation on
the multiplier (X) is done in order to generate the partial products.
In radix-2 Booth multiplication, partial-product generation is done based on the encoding
given in the Table.
RADIX-2 PROCEDURE:
1) Add a 0 to the LSB of the multiplier and form the groups of bits in pairs from right to
left, as shown in the figure.
These groups of binary digits are recoded according to the Modified Booth Encoding table; each group
corresponds to one of the numbers from the set {-2, -1, 0, +1, +2}.
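As an illustration (not from the notes), the sketch below recodes an unsigned multiplier into Booth digits from the set {-2, -1, 0, +1, +2} by appending a 0 below the LSB and scanning overlapping groups of three bits (this is the modified, radix-4 form of the recoding); the function name and the example are assumptions.

    # Illustrative sketch: modified Booth recoding of an unsigned multiplier.
    def booth_recode(multiplier, n_bits):
        """Return Booth digits in {-2, -1, 0, +1, +2}, least-significant digit first."""
        bits = [(multiplier >> i) & 1 for i in range(n_bits)] + [0, 0]  # zero-extend at the MSB
        padded = [0] + bits                                             # appended 0 below the LSB
        digits = []
        for i in range(0, n_bits + 1, 2):
            b_m1, b0, b1 = padded[i], padded[i + 1], padded[i + 2]
            digits.append(b_m1 + b0 - 2 * b1)       # value of the overlapping triplet
        return digits

    digits = booth_recode(0b1011, 4)                               # 11 -> [-1, -1, 1]
    print(digits, sum(d * 4 ** i for i, d in enumerate(digits)))   # reconstructs 11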
******************************************************************************
VII. DIVIDERS
There are two types of dividers, Serial divider and Parallel divider. Serial divider is
slow and parallel divider is fast in performance.
Generally division is done by repeated subtraction. If 10/3 is to be performed then,
10 -3 =7, ( divisor is 3, dividend is 10)
7–3=4,
4–3=1
Here, repeated subtraction has been done, after 3 subtractions, the remainder is 1. It is
less than divisor. So now the subtraction is stopped.
Let us see an example of binary division using the 1's complement method:
1010 (10d) / 0011 (3d)
Step1: find 1’s complement of divisor
Step2: add this with the dividend
Step3: if the carry is 1, then it is added to the output to get the difference output.
Step4: the same procedure is repeated until we get a carry of 0.
Step5: then the process is stopped.
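As an illustration (not from the notes), the sketch below performs the 10 ÷ 3 example by repeated subtraction, mirroring the counter/subtractor behaviour of the serial divider; the function name is an assumption.

    # Illustrative sketch: unsigned division by repeated subtraction.
    def divide(dividend, divisor):
        quotient = 0
        while dividend >= divisor:
            dividend -= divisor       # one subtraction per step
            quotient += 1             # the counter is incremented each time
        return quotient, dividend     # the remainder is whatever is left

    print(divide(10, 3))   # (3, 1): 10 - 3 - 3 - 3 = 1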
Y0 Y1 Y2 Y3 are complemented and given to 4 bit adder block (figure shown below)
X0 X1 X2 X3 are given to MUXs and MUX output is given to D flipflop. Select signal
of MUX is high. It is connected to clear input of counter.
Carry output of adder is connected with clock enable pin of counter. The same is given
to OR gate. The output of this OR gate is given to clock enable signal of flipflops.
The other input of OR gate is tied with select signal of MUX.
If X > Y, C0 of adder is high.
After first subtraction, the counter output is incremented by 1.
For each subtraction, the counter output is incremented.
If C0 of adder is low, then clock of counter and FF is disabled. Counting is stopped.
Q3 Q2 Q1 Q0 is the counter output (Quotient)
R3 R2 R1 R0 is the flipflop output (remainder)
******************************************************************************
VIII. SHIFT REGISTERS:
Design 4 input and 4 output barrel shifter using NMOS logic. (NOV 2018).
An n-bit rotation is specified by using the control word R0-n and L/R bit defines a left or
right shifting.
For example, y3 y2 y1 y0 = a3 a2 a1 a0.
If it is rotated 1 bit to the left, we get y3 y2 y1 y0 = a2 a1 a0 a3.
If it is rotated 1 bit to the right, we get y3 y2 y1 y0 = a0 a3 a2 a1.
Barrel Shifter:
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in
one clock cycle.
It can be implemented as a sequence of multiplexers (MUX), and in such an implementation
the output of one MUX is connected to the input of the next MUX in a way that depends on
the shift distance.
For example, take a four-bit barrel shifter, with inputs A, B, C and D. The shifter can cycle
the order of the bits ABCD as DABC, CDAB, or BCDA; in this case, no bits are lost.
That is, it can shift all of the outputs up to three positions to the right (thus make any cyclic
combination of A, B, C and D).
The barrel shifter has a variety of applications, including being a useful component in
microprocessors (alongside the ALU).
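As an illustration (not from the notes), the sketch below models the cyclic behaviour of the four-bit barrel shifter as a rotation of the list [A, B, C, D]; the function name is an assumption.

    # Illustrative sketch: barrel-shifter rotation (no bits are lost).
    def barrel_rotate(bits, shift):
        shift %= len(bits)
        return bits[shift:] + bits[:shift]    # rotate left by 'shift' positions

    word = ["A", "B", "C", "D"]
    for s in range(4):
        print(s, "".join(barrel_rotate(word, s)))   # ABCD, BCDA, CDAB, DABC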
Logarithmic Shifter:
A shifter with a maximum shift width of M consists of log2(M) stages, where the i-th stage
either shifts the data over 2^i positions or passes it unchanged.
Maximum shift value of seven bits is shown in figure, to shift over five bits, the first stage is
set to shift mode, the second to pass mode and the last again to shift.
The speed of the logarithmic shifter depends on the shift width in a logarithmic way; an M-bit
shifter requires log2(M) stages.
The series connection of pass transistors slows the shifter down for larger shift values.
The advantage of the logarithmic shifter is that it is more effective for larger shift values, in terms of both
area and speed.
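As an illustration (not from the notes), the sketch below mimics a logarithmic shifter: each stage i either shifts by 2^i positions or passes the data, selected by bit i of the shift amount. The function name and the example value are assumptions.

    # Illustrative sketch: logarithmic (right) shifter with log2(M) stages.
    def log_shift_right(value, shift, n_stages=3):
        for i in range(n_stages):
            if (shift >> i) & 1:          # stage i is in shift mode
                value >>= 2 ** i          # shift by 2^i positions
            # otherwise stage i simply passes the value unchanged
        return value

    print(log_shift_right(0b10110100, 5))   # 5 = 4 + 1: stage 0 shifts by 1, stage 2 by 4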
******************************************************************************
IX. SPEED AND AREA TRADE OFF:
Discuss the details about speed and area trade off. (May 2017)
Adder:
The tradeoff in terms of power and performance is shown below.
The performance is represented in terms of the delay(speed).
The area estimations for each of the delays are given based on the fact that area is in
relation to the power consumption.
The area of a carry lookahead adder is larger than the area of a ripple carry for a
particular delay.
This is because the computations performed in a carry lookahead adder are parallel,
which requires a larger number of gates and also results in a larger area.
CLA –Carry Lookahead Adder, RC, R – Ripple carry adder
Figure: Area Vs Delay for 8 bit adder Figure: Area Vs Delay for 16 bit adder
Figure: Area Vs delay for 32 bit adder Figure: Area Vs delay for 64 bit adder
Figure: Delay Vs Area for all adders Figure: Area Vs Delay for all multiplier
Above figures shows that the delay of the ripple carry adder increases much faster
when compared to the carry lookahead adder as the number of bits is increased.
In the carry lookahead adder, the cost is in terms of the area because computations are
in parallel, and therefore more power is consumed for a specific delay.
Memory Architecture and Building Blocks:
Explain the memory architecture and its control circuits in detail. (April 2018)
When n x m memory is implemented, then, n memory words are arranged in a linear fashion.
One word will be selected at a time by using select line.
If we want to implement an 8 x 8 memory, then n = 8 words and m = 8 (number of bits per word).
Then we need 8 select signals (one for each word).
But by using decoder we can reduce the number of select signals.
In case of 3 to 8 decoder, if 3 inputs are given to decoder, then we can get 8 select signals.
If n = 2^20, then we need to give only 20 address inputs to the decoder.
Since a basic storage cell is small and approximately square, arranging all the words in a single
column would make the design extremely slow: the vertical wire that connects the storage cells to
the I/O would become excessively long.
So, memory arrays are organized in such a way that the vertical and horizontal dimensions
are roughly the same.
The words are stored in a row. These words are selected simultaneously.
The column decoder is used to route the correct word to the I/O terminals.
The row address is used to select one row of memory and column address is used to
select particular word from that selected row.
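As an illustration (not from the notes), the sketch below splits an address into a row address and a column address for a square-ish array; the 1024-word organization (128 rows of 8 words) is an assumed example.

    # Illustrative sketch: row/column address decomposition for a memory array.
    ROWS, WORDS_PER_ROW = 128, 8            # 128 x 8 = 1024 words in total

    def split_address(addr):
        row = addr // WORDS_PER_ROW         # drives the row decoder (word-line select)
        col = addr % WORDS_PER_ROW          # drives the column decoder (word select)
        return row, col

    print(split_address(517))               # (64, 5)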
Word line: The horizontal select line which is used to select the single row of cell is
known as word line.
Bit line: The wire which connects the cell in a single column to the input/output circuit is
known as bit line.
Sense amplifier: it requires amplification of the internal swing to the full rail-to-rail
amplitude.
Block address: the memory is divided into various small blocks.
The address which is used to select one of the small blocks to be read or written is known
as block address.
Advantages:
Access time is fast
Power saving is good, because blocks not activated are in power saving mode.
Programming ROM
The transistor in the intersection of row and column is OFF when the associated word
line is LOW. In this condition, we get logic 1 output.
EEPROM – E²PROM:
Electrically Erasable Programmable ROM. Here the floating-gate tunneling oxide
(FLOTOX) device is used.
(FLOTOX) is used.
It is similar to the floating-gate device, except that a portion of the floating gate is separated from
the channel by a very thin oxide of 10 nm or less.
If 10 V is applied, electrons travel to and from the floating gate through Fowler-Nordheim
tunneling.
Erasing can be done by reversing the applied voltage that is used for writing.
Figure: (a) Erase (b) Write (c) Read operation of NOR flash memory
4.9.3.3 RAM – Random Access Memory
Explain about static and dynamic RAM.
Construct 6T based SRAM cell. Explain its read and write operations. (NOV 2018)
Figure: Three transistor dynamic memory cell
To read the cell, the read word line (RWL) is raised. The M2 transistor is either ON or OFF,
depending upon the stored value.
The BL2 bit line is connected to VDD, or it is precharged to VDD or VDD − Vt.
When logic 1 is stored, the series combination of M2 and M3 pulls the BL2 line low.
If logic 0 is stored, then BL2 line is high.
To refresh the cell, first the stored data is read, and its inverse is placed on BL1 and WWL
line is asserted.
One transistor DRAM:
In this cell, to write logic 1 then it is placed on bit line and word line is asserted high.
The capacitor is charged or discharged depending upon the data. Before performing read
operation, bit line is precharged.
In the compare mode, stored data are compared using bit line. The match line is
connected to all CAM blocks in a row. And it is initially precharged to VDD.
If a mismatch occurs in a cell, the match line of that row is discharged; if even one bit in a row is
mismatched, the match line goes low.
*****************************************************************************
Column Decoder
It should match the bit line pitch of the memory array.
In column decoder, decoder outputs are connected to nMOS pass transistors.
By using this circuit, we can selectively drive one out of m pass transistors.
Only one nMOS pass transistor is ON at a time.
Amplification:
In memory structures such as the 1T DRAM, amplification is required for proper
functionality.
Delay Reduction:
The amplifier compensates for the fan-out driving capability of the memory cell by
detecting and amplifying small transitions on the bit line to large signal output
swings.
Power reduction:
Reducing the signal swing on the bit lines can eliminate large part of the power
dissipation related to charging and discharging the bit lines.
(iii) Drivers/ Buffers
The length of word and bit lines increases with increasing memory sizes.
A large portion of the read and write access time can be attributed to the
wire delays.
A major part of the memory-periphery area is allocated to the drivers (address
buffers and I/O drivers).
******************************************************************************
4.11: Low Power Memory design:
Discuss about Low power memory design.
Figure: (a) Insertion of low threshold device (b) Reducing supply Voltage
******************************************************************************
UNIT – V
1.Logic blocks
Based on memories (Flip-flop & LUT – Lookup Table) Xilinx
Based on multiplexers (Multiplexers)-Actel
Based on PAL/PLA - Altera
Transistor Pairs
2. Interconnection Resources
Symmetrical FPGAs
Row-based FPGAs
Sea-of-gates type FPGAs
Hierarchical FPGAs (CPLDs)
3. Input-output cells (I/O Cell)
Possibilities for programming :
a. Input
b. Output
c. Bidirectional
RE-PROGRAMMABLE DEVICE ARCHITECTURE:
The programmable logic blocks of FPGA are smaller and less capable than a PLD, but an
FPGA chip contains a lot more logic blocks to make it more capable.
As shown in figure the logic blocks are distributed across the entire chip.
These logic blocks can be interconnected with programmable inter connections.
The programmable logic blocks of FPGAs are called Configurable Logic Blocks (CLBs).
CLBs contain LUT, FF, logic gates and Multiplexer to perform logic functions.
The CLB contains RAM memory cells and can be programmed to realize any function of
five variables or any two functions of four variables.
The functions are stored in the truth table form, so the number of gates required to realize
the functions is not important.
5.7.2: Interconnection resources:
Connections between the logic blocks in distant groups require the traversal of one or more levels
of routing segments.
As shown in Figure, only one level of routing directly connects to the logic blocks.
Programmable connections are represented with the crosses and circles.
The logic block pins connecting to connection blocks can then be connected to any number of wire
segments through switching blocks.
Figure shows the Xilinx routing architecture.
There are four types of wire segments available:
General purpose segments that pass through switches in the switch block.
Direct interconnect connects logic block pins to four surrounding connecting blocks
Long line: high fan out uniform delay connections
Clock lines: clock signal provider which runs all over the chip.
It employs wire segments of different lengths in each channel to provide the most appropriate
length for each given connection.
DESIGN FOR TESTABILITY:
VLSI designers have a wide variety of CAD tools to choose from, each with their own
strengths and weaknesses. The leading Electronic Design Automation (EDA) companies include
Cadence, Synopsys, Magma, and Mentor Graphics.
Tanner also offers commercial VLSI design tools. The leading free tools include Electric,
Magic, and LASI.
This set of laboratories uses the Cadence and Synopsys tools because they have the largest
market share in industry and are capable of handling everything from simple class projects to state-of-
the-art integrated circuits.
The full set of tools is extremely expensive but the companies offer academic programs to
make the tools available to universities at a much lower cost.
The tools run on Linux and other flavors of UNIX. Setting up and maintaining the tool
involves a substantial effort. Once they are setup correctly, the basic tools are easy to use, as this
tutorial demonstrates.
Some companies use the Tanner tools because their list price is much lower and they are
easy to use. However, their academic pricing is comparable with Cadence and Synopsys, giving
little incentive for universities to adopt Tanner.
The Electric VLSI Design System is an open-source chip design program.
Electric presently does not read the design rules for state-of-the-art nanometer processes and
poorly integrates with synthesis and place & route.
Magic is a free Linux-based layout editor with a powerful but awkward interface that was
once widely used in universities.
The Layout System for Individuals, LASI, developed by David Boyce, is freely available
and runs on Windows. It was last updated in 1999. There are two general strategies for chip
design.
Custom design involves specifying how every transistor is connected and physically
arranged on the chip. Synthesized design involves describing the function of the circuit in a hardware
description language and using CAD tools to automatically map it onto gates and place and route the result.
The majority of commercial designs are synthesized today because synthesis takes less
engineering time.
However, custom design gives more insight into how chips are built and into what to do
when things go wrong.
Custom design also offers higher performance, lower power, and smaller chip size. The
first two labs emphasize the fundamentals of custom design, while the next two use logic synthesis
and automatic placement to save time.
Tool Setup
These labs assume that you have the Cadence and Synopsys tools installed. The tools
generate a bunch of random files. It's best to keep them in one place. In your home directory,
mkdir IC_CAD
mkdir IC_CAD/cadence
Getting Started
Before you start the Cadence tools, change into the cadence directory: cd ~/IC_CAD/cadence.
Each of our tools has a startup script that sets the appropriate paths to the tools and invokes them.
Start Cadence with the NCSU extensions by running cad-ncsu &. A window labeled icfb
will open up.
This is the Integrated Circuit Front and Back End (e.g. schematic and layout) software, part
of Cadence's Design Framework interface.
A "What's New" and a Library Manager window may open up too. Scroll through the icfb
window and look at the messages displayed as the tool loads up.
Get in the habit of watching for the messages and recognizing any that are out of the ordinary.
This is very helpful when you encounter problems. All of your designs are stored in a
library. If the Library Browser doesn't open, choose Tools • Library Manager.
We'll use the Library Manager to manipulate your libraries. Don't try to
move libraries around or rename them directly in Linux; there is some funny behavior and you are
likely to break them.
Familiarize yourself with the Library Manager. Your cds.lib file includes many libraries
from the NCSU CDK supporting the different MOSIS processes. It also includes libraries from the
University of Utah.
The File menu allows you to create new libraries and cells within a library, while the Edit
menu allows you to copy, rename, delete, and change the access permissions.
Choose the "Attach to existing tech library" option and accept the default, UofU AMI 0.60u C5N
(3M, 2P, high-res).
This is a technology file for the American Microsystems (now Orbit Semiconductor) 0.6
μm process, containing design rules for layout.
Schematic Entry
Our first step is to create a schematic for a 2-input NAND gate. Each gate or larger
component is called a cell. Cells have multiple views. The schematic view for a cell built with
CMOS transistors will be called cmos sch.
Later, you will build a view called layout specifying how the cell is physically
manufactured. In the Library Manager, choose File • New • Cell View… In your lab1_xx library,
enter a cell name of nand2 and a view name of cmos_sch. The tool should be Composer -
Schematic.
You may get a window asking you to confirm that cmos_sch should be associated with this
tool. The schematic editor window will open. Your goal is to draw a gate like the one shown in
Figure 1. We are working in a 0.6 μm process with λ = 0.3 μm.
Unfortunately, the University of Utah technology file is configured on a half-lambda grid,
so grid units are 0.15 μm. Take care that everything you do is an integer multiple of λ so you don't
come to grief later on. Our NAND gate will use 12 λ (3.6 μm) nMOS and pMOS transistors.
Choose Add • Instance to open a Component Browser window. Choose
UofU_Analog_Parts for the library, then select nmos. The Add Instance dialog will open. Set the
Width to 3.6u (u indicates microns).
Click in the schematic editor window to drop the transistor. You can click a second time to
place another transistor. Return to the Component Browser window and choose pmos. Drop two
pMOS transistors.
Then return to the browser and get a gnd and a vdd symbol. When you are in a mode in the
editor, you can press ctrl-c or Esc to get out of it.
Other extremely useful commands include Edit • Move, Edit • Copy, Edit • Undo, and Edit
• Delete. Edit • Properties • Object… is also useful to change things like transistor sizes or
wire names.
Move the elements around until they are in attractive locations. I like to keep series
transistors one grid unit apart and place pMOS transistors two grid units above the nMOS. Look at
the bottom of the schematic editor window to see what mode you are in.
Next, use Add • Pin… to create some pins. In the Add Pin dialog, enter a and b. Make sure
the direction is “input.”
The tools are case-sensitive, so use lower case everywhere. Place the pins, being sure that a
is the bottom one.
Although pin order doesn't matter logically, it does matter physically and electrically, so
you will get errors if you reverse the order. Then place an output pin y. Now, wire the elements
together.
Choose Add • Wire (narrow). Click on each component and draw a wire to where it should
connect. It is a good idea to make sure every net (wire) in a design has a name.
Otherwise, you'll have a tough time tracking down a problem later on one of the unnamed
nets.
Every net in your schematic is connected to a named pin or to power or ground except the
net between the two series nMOS transistors. Choose Add • Wire name… Enter mid or something
like that as the name, and click on the wire to name it. Choose Design • Check and Save to save
your schematic.
You'll probably get one warning about a "solder dot on crossover" at the 4-way junction on
the output node.
This is annoying because such 4-way junctions are normal and common. Choose Check •
Rules Setup… and click on the Physical tab in the dialog. Change Solder On CrossOver from
“warning” to “ignored” and close the dialog.
Then Check and Save again and the warning should be gone.
If you have any other warnings, fix them. A common mistake is wires that look like they
might touch but don't actually connect. Delete the wire and redraw it. Poke around the menus and
familiarize yourself with the other capabilities of the schematic editor.
LOGIC VERIFICATION
Cells are commonly described at three levels of abstraction. The register-transfer level
(RTL) description is a Verilog or VHDL file specifying the behavior of the cell in terms of
registers and combinational logic.
It often serves as the specification of what the chip should do. The schematic illustrates
how the cell is composed from transistors or other cells. The layout shows how the transistors or
cells are physically arranged.
Logic verification involves proving that the cells perform the correct function. One way to
do this is to simulate the cell and apply a set of 1's and 0's called test vectors to the inputs, then
check that the outputs match expectation.
Typically, logic verification is done first on the RTL to check that the specification is
correct. A testbench written in Verilog or VHDL automates the process of applying and checking
all of the vectors.
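As an illustration only (this Python sketch is not the SystemVerilog testbench described above), a self-checking test applies every vector to the model and compares the output against the expected value; the names and vectors below are assumptions.

    # Illustrative sketch: applying test vectors to a NAND2 model and counting errors.
    def nand2(a, b):
        return 0 if (a and b) else 1

    vectors = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # (a, b, expected y)

    errors = sum(1 for a, b, y in vectors if nand2(a, b) != y)
    print(f"{errors} errors")    # a passing run reports 0 errors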
The same test vectors are then applied to the schematic to check that the schematic matches
the RTL.
Later, we will use a layout-versus schematic (LVS) tool to check that the layout matches
the schematic (and, by inference, the RTL).
We will begin by simulating an RTL description of the NAND gate to become familiar
with reading RTL and understanding a testbench. In this tutorial, the RTL and testbench are
written in System Verilog, which is a 2005 update to the popular Verilog hardware description
language.
There are many Verilog simulators on the market, including NC-Verilog from Cadence,
VCS from Synopsys, and ModelSim from Mentor Graphics.
This tutorial describes how to use NC Verilog because it integrates gracefully with the
other Cadence tools.
NCVerilog compiles your Verilog into an executable program and runs it directly, making
it much faster than the older interpreted simulators. Make a new directory for simulation (e.g.
nand2sim).
Copy nand2.sv, nand2.tv, and testfixture.verilog from the course directory into your new
directory.
cp /courses/e158/10/nand2.sv .
cp /courses/e158/10/nand2.tv .
cp /courses/e158/10/nand2.testfixture testfixture.verilog
nand2.sv is the SystemVerilog RTL file, which includes a behavioral description of a nand2
module and a simple self-checking testbench that includes testfixture.verilog. testfixture.verilog
reads in testvectors from nand2.tv and applies them to pins of the nand2 module.
After each cycle it compares the output of the nand2 module to the expected output, and
prints an error if they do not match.
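As a rough sketch only (the real course files may differ in detail, and the actual testbench is split across nand2.sv and testfixture.verilog), a behavioral nand2 module with a vector-reading, self-checking testbench could look something like this in SystemVerilog:

`timescale 1ns/1ps

// Hypothetical sketch of a behavioral NAND2 and a self-checking testbench.
// The real course files split the testbench into testfixture.verilog; here everything
// is shown in one place for readability.
module nand2(input  logic a, b,
             output logic y);
  assign y = ~(a & b);                    // behavioral NAND
endmodule

module testbench();
  logic       a, b, y, yexpected;
  logic [2:0] testvectors [0:15];         // each vector is {a, b, yexpected}
  integer     i, errors;

  nand2 dut(a, b, y);                     // device under test

  initial begin
    $readmemb("nand2.tv", testvectors);   // read test vectors from file
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {a, b, yexpected} = testvectors[i]; // apply a vector
      #10;                                // wait for the output to settle
      if (y !== yexpected) begin
        $display("Error: inputs = %b%b, y = %b, expected %b", a, b, y, yexpected);
        errors = errors + 1;
      end
    end
    $display("%0d tests completed with %0d errors", i, errors);
    $finish;
  end
endmodule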
Look over each of these files and understand how they work. First, you will simulate the nand2 RTL to practice the process and ensure that the testbench works. Later, you will replace the behavioral nand2 module with one generated from your Electric schematic and resimulate to check that your schematic performs the correct function.
At the command line, type sim-nc nand2.sv to invoke the simulator. You should see some messages ending with
ncsim> run
You'll be left at the ncsim command prompt. Type quit to finish the simulation. If the simulation hadn't run correctly, it would be helpful to be able to view the results.
NC-Verilog has a graphical user interface called SimVision. The GUI takes a few seconds to load, so you may prefer to run it only when you need to debug.
To rerun the simulation with the GUI, type sim-ncg nand2.sv. A Console and Design Browser window will pop up.
In the browser, click on the + symbol beside the testbench to expand, then click on dut.
The three signals, a, b, and y, will appear in the pane to the right. Select all three, then right-click
and choose Send to Waveform Window.
In the Waveform Window, choose Simulation • Run. You'll see the waveforms of your simulation; inspect them to ensure they are correct. The 0 errors message should also appear in the console.
If you needed to change something in your code or testbench or test vectors, or wanted to
add other signals, do so and then Simulation • Reinvoke Simulator to recompile everything and
bring you back to the start.
Then choose Run again. Make a habit of looking at the messages in the console window
and learning what is normal.
Warnings and errors should be taken seriously; they usually indicate real problems that will catch you later if you don't fix them.
Schematic Simulation
Next, you will verify your schematic by generating a Verilog deck and pasting it into the
RTL Verilog file.
While viewing your schematic, click on Tools • Simulation • NCVerilog to open a window
for the Verilog environment. Note the run directory (e.g. nand2_run1), and press the button in the
upper left to initialize the design.
Then press the next button to generate a netlist. Look in the icfb window for errors and
correct them if necessary.
You should see that the pmos, nmos, and nand2 cells were all netlisted. In your Linux terminal window, cd into the directory that was created. You'll find quite a few files.
Take a look at the netlist and other files. testfixture.template is the top level module that
instantiates the device under test and invokes the testfixture.verilog.
Copy your testfixture.verilog and nand2.tv files from your nand2sim directory to your nand2_run1 directory using commands such as
cp ../nand2sim/testfixture.verilog .
cp ../nand2sim/nand2.tv .
Back in the Virtuoso Verilog Environment window, you may wish to choose Setup • Record Signals. Click on the "All" button to record signals at all levels of the hierarchy. (This isn't important for the nand with only one level of hierarchy, but will be helpful later.)
Then choose Setup • Simulation. Change the Simulation Log File to indicate simout.tmp -sv. This will print the results in simout.tmp. The -sv flag indicates that the simulator should accept the SystemVerilog syntax used in testfixture.verilog. Set the Simulator mode to "Batch" and click on the Simulate button.
You should get a message that the batch simulation succeeded. This doesn't mean that it is correct, merely that it ran.
In the terminal window, view the simout.tmp file. It will give some statistics about the
compilation, then should indicate that the 4 tests were completed with 0 errors.
If the simulation fails, the simout.tmp file will have clues about the problems. Change the
simulator mode to Interactive to rerun with the GUI.
Be patient; the GUI takes several seconds to start and gives no sign of life until then. Add the waveforms again and run the simulation.
You may need to zoom to fit all the waves. For some reason, SimVision doesn't print the $display message about the simulation succeeding with no errors. You will have to read the simout.tmp file at the command line to verify that the test vectors passed. If you find any logic errors, correct the schematic and resimulate.
SILICON DEBUG
The rapid pace of innovation has created powerful SoC solutions at consumer prices. This has created a highly competitive marketplace where billions of dollars can be won by the right design delivered at the right time.
These new designs are produced on processes that push the fundamental limits of physics and are highly sensitive to equipment variation.
The industry now produces new designs in a complex world where process and design interactions have created new, complex failures that stand in the way of billion-dollar opportunities.
They also lead to new types of design issues such as delay defects in combinational and sequential
logic.
The challenge is made even greater by the growing complexity in device structure and
design techniques.
Multiple design organizations use multiple IP blocks and multiple libraries that need to work together throughout the process window, often across multiple fabs.
These new challenges come at a time when product lifetimes are shrinking, leading to
pressure to reduce time for debug and characterization activities. These problems are seen for the
first time at first silicon.
Test the first chips back from fabrication; if we are lucky, they work the first time. If not, debugging begins, and we must distinguish logic bugs from electrical failures. Most chip failures are logic bugs from inadequate simulation or verification. Some are electrical failures: crosstalk, dynamic nodes (leakage, charge sharing), and ratio failures. A few are tool or methodology failures (e.g., DRC). We then fix the bugs and fabricate a corrected chip. Silicon debug (or "bringup") is primarily a Non-Recurring Engineering (NRE) cost, like design. Contrast this with manufacturing test, which has to be applied to every part shipped.
MANUFACTURING TEST
A speck of dust on a wafer is sufficient to kill a chip, so the yield of any chip is less than 100%. Chips must be tested after manufacturing, before delivery to customers, so that only good parts are shipped. Manufacturing testers are very expensive, so time on the tester must be minimized through careful selection of test vectors.
A test for a defect will produce an output response that is different from the output when there is no defect. Test quality is high if the set of tests will detect a very high fraction of possible defects. Defect level is the percentage of bad parts shipped to customers, and yield is the percentage of defect-free chips manufactured.
Fault models:
There are numerous possible physical failures (which is what we are testing for). We can reduce the number of failure types by considering the effects of physical failures on the logic functional blocks: this abstraction is called a fault model. The stuck-at model assumes that defects cause the circuit to behave as if lines were "stuck" at logic 0 or 1, and most commercial tools for test are based on this "stuck-at" model. Other fault models exist: the "stuck-open" model for charge retained on a CMOS node, the more recent "transition" fault model used in an attempt to deal with delays, and the "path delay" fault model, which would be better for small delay defects but whose use is impeded by the large number of possible paths.
The approach to generating tests for defects is to map defects to (higher-level) faults: develop a fault model, then generate tests for the faults. The typical choice is the gate-level "stuck-at" fault model. As technology shrinks, other faults become important: bridging faults, delay faults, crosstalk faults, etc.
An interesting point: what matters is how well the tests generated (based on the fault model) detect realistic defects; the accuracy of the fault model itself is secondary.
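To make the stuck-at abstraction concrete, the hypothetical SystemVerilog fragment below simulates a good NAND gate alongside a copy whose input a is tied to 0 (a stuck-at-0 fault). A vector detects the fault only when the two outputs differ, which for this particular fault happens only at a = 1, b = 1:

`timescale 1ns/1ps

// Hypothetical sketch: detecting a stuck-at-0 fault on input a of a 2-input NAND.
// A good copy and a faulty copy are simulated side by side; a vector "detects" the
// fault when their outputs differ.
module nand2(input logic a, b, output logic y);
  assign y = ~(a & b);
endmodule

module stuck_at_demo();
  logic a, b, ygood, yfault;

  nand2 good  (.a(a),    .b(b), .y(ygood));
  nand2 faulty(.a(1'b0), .b(b), .y(yfault));   // input a stuck at logic 0

  initial begin
    for (int i = 0; i < 4; i++) begin
      {a, b} = i[1:0];                         // apply all four input vectors
      #10;
      if (ygood !== yfault)
        $display("vector a=%b b=%b detects a stuck-at-0 (good=%b, faulty=%b)",
                 a, b, ygood, yfault);
    end
    $finish;
  end
endmodule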
Fault simulation:
Fault simulation identifies the faults detected by a sequence of tests and provides a numerical value of coverage, the ratio of detected faults to total faults (for example, detecting 9,800 of 10,000 modeled faults gives 98% coverage). There is a strong correlation between high fault coverage and low defect level. The faults considered are generally gate-level "stuck-at" faults, although coverage of switch-level faults can also be evaluated, and timing and dynamic effects of failures can be included.
Although fault simulation takes polynomial time in the number of gates, it can still be
prohibitive for large designs. Static timing analysis (PrimeTime, for example) only finds structurally long paths.
BOUNDARY SCAN
Boundary scan is a method for testing interconnects (wire lines) on printed circuit boards or sub-blocks inside an integrated circuit. Boundary scan is also widely used as a debugging method to watch integrated circuit pin states, measure voltage, or analyze sub-blocks inside an integrated circuit.
Testing
The boundary scan architecture provides a means to test interconnects and clusters of logic,
memories, etc., without using physical test probes. It adds one or more so-called 'test cells' connected to each pin of the device that can selectively override the functionality of that pin.
These cells can be programmed via the JTAG scan chain to drive a signal onto a pin and
across an individual trace on the board. The cell at the destination of the board trace can then be
programmed to read the value at the pin, verifying the board trace properly connects the two pins.
If the trace is shorted to another signal or if the trace has been cut, the correct signal value
will not show up at the destination pin, and the board will be observed to have a fault.
On-Chip Infrastructure
To provide the boundary scan capability, IC vendors add additional logic to each of their
devices, including scan cells for each of the external traces.
These cells are then connected together to form the external boundary scan shift register
(BSR), and combined with JTAG TAP (Test Access Port) controller support comprising four (or
sometimes more) additional pins plus control circuitry.
Some TAP controllers support scan chains between on-chip logical design blocks, with
JTAG instructions which operate on those internal scan chains instead of the BSR.
This can allow those integrated components to be tested as if they were separate chips on a
board. On-chip debugging solutions are heavy users of such internal scan chains.
These designs are part of most Verilog or VHDL libraries. Overhead for this additional
logic is minimal, and generally is well worth the price to enable efficient testing at the board level.
For normal operation, the added boundary scan latch cells are set so that they have no
effect on the circuit, and are therefore effectively invisible.
However, when the circuit is set into a test mode, the latches enable a data stream to be
shifted from one latch into the next.
Once a complete data word has been shifted into the circuit under test, it can be latched into
place so it drives external signals.
Shifting the word also generally returns the input values from the signals configured as
inputs.
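The following is a simplified, hypothetical sketch of one such boundary scan cell, loosely patterned on the IEEE 1149.1 BC_1 style; real cells differ in detail, particularly in how the update stage is clocked:

// Hypothetical simplified boundary scan cell. clock_dr captures the functional value
// or shifts the chain (selected by shift_dr), update_dr moves the shifted bit into an
// update latch, and mode selects whether the pin sees normal data or the test value.
module bscan_cell(
  input  logic data_in,    // functional value from the core logic or pad
  input  logic scan_in,    // from the previous cell in the boundary scan register
  input  logic shift_dr,   // 1 = shift, 0 = capture (when clock_dr pulses)
  input  logic clock_dr,   // capture/shift clock from the TAP controller
  input  logic update_dr,  // update strobe from the TAP controller
  input  logic mode,       // 0 = normal operation, 1 = drive the test value
  output logic scan_out,   // to the next cell in the chain
  output logic data_out    // toward the pad or core logic
);
  logic capture_ff, update_latch;

  always_ff @(posedge clock_dr)
    capture_ff <= shift_dr ? scan_in : data_in;    // capture or shift

  always_latch
    if (update_dr) update_latch <= capture_ff;     // update stage

  assign scan_out = capture_ff;
  assign data_out = mode ? update_latch : data_in; // transparent in normal mode
endmodule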
Test Mechanism
As the cells can be used to force data into the board, they can set up test conditions. The
relevant states can then be fed back into the test system by clocking the data word back so that it
can be analyzed.
By adopting this technique, it is possible for a test system to gain test access to a board. As most of today's boards are very densely populated with components and tracks, it is very difficult for test systems to physically access the relevant areas of the board to enable them to test the board. Boundary scan makes access possible without always needing physical probes.
In modern chip and board design, Design For Test is a significant issue, and one common
design artifact is a set of boundary scan test vectors, possibly delivered in Serial Vector Format
(SVF) or a similar interchange format.
Devices communicate to the world via a set of input and output pins. By themselves, these
pins provide limited visibility into the workings of the device.
However, devices that support boundary scan contain a shift-register cell for each signal pin of the device. These registers are connected in a dedicated path around the device's boundary (hence the name).
The path creates a virtual access capability that circumvents the normal inputs and provides direct control of the device and detailed visibility at its outputs. The contents of the boundary scan register are usually described by the manufacturer using a part-specific BSDL file.
The boundary-scan cells can be configured to support external testing for interconnection
between chips (EXTEST instruction) or internal testing for logic within the chip (INTEST
instruction).
Typically, high-end commercial JTAG testing systems allow the import of design 'netlists' from CAD/EDA systems plus the BSDL models of boundary scan/JTAG compliant devices to automatically generate test applications.
When used during manufacturing, such systems also support non-test but affiliated
applications such as in-system programming of various types of flash memory: NOR, NAND, and
serial (I2C or SPI).
Such commercial systems are used by board test professionals and will often cost several thousand dollars for a fully-fledged system.
They can include diagnostic options to accurately pinpoint faults such as open circuits and shorts and may also offer schematic or layout viewers to depict the fault in a graphical manner.
Tests developed with such tools are frequently combined with other test systems such as in-circuit testers (ICTs) or functional board test systems.
DESIGN FOR TESTABILITY
Design-for-testability techniques improve the controllability and observability of internal nodes, so that embedded functions can be tested.
Two basic properties determine the testability of a node: (1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and (2) observability, which is a measure of the difficulty of propagating a node's value to a primary output (PO). A node is said to be testable if it is easily controlled and observed. For sequential circuits, some have added predictability, which represents the ability to obtain known output values in response to given input stimuli. The factors affecting predictability include initializability, races, hazards, oscillations, etc.
DFT techniques include analog test busses and scan methods. Testability can also be improved with BIST
circuitry, where signal generators and analysis circuitry are implemented on chip. Without testability,
design flaws may escape detection until a product is in the hands of users; equally, operational failures
may prove difficult to detect and diagnose.
Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a
design and on locating and repairing field failures. They have developed several highly structured and
effective solutions to this problem, including scan design and self test. Design verification has been a less
formal task, based on the designer's skills. However, designers have found that structured design-for-test features aiding manufacture and repair can significantly simplify design verification. These features reduce verification cycles from weeks to days in some cases.
In contrast, software designers and test engineers have targeted design validation and verification. Unlike
hardware, software does not break during field use. Design errors, rather than incorrect replication or
wear out, cause operational bugs. Efforts have focused on improving specifications and programming
styles rather than on adding explicit test facilities. For example, modular design, structured programming,
formal specification, and object orientation have all proven effective in simplifying test.
Although these different approaches are effective when we can cleanly separate a design's hardware and software parts, problems arise when boundaries blur. For example, in the early design stages of a complex
system, we must define system level test strategies. Yet, we may not have decided which parts to
implement in hardware and which in software. In other cases, software running on general-purpose
hardware may initially deliver certain functions that we subsequently move to firmware or hardware to
improve performance.
Designers must ensure a testable, finished design regardless of implementation decisions. Supporting hardware-software codesign requires "cotesting" techniques, which draw hardware and software test techniques together into a cohesive whole.
Things to be followed
Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of the most
important steps in designing a testable chip is to first partition the chip in an appropriate way such that for
each functional module there is an effective (DFT) technique to test it.
Partitioning must be done at every level of the design process, from architecture to circuit, whether testing
is considered or not. Partitioning can be functional (according to functional module boundaries) or
physical (based on circuit topology). Partitioning can be done by using multiplexers and/or scan chains.
Test access points must be inserted to enhance controllability & observability of the circuit. Test
points include control points (CPs) and observation points (OPs). The CPs are active test points,
while the OPs are passive ones. There are also test points which are both CPs and OPs. Before exercising tests through test points that are not PIs and POs, one should investigate the additional requirements on the test points raised by the use of test equipment (a small sketch of a control point and an observation point is given at the end of these guidelines).
Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on reset
mechanism controllable from primary inputs is the most effective and widely used approach.
Test control must be provided for difficult-to-control signals.
Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing resolution, speed, memory depth, driving capability, analog/mixed-signal support, internal/boundary scan support, etc., should be considered during the design process to avoid delay of the project and unnecessary investment in equipment.
Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester
synchronization, internal oscillator and clock generator circuitry should be isolated during the test
of the functional circuitry. The internal oscillators and clocks should also be tested separately.
Analog and digital circuits should be kept physically separate. Analog circuit testing is very much
different from digital circuit testing. Testing for analog circuits refers to real measurement, since
analog signals are continuous (as opposed to discrete or logic signals in digital circuits). They
require different test equipment and different test methodologies. Therefore they should be tested separately.
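As mentioned under the test-point guideline above, here is a small hypothetical sketch of a control point and an observation point added around a hard-to-control internal node; all signal names are made up:

// Hypothetical test-point insertion: a multiplexer acts as a control point (CP) that
// overrides an internal node in test mode, and a flip-flop routed to an output acts
// as an observation point (OP).
module tp_example(
  input  logic clk,
  input  logic test_mode,   // enables the control point
  input  logic cp_value,    // value forced onto the node during test
  input  logic a, b, c,
  output logic y,
  output logic op_observe   // observation point brought to a primary output or scan FF
);
  logic hard_node, controlled_node;

  assign hard_node       = a & b;                            // poorly controllable node
  assign controlled_node = test_mode ? cp_value : hard_node; // control point
  assign y               = controlled_node | c;

  always_ff @(posedge clk)
    op_observe <= controlled_node;                           // observation point
endmodule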
Things to be avoided
Asynchronous (unclocked) logic feedback in the circuit must be avoided. Feedback in the combinational logic can give rise to oscillation for certain inputs. Since no clocking is employed, timing is continuous instead of discrete, which makes tester synchronization virtually impossible, and therefore only functional test by an application board can be used.
The above guidelines are from experienced practitioners. They are not complete or universal. In fact, these ad hoc methods have drawbacks:
A. There is a lack of experts and tools.
B. Test generation is often manual.
C. This method cannot guarantee high fault coverage.
D. It may increase design iterations.
E. It is not suitable for large circuits.
Scan Design
By replacing the flip-flops with scan flip-flops connected into a chain, all the flip-flops can be loaded with a known value, and their value can be easily accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the scan insertion operation.
Input/output of each scan shift register must be available on PI/PO.
Combinational ATPG is used to obtain tests for all testable faults in the combinational
logic.
Shift register tests are applied and ATPG tests are converted into scan sequences for use
in manufacturing test.
[Fig. 39.1: Scan structure added to a design — scan flip-flops (SFF) sit between the primary inputs and primary outputs of the combinational logic and are chained from SCANIN under control of TC and CLK.]
Fig. 39.1 shows a scan structure connected to a design. The scan flip-flops (FFs) must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, and the circuit can be fully tested by compact ATPG patterns. Unfortunately, there are two types of overheads associated with this technique that designers care about very much. These are the hardware overhead (including three extra pins, multiplexers for all FFs, and extra routing area) and the performance overhead (including multiplexer delay and FF delay due to extra load).
Only clocked D-type master-slave flip-flops should be used for all state variables.
At least one PI pin must be available for test. It is better if more pins are available.
All clock inputs to flip-flops must be controlled from primary inputs (PIs). There must be no gated clocks. This is necessary for FFs to function as a scan register.
Clocks must not feed data inputs of flip-flops. A violation of this can lead to a race condition in the normal mode.
Scan Overheads
The use of scan design produces two types of overheads: area overhead and performance overhead. The scan hardware requires extra area and slows down the signals.
IO pin overhead: At least one primary pin is necessary for test.
Area overhead: Gate overhead = [4 n_sff / (n_g + 10 n_ff)] x 100%, where n_g = number of combinational gates, n_ff = number of flip-flops, and n_sff = number of scan flip-flops. For full scan, the number of scan flip-flops equals the number of original circuit flip-flops. Example: n_g = 100k gates and n_ff = n_sff = 2k flip-flops give overhead = 4 x 2,000 / (100,000 + 20,000) = 6.7%. For a more accurate estimate, scan wiring and layout area must be taken into consideration.
Performance overhead: The multiplexer of the scan flip-flop adds two gate delays to the combinational path. The fanout of each flip-flop also increases by 1, which can increase the clock period.
Scan Variations
There have been many variations of scan as listed below, few of these are discussed here.
MUXed Scan
Scan path
Scan-Hold Flip-Flop
Serial scan
Level-Sensitive Scan Design (LSSD)
Scan set
Random access scan
MUX Scan
It was invented at Stanford in 1973 by M. Williams & Angell.
In this approach a MUX is inserted in front of each FF to be placed in the scan chain.
The scan flip-flops (FFs) are interconnected into a chain. This approach effectively turns the sequential testing problem into a combinational one, and the circuit can be fully tested by compact ATPG patterns.
There are two types of overheads associated with this method. The hardware overhead is due to three extra pins, multiplexers for all FFs, and extra routing area. The performance overhead includes multiplexer delay and FF delay due to the extra load.
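A muxed scan flip-flop and a short chain of such flip-flops can be sketched as below; the signal names and the polarity of the test control input are illustrative and not taken from any particular library:

// Hypothetical muxed scan flip-flop: a 2-to-1 multiplexer in front of a D flip-flop.
// Here TC = 1 selects the functional data input and TC = 0 selects the scan input;
// the polarity is a convention choice.
module scan_ff(
  input  logic clk,
  input  logic tc,    // test control: 1 = normal mode, 0 = shift mode
  input  logic d,     // functional data input
  input  logic sd,    // scan data from the previous flip-flop in the chain
  output logic q      // functional output, also feeds the next scan input
);
  always_ff @(posedge clk)
    q <= tc ? d : sd;
endmodule

// Chaining three such flip-flops forms a shift register from SCANIN to SCANOUT.
module scan_chain3(
  input  logic clk, tc, scanin,
  input  logic d0, d1, d2,
  output logic q0, q1, q2,
  output logic scanout
);
  scan_ff f0(.clk(clk), .tc(tc), .d(d0), .sd(scanin), .q(q0));
  scan_ff f1(.clk(clk), .tc(tc), .d(d1), .sd(q0),     .q(q1));
  scan_ff f2(.clk(clk), .tc(tc), .d(d2), .sd(q1),     .q(q2));
  assign scanout = q2;
endmodule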
Scan Path
This approach is also called the Clock Scan Approach.
It was invented by Kobayashi et al. in 1968, and reported by Funatsu et al. in 1975, and
adopted by NEC.
In this approach multiplexing is done by two different clocks instead of a MUX.
It uses two-port raceless D-FFs as shown in Figure 39.3. Each FF consists of two latches
operating in a master-slave fashion, and has two clocks (C1 and C2) to control the scan
input (SI) and the normal data input (DI) separately.
The two-port raceless D-FF is controlled in the following way:
For normal mode operation C2 = 1 to block SI and C1 = 0 →1 to load DI.
For shift register test mode C1 = 1 to block DI and C2 = 0 →1 to load SI.
[Fig. 39.3: Logic diagram of the two-port raceless D-FF — latches L1 and L2, data input DI clocked by C1, scan input SI clocked by C2, with outputs DO and SO.]
This approach gives a lower hardware overhead (due to dense layout) and less performance penalty (due to the removal of the MUX in front of the FF) compared to the MUX Scan Approach. The real figures, however, depend on the circuit style and technology selected, and on the physical implementation.
Level-Sensitive Scan Design (LSSD)
[Figs. 39.4 and 39.5: the LS polarity-hold latch and the polarity-hold shift-register latch (SRL), built from latches L1 and L2 with data input DI, scan input SI, and clocks C, A, and B.]
LSSD requires that the circuit be LS, so we need LS memory elements as defined above. Figure
39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) is not dependent
on the rise/fall time of C, but only on C being `1' for a period of time greater than or equal to data
propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL) used in LSSD as the scan cell.
The scan cell is controlled in the following way:
Normal mode: A=B=0, C=0 → 1.
SR (test) mode: C=0, AB=10→ 01 to shift SI through L1 and L2 .
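Behaviorally, this control scheme can be approximated by the sketch below; it is a simplification of Figure 39.5, since real LSSD latches are built from hazard-free gate structures rather than behavioral code:

// Hypothetical behavioral sketch of an LSSD polarity-hold shift-register latch (SRL).
// L1 is loaded from D by the system clock C in normal mode, or from SI by the A clock
// in scan mode; L2 is loaded from L1 by the B clock and provides the scan output.
module lssd_srl(
  input  logic D,   // system data input
  input  logic C,   // system clock (loads D into L1)
  input  logic SI,  // scan input
  input  logic A,   // scan clock A (loads SI into L1)
  input  logic B,   // scan clock B (loads L1 into L2)
  output logic L1,  // system output
  output logic L2   // scan output (to SI of the next SRL)
);
  always_latch begin
    if (C)      L1 <= D;   // normal mode: A = B = 0 and C pulses 0 -> 1
    else if (A) L1 <= SI;  // scan mode: C = 0 and the A clock pulses
  end

  always_latch
    if (B) L2 <= L1;       // the B clock pulse moves L1 into L2
endmodule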
Advantages of LSSD
Correct operation independent of AC characteristics is guaranteed.
FSM is reduced to combinational logic as far as testing is concerned.
Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
Complex design rules are imposed on designers. There is no freedom to vary from the
overall schemes. It increases the design complexity and hardware costs (4-20% more
hardware and 4 extra pins).
Asynchronous designs are not allowed in this approach.
Sequential routing of latches can introduce irregular structures.
Faults changing combinational function to sequential one may cause trouble, e.g., bridging
and CMOS stuck-open faults.
Test application becomes a slow process, and normal-speed testing of the entire test
sequence is impossible.
It is not good for memory intensive designs.
Random Access Scan
[Figure: Random access scan structure — the state flip-flops are arranged like a RAM of n_ff bits beside the combinational logic, with an address decoder (log2 n_ff address bits), SCANIN, SCANOUT, SEL, TC, and CK, so that any single flip-flop can be selected for reading or writing.]
The difference between this approach and the previous ones is that the state vector can
now be accessed in a random sequence. Since neighboring patterns can be arranged so
that they differ in only a few bits, and only a few response bits need to be observed, the
test application time can be reduced.
In this approach test length is reduced.
This approach provides the ability to `watch' a node in normal operation mode, which is
impossible with previous scan methods.
This is suitable for delay and embedded memory testing.
The major disadvantage of the approach is high hardware overhead due to the address decoder, the gates added to each SFF, the address register, and extra pins and routing.
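The addressing idea can be sketched behaviorally as follows; widths, port names, and control polarities are made up for illustration only:

// Hypothetical sketch of random access scan: every state flip-flop is individually
// addressable, so single bits can be written from SCANIN or observed on SCANOUT
// without shifting an entire chain.
module ras_register_file #(parameter NFF = 8, AW = 3)(
  input  logic           clk,
  input  logic           tc,       // 1 = normal mode, 0 = test mode
  input  logic [AW-1:0]  addr,     // selects one scan flip-flop
  input  logic           scanin,   // test write data
  input  logic           sel,      // test write enable
  input  logic [NFF-1:0] d,        // functional next-state values
  output logic [NFF-1:0] q,
  output logic           scanout   // value of the addressed flip-flop
);
  always_ff @(posedge clk) begin
    if (tc)
      q <= d;                      // normal operation: all flip-flops update
    else if (sel)
      q[addr] <= scanin;           // test mode: write only the addressed bit
  end

  assign scanout = q[addr];        // the addressed bit is always observable
endmodule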
Scan-Hold Flip-Flop
This is a special type of scan flip-flop with an additional latch, designed for low-power testing applications. It was proposed by DasGupta; Figure 39.8 shows a hold latch cascaded with the SFF.
The control input HOLD keeps the output steady at the previous state of the flip-flop. For HOLD = 0, the latch holds its state; for HOLD = 1, the hold latch becomes transparent. For normal mode operation, TC = HOLD = 1, and for scan mode, TC = 1 and HOLD = 0.
Hardware overhead increases by about 30% due to the extra hold latch hardware. This approach reduces power dissipation and isolates the asynchronous part during scan. It is also suitable for delay testing.
[Fig. 39.8: Scan-hold flip-flop — a scan flip-flop (SFF) with data input D, test control T, and clock CK, followed by a hold latch controlled by HOLD; the latch output drives the combinational circuit and the SFF output feeds SD of the next SHFF.]
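A behavioral approximation of this arrangement (a muxed scan flip-flop followed by a hold latch) is sketched below; the polarities follow the description above but may differ in a real cell library:

// Hypothetical scan-hold flip-flop: a muxed scan flip-flop followed by a hold latch.
// HOLD = 1 makes the latch transparent; HOLD = 0 freezes the output that drives the
// combinational logic while the scan flip-flop is being shifted, reducing switching.
module scan_hold_ff(
  input  logic clk,
  input  logic tc,     // selects functional data or scan data (polarity is illustrative)
  input  logic hold,   // 1 = transparent, 0 = hold the previous value
  input  logic d,      // functional data
  input  logic sd,     // scan data in
  output logic q_scan, // scan flip-flop output (feeds SD of the next scan-hold FF)
  output logic q_hold  // held output that drives the combinational logic
);
  always_ff @(posedge clk)
    q_scan <= tc ? d : sd;         // ordinary muxed scan flip-flop

  always_latch
    if (hold) q_hold <= q_scan;    // hold latch: transparent when HOLD = 1
endmodule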
Conclusions
Access to internal nodes in complex circuitry is becoming a greater problem; thus it is essential that the designer consider how the IC will be tested and what extra structures must be incorporated in the design.
Scan design has been the backbone of design for testability in the industry for a long time.
Design automation tools are available to insert scan into a circuit and then generate test patterns. Overhead increases due to the scan insertion; in ASIC design, 10 to 15% scan overhead is generally accepted.
IDDQ Testing:
Iddq testing is a method for testing CMOS integrated circuits for the presence of manufacturing faults. It relies on measuring the supply current (Idd) in the quiescent state (when the circuit is not switching and inputs are held at static values). The current consumed in this state is commonly called Iddq, for Idd (quiescent), and hence the name.
Iddq testing uses the principle that in a correctly operating quiescent CMOS digital circuit, there is no static current
path between the power supply and ground, except for a small amount of leakage. Many common semiconductor
manufacturing faults will cause the current to increase by orders of magnitude, which can be easily detected. This
has the advantage of checking the chip for many possible faults with one measurement. Another advantage is that it
may catch faults that are not found by conventional stuck-at fault test vectors.
Iddq testing is somewhat more complex than just measuring the supply current. If a line is shorted to Vdd, for example, it will still draw no extra current if the gate driving the signal is attempting to set it to '1'. However, a different input vector that attempts to set the signal to '0' will show a large increase in quiescent current, signalling a bad part. Typical Iddq tests may use 20 or so input vectors. Note that Iddq test inputs require only controllability, and not observability, because observability is obtained through the shared power supply connection.
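Iddq measurement is performed by tester hardware rather than by logic simulation, but the overall flow can be sketched in SystemVerilog-flavored pseudocode; measure_iddq() and the vector file name below are purely hypothetical stand-ins and do not correspond to any real simulator or tester API:

`timescale 1ns/1ps

// Hypothetical sketch of an Iddq test flow: apply a small set of pre-selected vectors,
// wait for the circuit to become quiescent, then compare the measured supply current
// against a pass/fail limit.
module iddq_test_sketch();
  localparam int  NVEC       = 20;     // Iddq test sets are typically on the order of 20 vectors
  localparam real IDDQ_LIMIT = 1.0e-6; // pass/fail threshold in amps (illustrative value)

  logic [7:0] vec [0:NVEC-1];          // pre-selected Iddq vectors
  logic [7:0] pi;                      // would drive the primary inputs of the device under test
  int fails = 0;

  // Stand-in for the tester's analog current measurement.
  function real measure_iddq();
    return 0.0;                        // a real flow gets this number from tester hardware
  endfunction

  initial begin
    $readmemb("iddq_vectors.tv", vec); // hypothetical vector file
    for (int i = 0; i < NVEC; i++) begin
      pi = vec[i];                     // apply the vector; only controllability is needed
      #100;                            // wait until all switching has stopped (quiescent state)
      if (measure_iddq() > IDDQ_LIMIT) begin
        $display("vector %0d: elevated Iddq, part fails", i);
        fails++;
      end
    end
    $display("Iddq test done, %0d failing vectors", fails);
  end
endmodule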
Iddq testing has several practical advantages. It is a simple and direct test that can identify physical defects that conventional stuck-at test vectors miss. Its design time and area overhead are relatively low, test generation is fast, test application time is short because the vector sets are small, and it catches underlying defects that other tests cannot detect immediately.
One disadvantage of Iddq testing is that it can be time-consuming compared to methods like scan testing, and therefore more expensive, because current measurements take much more time than reading digital pins in mass production.
2. Systematic Defects: Systematic defects are a more prominent contributor to yield loss in deep submicron process technologies. They are related to the process technology, arising from limitations of the lithography process that increase the variation between desired and printed patterns. Another process-related problem is planarity: layer density requirements become necessary because areas with a low density of a particular layer can cause upper layers to sag, resulting in discontinuous planarity across the chip.
3. Parametric Defects: In deep submicron technology, parametric defects are the most critical. They arise from improper modeling of interconnect parasitics. As a result, the manufactured device does not match the expected result from design simulation and does not meet the design specification.
Design for manufacturability (DFM) is the process used to overcome these causes of yield loss. DFM cannot be done without collaboration between the various technology parties, such as process, design, mask, EDA, and so on. DFM presents both a big challenge and a big opportunity in the nanometer era.
Design for manufacturability is a proactive process that ensures quality, reliability, cost effectiveness and time to market.
DFM consists of a set of methodologies that enforce additional (recommended or mandatory) design rules regarding the shapes and polygons of the physical layout in order to improve yield.
Given a fixed amount of available space in a given layout area, there are potentially multiple yield-enhancing changes that can be made.
Some DFM guidelines that can be applied at the SoC level are listed below.
1. Filler cell insertion (consisting of regular diffusion and polysilicon structures) and shielding
Issue Addressed: PO/OD non-uniformity
Benefit: Higher parametric yield.
2. Via optimization
Issue Addressed: open vias, systematic via opening issues
Benefit: Higher yield after manufacturing and qualification.
3. Wire Spreading
Issue Addressed: wire shorts and opens due to random defects
Benefit: Higher yield, reduced crosstalk.
4. Power/ground-connected fill
Issue Addressed: density gradients, large IR drop
Benefit: Robustness to IR drop and a more regular layout
5. Litho hotspot detection and repair
Issue Addressed: Lithography hotspots
Benefit: Higher yield
6. Dummy Metal/Via/FEOL
Issue Addressed: Large density gradients
Benefit: Higher yield
7. CMP hotspot detection
Issue Addressed: CMP hotspots
Benefit: Higher yield