A 32-Bit Carry Lookahead Adder
Authorized licensed use limited to: Oxford Engineering College. Downloaded on November 26, 2009 at 00:29 from IEEE Xplore. Restrictions apply.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 8, AUGUST 2005 993
Fig. 2. ANT logic.

evaluation node. pMOS P3 in N1-block and nMOS N3 in N2-block are used
to solve the charge sharing problem between the point OUT and the point
B. When the clock slew rate is high enough, pMOS P3 in N1-block and
nMOS N3 in N2-block can be omitted [7].
A schematic diagram of ANT logic is shown in Fig. 2. It improves the
performance using the feedback transistor pair, pMOS P3 and nMOS N3.
In evaluation phase, if the nMOS logic tree is evaluated, after the
voltage of the evaluation node A drops to below (Vdd − Vth), pMOS P3
turns on. Then it pulls up point B and turns on nMOS N3. nMOS N3 in
turn pulls down evaluation node A and accelerates the evaluation.
However, the speedup using the feedback transistor pair is not
significant when the number of serial nMOS transistors in the logic
tree is small.

III. CIRCUIT DIAGRAM AND OPERATING PRINCIPLES

A. Basic Idea
The performance of N1-block is affected by the rise time of the
output point since two processes are involved. First the evaluation node
A is pulled down through the current path in the nMOS logic tree. Then
pMOS P2 turns on and the output point gets pulled up. The capacitance
at the evaluation node A significantly affects the performance. For
example, in Fig. 2(a), the gate capacitances of pMOS P2, P3, nMOS N2,
and the drain capacitances of nMOS N3, pMOS P1, and the nMOS
transistors at the top of the nMOS logic tree are connected to the
evaluation node A. To further enhance the performance, we need to reduce
the capacitance at the evaluation node.
We have developed DPANL to achieve this goal [8]. N1-block of
DPANL is shown in Fig. 3(a). The nMOS logic trees in Path 1 and
Path 2 are identical except that Path 1 is made faster than Path 2, since
Path 1 influences the rise time of the output. The sizes of the
transistors in Path 1 and Path 2 should guarantee that the short circuit
current through pMOS P3, nMOS N4 and N3 does not affect the performance.
The capacitance at the evaluation node A consists of the gate
capacitance of pMOS P3, the drain capacitance of pMOS P1 and the nMOS
transistors at the top of the nMOS logic tree. It is much smaller than
the corresponding capacitance in ANT, and thus helps achieve higher
performance.
Power consumption is also less in DPANL. The total width of the
two nMOS logic trees in DPANL can be made the same as or even
less than that of ANT. In ANT, in order to charge and discharge the
large capacitance at the evaluation node A, the sizes of pMOS P1 and
nMOS N1 must be large. Also, in order to discharge the capacitance
introduced by the feedback transistor pair at point B, nMOS N4 and N2
need to be large. In DPANL, since evaluation nodes A and B have small
capacitances, pMOS P1 and P2, nMOS N1 and N2 can be small; nMOS
N4 and N3 can also be small since there is no feedback transistor pair
attached to point C. So the total channel width of transistors in DPANL
can be smaller than that in the ANT.
The same principle applies to N2-block. The circuit diagram of
N2-block is shown in Fig. 3(b).

B. Operating Principles of the DPANL
When the clock is low, N1-block of DPANL begins its precharge
phase. The clocked pMOS P1 and P2 are turned on, and the evaluation
nodes A and B are precharged to high. The clocked foot transistors
nMOS N1 and N2 are turned off, allowing no current through Path 1
and Path 2. Since the evaluation node A is precharged to high, pMOS
P3 is turned off. nMOS N4 is turned off by the clock. So the output
point keeps its previous state in the capacitance at that point.
When the clock is high, N1-block begins its evaluation phase. If the
nMOS logic tree is not evaluated, the evaluation nodes A and B stay
high. pMOS P3 is off, nMOS N4 and N3 are on, the output is pulled
down. If the nMOS logic tree is evaluated, the evaluation nodes A and
B are pulled down through Path 1 and Path 2, respectively. nMOS N3
is turned off. pMOS P3 is turned on and the output is pulled up.
The operating principles of N2-block are similar to those of
N1-block. One thing to note is that when the nMOS logic tree is
evaluated, the evaluation nodes can not reach full Vdd because of
the threshold voltage drop in nMOS transistors. The presence of the
feedback transistors pMOS P5 and P6 is to pull up the evaluation
nodes to full Vdd.
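The precharge and evaluation behavior of the N1-block described in Section III-B can be summarized as a small logic-level model. The following is only a sketch under stated assumptions: it treats nodes A, B, and the output as ideal 0/1 values, ignores timing, charge sharing, and glitches, and the names `N1BlockState` and `n1_block_step` are hypothetical ones introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class N1BlockState:
    """Idealized logic levels (0 or 1) of the N1-block nodes."""
    node_a: int
    node_b: int
    out: int


def n1_block_step(clk: int, tree_evaluated: bool, prev_out: int) -> N1BlockState:
    """Return node levels after one clock phase (logic-level abstraction only).

    clk            -- 0 for the precharge phase, 1 for the evaluation phase
    tree_evaluated -- True if the nMOS logic tree conducts during evaluation
    prev_out       -- output level stored on the output capacitance
    """
    if clk == 0:
        # Precharge: P1 and P2 are on, so A and B go high. N1 and N2 are off,
        # P3 is off (A is high), N4 is off, so the output holds its old value.
        return N1BlockState(node_a=1, node_b=1, out=prev_out)
    if tree_evaluated:
        # Evaluation, tree conducts: A and B are pulled low through Path 1 and
        # Path 2. N3 turns off, P3 turns on, and the output is pulled up.
        return N1BlockState(node_a=0, node_b=0, out=1)
    # Evaluation, tree does not conduct: A and B stay high. P3 is off while
    # N4 and N3 are on, so the output is pulled down.
    return N1BlockState(node_a=1, node_b=1, out=0)


if __name__ == "__main__":
    out = 0
    # Alternate precharge and evaluation phases for a few cycles.
    for evaluated in (True, False, True):
        out = n1_block_step(0, evaluated, out).out   # precharge: output held
        state = n1_block_step(1, evaluated, out)     # evaluation
        out = state.out
        print(evaluated, state)
```

In this abstraction the output simply follows the logic tree during evaluation and is held during precharge, which is the latch-like behavior the text describes for a pipelined stage.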
TABLE I. TRANSISTOR SIZING FOR ANT [10]
TABLE II. TRANSISTOR SIZING FOR DPANL
TABLE III. SIMULATION RESULTS OF THE THREE ADDERS
C. Minimal Race Problem in DPANL
Race exists in TSPC, ANL, ANT, and DPANL, namely, output
glitches caused by a race between the discharge of the evaluation node
in the logic block and the discharge of the output node by the latch
block. Let us take the ANT circuit in Fig. 2(a) as an example. Assume
the output was high during the precharge phase. If the nMOS logic tree
is evaluated in the evaluation phase, the output should stay high. But at
the beginning of the evaluation phase, node A and CLK are both high,
so the output is discharged through nMOS N4 and N2. After the
evaluation node A is discharged, nMOS N2 is turned off and pMOS
P2 is turned on, so the output is pulled up again, forming a large
glitch. The large output glitch consumes additional dynamic power.
In order to minimize the race problem, we need to speed up the
discharge of the evaluation node A and slow down the discharge of the
output. As we have discussed before, the capacitance at the evaluation
node of DPANL is much smaller than that of ANT, so discharge of the
evaluation node of DPANL is much faster. To slow down the discharge
of the output, we can size the transistors of the latch block so that,
subject to equal rise and fall times at the output, the discharge-path
transistors nMOS N4 and N2 are made as weak as possible.
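The two remedies above (a faster evaluation node and a weaker output discharge path) can be illustrated with a first-order RC estimate. This is a rough sketch and not the authors' analysis: it assumes both nodes discharge exponentially, that the output stops discharging once node A falls below a switching voltage, and all parameter values are illustrative placeholders. The function name `glitch_depth` is hypothetical.

```python
import math


def glitch_depth(vdd: float, tau_a: float, tau_out: float, v_switch: float) -> float:
    """Estimate how far the output droops before node A shuts off the race path.

    vdd      -- supply voltage (V)
    tau_a    -- assumed RC time constant of the evaluation node A discharge (s)
    tau_out  -- assumed RC time constant of the output discharge path (s)
    v_switch -- assumed node-A voltage at which the discharge path turns off (V)
    """
    # Time for node A to fall exponentially from vdd to v_switch.
    t_off = tau_a * math.log(vdd / v_switch)
    # Output droop accumulated during that window (exponential discharge).
    return vdd * (1.0 - math.exp(-t_off / tau_out))


if __name__ == "__main__":
    vdd, v_sw = 1.0, 0.5  # normalized, illustrative values only
    # Halving tau_a (smaller node capacitance) reduces the glitch.
    print(glitch_depth(vdd, 2.0, 4.0, v_sw), glitch_depth(vdd, 1.0, 4.0, v_sw))
    # Doubling tau_out (weaker discharge transistors) also reduces it.
    print(glitch_depth(vdd, 2.0, 4.0, v_sw), glitch_depth(vdd, 2.0, 8.0, v_sw))
```

Under these assumptions the droop depends only on the ratio of the two time constants, which matches the qualitative argument in the text: a smaller capacitance at node A and a weaker N4/N2 path both shrink the glitch.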
Fig. 5. DPANL adder layout.

TABLE IV. POSTLAYOUT SIMULATION RESULTS

highest clock frequency that DPANL adder can operate correctly is
1.85 GHz. It is lower than the 2.1-GHz clock frequency predicted in the
prelayout simulation, due to the inclusion of routing capacitances. The
layout area of the DPANL adder is 0.7 mm². The power consumption of
the adder under 1.85-GHz clock frequency is 1 W. The ANT adder
can run up to 1.25 GHz. It is slower than the DPANL adder due to the
larger capacitance at the evaluation node of the dynamic circuit. The
layout area of the ANT adder is 1.86 mm², which is about 2.7 times the
layout area of the DPANL adder. Although the DPANL carry generation
cell has more transistors than the ANT carry generation cell, the total
channel width of all transistors in the ANT cell is 1.6 times that in
the DPANL cell. Also, our manual layout yielded smaller area than
the place and routing of the ANT adder done by using EDA tools [10].

D. Discussion
Whereas scaling down of supply voltage is the most effective way
to reduce power consumption, the threshold voltages of transistors also
need to be scaled down to meet performance requirements. However,
the lowering of the transistor threshold voltage leads to the
exponential growth of the subthreshold leakage current. For
deep-submicron processes, the floating evaluation node and the output
node of DPANL logic may be discharged by leakage currents. A keeper
similar to that of a domino circuit could be applied to keep the noise
margin of the evaluation node, and back-to-back inverters could be used
to keep the noise margin of the output node.
We also simulated the DPANL adder using 0.13-µm CMOS SPICE
parameters with a Vdd of 1.2 V. The simulation results show that the
DPANL adder can operate up to 5.4 GHz, and the power consumption
is 170 mW.

V. CONCLUSION
In this paper, we have proposed and analyzed the DPANL dynamic
circuit, which is suitable for high-performance and low-power pipelined
systems. DPANL has smaller capacitance at each evaluation node and its
race problem is minimal. DPANL outperforms ANL and ANT in
performance, area, and power consumption. The functionality and
performance of a 32-bit CLA adder using the proposed circuit have been
verified through chip fabrication and testing.

ACKNOWLEDGMENT
The authors would like to thank Prof. A. Shakauri, University of
California, Santa Cruz, for providing much help with the chip testing.
[8] G. Yang, S. O. Jung, S. H. Kim, and S. M. Kang, “A low-power 2.1
GHz 32-bit carry lookahead adder using dual path all-N-logic,” in Proc.
45th IEEE Int. Midwest Symp. Circuits and Systems, vol. 2, 2002, pp.
298–301.
[9] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient
solution of a general class of recurrence equations,” IEEE Trans. Commun.,
vol. COM-22, no. 4, pp. 786–793, Aug. 1973.
[10] C.-C. Wang, Y.-L. Tseng, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A
1.25 GHz 32-bit tree-structured carry lookahead adder using modified
ANT logic,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol.
50, no. 9, pp. 1208–1216, Sep. 2003.

Function-Based Compact Test Pattern Generation for Path Delay Faults

Maria K. Michael and Spyros Tragoudas

Abstract—We present a function-based nonenumerative automatic test
pattern generation (ATPG) methodology for detecting path delay faults
(PDFs). The proposed technique consists of a number of topological
circuit traversals, during each of which a linear number of Boolean
functions is generated per circuit line. From each such function we
derive a test that detects many PDFs. The two major strengths of the
approach, which stem from the function-based formulations used, are very
compact test sets and scalability in test efficiency. The performance of
an implementation based on binary decision diagrams is evaluated and
compared with existing compact methods to demonstrate the superiority of
the proposed method.

Index Terms—Automatic test pattern generation (ATPG), binary
decision diagram (BDD), Boolean/algebraic test generation, delay faults,
nonenumerative, test compaction, test efficiency, testing.

Manuscript received September 11, 2003; revised June 12, 2004. This work
was supported in part by a grant from Intel Corporation.
M. K. Michael is with the Department of Electrical and Computer
Engineering, University of Cyprus, 1678 Nicosia, Cyprus (e-mail:
mmichael@ucy.ac.cy).
S. Tragoudas is with the Electrical and Computer Engineering
Department, Southern Illinois University, Carbondale, IL 62901 USA (e-mail:
spyros@engr.siu.edu).
Digital Object Identifier 10.1109/TVLSI.2005.853607

I. INTRODUCTION
Automatic test pattern generation (ATPG) for path delay faults
(PDFs) is an important problem that has been considered in [1], [2],
[4]–[6], and [8]–[13], among others. Under the PDF model, a fault is a
sequence of falling or rising transitions along a physical path, from a
primary input to a primary output in the circuit. A pair of patterns must
be applied to test each PDF. In this work, we consider combinational
and fully enhanced-scanned sequential circuits.
In traditional enumerative methods, such as [1] and [4], the ATPG
process is applied on a fault-by-fault basis. To overcome the problem
of examining all PDFs, which can be an exponential number, many
enumerative methods consider only the longest paths. However, such
restrictions remain enumerative since the examined paths in many
circuits remain prohibitively many. The work in [5] suggests not
examining paths but instead subpaths (segments). In the strict sense of
the definition, this approach cannot be classified as path-enumerative.
However, it does not guarantee a polynomial bound on the number of
examined subpaths since the number of examined subpaths is a linear
fraction of the total number of PDFs.
Nonenumerative ATPG approaches [6], [11] were proposed to
overcome the problem of path enumeration. Both approaches use
graph-theoretic arguments and build on top of PODEM-like fault
propagation methods along selected paths in the circuit. Unfortunately,
the fault coverage from both of these methodologies is very low. Their
test efficiency (number of detected faults per generated test) is also quite
low. More importantly, none of these methods addresses scalability. In
our context, we refer to scalability as the ability of the approach to
maintain the test efficiency as the number of targeted PDFs increases.
A major difference between the proposed method and the approaches
in [6] and [11] is that we use function-based techniques to generate
the tests. Function-based ATPG methods for PDFs have also been
proposed in [1], among some others, but all these approaches are fault
enumerative.
Apart from the nonenumerative techniques in [6] and [11], other
procedures that explicitly target the generation of compact test sets for
PDFs were proposed in [2], [12], and [13]. The test compaction
procedure of [2], as well as the most recent one included in [12], uses
the concept of primary and secondary target faults. Once a test is found
for a primary fault, it is expanded so that it also detects one or more
secondary faults. The level of compaction in both of these techniques
depends greatly on the selection order of the primary and secondary
faults. A slightly different concept, the one of finding maximal sets of
potentially compatible faults, is used in [13]. Even though they may
not target all faults explicitly, the above methods remain enumerative,
since they are based on the principle of first targeting a single fault and
then attempting to find one or more faults that can be tested mutually
with the original fault.
The proposed ATPG tool is called NEAT, for Non-Enumerative
ATPG. The approach consists of simple topological circuit traversals,
whose number is linear in the number of primary inputs. During
each traversal, a user-defined (constant) number of appropriately
formulated Boolean functions is maintained per circuit line. Each
such function, which we call a test function, is guaranteed to sensitize
many subpaths from a primary input up to the line. When a circuit
traversal is completed, tests that detect several PDFs originating from
some primary input are generated. The work presented here builds
upon the ATPG scenario introduced in [8], which did not guarantee
hazard-free robust test generation. The current work expands on [8]
by introducing a complete, systematic, and scalable framework that
can be used to generate all types of tests for PDFs (robust, nonrobust,
and functionally sensitizable). NEAT also attempts to maintain the test
efficiency as more tests are generated. A new dynamic compaction
technique, whose performance is boosted by the fact that we implicitly
maintain very large sets of tests in the form of test functions, assists
in this goal.
A circuit is represented as a directed graph, denoted by $G$. The
subcircuit of $G$ induced by primary input $I$ is denoted by $G_I$, and
it also contains all lines of $G$ that are not driven by $I$ but
immediately drive some node in $G_I$. We call such lines the supporting
points of $G_I$. The controlling (noncontrolling) value of a gate $g$ is
denoted by $cv(g)$ ($ncv(g)$) $\in \{0, 1\}$. A transition is designated
by $tr \in \{r, f\}$, where $r$ = rising and $f$ = falling. The positive
(negative) cofactor of a Boolean function $f$ with respect to variable
$x$ is denoted by $f_x$ ($f_{\bar{x}}$), where $f_x = f|_{x=1}$
($f_{\bar{x}} = f|_{x=0}$).
Let gate $g$ be on a PDF. An input of $g$ is either an on-input which
assumes a certain transition to be propagated or an off-input which
assumes a value to be justified. We use the PDF classification of [4],
which categorizes PDF tests into robust, nonrobust, functional
sensitizable, and functional unsensitizable. Table I shows the constraints of