
A 32-Bit Carry Lookahead Adder


992 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 8, AUGUST 2005

APPENDIX

PROOF OF (6)

Proof: If a polynomial $P(X) = \sum_{0 \le j \le 3} a_j X^j$ is in $K[X]$ [defined in (1)], then its fourth power is equal to

$$P^4(X) = \sum_{0 \le j \le 3} a_j^4 X^{4j} \tag{13}$$

because $K$ is a field of characteristic 2. Since $X^{4j} \bmod (X^4 + 1) = 1$, in $R$ we get $X^{4j} = 1$, and from (13) we obtain

$$P^4(X) = \sum_{0 \le j \le 3} a_j^4 = P^4(1). \tag{14}$$

From (2), it follows that

$$c(1) = (x + 1) + 1 + 1 + x = 1$$

and, therefore, from (14), we get

$$c^4(X) = 1 = c(X) \cdot c^3(X) = c(X) \cdot d(X)$$

which completes the proof of (6).

A 32-Bit Carry Lookahead Adder Using Dual-Path All-N Logic

Ge Yang, Seong-Ook Jung, Kwang-Hyun Baek, Soo Hwan Kim, Suki Kim, and Sung-Mo Kang

Abstract: We have developed dual-path all-N logic (DPANL) and applied it to 32-bit adder design for higher performance. The speed is significantly enhanced due to reduced capacitance at each evaluation node of dynamic circuits. The power saving is achieved due to reduced adder cell size and a minimal race problem. Post-layout simulation results show that this adder can operate at frequencies up to 1.85 GHz in 0.35-μm 1P4M CMOS technology and is 32.4% faster than the adder using all-N transistor (ANT) logic. It also consumes 29.2% less power than the ANT adder. A 0.35-μm CMOS chip has been fabricated and tested to verify the functionality and performance of the DPANL adder on silicon.

Index Terms: CMOS, dynamic-logic circuit, high performance, low-power design.

Manuscript received January 21, 2004; revised December 14, 2004. This work was supported in part by Semiconductor Research Corporation under Contract 2001-HJ-891, in part by Intel Corporation, and in part by BK21 program.
G. Yang is with the Nvidia Corporation, Santa Clara, CA 95050 USA (e-mail: gyang@nvidia.com).
S.-O. Jung is with Qualcomm Inc., San Diego, CA 92121 USA.
K.-H. Baek is with Rockwell Scientific, Thousand Oaks, CA 91360 USA.
S. H. Kim and S. Kim are with Korea University, Seoul 136-701, Korea.
S.-M. Kang is with the University of California, Santa Cruz, CA 95064 USA.
Digital Object Identifier 10.1109/TVLSI.2005.853605

1063-8210/$20.00 © 2005 IEEE

Authorized licensed use limited to: Oxford Engineering College. Downloaded on November 26, 2009 at 00:29 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Much work has been done recently on high-performance low-power adder design critical for microprocessors [1]–[3]. Dynamic circuits have been widely used owing to faster switching speed and less area than the conventional static CMOS circuits. Pipelined structure has also been used to further enhance the operating frequency to achieve higher throughput.

In pipelined systems of NORA [4], ZIPPER [5], and TSPC [6], low-speed pMOS logic blocks are used. For speed improvement, all-N logic (ANL) [7] was introduced to use only high-speed nMOS logic in all stages. All-N transistor (ANT) [1] was developed by using a feedback transistor pair to improve the performance of ANL. For further speed improvement with reduced power consumption, we propose dual-path all-N logic (DPANL).

This paper is organized as follows. Section II reviews previous work. Section III introduces DPANL. Simulation and chip testing results are shown in Section IV, followed by the conclusion in Section V.

II. PREVIOUS WORK

NORA uses two-phase clock signals instead of four-phase clock signals and avoids the race problem caused by clock skews with constrained logic composition [6]. True single-phase clock (TSPC) uses only a single-phase clock without inversion. It does not suffer from the clock skew problems and thus can operate at high clock frequency [7]. Both NORA and TSPC pipelined systems have the drawback of using low-speed pMOS logic blocks that limit the performance of pipelined systems.

Fig. 1 shows a circuit diagram of the CMOS dynamic circuit ANL. It removes the drawback of TSPC logic by using an nMOS logic tree in N2-block. To overcome the voltage drop problem in the nMOS logic tree, a positive feedback pMOS P3 in N2-block is used to pull up the
evaluation node. pMOS P3 in N1-block and nMOS N3 in N2-block are used to solve the charge sharing problem between the point OUT and the point B. When the clock slew rate is high enough, pMOS P3 in N1-block and nMOS N3 in N2-block can be omitted [7].

Fig. 1. ANL logic.

A schematic diagram of ANT logic is shown in Fig. 2. It improves the performance using the feedback transistor pair, pMOS P3 and nMOS N3. In evaluation phase, if the nMOS logic tree is evaluated, after the voltage of the evaluation node A drops to below (Vdd − Vth), pMOS P3 turns on. Then it pulls up point B and turns on nMOS N3. nMOS N3 in turn pulls down evaluation node A and accelerates the evaluation. However, the speedup using the feedback transistor pair is not significant when the number of serial nMOS transistors in the logic tree is small.

Fig. 2. ANT logic.

III. CIRCUIT DIAGRAM AND OPERATING PRINCIPLES

A. Basic Idea

The performance of N1-block is affected by the rise time of the output point since two processes are involved. First the evaluation node A is pulled down through the current path in the nMOS logic tree. Then pMOS P2 turns on and the output point gets pulled up. The capacitance at the evaluation node A significantly affects the performance. For example, in Fig. 2(a), the gate capacitances of pMOS P2, P3, nMOS N2, and the drain capacitances of nMOS N3, pMOS P1, and the nMOS transistors at the top of the nMOS logic tree are connected to the evaluation node A. To further enhance the performance, we need to reduce the capacitance at the evaluation node.

We have developed DPANL to achieve this goal [8]. N1-block of DPANL is shown in Fig. 3(a). The nMOS logic trees in Path 1 and Path 2 are identical except that Path 1 is made faster than Path 2, since Path 1 influences the rise time of the output. The sizes of the transistors in Path 1 and Path 2 should guarantee that the short circuit current through pMOS P3, nMOS N4 and N3 does not affect the performance. The capacitance at the evaluation node A consists of the gate capacitance of pMOS P3, the drain capacitance of pMOS P1 and the nMOS transistors at the top of the nMOS logic tree. It is much smaller than the corresponding capacitance in ANT, and thus helps achieve higher performance.

Fig. 3. Circuit diagram of DPANL.

Power consumption is also less in DPANL. The total width of the two nMOS logic trees in DPANL can be made the same as or even less than that of ANT. In ANT, in order to charge and discharge the large capacitance at the evaluation node A, the sizes of pMOS P1 and nMOS N1 must be large. Also, in order to discharge the capacitance introduced by the feedback transistor pair at point B, nMOS N4 and N2 need to be large. In DPANL, since evaluation nodes A and B have small capacitances, pMOS P1 and P2, nMOS N1 and N2 can be small; nMOS N4 and N3 can also be small since there is no feedback transistor pair attached to point C. So the total channel width of transistors in DPANL can be smaller than that in the ANT.

The same principle applies to N2-block. The circuit diagram of N2-block is shown in Fig. 3(b).

B. Operating Principles of the DPANL

When the clock is low, N1-block of DPANL begins its precharge phase. The clocked pMOS P1 and P2 are turned on, and the evaluation nodes A and B are precharged to high. The clocked foot transistors nMOS N1 and N2 are turned off, allowing no current through Path 1 and Path 2. Since the evaluation node A is precharged to high, pMOS P3 is turned off. nMOS N4 is turned off by the clock. So the output point keeps its previous state in the capacitance at that point.

When the clock is high, N1-block begins its evaluation phase. If the nMOS logic tree is not evaluated, the evaluation nodes A and B stay high. pMOS P3 is off, nMOS N4 and N3 are on, the output is pulled down. If the nMOS logic tree is evaluated, the evaluation nodes A and B are pulled down through Path 1 and Path 2, respectively. nMOS N3 is turned off. pMOS P3 is turned on and the output is pulled up.

The operating principles of N2-block are similar to those of N1-block. One thing to note is that when the nMOS logic tree is evaluated, the evaluation nodes cannot reach full Vdd because of the threshold voltage drop in nMOS transistors. The presence of the feedback transistors pMOS P5 and P6 is to pull up the evaluation nodes to full Vdd.
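The precharge and evaluation behavior of the DPANL N1-block can be summarized as a small truth-table model. The sketch below is illustrative only: it abstracts every node to logic 0 or 1 and ignores timing, charge sharing, and output glitches. The names `N1State`, `n1_block`, and `carry_tree_conducts` are hypothetical, and the model assumes that nMOS N3 is gated by node B, which the text implies but does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class N1State:
    node_a: int  # evaluation node of Path 1 (gates pMOS P3)
    node_b: int  # evaluation node of Path 2 (assumed to gate nMOS N3)
    out: int     # output of the N1-block


def n1_block(clk: int, tree_conducts: bool, prev_out: int) -> N1State:
    """Logic-level model of one DPANL N1-block for a single clock phase."""
    if clk == 0:
        # Precharge: P1/P2 on, foot transistors N1/N2 off, so A and B go high.
        # P3 is off (A high) and N4 is off (clock low): the output holds.
        return N1State(node_a=1, node_b=1, out=prev_out)
    if tree_conducts:
        # Evaluation with conducting trees: A and B discharge through
        # Path 1 and Path 2; N3 turns off, P3 turns on, output pulled up.
        return N1State(node_a=0, node_b=0, out=1)
    # Evaluation with non-conducting trees: A and B stay high;
    # P3 off, N4 and N3 on, output pulled down.
    return N1State(node_a=1, node_b=1, out=0)


def carry_tree_conducts(g: int, p: int, g_prev: int) -> bool:
    """nMOS tree of a carry generation cell computing g_i + p_i * g_(i-1)."""
    return bool(g or (p and g_prev))


if __name__ == "__main__":
    out = 0
    # Alternate precharge (clk = 0) and evaluation (clk = 1) phases.
    for clk, (g, p, g_prev) in [(0, (0, 1, 1)), (1, (0, 1, 1)),
                                (0, (0, 1, 0)), (1, (0, 1, 0))]:
        state = n1_block(clk, carry_tree_conducts(g, p, g_prev), out)
        out = state.out
        print(f"clk={clk} g={g} p={p} g_prev={g_prev} -> {state}")
```

The carry expression used here, g_i + p_i g_(i-1), is the one the paper gives for its carry generation cell. A real cell would need a transistor-level simulation to capture the speed and power effects that motivate the dual-path structure.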

TABLE I
TRANSISTOR SIZING FOR ANT [10]

TABLE II
TRANSISTOR SIZING FOR DPANL

TABLE III
SIMULATION RESULTS OF THE THREE ADDERS

C. Minimal Race Problem in DPANL
A race exists in TSPC, ANL, ANT, and DPANL: output glitches are caused by a race between the discharge of the evaluation node in the logic block and the discharge of the output node by the latch block. Let us take the ANT circuit in Fig. 2(a) as an example. Assume the output was high during the precharge phase. If the nMOS logic tree is evaluated in the evaluation phase, the output will still be high. But at the beginning of the evaluation phase, node A and CLK are both high, so the output is discharged through nMOS N4 and N2. After the evaluation node A is discharged, nMOS N2 is turned off and pMOS P2 is turned on, and the output is pulled up again, thus forming a large glitch. The large output glitch consumes additional dynamic power.
In order to minimize the race problem, we need to speed up the discharge of the evaluation node A and slow down the discharge of the output. As discussed before, the capacitance at the evaluation node of DPANL is much smaller than that of ANT, so the evaluation node of DPANL discharges much faster. To slow down the discharge of the output, we size the transistors of the latch block so that, subject to equal rise and fall times at the output, the discharge path through nMOS N4 and N2 is as weak as possible.
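
To make the two remedies concrete, the following sketch uses a first-order RC model of the race. It is an assumption-laden illustration, not the paper's simulation: both nodes are treated as single-pole exponentials, the latch is assumed to stop discharging the output once node A has fallen to half of Vdd, and the time constants are hypothetical values chosen only to show the trend.

```python
import math


def race_window(tau_a: float, switch_fraction: float = 0.5) -> float:
    """Time for node A to fall from Vdd to switch_fraction * Vdd (RC decay)."""
    return tau_a * math.log(1.0 / switch_fraction)


def output_after_race(vdd: float, tau_a: float, tau_out: float,
                      switch_fraction: float = 0.5) -> float:
    """Output voltage left when the latch pull-down switches off.

    The output starts at vdd and decays with time constant tau_out
    for as long as node A is still above the switching point.
    """
    return vdd * math.exp(-race_window(tau_a, switch_fraction) / tau_out)


VDD = 3.3  # supply voltage of the 0.35-um process used in the paper

# Hypothetical time constants (seconds), for illustration only.
slow_node = output_after_race(VDD, tau_a=100e-12, tau_out=200e-12)
fast_node = output_after_race(VDD, tau_a=50e-12, tau_out=200e-12)
weak_latch = output_after_race(VDD, tau_a=50e-12, tau_out=400e-12)

if __name__ == "__main__":
    print(f"large capacitance at node A: output dips to {slow_node:.2f} V")
    print(f"small capacitance at node A: output dips to {fast_node:.2f} V")
    print(f"small node A, weak pull-down: output dips to {weak_latch:.2f} V")
```

In this simplified model the dip shrinks both when node A discharges faster (smaller `tau_a`) and when the latch discharge path is weaker (larger `tau_out`), which mirrors the two measures described above.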

IV. SIMULATION AND CHIP TESTING RESULTS

The Kogge–Stone graph [9] is generally used for tree structure carry lookahead adders. It has a regular structure and the maximum fanout at each cell for each pipeline stage is 2, which leads to high performance. But it requires many long interconnects, causing much area and thus much power consumption. S. Knowles introduced a new family of adder structures that offer some tradeoff between performance and power [3]. One of the structures is very suitable for pipeline systems because the maximum fanout at each cell for each pipeline stage is 3. It requires less wiring than the Kogge–Stone graph. Thus, we adopted this adder structure for low-power adder design.

A. Prelayout Simulation

For simulation, 0.35-μm 1P4M CMOS technology with 3.3-V power supply is used. The delay, area, power consumption, and the leakage current of DPANL are normalized to 1 for comparison. Proper transistor sizing has been done for DPANL, ANL, and ANT. Table I shows the transistor sizing for ANT N1-block [Fig. 2(a)]; this sizing information is from the published ANT work [10], which also used a 0.35-μm CMOS process. We used similar transistor sizing for ANL. Table II shows the transistor sizing for DPANL N1-block [Fig. 3(a)]. Based on Tables I and II, the carry generation cell ($g_i + p_i g_{i-1}$) using ANT has 84 μm total channel width of all transistors, while the carry generation cell using DPANL only has 52 μm total channel width of all transistors. The DPANL carry generation cell has 38% less area compared with the ANT cell, even though it has three more transistors.

We have built three adders using DPANL, ANL, and ANT, respectively. The total channel width of all transistors in the adder is taken as the area of the adder. The power consumptions of the adders are measured at 1.25 GHz. In order to make the carry propagation chain have the critical delay path, the input signals (A31 A30 ... A1 A0) + (B31 B30 ... B1 B0) are chosen as follows: (00 ... 00) + (11 ... 11) and (11 ... 11) + (00 ... 01) [2]. Power consumption is measured for this input.

As shown in Table III, ANT is slower than ANL although it has a feedback transistor pair. This is because the number of serial nMOS transistors in the adder cell circuit is only two and thus the evaluation node can be discharged quickly. The feedback transistor pair not only is ineffective, but also increases the capacitance at the evaluation node. ANT can be faster than ANL when the number of serial nMOS transistors is larger. ANT also consumes more area and power than ANL.

Table III shows that the DPANL adder can operate at frequencies up to 2.1 GHz. It is 31.3% and 27.3% faster than the ANT adder and the ANL adder, respectively. The DPANL adder consumes 32.8% less area and 29.2% less power than the ANT adder, and 17.8% less area and 15.4% less power than the ANL adder. The leakage current of the DPANL adder is also smaller than that of the ANT adder and the ANL adder, even though the DPANL circuit has two evaluation paths. This is because the total channel width of all transistors in the DPANL carry generation cell is smaller than that of the ANT cell and the ANL cell.

Fig. 4. (a) Adder floorplan. (b) H-tree clock distribution.

B. Postlayout Simulation

Fig. 4(a) shows the floor plan for the adder. The inputs are fed to the left side of the adder. There are five stages of $p_i$, $g_i$ generation after the p, g generation stage. The sum stage generates the outputs on the right side of the adder. In the layout, the clock signal is fed to the top of the adder and the buffered clock signal is fed to the center of the adder. Then the clock signal propagates to all the cells in the adder through the H-tree. We have used a four-level H-tree clock distribution in the adder, while a two-level H-tree is shown in Fig. 4(b). Fig. 5 shows the adder layout.

Table IV shows the post-layout simulation results for the 32-bit DPANL adder and the 32-bit ANT adder. The process for both adders is the TSMC 0.35-μm 1P4M CMOS process, and Vdd is 3.3 V. The
highest clock frequency at which the DPANL adder can operate correctly is 1.85 GHz. It is lower than the 2.1-GHz clock frequency predicted in the prelayout simulation, due to the inclusion of routing capacitances. The layout area of the DPANL adder is 0.7 mm². The power consumption of the adder under 1.85-GHz clock frequency is 1 W. The ANT adder can run up to 1.25 GHz. It is slower than the DPANL adder due to larger capacitance at the evaluation node of the dynamic circuit. The layout area of the ANT adder is 1.86 mm², which is about 2.7 times the layout area of the DPANL adder. Although the DPANL carry generation cell has more transistors than the ANT carry generation cell, the total channel width of all transistors in the ANT cell is 1.6 times that in the DPANL cell. Also, our manual layout yielded smaller area than the place and routing of the ANT adder done by using EDA tools [10].

Fig. 5. DPANL adder layout.

TABLE IV
POSTLAYOUT SIMULATION RESULTS

C. Chip Testing

The functionality of the adder chip was verified using HP80000 data generator and oscilloscope. It was not possible to feed a 1.85-GHz clock signal into the chip due to large capacitance at the chip package pins. In order to verify the DPANL adder performance, we applied 1.6-V Vdd, while the normal Vdd for the TSMC 0.35-μm process is 3.3 V. The post-layout simulation showed that the DPANL adder should work up to 200-MHz clock frequency with 1.6-V Vdd. The critical delay inputs were chosen as follows: (00 ... 00) + (11 ... 11) and (11 ... 11) + (00 ... 01) [2]. Chip measurements confirmed the correct adder operation under 200-MHz clock frequency. Fig. 6 shows the measured most significant bit (MSB) of adder output. The MSB output is 1 for one clock cycle and is 0 for the next clock cycle, according to the critical delay inputs. The MSB output frequency is 100 MHz, which is correctly half of the clock frequency. The measured chip power consumption under 200-MHz clock frequency and 1.6-V Vdd was 80 mW.

Fig. 6. Measured MSB output.

D. Discussion

Whereas scaling down of supply voltage is the most effective way to reduce power consumption, the threshold voltages of transistors also need to be scaled down to meet performance requirements. However, the lowering of the transistor threshold voltage leads to the exponential growth of the subthreshold leakage current. For deep-submicron processes, the floating evaluation node and the output node of DPANL logic may be discharged by leakage currents. A keeper similar to that of a domino circuit could be applied to keep the noise margin of the evaluation node, and back-to-back inverters could be used to keep the noise margin of the output node.

We also simulated the DPANL adder using 0.13-μm CMOS SPICE parameters with a Vdd of 1.2 V. The simulation results show that the DPANL adder can operate up to 5.4 GHz, and the power consumption is 170 mW.

V. CONCLUSION

In this paper, we have proposed and analyzed the DPANL dynamic circuit, which is suitable for high-performance and low-power pipelined systems. DPANL has smaller capacitance at each evaluation node and its race problem is minimal. DPANL outperforms ANL and ANT in performance, area, and power consumption. The functionality and performance of a 32-bit CLA adder using the proposed circuit have been verified through chip fabrication and testing.

ACKNOWLEDGMENT

The authors would like to thank Prof. A. Shakauri, University of California, Santa Cruz, for providing much help with the chip testing.

REFERENCES

[1] C.-C. Wang, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A 1.25 GHz 32-bit tree-structured carry lookahead adder,” in Proc. 2001 IEEE Int. Symp. Circuits and Systems, vol. 4, 2001, pp. 80–83.
[2] K.-H. Cheng, W.-S. Lee, and Y.-C. Huang, “A 1.2 V 500 MHz 32-bit carry-lookahead adder,” in Proc. 8th IEEE Int. Conf. Electronics, Circuits and Systems, vol. 2, 2001, pp. 765–768.
[3] S. Knowles, “A family of adders,” in Proc. 15th IEEE Symp. Computer Arithmetic, 2001, pp. 277–281.
[4] N. F. Goncalves and H. J. De Man, “NORA: A race-free dynamic CMOS technology for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. SSC-18, no. 6, pp. 261–266, Jun. 1983.
[5] C. M. Lee and E. W. Szeto, “Zipper CMOS,” IEEE Circuits Devices Mag., vol. 2, no. 3, pp. 10–16, May 1986.
[6] J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, no. 2, pp. 62–70, Feb. 1989.
[7] R. X. Gu and M. I. Elmasry, “All-N-logic high-speed true-single-phase dynamic CMOS logic,” IEEE J. Solid-State Circuits, vol. 31, no. 2, pp. 221–229, Feb. 1996.
[8] G. Yang, S. O. Jung, S. H. Kim, and S. M. Kang, “A low-power 2.1 GHz 32-bit carry lookahead adder using dual path all-N-logic,” in Proc. 45th IEEE Int. Midwest Symp. Circuits and Systems, vol. 2, 2002, pp. 298–301.
[9] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” IEEE Trans. Commun., vol. COM-22, no. 4, pp. 786–793, Aug. 1973.
[10] C.-C. Wang, Y.-L. Tseng, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A 1.25 GHz 32-bit tree-structured carry lookahead adder using modified ANT logic,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 9, pp. 1208–1216, Sep. 2003.

Function-Based Compact Test Pattern Generation for Path Delay Faults

Maria K. Michael and Spyros Tragoudas

Abstract: We present a function-based nonenumerative automatic test pattern generation (ATPG) methodology for detecting path delay faults (PDFs). The proposed technique consists of a number of topological circuit traversals, during each of which a linear number of Boolean functions is generated per circuit line. From each such function we derive a test that detects many PDFs. The two major strengths of the approach, which stem from the function-based formulations used, are very compact test sets and scalability in test efficiency. The performance of an implementation based on binary decision diagrams is evaluated and compared with existing compact methods to demonstrate the superiority of the proposed method.

Index Terms: Automatic test pattern generation (ATPG), binary decision diagram (BDD), Boolean/algebraic test generation, delay faults, nonenumerative, test compaction, test efficiency, testing.

Manuscript received September 11, 2003; revised June 12, 2004. This work was supported in part by a grant from Intel Corporation.
M. K. Michael is with the Department of Electrical and Computer Engineering, University of Cyprus, 1678 Nicosia, Cyprus (e-mail: mmichael@ucy.ac.cy).
S. Tragoudas is with the Electrical and Computer Engineering Department, Southern Illinois University, Carbondale, IL 62901 USA (e-mail: spyros@engr.siu.edu).
Digital Object Identifier 10.1109/TVLSI.2005.853607

I. INTRODUCTION

Automatic test pattern generation (ATPG) for path delay faults (PDFs) is an important problem that has been considered in [1], [2], [4]–[6], and [8]–[13], among others. Under the PDF model, a fault is a sequence of falling or rising transitions along a physical path, from a primary input to a primary output in the circuit. A pair of patterns must be applied to test each PDF. In this work, we consider combinational and fully enhanced-scanned sequential circuits.

In traditional enumerative methods, such as [1] and [4], the ATPG process is applied on a fault-by-fault basis. To overcome the problem of examining all PDFs, which can be an exponential number, many enumerative methods consider only the longest paths. However, such restrictions remain enumerative since the examined paths in many circuits remain prohibitively many. The work in [5] suggests not examining paths but instead subpaths (segments). In the strict sense of the definition, this approach cannot be classified as path-enumerative. However, it does not guarantee a polynomial bound on the number of examined subpaths since the number of examined subpaths is a linear fraction of the total number of PDFs.

Nonenumerative ATPG approaches [6], [11] were proposed to overcome the problem of path enumeration. Both approaches use graph theoretic arguments and build on top of PODEM-like fault propagation methods along selected paths in the circuit. Unfortunately, the fault coverage from both of these methodologies is very low. Their test efficiency (number of detected faults per generated test) is also quite low. More importantly, none of these methods addresses scalability. In our context, we refer to scalability as the ability of the approach to maintain the test efficiency as the number of targeted PDFs increases. A major difference between the proposed method and the approaches in [6] and [11] is that we use function-based techniques to generate the tests. Function-based ATPG methods for PDFs have also been proposed in [1], among some others, but all these approaches are fault enumerative.

Apart from the nonenumerative techniques in [6] and [11], other procedures that explicitly target the generation of compact test sets for PDFs were proposed in [2], [12], and [13]. The test compaction procedure of [2], as well as the most recent one included in [12], uses the concept of primary and secondary target faults. Once a test is found for a primary fault, it is expanded so that it also detects one or more secondary faults. The level of compaction in both of these techniques depends greatly on the selection order of the primary and secondary faults. A slightly different concept, that of finding maximal sets of potentially compatible faults, is used in [13]. Even though they may not target all faults explicitly, the above methods remain enumerative, since they are based on the principle of first targeting a single fault and then attempting to find one or more faults that can be tested mutually with the original fault.

The proposed ATPG tool is called NEAT, for Non-Enumerative ATPG. The approach consists of simple topological circuit traversals, whose number is linear to the number of primary inputs. During each traversal, a user-defined (constant) number of appropriately formulated Boolean functions is maintained per circuit line. Each such function, which we call a test function, is guaranteed to sensitize many subpaths from a primary input up to the line. When a circuit traversal is completed, tests that detect several PDFs originating from some primary input are generated. The work presented here builds upon the ATPG scenario introduced in [8], which did not guarantee hazard-free robust test generation. The current work expands on [8] by introducing a complete, systematic, and scalable framework that can be used to generate all types of tests for PDFs (robust, nonrobust, and functionally sensitizable). NEAT also attempts to maintain the test efficiency as more tests are generated. A new dynamic compaction technique, whose performance is boosted by the fact that we implicitly maintain very large sets of tests in the form of test functions, assists in this goal.

A circuit is represented as a directed graph, denoted by $G$. The subcircuit of $G$ induced by primary input $I$ is denoted by $G_I$, and it also contains all lines of $G$ that are not driven by $I$ but immediately drive some node in $G_I$. We call such lines the supporting points of $G_I$. The controlling (noncontrolling) value of a gate $g$ is denoted by $cv(g)$ ($ncv(g)$) $\in \{0, 1\}$. A transition is designated by $tr \in \{r, f\}$, where $r$ = rising and $f$ = falling. The positive (negative) cofactor of a Boolean function $f$ with respect to variable $x$ is denoted by $f_x$ ($f_{\bar{x}}$), where $f_x = f|_{x=1}$ ($f_{\bar{x}} = f|_{x=0}$).

Let gate $g$ be on a PDF. An input of $g$ is either an on-input, which assumes a certain transition to be propagated, or an off-input, which assumes a value to be justified. We use the PDF classification of [4], which categorizes PDF tests into robust, nonrobust, functional sensitizable, and functional unsensitizable. Table I shows the constraints of
