Shomo Chen, Ning Huang, Ting-Pu Tai*, Actel Niu* Trident Microsystems Inc. 3408 Garrett Drive Santa Clara, CA 95054-2803
Abstract
Embedded memory quality is critical to overall chip quality. New defect mechanisms that occur at advanced process nodes (65nm and below) are often more pronounced in memories due to their high density and performance requirements. Traditional memory test algorithms are not sufficient to guarantee a low escape rate for these new memory defects. This paper describes six advanced test algorithms that address these shortcomings in order to maintain high memory test quality at smaller geometries. It focuses in particular on three customized algorithms and describes an innovative way to create them.
1. Introduction
Chip quality is becoming more difficult to maintain as the process geometry shrinks. Not only is design complexity higher, but new defect types cause DPM (defects per million) to increase. At the same time, the amount of embedded memory in many applications continues to grow, making memory testing a key factor in maintaining low cost and high quality in IC manufacturing. While commercial EDA tools are keeping pace with the need for greater test flow automation, new test algorithms are also needed to minimize the rate of field returns at advanced technology nodes. This paper describes how ASIC vendors can develop customized memory test algorithms to enhance their overall IC testing strategy. The paper is organized as follows. Section two reviews new failure modes and fault models, and the limitations of traditional test algorithms. Section three describes an automation flow optimized for testing advanced ICs. Section four describes new algorithms targeting specific new defect mechanisms. Section five demonstrates a methodology for creating customized test algorithms and integrating them into the test automation flow. Section six discusses the impact of adding advanced test algorithms on test time and die area. Section seven summarizes our conclusions.
A fault model is an abstraction or simulation of a defect that exhibits the same behavior as the target defect itself. There are numerous fault models to emulate memory defects [1]. In general, memory fault models can be classified into four groups.
(1) Single cell: stuck-at faults (SAF), stuck-open faults (SOF), transition faults (TF), data retention faults (DRF), read disturb faults (RDF)
(2) Dual cells: inversion coupling faults (CFin), idempotent coupling faults (CFid), state coupling faults (CFst), bridge coupling faults (BF)
(4) Read/write logic: address faults (AF), address decoder open faults (ADOF)
Many algorithms targeting these fault models have been developed, and the most common ones are listed in Table 1. They vary in complexity, which determines the cost of testing, and in the list of covered faults, which determines the resulting test quality. ASIC vendors must balance cost and quality when they choose the set of test algorithms for their production test flow.
Table 1
Algorithm               Target Faults
March1 (MarchC-)        AF, SAF, TF, CFin, CFid, CFst
March2 (MarchC+)        AF, SAF, TF, SOF, CFin, CFid
March3                  AF, SAF, SOF, TF
Col_march1 (MarchC-)    AF, SAF, TF, CFin, CFid, CFst
Unique                  SAF
Checkerboard            BF
RetentionCB             BF, DRF
A review of the literature [3] indicates that March type algorithms are among the best, providing coverage for more than 95% of defects [4]. In addition, pattern size and test duration grow linearly with memory size, and these algorithms have proven effective on real silicon for years [2]. Trident Microsystems typically deploys the March 2 algorithm together with a checkerboard background for technology nodes above 90nm. However, the traditional algorithms are insufficient for finding defects resulting from process variability at nodes below 90nm. More advanced algorithms targeting specific new physical defects must be added to achieve comprehensive, high-confidence manufacturing test.
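To make the march notation concrete, the following minimal Python sketch (an illustration, not a production BIST implementation) models a bit-oriented memory with one injected stuck-at fault and applies March C- (March1 in Table 1). The FaultyMemory class and the run_march helper are hypothetical names introduced only for this example.

```python
from typing import List, Optional, Tuple

# A march element: address order ("up" or "down") and its operations,
# each written as ("w", value) for a write or ("r", expected) for a read.
MarchElement = Tuple[str, List[Tuple[str, int]]]

# March C- (March1 in Table 1); elements allowed in either order are run "up" here.
MARCH_C_MINUS: List[MarchElement] = [
    ("up", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up", [("r", 0)]),
]


class FaultyMemory:
    """Hypothetical bit-oriented memory model with an optional stuck-at fault."""

    def __init__(self, size: int, stuck_at: Optional[Tuple[int, int]] = None):
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, stuck value), or None if fault free

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        # A stuck-at cell always returns its stuck value, whatever was written.
        if self.stuck_at is not None and addr == self.stuck_at[0]:
            return self.stuck_at[1]
        return self.cells[addr]


def run_march(mem: FaultyMemory, algorithm: List[MarchElement]) -> List[int]:
    """Apply every march element in order; return the addresses whose reads mismatched."""
    failing = set()
    n = len(mem.cells)
    for order, ops in algorithm:
        addresses = range(n) if order == "up" else range(n - 1, -1, -1)
        for addr in addresses:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    failing.add(addr)
    return sorted(failing)


print(run_march(FaultyMemory(16), MARCH_C_MINUS))                   # []
print(run_march(FaultyMemory(16, stuck_at=(5, 0)), MARCH_C_MINUS))  # [5]
```

Because each cell is written with both 0 and 1 and read back in both states, any stuck-at cell produces a mismatch in at least one read element, which is why SAF appears in the target list of every March variant in Table 1.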
Checkerboard (CB)
This is a simple March-type algorithm with a checkerboard data pattern. Based on the memory topology, the checkerboard algorithm divides the cells into two groups such that neighboring cells always belong to different groups. The goal of the checkerboard operations is to impose 010101 patterns on the memory cells so that each cell's physical neighbors are in the opposite state. When the Nth bits are physically close to each other, the 010101 patterns must be inverted to 101010 at every address so that the Nth bits toggle. This algorithm detects stuck-at faults and bridge coupling faults, assuming the address decoder is fault free. The test operations are described in Figure 2.
[Figure 2: Checkerboard test operations (write and read phases)]
March-LR is a modified version of the traditional March 2 algorithm that requires both fast-column and fast-row addressing. In addition to the faults detected by March 2, the March-LR algorithm can also detect read disturb, worst-case bit line coupling, and worst-case cell leakage to bit lines. Compared with traditional March algorithms, March-LR can detect all simple faults (a fault that does not influence the behavior of other faults) as well as linked faults (a fault that influences the behavior of other faults) [7, 8]. The algorithm steps are:
(1) up: write 0
(2) down: read 0, write 1
(3) up: read 1, write 0, read 0, read 0, write 1
(4) up: read 1, write 0
(5) up: read 0, write 1, read 1, read 1, write 0
(6) up: read 0
One of the subtle defects that occur as geometry shrinks is an open in the address decoder, or ADOF. These defects can occur when a NAND tree is used to implement decoding logic, and they can cause a combinational circuit to act like a sequential circuit. A PMOS transistor defect causing a stuck-open is a common failure mode, and SRAM has many NAND gates susceptible to PMOS stuck-open faults. Traditional March algorithms cannot always detect such a defect unless a special address sequence is employed [9]. One approach involves writing data to a selected pair of memory addresses: write a value to a selected base address, then check whether that value changes after the inverse value is written to a neighboring address. To detect open faults in an address decoder, the algorithm writes to neighboring addresses at a Hamming distance of one, and checks whether this operation also results in a write to the base cell. The algorithm steps are:
(1) up: write 0
(2) up: write 1, shift_write 0, read 1, write 0
In this algorithm, shift_write means a write to a shifted address at a Hamming distance of one. For example, if the base address is 000, the three shifted addresses are 001, 010, and 100. The testing operations are described in Table 2.
[Table 2: Address decoder test operations for base address 000; rows: First Line (001), Second Line (010), Third Line (100)]
[Figure: read/write operations mapped over X-Address and Y-Address]
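The shift_write addressing can be sketched in Python as follows. This is a minimal model under stated assumptions, not the paper's BIST logic: the decoder defect is modeled as a one-way address alias (a write to one address also lands on another), and each shift_write 0 is assumed to be followed by a read of the base cell, as the per-line R1 entries of Table 2 suggest. The names shifted_addresses, AliasMemory and address_decoder_test are hypothetical.

```python
from typing import Dict, List, Optional


def shifted_addresses(base: int, addr_bits: int) -> List[int]:
    """Addresses at Hamming distance one from base: flip each address bit in turn."""
    return [base ^ (1 << bit) for bit in range(addr_bits)]


class AliasMemory:
    """Hypothetical memory whose faulty decoder also writes alias[a] when a is written."""

    def __init__(self, addr_bits: int, alias: Optional[Dict[int, int]] = None):
        self.cells = [0] * (1 << addr_bits)
        self.alias = alias or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value
        if addr in self.alias:  # the defective decoder also selects this word
            self.cells[self.alias[addr]] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def address_decoder_test(mem: AliasMemory, addr_bits: int) -> List[int]:
    """up(w0); up(w1, shift_write 0, r1, w0). Returns the base addresses that failed."""
    n = 1 << addr_bits
    for addr in range(n):
        mem.write(addr, 0)
    failing = []
    for base in range(n):
        mem.write(base, 1)
        for neighbor in shifted_addresses(base, addr_bits):
            mem.write(neighbor, 0)   # write the inverse value to the neighbor
            if mem.read(base) != 1:  # the base cell must be unaffected
                failing.append(base)
                break
        mem.write(base, 0)
    return failing


print(shifted_addresses(0b000, 3))                           # [1, 2, 4], i.e. 001, 010, 100
print(address_decoder_test(AliasMemory(4), 4))               # []
print(address_decoder_test(AliasMemory(4, {0b0100: 0}), 4))  # [0]
```

In the faulty case, writing 0 to address 0100 also overwrites base address 0000, so the read-back of the base cell fails; a plain ascending march that never revisits the base cell after touching its neighbor would not necessarily catch this.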
The worst-case scenario for an address decoder is when all address bits switch at the same time, for example from address 00000000 to 11111111, or from 01010101 to 10101010. These transitions require state changes in all the NAND gates in the decoder; consequently, the settling delay is longer than for normal address marching (00000000, 00000001, 00000010, and so on). The decode timing is critical because the array is precharged and the word line driver must not be enabled prematurely. Until the address becomes stable, any small delay fault along the NAND tree in the address decoder can disturb the precharged state of the array. The Address Complement (AC) algorithm targets these address faults, i.e., faults associated with the worst-case settling delay of the address decoder, and has these steps:
Direction   Operation   Address
up          write 0     address A
up          write 0     complement(address A)
up          read 0      address A
up          read 0      complement(address A)
up          write 1     address A
up          write 1     complement(address A)
up          read 1      address A
up          read 1      complement(address A)
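A quick way to see why this address order stresses the decoder is to count how many address lines change between consecutive accesses. The Python sketch below assumes that a march element visits address A and then complement(A) for A in ascending order; ac_access_order and bits_switched are hypothetical helper names.

```python
from typing import List


def ac_access_order(addr_bits: int) -> List[int]:
    """Access order for one AC march element: for A = 0, 1, 2, ... visit A, then ~A."""
    mask = (1 << addr_bits) - 1
    order: List[int] = []
    for a in range(1 << addr_bits):
        order += [a, a ^ mask]  # a ^ mask is the bitwise complement within addr_bits
    return order


def bits_switched(prev: int, nxt: int) -> int:
    """Number of address lines that change between two consecutive accesses."""
    return bin(prev ^ nxt).count("1")


order = ac_access_order(8)
print([format(a, "08b") for a in order[:4]])
# ['00000000', '11111111', '00000001', '11111110']

# Every A -> complement(A) access switches all eight address bits:
print(all(bits_switched(order[i], order[i + 1]) == 8 for i in range(0, len(order), 2)))  # True

# Ordinary up-marching switches far fewer bits, e.g. 00000001 -> 00000010:
print(bits_switched(0b00000001, 0b00000010))  # 2
```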
To optimize the memory layout and save decoder device and wiring area, designers take advantage of array symmetry in address decoding. For row decoding, write and read bit lines are shared by all cells of the same column, and a similar construction is used for column decoding. Consequently, it is not surprising that a read/write operation on a cell could be affected by an adjacent cell in the same row or column [12]. The Column Disturb (CD) algorithm detects coupling effects between adjacent columns that can cause an error when different data values are written to adjacent columns. This type of fault is very common in DRAM, but it is also becoming more common in SRAM at smaller geometries. The CD write/read sequence is shown below.
[Figure: Column Disturb write/read sequence over X-Address and Y-Address]
Waltz (W)
With shrinking geometry, pattern-sensitive faults caused by cross-coupling are becoming more common, but they are not detected by traditional March algorithms [10]. Neighborhood pattern-sensitive fault (NPSF) models target these issues; the best-known test algorithms for them are tiling, bipartite, and rowMarch [11]. Newer approaches that apply data and address scrambling [10], or multiple address orders and multiple data backgrounds [11], have also been developed to increase fault coverage, reduce test application time and simplify address sequence generation. The new Waltz algorithm generates address sequences based on topological location and can detect additional NPSF and decoder failures. Its write/read sequence is shown below.
[Figure: Waltz write/read sequence over X-Address and Y-Address]
To illustrate how this function works, let's take a 4-bit address with an initial value of 0000 as an example (for a 4-bit address the mask is 1110):
A       A ^ 1110   (A ^ 1110) + 1
0000    1110       1111
1111    0001       0010
0010    1100       1101
1101    0011       0100
0100    1010       1011
1011    0101       0110
0110    1000       1001
1001    0111       1000
1000    0110       0111
0111    1001       1010
1010    0100       0101
0101    1011       1100
1100    0010       0011
0011    1101       1110
1110    0000       0001
0001    1111       0000
In the first line, the XOR of 0000 with 1110 gives 1110, and adding 1 gives 1111. For the desired sequence, the starting address is 0000 and the ending address is 0001. As the table shows, only half of the address transitions, those from an even address to its complement (0000 to 1111, 0010 to 1101, and so on), can be generated by this single function. We need another function, which we will call addr_comp_max_1, to generate the other transitions:
addr_comp_max_1 = (A ^ 10'b1111111110) - 1;
With an initial value of 1111, this function generates the following sequence:
A       A ^ 1110   (A ^ 1110) - 1
1111    0001       0000
0000    1110       1101
1101    0011       0010
0010    1100       1011
1011    0101       0100
0100    1010       1001
1001    0111       0110
0110    1000       0111
0111    1001       1000
1000    0110       0101
0101    1011       1010
1010    0100       0011
0011    1101       1100
1100    0010       0001
0001    1111       1110
1110    0000       1111
The starting address is 1111 and the ending address is 1110.
UDA component
Next we need a User Defined Algorithm (UDA) that uses these two functions to generate the required address sequence. The arguments of the UDA addr statement are the name of the function to be called, the starting address of the desired sequence, the ending address, and the total number of addresses desired:
addr function name, start, stop, count;
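The two address functions can be mirrored in Python to check the sequences shown above. This sketch assumes that addr_comp_min_1 computes (A ^ mask) + 1, which is what the worked 4-bit example applies, and that results wrap around within the address width; the generate helper is hypothetical and only stands in for the UDA addr statement.

```python
from typing import Callable, List


def addr_comp_min_1(a: int, bits: int) -> int:
    """(A ^ 1...10) + 1, wrapped to the address width."""
    mask = ((1 << bits) - 1) & ~1  # all ones except the LSB, e.g. 1110 for 4 bits
    return ((a ^ mask) + 1) & ((1 << bits) - 1)


def addr_comp_max_1(a: int, bits: int) -> int:
    """(A ^ 1...10) - 1, wrapped to the address width."""
    mask = ((1 << bits) - 1) & ~1
    return ((a ^ mask) - 1) & ((1 << bits) - 1)


def generate(func: Callable[[int, int], int], start: int, count: int, bits: int) -> List[int]:
    """Apply func repeatedly from start, like 'addr function name, start, stop, count'."""
    seq = [start]
    while len(seq) < count:
        seq.append(func(seq[-1], bits))
    return seq


up = generate(addr_comp_min_1, 0b0000, 16, 4)
down = generate(addr_comp_max_1, 0b1111, 16, 4)
print([format(a, "04b") for a in up[:4]])    # ['0000', '1111', '0010', '1101']
print([format(a, "04b") for a in down[:4]])  # ['1111', '0000', '1101', '0010']

# Each function visits every address exactly once ...
print(sorted(up) == list(range(16)), sorted(down) == list(range(16)))  # True True
# ... and every second transition is a full complement (A -> ~A):
print(all(up[i + 1] == up[i] ^ 0b1111 for i in range(0, 16, 2)))  # True
```

With bits=10 the same functions give the 1024-address sequences used by the UDA: addr_comp_min_1 from 0 ends at 1, and addr_comp_max_1 from 1023 ends at 1022, matching the start, stop and count arguments of the addr statements.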
Four steps describing the address sequence, data background and operation can be created as shown below.
step w_0_addr_comp_up;
    addr function addr_comp_min_1, 0, 1, 1024;
    data seed;
    operation w;
step r_0_addr_comp_up;
    addr function addr_comp_min_1, 0, 1, 1024;
    data seed;
    operation r;
step w_0_addr_comp_down;
    addr function addr_comp_max_1, 1023, 1022, 1024;
    data seed;
    operation w;
step r_0_addr_comp_down;
    addr function addr_comp_max_1, 1023, 1022, 1024;
    data seed;
    operation r;
The UDA has four steps organized in two parts, each part containing a write and a read operation. Interleaving writes and reads helps to detect failures early, rather than waiting until the complete write sequence finishes. In a similar way, four extra steps with the inverse data background can be applied to complete the algorithm.
Waltz VERILOG component
With a two-dimensional address decoding structure (rows and columns), the Waltz algorithm can be viewed as a sequence of 4x4 blocks with 16 addresses per block. Three different Waltz pattern types are shown in Figure 5, each with base cells three addresses apart, meaning the address sequence must be incremented by three. A waltz_3 function can be described as:
waltz_3 = A + 3;
[Figure 5: Three Types of Waltz Algorithm (Type-1, Type-2, Type-3), each shown on the first and last 4x4 blocks]
UDA component
The UDAs for the three pattern types are shown below.
Type 1 Pattern
step w_1_waltz_0_up;
    addr function waltz_3, 0, 1023, 342;
    data invseed;
    operation w;
step r_1_w_0_waltz_0_up;
    addr function waltz_3, 0, 1023, 342;
    data seed;
    operation rw;
The starting address is zero and the ending address is 1023, as depicted in Figure 6.
[Figure 6: address mapping of the first and last 4x4 blocks]
First block     Last block
0 4 8 C         1008 1012 1016 1020
1 5 9 D         1009 1013 1017 1021
2 6 A E         1010 1014 1018 1022
3 7 B F         1011 1015 1019 1023
For the Type 2 pattern, the starting address is one and the ending address is 1021, as depicted in Figure 7.
[Figure 7: address mapping of the first and last 4x4 blocks]
Type 3 Pattern
step w_1_waltz_2_up;
    addr function waltz_3, 2, 1022, 341;
    data invseed;
    operation w;
step r_1_w_0_waltz_2_up;
    addr function waltz_3, 2, 1022, 341;
    data seed;
    operation rw;
The starting address is two and the ending address is 1022, as depicted in Figure 8.
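The address arithmetic of the three Waltz pattern types can be checked with a small Python sketch. It assumes 1024 addresses and the 4x4 column-major block layout of Figure 6 (0 4 8 C in the first row); waltz_addresses and block_position are hypothetical helpers, not part of the UDA facility.

```python
from typing import List, Tuple


def waltz_3(a: int) -> int:
    """Python mirror of the Verilog function waltz_3 = A + 3."""
    return a + 3


def waltz_addresses(start: int, stop: int, count: int) -> List[int]:
    """Mimic 'addr function waltz_3, start, stop, count' and check its end point."""
    seq = [start]
    while len(seq) < count:
        seq.append(waltz_3(seq[-1]))
    assert seq[-1] == stop, "count does not match the requested stop address"
    return seq


def block_position(addr: int) -> Tuple[int, int, int]:
    """(block, row, column) in a 4x4 block laid out column-major, as in Figure 6."""
    block, offset = divmod(addr, 16)
    return block, offset % 4, offset // 4


type1 = waltz_addresses(0, 1023, 342)
type2 = waltz_addresses(1, 1021, 341)
type3 = waltz_addresses(2, 1022, 341)

# Together, the three types touch each of the 1024 addresses exactly once.
print(sorted(type1 + type2 + type3) == list(range(1024)))  # True

# Type-1 base cells inside the first block: 0, 3, 6, 9, C, F
print([block_position(a) for a in type1[:6]])
# [(0, 0, 0), (0, 3, 0), (0, 2, 1), (0, 1, 2), (0, 0, 3), (0, 3, 3)]
```

The counts 342, 341 and 341 in the UDAs add up to 1024, so the three pattern types partition the address space by address modulo 3.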
[Figure 8: Address Mapping for Type-3 Waltz Algorithm]
Finally, an up-count write operation is added to the first step, and an up-count read operation is added to the last step to complete the algorithm.
Column Disturb VERILOG component
Again, due to the two-dimensional address decoding structure, the algorithm is expressed as a 4x4 block with 16 addresses that is repeated for each read/write operation, as shown in Figure 9.
[Figure 9: Addressing Sequence for Column Disturb Algorithm (operations numbered 1, 2, ..., N on the first and last 4x4 blocks)]
The notation 1, 2, ..., N refers to the sequence of read/write operations. Once an operation is finished at a position within the 4x4 block, the algorithm moves to the next cell in the same column of the block. Within a step, each address therefore points to the same position in the next block, so addresses must be incremented by 16 using the function shown below:
addr_dist_16 = A + 16;
UDA component
The same 4x4 block structure is used for four different pattern types using four different base columns.
[Figure 10: Four Types of Column Disturb Algorithm (Type-1 to Type-4), shown on the first and last 4x4 blocks]
For each type, three steps represent the write operations on a base column (write 1) and on the two adjacent columns (write 0), together with the read of the base column:
step w_0_0_0_up;
    addr function addr_dist_16, 0, 1008, 64;
    data invseed;
    operation w;
step w_1_0_0_up;
    addr function addr_dist_16, 0, 1008, 64;
    data seed;
    operation w;
step r_1_0_0_up;
    addr function addr_dist_16, 0, 1008, 64;
    data seed;
    operation r;
In the example of Figure 11, the starting address for the sequence is 0 and the ending address is 1008.
[Figure 11: Address Mapping for Column Disturb Algorithm (first and last 4x4 blocks)]
In this algorithm, the address jumps by 16 after each read or write operation finishes, as determined by the addr_dist_16 function. If the total number of addresses is 1024, this procedure repeats 64 times (i.e., 1024 / 16) to reach address 1008, and an additional 15 steps are needed to generate the rest of the required addresses up to 1023. The repetition statement of the UDA facility can be used in this situation to define the set of steps to perform using a common data value for the three fundamental steps.
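The addr_dist_16 stepping can be illustrated with a short Python sketch. It assumes 1024 addresses and that the remaining positions are reached by starting the same 64-address sweep from base offsets 1 to 15; cd_sweep is a hypothetical helper that mimics "addr function addr_dist_16, start, stop, count".

```python
from typing import List


def addr_dist_16(a: int) -> int:
    """Python mirror of the Verilog function addr_dist_16 = A + 16."""
    return a + 16


def cd_sweep(start: int, count: int = 64) -> List[int]:
    """Visit the same position in each of the 64 4x4 blocks (16 addresses per block)."""
    seq = [start]
    while len(seq) < count:
        seq.append(addr_dist_16(seq[-1]))
    return seq


first = cd_sweep(0)
print(len(first), first[:3], first[-1])  # 64 [0, 16, 32] 1008

# Starting the sweep from offsets 0..15 covers all 1024 addresses exactly once.
all_addresses = [a for offset in range(16) for a in cd_sweep(offset)]
print(sorted(all_addresses) == list(range(1024)))  # True
```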
Write 1 on base column (0~3)
Write 0 on adjacent top column (12~15)
Write 0 on adjacent bottom column (4~7)
Read 1 on base column (0~3)
repetition addr_dist_1;
seed 'hffff;
begin
    step w_1_0_0_up;
    step w_1_1_0_up;
    step w_1_2_0_up;
    step w_1_3_0_up;
    step w_0_12_0_up;
    step w_0_13_0_up;
    step w_0_14_0_up;
    step w_0_15_0_up;
    step w_0_4_0_up;
    step w_0_5_0_up;
    step w_0_6_0_up;
    step w_0_7_0_up;
    step r_1_0_0_up;
    step r_1_1_0_up;
    step r_1_2_0_up;
    step r_1_3_0_up;
end
In a similar way, three extra repetitions can be created, one for each of the other columns, each with three write operations and one read operation.
Table 4 shows that the increase in total test cycles is large in relative terms, but test time is typically not a major concern because a BIST controller can be operated at clock frequencies as high as 450 MHz.
Table 4
Memory / algorithm        Time (ns)   Cycles   Change (%)
Single port, 2048x8
  MR+CB                   15565200    155652   0
  MR+CB+AC+W+CD           21914000    209140   34.36
  MR+CB+AC+W+CD+AD        41165200    411652   164.47
Dual port, 1024x12
  MR+CB                   15566300    155663   0
  MR+CB+AC+W+CD           21914100    219141   40.78
  MR+CB+AC+W+CD+AD        31539700    315397   102.62
[Area overhead table; original column headings not recoverable]
Memory / algorithm        Area         Change (%)   Area        Change (%)
Single port, 2048x8
  MR+CB                   83323.3985   0            2087.9988   0
  MR+CB+AC+W+CD           83887.8785   0.677        2652.4788   27.03
  MR+CB+AC+W+CD+AD        84232.0385   1.09         2996.6388   43.52
Dual port, 1024x12
  MR+CB                   (missing)    0            1308.6003   0
  MR+CB+AC+W+CD           46839.1066   1.125        1829.8803   39.84
  MR+CB+AC+W+CD+AD        47119.9066   1.732        2110.6803   61.29
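As a rough check on the test-time argument, the cycle counts reported in Table 4 can be converted to time at the 450 MHz BIST clock mentioned above. This Python sketch assumes one BIST operation cycle per clock period and ignores tester overhead; bist_time_us is a hypothetical helper.

```python
def bist_time_us(cycles: int, clock_hz: float = 450e6) -> float:
    """Test time in microseconds, assuming one BIST cycle per clock period."""
    return cycles / clock_hz * 1e6


# Smallest and largest cycle counts reported in Table 4
for cycles in (155652, 411652):
    print(cycles, f"{bist_time_us(cycles):.1f} us")
# 155652 345.9 us
# 411652 914.8 us
```

Even the largest count stays below one millisecond per memory at this clock rate, which is consistent with the statement that the relative cycle increase does not translate into a major test-time cost.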
7. Conclusion
Quality is critical to every chip vendor due to the high cost of field returns. For applications with many embedded memories, BIST can help ensure memory and overall chip quality. Flexible algorithms and automated flows are two keys to delivering higher-quality memory BIST at a manageable cost. The concept of User Defined Algorithms (UDA) provides a way to customize and improve BIST test algorithms in an efficient and reusable manner, in order to address the new defect mechanisms emerging at advanced process nodes of 65nm and beyond.
Acknowledgement
We would like to thank UMC for providing the initial Waltz and Column Disturb algorithms, and Trident DFT engineers Miya Zhou, for providing data on area overhead and test time impacts, and Marlene Miao, for the at-speed test implementation.
References
1. Memory Testing, Cheng-Wen Wu, Lab for Reliable Computing (LaRC), EE, NTHU.
2. Future Challenges in Memory Testing, Said Hamdioui, Georgi Gaydadjiev, Delft University of Technology.
3. Using March Tests to Test SRAMs, A.J. van de Goor, IEEE Design & Test of Computers, Volume 10, March 1993.
4. Design and Test of Large Embedded Memories: An Overview, Rochit Rajsuman, Advantest America R&D Center, IEEE Design & Test of Computers, 2001.
5. UMC Embedded SRAM Design for Test and Manufacturing Application Reference, Spec No. GGL04-001-E (version 2), 2003.
6. BIST for Deep Submicron ASIC Memories with High Performance Application, Theo J. Powell, Paul Policke, Sherry Lai, Texas Instruments Inc., Wu-Tung Cheng, Joseph Rayhawk, Omar Samman, Mentor Graphics Corp., ITC 2003.
7. March LR: A Test for Realistic Linked Faults, A.J. van de Goor, G.N. Gaydadjiev, V.G. Mikitjuk, V.N. Yarmolik, IEEE, 1996.
8. Embedded Memory Test Patterns at 130nm and Below, Rob Aitken, Artisan Components, ITC 2004.
9. Chasing Subtle Embedded RAM Defects for Nanometer Technologies, Theo Powell, Texas Instruments Inc., Amrendra Kumar, Joseph Rayhawk, Nilanjan Mukherjee, Mentor Graphics Corporation, ITC 2005.
10. Address Sequences and Backgrounds with Different Hamming Distances for Multiple Run March Tests, Svetlana Yarmolik, Belarusian State University of Computer Science and Radio Electronics, Int. J. Appl. Math. Comput. Sci., 2008, Vol. 18, No. 3, 329-339.
11. Neighborhood Pattern-Sensitive Fault Testing and Diagnostics for Random-Access Memories, Kuo-Liang Cheng, Ming-Fu Tsai, Cheng-Wen Wu, 2002.
12. Test Algorithm for Memory Cell Disturb Failures, Duane Aadsen, Larry Fenstermaker, Frank Higgins, Ilyoung Kim, Jim Lewandowski, Jeffery J. Nagy, Lucent Technologies, Bell Labs.
13. Method for Providing User Definable Algorithms in Memory BIST, Omar Kebichi, Christopher John Hill, Paul Reuter, Ian Alexander J. Burgess, United States Patent 6671843, 2003.