
Customized Algorithms for High Performance Memory Test in Advanced Technology Node

Shomo Chen, Ning Huang, Ting-Pu Tai*, Actel Niu*
Trident Microsystems Inc., 3408 Garrett Drive, Santa Clara, CA 95054-2803
*Mentor Graphics Corporation, 8005 SW Boeckman Road, Wilsonville, OR 97070

Abstract
Embedded memory quality is critical to overall chip quality. New defect mechanisms that occur at advanced process nodes (65nm and below) are often more pronounced in memories due to their high density and performance requirements. Traditional memory test algorithms are not sufficient to guarantee a low escape rate for these new memory defects. This paper describes six advanced test algorithms that address these shortcomings in order to maintain high memory test quality at smaller geometries. Three customized algorithms receive particular attention, together with an innovative way of creating them.

2. Embedded Memory Test Background


A defect is a manufacturing error that prevents an IC from functioning properly or from meeting its performance specification. In practice, the frequency and range of defect types vary by process technology node. More advanced processes have a higher probability of subtle defect mechanisms that are more difficult to detect. A failure mode is a description of the faulty behavior resulting from a particular type of defect in silicon. There are three major failure modes for embedded SRAM: single-bit failures resulting from poly-contact shorts (misalignment), poly residue, or contact etching-stop defects; dual-bit failures resulting from poly bridging or contact etching-stop defects; and multiple-bit failures resulting from via1 etching-stop defects.

1. Introduction
Chip quality is becoming more difficult to maintain as the process geometry shrinks. Not only is design complexity higher, but new defect types cause DPM (defects per million) to increase. At the same time, the amount of embedded memory in many applications continues to grow, making memory testing a key factor in maintaining low cost and high quality in IC manufacturing. While commercial EDA tools are keeping pace with the need for greater test flow automation, new test algorithms are also needed to minimize the rate of field returns at advanced technology nodes. This paper describes how ASIC vendors can develop customized memory test algorithms to enhance their overall IC testing strategy.

This paper is organized as follows. Section two reviews new failure modes and fault models, and the limitations of traditional test algorithms. Section three describes an automation flow optimized for testing advanced ICs. Section four describes new algorithms targeting specific new defect mechanisms. Section five demonstrates a methodology for creating customized test algorithms and integrating them into the test automation flow. Section six discusses the impact of adding advanced test algorithms on test time and die area. Section seven summarizes our conclusions.

A fault model is an abstraction or simulation of a defect that exhibits the same behavior as the target defect itself. There are numerous fault models to emulate memory defects [1]. In general, memory fault models can be classified into four groups.

(1) Single cell: stuck-at faults (SAF), stuck-open faults (SOF), transition faults (TF), data retention faults (DRF), and read disturb faults (RDF)

(2) Dual cells: inversion coupling faults (CFin), idempotent coupling faults (CFid), state coupling faults (CFst), and bridge coupling faults (BF)

(3) Multiple cells: neighborhood pattern sensitive faults (NPSF)

3. Automated Design Flow


There are quite a few different memory test methods, but Built-In Self Test (BIST) has traditionally been the best solution for Trident's design flow. BIST requires only a relatively small, simple pattern introduced by Automated Test Equipment (ATE), and it is easy to customize BIST test algorithms to achieve better test quality. Overall, BIST shows a good testing ROI, that is, high test quality and coverage, and a relatively efficient testing process (small test pattern size and test time).

In multimedia applications, it's quite common to have hundreds of memories embedded in a design. In these cases, it is not cost effective to have designers create and integrate all the BIST circuits required. Designers need automation for these tasks so they can focus on algorithm optimization to achieve better testing quality. For BIST automation tools to be effective, they must handle a wide variety of memory configurations such as different address sizes, data widths, control signals and read/write operations. They then generate the needed BIST functionality with a choice of implementation parameters such as memory number, clock scheme, sequential or concurrent test, normal speed or at-speed, diagnosis and selected algorithms.

Once the required BIST circuits are defined, they need to be inserted into and integrated with the device's functional circuitry. The flow needs to consider the right instantiation, pin mapping and pin sharing. Either an RTL level or a gate level netlist can be used as input, together with a corresponding script for insertion and integration. The tools generate a top level netlist with inserted BIST logic, a synthesis script, a top level test-bench and a test pattern in WGL format. After synthesis, the final gate level netlist can be used for simulation.

(4) Read / Write Logic: address faults (AF) and address decoder open faults (ADOF)

Many algorithms that target these fault models have been developed, and the most common ones are listed in Table 1. They vary in complexity, which determines the cost of testing, and in the covered fault list, which determines the resulting test quality. ASIC vendors must balance cost and quality when they choose the set of test algorithms they will use in their production test flow.

Algorithm              Complexity   Target Faults
March1 (MarchC-)       10n          AF, SAF, TF, CFin, CFid, and CFst
March2 (MarchC+)       14n          AF, SAF, TF, SOF, CFin, and CFid
March3                 10n          AF, SAF, SOF, and TF
Col_march1 (MarchC-)   10n          AF, SAF, TF, CFin, CFid, and CFst
Unique                 5n           SAF
Checkerboard           4n           BF
RetentionCB            4n           BF and DRF

Table 1: Common Memory Testing Algorithms

A review of the literature [3] indicates that March type algorithms are among the best, providing coverage for more than 95% of defects [4]. In addition, pattern size and test duration grow linearly with memory size, and these algorithms have proven effective on real silicon for years [2]. Trident Microsystems typically deploys the March 2 algorithm together with a checkerboard background for technology nodes above 90nm. However, the traditional algorithms are insufficient for finding defects resulting from process variability at nodes below 90nm. More advanced algorithms targeting specific new physical defects must be added to achieve comprehensive, high-confidence manufacturing test.
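As an illustration of how a March test is applied and why its cost grows linearly with memory size, the following is a minimal Python sketch of the standard March C- element sequence run against a simulated one-bit-wide memory. The `StuckAtMemory` class, the `run_march` helper, and the choice of ascending order for the first and last elements are assumptions made for this example; they are not taken from the BIST tool flow described in this paper.

```python
# March C- elements: (address order, operations).
# "w0" = write 0, "r1" = read and expect 1, and so on.
MARCH_C_MINUS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up",   ["r0"]),
]


class StuckAtMemory:
    """Hypothetical bit-oriented memory model with an optional stuck-at cell."""

    def __init__(self, size, stuck_addr=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value

    def write(self, addr, value):
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck_value if addr == self.stuck_addr else value

    def read(self, addr):
        return self.stuck_value if addr == self.stuck_addr else self.cells[addr]


def run_march(memory, size, elements=MARCH_C_MINUS):
    """Apply the march elements; return (passed, number_of_operations)."""
    operations_done = 0
    passed = True
    for order, operations in elements:
        addresses = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op in operations:
                operations_done += 1
                value = int(op[1])
                if op[0] == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    passed = False
    return passed, operations_done


if __name__ == "__main__":
    print(run_march(StuckAtMemory(16), 16))
    print(run_march(StuckAtMemory(16, stuck_addr=5, stuck_value=1), 16))
```

The operation count returned for a fault-free memory of n cells is 10n, which matches the complexity listed for March C- in Table 1.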

Figure 1: Memory BIST Automation Flow

4. Testing Strategy and Customized Algorithms


Any BIST scheme involves adding circuitry dedicated to the testing function, and this increases die size. Implementing memory BIST with customized algorithms that provide sufficiently high coverage without too much area overhead is a way to reach an optimum balance. Based on the published literature [5], we can achieve 99.6% coverage by deploying the following algorithms: (1) Checkerboard: 95.8% (2) Marching: 2.3% (3) ADComp (Address Complement): 0.6% (4) Disturb (Column Disturb): 0.6% (5) Waltz: 0.3%.

Internal phase-locked loops (PLLs) can be added to drive at-speed testing in order to detect timing-related defects [6]. A smart customization of the BIST controller, without costly pipeline stages for at-speed memory testing, has been implemented in Trident designs. Six advanced algorithms were developed to ensure effective defect coverage at nodes of 65nm and beyond:

March-LR (MR)

This is a modified version of the traditional March 2 algorithm that requires both fast column and fast row addressing. In addition to the faults detected by March 2, the March-LR algorithm can also detect read disturb, worst bit line coupling, and worst cell leakage to bit lines. Compared with traditional March algorithms, March-LR can detect all simple faults (one fault doesn't influence the behavior of other faults) as well as linked faults (one fault influences the behavior of other faults) [7, 8]. Here's the sequence of algorithm steps.

up     write 0
down   read 0, write 1
up     read 1, write 0, read 0, read 0, write 1
up     read 1, write 0
up     read 0, write 1, read 1, read 1, write 0
up     read 0

Checkerboard (CB)

This is a simple march type algorithm with a checkerboard data pattern. Based on the topology of the memory, the checkerboard algorithm divides cells into two groups such that every neighboring cell is in a different group. The goal of checkerboard operations is to have 010101 patterns imposed on memory cells so that each cell's physical neighbors are in the opposite state. When the Nth bits are close to each other, we need to invert 010101 patterns to 101010 at every address so that the Nth bits are toggling. This algorithm detects stuck-at faults and bridge coupling faults, assuming the address decoder is fault free. The test operations are described in Figure 2.

Figure 2: Operations of Checkerboard Algorithm

Here's the sequence of algorithm steps.

up     write checkerboard
up     read checkerboard
up     write inverse checkerboard
up     read inverse checkerboard

Address Decoder Open (AD)

One of the subtle defects that occur as geometry shrinks is an open on the address decoder, or ADOF. These defects can occur when a NAND tree is used to implement decoding logic, and can cause a combinational circuit to act like a sequential circuit. A PMOS transistor defect causing a stuck open is a common failure mode, and an SRAM has many NAND gates susceptible to PMOS stuck-open faults. Traditional March algorithms can't always detect such a defect unless a special address sequence is employed [9]. One approach involves writing data to a selected pair of memory addresses. The procedure is to write a value to a selected base address, then check if the value changes after writing inverse content to a neighboring address. To detect open faults in an address decoder, the algorithm writes to neighboring addresses with a Hamming distance of one, and checks if this operation also results in a write to the base cell. Here's the sequence of algorithm steps.

up     write 0
up     write 1, shift_write 0, read 1, write 0

In this algorithm, shift_write means a write to the shifted address, which has a Hamming distance of one from the base address. For example, if the base address is 000, then the three shifted addresses would be 001, 010, and 100. The testing operations are described in Table 2.
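The shifted addresses can be generated by flipping one address bit at a time. The following Python sketch shows this, along with one possible reading of the "write 1, shift_write 0, read 1, write 0" step in which the four operations are repeated for every shifted address. Both function names and that per-neighbor interpretation are assumptions made for illustration, not the BIST controller's actual implementation.

```python
def shifted_addresses(base, width):
    """Return all addresses at Hamming distance one from `base`."""
    return [base ^ (1 << bit) for bit in range(width)]


def adof_operations(base, width):
    """List (operation, address, data) tuples for one base address.

    Assumed interpretation: for each shifted address, write 1 to the base,
    write 0 to the shifted address, read 1 back from the base, then
    restore the base to 0.
    """
    operations = []
    for neighbor in shifted_addresses(base, width):
        operations.append(("w", base, 1))
        operations.append(("w", neighbor, 0))
        operations.append(("r", base, 1))
        operations.append(("w", base, 0))
    return operations


if __name__ == "__main__":
    print([format(a, "03b") for a in shifted_addresses(0b000, 3)])
    for operation in adof_operations(0b000, 3):
        print(operation)
```

If the read of the base address returns 0 instead of 1, the write to the shifted address also reached the base cell, which is the behavior expected from an open in the decoder.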

Table 2: Operations for Address Decoder Open Algorithm (columns: base address 000 and shifted addresses 001, 010, 100; rows: first line (001), second line (010), third line (100); operations W1, W0, R1)

Address Complement (AC)

The worst scenario for an address decoder is when all address bits are switching at the same time, for example, from address 00000000 to address 11111111, or from address 01010101 to address 10101010. These transitions require state changes in all the NAND gates in the decoder. Consequently, the settling delay is longer than for normal address marching through 00000000, 00000001, 00000010, etc. The decode timing is critical because the array is precharged and the word line driver can't be enabled prematurely. Until the address becomes stable, any small delay fault along the NAND tree in the address decoder can alter the precharged state of the array. The Address Complement (AC) algorithm targets these address faults, i.e., faults associated with the worst case settling delay of the address decoder, and has these steps:

up     write 0     address A
up     write 0     complement (address A)
up     read 0      address A
up     read 0      complement (address A)
up     write 1     address A
up     write 1     complement (address A)
up     read 1      address A
up     read 1      complement (address A)
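To make the difference between normal address marching and complement addressing concrete, the following Python sketch counts how many address bits switch between consecutive accesses. The helper names are hypothetical and chosen only for this example.

```python
def complement(addr, width):
    """Bitwise complement of `addr` within a `width`-bit address space."""
    return ~addr & ((1 << width) - 1)


def hamming_distance(a, b):
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


def switching_bits(sequence):
    """Bits that toggle between each pair of consecutive addresses."""
    return [hamming_distance(a, b) for a, b in zip(sequence, sequence[1:])]


if __name__ == "__main__":
    width = 8
    marching = [0, 1, 2, 3]
    complemented = [0b01010101, complement(0b01010101, width)]
    print(switching_bits(marching))       # few bits toggle per step
    print(switching_bits(complemented))   # every address bit toggles
```

A jump from an address to its complement always toggles all address bits, which is the worst case for decoder settling that the AC algorithm is meant to exercise.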

Waltz (W)

With shrinking geometry, pattern-sensitive faults caused by cross-coupling are becoming more common, but they are not detected by the traditional March algorithms [10]. Neighborhood Pattern Sensitive Fault (NPSF) models target these issues; the best-known test algorithms are the tiling, bipartite, and rowMarch algorithms [11]. Some new approaches applying data and address scrambling [10], or multiple address orders and multiple data backgrounds [11], have also been developed to increase fault coverage, reduce test application time and simplify address sequence generation. The new Waltz algorithm generates address sequences based on topological location and can detect additional NPSF and decoder failures. Here's the diagram for write / read sequences.

Figure 3: Waltz Algorithm (write / read sequences over X-Address and Y-Address)

Column Disturb (CD)

To optimize a memory layout in order to save decoder device and wiring area, designers take advantage of array symmetry in address decoding. For row decoding, write and read bit lines are shared by all cells of the same column. A similar construction is used for column decoding. Consequently, it is not surprising that a read/write operation on a cell could be affected by an adjacent cell in the same row or column [12]. The Column Disturb algorithm detects coupling effects on adjacent columns that can cause an error when writing different data values to adjacent columns. This type of fault is very common in DRAM, but is also becoming more common in SRAM at smaller geometries. Here's the diagram for CD write / read sequences.

Figure 4: Column Disturb Algorithm (write / read sequences over X-Address and Y-Address)

5. Creating User Defined Algorithms


The March-LR, Checkerboard and Address Decoder Open algorithms are commonly supported by test automation tools via simple commands to invoke the algorithms when needed. However, the other three (Address Complement, Waltz, and Column Disturb) are not as easily automated because they require more irregular address sequences. Additional user input is needed to describe the desired sequence. One mechanism to provide this information in an efficient and reusable fashion is called the User Defined Algorithm (UDA) facility. This approach makes it easier for users to specify to a test tool how to implement the desired algorithms in a BIST controller for memory testing [13]. The method includes tool support for reading a user-defined test algorithm description and translating it into BIST circuitry for a specific memory model. The flow combines a global description in VERILOG format and a local description in a tool-specific format.

Address Complement VERILOG component

One way to specify address sequences is to create a function in the VERILOG language. For example, the address increment function returns a value by applying the corresponding UDA. For the AC algorithm, two simple functions, which are slight modifications of the standard binary complement operation, can produce the needed address values:

addr_comp_min_1 = (A ^ 10'b1111111110) + 1;

To illustrate how this function works, let's take a 4-bit address with an initial value of 0000 as an example:

A:                0000 1111 0010 1101 0100 1011 0110 1001 1000 0111 1010 0101 1100 0011 1110 0001
A ^ 1110:         1110 0001 1100 0011 1010 0101 1000 0111 0110 1001 0100 1011 0010 1101 0000 1111
(A ^ 1110) + 1:   1111 0010 1101 0100 1011 0110 1001 1000 0111 1010 0101 1100 0011 1110 0001 0000

In the first column, the XOR of 0000 with 1110 gives 1110, and adding 1 gives 1111. For the desired sequence, the starting address is 0000 and the ending address is 0001. As the example shows, only half of the address transitions can be generated by this single function. We need another function, which we will call addr_comp_max_1, to generate the other values:

addr_comp_max_1 = (A ^ 10'b1111111110) - 1;

The function with an initial value of 1111 would generate the following sequence.

A:                1111 0000 1101 0010 1011 0100 1001 0110 0111 1000 0101 1010 0011 1100 0001 1110
A ^ 1110:         0001 1110 0011 1100 0101 1010 0111 1000 1001 0110 1011 0100 1101 0010 1111 0000
(A ^ 1110) - 1:   0000 1101 0010 1011 0100 1001 0110 0111 1000 0101 1010 0011 1100 0001 1110 1111

The starting address is 1111 and the ending address is 1110.

UDA component

Next we need a User Defined Algorithm (UDA) that uses these two functions to generate the required address sequence. Arguments to the UDA include the name of the function to be called, the starting address of the desired sequence, the ending address, and the total number of addresses desired:

addr function name, start, stop, count ;

Four steps describing the address sequence, data background and operation can be created as shown below.

step w_0_addr_comp_up;
addr function addr_comp_min_1, 0, 1, 1024;
data seed;
operation w;

step r_0_addr_comp_up;
addr function addr_comp_min_1, 0, 1, 1024;
data seed;
operation r;

step w_0_addr_comp_down;
addr function addr_comp_max_1, 1023, 1022, 1024;
data seed;
operation w;

step r_0_addr_comp_down;
addr function addr_comp_max_1, 1023, 1022, 1024;
data seed;
operation r;
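The start, stop and count arguments used with the two address functions can be checked with a small software model. The Python sketch below mirrors the two VERILOG expressions; the `width` parameter and the `address_sequence` helper are assumptions introduced for this example, with the final mask emulating truncation to the address width.

```python
def addr_comp_min_1(a, width=10):
    """Model of (A ^ 'b11...10) + 1, truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((a ^ (mask - 1)) + 1) & mask


def addr_comp_max_1(a, width=10):
    """Model of (A ^ 'b11...10) - 1, truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((a ^ (mask - 1)) - 1) & mask


def address_sequence(func, start, count, width=10):
    """Apply `func` repeatedly, returning `count` addresses from `start`."""
    sequence = [start]
    while len(sequence) < count:
        sequence.append(func(sequence[-1], width))
    return sequence


if __name__ == "__main__":
    up = address_sequence(addr_comp_min_1, 0, 16, width=4)
    down = address_sequence(addr_comp_max_1, 15, 16, width=4)
    print([format(a, "04b") for a in up])
    print([format(a, "04b") for a in down])
```

With a 10-bit address, the same model starts at 0 and ends at 1 for `addr_comp_min_1`, and starts at 1023 and ends at 1022 for `addr_comp_max_1`, after 1024 addresses each, which agrees with the arguments in the UDA steps.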

The UDA has four steps organized in two parts, with each part containing a write and a read operation. Having writes and reads interleaved helps to detect failures early, rather than waiting until the complete write sequence finishes. In a similar way, four extra steps with the inverse background can be applied to complete the algorithm.

Waltz VERILOG component

With a two-dimensional address decoding structure (rows and columns), the Waltz algorithm can be viewed as a sequence of 4x4 blocks with 16 addresses per block. Three different waltz pattern types are shown in Figure 5, each having an address distance of three between base cells, meaning the address sequence must be incremented by three. A waltz_3 function can be described as:

waltz_3 = A + 3;

Figure 5: Three Types of Waltz Algorithm (Type-1, Type-2, Type-3)

UDA component

The UDAs for the three pattern types are shown below.

Type 1 Pattern

step w_1_waltz_0_up;
addr function waltz_3, 0, 1023, 342;
data invseed;
operation w;

step r_1_w_0_waltz_0_up;
addr function waltz_3, 0, 1023, 342;
data seed;
operation rw;

The starting address is zero and the ending address is 1023, as depicted in Figure 6.

First Block         Last Block
0 4 8 C             1008 1012 1016 1020
1 5 9 D      ~~~    1009 1013 1017 1021
2 6 A E             1010 1014 1018 1022
3 7 B F             1011 1015 1019 1023

Figure 6: Address Mapping for Type-1 Waltz Algorithm

Type 2 Pattern

step w_1_waltz_1_up;
addr function waltz_3, 1, 1021, 341;
data invseed;
operation w;

step r_1_w_0_waltz_1_up;
addr function waltz_3, 1, 1021, 341;
data seed;
operation rw;

The starting address is one and the ending address is 1021, as depicted in Figure 7.

Figure 7: Address Mapping for Type-2 Waltz Algorithm (same block layout as Figure 6)

Type 3 Pattern

step w_1_waltz_2_up;
addr function waltz_3, 2, 1022, 341;
data invseed;
operation w;

step r_1_w_0_waltz_2_up;
addr function waltz_3, 2, 1022, 341;
data seed;
operation rw;

The starting address is two and the ending address is 1022, as depicted in Figure 8.

Figure 8: Address Mapping for Type-3 Waltz Algorithm (same block layout as Figure 6)

Finally, an up-count write operation is added to the first step, and an up-count read operation is added to the last step to complete the algorithm.

Column Disturb VERILOG component

Again, due to the two-dimensional address decoding structure, the algorithm is expressed as a 4x4 block with 16 addresses repeated for each read / write operation, as shown in Figure 9.

Figure 9: Addressing Sequence for Column Disturb Algorithm

The notation 1, 2, ..., N refers to the sequence of read/write operations. Once an operation is finished at a position within the 4x4 block, the algorithm moves to the next cell in the same column of the block. So addresses must be incremented by 16 using the function shown below:

addr_dist_16 = A + 16;

UDA component

The same 4x4 block structure is used for four different pattern types using four different base columns.

Figure 10: Four Types of Column Disturb Algorithm (Type-1, Type-2, Type-3, Type-4)

For each type, there are three steps representing write operations on a base column (write 1) and two adjacent columns (write 0), as defined by:

step w_0_0_0_up;
addr function addr_dist_16, 0, 1008, 64;
data invseed;
operation w;

step w_1_0_0_up;
addr function addr_dist_16, 0, 1008, 64;
data seed;
operation w;

step r_1_0_0_up;
addr function addr_dist_16, 0, 1008, 64;
data seed;
operation r;

In the example of Figure 11, the starting address for the sequence is 0 and the ending address is 1008.

Figure 11: Address Mapping for Column Disturb Algorithm (same block layout as Figure 6)

In this algorithm, the address jumps by 16 after each read or write operation finishes, as determined by the addr_dist_16 function. If the total number of addresses is 1024, this procedure will repeat 64 times (i.e., 1024 / 16) to reach address 1008, and an additional 15 steps are needed to generate the rest of the required addresses to reach 1023. The repetition statement of the UDA facility can be used in this situation to define the set of steps to perform using a common data value for the three fundamental steps.
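The start, stop and count arguments passed to the waltz_3 and addr_dist_16 address functions can be cross-checked with a short Python sketch. The `addr_sequence` helper and the `UDA_ARGUMENTS` table are hypothetical names for this example; the numbers themselves are the ones that appear in the UDA steps of this section.

```python
def addr_sequence(increment, start, count):
    """Addresses produced by repeatedly adding `increment` to `start`."""
    return [start + i * increment for i in range(count)]


# (increment, start, stop, count) as used in the UDA steps.
UDA_ARGUMENTS = {
    "waltz type 1": (3, 0, 1023, 342),
    "waltz type 2": (3, 1, 1021, 341),
    "waltz type 3": (3, 2, 1022, 341),
    "column disturb": (16, 0, 1008, 64),
}


def check_arguments(arguments=UDA_ARGUMENTS):
    """Return True for each entry whose last generated address equals stop."""
    return {
        name: addr_sequence(increment, start, count)[-1] == stop
        for name, (increment, start, stop, count) in arguments.items()
    }


if __name__ == "__main__":
    for name, consistent in check_arguments().items():
        print(name, "consistent" if consistent else "inconsistent")
```

Taken together, the three Waltz sequences visit each of the 1024 addresses exactly once (342 + 341 + 341), while one addr_dist_16 sequence visits one address in each of the 64 blocks.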

The repetition below writes 1 on the base column (addresses 0~3), writes 0 on the adjacent top column (12~15) and on the adjacent bottom column (4~7), and then reads 1 on the base column (0~3):

repetition addr_dist_1;
seed 'hffff;
begin
step w_1_0_0_up;
step w_1_1_0_up;
step w_1_2_0_up;
step w_1_3_0_up;
step w_0_12_0_up;
step w_0_13_0_up;
step w_0_14_0_up;
step w_0_15_0_up;
step w_0_4_0_up;
step w_0_5_0_up;
step w_0_6_0_up;
step w_0_7_0_up;
step r_1_0_0_up;
step r_1_1_0_up;
step r_1_2_0_up;
step r_1_3_0_up;
end

In a similar way, three extra repetitions can be created, one for each remaining column, each with 3 write operations and 1 read operation.

6. Implementation and Area Overhead

We can evaluate implementation details in terms of a specific design example having several hundred embedded memory structures to be implemented on UMC's 65nm process. The clock frequency varies from a high of 450MHz to a low of 24MHz, and memory size varies from 200K bits to 2K bits. Memory types include both single and dual port SRAM. All six algorithms mentioned above were deployed, and the area overhead and test times are summarized in Tables 3 and 4. The March (MR) and Checkerboard (CB) algorithms are the baselines for comparison. The literature suggests that the Address Complement (AC), Waltz (W) and Column Disturb (CD) algorithms can achieve an extra 1.5% coverage [5]. Discussion of the importance of the Address Decoder Open (AD) algorithm can be found in [9].

Table 3 shows that the standalone area overhead for a BIST controller to implement these four advanced algorithms is significant in terms of its effect on the relative area required for BIST, but acceptable if we consider that the worst case overall area for memory and BIST controller is only around 1% of the die.

Memory    Algorithm             BIST+Memory Area (um^2)   Change (%)   BIST Area (um^2)   Change (%)
1024x12   MR+CB                 46317.8266                0            1308.6003          0
1024x12   MR+CB+AC+W+CD[+AD]    46839.1066                1.125        1829.8803          39.84
1024x12   MR+CB+AC+W+CD[+AD]    47119.9066                1.732        2110.6803          61.29
2048x8    MR+CB                 83323.3985                0            2087.9988          0
2048x8    MR+CB+AC+W+CD[+AD]    84232.0385                1.09         2996.6388          43.52
2048x8    MR+CB+AC+W+CD[+AD]    83887.8785                0.677        2652.4788          27.03

Table 3: Area Overhead Comparison (single port and dual port memories; algorithm sets MR+CB, MR+CB+AC+W+CD, and MR+CB+AC+W+CD+AD)

Table 4 shows that the increase in total testing cycles is large in a relative sense, but test time is typically not a major concern because a BIST controller can be operated at clock frequencies as high as 450 MHz.

Memory    Algorithm             Time (ns)   Cycle     Change (%)
2048x8    MR+CB                 15565200    155652    0
2048x8    MR+CB+AC+W+CD[+AD]    41165200    411652    164.47
2048x8    MR+CB+AC+W+CD[+AD]    21914000    209140    34.36
1024x12   MR+CB                 15566300    155663    0
1024x12   MR+CB+AC+W+CD[+AD]    31539700    315397    102.62
1024x12   MR+CB+AC+W+CD[+AD]    21914100    219141    40.78

Table 4: Test Time Comparison (single port and dual port memories; algorithm sets MR+CB, MR+CB+AC+W+CD, and MR+CB+AC+W+CD+AD)

7. Conclusion
Quality is critical to every chip vendor due to the high cost of field returns. For applications with many embedded memories, BIST can help ensure memory and overall chip quality. Flexible algorithms and automated flows are two keys to delivering higher quality memory BIST at a manageable cost. The concept of User Defined Algorithms (UDA) provides a way to customize and improve BIST test algorithms in an efficient and reusable manner in order to address new defect mechanisms emerging at advanced process nodes of 65nm and beyond.

Acknowledgement
We would like to thank UMC for providing the initial Waltz and Column Disturb algorithms, and the Trident DFT engineers Miya Zhou for providing data on area overhead and test time impacts, and Marlene Miao for the at-speed test implementation.

References
1. Memory Testing, Cheng-Wen Wu, Lab for Reliable Computing (LaRC), EE, NTHU
2. Future Challenges in Memory Testing, Said Hamdioui, Georgi Gaydadjiev, Delft University of Technology
3. Using March Tests to Test SRAMs, A.J. van de Goor, IEEE Design & Test of Computers, Volume 10, March 1993
4. Design and Test of Large Embedded Memories: An Overview, Rochit Rajsuman, Advantest America R&D Center, IEEE Design & Test of Computers, 2001
5. UMC Embedded SRAM Design for Test and Manufacturing Application Reference, Spec No. GGL04-001-E (version 2), 2003
6. BIST for Deep Submicron ASIC Memories with High Performance Application, Theo J. Powell, Paul Policke, Sherry Lai, Texas Instruments Inc., Wu-Tung Cheng, Joseph Rayhawk, Omar Samman, Mentor Graphics Corp., ITC 2003
7. March LR: A Test for Realistic Linked Faults, A.J. van de Goor, G.N. Gaydadjiev, V.G. Mikitjuk, V.N. Yarmolik, IEEE 1996
8. Embedded Memory Test Patterns at 130nm and Below, Rob Aitken, Artisan Components, ITC 2004
9. Chasing Subtle Embedded RAM Defects for Nanometer Technologies, Theo Powell, Texas Instruments Inc., Amrendra Kumar, Joseph Rayhawk, Nilanjan Mukherjee, Mentor Graphics Corporation, ITC 2005
10. Address Sequences and Backgrounds with Different Hamming Distances for Multiple Run March Tests, Svetlana Yarmolik, Belarusian State University of Computer Science and Radio Electronics, Int. J. Appl. Math. Comput. Sci., 2008, Vol. 18, No. 3, 329-339
11. Neighborhood Pattern-Sensitive Fault Testing and Diagnostics for Random-Access Memories, Kuo-Liang Cheng, Ming-Fu Tsai, Cheng-Wen Wu, 2002
12. Test Algorithm for Memory Cell Disturb Failures, Duane Aadsen, Larry Fenstermaker, Frank Higgins, Ilyoung Kim, Jim Lewandowski, Jeffery J. Nagy, Lucent Technologies, Bell Labs
13. Method for Providing User Definable Algorithms in Memory BIST, Omar Kebichi, Christopher John Hill, Paul Reuter, Ian Alexander J. Burgess, United States Patent 6671843, 2003
