Reduced On-chip Storage of Seeds for Built-in Test Generation

Author:

Irith Pomeranz

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 3

Article No.: 45, Pages 1 - 16

https://doi.org/10.1145/3643810

Published: 14 March 2024

Abstract

Logic built-in self-test (LBIST) approaches use an on-chip logic block for test generation and thus enable in-field testing. Recent reports of silent data corruption underline the importance of in-field testing. In a class of storage-based LBIST approaches, compressed tests are stored on-chip and decompressed by an on-chip decompression logic. The on-chip storage requirements may become a bottleneck when the number of compressed tests is large. In this case, using each compressed test for applying several different tests allows the storage requirements to be reduced. However, producing different tests from each compressed test has a hardware overhead. This article suggests a new on-chip storage scheme for compressed tests that eliminates the additional hardware overhead. Under the new storage scheme, a set of N B-bit compressed tests targeting a set of faults F₀ is translated into a sequence S of N ⋅ B bits. Every B consecutive bits of S are considered as a compressed test. The sequence S thus yields close to N ⋅ B compressed tests, magnifying the test data stored in S almost B times. Taking advantage of the extra tests, the article describes a software procedure that is applied offline to reduce S without losing fault coverage of F₀. Experimental results for benchmark circuits demonstrate significant reductions in the storage requirements of S and significant increases in the fault coverage of a second set of faults, F₁.

1 Introduction

Testing of electronic chips is important after manufacturing, to detect defects that were introduced by fabrication processes, as well as in-field, to detect defects that escaped manufacturing testing or occurred during the lifetime of a chip. Test application can be performed by automatic test equipment (ATE) or using on-chip logic. Logic built-in self-test (LBIST) approaches [1–28] use an on-chip logic block for test generation, making it unnecessary to use external test data. This facilitates in-field test application. It also addresses security concerns related to loading and unloading of test data [10]. Recent reports of silent data corruption underline the importance of LBIST for in-field testing.

A linear-feedback shift-register (LFSR) is typically an important part of the on-chip test generation logic. A basic LBIST approach uses the LFSR for applying pseudo-random tests [1]. The fault coverage achieved by the LFSR is increased significantly if multiple initial states, referred to as seeds, are used for the LFSR. Each seed produces a different subset of tests and contributes to the detection of a different subset of faults [2, 5, 7, 17].

A linear logic block, such as an LFSR, is also used as part of test data compression approaches. In this case, an ATE stores compressed tests similar to seeds for an LFSR. On-chip decompression logic transforms a compressed test provided by the ATE into a test that can be applied to the circuit. In [12, 21, 22] and [28], compressed tests are stored on-chip and decompressed using on-chip decompression logic to apply tests to the circuit. This results in the LBIST approach illustrated in Figure 1 and referred to as storage-based LBIST.

Fig. 1.

To reduce the storage requirements for compressed tests or increase the fault coverage, each compressed test in [21, 22] and [28] is used for applying several different tests to the circuit. The LBIST approach from [21] complements scan vectors produced by the on-chip decompression logic to allow each compressed test to produce several different tests. The LBIST approach from [22] complements values in the compressed tests before they are decompressed. The approach described in [28] partitions compressed tests (seeds for an LFSR) into subvectors. It combines subvectors on-chip pseudo-randomly to form new seeds that are then used by the on-chip decompression logic for forming different tests. The variations possible with pseudo-random combinations of subvectors allow the number of subvectors and storage requirements to be reduced significantly.

Other LBIST approaches also modify LFSR seeds or the pseudo-random tests they produce to apply different tests and thus increase the fault coverage [5, 6, 7, 13, 18].

In all the approaches discussed thus far, additional hardware is required for forming additional tests that improve the fault coverage or allow the storage requirements to be reduced. In the storage-based LBIST approaches, the extra hardware is used for complementing bits of compressed or applied tests or for forming seeds from subvectors.

This article suggests a new storage scheme for LFSR seeds in the configuration from Figure 1 that does not require additional hardware other than that required for storage. The new storage scheme is applied to a compressed deterministic test set \(\Psi\) consisting of N seeds that was generated for a B-bit LFSR targeting a set of faults \(F_0\) , with every seed in \(\Psi\) producing one test. The set \(\Psi\) is translated into a sequence of \(N \cdot B\) consecutive bits. The sequence is denoted by S and referred to as a seed sequence. Every B consecutive bits of S are considered as a seed. The seed sequence S thus yields close to \(N \cdot B\) seeds, magnifying the test data stored in S almost B times. By storing S in a shift register and using the B leftmost bits of the shift register as seeds, it is possible to perform test application without storing any additional information and without requiring additional hardware support. Thus, the first contribution of this article is a storage scheme that magnifies the stored seeds without requiring additional hardware support.

The second contribution of this article is a software procedure that modifies the seed sequence S to reduce its storage requirements. The procedure is applied offline to reduce S before it is stored on-chip. Whereas general-purpose storage schemes do not allow the stored data to be modified, the modification of S is accompanied by a fault simulation procedure to ensure that no loss of fault coverage occurs for \(F_0\) when S is modified. After translating \(\Psi\) into a seed sequence S, the procedure omits from S bits that are not needed as part of any seed and reorders S to obtain a new seed sequence from which additional bits can be omitted. When bits are omitted from S, the number of applied tests is reduced as well.

The third contribution of this article is to consider two sets of faults from different fault models: \(F_0\) , for which \(\Psi\) was generated, and a second set of faults, \(F_1\) , that was not targeted when \(\Psi\) was generated. Detecting faults from a second fault model is important for the quality of the test set. Compared with \(\Psi\) , the initial seed sequence S yields a significantly higher fault coverage for \(F_1\) . As the seed sequence is reduced, the procedure simulates the faults in \(F_1\) to determine the effects of the reduction on the fault coverage of \(F_1\) .

For the experiments in this article, \(F_0\) consists of stuck-at faults, and \(F_1\) consists of single-cycle gate-exhaustive faults. The storage scheme and software procedure are not limited to these fault models, and other fault models can be considered instead.

The article is organized as follows. Section 2 describes the storage scheme for seeds. Section 3 describes the software procedure for producing a seed sequence with reduced storage requirements. Section 4 presents experimental results. Section 5 concludes the article.

2 Storage of Seeds

This section discusses the storage scheme for seeds. A given set of seeds \(\Psi = \lbrace \Psi _0 , \Psi _1 , \ldots , \Psi _{N-1} \rbrace\) consists of N seeds. Each seed has B bits. A seed is represented as \(\Psi _i = \psi _{i,0} \psi _{i,1} \ldots \psi _{i,B-1}\) .

For simplicity, it is assumed that a test is compressed into a single seed. In general, a test may be compressed into several seeds. With m seeds per test and N tests, the number of seeds in \(\Psi\) would be \(m \cdot N\) , and m seeds would be used for applying a test.

Figure 2 shows a set of seeds \(\Psi\) with \(N = 3\) and \(B = 4\) as a two-dimensional array. If \(\Psi\) is stored in an on-chip memory, the used part of the memory is of dimensions \(N \times B\) , and a seed is obtained by accessing a memory word.

Fig. 2.

The entries of a two-dimensional array are commonly stored consecutively. The set \(\Psi\) with the seeds stored consecutively is shown at the top of Figure 3.

Fig. 3.

The bits of the seeds in \(\Psi\) are renumbered as shown by the array S in Figure 3. Considering every B consecutive bits of S as a seed, S yields \(N \cdot B - B + 1 = 9\) different seeds, \(S_0 = s_0 s_1 s_2 s_3 = \Psi _0\) , \(S_1 = s_1 s_2 s_3 s_4\) , \(S_2 = s_2 s_3 s_4 s_5, \ldots , S_8 = s_8 s_9 s_{10} s_{11} = \Psi _2\) . Four of these seeds are shown in Figure 3.

In general, the number of bits in S is denoted by n. With N seeds and B bits per seed, \(n = N \cdot B\) initially. The procedure described in Section 3 produces a seed sequence S with a reduced value of n. Using every B consecutive bits of \(S = \langle s_0 s_1 \ldots s_{n-1} \rangle\) as a seed, the possible seeds are \(S_i = s_i s_{i+1} \ldots s_{i+B-1}\) , for \(0 \le i \le n-B\) .
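The indexing above can be sketched in a few lines of Python. This is only an illustration: the function names `concatenate_seeds` and `window_seeds` and the bit values are assumptions made for the example, and bits are represented as characters of a string.

```python
def concatenate_seeds(psi):
    """Concatenate the B-bit seeds of psi into one seed sequence S."""
    return "".join(psi)


def window_seeds(s, b):
    """Return the seeds S_i = s_i ... s_{i+B-1} for 0 <= i <= n-B."""
    n = len(s)
    return [s[i:i + b] for i in range(n - b + 1)]


# Arbitrary illustrative values for N = 3 seeds of B = 4 bits each.
psi = ["1010", "0111", "0001"]
S = concatenate_seeds(psi)
seeds = window_seeds(S, 4)
print(len(seeds))  # 9, that is, N*B - B + 1
```

With these values, the seeds at positions 0, 4, and 8 are the original seeds of \(\Psi\) , and the remaining six seeds straddle two original seeds.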

Storage of S on-chip can be implemented in one of several ways. The method for extracting the seeds from S depends on the way S is stored on-chip. If S is stored in an on-chip memory, it is necessary to be able to address every B consecutive bits of the memory. A counter can be used for pointing to the first bit of a seed. Alternatively, S can be implemented as a shift register. As the shift register is shifted left one bit at a time, the B leftmost bits of the shift register provide the next seed. Figure 4 illustrates this implementation for the set of seeds from Figure 3. Figure 4 shows the first five seeds obtained from S. Compared with storage of \(\Psi\) in an on-chip memory, both storage schemes of S are more complex. The benefit is that significantly fewer bits need to be stored.
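The shift-register option can be modeled in software as follows. This is a behavioral sketch only, not a hardware description: the generator name `shift_register_seeds` is hypothetical, the bit values are arbitrary, and the register is modeled as a queue that shrinks on every shift rather than as a fixed-length register.

```python
from collections import deque


def shift_register_seeds(s_bits, b):
    """Yield the B leftmost bits, then shift left by one bit, until
    fewer than B stored bits remain."""
    reg = deque(s_bits)
    while len(reg) >= b:
        yield "".join(list(reg)[:b])
        reg.popleft()  # shift left by one position


# First five seeds of an arbitrary 12-bit sequence with B = 4.
first_five = []
for seed in shift_register_seeds("101001110001", 4):
    first_five.append(seed)
    if len(first_five) == 5:
        break
print(first_five)
```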

Fig. 4.

Not every seed that can be extracted from S is needed for detecting target faults. In Figure 5, an additional bit, denoted by \(a_i\) , indicates whether the seed \(S_i\) should be used for producing a test. The bits \(a_i\) form the vector \(A = \langle a_0 a_1 \ldots a_{n-1} \rangle\) . The seed \(S_i\) is used for producing a test only when \(a_i = 1\) . In the example of Figure 5, the test set consists of the tests based on \(S_0\) , \(S_1\) , and \(S_7\) . The vector A does not need to be stored if it is acceptable to apply all \(n-B+1\) tests based on S.

Fig. 5.

It should be noted that circular shift allows n seeds to be obtained from S. Experimental results indicate that the addition of \(B - 1\) seeds obtained with circular shift does not have a significant effect on the results. Therefore, circular shift is not used in this article.

3 Procedure for Reducing the Seed Sequence

This section describes a software procedure for producing a seed sequence with reduced storage requirements from a given set of seeds. The procedure produces the seed sequence S that will be stored on-chip.

3.1 Procedure Overview

The given set of seeds is denoted by \(\Psi\) , and it consists of N seeds for a B-bit LFSR. The set \(\Psi\) targets a set of faults denoted by \(F_0\) . The same set of target faults is used while reducing the seed sequence S. A second set of target faults, \(F_1\) , is used for fault simulation only.

The procedure is outlined in Figure 6. Initially, all the seeds in \(\Psi\) are concatenated to form the initial seed sequence S. The procedure forms a test set T based on S by using all the seeds from S to produce tests and including in T tests that detect faults from \(F_0\) . The procedure reduces S by identifying bits that do not contribute to any test in T and removing them from S.

Fig. 6.

After S is reduced, new seeds may be obtained from the reduced seed sequence. With the new seeds it is possible to form a new test set T. Based on the new test set T, additional bits of S may be removed. This process is repeated as long as S can be reduced.

When no additional reduction of S is possible, the test set T is used for partitioning S into non-overlapping subsequences. The subsequences are such that every test from T has a subsequence from which its seed can be obtained. This implies that, as long as the subsequences remain intact, their order can be changed without losing the ability to detect any fault from \(F_0\) . The procedure reorders the subsequences to obtain a new seed sequence.

With the new seed sequence, it is possible to obtain new seeds and a new test set. The procedure repeats the process of reducing and reordering S until a termination condition is met. The details of the procedure are described next.

3.2 Fault Simulation

A sequence \(S = \langle s_0 s_1 \ldots s_{n-1} \rangle\) yields \(n-B+1\) seeds, \(S_0\) , \(S_1, \ldots , S_{n-B}\) . Each one of the seeds can be used for producing a test. The test produced by \(S_i\) is denoted by \(t_i\) . Let \(U = \lbrace t_i : 0 \le i \le n-B \rbrace\) be the set of all the tests that can be produced by S. A fault simulation procedure for S is given as Procedure 1 and described next.

Given a set of target faults \(F_0\) , fault simulation with fault dropping of the tests in U is used for finding a test set T. A test \(t_i \in U\) is included in T only if it detects new faults from \(F_0\) .

After computing T, forward-looking reverse order fault simulation is used for removing unnecessary tests from T. This procedure simulates the test set T in reverse order. The procedure avoids simulation of a test \(t_i\) if all the faults it detected in the forward order are already detected or will be detected by tests that appear before it in the test set. A test that does not detect any new faults is removed from T.

Based on T, it is possible to define the sequence \(A = \langle a_0 a_1 \ldots a_{n-1} \rangle\) such that \(a_i = 1\) if \(t_i \in T\) , and \(a_i = 0\) otherwise.

Procedure 1: Fault simulation

(1)

Let \(F_0\) be the set of target faults. Let S be a given seed sequence of length n. Assign \(T = \emptyset\) .

(2)

For \(0 \le i \le n-B\) :

(a)

Find the test \(t_i\) produced by the seed \(S_i\) .

(b)

Simulate \(F_0\) under \(t_i\) and remove detected faults from \(F_0\) .

(c)

If any faults were removed from \(F_0\) when \(t_i\) was simulated, add \(t_i\) to T.

(3)

Apply to T forward-looking reverse order fault simulation and remove unnecessary tests.

(4)

For \(0 \le i \le n-B\) , if \(t_i \in T\) , assign \(a_i = 1\) ; otherwise, assign \(a_i = 0\) .
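A compact software sketch of Procedure 1 is given below under explicit assumptions. The callable `detect` is a hypothetical oracle that stands in for decompressing a seed and fault simulating the resulting test; it returns the set of faults the test detects. Step 3 is replaced here by a plain reverse-order pass, which is a simplification of the forward-looking variant: it re-simulates every retained test instead of skipping simulations.

```python
def select_tests(s, b, faults, detect):
    """Return (T, A): the indices i of the retained tests t_i, and the
    flags a_i. detect(seed) is a hypothetical fault simulation oracle."""
    n = len(s)
    remaining = set(faults)
    forward = []
    for i in range(n - b + 1):
        new = detect(s[i:i + b]) & remaining
        if new:                  # keep t_i only if it detects new faults
            forward.append(i)
            remaining -= new     # fault dropping
    detected = set(faults) - remaining

    # Plain reverse-order pass: simulate from last to first and keep
    # only the tests that still add coverage.
    covered = set()
    kept = []
    for i in reversed(forward):
        new = (detect(s[i:i + b]) & detected) - covered
        if new:
            kept.append(i)
            covered |= new
    kept.sort()
    a = [1 if i in kept else 0 for i in range(n)]
    return kept, a


# Toy detection table (arbitrary values, for illustration only).
table = {"1010": {"f1", "f2"}, "0100": {"f2"},
         "0111": {"f1", "f2", "f3"}, "0001": {"f4"}}
T, A = select_tests("101001110001", 4, {"f1", "f2", "f3", "f4"},
                    lambda seed: table.get(seed, set()))
print(T)  # [4, 8]
```

In this toy case the forward pass retains \(t_0\) , \(t_4\) , and \(t_8\) , and the reverse pass removes \(t_0\) because \(t_4\) detects all the faults that \(t_0\) detects.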

3.3 Reducing S

A test \(t_i \in T\) is produced by the seed \(S_i = s_i s_{i+1} \ldots s_{i+B-1}\) . Thus, the inclusion of \(t_i\) in T requires the bits \(s_i\) , \(s_{i+1}, \ldots , s_{i+B-1}\) to be retained in S.

For a bit \(s_j\) of S, let \(m_j = 1\) indicate that there is a test \(t_i \in T\) whose seed \(S_i\) includes \(s_j\) . Let \(M = \langle m_0 m_1 \ldots m_{n-1} \rangle\) . If \(m_j = 0\) , \(s_j\) can be removed from S without losing fault coverage.

For illustration, the sequence A from Figure 5 is used in Figure 7 for computing the sequence M. For every seed \(S_i\) with \(a_i = 1\) , Figure 7 shows the bits that result in \(m_j = 1\) .

Fig. 7.

In Figure 7, \(m_j = 0\) is obtained for \(j = 5\) , 6, and 11. Therefore, \(s_5\) , \(s_6\) , and \(s_{11}\) can be removed to obtain the sequence S shown at the bottom of Figure 7.
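The computation of M for this example can be reproduced with the following sketch. The function names `mark_needed_bits` and `drop_unneeded` are illustrative, and bit labels are used in place of bit values so that the surviving positions are visible.

```python
def mark_needed_bits(a, b):
    """Set m_j = 1 if some seed S_i with a_i = 1 includes bit s_j."""
    n = len(a)
    m = [0] * n
    for i, flag in enumerate(a):
        if flag:
            for j in range(i, min(i + b, n)):
                m[j] = 1
    return m


def drop_unneeded(s, m):
    """Remove every s_j with m_j = 0."""
    return [bit for bit, keep in zip(s, m) if keep]


# Configuration of Figures 5 and 7: n = 12, B = 4, tests from S_0, S_1, S_7.
a = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
m = mark_needed_bits(a, 4)
labels = [f"s{j}" for j in range(12)]
reduced = drop_unneeded(labels, m)
print(reduced)
```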

3.4 Iterative Reduction of S

Considering the seed sequences in Figure 7, the new sequence at the bottom of Figure 7 has several seeds that do not exist in the initial sequence at the top of Figure 7. Whereas the seeds \(s_0 s_1 s_2 s_3\) , \(s_1 s_2 s_3 s_4\) , and \(s_7 s_8 s_9 s_{10}\) exist in both sequences, the new sequence at the bottom of Figure 7 has the seeds \(s_2 s_3 s_4 s_7\) , \(s_3 s_4 s_7 s_8\) , and \(s_4 s_7 s_8 s_9\) that do not exist in the initial sequence. These seeds may result in new tests that will allow additional bits to be removed from S. To benefit from this observation, the procedure repeats the fault simulation process to compute a new test set and update S, as long as bits can be removed from S. The procedure for reducing S is given as Procedure 2.

Procedure 2: Reducing S

(1)

Call Procedure 1 for S to produce the test set T.

(2)

Assign \(m_j = 0\) for \(0 \le j \lt n\) .

(3)

For every test \(t_i \in T\) , assign \(m_j = 1\) for \(i \le j \le i+B-1\) .

(4)

For \(0 \le j \lt n\) , if \(m_j = 0\) , remove \(s_j\) from S.

(5)

If any bits were removed from S, go to Step 1.
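The loop structure of Procedure 2 can be sketched as follows. The callable `select_tests` is a hypothetical stand-in for Procedure 1 that returns the start indices of the seeds whose tests are retained in T. The toy selector below uses forward simulation with fault dropping only, and its detection table is arbitrary.

```python
def reduce_sequence(s, b, select_tests):
    """Repeat test selection and bit removal until no bit is removed."""
    while True:
        n = len(s)
        m = [0] * n
        for i in select_tests(s):
            for j in range(i, i + b):
                m[j] = 1
        reduced = "".join(bit for bit, keep in zip(s, m) if keep)
        if len(reduced) == n:
            return s
        s = reduced


# Toy stand-in for Procedure 1 (illustration only).
table = {"1010": {"f1"}, "0001": {"f2"}, "0000": {"f1", "f2"}}


def toy_select(s, b=4):
    remaining = {"f1", "f2"}
    chosen = []
    for i in range(len(s) - b + 1):
        new = table.get(s[i:i + b], set()) & remaining
        if new:
            chosen.append(i)
            remaining -= new
    return chosen


result = reduce_sequence("101001110001", 4, toy_select)
print(result)  # 1010000
```

In this toy run, the first pass keeps the seeds at positions 0 and 8 and leaves 8 bits. The reduced sequence contains the new seed 0000, which detects the remaining fault earlier, so the second pass removes one more bit.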

3.5 Reordering of the Seed Sequence

In Figure 8, the reduced seed sequence from Figure 7 is renumbered, and tests are computed for detecting the faults in \(F_0\) . The seeds for these tests are shown in Figure 8.

Fig. 8.

Let \(S[i,j] = \langle s_i s_{i+1} \ldots s_j \rangle\) denote the subsequence of S that consists of bits \(s_i\) , \(s_{i+1}, \ldots , s_j\) . To be able to obtain \(S_0\) from S, the subsequence \(S[0,3] = \langle s_0 s_1 s_2 s_3 \rangle\) must remain intact. In a similar way, \(S_1\) requires the subsequence \(S[1,4] = \langle s_1 s_2 s_3 s_4 \rangle\) to remain intact, and \(S_5\) requires the subsequence \(S[5,8] = \langle s_5 s_6 s_7 s_8 \rangle\) to remain intact.

The subsequences \(S[0,3]\) and \(S[1,4]\) overlap, implying that the subsequence \(S[0,4] = S[0,3] \cup S[1,4]\) must remain intact. This partitions S into two non-overlapping subsequences, \(S[0,4]\) and \(S[5,8]\) .

In general, a partition of S is obtained as follows. For every test \(t_i \in T\) , the seed \(S_i\) that produces \(t_i\) is used for defining a subsequence \(S[i,i+B-1]\) . In an iterative process, every two overlapping subsequences, \(S[i_0,j_0]\) and \(S[i_1,j_1]\) , are merged into a single subsequence \(S[i_2,j_2] = S[i_0,j_0] \cup S[i_1,j_1]\) such that \(i_2 = \min \lbrace i_0 , i_1 \rbrace\) and \(j_2 = \max \lbrace j_0 , j_1 \rbrace\) .

After obtaining non-overlapping subsequences, S can be reordered by changing the order of the subsequences. The reordering procedure is given as Procedure 3.

Procedure 3: Reordering S

(1)

Call Procedure 1 for S to produce the test set T.

(2)

Assign \(R = \emptyset\) . For every test \(t_i \in T\) , add to R the range \([i,i+B-1]\) .

(3)

For every two ranges \([i_0,j_0] \in R\) and \([i_1,j_1] \in R\) with an index x such that \(i_0 \le x \le j_0\) and \(i_1 \le x \le j_1\) , merge \([i_0,j_0]\) and \([i_1,j_1]\) into a range \([i_2,j_2]\) such that \(i_2 = \min \lbrace i_0 , i_1 \rbrace\) and \(j_2 = \max \lbrace j_0 , j_1 \rbrace\) .

(4)

Reorder the ranges in R. Reorder S based on R.
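Steps 2 to 4 of Procedure 3 can be sketched as follows. The function names are illustrative, the list of seed start indices is assumed to come from Procedure 1, and the sketch assumes that every bit of S lies in some range, which holds once S can no longer be reduced. Random reordering is shown; any other ordering of the blocks can be substituted.

```python
import random


def merge_ranges(starts, b):
    """Merge the ranges [i, i+B-1] of the retained tests where they overlap."""
    merged = []
    for i in sorted(starts):
        lo, hi = i, i + b - 1
        if merged and lo <= merged[-1][1]:   # shares an index with the previous range
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def reorder(s, ranges, rng):
    """Randomly permute the non-overlapping subsequences of s."""
    blocks = [s[lo:hi + 1] for lo, hi in ranges]
    rng.shuffle(blocks)
    return [bit for block in blocks for bit in block]


# Example of Figure 8: n = 9, B = 4, tests based on S_0, S_1, and S_5.
ranges = merge_ranges([0, 1, 5], 4)
print(ranges)  # [(0, 4), (5, 8)]
labels = [f"s{j}" for j in range(9)]
reordered = reorder(labels, ranges, random.Random(0))
```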

Considering the reordering strategy, experimental results with various reordering heuristics indicate that a random reordering produces the best results overall for the first iterations. This can be explained as follows. Pairs of consecutive subsequences create options for seeds that are not available from each subsequence alone. Therefore, reordering of the subsequences in S creates new options for seeds that were not available before reordering. Without performing fault simulation, it is not possible to predict which pairs of subsequences should be placed consecutively to create new seeds that detect target faults. A random reordering explores the search space without requiring fault simulation.

For many benchmark circuits, random reordering reduces the number of subsequences to a small number. As the number of subsequences is reduced, every bit of the seed sequence is utilized for more seeds. As a result, the number of bits in the seed sequence approaches its minimum value.

When random reordering leaves a large number of subsequences, a heuristic that was found to be useful experimentally is to reorder the subsequences from low to high number of bits. When subsequences with small numbers of bits appear at the beginning of the seed sequence, they tend to be better utilized. This occurs since fault simulation of the seeds is performed in the order \(S_0\) , \(S_1\) , \(\ldots\) with fault dropping. Consequently, the first seeds tend to detect the largest numbers of faults, and as additional seeds are simulated, fewer faults remain to be detected. Considering subsequences instead of seeds, the same effect occurs. Therefore, the first subsequences tend to be utilized better for detecting more faults, and later subsequences, with more bits and fewer detected faults, can be reduced.

To benefit from both strategies, the procedure initially uses random reordering. After an iteration with random reordering where S is not reduced, reordering from low to high number of bits is used for the next iteration. The strategies alternate after every iteration where S is not reduced.

3.6 Termination Condition

In the procedure from Figure 6, after reordering the seed sequence, the new seed sequence is reduced and reordered again. The procedure terminates after a constant number of consecutive iterations where the number of bits in S is not reduced.
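The alternation between reordering strategies and the termination condition can be sketched together. The callables `reduce_step` and `reorder_step` are hypothetical stand-ins for Procedures 2 and 3, and `limit` is the constant number of consecutive non-reducing iterations. The article does not state whether a reordered sequence is kept when it yields no reduction; this sketch keeps it, which is an assumption.

```python
def run_outer_loop(s, reduce_step, reorder_step, limit):
    """Reduce and reorder until `limit` consecutive iterations fail to
    shorten s, switching the reordering strategy after each such failure."""
    strategy = "random"
    stalled = 0
    s = reduce_step(s)
    while stalled < limit:
        new_s = reduce_step(reorder_step(s, strategy))
        if len(new_s) < len(s):
            stalled = 0
        else:
            stalled += 1
            strategy = "by_length" if strategy == "random" else "random"
        s = new_s
    return s


# Toy stand-ins that only exercise the control flow.
log = []


def toy_reorder(s, strategy):
    log.append(strategy)
    return s


def toy_reduce(s):
    return s[:-1] if len(s) > 3 else s


result = run_outer_loop("abcdef", toy_reduce, toy_reorder, limit=2)
print(result, log)
```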

4 Experimental Results

The procedure from Figure 6 was applied to benchmark circuits as described in this section.

4.1 Setup

The compressed test set \(\Psi\) is a compact set of seeds targeting single stuck-at faults. The set of target faults \(F_0\) also consists of single stuck-at faults. A second set of faults, \(F_1\) , consists of single-cycle gate-exhaustive faults. Single-cycle gate-exhaustive faults are a superset of single-cycle cell-aware faults when the same gates or cells are used. Single-cycle gate-exhaustive faults are more difficult to detect accidentally than single stuck-at faults since they have more activation conditions that need to be satisfied. Fault simulation of \(F_1\) is carried out under \(\Psi\) . In addition, for the initial seed sequence S, and at the end of every iteration of the procedure from Figure 6, fault simulation of \(F_1\) is carried out using all the seeds that can be produced from S.

The procedure terminates after 128 consecutive iterations where the seed sequence S is not reduced. This large number of iterations is used for demonstrating the extent to which S can be reduced. The results demonstrate that the procedure can terminate earlier as discussed at the end of this section.

4.2 Results and Comparison

The results are shown in Tables 1–3. There are several rows for every circuit. The first row, with a dash under column I, uses the set \(\Psi\) containing N seeds, each one stored separately. This row provides a baseline for comparison with an approach that stores compressed tests. The storage requirements of \(\Psi\) are reduced by test data compression, but without the hardware overhead of using each stored compressed test to apply several different tests. This is an appropriate comparison with the procedure from Figure 6 that also has no additional hardware requirements beyond storing a set of compressed tests.

Table 1.

circuit	inp	B	I	reor	part	bits	frac	stuck-at tests	stuck-at f.c. (%)	gate-exhaustive tests	gate-exhaustive f.c. (%)	ntime
sasc	132	13	–	0	0	533	1.000	41	100.000	41	67.374	1.00
sasc	132	13	0	0	0	533	1.000	41	100.000	267	97.675	2.00
sasc	132	13	1	0	0	181	0.340	51	100.000	152	87.602	12.50
sasc	132	13	3	0	3	156	0.293	45	100.000	132	84.502	29.50
usb_phy	112	18	–	0	0	648	1.000	36	100.000	36	82.536	1.00
usb_phy	112	18	0	0	0	648	1.000	36	100.000	177	99.159	2.60
usb_phy	112	18	1	0	0	363	0.560	48	100.000	166	98.383	21.00
usb_phy	112	18	2	0	11	301	0.465	52	100.000	157	98.254	37.00
usb_phy	112	18	6	1	3	248	0.383	51	100.000	153	96.636	87.00
s35932	1,763	13	–	0	0	741	1.000	57	89.809	57	98.509	1.00
s35932	1,763	13	0	0	0	741	1.000	57	89.809	110	100.000	1.32
s35932	1,763	13	1	0	0	175	0.236	62	89.809	102	99.971	14.05
b04	78	28	–	0	0	952	1.000	34	99.851	34	79.263	1.00
b04	78	28	0	0	0	952	1.000	34	99.851	139	89.516	4.00
b04	78	28	1	0	0	701	0.736	49	99.851	129	89.055	12.50
b04	78	28	2	0	15	585	0.614	47	99.851	116	88.537	23.00
b04	78	28	3	0	10	557	0.585	46	99.851	118	88.537	35.50
b04	78	28	21	0	4	461	0.484	46	99.851	105	88.249	123.00
s1423	91	18	–	0	0	990	1.000	55	99.076	55	90.381	1.00
s1423	91	18	0	0	0	990	1.000	55	99.076	189	97.641	2.70
s1423	91	18	1	0	0	494	0.499	60	99.076	169	96.370	13.00
s1423	91	18	9	1	9	343	0.346	60	99.076	150	95.402	63.75
systemcdes	320	14	–	0	0	1,428	1.000	102	100.000	102	93.391	1.00
systemcdes	320	14	0	0	0	1,428	1.000	102	100.000	298	99.801	1.83
systemcdes	320	14	1	0	0	343	0.240	108	100.000	244	98.262	8.20
systemcdes	320	14	6	1	4	266	0.186	109	100.000	197	96.790	29.88
b07	53	36	–	0	0	1,476	1.000	41	99.915	41	59.459	1.00
b07	53	36	0	0	0	1,476	1.000	41	99.915	210	79.158	5.25
b07	53	36	1	0	0	965	0.654	50	99.915	176	76.559	30.00
b07	53	36	3	0	15	817	0.554	55	99.915	171	75.624	68.00
b07	53	36	6	0	8	721	0.488	55	99.915	164	75.156	135.00
simple_spi	146	38	–	0	0	1,748	1.000	46	100.000	46	55.066	1.00
simple_spi	146	38	0	0	0	1,748	1.000	46	100.000	663	97.138	4.13
simple_spi	146	38	1	0	0	1,077	0.616	63	100.000	591	94.883	12.83
simple_spi	146	38	2	0	16	991	0.567	64	100.000	593	94.959	20.17
simple_spi	146	38	7	1	9	870	0.498	66	100.000	559	93.769	54.67
des_area	367	14	–	0	0	2,212	1.000	158	100.000	158	72.424	1.00
des_area	367	14	0	0	0	2,212	1.000	158	100.000	1,103	99.920	1.77
des_area	367	14	1	0	0	443	0.200	155	100.000	377	86.803	9.84
des_area	367	14	2	0	3	428	0.193	155	100.000	364	86.019	15.35
i2c	145	43	–	0	0	2,494	1.000	58	100.000	58	73.974	1.00
i2c	145	43	0	0	0	2,494	1.000	58	100.000	385	92.564	5.20
i2c	145	43	1	0	0	1,980	0.794	66	100.000	359	91.352	22.14
i2c	145	43	17	0	17	1,733	0.695	75	100.000	356	90.884	251.57

Table 1. Experimental Results ( \(|S| \lt 2,500\) )

Table 2.

circuit	inp	B	I	reor	part	bits	frac	stuck-at tests	stuck-at f.c. (%)	gate-exhaustive tests	gate-exhaustive f.c. (%)	ntime
systemcaes	928	29	–	0	0	3,161	1.000	109	99.995	109	79.613	1.00
systemcaes	928	29	0	0	0	3,161	1.000	109	99.995	1,375	98.701	2.94
systemcaes	928	29	1	0	0	1,292	0.409	189	99.995	837	93.746	19.00
systemcaes	928	29	2	0	13	1,232	0.390	192	99.995	816	93.100	36.80
systemcaes	928	29	8	1	3	930	0.294	193	99.995	656	91.381	101.80
wb_dma	738	47	–	0	0	3,948	1.000	84	100.000	84	73.840	1.00
wb_dma	738	47	0	0	0	3,948	1.000	84	100.000	985	92.756	3.02
wb_dma	738	47	1	0	0	3,227	0.817	123	100.000	945	91.224	18.12
wb_dma	738	47	2	0	44	2,704	0.685	145	100.000	875	90.449	41.34
wb_dma	738	47	21	1	11	2,356	0.597	160	100.000	866	89.314	299.03
s5378	214	36	–	0	0	4,788	1.000	133	99.131	133	67.605	1.00
s5378	214	36	0	0	0	4,788	1.000	133	99.131	1,226	95.114	3.64
s5378	214	36	1	0	0	3,041	0.635	201	99.131	1,084	92.703	42.81
s5378	214	36	2	0	34	2,756	0.576	215	99.131	1,059	92.034	59.06
s5378	214	36	6	0	14	2,372	0.495	226	99.131	970	90.571	137.51
s9234	247	75	–	0	0	10,050	1.000	134	93.475	134	70.825	1.00
s9234	247	75	0	0	0	10,050	1.000	134	93.475	1,128	85.887	7.93
s9234	247	75	1	0	0	8,769	0.873	173	93.475	1,102	85.187	103.82
s9234	247	75	4	0	60	7,918	0.788	200	93.475	1,068	84.772	295.15
s9234	247	75	44	1	40	7,023	0.699	215	93.475	1,058	84.783	1,951.33
s9234	247	75	311	1	21	6,007	0.598	239	93.475	1,069	84.309	9,027.45
aes_core	788	28	–	0	0	10,640	1.000	380	100.000	380	98.259	1.00
aes_core	788	28	0	0	0	10,640	1.000	380	100.000	1,026	100.000	1.60
aes_core	788	28	1	0	0	1,927	0.181	587	100.000	1,015	99.958	16.28
s15850	611	57	–	0	0	11,229	1.000	197	96.682	197	73.285	1.00
s15850	611	57	0	0	0	11,229	1.000	197	96.682	1,504	82.185	6.80
s15850	611	57	1	0	0	9,635	0.858	246	96.682	1,423	81.543	93.85
s15850	611	57	3	0	104	8,787	0.783	269	96.682	1,347	81.096	180.76
s15850	611	57	29	0	61	7,813	0.696	289	96.682	1,325	80.909	1,077.27
s15850	611	57	155	0	33	6,717	0.598	309	96.682	1,252	80.440	4,001.87
s13207	700	47	–	0	0	11,797	1.000	251	98.462	251	62.259	1.00
s13207	700	47	0	0	0	11,797	1.000	251	98.462	2,105	87.602	6.61
s13207	700	47	1	0	0	7,361	0.624	328	98.462	1,887	82.325	76.19
s13207	700	47	2	0	80	6,930	0.587	336	98.462	1,864	82.443	134.07
s13207	700	47	6	0	32	5,881	0.499	355	98.462	1,779	80.457	305.02
s13207	700	47	55	1	6	4,686	0.397	377	98.462	1,662	78.411	1,584.23
spi	274	44	–	0	0	15,840	1.000	360	99.985	360	67.073	1.00
spi	274	44	0	0	0	15,840	1.000	360	99.985	3,065	95.565	3.49
spi	274	44	1	0	0	3,813	0.241	459	99.985	2,205	87.826	17.67
spi	274	44	2	0	34	2,849	0.180	472	99.985	1,915	85.186	33.14

Table 2. Experimental Results ( \(2,500 \le |S| \lt 20,000\) )

Table 3.

circuit	inp	B	I	reor	part	bits	frac	stuck-at tests	stuck-at f.c. (%)	gate-exhaustive tests	gate-exhaustive f.c. (%)	ntime
s38584	1,464	98	–	0	0	21,364	1.000	218	95.852	218	88.313	1.00
s38584	1,464	98	0	0	0	21,364	1.000	218	95.852	2,629	98.998	4.69
s38584	1,464	98	1	0	0	14,481	0.678	449	95.852	2,487	98.589	81.58
s38584	1,464	98	3	0	49	12,007	0.562	529	95.852	2,475	98.403	217.85
s38584	1,464	98	7	0	15	10,524	0.493	532	95.852	2,358	98.222	463.96
b20	527	119	–	0	0	28,322	1.000	238	93.304	238	64.250	1.00
b20	527	119	0	0	0	28,322	1.000	238	93.304	1,957	85.734	9.22
b20	527	119	1	0	0	24,922	0.880	288	93.304	1,920	85.506	144.43
b20	527	119	10	0	138	22,582	0.797	295	93.304	1,911	85.173	885.98
b20	527	119	351	1	85	19,759	0.698	316	93.308	1,820	84.717	14,763.71
b15	483	113	–	0	0	30,058	1.000	266	98.580	266	59.444	1.00
b15	483	113	0	0	0	30,058	1.000	266	98.580	2,517	71.206	17.13
b15	483	113	1	0	0	22,741	0.757	349	98.620	2,423	70.384	131.70
b15	483	113	4	0	126	20,874	0.694	362	98.620	2,418	70.318	399.29
s38417	1,664	111	–	0	0	31,413	1.000	283	99.471	283	70.723	1.00
s38417	1,664	111	0	0	0	31,413	1.000	283	99.471	4,600	85.155	13.86
s38417	1,664	111	1	0	0	28,873	0.919	341	99.471	4,540	84.963	100.46
s38417	1,664	111	2	0	212	27,738	0.883	365	99.471	4,495	84.990	196.70
s38417	1,664	111	7	0	136	24,931	0.794	423	99.471	4,353	84.540	556.14
s38417	1,664	111	26	0	75	21,898	0.697	477	99.471	4,222	84.180	1,710.20
s38417	1,664	111	114	0	30	18,764	0.597	519	99.471	4,039	83.935	4,947.02
b14	280	128	–	0	0	37,120	1.000	290	94.960	290	72.068	1.00
b14	280	128	0	0	0	37,120	1.000	290	94.960	982	83.476	14.13
b14	280	128	1	0	0	28,388	0.765	328	94.960	955	83.296	168.40
b14	280	128	6	0	153	25,962	0.699	334	94.960	945	83.283	677.34
b14	280	128	341	1	97	22,253	0.599	337	94.970	926	83.251	16,067.46
tv80	372	109	–	0	0	40,330	1.000	370	99.527	370	77.264	1.00
tv80	372	109	0	0	0	40,330	1.000	370	99.527	2,986	91.365	9.66
tv80	372	109	1	0	0	28,982	0.719	491	99.527	2,823	90.310	100.67
tv80	372	109	2	0	156	26,541	0.658	513	99.527	2,754	90.091	188.91
tv80	372	109	5	0	99	24,086	0.597	530	99.527	2,730	89.447	378.33
tv80	372	109	20	0	51	20,006	0.496	560	99.527	2,653	88.843	1,234.77
tv80	372	109	200	0	10	16,094	0.399	578	99.527	2,489	87.963	6,867.42
b17	1,444	94	–	0	0	44,086	1.000	469	93.863	469	44.293	1.00
b17	1,444	94	0	0	0	44,086	1.000	469	93.863	5,936	53.200	22.50
b17	1,444	94	1	0	0	30,800	0.699	744	94.902	5,604	52.833	171.90
b17	1,444	94	4	0	99	25,867	0.587	818	95.513	5,432	53.538	454.67

Table 3. Experimental Results ( \(|S| \ge 20,000\) )

The case where the initial seed sequence S, with \(N \cdot B\) bits, is used without any reduction is shown in the second row. Column I has a zero for this case.

Additional rows of Tables 1–3 show the results of the procedure from Figure 6 as it reduces S. The row with \(I = 1\) corresponds to iteration 1. The next rows correspond to the iterations where the number of bits in S is decreased below 0.9, 0.8, 0.7, \(\ldots\) of the number of bits required for \(\Psi\) .

For every iteration, after the name of the circuit, column inp shows the number of inputs (including primary inputs and present-state variables or flip-flops). Column B shows the number of LFSR bits. Column I shows the iteration of the procedure from Figure 6. For \(I \ge 2\) , column reor has a zero for random reordering and a one for reordering from low to high number of bits. Column part shows the number of subsequences into which the seed sequence S is partitioned. Column bits shows the number of bits in \(\Psi\) or S. Column frac shows the number of bits as a fraction of the number of bits in \(\Psi\) . The stuck-at columns show the number of tests (and seeds) required for detecting single stuck-at faults, followed by the stuck-at fault coverage. The gate-exhaustive columns show the number of tests (and seeds) required for detecting single-cycle gate-exhaustive faults, followed by the single-cycle gate-exhaustive fault coverage. Column ntime shows the normalized runtime, defined as follows. Let the runtime for fault simulation of \(\Psi\) be \(\rho _0\) . Let the total runtime of the procedure from Figure 6 up to iteration I be \(\rho _I\) . The normalized runtime is \(\rho _I / \rho _0\) .
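As a worked instance of the frac column, consider the tv80 rows of Table 3. The set \(\Psi\) has \(N = 370\) seeds of \(B = 109\) bits, and iteration 1 leaves 28,982 bits in S:

\[ N \cdot B = 370 \cdot 109 = 40{,}330, \qquad \mathit{frac} = \frac{28{,}982}{40{,}330} \approx 0.719 . \]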

4.3 Discussion

The following points can be seen from Tables 1–3. Iteration 1 reduces the number of bits in the initial seed sequence significantly. This is achieved by considering approximately \(N \cdot B\) seeds based on the initial seed sequence S instead of the N seeds included in \(\Psi\) , and without changing the order of the seed sequence. In the case of tv80, iteration 1 reduces the number of bits to 0.719 of the number of bits required for \(\Psi\) , which is already a compressed test set. Thus, the additional storage reductions are achieved on top of those achieved by test data compression that is used for \(\Psi\) .

For iteration \(I \ge 2\) , the seed sequence is partitioned into a number of subsequences that depends on the circuit. The number of subsequences varies from a few to over 100. Even with a small number of subsequences, additional iterations typically achieve an additional reduction in the number of bits by reordering the seed sequence before attempting to reduce it again. In the case of tv80, the last iteration reduces the storage requirements to 0.399 relative to \(\Psi\) .

The number of subsequences typically decreases as additional iterations are performed and the number of bits in S is reduced. The reduction in the number of subsequences implies that more overlapping seeds are obtained, and every bit of the seed sequence is utilized for more seeds.

The fault coverage of \(F_1\) is increased significantly when S is considered instead of \(\Psi\) in iteration 0, before S is reduced. As S is reduced, the fault coverage of \(F_1\) may decrease, but it remains significantly higher than that of \(\Psi\) . The fault coverage of \(F_0\) is not allowed to decrease. The same can be required for \(F_1\) . This option was not implemented so as not to limit the reduction of S.

The number of tests available from S is equal to \(n-B+1\) , where n is the number of bits in S. For LBIST, these numbers of tests are acceptable. Tables 1–3 also show the numbers of tests needed for detecting stuck-at and single-cycle gate-exhaustive faults. These numbers are lower than \(n-B+1\) , with more tests needed for single-cycle gate-exhaustive faults than for stuck-at faults. The advantage of applying all \(n-B+1\) tests is the potential for defect detection.
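The count \(n-B+1\) is simply the number of positions at which a window of B consecutive bits fits into an n-bit sequence. A minimal sketch, assuming for illustration that S is held as a string of '0' and '1' characters (the on-chip representation is not specified here, and the function name is hypothetical):

```python
from typing import List


def seeds_from_sequence(s: str, b: int) -> List[str]:
    """Return every window of b consecutive bits of s, in order of position."""
    if b <= 0 or b > len(s):
        return []
    return [s[i:i + b] for i in range(len(s) - b + 1)]


s = "1011001110"  # toy sequence with n = 10 bits
b = 4             # toy LFSR length
seeds = seeds_from_sequence(s, b)

# One seed per starting position: n - B + 1 windows in total.
assert len(seeds) == len(s) - b + 1
print(seeds)
```

Consecutive seeds share \(B-1\) bits, which is why the same stored bits can serve many seeds.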

Note that the number of tests required for detecting single stuck-at faults increases as the seed sequence is reduced. This occurs because the seeds overlap more as the seed sequence is reduced.

Random reordering is used in more iterations than reordering from low to high number of bits. This explains why column reor has more zeros than ones. Considering all the iterations (and not only the ones reported), reordering from low to high number of bits is used in a significant number of iterations and has a significant effect on the results.

4.4 Computational Effort

The normalized runtime measures the computational effort of the procedure from Figure 6 in terms of fault simulation time. This is appropriate since the procedure performs fault simulation to support the reduction of S.

The runtime for iteration 0 is higher than the runtime for simulating \(\Psi\) because more tests are simulated (approximately \(N \cdot B\) instead of N). The runtime per iteration decreases as additional iterations are performed and the number of bits in S is reduced.

The normalized runtime for iteration 0 does not increase with the size of the circuit, and larger circuits sometimes have a lower normalized runtime. This indicates that the procedure scales similarly to a fault simulation procedure, which is manageable for circuits of any size.

The highest normalized runtimes in Tables 1–3 are obtained when the procedure performs a large number of iterations. It is possible to limit the number of iterations to limit the runtime. Figure 9 illustrates this point by considering the reduction in the storage requirements of S as a function of the normalized runtime for b14. This circuit was selected since it has the highest normalized runtime at termination. Iteration 19 is marked in Figure 9.

Fig. 9.

Based on the data in Figure 9, and considering the results for other circuits in Tables 1–3, it is possible to terminate the procedure after 20 iterations without a significant cost in terms of the storage requirements. In general, it is possible to set a target for the minimum acceptable reduction in storage requirements per iteration. When the reduction drops below the minimum, the procedure can terminate. A similar approach can be applied to the fault coverage of \(F_1\) . When this fault coverage drops below a preselected bound relative to the fault coverage obtained in iteration 0, the procedure can terminate. In this case, it is possible to select a solution where the fault coverage of \(F_1\) has a local maximum. For example, for s13207, instead of a solution with a storage reduction to 0.499 and a fault coverage of 80.457% for \(F_1\) , it is possible to use a solution with a storage reduction to 0.493 and a fault coverage of 80.882% for \(F_1\) .
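The storage-based stopping rule described above can be sketched as follows. The function name, the use of a relative (rather than absolute) reduction, and the numbers in the example are assumptions made for illustration; they are not taken from the article or from Figure 9.

```python
from typing import Sequence


def stopping_iteration(bits_per_iteration: Sequence[int], min_reduction: float) -> int:
    """Return the first iteration whose relative reduction in the number of
    bits falls below min_reduction, or the last iteration if none does."""
    for i in range(1, len(bits_per_iteration)):
        prev, cur = bits_per_iteration[i - 1], bits_per_iteration[i]
        if (prev - cur) / prev < min_reduction:
            return i
    return len(bits_per_iteration) - 1


# Hypothetical numbers of bits in S after iterations 0, 1, 2, ...
history = [1000, 700, 600, 595, 594]
print(stopping_iteration(history, min_reduction=0.01))
```

An analogous check on the fault coverage of \(F_1\), relative to its value in iteration 0, would follow the same pattern.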

4.5 Further Comparison

In Tables 1–3 the first row corresponds to an approach that stores a set \(\Psi\) of compressed tests on-chip and uses each compressed test from \(\Psi\) to apply a single test to the circuit. The procedure from Figure 6 is compared with this case to demonstrate the reduced storage requirements relative to a compressed test set and the improved fault coverage achieved for \(F_1\) .

In Table 4, the procedure from Figure 6 is compared with the LBIST approach from [28]. In [28], the compressed tests from \(\Psi\) are partitioned into subvectors, and subvectors are stored on-chip. Compressed tests are formed on-chip using pseudo-random combinations of subvectors. In this approach, each subvector may contribute to the formation of several different compressed tests. A software procedure described in [28] takes advantage of the ability to use each subvector multiple times to reduce the storage requirements and improve the fault coverage of gate-exhaustive faults. The main hardware cost of the approach from [28] is the need for multiplexers, controlled by an LFSR, to select combinations of subvectors that will form compressed tests. For parameters l, p, and \(|V|\), the approach from [28] requires p l-bit multiplexers with \(|V|\) data inputs and \(\lceil \log_2(|V|) \rceil\) select inputs.
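To make the multiplexer cost concrete, the sketch below evaluates these quantities for given l, p, and \(|V|\). The function and field names are hypothetical. The stored-bits figure \(l \cdot |V|\) is an observation that matches column bits for [28] in Table 4, not a formula quoted from [28].

```python
def mux_parameters(l: int, p: int, v: int) -> dict:
    """Multiplexer parameters for p l-bit multiplexers with v data inputs."""
    if l < 1 or p < 1 or v < 1:
        raise ValueError("l, p, and v must be positive")
    return {
        "multiplexers": p,
        "width_bits": l,
        "data_inputs": v,
        # (v - 1).bit_length() equals ceil(log2(v)) for every integer v >= 1.
        "select_inputs": (v - 1).bit_length(),
        # Assumption: l * v coincides with column bits for [28] in Table 4.
        "stored_bits": l * v,
    }


# Parameters listed for tv80 in Table 4: l = 7, p = 16, |V| = 124.
print(mux_parameters(7, 16, 124))
```

For tv80 this gives 16 multiplexers, each 7 bits wide with 124 data inputs and 7 select inputs.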

Table 4.

circuit	l	p	\(|V|\)	bits [28]	bits S	tests [28]	tests S	stuck-at [28]	stuck-at S	gate-exh [28]	gate-exh S
sasc	1	13	2	2	156	416	143	100.000	100.000	93.923	84.502
usb_phy	1	18	2	2	248	2,913	230	100.000	100.000	99.677	96.636
s35932	1	13	2	2	175	232	162	89.809	89.809	99.996	99.971
b04	1	28	2	2	461	46,318	433	99.851	99.851	98.560	88.249
s1423	1	18	2	2	343	25,579	325	99.076	99.076	99.758	95.402
systemcdes	1	14	2	2	266	1,297	252	100.000	100.000	99.646	96.790
b07	5	8	22	110	721	963,258	685	99.915	99.915	99.636	75.156
simple_spi	2	19	3	6	870	620,445	832	100.000	100.000	99.873	93.769
des_area	1	14	2	2	428	739	414	100.000	100.000	93.432	86.019
i2c	15	3	35	525	1,733	158,072	1,690	100.000	100.000	97.797	90.884
systemcaes	1	29	2	2	930	39,418	901	99.995	99.995	99.989	91.381
wb_dma	10	5	277	2,770	2,356	986,814	2,309	*99.989	100.000	99.577	89.314
s5378	1	36	2	2	2,372	53,432	2,336	99.131	99.131	99.533	90.571
s9234	3	25	8	24	6,007	996,368	5,932	*92.796	93.475	96.131	84.309
aes_core	1	28	2	2	1,927	3,548	1,899	100.000	100.000	99.997	99.958
s15850	4	15	15	60	6,717	999,797	6,660	*96.119	96.682	92.853	80.440
s13207	1	47	2	2	4,686	190,271	4,639	98.462	98.462	97.623	78.411
spi	6	8	51	306	2,849	683,691	2,805	99.985	99.985	99.931	85.186
s38584	5	20	28	140	10,524	995,504	10,426	*95.796	95.852	99.766	98.222
b20	5	24	29	145	19,759	997,430	19,640	*88.861	93.308	86.057	84.717
b15	5	23	32	160	20,874	999,779	20,761	*97.708	98.620	83.513	70.318
s38417	4	28	14	56	18,764	998,535	18,653	*99.343	99.471	95.892	83.935
b14	3	43	5	15	22,253	973,582	22,125	*85.883	94.970	78.680	83.251
tv80	7	16	124	868	16,094	998,072	15,985	*98.919	99.527	97.424	87.963

Table 4. Comparison with [28]

Table 4 is organized as follows. After the name of the circuit, the values of the parameters l, p, and \(|V|\) from [28] are given. Next, the following parameters of the approach from [28] and the approach suggested in this article are compared. Column bits shows the number of bits that need to be stored. Column tests shows the number of applied tests. For the approach suggested in this article, the number of applied tests is computed as \(n-B\), where n is the number of bits in S and B is the number of LFSR bits, assuming that all the tests based on S will be applied. Column stuck-at shows the stuck-at fault coverage achieved. When the approach from [28] loses stuck-at fault coverage, the fault coverage is marked with an asterisk. Column gate-exh shows the single-cycle gate-exhaustive fault coverage.

The procedure from [28] applies up to 1 million tests. For several of the circuits considered, this bound on the number of tests is not sufficient for achieving the single stuck-at fault coverage of \(\Psi\). This limitation is common with LBIST, and a small fault coverage loss is tolerated by many LBIST approaches. This shortcoming does not exist with the approach suggested in this article, which is guaranteed to achieve the single stuck-at fault coverage of the deterministic test set \(\Psi\).

The advantages of the approach from [28] over the approach suggested in this article are a reduced number of bits (for all the circuits in Table 4 except wb_dma) and an increased single-cycle gate-exhaustive fault coverage (for all the circuits except b14). Both of these advantages are achieved at the cost of a significantly increased number of applied tests.

Considering the hardware cost, the need for additional hardware is typical of LBIST approaches, and exists in [28] as well. It does not exist with the storage scheme suggested in this article.

5 Concluding Remarks

This article described a new storage scheme for LFSR seeds as part of a storage-based LBIST approach. Under the suggested storage scheme, a set \(\Psi\) of N seeds for a B-bit LFSR is translated into a sequence of \(N \cdot B\) consecutive bits denoted by S. Every B consecutive bits of S are considered as a seed. With \(N \cdot B\) bits in S, the sequence S yields close to \(N \cdot B\) seeds, magnifying the test data stored in S almost B times. The article described a procedure that uses the extra tests to reduce S without losing fault coverage for a set of target faults \(F_0\) . The reduction is achieved by forming a test set T and removing from S bits that do not contribute to T. When no further reduction is possible, the sequence is reordered by finding independent subsequences and changing their order. A second set of target faults, \(F_1\) , is simulated to demonstrate that this approach increases the fault coverage for \(F_1\) . Experimental results for benchmark circuits showed significant reductions in the storage requirements with a significant increase in the fault coverage of \(F_1\) .
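The reduction step summarized above can be illustrated with a small sketch. It assumes that the test set T is described by the starting positions in S of the seeds that must be kept, and that S is held as a string of bits. Both the function name and this representation are assumptions made for illustration, and the sketch is not the procedure from Figure 6.

```python
from typing import Iterable


def drop_unused_bits(s: str, b: int, needed_starts: Iterable[int]) -> str:
    """Keep only the bits of s that belong to at least one needed b-bit seed."""
    keep = [False] * len(s)
    for start in needed_starts:
        if start < 0 or start + b > len(s):
            raise ValueError("seed window lies outside the sequence")
        for k in range(start, start + b):
            keep[k] = True
    return "".join(bit for bit, used in zip(s, keep) if used)


s = "110100101101"         # toy sequence with 12 bits
b = 4                      # toy LFSR length
needed_starts = [0, 2, 8]  # hypothetical positions of the seeds in T
reduced = drop_unused_bits(s, b, needed_starts)

# All bits of a needed seed are kept, so each seed stays contiguous.
for start in needed_starts:
    assert s[start:start + b] in reduced
print(len(s), len(reduced))
```

Removing bits also creates windows across the junctions that did not exist in S. In this sketch, only the seeds listed in needed_starts are guaranteed to survive.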

References

[1] P. H. Bardell, W. H. McAnney, and J. Savir. 1987. Built-in Test for VLSI: Pseudorandom Techniques. Wiley Interscience.
