# Exploring Subsets of Standard Cell Libraries to Exploit Natural Fault Masking Capabilities for Reliable Logic

Drew C. Ness, University of Minnesota, Department of Scientific Computation, EECS building, 200 Union St SE, Minneapolis, MN, 55455-0167, dness@ece.umn.edu

Christian J. Hescott, University of Minnesota, Department of Electrical and Computer Engineering, EECS building, 200 Union St SE, Minneapolis, MN, 55455-0167, research@hescott.com

David J. Lilja, University of Minnesota, Department of Electrical and Computer Engineering, EECS building, 200 Union St SE, Minneapolis, MN, 55455-0167, lilja@umn.edu

## ABSTRACT

Deep submicron technology is expected to be plagued by many reliability issues, including soft errors in logic. To address this, we demonstrate how the natural fault masking characteristics of logical functions can be exploited by exploring the design space of cell subsets selected from within a cell library prior to synthesis. Subset selection alone is shown to improve the reliability of combinational logic circuits by more than 35%. We compare how subset libraries affect the trade-offs between reliability, area, power, and performance. Further, we show that the reduced cell library size brings added benefits to the design.

### Categories and Subject Descriptors

[Testing, Reliability, Fault-Tolerance]

### Keywords

Cell library, logical fault-masking, fault-tolerance

## 1. INTRODUCTION

Research is increasingly focusing on the growing soft error rates (SERs) in CMOS and the related problems [4, 5, 21]. Soft errors arise from a higher susceptibility to radiation (alpha and neutron), temperature (environment), power supply and ground noise, and electromagnetic interference, among others [7, 19, 26]. Dealing with these errors has traditionally been a problem for memory designers; however, recent studies indicate that within the next decade soft errors could be as great a problem in logic as they are in memory now [21]. As error rates scale, logic designers will need more options to combat them.

Traditional methods for dealing with soft errors are generally quite costly in terms of speed, power consumption, and area. These costs are usually unacceptable for many commercial applications [3]. In this paper, we present an approach that utilizes the natural fault masking characteristics of logic with area, power, and delay costs kept lower than traditional methods. Additionally, our approach trades the complexity of many fault-tolerant approaches for time spent exploring the design space to achieve this increased reliability.

GLSVLSI'07, March 11–13, 2007, Stresa-Lago Maggiore, Italy. Copyright 2007 ACM 978-1-59593-605-9/07/0003 ...\$5.00.

Our method utilizes standard cell libraries and commercial synthesis tools. We attempt to increase the barriers between an error signal and a circuit's primary outputs (POs). By limiting the choices for synthesis to gates that have naturally higher fault masking characteristics, we gain reliability at the cost of added area, power, and delay. We can also benefit from the added efficiency of the synthesis tools that arises from the simplified cell libraries, in some cases reducing delay and/or power.

Our method and preliminary results are presented in this paper.

## 2. MOTIVATION

Reducing the failure rate of circuits due to errors is our primary goal. The process through which a soft error occurs in combinational logic roughly follows this order: a transient event occurs, the event must provide a sufficient charge, a device must be sensitive to the event, and there must be a sensitive path from the device to an output [13, 18]. If no such sensitive path exists, the signal is said to be logically masked [13]. Logical masking is one of three dominant masking mechanisms in circuits, along with electrical and timing masking [13, 18, 21, 23].

Logical masking occurs when a gate masks an error on one of its inputs due to its logical function. Consider a typical 2-input NAND gate. If one of the two inputs is in error and the other input has a logical value of 0, the gate will produce the correct result regardless of the error. If the other input instead has a logical value of 1, the gate output is the inverse of the erroneous input, so the error on that input propagates to the output. An error occurring in the gate itself also produces an erroneous output.
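The NAND example can be checked by brute force. The following is a minimal sketch, not part of the original experiments: it assumes all input vectors are equally likely and counts the fraction of single-input bit flips that leave a gate's output unchanged. The function name `masking_fraction` and the gate helpers are illustrative names only.

```python
from itertools import product

def masking_fraction(gate, n_inputs):
    """Fraction of single-input bit flips that leave the gate output unchanged.

    Assumes every input vector is equally likely (an illustrative assumption).
    """
    masked = 0
    total = 0
    for vec in product((0, 1), repeat=n_inputs):
        good = gate(*vec)
        for i in range(n_inputs):
            faulty = list(vec)
            faulty[i] ^= 1  # flip exactly one input
            total += 1
            if gate(*faulty) == good:
                masked += 1
    return masked / total

def nand2(a, b): return 1 - (a & b)
def nand3(a, b, c): return 1 - (a & b & c)
def nor2(a, b): return 1 - (a | b)
def xor2(a, b): return a ^ b
def inv(a): return 1 - a

GATES = {"NAND2": (nand2, 2), "NAND3": (nand3, 3), "NOR2": (nor2, 2),
         "XOR2": (xor2, 2), "INV": (inv, 1)}

for name, (fn, n) in GATES.items():
    print(f"{name}: {masking_fraction(fn, n):.2f}")
```

Under this uniform-input assumption, the 2-input NAND and NOR gates mask half of all single-input flips, the 3-input NAND masks three quarters, and XOR and the inverter mask none.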

Another consideration for masking is the logical depth of the circuit. The more nodes (gates) an error signal must travel through, the lower the likelihood that it will produce an error at one of the POs. This effect arises from combining the masking characteristics of each node along the path.
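As a rough illustration of the depth effect, suppose each node $i$ on a single path of depth $d$ masks an incoming error independently with probability $m_i$. Both the symbols and the independence assumption are ours, introduced only for this sketch; reconvergent fanout and correlated inputs are ignored. The probability that the error reaches the PO is then

$$P_{\text{prop}} = \prod_{i=1}^{d} (1 - m_i).$$

For example, a path through four gates that each mask with $m_i = 0.5$ gives $P_{\text{prop}} = 0.5^4 = 0.0625$, whereas a path of depth two through the same kind of gate gives $0.25$.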

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

We must also consider special cases. Gates such as XOR or NOT cannot mask errors. Any error in their inputs will be reflected in their outputs.

Finally, we consider the sensitivity of errors to the inputs. It has been shown that path sensitivity and sensitivity to errors are dependent on the values of the inputs [1, 18].

When we combine these factors into cell design, we are looking for a method that can increase the logical depth without increasing delay significantly, minimize the number of non-masking gates, and alter the gates to change the sensitivity to the inputs. By exploring the cell libraries, we can bypass typical costly error analysis by letting the synthesis tools do much of the work for us.

Previous work addressing fault tolerance at the gate or transistor level has focused on two primary areas. The first is analysis of gate and path sensitivity in circuits to highlight areas where noise and faults are likely to have the greatest effects [9, 23, 24]. The second area is device and parameter alteration. In [2, 25], once sensitive nodes have been identified, resizing the transistors diminishes the likelihood of that node producing an error by reducing its susceptibility to radiation and noise. In [6], a number of parameters can be changed, including voltages and sizings, to decrease the susceptibility to noise and radiation based on a similar sensitivity analysis.

In both areas of work, the error sensitivity due to logical masking is taken into account. Our method is fully compatible with these approaches and would enhance their effectiveness. It is also much simpler, requiring little time to be spent outside of subset selection, as described in the next section.

## 3. METHODOLOGY

Our goal is to determine what effect the choice of cell libraries and subsets of those libraries has on error reduction. Previous studies have already examined the effects of subset library selection on area, power, and delay [8, 15]. Our method expands this search to include the effect that subset selection has on reliability.

Our investigation was conducted using a variety of subset libraries and statistical error injection. We begin with a selection of 50 combinational circuits from the LGSynth93 benchmarks [17]. The circuits are synthesized with Synopsys Design Compiler using the open source vxlib $0.13\,\mu$m standard cell library [20]. The synthesized circuits are analyzed for area, delay, circuit depth, average node depth, dynamic power, leakage power, and cell selection. They are then injected with errors, and a base rate of failure (ROF) is determined for each circuit under single error injection conditions. The process is then repeated using each subset cell library. Finally, ROF, area, delay, circuit depth, power, and node count comparisons are made for each subset with respect to the base case.

### 3.1 Subset Selection

A wide variety of subsets were considered for this experiment. The criteria for selection were 1) that the subset libraries represent a variety of different cells, although not necessarily in the same subset library, 2) that the subset library meet the minimum requirements of the compiler<sup>1</sup>, 3) that the subset be logically complete, that is, every Boolean function can be computed with only the primitives represented in the subset, and 4) that the number of non-masking cells be minimized.
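Criterion 3 can be checked mechanically. The sketch below is our illustration rather than the procedure used in the experiments. It applies Post's completeness theorem: a set of Boolean functions is functionally complete exactly when, for each of five closed classes (0-preserving, 1-preserving, self-dual, monotone, affine), at least one function in the set lies outside that class. It assumes no constant (tie-high or tie-low) cells are available, and all function names are hypothetical.

```python
from itertools import product

def truth_table(fn, n):
    """Map every n-bit input vector to the gate output."""
    return {vec: fn(*vec) for vec in product((0, 1), repeat=n)}

def preserves_zero(tt, n):
    return tt[(0,) * n] == 0

def preserves_one(tt, n):
    return tt[(1,) * n] == 1

def is_self_dual(tt, n):
    # f(not x) == not f(x) for every input vector x
    return all(tt[v] != tt[tuple(1 - x for x in v)] for v in tt)

def is_monotone(tt, n):
    # raising any input from 0 to 1 never lowers the output
    return all(tt[u] <= tt[v] for u in tt for v in tt
               if all(a <= b for a, b in zip(u, v)))

def is_affine(tt, n):
    # f(x) = c0 xor c1*x1 xor ... xor cn*xn for some constants c0..cn
    c0 = tt[(0,) * n]
    coeffs = [tt[tuple(1 if j == i else 0 for j in range(n))] ^ c0
              for i in range(n)]
    return all(tt[v] == (c0 ^ (sum(c & x for c, x in zip(coeffs, v)) % 2))
               for v in tt)

POST_CLASSES = (preserves_zero, preserves_one, is_self_dual,
                is_monotone, is_affine)

def is_complete(gates):
    """gates: iterable of (function, number_of_inputs) pairs."""
    tables = [(truth_table(fn, n), n) for fn, n in gates]
    return all(any(not cls(tt, n) for tt, n in tables)
               for cls in POST_CLASSES)

nand2 = (lambda a, b: 1 - (a & b), 2)
and2 = (lambda a, b: a & b, 2)
or2 = (lambda a, b: a | b, 2)
xor2 = (lambda a, b: a ^ b, 2)
inv = (lambda a: 1 - a, 1)

print(is_complete([nand2, inv]))  # a 2NAND-INV style subset
print(is_complete([and2, or2]))   # AND/OR without an inverter
```

In this sketch a subset such as 2NAND-INV or 2AND-INV passes the check, while AND/OR alone (all monotone) and XOR/INV alone (all affine) do not.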

Some of the subset libraries we use are: 2NAND-INV, 2,3NAND-INV, 2,3,4NAND-INV, 2NOR-INV, 2,3NOR-INV, 2,3,4NOR-INV, 2NAND-2NOR-INV, 2AND-INV, 2OR-INV, 2AND-2OR-INV, DS11, and DS20. The libraries DS11 and DS20 are subset libraries adapted from [8] (without flip-flops).

Each subset library is named according to the primary logic primitives included. Further subsets are then selected from within these categories. The library 2NAND-INV, for example, represents a set of subset libraries based only on the 2-input NAND and inverter gates. There are 6 drive strengths for the inverters and 4 for the 2NAND gates. The library 2,3NAND-INV represents a library containing 2- and 3-input NAND gates and inverters, and so on.

### 3.2 Rate of Failure and Error Injection

A measurement of the ROF is made using a single pulse injection simulation. The pulses are injected into gates of the synthesized circuits and are distributed according to the (sensitive) area of each gate, in line with the methods described in [10, 14, 16]. ROF is then measured as the fraction of trials in which a circuit has an incorrect value at a PO. A value greater than one represents a decrease

Statistical injection methods are utilized due to the intractably large number of experiments required for exhaustive investigation. The total number of simulations for exhaustive single pulse injection is adapted from [14].
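To give a sense of scale, consider the simplest counting model, which is our own illustration and not the expression from [14]: one simulation per input vector per injection site. A circuit with $n$ primary inputs and $G$ gates then needs

$$N_{\text{exhaustive}} = 2^{n} \cdot G$$

simulations. With hypothetical values $n = 32$ and $G = 2000$, this gives $2^{32} \cdot 2000 \approx 8.6 \times 10^{12}$ runs, which is why sampling is used instead.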

We found that the statistical simulation results were within 5% of the exhaustive simulation results in the tractable base synthesis cases [11].
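The sampling procedure can be sketched on a toy circuit. This is a minimal illustration under stated assumptions, not the simulator used in this work: the three-gate netlist, the relative areas, and the function names are hypothetical, and a fault is modeled simply as an inverted gate output for one evaluation, so electrical and timing masking are ignored. Injection sites are drawn with probability proportional to gate area, and ROF is the fraction of trials with a wrong PO value.

```python
import random
from itertools import product

# Hypothetical netlist: (output net, function, input nets, relative area)
NETLIST = [
    ("n1", lambda a, b: 1 - (a & b), ("a", "b"), 1.0),    # NAND2
    ("n2", lambda c: 1 - c, ("c",), 0.7),                 # INV
    ("y", lambda x, z: 1 - (x & z), ("n1", "n2"), 1.0),   # NAND2 driving the PO
]
PRIMARY_INPUTS = ("a", "b", "c")
PRIMARY_OUTPUTS = ("y",)

def simulate(inputs, fault_net=None):
    """Evaluate the netlist in order; invert the output of fault_net if given."""
    nets = dict(zip(PRIMARY_INPUTS, inputs))
    for out, fn, ins, _area in NETLIST:
        value = fn(*(nets[i] for i in ins))
        if out == fault_net:
            value ^= 1  # single injected error at this gate output
        nets[out] = value
    return tuple(nets[po] for po in PRIMARY_OUTPUTS)

def statistical_rof(trials, seed=0):
    """Monte Carlo ROF: random input vector, area-weighted injection site."""
    rng = random.Random(seed)
    sites = [gate[0] for gate in NETLIST]
    areas = [gate[3] for gate in NETLIST]
    failures = 0
    for _ in range(trials):
        vec = tuple(rng.randint(0, 1) for _ in PRIMARY_INPUTS)
        site = rng.choices(sites, weights=areas, k=1)[0]
        if simulate(vec, site) != simulate(vec):
            failures += 1
    return failures / trials

def exhaustive_rof():
    """Exact ROF over all input vectors and sites, weighted by area."""
    total_area = sum(gate[3] for gate in NETLIST)
    vectors = list(product((0, 1), repeat=len(PRIMARY_INPUTS)))
    rof = 0.0
    for out, _fn, _ins, area in NETLIST:
        hits = sum(simulate(v, out) != simulate(v) for v in vectors)
        rof += (area / total_area) * hits / len(vectors)
    return rof

print(exhaustive_rof(), statistical_rof(20000))
```

For this toy netlist the exhaustive value works out to 0.75, and the sampled estimate should land close to it, which mirrors the kind of comparison described above on a much smaller scale.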



## 4. RESULTS

Figure 1: Select results from the 2NAND-INV subset. A value of 1.0 represents no change. Values less than one indicate decreases. C6288 shows a 15.3% decrease in errors along with a 46.5% increase in area, 0.4% increase in delay, and 6.4% increase in power. The average was an 8.5% decrease in errors across the benchmarks.

<sup>1</sup>This requirement was bypassed for certain subsets by setting the timing or area details for required gates to a point that prevented the compiler from selecting them.



Figure 2: Select results from the DS-11 subset. While delay and area overheads were modest, the power increases were significant. The average increase in power for the entire benchmark set was 215%, with a 5.5% decrease in errors.

Table 1: Partial Summary of Results.

| Lib       | Err (%) | Best (%) | Area (%) | Delay (%) | Power (%) |
|-----------|---------|----------|----------|-----------|-----------|
| 2NAND     | -8.3  | -28.7 | 78.7 | 22.2  | 14.5  |
| DS-11     | -5.5  | -25.2 | 38.4 | -0.4  | 215.0 |
| DS-20     | 2.1   | -11.3 | 7.6  | 4.0   | 42.8  |
| 2NAND*    | -10.8 | -28.7 | 79.0 | 22.9  | 14.8  |
| DS-11*    | -10.2 | -25.2 | 40.4 | -0.2  | 224.7 |
| DS-20*    | -4.4  | -11.3 | 6.0  | 4.8   | 35.6  |
| 2NOR      | -3.1  | -32.3 | 94.2 | 49.0  | 32.0  |
| 2,3NAND   | -3.5  | -34.2 | 46.7 | 8.6   | 7.6   |
| 2NAND-NOR | -5.6  | -36.7 | 58.9 | 25.2  | 19.1  |

Because of limited space and an overabundance of results, we present a typical sampling of our results. Full results will be available in a forthcoming technical report.

Only 6 circuits did not improve somewhat in ROF using 2NAND-INV. All but one of these circuits, rd73, were shown to have some improvement in ROF for at least one subset library. A selection of results from 2NAND-INV and DS-11 is shown in Figures 1 and 2.

A partial summary is shown in Table 1. Negative values are decreases. Starred (\*) libraries represent only those benchmarks with a decrease in errors. "Best" represents the benchmark with the largest decrease in errors (e.g., -36.7 represents a 36.7% decrease in errors). What these summaries do not show is the variation within each subset library. Although there is an average decrease of 5.5% for DS-11, the change in errors ranges from +40% to -25%. For this reason we suggest exploring several subsets for any given circuit.

## 5. ANALYSIS AND DISCUSSION

Significant reductions in ROF, 30% or higher in some circuits, were achieved by exploring a number of subset cell libraries, compared to the circuits synthesized using the full vxlib cell library. Additionally, many of the circuits showed even greater reductions, some > 60%, when compared to the "as described", unoptimized original versions.

The gains in reliability were generally accompanied by increases in delay and in area, although a fair number of circuits did show additional reductions in delay, area, or both. The additional delay and area costs were generally consistent with reported costs using subset or compact cell libraries [8, 15]. For example, our average 7.6% increase in area, 42.8% increase in power, and 4.0% increase in delay for DS-20 are in line with the 20% increase in area, 38% increase in power, and 8% decrease in delay for DS20 given in [8].

Further, we were able to find at least one, and usually more than one, subset cell library that decreased a circuit's failure rate, allowing us to choose among libraries to find better area and delay parameters.
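One simple way to frame this choice is as a Pareto filter over the measured costs. The sketch below is our illustration, not a procedure from the experiments. It uses the unstarred average values from Table 1 and keeps every library that is not worse than some other library in all four columns at once. Because these are benchmark-wide averages, a per-circuit selection would use that circuit's own measurements instead; the names `TABLE1`, `dominates`, and `pareto_front` are hypothetical.

```python
# Average percentage changes copied from Table 1 (unstarred rows):
# (error, area, delay, power). Lower is better in every column.
TABLE1 = {
    "2NAND":     (-8.3, 78.7, 22.2, 14.5),
    "DS-11":     (-5.5, 38.4, -0.4, 215.0),
    "DS-20":     (2.1, 7.6, 4.0, 42.8),
    "2NOR":      (-3.1, 94.2, 49.0, 32.0),
    "2,3NAND":   (-3.5, 46.7, 8.6, 7.6),
    "2NAND-NOR": (-5.6, 58.9, 25.2, 19.1),
}

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(rows):
    """Names of libraries that no other library dominates."""
    return sorted(name for name, vals in rows.items()
                  if not any(dominates(other, vals)
                             for other_name, other in rows.items()
                             if other_name != name))

print(pareto_front(TABLE1))
```

On these averages only 2NOR is dominated (by both 2NAND and 2,3NAND); each of the remaining libraries is best in at least one respect, so the final choice depends on which cost matters most for the design.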

This reduction in failure rate is achieved with the benefits of subset or compact library design as well. By building an imperfect cell library and analyzing the logic masking effects of the subset libraries, more time could be spent perfecting only the cells needed for the subsets.

Additionally, the reduction in cells has been shown to increase the efficiency of the synthesis tools and to improve the speed, area, and power consumption of logic blocks [15].

### 5.1 Discussion

Compared to a circuit-level approach such as triple modular redundancy [12, 22], the subset library masking approach works with a greatly reduced area overhead: an average increase of 40-50%, compared to more than 100% and, for some methods, more than 200%. Many of these methods rely upon sensitive areas (such as majority logic) that may themselves no longer be reliable. Additionally, our approach can be combined with higher-level fault-tolerant methods or with other lower-level approaches. When our method is combined with gate-level fault-tolerant methods, such as transistor resizing, the increase in reliability should be even greater.

On its own, the method we have outlined is able to reduce failures by reducing the sensitivity to error signals with the benefits of subset or compact cell libraries.

Finally, a major limitation of this approach at this point is the exploratory nature of subset selection. Many factors, such as increased area and increased logical depth, are present in many of the circuits that benefit the most from logical masking. However, these characteristics are present in many of the least successful circuits as well. We hope that added investigation into more circuits and more cell libraries will resolve this issue. The simplicity of this method compensates for some of the design time lost to exploration.

## 6. CONCLUSION

### 6.1 Future Work

This work represents preliminary findings demonstrating the capability of subset cell libraries to reduce circuit failures by exploiting logical masking and other characteristics that alter error sensitivity. We are currently conducting more experiments demonstrating these effects with various other full and subset cell libraries, with an analytical component to verify the demonstrated masking effects. These results reflect single-injection-per-cycle experiments; current experiments are examining the effect that scaling injection rates according to area has on this method. We also plan to include the analysis of failure reduction resulting from gate sizing and transistor selection, as well as logical masking, in our future simulations, combining our techniques with the techniques of others. Finally, we have begun to use the findings from these experiments to construct our own cell libraries specifically designed for improved reliability, along with methods for determining which subsets to investigate.

### 6.2 Summary

We have shown that subset selection within a standard cell library can benefit a combinational logic design by providing decreased error sensitivity. We have also noted the effects on area and delay. When compared to traditional fault-tolerance methods, we find that this method shows promise, with greatly reduced costs and the benefits of reduced cell library sizes.

## 7. ACKNOWLEDGMENTS

This work was supported in part by Semiconductor Research Corporation contract no. 2004-HJ-1190, IBM, Intel, the University of Minnesota Digital Technology Center, and the Minnesota Supercomputing Institute.

## 8. REFERENCES

- [1] M. Abramovici et al. Critical path tracing an alternative to fault simulation. In DAC '83: Proceedings of the 20th conference on Design automation, pages 214–220, Piscataway, NJ, USA, 1983. IEEE Press.
- [2] H. Asadi and M. Tahoori. Soft error hardening for logic-level designs. In *Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on*, pages 4 pp.-, 2006.
- [3] R. Baumann. Radiation-induced soft errors in advanced semiconductor technologies. Device and Materials Reliability, IEEE Transactions on, 5(3):305–316, 2005.
- [4] C. Constantinescu. Impact of deep submicron technology on dependability of VLSI circuits. In *IEEE International Conference on Dependable Systems and Networks(DSN)*, June 2002.
- [5] C. Constantinescu. Trends and challenges in VLSI circuit reliability. *IEEE MICRO*, 23(4):14–19, 2003.
- [6] Y. Dhillon et al. Soft-error tolerance analysis and optimization of nanometer circuits. In *Design*, *Automation and Test in Europe*, 2005. Proceedings, pages 288–293 Vol. 1, 2005.
- [7] P. Dodd and L. Massengill. Basic mechanisms and modeling of single-event upset in digital microelectronics. *Nuclear Science*, *IEEE Transactions* on, 50(3):583–602, 2003.
- [8] N. M. Duc and T. Sakurai. Compact yet high performance (cyhp) library for short time-to-market with new technologies. In ASP-DAC '00: Proceedings of the 2000 conference on Asia South Pacific design automation, pages 475–480, 2000.
- [9] B. Gill et al. Node sensitivity analysis for soft errors in cmos logic. In *Test Conference, 2005. Proceedings. ITC 2005. IEEE International*, pages 9 pp.-, 2005.
- [10] J. Han and P. Jonker. A defect- and fault-tolerant architecture for nanocomputers. *Nanotechnology*, 14(2):224–230, 2003.

- [11] C. J. Hescott, D. C. Ness, and D. J. Lilja. A methodology for stochastic fault simulation in vlsi processor architectures. In *MoBs*, 2005.
- [12] P. K. Lala. Self-Checking and Fault Tolerant Digital Design. Academic Press, San Diego California, 2001.
- [13] P. Lidén et al. On latching probability of particle induced transients in combinational networks. In 24<sup>th</sup> International Symposium on Fault-Tolerant Computing, pages 340–349, June 1994.
- [14] A. Maheshwari et al. Techniques for transient fault sensitivity analysis and reduction in vlsi circuits. In Defect and Fault Tolerance in VLSI Systems, 2003. Proceedings. 18th IEEE International Symposium on, pages 597–604, 2003.
- [15] J.-M. Masgonty, S. Cserveny, C. Arm, P.-D. Pfister, and C. Piguet. Low-power low-voltage standard cell libraries with a limited number of cells. In *International Workshop on Power and Timing Modeling, Optimization and Simulation, PATMOS2001.*
- [16] L. Massengill et al. Analysis of single-event effects in combinational logic-simulation of the am2901 bitslice processor. *Nuclear Science, IEEE Transactions on*, 47(6):2609–2615, 2000.
- [17] K. McElvain. Logic synthesis and optimization benchmarks. In International Logic Synthesis Workshop, 1993.
- [18] K. Mohanram and N. A. Touba. Cost-effective approach for reducing soft error failure rate in logic circuits. In *International Test Conference ITC*, pages 893–901, 2003.
- [19] E. Normand. Single event upset at ground level. Nuclear Science, IEEE Transactions on, 43(6):2742–2750, 1996.
- [20] G. Petley. Asic standard cell library design. http://www.vlsitechnology.org/.
- [21] P. Shivakumar et al. Modeling the effect of technology trends on the soft error rate of combinational logic. In *IEEE International Conference on Dependable Systems and Networks (DSN)*, 2002.
- [22] J. von Neumann. Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, pages 43–98. Princeton University Press, Princeton N.J., 1955.
- [23] B. Zhang, W.-S. Wang, and M. Orshansky. FASER: fast analysis of soft error susceptibility for cell-based designs. In *Quality Electronic Design*, 2006. ISQED '06. 7th International Symposium on, 2006.
- [24] C. Zhao et al. A scalable soft spot analysis methodology for compound noise effects in nano-meter circuits. In DAC, 2004.
- [25] Q. Zhou and K. Mohanram. Cost-effective radiation hardening technique for combinational logic. In *Computer Aided Design*, 2004. ICCAD-2004. IEEE/ACM International Conference on, pages 100–106, 2004.
- [26] J. F. Ziegler. Terrestrial cosmic rays. IBM Journal of Research and Development, 40(1), 1996.