Reliability Tutorial
Component Reliability
The reliability of a component is the probability that it will perform its function under normal operating conditions for a set period of time. Reliability can therefore be represented by a probability distribution, as shown in Figure 1.
[Figure 1: failure distribution plotted against time t]
Figure 1 Probability distribution of failures for a semiconductor device. The graph shows a number of failures at low t (infant mortality) and a large increase in failures towards the end of life (wear-out).
Failure
There are two important definitions of component failure when analysing semiconductor devices:
(i) Degradation failure: many semiconductor devices have key parameters, e.g. beta (DC current gain). When such a parameter is measured over a period of time, the component is said to have failed if its value drifts outside predetermined limits.
(ii) Catastrophic failure: the component simply fails completely, e.g. at end of life.
Failure mechanisms
One of the main causes of failure in GaAs and silicon devices is metal migration: the physical flow of metal from one contact towards the next caused by the passage of an electric current. As metal is removed from a contact, its cross-sectional area reduces, which increases the current density and eventually leads to burn-out of the contact. In addition, metal removed from one contact can accumulate elsewhere and cause short circuits.
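The degradation-failure definition above amounts to a limit check on a series of measurements. Below is a minimal sketch, assuming readings taken at known times; the function name `first_degradation_failure`, the limits and the beta readings are all hypothetical values chosen for illustration.

```python
from typing import Optional, Sequence


def first_degradation_failure(
    times_h: Sequence[float],
    values: Sequence[float],
    lower: float,
    upper: float,
) -> Optional[float]:
    """Return the first measurement time at which the parameter lies
    outside [lower, upper], or None if it never leaves the limits."""
    for t, v in zip(times_h, values):
        if v < lower or v > upper:
            return t  # degradation failure detected at this reading
    return None  # parameter stayed within limits for the whole test


# Hypothetical beta (DC current gain) readings taken every 1000 hours.
times = [0, 1000, 2000, 3000, 4000]
beta = [100, 98, 95, 91, 86]
print(first_degradation_failure(times, beta, lower=90, upper=110))  # 4000
```

With the assumed lower limit of 90, the reading of 86 at 4000 hours is the first to fall outside the limits, so the sketch reports a degradation failure at 4000 hours.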
Reliability Definitions
As mentioned earlier, reliability can be represented as a probability distribution of cumulative failures. The probability of a component surviving to time t is the reliability, R(t), and is described by the following relationship:
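$$R(t) = \exp(-ft)$$
where f is the failure rate, here assumed constant. In general the failure rate f(t) is defined as
$$f(t) = \frac{\text{number of components failing per unit time at instant } t}{\text{number of components surviving at instant } t}$$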
R(t) is therefore an exponential function over time as shown in Figure 2.
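As a quick numerical check of R(t) = exp(-ft), the sketch below evaluates the survival probability for an assumed constant failure rate of 1 × 10^-6 failures per device-hour (a hypothetical value, not taken from the text); the helper name `reliability` is likewise illustrative.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """Survival probability R(t) = exp(-f t) for a constant failure rate f."""
    return math.exp(-failure_rate_per_hour * t_hours)


f = 1e-6  # assumed constant failure rate, failures per device-hour
for t in (1e4, 1e5, 1e6):
    print(f"t = {t:.0e} h: R(t) = {reliability(t, f):.3f}")
```

At t = 1/f = 10^6 hours the sketch gives R(t) = 1/e ≈ 0.368, which corresponds to the 1/e level shown in Figure 2.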
[Figure 2: probability R(t) against time t, with the 1/e level marked]
Figure 2 Probability of survival of a component against time t, for a constant failure rate.
The failure rate f(t) is the number of components failing per unit time, which we hope will be very small: a tiny fraction of a percent per unit time. To obtain more manageable values, the failure rate is scaled to the number of failures per 1 × 10^9 device-hours. This unit is known as the FIT (failures in time) and is defined as:
$$1\ \text{FIT} = \frac{1\ \text{failure}}{1\times10^{9}\ \text{device-hours}}$$
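Converting between a failure rate expressed per device-hour and FIT is a single scaling by 10^9. A minimal sketch; the names `to_fit`, `from_fit` and `FIT_HOURS` are hypothetical.

```python
FIT_HOURS = 1e9  # one FIT is one failure per 1e9 device-hours


def to_fit(failures_per_device_hour: float) -> float:
    """Convert a failure rate in failures per device-hour to FIT."""
    return failures_per_device_hour * FIT_HOURS


def from_fit(fit: float) -> float:
    """Convert a failure rate in FIT back to failures per device-hour."""
    return fit / FIT_HOURS


print(to_fit(1e-6))   # a rate of 1e-6 failures per hour is about 1000 FIT
print(from_fit(50))   # 50 FIT is 5e-8 failures per device-hour
```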
Another commonly used way of quantifying failure rates is the mean time between failures (MTBF). Assuming failures occur randomly at a constant rate f, the MTBF is given by:
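$$\text{MTBF} = \frac{1}{f}$$
and the probability of success P(s), i.e. of no failure occurring up to time t, is
$$P(s) = e^{-t/\text{MTBF}}$$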
We can plot P(s) against time normalised to the MTBF, as shown in Figure 3. The plot shows that after 0.5 MTBF the probability P(s) that no failure has occurred is about 60% ($e^{-0.5} \approx 0.61$), after 1 MTBF it falls to 37% ($e^{-1} \approx 0.37$), and after 3 MTBF it is only about 5%.
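The values read from Figure 3 follow directly from P(s) = e^{-t/MTBF}. The sketch below assumes a hypothetical failure rate of 1000 FIT (so MTBF = 10^9 / 1000 = 10^6 hours) and evaluates P(s) at 0.5, 1 and 3 MTBF; `prob_no_failure` and `mtbf_from_fit` are illustrative names.

```python
import math


def prob_no_failure(t_hours: float, mtbf_hours: float) -> float:
    """P(s) = exp(-t / MTBF), assuming failures occur randomly at a constant rate."""
    return math.exp(-t_hours / mtbf_hours)


def mtbf_from_fit(fit: float) -> float:
    """MTBF = 1/f, with f = FIT / 1e9 failures per device-hour."""
    return 1e9 / fit


mtbf = mtbf_from_fit(1000)  # assumed 1000 FIT, giving an MTBF of 1e6 hours
for multiple in (0.5, 1, 3):
    p = prob_no_failure(multiple * mtbf, mtbf)
    print(f"{multiple} x MTBF: P(s) = {p:.2f}")
```

It prints 0.61, 0.37 and 0.05. Because P(s) depends only on t/MTBF, the same three values result for any assumed failure rate.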
[Figure 3: P(s) against time normalised to MTBF, with values 0.60, 0.37 and 0.05 marked at 0.5, 1 and 3 MTBF]
Figure 3 Probability of no failures, P(s), against time normalised to the mean time between failures (MTBF).
The more components that are sampled, the more accurate the distribution data and the greater our confidence when predicting component failure. For example, if we test a large sample of devices, say more than 1 × 10^5, and observe 1 × 10^3 failures in 1 × 10^12 device-hours, the failure rate in FITs is:
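$$\frac{1\times10^{3}\ \text{failures}}{1\times10^{12}\ \text{device-hours}} = \frac{1\ \text{failure}}{1\times10^{9}\ \text{device-hours}} = 1\ \text{FIT}$$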
However, if a single device on test failed after 1 × 10^9 hours, this would again correspond to 1 FIT, but our confidence in the result would be much lower because the sample is so small.
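The FIT figure in these examples is simply the number of failures divided by the accumulated device-hours, scaled to 10^9 hours. A minimal sketch (the name `observed_fit` is hypothetical); note that it returns only a point estimate and, as the text stresses, says nothing about how much confidence the estimate deserves.

```python
def observed_fit(failures: int, device_hours: float) -> float:
    """Point estimate of the failure rate in FIT (failures per 1e9 device-hours)."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return failures / device_hours * 1e9


# Large sample: 1e3 failures accumulated over 1e12 device-hours.
print(observed_fit(1_000, 1e12))
# Single device that failed after 1e9 hours: same point estimate, far less confidence.
print(observed_fit(1, 1e9))
```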
A common graphical representation of the failure rate over the life of a device is the bathtub curve, shown in Figure 4.
[Figure 4: failure rate against time t, showing the infant mortality, useful life (constant rate 1/MTBF) and wear-out regions]
Figure 4 Typical semiconductor failure rate, known as the bathtub curve.
The bathtub curve consists of three regions. The first is infant mortality, where weak devices fail early. This is followed by the useful life, a region with a constant failure rate (1/MTBF) in which failures occur due to random overloads. Finally, devices begin to wear out and the failure rate rises steeply.
Life Testing
For most applications we need to know the probability of failure after several years of operation. To obtain such data for a given device in a reasonable time, accelerated life tests are performed by operating the component at elevated temperatures. Operating the device at high temperature reduces the MTBF, so data can be acquired in a shorter time. The temperature dependence of the failure process is governed by the Arrhenius equation:
$$r = A \exp\left(-\frac{E_a}{kT}\right)$$
where r is the rate of the process, A is a proportionality constant (which can itself be a function of temperature, i.e. A = A(T)), Ea is the activation energy, T is the absolute temperature (K) and k is Boltzmann's constant, 8.6 × 10^-5 eV/K. Using failure data obtained at high temperatures together with the Arrhenius equation, we can reasonably predict the failure rate at lower temperatures, assuming a lognormal distribution of failure times.
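Taking the ratio of the Arrhenius rate at two temperatures cancels A, provided A does not itself vary with temperature (an assumption made here for simplicity). This ratio is often called the acceleration factor of the test. The sketch below uses hypothetical values (Ea = 1.2 eV, a use temperature of 125 °C and a test temperature of 225 °C), and the helper names `arrhenius_rate` and `acceleration_factor` are illustrative.

```python
import math

K_BOLTZMANN_EV = 8.6e-5  # Boltzmann's constant in eV/K, as used in this tutorial


def arrhenius_rate(a: float, ea_ev: float, temp_k: float) -> float:
    """Rate of the failure process, r = A exp(-Ea / kT), with T in kelvin."""
    return a * math.exp(-ea_ev / (K_BOLTZMANN_EV * temp_k))


def acceleration_factor(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """r(T_test) / r(T_use), assuming A is independent of temperature."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1 / t_use_k - 1 / t_test_k))


# Hypothetical values: Ea = 1.2 eV, use at 125 C, accelerated test at 225 C.
af = acceleration_factor(1.2, 125, 225)
print(f"acceleration factor: {af:.0f}")
```

Under these assumptions each hour at 225 °C stands in for roughly `af` hours at 125 °C, which is why elevated-temperature testing shortens the time needed to gather data.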
The failure data is plotted on an Arrhenius lognormal graph, where the mean time to failure at each test temperature is plotted on a logarithmic time scale against 1/T. The resulting points should lie on a straight line, allowing extrapolation to normal operating temperatures. An example of an Arrhenius graph is shown in Figure 5.
[Figure 5: Arrhenius plot with time (hours) on a logarithmic axis from 1.00E+01 to 1.00E+08]
Figure 5 Example Arrhenius plot, with time to failure on a logarithmic scale plotted against 1/T.
Times to failure at two different temperatures are related by:
$$\ln\frac{t_2}{t_1} = \frac{E_a}{k}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)$$
where t1 and t2 are the times to failure at absolute temperatures T1 and T2 (K), Ea is the activation energy in electron volts (eV) and k = 8.6 × 10^-5 eV/K is Boltzmann's constant.
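Rearranging for t2 gives t2 = t1 exp[(Ea/k)(1/T2 - 1/T1)], which is how a life-test result is extrapolated to the operating temperature. Below is a minimal sketch with a hypothetical test result (2000 hours to failure at 250 °C) and an assumed activation energy of 1.5 eV; the function name `extrapolate_time_to_failure` is illustrative.

```python
import math

K_BOLTZMANN_EV = 8.6e-5  # Boltzmann's constant in eV/K, as used in this tutorial


def extrapolate_time_to_failure(t1_hours: float, temp1_c: float,
                                temp2_c: float, ea_ev: float) -> float:
    """Solve ln(t2/t1) = (Ea/k)(1/T2 - 1/T1) for t2.

    Temperatures are given in degrees C and converted to kelvin.
    """
    temp1_k = temp1_c + 273.15
    temp2_k = temp2_c + 273.15
    exponent = ea_ev / K_BOLTZMANN_EV * (1 / temp2_k - 1 / temp1_k)
    return t1_hours * math.exp(exponent)


# Hypothetical life-test result: 2000 h to failure at 250 C, assumed Ea = 1.5 eV.
t_use = extrapolate_time_to_failure(2000, 250, 125, 1.5)
print(f"predicted time to failure at 125 C: {t_use:.3g} hours")
```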
Example plots of activation energies
Figure 6 shows two reliability plots with different activation energies; the calculation of each is shown below.
[Figure 6: failure rate (per 1000 hours, logarithmic axis from 0.1 to 100) against temperature (°C, 25 to 225), showing Graph 1 and Graph 2]
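Figure 6 Two reliability plots with different activation energies. Graph 2 has the higher activation energy because it has the greater slope.
In each calculation the ratio t2/t1 equals the ratio of the failure rates read from the graph at the two temperatures, since time to failure is inversely proportional to failure rate.
Graph 1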
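$$E_a = \frac{k\,\ln(t_2/t_1)}{\frac{1}{T_2} - \frac{1}{T_1}} = \frac{8.6\times10^{-5} \times \ln(70/0.1)}{\frac{1}{20+273} - \frac{1}{225+273}} = \frac{8.6\times10^{-5} \times 6.55}{1.4\times10^{-3}} \approx 0.4\ \text{eV}$$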
Graph 2
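$$E_a = \frac{k\,\ln(t_2/t_1)}{\frac{1}{T_2} - \frac{1}{T_1}} = \frac{8.6\times10^{-5} \times \ln(100/0.1)}{\frac{1}{20+273} - \frac{1}{100+273}} = \frac{8.6\times10^{-5} \times 6.9}{7.3\times10^{-4}} \approx 0.8\ \text{eV}$$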
Typical activation energies are 1.2 eV to 1.9 eV.
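The two hand calculations can be checked numerically by solving the same relation for Ea. In this sketch (the function name `activation_energy` is hypothetical), t2/t1 is taken as the ratio of the failure rates read from Figure 6, and 273.15 is used for the kelvin conversion rather than 273, which changes the results only in the third decimal place.

```python
import math

K_BOLTZMANN_EV = 8.6e-5  # Boltzmann's constant in eV/K, as in the worked examples


def activation_energy(rate_high: float, rate_low: float,
                      temp_high_c: float, temp_low_c: float) -> float:
    """Ea = k ln(t2/t1) / (1/T2 - 1/T1), where T2 is the lower temperature and
    t2/t1 = rate_high / rate_low (time to failure is inversely proportional
    to failure rate)."""
    t1_k = temp_high_c + 273.15
    t2_k = temp_low_c + 273.15
    return K_BOLTZMANN_EV * math.log(rate_high / rate_low) / (1 / t2_k - 1 / t1_k)


print(f"Graph 1: Ea = {activation_energy(70, 0.1, 225, 20):.2f} eV")
print(f"Graph 2: Ea = {activation_energy(100, 0.1, 100, 20):.2f} eV")
```

It prints 0.40 eV and 0.81 eV, in agreement with the 0.4 eV and 0.8 eV obtained by hand.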
Component testing
Accurate knowledge of device temperature is vital for predicting long-term device life from accelerated life tests. The thermal conductivity of GaAs is less than a third of that of Si at room temperature, which means that certain areas of the die can be significantly hotter than others and, more importantly, hotter than the base-plate. This is characterised by the thermal resistance of the die: the temperature difference between the hottest spot and a reference point (usually the ambient or base-plate temperature), divided by the power dissipated in the device. The thermal resistance is therefore expressed in °C/W. As most failures in GaAs occur in the FET channel, all life-test data is referenced to the channel temperature, so it is important to determine the channel temperature accurately during life testing.
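Rearranging the thermal-resistance definition gives the channel temperature as the reference temperature plus thermal resistance times dissipated power. Below is a minimal sketch with hypothetical device values (base-plate at 125 °C, thermal resistance 60 °C/W, 1.5 W dissipated); the function names `thermal_resistance` and `channel_temperature` are illustrative.

```python
def thermal_resistance(t_hot_c: float, t_ref_c: float, power_w: float) -> float:
    """R_th = (T_hottest - T_reference) / P_dissipated, in C/W."""
    return (t_hot_c - t_ref_c) / power_w


def channel_temperature(t_ref_c: float, r_th_c_per_w: float, power_w: float) -> float:
    """Rearranged definition: T_channel = T_reference + R_th * P."""
    return t_ref_c + r_th_c_per_w * power_w


# Hypothetical device: base-plate at 125 C, R_th = 60 C/W, 1.5 W dissipated.
print(channel_temperature(125.0, 60.0, 1.5))  # 215.0
```

Under these assumed values, a life test run with a 125 °C base-plate corresponds to a channel temperature of about 215 °C, and it is the channel figure that should be used in the Arrhenius extrapolation.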