Paper 262
Statistical Methods for Reliability Data Using SAS® Software
William Q. Meeker, Dept. of Statistics and Center for Nondestructive Evaluation, Iowa State University, Ames, IA 50011
Luis A. Escobar, Dept. of Experimental Statistics, Louisiana State University, Baton Rouge, LA 70803
modeling, reliability budgeting, reliability prediction and assessment, and reliability demonstration. Some major objectives in obtaining reliability data include:
(i) Obtaining early identification of failure modes and understanding and removing their root causes, thereby improving reliability.
(ii) Determining how long each unit should be run prior to shipment in order to avoid likely premature field failures.
(iii) Quantifying reliability to determine whether or not a product is ready for release.
Abstract
In the past decade there has been a high degree of interest in improving the quality, productivity, and reliability of manufactured products. Global competition and higher customer expectations for safe, reliable products are driving this interest. After experimental design and statistical process control, reliability is the next area to receive a high degree of emphasis. Industry's current concern is how to move rapidly from product conceptualization to a cost-effective, highly reliable product. Part of the reliability assurance process requires conducting tests and studies to obtain reliability data and to turn these data into useful information for making decisions. In this paper we consider the use of modern methods for analyzing time-to-failure data that can be implemented using SAS software. We provide a mix of proven traditional techniques, enhanced and brought up to date with some modern computer-based methodology. The methodology is illustrated using PROC RELIABILITY to analyze some applications of product reliability.

Key words: Life data; Censored data; Quality; Survival analysis; PROC RELIABILITY.
1 Introduction
1.1 Importance of reliability data analysis
Proper reliability data analyses are needed in diverse areas such as design for reliability, reliability
1.3 Censoring
Reliability data are typically censored (exact failure times are not known). The most common reason for censoring is the need to analyze data before all units fail. The analysis of censored data is more complicated when the censoring times of unfailed units differ. This happens when different units of the product enter the field at different times, as is usually the case in analyzing field failure data. It may also be the case when units have different degrees of exposure over time, or when one is evaluating failures due to a particular failure mode, in which case failures from other independent modes are treated as censored observations. An important assumption needed for standard analysis of censored data is that the censoring time for a unit is chosen independently of when that unit would have failed. For example, if a unit were removed from the field because it was about to fail, treating it as a censored observation would bias the analysis.
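The treatment of other failure modes as censored observations can be illustrated with a small sketch. The function name, the record layout, and the data values below are hypothetical and are used only for illustration; they do not come from the paper's data sets.

```python
from typing import List, Optional, Tuple


def censor_other_modes(
    records: List[Tuple[float, Optional[str]]], mode_of_interest: str
) -> List[Tuple[float, int]]:
    """Return (time, status) pairs: status 1 = failure from the mode of
    interest, status 0 = censored (unfailed, or failed from another mode)."""
    return [(time, 1 if mode == mode_of_interest else 0) for time, mode in records]


# Hypothetical records: (hours, failure mode); None means the unit had not failed.
records = [(120.0, "wear"), (340.0, "corrosion"), (500.0, None), (410.0, "wear")]
wear_data = censor_other_modes(records, "wear")
print(wear_data)
```

The recoding is only appropriate under the assumption stated above, namely that the other modes act independently of the mode being analyzed.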
[Figure 1 panels: cumulative distribution function F(t), probability density function f(t), survival function S(t), and hazard function h(t), each plotted against t.]
Figure 1: Typical time-to-failure cdf, pdf, sf, and hf.

convenience of model specification, interpretation, or technical development. All are important for one purpose or another. In Section 2.3, we give the cdfs for the lognormal and Weibull families.
Escobar (1998) and MEDH (1997).

Lognormal distribution. For the lognormal distribution the cdf and the quantiles are

$$F(t; \mu, \sigma) = \Phi_{\mathrm{nor}}\left[\frac{\log(t) - \mu}{\sigma}\right], \quad t > 0$$

$$\log(t_p) = \mu + \sigma\,\Phi_{\mathrm{nor}}^{-1}(p), \quad 0 < p < 1$$

where $\Phi_{\mathrm{nor}}$ is the cdf for the standardized normal distribution. The logarithm of a lognormal random variable, $Y = \log(T)$, follows a normal distribution with mean $\mu$ and standard deviation $\sigma$. This relationship between the lognormal and normal distributions is often used to simplify the process of using the lognormal distribution.

Weibull distribution. For the Weibull distribution the cdf and the quantiles are

$$F(t; \eta, \beta) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \quad t > 0$$

$$\log(t_p) = \mu + \sigma\,\Phi_{\mathrm{sev}}^{-1}(p), \quad 0 < p < 1$$

where $\mu = \log(\eta)$, $\sigma = 1/\beta$, and $\Phi_{\mathrm{sev}}(z) = 1 - \exp[-\exp(z)]$ is the cdf for a standardized smallest extreme value distribution. The logarithm of a Weibull random variable, $Y = \log(T)$, follows a smallest extreme value distribution with location-scale parameters $(\mu, \sigma)$.
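As a minimal sketch of these quantile relationships, the following Python functions evaluate the lognormal quantile exp[mu + sigma * (standard normal quantile of p)] and the Weibull quantile exp[mu + sigma * log(-log(1 - p))] with mu = log(eta) and sigma = 1/beta. The function names and the parameter values in the usage lines are illustrative assumptions, not estimates from the paper.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p: float, mu: float, sigma: float) -> float:
    """t_p for the lognormal: log(t_p) = mu + sigma * (standard normal quantile)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def sev_quantile(p: float) -> float:
    """Inverse of the standardized smallest extreme value cdf 1 - exp(-exp(z))."""
    return math.log(-math.log(1.0 - p))


def weibull_quantile(p: float, eta: float, beta: float) -> float:
    """t_p for the Weibull, using mu = log(eta) and sigma = 1/beta."""
    mu, sigma = math.log(eta), 1.0 / beta
    return math.exp(mu + sigma * sev_quantile(p))


def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """F(t) = 1 - exp[-(t/eta)^beta]."""
    return 1.0 - math.exp(-((t / eta) ** beta))


# Hypothetical parameter values, for illustration only.
t10 = weibull_quantile(0.10, eta=100.0, beta=2.0)
print(t10, weibull_cdf(t10, eta=100.0, beta=2.0))  # the cdf at t_p recovers p
print(lognormal_quantile(0.5, mu=2.0, sigma=0.5))  # the median equals exp(mu)
```

Working on the log scale, as here, is what allows both families to be handled with the same location-scale machinery.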
model are used to provide estimates and confidence intervals for distribution quantiles and the population proportion failing.
$\hat{F}(t)$ is a step function that jumps by an amount $1/n$ at each failure time unless there are ties, in which case the estimate jumps by the number of tied failures divided by $n$. For details see Meeker and Escobar (1998) or MEDH (1997). Figure 2 was generated with PROC RELIABILITY. The nonparametric estimates of the cdf fall nearly along a straight line, indicating that the Weibull distribution provides a good fit to these data. A similar lognormal plot (not shown here) showed some curvature, but the degree of departure was small relative to the sampling uncertainty exhibited by the confidence bands, indicating that the chain link fatigue data could have come from either distribution. Figure 2 also shows the Weibull maximum likelihood (ML) estimate for the chain link data. The dotted lines are drawn through a set of pointwise normal-approximation confidence intervals for $F(t)$, computed as described in Chapter 8 of Meeker and Escobar (1998).
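The reason a straight line on Weibull probability paper indicates a Weibull fit can be sketched numerically: for a Weibull cdf, log(-log(1 - F(t))) is linear in log(t) with slope equal to the shape parameter. The code below is an illustrative check under assumed parameter values (eta = 100, beta = 2), not a reproduction of the chain link analysis.

```python
import math
from typing import List, Tuple


def weibull_plot_coords(t: float, F: float) -> Tuple[float, float]:
    """Map a point (t, F) to Weibull probability-plot coordinates."""
    return math.log(t), math.log(-math.log(1.0 - F))


# Hypothetical Weibull parameters, for illustration only.
eta, beta = 100.0, 2.0

points: List[Tuple[float, float]] = []
for t in (20.0, 50.0, 100.0, 200.0):
    F = 1.0 - math.exp(-((t / eta) ** beta))  # exact Weibull cdf at t
    points.append(weibull_plot_coords(t, F))

# Slopes between successive plotted points; all should equal beta.
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(slopes)
```

With real data the nonparametric estimate replaces the exact cdf, so the points scatter around a line rather than falling exactly on it.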
[Figure 2 plot: Weibull probability scale; vertical axis: Percent; horizontal axis: Cycles (thousands).]
Figure 2: Weibull probability plot with the Weibull ML estimate and a set of approximate 95% confidence intervals for $F(t)$ for the chain link failure data.

[Figure 3 plot: Weibull probability scale; vertical axis: Percent; horizontal axis: Kilometers.]
Figure 3: Weibull probability plot of shock absorber failure times with ML estimates and approximate 95% pointwise confidence intervals for quantiles.

of $F(t)$ for values of $t$ between $t_i$ and $t_{i+1}$ is

$$\hat{F}(t_i) = 1 - \prod_{j=1}^{i}\left[1 - \frac{d_j}{n_j}\right], \quad i = 1, \ldots, m. \qquad (2)$$
This is the well-known product-limit or Kaplan-Meier (KM) estimator. It can be shown that the KM estimator simplifies to (1) when the data are complete or singly censored. Figure 3, obtained using PROC RELIABILITY, gives a Weibull probability plot for the shock absorber data along with approximate 95% pointwise likelihood confidence intervals for selected quantiles $t_p$ of the life distribution (for details see Meeker and Escobar 1998).
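A minimal sketch of the product-limit computation for right-censored data follows. It assumes the usual convention that a unit censored at a given time is still counted at risk for failures at that same time; the function name and the small data set are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier_cdf(
    times: Sequence[float], failed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return [(t_i, F_hat(t_i))] at each distinct failure time, where
    F_hat(t_i) = 1 - prod_{j<=i} (1 - d_j / n_j)."""
    data = sorted(zip(times, failed))
    n_at_risk = len(data)  # n_j: units still under observation just before t_j
    surv = 1.0
    estimate: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0        # d_j: failures at this time
        removed = 0  # failures plus censored units leaving the risk set
        while i < len(data) and data[i][0] == t:
            d += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            estimate.append((t, 1.0 - surv))
        n_at_risk -= removed
    return estimate


# Hypothetical data: True = failure, False = right-censored.
print(kaplan_meier_cdf([2, 3, 3, 5, 7], [True, True, False, True, False]))
# With complete data the estimate jumps by 1/n at each failure time.
print(kaplan_meier_cdf([1, 2, 3, 4], [True, True, True, True]))
```

The second call illustrates the simplification for complete data: each of the four failures contributes a jump of 1/4.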
components and materials, and to provide early identification and removal of failure modes, thus improving reliability.
Table 1: Hours versus temperature data from an ALT experiment on a new-technology integrated circuit device.

  Hours              Status      Number    Temp
  Lower    Upper                 of Obs    (Degrees C)
  1536               Censored    50        150
  1536               Censored    50        175
  96                 Censored    50        200
  384      788       Failed      1         250
  788      1536      Failed      3         250
  1536     2304      Failed      5         250
  2304               Censored    41        250
  192      384       Failed      4         300
  384      788       Failed      27        300
  788      1536      Failed      16        300
  1536               Censored    3         300
conducted on each device. The first inspection was after one day, with subsequent inspections at two days, four days, and so on. Tests were run at 150, 175, 200, 250, and 300 °C. The analysis of these data requires special statistical methods that are described in Chapter 9 of Nelson (1982), Chapter 3 of Nelson (1990), and Chapters 3, 7, and 21 of Meeker and Escobar (1998). The developers were interested in estimating the activation energy of the suspected failure mode and the long-life reliability of the components, as characterized by the proportion of devices in the product population that would fail by 100 thousand hours (about 11 years). The analysis here was done using PROC RELIABILITY.

Figure 4 is a lognormal probability plot of the failures at 250 and 300 °C along with the ML estimates of the individual lognormal cdfs. The different slopes in the plot suggest the possibility that the lognormal shape parameter $\sigma$ changes from 250 to 300 °C. Such a change could be caused by a change in failure mode. Failure modes with a higher activation energy, which might never be seen at low levels of temperature, can appear at higher levels of temperature (or of other acceleration factors). A 95% confidence interval for $\sigma_{250}/\sigma_{300}$ is $[1.01, 3.53]$ (calculations not shown here), which suggests that there could be a difference. These results also suggested that detailed physical failure mode analysis should be done for at least some of the failed units and that, perhaps,
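[Figure 4 plot: lognormal probability scale; vertical axis: Percent; horizontal axis: Hours; legend: temperature in degrees Celsius.]
[Figure 5 plots: left panel, lognormal probability plot (Percent); right panel, Arrhenius plot of Hours versus degrees Celsius showing the 10, 50, and 90 percentiles.]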
Figure 4: Lognormal probability plot of the failures at 250 and 300 °C for the new-technology integrated circuit device ALT experiment.

Table 2: Arrhenius-lognormal model ML estimation results for the new-technology IC device.

                                        95% Approximate
               ML          Standard     Confidence Intervals
  Parameter    Estimate    Error        Lower       Upper
  $\beta_0$    -10.2       1.5          -13.5       -7.4
  $\beta_1$    9.6         .85          8.05        11.45
  $\sigma$     .52         .06          .42         .64

The loglikelihood is $\mathcal{L} = -88.36$. The confidence intervals are based on the likelihood ratio approximation method.
Figure 5: Lognormal probability plot showing the Arrhenius-lognormal model ML estimation results for the new-technology IC device.

are close to the truth, it appears unlikely that there will be any failures below 200 °C during the remaining 3000 hours of testing and, as mentioned before, this was the reason for starting some units at 200 °C. The lognormal probability plot on the left-hand side of Figure 5 shows estimated lognormal cdfs for all of the test levels of temperature as well as at the use condition of 100 °C. The equal slopes of the lines reflect the constant-$\sigma$ assumption.
the accelerated test should be extended until some failures are observed at lower levels of temperature. Table 2 gives Arrhenius-lognormal model ML estimation results for the new-technology IC device, assuming a constant $\sigma$. The confidence interval for $\beta_1$ indicates that temperature has an accelerating effect on the failure of the devices. The right-hand side of Figure 5 is an Arrhenius plot of the Arrhenius-lognormal model fit, showing the quantiles $t_{.10}$, $t_{.50}$, and $t_{.90}$ for the new-technology IC device ALT data. Because failures were observed only at 250 and 300 °C, the plot shows the rather extreme extrapolation needed to make inferences at the use condition of 100 °C. If the projections
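The extrapolation implied by an Arrhenius-lognormal fit can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: it assumes the location parameter is linear in x = 1000 / (temperature in kelvin), a parameterization that the text does not state, and it reads the three rounded point estimates printed in Table 2 (-10.2, 9.6, and .52) as an intercept, a slope, and the lognormal shape parameter. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def arrhenius_x(temp_c: float) -> float:
    """ASSUMPTION: Arrhenius covariate is 1000 / (temperature in kelvin)."""
    return 1000.0 / (temp_c + 273.15)


def lognormal_quantile_at(temp_c, p, beta0, beta1, sigma):
    """Quantile t_p at a temperature, with mu = beta0 + beta1 * x."""
    mu = beta0 + beta1 * arrhenius_x(temp_c)
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def fraction_failing(temp_c, hours, beta0, beta1, sigma):
    """Lognormal cdf F(hours) at a temperature under the same model."""
    mu = beta0 + beta1 * arrhenius_x(temp_c)
    return NormalDist().cdf((math.log(hours) - mu) / sigma)


def acceleration_factor(temp_high_c, temp_use_c, beta1):
    """Ratio of any quantile at use temperature to that at the higher one."""
    return math.exp(beta1 * (arrhenius_x(temp_use_c) - arrhenius_x(temp_high_c)))


# Rounded values as printed in Table 2; their roles are an assumption (see above).
b0, b1, s = -10.2, 9.6, 0.52
for temp in (100, 250, 300):
    print(temp, [lognormal_quantile_at(temp, p, b0, b1, s) for p in (0.1, 0.5, 0.9)])
print(fraction_failing(100, 1.0e5, b0, b1, s))
print(acceleration_factor(250, 100, b1))
```

Because the estimates are rounded and the covariate scaling is assumed, the printed values should be read as order-of-magnitude indications of how far the use condition lies from the test temperatures, not as results of the paper.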
[Figure 6 plot: event plot of the valve seat repair data; vertical axis: System ID.]

Table 3: Times of replacement of diesel engine valve seats. From Nelson and Doganaksoy (1989).

  System    Days        Replacement Time
  ID        Observed    (Days)
  251       761
  252       759
  327       667         98
  328       667         326 653 653
  329       665
  330       667         84
  331       663         87
  389       653         646
  390       653         92
  391       651
  392       650         258 328 377 621
  393       648         61 539
  394       644         254 276 298 640
  395       642         76 538
  396       641         635
  397       649         349 404 561
  398       631
  399       596
  400       614         120 479
  401       582         323 449
  402       589         139 139
  403       593
  404       589         573
  405       606         165 408 604
  406       594         249
  407       613         344 497
  408       595         265 586
  409       389         166 206 348
  410       601
  411       601         410 581
  412       611
  413       608
  414       587
  415       603         367
  416       585         202 563 570
  417       587
  418       578
  419       578
  420       586
  421       585
  422       582
in other cases, exact times are recorded. System reliability data are collected to estimate quantities like:
(i) the distribution of the times between failures, $\tau_j = T_j - T_{j-1}$ ($j = 1, 2, \ldots$), where $T_0 = 0$;
(ii) the number of failures in the interval $(0, t]$ as a function of $t$;
(iii) the expected number of failures in the interval $(0, t]$ as a function of $t$;
(iv) the rate of occurrence of failures (ROCOF) as a function of time $t$.
Times of replacement of diesel engine valve seats. Repair records for a fleet of 41 diesel engines were kept over time. Table 3 gives the times of replacement (in number of days of service) of the engines' valve seats. This is an example of data on a group of systems. The data were originally given in Nelson and Doganaksoy (1989) and also appear in Nelson (1995). Questions to be answered by these data include the following:
(i) Does the replacement rate increase with age?
(ii) How many replacement valves will be needed in the future?
(iii) Can valve life in these systems be modeled as a renewal process, so that simple methods for independent observations can be used for analysis?
Simple data plots provide a good starting point for analysis of system repair data. Figure 6 is an event plot of the valve seat repair data showing the observation period and the reported repair times.
A plot of the estimated MCF, $\hat{\mu}(t)$, versus age indicates whether the reliability of the system is increasing, decreasing, or unchanging over time.
MCF estimate for the valve-seat replacements. Figure 7 shows the estimate of the valve-seat MCF as a function of engine age in days. The estimate increases sharply after 650 days, but it is important to recognize that this part of the estimate is based on only a small number (i.e., 10) of systems that had a total operating period exceeding 650 days. The uncertainty in the estimate for longer times is reflected in the width of the confidence intervals (the computation of such confidence limits is explained in Nelson 1995 and Meeker and Escobar 1998).
[Figure 7 plot: sample MCF versus age in days, with confidence limits; inset: No. Units 41, No. Events 48, Conf. Coeff. 95%.]
Order the unique $t_{ij}$ among all of the $n$ systems. Let $m$ denote the number of unique times. These ordered unique times are denoted by $t_1 < \cdots < t_m$.

Compute $d_i(t_k)$, the total number of repairs for system $i$ at $t_k$. Let $\delta_i(t_k) = 1$ if system $i$ is still being observed at time $t_k$ and $\delta_i(t_k) = 0$ otherwise.

Compute

$$\hat{\mu}(t_j) = \sum_{k=1}^{j} \frac{d_{\cdot}(t_k)}{\delta_{\cdot}(t_k)} = \sum_{k=1}^{j} \bar{d}(t_k), \quad j = 1, \ldots, m,$$

where $d_{\cdot}(t_k) = \sum_{i=1}^{n} \delta_i(t_k)\, d_i(t_k)$, $\delta_{\cdot}(t_k) = \sum_{i=1}^{n} \delta_i(t_k)$, and $\bar{d}(t_k) = d_{\cdot}(t_k)/\delta_{\cdot}(t_k)$.

Note that $d_{\cdot}(t_k)$ is the total number of system repairs at time $t_k$, $\delta_{\cdot}(t_k)$ is the size of the risk set at $t_k$, and $\bar{d}(t_k)$ is the average number of repairs per system at $t_k$ (or the proportion of repaired systems if individual systems have no more than one repair at a point in time). Thus the estimator of the MCF is obtained by accumulating the mean number (across systems) of repairs per system in each time interval. For information on the computation of the standard errors of $\hat{\mu}(t_j)$ and nonparametric confidence intervals for the MCF, see Nelson (1989), Nelson (1995), and Meeker and Escobar (1998).
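The steps of this MCF computation can be sketched in a few lines of Python. The sketch assumes that a system counts as "still being observed" at a time if its end of observation is at or after that time, and it uses a small hypothetical data set (not the valve seat data); the function name is also an assumption.

```python
from typing import List, Sequence, Tuple


def mcf_estimate(
    systems: Sequence[Tuple[float, List[float]]]
) -> List[Tuple[float, float]]:
    """systems: (end of observation, list of repair times) for each system.
    Returns [(t_j, mu_hat(t_j))] at each unique repair time."""
    unique_times = sorted({t for _, repairs in systems for t in repairs})
    cumulative = 0.0
    estimate: List[Tuple[float, float]] = []
    for t in unique_times:
        # Risk set: systems still under observation at time t (assumed end >= t).
        at_risk = [(end, repairs) for end, repairs in systems if end >= t]
        # Total repairs at time t among systems in the risk set.
        d_total = sum(repairs.count(t) for _, repairs in at_risk)
        # Accumulate the mean number of repairs per system at t.
        cumulative += d_total / len(at_risk)
        estimate.append((t, cumulative))
    return estimate


# Hypothetical fleet of three systems, for illustration only.
fleet = [(10.0, [3.0, 7.0]), (8.0, [5.0]), (6.0, [])]
mcf = mcf_estimate(fleet)
print(mcf)
```

In the toy fleet the risk set shrinks from three systems to two before the last repair, so the final increment is 1/2 instead of 1/3; this is the same mechanism that makes the estimate less precise at long ages when few systems remain under observation.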
Poisson data; and analysis of Weibull data when there are few or no failures. PROC RELIABILITY can be useful to practitioners in other areas, such as biostatistics and survival analysis, since it supplements the current capabilities of PROCs LIFETEST, LIFEREG, and PHREG (see Allison 1995). For this purpose, however, it would be necessary to upgrade PROC RELIABILITY to handle inspection data (readout data or life table data) with a general structure. Currently the procedure admits only inspection data in which all the units have the same inspection schedule; this is, in general, too restrictive for the analysis of time-to-event data.
References

[1] Allison, P. D. (1995), Survival Analysis Using the SAS® System: A Practical Guide, Cary, NC: SAS Institute Inc.
[2] Meeker, W. Q. and Escobar, L. A. (1998), Statistical Methods for Reliability Data, to be published by John Wiley & Sons, New York.
[3] Meeker, W. Q., Escobar, L. A., Doganaksoy, N., and Hahn, G. J. (1997), Methods for Reliability Data Analysis, Juran's Handbook in Quality, Fifth Edition, New York: McGraw Hill.
[4] Nelson, W. (1982), Applied Life Data Analysis, New York: John Wiley & Sons.
[5] Nelson, W. (1990), Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, New York, NY: John Wiley & Sons, Inc.
[6] Nelson, W. (1995), Confidence limits for recurrence data - applied to cost or number of product repairs, Technometrics 37, 147-157.
[7] Nelson, W. and Doganaksoy, N. (1989), A computer program for an estimate and confidence limits for the mean cumulative function for cost or number of repairs of repairable products, TIS report 89CRD239, General Electric Company Research and Development, Schenectady, NY.
[8] O'Connor, P. D. T. (1985), Practical Reliability Engineering, Second Edition, New York, NY: John Wiley & Sons, Inc.
[9] Parida, N. (1991), Reliability and life estimation from component fatigue failures below the go-no-go fatigue limit, Journal of Testing and Evaluation, 19, 450-453.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
A SAS Code
%macro psfile(file);
  ************************************;
  * Create postscript file 'file.ps' *;
  ************************************;
  goptions reset=goptions device=psepsf cback=white
           colors=(black) noprompt gaccess=gsasfile gsfmode=replace;
  filename gsasfile "&file..ps";
  %put %str( );
  %put %str(Graphics Device is--PSEPSF);
%mend psfile;

/************************************
 * NAME: General options            *
 ************************************/
options nodate nostimer nonumber source2 ls=76 ps=80;
goptions ftext=none htext=2 cell;
symbol1 v=plus h=2;
symbol2 v=x h=2;
symbol3 v=square h=2;
symbol4 v=circle h=2;
symbol5 v=star h=2;
title " ";

/************************************
 * NAME: Chain link data analysis   *
 ************************************/
%psfile(figures/chain.link);
filename chainid 'data/chain.link';
data chain;
  infile chainid firstobs=3;
  input cycles censor units;
run;
proc reliability data=chain;
  label cycles='Cycles (thousands)';
  freq units;
  distribution Weibull;
  probplot cycles*censor(2) / lrclper pconfplt
           llower=30 plower=.1 pupper=40 noinset;
run;

/************************************
 * NAME: Shock absorber analysis    *
 ************************************/
%psfile(figures/shock.absorber);
filename shockid 'data/shock.absorber';
data shock;
  infile shockid firstobs=4;
  input vehicle distance censor1 censor2 censor;
  keep vehicle distance censor;
run;
proc reliability data=shock;
  label distance='Kilometers';
  distribution Weibull;
  probplot distance*censor(2) / lrclper
           llower=5000 lupper=30000 plower=1 pupper=80;
  inset / height=2 cfill=white;
run;

/************************************
 * NAME: new technology analysis    *
 ************************************/
%psfile(figures/new.tech.arrh);
filename newid 'data/new.technology';
data newtech;
  infile newid firstobs=9;
  input time fail temp units;
run;
proc reliability data=newtech;
  label time='Hours';
  label temp='Celsius';
  freq fail;
  nenter units;
  distribution lognormal;
  model time = temp / readout relation=arr lrcl;
  rplot time = temp / readout relation=arr plotfit 10 50 90 fit=model
        lupper=1e7 llower=1e2 slower=100
        pplot noconf plower=1 pupper=95 nopplegend;
run;

%psfile(figures/new.tech.pplot);
proc reliability data=newtech;
  label time='Hours';
  label temp='Celsius';
  freq fail;
  nenter units;
  distribution lognormal;
  probplot time = temp / readout scale=.7 scinit overlay
           pupper=95 plower=01 lupper=3000 llower=100 noconf;
run;

/************************************
 * NAME: valveseat analysis         *
 ************************************/
%psfile(figures/valveseat);
symbol v=dot h=.7;
filename valveid 'data/valve.seat';
data valve;
  infile valveid firstobs=3;
  input id days value @@;
run;
proc reliability;
  label days='Days';
  unitid id;
  mcfplot days*value(-1);
  inset / cfill=white;
run;
Authors' Addresses
William Q. Meeker
Department of Statistics
Room 326 Snedecor Hall
Iowa State University
Ames, Iowa 50011-1210
Phone: (515) 294-5336
Fax: (515) 294-4040
Email: wqmeeker@iastate.edu

Luis A. Escobar
Dept. of Experimental Statistics
Room 159A, Ag. Adm. Bldg.
Louisiana State University
Baton Rouge, LA 70803-5606
Phone: (504) 388-8377
Fax: (504) 388-8344
Email: luis@stat.lsu.edu
http://www.stat.lsu.edu/faculty/escobar