
Product Reliability Slides

This document provides information on analyzing failure data from reliability testing. It discusses the key metrics that can be calculated from failure data including failure density, failure rate, reliability, mean time to failure, and confidence levels. The document outlines different types of failure data such as field data versus test data, grouped versus ungrouped data, and censored versus uncensored data. An example of calculating these metrics from grouped failure data of 500 items is presented, including determining the 90% confidence interval for the mean time to failure.

Basic Failure Data Analysis

Failure data consists of the recorded times to failure of many items.


From this data we have to calculate:
Failure Density f(t)
Failure Rate λ(t)
Reliability R(t)
MTTF
Standard deviation
Confidence levels

Failure data is of various types:

1. Field Data vs. Test Data


Field data is collected from user sites. It covers a large number of samples, but the data from individual samples will not be precise. Usually
the failure times are grouped into intervals, each containing a large number of samples.
Test data is obtained from reliability testing labs. The number of samples will be small,
but the failure times will be precise.

2. Grouped vs. Ungrouped Data


In grouped data, the failure times of many items are grouped into intervals
(typical of field data).
E.g.: 0-100 hrs, items failed = 20; 101-200 hrs, items failed = 23.

In ungrouped data, the precise failure times of individual items are noted.
They then have to be ranked according to the order of failure
(1st failure, 2nd failure, etc.).

3. Censored vs. Uncensored Data


(incomplete Vs. complete data)

Censoring is a common problem in failure data.


Some usual reasons are:
a) Items are removed before failure (because they failed in a different way).
b) The test is completed before all items fail.
We have to take the censored units into account; otherwise only the
weakest units would be considered and reliability would be underestimated.

Example of Grouped failure data (Uncensored):


The following field data is assembled from observing the failure of 500 items.
Plot the probability density, reliability, cumulative failure, and failure rate.
Find the MTTF and standard deviation.
Time (hr):   0    100  200  300  400  500  600  700  800  900  1000
Survivors:   500  480  421  360  343  295  220  175  130  70   0

The following quantities have to be tabulated, then plotted as histograms.

Calculation is carried out in finite steps over the intervals 0-100, 101-200, 201-300, etc.
Note that Δt = 100 hrs.

The table columns are: Time, Ns(t), Ns(t+Δt), Nf(t), h(t), f(t), R(t), F(t), with one row per interval (0-100, 101-200, etc.).

Where:
Ns(t) = number of survivors at time t.
Ns(t+Δt) = number of survivors at time t+Δt.
Nf(t) = number of failures in the interval Δt.

h(t), also written λ(t), is the failure rate in the interval Δt,


and is given by:

h(t) = (no. of failures in the interval) / [(no. of survivors at the beginning of the interval) × Δt]
     = Nf(t) / (Ns(t) × 100)

Its unit is fraction of failures per hour (or probability of failure per hour).
We can multiply the above by 100 to get the percentage of failures per hour.

f(t) is the probability density of failure (discrete form) in each interval:

f(t) = (no. of failures in the interval Δt) / (initial sample size)
     = Nf(t) / 500

Reliability:

R(t) = (net survivors at the end of the interval) / (initial sample size)
     = Ns(t+Δt) / 500

Cumulative failure:

F(t) = (net failed items by the end of the interval) / (initial sample size)
     = [500 − Ns(t+Δt)] / 500 = 1 − R(t)

Time      Ns(t)  Ns(t+Δt)  Nf(t)  h(t)         f(t)   R(t)   F(t)
0-100     500    480       20     4.0×10⁻⁴     0.04   0.96   0.04
101-200   480    421       59     12.3×10⁻⁴    0.118  0.842  0.158
201-300   421    360       61     14.5×10⁻⁴    0.122  0.72   0.28
301-400   360    343       17     4.72×10⁻⁴    0.034  0.686  0.314
401-500   343    295       48     14.0×10⁻⁴    0.096  0.59   0.41
501-600   295    220       75     25.4×10⁻⁴    0.150  0.44   0.56
601-700   220    175       45     20.5×10⁻⁴    0.09   0.35   0.65
701-800   175    130       45     25.7×10⁻⁴    0.09   0.26   0.74
801-900   130    70        60     46.2×10⁻⁴    0.12   0.14   0.86
901-1000  70     0         70     100.0×10⁻⁴   0.14   0.0    1.0

Σf = 1.0
(here, n = 10, i.e., the number of time intervals, or number of observations)

(Histograms are then plotted for the hazard rate, failure density, cumulative failure, and reliability.)

Based on the data we can answer questions like these:


1. What is the probability that the item will fail before 300 hours?
i.e., the cumulative probability of failure up to and including the 201-300 hr step:
= 0.04 + 0.118 + 0.122 = 0.28

2. What is the reliability of the item at 300 hours?


R(300) is 0.72.
3. What is the probability of failure between 300 and 700 hours of duty?

Add up the failure probabilities f(t) for 301-400, 401-500, 501-600 and 601-700:

= 0.034 + 0.096 + 0.15 + 0.09 = 0.37

Calculation of MTTF

MTTF = Σ ti f(ti)

Here ti is the midpoint of each interval, i.e. (t(i+1) + t(i))/2, because many items fail within an interval and their precise failure times are not known. The midpoints are 50, 150, ..., 950 hrs.

MTTF = 50×0.04 + 150×0.118 + ... + 950×0.14 = 548.8 hrs

Standard Deviation (s):

Variance = Σ ti² f(ti) − MTTF²
         = (50²×0.04 + 150²×0.118 + ... + 950²×0.14) − 548.8²
         = 381060 − 548.8² = 79879
s = √79879 = 283 hrs
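The grouped-data tabulation and the MTTF/standard-deviation calculation above can be reproduced with a short Python sketch (survivor counts taken from the table; variable names are our own):

```python
# Grouped failure data analysis (survivor counts from the slides).
survivors = [500, 480, 421, 360, 343, 295, 220, 175, 130, 70, 0]  # at t = 0,100,...,1000 hr
N, dt = survivors[0], 100.0

rows = []
for i in range(len(survivors) - 1):
    ns, ns_next = survivors[i], survivors[i + 1]
    nf = ns - ns_next              # failures in the interval
    h = nf / (ns * dt)             # failure (hazard) rate in the interval
    f = nf / N                     # discrete failure density per interval
    R = ns_next / N                # reliability at the end of the interval
    rows.append((nf, h, f, R, 1 - R))

# MTTF and variance use interval midpoints, since exact failure times are unknown.
mids = [100 * i + 50 for i in range(10)]
mttf = sum(t * r[2] for t, r in zip(mids, rows))
var = sum(t * t * r[2] for t, r in zip(mids, rows)) - mttf ** 2
s = var ** 0.5
print(round(mttf, 1), round(s))    # matches the 548.8 hrs and 283 hrs above
```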

Calculating the confidence levels of the MTTF

As per the Central Limit Theorem, the distribution of sample means is Gaussian.

If we knew the exact value of the population standard deviation σ, we could work out the confidence levels directly.
However, we have only s, the standard deviation of a sample with n observations.
Hence we use Student's t-distribution, which approximately resembles the
Gaussian distribution but has more spread (uncertainty) than the Gaussian.
The Student's t-distribution also realistically represents the distribution for small
values of n (i.e., number of observations).

The shape of the t-distribution changes depending on the value of n:

a parameter called the degrees of freedom (n−1) is used to specify the
shape of the t-distribution.
The higher the value of n, the more closely the Student distribution matches the
Gaussian distribution.

In the Student's t-distribution, the random variable t is defined as:

t = (x̄ − MTTF) / (s/√n)

where:
MTTF is the mean calculated from the data table,
s is the standard deviation calculated from n observations,
x̄ is the random variable representing the mean of n observations.

Calculating the 90% confidence level for the MTTF prediction using the t-distribution:

MTTF ± t1 × s/√n    (t1 corresponds to α = 0.05)

α is defined as (1 − confidence level)/2.

For our specific problem:
n = 10 (no. of observations)
DOF = n − 1 = 9
Confidence level = 0.9; hence α = 0.05.
Look in the t-table for the value of t1 corresponding to DOF = 9, α = 0.05:
t1 = 1.833
Lower limit: 548.8 − 1.833 × 283/√10 = 384.8
Upper limit: 548.8 + 1.833 × 283/√10 = 712.8
We can say with 90% certainty that the MTTF in this case will be between 384.8 and 712.8 hrs.
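The interval can be sketched in Python, taking the t-table value t1 = 1.833 for DOF = 9 and s = 283 hrs from the variance calculation above:

```python
# 90% confidence interval for the MTTF (t-table value hard-coded from the slides).
import math

mttf, s, n = 548.8, 283.0, 10
t1 = 1.833                       # Student's t: DOF = n-1 = 9, alpha = 0.05
half_width = t1 * s / math.sqrt(n)
lower, upper = mttf - half_width, mttf + half_width
print(round(lower, 1), round(upper, 1))
```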


Note:
In this example we have N=500 samples, divided
into n=10 classes or observations.

What is the ideal number of classes to divide 500 samples into?
If the number of classes is too small, the histogram will be too approximate.
If the number of classes is too large, each class will contain only a few
samples (not representative).
The optimum number of classes is approximately given by Sturges' rule:
n = integer(1 + 3.3 log10 N)
Here n = integer[1 + 3.3 log10 500]
= integer[9.9]
= 9 classes.
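Sturges' rule is a one-liner; a minimal sketch (function name is our own):

```python
# Sturges' rule: approximate optimum number of classes for N samples.
import math

def sturges(N: int) -> int:
    return int(1 + 3.3 * math.log10(N))

print(sturges(500))   # 1 + 3.3*log10(500) = 9.9 -> 9 classes
```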

Ungrouped data (without censoring):


These are data typically obtained by lab testing for reliability.
There will be only a few samples, but the time to failure is accurately noted.
The ungrouped data has to be ordered first, and then we plot the various
parameters.

Example:
Six machines A, B, C, D, E, F are tested until their first failure occurs. The failure
times in hours are 14.0, 12.2, 14.6, 14.1, 13.1 and 15.0 hrs.
Let us plot the cumulative failure frequency F(ti) and study the curve.

The times of failure are now ordered by increasing failure order:

Failure order (i)   Time of failure (hr)   Cumulative failure order F(t) = i/n
1                   12.2                   1/6 = 0.167
2                   13.1                   2/6 = 0.333
3                   14.0                   3/6 = 0.500
4                   14.1                   4/6 = 0.667
5                   14.6                   5/6 = 0.833
6                   15.0                   6/6 = 1.000

This means there is a 1/6 probability of failure by 12.2 hrs, 2/6 by 13.1 hrs, etc.

This cumulative curve has to be improved because F = 1 is not really achievable in a finite time.
Hence a corrected cumulative distribution has to be used.

There are two formulas to improve the cumulative distribution:

1. Mean rank formula:
We assume there are n + 1 samples, one of which doesn't fail by the end of the test.
Hence the cumulative failure F(i) is given by:

F(i) = i / (n + 1)

This gives F < 1 at the final time.

2. Median rank formula:

F(i) ≈ (i − 0.3) / (n + 0.4)

This is considered to be a much better estimate than the mean rank.

The two new estimates of F(t) are added below:

Failure order (i)   Time of failure (hr)   F(i) = i/n    F(i) = i/(n+1)   F(i) = (i−0.3)/(n+0.4)
1                   12.2                   1/6 = 0.167   1/7 = 0.143      0.1094
2                   13.1                   2/6 = 0.333   2/7 = 0.286      0.2656
3                   14.0                   3/6 = 0.500   0.4286           0.4219
4                   14.1                   4/6 = 0.667   0.5714           0.5781
5                   14.6                   5/6 = 0.833   0.7143           0.7344
6                   15.0                   6/6 = 1.000   6/7 = 0.8571     0.8906

It is seen that the median-rank fit is more nearly parallel to the original plot, and hence
captures the trend better than the mean-rank fit.

Detailed Example:
The unordered failure times (hr) of 10 machines are as follows:
24.5, 18.9, 54.7, 48.2, 20.1, 29.3, 15.4, 33.9, 72.0, 86.1.
Plot the cumulative failure, reliability, failure density, and hazard rate; find the MTTF
and the 90% confidence interval for the MTTF.

Procedure:
1. Find the cumulative failure F(ti) using the median rank formula = (i−0.3)/(n+0.4).

2. Reliability is 1 − F(ti), or R = (n − i + 0.7)/(n + 0.4).

3. Failure density f(t) = −dR/dt = dF/dt.

It can be calculated approximately by:

f(ti) = −[R(t(i+1)) − R(ti)] / (t(i+1) − ti)

4. Hazard rate h(t)= f(t)/R(t)


It can be calculated approximately by :

h(ti)= f(ti)/R(ti)

Failure order (i)   Time (hr)   R(ti)    Density f(t)   Hazard h(t)
(start)             0           1.0      0.0044         0.0044
i=1                 15.4        0.933    0.0275         0.0295
i=2                 18.9        0.837    0.0801         0.0958
i=3                 20.1        0.740    0.0219         0.0295
i=4                 24.5        0.644    0.0200         0.0311
i=5                 29.3        0.548    0.0209         0.0381
i=6                 33.9        0.452    0.0067         0.0149
i=7                 48.2        0.356    0.0148         0.0416
i=8                 54.7        0.260    0.0056         0.0214
i=9                 72.0        0.164    0.0068         0.0417
i=10                86.1        0.0673   -              -

Sample calculation for i = 1:
f(t1) = (0.933 − 0.837)/(18.9 − 15.4) = 0.0275
h(t1) = 0.0275/0.933 = 0.0295

ΣT = 403.1
MTTF = 403.1/10 = 40.31 hrs (each failure time pertains to one item, hence we
can find the MTTF by calculating the simple average).
s = 23; the 90% confidence limits are 26.98 and 53.64 hrs.
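The median-rank tabulation above can be sketched in Python (our own variable names; the median-rank and finite-difference formulas are those of the procedure):

```python
# Ungrouped (uncensored) analysis: median ranks, then R, f, h.
times = sorted([24.5, 18.9, 54.7, 48.2, 20.1, 29.3, 15.4, 33.9, 72.0, 86.1])
n = len(times)

# Reliability at each ordered failure time via the median-rank formula.
R = [1 - (i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

# Prepend the starting point (t = 0, R = 1) so f and h cover the first interval.
ts = [0.0] + times
Rs = [1.0] + R
f = [(Rs[i] - Rs[i + 1]) / (ts[i + 1] - ts[i]) for i in range(n)]
h = [f[i] / Rs[i] for i in range(n)]

mttf = sum(times) / n           # simple average: one item per failure time
print(round(mttf, 2))            # 40.31 hrs, as above
```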

Censored Data

How to deal with incomplete or censored data:

These are cases where some samples are prematurely withdrawn before
failure, or failure occurs due to some other cause.
a) Singly censored data:
here n items are tested for failure in a fixed test time.
b) Multiply censored data:
here the test times differ among the n items (as shown in previous examples).
We will deal only with multiply censored data, as it is more widely used.
There are two cases of multiply censored data: grouped and ungrouped.

Life Table method for grouped data which is censored
(originally used in medical trials)

Let the failure data be organized into i = 1, 2, 3, ..., n time intervals or bands;
say 0-100 hrs, 101-200 hrs, etc.
N = total number of samples.
Nf.i = number of samples failing in interval i.
Nc.i = number of censored items in interval i.
Ns.i = number of survivors at the beginning of interval i = N − Σ(Nf.j + Nc.j) over the previous intervals j = 1 to i−1.
N's.i = adjusted Ns.i = Ns.i − Nc.i/2

1. Ri = Π (1 − Nf.j/N's.j), the product taken over j = 1 to i
(i.e., the reliability of the i-th band is the product of the local reliabilities of the
bands from 1 to i).

2. Hazard rate: hi = Nf.i / (N's.i × Δt)

3. The failure density is found approximately by: fi = (R(i−1) − Ri) / Δt

Example:
A total of N = 20 samples are tested in intervals of Δt = 50 hrs.
5 are censored and 12 fail; the remaining 3 are still on test at the end.
Calculate the reliability and failure density distributions.

Time      Nf.i   Nc.i   Ns.i   N's.i   1 − Nf.i/N's.i (local reliability)   Ri      hi
0-50      3      0      20     20      0.850                                0.850   0.0030
51-100    1      1      17     16.5    0.940                                0.80    0.0012
101-150   2      1      15     14.5    0.862                                0.689   0.0028
151-200   2      2      12     11      0.818                                0.564   0.00364
201-250   1      0      8      8       0.875                                0.493   0.0025
251-300   2      1      7      6.5     0.692                                0.341   0.0062
301-350   1      0      4      4       0.750                                0.256   0.0050
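The life-table recursion can be sketched as follows (the per-interval failure and censoring counts are those consistent with the table's survivor numbers):

```python
# Life-table method for the censored grouped example (N = 20, dt = 50 hr).
nf = [3, 1, 2, 2, 1, 2, 1]     # failures per interval
nc = [0, 1, 1, 2, 0, 1, 0]     # censored per interval

N, dt = 20, 50.0
Ri, ns, rows = 1.0, N, []
for f_i, c_i in zip(nf, nc):
    ns_adj = ns - c_i / 2.0            # adjusted survivors: Ns' = Ns - Nc/2
    local = 1 - f_i / ns_adj           # local reliability of the band
    h = f_i / (ns_adj * dt)            # hazard rate in the band
    Ri *= local                        # R_i = product of local reliabilities
    rows.append((round(local, 3), round(Ri, 3), round(h, 4)))
    ns -= f_i + c_i                    # survivors entering the next band
print(rows)
```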

Ungrouped Censored Data

The preferred method is the rank adjustment method of Johnson [1959].
Consider the table below.

Rank without censoring   Rank with censoring
i=1                      1
i=2                      2
i=3                      Censored. Rank is skipped. (Rank increment = 1.2, say)
i=4                      2 + 1.2 = 3.2 (instead of 4 without censoring) (previous rank + RI)
i=5                      3.2 + 1.2 = 4.4 (instead of 5 without censoring) (previous rank + RI)
i=6                      Censored. Rank is skipped. (Rank increment = 1.3, say)
i=7                      4.4 + 1.3 = 5.7 (instead of 7 without censoring) (previous rank + RI)

We give a rank increment (RI) for each missing (censored) item: the rank is
skipped for the censored item, and the RI is added to the previously assigned
rank to obtain the next rank. A new RI is calculated after each censored item.
The later a component is censored, the higher the RI given to it.

How to find the rank increment (RI), reliability, etc.

Let n be the highest rank of failure, i.e., the number of items being tested.
i is the original rank without censoring.
it.i is the corrected rank. This is obtained by recursively adding the rank
increment to the previous corrected rank.

RI = [n + 1 − previous corrected rank (it.(i−1))] / [1 + number of units beyond the censored unit]

Reliability = 1 − (it.i − 0.3)/(n + 0.4)

Rank (i)   Time (hr)   Rank increment              Corrected rank (it.i)   Reliability
1          150         -                           1 (start)               0.933
2          340 C       (11−1)/(1+8) = 1.11         (skipped)               -
3          560         -                           1 + 1.11 = 2.11         0.826
4          800         -                           2.11 + 1.11 = 3.22      0.719
5          1130 C      (11−3.22)/(1+5) = 1.2963    (skipped)               -
6          1720        -                           3.22 + 1.2963 = 4.518   0.594
7          2470 C      -                           (skipped)               -
8          4210 C      (11−4.518)/(1+2) = 2.16     (skipped)               -
9          5230        -                           4.518 + 2.16 = 6.679    0.387
10         6890        -                           6.679 + 2.16 = 8.839    0.179

(C = censored item)
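The rank-adjustment recursion can be sketched in Python (data from the table above; recomputing the RI at each censored unit reproduces the corrected ranks, since consecutive censorings simply update the increment):

```python
# Johnson's rank-adjustment method for multiply censored, ungrouped data.
data = [(150, False), (340, True), (560, False), (800, False), (1130, True),
        (1720, False), (2470, True), (4210, True), (5230, False), (6890, False)]
n = len(data)

prev_rank, ri = 0.0, 1.0           # first failure gets rank prev + RI = 1
results = []
for idx, (t, censored) in enumerate(data):
    if censored:
        # RI = (n + 1 - previous corrected rank) / (1 + units beyond this one)
        ri = (n + 1 - prev_rank) / (1 + (n - (idx + 1)))
    else:
        prev_rank = prev_rank + ri
        rel = 1 - (prev_rank - 0.3) / (n + 0.4)     # median-rank reliability
        results.append((t, round(prev_rank, 3), round(rel, 3)))
print(results)
```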

How to identify a failure distribution, given failure data?

There are various methods.
We have to analyze the data and obtain the MTTF, standard deviation,
R(t), F(t), h(t) and f(t).

We can make some important inferences from them.

Examples:
If the mean and median time to failure are the same, it could be
a symmetric distribution, i.e., Gaussian, or Weibull with shape parameter
3 to 4.

If the mean is considerably more than the median, it could be an exponential
distribution, or a Weibull with shape parameter less than 3 (i.e., distributions
with a tail).
If the mean and standard deviation are the same, then it could be an
exponential distribution.

By plotting the hazard rate h(t) or λ(t):

If the hazard rate is constant, the distribution could be exponential.
If it is linearly increasing, the distribution could be Rayleigh.
If it is non-linearly increasing or decreasing, it could indicate a Weibull
distribution, which requires further investigation.

Fitting the data to standard probability distributions
(Weibull, exponential, etc.)
using curve fitting.

This is a very convenient and general-purpose method to identify a
failure distribution.
The plot of ti versus Fi (cumulative probability) is drawn on a linearized
scale, and a best-fit line is drawn through the plotted points.
The graph should be a straight line if the data matches the chosen
probability distribution.
From the straight line, noting the slope and intercepts, we can calculate
the failure rate and other distribution parameters.

For more accurate fitting we have to use specific tests for
specific distributions:
e.g., Bartlett's test for the exponential distribution,
Mann's test for the Weibull distribution,
the Kolmogorov-Smirnov test for normal and lognormal distributions.

In these exercises we use only uncensored data for simple illustration.
We could equally well use censored data and plot ti versus Fi using the
censoring methods previously studied.

Fitting the exponential distribution:

1. For the exponential distribution the cumulative probability is given by:

1 − F(t) = e^(−λt)

or log_e(1 − F(t)) = −λt,

i.e., log_e[1/(1 − F(t))] = λt

This is a line of the form y = bt, where b, the slope of the fitted line, is equal to λ.

Note: for the exponential distribution, MTTF = 1/λ;

also, F(MTTF) = 1 − e^(−1) = 0.632.

The value of t corresponding to F = 0.632 gives the MTTF.

Example 1. Manually fit an exponential distribution to the data:

the times to first failure of 8 machines, given in hours:
80, 134, 148, 186, 238, 450, 581, 890.

ti     Fi (median rank)   loge[1/(1−Fi)]
80     0.083              0.0870
134    0.2024             0.2261
148    0.3214             0.3878
186    0.4405             0.5807
238    0.5595             0.8199
450    0.6786             1.1350
581    0.7976             1.5976
890    0.9167             2.4849

The failure rate λ equals the slope, found from the fitted line's triangle as the ratio
Δy/Δx = 1.4/500 = 0.0028 per hour.
R = e^(−0.0028t); MTTF = 1/λ = 357 hrs.

Instead of manually drawing a line, a more accurate option is a
linear least-squares fit, minimizing the squares of the y-deviations.
Taking the ti values as xi and the loge[1/(1−Fi)] values as yi,
if y = a + bx is the fitted line, a and b are calculated as:

b = [n Σxiyi − (Σxi)(Σyi)] / [n Σxi² − (Σxi)²],   a = ȳ − b x̄
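A minimal least-squares sketch for the exponential fit (median ranks recomputed; the slope comes out close to the 0.0028/hr read off the manual plot):

```python
# Least-squares exponential fit: y = ln(1/(1-F)) vs t, slope = lambda.
import math

t = [80, 134, 148, 186, 238, 450, 581, 890]
n = len(t)
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]      # median ranks
y = [math.log(1 / (1 - Fi)) for Fi in F]

# Standard least-squares slope/intercept for y = a + b t.
tbar, ybar = sum(t) / n, sum(y) / n
b = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) / \
    sum((ti - tbar) ** 2 for ti in t)
a = ybar - b * tbar

lam = b                       # failure rate; MTTF = 1/lam
print(round(lam, 4), round(1 / lam))
```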

The same data can also be plotted on standard exponential distribution chart paper.

Here we can directly plot Fi (y-axis) against ti (x-axis) on the ready-made
logarithmic scale.
The value of ti corresponding to Fi = 0.632 gives the MTTF.
The reciprocal of the MTTF gives the failure rate.

(Figure: the exponential plot drawn on standard exponential chart paper, F(t) versus time t.)

How to identify a Normal (Gaussian) distribution?

F(t) = Φ((t − μ)/σ)

z = Φ⁻¹(F)
(we can find the values of z corresponding to F from tables)

z = (t − μ)/σ = t/σ − μ/σ

This is a line of the form y = bt + c, where:

b (slope) = 1/σ

c (y-intercept) = −μ/σ

When plotted on standard chart paper, the following relationships are useful:
F(MTTF) = 0.5
F(MTTF + σ) = 0.84 (the value of σ is the horizontal distance between F = 0.5 and F = 0.84)

We can plot z versus t , with or without standard chart paper.


Without chart paper:

The failure times of 20 machines are observed to be 68, 69.6, 71.1, 71.4, 74.3, 74.6, 75.5,
77.6, 77.8, 78, 78.2, 80.2, 80.3, 81.9, 83.0, 85.6, 87.4, 87.7, 88.4, 98.3 hrs.
Check the fit of a Gaussian distribution and find the MTTF and standard deviation.
i     ti     Fi (median rank)   zi (from table)
1     68.0   0.0343             −1.8211
2     69.6   0.0833             −1.3832
3     71.1   0.1324             −1.1151
4     71.4   0.1814             −0.9100
5     74.3   0.2304             −0.7375
...
20    98.3   0.9657             1.8211

From the fitted line of zi versus ti, approximately:
slope = 0.124, y-intercept = −10.
Hence standard deviation = 1/slope = 8.09 and MTTF = −(intercept)/slope ≈ 80 hrs.

(Figure: the same data plotted on normal probability chart paper, 0-100 hrs. The MTTF and
standard deviation are read off at F = 0.5 and F = 0.84, giving MTTF ≈ 80 hrs and σ ≈ 10.)
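The normal-probability fit can be sketched without chart paper, using the standard-library inverse normal CDF in place of the z tables (a least-squares line replaces the eyeballed one, so the numbers differ slightly from the chart reading):

```python
# Normal probability fit: z = inv_Phi(F) vs t; slope = 1/sigma, intercept = -mu/sigma.
from statistics import NormalDist

times = [68, 69.6, 71.1, 71.4, 74.3, 74.6, 75.5, 77.6, 77.8, 78, 78.2,
         80.2, 80.3, 81.9, 83.0, 85.6, 87.4, 87.7, 88.4, 98.3]
n = len(times)
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]      # median ranks
z = [NormalDist().inv_cdf(Fi) for Fi in F]                # replaces the z table

tbar, zbar = sum(times) / n, sum(z) / n
b = sum((t - tbar) * (zi - zbar) for t, zi in zip(times, z)) / \
    sum((t - tbar) ** 2 for t in times)
a = zbar - b * tbar

sigma, mu = 1 / b, -a / b        # invert slope/intercept relations
print(round(mu, 1), round(sigma, 2))
```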

Plotting Log-normal distributions

We have to use the alternate form of the log-normal distribution:

f(t) = [1/(σ t √(2π))] exp{ −(1/(2σ²)) [log_e(t/tmed)]² }

(where σ is the standard deviation of log_e t, and tmed is the median time to failure)

F(t) = Φ[(1/σ) log_e(t/tmed)]

This is written as:

z = Φ⁻¹(F) = (1/σ) log_e(t/tmed) = (1/σ) log_e t − (1/σ) log_e tmed

z is calculated from charts, using the same procedure as the normal
distribution plotting discussed previously.

i.e., z = bx + a, where the x-coordinate is log_e(time), the slope b is 1/σ,
and the y-intercept a is −(1/σ) log_e tmed, from which we can find tmed.

tmean, or the MTTF, is calculated by a separate formula:

MTTF = tmed × e^(σ²/2)

Illustration of a log-normal plot: zi (from −2 to 1) is plotted against log_e ti
(10 to 1000 on a log scale).
The slope is equal to 1/σ; the y-intercept is equal to −(1/σ) log_e tmed.

Log-normal probability chart paper (time axis 0.1 to 100.0 on a log scale):

note the log scale on the x-axis for time.
Here tmed is the time corresponding to F = 0.5, and 1/σ is the slope.
We can then find the MTTF.

How to fit a Weibull Distribution:

Since the Weibull distribution can model a wide range of failure rates, it is
highly important to learn to model Weibull distributions.
In the two-parameter Weibull distribution, we can find the shape factor (β) and
scale factor (θ).
In the three-parameter Weibull distribution, we also have to find the starting
time (t0 or γ).

Two-parameter Weibull distribution (β = shape parameter, θ = scale parameter):

F(t) = 1 − e^(−(t/θ)^β)
1 − F(t) = e^(−(t/θ)^β)

log_e[1/(1 − F(t))] = (t/θ)^β

log_e log_e[1/(1 − F(t))] = β log_e t − β log_e θ

i.e., y = bx + c

(b is the slope, equal to β; the y-intercept c is equal to −β log_e θ).

Note: the x-coordinates are in the form log_e(t); the y-coordinates are log_e log_e[1/(1−F(t))].

Note: the value of time t corresponding to F = 0.632 gives the value of θ.

Usually it is preferred to use standard Weibull chart paper.

(Figure: plot of log_e log_e[1/(1−F(t))] versus log_e t; the slope Δy/Δx = β.)
Manual plotting of the Weibull distribution

Plot the following data and check for a Weibull fit.

First let us plot without chart paper.


i    ti (hr)   Median rank (Fi)   loge(ti)   loge[loge(1/(1−Fi))]
1    32        0.13               3.4657     −1.9714
2    51        0.31               3.9318     −0.9914
3    74        0.50               4.3041     −0.3665
4    90        0.69               4.4998     0.1580
5    120       0.87               4.7875     0.7131

Slope = 2.2/1 = 2.2 = β.

The y-intercept is found by extending the line to cut the y-axis at x = 0:
c = −9.8.
Therefore θ = e^(−c/β) = e^(9.8/2.2) ≈ 85.
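The same fit can be done by least squares on the linearized coordinates (a sketch; the fitted slope comes out near 2.0, slightly below the 2.2 read off the hand-drawn line, while θ again comes out near 85):

```python
# Weibull linearization: y = ln ln(1/(1-F)) vs x = ln t;
# slope = beta, intercept = -beta*ln(theta).
import math

t = [32, 51, 74, 90, 120]
n = len(t)
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]      # median ranks
x = [math.log(ti) for ti in t]
y = [math.log(math.log(1 / (1 - Fi))) for Fi in F]

xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
c = ybar - b * xbar

beta, theta = b, math.exp(-c / b)
print(round(beta, 2), round(theta))
```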

(Figure: the same example solved using standard Weibull chart paper, F(t) versus time
10-1000 hrs; the line at F = 0.632 gives β = 1.9 and θ = 88.)

What is the failure rate? Use β = 1.9 and θ = 88.

λ(t) = (β/θ)(t/θ)^(β−1) = 0.0216 (t/88)^0.9   (an increasing failure rate)

MTTF = θ Γ(1 + 1/β) = 88 × Γ(1.53) = 88 × 0.8876 ≈ 78 hrs

(note: Gamma function values are obtained from tables)

tmed? Put R(tmed) = 0.5, where R(t) = e^(−(t/88)^1.9).

We get e^(−(tmed/88)^1.9) = 0.5

(or tmed = θ × 0.69315^(1/β); this is a standard formula derived from the above)
tmed = 72.56 hrs.
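The Weibull summary quantities can be checked numerically; math.gamma replaces the Gamma-function tables:

```python
# Weibull summary quantities for beta = 1.9, theta = 88.
import math

beta, theta = 1.9, 88.0
mttf = theta * math.gamma(1 + 1 / beta)        # 88 * Gamma(1.53) ~ 78 hrs
tmed = theta * 0.69315 ** (1 / beta)           # median time to failure

def hazard(t):
    # Weibull hazard rate: (beta/theta) * (t/theta)^(beta-1)
    return (beta / theta) * (t / theta) ** (beta - 1)

print(round(mttf), round(tmed, 2), round(hazard(88), 4))
```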

Three-parameter Weibull Distribution

When we plot the two-parameter Weibull distribution, sometimes we get a curve
which deviates from the straight line towards the bottom right.
This indicates the need for adding the location parameter, or failure-free time,
which is represented by t0 or γ.

Failures start occurring only after the time t0.

The three-parameter Weibull equations are obtained by replacing
t with (t − t0) in the functions:

R(t) = 1 − F(t) = e^(−((t−t0)/θ)^β)

λ(t) = (β/θ)((t−t0)/θ)^(β−1)

MTTF = t0 + θ Γ(1 + 1/β)

tmed = t0 + θ × 0.69315^(1/β)

Procedure for a three-parameter Weibull plot:

1. Once we obtain a downward-deviating Weibull plot (as discussed earlier),
it means we need to insert the location parameter t0.
2. This is done by trial and error.
3. Assume a reasonable value of t0. On the x-axis, (t − t0) is taken instead
of t. If the guess of t0 is correct we get a straight line; β and θ are then
calculated as usual.
4. If the estimate of t0 is too high we get an upward-deviating curve.

(Figure: the original downward-deviating curve and the corrected straight-line curve,
plotted against a log scale of (t − t0).)

Goodness of Fit and Hypothesis Testing
using the Chi-Square (χ²) distribution

This is a statistical method for testing whether a given failure data set fits a
predicted distribution.
It can be applied only to grouped and ordered data. There must be at least
5 samples in each class. If the data is ungrouped, it has to be grouped into
appropriate classes using Sturges' rule.

A hypothesis is made, such as "The given failure data fits the exponential
distribution", and it is proved either true or false.
The deviations between the given data and the predicted model are expressed
as the chi-square parameter χ².
For the hypothesis to be true, the cumulative probability of χ² must be
0.1 or less. This corresponds to a 90% confidence level.

The chi-square distribution is represented as:

f(χ², ν) = Y0 (χ²)^((ν−2)/2) e^(−χ²/2)

Here χ² is the estimator of deviation (or error value), and
ν is the DOF of the data, i.e., the number of classes − 1 (or the number of random variables − 1).
In order for the hypothesis to be acceptable, χ² must be small, such that the
cumulative frequency F(χ²) is 0.1 or less.
This value is obtained from data tables.

(Figure: the χ² distribution for ν = 5; the cumulative probability F(χ²) must be less than 0.1.
Note: some chi-square tables give the upper-tail area, i.e., 1 − F(χ²); others give F(χ²).)

The chi-square parameter χ² is calculated as:

χ² = Σ (Observed_i − Expected_i)² / Expected_i = Σ (xi − Ei)² / Ei

Example:
A coin is given. The hypothesis is "This is a fair coin."
If it is a fair coin, it has equal chances of giving heads or tails.
To test this hypothesis, we conduct a test of 100 coin tosses,
and the data is shown in 2 classes.

Class   Observed   Expected
Heads   38         50
Tails   62         50

χ² = (38 − 50)²/50 + (62 − 50)²/50 = 5.76

ν = no. of classes − 1 = 2 − 1 = 1
Look in the chi-square tables for χ² = 5.76 and DOF = 1.
F(χ²) must be less than 0.1, or the confidence level (1 − F(χ²)) must be at least 0.9.

For χ² = 5.76: P = 1 − F(χ²) = 0.015 (interpolated).

P should be at least 0.9 for the hypothesis to be accepted.

Here it is only 0.015.
Hence the hypothesis is definitely rejected.
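A quick numerical check of the coin example; for DOF = 1 the chi-square tail probability can be obtained from the normal distribution via P(χ² > x) = 2(1 − Φ(√x)), so no table is needed:

```python
# Chi-square test for the fair-coin hypothesis (DOF = 1).
import math
from statistics import NormalDist

observed = [38, 62]
expected = [50, 50]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 1 DOF: P(chi2 > x) = 2 * (1 - Phi(sqrt(x))), i.e. 1 - F(chi2).
p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
print(round(chi2, 2), round(p, 3))   # ~0.016, close to the interpolated 0.015
```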

Reliability example 2:
Grouped failure data of transistors are shown in 5 classes.
1. Check the hypothesis that there is a constant failure rate of 0.012 per hour.
2. Check the hypothesis that there is a variable failure rate defined by
h(t) = 0.2670 t^(−0.4170).

Data given:

Classes (interval in hours)   No. failed
0-999                         18
1000-1999                     14
2000-2999                     10
3000-3999                     12
4000-4999                     6

Let us test the first hypothesis (constant failure rate of 0.012/hr).

This means we expect 12 failures every 1000 hours.

Classes (interval in hours)   Observed   Expected
0-999                         18         12
1000-1999                     14         12
2000-2999                     10         12
3000-3999                     12         12
4000-4999                     6          12

χ² = (18−12)²/12 + (14−12)²/12 + (10−12)²/12 + (12−12)²/12 + (6−12)²/12 = 6.67

ν = no. of classes − 1 = 5 − 1 = 4
Look in the chi-square tables for χ² = 6.67 and DOF = 4, and find the confidence level (1 − F(χ²)).

For χ² = 6.67: P = 1 − F(χ²) = 0.15 (interpolated).

P should be at least 0.9 for the hypothesis to be accepted.

Here it is only 0.15.
Hence the hypothesis is definitely rejected.
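The constant-rate test can be checked numerically as well; for an even DOF the chi-square upper-tail probability has the closed form e^(−x/2) Σ (x/2)^k / k! for k = 0 ... DOF/2 − 1, which again avoids the tables:

```python
# Chi-square test for the constant-failure-rate hypothesis (DOF = 4).
import math

observed = [18, 14, 10, 12, 6]
expected = [12] * 5
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even(x, df):
    # Upper-tail probability 1 - F(x) for even degrees of freedom.
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(df // 2))

p = chi2_sf_even(chi2, 4)      # confidence level P = 1 - F(chi2)
print(round(chi2, 2), round(p, 2))   # ~6.67 and ~0.15, as interpolated above
```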

Now test the hypothesis that there is a variable failure rate defined by
h(t) = 0.2670 t^(−0.4170).

Using the above equation, find the failure rates at the centres of the class
intervals, i.e., h(500), h(1500), h(2500), h(3500), h(4500). Let us assume
they represent the failure rate across each interval.
The failure rates are 0.02, 0.0126, 0.0102, 0.0089, 0.008.
The expected failures are 20, 13, 10, 9 and 8 as per the above failure rates.

Classes (interval in hours)   Observed   Expected
0-999                         18         20
1000-1999                     14         13
2000-2999                     10         10
3000-3999                     12         9
4000-4999                     6          8

χ² = (18−20)²/20 + (14−13)²/13 + ..... = 1.1

For χ² = 1.1: P = 1 − F(χ²) ≈ 0.9 (very rough interpolation).

P should be at least 0.9 for the hypothesis to be accepted.

Hence the hypothesis is accepted.

An alternate chi-square table shows F(χ²) instead of 1 − F(χ²). There we can
get a better estimate of the cumulative probability corresponding to χ² = 1.1.

Goodness of Fit for Ungrouped Data:

There are other tests, such as the Kolmogorov-Smirnov test, which can
be used for ungrouped data.
But they have limitations, such as being applicable only to normal
and lognormal distributions.

END
