Testing of Hypothesis

This document discusses sampling distributions and statistical inference. It begins by explaining that statistical inference involves using sample data to make conclusions about an overall population. It then defines key terms like sample, population, parameter, and statistic. The document goes on to explain that the sample mean follows a normal distribution, according to the central limit theorem. It also defines concepts like the null hypothesis, alternative hypothesis, and types of errors that can occur in statistical testing. Finally, it outlines the general process for hypothesis testing and discusses tests that can be used for the mean when certain conditions are met.

Uploaded by

Siddharth Bahri
Copyright
© Attribution Non-Commercial (BY-NC)

SAMPLING DISTRIBUTIONS

Statistical inference is concerned with making decisions about a population based on
the information contained in a random sample from that population.

A sample is a subset of the population, selected to be representative of the larger
population. It is essential that any sample be as representative as possible of the
population from which it is drawn.

Often, we may want to know things about populations but don’t have data for every
person or thing in the population. If a company’s customer service division wanted to
learn whether its customers were satisfied, it would not be practical or perhaps even
possible to contact every individual who purchased a product. Instead, the company
might select a sample or samples of the population.

For instance, we may be interested in the mean fill volume of a can of soft drink. The
mean fill volume in the population is required to be 300 millilitres. An engineer takes a
random sample of 25 cans and computes the sample average to be x̄ = 298 millilitres.
The engineer will probably decide that the population mean is µ = 300 millilitres, even
though the sample mean was 298 millilitres, because he or she knows that the sample mean is
a reasonable estimate of µ and that a sample mean of 298 millilitres is very likely to
occur even if the true population mean is µ = 300 millilitres. In fact, if the true mean is
300 millilitres, tests of 25 cans made repeatedly, perhaps every five minutes, would
produce values of x̄ that vary both above and below µ = 300 millilitres.

The sample mean is a statistic; that is, a random variable that depends on the results
obtained in each particular sample. Since a statistic is a random variable, it has a
probability distribution.

A parameter is a characteristic of a population. A statistic is a characteristic of a
sample. Inferential statistics enables us to make an educated guess about a population
parameter based on a statistic computed from a sample randomly drawn from that
population.

A sampling distribution is the probability distribution of a sample statistic that is
formed when samples of size n are repeatedly taken from a population. If the sample
statistic is the sample mean, then the distribution is the sampling distribution of sample
means.

SAMPLING DISTRIBUTION OF MEANS

Suppose that a random sample of size n is taken from a normal population with mean µ
and variance σ². Each observation in this sample, say X1, X2, ..., Xn, is a normally
and independently distributed random variable with mean µ and variance σ². Then, by
the reproductive property of the normal distribution, the sample mean

$$ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} $$

has a normal distribution with mean

$$ \mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu $$

and variance

$$ \sigma_{\bar{X}}^2 = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n} $$

If we are sampling from a population that has an unknown probability distribution, the
sampling distribution of the sample mean will still be approximately normal with mean µ
and variance σ²/n, if the sample size is large. This is one of the most useful theorems,
called the central limit theorem.

CENTRAL LIMIT THEOREM

If X1, X2, ..., Xn is a random sample of size n taken from a population (finite or infinite)
with mean µ and finite variance σ², and if X̄ is the sample mean, then the limiting form of
the distribution of

$$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} $$

as n → ∞ is the standard normal distribution.
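
The theorem can be illustrated with a small simulation. The sketch below is not part of the original notes: it assumes an exponential population with mean 1 and variance 1 (chosen only because it is clearly non-normal), and the sample size, number of repetitions, and seed are arbitrary. It checks that the sample means have mean close to µ and variance close to σ²/n.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from a sample of size n drawn from
    an exponential population with mean 1 and variance 1 (an assumption)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


n = 30
sample_means = simulate_sample_means(n=n, reps=20000)

# The theory predicts mean mu = 1 and variance sigma^2 / n = 1/30.
mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)
print("mean of sample means:", mean_of_means)
print("variance of sample means:", var_of_means, "theory:", 1 / n)
```

Increasing n should bring the histogram of the sample means closer to the normal shape, even though the individual observations are strongly skewed.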

NOMENCLATURE AND DEFINITIONS

Statistical Hypothesis
A statistical hypothesis is a statement about the parameters of one or more populations.

Null Hypothesis (H0)

A null hypothesis is a hypothesis that might be falsified on the basis of observed data.
The null hypothesis typically proposes a general or default position, such as that there is
no relationship between two quantities, or that there is no difference between a
treatment and the control.

The null hypothesis is a statement of zero or no change. If the original claim includes
equality (≤, =, or ≥), it is the null hypothesis. If the original claim does not include
equality (<, ≠, >), then the null hypothesis is the complement of the original
claim. The null hypothesis always includes the equal sign.

The decision is based on the null hypothesis.

Alternative Hypothesis (H1 or Ha)

A statement that is true if the null hypothesis is false. The type of test (left, right, or
two-tail) is based on the alternative hypothesis.

Test statistic
Sample statistic used to decide whether to reject or fail to reject the null hypothesis.

Critical region
Set of all values which would cause us to reject H0 (Region of rejection).

Critical value(s)
The value(s) which separate the critical region from the non-critical region. The critical
values are determined independently of the sample statistics.

Significance level (α)

The probability of rejecting the null hypothesis when it is true; α = 0.05 and α = 0.01 are
common. If no level of significance is given, use α = 0.05.

Decision
A statement based upon the null hypothesis. It is either "reject the null hypothesis" or
"fail to reject the null hypothesis". We will never accept the null hypothesis.

Conclusion
A statement which indicates at what level of significance the null hypothesis is rejected
or not rejected.

Statisticians never accept the null hypothesis; they only fail to reject it. In other words,
we either say that H0 is false, or that we do not have enough evidence to say that it is
false, but we never say that it is true, because another sample might later show that it
is false.

ERRORS OF TESTING OF HYPOTHESIS

Type I error
Rejecting the null hypothesis when it is true. Usually the more serious error.

Type II error
Failing to reject the null hypothesis when it is false.

Alpha (α)
Probability of committing a Type I error.

Beta (β)
Probability of committing a Type II error.

Decision             H0 True               H0 False
Reject H0            Type I Error (α)      Correct Assessment
Fail to Reject H0    Correct Assessment    Type II Error (β)

Which of the two errors is more serious? Type I or Type II?

Since Type I is usually the more serious error, that is the one we concentrate on. We
usually pick alpha to be very small (0.05, 0.01). Notice that alpha is not a Type I
error. Alpha is the probability of committing a Type I error. Likewise, beta is the
probability of committing a Type II error.
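
The meaning of alpha as a long-run error rate can be checked by simulation. The following is a rough sketch, not part of the original notes: it repeatedly samples from a normal population for which H0 is true and counts how often a two-tailed Z-test at α = 0.05 rejects. The values µ0 = 300, σ = 4, n = 25, the number of trials, and the seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def type_one_error_rate(mu0=300.0, sigma=4.0, n=25, alpha=0.05,
                        trials=5000, seed=1):
    """Estimate how often a two-tailed Z-test rejects H0: mu = mu0
    when H0 is actually true (all parameter values are assumptions)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the population described by H0.
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if abs(z) >= z_crit:
            rejections += 1  # a Type I error: H0 is true but was rejected
    return rejections / trials


rate = type_one_error_rate()
print("observed Type I error rate:", rate)
```

The observed rate should land near the chosen α, which is the sense in which α is the probability of a Type I error.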

TYPE OF TESTS
The type of test is determined by the Alternative Hypothesis (H1)

Left Tailed Test

H1: parameter < value
Notice the inequality points to the left.
Decision Rule: Reject H0 if test statistic < critical value

Right Tailed Test

H1: parameter > value
Notice the inequality points to the right.
Decision Rule: Reject H0 if test statistic > critical value

Two Tailed Test

H1: parameter ≠ value
Notice the inequality points to both sides.
Decision Rule: Reject H0 if test statistic < critical value (left) or test statistic > critical value (right)
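
The three decision rules can be written as one small function. This is a minimal sketch under the assumption that the test statistic follows the standard normal distribution under H0; the function name `reject_h0` and its arguments are hypothetical and do not come from the original notes.

```python
from statistics import NormalDist


def reject_h0(test_stat, alpha=0.05, tail="two"):
    """Apply the left-, right-, or two-tailed decision rule, assuming the
    test statistic is standard normal when H0 is true."""
    std_normal = NormalDist()
    if tail == "left":
        # Reject when the statistic falls below the lower critical value.
        return test_stat < std_normal.inv_cdf(alpha)
    if tail == "right":
        # Reject when the statistic exceeds the upper critical value.
        return test_stat > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # Split alpha equally between the two tails.
        crit = std_normal.inv_cdf(1 - alpha / 2)
        return test_stat < -crit or test_stat > crit
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(reject_h0(1.7, tail="right"))  # compared against about 1.645
print(reject_h0(1.7, tail="two"))    # compared against about 1.96
```

The same statistic can lead to different decisions depending on the alternative hypothesis, which is why the type of test must be fixed before looking at the data.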

THE TESTING PROCESS

Hypothesis testing involves the following general procedure:

1. State the relevant null and alternative hypotheses to be tested.

2. Consider the assumptions being made in doing the test; for example, assumptions
about the statistical independence or about the form of the distributions of the
observations. This step is equally important, as invalid assumptions will mean that
the results of the test are invalid.

3. Compute the relevant test statistic. The distribution of such a statistic under
the null hypothesis can be derived from the assumptions. In standard cases this
will be a well-known result. For example, the test statistic may follow a
Student's t distribution or a normal distribution. The distribution of the test
statistic partitions the possible values of the estimator into those for which the
null hypothesis is rejected and those for which it is not.

4. Compare the test statistic (S) to the relevant critical values (CV) (obtained from
tables in standard cases).

5. Decide to either fail to reject the null hypothesis or reject it in favor of the
alternative. For a right-tailed test, the decision rule is to reject the null
hypothesis (H0) if S > CV and to fail to reject it otherwise; left-tailed and
two-tailed tests use the corresponding rules for their critical regions.

TESTS FOR MEAN

We will see how to conduct a test of hypothesis for a mean, when the following
conditions are met:

• The sampling method is simple random sampling.
• The sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normally distributed if any of
the following conditions apply.

• The population distribution is normal.
• The sampling distribution is symmetric, unimodal, without outliers, and the
  sample size is 15 or less.
• The sampling distribution is moderately skewed, unimodal, without outliers, and
  the sample size is between 16 and 40.
• The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results.

STATEMENT OF THE HYPOTHESES

The table below shows three sets of hypotheses. Each makes a statement about how
the population mean μ is related to a specified value μ0. In each case the null
hypothesis includes the equal sign.

Case   Null hypothesis   Alternative hypothesis   Number of tails
1      μ = μ0            μ ≠ μ0                   2
2      μ ≥ μ0            μ < μ0                   1
3      μ ≤ μ0            μ > μ0                   1

The first case of hypotheses is an example of a two-tailed test, since an extreme value
on either side of the sampling distribution would cause a researcher to reject the null
hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme
value on only one side of the sampling distribution would cause a researcher to reject
the null hypothesis.

TESTS FOR MEAN - KNOWN VARIANCE: (Z- Test)

Let X1, X2, ..., Xn be a random sample drawn from a normal population with known
variance σ². Using sample data, conduct a one-sample Z-test.

Calculate the test statistic

$$ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} $$

where Z is the standard normal variable. Compare the calculated value of Z with Zα and
Zα/2, which are the critical values from the normal distribution corresponding to α and α/2
probabilities, representing one-tail and two-tail tests. The decisions are given in the
following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject null hypothesis if
1      μ = μ0            μ ≠ μ0                   2                 |Z| ≥ Zα/2
2      μ ≥ μ0            μ < μ0                   1                 Z ≤ −Zα
3      μ ≤ μ0            μ > μ0                   1                 Z ≥ Zα

EXAMPLE:

Aircrew escape systems are powered by a solid propellant. The burning rate of this
propellant is an important product characteristic. Specifications require that the mean
burning rate must be 50 centimeters per second. We know that the standard deviation
of burning rate is σ = 2 centimeters per second. The experimenter decides to specify a
type I error probability or significance level of α = 0.05, selects a random sample of
n = 25, and obtains a sample average burning rate of x̄ = 51.3 centimeters per second.
What conclusion should be drawn?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is μ, the mean burning rate.
2. H0: μ = 50 centimeters per second
3. H1: μ ≠ 50 centimeters per second
4. α = 0.05
5. The test statistic is

$$ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} $$

6. Reject H0 if Z0 ≥ 1.96 or if Z0 ≤ −1.96. Note that this results from step 4,
where we specified α = 0.05, so the boundaries of the critical region are at
Z0.025 = 1.96 and −Z0.025 = −1.96 from normal distribution tables.
7. Computations: Since x̄ = 51.3 and σ = 2,

$$ Z_0 = \frac{51.3 - 50}{2/\sqrt{25}} = 3.25 $$

8. Conclusion: Since Z0 = 3.25 > 1.96 (Zα/2), we reject H0: μ = 50 at the 0.05
level of significance. Stated more completely, we conclude that the mean
burning rate differs from 50 centimeters per second, based on a sample of 25
measurements. In fact, there is strong evidence that the mean burning rate is
not equal to 50 centimeters per second.
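
The computation in step 7 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_z` is hypothetical, and the two-sided p-value is an addition that does not appear in the original example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(xbar, mu0, sigma, n):
    """Z statistic for a one-sample test of the mean with known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))


alpha = 0.05
z0 = one_sample_z(xbar=51.3, mu0=50.0, sigma=2.0, n=25)

# Two-tailed critical value Z_{alpha/2} from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-sided p-value (not in the original notes): P(|Z| >= |z0|) under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

reject = abs(z0) >= z_crit
print("Z0 =", round(z0, 2), "critical value =", round(z_crit, 2), "reject H0:", reject)
```

Using `NormalDist().inv_cdf` replaces the table lookup for the critical value, so the same code works for other significance levels.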

Note: In case of one tail tests:

(i) when the alternative is of the type H1: μ > 50 centimeters per second, the
conclusion would be to reject H0: μ ≤ 50 at the 0.05 level of significance,
since Z0 = 3.25 > 1.645 (Zα). Stated more completely, there is strong
evidence that the mean burning rate exceeds 50 centimeters per second,
based on a sample of 25 measurements.

(ii) when the alternative is of the type H1: μ < 50 centimeters per second, the
conclusion would be do not reject H0: μ ≥ 50 at the 0.05 level of
significance, since Z0 = 3.25 > −1.645 (−Zα). Stated more completely, the
sample of 25 measurements provides no evidence that the mean burning rate
is less than 50 centimeters per second.

TESTS FOR MEAN - UNKNOWN VARIANCE: (t - Test)

Let X1, X2, ..., Xn be a random sample drawn from a normal population with unknown
variance σ². Using sample data, conduct a one-sample t-test.

Calculate the test statistic

$$ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} $$

where s is the sample standard deviation and t has a t-distribution with (n − 1) degrees
of freedom. Compare the calculated value of t with tα and tα/2, which are the critical
values from the t-distribution with (n − 1) degrees of freedom corresponding to α and α/2
probabilities, representing one-tail and two-tail tests. The decisions are given in the
following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject null hypothesis if
1      μ = μ0            μ ≠ μ0                   2                 |t| ≥ tα/2
2      μ ≥ μ0            μ < μ0                   1                 t ≤ −tα
3      μ ≤ μ0            μ > μ0                   1                 t ≥ tα

EXAMPLE:

The increased availability of light materials with high strength has revolutionized the
design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and
very thin faces can result in much longer tee shots, especially for players of modest
skills. This is due partly to the "spring-like effect" that the thin face imparts to the ball.
Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity
of the ball to the incoming velocity can quantify this spring-like effect. The ratio of
velocities is called the coefficient of restitution of the club. An experiment was
performed in which 15 drivers produced by a particular club maker were selected at
random and their coefficients of restitution measured. In the experiment the golf balls
were fired from an air cannon so that the incoming velocity and spin rate of the ball could
be precisely controlled. It is of interest to determine if there is evidence (with α = 0.05) to
support a claim that the mean coefficient of restitution exceeds 0.82. The observations
are:

0.8411   0.8191   0.8182   0.8125   0.8750
0.8580   0.8532   0.8483   0.8276   0.7983
0.8042   0.8730   0.8282   0.8359   0.8660

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the mean coefficient of restitution, μ.
2. H0: μ = 0.82
3. H1: μ > 0.82. We want to reject H0 if the mean coefficient of restitution
exceeds 0.82.
4. α = 0.05
5. The test statistic is

$$ t_0 = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} $$

6. Reject H0 if t0 ≥ 1.761 (t0.05, 14).
7. Computations: Since x̄ = 0.83725, s = 0.02456, μ0 = 0.82 and n = 15, we
have

$$ t_0 = \frac{0.83725 - 0.82}{0.02456/\sqrt{15}} = 2.72 $$

8. Conclusion: Since t0 = 2.72 > 1.761 (t0.05, 14), we reject H0: μ = 0.82 at the
0.05 level of significance and conclude that the mean coefficient of restitution
exceeds 0.82, based on a sample of 15 measurements.
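
The summary statistics and the value of t0 can be recomputed directly from the 15 observations. The sketch below uses only the Python standard library, which has no t-distribution quantile function, so the critical value 1.761 is simply taken from the t-table value quoted in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Coefficients of restitution for the 15 drivers, as listed in the example.
cor = [
    0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
    0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
    0.8042, 0.8730, 0.8282, 0.8359, 0.8660,
]

mu0 = 0.82
t_crit = 1.761  # t(0.05, 14), copied from the table value quoted in the text

n = len(cor)
xbar = mean(cor)
s = stdev(cor)  # sample standard deviation (divides by n - 1)
t0 = (xbar - mu0) / (s / sqrt(n))

reject = t0 >= t_crit  # right-tailed rule for H1: mu > mu0
print("xbar =", round(xbar, 5), "s =", round(s, 5), "t0 =", round(t0, 2))
print("reject H0:", reject)
```

If SciPy is available, the critical value could instead be obtained from its t-distribution, but that dependency is not assumed here.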

Note: In case of a two tail test:

i. when the alternative is of the type H1: μ ≠ 0.82, the conclusion would be to
reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > 2.145 (t0.025, 14).
Stated more completely, we conclude that the mean coefficient of restitution is not
equal to 0.82, based on a sample of 15 measurements.

In case of a one tail test:

ii. when the alternative is of the type H1: μ < 0.82, the conclusion would be
do not reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > −1.761
(−t0.05, 14). Stated more completely, the sample of 15 measurements provides no
evidence that the mean coefficient of restitution is less than 0.82.

PROBLEMS:

1. An inventor has developed a new, energy-efficient lawn mower engine. He
claims that the engine will run continuously for 5 hours (300 minutes) on a single
gallon of regular gasoline. Suppose a simple random sample of 50 engines is
tested. The engines run for an average of 295 minutes, with a standard deviation
of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes
against the alternative hypothesis that the mean run time is not 300 minutes. Use
a 0.05 level of significance. (Assume that run times for the population of engines
are normally distributed.)

2. Bon Air Elementary School has 300 students. The principal of the school thinks
that the average IQ of students at Bon Air is at least 110. To prove her point, she
administers an IQ test to 20 randomly selected students. Among the sampled
students, the average IQ is 108 with a standard deviation of 10. Based on these
results, should the principal accept or reject her original hypothesis? Assume a
significance level of 0.01.

TESTS FOR A POPULATION PROPORTION

Suppose that a random sample of size n is drawn from a large population and that
X (≤ n) observations in this sample belong to a specified class of interest. Then P̂ = X/n
is a point estimator of the proportion p of the population that belongs to this class. Note
that n and p are the parameters of a binomial distribution. When n is relatively large and p
is not too close to either 0 or 1, P̂ is approximately normal with mean
p and variance p(1 − p)/n. For this approximation we require that np and n(1 − p) be
greater than 5.

We will consider testing the following hypotheses.

Case   Null hypothesis   Alternative hypothesis   Number of tails
1      p = p0            p ≠ p0                   2
2      p ≥ p0            p < p0                   1
3      p ≤ p0            p > p0                   1

Let X be the number of observations in a random sample of size n that belong to the
class associated with p. Using sample data, conduct a one-sample Z-test as follows.

Calculate the test statistic

$$ Z = \frac{\hat{P} - p_0}{\sqrt{p_0(1 - p_0)/n}} $$

where Z has approximately the standard normal distribution. Compare the calculated value
of Z with Zα and Zα/2, which are the critical values from the normal distribution
corresponding to α and α/2 probabilities, representing one-tail and two-tail tests. The
decisions are given in the following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject null hypothesis if
1      p = p0            p ≠ p0                   2                 |Z| ≥ Zα/2
2      p ≥ p0            p < p0                   1                 Z ≤ −Zα
3      p ≤ p0            p > p0                   1                 Z ≥ Zα

The first case of hypotheses is an example of a two-tailed test, since an extreme value
on either side of the sampling distribution would cause a researcher to reject the null
hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme
value on only one side of the sampling distribution would cause a researcher to reject
the null hypothesis.

EXAMPLE:

A semiconductor manufacturer produces controllers used in automobile engine
applications. The customer requires that the process fraction defective at a critical
manufacturing step not exceed 0.05 and that the manufacturer demonstrate process
capability at this level of quality using α = 0.05. The semiconductor manufacturer takes
a random sample of 200 devices and finds that 4 of them are defective. Can the
manufacturer demonstrate process capability for the customer?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the process fraction defective p.
2. H0: p = 0.05
3. H1: p < 0.05 (this formulation of the problem will allow the
manufacturer to make a strong claim about process capability if the null hypothesis
H0: p = 0.05 is rejected).
4. α = 0.05
5. The test statistic is

$$ Z_0 = \frac{\hat{P} - p_0}{\sqrt{p_0(1 - p_0)/n}} $$

6. Reject H0 if Z0 ≤ −1.645. Note that this results from step 4, where we
specified α = 0.05 for a one-tail test, so the boundary of the critical region is
−Z0.05 = −1.645 from normal distribution tables.
7. Computations: Since X = 4, n = 200, p0 = 0.05 and P̂ = 4/200 = 0.02,

$$ Z_0 = \frac{0.02 - 0.05}{\sqrt{0.05(1 - 0.05)/200}} = -1.95 $$

8. Conclusion: Since Z0 = −1.95 < −1.645 (−Zα), we reject H0: p = 0.05 at
the 0.05 level of significance and conclude that the process fraction defective is
less than 0.05. Hence the process is capable.
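
The same computation can be sketched in Python. The function name `one_sample_prop_z` is hypothetical; the data (4 defectives in 200 devices, p0 = 0.05) are those of the example, and the left-tail critical value is obtained from the standard normal distribution rather than from tables.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_z(x, n, p0):
    """Z statistic for a one-sample test of a proportion, using the
    normal approximation with the standard error computed under H0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z0 = one_sample_prop_z(x=4, n=200, p0=0.05)

# Left-tailed test: the critical value is -Z_alpha (about -1.645).
z_crit = NormalDist().inv_cdf(alpha)
reject = z0 <= z_crit

print("Z0 =", round(z0, 2), "critical value =", round(z_crit, 3), "reject H0:", reject)
```

Note that the standard error uses p0, not P̂, because the statistic is evaluated under the null hypothesis.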

PROBLEMS:

1. The CEO of a large electric utility claims that 80 percent of his 1,000,000
customers are very satisfied with the service they receive. To test this claim,
the local newspaper surveyed 100 customers, using simple random sampling.
Among the sampled customers, 73 percent say they are very satisfied. Based
on these findings, can we reject the CEO's hypothesis that 80% of the
customers are very satisfied? Use a 0.05 level of significance.

2. The previous problem is stated a little bit differently. Suppose the CEO claims
that at least 80 percent of the company's 1,000,000 customers are very
satisfied. Again, 100 customers are surveyed using simple random sampling.
The result: 73 percent are very satisfied. Based on these results, should we
accept or reject the CEO's hypothesis? Assume a significance level of 0.05.

TESTS FOR DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS

We now consider the case where there are two binomial parameters of interest, say p1
and p2. Suppose that two independent random samples of sizes n1 and n2 are taken
from two populations, and let X1 and X2 represent the number of observations that
belong to the class of interest in samples 1 and 2, respectively. Let

$$ \hat{P}_1 = \frac{X_1}{n_1}, \qquad \hat{P}_2 = \frac{X_2}{n_2} \qquad \text{and} \qquad \hat{P} = \frac{X_1 + X_2}{n_1 + n_2} $$

Then we test the following hypotheses:

Case   Null hypothesis               Alternative hypothesis        Number of tails
1      p1 = p2 (or p1 − p2 = 0)      p1 ≠ p2 (or p1 − p2 ≠ 0)      2
2      p1 ≥ p2 (or p1 − p2 ≥ 0)      p1 < p2 (or p1 − p2 < 0)      1
3      p1 ≤ p2 (or p1 − p2 ≤ 0)      p1 > p2 (or p1 − p2 > 0)      1

Calculate the test statistic

$$ Z = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} $$

where Z has approximately the standard normal distribution.

Under the null hypothesis p1 = p2, the common proportion is estimated by the pooled
estimator P̂ and the test statistic becomes

$$ Z_0 = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from
the normal distribution corresponding to α and α/2 probabilities, representing one-tail and
two-tail tests. The decisions are given in the following table.

Case   Null hypothesis               Alternative hypothesis        Number of tails   Reject null hypothesis if
1      p1 = p2 (or p1 − p2 = 0)      p1 ≠ p2 (or p1 − p2 ≠ 0)      2                 |Z0| ≥ Zα/2
2      p1 ≥ p2 (or p1 − p2 ≥ 0)      p1 < p2 (or p1 − p2 < 0)      1                 Z0 ≤ −Zα
3      p1 ≤ p2 (or p1 − p2 ≤ 0)      p1 > p2 (or p1 − p2 > 0)      1                 Z0 ≥ Zα

EXAMPLE

Suppose the Acme Drug Company develops a new drug, designed to prevent colds.
The company states that the drug is equally effective for men and women. To test this
claim, they choose a simple random sample of 100 women and 200 men from a
population of 100,000 volunteers. At the end of the study, 38% of the women caught a
cold; and 51% of the men caught a cold. Based on these findings, can we reject the
company's claim that the drug is equally effective for men and women? Use a 0.05 level
of significance.

We may solve this problem by following the eight-step procedure outlined as follows.
This is a two tail test.

1. The parameters of interest are p1 and p2, the proportions of men and women,
respectively, who caught a cold.
2. H0: p1 = p2
3. H1: p1 ≠ p2
4. α = 0.05
5. The test statistic is

$$ Z_0 = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

where P̂1 = 0.51, P̂2 = 0.38, n1 = 200 and n2 = 100. The counts are X1 = 0.51 × 200 = 102
and X2 = 0.38 × 100 = 38, so the pooled estimate is P̂ = (102 + 38)/300 = 0.4667.

6. Reject H0 if Z0 ≥ 1.96 or if Z0 ≤ −1.96. Note that this results from step 4, where we
specified α = 0.05, so the boundaries of the critical region are at
Z0.025 = 1.96 and −Z0.025 = −1.96 from normal distribution tables.

7. Computations: The value of the test statistic is

$$ Z_0 = \frac{0.51 - 0.38}{\sqrt{0.4667(1 - 0.4667)\left(\dfrac{1}{200} + \dfrac{1}{100}\right)}} = 2.13 $$

8. Conclusion: Since Z0 = 2.13 > 1.96 (Zα/2), we reject H0: p1 = p2 at the 0.05 level
of significance. Hence we reject the claim that the drug is equally effective for
men and women.

Note: In case of one tail tests:

i. when the alternative is of the type H1: p1 > p2, the conclusion would be to reject
H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.13 > 1.645 (Zα), and
conclude that a larger proportion of men than women caught a cold; that is, the
drug is less effective for men than for women.

ii. when the alternative is of the type H1: p1 < p2, the conclusion would be do not
reject H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.13 > −1.645 (−Zα).
There is no evidence that the drug is more effective for men than for women.
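
The pooled two-proportion statistic can be checked numerically. The sketch below is an illustration, not the original author's code: the function name `two_prop_z` is hypothetical, and the counts 102 and 38 are derived from the reported percentages (51% of 200 men, 38% of 100 women), so the pooled proportion is computed directly as (102 + 38)/300.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 = p2 from the counts x1, x2
    observed in independent samples of sizes n1, n2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


alpha = 0.05
# Sample 1: men (102 of 200 caught a cold); sample 2: women (38 of 100).
z0 = two_prop_z(x1=102, n1=200, x2=38, n2=100)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed, about 1.96
reject = abs(z0) >= z_crit

print("pooled proportion =", round(140 / 300, 4))
print("Z0 =", round(z0, 2), "reject H0:", reject)
```

Working from counts rather than rounded percentages avoids arithmetic slips in the pooled estimate.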

TESTS FOR DIFFERENCE IN TWO MEANS - KNOWN VARIANCE: (Z- Test)

Let us see how to conduct a hypothesis test for the difference between two means from
two independent populations with means μ1 and μ2 and variances σ1² and σ2²,
respectively. Inferences will be based on two random samples of sizes n1 and n2,
respectively.

Let X11, X12, ..., X1n1 be a random sample of n1 observations from the population with
mean μ1 and variance σ1², and let X21, X22, ..., X2n2 be a random sample of n2
observations from the population with mean μ2 and variance σ2². Assume that both
populations are independent and normal.

Then the random variable

$$ Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

has a N(0, 1) distribution.

We now consider hypothesis testing on the difference in the means, μ1 − μ2. The various
hypotheses are stated in the following table.

Case   Null hypothesis               Alternative hypothesis        Number of tails
1      μ1 = μ2 (or μ1 − μ2 = 0)      μ1 ≠ μ2 (or μ1 − μ2 ≠ 0)      2
2      μ1 ≥ μ2 (or μ1 − μ2 ≥ 0)      μ1 < μ2 (or μ1 − μ2 < 0)      1
3      μ1 ≤ μ2 (or μ1 − μ2 ≤ 0)      μ1 > μ2 (or μ1 − μ2 > 0)      1

Given the two sample observations, calculate the sample means x̄1 and x̄2. Under
the null hypothesis the test statistic becomes

$$ Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from
the normal distribution corresponding to α and α/2 probabilities, representing one-tail and
two-tail tests. The decisions are given in the following table.

Case   Null hypothesis               Alternative hypothesis        Number of tails   Reject null hypothesis if
1      μ1 = μ2 (or μ1 − μ2 = 0)      μ1 ≠ μ2 (or μ1 − μ2 ≠ 0)      2                 |Z0| ≥ Zα/2
2      μ1 ≥ μ2 (or μ1 − μ2 ≥ 0)      μ1 < μ2 (or μ1 − μ2 < 0)      1                 Z0 ≤ −Zα
3      μ1 ≤ μ2 (or μ1 − μ2 ≤ 0)      μ1 > μ2 (or μ1 − μ2 > 0)      1                 Z0 ≥ Zα

EXAMPLE:

A product developer is interested in reducing the drying time of primer paint. Two
formulations of the paint are tested. Formulation-1 is the standard chemistry and
Formulation-2 has a new drying ingredient that should reduce the drying time. From
experience it is known that the standard deviation of drying time is 8 minutes and this
inherent variability should be unaffected by the addition of the new ingredient. Ten
specimens are painted with Formulation-1 and another 10 specimens are painted with
Formulation-2; the 20 specimens are painted in random order. The two sample average
drying times are x̄1 = 121 minutes and x̄2 = 112 minutes, respectively. What
conclusions can the product developer draw about the effectiveness of the new
ingredient, using α = 0.05?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the difference in mean drying times, μ1 − μ2.
2. H0: μ1 − μ2 = 0, (or) μ1 = μ2
3. H1: μ1 − μ2 > 0, (or) μ1 > μ2. We want to reject H0 if the new
ingredient reduces the drying time.
4. α = 0.05
5. The test statistic is

$$ Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

6. Reject H0 if Z0 ≥ 1.645 (Zα). Note that this results from step 4, where
we specified α = 0.05, so the boundary of the critical region is at
Z0.05 = 1.645 from normal distribution tables.
7. Computations: Since x̄1 = 121 minutes, x̄2 = 112 minutes, σ1² = σ2² =
8² = 64 minutes² and n1 = n2 = 10, the value of the test statistic is

$$ Z_0 = \frac{121 - 112}{\sqrt{\dfrac{8^2}{10} + \dfrac{8^2}{10}}} = 2.52 $$

8. Conclusion: Since Z0 = 2.52 > 1.645, we reject H0: μ1 − μ2 = 0 at the
0.05 level of significance and conclude that adding the new ingredient to the
paint significantly reduces the drying time.
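
The arithmetic in step 7 can be reproduced as follows. This is a minimal sketch using the Python standard library; the function name `two_sample_z` is hypothetical, and the inputs are the values given in the example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Z statistic for H0: mu1 - mu2 = 0 with known standard deviations."""
    return (xbar1 - xbar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


alpha = 0.05
z0 = two_sample_z(xbar1=121, xbar2=112, sigma1=8, sigma2=8, n1=10, n2=10)

# Right-tailed test for H1: mu1 > mu2, so the critical value is Z_alpha.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z0 >= z_crit

print("Z0 =", round(z0, 2), "critical value =", round(z_crit, 3), "reject H0:", reject)
```

Because both formulations share the same known σ, the denominator simplifies to σ·√(2/n) when n1 = n2 = n.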

Note:
(i) In case of a two tail test: when the alternative hypothesis is of the type H1:
μ1 ≠ μ2, (or) H1: μ1 − μ2 ≠ 0, the conclusion would be to reject H0: μ1 − μ2 = 0
at the 0.05 level of significance, since Z0 = 2.52 > 1.96 (Zα/2). That is, the
mean drying time is significantly different for the two types of primers.

(ii) In case of a one tail test: when the alternative hypothesis is of the
type H1: μ1 < μ2, (or) H1: μ1 − μ2 < 0, the conclusion would be do not reject
H0: μ1 − μ2 = 0 at the 0.05 level of significance, since Z0 = 2.52 > −1.645 (−Zα).
There is no evidence that adding the new ingredient to the paint increases
the drying time.

TESTS FOR DIFFERENCE IN TWO MEANS - UNKNOWN VARIANCE: (t- Test)

Let us see how to conduct a hypothesis test for the difference between two means from
two independent populations with means μ1 and μ2 and unknown variances σ1² and
σ2², respectively. Here it is assumed that σ1² = σ2² = σ². That is, the variances of the
two normal populations are unknown but equal. Inferences will be based on two
random samples of sizes n1 and n2, respectively.

Let X11, X12, ..., X1n1 be a random sample of n1 observations from the population with
mean μ1, and let X21, X22, ..., X2n2 be a random sample of n2 observations from the
population with mean μ2, both with common unknown variance σ². Assume that both
populations are independent and normal.

The pooled estimator of the common variance σ² from the samples is

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

where

$$ s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 \qquad \text{and} \qquad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2 $$

Then the random variable

$$ t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} $$

has a t-distribution with n1 + n2 − 2 degrees of freedom.

We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various
hypotheses are stated in the following table.

Case Null hypothesis Alternative hypothesis Number of tails


μ1 = μ 2 μ1≠ μ2
1 or or 2
μ1 - μ2 =0 μ1 - μ2 ≠0
μ1 ≥ μ2 μ1 < μ2
2 or or 1
μ1 - μ2 ≥0 μ1 - μ2 <0

18
μ1 ≤ μ2 μ1 > μ2
3 or or 1
μ1 - μ2 ≤ 0 μ1 - μ2 > 0

Given the two sample observations, calculate the sample means x1 , x 2 and the
pooled estimate sp. Under the null hypothesis the test statistic becomes,
x1 − x 2
to =
1 1
sp +
n1 n2

Compare the calculated value of t0 with tα and tα/2, which are the critical values from
the t-distribution with n1+n2-2 degrees of freedom corresponding to α and α/2
probabilities, representing one-tail and two-tail tests. The decisions are given in the
following table.

Case   Null hypothesis            Alternative hypothesis     Number of tails   Reject Null Hypothesis
1      μ1 = μ2 (or) μ1 - μ2 = 0   μ1 ≠ μ2 (or) μ1 - μ2 ≠ 0   2                 |t0| ≥ tα/2
2      μ1 ≥ μ2 (or) μ1 - μ2 ≥ 0   μ1 < μ2 (or) μ1 - μ2 < 0   1                 t0 ≤ -tα
3      μ1 ≤ μ2 (or) μ1 - μ2 ≤ 0   μ1 > μ2 (or) μ1 - μ2 > 0   1                 t0 ≥ tα
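
The three rejection rules can be sketched in code. This is only an illustration and not part of the original notes: the function name reject_null and the labels "two-sided", "less" and "greater" are assumptions made for this sketch, and the critical value passed in is the positive table value (tα/2 for case 1, tα for cases 2 and 3).

```python
def reject_null(t0, alternative, critical_value):
    """Apply the rejection rule for the chosen alternative hypothesis.

    critical_value is the positive table value: t(alpha/2) for a two-tail
    test, t(alpha) for a one-tail test. The labels are hypothetical names
    used only in this sketch.
    """
    if alternative == "two-sided":   # case 1: H1 is mu1 != mu2
        return abs(t0) >= critical_value
    if alternative == "less":        # case 2: H1 is mu1 < mu2
        return t0 <= -critical_value
    if alternative == "greater":     # case 3: H1 is mu1 > mu2
        return t0 >= critical_value
    raise ValueError(f"unknown alternative: {alternative!r}")


# Illustrative values: a statistic of -0.35 against a two-tail critical value of 2.145
print(reject_null(-0.35, "two-sided", 2.145))
```

The same helper applies to any of the t-tests in these notes, since only the statistic and the table value change.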

EXAMPLE:

Two catalysts are being analyzed to determine how they affect the mean yield of a
chemical process. Specifically, catalyst-1 is currently in use, but catalyst-2 is
acceptable. Since catalyst-2 is cheaper, it should be adopted if it does not change the
process yield. A test is run in the pilot plant and the results are shown in the table
below. Is there any difference between the mean yields? Use α = 0.05, and assume
equal variances.

Observation Number   Catalyst – 1   Catalyst – 2
1                    91.50          89.19
2                    94.18          90.95
3                    92.18          90.46
4                    95.39          93.21
5                    91.79          97.19
6                    89.07          97.04
7                    94.72          91.07
8                    89.21          92.75
From the data it can be calculated that x̄1 = 92.255, x̄2 = 92.733, s1 = 2.39 and s2 = 2.98.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameters of interest are μ1 and μ2, and we want to know if μ1 - μ2 = 0.
2. H0: μ1 - μ2 = 0, (or) μ1 = μ2
3. H1: μ1 - μ2 ≠ 0, (or) μ1 ≠ μ2.
4. α = 0.05
5. The test statistic is

\[ t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]

6. Reject H0 if t0 > 2.145 ( = t0.025,14) (or) if t0 < -2.145 (= -t0.025,14). Note that
this results from step 4, where we specified α = 0.05 and so the boundaries of
the critical region are at t0.025,14 = 2.145 from t-distribution tables (two tail).
7. Computations: We have x̄1 = 92.255, x̄2 = 92.733, s1 = 2.39, s2 =
2.98 and n1 = n2 = 8. Therefore,

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(2.39)^2 + 7(2.98)^2}{8 + 8 - 2} = 7.30 \]

and sp = √7.30 = 2.70.

Hence the value of the test statistic is

\[ t_0 = \frac{92.255 - 92.733}{2.70\sqrt{\dfrac{1}{8} + \dfrac{1}{8}}} = -0.35 \]

8. Conclusion: Since t0 = -0.35, we have -2.145 < t0 = -0.35 < 2.145, so
we do not reject H0: μ1 - μ2 = 0 at the 0.05 level of significance and conclude
that there is no strong evidence that catalyst-2 results in a mean yield that differs
from the mean yield when catalyst-1 is used.
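
As a cross-check on the arithmetic, the pooled t statistic can be recomputed from the raw catalyst data. The sketch below is a minimal illustration using only the Python standard library; the function name pooled_t_statistic is an assumption of this sketch, and the critical value 2.145 is simply the table value quoted in step 6 rather than something the code derives.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Return (t0, degrees of freedom, pooled standard deviation sp)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance with divisor n1 - 1
    s2_sq = variance(sample2)  # sample variance with divisor n2 - 1
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    sp = math.sqrt(sp_sq)
    t0 = (mean(sample1) - mean(sample2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2, sp


catalyst_1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
catalyst_2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]

t0, dof, sp = pooled_t_statistic(catalyst_1, catalyst_2)

T_CRITICAL = 2.145  # t(0.025, 14), taken from the t-table as quoted in the text
reject_h0 = abs(t0) > T_CRITICAL

print(f"sp = {sp:.2f}, t0 = {t0:.2f}, df = {dof}, reject H0: {reject_h0}")
```

Working from the unrounded data gives values that agree with the hand computation to two decimal places.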

Note: In case of one tail tests –

i. when the alternative is of the type H1: μ1 - μ2 > 0 (or) H1: μ1 > μ2, the
conclusion would be not to reject H0: μ1 - μ2 = 0, (or) μ1 = μ2 at the 0.05 level of
significance, since t0 = -0.35 < 1.761 (= t0.05,14). We conclude that the mean yield with
catalyst-1 is not significantly greater than the mean yield with catalyst-2.

ii. when the alternative is of the type H1: μ1 - μ2 < 0 (or) H1: μ1 < μ2, the
conclusion would be not to reject H0: μ1 - μ2 = 0, (or) μ1 = μ2 at the 0.05 level of
significance, since t0 = -0.35 > -1.761 (= -t0.05,14). We conclude that the mean yield with
catalyst-1 is not significantly less than the mean yield with catalyst-2.

PROBLEMS

1. Within a school district, students were randomly assigned to one of two Math
teachers - Mrs. Smitha and Mrs. Lakshmi. After the assignment, Mrs. Smitha had 30
students, and Mrs. Lakshmi had 25 students. At the end of the year, each class took
the same standardized test. Mrs. Smitha’s students had an average test score of 78,
with a standard deviation of 10; and Mrs. Lakshmi's students had an average test
score of 85, with a standard deviation of 15. Test the hypothesis that Mrs. Smitha
and Mrs. Lakshmi are equally effective teachers. Use a 0.10 level of significance.
(Assume that student performance is approximately normal.)

2. The Acme Company has developed a new battery. The engineer in charge claims
that the new battery will operate continuously for at least 7 minutes longer than the
old battery. To test the claim, the company selects a simple random sample of 100
new batteries and 100 old batteries. The old batteries run continuously for 190
minutes with a standard deviation of 20 minutes; the new batteries, 200 minutes with
a standard deviation of 40 minutes. Test the engineer's claim that the new batteries
run at least 7 minutes longer than the old. Use a 0.05 level of significance.

PAIRED t-TEST

The paired t-test is generally used when measurements are taken from the same
subject before and after some manipulation such as injection of a drug. For example, a
paired t test can be used to determine the significance of a difference in blood pressure
before and after administration of an experimental drug. Paired t-test may also be used
to compare samples that are subjected to different conditions, provided the samples in
each pair are identical otherwise. For example, we might test the effectiveness of a
water additive in reducing bacterial numbers by sampling water from different sources
and comparing bacterial counts in the treated versus untreated water sample. Each
different water source would give a different pair of data points.

The number of points in each data set must be the same, and they must be organized in
pairs, in which there is a definite relationship between each pair of data points. Clearly,
for the paired t-test the data are dependent, i.e. there is a one-to-one correspondence
between the values in the two samples. Examples are the same subject measured before
and after a process change, or the same subject measured at different times.

Let (X11, X21), (X12, X22), … , (X1n, X2n) be a set of n paired observations of a sample
drawn from two populations with means μ1 and μ2 and variances σ1² and σ2² respectively.
Define the differences between each pair of observations as Dj = X1j - X2j, j = 1,2, … , n.
The Dj's are assumed to be normally distributed with mean μD = μ1 - μ2 and variance
σD². Hence testing a hypothesis about the difference between μ1 and μ2 can be
accomplished by performing a one-sample t-test on μD.

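Then

\[ t_D = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \]

has a t-distribution with (n-1) degrees of freedom, where S_D is the sample standard
deviation of the differences. An estimator of σD² is given by

\[ s_D^2 = \frac{1}{n-1}\sum_{j=1}^{n} (d_j - \bar{d})^2, \quad\text{where } d_j = x_{1j} - x_{2j} \text{ and } \bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j . \]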

We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various
hypotheses are stated in the following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails
1      μD = 0            μD ≠ 0                   2
2      μD ≥ 0            μD < 0                   1
3      μD ≤ 0            μD > 0                   1

Given the pairs of sample observations, calculate d̄ and sD². Under the null hypothesis
the test statistic becomes

\[ t_0 = \frac{\bar{d}}{s_D/\sqrt{n}} \]

where t0 has a t-distribution with (n-1) degrees of freedom. Compare the calculated
value of t0 with tα and tα/2, which are the critical values from the t-distribution with
(n-1) degrees of freedom corresponding to α and α/2 probabilities, representing one-tail
and two-tail tests. The decisions are given in the following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject Null Hypothesis
1      μD = 0            μD ≠ 0                   2                 |t0| ≥ tα/2
2      μD ≥ 0            μD < 0                   1                 t0 ≤ -tα
3      μD ≤ 0            μD > 0                   1                 t0 ≥ tα

EXAMPLE:

The following data refer to strength predictions for nine steel plate girders by the
Karlsruhe and Lehigh methods. Test whether there is any significant difference between
the two methods.

Girder Karlsruhe Method Lehigh Method Difference dj


1 1.186 1.061 0.119
2 1.151 0.992 0.159
3 1.322 1.063 0.259
4 1.339 1.062 0.277
5 1.200 1.065 0.138
6 1.402 1.178 0.224
7 1.365 1.037 0.328
8 1.537 1.086 0.451
9 1.559 1.052 0.507

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameter of interest is the difference in mean strength
between the two methods, say, μD = μ1 - μ2.
2. H0: μD = 0
3. H1: μD ≠ 0
4. α = 0.05
5. The test statistic is

\[ t_0 = \frac{\bar{d}}{s_D/\sqrt{n}} \]

6. Reject H0 if t0 > 2.306 ( = t0.025,8) (or) if t0 < -2.306 (= -t0.025,8). Note that
this results from step 4, where we specified α = 0.05 and so the boundaries of
the critical region are at t0.025,8= 2.306 from t-distribution tables (two tail).

7. Computations: We have d̄ = 0.2736, sD = 0.1356 and n = 9.
Therefore,

\[ t_0 = \frac{0.2736}{0.1356/\sqrt{9}} = 6.05 \]

8. Conclusion: Since t0 = 6.05 > 2.306 (= t0.025,8), we reject H0: μD = 0 at
the 0.05 level of significance and conclude that the strength prediction methods
yield different results.
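
The paired test reduces to a one-sample t-test on the differences, which is easy to verify numerically. The following is a minimal sketch using the Python standard library; the function name paired_t_statistic is an assumption of this sketch, the differences are taken from the "Difference dj" column of the table as printed, and 2.306 is the table value quoted in step 6.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(differences):
    """Return (t0, degrees of freedom, mean difference, sD)."""
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)  # sample standard deviation, divisor n - 1
    t0 = d_bar / (s_d / math.sqrt(n))
    return t0, n - 1, d_bar, s_d


# Differences d_j as listed in the example table (Karlsruhe minus Lehigh)
differences = [0.119, 0.159, 0.259, 0.277, 0.138, 0.224, 0.328, 0.451, 0.507]

t0, dof, d_bar, s_d = paired_t_statistic(differences)

T_CRITICAL = 2.306  # t(0.025, 8), taken from the t-table as quoted in the text
reject_h0 = abs(t0) > T_CRITICAL

print(f"d_bar = {d_bar:.4f}, sD = {s_d:.4f}, t0 = {t0:.2f}, df = {dof}, reject H0: {reject_h0}")
```

Only the nine differences enter the calculation; the individual method values are not needed once the differences are formed.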

Note: In case of one tail tests –

i. when the alternative is of the type H1: μD > 0, the conclusion would be to
reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 > 1.860 (= t0.05,8).
Specifically, the data indicate that the Karlsruhe method produces, on average,
higher strength predictions than does the Lehigh method.

ii. when the alternative is of the type H1: μD < 0, the conclusion would be not to
reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 >
-1.860 (= -t0.05,8). That is, the data give no evidence that the Karlsruhe method produces,
on average, lower strength predictions than does the Lehigh method.

TEST FOR SINGLE VARIANCE

Suppose that we wish to test the hypothesis that the variance σ² of a normal population
equals a specified value, say σ0², or equivalently, that the standard deviation σ is equal
to σ0. Let X1, X2, … , Xn be a random sample of n observations from this population.

The table below shows three sets of hypotheses for testing the variance. Each makes a
statement about how the population variance σ² is related to a specified value σ0².

Case   Null hypothesis   Alternative hypothesis   Number of tails
1      σ² = σ0²          σ² ≠ σ0²                 2
2      σ² ≥ σ0²          σ² < σ0²                 1
3      σ² ≤ σ0²          σ² > σ0²                 1

We use the test statistic

\[ \chi^2 = \frac{(n-1)S^2}{\sigma_0^2}. \]

Under the null hypothesis H0: σ² = σ0², this statistic has a chi-square
distribution with (n-1) degrees of freedom.

Given a sample of observations, calculate the sample variance

\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 . \]

Under the null hypothesis the test statistic becomes

\[ \chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}, \]

where χ0² has a chi-square distribution with (n-1) degrees of freedom. Compare the
calculated value of χ0² with χ²α and χ²α/2 (upper tail) or χ²1-α and χ²1-α/2 (lower tail),
which are the critical values from the chi-square distribution with (n-1) degrees of
freedom corresponding to one-tail and two-tail tests. The decisions are given in the
following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject Null Hypothesis
1      σ² = σ0²          σ² ≠ σ0²                 2                 χ0² ≥ χ²α/2 or χ0² ≤ χ²1-α/2
2      σ² ≥ σ0²          σ² < σ0²                 1                 χ0² ≤ χ²1-α
3      σ² ≤ σ0²          σ² > σ0²                 1                 χ0² ≥ χ²α

EXAMPLE:

An automatic filling machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid
ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable
proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data to
suggest that the manufacturer has a problem with underfilled or overfilled bottles?
Use α = 0.05 and assume that fill volume has a normal distribution.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameter of interest is the population variance, σ²
2. H0: σ² = 0.01
3. H1: σ² > 0.01
4. α = 0.05
5. The test statistic is

\[ \chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} \]

6. Reject H0 if χ0² > 30.14 (= χ²0.05,19). Note that this results from step 4, where we
specified α = 0.05 and so the critical region is at χ²0.05,19 = 30.14 from chi-
square distribution tables (one tail).

7. Computations: We have s² = 0.0153 and n = 20. Therefore,

\[ \chi_0^2 = \frac{19(0.0153)}{0.01} = 29.07 \]

8. Conclusion: Since χ0² = 29.07 < 30.14 (= χ²0.05,19), we do not reject the null
hypothesis H0: σ² = 0.01 at the 0.05 level of significance and conclude that
there is no strong evidence that the variance of fill volume exceeds 0.01 (fluid
ounces)².
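
The computation in step 7 is a single formula, sketched here for reference. This is an illustrative snippet, not part of the original notes; the function name variance_test_statistic is an assumption, and the critical value 30.14 is the table value quoted in step 6.

```python
def variance_test_statistic(sample_variance, n, sigma0_sq):
    """Return the chi-square statistic (n - 1) * s^2 / sigma0^2 and its degrees of freedom."""
    return (n - 1) * sample_variance / sigma0_sq, n - 1


chi0_sq, dof = variance_test_statistic(sample_variance=0.0153, n=20, sigma0_sq=0.01)

CHI2_CRITICAL = 30.14  # upper 0.05 point of chi-square with 19 df, as quoted in the text
reject_h0 = chi0_sq > CHI2_CRITICAL  # one-tail test of H1: sigma^2 > 0.01

print(f"chi0^2 = {chi0_sq:.2f}, df = {dof}, reject H0: {reject_h0}")
```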

Note: (i) In case of a two-tail test – when the alternative hypothesis is of the
type H1: σ² ≠ 0.01, the conclusion would be not to reject H0: σ² = 0.01
at the 0.05 level of significance, since χ0² = 29.07 < 32.85 (= χ²0.025,19) and
χ0² = 29.07 > 8.91 (= χ²0.975,19), and we conclude that there is no strong
evidence that the variance of fill volume differs from 0.01 (fluid ounces)².

(ii) In case of a one-tail test – when the alternative hypothesis is of
the type H1: σ² < 0.01, the conclusion would be not to reject the null
hypothesis H0: σ² = 0.01 at the 0.05 level of significance, since
χ0² = 29.07 > 10.12 (= χ²0.95,19), and we conclude that there is no strong
evidence that the variance of fill volume is less than 0.01 (fluid ounces)².

TEST FOR EQUALITY OF VARIANCES

Suppose that two independent normal populations are of interest, where the population
means and variances, say, μ1, σ1², μ2, and σ2², are unknown. We wish to test the
hypothesis about the equality of the two variances, say, H0: σ1² = σ2². Assume that two
random samples of sizes n1 and n2 are available from the two populations respectively, and let s1² and
s2² be the respective sample variances based on the two samples.

The null and alternative hypotheses are given in the following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails
1      σ1² = σ2²         σ1² ≠ σ2²                2
2      σ1² ≥ σ2²         σ1² < σ2²                1
3      σ1² ≤ σ2²         σ1² > σ2²                1

Let X11, X12, … , X1n1 be a random sample of n1 observations from the population with
mean μ1 and variance σ1², and X21, X22, … , X2n2 be a random sample of n2 observations
from the population with mean μ2 and variance σ2². Assume that both populations are
independent and normal. Let s1² and s2² be the respective sample variances based on
the two samples.

Define the ratio

\[ F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{\sigma_2^2\, s_1^2}{\sigma_1^2\, s_2^2}. \]

This F statistic has an F-distribution with (n1-1) numerator
degrees of freedom and (n2-1) denominator degrees of freedom.

Under the null hypothesis the test statistic becomes

\[ F_0 = \frac{s_1^2}{s_2^2}. \]

Compare the calculated value of F0 with Fα and Fα/2, which are the critical values from the F-
distribution with [(n1-1), (n2-1)] degrees of freedom corresponding to α and α/2
probabilities, representing one-tail and two-tail tests. The decisions are given in the
following table.

Case   Null hypothesis   Alternative hypothesis   Number of tails   Reject Null Hypothesis
1      σ1² = σ2²         σ1² ≠ σ2²                2                 F0 ≥ Fα/2 or F0 ≤ F1-α/2
2      σ1² ≥ σ2²         σ1² < σ2²                1                 F0 ≤ F1-α
3      σ1² ≤ σ2²         σ1² > σ2²                1                 F0 ≥ Fα

EXAMPLE:

In comparing the variability of the tensile strength of two kinds of structural steel, an
experiment yielded the following results: n1 = 13, s1² = 19.2, n2 = 16 and s2² = 3.5, where
the units of measurement are thousand pounds per square inch. Assuming that the
measurements constitute independent random samples from two normal populations,
test the null hypothesis σ1² = σ2² against the alternative σ1² ≠ σ2² at the α = 0.05 level of
significance.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameters of interest are σ1² and σ2²
2. H0: σ1² = σ2²
3. H1: σ1² ≠ σ2²
4. α = 0.05
5. The test statistic is

\[ F_0 = \frac{s_1^2}{s_2^2} \]

6. Reject H0 if F0 ≥ 2.96 (= Fα/2 = F0.025,12,15) or F0 ≤ 0.314 (= F1-α/2 = 1/F0.025,15,12).
Note that this results from step 4, where we specified α = 0.05 (two tail).

7. Computations: We have s1² = 19.2 and s2² = 3.5.
Therefore, F0 = (19.2/3.5) = 5.49.

8. Conclusion: Since F0 = 5.49 > 2.96 (= Fα/2), we reject the null
hypothesis H0: σ1² = σ2² and conclude that the variability of the tensile strength
of the two kinds of steel is not the same.
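
The variance-ratio statistic and its degrees of freedom can be checked with a few lines of code. This is a minimal sketch, not part of the original notes; the function name variance_ratio is an assumption, and 2.96 is the upper critical value quoted in step 6. Because F0 is greater than 1 here, only the upper-tail comparison is shown.

```python
def variance_ratio(s1_sq, s2_sq, n1, n2):
    """Return (F0, numerator df, denominator df) for F0 = s1^2 / s2^2."""
    return s1_sq / s2_sq, n1 - 1, n2 - 1


f0, df_num, df_den = variance_ratio(s1_sq=19.2, s2_sq=3.5, n1=13, n2=16)

F_UPPER = 2.96  # upper critical value with (12, 15) df, as quoted in the text
reject_h0 = f0 >= F_UPPER

print(f"F0 = {f0:.2f}, df = ({df_num}, {df_den}), reject H0: {reject_h0}")
```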

Note: In case of one tail tests –

i. when the alternative is of the type H1: σ1² > σ2², the conclusion would be to
reject H0: σ1² ≤ σ2² at the 0.05 level of significance, since F0 = 5.49 > 2.48
(= Fα = F0.05,12,15), and we conclude that the variability of the tensile strength of the
first kind of steel is greater than the variability of the tensile strength of the second
kind of steel.

ii. when the alternative is of the type H1: σ1² < σ2², the conclusion would be not to
reject H0: σ1² ≥ σ2² at the 0.05 level of significance, since F0 = 5.49 >
0.382 (= F1-α = 1/F0.05,15,12). That is, the data give no evidence that the variability of
the tensile strength of the first kind of steel is less than that of the
second kind of steel.

CHI-SQUARE TEST FOR GOODNESS OF FIT

The Chi-Square goodness of fit test is a non-parametric test used to find out whether the
observed values of a given phenomenon differ significantly from the expected values.
Here the term goodness of fit refers to comparing the observed sample distribution with
the expected probability distribution: the test determines how well a theoretical
distribution (such as normal, binomial, or Poisson) fits the empirical distribution. The
sample data are divided into intervals, and the number of points that fall into each
interval is compared with the expected number of points in that interval.

PROCEDURE FOR CHI-SQUARE GOODNESS OF FIT TEST

1. Set up the hypothesis for Chi-Square goodness of fit test:

a. Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis
assumes that there is no significant difference between the observed and the
expected value. In other words, the data follows a specified distribution.
b. Alternative hypothesis: In Chi-Square goodness of fit test, the alternative
hypothesis assumes that there is a significant difference between the
observed and the expected value. In other words, the data does not follow a
specified distribution.

2. Compute the value of Chi-Square goodness of fit test using the following formula:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

where χ² = Chi-Square goodness of fit test statistic, O = observed value and
E = expected value.

The test statistic follows, approximately, a χ² distribution with (k - c) degrees of
freedom, where k is the number of non-empty cells and c = the number of
parameters to be estimated + 1.

3. Degrees of freedom: In Chi-Square goodness of fit test, the degrees of freedom
depend on the distribution being fitted. The following table shows the
distribution and the associated degrees of freedom, with k the number of cells:

Type of distribution                        c   Degrees of freedom
Binomial distribution (if p is estimated)   2   k-2
Poisson distribution                        2   k-2
Normal distribution                         3   k-3

4. Hypothesis testing: Hypothesis testing in Chi-Square goodness of fit test is the
same as in other tests, like Z-test, t-test, etc. The calculated value of Chi-Square
goodness of fit test is compared with the table value corresponding to (k-c)
degrees of freedom and at α level of significance. If the calculated value of Chi-
Square goodness of fit test is greater than the table value, we reject the null
hypothesis and conclude that there is a significant difference between the
observed and the expected frequency. If the calculated value of Chi-Square
goodness of fit test is less than the table value, we do not reject the null hypothesis
and conclude that there is no significant difference between the observed and
expected value.
EXAMPLES

1. For example, in 200 flips of a coin, one would expect 100 heads and 100 tails.
But what if 92 heads and 108 tails are observed? Would we reject the hypothesis
that the coin is fair? Or would we attribute the difference between observed and
expected frequencies to random fluctuation?

Solution:

Null hypothesis: The frequency of heads is equal to the frequency
of tails.
Alternative hypothesis: The frequency of heads is not equal to
the frequency of tails.
The expected frequencies in each of the two categories (heads or tails)
are not independent. To obtain the expected frequency of tails (100), we
need only to subtract the expected frequency of heads (100) from the
total frequency (200), or 200 - 100 = 100. Thus, given the expected
frequency in one of the categories, the expected frequency in the other is
readily determined. In other words, only the expected frequency in one of
the two categories is free to vary; that is, there is only 1 degree of
freedom.

The calculation of the χ² statistic is shown in the table below.

Face    O     E     O-E   (O-E)²   (O-E)²/E
Heads   92    100   -8    64       0.64
Tails   108   100   8     64       0.64
Total   200   200   0              χ² = 1.28

Conclusion:
The critical values of χ² for 1 degree of freedom, with α = .05 and α = .01,
are 3.841 and 6.635, respectively. As the calculated value of χ² is less
than the table value at both α = .05 and α = .01 levels of significance, we do
not reject the null hypothesis. That is, the data give no evidence that the coin
is unfair.

2. Suppose we hypothesize that we have an unbiased six-sided die. To
test this hypothesis, we roll the die 300 times and observe the frequency
of occurrence of each of the faces. Because we hypothesized that the die
is unbiased, we expect that the number on each face will occur 50 times.
However, suppose we observe frequencies of occurrence as follows:

Face Value   1    2    3    4    5    6
Occurrence   42   55   38   57   64   44

What would we conclude? Is the die biased, or do we attribute the
difference to random fluctuation?

Solution:

Null hypothesis: The die is fair. In other words, the frequency of
occurrence of each of the six faces of the die is the same.

Alternative hypothesis: The die is not fair. In other words, the
frequency of occurrence of each of the six faces of the die is not the
same.

There are six possible categories of outcomes: the occurrence of the six
faces. Under the assumption that the die is fair, we would expect that the
frequency of occurrence of each of the six faces of the die would be 50.
Note again that the expected frequencies in each of these categories are
not independent. Once the expected frequency for five of the categories
is known, the expected frequency of the sixth category is uniquely
determined, since the total frequency equals 300. Thus, only the
expected frequencies in five of the six categories are free to vary; there
are only 5 degrees of freedom associated with this example.

The calculation of the χ² statistic is shown in the table below.

Face Value   O     E     O-E   (O-E)²   (O-E)²/E
1            42    50    -8    64       1.28
2            55    50    5     25       0.50
3            38    50    -12   144      2.88
4            57    50    7     49       0.98
5            64    50    14    196      3.92
6            44    50    -6    36       0.72
Total        300   300   0              χ² = 10.28

Conclusion:
The critical values of χ² for 5 degrees of freedom, with α = .05 and α = .01,
are 11.070 and 15.086, respectively. As the calculated value of χ² is
less than the table value at both α = .05 and α = .01 levels of significance,
we do not reject the null hypothesis. That is, the data give no evidence that
the frequencies of occurrence of the six faces of the die differ.
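
The tabular calculations in examples 1 and 2 follow the same pattern, which can be sketched as one small function. This is an illustration only, not part of the original notes; the function name chi_square_gof is an assumption, and the critical values are the table values quoted in the two conclusions.

```python
def chi_square_gof(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example 1: 200 coin flips, 100 heads and 100 tails expected
coin_chi_sq = chi_square_gof([92, 108], [100, 100])

# Example 2: 300 die rolls, 50 occurrences of each face expected
die_chi_sq = chi_square_gof([42, 55, 38, 57, 64, 44], [50] * 6)

CRITICAL_1_DF = 3.841   # alpha = 0.05, 1 degree of freedom, as quoted in the text
CRITICAL_5_DF = 11.070  # alpha = 0.05, 5 degrees of freedom, as quoted in the text

print(f"coin: chi^2 = {coin_chi_sq:.2f}, reject H0: {coin_chi_sq > CRITICAL_1_DF}")
print(f"die:  chi^2 = {die_chi_sq:.2f}, reject H0: {die_chi_sq > CRITICAL_5_DF}")
```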

3. The president of a major University hypothesizes that at least 90
percent of the teaching and research faculty will favor a new university
policy on consulting with private and public agencies within the state.
Thus, for a random sample of 200 faculty members, the president would
expect 0.90 x 200 = 180 to favor the new policy and 0.10 x 200 = 20 to
oppose it. Suppose, however, for this sample, 168 faculty members favor
the new policy and 32 oppose it. Is the difference between observed and
expected frequencies sufficient to reject the president's hypothesis that
90 percent would favor the policy? Or would the differences be attributed
to chance fluctuation?
Solution:

Null hypothesis: The faculty favouring the new policy is 90 percent.

Alternative hypothesis: The faculty favouring the new policy is not 90
percent.

The expected number of faculty members who oppose it (20) can be
found by subtracting the expected number who support it (180) from the
total number in the sample (200), or 200 - 180 = 20. Thus, given the
expected frequency in one of the categories, the expected frequency in
the other is readily determined. In other words, only the expected
frequency in one of the two categories is free to vary; that is, there is only
1 degree of freedom.

The calculation of the χ² statistic is shown in the table below.

Disposition   O     E     O-E   (O-E)²   (O-E)²/E
Favour        168   180   -12   144      0.80
Oppose        32    20    12    144      7.20
Total         200   200   0              χ² = 8.00

Conclusion:
The critical values of χ² for 1 degree of freedom, with α = .05 and α = .01,
are 3.841 and 6.635, respectively. As the calculated value of χ² is
greater than the table value at both α = .05 and α = .01 levels of significance,
we reject the null hypothesis and conclude that the faculty favouring the
new policy is not 90 percent.

CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES

CONTINGENCY TABLES

A frequency table in which a sample is classified according to the distinct classes of two
different attributes is called a contingency table. It is often of interest to test the
hypothesis that, in the population from which the sample was drawn, the two attributes
are independent. An mxn contingency table has m rows and n columns.

A typical mxn contingency table is given below:

Rows            Columns (Attribute 1)
(Attribute 2)   1     2     ...   j     ...   n     Total
1               O11   O12   ...   O1j   ...   O1n   R1
2               O21   O22   ...   O2j   ...   O2n   R2
.               .     .           .           .     .
i               Oi1   Oi2   ...   Oij   ...   Oin   Ri
.               .     .           .           .     .
m               Om1   Om2   ...   Omj   ...   Omn   Rm
Total           C1    C2    ...   Cj    ...   Cn    N

The expected frequencies are obtained as follows: The expected frequency of the cell
corresponding to i-th row and j-th column is found by

\[ E_{ij} = \frac{(i\text{-th row total}) \times (j\text{-th column total})}{\text{grand total}} = \frac{R_i \times C_j}{N} \]

Using the observed frequencies given in the contingency table and the expected
frequencies found using the above formula, we may test the null hypothesis that,

H0: The two attributes are independent (or) the two attributes are not associated.

H1: The two attributes are not independent (or) the two attributes are associated.

The test statistic to test the above hypothesis is given by

\[ \chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \]

or simply, for ease of understanding,

\[ \chi^2 = \sum \frac{(O - E)^2}{E}. \]

This χ² statistic has a chi-square distribution with (m-1)(n-1) degrees of freedom.

The decision is to reject the null hypothesis H0 if the calculated value of χ², say χ0², is
greater than the table value of χ² at the α level of significance corresponding to (m-1)(n-1)
degrees of freedom.

EXAMPLE

The following data were collected in a study on the effectiveness of inoculation for a
particular disease. The two attributes in this case are;

Attribute A: whether or not the person was inoculated; and

Attribute B: whether or not they contracted the disease

The 2x2 contingency table is

Attribute B
Attribute A Disease No disease
Inoculated 10 50
Not Inoculated 30 40

In this case the null hypothesis and alternative hypothesis are stated as,

H0: Contracting the disease is independent of inoculation

H1: Contracting the disease is not independent of inoculation

Expected Frequencies

Observed frequencies   Disease   No disease   Total
Inoculated             10        50           60
Not Inoculated         30        40           70
Total                  40        90           130

If the null hypothesis is true then 40 people will contract the disease regardless of
whether or not they were inoculated. i.e., under the null hypothesis Ho, the probability of
contracting the disease is 40/130. Thus out of the 60 people inoculated we would
expect (40 x 60)/130 =18.5 of them to contract the disease. The expected frequencies
are therefore

Expected frequencies   Disease   No disease   Total
Inoculated             18.5      41.5         60
Not Inoculated         21.5      48.5         70
Total                  40        90           130

The test statistic is

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10 - 18.5)^2}{18.5} + \frac{(50 - 41.5)^2}{41.5} + \frac{(30 - 21.5)^2}{21.5} + \frac{(40 - 48.5)^2}{48.5} = 10.5 \]

The critical value for a 1% significance level with 1 d.f. is 6.63. The null hypothesis is
therefore rejected at this level and it can be concluded that inoculation does have an
effect on the probability of contracting the disease. From the contingency table it can be
seen that inoculation reduces the risk.
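
The expected frequencies and the test statistic for any m x n contingency table can be computed with a short routine. The sketch below is illustrative and not part of the original notes; the function names expected_frequencies and chi_square_independence are assumptions, and 6.63 is the 1% critical value quoted above.

```python
def expected_frequencies(table):
    """Return E_ij = (row total i) * (column total j) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    expected = expected_frequencies(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, dof, expected


# Rows: inoculated, not inoculated; columns: disease, no disease
observed = [[10, 50], [30, 40]]
chi_sq, dof, expected = chi_square_independence(observed)

CRITICAL_1_PERCENT = 6.63  # 1 degree of freedom, as quoted in the text
print(f"chi^2 = {chi_sq:.2f}, df = {dof}, reject H0: {chi_sq > CRITICAL_1_PERCENT}")
```

With unrounded expected frequencies the statistic comes out at about 10.4 rather than 10.5; the small gap is due to rounding the expected frequencies to one decimal place in the hand calculation, and the conclusion is unchanged.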

Note
For chi-squared tests expected frequencies should be at least 5.

PROBLEMS

1. An analysis of accident data was made to determine if the distribution of
fatal accidents was dependent on the size of car involved. The following data were
collected:

Small Medium Large


Fatal 67 26 16
Non-fatal 128 63 46

Test the hypothesis that the probability of a fatality is dependent on the size of car.

2. Low birth weight in babies is defined as weights below 2500 grams. The following
table shows the number of low birth weight babies for three groups of mothers, non-
smokers, smokers and ex-smokers. Do the results suggest that the smoking habits
of the mother have an effect on birth weight?
Birth weight Non-smoker Smoker Ex-smoker
< 2500 grams 140 153 27
≥ 2500 grams 2197 1510 433

ADDITIONAL PROBLEMS FOR PRACTICE

1. A soft drink manufacturer, situated in western India, wants to know whether
there is a difference in product acceptance by sex groups. In a market survey, 58
percent of 200 men questioned liked the product and 50 percent of 150 women
questioned liked the product. Is there a significant difference in product acceptance by
men and women?

2. The president of the college has reported that the average age of evening
students is 35 years. A random sample of 100 evening students was taken and it was
found that the average age of the sample was 34 years with a SD of 5 years. At the 1% level
can we conclude that the president's claim is correct?

3. An educator claims that the average IQ of city college students is not more than
110. To test this claim, a random sample of 150 students was taken and given the relevant
test. Their average IQ score came to 111.2 with a standard deviation of 7.2. At the 0.05 level
of significance, test whether the claim of the educator is justified.

4. A sample of 450 items is taken from a population whose standard deviation is
20. The mean of the sample is 30. Test whether the sample has come from a population
with mean 29 at the 5% level of significance.

5. A random sample of 400 men and 600 women were asked whether they would
like to have a flyover near their residence. 200 men and 325 women were in favour of the
proposal. Test the hypothesis that the proportions of men and women in favour of the
proposal are the same.

6. A manufacturer claimed that at least 95% of the equipment which he supplied to
a factory conformed to specifications. An examination of a sample of 200 pieces of
equipment revealed that 18 were faulty. Test his claim at significance levels of 5%
and 1%.

7. A die was thrown 9000 times and a throw of 3 or 4 was observed 3240 times. Show
that the die cannot be regarded as an unbiased one.

8. In a sample of 600 men from a certain large city, 450 are found to be smokers. In
a sample of 900 from another large city, 450 are smokers. Do the data indicate that the
cities are significantly different with respect to the prevalence of smoking among men?

9. In a year there are 956 births in town A, of which 52.5% were male, while in
towns A and B combined this proportion in a total of 1406 births was 0.496. Is there any
significant difference in the proportion of male births in the two towns?

10. In two large populations, 30% and 25% respectively are fair-haired
people. Is this difference likely to be hidden in samples of 1200 and 900 respectively
from the two populations?

11. A machine puts out 16 imperfect articles in a sample of 500. After the machine is
overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine
improved?

12. Two independent random samples of 30 and 40 individuals trained at two
centers provide the examination scores in the following table:

Examination Score Results
              Training Centre A   Training Centre B
Sample size   30                  40
Mean          82                  78
SD            8                   10

Using the above data, test whether the two centers differ in terms of
educational quality.

13. Suppose that 100 tires made by a certain manufacturer lasted on the average
21819 miles with a standard deviation of 1295 miles. Test the null hypothesis that
µ=22000 miles against the alternative hypothesis µ≠22000, at the 0.05 level of
significance.
