Testing of Hypothesis
Often, we may want to know things about populations but don’t have data for every
person or thing in the population. If a company’s customer service division wanted to
learn whether its customers were satisfied, it would not be practical or perhaps even
possible to contact every individual who purchased a product. Instead, the company
might select a sample or samples of the population.
For instance, we may be interested in the mean fill volume of a can of soft drink. The
mean fill volume in the population is required to be 300 millilitres. An engineer takes a
random sample of 25 cans and computes the sample average to be x̄ = 298 millilitres.
The engineer will probably decide that the population mean is µ = 300 millilitres, even
though the sample mean was 298 millilitres, because he or she knows that the sample
mean is a reasonable estimate of µ and that a sample mean of 298 millilitres is very likely
to occur even if the true population mean is µ = 300 millilitres. In fact, if the true mean is
300 millilitres, tests of 25 cans made repeatedly, perhaps every five minutes, would
produce values of x̄ that vary both above and below µ = 300 millilitres.
The sample mean is a statistic; that is, a random variable that depends on the results
obtained in each particular sample. Since a statistic is a random variable, it has a
probability distribution.
parameter based on a statistic computed from a sample randomly drawn from that
population.
Suppose that a random sample of size n is taken from a normal population with mean µ
and variance σ². Each observation in this sample, say X1, X2, ..., Xn, is then a normally
and independently distributed random variable with mean µ and variance σ². By the
reproductive property of the normal distribution, the sample mean

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

has a normal distribution with mean

$$\mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$

and variance

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$
If we are sampling from a population that has an unknown probability distribution, the
sampling distribution of the sample mean will still be approximately normal with mean µ
and variance σ²/n, if the sample size is large. This is one of the most useful theorems
in statistics, and it is called the central limit theorem.
If X1, X2, ..., Xn is a random sample of size n taken from a population (finite or infinite)
with mean µ and finite variance σ², and if X̄ is the sample mean, then the limiting form of
the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

as n → ∞ is the standard normal distribution.
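The central limit theorem can be checked numerically. The following is a minimal sketch, assuming a Uniform(0, 1) population (mean 1/2, variance 1/12), which is not part of the notes; the function name `simulate_sample_means`, the seed and the number of repetitions are illustrative choices only.

```python
import random
import statistics


def simulate_sample_means(n, repeats, seed=0):
    """Draw `repeats` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(repeats)]


n = 25
means = simulate_sample_means(n, repeats=5000)

# Population values for Uniform(0, 1): mean 1/2 and variance 1/12 (an assumed example population).
mu, sigma2 = 0.5, 1.0 / 12.0

print("mean of sample means    :", statistics.fmean(means), " theory:", mu)
print("variance of sample means:", statistics.variance(means), " theory:", sigma2 / n)
```

The simulated mean and variance of the sample means should land close to µ and σ²/n, even though the population itself is not normal.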
NOMENCLATURE AND DEFINITIONS
Statistical Hypothesis
A statistical hypothesis is a statement about the parameters of one or more populations.
Test statistic
Sample statistic used to decide whether to reject or fail to reject the null hypothesis.
Critical region
Set of all values which would cause us to reject H0 (Region of rejection).
Critical value(s)
The value(s) which separate the critical region from the non-critical region. The critical
values are determined independently of the sample statistics.
Decision
A statement based upon the null hypothesis. It is either "reject the null hypothesis" or
"fail to reject the null hypothesis". We will never accept the null hypothesis.
Conclusion
A statement which indicates at what level of significance the null hypothesis is rejected
or not rejected.
Statisticians never accept the null hypothesis; we only fail to reject it. In other words,
we either say that the null hypothesis is false, or say that we do not have enough
evidence to say that it is false, but we never say that it is true, because another sample
might later provide evidence against it.
Type I error
Rejecting the null hypothesis when it is true. Usually the more serious error.
Type II error
Failing to reject the null hypothesis when it is false.
Alpha (α)
Probability of committing a Type I error.
Beta (β)
Probability of committing a Type II error.
Since a Type I error is usually the more serious error, it is the one we concentrate on.
We usually pick alpha to be very small (0.05 or 0.01). Note that alpha is not a Type I
error; alpha is the probability of committing a Type I error. Likewise, beta is the
probability of committing a Type II error.
TYPE OF TESTS
The type of test is determined by the Alternative Hypothesis (H1)
Right Tailed Test
test statistic < critical value (left) or test statistic > critical value (right)
2. The second step is to consider the assumptions being made in doing the test;
for example, assumptions about the statistical independence or about the form
of the distributions of the observations. This is equally important as invalid
assumptions will mean that the results of the test are invalid.
3. Compute the relevant test statistic. The distribution of such a statistic under
the null hypothesis can be derived from the assumptions. In standard cases this
will be a well-known result. For example the test statistics may follow a
Student’s t distribution or a normal distribution. The distribution of the test
statistic partitions the possible values of the estimator into those for which the
null-hypothesis is rejected and those for which it is not.
4. Compare the test-statistic (S) to the relevant critical values (CV) (obtained from
tables in standard cases).
5. Decide to either fail to reject the null hypothesis or reject it in favor of the
alternative. The decision rule is to reject the null hypothesis (H0) if S > CV and
vice versa.
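The comparison of a test statistic with its critical value can be sketched in code. This is only an illustration for a test statistic that is standard normal under H0; the function names `z_critical` and `decide` and the labels "two-sided", "greater" and "less" are hypothetical, and the critical values are computed with the standard library instead of being read from tables.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Upper critical value of the standard normal for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1.0 - alpha / tails)


def decide(z0, alpha, alternative):
    """Return the decision for a Z statistic; `alternative` names the form of H1."""
    if alternative == "two-sided":
        reject = abs(z0) > z_critical(alpha, 2)
    elif alternative == "greater":
        reject = z0 > z_critical(alpha, 1)
    elif alternative == "less":
        reject = z0 < -z_critical(alpha, 1)
    else:
        raise ValueError("unknown alternative")
    return "reject H0" if reject else "fail to reject H0"


print(round(z_critical(0.05, 2), 3))   # two-tailed critical value at alpha = 0.05
print(round(z_critical(0.05, 1), 3))   # one-tailed critical value at alpha = 0.05
print(decide(2.5, 0.05, "two-sided"))
```

Note that the decision is always phrased as "reject" or "fail to reject", never "accept".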
TESTS FOR MEAN
We will see how to conduct a test of hypothesis for a mean, when the following
conditions are met:
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results.
The table below shows three sets of hypotheses. Each makes a statement about how
the population mean, μ is related to a specified value μ0.
The first case of hypotheses is an example of a two-tailed test, since an extreme value
on either side of the sampling distribution would cause a researcher to reject the null
hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme
value on only one side of the sampling distribution would cause a researcher to reject
the null hypothesis.
TESTS FOR MEAN - KNOWN VARIANCE: (Z- Test)
Let X1, X2, ..., Xn be a random sample drawn from a normal population with known
variance σ². Using the sample data, conduct a one-sample Z-test with the test statistic

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

where, under the null hypothesis, Z0 is a standard normal variable. Compare the
calculated value of Z0 with Zα and Zα/2, which are the critical values from the normal
distribution corresponding to α and α/2 probabilities, representing one-tail and two-tail
tests. The decisions are given in the following table.
EXAMPLE:
Aircrew escape systems are powered by a solid propellant. The burning rate of this
propellant is an important product characteristic. Specifications require that the mean
burning rate must be 50 centimeters per second. We know that the standard deviation
of burning rate is σ = 2 centimeter per second. The experimenter decides to specify a
type I error probability or significance level of α = 0.05 and selects a random sample of
n= 25 and obtains a sample average burning rate of x = 51.3 centimeter per second.
What conclusion should be drawn?
We may solve this problem by following the eight-step procedure outlined as follows.
8. Conclusion: Since Z0 = 3.25 > 1.96 (Zα/2), we reject H0: μ = 50 at the 0.05
level of significance. Stated more completely, we conclude that the mean
burning rate differs from 50 centimeters per second, based on a sample of 25
measurements. In fact, there is strong evidence that the mean burning rate is
not equal to 50 centimeters per second.
(i) when the alternative is of the type H1: μ > 50 centimeters per second, the
conclusion would be to reject H0: μ ≤ 50 at the 0.05 level of significance,
since Z0 = 3.25 > 1.645 (Zα). Stated more completely, we conclude that the
mean burning rate exceeds 50 centimeters per second, based on a
sample of 25 measurements. In fact, there is strong evidence that the
mean burning rate exceeds 50 centimeters per second.
(ii) when the alternative is of the type H1: μ < 50 centimeters per second, the
conclusion would be do not reject H0: μ ≥ 50 at the 0.05 level of
significance, since Z0 = 3.25 > -1.645 (-Zα). Stated more completely, based
on a sample of 25 measurements there is no evidence that the mean
burning rate is less than 50 centimeters per second.
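The arithmetic of the burning-rate example can be reproduced with a short script. This is a sketch using only the figures quoted in the example (μ0 = 50, σ = 2, n = 25, x̄ = 51.3, α = 0.05); the variable names are illustrative, and the critical values are computed with the standard library instead of being read from tables.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 50.0, 2.0, 25, 51.3, 0.05

# One-sample Z statistic: (sample mean - hypothesized mean) / (sigma / sqrt(n))
z0 = (xbar - mu0) / (sigma / math.sqrt(n))

z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_one_tail = NormalDist().inv_cdf(1 - alpha)       # about 1.645

print("Z0 =", round(z0, 2))
print("two-tail  (H1: mu != 50):", "reject H0" if abs(z0) > z_two_tail else "fail to reject H0")
print("right-tail (H1: mu > 50):", "reject H0" if z0 > z_one_tail else "fail to reject H0")
print("left-tail  (H1: mu < 50):", "reject H0" if z0 < -z_one_tail else "fail to reject H0")
```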
Let X1, X2, …, Xn be a random sample drawn from a normal population with unknown
variance σ². Using the sample data, conduct a one-sample t-test with the test statistic

$$t_0 = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

where S is the sample standard deviation and, under the null hypothesis, t0 has a
t-distribution with (n-1) degrees of freedom. Compare the calculated value of t0 with tα
and tα/2, which are the critical values from the t-distribution with (n-1) degrees of
freedom corresponding to α and α/2 probabilities, representing one-tail and two-tail
tests. The decisions are given in the following table.
EXAMPLE
The increased availability of light materials with high strength has revolutionized the
design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and
very thin faces can result in much longer tee shots, especially for players of modest
skills. This is due partly to the “spring-like effect” that the thin face imparts to the ball.
Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity
of the ball to the incoming velocity can quantify this spring-like effect. The ratio of
velocities is called the coefficient of restitution of the club. An experiment was
performed in which 15 drivers produced by a particular club maker were selected at
random and their coefficients of restitution measured. In the experiment the golf balls
were fired from an air cannon so that the incoming velocity and spin rate of the ball could
be precisely controlled. It is of interest to determine if there is evidence (with α = 0.05) to
support a claim that the mean coefficient of restitution exceeds 0.82. The observations
are:
0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660
We may solve this problem by following the eight-step procedure outlined as follows.
8. Conclusion: Since t0 = 2.72 > 1.761 (t0.05, 14), we reject H0: μ = 0.82 at the
0.05 level of significance and conclude that the mean coefficient of restitution
exceeds 0.82, based on a sample of 15 measurements.
i. when the alternative is of the type H1: μ ≠ 0.82, the conclusion would be to
reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > 2.145 (t0.025, 14).
Stated more completely, we conclude that the mean coefficient of restitution is not
equal to 0.82, based on a sample of 15 measurements.
ii. when the alternative is of the type H1: μ < 0.82, the conclusion would be
do not reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > -1.761
(-t0.05, 14). Stated more completely, based on a sample of 15 measurements there is no
evidence that the mean coefficient of restitution is less than 0.82.
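The value t0 = 2.72 can be recomputed directly from the 15 observations listed in the example. The sketch below uses only the standard library; because the standard library has no t-distribution, the critical value 1.761 (t0.05, 14) is taken from the notes, not computed. Variable names are illustrative.

```python
import math
import statistics

# Coefficients of restitution for the 15 drivers, as listed in the example.
cor = [
    0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
    0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
    0.8042, 0.8730, 0.8282, 0.8359, 0.8660,
]

mu0 = 0.82
n = len(cor)
xbar = statistics.fmean(cor)
s = statistics.stdev(cor)          # sample standard deviation (divisor n - 1)

# One-sample t statistic with n - 1 = 14 degrees of freedom
t0 = (xbar - mu0) / (s / math.sqrt(n))

T_CRIT_ONE_TAIL = 1.761            # t(0.05, 14), table value quoted in the notes

print("x-bar =", round(xbar, 5), " s =", round(s, 5), " t0 =", round(t0, 2))
print("right-tail (H1: mu > 0.82):", "reject H0" if t0 > T_CRIT_ONE_TAIL else "fail to reject H0")
```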
PROBLEMS:
2. Bon Air Elementary School has 300 students. The principal of the school thinks
that the average IQ of students at Bon Air is at least 110. To prove her point, she
administers an IQ test to 20 randomly selected students. Among the sampled
students, the average IQ is 108 with a standard deviation of 10. Based on these
results, should the principal accept or reject her original hypothesis? Assume a
significance level of 0.01.
TESTS FOR A POPULATION PROPORTION
Suppose that a random sample of size n is drawn from a large population and that
X (≤ n) observations in this sample belong to a specified class of interest. Then P̂ = X/n
is a point estimator of the proportion p of the population that belongs to this class. Note
that n and p are the parameters of a binomial distribution. When n is relatively large and
p is not too close to either 0 or 1, P̂ is approximately normal with mean p and variance
p(1-p)/n. For this approximation we require that np and n(1-p) both be greater than 5.
Let X be the number of observations in a random sample of size n that belong to the
class associated with p, and let p0 be the value of p specified by the null hypothesis.
Using the sample data, conduct a one-sample Z-test as follows:

$$Z_0 = \frac{X - n p_0}{\sqrt{n p_0 (1 - p_0)}}$$

where, under the null hypothesis, Z0 has approximately the standard normal
distribution. Compare the calculated value of Z0 with Zα and Zα/2, which are the critical
values from the normal distribution corresponding to α and α/2 probabilities,
representing one-tail and two-tail tests. The decisions are given in the following table.
The first case of hypotheses is an example of a two-tailed test, since an extreme value
on either side of the sampling distribution would cause a researcher to reject the null
hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme
value on only one side of the sampling distribution would cause a researcher to reject
the null hypothesis.
EXAMPLE:
manufacturing step not exceed 0.05 and that the manufacturer demonstrate process
capability at this level of quality using α = 0.05. The semiconductor manufacturer takes
a random sample of 200 devices and finds that 4 of them are defective. Can the
manufacturer demonstrate process capability for the customer?
We may solve this problem by following the eight-step procedure outlined as follows.
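As a rough numerical companion to this example, the sketch below computes the one-sample proportion statistic Z0 = (X - n·p0)/√(n·p0·(1-p0)) for 4 defectives in 200 devices. It assumes the hypotheses H0: p = 0.05 against H1: p < 0.05, which is the usual reading of "demonstrate that the fraction defective does not exceed 0.05" but is not spelled out in the surviving text; variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x, p0, alpha = 200, 4, 0.05, 0.05

# The normal approximation needs n*p0 and n*(1 - p0) to be greater than 5.
assert n * p0 > 5 and n * (1 - p0) > 5

# One-sample Z statistic for a proportion
z0 = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

# Assumed left-tailed test (H1: p < 0.05): reject H0 when Z0 < -Z_alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject = z0 < -z_alpha

print("Z0 =", round(z0, 2), " critical value =", round(-z_alpha, 3))
print("reject H0" if reject else "fail to reject H0")
```

Under that assumed alternative, rejecting H0 would correspond to the manufacturer demonstrating process capability at the 0.05 level.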
PROBLEMS:
1. The CEO of a large electric utility claims that 80 percent of his 1,000,000
customers are very satisfied with the service they receive. To test this claim,
the local newspaper surveyed 100 customers, using simple random sampling.
Among the sampled customers, 73 percent say they are very satisfied. Based
on these findings, can we reject the CEO's hypothesis that 80% of the
customers are very satisfied? Use a 0.05 level of significance.
2. The previous problem is stated a little bit differently. Suppose the CEO claims
that at least 80 percent of the company's 1,000,000 customers are very
satisfied. Again, 100 customers are surveyed using simple random sampling.
The result: 73 percent are very satisfied. Based on these results, should we
accept or reject the CEO's hypothesis? Assume a significance level of 0.05.
TESTS FOR DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS
We now consider the case where there are two binomial parameters of interest, say p1
and p2. Suppose that two independent random samples of sizes n1 and n2 are taken
from two populations, and let X1 and X2 represent the number of observations that
belong to the class of interest in samples 1 and 2, respectively. Let

$$\hat{P}_1 = \frac{X_1}{n_1}, \qquad \hat{P}_2 = \frac{X_2}{n_2} \qquad \text{and} \qquad \hat{P} = \frac{X_1 + X_2}{n_1 + n_2}$$
Compare the calculated value of Z0 with Z α and Z α / 2 which are the critical values from
normal distribution corresponding to α and α/2 probabilities, representing one tail and
two-tail tests. The decisions are given in the following table.
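Case  Null hypothesis             Alternative hypothesis      Number of tails  Reject H0 if
1     p1 = p2 (or p1 - p2 = 0)    p1 ≠ p2 (or p1 - p2 ≠ 0)    2                |Z0| ≥ Zα/2
2     p1 ≥ p2 (or p1 - p2 ≥ 0)    p1 < p2 (or p1 - p2 < 0)    1                Z0 ≤ -Zα
3     p1 ≤ p2 (or p1 - p2 ≤ 0)    p1 > p2 (or p1 - p2 > 0)    1                Z0 ≥ Zα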
EXAMPLE
Suppose the Acme Drug Company develops a new drug, designed to prevent colds.
The company states that the drug is equally effective for men and women. To test this
claim, they choose a simple random sample of 100 women and 200 men from a
population of 100,000 volunteers. At the end of the study, 38% of the women caught a
cold; and 51% of the men caught a cold. Based on these findings, can we reject the
company's claim that the drug is equally effective for men and women? Use a 0.05 level
of significance.
We may solve this problem by following the eight-step procedure outlined as follows.
This is a two-tail test.
1. The parameters of interest are p1 and p2, the proportions of men and women,
respectively, who caught a cold.
2. H0: p1 = p2
3. H1: p1 ≠ p2
4. α = 0.05
5. The test statistic is

$$Z_0 = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where P̂1 = 0.51, P̂2 = 0.38, n1 = 200, n2 = 100 and the pooled proportion is
P̂ = (102 + 38)/(200 + 100) = 0.4667.
6. Reject H0 if Z0 ≥ 1.96 or if Z0 ≤ -1.96. Note that this results from step 4, where we
specified α = 0.05 and so the boundaries of the critical region are at
Z0.025 = 1.96 and -Z0.025 = -1.96 from normal distribution tables.
7. Computations:

$$Z_0 = \frac{0.51 - 0.38}{\sqrt{0.4667\,(1 - 0.4667)\left(\dfrac{1}{200} + \dfrac{1}{100}\right)}} = 2.13$$

8. Conclusion: Since Z0 = 2.13 > 1.96 (Zα/2), we reject H0: p1 = p2 at the 0.05 level
of significance. Hence we reject the claim that the drug is equally effective for
men and women.
i. when the alternative is of the type H1: p1 > p2, the conclusion would be to reject
H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.13 > 1.645 (Zα), and
conclude that a larger proportion of men than women caught a cold; that is, the
drug is less effective for men than for women.
ii. when the alternative is of the type H1: p1 < p2, the conclusion would be do not
reject H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.13 > -1.645 (-Zα);
there is no evidence that a smaller proportion of men than women caught a cold.
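The two-proportion calculation can be checked from the raw counts (51% of 200 men and 38% of 100 women caught a cold). The following is a minimal sketch with illustrative variable names; the critical value is computed with the standard library instead of being read from tables.

```python
import math
from statistics import NormalDist

# Sample 1: men, sample 2: women (counts implied by the percentages in the example).
n1, x1 = 200, 102   # 51% of 200
n2, x2 = 100, 38    # 38% of 100
alpha = 0.05

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion under H0

# Two-sample Z statistic for the difference between proportions
z0 = (p1_hat - p2_hat) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)

print("pooled proportion =", round(p_pooled, 4))
print("Z0 =", round(z0, 2))
print("two-tail:", "reject H0" if abs(z0) >= z_two_tail else "fail to reject H0")
```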
Let us see how to conduct a hypothesis test for the difference between two means from
two independent populations with means μ1 and μ2 and known variances σ1² and σ2²
respectively. Inferences will be based on two random samples of sizes n1 and n2
respectively.
Let X11, X12, …, X1n1 be a random sample of n1 observations from the population with
mean μ1 and variance σ1², and let X21, X22, …, X2n2 be a random sample of n2
observations from the population with mean μ2 and variance σ2². Assume that both
populations are independent and normal. Then the quantity

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has a standard normal distribution.
We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various
hypotheses are stated in the following table.
Case  Null hypothesis             Alternative hypothesis      Number of tails
1     μ1 = μ2 (or μ1 - μ2 = 0)    μ1 ≠ μ2 (or μ1 - μ2 ≠ 0)    2
2     μ1 ≥ μ2 (or μ1 - μ2 ≥ 0)    μ1 < μ2 (or μ1 - μ2 < 0)    1
3     μ1 ≤ μ2 (or μ1 - μ2 ≤ 0)    μ1 > μ2 (or μ1 - μ2 > 0)    1
Given the two sample observations, calculate the sample means x̄1 and x̄2. Under
the null hypothesis the test statistic becomes

$$Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Compare the calculated value of Z0 with Z α and Z α / 2 which are the critical values from
normal distribution corresponding to α and α/2 probabilities, representing one tail and
two-tail tests. The decisions are given in the following table.
EXAMPLE:
A product developer is interested in reducing the drying time of primer paint. Two
formulations of the paint are tested. Formulation-1 is the standard chemistry and
Formulation-2 has a new drying ingredient that should reduce the drying time. From
experience it is known that the standard deviation of drying time is 8 minutes and this
inherent variability should be unaffected by the addition of the new ingredient. Ten
specimens are painted with Formulation-1 and another 10 specimens are painted with
Formulation-2; the 20 specimens are painted in random order. The two sample average
drying times are x̄1 = 121 minutes and x̄2 = 112 minutes respectively. What
conclusions can the product developer draw about the effectiveness of the new
ingredient, using α = 0.05?
We may solve this problem by following the eight-step procedure outlined as follows.
6. Reject H0 if Z0 ≥ 1.645 (Zα). Note that this results from step 4, where
we specified α = 0.05 and so the boundaries of the critical region are at
Z 0.05 =1.645 from normal distribution tables.
7. Computations: Since x̄1 = 121 minutes, x̄2 = 112 minutes, σ1² = σ2² =
8² = 64 minutes² and n1 = n2 = 10, the value of the test statistic is

$$Z_0 = \frac{121 - 112}{\sqrt{\dfrac{8^2}{10} + \dfrac{8^2}{10}}} = 2.52$$
Note:
(i) In case of a two-tail test, when the alternative hypothesis is of the type H1:
μ1 ≠ μ2 (or H1: μ1 - μ2 ≠ 0), the conclusion would be to reject H0: μ1 - μ2 = 0
at the 0.05 level of significance, since Z0 = 2.52 > 1.96 (Zα/2). That is, the
mean drying time is significantly different for the two types of primers.
(ii) In case of a one-tail test, when the alternative hypothesis is of the
type H1: μ1 < μ2 (or H1: μ1 - μ2 < 0), the conclusion would be do not reject
H0: μ1 - μ2 = 0 at the 0.05 level of significance, since Z0 = 2.52 > -1.645
(-Zα). That is, there is no evidence that the mean drying time of Formulation-1
is less than that of Formulation-2.
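The drying-time computation can be reproduced as follows. This is a sketch using only the figures in the example (x̄1 = 121, x̄2 = 112, σ = 8, n1 = n2 = 10, α = 0.05) with illustrative variable names; the right-tailed alternative H1: μ1 > μ2 matches the rejection rule Z0 ≥ Zα stated in step 6.

```python
import math
from statistics import NormalDist

xbar1, xbar2 = 121.0, 112.0     # sample mean drying times (minutes)
sigma1 = sigma2 = 8.0           # known standard deviations (minutes)
n1 = n2 = 10
alpha = 0.05

# Two-sample Z statistic with known variances
z0 = (xbar1 - xbar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

z_alpha = NormalDist().inv_cdf(1 - alpha)          # about 1.645
z_half = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96

print("Z0 =", round(z0, 2))
print("right-tail (H1: mu1 > mu2):", "reject H0" if z0 >= z_alpha else "fail to reject H0")
print("two-tail  (H1: mu1 != mu2):", "reject H0" if abs(z0) >= z_half else "fail to reject H0")
print("left-tail  (H1: mu1 < mu2):", "reject H0" if z0 <= -z_alpha else "fail to reject H0")
```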
Let us see how to conduct a hypothesis test for the difference between two means from
two independent populations with means μ1 and μ2 and unknown variances σ12 and
σ22 respectively. Here it is assumed that σ12 = σ22 = σ2. That is the variances of the
two normal populations are unknown but are equal. Inferences will be based on two
random samples of sizes n1 and n2 respectively.
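Let X11, X12, …, X1n1 be a random sample of n1 observations from the population with
mean μ1, and let X21, X22, …, X2n2 be a random sample of n2 observations from the
population with mean μ2, where both populations have the common unknown variance
σ². Assume that both populations are independent and normal. The pooled estimate of
σ² is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \quad \text{where} \quad s_1^2 = \frac{1}{n_1 - 1}\sum_{i}(x_{1i} - \bar{x}_1)^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum_{i}(x_{2i} - \bar{x}_2)^2$$

Then the quantity

$$t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a t-distribution with (n1 + n2 - 2) degrees of freedom.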
We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various
hypotheses are stated in the following table.
3     μ1 ≤ μ2 (or μ1 - μ2 ≤ 0)    μ1 > μ2 (or μ1 - μ2 > 0)    1
Given the two sample observations, calculate the sample means x̄1, x̄2 and the
pooled estimate sp. Under the null hypothesis the test statistic becomes

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Compare the calculated value of t0 with tα and tα/2, which are the critical values from the
t-distribution with (n1 + n2 - 2) degrees of freedom corresponding to α and α/2
probabilities, representing one-tail and two-tail tests. The decisions are given in the
following table.
EXAMPLE:
Two catalysts are being analyzed to determine how they affect the mean yield of a
chemical process. Specifically, catalyst-1 is currently in use, but catalyst-2 is
acceptable. Since catalyst-2 is cheaper, it should be adopted if it does not change the
process yield. A test is run in the pilot plant and the results are shown in the table
below. Is there any difference between the mean yields? Use α = 0.05, and assume
equal variances.
The problem is solved using the eight-step hypothesis testing procedure as follows.
6. Reject H0 if t0 > 2.145 ( = t0.025,14) (or) if t0 < -2.145 (= -t0.025,14). Note that
this results from step 4, where we specified α = 0.05 and so the boundaries of
the critical region are at t0.025,14 = 2.145 from t-distribution tables (two tail).
7. Computations: We have x̄1 = 92.255, x̄2 = 92.733, s1 = 2.39, s2 =
2.98 and n1 = n2 = 8. Therefore,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(2.39)^2 + 7(2.98)^2}{8 + 8 - 2} = 7.30$$
i. when the alternative is of the type H1: μ1 - μ2 > 0 (or) H1: μ1 > μ2 the
conclusion would be do not reject H0: μ 1 - μ 2 = 0, (or) μ 1 = μ 2 at the 0.05 level of
significance, since t0 = -0.35 < 1.761 (=t0.05,14). We conclude that the mean yield of
Catalyst-1 is not significantly greater than the mean yield when catalyst-2 is used.
ii. when the alternative is of the type H1: μ1- μ2 < 0 (or) H1: μ1 < μ2 the
conclusion would be do not reject H0: μ 1 - μ 2 = 0, (or) μ 1 = μ 2 at the 0.05 level of
significance, since t0 = -0.35 > -1.761 (= -t0.05,14). We conclude that the mean yield of
Catalyst-1 is not significantly less than the mean yield when catalyst-2 is used.
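The pooled-variance computation for the catalyst example can be sketched from the summary statistics quoted in step 7 (x̄1 = 92.255, x̄2 = 92.733, s1 = 2.39, s2 = 2.98, n1 = n2 = 8). The critical value 2.145 (t0.025, 14) is taken from the notes because the standard library has no t-distribution; variable names are illustrative.

```python
import math

xbar1, xbar2 = 92.255, 92.733
s1, s2 = 2.39, 2.98
n1 = n2 = 8

# Pooled estimate of the common variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
sp = math.sqrt(sp2)

# Pooled two-sample t statistic with n1 + n2 - 2 = 14 degrees of freedom
t0 = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

T_CRIT_TWO_TAIL = 2.145   # t(0.025, 14), table value quoted in the notes

print("sp^2 =", round(sp2, 2), " sp =", round(sp, 2), " t0 =", round(t0, 2))
print("two-tail:", "reject H0" if abs(t0) > T_CRIT_TWO_TAIL else "fail to reject H0")
```

With these figures the statistic is small in absolute value, which agrees with the notes' value t0 = -0.35.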
PROBLEMS
1. Within a school district, students were randomly assigned to one of two Math
teachers - Mrs. Smitha and Mrs. Lakshmi. After the assignment, Mrs. Smitha had 30
students, and Mrs. Lakshmi had 25 students. At the end of the year, each class took
the same standardized test. Mrs. Smitha’s students had an average test score of 78,
with a standard deviation of 10; and Mrs. Lakshmi's students had an average test
score of 85, with a standard deviation of 15. Test the hypothesis that Mrs. Smitha
and Mrs. Lakshmi are equally effective teachers. Use a 0.10 level of significance.
(Assume that student performance is approximately normal.)
2. The Acme Company has developed a new battery. The engineer in charge claims
that the new battery will operate continuously for at least 7 minutes longer than the
old battery. To test the claim, the company selects a simple random sample of 100
new batteries and 100 old batteries. The old batteries run continuously for 190
minutes with a standard deviation of 20 minutes; the new batteries, 200 minutes with
a standard deviation of 40 minutes. Test the engineer's claim that the new batteries
run at least 7 minutes longer than the old. Use a 0.05 level of significance.
PAIRED t-TEST
The paired t-test is generally used when measurements are taken from the same
subject before and after some manipulation such as injection of a drug. For example, a
paired t-test can be used to determine the significance of a difference in blood pressure
before and after administration of an experimental drug. A paired t-test may also be used
to compare samples that are subjected to different conditions, provided the samples in
each pair are identical otherwise. For example, we might test the effectiveness of a
water additive in reducing bacterial numbers by sampling water from different sources
and comparing bacterial counts in the treated versus untreated water sample. Each
different water source would give a different pair of data points.
The number of points in each data set must be the same, and they must be organized in
pairs, in which there is a definite relationship between each pair of data points. Clearly
for paired t-test, the data is dependent, i.e. there is a one-to-one correspondence
between the values in the two samples. For example, same subject measured before
and after a process change or same subject measured at different times.
Let (X11, X21), (X12, X22), …, (X1n, X2n) be a set of n paired observations drawn
from two populations with means μ1 and μ2 and variances σ1² and σ2² respectively.
Define the difference between each pair of observations as Dj = X1j - X2j, j = 1, 2, …, n.
The Dj's are assumed to be normally distributed with mean μD = μ1 - μ2 and variance
σD². Hence testing a hypothesis about the difference between μ1 and μ2 can be
accomplished by performing a one-sample t-test on μD. The statistic

$$t_D = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$

has a t-distribution with (n-1) degrees of freedom, where S_D² is an estimator of σD².
Its sample value is given by

$$s_D^2 = \frac{1}{n - 1}\sum_{j=1}^{n}(d_j - \bar{d})^2, \quad \text{where} \quad d_j = x_{1j} - x_{2j} \quad \text{and} \quad \bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j$$
We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various
hypotheses are stated in the following table.
Case  Null hypothesis  Alternative hypothesis  Number of tails
1     μD = 0           μD ≠ 0                  2
2     μD ≥ 0           μD < 0                  1
3     μD ≤ 0           μD > 0                  1
Given the pairs of sample observations, calculate d̄ and sD². Under the null hypothesis
the test statistic becomes

$$t_0 = \frac{\bar{d}}{s_D / \sqrt{n}}$$

where t0 has a t-distribution with (n-1) degrees of freedom. Compare the calculated
value of t0 with tα and tα/2, which are the critical values from the t-distribution with (n-1)
degrees of freedom corresponding to α and α/2 probabilities, representing one-tail and
two-tail tests. The decisions are given in the following table.
EXAMPLE:
The following data refer to strength predictions for nine steel plate girders by the
Karlsruhe and Lehigh methods. Test whether there is any significant difference between
the two methods.
The problem is solved using the eight-step hypothesis testing procedure as follows.
6. Reject H0 if t0 > 2.306 ( = t0.025,8) (or) if t0 < -2.306 (= -t0.025,8). Note that
this results from step 4, where we specified α = 0.05 and so the boundaries of
the critical region are at t0.025,8= 2.306 from t-distribution tables (two tail).
7. Computations: We have d̄ = 0.2736, sD = 0.1356 and n = 9.
Therefore,

$$t_0 = \frac{0.2736}{0.1356 / \sqrt{9}} = 6.05$$
iii. when the alternative is of the type H1: μD > 0, the conclusion would be to
reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 > 1.860 (= t0.05,8).
Specifically, the data indicate that the Karlsruhe Method produces, on average,
higher strength predictions than does the Lehigh Method.
iv. when the alternative is of the type H1: μD < 0, the conclusion would be do
not reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 >
-1.860 (= -t0.05,8). Specifically, there is no evidence that the Karlsruhe Method
produces, on average, lower strength predictions than does the Lehigh Method.
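Since the paired test reduces to a one-sample t-test on the differences, the statistic can be checked from the summary values in step 7 (d̄ = 0.2736, sD = 0.1356, n = 9). This is a minimal sketch; the critical values 2.306 and 1.860 are the table values quoted in the notes, and the variable names are illustrative.

```python
import math

d_bar, s_d, n = 0.2736, 0.1356, 9

# Paired t statistic: mean difference divided by its estimated standard error
t0 = d_bar / (s_d / math.sqrt(n))

T_TWO_TAIL = 2.306   # t(0.025, 8), from the notes
T_ONE_TAIL = 1.860   # t(0.05, 8), from the notes

print("t0 =", round(t0, 2))
print("two-tail  (H1: muD != 0):", "reject H0" if abs(t0) > T_TWO_TAIL else "fail to reject H0")
print("right-tail (H1: muD > 0):", "reject H0" if t0 > T_ONE_TAIL else "fail to reject H0")
print("left-tail  (H1: muD < 0):", "reject H0" if t0 < -T_ONE_TAIL else "fail to reject H0")
```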
Suppose that we wish to test the hypothesis that the variance σ² of a normal population
equals a specified value, say σ0², or equivalently, that the standard deviation σ is equal
to σ0. Let X1, X2, …, Xn be a random sample of n observations from this population.
The table below shows three sets of hypotheses for testing the variance. Each makes a
statement about how the population variance σ² is related to a specified value σ0².
We use the test statistic

$$\chi_0^2 = \frac{(n - 1)S^2}{\sigma_0^2}$$

Under the null hypothesis H0: σ² = σ0², this statistic has a chi-square distribution with
(n-1) degrees of freedom.
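Given a sample of observations, calculate the sample variance

$$s^2 = \frac{1}{n - 1}\sum_{i}(x_i - \bar{x})^2$$

Compare the calculated value of χ0² with the critical values from the chi-square
distribution with (n-1) degrees of freedom corresponding to α and α/2 probabilities,
representing one-tail and two-tail tests. The decisions are given in the following table.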
1     σ² = σ0²     σ² ≠ σ0²     2     χ0² ≥ χ²α/2 or χ0² ≤ χ²1-α/2
2     σ² ≥ σ0²     σ² < σ0²     1     χ0² ≤ χ²1-α
3     σ² ≤ σ0²     σ² > σ0²     1     χ0² ≥ χ²α
EXAMPLE:
An automatic filling machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid
ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable
proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data
to suggest that the manufacturer has a problem with underfilled or overfilled bottles?
Use α = 0.05 and assume that fill volume has a normal distribution.
The problem is solved using the eight-step hypothesis testing procedure as follows.
6. Reject H0 if χ0² > 30.14 (= χ²0.05,19). Note that this results from step 4, where we
specified α = 0.05 and so the critical region is bounded by χ²0.05,19 = 30.14 from
chi-square distribution tables (one tail).
7. Computations: We have s² = 0.0153. Therefore,

$$\chi_0^2 = \frac{19\,(0.0153)}{0.01} = 29.07$$

8. Conclusion: Since χ0² = 29.07 < 30.14 (= χ²0.05,19), we do not reject the null
hypothesis H0: σ² = 0.01 at the 0.05 level of significance and conclude that
there is no strong evidence that the variance of fill volume exceeds 0.01 (fluid
ounces)².
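The chi-square statistic for the filling-machine example can be verified with a few lines. This sketch uses the figures from the example (n = 20, s² = 0.0153, σ0² = 0.01); the critical value 30.14 is the table value quoted in the notes, since the standard library provides no chi-square quantiles. Variable names are illustrative.

```python
n = 20
s2 = 0.0153        # sample variance, (fluid ounces)^2
sigma0_sq = 0.01   # hypothesized variance, (fluid ounces)^2

# Chi-square statistic with n - 1 = 19 degrees of freedom
chi2_0 = (n - 1) * s2 / sigma0_sq

CHI2_CRIT = 30.14  # chi-square(0.05, 19), table value quoted in the notes

print("chi-square statistic =", round(chi2_0, 2))
print("right-tail (H1: variance > 0.01):",
      "reject H0" if chi2_0 > CHI2_CRIT else "fail to reject H0")
```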
Note: (i) In case of a two-tail test, when the alternative hypothesis is of the
type H1: σ² ≠ 0.01, the conclusion would be do not reject H0: σ² = 0.01
at the 0.05 level of significance, since χ0² = 29.07 < 32.85 (= χ²α/2) and
Suppose that two independent normal populations are of interest, where the population
means and variances, say μ1, σ1², μ2 and σ2², are unknown. We wish to test the
hypothesis about the equality of the two variances, say H0: σ1² = σ2². Assume that two
random samples of sizes n1 and n2 are available from the two populations respectively,
and let s1² and s2² be the respective sample variances based on the two samples.
The null and alternative hypotheses are given in the following table.
Let X11, X12, …, X1n1 be a random sample of n1 observations from the population with
mean μ1 and variance σ1², and let X21, X22, …, X2n2 be a random sample of n2
observations from the population with mean μ2 and variance σ2². Assume that both
populations are independent and normal. Let s1² and s2² be the respective sample
variances based on the two samples.
Define the ratio

$$F = \frac{\sigma_2^2\, s_1^2}{\sigma_1^2\, s_2^2}$$

This F statistic has an F-distribution with (n1-1) numerator degrees of freedom and
(n2-1) denominator degrees of freedom.
Under the null hypothesis the test statistic becomes

$$F_0 = \frac{s_1^2}{s_2^2}$$
Compare the calculated value of F0 with Fα and Fα/2 which are the critical values from F-
distribution against [(n1-1), (n2-1)] degrees of freedom corresponding to α and α/2
probabilities representing one tail and two-tail tests. The decisions are given in the
following table.
EXAMPLE:
In comparing the variability of the tensile strength of two kinds of structural steel, an
experiment yielded the following results: n1 = 13, s1² = 19.2, n2 = 16 and s2² = 3.5, where
the units of measurement are thousand pounds per square inch. Assuming that the
measurements constitute independent random samples from two normal populations,
test the null hypothesis σ1² = σ2² against the alternative σ1² ≠ σ2² at the α = 0.05 level of
significance.
The problem is solved using the eight-step hypothesis testing procedure as follows.
2. H0: σ1² = σ2²
3. H1: σ1² ≠ σ2²
4. α = 0.05
5. The test statistic is

$$F_0 = \frac{s_1^2}{s_2^2}$$

6. Reject H0 if F0 ≥ 2.96 (= Fα/2) or F0 ≤ 0.350 (= F1-α/2). Note that this results from
step 4, where we specified α = 0.05 (two tail).
8. Conclusion: F0 = 5.49 > 2.96 (=Fα/2) the conclusion is to reject the null
hypothesis H0: σ12 = σ22 and conclude that the variability of the tensile strength
of the two kinds of steel is not the same.
ii. When the alternative is of the type $H_1: \sigma_1^2 < \sigma_2^2$, the conclusion would be: do
not reject $H_0: \sigma_1^2 \geq \sigma_2^2$ at the 0.05 level of significance, since $F_0 = 5.49 >
0.371$ $(= F_{1-\alpha})$. That is, there is no evidence that the variability of the
tensile strength of the first kind of steel is less than the variability of the
tensile strength of the second kind of steel.
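The two-tail decision in the example above can be sketched in a few lines of Python. This is only an illustration: the function names `f_ratio` and `two_tail_decision` are hypothetical, and the critical values 2.96 and 0.350 are simply the table values quoted in step 6, not values computed by the code.

```python
def f_ratio(s1_sq: float, s2_sq: float) -> float:
    """Test statistic F0 = s1^2 / s2^2 for H0: sigma1^2 = sigma2^2."""
    if s1_sq <= 0 or s2_sq <= 0:
        raise ValueError("sample variances must be positive")
    return s1_sq / s2_sq


def two_tail_decision(f0: float, lower_crit: float, upper_crit: float) -> str:
    """Reject H0 when F0 falls in either tail of the F-distribution."""
    if f0 >= upper_crit or f0 <= lower_crit:
        return "reject H0"
    return "do not reject H0"


# Values from the tensile-strength example.
n1, s1_sq = 13, 19.2
n2, s2_sq = 16, 3.5
df = (n1 - 1, n2 - 1)  # numerator and denominator degrees of freedom

# Critical values as quoted in step 6 of the example (read from an F table).
upper_crit, lower_crit = 2.96, 0.350

f0 = f_ratio(s1_sq, s2_sq)
decision = two_tail_decision(f0, lower_crit, upper_crit)
print(f"df = {df}, F0 = {f0:.2f}, decision: {decision}")
```

Keeping the critical values as inputs mirrors the table-lookup procedure used in the text; a statistics library could supply them instead if one is available.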
The Chi-Square goodness of fit test is a non-parametric test used to find out whether the
observed values of a given phenomenon differ significantly from the expected values.
The term goodness of fit refers to comparing the observed sample distribution with the
expected probability distribution: the test determines how well a theoretical
distribution (such as the normal, binomial, or Poisson) fits the empirical distribution.
The sample data are divided into intervals, and the number of points that fall into each
interval is compared with the expected number of points in that interval.
PROCEDURE FOR CHI-SQUARE GOODNESS OF FIT TEST
2. Compute the value of the Chi-Square goodness of fit test statistic using the following
formula:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $\chi^2$ = Chi-Square goodness of fit test statistic, $O_i$ = observed value and
$E_i$ = expected value in the i-th of the k categories.
and conclude that there is no significant difference between the observed and
expected value.
EXAMPLES
1. For example, in 200 flips of a coin, one would expect 100 heads and 100 tails.
But what if 92 heads and 108 tails are observed? Would we reject the hypothesis
that the coin is fair? Or would we attribute the difference between observed and
expected frequencies to random fluctuation?
Solution:
Conclusion:
The critical values of $\chi^2$ for 1 degree of freedom, with α = .05 and α = .01,
are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is less
than the table value at both the α = .05 and α = .01 levels of significance, we do
not reject the null hypothesis and conclude that the coin is fair. That is, the
frequency of heads is equal to the frequency of tails.
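The calculated value referred to in this conclusion can be reproduced with a short worked equation. This is a sketch that assumes expected frequencies of 100 heads and 100 tails under the fair-coin hypothesis, with the two categories (heads, tails) giving 2 − 1 = 1 degree of freedom:

$$\chi^2 = \frac{(92 - 100)^2}{100} + \frac{(108 - 100)^2}{100} = 0.64 + 0.64 = 1.28$$

This value lies below both table values quoted above.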
Face Value    1    2    3    4    5    6
Occurrence   42   55   38   57   64   44
There are six possible categories of outcomes: the occurrence of the six
faces. Under the assumption that the die is fair, we would expect that the
frequency of occurrence of each of the six faces of the die would be 50.
Note again that the expected frequencies in each of these categories are
not independent. Once the expected frequency for five of the categories
is known, the expected frequency of the sixth category is uniquely
determined, since the total frequency equals 300. Thus, only the
expected frequencies in five of the six categories are free to vary; there
are only 5 degrees of freedom associated with this example.
Face Value     O     E    O-E   (O-E)^2   (O-E)^2/E
1             42    50     -8        64        1.28
2             55    50      5        25        0.50
3             38    50    -12       144        2.88
4             57    50      7        49        0.98
5             64    50     14       196        3.92
6             44    50     -6        36        0.72
Total        300   300      0                 10.28 = χ^2
Conclusion:
The critical values of $\chi^2$ for 5 degrees of freedom, with α = .05 and α = .01,
are 11.070 and 15.086, respectively. As the calculated value of $\chi^2$ is
less than the table value at both the α = .05 and α = .01 levels of significance,
we do not reject the null hypothesis and conclude that the die is fair. That
is, the frequency of occurrence of each of the six faces of the die is the
same.
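The die calculation above can be checked with a minimal Python sketch. The variable names are illustrative, and the critical values are the table values quoted in the conclusion rather than values computed here.

```python
observed = [42, 55, 38, 57, 64, 44]  # counts for faces 1..6 (300 throws)
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # 50 per face if the die is fair

# Contribution of each face to the statistic: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
degrees_of_freedom = len(observed) - 1

# Table values quoted in the text for 5 degrees of freedom.
critical = {0.05: 11.070, 0.01: 15.086}

for face, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"{face}  {o:3d}  {e:4.0f}  {o - e:+4.0f}  {c:5.2f}")
print(f"chi-square = {chi_square:.2f} with {degrees_of_freedom} d.f.")
for alpha, table_value in critical.items():
    verdict = "reject H0" if chi_square > table_value else "do not reject H0"
    print(f"alpha = {alpha}: {verdict}")
```

Listing the per-face contributions makes it easy to see that face 5 (3.92) and face 3 (2.88) account for most of the statistic.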
3. The president of a major University hypothesizes that at least 90
percent of the teaching and research faculty will favor a new university
policy on consulting with private and public agencies within the state.
Thus, for a random sample of 200 faculty members, the president would
expect 0.90 x 200 = 180 to favor the new policy and 0.10 x 200 = 20 to
oppose it. Suppose, however, for this sample, 168 faculty members favor
the new policy and 32 oppose it. Is the difference between observed and
expected frequencies sufficient to reject the president's hypothesis that
90 percent would favor the policy? Or would the differences be attributed
to chance fluctuation?
Solution:
Conclusion:
The critical values of $\chi^2$ for 1 degree of freedom, with α = .05 and α = .01,
are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is
greater than the table value at both the α = .05 and α = .01 levels of significance,
we reject the null hypothesis and conclude that the proportion of faculty
favouring the new policy is not 90 percent.
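The calculated value behind this conclusion can be sketched as a worked equation, using the observed frequencies (168, 32) and the expected frequencies (180, 20) stated in the example:

$$\chi^2 = \frac{(168 - 180)^2}{180} + \frac{(32 - 20)^2}{20} = 0.80 + 7.20 = 8.00$$

Most of the statistic comes from the "oppose" category, where the expected frequency is small.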
CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES
CONTINGENCY TABLES
A frequency table in which a sample is classified according to the distinct classes of two
different attributes is called a contingency table. It is often of interest to test the
hypothesis that, in the population from which the sample was drawn, the two attributes
are independent. An m × n contingency table has m rows and n columns.
The expected frequencies are obtained as follows: The expected frequency of the cell
corresponding to the i-th row and j-th column is found by

$$E_{ij} = \frac{(i\text{-th row total}) \times (j\text{-th column total})}{\text{grand total}} = \frac{R_i \times C_j}{N}$$
Using the observed frequencies given in the contingency table and the expected
frequencies found using the above formula, we may test the null hypothesis that,
H0: The two attributes are independent (or) the two attributes are not associated.
H1: The two attributes are not independent (or) the two attributes are associated.
The test statistic is

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

or simply, for ease of understanding, $\chi^2 = \sum \frac{(O - E)^2}{E}$.
The decision is to reject the null hypothesis $H_0$ if the calculated value of $\chi^2$, say $\chi_0^2$, is
EXAMPLE
The following data were collected in a study on the effectiveness of inoculation for a
particular disease. The two attributes in this case are;
                     Attribute B
Attribute A       Disease   No disease
Inoculated           10         50
Not inoculated       30         40
In this case the null hypothesis and alternative hypothesis are stated as,
Expected Frequencies
The test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$:

$$\chi^2 = \frac{(10 - 18.5)^2}{18.5} + \frac{(50 - 41.5)^2}{41.5} + \frac{(30 - 21.5)^2}{21.5} + \frac{(40 - 48.5)^2}{48.5} = 10.5$$
The critical value for a 1% significance level with 1 d.f. is 6.63. The null hypothesis is
therefore rejected at this level and it can be concluded that inoculation does have an
effect on the probability of contracting the disease. From the contingency table it can be
seen that inoculation reduces the risk.
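The inoculation example can be reproduced with a minimal Python sketch. The function names `expected_frequencies` and `chi_square_independence` are hypothetical, the degrees of freedom are taken as (m − 1)(n − 1), and 6.63 is the table value quoted above. Because the code keeps the expected frequencies unrounded, its statistic comes out slightly below the 10.5 obtained from the one-decimal values; the decision is unchanged.

```python
def expected_frequencies(table):
    """E_ij = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an m x n table."""
    expected = expected_frequencies(table)
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, dof


# Rows: inoculated, not inoculated. Columns: disease, no disease.
observed = [[10, 50],
            [30, 40]]

expected = expected_frequencies(observed)
chi_square, dof = chi_square_independence(observed)

print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi_square:.2f}, d.f. = {dof}")
print("reject H0" if chi_square > 6.63 else "do not reject H0")
```

The same two functions apply to any m × n table of observed counts, provided the expected frequencies are large enough for the test to be appropriate.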
Note
For chi-squared tests expected frequencies should be at least 5.
PROBLEMS
Test the hypothesis that the probability of a fatality is dependent on the size of car.
2. Low birth weight in babies is defined as weights below 2500 grams. The following
table shows the number of low birth weight babies for three groups of mothers, non-
smokers, smokers and ex-smokers. Do the results suggest that the smoking habits
of the mother have an effect on birth weight?
Birth weight Non-smoker Smoker Ex-smoker
< 2500 grams 140 153 27
≥ 2500 grams 2197 1510 433
2. The president of the college has reported that the average age of evening
students is 35 years. A random sample of 100 evening students was taken, and the
average age in the sample was found to be 34 years with an SD of 5 years. At the 1% level,
can we conclude that the president's claim is correct?
3. An educator claims that the average IQ of city college students is not more than
110. To test this claim, a random sample of 150 students was taken and given a relevant
test. Their average IQ score came to 111.2 with a standard deviation of 7.2. At the 0.05
level of significance, test whether the claim of the educator is justified.
5. A random sample of 400 men and 600 women were asked whether they would
like to have a flyover near their residence. 200 men and 325 women were in favour of
the proposal. Test the hypothesis that the proportions of men and women in favour of
the proposal are the same.
7. A die was thrown 9000 times and a throw of 3 or 4 was observed 3240 times. Show
that the die cannot be regarded as an unbiased one.
8. In a sample of 600 men from a certain large city, 450 are found to be smokers. In
a sample of 900 from another large city, 450 are smokers. Do the data indicate that the
cities are significantly different with respect to the prevalence of smoking among men?
9. In a year there are 956 births in town A, of which 52.5% were male, while in
towns A and B combined this proportion in a total of 1406 births was 0.496. Is there any
significant difference in the proportion of male births in the two towns?
10. In two large populations, 30% and 25% of the people, respectively, are
fair-haired. Is this difference likely to be hidden in samples of 1200 and 900
respectively from the two populations?
11. A machine puts out 16 imperfect articles in a sample of 500. After the machine is
overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine
improved?
Examination Score Results
               Training Centre A   Training Centre B
Sample size           30                  40
Mean                  82                  78
SD                     8                  10
Using the above data, test whether the two recent centers differ in terms of
educational quality.
13. Suppose that 100 tires made by a certain manufacturer lasted on the average
21819 miles with a standard deviation of 1295 miles. Test the null hypothesis that
µ=22000 miles against the alternative hypothesis µ≠22000, at the 0.05 level of
significance.