BBA IV Business Statistics
1.1 Introduction of Estimation
i.e.,
$$E(\bar{X}) = \mu, \qquad E(p) = P$$
But the sample variance $s^2 = \frac{1}{n}\sum (X - \bar{X})^2$ is not an unbiased estimator of the population
variance $\sigma^2$. On the other hand, $S^2 = \frac{1}{n-1}\sum (X - \bar{X})^2$ provides an unbiased estimate of the
population variance $\sigma^2$. Thus,
$$E(S^2) = \sigma^2 \quad \text{but} \quad E(s^2) \neq \sigma^2$$
An estimator t is said to be a biased estimator of θ if
$$E(t) \neq \theta$$
Thus, $s^2 = \frac{1}{n}\sum (X - \bar{X})^2$ is a biased estimator of the population variance $\sigma^2$.
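The bias of the divisor-n variance can be checked exactly on a small case. The following is a minimal sketch, assuming a hypothetical population of five values and samples of size 2 drawn with replacement; the population, the sample size and the helper name `variance` are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mean) ** 2 for v in values) / divisor

population = [1, 2, 3, 4, 5]                      # hypothetical population
n = 2                                             # sample size
sigma2 = variance(population, len(population))    # population variance

# Every ordered sample of size n drawn with replacement is equally likely,
# so a plain average over all samples is the exact expected value.
samples = list(product(population, repeat=n))
expected_s2 = sum(variance(s, n) for s in samples) / len(samples)      # divisor n
expected_S2 = sum(variance(s, n - 1) for s in samples) / len(samples)  # divisor n - 1

print(sigma2, expected_s2, expected_S2)
```

In this small case the average of the divisor-(n – 1) variance equals the population variance, while the average of the divisor-n variance is smaller by the factor (n – 1)/n.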
2. Consistency
A sample statistic t is said to be a consistent estimator of the population parameter θ if the
difference between t and θ can be made smaller and smaller by taking the sample size (n)
larger and larger. Mathematically, t is a consistent estimator of θ if E(t) → θ and Var(t) → 0 as n → ∞.
The sample mean $\bar{X}$ is a consistent estimator of the population mean µ.
3. Efficiency
If t1 and t2 are two consistent estimators of a parameter θ such that Var(t1) < Var(t2) for all
samples of size n, then t1 is said to be more efficient than t2 . In other words, an estimator with
lesser variability is said to be more efficient and consequently more reliable than the other.
4. Sufficiency
An estimator is said to be sufficient for a parameter if it contains all the information in the
sample regarding the parameter. The sample mean $\bar{X}$ is a sufficient estimator of the population
mean µ.
contains the population mean. For purposes of illustration, suppose we subtract Rs. 500 from Rs.
6,500 and add Rs. 500 to Rs. 6,500. Consequently, we obtain an interval (Rs. 6,000, Rs. 7,000).
Then we state that the interval Rs. 6,000 to Rs. 7,000 is likely to contain the population mean µ .
This procedure is called interval estimation. The value Rs. 6000 is called the lower limit of the
interval and the value Rs. 7,000 is called the upper limit of the interval.
The question arises: what number should we subtract from and add to a point estimate to
obtain an interval estimate? The answer to this depends on the following considerations:
(i) The standard error of the sample mean $\bar{X}$
(ii) The level of confidence needed.
The interval estimate of a population parameter is called confidence interval. The two limits
within which the estimate for the population parameter lies are known as confidence limits or
Fiducial limits.
1.2 Confidence Interval estimate of the population mean (µ)
Where the factor $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor or finite population
multiplier. The finite population correction factor can be ignored if the sampling fraction (n/N) is less
than 5 per cent.
1.2.1 Interpreting confidence interval: What does 95% confidence interval mean?
A 95 per cent confidence interval estimate means that if all possible samples of the same size
were drawn, then 95 per cent of them would include the true population mean somewhere within the
interval around their sample mean and only 5 per cent of them would not. The values for Zα for the
most commonly used confidence levels can be seen from normal probability table as shown in the
following table.
Confidence level (1 – α)    Level of significance α    Critical value Zα
90%                         10%                        1.645
95%                         5%                         1.960
98%                         2%                         2.326
99%                         1%                         2.576
Note: when no reference to the confidence level is given, then we always take Zα = 3.
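The tabulated values can be reproduced from the standard normal distribution. The sketch below assumes two-tailed critical values, so that an area of α/2 lies in each tail; the function name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value of Z for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # An area of alpha/2 is left in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {z_critical(level):.3f}")
```

Rounded to three decimals, these agree with the table (1.645, 1.960, 2.326, 2.576).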
Example:
The quality control manager of a tyre company has taken a sample of 100 tyres and found the
mean lifetime to be 30,214 km. The population standard deviation is 860 km. Construct a 95%
confidence interval for the mean life of this particular brand of tyres.
Solution:
In usual notation, we are given that:
n = 100, $\bar{X}$ = 30,214 km, σ = 860 km
For 95% confidence interval,
(1 – α) = 0.95 ⇒ α = 0.05
∴ Zα = Z0.05 = 1.96
The 95% confidence interval for the population mean is given by
$$\text{C.I. for } \mu = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}} = 30214 \pm 1.96 \times \frac{860}{\sqrt{100}} = 30214 \pm 168.56$$
Lower limit = 30214 – 168.56 = 30045.44
Upper limit = 30214 + 168.56 = 30382.56
Therefore, the 95% confidence interval for the population mean is 30045.44 km
to 30382.56 km.
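The same arithmetic can be written as a short function. This is a minimal sketch for the case of a known population standard deviation; the function name `mean_ci_z` and its argument names are illustrative assumptions, and the critical value is passed in as read from the normal table.

```python
import math

def mean_ci_z(xbar, sigma, n, z):
    """Confidence limits for the mean: xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Tyre example: n = 100, sample mean 30,214 km, sigma = 860 km, Z = 1.96.
lower, upper = mean_ci_z(30214, 860, 100, 1.96)
print(round(lower, 2), round(upper, 2))
```

For the tyre data this gives limits of about 30045.44 km and 30382.56 km.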
Example:
A machine is producing ball bearings with a diameter of 0.5 inches. It is known that the
standard deviation of the ball bearings is 0.005 inches. A sample of 100 ball bearings is selected and their
average diameter is found to be 0.48 inches. Determine the 99 per cent confidence interval.
Solution:
In usual notation, we are given that:
n = 100, $\bar{X}$ = 0.48 inches, σ = 0.005 inches
For 99% confidence interval,
1 – α = 0.99 ⇒ α = 0.01
∴ Zα = Z0.01 = 2.576
The 99% confidence interval for the population mean is given by
$$\text{C.I. for } \mu = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}} = 0.48 \pm 2.576 \times \frac{0.005}{\sqrt{100}} = 0.48 \pm 0.0013$$
Lower limit = 0.48 – 0.0013 = 0.4787
Upper limit = 0.48 + 0.0013 = 0.4813
Therefore, the 99% confidence interval for the population mean is 0.4787 inches to 0.4813
inches.
Example:
A random sample of 50 mathematics grades out of total of 200 showed a mean of 75 and a
standard deviation of 10. What are the (i) 95% (ii) 99% confidence limits for estimates of the mean
of the 200 grades?
Solution:
In usual notation, we are given that:
n = 50, N = 200, $\bar{X}$ = 75, s = 10
(i) For 95% confidence interval,
1 – α = 0.95 ⇒ α = 0.05
∴ Zα = Z0.05 = 1.96
The 95% confidence interval for the population mean is given by
$$\text{C.I. for } \mu = \bar{X} \pm Z_\alpha \times \frac{s}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}} = 75 \pm 1.96 \times \frac{10}{\sqrt{50}} \times \sqrt{\frac{200-50}{200-1}} = 75 \pm 2.41$$
Lower limit = 75 – 2.41 = 72.59
Upper limit = 75 + 2.41 = 77.41
Therefore, the 95% confidence interval for the population mean is 72.59 to 77.41
(ii) For 99% confidence interval,
1 – α = 0.99 ⇒ α = 0.01
∴ Zα = Z0.01 = 2.576
The 99% confidence interval for the population mean is given by
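$$\text{C.I. for } \mu = \bar{X} \pm Z_\alpha \times \frac{s}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}} = 75 \pm 2.576 \times \frac{10}{\sqrt{50}} \times \sqrt{\frac{200-50}{200-1}} = 75 \pm 3.16$$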
Lower limit = 75 – 3.16 = 71.84
Upper limit = 75 + 3.16 = 78.16
Therefore, the 99% confidence interval for the population mean is 71.84 to 78.16
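The effect of the finite population correction can be seen by coding the interval directly. This is a minimal sketch, assuming sampling without replacement from a population of N units; the function name `mean_ci_finite` is an illustrative choice.

```python
import math

def mean_ci_finite(xbar, s, n, N, z):
    """Confidence limits for the mean with the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))          # finite population correction factor
    margin = z * s / math.sqrt(n) * fpc
    return xbar - margin, xbar + margin

# Grades example: n = 50 out of N = 200, mean 75, s = 10.
low95, high95 = mean_ci_finite(75, 10, 50, 200, 1.96)
low99, high99 = mean_ci_finite(75, 10, 50, 200, 2.576)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

Here the sampling fraction is 50/200 = 25 per cent, well above 5 per cent, so the correction factor (about 0.87) noticeably narrows both intervals.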
Example:
Twelve bank tellers were randomly sampled and it was determined they made an average of
3.6 errors per day with a sample standard deviation of 0.42. Construct a 90% confidence interval for
the population mean of errors per day.
Solution:
In usual notation, we are given that:
n = 12, s = 0.42, $\bar{X}$ = 3.6
For 90% confidence interval,
1 – α = 0.90 ⇒ α = 0.10 and d.f = n – 1 = 12 – 1 = 11
∴ tα, (n – 1) = t0.10 (11) = 1.796
The 90% confidence interval for the population mean is given by
$$\text{C.I. for } \mu = \bar{X} \pm t_{\alpha,(n-1)} \times \frac{s}{\sqrt{n}} = 3.6 \pm 1.796 \times \frac{0.42}{\sqrt{12}} = 3.6 \pm 0.2177$$
Lower limit = 3.6 – 0.2177 = 3.38
Upper limit = 3.6 + 0.2177 = 3.82
Therefore, the 90% confidence interval for the population mean is 3.38 to 3.82
Example:
The following sample of seven observations was drawn from an infinite population with a normal
distribution: 240, 260, 350, 350, 420, 510, 530
a) Find the sample mean and sample standard deviation.
b) Construct a 98% confidence interval for the population mean.
Solution:
In usual notation, we are given that:
a) Calculation of sample mean and sample standard deviation
X              $(X - \bar{X})$    $(X - \bar{X})^2$
240            –140               19600
260            –120               14400
350            –30                900
350            –30                900
420            40                 1600
510            130                16900
530            150                22500
ΣX = 2660                         $\sum (X - \bar{X})^2$ = 76800
Here, n = 7, ΣX = 2660
The sample mean is given by
$$\bar{X} = \frac{\sum X}{n} = \frac{2660}{7} = 380$$
and the sample standard deviation is given by
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{\frac{76800}{7-1}} = \sqrt{12800} = 113.137$$
b) For 98% confidence interval,
1 – α = 0.98 ⇒ α = 0.02 and d.f = n – 1 = 7 – 1 = 6
∴ tα, (n – 1) df = t0.02 (6) = 3.143
The 98% confidence interval for the population mean is given by
$$\text{C.I. for } \mu = \bar{X} \pm t_{\alpha,(n-1)} \times \frac{S}{\sqrt{n}} = 380 \pm 3.143 \times \frac{113.137}{\sqrt{7}} = 380 \pm 134.40$$
Lower limit = 380 – 134.40 = 245.60
Upper limit = 380 + 134.40 = 514.40
Therefore, the 98% confidence interval for the population mean is 245.60 to 514.40
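The small-sample calculation can be checked from the raw observations. The sketch below is an illustration only: the function name `mean_ci_t` is an assumption, and the critical value 3.143 is supplied from the t table (6 degrees of freedom) rather than computed, because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev

def mean_ci_t(data, t_crit):
    """Sample mean, sample standard deviation (divisor n - 1) and t-based limits."""
    xbar = mean(data)
    s = stdev(data)                              # uses the n - 1 divisor
    margin = t_crit * s / math.sqrt(len(data))
    return xbar, s, (xbar - margin, xbar + margin)

data = [240, 260, 350, 350, 420, 510, 530]
xbar, s, (low, high) = mean_ci_t(data, t_crit=3.143)   # table value for 98%, 6 d.f.
print(xbar, round(s, 3), round(low, 2), round(high, 2))
```

With these seven observations the mean is 380, the standard deviation is about 113.137, and the limits come out close to 245.60 and 514.40.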
The concept of proportion is the same as the concept of relative frequency distribution and the
concept of probability of success in binomial distribution. The relative frequency of a category or
class gives the proportion of the sample or population. Similarly, the probability of success in a
binomial experiment represents the proportion of the sample or population that possesses a given
characteristic.
Population proportion (P): The population proportion is obtained by taking the ratio of the number of
elements in a population with a specific characteristic to the number of elements in the population.
$$\text{Population proportion } (P) = \frac{X}{N}$$
Where,
X = number of elements in the population that possesses a specific characteristic
N = total number of elements in the population
Sample proportion (p): The sample proportion is obtained by taking the ratio of the number of
elements in a sample with a specific characteristic to the number of elements in the sample.
$$\text{Sample proportion } (p) = \frac{x}{n}$$
Where
x = number of elements in the sample that possesses a specific characteristic
n = total number of elements in the sample
Confidence interval estimate of the population proportion (P)
The (1– α) 100% confidence interval (C.I) for the population proportion P is given by
C.I for P = p ± Zα × S.E. (p)
Where p = sample proportion
Zα = critical value of Z at α level of significance
S.E. (p) = Standard error of sample proportion
Standard error of sample proportion
Case I: When the population size is infinitely large or the sample is drawn with replacement (WR):
(i) If P is known
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$
(ii) If P is not known, then we use the sample proportion p for P, which is an unbiased
estimate of the population proportion P, i.e., $\hat{P} = p$:
$$\text{S.E.}(p) = \sqrt{\frac{pq}{n}}$$
Where,
Q = 1 – P, n = sample size, q = 1 – p
Case II: When the population size is finite and the sample is drawn without replacement (WOR):
(i) If P is known
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}} \times \sqrt{\frac{N-n}{N-1}}$$
(ii) If P is not known
$$\text{S.E.}(p) = \sqrt{\frac{pq}{n-1}} \times \sqrt{\frac{N-n}{N}}$$
Where,
Q = 1 – P, q = 1 – p, N = population size, n = sample size
Example:
A sample poll of 100 voters chosen at random from all voters in a given district indicated that
55% of them were in favor of a particular candidate. Find the (i) 95%, and (ii) 99% confidence limits
for the proportion of all the voters in favor of this candidate.
Solution:
In usual notation, we are given that:
n = 100, p = 55% = 0.55, q = 1 – p = 1 – 0.55 = 0.45
(i) For 95% confidence interval,
1 – α = 0.95 ⇒ α = 0.05
∴ Zα = Z0.05 = 1.96
The 95% confidence interval for the population proportion is
$$\text{C.I. for } P = p \pm Z_\alpha \times \sqrt{\frac{pq}{n}} = 0.55 \pm 1.96 \times \sqrt{\frac{0.55 \times 0.45}{100}} = 0.55 \pm 0.098$$
Lower limit = 0.55 – 0.098 = 0.452
Upper limit = 0.55 + 0.098 = 0.648
Therefore, the 95% confidence interval for the population proportion is 0.452 to 0.648.
(ii) For 99% confidence interval,
1 – α = 0.99 ⇒ α = 0.01
∴ Zα = Z0.01 = 2.576
The 99% confidence interval for the population proportion is
$$\text{C.I. for } P = p \pm Z_\alpha \times \sqrt{\frac{pq}{n}} = 0.55 \pm 2.576 \times \sqrt{\frac{0.55 \times 0.45}{100}} = 0.55 \pm 0.128$$
Lower limit = 0.55 – 0.128 = 0.422
Upper limit = 0.55 + 0.128 = 0.678
Therefore, the 99% confidence interval for the population proportion is 0.422 to 0.678.
Example:
A random sample of 800 units from a large consignment showed that 200 were damaged.
Find 95% confidence limits for the population proportion of damaged units in the consignment.
Solution:
In usual notation, we are given that:
n = No. of units in the sample = 800
x = Number of damaged units = 200
$$p = \text{Sample proportion of damaged units} = \frac{x}{n} = \frac{200}{800} = 0.25$$
q = 1 – p = 1 – 0.25 = 0.75
For 95% confidence interval,
1 – α = 0.95 ⇒ α = 0.05
∴ Zα = Z0.05 = 1.96
The 95% confidence interval for the population proportion is
$$\text{C.I. for } P = p \pm Z_\alpha \times \sqrt{\frac{pq}{n}} = 0.25 \pm 1.96 \times \sqrt{\frac{0.25 \times 0.75}{800}} = 0.25 \pm 0.03$$
Lower limit = 0.25 – 0.03 = 0.22 and Upper limit = 0.25 + 0.03 = 0.28
Therefore, the 95% confidence interval for the population proportion of damaged units is 0.22
to 0.28.
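The interval for a proportion follows the same pattern as the interval for a mean. The sketch below covers only Case I with P unknown (large population, standard error estimated from the sample proportion); the function name `proportion_ci` is an illustrative assumption.

```python
import math

def proportion_ci(p, n, z):
    """Confidence limits for a population proportion: p +/- z * sqrt(p*q/n)."""
    q = 1 - p
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

# Damaged-units example: 200 damaged out of 800, 95% confidence.
low_d, high_d = proportion_ci(200 / 800, 800, 1.96)
# Voter example: p = 0.55, n = 100, 95% confidence.
low_v, high_v = proportion_ci(0.55, 100, 1.96)
print(round(low_d, 2), round(high_d, 2))
print(round(low_v, 3), round(high_v, 3))
```

For a finite population sampled without replacement, the margin would additionally be multiplied by the finite population factor given under Case II.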
One reason we usually conduct a sample survey and not a census is that we always have
limited resources at our disposal. In light of this, if a smaller sample can serve our purpose, then we
will be wasting our resources by taking a larger sample. For instance, suppose we want to estimate
the mean life of a certain auto battery. If a sample of 40 batteries can give us the confidence interval
we are looking for, then we will be wasting money and time if we take a sample of a much larger
size, say, 500 batteries. In such cases, if we know the confidence level and the width of the
confidence interval that we want, then we can find the approximate size of the sample that will
produce the required result.
1.4.1 Determining sample size for the estimation of population mean (µ)
In order to determine the sample size for estimating population mean, the following factors
must be kept in mind:
(i) The desired confidence level.
(ii) The acceptable sampling error E.
(iii) The standard deviation σ.
To determine the sample size, recall the relation:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
or, $Z \times \frac{\sigma}{\sqrt{n}} = \bar{X} - \mu$
or, $Z \times \frac{\sigma}{\sqrt{n}} = E$
or, $Z \times \sigma = E \times \sqrt{n}$
∴ $n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2$
Where, σ = standard deviation
Z = critical value of Z at desired level of significance
E = $\bar{X} - \mu$ = maximum permissible error
The difference between the sample mean $\bar{X}$ and the population mean µ, denoted by E, is
called the sampling error.
Example:
The average outstanding balance of loans issued by a bank varies from month to month. From
past experience it is known that the amounts are normally distributed with a standard deviation of Rs. 5,000. The
bank wishes to estimate the average by drawing a random sample such that the probability is 0.95
that the mean of the sample will not deviate by more than Rs. 600 from the universe mean. What
should be the sample size?
Solution:
In usual notation, we are given that:
σ = 5000
1 – α = 0.95 ⇒ α = 0.05
∴ Zα = Z0.05 = 1.96
E = $|\bar{X} - \mu|$ = 600
Sample size (n) = ?
$$n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2 = \left(\frac{1.96 \times 5000}{600}\right)^2 = 266.78 \approx 267$$
Example:
Mr. X wants to determine, on the basis of a sample survey, the mean time required to complete a
certain job so that he may be 99% confident that the sample mean remains within ±2 days of the true
mean. As per the available records the population variance is 64. How large should the sample
be for his study?
Solution:
In usual notation, we are given that:
1 – α = 0.99 ⇒ α = 0.01
∴ Zα = Z0.01 = 2.576
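E = $|\bar{X} - \mu|$ = 2
σ² = 64
∴ σ = 8
Sample size (n) = ?
$$n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2 = \left(\frac{2.576 \times 8}{2}\right)^2 = 106.17 \approx 107$$
The result is rounded up to the next whole number so that the error does not exceed 2 days.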
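The sample size formula can be sketched as a small function. The name `sample_size_mean` is an illustrative assumption, and the sketch rounds up to the next whole unit, which is the usual convention for guaranteeing that the error does not exceed E.

```python
import math

def sample_size_mean(z, sigma, error):
    """Return the exact value of (z*sigma/E)^2 and the sample size rounded up."""
    n_exact = (z * sigma / error) ** 2
    return n_exact, math.ceil(n_exact)

loan_exact, loan_n = sample_size_mean(1.96, 5000, 600)   # bank loan example
job_exact, job_n = sample_size_mean(2.576, 8, 2)         # job completion example
print(round(loan_exact, 2), loan_n)
print(round(job_exact, 2), job_n)
```

Under the rounding-up convention the bank loan example needs 267 observations and the job completion example needs 107.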
1.4.2 Determining sample size for the estimation of population proportion (P)
Just as we did with the mean, we can also determine the sample size for estimating population
proportion. In order to determine the sample size for estimating population proportion, the following
factors must be kept in mind:
(i) The desired confidence level.
(ii) The acceptable sampling error E.
(iii) The estimated true proportion of success.
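To determine the sample size, recall the relation:
$$Z = \frac{p - P}{\sqrt{PQ/n}}$$
or, $Z \times \sqrt{\frac{PQ}{n}} = p - P$
or, $Z \times \sqrt{\frac{PQ}{n}} = E$
or, $Z^2 \times \frac{PQ}{n} = E^2$
∴ $n = \frac{Z_\alpha^2}{E^2} \times PQ = \left(\frac{Z_\alpha}{E}\right)^2 \times PQ$
Where, E = p – P = maximum permissible error
Z = critical value of Z at desired level of significance,
Q = 1 – P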
Remarks:
If the population proportion P is unknown, then we use its unbiased estimate provided by the
sample proportion (p), or it may be estimated from past experience, a previous sample study or a
pilot study. However, if no such previous study has been carried out and we do not have the value
of P or p, then we assume P = 0.5, simply because this value gives the largest sample size when the
other conditions remain the same.
Example:
The business manager of a large company wants to check the inventory records against the
physical inventories by a sample survey. He wants to almost assure that the maximum sampling
error should not be more than 5% above or below the true proportion of the inaccurate records.
The proportion of the inaccurate records is estimated at 35% from past experience.
(i) Determine the sample size.
(ii) How large a sample should be taken when such an estimate from past experience is not
available?
Solution:
In usual notation, we are given that:
E = |p – P| = 0.05
(i) P = 0.35, Q = 1 – P = 1 – 0.35 = 0.65
Since the level of significance is not given we take Zα = 3.
The sample size is obtained as,
$$n = \left(\frac{Z_\alpha}{E}\right)^2 \times PQ = \left(\frac{3}{0.05}\right)^2 \times 0.35 \times 0.65 = 819$$
(ii) Since the population proportion (P) and the sample proportion (p) are both unknown,
we take P = 0.5, which results in the largest sample size.
P = 0.5, Q = 1 – P = 1 – 0.5 = 0.5
Since the level of significance is not given we take Zα = 3.
The sample size is obtained as,
$$n = \left(\frac{Z_\alpha}{E}\right)^2 \times PQ = \left(\frac{3}{0.05}\right)^2 \times 0.5 \times 0.5 = 900$$
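The two parts of this example differ only in the value assumed for P. The sketch below, with the illustrative name `sample_size_proportion`, defaults to P = 0.5 when no prior estimate is supplied and rounds away floating-point noise before rounding up to a whole unit.

```python
import math

def sample_size_proportion(z, error, p=0.5):
    """Sample size (z/E)^2 * P * Q, using P = 0.5 when no estimate is given."""
    n_exact = (z / error) ** 2 * p * (1 - p)
    # Rounding to 6 decimals first keeps values such as 819.0000000001 at 819.
    return math.ceil(round(n_exact, 6))

n_with_estimate = sample_size_proportion(3, 0.05, 0.35)
n_without_estimate = sample_size_proportion(3, 0.05)
print(n_with_estimate, n_without_estimate)
```

Because the product PQ is largest at P = 0.5, the default gives the most conservative (largest) sample size for a given Z and E.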
Example:
A sample of 50 students appearing in the CMAT examination yields an error of 4 with a standard
deviation of 16. If the sample size is increased to 80, how will the risk be affected, the
standard deviation and the error remaining the same?
Solution:
In usual notation, we are given that:
n = 50, E = 4, s = 16, risk (α) = ?
We have,
$$n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2 = \left(\frac{Z_\alpha \times s}{E}\right)^2 \quad [\text{since } \hat{\sigma} = s \text{ for large samples}]$$
or, $50 = \left(\frac{Z_\alpha \times 16}{4}\right)^2$
or, $Z_\alpha^2 = \frac{50}{16} = 3.125$
∴ Zα = ±1.77
The area between Z = 0 to Z = 1.77 from normal table is 0.4616,
i.e., P(0 < Z < Zα = 1.77) = 0.4616
Thus, α/2 = 0.5 – 0.4616 = 0.0384
or, α = 2 × 0.0384 = 0.0768 = 7.68%
∴ Risk (α) = 7.68%
Again, when sample size is increased to 80. Then
n = 80, E = 4, s = 16, risk (α) = ?
$$n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2 = \left(\frac{Z_\alpha \times s}{E}\right)^2 \quad [\text{since } \hat{\sigma} = s \text{ for large samples}]$$
or, $80 = \left(\frac{Z_\alpha \times 16}{4}\right)^2$
or, $Z_\alpha^2 = \frac{80}{16} = 5$
∴ Zα = ±2.24
The area between Z = 0 to Z = 2.24 from normal table is 0.4875,
i.e., P(0 < Z < Zα = 2.24) = 0.4875
Thus, α/2 = 0.5 – 0.4875 = 0.0125
or, α = 2 × 0.0125 = 0.025 = 2.5%
∴ Risk (α) = 2.5%
Hence, if the sample size is increased from 50 to 80, the risk decreases from 7.68% to 2.5%
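The relation between sample size and risk can also be computed without a normal table. This is a minimal sketch with the illustrative name `risk_of_error`; it uses the unrounded Z value, so its results differ slightly from the table-based figures above, which round Z to two decimals.

```python
import math
from statistics import NormalDist

def risk_of_error(n, error, s):
    """Solve n = (z*s/E)^2 for z, then return z and the two-tailed risk alpha."""
    z = error * math.sqrt(n) / s
    alpha = 2 * (1 - NormalDist().cdf(z))   # area in both tails beyond +/- z
    return z, alpha

z50, alpha50 = risk_of_error(50, 4, 16)
z80, alpha80 = risk_of_error(80, 4, 16)
print(round(z50, 2), round(alpha50, 4))
print(round(z80, 2), round(alpha80, 4))
```

The risk falls from roughly 7.7% to roughly 2.5% when the sample size rises from 50 to 80, in line with the conclusion above.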
Example:
What is the sample size required if the risk of error being ± 3 is 0.025 and the standard
deviation is assumed to be 10?
Solution:
In usual notation,
Risk (α) = 0.025, E = 3, σ = 10
We have, α = 0.025
α/2 = 0.0125
The area between Z = 0 and Zα is 0.5 – 0.0125 = 0.4875.
∴ Zα = 2.24 (from normal table)
Now, the required sample size is given by
$$n = \left(\frac{Z_\alpha \times \sigma}{E}\right)^2 = \left(\frac{2.24 \times 10}{3}\right)^2 = 55.75 \approx 56$$
Theoretical Questions
1. What is Estimation? Differentiate between estimator and estimate.
2. What are the main important criteria for a good estimator?
3. Develop the concept of point estimation and interval estimation.
4. What is confidence level? Give its significance in inferential statistics.
Numerical Problems
1. The mean income of 100 employees of a factory was found to be Rs. 5,000 with a standard
deviation of Rs. 60. Find the 95% confidence interval for the mean income of all the employees of the
factory.
2. The quality control manager at a factory manufacturing light bulbs is interested to estimate
the average life of a large shipment of light bulbs. The standard deviation is known to be 100
hours. A random sample of 50 light bulbs gave a sample average life of 350 hours. Set up a 95
percent confidence interval estimate of the true average life of light bulbs in the shipment.
3. The mean height obtained from a sample of size 100 taken randomly from a population is 164
cm. If the standard deviation of the height distribution of the population is 3 cm, set up 95%
confidence limits for the mean height of the population.
4. A random sample of 50 sales invoices was taken from a large population of sales invoices.
The average value was found to be Rs. 2000 with a standard deviation of Rs. 540. Find a 90%
confidence interval for the true mean value of all the sales.
5. A random sample of 200 consumers at a large brokerage firm is selected for estimating the
mean number of transactions per year for each consumer. The sample mean is 43 and standard
deviation is 12. Determine 99% confidence limits for the mean number of transactions of all customer
accounts of the firm.
6. The mean and variance of a random sample of 64 observations were computed as 160 and 100
respectively. Compute the 95% confidence limits for the population mean.
7. A random sample of 100 ball bearings selected from a shipment of 2000 ball bearings has an
average diameter of 0.354 cm with a standard deviation of 0.048 cm. Find the 95% confidence
interval for the average diameter of these 2000 ball bearings.
8. A sample of 500 bulbs of a company has an average life of 1400 hours with a standard deviation of
30 hours. Obtain (i) 95% and (ii) 99% fiducial limits for the population mean.
9. The quality control manager at a light bulb factory needs to estimate the mean life of a large
shipment of light bulbs. The process standard deviation is known to be 100 hours. A random
sample of 64 light bulbs indicated a sample mean life of 350 hours.
a. Set up a 95% confidence interval estimate of the true population mean life of light bulbs
in this shipment.
b. Do you think that the manufacturer has the right to state that the light bulbs last an
average of 400 hours?
10. Seven homemakers were randomly sampled, and it was determined that the distance they
walked in their housework had an average of 39.2 miles per week (mpw) and a sample
standard deviation of 3.2 miles per week (mpw). Construct a 95% confidence interval for the
population mean.
11. 400 apples are taken from a large consignment and 50 are found to be bad. Estimate the
percentage of bad apples in the consignment and assign the limits within which the percentage
lies.
12. Suppose we want to estimate the proportion of families in a town which have two or more
children. A random sample of 144 families shows that 48 families have two or more children.
Set up a 95 per cent confidence interval estimate of the population proportion of families
having two or more children.
13. Out of 20,000 customer's ledger accounts, a sample of 600 accounts was taken to test the
accuracy of posting and balancing wherein 45 mistakes were found. Assign limits within
which the number of defective cases can be expected at 5% level of significance.
14. A factory is producing 50,000 pairs of shoes daily. From a sample of 500 pairs, 2% were
found to be substandard quality. Estimate the number of pairs that can be reasonably expected
to be spoiled in the daily production and assign limits at 95% level of confidence.
15. An auditor for an insurance company would like to determine the proportion of claims settled
by the company within 2 months of the receipt of the claim. A random sample of 200 claims
is selected, and it is determined that 80 were paid the money within 2 months of the receipts
of the claim. Set up a 99 per cent confidence interval estimate of the population proportion of the
claims paid within 2 months.
16. A race car driver tested his car for time from 0 to 60 mph, and in 20 tests obtained an
average of 4.85 seconds with a standard deviation of 1.47 seconds. Calculate a 95% confidence
interval estimate for the time for 0 to 60 mph.
17. A city health department wishes to determine the mean bacteria count per unit volume of
water at a lake beach. Researchers have collected 10 water samples of unit volume and have
found the bacteria counts to be: 175, 190, 215, 198, 184, 207, 193, 196, 180, 210
Assume that the measurements constitute a sample from a normal distribution. Construct a
95% confidence interval for the mean bacteria count per unit volume of water at the lake
beach.
18. In a time and motion study of a factory, the supervisor estimates the standard deviation
to be 0.95 seconds. If you want to be 95% confident that the error will not exceed 0.01
second, what should be the size of the sample to estimate the population mean?
19. A researcher wishes to estimate the mean of a population by using a sufficiently large sample.
The probability is 0.95 that the sample mean will not differ from the true mean by more than
25% of the standard deviation. How large a sample should be taken?
20. A researcher wants to estimate universe mean by using sampling technique. What should be
the sample size when the permissible error between parameter value and sample statistic in
95% of chance will not be more than 1.5 and population standard deviation is 15.
21. A manufacturing concern wants to estimate the average amount of purchase of its product in a
month by the customers. If the standard deviation is Rs. 10, find the sample size if the
maximum error is not to exceed Rs. 3 with a probability of 0.99.
22. In measuring reaction time, a psychologist estimated that standard deviation is 1.08 seconds
obtained from a random sample of size 240. What will be the sampling error with 99% level
of confidence?
23. The principal of a college wants to estimate the proportion of smokers among his students.
What size of a sample should be selected so as to have the proportion of smokers not to
exceed by 10% with almost certainty?
(i) It is believed from previous records that the proportion of smokers was 0.30.
(ii) How large should the sample be if no such previous estimate is available?
24. It is desired to estimate the proportion of the junior executives who change their first job
within the first five years. This proportion is to be estimated within 3% of error and 0.95 degree
of confidence is to be used. A study revealed that 30% of such junior executives changed their
first job within 5 years.
(i) How large a sample is required to update the study?
(ii) How large should the sample be if no such previous estimate is available?
25. A firm wishes to estimate with an error of not more than 0.03 and a level of confidence 98%,
the proportion of consumers that prefers its brand of household detergent. Sales reports
indicate that about 0.20 of all consumers prefer the firm's brand. What is the requisite sample
size?
26. A Unilever company of cosmetic product wishes to estimate the proportion of people who
like their product. How large sample should be taken so that there should not be error greater
than 2% with risk of 4.56% given that the proportion of the people like their product is
equally likely?
27. Suppose the sample standard deviation of P/E ratios for stocks on the Nepal stock exchange
(NEPSE) is 7.8. Assume that we are interested in estimating the population mean of P/E ratio
for all stocks listed on NEPSE with 95% confidence. How many stocks should be included in
the sample if we desire a margin of error of 2?
28. If the population proportion of success is 0.65 and n=100, what will be the value of sampling
when acceptance region is 0.95?
29. A sample of 80 MBS students appearing in the first year examination yields an error of 4 with a
standard deviation of 16. What is the risk of the error being 4? If the sample size is
decreased to 50, how will the risk be affected, the standard deviation and the error remaining the
same?
30. A sample of 50 students appearing in CMAT examination yields the error as 4 with standard
deviation of 16. In the study, if the sample size is increased to 80, how will the risk be
affected, the standard deviation, error remaining the same.
31. How large a sample should be taken so that the risk of the error being ± 5 is 0.0456? It is given
that the standard deviation is 20.
32. We have strong indications that the proportion is around 0.7. Find the sample size needed to
estimate the proportion within ± 0.02 with a confidence level of 90 percent.
33. Given a population with a standard deviation of 8.6, what sample size is needed to estimate
the mean of population within ± 0.5 with 99 percent confidence?
d. $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
16. Which of the following formulas can be used to determine the sample size for the
estimation of the population proportion?
a. $n = z_\alpha^2 \frac{P(1-P)}{E^2}$    b. $n = \left(\frac{Z_{\alpha/2} \cdot \sigma}{E}\right)^2$
c. $n = \frac{p(1-P)}{n} \cdot \frac{(N-n)}{\alpha - 1}$    d. All of the above
17. If P = 0.32, how large a sample is required if we want to be 95% confident that the sampling
error will not exceed 0.02?
a. 400 b. 2090 c. 4400 d. 20000
18. If σ = 0.03, how large a sample is required in order to be 95% confident that the error
will not exceed 0.05?
a. 30 b. 139 c. 100 d. 1015
19. Bias of an estimator can be:
a. positive b. negative c. negative d. always zero
20. If α = 0.01, then we obtain,
a. 99% confidence level b. 90% confidence interval
c. 0.01 degree of confidence d. 0.01 confidence coefficient
21. A sample of size 28 yields a Fisher z-value of 0.74. What is the 95% confidence interval for ρ?
a. 0.25 < ρ < 0.89 b. 0.35 < ρ < 1.13 c. 0.34 < ρ < 0.81 d. 0.25 < ρ < 0.80
22. When the value of population standard deviation is unknown the values of t in the t-
distribution are
a. More variable than for z b. Less variable than for z
c. Equal to z d. None of these
23. The interval estimate of a population mean with large sample size and known standard
deviation is given by
a. ± / b. ± / c. ± / d. ± /
24. Which part of the area under the normal curve is represented by the coefficient / ?
a. left tail b. right tail c. both tails d. none of these
25. If $\bar{X}$ = 85, σ = 8 and n = 64, then standard error of sample mean is equal to
a. 1 b. 1.96 c. 2.576 d. none of these
26. If $\bar{X}$ = 25, σ = 5 and n = 25, the margin of sampling error at 95 per cent confidence is
a. 1 b. 1.96 c. 2.576 d. none of these
27. Sampling distribution is usually the distribution of
a. parameter b. statistic c. mean d. variance
28. The criteria for the best estimator are
a. consistency and efficiency b. unbiasedness and sufficiency
c. consistency and sufficiency d. all of the above
29. An unbiased estimator is necessarily
a. consistent b. not consistent
c. efficient d. none of these
30. t is a consistent estimator of θ if, as n → ∞,
a. var(t) → b. var(t) = c. var(t) = ∞ d. var(t) → 0
31. If a normally distributed population has standard deviation σ = 1, then the total width
of the 95 per cent confidence interval for the population mean is
a. 1.28 b. 1.64 c. 1.96 d. None of these
1. a 2. c 3. d 4. d 5. c 6. b 7. a 8. b 9. c 10. d
11. c 12. c 13. b 14. d 15. d 16. a 17. b 18. d 19. c 20. b
21. b 22. c 23. c 24. d 25. d 26. b 27. b 28. c 29. c 30. c
31. d
Test of Significance for Large Samples
Unit
To understand basic principles of hypothesis testing
To know how to establish null and alternative hypothesis
about a population parameter
To develop hypothesis testing methodology for accepting
or rejecting null hypothesis
To understand type – I and type – II errors and their
implications in making a decision
To compute and interpret the critical value approach and
P-value approach.
How to use hypothesis testing to test mean and proportion
for large sample
Use the test statistic, under large sample test (Z-test) to
test the validity of a claim about the true value of any
population parameter
To compute and interpret confidence limits.
2.1 Introduction
While inferring statistically about a population parameter on the basis of a random sample
drawn from the population, we face two different types of problems. In the first situation, the
population under discussion is completely unknown to us and we would like to guess about the
population parameter(s) from our knowledge about the sample observations. In the second situation,
some information about the population is already available and we would like to verify how far that
information is valid on the basis of the random sample drawn from that population. The first aspect is
known as estimation and the second aspect is known as tests of significance.
Thus, estimation theory and testing of hypothesis are integral parts of statistical inference. In
estimation theory, we learned how to estimate the values of population parameters. In this chapter,
we will introduce how to test the values of population parameters.
Hypothesis- Any assumption regarding the population parameters is called a hypothesis. In
other words, a hypothesis is a prediction of the result of an event made before the experiment.
Null hypothesis- The basic assumption regarding population parameters which can be tested is
called the null hypothesis. In other words, the statement that any difference between the observed sample
statistic and the specified population parameter is due to sampling error is called the null hypothesis. It
is a hypothesis of no difference. It refers to a specified value of the
population parameter. It represents the default possibility that we will accept unless we have
convincing evidence to the contrary. The Null Hypothesis is denoted by H0. Suppose that the
population has a specified mean value, say µ0; then the Null Hypothesis is set up as
H0: µ = µ0, i.e., the population has a specified mean value µ0.
Consider an example: a manufacturer of dairy milk claims that, on an average, its packet
contains 1000 ml of milk. In reality, this claim may or may not be true. However, we will initially
assume that the manufacturer's claim is true. To test the claim of the manufacturer, the Null Hypothesis
will be set up as
H0: µ = 1000 ml., i.e., the claim of the manufacturer is true.
The manufacturer's claim will be true if the packets, on an average, contain 1000 ml. of milk.
2.2.2 Alternative Hypothesis
It is the complementary statement of the Null Hypothesis and represents the conclusion supported
if the Null Hypothesis is rejected. The Alternative Hypothesis is denoted by H1. Suppose that the
population has a specified mean value, say µ0. The Alternative Hypothesis may be
H1: µ ≠ µ0, i.e., the population mean is not equal to the specified value µ0.
H1: µ > µ0, i.e., the population mean is greater than the specified value µ0.
H1: µ < µ0, i.e., the population mean is less than the specified value µ0.
Suppose a consumer protection group wishes to test the claim of the manufacturer. In order
to test the claim of the manufacturer in the above example, the Alternative Hypothesis will be set up
as
H1: µ < 1000 ml., i.e., the manufacturer's claim is false.
The manufacturer's claim is false if its milk packets contain, on an average, less than 1000 ml.
of milk.
Note:
i. In the above example, we do not set up the Alternative Hypothesis as H1: µ > 1000 ml.,
because if the packets contain more than 1000 ml. of milk we still consider the
manufacturer's claim valid. Here, we formulate the Alternative Hypothesis from the
consumer's side: if each packet of milk contains more
than 1000 ml. of milk, it is an advantage for the consumer, who gets
more milk at the same cost. We only wish to test whether, in reality, each packet
contains less than 1000 ml. of milk.
ii. If the manufacturer wishes to test his own claim, he sets up the
Alternative Hypothesis for his own sake as
H1: µ ≠ 1000 ml. Here, he wishes to determine whether each packet of milk contains, in reality,
significantly less than 1000 ml. or more than 1000 ml. The manufacturer always wishes
to fill each packet with exactly or approximately 1000 ml. If he
fills more than 1000 ml., it is a disadvantage for him because
he has to sell more milk at the same cost, whereas if each packet
contains less than 1000 ml. of milk he may lose consumers from the market.
Set up the Null and Alternative Hypotheses for the following examples:
1. To test if the per capita income of Nepalese people is (i) different from $ 240, (ii) less than $
240, (iii) more than $ 240.
Null Hypothesis H0: µ = $ 240, i.e., the average per capita income of Nepalese people is $ 240.
Alternative Hypothesis
i. H1: µ ≠ $ 240, i.e., the average per capita income of Nepalese people is different from
$ 240.
ii. H1: µ < $ 240, i.e., the average per capita income of Nepalese people is less than $ 240.
iii. H1: µ > $ 240, i.e., the average per capita income of Nepalese people is more than $ 240.
2. Suppose a potato chip manufacturer is concerned that the bagging equipment may not be
functioning properly when filling 10 oz bags. To test the concern of the manufacturer, we set
up hypotheses as
Null Hypothesis H0: µ = 10 oz, i.e., the bagging equipment is functioning properly.
Alternative Hypothesis H1: µ ≠ 10 oz, i.e., the bagging equipment is not functioning properly.
The bagging equipment is working properly if it puts, on average, 10 oz in each bag. If it puts
significantly more or less than 10 oz in each bag, the equipment is considered defective.
That is, the bagging equipment is not functioning properly.
3. Suppose we are purchasing 3.5 inch disks. The company claims that only 4% of the disks it
manufactures are defective. To test the manufacturer's claim, we set up hypotheses as
Null Hypothesis H0: P = 0.04, i.e., the claim of the company is true.
Alternative Hypothesis H1: P > 0.04, i.e., the claim of the company is false.
4. Suppose a company has implemented a new advertising program in the hope of increasing
sales from last year's annual average of Rs. 10 millions. Test whether the new advertising program was
successful. Here we set up hypotheses as
Null Hypothesis H0: µ = Rs. 10 millions, i.e., the new advertising program was not successful.
Alternative Hypothesis H1: µ > Rs. 10 millions, i.e., the new advertising program was
successful.
The advertising program is successful if the sales in the current year are greater than
last year's sales. If they are less than or equal to last year's sales, the new advertising program is not
successful.
5. If we think about the judicial system in terms of a hypothesis test, how would we set up the
null and the alternative hypotheses?
Null Hypothesis H0: the person is innocent.
Alternative Hypothesis H1: the person is not innocent.
Since we will be deciding between the null and the alternative hypotheses using only
sample information, there is always a chance that we may be wrong. That is, we may choose to
believe the Null Hypothesis when, in fact, it is not true. Alternatively, we may choose to believe the
Alternative Hypothesis when, in fact, it is not true. There are two ways that we could be wrong when
we perform a hypothesis test, which are presented in the following Table 5.1.
Table 5.1 Two Types of Errors
                          Actual Situation
Statistical Decision      H0 is True          H0 is False
Reject H0                 Type I error        Correct Decision
Accept H0                 Correct Decision    Type II error
Thus, there are two types of errors in the decision-making process.
1. Type I error
2. Type II error
2.3.1 Type I Error
The error committed in rejecting a true Null Hypothesis is known as Type I error. That is, a Type I
error is made when we reject the Null Hypothesis although it is actually true. The
probability of making a Type I error is denoted by α. That is,
α = P [Reject H0 | H0 is true]
Since the producer has to bear this kind of risk, the Type I error is also called the producer's risk.
The probability of committing a Type I error is referred to as the level of significance of the
statistical test. In other words, the probability of rejecting the null hypothesis when it is true is called the level
of significance. It is denoted by α. The probability of making a correct decision when H0 is true is (1 – α). The most
commonly used levels of significance in practice are 5% and 1%. If we use a 5% level of significance,
we mean that the probability of committing a Type I error is 0.05. It also means that, when H0 is true, we are
95% confident that a correct decision has been made.
The confidence coefficient
The confidence coefficient, denoted by 1 – α, is the complement of the probability of Type I
error. More precisely, it is the probability that the null hypothesis H0 is not rejected when in fact it is
true and should not be rejected. The confidence level of a hypothesis test is (1 – α) × 100%.
A test statistic is a value which is calculated from the sample data. The value of the test statistic
is used to decide whether the Null Hypothesis should be accepted or rejected in our hypothesis test.
Test statistic = (Sample statistic – Population parameter) / (Standard error of sample statistic)
The choice of a test statistic is guided by the sample size and by whether the population
standard deviation (σ) is known, as shown in the following table:
Choice of Probability Distribution
                 Population Standard Deviation (σ)
Sample Size      Known        Unknown
n ≥ 30           Z-test       Z-test
n < 30           Z-test       t-test
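The table above can be sketched as a small Python helper. The function name choose_test and its parameters are illustrative assumptions, not notation from this chapter; the rule itself is read directly from the table.

```python
def choose_test(n: int, sigma_known: bool) -> str:
    """Return the test suggested by the table: Z-test or t-test.

    n           -- sample size
    sigma_known -- True if the population standard deviation is known
    """
    # Large samples (n >= 30) use the Z-test whether or not sigma is known.
    # Small samples use the Z-test only when sigma is known.
    if n >= 30 or sigma_known:
        return "Z-test"
    return "t-test"


if __name__ == "__main__":
    print(choose_test(50, sigma_known=False))  # large sample, sigma unknown
    print(choose_test(12, sigma_known=True))   # small sample, sigma known
    print(choose_test(12, sigma_known=False))  # small sample, sigma unknown
```

Only the last call falls in the t-test cell of the table; the other two fall in Z-test cells.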
The sampling distribution of the test statistic is divided into two regions, a region of rejection
(or critical region), and a region of acceptance. If the test statistic falls into the region of acceptance,
the Null Hypothesis is accepted. If the value of the test statistic lies in the rejection region, the Null
Hypothesis is rejected. The value that divides the region of rejection from the region of acceptance is
called the critical value. The critical value is obtained from the standard table.
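[Figure: sampling distribution of the test statistic, with a rejection region in each tail, the acceptance region in the centre, and the critical values marking the boundaries.]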
In testing of hypothesis problem, either the rejection region can be on both sides or it can be
on the left side or right side of the distribution curve. A test with two rejection regions is called a
two-tailed test; a test with one rejection region is called a one-tailed test. The one-tailed test is called
a left-tailed test if the rejection region is in the left tail of the distribution curve, and it is called a
right-tailed test if the rejection region is in the right tail of the distribution curve.
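As a rough numerical check of how the rejection region relates to the level of significance, the sketch below computes the area of the rejection region under the standard normal curve for a given critical value. It assumes a Z-test (standard normal test statistic) and uses Python's statistics.NormalDist; the function name rejection_area is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def rejection_area(critical: float, tail: str) -> float:
    """Area of the rejection region for a Z-test.

    critical -- critical value (its absolute value is used)
    tail     -- "two", "left" or "right"
    """
    c = abs(critical)
    one_tail = 1.0 - STD_NORMAL.cdf(c)  # area beyond c in a single tail
    if tail == "two":
        return 2.0 * one_tail           # two rejection regions, one per tail
    if tail in ("left", "right"):
        return one_tail                 # a single rejection region
    raise ValueError("tail must be 'two', 'left' or 'right'")


if __name__ == "__main__":
    print(round(rejection_area(1.96, "two"), 3))     # two-tailed test
    print(round(rejection_area(1.645, "right"), 3))  # right-tailed test
```

With the familiar critical values 1.96 (two-tailed) and 1.645 (one-tailed), both areas come out at approximately 0.05, the 5% level of significance.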
2.7.1 A Two Tailed Test (TTT)
Let us consider an example. According to the Central Bureau of Statistics, the average per
capita income of the people of Nepal was $ 240 in 2001. We wish to test whether or not the average per
capita income has changed since 2001. The key word here is changed. The average per capita income has
changed if it has either increased or decreased during the period since 2001. This is an example of a
two-tailed test. Here we set up the null and alternative hypotheses as
Null Hypothesis H0: µ = $ 240, i.e., the average per capita income is $ 240 or the average per
capita income has not changed.
Alternative Hypothesis H1: µ ≠ $ 240, i.e., the average per capita income is not $ 240 or the
average per capita income has changed.
Consider another example: the manager of XYZ bank reported that the mean transaction per
day was worth Rs. 100 millions, and we wish to test whether it is significantly different from Rs. 100 millions. The key word
here is significantly different. The mean transaction may be either more or less than Rs. 100 millions. This
is also an example of a two-tailed test. Here we set up the null and alternative hypotheses as
Null Hypothesis H0: µ = Rs. 100 millions, i.e., the mean transaction per day is Rs. 100
millions or the mean transaction per day is not significantly different from Rs. 100 millions.
Alternative Hypothesis H1: µ ≠ Rs. 100 millions, i.e., the mean transaction per day is not Rs. 100
millions or the mean transaction per day is significantly different from Rs. 100 millions.
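Thus, a two-tailed test of the population mean has the following null and alternative
hypotheses:
Null Hypothesis H0: µ = µ0 (A specified number)
Alternative Hypothesis H1: µ ≠ µ0 (A specified number)
[Figure: two-tailed test, with a rejection region of area α/2 in each tail and an acceptance region of area (1 – α) in the centre.]
Remarks:
iii. If the Alternative Hypothesis has a not equal to (≠) sign, as in the above examples, it is a two-tailed
test.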
2.7.2 A Left-tailed Test (LTT)
Reconsider the example of the average per capita income of the people of Nepal. Now, we wish to
test whether the average per capita income is less than $ 240. The key word here is less than, which
indicates a left-tailed test. Here we set up the null and alternative hypotheses as
Null Hypothesis H0: µ = $ 240, i.e., the average per capita income is $ 240.
Alternative Hypothesis H1: µ < $ 240, i.e., the average per capita income is less than $ 240.
Again, reconsider the example of the mean transaction per day of XYZ bank, reported as Rs. 100
millions. Now we wish to test whether the mean transaction per day is less than Rs. 100 millions.
The key word here is less than, which indicates a left-tailed test. In a left-tailed test, the rejection
region is in the left tail of the distribution curve and the area of the
rejection region is equal to the level of significance (α). Here we set up the null and alternative hypotheses
as
Null Hypothesis H0: µ = Rs. 100 millions, i.e., the mean transaction per day is Rs. 100
millions, or, the mean transaction per day is not significantly different from Rs. 100 millions.
Alternative Hypothesis H1: µ < Rs. 100 millions, i.e., the mean transaction per day is less than
Rs. 100 millions.
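Thus, a left-tailed test of the population mean has the following null and alternative
hypotheses:
Null Hypothesis H0: µ = µ0 (A specified number)
Alternative Hypothesis H1: µ < µ0 (A specified number)
[Figure: left-tailed test, with a rejection region of area α in the left tail and an acceptance region of area (1 – α).]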
Remarks:
How to detect a left-tailed test
i. The problem statement has keywords such as less than, decreased, reduced, at least, inferior, minority,
below, smaller, shorter, etc.
ii. A left-tailed test is used if the population parameter has shifted to a number less than a
specified number.
iii. If the Alternative Hypothesis has a less than (<) sign, the test is always left-tailed.
[Figure: right-tailed test, with an acceptance region of area (1 – α).]
Remarks:
How to detect a right-tailed test
i. The problem statement has keywords such as greater than, increased, more than, at most, above,
superior, enhance, improvement, gained, etc.
ii. A right-tailed test is used if the population parameter has shifted to a number more than a
specified number.
iii. If the Alternative Hypothesis has a greater than (>) sign, the test is always right-tailed.
In general, we look for the comparative words stated in the question in order to detect the
tail of the test. The most commonly used words in problem statements are significantly
different, unbiased, only, changed, same, no longer than, less than, more than, equal to, inferior,
superior, above, below, at least, at most, minority, majority, taller, smaller, shorter, effective,
improvement, enhance, decrease, increase, better, etc.
The p-value approach is an alternative method for the decision-making process in
hypothesis testing. This approach has become more popular in recent years. In this approach, the
tail area under the curve of the test statistic used in the hypothesis test is
determined, and this probability is called the p-value. Thus, the p-value is the area defined
by the value of the test statistic calculated from the observed data set and by the alternative hypothesis
used in the hypothesis test.
Definition: The p-value of a hypothesis test is defined as the smallest level of
significance at which the null hypothesis H0 would be rejected. The p-value is given by
P-value = probability, computed under H0, that the test statistic T (say) takes a value at least as extreme as the value
of T calculated from the observed data set.
The p-value is compared with the pre-assigned level of significance in order to decide
whether to reject or to accept the null hypothesis H0 being tested.
In the p-value approach to hypothesis testing, the first three steps are exactly the same as in
the procedure for the classical (or traditional) approach of hypothesis testing:
Step 1: Setting up the null and alternative hypotheses (H0 and H1).
Step 2: Choosing a level of significance α.
Step 3: Computing the test statistic T.
Step 4: Finding the p-value: One of the most important steps in the p-value method of
hypothesis testing is to find the p-value corresponding to the observed value of the test statistic. The
p-value is determined from the table of the test statistic T used, as given below:
i. For right tailed test
p-value = P (Z ≥ Zcal) = P0 (Say)
or, p-value = P (T ≥ Tcal) = P0 (Say)
= Tail area under the curve of the test statistic T on the right of Tcal
ii. For left tailed test
p-value = P (Z ≤ Zcal) = P0 (Say)
or, p-value = P (T ≤ Tcal) = P0 (Say)
= Tail area under the curve of the test statistic T on the left of Tcal
iii. For two tailed test
p-value = P (|Z| ≥ |Zcal|) = 2P0 (Say)
or, p-value = P (|T| ≥ |Tcal|) = 2P0 (Say), where P0 is the area in one tail beyond |Tcal|
Step 5: Decision rule: The decision is made by comparing the p-value with the pre-fixed
value of α.
If the p-value is less than or equal to α, then the decision is to reject H0 and accept H1.
Otherwise the decision is to accept H0. That is,
i) For one (right or left) tailed test
a) If p-value, p0 ≤ α, then reject H0 and accept H1
b) If p-value, p0 > α, then accept H0
ii) For two tailed test
a) If p-value, 2 p0 ≤ α, then reject H0 and accept H1
b) If p-value, 2 p0 > α, then accept H0
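Steps 4 and 5 can be sketched in Python for the Z-test. This is a minimal illustration that assumes the test statistic follows the standard normal distribution; the function names p_value and decide, and the value z_cal = 2.50, are illustrative and not taken from the text.

```python
from statistics import NormalDist


def p_value(z_cal: float, tail: str) -> float:
    """p-value of a Z-test for the given calculated statistic and tail."""
    phi = NormalDist().cdf  # cumulative distribution of the standard normal
    if tail == "right":
        return 1.0 - phi(z_cal)               # P(Z >= z_cal)
    if tail == "left":
        return phi(z_cal)                     # P(Z <= z_cal)
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z_cal)))  # 2 * P(Z >= |z_cal|)
    raise ValueError("tail must be 'right', 'left' or 'two'")


def decide(p: float, alpha: float = 0.05) -> str:
    """Decision rule of Step 5: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p <= alpha else "accept H0"


if __name__ == "__main__":
    z_cal = 2.50  # illustrative calculated value of Z
    p = p_value(z_cal, "two")
    print(round(p, 4), decide(p, alpha=0.05))
```

Because the two-tailed branch already doubles the single-tail area, decide compares the returned value with α directly in every case.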
Step 6: Drawing inference or conclusion: The conclusion should be drawn in words in
the same manner as in the classical approach of hypothesis testing.
Now, we see how the problems of testing a hypothesis are solved using this p-value
approach.
Flow Chart for Testing of Hypothesis
Start
→ Select an appropriate Level of Significance
→ Calculate appropriate Test Statistic (T)
→ Is T ≤ C? (C = critical value)
   Yes: Accept H0
   No: Reject H0
In this section we discuss the tests of significance when the samples are large. For practical
purposes, a sample is considered large if n ≥ 30. A larger sample is generally desirable when
the units in the population under study are not homogeneous or uniform. A small sample is not
sufficient to give reliable results about the parameter of a heterogeneous population, so a large
sample test has to be carried out.
2.11.1 Assumptions of Z-test
The Z-test is based on the following assumptions:
1. The sample size is large, n ≥ 30.
2. The population from which the samples are drawn is normally distributed.
3. The population standard deviation is known.
4. The samples are independent.
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
Step 3: Test Statistic
Under H0, the test statistic is
Z = (X̄ – µ) / S.E.(X̄) = (X̄ – µ) / (σ/√n)
Where, S.E.(X̄) = σ/√n = Standard error of mean
X̄ = Sample mean
µ = Population mean
σ = Population standard deviation
n = Sample size
If the population standard deviation (σ) is unknown, then we use the estimate of the population
standard deviation provided by the sample standard deviation, i.e., σ̂ = s.
Then,
Z = (X̄ – µ) / (s/√n) ~ N (0, 1)
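The test statistic of Step 3 is a one-line computation. The sketch below is illustrative: the function name z_statistic and the sample values are assumptions, and the argument sd stands for σ when it is known or for the sample standard deviation s when it is not.

```python
import math


def z_statistic(sample_mean: float, mu0: float, sd: float, n: int) -> float:
    """Z = (sample mean - hypothesised mean) / (sd / sqrt(n)).

    sd is the population standard deviation sigma if known,
    otherwise the sample standard deviation s (large samples).
    """
    standard_error = sd / math.sqrt(n)  # S.E. of the sample mean
    return (sample_mean - mu0) / standard_error


if __name__ == "__main__":
    # Illustrative values: sample mean 4.2, hypothesised mean 4, sd 0.8, n = 100
    print(round(z_statistic(4.2, 4.0, 0.8, 100), 2))
```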
Step 4: Critical Value
The critical or tabulated value of the test statistic Z at the pre-specified level of significance is
obtained from the area under normal table.
Table of the Normal Distribution
(Critical Values Zα of Z)
Nature of Alternative       Level of Confidence (1 – α)
Hypothesis                  99%       98%       96%       95%       90%
                            Level of Significance (α)
                            1%        2%        4%        5%        10%
Two tailed test             ±2.576    ±2.326    ±2.054    ±1.96     ±1.645
Right tailed test           2.326     2.054     1.751     1.645     1.282
Left tailed test            –2.326    –2.054    –1.751    –1.645    –1.282
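The tabulated critical values can be reproduced from the inverse of the standard normal distribution function. The sketch below uses Python's statistics.NormalDist.inv_cdf; the function name critical_value is hypothetical, and for a two-tailed test it returns the positive value, to be read as ±.

```python
from statistics import NormalDist


def critical_value(alpha: float, tail: str) -> float:
    """Critical value of Z at level of significance alpha.

    For tail == "two" the positive value is returned; use it as +/-.
    """
    inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    if tail == "two":
        return inv(1.0 - alpha / 2.0)  # alpha/2 in each tail
    if tail == "right":
        return inv(1.0 - alpha)        # alpha in the right tail
    if tail == "left":
        return inv(alpha)              # alpha in the left tail (negative)
    raise ValueError("tail must be 'two', 'right' or 'left'")


if __name__ == "__main__":
    for a in (0.01, 0.02, 0.04, 0.05, 0.10):
        print(a,
              round(critical_value(a, "two"), 3),
              round(critical_value(a, "right"), 3),
              round(critical_value(a, "left"), 3))
```

Rounded to three decimals, the values agree with the table, e.g. 1.96 for a two-tailed test and 1.645 for a right-tailed test at the 5% level.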
Step 5: Decision
i. If the calculated value of |Z| is less than or equal to the tabulated value of |Z|, then we accept H0, i.e.,
the population mean has the specified value µ0. In other words, there is no significant
difference between the sample mean and the population mean, or, the sample has been drawn from a
normal population with population mean µ0.
ii. If the calculated value of |Z| is greater than the tabulated value of |Z|, then we reject H0, i.e., the
population mean does not have the specified value µ0. In other words, there is a significant difference
between the sample mean and the population mean, or, the sample has not been drawn from a normal
population with population mean µ0.
Remarks:
Confidence limits for µ: the (1 – α) × 100% confidence limits for the population mean are given by
(1 – α) × 100% C.I. for µ = X̄ ± Zα × S.E.(X̄) = X̄ ± Zα × σ/√n
where Zα is the two-tailed critical value of Z at level of significance α.
For example, the 95% confidence limits for µ are given by
95% C.I. for µ = X̄ ± Z0.05 × S.E.(X̄) = X̄ ± 1.96 × σ/√n
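The confidence limits can be computed with a few lines of Python. This is a sketch under the large-sample assumption; the function name confidence_limits and the sample values are illustrative, and the two-tailed critical value is taken from statistics.NormalDist rather than from the printed table.

```python
import math
from statistics import NormalDist


def confidence_limits(sample_mean: float, sd: float, n: int,
                      confidence: float = 0.95) -> tuple[float, float]:
    """Lower and upper confidence limits for the population mean.

    sd is sigma if known, otherwise the sample standard deviation s.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    margin = z * sd / math.sqrt(n)               # Z * S.E. of the mean
    return sample_mean - margin, sample_mean + margin


if __name__ == "__main__":
    # Illustrative values: sample mean 50, sd 10, n = 100
    lower, upper = confidence_limits(50.0, 10.0, 100, confidence=0.95)
    print(round(lower, 2), round(upper, 2))
```

The limits differ from a hand calculation with 1.96 only in later decimals, because inv_cdf returns the unrounded critical value 1.95996...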
Note: While setting up alternative hypothesis, theoretically we discussed two-tailed
alternative, right tailed alternative and left-tailed alternative. But while solving a particular
problem, we have only one alternative hypothesis. It may be two tailed, right tailed or left
tailed. Whether the alternative is one tailed or two tailed, it all depends on the nature of the
problem.
Example:
A sample of 50 pieces of a certain type of string was tested. The mean breaking strength turned
out to be 14.5 Kgs. Test whether the sample is from a batch of strings having a mean breaking
strength of 15.6 Kgs. and a standard deviation of 2.2 Kgs.
Solution:
We are given,
n = 50, X̄ = 14.5 Kgs, µ = 15.6 Kgs, σ = 2.2 Kgs.
Setting up Hypotheses:
Null Hypothesis H0: µ = 15.6 Kgs, i.e., the mean breaking strength of a batch of strings is
15.6 kgs.
Alternative Hypothesis H1: µ ≠ 15.6 Kgs, i.e., the mean breaking strength of a batch of strings
is not 15.6 kgs. [Two-tailed test]
Level of Significance: Since the level of significance is not given, we take α = 0.05.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (σ/√n) = (14.5 – 15.6) / (2.2/√50) = – 3.536
∴ |Z| = 3.536
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for two tailed test is ± 1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of test statistic |Z| = 3.536 is greater than the critical
value |Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the mean
breaking strength of a batch of strings is not 15.6 Kgs or the sample is not from a batch of
strings having a mean breaking strength of 15.6 Kgs.
Example:
A random sample of 100 items is taken from a normal distribution whose mean and
standard deviation are 4 and 0.8 respectively. Can the sample with mean 4.2 be regarded as a truly
random sample at 5% level of significance? Make decision through critical value approach and
p-value approach.
Solution:
We are given, n = 100, µ = 4, σ = 0.8, X̄ = 4.2
Setting up Hypotheses:
Null Hypothesis H0: µ = 4, i.e., the population mean is 4. In other words, the sample is drawn
from a normal population with mean 4 or the sample can be regarded as truly random sample.
Alternative Hypothesis H1: µ ≠ 4, i.e., the population mean is not equal to 4. In other words,
the sample is not drawn from a normal population with mean 4 or the sample cannot be regarded as
truly random sample. [Two-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (σ/√n) = (4.2 – 4) / (0.8/√100) = 2.50
∴ |Z| = 2.50
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for two tailed test is ± 1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of test statistic |Z| = 2.50 is greater than the critical value
|Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the sample cannot be regarded as a
truly random sample.
Using P-value approach, for two tailed test
P-value = P [|Z| ≥ |Zcal|]
= P [|Z| ≥ 2.50]
= P [Z ≤ – 2.50] + P [Z ≥ 2.50]
= 2P [Z ≥ 2.50] by symmetry
= 2 [0.5 – P (0 ≤ Z ≤ 2.50)]
= 2 [0.5 – 0.4938] = 2 × 0.0062 = 0.0124
[Figure: standard normal curve centred at Z = 0, with the tails beyond – 2.50 and 2.50 marked.]
Decision: Since the P-value, i.e., 2P0 = 0.0124 < 0.05 = α, we reject the null
hypothesis (H0) and accept the alternative hypothesis (H1).
Example:
A sample of 900 members has a mean of 3.4 cm and a standard deviation of 2.61 cm. Can the
sample be regarded as one drawn from a population with mean 3.25 cm? Using the level of
significance as 0.05, is the claim acceptable? Also calculate the 95% confidence limits for the
population mean. Make decision through critical value approach and p-value approach.
Solution:
We are given,
n = 900, X̄ = 3.4 cm, s = 2.61 cm, µ = 3.25 cm
Setting up Hypotheses:
Null Hypothesis H0: µ = 3.25, i.e. the sample is regarded as a random sample drawn from a
population with mean 3.25 cm.
Alternative Hypothesis H1: µ ≠ 3.25 i.e., the sample is not regarded as a random sample
drawn from a population with mean 3.25 cm. [Two-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (s/√n) = (3.4 – 3.25) / (2.61/√900) = 0.15 / 0.087
∴ Z = 1.724
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for two tailed test is ± 1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of test statistic Z = 1.724 is less than the critical value
|Z0.05| = 1.96, H0 is accepted. Hence we conclude that the sample is regarded as a random
sample drawn from a population with mean 3.25 cm.
Using P-value approach, for two tailed test
P-value = P [|Z| ≥ |Zcal|]
= P [|Z| ≥ 1.72]
= P [Z ≤ – 1.72] + P [Z ≥ 1.72]
= 2P [Z ≥ 1.72] by symmetry
= 2 [0.5 – P (0 ≤ Z ≤ 1.72)]
= 2 [0.5 – 0.4573]
= 2 × 0.0427
= 0.0854
[Figure: standard normal curve centred at Z = 0, with the tails beyond – 1.72 and 1.72 marked.]
Decision: Since the P-value, i.e., 2P0 = 0.0854 > 0.05 = α, we accept the null
hypothesis (H0).
For 95% Confidence limits for µ
95% C.I. for µ = X̄ ± Z0.05 × S.E.(X̄) = X̄ ± 1.96 × s/√n
= 3.4 ± 1.96 × 2.61/√900 = 3.4 ± 0.17052
Lower confidence limit = 3.4 – 0.17052 = 3.22948
Upper confidence limit = 3.4 + 0.17052 = 3.57052
Example:
A moped manufacturer hypothesized that the mean miles per gallon for its moped is 115.2. It
takes a sample of 49 mopeds and finds the sample mean to be 117.4 miles per gallon. If the population
standard deviation is known to be 8.4 miles per gallon, test the Null Hypothesis that the true mean miles per gallon is
115.2 against the Alternative Hypothesis that it is greater than 115.2 using the 0.05 significance level.
Make decision through critical value approach and p-value approach.
Solution:
We are given, µ = 115.2, n = 49, X̄ = 117.4, σ = 8.4
Setting up Hypotheses:
Null Hypothesis H0: µ = 115.2 miles per gallon, i.e., the true mean mileage per gallon of
moped is 115.2 miles per gallon.
Alternative Hypothesis H1: µ > 115.2 miles per gallon, i.e., the true mean mileage per gallon of moped is
greater than 115.2 miles per gallon. [Right-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (σ/√n) = (117.4 – 115.2) / (8.4/√49) = 1.833
∴ Z = 1.833
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for right tailed test is 1.645, i.e., Z0.05 = 1.645.
Decision: Since the calculated value of test statistic Z = 1.833 is greater than the critical value
Z0.05 = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the true mean
mileage per gallon of moped is greater than 115.2 miles per gallon.
Using P-value approach, for right tailed test
P-value = P [Z ≥ Zcal]
= P [Z ≥ 1.83]
= 0.5 – P (0 ≤ Z ≤ 1.83)
= 0.5 – 0.4664
= 0.0336
[Figure: standard normal curve centred at Z = 0, with the right tail beyond Z = 1.83 marked.]
Decision: Since the P-value, i.e., P0 = 0.0336 < 0.05 = α, we reject the null
hypothesis (H0) and accept the alternative hypothesis (H1).
Example:
It is claimed that a random sample of 100 tyres with a mean tread life of 15,131 km. is drawn
from a population of tyres that has a mean tread life of 15,200 km. and standard deviation of 1,248
Km. Test the validity of this claim.
Solution:
We are given,
n = 100, X̄ = 15,131 km, µ = 15,200 km, σ = 1,248 km
Setting up Hypotheses:
Null Hypothesis H0: µ = 15,200 km, i.e., the tread life of the tyres is 15,200 km. In other
words, the tyres are drawn from population with mean tread life of 15,200 km.
Alternative Hypothesis H1: µ < 15,200 km, i.e., the tread life of the tyres is less than 15,200
km [Left-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (σ/√n) = (15131 – 15200) / (1248/√100) = – 0.5529
∴ |Z| = 0.5529
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for left-tailed test is –1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of test statistic |Z| = 0.5529 is less than the critical value
|Z0.05| = 1.645, H0 is accepted. Hence we conclude that the tread life of the tyres is 15,200 km.
In other words, the tyres are drawn from a population with mean tread life of 15,200 km.
Example:
The management of Priority Health Club claims that its members lose an average of 10 pounds
or more within the first month after joining the club. A consumer agency that wanted to check this
claim took a random sample of 36 members of this health club and found that they lost an average of
9.2 pounds within the first month of membership with a standard deviation of 2.4 pounds. What
would be your decision?
Solution:
We are given,
µ = 10 pounds, n = 36, X̄ = 9.2 pounds, s = 2.4 pounds
Setting up Hypotheses:
Null Hypothesis H0: µ = 10 pounds, i.e., the mean weight lost is 10 pounds.
Alternative Hypothesis H1: µ < 10 pounds, i.e., the mean weight lost is less than 10 pounds.
[Left-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (s/√n) = (9.2 – 10) / (2.4/√36) = –2.00
∴ |Z| = 2.00
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for left tailed test is –1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of test statistic |Z| = 2.00 is greater than the critical value
|Z0.05| = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the mean weight lost
within the first month of membership by the members of this club is less than 10 pounds.
Example:
A study claims that all adults spend an average of 14 hours or more on chores during a
weekend. A researcher wanted to check if this claim is true. A random sample of 200 adults taken by
this researcher showed that these adults spend an average of 13.75 hours on chores during a weekend
with a standard deviation of 3.0 hours. Test whether the claim is true at 5% level of significance.
Solution:
We are given,
µ = 14 hours, n = 200, X̄ = 13.75 hours, s = 3 hours
Setting up Hypotheses:
Null Hypothesis H0: µ = 14 hours, i.e, the average time spent by adults on chores during a
weekend is 14 hours.
Alternative Hypothesis H1: µ < 14 hours, i.e., the average time spent by adults on chores
during a weekend is less than 14 hours. [Left-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (s/√n) = (13.75 – 14) / (3/√200) = –1.179
∴ |Z| = 1.179
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for left tailed test is –1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of test statistic |Z| = 1.179 is less than the critical value
|Z0.05| = 1.645, H0 is accepted. Hence we conclude that the average time spent by adults on
chores during a weekend is 14 hours.
Note: For the preceding example the Null Hypothesis can also be set up as H0: µ ≥ 14
hours, i.e., the average time spent by adults on chores during a weekend is 14 hours
or more.
Example:
Nepal Telecom claims that the mean duration of all long- distance phone calls made by its
residential customers is 10 minutes. A random sample of 100 long-distance calls made by its
residential customers taken from the records showed that the mean duration of calls for this sample
is 9.25 minutes with a standard deviation of 3.75 minutes. Test the mean duration of all long-
distance calls made by residential customers is less than 10 minutes. Make decision through critical
value approach and p-value approach.
Solution:
We are given,
µ = 10 minutes, n = 100, X̄ = 9.25 minutes, s = 3.75 minutes
Setting up Hypotheses:
Null Hypothesis H0 : µ = 10 minutes, i.e., the mean duration of all long distance calls made
by residential customers is 10 minutes.
Alternative Hypothesis H1 : µ < 10 minutes, i.e., the mean duration of all long distance calls
made by residential customers is less than 10 minutes. [Left-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05.
Test Statistic: Under H0, the test statistic is
Z = (X̄ – µ) / (s/√n) = (9.25 – 10) / (3.75/√100) = –2.00
∴ |Z| = 2.00
Critical value: The critical or tabulated value of the test statistic z at 5% level of significance
for left tailed test is – 1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of test statistic |Z| = 2.00 is greater than the critical value
|Z0.05| = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the mean duration of all
long distance calls made by residential customers is less than 10 minutes.
Using P-value approach, for left tailed test
P-value = P [Z ≤ Zcal]
= P [Z ≤ – 2.00]
= P [Z ≥ 2.00] by symmetry
= 0.5 – P (0 ≤ Z ≤ 2.00)
= 0.5 – 0.4772
= 0.0228
[Figure: standard normal curve centred at Z = 0, with the left tail beyond Z = – 2.00 marked.]
Decision: Since the P-value, i.e., P0 = 0.0228 < 0.05 = α, we reject the null
hypothesis (H0) and accept the alternative hypothesis (H1).
Example:
An insurance agent claims that the average age of policy-holders who insure through him is
less than the average age for all the agents, which is 30 years. A random sample of 100 policyholders
who had insured through him gave the following age distribution.
Age last birthday (yrs)   16-20   21-25   26-30   31-35   36-40
No. of persons              12      22      20      30      16
Calculate the mean and standard deviation of the sample and use these values to test his claim
at 5% level of significance.
Solution:
Calculation of X̄ and s
Age      No. of persons (f)   Mid-value (X)   d' = (X – 28)/5   fd'    fd'²
16-20          12                  18               –2          –24     48
21-25          22                  23               –1          –22     22
26-30          20                  28                0            0      0
31-35          30                  33                1           30     30
36-40          16                  38                2           32     64
Total       n = 100                                         ∑fd' = 16   ∑fd'² = 164
Here, A = 28, h = 5
Now, sample mean
\[ \bar{X} = A + \frac{\sum fd'}{n} \times h = 28 + \frac{16}{100} \times 5 = 28.8 \]
Sample standard deviation
\[ s = \sqrt{\frac{\sum fd'^2}{n} - \left(\frac{\sum fd'}{n}\right)^2} \times h = \sqrt{\frac{164}{100} - \left(\frac{16}{100}\right)^2} \times 5 = 6.35 \]
Thus,
X̄ = 28.8 years, s = 6.35 years, n = 100, µ = 30 years
Setting up Hypotheses:
Null Hypothesis H0: µ = 30 years, i.e., the average age of the policy holders who insured
through him is 30 years. In other words, the claim of the insurance agent is not valid.
Alternative Hypothesis H1 : µ < 30 years, i.e., the average age of the policy holders who
insured through him is less than 30 years. In other words, the claim of the insurance agent is valid.
[Left-tailed test]
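Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{28.8 - 30}{6.35/\sqrt{100}} = -1.89 \]
∴ |Z| = 1.89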
Critical value: The critical or tabulated value of the test statistic Z at 5% level of
significance for left tailed test is –1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of test statistic |Z| = 1.89 is greater than the critical value
|Z0.05| = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the average age of the
policy holders who insured through him is less than 30 years. In other words, the claim of the
insurance agent is valid.
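The step-deviation arithmetic of this example can be reproduced as follows. This is only a sketch in plain Python; the variable names are illustrative, and the mid-values, frequencies, assumed mean A = 28 and class width h = 5 are taken from the table in the solution.

```python
from math import sqrt

# Class mid-values and frequencies from the age distribution.
mid_values = [18, 23, 28, 33, 38]
freqs = [12, 22, 20, 30, 16]
A, h = 28, 5  # assumed mean and class width used in the solution

n = sum(freqs)
d = [(x - A) / h for x in mid_values]                # step deviations d'
sum_fd = sum(f * di for f, di in zip(freqs, d))       # sum of f * d'
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d)) # sum of f * d'^2

mean = A + sum_fd / n * h
sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * h

# Left-tailed Z statistic against the claimed population mean of 30 years.
z = (mean - 30) / (sd / sqrt(n))
print(round(mean, 2), round(sd, 2), round(z, 2))
```

The sketch reproduces the sample mean 28.8, the standard deviation 6.35 and Z = –1.89 obtained above.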
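Step 3: Test Statistic
Under H0, the test statistic is
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2)} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1) \]
Where,
\[ \mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
is the standard error of difference of two means,
X̄1 = Sample mean of the 1st population
X̄2 = Sample mean of the 2nd population
σ1² = Variance of the 1st population
σ2² = Variance of the 2nd population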
Remarks:
If σ1² = σ2² = σ², i.e., we want to test whether two independent samples have come from
the same population, then under H0: µ1 = µ2 the test statistic is given by
\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0, 1) \]
If the common variance σ² is unknown, then it is estimated by the combined sample variance,
i.e.,
\[ \hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}; \text{ for large samples} \]
Example:
In a certain factory there are two independent processes for manufacturing the same item. The
average weight in a sample of 250 items produced by one process is found to be 120 grams with a
standard deviation of 12 grams, while the corresponding figures for a sample of 400 items from
the other process are 124 grams and 14 grams. Is the difference between the mean weights
significant at the 5% level of significance?
Solution:
We are given,
For Process 1
Number of items (n1) = 250
Average sample weight (X̄1) = 120 grams
Sample Standard deviation (s1) = 12 grams
For Process 2
Number of items (n2) = 400
Average sample weight (X̄2) = 124 grams
Sample Standard deviation (s2) = 14 grams
Setting up Hypotheses:
Null Hypothesis H0: µ1 = µ2, i.e., there is no significant difference between the mean weights
of items manufactured by two independent processes.
Alternative Hypothesis H1: µ1 ≠ µ2, i.e., there is significant difference between the mean
weights of items manufactured by two independent processes. [Two-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{120 - 124}{\sqrt{\dfrac{12^2}{250} + \dfrac{14^2}{400}}} = -3.87 \]
∴ |Z| = 3.87
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for two tailed test is ± 1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of test statistic |Z| = 3.87 is greater than the critical value |Z0.05|
= 1.96, H0 is rejected and H1 is accepted. Hence we conclude that there is significant difference
between the mean weights of items manufactured by two independent processes.
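The two-sample calculation above can be sketched in a few lines. The helper name `two_sample_z` is hypothetical; it assumes large independent samples, with the sample standard deviations standing in for the population values, and uses the standard-library `statistics.NormalDist` for the normal curve.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Z statistic for the difference of two means (large samples)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se


# Factory example: process 1 (120 g, 12 g, n = 250) and process 2 (124 g, 14 g, n = 400).
z = two_sample_z(120, 12, 250, 124, 14, 400)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed, alpha = 5%
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed P-value
print(round(z, 2), round(critical, 2), abs(z) > critical)
```

Here |Z| is about 3.87 against a critical value of about 1.96, in agreement with the decision reached above.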
Example:
The average weekly wage of a sample of 200 workers in industry A was Rs. 1150 with a
standard deviation of Rs. 100. The average weekly wage of a sample of 300 workers in industry
B was Rs. 1000 with a standard deviation of Rs. 50. Can we conclude that the weekly wages paid by
industry A are higher than those paid by industry B?
Solution:
We are given,
For Industry A
Number of workers (n1) = 200
Average weekly wages (X̄1) = Rs. 1150
S.D. of wages (s1) = Rs. 100
For industry B
Number of workers (n2) = 300
Average weekly wages (X̄2) = Rs. 1000
S.D. of wages (s2) = Rs. 50
Setting up hypotheses:
Null Hypothesis H0: µ1 = µ2 , i.e., the weekly wages paid by industry A and industry B are
equal.
Alternative Hypothesis H1: µ1 > µ2 , i.e., the weekly wages paid by industry A is higher than
those paid by industry B. [Right-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1150 - 1000}{\sqrt{\dfrac{100^2}{200} + \dfrac{50^2}{300}}} = 19.64 \]
∴ Z = 19.64
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for right-tailed test is 1.645, i.e., Z0.05 = 1.645.
Decision: Since the calculated value of test statistic Z = 19.64 is greater than the critical value
Z0.05 = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the weekly wages
paid by industry A is higher than those paid by industry B.
Example:
A company claims that its light bulbs are superior to those of a competitor on the basis of a
study which showed that a sample of 40 of its bulbs had an average lifetime of 628 hours of
continuous use with a standard deviation of 27 hours, while a sample of 30 bulbs made by the
competitor had an average lifetime of 619 hours of continuous use with a standard deviation of
25 hours. Check, at 5% level of significance, whether the claim is justified.
Solution:
We are given,
For concerned company
No. of bulbs (n1) = 40
Average lifetime of bulbs (X̄1) = 628 hours
S.D. lifetime of bulbs (s1) = 27 hours
For competitor
No. of bulbs (n2) = 30
Average lifetime of bulbs (X̄2) = 619 hours
S.D. lifetime of bulbs (s2) = 25 hours
Setting up Hypotheses:
Null Hypothesis H0: µ1 = µ2, i.e., the average lifetime of bulbs manufactured by the company
and its competitor are same. In other words, the claim of the company is not justified.
Alternative Hypothesis H1: µ1 > µ2 , i.e., the average lifetime of bulbs manufactured by the
company is superior to those of a competitor. In other words, the claim of the company is justified.
[Right-tailed test]
Level of significance: It is given that the level of significance α = 5%
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{628 - 619}{\sqrt{\dfrac{27^2}{40} + \dfrac{25^2}{30}}} = 1.44 \]
∴ Z = 1.44
Critical value: The critical or tabulated value of the test statistic Z at 5% level of significance
for right-tailed test is 1.645, i.e., Z0.05 = 1.645.
Decision: Since the calculated value of test statistic Z = 1.44 is less than the critical value
Z0.05 = 1.645, H0 is accepted. Hence we conclude that the average lifetimes of bulbs
manufactured by the company and its competitor are the same. In other words, the claim of the
company is not justified.
Often we want to conduct a test of hypothesis about a population proportion. For example, the
mayor of a city claims that only 20% of the people living in the city are below the poverty level,
and we want to test whether or not the mayor's claim is true. As another example, a mail-order
company claims that 90% of all orders it receives are shipped within 72 hours. The company's
management may want to determine from time to time whether or not this claim is true.
This section describes the procedure for performing tests of hypotheses about the population
proportion (P), for large samples.
\[ \mathrm{S.E.}(p) = \sqrt{\frac{N - n}{N - 1} \cdot \frac{PQ}{n}}, \]
then the test statistic in testing the single sample proportion is
\[ Z = \frac{p - P}{\sqrt{\dfrac{N - n}{N - 1} \cdot \dfrac{PQ}{n}}} \sim N(0, 1) \]
2. The confidence limits for estimating the population proportion are given by
C.I. = p ± Zα × S.E.(p)
Example:
A wholesaler in apples claims that only 4% of the apples supplied by him are defective. A
random sample of 600 apples contained only 36 defective apples. Test the claim of the wholesaler.
Make the decision through the critical value approach and the P-value approach.
Solution:
We are given,
P = Proportion of defective apples in the population = 0.04
n = Number of apples taken in the sample = 600
x = Number of defective apples in the sample = 36
p = Proportion of defective apples in the sample = 36/600 = 0.06
Q = 1 – P = 1 – 0.04 = 0.96
Setting up Hypotheses:
Null Hypothesis H0: P = 0.04, i.e., the proportion of defective apples in the population is 4%.
In other words, the claim of the wholesaler is valid.
Alternative Hypothesis H1: P > 0.04, i.e., the proportion of defective apples in the population
is greater than 4%. In other words, the claim of the wholesaler is not valid. [Right-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.06 - 0.04}{\sqrt{\dfrac{0.04 \times 0.96}{600}}} = 2.5 \]
∴ Z = 2.5
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for right-tailed test is 1.645, i.e., Z0.05 = 1.645.
Decision: Since the calculated value of the test statistic Z = 2.5 is greater than the tabulated
value Z0.05 = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the proportion
of defective apples in the population is greater than 4%. In other words, the claim of the
wholesaler is not valid.
Using the P-value approach, for a right-tailed test
P-value = P[Z ≥ Zcal]
= P[Z ≥ 2.50]
= 0.5 – P(0 ≤ Z ≤ 2.50)
= 0.5 – 0.4938
= 0.0062
[Figure: standard normal curve, marked at Z = 0 and Z = 2.50]
Decision: Since the P-value, P0 = 0.0062, is less than α = 0.05, we reject the null
hypothesis (H0) and accept the alternative hypothesis (H1).
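The single-proportion test can be written once and reused for left-, right- and two-tailed cases. The sketch below is an illustration only: the function name `proportion_z_test` and its `tail` argument are assumptions, not notation from the text, and the normal probabilities come from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, P0, tail):
    """Large-sample Z test for one proportion; tail is 'left', 'right' or 'two'."""
    p = x / n                                # sample proportion
    z = (p - P0) / sqrt(P0 * (1 - P0) / n)   # standard error uses P and Q under H0
    cdf = NormalDist().cdf
    if tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value


# Apple example: 36 defective out of 600, claimed P = 0.04, right-tailed test.
z, p_value = proportion_z_test(36, 600, 0.04, "right")
print(round(z, 2), round(p_value, 4))
```

For the apple data this gives Z = 2.5 and a P-value of about 0.0062, matching the table-based calculation.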
Example:
A manufacturer claims that at least 95% of the machine parts supplied by him conform to
specification. An examination of a sample of 200 parts revealed that 50 parts were defective. Is the
claim of the manufacturer rational at 5% level of significance? Make the decision through the
critical value approach and the P-value approach.
Solution:
We are given,
P = Proportion of non-defective parts in the population = 0.95
n = Number of parts taken in the sample = 200
x = Number of non-defective parts in the sample = 200 – 50 = 150
p = Proportion of non-defective parts in the sample = 150/200 = 0.75
Q = 1 – P = 1 – 0.95 = 0.05
Setting up Hypotheses
Null Hypothesis H0: P = 0.95, i.e., the proportion of non-defective parts in the population is
95%. In other words, the claim of the manufacturer is rational.
Alternative Hypothesis H1: P < 0.95, i.e., the proportion of non-defective parts in the
population is less than 95%. In other words, the claim of the manufacturer is not rational.
[Left-tailed test]
Level of significance: It is given that the level of significance α = 5%
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.75 - 0.95}{\sqrt{\dfrac{0.95 \times 0.05}{200}}} = -12.98 \]
∴ |Z| = 12.98
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for left-tailed test is –1.645, i.e., |Z0.05| = 1.645.
Decision: Since the calculated value of the test statistic |Z| = 12.98 is greater than the
tabulated value |Z0.05| = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the
proportion of non-defective parts in the population is less than 95%. In other words, the claim
of the manufacturer is not rational.
Using the P-value approach, for a left-tailed test
P-value = P[Z ≤ Zcal]
= P[Z ≤ –12.98]
= P[Z ≥ 12.98] by symmetry
= 0.5 – P(0 ≤ Z ≤ 12.98)
≈ 0.5 – 0.5 = 0.00
Decision: Since the P-value, P0 ≈ 0, is less than α = 0.05, we reject the null hypothesis
(H0) and accept the alternative hypothesis (H1).
Example:
A die is thrown 6000 times and it turns up 5 or 6 on 2100 occasions. Can we consider the die
to be fair? Make the decision through the critical value approach and the P-value approach.
Solution:
We are given,
n = Sample size = 6000
x = Number of success (turns up 5 or 6) in the sample = 2100
p = Proportion of success in the sample = 2100/6000 = 0.35
P = Proportion of success in the population = 1/6 + 1/6 = 1/3
Q = 1 – P = 1 – 1/3 = 2/3
Setting up Hypotheses:
Null Hypothesis H0: P = 1/3, i.e., the population proportion of success is 1/3. In other words,
the die is fair.
Alternative Hypothesis H1: P ≠ 1/3, i.e., the population proportion of success is not 1/3. In
other words, the die is not fair. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.35 - 1/3}{\sqrt{\dfrac{1/3 \times 2/3}{6000}}} = 2.74 \]
∴ Z = 2.74
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for two-tailed test is ±1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of the test statistic Z = 2.74 is greater than the tabulated
value |Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the population
proportion of success is not 1/3. In other words, the die is not fair.
Using the P-value approach, for a two-tailed test
P-value = P[|Z| ≥ |Zcal|]
= P[|Z| ≥ 2.74]
= P[Z ≤ –2.74] + P[Z ≥ 2.74]
= 2 P[Z ≥ 2.74] by symmetry
= 2 [0.5 – P(0 ≤ Z ≤ 2.74)]
= 2 [0.5 – 0.4969]
= 2 × 0.0031
= 0.0062
[Figure: standard normal curve, marked at Z = –2.74, Z = 0 and Z = 2.74]
Decision: Since the P-value, 2P0 = 0.0062, is less than α = 0.05, we reject the null
hypothesis (H0) and accept the alternative hypothesis (H1).
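For a two-tailed test the P-value is twice the one-tail area beyond |Z|. A short sketch of that doubling for the die example, assuming the standard-library `statistics.NormalDist` (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

n, x = 6000, 2100   # throws, and throws showing 5 or 6
P0 = 1 / 3          # probability of 5 or 6 for a fair die
p = x / n

z = (p - P0) / sqrt(P0 * (1 - P0) / n)
one_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
p_value = 2 * one_tail                   # both tails, by symmetry
print(round(z, 2), round(p_value, 4))
```

Computed without rounding Z, the P-value comes out close to the tabulated 0.0062, and well below α = 0.05.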
Example:
In a random sample of 600 persons from a large population, 150 are females. Can it be said
that males and females are in the ratio 5:3 in the population?
Solution:
We are given,
n = Number of persons in the sample = 600
x = Number of females in the sample = 150
p = Proportion of females in the sample = 150/600 = 0.25
P = Proportion of females in the population = 3/(3 + 5) = 0.375
Q = 1 – P = 1 – 0.375 = 0.625
Setting up Hypotheses:
Null Hypothesis H0: P = 0.375, i.e., the proportion of females in the population is 0.375
Alternative Hypothesis H1: P ≠ 0.375, i.e., the proportion of females in the population is not
0.375. [Two - tailed test]
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.25 - 0.375}{\sqrt{\dfrac{0.375 \times 0.625}{600}}} = -6.32 \]
∴ |Z| = 6.32
Critical Value: The critical or tabulated value of the test Statistic Z at 5% level of
significance for two-tailed test is ±1.96 , i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of the test statistic |Z| = 6.32 is greater than the tabulated
value |Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the proportion
of females in the population is not 0.375, i.e., males and females are not in the ratio 5:3.
Example:
A sample of 600 persons selected at random from a large city shows that 53% are males.
Is there any reason to doubt the hypothesis that males and females are equal in number in the city?
Solution:
We are given,
n = Number of persons in the sample = 600
p = Proportion of males in the sample = 0.53
P = Proportion of males in the population = 0.50
Q = 1 – P = 1 – 0.50 = 0.50
Null Hypothesis H0: P = 0.50, i.e., the proportion of males in the city is 0.50. In other
words, the proportions of males and females in the city are equal.
Alternative Hypothesis H1: P ≠ 0.50, i.e., the proportion of males in the city is not 0.50. In
other words, the proportions of males and females in the city are not equal. [Two - tailed test]
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.53 - 0.50}{\sqrt{\dfrac{0.50 \times 0.50}{600}}} = 1.47 \]
∴ Z = 1.47
Critical Value: The critical or tabulated value of the test statistic Z at 5% level of
significance for two-tailed test is ±1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of the test statistic Z = 1.47 is less than the tabulated
value |Z0.05| = 1.96, H0 is accepted. Hence we conclude that the proportion of males in the city is
0.50. In other words, the proportions of males and females in the city are equal.
Example:
A direct mailing company sells computers and computer parts by mail. The company claims
that at least 90% of all orders are mailed within 72 hours after they are received. The quality
control department at the company often takes samples to check if this claim is valid. A recently
taken sample of 150 orders showed that 129 of them were mailed within 72 hours. Do you think the
company's claim is true? Use a 1% level of significance.
Solution:
We are given,
n = Number of orders in the sample = 150
x = Number of orders mailed within 72 hours = 129
p = Proportion of orders mailed within 72 hours in the sample = 129/150 = 0.86
P = Proportion of all orders mailed within 72 hours in the population = 0.90
Q = 1 – P = 1 – 0.90 = 0.10
Setting up Hypotheses:
Null Hypothesis H0: P = 0.90, i.e., the proportion of all orders that are mailed within 72 hours in
the population is 90%. In other words, the company's claim is true.
Alternative Hypothesis H1: P < 0.90, i.e., the proportion of all orders that are mailed within
72 hours in the population is less than 90%. In other words, the company's claim is not true.
[Left-tailed test]
Level of significance: It is given that the level of significance α = 1%
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.86 - 0.90}{\sqrt{\dfrac{0.90 \times 0.10}{150}}} = -1.63 \]
∴ |Z| = 1.63
Critical Value: The critical or tabulated value of the test statistic Z at 1% level of
significance for left-tailed test is –2.326, i.e., |Z0.01| = 2.326.
Decision: Since the calculated value of the test statistic |Z| = 1.63 is less than the tabulated
value |Z0.01| = 2.326, H0 is accepted. Hence we conclude that the proportion of all orders that
are mailed within 72 hours in the population is 90%, and the company's claim is true.
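The tabulated critical values quoted in these examples can be reproduced from the standard normal distribution. The sketch below assumes the standard-library `statistics.NormalDist`; the variable names are illustrative. It also re-checks the decision for the mail-order example at the 1% level.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Critical values for the tails and levels used in this chapter.
left_5 = std_normal.inv_cdf(0.05)          # left-tailed, alpha = 5%
right_5 = std_normal.inv_cdf(1 - 0.05)     # right-tailed, alpha = 5%
two_5 = std_normal.inv_cdf(1 - 0.05 / 2)   # two-tailed, alpha = 5% (use +/-)
left_1 = std_normal.inv_cdf(0.01)          # left-tailed, alpha = 1%

# Mail-order example: 129 of 150 orders mailed on time, claimed P = 0.90.
z = (129 / 150 - 0.90) / sqrt(0.90 * 0.10 / 150)
reject_h0 = z < left_1  # left-tailed rule: reject only if z falls below the critical value

print(round(left_5, 3), round(right_5, 3), round(two_5, 3), round(left_1, 3))
print(round(z, 2), reject_h0)
```

The four critical values round to –1.645, 1.645, 1.96 and –2.326. Since Z is about –1.63, which is not below –2.326, H0 is not rejected at the 1% level.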
Example:
An auditor claims that 10 per cent of customers' ledger accounts are carrying mistakes of
posting and balancing. A sample of 600 was taken to test the accuracy of posting and balancing and
45 mistakes were found. Are these results consistent with the claim of the auditor?
Solution:
We are given,
P = proportion of customer's ledger accounts carrying mistakes of posting and balancing in the
population = 0.10
n = Number of ledger accounts taken in the sample = 600
x = Number of mistakes of posting and balancing in the sample = 45
p = Proportion of mistakes of posting and balancing in the sample = 45/600 = 0.075
Q = 1 – P = 1 – 0.10 = 0.90
Setting up Hypotheses:
Null Hypothesis H0: P = 0.10, i.e., the proportion of customer's ledger accounts carrying
mistakes of posting and balancing in the population is 10%. In other words, the sample results are
consistent with the claim of the auditor.
Alternative Hypothesis H1: P ≠ 0.10, i.e., the proportion of customer's ledger accounts
carrying mistakes of posting and balancing in the population is not 10%. In other words, the sample
results are not consistent with the claim of the auditor. [Two-tailed test]
Level of significance: Since the level of significance is not given we take α = 0.05.
Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.075 - 0.10}{\sqrt{\dfrac{0.10 \times 0.90}{600}}} = -2.04 \]
∴ |Z| = 2.04
Critical Value: The critical or tabulated value of Z at 5% level of significance for two- tailed
test is ± 1.96, i.e., |Z0.05| = 1.96
Decision: Since the calculated value of the test statistic |Z| = 2.04 is greater than the tabulated
value |Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the proportion of
customer's ledger accounts carrying mistakes of posting and balancing in the population is not 10%.
In other words, the sample results are not consistent with the claim of the auditor.
2.12.2 Test of significance of difference between two proportions
Let us consider two independent populations having population proportions P1 and P2 of
a certain attribute or characteristic. We want to test whether there is a difference between
the proportions of these populations with respect to this attribute. For example, consider two
independent population proportions: the proportion of smokers in Kathmandu city and the
proportion of smokers in Pokhara city. Here we want to test whether the proportions of smokers
in the two cities are significantly different, or whether the proportion of smokers in one city is
higher or lower than in the other city.
Step 1: Setting up hypotheses
Null Hypothesis H0: P1 = P2, i.e., the two independent population proportions are same. In
other words, there is no significant difference between two independent population proportions.
Alternative Hypothesis: Any one of the following Alternative Hypothesis will be set while
solving the problems.
i. H1: P1 ≠ P2 i.e., the two independent population proportions are not same. In other words,
there is significant difference between two independent population proportions.
[Two-tailed test]
ii. H1: P1 > P2, i.e., the population proportion of the first population is greater than the
population proportion of the second population. [Right-tailed test]
iii. H1: P1 < P2, i.e., the population proportion of the first population is less than the
population proportion of the second population. [Left-tailed test]
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used level is
α = 5%, unless otherwise stated.
Step 3: Test Statistic
Under H0: P1 = P2 = P (say), the test statistic is
\[ Z = \frac{(p_1 - p_2) - E(p_1 - p_2)}{\mathrm{S.E.}(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
Where, S.E.(p1 – p2) = Standard error of difference between two proportions
p1 = Sample proportion of the first population
p2 = Sample proportion of the second population
n1 = Sample size taken from first population
n2 = Sample size taken from second population
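In general, the common population proportion P is not known and we use its unbiased estimate
based on the samples, given by
\[ \hat{P} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \]
\[ \therefore Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
Where, Q̂ = 1 – P̂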
Step 4: Critical value
The critical or tabulated value of the test statistic Z at the pre-specified level of significance is
obtained from the area under normal curve.
Step 5: Decision
i. If the calculated value of Z is less than or equal to the tabulated value of Z, then we accept
H0, i.e., the two independent population proportions are the same. In other words, there is no
significant difference between two independent population proportions.
ii. If the calculated value of Z is greater than the tabulated value of Z, then we reject H0, i.e.,
the two independent population proportions are not the same. In other words, there is
significant difference between two independent population proportions.
Example:
At a certain date in a large city 400 out of a random sample of 500 men were found to be
smokers. After the tax on tobacco had been heavily increased, another random sample of 600 men in
the same city included 400 smokers. Was the observed decrease in the proportion of smokers
significant? Test at 5% level of significance.
Solution:
We are given,
Before Tax on Tobacco
n1 = Sample size = 500
x1 = number of male smoker in the sample = 400
p1 = Proportion of smokers in the sample = 400/500 = 0.80
After Tax on Tobacco
n2 = Sample size = 600
x2 = number of male smoker in the sample = 400
p2 = Proportion of smokers in the sample = 400/600 = 0.667
Since the population proportion of male smokers P is not given, it is estimated as
\[ \hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{400 + 400}{500 + 600} = 0.7273 \]
Q̂ = 1 – P̂ = 1 – 0.7273 = 0.2727
Setting up hypotheses:
Null Hypothesis H0: P1 = P2, i.e., the proportion of male smokers in the population before and
after heavy tax on tobacco are same. In other words, there is no significant decrease in the
proportion of male smokers after heavy tax on tobacco.
Alternative Hypothesis H1: P1 > P2, i.e., the proportion of male smokers in the population before
the tax on tobacco is greater than that after the heavy tax on tobacco. In other words, there is
significant decrease in the proportion of male smokers after heavy tax on tobacco. [Right-tailed test]
Level of Significance: It is given that the level of significance α = 5%.
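Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.80 - 0.667}{\sqrt{0.7273 \times 0.2727 \left(\dfrac{1}{500} + \dfrac{1}{600}\right)}} = 4.93 \]
∴ Z = 4.93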
Critical Value: The critical or tabulated value of the test Statistic Z at 5% level of
significance for right -tailed test is 1.645, i.e., Z0.05 = 1.645.
Decision: Since the calculated value of the test statistic Z = 4.93 is greater than the tabulated
value Z0.05 = 1.645, H0 is rejected and H1 is accepted. Hence we conclude that the proportion
of male smokers in the population before the tax on tobacco is greater than that after the heavy
tax on tobacco. In other words, there is significant decrease in the proportion of male smokers
after heavy tax on tobacco.
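The pooled-proportion procedure used in this example can be sketched as a small function. The name `two_proportion_z` is hypothetical; the sketch assumes large independent samples and takes the counts x1, x2 rather than rounded proportions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion P under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled


# Smoking example: 400 of 500 men before the tax, 400 of 600 men after it.
z, pooled = two_proportion_z(400, 500, 400, 600)
critical = NormalDist().inv_cdf(1 - 0.05)  # right-tailed, alpha = 5%
print(round(pooled, 4), round(z, 2), z > critical)
```

Because the sketch does not round p2 to 0.667, it gives Z of about 4.94 rather than 4.93; the decision to reject H0 is unchanged.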
Example:
In a random sample of 1,000 persons from town A, 400 are found to be consumers of wheat.
In a random sample of 800 from town B, 400 are found to be consumers of wheat. Do these data
reveal a significant difference between town A and town B, as far as the proportion of wheat
consumers is concerned?
Solution:
We are given
In town A
n1 = Sample size = 1000
x1 = Number of wheat consumers in the sample = 400
p1 = Proportion of wheat consumers in the sample = 400/1000 = 0.40
In town B
n2 = Sample size = 800
x2 = Number of wheat consumers in the sample = 400
p2 = Proportion of wheat consumers in the sample = 400/800 = 0.50
Since the population proportion of wheat consumers P is not given, it is estimated as
\[ \hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{400 + 400}{1000 + 800} = 0.444 \]
Q̂ = 1 – P̂ = 1 – 0.444 = 0.556
Setting up Hypotheses:
Null Hypothesis H0: P1 = P2, i.e., the proportion of wheat consumers in the population in
town A and town B are same. In other words, there is no significant difference in the proportion of
wheat consumers in town A and town B.
Alternative Hypothesis H1: P1 ≠ P2, i.e., the proportion of wheat consumers in the population
in town A and town B are not same. In other words, there is significant difference in the proportion
of wheat consumers in town A and town B. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α= 0.05.
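Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.40 - 0.50}{\sqrt{0.444 \times 0.556 \left(\dfrac{1}{1000} + \dfrac{1}{800}\right)}} = -4.24 \]
∴ |Z| = 4.24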
Critical Value: The critical or tabulated value of the test Statistic Z at 5% level of
significance for two- tailed test is ±1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of the test statistic |Z| = 4.24 is greater than the tabulated
value |Z0.05| = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the proportion of
wheat consumers in the population in town A and town B are not same. In other words, there is
significant difference in the proportion of wheat consumers in town A and town B.
Example:
A company has the head office at Kathmandu and a branch office at Pokhara. The personnel
director wanted to know if the workers at the two places would like the introduction of a new plan of
work and a survey was conducted for this purpose. Out of sample of 500 workers at Kathmandu,
62% favored the new plan. At Pokhara out of a sample of 400 workers, 41% were against the new
plan. Is there any significant difference between the two groups in their attitude towards the new
plan at 5% level?
Solution:
We are given,
For Kathmandu Branch
n1 = No. of workers in the sample = 500
p1 = Proportion of workers in favor of the new plan in the sample = 0.62
For Pokhara Branch
n2 = No. of workers in the sample = 400
p2 = Proportion of workers in favor of the new plan in the sample = 1 – 0.41 = 0.59
Since the population proportion of workers in favor of the new plan P is not given, it is
estimated as
In town B
n2 = No. of births in the sample = 450
p2 = Proportion of male birth in the sample
Since P̂ = 0.496,
\[ \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.496 \]
or,
\[ \frac{956 \times 0.525 + 450 \times p_2}{1406} = 0.496 \]
or, p2 = 0.434
Q̂ = 1 – P̂ = 1 – 0.496 = 0.504
Setting up hypotheses
Null Hypothesis H0: P1 = P2, i.e., the proportion of male births in the population in town A
and B are same. In other words, there is no significant difference in male births in the two towns.
Alternative Hypothesis H1: P1 ≠ P2, i.e., the proportion of male births in the population in
town A and town B are not same. In other words, there is significant difference in male births in the
two towns. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α = 0.05.
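Test Statistic: Under H0, the test statistic is
\[ Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.525 - 0.434}{\sqrt{0.496 \times 0.504 \left(\dfrac{1}{956} + \dfrac{1}{450}\right)}} = 3.18 \]
∴ Z = 3.18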
Critical Value: The critical or tabulated value of the test Statistic Z at 5% level of
significance for two-tailed test is ± 1.96, i.e., |Z0.05| = 1.96.
Decision: Since the calculated value of the test statistic Z = 3.18 is greater than the tabulated
value | Z0.05 | = 1.96, H0 is rejected and H1 is accepted. Hence we conclude that the proportion of
male births in the population in town A and town B are not same. In other words, there is significant
difference in male births in the two towns.
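In this example the second sample proportion is not given directly but recovered from the pooled estimate. The sketch below repeats that back-calculation; the figures n1 = 956, p1 = 0.525, n2 = 450 and P̂ = 0.496 are read off the equations in the solution, and the variable names are illustrative.

```python
from math import sqrt

n1, p1 = 956, 0.525   # first sample: size and proportion of male births
n2 = 450              # second sample: size
pooled = 0.496        # pooled proportion of male births, as given

# Solve (n1*p1 + n2*p2) / (n1 + n2) = pooled for p2.
p2 = (pooled * (n1 + n2) - n1 * p1) / n2

se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(round(p2, 3), round(z, 2))
```

The recovered p2 rounds to 0.434. With p2 left unrounded, Z comes out at about 3.17 instead of 3.18, still well above 1.96.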
Theoretical Questions
1. What is a hypothesis and what is testing of hypothesis? What is the importance of hypothesis
testing in decision making?
2. Discuss different steps used in testing of hypothesis.
3. Define the following terms: (a) Null Hypothesis and Alternative Hypothesis (b) Type I error
and Type II error (c) Level of significance (d) one tailed and two tailed tests (e) Acceptance
and Rejection region.
4. What do you understand by test of significance? State the general procedure of testing a
hypothesis.
5. Explain the procedure for Z-test.
6. Write the general procedure for testing a single mean in case of large sample.
7. Explain how you test the significance of
i. Single mean ii. difference of two means
iii. Single proportion iv. difference of two proportions
8. Describe briefly the large sample test for testing the significance for difference of two
proportions.
9. Describe the p-value approach of hypothesis testing.
Practical Problems
1. The principal of a school A claimed that the mean IQ of his students is at least 80. An
examination is conducted to 50 randomly selected students from the school and it is found
that the mean IQ is 77 and standard deviation is 3.46. Test whether there is significant
difference between the sample mean IQ and the population mean IQ.
2. A sample of 100 units is found to have mean 99. Test at 5% level of significance; whether the
sample has been drawn from a normal population with mean 100 and standard deviation 8.
3. The mean lifetime of a sample of 400 fluorescent light bulbs produced by a company is found to be
1570 hours with a standard deviation of 150 hours. Test the hypothesis that the mean lifetime of
the bulbs produced by the company is at least 1600 hours at 1% level of significance.
4. A random sample of 40 motor bikes of a certain make gave a mean running capacity of 55 km per
liter of petrol. Can this sample be regarded as a sample from a population with standard deviation
11 having mean (a) 60 km per liter (b) less than 60 km per liter?
5. The mean breaking strength of the cables supplied by a manufacturer is 1800 pounds (lb) with
a standard deviation of 100 lb. By a new technique in the manufacturing process, it is claimed
that the breaking strength of the cables has increased. In order to test this claim a sample of
50 cables is tested and it is found that the mean breaking strength is 1850 lb. Can we support
the claim at a 1% level of significance? Use critical value approach and P-value approach to
make decision.
6. The manufactures of a certain brand of auto batteries claims that the mean life of these
batteries is 45 months. A consumer protection agency wants to check this claim and took a
random sample of 36 such batteries and found that the mean life for this sample is 43.75
months with a standard deviation of 4.5 months. Test the claim of the manufacturer at 5%
level of significance. Use critical value approach and P-value approach to make decision.
7. The mayor of a large city claims that the average net worth of families living in this city is at
least Rs. 300,000. A random sample of 100 families selected from this city produced a mean
net worth of Rs. 288,000 with a standard deviation of Rs. 80,000. Using the 5% level of
significance, can you conclude that the mayor's claim is false? Use the critical value
approach and the P-value approach to make the decision.
8. A sample of 100 households in a certain community has an average income of Rs. 628 per
week with a standard deviation of Rs. 60. Find the standard error of the mean and determine
the 99% confidence limits within which the income of all the people in this community is
expected to lie. Also, test the hypothesis that the average income was Rs. 640 per week.
9. A restaurant company has a policy of opening new restaurants only in those areas that have a
mean household income of at least Rs. 35,000 per month. The company is currently considering
an area in which to open a new restaurant. The company's research department took a sample
of 150 households from the area and found that the mean income of these households is
Rs. 33,400 per month with a standard deviation of Rs. 5,400. Using the 1% level of
significance, would you conclude that the company should not open a restaurant in this area?
10. A consumer advocacy group suspects that a local supermarket's 10-ounce packages of a certain
kind of cheese actually weigh less than 10 ounces. The group took a random sample of 36
such packages, weighed each one, and found the mean weight for the sample to be 9.995
ounces with a standard deviation of 0.15 ounce. Using the 5% significance level, would you
conclude that the mean weight for all such packages is less than 10 ounces? What is your
decision if you test at the 1% level of significance?
11. The steel company manufactures steel bars. If the production process is working properly, it
turns out steel bars with a mean length of at least 2.8 feet with a standard deviation of 0.20 feet.
Longer steel bars can be used or altered, but shorter bars must be scrapped. A sample of 36
bars is selected from the production line and the sample mean was found to be 2.73 feet. Test
whether the production process needs adjustment.
12. Suppose 60 MA economics students are found to have a mean height of 63.60 inches and 50 MBS
students a mean height of 69.51 inches. Would you conclude that management students
are taller than economics students? Assume the standard deviation of height of postgraduate
students to be 2.24 inches.
13. A certain college conducts both morning and evening classes intended to be identical. A
random sample of 200 morning class students yields an average examination score of
72.4 with a standard deviation of 14.8. A random sample of 100 evening class students yields
an average examination score of 73.9 with a standard deviation of 17.9. Is the average score
of morning and evening classes statistically equal at the 5% level of significance? Use the critical
value approach and the P-value approach to make the decision.
14. A random sample of 100 electric light tubes of manufacturer A has a mean lifetime of
1400 hours with a standard deviation of 200 hours, while a random sample of 150 tubes of
manufacturer B has a mean lifetime of 1200 hours with a standard deviation of 100 hours.
Can we conclude that the mean lifetimes of the tubes of the two manufacturers are significantly different?
15. A firm believes that the tyres produced by process A on average last longer than tyres
produced by process B. To test this belief, random samples of tyres produced by the two
processes were tested and the results are:
Process   Sample size   Average lifetime (in km)   S.D. (in km)
A         50            22,400                     1000
B         50            21,800                     1000
Is there evidence at the 5 per cent level of significance that the firm is correct in its belief?
16. A person buys 100 electric tubes of each of two well-known makes, taken at random from stock
for testing purposes. He finds that 'make A' has a mean life of 1,300 hours with a standard
deviation of 82 hours, and 'make B' has a mean life of 1,248 hours with a standard deviation of 93
hours. Discuss the significance of these results. Which make of electric tubes should the person
buy? Use the critical value approach and the P-value approach to make the decision.
17. Samples of two types of electric bulbs were tested for length of life and the following results
were obtained.
Type I:   n1 = 50,  $\bar{X}_1$ = 1234 hrs,  s1 = 36 hrs
Type II:  n2 = 50,  $\bar{X}_2$ = 1036 hrs,  s2 = 40 hrs
Is the difference in the means sufficient to warrant that Type I is superior to Type II regarding
the length of life?
18. The mean height of 50 students who showed above average participation in athletics was 68.2
inches with a standard deviation of 2.5 inches, while 50 students who showed no interest in
such participation had a mean height of 67.5 inches with a standard deviation of 2.8 inches.
Test the hypothesis that students who participate in athletics are taller than other students.
19. Two random samples of electric bulbs manufactured by companies X and Y gave the following
data:
Sample   Size   Average (hours)   S.D.
X        100    950               50
Y        150    1000              40
i. Test whether there is a significant difference in the mean life of the two makes.
ii. Test whether the mean of sample Y exceeds that of sample X or not. Use the critical value
approach and the P-value approach to make the decision.
20. A sample survey of yield of wheat conducted on irrigated and non-irrigated plots showed the
following data:
                                   Irrigated plots   Non-irrigated plots
Sample size                        70                75
Mean yield (quintals per hectare)  30                20
Standard deviation                 10                15
i. Test whether the difference in the average yield of wheat from the plots of the two
categories is significant or not.
ii. Test whether the average yield of wheat from irrigated plots is significantly more than
that from non-irrigated plots.
21. Two random samples of Nepalese people taken from rural and urban regions gave the
following data on their income:
Sample                  Size   Average monthly income   S.D.
I from rural region     150    800                      50
II from urban region    100    1250                     30
i. Test whether there is a significant difference between the two sample means of monthly
income.
ii. Test whether the average monthly income of rural people is significantly less than that of
the urban people.
22. An electronics company has 100 workers of two categories A and B to assemble a
certain kind of colour television. The average output per day and its variability, as shown
by the company, are as follows:
Category of workers   Size   Mean   S.D.
A                     40     50     5
B                     60     55     9
Test whether
i. The difference in the mean output per day by the workers of categories A and B is
significant.
ii. The average output given by the workers of category A is more than that by the workers
of category B.
23. Two chemical solutions X and Y were tested for their pH, the degree of acidity, and gave the
following data on their pH values.
Chemical solution   No. of observations   Average pH value   S.D.
X                   45                    6.40               1.27
Y                   35                    7.25               1.16
Test whether
i. The two types of solutions have different mean pH values or not.
ii. The average pH value of solution X is significantly less than that of solution Y or not.
24. In a sample of 400 parts manufactured by a factory, the number of defective parts was found
to be 30. The company claimed that only 5% of its product is defective. Is the claim
reasonable?
25. A manufacturer claimed that at least 95% of the equipment which she supplied to a factory
conformed to the specifications. An examination of a sample of 200 pieces of equipment
revealed that 18 were faulty. Test her claim at the 5% level of significance.
26. The controller of examinations of FW university claimed that at least 60% of the students have
passed in the university. A random sample of 200 students is examined and
it is found that 140 students passed the examination. Test whether the controller's
claim is valid or not at the 5% level of significance. Use the critical value approach and the P-value
approach to make the decision.
27. The Coca-Cola company is interested in entering the fruit drink market. Before bringing its
new product to the market the company wishes to be sure that it will capture more than 20%
of the fruit drink market. A survey of 1000 people shows that 210 respondents prefer its new
product to other fruit drinks. Is there enough evidence to allow Coca-Cola to proceed with the
new product?
28. Suppose that a production manager implements a newly developed sealing system for boxes. He
takes a random sample of 299 boxes from the daily output and finds that 12 need rework. Test
the hypothesis that the new sealing system has decreased defective packages to below 10 per
cent. (Use the 1 per cent level of significance.)
29. A mail order company claims that at least 60% of all orders are mailed within 48 hours. From
time to time the quality control department at the company checks if this promise is fulfilled.
Recently the quality control department at this company took a sample of 400 orders and
found that 208 of them were mailed within 48 hours of the placement of the orders. At the 1%
level of significance, can you conclude that the company's claim is true?
30. An airline claims that only 15% of its flights arrive more than 10 minutes late. Suppose we
take a random sample of 50 flights by the airline and find that 10 flights arrive late. Test the
claim of the airline at the 1% level of significance. Use the critical value approach and the P-value
approach to make the decision.
31. In a sample of 625 persons selected at random from a city, 300 were males. Test the
hypothesis that males and females are in equal numbers in the city at the 5% level of significance.
32. It is claimed that both tea and coffee are equally popular in Illam district. In a random sample
of 1200 persons, 650 were regular consumers of tea. Is the claim justified at the 5% level of
significance? Use the critical value approach and the P-value approach to make the decision.
33. In a metropolitan city, it was observed that 500 out of 1500 men are against the "Green
Sticker Control Policy" in vehicles. Based on this information can you conclude that the
majority of the people in the city are in favor of the policy, assuming that people in favor and
disfavor are equal?
34. A coin is tossed 900 times and heads appear 490 times. Does this support the hypothesis that the
coin is unbiased? Use the critical value approach and the P-value approach to make the decision.
35. A die is thrown 300 times and of these 135 yielded prime numbers. Is this consistent with the
hypothesis that the die is unbiased?
36. In random samples of 600 and 1000 men from two cities, 400 and 600 men are found to be
literate. Do the data indicate at the 1% level of significance that the populations are significantly
different in the percentage of literacy?
37. In a sample of 300 units of a manufactured product, 65 units were found to be defective and in
another sample of 200 units there were 35 defectives. Is there a significant difference in the
proportion of defectives in the samples at the 5% level of significance?
38. In a random sample of 900 men taken from Kathmandu, 450 are found to be smokers. In
another random sample of 600 men taken from Biratnagar, 400 are smokers. Do the data
indicate that the percentage of smokers in Kathmandu is less than that of Biratnagar?
39. A candidate for election made a speech in city A but not in city B. A sample of 500 voters
from city A showed that 60% of the voters were in favor of him, whereas a sample of 300
voters from city B showed that 48% of the voters favored him. Has the speech produced any effect
on voters in city A? Use the critical value approach and the P-value approach to make the decision.
40. A company is considering two different television advertisements for promotion of a new
product. Management believes that advertisement A is less effective than advertisement B.
Two test market areas with virtually identical consumer characteristics are selected:
advertisement A is used in one area and advertisement B in the other area. In a random
sample of 100 customers who saw advertisement A, 22 had tried the product. In a random
sample of 60 customers who saw advertisement B, 18 had tried the product. Does this indicate
that advertisement A is less effective than advertisement B, if a 5 per cent level of
significance is used?
41. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it
puts out 3 imperfect articles in a batch of 100. Has the machine improved?
42. Before an increase in excise duty on coffee, 800 persons in a sample of
1000 persons were found to be coffee drinkers. After an increase in duty, 800 persons out of
1200 people were found to be coffee drinkers. Do you think that there has been a significant
decrease in the consumption of coffee after the increase in excise duty?
43. In a year there are 956 births in town A, of which 52.5% were males, while in towns A and B
combined, this proportion in a total of 1,406 births was 0.496. Is there any significant
difference in the proportion of male births in the two towns?
1. |Z| = 6.13 reject H0 2. |Z| = 1.25, accept H0
3. |Z| = 4.0, reject H0
4. (a) |Z| = 2.87 , reject H0 (b) Z = – 2.87, reject H0
5. |Z| = 3.54 , reject H0, p-value < α, reject H0
6. |Z| = 1.67 , reject H0,p-value < α, reject H0
7. |Z| = 1.5 , accept H0, p-value > α, accept H0
8. |Z| = 2.0, accept H0, limits = (612.54, 643.46), S.E.($\bar{X}$) = 6
9. |Z| = 3.63, reject H0 10. |Z| = 0.2, accept H0
11. |Z| = 2.1 , reject H0 12. |Z| = 13.75 , reject H0
13. |Z| = 0.72 , accept H0 14. Z = 9.26 , reject H0
15. Z = 3 , reject H0 16. Z = 4.19 , reject H0, p-value < α, reject H0
17. Z = 26.02 , reject H0 18. Z = 1.32 , accept H0
19. (i) |Z| = 8.37, reject H0 (ii) Z = 8.37, reject H0
20. (i) Z = 4.75, reject H0 (ii) Z = 4.75, reject H0
21. (i) |Z| = 88.82, reject H0 (ii) Z = – 88.82, reject H0
22. (i) |Z| = 3.55, reject H0 (ii) Z = 3.55, reject H0
23. (i) |Z| = 3.14, reject H0 (ii) Z = – 3.14, reject H0
24. |Z| = 2.29 reject H0 25. |Z| = 2.59, reject H0
26. |Z| = 2.89, accept H0 p-value > α, accept H0
27. |Z| = 0.79, accept H0
28. |Z| = 3.54, reject H0 29. |Z| = 3.27, reject H0
30. |Z| = 0.99, accept H0, p-value > α, accept H0
31. |Z| = 1, accept H0
32. |Z| = 2.77, reject H0, p-value < α, reject H0 33. |Z| = 12.91, reject H0
34. |Z| = 2.67, reject H0, p-value < α, reject H0 35. |Z| = 1.73, accept H0
36. |Z| = 2.67, reject H0 37. Z = 1.14, accept H0
38. Z = 6.38, reject H0 39. Z = 3.31, reject H0, p-value < α, reject H0
40. Z = 1.13, accept H0 41. Z = 0.104, accept H0
42. |Z| = 6.92, reject H0 43. |Z| = 3.18, reject H0
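The Z values in the answers to the large-sample problems can be reproduced with a few lines of code. The following is a minimal sketch in Python (an assumption; the text itself prescribes no programming language), and the function names `z_single_mean` and `z_two_means` are illustrative only. It recomputes the statistics for Problem 2 (single mean) and Problem 15 (difference of two means) from the figures given in those problems.

```python
import math

def z_single_mean(sample_mean, mu0, sd, n):
    """Z statistic for a single mean: (sample mean - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

def z_two_means(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for the difference of two means from independent large samples."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Problem 2: n = 100, sample mean 99, hypothesized mean 100, S.D. 8
z2 = z_single_mean(99, 100, 8, 100)

# Problem 15: process A (50 tyres, 22,400 km, S.D. 1000) vs. process B (50 tyres, 21,800 km, S.D. 1000)
z15 = z_two_means(22400, 21800, 1000, 1000, 50, 50)

print(round(abs(z2), 2), round(z15, 2))
```

The decision then follows by comparing the computed value with the tabulated critical value for the chosen level of significance and type of test.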
21. A two-tailed test of a difference between two proportions led to Z = 1.96 for its
standardized difference of sample proportions. For which of the following significance
levels would you reject H0?
a. α = 0.10 b. α = 0.01 c. α = 0.05 d. α = 0.02
22. Which of the following is called a type II error?
a. The hypothesis is true, but our test rejects it.
b. The hypothesis is true, but our test accepts it.
c. The hypothesis is false, but our test accepts it.
d. The hypothesis is false, but our test rejects it.
23. For a two-tailed test of hypothesis at α = 0.1, the acceptance region is the entire region
a. to the right of the negative critical value.
b. between the two critical values.
c. outside the two critical values.
d. to the left of the positive critical value.
24. An assertion or conjecture made about the distribution of one or more variables on one
or more population is called
a. a research hypothesis b. a statistical hypothesis
c. composite hypothesis d. null hypothesis
25. How can the power of a test be increased?
a. by increasing the sample size. b. by decreasing the sample size.
c. With a fixed sample size. d. By decreasing the level of significance.
26. Which is the power of a test?
a. 1 – β b. β c. α d. 1 – α
27. When the null hypothesis is H0: µ = 9, the alternative hypothesis can be
a. H1: µ = 9 b. H1: µ ≠ 9 c. H1: µ < 9 d. All of these.
28. When α = 0.05 and β = 0.10 in a test of hypothesis, the power of the test is
a. 0.05 b. 0.90 c. 0.85 d. 0.95
29. The large sample test for testing /0 = / for normal population is
a. Z-test b. t-test c. F-test d. None of these
30. For the test of hypothesis H0: µ ≤ µ0 and H1: µ > µ0, the critical region at α = 0.01 and n >
30 is
a. Z ≤ 1.96 b. Z > 1.96 c. Z ≤ 1.645 d. Z > 1.645
31. When the null hypothesis is H0: µ = µ0, the alternative hypothesis can be
a. H1: µ ≥ µ0 b. H1: µ < µ0 c. H1: µ ≠ µ0 d. H1: µ = µ0
32. To test H0: µ = µ0 vs. H1: µ > µ0 when the population S.D. is known, the appropriate
test is:
a. t-test b. Z-test c. F-test d. None of the above
33. Test of hypothesis H0: µ = 70 vs. H1: µ > 70 leads to:
a. One-sided left-tailed test b. One-sided right-tailed test
c. Two-tailed test d. None of the above
34. Testing H0: µ = 1500 against H1: µ < 1500 leads to:
a. One-sided lower tailed test b. One-sided upper tailed test
c. Two-tailed test d. All the above
35. Testing H0: µ = 100 vs. H1: µ ≠ 100 leads to:
a. One-sided upper tailed test b. One-sided lower tailed test
c. Two-tailed test d. None of the above
36. To test H0: P= 0.4 vs. H1: P ≠ 0.4 in binomial population, there are eight persons out of
fifteen who favored a proposal. The value of statistic-Z is:
a. 5.813 b. 1.08 c. 7.32 d. None of the above
37. The value of Zα at α = 0.05 is
a. 1.96 b. 2.575 c. 1.64 d. 2.33
38. We can use the normal distribution to represent the sampling distribution of the
population when the sample size is:
a. more than 30. b. less than 30 c. more than 15 d. less than 15
39. Suppose you have observed proportions for three geographic regions. You wish to test
whether the regions have significantly different proportions. Assuming p1, p2 and p3 are
the true proportions, which of the following would be your null hypothesis?
a. H0: p1 = p2 = p3 b. H0: p1 ≠ p2 ≠ p3
c. H0: p1, p2, p3 are not equal d. None of these.
1. b 2. d 3. d 4. c 5. a 6. b 7. b 8. c 9. a 10. b
11. b 12. a 13. c 14. d 15. c 16. d 17. a 18. b 19. c 20. d
21. c 22. c 23. b 24. b 25. a 26. a 27. d 28. b 29. a 30. c
31. b 32. b 33. b 34. a 35. c 36. d 37. c 38. a 39. a
Unit Test of Significance for Small Samples
3.1 Test of Significance
The test of hypothesis can be viewed as a test of significance. When we test the null
hypothesis H0: θ = θ0, we are testing that there is no difference between the
parametric value θ and the hypothesized value θ0. Since the hypothesis H0 is tested on the basis of a
random sample drawn from a population, we compute a statistic t which is an unbiased point
estimator of θ. The test of H0: θ = θ0 is then equivalent to the test of the hypothesis that there is no
significant difference between the statistic t and the parametric value θ. Therefore, the problem of test
of hypothesis can be solved as a problem of test of significance.
3.2.2 Degrees of freedom (d.f.)
The degrees of freedom is defined as the total number of observations less the number of
independent constraints imposed on the observations. It is denoted by ν (the letter 'nu' of the Greek
alphabet). For example, if k is the number of independent constraints in a set of data of n
observations, then the degrees of freedom is ν = n – k.
More specifically, the degrees of freedom is the number of observations that can be
chosen freely. For example, suppose we know that the mean of four values is 20. Consequently,
the sum of these four values is 80. Now, how many values out of four can we choose freely so that
the sum of these four values is 80? The answer is that we can freely choose 4 – 1 = 3 values.
Suppose we choose 27, 8, and 19 as the three values. Given these three values and the information
that the mean of four values is 20, the fourth value is 80 – 27 – 8 – 19 = 26. Thus, once we have
chosen three values, the fourth value is automatically determined. Consequently, the number of
degrees of freedom for this example is df = ν = n – 1 = 4 – 1 = 3.
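The idea that one value is forced once the mean is fixed can be illustrated with a short sketch. The Python snippet below is only an illustration under that single-constraint assumption; the function name `last_value` is hypothetical. It reproduces the example above, where the mean of four values is 20 and three values are chosen freely.

```python
def last_value(mean, n, chosen):
    """Return the one value that is forced by a fixed mean once n - 1 values are chosen."""
    if len(chosen) != n - 1:
        raise ValueError("exactly n - 1 values must be chosen freely")
    # The total is fixed at mean * n, so the remaining value is whatever is left over.
    return mean * n - sum(chosen)

fourth = last_value(mean=20, n=4, chosen=[27, 8, 19])
degrees_of_freedom = 4 - 1  # n observations, one constraint (the fixed mean)
print(fourth, degrees_of_freedom)
```

Choosing a different set of three values changes the fourth value but not the count of freely chosen values, which stays at n – 1.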
3.2.3 Small Sample Test (t-test)
Often the size of a sample that is used to test a hypothesis about the mean µ is small,
that is, n < 30. This may be because we have limited resources and cannot afford to take a
large sample, or because of the nature of the experiment itself. If a small sample is sufficient to get
information about the population parameter, it is not necessary to take a large sample to make a
decision about the population parameter; a small sample saves time, resources and money. If the
population is normal, the population standard deviation (σ) is not known, and the sample size is small,
that is, n < 30, then the normal distribution is replaced by the t distribution to make a test of
hypothesis about the population parameter.
3.2.4 Assumptions for t-test
The t test is used under the following assumptions:
1. The sample size is less than 30, i.e., n < 30
2. The population standard deviation σ is unknown.
3. The parent population from which the sample is drawn is normal.
3.2.5 Applications of t-test
The t-test has a wide number of applications in management. Some of them are
1. Test of significance of single mean.
2. Test of significance of difference between two sample means.
3. Paired test for difference of means.
4. Test of significance of an observed sample correlation coefficient.
3.2.6 Test of significance of single mean
Let us consider a normal population with population mean µ and standard deviation σ which
is not known. We want to test whether the population mean is significantly different from a specified value
of the mean. For example, it is claimed that students spend, on average, Rs. 2000 to buy the
books they need. Here we want to test whether in reality they spend Rs. 2000 or not. If a single value
of a population mean is to be tested, we carry out the test of significance of a single mean.
The procedure for testing the significance of a single mean is as follows:
Step 1: Setting up Hypotheses
Null hypothesis H0: µ = µ0, i.e., the population mean has the specified value µ0. In other words,
there is no significant difference between the sample mean and the population mean, or, the sample has
been drawn from a normal population with mean µ0.
Alternative hypothesis: Any one of the following alternative hypotheses will be set while
solving the problems.
i. H1: µ ≠ µ0, i.e., the population mean does not have the specified value µ0. In other words, there is a significant
difference between the sample mean and the population mean, or the sample has not been drawn
from a normal population with mean µ0. [Two-tailed test]
ii. H1: µ > µ0, i.e., the population mean is greater than the specified value µ0. [Right-tailed test]
iii. H1: µ < µ0, i.e., the population mean is less than the specified value µ0. [Left-tailed test]
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used level is
α = 5% unless otherwise stated.
Step 3: Test Statistic
Under H0, the test statistic is
$$t = \frac{\bar{X} - \mu}{\mathrm{S.E.}(\bar{X})} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}$$
where, S.E.($\bar{X}$) = standard error of the mean = $S/\sqrt{n}$
$\bar{X}$ = Sample mean
µ = Population mean
n = Sample size
S = Unbiased estimate of population standard deviation
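Calculation of S, the unbiased estimate of the population standard deviation
i. Actual mean method
$$S = \sqrt{\frac{1}{n-1}\sum (X - \bar{X})^2}, \quad \text{where } \bar{X} = \frac{\sum X}{n}$$
ii. Direct method
$$S = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]}$$
iii. Short-cut method
$$S = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$$
where d = X – A and A = assumed mean.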
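The three methods are algebraically equivalent and should give the same value of S. The following Python sketch (function names are illustrative assumptions, not part of the text) checks this on the six bulb lifetimes used in a later example of this unit, 24, 26, 30, 20, 20 and 18 months, with assumed mean A = 20.

```python
import math

def s_actual_mean(xs):
    """S from deviations about the actual mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def s_direct(xs):
    """S from the sum of squares and the sum of the observations."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))

def s_short_cut(xs, assumed_mean):
    """S from deviations d = X - A about an assumed mean A."""
    n = len(xs)
    d = [x - assumed_mean for x in xs]
    return math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))

data = [24, 26, 30, 20, 20, 18]
results = (s_actual_mean(data), s_direct(data), s_short_cut(data, 20))
print([round(r, 2) for r in results])
```

The short-cut method is mainly a convenience for hand calculation, since the deviations from a well-chosen A are small numbers.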
Step 4: Degree of freedom (d.f.). The degree of freedom is df = n – 1.
Step 5: Critical value
The critical or tabulated value of the test statistic t at the pre-specified level of significance for
(n–1) degree of freedom according to the tails of a test is obtained from the t - table.
Step 6: Decision
i. If the calculated value of | t | is less than or equal to the tabulated value of t, then we accept H0, i.e.,
the population mean has the specified value µ0. In other words, there is no significant difference
between the sample mean and the population mean, or the sample has been drawn from a normal
population with mean µ0.
ii. If the calculated value of | t | is greater than the tabulated value of t, then we reject H0, i.e., the
population mean does not have the specified value µ0. In other words, there is a significant difference
between the sample mean and the population mean, or the sample has not been drawn from a normal
population with mean µ0.
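The six steps can be sketched in code. The Python sketch below assumes that the critical value is still read from a t-table and passed in as `t_tabulated` (a positive number); the function names `t_single_mean` and `decide` are illustrative, not part of any library. The figures used (n = 6, sample mean 23, S = 4.52, µ0 = 25, tabulated value 3.365 at the 1% level for 5 d.f., left-tailed) are those of the electric-bulb example in this unit.

```python
import math

def t_single_mean(sample_mean, mu0, S, n):
    """Step 3: t = (sample mean - mu0) / (S / sqrt(n)), where S uses the divisor n - 1."""
    return (sample_mean - mu0) / (S / math.sqrt(n))

def decide(t_calculated, t_tabulated, tail="two"):
    """Step 6: compare the calculated t with the positive tabulated value."""
    if tail == "two":
        reject = abs(t_calculated) > t_tabulated
    elif tail == "right":
        reject = t_calculated > t_tabulated
    elif tail == "left":
        reject = t_calculated < -t_tabulated
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return "reject H0" if reject else "accept H0"

n = 6
t_cal = t_single_mean(sample_mean=23, mu0=25, S=4.52, n=n)
df = n - 1                     # Step 4: degrees of freedom
decision = decide(t_cal, 3.365, tail="left")
print(round(t_cal, 2), df, decision)
```

Keeping the table lookup outside the function mirrors the hand procedure: only the comparison in Step 6 depends on the type of tail.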
Remarks:
i. Confidence (or fiducial) limits for estimating the population mean µ. The 100(1 – α)% confidence limits
for the population mean µ are given by
$$\text{C.I. for } \mu = \bar{X} \pm t_{\alpha,\,(n-1)} \times \frac{S}{\sqrt{n}}$$
ii. The sample variance $s^2$ is given by
$$s^2 = \frac{1}{n}\sum (X - \bar{X})^2 \;\Rightarrow\; n s^2 = \sum (X - \bar{X})^2$$
The unbiased estimate of the population variance is given by
$$S^2 = \frac{1}{n-1}\sum (X - \bar{X})^2$$
or, $(n-1)\,S^2 = \sum (X - \bar{X})^2$
or, $(n-1)\,S^2 = n\,s^2$
or, $\dfrac{S^2}{n} = \dfrac{s^2}{n-1}$
or, $\dfrac{S}{\sqrt{n}} = \dfrac{s}{\sqrt{n-1}}$
In numerical problems, we are sometimes given the sample standard deviation s. In that case, for
testing the significance of a single mean we use the following test statistic:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}}$$
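The identity S/√n = s/√(n – 1) can be checked numerically. The short Python sketch below is illustrative only (the helper name `unbiased_sd` is an assumption); it uses n = 20 and s = 6, the figures of the example that follows.

```python
import math

def unbiased_sd(s, n):
    """Convert s (divisor n) to S (divisor n - 1) using (n - 1) S^2 = n s^2."""
    return s * math.sqrt(n / (n - 1))

n, s = 20, 6
S = unbiased_sd(s, n)

se_from_S = S / math.sqrt(n)       # standard error written with S
se_from_s = s / math.sqrt(n - 1)   # standard error written with s

print(round(S, 3), round(se_from_S, 4), round(se_from_s, 4))
```

Both expressions give the same standard error, so either form of the test statistic may be used, depending on which standard deviation is supplied.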
Example:
A random sample of size 20 from a normal population gives a sample mean of 42 and sample
standard deviation of 6. Test the hypothesis that the population mean is 44.
Solution:
We are given,
Sample size (n) = 20,
Sample mean ($\bar{X}$) = 42,
Sample standard deviation (s) = 6
Population mean (µ) = 44
Setting up hypotheses:
Null hypothesis H0:µ = 44, i.e., the population mean is 44.
Alternative hypothesis H1:µ ≠ 44, i.e., the population mean is not 44. [Two tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%.
Test Statistic: Under H0, the test statistic is,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}} = \frac{42 - 44}{6/\sqrt{20-1}} = -1.45$$
∴ | t | = 1.45
Degree of freedom: df = n – 1 = 20 – 1 = 19
Critical Value: The tabulated value of the test statistic t at 5% level of significance for 19
degree of freedom and in two tailed test is ± 2.093, i.e., | t0.05,19| = 2.093.
Decision: Since the calculated value of |t| = 1.45 is less than the tabulated value of
| t0.05,19| = 2.093, H0 is accepted. Hence we conclude that the population mean is 44.
Example:
An automobile tyre manufacturer claims that the average life of a particular grade of tyre is more
than 20,000 km when used under normal conditions. A random sample of 16 tyres was tested and the
mean and standard deviation were found to be 22,000 km and 5000 km respectively. Assuming the life
of the tyres in km to be normally distributed, decide whether the manufacturer's claim is valid. Use the
critical value and P-value approaches for the decision.
Solution:
In the usual notation, we are given that:
n = 16, $\bar{X}$ = 22,000 km, s = 5,000 km, µ = 20,000 km.
Setting up hypothesis:
Null hypothesis H0: µ = 20,000 km, i.e., the average life of tyres is 20,000 km. In other words
the company's claim is not valid.
Alternative hypothesis H1: µ > 20,000 km, i.e., the average life of tyres is more than 20,000 km. In
other words the company's claim is valid. [Right tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%.
Test Statistic: Under H0, the test statistic is,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}} = \frac{22000 - 20000}{5000/\sqrt{16-1}} = 1.55$$
∴ t = 1.55
Degree of freedom: df = n – 1 = 16 – 1 = 15
Critical value: The tabulated value of the test statistic t at 5% level of significance for 15
degree of freedom and in right tailed test is 1.753, i.e., t0.05,15 = 1.753.
Decision: Since the calculated value of t = 1.55 is less than the tabulated value of t0.05,15 =
1.753, H0 is accepted. Hence we conclude that the average life of tyres is 20,000 km. In other words
the company's claim is not valid.
Using the P-value approach for a one-tailed test,
P-value = P [T ≥ Tcal] = P [T ≥ 1.55] = 0.070
Decision: Since P-value = 0.070 > α = 0.05, we accept the null hypothesis (H0).
[Figure: t-distribution curve centred at t = 0, with the calculated value t = 1.55 marked.]
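The P-value quoted above is normally read from tables or software. As a rough check, the following Python sketch approximates P[T ≥ 1.55] for 15 degrees of freedom by integrating the density of the t distribution numerically with Simpson's rule, using only the standard library. The function names and the number of integration steps are illustrative assumptions, and the sketch handles only non-negative t values.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_p(t_value, df, steps=2000):
    """P[T >= t_value] for t_value >= 0: 0.5 minus the Simpson's-rule area over [0, t_value]."""
    if t_value < 0:
        raise ValueError("this sketch only handles t_value >= 0")
    h = t_value / steps            # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = right_tail_p(1.55, 15)
alpha = 0.05
print(round(p_value, 3), "accept H0" if p_value > alpha else "reject H0")
```

The result is close to 0.07 and exceeds α = 0.05, which agrees with the decision reached by the critical value approach.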
Example:
Ten cartons are taken at random from an automatic filling machine. The mean net weight of the
ten cartons is 15.5 ounces and the standard deviation is 0.88 ounces. Can we conclude that there is a
significant difference in the sample mean from the intended weight of 16 ounces? Also obtain 95%
and 99% confidence limits for the population mean.
Solution:
In the usual notation, we are given that:
n = 10, $\bar{X}$ = 15.5 ounces, s = 0.88 ounces, µ = 16 ounces.
Setting up hypotheses:
Null hypothesis H0:µ = 16 Ounces, i.e., there is no significant difference between sample
mean and the intended mean.
Alternative hypothesis H1:µ ≠ 16 Ounces, i.e., there is significant difference between sample
mean and the intended mean.[Two tailed test]
Level of significance: We take the level of significance as α = 5%.
Test Statistic: Under H0, the test statistic is,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}} = \frac{15.5 - 16}{0.88/\sqrt{10-1}} = -1.70$$
∴ | t | = 1.70
Degree of freedom: df = n – 1 = 10 – 1 = 9
Critical value: The tabulated value of the test statistic t at 5% level of significance for 9
degree of freedom and in two tailed test is ±2.262, i.e., | t0.05,9 | = 2.262.
Decision: Since the calculated value of |t| = 1.70 is less than the tabulated value of
| t0.05,9 | = 2.262, H0 is accepted. Hence we conclude that there is no significant difference between
sample mean and intended mean.
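For 95% confidence limits:
1 – α = 0.95 ⇒ α = 0.05, df = n – 1 = 10 – 1 = 9, t0.05, 9 = 2.262
Thus, the 95% confidence limits for the population mean µ are
$$\text{C.I. for } \mu = \bar{X} \pm t_{\alpha,\,n-1} \times \mathrm{S.E.}(\bar{X}) = \bar{X} \pm t_{0.05,\,9} \times \frac{s}{\sqrt{n-1}} = 15.5 \pm 2.262 \times \frac{0.88}{\sqrt{10-1}} = 15.5 \pm 0.66$$
∴ Lower limit = 15.5 – 0.66 = 14.84
Upper limit = 15.5 + 0.66 = 16.16
For 99% confidence limits:
1 – α = 0.99 ⇒ α = 0.01, df = n – 1 = 10 – 1 = 9, t0.01, 9 = 3.25
Thus, the 99% confidence limits for the population mean µ are
$$\text{C.I. for } \mu = \bar{X} \pm t_{0.01,\,9} \times \frac{s}{\sqrt{n-1}} = 15.5 \pm 3.25 \times \frac{0.88}{\sqrt{10-1}} = 15.5 \pm 0.95$$
∴ Lower limit = 15.5 – 0.95 = 14.55
Upper limit = 15.5 + 0.95 = 16.45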
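The confidence limits for this example can be reproduced with a small helper. The Python sketch below is a minimal illustration (the function name `confidence_limits` is an assumption); it takes the tabulated two-tailed t values for 9 d.f., 2.262 at the 5% level and 3.25 at the 1% level, as inputs and uses the form of the standard error based on s, namely s/√(n – 1).

```python
import math

def confidence_limits(sample_mean, s, n, t_tabulated):
    """Return (lower, upper) limits: sample mean -/+ t * s / sqrt(n - 1)."""
    margin = t_tabulated * s / math.sqrt(n - 1)
    return sample_mean - margin, sample_mean + margin

# Carton example: n = 10, sample mean 15.5 ounces, s = 0.88 ounces
lower95, upper95 = confidence_limits(15.5, 0.88, 10, 2.262)
lower99, upper99 = confidence_limits(15.5, 0.88, 10, 3.25)

print(round(lower95, 2), round(upper95, 2))
print(round(lower99, 2), round(upper99, 2))
```

The 99% interval is wider than the 95% interval because a larger tabulated t value is needed for the higher level of confidence; both intervals contain the intended weight of 16 ounces, in line with the acceptance of H0.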
Example:
The mean weekly sales of soap bars in departmental stores were 146 per store. After an
advertising campaign the mean weekly sales in 22 stores for a typical week increased to 153 per
store with a standard deviation of 17.2. Was the advertising campaign successful?
Solution:
In the usual notation, we are given that:
µ = 146, n = 22, $\bar{X}$ = 153, s = 17.2
Setting up hypotheses:
Null hypothesis H0: µ = 146, i.e., the mean weekly sales of soap bars in a departmental store
is 146 per store. In other words, the campaign was not successful.
Alternative hypothesis H1: µ > 146, i.e., the mean weekly sales of soap bars in a departmental
store is greater than 146 per store. In other words, the campaign was successful. [Right tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%
Test Statistic: Under H0, the test statistic is,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n-1}} = \frac{153 - 146}{17.2/\sqrt{22-1}} = 1.865$$
∴ t = 1.865
Degree of freedom: df = n – 1 = 22 – 1 = 21
Critical value: The tabulated value of the test statistic t at 5% level of significance for 21
degree of freedom and in right tailed test is 1.721, i.e., t0.05, 21 = 1.721.
Decision: Since the calculated value of t = 1.865 is greater than the tabulated value of
t0.05, 21 = 1.721, H0 is rejected and H1 is accepted.
Hence we conclude that the mean weekly sale of soap bars in a departmental store is greater
than 146 per store. In other words, the campaign was successful.
Example:
The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life of
25 months. A random sample of 6 such bulbs gave the following lifetimes:
Life in months: 24 26 30 20 20 18
Can you regard the producer's claim to be valid at the 1% level of significance?
Solution:
In the usual notation, we are given:
n = 6, µ = 25 months
Calculation of $\bar{X}$ and S:
X      d = X – 20    d²
24     4             16
26     6             36
30     10            100
20     0             0
20     0             0
18     –2            4
       Σd = 18       Σd² = 156
Here, A = 20
$$\bar{X} = A + \frac{\sum d}{n} = 20 + \frac{18}{6} = 23$$
$$S = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{6-1}\left[156 - \frac{18^2}{6}\right]} = \sqrt{\frac{1}{5}\,[156 - 54]} = 4.52$$
Setting up hypotheses:
Null hypothesis H0: µ = 25 months, i.e., the mean life of the electric bulbs is 25 months. In
other words, the producer's claim is valid.
Alternative hypothesis H1: µ < 25 months, i.e., the mean life of the electric bulbs is less than 25
months. In other words, the producer's claim is not valid. [Left tailed test]
Level of significance: It is given that level of significance α = 1%.
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{23 - 25}{4.52/\sqrt{6}} = -1.08$$
∴ | t | = 1.08
Degree of freedom: df = n – 1 = 6 – 1 = 5
Critical value: The tabulated value of the test statistic t at 1% level of significance for 5
degree of freedom and in left tailed tests is – 3.365, i.e., | t0.01, 5| = 3.365.
Decision: Since the calculated value of | t | = 1.08 is less than the tabulated value of
|t0.01, 5| = 3.365, H0 is accepted. Hence we conclude that the mean life of the electric bulbs is 25
months. In other words, the producer's claim is valid.
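When the raw observations are available, the same test can be sketched directly from the data. The snippet below is a minimal illustration using Python's standard library; `statistics.stdev` uses divisor n – 1, which matches the unbiased S of this example. The variable names are illustrative.

```python
import math
import statistics

lives = [24, 26, 30, 20, 20, 18]  # bulb lifetimes in months (from the example)
mu0 = 25                          # claimed mean life

n = len(lives)
x_bar = statistics.mean(lives)
S = statistics.stdev(lives)       # sample standard deviation, divisor n - 1
t_calc = (x_bar - mu0) / (S / math.sqrt(n))

t_critical = 3.365                # |t| for alpha = 0.01, df = 5, one tail (from the text)
reject_h0 = t_calc < -t_critical  # left-tailed test
print(x_bar, round(S, 2), round(t_calc, 2), reject_h0)
```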
Example:
A sample survey on the age at marriage of Nepalese daughters was conducted in a certain district of
Nepal. The survey showed the following distribution of age at marriage.
Age at marriage        10-15   15-20   20-25   25-30   30-35
Number of daughters    4       10      4       1       1
Are the sample data consistent with the hypothesis that the mean age at marriage of Nepalese
daughters is 18 years at the 5% level of significance? Also test at the 2% level of significance.
Solution:
We want to test null hypothesis H0 against alternative hypothesis H1 which we set up as
follows:
Null hypothesis: H0: µ = 18 years i.e., the mean age at marriage of Nepalese daughter is 18 years.
Alternative hypothesis: H1: µ ≠ 18 i.e., the mean age at marriage of Nepalese daughter is not
18 years. (Two tailed test)
Test statistic: Under H0, the test statistic is
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
Calculation of $\bar{X}$ and $S$ (assumed mean A = 22.5, class width h = 5):
Age at marriage   f    mid-value (X)   d' = (X – 22.5)/5   fd'    fd'²
10-15             4    12.5            –2                  –8     16
15-20             10   17.5            –1                  –10    10
20-25             4    22.5            0                   0      0
25-30             1    27.5            1                   1      1
30-35             1    32.5            2                   2      4
Total             n = 20                                   –15    31
$$\bar{X} = A + \frac{\sum fd'}{n} \times h = 22.5 + \frac{-15}{20} \times 5 = 18.75 \text{ years}$$
The sample variance S² is given by
$$S^2 = \frac{1}{n-1}\left[\sum fd'^2 - \frac{(\sum fd')^2}{n}\right] \times h^2 = \frac{1}{20-1}\left[31 - \frac{(-15)^2}{20}\right] \times 5^2 = 25.99$$
∴ S = 5.10
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{18.75 - 18}{5.10/\sqrt{20}} = \frac{0.75}{1.14} = 0.66$$
∴ | t | = 0.66
Critical value: The critical value of t at α = 0.05 and n – 1 = 19 d.f. for a two tailed test is
t0.025, 19 = 2.09. Similarly, the critical value at α = 0.02 and 19 d.f. for a two tailed test is
t0.01, 19 = 2.54.
Decision: Since the computed value |t| = 0.66 is less than the critical value t0.025, 19 = 2.09,
we accept the null hypothesis H0 at the 5% level of significance.
Similarly, as |t| = 0.66 < 2.54 = t0.01, 19, we accept the null hypothesis H0 at α = 0.02 as well.
Hence we conclude that the sample data are consistent with a mean age at marriage of Nepalese
daughters of 18 years at both the 5% and the 2% levels of significance.
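For grouped data, a useful check on the table arithmetic is to compute the mean and standard deviation directly from the class mid-values and frequencies rather than through the step-deviation shortcut; the two routes should agree. The following Python sketch does this for the distribution above. The variable names are illustrative, and the observations in each class are assumed to sit at the class mid-value.

```python
import math

# (lower limit, upper limit, frequency) for each age class, from the example
classes = [(10, 15, 4), (15, 20, 10), (20, 25, 4), (25, 30, 1), (30, 35, 1)]
mu0 = 18  # hypothesised mean age at marriage

n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Mean and unbiased standard deviation from mid-values
x_bar = sum(f * m for m, f in mids) / n
sum_sq_dev = sum(f * (m - x_bar) ** 2 for m, f in mids)
S = math.sqrt(sum_sq_dev / (n - 1))

t_calc = (x_bar - mu0) / (S / math.sqrt(n))
print(n, x_bar, round(S, 2), round(t_calc, 2))
```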
Example:
A random sample of 16 values from a normal population showed a mean of 41.5 inches and the
sum of squares of deviations from this mean equal to 135 square inches. Show that the assumption of
a mean of 43.5 inches for the population is not reasonable. Obtain 95 per cent fiducial limits for the
same.
Solution:
In usual notations, we are given that:
$n = 16,\quad \bar{X} = 41.5 \text{ inches},\quad \sum (X - \bar{X})^2 = 135,\quad \mu = 43.5 \text{ inches}$
$$\therefore S = \sqrt{\frac{1}{n-1} \sum (X - \bar{X})^2} = \sqrt{\frac{1}{16-1} \times 135} = 3$$
Setting up hypotheses:
Null hypothesis H0:µ = 43.5 inches, i.e., the mean of the population is 43.5 inches.
Alternative hypothesis H1:µ ≠ 43.5 inches, i.e., the mean of the population is not 43.5 inches.
[Two - tailed test]
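Level of significance: Since the level of significance is not given, we take α = 5%.
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{41.5 - 43.5}{3/\sqrt{16}} = -2.67$$
∴ | t | = 2.67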
Degree of freedom: df = n – 1 = 16 – 1 = 15
Critical value: The tabulated value of the test statistic t at 5% level of significance for 15
degree of freedom and in two tailed test is ± 2.131, i.e., | t0.05, 15 | = 2.131.
Decision: Since the calculated value of |t| = 2.67 is greater than the tabulated value of
|t 0.05, 15| = 2.131, H0 is rejected and H1 is accepted.
Hence we conclude that the mean of the population is not 43.5 inches.
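For 95% confidence limits:
1 – α = 0.95 ⇒ α = 0.05, df = n – 1 = 16 – 1 = 15, t0.05, 15 = 2.131.
Thus, the 95% confidence limits for the population mean µ are
$$\text{C.I. for } \mu = \bar{X} \pm t_{\alpha,\,n-1} \times S.E.(\bar{X}) = \bar{X} \pm t_{0.05,\,15} \times \frac{S}{\sqrt{n}} = 41.5 \pm 2.131 \times \frac{3}{\sqrt{16}} = 41.5 \pm 1.6$$
i.e., from 39.9 inches to 43.1 inches.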
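The confidence limits can be reproduced with a few lines of Python. This is a minimal sketch in which the critical value 2.131 is taken from the t-table as quoted in the text rather than computed; the variable names are illustrative.

```python
import math

n = 16
x_bar = 41.5        # sample mean in inches
sum_sq_dev = 135    # sum of squared deviations from the sample mean
t_critical = 2.131  # two-tailed table value for alpha = 0.05, df = 15 (from the text)

S = math.sqrt(sum_sq_dev / (n - 1))       # unbiased standard deviation
margin = t_critical * S / math.sqrt(n)    # t times the standard error of the mean
lower, upper = x_bar - margin, x_bar + margin

# The hypothesised mean lies outside the interval exactly when the
# two-tailed test at the same alpha rejects H0.
contains_mu0 = lower <= 43.5 <= upper
print(round(lower, 2), round(upper, 2), contains_mu0)
```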
iii. H1:µ1 < µ2, i.e., the mean of first population is less than the mean of second population.[Left-
tailed test]
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is
α = 5% unless otherwise stated.
Step 3: Test Statistic
Under H0, the pooled two-sample t-statistic is
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{S.E.(\bar{X}_1 - \bar{X}_2)} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$$
Where,
$S.E.(\bar{X}_1 - \bar{X}_2) = \sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ = standard error of the difference between means
$\bar{X}_1$ = Mean of the sample drawn from 1st population
$\bar{X}_2$ = Mean of the sample drawn from 2nd population
n1 = Size of the sample drawn from 1st population (n1 < 30)
n2 = Size of the sample drawn from 2nd population (n2 < 30)
$S_p^2$ = Unbiased estimate of the common population variance σ², or pooled estimator of the
variance
Calculations of $S_p^2$, an unbiased estimate of population variance:
i. Actual mean method:
$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2\right]$$
ii. Direct method:
$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right]$$
iii. Short-cut method:
$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum d_1^2 - \frac{(\sum d_1)^2}{n_1} + \sum d_2^2 - \frac{(\sum d_2)^2}{n_2}\right]$$
Where, d1 = X1 – A, d2 = X2 – B, and A and B are assumed means.
iv. When the biased sample variances $s_1^2$ and $s_2^2$ are given, then $S_p^2$ is estimated as
$$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
v. When the unbiased sample variances $S_1^2$ and $S_2^2$ are given, then $S_p^2$ is estimated as
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
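Forms iv and v above give the same pooled variance, because a biased variance s² (divisor n) and the corresponding unbiased variance S² (divisor n – 1) satisfy n·s² = (n – 1)·S². The Python sketch below illustrates this with made-up numbers; the function names and the sample values are hypothetical and serve only to show the equivalence.

```python
def pooled_variance_from_biased(n1, s1_sq, n2, s2_sq):
    """Pooled variance when the given variances use divisor n (form iv)."""
    return (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)

def pooled_variance_from_unbiased(n1, S1_sq, n2, S2_sq):
    """Pooled variance when the given variances use divisor n - 1 (form v)."""
    return ((n1 - 1) * S1_sq + (n2 - 1) * S2_sq) / (n1 + n2 - 2)

# Hypothetical sample sizes and biased variances, for illustration only
n1, n2 = 10, 17
s1_sq, s2_sq = 4.0, 9.0

# Convert biased variances to unbiased ones: S^2 = n * s^2 / (n - 1)
S1_sq = n1 * s1_sq / (n1 - 1)
S2_sq = n2 * s2_sq / (n2 - 1)

from_biased = pooled_variance_from_biased(n1, s1_sq, n2, s2_sq)
from_unbiased = pooled_variance_from_unbiased(n1, S1_sq, n2, S2_sq)
print(from_biased, from_unbiased)
```

The practical point is to check which divisor was used for a reported standard deviation before choosing between the two forms.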
Step 4: Degree of freedom: The degree of freedom is d.f. = n1 + n2 – 2.
Step 5: Critical value:
The critical or tabulated value of the test statistic t at the pre-specified level of significance for
n1 + n2 – 2 degree of freedom and according to tails of a test is obtained from the t -table.
Step 6: Decision
i. If the calculated value of t is less than or equal to the tabulated value of t, then we accept H0,
i.e. two independent population means are equal. In other words, there is no significant
difference between the sample means.
ii. If the calculated value of t is greater than the tabulated value of t, then we reject H0, i.e., two
independent populations means are not equal. In other words, there is significant difference
between the sample means.
Example:
The mean life of a sample of 10 electric light bulbs was found to be 1456 hours with standard
deviation of 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean
life of 1280 hours with standard deviation of 398 hours. Is there a significant difference between the
means of the two batches?
Solution:
In usual notations, we are given that:
$n_1 = 10,\quad \bar{X}_1 = 1456 \text{ hours},\quad s_1 = 423 \text{ hours}$
$n_2 = 17,\quad \bar{X}_2 = 1280 \text{ hours},\quad s_2 = 398 \text{ hours}$
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2, i.e., there is no significant difference between the mean lives of
light bulbs of the two batches.
Alternative hypothesis H1: µ1 ≠ µ2, i.e., there is a significant difference between the mean lives of
light bulbs of the two batches. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%
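Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where,
$$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{10 \times (423)^2 + 17 \times (398)^2}{10 + 17 - 2} = 179286.32$$
$$\therefore t = \frac{1456 - 1280}{\sqrt{179286.32 \times \left(\dfrac{1}{10} + \dfrac{1}{17}\right)}} = 1.04$$
∴ t = 1.04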
Degree of freedom: df = n1 + n2 – 2 = 10 + 17 – 2 = 25
Critical value: The tabulated value of the test statistic t at 5% level of significance for 25
degree of freedom and in two tailed test is ± 2.06, i.e., | t0.05,25 | = 2.06.
Decision: Since the calculated value of t = 1.04 is less than the tabulated value of | t0.05,25 | =
2.06, H0 is accepted. Hence we conclude that there is no significant difference between the
means of the two batches of light bulbs.
Example:
You are given the following data about the life of two brands of bulbs:
Mean life Standard deviation Sample size
Brand A 2230 hrs 250 hrs 12
Brand B 2000 hrs 300 hrs 15
Do you think that mean life of brand A bulbs is higher than that of brand B bulbs?
Solution:
In usual notations, we are given that:
Brand A: $n_1 = 12,\quad \bar{X}_1 = 2230 \text{ hours},\quad s_1 = 250 \text{ hours}$
Brand B: $n_2 = 15,\quad \bar{X}_2 = 2000 \text{ hours},\quad s_2 = 300 \text{ hours}$
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2, i.e., there is no significant difference between the mean life of
bulbs of brand A and brand B.
Alternative hypothesis H1: µ1 > µ2, i.e., mean life of bulbs of brand A is higher than the mean
life of bulbs of brand B. [Right-tailed test]
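The remaining arithmetic for this right-tailed test can be sketched in Python. The sketch rests on two assumptions that the example does not state: the given standard deviations are treated as the biased form (divisor n), as in the preceding bulb example, and α = 5% is used. The critical value 1.708 is the usual t-table entry for 25 degrees of freedom in one tail.

```python
import math

# Summary data from the example
n1, mean1, s1 = 12, 2230, 250   # Brand A
n2, mean2, s2 = 15, 2000, 300   # Brand B

df = n1 + n2 - 2
# Pooled variance, assuming s1 and s2 were computed with divisor n
sp_sq = (n1 * s1 ** 2 + n2 * s2 ** 2) / df
standard_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
t_calc = (mean1 - mean2) / standard_error

t_critical = 1.708              # table value for alpha = 0.05, df = 25, right tail (assumed alpha)
reject_h0 = t_calc > t_critical
print(df, sp_sq, round(t_calc, 3), reject_h0)
```

Under these assumptions the calculated t exceeds the critical value, which would lead to rejecting H0 in favour of a higher mean life for brand A.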
Example:
For a random sample of 10 pigs fed on diet A, the increases in weight (in lbs) in a certain
period were 10, 17, 13, 12, 9, 8, 14, 15, 6 and 16. For another random sample of 12 pigs fed on diet
B, the increases in weight in the same period were 14, 18, 8, 21, 23, 10, 17, 12, 22, 15, 7 and 13. Test
whether diets A and B differ significantly as regards their effect on increase in weight. Use the critical
value and P-value approaches for the decision.
Solution:
Setting up hypotheses: Null hypothesis H0: µ1 = µ2, i.e., there is no significant difference
between the average effect on increase in weight due to diets A and B.
Alternative hypothesis H1: µ1 ≠ µ2, i.e., there is a significant difference between the average
effect on increase in weight due to diets A and B. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%
Calculation of sample means and $S_p^2$:
Diet A                             Diet B
X1     d1 = X1 – 10    d1²         X2     d2 = X2 – 15    d2²
10     0               0           14     –1              1
17     7               49          18     3               9
13     3               9           8      –7              49
12     2               4           21     6               36
9      –1              1           23     8               64
8      –2              4           10     –5              25
14     4               16          17     2               4
15     5               25          12     –3              9
6      –4              16          22     7               49
16     6               36          15     0               0
                                   7      –8              64
                                   13     –2              4
Σd1 = 20, Σd1² = 160               Σd2 = 0, Σd2² = 314
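Here,
n1 = 10, A = 10
n2 = 12, B = 15
$$\bar{X}_1 = A + \frac{\sum d_1}{n_1} = 10 + \frac{20}{10} = 12$$
$$\bar{X}_2 = B + \frac{\sum d_2}{n_2} = 15 + \frac{0}{12} = 15$$
$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum d_1^2 - \frac{(\sum d_1)^2}{n_1} + \sum d_2^2 - \frac{(\sum d_2)^2}{n_2}\right] = \frac{1}{10 + 12 - 2}\left[160 - \frac{20^2}{10} + 314 - \frac{0^2}{12}\right] = \frac{160 - 40 + 314}{20} = 21.7$$
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{12 - 15}{\sqrt{21.7 \times \left(\dfrac{1}{10} + \dfrac{1}{12}\right)}} = -1.50$$
∴ | t | = 1.50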
Degree of freedom: df = n1 + n2 – 2 = 10 + 12 – 2 = 20
Critical value: The tabulated value of the test statistic t at 5% level of significance for 20
degree of freedom and in two tailed tests is ± 2.086, i.e., | t0.05,20| = 2.086.
Decision: Since the calculated value of |t| = 1.50 is less than the tabulated value of | t0.05,20 | =
2.086, H0 is accepted. Hence we conclude that there is no significant difference between the
average effect on increase in weight due to diets A and B.
P-value = 2P [T ≥ | Tcal |]
= 2P [T ≥ 1.50]
= 2 × 0.075
= 0.15
Since the P-value = 0.15 is greater than α = 0.05, H0 is accepted, in agreement with the critical value approach.
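Both the pooled t statistic and the two-tailed P-value for this example can be approximated without statistical tables. The sketch below uses only Python's standard library and obtains the tail area by integrating the t density numerically with Simpson's rule; the helper names `t_pdf` and `upper_tail` are illustrative, and the numerical integration is a stand-in for a table or statistical package, not the method used in the text.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P[T >= t] for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

diet_a = [10, 17, 13, 12, 9, 8, 14, 15, 6, 16]
diet_b = [14, 18, 8, 21, 23, 10, 17, 12, 22, 15, 7, 13]
n1, n2 = len(diet_a), len(diet_b)
mean_a, mean_b = statistics.mean(diet_a), statistics.mean(diet_b)

# Pooled variance from the sums of squared deviations (actual mean method)
ss_a = sum((x - mean_a) ** 2 for x in diet_a)
ss_b = sum((x - mean_b) ** 2 for x in diet_b)
df = n1 + n2 - 2
sp_sq = (ss_a + ss_b) / df

t_calc = (mean_a - mean_b) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
p_value = 2 * upper_tail(abs(t_calc), df)  # two-tailed
print(round(sp_sq, 1), round(t_calc, 2), round(p_value, 2))
```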
Example:
Two types of drugs were used on 5 and 7 patients for reducing their weight. Drug A was
imported and drug B was indigenous. The decrease in the weight after using the drugs for six months
was as follows:
Drug A 10 12 13 11 14
Drug B 8 9 12 14 15 10 9
Is there a significant difference in the efficacy of two drugs? If not, which drug should you
buy?
Solution:
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2, i.e., there is no significant difference in the efficacy of two drugs
A and B.
Alternative hypothesis H1: µ1 ≠ µ2, i.e., there is a significant difference in the efficacy of the two
drugs A and B. [Two-tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%
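Calculation of sample means and $S_p^2$:
Drug A                                              Drug B
X1     X1 – X̄1     (X1 – X̄1)²                      X2     X2 – X̄2     (X2 – X̄2)²
10     –2          4                                8      –3          9
12     0           0                                9      –2          4
13     1           1                                12     1           1
11     –1          1                                14     3           9
14     2           4                                15     4           16
                                                    10     –1          1
                                                    9      –2          4
ΣX1 = 60           Σ(X1 – X̄1)² = 10                ΣX2 = 77           Σ(X2 – X̄2)² = 44
Here, n1 = 5, n2 = 7
$$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{60}{5} = 12$$
$$\bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{77}{7} = 11$$
$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2\right] = \frac{1}{5 + 7 - 2}\,[10 + 44] = 5.4$$
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{12 - 11}{\sqrt{5.4 \times \left(\dfrac{1}{5} + \dfrac{1}{7}\right)}} = 0.735$$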
Degree of freedom: df = n1 + n2 – 2 = 5 + 7 – 2 = 10
Critical value: The tabulated value of the test statistic t at 5% level of significance for 10
degree of freedom and in two tailed test is ± 2.228, i.e., | t0.05,10 | = 2.228.
Decision: Since the calculated value of t = 0.735 is less than the tabulated value of
|t0.05,10 | = 2.228, H0 is accepted. Hence we conclude that there is no significant difference in
the efficacy of the two drugs A and B. So, either drug, A or B, can be bought.
3.2.8 Paired t-test for difference of means
In the t-test for difference of means, we tested hypotheses about the
difference between two population means when the samples are independent of each other. This
section describes the hypothesis testing procedure for the difference between two population means when the
samples are dependent. The problem is to test whether the sample means differ significantly or not.
The paired t-test is used under the following assumptions:
i. The sample sizes are equal, i.e., n1 = n2 = n (< 30).
ii. The two samples are dependent.
In the case of two dependent samples, two data values are collected from the same source, one before
and one after, and hence these are also called paired samples. For example, suppose we want to
make inferences about whether the mean sales of a certain product increased after a heavy
advertisement program. To do so, we select a sample of fewer than 30 shops and record their sales
before and after the advertising program. In this example, both sets of data are collected from the
same sample of shops, once before and once after the program. This is an example of paired
samples.
In order to carry out this test, we take the difference between the two data values, before (X) and
after (Y), for each element of the sample, denoted d = X – Y. This value of d
is called the paired difference. We then treat all the values of d as one sample and make inferences
by applying procedures similar to those used in the one-sample test of significance of a single
mean. In the above example, the advertisement program is considered
unsuccessful if the mean of the paired differences for the population is zero, i.e., µd = µx – µy = 0.
The procedure for testing the significance of paired t-test for difference of means is as
follows:
Step 1: Setting up hypotheses
Null hypothesis H0: µd = 0, or, H0: µx – µy = 0, i.e., there is no significant difference in the
population means before and after the treatment. In other words, the treatment is not effective.
Alternative hypothesis: Any one of the following alternative hypotheses will be set while
solving the problems.
i. H1: µd ≠ 0, or, H1: µx – µy ≠ 0, i.e., there is a significant difference in the population means
before and after the treatment. [Two tailed test]
ii. H1: µd > 0, or, H1: µx – µy > 0, i.e., the population mean before the treatment is greater than the
population mean after the treatment. [Right tailed test]
iii. H1: µd < 0, or, H1: µx – µy < 0, i.e., the population mean before the treatment is less than the
population mean after the treatment. [Left-tailed test]
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
Step 3: Test Statistic
Under H0, the test statistic is
$$t = \frac{\bar{d}}{S.E.(\bar{d})} = \frac{\bar{d}}{S_d/\sqrt{n}} \sim t_{n-1}$$
Where, $\bar{d}$ = the mean of the paired differences for the sample
d = X – Y = difference of observations
Sd = sample standard deviation of the paired differences
Calculations of $\bar{d}$ and $S_d$:
i. $\bar{d} = \dfrac{\sum d}{n}$
ii. $S_d = \sqrt{\dfrac{1}{n-1} \sum (d - \bar{d})^2} = \sqrt{\dfrac{1}{n-1}\left[\sum d^2 - \dfrac{(\sum d)^2}{n}\right]}$
Step 4: Degree of freedom: The degree of freedom is df = n – 1
Step 5: Critical value: The critical or tabulated value of the test statistic t according to tails of a
test is obtained from the t-table.
Step 6: Decision
i. If the calculated value of t is less than or equal to the tabulated value of t, then we accept H0,
i.e., there is no significant difference in the population means before and after the treatment.
In other words, the treatment is not effective.
ii. If the calculated value of t is greater than the tabulated value of t, then we reject H0, i.e., there
is a significant difference in the population means before and after the treatment.
Example:
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected
a sample of seven adults and put them on this dietary plan for three months. The following table
gives the systolic blood pressure of these seven adults before and after the completion of this Plan.
Before 210 180 195 220 231 199 224
After 193 186 186 223 220 183 233
Using the 5% significance level, can we conclude that the dietary plan is effective in reducing
blood pressure?
Solution:
Setting up hypotheses:
Null hypothesis H0: µd = 0, i.e., there is no significant difference in systolic blood pressure
before and after the completion of the dietary plan. In other words, the dietary plan is not effective in
reducing blood pressure.
Alternative hypothesis H1: µd > 0, i.e., the systolic blood pressure has reduced after the
completion of the dietary plan. In other words, the dietary plan is effective in reducing blood
pressure. [Right-tailed test]
Level of significance: It is given that level of significance α = 5%
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
Calculations of $\bar{d}$ and $S_d$:
Before the plan (X)    After the plan (Y)    d = X – Y    d²
210                    193                   17           289
180                    186                   –6           36
195                    186                   9            81
220                    223                   –3           9
231                    220                   11           121
199                    183                   16           256
224                    233                   –9           81
Total                                        Σd = 35      Σd² = 873
Here, n = 7
$$\bar{d} = \frac{\sum d}{n} = \frac{35}{7} = 5$$
$$S_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{7-1}\left[873 - \frac{(35)^2}{7}\right]} = \sqrt{\frac{1}{6}\,[873 - 175]} = 10.79$$
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{5}{10.79/\sqrt{7}} = 1.226$$
∴ t = 1.226
Degree of freedom: df = n – 1 = 7 – 1 = 6
Critical value: The tabulated value of the test statistic t at 5% level of significance for 6
degree of freedom and in right tailed test is 1.943, i.e., t0.05,6 = 1.943.
Decision: Since the calculated value of t = 1.226 is less than the tabulated value of t0.05,6 =
1.943, H0 is accepted. Hence we conclude that there is no significant difference on systolic
blood pressure before and after the completion of dietary plan. In other words, the dietary plan
is not effective in reducing blood pressure.
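The paired procedure reduces to a one-sample test on the differences, which makes it easy to sketch in code. The Python snippet below recomputes the statistic for the blood pressure data; `statistics.stdev` uses divisor n – 1, matching Sd as defined in Step 3, and the variable names are illustrative.

```python
import math
import statistics

before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Paired differences d = X - Y (before minus after)
d = [x - y for x, y in zip(before, after)]
n = len(d)

d_bar = statistics.mean(d)
s_d = statistics.stdev(d)               # divisor n - 1
t_calc = d_bar / (s_d / math.sqrt(n))

t_critical = 1.943                      # table value for alpha = 0.05, df = 6, right tail (from the text)
reject_h0 = t_calc > t_critical
print(d, d_bar, round(s_d, 2), round(t_calc, 3), reject_h0)
```

Pairing matters here: treating the two columns as independent samples would ignore that each adult serves as his or her own control.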
Example:
A private agency claims that the crash course it offers significantly increases the writing speed
of secretaries. The following table gives the scores of eight secretaries before and after they
attended this course.
Before 81 75 89 91 65 70 90 64
After 97 72 93 110 78 69 115 72
Using the 5% significance level, can you conclude that attending this course increases the
writing speed of secretaries? Use critical value and P-value approach for decision.
Solution:
Setting up hypotheses:
Null hypothesis: H0 : µd = 0, i.e., there is no significant difference in the average writing
speed of secretaries before and after attending the crash course. In other words, the crash course does
not increase the writing speed of secretaries.
Alternative hypothesis: H1:µd < 0, i.e., the crash course increases the writing speed of
secretaries.[Left-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
Calculations of $\bar{d}$ and $S_d$:
Before the course (X)    After the course (Y)    d = X – Y    d²
81                       97                      –16          256
75                       72                      3            9
89                       93                      –4           16
91                       110                     –19          361
65                       78                      –13          169
70                       69                      1            1
90                       115                     –25          625
64                       72                      –8           64
Total                                            Σd = –81     Σd² = 1501
Here, n = 8
$$\bar{d} = \frac{\sum d}{n} = \frac{-81}{8} = -10.125$$
$$S_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{8-1}\left[1501 - \frac{(-81)^2}{8}\right]} = \sqrt{\frac{1}{7}\,[1501 - 820.125]} = 9.86$$
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{-10.125}{9.86/\sqrt{8}} = -2.904$$
∴ | t | = 2.904
Degree of freedom: df = n – 1 = 8 – 1 = 7
Critical value: The tabulated value of the test statistic t at 5% level of significance for 7
degree of freedom and in left tailed test is – 1.895, i.e., | t0.05,7 | = 1.895.
Decision: Since the calculated value of | t | = 2.904 is greater than the tabulated value of
| t0.05,7 | = 1.895, H0 is rejected and H1 is accepted. Hence we conclude that the crash course
increases the writing speed of secretaries.
Using the P-value approach, for a left tailed test
P-value = P [T ≤ Tcal]
= P [T ≤ – 2.90]
= 0.011
Decision: Since P-value = 0.011 < 0.05 = α, we reject the null hypothesis (H0)
and accept the alternative hypothesis (H1).
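The left-tail P-value quoted above can be approximated without tables. The sketch below integrates the t density numerically with Simpson's rule using only Python's standard library; the helper names `t_pdf` and `lower_tail` are illustrative, and the numerical integration is a substitute for a table lookup, not the method used in the text.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def lower_tail(t, df, steps=2000):
    """Approximate P[T <= t] for t <= 0, using symmetry and Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

before = [81, 75, 89, 91, 65, 70, 90, 64]
after = [97, 72, 93, 110, 78, 69, 115, 72]
d = [x - y for x, y in zip(before, after)]   # d = X - Y
n = len(d)

t_calc = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
p_value = lower_tail(t_calc, n - 1)          # left-tailed test
reject_h0 = p_value < 0.05
print(round(t_calc, 3), round(p_value, 3), reject_h0)
```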
Example:
The sales figures of an item in eight shops before and after advertisement are given as:
Before    70    65    48    72    80    92    98    100
After     72    70    53    75    84    95    105   104
Level of significance: Since the level of significance is not given, we take α = 5%.
Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
Calculations of $\bar{d}$ and $S_d$:
Before the advertisement (X)    After the advertisement (Y)    d = X – Y    d²
70                              72                             –2           4
65                              70                             –5           25
48                              53                             –5           25
72                              75                             –3           9
80                              84                             –4           16
92                              95                             –3           9
98                              105                            –7           49
100                             104                            –4           16
Total                                                          Σd = –33     Σd² = 153
Here, n = 8
$$\bar{d} = \frac{\sum d}{n} = \frac{-33}{8} = -4.125$$
$$S_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{8-1}\left[153 - \frac{(-33)^2}{8}\right]} = \sqrt{\frac{1}{7}\,[153 - 136.125]} = 1.553$$
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{-4.125}{1.553/\sqrt{8}} = -7.51$$
∴ |t| = 7.51
Degree of freedom: df = n – 1 = 8 – 1 = 7
Critical value: The tabulated value of the test statistic t at 5% level of significance for 7
degree of freedom and in left tailed test is 1.895, i.e., | t0.05,7 | = 1.895.
Decision: Since the calculated value of | t | = 7.51 is greater than the tabulated value of
| t0.05,7 | = 1.895, H0 is rejected and H1 is accepted. Hence we conclude that the average sale
before the advertisement is less than the average sales after the advertisement. In other words,
the advertisement was effective.
Example:
A certain drug administered to each of 12 patients resulted in the following increases in blood
pressure.
5, 2, 8, – 1, 3, 0, – 2, 1, 5, 0, 4, 6
Can it be concluded that the stimulus will in general be accompanied by an increase in blood
pressure?
Solution:
Setting up hypothesis:
Null hypothesis H0: µd = 0, i.e., there is no significant difference in the blood pressure before
and after the drug is administered to patients. In other words, the drug does not result in an increase
in the blood pressure of the patients.
Alternative hypothesis H1: µd > 0, i.e., the drug results in an increase in the blood pressure of the
patients. [Right-tailed test]
Level of significance: Since the level of significance is not given, we take α = 5%.
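Test Statistic: Under H0, the test statistic is
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
Calculations of $\bar{d}$ and $S_d$:
d     5    2    8    –1    3    0    –2    1    5    0    4    6     Σd = 31
d²    25   4    64   1     9    0    4     1    25   0    16   36    Σd² = 185
Here, n = 12
$$\bar{d} = \frac{\sum d}{n} = \frac{31}{12} = 2.58$$
$$S_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{12-1}\left[185 - \frac{(31)^2}{12}\right]} = \sqrt{\frac{1}{11}\,[185 - 80.08]} = 3.088$$
$$\therefore t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{2.58}{3.088/\sqrt{12}} = 2.894$$
∴ t = 2.894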
Degree of freedom: df = n – 1 = 12 – 1 = 11
Critical value: The tabulated value of the test statistic t at 5% level of significance for 11
degree of freedom and in right tailed test is 1.796.
Decision: Since the calculated value of t = 2.894 is greater than the tabulated value of
t0.05,11 = 1.796, H0 is rejected and H1 is accepted. Hence we conclude that the drug results in an
increase the blood pressure of the patients.
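When the differences themselves are given, as in this example, the paired test is simply a one-sample t-test of the differences against zero. A small reusable function makes this explicit; the name `one_sample_t` and its signature are illustrative and not taken from the text.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0.

    Uses the unbiased standard deviation (divisor n - 1).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1

increases = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]  # increases in blood pressure
t_calc, df = one_sample_t(increases)

t_critical = 1.796                 # table value for alpha = 0.05, df = 11, right tail (from the text)
reject_h0 = t_calc > t_critical
print(round(t_calc, 2), df, reject_h0)
```

The unrounded statistic differs slightly in the third decimal from the hand calculation, which rounds the mean to 2.58 before dividing.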
Theoretical Questions
1. Discuss small sample test of significance. What are the assumptions made for the small
sample tests?
2. Distinguish between large sample test and small sample tests of significance.
3. Define degree of freedom. How is the degree of freedom determined for the different categories of
t-test?
4. Explain the applications of Student's t-test, stating the assumptions involved.
5. Discuss the Student's t-test for testing the hypothesis whether the sample mean is
significantly different from a hypothetical value.
6. Discuss briefly how you test the significance of the difference between two sample means,
(i) when the sample values are not paired, (ii) when the sample values are paired and
dependent.
7. What do you understand by the paired t-test? Under what conditions will you apply it?
Practical Problems
1. Ten cartons are taken at random from an automatic filling machine. The mean net weight of
the 10 cartons is 11.8 kg and standard deviation is 0.15 kg. Does the mean differ significantly
from the intended weight of 12 kg?
2. A machine is designed to produce insulating washers for electrical devices of average
thickness of 0.025 cm. A random sample of 10 washers was found to have an average
thickness of 0.024 cm with a standard deviation of 0.002 cm. Test, at the 5% level
of significance, whether the thickness is significantly different from the average thickness of 0.025 cm.
3. An automatic machine was designed to pack exactly 5 pounds of oil. A random sample of 15
packets was examined to test the machine. The average weight was found to be 4.94 pounds
with a standard deviation of 0.10 pounds. Assuming the weights of packets to be normally
distributed, test at the 5% level of significance whether the machine is working properly.
Use the critical value approach and the P-value approach to make a decision.
4. A random sample of 20 students is drawn from a certain campus and their average height was
found to be 66.6 inches and standard deviation of 2.5 inches. At 5% level of significance, test
the null hypothesis H0: µ = 65 inches against alternative hypothesis.
i. H1: µ ≠ 65 inches ii. H1: µ > 65 inches
Use critical value and P-value approach for decision.
5. A psychologist claims that the mean age at which children start walking is 12.5 months. In order
to test the psychologist's claim, a random sample of 18 children was taken, and it was found that the mean age
at which these children started walking was 12.9 months with a standard deviation of 0.8
month. Using the 1% level of significance, test the hypothesis that the mean age at which
children start walking is not 12.5 months.
6. The mean weekly sales of the chocolate bar in candy stores were 140.3 bars per store. After
an advertisement campaign, the mean weekly sales in 22 stores for a typical week increased to
153.7 with standard deviation of 17.2. Can you consider the advertisement effective at 5%
level of significance?
7. A company claims that the mean life of its electric light bulbs is 28 months. A random sample
of 10 bulbs has the following life in months:
26 24 32 28 20 20 23 24 30 43
Test the claim of the company at 5% level of significance. Use critical value approach and P-
value approach to make decision.
8. A random sample of size 16 has mean 53. The sum of squares of deviations from the
mean is 150. Can this sample be regarded as taken from a population having mean 56?
Also find the 95% confidence limits for the mean.
9. Certain pesticide is packed into bags by a machine. A random sample of 10 bags is drawn and
their contents are found to weigh in kg as follows:
50 49 52 44 45 45 48 46 45 49
Test if the average packing can be taken to be 50 kg.
10. A fertilizer-mixing machine is set to give 12 kg of nitrate for every quintal bag of fertilizer.
Ten 100 kg bags are examined. The percentages of nitrate are as follows:
11 14 13 12 13 12 13 14 11 12
Is there reason to believe that the machine is defective? Use critical value approach and P-
value approach to make decision.
11. The following are prices (in dollars) for two tickets, with online service charges, a large popcorn,
and two medium soft drinks at a sample of six theater chains:
36.15 31.00 35.05 40.25 33.75 43.00
At the 0.05 level of significance, is there evidence that the mean price for two movie tickets
with online service charges, a large popcorn, and two medium soft drinks is different from
$35? Use the critical value approach and the P-value approach to make a decision.
12. A certain maternity hospital recorded that the mean weight of new born babies was 3 kg. A
medical research worker examined a sample of 12 new born babies selected at random from
the hospital and the weight of the 12 new born babies were found to be as given below:
2.7 2.8 2.9 3.2 2.7 2.8 3.3 2.8 3 2.9 3.1 2.6
Test whether there is a significant difference between the sample mean weight and the population
mean weight of the new born babies. Use critical value approach and P-value approach to
make decision.
13. The mean life of a sample of 10 electric bulbs was found to be 1456 hours with a standard
deviation of 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean life
of 1280 hours and a standard deviation of 398 hours. Is there any significant difference between
the means of the two batches?
14. In the comparison of two kinds of paint, a consumer testing service finds that four 1-gallon
cans of one brand cover on the average 546 square feet with a standard deviation of 31 square
feet, whereas four 1-gallon cans of another brand cover on the average 492 square feet with a
standard deviation of 26 square feet. Assuming that the two populations sampled are normal
and have equal variance, test the null hypothesis H0: µ1 = µ2 against the alternative hypothesis
H1: µ1 > µ2 at the 0.05 level of significance. Use P-value approach for decision.
15. Two salesmen A and B are working in a certain district. From a sample survey the following
results were obtained. State whether there is any significant difference in the average sales
between the two salesmen.
A B
No. of sales 20 18
Average sales (in Rs.) 170 205
Standard Deviation (in Rs.) 20 25
16. Two types of batteries are tested for their lengths of life and the following data are obtained.
No. of samples Mean life Variance
Type A 9 600 hours 121
Type B 8 640 hours 144
Is the battery of type B superior to that of type A at the 5% level of significance?
17. To compare the prices of a certain commodity in two towns, nine shops were selected at
random in each town. The following figures give the price in two towns:
Town A 61 56 63 56 63 59 56 44 61
Town B 55 47 59 51 61 57 54 64 58
Test whether the average price can be said to be the same in the two towns.
18. Two new drugs A and B are given two independent groups of 10 and 12 patients with heart
disease respectively. After 30 minutes the reduction of blood pressure due to the two new
drugs A and B records are as given below:
Drug A 7 16 14 9 10 11 6 8 10 9
Drug B 10 12 16 14 11 12 13 8 12 15 9 12
(i) Test whether the two new drugs A and B are equally effective for the patients with heart
disease at the 5% level of significance.
(ii) Would you conclude that drug A is less effective than drug B in reducing the
blood pressure of patients with heart disease at the 5% level of significance?
19. At communication service centre A, the records of telephone calls sold per day for a month of
30 days gave the following distributions.
Telephone calls sold 100-120 120-140 140-160 160-180 180-200
Number of days 2 5 15 6 2
At another communication service center B, the records of telephone calls sold per day for the
same month of 30 days gave the following distribution.
Telephone calls sold 100-120 120-140 140-160 160-180 180-200
Number of days 1 2 10 14 3
Test whether there is a significant difference between the mean numbers of telephone calls sold at
communication service centres A and B at the 2% level of significance.
20. The monthly advertising expenditures of a company for two products A and B are as follows:
Month       Expenditure in Rs.
            Product A    Product B
January     100          175
February    120          200
March       125          250
April       145          225
May         150          200
June        140          150
July        200          200
Is there sufficient evidence to conclude that the average expenditure on advertising on product
B is more than on product A? Use the critical value approach and the P-value approach to make a
decision.
21. An aptitude test was conducted for two groups of executives. Group 1 consists of engineers and
group 2 consists of accountants. The scores obtained by the candidates are given below:
Engineers 125 115 119 85 97 107 125 125 110
Accountants 112 98 109 96 77 70 114 100
Do you find any significant difference between the scores of these two groups? Use critical
value approach and P-value approach to make decision.
22. Two chemical solutions X and Y were tested for their pH, the degree of acidity of the solution.
Six observations on each solution for their pH values were taken as:
X 8 5 9 6 8 7
Y 5 7 7 8 6 6
Can you conclude that the two types of solutions have different mean pH values at the 5% level of
significance?
23. A group of five patients treated with medicine 'A' weigh 42, 39, 48, 60, and 41 kg. A second
group of seven patients from the same hospital treated with medicine 'B' weigh 38, 42, 56,
64, 68, 69 and 62 kg. Do you agree with the claim that medicine 'B' increases the weight
significantly? Test (i) at the 1% and (ii) at the 10% level of significance.
24. The means of two random samples of sizes 9 and 7 are 196.42 and 198.82 respectively. The
sums of squares of the deviations from their means are 26.94 and 18.73 respectively. Can the
samples be considered to have been drawn from populations with the same mean?
25. Two random samples of the increments in weight of pigs fed on two types of pig food, A
and B, are as follows:
Food A 7 10 6 5 8 7 11 9 10 11
Food B 8 10 9 10 11 12 10 11 9 12
(i) Test whether food A is significantly different from food B with regard to their effects on
the increase in weight, assuming that the two random samples are independent.
(ii) Test whether food B is better than food A using the paired t-test, assuming that the two
pig foods A and B are fed to the same set of 10 pigs, which makes the two samples related
so that the increments in weight occur in pairs. Use critical value approach and P-
value approach to make decision.
26. A random sample of nine students was selected to test for the effectiveness of a special course
designed to improve memory. The following table gives the results of a memory test given to
those students before and after this course.
Before 43 57 48 65 81 49 38 69 58
After 49 56 55 77 89 57 36 64 69
Test at the 1% level of significance whether this course makes any statistically significant
improvement in the memory of all students. Use critical value approach and P-value approach
to make decision.
27. The sales data of certain clothes in six shops before and after a special promotional campaign
are as follows:
Shops A B C D E F
Before Campaign 53 28 31 48 50 42
After Campaign 58 29 30 55 56 45
Can we conclude that the special promotional campaign was successful?
28. A company claims that its 12-week special exercise program significantly reduces weight. A
random sample of six persons was selected and these persons were put on this exercise
program for 12 weeks. The following table gives the weights (in pounds) of those six persons
before and after the program.
Employee 1 2 3 4 5 6 7 8
Without music 220 202 226 190 200 215 208 210
With music 236 190 240 200 220 205 212 215
30. The manufacturer of a gasoline additive claims that the use of this additive increases gasoline
mileage. A random sample of six cars was selected. These cars were driven for one week
without the gasoline additive and then for one week with the gasoline additive. The table
gives the miles per gallon for these cars without and with the gasoline additive.
Test at the 5% level of significance whether the use of the gasoline additive increases the
gasoline mileage.
31. A special coaching class on Statistics in a group of 10 students yields the following increases
in score.
8 10 –2 0 –5 –1 9 12 6 5
Roll No. 1 2 3 4 5 6 7 8
Increase in marks 2 –2 6 –8 12 5 –7 2
Do the marks indicate that the students have gained from the coaching?
33. A drug was administered to 10 patients and the increments in their blood pressure were
recorded to be 6, 3, – 2, 4, – 3, 4, 6, 0, 0, and 2. Is it reasonable to believe that the drug has no
effect on change of blood pressure?
34. To test the desirability of a certain modification in typist’s desks, 9 typists were given two
tests of almost same nature, one on the desk in use and the other on the new type. The
following difference in the number of words typed per minute was recorded:
Typist A B C D E F G H I
Do the data indicate that the modification in desk increases typing speed? Use critical value
approach and P-value approach to make decision.
35. The mean value of difference and the sum of squares of the differences for a sample of size 10
was found to be 0.6 and 200 respectively. Test the difference is significant at 5% level of
significance.
11. t = 0.8556, p-value = 0.4313 > 0.05 = α, accept H0 . There is not enough evidence to
conclude that the mean price for two tickets, with online service charges, large popcorn, and
two medium soft drinks, is different from $35.
12. |t| = 1.67, accept H0. 13. t = 1.04, accept H0.
21. |t| = 2.06, accept H0, p-value = 0.0029 < 0.05 = α, reject H0
22. t = 0.59, accept H0.
23. | t | = 1.70 (i) accept H0 , (ii) reject H0 . 24. | t | = 2.64, reject H0.
25. (i) |t| = 2.28, reject H0. (ii) |t| = 2.585 reject H0, p - value = 0.024 < 0.05 = α, reject H0.
1. Which one of the following is not correct?
a. Sample size in analysis of variance need not be equal.
b. A chi-square value is always positive.
c. The chi-square and t-distributions are both always symmetrical distributions.
d. Analysis of variance is used to test the equality of three or more population means.
2. Range of the statistic-t is:
a. – 1 to +1 b. – ∞ to +∞ c. 0 to ∞ d. 0 to 1
3. The test statistic to test µ1 = µ2 for normal population when population standard
deviation is not known is
a. F-test b. Z-test c. t-test d. none of these
4. Student’s t-test was invented by:
a. R.A. Fisher b. G.W. Snedecor c. W.S. Gosset d. W.G. Cochran
5. Student’s t-test is applicable in case of:
a. Small samples b. For samples of size between 5 and 30
c. Large samples d. None of the above
6. A sample of 12 specimens taken from a normal population is expected to have a mean of 50
mg/cc. The sample has a mean of 64 mg/cc with a variance of 25. To test H0:
µ = 50 vs. H1: µ ≠ 50, you will use:
a. Z-test b. χ²-test c. F-test d. t-test
7. Student’s t-test is applicable only when:
a. The variate values are independent b. The variable is distributed normally
c. The sample is not large d. All the above
8. Paired t-test is applicable when the observations in the two samples are:
a. Paired b. Correlated c. Equal in number d. All the above
9. The mean difference between 9 paired observations is 15.0 and the standard deviation of
differences is 5.0. The value of statistic t is:
a. 27 b. 9 c. 3 d. Zero
10. The degrees of freedom for statistic-t for paired t-test based on n pairs of observations
is:
a. 2 (n – 1) b. n – 1 c. 2n – 1 d. n(n – 1)
11. To test a hypothesis about proportions of items in a class, the usual test is:
a. t-test b. F-test c. Z-test d. None of the above
12. Which of the following is necessary condition for using a t-distribution table?
a. n is small b. s is known but σ is not
c. The population is infinite d. All of these.
13. Which of the following are basic assumptions of t-test?
a. The two populations are equal. b. The two samples are random ones.
c. The two population have the same variance. d. All of these.
14. Let the first sample have 13 elements with s1 = 17 and the second sample have 9 elements with
s2 = 22. Which of the following is the value of the pooled variance $S_p^2$?
a. 19 b. 19.5 c. 361 d. 367
15. The sample mean and sample standard deviation of 25 observations are 40 and 2
respectively. If the population mean is 10, the t statistic is
a. 75 b. 50 c. 25 d. 5
16. In testing a hypothesis about two population means, if the t distribution is used, which of
the following assumptions is required?
a. The standard deviations are not the same. b. Both population means are the same.
c. Both populations are normally distributed. d. The sample sizes are equal.
17. The t test for the difference between the means of two samples makes what assumption?
a. Populations are approximately normally distributed.
b. Samples are randomly and independently drawn.
c. Sample variances are equal.
d. All of the above.
18. When testing for differences between the means of two related populations, what is the
null hypothesis?
a. The difference between the two population means is greater than 1.
b. The difference between the two population means is equal to 0.
c. The difference between the two population means is equal to 1.
d. The difference between the two population means is greater than 0.
19. If you are testing a hypothesis that two population proportions are the same, you should
do which of the following?
a. Calculate a pooled value for the sample proportion.
b. Use a sample proportion equal to 0.5.
c. Average the two sample proportions.
d. Use a 0.05 level of significance.
20. If you test for the difference between the means of two related samples, there are how
many degrees of freedom?
a. n-1 b. (n1 + n2)/2 c. n1 + n2 – 2 d. (n1 + n2)/2 – 1
21. When you test for a difference between two population means from small samples, when
should a pooled variance be calculated?
a. When the population variances are assumed to be equal.
b. Always calculate a pooled variance.
c. When the sample sizes are different.
d. When a two-tail test is used.
22. In what type of test is the variable of interest the difference between the values of
corresponding observations rather than the individual observations?
a. Mean difference between two related populations
b. Difference between the means of two independent populations
c. Equality of variances from two independent populations
d. All of the above
23. Orange juice is bottled on two different production lines. A sample of 5 bottles from the
first line yields a mean of 1.2 quarts with a standard deviation of 0.02 quarts, and a
sample of 6 bottles from the second line yields a mean of 1.15 quarts with a standard
deviation of 0.01 quarts. The test statistic is equal to which of the following?
a. 4 b. 2 c. 0.05 d. 0.01
1. c 2. b 3. c 4. c 5. b 6. d 7. d 8. d 9. b 10. b
11. c 12. d 13. d 14. d 15. a 16. c 17. d 18. b 19. a 20. a
21.a 22. a 23. a
Unit
Analysis of Variance
4.1 F–Distribution
F–Statistic
The F statistic is defined as the ratio of two independent chi-square variates, each divided by
its own degrees of freedom. If X and Y are two independent chi-square variates with
ν1 = n1 – 1 and ν2 = n2 – 1 degrees of freedom, then the ratio
\[ F = \frac{X/\nu_1}{Y/\nu_2} \sim F(\nu_1, \nu_2) \]
follows Snedecor's F-distribution with (ν1, ν2) degrees of freedom. The sampling distribution of
the F statistic does not involve any population parameters and depends only on the degrees of
freedom ν1 and ν2.
The F-distribution, also called the variance ratio distribution, is skewed to the right, and F
values can never be negative. The F-distribution has two degrees-of-freedom parameters: the
degrees of freedom for the numerator and the degrees of freedom for the denominator. Each
combination of numerator and denominator degrees of freedom gives a different F-distribution
curve. The typical shape of the F-distribution is shown below:
[Figure: Shape of the F-distribution]
For an F-distribution, the degrees of freedom for the numerator and the degrees of freedom for the
denominator are usually written as follows:
df = (ν1, ν2)
To test whether two independent samples have been drawn from normal populations with the same
variance, that is, whether two normal populations have the same variance, we perform the F-test
for equality of population variances. The procedure for testing the equality of population
variances is as follows:
Step 1: Setting up Hypothesis:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, i.e., the two population variances are equal. In other words, the
two independent estimates of the common population variance do not differ significantly.
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, i.e., the two population variances are not equal. In other
words, the two independent estimates of the common population variance differ significantly.
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
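Step 3: Test Statistic:
Under H0, the test statistic is
\[ F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1,\ n_2 - 1) \quad \text{if } S_1^2 > S_2^2 \]
\[ F = \frac{S_2^2}{S_1^2} \sim F(n_2 - 1,\ n_1 - 1) \quad \text{if } S_2^2 > S_1^2 \]
The larger variance estimate is always placed in the numerator, so that F ≥ 1.
Where, n1 = size of sample taken from population 1
n2 = size of sample taken from population 2
$S_1^2$ and $S_2^2$ are unbiased estimates of the common population variance $\sigma^2$.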
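Computations of $S_1^2$ and $S_2^2$
i. Actual mean method:
\[ S_1^2 = \frac{1}{n_1 - 1}\sum (X_1 - \bar{X}_1)^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum (X_2 - \bar{X}_2)^2 \]
ii. Direct method:
\[ S_1^2 = \frac{1}{n_1 - 1}\left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right], \qquad S_2^2 = \frac{1}{n_2 - 1}\left[\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right] \]
iii. Short-cut method:
\[ S_1^2 = \frac{1}{n_1 - 1}\left[\sum d_1^2 - \frac{(\sum d_1)^2}{n_1}\right], \qquad S_2^2 = \frac{1}{n_2 - 1}\left[\sum d_2^2 - \frac{(\sum d_2)^2}{n_2}\right] \]
Where, d1 = X1 – A, d2 = X2 – B, and A and B are assumed means.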
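The three computing methods are algebraically equivalent and give the same value. The following is a minimal Python sketch that checks this on illustrative data; the function names and the assumed mean A = 11 are assumptions made for this illustration only.

```python
def var_actual_mean(xs):
    """Actual mean method: sum of squared deviations from the mean / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def var_direct(xs):
    """Direct method: [sum(X^2) - (sum X)^2 / n] / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def var_shortcut(xs, assumed_mean):
    """Short-cut method: same formula applied to d = X - A."""
    n = len(xs)
    d = [x - assumed_mean for x in xs]
    return (sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1)


sample = [9, 11, 13, 11, 15, 9, 12, 14]  # illustrative data
v1 = var_actual_mean(sample)
v2 = var_direct(sample)
v3 = var_shortcut(sample, assumed_mean=11)  # A = 11 is an arbitrary choice
print(round(v1, 3), round(v2, 3), round(v3, 3))
```

Any value of the assumed mean gives the same result; a value close to the true mean simply keeps the deviations small when working by hand.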
If the biased sample variances $s_1^2$ and $s_2^2$ are given, then $S_1^2$ and $S_2^2$ are calculated
by using the following relation:
\[ S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} \quad \text{and} \quad S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} \]
Where,
\[ s_1^2 = \frac{1}{n_1}\sum (X_1 - \bar{X}_1)^2, \qquad s_2^2 = \frac{1}{n_2}\sum (X_2 - \bar{X}_2)^2 \]
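The conversion from the biased to the unbiased variance can be checked numerically. This is a small sketch using Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n – 1; the helper name `unbiased_from_biased` and the data are illustrative assumptions.

```python
from statistics import pvariance, variance


def unbiased_from_biased(s_sq, n):
    """Convert the biased variance s^2 (divisor n) to S^2 (divisor n - 1)."""
    return n * s_sq / (n - 1)


data = [10, 12, 10, 14, 9, 8, 10]  # illustrative data
biased = pvariance(data)           # s^2, divisor n
converted = unbiased_from_biased(biased, len(data))
print(round(biased, 3), round(converted, 3), round(variance(data), 3))
```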
Step 4: Degree of freedom: (df)
If $S_1^2 > S_2^2$, the degree of freedom is
df = (n1 – 1, n2 – 1)
If $S_2^2 > S_1^2$, the degree of freedom is
df = (n2 – 1, n1 – 1)
Step 5: Critical value:
The critical or tabulated value of the test statistic F at the pre-specified level of significance is
obtained from the F-table.
Step 6: Decision
i. If the calculated value of F is less than or equal to the tabulated value of F, then we accept H0,
i.e., the two population variances are equal. In other words, the two independent estimates of
the common population variance do not differ significantly.
ii. If the calculated value of F is greater than the tabulated value of F, then we reject H0, i.e., the
two population variances are not equal. In other words, the two independent estimates of the
common population variance differ significantly.
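The six steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `f_test_equal_variances` is hypothetical, and the critical value is not computed but must be read from the F-table by the caller. The data and the critical value 4.21 for (7, 6) degrees of freedom are those of the example that follows.

```python
from statistics import variance  # unbiased sample variance (divisor n - 1)


def f_test_equal_variances(sample1, sample2, critical_value):
    """Return (F, df, decision) for the variance-ratio test.

    critical_value is the tabulated F for the chosen level of significance
    and degrees of freedom; this sketch does not compute F-tables.
    """
    s1_sq = variance(sample1)
    s2_sq = variance(sample2)
    # Put the larger variance estimate in the numerator.
    if s1_sq >= s2_sq:
        f = s1_sq / s2_sq
        df = (len(sample1) - 1, len(sample2) - 1)
    else:
        f = s2_sq / s1_sq
        df = (len(sample2) - 1, len(sample1) - 1)
    decision = "accept H0" if f <= critical_value else "reject H0"
    return f, df, decision


sample_1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample_2 = [10, 12, 10, 14, 9, 8, 10]
f, df, decision = f_test_equal_variances(sample_1, sample_2, critical_value=4.21)
print(round(f, 3), df, decision)
```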
Example:
Two independent samples of 8 and 7 items respectively had the following values of variables.
Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10
Do the estimates of population variance differ significantly?
Solution:
Setting up Hypotheses:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, i.e., the two population variances do not differ significantly.
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, i.e., the two population variances differ significantly.
Level of significance (α): Since the level of significance is not given, we take α = 5% = 0.05.
Computations of $S_1^2$ and $S_2^2$
X1     d1 = X1 – 11    $d_1^2$     X2     d2 = X2 – 10    $d_2^2$
 9         –2            4         10          0            0
11          0            0         12          2            4
13          2            4         10          0            0
11          0            0         14          4           16
15          4           16          9         –1            1
 9         –2            4          8         –2            4
12          1            1         10          0            0
14          3            9
       Σd1 = 6     Σ$d_1^2$ = 38          Σd2 = 3     Σ$d_2^2$ = 25
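Now,
\[ S_1^2 = \frac{1}{n_1 - 1}\left[\sum d_1^2 - \frac{(\sum d_1)^2}{n_1}\right] = \frac{1}{8 - 1}\left[38 - \frac{(6)^2}{8}\right] = 4.786 \]
\[ S_2^2 = \frac{1}{n_2 - 1}\left[\sum d_2^2 - \frac{(\sum d_2)^2}{n_2}\right] = \frac{1}{7 - 1}\left[25 - \frac{(3)^2}{7}\right] = 3.952 \]
Test Statistic:
Under H0, the test statistic is
\[ F = \frac{S_1^2}{S_2^2} = \frac{4.786}{3.952} = 1.211 \quad (\because S_1^2 > S_2^2) \]
∴ F = 1.211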
Degree of freedom: df = (n1 – 1, n2 – 1) = (8 – 1, 7 – 1) = (7, 6)
Critical value: The critical or tabulated value of the test statistic F at 5% level of significance
for (7, 6) degree of freedom is 4.21, i.e., F0.05,(7,6) = 4.21.
Decision: Since the calculated value of F = 1.211 is less than the tabulated value of
F0.05,(7,6) = 4.21, H0 is accepted. Hence we conclude that the two population variances do not differ
significantly.
Example:
Two random samples were drawn from two normal populations and their values are:
A: 66 67 75 76 82 84 88 90 92
B: 64 66 74 78 82 85 87 92 93 95 97
Test whether the two populations have the same variance at the 5% level of significance.
Solution:
Setting up Hypotheses:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, i.e., the two populations have the same variance.
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, i.e., the two populations do not have the same
variance.
Level of significance (α): It is given that the level of significance α = 5% = 0.05.
Computations of $S_1^2$ and $S_2^2$
X1     $X_1 - \bar{X}_1$ = X1 – 80    $(X_1 - \bar{X}_1)^2$     X2     $X_2 - \bar{X}_2$ = X2 – 83    $(X_2 - \bar{X}_2)^2$
66         –14             196        64         –19             361
67         –13             169        66         –17             289
75          –5              25        74          –9              81
76          –4              16        78          –5              25
82           2               4        82          –1               1
84           4              16        85           2               4
88           8              64        87           4              16
90          10             100        92           9              81
92          12             144        93          10             100
                                      95          12             144
                                      97          14             196
ΣX1 = 720                  734        ΣX2 = 913                 1298
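Here, n1 = 9, n2 = 11
\[ \therefore \bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{720}{9} = 80, \qquad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{913}{11} = 83 \]
\[ S_1^2 = \frac{1}{n_1 - 1}\sum (X_1 - \bar{X}_1)^2 = \frac{734}{9 - 1} = 91.75 \]
\[ S_2^2 = \frac{1}{n_2 - 1}\sum (X_2 - \bar{X}_2)^2 = \frac{1298}{11 - 1} = 129.8 \]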
Test Statistic:
Under H0, the statistic is
\[ F = \frac{S_2^2}{S_1^2} = \frac{129.8}{91.75} = 1.415 \quad (\because S_2^2 > S_1^2) \]
∴ F = 1.415
Degree of freedom: df = (n2 – 1, n1 – 1) = (11 – 1, 9 – 1) = (10, 8)
Critical value: The critical or tabulated value of the test statistic F at 5% level of significance
for (10, 8) degree of freedom is 3.35, i.e., F0.05,(10,8) = 3.35 .
Decision: Since the calculated value of F = 1.415 is less than the tabulated value of
F0.05,(10,8) = 3.35, H0 is accepted. Hence we conclude that the two populations have the same
variances.
Example:
In a sample of 8 observations, the sum of the squared deviations of items from their mean was 94.5. In
another sample of 10 observations, the value was found to be 101.7. Test whether the difference in
sample variability is significant at 5% level of significance.
Solution:
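In usual notations, we are given:
n1 = 8, n2 = 10
\[ \sum (X_1 - \bar{X}_1)^2 = 94.5, \qquad \sum (X_2 - \bar{X}_2)^2 = 101.7 \]
\[ \therefore S_1^2 = \frac{1}{n_1 - 1}\sum (X_1 - \bar{X}_1)^2 = \frac{94.5}{8 - 1} = 13.5 \]
\[ S_2^2 = \frac{1}{n_2 - 1}\sum (X_2 - \bar{X}_2)^2 = \frac{101.7}{10 - 1} = 11.3 \]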
Setting up Hypotheses:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, i.e., there is no significant difference in the population
variances. In other words, the difference in sample variability is not significant.
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, i.e., there is a significant difference in the population
variances. In other words, the difference in sample variability is significant.
Level of significance (α): It is given that the level of significance α = 5% = 0.05.
Test Statistic: Under H0, the statistic is
\[ F = \frac{S_1^2}{S_2^2} = \frac{13.5}{11.3} = 1.195 \quad (\because S_1^2 > S_2^2) \]
∴ F = 1.195
Degree of freedom: df = (n1 – 1, n2 – 1) = (8 – 1, 10 – 1) = (7, 9)
Critical value: The critical or tabulated value of the test statistic F at 5% level of significance
for (7, 9) degree of freedom is 3.29, i.e., F0.05,(7, 9) = 3.29.
Decision: Since the calculated value of F = 1.195 is less than the tabulated value of
F0.05,(7,9) = 3.29, H0 is accepted. Hence we conclude that there is no significant difference in
the population variances. In other words, the difference in sample variability is not significant.
Example:
Test whether two populations have the same variance or not from the following:
Sample I                                   Sample II
n1 = 7                                     n2 = 6
$\sum (X_1 - \bar{X}_1)^2 = 320$           $\sum (X_2 - \bar{X}_2)^2 = 350$
Solution
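In usual notations, we are given:
Sample I                                   Sample II
n1 = 7                                     n2 = 6
$\sum (X_1 - \bar{X}_1)^2 = 320$           $\sum (X_2 - \bar{X}_2)^2 = 350$
\[ \therefore S_1^2 = \frac{1}{n_1 - 1}\sum (X_1 - \bar{X}_1)^2 = \frac{320}{7 - 1} = 53.33 \]
\[ S_2^2 = \frac{1}{n_2 - 1}\sum (X_2 - \bar{X}_2)^2 = \frac{350}{6 - 1} = 70 \]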
Setting up Hypotheses:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, i.e., the two populations have the same variance.
Alternative hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, i.e., the two populations do not have the same
variance.
Level of significance (α): Since the level of significance is not given we take α = 5% = 0.05.
Test Statistic: Under H0, the statistic is
\[ F = \frac{S_2^2}{S_1^2} = \frac{70}{53.33} = 1.313 \quad (\because S_2^2 > S_1^2) \]
∴ F = 1.313
Degree of freedom: df = (n2 – 1, n1 – 1) = (6 – 1, 7 – 1) = (5, 6)
Critical value: The critical or tabulated value of the test statistic F at 5% level of significance
for (5, 6) degree of freedom is 4.39, i.e., F0.05,(5, 6) = 4.39 .
Decision: Since the calculated value of F = 1.313 is less than the tabulated value of
F0.05,(5, 6) = 4.39, H0 is accepted. Hence we conclude that the two populations have the same
variances.
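When only the sample sizes and the sums of squared deviations are given, as in this example, the F statistic can be computed from those summaries alone. A minimal Python sketch (the function name `f_from_sums_of_squares` is hypothetical):

```python
def f_from_sums_of_squares(ss1, n1, ss2, n2):
    """Return (F, df) from sums of squared deviations and sample sizes."""
    s1_sq = ss1 / (n1 - 1)  # unbiased variance estimate, sample 1
    s2_sq = ss2 / (n2 - 1)  # unbiased variance estimate, sample 2
    # The larger estimate goes in the numerator, and df follow the same order.
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (n1 - 1, n2 - 1)
    return s2_sq / s1_sq, (n2 - 1, n1 - 1)


f, df = f_from_sums_of_squares(ss1=320, n1=7, ss2=350, n2=6)
print(round(f, 3), df)
```

Note that the degrees of freedom are ordered to match the numerator and denominator, which is why they come out as (5, 6) here rather than (6, 5).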
Example:
Can the following two samples be regarded as coming from the same normal population?
Sample    Size    Sample mean    Sum of squares of deviations from the mean
1         10      12             120
2         12      15             314
Solution:
In usual notations, we are given:
Sample I                                   Sample II
n1 = 10                                    n2 = 12
$\bar{X}_1 = 12$                           $\bar{X}_2 = 15$
$\sum (X_1 - \bar{X}_1)^2 = 120$           $\sum (X_2 - \bar{X}_2)^2 = 314$
Since the normal populations have two parameters: the mean µ and the variance σ2, to test
whether the samples are from same normal populations, we have to test:
(i) Equality of population means
(ii) Equality of population variances
Test of equality of two population means : (t- test)
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2, i.e., there is no significant difference between two population
means.
Alternative hypothesis H1:µ1 ≠ µ2 , i.e., there is significant difference between two
population means
Level of significance: Since the level of significance is not given, we take α = 5% = 0.05
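Test statistic: Under H0, the test statistic is
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
Where,
\[ S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2\right] = \frac{1}{10 + 12 - 2}[120 + 314] = \frac{434}{20} = 21.7 \]
\[ \therefore t = \frac{12 - 15}{\sqrt{21.7\left(\frac{1}{10} + \frac{1}{12}\right)}} = -1.504 \]
∴ | t | = 1.504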
Degree of freedom: df = n1 + n2 – 2 = 10 + 12 – 2 = 20
Critical value: The tabulated value of the test statistic t at 5% level of significance for 20
degree of freedom and in two tailed test is ± 2.086, i.e., | t0.05,20 | = 2.086.
Decision: Since the calculated value of the test statistic | t | = 1.504 is less than the tabulated
value of the test statistic | t0.05,20 | = 2.086, H0 is accepted. Hence we conclude that there is no
significant difference between two population means.
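Test of equality of two population variances: (F-test)
Level of significance: Since the level of significance is not given, we take α = 5% = 0.05.
Test statistic: Under H0: $\sigma_1^2 = \sigma_2^2$, the statistic is
\[ F = \frac{S_2^2}{S_1^2} \quad (\because S_2^2 > S_1^2) \]
Here,
\[ S_1^2 = \frac{1}{n_1 - 1}\sum (X_1 - \bar{X}_1)^2 = \frac{120}{10 - 1} = 13.33 \]
\[ S_2^2 = \frac{1}{n_2 - 1}\sum (X_2 - \bar{X}_2)^2 = \frac{314}{12 - 1} = 28.55 \]
\[ F = \frac{S_2^2}{S_1^2} = \frac{28.55}{13.33} = 2.14 \]
∴ F = 2.14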
Degree of freedom: df = (n2 – 1, n1 – 1) = (12 – 1, 10 – 1) = (11,9)
Critical value: The critical or tabulated value of the test statistic F at 5% level of significance
for (11, 9) degree of freedom is 3.105, i.e., F0.05,(11,9) = 3.105.
Decision: Since the calculated value of F = 2.14 is less than the tabulated value of
F0.05,(11,9) = 3.105, H0 is accepted. Hence we conclude that there is no significant difference in
the population variances.
Since both the null hypotheses H0: µ1 = µ2 and H0: $\sigma_1^2 = \sigma_2^2$ are accepted, we conclude that the
two samples can be regarded as coming from the same normal population.
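Both tests in this example use only summary figures, so they can be reproduced with a short Python sketch. The function names are hypothetical, and the critical values 2.086 and 3.105 are the tabulated values quoted in the solution above.

```python
from math import sqrt


def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Two-sample t statistic with pooled variance, from summary figures.

    ss1 and ss2 are sums of squared deviations from the sample means.
    """
    sp_sq = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def variance_ratio(ss1, n1, ss2, n2):
    """F statistic with the larger variance estimate in the numerator."""
    s1_sq, s2_sq = ss1 / (n1 - 1), ss2 / (n2 - 1)
    return max(s1_sq, s2_sq) / min(s1_sq, s2_sq)


t, df_t = pooled_t(12, 120, 10, 15, 314, 12)
f = variance_ratio(120, 10, 314, 12)

# Same population only if neither null hypothesis is rejected.
same_population = abs(t) <= 2.086 and f <= 3.105
print(round(abs(t), 3), df_t, round(f, 2), same_population)
```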
\[ F = \frac{\text{Variance between samples}}{\text{Variance within samples}} = \frac{MSC}{MSE} \]
Calculations of variance between samples, i.e., due to columns (MSC), and variance within
samples, i.e., due to errors (MSE)
Variance between Samples (MSC)
The variance between samples gives an estimate of σ2 based on the variation among the
means of samples taken from different populations. To calculate the variance between samples,
we first compute the sum of squares between samples, i.e., the sum of squares due to columns (SSC),
as follows:
i. Calculate the sample means $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_c$ of all c samples.
ii. Calculate the grand mean as
\[ \bar{\bar{X}} = \frac{\text{sum of observations of all samples}}{\text{number of observations}} \]
iii. Calculate: $(\bar{X}_1 - \bar{\bar{X}}), (\bar{X}_2 - \bar{\bar{X}}), \dots, (\bar{X}_c - \bar{\bar{X}})$
iv. The SSC is calculated as
\[ SSC = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_c(\bar{X}_c - \bar{\bar{X}})^2 \]
Then, the mean sum of squares between samples, MSC, is obtained on dividing SSC by its
degrees of freedom. That is
\[ MSC = \frac{SSC}{c - 1} \]
Variance within Samples (MSE)
The variance within samples gives an estimate of σ2 based on the variation within the data of
the different samples. To calculate the variance within samples, we first compute the sum of squares
within samples, i.e., the sum of squares due to error (SSE), as follows:
i. Calculate the sample means $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_c$ of all c samples.
ii. Calculate the deviations of the observations of each sample from its own mean:
$(X_1 - \bar{X}_1), (X_2 - \bar{X}_2), \dots, (X_c - \bar{X}_c)$
iii. The SSE is calculated as
\[ SSE = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \dots + \sum (X_c - \bar{X}_c)^2 \]
Then, the mean sum of squares within samples (MSE) is obtained on dividing SSE by the
corresponding degrees of freedom. That is
\[ MSE = \frac{SSE}{d.f.} = \frac{SSE}{n - c} \]
Where, n = total number of observations, c = number of groups
The sum of squares within samples is also called the sum of squares due to errors.
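The direct method described above can be sketched in Python as follows. The function name `one_way_anova_direct` and the returned dictionary layout are assumptions for illustration; the three groups of five observations are illustrative data.

```python
def one_way_anova_direct(groups):
    """Direct method: SSC from group means, SSE from within-group deviations."""
    c = len(groups)                          # number of samples
    n = sum(len(g) for g in groups)          # total number of observations
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-samples sum of squares, weighted by the sample sizes.
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-samples sum of squares: deviations from each sample's own mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msc = ssc / (c - 1)
    mse = sse / (n - c)
    return {"SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse, "F": msc / mse}


groups = [[4, 4, 7, 7, 8], [3, 4, 5, 6, 7], [6, 7, 7, 7, 8]]  # illustrative data
result = one_way_anova_direct(groups)
print({k: round(v, 3) for k, v in result.items()})
```

The calculated F is then compared with the tabulated F for (c – 1, n – c) degrees of freedom.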
One-Way ANOVA table
Source of Variation (S.V)    Degree of freedom (d.f)    Sum of squares (SS)    Mean sum of squares (MSS)    F-ratio        Fα
Between samples (columns)    c – 1                      SSC                    MSC = SSC/(c – 1)            F = MSC/MSE    Fα,(c – 1, n – c) df
Within samples (errors)      n – c                      SSE                    MSE = SSE/(n – c)
Total                        n – 1                      TSS
Step 4: Degree of freedom (d.f)
The degree of freedom is d.f. = (c – 1, n – c).
Step 5: Critical value
The critical or tabulated value of the test statistic F at the pre-specified level of significance
for (c – 1, n – c) degree of freedom is obtained from the F - table.
Step 6: Decision
i. If the calculated value of F is less than or equal to the tabulated value of F, then we accept
H0, i.e. the population means of c independent populations are equal.
ii. If the calculated value of F is greater than the tabulated value of F, then we reject H0, i.e.
the population means of c independent populations are unequal.
Example:
Three randomly selected groups of chickens are fed on three different diets. Each group
consists of five chickens. Their weight gains during a specified period of time are as follows:
Diet I 4 4 7 7 8
Diet II 3 4 5 6 7
Diet III 6 7 7 7 8
Test the hypothesis that mean gains of weights due to the three diets are equal.
Solution:
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2 = µ3, i.e., there is no significant difference in mean weight gains
due to different diets.
Alternative hypothesis H1: µ1 ≠ µ2 ≠ µ3, i.e., there is significant difference in mean weight
gains due to different diets.
Level of significance:
Since the level of significance is not given, we take α = 0.05
Test Statistic: Under H0, the test statistic is
\[ F = \frac{MSC}{MSE} \]
Where,
MSC = mean sum of squares between samples (due to columns) = SSC/(c – 1)
MSE = mean sum of squares within samples (due to errors) = SSE/(n – c)
c = Number of samples,
n = Total number of observations
Calculations of MSC and MSE
SSE = 14 + 10 + 2 = 26
Now,
\[ MSC = \frac{SSC}{c - 1} = \frac{10}{3 - 1} = \frac{10}{2} = 5 \]
\[ MSE = \frac{SSE}{n - c} = \frac{26}{15 - 3} = \frac{26}{12} = 2.17 \]
Source of Variation (S.V)    Degree of freedom    Sum of squares    Mean sum of squares    F-ratio                Fα
Between samples              3 – 1 = 2            SSC = 10          MSC = 10/2 = 5         F = MSC/MSE = 2.304    F0.05,(2,12) = 3.89
Within samples               15 – 3 = 12          SSE = 26          MSE = 26/12 = 2.17
Total                        15 – 1 = 14          TSS = 36
Degree of freedom: df = (c – 1, n – c) = (2,12)
Critical value: The tabulated value of the test statistic F at 5% level of significance for
(2, 12) degree of freedom is 3.89, i.e., F0.05,(2,12) = 3.89.
Decision: Since the calculated value of F = 2.304 is less than the tabulated value of
F0.05,(2,12) = 3.89, H0 is accepted. Hence we conclude that there is no significant difference in
mean weight gains due to different diets.
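The sums of squares used in this solution can be checked by hand from the three diet samples with the direct method. The following worked arithmetic is a reconstruction from the given observations:

```latex
\begin{align*}
\bar{X}_1 &= \tfrac{30}{5} = 6, \quad \bar{X}_2 = \tfrac{25}{5} = 5, \quad
\bar{X}_3 = \tfrac{35}{5} = 7, \quad \bar{\bar{X}} = \tfrac{90}{15} = 6 \\
SSC &= 5(6-6)^2 + 5(5-6)^2 + 5(7-6)^2 = 0 + 5 + 5 = 10 \\
\textstyle\sum (X_1 - \bar{X}_1)^2 &= 4 + 4 + 1 + 1 + 4 = 14 \\
\textstyle\sum (X_2 - \bar{X}_2)^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
\textstyle\sum (X_3 - \bar{X}_3)^2 &= 1 + 0 + 0 + 0 + 1 = 2 \\
SSE &= 14 + 10 + 2 = 26, \qquad TSS = SSC + SSE = 36
\end{align*}
```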
2. Short-Cut Method
The following steps are required under the short-cut method for one-way classification.
Step 1: Setting hypotheses
Null hypothesis H0 : µ1 = µ2 = ... = µc, i.e., the c independent population means are equal.
Alternative hypothesis: H1: µ1 ≠ µ2 ≠ ... ≠ µc, i.e., c independent population means are not
equal. In other words, at least two means of the populations are not equal.
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
Step 3: Test Statistic
Under H0, the test statistic is given by
Variance between samples
F=
Variance within samples
MSC
F=
MSE
The test statistic is obtained through the following steps:
i. Find the sum of the values of observations of all the c samples and denote it by T.
T = ∑ X1 + ∑ X2 + ..... + ∑ Xc
ii. Find the correction factor (C.F.) as
\[ C.F. = \frac{T^2}{n}, \quad \text{where } n = n_1 + n_2 + \dots + n_c \]
iii. Find the total sum of squares (TSS) as
\[ TSS = \left[\sum X_1^2 + \sum X_2^2 + \dots + \sum X_c^2\right] - \frac{T^2}{n} = \sum\sum x_{ij}^2 - C.F. \]
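iv. Find the sum of squares between samples (due to columns) as
\[ SSC = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_c)^2}{n_c} - \frac{T^2}{n} = \sum_i \frac{T_i^2}{n_i} - C.F. \]
where $T_i = \sum X_i$ is the total of the i-th sample.
v. Find the sum of squares within samples (due to errors) as
SSE = TSS – SSC
Then the MSC and MSE are obtained as
\[ MSC = \frac{SSC}{c - 1}, \qquad MSE = \frac{SSE}{n - c} \]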
One-Way ANOVA table
Source of Variation (S.V)    Degree of freedom (d.f)    Sum of squares (SS)    Mean sum of squares (MSS)    F-ratio        Fα
Between samples (columns)    c – 1                      SSC                    MSC = SSC/(c – 1)            F = MSC/MSE    Fα,(c – 1, n – c) df
Within samples (errors)      n – c                      SSE                    MSE = SSE/(n – c)
Total                        n – 1                      TSS
Step 4: Degree of freedom (d.f)
The degree of freedom is d.f. = (c – 1, n – c).
Step 5: Critical value
The critical or tabulated value of the test statistic F at the pre-specified level of significance
for (c – 1, n – c) degree of freedom is obtained from the F - table.
Step 6: Decision
i. If the calculated value of F is less than or equal to the tabulated value of F, then we accept
H0, i.e., the population means of c independent populations are equal.
ii. If the calculated value of F is greater than the tabulated value of F, then we reject H0, i.e.,
the population means of c independent populations are unequal.
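The short-cut method can be sketched in Python as below. The function name `one_way_anova_shortcut` and the data, three hypothetical samples of unequal size, are assumptions chosen so the arithmetic is easy to follow by hand.

```python
def one_way_anova_shortcut(groups):
    """Short-cut method: T, C.F., TSS, SSC, then SSE = TSS - SSC."""
    c = len(groups)
    n = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)                    # T
    cf = total ** 2 / n                                    # correction factor
    tss = sum(x * x for g in groups for x in g) - cf       # total sum of squares
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between samples
    sse = tss - ssc                                        # within samples
    msc = ssc / (c - 1)
    mse = sse / (n - c)
    return {"T": total, "CF": cf, "TSS": tss, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse, "df": (c - 1, n - c)}


groups = [[2, 4, 6], [5, 7, 9, 11], [1, 3]]  # hypothetical data
res = one_way_anova_shortcut(groups)
print(res)
```

Here T = 48, C.F. = 256, TSS = 86, SSC = 56 and SSE = 30, so F = 28/5 = 5.6 with (2, 6) degrees of freedom. The same SSE is obtained by summing squared deviations within each sample (8 + 20 + 2).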
Example:
The following data represent the number of units of production per day turned out by 5
different workers using four different types of machines.
Machine type (rows) / Worker (columns)    1     2     3     4     5
A 44 46 34 33 38
B 38 40 36 38 42
C 47 52 44 46 49
D 36 43 32 33 39
Test whether the mean productivity is the same for the four different machine types.
Solution:
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2 = µ3 = µ4, i.e., there is no significant difference in mean
productivity for the different machine types.
Alternative hypothesis H1: µ1 ≠ µ2 ≠ µ3 ≠ µ4, i.e., there is a significant difference in mean
productivity for the different machine types.
Level of significance:
Since the level of significance is not given, we take α = 0.05
Test statistic: Under H0, the test statistic is
\[ F = \frac{MSC}{MSE} \]
Where,
MSC = mean sum of squares between samples (due to columns) = SSC/(c – 1)
MSE = mean sum of squares within samples (due to errors) = SSE/(n – c)
c = Number of samples,
n = Total number of observations
Calculations of MSC and MSE
XA     XB     XC     XD     $X_A^2$    $X_B^2$    $X_C^2$    $X_D^2$
44     38     47     36     1936       1444       2209       1296
46     40     52     43     2116       1600       2704       1849
34     36     44     32     1156       1296       1936       1024
33     38     46     33     1089       1444       2116       1089
38     42     49     39     1444       1764       2401       1521
∑XA = 195, ∑XB = 194, ∑XC = 238, ∑XD = 183
∑$X_A^2$ = 7741, ∑$X_B^2$ = 7548, ∑$X_C^2$ = 11366, ∑$X_D^2$ = 6779
Here, nA = 5, nB = 5, nC = 5, nD = 5, c = 4, n = nA + nB + nC + nD = 20
i. The total sum of all observations is obtained as
T = ∑XA + ∑XB + ∑XC + ∑XD = 195 + 194 + 238 + 183 = 810
ii. The correction factor is obtained as
\[ C.F. = \frac{T^2}{n} = \frac{(810)^2}{20} = 32805 \]
iii. The total sum of squares is obtained as
\[ TSS = \sum X_A^2 + \sum X_B^2 + \sum X_C^2 + \sum X_D^2 - C.F. = 7741 + 7548 + 11366 + 6779 - 32805 = 629 \]
iv. The sum of squares between samples is obtained as
\[ SSC = \frac{(\sum X_A)^2}{n_A} + \frac{(\sum X_B)^2}{n_B} + \frac{(\sum X_C)^2}{n_C} + \frac{(\sum X_D)^2}{n_D} - C.F. = \frac{(195)^2}{5} + \frac{(194)^2}{5} + \frac{(238)^2}{5} + \frac{(183)^2}{5} - 32805 = 353.8 \]
v. The sum of squares within samples is obtained as
SSE = TSS – SSC = 629 – 353.8 = 275.2
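From SSC and SSE, the mean squares and the F-ratio follow by dividing by the respective degrees of freedom. A minimal Python sketch of that last step (the function name `anova_f_from_sums` is hypothetical):

```python
def anova_f_from_sums(ssc, sse, c, n):
    """Return (MSC, MSE, F) given the sums of squares, c samples, n observations."""
    msc = ssc / (c - 1)   # between-samples mean square, c - 1 df
    mse = sse / (n - c)   # within-samples mean square, n - c df
    return msc, mse, msc / mse


msc, mse, f = anova_f_from_sums(ssc=353.8, sse=275.2, c=4, n=20)
print(round(msc, 2), round(mse, 2), round(f, 2))
```

With c = 4 and n = 20 the degrees of freedom are (3, 16), and the resulting F of about 6.86 is the calculated value used in the decision.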
Critical value:
The tabulated value of the test statistic F at 5% level of significance for (3, 16) degree of
freedom is 3.24, i.e., F0.05,(3,16) = 3.24.
Decision:
Since the calculated value of F = 6.86 is greater than the tabulated value of F0.05,(3,16) = 3.24,
H0 is rejected and H1 is accepted. Hence we conclude that there is a significant difference in
the mean productivity of the different machine types.
3. Coding Method
When the magnitude of each observation is very large, the direct and short-cut methods
become tedious and time consuming. In this situation we adopt the coding method. Coding
here means adding a constant to, subtracting a constant from, multiplying, or dividing every
observation by a constant. The F-ratio is unchanged by such coding, so the analysis of variance
is carried out on the coded data using the steps of the short-cut method.
Example:
The following data give the weekly sales records (in Rs.) of three salesmen A, B, and C of a company
during 13 sale-calls:
A 300 400 300 500
B 600 300 300 400
C 700 300 400 600 500
Test whether the mean weekly sales of three salesmen are different.
Solution:
Setting up hypotheses:
Null hypothesis H0: µ1 = µ2 = µ3, i.e., there is no significant difference in mean weekly sales
of three salesmen.
Alternative hypothesis H1: µ1 ≠ µ2 ≠ µ3, i.e., there is significant difference in mean weekly
sales of three salesmen.
Level of significance:
Since the level of significance is not given, we take α = 0.05
Test statistic: Under H0, the test statistic is F = MSC/MSE,
where MSC = mean sum of squares between samples (due to columns) = SSC/(c – 1)
and MSE = mean sum of squares within samples (due to errors) = SSE/(n – c).
The given data can be coded by subtracting 300 from each observation and dividing by 100; the
data then reduce to the following:
Calculations of MSC and MSE
XA    XB    XC    XA²   XB²   XC²
0     3     4     0     9     16
1     0     0     1     0     0
0     0     1     0     0     1
2     1     3     4     1     9
–     –     2     –     –     4
∑XA = 3   ∑XB = 4   ∑XC = 10   ∑XA² = 5   ∑XB² = 10   ∑XC² = 30
Here, nA = 4, nB = 4, nC = 5, c = 3 and n = nA + nB + nC = 4 + 4 + 5 = 13
i. The total sum of all observations is obtained as
T = ∑ XA + ∑XB + ∑ XC = 3 + 4 + 10 =17
ii. The correction factor is obtained as
C.F. = T²/n = (17)²/13 = 22.23
iii. The total sum of squares is obtained as
TSS = ∑XA² + ∑XB² + ∑XC² – C.F.
= 5 + 10 + 30 – 22.23 = 22.77
iv. The sum of squares between samples (due to columns) is obtained as
SSC = (∑XA)²/nA + (∑XB)²/nB + (∑XC)²/nC – C.F.
= (3)²/4 + (4)²/4 + (10)²/5 – 22.23 = 4.02
v. The sum of squares within samples (due to errors) is obtained as
SSE = TSS – SSC = 22.77 – 4.02 = 18.75
One-Way ANOVA table
S.V.              d.f.          SS      MSS                      F-ratio                            Fα
Between samples   3 – 1 = 2     4.02    MSC = 4.02/2 = 2.01      F = MSC/MSE = 2.01/1.875 = 1.072   F0.05,(2,10) = 4.10
Within samples    13 – 3 = 10   18.75   MSE = 18.75/10 = 1.875
Total             13 – 1 = 12   22.77
Degree of freedom: df = (c – 1, n – c) = (2,10)
Critical value: The tabulated value of the test statistic F at 5% level of significance for (2,10)
degrees of freedom is 4.10, i.e., F0.05,(2,10) = 4.10.
Decision: Since the calculated value of F = 1.072 is less than the tabulated value of
F0.05,(2,10) = 4.10, H0 is accepted. Hence we conclude that there is no significant difference in
the mean weekly sales of three salesmen.
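The short-cut and coding computations above can be checked with a small program. The following is a minimal sketch in Python, not part of the original text; the function name one_way_anova is hypothetical. It recomputes SSC, SSE and F for the coded data and shows that the raw data give the same F, which is why coding is permitted.

```python
def one_way_anova(groups):
    """Short-cut method for one-way ANOVA; returns (SSC, SSE, F)."""
    n = sum(len(g) for g in groups)        # total number of observations
    c = len(groups)                        # number of samples (columns)
    total = sum(sum(g) for g in groups)    # T, the sum of all observations
    cf = total ** 2 / n                    # correction factor C.F. = T^2 / n
    tss = sum(x * x for g in groups for x in g) - cf
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - ssc
    f = (ssc / (c - 1)) / (sse / (n - c))  # F = MSC / MSE
    return ssc, sse, f


# Weekly sales of salesmen A, B and C, as given in the example.
raw = [[300, 400, 300, 500], [600, 300, 300, 400], [700, 300, 400, 600, 500]]
# Coding: subtract 300 and divide by 100.
coded = [[(x - 300) / 100 for x in g] for g in raw]

ssc, sse, f_coded = one_way_anova(coded)
_, _, f_raw = one_way_anova(raw)

print(round(ssc, 2), round(sse, 2), round(f_coded, 3))  # 4.02 18.75 1.072
print(round(f_raw, 3))                                  # 1.072
```

The sums of squares change with the coding (they are scaled by the square of the divisor), but the ratio F = MSC/MSE does not, so the decision is the same for coded and uncoded data.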
4.2.2 Two-way analysis of variance or two-way classification
In one-way analysis of variance we studied the effect of one factor on different
sample (treatment) groups. Under two-way analysis of variance we discuss the effect of
two factors. The data here are classified according to two different factors. For example, the
sales of a product may vary from salesman to salesman as well as from season to season.
Similarly, the production of garments may vary from machine to machine as well as from
machine operator to machine operator. In two-way analysis of variance (ANOVA):
Total variation = Variation due to columns + variation due to rows + variation due to
experimental error
i.e., Total sum of squares = sum of square due to columns + sum of square due to rows + sum
of square due to error
or, TSS = SSC + SSR + SSE
The following steps are used for carrying out two-way analysis of variance.
Step 1 : Setting up hypotheses
Null hypothesis
H0: µ1 = µ2 = .... = µr, i.e, there is no significant difference among 'r' rows.
H0: µ1 = µ2 = .... = µc, i.e, there is no significant difference among 'c' columns.
Alternative hypothesis
H1: µ1 ≠ µ2 ≠ .... ≠ µr, i.e., there is a significant difference among 'r' rows.
H1: µ1 ≠ µ2 ≠ .... ≠ µc, i.e., there is a significant difference among 'c' columns.
Step 2: Level of significance (α)
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
Step 3: Test Statistic
Under H0: µ1 = µ2 = .... = µr, the test statistic is
F = MSR/MSE ∼ F with (r – 1), (r – 1) × (c – 1) degrees of freedom
Under H0: µ1 = µ2 = .... = µc, the test statistic is
F = MSC/MSE ∼ F with (c – 1), (r – 1) × (c – 1) degrees of freedom
Where,
MSR = Mean sum of squares due to rows,
MSC = Mean sum of squares due to columns,
MSE = Mean sum of squares due to errors,
Calculations of MSR, MSC and MSE:
1. Find the sum of the values of all the items of all the samples and denote it by T, i.e.,
T = ∑∑Xij
2. Find the correction factor (C.F.) as C.F. = T²/n,
where n = total number of observations.
3. Find the sum of the squares of all items of all the samples, i.e., find ∑∑Xij².
4. Find the total sum of squares (TSS) as
TSS = ∑∑Xij² – C.F.
5. Find the sum of square due to rows (SSR) as
Sum of squares between rows (seasons) is
SSR = [(∑XS)² + (∑XM)² + (∑XW)²]/nj – C.F. = ∑Tr²/nj – C.F.
= (128² + 112² + 120²)/4 – 10800 = 32
Total 11 210
Decision:
i. Since the calculated value of FR = 0.71 is less than the tabulated value of F0.05,(2,6) = 5.14,
H0 is accepted. Hence we conclude that there is no significant difference in seasons as far
as sales are concerned.
ii. Since the calculated value of FC = 0.62 is less than the tabulated value of F0.05,(3,6) = 4.76,
H0 is accepted. Hence we conclude that the sales of different salesmen do not differ
significantly.
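The two-way computations can be sketched in code as well. The following Python sketch is an illustration only; the function name two_way_anova is hypothetical, and the data are assumed to be those of Practical Problem 19 (four salesmen in three seasons), whose seasonal totals 128, 112 and 120 and answers F = 0.71 and F = 0.62 agree with the figures of this example.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    table is a list of r rows, each holding c observations.
    """
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)                   # T
    cf = total ** 2 / n                                      # C.F. = T^2 / n
    tss = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf       # between rows
    col_totals = [sum(col) for col in zip(*table)]
    ssc = sum(t * t for t in col_totals) / r - cf            # between columns
    sse = tss - ssr - ssc                                    # TSS = SSC + SSR + SSE
    mse = sse / ((r - 1) * (c - 1))
    f_rows = (ssr / (r - 1)) / mse                           # FR = MSR / MSE
    f_cols = (ssc / (c - 1)) / mse                           # FC = MSC / MSE
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "FR": f_rows, "FC": f_cols}


# Rows: summer, monsoon, winter. Columns: salesmen A, B, C, D (assumed data).
sales = [[36, 36, 21, 35],
         [26, 28, 29, 29],
         [28, 29, 31, 32]]

result = two_way_anova(sales)
print(result["TSS"], result["SSR"], result["SSC"], result["SSE"])  # 210.0 32.0 42.0 136.0
print(round(result["FR"], 2), round(result["FC"], 2))              # 0.71 0.62
```

Both F values are then compared with the tabulated values F0.05,(2,6) = 5.14 and F0.05,(3,6) = 4.76, as in the decision above.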
Example:
Set up a two way ANOVA table for the data given below:
Pieces of field    Treatment
                   A     B     C     D
P                  45    40    38    37
Q                  43    41    45    38
R                  39    39    41    41
(Use coding method subtracting 40 from the given numbers)
Solution:
Setting up hypotheses:
i. H0: µA = µB = µC = µD, i.e., there is no significant difference in the average yield
between different treatments.
H1: µA ≠ µB ≠ µC ≠ µD, i.e., there is a significant difference in the average yield
between different treatments.
ii. H0: µP = µQ = µR, i.e., there is no significant difference in pieces of field as far as
yield is concerned.
H1: µP ≠ µQ ≠ µR, i.e., there is a significant difference in pieces of field as far as yield
is concerned.
Level of significance:
Since the level of significance is not given, we take α = 0.05.
Test Statistic:
Subtracting 40 from the given numbers, the coding table can be written as
Calculation of TSS, SSR, SSC and SSE
Fields \ Treatments   XA        XB        XC        XD         Row total (Tr)
XP                    5         0         –2        –3         ∑XP = 0
XQ                    3         1         5         –2         ∑XQ = 7
XR                    –1        –1        1         1          ∑XR = 0
Column total (Tc)     ∑XA = 7   ∑XB = 0   ∑XC = 4   ∑XD = –4   T = 7
Here, n = 12
Correction factor (C.F.) = T²/n = 7²/12 = 4.083
Total sum of squares (TSS) is
TSS = ∑∑Xij² – C.F.
= 5² + 0² + (–2)² + (–3)² + 3² + 1² + 5² + (–2)² + (–1)² + (–1)² + 1² + 1² – 4.083 = 76.917
Sum of squares between columns (treatments) is
SSC = [(∑XA)² + (∑XB)² + (∑XC)² + (∑XD)²]/ni – C.F. = ∑Tc²/ni – C.F.
= [7² + 0² + 4² + (–4)²]/3 – 4.083 = 22.917
Sum of squares between rows (pieces of field) is
Theoretical Questions
1. Explain the applications of F-tests stating the assumptions involved.
2. Discuss how you would test the significance of the difference between two independent estimates of
population variances.
3. Discuss the variance ratio test stating the assumptions involved.
4. Describe the F-test for testing of equality of two population variances. State clearly the
assumptions involved.
5. What do you understand by analysis of variance (ANOVA) ? What are its basic assumptions?
6. Describe briefly the procedure followed in one-way ANOVA.
7. Describe briefly the procedure followed in two-way ANOVA.
Practical Problems
1. Test the hypothesis whether the two samples have been taken from two populations with same
variance.
Samples I 75 74 86 82 72 76 80
Samples II 105 100 115 119 106 110 115 111 107
2. Two horses A and B were tested according to the time (in seconds) to run a particular track
with the following results:
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same variability in running capacity.
3. In a test given to two groups of students drawn from two normal populations, the marks
obtained were as follows:
Group A 18 20 36 50 49 36 34 49 41
Group B 29 28 26 35 30 44 46
Obtain estimates of the variances of populations and examine at 5% level, whether the two
populations have the same variance.
4. Obtain the estimates of the population variances and test whether the populations have same
variances or not from the following
Sample I:  n1 = 10, ∑(X1 – X̄1)² = 120 hrs
Sample II: n2 = 12, ∑(X2 – X̄2)² = 314 hrs
5. An engineer measured the length (in ft.) of 16 GI pipes produced by two factories, A and B, with
the same measuring instrument. The mean and variance were obtained as follows:
Mean Variance
Factory A 20.33 1.54
Factory B 22.4 2.96
Do these data present sufficient evidence to indicate a difference in variability of length of the
pipes of the two factories A and B?
6. Two random samples of 10 and 12 students are drawn from a government school and a private
school, and the marks obtained in the examination are as follows:
Govt. School 45 40 35 50 70 25 50 66 53 45
Pvt. School 65 45 70 60 65 50 60 70 65 65 75 65
(a) Test whether (i) the difference in variability of marks of students from government school
and private boarding school is significant, (ii) the difference in mean marks of students
from government school and private boarding school is significant.
(b) Are the marks of students of government school and private boarding school significantly
different?
7. Two new drugs A and B are given to two independent and random groups of 10 and 12
patients with heart disease, drawn from two normal population respectively. The reduction of
blood pressures due to the drugs A and B recorded are as follows:
Drug A 7 16 14 9 10 11 6 8 10 9
Drug B 10 12 16 14 11 12 13 8 12 15 9 12
(i) Is the difference in variability of reduction of blood pressures due to drugs A and B
significant?
(ii) Does the reduction of blood pressure due to drug A show significantly higher
variability than that due to drug B?
8. The mean diameter of a steel pipe produced by two processes, A and B, is practically the same but
the standard deviation may differ. For a sample of 22 pipes produced by A, the standard deviation
is 2.9 m, while for a sample of 16 pipes produced by B, the standard deviation is 3.8 m. Test
whether the pipes produced by process A have the same variability as those of process B.
9. Two independent samples of sizes 9 and 8 gave the sums of squares of deviations from their
respective means as 160 and 91 respectively. Can the samples be regarded as drawn from
normal populations with equal variances?
10. The average rates of return recorded by an investor are as follows:
Financial Sector Manufacturing Sector Hotels
6 2 1
5 3 2
7 1 3
10 5 1
12 4 3
Is there any significant difference in the average return due to the sectors? Test the
hypothesis at 5% level of significance.
11. The following table represents the sales of three salesmen in four districts.
Sales Figures ('000)
Districts
A B C
Kathmandu 14 20 16
Lalitpur 12 23 15
Bhaktapur 10 20 10
Palpa 8 18 12
Test whether there is any significant difference in the sales by the three salesmen.
12. Sales men in various sectors are assigned to increase the sales. Data are recorded as below:
Region
I II III
20 30 25
80 40 50
50 30 40
60 40 30
70 50 40
At 5% level of significance, test whether there is significant difference in sales due to the regions.
13. Four groups of students were subjected to different teaching methods and tested at the end of a
specified period of time. As a result of dropouts from the experimental groups, the number of
students varied from group to group. Do the data shown in the table present sufficient evidence
to indicate a difference in the mean achievement for the four teaching methods? Use α = 0.05.
Group 1 65 87 73 79 81 69
Group 2 75 69 83 81 72 79 90
Group 3 59 78 67 62 83 76
Group 4 94 89 80 88
14. A trucking company wishes to test the average life of each of four brands of tyres. The
company uses all brands on randomly selected trucks. The records showing the
lives (thousands of miles) of tyres are given as follows:
Brand I 20 23 18 17
Brand II 19 15 17 20 16
Brand III 21 19 20 17 16
Brand IV 15 17 16 18
Test the hypothesis that the average life for each brand of tyres is the same.
15. The following table shows the lives in hours of four batches of electric lamps:
A 1600 1610 1650 1680 1700 1720 1800
B 1580 1640 1640 1700 1750
C 1460 1550 1600 1620 1640 1660 1740 1820
D 1510 1520 1530 1570 1600 1680
Perform an analysis of variance of these data.
16. A research company has designed three different systems to clean up oil spills. The following
table contains the results, measured by how much surface area (in sq.m) is cleaned per hour.
Are the three systems equally effective? Use 5% level of significance.
System A System B System C
55 57 66
60 53 52
63 64 61
56 49 57
59 62 -
55 - -
Total 348 285 236
Grand total 869
17. The labor productivity indices of Nepal are recorded below:
Year
Sector
1985 1990 1995
Agriculture 100 125 67
Manufacturing 100 60 68
Community and Social Service 100 89 80
Does the labor productivity index vary due to the difference in sector as well as difference in
time period?
18. Perform a two-way analysis of variance, using 5% level of significance.
Treatments
Plots of Land
A B C D
I 38 40 41 39
II 45 42 49 36
III 40 38 42 42
Use the coding method, subtracting 40 from the given numbers.
19. A company appoints four salesmen A, B, C and D and observes their sales in three seasons:
summer, monsoon and winter. The figures of their sales (in thousands Rs.) are given below:
Salesmen
A B C D
Season
Summer 36 36 21 35
Monsoon 26 28 29 29
Winter 28 29 31 32
Test whether (i) the effect of seasons and (ii) the effect of salesmen are significant on the sales of
the company.
20. The following table gives the number of televisions sold by 4 salesmen in three months May,
June and July:
Salesmen
Month
A B C D
May 50 36 21 35
June 46 28 29 29
July 39 29 31 32
Is there a significant difference in the sales made by the four salesmen? Is there a significant
difference in the sales made during different months?
21. Set up an analysis of variance for the following two-way design results:
Per acre Production data for wheat
Varieties of seeds A B C
W 6 5 5
X 7 5 4
Y 3 3 3
Z 8 7 4
22. The yields of three varieties of wheat using four different kinds of fertilizers are given in the table
below:
Fertilizer treatment Variety of Wheat
Total
v1 v2 v3
t1 64 72 74 210
t2 55 57 47 159
t3 59 66 58 183
t4 58 57 53 168
Total 236 252 232 720
Test the hypothesis at 0.05 level of significance that
(a) There is no difference in the average yield of wheat when different kinds of fertilizer are used
(b) There is no difference in the average yield of the three varieties of wheat.
23. The following data represent the sale (Rs. '000) per month of three brands of fairness cream
allocated among three cities:
Cities
A B C
Brands
I 12 48 30
II 42 54 57
III 9 42 21
Test whether (i) the mean sales of the three brands are equal and (ii) the mean sales of fairness
cream in each city are equal.
24. Four trained operators work on four machines in the production of a new product. The productivity
of the operators and machines is recorded below:
Machines
Operators
1 2 3 4
1 10 12 14 16
2 12 11 13 16
3 14 15 12 11
4 16 10 17 17
Test whether difference in average productivity is due to the difference in operators or the machines.
25. A company appoints four salesmen, A, B, C and D and observes their sales in three regions.
The figures are given below:
Salesmen
Region A B C D
X 164 155 159 158
Y 172 157 166 157
Z 174 147 158 153
Find out if there is difference in the sales of different salesmen and regions. (Test at 1% level)
26. To study the performance of three detergents and three different water temperatures, the
following whiteness readings were obtained with specially designed equipment.
Water temp Detergent A Detergent B Detergent C
Cold 57 55 67
Warm 49 52 60
Hot 54 46 56
Perform a two way analysis of variance using 5% level of significance.
27. The following information gives the number of units of a product produced by three different
types of machines:
Total sum of squares = 1128
Sum of squares between machines = 448
No. of units produced = 9
Perform one-way ANOVA for the given data.
28. Following information gives the yields for three varieties of wheat grown on four plots:
Total sum of squares = 32
Sum of square between varieties of wheat = 8
Sum of square between plots = 18
Perform two-way ANOVA.
1. F = 1.42, accept H0 2. F = 1.03 accept H0
3. S1² = 141.75, S2² = 64.33, F = 2.03 accept H0
4. S1² = 13.33, S2² = 28.55, F = 2.14 accept H0
5. F = 1.92 accept H0
6. (a) (i) F = 2.63 accept H0 (ii) |t| = 3.25 reject H0 (b) H1 : µPB > µGov
7. (i) F = 1.05 accept H0 (ii) F = 1.05 accept H0 : σA² = σB²
8. F = 1.72, accept H0 9. F = 1.54 accept H0
10. F = 12.92 reject H0 11. F = 0.55 accept H0
12. F = 2.468 accept H0 13. F = 3.76 reject H0
14. F = 1.67 accept H0 15. F = 2.15 accept H0
16. F = 0.17 accept H0 17. (i) F = 3.71 accept H0, (ii) F = 0.16 accept H0
18. (i) F = 1.312 accept H0, (ii) F = 1.218 accept H0
19. (i) F = 0.71 accept H0, (ii) F = 0.62 accept H0
20. (i) F = 1.02 accept H0, (ii) F = 3.33 accept H0
21. (i) F = 4 accept H0, (ii) F = 6 reject H0
22. (i) FR = 9.22, reject H0 (ii) Fc = 1.56, accept H0
23. (i) F = 9.4 reject H0, (ii) F = 10.3 reject H0
24. (i) F = 1.036; F = 0.621, H0 is accepted
25. FR = 1.55, accept H0 and Fc = 9.22, reject H0
26. FR = 5, accept H0 and Fc = 8.42, reject H0
27. F = 1.98 accept H0
28. (i) F = 4 accept H0, (ii) F = 6 reject H0
9
1. The test statistic F = 90 is used for testing the null hypothesis
21. Which of the following assumption is necessary for using ANOVA?
a. The population is continuous. b. The population has median.
c. The population is symmetric. d. All of the above.
22. The value of Tij is equal to
a. ∑QA<5 ∑PB<5 xAB b. ∑QA<5 ∑PB<5 xij2
RS RS
c. ∑QA<5 ∑PB<5 xAB − d. ∑QA<5 ∑PB<5 xAB − ∑QA<5
PQ PQ
a. H 0: p 1 < p 2 b. H 0: p 1 > p 2 c. H 0: p 1 ≠ p 2 d. H 0: p 1 = p 2
1. b 2. c 3. a 4. b 5. b 6. d 7. a 8. a 9. c 10. b
11. c 12. a 13. d 14. d 15. b 16. a 17. c 18. a 19. b 20. d
Unit
Non-Parametric Tests
5.1 Definition
Non-parametric tests are defined as "those statistical tests which do not depend on any
assumption about the form of the population i.e., those tests whose models do not specify the
conditions about the parameters of the parent population from which the sample has been drawn and
these tests are based on ordered sample observations or ordered statistics."
The non-parametric tests are also called distribution free tests because in these tests we make
no assumption about the distribution of the parent population i.e., the distribution is unspecified.
The measures of location and dispersion which are commonly used in N-P tests are the median,
quartiles, range, inter-quartile range etc., for which an ordered sample is desirable.
Though the N-P tests do not depend on assumptions about the form of the population, the
following are some basic assumptions associated with the N-P tests.
1. The sample observations are independent.
2. The variable under study is continuous.
3. pdf is continuous
4. Lower order moments exist.
5.2.3 Conditions for the validity of χ2 test
The χ2 test is used under the following assumptions or conditions:
1. The sample observations should be independent.
2. The total of the observed frequencies should be equal to the total of the expected
frequencies, i.e., ∑O = ∑E = N.
3. The total frequency should be reasonably large, say greater than 50.
4. No theoretical cell frequency should be less than 5. If any theoretical frequency is less than
5, then for the application of the chi-square test it is pooled with the preceding or succeeding
frequency so that the pooled frequency is more than 5, and the degrees of freedom are finally
adjusted for the degrees of freedom lost in pooling.
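The pooling rule in condition 4 can be illustrated with a short program. This is only a sketch under stated assumptions: the function name pool_small_cells and the frequencies are hypothetical, cells are merged from left to right with the succeeding cell, a short group left at the end is merged with the preceding one, and a pooled expected frequency of at least 5 is taken as sufficient.

```python
def pool_small_cells(observed, expected, minimum=5):
    """Pool adjacent cells until every expected frequency is at least `minimum`.

    Returns the pooled observed and expected frequencies and the number
    of degrees of freedom lost in pooling.
    """
    pooled_o, pooled_e = [], []
    acc_o = acc_e = 0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:           # group is large enough, close it
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o = acc_e = 0
    if acc_e > 0:                      # leftover small cells at the end
        if pooled_e:
            pooled_o[-1] += acc_o      # merge with the preceding group
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    lost_df = len(observed) - len(pooled_o)
    return pooled_o, pooled_e, lost_df


# Hypothetical frequencies, for illustration only (both total 100).
observed = [2, 6, 20, 30, 25, 12, 4, 1]
expected = [3, 8, 18, 29, 24, 14, 3, 1]

pooled_o, pooled_e, lost_df = pool_small_cells(observed, expected)
print(pooled_o)   # [8, 20, 30, 25, 17]
print(pooled_e)   # [11, 18, 29, 24, 18]
print(lost_df)    # 3
```

Here eight cells are reduced to five, so three degrees of freedom are lost in pooling and must be subtracted when the degrees of freedom of χ2 are found.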
Step 4: Degree of freedom The degree of freedom is obtained as
df = n1 – 1 – k1 – k2
Where,
1. 1 d.f. is lost because of the linear constraint ∑O = ∑ E = N.
2. k1 is the number of parameters computed from the given data and used in estimating the
theoretical frequencies of the distribution, if there is no need of computation of
parameters we take k1 as zero.
3. k2 is the number of d.f. lost in pooling of theoretical cell frequencies which are less than
5. If pooling is not necessary we take k2 as zero.
Step 5: Critical value
The tabulated value of χ2 for a given level of significance and degree of freedom (d.f.) is
obtained from the chi-square table.
Step 6: Decision
1. If the calculated value of χ2 is greater than the tabulated value of χ2, then we reject H0
and hence we conclude that there is a significant
difference between observed and expected frequencies. That is, the experiment does not
support the theory.
Example:
The following table lists the frequency distribution of cars sold at an auto dealership during
the past 10 months.
Months Jan Feb Mar Apr May Jun Jul Aug Sep Oct
Cars sold 23 17 15 10 14 12 13 15 26 25
Using the 5% significance level, will you conclude that the number of cars sold at this
dealership is the same for each month?
Solution:
Setting up hypothesis:
Null hypothesis H0: The number of cars sold at this dealership is the same for each month.
Alternative hypothesis H1: The number of cars sold at this dealership is not the same for
each month.
Level of significance: It is given that the level of significance α = 5%
Test statistic: Under H0 the test statistic is,
χ² = ∑ (O – E)²/E, where O = Observed frequency, E = Expected frequency
Here, n1 = 10. Under the null hypothesis, the expected number of cars sold in each month is
E = ∑O/n1 = 170/10 = 17
Calculations of χ2
Months O E O – E (O – E)² (O – E)²/E
Jan 23 17 6 36 2.118
Feb 17 17 0 0 0
Mar 15 17 –2 4 0.235
Apr 10 17 –7 49 2.882
May 14 17 –3 9 0.529
June 12 17 –5 25 1.471
July 13 17 –4 16 0.941
Aug 15 17 –2 4 0.235
Sept 26 17 9 81 4.765
Oct 25 17 8 64 3.765
∑O = 170 ∑E = 170 ∑(O – E)²/E = 16.94
∴ χ² = ∑ (O – E)²/E = 16.94
Degree of freedom: d.f = n1 – 1 = 10 – 1 = 9
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 9
degree of freedom is 16.92, i.e., χ2 0.05,9 = 16.92.
Decision: Since the calculated value of χ2 = 16.94 is greater than the tabulated value of
χ20.05,9 = 16.92 Ho is rejected and H1 is accepted. Hence we conclude that the number of cars
sold at this dealership is not the same for each month.
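The calculation in this example can be reproduced with a few lines of code. The following Python sketch is illustrative only; the function name chi_square_uniform is hypothetical, and the critical value 16.92 is simply copied from the chi-square table as quoted above.

```python
def chi_square_uniform(observed):
    """Chi-square goodness of fit when all expected frequencies are equal."""
    n1 = len(observed)                   # number of classes
    expected = sum(observed) / n1        # E = (sum of O) / n1 for every class
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = n1 - 1                          # no parameters estimated, no pooling
    return chi2, df


cars = [23, 17, 15, 10, 14, 12, 13, 15, 26, 25]   # Jan to Oct
chi2, df = chi_square_uniform(cars)
critical = 16.92                                  # tabulated value for 9 d.f. at 5%

print(round(chi2, 2), df)    # 16.94 9
print(chi2 > critical)       # True, so H0 is rejected
```

The same function applies to any example in which the null hypothesis gives every class the same expected frequency, such as the digits, ATM and die examples that follow.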
Example:
The following figures show the distribution of digits in numbers chosen at random from a
telephone directory:
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 925 875 965 900 935 950 875 800 875 900
Test the hypothesis that the digits are distributed randomly throughout a telephone directory.
Solution:
Setting up hypothesis:
Null hypothesis H0: The digits are randomly distributed throughout a telephone directory.
In other words, the digits are uniformly distributed throughout a telephone directory.
Alternative hypothesis H1: The digits are not randomly distributed throughout a telephone directory.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is χ² = ∑ (O – E)²/E,
where O = Observed frequency, E = Expected frequency
Here, n1 = 10. Under the null hypothesis, the expected frequency of each digit is
E = ∑O/n1 = 9000/10 = 900
Calculations of χ2
Digits O E O – E (O – E)² (O – E)²/E
0 925 900 25 625 0.694
1 875 900 – 25 625 0.694
2 965 900 65 4225 4.694
3 900 900 0 0 0
4 935 900 35 1225 1.361
5 950 900 50 2500 2.778
6 875 900 – 25 625 0.694
7 800 900 – 100 10000 11.111
8 875 900 – 25 625 0.694
9 900 900 0 0 0
∑O = 9000 ∑E = 9000 ∑(O – E)²/E = 22.72
∴ χ² = ∑ (O – E)²/E = 22.72
Degree of freedom: df = n1 – 1 = 10 – 1 = 9
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 9
degree of freedom is 16.92, i.e., χ2 0.05,9 = 16.92.
Decision: Since the calculated value of χ2 = 22.72 is greater than the tabulated value of
χ20.05,9 = 16.92, H0 is rejected and H1 is accepted. Hence we conclude that the digits are not
randomly distributed throughout a telephone directory.
Example:
A bank has an ATM installed inside the bank, and it is available to its customers only from 6
AM to 6 PM Sunday through Friday. The manager of the bank wanted to investigate if the number
of transactions made on this ATM is the same for each of the six days (Sunday to Friday) of the
week. She randomly selected one week and counted the number of transactions made on this ATM on
each of the six days during this week. The information she obtained is given in the following table:
Days Sun. Mon. Tue. Wed. Thu. Fri.
No. of user 253 217 224 279 267 260
At the 1% level of significance test the hypothesis that number of people who use this ATM
on each of the six days of the week is the same.
Solution:
Setting up hypothesis:
Null hypothesis H0: The number of people who use this ATM on each of the six days of the
week is the same.
Alternative hypothesis H1: The number of people who use this ATM on each of the six days
of the week is not the same.
Level of significance: It is given that the level of significance α = 1%
Test statistic: Under H0 the test statistic is,
χ² = ∑ (O – E)²/E
Where, O = Observed frequency, E = Expected frequency
Here, n1 = 6. Under the null hypothesis, the expected number of users on each day is
E = ∑O/n1 = 1500/6 = 250
Calculations of χ2
Days O E O – E (O – E)² (O – E)²/E
Sun 253 250 3 9 0.036
Mon 217 250 – 33 1089 4.356
Tue 224 250 – 26 676 2.704
Wed 279 250 29 841 3.364
Thu 267 250 17 289 1.156
Fri 260 250 10 100 0.400
∑O = 1500 ∑E = 1500 ∑(O – E)²/E = 12.02
∴ χ² = ∑ (O – E)²/E = 12.02
Degree of freedom: df = n1 – 1 = 6 – 1 = 5
Critical value: The tabulated value of the test statistic χ2 at 1% level of significance for 5
degree of freedom is 15.09, i.e., χ2 0.01,5 = 15.09.
Decision: Since the calculated value of χ2 = 12.02 is less than the tabulated value of χ20.01,5 =
15.09, Ho is accepted. Hence we conclude that the number of people who use this ATM on
each of the six days of the week is the same.
Example:
A die is tossed 240 times and the following results were obtained.
Faces 1 2 3 4 5 6
Frequency 45 35 40 37 41 42
Test the hypothesis that the die is unbiased.
Solution:
Setting up hypothesis:
Null hypothesis H0: The die is unbiased.
Alternative hypothesis H1: The die is not unbiased.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is,
χ² = ∑ (O – E)²/E
Where, O = Observed frequency, E = Expected frequency
Here, n1 = 6. Under the null hypothesis, the expected frequency of each face is
E = ∑O/n1 = 240/6 = 40
Calculations of χ2
Faces O E O – E (O – E)² (O – E)²/E
1 45 40 5 25 0.625
2 35 40 –5 25 0.625
3 40 40 0 0 0
4 37 40 –3 9 0.225
5 41 40 1 1 0.025
6 42 40 2 4 0.1
∑O = 240 ∑E = 240 ∑(O – E)²/E = 1.6
∴ χ² = ∑ (O – E)²/E = 1.6
Degree of freedom: df = n1 – 1 = 6 – 1 = 5
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 5
degree of freedom is 11.07, i.e., χ2 0.05,5 = 11.07.
Decision: Since the calculated value of χ2 = 1.6 is less than the tabulated value of χ20.05,5 =
11.07, H0 is accepted. Hence we conclude that the die is unbiased.
Example:
A sample analysis of examination results of 200 MBA's was made. It was found that 46
students had failed, 68 secured a first division, 62 secured a second division and the rest were
placed in third division. Are these figures commensurate with the general examination result,
which is in the ratio of 4: 3: 2: 1 for the various categories respectively?
Solution:
Setting up hypothesis:
Null hypothesis H0: The given data are commensurate with the general examination result, which
is in the ratio of 4: 3: 2: 1.
Alternative hypothesis H1: The given data are not commensurate with the general examination
result, which is in the ratio of 4: 3: 2: 1.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is χ² = ∑ (O – E)²/E,
where O = Observed frequency, E = Expected frequency
The expected frequencies for the different categories are
E (failed) = 4/(4 + 3 + 2 + 1) × 200 = 80
E (first division) = 3/(4 + 3 + 2 + 1) × 200 = 60
E (second division) = 2/(4 + 3 + 2 + 1) × 200 = 40
E (third division) = 1/(4 + 3 + 2 + 1) × 200 = 20
Calculations of χ2
Result         O     E     O – E   (O – E)²   (O – E)²/E
Failed         46    80    –34     1156       14.45
1st Division   68    60    8       64         1.067
2nd Division   62    40    22      484        12.1
3rd Division   24    20    4       16         0.8
∑O = 200   ∑E = 200   ∑(O – E)²/E = 28.42
∴ χ² = ∑ (O – E)²/E = 28.42
Degree of freedom: df = n1 – 1 = 4 – 1 = 3
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 3
degree of freedom is 7.82, i.e., χ2 0.05,3 = 7.82.
Decision: Since the calculated value of χ2 = 28.42 is greater than the tabulated value of
χ20.05,3 = 7.82, H0 is rejected and H1 is accepted. Hence we conclude that the given data are not
commensurate with the general examination result, which is in the ratio of 4: 3: 2: 1.
Example:
Among 64 offspring of a certain cross between guinea pigs, 34 were red, 10 were black and
20 were white. According to the genetic model these numbers should be in the ratio 9:3:4.
Are the data consistent with the model at 5% level of significance?
Solution:
Setting up hypothesis:
Null hypothesis H0: The proportions of colors of guinea pigs are in the ratio of 9:3:4. In
other words, the data are consistent with the genetic model.
Alternative hypothesis H1: The proportions of colors of guinea pigs are not in the ratio of
9:3:4. In other words, the data are not consistent with the genetic model.
Level of significance: It is given that the level of significance α = 5%
Test statistic: Under H0 the test statistic is χ² = ∑ (O – E)²/E,
where O = Observed frequency, E = Expected frequency
The expected frequencies for the different colors of pigs are
E (red) = 9/(9 + 3 + 4) × 64 = 36
E (black) = 3/(9 + 3 + 4) × 64 = 12
E (white) = 4/(9 + 3 + 4) × 64 = 16
Calculations of χ2
Colors O E O – E (O – E)² (O – E)²/E
Red 34 36 –2 4 0.111
Black 10 12 –2 4 0.333
White 20 16 4 16 1.000
∑O = 64 ∑E = 64 ∑(O – E)²/E = 1.44
∴ χ² = ∑ (O – E)²/E = 1.44
Degree of freedom: df = n1 – 1 = 3 – 1 = 2
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 2
degree of freedom is 5.99, i.e., χ2 0.05,2 = 5.99.
Decision: Since the calculated value of χ2 = 1.44 is less than the tabulated value of χ20.05,2 =
5.99, H0 is accepted. Hence we conclude that the proportions of colors of guinea pigs are in
the ratio of 9:3:4. In other words, the data are consistent with the genetic model.
Example:
In an experiment on pea breeding, Mendel obtained the following frequencies of seeds: 315
round and yellow, 101 wrinkled and yellow, 108 round and green, 32 wrinkled and green.
Theory predicts that the frequencies should be in the proportion 9:3:3:1 respectively. Is there
any evidence to doubt the theory at α = 0.05 level of significance?
Solution:
Setting up hypothesis:
Null hypothesis H0: The frequencies obtained by Mendel are in the proportion 9: 3: 3: 1. In other
words, the experiment supports the theory.
Alternative hypothesis H1: The frequencies obtained by Mendel are not in the proportion 9: 3: 3: 1.
In other words, the experiment does not support the theory.
Level of significance: It is given that the level of significance α = 5%
Test statistic: Under H0 the test statistic is,
χ² = ∑ (O – E)²/E
Where, O = Observed frequency, E = Expected frequency
Calculation of expected frequency:
E (round and yellow) = 9/(9 + 3 + 3 + 1) × 556 = 312.75
E (wrinkled and yellow) = 3/(9 + 3 + 3 + 1) × 556 = 104.25
E (round and green) = 3/(9 + 3 + 3 + 1) × 556 = 104.25
E (wrinkled and green) = 1/(9 + 3 + 3 + 1) × 556 = 34.75
Calculations of χ2
Mendel O E (O – E)² (O – E)²/E
Round and Yellow 315 312.75 5.0625 0.0162
Wrinkled and Yellow 101 104.25 10.5625 0.1013
Round and green 108 104.25 14.0625 0.1349
Wrinkled and green 32 34.75 7.5625 0.2176
∑O = 556 ∑E = 556 ∑(O – E)²/E = 0.47
∴ χ² = ∑ (O – E)²/E = 0.47
Degree of freedom: d.f = n1 – 1 = 4 – 1 = 3
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 3
degree of freedom is 7.82, i.e., χ2 0.05,3 = 7.82.
Decision: Since the calculated value of χ2 = 0.47 is less than the tabulated value of χ20.05,3 =
7.82, H0 is accepted. Hence we conclude that the frequencies are in the proportion 9:3:3:1. In
other words, the experiment supports the theory.
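When the theory gives the classes in a ratio rather than equal shares, only the expected frequencies change. The following Python sketch is an illustration under that reading; the function name chi_square_ratio is hypothetical, and no parameters are estimated and no cells are pooled, so d.f. = n1 – 1.

```python
def chi_square_ratio(observed, ratio):
    """Chi-square goodness of fit against a theoretical ratio such as 9:3:3:1."""
    total = sum(observed)
    parts = sum(ratio)
    # Each class receives its share of the total frequency.
    expected = [total * w / parts for w in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, len(observed) - 1


# Mendel's pea data: round/yellow, wrinkled/yellow, round/green, wrinkled/green.
expected, chi2, df = chi_square_ratio([315, 101, 108, 32], [9, 3, 3, 1])
print(expected)              # [312.75, 104.25, 104.25, 34.75]
print(round(chi2, 2), df)    # 0.47 3

# The guinea pig and MBA examples use the same steps.
print(round(chi_square_ratio([34, 10, 20], [9, 3, 4])[1], 2))           # 1.44
print(round(chi_square_ratio([46, 68, 62, 24], [4, 3, 2, 1])[1], 2))    # 28.42
```

In each case the computed χ2 is compared with the tabulated value for the stated degrees of freedom, exactly as in the decisions above.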
5.2.6 Test of Independence of Attributes
In a test of independence, we test whether two attributes (characteristics) of a given
population are independent or not. For example, we may want to test if there is an association
between being a man or woman and having a preference for watching sports on television. As
another example, we may want to test whether there is an association between gender and smoking
habits. For this, the information can be summarized and presented in a two-way classification
table, which is also called a contingency table. The table below is the 2 × 2 contingency table
presenting the distribution of 200 persons according to their gender and smoking habits.
The procedure for testing the independence of two attributes presented in r × c contingency
table is as follows:
Step 1: Setting up hypotheses
H0: There is no significant relationship between two attributes, i.e., two attributes are
independent.
H1: There is association or relationships between two attributes, i.e., two attributes are
dependent.
Step 2: Level of significance
Choose the appropriate level of significance in advance. The most commonly used is α = 5%
unless otherwise stated.
Step 3: Compute the value of the test statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where, O = Observed frequency
E = Expected frequency
Calculation of expected frequency:
The expected frequency corresponding to each cell is calculated as follows:
$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} = \frac{RT \times CT}{N}$$
Step 4: Degree of freedom
The degree of freedom for r × c contingency table is
d.f = (number of rows – 1) × (number of columns – 1) = (r – 1) × ( c – 1)
Step 5: Critical value
The tabulated value of χ2 for a given level of significance and degree of freedom (df) is
obtained from the chi-square table.
Step 6: Decision
1. If the calculated value of χ2 is less than or equal to the tabulated value of χ2, then we
accept H0 and conclude that the attributes are independent.
2. If the calculated value of χ2 is greater than the tabulated value of χ2, then we reject H0
and conclude that the attributes are dependent.
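Steps 3 and 4 of the procedure can be sketched in Python as follows. The function name `chi_square_independence` and the small demonstration table are hypothetical, introduced only for illustration; the critical value in Step 5 would still be read from the chi-square table.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected frequency of each cell: E = RT * CT / N
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical 2 x 3 table, used only to exercise the function
demo = [[10, 20, 30], [20, 20, 20]]
chi2, df, expected = chi_square_independence(demo)
print(round(chi2, 3), df)
```

The calculated value is then compared with the tabulated χ2 for the returned degrees of freedom, exactly as in Step 6.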
Remarks:
Special case of r × c contingency table: 2 × 2 contingency table
Let us consider a 2 × 2 contingency table as
                            Total
         a        b         a + b
         c        d         c + d
Total    a + c    b + d     N = a + b + c + d
For a 2 × 2 table the value of χ2 is directly calculated by using the following formula:
$$\chi^2 = \frac{N(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$$
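As a quick check that the shortcut agrees with the general ∑ (O – E)²/E computation, the sketch below evaluates both on one table. The counts and the function name `chi_square_2x2` are hypothetical and serve only as an illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

# Hypothetical counts, chosen only for illustration
a, b, c, d = 30, 20, 10, 40
shortcut = chi_square_2x2(a, b, c, d)

# The same statistic from the general sum of (O - E)^2 / E,
# where each tuple is (observed, row total, column total)
n = a + b + c + d
cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
general = sum((o - rt * ct / n) ** 2 / (rt * ct / n) for o, rt, ct in cells)

print(round(shortcut, 4), round(general, 4))
```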
Example:
A sample of 500 workers of a factory, classified according to sex and nature of work, is as follows:
Nature of work    Male    Female    Total
Technical         200     100       300
Non-Technical     50      150       200
Total             250     250       500
Test, at 5% level of significance, whether there is any relationship between sex and nature of work.
Solution:
Setting up hypothesis:
Null hypothesis H0: There is no relationship between sex and nature of work.
Alternative hypothesis H1: There is some relationship between sex and nature of work.
Level of significance: It is given that the level of significance α = 0.05.
Test statistic: Under H0 the test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency = (RT × CT)/N.
Calculations of expected frequencies:
Nature of Work       Male    Female    Row Total (RT)
Technical            200     100       300
Non-Technical        50      150       200
Column total (CT)    250     250       N = 500
∴ E (200) = (300 × 250)/500 = 150, E (100) = (300 × 250)/500 = 150
E (50) = (200 × 250)/500 = 100, E (150) = (200 × 250)/500 = 100
Calculations of χ2
O      E      O – E    (O – E)²    (O – E)²/E
200    150     50      2500        16.67
100    150    – 50     2500        16.67
50     100    – 50     2500        25
150    100     50      2500        25
∑ O = 500    ∑ E = 500             ∑ (O – E)²/E = 83.34
∴ χ2 = ∑ (O – E)²/E = 83.34
Degree of freedom: d.f = (r – 1) × (c – 1) = (2 – 1) × (2 – 1) = 1
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 1
degree of freedom is 3.84, i.e., χ2 0.05, 1 = 3.84.
Decision: Since the calculated value of χ2 = 83.34 is greater than the tabulated value of
χ2 0.05,1 = 3.84, H0 is rejected and H1 is accepted. Hence we conclude that there is some
relationship between sex and nature of work.
Example:
Do the following data provide evidence of the effectiveness of inoculation?
Attacked Not attacked Total
Inoculated 20 300 320
Not Inoculated 80 600 680
Total 100 900 1000
Solution:
Setting up hypothesis:
Null hypothesis H0: Inoculation is not effective.
Alternative hypothesis H1: Inoculation is effective.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency = (RT × CT)/N.
Calculations of expected frequencies:
                  Attacked    Not attacked    Row Total (RT)
Inoculated        20          300             320
Not Inoculated    80          600             680
Total             100         900             N = 1000
∴ E (20) = (320 × 100)/1000 = 32, E (300) = (320 × 900)/1000 = 288
E (80) = (680 × 100)/1000 = 68, E (600) = (680 × 900)/1000 = 612
Calculations of χ2
O      E      O – E    (O – E)²    (O – E)²/E
20     32     – 12     144         4.5
300    288     12      144         0.5
80     68      12      144         2.118
600    612    – 12     144         0.235
∑ O = 1000    ∑ E = 1000           ∑ (O – E)²/E = 7.35
∴ χ2 = ∑ (O – E)²/E = 7.35
Degree of freedom: d.f = (r – 1) × (c – 1) = (2 – 1) × (2 – 1) = 1
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 1
degree of freedom is 3.84, i.e., χ2 0.05, 1 = 3.84.
Decision: Since the calculated value of χ2 = 7.35 is greater than the tabulated value of
χ20.05,1 = 3.84, Ho is rejected and H1 is accepted. Hence we conclude that inoculation is
effective.
Example:
A company is interested in determining whether an association exists between the commuting
time of its employees and the level of stress observed on the job. A study of 116
assembly-line workers reveals the following:
                             Stress
Commuting time    High    Moderate    Low    Total
Under 20 min      9       5           18     32
20-50 min         17      8           28     53
Over 50 min       18      6           7      31
Total             44      19          53     116
At 1% level of significance, is there any evidence of a significant relationship between
commuting time and stress?
Solution:
Setting up hypothesis:
Null hypothesis H0: The stress on the job is independent of commuting time.
Alternative hypothesis H1: The stress on the job is not independent of commuting time.
Level of significance: it is given that the level of significance, α = 0.01
Test statistic: Under H0 the test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency = (RT × CT)/N.
Calculations of expected frequencies:
                                Stress
Commuting time       High    Moderate    Low    Row Total (RT)
Under 20 min         9       5           18     32
20-50 min            17      8           28     53
Over 50 min          18      6           7      31
Column total (CT)    44      19          53     N = 116
The contingency table is 3 × 3, so the degree of freedom is (3 – 1) × (3 – 1) = 4; that is,
we have to calculate only four expected frequencies from the formula, and the others follow
automatically from the row and column totals, as shown below:
E11 = E (9) = (32 × 44)/116 = 12.14,
E12 = E (5) = (32 × 19)/116 = 5.24,
E21 = E (17) = (53 × 44)/116 = 20.10,
E22 = E (8) = (53 × 19)/116 = 8.68
The remaining frequencies are computed as,
E13 = E (18) = 32 – E (9) – E (5) = 32 – 12.14 – 5.24 = 14.62
E23 = E (28) = 53 – E (17) – E (8) = 53 – 20.10 – 8.68 = 24.22
E31 = E (18) = 44 – E (9) – E (17) = 44 – 12.14 – 20.10 = 11.76
E32 = E (6) = 19 – E (5) – E (8) = 19 – 5.24 – 8.68 = 5.08
E33 = E (7) = 53 – E13 – E23 = 53 – 14.62 – 24.22 = 14.16
Calculations of χ2
O     E        O – E     (O – E)²    (O – E)²/E
9     12.14    – 3.14     9.8596     0.8122
5     5.24     – 0.24     0.0576     0.0109
18    14.62     3.38     11.4244     0.7814
17    20.10    – 3.10     9.6100     0.4781
8     8.68     – 0.68     0.4624     0.0533
28    24.22     3.78     14.2884     0.5899
18    11.76     6.24     38.9376     3.3110
6     5.08      0.92      0.8464     0.1666
7     14.16    – 7.16    51.2656     3.6204
∑ O = 116    ∑ E = 116               ∑ (O – E)²/E = 9.82
∴ χ2 = ∑ (O – E)²/E = 9.82
Degree of freedom: d.f = (r – 1) × (c – 1) = (3 – 1) × (3 – 1) = 4
Critical value: The tabulated value of the test statistic χ2 at 1% level of significance for 4
degree of freedom is 13.28, i.e., χ2 0.01,4 = 13.28.
Decision: Since the calculated value of χ2 = 9.82 is less than the tabulated value of
χ2 0.01,4 = 13.28, H0 is accepted. Hence we conclude that the stress on the job is independent of
commuting time.
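The 3 × 3 computation above can be reproduced with a few lines of Python. This is only a sketch with illustrative variable names; because it keeps the expected frequencies unrounded, it gives a value of about 9.83, which agrees with the hand-computed 9.82 up to rounding.

```python
# Observed 3 x 3 table: rows = commuting time, columns = stress (High, Moderate, Low)
observed = [
    [9, 5, 18],     # under 20 min
    [17, 8, 28],    # 20-50 min
    [18, 6, 7],     # over 50 min
]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# E = RT * CT / N for every cell, without intermediate rounding
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_0_01_DF4 = 13.28   # tabulated value quoted in the text
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > CRITICAL_0_01_DF4}")
```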
Example:
Two researchers adopted different sampling techniques while investigating the same group of
students to find the number of students falling in different intelligence levels, the results are as
follows:
                        No. of students in each level
Researcher    Below average    Average    Above average    Genius    Total
X             86               60         44               10        200
Y             40               33         25               2         100
Total         126              93         69               12        300
Would you say that the sampling techniques adopted by the two researchers are significantly
different?
Solution:
Setting up hypothesis:
Null hypothesis H0: there is no significant difference between the sampling techniques
adopted by the two researchers.
Alternative hypothesis H1: there is significant difference between the sampling techniques
adopted by the two researchers.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency = (RT × CT)/N.
Calculations of expected frequencies:
Researcher           Below average    Average    Above average    Genius    Row Total (RT)
X                    86               60         44               10        200
Y                    40               33         25               2         100
Column total (CT)    126              93         69               12        N = 300
The contingency table is 2 × 4, so the degree of freedom is (2 – 1) × (4 – 1) = 3; that is,
we have to calculate only three expected frequencies from the formula, and the others follow
automatically as shown below:
E (86) = (200 × 126)/300 = 84
E (60) = (200 × 93)/300 = 62
E (44) = (200 × 69)/300 = 46
The remaining frequencies are computed as,
E (10) = 200 – E (86) – E (60) – E (44) = 200 – 84 – 62 – 46 = 8
E (40) = 126 – E (86) = 126 – 84 = 42
E (33) = 93 – E (60) = 93 – 62 = 31
E (25) = 69 – E (44) = 69 – 46 = 23
E (2) = 12 – E (10) = 12 – 8 = 4
Since the expected frequency E (2) = 4 is less than 5, this cell is pooled with the adjacent
cell, so that O = 25 + 2 = 27 and E = 23 + 4 = 27.
Calculations of χ2
O              E              O – E    (O – E)²    (O – E)²/E
86             84              2       4           0.048
60             62             – 2      4           0.064
44             46             – 2      4           0.087
10             8               2       4           0.500
40             42             – 2      4           0.095
33             31              2       4           0.129
25 + 2 = 27    23 + 4 = 27     0       0           0
∑ O = 300      ∑ E = 300                           ∑ (O – E)²/E = 0.92
∴ χ2 = ∑ (O – E)²/E = 0.92
Degree of freedom: d.f = (r – 1) × (c – 1) – 1 = (2 – 1) × (4 – 1) – 1 = 2,
since 1 d.f is lost in pooling.
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 2
degree of freedom is 5.99, i.e., χ2 0.05,2 = 5.99.
Decision: Since the calculated value of χ2 = 0.92 is less than the tabulated value of
χ20.05,2 = 5.99, Ho is accepted. Hence we conclude that there is no significant difference
between the sampling techniques adopted by the two researchers.
Example:
In an experiment on the immunization of goats from anthrax, the following results were
obtained. Test the efficacy of the vaccine.
Died Survive Total
Inoculated 2 10 12
Not Inoculated 6 6 12
Total 8 16 24
Solution:
Setting up hypothesis:
Null hypothesis H0: The vaccine is not effective in the immunization of goats from anthrax.
Alternative hypothesis H1: the vaccine is effective in the immunization of goats from anthrax.
Level of significance: Since the level of significance is not given, we take α = 0.05.
Test statistic: Under H0 the test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency = (RT × CT)/N.
Since the cell frequency 2 is less than 5, we apply Yates' correction for continuity for
computing the value of chi-square. This consists in adding 0.5 to the cell frequency which is
less than 5 and adjusting the remaining frequencies so that the row and column totals are
unchanged, as given in the table below:
Adjusted observed frequencies:
                     Died             Survive           Row Total (RT)
Inoculated           2 + 0.5 = 2.5    10 – 0.5 = 9.5    12
Not Inoculated       6 – 0.5 = 5.5    6 + 0.5 = 6.5     12
Column Total (CT)    8                16                N = 24
Calculation of expected frequencies:
The contingency table is 2 × 2, so the degree of freedom is (2 – 1) × (2 – 1) = 1; that
is, we have to calculate only one expected frequency from the formula, and the others follow
automatically as shown below:
∴ E (2.5) = (12 × 8)/24 = 4, E (9.5) = 12 – E (2.5) = 12 – 4 = 8
E (5.5) = 8 – E (2.5) = 8 – 4 = 4, E (6.5) = 16 – E (9.5) = 16 – 8 = 8
Calculations of χ2
O      E    O – E    (O – E)²    (O – E)²/E
2.5    4    – 1.5    2.25        0.5625
9.5    8     1.5     2.25        0.28125
5.5    4     1.5     2.25        0.5625
6.5    8    – 1.5    2.25        0.28125
∑ O = 24    ∑ E = 24             ∑ (O – E)²/E = 1.69
∴ χ2 = ∑ (O – E)²/E = 1.69
Degree of freedom: d.f = (2 – 1) × (2 – 1) = 1
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 1
degree of freedom is 3.84, i.e., χ2 0.05,1 = 3.84.
Decision: Since the calculated value of χ2 = 1.69 is less than the tabulated value of
χ2 0.05,1 = 3.84, H0 is accepted. Hence we conclude that the vaccine is not effective in the
immunization of goats from anthrax.
Alternative method (Using Yates Correction for continuity)
The given 2 × 2 contingency table is
                  Died         Survive       Row Total
Inoculated        a = 2        b = 10        a + b = 12
Not Inoculated    c = 6        d = 6         c + d = 12
Column Total      a + c = 8    b + d = 16    N = 24
Then,
$$\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a + b)(a + c)(b + d)(c + d)} = \frac{24 \times \left(|2 \times 6 - 10 \times 6| - \frac{24}{2}\right)^2}{12 \times 8 \times 16 \times 12} = 1.69$$
Degree of freedom: df = (2 – 1) × (2 – 1) = 1
Critical value: The tabulated value of the test statistic χ2 at 5% level of significance for 1
degree of freedom is 3.84, i.e., χ2 0.05,1 = 3.84.
Decision: Since the calculated value of χ2 = 1.69 is less than the tabulated value of χ20.05,1 =
3.84, Ho is accepted. Hence we conclude that the vaccine is not effective in the immunization
of goats from anthrax.
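The corrected formula of the alternative method can be checked with the following Python sketch. The function name `yates_chi_square` is illustrative; the critical value 3.84 is the tabulated value quoted above.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with Yates' continuity correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    return numerator / denominator

# Goat data from the example: inoculated (2 died, 10 survived), not inoculated (6, 6)
chi2 = yates_chi_square(2, 10, 6, 6)

CRITICAL_0_05_DF1 = 3.84    # tabulated value quoted in the text
print(f"chi2 = {chi2:.4f}, reject H0: {chi2 > CRITICAL_0_05_DF1}")
```

The result, 1.6875, rounds to the 1.69 obtained by both methods in the text.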
Step 6: Decision
Case I: For two-tailed test
If $\chi^2_{1-\alpha/2,\,n-1} \le \chi^2_{cal} \le \chi^2_{\alpha/2,\,n-1}$, then H0 is accepted; otherwise, H0 is rejected.
Case II: For right-tailed test
If $\chi^2_{cal} \le \chi^2_{\alpha,\,n-1}$, then H0 is accepted; otherwise, H0 is rejected.
Case III: For left-tailed test
If $\chi^2_{cal} \ge \chi^2_{1-\alpha,\,n-1}$, then H0 is accepted; otherwise, H0 is rejected.
Remarks:
If the sample size is large (n ≥ 30), we use Fisher's approximation to the chi-square distribution
and then apply the normal test considering the test statistic
$$Z = \sqrt{2\chi^2} - \sqrt{2n - 1}$$
Example:
A sample of score of 20 post graduate students revealed the sample variance 170. Test at 5%
level of significance that the variance of score of all post graduate students is different from 150.
Solution:
In the usual notation, it is given that
n = 20 s2 = 170 σ2 = 150
Setting up hypothesis:
H0: σ2 = 150 i.e. the population variance is 150.
H1 :σ2 ≠ 150 i.e. the population variance is not 150 (Two tailed test).
Level of significance: It is given that α = 5% = 0.05
Test Statistic: Under H0, the test statistic is given by
$$\chi^2 = \frac{n s^2}{\sigma^2} = \frac{20 \times 170}{150} = 22.67$$
∴ χ2cal = 22.67
Degree of freedom: The degree of freedom is df = 20 – 1 = 19
Critical value: The tabulated values are
$\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,19} = 32.852$ and $\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.975,\,19} = 8.907$
Decision: Since $\chi^2_{1-\alpha/2,\,n-1} \le \chi^2_{cal} \le \chi^2_{\alpha/2,\,n-1}$, H0 is accepted. Therefore, the
population variance is 150.
Example:
A manufacturing process produces the articles with standard deviation 1. A new
manufacturing process is designed with the objective that it has less variance. A random sample of
size 25 shows the sample standard deviation 0.73. Has the objective been fulfilled?
Solution:
In the usual notations, it is given that
n = 25, s = 0.73 and σ = 1
Setting up hypothesis:
H0: σ2 = 1 i.e. the population standard deviation is 1 or the objective is not fulfilled.
H1 :σ2 < 1 i.e. the population standard deviation is less than 1 or the objective is fulfilled
(Left-tailed test)
Level of significance: Since it is not given, we take, α = 5% = 0.05
Test Statistic: Under H0, the test statistic is given by
$$\chi^2 = \frac{n s^2}{\sigma^2} = \frac{25 \times 0.73^2}{1^2} = 13.32$$
∴ χ2cal = 13.32
Degree of freedom: The degree of freedom is df = 25 – 1 = 24
Critical value: The tabulated value is
$\chi^2_{1-\alpha,\,n-1} = \chi^2_{0.95,\,24} = 13.85$
Decision: Since $\chi^2_{cal} < \chi^2_{1-\alpha,\,n-1}$, H0 is rejected and H1 is accepted. So, the manufacturer's
objective is fulfilled.
Example:
A machine is set up to fill cookies into packages in such a way that the net average
weight is 500 grams with standard deviation 5 grams. If the standard deviation is more than 5 grams,
the machine needs an adjustment. For this purpose, a random sample of size 20 packages is selected
which shows the sample SD of 6. Test at 1% level of significance whether the machine needs an
adjustment.
Solution:
In the usual notations, it is given that
n = 20, S = 6 grams and σ = 5 grams
Setting up hypothesis:
H0: σ = 5 i.e., the population standard deviation is 5 grams or the machine does not need an
adjustment.
H1: σ > 5 i.e. the population standard deviation is more than 5 grams or the machine needs
an adjustment.
Level of significance: It is given that α = 1% = 0.01
Test Statistic: Under H0, the test statistic is given by
$$\chi^2 = \frac{n s^2}{\sigma^2} = \frac{20 \times 6^2}{5^2} = 28.8$$
∴ χ2cal = 28.8
Degree of freedom: The degree of freedom is df = 20 – 1 = 19
Critical value: The tabulated value is
$\chi^2_{\alpha,\,n-1} = \chi^2_{0.01,\,19} = 36.19$
Decision: Since $\chi^2_{cal} < \chi^2_{\alpha,\,n-1}$, H0 is accepted. So, the machine does not need an
adjustment.
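The three variance examples (two-tailed, left-tailed and right-tailed) use the same statistic and differ only in the decision rule. The sketch below recomputes them in Python; the function name `chi_square_variance` is illustrative, and the critical values are the tabulated ones quoted in the solutions.

```python
def chi_square_variance(n, sample_var, hypothesised_var):
    """Test statistic chi2 = n * s^2 / sigma^2, as used in the worked examples."""
    return n * sample_var / hypothesised_var

# (n, s^2, sigma^2) for the three worked examples
two_tailed = chi_square_variance(20, 170, 150)            # H1: variance != 150
left_tailed = chi_square_variance(25, 0.73 ** 2, 1)       # H1: variance < 1
right_tailed = chi_square_variance(20, 6 ** 2, 5 ** 2)    # H1: sd > 5

# Decision rules with the tabulated critical values quoted in the text
accept_two = 8.907 <= two_tailed <= 32.852    # two-tailed: accept H0 inside the interval
reject_left = left_tailed < 13.85             # left-tailed: reject H0 below the lower point
reject_right = right_tailed > 36.19           # right-tailed: reject H0 above the upper point

print(round(two_tailed, 2), round(left_tailed, 2), round(right_tailed, 2))
print(accept_two, reject_left, reject_right)
```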
Critical Range for the Marascuilo Procedure
$$\text{Critical range (CR)} = \sqrt{\chi^2_{\alpha}} \, \sqrt{\frac{p_i (1 - p_i)}{n_i} + \frac{p_{i'} (1 - p_{i'})}{n_{i'}}}$$
In the final step, we compare each of the c (c – 1)/2 pairs of sample proportions against its
corresponding critical range. We declare a specific pair significantly different if the absolute
difference in the sample proportions, |pi – pi'|, is greater than its critical range.
To apply the Marascuilo procedure, return to the guest satisfaction survey. Using the χ2 test,
we concluded that there was evidence of a significant difference among the population proportions.
Then the sample proportions are computed by using the following contingency table:
                             Column variable
Row Variable    A          B          C          D          Totals
1               X1         X2         X3         X4         X
2               n1 – X1    n2 – X2    n3 – X3    n4 – X4    n – X
Totals          n1         n2         n3         n4         n
Here,
p1 = X1/n1, p2 = X2/n2, p3 = X3/n3, and so on.
Conclusion:
If |pi – pi'| > CR, there is a significant difference between the two groups.
If |pi – pi'| ≤ CR, there is no significant difference between the two groups.
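The pairwise comparison can be sketched in Python as below. The function name `marascuilo` and the demonstration counts are hypothetical; the chi-square critical value is passed in by the caller, since it comes from the table for (c – 1) degrees of freedom.

```python
from itertools import combinations
from math import sqrt

def marascuilo(successes, sizes, chi2_critical):
    """Return (i, j, abs_diff, critical_range, significant) for every pair of groups."""
    props = [x / n for x, n in zip(successes, sizes)]
    results = []
    for i, j in combinations(range(len(props)), 2):
        abs_diff = abs(props[i] - props[j])
        spread = props[i] * (1 - props[i]) / sizes[i] + props[j] * (1 - props[j]) / sizes[j]
        critical_range = sqrt(chi2_critical) * sqrt(spread)
        results.append((i, j, abs_diff, critical_range, abs_diff > critical_range))
    return results

# Hypothetical data: successes out of 100 in three groups; 5.991 is the
# upper 5% point of chi-square with 2 degrees of freedom
for row in marascuilo([30, 35, 60], [100, 100, 100], 5.991):
    print(row)
```

Each returned tuple carries the pair of group indices, the absolute difference, its critical range, and whether the pair is declared significantly different.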
Example:
In a recent article in Quality Progress, the author argues that improving organizational
performance (in schools, government, or business) can occur only after good approaches have been
selected and implemented. Danville Community Schools in Indiana rates the faculty on their use of
quality tools that will lead to improved performance. The results are recorded in the following
contingency table:
                                          School
Use of Tools    North Elementary    South Elementary    Middle School    High School
Low             24                  5                   18               32
High            7                   21                  4                12
a. Is there evidence of a significant difference among the schools with respect to the
proportions of teachers who have obtained a "High" rating? (Use α = 0.05).
b. If appropriate, use the Marascuilo procedure and α = 0.05 to determine which schools
are different.
Solution:
a. Setting up hypothesis:
Null hypothesis (H0): π1 = π2 = π3 = π4 (i.e. all the proportions are equal)
Alternative hypothesis (H1): Not all πj are equal (i.e. the proportions are not all equal)
Level of significance: (α) = 0.05
Test Statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Calculations of χ2 (row totals: Low = 79, High = 44; column totals: 31, 26, 22, 44; T = 123)
Blocks    O     E = (RT × CT)/T           (O – E)     (O – E)²    (O – E)²/E
I         24    (79 × 31)/123 = 19.91      4.09       16.728      0.840
II        5     (79 × 26)/123 = 16.70     – 11.70     136.89      8.197
(the remaining six cells are computed in the same way)
Total                                                             29.61
∴ χ2 = ∑ (O – E)²/E = 29.61
Critical value:
$\chi^2_{\alpha,\,(c-1)(r-1)} = \chi^2_{0.05,\,(4-1)(2-1)} = \chi^2_{0.05,\,3} = 7.81$
Decision: Since $\chi^2_{cal} > \chi^2_{tab}$ at 5% level of significance, H0 is rejected and H1 is
accepted. We conclude that there is a difference among the schools in the proportion of
teachers making use of quality tools.
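Only the first two cells are written out in the calculation table, so the sketch below lists the contribution of all eight cells and confirms the total. It is a plain Python recomputation from the data of the example, with illustrative variable names.

```python
observed = [
    [24, 5, 18, 32],   # "Low" ratings: North El., South El., Middle School, High School
    [7, 21, 4, 12],    # "High" ratings for the same schools
]
row_totals = [sum(r) for r in observed]          # 79 and 44
col_totals = [sum(c) for c in zip(*observed)]    # 31, 26, 22, 44
total = sum(row_totals)                          # 123

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total      # E = RT * CT / T
        contribution = (o - e) ** 2 / e
        chi2 += contribution
        print(f"O = {o:2d}, E = {e:6.2f}, (O - E)^2 / E = {contribution:6.3f}")

print(f"chi2 = {chi2:.2f}")
```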
b. The Marascuilo procedure enables us to make comparisons between all pairs of groups.
First we compute the sample proportions of "Low" ratings, as follows:
p1 = X1/n1 = 24/31 = 0.774
p2 = X2/n2 = 5/26 = 0.192
p3 = X3/n3 = 18/22 = 0.818
p4 = X4/n4 = 32/44 = 0.727
Critical range for the Marascuilo Procedure
$$CR = \sqrt{\chi^2_{\alpha}} \, \sqrt{\frac{p_i (1 - p_i)}{n_i} + \frac{p_{i'} (1 - p_{i'})}{n_{i'}}}$$
Here, $\sqrt{\chi^2_{\alpha}} = \sqrt{7.81} = 2.794$
Absolute Difference in Proportions and Critical Range
|p1 – p2| = |0.774 – 0.192| = 0.582
CR = 2.794 × √(0.774 (1 – 0.774)/31 + 0.192 (1 – 0.192)/26)
= 2.794 × √(0.0056 + 0.0059) = 2.794 × 0.1075 = 0.3005
|p1 – p3| = |0.774 – 0.818| = 0.044
CR = 2.794 × √(0.774 (1 – 0.774)/31 + 0.818 (1 – 0.818)/22)
= 2.794 × √(0.0056 + 0.00677) = 2.794 × 0.111202 = 0.3107
|p1 – p4| = |0.774 – 0.727| = 0.047
CR = 2.794 × √(0.774 (1 – 0.774)/31 + 0.727 (1 – 0.727)/44)
= 2.794 × √(0.0056 + 0.0045) = 2.794 × 0.100548 = 0.281
|p2 – p3| = |0.192 – 0.818| = 0.626
CR = 2.794 × √(0.192 (1 – 0.192)/26 + 0.818 (1 – 0.818)/22)
= 2.794 × √(0.0059 + 0.00677) = 2.794 × 0.11256 = 0.31449
|p2 – p4| = |0.192 – 0.727| = 0.535
CR = 2.794 × √(0.192 (1 – 0.192)/26 + 0.727 (1 – 0.727)/44)
= 2.794 × √(0.0059 + 0.0045) = 2.794 × 0.10198 = 0.285
|p3 – p4| = |0.818 – 0.727| = 0.091
CR = 2.794 × √(0.818 (1 – 0.818)/22 + 0.727 (1 – 0.727)/44)
= 2.794 × √(0.00677 + 0.0045) = 2.794 × 0.10616 = 0.2966
Conclusion:
∵ |p1 – p2|, |p2 – p3| and |p2 – p4| are > C.R.,
∴ there is a significant difference between p2 and the other proportions.
It means South Elementary school is different from the other three schools. It has a much
higher proportion of high ratings in the use of quality tools.
Example:
More shoppers do the majority of their grocery shopping on Saturday than any other day of
the week. However, is there a difference in the various age groups in the proportion of people who
do the majority of their grocery shopping on Saturday? A study showed the results for the different
age groups. The data were reported as percentages, and no sample sizes were given. The results are
recorded in the following contingency table:
Age
Major shopping day
Under 35 35-54 Over 54
Saturday 24% 28% 12%
A day other than Saturday 76% 72% 88%
Assume that 200 shoppers for each group were surveyed.
a. Is there evidence of a significant difference among the age groups with respect to major
grocery shopping day? (Use α = 0.05).
b. If appropriate, use the Marascuilo procedure and α = 0.05 to determine which age groups
are different.
c. Discuss the managerial implications of (a) and (b). How can grocery stores use this
information to improve marketing and sales?
Solution:
a. Setting up hypothesis:
Null hypothesis (H0): π1 = π2 = π3 (i.e. all the proportions are equal)
Alternative hypothesis (H1): Not all πj are equal (i.e. the proportions are not all equal)
Level of significance: (α) = 0.05
Test Statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The given 2 × 3 contingency table is
                                          Age
Major shopping day           Under 35    35-54    Over 54    Totals
Saturday                     48          56       24         128
A day other than Saturday    152         144      176        472
Totals                       200         200      200        600
Calculations of χ2
Blocks    O      E = (RT × CT)/T    (O – E)      (O – E)²    (O – E)²/E
I         48     42.667              5.333       28.44       0.667
II        152    157.333            – 5.333      28.44       0.181
III       56     42.667              13.333      177.69      4.166
IV        144    157.333            – 13.333     177.69      1.130
V         24     42.667             – 18.667     348.46      8.167
VI        176    157.333             18.667      348.46      2.215
Totals                                                       16.526
∴ χ2 = ∑ (O – E)²/E = 16.526
Critical value: $\chi^2_{\alpha,\,(c-1)(r-1)} = \chi^2_{0.05,\,2} = 5.991$
Decision: Since $\chi^2_{cal} > \chi^2_{tab}$ at 5% level of significance, H0 is rejected and H1 is
accepted. We conclude that there is a difference among the age groups in the proportion of
people who do the majority of their grocery shopping on Saturday.
b) The Marascuilo procedure enables us to make comparisons between all pairs of groups. First we
compute the sample proportions, as follows:
p1 = X1/n1 = 48/200 = 0.24, p2 = X2/n2 = 56/200 = 0.28, p3 = X3/n3 = 24/200 = 0.12
Conclusion:
∵ |p2 – p3| and |p1 – p3| are > C.R.,
∴ there is a significant difference between the 35-54 and over 54 groups and between the
under 35 and over 54 groups.
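The critical ranges behind this conclusion are not written out in the solution. The sketch below recomputes them in Python from the Saturday counts, taking 5.991 as the tabulated χ2 value; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

groups = ["Under 35", "35-54", "Over 54"]
saturday = [48, 56, 24]        # Saturday shoppers out of 200 in each group
sizes = [200, 200, 200]
CHI2_0_05_DF2 = 5.991          # critical value quoted in the text

props = [x / n for x, n in zip(saturday, sizes)]
significant_pairs = []
for i, j in combinations(range(len(groups)), 2):
    diff = abs(props[i] - props[j])
    cr = sqrt(CHI2_0_05_DF2) * sqrt(
        props[i] * (1 - props[i]) / sizes[i] + props[j] * (1 - props[j]) / sizes[j]
    )
    if diff > cr:
        significant_pairs.append((groups[i], groups[j]))
    print(f"{groups[i]} vs {groups[j]}: |diff| = {diff:.3f}, CR = {cr:.3f}")

print(significant_pairs)
```

Only the two pairs involving the over 54 group exceed their critical ranges, which matches the conclusion stated above.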
c) The stores can use this information to target their marketing to the specific groups of shoppers
on Saturday and on days other than Saturday.
Test statistic: Under H0, the test statistic is obtained by the following procedure:
i. Combine the sample observations of both samples so that n = n1 + n2.
ii. Rank these n observations either from smallest to largest or from largest to smallest value. If
the value of two or more observations are equal, then assign each of the tied observations by
the average of their ranks.
iii. Determine sum of the ranks assigned to the values of the first sample and second sample
separately and denote them by R1 and R2 respectively.
In the small sample case, the null and alternative hypotheses are stated as follows:
Setting up hypothesis:
Null hypothesis (H0): µ1 = µ2 (i.e., there is no significant difference between two means)
Alternative hypothesis (H1): µ1 ≠ µ2 (i.e., there is a significant difference between two means)
or, Alternative hypothesis (H1) : µ1 > µ2 (Right Tailed Test)
or, Alternative hypothesis (H1) : µ1 < µ2 (Left Tailed Test)
Level of Significance: α
Test statistic: Under H0,
$$U_1 = R_1 - \frac{n_1 (n_1 + 1)}{2}$$
$$U_2 = R_2 - \frac{n_2 (n_2 + 1)}{2}$$
Where,
R1 = Sum of the ranks occupied by the first sample
R2 = Sum of the ranks occupied by the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
Critical Value: Consider the smaller one, i.e., take U = min {U1, U2}
Making Decision:
Alternative hypothesis    Reject the null hypothesis if
µ1 ≠ µ2 (TTT)             U ≤ Uα
µ1 > µ2 (RTT)             U2 ≤ U2α
µ1 < µ2 (LTT)             U1 ≤ U2α
Case II: Large Sample Case i.e., when n1 or n2 > 15
Test statistic: In the large sample case, the sampling distribution of U is approximately normal
with mean
$$\mu = \frac{n_1 n_2}{2}$$
and variance
$$\sigma^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
Therefore the test statistic is
$$Z = \frac{U - \mu}{\sigma} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
Where, U = min {U1, U2}, and $\sigma = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
Then the test is completed by the usual procedure of the Z-test (normal test).
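The ranking step (with average ranks for ties) and the U and Z formulas can be sketched in Python as follows. The helper names `average_ranks` and `mann_whitney` and the small demonstration samples are hypothetical; the sketch applies no tie correction to σ, in line with the formulas above.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1          # mean of the ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks

def mann_whitney(sample1, sample2):
    """Return (U1, U2, Z), with Z the normal approximation based on min(U1, U2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, (min(u1, u2) - mu) / sigma

# Hypothetical samples with one tie (the value 5 appears in both)
u1, u2, z = mann_whitney([1, 3, 5, 7], [5, 6, 8, 9, 10])
print(u1, u2, round(z, 3))
```

For small samples only U1 and U2 are needed and the decision uses the U table; Z is relevant in the large sample case.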
Example:
The following are the miles per gallon obtained from two kinds of gasoline.
Gasoline: A 17.0 17.8 15.7 16.8 18.4 16.2 18.3 18.1 14.3
Gasoline: B 18.6 18.8 17.1 19.5 17.6 19.0 15.2 19.8 17.5 18.0
Test the hypothesis at the 0.05 level of significance that the average mileage of gasoline A is less
than that of gasoline B. Use the Mann-Whitney U test.
Solution:
Jointly Assigned Rank
Gasoline A Rank (A) Gasoline B Rank (B)
17.0 6 18.6 15
17.8 10 18.8 16
15.7 3 17.1 7
16.8 5 19.5 18
18.4 14 17.6 9
16.2 4 19.0 17
18.3 13 15.2 2
18.1 12 19.8 19
14.3 1 17.5 8
18.0 11
Now,
R1 = 6 + 10 + 3 + 5 + 14 + 4 + 13 + 12 + 1 = 68
R2 = 15 +16 + 7 + 18 + 9 + 17 + 2 + 19 + 8 + 11 = 122
Setting up hypothesis:
Null hypothesis (H0): µ1 = µ2
Alternative hypothesis (H1): µ1 < µ2 (Left Tailed Test)
Level of Significance: = 0.05
Test statistic and determine critical value: Since this is a rank-sum test and both sample
sizes are less than 15, so that the table covers these data, we use the U-test with
U = min (U1, U2), where U2α = U0.10 (n1 = 9, n2 = 10) = 24. Then we reject the null hypothesis H0 if
U ≤ 24.
Now, U = min {U1, U2}
Where,
$$U_1 = R_1 - \frac{n_1 (n_1 + 1)}{2} = 68 - \frac{9 \times 10}{2} = 23$$
$$U_2 = R_2 - \frac{n_2 (n_2 + 1)}{2} = 122 - \frac{10 \times 11}{2} = 67$$
∴ U = min {23, 67} = 23
Making Decision: Since the calculated value U = 23 is less than the tabulated value U = 24,
we reject the null hypothesis H0. Thus we conclude that, on average, gasoline A gives less
mileage than gasoline B.
Example:
Given the data below, test at the 0.05 level of significance whether the two samples come
from identical continuous population or whether the average burning time of brand A flares is less
than that of brand B flares.
Solution:
Ranking the data jointly according to size
Brand A    Rank (A)    Brand B    Rank (B)
14.9       7           15.2       8
11.3       1           19.8       18
13.2       4           14.7       6
16.6       12          18.3       15
17.0       14          16.2       11
14.1       5           21.2       19
15.5       10          18.9       16
13.0       3           12.2       2
16.9       13          15.3       9
                       19.4       17
Now,
R1 = 7 + 1 + 4 + 12 + 14 + 5 + 10 + 3 + 13 = 69
R2 = 8 +17 + 6 + 15 + 11 + 19 + 16 + 2 + 9 + 18 = 121
Setting up hypothesis:
Null hypothesis (H0): µ1 = µ2
Alternative hypothesis (H1) : µ1 < µ2 (Left Tailed Test)
Level of Significance: = 0.05
Test statistic and determine critical value: Since this is a rank-sum test and the table covers
these sample sizes (n1 = 9, n2 = 10), we use the U-test with U = min (U1, U2), where
U2α = U0.10 (n1 = 9, n2 = 10) = 24. Then we reject the null hypothesis H0 if U ≤ 24.
Now, U = min {U1, U2}
Where,
$$U_1 = R_1 - \frac{n_1 (n_1 + 1)}{2} = 69 - \frac{9 \times 10}{2} = 24$$
$$U_2 = R_2 - \frac{n_2 (n_2 + 1)}{2} = 121 - \frac{10 \times 11}{2} = 66$$
∴ U = min {24, 66} = 24
Making Decision: Since the calculated value U = 24 is equal to the tabulated value U = 24,
we reject the null hypothesis H0. Thus we conclude that, on average, brand A flares have a
shorter burning time than brand B flares.
Example:
The following are the weight gains (in pounds) of two random samples of young turkeys fed
two different diets but otherwise kept under identical conditions:
Diet: 1 16.3 10.1 10.7 13.5 14.9 11.8 14.3 10.2 12.0 14.7 23.6 15.1 14.5 18.2 13.2 14.0
Diet: 2 21.3 23.8 15.4 19.6 12.0 13.9 18.8 19.2 15.3 20.1 14.8 18.9 20.7 21.1 15.8 16.2
Use the U-test at the 0.01 level of significance to test the null hypothesis that the two
populations sampled are identical against the alternative hypothesis that on the average the second
diet produces a greater gain in weight.
Solution:
Ranking the Data Jointly According to Size
Diet 1 Rank (1) Diet 2 Rank (2)
16.3 21 21.3 30
10.1 1 23.8 32
10.7 3 15.4 18
13.5 8 19.6 26
14.9 15 12.0 5.5
11.8 4 13.9 9
14.3 11 18.8 23
10.2 2 19.2 25
12.0 5.5 15.3 17
14.7 13 20.1 27
23.6 31 14.8 14
15.1 16 18.9 24
14.5 12 20.7 28
18.4 22 21.1 29
13.2 7 15.8 19
14.0 10 16.2 20
Now,
R1 = 21 + 1+ 3+ 8+ 15 + 4 + 11 + 2 + 5.5 + 13 + 31 + 16 + 12 + 22 + 7 + 10 = 181.5
R2 = 30 + 32 + 18 + 26 + 5.5 + 9 + 23 + 25 + 17 + 27 + 14 + 24 + 28 + 29 + 19 + 20 = 346.5
Setting up hypothesis:
Null hypothesis (H0): µ1 = µ2
Alternative hypothesis (H1): µ1 < µ2 (Left Tailed Test)
Level of Significance: = 0.01
Test statistic and determine critical value: Since this is a rank-sum test and both sample
sizes are greater than 15, we use the Z-test with Zα = Z0.01 = 2.33.
$$Z = \frac{U - \mu}{\sigma}$$
Where,
$$\mu = \frac{n_1 n_2}{2} = \frac{16 \times 16}{2} = 128$$
$$U_1 = R_1 - \frac{n_1 (n_1 + 1)}{2} = 181.5 - \frac{16 \times 17}{2} = 45.5$$
$$U_2 = R_2 - \frac{n_2 (n_2 + 1)}{2} = 346.5 - \frac{16 \times 17}{2} = 210.5$$
Now, U = min {U1, U2} = min {45.5, 210.5} = 45.5
and, $\sigma = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{16 \times 16 \times 33}{12}} = 26.533$
or, $Z = \frac{U - \mu}{\sigma} = \frac{45.5 - 128}{26.533} = -3.11$
or, |Z| = 3.11
Making Decision: Since the calculated value |Z| = 3.11 is greater than the tabulated value
|Z| = 2.33, we reject the null hypothesis H0. Thus we conclude that the second diet produces a
greater gain in weight.
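The large-sample arithmetic can be verified with the short Python sketch below, which starts from the rank sums R1 and R2 of the example. The variable names are illustrative and 2.33 is the tabulated one-tailed value used above.

```python
from math import sqrt

n1 = n2 = 16
r1, r2 = 181.5, 346.5                 # rank sums from the worked example

u1 = r1 - n1 * (n1 + 1) / 2
u2 = r2 - n2 * (n2 + 1) / 2
u = min(u1, u2)

mu = n1 * n2 / 2                                  # mean of U under H0
sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)        # standard deviation of U under H0
z = (u - mu) / sigma

Z_CRITICAL_0_01 = 2.33                # one-tailed value quoted in the text
print(f"U = {u}, Z = {z:.2f}, reject H0: {abs(z) > Z_CRITICAL_0_01}")
```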
The H test, also called the Kruskal-Wallis test, is a generalization of the rank sum test of the
preceding section to the case where we test the null hypothesis that k samples come from identical
populations. In other words, it is a non-parametric alternative to the one-way analysis of variance.
As in the U test, the data are ranked jointly from low to high, as though they constitute one
sample. Then, let Ri be the sum of the ranks of the values of the ith sample. Let k (≥ 3)
independent random samples of size ni (i = 1, 2, . . . , k), possibly of different sizes, be drawn
from k populations whose locations (medians Mdi) are unknown.
Null hypothesis H0: µ1 = µ2 = . . . = µk, i.e., the k independent population means are equal.
Alternative hypothesis H1: Not all of µ1, µ2, . . . , µk are equal, i.e., at least two of the
population means are not equal.
Test statistic: Under H0, the Kruskal and Wallis H statistic is
$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1)$$
or,
$$H = \frac{12}{n(n + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \dots + \frac{R_k^2}{n_k} \right) - 3(n + 1)$$
Where,
n = n1 + n2 + n3 + . . . + nk = total number of values in the k samples
ni = number of values in the ith sample (i = 1, 2, 3, . . . k)
Ri = Sum of ranks assigned to the ith sample
Ri² = Square of the sum of the ranks assigned to the ith sample
k = number of groups or number of populations sampled
Case I: Small sample case i.e., when k = 3 and ni ≤ 5
Critical region: For different values of ni (i = 1, 2, 3 = k), the probability p0 associated with
a value as extreme as the observed H is obtained from the Kruskal-Wallis table, i.e. p0 = P (H ≥ Hcal).
Decision: If p0 ≤ α, a pre-assigned level, we reject the null hypothesis H0; otherwise we
accept H0 at the α level.
Case II: Large sample case i.e., when k ≥ 3 and ni > 5
In this case, the sampling distribution of H, under H0, can be approximated by a chi-square
distribution with k – 1 degrees of freedom, i.e. H ~ χ2 (k – 1) under H0.
Critical region: For a given α level, we obtain the critical value of H from the chi-square
table for k – 1 degrees of freedom, i.e. $\chi^2_{\alpha,\,(k-1)}$.
Decision: If H* ≥ $\chi^2_{\alpha,\,(k-1)}$, we reject the null hypothesis H0; otherwise we accept H0 at
the α level.
Note: H* = Hcal.
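The joint ranking and the H formula can be sketched in Python as below. The function name `kruskal_wallis_h` and the demonstration data are hypothetical; tied values receive the average of their ranks, and no further tie correction is applied, in line with the formula above.

```python
def kruskal_wallis_h(samples):
    """H statistic for a list of samples, using average ranks for ties (no tie correction)."""
    pooled = sorted(v for sample in samples for v in sample)
    n = len(pooled)

    # Average rank of each distinct value in the pooled, sorted data
    rank_of = {}
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and pooled[end + 1] == pooled[pos]:
            end += 1
        rank_of[pooled[pos]] = (pos + end) / 2 + 1
        pos = end + 1

    rank_sums = [sum(rank_of[v] for v in sample) for sample in samples]
    total = sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: three small samples with no overlap between groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))
```

The returned H is then compared with the Kruskal-Wallis table (small samples) or with the tabulated χ2 for k – 1 degrees of freedom (large samples).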
Example:
The table given below presents the operating times in hours for 3 types of scientific calculators
before a recharge is required.
        Calculators
  1        2        3
24.0     23.2     18.4
16.7     19.8     19.1
22.8     18.1     17.3
19.8     17.6     17.3
18.9     20.2     19.7
         17.8     18.9
                  18.8
                  19.3
Use the Kruskal-Wallis H test at the 0.05 level of significance to test the hypothesis that the
operating times for all three calculators are equal.
Solution:
Assign ranks jointly:
Calculator 1   Rank (1)   Calculator 2   Rank (2)   Calculator 3   Rank (3)
24.0           19         23.2           18         18.4           7
16.7           1          19.8           14.5       19.1           11
22.8           17         18.1           6          17.3           2.5
19.8           14.5       17.6           4          17.3           2.5
18.9           9.5        20.2           16         19.7           13
                          17.8           5          18.9           9.5
                                                    18.8           8
                                                    19.3           12
R1 = 19 + 1 + 17 + 14.5 + 9.5 = 61, n1 = 5
R2 = 18 + 14.5 + 6 + 4 + 16 + 5 = 63.5, n2 = 6
R3 = 7 + 11 + 2.5 + 2.5 + 13 + 9.5 + 8 + 12 = 65.5, n3 = 8
n = 5 + 6 + 8 = 19
Setting up hypothesis:
Null hypothesis (H0): all the means are equal
Alternative hypothesis (H1): not all the means are equal
Specify level of significance: α = 0.05
Choose an appropriate test statistic and determine critical value: Since this is a
Kruskal-Wallis test, we use the H test with
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
where the critical value, for k – 1 = 2 degrees of freedom, is
$\chi^2_{0.05,2} = 5.991$. Thus we reject the null hypothesis H0 if H ≥ 5.991.
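The joint ranking and the H statistic for the calculator data can be sketched in Python. This is a minimal illustration and not part of the original text: the function names joint_ranks and kruskal_wallis_h are hypothetical, tied values receive their average rank as in the rank table, and no correction for ties is applied to H.

```python
def joint_ranks(samples):
    """Rank all values jointly (1 = smallest); tied values share their average rank."""
    pooled = sorted(v for s in samples for v in s)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Sorted positions i+1 .. j hold the same value, so they share the mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return [[rank_of[v] for v in s] for s in samples]


def kruskal_wallis_h(samples):
    """Return (rank sums, H) with H = 12/(n(n+1)) * sum(Ri^2/ni) - 3(n+1)."""
    ranks = joint_ranks(samples)
    n = sum(len(s) for s in samples)
    rank_sums = [sum(r) for r in ranks]
    total = sum(R * R / len(s) for R, s in zip(rank_sums, samples))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return rank_sums, h


calculators = [
    [24.0, 16.7, 22.8, 19.8, 18.9],
    [23.2, 19.8, 18.1, 17.6, 20.2, 17.8],
    [18.4, 19.1, 17.3, 17.3, 19.7, 18.9, 18.8, 19.3],
]
rank_sums, h = kruskal_wallis_h(calculators)
critical = 5.991  # chi-square table value for alpha = 0.05 and 2 d.f., taken from the text
decision = "reject H0" if h >= critical else "do not reject H0"
print(rank_sums, round(h, 3), decision)
```

Under these assumptions the rank sums agree with R1 = 61, R2 = 63.5 and R3 = 65.5, and H comes out at about 1.66, which is below 5.991, so H0 would not be rejected for these data.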
Example:
The following are the final examination grades of samples from three groups of students who
were taught German by three different methods:
1st method: 94, 88, 91, 74, 87, 97
2nd method: 85, 82, 79, 84, 61, 72, 80
3rd method: 89, 67, 72, 76, 69
Use the H test at the 0.05 level of significance to test the null hypothesis that the three
methods are equally effective.
Solution:
Assign ranks jointly:
1st method   Rank (1)   2nd method   Rank (2)   3rd method   Rank (3)
94           17         85           12         89           15
88           14         82           10         67           2
91           16         79           8          72           4.5
74           6          84           11         76           7
87           13         61           1          69           3
97           18         72           4.5
                        80           9
Here,
R1 = 17 + 14 + 16 + 6 + 13 + 18 = 84, n1 = 6
R2 = 12 + 10 + 8 + 11 + 1 + 4.5 + 9 = 55.5, n2 = 7
R3 = 15 + 2 + 4.5 + 7 + 3 = 31.5, n3 = 5
n = 6 + 7 + 5 = 18
Setting up hypothesis:
Null hypothesis (H0): all the means are equal, i.e., the three methods are equally effective
Alternative hypothesis (H1): not all the means are equal
Level of significance: α = 0.05
Choose an appropriate test statistic and determine critical value: Since this is a
Kruskal-Wallis test, we use the H test with
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
where the critical value, for ν = k – 1 = 2 degrees of freedom, is
$\chi^2_{\alpha,\nu} = \chi^2_{0.05,2} = 5.991$. Thus we reject the null hypothesis H0 if H ≥ 5.991.
Compute the value of the test statistic:
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
$$= \frac{12}{n(n+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(n+1)$$
$$= \frac{12}{18 \times 19} \left( \frac{84^2}{6} + \frac{55.5^2}{7} + \frac{31.5^2}{5} \right) - 3 \times 19 = 6.666$$
Making decision: Since the calculated value H = 6.666 is greater than the tabulated value
5.991, we reject the null hypothesis H0. Thus we conclude that the three methods are not equally effective.
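The arithmetic of this example can be checked from the rank sums alone. The sketch below is illustrative only: h_from_rank_sums is a hypothetical helper name, and it relies on the fact that for 2 degrees of freedom the chi-square upper tail is P(χ² ≥ x) = exp(–x/2), so the critical value is –2 ln α and an approximate p-value is available without a table. That shortcut holds only for 2 d.f. (three groups).

```python
import math


def h_from_rank_sums(rank_sums, sizes):
    """Kruskal-Wallis H computed from the rank sums Ri and sample sizes ni."""
    n = sum(sizes)
    total = sum(R ** 2 / m for R, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


alpha = 0.05
h = h_from_rank_sums([84, 55.5, 31.5], [6, 7, 5])

# Chi-square with 2 d.f. only: upper-tail probability is exp(-x / 2).
critical = -2 * math.log(alpha)   # about 5.991, the tabulated value used in the text
p_value = math.exp(-h / 2)        # large-sample approximation to P(H >= H_cal)

print(round(h, 3), round(critical, 3), round(p_value, 4))
```

With these inputs H is about 6.67, which exceeds the critical value, and the approximate p-value is below 0.05, in line with the decision to reject H0.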
Example:
To compare three bowling balls, a professional bowler bowls five games with each ball and
gets the following results:
Ball A: 208 220 247 192 229
Ball B: 216 196 189 205 210
Ball C: 212 198 207 232 221
Use the Kruskal-Wallis test at the 0.05 level of significance to test whether the bowler can
expect to score equally well with the three bowling balls.
Solution:
Ball A Rank (A) Ball B Rank (B) Ball C Rank (C)
208 7 216 10 212 9
220 11 196 3 198 4
247 15 189 1 207 6
192 2 205 5 232 14
229 13 210 8 221 12
Totals R1 = 48 R2 = 27 R3 = 45
Here,
K = 3, n1 = 5, n2 = 5, n3 = 5 and n = n1 + n2 + n3 = 5 + 5 + 5 = 15
Setting up hypothesis:
Null hypothesis (H0): all the means are equal
Alternative hypothesis (H1): not all the means are equal
Level of Significance (α) = 0.05
Choose an appropriate test statistic and determine critical value: Since this is a
Kruskal-Wallis test, we use the H test with
Theoretical Questions
1. What do you mean by non-parametric tests? Distinguish between parametric and
non-parametric tests.
2. State the basic assumptions associated with the non-parametric tests. Give advantages and
disadvantages of non-parametric tests.
4. What non-parametric test would you use when the two samples are related? Explain the two
samples sign test for small samples.
10. Describe the process of testing whether or not two attributes are independent.
11. Define a contingency table. Discuss the χ2 -test of independence of two attributes in a 2×2
contingency table.
12. Explain Yates correction for continuity.
13. Explain the test procedure of Mann-Whitney U test for (i) small sample case and (ii) large
sample case.
14. Discuss the Kruskal-Wallis one way ANOVA test for small samples. How is the Kruskal-
Wallis H test carried out for large samples?
Practical Problems
1. The following table gives the number of accidents that occurred during the seven days of a
week. Test whether the accidents are uniformly distributed throughout the week.
Days Sun Mon Tue Wed Thu Fri Sat
No. of accidents 14 18 12 11 15 14 14
2. The demand for a particular spare part in a factory was found to vary from day-to-day. In a
sample study the following information was obtained:
Days Sun Mon Tue Wed Thu Fri
No. of parts demanded 124 125 110 120 126 115
Test the hypothesis that the number of parts demanded does not depend on the day of the week.
3. Genetic theory states that children having one parent of blood type M and the other of blood
type N will always be one of the three types M, MN and N, and that the proportions of the three
types will on average be 1:2:1. A report states that out of 300 children having one M parent and
one N parent, 30% were found to be type M, 45% type MN and the remainder type N. Test the
hypothesis by the χ2 test.
4. The theory predicts that the proportions of beans in the four groups, A, B, C, and D should be
9:3:3:1 respectively. In an experiment, it was found that out of 1600 beans the number of beans
distributed in the four groups were 882, 313, 287, and 118 respectively. Does this experiment
result support the theory at 5% level of significance?
5. A survey amongst women was conducted to study the family life. The observations are as
follows:
                    Family life
                 Happy   Not Happy   Total
Educated           70       30        100
Not educated       60       40        100
Total             130       70        200
Test whether there is any association between family life and education.
6. The following table gives the classification of 100 workers according to gender and the nature
of work. Test whether nature of work is associated with the gender of the worker.
Skilled Unskilled
Male 40 20
Female 10 30
7. In an industry, 200 workers employed for a specific job were classified according to their
performance and whether or not they had received a specific training, in order to test the
independence of training and performance. The data are summarized as follows:
                  Performance
               Good   Not good   Total
Trained         100      50       150
Untrained        20      30        50
Total           120      80       200
Use chi-square test of independence at 5% level of significance and write your conclusion.
8. One thousand students at college level were studied in terms of their CMAT scores and the
types of school they came from, and the data are presented below:
                      CMAT
                  High    Low    Total
Private School     460    140     600
Public School      240    160     400
Total              700    300    1000
At 5% level of significance, can you conclude that there is association between the types of
school and the CMAT scores?
9. A tobacco company claims that there is no relationship between smoking and lung ailments. To
investigate the claim, a random sample of 300 males in the age group of 40 to 50 is given a medical
test. The observed sample results are tabulated below:
Lung ailment No lung ailment Total
Smokers 75 105 180
Non-Smokers 25 95 120
Total 100 200 300
On the basis of this information, can it be concluded that smoking and lung ailments are
independent?
10. From the following data, can you conclude that inoculation is effective in preventing
tuberculosis?
Group Attacked Not attacked Total
Inoculated 10 90 100
Not inoculated 26 74 100
Total 36 164 200
11. There is a general belief that high-income families send their children to private schools. To
verify this, 1,600 families were selected at random in a city and the following results were
obtained:
              Type of schooling
Income      Public   Private   Total
Low           506      494      1000
High          438      162       600
Total         944      656      1600
At 5% level of significance, can you conclude that there is association between the type of
schooling and the income level of the families?
12. Out of 8,000 graduates in a town, 800 are females; out of 1,600 graduate employees, 120 are
female. Use chi-square test to determine if any distinction is made in appointment on the basis
of gender.
13. In a survey of 200 boys, of which 75 were intelligent, 40 had skilled fathers, while 85 of the
unintelligent boys had unskilled fathers. Do these data support the hypothesis that skilled
fathers have intelligent boys? Use chi-square test.
14. A sample of 300 students of undergraduate and 300 of postgraduate classes of a university
were asked to give their opinion towards the autonomous colleges. 190 of the undergraduate
and 210 of the postgraduate students favoured the autonomous status.
Present the above facts in the form of a frequency table, and test at 5% level whether the opinions
of undergraduate and postgraduate students on autonomous status of colleges are independent.
15. Nepal Telecom Company conducts a survey to determine the ownership of cellular phones in
different age groups. The results for 1000 households are shown in the table given below. Test
the hypothesis that the proportions owning cellular phones are the same for different age
groups.
Cellular phone            Age Groups                Total
                 18-24   25-54   55-64   ≥ 65
Yes                50      80      70      50        250
No                200     170     180     200        750
Total             250     250     250     250       1000
If appropriate, use the Marascuilo procedure and α = 0.05 to determine which age groups are
different.
16. An automobile company gives you the following information about age groups and the liking
for a particular model of car which it plans to introduce.
                  Below 20   20 – 39   40 – 59   60 and above   Total
Liked the car       140        80        40          20          280
Disliked the car     60        50        30          80          220
Total               200       130        70         100          500
On the basis of these data, can it be concluded that the model appeal is independent of the age
group of the persons? If appropriate, use the Marascuilo procedure and α = 0.05 to determine which
age groups are different.
17. The sales patterns of trained and fresh salesmen were recorded as follows:
                          Sales (Rs)
                   0-500   500-1000   Above 1000   Total
Trained salesmen     15       25          30         70
Fresh salesmen       15       10           5         30
Total                30       35          35        100
Is there any evidence to conclude that the training of salesmen and the sales revenue are
associated? If appropriate, use the Marascuilo procedure and α = 0.05 to determine which
groups are different.
18. In a survey of media preferences, 8 to 17 year-olds were asked which medium was their
favorite and the following result was obtained.
Medium
Internet TV Phone Radio Other
Boys 190 170 60 60 20
Girls 140 85 155 85 35
Test the hypothesis that the distribution of media preferences is the same for boys and girls in
this age group at 1% level of significance. If appropriate, use the Marascuilo procedure and
α = 0.05 to determine which groups are different.
19. The following information is obtained concerning an investigation of 50 ordinary shops of
small size:
                       Shops
                In towns   In villages   Total
Run by men         17          18          35
Run by women        3          12          15
Total              20          30          50
Can it be inferred that the proportions of shops run by men and by women are the same in villages
and towns?
20. The distribution of 100 individuals by hair color and eye color is given below. Is there any
association between these two attributes?
                  Hair color
Eye color    Black   Fair   Brown   Total
Blue           25     12      8      45
Grey           20      5      5      30
Brown          15      5      5      25
Total          60     22     18     100
21. A random sample of 200 students was selected and their grading in ability in mathematics and
interest in business administration were as given below.
24. A machine is set up to fill cookies into packages in such a way that the net average weight
is 500 grams with standard deviation 5 grams. If the standard deviation is more than 5 grams,
the machine needs an adjustment. For this purpose, a random sample of 20 packages is
selected, which shows a sample SD of 6 grams. Test at 1% level of significance whether the
machine needs an adjustment.
25. A manufacturing process produces articles with standard deviation 1. A new manufacturing
process is designed with the objective that it has less variance. A random sample of size 25
shows a sample standard deviation of 0.73. Has the objective been fulfilled?
26. Test the hypothesis that the population SD is 8, given that the sample SD is 10 for a random sample of size 51.
27. Two groups of rats, one consisting of trained rats and the other of untrained rats (i.e.,
controls), required the following numbers of trials to achieve a certain criterion:
Trained rats 78 64 75 45 82
Use Mann-Whitney U test to test if there is a difference between the average numbers of
trials of trained and untrained rats.
28. Two independent random samples of unemployed men and women are drawn and the ages of
the 4 unemployed women and 5 unemployed men are recorded as follows:
Women 60 63 36 44
Men 53 39 22 33 24
Do these data present sufficient evidence to conclude that there is a difference in average age of
unemployed men and women? Use Mann-Whitney U test at α = 0.05 to test whether the two
random samples have come from the same population or from populations with the same pdf.
29. Two independent random sample drawn from two populations are as follows:
Sample I 4 6 8 10 12 13 14 15 19
Sample II 1 2 3 5 7 9 11 18
Applying Mann-Whitney U test with α = 0.05, test whether the two samples have come from
the same population or from populations with equal mean.
30. A study is performed to compare the variability of two brands of tire. The following mileages
(100 miles) were obtained for eight tires of each kind.
Do these data provide sufficient evidence to support the research hypothesis that there is a
significant difference between the average mileages of the two brands of tires? Use the
Mann-Whitney U test at (i) α = 0.01 (ii) α = 0.05 and (iii) α = 0.10.
31. The following are the scores of certain randomly selected students at mid-term (MT) and final
examination (FE).
MT 55 57 72 92 57 74
FE 80 76 63 58 56 37 75
Set up null hypothesis and alternative hypothesis and test your null hypothesis using Mann-
Whitney U-test.
32. The bacteria counts per unit volume are shown below for two types of cultures A and B. Four
observations were made for each culture:
A 27 31 26 25
B 32 29 35 28
Do these data present sufficient evidence to indicate a difference in the population distributions
for A and B? Use Mann-Whitney U test.
33. Two plastics, each produced by a different process, were tested for ultimate strength. The
measurements shown below represent load in units of 1000 pounds per square inch:
Plastic I 15.3 18.7 22.3 17.65 19.1 14.8
Plastic II 21.2 22.4 18.3 19.3 17.1 27.7
(i) Use Mann-Whitney U test with α = 0.05 to test the null hypothesis that there is no difference in
the distributions of ultimate strength of the two plastics.
(ii) Do the data present sufficient evidence to conclude that the average load per square inch of
plastic II is greater than that of plastic I? Use Mann-Whitney U test with α = 0.01 and α = 0.10.
34. An interview for free-ship was conducted for two groups, one from each of the two campuses
X and Y. The scores obtained by the students of the two groups were as follows:
Group X 6 3 4 2 1 5 7 4 8 5
Group Y 7 1 5 4 8 9 7 10 11 5
12 15 9 3 2 1 11 14 13 15
Use the Mann-Whitney U test to test the research hypothesis that group Y is more
appropriate than group X for the free-ship.
35. The following are data on the breaking strength (in pounds) of random samples of two kinds of
2-inch cotton ribbons.
Type I Ribbon:    144 181 200 187 169 171 186 194
                  176 182 133 183 197 165 180 198
Type II Ribbon:   175 164 172 194 176 198 154 134
                  169 164 185 159 161 189 170 164
Use the U test at the 0.05 level of significance to test the claim that Type I ribbon is, on the
average, stronger than Type II ribbon.
36. For the following scores of 3 matched groups, apply the Kruskal-Wallis H test to test the
hypothesis that the three groups are not significantly different.
Group Scores
A 96 128 83 61 101
B 82 124 132 135 109
C 115 149 166 147 -
37. Following data represent the marks obtained by a student in 3 different subjects during first
semester of MBA.
Group Marks
Statistics 81 75 92 78 83
Accountancy 74 79 82 90 -
Economics 84 70 65 72 -
Apply the Kruskal-Wallis H test to test whether or not there is a difference between the marks in
the three subjects.
38. An agricultural experiment was conducted to compare the yield of wheat by using the three
types of chemical fertilizer nitrogen (N), phosphorus (P) and potash (K). Twelve plots of equal
size were selected at random and divided into three groups of four each and planted wheat.
Each group was randomly selected and the fertilizer was applied in the plots under the identical
conditions. Then the yields of wheat recorded were given in the following table:
Chemical Fertilizer
N P K
122 81 80
80 80 82
138 79 65
121 65 58
(i) Apply the Kruskal-Wallis H test to test whether or not the three types of fertilizers are
equally effective.
or, Test whether the populations of yields of wheat under the three types of fertilizers differ
significantly in location.
39. A programme of a political party is conducted keeping in view the forthcoming election for
parliament in Nepal. Five states are selected at random and each is divided into 5 sub-regions.
For the programme, each of five central members of the political party is sent to the five
sub-regions selected at random, and the increments in the number of voters (in 100) found in the
election are given in the following table:
Central Members        State
             1    2    3    4    5
A 41 40 38 40 28
B 41 50 46 40 -
C 42 38 54 40 44
D 30 35 38 - 42
E 24 15 - 20 -
Apply the Kruskal-Wallis H test to test the hypothesis that the populations of increments in the
number of voters corresponding to the five central members do not differ significantly in
location.
40. For a certain period, the advertisement through three different media T.V., Radio and
newspapers, is given by the Pepsi-cola company for promoting sales of the Pepsi-Cola. The
sales (in 1000 bottles) recorded from 6 cities are given in the following table.
Advertising Sales in Cities
Media Kathmandu Bhaktapur Lalitpur Pokhara Biratnagar Birgunj
T.V. 22 15 14 21 16 20
Radio 9 7 13 10 12 7
Newspaper 14 12 8 7 11 6
Apply the Kruskal-Wallis H test to test whether the three advertising media are equally effective
in promoting the sales of Pepsi-Cola.
41. The following are the final examination grades of samples from three groups of students who
were taught German by three different methods (classroom instruction and language laboratory,
only classroom instruction, and only self-study in language laboratory):
First method 94 88 91 74 87 97
Second Method 82 82 79 84 61 72 80
Third Method 98 67 72 76 69
Use the H test at the 0.05 level of significance to test the null hypothesis that the three methods
are equally effective.
42. The following are the miles per gallon that a test driver got for 10 tankfuls of each of three
kinds of gasoline:
Gasoline A: 20 31 24 33 23 24 28 16 19 26
Gasoline B: 29 18 29 19 20 21 34 33 30 23
Gasoline C: 19 31 16 26 31 33 28 28 25 30
Use the Kruskal-Wallis test at the 0.05 level of significance to test whether there is a difference
in the actual average mileage yield of the three kinds of gasoline.
15. χ2 = 14.3, reject H0; ∵ |p1 – p2|, |p2 – p4| are > C.R. ∴ there is a significant difference between
the (18 – 24) and (25 – 54) groups and between the (25 – 54) and ≥ 65 age groups.
16. χ2 = 70.16, reject H0; ∵ |p1 – p4|, |p2 – p4|, and |p3 – p4| are > C.R. ∴ there is a significant
difference between p4 and the other proportions.
17. χ2 = 9.85, reject H0; ∵ |p1 – p3| > C.R. ∴ there is a significant difference between the
Rs. (0-500) and over Rs. 1000 sales.
27. U = 9 > U0.05, (5, 4) = 1, accept H0 28. U = 3 > U0.05, (4, 5) = 1, accept H0
29. U = 18, accept H0 30. U = 29, accept H0 at α = 0.01, α = 0.05 and α = 0.10
33. (i) U = 9, accept H0 (ii) U = 9, accept H0 at α = 0.01 and α = 0.05 but reject H0 at α = 0.10
23. While performing Kruskal-Wallis test, the ranks are assigned:
a. Independently to the observations for each treatment
b. For observations in each block independently
c. By pooling all the observations
d. None of the above
24. The test statistics in the Kruskal-Wallis test is:
a. Weighted sum of squares of the deviations of the sums of treatment ranks from the
expected sums of ranks
b. Sum of squares of the deviations of the sums of treatment ranks from the expected sums of
ranks
c. Both (a) and (b)
d. Neither (a) nor (b)
25. The statistic H under the Kruskal-Wallis test is approximately distributed as:
a. Student’s t b. Snedecor’s F c. Chi-square d. Normal deviate-Z
26. When the number of treatments in the Kruskal-Wallis test is two, the statistic H reduces to:
a. Mann-Whitney U statistic b. Wilcoxon’s U statistic
c. Both (a) and (b) d. Neither (a) nor (b)
27. Kruskal-Wallis H with K treatments and n blocks which is approximated to Chi-square
has d. f:
a. (n -1) b. (k – 1) (n – 1) c. (k – 1) d. K (n – 1)
28. If ties occur in the Kruskal-Wallis test, with usual notations, the correction C for ties is:
a. $\frac{\Sigma T}{n(n^2 - 1)}$ b. $\frac{\Sigma T}{k(n^2 - 1)}$ c. $\frac{\Sigma T}{Kn(n - 1)}$ d. None of the above
29. If C is the correction factor for ties in the Kruskal-Wallis test statistic H, the corrected test
statistic is:
a. H – C b. H/C c. H + C d. H × C
30. The hypothesis that the population variance has a specified value can be tested by:
a. F-test b. Z-test c. χ2 -test d. None of the above
31. The test statistic to be used to test H0: σ² = C vs. H1: σ² ≠ C with usual notations is:
a. $\chi^2 = \frac{(n-1)s^2}{C^2}$ b. $\chi^2 = \frac{(n-1)s^2}{C}$ c. $\chi^2 = \frac{ns^2}{C^2}$ d. All the above
32. Statistic-χ² to test H0: σ² = σ0², based on a sample of size n, has degrees of freedom equal to:
a. n-1 b. n c. (n+1) d. None of the above
33. Degrees of freedom for statistic-χ² in case of contingency table of order (2 × 2) is
a. 3 b. 4 c. 2 d. 1
34. In a multinomial distribution with 4 classes, the degrees of freedom for χ2 is:
a. 3 b. 4 c. 2 d. 1
35. Formula for χ2 for testing a null hypothesis in a multinomial distribution with usual
notations is:
a. $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$ b. $\chi^2 = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n$
c. $\chi^2 = \sum_{i=1}^{k} \frac{O_i^2}{np_i} - n$ d. all the above
36. The statistic-χ2 with usual notations in case of contingency table of order (m × p) is
given by the formula:
a. $\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{p} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ b. $\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{p} \frac{O_{ij}^2 - E_{ij}}{E_{ij}}$
c. $\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{p} \frac{O_{ij} - E_{ij}^2}{E_{ij}}$ d. all the above
37. Degrees of freedom for Chi-square in case of contingency table of order (4 × 3) are:
a. 12 b. 9 c. 8 d. 6
44. In parametric test, which of the following is true?
a. Shape of distribution is required.
b. Preserves sufficiency.
c. Uses the population parameter.
d. Ignore the sufficiency.
45. Which test preserves the sufficiency?
a. Parametric test.
b. Non-parametric test.
c. both, parametric and non-parametric test.
d. None of the above.
46. Which of the following is not non-parametric test?
a. χ2 b. Rank correlation c. sign test d. F-test
47. In the K-W test of k samples, the appropriate number of degrees of freedom is:
a. K b. K-1 c. nk-1 d. n-k
48. Range of the statistic χ2 is:
a. -1 to +1 b. -∞ to +∞ c. 0 to ∞ d. 0 to 1
49. For sample size greater than 30, the sampling distribution of rank correlation coefficient
is approximately which distribution.
a. t b. binomial c. chi-square d. Normal.
50. Which test is the non-parametric alternative to two-way analysis of variance?
a. H test b. U test c. Friedman test d. Run test
51. Which test is used to test the independence of two random variables?
a. Z-test b. t-test c. Rank correlation d. Run test
52. Which of the following is the test statistic for the H-test?
a. $\frac{12}{n(n+1)} \sum \frac{r_i^2}{n_i} - 3(n+1)$ b. $\frac{12}{bk(K+1)}$
c. $\frac{n(n+1)(2n+1)}{24}$ d. $\frac{n(n+1)(2n+1)}{12}$
1. a 2. b 3. c 4. c 5. c 6. a 7. c 8. b 9. a 10. c
11. a 12. c 13. c 14. c 15. a 16. a 17. c 18. c 19. d 20. c
21. a 22. a 23. c 24. a 25. c 26. c 27. c 28. a 29. b 30. c
31. b 32. a 33. d 34. a 35. d 36. a 37. d 38. c 39. b 40. b
41. d 42. d 43. c 44. d 45. a 46. d 47. b 48. b 49. d 50. c
51. c 52. a
Unit Simple Linear Regression
6.1 Correlation
Correlation Analysis is a statistical tool which studies the relationship between two or more
than two variables. Correlation Analysis involves various methods and techniques used for studying
and measuring the extent of relationship between the variables. Two variables are said to be correlated if the
change in one variable results in a corresponding change in the other variable. For example:
a. Income and expenditure of a family b. Demand and supply of a commodity
c. Age of husbands and wives. d. Sales and advertising of a commodity.
e. Marks of students in two subjects. f. Interest rate and deposit in a Bank
a) Positive and Negative Correlation: Correlation can be both positive and negative. If the
increase in the value of one variable results in a corresponding increase in the other variable,
then it is known as Positive Correlation, for example, income and expenditure of a family. On the
other hand, if the increase in one variable results in a corresponding decrease in the other
variable, then it is known as Negative Correlation, for example, price and demand of a commodity.
b) Simple, Multiple and Partial Correlation: If the relationship between only two variables is
studied then it is known as simple correlation. In a multiple correlation, however, we study
more than two variables. For example, the simultaneous study of correlation among land,
labour, capital and production of a crop is an example of a multiple correlation. Correlation
among deposit in a bank, interest rate, income, population, etc. is also a multiple correlation.
In a partial correlation, we study two variables, other variables remaining constant. For
example, the deposit in the bank with 5% interest rate (constant) and varying income level, is
a partial correlation between deposit and income level, keeping the interest rate constant.
c) Linear and Non-Linear Correlation: The correlation between two variables is said to be
linear if a change in one variable bears a constant ratio to the corresponding change in the
other variable. For example,
X 0 1 2 3 4
Y 3 5 7 9 11
The linear correlation is represented by a straight line.
The correlation between two variables is said to be Non-linear if the change in one variable
does not bear a constant ratio to the corresponding change in the other variable. For example:
X 0 1 2 3
Y 5 10 19 21
The non-linear correlation can be explained by log-linear, exponential or any other
relationship.
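The constant-ratio idea can be checked directly on the two small tables. The following is a rough sketch, not a formal test of linearity: the helper names change_ratios and is_linear are hypothetical, and the tolerance tol is an arbitrary assumption.

```python
def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X between consecutive points."""
    return [(y2 - y1) / (x2 - x1)
            for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


linear_x, linear_y = [0, 1, 2, 3, 4], [3, 5, 7, 9, 11]
curved_x, curved_y = [0, 1, 2, 3], [5, 10, 19, 21]

print(change_ratios(linear_x, linear_y), is_linear(linear_x, linear_y))
print(change_ratios(curved_x, curved_y), is_linear(curved_x, curved_y))
```

For the first table every ratio equals 2, so the points lie on a straight line; for the second the ratios are 5, 9 and 2, so the relationship is non-linear.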
6.1.2 Cause and Effect Relationship
In measuring correlation, the cause and effect relationship must be established between
variables. If there is no cause and effect relationship between the variables, there is little sense in
computing correlation between them. However, if the correlation is computed without establishing a
relationship of cause and effect, the correlation is called 'nonsense correlation', or statistically, it is
called 'spurious' correlation.
If a change in one variable leads to a change in another variable in any way, a cause and effect
relationship is said to exist between them. For example, price and demand, advertising expense
and sales, age and weight, interest rate and deposit in a bank, and so on.
The variable that causes the change is called 'causal variable' such as price, advertising
expense, age, and interest rate in the above examples. And, the variable, which is the result of a
change in causal variable, is called 'effect variable' such as, demand, sales, weight, and deposit in the
above examples. In other words, causal variables are also called independent variables and effect
variables are called dependent variables.
A correlation computed between rainfall in the USA and the production of paddy in Nepal is an
example of 'nonsense correlation'.
The commonly used methods, in studying the correlation between two variables, are
a) Scatter diagram (Graphical Method)
b) Karl Pearson's Coefficient of Correlation (Mathematical method)
a) Scatter Diagram (Graphical Method)
This is the simplest method: a diagrammatic representation of a bivariate distribution. For a
given bivariate distribution, where X is the independent variable and Y is the dependent
variable, we plot the given data in the form of dots. The diagram formed by these dots is known
as a Scatter Diagram.
If all the points lie on a straight line rising from lower left to upper right, it is known as
Perfect Positive Correlation. Similarly, if all the points lie on a straight line falling from
upper left to lower right, it is known as perfect negative correlation. If the points are densely
clustered about an upward or a downward slant, there is a high degree of positive or negative
correlation respectively. Similarly, if the points are widely scattered about an upward or a
downward slope, there is a low degree of positive or negative correlation respectively. Again, if
the points are very widely scattered and do not show any slant, we say that there is no
correlation between the two variables.
The scatter diagram method, however can be used only to observe the correlation. It does not
help us find the magnitude of the correlation.
We can see this in the following diagrams.
[Figure: scatter diagrams showing Perfect Positive Correlation (r = +1), Perfect Negative Correlation (r = –1), Low Degree of Positive Correlation, Low degree of negative correlation, High degree of positive correlation, High degree of negative correlation, and No correlation]
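$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$
$$SS_Y = \sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2$$
$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y}$$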
Properties of Correlation: Some properties of Karl Pearson's correlation coefficient
a. The correlation coefficient lies between +1 and -1 i.e. –1 ≤ r ≤ 1.
b. The correlation coefficient is independent of change of origin and scale i.e rxy = ruv
c. The correlation coefficient is the geometric mean of two regression coefficients.
d. If r = 0, there is no correlation.
r = +1, there is perfect positive correlation
r = –1, there is perfect negative correlation
r = closer to +1, a high degree of positive correlation
r = closer to –1, a high degree of negative correlation
r = closer to 0 a low degree of positive or negative correlation
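Property (b) can be illustrated numerically. The sketch below is only a spot check, not a proof: pearson_r is a hypothetical helper, the origins (5 and 15) and positive scales (2 and 3) are arbitrary choices, and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [12, 11, 13, 15, 14, 17, 16, 19, 18]

# Change of origin and scale: u = (x - 5) / 2, v = (y - 15) / 3 (both scales positive).
us = [(x - 5) / 2 for x in xs]
vs = [(y - 15) / 3 for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print(round(r_xy, 4), round(r_uv, 4))
```

The two values agree up to rounding error. If one of the scale factors were negative, the magnitude of r would stay the same but its sign would flip.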
Example:
Find the coefficient of correlation between X and Y. Also develop the scatter diagram for
these data:
X 1 2 3 4 5 6 7 8 9
Y 12 11 13 15 14 17 16 19 18
Solution:
X Y X2 Y2 XY
1 12 1 144 12
2 11 4 121 22
3 13 9 169 39
4 15 16 225 60
5 14 25 196 70
6 17 36 289 102
7 16 49 256 112
8 19 64 361 152
9 18 81 324 162
45 135 285 2085 731
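Karl Pearson's coefficient of correlation is given by
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$
$$= \frac{9 \times 731 - 45 \times 135}{\sqrt{[9 \times 285 - (45)^2]\,[9 \times 2085 - (135)^2]}}$$
$$= \frac{504}{23.24 \times 23.24} = 0.933$$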
There is a high degree of positive correlation between X and Y variables.
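The computation above can be reproduced from the column totals. This is a minimal sketch using the raw-sums form of the formula; the function name pearson_r_from_sums is an assumption made for illustration.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson's r via n*SumXY - SumX*SumY over the square root of the two bracketed terms."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator, denominator, numerator / denominator


X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [12, 11, 13, 15, 14, 17, 16, 19, 18]
numerator, denominator, r = pearson_r_from_sums(X, Y)
print(numerator, denominator, round(r, 3))
```

The numerator is 504 and the denominator is 540 (that is, 23.24 × 23.24 up to rounding), which gives r of about 0.933 as in the worked solution.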
[Figure: Scatter Diagram of Y (vertical axis, 10 to 19) against X (horizontal axis, 0 to 9)]
Example:
Calculate the correlation coefficient of the marks in Mathematics and Statistics obtained by
eight students given below. Also plot a scatter diagram.
Marks in Mathematics (X) 67 68 65 68 72 72 69 71
Marks in Statistics (Y) 65 66 67 67 68 69 70 72
Solution:
We take deviations from their respective means.
X     Y     X – 69   Y – 68   (X – 69)²   (Y – 68)²   (X – 69)(Y – 68)
67    65     -2       -3         4           9              6
68    66     -1       -2         1           4              2
65    67     -4       -1        16           1              4
68    67     -1       -1         1           1              1
72    68      3        0         9           0              0
72    69      3        1         9           1              3
69    70      0        2         0           4              0
71    72      2        4         4          16              8
552   544                       44          36             24
Here, $\bar{X} = \frac{\sum X}{n} = \frac{552}{8} = 69$, $\bar{Y} = \frac{\sum Y}{n} = \frac{544}{8} = 68$
Karl Pearson's coefficient of correlation is given by
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{24}{\sqrt{44}\,\sqrt{36}} = \frac{24}{6.63 \times 6} = 0.60$$
There is a positive correlation between the marks in the two subjects.
[Figure: Scatter Diagram of Y (vertical axis, 65 to 72) against X (horizontal axis, 65 to 72)]
Step 3: Test Statistic
Under H0, the test statistic is
$$t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2} \sim t_{n-2}$$
Where, r = the sample correlation coefficient
n = sample size
Calculations of r:
i. $$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$
Example:
A study of the heights of 18 pairs of boys and girls working in a call centre shows that the
coefficient of correlation is 0.50. Test whether the correlation is significant. Also find the 95%
confidence interval for the population correlation coefficient.
Solution:
We are given, n = 18, r = 0.50
Setting up hypotheses:
Null hypothesis H0: ρ = 0, i.e., the correlation between heights of boys and girls in the
population is not significant.
Alternative hypothesis H1: ρ ≠ 0, i.e., the correlation between heights of boys and girls in the
population is significant. [Two-tailed test]
Level of significance: It is given that the level of significance α = 5%.
Test Statistic: Under H0, the test statistic is
\[ t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2} = \frac{0.50 \times \sqrt{18 - 2}}{\sqrt{1 - (0.50)^2}} = 2.309 \]
∴ t = 2.309
Degrees of freedom: df = n – 2 = 18 – 2 = 16
Critical value: The tabulated value of t at the 5% level of significance for 16 degrees of
freedom in a two-tailed test is ± 2.120, i.e., | t0.05,16 | = 2.120.
Decision: Since the calculated value t = 2.309 is greater than the tabulated value
| t0.05, 16 | = 2.120, H0 is rejected and H1 is accepted. Hence we conclude that the correlation
between heights of boys and girls in the population is significant.
95% confidence interval for the population correlation coefficient (ρ):
Here,
1 – α = 0.95 ⇒ α = 0.05
\[ \text{C.I. for } \rho = r \pm t_{\alpha,(n-2)} \times \frac{1 - r^2}{\sqrt{n}} \]
\[ = r \pm t_{0.05,\,16} \times \frac{1 - r^2}{\sqrt{n}} = 0.50 \pm 2.12 \times \frac{1 - (0.50)^2}{\sqrt{18}} = 0.50 \pm 0.375 \]
∴ Lower limit = 0.50 – 0.375 = 0.125
Upper limit = 0.50 + 0.375 = 0.875
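The test statistic and the interval above can be reproduced with a short Python sketch. The function names are hypothetical, the critical value 2.120 is simply passed in from the t-table as in the example, and the interval follows the formula used in this text.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def correlation_interval(r, n, t_crit):
    """Interval r +/- t_crit * (1 - r^2) / sqrt(n), as used in the text."""
    half_width = t_crit * (1 - r * r) / math.sqrt(n)
    return r - half_width, r + half_width

t = correlation_t(0.50, 18)
lower, upper = correlation_interval(0.50, 18, 2.120)
print(round(t, 3))                       # 2.309
print(round(lower, 3), round(upper, 3))  # 0.125 0.875
```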
Example:
Using the following data:
X 3 6 5 4 4 6 7 5
Y 3 2 3 5 3 6 6 4
a) Develop a scatter diagram for these data and state what kind of relationship exists between X
and Y.
b) Calculate the correlation coefficient and interpret the results.
c) Test the hypothesis that there exists a positive correlation between the two variables at the 5%
level of significance.
Solution:
a) Calculation of Correlation Coefficient
X Y X2 Y2 XY
3 3 9 9 9
6 2 36 4 12
5 3 25 9 15
4 5 16 25 20
4 3 16 9 12
6 6 36 36 36
7 6 49 36 42
5 4 25 16 20
ΣX = 40   ΣY = 32   ΣX² = 212   ΣY² = 144   ΣXY = 166
Scatter Diagram
[Figure: scatter diagram of Y against X (horizontal axis, 1 to 8)]
According to the scatter diagram, there exists a low degree of positive relationship between the
X and Y variables.
b) The correlation coefficient is
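\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} \]
\[ = \frac{8 \times 166 - 40 \times 32}{\sqrt{8 \times 212 - (40)^2}\,\sqrt{8 \times 144 - (32)^2}} = \frac{48}{110.85} \]
r ≈ 0.43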
c) Setting up hypothesis:
Null hypothesis (H0): ρ = 0 (There is no correlation between X and Y)
a) Scatter Diagram
[Figure: scatter diagram]
b) According to the scatter diagram, there exists a high degree of positive relationship between annual
profit and research and development, because the plotted points run from the lower left corner to
the upper right corner.
c) \[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{8 \times 2293 - 63 \times 310}{\sqrt{8 \times 555 - (63)^2}\,\sqrt{8 \times 13350 - (310)^2}} = 0.708 \]
Since r = 0.708, there is a high degree of positive relationship between annual
profit and sales.
Setting up hypothesis:
Null hypothesis (H0): ρ = 0 (There is no correlation between annual profit and sales)
Alternative hypothesis (H1): ρ > 0 (There is a positive correlation between annual profit and
sales). (right-tailed test)
Level of Significance: α = 0.05 and d.f. = n – 2 = 8 – 2 = 6
Test Statistic: Under H0,
\[ t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2} = \frac{0.708}{\sqrt{1 - (0.708)^2}} \times \sqrt{8 - 2} = 2.45 \]
Critical value: tα (n – 2) = t 0.05 (6) = 1.943
Decision: Since t > tα i.e. 2.45 > 1.943
Therefore, H0 is rejected and H1 is accepted. So there is evidence of positive correlation
between annual sales and profit.
6.2 Regression Analysis
The literal meaning of regression is stepping back or returning to the average value. The
term was first used by the British biometrician Sir Francis Galton in 1877 while studying the
relationship between the heights of fathers and sons. It is widely used in business and economics
to study the average relationship between two or more variables. Regression means the estimation or
prediction of an unknown value of one variable with the help of a known value of another variable,
based on historical data. The variable to be predicted is called the dependent variable and the known
variable is called the independent variable. The average relation between the independent and dependent
variables that is mathematically established is known as simple linear regression: simple because
there is only one independent variable, and linear because the relationship between the two variables
is linear. If there is more than one independent variable in a study, it is called multiple regression.
Flow Chart for Regression Models
[Figure: flow chart. Regression models divide into simple (1 explanatory variable) and multiple (2 or more explanatory variables).]
Linear regression attempts to model the relationship between two variables by fitting a linear
equation to observed data. One variable is considered to be an explanatory variable, and the other is
considered to be a dependent variable. For example, a modeler might want to relate the weights of
individuals to their heights using a linear regression model.
Before attempting to fit a linear model to observed data, a modeler should first determine
whether or not there is a relationship between the variables of interest. This does not necessarily imply
that one variable causes the other (for example, higher SAT scores do not cause higher college
grades), but that there is some significant association between the two variables. A scatter plot can
be a helpful tool in determining the strength of the relationship between two variables. If there
appears to be no association between the proposed explanatory and dependent variables (i.e. the
scatter plot does not indicate any increasing or decreasing trend), then fitting a linear regression
model to the data probably will not provide a useful model. A valuable numerical measure of
association between two variables is the correlation coefficient, which is a value between –1 and 1
indicating the strength of the association of the observed data for two variables.
The adjustment people make is to write the mean response as a linear function of the predictor
variable. This way, we allow for variation in individual responses (Y), while associating the mean
linearly with the predictor (X). The model we fit is as follows:
E(Y | X) = β0 + β1 X,
and we write the individual responses as
Y = β0 + β1 X + ε
Where,
Y = Dependent (response) variable (Population)
X = Independent (explanatory) variable (Population)
β0 = Population Y-intercept.
β1 = Population slope coefficient or population regression coefficient, i.e., it measures the
average rate of change in the dependent variable (Y) per unit change in the independent variable (X).
ε = Random error term.
We now have the problem of using the sample data to compute estimates of the parameters
β0 and β1. First, we take a sample of n objects, observing values Y of the response variable and X of
the predictor variable. As estimates of β0 and β1 we would like to choose the values b0 and b1 that
best fit the sample data; the line that best fits the set of data points is called the sample regression
line. The sample statistics b0 and b1 provide estimates of the population parameters β0 and β1 as well as
a predicted value of Y.
Y = b0 + b1X + e . . . (1)
This implies Ŷ = b0 + b1X
Where
Y = Dependent (response) variable
X = Independent (explanatory) variable.
b0 = Sample Y-intercept
b1 = Sample slope coefficient or sample regression coefficient, i.e., it measures the average rate of
change in the dependent variable (Y) per unit change in the independent variable (X).
e = Residual or error term = the difference between the observed and estimated values of the
dependent variable (Y) = Y – Ŷ
Ŷ = Estimated value of the dependent variable for a given value of the independent variable.
Flow chart of the estimation process in simple linear regression equation
[Figure: flow chart. The regression model Y = β0 + β1X + ε and the regression equation Y = β0 + β1X contain the unknown parameters β0 and β1. Sample data (x1, y1), (x2, y2), . . . , (xn, yn) give the sample statistics b0 and b1 and the estimated regression equation Ŷ = b0 + b1X; b0 and b1 provide estimates of β0 and β1.]
[Figure: estimated regression line ŷ = b0 + b1x drawn through the observed points, marking a residual, the slope b1 (change in ŷ per unit change in x) and the y-intercept b0]
By using the principle of least squares, we obtain two normal equations for the regression model (1).
The two normal equations of regression line (1) are
ΣY = nb0 + b1ΣX . . . (i)
ΣXY = b0ΣX + b1ΣX² . . . (ii)
By solving these two normal equations we get the values of b0 and b1 as
\[ b_1 = \frac{SS_{XY}}{SS_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \]
\[ b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} \]
After finding the values of b0 and b1, we get the required fitted regression model of Y on X as
Ŷ = b0 + b1X
Error Term (Residual): The difference between the observed and estimated values of the dependent
variable (Y) is called the error or residual, and it is denoted by 'e'.
e = Y – Ŷ
Where,
e = error
Y = observed value of the dependent variable
Ŷ = estimated value of the dependent variable for a given value of the independent variable
Example:
The data on sales and promotion expenditure on a product for 6 years are given below:
Years 1983 1984 1985 1986 1987 1988
Sales Rs.(Lakh) 8 10 9 12 10 11
Promotion Exp. Rs. (000) 2 2 3 5 5 6
Develop the estimating equation that describes the effect of promotion expenses on sales.
Estimate the promotion expenses to generate the sales level of 15 lakh rupees.
Solution:
Let X be the sales and Y be promotion expenditure.
X (Sales) Y (Promotion Exp.) X2 Y2 XY
8 2 64 4 16
10 2 100 4 20
9 3 81 9 27
12 5 144 25 60
10 5 100 25 50
11 6 121 36 66
60 23 610 103 239
Let the regression line of Y on X be
Y = b0 + b1X . . . (1)
We solve the following normal equations to find the values of b0 and b1:
ΣY = nb0 + b1ΣX
ΣXY = b0ΣX + b1ΣX²
or, 23 = 6b0 + 60b1 . . . (i)
239 = 60b0 + 610b1 . . . (ii)
Multiplying equation (i) by 10 and subtracting it from equation (ii), we get
239 = 60b0 + 610b1
230 = 60b0 + 600b1
–     –      –
9 = 10b1
b1 = 0.9
Substituting the value of b1 in equation (i), we get
\[ b_0 = \frac{23 - 60(0.9)}{6} = \frac{23 - 54}{6} = \frac{-31}{6} = -5.17 \]
∴ The regression equation of promotion expenses on sales (Y on X) is
Ŷ = b0 + b1X
or, Ŷ = –5.17 + 0.9X
The estimated promotion expenses needed to generate a sales level of 15 lakh rupees (X = 15) are
Ŷ = –5.17 + 0.9 (15) = 8.33, i.e., about Rs. 8.33 thousand.
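The solution of the two normal equations can be checked with a minimal Python sketch. The helper `fit_line` is a hypothetical name introduced here for illustration; it uses the closed-form expressions for b1 and b0 given earlier, with X as sales and Y as promotion expenses.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

sales = [8, 10, 9, 12, 10, 11]   # X, Rs. lakh
promo = [2, 2, 3, 5, 5, 6]       # Y, Rs. thousand
b0, b1 = fit_line(sales, promo)
predicted = b0 + b1 * 15         # promotion expenses for sales of 15 lakh
print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

Solving by elimination, as in the worked solution, and using the closed-form expressions give the same coefficients; the code only automates the arithmetic.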
Example:
Develop the estimating linear equation to predict sales (Y) when expenses (X) = Rs. 20,000.
Sales (Y) 50 50 55 60 65 65 65 60 60 50
Expenses (X) 11 13 14 16 16 15 15 14 13 13
Solution:
Calculation of Regression Coefficient
Here X = expenses and Y = sales, with X̄ = ΣX/n = 140/10 = 14 and Ȳ = ΣY/n = 580/10 = 58.
X     X – X̄   (X – X̄)²   Y     Y – Ȳ   (Y – Ȳ)²   (X – X̄)(Y – Ȳ)
11    –3      9          50    –8      64         24
13    –1      1          50    –8      64         8
14    0       0          55    –3      9          0
16    2       4          60    2       4          4
16    2       4          65    7       49         14
15    1       1          65    7       49         7
15    1       1          65    7       49         7
14    0       0          60    2       4          0
13    –1      1          60    2       4          –2
13    –1      1          50    –8      64         8
140   0       22         580   0       360        70
From the table, ΣX = 140, ΣY = 580, Σ(X – X̄)² = 22, Σ(Y – Ȳ)² = 360,
Σ(X – X̄)(Y – Ȳ) = 70, n = 10
Let the regression line of Y on X be
Y = b0 + b1X . . . (1)
Using the least squares method,
\[ b_1 = \frac{SS_{XY}}{SS_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{70}{22} = 3.182 \]
\[ \text{and, } b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{580}{10} - 3.182 \times \frac{140}{10} = 13.452 \]
Now substituting the values of b0 and b1 in equation (1),
Ŷ = 13.452 + 3.182X
When X = 20 (expenses of Rs. 20,000), then
Ŷ = 13.452 + 3.182 × 20 = 77.09
To evaluate the fitted regression model, several measures of variation need to be developed. In a
regression analysis, the total variation or total sum of squares (SST or TSS) is divided into explained
variation or regression sum of squares (SSR) and unexplained variation or error sum of squares (SSE).
These different measures of variation are shown in the following figure.
[Figure: fitted line ŷ = b0 + b1x, showing for one observation the total deviation (SST): y – ȳ, the explained deviation (SSR): ŷ – ȳ, and the unexplained deviation (SSE): y – ŷ]
From the figure, mathematically
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
i.e. TSS = SSR + SSE
Where, TSS = Total sum of squared deviations (or total variation) of the actual values of variable Y
from their mean value.
SSE = Sum of squares of error (unexplained variation) of the values of the dependent variable Y
around the least squares line (i.e. the amount of residual variation in the data
that is not explained by the independent variable X).
SSR = Sum of squares of regression (or explained variation), i.e., the variation in the values of the
dependent variable Y that is accounted for or explained by variation in the values of the independent variable X.
Computing the Sum of Squares
Total sum of squares: The total sum of squares, abbreviated TSS, is a measure of variation in the
values of the dependent variable (Y) around their mean value Ȳ. The total sum of squares can be
expressed mathematically as:
\[ TSS = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = \sum Y^2 - n\bar{Y}^2 \]
Regression Sum of Squares: The total sum of squares, or total variation, is divided into
the sum of two components. One of them is the explained variation due to the relationship between
the dependent variable (Y) and the independent variable (X). This variation is known as the
regression sum of squares and is denoted by SSR. The regression sum of squares is the sum of
the squared differences between the predicted value of Y and the mean value of Y, i.e.
\[ SSR = \sum (\hat{Y} - \bar{Y})^2 = b_0 \sum Y + b_1 \sum XY - \frac{(\sum Y)^2}{n} = b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2 \]
Error Sum of Squares: The other component of the total sum of squares is the unexplained
variation, which arises from factors other than the relationship between
variables X and Y. This component is known as the error sum of squares and is denoted by SSE.
The error sum of squares is computed as the sum of the squared differences between the observed
value of Y and the predicted value of Y, i.e.
\[ SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY \]
Here,
\[ TSS = \sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2 \]
\[ SSR = \sum (\hat{Y} - \bar{Y})^2 = b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2 \]
\[ SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY \]
Note: The difference between the actual and estimated values of the dependent variable is
called the residual. It is denoted by e, i.e., e = Y – Ŷ.
The standard error of the estimate measures the average variation or scatter of the
observed data points around the regression line. The standard error of the estimate is used to measure the
reliability of the regression equation; it is denoted by Se or Syx and is calculated by using the
following relation.
\[ S_e = \sqrt{\frac{SSE}{n - 2}} \]
\[ \text{or, } S_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} \]
\[ \text{or, } S_e = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n - 2}} \]
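The decomposition TSS = SSR + SSE and the standard error of the estimate can be illustrated with a small Python sketch. The function name `sums_of_squares` is hypothetical; the data are the sales (X) and promotion expenses (Y) from the earlier example, and each sum of squares is computed directly from its definition rather than from the shortcut formulas.

```python
import math

def sums_of_squares(xs, ys):
    """Return (TSS, SSR, SSE, Se) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # unexplained variation
    se = math.sqrt(sse / (n - 2))                           # standard error of estimate
    return tss, ssr, sse, se

sales = [8, 10, 9, 12, 10, 11]
promo = [2, 2, 3, 5, 5, 6]
tss, ssr, sse, se = sums_of_squares(sales, promo)
print(round(tss, 3), round(ssr, 3), round(sse, 3), round(se, 3))
```

Computing from the definitions makes the identity TSS = SSR + SSE visible in the output; the shortcut formulas in the text give the same values when b0 and b1 are not rounded.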
6.2.5 Interpreting the Standard Error of the Estimate
The regression line with the smaller standard error of the estimate is more
reliable than the regression line with the larger one, i.e., the smaller the value of the
standard error of the estimate, the more reliable the fitted regression line.
1. If Se = 0, there is no variation of the observed data around the regression line, i.e., all
the observed data lie on the regression line and the line is perfect for predicting the dependent variable.
2. If the value of Se is large, the fitted regression line is poor for predicting the dependent
variable, since there is greater variation of the observed data around the regression line.
3. If the value of Se is small, there is less variation of the observed data around the
regression line, so the regression line will be better for predicting the dependent variable.
For example, if Se = 2.5, the average variation of the observed data around the regression line is
2.5 units of the dependent variable.
The coefficient of determination measures the strength or extent of the linear association
between the dependent variable (Y) and the independent variable (X). It measures the proportion or percentage of
the variation in the dependent variable (Y) that is explained by the independent variable (X) of the
regression line. In other words, the coefficient of determination measures the share of the total variation in the
dependent variable that is due to the variation in the independent variable, and it is denoted by 'r²'. The following
relations are used to obtain the value of the coefficient of determination.
\[ r^2 = \frac{SSR}{SST} \]
\[ \text{or, } r^2 = 1 - \frac{SSE}{SST} \]
\[ \text{or, } r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} \]
\[ \text{or, } r^2 = \frac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} \]
Note: The coefficient of determination is the square of the correlation coefficient, so the
correlation coefficient is the square root of the coefficient of determination and can be
obtained from it by the following relation.
\[ r = \pm\sqrt{r^2} \]
If the regression coefficient (b1) is negative then take the negative sign.
If the regression coefficient (b1) is positive then take the positive sign.
For example, if the correlation coefficient between two variables is r = 0.8945, the
coefficient of determination is r² = (0.8945)² = 0.80. It means that 80% of the variation in the
dependent variable is explained by variation in the independent variable of the regression line, and the
remaining 20% of the variation in the dependent variable (Y) is due to other related factors which
are not accounted for in the model.
Example:
The city council of Kathmandu has gathered data on number of minor traffic accidents and the
number of youth football games that occurred in town over the weekend.
X (football games) 20 30 10 12 15 25 34
Y (minor accident) 6 9 4 5 7 8 9
a) Develop the estimating linear equation to predict minor accidents from football games.
b) Predict the number of minor traffic accidents that will occur on a weekend during which 30
soccer games take place in Kathmandu.
c) Calculate and interpret the standard error of estimate Se for these data.
d) Calculate and interpret the value of the coefficient of determination.
Solution:
Calculation of Regression Coefficient
X Y XY X2 Y2
20 6 120 400 36
30 9 270 900 81
10 4 40 100 16
12 5 60 144 25
15 7 105 225 49
25 8 200 625 64
34 9 306 1156 81
146 48 1101 3550 352
a) Let the regression line of Y on X be
Y = b0 + b1X . . . (i)
Using the least squares method,
\[ b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{7 \times 1101 - 146 \times 48}{7 \times 3550 - (146)^2} = 0.1977 \]
\[ b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{48}{7} - 0.1977 \times \frac{146}{7} = 2.73 \]
Now substituting the values of b0 and b1 in equation (i),
Ŷ = 2.73 + 0.1977X
b) When X = 30, then
Ŷ = 2.73 + 0.1977 × 30 = 8.661 ≈ 9 (approximately)
About 9 minor traffic accidents are expected on a weekend during which 30 soccer
games take place in Kathmandu.
c) Standard error of estimate, Se, for these data is given by
\[ S_e = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n - 2}} = \sqrt{\frac{352 - 2.73 \times 48 - 0.1977 \times 1101}{7 - 2}} = \sqrt{\frac{3.2923}{5}} = \sqrt{0.658} = 0.811 \]
i.e., the observed numbers of minor accidents vary around the regression line by about 0.811
accidents on average.
d) The coefficient of determination is given by
\[ r^2 = \frac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} = \frac{2.73 \times 48 + 0.1977 \times 1101 - 7 \times (6.857)^2}{352 - 7 \times (6.857)^2} = 0.8639 \]
(using unrounded values of b0 and b1).
Therefore, the coefficient of determination is 0.8639, which means 86.39% of the variation in
minor traffic accidents is explained by variation in the football games that occurred in Kathmandu,
and the remaining 13.61% of the variation in minor traffic accidents is due to other related
factors.
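The coefficient of determination in part (d) can be cross-checked with a short Python sketch that keeps full precision in b0 and b1. The function name `r_squared` is an illustrative assumption; the sign of r is taken from the sign of b1, as described in the note above.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination via (b0*SumY + b1*SumXY - n*Ybar^2) / (SumY^2 - n*Ybar^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    ssr = b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n
    tss = sum_y2 - sum_y ** 2 / n
    return ssr / tss, b1

games = [20, 30, 10, 12, 15, 25, 34]
accidents = [6, 9, 4, 5, 7, 8, 9]
r2, b1 = r_squared(games, accidents)
r = math.copysign(math.sqrt(r2), b1)   # r takes the sign of the slope
print(round(r2, 3), round(r, 3))
```

Because this formula subtracts two large, nearly equal quantities, rounding b0 and b1 to a few decimals before substituting can shift the third decimal place of r².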
Example:
A consultant is interested in seeing how accurately a new job performance index measures
what is important for a corporation. One way to check is to look at the relationship between the job
evaluation index and an employee's salary. A sample of eight employees was taken and information
about salary (in thousands of Rs.) and job performance index (1-10; 10 is best) was collected.
Job performance index (X) 9 7 8 4 7 5 5 6
Salary : (Y) 36 25 33 15 28 19 20 22
a) Develop the estimating linear equation that best describe these data.
b) Calculate and interpret the standard error of estimate Se for these data.
c) Calculate and interpret the coefficient of determination and coefficient of correlation for these
data by using regression constants.
Solution:
Calculation of Regression Coefficient
Job Performance index (X) Salary (Y) XY X2 Y2
9 36 324 81 1296
7 25 175 49 625
8 33 264 64 1089
4 15 60 16 225
7 28 196 49 784
5 19 95 25 361
5 20 100 25 400
6 22 132 36 484
51 198 1346 345 5264
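a) Let the regression line of Y on X be
Y = b0 + b1 X . . . (i)
Using the least squares method,
\[ b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{8 \times 1346 - 51 \times 198}{8 \times 345 - (51)^2} = 4.21 \]
\[ b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{198}{8} - 4.21 \times \frac{51}{8} = -2.11 \]
i.e., the variability of the observed values of salary around the regression line is Rs. 1.794
thousand.
c) The coefficient of determination is given by
\[ r^2 = \frac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} = \frac{-2.11 \times 198 + 4.21 \times 1346 - 8 \times (24.75)^2}{5264 - 8 \times (24.75)^2} = 0.9708 \]
(using unrounded values of b0 and b1).
Hence, the sample coefficient of determination is 0.9708, which means 97.08% of the
variation in salary is explained by variation in the job performance index, and the remaining 2.92% of the
variation in salary is due to other related factors.
Since b1 is positive, r = +√0.9708 = 0.985,
i.e. there is a high degree of positive correlation between salary and job performance index.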
entirely different model with opposite conclusions. So such underlying assumptions have to be
verified before attempting regression modeling. Such information is not available from
summary statistics such as the t-statistic, F-statistic or coefficient of determination.
One important point to keep in mind is that these assumptions are for the population and we
work only with a sample. So the main issue is to take a decision about the population on the basis of
a sample of data.
Several diagnostic methods to check the violation of regression assumption are based on the
study of model residuals with the help of various types of graphics.
a) Linearity: Firstly, linear regression needs the relationship between the independent and
dependent variables to be linear. It is also important to check for outliers, since linear regression
is sensitive to outlier effects. The linearity assumption can best be tested with a scatter plot.
b) Homoscedasticity: This assumption requires that the variation around the line of regression
be constant for all values of the independent variable (X). This means that the errors vary by the
same amount when X is a low value as when X is a high value. The homoscedasticity
assumption is important for using the least squares method to fit the regression line. If there are
serious departures from this assumption, either data transformations or the weighted least squares
method can be applied.
c) Independence of errors: This assumption requires that the errors around the regression line
be independent for each value of the explanatory variable. This is particularly important when
data are collected over a period of time. In such situations the errors for a specific time period are
often correlated with those of the previous time period.
d) Normality of errors: This assumption requires that the errors around the regression line be
normally distributed for each value of X (the independent variable). As long as the distribution
of the errors around the regression line for each value of the independent variable is not
extremely different from a normal distribution, inference about the line of regression and the
regression coefficients will not be seriously affected.
6.2.8 Residual Analysis
The residual analysis is a graphical method to evaluate whether the regression model that has
been fitted to the data is an appropriate model. In addition, residual analysis helps to detect potential
violations of the assumptions of the regression model.
The aptness of the fitted regression model is evaluated by plotting the residuals on the vertical
axis against the corresponding X values of the independent variable along the horizontal axis. If the fitted
model is appropriate for the data then there will be no apparent pattern in this plot. However, if the
fitted model is not appropriate then there will be a relationship between the X values and the residuals
(e). By plotting a histogram, box-and-whisker plot or stem-and-leaf display of the residuals, we can
assess the normality of the errors.
Definition: The residual is defined as the difference between the observed and fitted value of the
study variable. The ith residual is denoted by ei and defined as
ei = Yi – Ŷi,   i = 1, 2, . . . , n
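The definition of the residuals can be made concrete with a minimal Python sketch. The function name `residuals` is hypothetical and the data are the football games (X) and minor accidents (Y) from the earlier Kathmandu example. For a least-squares line the residuals sum to zero (up to floating-point rounding), which gives a quick sanity check before plotting them against X.

```python
def residuals(xs, ys):
    """Residuals e_i = Y_i - Yhat_i from the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

games = [20, 30, 10, 12, 15, 25, 34]
accidents = [6, 9, 4, 5, 7, 8, 9]
e = residuals(games, accidents)
for x, e_i in zip(games, e):
    print(x, round(e_i, 3))   # the pairs (X, e) are what a residual plot displays
```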
Whereas the following figure indicates a nonlinear trend:
[Figure: residual plot against X showing a nonlinear trend]
b) Residual analysis for Homoscedasticity
The assumption of homoscedasticity can be evaluated from a plot of the residuals against X by
observing whether there appear to be major differences in the variability of the residuals for different
values of X.
[Figure: two residual plots against X, one showing heteroscedasticity and one showing homoscedasticity]
c) Residual Analysis for independence of errors
The assumption of independence of the errors can be evaluated by plotting the residuals in the
order or sequence in which the observed data were obtained.
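Standard Error of Intercept
\[ S_{b_0} = \frac{S_e \sqrt{\dfrac{\sum X^2}{n}}}{\sqrt{\sum (X - \bar{X})^2}} = \frac{S_e \sqrt{\dfrac{\sum X^2}{n}}}{\sqrt{\sum X^2 - n\bar{X}^2}} \]
Standard Error of Slope or Regression Coefficient
\[ S_{b_1} = \frac{S_e}{\sqrt{SS_X}} = \frac{S_e}{\sqrt{\sum (X - \bar{X})^2}} = \frac{S_e}{\sqrt{\sum X^2 - n\bar{X}^2}} \]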
Confidence interval for Y-intercept (β0)
\[ b_0 - t_{n-2}\, S_{b_0} \le \beta_0 \le b_0 + t_{n-2}\, S_{b_0} \]
Confidence interval for the population regression coefficient or population slope (β1)
\[ b_1 - t_{n-2}\, S_{b_1} \le \beta_1 \le b_1 + t_{n-2}\, S_{b_1} \]
Where, tn – 2 is the tabulated value of 't' obtained from the two-tailed Student's t-table at (n – 2) degrees
of freedom and 'α' percent level of significance.
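The standard errors and confidence intervals above can be sketched in Python as follows. The function name and the dictionary keys are hypothetical; the data are the Kathmandu football example, and the critical value 2.571 (two-tailed, 5%, 5 degrees of freedom) is an assumption taken from a standard t-table rather than from this text.

```python
import math

def slope_intercept_intervals(xs, ys, t_crit):
    """Standard errors and confidence intervals for b0 and b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))                       # standard error of estimate
    s_b1 = se / math.sqrt(ss_x)                         # standard error of slope
    s_b0 = se * math.sqrt(sum(x * x for x in xs) / n) / math.sqrt(ss_x)
    return {
        "b0": b0, "b1": b1, "s_b0": s_b0, "s_b1": s_b1,
        "ci_b0": (b0 - t_crit * s_b0, b0 + t_crit * s_b0),
        "ci_b1": (b1 - t_crit * s_b1, b1 + t_crit * s_b1),
    }

games = [20, 30, 10, 12, 15, 25, 34]
accidents = [6, 9, 4, 5, 7, 8, 9]
result = slope_intercept_intervals(games, accidents, t_crit=2.571)
print(result["ci_b1"])
```

If the interval for β1 excludes zero, as it does here, the data are consistent with a nonzero population slope at the chosen confidence level.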
d) Residual Analysis for Normality of errors: The assumption of normality of
disturbances is needed for the validity of the results of hypothesis
tests, confidence intervals and prediction intervals. Small departures from
normality may not affect the model greatly, but gross non-normality is more serious.
Normal probability plots help in verifying the assumption of a normal distribution.
If the errors come from a distribution with thicker and heavier tails than the normal, then
the least squares fit may be sensitive to a small set of data. A heavy-tailed error
distribution often generates outliers that "pull" the least squares fit too much in their
direction. In such cases, other estimation techniques such as robust regression methods
should be considered.
This figure has an ideal normal probability plot. Points lie approximately on the straight line
and indicate that the underlying distribution is normal.
[Figure: normal probability plot, cumulative probability Pi (0 to 1) against residual ei]
This figure has a sharp change in the direction of the trend in the upward direction from the middle. This
indicates that the underlying distribution is positively skewed.
[Figure: normal probability plot, cumulative probability Pi (0 to 1) against residual ei]
This figure has a sharp change in the direction of the trend in the downward direction from the middle.
This indicates that the underlying distribution is negatively skewed.
[Figure: normal probability plot, cumulative probability Pi (0 to 1) against residual ei]
Where,
\[ S_{b_1} = \frac{S_e}{\sqrt{SS_X}}, \qquad SS_X = \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 \]
Decision:
If | t | ≤ | tα, (n – 2) |, the null hypothesis (H0) is accepted; otherwise it is rejected and the
alternative hypothesis (H1) is accepted.
(ii) F-test for slope (β1)
As an alternative to the t-test, we can use the F-test to determine whether the slope in simple linear
regression is statistically significant, i.e., this test is applied to test whether the regression
coefficient β1 is statistically significant or not.
Setting up hypotheses:
Null Hypothesis (H0): β1 = 0 (There is no linear relationship i.e. slope is zero)
Alternative hypothesis (H1): β1 ≠ 0 (There is a linear relationship i.e. slope is not zero)
Test statistic: Under H0
\[ F = \frac{MSR}{MSE} \]
\[ \text{Where, } MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]
k = number of independent variables in the regression model. The value of k = 1 for the simple
linear regression model as it has only one predictor variable X.
SSR and SSE are the regression sum of squares and error sum of squares respectively. The
test statistic F follows the F-distribution with k and (n – k – 1) degrees of freedom, i.e., 1 and (n – 2)
degrees of freedom for k = 1.
The calculations for the F-statistic are summarized in a table called analysis of variance
(ANOVA) for simple linear regression which is shown below:
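Analysis of Variance (ANOVA) table for simple linear regression
Source of Variation   df                  SS    MS                      F
Regression            k = 1               SSR   MSR = SSR/k             F = MSR/MSE
Error                 n – k – 1 = n – 2   SSE   MSE = SSE/(n – k – 1)
Total                 n – 1               TSS
Decision:
If F ≤ Fα, (1, n – 2), the null hypothesis (H0) is accepted; otherwise it is rejected and the
alternative hypothesis (H1) is accepted.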
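The ANOVA calculation can be sketched in Python as below. The function name `f_statistic` is hypothetical; the data are the Kathmandu football example, and the critical value F0.05 (1, 5) = 6.61 is an assumption taken from a standard F-table rather than from this text.

```python
def f_statistic(xs, ys, k=1):
    """F = MSR / MSE for the least-squares line of ys on xs (k = 1 predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)             # regression sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # error sum of squares
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

games = [20, 30, 10, 12, 15, 25, 34]
accidents = [6, 9, 4, 5, 7, 8, 9]
F = f_statistic(games, accidents)
F_CRITICAL = 6.61   # assumed table value of F at alpha = 0.05 with (1, 5) d.f.
print(round(F, 2), "reject H0" if F > F_CRITICAL else "accept H0")
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same decision for a two-tailed alternative.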
Example:
The management of a company has been studying the relationship between advertisement
campaign cost (X) and sales of its product (Y) at 25 places. The sample slope was found to be 3.68.
The standard error of the regression slope coefficient is 0.196. Is there reason to believe that the slope
has changed from its past value of 4.80? Use the 5% level of significance.
Solution:
We are given,
n = 25
b1 = 3.68
Sb1 = 0.196
Example:
A firm administered a test to sales trainees before they go into the field. The management of
the firm is interested in determining the relationship between the test scores and the sales made by
the trainees at end of the year in the field. The following data were collected for 10 sales personnel
who have been in the field one year.
Sales person number 1 2 3 4 5 6 7 8 9 10
Test Score (X) 2.6 3.7 2.4 4.5 2.6 5.0 2.8 3.0 4.0 3.4
Number of units sold (Y) 90 140 80 180 100 190 110 130 170 150
a) Find the least squares regression line that could be used to predict the sales from trainee test
scores.
b) By how much does the expected number of units sold increase for each 1-point increase in a
trainee's test score?
c) Use the least squares line to predict the number of units that would be sold by a trainee who
received an average test score.
d) Test whether there is a linear relationship between test score and number of units sold at the 5%
level of significance.
Solution:
Calculation of Regression Coefficient
X Y XY X2 Y2
2.6 90 234 6.76 8100
3.7 140 518 13.69 19600
2.4 80 192 5.76 6400
4.5 180 810 20.25 32400
2.6 100 260 6.76 10000
5.0 190 950 25 36100
2.8 110 308 7.84 12100
3.0 130 390 9 16900
4.0 170 680 16 28900
3.4 150 510 11.56 22500
ΣX = 34   ΣY = 1340   ΣXY = 4852   ΣX² = 122.62   ΣY² = 193000
a) Let the regression line of Y on X be
Y = b0 + b1 X . . . (i)
Using the least squares method,
\[ b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{10 \times 4852 - 34 \times 1340}{10 \times 122.62 - (34)^2} = 42.165 \]
\[ b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{1340}{10} - 42.165 \times \frac{34}{10} = -9.361 \]
Now substituting the values of b0 and b1 in equation (i),
Ŷ = –9.361 + 42.165X is the required regression line that could be used to predict the sales
from trainee test scores.
b) The expected increase in the number of units sold for each 1-point increase in a trainee's test
score is the slope of the regression line, b1 = 42.165, i.e., about 42 units.
c) X̄ = ΣX/n = 34/10 = 3.4
Ŷ = –9.361 + 42.165X = –9.361 + 42.165 × 3.4 = 134
A trainee who received an average test score would be expected to sell about 134 units.
d) Standard error of estimate, Se, for these data is given by
\[ S_e = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n - 2}} = \sqrt{\frac{193000 - (-9.361) \times 1340 - 42.165 \times 4852}{10 - 2}} = 10.94 \]
The standard error of the regression coefficient (slope) is given by
\[ S_{b_1} = \frac{S_e}{\sqrt{\sum X^2 - n\bar{X}^2}} = \frac{10.94}{\sqrt{122.62 - 10 \times (3.4)^2}} = \frac{10.94}{\sqrt{7.02}} = \frac{10.94}{2.649} = 4.129 \]
Setting up hypothesis:
H0: β1 = 0 i.e. there is no linear relationship between test score and number of units sold
H1: β1 ≠ 0 i.e. there is a linear relationship between test score and number of units sold (two-
tailed test)
Level of significance: α = 0.05
Degrees of freedom = n – 2 = 10 – 2 = 8
Test Statistic: Under H0,
\[ t = \frac{b_1}{S_{b_1}} = \frac{42.165}{4.129} = 10.21 \]
Critical Values:
The tabulated value of t at the 0.05 level of significance and 8 degrees of freedom can be obtained
from the t-table as tα = 2.306
Decision:
Since t > tα, H0 is rejected at the 0.05 level of significance. So, there is evidence that
there is a linear relationship between test score and number of units sold.
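The t-test for the slope can be reproduced with a short Python sketch. The function name `slope_t_statistic` is hypothetical, and the critical value 2.306 is the tabulated value quoted above. Because the code keeps full precision, its t value may differ from the hand calculation in the second decimal place.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b1, S_b1, t) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))      # standard error of estimate
    s_b1 = se / math.sqrt(ss_x)        # standard error of the slope
    return b1, s_b1, b1 / s_b1

scores = [2.6, 3.7, 2.4, 4.5, 2.6, 5.0, 2.8, 3.0, 4.0, 3.4]
units = [90, 140, 80, 180, 100, 190, 110, 130, 170, 150]
b1, s_b1, t = slope_t_statistic(scores, units)
T_CRITICAL = 2.306   # two-tailed, alpha = 0.05, 8 degrees of freedom (from the text)
print(round(b1, 3), round(s_b1, 3), round(t, 2), abs(t) > T_CRITICAL)
```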
Example:
For a company to maintain a competitive edge in the marketplace, spending on research and
development (R & D) is essential. To determine the optimum level of R & D spending and its effect
on the company's value, a simple linear regression was proposed. Data collected for the largest
R & D spenders were used to fit a straight-line regression model relating Y to X, where
Y = Price/earnings (P/E) ratio, and
X = R & D expenditure/sales (R/S) ratio
The data for the twenty companies used in the study are provided in the following table.
Company   (P/E) ratio   (R/S) ratio
1         5.6           0.003
2         7.2           0.004
3         8.1           0.009
4         9.9           0.021
5         6.0           0.023
6         8.2           0.030
7         6.3           0.035
8         10.0          0.037
9         8.5           0.044
10        13.2          0.051
11        8.4           0.058
12        11.1          0.058
13        11.1          0.067
14        13.2          0.080
15        13.4          0.080
16        11.5          0.081
17        9.8           0.091
18        16.1          0.092
19        7.0           0.064
20        5.9           0.028
Calculation shows that
ΣX = 0.959, ΣY = 190.5, ΣX² = 0.0616, ΣY² = 1978.3, ΣXY = 10.292
a. Assuming a linear relation, use the least squares method to find the regression equation.
b. Compute the residual e for the company with R/S ratio X = 0.050 and P/E ratio Y = 13.2.
Also compute the standard error of the estimate, SYX, and interpret its meaning.
c. Calculate the coefficient of determination of the regression equation in the problem and
interpret its meaning.
Solution:
f) Setting up hypothesis
Null hypothesis (H0): β1 = 0
Alternative hypothesis (H1): β1 ≠ 0
Level of significance: α = 0.05 and d.f. = n – 2 = 8 – 2 = 6
Test statistic: Under H0,
$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n-k-1}$$
where k = number of independent variables; here k = 1.
Now,
$$SSR = b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2 = 0.548 \times 40 + 0.636 \times 364 - 8 \times 25 = 53.424$$
$$SSE = \sum Y^2 - b_0 \sum Y - b_1 \sum XY = 256 - 0.548 \times 40 - 0.636 \times 364 = 2.576$$
ANOVA Table
Source of variation   df               SS             MSS                         Fcal                                  Ftab
Regression            k = 1            SSR = 53.424   MSR = 53.424/1 = 53.424     F = MSR/MSE = 53.424/0.429 = 124.53   F0.05(1, 6) = 5.99
Error                 n – 2 = 8 – 2 = 6   SSE = 2.576    MSE = 2.576/6 = 0.429
Total                 n – 1 = 8 – 1 = 7   TSS = 56
Decision: Since F = 124.53 > Fα = 5.99, the null hypothesis (H0) is rejected and the alternative hypothesis is accepted.
g) The 95% confidence interval for β1 is given by
$$b_1 \pm t_{\alpha,(n-2)} \times S_{b_1}$$
We have n = 8, b1 = 0.636, Sb1 = 0.207 and tα,(n – 2) = 2.447
∴ 0.636 ± 2.447 × 0.207 = 0.636 ± 0.5065
Thus the lower limit = 0.636 – 0.5065 = 0.1295
Upper limit = 0.636 + 0.5065 = 1.1425
We can state with 95% confidence that the population slope lies between 0.1295 and 1.1425.
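As a check on the ANOVA arithmetic in part f), the sums of squares and the F ratio can be recomputed from the quoted totals. This is a minimal sketch: the function name `anova_simple_regression` is hypothetical, and the inputs (b0 = 0.548, b1 = 0.636, ΣY = 40, ΣXY = 364, ΣY² = 256, n = 8) are the figures used in the worked solution. Because the code does not round MSE to three decimals, its F value differs slightly from the hand calculation.

```python
def anova_simple_regression(b0, b1, n, sum_y, sum_xy, sum_y2, k=1):
    """Return (SSR, SSE, F) using the shortcut formulas from the text."""
    y_bar = sum_y / n
    ssr = b0 * sum_y + b1 * sum_xy - n * y_bar ** 2   # explained variation
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy           # unexplained variation
    msr = ssr / k
    mse = sse / (n - k - 1)
    return ssr, sse, msr / mse

ssr, sse, f = anova_simple_regression(0.548, 0.636, 8, 40, 364, 256)
print(round(ssr, 3), round(sse, 3), round(f, 2))
```

SSR and SSE add up to the total sum of squares (56), which is a quick consistency check on any ANOVA table.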
Example:
A marketer is interested in the relation between the width of the shelf space for her brand of
coffee (X) and weekly sales (Y) of the product in a suburban supermarket (assume the height is
always at eye level). Marketers are well aware of the concept of ‘compulsive purchases’, and know
that the more shelf space their product takes up, the higher the frequency of such purchases. She
believes that in the range of 3 to 9 feet, the mean weekly sales will be linearly related to the width of
the shelf space.
Week X Y
1 6 526
2 3 421
3 6 581
4 9 630
5 3 412
6 9 560
7 6 434
8 3 443
9 9 590
10 6 570
11 3 346
12 9 672
Total 72 6185
a) Find the least squares regression line of Y on X. Also predict the value of Y when
X = 1.
b) How do you interpret the slope of the regression line?
c) Compute and interpret the value of the coefficient of determination.
d) Compute and interpret the standard error of estimate.
e) Test whether there is a linear relationship between shelf-space width (X) and sales (Y) at the 5%
level of significance.
f) Set up the null and alternative hypotheses, carry out the F-test, and interpret the result at the 5% level of
significance.
g) Set up a 95% confidence interval estimate for the slope β1.
Solution:
Calculation of Regression Coefficient
Week X Y XY X2 Y2
1 6 526 3156 36 276676
2 3 421 1263 9 177241
3 6 581 3486 36 337561
4 9 630 5670 81 396900
5 3 412 1236 9 169744
6 9 560 5040 81 313600
7 6 434 2604 36 188356
8 3 443 1329 9 196249
9 9 590 5310 81 348100
10 6 570 3420 36 324900
11 3 346 1038 9 119716
12 9 672 6048 81 451584
Total 72 6185 39600 504 3300627
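a) Let the regression line be Y = b0 + b1X . . . (1)
Using the least squares method,
$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{12 \times 39600 - 72 \times 6185}{12 \times 504 - (72)^2} = \frac{29880}{864} = 34.5833$$
$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{6185}{12} - 34.5833 \times \frac{72}{12} = 515.4167 - 207.50 = 307.917$$
Substituting the values of b0 and b1 in equation (1),
$$\hat{Y} = 307.917 + 34.5833X$$
When X = 1,
$$\hat{Y} = 307.917 + 34.5833 \times 1 = 342.50$$
i.e. predicted weekly sales of about 342.5. Note that X = 1 foot lies outside the observed range of 3 to 9 feet, so this prediction is an extrapolation.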
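The least squares formulas used in part a) can be checked directly against the twelve weekly observations. The sketch below is illustrative only; the function name `least_squares_fit` and the variable names are assumptions, and the data are the shelf-space and sales figures from the table. The subtraction 515.4167 – 207.50 evaluates to 307.9167, which is the intercept the code reports.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line Y = b0 + b1*X fitted by least squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

shelf = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]                         # X, feet
sales = [526, 421, 581, 630, 412, 560, 434, 443, 590, 570, 346, 672]  # Y

b0, b1 = least_squares_fit(shelf, sales)
y_hat_at_1 = b0 + b1 * 1          # prediction at X = 1 (an extrapolation)
print(round(b0, 4), round(b1, 4), round(y_hat_at_1, 2))
```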
b) Here the slope of the regression line is b1 = 34.5833, which indicates that each additional foot of
shelf space (X) is associated with an average increase of about 34.58 in weekly sales (Y).
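c) The coefficient of determination is given by
$$r^2 = \frac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$
With b0 = 515.4167 – 207.50 = 307.9167 and b1 = 34.5833 (carried without further rounding),
$$r^2 = \frac{1904464.58 + 1369500.00 - 12 \times 265654.340}{3300627 - 12 \times 265654.340} = \frac{86112.50}{112774.92} = 0.7636$$
It indicates that 76.36% of the variation in the dependent variable (Y) is explained by the variation
in the independent variable (X), and the remaining 23.64% of the variation in the dependent variable is due to
other factors.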
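d) The standard error of estimate is given by
$$S_e = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n-2}}$$
With b0 = 515.4167 – 207.50 = 307.9167 and b1 = 34.5833,
$$S_e = \sqrt{\frac{3300627 - 1904464.58 - 1369500.00}{12-2}} = \sqrt{\frac{26662.42}{10}} = \sqrt{2666.24} = 51.64$$
i.e., the typical deviation of the observed values of the dependent variable (Y) around the regression line
is 51.64.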
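Parts c) and d) can be cross-checked with deviation sums, which avoid the large intermediate products of the shortcut formulas and are therefore less sensitive to rounding in b0. The following is a sketch under assumptions: `fit_quality` is a hypothetical helper name, and the data are the twelve observations from the table.

```python
import math

def fit_quality(xs, ys):
    """Return (r_squared, standard_error_of_estimate) for a simple linear fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)   # total variation (TSS)
    ssr = sxy ** 2 / sxx                      # explained variation
    sse = syy - ssr                           # unexplained variation
    return ssr / syy, math.sqrt(sse / (n - 2))

shelf = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]
sales = [526, 421, 581, 630, 412, 560, 434, 443, 590, 570, 346, 672]

r2, se = fit_quality(shelf, sales)
print(round(r2, 4), round(se, 2))
```

The shortcut formulas in the text subtract numbers in the millions, so an error of 0.05 in b0 shifts SSE by about 300; the deviation form sidesteps that.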
e) Setting up hypothesis:
Null hypothesis (H0): β1 = 0 (There is no linear relationship)
Alternative hypothesis (H1): β1 ≠ 0 (There is a linear relationship)
Level of significance: α = 0.05 and d.f. = n – 2 = 12 – 2 = 10
Test statistic: Under H0,
$$t = \frac{b_1}{S_{b_1}}$$
where, taking Se = √(26662.42/10) = 51.64,
$$S_{b_1} = \frac{S_e}{\sqrt{\sum X^2 - n\bar{X}^2}} = \frac{51.64}{\sqrt{504 - 12 \times 36}} = \frac{51.64}{8.4853} = 6.086$$
$$\therefore t = \frac{34.5833}{6.086} = 5.68$$
Critical value: tα,(n – 2) = t0.05(10) = 2.228
Decision: Since t > tα,(n – 2), the null hypothesis (H0) is rejected and the alternative hypothesis (H1) is accepted.
f) Setting up hypothesis
Null hypothesis (H0): β1 = 0
Alternative hypothesis (H1): β1 ≠ 0
Level of significance: α = 0.05 and d.f. = n – 2 = 12 – 2 = 10
Test statistic: Under H0,
$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n-k-1}$$
where k = number of independent variables; here k = 1.
Now, with b0 = 515.4167 – 207.50 = 307.9167 and b1 = 34.5833,
$$SSR = b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2 = 1904464.58 + 1369500.00 - 12 \times 265654.340 = 86112.50$$
$$SSE = \sum Y^2 - b_0 \sum Y - b_1 \sum XY = 3300627 - 1904464.58 - 1369500.00 = 26662.42$$
ANOVA Table
Source of variation   df                    SS               MSS                              Fcal                                     Ftab
Regression            k = 1                 SSR = 86112.50   MSR = 86112.50/1 = 86112.50      F = MSR/MSE = 86112.50/2666.24 = 32.30   F0.05(1, 10) = 4.96
Error                 n – 2 = 12 – 2 = 10   SSE = 26662.42   MSE = 26662.42/10 = 2666.24
Total                 n – 1 = 12 – 1 = 11   TSS = 112774.92
Decision: Since F > Fα, the null hypothesis (H0) is rejected and the alternative hypothesis
is accepted.
g) The 95% confidence interval for β1 is given by
$$b_1 \pm t_{\alpha,(n-2)} \times S_{b_1}$$
We have n = 12, b1 = 34.5833, Sb1 = Se/√(ΣX² – nX̄²) = 51.64/8.4853 = 6.086 and tα,(n – 2) = t0.05(10) = 2.228
∴ 34.5833 ± 2.228 × 6.086 = 34.5833 ± 13.560
Thus the lower limit = 34.5833 – 13.560 = 21.023
Upper limit = 34.5833 + 13.560 = 48.143
We can state with 95% confidence that the population slope lies between 21.023 and 48.143.
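Parts e) to g) share the same ingredients, so they can be verified together. The sketch below is an illustration, not the book's procedure: `slope_inference` is a hypothetical name, the critical value 2.228 is the tabulated t0.05(10) quoted above, and the data are the twelve shelf-space observations. In simple regression the F statistic equals the square of the t statistic, which gives a useful cross-check between parts e) and f).

```python
import math

def slope_inference(xs, ys, t_crit):
    """Return (t, F, (lower, upper)) for the slope of a simple linear fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    ssr = b1 * sxy
    mse = (syy - ssr) / (n - 2)
    s_b1 = math.sqrt(mse / sxx)          # standard error of the slope
    t = b1 / s_b1
    f = ssr / mse                        # k = 1, so MSR = SSR
    margin = t_crit * s_b1
    return t, f, (b1 - margin, b1 + margin)

shelf = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]
sales = [526, 421, 581, 630, 412, 560, 434, 443, 590, 570, 346, 672]

t, f, (lower, upper) = slope_inference(shelf, sales, t_crit=2.228)
print(round(t, 3), round(f, 2), round(lower, 2), round(upper, 2))
```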
Example:
The following table gives the aptitude test scores and productivity indices of 8 randomly selected
workers. Find the equation of the line that can be used to predict the productivity index from the
aptitude score, and use it to predict the productivity index of a worker whose test score is 66.
Aptitude score (X)       57 58 59 59 60 61 62 64
Productivity index (Y)   67 68 65 68 72 72 69 71
a) Find the least square regression line of equation of Y on X. Also predict the value of Y when
X=66.
b) How do you interpret a slope of the regression line?
c) Compute and interpret the value of coefficient of determination.
d) Compute and interpret the standard error of estimate.
e) Test whether there is linear relationship between aptitude score (X) and Productivity
index (Y) at 5% level of significance.
f) Compute the overall F test statistic and test at 5% level of significance.
g) Set up a 95% confidence interval estimate for slope β1 .
Solution:
X Y XY X2 Y2
57 67 3819 3249 4489
58 68 3944 3364 4624
59 65 3835 3481 4225
59 68 4012 3481 4624
60 72 4320 3600 5184
61 72 4392 3721 5184
62 69 4278 3844 4761
64 71 4544 4096 5041
480 552 33144 28836 38132
a) Let the regression line be
Y = b0 + b1X . . . (1)
Using the least squares method,
$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{8 \times 33144 - 480 \times 552}{8 \times 28836 - (480)^2} = \frac{192}{288} = 0.667$$
$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{552}{8} - 0.667 \times \frac{480}{8} = 69 - 40.02 = 28.98$$
Substituting the values of b0 and b1 in equation (1),
$$\hat{Y} = 28.98 + 0.667X$$
When X = 66,
$$\hat{Y} = 28.98 + 0.667 \times 66 = 73.002$$
b) Here, the slope of the regression line is b1 = 0.667, which indicates that a one-unit increase in the
independent variable (aptitude score) is associated with an average increase of 0.667 in the dependent
variable (productivity index).
c) The coefficient of determination is given by
$$r^2 = \frac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} = \frac{28.98 \times 552 + 0.667 \times 33144 - 8 \times 4761}{38132 - 8 \times 4761} = \frac{16.008}{44} = 0.3638$$
It indicates that 36.38% of the variation in the dependent variable (Y) is explained by the
variation in the independent variable (X), and the remaining 63.62% of the variation in the dependent variable
is due to other factors.
d) The standard error of estimate is given by
$$S_e = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n-2}} = \sqrt{\frac{38132 - 28.98 \times 552 - 0.667 \times 33144}{8-2}} = \sqrt{\frac{27.992}{6}} = \sqrt{4.665} = 2.160$$
i.e., the typical deviation of the observed values of the dependent variable (Y) around the regression line is
2.160.
e) Setting up hypothesis:
Null hypothesis (H0): β1 = 0 (There is no linear relationship between X and Y)
Alternative hypothesis (H1): β1 ≠ 0 (There is a linear relationship between X and Y)
Level of significance: α = 0.05
Test statistic: Under H0,
$$t = \frac{b_1}{S_{b_1}}$$
where, taking Se = √(SSE/(n – 2)) = √(27.992/6) = 2.160,
$$S_{b_1} = \frac{S_e}{\sqrt{\sum X^2 - n\bar{X}^2}} = \frac{2.160}{\sqrt{28836 - 8 \times 3600}} = \frac{2.160}{6} = 0.360$$
$$\therefore t = \frac{0.667}{0.360} = 1.853$$
Critical value: tα,(n – 2) = t0.05(6) = 2.447
Decision: Since t < tα,(n – 2), the null hypothesis (H0) is not rejected. At the 5% level there is not enough evidence of a linear relationship between aptitude score and productivity index.
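f) Setting up hypothesis:
Null hypothesis (H0): β1 = 0
Alternative hypothesis (H1): β1 ≠ 0
Level of significance: α = 0.05
Test statistic: Under H0,
$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{K}, \qquad MSE = \frac{SSE}{n-K-1}$$
where K = number of independent variables; here K = 1.
Now,
$$SSR = b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2 = 16.008, \qquad SSE = \sum Y^2 - b_0 \sum Y - b_1 \sum XY = 27.992$$
ANOVA Table
Source of variation   df                  SS             MSS                        Fstat.                                Fα
Regression            K = 1               SSR = 16.008   MSR = 16.008/1 = 16.008    F = MSR/MSE = 16.008/4.665 = 3.431    F0.05(1, 6) = 5.99
Error                 n – 2 = 8 – 2 = 6   SSE = 27.992   MSE = 27.992/6 = 4.665
Total                 n – 1 = 8 – 1 = 7   TSS = 44
Decision: Since F < Fα, the null hypothesis (H0) is accepted.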
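g) The 95% confidence interval for β1 is given by
b1 ± tα,(n – 2) × Sb1
With Sb1 = Se/√(ΣX² – nX̄²) = 2.160/6 = 0.360,
⇒ 0.667 ± 2.447 × 0.360
⇒ 0.667 ± 0.881
Thus the lower limit = 0.667 – 0.881 = –0.214
and the upper limit = 0.667 + 0.881 = 1.548
We can state with 95% confidence that the population slope lies between –0.214 and 1.548. The interval contains zero, which agrees with the F-test in part f).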
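The whole aptitude-score analysis can be reproduced from the column totals of the calculation table alone. The sketch below is hedged accordingly: `regression_from_sums` is a hypothetical helper, its inputs are the totals n = 8, ΣX = 480, ΣY = 552, ΣXY = 33144, ΣX² = 28836, ΣY² = 38132, and the critical values 2.447 and 5.99 are the tabulated figures quoted in the solution. It carries full precision, so results differ slightly from hand calculations that round b1 to 0.667.

```python
import math

def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple linear regression summary computed from column totals only."""
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    syy = sum_y2 - sum_y ** 2 / n            # total variation (TSS)
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    ssr = b1 * sxy
    sse = syy - ssr
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)
    return {"b0": b0, "b1": b1, "r2": ssr / syy, "se": math.sqrt(mse),
            "s_b1": s_b1, "t": b1 / s_b1, "F": ssr / mse}

res = regression_from_sums(8, 480, 552, 33144, 28836, 38132)
for name, value in res.items():
    print(name, round(value, 4))
```

Since F = t² in simple regression, the t-test and the F-test must reach the same decision; here both statistics fall below their critical values.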
Theoretical Questions:
1. What is meant by correlation? Does it always signify cause and effect relationship between
the two variables?
2. What do you mean by correlation? Discuss different types of correlation giving suitable
examples.
3. Define Karl Pearson's coefficient of correlation. Discuss the relative merits and demerits of
Karl Pearson's method.
4. State different measures of correlation and briefly discuss their usefulness.
5. State properties of correlation coefficient. Discuss the significance and interpretation of
coefficient of correlation.
6. What is a scatter diagram? Give the procedure for drawing a scatter diagram. Draw a scatter
diagram when the coefficient of correlation r = ± 1.
7. What do you mean by regression?
8. Describe the uses of regression analysis in business and management problems.
9. What are regression equations? Describe them with suitable example.
10. What do you mean by regression coefficient? Explain the concept of regression coefficient of
Y on X.
11. Discuss the student's t-test for testing the significance for observed sample correlation
coefficients.
12. Discuss the student's t-test for testing the significance of regression coefficient.
13. Write short notes on:
a. Positive and negative correlation b. Scatter diagram
c. Karl Pearson's coefficient of correlation d. Assumption on regression analysis
e. Residual analysis f. Standard error of estimate
g. Coefficient of determination h. Simple Linear regression model
Practical Problems:
1. From the following data determine Karl Pearson's coefficient of correlation.
X variable Y variable
No. of items 12 12
Mean 15 10
Standard deviation 3.16 3.29
Sum of the squares of deviations from mean 120 130
Sum of the products of the deviations of both the variables = 90
2. Calculate the coefficient of correlation from given information where co-variance of two
variables X and Y is 10, the variance of X is 25 and the variance of Y is 16.
3. From the following data, find out the coefficient of correlation as given by Karl Pearson.
a. Sum of the deviations of X series from its assumed mean = –160
b. Sum of the deviations of Y series from its assumed mean = –30
c. Sum of the squares of deviations of X series = 8500
d. Sum of the squares of deviations of Y series = 3000
e. Sum of the products of the deviation of X and Y series = 2000
f. Number of pairs of the data = 20
4. Two series of X and Y with 50 items each have standard deviations 4.5 and 3.5 respectively.
If the sum of the products of deviations of X and Y series from their respective arithmetic
means be 420. Find the coefficient of correlation between X and Y.
5. Calculate the coefficient of correlation from the following data. Also develop a scatter
diagram for these data.
Price in Rs 2 4 6 10 16
Quantity in Kg 66 48 30 12 6
6. Calculate the coefficient of correlation from the following data. Also develop a scatter
diagram for these data.
X 12 9 8 10 11 13 7
Y 14 8 6 9 11 12 3
7. From the following pair of series, calculate Karl Pearson's coefficient of correlation.
Price per unit 8 10 15 17 20 22 24 25
Supply(in metric tons) 25 30 32 35 37 40 42 45
Also develop a scatter diagram for these data.
8. Compute Karl Pearson's coefficient of correlation from the following data. Also develop a
scatter diagram for these data.
X 45 55 56 58 60 65 68 70 75 80 85
Y 56 50 48 60 62 64 65 70 74 82 90
9. Calculate the coefficient of correlation from the following data. Also develop a scatter
diagram for these data.
X 400 500 600 700 800 900 1000 1100 1200
Y 200 250 300 350 400 450 500 550 600
10. The following are the marks obtained by eight students in Accountancy and Business
Statistics.
Marks in Account 65 66 67 67 68 69 70 72
Marks in Statistics 67 68 65 68 72 72 69 71
Calculate the coefficient of correlation between marks of two subjects. Also develop a
scatter diagram for these data.
11. A researcher wants to find out if there is a relationship between the heights of sons and the
heights of their fathers. In other words, do tall fathers have tall sons? He took a random sample
of 6 fathers and their 6 sons. Their heights in inches are given below:
Father (x): 63 65 66 67 67 68
Sons (y): 67 68 65 68 70 70
On the basis of the data,
a. Plot a scatter diagram
b. Based on the scatter diagram, what kind of relationship exists between the heights of fathers and sons?
c. Compute the correlation coefficient and interpret the result.
d. Test the hypothesis that there is a positive correlation between these two variables at the 5%
level of significance.
12. Pepsi Cola is studying the effect of its latest advertising campaign. People chosen at random
were called and asked how many Pepsi advertisements they had either read or seen
in the past week and how many bottles of Pepsi they had purchased.
X (no. of ads.) 3 7 4 2 0 4 1 2
Y (bottles purchased) 11 18 9 4 7 6 3 8
a. Plot a scatter diagram
b. Based on the scatter diagram, what kind of relationship exists between these two variables?
c. Compute the correlation coefficient and interpret the result.
d. Test the hypothesis that there is a positive correlation between these two variables at the 5%
level of significance.
13. A random sample of 18 pairs of observations from a normal population gives a correlation
coefficient of 0.52. Is it likely that the variables in the population are uncorrelated? Also
compute 98% confidence limits for the population correlation coefficient.
14. A study of the heights of 18 pairs of boys and girls working in a call center shows that the
coefficient of correlation is 0.5. Apply t-test to find whether correlation is significant.
15. A random sample of 13 pairs was drawn from a normal population and the coefficient of
correlation between the pairs was 0.6. Is this value significant of the existence of correlation
in the universe?
16. Find the least value of r (correlation coefficient) in a sample of 20 pairs of
observations from a normal population that is significant at the 5% level.
17. A financial analyst has gathered the following data about the relationship between income and
investment in securities in respect of 7 randomly selected families:
Income (in 000 Rs.) 20 30 10 12 15 25 34
Percent invested in securities 6 9 4 5 7 8 9
a. Develop the estimating linear equation that best describes these data.
b. Predict the percentage of income invested in securities by a family earning Rs.
30000 annually.
c. Compute and interpret the value of the coefficient of determination.
d. Calculate the coefficient of correlation.
18. The marketing manager of a large supermarket chain would like to determine the effect of
shelf space on the sales of pet food. A random sample of 12 equal sized stores is selected with
the following results.
Store                               1   2   3   4   5   6   7   8   9   10  11  12
Weekly sales Y (in hundreds of $)   1.6 2.2 1.4 1.9 2.4 2.6 2.3 2.7 2.8 2.6 2.9 3.1
Shelf space X (feet)                5   5   5   10  10  10  15  15  15  20  20  20
a. Assuming a linear relationship, use the least squares method to find the best fitting
regression equation and hence compute the residual for store 6.
b. What percentage of the total variation in sales is explained by shelf space?
19. Suppose that you are interested in using past expenditure on R & D by a firm to predict
current expenditure on R & D. You obtained the following data by taking a random sample of
firms, in which X is the amount spent on R & D (in 000 Rs.) 5 years ago and Y is the amount
spent on R & D (in 000 Rs.) in the current year:
X: 10.05 10.80 11.05 10.80 12.20
b. Use the regression equation to estimate the number of motors rejected for an employee
with 3 weeks of experience in the job.
c. Compute and interpret the value of coefficient of determination.
d. Compute and interpret the standard error of estimate.
e. Test whether there is linear relationship between experience (X) and number of motors
rejected (Y) at 5% level of significance.
f. Set up null and alternative hypothesis, carryout F-test, and interpret the result at 5% level
of significance.
g. Set up a 95% confidence interval estimate for slope β1.
27. A statistician for American automobile manufacturer would like to develop a model for
predicting delivery time (the day between the ordering of the car and the actual delivery of the
car) of customer ordered new automobile. A random sample of 15 cars is selected with the
result is summarized in the following table.
Car 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No. of ordered (X) 3 4 4 7 7 8 9 11 12 12 14 16 20 23 25
Delivery (Y) 25 32 26 38 34 41 39 46 44 51 58 53 64 66 70
a. Find the least square regression line of equation of Y on X. Also predict the value of Y
When X = 8
b. How do you interpret a slope of the regression line?
c. Compute and interpret the value of the coefficient of determination.
d. Compute and interpret the standard error of estimate.
e. Test whether there is linear relationship between ordered of cars (X) and delivery of cars
(Y) at 5% level of significance.
f. Set up null and alternative hypothesis, carryout F-test, and interpret the result at 5% level
of significance.
g. Set up a 95% confidence interval estimate for slope β1.
28. A galvanized rubber ball is heated to make it stronger. The more it is heated up to a level the
stronger it becomes. The following table represents the time in minutes the ball is heated and
its corresponding strength in the units of hardness.
Ball number 1 2 3 4 5 6 7 8 9 10
Time (X) 62 72 98 76 81 56 76 92 88 49
Hardness (Y) 112 124 131 117 132 96 120 136 97 85
a. Establish which one is independent variable and which one is dependent variable.
b. Find the regression equation of Y on X.
c. Compute the expected hardness for a ball that is heated for 25 minutes.
d. Compute and interpret the value of the coefficient of determination.
e. Compute and interpret the standard error of estimate.
f. Test whether there is linear relationship between time (X) and hardness (Y) at 5% level
of significance.
g. Compute the overall F- test statistic and test at 5% level of significance.
h. Set up a 95% confidence interval estimate for slope β1.
1. r = 0.72  2. r = 0.5  3. r = 0.38  4. r = 0.53  5. r = –0.932
6. r = 0.95  7. r = 0.98  8. r = 0.92  9. r = +1  10. r = 0.603
11. c. r = 0.53  d. t < tα(n – 2), i.e. 1.25 < 2.776, so H0 is accepted
12. c. r = 0.787  d. t > tα(n – 2), i.e. 3.125 > 2.447, so H0 is rejected
13. t < tα(n – 2), i.e. 2.44 < 2.583, so H0 is accepted; the 98% confidence limits are (0.076, 0.964)
14. t > tα(n – 2), i.e. 2.307 > 2.12, so H0 is rejected, which means r is significant
15. t > tα(n – 2), i.e. 3.32 > 2.201, so H0 is rejected, which means r is significant
16. r > 0.0877
17. a. Ŷ = 2.727 + 0.198X  b. Ŷ = 8.667  c. r² = 0.866  d. r = 0.930
18. a. Ŷ = 1.45 + 0.074X, residual e = 0.41
b. r² = 0.684. This indicates that 68.4% of the total variation in sales is explained by the
variation in shelf space.
19. a. Ŷ = –17.075 + 3.723X
b. The slope of the regression line is b1 = 3.723, which indicates that Y increases on average
by 3.723 per unit change in X.  c. Syx = 2.0477
20. a. Ŷ = 6.4915X – 80.443  b. 244.132  c. Se = 10.232
21. a. Ŷ = 14.95 – 1.24X  b. 8.75  d. r² = 0.98
e. r = –0.98  f. Se = 0.67
22. a. Ŷ = 1.1681 + 1.7156X  b. When points increase by 1000, pay increases by 1715.6 units
c. Se = 0.3737  d. r² = 0.9495 and r = 0.9744
23. b. Ŷ = –0.42 + 0.12X  c. Se = 0.39  d. r² = 0.782
24. $41560, Ŷ = –2.46 + 0.71X  b. r² = 0.8875  c. Se = 1.697
d. t > tα(n – 2), i.e. 6.266 > 2.571, so H0 is rejected
e. F > Fα, i.e. 39.44 > 6.61, so H0 is rejected
f. 95% confidence interval estimate is (0.4187, 1.001)
25. a. Ŷ = 2 + 0.75X  b. Ŷ = 11.75  c. r² = 0.5625  d. Se = 1.565
e. t > tα(n – 2), i.e. 2.533 > 2.015, so H0 is rejected
f. F < Fα, i.e. 1.285 < 6.61, so H0 is accepted
g. 95% confidence interval estimate is (0.1535, 1.346)
26. a. Ŷ = 35.57 – 1.4X  b. Ŷ = 31 approx.  c. r² = 0.8365  d. Se = 2.56
e. |t| > |tα,(n – 2)|, i.e. 7.143 > 2.228, so H0 is rejected
f. F > Fα, i.e. 5.116 > 4.96, so H0 is rejected
g. 95% confidence interval estimate is (25.67, 37.07) or (26, 37) approx.
27. a. Ŷ = 22.2123 + 2.0218X, and Ŷ = 38.3867  c. r² = 0.9268  d. Se = 3.43
e. t > tα(n – 2), i.e. 15.19 > 2.160, so H0 is rejected
f. F > Fα, i.e. 51.627 > 4.67, so H0 is rejected
g. 95% confidence interval estimate is (1.734, 2.309)
28. b. Ŷ = 56.5 + 0.78X  c. Ŷ = 76  d. r² = 0.51  e. Se = 12.86
f. |t| > |tα,(n – 2)|, i.e. 2.89 > 2.306, so H0 is rejected
g. F > Fα, i.e. 8.27 > 5.32, so H0 is rejected
h. 95% confidence interval estimate is (0.15378, 1.40262)
Objective Questions
1. Karl Pearson's correlation coefficient can be used if the two variables are
a. non-linearly related b. linearly related
c. either a or b d. neither a or b
2. The range of simple correlation coefficient is
a. 0 to ∞ b. –∞ to ∞ c. 0 to 1 d. –1 to 1
3. The unit of correlation coefficient is
a. km/hr b. percent c. non-existing d. none of the above
4. The geometric mean of two regression coefficient is
a. 1 b. 0 c. r2 d. r
5. Probable error is used for
a. measuring the error in r b. testing the significance of r
c. both a and b d. estimating the value of r
6. Correlation coefficient is independent of the change of
a. scale b. origin c. both a and b d. neither a and b
7. Spearman rank correlation coefficient lies between
a. -1 to 0 b. 0 to 1 c. -1 to +1 d. -3 to +3
8. The value of r2 lies between
a. -∞ to +∞ b. -1 to +1 c. 0 to 1 d. none of the above
9. In simple correlation coefficient the quantity r2 is known as
a. coefficient of determination b. coefficient of non-determination
c. coefficient of alienation d. none of the above
10. Coefficient of correlation was invented in the year
a. 1910 b. 1890 c. 1908 d. none of the above
11. The idea of product moment correlation was given by
a. R.A. Fisher b. Sir F. Galton c. Karl Pearson d. Spearman
12. The limits of population correlation coefficient are given by
a. r ± SE b. r ± PE c. r ± 6PE d. r ± 0.6745PE
13. Spearman's rank correlation coefficient can be calculated as
a. 6ΣD²/(n³ – n)  b. 1 – 6ΣD³/(n³ – n)  c. 1 – 6ΣD³/[n(n² – 1)]  d. Both a and c
14. The rank correlation coefficient of the following series is
R1 : 1, 2, 3, 4, 5 R2 : 5 4 3 2 1
a. + 1 b. 0 c. -1 d. None of these
15. The term regression was introduced by
a. R.A. Fisher b. Sir Francis Galton c. Karl Pearson d. Croxton and Cowden
16. A scatter diagram
a. is a statistical test b. must be linear
c. must be curvilinear d. is a graph of x and y values
17. If the relationship between variables x and y is linear, then the points on the scatter diagram
a. will fall exactly on a straight b. will fall on a curve
c. must represent population parameters d. are best represented by a straight line
18. If the relationship between x and y is positive, as variable y decreases, variable x
a. increases b. decreases c. remains same d. changes linearly
19. In a 'negative' relationship
a. as x increases, y increases b. as x decreases, y decreases
c. as x increases, y decreases d. both a. and b.
20. The lowest strength of association is reflected by which of the following correlation
coefficients?
a. 0.95 b. – 0.60 c. – 0.35 d. 0.29
21. The highest strength of association is reflected by which of the following correlation
coefficient?
a. – 1.0 b. – 0.95 c. 0.1 d. 0.85
22. There is a high inverse association between measures of 'overweight' and 'life expectancy'.
A correlation coefficient consistent with the above statement is:
a. r = 0.80 b. r = 0.20 c. r = –0.20 d. r = – 0.80
23. Of the following measurement levels which is the required level for the valid calculation of
the Pearson correlation coefficient
a. nominal b. ordinal c. interval d. ratio
24. Of the following measurement levels, which is required for the valid calculation of the
Spearman correlation coefficient?
a. nominal b. ordinal c. interval d. ratio
25. There is a high direct association between measures of 'cigarette smoking' and 'lung
damage'. The correlation coefficient consistent with the above statement is
a. 0.30 b. 0.80 c. – 0.80 d. – 0.30
26. The correlation coefficient appropriate for establishing the degree of correlation between
the two variables (assuming a linear relationship)
a. is determined by the sample size b. is Spearman's R
c. Pearson's r d. both b. and c.
27. When deciding which measure of correlation to employ with a specific set of data, you
should consider
a. whether the relationship is linear or nonlinear
b. the type of scale of measurement for each variable
c. both a. and b.
d. neither a. nor b.
28. The proportion of variance accounted for by the level of correlation between two
variables is calculated by
a. X b. r2 c. Σx d. not possible
29. The value of correlation coefficient
a. depends on the origin
b. depends on the unit of scale
c. depends on both origin and unit of scale
d. is independent with respect to origin and unit of scale
30. Which of the following statements is false?
a. In a perfect positive correlation, each individual obtains the same z value on each variable
b. Spearman's correlation coefficient is used when one or both variables are at least of interval scaling
c. The range of the correlation coefficient is from – 1 to + 1
d. A correlation of r = 0.85 implies a stronger association than r = 0.70
31. The strength of a linear relationship between two variables x and y is measured by
a. r b. r2 c. R2 d. bxy or byx
32. If the value of r² = 0.64, then what is the coefficient of correlation?
a. 0.40 b. 0.04 c. 0.80 d. 0.08
33. If both dependent and independent variables increase in an estimating equation, then
coefficient of correlation falls in the range
a. – 1 ≤ r ≤ 1 b. 0 ≤ r ≤ 1 c. – 3 ≤ r ≤ 3 d. none of these
34. If unexplained variation between variables x and y is 0.25, then r2 is
a. 0.25 b. 0.50 c. 0.75 d. none of these
35. What type of relationship between the two variables is indicated by the sign of r?
a. direct relation b. indirect relation c. both a. and b. d. none of these
36. If X and Y are two variables, there can be at most
a. one regression line b. Two regression line
c. Three regression line d. an infinite number of regression line
37. If bxy is less than unity, the byx is
a. less than 1 b. greater than 1 c. equal to 1 d. equal to 0
38. If r = ± 1, the two lines of regression are
a. coincident b. parallel
c. perpendicular to each other d. none of the above
39. In the regression line y = a+bx, b is called the
a. slope of the line b. intercept of the line
c. neither a nor b d. both a and b
40. If the signs of the regression coefficients are negative then the sign of the correlation
coefficient be
a. positive b. negative c. both a and b d. neither a nor b
41. If byx is positive then bxy must be
a. positive b. negative c. both a and b d. neither a nor b
42. The two regression lines are given as x+2y-5 = 0 and 2x+3y-8 = 0 then the mean values of x
and y respectively are
a. 2, 1 b. 1, 2 c. 2, 5 d. 2, 3
43. The regression line of Y on X is 2x + 3y = 8 and X̄ = 1; the value of Ȳ is
a. 1 b. 2 c. 3 d. 4
44. Standard Error of the estimate can be calculated as
a. $S_e = \sqrt{\dfrac{SS_x - b_1 SS_{xy}}{n-2}}$   b. $S_e = \sqrt{\dfrac{SS_y - b_1 SS_{xy}}{n-2}}$
c. $S_e = \sqrt{\dfrac{SS_y - b_0 SS_{xy}}{n-2}}$   d. $S_e = \sqrt{\dfrac{SS_{xy} - SS_x}{n-2}}$
45. Coefficient of determination can be calculated as
a. $r^2 = \dfrac{SSR}{SST}$   b. $r^2 = 1 - \dfrac{SSE}{SST}$
c. $r^2 = \dfrac{b_0 \sum Y + b_1 \sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$   d. all of the above
46. If SSR = 36 and SSE = 4, The value of SST is
a. 40 b. 32 c. 80 d. none of the above
1. b 2. d 3. d 4. d 5. b 6. c 7. c 8. c 9. a 10.b 11. c
12. b 13. d 14. b 15. b 16. d 17. d 18.b 19. c 20. d 21.a 22. d
23. c 24. c 25. b 26. c 27. c 28. b 29. d 30. b 31. a 32. c 33. d
34.c 35. d 36. b 37. b 38. a 39. a 40.b 41. a 42. b 43. b 44. b
45. d 46. a
249
Bibliography
Levine, Stephan, Krehbiel and Berenson (2008), Statistics for Managers using Microsoft excel, 5th
edition, Now Delhi : Prentice Hall of India.
Douglas A Lind, William G Marchal, Samuel A Wathen (2008), Statistical Techniques in Business
and Economics, 13th edition, Tata McGraw-Hill India.
Levin, Richard I. and David S Rubin : Statistics for management, 7th edition, Prentice-Hall of India.
Berenson. Mark L. and David M. Levine: Business Statistics: Concepts and Applications, Prentice-
Hall, Inc.
Gupta, C.B (1998) "Statistical Method" Vikas Publishing House Pvt. Ltd. Delhi, India.
Elhance, D.N. and Agrawal, B.M (1999) "Fundamental of Statistics" Kitab Mahal, Allahabad,
India.
Croxton and Cowden, "Practical Business Statistics", Pretince Hall, London.
Sharma, Dr. Prem (2013) "Social Science Research Methodology" Kshitiz Prakashan, Kirtipur,
Kathmandu.
Shrestha Dr. Sunity and Amatya Sunil (2004) "Quantitative Technique for Business Studies" Ratan
Pustak Bhandar. Kathmandu.
Singh, Manjeet (1998) "Refresher course in Mathematics" Vol. I and II, Dhanpath Rai and Sons,
New Sadak, Delhi, India.
Subedi, Puspa Kamal (2002) "Social Research in Thesis Writing" Buddha Academic Enterprises Pvt.
Ltd. Kathmandu.
Aryal, J.P. and A. Gautam, Quantitative Techniques, Vidharthy Pustak Bhandar, Kathmandu, Nepal.
Beri G.C., Statistics of Management, Tata McGraw-Hill. 2003.
Bez K., Quantitative Techniques in Economics, Kalyani Publishers, New Delhi, Inida.
Gupta S.C. and V.K. Kapoor, Fundamentals of mathematical Statistics, Sultan Chand and Sons,
New Delhi, India.
Gupta S.P., Statisticals Method, Sultan Chand and Sons, New Delhi, India.
Kapoor J.N. and H.C. Sexena, Mathematical Statistics, S. Chand and Company Ltd., India.
Kothari, C.R., Quantitative Techniques, Vikash Publishing Housa, India.
Vohra N.D., Quantitative Techniques in Management, Tata McGraw Hill, New Delhi, 1995.
Yamane T., Statistics, An Introductory Analysis, Harper and Row, New York Mathematics of
Economists, India.
Acharya, K.P, Katuwal, B., Yadav, A.K. (2011), Statistical Methods, Kathmandu: Dhaulagari
Books and Stationary.
Sharma, P.K. and Chadhari, A.K. (2004), Statistical Methods, Kathmandu: Khanal Book Prakashan.
Sharma, S. and Silwal, D. (2000), Statistics for Management, Kathmandu: Taleju Prakashan.
Sthapit, A.B, Gautam, H. Josi, P. R., Dangol, P. M. (2007), Statistical Methods, 4th Kathmandu.
Buddha academic Publisher and distributors P. Ltd.
Table 1: Area under Normal Curve
Entry represents area under the standardized normal distribution from the mean (0) to Z
Z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .49865 .49869 .49874 .49878 .49882 .49886 .49889 .49893 .49897 .49900
3.1 .49903 .49906 .49910 .49913 .49916 .49918 .49921 .49924 .49926 .49929
3.2 .49931 .49934 .49936 .49938 .49940 .49942 .49944 .49946 .49948 .49950
3.3 .49952 .49953 .49955 .49957 .49958 .49960 .49961 .49962 .49964 .49965
3.4 .49966 .49968 .49969 .49970 .49971 .49972 .49973 .49974 .49975 .49976
3.5 .49977 .49978 .49978 .49979 .49980 .49981 .49981 .49982 .49983 .49983
3.6 .49984 .49985 .49985 .49986 .49986 .49987 .49987 .49988 .49988 .49989
3.7 .49989 .49990 .49990 .49990 .49991 .49991 .49992 .49992 .49992 .49992
3.8 .49993 .49993 .49993 .49994 .49994 .49994 .49994 .49995 .49995 .49995
3.9 .49995 .49995 .49996 .49996 .49996 .49996 .49996 .49996 .49997 .49997
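The entries of Table 1 can be checked numerically. The following is a minimal sketch, not part of the original text, that assumes only Python's standard `math` module; the function name `area_mean_to_z` is a hypothetical helper chosen for illustration. It uses the identity: area from 0 to Z = 0.5 × erf(Z/√2).

```python
import math


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean (0) and z, for z >= 0."""
    # Phi(z) - 0.5 equals 0.5 * erf(z / sqrt(2)) for the standard normal distribution.
    return 0.5 * math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Print a few entries in the same four-decimal form used by the table.
    for z in (0.90, 1.00, 1.96, 2.58):
        print(f"Z = {z:.2f}  area = {area_mean_to_z(z):.4f}")
```

For example, the row 1.9 and column .06 of the table corresponds to `area_mean_to_z(1.96)`.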
Table 2:
Tail area (probability) under the standard normal probability curve from Z to ∞, i.e. P(Z > Zcal.) = p0, the probabilities associated with values as extreme as the observed values of Z in the standard normal distribution, where Z = (X – µ)/σ.
Entry represents area under the standardized normal distribution from Z to ∞
Z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823
1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681
1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183
2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048
2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036
2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019
2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014
3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010
3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007
3.2 .0007
3.3 .0005
3.4 .0003
3.5 .00023
3.6 .00016
3.7 .00011
3.8 .00007
3.9 .00005
4.0 .00003
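The tail areas of Table 2 can be reproduced in the same way. The sketch below is an illustration only and is not taken from the original text; it assumes Python's standard `math` module, and the names `normal_tail_area` and `z_score` are hypothetical. It uses P(Z > z) = 0.5 × erfc(z/√2), with Z = (X – µ)/σ as defined in the table heading.

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize an observation: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def normal_tail_area(z: float) -> float:
    """Upper-tail probability P(Z > z) for the standard normal distribution."""
    # erfc is the complementary error function; 0.5 * erfc(z / sqrt(2)) = 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Print a few entries in the four-decimal form used by the table.
    for z in (0.00, 0.39, 1.00, 1.96):
        print(f"Z = {z:.2f}  tail area = {normal_tail_area(z):.4f}")
```

For a two-tailed test the tail area is doubled, since the standard normal curve is symmetric about zero.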
Table 3:
Critical Values of Student's t-distribution
Level of significance for one-tailed test
Df .10 .05 .025 .01 .005 .0005
Level of significance for two-tailed test
.20 .10 .05 .02 .01 .001
1 3.078 6.314 12.706 31.821 63.657 636.619
2 1.886 2.920 4.303 6.965 9.925 31.598
3 1.638 2.353 3.182 4.541 5.841 12.941
4 1.533 2.132 2.776 3.747 4.604 8.610
5 1.476 2.015 2.571 3.365 4.032 6.859
6 1.440 1.943 2.447 3.143 3.707 5.959
7 1.415 1.895 2.365 2.998 3.499 5.405
8 1.397 1.860 2.306 2.896 3.355 5.041
9 1.383 1.833 2.262 2.821 3.250 4.781
10 1.372 1.812 2.228 2.764 3.169 4.587
11 1.363 1.796 2.201 2.718 3.106 4.437
12 1.356 1.782 2.179 2.681 3.055 4.318
13 1.350 1.771 2.160 2.650 3.012 4.221
14 1.345 1.761 2.145 2.624 2.977 4.140
15 1.341 1.753 2.131 2.602 2.947 4.073
16 1.337 1.746 2.120 2.583 2.921 4.015
17 1.333 1.740 2.110 2.567 2.898 3.965
18 1.330 1.734 2.101 2.552 2.878 3.922
19 1.328 1.729 2.093 2.539 2.861 3.883
20 1.325 1.725 2.086 2.528 2.845 3.850
21 1.323 1.721 2.080 2.518 2.831 3.819
22 1.321 1.717 2.074 2.508 2.819 3.792
23 1.319 1.714 2.069 2.500 2.807 3.767
24 1.318 1.711 2.064 2.492 2.797 3.745
25 1.316 1.708 2.060 2.485 2.787 3.726
26 1.315 1.706 2.056 2.479 2.779 3.707
27 1.314 1.703 2.052 2.473 2.771 3.690
28 1.313 1.701 2.048 2.467 2.763 3.674
29 1.311 1.699 2.045 2.462 2.756 3.659
30 1.310 1.697 2.042 2.457 2.750 3.646
40 1.303 1.684 2.021 2.423 2.704 3.551
60 1.296 1.671 2.000 2.390 2.660 3.460
120 1.289 1.658 1.980 2.358 2.617 3.373
∞ 1.282 1.645 1.960 2.326 2.576 3.291
Table 4:
One-sided P-values from t(ν) distribution: P[t(ν) > t], df = ν
ν 1 2 3 4 5 6 7 8 9 10 11 12
1.30 0.209 0.162 0.142 0.132 0.125 0.121 0.117 0.115 0.113 0.111 0.110 0.109
1.32 0.206 0.159 0.139 0.129 0.122 0.117 0.114 0.112 0.110 0.108 0.107 0.106
1.34 0.204 0.156 0.136 0.126 0.119 0.114 0.111 0.109 0.107 0.105 0.104 0.103
1.36 0.202 0.153 0.134 0.123 0.116 0.111 0.108 0.105 0.103 0.102 0.101 0.099
1.38 0.200 0.151 0.131 0.120 0.113 0.108 0.105 0.102 0.100 0.099 0.097 0.096
1.40 0.197 0.148 0.128 0.117 0.110 0.106 0.102 0.100 0.098 0.096 0.095 0.093
1.42 0.195 0.146 0.125 0.114 0.107 0.103 0.099 0.097 0.095 0.093 0.092 0.091
1.44 0.193 0.143 0.123 0.112 0.105 0.100 0.097 0.094 0.092 0.090 0.089 0.088
1.46 0.191 0.141 0.120 0.109 0.102 0.097 0.094 0.091 0.089 0.087 0.086 0.085
1.48 0.189 0.139 0.118 0.106 0.099 0.095 0.091 0.089 0.087 0.085 0.083 0.082
1.50 0.187 0.136 0.115 0.104 0.097 0.092 0.089 0.086 0.084 0.082 0.081 0.080
1.52 0.185 0.134 0.113 0.102 0.094 0.090 0.086 0.083 0.081 0.080 0.078 0.077
1.54 0.183 0.132 0.111 0.099 0.092 0.087 0.084 0.081 0.079 0.077 0.076 0.075
1.56 0.181 0.130 0.108 0.097 0.090 0.085 0.081 0.079 0.077 0.075 0.074 0.072
1.58 0.180 0.127 0.106 0.095 0.087 0.083 0.079 0.076 0.074 0.073 0.071 0.070
1.60 0.178 0.125 0.104 0.092 0.085 0.080 0.077 0.074 0.072 0.070 0.069 0.068
1.62 0.176 0.123 0.102 0.090 0.083 0.078 0.075 0.072 0.070 0.068 0.067 0.066
1.64 0.174 0.121 0.100 0.088 0.081 0.076 0.073 0.070 0.068 0.066 0.065 0.063
1.66 0.173 0.119 0.098 0.086 0.079 0.074 0.070 0.068 0.066 0.064 0.063 0.061
1.68 0.171 0.117 0.096 0.084 0.077 0.072 0.068 0.066 0.064 0.062 0.061 0.059
1.70 0.169 0.116 0.094 0.082 0.075 0.070 0.066 0.064 0.062 0.060 0.059 0.057
1.72 0.168 0.114 0.092 0.080 0.073 0.068 0.065 0.062 0.060 0.058 0.057 0.056
1.74 0.166 0.112 0.090 0.078 0.071 0.066 0.063 0.060 0.058 0.056 0.055 0.054
1.76 0.164 0.110 0.088 0.077 0.069 0.064 0.061 0.058 0.056 0.054 0.053 0.052
1.78 0.163 0.109 0.087 0.075 0.068 0.063 0.059 0.056 0.054 0.053 0.051 0.050
1.80 0.161 0.107 0.085 0.073 0.066 0.061 0.057 0.055 0.053 0.051 0.050 0.049
1.82 0.160 0.105 0.083 0.071 0.064 0.059 0.056 0.053 0.051 0.049 0.048 0.047
1.84 0.158 0.104 0.082 0.070 0.063 0.058 0.054 0.052 0.049 0.048 0.046 0.045
1.86 0.157 0.102 0.080 0.068 0.061 0.056 0.053 0.050 0.048 0.046 0.045 0.044
1.88 0.156 0.100 0.078 0.067 0.059 0.055 0.051 0.048 0.046 0.045 0.043 0.042
1.90 0.154 0.099 0.077 0.065 0.058 0.053 0.050 0.047 0.045 0.043 0.042 0.041
1.92 0.153 0.097 0.075 0.064 0.056 0.052 0.048 0.046 0.044 0.042 0.041 0.039
1.94 0.151 0.096 0.074 0.062 0.055 0.050 0.047 0.044 0.042 0.041 0.039 0.038
1.96 0.150 0.095 0.072 0.061 0.054 0.049 0.045 0.043 0.041 0.039 0.038 0.037
1.98 0.149 0.093 0.071 0.060 0.052 0.048 0.044 0.042 0.040 0.038 0.037 0.036
2.00 0.148 0.092 0.070 0.059 0.051 0.046 0.043 0.040 0.038 0.037 0.035 0.034
2.02 0.146 0.090 0.068 0.058 0.050 0.045 0.042 0.039 0.037 0.035 0.034 0.033
2.04 0.145 0.089 0.067 0.057 0.049 0.044 0.040 0.038 0.036 0.034 0.033 0.032
2.06 0.144 0.088 0.066 0.055 0.048 0.043 0.039 0.037 0.035 0.033 0.032 0.031
2.08 0.143 0.087 0.065 0.054 0.047 0.041 0.038 0.036 0.034 0.032 0.031 0.030
2.10 0.141 0.085 0.063 0.053 0.046 0.040 0.037 0.034 0.033 0.031 0.030 0.029
2.12 0.140 0.084 0.062 0.052 0.045 0.039 0.036 0.033 0.032 0.030 0.029 0.028
2.14 0.139 0.083 0.061 0.051 0.044 0.038 0.035 0.032 0.031 0.029 0.028 0.027
2.16 0.138 0.082 0.060 0.050 0.043 0.037 0.034 0.031 0.030 0.028 0.027 0.026
2.18 0.137 0.081 0.059 0.048 0.042 0.036 0.033 0.030 0.029 0.027 0.026 0.025
2.20 0.136 0.079 0.058 0.047 0.041 0.035 0.032 0.029 0.028 0.026 0.025 0.024
2.22 0.135 0.078 0.057 0.046 0.040 0.034 0.031 0.029 0.027 0.025 0.024 0.023
2.24 0.134 0.077 0.055 0.045 0.039 0.033 0.030 0.028 0.026 0.025 0.023 0.022
2.26 0.133 0.076 0.054 0.044 0.038 0.032 0.029 0.027 0.025 0.024 0.023 0.022
2.28 0.132 0.075 0.053 0.043 0.037 0.031 0.028 0.026 0.024 0.023 0.022 0.021
2.30 0.131 0.074 0.052 0.042 0.036 0.031 0.027 0.025 0.023 0.022 0.021 0.020
2.32 0.130 0.073 0.052 0.041 0.035 0.030 0.027 0.024 0.023 0.021 0.020 0.019
2.34 0.129 0.072 0.051 0.040 0.034 0.029 0.026 0.024 0.022 0.021 0.020 0.019
2.36 0.128 0.071 0.050 0.039 0.033 0.028 0.025 0.023 0.021 0.020 0.019 0.018
2.38 0.127 0.070 0.049 0.038 0.032 0.027 0.024 0.022 0.021 0.019 0.018 0.017
2.40 0.126 0.069 0.048 0.037 0.031 0.027 0.024 0.022 0.020 0.019 0.018 0.017
2.42 0.125 0.068 0.047 0.036 0.030 0.026 0.023 0.021 0.019 0.018 0.017 0.016
2.44 0.124 0.067 0.046 0.036 0.029 0.025 0.022 0.020 0.019 0.017 0.016 0.016
2.46 0.123 0.067 0.045 0.035 0.029 0.025 0.022 0.020 0.018 0.017 0.016 0.015
2.48 0.122 0.066 0.045 0.034 0.028 0.024 0.021 0.019 0.017 0.016 0.015 0.014
2.50 0.121 0.065 0.044 0.033 0.027 0.023 0.020 0.018 0.017 0.016 0.015 0.014
2.52 0.120 0.064 0.043 0.033 0.027 0.023 0.020 0.018 0.016 0.015 0.014 0.013
2.54 0.119 0.063 0.042 0.032 0.026 0.022 0.019 0.017 0.016 0.015 0.014 0.013
2.56 0.119 0.062 0.042 0.031 0.025 0.021 0.019 0.017 0.015 0.014 0.013 0.013
2.58 0.118 0.062 0.041 0.031 0.025 0.021 0.018 0.016 0.015 0.014 0.013 0.012
2.60 0.117 0.061 0.040 0.030 0.024 0.020 0.018 0.016 0.014 0.013 0.012 0.012
2.70 0.113 0.057 0.037 0.027 0.021 0.018 0.015 0.014 0.012 0.011 0.010 0.010
2.80 0.109 0.054 0.034 0.024 0.019 0.016 0.013 0.012 0.010 0.009 0.009 0.008
2.90 0.106 0.051 0.031 0.022 0.017 0.014 0.011 0.010 0.009 0.008 0.007 0.007
3.00 0.102 0.048 0.029 0.020 0.015 0.012 0.010 0.009 0.007 0.007 0.006 0.006
3.20 0.096 0.043 0.025 0.016 0.012 0.009 0.008 0.006 0.005 0.005 0.004 0.004
3.40 0.091 0.038 0.021 0.014 0.010 0.007 0.006 0.005 0.004 0.003 0.003 0.003
3.60 0.086 0.035 0.018 0.011 0.008 0.006 0.004 0.003 0.003 0.002 0.002 0.002
3.80 0.082 0.031 0.016 0.010 0.006 0.004 0.003 0.003 0.002 0.002 0.001 0.001
4.00 0.078 0.029 0.014 0.008 0.005 0.004 0.003 0.002 0.002 0.001 0.001 <0.001
df = ν
ν 13 14 15 16 17 18 19 20 21 22 23 24
1.30 0.108 0.107 0.107 0.106 0.105 0.105 0.105 0.104 0.104 0.104 0.103 0.103
1.32 0.105 0.104 0.103 0.103 0.102 0.102 0.101 0.101 0.101 0.100 0.100 0.100
1.34 0.102 0.101 0.100 0.099 0.099 0.098 0.098 0.098 0.097 0.097 0.097 0.096
1.36 0.098 0.098 0.097 0.096 0.096 0.095 0.095 0.094 0.094 0.094 0.094 0.093
1.38 0.095 0.095 0.094 0.093 0.093 0.092 0.092 0.091 0.091 0.091 0.090 0.090
1.40 0.092 0.092 0.091 0.090 0.090 0.089 0.089 0.088 0.088 0.088 0.087 0.087
1.42 0.090 0.089 0.088 0.087 0.087 0.086 0.086 0.086 0.085 0.085 0.085 0.084
1.44 0.087 0.086 0.085 0.085 0.084 0.084 0.083 0.083 0.082 0.082 0.082 0.081
1.46 0.084 0.083 0.082 0.082 0.081 0.081 0.080 0.080 0.080 0.079 0.079 0.079
1.48 0.081 0.081 0.080 0.079 0.079 0.078 0.078 0.077 0.077 0.077 0.076 0.076
1.50 0.079 0.078 0.077 0.077 0.076 0.075 0.075 0.075 0.074 0.074 0.074 0.073
1.52 0.076 0.075 0.075 0.074 0.073 0.073 0.072 0.072 0.072 0.071 0.071 0.071
1.54 0.074 0.073 0.072 0.072 0.071 0.070 0.070 0.070 0.069 0.069 0.069 0.068
1.56 0.071 0.071 0.070 0.069 0.069 0.068 0.068 0.067 0.067 0.067 0.066 0.066
1.58 0.069 0.068 0.067 0.067 0.066 0.066 0.065 0.065 0.065 0.064 0.064 0.064
1.60 0.067 0.066 0.065 0.065 0.064 0.064 0.063 0.063 0.062 0.062 0.062 0.061
1.62 0.065 0.064 0.063 0.062 0.062 0.061 0.061 0.060 0.060 0.060 0.059 0.059
1.64 0.062 0.062 0.061 0.060 0.060 0.059 0.059 0.058 0.058 0.058 0.057 0.057
1.66 0.060 0.060 0.059 0.058 0.058 0.057 0.057 0.056 0.056 0.056 0.055 0.055
1.68 0.058 0.058 0.057 0.056 0.056 0.055 0.055 0.054 0.054 0.054 0.053 0.053
1.70 0.056 0.056 0.055 0.054 0.054 0.053 0.053 0.052 0.052 0.052 0.051 0.051
1.72 0.055 0.054 0.053 0.052 0.052 0.051 0.051 0.050 0.050 0.050 0.049 0.049
1.74 0.053 0.052 0.051 0.051 0.050 0.049 0.049 0.049 0.048 0.048 0.048 0.047
1.76 0.051 0.050 0.049 0.049 0.048 0.048 0.047 0.047 0.046 0.046 0.046 0.046
1.78 0.049 0.048 0.048 0.047 0.046 0.046 0.046 0.045 0.045 0.044 0.044 0.044
1.80 0.048 0.047 0.046 0.045 0.045 0.044 0.044 0.043 0.043 0.043 0.042 0.042
1.82 0.046 0.045 0.044 0.044 0.043 0.043 0.042 0.042 0.042 0.041 0.041 0.041
1.84 0.044 0.044 0.043 0.042 0.042 0.041 0.041 0.040 0.040 0.040 0.039 0.039
1.86 0.043 0.042 0.041 0.041 0.040 0.040 0.039 0.039 0.038 0.038 0.038 0.038
1.88 0.041 0.041 0.040 0.039 0.039 0.038 0.038 0.037 0.037 0.037 0.036 0.036
1.90 0.040 0.039 0.038 0.038 0.037 0.037 0.036 0.036 0.036 0.035 0.035 0.035
1.92 0.039 0.038 0.037 0.036 0.036 0.035 0.035 0.035 0.034 0.034 0.034 0.033
1.94 0.037 0.036 0.036 0.035 0.035 0.034 0.034 0.033 0.033 0.033 0.032 0.032
1.96 0.036 0.035 0.034 0.034 0.033 0.033 0.032 0.032 0.032 0.031 0.031 0.031
1.98 0.035 0.034 0.033 0.033 0.032 0.032 0.031 0.031 0.030 0.030 0.030 0.030
2.00 0.033 0.033 0.032 0.031 0.031 0.030 0.030 0.030 0.029 0.029 0.029 0.028
2.02 0.032 0.031 0.031 0.030 0.030 0.029 0.029 0.028 0.028 0.028 0.028 0.027
2.04 0.031 0.030 0.030 0.029 0.029 0.028 0.028 0.027 0.027 0.027 0.026 0.026
2.06 0.030 0.029 0.029 0.028 0.028 0.027 0.027 0.026 0.026 0.026 0.025 0.025
2.08 0.029 0.028 0.028 0.027 0.026 0.026 0.026 0.025 0.025 0.025 0.024 0.024
2.10 0.028 0.027 0.027 0.026 0.025 0.025 0.025 0.024 0.024 0.024 0.023 0.023
2.12 0.027 0.026 0.026 0.025 0.025 0.024 0.024 0.023 0.023 0.023 0.022 0.022
2.14 0.026 0.025 0.025 0.024 0.024 0.023 0.023 0.022 0.022 0.022 0.021 0.021
2.16 0.025 0.024 0.024 0.023 0.023 0.022 0.022 0.022 0.021 0.021 0.020 0.020
2.18 0.024 0.023 0.023 0.022 0.022 0.021 0.021 0.021 0.020 0.020 0.019 0.020
2.20 0.023 0.023 0.022 0.021 0.021 0.021 0.020 0.020 0.020 0.019 0.018 0.019
2.22 0.022 0.022 0.021 0.020 0.020 0.020 0.019 0.019 0.019 0.019 0.018 0.018
2.24 0.022 0.021 0.020 0.019 0.019 0.019 0.019 0.018 0.018 0.018 0.017 0.017
2.26 0.021 0.020 0.020 0.018 0.019 0.018 0.018 0.018 0.017 0.017 0.016 0.017
2.28 0.020 0.019 0.019 0.018 0.018 0.018 0.017 0.017 0.017 0.016 0.015 0.016
2.30 0.019 0.019 0.018 0.017 0.017 0.017 0.016 0.016 0.016 0.016 0.015 0.015
2.32 0.019 0.018 0.017 0.016 0.017 0.016 0.016 0.016 0.015 0.015 0.014 0.015
2.34 0.018 0.017 0.017 0.016 0.016 0.016 0.015 0.015 0.015 0.014 0.014 0.014
2.36 0.017 0.017 0.016 0.016 0.015 0.015 0.015 0.014 0.014 0.014 0.013 0.013
2.38 0.017 0.016 0.016 0.015 0.015 0.014 0.014 0.014 0.013 0.013 0.012 0.013
2.40 0.016 0.015 0.015 0.014 0.014 0.014 0.013 0.013 0.013 0.013 0.012 0.012
2.42 0.015 0.015 0.014 0.014 0.014 0.013 0.013 0.013 0.012 0.012 0.011 0.012
2.44 0.015 0.014 0.014 0.013 0.013 0.013 0.012 0.012 0.012 0.012 0.011 0.011
2.46 0.014 0.014 0.013 0.013 0.012 0.012 0.012 0.012 0.011 0.011 0.010 0.011
2.48 0.014 0.013 0.013 0.012 0.012 0.012 0.011 0.011 0.011 0.011 0.010 0.010
2.50 0.013 0.013 0.012 0.012 0.011 0.011 0.011 0.011 0.010 0.010 0.010 0.010
2.52 0.013 0.012 0.012 0.011 0.011 0.011 0.010 0.010 0.010 0.010 0.009 0.009
2.54 0.012 0.012 0.011 0.011 0.011 0.010 0.010 0.010 0.010 0.009 0.009 0.009
2.56 0.012 0.011 0.011 0.010 0.010 0.010 0.010 0.009 0.009 0.009 0.008 0.009
2.58 0.011 0.011 0.010 0.010 0.010 0.009 0.009 0.009 0.009 0.009 0.008 0.008
2.60 0.011 0.010 0.010 0.010 0.009 0.009 0.009 0.009 0.008 0.008 0.007 0.008
2.70 0.009 0.009 0.008 0.008 0.008 0.007 0.007 0.007 0.007 0.007 0.006 0.006
2.80 0.008 0.007 0.007 0.006 0.006 0.006 0.006 0.006 0.005 0.005 0.005 0.005
2.90 0.006 0.006 0.005 0.005 0.005 0.005 0.005 0.004 0.004 0.004 0.004 0.004
3.00 0.005 0.005 0.004 0.004 0.004 0.004 0.004 0.004 0.003 0.003 0.003 0.003
3.10 0.004 0.004 0.004 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.002
3.20 0.003 0.003 0.003 0.003 0.003 0.002 0.002 0.002 0.002 0.002 0.002 0.002
3.30 0.003 0.003 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
3.40 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001 0.001 0.001
3.50 0.002 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 <.001 <.001
df = ν
ν 25 26 27 28 29 30 40 50 60 80 100 ∞
1.30 0.103 0.103 0.102 0.102 0.102 0.102 0.101 0.100 0.099 0.099 0.098 0.097
1.32 0.099 0.099 0.099 0.099 0.099 0.098 0.097 0.096 0.096 0.095 0.095 0.093
1.34 0.096 0.096 0.096 0.096 0.095 0.095 0.094 0.093 0.093 0.092 0.092 0.090
1.36 0.093 0.093 0.093 0.092 0.092 0.092 0.091 0.090 0.089 0.089 0.088 0.087
1.38 0.090 0.090 0.089 0.089 0.089 0.089 0.088 0.087 0.086 0.086 0.085 0.084
1.40 0.087 0.087 0.086 0.086 0.086 0.086 0.085 0.084 0.083 0.083 0.082 0.081
1.42 0.084 0.084 0.084 0.083 0.083 0.083 0.082 0.081 0.080 0.080 0.079 0.078
1.44 0.081 0.081 0.081 0.080 0.080 0.080 0.079 0.078 0.078 0.077 0.076 0.075
1.46 0.078 0.078 0.078 0.078 0.078 0.077 0.076 0.075 0.075 0.074 0.074 0.072
1.48 0.076 0.075 0.075 0.075 0.075 0.075 0.073 0.073 0.072 0.071 0.071 0.069
1.50 0.073 0.073 0.073 0.072 0.072 0.072 0.071 0.070 0.069 0.069 0.068 0.067
1.52 0.071 0.070 0.070 0.070 0.070 0.069 0.068 0.067 0.067 0.066 0.066 0.064
1.54 0.068 0.068 0.068 0.067 0.067 0.067 0.066 0.065 0.064 0.064 0.063 0.062
1.56 0.066 0.065 0.065 0.065 0.065 0.065 0.063 0.063 0.062 0.061 0.061 0.059
1.58 0.063 0.063 0.063 0.063 0.062 0.062 0.061 0.060 0.060 0.059 0.059 0.057
1.60 0.061 0.061 0.061 0.060 0.060 0.060 0.059 0.058 0.057 0.057 0.056 0.055
1.62 0.059 0.059 0.058 0.058 0.058 0.058 0.057 0.056 0.055 0.055 0.054 0.053
1.64 0.057 0.057 0.056 0.056 0.056 0.056 0.054 0.054 0.053 0.052 0.052 0.051
1.66 0.055 0.054 0.054 0.054 0.054 0.054 0.052 0.052 0.051 0.050 0.050 0.048
1.68 0.053 0.052 0.052 0.052 0.052 0.052 0.050 0.050 0.049 0.048 0.048 0.046
1.70 0.051 0.051 0.050 0.050 0.050 0.050 0.048 0.048 0.047 0.047 0.046 0.045
1.72 0.049 0.049 0.048 0.048 0.048 0.048 0.047 0.046 0.045 0.045 0.044 0.043
1.74 0.047 0.047 0.047 0.046 0.046 0.046 0.045 0.044 0.043 0.043 0.042 0.041
1.76 0.045 0.045 0.045 0.045 0.044 0.044 0.043 0.042 0.042 0.041 0.041 0.039
1.78 0.044 0.043 0.043 0.043 0.043 0.043 0.041 0.041 0.040 0.039 0.039 0.038
1.80 0.042 0.042 0.042 0.041 0.041 0.041 0.040 0.039 0.038 0.038 0.037 0.036
1.82 0.040 0.040 0.040 0.040 0.040 0.039 0.038 0.037 0.037 0.036 0.036 0.034
1.84 0.039 0.039 0.038 0.038 0.038 0.038 0.037 0.036 0.035 0.035 0.034 0.033
1.86 0.037 0.037 0.037 0.037 0.037 0.036 0.035 0.034 0.034 0.033 0.033 0.031
1.88 0.036 0.036 0.035 0.035 0.035 0.035 0.034 0.033 0.032 0.032 0.032 0.030
1.90 0.035 0.034 0.034 0.034 0.034 0.034 0.032 0.032 0.031 0.031 0.030 0.029
1.92 0.033 0.033 0.033 0.033 0.032 0.032 0.031 0.030 0.030 0.029 0.029 0.027
1.94 0.032 0.032 0.031 0.031 0.031 0.031 0.030 0.029 0.029 0.028 0.028 0.026
1.96 0.031 0.030 0.030 0.030 0.030 0.030 0.028 0.028 0.027 0.027 0.026 0.025
1.98 0.029 0.029 0.029 0.029 0.029 0.028 0.027 0.027 0.026 0.026 0.025 0.024
2.00 0.028 0.028 0.028 0.028 0.027 0.027 0.026 0.025 0.025 0.024 0.024 0.023
2.02 0.027 0.027 0.027 0.027 0.026 0.026 0.025 0.024 0.024 0.023 0.023 0.022
2.04 0.026 0.026 0.026 0.025 0.025 0.025 0.024 0.023 0.023 0.022 0.022 0.021
2.06 0.025 0.025 0.025 0.024 0.024 0.024 0.023 0.022 0.022 0.021 0.021 0.020
2.08 0.024 0.024 0.024 0.023 0.023 0.023 0.022 0.021 0.021 0.020 0.020 0.019
2.10 0.023 0.023 0.023 0.022 0.022 0.022 0.021 0.020 0.020 0.019 0.019 0.018
2.12 0.022 0.022 0.022 0.022 0.021 0.021 0.020 0.019 0.019 0.019 0.018 0.017
2.14 0.021 0.021 0.021 0.021 0.020 0.020 0.019 0.019 0.018 0.018 0.017 0.016
2.16 0.020 0.020 0.020 0.020 0.020 0.019 0.018 0.018 0.017 0.017 0.017 0.015
2.18 0.019 0.019 0.019 0.019 0.019 0.019 0.018 0.017 0.017 0.016 0.016 0.015
2.20 0.019 0.018 0.018 0.018 0.018 0.018 0.017 0.016 0.016 0.015 0.015 0.014
2.22 0.018 0.018 0.018 0.017 0.017 0.017 0.016 0.015 0.015 0.015 0.014 0.013
2.24 0.017 0.017 0.017 0.017 0.016 0.016 0.015 0.015 0.014 0.014 0.014 0.013
2.26 0.016 0.016 0.016 0.016 0.016 0.016 0.015 0.014 0.014 0.013 0.013 0.012
2.28 0.016 0.016 0.015 0.015 0.015 0.015 0.014 0.013 0.013 0.013 0.012 0.011
2.30 0.015 0.015 0.015 0.015 0.014 0.014 0.013 0.013 0.012 0.012 0.012 0.011
2.32 0.014 0.014 0.014 0.014 0.014 0.014 0.013 0.012 0.012 0.011 0.011 0.010
2.34 0.014 0.014 0.013 0.013 0.013 0.013 0.012 0.012 0.011 0.011 0.011 0.010
2.36 0.013 0.013 0.013 0.013 0.013 0.012 0.012 0.011 0.011 0.010 0.010 0.009
2.38 0.013 0.012 0.012 0.012 0.012 0.012 0.011 0.011 0.010 0.010 0.010 0.009
2.40 0.012 0.012 0.012 0.012 0.012 0.011 0.011 0.010 0.010 0.009 0.009 0.008
2.42 0.012 0.011 0.011 0.011 0.011 0.011 0.010 0.010 0.009 0.009 0.009 0.008
2.44 0.011 0.011 0.011 0.011 0.011 0.010 0.010 0.009 0.009 0.008 0.008 0.007
2.46 0.011 0.010 0.010 0.010 0.010 0.010 0.009 0.009 0.008 0.008 0.008 0.007
2.48 0.010 0.010 0.010 0.010 0.010 0.009 0.009 0.008 0.008 0.008 0.007 0.007
2.50 0.010 0.010 0.009 0.009 0.009 0.009 0.008 0.008 0.008 0.007 0.007 0.006
2.52 0.009 0.009 0.009 0.009 0.009 0.009 0.008 0.007 0.007 0.007 0.007 0.006
2.54 0.009 0.009 0.009 0.008 0.008 0.008 0.008 0.007 0.007 0.007 0.006 0.006
2.56 0.008 0.008 0.008 0.008 0.008 0.008 0.007 0.007 0.007 0.006 0.006 0.005
2.58 0.008 0.008 0.008 0.008 0.008 0.008 0.007 0.006 0.006 0.006 0.006 0.005
2.60 0.008 0.008 0.007 0.007 0.007 0.007 0.006 0.006 0.006 0.006 0.005 0.005
2.70 0.006 0.006 0.006 0.006 0.006 0.006 0.005 0.005 0.004 0.004 0.004 0.003
2.80 0.005 0.005 0.005 0.005 0.004 0.004 0.004 0.004 0.003 0.003 0.003 0.003
2.90 0.004 0.004 0.004 0.004 0.004 0.003 0.003 0.003 0.003 0.002 0.002 0.002
3.00 0.003 0.003 0.003 0.003 0.003 0.003 0.002 0.002 0.002 0.002 0.002 0.001
3.10 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001 <.001
3.20 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001 <.001 <.001 <.001
3.30 0.001 0.001 0.001 0.001 0.001 0.001 0.001 <.001 <.001 <.001 <.001 <.001
3.40 0.001 0.001 0.001 0.001 <.001 <.001 <.001 <.001 <.001 <.001 <.001 <.001
Note: This table is applied only when 1.30 ≤ tcal. ≤ 4.00 for 1 to 12 df. and 1.30 ≤ tcal. ≤ 3.50 for 13 to ∞ df.
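When tcal. falls outside the range covered by Table 4, or an intermediate value is needed, the one-sided P-value can be approximated numerically. The following is a minimal sketch, not the method used to build the table; it assumes Python's standard `math` module only, integrates the t density with Simpson's rule, and the function name `t_upper_tail` and the `steps` parameter are hypothetical.

```python
import math


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate the one-sided P-value P[t(df) > t] for t >= 0."""
    # Normalising constant of the t density, computed on the log scale
    # so that large degrees of freedom do not overflow.
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)

    def pdf(x: float) -> float:
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

    # Composite Simpson's rule on [0, t]; steps must be even.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    area_0_to_t = total * h / 3.0

    # The density is symmetric about zero, so the area above zero is 0.5.
    return 0.5 - area_0_to_t


if __name__ == "__main__":
    print(f"{t_upper_tail(2.00, 1):.3f}")   # compare with the row t = 2.00, df = 1
    print(f"{t_upper_tail(1.812, 10):.3f}")  # close to the 0.05 one-tailed level
```

The two-sided P-value is twice the value returned, which matches the relation between the one-tailed and two-tailed levels in Table 3.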
Table 5:
Significant values of chi-square distribution with ν d.f. at α-level, i.e. P(χ² > χ²(α, ν)) = α
d.f. Level of significance (α)
0.995 0.990 0.975 0.950 0.900 0.100 0.050 0.025 0.010 0.005
1 0.0000393 0.000157 0.000982 0.00393 0.0158 2.71 3.84 5.02 6.64 7.88
2 0.0100 0.0201 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.59
3 0.0717 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.35 12.84
4 0.207 0.297 0.484 0.711 1.064 7.78 9.49 11.14 13.28 14.86
5 0.412 0.554 0.831 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.676 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.990 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.96
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.58 5.58 17.28 19.68 21.92 24.73 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.90 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 24.99 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.08 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 51.00
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
50 27.99 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15 79.49
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95
70 43.27 45.44 48.76 51.74 55.33 85.53 90.53 95.02 100.43 104.22
80 51.17 53.54 57.15 60.39 64.28 96.58 101.88 106.63 112.33 116.32
90 59.19 61.75 65.65 69.13 73.29 107.57 113.15 118.14 124.12 128.30
100 67.33 70.06 74.22 77.93 82.36 118.50 124.34 129.56 135.81 140.17
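A tabulated critical value can be checked by computing the upper-tail probability it leaves. The sketch below is an illustration under stated assumptions, not part of the original text: it uses only Python's standard `math` module, handles integer degrees of freedom through the standard finite-sum expressions for even and odd d.f., and the function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x), for x >= 0 and integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total

    # Odd df: two-sided normal tail at sqrt(x) plus a finite correction series.
    root = math.sqrt(x)
    two_sided_tail = math.erfc(root / math.sqrt(2.0))      # 2 * P(Z > sqrt(x))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)   # normal density at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # builds sqrt(x)^(2r-1) / (1*3*...*(2r-1))
        total += term
    return two_sided_tail + 2.0 * density * total


if __name__ == "__main__":
    # Each tabulated 0.050 value should leave roughly 5% in the upper tail.
    for df, crit in ((1, 3.84), (2, 5.99), (3, 7.81), (10, 18.31)):
        print(f"d.f. = {df:2d}  P(chi2 > {crit}) = {chi2_upper_tail(crit, df):.4f}")
```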
Table 6:
Significant values (points) of F (variance ratio) at α = 0.01 with ν1 and ν2 degrees of freedom
ν2 \ ν1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 4052 5000 5403 5625 5764 5859 5928 5982 6022 6056 6106 6157 6209 6235 6261 6287 6313 6339 6366
2 98.5 99.0 99.2 99.2 99.3 99.3 99.4 99.4 99.4 99.4 99.4 99.4 99.4 99.5 99.5 99.5 99.5 99.5 99.5
3 34.1 30.8 29.5 28.7 28.2 27.9 27.7 27.5 27.3 27.2 27.1 26.9 26.7 26.6 26.5 26.4 26.3 26.2 26.1
4 21.2 18.0 16.7 16.0 15.5 15.2 15.0 14.8 14.7 14.5 14.4 14.2 14.0 13.9 13.8 13.7 13.7 13.6 13.5
5 16.3 13.3 12.1 11.4 11.0 10.7 10.5 10.3 10.2 10.1 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 13.7 10.9 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88
7 12.2 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65
8 11.3 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86
9 10.6 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31
10 10.0 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17
14 8.86 6.51 5.56 5.04 4.70 4.46 4.28 4.14 4.03 3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.48
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21
25 7.77 5.57 4.68 4.18 3.86 3.63 3.46 3.32 3.22 3.13 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 2.96 2.82 2.66 2.58 2.50 2.42 2.33 2.23 2.13
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38
∞ 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00
Table 7
Significant values (points) of F (variance ratio) at α = 0.05 with ν1 and ν2 degrees of freedom
ν2 \ ν1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 161 200 216 225 230 234 237 239 241 242 244 246 248 249 250 251 252 253 254
2 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.5 19.5 19.5 19.5 19.5 19.5
3 10.1 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.00 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.84 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.00 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.56 1.50 1.43 1.35 1.25
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00
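As an illustration of how Table 7 is read, the following is a minimal sketch of a variance-ratio (F) test. The two samples are hypothetical, the names `sample_a`, `sample_b` and `F_CRIT_05` are introduced only for this example, and the sketch assumes the usual convention of placing the larger sample variance in the numerator and comparing the ratio with the upper 5% point.

```python
from statistics import variance

# Hypothetical measurements for two independent samples (not from the text).
sample_a = [20, 24, 19, 27, 23, 25, 21, 22]
sample_b = [22, 23, 21, 24, 22, 23, 22, 21, 23, 22]

# A few entries copied from Table 7, keyed by (nu1, nu2).
F_CRIT_05 = {(7, 9): 3.29, (9, 7): 3.68, (7, 7): 3.79, (9, 9): 3.18}

var_a = variance(sample_a)  # unbiased estimate, divisor n - 1
var_b = variance(sample_b)

# Assumed convention: larger variance in the numerator, so that F >= 1.
if var_a >= var_b:
    f_ratio, nu1, nu2 = var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
else:
    f_ratio, nu1, nu2 = var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

critical = F_CRIT_05[(nu1, nu2)]
reject_h0 = f_ratio > critical

print(f"F = {f_ratio:.2f}, nu1 = {nu1}, nu2 = {nu2}, "
      f"critical value = {critical}, reject H0: {reject_h0}")
```

Here ν1 is the degrees of freedom of the numerator variance (the column of the table) and ν2 that of the denominator variance (the row).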
Table 8:
Critical Values for the U Test
Values of U0.10 (a dash marks a cell with no tabulated value)
n2\n1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    2   -   -   -   0   0   0   1   1   1   1   2   2   3   3
    3   -   0   0   1   2   2   3   4   4   5   5   6   7   7
    4   -   0   1   2   3   4   5   6   7   8   9  10  11  12
    5   0   1   2   4   5   6   8   9  11  12  13  15  16  18
    6   0   2   3   5   7   8  10  12  14  16  17  19  21  23
    7   0   2   4   6   8  11  13  15  17  19  21  24  26  28
    8   1   3   5   8  10  13  15  18  20  23  26  28  31  33
    9   1   4   6   9  12  15  18  21  24  27  30  33  36  39
   10   1   4   7  11  14  17  20  24  27  31  34  37  41  44
   11   1   5   8  12  16  19  23  27  31  34  38  42  46  50
   12   2   5   9  13  17  21  26  30  34  38  42  47  51  55
   13   2   6  10  15  19  24  28  33  37  42  47  51  56  61
   14   3   7  11  16  21  26  31  36  41  46  51  56  61  66
   15   3   7  12  18  23  28  33  39  44  50  55  61  66  72
Values of U0.05 (a dash marks a cell with no tabulated value)
n1\n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    2   -   -   -   -   -   -   0   0   0   0   1   1   1   1
    3   -   -   -   0   1   1   2   2   3   3   4   4   5   5
    4   -   -   0   1   2   3   4   4   5   6   7   8   9  10
    5   -   0   1   2   3   5   6   7   8   9  11  12  13  14
    6   -   1   2   3   5   6   8  10  11  13  14  16  17  19
    7   -   1   3   5   6   8  10  12  14  16  18  20  22  24
    8   0   2   4   6   8  10  13  15  17  19  22  24  26  29
    9   0   2   4   7  10  12  15  17  20  23  26  28  31  34
   10   0   3   5   8  11  14  17  20  23  26  29  33  36  39
   11   0   3   6   9  13  16  19  23  26  30  33  37  40  44
   12   1   4   7  11  14  18  22  26  29  33  37  41  45  49
   13   1   4   8  12  16  20  24  28  33  37  41  45  50  54
   14   1   5   9  13  17  22  26  31  36  40  45  50  55  59
   15   1   5  10  14  19  24  29  34  39  44  49  54  59  64
Values of U0.02 (a dash marks a cell with no tabulated value)
n1\n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    2   -   -   -   -   -   -   -   -   -   -   -   0   0   0
    3   -   -   -   -   -   0   0   1   1   1   2   2   2   3
    4   -   -   -   0   1   1   2   3   3   4   5   5   6   7
    5   -   -   0   1   2   3   4   5   6   7   8   9  10  11
    6   -   -   1   2   3   4   6   7   8   9  11  12  13  15
    7   -   0   1   3   4   6   7   9  11  12  14  16  17  19
    8   -   0   2   4   6   7   9  11  13  15  17  20  22  24
    9   -   1   3   5   7   9  11  14  16  18  21  23  26  28
   10   -   1   3   6   8  11  13  16  19  22  24  27  30  33
   11   -   1   4   7   9  12  15  18  22  25  28  31  34  37
   12   -   2   5   8  11  14  17  21  24  28  31  35  38  42
   13   0   2   5   9  12  16  20  23  27  31  35  39  43  47
   14   0   2   6  10  13  17  22  26  30  34  38  43  47  51
   15   0   3   7  11  15  19  24  28  33  37  42  47  51  56
Values of U0.01 (a dash marks a cell with no tabulated value)
n1\n2   3   4   5   6   7   8   9  10  11  12  13  14  15
    3   -   -   -   -   -   -   0   0   0   1   1   1   2
    4   -   -   -   0   0   1   1   2   2   3   3   4   5
    5   -   -   0   1   1   2   3   4   5   6   7   7   8
    6   -   0   1   2   3   4   5   6   7   9  10  11  12
    7   -   0   1   3   4   6   7   9  10  12  13  15  16
    8   -   1   2   4   6   7   9  11  13  15  17  18  20
    9   0   1   3   5   7   9  11  13  16  18  20  22  24
   10   0   2   4   6   9  11  13  16  18  21  24  26  29
   11   0   2   5   7  10  13  16  18  21  24  27  30  33
   12   1   3   6   9  12  15  18  21  24  27  31  34  37
   13   1   3   7  10  13  17  20  24  27  31  34  38  42
   14   1   4   7  11  15  18  22  26  30  34  38  42  46
   15   2   5   8  12  16  20  24  29  33  37  42  46  51
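To show how Table 8 might be used, here is a minimal sketch of the Mann-Whitney U computation from rank sums. The data and the helper name `mann_whitney_u` are hypothetical, and the sketch assumes the usual two-tailed decision rule of rejecting H0 when the smaller of the two U values does not exceed the tabulated critical value.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2) computed from rank sums, using average ranks for ties."""
    pooled = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold equal values; they share the average of ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(rank_of[v] for v in x)          # rank sum of the first sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical samples of sizes n1 = 6 and n2 = 7 (not from the text).
sample_1 = [12, 15, 18, 20, 22, 25]
sample_2 = [24, 27, 30, 31, 33, 35, 38]

u1, u2 = mann_whitney_u(sample_1, sample_2)
u_stat = min(u1, u2)

# Critical value read from the U0.05 table for sample sizes 6 and 7.
U_CRITICAL = 6
reject_h0 = u_stat <= U_CRITICAL

print(f"U1 = {u1}, U2 = {u2}, U = {u_stat}, reject H0: {reject_h0}")
```

Because the tables are symmetric in the two sample sizes, it does not matter which sample is treated as the first.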
Table 9
Probabilities associated with values as large as observed values of the Kruskal-Wallis H
statistic, i.e. p0 = P(H ≥ H*), where H* = Hcal
Sample sizes
n1 n2 n3    H       P
2  1  1   2.7000  .500
2  2  1   3.6000  .200
2  2  2   4.5714  .067
          3.7143  .200
3  1  1   3.2000  .300
3  2  1   4.2857  .100
          3.8571  .133
3  2  2   5.3572  .029
          4.7143  .048
          4.5000  .067
          4.4643  .105
3  3  1   5.1429  .043
          4.5714  .100
          4.0000  .129
3  3  2   6.2500  .011
          5.3611  .032
          5.1389  .061
          4.5556  .100
          4.2500  .121
3  3  3   7.2000  .004
          6.4889  .011
          5.6889  .029
          5.6000  .050
          5.0667  .086
          4.6222  .100
4  1  1   3.5714  .200
4  2  1   4.8214  .057
          4.5000  .076
          4.0179  .114
4  2  2   6.0000  .014
          5.3333  .033
          5.1250  .052
          4.4583  .100
          4.1667  .105
4  3  1   5.8333  .021
          5.2083  .050
          5.0000  .057
          4.0556  .093
          3.8889  .129
4  3  2   6.4444  .008
          6.3000  .011
          5.4444  .046
          5.4000  .051
          4.5111  .098
          4.4444  .102
4  3  3   6.7455  .010
          6.7091  .013
          5.7909  .046
          5.7273  .050
          4.7091  .092
          4.7000  .101
4  4  1   6.6667  .010
          6.1667  .022
          4.9667  .048
          4.8667  .054
          4.1667  .082
          4.0667  .102
4  4  2   7.0364  .006
          6.8727  .011
          5.4545  .046
          5.2364  .052
          4.5545  .098
          4.4455  .103
4  4  3   7.1439  .010
          7.1364  .011
          5.5985  .049
          5.5758  .051
          4.5455  .099
          4.4773  .102
4  4  4   7.6538  .008
          7.5385  .011
          5.6923  .049
          5.6538  .054
          4.6539  .097
          4.5001  .104
5  1  1   3.8571  .143
5  2  1   5.2500  .036
          5.0000  .048
          4.4500  .071
          4.2000  .095
          4.0500  .119
5  2  2   6.5333  .008
          6.1333  .013
          5.1600  .034
          5.0400  .056
          4.3733  .090
          4.2933  .122
5  3  1   6.4000  .012
          4.9600  .048
          4.8711  .052
          4.0178  .095
          3.8400  .123
5  3  2   6.9091  .009
          6.8218  .010
          5.2509  .049
          5.1055  .052
          4.6509  .091
          4.4945  .101
5  3  3   7.0788  .009
          6.9818  .011
          5.6485  .049
          5.5152  .051
          4.5333  .097
          4.4121  .109
5  4  1   6.9545  .008
          6.8400  .011
          4.9855  .044
          4.8600  .056
          3.9873  .098
          3.9600  .102
5  4  2   7.2045  .009
          7.1182  .010
          5.2727  .049
          5.2682  .050
          4.5409  .098
          4.5182  .101
5  4  3   7.4449  .010
          7.3949  .011
          5.6564  .049
          5.6308  .050
          4.5487  .099
          4.5231  .103
5  4  4   7.7604  .009
          7.7440  .011
          5.6571  .049
          5.6176  .050
          4.6187  .100
          4.5527  .102
5  5  1   7.3091  .009
          6.8364  .011
          5.1273  .046
          4.9091  .053
          4.1091  .086
          4.0364  .105
5  5  2   7.3385  .010
          7.2692  .010
          5.3385  .047
          5.2462  .051
          4.6231  .097
          4.5077  .100
5  5  3   7.5780  .010
          7.5429  .010
          5.7055  .046
          5.6264  .051
          4.5451  .100
          4.5363  .102
5  5  4   7.8229  .010
          7.7914  .010
          5.6657  .049
          5.6429  .050
          4.5229  .099
          4.5200  .101
5  5  5   8.0000  .009
          7.9800  .010
          5.7800  .049
          5.6600  .051
          4.5600  .100
          4.5000  .102
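The use of Table 9 can be sketched as follows. The three samples are hypothetical, the function name `kruskal_wallis_h` is introduced only for this example, and the sketch assumes there are no tied observations. It computes H = 12 / (N(N + 1)) × Σ(Ri² / ni) − 3(N + 1) and then reads an upper bound for the probability from the tabulated rows for sample sizes 3, 3, 3.

```python
def kruskal_wallis_h(*samples):
    """Kruskal-Wallis H for samples without ties (an assumption of this sketch)."""
    pooled = sorted(v for s in samples for v in s)
    rank_of = {v: r for r, v in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Sum over samples of (rank sum)^2 / (sample size).
    total = sum(sum(rank_of[v] for v in s) ** 2 / len(s) for s in samples)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical samples, each of size 3 (not from the text).
group_1 = [23, 25, 28]
group_2 = [31, 34, 36]
group_3 = [40, 42, 47]

h_stat = kruskal_wallis_h(group_1, group_2, group_3)

# Rows of Table 9 for sample sizes 3, 3, 3, as (H, P) in decreasing order of H.
TABLE_9_SIZES_333 = [
    (7.2000, 0.004), (6.4889, 0.011), (5.6889, 0.029),
    (5.6000, 0.050), (5.0667, 0.086), (4.6222, 0.100),
]

# P for the largest tabulated H not exceeding the observed H. This is an
# upper bound on P(H >= observed H); None means H is below every tabulated value.
p_bound = next((p for h, p in TABLE_9_SIZES_333 if h <= h_stat + 1e-4), None)

print(f"H = {h_stat:.4f}, tabulated P = {p_bound}")
```

With the samples completely separated, as here, H takes its largest tabulated value for these sample sizes, so H0 would be rejected at the 5% level.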
Far Western University
End-Term Examination - 2075
Business Statistics II (MGT 341)
Faculty: Management (BBA) Full Marks: 80
Level: Undergraduate Time: 2 hrs. 40 minutes
Semester: Fourth
Candidates are required to give their answers in their own words as far as practicable. The figures
in the margin indicate full marks.
Group 'B'
Attempt any six questions. 6 × 8 = 48
1. What do you understand by a test of significance? State the general procedure for testing a
hypothesis.
2. Define Type I and Type II errors in the hypothesis testing procedure. It is claimed that a
random sample of 100 tyres with a mean tread life of 15,131 km is drawn from a population of
tyres that has a mean tread life of 15,200 km and a standard deviation of 1,248 km. Test the
validity of this claim using the 0.05 level of significance.
3. A machine produced 20 defective articles in a batch of 400. After overhauling, it produced 10
defectives in a batch of 300. Has the machine improved? Make the decision using both the critical
value approach and the p-value approach.
4. In a manufacturing company, the new manager believes that music enhances the
productivity of the workers. He observed 8 workers on a task and recorded their
production before and after the music was installed. From the data given below, can one
conclude that productivity has changed due to music?
Employee 1 2 3 4 5 6 7 8
Without music 220 202 226 190 200 215 208 210
With music 236 190 240 200 220 205 212 215
5. The following table gives data on the occupations of fathers and of their sons:
Occupation of fathers    Occupation of sons
                         Agriculture  Service  Business
Agriculture              212          130      74
Service                  64           148      72
Business                 160          84       201
Test whether there is any association between the occupations of fathers and those of sons. Given
$\chi^2_{0.05}(4) = 9.49$.
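For question 5, the arithmetic of a chi-square test of independence can be sketched as below. This is only an illustrative outline, not a model answer: it assumes the rows of the table are the fathers' occupations and the columns the sons', and the variable names are introduced for this example. Expected frequencies are taken as (row total × column total) / grand total.

```python
# Observed frequencies from question 5 (rows: fathers, columns: sons;
# order in both directions: Agriculture, Service, Business).
observed = [
    [212, 130, 74],
    [64, 148, 72],
    [160, 84, 201],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE = 9.49  # tabulated chi-square value at 0.05 with 4 d.f., as given
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, d.f. = {degrees_of_freedom}, "
      f"reject H0: {reject_h0}")
```

The null hypothesis of no association is rejected when the computed statistic exceeds the tabulated value.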
Far Western University
End-Term Examination-2075
Business Statistics II (MGT 341)
Faculty: Management (BBA) Full Marks: 20
Level: Undergraduate Time: 20 Minutes
Semester: Fourth
Group 'A'
Tick (√) the best answer.
1. The theory of estimation was expounded by
a. Karl Pearson b. Bowley c. Spearman d. Fisher
2. A specific observed value of a statistic used to estimate the population parameter is
called
a. point estimate b. interval estimate
c. estimator d. none of the above
3. For a sample mean x̄ = 100, the upper limit of a 90% confidence interval for µ is 112. What is the
lower limit of this confidence interval?
a. 100 b. 110 c. 88 d. 92
4. Which of the following is called β - error?
a. accepting H0 while H0 was false
b. accepting H0 while H0 was true
c. rejecting H0 while H0 was true
d. rejecting H0 while H0 was false
5. In the p-value approach, the null hypothesis H0 is rejected if
a. p-value ≥ α b. p-value = α c. p-value ≤ α d. p-value > α
6. For a two-tailed test of hypothesis at α = 0.1, the acceptance region is the entire region
a. to the right of the negative critical value
b. between the two critical values
c. outside of the two critical values
d. to the left of the positive critical value
7. For a test of the null hypothesis H0: µ1 ≥ µ2, the alternative hypothesis can be
a. H1: µ1 ≠ µ2 b. H1: µ1 = µ2 c. H1: µ1 > µ2 d. H1: µ1 < µ2
8. When the null hypothesis is H0: µ = 9, the alternative hypothesis can be
a. H1: µ > 9 b. H1: µ ≠ 9 c. H1: µ < 9 d. all of the above
9. For a two-tailed test, the value of Zα at α = 0.01 is
a. 1.96 b. 2.575 c. 2.33 d. 1.64
10. Which of the following are basic assumptions of the t-test?
a. The two populations are equal
b. the two samples are random ones
c. the two populations have the same variance
d. all of the above
11. Let n1 = 13, S1 = 17, n2 = 9, S2 = 22; then the combined sample variance (S²) is
a. 19 b. 361 c. 367 d. 19.5
12. Which of the following is not a non-parametric test?
a. χ² test b. Kruskal-Wallis H test
c. Mann-Whitney U test d. F-test
13. In the K-W test of k samples, the appropriate number of degrees of freedom is
a. k b. k-1 c. nk-1 d. n-k
14. The degrees of freedom for chi-square in the case of a contingency table of order (4 × 3) are:
a. 12 b. 9 c. 8 d. 6
15. The analysis of variance test is concerned with
a. proportion b. mean c. variance d. all of the above
16. Which of the following is necessary for using ANOVA?
a. The population is continuous b. The population has median
c. The population is symmetric d. All of the above
17. The sum of squares due to error can be obtained from the equation
a. SSE = TSS + SSR b. SSE = SSC + SSR – TSS
c. SSE = SSC + SSR d. SSE = TSS – SSC – SSR
18. A scatter diagram of the variate values (x, y) gives an idea about
a. functional relationship b. regression model
c. distribution of errors d. none of the above
19. In a regression line of y on x, the variable x is known as
a. dependent variable b. explanatory variable
c. regressor d. all of the above
20. The coefficient of determination can be calculated as
a. $r^2 = \dfrac{SSR}{TSS}$ b. $r^2 = 1 - \dfrac{SSR}{TSS}$
c. $r^2 = \dfrac{b_0 \sum y + b_1 \sum xy - n\bar{y}^2}{\sum y^2 - n\bar{y}^2}$ d. all of the above
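The quantities named in question 20 can be checked numerically. The sketch below uses hypothetical data and assumes that SSR denotes the regression (explained) sum of squares and SSE the residual sum of squares of a least-squares line y = b0 + b1x; the variable names are introduced only for this example.

```python
# Hypothetical data, chosen only to illustrate the formulas.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares coefficients of the line y = b0 + b1 * x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual

r2_from_ssr = ssr / tss
r2_from_sse = 1 - sse / tss

# Computational form using raw sums, as in option (c).
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_y2 = sum(yi ** 2 for yi in y)
r2_from_sums = (b0 * sum_y + b1 * sum_xy - n * y_bar ** 2) / (sum_y2 - n * y_bar ** 2)

print(round(r2_from_ssr, 4), round(r2_from_sse, 4), round(r2_from_sums, 4))
```

Under these assumptions the explained-variation ratio, one minus the residual ratio, and the raw-sums form all give the same value for a least-squares fit.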