Module_4_Class

Class Notes by professor for Module 4 of BCS301 CSE


Module - 4 : Statistical Inference - II

(Sampling Variables, Central Limit Theorem, and Confidence Limits for Unknown Mean)

Dr. P. Rajendra

Professor, Dept. of Maths

CMRIT, Bengaluru.

Dr. P. Rajendra (Professor, Dept. of Maths) Module - 4 : Statistical Inference - II CMRIT, Bengaluru. 1 / 17
Sampling Variables: We use sample data to predict the value of a variable in a population. For example, the average accounts receivable can be estimated from a sample of customer balances. From a sample X1, X2, . . . , Xn we derive the

$$\text{Sample Mean} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

This sample mean provides an estimate of the population mean µ, with some uncertainty.
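Central Limit Theorem (CLT):
Given a population with mean µ and standard deviation σ, the mean X̄ of a random sample of size n is approximately distributed as

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Standard Error: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

As n increases, the distribution of the sample mean approaches a normal distribution, even if the population is not normally distributed. (If the population itself is normal, X̄ is exactly normal for every n.)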
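As a rough numerical illustration of the CLT (not part of the original notes), the Python sketch below draws many samples of size n = 30 from an exponential population, which is strongly skewed, and checks that the sample means centre on µ with a spread close to σ/√n. The exponential population with µ = σ = 1, the sample size, the 5000 repetitions and the seed are all illustrative assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, population_mean=1.0, seed=0):
    """Draw `repetitions` samples of size n from an exponential population
    (mean = SD = population_mean, clearly non-normal) and return their means."""
    rng = random.Random(seed)
    rate = 1.0 / population_mean          # expovariate takes the rate lambda = 1 / mean
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(repetitions)
    ]

n = 30
means = simulate_sample_means(n, repetitions=5000, seed=42)
print("mean of sample means:", round(statistics.fmean(means), 3))   # close to mu = 1
print("SD of sample means  :", round(statistics.stdev(means), 3))   # close to sigma / sqrt(n)
print("sigma / sqrt(n)     :", round(1.0 / math.sqrt(n), 3))
```

A histogram of `means` would look roughly bell-shaped even though the individual exponential draws are not.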
Figure: Central Limit Theorem in action. The distribution of sample means approaches a normal distribution as n increases.

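Example of CLT:
A sample of size n = 50 is taken from a population with:
Population mean µ = 100
Population standard deviation σ = 15
Solution:
$$\text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{50}} \approx 2.12$$
The sample mean X̄ will approximately follow N(100, 2.12²), that is, a normal distribution with mean 100 and standard deviation 2.12.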

Confidence Interval (CI)
A confidence interval provides a range of values within which the true population mean µ is likely to lie. The CI is given by
$$\bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}$$
where Z is the critical value from the standard normal distribution for the chosen confidence level.

Confidence Level Z-value
99% 2.58
95% 1.96
90% 1.645
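Example of CI for Unknown Mean: A sample of n = 36 students has a mean score X̄ = 85 with a standard deviation s = 10. Find the 95% confidence interval for the population mean.
Solution: Since σ is unknown and the sample is large (n = 36), the sample standard deviation s is used in place of σ:
$$CI = 85 \pm 1.96 \cdot \frac{10}{\sqrt{36}} = 85 \pm 1.96 \times 1.67 = 85 \pm 3.27$$
95% CI = (81.73, 88.27)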
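The interval above can be reproduced with a short Python sketch using only the standard library. The helper name `z_confidence_interval` is hypothetical; it obtains Z from `statistics.NormalDist` rather than from the table, so it uses 1.95996... instead of the rounded 1.96, which does not change the answer to two decimals.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample CI for a population mean: mean ± z * sd / sqrt(n).

    z is the two-sided critical value of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Example from the notes: n = 36, sample mean 85, s = 10, 95% level
low, high = z_confidence_interval(85, 10, 36, level=0.95)
print(f"95% CI = ({low:.2f}, {high:.2f})")
```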

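Description | Population | Sample
Size | N | n
Mean | µ | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance | σ² | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation | σ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$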

Problem 1: State the Central Limit Theorem. Use the CLT to evaluate P[50 < X̄ < 56], where X̄ represents the mean of a random sample of size 100 from an infinite population with mean µ = 53 and variance σ² = 400.
Solution:
Central Limit Theorem (CLT): Let X1, X2, . . . , Xn be a random sample of size n drawn from a population with mean µ and variance σ². As the sample size n becomes large, the sampling distribution of the sample mean X̄ approaches a normal distribution with mean µ and variance σ²/n, regardless of the shape of the original population distribution.

Mathematically:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n.$$
We need to evaluate P[50 < X̄ < 56], where n = 100, µ = 53, σ² = 400 ⇒ σ = 20. Using the CLT, the sampling distribution of the sample mean X̄ is:
$$\bar{X} \sim N\left(53, \frac{400}{100}\right) = N(53, 2^2), \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{100}} = 2.$$

Standardize the bounds using the z-score formula:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
For X̄ = 50:
$$Z_1 = \frac{50 - 53}{2} = -1.5$$
For X̄ = 56:
$$Z_2 = \frac{56 - 53}{2} = 1.5$$
Therefore:
$$P(50 < \bar{X} < 56) = P(-1.5 < Z < 1.5) = 2 \times P(0 < Z < 1.5) = 2 \times 0.4332 = 0.8664$$
The probability that the sample mean X̄ lies between 50 and 56 is:
P(50 < X̄ < 56) = 0.8664
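Assuming the CLT approximation X̄ ∼ N(µ, σ²/n), the probability in Problem 1 can be checked with `statistics.NormalDist` from the Python standard library. Note that `NormalDist` is parameterised by a standard deviation, so it receives the standard error σ/√n rather than a variance; the helper name is our own.

```python
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a < X̄ < b) under the CLT approximation X̄ ~ N(mu, sigma^2 / n)."""
    xbar = NormalDist(mu=mu, sigma=sigma / n ** 0.5)  # NormalDist's sigma = standard error of X̄
    return xbar.cdf(b) - xbar.cdf(a)

# Problem 1: mu = 53, sigma^2 = 400 (sigma = 20), n = 100
p = prob_sample_mean_between(50, 56, mu=53, sigma=20, n=100)
print(round(p, 4))  # 0.8664
```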

Problem 2: An unknown distribution has a mean of 90 and a standard
deviation of 15. Samples of size n = 25 are drawn randomly from the
population. Find the probability that the sample mean is between 85 and
92.

Solution: Given:
Population mean: µ = 90
Population standard deviation: σ = 15
Sample size: n = 25
Using the Central Limit Theorem, the sampling distribution of the sample mean X̄ is:
$$\bar{X} \sim N\left(90, \frac{15^2}{25}\right) = N(90, 3^2), \qquad \sigma_{\bar{X}} = \frac{15}{\sqrt{25}} = 3.$$

Standardize the bounds using the z-score formula:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
For X̄ = 85:
$$Z_1 = \frac{85 - 90}{3} \approx -1.67$$
For X̄ = 92:
$$Z_2 = \frac{92 - 90}{3} \approx 0.67$$
Therefore:
$$P(85 < \bar{X} < 92) = P(-1.67 < Z < 0.67) = P(-1.67 < Z < 0) + P(0 < Z < 0.67) = 0.4525 + 0.2486 = 0.7011$$
The probability that the sample mean X̄ lies between 85 and 92 is:
P(85 < X̄ < 92) ≈ 0.70
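Because the z-values in Problem 2 are rounded to two decimals before using the table, the answer differs slightly from the exact normal probability. The sketch below (standard library only) computes both; the exact value is about 0.700 and the table-style value is about 0.701.

```python
from statistics import NormalDist

# Problem 2: X̄ ~ N(90, 3^2) because sigma = 15 and n = 25
xbar = NormalDist(mu=90, sigma=15 / 25 ** 0.5)
exact = xbar.cdf(92) - xbar.cdf(85)

# Table-style calculation with z rounded to two decimals, as in the notes
z = NormalDist()
table_style = z.cdf(0.67) - z.cdf(-1.67)

print(round(exact, 4), round(table_style, 4))
```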

Problem 3: An electrical firm manufactures light bulbs with a lifespan that is approximately normally distributed with mean µ = 800 hours and standard deviation σ = 40 hours. If a random sample of 16 bulbs is selected, what is the probability that the sample mean lifespan will be less than 775 hours?
Solution: To solve this, we use the Central Limit Theorem (CLT), which states that the sampling distribution of the sample mean is normal with
$$\mu_{\bar{X}} = \mu = 800 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
where σX̄ is the standard error of the mean and n = 16 is the sample size. (Because the population itself is approximately normal, this holds even for a sample as small as 16.)
$$\sigma_{\bar{X}} = \frac{40}{\sqrt{16}} = \frac{40}{4} = 10$$
We need to find the probability that the sample mean X̄ is less than 775 hours. This corresponds to:
P(X̄ < 775)


We standardize this to a Z-score:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{775 - 800}{10} = \frac{-25}{10} = -2.5$$
Using the standard normal distribution table, we find the cumulative probability for Z = −2.5:
P(Z < −2.5) = 0.0062
Thus, the probability that the sample mean life of the 16 bulbs is less than 775 hours is:
P(X̄ < 775) = 0.0062
The probability that a random sample of 16 light bulbs will have an average lifespan of less than 775 hours is 0.0062 (or 0.62%). This indicates that such an outcome is quite unlikely.

Problem 4: A random sample of size 64 is taken from an infinite
population having mean 112 and variance 144. Using the Central Limit
Theorem, find the probability of getting the sample mean X̄ greater than
114.5.

Solution:
By the Central Limit Theorem, the distribution of the sample mean X̄ is approximately
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Given µ = 112, σ² = 144, and n = 64, the standard error of the mean is:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{64}} = \frac{12}{8} = 1.5$$

Convert 114.5 to the corresponding Z-score:
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{114.5 - 112}{1.5} = \frac{2.5}{1.5} \approx 1.67$$
Using standard normal distribution tables:
$$P(Z > 1.67) = 0.5 - P(0 \le Z \le 1.67) = 0.5 - 0.4525 = 0.0475$$
Thus, the probability of getting a sample mean X̄ > 114.5 is:
P(X̄ > 114.5) ≈ 0.0475
Hence, the likelihood of observing such a sample mean is approximately 4.75%.
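The tail probabilities in Problems 3 and 4 can be checked the same way. This minimal sketch uses `statistics.NormalDist` with the standard error as its `sigma`; with unrounded z-values it gives about 0.0062 for Problem 3 and about 0.0478 for Problem 4, the latter differing slightly from the table answer because z = 1.6667 was rounded.

```python
from statistics import NormalDist

# Problem 3: mu = 800, sigma = 40, n = 16, so the standard error is 10
bulbs = NormalDist(mu=800, sigma=40 / 16 ** 0.5)
p_below = bulbs.cdf(775)                # P(X̄ < 775)

# Problem 4: mu = 112, sigma^2 = 144 (sigma = 12), n = 64, so the standard error is 1.5
sample_mean = NormalDist(mu=112, sigma=12 / 64 ** 0.5)
p_above = 1 - sample_mean.cdf(114.5)    # P(X̄ > 114.5)

print(f"Problem 3: {p_below:.4f}, Problem 4: {p_above:.4f}")
```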

Problem 5:
The mean and standard deviation (SD) of the diameters of a sample of 250 rivet heads manufactured by a company are given as:
Mean X̄ = 7.2642 mm, SD s = 0.0058 mm.
Find the confidence limits for the mean diameter at the following confidence levels: (i) 99% (ii) 98% (iii) 95% (iv) 90%

Solution: Since the sample is large (n = 250), the sample SD is used as an estimate of the population SD (σ ≈ s = 0.0058 mm), and
$$C.I. = \bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$$
Z-values:
Confidence Level Z-value
99% 2.58
98% 2.33
95% 1.96
90% 1.645
Calculating the Standard Error:
$$\frac{\sigma}{\sqrt{n}} = \frac{0.0058}{\sqrt{250}} \approx 0.000367$$
99% Confidence Interval:
7.2642 ± (2.58 × 0.000367) = 7.2642 ± 0.00095 = (7.26325, 7.26515)
98% Confidence Interval:
7.2642 ± (2.33 × 0.000367) = 7.2642 ± 0.00086 = (7.26334, 7.26506)
95% Confidence Interval:
7.2642 ± (1.96 × 0.000367) = 7.2642 ± 0.00072 = (7.26348, 7.26492)
90% Confidence Interval:
7.2642 ± (1.645 × 0.000367) = 7.2642 ± 0.00060 = (7.26360, 7.26480)
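The four intervals can be reproduced with the Python sketch below. As an assumption, it computes each Z from `statistics.NormalDist().inv_cdf` instead of the rounded table values (2.58, 2.33, 1.96, 1.645), so the fifth decimal of some limits may differ by one unit from the hand calculation; the helper name `mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, s, n, level):
    """Large-sample CI xbar ± z * s / sqrt(n), with z from the standard normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # exact z instead of the rounded table value
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Rivet-head data from Problem 5 (mm)
intervals = {level: mean_ci(7.2642, 0.0058, 250, level) for level in (0.99, 0.98, 0.95, 0.90)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.5f}, {high:.5f})")
```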
Problem 6:
A sample 10, 12, 16, 19 is taken from a normal population with a
known variance of 6.25. Find the 95% confidence interval for the
population mean.
Solution: Given,
Sample: 10, 12, 16, 19
Sample size: n = 4
Mean of the sample:
$$\bar{X} = \frac{10 + 12 + 16 + 19}{4} = \frac{57}{4} = 14.25$$
Population variance: σ² = 6.25 ⇒ σ = √6.25 = 2.5
Confidence level: 95% ⇒ Z = 1.96 (from the normal distribution table)
We know that,
$$C.I. = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 14.25 \pm 1.96 \times \frac{2.5}{\sqrt{4}} = 14.25 \pm (1.96 \times 1.25)$$
$$= 14.25 \pm 2.45 = (14.25 - 2.45,\ 14.25 + 2.45) = (11.80, 16.70)$$
Assignment Problems

(1). Let the observed value of the mean X̄ of a random sample of size 20 from a normal distribution with mean µ and variance σ² = 80 be 81.2. Find a 90% and a 95% confidence interval for µ.

(2). A random sample of size 25 from a normal distribution (σ² = 4) yields a sample mean X̄ = 78.3. Obtain a 99% confidence interval for µ.

(3). The heights of a random sample of 50 college students showed mean X̄ = 174.5 cm and standard deviation s = 6.9 cm. Construct a 99% confidence interval for the mean height of all college students.

Topic - 2 : Small Samples - Student’s t-Test

Dr. P. Rajendra

Professor, Dept. of Maths

CMRIT, Bengaluru.

Dr. P. Rajendra (Professor, Dept. of Maths)Topic - 2 : Small Samples - Student’s t-Test CMRIT, Bengaluru. 1 / 20
1. Sampling Distribution for Small Samples: For large samples, the sampling distribution of a statistic approaches a normal distribution, and the values of the sample statistics are considered the best estimates of the corresponding population parameters. For small samples, however, it is no longer possible to assume that the statistics are normally distributed. A new technique, which involves the concept of ‘degrees of freedom’, has therefore been devised for small samples.
Degrees of Freedom: The degrees of freedom (d.f.) are particularly important for small samples, because small sample sizes introduce more variability and uncertainty into statistical estimates. The d.f. adjust for this added uncertainty and make statistical tests more reliable. The d.f. is defined as the number of independent values in a set of observations, that is, the number of values that are free to vary. For example, if x1 + x2 + x3 = 15, knowing two of the values determines the third, so the degrees of freedom are 2. Likewise, when you compute the mean of a sample, one degree of freedom is used up by the sample mean, so only n − 1 values are free to vary when calculating other statistics such as the sample variance. Therefore, for a sample of size n, the degrees of freedom used in variance calculations are d.f. = n − 1.
2. Student’s t-Test: The Student’s t-distribution is a probability distribution used to estimate population parameters (such as the mean) when the population standard deviation is unknown, particularly for small samples. It plays a key role in hypothesis testing and confidence intervals for small samples. For sample sizes n ≤ 30, the t-statistic is given by:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{where } s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
is the sample variance, and t has n − 1 degrees of freedom. The null hypothesis H0 is accepted (not rejected) if the calculated |t| is less than the critical value at the given level of significance. The t-distribution was developed by William Sealy Gosset, who published under the pseudonym “Student”.
Aspect | t-Distribution | Normal Distribution
Shape | Bell-shaped, with heavier tails | Bell-shaped
Spread | Wider (more spread out) | Narrower
d.f. | Depends on d.f. = n − 1 | Not dependent on d.f.
Usage | Small samples, unknown σ | Large samples, known σ
Table: Comparison of t-Distribution and Normal Distribution

Problem 1: A certain stimulus was administered to each of 12 patients,
and the following changes in blood pressure were recorded:
5, 2, 8, −1, 3, 0, 6, −2, 1, 5, 0, 4
Can it be concluded that the stimulus increases blood pressure?
Note: The critical value t0.05 for 11 degrees of freedom is 2.201.
Solution: The sample mean x̄ is
$$\bar{x} = \frac{1}{n}\sum x = \frac{31}{12} = 2.5833$$
The sample variance s² is
$$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$$
$$s^2 = \frac{1}{11}\Big[(5 - 2.58)^2 + (2 - 2.58)^2 + (8 - 2.58)^2 + (-1 - 2.58)^2 + (3 - 2.58)^2 + (0 - 2.58)^2 + (6 - 2.58)^2 + (-2 - 2.58)^2 + (1 - 2.58)^2 + (5 - 2.58)^2 + (0 - 2.58)^2 + (4 - 2.58)^2\Big]$$
$$s^2 = 9.538 \;\Rightarrow\; s = \sqrt{9.538} = 3.088$$
Hypothesis Test:
H0: The stimulus does not affect blood pressure, i.e., µ = 0.
H1: The stimulus increases blood pressure, i.e., µ > 0.
The t-statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{2.5833 - 0}{3.088/\sqrt{12}} = \frac{2.5833}{0.8914} \approx 2.90$$
Compare with the Critical Value: The critical value is $t_{0.05,11} = 2.201$ and the calculated value is t = 2.90. (2.201 is the two-tailed 5% value; the one-tailed 5% value for this one-sided H1 is 1.796, so the conclusion is the same.)
Since t = 2.90 > 2.201, we reject the null hypothesis at the 5% level of significance.

Conclusion: At the 5% level of significance, we conclude that the stimulus is accompanied by an increase in blood pressure.
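The one-sample t computation can be reproduced with the following Python sketch. It uses `statistics.stdev` (divisor n − 1) and compares against the table value quoted in the notes, because the standard library has no t-distribution; the helper name `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, d.f.) for t = (x̄ - mu0) / (s / sqrt(n)), with s using divisor n - 1."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample SD with the n - 1 divisor
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

changes = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]   # blood-pressure changes in Problem 1
t, df = one_sample_t(changes, mu0=0)
critical = 2.201                                    # table value for 11 d.f. quoted in the notes
print(f"t = {t:.2f} with {df} d.f.; reject H0: {t > critical}")
```

The same helper applied to the I.Q. scores with `mu0=100` gives t ≈ −0.62, and to the heights with `mu0=66` gives t ≈ 1.89, matching Problems 2 and 3.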

Figure: t-Distribution Curve with Critical Region

Problem 2: A random sample of 10 boys had the following I.Q. scores:
70, 120, 110, 101, 88, 83, 95, 98, 107, 100.
Does this data support the assumption of a population mean I.Q. of 100 at a 5% level of significance? (Note: $t_{0.05} = 2.262$ for 9 d.f.)
Solution: The I.Q. scores of the 10 boys are
x: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100
The sample mean (x̄) is:
$$\bar{x} = \frac{1}{n}\sum x = \frac{972}{10} = 97.2$$
The variance (s²) is:
$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{9} \times 1833.6 \approx 203.7333$$
The standard deviation (s) is:
$$s = \sqrt{s^2} \approx 14.2735$$
Given population mean µ = 100.
The t-statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{97.2 - 100}{14.2735/\sqrt{10}} \approx -0.6203$$
Comparison with Critical Value:
$$|t| \approx 0.6203 < 2.262$$
Conclusion:
Since the calculated |t| does not exceed the critical value, we do not reject the null hypothesis. Thus, the data are consistent with (support) the assumption of a population mean I.Q. of 100 at the 5% level of significance.

Figure: t-Distribution Curve with Critical Region

Problem 3: Ten individuals are chosen at random from a population, and
their heights in inches are found to be:
63, 63, 66, 67, 68, 69, 70, 70, 71, 71.
Test the hypothesis that the mean height of the universe is 66 inches at a
5% level of significance. (Note: t0.05 = 2.262 for 9 d.f.)
Solution: Given the Heights of the individuals (in inches):
x : 63, 63, 66, 67, 68, 69, 70, 70, 71, 71
The sample mean (x̄) is:
$$\bar{x} = \frac{1}{n}\sum x = \frac{678}{10} = 67.8$$
The variance (s²) is:
$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$
$$s^2 = \frac{1}{9}\Big[(63-67.8)^2 + (63-67.8)^2 + (66-67.8)^2 + (67-67.8)^2 + (68-67.8)^2 + (69-67.8)^2 + (70-67.8)^2 + (70-67.8)^2 + (71-67.8)^2 + (71-67.8)^2\Big] = \frac{81.6}{9}$$
$$s^2 \approx 9.067 \;\Rightarrow\; s \approx 3.011$$
Given population mean µ = 66.
The t-statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{67.8 - 66}{3.011/\sqrt{10}} = \frac{1.8}{0.9522} \approx 1.89$$
Comparison with Critical Value:
$$t \approx 1.89 < 2.262$$
Conclusion:
Since the calculated t value does not exceed the critical value, we do not reject the null hypothesis. Thus, the hypothesis that the mean height is 66 inches is accepted at the 5% level of significance.

Figure: t-Distribution Curve with Critical Region

Problem 4: Two types of batteries are tested for their length of life, and the following results are obtained:
Battery A: n1 = 10, x̄1 = 500 hrs, σ1² = 100
Battery B: n2 = 10, x̄2 = 560 hrs, σ2² = 121
Compute the Student’s t-statistic and test whether there is a significant difference in the two means.
Solution: Given:
n1 = 10, x̄1 = 500 hrs, σ1² = 100
n2 = 10, x̄2 = 560 hrs, σ2² = 121
Here σ1² and σ2² are the sample variances computed with divisor n, so that nσ² = Σ(x − x̄)². The pooled variance s² is given by:
$$s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2} = \frac{10 \times 100 + 10 \times 121}{10 + 10 - 2} = \frac{1000 + 1210}{18} = 122.78$$
Thus, the pooled standard deviation s is:
$$s = \sqrt{122.78} \approx 11.0805$$
The formula for the t-statistic is:
$$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Substitute the known values:
$$t = \frac{560 - 500}{11.0805\sqrt{0.1 + 0.1}} = \frac{60}{11.0805 \times \sqrt{0.2}} = \frac{60}{11.0805 \times 0.4472} = \frac{60}{4.955} \approx 12.11$$
The computed value of t is 12.11.
The degrees of freedom (d.f.) = n1 + n2 − 2 = 18.
The computed t-value 12.11 is greater than the table value of t for 18 degrees of freedom at all standard levels of significance (e.g., 0.05, 0.01).
Conclusion: There is a significant difference in the mean life of the two batteries.
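The summary-statistic version of the pooled t-test used in Problems 4 and 5 can be sketched in Python as follows. It assumes the notes' convention that the given variances use divisor n, so nσ² is the sum of squared deviations; the function name is hypothetical.

```python
import math

def pooled_t_from_summary(n1, mean1, var1, n2, mean2, var2):
    """Pooled two-sample t from summary statistics.

    Assumes var1 and var2 use divisor n, so n * var is the sum of
    squared deviations of that sample (the convention in these notes).
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((n1 * var1 + n2 * var2) / df)
    t = abs(mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df

t, df = pooled_t_from_summary(10, 500, 100, 10, 560, 121)   # Problem 4: batteries
print(f"t = {t:.2f} with {df} d.f.")
```

Calling `pooled_t_from_summary(12, 74, 64, 10, 70, 100)` for the boys and girls in Problem 5 gives t ≈ 0.99 with 20 d.f.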
Problem 5: A group of boys and girls were given an intelligence test. The
mean score, SD score, and the number of individuals in each group are as
follows:
Boys Girls
x̄ 74 70
σ 8 10
n 12 10
Test whether the difference between the means of the two groups is
significant at the 5% level of significance. Given: t0.05 = 2.086 for 20
degrees of freedom.
Solution: Given:
x̄1 = 74, σ1 = 8, n1 = 12 (Boys)
x̄2 = 70, σ2 = 10, n2 = 10 (Girls)
The pooled variance s² is given by:
$$s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$$
$$s^2 = \frac{12 \times 64 + 10 \times 100}{12 + 10 - 2} = \frac{768 + 1000}{20} = \frac{1768}{20} = 88.4$$
$$s = \sqrt{88.4} \approx 9.4$$
The formula for the t-statistic is:
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Substitute the known values:
$$t = \frac{|74 - 70|}{9.4\sqrt{\frac{1}{12} + \frac{1}{10}}} = \frac{4}{9.4\sqrt{0.0833 + 0.1}} = \frac{4}{9.4\sqrt{0.1833}} = \frac{4}{4.0244} \approx 0.9939$$
The computed value of t is 0.9939. The degrees of freedom (d.f.) = n1 + n2 − 2 = 20. The critical value of t at the 5% level of significance for 20 degrees of freedom is 2.086. Since 0.9939 < 2.086, we fail to reject the null hypothesis.
Conclusion: The hypothesis that there is no significant difference between the means of the two groups is accepted at the 5% level of significance.
Problem 6: Two horses, A and B, were tested for their performance in a race, measured by the time taken to complete the race (in seconds). The times recorded are as follows:
Horse A: 28, 30, 32, 33, 33, 29, 34
Horse B: 29, 30, 30, 24, 27, 29
Test whether we can discriminate between the two horses based on their performance using a t-test.

Solution: Let the variables x and y represent the times for Horse A and Horse B, respectively, with n1 = 7 and n2 = 6.
$$\bar{x} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i = \frac{219}{7} \approx 31.29$$
$$\bar{y} = \frac{1}{n_2}\sum_{i=1}^{n_2} y_i = \frac{169}{6} \approx 28.17$$
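$$\sum (x_i - \bar{x})^2 = (28 - 31.29)^2 + (30 - 31.29)^2 + (32 - 31.29)^2 + (33 - 31.29)^2 + (33 - 31.29)^2 + (29 - 31.29)^2 + (34 - 31.29)^2 \approx 31.43$$
$$\sum (y_i - \bar{y})^2 = (29 - 28.17)^2 + (30 - 28.17)^2 + (30 - 28.17)^2 + (24 - 28.17)^2 + (27 - 28.17)^2 + (29 - 28.17)^2 \approx 26.83$$
$$s^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2\right] = \frac{31.43 + 26.83}{7 + 6 - 2} = \frac{58.26}{11} \approx 5.2965$$
$$s = \sqrt{5.2965} \approx 2.3014$$
$$t = \frac{|\bar{x} - \bar{y}|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{|31.29 - 28.17|}{2.3014\sqrt{\frac{1}{7} + \frac{1}{6}}} \approx 2.44$$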

Given a significance level of 5% (α = 0.05), the critical values from the t-distribution for n1 + n2 − 2 = 11 degrees of freedom are:

$t_{0.05} = 2.201$ and $t_{0.02} = 2.718$

Since:

t = 2.44 lies between $t_{0.05} = 2.201$ and $t_{0.02} = 2.718$,

we reject the null hypothesis at the 5% level but not at the 2% level. This suggests that there is a statistically significant difference in the performance of the two horses at the 5% level.

Conclusion: We can conclude that Horse A and Horse B have significantly different performances based on their race times at the 5% level. However, the evidence is not strong enough to reject the null hypothesis at the 2% significance level.
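For raw data, as in Problem 6, the pooled t-statistic can be computed directly. This Python sketch (standard library only) uses unrounded means, which is why it gives t ≈ 2.44; the critical values 2.201 and 2.718 are standard two-tailed table values for 11 d.f., hard-coded because the standard library has no t-distribution. The function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(x, y):
    """Two-sample t-statistic with pooled variance, computed from raw data."""
    n1, n2 = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)  # pooled sum of squares
    df = n1 + n2 - 2
    s = math.sqrt(ss / df)                                               # pooled standard deviation
    return abs(mx - my) / (s * math.sqrt(1 / n1 + 1 / n2)), df

horse_a = [28, 30, 32, 33, 33, 29, 34]
horse_b = [29, 30, 30, 24, 27, 29]
t, df = pooled_t(horse_a, horse_b)
print(f"t = {t:.2f} with {df} d.f.")   # t = 2.44 with 11 d.f.
print("significant at 5%:", t > 2.201, "| at 2%:", t > 2.718)
```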

Assignment Questions

1. Ten individuals are chosen at random from the population, and their heights (in inches) are:
63, 63, 64, 65, 66, 69, 69, 70, 70, 71
Discuss whether the mean height in the universe is 65 inches, given that the value of Student’s t for 9 degrees of freedom at the 5% level of significance is 2.262.
2. Nine items have the following values:
45, 47, 50, 52, 48, 47, 49, 53, 51
Does the mean of these values differ significantly from the assumed mean of 47.5? Hint: $|t| = 1.84 < t_{0.05} = 2.31$ for ν = 8.

Topic - 3 : Chi-Square Test and Goodness of Fit

Dr. P. Rajendra

Professor, Dept. of Maths

CMRIT, Bengaluru.

Dr. P. Rajendra (Professor, Dept. of Maths)Topic - 3 : Chi-Square Test and Goodness of Fit CMRIT, Bengaluru. 1 / 16
Introduction to Chi-Square Test

The Chi-Square (χ²) test is used to measure the correspondence between theoretical and observed frequencies.
Let Oi (i = 1, 2, 3, . . . , n) be the set of observed frequencies.
Let Ei (i = 1, 2, 3, . . . , n) be the set of expected frequencies.
The χ² statistic is defined as:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_n - E_n)^2}{E_n}$$
In compact form:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where:
$$\sum O_i = \sum E_i = N \quad \text{(total frequency)}$$

Goodness of Fit Test using χ²
The Chi-Square test helps in checking the goodness of fit of various theoretical distributions, including the Binomial, Poisson and Normal distributions.
Decision Rule:
If the calculated value of χ² is less than the table value of χ² at a specified level of significance, the hypothesis is accepted.
Otherwise, the hypothesis is rejected.
Conditions for Applying the Chi-Square Test: The following conditions must be satisfied for a valid application of the Chi-Square test:
1. No theoretical (expected) frequency should be smaller than 5. If any expected frequency is less than 5, two or more classes should be grouped together to increase the frequency, and the number of degrees of freedom is then adjusted according to the number of classes after grouping.
2. The total observed and expected frequencies must be equal:
$$\sum O_i = \sum E_i = N \quad \text{(total frequency)}$$

Problem 1: Four coins are tossed 100 times, and the following results
were observed:
No.of Heads Frequency
0 5
1 29
2 36
3 25
4 5
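Fit a binomial distribution to the data. Test the goodness of fit using the χ²-test with $\chi^2_{0.05,4} = 9.49$.
Solution: Assuming the coins are unbiased (p = q = 0.5), the number of heads X in a toss of four coins follows the binomial distribution
$$P(X = x) = \binom{4}{x}(0.5)^x(0.5)^{4-x}, \quad x = 0, 1, 2, 3, 4$$
Using the formula:
$$P(0) = \binom{4}{0}(0.5)^0(0.5)^4 = 0.0625$$
$$P(1) = \binom{4}{1}(0.5)^1(0.5)^3 = 0.25$$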
$$P(2) = \binom{4}{2}(0.5)^2(0.5)^2 = 0.375$$
$$P(3) = \binom{4}{3}(0.5)^3(0.5)^1 = 0.25$$
$$P(4) = \binom{4}{4}(0.5)^4(0.5)^0 = 0.0625$$
Expected Frequencies:
$$E_i = 100 \times P(X = i)$$
E0 = 100 × 0.0625 = 6.25
E1 = 100 × 0.25 = 25
E2 = 100 × 0.375 = 37.5
E3 = 100 × 0.25 = 25
E4 = 100 × 0.0625 = 6.25

No.of Heads Oi Ei
0 5 6.25
1 29 25
2 36 37.5
3 25 25
4 5 6.25
Chi-Square Calculation:
$$\chi^2 = \sum_{i=0}^{4}\frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(5 - 6.25)^2}{6.25} + \frac{(29 - 25)^2}{25} + \frac{(36 - 37.5)^2}{37.5} + \frac{(25 - 25)^2}{25} + \frac{(5 - 6.25)^2}{6.25}$$
$$= \frac{1.5625}{6.25} + \frac{16}{25} + \frac{2.25}{37.5} + 0 + \frac{1.5625}{6.25} = 0.25 + 0.64 + 0.06 + 0 + 0.25 = 1.2$$
Conclusion: We have χ² = 1.2 and $\chi^2_{0.05,4} = 9.49$, so 1.2 < 9.49. Since the calculated χ² value is less than the table value at the 5% significance level, we accept the null hypothesis: the binomial distribution with p = 0.5 fits the data well.
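The binomial fit and χ² statistic in Problem 1 can be reproduced with the sketch below, which uses `math.comb` (Python 3.8+). The helper names `binomial_expected` and `chi_square` are hypothetical, and p = 0.5 is the fair-coin assumption of this problem; the table value 9.49 is hard-coded.

```python
from math import comb

def binomial_expected(total, trials, p):
    """Expected frequencies total * C(trials, x) * p^x * (1 - p)^(trials - x), x = 0..trials."""
    return [total * comb(trials, x) * p**x * (1 - p) ** (trials - x)
            for x in range(trials + 1)]

def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [5, 29, 36, 25, 5]               # frequencies of 0..4 heads in 100 tosses
expected = binomial_expected(100, 4, 0.5)   # 6.25, 25, 37.5, 25, 6.25
chi2 = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}; accept H0 at 5%: {chi2 < 9.49}")
```

The same helpers cover the family-survey problems: `binomial_expected(320, 5, 0.5)` gives 10, 50, 100, 100, 50, 10 (indexed by the number of boys, 0 to 5), and the resulting statistic is 7.16.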
Problem 2: A die is thrown 264 times, and the following frequency distribution is observed:
Face Value x Observed Frequency Oi
1 40
2 32
3 28
4 58
5 54
6 60
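Assuming the die is unbiased, the expected frequency for each face value is Ei = 264/6 = 44. Calculate the χ² value to test the goodness of fit. (Note: the observed frequencies as listed sum to 272 rather than 264, so the condition ΣOi = ΣEi is not met exactly.)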
Solution:
Face Value x Observed Frequency Oi Expected Frequency Ei
1 40 44
2 32 44
3 28 44
4 58 44
5 54 44
6 60 44
The formula for calculating the chi-square statistic is:
$$\chi^2 = \sum_{i=1}^{6}\frac{(O_i - E_i)^2}{E_i}$$
$$\Rightarrow \chi^2 = \frac{(40 - 44)^2}{44} + \frac{(32 - 44)^2}{44} + \frac{(28 - 44)^2}{44} + \frac{(58 - 44)^2}{44} + \frac{(54 - 44)^2}{44} + \frac{(60 - 44)^2}{44}$$
$$\Rightarrow \chi^2 = \frac{16 + 144 + 256 + 196 + 100 + 256}{44} = \frac{968}{44} = 22$$
Calculated χ² value: 22
Degrees of freedom: df = 6 − 1 = 5. From a standard chi-square table, the table value of χ² for 5 degrees of freedom at the 5% significance level is $\chi^2_{0.05,5} = 11.07$. Since the calculated value 22 exceeds 11.07, we reject the hypothesis that the die is unbiased.
Problem 3: A survey of 320 families with 5 children each revealed the
following distribution:
No. of Boys No. of Girls No. of Families Oi
5 0 14
4 1 56
3 2 110
2 3 88
1 4 40
0 5 12
Solution:
Hypotheses:
H0: The probability of male and female births is equal.
H1: The probability of male and female births is not equal.
Significance level: α = 5%. Number of families: N = 320.
Probability of male birth: p = 0.5; probability of female birth: q = 1 − p = 0.5. Number of children per family: n = 5.
The binomial distribution formula is:
$$P(x) = \binom{5}{x}(0.5)^x(0.5)^{5-x} = \binom{5}{x}(0.5)^5$$
The expected frequency for each number of boys is:
$$E(x) = 320 \times P(x) = 320 \times \binom{5}{x}(0.5)^5 = 10\binom{5}{x}$$
Observed and Expected Frequencies Table
No. of Boys | Oi | Ei | Oi − Ei | (Oi − Ei)² | (Oi − Ei)²/Ei
5 | 14 | 10 | 4 | 16 | 1.6
4 | 56 | 50 | 6 | 36 | 0.72
3 | 110 | 100 | 10 | 100 | 1.0
2 | 88 | 100 | −12 | 144 | 1.44
1 | 40 | 50 | −10 | 100 | 2.0
0 | 12 | 10 | 2 | 4 | 0.4
Total | | | | | 7.16
With df = 6 − 1 = 5, the table value is $\chi^2_{0.05,5} = 11.07$. Since the calculated χ² value (7.16) is less than the table value (11.07), we fail to reject the null hypothesis H0.
Problem 4: A survey of 320 families with 5 children each revealed the following distribution:
No. of Boys No. of Girls No. of Families
5 0 14
4 1 56
3 2 110
2 3 88
1 4 40
0 5 12
Test if the male and female births are equally probable at a 5% level of significance.

Solution: Given Data
Number of families surveyed: N = 320
Probability of male or female birth: p = 0.5, q = 1 − p = 0.5
Number of children per family: n = 5
Statistical Hypotheses:
H0: Probability of male and female births is equal.
H1: Probability of male and female births is not equal.
We use the Chi-square test to verify the hypothesis.

Expected Frequencies Calculation: The probability of observing x boys follows a Binomial distribution:
$$P(x) = \binom{5}{x}(0.5)^x(0.5)^{5-x} = \binom{5}{x}(0.5)^5$$
The expected frequency for each outcome is:
$$E(x) = 320 \times P(x)$$
∴ E(0) = 10, E(1) = 50, E(2) = 100, E(3) = 100, E(4) = 50, E(5) = 10

Observed and Expected Frequencies:
No. of Boys | Observed (Oi) | Expected (Ei) | Oi − Ei | (Oi − Ei)²/Ei
5 | 14 | 10 | 4 | 1.6
4 | 56 | 50 | 6 | 0.72
3 | 110 | 100 | 10 | 1
2 | 88 | 100 | −12 | 1.44
1 | 40 | 50 | −10 | 2
0 | 12 | 10 | 2 | 0.4
$$\chi^2 = \sum_{i=1}^{6}\frac{(O_i - E_i)^2}{E_i} = 7.16$$
Degrees of freedom: df = 6 − 1 = 5. Calculated value: χ² = 7.16.
Critical value at the 5% level of significance: $\chi^2_{0.05,5} = 11.07$.
Since the calculated χ² value is less than the critical value:
7.16 < 11.07
we fail to reject the null hypothesis H0.
Problem 5: The theory predicts that the proportion of beans in four
groups (A, B, C, and D) should follow the ratio 9 : 3 : 3 : 1.
In an experiment with 1600 beans, the observed distribution is as follows:
Group Number of Beans
A 882
B 313
C 287
D 118
Verify if the observed distribution is consistent with the theoretical
prediction using a chi-square test.
Solution: Given Data
Total number of beans: 1600
Theoretical ratio: 9 : 3 : 3 : 1
Sum of the ratios: 9 + 3 + 3 + 1 = 16
The expected frequencies for each group are calculated as:
$$E(A) = 1600 \times \frac{9}{16} = 900, \qquad E(B) = 1600 \times \frac{3}{16} = 300$$
$$E(C) = 1600 \times \frac{3}{16} = 300, \qquad E(D) = 1600 \times \frac{1}{16} = 100$$
Observed and Expected Frequencies:
Group | Observed (Oi) | Expected (Ei) | Oi − Ei | (Oi − Ei)²/Ei
A | 882 | 900 | −18 | 0.36
B | 313 | 300 | 13 | 0.5633
C | 287 | 300 | −13 | 0.5633
D | 118 | 100 | 18 | 3.24
The chi-square statistic is calculated as:
$$\chi^2 = \sum_{i=1}^{4}\frac{(O_i - E_i)^2}{E_i} = 0.36 + 0.5633 + 0.5633 + 3.24 \approx 4.73$$
The calculated chi-square value is χ² ≈ 4.73. With df = 4 − 1 = 3, the table value at the 5% level of significance is $\chi^2_{0.05,3} = 7.815$. Since 4.73 < 7.815, the observed distribution is consistent with the theoretical 9 : 3 : 3 : 1 prediction.
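For a theoretical ratio such as 9 : 3 : 3 : 1, the expected frequencies are the total split in proportion to the ratio. A minimal Python sketch that reproduces Problem 5 (the helper name `ratio_expected` is hypothetical, and the 5% table value 7.815 for 3 d.f. is hard-coded):

```python
def ratio_expected(total, ratio):
    """Split a total in proportion to a theoretical ratio such as 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

observed = [882, 313, 287, 118]                # groups A, B, C, D
expected = ratio_expected(1600, [9, 3, 3, 1])  # 900, 300, 300, 100
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                         # no parameters estimated from the data
print(f"chi-square = {chi2:.2f} with {df} d.f.; consistent at 5%: {chi2 < 7.815}")
```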
Assignment Problems:
1. Fit a Poisson distribution to the following data and test the goodness of fit at the 5% level of significance, given that $\chi^2_{0.05} = 7.815$ for 3 degrees of freedom.
x Frequency
0 122
1 60
2 15
3 2
4 1
2. A die was thrown 60 times, and the following frequency distribution was observed. Test whether the die is unbiased at the 5% level of significance (l.o.s.).
Faces Frequency
1 15
2 6
3 4
4 7
5 11
6 17
Topic - 4: F-Test (Fisher’s Test)

Dr. P. Rajendra

Professor, Dept. of Maths

CMRIT, Bengaluru.

Dr. P. Rajendra (Professor, Dept. of Maths) Topic - 4: F-Test (Fisher’s Test) CMRIT, Bengaluru. 1 / 12
Introduction to F-Test (Fisher’s Test)

The F-test, also known as Fisher's test, is a statistical test used in hypothesis testing.
It is used to check whether two independent samples may be regarded as drawn from normal populations with equal variances (for example, from the same normal population).
The test compares the two variances through their ratio, the F statistic, obtained by dividing one sample variance by the other.

Mathematical Formulation: Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ represent the values of two independent random samples drawn from normal populations with equal variance $\sigma^2$.

Let $\bar{x}$ and $\bar{y}$ be the sample means, and let the sample variances be defined as:
$$s_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_i - \bar{x})^2, \qquad s_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (y_i - \bar{y})^2.$$

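The F statistic is defined as:
$$F = \frac{s_1^2}{s_2^2}, \quad \text{where } s_1^2 > s_2^2.$$
Under the null hypothesis of equal population variances, this ratio follows the F-distribution, also known as the variance ratio distribution, with degrees of freedom:
$$\nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1.$$
Note: The larger of the two variances is placed in the numerator to ensure $F \ge 1$, so that only the upper-tail values of the F table are needed.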
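A minimal Python sketch of this definition, assuming the standard-library `statistics` module (its `variance` uses the n - 1 divisor, matching $s^2$ above). The helper name `f_statistic` is hypothetical and the toy data are made up purely for illustration. Critical values still come from F tables; if SciPy happens to be available, `scipy.stats.f.ppf(0.95, dfn, dfd)` gives the upper 5% point.

```python
import statistics


def f_statistic(x, y):
    """Variance-ratio F statistic with the larger sample variance on top.

    Returns (F, df_numerator, df_denominator); the numerator degrees of
    freedom belong to whichever sample has the larger variance.
    """
    s2_x = statistics.variance(x)  # sum((x_i - mean)^2) / (n - 1)
    s2_y = statistics.variance(y)
    if s2_x >= s2_y:
        return s2_x / s2_y, len(x) - 1, len(y) - 1
    return s2_y / s2_x, len(y) - 1, len(x) - 1


# Toy data (illustrative only): variances 2.5 and 10
print(f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # (4.0, 4, 4)
```

Returning the degrees of freedom in (numerator, denominator) order avoids the common slip of looking up the table with the two d.f. swapped.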
Problem 1: The table shows the Population Standard Deviation and
Sample Standard Deviation for both men and women. Find the F statistic,
considering the Men population in the numerator.

         Population Standard Deviation   Sample Standard Deviation
Men      30                              35
Women    50                              45

Solution: Given $\sigma_1 = 30$, $\sigma_2 = 50$, $s_1 = 35$, $s_2 = 45$.
The F statistic is given by:
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$
Substituting the given values:
$$F = \frac{35^2 / 30^2}{45^2 / 50^2} = \frac{1225 / 900}{2025 / 2500} = \frac{1.36}{0.81} \approx 1.68$$

The F statistic is F = 1.68.
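The same substitution as a small Python check. The function name is a hypothetical label; group 1 (men) is placed in the numerator, as the problem requires.

```python
def f_with_population_sd(s1, sigma1, s2, sigma2):
    """F = (s1^2 / sigma1^2) / (s2^2 / sigma2^2), group 1 in the numerator."""
    return (s1 ** 2 / sigma1 ** 2) / (s2 ** 2 / sigma2 ** 2)


F = f_with_population_sd(s1=35, sigma1=30, s2=45, sigma2=50)
print(round(F, 2))  # 1.68
```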


Problem 2: Two random samples drawn from two normal populations are
given below:
Sample      Data
Sample-I    20, 16, 26, 27, 22, 23, 18, 24, 19, 25
Sample-II   27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37

(i) Obtain the estimates of the variance of each population.
(ii) Test at the 5% level of significance whether the two populations have the same variance.
Solution:
Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$
Alternative Hypothesis $H_1: \sigma_1^2 \neq \sigma_2^2$
Given data:
Sample      Data                                                Sample Size
Sample-I    20, 16, 26, 27, 22, 23, 18, 24, 19, 25              n1 = 10
Sample-II   27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37      n2 = 12
The sample mean for Sample-I:
$$\bar{x} = \frac{\sum_{i=1}^{n_1} x_i}{n_1} = \frac{20 + 16 + 26 + 27 + 22 + 23 + 18 + 24 + 19 + 25}{10} = 22$$
The sample mean for Sample-II:
$$\bar{y} = \frac{\sum_{i=1}^{n_2} y_i}{n_2} = \frac{27 + 33 + 42 + 35 + 32 + 34 + 38 + 28 + 41 + 43 + 30 + 37}{12} = 35$$
The sample variance for Sample-I:
$$S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_i - \bar{x})^2 = \frac{120}{9} = 13.33$$
The sample variance for Sample-II:
$$S_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (y_i - \bar{y})^2 = \frac{314}{11} = 28.54$$

Since $S_2^2 > S_1^2$, we place the larger variance in the numerator:
$$F_0 = \frac{S_2^2}{S_1^2} = \frac{28.54}{13.33} = 2.14$$

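Degrees of freedom for Sample-I: $\nu_1 = n_1 - 1 = 10 - 1 = 9$

Degrees of freedom for Sample-II: $\nu_2 = n_2 - 1 = 12 - 1 = 11$

For a 5% level of significance, the critical value with $\nu_2 = 11$ degrees of freedom for the numerator (Sample-II, the larger variance) and $\nu_1 = 9$ for the denominator is $F_E = F_{0.05}(11, 9) = 3.10$.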

Decision Rule: If $F_0 < F_E$, accept the null hypothesis $H_0$.

Since $F_0 = 2.14 < F_E = 3.10$, we accept the null hypothesis at the 5% level of significance.

Conclusion: The two samples may be regarded as drawn from populations having the same variance.
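The full Problem 2 calculation can be checked with the standard-library `statistics` module (a sketch; the variable names are illustrative, and the critical value 3.10 is the tabulated $F_{0.05}(11, 9)$ quoted above, not computed by the code).

```python
from statistics import mean, variance

sample_1 = [20, 16, 26, 27, 22, 23, 18, 24, 19, 25]
sample_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]

x_bar, y_bar = mean(sample_1), mean(sample_2)
s1_sq, s2_sq = variance(sample_1), variance(sample_2)  # n - 1 divisor

# Larger variance in the numerator; its sample supplies the numerator d.f.
if s2_sq > s1_sq:
    F0, df_num, df_den = s2_sq / s1_sq, len(sample_2) - 1, len(sample_1) - 1
else:
    F0, df_num, df_den = s1_sq / s2_sq, len(sample_1) - 1, len(sample_2) - 1

F_E = 3.10  # tabulated F_0.05(11, 9)

print(x_bar, y_bar)                       # 22 35
print(round(s1_sq, 3), round(s2_sq, 3))   # 13.333 28.545
print(round(F0, 2), df_num, df_den)       # 2.14 11 9
print("accept H0" if F0 < F_E else "reject H0")
```

The sample variances printed here are the unbiased estimates asked for in part (i).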

Problem 3: The I.Q.’s of 25 students from one college showed a variance
of 16, and those of an equal number from another college had a variance
of 8. Discuss whether there is any significant difference in variability of
intelligence.
Solution:
Let $s_1^2 = 16$ and $s_2^2 = 8$ be the two sample variances, each with $\nu = 25 - 1 = 24$ degrees of freedom.
The F-statistic is calculated as:
$$F = \frac{s_1^2}{s_2^2} = \frac{16}{8} = 2$$

The tabulated value of F at the 5% level of significance is 1.98. Since the calculated value F = 2 is slightly higher than the tabulated value, the difference in the variability of intelligence is just significant at the 5% level.
The tabulated value of F at the 1% level of significance is 2.62. Since the calculated F = 2 is less than this tabulated value, the difference in the variability of intelligence is not significant at the 1% level.
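Problem 3 compares one F value against two table entries. A hypothetical helper `decide` makes that explicit; the critical values 1.98 and 2.62 are the ones quoted in the notes above, passed in by hand.

```python
def decide(F, critical_values):
    """Return {level: True if F exceeds the tabulated value (significant)}."""
    return {level: F > crit for level, crit in critical_values.items()}


s1_sq, s2_sq = 16, 8   # sample variances of I.Q.
F = s1_sq / s2_sq      # 2.0, with 24 and 24 degrees of freedom
print(decide(F, {"5%": 1.98, "1%": 2.62}))  # {'5%': True, '1%': False}
```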
Problem 4: Two samples of sizes 9 and 8 give the sums of squares of deviations from their respective means equal to 160 inches² and 91 inches², respectively. Can these be regarded as drawn from the same normal population?
Solution:
Let $H_0$: There is no significant difference between the variances of the two populations.
Given: $\sum (x - \bar{x})^2 = 160$ and $\sum (y - \bar{y})^2 = 91$, with $n_1 = 9$ and $n_2 = 8$.
Sample variances:
$$s_1^2 = \frac{160}{9 - 1} = 20, \qquad s_2^2 = \frac{91}{8 - 1} = 13$$
The F-statistic is:
$$F = \frac{s_1^2}{s_2^2} = \frac{20}{13} \approx 1.54$$
Degrees of freedom: $\nu_1 = n_1 - 1 = 8$, $\nu_2 = n_2 - 1 = 7$.
Tabulated $F_{0.05}(8, 7) = 3.73$.
Since the calculated $F = 1.54 < F_{0.05}(8, 7) = 3.73$, we accept $H_0$.
Thus, the two samples can be regarded as drawn from normal populations with the same variance.
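When only the sums of squared deviations are given, each variance is that sum divided by n - 1. A sketch with a hypothetical helper name; 3.73 is the tabulated $F_{0.05}(8, 7)$ from the solution above.

```python
def f_from_sums_of_squares(ss1, n1, ss2, n2):
    """F statistic from sums of squared deviations, larger variance on top."""
    s1_sq = ss1 / (n1 - 1)
    s2_sq = ss2 / (n2 - 1)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1


F, df1, df2 = f_from_sums_of_squares(160, 9, 91, 8)
print(round(F, 2), df1, df2)                      # 1.54 8 7
print("accept H0" if F < 3.73 else "reject H0")   # accept H0
```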
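Problem 5: Two independent samples of sizes 7 and 6 have the following values:

Sample A: 28, 30, 32, 33, 33, 29, 34
Sample B: 29, 30, 30, 24, 27, 29

Examine whether the samples have been drawn from normal populations having the same variance.
Solution:
$H_0$: The samples have been drawn from normal populations having the same variance.
Means of Sample A and Sample B:
$$\bar{x} = \frac{219}{7} \approx 31.285, \qquad \bar{y} = \frac{169}{6} \approx 28.166$$
$$S_1^2 = \frac{1}{n_1 - 1} \sum (x_i - \bar{x})^2 = \frac{1}{6}\left[(28 - 31.285)^2 + \cdots + (34 - 31.285)^2\right]$$
$$= \frac{1}{6}\left[10.791 + 1.651 + 0.511 + 2.941 + 2.941 + 5.221 + 7.371\right] = 5.238$$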
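$$S_2^2 = \frac{1}{n_2 - 1} \sum (y_i - \bar{y})^2 = \frac{1}{5}\left[(29 - 28.166)^2 + \cdots + (29 - 28.166)^2\right]$$
$$= \frac{1}{5}\left[0.695 + 3.364 + 3.364 + 17.355 + 1.359 + 0.695\right] = 5.366$$
Since $S_2^2 > S_1^2$, the larger variance is placed in the numerator:
$$F = \frac{S_2^2}{S_1^2} = \frac{5.366}{5.238} \approx 1.025$$
Degrees of freedom: $\nu_2 = n_2 - 1 = 5$ for the numerator (Sample B) and $\nu_1 = n_1 - 1 = 6$ for the denominator (Sample A).
Tabulated $F_{0.05}(5, 6) = 4.39$.
Since the calculated F is less than the tabulated F, we accept $H_0$.
Therefore, the samples may be regarded as drawn from normal populations with the same variance.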
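A standard-library check of Problem 5 (a sketch; variable names are illustrative). It computes the variances from the raw data, so it avoids the rounding of the means used in the hand calculation, and it reports the degrees of freedom in (numerator, denominator) order for the table lookup.

```python
from statistics import variance

sample_a = [28, 30, 32, 33, 33, 29, 34]
sample_b = [29, 30, 30, 24, 27, 29]

s_a, s_b = variance(sample_a), variance(sample_b)  # n - 1 divisor

# Larger variance in the numerator; its sample gives the numerator d.f.
if s_a >= s_b:
    F, df_num, df_den = s_a / s_b, len(sample_a) - 1, len(sample_b) - 1
else:
    F, df_num, df_den = s_b / s_a, len(sample_b) - 1, len(sample_a) - 1

print(round(s_a, 3), round(s_b, 3))  # 5.238 5.367
print(round(F, 3), df_num, df_den)   # 1.025 5 6
```

The printed degrees of freedom (5, 6) are the ones to use with the F table; with SciPy installed, `scipy.stats.f.ppf(0.95, 5, 6)` would return the same upper 5% point.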

Assignment Questions

1. The I.Q.'s of 25 students from one college showed a variance of 16, and those of an equal number of students from another college had a variance of 8. Discuss whether there is any significant difference in the variability of intelligence. Use the 1% and 5% levels of significance.
2. Two independent samples of sizes 7 and 6 have the following values:
   Sample A: 28, 30, 32, 33, 33, 29, 34
   Sample B: 29, 30, 30, 24, 27, 29
   Examine whether the samples have been drawn from normal populations having the same variance.
3. In two groups of ten children each, the increases in weight due to different diets during the same period were as follows (in pounds):
   Group 1: 3, 7, 5, 6, 5, 4, 4, 5, 3, 6
   Group 2: 8, 5, 7, 8, 3, 2, 7, 6, 5, 7
   Is there a significant difference in their variability?
