
HYPOTHESIS TESTING

Prof. Alka Tripathi


Department of Mathematics
Jaypee Institute of Information Technology
A-10, Sector-62, Noida, U.P., 201309 INDIA
ELEMENTARY SAMPLING THEORY

• Studies the relationships existing between a population and samples drawn from that population.

• Estimating unknown population quantities such as the population mean and variance (population parameters, or simply parameters) from a knowledge of the corresponding sample quantities (such as the sample mean and variance).
Statistics and Parameters
A statistic is a numerical value computed from a sample. Its value may differ for different samples. (Professor R.A. Fisher)
e.g. sample mean x̄, sample standard deviation s, and sample proportion p̂.
A parameter is a numerical value associated with a population. It is considered fixed and unchanging.
e.g. population mean μ, population standard deviation σ, and population proportion p.
• Whether the observed differences between two samples are due to chance variation or whether they are really significant

• Tests of significance and hypotheses
• Theory of decisions
• Statistical inference
• Design of experiments

Methods of Sampling
1. Simple Random Sampling
   a) Simple Random Sampling without replacement
   b) Simple Random Sampling with replacement
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Quota Sampling
6. Purposive Sampling (or Judgment Sampling)
Purposive Sampling –
Sample units are selected with a definite purpose in view.

Simple Random Sampling –
a) Simple Random Sampling without replacement
b) Simple Random Sampling with replacement

Stratified Sampling –
The entire heterogeneous population is divided into a number of homogeneous groups, termed strata, which differ from one another but each of which is homogeneous within itself. Units are then sampled at random from each of these strata.
Sampling distributions
Consider all possible samples of size N that can be drawn from a given population (either with or without replacement). For each sample we can compute a statistic (e.g. the mean or the s.d.) that will vary from sample to sample.
In this manner we obtain the distribution of the statistic, called its sampling distribution.

Sampling distribution of means – the particular statistic used is the sample mean.
Example: A population consists of the five numbers 2, 3, 6, 8 and 11. Consider all possible samples of size 2 that can be drawn with replacement from this population. The corresponding sample means form the sampling distribution of the mean (worked out as Q1 below).
Unbiased Estimate – A statistic t = t(x1, x2, ..., xn), a function of the sample values x1, x2, ..., xn, is an unbiased estimate of a population parameter μ if

E(t) = μ, i.e. E(statistic) = parameter.

The statistic is then said to be an unbiased estimate of the parameter.
Standard Error –
The standard deviation of the sampling distribution of a statistic is known as its Standard Error, abbreviated S.E.
It plays an important role in large-sample theory and forms the basis of the testing of hypotheses.
Basic Statistical Laws:
1. Law of Statistical Regularity: A reasonably large number of items selected at random from a large group of items will, on the average, represent the characteristics of the group.

2. Law of Inertia of Large Numbers: Large groups of data show a high degree of stability, because there is a greater possibility that extreme items on one side are compensated by extreme items on the other side.

3. Central Limit Theorem: If x1, x2, x3, ..., xn is a random sample of size n drawn from any population (having mean µ and variance σ²), then the distribution of the sample mean x̄ is approximately normal with mean µ and variance σ²/n, provided n is sufficiently large (i.e. n → ∞), where µ and σ² are the population mean and variance respectively. (A simulation sketch is given below.)
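The slides state the Central Limit Theorem without a demonstration. The following is a minimal simulation sketch (not part of the original notes) using NumPy: the population chosen here is an exponential distribution, which is an assumption made purely for illustration, yet the means of repeated samples still behave approximately like N(µ, σ²/n).

```python
# Illustrative sketch (not from the original notes): empirical check of the
# Central Limit Theorem. The population is Exponential(scale=2), which is
# skewed, yet the sample means are approximately N(mu, sigma^2/n).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0          # mean and s.d. of the Exponential(scale=2) population
n, repetitions = 50, 10_000   # sample size and number of samples drawn

sample_means = rng.exponential(scale=2.0, size=(repetitions, n)).mean(axis=1)

print("mean of sample means :", sample_means.mean())        # close to mu = 2.0
print("s.d. of sample means :", sample_means.std(ddof=1))   # close to sigma/sqrt(n) = 0.283
```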
Suppose that all possible samples of size N are drawn without replacement from a finite population of size Np > N. Let μ_X̄ and σ_X̄ denote the mean and standard deviation of the sampling distribution of the mean, and μ, σ the population mean and standard deviation. Then

μ_X̄ = μ  and  σ_X̄ = (σ/√N)·√((Np − N)/(Np − 1)).

If the population is infinite, or if sampling is with replacement, then

μ_X̄ = μ  and  σ_X̄ = σ/√N.
For large values of N (N ≥ 30), the sampling distribution of the mean is approximately a normal distribution with mean μ_X̄ and standard deviation σ_X̄, irrespective of the population.
If the population is normally distributed, the sampling distribution of means is also normally distributed even for small values of N (i.e. N < 30).
Q1.A population consists of the five numbers 2,3,6,8 and
11. Consider all possible samples of size 2 that can
be drawn with replacement from this population.
Find (a) the mean of the population,(b) the standard
deviation of the population, (c) the mean of the
sampling distribution of means, and (d) the standard
deviation of the sampling distribution of means (i.e.,
the standard error of means).
There are 5(5) = 25 samples of size 2 that can be drawn with replacement.

The corresponding sample means are

2.0   2.5   4.0   5.0   6.5
2.5   3.0   4.5   5.5   7.0
4.0   4.5   6.0   7.0   8.5
5.0   5.5   7.0   8.0   9.5
6.5   7.0   8.5   9.5  11.0

μ_X̄ = (sum of all sample means)/25 = 150/25 = 6.0, so μ_X̄ = μ.

Ans. (a) 6  (b) 3.29  (c) 6  (d) 2.32  (a numerical check is sketched below)
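As a check on Q1, the sketch below (not part of the original slides) enumerates all 25 with-replacement samples and recomputes the four answers directly.

```python
# Sketch (not in the original slides): verify Q1 by enumerating all 25
# samples of size 2 drawn with replacement from {2, 3, 6, 8, 11}.
import itertools
import numpy as np

population = np.array([2, 3, 6, 8, 11])
means = [np.mean(s) for s in itertools.product(population, repeat=2)]

print("population mean      :", population.mean())   # 6.0
print("population s.d.      :", population.std())    # ~3.29
print("mean of sample means :", np.mean(means))       # 6.0
print("s.e. of the mean     :", np.std(means))        # ~2.32 = 3.29/sqrt(2)
```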
Q2. Solve the above problem for the case that the sampling is without replacement.
(a) 6  (b) 3.29  (c) 6  (d) 2.01

Here μ_X̄ = μ  and  σ_X̄ = (σ/√N)·√((Np − N)/(Np − 1)).
Q3. Assuming that the heights of 3000 male students at a university are normally distributed with mean 68.0 inches (in) and standard deviation 3.0 in, if 80 samples consisting of 25 students each are obtained, what would be the expected mean and standard deviation of the resulting sampling distribution of means if the sampling were done (a) with replacement and (b) without replacement?

Ans. (a) 68 in and 0.6 in
     (b) 68 in and 0.597 in
Sampling of Attributes
Consider sampling from a population which is divided into two mutually exclusive and collectively exhaustive classes, one class possessing a particular attribute, say A, and the other class not possessing that attribute, and then note down the number of units in a sample of size n possessing that attribute. The presence of the attribute in a sampled unit may be termed a success and its absence a failure.
In this case a sample of n observations is identified with a series of n independent Bernoulli trials with constant probability P of success for each trial.
If X is the number of successes in n independent trials with constant probability P of success for each trial, then X follows the binomial distribution B(n, P).
Sampling distribution of proportions

Suppose the population is infinite and let p and q = 1 − p be the probabilities of success and failure respectively. Consider all possible samples of size N drawn from this population, and for each sample consider the proportion P of successes.

For the underlying 0/1 population, μ = p and σ = √(pq). For the sampling distribution of the proportion,

μ_P = p  and  σ_P = √(pq/N) = √(p(1 − p)/N),

the mean and standard deviation of the sample proportion.

For large values of N (N ≥ 30), the sampling distribution is very closely normally distributed. (The underlying count of successes is binomially distributed.)
The equation is valid for a finite population in which sampling is with replacement. For a finite population in which sampling is without replacement,

μ_P = p  and  σ_P = √(pq/N)·√((Np − N)/(Np − 1)).
Q1. Find the probability that in 120 tosses of a fair coin (a) between 40% and 60% will be heads and (b) 5/8 or more will be heads.

μ_P = p = 1/2,  σ_P = √(pq/N) = √((1/2)(1/2)/120) = 0.0456

40% in standard units = (0.40 − 0.50)/0.0456 = −2.19
60% in standard units = +2.19

Required probability (area between z = −2.19 and z = 2.19) = 0.9714

The proportion is actually a discrete variable, so we apply a continuity correction: subtract 1/(2N) from 0.40 and add the same to 0.60.

1/(2N) = 1/240 = 0.00417

(0.40 − 0.00417 − 0.50)/0.0456 = −2.28, and similarly +2.28 for the upper limit.

Required probability (area between z = −2.28 and z = 2.28) = 0.9774.

Note that (0.40 − 0.00417) and (0.60 + 0.00417) correspond to the proportions 47.5/120 and 72.5/120. (A computational sketch, covering part (b) as well, follows.)
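The following is a sketch (not part of the slides) of the same calculation with SciPy's normal distribution, including part (b), which the slide leaves to the reader; only the answer to part (a) is stated in the slide.

```python
# Sketch (not part of the slides): coin-toss proportion problem using the
# normal approximation with the 1/(2N) continuity correction.
from math import sqrt
from scipy.stats import norm

p, N = 0.5, 120
sigma_P = sqrt(p * (1 - p) / N)      # ~0.0456
cc = 1 / (2 * N)                     # continuity correction = 0.00417

# (a) probability that the proportion of heads lies between 40% and 60%
z_lo = (0.40 - cc - p) / sigma_P     # ~ -2.28
z_hi = (0.60 + cc - p) / sigma_P     # ~ +2.28
print("P(0.40 <= proportion <= 0.60) ~", norm.cdf(z_hi) - norm.cdf(z_lo))  # ~0.977

# (b) probability that 5/8 or more of the tosses are heads
z_b = (5 / 8 - cc - p) / sigma_P
print("P(proportion >= 5/8) ~", 1 - norm.cdf(z_b))
```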
Q2. It has been found that 2% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 400 such tools (a) 3% or more and (b) 2% or less will prove defective?

Ans. (without the continuity correction) (a) 0.0764  (b) 0.5
Comparing Means
• We have two normal populations (1 and 2)
• Let m1 and s1 denote the mean and standard
deviation of population 1.
• Let m2 and s2 denote the mean and standard
deviation of population 2.
• Let x1, x2, x3 , … , xn denote a sample from a normal
population 1.
• Let y1, y2, y3 , … , ym denote a sample from a normal
population 2.
• Objective is to compare the two population means
We know that:
x̄ is normal with mean μ_x̄ = μ1 and σ_x̄ = σ1/√n,
and
ȳ is normal with mean μ_ȳ = μ2 and σ_ȳ = σ2/√m.

Thus D = x̄ − ȳ is normal with mean

μ_{x̄−ȳ} = μ_x̄ − μ_ȳ = μ1 − μ2

and standard deviation

σ_{x̄−ȳ} = √(σ_x̄² + σ_ȳ²) = √(σ1²/n + σ2²/m).
Q1. Let X be a variable that stands for any of the elements of the population 3, 7, 8 and Y be a variable that stands for any of the elements of the population 2, 4. Compute

(a) μ_X  (b) μ_Y  (c) μ_{X−Y}  (d) σ_X  (e) σ_Y  (f) σ_{X−Y}.

In general, for two independent samples drawn from populations with parameters (μ1, σ1) and (μ2, σ2),

μ_{S1−S2} = μ_{S1} − μ_{S2} = μ1 − μ2

σ_{S1−S2} = √(σ_{S1}² + σ_{S2}²) = √(σ1²/N1 + σ2²/N2).
Comparing Proportions
• Suppose we have two success-failure experiments.
• Let p1 = the probability of success for experiment 1.
• Let p2 = the probability of success for experiment 2.
• Suppose that experiment 1 is repeated n1 times and experiment 2 is repeated n2 times.
• Let x1 = the number of successes in the n1 repetitions of experiment 1, and x2 = the number of successes in the n2 repetitions of experiment 2.

p̂1 = x1/n1  and  p̂2 = x2/n2

What is the distribution of D = p̂1 − p̂2 = x1/n1 − x2/n2 ?
We know that:
p̂1 = x1/n1 is normal with mean μ_{p̂1} = p1 and σ_{p̂1} = √(p1(1 − p1)/n1).
Also p̂2 = x2/n2 is normal with mean μ_{p̂2} = p2 and σ_{p̂2} = √(p2(1 − p2)/n2).

Thus D = p̂1 − p̂2 is normal with mean

μ_{p̂1−p̂2} = μ_{p̂1} − μ_{p̂2} = p1 − p2

and standard deviation

σ_{p̂1−p̂2} = √(σ_{p̂1}² + σ_{p̂2}²) = √(p1(1 − p1)/n1 + p2(1 − p2)/n2),

or, in the q-notation,

σ_{P1−P2} = √(σ_{P1}² + σ_{P2}²) = √(p1q1/N1 + p2q2/N2).
For the sum of two independent statistics,

μ_{S1+S2} = μ_{S1} + μ_{S2}  and  σ_{S1+S2} = √(σ_{S1}² + σ_{S2}²) = √(σ1²/N1 + σ2²/N2).
Two distances are measured as 27.3cm and 15.6cm
with standard deviations of 0.16cm and 0.08cm resp.
Determine the mean and standard deviation of
(a) the sum and (b) the difference of the distances.
(42.9cm,0.18cm,11.7cm,0.18cm)
STATISTICAL DECISION
THEORY
When we attempt to make decisions about populations on the basis of sample information, we have to make assumptions or guesses about the nature of the population involved or about the value of some parameter of the population. Such assumptions, which may or may not be true, are called statistical hypotheses.

Very often we set up a hypothesis which assumes that there is no significant difference between the sample statistic and the corresponding population parameter. Such a hypothesis is called the Null Hypothesis.
A hypothesis that is different from (or complementary to) the null hypothesis is called the Alternative Hypothesis.
They are denoted by H0 and H1 respectively.
A procedure for deciding to accept or reject Null
hypothesis is called Test of Hypothesis.
If we suppose that a particular hypothesis is true but
find the results in a random sample differ markedly
from the results expected under the hypothesis then
we would say that the observed differences are
significant and would thus be inclined to reject the
hypothesis.
If the difference is due to sampling fluctuation, it is called an insignificant difference. If the difference arises because either the sampling procedure is not purely random or the samples are not drawn from the given population, it is called a significant difference.
The procedure is called a Test of Significance.
If we are prepared to reject a null hypothesis when it is true, or to accept that the difference between a sample statistic and the corresponding parameter is significant, whenever the sample statistic lies in a certain region or interval, then that region is called the critical region or region of rejection.

The complementary region is the region of acceptance.


In the case of large samples, the sampling distributions of many statistics tend to become normal distributions.

If t is a statistic computed from a large sample, then t follows a normal distribution with mean E(t), the corresponding population parameter, and standard deviation equal to S.E.(t).

Hence z = (t − E(t))/S.E.(t) is a standard normal variate,

i.e. z (called the test statistic) follows a normal distribution with mean 0 and S.D. 1.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993

3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995

3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997

3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
Area under standard normal curve between t = -1.96 and
t=+1.96 is 0.95.

Area under normal curve ‘t’ between [E(t)-1.96 S.E(t)]


and [E(t)+1.96 S.E(t)] is 95% or only 5% of t will
lie outside this interval.

If we are prepared to accept that the difference between t and E(t) is significant whenever t lies outside this interval, then the region outside the interval constitutes the critical region of t.
The value of the test statistic z that separates the critical region from the acceptance region is called the critical value or significant value of z, denoted z_α, where α is the level of significance.
Reject the hypothesis at the 0.05 significance level
if the z score of the statistic S lies outside the range -
1.96 to 1.96 (i.e., either z >1.96 or z <-1.96).
-equivalent –sample statistic is significant at the
0.05 level.
Accept the hypothesis otherwise (or, if desired make no
decision at all)
The z score is the test statistic.

Another commonly used level is the 0.01 level, with acceptance range −2.58 to 2.58.
The probability α that a random value of the statistic lies in the critical region is called the level of significance, usually expressed as a percentage.

P{E(t) − 1.96 S.E.(t) < t < E(t) + 1.96 S.E.(t)} = 0.95

P{|z| > 1.96} = 0.05

The level of significance is the maximum probability with which we are prepared to reject the null hypothesis when it is true, i.e. the total area of the region of rejection.
Type I error – rejecting a hypothesis when it should be accepted.

Type II error – accepting a hypothesis when it should be rejected.

In either case a wrong decision or error in judgment has occurred.

Errors in Testing of Hypothesis
Type I (probability α) – producer's risk
Type II (probability β) – consumer's risk
Level of significance – the maximum probability with which we would be willing to risk a Type I error.

Significance level .05 or .01


(Figure: probabilities and numbers of standard deviations for the normal curve. Shaded areas 0.683, 0.954 and 0.997: a value has a 68.3% chance of falling within ±1σ of the mean, a 95.4% chance within ±2σ, and a 99.7% chance within ±3σ.)
Two-tailed and one-tailed tests
Two-tailed test: the critical region consists of extreme values on both sides of the mean.
One-tailed test: the critical region consists of extreme values on one side of the mean only, i.e. one side of the distribution, with area equal to the level of significance.
Level of significance         0.10         0.05         0.01         0.005        0.002

Critical values of z         -1.28 or     -1.645 or    -2.33 or     -2.58 or     -2.88 or
for one-tailed tests          1.28         1.645        2.33         2.58         2.88

Critical values of z         -1.645 and   -1.96 and    -2.58 and    -2.81 and    -3.08 and
for two-tailed tests          1.645        1.96         2.58         2.81         3.08
Procedure for Testing of Hypothesis
1. The null hypothesis H0 is defined.
2. The alternative hypothesis H1 is also defined, after a careful study of the problem, and the nature of the test (whether one-tailed or two-tailed) is decided.
3. The level of significance α is fixed, or taken from the problem if specified, and z_α is noted.
4. The test statistic z = (t − E(t))/S.D.(t) is computed.
5. z is compared with z_α. If |z| < |z_α|, H0 is accepted (H1 is rejected), i.e. it is concluded that the difference between t and E(t) is not significant at the α% LOS. On the other hand, if |z| > |z_α|, H0 is rejected (H1 is accepted), i.e. the difference between t and E(t) is significant at the α% LOS.
Interval Estimation of Population Parameters
Point estimation and interval estimation; confidence intervals.

P{|z| ≤ 1.96} = 0.95, i.e. P{|t − E(t)|/S.D.(t) ≤ 1.96} = 0.95

P{t − 1.96 S.D.(t) ≤ E(t) ≤ t + 1.96 S.D.(t)} = 0.95

Thus {t − 1.96 S.D.(t), t + 1.96 S.D.(t)} are 95% confidence limits for E(t). (A computational sketch follows.)
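The sketch below (not part of the slides) computes these 95% confidence limits for a population mean; the sample data are generated only for illustration and are an assumption, not figures from the notes.

```python
# Sketch (illustrative data, not from the slides): 95% confidence limits
# for a population mean, t ± 1.96 S.D.(t), with t = sample mean and
# S.D.(t) = s / sqrt(N) for a large sample.
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=160, scale=6, size=100)   # hypothetical large sample

t = sample.mean()                                  # the statistic
sd_t = sample.std(ddof=1) / np.sqrt(len(sample))   # its standard error

lower, upper = t - 1.96 * sd_t, t + 1.96 * sd_t
print(f"95% confidence limits for the population mean: ({lower:.2f}, {upper:.2f})")
```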
Test of Significance for Large Samples

1. The sampling distribution of a statistic is approximately normal, irrespective of whether the distribution of the population is normal or not.
2. Sample statistics are sufficiently close to the corresponding population parameters and hence may be used to calculate the standard error of the sampling distribution.
Test of significance of the difference between a sample mean and the population mean.

Here S = X̄, the sample mean; μ_S = μ_X̄ = μ, the population mean; and σ_S = σ_X̄ = σ/√N, where σ is the population standard deviation and N is the sample size.

The z score is given by z = (X̄ − μ)/(σ/√N).
The heights of college students in Chennai are normally distributed with standard deviation 6 cm, and a sample of 100 students had mean height 158 cm. Test the hypothesis that the mean height of college students in Chennai is 160 cm.

H0: μ = 160 (x̄ does not differ significantly from μ)
H1: μ ≠ 160

z = (x̄ − μ)/(σ/√N) = (158 − 160)/(6/√100) = −3.33

Conclusion: |z| = 3.33 > 2.58, so we reject the hypothesis at the 1% level of significance.
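A minimal sketch (not in the slides) of the same two-tailed z test with SciPy, included only to show how the critical value and p-value can be obtained programmatically:

```python
# Sketch (not part of the slides): the Chennai height example as a two-tailed
# large-sample z test, H0: mu = 160 vs H1: mu != 160.
from math import sqrt
from scipy.stats import norm

x_bar, mu0, sigma, N = 158, 160, 6, 100
z = (x_bar - mu0) / (sigma / sqrt(N))        # ~ -3.33
p_value = 2 * norm.cdf(-abs(z))              # two-tailed p-value

print("z =", round(z, 2), " p-value =", round(p_value, 4))
print("reject H0 at 1%?", abs(z) > norm.ppf(0.995))   # norm.ppf(0.995) ~ 2.58
```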
Test of significance of the difference between the means of two samples.

Let X̄1 and X̄2 be the means of two large samples of sizes n1 and n2 drawn from two populations (normal or non-normal) with the same mean μ and variances σ1² and σ2² respectively.

Then X̄1 follows N(μ, σ1/√n1) and X̄2 follows N(μ, σ2/√n2), either exactly or approximately, so X̄1 − X̄2 also follows a normal distribution, with

E(X̄1 − X̄2) = E(X̄1) − E(X̄2) = μ − μ = 0

z = [(X̄1 − X̄2) − E(X̄1 − X̄2)]/σ_{X̄1−X̄2} = (X̄1 − X̄2)/√(σ1²/n1 + σ2²/n2)

1. If the samples are drawn from the same population, i.e. if σ1 = σ2 = σ, then

z = (X̄1 − X̄2)/[σ√(1/n1 + 1/n2)].

2. If σ1 and σ2 are not known and σ1 ≠ σ2, they can be approximated by the sample SDs s1 and s2. In such a situation

z = (X̄1 − X̄2)/√(s1²/n1 + s2²/n2).
Q. A simple sample of heights of 6400 English men
has a mean of 170 cm and an SD of 6.4 cm, while a
simple sample of heights of 1600 Americans has a
mean of 172cm and an SD of 6.3cm. Do the data
indicate that Americans are on the average , taller than
the Englishmen ?
Here n1 = 6400, x̄1 = 170, s1 = 6.4;  n2 = 1600, x̄2 = 172, s2 = 6.3.

H0: μ1 = μ2, i.e. the samples have been drawn from two populations with the same mean.
H1: μ1 < μ2

A left-tailed test is to be used. Let the LOS be 1%. Therefore z_α = −2.33.

z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)
  = (170 − 172)/√(6.4²/6400 + 6.3²/1600) = −11.32 < z_α

Therefore the difference between x̄1 and x̄2 (or μ1 and μ2) is significant at the 1% level, i.e. H0 is rejected and H1 is accepted.
That is, the Americans are, on the average, taller than the Englishmen.
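A short sketch (not in the slides) of the same left-tailed two-sample z test in Python, with the critical value taken from SciPy instead of the normal table:

```python
# Sketch (not in the slides): Englishmen vs Americans as a left-tailed
# two-sample z test, using the sample SDs in place of sigma.
from math import sqrt
from scipy.stats import norm

n1, x1, s1 = 6400, 170, 6.4
n2, x2, s2 = 1600, 172, 6.3

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # ~ -11.32
z_alpha = norm.ppf(0.01)                        # ~ -2.33 (1% left-tailed critical value)

print("z =", round(z, 2), " critical value =", round(z_alpha, 2))
print("reject H0 (Americans taller on average)?", z < z_alpha)
```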
The means of two single large samples of 1000 and
2000 members are 67·5 inches and 68·0 inches
respectively. Can the samples be regarded as drawn
from the same population of standard deviation 2·5
inches? (Test at 5% level of significance).
Test for Single Proportion
A dice is thrown 9,000 times and a throw of 3 or 4 is observed 3,240 times. Show that the dice cannot be regarded as an unbiased one, and find the limits between which the probability of a throw of 3 or 4 lies.
n = 9,000; X = number of successes = 3,240.
We test the hypothesis p = 1/3 and then find probable limits for the population proportion of success. (A sketch of the computation follows.)
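The slide gives only the setup. The sketch below completes it under the usual normal approximation; the use of the 3-sigma "probable limits" p̂ ± 3√(p̂q̂/n) is an assumption about the intended method, not something stated in the slide.

```python
# Sketch (the slide states only n and X; this completes the computation under
# the usual normal approximation): test H0: p = 1/3 for a throw of 3 or 4,
# then give probable limits p_hat ± 3*sqrt(p_hat*q_hat/n).
from math import sqrt

n, X = 9000, 3240
p_hat = X / n                 # 0.36
p0 = 1 / 3                    # probability of a 3 or 4 for an unbiased die

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
print("z =", round(z, 2))     # ~5.37, far beyond 2.58, so the die appears biased

se = sqrt(p_hat * (1 - p_hat) / n)
print("probable limits:", round(p_hat - 3 * se, 3), "to", round(p_hat + 3 * se, 3))
```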
Test of significance of the difference between a sample proportion and the population proportion.

Here S = P, the proportion of "successes" in a sample (the sample proportion); μ_S = μ_P = p, where p is the population proportion of successes (the probability of success); N is the sample size; and σ_S = σ_P = √(pq/N).

z = (S − μ_S)/σ_S = (P − p)/√(pq/N)
Test of significance of the difference between two sample proportions.

Let P1 and P2 be the proportions of successes in two large samples of sizes n1 and n2 respectively, drawn from the same population or from two populations with the same proportion p (and q = 1 − p).

μ_{P1−P2} = μ_{P1} − μ_{P2} = p − p = 0

σ_{P1−P2} = √(σ_{P1}² + σ_{P2}²) = √(pq/n1 + pq/n2)

z = (P1 − P2)/√(pq(1/n1 + 1/n2))

If p is not known, z = (P1 − P2)/√(P1Q1/n1 + P2Q2/n2).
Q. Experience has shown that 20% of a manufactured
product is of top quality. In one day’s production of 400
articles, only 50 are of top quality. Show that either the
production of the day chosen was not a representative
sample, or the hypothesis of 20% was wrong. Based on
the particular day’s production, find also the 95%
confidence limits for the percentage of top quality
product.
P = proportion of top-quality products in the sample = 50/400 = 1/8
p = hypothesised population proportion = 20% = 1/5, q = 4/5

H0: p = 1/5 (the day's production is a representative sample)
H1: p ≠ 1/5

From the alternative hypothesis H1 we note that a two-tailed test is to be used.
Let LOS = 5%. Therefore z_α = 1.96.

z = (P − p)/√(pq/N) = (1/8 − 1/5)/√((1/5)(4/5)/400) = −3.75

|z| = 3.75 > 1.96

The difference between p and P is significant at the 5% level, and H0 is rejected. Hence either the hypothesis of 20% is wrong or the production of the particular day chosen is not a representative sample. (The 95% confidence limits asked for in the question are computed in the sketch below.)
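The slide stops at the test; the sketch below (not part of the slides) also computes the 95% confidence limits for the percentage of top-quality product, based on the observed sample proportion.

```python
# Sketch (not in the slides): the top-quality example, plus the 95% confidence
# limits for the percentage of top-quality product asked for in the question.
from math import sqrt

N = 400
P = 50 / 400                  # observed sample proportion (12.5%)
p, q = 0.20, 0.80             # hypothesised population proportion and complement

z = (P - p) / sqrt(p * q / N)
print("z =", round(z, 2))     # -3.75; |z| > 1.96, so reject H0 at the 5% level

se = sqrt(P * (1 - P) / N)    # standard error based on the sample proportion
lo, hi = P - 1.96 * se, P + 1.96 * se
print(f"95% confidence limits: {100*lo:.1f}% to {100*hi:.1f}%")   # ~9.3% to 15.7%
```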
Test of Significance for Small Samples

Student's t-Distribution
A random variable T is said to follow Student's t-distribution, or simply the t-distribution, if its probability density function is given by

f(t) = [1/(√ν · B(1/2, ν/2))] · (1 + t²/ν)^(−(ν+1)/2),   −∞ < t < ∞,

where ν denotes the number of degrees of freedom of the t-distribution.
Properties of the t-distribution

(Figure: t-distribution curves for d.f. = 1, 3 and ∞.)

1. The probability curve of the t-distribution is similar to the standard normal curve: it is symmetric about t = 0, bell-shaped and asymptotic to the t-axis.
2. For sufficiently large values of the degrees of freedom, the t-distribution tends to the standard normal distribution.
3. The mean of the t-distribution is zero.
4. The variance of the t-distribution is ν/(ν − 2) if ν > 2; it is greater than 1, but it tends to 1 as ν → ∞.
Uses of the t-Distribution
Critical Values of t and the t-Table
The critical value of t at level of significance (LOS) α and degrees of freedom ν is given by P{|t| > t_α(ν)} = α for a two-tailed test, as in the case of the normal distribution.
Test of significance of the difference between a sample mean and the population mean.

Here x̄ is the mean of a sample of size n drawn from a normal population N(μ, σ). The large-sample z score is z = (x̄ − μ)/(σ/√N). When σ is unknown it is estimated from the sample SD s by

σ̂ = s√(n/(n − 1)),

and the test statistic becomes

t = (x̄ − μ)/(σ̂/√n) = (x̄ − μ)/[s√(n/(n − 1))/√n] = (x̄ − μ)/(s/√(n − 1)),

with ν = n − 1 degrees of freedom.
Tests made on the breaking strength of 10 pieces of a metal wire gave the following results: 578, 572, 570, 568, 572, 570, 570, 572, 596 and 584 kg. Test if the mean breaking strength of the wire can be assumed to be 577 kg.

μ = 577 kg, x̄ = 575.2, s = 8.26

t = (575.2 − 577)/(8.26/√9) = −0.65

H0: μ = 577 (x̄ does not differ significantly from μ); H1: μ ≠ 577.
A two-tailed test is to be used. Let the LOS be 5%: t_0.05(9) = 2.262.
Since |t| = 0.65 < 2.262, H0 is accepted: the mean breaking strength can be assumed to be 577 kg.
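A sketch (not in the slides) of the same one-sample t test with SciPy. Note that SciPy uses the unbiased (n − 1) sample SD, which gives exactly the same t as the slide's (x̄ − μ)/(s/√(n − 1)) with the n-denominator s.

```python
# Sketch (not in the slides): the breaking-strength example with scipy.
from scipy import stats

strengths = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]
t, p_value = stats.ttest_1samp(strengths, popmean=577)

print("t =", round(t, 2), " p-value =", round(p_value, 3))              # t ~ -0.65
print("critical t (5%, 9 d.f.) =", round(stats.t.ppf(0.975, df=9), 3))  # ~2.262
```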
Q. A machinist is expected to make engine parts with
axle diameter of 1.75cm. A random sample of 10 parts
shows a mean diameter of 1.85cm, with an SD of 0.1cm
On the basis of this sample , would you say that the
work of the machinist is inferior?
Test of significance of the difference between the means of two samples.

Let X̄1 and X̄2 be the means of two samples of sizes n1 and n2 drawn from two populations (normal or non-normal) with the same mean μ and variances σ1² and σ2² respectively. Then

E(X̄1 − X̄2) = E(X̄1) − E(X̄2) = μ − μ = 0

and, for large samples, z = [(X̄1 − X̄2) − E(X̄1 − X̄2)]/σ_{X̄1−X̄2} = (X̄1 − X̄2)/√(σ1²/n1 + σ2²/n2).

For small samples, with s1 and s2 the sample SDs,

t = (X̄1 − X̄2)/√(s1²/(n1 − 1) + s2²/(n2 − 1)),   d.f. = n1 + n2 − 2.
Test of significance of the difference between means of two small samples drawn from the same normal population.

If the samples are drawn from the same population, i.e. if σ1 = σ2 = σ, then

t = (X̄1 − X̄2)/[σ√(1/n1 + 1/n2)].

If σ1 and σ2 are equal but not known, then σ1 = σ2 = σ is approximated by

σ̂² = (n1 s1² + n2 s2²)/(n1 + n2),

so that, for large samples, z = (x̄1 − x̄2)/√[(1/n1 + 1/n2)·(n1 s1² + n2 s2²)/(n1 + n2)].

If the sample sizes are small, σ is estimated instead by

σ̂ = √[(n1 s1² + n2 s2²)/(n1 + n2 − 2)],

and

t = (x̄1 − x̄2)/√[(1/n1 + 1/n2)·(n1 s1² + n2 s2²)/(n1 + n2 − 2)],   d.f. = n1 + n2 − 2.
A developmental psychologist would like to examine
the difference in verbal skills for 8-year-old boys versus
8-year-old girls. A sample of 10 boys and 10 girls is
obtained, and each child is given a standardized verbal
abilities test. The data for this experiment are as
follows:
Girls Boys

n1 = 10 n2 = 10
X1 = 37 X 2 = 31
S²1 = 15 S²2 = 21
STEP 1: Compute the mean difference

X̄1 − X̄2 = 37 − 31 = 6

STEP 2: Compute the pooled variance

s² = (n1 S1² + n2 S2²)/[(n1 − 1) + (n2 − 1)] = (150 + 210)/[(10 − 1) + (10 − 1)] = 360/18 = 20

STEP 3: Compute the standard error of the difference

s_{X̄1−X̄2} = √[s²(1/n1 + 1/n2)] = √(20 × 0.2) = 2

STEP 4: Compute the t statistic and d.f.

t = [(X̄1 − X̄2) − (μ1 − μ2)]/s_{X̄1−X̄2} = (37 − 31 − 0)/2 = 3

(equivalently, t = (x̄1 − x̄2)/√[(1/n1 + 1/n2)·(n1 s1² + n2 s2²)/(n1 + n2 − 2)])

d.f. = (n1 - 1) + (n2 - 1) = (10-1) + (10-1) = 18


STEP 5: Use table
T = 3 with 18 degrees of freedom

For alpha = .01, critical value of t is 2.878


Our T is more extreme, so we reject the null hypothesis
There is a significant difference between boys and girls
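A sketch (not in the slides) of the same pooled two-sample t test from the summary statistics with SciPy. Since the slide's S² uses the n denominator, the unbiased variances passed to SciPy are n·S²/(n − 1); the pooled variance of 20 and t = 3 come out identically.

```python
# Sketch (not in the slides): boys-vs-girls example from summary statistics.
from math import sqrt
from scipy import stats

n1, m1, S1_sq = 10, 37, 15
n2, m2, S2_sq = 10, 31, 21

std1 = sqrt(n1 * S1_sq / (n1 - 1))   # convert n-denominator S^2 to unbiased SDs
std2 = sqrt(n2 * S2_sq / (n2 - 1))

t, p_value = stats.ttest_ind_from_stats(m1, std1, n1, m2, std2, n2)
print("t =", round(t, 2), " p-value =", round(p_value, 4))                 # t = 3.0
print("critical t (1%, 18 d.f.) =", round(stats.t.ppf(0.995, df=18), 3))   # 2.878
```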
A study is done by a community group in two neighboring colleges to
determine which one graduates students with more math classes.
College A samples 11 graduates. Their average is four math classes
with a standard deviation of 1.5 math classes. College B samples nine
graduates. Their average is 3.5 math classes with a standard deviation of
one math class. The community group believes that a student who
graduates from college A has taken more math classes, on the average.
Both populations have a normal distribution. Test at a 1% significance
level. Answer the following questions.
a. Is this a test of two means or two proportions?
b. Are the populations standard deviations known or unknown?
c. Which distribution do you use to perform the test?
d. What is the random variable?
e. What are the null and alternate hypotheses? Write the null and
alternate hypotheses in words and in symbols.
f. Is this test right-, left-, or two-tailed?
h. Do you reject or not reject the null hypothesis?
i. Conclusion:
a. Is this a test of two means or two proportions? Two means.

b. Are the population standard deviations known or unknown? Unknown.

c. Which distribution do you use to perform the test? The t-distribution.

d. What is the random variable? The difference between the sample mean numbers of math classes, X̄A − X̄B.

e. What are the null and alternate hypotheses?
H0: μA = μB (or μA ≤ μB): College A graduates do not take more math classes on average.
Ha: μA > μB: College A graduates take more math classes on average.

f. Is this test right-, left-, or two-tailed? Right-tailed.

h. Do you reject or not reject the null hypothesis? Do not reject H0, since

t = (X̄1 − X̄2)/√(s1²/(n1 − 1) + s2²/(n2 − 1)) = (4 − 3.5)/√(1.5²/10 + 1²/8) = 0.85 < t_α.

i. Conclusion: At the 1% level of significance there is not sufficient evidence to conclude that College A graduates take more math classes, on the average, than College B graduates.
THE F DISTRIBUTION
A large or small ratio of two variance estimates would indicate a large difference, while a ratio nearly equal to 1 would indicate a small difference. The sampling distribution in such a case can be found and is called the F distribution, named after R.A. Fisher.

Given two samples of sizes n1 and n2 with SDs s1 and s2, form the population variance estimates

σ̂1² = n1 s1²/(n1 − 1),   σ̂2² = n2 s2²/(n2 − 1),

with degrees of freedom ν1 = n1 − 1 and ν2 = n2 − 1 respectively. Then

F = σ̂1²/σ̂2²

follows an F-distribution with ν1 and ν2 degrees of freedom.
(1 / 2 )−1
CF
This distributi on is given by
(1F + 2 )(1 + 2 ) / 2

where C is a constant depending on 1 & 2


such that the total area under the curve is 1.
If σ̂1² = σ̂2², then F = 1. Hence our aim is to find how far any observed value of F can differ from unity due to fluctuations of sampling.

If F denotes the observed (calculated) value and F_{ν1,ν2}(α) denotes the critical (tabulated) value of F at LOS α, then P{F > F_{ν1,ν2}(α)} = α.
The F-test is not a two-tailed test; it is always a right-tailed test, since F cannot be negative.
Thus if F > F_{ν1,ν2}(α), the difference between σ̂1² and σ̂2² is significant at LOS α: the samples may not be regarded as drawn from the same population, or from populations with the same variance.
We should always make F ≥ 1. This is done by taking the larger of the two estimates of variance as σ̂1² and assigning the corresponding degrees of freedom to ν1.
A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave an estimate of 2.5. Could both samples be from populations with the same variance?

n1 = 13, σ̂1² = 3.0, ν1 = 12
n2 = 15, σ̂2² = 2.5, ν2 = 14

H0: σ1² = σ2², i.e. the two samples have been drawn from populations with the same variance.
H1: σ1² ≠ σ2².
Let the LOS be 5%.

F = σ̂1²/σ̂2² = 3.0/2.5 = 1.2
F_5%(12, 14) = 2.53

Since F < F_5%, H0 is accepted, i.e. the two samples could have come from two normal populations with the same variance.
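A short sketch (not in the slides) of the same variance-ratio test, taking the critical value from SciPy rather than an F table:

```python
# Sketch (not in the slides): the variance-ratio (F) example.
from scipy import stats

var1, df1 = 3.0, 12     # larger estimated population variance goes in the numerator
var2, df2 = 2.5, 14

F = var1 / var2                               # 1.2
F_crit = stats.f.ppf(0.95, dfn=df1, dfd=df2)  # ~2.53

print("F =", F, " critical F(12, 14) at 5% =", round(F_crit, 2))
print("accept H0 (equal variances)?", F < F_crit)
```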
In one sample of 10 observations from a normal
population, the sum of the squares of the deviations
of the sample values from the sample mean is
102·4 and in another sample of 12 observations from
another normal population, the sum of the squares of
the deviations of the sample values from
the sample mean is 120·5. Examine whether the two
normal populations have the same variance.
The nicotine contents in two samples of tobacco are
given below :
Sample 1: 21 24 25 26 27
Sample 2: 22 27 28 30 31 36
Can you say that the two samples came from the same
population ?
x̄1 = 24.6, x̄2 = 29.0
s1² = 4.24, s2² = 18.0

σ̂1² = n1 s1²/(n1 − 1) = 5.30, ν1 = 4
σ̂2² = n2 s2²/(n2 − 1) = 21.60, ν2 = 5

First test equality of variances.
H0: σ1² = σ2², H1: σ1² ≠ σ2².

F = 21.60/5.30 = 4.07 (larger estimate in the numerator, so d.f. (5, 4)), and F_0.05(5, 4) = 6.26.

Since F < F_0.05, H0 is accepted: the variances of the two populations can be regarded as equal.

Now test equality of means with the pooled small-sample statistic:

t = (x̄1 − x̄2)/√[(n1 s1² + n2 s2²)/(n1 + n2 − 2) · (1/n1 + 1/n2)] = −1.92,  ν = n1 + n2 − 2 = 9.

H0: μ1 = μ2, H1: μ1 ≠ μ2.

|t| = 1.92 < t_0.05(9) = 2.26, so H0 is accepted: the means of the two samples do not differ significantly.

Therefore the two samples could have been drawn from the same normal population.
The following figures give the prices in rupees of a certain commodity in a sample of 15 shops selected at random from a city A and those in a sample of 13 shops from another city B.
City A: 7·41, 7·77, 7·44, 7·40, 7·38, 7·93, 7·58, 8·28, 7·23, 7·52, 7·82, 7·71, 7·84, 7·63, 7·68
City B: 7·08, 7·49, 7·42, 7·04, 6·92, 7·22, 7·68, 7·24, 7·74, 7·81, 7·28, 7·43, 7·41
Assuming that the distribution of prices in the two cities
is normal, answer the following :
(i) Is it possible that the average price of city B is Rs.
7·20?
(ii) Is it reasonable to say that the variability of prices in
the two cities is the same ?
(iii) Is it reasonable to say that the average prices are
the same in two cities?
Chi-Square Test
If X1, X2, ..., Xn are independent standard normal random variables, then (X1² + X2² + ... + Xn²) follows a probability distribution called the chi-square (χ²) distribution with n degrees of freedom.

X has a chi-square distribution χ²(ν) with ν degrees of freedom if the pdf of X is

f(x) = x^((ν/2) − 1) e^(−x/2) / [Γ(ν/2) 2^(ν/2)],   0 ≤ x < ∞,
f(x) = 0,   x < 0.

E(X) = ν,  Var(X) = 2ν:
the mean equals the number of degrees of freedom and the variance equals twice the number of degrees of freedom.
Properties of chi-sq distribution

1. As d.f. becomes smaller and smaller , the curve is


skewed more and more to the right. As d.f. increases
the curve becomes more and more symmetrical.
2. As d.f. tends to infinity , the chi-sq distribution
becomes a normal distribution.
                                           P
DF   0.995      0.975     0.20     0.10     0.05     0.025    0.02     0.01     0.005    0.002    0.001
1    0.0000393  0.000982  1.642    2.706    3.841    5.024    5.412    6.635    7.879    9.550    10.828
2    0.0100     0.0506    3.219    4.605    5.991    7.378    7.824    9.210    10.597   12.429   13.816
3    0.0717     0.216     4.642    6.251    7.815    9.348    9.837    11.345   12.838   14.796   16.266
4    0.207      0.484     5.989    7.779    9.488    11.143   11.668   13.277   14.860   16.924   18.467
5    0.412      0.831     7.289    9.236    11.070   12.833   13.388   15.086   16.750   18.907   20.515
6    0.676      1.237     8.558    10.645   12.592   14.449   15.033   16.812   18.548   20.791   22.458
7    0.989      1.690     9.803    12.017   14.067   16.013   16.622   18.475   20.278   22.601   24.322
8    1.344      2.180     11.030   13.362   15.507   17.535   18.168   20.090   21.955   24.352   26.124
9    1.735      2.700     12.242   14.684   16.919   19.023   19.679   21.666   23.589   26.056   27.877
10   2.156      3.247     13.442   15.987   18.307   20.483   21.161   23.209   25.188   27.722   29.588
11   2.603      3.816     14.631   17.275   19.675   21.920   22.618   24.725   26.757   29.354   31.264
12   3.074      4.404     15.812   18.549   21.026   23.337   24.054   26.217   28.300   30.957   32.909
13   3.565      5.009     16.985   19.812   22.362   24.736   25.472   27.688   29.819   32.535   34.528
14   4.075      5.629     18.151   21.064   23.685   26.119   26.873   29.141   31.319   34.091   36.123
15   4.601      6.262     19.311   22.307   24.996   27.488   28.259   30.578   32.801   35.628   37.697
16   5.142      6.908     20.465   23.542   26.296   28.845   29.633   32.000   34.267   37.146   39.252
17   5.697      7.564     21.615   24.769   27.587   30.191   30.995   33.409   35.718   38.648   40.790
18   6.265      8.231     22.760   25.989   28.869   31.526   32.346   34.805   37.156   40.136   42.312
20   7.434      9.591     25.038   28.412   31.410   34.170   35.020   37.566   39.997   43.072   45.315
24   9.886      12.401    29.553   33.196   36.415   39.364   40.270   42.980   45.559   48.812   51.179
30   13.787     16.791    36.250   40.256   43.773   46.979   47.962   50.892   53.672   57.167   59.703
40   20.707     24.433    47.269   51.805   55.758   59.342   60.436   63.691   66.766   70.618   73.402
Uses of the chi-square distribution
1. It is used to test goodness of fit, i.e. to judge whether a given sample may reasonably be regarded as a simple sample from a certain hypothetical population.
2. It is used to test the independence of attributes. That is, if a population is known to have two attributes (or traits), the chi-square distribution is used to test whether the two attributes are associated or independent, based on a sample drawn from the population.

Conditions for validity of the χ²-test
1. The number of observations N in the sample must be reasonably large, say ≥ 50.
2. Individual frequencies must not be too small, i.e. Oi ≥ 10.
3. The number of classes n must be neither too small nor too large, i.e. 4 ≤ n ≤ 16.
χ²-Test of Independence of Attributes
If the population is known to have two major attributes A and B, then A can be divided into m categories A1, A2, ..., Am and B can be divided into n categories B1, B2, ..., Bn. Accordingly the members of the population, and hence those of the sample, can be divided into mn classes.
The data are presented in the form of a matrix (contingency table) with m rows and n columns, the observed frequency in the cell (Ai, Bj) being Oij.

H0: the two attributes A and B are independent.

Eij = (Oi· × O·j)/N,   i = 1, 2, ..., m;  j = 1, 2, ..., n,

i.e.

Eij = (total of observed frequencies in the ith row × total of observed frequencies in the jth column)/(total of all cell frequencies).

χ² = Σ_{i=1}^{m} Σ_{j=1}^{n} (Oij − Eij)²/Eij,   ν = (m − 1)(n − 1)

If χ² < χ²_α, H0 is accepted at the α% LOS, i.e. the attributes A and B are independent.
χ² Test of Independence
Example
• You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?

                Diet Pepsi
Diet Coke     No     Yes    Total
No            84      32      116
Yes           48     122      170
Total        132     154      286
χ² Test of Independence – Solution

H0: No relationship (the two purchase attributes are independent)
Ha: Relationship (the attributes are associated)
α = .05
df = (2 − 1)(2 − 1) = 1
Critical value: χ²_.05(1) = 3.841 (reject H0 if χ² > 3.841)
Check: E(nij) ≥ 5 in all cells.

Eij = (total of observed frequencies in the ith row × total of observed frequencies in the jth column)/(total of all cell frequencies), e.g. E11 = 116·132/286 ≈ 53.5.

                        Diet Pepsi
                   No              Yes
Diet Coke      Obs.  Exp.      Obs.  Exp.     Total
No              84   53.5       32   62.5       116
Yes             48   78.5      122   91.5       170
Total          132   132       154   154        286
Test statistic: χ² = Σ (Oij − Eij)²/Eij = 54.29

Decision: 54.29 > 3.841, so reject H0 at α = .05.

Conclusion: there is evidence of a relationship between Diet Coke and Diet Pepsi purchases.
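A sketch (not in the slides) of the same test with SciPy. Passing correction=False reproduces the plain chi-square statistic; with exact (unrounded) expected counts it comes to about 54.15, versus the slide's 54.29 obtained from expected counts rounded to one decimal place.

```python
# Sketch (not in the slides): the Diet Coke / Diet Pepsi contingency table.
from scipy.stats import chi2_contingency

observed = [[84, 32],    # Diet Coke: No  -> Diet Pepsi No / Yes
            [48, 122]]   # Diet Coke: Yes -> Diet Pepsi No / Yes

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print("chi-square =", round(chi2, 2), " d.f. =", dof, " p-value =", p_value)
print("expected counts:\n", expected.round(1))
```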
A certain drug is claimed to be effective in curing colds. In an experiment on 500 persons with colds, half of them were given the drug and half of them were given sugar pills. The patients' reactions to the treatment are recorded in the following table. On the basis of these data, can it be concluded that the drug and the sugar pills differ significantly in curing colds?
Use the chi-square test.
Helped Harmed No effect
Drug 150 30 70
Sugar pills 130 40 80
Observed (O) and expected (E) frequencies:

              Helped        Harmed       No effect     Total
              O     E       O     E      O     E
Drug         150   140     30    35     70    75         250
Sugar pills  130   140     40    35     80    75         250
Total        280            70          150              500

χ² = Σ (Observed − Expected)²/Expected ≈ 3.52

d.f. = (2 − 1)(3 − 1) = 2

Since χ² ≈ 3.52 < χ²_0.05(2) = 5.991, H0 is accepted: the drug and the sugar pills do not differ significantly in curing colds (the attributes are independent).
The following data is collected on two characters.
Based on this, can you say that there is no relation
between smoking and literacy ?
Smokers Non-
smokers
Literates 83 57
Illiterates 45 68
H0: literacy and smoking habit are independent.

             Smokers   Non-smokers   Total
Literates       83          57         140
Illiterates     45          68         113
Total          128         125         253

O      E                         E (rounded)    (O − E)²/E
83     128 × 140/253 = 70.83          71         (83 − 71)²/71 = 2.03
57     125 × 140/253 = 69.17          69         (57 − 69)²/69 = 2.09
45     128 × 113/253 = 57.17          57         (45 − 57)²/57 = 2.53
68     125 × 113/253 = 55.83          56         (68 − 56)²/56 = 2.57

χ² ≈ 9.22,  ν = (2 − 1)(2 − 1) = 1,  χ²_0.05(1) = 3.841.

Since χ² > 3.841, H0 is rejected: literacy and smoking habit are not independent.
Goodness of Fit
The following data show the distribution of digits in
the numbers chosen at random from a telephone
directory:

Digit: 0 1 2 3 4 5 6 7 8 9 Total

Freq: 1026 1107 997 966 1075 933 1107 972 964 853 10000

Test whether the digits may be taken to occur equally


frequently in the directory.
H0: the digits occur equally frequently in the directory.

Oi: 1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853
Ei: 1000, 1000, ......, 1000

χ² = Σ (Observed − Expected)²/Expected
   = [(26)² + (107)² + (3)² + (34)² + (75)² + (67)² + (107)² + (28)² + (36)² + (147)²]/1000 = 58.542

χ²_0.05(9) = 16.919

Since 58.542 > 16.919, H0 is rejected: the digits do not occur equally frequently in the directory.
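A short sketch (not in the slides) of the same goodness-of-fit test with SciPy; with equal expected frequencies of 1000 the statistic is 58.542, well above the 5% critical value 16.919 for 9 degrees of freedom.

```python
# Sketch (not in the slides): goodness of fit for the telephone-directory digits.
from scipy.stats import chisquare, chi2

observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]
stat, p_value = chisquare(observed)          # expected defaults to the mean (1000)

print("chi-square =", round(stat, 3), " p-value =", p_value)
print("critical value chi2_0.05(9) =", round(chi2.ppf(0.95, df=9), 3))
```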
The following data give the number of aircraft accidents that occurred during the various days of a week.

Day:                 Mon   Tue   Wed   Thu   Fri   Sat
No. of accidents:     15    19    13    12    16    15

Test whether the accidents are uniformly distributed over the week.
A total number of 3759 individuals were interviewed in a
public opinion survey on a political proposal. Of them,
1872 were men and rest women. A total of 2257
individuals were in favour of the proposal and 917 were
opposed to it. A total of 243 men were undecided and
442 women were opposed to the proposal. Do you
justify or contradict the hypothesis that there is no
association between sex and attitude ?

A dice is thrown 132 times with following results:


Number turned up 1 2 3 4 5 6
Frequency 16 20 25 14 29 28
Is the dice unbiased?
