Hypothesis Testing
• $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
• $s_1^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• $s_2^2 = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
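As a quick numerical check of the three formulas above, the following minimal Python sketch computes the sample mean and both versions of the variance by hand. The data values and the function name `sample_stats` are made up for illustration only.

```python
def sample_stats(xs):
    """Return (mean, s1_sq, s2_sq) for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    s1_sq = ss / (n - 1)  # divisor n - 1
    s2_sq = ss / n        # divisor n
    return mean, s1_sq, s2_sq


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the slides
mean, s1_sq, s2_sq = sample_stats(data)
print(mean, s1_sq, s2_sq)
```

The two variances differ only in the divisor, so s1_sq is always the larger of the two.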
• If E(statistic) = parameter, the statistic is called an unbiased estimator of the parameter.
• This means that the average of all sample means equals the population mean.
• $E(\bar{x}) = \mu$
• Also, $E(s_1^2) = \sigma^2$ and $E(s_2^2) \neq \sigma^2$
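The unbiasedness statements above can be illustrated by simulation. This is a rough sketch, assuming a normal population with made-up parameters (mean 0, standard deviation 2) and samples of size 5. It averages both variance formulas over many samples. For the divisor-n version the known expectation is (n − 1)σ²/n, which is 3.2 here rather than σ² = 4.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
MU, SIGMA, N, REPS = 0.0, 2.0, 5, 20000  # assumed illustration values

sum_s1 = sum_s2 = 0.0
for _ in range(REPS):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(xs) / N
    ss = sum((x - mean) ** 2 for x in xs)
    sum_s1 += ss / (N - 1)  # divisor n - 1
    sum_s2 += ss / N        # divisor n

avg_s1 = sum_s1 / REPS  # should be close to sigma^2 = 4
avg_s2 = sum_s2 / REPS  # should be close to (N - 1) / N * sigma^2 = 3.2
print(avg_s1, avg_s2)
```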
• Distribution of a statistic may not be the same as the distribution of the population.
• $E(\bar{x}) = E\left(\dfrac{1}{n}\sum_{i=1}^{n} x_i\right) = \dfrac{1}{n}\sum_{i=1}^{n} E(x_i) = \dfrac{1}{n}\sum_{i=1}^{n} \mu = \mu$
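• $\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\dfrac{1}{n}\sum_{i=1}^{n} x_i\right) = \dfrac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(x_i) = \dfrac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \dfrac{1}{n^2}\, n\sigma^2 = \dfrac{\sigma^2}{n}$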
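The two results derived above, E(x̄) = μ and Var(x̄) = σ²/n, can be checked empirically. The sketch below assumes a normal population with illustrative values μ = 10 and σ = 2, and draws many samples of size n = 25. The variance of the sample means should then be close to 4/25 = 0.16.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
MU, SIGMA, N, REPS = 10.0, 2.0, 25, 20000  # assumed illustration values

# One sample mean per repetition
means = [
    sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    for _ in range(REPS)
]

avg_of_means = sum(means) / REPS            # compare with MU
var_of_means = statistics.pvariance(means)  # compare with SIGMA**2 / N
print(avg_of_means, var_of_means, SIGMA ** 2 / N)
```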
• The square root of the variance is called the standard deviation.
• Different samples of the same size from the same population yield different sample means.
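• When the population distribution is normal, $\bar{x} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$.
• When the population distribution is not normal,
• then also $\bar{x} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$ approximately, provided n → ∞.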
• In practice, the approximation is usually considered adequate for n ≥ 30.
• $\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
• Clearly, this result is valid when
• When the sample size n is large enough, binomial distribution approaches normal distribution.
• So, for large n, $\dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} \sim N(0, 1)$.
• This is a particular case of the central limit theorem.
• Practically, this result is true for n ≥ 30, or when nπ ≥ 5 as well as n(1 − π) ≥ 5.
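The normal approximation above leads directly to a one-sample test for a proportion. The sketch below is a minimal illustration, not a definitive implementation. The function name `proportion_z_test`, the two-tailed choice, and the sample numbers are assumptions made for the example. It uses `statistics.NormalDist` from the Python standard library for the normal CDF.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Two-tailed z test of H0: pi = pi0, using the normal approximation."""
    # Rule-of-thumb check quoted in the text
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not recommended")
    p = successes / n  # sample proportion
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative numbers: 60 successes in 100 trials, H0: pi = 0.5
z, p_value = proportion_z_test(successes=60, n=100, pi0=0.5)
print(round(z, 4), round(p_value, 4))
```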
• Critical value divides the whole area under the probability curve into two regions:
• Critical (Rejection) region
• When the statistical outcome falls into this region, H0 is rejected.
• Size of this region is α.
• Acceptance Region
• When the statistical outcome falls into this region, H0 is accepted.
• Size of this region is (1 − α).
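The decision rule above can be sketched for a Z statistic as follows. The function name `in_critical_region` and the `tail` labels are hypothetical. Critical values come from `statistics.NormalDist.inv_cdf` in the standard library.

```python
from statistics import NormalDist


def in_critical_region(z, alpha=0.05, tail="two"):
    """Return True when z falls in the rejection region of size alpha."""
    std = NormalDist()  # standard normal N(0, 1)
    if tail == "two":
        return abs(z) > std.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > std.inv_cdf(1 - alpha)
    if tail == "left":
        return z < std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


print(in_critical_region(2.1))                # two-tailed, critical value about 1.96
print(in_critical_region(1.7, tail="right"))  # right-tailed, critical value about 1.645
```

The same statistic can lead to different decisions depending on the tail: 1.7 is inside the right-tailed critical region at α = 0.05 but outside the two-tailed one.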
Testing of Statistical Hypothesis
(One-Sample Tests)
• Excel Formula
• For two-tailed tests (TTT): T.INV(α/2, n-1) and T.INV(1 - α/2, n-1)
• For right-tailed tests (RTT): T.INV(1 - α, n-1)
• For left-tailed tests (LTT): T.INV(α, n-1)
2. p – value Approach in t-test
1. Let Tc be the computed value of the test statistic and T ~t(n-1)
2. Then p – value is given by the following probability
• For two-tailed tests: 2P(T> |Tc|)
1. Excel Formula: 2*(1-T.DIST(ABS(Tc),n-1,1))
• For right-tailed tests: P(T> Tc)
1. Excel Formula: 1-T.DIST(Tc, n-1, 1)
• For left-tailed tests: P(T< Tc)
1. Excel Formula: T.DIST(Tc, n-1, 1)
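For readers without Excel, the same p-values and critical values can be approximated in plain Python. The sketch below is one possible approach and is not how Excel computes them. It integrates the t density with Simpson's rule to get the CDF (the counterpart of T.DIST with the cumulative flag set), and inverts it by bisection (the counterpart of T.INV). All function names are hypothetical. It is intended for moderate degrees of freedom, since `math.gamma` overflows for very large arguments.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_p_value(tc, df, tail="two"):
    """p-value for a computed statistic tc, following the three cases above."""
    if tail == "two":
        return 2 * (1 - t_cdf(abs(tc), df))
    if tail == "right":
        return 1 - t_cdf(tc, df)
    if tail == "left":
        return t_cdf(tc, df)
    raise ValueError("tail must be 'two', 'right' or 'left'")


def t_inv(prob, df, lo=-50.0, hi=50.0):
    """Quantile by bisection: the x with P(T <= x) = prob, searched in [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative case: n = 10, so df = 9
print(t_p_value(2.262, 9))  # two-tailed p-value
print(t_inv(0.975, 9))      # upper critical value for alpha = 0.05, two-tailed
```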
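• Z test for two independent samples (population variances known):
$Z_c = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$
• Z test for two independent samples (population variances unknown, large samples):
$Z_c = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \sim N(0, 1)$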
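A minimal sketch of the two-sample Z statistic follows. The function name `two_sample_z` and the summary figures are assumptions for illustration. Pass population variances when they are known, or sample variances for large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Return (Z_c, two-tailed p-value) for two independent samples."""
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Illustrative summary data, not from the slides
z, p = two_sample_z(mean1=52.0, mean2=50.0, var1=16.0, var2=9.0, n1=64, n2=36)
print(round(z, 4), round(p, 4))
```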
• t test for two independent samples assuming equal variances:
$T_c = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$, where $S^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
• Use the $t(n_1 + n_2 - 2)$ distribution for the critical value / p-value.
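The pooled-variance statistic can be computed from summary data as in the sketch below. The function name `pooled_t` and the numbers are hypothetical. The returned degrees of freedom are what one would pass to a t table or to T.INV / T.DIST.

```python
from math import sqrt


def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Return (T_c, degrees of freedom, pooled variance S^2)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df, pooled_var


# Illustrative summary data, not from the slides
t_c, df, s_sq = pooled_t(mean1=20.0, mean2=18.0, var1=4.0, var2=6.0, n1=10, n2=12)
print(round(t_c, 4), df, round(s_sq, 4))
```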
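• t test for two independent samples assuming unequal variances:
$T_c = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \sim t(\nu)$, where $\nu = \dfrac{\left[\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$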
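The degrees-of-freedom expression above appears to be the Welch-Satterthwaite approximation used when the two variances are not assumed equal. Under that assumption, a minimal sketch of the statistic and its degrees of freedom looks like this (the function name `welch_t` and the data are illustrative):

```python
from math import sqrt


def welch_t(mean1, mean2, var1, var2, n1, n2):
    """Return (T_c, approximate degrees of freedom) without pooling variances."""
    a, b = var1 / n1, var2 / n2  # per-sample variance of the mean
    t_c = (mean1 - mean2) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_c, df


# Illustrative summary data, not from the slides
t_c, df = welch_t(mean1=20.0, mean2=18.0, var1=4.0, var2=6.0, n1=10, n2=12)
print(round(t_c, 4), round(df, 2))
```

The degrees of freedom are generally not an integer. A common convention is to round down before consulting a table.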
• Paired t test: $T_c = \dfrac{\bar{d}}{s_d/\sqrt{n}} \sim t(n - 1)$
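A short sketch of the paired t statistic, assuming d is taken as "after minus before" and s_d uses the n − 1 divisor. The function name `paired_t` and the measurements are made up.

```python
from math import sqrt


def paired_t(before, after):
    """Return (T_c, degrees of freedom) for paired observations."""
    d = [a - b for a, b in zip(after, before)]  # pairwise differences
    n = len(d)
    d_bar = sum(d) / n
    s_d = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar / (s_d / sqrt(n)), n - 1


# Illustrative paired measurements, not from the slides
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 13, 15]
t_c, df = paired_t(before, after)
print(round(t_c, 4), df)
```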
• Testing the Hypothesis for Difference of Proportions
• $Z_c = \dfrac{p_1 - p_2}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0, 1)$, where $\hat{\pi} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
• Thus, $\dfrac{p_1 - p_2}{\sqrt{\pi(1 - \pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0, 1)$.
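The test for a difference of proportions can be sketched as below, using the pooled estimate in place of the unknown common proportion. The function name `two_proportion_z` and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (Z_c, pooled proportion, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)  # same as (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, pooled, 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative counts: 120 of 200 versus 150 of 300
z, pooled, p_value = two_proportion_z(120, 200, 150, 300)
print(round(z, 4), pooled, round(p_value, 4))
```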
• In the given example, we have three populations.
• We wish to test
• H0: π1 = π2 = π3 (All the proportions are the same)
• H1: Not all π1, π2, π3 are equal
• The table of data shown in the example is called the Contingency Table.
• Contingency tables are used to classify sample observations according to two or more characteristics.
• A contingency table is useful in situations involving multiple population proportions.
• Let a contingency table have r rows and c columns.
• Then it will have r × c cells.
Chi square tests are always right tailed.
• We always have
• If we approximate some expected frequency, we must make sure that the above condition is satisfied.
• In these problems, the data are of discrete type.
• The chi-square distribution is a continuous distribution.
• It loses its validity if any expected frequency is less than five.
• In such a case, the expected frequency is pooled with the preceding or succeeding frequency.
• The d.f. is reduced by one for each such pooling.
• We do not make any assumption about the distribution of parent population.
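The statistic itself is not spelled out in this part of the text, so the sketch below assumes the standard Pearson form: expected frequency = (row total × column total) / grand total, χ² = Σ (O − E)² / E, with (r − 1)(c − 1) degrees of freedom. The function name and the 2 × 3 table are made up for illustration. The smallest expected frequency is returned so the "less than five" rule can be checked.

```python
def chi_square_statistic(table):
    """Return (chi2, degrees of freedom, smallest expected frequency)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Illustrative table: rows = outcome (yes / no), columns = three populations
observed = [[20, 30, 40],
            [30, 20, 10]]
chi2, df, min_expected = chi_square_statistic(observed)
print(round(chi2, 4), df, min_expected)
```

Because the test is right-tailed, H0 is rejected when the computed value exceeds the upper-α critical value of the chi-square distribution with the returned degrees of freedom.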
• The difference between two means can be examined using a t-test or a Z-test.
• Suppose we have more than two samples.
• We wish to test the hypothesis that
• all the samples are drawn from populations having the same mean,
• or, equivalently, that all population means are the same.
• For this, we use ANOVA.
• ANOVA is essentially a procedure for testing the difference among various groups of data for homogeneity.
• At its simplest, ANOVA tests the following hypotheses:
• H0: The means of all the groups are equal.
• H1: Not all the means are equal.
• Rejecting H0 doesn't say how or which ones differ.
• Can follow up with "multiple comparisons".
• ANOVA is always right-tailed too.
• If the observations are large, you can shift their origin and scale.
• This will not change the result.
• Shifting origin means adding or subtracting some constant.
• Shifting of scale means multiplying or dividing by some constant.
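The sketch below computes the one-way ANOVA F ratio (mean square between groups divided by mean square within groups, the standard form, assumed here because the formula is not shown in the text) and then checks the claim about origin and scale. The function name `one_way_f` and the data are illustrative.

```python
def one_way_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Illustrative large observations, not from the slides
raw = [[1001, 1002, 1003], [1002, 1003, 1004], [1005, 1006, 1007]]
# Shift origin by 1000 and change scale by a factor of 10
coded = [[(x - 1000) / 10 for x in g] for g in raw]

f_raw = one_way_f(raw)
f_coded = one_way_f(coded)
print(f_raw, f_coded)
```

Both calls give the same F up to floating-point rounding: the shift cancels in every deviation from a mean, and the scale factor cancels in the ratio of mean squares.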
• Two-way analysis of variance is an extension of one-way analysis of variance.
• The variation is controlled by two factors.
• The values of random variable X are affected by different levels of two factors.
• Assumptions
• The populations are normally distributed.
• The samples are independent.
• The variances of the populations are equal.
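As a rough illustration of the two-factor case, the sketch below assumes the simplest layout: one observation per cell (no replication), with rows as levels of one factor and columns as levels of the other. This layout, the function name `two_way_anova`, and the table are assumptions, since the text does not give the design. It returns one F ratio per factor, each compared against the error mean square with (r − 1)(c − 1) degrees of freedom.

```python
def two_way_anova(table):
    """Return (F_rows, F_cols) for an r x c table with one value per cell."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual variation
    ms_error = ss_error / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return f_rows, f_cols


# Illustrative 3 x 3 table, not from the slides
data = [[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]]
f_rows, f_cols = two_way_anova(data)
print(f_rows, f_cols)
```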