
Hypothesis Testing



• x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

• s₁² = 1/(n − 1) Σᵢ₌₁ⁿ (xᵢ − x̄)²

• s₂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²
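These three estimators (the sample mean and the two variance estimators, dividing by n − 1 and by n) can be sketched in Python with the standard library; the data values are hypothetical, chosen only for illustration:

```python
from statistics import mean, variance, pvariance

# Hypothetical sample (not from the source)
x = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(x)        # sample mean x̄
s1_sq = variance(x)    # divides by n - 1 (the unbiased estimator s₁²)
s2_sq = pvariance(x)   # divides by n (the biased estimator s₂²)
print(x_bar, round(s1_sq, 3), s2_sq)
```

Note that `statistics.variance` implements the n − 1 divisor, while `statistics.pvariance` implements the n divisor.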
• If E(statistic) = parameter

• then the statistic is said to be an Unbiased Estimate of the parameter.

• Sample mean is an unbiased estimate of the population mean.

• This means that the average of all sample means equals the population mean.

• E(x̄) = μ

• Also, E(s₁²) = σ², and E(s₂²) ≠ σ².

• Unknown parameters are estimated using sample observations.

• Parameter values are fixed.

• Values of statistics vary from sample to sample.

• Each sample has some probability of being chosen.

• Each value of a statistic is associated with probability.

• Thus, a statistic is a random variable.

• Distribution of a statistic is called a sampling distribution.

• Distribution of a statistic may not be the same as the distribution of the population.

• We saw in the previous example, E(x̄) = μ and Var(x̄) = σ²/n.

• This is always true and can be proved as below:

• E(x̄) = E((1/n) Σᵢ₌₁ⁿ xᵢ) = (1/n) Σᵢ₌₁ⁿ E(xᵢ) = (1/n) Σᵢ₌₁ⁿ μ = μ

• Var(x̄) = Var((1/n) Σᵢ₌₁ⁿ xᵢ) = (1/n²) Σᵢ₌₁ⁿ Var(xᵢ) = (1/n²) Σᵢ₌₁ⁿ σ² = (1/n²)·nσ² = σ²/n
• The square root of the variance is generally called the standard deviation.

• Here we shall call it Standard Error.

• Different samples of the same size from the same population yield different sample means.

• Standard Error of x is a measure of the variability in different values of sample mean.
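The two facts above, E(x̄) = μ and Var(x̄) = σ²/n, can be checked with a small simulation; a minimal Python sketch, with the population parameters (μ = 8, σ = 3, n = 36) borrowed from the example below:

```python
import random
from statistics import mean, pvariance

random.seed(42)
mu, sigma, n = 8, 3, 36

# Draw many samples of size n and record each sample mean
means = [mean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(5000)]

# The sample means cluster around mu, with variance close to sigma^2 / n
print(round(mean(means), 2))       # close to mu = 8
print(round(pvariance(means), 2))  # close to sigma^2 / n = 9/36 = 0.25
```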

Central Limit Theorem


• When the population distribution is N(μ, σ),

• then x̄ ~ N(μ, σ/√n).

• When the population distribution is not normal,

• then also x̄ ~ N(μ, σ/√n), provided n → ∞.

• Practically, this result is true for n ≥ 30.

• The result may also be written as

• (x̄ − μ)/(σ/√n) ~ N(0, 1).

• Clearly, this result is valid when

• the sample comes from a normal population, or

• the sample size is large (n ≥ 30).

• Suppose a population has mean μ = 8 and standard deviation σ = 3.


• Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.75 and 8.25?
• P(7.75 < x̄ < 8.25) = ?

• x̄ ~ N(μ, σ/√n), i.e., x̄ ~ N(8, 0.5), since σ/√n = 3/√36 = 0.5.

• Using Excel,

• P(7.75 < x̄ < 8.25) = NORM.DIST(8.25,8,0.5,1) − NORM.DIST(7.75,8,0.5,1)
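The same probability can also be computed outside Excel; a minimal Python sketch using only the standard library, with the normal CDF built from math.erf:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    # Normal CDF via the error function: Phi((x - mu) / sigma)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)  # standard error = 0.5
p = norm_cdf(8.25, mu, se) - norm_cdf(7.75, mu, se)
print(round(p, 4))  # ≈ 0.3829
```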

POPULATION & SAMPLE PROPORTIONS

• μ and π are population parameters.

• x̄ and p are sample statistics.

• p provides an estimate of π.
• Note that x ~ B(n, π), where x is the number of successes.
• E(x) = nπ
• Var(x) = nπ(1 − π)
• This implies that
• E(p) = E(x/n) = π
• Var(p) = Var(x/n) = nπ(1 − π)/n² = π(1 − π)/n
• Standard Error(p) = √[Var(p)] = √[π(1 − π)/n]

• When the sample size n is large enough, binomial distribution approaches normal distribution.
• So, for large n ,
• (p − π)/√[π(1 − π)/n] ~ N(0, 1).

• This is a particular case of the central limit theorem.
• Practically, this result is true for n ≥ 30, or when nπ ≥ 5 as well as n(1 − π) ≥ 5.
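This normal approximation for a sample proportion can be sketched in Python; the numbers (55 successes in n = 100 trials, H0: π = 0.5) are hypothetical:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical example: p = 0.55 observed, H0: pi = 0.5, n = 100
p, pi0, n = 0.55, 0.5, 100
se = sqrt(pi0 * (1 - pi0) / n)  # standard error under H0 = 0.05
z = (p - pi0) / se              # = 1.0
p_value = 2 * (1 - phi(z))      # two-tailed
print(round(z, 2), round(p_value, 4))
```

Here nπ = 50 ≥ 5 and n(1 − π) = 50 ≥ 5, so the approximation applies.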

• We have seen the following 2 results:


• (x̄ − μ)/(σ/√n) ~ N(0, 1)
• This result is valid:
• when the sample size is 30 or more, or
• when the parent population has a normal distribution.

• (p − π)/√[π(1 − π)/n] ~ N(0, 1)
• This result is valid:
• when the sample size is 30 or more, or
• when nπ ≥ 5 as well as n(1 − π) ≥ 5.

• Two types of error:


• Type I Error: Reject H0 when it is true
• Size of Type I Error = P(Type I Error)
• = P(Reject H0 when it is true)
• = α (also called Producer's risk)
• Type II Error: Accept H0 when it is false
• Size of Type II Error = P(Type II Error)
• = P(Accept H0 when it is false)
• = β (also called Consumer's risk)
• Size of Type I Error (α) is called the Level of Significance.
• α is set by the researcher in advance.

• Critical value divides the whole area under the probability curve into two regions:
• Critical (Rejection) region
• When the statistical outcome falls into this region, H0 is rejected.
• Size of this region is α.
• Acceptance Region
• When the statistical outcome falls into this region, H0 is accepted.
• Size of this region is (1 − α).
Testing of Statistical Hypothesis
(One-Sample Tests)

Testing of Hypothesis for µ (z-test)


• Conditions/ Assumptions:
• Population is normal or n ≥ 30
• σ is known or n ≥ 30
• Test Statistic: Zc = (x̄ − μ)/(σ/√n)
1. Obtain the Critical Values using Excel or the Statistical Table
• Excel Formula
• For TTT (two-tailed test): NORM.S.INV(α/2) and NORM.S.INV(1 − α/2)
• For RTT (right-tailed test): NORM.S.INV(1 − α)
• For LTT (left-tailed test): NORM.S.INV(α)
• p-value Approach
• Let Zc be the computed value of the test statistic, where Z ~ N(0, 1).
• Then the p-value is given by the following probability:
• For two-tailed tests: 2P(Z > |Zc|)
• Excel Formula: 2*(1-NORM.S.DIST(ABS(Zc),1))
• For right-tailed tests: P(Z > Zc)
• Excel Formula: 1-NORM.S.DIST(Zc,1)
• For left-tailed tests: P(Z < Zc)
• Excel Formula: NORM.S.DIST(Zc,1)
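A worked sketch of this z-test in Python (the observed sample mean 9.2 is an invented value, not from the source; μ₀ = 8, σ = 3, n = 36 follow the earlier example):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical data: H0: mu = 8 vs H1: mu != 8, sigma known
x_bar, mu0, sigma, n, alpha = 9.2, 8, 3, 36, 0.05
z_c = (x_bar - mu0) / (sigma / sqrt(n))  # = 2.4
p_value = 2 * (1 - phi(abs(z_c)))        # two-tailed p-value
print(round(z_c, 2), round(p_value, 4), p_value < alpha)
```

Since the p-value falls below α = 0.05, H0 would be rejected here.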

Testing of Hypothesis for µ (t-test)


• Conditions/ Assumptions:
• n<30; Population is normal; σ is unknown
• Test Statistic: Tc = (x̄ − μ)/(s/√n)
1. Obtain the Critical Values using the t distribution with (n − 1) degrees of freedom, t(n−1).
• Excel Formula
• For TTT: T.INV(α/2,n-1) and T.INV(1-α/2,n-1)
• For RTT: T.INV(1-α,n-1)
• For LTT: T.INV(α,n-1)
2. p-value Approach in t-test
• Let Tc be the computed value of the test statistic, where T ~ t(n−1).
• Then the p-value is given by the following probability:
• For two-tailed tests: 2P(T > |Tc|)
• Excel Formula: 2*(1-T.DIST(ABS(Tc),n-1,1))
• For right-tailed tests: P(T > Tc)
• Excel Formula: 1-T.DIST(Tc,n-1,1)
• For left-tailed tests: P(T < Tc)
• Excel Formula: T.DIST(Tc,n-1,1)
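A rough Python sketch of this one-sample t-test using the critical-value approach; the sample data are hypothetical, and the critical value 2.262 is t(0.975, df = 9) from the t table:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample (not from the source), H0: mu = 8.0
sample = [9.1, 8.5, 7.9, 8.8, 9.4, 8.2, 8.9, 9.0, 8.6, 8.4]
mu0 = 8.0
n = len(sample)

# Test statistic Tc = (x̄ - mu0) / (s / sqrt(n)), with s the sample std dev
t_c = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Two-tailed critical value t(0.975, df = 9), alpha = 0.05
t_crit = 2.262
reject = abs(t_c) > t_crit
print(round(t_c, 3), reject)
```

Since |Tc| exceeds the critical value, H0: μ = 8 would be rejected for this (invented) sample.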

Testing of Statistical Hypothesis


(Two-Sample Tests)
• Z test for two independent samples (σ₁, σ₂ known):

  Zc = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1)

• Z test for two independent samples (σ₁, σ₂ unknown, large samples):

  Zc = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂) ~ N(0, 1)

• t test for two independent samples assuming equal variances:

  Tc = (x̄₁ − x̄₂)/[S√(1/n₁ + 1/n₂)] ~ t(n₁ + n₂ − 2), where S² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2)

• Use the t(n₁ + n₂ − 2) distribution for the critical value/p-value.
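The pooled (equal-variance) t test can be sketched from summary statistics alone; the numbers here are hypothetical, and the critical value 2.145 is t(0.975, df = 14) from the t table:

```python
from math import sqrt

# Hypothetical summary statistics for two independent samples (equal variances)
n1, x1_bar, s1_sq = 8, 10.0, 4.0
n2, x2_bar, s2_sq = 8, 8.0, 4.0

# Pooled variance S^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)
S_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_c = (x1_bar - x2_bar) / (sqrt(S_sq) * sqrt(1 / n1 + 1 / n2))  # = 2.0

# Two-tailed critical value t(0.975, df = n1 + n2 - 2 = 14), alpha = 0.05
t_crit = 2.145
print(round(t_c, 3), abs(t_c) > t_crit)
```

Here |Tc| = 2.0 does not exceed 2.145, so H0: μ₁ = μ₂ would not be rejected for these (invented) numbers.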

• t test for two independent samples assuming unequal variances


  Tc = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂) ~ t(f),

  where f = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
• Paired t test: Tc = d̄/(s_d/√n) ~ t(n − 1)
• Testing the Hypothesis for Difference of Proportions
  Zc = (p₁ − p₂)/√[π̂(1 − π̂)(1/n₁ + 1/n₂)] ~ N(0, 1), where π̂ = (n₁p₁ + n₂p₂)/(n₁ + n₂)

• Thus, (p₁ − p₂)/√[π(1 − π)(1/n₁ + 1/n₂)] ~ N(0, 1).
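A sketch of the two-proportion z test with the pooled estimate π̂; the counts are hypothetical:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical counts: x1 successes out of n1 trials, x2 out of n2
n1, x1 = 100, 60
n2, x2 = 100, 50
p1, p2 = x1 / n1, x2 / n2

# Pooled estimate of the common proportion under H0: pi1 = pi2
pi_hat = (x1 + x2) / (n1 + n2)  # = 0.55
se = sqrt(pi_hat * (1 - pi_hat) * (1 / n1 + 1 / n2))
z_c = (p1 - p2) / se
p_value = 2 * (1 - phi(abs(z_c)))  # two-tailed
print(round(z_c, 3), round(p_value, 4))
```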
• In the given example, we have three populations.
• We wish to test
• H0: π1 = π2 = π3 (All the proportions are the same)
• H1: Not all π1, π2, π3 are equal
• The table of data shown in the example is called the Contingency Table.
• Contingency Tables are used to classify sample observations according to two or more
characteristics.
• Contingency Table is useful in situations involving multiple population proportions.
• Let a contingency table have r rows and c columns.
• Then it will have r × c cells.
Chi square tests are always right tailed.

• We always have Σfₒ = Σfₑ (the totals of observed and expected frequencies agree).

• If we approximate some expected frequency, we must make sure that above condition is
satisfied.
• In these problems, the data are of discrete type.
• The Chi-Square distribution is a continuous distribution.
• The approximation loses its validity if any expected frequency is less than FIVE.
• In such a case, the expected frequency is pooled with the preceding or succeeding frequency.
• The d.f. is reduced by one for each such pooling.
• We do not make any assumption about the distribution of parent population.
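The contingency-table chi-square test can be sketched as follows; the 2×3 table is hypothetical, and the critical value 5.991 is the chi-square table value for α = 0.05 with df = 2:

```python
# Hypothetical 2x3 contingency table (rows: success/failure, columns: 3 populations)
obs = [[30, 45, 25],
       [70, 55, 75]]

row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
grand = sum(row_tot)

# Expected frequency for cell (i, j) = (row i total) * (column j total) / grand total
chi2 = 0.0
for i, row in enumerate(obs):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / grand
        chi2 += (o - e) ** 2 / e

df = (len(obs) - 1) * (len(obs[0]) - 1)  # (r - 1)(c - 1) = 2
crit = 5.991                             # chi-square table, alpha = 0.05, df = 2
print(round(chi2, 3), chi2 > crit)       # right-tailed test
```

All expected frequencies here exceed five, so no pooling is needed.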
• The difference between two means can be examined using a t-test or Z-test.
• If we have more than 2 samples,
• we wish to test the hypothesis that
• all the samples are drawn from populations having the same mean,
• or all population means are the same.
• We use ANOVA.

• ANOVA is essentially a procedure for testing the difference among various groups of data for homogeneity.
• At its simplest, ANOVA tests the following hypotheses:
 H0: The means of all the groups are equal.
 H1: Not all the means are equal.
• H1 doesn't say how or which ones differ.
• Can follow up with "multiple comparisons".

ANOVA IS ALWAYS RIGHT TAILED TOO
• If the observations are large, you can shift their origin and scale.
• This will not change the result.
• Shifting origin means adding or subtracting some constant.
• Shifting of scale means multiplying or dividing by some constant.
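The one-way ANOVA F statistic can be sketched from first principles; the three groups are hypothetical, and the critical value 3.885 is F(0.05; 2, 12) from the F table:

```python
from statistics import mean

# Hypothetical data: three groups of five observations each
groups = [[1, 2, 3, 4, 5],
          [2, 3, 4, 5, 6],
          [3, 4, 5, 6, 7]]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

# Between-group and within-group sums of squares
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)        # df_between = k - 1 = 2
msw = ssw / (n_total - k)  # df_within = 15 - 3 = 12
F = msb / msw              # = 2.0

# Right-tailed critical value F(0.05; 2, 12) from the F table
F_crit = 3.885
print(round(F, 3), F > F_crit)
```

Shifting the origin of every observation (e.g. adding 100 to all values) leaves ssb, ssw, and F unchanged, which illustrates the bullet above.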
• Two-way analysis of variance is an extension of one-way analysis of variance.
• The variation is controlled by two factors.
• The values of the random variable X are affected by different levels of two factors.
• Assumptions
 The populations are normally distributed.
 The samples are independent.
 The variances of the populations are equal.

• HA0: All levels of Factor A have the same effect
• HA1: Not all levels of Factor A have the same effect
• HB0: All levels of Factor B have the same effect
• HB1: Not all levels of Factor B have the same effect
• HAB0: There is no interaction effect
• HAB1: There is an interaction effect
