Testing of Hypothesis Unit-I

The document discusses key concepts in hypothesis testing, including:
- Hypotheses are statements about parameters or distributions that may or may not be true. Statistical hypotheses can be simple, specifying the population completely, or composite.
- Null and alternative hypotheses are defined. The null hypothesis is the hypothesis being tested and is typically denoted by H0. The alternative hypothesis is complementary to the null.
- Errors in hypothesis testing include type I errors, where the null is incorrectly rejected, and type II errors, where it is incorrectly accepted.
- Other concepts covered are test statistics, critical regions, significance levels, confidence intervals, and p-values. Normal distributions and their properties are also summarized.

Hypothesis

An assumption or a statement which may or may not be true is known as a hypothesis.

Statistical hypothesis:
A statement or an assumption about the parameter(s) or the probability distribution of a population, which may or may
not be true, is known as a statistical hypothesis.

Simple and composite hypothesis


A statistical hypothesis which specifies the population completely is known as a simple hypothesis.
A statistical hypothesis which does not specify the population completely is known as a composite hypothesis.

Example:-
1) A hospital administrator wants to test the hypothesis that the average length of stay of
patients admitted to the hospital is 5 days.
2) H: µ = 75 cents, where µ is the true population average of daily per-student candy and soda
expenses in US high schools.
3) H: p < 0.10, where p is the population proportion of defective helmets for a given manufacturer.
4) If µ1 and µ2 denote the true average breaking strengths of two different types of twine, one
hypothesis might be the assertion that µ1 - µ2 = 0, and another the statement that µ1 - µ2 > 5.
If X1, X2, X3, ..., Xn is a random sample of size n from a normal population with mean µ and variance
σ², then the hypothesis
H0: µ = µ0, σ² = σ0² is a simple hypothesis, whereas each of the following hypotheses is a
composite hypothesis.
1) µ = µ0 (σ² not specified)
2) σ² = σ0² (µ not specified)
3) µ < µ0, σ² = σ0²
4) µ > µ0, σ² = σ0²
5) µ = µ0, σ² < σ0²
6) µ = µ0, σ² > σ0²
7) µ < µ0, σ² > σ0²
Ex:- X ~ N(µ, σ²)
µ = 10, σ² = 4 is a simple hypothesis, whereas
µ = 10, σ² > 4
µ = 10, σ² < 4
µ = 10, σ² ≠ 4
µ ≠ 10, σ² = 4
are all composite hypotheses.
Null and alternative hypothesis
The hypothesis being tested in a statistical test of significance is known as the null hypothesis.
According to Prof. R. A. Fisher, the null hypothesis is the hypothesis which is tested for possible rejection
under the assumption that it is true. The null hypothesis is denoted by H0.
Any hypothesis which is complementary to the null hypothesis is known as an alternative hypothesis.
The alternative hypothesis is denoted by H1.
Example :- Two different concerns manufacture drugs for inducing sleep: drug A is manufactured by the
first concern and drug B by the second concern. Each company claims that its drug is
superior to that of the other, and it is desired to test which is the superior drug, A or B. To formulate the
statistical hypothesis, let X be a random variable which denotes the additional hours of sleep gained by
an individual when drug A is given, and let the random variable Y denote the additional hours of sleep
gained when drug B is used. Let us suppose that X and Y follow probability distributions with
means µx and µy respectively. Here our null hypothesis would be that there is no significant difference between
the effects of the two drugs. Symbolically,
H0: µx = µy

while the alternative hypothesis is given as

H1: µx ≠ µy, i.e. there is a significant difference between the two drugs

or
H1: µx > µy, i.e. drug A is more effective than drug B

or
H1: µx < µy, i.e. drug B is more effective than drug A

Types of tests
Null hypothesis      Alternative hypothesis
H0: µ = 100          H1: µ ≠ 100 (two tailed)
                     H1: µ > 100 (one tailed, right tailed)
                     H1: µ < 100 (one tailed, left tailed)
H0: µ ≥ 100          H1: µ < 100 (one tailed)
H0: µ ≤ 100          H1: µ > 100 (one tailed)

Test statistic:- A test statistic is a function of the sample observations used to decide whether the null
hypothesis will be rejected or accepted.
For example:- To test H0: µx = µ0 v/s H1: µx ≠ µ0, the test statistic is
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
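As a minimal sketch, the statistic Z = (x̄ - µ0) / (σ/√n) can be computed as below. The function name `z_statistic` and the sample figures are illustrative assumptions, not taken from these notes.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample_mean - mu0) / (sigma / sqrt(n)), for a known population sigma."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical figures: a sample of n = 36 with mean 102, testing mu0 = 100 with sigma = 6.
z = z_statistic(sample_mean=102, mu0=100, sigma=6, n=36)
print(round(z, 2))
```

Here the standard error is 6/√36 = 1, so the sample mean lies two standard errors above µ0.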

Critical region:- It is also known as the rejection region.

Divide the sample space into two mutually disjoint subsets, say A and B, such that if the sample point lies in
region A then H0 will be rejected, and if the sample point lies in region B then H0 will be accepted. Thus
region A is known as the rejection (critical) region and B is known as the acceptance region.

[Figure: the sample space divided into critical region A and acceptance region B]
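For a test based on a standard normal statistic, the boundary between regions A and B is a z cut-off fixed by the chosen level α. The sketch below assumes that setting; the helper name `critical_values` is hypothetical, and the cut-offs come from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def critical_values(alpha, tail="two"):
    """Return the z cut-off(s) that bound the rejection region A."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        cut = std_normal.inv_cdf(1 - alpha / 2)
        return (-cut, cut)                       # reject when z < -cut or z > cut
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)  # reject when z > cut
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)      # reject when z < cut
    raise ValueError("tail must be 'two', 'right' or 'left'")

two_sided = critical_values(0.05, "two")
left_sided = critical_values(0.05, "left")
print([round(c, 3) for c in two_sided])
print([round(c, 3) for c in left_sided])
```

At α = 0.05 these are the familiar table values ±1.96 (two tailed) and -1.645 (left tailed).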

Level of significance:-
The level of significance is the size of the type I error (or the maximum producer's risk), i.e. the maximum
allowable probability of making a type I error. This probability is denoted by α. The levels of significance
usually employed in testing of hypothesis are 5% and 1%. The level of significance is always fixed in advance,
before collecting the sample information.

Confidence Interval
Confidence intervals measure the degree of uncertainty or certainty in a sampling method. They can
be constructed at any confidence level, with the most common being 95% or 99%.
Statisticians use confidence intervals to measure the uncertainty in an estimate of a population
parameter based on a sample.
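As an illustration, a confidence interval for a population mean with known σ takes the form x̄ ± z·σ/√n. The sketch below assumes a normal population; the function name and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean +/- z * sigma / sqrt(n) for a known population sigma."""
    # z leaves (1 - confidence) / 2 in each tail of the standard normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical figures: sample mean 50 from n = 100 observations, sigma = 10.
low, high = mean_confidence_interval(sample_mean=50, sigma=10, n=100)
print(round(low, 2), round(high, 2))
```

Raising the confidence level to 99% widens the interval, since the multiplier z grows from about 1.96 to about 2.58.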

Errors in Testing of hypothesis


There are two types of errors in testing of hypothesis:
Type I error and Type II error.

1) Type I error :- Rejecting H0 when H0 is true (false positive), i.e. rejecting a hypothesis
when it should be accepted. The probability of this error is denoted by α.
2) Type II error :- Accepting H0 when H0 is false (false negative), i.e. accepting a hypothesis
when it should be rejected. The probability of this error is denoted by β.
The Wolf Story (to remember the difference between Type 1 and Type 2 errors)
This is a popular way to remember the difference between Type 1 and Type 2 errors.
Everyone probably knows the story of the boy who cried wolf! Take the null hypothesis to be "there is
no wolf". You only need to remember that the villagers committed both Type 1 and Type 2 errors, in that order.
The 1st time, the boy cries wolf when there is none. The villagers believe him and
rush to the scene, but there is no wolf. This is a false positive, or Type 1 error.
The 2nd time, the boy cries wolf when a wolf really is there, and the villagers ignore him or don't
believe there is a wolf. This is a false negative, or Type 2 error.
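To make α and β concrete, the sketch below computes both for a right-tailed test of a mean with known σ. All numbers (µ0 = 100, σ = 15, n = 36, a true mean of 105) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 100, 15, 36, 0.05   # hypothetical test set-up
true_mean = 105                            # hypothetical value of mu under H1

standard_error = sigma / sqrt(n)
# Right-tailed test: reject H0 when the sample mean exceeds this cut-off.
cutoff = NormalDist(mu0, standard_error).inv_cdf(1 - alpha)

# Type I error: probability of rejecting H0 when mu really equals mu0.
type_1 = 1 - NormalDist(mu0, standard_error).cdf(cutoff)
# Type II error: probability of accepting H0 when mu really equals true_mean.
type_2 = NormalDist(true_mean, standard_error).cdf(cutoff)

print(round(type_1, 3), round(type_2, 3))
```

The type I probability equals the chosen α by construction, while β depends on how far the true mean lies from µ0.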

P-value
The p-value measures the "extremeness" of the sample. The p-value is the probability that we would get
the sample we have, or something more extreme, if the null hypothesis were true. Equivalently, the p-value is the
lowest level of significance at which the observed value of the test statistic is significant (i.e., one rejects
H0).
• This probability is calculated assuming that the null hypothesis is true.
• Beware:- The p-value is not the probability that H0 is true, nor is it an error probability!
• The p-value is between 0 and 1.
Using P-value as a decision tool
The P-value is the smallest value of α for which we can reject the null hypothesis H0.
Calculating P-value
Calculating the P-value depends on the alternative hypothesis.
Suppose that
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
is the computed value of the test statistic. How the P-value is computed from it, and how it is used
for testing the hypothesis, depends on whether the alternative is two tailed, right tailed or left tailed.
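A minimal sketch of the standard rules for a z statistic, under each form of alternative, is given below. The function name `p_value` and the labels for the alternatives are assumptions made for this example.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value of an observed z statistic under the given alternative hypothesis."""
    phi = NormalDist().cdf                 # standard normal cumulative distribution
    if alternative == "two-tailed":        # H1: mu != mu0
        return 2 * (1 - phi(abs(z)))
    if alternative == "right-tailed":      # H1: mu > mu0
        return 1 - phi(z)
    if alternative == "left-tailed":       # H1: mu < mu0
        return phi(z)
    raise ValueError("unknown alternative")

# Decision rule: reject H0 when the p-value is below the chosen alpha.
alpha = 0.05
for alt in ("two-tailed", "right-tailed", "left-tailed"):
    p = p_value(2.2, alt)
    print(alt, round(p, 4), "reject H0" if p < alpha else "do not reject H0")
```

The same observed z can lead to different decisions depending on the alternative, which is why H1 has to be fixed before looking at the data.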

Hypothesis Testing
A review of the key difference between population, sample and sampling distribution.
Normal Distribution
If
i. the area under a curve from the ordinate x = a to x = b is equal to the probability that x takes values
between a and b, and
ii. the total area under the curve is unity,
then the curve is a probability curve. It is called a normal curve when its probability function is given by
$$y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$
where -∞ < m < ∞ and 0 < σ² < ∞, so that
$$P(-\infty < x < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Properties of Normal distribution-


1. The normal curve is bell shaped and symmetric about x = m.
2. Mean, median and mode coincide.
3. The x-axis is an asymptote to the curve.
4. The 1st and 3rd quartiles are equidistant from the mean (approximately):
   Quartile deviation = (Q3 - Q1)/2 = (2/3)σ, Q1 = m - (2/3)σ, Q3 = m + (2/3)σ
5. Relationship between quartile deviation, mean deviation and standard deviation (approximately):
   Quartile deviation = (2/3)σ, Mean deviation = (4/5)σ
   6 Q.D. = 5 M.D. = 4 S.D.
6. The points of inflexion are at x = m ± σ.
7. Skewness is zero.

P(m - σ < X < m + σ) = 0.6826      P(-1 < Z < +1) = 0.6826
P(m - 2σ < X < m + 2σ) = 0.9544    P(-2 < Z < +2) = 0.9544
P(m - 3σ < X < m + 3σ) = 0.9973    P(-3 < Z < +3) = 0.9973
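These areas and ratios can be checked numerically. The sketch below uses Python's `statistics.NormalDist` on the standard normal curve; it also shows that the coefficients 2/3 and 4/5 used above for the quartile and mean deviations are rounded values of about 0.6745 and √(2/π) ≈ 0.7979.

```python
from math import pi, sqrt
from statistics import NormalDist

std_normal = NormalDist()  # m = 0, sigma = 1

def central_area(k):
    """P(-k < Z < k) for a standard normal variate Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

areas = {k: central_area(k) for k in (1, 2, 3)}

# Third quartile of Z: the exact multiplier that the notes round to 2/3.
q3 = std_normal.inv_cdf(0.75)
# Mean deviation of a normal variable is sigma * sqrt(2 / pi), rounded to 4/5 in the notes.
mean_deviation = sqrt(2 / pi)

for k, area in areas.items():
    print(k, round(area, 4))
print(round(q3, 4), round(mean_deviation, 4))
```

The small differences from the tabulated figures in the fourth decimal place come from rounding in printed normal tables.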
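1. The normal distribution is given by
$$y = k\, e^{-\left(\frac{x^2}{18} - x + \frac{9}{2}\right)}$$
Find 1. k 2. Mean 3. Variance 4. 1st Quartile 5. 3rd Quartile 6. Mean Deviation 7. Quartile deviation

Completing the square in the exponent,
$$y = k\, e^{-\frac{(x-9)^2}{18}}$$
Comparing with
$$y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$
we find m = 9 and 2σ² = 18, so σ = 3 and
$$k = \frac{1}{3\sqrt{2\pi}}$$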
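The remaining parts of this exercise follow from m = 9 and σ = 3. The sketch below uses the approximate relations Q.D. = (2/3)σ and M.D. = (4/5)σ from the properties listed earlier, so the quartiles and deviations are the rounded textbook values, not exact ones.

```python
from math import pi, sqrt

m, sigma = 9, 3                          # read off from the exponent -(x - 9)^2 / 18

k = 1 / (sigma * sqrt(2 * pi))           # normalising constant
variance = sigma ** 2
quartile_deviation = 2 * sigma / 3       # approximate relation Q.D. = (2/3) sigma
q1 = m - quartile_deviation              # 1st quartile
q3 = m + quartile_deviation              # 3rd quartile
mean_deviation = 4 * sigma / 5           # approximate relation M.D. = (4/5) sigma

print(round(k, 4), m, variance, q1, q3, mean_deviation, quartile_deviation)
```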

2. If the mean and standard deviation of a normally distributed variable are 12 and 16 respectively, find 1. The
points of inflexion 2. The equation of the asymptote 3. Quartile deviation 4. 1st Quartile 5. 3rd Quartile 6.
Mean Deviation
Hence we find the need for defining a Standard Normal Variate (SNV).
If X ~ N(m, σ), then
$$z = \frac{X - m}{\sigma} \sim \mathrm{SNV}(0, 1)$$
with probability function
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$

P(m - σ < X < m + σ) = 0.6826 → P(-1 < z < 1) = 0.6826
P(m - 2σ < X < m + 2σ) = 0.9544 → P(-2 < z < 2) = 0.9544
P(m - 3σ < X < m + 3σ) = 0.9973 → P(-3 < z < 3) = 0.9973
P(-1.96 < z < 1.96) = 0.95
P(-2.58 < z < 2.58) = 0.99
1. For a SNV z, if P (z≥ 1.83) = 0.0336,
find the probability that
1. z lies in between -1.83 and 0
2. z is less than 1.83
3. z is less than 1.83
4. z is greater than 1.83
5. |z|≤ 1.83 6. |z|≥ 1.83

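2. If X ~ N(12, 4), i.e. X is normal with mean 12 and standard deviation 4, find
i. P(X ≥ 20)
ii. P(X ≤ 20)
[Given that 47.72% of the items lie between z = 0 and z = 2, where z is a SNV]

3. If X ~ N(30, 5), i.e. mean 30 and standard deviation 5, and P(0 ≤ z ≤ 0.8) = 0.2881 and P(0 ≤ z ≤ 2) = 0.4772 [z ~ SNV(0, 1)],
find
i. P(26 ≤ x ≤ 40)
ii. P(|x - 30| > 5)

4. If X ~ N(20, 5), i.e. mean 20 and standard deviation 5, and P(z ≥ 1.6) = 0.0548, P(z ≤ -1.2) = 0.1151 [z ~ SNV(0, 1)], find P(12 ≤ x ≤ 14)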

5.In the F.Y.B.Com class, there are seven hundred students. It was observed that 80% of the students
obtained above 45 marks in a subject while 50% of them obtained above 57 marks. Assuming the
distribution of marks to be normally distributed, determine
i. modal marks of the students
ii the number of students who obtained between 45 and 69 marks

6. The distribution of monthly income of 3000 balwadi teachers conforms to a normal curve with mean equal to
rupees 600 and standard deviation equal to rupees 100. [For a SNV z, the area between 0 and 2 is 0.4772 and
between 0 and 1.83 is 0.4667]
Find
1. the percentage of teachers having monthly income of more than rupees 800
2. the number of teachers having monthly income less than rupees 400
3. the highest monthly income among the lowest paid hundred teachers
4. the lowest monthly income among the highest paid hundred teachers
7. The qualifying marks for a certain examination are 35, and to secure distinction one has to secure
more than 74 marks. If 25% of the students fail whereas 6.681% obtain distinction, determine the
mean and standard deviation, assuming that the distribution of marks is normal. Given that for a
standard normal variate Z, the probability that Z takes values between 0 and 1.5 is 0.43319.

8. The local authorities in a certain city installed 10,000 electric lamps in the streets of the city. If these
lamps have an average life of 1000 burning hours with a standard deviation of 200 hours, what
number of lamps might be expected to fail
1. in the first 800 hours
2. between 800 and 1200 hours

9. The incomes of a group of 10000 persons were found to be normally distributed with mean rupees
500 and standard deviation rupees 60. Find the lowest income of the richest 500. Also find the limits
within which the middle 50% of the persons earn their income. [Given that for a standard normal
variate Z the area between 0 and 1.645 is 0.45]
10. In a distribution that is exactly normal, 7% of the items are under 35 and 89% are under 63. What are the
mean and standard deviation of the distribution? Given that for a standard normal variate Z, the area
between 0 and 1.23 is 0.39 and the area between 0 and 1.48 is 0.43.
11. Of a large group of men, 5% are under 60 inches in height and 40% are between 60 and 65 inches.
Assuming a normal distribution, find the mean height and standard deviation. [P(0 < z < 0.13) = 0.05,
P(0 < z < 1.645) = 0.45]
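Two of the exercises above can be sketched with table values alone. The code below works exercise 2 (reading its X as having mean 12 and standard deviation 4, which the given area up to z = 2 supports) and exercise 10; the variable names are illustrative, and the steps are one possible route, not a prescribed solution.

```python
# Exercise 2: X has mean 12 and standard deviation 4, so X = 20 corresponds to z = 2.
z = (20 - 12) / 4
area_0_to_2 = 0.4772                 # given: area between z = 0 and z = 2
p_at_least_20 = 0.5 - area_0_to_2    # right tail beyond z = 2
p_at_most_20 = 0.5 + area_0_to_2     # everything to the left of z = 2

# Exercise 10: 7% of items lie below 35 and 89% lie below 63.
z_low = -1.48    # area 0.43 between z_low and 0 leaves 0.07 below z_low
z_high = 1.23    # area 0.39 between 0 and z_high puts 0.89 below z_high
# Solve (35 - mean) / sigma = z_low and (63 - mean) / sigma = z_high.
sigma = (63 - 35) / (z_high - z_low)
mean = 35 - z_low * sigma

print(round(p_at_least_20, 4), round(p_at_most_20, 4))
print(round(mean, 2), round(sigma, 2))
```

The same two-equation pattern applies to exercises 7 and 11, where two percentages again fix the mean and standard deviation.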
Hypothesis testing for mean when σ is known
We are going to examine two equivalent ways to perform a hypothesis test: the classical
approach and the p-value approach. The classical approach is based on standard deviations.
This method compares the test statistic (Z-score) to a critical value (Z-score) from the standard
normal table. If the test statistic falls in the rejection zone, you reject the null hypothesis.
The p-value approach is based on area under the normal curve. This method compares the area
associated with the test statistic to alpha (α), the level of significance (which is also area under
the normal curve). If the p-value is less than alpha, you would reject the null hypothesis.
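The equivalence of the two approaches can be sketched for a left-tailed test as follows. The function names and the trial z values are assumptions made for this illustration.

```python
from statistics import NormalDist

def classical_decision(z, alpha):
    """Left-tailed test: reject H0 when z falls below the critical z value."""
    critical = NormalDist().inv_cdf(alpha)
    return z < critical

def p_value_decision(z, alpha):
    """Left-tailed test: reject H0 when the area to the left of z is below alpha."""
    return NormalDist().cdf(z) < alpha

# Hypothetical test statistics; both rules give the same decision for each one.
results = [(z, classical_decision(z, 0.05), p_value_decision(z, 0.05))
           for z in (-2.5, -1.7, -1.0, 0.4)]
for z, classical, by_p_value in results:
    print(z, classical, by_p_value)
```

The classical rule compares positions on the z axis, while the p-value rule compares areas, so the two can only disagree through rounding of table values.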

Hypothesis Test about the Population Mean (μ) when the Population Standard Deviation
(σ) is Unknown
When the population standard deviation (σ) is not known, we can estimate it
with the sample standard deviation (s). However, the test statistic will no longer
follow the standard normal distribution. We must rely on Student's t-distribution with n-1
degrees of freedom. Because we use the sample standard deviation (s), the test statistic
changes from a Z-score to a t-score:
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \qquad\qquad t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
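A minimal sketch of the t-score t = (x̄ - μ) / (s/√n), computed from raw data, is shown below. The sample values are hypothetical, and the critical value still has to be read from a t table with n-1 degrees of freedom, since Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return the t-score and its degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    s = stdev(sample)                    # sample standard deviation (divisor n - 1)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical packet weights in grams, tested against mu0 = 25.
weights = [24.1, 25.3, 24.8, 24.6, 25.0, 24.4, 24.9, 24.5]
t, dof = t_statistic(weights, mu0=25)
print(round(t, 3), dof)
```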
For example
1) Afzal weighs the contents of 50 more packets of crisps and finds that the mean weight
of his sample is 24.7 g. The weight stated on the packet is 25 g and the manufacturers
claim that the weights are normally distributed with standard deviation 1 g. Can Afzal
justifiably complain that these packets are underweight?
Solution:- For this problem
H0: μ = 25 g
H1: μ < 25 g
As Afzal suspects that the crisps are underweight, he will reject the null hypothesis for
unusually low values of x̄. The critical region consists of these values at the extreme left hand
end of the distribution of x̄, which have a 5% probability in total. (This is called a one tailed
test.)
The critical value of z, which can be found from normal distribution tables, is -1.645.
Now the test statistic is
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{24.7 - 25}{1 / \sqrt{50}} = -2.12$$

This value of z is significant, as it is less than the critical value -1.645 and so falls in the critical
region for unusual values of x̄. As it is extremely unlikely under H0 (and is better explained by
H1), you can reject H0. Afzal's results are such that he has good cause to complain to the
manufacturers!
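The arithmetic of this example can be checked with a short sketch using Python's `statistics.NormalDist`. The p-value line is an addition for comparison with the classical decision and is not part of the worked solution above.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n = 24.7, 25, 1, 50   # figures from the crisps example
alpha = 0.05                                   # 5% one-tailed (left) test

z = (sample_mean - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(alpha)         # about -1.645
p_value = NormalDist().cdf(z)                  # area to the left of z
reject_h0 = z < critical

print(round(z, 2), round(critical, 3), round(p_value, 4), reject_h0)
```

Both routes agree: z lies below the critical value, and the p-value is below 0.05.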

2) A school dentist regularly inspects the teeth of children in their last year at primary
school. She keeps records of the number of decayed teeth for these 11-year-old children in
her area. Over a number of years, she has found that the number of decayed teeth is
approximately normally distributed with mean 3.4 and standard deviation 2.1. She visits
just one middle school in her rounds. The class of 28 12-year-olds at that school has a
mean of 3.0 decayed teeth. Is there any significant difference between this group and her
usual 11-year-old patients?
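One way to approach this question is a two-tailed z test of H0: μ = 3.4 against H1: μ ≠ 3.4. The sketch below assumes a 5% level of significance, which the problem does not state, and treats the standard deviation of 2.1 as known.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n = 3.0, 3.4, 2.1, 28   # figures from the dentist example
alpha = 0.05                                     # assumed level of significance

z = (sample_mean - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > critical

print(round(z, 3), round(p_value, 3), reject_h0)
```

Under these assumptions |z| is well inside the acceptance region, so the data give no evidence of a significant difference.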
This is used when s² is the sample variance. But for larger n,
$$\frac{n}{n-1} \approx 1$$
and so σ² ≈ s², and we use the test statistic
$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
