Hypothesis Testing: Nguyễn Thị Thu Thủy
HYPOTHESIS TESTING
HANOI–2021
Email: thuy.nguyenthithu2@hust.edu.vn
Nguyễn Thị Thu Thủy (SAMI-HUST) ProSta–Chapter 6 HANOI–2021
6.1 Introduction
Content
1 6.1 Introduction
6.1.1 Key terms and concepts
6.1.2 Statistical decision for hypothesis testing
5 Problems
Introduction
Hypothesis testing was developed by Ronald Fisher, Jerzy Neyman, Karl Pearson and
Pearson’s son, Egon Pearson. It is a statistical method for making decisions about a
population from experimental (sample) data. A statistical hypothesis is an assumption
that we make about a population parameter; a hypothesis test uses the data to decide
between two competing hypotheses, the null hypothesis H0 and the alternative hypothesis H1.
Examples
H0 : µ1 = µ2 , which shows that there is no difference between the two population
means.
H1 : µ1 ≠ µ2 or H1 : µ1 > µ2 or H1 : µ1 < µ2 .
Conclusions
As you will soon see, it is easier to show support for the alternative hypothesis by
proving that the null hypothesis is false. Hence, the statistical researcher always begins
by assuming that the null hypothesis H0 is true. The researcher then uses the sample
data to decide whether the evidence favors H1 rather than H0 and draws one of these
two conclusions:
Reject H0 and conclude that H1 is true.
Do not reject H0: the sample does not provide sufficient evidence against it (this is
often loosely called “accepting” H0).
Level of significance
The level of significance α is the probability of rejecting the null hypothesis when
it is in fact true (a Type I error): P [Type I error] = α.
No decision based on a sample can be 100% certain, so we fix α in advance at a
small value, usually 5%.
Type II error
Occurs when we accept (do not reject) the null hypothesis although it is false.
P [Type II error] = β.
When the sampling distribution of the test statistic under H1 is drawn, the area under
the curve that falls in the acceptance region is called the β region.
Power
The probability of correctly rejecting the null hypothesis when it is false. 1 − β is
called the power of the test.
Note
                    H0 is true          H0 is false
Do not reject H0    Correct decision    Type II error
Reject H0           Type I error        Correct decision
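The relationship between α, β and power can be made concrete with a short Python sketch. It assumes a right-tailed z-test with known σ, and the numbers in the example call (µ0 = 100, true mean 105, σ = 15, n = 36) are hypothetical values chosen only for illustration; the function name is likewise an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_power(mu0, mu1, sigma, n, alpha):
    """Return (beta, power) for H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1.

    Assumes a z-test with known population standard deviation sigma.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value of the test
    shift = (mu1 - mu0) * sqrt(n) / sigma     # mean of the z statistic when mu = mu1
    beta = std_normal.cdf(z_alpha - shift)    # P(do not reject H0 | mu = mu1)
    return beta, 1 - beta


# Hypothetical numbers, used only to illustrate the calculation.
beta, power = right_tailed_z_power(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

When the true mean equals µ0, the same function returns a power equal to α, which matches the Type I error cell of the table.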
One-tailed test
A one-tailed test is a statistical test in which the critical area of a distribution is
one-sided so that it is either greater than or less than a certain value, but not both.
Two-tailed test
A two-tailed test is a statistical test in which the critical area of a distribution is
two-sided, so that it detects whether a parameter is either greater than or less than the
hypothesized value.
Examples
1 One-tailed test:
Left-tailed test: H0 : µ1 = µ2 , H1 : µ1 < µ2 ;
Right-tailed test: H0 : µ1 = µ2 , H1 : µ1 > µ2 .
2 Two-tailed test: H0 : µ1 = µ2 , H1 : µ1 ≠ µ2 .
Rejection region
The rejection region is the set of values of the test statistic for which the null
hypothesis is rejected.
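As a minimal sketch, the rejection region of a left-tailed, right-tailed or two-tailed test can be computed as follows. The sketch assumes the test statistic is standard normal under H0; the function names and the `tail` labels are illustrative choices, not notation from these slides.

```python
from statistics import NormalDist


def z_rejection_region(alpha, tail):
    """Return (lower, upper) critical values; None means that tail is not used."""
    std_normal = NormalDist()
    if tail == "left":
        return (-std_normal.inv_cdf(1 - alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def in_rejection_region(z, region):
    """True if the observed statistic z falls in the rejection region."""
    lower, upper = region
    return (lower is not None and z < lower) or (upper is not None and z > upper)


for tail in ("left", "right", "two"):
    print(tail, z_rejection_region(0.05, tail))
```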
Example 1
You wish to show that the average hourly wage of carpenters in the state of
California is different from $14, which is the national average. This is the alternative
hypothesis, written as
H1 : µ ≠ 14.
The null hypothesis is
H0 : µ = 14.
You would like to reject the null hypothesis, thus concluding that the California
mean is not equal to $14.
Example 2
H1 : p < 0.03
- We should first describe the assumptions on which the experiment is based. The
model for the underlying situation centers around an experiment with X1 , X2 , . . . , Xn
representing a random sample from a distribution with mean µ and variance σ² > 0.
Consider first the hypothesis
H0 : µ = µ0 , H1 : µ ≠ µ0 .
- The appropriate test statistic should be based on the sample mean X̄. In Chapter
4, the Central Limit Theorem was introduced, which essentially states that regardless of
the distribution of X, the random variable X̄ has approximately a normal distribution with
mean µ and variance σ²/n for reasonably large sample sizes. So, µ_X̄ = µ and
σ²_X̄ = σ²/n. Under H0 the standardized statistic Z = (X̄ − µ0)√n/σ is therefore
approximately N (0; 1).
Two-tailed test: H0 : µ = µ0 , H1 : µ ≠ µ0 .
Right-tailed test: H0 : µ = µ0 , H1 : µ > µ0 .
Left-tailed test: H0 : µ = µ0 , H1 : µ < µ0 .
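The claim that the sample mean has mean µ and variance σ²/n can be checked empirically. The following is a small simulation sketch; the exponential population (mean 1, variance 1), the sample size n = 40, the number of replications and the seed are all assumptions picked for illustration.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)            # fixed seed so the run is repeatable
n, replications = 40, 5000

# Exponential(1) population: mean 1, variance 1, clearly non-normal.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = fmean(sample_means)      # should be close to mu = 1
var_of_means = pvariance(sample_means)   # should be close to sigma^2 / n
print(f"mean of sample means:     {mean_of_means:.4f}")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2/n = {1 / n:.4f})")
```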
Example 3
The average weekly earnings for female social workers are $670. Do men in the same
positions have average weekly earnings that are higher than those for women? A
random sample of n = 40 male social workers showed x̄ = $725. Assuming a
population standard deviation of $102, test the appropriate hypothesis using
α = 0.01.
Solution
- You would like to show that the average weekly earnings for men are higher than
$670, the women’s average. Hence, if µ is the average weekly earnings for male social
workers, you can set out the formal test of hypothesis in steps.
Solution (continued)
1 Null and alternative hypotheses: H0 : µ = 670 versus H1 : µ > 670 (one-tailed
test).
2 Using the sample information, calculate
$$z = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n} = \frac{725 - 670}{102}\sqrt{40} = 3.41.$$
3 Rejection region: For this one-tailed test with α = 0.01, reject H0 if z > zα = 2.33
(shown in Figure 3).
4 Compare the observed value of the test statistic, z = 3.41, with the critical value
necessary for rejection, zα = 2.33. Since the observed value of the test statistic
falls in the rejection region, you can reject H0 .
5 Conclusion: The average weekly earnings for male social workers are higher than
the average for female social workers. The probability of rejecting H0 when it is in
fact true is α = 0.01.
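The steps of Example 3 can be reproduced with a few lines of Python. This is only a sketch: the function name `one_sample_z` is an illustrative choice, and the critical value is computed with the standard library instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu0, sigma, n):
    """Standardized test statistic for a mean when sigma is known."""
    return (x_bar - mu0) / sigma * sqrt(n)


alpha = 0.01
z = one_sample_z(x_bar=725, mu0=670, sigma=102, n=40)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
reject_h0 = z > z_alpha

print(f"z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject_h0}")
```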
- Values of x̄ that are either “too large” or “too small” in terms of their distance from
µ0 are placed in the rejection region. If you choose α = 0.01, the area in the rejection
region is equally divided between the two tails of the normal distribution (α/2 = 0.005
in each tail), as shown in Figure 4. Using the standardized test statistic z, you can
reject H0 if z > 2.58 or z < −2.58. For different values of α, the critical values of z
that separate the rejection and acceptance regions will change accordingly.
Table of values of the standard normal CDF Φ(x). Rows give x to one decimal place and
columns give the second decimal digit of x; each entry is the five digits after "0."
(for example, Φ(1.96) = 0.97500).
x     0      1      2      3      4      5      6      7      8      9
0.0   50000  50399  50798  51197  51595  51994  52392  52790  53188  53586
0.1   53983  54380  54776  55172  55567  55962  56356  56749  57142  57535
0.2   57926  58317  58706  59095  59483  59871  60257  60642  61026  61409
0.3   61791  62172  62552  62930  63307  63683  64058  64431  64803  65173
0.4   65542  65910  66276  66640  67003  67364  67724  68082  68439  68793
0.5   69146  69497  69847  70194  70540  70884  71226  71566  71904  72240
0.6   72575  72907  73237  73565  73891  74215  74537  74857  75175  75490
0.7   75804  76115  76424  76730  77035  77337  77637  77935  78230  78524
0.8   78814  79103  79389  79673  79955  80234  80511  80785  81057  81327
0.9   81594  81859  82121  82381  82639  82894  83147  83398  83646  83891
1.0   84134  84375  84614  84850  85083  85314  85543  85769  85993  86214
1.1   86433  86650  86864  87076  87286  87493  87698  87900  88100  88298
1.2   88493  88686  88877  89065  89251  89435  89617  89796  89973  90147
1.3   90320  90490  90658  90824  90988  91149  91309  91466  91621  91774
1.4   91924  92073  92220  92364  92507  92647  92786  92922  93056  93189
1.5   93319  93448  93574  93699  93822  93943  94062  94179  94295  94408
1.6   94520  94630  94738  94845  94950  95053  95154  95254  95352  95449
1.7   95543  95637  95728  95818  95907  95994  96080  96164  96246  96327
1.8   96407  96485  96562  96638  96712  96784  96856  96926  96995  97062
1.9   97128  97193  97257  97320  97381  97441  97500  97558  97615  97670
2.0   97725  97778  97831  97882  97932  97982  98030  98077  98124  98169
2.1   98214  98257  98300  98341  98382  98422  98461  98500  98537  98574
2.2   98610  98645  98679  98713  98745  98778  98809  98840  98870  98899
2.3   98928  98956  98983  99010  99036  99061  99086  99111  99134  99158
2.4   99180  99202  99224  99245  99266  99286  99305  99324  99343  99361
2.5   99379  99396  99413  99430  99446  99461  99477  99492  99506  99520
2.6   99534  99547  99560  99573  99585  99598  99609  99621  99632  99643
2.7   99653  99664  99674  99683  99693  99702  99711  99720  99728  99736
2.8   99744  99752  99760  99767  99774  99781  99788  99795  99801  99807
2.9   99813  99819  99825  99831  99836  99841  99846  99851  99856  99861
For larger x, the full value of Φ(x) is listed:
3.0  0.99865    3.1  0.99903    3.2  0.99931    3.3  0.99952    3.4  0.99966
3.5  0.99977    3.6  0.99984    3.7  0.99989    3.8  0.99993    3.9  0.99995
4.0  0.999968   4.5  0.999997   5.0  0.9999997
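Individual table entries can be checked numerically. The sketch below evaluates Φ(x) through the error function, using the identity Φ(x) = (1 + erf(x/√2))/2; the function name `phi` is an illustrative choice.

```python
from math import erf, sqrt


def phi(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Print a few values in the same five-decimal format as the table.
for x in (0.0, 1.0, 1.96, 2.33, 2.58):
    print(f"{x:4.2f}  {phi(x):.5f}")
```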
H0 : µ = µ0 , H1 : µ ≠ µ0 .
H0 : µ = µ0 , H1 : µ > µ0 .
H0 : µ = µ0 , H1 : µ < µ0 .
Example 4
A local telephone company claims that the average length of a phone call is 8
minutes. In a random sample of 18 phone calls, the sample mean was 7.8 minutes
and the standard deviation was 0.5 minutes. Is there enough evidence to support
this claim at α = 0.05?
Solution
1 Null and alternative hypotheses: H0 : µ = 8 versus H1 : µ ≠ 8. The test is a
two-tailed test.
2 α = 0.05, critical region: t < −2.110 or t > 2.110, where
$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{n}$$
with 17 degrees of freedom (see Table t-distribution).
3 Computations: x̄ = 7.8, s = 0.5, and n = 18. Hence,
$$t = \frac{7.8 - 8}{0.5}\sqrt{18} = -1.70.$$
4 Decision: Do not reject H0 , since −2.110 < −1.70 < 2.110.
5 At the 5% level of significance, there is not enough evidence to reject the claim
that the average length of a phone call is 8 minutes.
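The computation in Example 4 can be sketched in Python as follows. The Python standard library has no t-distribution, so the critical value 2.110 is simply copied from the t-table (17 degrees of freedom, upper-tail area 0.025); the function name is an illustrative choice.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for a mean when sigma is unknown and estimated by s."""
    return (x_bar - mu0) / s * sqrt(n)


t = one_sample_t(x_bar=7.8, mu0=8, s=0.5, n=18)
t_critical = 2.110                  # from the t-table: 17 d.f., upper-tail area 0.025
reject_h0 = abs(t) > t_critical     # two-tailed test at alpha = 0.05

print(f"t = {t:.2f}, critical value = {t_critical}, reject H0: {reject_h0}")
```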
Note
If n ≥ 30, T is approximately N (0; 1), so the critical values can be taken from the
standard normal table.
Example 5
The daily yield for a local chemical plant has averaged 880 tons for the last several
years. The quality control manager would like to know whether this average has
changed in recent months. She randomly selects 50 days from the computer database
and computes the average and standard deviation of the n = 50 yields as x̄ = 871
tons and s = 21 tons, respectively. Test the appropriate hypothesis using α = 0.05.
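One possible way to carry out the test in Example 5 is sketched below. It assumes a two-tailed test (the question asks whether the average has changed) and, since n = 50 ≥ 30, uses the normal approximation with s in place of σ; it is offered as a check of the arithmetic, not as the official solution.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, s, n, alpha = 871, 880, 21, 50, 0.05

z = (x_bar - mu0) / s * sqrt(n)                      # s replaces sigma for large n
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject_h0 = abs(z) > z_half_alpha

print(f"z = {z:.2f}, critical values = ±{z_half_alpha:.2f}, reject H0: {reject_h0}")
```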
t-Distribution
Critical values t_α of the t-distribution: each entry is the value exceeded with
probability α (upper-tail area) for the given degrees of freedom (d.f.).
d.f.     α = 0.10    0.05     0.025    0.01     0.005    0.001     0.0005
1        3.078       6.314    12.706   31.821   63.657   318.309   636.619
2        1.886       2.920    4.303    6.965    9.925    22.327    31.600
3        1.638       2.353    3.182    4.541    5.841    10.215    12.924
4        1.533       2.132    2.776    3.747    4.604    7.173     8.610
5        1.476       2.015    2.571    3.365    4.032    5.893     6.869
6        1.440       1.943    2.447    3.143    3.707    5.208     5.959
7        1.415       1.895    2.365    2.998    3.499    4.785     5.408
8        1.397       1.860    2.306    2.896    3.355    4.501     5.041
9        1.383       1.833    2.262    2.821    3.250    4.297     4.781
10       1.372       1.812    2.228    2.764    3.169    4.144     4.587
11       1.363       1.796    2.201    2.718    3.106    4.025     4.437
12       1.356       1.782    2.179    2.681    3.055    3.930     4.318
13       1.350       1.771    2.160    2.650    3.012    3.852     4.221
14       1.345       1.761    2.145    2.624    2.977    3.787     4.140
15       1.341       1.753    2.131    2.602    2.947    3.733     4.073
16       1.337       1.746    2.120    2.583    2.921    3.686     4.015
17       1.333       1.740    2.110    2.567    2.898    3.646     3.965
18       1.330       1.734    2.101    2.552    2.878    3.610     3.922
19       1.328       1.729    2.093    2.539    2.861    3.579     3.883
20       1.325       1.725    2.086    2.528    2.845    3.552     3.850
21       1.323       1.721    2.080    2.518    2.831    3.527     3.819
22       1.321       1.717    2.074    2.508    2.819    3.505     3.792
23       1.319       1.714    2.069    2.500    2.807    3.485     3.767
24       1.318       1.711    2.064    2.492    2.797    3.467     3.745
25       1.316       1.708    2.060    2.485    2.787    3.450     3.725
26       1.315       1.706    2.056    2.479    2.779    3.435     3.707
27       1.314       1.703    2.052    2.473    2.771    3.421     3.690
28       1.313       1.701    2.048    2.467    2.763    3.408     3.674
29       1.311       1.699    2.045    2.462    2.756    3.396     3.659
+∞       1.282       1.645    1.960    2.326    2.576    3.090     3.291
- The z-test for a population proportion is a statistical test for a population
proportion p. The z-test can be used when a binomial distribution is given such that
np ≥ 5 and n(1 − p) ≥ 5.
- The test statistic is the sample proportion p̂ and the standardized test statistic is z:
$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1 - p)}}\sqrt{n},$$
where p is the value p0 specified by H0 .
H0 : p = p0 , H1 : p ≠ p0 .
H0 : p = p0 , H1 : p > p0 .
H0 : p = p0 , H1 : p < p0 .
Example 6
A college claims that more than 94% of their graduates find employment within 6
months of graduation. In a sample of 500 randomly selected graduates, 475 of them
were employed. Is there enough evidence to support the college’s claim at a 1% level
of significance?
Solution
- Verify np0 ≥ 5 and n(1 − p0 ) ≥ 5: np0 = (500)(0.94) = 470;
n(1 − p0 ) = (500)(0.06) = 30. Both are at least 5, so the normal approximation can be used.
1 Null and alternative hypotheses: H0 : p = 0.94 versus H1 : p > 0.94 (right-tailed
test).
2 Computations: p̂ = 475/500 = 0.95,
$$z = \frac{0.95 - 0.94}{\sqrt{0.94 \times 0.06}}\sqrt{500} = 0.94.$$
3 The critical value: zα = z0.01 = 2.33 (see the table of values of the standard normal
CDF $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$).
4 Decision: z = 0.94 < 2.33 = zα , so H0 is not rejected.
5 At the 1% level of significance, there is not enough evidence to support the
college’s claim.
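Example 6 can be reproduced with a short Python sketch. The function name `one_proportion_z` and the decision to raise an error when the np0 ≥ 5 and n(1 − p0) ≥ 5 conditions fail are illustrative choices made here.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, after checking the normal-approximation conditions."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("np0 and n(1 - p0) must both be at least 5")
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0)) * sqrt(n)


alpha = 0.01
z = one_proportion_z(successes=475, n=500, p0=0.94)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
reject_h0 = z > z_alpha

print(f"z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject_h0}")
```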
Example 7
Solution
- Verify np0 and n(1 − p0 ) are at least 5. np0 = (100)(0.125) = 12.5;
n(1 − p0 ) = (100)(0.875) = 87.5.
1 Null and alternative hypotheses: H0 : p = 0.125; H1 : p ≠ 0.125 (two-tailed test).
2 Computations: p̂ = 5/100 = 0.05,
$$z = \frac{0.05 - 0.125}{\sqrt{(0.125)(0.875)}}\sqrt{100} = -2.27.$$
Example 8
A high school math teacher claims that students in her class will score higher on the
math portion of the ACT than students in a colleague’s math class. The mean ACT
math score for 49 students in her class is 22.1 and the sample standard deviation is
4.8. The mean ACT math score for 44 of the colleague’s students is 19.8 and the
sample standard deviation is 5.4. At α = 0.10, can the teacher’s claim be supported?
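One way to approach Example 8 is sketched below. Both samples are large (49 and 44), so the sketch assumes a right-tailed two-sample z-test in which the sample standard deviations stand in for the population values; this choice of test is an assumption made here, since the slides do not show the solution.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar1, s1, n1, x_bar2, s2, n2):
    """z statistic for H0: mu1 = mu2 with large independent samples."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x_bar1 - x_bar2) / standard_error


alpha = 0.10
z = two_sample_z(x_bar1=22.1, s1=4.8, n1=49, x_bar2=19.8, s2=5.4, n2=44)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tailed test, H1: mu1 > mu2
reject_h0 = z > z_alpha

print(f"z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject_h0}")
```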
Three conditions are necessary to use a t-test for small independent samples.
1 The samples must be randomly selected.
2 The samples must be independent.
3 Each population must have a normal distribution and population variances are
equal.
Example 9
Example 9 Solution
Let µ1 and µ2 represent the population means of annual incomes in Brownsville and
Greensville, respectively.
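1 State the claim mathematically. H0 : µ1 = µ2 , H1 : µ1 ≠ µ2 .
2 The standard error is
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$= \sqrt{\frac{(17 - 1)7800^2 + (18 - 1)7375^2}{17 + 18 - 2}}\,\sqrt{\frac{1}{17} + \frac{1}{18}} = 2564.92.$$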
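The pooled standard error can be checked with a few lines of Python. This is a sketch under the equal-variance assumption stated in the conditions for the small-sample t-test; the function name is an illustrative choice, and only the standard error is computed because the sample means of Example 9 are not given here.

```python
from math import sqrt


def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of x_bar1 - x_bar2 using the pooled variance estimate."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(pooled_variance) * sqrt(1 / n1 + 1 / n2)


se = pooled_standard_error(s1=7800, n1=17, s2=7375, n2=18)
print(f"standard error = {se:.2f}")
```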
Example 10
A recent survey stated that male college students smoke less than female college
students. In a survey of 1245 male students, 361 said they smoke at least one pack of
cigarettes a day. In a survey of 1065 female students, 341 said they smoke at least
one pack a day. At α = 0.01, can you support the claim that the proportion of male
college students who smoke at least one pack of cigarettes a day is lower than the
proportion of female college students who smoke at least one pack a day?
Example 10 Solution
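Let p1 and p2 represent the population proportions of male and female college
students, respectively, who smoke at least one pack of cigarettes a day.
H0 : p1 = p2 , H1 : p1 < p2 .
Calculate p̂1 = 361/1245 ≃ 0.29, p̂2 = 341/1065 ≃ 0.32, and
$$\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{361 + 341}{1245 + 1065} = \frac{702}{2310} \simeq 0.304 \quad\text{and}\quad 1 - \bar{p} = 0.696.$$
Because 1245 × 0.304, 1245 × 0.696, 1065 × 0.304, and 1065 × 0.696 are all at
least 5, we can use a two-sample z-test.
$$z = \frac{0.29 - 0.32}{\sqrt{0.304 \times 0.696 \times \left(\frac{1}{1245} + \frac{1}{1065}\right)}} \simeq -1.56.$$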
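The calculation can be sketched in Python as follows. The code keeps full precision instead of rounding the proportions to 0.29, 0.32 and 0.304, so it gives a value close to, but not identical to, the hand-computed −1.56. The comparison with the left-tailed critical value for α = 0.01 is added here as an assumed final step.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / standard_error


alpha = 0.01
z = two_proportion_z(x1=361, n1=1245, x2=341, n2=1065)
z_critical = -NormalDist().inv_cdf(1 - alpha)   # left-tailed test, H1: p1 < p2
reject_h0 = z < z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject_h0}")
```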
Problems
Problem 1
Problem 2
A local telephone company claims that the average length of a phone call is 8
minutes. In a random sample of 18 phone calls, the sample mean was 7.8 minutes
and the standard deviation was 0.5 minutes. Is there enough evidence to support
this claim at α = 0.05?
Problem 3
A marketing expert for a pasta-making company believes that 40% of pasta lovers
prefer lasagna. If 9 out of 20 pasta lovers choose lasagna over other pastas, what can
be concluded about the expert’s claim? Use a 0.05 level of significance.
Problem 4
It is believed that at least 60% of the residents in a certain area favor an annexation
suit by a neighboring city. What conclusion would you draw if only 110 in a sample
of 200 voters favored the suit? Use a 0.05 level of significance.
Problem 5
To determine whether car ownership affects a student’s academic achievement, two
random samples of 100 male students were each drawn from the student body. The
grade point average for the n1 = 100 non-owners of cars had an average and variance
equal to x̄1 = 2.70 and s1² = 0.36, while x̄2 = 2.54 and s2² = 0.40 for the n2 = 100 car
owners. Do the data present sufficient evidence to indicate a difference in the mean
achievements between car owners and nonowners of cars? Test using α = 0.05.
Problem 6
A manufacturer claims that the average tensile strength of thread A exceeds the
average tensile strength of thread B by at least 12 kilograms. To test this claim, 50
pieces of each type of thread were tested under similar conditions. Type A thread
had an average tensile strength of 86.7 kilograms with a standard deviation of 6.28
kilograms, while type B thread had an average tensile strength of 77.8 kilograms
with a standard deviation of 5.61 kilograms. Test the manufacturer’s claim using a
0.05 level of significance.
Problem 7
Engineers at a large automobile manufacturing company are trying to decide
whether to purchase brand A or brand B tires for the company’s new models. To
help them arrive at a decision, an experiment is conducted using 12 of each brand.
The tires are run until they wear out. The results are as follows:
Brand A: x̄A = 37,900 kilometers, sA = 5100 kilometers.
Brand B: x̄B = 39,800 kilometers, sB = 5900 kilometers.
Test the hypothesis that there is no difference in the average wear of the two brands
of tires. Assume the populations to be approximately normally distributed with
equal variances. Use a 0.01 level of significance.
Problem 8
A recent survey stated that male college students smoke less than female college
students. In a survey of 1245 male students, 361 said they smoke at least one pack of
cigarettes a day. In a survey of 1065 female students, 341 said they smoke at least
one pack a day. At α = 0.01, can you support the claim that the proportion of male
college students who smoke at least one pack of cigarettes a day is lower than the
proportion of female college students who smoke at least one pack a day?
Problem 9
In a study to estimate the proportion of residents in a certain city and its suburbs
who favor the construction of a nuclear power plant, it is found that 63 of 100 urban
residents favor the construction while only 59 of 125 suburban residents are in favor.
Is there a significant difference between the proportions of urban and suburban
residents who favor the construction of the nuclear plant? Use a 0.01 level of
significance.