Lecture 8
University of Gondar,
May, 2019
Objectives
Hypothesis testing
◼ Z-test
◼ T-test
Testing associations
◼ Chi-Square test
Introduction # 1
Sampling Distribution
Standard deviation and Standard error
$p = \frac{x}{n}$
Example
Some BLUE estimators
Interval Estimation
The confidence level is the probability that the interval estimate will
contain the parameter, assuming that a large number of samples are
selected and that the estimation process on the same parameter is
repeated.
Confidence intervals…
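$\left[\ \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\ \right]$
$\left[\ p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ \ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\ \right]$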
Interval estimation
Confidence intervals…
The 95% confidence interval is calculated in such a way that, under the
conditions assumed for the underlying distribution, the interval will contain
the true population parameter 95% of the time.
Loosely speaking, you might interpret a 95% confidence interval as one which
you are 95% confident contains the true parameter.
A 90% CI is narrower than a 95% CI, since we are only 90% certain that the
interval includes the population parameter.
On the other hand, a 99% CI will be wider than a 95% CI; the extra width means
that we can be more certain that the interval will contain the population
parameter. To obtain a higher confidence from the same sample, we must be
willing to accept a larger margin of error (a wider interval).
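As a rough illustration of why a higher confidence level gives a wider interval, the sketch below computes the two-sided critical value z_{α/2} for three confidence levels using Python's standard library. The function name `z_critical` is our own and is not part of the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# The half-width of the interval is z_{alpha/2} * sigma / sqrt(n), so for a
# fixed sample a larger critical value means a wider interval.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z = {z_critical(level):.3f}")
```

Since only the critical value changes between levels, the interval widths are ordered in the same way as the critical values.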
Confidence intervals…
Confidence interval for a single mean
CI = $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Standard normal table (area between 0 and z)
z   0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
Confidence interval ……
[Figure: density f(t) of the t distribution, with t = ±1.372 and t = ±2.228 marked (the df = 10 row of the table); Area = 0.025 in each tail beyond ±2.228]

t table: critical values by right-tail area
df    t0.100  t0.050  t0.025  t0.010  t0.005
9     1.383   1.833   2.262   2.821   3.250
10    1.372   1.812   2.228   2.764   3.169
11    1.363   1.796   2.201   2.718   3.106
12    1.356   1.782   2.179   2.681   3.055
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
26    1.315   1.706   2.056   2.479   2.779
27    1.314   1.703   2.052   2.473   2.771
28    1.313   1.701   2.048   2.467   2.763
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

Whenever σ is not known (and the population is assumed normal), the correct
distribution to use is the t distribution with n-1 degrees of freedom.
Note, however, that for large degrees of freedom, the t distribution is
approximated well by the Z distribution.
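The convergence of t toward Z can be checked numerically. The sketch below compares a few 0.025 right-tail t critical values, copied by hand from the t table, with the corresponding standard normal value from Python's standard library. The dictionary name `T_025` is our own.

```python
from statistics import NormalDist

# 0.025 right-tail t critical values copied from the t table (df -> value).
T_025 = {9: 2.262, 30: 2.042, 60: 2.000, 120: 1.980}

# Standard normal counterpart, z_{0.025}.
z_025 = NormalDist().inv_cdf(0.975)

for df, t_value in T_025.items():
    print(f"df = {df:>3}: t = {t_value:.3f}, t - z = {t_value - z_025:.3f}")
```

The gap between t and z shrinks as the degrees of freedom grow, which is why the last row of the table coincides with the Z values.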
Point and Interval Estimation of the Population Proportion (p)
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.295}{16} = 0.01844$
Construct 90%, 95% and 98% confidence intervals for the mean:
90%: (0.01844 − 1.65 × 0.0123/4, 0.01844 + 1.65 × 0.0123/4) = (0.0134, 0.0235)
95%: (0.01844 − 1.96 × 0.0123/4, 0.01844 + 1.96 × 0.0123/4) = (0.0124, 0.0245)
98%: (0.01844 − 2.33 × 0.0123/4, 0.01844 + 2.33 × 0.0123/4) = (0.0113, 0.0256)
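The three intervals above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `z_interval` is our own, and 0.0123 is treated as a known standard deviation because that is how it is used in the arithmetic above.

```python
import math

def z_interval(mean, sigma, n, z):
    """Return (lower, upper) for mean +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

x_bar = 0.295 / 16   # sample mean from the example
sigma = 0.0123       # assumed known standard deviation (taken from the arithmetic above)
n = 16

# z values are the rounded table values used in the example.
for level, z in ((90, 1.65), (95, 1.96), (98, 2.33)):
    lower, upper = z_interval(x_bar, sigma, n, z)
    print(f"{level}% CI: ({lower:.4f}, {upper:.4f})")
```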
Example 2
From the t table with df = n − 1 = 14 (row: 14  1.345  1.761  2.145  2.624  2.977), $t_{0.025} = 2.145$.
The corresponding confidence interval or interval estimate is:
$\bar{x} \pm t_{0.025}\,\frac{s}{\sqrt{n}} = 10.37 \pm 2.145 \times \frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$
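The same interval can be checked in Python. This is a sketch under two stated assumptions: the critical value 2.145 is read from the t table (df = 14, tail area 0.025) rather than computed, and s = 3.5 is inferred from the slide's arithmetic (margin 1.94 with n = 15). The helper name `t_interval` is our own.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Values from Example 2; s = 3.5 is an assumption inferred from the margin 1.94.
lower, upper = t_interval(mean=10.37, s=3.5, n=15, t_crit=2.145)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```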
Example 3:
n = 300
Estimate the proportion of seat belt use in the city at the 95% confidence level.
CI = $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$ = (0.35, 0.47)
Example 4:
In a sample of 400 people who were questioned regarding their participation in sports,
160 said that they did participate. Construct a 98% confidence interval for P, the
proportion of people in the population who participate in sports.
Solution:
Let X be the number of people in the sample who participate in sports.
X = 160, n = 400, p = X/n = 0.40, α = 0.02. Hence
$z_{\alpha/2} = z_{0.01} = 2.33$
$p \pm z_{\alpha/2}\sqrt{p(1-p)/n} = 0.40 \pm 2.33\sqrt{0.40 \times 0.60/400} = 0.40 \pm 0.057 = (0.343,\ 0.457)$
Hence, we can be about 98% confident that the true proportion of people in
the population who participate in sports is between 34.3% and 45.7%.
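To check the arithmetic of Example 4, the interval for a proportion can be sketched as follows. The helper name `proportion_interval` is our own, and z = 2.33 is the rounded table value used in the solution.

```python
import math

def proportion_interval(x, n, z):
    """Return (lower, upper) for p +/- z * sqrt(p * (1 - p) / n), with p = x / n."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Example 4: 160 of 400 respondents participate in sports; 98% confidence.
lower, upper = proportion_interval(x=160, n=400, z=2.33)
print(f"98% CI: ({lower:.3f}, {upper:.3f})")
```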
HYPOTHESIS TESTING
Introduction
Researchers are interested in answering many types of
questions. For example, a physician might want to know
whether a new medication will lower a person’s blood
pressure.
Examples
1. Identify the null hypothesis H0 and the alternate hypothesis HA.
2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for a larger number of samples, a z statistic can work well if the data are normally distributed.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision.
6. Conclusion
Test Statistics
$\text{Test statistic} = \frac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$
The known distributions are the normal distribution, Student’s t distribution, the
chi-square distribution, ….
Critical value
The critical value separates the critical region from the noncritical region for
a given level of significance.
Decision making
H0: μ = μ0
H1: μ ≠ μ0
Two tailed test: the rejection region is split between the two tails, with area α/2 in each.
Hypothesis testing about a Population mean (μ)
2. $H_0: \mu = \mu_0$
   $H_A: \mu < \mu_0$
$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, $z_{tabulated} = z_{\alpha}$ for one tailed test
Decision: if $z_{cal} < -z_{tab}$, reject $H_0$
          if $z_{cal} \ge -z_{tab}$, do not reject $H_0$
3. $H_0: \mu = \mu_0$
   $H_A: \mu > \mu_0$
Decision: if $z_{cal} > z_{tab}$, reject $H_0$
          if $z_{cal} \le z_{tab}$, do not reject $H_0$
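The one-tailed decision rules above can be sketched in Python as follows. The function name `z_test`, its parameters, and the numbers in the usage lines are hypothetical and chosen only for illustration; the tabulated value is taken from the standard library's normal distribution instead of a printed table.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="left"):
    """One-tailed z-test for a mean; returns (z_cal, z_tab, reject_H0)."""
    z_cal = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_tab = NormalDist().inv_cdf(1 - alpha)   # z_alpha for a one-tailed test
    if tail == "left":        # H_A: mu < mu0, reject if z_cal < -z_tab
        reject = z_cal < -z_tab
    elif tail == "right":     # H_A: mu > mu0, reject if z_cal > z_tab
        reject = z_cal > z_tab
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return z_cal, z_tab, reject

# Hypothetical numbers, chosen only to exercise the function.
z_cal, z_tab, reject = z_test(x_bar=48.0, mu0=50.0, sigma=6.0, n=36, tail="left")
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.3f}, reject H0: {reject}")
```

Calling the same function with `tail="right"` on these numbers does not reject H0, because the sample mean lies below the hypothesized mean.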
The P- Value
When the p-value is less than 0.05, we often say that the
result is statistically significant.
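As an illustration, a two-sided p-value for a z statistic can be computed from the standard normal distribution. This is a minimal sketch: the function name `two_sided_p_value` and the z values in the loop are our own illustrative choices, not values from the lecture.

```python
from statistics import NormalDist

def two_sided_p_value(z_cal):
    """P(|Z| >= |z_cal|) under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z_cal)))

# Illustrative z values only.
for z in (1.0, 1.96, 2.58):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, significant at 0.05: {p < 0.05}")
```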
Hypothesis testing for single population mean
If the sample size is small (if np<5 or n(1-p)<5) then use Student’s
t statistic for the tabulated value of the test statistic.
Chi-square test
The null hypothesis for this test is that there is no association between the
variables. Consequently, a significant p-value is evidence of an association.
Test of Association
Additionally, the chi-squared test should not be used when the expected value
in a cell is <5. It is, at times, not inappropriate to pad an empty cell with a
small value, though, as one can only assume the result would be more
significant with no value there.
Test Statistic: χ2-test with d.f. = (r-1)×(c-1)
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$E_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{grand total}} = \frac{R_i C_j}{n}$
Oij = observed frequency, Eij = expected frequency of the cell at the
juncture of the i th row and j th column
Chi-square test...
= 153.40
Chi-square table
Right tail areas for the Chi-square Distribution
df\area .995 .990 .975 .950 .900 .750 .500 .250 .100 .050 .025 .010 .005
1 0.00004 0.00016 0.00098 0.00393 0.01579 0.10153 0.45494 1.32330 2.70554 3.84146 5.02389 6.63490 7.87944
2 0.01003 0.02010 0.05064 0.10259 0.21072 0.57536 1.38629 2.77259 4.60517 5.99146 7.37776 9.21034 10.5966
3 0.07172 0.11483 0.21580 0.35185 0.58437 1.21253 2.36597 4.10834 6.25139 7.81473 9.34840 11.3448 12.8381
4 0.20699 0.29711 0.48442 0.71072 1.06362 1.92256 3.35669 5.38527 7.77944 9.48773 11.1432 13.2767 14.8602
5 0.41174 0.55430 0.83121 1.14548 1.61031 2.67460 4.35146 6.62568 9.23636 11.0705 12.8325 15.0862 16.7496
6 0.67573 0.87209 1.23734 1.63538 2.20413 3.45460 5.34812 7.84080 10.6446 12.5915 14.4493 16.811 18.5475
7 0.98926 1.23904 1.68987 2.16735 2.83311 4.25485 6.34581 9.03715 12.0170 14.0671 16.0127 18.4753 20.2777
8 1.34441 1.64650 2.17973 2.73264 3.48954 5.07064 7.34412 10.2188 13.3615 15.5073 17.5345 20.0902 21.9549
Assumptions of the χ2 test
Hypothesis
H0: there is no association between the treatment and relapse
H1: there is an association between the treatment and relapse
The degrees of freedom for this table are df = (3-1)(2-1) = 2. Thus the critical
value from the chi-square distribution (right tail area 0.01) is χ2 = 9.21.
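For df = 2 specifically, the critical value can be checked without a table, because the right-tail area of the chi-square distribution beyond x is exp(−x/2) in that case. The sketch below uses this closed form; the function name `chi2_critical_df2` is our own, and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi2_critical_df2(alpha):
    """Right-tail critical value of the chi-square distribution with df = 2.

    For df = 2 the right-tail area beyond x is exp(-x / 2), so the
    critical value has the closed form -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: critical value = {chi2_critical_df2(alpha):.2f}")
```

The two values agree with the df = 2 row of the chi-square table, and 9.21 is the entry for a right tail area of .010.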
Quiz
HIV
STDs Hx No Yes Total
No 84 32 116
Yes 48 122 170
Total 132 154 286
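One way to check a hand calculation for a table like this is the sketch below, which applies the formulas for Eij and χ2 to the observed counts. The function name `chi_square_statistic` is our own, and the 0.05 significance level is an assumption, since the quiz does not state one. The critical value is copied from the chi-square table rather than computed.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E_ij = R_i * C_j / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: STDs Hx (No, Yes); columns: HIV (No, Yes). Totals are recomputed.
quiz_table = [[84, 32], [48, 122]]
chi2, df = chi_square_statistic(quiz_table)
critical = 3.84146   # df = 1, right tail area .050 (assumed level), from the table
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0 at 0.05: {chi2 > critical}")
```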
Summary
Characteristics of the χ2 distribution
1. Every χ2 distribution extends indefinitely to the right from 0.
2. Every χ2 distribution has only one (right) tail.
3. As df increases, the χ2 curves get more bell shaped and approach the normal
curve in appearance (but remember that a chi square curve starts at 0, not at
-∞)
4. If the value of χ2 is zero, then there is a perfect agreement between the
observed and the expected frequencies. The greater the discrepancy between
the observed and expected frequencies, the larger will be the value of χ2.