Biostat Inferential Statistics
Daniel B.
(BSc/Public Health, MPH/Biostatistics)
Objectives
At the end of this class students will be able to:
Inferential Statistics
Involves
– Estimation
– Hypothesis testing
Purpose
– Make decisions about population characteristics
Inferential Statistics
– Estimation
   – Point estimation
   – Interval estimation
– Hypothesis testing
   – One sample
   – Two samples
Inferential process
Statistical Estimation
Estimation is the process of determining a likely value of a population parameter, based on information collected from a sample.
Estimation uses sample statistics to estimate the corresponding population parameters.
The objective of estimation is to determine the approximate value of an unknown population parameter on the basis of a sample statistic.
Sample Statistics as Estimators of Population Parameters
[Figure: a random sample is drawn from the population; the sample statistics are used to estimate the population parameters]
Estimation
– Point estimation
– Interval estimation
Point and Interval Estimates
A point estimate is a single numerical value used as an estimate of a population parameter.
[Figure: an interval estimate runs from the lower confidence limit to the upper confidence limit, with the point estimate inside it]
Estimation Process
[Figure: the population mean, μ, is unknown → a random sample gives the point estimate x̄ = 50 → interval estimate: "I am 95% confident that μ is between 40 & 60."]
Point estimation
A single numerical value used to estimate the corresponding population parameter
Gives little information about how close the value is to the unknown population parameter
Example: The sample mean x̄ = 3 is a point estimate of the unknown population mean μ
Sample statistics and their corresponding population parameters
Statistic (from sample)     Parameter (from entire population)
-----------------------     ----------------------------------
Mean: x̄                     estimates μ
Variance: s²                estimates σ²
Standard deviation: s       estimates σ
Proportion: p               estimates π
Properties of good estimate
a) Unbiasedness: An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
For example, the sample mean is an unbiased estimator of the population mean: E(x̄) = μ.
Unbiasedness is an average or long-run property. The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will equal the population mean.
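The long-run property described above can be illustrated with a small simulation. This is only a sketch: the population (the integers 1 to 100), the sample size, the number of repetitions, and the function name `mean_of_sample_means` are all hypothetical choices made for illustration.

```python
import random
import statistics

def mean_of_sample_means(population, n, repeats, seed=0):
    """Average the means of `repeats` independent random samples of size n."""
    rng = random.Random(seed)
    # rng.choices samples with replacement, so the draws are independent.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]
    return statistics.fmean(means)

# Hypothetical population: the integers 1..100 (true mean 50.5).
population = list(range(1, 101))
mu = statistics.fmean(population)
long_run = mean_of_sample_means(population, n=10, repeats=20000)
print(f"population mean = {mu}, average of sample means = {long_run:.2f}")
```

Any single sample mean can be far from 50.5, but the average over many samples should land very close to it.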
b) Minimum variance: An estimator that has the minimum standard error is a good estimator.
For a symmetric distribution such as the normal, the mean has the minimum standard error.
If the distribution is markedly skewed, the median can have a smaller standard error than the mean.
c) Consistency: An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.
[Figure: Consistency, sampling distributions of the estimator for n = 10 and n = 100]
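Consistency can be checked by simulation. The sketch below assumes a hypothetical Normal(μ = 50, σ = 10) population and a tolerance of ±2 around μ; the function name `prob_close` and all numeric settings are illustrative assumptions, not values from the slides.

```python
import random
import statistics

def prob_close(n, tolerance, repeats=5000, seed=1):
    """Estimate P(|sample mean - mu| <= tolerance) for samples of size n
    drawn from a hypothetical Normal(mu=50, sigma=10) population."""
    rng = random.Random(seed)
    mu, sigma = 50.0, 10.0
    hits = 0
    for _ in range(repeats):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= tolerance:
            hits += 1
    return hits / repeats

p10 = prob_close(10, tolerance=2.0)
p100 = prob_close(100, tolerance=2.0)
print(f"n = 10:  P(close) ~ {p10:.2f}")
print(f"n = 100: P(close) ~ {p100:.2f}")
```

The estimated probability of being close to μ should be clearly higher for n = 100 than for n = 10.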
Sampling Distribution of Means
Interval estimation
It is not reasonable to assume that a sample statistic value is exactly equal to the corresponding population parameter.
An interval estimate locates the population parameter within an interval, with a stated level of confidence.
Confidence Interval or Interval Estimate
A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter.
Associated with the interval, there is a measure of the confidence we have that the interval does indeed contain the parameter of interest.
A confidence interval provides a range of values of the estimate likely to include the “true” population parameter with a given probability.
Common Confidence Levels
Commonly used confidence levels are 90%, 95%, and
99%
Confidence Level    Confidence Coefficient, 1 − α    zα/2 value
----------------    -----------------------------    ----------
80%                 .80                              1.28
90%                 .90                              1.645
95%                 .95                              1.96
98%                 .98                              2.33
99%                 .99                              2.58
99.8%               .998                             3.09
99.9%               .999                             3.29
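The z values for any confidence level can be computed rather than looked up. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is a hypothetical choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # Upper-tail area alpha/2 corresponds to the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```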
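Intervals and Level of Confidence
[Figure: Sampling Distribution of the Mean, centered at μ, with area 1 − α in the middle and α/2 in each tail, and confidence intervals drawn around several sample means]
Intervals extend from
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
100(1−α)% of intervals constructed contain μ; 100(α)% do not.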
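The statement that about 100(1−α)% of constructed intervals contain μ can be checked by repeated sampling. This is a sketch under assumed values: a hypothetical Normal(μ = 100, σ = 15) population, n = 25, and z = 1.96 for 95% confidence; the function name `coverage` is illustrative.

```python
import math
import random
import statistics

def coverage(mu=100.0, sigma=15.0, n=25, z=1.96, repeats=10000, seed=2):
    """Fraction of intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(repeats):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / repeats

print(f"observed coverage: {coverage():.3f}")
```

The observed fraction should be close to 0.95, with the remaining intervals missing μ.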
Interpretation of CI
– Probabilistic
– Practical
Central Limit Theorem
Rule of thumb: The normal approximation for the sampling distribution of the mean will be good if n > 30. If n < 30, the approximation is only good if the population from which you are sampling is not too different from normal.
Otherwise, use the t-distribution.
CI for population mean: Large-sample size and when σ is known
For a sufficiently large sample size (n > 30), the sampling distribution of the sample mean, x̄, is approximately normal.
A 100(1‐α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.
CI for population mean: Large-sample size and when σ is unknown
Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. However, for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Solution
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96 = (168.04,\ 171.96)$$
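The same arithmetic can be written as a small function. A minimal sketch using the numbers from the solution (mean 170, standard deviation 10, n = 100, z = 1.96); the function name `z_interval` is a hypothetical choice.

```python
import math

def z_interval(xbar, sd, n, z=1.96):
    """Large-sample confidence interval: xbar +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_interval(xbar=170, sd=10, n=100)
print(f"({lower:.2f}, {upper:.2f})")  # (168.04, 171.96)
```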
CI for population mean: small sample size (n<30) and when σ is unknown
If the population standard deviation is unknown and the population is approximately normal, then the standardized sample mean (x̄ − μ)/(s/√n) follows a t distribution with n-1 degrees of freedom.
A 100(1‐α)% C.I. for μ is:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
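A small-sample interval can be sketched the same way, with the critical value taken from a t table. The data here are hypothetical (n = 16, sample mean 20, s = 4), and the value 2.131 is the tabulated t for an upper-tail area of 0.025 with 15 degrees of freedom; the function name `t_interval` is illustrative.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval: xbar +/- t_crit * s / sqrt(n).
    t_crit must be looked up for n - 1 degrees of freedom."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: n = 16, mean 20, s = 4; t(0.025, df = 15) = 2.131.
lower, upper = t_interval(xbar=20.0, s=4.0, n=16, t_crit=2.131)
print(f"({lower:.3f}, {upper:.3f})")
```

With the same data, a z value of 1.96 would give a narrower interval; the t value widens it to reflect the extra uncertainty from estimating σ with s.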
Student’s t-distribution
[Figure: t density curves centered at 0 compared with the standard normal curve (t with df = ∞); t → Z as n increases]
Student’s t Table
1.51 ± 0.221 = (1.289, 1.731)
We are 95% confident that the population mean μ lies between 1.289 and 1.731.
The t Table
df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576
CI for a population proportion: Large-sample size
For sufficiently large samples, the sampling distribution of the proportion p is approximately normal
A 100(1‐α)% CI for π is:
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Example
A study on dental health practice: of 300 adults interviewed, 123 said that they regularly had a dental check‐up twice a year. What is the 95% C.I. for π?
p = 123/300 = 0.41 is a point estimate of π
α = 0.05 ⇒ z_{0.025} = 1.96
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.41 \pm 1.96\sqrt{\frac{(0.41)(0.59)}{300}} = 0.41 \pm 0.0557$$
⇒ (0.3543, 0.4657)
Interpretation: ?
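The dental check-up calculation can be reproduced with a short function. A minimal sketch, assuming the normal approximation is adequate for n = 300; the function name `proportion_interval` is a hypothetical choice.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Large-sample CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lower, upper = proportion_interval(successes=123, n=300)
print(f"({lower:.4f}, {upper:.4f})")
```

Printing to four decimals gives limits of roughly 0.354 and 0.466.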
Interval Width…
The Level of Confidence and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
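A quick numerical check of this relationship, holding the sample size fixed. The values σ = 10 and n = 100 are hypothetical, and `ci_width_for_level` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def ci_width_for_level(sigma, n, confidence):
    """Full width of the z-based interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

# Hypothetical setting: sigma = 10, n = 100.
level_widths = [ci_width_for_level(10, 100, c) for c in (0.90, 0.95, 0.99)]
for c, w in zip((0.90, 0.95, 0.99), level_widths):
    print(f"{c:.0%} confidence: width = {w:.2f}")
```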
The Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
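The effect of sample size on width can be seen by holding the confidence level at 95%. The value σ = 10 and the sample sizes are hypothetical, and `ci_width_for_n` is an illustrative helper name.

```python
import math

def ci_width_for_n(sigma, n, z=1.96):
    """Full width of the z-based interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

# Hypothetical setting: sigma = 10, 95% confidence.
n_widths = {n: ci_width_for_n(10, n) for n in (25, 100, 400)}
for n, w in n_widths.items():
    print(f"n = {n:3d}: width = {w:.2f}")
```

Because the width depends on 1/√n, quadrupling the sample size halves the width.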
Sample size determination
Common questions:
Example
Margin of Error
Factors Affecting Margin of Error
– Descriptive/Analytic
– Degree of precision or accuracy: the allowed deviation from the true population parameter (can be within 1% to 5%)
– Level of confidence required
– Availability of resources
Estimation of single mean
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2}$$
Where:
n = sample size
σ = population standard deviation, if known
d = desired degree of precision (margin of error)
Example
Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI that is 5 years wide (that is, a margin of error of d = 2.5 years). If the population SD is 12 years, how large should our sample be?
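One way to work this example: an interval 5 years wide corresponds to d = 2.5 years on each side of the mean. The sketch below applies the single-mean sample size formula and rounds up; the function name `sample_size_mean` is a hypothetical choice.

```python
import math

def sample_size_mean(sigma, d, z=1.96):
    """n = (z * sigma / d)^2, rounded up to the next whole subject."""
    return math.ceil((z * sigma / d) ** 2)

n_mean = sample_size_mean(sigma=12, d=2.5)
print(n_mean)  # 89
```

The raw value is about 88.5, so 89 patients are needed to reach the stated precision.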
But the population standard deviation σ is unknown most of the time. Possible substitutes:
– Sample standard deviation, s
– Similar studies
Estimation of single proportion
$$n = \frac{(z_{\alpha/2})^2\,p\,q}{d^2}$$
Where:
n = sample size
p = proportion (percentage)
q = 1-p
d = desired degree of precision
zα/2 = the standard normal value at the level of confidence desired, usually the 95% confidence level
Example
A) Suppose that you are interested in the proportion of infants who breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3% with 95% confidence?
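Example A can be worked with the single-proportion formula, using p = 0.20, d = 0.03 and z = 1.96. A minimal sketch; the function name `sample_size_proportion` is a hypothetical choice.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

n_prop = sample_size_proportion(p=0.20, d=0.03)
print(n_prop)  # 683
```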
Example
B) If the above sample is to be taken from a relatively small population (say N = 3000), the required minimum sample is obtained from the above estimate by making an adjustment (if the population is less than 10,000, a smaller sample size may be required).
$$n_{\text{final}} = \frac{n}{1 + \frac{n}{N}} = \frac{683}{1 + \frac{683}{3000}} \approx 556.3$$
Rounded up, n final = 557.
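The finite-population adjustment in Example B can be sketched as follows, with n = 683 and N = 3000; the function name `adjust_for_finite_population` is a hypothetical choice.

```python
import math

def adjust_for_finite_population(n, N):
    """n_final = n / (1 + n / N), rounded up to the next whole subject."""
    return math.ceil(n / (1 + n / N))

n_final = adjust_for_finite_population(n=683, N=3000)
print(n_final)  # 557
```

For a very large N the ratio n/N approaches zero and the adjustment leaves n essentially unchanged.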
An estimate of p is not always available
However, the formula may also be used for sample size
calculation based on various assumptions for the values of p.
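To see how the assumed value of p changes the answer, the formula can be evaluated over several candidate values. The sketch below keeps d = 0.03 and z = 1.96 as illustrative settings; the candidate values of p and the function name `n_for_assumed_p` are hypothetical.

```python
import math

def n_for_assumed_p(p, d=0.03, z=1.96):
    """Sample size for a single proportion under an assumed value of p."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

sizes = {p: n_for_assumed_p(p) for p in (0.1, 0.2, 0.3, 0.4, 0.5)}
for p, n in sizes.items():
    print(f"p = {p:.1f}: n = {n}")
```

Because p(1 − p) is largest at p = 0.5, assuming p = 0.5 gives the largest (most conservative) sample size when no prior estimate is available.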