Biostat Inferential Statistics

Inferential Statistics
Daniel B.
(BSc/Public Health, MPH/Biostatistics)
Objectives
At the end of this class students will be able to:
• Define inferential statistics
• Know statistical estimation
• Understand hypothesis testing and the “types of errors” in decision making
• Use test statistics to examine hypotheses about population parameters
Inference
Use a random sample to learn something about a larger population.
• Inferential statistics: statistical methods used for drawing conclusions about a population based on the information obtained from a sample of observations drawn from that population.
• Involves
  – Estimation
  – Hypothesis testing
• Purpose
  – Make decisions about population characteristics
Inferential statistics
• Estimation
  – Point estimation
  – Interval estimation
• Hypothesis testing
  – One sample
  – Two samples
[Figure: Inferential process]
Statistical Estimation
• Estimation is the process of determining a likely value of a population parameter, based on information collected from the sample.
• Estimation is the use of sample statistics to estimate the corresponding population parameters.
• The objective of estimation is to determine the approximate value of an unknown population parameter on the basis of a sample statistic.
Sample Statistics as Estimators of Population Parameters
• A sample statistic is a numerical measure of a summary characteristic of a sample.
• A population parameter is a numerical measure of a summary characteristic of a population.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
Estimation
• In a random sample, every member of the population has the same chance of being selected in the sample.
• Statistics computed from the random sample are used to estimate the parameters of the population.
Estimation
• Point estimation
• Interval estimation
Point and Interval Estimates
• A point estimate is a single numerical value used as an estimate of a population parameter.
• An interval estimate is a range or interval of numbers believed to include an unknown population parameter with a certain degree of assurance.
• The point estimate is always within the interval estimate.
[Figure: the interval estimate runs from the lower confidence limit to the upper confidence limit, with the point estimate between them]
Estimation Process
• Population: the mean, μ, is unknown.
• A random sample is drawn from the population.
• Point estimate: sample mean x̄ = 50.
• Interval estimate: “I am 95% confident that μ is between 40 and 60.”
Point estimation
• A single numerical value used to estimate the corresponding population parameter
• Gives little information about how close the value is to the unknown population parameter
• Example: a sample mean x̄ = 3 is a point estimate of the unknown population mean μ
Sample statistics and their corresponding population parameters
Measure              Statistic (from sample)    Parameter (from entire population)
Mean                 x̄            estimates     μ
Variance             s²           estimates     σ²
Standard deviation   s            estimates     σ
Proportion           p            estimates     π
Properties of a good estimator
a) Unbiasedness: An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
• For example, the sample mean is an unbiased estimator of the population mean: E(x̄) = μ
• Unbiasedness is an average or long-run property. The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will equal the population mean.
b) Minimum variance: An estimator that has a minimum standard error is a good estimator.
• For a symmetrical distribution such as the normal, the mean has the minimum standard error.
• If the distribution is skewed, the median may have the smaller standard error.
c) Consistency: An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.
[Figure: Consistency, comparing the sampling distribution of an estimator at n = 10 and n = 100]
Sampling Distribution of Means
• The sampling distribution of means is one of the most fundamental concepts of statistical inference, and it has remarkable properties.
• Since it is a frequency distribution, it has its own mean and standard deviation.
Example:
• Age of individuals is a random variable.
• Similarly, mean age is a random variable.
• Conclusions about the values of population parameters cannot be drawn from one individual value.
• They should be based on a sample statistic computed from a sample of adequate size.
• Take a sample and calculate the statistic, e.g., the mean.
• Take another sample (same size) and calculate the mean.
• Repeat again and again.
• Do you expect all the sample means to be the same? No.
• They will vary, but with less variation than the individual values.
• Put all these sample statistics together to get a distribution of the sample statistic.
Properties
1. The mean of the sampling distribution of means is the same as the population mean, μ.
2. The SD of the sampling distribution of means is σ/√n, also called the standard error of the mean.
3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (central limit theorem).
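The three properties can be checked with a small simulation. The sketch below is only an illustration, not part of the original slides: the Exponential(1) population (mean 1, SD 1, strongly right-skewed), the sample size of 50, the 5000 repetitions and the seed are all assumed values chosen for the demonstration.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an Exponential(1) population
    (an assumed, right-skewed population with mean 1 and SD 1) and return
    the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=50, num_samples=5000)

mean_of_means = statistics.fmean(means)   # property 1: close to mu = 1
sd_of_means = statistics.pstdev(means)    # property 2: close to sigma / sqrt(n)
expected_se = 1.0 / 50 ** 0.5             # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is skewed, which is property 3.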
Interval estimation
• A single-valued estimate conveys little information about the actual value of the population parameter or about the accuracy of the estimate.
• The probability of getting a sample statistic value that is exactly equal to the corresponding population parameter is usually quite small.
• It is therefore not reasonable to assume that a sample statistic value is exactly equal to the corresponding population parameter.
• An interval estimate is needed: it locates the population parameter within an interval, with a stated level of confidence.
Confidence Interval or Interval Estimate
• A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter. Associated with the interval, there is a measure of the confidence we have that the interval does indeed contain the parameter of interest.
• A confidence interval provides a range of values of the estimate likely to include the “true” population parameter with a given probability.
• A confidence interval or interval estimate has two components:
  – A range or interval of values
  – An associated level of confidence
Confidence Level
1. Probability that the unknown population parameter falls within the interval
2. Denoted (1 – α)
   • α is the probability that the parameter is not within the interval
3. Typical values are 99%, 95%, 90%
Common Confidence Levels
• Commonly used confidence levels are 90%, 95%, and 99%.

Confidence level   Confidence coefficient (1 – α)   $z_{\alpha/2}$ value
80%                0.80                             1.28
90%                0.90                             1.645
95%                0.95                             1.96
98%                0.98                             2.33
99%                0.99                             2.58
99.8%              0.998                            3.09
99.9%              0.999                            3.29
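The tabulated z values can be recomputed rather than looked up. This is a minimal sketch using the standard normal quantile function from Python's standard library; the helper name `z_critical` is introduced here for illustration only.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_{alpha/2}: the standard normal value that leaves an area of
    alpha/2 in the upper tail, where alpha = 1 - confidence_level."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {z_critical(level):.3f}")
```

Computed this way, the values for the two highest levels come out at about 3.09 and 3.29.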
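Intervals and Level of Confidence
[Figure: sampling distribution of the mean, centred at $\mu_{\bar{x}} = \mu$, with area 1 – α between the limits and α/2 in each tail; confidence intervals built around individual sample means are drawn beneath it]
• Intervals extend from $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
• 100(1 – α)% of the intervals constructed contain μ; 100(α)% do not.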
Interpretation of CI
• Probabilistic: In repeated sampling, 100(1 – α)% of all intervals around sample means will in the long run include μ.
• Practical: We are 100(1 – α)% confident that the single computed CI contains μ.
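The probabilistic interpretation can be illustrated by simulation. The sketch below assumes a normal population with a known σ; the values μ = 170, σ = 10, n = 25, the 2000 repetitions and the seed are hypothetical choices for the demonstration, not figures from the slides.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence_level, num_repeats, seed=1):
    """Repeatedly sample from Normal(mu, sigma), build a z-based confidence
    interval around each sample mean, and return the fraction of intervals
    that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_repeats):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_repeats

rate = coverage(mu=170, sigma=10, n=25, confidence_level=0.95, num_repeats=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains μ or does not; the 95% describes the long-run behaviour of the procedure.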
Central Limit Theorem
• As the sample size gets large enough, the sampling distribution of the mean becomes almost normal regardless of the shape of the population.
• Rule of thumb: the normal approximation for the sampling distribution will be good if n > 30. If n < 30, the approximation is only good if the population from which you are sampling is not too different from normal.
• Otherwise, the t-distribution is used.
CI for population mean: Large-sample size and when σ is known
• For a sufficiently large sample size (n > 30), the sampling distribution of the sample mean is approximately normal.
• A 100(1 – α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
• α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.
CI for population mean: Large-sample size and when σ is unknown
• Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n – 1 degrees of freedom. However, for large degrees of freedom, the t distribution is approximated well by the Z distribution.
• A large-sample 100(1 – α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
• Note that when σ is unknown and the sample is large, s is a good approximation of σ.
Example
• An epidemiologist studied the blood glucose level of a random sample of 100 patients. The mean was 170, with an SD of 10. Construct the 95% CI for the population mean.
Solution
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96 = (168.04,\ 171.96)$$
• We are 95% confident that the mean blood glucose level of the population lies between 168.04 and 171.96.
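The blood glucose calculation can be reproduced in a few lines. This is a sketch only; the function name `z_interval` is an illustrative choice, and the z value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence_level=0.95):
    """Large-sample confidence interval for a population mean:
    mean +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

lower, upper = z_interval(mean=170, sd=10, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (168.04, 171.96)
```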
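CI for population mean: small sample size (n < 30) and when σ is unknown
• If the population standard deviation is unknown and the population is approximately normal, the standardized sample mean $(\bar{x} - \mu)/(s/\sqrt{n})$ from samples of size n follows a t-distribution with n – 1 degrees of freedom.
• A 100(1 – α)% C.I. for μ is:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$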
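Student’s t-distribution
• t-distributions are bell-shaped and symmetrical about zero, but have ‘fatter’ tails than the normal.
• t → Z as n increases: the standard normal is the t-distribution with df = ∞.
[Figure: density curves for the standard normal (t with df = ∞), t (df = 13) and t (df = 5), all centred at 0]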
Student’s t Table
Example: let n = 3, so df = n – 1 = 2; with α = 0.10, α/2 = 0.05.

      Right tail area
df    0.10    0.05    0.025
1     3.078   6.314   12.706
2     1.886   2.920   4.303
3     1.638   2.353   3.182

The body of the table contains t values. Here $t_{0.05,\,2} = 2.920$, which leaves an area of α/2 = 0.05 in each tail.
Example
• A study of hypoxemia during the immediate post-operative period reported the fractions of ideal weight for 11 patients who became severely hypoxemic during transfer to the recovery room. The mean is 1.51 and the standard deviation is 0.33. Estimate the 95% CI for the population mean fraction of ideal weight, where the population consists of hypoxemic patients similar to those in the study. (The data are normally distributed; use α = 0.05.)
Solution
$$t_{\alpha/2,\,n-1} = t_{0.025,\,10} = 2.2281$$
$$1.51 \pm 2.2281 \times \frac{0.33}{\sqrt{11}} = 1.51 \pm 0.222 = (1.288,\ 1.732)$$
• We are 95% confident that the population mean μ lies between 1.288 and 1.732.
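The same small-sample calculation is sketched below. Python's standard library has no t quantile function, so the critical value is hard-coded from the t table (df = 10, right-tail area 0.025); the constant and function names are illustrative assumptions.

```python
from math import sqrt

# Critical value read from the t table: df = 10, right-tail area 0.025.
T_0025_DF10 = 2.228

def t_interval(mean, sd, n, t_critical):
    """Small-sample confidence interval for a population mean:
    mean +/- t_{alpha/2, n-1} * sd / sqrt(n).
    Returns (lower limit, upper limit, margin)."""
    margin = t_critical * sd / sqrt(n)
    return mean - margin, mean + margin, margin

lower, upper, margin = t_interval(mean=1.51, sd=0.33, n=11, t_critical=T_0025_DF10)
print(f"margin = {margin:.4f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

The computed margin is about 0.2217, so the limits round to 1.288 and 1.732.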
The t Table
df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576
CI for a population proportion: Large-sample size
• For sufficiently large samples, the sampling distribution of the proportion p is approximately normal.
• A 100(1 – α)% CI for π is:
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
• A sample is considered large enough when both n × p and n × q are greater than 5, where q = 1 – p.
Example
• A study on dental health practice: of 300 adults interviewed, 123 said that they regularly had a dental check-up twice a year. What is the 95% C.I. for π?
Solution
• p = 123/300 = 0.41 is a point estimate of π
• α = 0.05 ⇒ $z_{0.025}$ = 1.96
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.41 \pm 1.96\sqrt{\frac{(0.41)(0.59)}{300}} \Rightarrow (0.3543,\ 0.4657)$$
• Interpretation: ?
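A short sketch of the dental check-up calculation follows. The function name `proportion_interval` is illustrative, and the check on n × p and n × q mirrors the large-sample rule stated for proportions.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level=0.95):
    """Large-sample confidence interval for a population proportion:
    p +/- z_{alpha/2} * sqrt(p * (1 - p) / n).
    Returns (p, lower limit, upper limit)."""
    p = successes / n
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p * q / n)
    return p, p - margin, p + margin

p, lower, upper = proportion_interval(successes=123, n=300)
print(f"p = {p:.2f}, 95% CI: ({lower:.4f}, {upper:.4f})")
```

Rounded to four decimals the limits are 0.3543 and 0.4657.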
Interval Width
• The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size.
The Level of Confidence and the Width of the Confidence Interval
• When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
The Sample Size and the Width of the Confidence Interval
• When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
Sample size determination
• Common question: “How many subjects should I study?”
  – Too small a sample: waste of time and resources; results have no practical use
  – Too large a sample: waste of resources; data quality compromised; any small difference can be statistically significant
When deciding on sample size:
• Precision is related to the confidence level and the width of the CI
Example
• A prevalence of 10% from a sample of size 20
  – would have a 95% CI of 3% to 23%,
  – which is not very precise or informative
• But a prevalence of 10% from a sample of size 400
  – would have a 95% CI of 7% to 13%,
  – which may be considered sufficiently accurate
Margin of Error
Factors Affecting Margin of Error
• The margin of error is determined by n, s and α
  – As n increases, the width of the CI decreases.
  – As s increases, the width of the CI increases.
  – As the confidence level increases (α decreases), the width of the CI increases.
Reducing the Margin of Error
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• The margin of error can be reduced if
  – the sample standard deviation is lower (s ↓)
  – the sample size is increased (n ↑)
  – the confidence level is decreased, (1 – α) ↓
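The effect of each factor on the margin of error can be seen by changing one input at a time. The baseline figures below (SD 10, n = 100, 95% confidence) are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, confidence_level=0.95):
    """Margin of error for a mean: z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sd / sqrt(n)

base = margin_of_error(sd=10, n=100)                               # baseline
larger_n = margin_of_error(sd=10, n=400)                           # four times the sample size
smaller_sd = margin_of_error(sd=5, n=100)                          # half the standard deviation
lower_conf = margin_of_error(sd=10, n=100, confidence_level=0.90)  # lower confidence level

print(f"baseline:         {base:.3f}")
print(f"n = 400:          {larger_n:.3f}")
print(f"sd = 5:           {smaller_sd:.3f}")
print(f"90% confidence:   {lower_conf:.3f}")
```

Because n enters through its square root, quadrupling the sample size only halves the margin of error.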

Sample size determination depends on:
• Objective of the study
• Design of the study (descriptive/analytic)
• Degree of precision or accuracy: the allowed deviation from the true population parameter (can be within 1% to 5%)
• Degree of confidence level required
• Availability of resources
Estimation of single mean
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2}$$
Where:
n = sample size
σ = population standard deviation, if known
d = desired degree of precision = half of the width of the CI
$z_{\alpha/2}$ = the standard normal value at the level of confidence desired, usually the 95% confidence level
Example
• Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI that is 5 years wide, so d = 5/2 = 2.5 years. If the population SD is 12 years, how large should our sample be?
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2} = \frac{(1.96)^2 \times 144}{(2.5)^2} = 88.5 \approx 89$$
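The sample size formula for a mean is sketched below, rounding up because a fractional subject cannot be recruited. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, d, confidence_level=0.95):
    """Sample size needed to estimate a mean to within +/- d:
    n = (z_{alpha/2} * sigma / d) ** 2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * sigma / d) ** 2)

# CI of total width 5 years, so d = 2.5; population SD = 12 years.
n = sample_size_for_mean(sigma=12, d=2.5)
print(n)  # 89
```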
But the population is most of the time unknown
As a result, it has to be estimated from:

 Pilot or preliminary sample:
– Select a pilot sample and estimate with the sample
standard deviation, s
 Similar studies
58
Estimation of single proportion
$$n = \frac{(z_{\alpha/2})^2\,pq}{d^2}$$
Where:
n = sample size
p = anticipated proportion (percentage) in the population
q = 1 – p
d = desired degree of precision
$z_{\alpha/2}$ = the standard normal value at the level of confidence desired, usually the 95% confidence level
Example
A) Suppose that you are interested to know the proportion of infants who breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20, so q = 1 – p = 0.80. What sample size is required to estimate the true proportion within ±3% with 95% confidence?
$$n = \frac{(z_{\alpha/2})^2\,pq}{d^2} = \frac{(1.96)^2 \times (0.2)(0.8)}{(0.03)^2} \approx 683$$
Example
B) If the above sample is to be taken from a relatively small population (say N = 3000), the required minimum sample is obtained from the above estimate by making an adjustment (if the population is less than 10,000, a smaller sample size may be required).
$$n_{final} = \frac{n}{1 + \frac{n}{N}} = \frac{683}{1 + \frac{683}{3000}} \approx 557$$
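Both steps of the breastfeeding example, the initial sample size and the adjustment for a small population, can be sketched as follows. The function names are illustrative, and results are rounded up to the next whole subject.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, d, confidence_level=0.95):
    """Sample size needed to estimate a proportion to within +/- d:
    n = z_{alpha/2}^2 * p * q / d^2 with q = 1 - p, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

def finite_population_adjustment(n, population_size):
    """Adjusted sample size for a small population: n / (1 + n / N), rounded up."""
    return ceil(n / (1 + n / population_size))

n = sample_size_for_proportion(p=0.20, d=0.03)
n_final = finite_population_adjustment(n, population_size=3000)
print(n, n_final)  # 683 557
```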
• An estimate of p is not always available.
• However, the formula may also be used for sample size calculation based on various assumptions for the values of p.
• Note: if there is no prior information about the proportion (p), assume p = q = 0.5, which gives the largest required sample size.
