
5 Inferential Statistics V - April 21 2014


Inferential Statistics V: Confidence Interval and Estimation Theory

Mr. Shital Bhandary
Assistant Professor
DCHS, PAHS-SOM
Objectives:
Describe and discuss statistical estimation of means and proportions.
Compute and interpret point and interval estimates.
Compute and interpret confidence intervals for means and proportions.
Apply point and interval estimation to statistical inference.
Introduction
• Everybody makes estimates.
• The PAHS basic and clinical coordinators estimate the average marks of their students.
• A student can estimate his/her marks in different subjects.
• Estimation is a part of statistical inference.
• Statistical inference is the branch of statistics concerned with using probability concepts to deal with uncertainty in decision making.
Estimator and Estimates
• Any sample statistic that is used to estimate a population parameter is called an estimator. For example, the sample mean can be an estimator of the population mean.
• When we observe a specific numerical value of the estimator, we call that value an estimate.
• We form an estimate by taking a (random) sample and computing the value taken by the estimator in that sample.
Sample Statistic, Population Parameter and the Estimate

Statistical parameter   Population parameter   Sample statistic (estimator)   Estimate
Mean                    μ                      X̄                             5000 units
Standard deviation      σ                      s                              100 units
Proportion              P                      p                              20%
Range                   R                      R                              2000 units

Criteria for a good estimator
Some statistics are better estimators than others.
A good estimator is as close to the true value of the parameter as possible.
A good estimator should fulfill the following criteria:
1) Unbiasedness
2) Efficiency
3) Consistency
4) Sufficiency
Types of estimates
Estimating population parameters such as the mean or proportion from the corresponding sample statistics is one of the most important problems in statistical inference.
There are two types of estimates of a population parameter: (i) the point estimate and (ii) the interval estimate.
For example:

A point estimate is a single number that is used to estimate an unknown population parameter. For example, if you say Mr. X will get 68% in the third year, you are making a point estimate.
An interval estimate is a range of numbers that is used to estimate an unknown population parameter. For example, if you say Mr. X will get 65 to 70% in the third year, you are making an interval estimate.
For example:
To find a statistically sound range of values for the height of the students of this class using random sampling, we need:
• Average height (X̄)
• Level of significance (α)
• Tabulated Z- or t-value at the α level of significance
• Standard error of the mean: σ/√n or s/√n
Z or t-value: which one to use?
Theoretically,
• If the sample size is greater than 30, the Z-value is used.
• If the sample size is less than or equal to 30, the t-value is used.
Z or t-value: which one to use?
Practically,
• t-distribution-based values can be used for any sample size.
Why?
t-distribution: history
• It is a statistical distribution published in 1908 by William Gosset, who was working at the Guinness brewery.
• He published it under the pseudonym "Student", in line with Guinness policy.
• Thus, it is also widely known as "Student's t-distribution".
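Student t-distribution: definition
If X̄ and s are the mean and standard deviation of a random sample of size n drawn from a normal population with mean μ, then the statistic
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
follows the Student t-distribution with (n − 1) degrees of freedom.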
Student t-distribution: properties
• When s = σ, t = Z and the distribution becomes the standard normal distribution.
• In other words, as the sample size n increases, the Student t-distribution approaches the normal distribution.
[Figure: t-distribution vs. normal distribution]
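The "Why?" above can be explored numerically: compute the two-sided 95% critical t-value for increasing degrees of freedom and compare it with Z. This is a minimal sketch that uses only the Python standard library, so the t-distribution's CDF is approximated with Simpson's rule and inverted by bisection; the helper names `t_cdf` and `t_quantile` are hypothetical, not part of any library. In practice a statistics package or a printed t-table would be used instead.

```python
import math
from statistics import NormalDist


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """Approximate P(T <= x) for Student's t with df degrees of freedom.

    Uses symmetry about 0 and Simpson's rule on [0, |x|] (steps must be even).
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        # t density: c * (1 + u^2/df)^(-(df + 1)/2), computed on the log scale
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # approximate area under the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Invert t_cdf by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_975 = NormalDist().inv_cdf(0.975)  # two-sided 95% Z value, about 1.96
for df in (4, 9, 19, 29, 99, 999):
    print(f"df = {df:3d}: t = {t_quantile(0.975, df):.3f}   (Z = {z_975:.3f})")
```

The printed t-values should match the familiar table values (for example 2.093 for 19 degrees of freedom) and shrink towards Z as the degrees of freedom grow, which is why t-based values remain appropriate even for large samples.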
Confidence Limit Interval (CLI) for Mean
• $\bar{X} - Z \cdot SE(\bar{X})$: Lower Confidence Limit (LCL) or Lower Bound
• $\bar{X} + Z \cdot SE(\bar{X})$: Upper Confidence Limit (UCL) or Upper Bound
• LCL and UCL are known as confidence limits.
• The range from LCL to UCL is known as the confidence interval.

N.B.: If X̄ is an unbiased estimator of μ, then we can write $E(\bar{X}) = \mu$.
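The two limits above can be sketched as a small Python function. This is a minimal illustration, assuming the critical value (Z or t) is looked up separately and passed in as `crit`; the function names `confidence_interval` and `claim_supported` are hypothetical. It is checked against the numbers of the CLI 1 example.

```python
import math


def confidence_interval(mean: float, sd: float, n: int, crit: float) -> tuple[float, float]:
    """Return (LCL, UCL) = mean -/+ crit * SE, where SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    margin = crit * se
    return mean - margin, mean + margin


def claim_supported(claimed: float, interval: tuple[float, float]) -> bool:
    """True if the claimed population value lies inside the interval."""
    lcl, ucl = interval
    return lcl <= claimed <= ucl


# Numbers from "Use of CLI 1": sample mean 160, sigma 10, n 500, Z = 1.96
ci = confidence_interval(160, 10, 500, 1.96)
print([round(v, 2) for v in ci])   # [159.12, 160.88]
print(claim_supported(162, ci))    # False -> reject the claim
```

Keeping the critical value as an argument lets the same function serve both the Z case (n > 30) and the t case (n ≤ 30).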
Use of CLI 1
• The Department of Statistics, TU claimed that the average height of its students is 162 cm with a standard deviation of 10 cm.
• Now, let us take a sample of TU students from all the schools and test the claim at the 95% confidence level.
• A sample of 500 students is taken using stratified random sampling. The average height of these students is found to be 160 cm.
Solution:
• Claimed population mean = μ = 162 cm
• Sample mean = X̄ = 160 cm
• Standard deviation = σ = 10 cm
• Level of significance = α = 5%
• Standard error of mean = 10/√500 = 0.447
• Since n > 30, we use the Z-distribution.
• Thus, Z (95%) = 1.96
• LCL = 160 − (1.96)(0.447) = 160 − 0.87612 = 159.12 cm
• UCL = 160 + (1.96)(0.447) = 160 + 0.87612 = 160.88 cm
Decision:
Since the claimed population mean (162 cm) does not lie within the calculated confidence interval, we reject the claim of the Department of Statistics, TU.
Use of CLI 2
• The KU Examination Committee has claimed that the average mark of the first to third batches of medical students is 64.5%.
• Now, let us take a 50% sample of KU medical students from the first to third batches and test the claim at the 99% confidence level.
• A sample of 64 students is taken using systematic sampling, and the average mark is found to be 62.9% with a standard deviation of 2.2%.
Solution:
• Claimed population mean = μ = 64.5%
• Sample mean = X̄ = 62.9%
• Standard deviation = s = 2.2%
• Level of significance = α = 1%
• Standard error of mean = 2.2/√64 = 0.275
• Since n > 30, we use the Z-distribution.
• Thus, Z (99%) = 2.58
• LCL = 62.9 − (2.58)(0.275) = 62.9 − 0.7095 = 62.19%
• UCL = 62.9 + (2.58)(0.275) = 62.9 + 0.7095 = 63.61%
Decision:
Since the claimed population mean (64.5%) does not lie within the calculated confidence interval, we reject the claim of the KU Examination Committee.
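The value Z (99%) = 2.58 is a rounded table value. As a sketch, Python's standard `statistics.NormalDist` can give the exact two-sided critical value (about 2.576); with the numbers of this example, the rounded and the exact values give the same limits to two decimal places and the same decision.

```python
import math
from statistics import NormalDist

alpha = 0.01
se = 2.2 / math.sqrt(64)                        # 0.275
z_exact = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided 99% value, about 2.576

intervals = {}
for label, z in (("table Z = 2.58", 2.58), ("exact Z", z_exact)):
    lcl, ucl = 62.9 - z * se, 62.9 + z * se
    intervals[label] = (lcl, ucl)
    print(f"{label}: ({lcl:.2f}, {ucl:.2f}); contains 64.5? {lcl <= 64.5 <= ucl}")
```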
Use of CLI 3
• Suppose that the average weight of this class is found to be 61 kg with a standard deviation of 5 kg, using a simple random sample of 20 students. If a student claims that the average weight of all the students is 63 kg, is the claim correct?
• First, we use the t-distribution because n < 30 (degrees of freedom = n − 1 = 19).
• Secondly, α = 5% (Note: when α is not given, it is conventionally set at 5%.)
• LCL = 61 − (2.093)(5/√20) = 58.66
• UCL = 61 + (2.093)(5/√20) = 63.34
• Thus, the 95% confidence interval is (58.66, 63.34).
Decision:
Since the claimed population mean (63 kg) lies within the calculated confidence interval, we accept the claim of that student.
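To see why the t-distribution matters for a small sample, the sketch below recomputes the CLI 3 interval with the t-table value 2.093 (df = 19) and, for comparison only, with Z = 1.96. The t-based interval is wider, reflecting the extra uncertainty from estimating σ by s; with these numbers both intervals happen to contain 63 kg.

```python
import math

mean, sd, n, claimed = 61, 5, 20, 63
se = sd / math.sqrt(n)                  # about 1.118 kg

critical = {"t (df = 19)": 2.093,       # t-table value used in the example
            "Z": 1.96}                  # shown only for comparison
intervals = {}
for name, c in critical.items():
    lcl, ucl = mean - c * se, mean + c * se
    intervals[name] = (lcl, ucl)
    print(f"{name}: ({lcl:.2f}, {ucl:.2f}); width {ucl - lcl:.2f}; "
          f"contains {claimed}? {lcl <= claimed <= ucl}")
```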
Exercise 11
A count of malaria parasites in 100 fields with a 2 mm oil immersion lens gave a mean of 35 parasites per field, with a standard deviation of 11.6. On counting one more field, the pathologist found 52 parasites. Does this number lie outside the 95% reference range? Does it lie inside the 95% confidence interval?
Write a short comment on both cases.
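A sketch of how the two intervals in Exercise 11 differ, assuming the usual normal-theory definitions (counts roughly normal): the 95% reference range is mean ± 1.96 × SD and describes individual fields, while the 95% confidence interval is mean ± 1.96 × SD/√n and describes how precisely the mean is estimated. Since n = 100 > 30, Z = 1.96 is used.

```python
import math

mean, sd, n, z = 35, 11.6, 100, 1.96
new_count = 52

# 95% reference range: where about 95% of individual field counts are expected
reference_range = (mean - z * sd, mean + z * sd)
# 95% confidence interval: precision of the 100-field mean as an estimate
se = sd / math.sqrt(n)
confidence_interval = (mean - z * se, mean + z * se)

for name, (lo, hi) in (("reference range", reference_range),
                       ("confidence interval", confidence_interval)):
    print(f"95% {name}: ({lo:.2f}, {hi:.2f}); {new_count} inside? {lo <= new_count <= hi}")
```

The confidence interval is narrower by a factor of √n = 10, because it describes a mean rather than a single field.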
Exercise 12
• A woman measures her body temperature on awakening on the first 10 days after menstruation for contraceptive purposes and obtains the following data (°F): 97.2, 96.8, 97.4, 97.4, 97.3, 97.0, 97.1, 97.3, 97.2 and 97.3.
• A theory exists that at the time of ovulation the body temperature rises by 0.5°F to 1.0°F.
• Now, find the best estimate of her underlying basal body temperature.
• Also find how accurate this estimate is.
• Comment on your results.
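For Exercise 12, the best (point) estimate is the sample mean, and its accuracy can be expressed by the standard error and a confidence interval. A minimal sketch, assuming the t-table value 2.262 for df = 9 at the 95% level (n = 10 ≤ 30, so t is used as on the earlier slide):

```python
import math
import statistics

temps_f = [97.2, 96.8, 97.4, 97.4, 97.3, 97.0, 97.1, 97.3, 97.2, 97.3]

n = len(temps_f)
mean = statistics.mean(temps_f)      # point estimate of basal body temperature (deg F)
sd = statistics.stdev(temps_f)       # sample SD, n - 1 divisor
se = sd / math.sqrt(n)               # standard error of the mean
t_crit = 2.262                       # t-table value for df = 9, 95% two-sided
lcl, ucl = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.2f} F, SD = {sd:.3f}, SE = {se:.3f}, 95% CI = ({lcl:.2f}, {ucl:.2f})")
```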
Confidence Limit Interval (CLI) for Proportion
• If n > 30, then we use:
  $p - Z_\alpha \cdot SE(p)$: Lower Confidence Limit (LCL)
  $p + Z_\alpha \cdot SE(p)$: Upper Confidence Limit (UCL)
  So, the $(1-\alpha) \times 100\%$ confidence interval is (LCL, UCL).
• If n ≤ 30, then we use:
  $p - t_\alpha \cdot SE(p)$: Lower Confidence Limit (LCL)
  $p + t_\alpha \cdot SE(p)$: Upper Confidence Limit (UCL)
  So, the $(1-\alpha) \times 100\%$ confidence interval is (LCL, UCL).
• Here $SE(p) = \sqrt{pq/n}$, with $q = 1 - p$.
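The proportion limits above can be sketched as a function, with SE(p) = √(pq/n) as used in Example 4. As with the mean, the critical value (Z or t) is passed in as an assumption, and the function name `proportion_ci` is hypothetical. The check uses the numbers of Example 4 (12 smokers out of 64, Z = 1.96).

```python
import math


def proportion_ci(successes: int, n: int, crit: float) -> tuple[float, float]:
    """(LCL, UCL) = p -/+ crit * sqrt(p * q / n), with q = 1 - p."""
    p = successes / n
    q = 1 - p
    se = math.sqrt(p * q / n)
    return p - crit * se, p + crit * se


lcl, ucl = proportion_ci(12, 64, 1.96)   # numbers from Example 4
print(f"({lcl:.3f}, {ucl:.3f})")         # (0.092, 0.283)
print(lcl <= 0.25 <= ucl)                # True -> the claimed 25% lies inside
```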
Example 4
The Dean of the School of Medicine, PAHS claims that 25% of PAHS medical students are current smokers. A study of 64 randomly selected students revealed that only 12 students are current smokers. Now test the claim of the PAHS SOM Dean using the confidence interval method.
Solution:
P = 25% = 0.25
p = 12/64 = 0.1875
q = 1 − 0.1875 = 0.8125
$SE(p) = \sqrt{pq/n} = \sqrt{(0.1875)(0.8125)/64} = 0.048789$
LCL = 0.1875 − (1.96)(0.048789) = 0.092
UCL = 0.1875 + (1.96)(0.048789) = 0.283
As P lies within the 95% confidence interval, we accept the PAHS SOM Dean's claim.
Exercise 13
PAHS administration claims that 75% of the staff are eager to move to Lele VDC, Lalitpur before March 15, 2015. A stratified random sample of 15 staff members revealed that only 7 are positive about the move.
Now, test the claim using a 99% confidence interval and comment on your results.
Exercise 14
The Nepal STEPS survey 2008 for non-communicable diseases found 276 respondents reporting that they had been diagnosed with raised blood pressure by a medical doctor or health professional in the 12 months before the survey. This survey included 4400 randomly selected people, of whom 4378 gave information on this variable.
Do you agree that the prevalence of self-reported raised blood pressure is 10%?
Confidence interval from two surveys
• Confidence intervals for a particular rate, ratio and/or indicator can be obtained from two or more studies or surveys.
• For instance, the Maternal Mortality Ratio (MMR) was 539 in the 1996 NDHS, whereas it was 281 in the 2006 NDHS.
• Was the reduction in MMR statistically significant?
To decide, we need to calculate:
• the confidence interval of the MMR in the 1996 NDHS
• the confidence interval of the MMR in the 2006 NDHS
If the confidence intervals do not overlap, the reduction is statistically significant. If they overlap, we say the reduction is NOT statistically significant, i.e. it may have happened by chance alone! (This overlap rule is conservative: overlapping intervals do not always mean that the difference is non-significant.)
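The overlap rule can be written as a one-line check. The intervals below are HYPOTHETICAL placeholders, since the slides do not give the confidence limits for either survey; real limits would have to be taken from the survey reports.

```python
def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


# HYPOTHETICAL limits for illustration only; not taken from any survey report.
ci_first_survey = (15.0, 18.0)
ci_second_survey = (10.0, 14.0)
print(intervals_overlap(ci_first_survey, ci_second_survey))   # False
```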
Exercise 15
Using the 2006 and 2011 NDHS reports, find whether the reductions in:
• Total Fertility Rate
• Infant Mortality Rate
are statistically significant.
Interpret your results as well!
Question/Queries
Summary!
Thank you!!!
