
Sampling Distributions and Confidence Intervals

The document provides information about sampling distributions and confidence intervals. It defines key terms like sample statistic, estimator, estimate, sampling distribution, and sampling error. It explains that the sampling distribution is the distribution of all possible values of a statistic from samples of a given size from a population. The Central Limit Theorem states that even if a population is not normally distributed, the sampling distribution of the sample mean will be approximately normal for large sample sizes (n > 30). This allows inferring properties of populations from sample data. An example demonstrates calculating the probability that the sample mean falls within a given range using the normal approximation.

LECTURE 8

Sampling Distributions and Confidence Intervals
Learning Objectives
In this part of the lecture, you learn:
• The concept of the sampling distribution
• Probabilities related to the sample mean
• The Central Limit Theorem
Sample Statistic
• Sample statistic: a random variable whose value depends on the items in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
Estimators and Estimate
• Estimator: a statistic derived from a sample to infer the value of a population parameter.
• Estimate: the value of the estimator in a particular sample.
• Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.
Example
• Consider eight random samples of size n = 5 from a large population of GMAT scores.
  - The sample mean is a statistic.
  - The sample mean used to estimate the population mean is an estimator.
  - $\bar{X}_1 = 504.0$ is an estimate.
Sampling Distribution
• A sampling distribution is the distribution of all possible values of a statistic for samples of a given size selected from a population.
• Sampling error is the difference between an estimate and the corresponding population parameter. Example: sampling error $= \bar{X} - \mu$
• Bias is the difference between the expected value of the estimator and the corresponding parameter. Example: bias $= E(\bar{X}) - \mu$
Desirable Properties of Estimators
• Unbiased:
  - The expected value of the estimator equals the parameter being estimated.
  - The sample mean is an unbiased estimator of the population mean: $E(\bar{X}) = \mu$
• Efficiency:
  - Refers to the variance of the estimator’s sampling distribution.
  - A more efficient estimator has smaller variance.
• Consistency:
  - The estimator converges toward the parameter being estimated as the sample size increases.
Developing a Sampling Distribution
• Assume there is a population of size N = 4, with individuals A, B, C, D.
• The random variable, X, is the age of individuals.
• Values of X: 18, 20, 22, 24 (years)
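Developing a Sampling Distribution (continued)
Summary Measures for the Population Distribution:
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$
[Figure: population distribution P(x) over x = 18, 20, 22, 24 for individuals A, B, C, D]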
Developing a Sampling Distribution (continued)
Now consider all possible samples of size n = 2.

16 possible samples (sampling with replacement):
1st Obs    2nd Observation
           18      20      22      24
18         18,18   18,20   18,22   18,24
20         20,18   20,20   20,22   20,24
22         22,18   22,20   22,22   22,24
24         24,18   24,20   24,22   24,24

16 Sample Means:
1st Obs    2nd Observation
           18      20      22      24
18         18      19      20      21
20         19      20      21      22
22         20      21      22      23
24         21      22      23      24
Developing a Sampling Distribution (continued)
Sampling Distribution of All Sample Means

16 Sample Means:
1st Obs    2nd Observation
           18      20      22      24
18         18      19      20      21
20         19      20      21      22
22         20      21      22      23
24         21      22      23      24

[Figure: sample means distribution, $P(\bar{X})$ over $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24]
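Developing a Sampling Distribution (continued)
Summary Measures of this Sampling Distribution:
$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$
Here the sums run over all N = 16 sample means.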
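The enumeration above is small enough to check by brute force. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the lecture.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]  # ages of individuals A, B, C, D

# All ordered samples of size n = 2 drawn with replacement (4 * 4 = 16).
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation (divisor N)
mu_xbar = mean(sample_means)        # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the 16 sample means

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

`pstdev` is used rather than `stdev` because both calculations divide by the number of values, not by one less.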
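Comparing the Population with its Sampling Distribution
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: population distribution P(X) over X = 18, 20, 22, 24 (A, B, C, D), shown beside the sample means distribution $P(\bar{X})$ over $\bar{X}$ = 18 to 24]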
Expected Value of Sample Mean
• Different samples of the same size from the same population will yield different sample means.
• Let $\bar{X}$ represent a sample mean and $\mu$ represent the population mean.
• Expected value of sample means: $E(\bar{X}) = \mu$
Standard Error of the Mean
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the sample size increases.
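To make the effect of the sample size concrete, here is a small sketch that evaluates $\sigma/\sqrt{n}$ for a few sample sizes. The helper name `standard_error` is hypothetical; the population standard deviation is taken from the four-age example (about 2.236).

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

sigma = math.sqrt(5)  # population sd of the ages 18, 20, 22, 24

# The standard error shrinks as n grows; quadrupling n halves it.
for n in (1, 2, 4, 16, 64):
    print(n, round(standard_error(sigma, n), 3))
```

With n = 2 this gives about 1.58, the same value obtained by enumerating all 16 sample means.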
If the Population is Normal
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
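Sampling Distribution Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased): the normal sampling distribution has the same mean as the normal population distribution.
[Figure: normal population distribution centered at $\mu$, above the normal sampling distribution centered at $\mu_{\bar{x}}$]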
Sampling Distribution Properties (continued)
• As n increases, $\sigma_{\bar{x}}$ decreases.
[Figure: sampling distributions of $\bar{x}$ centered at $\mu$ for a larger sample size and a smaller sample size]
If the Population is not Normal
• We can apply the Central Limit Theorem:
  - Even if the population is not normal, sample means from the population will be approximately normal when the sample size is large enough.
Properties of the sampling distribution:
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of the shape of the population.
[Figure: sampling distribution of $\bar{x}$]
Illustrations of
Central Limit Theorem
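As a rough illustration of the theorem, the sketch below draws repeated samples from a strongly skewed population and inspects the resulting sample means. The exponential population with mean 1, the sample size n = 30, the 2000 replications, and the function name are all assumptions chosen for this example.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n, replications, seed=0):
    """Return `replications` sample means, each from n exponential(1) draws."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]

n = 30
means = simulate_sample_means(n, replications=2000)

mu, sigma = 1.0, 1.0   # mean and sd of the exponential(1) population
se = sigma / sqrt(n)   # standard error predicted by sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of mu;
# close to 0.95 if the sampling distribution is nearly normal.
share = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(round(mean(means), 3), round(pstdev(means), 3), round(se, 3), round(share, 3))
```

Even though the population is far from symmetric, the simulated means should cluster around 1 with a spread close to the predicted standard error.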
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is nearly normal.
• For fairly symmetric distributions, n > 15 is usually sufficient.
Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of $\bar{X}$:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
where:
  $\bar{X}$ = sample mean
  μ = population mean
  σ = population standard deviation
  n = sample size
Example
• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?
Example (continued)
Solution:
• Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
Example (continued)
Solution (continued):
$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$
[Figure: the population distribution (μ = 8) is sampled to give the sampling distribution ($\mu_{\bar{X}} = 8$, interval 7.8 to 8.2), which is standardized to the standard normal distribution ($\mu_Z = 0$, interval -0.4 to 0.4, area 0.1554 + 0.1554)]
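The same probability can be computed without a Z table. This is a minimal sketch that assumes the normal approximation from the Central Limit Theorem and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)                 # standard error of the mean: 3 / 6 = 0.5

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mu, se)

# P(7.8 < X-bar < 8.2) as a difference of cumulative probabilities.
p = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)
print(round(p, 4))
```

Standardizing first and using `NormalDist().cdf(0.4) - NormalDist().cdf(-0.4)` gives the same value, since both routes describe the same area.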
Summary
• Introduced sampling distributions
• Calculated probabilities using sampling distributions
Confidence Intervals
Learning Objectives
In this part of the lecture, you learn:
• To construct and interpret confidence interval estimates for the mean and proportion.
• How to determine the sample size necessary to develop a confidence interval for the mean and proportion.
Point and Interval Estimates
• A point estimate is a single number.
• A confidence interval provides additional information about variability.
[Figure: the point estimate lies between the lower confidence limit and the upper confidence limit; the distance between the two limits is the width of the confidence interval]
Confidence Intervals
• How much uncertainty is associated with a point estimate of a population parameter?
• An interval estimate provides more information about a population characteristic than does a point estimate.
• Such interval estimates are called confidence intervals.
Estimation Process
[Figure: a random sample with mean $\bar{X} = 50$ is drawn from a population whose mean, μ, is unknown, leading to the statement "I am 95% confident that μ is between 40 & 60."]
Confidence Level, (1 - α)
• The confidence that the interval will contain the unknown population parameter
• Suppose confidence level = 95%, also written (1 - α) = 0.95
• Interpretation: in the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter
• A specific interval either will contain or will not contain the true parameter (no probability involved in a specific interval)
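The long-run interpretation can be illustrated with a simulation. The sketch below repeatedly draws samples from a normal population with known σ, builds the interval $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ each time, and counts how often the interval contains μ. The population values (μ = 50, σ = 10), the sample size, and the function name `coverage` are assumptions made for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, level=0.95, trials=2000, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each individual interval either contains mu or it does not.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage(mu=50, sigma=10, n=25))
```

The printed fraction should land near 0.95, while any single interval in the loop is simply a hit or a miss.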
Confidence Intervals
• Population Mean
  - σ Known
  - σ Unknown
• Population Proportion
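Confidence Interval for μ (σ² Known)
• Assumptions
  - Population variance σ² is known
  - Population is normally distributed
  - If population is not normal, use large sample
• Standardized sample mean:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
[Figure: standard normal curve with central area 1 - α and area α/2 in each tail; Z units run from $-Z_{\alpha/2}$ through 0 to $Z_{\alpha/2}$, and the corresponding X units run from the lower confidence limit through the point estimate to the upper confidence limit]
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
• Confidence interval estimate:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)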
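Finding the Critical Value, Z
• Consider a 95% confidence interval: $1 - \alpha = 0.95$, so $\alpha/2 = 0.025$ in each tail and $Z_{\alpha/2} = \pm 1.96$
[Figure: standard normal curve with area 0.025 in each tail; Z units: Z = -1.96, 0, Z = 1.96; X units: lower confidence limit, point estimate, upper confidence limit]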
Common Levels of Confidence
• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level    Confidence Coefficient, 1 - α    Z value
80%                 0.80                             1.28
90%                 0.90                             1.645
95%                 0.95                             1.96
98%                 0.98                             2.33
99%                 0.99                             2.58
99.8%               0.998                            3.09
99.9%               0.999                            3.29
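Critical values like those in the table can be computed from the inverse standard normal CDF. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(level: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    if not 0 < level < 1:
        raise ValueError("confidence level must be between 0 and 1")
    alpha = 1 - level
    # Leave alpha/2 in the upper tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.3f}  {z_critical(level):.3f}")
```

This can serve as a check on table entries, which are rounded to two or three decimals.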
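Intervals and Level of Confidence
[Figure: sampling distribution of the mean, centered at $\mu_{\bar{x}} = \mu$, with central area 1 - α and area α/2 in each tail; beneath it, intervals built around individual sample means $\bar{x}_1$, $\bar{x}_2$, …]
• Intervals extend from $\bar{X} - Z\frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z\frac{\sigma}{\sqrt{n}}$
• (1 - α) × 100% of intervals constructed contain μ; α × 100% do not.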
Confidence Intervals
Example
• A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
• Determine a 95% confidence interval for the true mean resistance of the population.
Example (continued)
• Solution:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
$$1.9932 \le \mu \le 2.4068$$
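The circuit calculation can be reproduced with a short function. This is a sketch under the stated assumptions (normal population, σ known); the name `z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than a rounded table entry, so the last digit may differ slightly from hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    margin = z * sigma / sqrt(n)                   # critical value * standard error
    return xbar - margin, xbar + margin

# 11 circuits, mean resistance 2.20 ohms, population sd 0.35 ohms.
lower, upper = z_interval(xbar=2.20, sigma=0.35, n=11)
print(round(lower, 4), round(upper, 4))
```

The function follows the general pattern Point Estimate ± (Critical Value)(Standard Error).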
Interpretation
• We are 95% confident that the interval from 1.9932 to 2.4068 ohms contains the true mean resistance.
• Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
Confidence Interval for μ (σ Unknown)
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S
• This introduces extra uncertainty, since S is variable from sample to sample
• So we use the t distribution instead of the normal distribution
Confidence Interval for μ (σ Unknown) (continued)
• Assumptions
  - Population standard deviation is unknown
  - Population is normally distributed
  - If population is not normal, use large sample
• Use Student’s t Distribution
• Confidence Interval Estimate:
$$\bar{x} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$
where $t_{n-1,\alpha/2}$ is the critical value of the t distribution with n - 1 d.f. and an area of α/2 in each tail.
Student’s t Distribution
• The t is a family of distributions
• The t value depends on degrees of freedom (d.f.)
  - Number of observations that are free to vary after sample mean has been calculated
d.f. = n - 1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after sample mean has been calculated

Example: Suppose the mean of 3 numbers is 8.0
  Let X1 = 7
  Let X2 = 8
  What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
Student’s t Distribution
Note: t → Z as n increases
• t-distributions are bell-shaped and symmetric, but have ‘longer’ tails than the normal.
[Figure: density curves centered at 0 for the standard normal (t with df = ∞), t (df = 13), and t (df = 5)]
Student’s t Table
Upper Tail Area
df     .25      .10      .05
1      1.000    3.078    6.314
2      0.817    1.886    2.920
3      0.765    1.638    2.353
The body of the table contains t values, not probabilities.
Let: n = 3, so df = n - 1 = 2; α = 0.10, so α/2 = 0.05
[Figure: t distribution with upper tail area α/2 = 0.05 to the right of t = 2.920]
t Distribution Values
With comparison to the Z value

Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
0.80                1.372          1.325          1.310          1.28
0.90                1.812          1.725          1.697          1.645
0.95                2.228          2.086          2.042          1.96
0.99                3.169          2.845          2.750          2.58

Note: t → Z as n increases
In Excel
• TDIST(x, Degree_of_freedom, Tails)
  - TDIST(2.92, 2, 2) = 0.1
  - Returns the tail probability of the Student’s t distribution
• TINV(Probability, Degree_of_freedom)
  - TINV(0.1, 2) = 2.92
  - Returns the t-value for two tails
Example
A random sample of n = 25 has $\bar{X} = 50$ and S = 8. Form a 95% confidence interval for μ.
• d.f. = n – 1 = 24, so $t_{\alpha/2,\,n-1} = t_{0.025,\,24} = 2.0639$
The confidence interval is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$$
$$46.698 \le \mu \le 53.302$$
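The same interval can be sketched in code. The Python standard library has no t distribution, so this sketch takes the critical value 2.0639 from the example as an input instead of computing it; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the critical value t_{alpha/2, n-1}, looked up in a t table
    or obtained from Excel's TINV.
    """
    margin = t_crit * s / sqrt(n)  # critical value * estimated standard error
    return xbar - margin, xbar + margin

# n = 25, sample mean 50, sample sd 8, t_{0.025, 24} = 2.0639 (from the example).
lower, upper = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)
print(round(lower, 3), round(upper, 3))
```

If SciPy happens to be installed, `scipy.stats.t.ppf(0.975, 24)` should give approximately the same critical value.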
Confidence Intervals for the Population Proportion, π
• An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p)
Confidence Intervals for the Population Proportion, π (continued)
• The distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
• We will estimate this with sample data:
$$\sqrt{\frac{p(1 - p)}{n}}$$
Confidence Interval Endpoints
• Upper and lower confidence limits for the population proportion are calculated with the formula:
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
• where
  - $Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
  - p is the sample proportion
  - n is the sample size
• Note: must have X = np > 5 and n – X = n(1 - p) > 5
Example
• A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.
• Make sure the sample is big enough: X = 25 > 5 and n - X = 75 > 5
$$p \pm Z_{\alpha/2}\sqrt{p(1 - p)/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$$
$$0.1651 \le \pi \le 0.3349$$
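A short sketch of the same calculation, including the sample-size check. The function name `proportion_interval` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    # Rule of thumb from the lecture: need X > 5 and n - X > 5.
    if x <= 5 or n - x <= 5:
        raise ValueError("sample too small for the normal approximation")
    p = x / n                                      # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# 25 left-handers in a random sample of 100 people.
lower, upper = proportion_interval(x=25, n=100)
print(round(lower, 4), round(upper, 4))
```

Raising an error when the check fails is a design choice for this sketch; a table-based hand calculation would simply stop at that point.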
Interpretation
• We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
• Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
Determining Sample Size
• For the Mean
• For the Proportion
Sampling Error
• The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - α)
• The margin of error is also called sampling error
  - the amount of imprecision in the estimate of the population parameter
  - the amount added and subtracted to the point estimate to form the confidence interval
Reducing the Margin of Error
$$e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The margin of error can be reduced if
• the population standard deviation can be reduced (σ↓)
• the sample size is increased (n↑)
• the confidence level is decreased, (1 – α)↓


Determining Sample Size
For the Mean
The sampling error is the margin in $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$:
$$e = Z\frac{\sigma}{\sqrt{n}}$$
Solve for n:
$$n = \frac{Z^2\sigma^2}{e^2}$$
Required Sample Size Example
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$
So the required sample size is n = 220
(Always round up)
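The rounding rule is easy to get wrong with ordinary rounding, so a sketch of the calculation may help. The function name `sample_size_for_mean` is hypothetical; the Z value 1.645 is the table value used in the example.

```python
import math

def sample_size_for_mean(sigma, e, z):
    """Smallest n with margin of error at most e: n = Z^2 * sigma^2 / e^2."""
    if e <= 0:
        raise ValueError("margin of error e must be positive")
    n_exact = (z * sigma / e) ** 2
    return math.ceil(n_exact)  # always round up, never to the nearest integer

# sigma = 45, margin of error 5, 90% confidence (Z = 1.645).
print(sample_size_for_mean(sigma=45, e=5, z=1.645))
```

`math.ceil` is used because rounding 219.19 down to 219 would leave the margin of error slightly larger than 5.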
Determining Sample Size
For the Proportion
$$e = Z\sqrt{\frac{\pi(1 - \pi)}{n}}$$
Now solve for n to get
$$n = \frac{Z^2\,\pi(1 - \pi)}{e^2}$$
Required Sample Size Example
How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?
(Assume a pilot sample yields p = 0.12)
Required Sample Size Example (continued)
Solution:
For 95% confidence, use $Z_{\alpha/2} = 1.96$
e = 0.03
p = 0.12, so use this to estimate π
$$n = \frac{Z_{\alpha/2}^2\,\pi(1 - \pi)}{e^2} = \frac{(1.96)^2(0.12)(1 - 0.12)}{(0.03)^2} = 450.74$$
So use n = 451
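The proportion case follows the same pattern. This sketch uses the pilot estimate p in place of the unknown π, as in the solution above; the function name `sample_size_for_proportion` is hypothetical.

```python
import math

def sample_size_for_proportion(p, e, z):
    """Smallest n with margin of error at most e: n = Z^2 * p(1 - p) / e^2."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if e <= 0:
        raise ValueError("margin of error e must be positive")
    n_exact = z ** 2 * p * (1 - p) / e ** 2
    return math.ceil(n_exact)  # always round up

# Pilot estimate p = 0.12, margin of error 0.03, 95% confidence (Z = 1.96).
print(sample_size_for_proportion(p=0.12, e=0.03, z=1.96))
```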
Summary
• Introduced the concept of confidence intervals
• Discussed point estimates
• Developed confidence interval estimates
• Created confidence interval estimates for the mean
• Created confidence interval estimates for the proportion
• Determined sample size
Homework
• Ebook: Chapter 8
  - 8.66
  - 8.70
  - 8.72
  - 8.86
  - 8.90
