
6Sampling Distribution

This document covers the concept of sampling distributions, including definitions, types, and the Central Limit Theorem. It explains how to construct sampling distributions, the differences between sampling with and without replacement, and the implications of sampling error. Additionally, it discusses the properties of sampling distributions and their applications in statistical analysis.

Uploaded by

Berhanu Yelea
Copyright
© All Rights Reserved

Sampling Distributions

Learning objectives
At the end of this topic, students will be able to:
• Describe sampling and sampling distribution

• Understand types of sampling distribution

• Define the Central Limit Theorem

• Demonstrate applications of the sampling distributions

12 March 2025
• A sampling distribution is a distribution
of all possible values of a statistic
computed from samples of the same
size randomly selected from the same
population.
• Serves to answer probability questions
about sample statistics.
• When sampling a discrete, finite
population, a sampling distribution can
be constructed.

• However, this construction is difficult
with a large population and impossible
with an infinite population.
• We consider sample statistics as random
variables.

Example:
• Age of individuals is a random variable.

• Similarly, mean age is a random variable.


• Conclusions about the values of population
parameters cannot be drawn from a single
individual value.

• They should be based on a sample statistic
computed from a sample of adequate size.
• Similarly, take a sample and calculate the
statistic, e.g., mean.
• Take another sample (same size) and
calculate mean.
• Repeat & repeat & repeat & ………..
• Do you expect all the sample means to be the
same? No.
• They will vary, but less than the individual
observations do.
• Put all these sample statistics together to
get a distribution of sample statistics.
Construction of sampling distributions

1. From a population of size N, randomly
draw all possible samples of size n.
2. Compute the statistic of interest for
each sample.
3. Create a frequency distribution of the
statistic.
Main types of sampling distributions

A. Distribution of the sample mean
B. Distribution of the difference between
two means
C. Distribution of the sample proportion
D. Distribution of the difference between
two proportions
A. Sampling distribution of the sample mean
• Suppose we have a population of size N = 4,
consisting of the ages of four outpatients.
x, Age (years): 18, 20, 22, 24

$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$
Now consider all possible samples of size n = 2 (with replacement).

16 possible samples:
1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 sample means:
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Sample mean ($\bar{x}$)   Freq   P($\bar{x}$)
18 1 0.0625
19 2 0.1250
20 3 0.1875
21 4 0.2500
22 3 0.1875
23 2 0.1250
24 1 0.0625
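
The frequency table above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming ordered samples drawn with replacement (which is what gives 16 samples); the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [18, 20, 22, 24]   # the population from the example (N = 4)
n = 2                     # sample size

# Every ordered sample of size n drawn with replacement: N**n = 16 samples.
samples = list(product(ages, repeat=n))

# The statistic of interest (here the mean) for each sample, kept exact.
means = [Fraction(sum(s), n) for s in samples]

# Frequency distribution of the statistic = the sampling distribution.
freq = Counter(means)
for m in sorted(freq):
    print(int(m), freq[m], float(Fraction(freq[m], len(samples))))
```

Each printed row gives a sample mean, its frequency, and its probability, in the same order as the table.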
Sampling distribution of all sample means
[Figure: the table of 16 sample means next to a bar chart of the sample means distribution; x-axis: sample mean $\bar{x}$ (18 to 24), y-axis: P($\bar{x}$) (0 to .3)]
Summary measures of this sampling distribution: Add
the 16 sample means and divide by 16. Also calculate
the SD of the sample means.

$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{16} = \frac{18 + 19 + 20 + \cdots + 24}{16} = 21$

$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{16}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$
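Comparing the population with its
sampling distribution
Population (N = 4): μ = 21, σ = 2.236
Sample means distribution (n = 2): $\mu_{\bar{x}}$ = 21, $\sigma_{\bar{x}}$ = 1.58
[Figure: two bar charts side by side; left: population, P(x) against x = 18, 20, 22, 24; right: sample means distribution, P($\bar{x}$) against $\bar{x}$ = 18 to 24]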
• We note that the mean of the sampling
distribution of $\bar{x}$ has the same value as
the mean of the original population.

• However, its variance is not equal to the
population variance; it equals the
population variance divided by the sample
size used to obtain the sampling distribution.
• The square root of the sampling distribution
variance is called the standard error of the
mean or, simply, the standard error:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
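
As a check on the standard error formula, the sketch below recomputes the summary measures for the four-age example and compares the SD of the 16 sample means with σ/√n. It assumes sampling with replacement, as in the example; the variable names are illustrative.

```python
import math
from itertools import product

ages = [18, 20, 22, 24]
N, n = len(ages), 2

# Population mean and (population) standard deviation.
mu = sum(ages) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)

# Means of all 16 samples of size 2 drawn with replacement.
means = [sum(s) / n for s in product(ages, repeat=n)]
mu_xbar = sum(means) / len(means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

print(mu, round(sigma, 3))             # population: mean and SD
print(mu_xbar, round(sigma_xbar, 3))   # sampling distribution: mean and SD
print(round(sigma / math.sqrt(n), 3))  # standard error from the formula
```

The last two printed standard deviations agree, which is the relationship the formula states.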
• OR, the standard deviation of any sample
statistic is called its standard error.
• SE is determined by both the sample size and the
degree of variability among the individual
observations

• SD quantifies the amount of variability among
individuals in a population, while

• SE quantifies the variability among means of
repeated samples drawn from that population

• The SE is always smaller than the SD (except
when n = 1)
Sampling with vs. without replacement
• The foregoing sampling distribution of
sample means was based on the
assumption that sampling is either with
replacement or the samples are drawn from
infinite populations.
• Sampling with replacement is difficult under
practical conditions
• Necessary to sample from finite population
                             Sampling with     Sampling without
                             replacement       replacement
Population size              N                 N
Sample size                  n                 n
Choices at 1st draw          N                 N
Choices at 2nd draw          N                 N-1
Choices at 3rd draw          N                 N-2
.                            .                 .
Choices at nth draw          N                 N-n+1

Number of possible samples   N^n               NCn = N! / (n! (N-n)!)
• In sampling without replacement,
– the mean of the sampling distribution is equal
to the population mean
– the variance of the sampling distribution is:

$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$

– the factor (N-n)/(N-1) is the finite population correction
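
The without-replacement case can be checked on the same four-age population. This is a sketch under the assumption that samples are unordered (NCn of them); the names are illustrative only.

```python
import math
from itertools import combinations

ages = [18, 20, 22, 24]
N, n = len(ages), 2

mu = sum(ages) / N
var = sum((x - mu) ** 2 for x in ages) / N   # population variance

# All unordered samples of size n drawn without replacement: NCn of them.
samples = list(combinations(ages, n))
means = [sum(s) / n for s in samples]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Variance predicted by sigma^2/n times the finite population correction.
var_formula = (var / n) * (N - n) / (N - 1)

print(len(samples), mean_of_means)
print(round(var_of_means, 4), round(var_formula, 4))
```

The enumerated variance and the formula value coincide, and both are smaller than the with-replacement value σ²/n.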
Sampling Error
• Sample statistics are used to estimate
population parameters
e.g., $\bar{x}$ is an estimate of the population mean, μ
• Problems:
– Different samples provide different estimates of
the population parameter
– Sample results have potential variability, thus
sampling error exists
Standard deviation vs. standard error

• Standard deviation (SD): tells us the variability
among individuals (X)
• Standard error (SE): tells us the variability of
sample means ($\bar{x}$)

Calculating sampling error
• Sampling error:
The difference between a value (a statistic)
computed from a sample and the corresponding
value (a parameter) computed from a population

Example: (for the mean)

Sampling error = $\bar{x} - \mu$
where: $\bar{x}$ = sample mean
μ = population mean
Example
If the population mean is μ = 98.6 degrees
and a sample of n = 5 temperatures yields a
sample mean of $\bar{x}$ = 99.2 degrees, then the
sampling error is:

$\bar{x} - \mu$ = 99.2 − 98.6 = 0.6 degrees


Example 2
• Suppose that we want to estimate the mean birth-
weights of Tigre male live births in Ethiopia
• Due to logistical constraints, we decide to take a random
sample of 100 Tigre live births at the Ayder University
Hospital in a given year

[Figure: nested populations. Target population: all Tigre live births in Ethiopia in 2002. Study population: all AUH Tigre live births, 2002. Sample: 100 births.]
• We calculated sample mean = 3.5 kg and sample SD = 0.25 kg

• Suppose that we know the mean birth weight of the source population, μ,
to be 3.27 kg with σ = 0.38 kg
• $\bar{x} - \mu$ = 0.23 kg
• Could the difference of 0.23 kg (= 3.5 kg − 3.27 kg) be real, or could it
be purely due to chance in sampling?
• The 'apparent' difference between the population mean and the random
sample mean that is purely due to chance in sampling is called the
sampling error
• Sampling error does not mean that a mistake has been made
in the process of sampling; it is the variation that arises from
the process of sampling itself

Sampling error reflects the difference between
the value derived from the sample and the true
population value
The only way to eliminate sampling error is to
enumerate the entire population
Note:
• The sampling error may be positive or
negative ($\bar{x}$ may be greater than or
less than μ)
• The expected sampling error decreases
as the sample size increases
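
A small simulation can illustrate both notes. This is only a sketch: the population is assumed to be normal with μ = 98.6 (borrowed from the temperature example) and σ = 0.7, where the σ value, the seed, and the function name `mean_abs_sampling_error` are assumptions made for illustration.

```python
import random
import statistics

random.seed(0)           # fixed seed so the run is repeatable

MU = 98.6                # population mean from the temperature example
SIGMA = 0.7              # assumed population SD (not given in the slides)

def mean_abs_sampling_error(n, reps=2000):
    """Average size of |x-bar - mu| over many random samples of size n."""
    errors = []
    for _ in range(reps):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - MU))
    return statistics.fmean(errors)

results = {n: mean_abs_sampling_error(n) for n in (5, 25, 100)}
for n, err in results.items():
    print(n, round(err, 3))
```

Individual sampling errors come out positive or negative, but their typical size shrinks as n grows.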
Properties of sampling distribution of mean

A. Sampling from normally distributed populations

a. If a population is normal with mean μ and
standard deviation σ, the sampling distribution
of $\bar{x}$ is also normally distributed with

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

b. The mean, $\mu_{\bar{x}}$, of the distribution of the sample
mean is equal to the mean of the
population from which the samples were
drawn
c. The variance of the distribution of the sample
mean is equal to the variance of the
population divided by the sample size
Properties of normal distribution

[Figure: normal curve centred at μ, with an area of 0.34 on each side of the mean between μ − σ and μ + σ]
• Unimodal and symmetrical, i.e. one half of the distribution is the mirror
image of the other half
• Probability distribution: the area under the normal curve is 1
• For a normal distribution with mean μ and standard deviation σ:
• μ ± 1σ contains approximately 68% of the area under the normal curve
• μ ± 1.96σ contains approximately 95% of the area under the normal
curve
• μ ± 2.58σ contains approximately 99% of the area under the normal
curve
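
These areas can be verified numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2.58):
    print(k, round(area_within(k), 4))
```

Because any normal variable can be standardized, the same areas apply to μ ± kσ for any μ and σ.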
B. Sampling from non-normally distributed populations
• When the sampling is done from a non-normally
distributed population, the central limit theorem is used.
• The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
• We can apply the Central Limit Theorem:
– Even if the population is not normal, sample means
from the population will be approximately normal as
long as the sample size is large enough.
Then, the sampling distribution will have

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
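
The Central Limit Theorem can be illustrated by simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and SD 1); the choice of population, n = 36, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable

POP_MEAN = 1.0   # an exponential(1) population has mean 1 ...
POP_SD = 1.0     # ... and standard deviation 1, and is far from normal
n = 36           # sample size
reps = 5000      # number of repeated samples

# Mean of each repeated sample of size n.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
se = POP_SD / math.sqrt(n)   # standard error predicted by sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 0.95.
within = sum(abs(m - POP_MEAN) <= 1.96 * se for m in sample_means) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(se, 3))
print(round(within, 3))
```

Even though the population is skewed, the simulated sample means centre on μ, have spread close to σ/√n, and fall within ±1.96 standard errors roughly 95% of the time.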
As the sample size gets large enough (n↑), the sampling distribution
becomes almost normal regardless of the shape of the population.

If the population is not normal
Sampling distribution properties:
• Central tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: a non-normal population distribution above the sampling distributions of $\bar{x}$ for a smaller and a larger sample size; the sampling distribution becomes normal as n increases]
Below is a graph of results from a sampling activity. Samples were taken at
increasing sizes, from 4 cases to 98 cases. You can see that as sample size
increases, not only do the sample means become closer to the population
mean, but fluctuations in the sample means also become smaller.
[Figure: sample means plotted against sample size, from 4 to 98 cases]
• Generally, as n increases, the sample
mean $\bar{x}$ and sample variance S² approach
the values of the true population
parameters µ and σ², respectively.
• The average of the sample means based
on repeated samples of size n approaches
the population mean µ as the number of
samples selected gets large.

E($\bar{x}$) = µ
• The estimator $\bar{x}$ is said to be unbiased
How large is large enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal

• For fairly symmetric distributions, n > 15 is usually sufficient

• For normal population distributions, the sampling distribution
of the mean is always normally distributed.
• However, the general answer depends on the shape of the
distribution of the sampled population.
Applications of the sampling
distribution of the sample mean
• Helps in computing the probability of
obtaining a sample with a mean of some
specified magnitude.
z-value for the sampling distribution
of $\bar{x}$

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

where: $\bar{x}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Finite Population Correction

• Apply the Finite Population Correction (FPC) factor if:
– the sample is large relative to the
population (n/N > 5%) and
– sampling is without replacement

Then

$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$
• When the population is much larger than
the sample, the difference between σ2/n
and (σ2/n)[(N-n)/(N-1)] will be negligible.

• Example: N = 10,000; n = 25
• Finite Population Correction = (N-n)/(N-1)
= (10,000-25)/(10,000-1) = 0.9976 ≈ 1
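
To see how quickly the correction stops mattering, the factor can be computed for a few sample sizes. A short sketch; the function name `fpc` and the extra sample sizes beyond n = 25 are illustrative choices, not values from the slides.

```python
import math

def fpc(N, n):
    """Finite population correction (N - n) / (N - 1) applied to the variance."""
    return (N - n) / (N - 1)

N = 10_000
for n in (25, 500, 2_000):
    factor = fpc(N, n)
    # The standard error is multiplied by the square root of the factor.
    print(n, n / N, round(factor, 4), round(math.sqrt(factor), 4))
```

For n = 25 the sampling fraction is 0.25% and the factor is essentially 1; once n/N passes about 5% the factor departs noticeably from 1.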
Example 1
• Given: μ = 50, σ = 16, n = 64
Find: P($\bar{x}$ > 53)
Solution
1. Write the given information, μ = 50, σ = 16, n = 64
2. Sketch a normal curve
3. Convert $\bar{x}$ to a z score:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$

4. Find the appropriate value(s) in the Table

The area of the SND above a value of z = 1.5 is
0.0668. The probability P (z > 1.5) = 0.0668

5. Complete the answer

The probability that $\bar{x}$ is greater than 53 is 0.0668.
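
The same steps can be carried out numerically instead of with the table. A minimal sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 16, 64
x_bar = 53

se = sigma / math.sqrt(n)        # standard error of the mean
z = (x_bar - mu) / se            # z score of the sample mean
p = 1 - NormalDist().cdf(z)      # upper-tail area, P(z > 1.5)

print(se, z, round(p, 4))
```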


Example 2
• Suppose a population has mean μ = 8
and standard deviation σ = 3. Suppose a
random sample of size n = 36 is selected.

• What is the probability that the sample
mean is between 7.8 and 8.2?
Solution:
• Even if the population is not normally
distributed, the central limit theorem can be
used (n > 30)
• … so the sampling distribution of $\bar{x}$ is
approximately normal
• … with mean $\mu_{\bar{x}}$ = 8
• … and standard error

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$

$P(7.8 \le \bar{x} \le 8.2) = P\left(\frac{7.8 - 8}{3 / \sqrt{36}} \le \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \le \frac{8.2 - 8}{3 / \sqrt{36}}\right) = P(-0.4 \le z \le 0.4) = 0.3108$

[Figure: three panels. Population distribution (unknown shape, μ = 8); sampling distribution of $\bar{x}$ ($\mu_{\bar{x}}$ = 8, limits 7.8 and 8.2); standard normal distribution ($\mu_z$ = 0, limits −0.4 and 0.4, area .1554 + .1554)]
Example 3
• The distribution of serum cholesterol levels for all
20-70 year-old males has mean µ = 211 mg/100
ml and SD = 46 mg/100 ml.
a. If a sample of size 25 is selected from this
population, what is the probability that the sample
has a mean of 230 or above?
• Since $\bar{x}$ has a normal distribution with mean
211 and standard error 46/√25 = 9.2,

$z = \frac{230 - 211}{9.2} = 2.07$

• The area under the standard normal curve
to the right of z = 2.07 is 0.0192

• Consequently, the probability that a
sample of size 25 has a mean of 230
mg/100 ml or higher is about 0.019.
b. What mean value of serum cholesterol level
cuts off the lower 10% of the sampling
distribution?
• An area of 0.1003 in the lower tail of the
SND is marked by the value z = −1.28
• What is the corresponding value of $\bar{x}$?

$\bar{x} = \mu + z \cdot \frac{\sigma}{\sqrt{n}} = 211 + (-1.28)(9.2) = 199.2$

Approximately 10% of samples of size
25 have means that are less than or
equal to 199.2 mg/100 ml.

The other 90% of the samples have
means that are greater than 199.2
mg/100 ml
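
Both parts of Example 3 can be checked without a table. This is a sketch using `statistics.NormalDist`; because it does not round z to two decimals, the tail area differs slightly in the fourth decimal from a table lookup. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 211, 46, 25
se = sigma / math.sqrt(n)                      # 46 / 5 = 9.2

sampling_dist = NormalDist(mu=mu, sigma=se)    # distribution of the sample mean

# (a) Probability that the sample mean is 230 or above.
z = (230 - mu) / se
p_upper = 1 - sampling_dist.cdf(230)

# (b) Sample mean that cuts off the lower 10% of the sampling distribution.
cutoff = sampling_dist.inv_cdf(0.10)

print(round(z, 2), round(p_upper, 4))
print(round(cutoff, 1))
```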
B. Distribution of the difference between two sample means
• Important to compare two population
means (comparative studies)
• Are the two population means different?
• If yes by how much they differ?
• For example, mean serum cholesterol level
for sedentary office workers vs laborers.
• It is generally assumed that the two populations
are normally distributed.

• For sampling from non-normal populations, large
samples are recommended so that the CLT applies.

• Plotting the sample differences ($\bar{x}_1 - \bar{x}_2$)
against frequency gives a normal distribution with
mean equal to μ1 − μ2, the difference
between the two population means.
• The variance of the distribution of the
sample differences is:

$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

• Thus, the standard error of the difference
between sample means is:

$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

• To convert to the SND, we use the formula

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

• We find the z score by assuming that there
is no difference between the population
means (μ1 − μ2 = 0).
Example
• In a study of annual family expenditures for general health
care, two populations were surveyed with the following
results:
Population1: n1=40; Mean (X1) = $346
Population2: n2=35; Mean (X2) = $300

• If the variances of the populations are $\sigma_1^2$ = 2800
and $\sigma_2^2$ = 3250, what is the probability of obtaining
sample results ($\bar{x}_1 - \bar{x}_2$) as large as those shown if
there is no difference in the means of the two
populations?

$z = \frac{(346 - 300) - 0}{\sqrt{\frac{2800}{40} + \frac{3250}{35}}} = \frac{46}{12.76} = 3.6$

• The area above a value of z = 3.6 is 0.0002. This
gives the probability P (z > 3.6) = .0002

• The probability that $\bar{x}_1 - \bar{x}_2$ is as large as given is
0.0002.
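
The health-expenditure example can be reproduced as follows. A minimal sketch assuming no difference between the population means (μ1 − μ2 = 0), as the example states; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, mean1, var1 = 40, 346, 2800
n2, mean2, var2 = 35, 300, 3250
assumed_diff = 0                       # mu1 - mu2 under "no difference"

# Standard error of the difference between two sample means.
se_diff = math.sqrt(var1 / n1 + var2 / n2)

z = ((mean1 - mean2) - assumed_diff) / se_diff
p = 1 - NormalDist().cdf(z)            # upper-tail area

print(round(se_diff, 2), round(z, 1), round(p, 4))
```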
C. Distribution of the sample proportion
• The sample proportion is derived from
counts or frequency data.
• Easier and more reliable, does not depend
on variance.
• Sample proportion = $\hat{p}$
• Population proportion = p or π
• Population proportion (p) = the proportion
of the population having some characteristic

• Sample proportion ($\hat{p}$) provides an
estimate of p:

$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}}$

• If there are only two outcomes, the count x has a binomial
distribution
Properties of the sample proportion
• Construction of the sampling distribution of
the sample proportion is done in a manner
similar to that of the mean and the
difference between two means.

• Applying the central limit theorem, the
shape of the sampling distribution is
approximately normal provided that n is
large enough
• The mean of the distribution, $\mu_{\hat{p}}$, will be
equal to the true population proportion, p,
and the variance of the distribution, $\sigma_{\hat{p}}^2$, will
be equal to pq/n, where q = 1 − p.
How large does n need to be?
• If p is known, we must have npq ≥ 5
• However, since p is generally not known, we
use
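• Approximation by a
normal distribution if:

np ≥ 5 and n(1 − p) ≥ 5

where

$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

(where p = population proportion)
[Figure: sampling distribution of $\hat{p}$; x-axis: $\hat{p}$ from 0 to 1, y-axis: P($\hat{p}$) from 0 to .3]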
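z-Value for Proportions
Standardize $\hat{p}$ to a z value with the formula:

$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$

• If sampling is without
replacement and n > 5% of
the population size, then $\sigma_{\hat{p}}$
must use the FPC (Finite
Population Correction) factor:

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$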
Example 1
• According to a recent estimate, 19.4% of the
adult male population was obese. What is the
probability that in a random sample of size 150
from this population fewer than 15% will be
obese?
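Note: npq = 150 × 0.194 × 0.806 ≈ 23.5 > 5.
• n = 150, p = .194. Find P($\hat{p}$ < .15)

• Find the z score:

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.15 - 0.194}{\sqrt{\frac{(0.194)(0.806)}{150}}} = \frac{-0.044}{0.0323} = -1.36$

• A value of z = -1.36 gives an area of .0869,
which is the probability P (z < -1.36) = .0869

The probability that $\hat{p}$ < 15% is .0869.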
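
The obesity example can be checked numerically. A sketch using `statistics.NormalDist`; since z is not rounded to two decimals here, the probability differs slightly from the table value. The variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.194, 150
p_hat = 0.15

# Rule-of-thumb check that the normal approximation is reasonable.
npq = n * p * (1 - p)

se = math.sqrt(p * (1 - p) / n)        # standard error of the proportion
z = (p_hat - p) / se
prob = NormalDist().cdf(z)             # lower-tail area, P(z < -1.36)

print(round(npq, 1), round(se, 4), round(z, 2), round(prob, 4))
```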


Example 2
• If the true proportion of voters who
support Proposition A is p = .4, what is
the probability that a sample of size 200
yields a sample proportion between .40
and .45?
• If p = .4 and n = 200, what is
P(.40 ≤ $\hat{p}$ ≤ .45)?

Find $\sigma_{\hat{p}}$:

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464$

Convert to standard normal:

$P(.40 \le \hat{p} \le .45) = P\left(\frac{.40 - .40}{.03464} \le z \le \frac{.45 - .40}{.03464}\right) = P(0 \le z \le 1.44)$

Use standard normal table: P(0 ≤ z ≤ 1.44) = .4251

[Figure: sampling distribution of $\hat{p}$ with the area between .40 and .45 shaded, standardized to the standard normal distribution with the area .4251 between z = 0 and z = 1.44]
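
A numerical version of the Proposition A calculation, sketched with `statistics.NormalDist`. It keeps z unrounded, so the result differs from the table value in the fourth decimal; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.4, 200
se = math.sqrt(p * (1 - p) / n)            # standard error of the proportion

z_low = (0.40 - p) / se
z_high = (0.45 - p) / se

std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(se, 5), round(z_high, 2), round(prob, 4))
```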
Example 3
• In a survey conducted in the 1990s, 19% of
respondents ≥ 18 years had not heard of
the AIDS virus HIV. What is the probability
that in a sample of size 175 from this
population 25% or more will not have heard
about the virus?
• $\sigma_{\hat{p}}^2$ = (0.19)(0.81)/175 = 0.0009, $\sigma_{\hat{p}}$ = 0.03
• z = (0.25 − 0.19)/0.03 = 2.0
• P (z ≥ 2.0) = 0.02275
• The probability that $\hat{p}$ ≥ 0.25 is 0.02275.
D. Distribution of the difference between two sample proportions

• We assess the probability associated with a
difference in proportions computed from
samples drawn from each of these populations.

• The appropriate distribution is the distribution of
the difference between two sample proportions.
• Sampling distribution of $\hat{p}_1 - \hat{p}_2$

• The sampling distribution of the difference
between two sample proportions is constructed
in a manner similar to the difference between
two means.
• Independent random samples of size n1 and n2
are drawn from two populations of dichotomous
variables where the proportions of observations
with the character of interest in the two
populations are p1 and p2, respectively.
• The distribution of the difference between
two sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately
normal, with

mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and

variance $\sigma_{\hat{p}_1 - \hat{p}_2}^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$.

• These are true when n1 and n2 are large.

The z Score

$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$
Example 1
• In a certain area of a large city it is hypothesized that 40%
of the houses are in a dilapidated condition. A random
sample of 75 houses from this section and 90 houses
from another section yielded a difference, $\hat{p}_1 - \hat{p}_2$, of 0.09.
If there is no difference between the two areas in the
proportion of dilapidated houses, what is the probability of
observing a difference this large or larger?
Given: n1 = 75, n2 = 90, p1 = 0.4, p2 = 0.4; $\hat{p}_1 - \hat{p}_2$ = 0.09

• Find P($\hat{p}_1 - \hat{p}_2$ ≥ 0.09)

Sketch a normal curve and find the z score:

$z = \frac{0.09 - 0}{\sqrt{\frac{(0.4)(0.6)}{75} + \frac{(0.4)(0.6)}{90}}} = \frac{0.09}{0.0766} = 1.17$

• The area of the normal curve above
z = 1.17 is 0.121.
P (z > 1.17) = 0.121

• The probability of observing a difference of 0.09
or greater is 0.121.
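
The dilapidated-houses calculation can be sketched as a small helper. The function name `diff_proportions_z` is hypothetical, introduced only for illustration. Note that the unrounded z is about 1.175, so the computed tail area (about 0.120) differs slightly from the table value for z = 1.17.

```python
import math
from statistics import NormalDist

def diff_proportions_z(observed_diff, p1, n1, p2, n2):
    """z score for an observed difference between two sample proportions,
    given the population proportions p1 and p2 and the sample sizes."""
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return (observed_diff - (p1 - p2)) / math.sqrt(variance)

z = diff_proportions_z(0.09, p1=0.4, n1=75, p2=0.4, n2=90)
prob = 1 - NormalDist().cdf(z)     # upper-tail area

print(round(z, 3), round(prob, 3))
```

The same helper applies to any pair of independent samples, provided n1 and n2 are large enough for the normal approximation.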
Class Exercise I
• Suppose that the proportion of moderate to
heavy users of illegal drugs in population 1 = 0.5
and population 2 = 0.33.

• What is the probability that a sample of size 100
drawn from each will yield a proportion
difference as large as 0.3?
• Assumption: Approximately normal with mean
• $\mu_{\hat{p}_1 - \hat{p}_2}$ = 0.5 − 0.33 = 0.17

• Variance = (0.33)(0.67)/100 + (0.5)(0.5)/100 = 0.004711
• z = (0.3 − 0.17)/√0.004711 = 1.89

• We need the area under the curve to the right of
1.89.

• The probability of observing a difference as
large as 0.30 is 0.0294.
