Lecture 8:
Sampling and Confidence Interval
Estimation
Recommended Text:
Albright and Winston, “Business Analytics”
6th Edition. 2017 Copyright © Cengage Learning
Lecture Objectives
• Discuss the sampling schemes generally used in real
sampling applications
Sampling Terminology
A population is the set of all members about which a
study intends to make inferences.
• An inference is a statement about a numerical
characteristic of the population.
A frame is a list of all members of the population. The
potential sample members are called sampling units.
A probability sample is a sample in which the
sampling units are chosen from the population
according to a random mechanism.
A judgmental sample is a sample in which the
sampling units are chosen according to the sampler’s
judgment.
Methods for Selecting Random
Samples
Different types of sampling schemes have different
properties.
Simple Random Sampling
The simplest type of sampling scheme is called simple
random sampling.
Simple Random Sampling (cont’d)
• Simple random sampling requires that all sampling
units be identified prior to sampling. Sometimes this is
infeasible.
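A minimal sketch of simple random sampling, assuming the frame is available as a Python list. The function name `simple_random_sample`, the 40-member frame, and the seed are hypothetical and chosen only for illustration. It relies on the standard property that every subset of n sampling units is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n sampling units from the frame without replacement."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    # random.sample gives every subset of size n the same chance of selection
    return rng.sample(frame, n)

# Hypothetical frame of 40 members (assumption for illustration only)
frame = [f"unit_{i}" for i in range(1, 41)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Note that the whole frame has to exist as a list before the draw, which is exactly the requirement that can make this scheme infeasible in practice.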
Systematic Sampling
A systematic sample provides a convenient way to
choose the sample.
• First, divide the population size by the sample size,
creating “blocks.”
• Next, use a random mechanism to choose a number
between 1 and the number of members in each “block.”
• In general, one of the first k members is selected
randomly, and then every kth member after this one is
selected.
• The value k is called the sampling interval and equals
the ratio N/n, where N is the population size and n is the
desired sample size.
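The steps above can be sketched as follows. This is an illustration under the assumption that the frame is an ordered Python list and that N is (at least roughly) a multiple of n. The function name `systematic_sample` and the 100-member frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select one of the first k members at random, then every kth member."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    k = N // n                    # sampling interval (assumes N/n is close to an integer)
    rng = random.Random(seed)
    start = rng.randint(1, k)     # random 1-based position within the first block
    positions = range(start, N + 1, k)
    # Keep at most n members in case N is not an exact multiple of n
    return [frame[p - 1] for p in positions][:n]

# Hypothetical frame: members numbered 1..100, desired sample size 10, so k = 10
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

Only one random number is drawn; after that the sample is fully determined, which is what makes the scheme convenient.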
Stratified Sampling
• Suppose various subpopulations within the total
population can be identified. These subpopulations are
called strata.
• Instead of taking a simple random sample from the
entire population, it might make more sense to select a
simple random sample from each stratum separately.
• This sampling method is called stratified sampling.
Stratified Sampling (Cont’d)
Advantages of stratified sampling:
• Separate estimates can be obtained within each stratum,
which would not be obtained with a simple random sample
from the entire population.
• The accuracy of the resulting population estimates can be
increased by using appropriately defined strata.
• Define the strata such that there is less variability within the
individual strata than in the population as a whole.
Proportional Sample Sizes
There are many ways to choose sample sizes from
each stratum, but the most popular method is to use
proportional sample sizes.
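A minimal sketch of stratified sampling with proportional sample sizes, assuming the usual meaning of "proportional": each stratum's share of the sample matches its share of the population. The function name, the stratum names, and the stratum sizes are hypothetical.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its sampling units."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Stratum sample size proportional to stratum size; rounding can make
        # the sizes sum to slightly more or less than n in general
        n_h = round(n * len(units) / N)
        # Simple random sample taken separately within the stratum
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1000 split into three strata (illustration only)
strata = {
    "undergraduate": [f"U{i}" for i in range(600)],
    "graduate": [f"G{i}" for i in range(300)],
    "staff": [f"S{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, 50, seed=1)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

Because each stratum is sampled separately, a separate estimate can also be computed from each entry of `sample`.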
Cluster Sampling
In cluster sampling, the population is separated into
clusters, such as cities or city blocks, and then a random
sample of the clusters is selected.
• The primary advantage of cluster sampling is sampling
convenience (and possibly lower cost).
• The downside is that the inferences drawn from a cluster
sample can be less accurate for a given sample size than
other sampling plans.
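A minimal sketch of cluster sampling. It assumes that every member of each selected cluster is included in the sample, which is one common variant and not something the slide specifies. The function name and the city-block data are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and keep all members of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # random sample of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 6 city blocks with 5 households each
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in "ABCDEF"
}
sample = cluster_sample(clusters, 2, seed=1)
print(sample)
```

The random mechanism acts on clusters rather than on individual sampling units, so only the selected clusters need to be visited.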
Multistage Sampling Schemes
The cluster sampling scheme is an example of a
single-stage sampling scheme.
Real applications are often more complex than this,
resulting in multistage sampling schemes.
• For example, in ABC’s nationwide surveys, a random
sample of approximately 300 locations is chosen in
the first stage of the sampling process.
• City blocks or other geographical areas are then
randomly sampled from the first-stage locations in the
second stage of the process.
• This is followed by a systematic sampling of
households from each second-stage area.
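The three stages described above can be sketched roughly as follows. The nested-dictionary layout, the function name `multistage_sample`, and all sizes are assumptions made for illustration; they do not describe how any actual survey organization stores or samples its data.

```python
import random

def multistage_sample(locations, n_locations, n_blocks, interval, seed=None):
    """locations maps location -> block -> ordered list of households."""
    rng = random.Random(seed)
    households = []
    # Stage 1: random sample of locations
    for loc in rng.sample(sorted(locations), n_locations):
        blocks = locations[loc]
        # Stage 2: random sample of blocks within each chosen location
        for block in rng.sample(sorted(blocks), n_blocks):
            units = blocks[block]
            # Stage 3: systematic sample of households within each chosen block
            start = rng.randrange(interval)   # 0-based random start
            households.extend(units[start::interval])
    return households

# Hypothetical data: 5 locations, 4 blocks each, 12 households per block
locations = {
    f"loc{l}": {
        f"blk{b}": [f"loc{l}-blk{b}-H{h}" for h in range(12)]
        for b in range(4)
    }
    for l in range(5)
}
sample = multistage_sample(locations, n_locations=2, n_blocks=2, interval=4, seed=1)
print(len(sample), sample[:3])
```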
An Introduction to Estimation
The purpose of any random sample, simple or
otherwise, is to estimate properties of a population from
the data observed in the sample.
Sources of Estimation Errors
Nonsampling error is quite different from sampling error and can
occur for a variety of reasons:
• Nonresponse bias occurs when a portion of the sample
fails to respond to the survey.
Sources of Estimation Errors (cont’d)
• Voluntary response bias occurs when the subset
of people who respond to a survey differs in some
important respect from all potential respondents.
Key Terms in Sampling
A point estimate is a single numeric value, a “best
guess” of a population parameter, based on the data in
a random sample.
Sample Size Selection
The problem of selecting the appropriate sample size in
any sampling context is not an easy one, but it must be
faced in the planning stages, before any sampling is
done.
• The sampling error tends to decrease as the sample
size increases, so the desire to minimize sampling
error encourages us to select larger sample sizes.
• However, several other factors encourage us to select
smaller sample sizes, including:
• Cost
• Timely collection of data
• Increased chance of nonsampling error, such as
nonresponse bias
Summary of Key Ideas for Simple
Random Sampling
• To estimate a population mean with a simple random sample,
the sample mean is typically used as a “best guess”. This
estimate is called a point estimate.
• The accuracy of the point estimate is measured by its
standard error. It is the standard deviation of the sampling
distribution of the point estimate.
• A confidence interval (with 95% confidence) for the population
mean extends to approximately two standard errors on either
side of the sample mean.
• From the central limit theorem, the sampling distribution of the
sample mean X̄ is approximately normal when n is reasonably large.
• There is approximately a 95% chance that any particular X̄ will
be within two standard errors of the population mean μ.
• The sampling error can be reduced by increasing the sample
size n.
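The key ideas above can be illustrated with a short sketch. It assumes the standard error of the sample mean is estimated as s/√n (sample standard deviation over the square root of the sample size, ignoring any finite-population correction) and uses the approximate "two standard errors" rule. The function name `approx_95_ci` and the data values are hypothetical.

```python
import math
import statistics

def approx_95_ci(data):
    """Return the point estimate, its standard error, and an approximate 95% CI."""
    n = len(data)
    mean = statistics.mean(data)                 # point estimate of the population mean
    se = statistics.stdev(data) / math.sqrt(n)   # estimated standard error, s / sqrt(n)
    return mean, se, (mean - 2 * se, mean + 2 * se)

# Hypothetical sample of 16 observations (illustration only)
data = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46, 50, 52, 48, 51, 49, 53]
mean, se, (lower, upper) = approx_95_ci(data)
print(mean, se, lower, upper)

# Quadrupling the sample size (here by repeating the data) shrinks the standard error
_, se_large, _ = approx_95_ci(data * 4)
print(se_large < se)
```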
Confidence Interval Estimation
Statistical inferences are always based on an
underlying probability model, which means that some
type of random mechanism must generate the data.
• Two random mechanisms are generally used:
• Random sampling from a larger population
• Randomized experiments
Generally, statistical inferences are of two types:
• Confidence interval estimation uses the data to obtain
a point estimate and a confidence interval around
this point estimate.
• Hypothesis testing determines whether the observed
data provide support for a particular hypothesis.
Sampling Distributions
Most confidence intervals are of the form:
point estimate ± multiple × standard error
Other Sampling Distributions
The t distribution, a close relative of the normal
distribution, is used to make inferences about a
population mean when the population standard
deviation is unknown.
Confidence Interval for a Mean
To obtain a confidence interval for μ, first specify a
confidence level, usually 90%, 95%, or 99%.
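As a sketch of the resulting interval, assume a simple random sample of size n with sample mean X̄ and sample standard deviation s (symbols introduced here for illustration), and let 1 − α denote the chosen confidence level. When the population standard deviation is unknown, the usual form uses a t-multiple with n − 1 degrees of freedom:

```latex
\bar{X} \;\pm\; t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}
```

For a 95% confidence level, α = 0.05, and for reasonably large n the t-multiple is close to 2, which agrees with the "two standard errors" rule of thumb.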