
MDA511

Lecture 8:
Sampling and Confidence Interval
Estimation

Recommended Text:
Albright and Winston, “Business Analytics”
6th Edition. 2017 Copyright © Cengage Learning

Compiled by Prof. Paul Kwan


Motivations for Sampling
In a typical statistical inference problem, you want to
discover one or more characteristics of a given
population.

• It is generally difficult or even impossible to contact each member of the population.
• Solution: identify a sample of the population and then obtain information from members of the sample.

2
Lecture Objectives
• Discuss the sampling schemes generally used in real
sampling applications

• See how the information from a sample of the population can be used to infer the properties of the entire population

3
Sampling Terminology
A population is the set of all members about which a
study intends to make inferences.
• An inference is a statement about a numerical
characteristic of the population.
A frame is a list of all members of the population. The
potential sample members are called sampling units.
A probability sample is a sample in which the
sampling units are chosen from the population
according to a random mechanism.
A judgmental sample is a sample in which the
sampling units are chosen according to the sampler’s
judgment.
4
Methods for Selecting Random
Samples
Different types of sampling schemes have different
properties.

• There is typically a trade-off between cost and accuracy.
• Some sampling schemes are cheaper and easier to administer, whereas others are more costly but provide more accurate information.

5
Simple Random Sampling
The simplest type of sampling scheme is called simple
random sampling.

A simple random sample of size n is one where each possible sample of size n has the same chance of being chosen.
• Simple random samples are the easiest to understand, and their statistical properties are the most straightforward.
• More complex random samples are often used in real applications.
6
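The definition above maps directly onto Python's standard library. Below is a minimal sketch, using a hypothetical frame of customer IDs (the frame and its size are illustrative, not from the lecture):

```python
import random

# Hypothetical frame: a list of 1,000 customer IDs (illustrative only)
frame = [f"cust-{i:04d}" for i in range(1000)]

random.seed(42)  # for reproducibility
# random.sample draws without replacement, and every subset of size n
# is equally likely -- exactly the definition of a simple random sample
sample = random.sample(frame, k=30)

print(len(sample))         # 30
print(len(set(sample)))    # 30 -- no member appears twice
```

Note that the entire frame must exist in memory before sampling, which previews one of the practical drawbacks discussed next.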
Simple Random Sampling
Simple random samples are used infrequently in real
applications. There are several reasons for this:

• Because each sampling unit has the same chance of being sampled, simple random sampling can result in samples that are spread over a large geographical region.
• This can make sampling extremely expensive, especially if personal interviews are used.

7
Simple Random Sampling (cont’d)
• Simple random sampling requires that all sampling
units be identified prior to sampling. Sometimes this is
infeasible.

• Simple random sampling can result in underrepresentation or overrepresentation of certain segments of the population.

8
Systematic Sampling
A systematic sample provides a convenient way to
choose the sample.
• First, divide the population size by the sample size, creating equal-sized “blocks.”
• Next, use a random mechanism to choose one number between 1 and the number of members in each “block.”
• In general, one of the first k members is selected randomly, and then every kth member after this one is selected.
• The value k is called the sampling interval and equals the ratio N/n, where N is the population size and n is the desired sample size.
9
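The steps above can be sketched in a few lines of Python. The frame and sizes below are illustrative assumptions:

```python
import random

def systematic_sample(frame, n):
    """Choose one of the first k members at random, then every kth
    member after it, where k = N // n is the sampling interval."""
    k = len(frame) // n
    start = random.randrange(k)
    return [frame[start + i * k] for i in range(n)]

random.seed(1)
frame = list(range(1000))                 # N = 1000
sample = systematic_sample(frame, n=50)   # k = 1000 / 50 = 20
print(len(sample))                                           # 50
print(all(b - a == 20 for a, b in zip(sample, sample[1:])))  # True
```

Only one random number is needed, which is what makes the scheme so convenient in practice.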
Systematic Sampling (Cont’d)

10
Stratified Sampling
• Suppose various subpopulations within the total
population can be identified. These subpopulations are
called strata.
• Instead of taking a simple random sample from the
entire population, it might make more sense to select a
simple random sample from each stratum separately.
• This sampling method is called stratified sampling.

11
Stratified Sampling (Cont’d)

12
Stratified Sampling (Cont’d)
Advantages of stratified sampling:
• Separate estimates can be obtained within each stratum,
which would not be obtained with a simple random sample
from the entire population.
• The accuracy of the resulting population estimates can be
increased by using appropriately defined strata.
• Define the strata such that there is less variability within the
individual strata than in the population as a whole.

13
Proportional Sample Sizes
There are many ways to choose sample sizes from
each stratum, but the most popular method is to use
proportional sample sizes.

• With proportional sample sizes, the proportion of a stratum in the sample is the same as the proportion of that stratum in the population.
• The advantage of proportional sample sizes is that they are very easy to determine.
• The disadvantage is that they ignore differences in variability among the strata.

14
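Proportional allocation is easy to sketch in Python. The two-stratum population below (600 “urban” and 400 “rural” units) is a hypothetical example, not from the lecture:

```python
import random
from collections import defaultdict

def proportional_stratified_sample(frame, stratum_of, n):
    """Take a simple random sample from each stratum, sized in
    proportion to the stratum's share of the population."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    N = len(frame)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / N)    # proportional allocation
        sample.extend(random.sample(members, n_h))
    return sample

random.seed(0)
# Hypothetical population: 600 "urban" and 400 "rural" units
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = proportional_stratified_sample(frame, lambda u: u[0], n=100)
urban = sum(1 for u in sample if u[0] == "urban")
print(urban, len(sample) - urban)    # 60 40
```

With n = 100 and a 60/40 population split, the allocation is exactly 60 urban and 40 rural units, mirroring the population proportions.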
Proportional Sample Sizes (Cont’d)

15
Cluster Sampling
In cluster sampling, the population is separated into
clusters, such as cities or city blocks, and then a random
sample of the clusters is selected.
• The primary advantage of cluster sampling is sampling
convenience (and possibly lower cost).
• The downside is that the inferences drawn from a cluster
sample can be less accurate for a given sample size than
other sampling plans.

16
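A single-stage cluster sample can be sketched as follows; the “city blocks of households” population is an illustrative assumption:

```python
import random

def cluster_sample(clusters, m):
    """Single-stage cluster sampling: select m whole clusters at
    random and include every unit in the chosen clusters."""
    chosen = random.sample(clusters, m)
    return [unit for cluster in chosen for unit in cluster]

random.seed(3)
# Hypothetical population: 20 city blocks of 50 households each
clusters = [[(block, hh) for hh in range(50)] for block in range(20)]
sample = cluster_sample(clusters, m=4)

print(len(sample))                            # 200 households
print(len({block for block, _ in sample}))    # drawn from only 4 blocks
```

The convenience is visible in the output: 200 households, but interviewers only need to visit 4 blocks, which is also why units within a cluster tend to be similar and the estimates less accurate.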
Multistage Sampling Schemes
The cluster sampling scheme is an example of a
single-stage sampling scheme.
Real applications are often more complex than this,
resulting in multistage sampling schemes.
• For example, in ABC’s nationwide surveys, a random
sample of approximately 300 locations is chosen in
the first stage of the sampling process.
• City blocks or other geographical areas are then
randomly sampled from the first-stage locations in the
second stage of the process.
• This is followed by a systematic sampling of
households from each second-stage area.
17
Multistage Sampling Schemes
(Cont’d)

18
An Introduction to Estimation
The purpose of any random sample, simple or
otherwise, is to estimate properties of a population from
the data observed in the sample.

The mathematical procedures appropriate for performing this estimation depend on which properties of the population are of interest and which type of random sampling scheme is used.
For both simple random samples and more complex sampling schemes, the concepts are the same.
19
Sources of Estimation Errors
There are two basic sources of errors that can occur
when you sample randomly from a population:
• Sampling error
• Nonsampling error

Sampling error is the inevitable result of basing an inference on a random sample rather than on the entire population.

20
Sources of Estimation Errors (Cont’d)
Nonsampling error is quite different and can occur for
a variety of reasons:
• Nonresponse bias occurs when a portion of the sample
fails to respond to the survey.

• Nontruthful responses are particularly a problem when there are sensitive questions in a questionnaire.
• Measurement error occurs when the responses to the questions do not reflect what the investigator had in mind (e.g., when questions are poorly worded).

21
Sources of Estimation Errors (cont’d)
• Voluntary response bias occurs when the subset
of people who respond to a survey differs in some
important respect from all potential respondents.

• The potential for nonsampling error is enormous.
• However, unlike sampling error, it cannot be measured with probability theory.
• It can be controlled only by using appropriate sampling procedures and designing good survey instruments.

22
Key Terms in Sampling
A point estimate is a single numeric value, a “best
guess” of a population parameter, based on the data in
a random sample.

The sampling error (or estimation error) is the difference between the point estimate and the true value of the population parameter being estimated.
The sampling distribution of any point estimate is the distribution of the point estimates from all possible samples (of a given sample size) from the population.
23
Key Terms in Sampling
A confidence interval is an interval around the point
estimate, calculated from the sample data, that is very
likely to contain the true value of the population
parameter.
An unbiased estimate is a point estimate such that the
mean of its sampling distribution is equal to the true
value of the population parameter being estimated.
The standard error of an estimate is the standard
deviation of the sampling distribution of the estimate.
• It measures how much estimates vary from sample to
sample.
24
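The sampling distribution and standard error defined above can be made concrete by simulation. The population below (100,000 values with mean 50 and standard deviation 10) is a hypothetical example:

```python
import random
import statistics

random.seed(2)
# Hypothetical population of 100,000 values with mean 50 and sd 10
population = [random.gauss(50, 10) for _ in range(100_000)]

# Empirical sampling distribution of the sample mean for n = 25:
# draw many samples and record the point estimate from each one.
point_estimates = [statistics.mean(random.sample(population, 25))
                   for _ in range(5_000)]

# The standard error is the standard deviation of this distribution;
# theory predicts sigma / sqrt(n) = 10 / 5 = 2.
print(round(statistics.stdev(point_estimates), 2))
```

The printed value should land very close to 2, matching the σ/√n formula on the next slide.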
Sampling Distribution of the Sample
Mean
The sampling distribution of the sample mean X̄ has the following properties:
• It is an unbiased estimate of the population mean, as indicated in this equation: E(X̄) = μ
• The standard error of the sample mean is given by SE(X̄) = σ/√n, where σ is the standard deviation of the population, and n is the sample size.
• It is customary to approximate the standard error by substituting the sample standard deviation, s, for σ, which leads to SE(X̄) ≈ s/√n.
• If you go out two standard errors on either side of the sample mean, you are approximately 95% confident of capturing the population mean: X̄ ± 2s/√n.
25
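A minimal Python sketch of these formulas, using a hypothetical sample of 50 observations drawn from a population with mean 100 and standard deviation 15 (illustrative values):

```python
import math
import random
import statistics

random.seed(7)
# Hypothetical sample: 50 draws from a population with mean 100, sd 15
data = [random.gauss(100, 15) for _ in range(50)]

n = len(data)
xbar = statistics.mean(data)     # point estimate of the population mean
s = statistics.stdev(data)       # sample standard deviation
se = s / math.sqrt(n)            # estimated standard error, s / sqrt(n)

# Two standard errors on either side gives roughly 95% confidence
lo, hi = xbar - 2 * se, xbar + 2 * se
print(f"mean = {xbar:.2f}, SE = {se:.2f}, approx 95% CI = ({lo:.2f}, {hi:.2f})")
```

The two-standard-error rule is the rough version of the t-based confidence interval developed later in the lecture.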
The Finite Population Correction
Generally, the sample size is small relative to the population size.
There are situations, however, when the sample size is greater than 5% of the population.
In this case, the formula for the standard error of the mean should be modified with a finite population correction, or fpc, factor:
fpc = √((N − n)/(N − 1))
The standard error of the mean is multiplied by fpc in order to make the correction:
SE(X̄) = fpc × σ/√n
26
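The correction is a one-line formula; the numbers below (s = 20, a sample of 100 from a population of 500) are illustrative assumptions:

```python
import math

def standard_error_with_fpc(s, n, N):
    """Standard error of the mean with the finite population correction,
    used when n exceeds about 5% of the population size N."""
    fpc = math.sqrt((N - n) / (N - 1))
    return fpc * s / math.sqrt(n)

# Hypothetical numbers: s = 20, sample of 100 from a population of 500
se_plain = 20 / math.sqrt(100)                   # 2.0, no correction
se_fpc = standard_error_with_fpc(20, 100, 500)   # ~1.79
print(round(se_plain, 2), round(se_fpc, 2))      # 2.0 1.79
```

Because n/N = 20% here, ignoring the correction would overstate the standard error by about 12%.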
The Central Limit Theorem
For any population distribution with mean μ and standard deviation σ, the sampling distribution of the sample mean X̄ is approximately normal with mean μ and standard deviation σ/√n, and the approximation improves as n increases. This is called the central limit theorem.
The important part of this result is the normality of the sampling distribution.
• When you sum or average n randomly selected values from any distribution, normal or otherwise, the distribution of the sum or average is approximately normal, provided that n is sufficiently large.
• This is the primary reason why the normal distribution is relevant in so many real-world applications.
27
Example 2: Average Winnings from
a Wheel of Fortune
Objective: To illustrate the central limit theorem by a simulation
of winnings in a game of chance.
Solution: The population is the set of all outcomes you could
obtain from a single spin of the wheel—that is, all dollar values
from $0 to $1000.
Each spin results in one randomly sampled dollar value from this
population.
Each replication of the experiment simulates n spins of the wheel
and calculates the average—that is, the winnings—from these n
spins.
A histogram of the winnings is formed for any value of n, where n is the number of spins.
As the number of spins increases, the histogram takes on more and more of a bell shape.
28
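The simulation described above can be sketched directly in Python (the replication count is an illustrative choice):

```python
import random
import statistics

random.seed(11)

def average_winnings(n_spins):
    """Average winnings from n uniform spins of a $0-$1000 wheel."""
    return statistics.mean(random.uniform(0, 1000) for _ in range(n_spins))

# Simulate many replications for several numbers of spins; the spread of
# the averages shrinks roughly like 1/sqrt(n), and a histogram of each
# list becomes increasingly bell-shaped.
for n in (1, 3, 6, 10):
    winnings = [average_winnings(n) for _ in range(10_000)]
    print(n, round(statistics.stdev(winnings), 1))
```

Plotting each `winnings` list as a histogram reproduces the four panels on the next slide: flat for a single spin, and progressively more normal-looking as n grows.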
Example 2: Average Winnings from
a Wheel of Fortune
[Histograms of simulated winnings after a single spin, three spins, six spins, and ten spins, showing the distribution becoming increasingly bell-shaped]

29
Sample Size Selection
The problem of selecting the appropriate sample size in
any sampling context is not an easy one, but it must be
faced in the planning stages, before any sampling is
done.
• The sampling error tends to decrease as the sample
size increases, so the desire to minimize sampling
error encourages us to select larger sample sizes.
• However, several other factors encourage us to select
smaller sample sizes, including:
• Cost
• Timely collection of data
• Increased chance of nonsampling error, such as
nonresponse bias
30
Summary of Key Ideas for Simple
Random Sampling
• To estimate a population mean with a simple random sample, the sample mean X̄ is typically used as a “best guess.” This estimate is called a point estimate.
• The accuracy of the point estimate is measured by its standard error. It is the standard deviation of the sampling distribution of the point estimate.
• A confidence interval (with 95% confidence) for the population mean extends to approximately two standard errors on either side of the sample mean.
• From the central limit theorem, the sampling distribution of X̄ is approximately normal when n is reasonably large.
• There is approximately a 95% chance that any particular X̄ will be within two standard errors of the population mean μ.
• The sampling error can be reduced by increasing the sample size n.
31
Confidence Interval Estimation
Statistical inferences are always based on an
underlying probability model, which means that some
type of random mechanism must generate the data.
• Two random mechanisms are generally used:
• Random sampling from a larger population
• Randomized experiments
Generally, statistical inferences are of two types:
• Confidence interval estimation uses the data to obtain
a point estimate and a confidence interval around
this point estimate.
• Hypothesis testing determines whether the observed
data provide support for a particular hypothesis.
32
Sampling Distributions
Most confidence intervals are of the form:
point estimate ± multiple × standard error
In general, whenever you make inferences about one or more population parameters, you always base this inference on the sampling distribution of a point estimate, such as the sample mean.
An equivalent statement of the central limit theorem is that the standardized quantity Z, as defined below, is approximately normal with mean 0 and standard deviation 1:
Z = (X̄ − μ)/(σ/√n)
• However, the population standard deviation σ is rarely known, so it is replaced by its sample estimate s in the formula for Z.
• When the replacement is made, a new source of variability is introduced, and the sampling distribution is no longer normal. Instead, it is called the t distribution.
33
The t Distribution
If we are interested in estimating a population mean μ with a sample of size n, we assume the population distribution is normal with unknown standard deviation σ.
σ is replaced by the sample standard deviation s, as shown in this equation:
t = (X̄ − μ)/(s/√n)
• The standardized value in this equation has a t distribution with n − 1 degrees of freedom.
• The degrees of freedom is a numerical parameter of the t distribution that defines the precise shape of the distribution.
• The t-value in this equation is very much like a typical Z-value.
• That is, the t-value indicates the number of standard errors by which the sample mean differs from the population mean.
34
The t Distribution
The t distribution looks very much like the standard normal
distribution.
• It is bell-shaped and centered at 0.
• The only difference is that it is slightly more spread out, and
this increase in spread is greater for small degrees of
freedom.
• When n is large, so that the degrees of freedom is large,
the t distribution and the standard normal distribution are
practically indistinguishable, as shown below.

35
Other Sampling Distributions
The t distribution, a close relative of the normal
distribution, is used to make inferences about a
population mean when the population standard
deviation is unknown.

Two other close relatives of the normal distribution are the chi-square and F distributions.
• These are used primarily to make inferences about variances (or standard deviations), as opposed to means.

36
Confidence Interval for a Mean
To obtain a confidence interval for μ, first specify a
confidence level, usually 90%, 95%, or 99%.

Then use the sampling distribution of the point estimate to determine the multiple of the standard error (SE) to go out on either side of the point estimate to achieve the given confidence level.
• If the confidence level is 95%, the value used most frequently in applications, the multiple is approximately 2. More precisely, it is a t-value.
• A typical confidence interval for μ is of the form:
X̄ ± t × SE(X̄), where SE(X̄) = s/√n
37
Confidence Interval for a Mean
To obtain the correct t-multiple, let α be 1 minus the
confidence level (expressed as a decimal).
• For example, if the confidence level is 90%, then α =
0.10.
Then the appropriate t-multiple is the value that cuts off
probability α/2 in each tail of the t distribution with n−1
degrees of freedom.
As the confidence level increases, the length of the
confidence interval also increases.
As n increases, the standard error s/√n decreases, so
the length of the confidence interval tends to decrease
for any confidence level.
38
Example 3: Customer Response to
a New Sandwich
Objective: To obtain a 95% confidence interval for the mean
satisfaction rating of the new sandwich.
Solution: A random sample of 40 customers who ordered the new sandwich was surveyed. Each was asked to rate the sandwich on a scale of 1 to 10.
The results appear in column B of the accompanying spreadsheet.
This method, using only Excel®, is shown by the formulas in column G.

39
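The same calculation can be sketched in Python. The ratings below are randomly generated stand-ins, not the actual survey data from the lecture's Excel file, and SciPy is assumed to be available for the t quantile:

```python
import math
import random
import statistics
from scipy import stats  # assumes SciPy is available for the t quantile

random.seed(5)
# Hypothetical ratings for 40 customers on a 1-10 scale; the actual
# survey data lives in the lecture's Excel file
ratings = [random.randint(1, 10) for _ in range(40)]

n = len(ratings)
xbar = statistics.mean(ratings)                  # point estimate
se = statistics.stdev(ratings) / math.sqrt(n)    # standard error s/sqrt(n)
t_mult = stats.t.ppf(0.975, df=n - 1)            # ~2.023 for 39 df

lo, hi = xbar - t_mult * se, xbar + t_mult * se
print(f"95% CI for the mean rating: ({lo:.2f}, {hi:.2f})")
```

With real survey data in place of the simulated ratings, this is exactly the interval the Excel formulas in column G compute.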
