Business Data Analytics Students-07-Sampling PDF
Sampling
Business Analytics 07
Tools of Business Statistics
• Descriptive statistics
• Collecting, presenting, and describing data
• Inferential statistics
• Drawing conclusions and/or making decisions concerning a
population based only on sample data
Recall…
• Statistics is a tool for converting data into information:
Data → Statistics → Information
Population Parameters
• Measures such as mean and standard deviation calculated using the entire
population are called population parameters
• The population mean and standard deviation are usually denoted
by the symbols μ and σ, respectively
Inferential Statistics
Sample: known
Population: unknown, but can be estimated from the sample
Sample Statistic
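• When a measure such as the mean or standard deviation is calculated from a sample,
it is called a sample statistic (or simply a statistic) and is used to estimate the
corresponding population parameter
• The sample statistics are denoted by the symbols $\bar{X}$ (for the mean) and S or s
(for the standard deviation)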
Inferential Statistics
• Estimation
• e.g., Estimate the population mean
weight using the sample mean weight
• Hypothesis Testing
• e.g., Use sample evidence to test the
claim that the population mean weight
is 120 pounds
Drawing conclusions and/or making decisions
concerning a population based on sample results.
Sampling
• Random sampling is usually carried out without replacement: an observation that is
selected for the sample is removed from the population and cannot be selected again
• Random samples can also be created with replacement: an observation that is selected
for the sample is returned to the population (not removed), so it can be selected
again.
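The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the population of ten labelled observations and the seed are assumptions, not data from the slides.

```python
import random

# Hypothetical population of 10 labelled observations (not from the slides).
population = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Without replacement: each observation can appear at most once.
without_replacement = rng.sample(population, k=5)

# With replacement: a selected observation goes back into the population
# and may be drawn again.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

The sample drawn without replacement never contains duplicates, while the one drawn with replacement may.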
Simple Random Samples
• Every object in the population has an equal chance of being selected
• Objects are selected independently
• Samples can be obtained from a table of random numbers or computer random number
generators
• A simple random sample is the ideal against which other sample methods are compared
• Patients and length of stay (LoS) in days
Patient 1 2 3 4 5 6 7 8 9 10
LoS 4 20 12 13 15 17 16 20 9 17
3 4 5 1 8 12 13 15 4 20
1 7 9 1 3 4 16 9 4 12
8 4 7 3 5 20 13 16 12 15
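As a rough sketch, a simple random sample can be drawn from the first ten patients in the table with a computer random number generator. The sample size of 4 and the seed are arbitrary assumptions made for illustration.

```python
import random
import statistics

# Length of stay (days) for patients 1-10, taken from the table above.
los = {1: 4, 2: 20, 3: 12, 4: 13, 5: 15, 6: 17, 7: 16, 8: 20, 9: 9, 10: 17}

rng = random.Random(7)  # arbitrary seed, assumed for repeatability

# Every patient has the same chance of selection (sampling without replacement).
sample_ids = rng.sample(sorted(los), k=4)
sample_los = [los[p] for p in sample_ids]

population_mean = statistics.mean(list(los.values()))
sample_mean = statistics.mean(sample_los)

print("sampled patients:", sample_ids)
print("sample mean LoS:", sample_mean, "| mean LoS of all 10 patients:", population_mean)
```

A different seed gives a different sample and therefore a different sample mean, which is the variability that the sampling distribution describes.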
Additional Probabilistic Sampling Methods
• Systematic (periodic) sampling – a sampling plan that selects every nth item from the
population.
• Stratified sampling – applies to populations that are divided into natural subsets (called strata)
and allocates the appropriate proportion of samples to each stratum.
• Cluster sampling – based on dividing a population into mutually exclusive subgroups (clusters),
sampling a set of clusters, and (usually) conducting a complete census within the clusters sampled.
• Sampling from a continuous process
• Select a time at random; then select the next n items produced after that time.
• Select n times at random; then select the next item produced after each of these times.
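The proportional allocation used in stratified sampling can be sketched as follows. The strata (customers grouped by region), their sizes, and the total sample size are hypothetical values chosen so that the allocation comes out in whole numbers.

```python
import random

# Hypothetical population: customer IDs grouped into strata by region.
strata = {
    "north": list(range(0, 50)),    # 50 customers
    "south": list(range(50, 80)),   # 30 customers
    "west": list(range(80, 100)),   # 20 customers
}
total = sum(len(members) for members in strata.values())
sample_size = 10
rng = random.Random(1)  # arbitrary seed

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: stratum share of the population times n.
    # With other sizes, rounding may make the parts not sum exactly to n.
    k = round(sample_size * len(members) / total)
    stratified_sample[name] = rng.sample(members, k)

for name, chosen in stratified_sample.items():
    print(name, len(chosen), chosen)
```

Within each stratum the draw is a simple random sample, so every stratum is represented in proportion to its size.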
Systematic Sampling
• If a sample of size n is desired from a population containing N elements, we
might sample one element for every N/n elements in the population
• We randomly select one of the first N/n elements from the population list
• We then select every (N/n)th element that follows in the population list
• This method has the properties of a simple random sample, especially if the list
of the population elements is a random ordering
• Example: Selecting every 100th listing in a telephone book after the first
randomly selected listing
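A minimal sketch of this procedure, assuming the sampling interval is k = N/n (population size divided by sample size) and that N is a multiple of n. The list of 1000 listings and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(items, n, rng):
    """Pick a random start among the first k items, then every k-th item."""
    k = len(items) // n        # sampling interval, assumed to be N / n
    start = rng.randrange(k)   # random position within the first k items
    return items[start::k][:n]

listings = list(range(1, 1001))  # hypothetical list of N = 1000 listings
sample = systematic_sample(listings, n=10, rng=random.Random(3))
print(sample)
```

With N = 1000 and n = 10 the interval is 100, which matches the telephone book example of taking every 100th listing after a random start.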
Stratified Sampling
• Voluntary sampling: the data are collected from people who volunteer for the data collection.
• Voluntary sampling can introduce bias
Sampling Distributions
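Population of N = 4 values (A, B, C, D): X = 18, 20, 22, 24
$$\sigma = \sqrt{\frac{\sum_i (X_i - \mu)^2}{N}} = 2.236$$
[Figure: Uniform Distribution over x = 18, 20, 22, 24 (A, B, C, D)]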
Developing Sampling Distributions
16 possible samples (sampling with replacement)

1st Obs | 2nd Observation
        | 18     20     22     24
18      | 18,18  18,20  18,22  18,24
20      | 20,18  20,20  20,22  20,24
22      | 22,18  22,20  22,22  22,24
24      | 24,18  24,20  24,22  24,24

16 Sample Means

1st Obs | 2nd Observation
        | 18  20  22  24
18      | 18  19  20  21
20      | 19  20  21  22
22      | 20  21  22  23
24      | 21  22  23  24
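The 16 samples and their means can be enumerated directly. The sketch below assumes ordered samples of size 2 drawn with replacement from the population 18, 20, 22, 24, as in the tables above.

```python
from collections import Counter
from itertools import product

population = [18, 20, 22, 24]

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16.
samples = list(product(population, repeat=2))
sample_means = [(a + b) / 2 for a, b in samples]

# Relative frequency of each distinct sample mean.
counts = Counter(sample_means)
distribution = {m: counts[m] / len(samples) for m in sorted(counts)}

for mean, prob in distribution.items():
    print(mean, prob)
```

The sample means range from 18 to 24, and the middle value 21 is the most frequent because four different samples produce it.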
Expected Value of Sample Mean
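$$E(\bar{X}) = \mu_{\bar{X}} = \frac{\sum_i \bar{X}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21 = \mu$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_i (\bar{X}_i - \mu)^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$
Here N = 16 is the number of possible samples, and the sums run over all 16 sample means.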
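These two results can be checked numerically. The sketch below recomputes the population parameters and the mean and standard deviation of the 16 sample means; the variable names are illustrative only.

```python
import math
from itertools import product

population = [18, 20, 22, 24]
means = [(a + b) / 2 for a, b in product(population, repeat=2)]

# Population parameters (divide by N, not N - 1).
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Mean and standard deviation of the 16 sample means.
mean_of_means = sum(means) / len(means)
sd_of_means = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

print(mu, round(sigma, 3))                   # 21.0 2.236
print(mean_of_means, round(sd_of_means, 2))  # 21.0 1.58
```

The standard deviation of the sample means equals σ divided by the square root of the sample size (2.236 / √2 ≈ 1.58).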
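Comparing the Population with its Sampling Distribution
Population (N = 4): μ = 21, σ = 2.236
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: P(X) for the population values X = 18, 20, 22, 24 (A, B, C, D), next to $P(\bar{X})$ for the sample means $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24]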
Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample
means
• A measure of the variability in the mean from sample to sample is given by the Standard
Error of the Mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the sample size increases
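A short sketch of how the standard error σ/√n shrinks as n grows. The value σ = 2.236 is taken from the age example; the sample sizes are arbitrary choices for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 2.236  # population standard deviation from the age example
for n in (2, 4, 16, 64):
    print(n, round(standard_error(sigma, n), 3))
```

Quadrupling the sample size halves the standard error, because n enters through its square root.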
Central Limit Theorem
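• Let S1, S2, …, Sk be samples of size n drawn independently from an
identically distributed population with mean μ and standard deviation σ.
• Let $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ be the sample means (of the samples S1, S2, …, Sk).
• Then, for large n, the distribution of the sample means is approximately normal with
mean μ and standard deviation $\sigma / \sqrt{n}$.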
Why study Central Limit Theorem
• The central limit theorem is the basis for hypothesis tests such as the Z test and the t test.
In many cases we have access only to a sample, and inferences about the population must be
made from sample statistics.
Sampling distribution properties:
• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• The sampling distribution becomes normal as n increases
[Figure: Population Distribution and Sampling Distribution of $\bar{x}$ for a smaller and a larger sample size]
An alternative version of the CLT can be stated as follows:
Let X1, X2, …, Xn be n random variables that are independent and identically
distributed with mean μ and standard deviation σ. Then, for large n, the mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
approximately follows a normal distribution with mean μ and standard error $\sigma / \sqrt{n}$.
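A small simulation can illustrate the statement. The sketch below assumes a hypothetical non-normal population (a fair six-sided die), a sample size of 40, and 5000 repeated samples; none of these values come from the slides.

```python
import math
import random
import statistics

rng = random.Random(0)            # fixed seed, assumed for repeatability
faces = [1, 2, 3, 4, 5, 6]        # hypothetical non-normal population
mu = statistics.mean(faces)       # 3.5
sigma = statistics.pstdev(faces)  # population standard deviation

n = 40              # size of each sample
num_samples = 5000  # number of repeated samples

# Each sample is n independent draws with replacement from the population.
sample_means = [
    sum(rng.choices(faces, k=n)) / n for _ in range(num_samples)
]

print("mean of sample means:", statistics.mean(sample_means), "vs mu =", mu)
print("sd of sample means:  ", statistics.pstdev(sample_means),
      "vs sigma/sqrt(n) =", sigma / math.sqrt(n))
```

The simulated mean and standard deviation of the sample means should land close to μ and σ/√n, although a simulation can only suggest the result, not prove it.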
How Large is Large Enough?
• If the population is normal, then $\bar{X}$ is normally distributed for all values of n.
• For normal population distributions, the sampling distribution of the mean is always normally
distributed
• If the population is non-normal, then $\bar{X}$ is approximately normal only for larger values of n.
• In most practical situations, a sample size n > 30 is large enough for the normal distribution
to be used as an approximation for the sampling distribution of $\bar{X}$.
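One way to see the effect of sample size is to track the skewness of the sample means for a clearly non-normal population. The sketch below assumes an exponential population with rate 1 and compares n = 2 with n = 30. The helper names and the number of repetitions are illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed, assumed for repeatability

def sample_mean_skewness(n, reps=4000):
    """Skewness of `reps` sample means, each from n exponential draws."""
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

skew_small = sample_mean_skewness(2)
skew_large = sample_mean_skewness(30)
print("skewness of sample means, n = 2: ", skew_small)
print("skewness of sample means, n = 30:", skew_large)
```

A normal distribution has skewness 0, so the drop in skewness from n = 2 to n = 30 indicates that the sampling distribution is moving toward a normal shape.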
Estimation of Population Parameters
• Estimation is a process used for making inferences about population parameters
based on samples