Chapter 7
Sampling and Sampling Distributions

Learning Objectives
• In this chapter, you learn:
  - To distinguish between different sampling methods
  - The concept of the sampling distribution
  - To compute probabilities related to the sample mean and the sample proportion
  - The importance of the Central Limit Theorem

Why Sample?
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.

A Sampling Process Begins With A Sampling Frame
• The sampling frame is a listing of items that make up the population.
• Frames are data sources such as population lists, directories, or maps.
• Inaccurate or biased results can result if a frame excludes certain portions of the population.
• Using different frames to generate data can lead to dissimilar conclusions.

Types of Samples

Types of Samples: Nonprobability Sample
• In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  - In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
  - In a judgment sample, you get the opinions of preselected experts in the subject matter.

Types of Samples: Probability Sample
• In a probability sample, items in the sample are chosen on the basis of known probabilities.

Probability Sample: Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or from computer random number generators.

Selecting a Simple Random Sample Using A Random Number Table

Probability Sample: Systematic Sample
• Decide on sample size: n
• Divide the frame of N individuals into n groups of k individuals each: k = N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

Probability Sample: Stratified Sample
• Divide the population into two or more subgroups (called strata) according to some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
• Samples from the subgroups are combined into one
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.

Probability Sample: Cluster Sample
• Population is divided into several “clusters,” each representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

Probability Sample: Comparing Sampling Methods
• Simple random sample and systematic sample
  - Simple to use
  - May not be a good representation of the population’s underlying characteristics
• Stratified sample
  - Ensures representation of individuals across the entire population
• Cluster sample
  - More cost effective
  - Less efficient (need a larger sample to acquire the same level of precision)

Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error: appropriate frame?
• Nonresponse error: follow up
• Measurement error: good questions elicit good responses
• Sampling error: always exists

Types of Survey Errors
• Coverage error or selection bias
  - Exists if some groups are excluded from the frame and have no chance of being selected
• Nonresponse error or bias
  - People who do not respond may be different from those who do respond
• Sampling error
  - Variation from sample to sample will always exist
• Measurement error
  - Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”)

Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a sample statistic for a given sample size selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all the mean GPAs we might calculate for samples of 50 students.
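
The GPA example above can be approximated by simulation. The following Python sketch is illustrative only: the GPA population is synthetic (5,000 made-up values between 2.0 and 4.0), `simulate_sample_means` is a hypothetical helper name, and 1,000 repeated samples only approximate the distribution of all possible sample means.

```python
import random
import statistics


def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples simple random samples of size n (without replacement
    within each sample) and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]


# Hypothetical population: 5,000 GPAs spread uniformly between 2.0 and 4.0.
# These values are invented for illustration and are not real data.
_population_rng = random.Random(42)
gpa_population = [round(_population_rng.uniform(2.0, 4.0), 2) for _ in range(5000)]

# Each entry is the mean GPA of one sample of 50 students.
sample_means = simulate_sample_means(gpa_population, n=50, num_samples=1000)

print("population mean:          ", round(statistics.mean(gpa_population), 3))
print("mean of the sample means: ", round(statistics.mean(sample_means), 3))
print("spread of individual GPAs:", round(statistics.pstdev(gpa_population), 3))
print("spread of sample means:   ", round(statistics.pstdev(sample_means), 3))
```

In a run of this sketch the sample means cluster around the population mean and vary much less than the individual GPAs do.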

Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)

Developing a Sampling Distribution
• Summary Measures for the Population Distribution:
  $\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
  $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{5} \approx 2.236$

Developing a Sampling Distribution
• Now consider all possible samples of size n = 2
• Sampling Distribution of All Sample Means
• Summary Measures of this Sampling Distribution

Comparing the Population Distribution to the Sample Means Distribution

Sample Mean Sampling Distribution: Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
  $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
  (This assumes that sampling is with replacement or sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the sample size increases

Sample Mean Sampling Distribution: If the Population is Normal
• If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with
  $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Z-value for Sampling Distribution of the Mean
• $Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Sampling Distribution Properties

Determining An Interval Including A Fixed Proportion of the Sample Means
• Find a symmetrically distributed interval around μ that will include 95% of the sample means when μ = 368, σ = 15, and n = 25.
• Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96.
• Lower limit: $\bar{X}_L = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (-1.96) \frac{15}{\sqrt{25}} = 362.12$
• Upper limit: $\bar{X}_U = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (1.96) \frac{15}{\sqrt{25}} = 373.88$

Sample Mean Sampling Distribution: If the Population is not Normal
• We can apply the Central Limit Theorem:
  - Even if the population is not normal, sample means from the population will be approximately normal as long as the sample size is large enough.

Central Limit Theorem

How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will usually give a sampling distribution that is almost normal
• For normal population distributions, the sampling distribution of the mean is always normally distributed

Example
• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?
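
One way to work this example is sketched below in Python. It assumes the sample mean is approximately normal (reasonable here by the Central Limit Theorem, since n = 36 > 30) and uses `statistics.NormalDist` from the standard library in place of a printed Z table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the example.
mu, sigma, n = 8, 3, 36

# Standard error of the mean: sigma / sqrt(n) = 3 / 6 = 0.5
standard_error = sigma / sqrt(n)

# Convert the limits on the sample mean to Z values.
z_low = (7.8 - mu) / standard_error
z_high = (8.2 - mu) / standard_error

# P(7.8 < sample mean < 8.2) = P(z_low < Z < z_high) under the standard normal.
std_normal = NormalDist()
probability = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print("Z limits:", round(z_low, 2), round(z_high, 2))
print("probability:", round(probability, 4))
```

The Z limits come out to -0.4 and 0.4, and the probability is about 0.3108.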

Population Proportions

Sampling Distribution of p
• $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi (1 - \pi)}{n}}$

Z-Value for Proportions
• $Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\pi (1 - \pi) / n}}$

Example
• If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• That is, if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Chapter Summary
• Discussed probability and nonprobability samples
• Described four common probability samples
• Examined survey worthiness and types of survey errors
• Introduced sampling distributions
• Described the sampling distribution of the mean
  - For normal populations
  - Using the Central Limit Theorem
• Described the sampling distribution of a proportion
• Calculated probabilities using sampling distributions
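
As a worked illustration of the last summary point, the Proposition A example (π = 0.4, n = 200) can be computed with the short Python sketch below. It assumes the normal approximation to the sampling distribution of the proportion applies, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the Proposition A example.
true_proportion = 0.4   # pi in the slides
n = 200

# Standard error of the proportion: sqrt(pi * (1 - pi) / n)
standard_error = sqrt(true_proportion * (1 - true_proportion) / n)

# Convert the limits on the sample proportion p to Z values.
z_low = (0.40 - true_proportion) / standard_error
z_high = (0.45 - true_proportion) / standard_error

# P(0.40 <= p <= 0.45) = P(z_low <= Z <= z_high) under the standard normal.
std_normal = NormalDist()
probability = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print("standard error:", round(standard_error, 4))
print("Z limits:", round(z_low, 2), round(z_high, 2))
print("probability:", round(probability, 3))
```

The upper Z value is about 1.44, and the probability is roughly 0.43. A printed Z table with Z rounded to 1.44 gives 0.4251, so small differences from the computed value are due to rounding.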