Lesson: Sampling and Sampling Distributions
What is Sampling?
In statistics, sampling is the process of selecting a subset (or "sample") from a larger group
(or "population") to gather information and make inferences about the entire group. It's often
impossible or impractical to study every single member of a population, so we use a sample
to represent it.
Why do we sample?
● Cost-effectiveness: Studying a sample is usually much cheaper than
studying an entire population.
● Time efficiency: Gathering data from a sample takes less time.
● Feasibility: For some populations (e.g., all fish in the ocean), it's physically
impossible to examine every individual.
● Accuracy: A well-chosen sample can often provide results that are just as
accurate as studying the entire population, and sometimes more accurate, because the
effort saved can go into more careful measurement. This matters most when the
population is very large.
Key Terms
● Population: The entire group that we are interested in studying.
● Sample: A subset of the population that we collect data from.
● Parameter: A numerical value that describes a characteristic of the
population (e.g., the average height of all students in a university).
● Statistic: A numerical value that describes a characteristic of the sample
(e.g., the average height of a sample of students from that university).
● Sampling Frame: A list of all the individuals or units in the population from
which the sample is selected.
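To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The population of student heights is made-up data generated for illustration, and the function name parameter_and_statistic is hypothetical, not part of the lesson.

```python
import random
import statistics

def parameter_and_statistic(population, sample_size, seed=0):
    """Return (population mean, sample mean) for one simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # draws without replacement
    parameter = statistics.fmean(population)      # describes the whole population
    statistic = statistics.fmean(sample)          # describes only the sample
    return parameter, statistic

# Assumption: a synthetic population of 10,000 heights in cm (not real data).
data_rng = random.Random(42)
heights = [data_rng.gauss(170, 10) for _ in range(10_000)]

mu, x_bar = parameter_and_statistic(heights, sample_size=50)
print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The parameter is fixed for a given population, while the statistic changes whenever a different sample is drawn (try a different seed).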
Sampling Methods
There are various methods for selecting a sample. Here are some common ones:
● Simple Random Sampling: Every member of the population has an equal
chance of being selected, and every possible sample of the same size is equally
likely (e.g., drawing names from a hat).
● Stratified Sampling: The population is divided into subgroups (strata) based
on shared characteristics (e.g., age, gender), and a random sample is taken from each
stratum. This ensures representation from all subgroups.
● Systematic Sampling: Members of the population are selected at regular
intervals (e.g., every 10th person on a list).
● Cluster Sampling: The population is divided into clusters (e.g., geographic
regions), and a random sample of clusters is selected. All individuals within the
selected clusters are included in the sample.
● Convenience Sampling: Individuals are selected based on their availability or
ease of access. This method is often the easiest, but it can lead to biased results.
● Multistage Sampling: A combination of two or more sampling methods. For
example, a researcher might first use cluster sampling to select school districts, and
then use simple random sampling to select students within those districts.
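Four of the methods above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a survey-grade implementation: the function names, the sampling frame of ID numbers, and the rules that assign units to strata and clusters are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then take a simple random sample in each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]

def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

# Assumption: the sampling frame is a list of 1,000 hypothetical ID numbers.
rng = random.Random(0)
frame = list(range(1000))

srs = simple_random_sample(frame, 20, rng)
strat = stratified_sample(frame, lambda u: u % 3, 5, rng)     # 3 made-up strata
syst = systematic_sample(frame, 10, rng)                      # every 10th ID
clus = cluster_sample(frame, lambda u: u // 100, 2, rng)      # 10 made-up clusters
print(len(srs), len(strat), len(syst), len(clus))
```

Multistage sampling can be built by composing these pieces, for example by choosing clusters first and then calling simple_random_sample on the units within each chosen cluster.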
Sampling Distributions
A sampling distribution is the probability distribution of a statistic (like the sample mean)
that is obtained through repeated sampling from the same population.
Imagine you take many different samples from the same population, and for each sample,
you calculate the sample mean. If you make a histogram of all those sample means, that
histogram approximates the sampling distribution of the sample mean.
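The thought experiment above can be run as a small simulation. The sketch below assumes a synthetic, right-skewed population drawn from an exponential distribution, and prints a rough text histogram of the sample means; the helper names simulate_sample_means and text_histogram are hypothetical.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples samples (each without replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

def text_histogram(values, n_bins=10, width=40):
    """Return (bin counts, printable lines) for an equal-width histogram."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / n_bins
        lines.append(f"{left_edge:8.2f} | {'#' * round(width * c / peak)}")
    return counts, lines

# Assumption: a made-up skewed population of 10,000 values.
data_rng = random.Random(42)
population = [data_rng.expovariate(1.0) for _ in range(10_000)]

means = simulate_sample_means(population, sample_size=30, n_samples=1_000)
counts, lines = text_histogram(means)
print("\n".join(lines))
```

Increasing n_samples makes the histogram a closer approximation of the sampling distribution; changing sample_size changes the sampling distribution itself.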
Key Properties of Sampling Distributions (for the Sample Mean)
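● Central Limit Theorem: For a sufficiently large sample size (n ≥ 30 is a common
rule of thumb), the sampling distribution of the sample mean will be approximately
normal, regardless of the shape of the original population's distribution, provided
the population has a finite variance. Strongly skewed populations may need larger
samples.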
● Mean of the Sampling Distribution: The mean of the sampling distribution of
the sample mean (µ_x̄) is equal to the population mean (µ).
● Standard Error of the Mean: The standard deviation of the sampling
distribution of the sample mean is called the standard error of the mean (σ_x̄). For
independent observations, it is calculated as σ_x̄ = σ / √n, where σ is the population
standard deviation and n is the sample size. The standard error measures how much
sample means typically vary around the population mean, and it shrinks as n grows.
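These properties can be checked numerically. The sketch below is an illustration under stated assumptions: the population is synthetic (exponential, so clearly non-normal), samples are drawn with replacement so that observations are independent, and the function names are hypothetical. It compares the standard deviation of simulated sample means with σ / √n for a few sample sizes.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Theoretical standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def empirical_standard_error(population, n, n_samples, rng):
    """Return (mean of sample means, SD of sample means) from repeated sampling."""
    # rng.choices samples with replacement, so the draws are independent.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]
    return statistics.fmean(means), statistics.stdev(means)

# Assumption: a made-up skewed population of 20,000 values.
rng = random.Random(7)
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

print(f"population mean = {mu:.3f}, population SD = {sigma:.3f}")
for n in (5, 30, 100):
    mean_of_means, se_hat = empirical_standard_error(population, n, 2_000, rng)
    print(
        f"n={n:3d}  mean of sample means={mean_of_means:.3f}  "
        f"simulated SE={se_hat:.3f}  sigma/sqrt(n)={standard_error(sigma, n):.3f}"
    )
```

The simulated and theoretical standard errors should agree closely but not exactly, since the simulation uses a finite number of samples. Quadrupling n halves the standard error.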