Unit 3 - Statistical Testing and Modelling
SAMPLING DISTRIBUTIONS
• A sampling distribution is a concept used in statistics.
• It is the probability distribution of a statistic obtained from a large number of
samples drawn from a specific population.
• It describes the range of possible values of a statistic, such as the mean or
mode of some variable, across samples from that population.
• The majority of data analyzed by researchers are actually samples, not
populations.
• Sampling distributions show how much a statistic varies from sample to sample,
which lets researchers judge how likely a particular outcome is.
• This distribution depends on a few different factors, including the sample size,
the sampling process involved, and the population as a whole.
Building a sampling distribution involves a few steps, repeated over many samples:
1. Choosing a random sample from the overall population
2. Computing a certain statistic from that sample, which could be the standard
deviation, median, or mean
3. Establishing a frequency distribution of the statistic across all the samples
4. Mapping out the distribution on a graph
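The four steps above can be sketched as a small simulation. This is only an illustration: the population (10,000 uniform values between 0 and 100), the sample size of 30, the 2,000 repetitions, and the helper name `sampling_distribution` are all assumptions chosen for the example, not values from these notes.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]

def sampling_distribution(population, sample_size, num_samples,
                          statistic=statistics.mean):
    """Return the statistic computed on each of num_samples random samples."""
    values = []
    for _ in range(num_samples):
        sample = random.sample(population, sample_size)  # step 1: random sample
        values.append(statistic(sample))                 # step 2: compute statistic
    return values

sample_means = sampling_distribution(population, sample_size=30, num_samples=2_000)

# Step 3: frequency distribution of the sample means (bins one unit wide).
frequencies = Counter(round(m) for m in sample_means)

# Step 4: a crude text histogram; each '#' stands for 5 samples.
for value in sorted(frequencies):
    print(f"{value:3d} {'#' * (frequencies[value] // 5)}")
```

Passing `statistic=statistics.median` or `statistic=statistics.stdev` would give the sampling distribution of a different statistic from the same procedure.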
Types of Sampling Distributions
• Sampling Distribution of the Mean: This is the distribution of the means of
repeated samples. For large samples it is approximately normal, and its centre
is the mean of the overall population. To build it, the researcher computes the
mean of each sample and maps out those means.
• Sampling Distribution of Proportion: This is the distribution of the proportion
computed from repeated samples drawn from the overall population. The mean
of the sample proportions equals the proportion in the larger group.
• T-Distribution: This type of sampling distribution is common in cases of small
sample sizes. It may also be used when there is very little information about the
entire population. T-distributions are used to make estimates about the mean
and other statistics.
• The central “balance” point of a sampling distribution is its mean, and the
standard deviation of a sampling distribution is referred to as the standard error.
• The theoretical formulas for various sampling distributions therefore depend
upon
(a) The original probability distributions that are assumed to have generated the
raw data and
(b) The size of the sample itself.
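To illustrate how the standard error depends on the sample size, the sketch below simulates sample means and compares their standard deviation with the standard result σ/√n for the mean of independent observations. The population model (normal, mean 50, standard deviation 12) and the sample sizes are assumptions made purely for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed population model (illustrative values only): normal, mean 50, sd 12.
POP_MEAN, POP_SD = 50.0, 12.0

def simulated_standard_error(sample_size, num_samples=4_000):
    """Standard deviation of the sample means across many simulated samples."""
    means = [
        statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

results = {}
for n in (5, 20, 80):
    # Pair the simulated standard error with the theoretical sigma / sqrt(n).
    results[n] = (simulated_standard_error(n), POP_SD / math.sqrt(n))
    simulated, theoretical = results[n]
    print(f"n={n:3d}  simulated SE={simulated:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

The simulated and theoretical values should agree closely, and both shrink as n grows.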
Distribution for a Sample Mean
• The arithmetic mean is arguably the most common measure of centrality used
when summarizing a data set.
• The sample mean of n observations x₁, x₂, …, xₙ is
x̄ = (x₁ + x₂ + … + xₙ) / n.
The Central Limit Theorem (CLT) states that the sampling distribution of the sample
mean approaches a normal distribution as the sample size increases, regardless of
the shape of the population.
• So if the population is not normal, or we know nothing about its
distribution, the CLT tells us that the distribution of the sample means (x̄) will
become approximately normally distributed as n (the sample size) increases.
• How large does n have to be?
• A general rule of thumb is n ≥ 30.
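A rough way to see the Central Limit Theorem at work is to draw sample means from a strongly skewed population and watch the skewness of those means fall towards 0 (the value for a symmetric, normal-like shape) as n grows. The exponential population, the sample sizes, and the helper names below are assumptions chosen for this sketch.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def sample_means(sample_size, num_samples=5_000):
    """Means of repeated samples from an exponential (right-skewed) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}
for n, skew in skew_by_n.items():
    print(f"n={n:3d}  skewness of sample means = {skew:.2f}")
```

With n = 1 the "sample means" are just the raw observations, so the first row shows the skewness of the population itself.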
• The nature of the sampling distribution therefore depends upon whether the
true standard deviation of the observations is known, as well as the sample size
n.
• The CLT states that the sample mean is approximately normal even if the raw
observation distribution is itself not normal, but this approximation is less
reliable if n is small. It’s a common rule of thumb to rely on the CLT only if n ≥ 30.
As an example, suppose that the daily maximum temperature in the month of
January in New Zealand follows a normal distribution, with a mean of 22
degrees Celsius and a standard deviation of 1.5 degrees.
Then, because the observations are normal with a known standard deviation, for
samples of size n = 5 the sampling distribution of X̄ will be normal, with mean 22
and standard error 1.5/√5 ≈ 0.671.
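The temperature example can be worked through with Python's standard library. The mean, standard deviation, and sample size come from the example; the 95% range and the 21-degree threshold are hypothetical questions added here only to show how the sampling distribution might be used.

```python
import math
from statistics import NormalDist

POP_MEAN = 22.0  # degrees Celsius, from the example
POP_SD = 1.5     # degrees Celsius, from the example
N = 5            # sample size, from the example

# Standard error of the sample mean when the population sd is known.
standard_error = POP_SD / math.sqrt(N)

# Sampling distribution of the mean of 5 daily maximum temperatures.
sampling_dist = NormalDist(mu=POP_MEAN, sigma=standard_error)

# Hypothetical question 1: range holding the central 95% of sample means.
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)

# Hypothetical question 2: chance that a 5-day sample mean falls below 21 degrees.
p_below_21 = sampling_dist.cdf(21.0)

print(f"standard error = {standard_error:.3f}")
print(f"central 95% of sample means: {lower:.2f} to {upper:.2f}")
print(f"P(sample mean < 21) = {p_below_21:.3f}")
```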
The sample proportion is p̂ = x / n, where x is the number of elements in the sample
with the characteristic (or the number of successes in the trials) and n is the
sample size.
As an example, suppose you are studying the number of cavity trees in the National
Forest for wildlife habitat. You have a sample of n = 950 trees, of which x = 238
have cavities. The sample proportion is p̂ = 238 / 950 ≈ 0.25.
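The cavity-tree calculation can be checked in a few lines. The values of n and x come from the example; the standard-error formula √(p̂(1 − p̂)/n) is the usual estimate for a proportion and is an addition here, not something stated in these notes.

```python
import math

n = 950  # trees sampled, from the example
x = 238  # sampled trees with cavities, from the example

# Sample proportion: share of sampled trees that have cavities.
p_hat = x / n

# Estimated standard error of the sample proportion (assumed standard formula).
se_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.4f}")
print(f"estimated standard error = {se_p_hat:.4f}")
```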
T-Distribution
The t-distribution, also known as the Student’s t-distribution, is a type of probability
distribution that is similar to the normal distribution with its bell shape but has
heavier tails.
• It is used for estimating population parameters when the sample size is small or
the population variance is unknown.
• Because of its heavier tails, a t-distribution gives extreme values a greater
chance of occurring than a normal distribution does.
• The t-distribution is the basis for computing t-tests in statistics.
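The heavier tails can be seen in a simulation that uses only the standard library. The sketch builds t-distributed values from the standard construction Z / √(V / df), with Z standard normal and V chi-squared with df degrees of freedom, then counts how often values land more than 3 units from zero. The degrees of freedom, the cutoff of 3, and the helper names are assumptions chosen for illustration.

```python
import math
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def t_variate(df):
    """Draw one value from a t-distribution with df degrees of freedom."""
    z = random.gauss(0.0, 1.0)
    # A chi-squared value with df degrees of freedom is a sum of df squared normals.
    chi_squared = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi_squared / df)

def tail_fraction(draws, cutoff=3.0):
    """Fraction of draws whose absolute value exceeds the cutoff."""
    return sum(abs(d) > cutoff for d in draws) / len(draws)

NUM_DRAWS = 50_000

normal_tail = tail_fraction([random.gauss(0.0, 1.0) for _ in range(NUM_DRAWS)])
t_tails = {
    df: tail_fraction([t_variate(df) for _ in range(NUM_DRAWS)])
    for df in (3, 10, 30)
}

print(f"normal:       fraction beyond 3 = {normal_tail:.4f}")
for df, tail in t_tails.items():
    print(f"t with df={df:2d}: fraction beyond 3 = {tail:.4f}")
```

The tail fraction is largest for the smallest degrees of freedom and moves towards the normal value as the degrees of freedom increase.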