UNIT 3 - STATISTICAL TESTING AND MODELLING

SAMPLING DISTRIBUTIONS
• A sampling distribution is a concept used in statistics.
• It is the probability distribution of a statistic obtained from a large number
of samples drawn from a specific population.
• It describes the range of possible values of a statistic, such as the mean or
mode of some variable, across repeated samples from that population.
• The majority of data analyzed by researchers are actually samples, not
populations.
• Sampling distributions are used to judge how likely a particular value of a
statistic is, and therefore how plausible an observed event or outcome is.
• This distribution depends on a few different factors, including the sample size,
the sampling process involved, and the population as a whole.
Building a sampling distribution involves a few steps:
1. Choosing a random sample from the overall population
2. Calculating a statistic for that sample, which could be the standard
deviation, median, or mean
3. Repeating steps 1 and 2 for many samples and establishing a frequency
distribution of the statistic across the samples
4. Mapping out the distribution on a graph
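
The steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population (10,000 normally distributed heights), the sample size of 40, the 2,000 repetitions, and the helper name sampling_distribution are all hypothetical choices, not part of these notes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (assumed for illustration only).
population = [rng.gauss(170, 10) for _ in range(10_000)]

def sampling_distribution(population, sample_size, n_samples, statistic):
    """Steps 1 and 2, repeated: draw a random sample and compute its statistic."""
    return [
        statistic(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

sample_means = sampling_distribution(population, 40, 2_000, statistics.mean)

# Step 3: frequency distribution of the statistic, in bins 1 cm wide.
frequencies = {}
for m in sample_means:
    bin_start = int(m // 1)
    frequencies[bin_start] = frequencies.get(bin_start, 0) + 1

# Step 4: a crude text histogram standing in for a graph.
for bin_start in sorted(frequencies):
    print(f"{bin_start:>4} {'#' * (frequencies[bin_start] // 25)}")

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
```

Passing statistics.median or statistics.stdev as the statistic argument gives the sampling distribution of a different statistic from the same population.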
Types of Sampling Distributions
• Sampling Distribution of the Mean: The researcher calculates the mean of each
sample and maps out those means. For sufficiently large samples the result is
approximately a normal distribution whose center is the mean of the sampling
distribution, which in turn equals the mean of the overall population.
• Sampling Distribution of Proportion: This method involves choosing a sample
set from the overall population and calculating the proportion of the sample
that has a given characteristic. The mean of the sample proportions equals the
proportion in the larger group.
• T-Distribution: This type of sampling distribution is common in cases of small
sample sizes. It may also be used when there is very little information about the
entire population. T-distributions are used to make estimates about the mean
and other population parameters.
• The central “balance” point of a sampling distribution is its mean, and the
standard deviation of a sampling distribution is referred to as a standard error.
• The theoretical formulas for various sampling distributions therefore depend
upon
(a) The original probability distributions that are assumed to have generated the
raw data and
(b) The size of the sample itself.
Distribution for a Sample Mean
• The arithmetic mean is arguably the most common measure of centrality used
when summarizing a data set.
• The estimated sample mean is described as follows:

Formally, denote the random variable of interest as X̄:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

• This represents the mean of a sample of n observations from the “raw
observation” random variable X, as in x1, x2, ..., xn.
• The conditions for finding the probability distribution of a sample mean vary
depending on whether you know the value of the standard deviation.


The Central Limit Theorem (CLT) states that the sampling distribution of the
sample mean approaches a normal distribution as the sample size increases.
• So if the population is not normally distributed, or we know nothing about its
distribution, the CLT tells us that the distribution of the sample means (x̄) will
become approximately normal as n (sample size) increases.
• How large does n have to be?
• A general rule of thumb is n ≥ 30.
In other words, regardless of the shape of the population (provided it has a finite
standard deviation), the sampling distribution of the sample mean will be
approximately normal once the sample size is large enough.
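
A rough simulation can illustrate the Central Limit Theorem. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1, chosen purely for illustration) and uses skewness as a simple symmetry check; the sample sizes, replication count, and function names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 (right-skewed)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

def skewness(values):
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

replications = 4_000
skew_by_n = {}
sd_by_n = {}
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(replications)]
    skew_by_n[n] = skewness(means)
    sd_by_n[n] = statistics.stdev(means)
    print(f"n={n:>3}  skewness={skew_by_n[n]:.3f}  "
          f"sd of means={sd_by_n[n]:.3f}  1/sqrt(n)={1 / math.sqrt(n):.3f}")
```

As n grows, the skewness of the sample means should shrink toward 0 (the value for a normal distribution), and their standard deviation should track the standard error, which is 1/√n here because the assumed population has standard deviation 1.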


• The nature of the sampling distribution therefore depends upon whether the
true standard deviation of the observations is known, as well as the sample size
n.
• The CLT states that approximate normality occurs even if the raw observation
distribution is itself not normal, but this approximation is less reliable if n
is small. It’s a common rule of thumb to rely on the CLT only if n ≥ 30.
As an example, suppose that the daily maximum temperature in the month of
January in New Zealand follows a normal distribution, with a mean of 22
degrees Celsius and a standard deviation of 1.5 degrees.
Then, because the true standard deviation is known, for samples of size n = 5
the sampling distribution of X̄ will be normal, with mean 22 and standard error

$$\frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{5}} \approx 0.671$$
Distribution for a Sample Proportion


• If n trials of a success/failure event are performed, you can obtain an
estimate of the proportion of successes.
• If another n trials are performed, the new estimate could vary.

The population proportion (p) is a parameter that is as commonly estimated as the
mean. It is just as important to understand the distribution of the sample proportion
as that of the mean. With proportions, each element either has the characteristic
you are interested in or it does not. The sample proportion (p̂) is calculated by

$$\hat{p} = \frac{x}{n}$$

where x is the number of elements in your sample (or successes in your trials) with
the characteristic and n is the sample size.
As an example, suppose you are studying the number of cavity trees in the National
Forest for wildlife habitat. You have a sample of n = 950 trees, of which x = 238
have cavities. The sample proportion is:

$$\frac{x}{n} = \frac{238}{950} \approx 0.25$$
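
The cavity tree calculation can be checked in a few lines. The standard error formula used below, √(p̂(1 − p̂)/n), is the usual large-sample estimate for a proportion. It is not stated in these notes, so treat it as an added assumption.

```python
import math

x = 238   # trees with cavities (from the example)
n = 950   # trees sampled (from the example)

p_hat = x / n
print(f"sample proportion: {p_hat:.3f}")  # 0.251

# Assumed large-sample standard error of a sample proportion (not from the notes).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"standard error:    {standard_error:.4f}")  # 0.0141
```

A second sample of 950 trees would typically give a proportion within a few standard errors of this one, which is the sense in which the estimate "could vary".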

T-Distribution
The t-distribution, also known as the Student’s t-distribution, is a type of probability
distribution that is similar to the normal distribution with its bell shape but has
heavier tails.
• It is used for estimating population parameters for small sample sizes or
unknown variances.
• T-distributions have a greater chance for extreme values than normal
distributions, and as a result have fatter tails.
• The t-distribution is the basis for computing t-tests in statistics.

Tail heaviness is determined by a parameter of the t-distribution called degrees of
freedom, with smaller values giving heavier tails, and with higher values making the
t-distribution resemble a standard normal distribution with a mean of 0 and a
standard deviation of 1.
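
The effect of degrees of freedom on tail heaviness can be seen by comparing densities far from the center. The sketch below writes out the standard Student's t density with math.gamma, since the Python standard library has no t-distribution. The evaluation point x = 3 and the chosen degrees of freedom are arbitrary illustrative values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

standard_normal = NormalDist(mu=0, sigma=1)
x = 3.0  # a point well out in the tail

for df in (1, 5, 30, 100):
    print(f"df={df:>3}  t density at {x}: {t_pdf(x, df):.5f}")
print(f"normal  density at {x}: {standard_normal.pdf(x):.5f}")
```

The density at x = 3 falls as the degrees of freedom rise and approaches the standard normal value. Note that math.gamma overflows for arguments above roughly 171, so this direct formula only suits moderate degrees of freedom.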
