Lecture 3 - Sampling-Distribution & Central Limit Theorem

A Transcript of the Lectures in the Sampling Methods and Sampling Distribution Chapter
Lecture Objective:
In the past lectures, we discussed populations of discrete and continuous random variables and
the parameters that describe them. In this lecture, our goal is to learn about samples and the statistics
that describe them.
References Used:
Albert, J., Albacea, J., Ayaay, M., David, I., and de Mesa, I. (2016). Teaching Guide for Senior High School –
Statistics and Probability. Commission on Higher Education K to 12 Transition Program Management
Unit.
Illowsky, B. and Dean, S. (2018). Introduction to Statistics. OpenStax.
Melosantos, L., Antonio, J., Robles, S., Bruce, R., and Sacluti, J. (2016). Math Connections in the Digital Age:
Statistics and Probability. Quezon City: Sibs Publishing House, Inc.
Mendenhall, W., Beaver, R., and Beaver, B. (2013). Introduction to Probability and Statistics.
Pacific Grove, Calif.: Brooks/Cole; Andover: Cengage Learning [distributor].
Lecture 3.1
Random Sampling
Introduction
Previously we learned about the normal distribution. Recall that the shape of a normal distribution is
determined by the mean and the standard deviation of the random variable. The mean and the standard
deviation of the normal random variable are called its parameters, and we need them to calculate
probabilities associated with the variable. In real-world settings, however, these parameters are often
unknown.
For example, since grades are generally treated as normal random variables, we have reason to
believe that the scores of all Grade 11 UST-SHS students in the Statistics and Probability 1st Quarterly
examinations are normally distributed. However, the parameters (the mean μ and standard deviation σ)
may be unavailable or difficult to acquire.
In such a case, we rely on a sample to learn about the population through the statistics that
describe the sample. The mean and standard deviation of a sample of grades of Grade 11 UST-SHS students
approximate the actual values of μ and σ. If we want to provide reliable and valid information about
the population, we must select the sample in a reasonable and justified way, that is, in a
statistically based, randomized way.
(Read pages 243 to 246 of the Introduction to Probability and Statistics textbook (viewing link:
https://drive.google.com/file/d/152oxLsvFxxDIX2ly1Tmy8BS7d5bGquWP/view?usp=sharing) to learn
the proper way of selecting random samples.)
Lesson Proper
When we select a random sample from a given population, the numerical descriptive measures computed
from it (such as the mean, standard deviation, and variance) are called statistics. Because the sample is
selected at random, the value of a statistic may differ each time a sample is drawn. A statistic is
therefore itself a random variable, and its probability distribution is called its sampling distribution.
Def. Sampling Distribution

The sampling distribution of a statistic is the probability distribution of the possible values of the
statistic that results when random samples of size n are repeatedly drawn from the population.
The sampling distribution tells us which values the statistic can take and how often each value
occurs.
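As a small illustration of this definition, the sketch below lists every possible sample of size n = 2 from a made-up population of five values and tabulates the resulting sample means. The population values and the choice to sample without replacement are assumptions made only for this example; they do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical population (not from the lecture), chosen only for illustration.
population = [3, 6, 9, 12, 15]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Count how often each sample mean occurs among the equally likely samples.
counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution: each possible sample mean paired with its probability.
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"sample mean = {float(mean):5.1f}   probability = {prob}")
```

In this toy case the probabilities add up to 1, and the average of the sample means, weighted by their probabilities, equals the population mean.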
Remark. There are generally three ways to find the sampling distribution of a statistic. The most
economical way to determine it is to use proven statistical theorems to derive the exact or approximate
sampling distributions.
Def. Central Limit Theorem

Suppose X is a random variable with a known or unknown distribution, with mean μ and standard
deviation σ. The central limit theorem states that if we draw random samples of size n from X, then
when n is large, the sampling distribution of the sample mean $\bar{x}$ tends to be normal, with a mean
equal to μ and a variance equal to $\sigma^2/n$.
Since the standard deviation is the square root of the variance, its standard deviation (also referred
to as the standard error) is $\sigma/\sqrt{n}$.
In other words, given a random variable X with mean μ and standard deviation σ, when n is large,
$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \text{ approximately,}$$
where the second argument is the standard deviation of $\bar{x}$.
Remark. The approximation given by the central limit theorem becomes more accurate as n becomes
large. However, how large is “large”? There is no single rigorous answer to this question, but as a
common rule of thumb, when the sample size is at least thirty (30), the sampling distribution of the
sample mean is approximately normal.
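The behavior described by the theorem can be seen in a quick simulation. The sketch below is only an illustration: the exponential population (strongly skewed to the right), the number of repetitions, and the fixed seed are all assumptions chosen for the demonstration, not part of the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 1, for which sigma is also 1.
mu, sigma = 1.0, 1.0
n = 30            # sample size, matching the rule of thumb above
repeats = 20_000  # number of random samples drawn

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(repeats)
]

observed_mean = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
predicted_se = sigma / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means:      {observed_mean:.3f} (CLT predicts {mu:.3f})")
print(f"std. dev. of sample means: {observed_se:.3f} (CLT predicts {predicted_se:.3f})")
```

Even though the population is far from normal, the mean and spread of the simulated sample means should land close to μ and σ/√n.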
On another note, since the sampling distribution of $\bar{x}$ has a different standard deviation from that
of X, its standard score is also computed differently. To standardize $\bar{x}$, we use the formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Example 3.1.1. The duration of Alzheimer’s disease from the onset of symptoms until death ranges from 3
to 20 years; the average is 8 years with a standard deviation of 4 years. The administrator of a large
medical center randomly selects the medical records of 30 deceased Alzheimer’s patients from the
medical center’s database and records the average duration. Find the approximate probabilities for these
events: a) The average duration is less than 7 years, and b) the average duration exceeds 7 years.
Solution to Example 3.1.1
a) The given information suggests that the distribution is skewed a bit to the right, since the mean of
8 years lies much closer to the minimum of 3 than to the maximum of 20. But regardless of the
population distribution, the Central Limit Theorem (CLT) tells us that the sampling distribution of
the sample mean $\bar{x}$ is approximately normal with a mean of 8 and a standard deviation of
$4/\sqrt{30}$. To find the probability that the average is less than 7 years, we compute $P(\bar{x} < 7)$:
$$P(\bar{x} < 7) = P\left(z < \frac{7 - 8}{4/\sqrt{30}}\right) = P(z < -1.37) = 0.0853 \text{ or } 8.53\%$$
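Recall that the value 0.0853 can be determined using a z-table; or the NORM.S.DIST function of MS
Excel (syntax: =NORM.S.DIST(-1.37,TRUE)); or the NORM.DIST function (syntax:
=NORM.DIST(7,8,0.7303,TRUE)). The last form does not round z to two decimal places, so its result
may differ slightly in the fourth decimal place.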
b) $P(\bar{x} > 7) = 1 - P(\bar{x} < 7) = 1 - P(z < -1.37) = 0.9147$ or 91.47%
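For readers who prefer code to a z-table, the two probabilities in Example 3.1.1 can be checked with a short sketch such as the one below. It uses `NormalDist` from Python's standard `statistics` module; the variable names are chosen here for illustration. Because z is not rounded to two decimal places, the printed values may differ from the table-based answers in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30           # values given in Example 3.1.1
standard_error = sigma / sqrt(n)  # sigma / sqrt(n), about 0.7303

# Approximate sampling distribution of the sample mean under the CLT.
sample_mean_dist = NormalDist(mu, standard_error)

p_less_than_7 = sample_mean_dist.cdf(7)  # part (a)
p_more_than_7 = 1 - p_less_than_7        # part (b), the complement

print(f"standard error = {standard_error:.4f}")
print(f"P(mean < 7)    = {p_less_than_7:.4f}")
print(f"P(mean > 7)    = {p_more_than_7:.4f}")
```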
Example 3.1.2. An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size
n = 49 are drawn randomly from the population. Find the probability that the sample mean is between
85 and 92.
Solution to Example 3.1.2.

In this problem, we want to find $P(85 < \bar{x} < 92)$. Using the CLT, we have
$$P(85 < \bar{x} < 92) = P(\bar{x} < 92) - P(\bar{x} < 85) = P\left(z < \frac{92 - 90}{15/\sqrt{49}}\right) - P\left(z < \frac{85 - 90}{15/\sqrt{49}}\right)$$
After simplifying the argument of each probability, we have
$$P\left(z < \frac{14}{15}\right) - P\left(z < -\frac{7}{3}\right) = 0.8149 \text{ or } 81.49\%$$
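The same computation can be sketched in code. The helper `z_score` below is a name introduced here for illustration; it applies the standardization formula for the sample mean and then takes the difference of two standard normal probabilities, using z values that are not rounded.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 49         # values given in Example 3.1.2
standard_error = sigma / sqrt(n)  # 15 / 7

def z_score(sample_mean):
    """Standard score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / standard_error

standard_normal = NormalDist()  # mean 0, standard deviation 1

# P(85 < x_bar < 92) = P(z < 14/15) - P(z < -7/3)
probability = standard_normal.cdf(z_score(92)) - standard_normal.cdf(z_score(85))

print(f"z for 92 = {z_score(92):.4f}, z for 85 = {z_score(85):.4f}")
print(f"P(85 < mean < 92) = {probability:.4f}")
```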
Illustration: [Figure: curve of $P(\bar{x})$ plotted against $\bar{x}$, with 85, 90, and 92 marked on the horizontal axis.]
Remark. It is imperative to understand when the central limit theorem is used. If we are asked to
find a probability concerning the sample mean, then we should use the CLT for the mean. On the other
hand, if we are asked to find a probability concerning a single value of a random variable, we use the
techniques for computing probabilities associated with the random variable’s distribution.
Remark. The important contribution of the central limit theorem will be highlighted when we embark on
making statistical inferences. We will see that the values used as “estimators” of the population’s
parameters are based on the averages of the sample measurements.
Example 3.1.3. In a recent study report made by a certain organization, it was discovered that the mean
age of tablet users is 34 years with a standard deviation of 15 years.
a) Suppose we take a sample size of n=100, what can we say about the mean and the standard error
for the sampling distribution of the sample mean ages of tablet users?
b) What does the distribution look like?
c) Using the reported parameters on this study, find the probability that the sample mean is more
than 30 years old.
d) What is the age that only 5% of the sample mean ages exceed?
Solution to Example 3.1.3.

a) Since the sample size is large enough, the CLT applies: the mean of the sampling distribution of the
sample mean equals the population mean, so $\mu_{\bar{x}} = 34$. By the same theorem, the standard error
of the sampling distribution is $\sigma_{\bar{x}} = 15/\sqrt{100} = 1.5$.
b) According to the CLT, the distribution should be approximately bell-shaped.
c) $$P(\bar{x} > 30) = 1 - P(\bar{x} < 30) = 1 - P\left(z < \frac{30 - 34}{15/\sqrt{100}}\right) = 1 - P\left(z < -\frac{8}{3}\right) = 0.9962 \text{ or } 99.62\%$$
d) The age that only 5% of the sample means exceed is the 95th percentile of the sampling distribution.
In other words, we want the value k of the sample mean such that 5% of the sample means are greater
than it (or, equivalently, 95% are less than it). Thus, we need to solve
$$P(\bar{x} < k) = 0.95 \quad \text{or} \quad P(\bar{x} > k) = 0.05$$
Since it is shorter (and easier) to solve for k using $P(\bar{x} < k) = 0.95$, we do so using MS Excel:
$$P(\bar{x} < k) = 0.95 \implies k = 36.47 \text{ years old}$$
(The value of k was obtained using the following syntax:
=NORM.INV(0.95,34,15/10))
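As a cross-check outside Excel, parts (c) and (d) of Example 3.1.3 can be reproduced with a sketch like the following. It assumes, as the solution does, that the sampling distribution is normal with mean 34 and standard error 1.5; `inv_cdf` from the standard `statistics` module plays the role of NORM.INV.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 100        # values given in Example 3.1.3
standard_error = sigma / sqrt(n)  # 15 / 10 = 1.5

# Approximate sampling distribution of the sample mean age.
sample_mean_dist = NormalDist(mu, standard_error)

p_more_than_30 = 1 - sample_mean_dist.cdf(30)  # part (c)
k = sample_mean_dist.inv_cdf(0.95)             # part (d): 95th percentile

print(f"standard error = {standard_error:.2f}")
print(f"P(mean > 30)   = {p_more_than_30:.4f}")
print(f"95th percentile of the sample mean = {k:.2f} years")
```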
Supplementary Exercises
1) Random samples of size n were selected from populations with the means and variances given
here. Find the mean and standard deviation of the sampling distribution of the sample mean in
each case:
a) n = 36, μ = 10, σ² = 9. Ans. mean = 10, standard deviation or standard error = 3/√36 = 1/2
b) n = 100, μ = 5, σ² = 4. Ans. mean = 5, standard deviation or standard error = 2/√100 = 1/5
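Answers of this kind can be checked with a few lines of code. The sketch below is a minimal helper, with a function name chosen here for illustration; note that the exercise gives the variance σ², so the square root must be taken before dividing by √n.

```python
from math import sqrt

def standard_error(variance, n):
    """Standard deviation of the sample mean: sqrt(variance) / sqrt(n)."""
    return sqrt(variance) / sqrt(n)

# label: (n, mu, sigma squared), taken from Exercise 1
cases = {"a": (36, 10, 9), "b": (100, 5, 4)}

for label, (n, mu, variance) in cases.items():
    # The mean of the sampling distribution equals the population mean.
    print(f"{label}) mean = {mu}, standard error = {standard_error(variance, n)}")
```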
2) Suppose that SHS faculty members (with a master teacher rank) in the Philippines earn an
average of 720,000 php per year with a standard deviation of 10,000 php. In an attempt to verify these
figures, a random sample of 60 master teachers was selected from a database of all master
teachers in the country.
a) Describe the sampling distribution of the sample mean in terms of its governing
characteristics.
Ans. The sampling distribution of the sample mean is approximately normal, with a mean of 720,000 php
and a standard error of 10,000/√60 ≈ 1,290.99 php.
b) Within what limits would you expect the sample average to lie, with a probability of 90
percent?
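Ans. Between roughly 717,876 and 722,124 php, that is, 720,000 ± 1.645 × 1,290.99. (The value
721,654.48 php is the one-sided bound: 90% of sample means fall below it.)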
c) Calculate the probability that the sample mean is greater than 725,000 php per year.
Ans. approximately 0
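d) If your random sample actually produced a mean of 760,000 php, would you consider this
unusual? What conclusions might you draw?
Ans. Unusual. With a sample this large, the sample mean should be close to the population
mean of 720,000 php, yet 760,000 php lies about 31 standard errors above it. This suggests that the
stated mean or standard deviation may be inaccurate, or that the sample was not truly random.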
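The quantities in Exercise 2 can be worked out with a sketch like the one below. It assumes the CLT approximation holds for n = 60 and reads "within what limits" in part (b) as a symmetric interval around the mean containing 90% of sample means; both are interpretations made here, not statements from the exercise.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 720_000, 10_000, 60  # values given in Exercise 2
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

# (a) standard error of the sample mean
standard_error = sample_mean_dist.stdev

# (b) central 90% interval: 5th and 95th percentiles of the sample mean
lower = sample_mean_dist.inv_cdf(0.05)
upper = sample_mean_dist.inv_cdf(0.95)

# (c) probability that the sample mean exceeds 725,000
p_above_725k = 1 - sample_mean_dist.cdf(725_000)

# (d) how many standard errors 760,000 lies above the stated mean
z_760k = (760_000 - mu) / standard_error

print(f"(a) standard error = {standard_error:,.2f}")
print(f"(b) 90% limits     = {lower:,.2f} to {upper:,.2f}")
print(f"(c) P(mean > 725,000) = {p_above_725k:.6f}")
print(f"(d) z for 760,000  = {z_760k:.2f}")
```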
- End of the Lecture Transcript -
