Lecture 3 - Sampling-Distribution & Central Limit Theorem
Lecture Objective:
In the past lectures, we discussed the populations of discrete and continuous random variables and
the parameters that describe them. In this lecture, our goal is to learn about samples and the statistics
that describe them.
References Used:
Albert, J., Albacea, J., Ayaay, M., David, I., and de Mesa, I. (2016). Teaching Guide for Senior High School –
Statistics and Probability. Commission on Higher Education K to 12 Transition Program Management
Unit.
Melosantos, L., Antonio, J., Robles, S., Bruce, R., and Sacluti, J. (2016). Math Connections in the Digital Age
Statistics and Probability. Quezon City: Sibs Publishing House, Inc.
Mendenhall, W., Beaver, R., and Beaver, B. (2013). Introduction to Probability and Statistics.
Pacific Grove, Calif.: Brooks/Cole; Andover: Cengage Learning [distributor].
Lecture 3.1
Random Sampling
Introduction
Previously, we learned about the normal distribution. Recall that the shape of a normal distribution is
determined by the mean and the standard deviation of the random variable. The mean and the standard
deviation of a normal random variable are called its parameters, and we need them to calculate
probabilities associated with that variable. However, in real-world settings, the parameters of a normal
random variable are often unknown.
For example, since grades are generally treated as normal random variables, we have reason to
believe that the scores of all Grade 11 UST-SHS students in the Statistics and Probability 1st Quarterly
examinations are normally distributed. However, the parameters (the mean μ and standard deviation σ)
may not always be available, or may be difficult to acquire.
In such a case, we may have to rely on a sample to learn about the population through the statistics that
describe the sample. The mean and standard deviation of a sample of grades of Grade 11 UST-SHS students
approximate the actual values of μ and σ. Now, if we want to provide reliable and valid information about
the population, we must select the sample in a reasonable and justified way, that is, in a
statistically based, randomized way.
(Read pages 243 to 246 of the Introduction to Probability and Statistics textbook (viewing link:
https://drive.google.com/file/d/152oxLsvFxxDIX2ly1Tmy8BS7d5bGquWP/view?usp=sharing) to learn
the proper way of selecting random samples.)
Lesson Proper
When we select a random sample from a given population, the numerical descriptive measures computed
from it (mean, standard deviation, and variance) are called statistics. Because the sample is selected at
random, the value of a statistic may differ from one sample to the next, so a statistic is itself a random
variable. The probability distribution of a statistic is called its sampling distribution.
The sampling distribution tells us the possible values of the statistic and how often each of those
values occurs.
Remark. There are generally three ways to find the sampling distribution of a statistic. The most
economical way to determine it is to use proven statistical theorems to derive the exact or approximate
sampling distributions.
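A sampling distribution can also be approximated empirically by drawing many random samples and recording the statistic each time. The following is a minimal Python sketch of that idea, not part of the original lecture. The population (the faces of a fair die), the sample size of 30, the number of repetitions, and the function name simulate_sample_means are all illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n (with replacement)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(repetitions)
    ]


# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
means = simulate_sample_means(population, n=30, repetitions=5000)

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

print("mean of sample means:", round(statistics.fmean(means), 3))
print("population mean:     ", mu)
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(sigma / 30 ** 0.5, 3))
```

If the simulation behaves as expected, the first two printed values should be close to each other, and so should the last two.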
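On another note, since the sampling distribution of $\bar{x}$ has a different standard deviation from that of
$X$, its standard score is also computed differently. To standardize $\bar{x}$, we use the formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where $\sigma / \sqrt{n}$ is the standard deviation (standard error) of $\bar{x}$.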
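The standardization of a sample mean can be sketched as a small Python helper. This is an illustration only; the function name standardize_sample_mean and the numbers in the call are hypothetical and do not come from the lecture.

```python
import math


def standardize_sample_mean(xbar, mu, sigma, n):
    """Return z = (xbar - mu) / (sigma / sqrt(n))."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error


# Hypothetical values chosen only to show the call.
z = standardize_sample_mean(xbar=52, mu=50, sigma=10, n=25)
print(z)  # standard error is 10/5 = 2, so z = (52 - 50)/2 = 1.0
```

Note that the divisor is the standard error, not σ itself; dividing by σ would standardize a single observation rather than a sample mean.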
Example 3.1.1. The duration of Alzheimer’s disease from the onset of symptoms until death ranges from 3
to 20 years; the average is 8 years with a standard deviation of 4 years. The administrator of a large
medical center randomly selects the medical records of 30 deceased Alzheimer’s patients from the
medical center’s database and records the average duration. Find the approximate probabilities for these
events: a) The average duration is less than 7 years, and b) the average duration exceeds 7 years.
Solution to Example 3.1.1
a) It can be deduced from the given values that the distribution is skewed a bit to the right. But regardless
of the population distribution, we know that the sampling distribution of the sample mean $\bar{x}$ is
approximately normal, with a mean of 8 and a standard deviation of $4/\sqrt{30}$, due to the Central Limit
Theorem (CLT). Now, to find the probability that the average is less than 7 years, we need to compute
$P(\bar{x} < 7)$. And so, we have
$$P(\bar{x} < 7) = P\left(z < \frac{7 - 8}{4/\sqrt{30}}\right) = P(z < -1.37) = 0.0853 \text{ or } 8.53\%$$
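Recall that the value 0.0853 can be determined using a z-table; the NORM.S.DIST function of MS
Excel (syntax: =norm.s.dist(-1.37,true)); or the NORM.DIST function (syntax:
=norm.dist(7,8,0.7303,true)). The last option works with the unrounded z-score, so its result may
differ slightly in the fourth decimal place.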
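As a cross-check of Example 3.1.1 outside MS Excel, the same computation can be sketched in Python with the standard-library class statistics.NormalDist. The variable names are illustrative. Part b) is obtained as the complement of part a), since the two events together cover all outcomes. Because the code uses the unrounded z-score, its result may differ slightly from the z-table value of 0.0853.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30
standard_error = sigma / sqrt(n)  # about 0.7303

# Sampling distribution of the sample mean under the CLT approximation.
xbar_dist = NormalDist(mu, standard_error)

p_less = xbar_dist.cdf(7)  # part a) P(xbar < 7)
p_more = 1 - p_less        # part b) P(xbar > 7)

print(round(standard_error, 4))
print(round(p_less, 4), round(p_more, 4))
```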
Example 3.1.2. An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size
n = 49 are drawn randomly from the population. Find the probability that the sample mean is between
85 and 92.
$$P(85 < \bar{x} < 92) = P(\bar{x} < 92) - P(\bar{x} < 85) = P\left(z < \frac{92 - 90}{15/\sqrt{49}}\right) - P\left(z < \frac{85 - 90}{15/\sqrt{49}}\right)$$
After simplifying the argument of each probability, we have
$$P\left(z < \frac{14}{15}\right) - P\left(z < -\frac{7}{3}\right) = 0.8149 \text{ or } 81.49\%$$
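The result of Example 3.1.2 can be checked with a short Python sketch that uses statistics.NormalDist from the standard library. The variable names are illustrative, and the normal model for the sample mean is the CLT approximation.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 49
standard_error = sigma / sqrt(n)  # 15/7
xbar_dist = NormalDist(mu, standard_error)

z_upper = (92 - mu) / standard_error  # 14/15
z_lower = (85 - mu) / standard_error  # -7/3

# P(85 < xbar < 92) = P(xbar < 92) - P(xbar < 85)
probability = xbar_dist.cdf(92) - xbar_dist.cdf(85)
print(round(z_upper, 4), round(z_lower, 4), round(probability, 4))
```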
Illustration: [Figure: sampling distribution curve of $\bar{x}$, with $P(\bar{x})$ on the vertical axis and the
values 85, 90, and 92 marked on the $\bar{x}$ axis.]
Remark. It is imperative to understand when the central limit theorem is used. If we are asked to
find a probability concerning the sample mean, then we should use the CLT for the mean. On the other hand,
if we are asked to find a probability concerning a single value of a random variable, we use the
techniques for computing probabilities associated with the random variable's own distribution.
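The difference between the two kinds of question can be illustrated with a short Python sketch. It reuses the values μ = 90, σ = 15, and n = 49 and, purely as an assumption for illustration, treats the population itself as normal so that a probability for a single observation can be computed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 49

# Assumption for illustration only: the population itself is normal.
single_value = NormalDist(mu, sigma)           # distribution of X
sample_mean = NormalDist(mu, sigma / sqrt(n))  # distribution of xbar

p_single = single_value.cdf(85)  # P(X < 85), one observation
p_mean = sample_mean.cdf(85)     # P(xbar < 85), mean of 49 observations

print(round(p_single, 4), round(p_mean, 4))
```

The second probability is much smaller than the first because the sample mean varies far less than a single observation does.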
Remark. The important contribution of the central limit theorem will be highlighted when we embark on
making statistical inferences. We will see that the values used as "estimators" of the population's
parameters are based on the averages of the sample measurements.
Example 3.1.3. In a recent study report made by a certain organization, it was discovered that the mean
age of tablet users is 34 years with a standard deviation of 15 years.
a) Suppose we take a sample of size n = 100. What can we say about the mean and the standard error
of the sampling distribution of the sample mean age of tablet users?
b) What does the distribution look like?
c) Using the parameters reported in this study, find the probability that the sample mean is more
than 30 years.
d) What is the age above which only 5% of the sample mean ages lie?
The standard error of the sample mean is $\sigma/\sqrt{n} = 15/\sqrt{100} = 1.5$ years.
d) The age above which only 5% of the sample means lie is the 95th percentile of the sampling
distribution. In other words, we want to find the sample mean k such that 5% of the sample means are
greater than it (or, alternatively, 95% are less than it). Thus, we need to compute the value of k for which
$P(\bar{x} > k) = 0.05$ or, equivalently, $P(\bar{x} < k) = 0.95$.
Since it is shorter (not to mention easier) to compute the value of k using $P(\bar{x} < k) = 0.95$,
we solve this using MS Excel. And so,
$$P(\bar{x} < k) = 0.95 \implies k = 36.47 \text{ years old}$$
(The value of k was obtained using the following syntax:
=norm.inv(0.95,34,15/10))
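The quantities asked for in Example 3.1.3 can be cross-checked with a Python sketch based on statistics.NormalDist, whose inv_cdf method plays the role of Excel's NORM.INV. The variable names are illustrative, and the normal model for the sample mean is the CLT approximation.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 100
standard_error = sigma / sqrt(n)            # part a) 15/10 = 1.5
xbar_dist = NormalDist(mu, standard_error)  # part b) approximately normal

p_more_than_30 = 1 - xbar_dist.cdf(30)      # part c) P(xbar > 30)
k = xbar_dist.inv_cdf(0.95)                 # part d) 95th percentile

print(standard_error, round(p_more_than_30, 4), round(k, 2))
```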
Supplementary Exercises
1) Random samples of size n were selected from populations with the means and variances given
here. Find the mean and standard deviation of the sampling distribution of the sample mean in
each case:
a) $n = 36$, $\mu = 10$, $\sigma^2 = 9$. Ans. mean = 10, standard deviation or standard error = 3/6 = 1/2
b) $n = 100$, $\mu = 5$, $\sigma^2 = 4$. Ans. mean = 5, standard deviation or standard error = 2/10 = 1/5
2) Suppose that SHS faculty members (with a master teacher rank) in the Philippines earn an
average of 720,000 php per year with a standard deviation of 10,000 php. In an attempt to verify these
figures, a random sample of 60 master teachers was selected from a database of all master
teachers in the country.
a) Describe the sampling distribution of the sample mean in terms of its governing
characteristics.
Ans. The sampling distribution of the sample mean is approximately normal, with a mean of
720,000 php and a standard error of $10{,}000/\sqrt{60} \approx 1{,}291$ php.
b) Within what limits would you expect the sample average to lie, with a probability of 90
percent?
Ans. Between approximately 717,876.50 php and 722,123.50 php (the mean plus or minus 1.645
standard errors). As a one-sided bound, 90% of sample means fall below 721,654.48 php.
c) Calculate the probability that the sample mean is greater than 725,000 php per year.
Ans. approximately 0
d) If your random sample actually produced a mean of 760,000 php, would you consider this
unusual? What conclusions might you draw?
Ans. Unusual. With a sample this large, the sample mean should be close to the population
mean of 720,000 php, yet 760,000 php lies about 31 standard errors above it. This suggests that
the claimed mean or standard deviation is incorrect, or that the sample was not truly random.
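The answers to Exercise 2 can be checked with a Python sketch based on statistics.NormalDist. The variable names are illustrative. For part b), the sketch reports both a central 90% interval and the one-sided 90th percentile, since "limits" can be read either way; which reading is intended is an assumption on the reader's part.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 720_000, 10_000, 60
standard_error = sigma / sqrt(n)  # about 1,291 php
xbar_dist = NormalDist(mu, standard_error)

# b) central 90% interval for the sample mean
lower, upper = xbar_dist.inv_cdf(0.05), xbar_dist.inv_cdf(0.95)
# one-sided alternative: value below which 90% of sample means fall
one_sided = xbar_dist.inv_cdf(0.90)

p_above_725k = 1 - xbar_dist.cdf(725_000)  # c) P(xbar > 725,000)
z_760k = (760_000 - mu) / standard_error   # d) distance in standard errors

print(round(lower, 2), round(upper, 2), round(one_sided, 2))
print(p_above_725k, round(z_760k, 1))
```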