Estimation Hypothesis Testing: Decisions Inferences

Mr. Mohamed El-Sayed El-Dawoody, Lecturer of Mathematical Statistics
LECTURE 2
Confidence Intervals and sample size
Introduction
 In this lecture, we begin the second branch of statistics, which is named
"INFERENTIAL STATISTICS".
 Inferential statistics is concerned with making decisions (inferences) about
population parameters by applying basic statistical methods (techniques).
 Inferential statistics is divided into two basic areas of study:
1. Estimation: the process of estimating the value of a population parameter using the
information in a sample taken from that population.
2. Hypothesis Testing: the process of testing claims about population parameters, which
may or may not be true; it helps in making decisions.
 The most common population parameters used in inferential statistics are the mean, proportion,
variance, and standard deviation.
 In both estimation and hypothesis testing, sample statistics are used to estimate the
population parameters. These statistics are called estimators; see the following table:
MEASURES      Mean    Variance    Proportion
Population    µ       σ²          p
Sample        x̄       s²          p̂
 A good estimator should satisfy three basic properties:
(1) It is unbiased: the expected (mean) value of the estimator is equal to the
corresponding population parameter.
(2) It is consistent: as the sample size increases, the value of the estimator
approaches the corresponding population parameter.
(3) It is relatively efficient: of all the statistics that can be used to estimate the
population parameter, the estimator has the smallest variance.
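The unbiasedness property can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (a normal population with σ = 2, samples of size n = 5, and the hypothetical helper name `average_variance_estimates`); it compares the sample variance s² (divisor n − 1) with the version that divides by n.

```python
import random

def average_variance_estimates(sigma=2.0, n=5, reps=20000, seed=1):
    """Average two variance estimators over many simulated samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total_unbiased = 0.0               # divisor n - 1 (the sample variance s^2)
    total_biased = 0.0                 # divisor n
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / reps, total_biased / reps

unbiased_avg, biased_avg = average_variance_estimates()
print(f"true variance        : {2.0 ** 2:.2f}")
print(f"average of s^2       : {unbiased_avg:.2f}")
print(f"average with divisor n: {biased_avg:.2f}")
```

Under these assumptions the average of s² should settle near the true variance 4, while the divisor-n version should settle near (n − 1)/n · 4 = 3.2, which is what "biased" means here.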
 Some assumptions must hold before making decisions (inferences) about
the population parameters:
 The samples must be randomly selected.
 The sample size is greater than or equal to 30 (so that, by the central limit theorem, the
sampling distribution of the sample mean is approximately normal).
 If the sample size is less than 30, the population must be approximately normally distributed.
LECTURE 2 PAGE 1 STAT 2040

Estimation of the population parameters

 Suppose a college president wishes to estimate the average age of students attending classes
this semester. The president selects a random sample of 100 students and finds that the
average age of these students is 22.3 years.
 Using the sample data, the president can say that the average age of all the students
in the college is 22.3 years. This type of estimate is called a "POINT ESTIMATE".
 Alternatively, the president can say that the average age of all the students lies within an
interval (a range of ages). This type of estimate is called an "INTERVAL ESTIMATE".
 There are two basic methods for estimating population parameters:
 Point Estimate: an estimate in which the population parameter is estimated as a
specific numerical value.
 Interval Estimate: an estimate in which the population parameter is estimated as
an interval (range) of values that may contain the actual value of
the population parameter.
 We will study the point and interval estimates in four different cases:
1. The point and interval estimates for the population mean when σ is known.
2. The point and interval estimates for the population mean when σ is unknown.
3. The point and interval estimates for the population proportion (percentages).
4. The point and interval estimates for the population variance and standard deviation.
 Also, in only two of these cases, we will show how to estimate the sample size that helps
researchers make an accurate estimate of the population parameters.
Basic Concepts
 The best point estimate of a population parameter is the corresponding sample statistic.
 An interval estimate of a population parameter, determined from the sample data and a chosen
confidence level, is called a "Confidence Interval". It is denoted by C.I.
 The probability that the interval estimate contains the actual value of the population parameter
is called the "Confidence Level" or "Degree of Confidence". It is denoted by (1 − α)100%.
 Three common confidence levels are used here: 90%, 95%, and 99%.

1. Confidence Intervals for the population mean when σ is known

 In this section, we find confidence intervals for the population mean when the
population standard deviation is known, using the normal (z) distribution.
 The confidence interval for the population mean (µ) when the population standard deviation
(σ) is known is given by the formula

$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}.$$

where,
 $\bar{x}$: the sample mean, and $n$: the sample size.
 $\sigma$: the population standard deviation (assumed).
 $z_{\alpha/2}$: the critical value of the standard normal variable (from z tables).
 $\alpha$: the total area in both tails of the standard normal distribution curve.
 $\alpha/2$: the area in each one of the two tails of the standard normal curve.
 The term $E = z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$ is called the "margin of error". It is defined as the maximum
difference between the point estimate of a population parameter and the actual value of it.
It is also called the "maximum error of the estimate".
 For α = 0.05 (a 95% confidence interval), the standard normal curve has area 0.95 between
$-z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 = 0.025 in each tail.
[Figure: standard normal curve for α = 0.05 and a 95% confidence interval]
Remark
 When the sample size increases, the margin of error decreases.
 For a 90% confidence interval, we have $z_{\alpha/2} = 1.65$. (Proof later)
 The confidence interval for the population mean (µ) can be written as

$$\text{C.I} = \bar{x} \pm z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}.$$
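The critical values and the effect of the sample size on the margin of error can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_critical` and `margin_of_error`, and the value σ = 6 used in the second loop, are assumptions made only for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval, e.g. confidence=0.95."""
    alpha = 1.0 - confidence
    # The critical value leaves area alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def margin_of_error(z, sigma, n):
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    return z * sigma / sqrt(n)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")

# Illustrative values only: sigma = 6 is an assumption for this demo.
for n in (25, 100, 400):
    print(f"n = {n:3d}: E = {margin_of_error(1.96, 6.0, n):.3f}")
```

The computed critical values are about 1.645, 1.960, and 2.576; the table values 1.65 and 2.58 used in these notes are two-decimal approximations of the first and last. In the second loop, each fourfold increase in n halves the margin of error.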

Example 1
A researcher wishes to estimate the number of days it takes an automobile dealer to sell a Chevrolet
Aveo. A sample of 50 cars had a mean time on the dealer’s lot of 54 days. Assume the population
standard deviation to be 6.0 days. Find the best point estimate of the population mean and the
95% confidence interval of the population mean.
Solution
 We have
$$n = 50,\quad \bar{x} = 54,\quad \sigma = 6,\quad z_{\alpha/2} = 1.96.$$
Then the best point estimate of the population mean is
$$\hat{\mu} = \bar{x} = 54 \text{ days}.$$
 The 95% confidence interval of the population mean is
$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$
$$54 - (1.96)\cdot\frac{6}{\sqrt{50}} < \mu < 54 + (1.96)\cdot\frac{6}{\sqrt{50}}$$
$$54 - 1.7 < \mu < 54 + 1.7$$
$$52.3 < \mu < 55.7.$$
Hence, we can be 95% confident that the interval (52.3, 55.7) contains the population mean.
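What "95% confidence" means can be illustrated by simulation: if many samples are drawn and an interval is built from each, about 95% of those intervals should contain the true mean. The sketch below is a hedged illustration only. It pretends the true mean is 54 and the population is normal (both assumptions, since the true mean is unknown in practice), and the function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(mu=54.0, sigma=6.0, n=50, z=1.96, reps=2000, seed=7):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.95, with small differences from run to run if the seed is changed.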
Example 2
A survey of 30 emergency room patients found that the average waiting time for treatment was
174.3 minutes. Assuming that the population standard deviation is 46.5 minutes, find the best
point estimate of the population mean and the 99% confidence interval of the population mean.
Solution
 We have
$$n = 30,\quad \bar{x} = 174.3,\quad \sigma = 46.5,\quad z_{\alpha/2} = 2.58.$$
Then the best point estimate of the population mean is
$$\hat{\mu} = \bar{x} = 174.3 \text{ minutes}.$$
 The 99% confidence interval of the population mean is
$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$
$$174.3 - (2.58)\cdot\frac{46.5}{\sqrt{30}} < \mu < 174.3 + (2.58)\cdot\frac{46.5}{\sqrt{30}}$$
$$174.3 - 21.9 < \mu < 174.3 + 21.9$$
$$152.4 < \mu < 196.2.$$
Hence, we can be 99% confident that the mean waiting time for treatment for all emergency
room patients is between 152.4 and 196.2 minutes.
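The arithmetic in Examples 1 and 2 can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_interval` is hypothetical, and the critical values 1.96 and 2.58 are the table values used in the examples.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """Return (lower, upper) for xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: n = 50, mean 54, sigma 6, z = 1.96 (95%)
lo1, hi1 = z_interval(54.0, 6.0, 50, 1.96)
# Example 2: n = 30, mean 174.3, sigma 46.5, z = 2.58 (99%)
lo2, hi2 = z_interval(174.3, 46.5, 30, 2.58)

print(f"Example 1: {lo1:.1f} < mu < {hi1:.1f}")
print(f"Example 2: {lo2:.1f} < mu < {hi2:.1f}")
```

Rounded to one decimal place, the endpoints agree with the hand computations above.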

Sample Size
 An important question in estimating the population mean is: how large should the sample be
in order to make an accurate estimate?
 The answer is not simple because it depends on three factors: the margin of error, the
population standard deviation, and the degree of confidence.
 To estimate the sample size that helps researchers make an accurate estimate of
the population mean, we solve the margin of error formula for n:

$$E = z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; E\sqrt{n} = z_{\alpha/2}\cdot\sigma \;\Rightarrow\; \sqrt{n} = \frac{z_{\alpha/2}\cdot\sigma}{E} \;\Rightarrow\; n = \left(\frac{z_{\alpha/2}\cdot\sigma}{E}\right)^{2}.$$

 That is, the minimum sample size needed for an interval estimate of the population mean
is given by the formula

$$n = \left(\frac{z_{\alpha/2}\cdot\sigma}{E}\right)^{2}.$$

If the result is not a whole number, it is rounded up to the next whole number.
Example 3
A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the
estimate is accurate within 2 feet. From a previous study, the standard deviation of the depths
measured was 4.33 feet. Find the minimum sample size needed.
Solution
We have
$$z_{\alpha/2} = 2.58,\quad E = 2,\quad \sigma = 4.33.$$
Then the minimum sample size needed to estimate the average depth of the river is
$$n = \left(\frac{z_{\alpha/2}\cdot\sigma}{E}\right)^{2} = \left(\frac{(2.58)(4.33)}{2}\right)^{2} = (5.5857)^{2} \approx 31.2,$$
which is rounded up to n = 32.
Hence, to be 99% confident that the estimate is accurate within 2 feet of the true mean depth, the
scientist needs a sample of at least 32 measurements.
Example 4
A researcher wishes to estimate within $300 the true average amount of money a county spends
on road repairs each year. If she wants to be 90% confident, how large a sample is necessary? The
standard deviation is known to be $900.
Solution

We have
$$z_{\alpha/2} = 1.65,\quad E = 300,\quad \sigma = 900.$$
Then the sample size necessary to estimate the average amount of money is
$$n = \left(\frac{z_{\alpha/2}\cdot\sigma}{E}\right)^{2} = \left(\frac{(1.65)(900)}{300}\right)^{2} = (4.95)^{2} = 24.5025,$$
which is rounded up to n = 25.
Hence, to be 90% confident that the estimate is within $300 of the true mean amount, the
researcher needs a sample of at least 25 observations.
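The sample size formula, including the round-up step, can be sketched in Python as follows. The function name `min_sample_size` is hypothetical; the inputs are the values given in Examples 3 and 4.

```python
from math import ceil

def min_sample_size(z, sigma, margin):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed margin."""
    return ceil((z * sigma / margin) ** 2)   # always round up, never down

n_river = min_sample_size(2.58, 4.33, 2)     # Example 3: 99%, E = 2 feet
n_roads = min_sample_size(1.65, 900, 300)    # Example 4: 90%, E = $300

print(n_river, n_roads)
```

Rounding up rather than to the nearest integer matters: a sample of 31 in Example 3 would give a margin of error slightly larger than 2 feet.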
2. Confidence Intervals for the population mean when σ is unknown
 When the population standard deviation σ is known and the sample size is 30 or more, or the
population is normally distributed if the sample size is less than 30, the confidence interval
for the population mean can be found by using the "z distribution".
 However, most of the time the population standard deviation σ is unknown and cannot be
assumed. In this case, we use the sample standard deviation s as an estimator of σ.
 In this case, another distribution is used to find the confidence interval for the population
mean. It is called the "Student t distribution", or simply the "t distribution".
 The t distribution is similar to the standard normal distribution in these properties:
 It is a bell-shaped curve.
 It is symmetric about the mean.
 The curve never touches the X-axis.
 The mean, median, and mode are equal to 0.
 The t distribution differs from the standard normal distribution in these properties:
 The variance is greater than 1 (i.e., σ² > 1).
 As n increases, the t distribution approaches the standard normal distribution.
 It has a family of curves based on the concept of "degrees of freedom", which is related
to the sample size.
Remark
 The degrees of freedom are the number of values that are free to vary when a sample statistic
is computed. They tell us which curve to use when a distribution consists of a family of curves.
 The degrees of freedom for a confidence interval for the population mean when σ is unknown
are denoted by "d.f" and computed as d.f = n − 1.
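The two properties "variance greater than 1" and "approaches the standard normal distribution" can be made concrete with a standard result, quoted here without proof and not derived in these notes. Writing ν for the degrees of freedom (ν = n − 1) and T for a variable with the t distribution:

```latex
\operatorname{Var}(T) = \frac{\nu}{\nu - 2} > 1 \quad (\nu > 2),
\qquad
\lim_{\nu \to \infty} \frac{\nu}{\nu - 2} = 1 .
```

For instance, ν = 9 gives 9/7 ≈ 1.29, while ν = 29 gives 29/27 ≈ 1.07, so the variance moves toward the standard normal value 1 as the sample size grows.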

Theorem
The confidence interval for the population mean (µ) when the population standard deviation (σ)
is unknown is given by the formula

$$\bar{x} - t_{\alpha/2}\cdot\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}.$$

where,
 $\bar{x}$: the sample mean, and $n$: the sample size.
 $s$: the sample standard deviation (known or calculated).
 $t_{\alpha/2}$: the critical value of the t distribution (from the t distribution table).
 The values of $t_{\alpha/2}$ are found according to the degrees of freedom and the confidence level.
Example 5
Find the value of $t_{\alpha/2}$ for a 95% confidence interval when the sample size is 22.
Solution
We have
Confidence Level = 95% and d.f = n − 1 = 22 − 1 = 21.
Then, from the t distribution table, we get
$$t_{\alpha/2} = 2.080.$$
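Table lookups such as the one in Example 5 can be cross-checked numerically. The Python standard library has no t distribution, so the sketch below integrates the t density with Simpson's rule and finds the critical value by bisection. The function names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and this is an illustration rather than a production routine; if SciPy is available, `scipy.stats.t.ppf` computes the same quantity.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3          # symmetry: half the area lies below 0

def t_critical(confidence, df):
    """Return t_{alpha/2}: the point with area alpha/2 in the upper tail."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:       # widen the bracket until it holds the root
        hi *= 2
    for _ in range(50):                 # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_critical(0.95, 21):.3f}")    # Example 5: 95%, d.f. = 21
```

The same function can be called with other confidence levels and degrees of freedom to check further entries of the t table.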
Example 6
Ten randomly selected people were asked how long they slept at night. The mean time for that
sample was 7.1 hours, and the standard deviation was 0.78 hour. Find the 95% confidence interval
of the population mean time. Assume the variable is normally distributed.
Solution
We have
$$n = 10,\quad \bar{x} = 7.1,\quad s = 0.78,\quad \text{d.f} = n - 1 = 10 - 1 = 9,\quad t_{\alpha/2} = 2.262.$$
Then the 95% confidence interval of the population mean is
$$\bar{x} - t_{\alpha/2}\cdot\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$$
$$7.1 - (2.262)\cdot\frac{0.78}{\sqrt{10}} < \mu < 7.1 + (2.262)\cdot\frac{0.78}{\sqrt{10}}$$
$$7.1 - 0.56 < \mu < 7.1 + 0.56$$
$$6.54 < \mu < 7.66.$$
Hence, we can be 95% confident that the population mean time is between 6.54 and 7.66 hours.
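When only summary statistics are given, as in Example 6, the interval can be computed directly from them. In the sketch below the function name `t_interval` is hypothetical, and the critical value 2.262 is taken from the t table (95%, d.f. = 9) rather than computed.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6: n = 10, mean 7.1 hours, s = 0.78 hour, t = 2.262
lower, upper = t_interval(7.1, 0.78, 10, 2.262)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Rounded to two decimal places, the endpoints match the hand computation.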
Example 7
The following data represent a sample of the number of home fires started by candles for the past
seven years.
5460 5900 6090 6310 7160 8440 9930
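Find the 99% confidence interval for the mean number of home fires started by candles in all years.
Solution
First, we compute the sample mean and the sample standard deviation:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{5460 + 5900 + \cdots + 9930}{7} = \frac{49{,}290}{7} = 7041.4.$$
Using the unrounded mean, so that $n\bar{x}^{2} = (49{,}290)^{2}/7$,
$$s^{2} = \frac{\sum x_i^{2} - n\bar{x}^{2}}{n-1} = \frac{362{,}629{,}900 - \frac{(49{,}290)^{2}}{7}}{7-1} = \frac{15{,}557{,}885.71}{6} = 2{,}592{,}980.95.$$
$$s = \sqrt{s^{2}} = \sqrt{2{,}592{,}980.95} = 1610.3.$$
Now, we have
$$n = 7,\quad \bar{x} = 7041.4,\quad s = 1610.3,\quad \text{d.f} = n - 1 = 7 - 1 = 6,\quad t_{\alpha/2} = 3.707.$$
Then the 99% confidence interval of the population mean is
$$\bar{x} - t_{\alpha/2}\cdot\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$$
$$7041.4 - (3.707)\cdot\frac{1610.3}{\sqrt{7}} < \mu < 7041.4 + (3.707)\cdot\frac{1610.3}{\sqrt{7}}$$
$$7041.4 - 2256.2 < \mu < 7041.4 + 2256.2$$
$$4785.2 < \mu < 9297.6.$$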
Hence, we can be 99% confident that the mean number of home fires started by candles in all years
is between 4785.2 and 9297.6 fires.
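When the raw data are available, as in Example 7, the mean and sample standard deviation can be computed with the standard library before forming the interval. This is a minimal sketch; the variable names are illustrative, and the critical value 3.707 is the t table value (99%, d.f. = 6) rather than something computed here.

```python
from math import sqrt
from statistics import mean, stdev

fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]   # data from Example 7
t_crit = 3.707                                        # table value: 99%, d.f. = 6

n = len(fires)
xbar = mean(fires)
s = stdev(fires)                  # sample standard deviation (divisor n - 1)
margin = t_crit * s / sqrt(n)

print(f"mean = {xbar:.1f}, s = {s:.1f}, E = {margin:.1f}")
print(f"{xbar - margin:.1f} < mu < {xbar + margin:.1f}")
```

Because the code carries the unrounded mean and standard deviation through the whole calculation, an endpoint can differ from the hand computation in the last decimal place.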
Completed, praise be to God.
