
Module 06 - One Population Parameter Estimation - Topic 4A


TOPIC 7

PARAMETER ESTIMATION

200032: STATISTICS FOR BUSINESS


Introduction


When computing probabilities in sampling distribution problems, we need to know the value of the relevant parameters, which is a highly unlikely circumstance.

In the real world, population parameters are almost always unknown, because
they represent summary measurements about large populations.

The following topics will discuss the estimation of population parameters which
are unknown using known sample statistics.

We will be estimating the population mean by the sample mean, and the
population proportion by the sample proportion.

Estimation

The estimation procedures that we cover fall into two categories:

(i) Point estimation: This is a single number (or value) calculated from
available sample data and used to estimate a population parameter.

For example, the sample mean $\bar{x}$ is a logical point estimate of the population mean $\mu$.

(ii) Interval estimation: This consists of two numbers that define a range of
values, which will enclose the unknown population parameter at some
specified probability level. For example, we are 90% sure that the population
mean for Tutorial Quiz 1 (out of 10) lies between 6 and 7…

Confidence Intervals

Since a sample statistic such as $\bar{x}$ varies from sample to sample, this variation can be taken into
consideration when estimating the true population mean. Hence an “Interval Estimate” for the
population mean is obtained.

The interval that is constructed has a specified confidence or probability of correctly estimating
the true value of the population mean.

As an example, an interval can be constructed for which there will be a 95% confidence that this
interval includes the true value of the population mean.

In other words, if we repeat the process of obtaining samples and constructing intervals, in the
long run 95% of these intervals will contain the true population mean.

These intervals are called confidence intervals. The “level of confidence” is given as 100 (1 – α) %
for a given value 0 < α < 1. The choice of the confidence level is somewhat arbitrary.
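
The long-run interpretation above can be checked with a small simulation. The sketch below is illustrative only: the population values (`mu`, `sigma`), the sample size and the number of repetitions are assumptions chosen for the demonstration, and it uses the known-σ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ covered later in this topic.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=40, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

print(coverage_rate())  # should be close to 0.95
```

With more repetitions the printed proportion should settle nearer to the chosen confidence level.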

Confidence Intervals

The basic form for all confidence intervals, including one and two populations,
is as follows:

Sample statistic ± (critical value × standard error of the sample statistic)

The choice of the critical value and standard error depends on what
information we have about the population.

Confidence Intervals
In this unit we will be only estimating the population mean or proportion for one population
or the difference between population means or proportions for two populations.

The following table shows the point estimator (the sample statistic) that will be used when
calculating the confidence intervals

Estimating                    Population parameter    Sample statistic

Population Mean               $\mu$                   $\bar{x}$

Population Proportion         $p$                     $\hat{p}$

Difference in Means           $\mu_1 - \mu_2$         $\bar{x}_1 - \bar{x}_2$

Difference in Proportions     $p_1 - p_2$             $\hat{p}_1 - \hat{p}_2$


Estimating μ and p


Estimation

Topic 6 introduced the Normal Distribution, by far the most commonly used continuous
distribution, as many phenomena follow a normal distribution. It also often gives a good
approximation even when the data are not exactly normal, so the normal distribution is
often used as an approximation, especially in modern finance.

The normal distribution has such useful properties that even in cases when the data is far from
“normal” (for example, when it has a marked positive skew), a transformation (taking
logarithms or square root are the most commonly used transformations) may be applied to the
original data to make it more symmetric (and therefore more like a normal distribution). The
transformed data is then analysed as if it came from a normal distribution.

We also know that the powerful result known as the Central Limit Theorem guarantees that
the sampling distribution of the mean tends to a Normal Distribution as the sample size
increases.

Estimating µ with σ known

When we are estimating the population mean µ, if the population standard deviation σ is known then the critical value is $z_{\alpha/2}$ while the standard error of the sample mean is $\sigma/\sqrt{n}$.

Hence the confidence interval to estimate the population mean, given that the population standard deviation is known, is as follows:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Assumptions of the population:
1. The population is normally distributed; or
2. If the population is not normal, then the sample size must be large (n ≥ 30)
3. The population standard deviation σ is known.
This formula can also be used if σ is unknown and the sample size n is large,
i.e., n ≥ 30. If n ≥ 30, then s is a good approximation for σ.
Example

How much time do computer users spend on the “information superhighway”?


An Internet service provider (ISP) conducted a survey of 250 of its customers,
and found that the average amount of time spent online was 10.5 hours per
week.

It is known that the population standard deviation is 5.2 hours.

Construct a 95% confidence interval for the average online time for all users of
this particular ISP.

Example
So what do we know?
• 95% confidence interval: 100 (1 – α) = 95, so α = 0.05.
• An ISP conducted a survey of 250 customers: so n = 250.
• Average amount of time spent online was 10.5: so $\bar{x} = 10.5$.
• Population standard deviation is 5.2 hours: so σ = 5.2.

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.5 \pm 1.96\left(\frac{5.2}{\sqrt{250}}\right) = 10.5 \pm 0.645$$

$$9.855 < \mu < 11.145$$

Thus we can say that we are 95% confident that the average online time for all users of
this particular ISP is between 9.855 and 11.145 hours per week.
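
The ISP calculation can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original notes: the function name `z_interval` is hypothetical, and the critical value is computed with the standard library's `NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for a mean when sigma is known (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                  # critical value x standard error
    return xbar - margin, xbar + margin

lcl, ucl = z_interval(xbar=10.5, sigma=5.2, n=250)
print(round(lcl, 3), round(ucl, 3))  # 9.855 11.145
```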
Confidence Intervals

The left hand end of a confidence interval (CI) is called the lower confidence
limit (LCL).

The right hand end is called the upper confidence limit (UCL).


$$\text{LCL} = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$\text{UCL} = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Estimating µ with σ unknown

What happens when the sample size is small and σ is unknown?

Just as the mean μ of the population is usually unknown, the population standard
deviation σ is also usually unknown. Then we estimate σ by s, which is the sample
standard deviation.
In this case, the standard error of $\bar{x}$ will have to be estimated as $s/\sqrt{n}$.

Then for small sample sizes, our CI will look like $\bar{x} \pm\ ? \cdot \frac{s}{\sqrt{n}}$.

But now “?” can no longer be taken from a standard normal distribution.

Student t-distributions
The t-distributions were discovered by William S. Gosset in 1908.
Gosset was a statistician employed by the Guinness brewing company, which had stipulated that he
not publish under his own name. He therefore wrote under the pen name “Student”. The Student t-
distributions were named after him; and these distributions arise in the following situation.

Suppose we have a simple random sample of size n drawn from a normal population with mean μ
and standard deviation σ.

Let $\bar{x}$ denote the sample mean, and s the sample standard deviation.

Then the quantity

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has a t-distribution with n – 1 degrees of freedom (d.f.).

A t-distribution variable is denoted by “t”, and the d.f. is specified as a subscript. For example, $t_8$.

Properties of the t-distribution
• The graph of a t-distribution extends indefinitely to the left and right, and is
“mound-shaped”.

• A t-curve is symmetric about its mean.

• The high point of the t-distribution occurs at its mean, which is always equal to
zero.

• There are an infinite number of t-curves. Each particular curve is determined
by one parameter, the degrees of freedom.

• When the d.f. increase, the t-distribution gets closer to a standard normal
distribution.

• Critical values/probabilities are tabulated in t-distribution tables.

[Table: t-distribution critical values]

Example

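Find $t_{0.05,\,8}$.

Read the t-table at the row for 8 degrees of freedom and the column for an upper-tail area of 0.05.

Hence $t_{0.05,\,8} = 1.860$.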


Estimating µ with σ unknown

When we are estimating the population mean µ, if the population standard deviation σ is unknown then the critical value is $t_{\alpha/2}$ while the standard error of the sample mean is $s/\sqrt{n}$.

Hence the confidence interval to estimate the population mean, given that the population standard deviation is unknown, is as follows:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where s is the sample standard deviation, and $t_{\alpha/2}$ is the critical value of the t-distribution with (n – 1) degrees of freedom.

Assumptions of the population:
1. The underlying population is normally distributed.
2. If the underlying population is not normal, then there must be a large sample.
3. Population standard deviation is unknown.

Example

A random sample of 9 packets of a certain breakfast cereal is taken, and reveals an
average fibre content of 3.6 grams, with a standard deviation of 0.9 grams.

Construct a 90% confidence interval for the true average fibre content for this
breakfast cereal.

Assume that the fibre content is normally distributed.

Example
So what do we know?
• 90% confidence interval: 100 (1 – α) = 90, so α = 0.10
• A random sample of 9 packets : so n=9
• Average fibre content of 3.6 grams: so $\bar{x} = 3.6$
• A random sample reveals … a standard deviation of 0.9 grams: so s = 0.9

Since σ is unknown, $t_{\alpha/2}$ is the critical value. From the t-table, we have $t_{0.05,\,8} = 1.860$.

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 3.6 \pm 1.860\left(\frac{0.9}{\sqrt{9}}\right) = 3.6 \pm 0.56$$

$$3.04 < \mu < 4.16$$

Thus, we can say that a 90% confidence interval for the true average fibre content for this breakfast cereal
is between 3.04 and 4.16 grams.
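
The cereal example can be checked the same way. In this sketch the helper name `t_interval` is an assumption, and the critical value 1.860 is passed in as read from the t-table, because the Python standard library does not provide t-distribution quantiles.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown (hypothetical helper).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a t-table.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

lcl, ucl = t_interval(xbar=3.6, s=0.9, n=9, t_crit=1.860)
print(round(lcl, 2), round(ucl, 2))  # 3.04 4.16
```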
Estimating p
When we are dealing with a categorical variable (e.g. gender, preference, etc.), the parameter we
are most interested in is the proportion, p. Consider a population, of which only some individuals
possess the desired characteristic. We may be interested in estimating the proportion possessing
that characteristic.

If we take a sample of size n (sampling without replacement from a population that is large
relative to n) and count the number which have the characteristic ("number of successes"), x,
then x will have approximately a binomial distribution with parameters n and p. If n is
sufficiently large (np and nq both greater than 5, where q = 1 – p), the binomial
distribution (of x) can be approximated by a normal distribution.

The same principles used for the confidence interval for the mean are used for the confidence
interval of the population proportion. Here we want to obtain a plausible range of values for the
population proportion, p. Keep in mind, p should have a value between 0 and 1. However when
we use the sample proportion in constructing confidence intervals for p, this can lead to
confidence intervals which contain values outside of 0 and 1.

Estimating p – the population proportion
When we have a large sample, the critical value is $z_{\alpha/2}$, while the standard error of the sample proportion is

$$\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \quad \text{or more simply} \quad \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Hence the confidence interval to estimate the population proportion, given that the
sample is large, is as follows:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Example

A survey was taken of women in a major city to determine what factor was the
most important in deciding where to shop. The results appear below.

Factor Percentage
Price and Value 40%
Quality and Selection of Merchandise 30%
Service 15%
Shopping Environment 15%

If the sample size was 1200, estimate with 95% confidence the proportion of
women who identified "price and value" as the most important factor.

Example
So, what do we know?
• 95% confidence interval: so α = 0.05.
• The sample size was 1200: so n = 1200.
• Price and Value most important: $\hat{p} = 0.4$, so $\hat{q} = 1 - \hat{p} = 0.6$.

Firstly, determine the critical z-value: $z_{0.025} = 1.96$.

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.4 \pm 1.96\sqrt{\frac{0.4 \times 0.6}{1200}} = 0.4 \pm 0.028$$

$$0.372 < p < 0.428$$

Thus we can say that a 95% confidence interval for the true proportion of women who identified
"price and value" as the most important factor is between 37.2% and 42.8%.
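
A short sketch of the shopping-survey calculation follows. The function name `proportion_interval` is hypothetical, and the critical value is computed with `NormalDist` instead of being looked up.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf=0.95):
    """Large-sample confidence interval for a proportion (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # z times sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

lcl, ucl = proportion_interval(p_hat=0.4, n=1200)
print(round(lcl, 3), round(ucl, 3))  # 0.372 0.428
```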
Estimating the Difference Between Two Population Means and Two Population Proportions


Statistical Inference of Two Populations

So far, all discussion of confidence intervals has centred on samples drawn from one
population. However, it is often important to compare two populations to see
whether there is a significant difference between the two.

Some examples:
Does the new safety program reduce accidents?
Are boys any better or worse than girls at Mathematics?
Is the new process better than the old one?

To conduct inference about two population parameters, we must first determine the
sampling distribution of the difference between the two corresponding sample statistics.

Independent v Dependent Samples
Similarly to inferences about one population, we base our conclusions on samples
taken from each population. These samples can be either independent or dependent.

Two samples are independent if the sample values selected from one population are
not related to the sample values selected from the other population.

A sampling method is dependent when the individuals selected to be in one sample
are used to determine the individuals to be in the second sample.

The dependent samples we consider are often referred to as matched-pair samples.

Difference Between Two Independent Population Means
In this section we will consider methods for using sample data from two independent
samples to construct confidence interval estimates of the difference between two
population means.
We are examining the difference between two population means ($\mu_1 - \mu_2$)
by examining the difference in the sample means ($\bar{x}_1 - \bar{x}_2$).

We will look at two cases:
1. σ1 and σ2 are known
2. σ1 and σ2 are unknown but assumed equal.

In both cases, we will assume that the populations are normally distributed.

Difference Between Two Independent Population Means – σ1 and σ2 are known

When we are estimating the difference between two independent population
means, if the population standard deviations (σ1 and σ2) are both known then the
critical value is $z_{\alpha/2}$ and the standard error is:

$$SE_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Hence the 100(1 – α)% confidence interval to estimate the difference between
population means is as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example
Two factories, located 100 kilometres apart, were compared on the amount
of time lost due to accidents. A sample of 45 days from the first factory had an
average time lost of 81 minutes. The sample from the other factory comprised 36
days, and had an average time lost of 76 minutes.

The population standard deviations of the time lost of the two factories are known
to be σ1 = 5.2 and σ2 = 3.4.

Construct a 95% confidence interval for the difference in the average time lost
between the two factories.

Example
Let Population 1 = Factory 1,
and Population 2 = Factory 2.

What do we know?

                         1          2
Sample size              45         36
Sample mean              81         76
Population variance      $5.2^2$    $3.4^2$

α = 0.05

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (81 - 76) \pm z_{0.025}\sqrt{\frac{5.2^2}{45} + \frac{3.4^2}{36}}$$

$$= 5 \pm 1.96\sqrt{\frac{5.2^2}{45} + \frac{3.4^2}{36}} = 5 \pm 1.882$$

That is, we are 95% confident that the difference in the average time lost between the
two factories lies between 3.12 and 6.88 minutes.
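
The factory comparison can be verified with the sketch below. The helper name `two_mean_z_interval` is an assumption made for illustration, and the critical value is computed rather than taken from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x1, sigma1, n1, x2, sigma2, n2, conf=0.95):
    """Interval for mu1 - mu2 with both sigmas known (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # standard error of x1 - x2
    diff = x1 - x2
    return diff - z * se, diff + z * se

lcl, ucl = two_mean_z_interval(81, 5.2, 45, 76, 3.4, 36)
print(round(lcl, 2), round(ucl, 2))  # 3.12 6.88
```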
Difference Between Two Independent Population Means –
σ1 and σ2 are unknown but assumed equal

If the population variances are unknown, we will consider only the situation in which the
population variances are assumed to be equal.

That is, suppose it is reasonable to make the assumption $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

In nearly all practical situations, $\sigma^2$ is unknown, and must be estimated using the
variances of two independent random samples selected from the populations.

The estimate used is called the pooled sample variance, denoted by $s_p^2$. It is calculated as follows:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Here $s_1^2$ and $s_2^2$ are the sample variances for each sample.

Difference Between Two Independent Population Means –
σ1 and σ2 are unknown but assumed equal

If σ1 and σ2 are unknown, the critical value is $t_{\alpha/2}$, and the standard error is:

$$SE_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The degrees of freedom for the critical value t is d.f. = (n1 – 1) + (n2 – 1) = n1 + n2 – 2.

Assuming that both populations follow a normal distribution, the confidence interval to
estimate the difference between population means, given that both population standard
deviations are unknown and assumed equal, is as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Example

Two plastics produced by different processes were subjected to tests in order to determine
their breaking strengths.

A random sample consisting of 20 measurements made on plastic type X revealed a
mean breaking strength of 28.3 lb, with a standard deviation of 3.3 lb. A random sample
consisting of 25 measurements made on plastic type Y revealed a mean breaking
strength of 26.7 lb, with a standard deviation of 3.9 lb.

Construct a 95% confidence interval for the difference in their mean breaking strengths.
You may also assume that the populations are normal, and their variances are equal.

Example
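Let Population 1 = Plastic X,
and Population 2 = Plastic Y.

What do we know?

                  1       2
n =               20      25
$\bar{x}$ =       28.3    26.7
s =               3.3     3.9

α = 0.05

The pooled standard deviation is

$$s_p = \sqrt{\frac{(20 - 1)(3.3)^2 + (25 - 1)(3.9)^2}{20 + 25 - 2}} = 3.647$$

and the critical value is $t_{0.025,\,43} = 2.017$.

Hence the confidence interval is:

$$(28.3 - 26.7) \pm 2.017 \times 3.647 \times \sqrt{\frac{1}{20} + \frac{1}{25}} = 1.6 \pm 2.207$$

$$-0.607 < \mu_1 - \mu_2 < 3.807$$

That is, we are 95% confident that the true difference in the mean breaking strengths
lies between –0.607 and 3.807 pounds.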
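
The pooled-variance steps for the plastics example can be sketched as follows. The helper name `pooled_t_interval` is hypothetical, and the critical value (about 2.017 for 43 degrees of freedom) is supplied by hand because the standard library has no t-distribution quantile function.

```python
from math import sqrt

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal variances (hypothetical helper).

    t_crit is t_{alpha/2} with n1 + n2 - 2 degrees of freedom, from a t-table.
    """
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se

lcl, ucl = pooled_t_interval(28.3, 3.3, 20, 26.7, 3.9, 25, t_crit=2.017)
print(round(lcl, 3), round(ucl, 3))
```

Because the interval contains zero, these data do not rule out equal mean breaking strengths at the 95% level.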
Difference Between Two Dependent Population Means

In situations where the samples are dependent, we take the differences between each pair of
observations, and then use these “differences” to construct the confidence interval as if it were
one sample.

Let ($x_i$, $y_i$) be the pair of values for the i th individual in the data set.

How we determine the difference (i.e. $d_i = x_i - y_i$ or $d_i = y_i - x_i$) is arbitrary, provided the same order is used for every pair.

The sample mean of the differences is $\bar{d}$ while the standard deviation of the differences is $s_d$.
We assume that the population of differences is normally distributed, and the sample of
differences represents a random sample from the population of differences.

Difference Between Two Dependent Population Means

If the population standard deviation of the differences is unknown then the critical value is $t_{\alpha/2}$ while the
standard error of the sample mean difference is $s_d/\sqrt{n}$.

Hence the confidence interval to estimate the population mean difference, given that the population
standard deviation is unknown, is as follows:

$$\bar{d} \pm t_{\alpha/2}\left(\frac{s_d}{\sqrt{n}}\right)$$

where $s_d$ is the sample standard deviation of the differences, n is the number of pairs, and
$t_{\alpha/2}$ is the critical value of the t-distribution with (n – 1) degrees of freedom.

Example

Eleven students were randomly selected from a population of 1000 students. The sampling
method was simple random sampling.

All of the students were given a standardised English test and a standardised maths test. Test
results are summarized below.

Test Score

English 85 87 85 85 68 81 84 71 46 75 80

Mathematics 83 83 83 82 65 79 83 60 47 77 83

Find the 90% confidence interval for the mean difference between student scores on the
maths and English tests.

Assume that the differences are approximately normally distributed.

Example

First define the difference:

d = English score – Mathematics score

Now determine the differences: (2, 4, 2, 3, 3, 2, 1, 11, –1, –2, –3)

Calculate the summary statistics and critical value:

$\bar{d} = 2$
$s_d = 3.715$
$t_{\alpha/2} = t_{0.05,\,10} = 1.812$
n = 11

Finally calculate the confidence interval:

$$\bar{d} \pm t_{\alpha/2}\left(\frac{s_d}{\sqrt{n}}\right) = 2 \pm 1.812\left(\frac{3.715}{\sqrt{11}}\right) = 2 \pm 2.03$$

The 90% confidence interval for the mean difference between student scores on the English
and maths tests is from –0.03 to 4.03.
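
The matched-pair calculation can be reproduced from the raw scores. In this sketch the variable names are assumptions; the critical value 1.812 (10 degrees of freedom) is entered by hand from the t-table.

```python
from math import sqrt
from statistics import mean, stdev

english = [85, 87, 85, 85, 68, 81, 84, 71, 46, 75, 80]
maths = [83, 83, 83, 82, 65, 79, 83, 60, 47, 77, 83]

# d = English score - Mathematics score, one difference per student
d = [e - m for e, m in zip(english, maths)]
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation (divides by n - 1)
n = len(d)

t_crit = 1.812    # t_{0.05, 10} from the t-table
margin = t_crit * s_d / sqrt(n)
lcl, ucl = d_bar - margin, d_bar + margin
print(d_bar, round(s_d, 3), round(lcl, 2), round(ucl, 2))
```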
Difference Between Two Population Proportions

Using the same ideas for proportions as we used for two means:

We are analysing the difference in population proportions, so we have:

sample difference: $\hat{p}_1 - \hat{p}_2$

population difference: $p_1 - p_2$

Since the population proportions are unknown, we use the sample proportions to
find the standard error:

$$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

For large samples, by the Central Limit Theorem, the critical value is $z_{\alpha/2}$ for
confidence intervals. Large means that
$n_1\hat{p}_1,\ n_1\hat{q}_1,\ n_2\hat{p}_2,\ n_2\hat{q}_2 \geq 5$.

Difference Between Two Population Proportions

Hence the confidence interval to estimate the difference between two
population proportions, given that the samples are large, is as follows:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Example

A random sample of 400 male university students shows that 160 of them catch
a train to university.

A random sample of 400 female university students resulted in 120 of them
taking a train to university.

Using this data, construct a 99% confidence interval to estimate the difference in
population proportions of male and female students who take a train to
university.

Example
Let Population 1 = Male, and
Population 2 = Female.

What do we know?

                        1       2
n =                     400     400
x =                     160     120
$\hat{p} = x/n$ =       0.4     0.3

α = 0.01

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.4 - 0.3) \pm 2.576\sqrt{\frac{0.4 \times 0.6}{400} + \frac{0.3 \times 0.7}{400}} = 0.10 \pm 0.086$$

That is, we are 99% confident that the true difference in population proportions of male and female
students who take a train to university is between 0.014 and 0.186, or between 1.4% and 18.6%.

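
The train example can be checked with the sketch below, which also tests the large-sample condition before computing the interval. The function name `two_proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Large-sample check: every n*p_hat and n*q_hat should be at least 5
    if min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) < 5:
        raise ValueError("samples too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lcl, ucl = two_proportion_interval(160, 400, 120, 400, conf=0.99)
print(round(lcl, 3), round(ucl, 3))  # 0.014 0.186
```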
Sample Size Determination
One of the most frequent questions asked of the statistician is,

How many measurements should be included in the sample?

The sampling procedure, together with the sample size, controls the total amount of
relevant information in a sample. At this point in our study, we are concerned with the
simplest sampling situation – in other words, random sampling from a relatively large
population – and devote our attention to the selection of the sample size n.

We need to recognise that the confidence intervals we have covered are basically the
sample statistic ± an error value. If this error value (known as the maximum error, or
error bound, B) is specified, we can then calculate the sample size needed.

Sample Size Determination

The sample size determination formulae we are about to give you come from the
formulae for the maximum desired error of the estimates. Basically, they come from
the corresponding confidence interval formulae! The formula is then solved for n.

Be sure to round the answer obtained UP to the next whole number, not off to the
nearest whole number. If you round off, then you will exceed your maximum error
of the estimate in some cases.

By rounding up, you will have a smaller maximum error of the estimate than allowed,
but this is better than having a larger one than desired.

Sample Size Determination – Estimating µ

When estimating the population mean, we can use the following formula to
determine the sample size, assuming we know σ (the population standard
deviation).

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

If the population standard deviation is unknown, then we have to estimate σ so we
can determine the sample size. In such cases, it is acceptable to use the following
estimate.

$$\sigma \approx \frac{1}{4} \times \text{range}$$

Example

A fast food company wants to determine the average number of times that fast
food users visit fast food restaurants per week.

They have decided that their estimate needs to be accurate to within one-tenth
of a visit, and they want to be 95% sure that their estimate does not differ from
the true number of visits by more than one-tenth of a visit.

Previous research has shown that the standard deviation is 0.7 visits. What is the
required sample size?

Example

What do we know?

They want to be 95% sure… so α = 0.05.

… that their estimate does not differ from the true number of visits by
more than one-tenth of a visit… so B = 0.1.

Previous research has shown that the standard deviation is 0.7 visits…
so σ = 0.7.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.96 \times 0.7}{0.1}\right)^2 = 188.2384$$

Hence a sample size of 189 or more is needed.
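
The rounding-up rule is easy to get wrong by hand, so a small sketch may help. The function name `sample_size_mean` is an assumption; `math.ceil` implements "round UP to the next whole number".

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, bound, conf=0.95):
    """Smallest n with margin of error at most `bound` (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / bound) ** 2)  # always round up, never to nearest

print(sample_size_mean(sigma=0.7, bound=0.1))  # 189
```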
Sample Size Determination – Estimating p

The formula for the sample size here is obtained by solving for n the maximum
error of the estimate formula for the population proportion. Again, we get this
from the corresponding confidence interval! Note that p is taken from a previous
study, if one is available.

$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 pq$$

If there is no previous study or estimate available, then use 0.5 for p and q, as these
are the values which will give the largest sample size. It is better to have too large
a sample and come in under the maximum error of the estimate. In this case, the
formula simplifies to

$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{B}\right)^2$$

Example

A publisher wants to know what percent of the population might be
interested in a new magazine on making the most of your retirement.

Secondary data (that is several years old) indicates that 22% of the
population is retired.

They are willing to accept an error rate of 5%, and they want to be 95%
certain that their finding does not differ from the true rate by more
than 5%.

What is the required sample size?

Example
What do we know?

Secondary data (that is several years old) indicates that 22% of the population is
retired… so p = 0.22.

They want to be 95% certain… so α = 0.05.

… that their finding does not differ from the true rate by more than 5%... so B = 0.05.

$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 pq = \left(\frac{1.96}{0.05}\right)^2 \times 0.22 \times 0.78 = 263.687$$

Hence a sample size of 264 or more is needed.
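
The sketch below computes the sample size for a proportion, with the conservative default p = 0.5 used when no prior estimate is available. The function name `sample_size_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(bound, p=0.5, conf=0.95):
    """Sample size to estimate a proportion within `bound` (hypothetical helper).

    p defaults to 0.5, which maximises p * q and so gives the largest n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / bound) ** 2 * p * (1 - p))

print(sample_size_proportion(bound=0.05, p=0.22))  # 264
print(sample_size_proportion(bound=0.05))          # 385 with no prior estimate
```

Comparing the two results shows how much a prior estimate of p can reduce the required sample.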
Sample Size Determination – Estimating µ1 – µ2

When dealing with two populations we use the same method as with one population. Basically
we rearrange the error part of the confidence intervals to solve for n. Since we have n1 and n2
we add the condition that n1 = n2

$$n_1 = n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\left(\sigma_1^2 + \sigma_2^2\right)$$

If the population standard deviations are unknown, then we have to estimate them so we can
determine the sample size. In such cases, it is acceptable to use the following estimate for
both populations.

$$\sigma \approx \frac{1}{4} \times \text{range}$$

Example

A fast food company wants to estimate the difference in average spending
between males and females. They want to be 95% sure that their estimate does
not differ from the true value by more than $1.

Previous research has shown that the standard deviation is $3 for both males and
females.

Assuming an equal sample size, what is the required sample size?

Example

What do we know?

They want to be 95% sure… so α = 0.05.

… that their estimate does not differ from the true value by more than $1 … so B = 1.

Previous research has shown that the standard deviation is $3 for both males and females … so
σ1 = σ2 = 3.

$$n_1 = n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\left(\sigma_1^2 + \sigma_2^2\right) = \left(\frac{1.96}{1}\right)^2\left(3^2 + 3^2\right) = 69.149$$

Hence a sample size of 70 males and 70 females is needed.

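
The equal-sample-size formula for two means can be sketched as below. The function name `sample_size_two_means` is an assumption, and the returned value is the size needed in each group.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(sigma1, sigma2, bound, conf=0.95):
    """Per-group sample size (n1 = n2) for estimating mu1 - mu2 (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / bound) ** 2 * (sigma1 ** 2 + sigma2 ** 2))

print(sample_size_two_means(sigma1=3, sigma2=3, bound=1))  # 70 per group
```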
Sample Size Determination – Estimating p1 – p2

When dealing with two populations we use the same method as with one population. Basically
we rearrange the error part of the confidence intervals to solve for n. Since we have n1 and n2
we add the condition that n1 = n2

$$n_1 = n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\left(\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2\right)$$

If there is no previous study or estimate available, then use 0.5 for each p and q, as these are
the values which will give the largest sample size.

Example

A publisher wants to estimate the difference in the percentage of people who might be
interested in a new magazine between people in NSW and Queensland.

They are willing to accept an error rate of 5%, and they want to be 90% certain that
their finding does not differ from the true rate by more than 5%.

What is the required sample size assuming an equal sample size from each state?

Example

From the question we can determine the following:

B = 0.05
α = 0.10, hence $z_{\alpha/2} = z_{0.05} = 1.645$

Since we have no previous estimate of the proportions, we will let each proportion
equal 0.5.

$$n_1 = n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\left(\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2\right) = \left(\frac{1.645}{0.05}\right)^2\left(0.5 \times 0.5 + 0.5 \times 0.5\right) = 541.205$$

Hence a sample size of 542 is needed from NSW and from Queensland.
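
A sketch of the two-proportion sample size calculation follows. The function name `sample_size_two_proportions` is hypothetical. The critical value is computed rather than read from a table: for 90% confidence the two-sided value is $z_{0.05} \approx 1.645$ (1.282 is $z_{0.10}$, the value for an upper-tail area of 0.10).

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(bound, p1=0.5, p2=0.5, conf=0.95):
    """Per-group sample size (n1 = n2) for estimating p1 - p2 (hypothetical helper).

    p1 and p2 default to 0.5, the most conservative choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / bound) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)))

z_90 = NormalDist().inv_cdf(0.95)
print(round(z_90, 3))                                       # 1.645
print(sample_size_two_proportions(bound=0.05, conf=0.90))  # 542 per state
```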
