
Lecture_8

Uploaded by

Thanh Trúc Vũ

PROBABILITY AND STATISTICS

Lecture 8
Large-Sample Estimation
INTRODUCTION
 Populations are described by their
probability distributions and parameters.
 For quantitative populations, the location
and shape are described by μ and σ.
• For a binomial population, the location
and shape are determined by p.
 If the values of parameters are unknown,
we make inferences about them using
sample information.
TYPES OF INFERENCE
 Estimation:
 Estimating or predicting the value of the
parameter
 “What is (are) the most likely values of
μ or p?”
 Hypothesis Testing:
 Deciding about the value of a parameter
based on some preconceived idea.
 “Did the sample come from a population
with μ = 5 or p = .2?”
TYPES OF INFERENCE
 Examples:
 A consumer wants to estimate the average
price of similar homes in her city before
putting her home on the market.
Estimation: Estimate μ, the average home
price.
• A manufacturer wants to know if a new
type of steel is more resistant to high
temperatures than an old type was.
Hypothesis test: Is the new average resistance,
μN, equal to the old average resistance, μO?
TYPES OF INFERENCE
 Whether you are estimating parameters or testing
hypotheses, statistical methods are important because
they provide:
 Methods for making the inference
 A numerical measure of the goodness or reliability of the
inference
DEFINITIONS
An estimator is a rule, usually a
formula, that tells you how to calculate
the estimate based on the sample.
 Point estimation: A single number is
calculated to estimate the parameter.
 Interval estimation: Two numbers
are calculated to create an interval
within which the parameter is
expected to lie.
PROPERTIES OF
POINT ESTIMATORS
Since an estimator is calculated from
sample values, it varies from sample to
sample according to its sampling
distribution.
An estimator is unbiased if the mean of
its sampling distribution equals the
parameter of interest.
 It does not systematically overestimate
or underestimate the target parameter.
PROPERTIES OF
POINT ESTIMATORS
 Of all the unbiased estimators, we prefer the estimator
whose sampling distribution has the smallest spread or
variability.
MEASURING THE GOODNESS
OF AN ESTIMATOR
 The distance between an estimate and the
true value of the parameter is the error of
estimation.
An analogy: the distance between the bullet
and the bull’s-eye.
• In this chapter, the sample sizes are large,
so that our unbiased estimators will have
approximately normal sampling distributions,
because of the Central Limit Theorem.
THE MARGIN OF ERROR
 For unbiased estimators with normal
sampling distributions, 95% of all point
estimates will lie within 1.96 standard
deviations of the parameter of interest.
• Margin of error: The maximum error of
estimation, calculated as

1.96 × (standard error of the estimator)
ESTIMATING MEANS
AND PROPORTIONS
• For a quantitative population,
Point estimator of population mean μ: $\bar{x}$
Margin of error (n ≥ 30): $\pm 1.96 \dfrac{s}{\sqrt{n}}$
• For a binomial population,
Point estimator of population proportion p: $\hat{p} = x/n$
Margin of error (n ≥ 30): $\pm 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
EXAMPLE
• A homeowner randomly samples 64 homes
similar to her own and finds that the average
selling price is $252,000 with a standard
deviation of $15,000. Estimate the average
selling price for all similar homes in the city.

Point estimator of μ: $\bar{x} = 252{,}000$

Margin of error: $\pm 1.96 \dfrac{s}{\sqrt{n}} = \pm 1.96 \dfrac{15{,}000}{\sqrt{64}} = \pm 3675$
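The arithmetic in this example can be checked with a minimal Python sketch. The function name `margin_of_error_mean` is an illustrative choice, not part of the lecture, and the n ≥ 30 guard simply mirrors the large-sample condition stated earlier.

```python
import math

def margin_of_error_mean(s, n, z=1.96):
    """Large-sample margin of error for a sample mean: z * s / sqrt(n)."""
    if n < 30:
        raise ValueError("the large-sample formula assumes n >= 30")
    return z * s / math.sqrt(n)

# Home-price example: n = 64 homes, sample standard deviation s = $15,000
moe = margin_of_error_mean(s=15_000, n=64)
print(f"252000 +/- {moe:.0f}")
```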
EXAMPLE
• A quality control technician wants to
estimate the proportion of soda cans that are
underfilled. He randomly samples 200 cans of soda and
finds 10 underfilled cans.
n = 200, p = proportion of underfilled cans
Point estimator of p: $\hat{p} = x/n = 10/200 = .05$
Margin of error: $\pm 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \pm 1.96 \sqrt{\dfrac{(.05)(.95)}{200}} = \pm .03$
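The same check for a proportion, as a short Python sketch. The helper name `margin_of_error_proportion` is hypothetical; it only encodes the formula 1.96 × sqrt(p̂q̂/n) used in this example.

```python
import math

def margin_of_error_proportion(x, n, z=1.96):
    """Return (p_hat, margin), where margin = z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    return p_hat, z * math.sqrt(p_hat * q_hat / n)

# Soda-can example: 10 underfilled cans out of 200 sampled
p_hat, moe = margin_of_error_proportion(x=10, n=200)
print(f"p_hat = {p_hat:.2f}, margin of error = {moe:.3f}")
```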
INTERVAL ESTIMATION
• Create an interval (a, b) so that you are fairly
sure that the parameter lies between these two
values.
• “Fairly sure” means “with high probability”,
measured using the confidence coefficient,
1 - α. Usually, 1 - α = .90, .95, .98, or .99.
• Suppose 1 - α = .95 and
that the estimator has
a normal distribution. Then 95% of the estimates
fall within Parameter ± 1.96 SE.
INTERVAL ESTIMATION
• Since we don’t know the value of the
parameter, consider Estimator ± 1.96 SE, which
has a variable center.
[Figure: intervals from four repeated samples; three enclose the parameter (“Worked”) and one does not (“Failed”).]
• Only if the estimator falls in the tail areas will
the interval fail to enclose the parameter.
This happens only 5% of the time.
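The "5% of the time" claim can be illustrated by simulation. The sketch below is only a demonstration under assumed conditions: a normal population with hypothetical values μ = 50 and σ = 10, samples of size n = 100, and a fixed random seed. None of these numbers come from the lecture.

```python
import math
import random

def coverage_rate(mu=50.0, sigma=10.0, n=100, trials=2000, z=1.96, seed=0):
    """Fraction of intervals x_bar +/- z * s / sqrt(n) that enclose mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation (divisor n - 1)
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        half_width = z * s / math.sqrt(n)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"intervals enclosing mu: {rate:.1%}")
```

With these settings the printed rate should land close to 95%, though the exact figure depends on the seed and the number of trials.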
TO CHANGE THE
CONFIDENCE LEVEL
• To change to a general confidence level, 1 - α,
pick a value of z that puts area 1 - α in the
center of the z distribution.

Tail area α/2    z_{α/2}
.05              1.645
.025             1.96
.01              2.33
.005             2.58

100(1 - α)% Confidence Interval: Estimator ± z_{α/2} SE
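The tabled critical values can be reproduced from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.0%}: z = {z_critical(conf):.3f}")
```

The table entries 2.33 and 2.58 are these values rounded to two decimal places.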


CONFIDENCE INTERVALS
FOR MEANS AND PROPORTIONS

• For a quantitative population,

Confidence interval for a population mean μ:
$\bar{x} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$

• For a binomial population,

Confidence interval for a population proportion p:
$\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
EXAMPLE
• A random sample of n = 50 males showed a
mean daily intake of dairy products
equal to 756 grams with a standard deviation of
35 grams. Find a 95% confidence interval for
the population average μ.

$\bar{x} \pm 1.96 \dfrac{s}{\sqrt{n}} \Rightarrow 756 \pm 1.96 \dfrac{35}{\sqrt{50}} \Rightarrow 756 \pm 9.70$

or 746.30 < μ < 765.70 grams.


EXAMPLE
• Find a 99% confidence interval for μ, the
population average daily intake of dairy
products for men.
$\bar{x} \pm 2.58 \dfrac{s}{\sqrt{n}} \Rightarrow 756 \pm 2.58 \dfrac{35}{\sqrt{50}} \Rightarrow 756 \pm 12.77$
or 743.23 < μ < 768.77 grams.
The interval must be wider to provide for
the increased confidence that it does
indeed enclose the true value of μ.
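Both dairy-intake intervals can be recomputed with a few lines of Python. This is a sketch only; `mean_confidence_interval` is a hypothetical helper, and the z values 1.96 and 2.58 are the rounded table values used in the examples.

```python
import math

def mean_confidence_interval(x_bar, s, n, z):
    """Large-sample interval x_bar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Dairy-intake example: n = 50, x_bar = 756 g, s = 35 g
for label, z in (("95%", 1.96), ("99%", 2.58)):
    low, high = mean_confidence_interval(x_bar=756, s=35, n=50, z=z)
    print(f"{label}: {low:.2f} < mu < {high:.2f} (width {high - low:.2f})")
```

Printing the width alongside each interval makes the trade-off visible: higher confidence costs a wider interval.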
EXAMPLE
• Of a random sample of n = 150 college students,
104 of the students said that they had played on
a soccer team during their K-12 years. Estimate
the proportion of college students who played
soccer in their youth with a 98% confidence
interval.
$\hat{p} \pm 2.33 \sqrt{\dfrac{\hat{p}\hat{q}}{n}} \Rightarrow \dfrac{104}{150} \pm 2.33 \sqrt{\dfrac{.69(.31)}{150}}$
$\Rightarrow .69 \pm .09$ or .60 < p < .78.
ESTIMATING THE DIFFERENCE
BETWEEN TWO MEANS
•Sometimes we are interested in comparing
the means of two populations.
•The average growth of plants fed using two
different nutrients.
•The average scores for students taught with
two different teaching methods.
• To make this comparison, we use:
A random sample of size n1 drawn from
population 1 with mean μ1 and variance σ1².
A random sample of size n2 drawn from
population 2 with mean μ2 and variance σ2².
ESTIMATING THE DIFFERENCE
BETWEEN TWO MEANS
• We compare the two averages by making
inferences about μ1 - μ2, the difference in the
two population averages.
• If the two population averages are the
same, then μ1 - μ2 = 0.
• The best estimate of μ1 - μ2 is the
difference in the two sample means,

$\bar{x}_1 - \bar{x}_2$
THE SAMPLING
DISTRIBUTION OF $\bar{x}_1 - \bar{x}_2$
1. The mean of $\bar{x}_1 - \bar{x}_2$ is μ1 - μ2, the difference in
the population means.
2. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $SE = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
3. If the sample sizes are large, the sampling distribution
of $\bar{x}_1 - \bar{x}_2$ is approximately normal, and SE can be estimated
as $SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.
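The standard error in item 2 follows from the rule for the variance of a difference. The short derivation below assumes the two samples are drawn independently, which the slides do not state explicitly but which the formula requires.

```latex
\[
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  = \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\quad\Longrightarrow\quad
SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
```

The variances add even though the means are subtracted, because the variability of each sample mean contributes to the uncertainty in the difference.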
ESTIMATING μ1 - μ2
• For large samples, point estimates and
their margin of error as well as confidence
intervals are based on the standard normal
(z) distribution.
Point estimate for μ1 - μ2: $\bar{x}_1 - \bar{x}_2$
Margin of error: $\pm 1.96 \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Confidence interval for μ1 - μ2:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
EXAMPLE
Avg Daily Intakes    Men    Women
Sample size          50     50
Sample mean          756    762
Sample Std Dev       35     30
• Compare the average daily intake of dairy products of
men and women using a 95% confidence interval.
$(\bar{x}_1 - \bar{x}_2) \pm 1.96 \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
$\Rightarrow (756 - 762) \pm 1.96 \sqrt{\dfrac{35^2}{50} + \dfrac{30^2}{50}} \Rightarrow -6 \pm 12.78$
or -18.78 < μ1 - μ2 < 6.78.
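The two-sample interval can be checked with a brief Python sketch. `diff_means_ci` is an illustrative name, and the final line tests whether 0 lies inside the interval, which is the question a comparison of two means usually turns on.

```python
import math

def diff_means_ci(x1, s1, n1, x2, s2, n2, z=1.96):
    """Large-sample interval for mu1 - mu2 from two independent samples."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Men: mean 756, sd 35, n 50; women: mean 762, sd 30, n 50
low, high = diff_means_ci(756, 35, 50, 762, 30, 50)
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")
print("interval contains 0:", low <= 0 <= high)
```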
EXAMPLE, CONTINUED
-18.78 < μ1 - μ2 < 6.78
• Could you conclude, based on this confidence
interval, that there is a difference in the average
daily intake of dairy products for men and
women?
• The confidence interval contains the value
μ1 - μ2 = 0. Therefore, it is possible that μ1 = μ2. You
would not want to conclude that there is a
difference in average daily intake of dairy
products for men and women.
ESTIMATING THE DIFFERENCE
BETWEEN TWO PROPORTIONS
•Sometimes we are interested in comparing
the proportion of “successes” in two
binomial populations.
•The germination rates of untreated seeds and
seeds treated with a fungicide.
•The proportion of male and female voters who
favor a particular candidate for governor.
• To make this comparison, we use:
A random sample of size n1 drawn from
binomial population 1 with parameter p1.
A random sample of size n2 drawn from
binomial population 2 with parameter p2.
ESTIMATING THE DIFFERENCE
BETWEEN TWO PROPORTIONS
• We compare the two proportions by
making inferences about p1 - p2, the
difference in the two population proportions.
• If the two population proportions are the
same, then p1 - p2 = 0.
• The best estimate of p1 - p2 is the
difference in the two sample proportions,
$\hat{p}_1 - \hat{p}_2 = \dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}$
THE SAMPLING
DISTRIBUTION OF $\hat{p}_1 - \hat{p}_2$

1. The mean of $\hat{p}_1 - \hat{p}_2$ is p1 - p2, the difference in
the population proportions.
2. The standard deviation of $\hat{p}_1 - \hat{p}_2$ is $SE = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.
3. If the sample sizes are large, the sampling distribution
of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and SE can be estimated
as $SE = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$.
ESTIMATING P1-P2
•For large samples, point estimates and
their margin of error as well as confidence
intervals are based on the standard normal
(z) distribution.
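Point estimate for p1 - p2: $\hat{p}_1 - \hat{p}_2$
Margin of error: $\pm 1.96 \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$
Confidence interval for p1 - p2:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$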
EXAMPLE
Youth Soccer     Male    Female
Sample size      80      70
Played soccer    65      39
• Compare the proportion of male and female college
students who said that they had played on a soccer
team during their K-12 years using a 99% confidence
interval.
$(\hat{p}_1 - \hat{p}_2) \pm 2.58 \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$

$\Rightarrow \left(\dfrac{65}{80} - \dfrac{39}{70}\right) \pm 2.58 \sqrt{\dfrac{.81(.19)}{80} + \dfrac{.56(.44)}{70}} \Rightarrow .25 \pm .19$
or .06 < p1 - p2 < .44.
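A Python sketch of this interval follows, using the hypothetical helper `diff_proportions_ci`. Because it keeps full precision rather than rounding the sample proportions to two decimals first, its endpoints may differ from the slide's by about .01; the conclusion about whether 0 lies in the interval is unaffected.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z):
    """Large-sample interval for p1 - p2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Males: 65 of 80 played soccer; females: 39 of 70; 99% confidence (z = 2.58)
low, high = diff_proportions_ci(65, 80, 39, 70, z=2.58)
print(f"{low:.3f} < p1 - p2 < {high:.3f}")
print("interval contains 0:", low <= 0 <= high)
```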
EXAMPLE, CONTINUED
.06 < p1 - p2 < .44
• Could you conclude, based on this confidence
interval, that there is a difference in the proportion of
male and female college students who said that they
had played on a soccer team during their K-12
years?
• The confidence interval does not contain the value
p1 - p2 = 0. Therefore, it is not likely that p1 = p2. You
would conclude that there is a difference in the
proportions for males and females.
Because the whole interval lies above 0, a higher proportion
of males than females played soccer in their youth.
ONE SIDED
CONFIDENCE BOUNDS
 Confidence intervals are by their nature
two-sided since they produce upper and
lower bounds for the parameter.
 One-sided bounds can be constructed
simply by using a value of z that puts α
rather than α/2 in the tail of the z
distribution.

LCB: Estimator - z_α × (Std Error of Estimator)

UCB: Estimator + z_α × (Std Error of Estimator)
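As an illustration of the LCB and UCB formulas, the sketch below reuses the dairy-intake figures from the earlier example (mean 756, s = 35, n = 50) at 95% confidence. That pairing is only an assumed illustration, since the lecture does not work a one-sided example, and `one_sided_bounds` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def one_sided_bounds(estimate, std_error, confidence=0.95):
    """Return (LCB, UCB); each is a separate one-sided bound at the given level."""
    # z_alpha puts the whole tail area alpha = 1 - confidence on one side
    z_alpha = NormalDist().inv_cdf(confidence)
    return estimate - z_alpha * std_error, estimate + z_alpha * std_error

se = 35 / math.sqrt(50)
lcb, ucb = one_sided_bounds(756, se, confidence=0.95)
print(f"95% lower bound: mu > {lcb:.2f}")
print(f"95% upper bound: mu < {ucb:.2f}")
```

The two bounds are alternatives, each valid on its own at the 95% level. Using them together as an interval would give only 90% confidence.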
CHOOSING THE SAMPLE
SIZE
 The total amount of relevant information in a
sample is controlled by two factors:
- The sampling plan or experimental
design: the procedure for collecting the
information
- The sample size n: the amount of
information you collect.
 In a statistical estimation problem, the
accuracy of the estimation is measured by the
margin of error or the width of the
confidence interval.
CHOOSING THE
SAMPLE SIZE
1. Determine the size of the margin of error, B, that you
are willing to tolerate.
2. Choose the sample size by solving for n (or n = n1 = n2) in
the inequality 1.96 SE ≤ B, where SE is a function of
the sample size n.
3. For quantitative populations, estimate the population
standard deviation using a previously calculated value
of s or the range approximation s ≈ Range / 4.
4. For binomial populations, use the conservative
approach and approximate p using the value p = .5.
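For the one-sample cases, step 2 can be solved for n in closed form. The lines below are a worked rearrangement of the inequality 1.96 SE ≤ B, with σ standing for whatever estimate of the standard deviation step 3 supplies.

```latex
\[
1.96\,\frac{\sigma}{\sqrt{n}} \le B
\;\Longrightarrow\;
n \ge \left(\frac{1.96\,\sigma}{B}\right)^{2},
\qquad
1.96\,\sqrt{\frac{pq}{n}} \le B
\;\Longrightarrow\;
n \ge \frac{1.96^{2}\,pq}{B^{2}}
\]
```

The choice p = .5 in step 4 is conservative because pq = p(1 - p) reaches its maximum value, 1/4, at p = .5, so it yields the largest required n.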
EXAMPLE
A producer of PVC pipe wants to survey
wholesalers who buy his product in order to
estimate the proportion who plan to increase
their purchases next year. What sample size is
required if he wants his estimate to be within .04
of the actual proportion with probability equal to
.95?
$1.96 \sqrt{\dfrac{pq}{n}} \le .04 \Rightarrow 1.96 \sqrt{\dfrac{.5(.5)}{n}} \le .04$
$\Rightarrow \sqrt{n} \ge \dfrac{1.96 \sqrt{.5(.5)}}{.04} = 24.5 \Rightarrow n \ge 24.5^2 = 600.25$
He should survey at least 601
wholesalers.
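The sample-size calculation reduces to one line of arithmetic plus rounding up. A minimal Python sketch follows, with the hypothetical helper `sample_size_for_proportion` and the conservative default p = .5.

```python
import math

def sample_size_for_proportion(bound, p=0.5, z=1.96):
    """Smallest n satisfying z * sqrt(p * (1 - p) / n) <= bound."""
    return math.ceil(z**2 * p * (1 - p) / bound**2)

# PVC example: estimate within .04 of the true proportion with probability .95
n = sample_size_for_proportion(bound=0.04)
print(n)
```

Rounding up rather than to the nearest integer matters here: 600 wholesalers would leave the margin of error slightly above .04.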
KEY CONCEPTS
I. Types of Estimators
1. Point estimator: a single number is calculated to
estimate the population parameter.
2. Interval estimator: two numbers are calculated to
form an interval that contains the parameter.
II. Properties of Good Estimators
1. Unbiased: the average value of the estimator equals
the parameter to be estimated.
2. Minimum variance: of all the unbiased estimators, the
best estimator has a sampling distribution with the
smallest standard error.
3. The margin of error measures the maximum distance
between the estimator and the true value of the
parameter.
KEY CONCEPTS
III. Large-Sample Point Estimators
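To estimate one of four population parameters when the
sample sizes are large, use the following point estimators
with the appropriate margins of error.

Parameter    Point estimator            95% margin of error
μ            $\bar{x}$                  $\pm 1.96\, s/\sqrt{n}$
p            $\hat{p} = x/n$            $\pm 1.96 \sqrt{\hat{p}\hat{q}/n}$
μ1 - μ2      $\bar{x}_1 - \bar{x}_2$    $\pm 1.96 \sqrt{s_1^2/n_1 + s_2^2/n_2}$
p1 - p2      $\hat{p}_1 - \hat{p}_2$    $\pm 1.96 \sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$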
KEY CONCEPTS
IV. Large-Sample Interval Estimators
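To estimate one of four population parameters when the
sample sizes are large, use the following interval
estimators.

Parameter    100(1 - α)% confidence interval
μ            $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$
p            $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n}$
μ1 - μ2      $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
p1 - p2      $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$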
KEY CONCEPTS
1. All values in the interval are possible values for the
unknown population parameter.
2. Any values outside the interval are unlikely to be the
value of the unknown parameter.
3. To compare two population means or proportions,
look for the value 0 in the confidence interval. If 0 is in
the interval, it is possible that the two population
means or proportions are equal, and you should not
declare a difference. If 0 is not in the interval, it is
unlikely that the two means or proportions are equal,
and you can confidently declare a difference.
V. One-Sided Confidence Bounds
Use either the upper (+) or lower (-) two-sided bound,
with the critical value of z changed from z_{α/2} to z_α.
