Chapter 8: Interval Estimates and Hypothesis Testing
Chapter 8 Outline
• Clint’s Assignment: Taking Stock
• Estimate Reliability: Interval Estimate Question
o Normal Distribution versus the Student t-Distribution: One Last Complication
o Assessing the Reliability of a Coefficient Estimate: Applying the Student t-Distribution
• Theory Assessment: Hypothesis Testing
o Motivating Hypothesis Testing: The Cynic
o Formalizing Hypothesis Testing: The Steps
• Summary: The Ordinary Least Squares (OLS) Estimation Procedure
o Regression Model and the Role of the Error Term
o Standard Ordinary Least Squares (OLS) Premises
o Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures
o Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard Ordinary Least Squares (OLS) Premises
    Each estimation procedure is unbiased.
    The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
• Causation versus Correlation
Actual βx | Actual Var[e] | From Value | To Value | Repetitions Between From and To Values
2 | 50 | 1.5 | 2.5 | ≈_____%
2 | 50 | 1.0 | 3.0 | ≈_____%
2 | 50 | .5 | 3.5 | ≈_____%
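The table above records a simulation with an actual coefficient of 2 and an error-term variance of 50. Below is a minimal Python sketch of that kind of simulation. It rests on several assumptions: Clint's three x values (5, 15, 25), a hypothetical constant of 50, normally distributed errors, and 100,000 repetitions. Only βx = 2 and Var[e] = 50 come from the table. The sketch is not the Econometrics Lab's own code.

```python
import math
import random

def simulate_bx(beta_const, beta_x, var_e, xs, reps, seed=0):
    """Return `reps` OLS coefficient estimates, each from one simulated quiz."""
    rng = random.Random(seed)
    sd_e = math.sqrt(var_e)
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of (x_t - x_bar)^2
    estimates = []
    for _ in range(reps):
        # y_t = beta_const + beta_x * x_t + e_t, with normal errors (assumption)
        ys = [beta_const + beta_x * x + rng.gauss(0.0, sd_e) for x in xs]
        y_bar = sum(ys) / len(ys)
        sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
        estimates.append(sxy / sxx)
    return estimates

def share_between(estimates, lo, hi):
    """Fraction of estimates lying between lo and hi (inclusive)."""
    return sum(lo <= b <= hi for b in estimates) / len(estimates)

XS = [5, 15, 25]    # assumed x values, as in Clint's first quiz
BETA_CONST = 50     # hypothetical constant; the table does not give one
estimates = simulate_bx(BETA_CONST, 2.0, 50.0, XS, reps=100_000)

for lo, hi in [(1.5, 2.5), (1.0, 3.0), (0.5, 3.5)]:
    print(f"between {lo} and {hi}: {share_between(estimates, lo, hi):.1%}")
```

With these x values, Var[bx] = 50/200 = .25. The three rows therefore extend one, two, and three standard deviations (.5 each) on either side of 2.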
2. In the simulation you just ran (Question 1):
c. Is it possible that the cynic is correct? To help you answer this question,
run the following simulation:
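Student | x (Minutes Studied) | y (Quiz Score)
1 | 5 | 66
2 | 15 | 87
3 | 25 | 90

$$b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} = \frac{240}{200} = \frac{6}{5} = 1.2$$

$$b_{Const} = \bar{y} - b_x \bar{x} = 81 - \frac{6}{5} \times 15 = 81 - 18 = 63$$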
Clint’s estimates suggest that Professor Lord gives each student 63 points for
showing up; in addition, each student earns 1.2 additional points for each
additional minute studied.
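The arithmetic above can be checked with a short Python sketch of the same OLS formulas. It uses only the three observations from the first quiz. The function name `ols` is ours, chosen for illustration.

```python
def ols(xs, ys):
    """Return (b_const, b_x) from the OLS formulas for a one-regressor model."""
    t = len(xs)
    x_bar = sum(xs) / t
    y_bar = sum(ys) / t
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))  # 240 here
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # 200 here
    b_x = sxy / sxx
    b_const = y_bar - b_x * x_bar
    return b_const, b_x

minutes = [5, 15, 25]   # x: minutes studied
scores = [66, 87, 90]   # y: quiz scores
b_const, b_x = ols(minutes, scores)
print(round(b_const, 6), round(b_x, 6))
```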
Clint realizes that he cannot expect the coefficient estimate to equal the
actual value; in fact, he is all but certain that it will not. So now, Clint must
address two related issues:
• Estimate Reliability: How reliable is the coefficient estimate, 1.2,
calculated from the first quiz? That is, how confident should Clint be that
the coefficient estimate, 1.2, will be close to the actual value?
• Theory Assessment: How confident should Clint be that the theory is
correct, that studying improves quiz scores?
We shall address both of these issues in this chapter. First, we consider estimate
reliability.
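$$b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} = \frac{240}{200} = \frac{6}{5} = 1.2 \qquad b_{Const} = \bar{y} - b_x \bar{x} = 81 - \frac{6}{5} \times 15 = 63$$
↓
$$\text{Mean}[b_x] = \beta_x \qquad \text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$
↓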
Mean and variance describe the center and spread of the estimate’s probability distribution
Both the mean and variance of the coefficient estimate’s probability distribution
play a crucial role:
• Since the mean of the coefficient estimate’s probability distribution,
Mean[bx], equals the actual value of the coefficient, βx, the estimation
procedure is unbiased; the estimation procedure does not systematically
underestimate or overestimate the actual coefficient value.
• When the estimation procedure for the coefficient value is unbiased, the
variance of the estimate’s probability distribution, Var[bx], determines the
reliability of the estimate: as the variance decreases, the probability
distribution becomes more tightly concentrated around the actual value;
consequently, it becomes more likely for the coefficient estimate to be
close to the actual coefficient value.
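The second bullet can be made concrete with a minimal sketch. It assumes the error term is normally distributed with a known variance, so that bx itself is normal. It also uses Clint's x values and an actual coefficient of 2. The three Var[e] values are illustrative and do not come from the text.

```python
import math
from statistics import NormalDist

def var_bx(var_e, xs):
    """Var[b_x] = Var[e] / sum of (x_t - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    return var_e / sum((x - x_bar) ** 2 for x in xs)

def prob_within(distance, beta_x, variance):
    """P(|b_x - beta_x| <= distance) when b_x ~ Normal(beta_x, variance)."""
    dist = NormalDist(mu=beta_x, sigma=math.sqrt(variance))
    return dist.cdf(beta_x + distance) - dist.cdf(beta_x - distance)

XS = [5, 15, 25]
for var_e in (50.0, 200.0, 500.0):   # illustrative error-term variances
    v = var_bx(var_e, XS)
    print(f"Var[e]={var_e:>5}: Var[bx]={v:.3f}, "
          f"P(within 1.5 of actual)={prob_within(1.5, 2.0, v):.3f}")
```

As Var[bx] shrinks, the probability of landing within 1.5 of the actual value rises. This is the sense in which a smaller variance means a more reliable estimate.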
To assess his estimate’s reliability, Clint must consider the variance of the
coefficient estimate’s probability distribution. But we learned that Clint can never
determine the actual variance of the error term’s probability distribution, Var[e].
Instead, Clint adopts a two-step strategy for estimating the variance of the
coefficient estimate’s probability distribution:
Step 1: Estimate the variance of the error term’s probability distribution from the available information, the data from the first quiz:
$$\text{EstVar}[e] = \text{AdjVar}[\text{Res's}] = \frac{SSR}{\text{Degrees of Freedom}} = \frac{54}{1} = 54$$
Step 2: Apply the relationship between the variances of the coefficient estimate’s and the error term’s probability distributions:
$$\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}$$
↓
$$\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2} = \frac{54}{200} = .27$$
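The two steps can be traced numerically with a minimal Python sketch. It starts from the first quiz's data, computes the residuals and SSR, and then produces the estimated variance and standard error of bx. The variable names are ours.

```python
import math

minutes = [5, 15, 25]   # x
scores = [66, 87, 90]   # y
T = len(minutes)

x_bar = sum(minutes) / T
y_bar = sum(scores) / T
sxx = sum((x - x_bar) ** 2 for x in minutes)
b_x = sum((y - y_bar) * (x - x_bar) for x, y in zip(minutes, scores)) / sxx
b_const = y_bar - b_x * x_bar

# Step 1: estimate Var[e] by the adjusted variance of the residuals
residuals = [y - (b_const + b_x * x) for x, y in zip(minutes, scores)]
ssr = sum(r ** 2 for r in residuals)   # sum of squared residuals
dof = T - 2                            # observations minus estimated parameters
est_var_e = ssr / dof

# Step 2: plug the estimate into Var[b_x] = Var[e] / sum of (x_t - x_bar)^2
est_var_bx = est_var_e / sxx
se_bx = math.sqrt(est_var_bx)          # standard error of b_x
print(round(ssr, 6), dof, round(est_var_bx, 6), round(se_bx, 4))
```

The residuals here are −3, 6, and −3, which is where SSR = 54 comes from.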
[Figure: Normal and Student t probability distributions of bx, each marking the region within 1.5 of the actual value]
As discussed above, we must use the Student t-distribution rather than the
normal distribution since we must estimate the standard deviation of the
probability distribution. The regression results from Professor Lord’s first quiz
provide the estimate, the standard error of the coefficient estimate: SE[bx] = .5196.
The standard error equals the estimated standard deviation. t equals the number of
standard errors (estimated standard deviations) that the value lies from the
distribution mean:
$$t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Standard Error}} = \text{Number of Standard Errors from the Distribution Mean}$$
Since the distribution mean equals the actual value, we can “translate” 1.5 below
and above the actual value into t’s. Since the standard error equals .5196, 1.5
below and above the actual value translates into 2.89 standard errors below and
above the actual value:
1.5 below actual value: $\frac{1.5}{.5196} \approx 2.89$ SE’s below actual value
1.5 above actual value: $\frac{1.5}{.5196} \approx 2.89$ SE’s above actual value
To summarize,
The probability that the estimate lies within 1.5 of the actual value
= The probability that the estimate lies within 2.89 SE’s of the actual value
= The probability that t lies between −2.89 and 2.89
Figure 8.3 adds this information to the probability distribution graph.
[Figure 8.3: Probability Distribution of Coefficient Estimate – “Close To” Value Equals 1.5. The interval from βx − 1.5 to βx + 1.5 around the actual value, βx, spans 2.89 SE’s on each side, from t = −2.89 to t = 2.89.]
[Figure 8.4: Probability Distribution of Coefficient Estimate – Applying Student t-Distribution. Each tail, beyond t = −2.89 and t = 2.89 (that is, beyond βx − 1.5 and βx + 1.5), has a probability of about .11.]
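Since each tail of the distribution has a probability of about .11, the probability
that the estimate lies within 2.89 SE’s of the actual value is about
1 − .11 − .11 = .78. We can now fill in the second blank in the interval estimate question:
Interval Estimate Question: What is the probability that the coefficient
estimate, 1.2, lies within 1.5 of the actual coefficient value? .78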
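The tail and center probabilities can be checked directly. With 1 degree of freedom, the Student t-distribution is the standard Cauchy distribution, whose CDF has the closed form 1/2 + arctan(t)/π. The sketch below relies on that fact, so it applies only when DF = 1. The helper name `t1_cdf` is ours.

```python
import math

def t1_cdf(t):
    """CDF of the Student t-distribution with 1 degree of freedom (closed form)."""
    return 0.5 + math.atan(t) / math.pi

se = math.sqrt(54 / 200)   # standard error of b_x, about .5196
close_to = 1.5             # the "close to" distance in the question
t = close_to / se          # about 2.89 standard errors

tail = 1 - t1_cdf(t)       # one tail, about .11
center = 1 - 2 * tail      # probability of lying within 1.5 of the actual value
print(round(t, 2), round(tail, 3), round(center, 3))
```

Unrounded, the center probability comes out at about .788. The .78 in the text reflects rounding each tail to .11 before subtracting.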
We now turn our attention to assessing the theory.
The estimate for βx, 1.2, is positive. We estimate that an additional minute
of studying increases a student’s quiz score by 1.2 points. This lends support to
Clint’s theory. But how much confidence should Clint have in the theory? Does
this provide definitive evidence that Clint’s theory is correct, or should we be
skeptical?
[Figure 8.5: Probability Distribution of Coefficient Estimate – Could the Cynic Be Correct? The distribution of bx if βx = 0, centered at 0.]
To answer this question, recall our earlier hypothesis testing discussion and play
the cynic. What would a cynic’s view of our theory and the regression results be?
Cynic’s view: Studying has no impact on a student’s quiz score; the positive
coefficient estimate obtained from the first quiz was just “the luck of the
draw.” In fact, studying has no effect on quiz scores; the actual coefficient, βx,
equals 0.
Is it possible that our cynic is correct?
The magnitude of Prob[Results IF Cynic Correct], the probability of obtaining
results like ours if the cynic were correct, determines the likelihood that the
cynic is correct, that is, the likelihood that studying has no impact on quiz scores:
• Prob[Results IF Cynic Correct] small → Unlikely that the cynic is correct → Unlikely that studying has no impact
• Prob[Results IF Cynic Correct] large → Likely that the cynic is correct → Likely that studying has no impact
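To compute this probability, let us review what we know about the
probability distribution of the coefficient estimate:
$$\underbrace{\text{Mean}[b_x] = \beta_x}_{\text{OLS estimation procedure unbiased}} \underbrace{= 0}_{\text{if } H_0 \text{ true}} \qquad \underbrace{\text{SE}[b_x] = .5196}_{\text{standard error}} \qquad \text{DF} = \underbrace{3}_{\text{number of observations}} - \underbrace{2}_{\text{number of parameters}} = 1$$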
Question for the Cynic: What is the probability that the coefficient estimate
from the first quiz would be 1.2 or more, if studying had no impact on quiz
scores (if the actual coefficient, βx, equaled 0)?
[Figure 8.6: Probability Distribution of Coefficient Estimate – Prob[Results IF Cynic Correct]. Student t-distribution with Mean = 0, SE = .5196, and DF = 1; the right tail beyond bx = 1.2 has a probability of about .13.]
[Figure 8.7: Probability Distribution of Coefficient Estimate – Tails Probability. Each tail, below −1.2 and above 1.2, lies 1.2 from 0 and has a probability of .2601/2.]
The Prob column of the regression results is based on the premise that the actual
coefficient equals 0. It focuses on the two tails of the probability distribution,
each of which begins 1.2 (the numerical value of the coefficient estimate) from 0.
As Figure 8.7 illustrates, the value in the Prob column equals the probability of
lying in the tails: the probability that the estimate resulting from one week’s quiz
lies at least 1.2 from 0, assuming that the actual coefficient, βx, equals 0. That is,
the Prob column reports the tails probability:
Tails Probability: The probability that the coefficient estimate, bx, resulting
from one regression would lie at least 1.2 from 0 based on the premise that the
actual coefficient, βx, equals 0.
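The tails probability that the software reports can be reproduced with a minimal sketch. It uses the same closed-form CDF for 1 degree of freedom as before. That formula, and the helper name `t1_cdf`, are our assumptions and do not describe how the regression software computes it.

```python
import math

def t1_cdf(t):
    """CDF of the Student t-distribution with 1 degree of freedom."""
    return 0.5 + math.atan(t) / math.pi

b_x = 1.2                 # coefficient estimate from the first quiz
se = math.sqrt(0.27)      # standard error, about .5196
t = (b_x - 0) / se        # standard errors between the estimate and 0 (the cynic's value)

# probability of lying at least |t| standard errors from 0, in either tail
tails_probability = 2 * (1 - t1_cdf(abs(t)))
print(round(tails_probability, 4))
```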
Consequently, we do not need to use the Econometrics Lab to answer the
question that we pose for the cynic:
Question for the Cynic: What is the probability that the coefficient estimate
from the first quiz is 1.2 or more, if studying had no impact on quiz scores (if
the actual coefficient, βx, equals 0)?
Answer: Prob[Results IF Cynic Correct]
[Figure 8.8: Probability Distribution of Coefficient Estimate – Prob[Results IF Cynic Correct]. Student t-distribution with Mean = 0, SE = .5196, and DF = 1; the right tail beyond bx = 1.2 has a probability of .2601/2.]
We can use the regression results to answer this question. From the Prob
column we know that the tails probability equals .2601. We are only interested in
the right tail, however: the probability that the coefficient estimate will equal 1.2
or more if the actual coefficient equals 0. Since the Student t-distribution is
symmetric, the probability of lying in one of the tails is .2601/2. The answer to the
question we posed to assess the cynic’s view is .13:
$$\text{Prob[Results IF Cynic Correct]} = \frac{\text{Tails Probability}}{2} = \frac{.2601}{2} \approx .13$$
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, studying has no impact on quiz scores. The
results were just “the luck of the draw.”
Now, we construct the null and alternative hypotheses. Like the cynic, the null
hypothesis challenges the evidence; the alternative hypothesis is consistent
with the evidence:
H0: βx = 0 Cynic is correct: Studying has no impact on a student’s quiz score.
H1: βx > 0 Cynic is incorrect: Additional studying increases quiz scores.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
Question for the Cynic:
• Generic Question: What is the probability that the results would be
like those we actually obtained (or even stronger), if the cynic is
correct and studying actually has no impact?
• Specific Question: The regression’s coefficient estimate was 1.2:
What is the probability that the coefficient estimate in one regression
would be 1.2 or more if H0 were actually true (if the actual coefficient,
βx, equals 0)?
Answer: Prob[Results IF Cynic Correct] or Prob[Results IF H0 True]
The magnitude of this probability determines whether we reject the null
hypothesis:
• Prob[Results IF H0 True] small → Unlikely that H0 is true → Reject H0
• Prob[Results IF H0 True] large → Likely that H0 is true → Do not reject H0
Step 4: Use the general properties of the estimation procedure, the probability
distribution of the estimate, to calculate Prob[Results IF H0 True].
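$$\underbrace{\text{Mean}[b_x] = \beta_x}_{\text{OLS estimation procedure unbiased}} \underbrace{= 0}_{\text{if } H_0 \text{ true}} \qquad \underbrace{\text{SE}[b_x] = .5196}_{\text{standard error}} \qquad \text{DF} = \underbrace{3}_{\text{number of observations}} - \underbrace{2}_{\text{number of parameters}} = 1$$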
We have already calculated this probability. First, we did so using the
Econometrics Lab. Then, we noted that the statistical software had done so
automatically. We need only divide the tails probability, as reported in the
Prob column of the regression results, by 2:
$$\text{Prob[Results IF } H_0 \text{ True]} = \frac{.2601}{2} \approx .13$$
The probability that the coefficient estimate in one regression would be 1.2 or
more if H0 were actually true (if the actual coefficient, βx, equals 0) is .13.
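Steps 2 through 4 can be sketched end to end in Python, going from the raw quiz data to Prob[Results IF H0 True] for H0: βx = 0 against H1: βx > 0. This is a minimal sketch that assumes exactly 1 degree of freedom, because it uses the closed-form t CDF for that case. The function name `prob_results_if_h0_true` is hypothetical. By symmetry, the one-tail value 1 − F(t) equals half the tails probability.

```python
import math

def t1_cdf(t):
    """CDF of the Student t-distribution with 1 degree of freedom."""
    return 0.5 + math.atan(t) / math.pi

def prob_results_if_h0_true(xs, ys, beta_x_h0=0.0):
    """One-tailed Prob[Results IF H0 True] for H0: beta_x = beta_x_h0 vs H1: beta_x > beta_x_h0.

    Valid only with 1 degree of freedom (three observations, two parameters).
    """
    T = len(xs)
    if T - 2 != 1:
        raise ValueError("closed-form CDF assumes exactly 1 degree of freedom")
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_x = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / sxx
    b_const = y_bar - b_x * x_bar
    ssr = sum((y - b_const - b_x * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (T - 2) / sxx)   # standard error of b_x
    t = (b_x - beta_x_h0) / se            # standard errors above the H0 value
    return b_x, se, 1 - t1_cdf(t)         # right-tail probability

b_x, se, p = prob_results_if_h0_true([5, 15, 25], [66, 87, 90])
print(round(b_x, 2), round(se, 4), round(p, 2))
```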
$$b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \quad \text{and} \quad b_{Const} = \bar{y} - b_x \bar{x}$$
[Figure 8.9: Student t-distribution Right Tail Probabilities. The right tail, beyond t, has probability α.]
Degrees of Freedom α = 0.10 α = 0.05 α = 0.025 α = 0.01 α = 0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
Table 8.6: Right Tail Critical Values for the Student t-Distribution
The first column represents the degrees of freedom. The numbers in the body of
the table are called the “critical values.” A critical value equals the number of
standard errors a value lies from the mean. The top row specifies the value of α, the
“right tail probability.” Since the t-distribution is symmetric, the “left tail
probability” also equals α. The probability of lying between the tails, in the center
of the distribution, is therefore 1 − 2α. This no doubt sounds confusing, but everything
should become clear after we show how Clint can use this table to answer the
interval estimate question.
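The first row of the table can be reproduced in closed form. With 1 degree of freedom, P(t > c) = 1/2 − arctan(c)/π. Setting this equal to α gives c = tan(π/2 − πα) = 1/tan(πα). The sketch below assumes that formula, which holds only for DF = 1, and the function name is ours.

```python
import math

def t1_critical_value(alpha):
    """Right-tail critical value of the Student t-distribution with 1 degree of freedom."""
    return 1 / math.tan(math.pi * alpha)

for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(t1_critical_value(alpha), 3))
```

Other rows have no such simple formula. A library routine such as SciPy's `scipy.stats.t.ppf(1 - alpha, df)` returns the same critical values for any degrees of freedom.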
[Figure 8.10: Student t-distribution – Illustrating the Probabilities. The center of the distribution has probability 1 − 2α and each tail has probability α; each tail begins Critical Value × SE from the Distribution Mean.]
Interval Estimate Question: What is the probability that the estimate, 1.2,
lies within ____ of the actual value? ____
Let us review the regression results from Professor Lord’s first quiz:
Coefficient Estimate = bx = 1.2
Standard Error of Coefficient Estimate = SE[bx] = .5196
Next, we shall modify Figure 8.10 to reflect our specific example. Focus
on Figure 8.11.
• We are interested in the coefficient estimate; consequently, we replace the
horizontal axis label by substituting bx for Estimate.
• Also, we know that the estimation procedure Clint uses, the ordinary least
squares (OLS) estimation procedure, is unbiased; hence, the distribution
mean equals the actual value. We can replace the Distribution Mean with
the actual coefficient value, βx.
[Figure 8.11: Student t-distribution – Illustrating the Probabilities for Coefficient Estimate. The distribution of bx is centered at the actual value, βx; the center has probability 1 − 2α and each tail, beginning Critical Value × SE from βx, has probability α.]
Now, let us help Clint fill in the blanks. When using the table, we begin by
filling in the second blank rather than the first.
• Second Blank: Choose α to specify the tail probability.
Clint must choose a value for α. As we shall see, the value he chooses
depends on how demanding he is. For example, suppose that Clint
believes that a .80 probability of the estimate lying in the center of the
distribution, close to the mean, is good enough. He would then choose
an α equal to .10. To understand why, note that when α equals .10, the
probability of the estimate lying in the right tail would be .10. Since
the t-distribution is symmetric, the probability of the estimate lying in
the left tail would be .10 also. Therefore, the probability that the
estimate lies in the center of the distribution would be .80;
accordingly, we write .80 in the second blank.
What is the probability that the estimate, 1.2, lies within _____ of the
actual value? .80
• First Blank: Calculate tail boundaries.
The first blank quantifies what “close to” means. The standard error
and the Student t-distribution table allow us to fill in the first blank. To
do so, we begin by calculating the degrees of freedom. Recall that the
degrees of freedom equal 1:
$$\text{Degrees of Freedom} = \text{Sample Size} - \text{Number of Estimated Parameters} = 3 - 2 = 1$$
Degrees of Freedom α = 0.10 α = 0.05 α = 0.025 α = 0.01 α = 0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
Table 8.7: Right Tail Critical Values for the Student t-Distribution – α Equals
0.10 and Degrees of Freedom Equals 1
Clint chose a value of α equal to .10. The table indicates that the critical
value for α = .10 with 1 degree of freedom is 3.078. The probability that the
estimate falls within 3.078 standard errors of the mean is .80. Next, the regression
results report that the standard error equals .5196:
SE[bx] = .5196
After multiplying the critical value given in the table, 3.078, by the standard error,
.5196, we can fill in the first blank:
3.078 × .5196 ≈ 1.6
[Figure 8.12: Student t-distribution – Calculations for an α Equal to .10. The center, from βx − 1.6 to βx + 1.6, has probability .80 and each tail has probability .10; each boundary lies Critical Value × SE = 3.078 × .5196 ≈ 1.6 from βx.]
What is the probability that the estimate, 1.2, lies within 1.6 of the
actual value? .80
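The whole procedure for filling in both blanks can be sketched in a few lines. Choose α, look up the critical value for 1 degree of freedom (computed here in closed form rather than read from the table), and multiply it by the standard error. The sketch assumes DF = 1, and its function names are ours.

```python
import math

def t1_critical_value(alpha):
    """Right-tail critical value for the Student t-distribution with 1 degree of freedom."""
    return 1 / math.tan(math.pi * alpha)

def t1_cdf(t):
    """CDF of the Student t-distribution with 1 degree of freedom."""
    return 0.5 + math.atan(t) / math.pi

alpha = 0.10
se = 0.5196                          # standard error reported for b_x
critical = t1_critical_value(alpha)  # about 3.078
half_width = critical * se           # about 1.6: the first blank
coverage = t1_cdf(critical) - t1_cdf(-critical)  # 1 - 2*alpha: the second blank
print(round(half_width, 2), round(coverage, 2))
```

Choosing a smaller α raises the critical value. That widens the interval, so "close to" becomes a looser notion in exchange for a higher probability.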
Appendix 8.2 shows how we can use the Student t-distribution table to address
the interval estimate question. Since the table is cumbersome, we shall use the
Econometrics Lab to do so.