Chapter 5 Regression With A Single Regressor: Hypothesis Tests and Confidence Intervals

Introduction to Econometrics, 3e (Stock)

Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

5.1 Multiple Choice

1) Heteroskedasticity means that


A) homogeneity cannot be assumed automatically for the model.
B) the variance of the error term is not constant.
C) the observed units have different preferences.
D) agents are not all rational.
Answer: B

2) With heteroskedastic errors, the weighted least squares estimator is BLUE. You should use OLS with
heteroskedasticity-robust standard errors because
A) this method is simpler.
B) the exact form of the conditional variance is rarely known.
C) the Gauss-Markov theorem holds.
D) your spreadsheet program does not have a command for weighted least squares.
Answer: B

3) When estimating a demand function for a good where quantity demanded is a linear function of the
price, you should
A) not include an intercept because the price of the good is never zero.
B) use a one-sided alternative hypothesis to check the influence of price on quantity.
C) use a two-sided alternative hypothesis to check the influence of price on quantity.
D) reject the idea that price determines demand unless the coefficient is at least 1.96.
Answer: B

4) The t-statistic is calculated by dividing


A) the OLS estimator by its standard error.
B) the slope by the standard deviation of the explanatory variable.
C) the estimator minus its hypothesized value by the standard error of the estimator.
D) the slope by 1.96.
Answer: C

5) The confidence interval for the sample regression function slope


A) can be used to conduct a test about a hypothesized population regression function slope.
B) can be used to compare the value of the slope relative to that of the intercept.
C) adds and subtracts 1.96 from the slope.
D) allows you to make statements about the economic importance of your estimate.
Answer: A

Copyright © 2011 Pearson Education, Inc.
6) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal
distribution, you can
A) reject the null hypothesis.
B) safely assume that your regression results are significant.
C) reject the assumption that the error terms are homoskedastic.
D) conclude that most of the actual values are very close to the regression line.
Answer: A

7) Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi being i.i.d.,
and Xi and ui having finite fourth moments), the OLS estimator for the slope and intercept
A) has an exact normal distribution for n > 15.
B) is BLUE.
C) has a normal distribution even in small samples.
D) is unbiased.
Answer: D

8) In general, the t-statistic has the following form:
A) (estimate - hypothesized value) / (standard error of estimate)
B) estimator / (standard error of estimate)
C) (estimator - hypothesized value) / (standard error of estimate)
D) (estimator - hypothesized value) / ((standard error of estimator) / n)
Answer: C

9) Consider the following regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$. You are told that the t-statistic on
the slope coefficient is 4.38. What is the standard error of the slope coefficient?
A) 0.52
B) 1.96
C) -1.96
D) 4.38
Answer: A
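
The arithmetic behind question 9 can be checked with a short sketch. It assumes the reported t-statistic tests the null hypothesis that the slope is zero and is quoted in absolute value, so the standard error is the ratio of the two magnitudes. The variable names are illustrative only.

```python
# Back out the standard error from a reported coefficient and t-statistic,
# assuming H0: beta_1 = 0, so that t = estimate / SE (taken in absolute value).
slope = -2.28    # estimated slope on STR
t_stat = 4.38    # reported t-statistic, in absolute value

se_slope = abs(slope) / abs(t_stat)
print(round(se_slope, 2))  # 0.52
```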

10) Imagine that you were told that the t-statistic for the slope coefficient of the regression line
$\widehat{TestScore} = 698.9 - 2.28 \times STR$ was 4.38. What are the units of measurement for the t-statistic?
A) points of the test score
B) number of students per teacher
C) TestScore / STR
D) standard deviations
Answer: D

11) The construction of the t-statistic for a one- and a two-sided hypothesis
A) depends on the critical value from the appropriate distribution.
B) is the same.
C) is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the
two-sided hypothesis (using a 5% probability for the Type I error).
D) uses ±1.96 for the two-sided test, but only +1.96 for the one-sided test.
Answer: B

12) The p-value for a one-sided left-tail test is given by
A) $\Pr(Z > t^{act}) = \Phi(t^{act})$.
B) $\Pr(Z < t^{act}) = \Phi(t^{act})$.
C) $\Pr(Z < t^{act}) < 1.645$.
D) cannot be calculated, since probabilities must always be positive.
Answer: B
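
As a rough illustration of question 12, the left-tail p-value is the standard normal CDF evaluated at the observed t-statistic. The sketch below assumes the large-sample normal approximation and builds the CDF from `math.erf`; the function names are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def left_tail_p_value(t_act: float) -> float:
    """p-value for H1: beta < hypothesized value, i.e. Pr(Z < t_act)."""
    return normal_cdf(t_act)

# A t-statistic of -1.645 sits at roughly the 5% left-tail cutoff
p = left_tail_p_value(-1.645)
print(round(p, 3))  # 0.05
```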

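13) The 95% confidence interval for β1 is the interval
A) $(\beta_1 - 1.96\,SE(\beta_1),\ \beta_1 + 1.96\,SE(\beta_1))$.
B) $(\hat{\beta}_1 - 1.645\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.645\,SE(\hat{\beta}_1))$.
C) $(\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1))$.
D) $(\hat{\beta}_1 - 1.96,\ \hat{\beta}_1 + 1.96)$.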
Answer: C

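14) The 95% confidence interval for β0 is the interval
A) $(\beta_0 - 1.96\,SE(\beta_0),\ \beta_0 + 1.96\,SE(\beta_0))$.
B) $(\hat{\beta}_0 - 1.645\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.645\,SE(\hat{\beta}_0))$.
C) $(\hat{\beta}_0 - 1.96\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.96\,SE(\hat{\beta}_0))$.
D) $(\hat{\beta}_0 - 1.96,\ \hat{\beta}_0 + 1.96)$.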
Answer: C

15) The 95% confidence interval for the predicted effect of a general change in X is
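A) $(\beta_1 \Delta x - 1.96\,SE(\beta_1) \times \Delta x,\ \beta_1 \Delta x + 1.96\,SE(\beta_1) \times \Delta x)$.
B) $(\hat{\beta}_1 \Delta x - 1.645\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.645\,SE(\hat{\beta}_1) \times \Delta x)$.
C) $(\hat{\beta}_1 \Delta x - 1.96\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.96\,SE(\hat{\beta}_1) \times \Delta x)$.
D) $(\hat{\beta}_1 \Delta x - 1.96,\ \hat{\beta}_1 \Delta x + 1.96)$.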
Answer: C
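
The interval formulas in questions 13 to 15 share one pattern, which the following sketch puts into a single helper. It assumes the large-sample normal critical value of 1.96 and reuses the slope and standard error from question 9 purely as example inputs; `confidence_interval` is a hypothetical name.

```python
def confidence_interval(estimate, std_error, delta_x=1.0, z=1.96):
    """Return (lower, upper) for the predicted effect estimate * delta_x.

    z = 1.96 is the two-sided 5% critical value of the standard normal.
    """
    center = estimate * delta_x
    half_width = z * std_error * abs(delta_x)
    return center - half_width, center + half_width

# 95% interval for the slope itself (delta_x = 1)
low, high = confidence_interval(-2.28, 0.52)
print(round(low, 2), round(high, 2))    # -3.3 -1.26

# 95% interval for the predicted effect of raising STR by 2
low2, high2 = confidence_interval(-2.28, 0.52, delta_x=2.0)
print(round(low2, 2), round(high2, 2))  # -6.6 -2.52
```

Because the interval for the slope excludes zero, the same numbers also reject the null hypothesis of a zero slope at the 5% level.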

16) The homoskedasticity-only estimator of the variance of $\hat{\beta}_1$ is

A)

B)

C)

D)

Answer: A

17) One of the following steps is not required to test the null hypothesis:
A) compute the standard error of $\hat{\beta}_1$.
B) test for the errors to be normally distributed.
C) compute the t-statistic.
D) compute the p-value.
Answer: B

18) Finding a small value of the p-value (e.g. less than 5%)
A) indicates evidence in favor of the null hypothesis.
B) implies that the t-statistic is less than 1.96.
C) indicates evidence against the null hypothesis.
D) will only happen roughly one in twenty samples.
Answer: C

19) The only difference between a one- and two-sided hypothesis test is
A) the null hypothesis.
B) dependent on the sample size n.
C) the sign of the slope coefficient.
D) how you interpret the t-statistic.
Answer: D

20) A binary variable is often called a
A) dummy variable.
B) dependent variable.
C) residual.
D) power of a test.
Answer: A

21) The error term is homoskedastic if


A) $\mathrm{var}(u_i \mid X_i = x)$ is constant for i = 1,…, n.
B) $\mathrm{var}(u_i \mid X_i = x)$ depends on x.
C) Xi is normally distributed.
D) there are no outliers.
Answer: A

22) In the presence of heteroskedasticity, and assuming that the usual least squares assumptions hold, the
OLS estimator is
A) efficient.
B) BLUE.
C) unbiased and consistent.
D) unbiased but not consistent.
Answer: C

23) The proof that OLS is BLUE requires all of the following assumptions with the exception of:
A) the errors are homoskedastic.
B) the errors are normally distributed.
C) $E(u_i \mid X_i) = 0$.
D) large outliers are unlikely.
Answer: B

24) If the errors are heteroskedastic, then


A) OLS is BLUE.
B) WLS is BLUE if the conditional variance of the errors is known up to a constant factor of
proportionality.
C) LAD is BLUE if the conditional variance of the errors is known up to a constant factor of
proportionality.
D) OLS is efficient.
Answer: B

25) The homoskedastic normal regression assumptions are all of the following with the exception of:
A) the errors are homoskedastic.
B) the errors are normally distributed.
C) there are no outliers.
D) there are at least 10 observations.
Answer: D

26) Using the textbook example of 420 California school districts and the regression of test scores on the
student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the
heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula.
When calculating the t-statistic, the recommended procedure is to
A) use the homoskedasticity-only formula because the t-statistic becomes larger.
B) first test for homoskedasticity of the errors and then make a decision.
C) use the heteroskedasticity-robust formula.
D) make a decision depending on how much the estimate of the slope differs under the two
procedures.
Answer: C

27) Consider the estimated equation from your textbook

$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad R^2 = 0.051, \quad SER = 18.6$$

The t-statistic for the slope is approximately


A) 4.38
B) 67.20
C) 0.52
D) 1.76
Answer: A

28) You have collected data for the 50 U.S. states and estimated the following relationship between the
change in the unemployment rate from the previous year ( ) and the growth rate of the respective
state real GDP (gy). The results are as follows

$$\widehat{\text{change in unemployment rate}} = \underset{(0.12)}{2.81} - \underset{(0.04)}{0.23} \times gy, \quad R^2 = 0.36, \quad SER = 0.78$$

Assuming that the estimator has a normal distribution, the 95% confidence interval for the slope is
approximately the interval
A) [2.57, 3.05]
B) [-0.31,0.15]
C) [-0.31, -0.15]
D) [-0.33, -0.13]
Answer: C

29) Using 143 observations, assume that you had estimated a simple regression function and that your
estimate for the slope was 0.04, with a standard error of 0.01. You want to test whether or not the estimate
is statistically significant. Which of the following possible decisions is the only correct one:
A) you decide that the coefficient is small and hence most likely is zero in the population
B) the slope is statistically significant since it is four standard errors away from zero
C) the response of Y given a change in X must be economically important since it is statistically
significant
D) since the slope is very small, so must be the regression R2.
Answer: B

30) You extract approximately 5,000 observations from the Current Population Survey (CPS) and estimate
the following regression function:

$$\widehat{ahe} = \underset{(1.00)}{3.32} - \underset{(0.04)}{0.45} \times Age, \quad R^2 = 0.02, \quad SER = 8.66$$

where ahe is average hourly earnings, and Age is the individual's age. Given the specification, your 95%
confidence interval for the effect of changing age by 5 years is approximately
A) [$1.96, $2.54]
B) [$2.32, $4.32]
C) [$1.35, $5.30]
D) cannot be determined given the information provided
Answer: A

5.2 Essays and Longer Questions

1) (Continuation from Chapter 4) Sir Francis Galton, a cousin of Charles Darwin, examined the
relationship between the height of children and their parents towards the end of the 19th century. It is
from this study that the name "regression" originated. You decide to update his findings by collecting
data from 110 college students, and estimate the following relationship:

$$\widehat{Studenth} = \underset{(7.2)}{19.6} + \underset{(0.10)}{0.73} \times Midparh, \quad R^2 = 0.45, \quad SER = 2.0$$

where Studenth is the height of students in inches, and Midparh is the average of the parental heights.
Values in parentheses are heteroskedasticity robust standard errors. (Following Galton's methodology,
both variables were adjusted so that the average female height was equal to the average male height.)
(a) Test for the statistical significance of the slope coefficient.
(b) If children, on average, were expected to be of the same height as their parents, then this would imply
two hypotheses, one for the slope and one for the intercept.
(i) What should the null hypothesis be for the intercept? Calculate the relevant t-statistic and carry out the
hypothesis test at the 1% level.
(ii) What should the null hypothesis be for the slope? Calculate the relevant t-statistic and carry out the
hypothesis test at the 5% level.
(c) Can you reject the null hypothesis that the regression R2 is zero?
(d) Construct a 95% confidence interval for a one inch increase in the average of parental height.
Answer:
(a) H0 : β1 = 0, t = 7.30. For H1 : β1 > 0, the critical value for a one-sided alternative at the 5% level is 1.645. Hence we reject
the null hypothesis.
(b) H0 : β0 = 0, t=2.72, for H1 : β0 ≠ 0, the critical value for a two-sided alternative is 2.58. Hence we reject
the null hypothesis in (i). For the slope we have H0 : β1 = 1, t=-2.70, for H1 : β1 ≠ 1, the critical value for a
two-sided alternative is 1.96. Hence we reject the null hypothesis in (ii).
(c) For the simple linear regression model, H0 : β1 = 0 implies that R2 = 0. Hence it is the same test as in
(a).
(d) (0.73 – 1.96 × 0.10, 0.73 + 1.96 × 0.10) = (0.53, 0.93).
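
The numbers in this answer can be reproduced with a few lines. This is a sketch under the large-sample normal approximation, using only the coefficients and standard errors printed in the regression output; `t_statistic` is a hypothetical helper.

```python
def t_statistic(estimate, hypothesized, std_error):
    """(estimator - hypothesized value) / standard error of the estimator."""
    return (estimate - hypothesized) / std_error

intercept, se_intercept = 19.6, 7.2
slope, se_slope = 0.73, 0.10

t_a = t_statistic(slope, 0.0, se_slope)            # (a)     H0: beta_1 = 0
t_bi = t_statistic(intercept, 0.0, se_intercept)   # (b)(i)  H0: beta_0 = 0
t_bii = t_statistic(slope, 1.0, se_slope)          # (b)(ii) H0: beta_1 = 1

print(round(t_a, 2), round(t_bi, 2), round(t_bii, 2))  # 7.3 2.72 -2.7
print(abs(t_bi) > 2.58, abs(t_bii) > 1.96)             # True True

# (d) 95% confidence interval for a one inch increase in Midparh
ci = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)
print(round(ci[0], 2), round(ci[1], 2))                # 0.53 0.93
```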

2) (Requires Appendix) (Continuation from Chapter 4) At a recent county fair, you observed that at one
stand people's weight was forecasted, and were surprised by the accuracy (within a range). Thinking
about how the person could have predicted your weight fairly accurately (despite the fact that she did
not know about your "heavy bones"), you think about how this could have been accomplished. You
remember that medical charts for children contain 5%, 25%, 50%, 75% and 95% lines for a weight/height
relationship and decide to conduct an experiment with 110 of your peers. You collect the data and
calculate the following sums:

where the height is measured in inches and weight in pounds. (Small letters refer to deviations from
means as in $z_i = Z_i - \bar{Z}$.)
(a) Calculate the homoskedasticity-only standard errors and, using the resulting t-statistic, perform a test
on the null hypothesis that there is no relationship between height and weight in the population of
college students.
(b) What is the alternative hypothesis in the above test, and what level of significance did you choose?
(c) Statistics and econometrics textbooks often ask you to calculate critical values based on some level of
significance, say 1%, 5%, or 10%. What sort of criteria do you think should play a role in determining
which level of significance to choose?
(d) What do you think the relationship is between testing for the significance of the slope and whether or
not the regression R2 is zero?
Answer:
(a) The formula for the homoskedasticity-only standard errors requires knowledge of the residual
variance. But $s_{\hat{u}}^2 = \frac{1}{n-2} SSR$, and SSR = TSS - ESS. Given the result in (2b), SSR = 47,604.7, and hence
$s_{\hat{u}}^2 = 440.78$. The SER is 21.00. Dividing by the square root of the variation in X then results in the
homoskedasticity-only standard error of the slope, which is 0.594. The t-statistic is 10.29, which rejects the
null hypothesis of no relationship.
(b) The alternative hypothesis should be one-sided, since there is strong prior knowledge that taller
people weigh more, on average. Given the size of the t-statistic, the null hypothesis can be rejected at any
reasonable level of significance.
(c) Clearly the levels should not be picked arbitrarily, but should depend on the cost involved with the
size and the power of the test. Consider a person who was accused of murder. In that case, the null
hypothesis is that he is innocent. The size of the test would be the probability of letting an innocent
person go to the electric chair, while (1-power of the test) gives the probability of letting a murderer go
free. There are obviously vastly different costs attached to each error, and these will determine the levels
chosen.
(d) If the slope in a regression function is zero, then there is no relationship between the two variables
involved. Hence testing for the significance of the regression slope is the same as testing whether or not
the regression R2 is zero.
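
The first step of answer (a) can be verified numerically. The sketch below uses only the SSR and sample size quoted above; the sum of squared deviations of height is not reproduced in this text, so the final division that yields the slope's standard error is left out.

```python
import math

n = 110          # number of students in the sample
ssr = 47604.7    # sum of squared residuals quoted in answer (a)

s2_u = ssr / (n - 2)     # residual variance with the degrees-of-freedom correction
ser = math.sqrt(s2_u)    # standard error of the regression

print(round(s2_u, 2), round(ser, 1))  # 440.78 21.0
```

The homoskedasticity-only standard error of the slope would then be the SER divided by the square root of the sum of squared deviations of height from its mean.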

3) You have obtained measurements of height in inches of 29 female and 81 male students (Studenth) at
your university. A regression of the height on a constant and a binary variable (BFemme), which takes a
value of one for females and is zero otherwise, yields the following result:

$$\widehat{Studenth} = \underset{(0.3)}{71.0} - \underset{(0.57)}{4.84} \times BFemme, \quad R^2 = 0.40, \quad SER = 2.0$$
(a) What is the interpretation of the intercept? What is the interpretation of the slope? How tall are
females, on average?
(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level.
(c) Is it likely that the error term is homoskedastic here?
Answer:
(a) The intercept gives you the average height of males, which is 71 inches in this sample. The slope tells
you by how much shorter females are, on average (almost 5 inches). The average height of females is
therefore approximately 66 inches.
(b) The t-statistic for the difference in means is -8.49. For a one-sided test, the critical value is –2.33. Hence
the difference is statistically significant.
(c) It is safer to assume that the variances for males and females are different. In the underlying sample the
standard deviation for females was smaller.
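
With a binary regressor, the fitted values are just the two group means, which makes the answer easy to check. A minimal sketch using the printed coefficients and the one-sided 1% critical value quoted in the answer; variable names are illustrative.

```python
intercept = 71.0    # average male height in inches (BFemme = 0)
slope = -4.84       # female average minus male average
se_slope = 0.57

avg_female = intercept + slope * 1   # fitted value when BFemme = 1
t_slope = slope / se_slope           # H0: no difference in average height

print(round(avg_female, 2))  # 66.16
print(round(t_slope, 2))     # -8.49
print(t_slope < -2.33)       # True
```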

4) (continuation from Chapter 4, number 3) You have obtained a sub-sample of 1744 individuals from the
Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age.
The regression, using heteroskedasticity-robust standard errors, yielded the following result:

$$\widehat{Earn} = \underset{(20.24)}{239.16} + \underset{(0.57)}{5.20} \times Age, \quad R^2 = 0.05, \quad SER = 287.21,$$

where Earn and Age are measured in dollars and years respectively.

(a) Is the relationship between Age and Earn statistically significant?


(b) The variance of the error term and the variance of the dependent variable are related. Given the
distribution of earnings, do you think it is plausible that the distribution of errors is normal?
(c) Construct a 95% confidence interval for both the slope and the intercept.
Answer:
(a) The t-statistic on the slope is 9.12, which is above the critical value from the standard normal
distribution for any reasonable level of significance.
(b) Since the earnings distribution is highly skewed, it is not reasonable to assume that the error
distribution is normal.
(c) The confidence interval for the slope is (4.08,6.32). The confidence interval for the intercept is
(199.49,278.83).

5) (Continuation from Chapter 4, number 5) You have learned in one of your economics courses that one
of the determinants of per capita income (the "Wealth of Nations") is the population growth rate.
Furthermore you also found out that the Penn World Tables contain income and population data for 104
countries of the world. To test this theory, you regress the GDP per worker (relative to the United States)
in 1990 (RelPersInc) on the difference between the average population growth rate of that country (n) and
the U.S. average population growth rate ($n_{us}$) for the years 1980 to 1990. This results in the following
regression output:

$$\widehat{RelPersInc} = \underset{(0.056)}{0.518} - \underset{(3.177)}{18.831} \times (n - n_{us}), \quad R^2 = 0.522, \quad SER = 0.197$$

(a) Is there any reason to believe that the error terms are homoskedastic?
(b) Is the relationship statistically significant?
Answer:
(a) There are vast differences in the size of these countries, both in terms of the population and GDP.
Furthermore, the countries are at different stages of economic and institutional development. Other
factors vary as well. It would therefore be odd to assume that the errors would be homoskedastic.
(b) The t-statistic is 5.93 in absolute value, making the relationship statistically significant, i.e., we can reject the null
hypothesis that the slope is zero.

6) You recall from one of your earlier lectures in macroeconomics that the per capita income depends on
the savings rate of the country: those who save more end up with a higher standard of living. To test this
theory, you collect data from the Penn World Tables on GDP per worker relative to the United States
(RelProd) in 1990 and the average investment share of GDP from 1980-1990 (SK), remembering that
investment equals saving. The regression results in the following output:

$$\widehat{RelProd} = \underset{(0.04)}{-0.08} + \underset{(0.38)}{2.44} \times SK, \quad R^2 = 0.46, \quad SER = 0.21$$

(a) Interpret the regression results carefully.


(b) Calculate the t-statistics to determine whether the two coefficients are significantly different from zero.
Justify the use of a one-sided or two-sided test.
(c) You accidentally forget to use the heteroskedasticity-robust standard errors option in your regression
package and estimate the equation using homoskedasticity-only standard errors. This changes the results
as follows:

$$\widehat{RelProd} = \underset{(0.04)}{-0.08} + \underset{(0.26)}{2.44} \times SK, \quad R^2 = 0.46, \quad SER = 0.21$$

You are delighted to find that the coefficients have not changed at all and that your results have become
even more significant. Why haven't the coefficients changed? Are the results really more significant?
Explain.
(d) Upon reflection you think about the advantages of OLS with and without homoskedasticity-only
standard errors. What are these advantages? Is it likely that the error terms would be heteroskedastic in
this situation?
Answer:
(a) An increase in the saving rate of 0.1, or from 0.15 to 0.25, results in an increase in relative GDP per
worker of 0.244, or from 0.5 to roughly 0.75. (Taiwan had a value of 0.5 for RelProd in 1990, while Sweden
was at 0.77.) There is no interpretation for the intercept. The regression explains 46 percent of the
variation in GDP per worker relative to the United States.
(b) The t-statistics are 2.00 (in absolute value) and 6.42 for the intercept and slope respectively. You should use a two-sided
test for the intercept, since there are no prior expectations on whether it should be positive or negative.
Hence the intercept is statistically significant at the 5 percent level, but not at the 1 percent level. Since we
expect a positive sign on the slope, we should conduct a one-sided test. The critical values suggest
significance at any reasonable probability level of the size of the test.
(c) Whether you use homoskedasticity-only or heteroskedasticity-robust standard errors does not affect
the estimator, only the formula for the standard errors. If the assumption of homoskedasticity was valid,
then the results would be more significant. However, given the lengthy discussion on homoskedasticity
versus heteroskedasticity in the textbook, it is safer to conduct inference under the assumption of
heteroskedasticity.
(d) In the presence of homoskedasticity in addition to the least squares assumptions in the text, OLS is
BLUE (Gauss-Markov theorem). If the errors are heteroskedastic, then the GLS estimator (weighted least
squares) is BLUE if the form of heteroskedasticity is known, which rarely occurs in practice. Since
economic theory does not suggest, in general, that errors are homoskedastic, it is safer to assume that
they are not. This avoids invalid statistical inference.
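
Parts (b) and (c) come down to dividing the same coefficients by two different sets of standard errors. The sketch below illustrates this with the printed numbers; the dictionary layout is only an illustrative choice.

```python
coefficients = {"intercept": -0.08, "slope": 2.44}

# Standard errors reported under each formula; the coefficients are identical
robust_se = {"intercept": 0.04, "slope": 0.38}
homoskedastic_se = {"intercept": 0.04, "slope": 0.26}

t_robust = {k: coefficients[k] / robust_se[k] for k in coefficients}
t_homoskedastic = {k: coefficients[k] / homoskedastic_se[k] for k in coefficients}

print(round(t_robust["intercept"], 2), round(t_robust["slope"], 2))  # -2.0 6.42
print(round(t_homoskedastic["slope"], 2))                            # 9.38
```

Only the denominator changes between the two runs, which is why the slope's t-statistic rises from 6.42 to 9.38 while the estimate itself stays at 2.44.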

7) Carefully discuss the advantages of using heteroskedasticity-robust standard errors over standard
errors calculated under the assumption of homoskedasticity. Give at least five examples where it is very
plausible to assume that the errors display heteroskedasticity.
Answer:
There are virtually no examples where economic theory suggests that the errors are homoskedastic.
Hence the maintained hypothesis should be that they are heteroskedastic. Using homoskedasticity-only
standard errors when in truth heteroskedasticity-robust standard errors should be used results in false
inference. What makes this worse is that homoskedasticity-only standard errors are typically smaller than
heteroskedasticity-robust standard errors, resulting in t-statistics that are too large, and hence rejection of
the null hypothesis too often. There is an alternative GLS estimator, weighted least squares, which is
BLUE, but requires knowledge of how the error variance depends on X, e.g. X or $X^2$. Answers will vary
by student regarding the examples, but earnings functions, cross country beta-convergence regressions,
consumption functions, sports regressions involving teams from markets with varying population size,
weight-height relationships for children, etc., are all good candidates.
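
To make the contrast between the two standard error formulas concrete, here is a minimal pure-Python sketch for a single regressor. The data are simulated under an assumed design in which the error spread grows with X; nothing here comes from the textbook's data, and `slope_standard_errors` is a hypothetical helper name.

```python
import math
import random

def slope_standard_errors(x, y):
    """OLS of y on a constant and x; returns slope, homoskedasticity-only SE, robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: residual variance over the variation in X
    s2_u = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(s2_u / sxx)

    # Heteroskedasticity-robust: each squared residual is weighted by (X_i - X_bar)^2
    numerator = sum((d * u) ** 2 for d, u in zip(dx, resid)) / (n - 2)
    denominator = (sxx / n) ** 2
    se_robust = math.sqrt(numerator / denominator / n)
    return slope, se_homo, se_robust

# Simulated data (an assumption for illustration): error standard deviation is 0.5 * x
rng = random.Random(0)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

slope, se_homo, se_robust = slope_standard_errors(x, y)
print(f"slope={slope:.3f}  homoskedasticity-only SE={se_homo:.4f}  robust SE={se_robust:.4f}")
```

The slope estimate is the same whichever standard error is reported; only the inference built on it changes.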

8) (Requires Appendix material from Chapters 4 and 5) Shortly before you are making a group
presentation on the testscore/student-teacher ratio results, you realize that one of your peers forgot to
type all the relevant information on one of your slides. Here is what you see:

$$\widehat{TestScore} = \underset{(9.47)}{698.9} - \underset{(0.48)}{\underline{\hspace{1cm}}} \times STR, \quad R^2 = 0.051, \quad SER = 18.6$$

In addition, your group member explains that he ran the regression in a standard spreadsheet program,
and that, as a result, the standard errors in parentheses are homoskedasticity-only standard errors.
(a) Find the value for the slope coefficient.
(b) Calculate the t-statistic for the slope and the intercept. Test the hypothesis that the intercept and the
slope are different from zero.
(c) Should you be concerned that your group member only gave you the result for the
homoskedasticity-only standard error formula, instead of using the heteroskedasticity-robust standard errors?

Answer:
(a) The relationship between the slope coefficient and the regression R2 is

negative sign in front of the slope).


(b) The t-statistics are 73.82 and 4.75 respectively. Hence you can reject the two null hypotheses at any
reasonable level of significance.

(c) There is no theory that suggests homoskedasticity of the error terms in this case. Given the serious
consequences of using homoskedasticity-only standard errors in the presence of heteroskedasticity, you
should definitely use the heteroskedasticity-robust standard errors for inference.

9) (Continuation of the Purchasing Power Parity question from Chapter 4) The news-magazine The
Economist regularly publishes data on the so-called Big Mac index and exchange rates between countries.
The data for 30 countries from the April 29, 2000 issue is listed below:

Country         Currency   Price of Big Mac   Actual Exchange Rate per U.S. dollar

Indonesia       Rupiah     14,500             7,945
Italy           Lira       4,500              2,088
South Korea     Won        3,000              1,108
Chile           Peso       1,260              514
Spain           Peseta     375                179
Hungary         Forint     339                279
Japan           Yen        294                106
Taiwan          Dollar     70                 30.6
Thailand        Baht       55                 38.0
Czech Rep.      Crown      54.37              39.1
Russia          Ruble      39.50              28.5
Denmark         Crown      24.75              8.04
Sweden          Crown      24.0               8.84
Mexico          Peso       20.9               9.41
France          Franc      18.5               7.07
Israel          Shekel     14.5               4.05
China           Yuan       9.90               8.28
South Africa    Rand       9.0                6.72
Switzerland     Franc      5.90               1.70
Poland          Zloty      5.50               4.30
Germany         Mark       4.99               2.11
Malaysia        Dollar     4.52               3.80
New Zealand     Dollar     3.40               2.01
Singapore       Dollar     3.20               1.70
Brazil          Real       2.95               1.79
Canada          Dollar     2.85               1.47
Australia       Dollar     2.59               1.68
Argentina       Peso       2.50               1.00
Britain         Pound      1.90               0.63
United States   Dollar     2.51

The concept of purchasing power parity or PPP ("the idea that similar foreign and domestic goods …
should have the same price in terms of the same currency," Abel, A. and B. Bernanke, Macroeconomics, 4th
edition, Boston: Addison Wesley, 476) suggests that the ratio of the Big Mac priced in the local currency
to the U.S. dollar price should equal the exchange rate between the two countries.

After entering the data into your spreadsheet program, you calculate the predicted exchange rate per
U.S. dollar by dividing the price of a Big Mac in local currency by the U.S. price of a Big Mac ($2.51). To
test for PPP, you regress the actual exchange rate on the predicted exchange rate.

The estimated regression is as follows:

$$\widehat{\text{actual exchange rate}} = \underset{(23.74)}{-27.05} + \underset{(0.02)}{1.35} \times PredExRate, \quad R^2 = 0.994, \quad n = 29, \quad SER = 122.15$$

(a) Your spreadsheet program does not allow you to calculate heteroskedasticity-robust standard errors.
Instead, the numbers in parentheses are homoskedasticity-only standard errors. State the two null
hypotheses under which PPP holds. Should you use a one-tailed or two-tailed alternative hypothesis?
(b) Calculate the two t-statistics.
(c) Using a 5% significance level, what is your decision regarding the null hypotheses given the two
t-statistics? What critical values did you use? Are you concerned with the fact that you are testing the two
hypotheses sequentially when they are supposed to hold simultaneously?
(d) What assumptions had to be made for you to use Student's t-distribution?

Answer:
(a) Under PPP, H0 : β0 = 0 and H0 : β1 = 1. Economic theory does not tell you whether the intercept
should be greater or less than zero if PPP does not hold. The same goes for the slope, i.e., you do not
know whether it is less than or greater than unity. As a result, you should use a two-tailed
alternative hypothesis.
(b) The t-statistic for the intercept is $t = \frac{-27.05 - 0}{23.74} = -1.14$. For the slope, it is $t = \frac{1.35 - 1}{0.02} = 17.5$.
(c) Using the Student t-distribution and 27 degrees of freedom, the critical value for a two-sided
alternative is 2.05. Hence you can reject the null hypothesis for the slope but not for the intercept. Under PPP,
both hypotheses are supposed to hold simultaneously and if either or both are rejected, then PPP is not
supported by the data. As is discussed later in the textbook, testing hypotheses sequentially is not the
same as testing them simultaneously, since p-values change. (At an intuitive level, and heroically assuming
independence here, Pr(A and B) = Pr(A) × Pr(B); and hence the rejection probability needs to be adjusted.)
(d) In addition to the standard three least squares assumptions, you had to assume that the regression
errors are homoskedastic, and that the regression errors are normally distributed. That is you had to
assume that the homoskedastic normal regression assumptions hold.
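
The two PPP tests can be laid out side by side in a few lines. This sketch takes the critical value of 2.05 from the answer as given rather than computing it, and tests each hypothesis separately, so it does not address the joint test mentioned in (c).

```python
critical_value = 2.05   # two-sided 5% value, Student t with 27 degrees of freedom (as quoted)

# name: (estimate, hypothesized value under PPP, standard error)
tests = {
    "intercept (H0: beta_0 = 0)": (-27.05, 0.0, 23.74),
    "slope (H0: beta_1 = 1)": (1.35, 1.0, 0.02),
}

results = {}
for name, (estimate, hypothesized, std_error) in tests.items():
    t = (estimate - hypothesized) / std_error
    results[name] = (round(t, 2), abs(t) > critical_value)  # (t-statistic, reject?)
    print(name, results[name])
```

The intercept's t-statistic of -1.14 falls inside the acceptance region, while the slope's 17.5 lies far outside it, so it is the slope hypothesis that the data reject.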

10) (Continuation from Chapter 4, number 6) The neoclassical growth model predicts that for identical
savings rates and population growth rates, countries should converge to the same per capita income level. This
is referred to as the convergence hypothesis. One way to test for the presence of convergence is to
compare the growth rates over time to the initial starting level.
(a) The results of the regression for 104 countries were as follows:

$$\widehat{g6090} = \underset{(0.004)}{0.019} - \underset{(0.0073)}{0.0006} \times RelProd60, \quad R^2 = 0.00007, \quad SER = 0.016$$

where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960. Numbers in parentheses are
heteroskedasticity-robust standard errors.

Using the OLS estimator with homoskedasticity-only standard errors, the results changed as follows:

$$\widehat{g6090} = \underset{(0.002)}{0.019} - \underset{(0.0068)}{0.0006} \times RelProd60, \quad R^2 = 0.00007, \quad SER = 0.016$$

Why didn't the estimated coefficients change? Given that the standard error of the slope is now smaller,
can you reject the null hypothesis of no beta convergence? Are the results in the second equation more
reliable than the results in the first equation? Explain.
(b) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression
output as follows (numbers in parentheses are heteroskedasticity-robust standard errors):

$$\widehat{g6090} = \underset{(0.004)}{0.048} - \underset{(0.0063)}{0.0404} \times RelProd60, \quad R^2 = 0.82, \quad SER = 0.0046$$

Test for evidence of convergence now. If your conclusion is different than in (a), speculate why this is the
case.
(c) The authors of your textbook have informed you that unless you have more than 100 observations, it
may not be plausible to assume that the distribution of your OLS estimators is normal. What are the
implications here for testing the significance of your theory?

Answer:
(a) Using homoskedasticity-only standard errors has no effect on the OLS estimator. The t-statistic
remains small and is certainly below the critical value. The results are less reliable since there is no reason
to believe that the error variance is homoskedastic.
(b) The t-statistic for the slope is -0.0404/0.0063 = -6.41. At face value, there is strong evidence for
convergence. Neoclassical growth theory does not predict unconditional convergence. Instead it only
predicts convergence if the savings rates and population growth rates are identical. It stands to reason
that these are much more similar between OECD countries than between the countries of the world.
(c) Since there are fewer than 30 observations, the distribution of the t-statistic is unknown. You should
therefore not conduct statistical inference.

11) You have collected 14,925 observations from the Current Population Survey. There are 6,285 females
in the sample, and 8,640 males. The females report a mean of average hourly earnings of $16.50 with a
standard deviation of $9.06. The males have an average of $20.09 and a standard deviation of $10.85. The
overall mean average hourly earnings is $18.58.

a. Using the t-statistic for testing differences between two means (section 3.4 of your textbook), decide
whether or not there is sufficient evidence to reject the null hypothesis that females and males have
identical average hourly earnings.

b. You decide to run two regressions: first, you simply regress average hourly earnings on an intercept
only. Next, you repeat this regression, but only for the 6,285 females in the sample. What will the
regression coefficients be in each of the two regressions?

c. Finally you run a regression over the entire sample of average hourly earnings on an intercept and a
binary variable DFemme, where this variable takes on a value of 1 if the individual is a female, and is 0
otherwise. What will be the value of the intercept? What will be the value of the coefficient of the binary
variable?

d. What is the standard error on the slope coefficient? What is the t-statistic?

e. Had you used the homoskedasticity-only standard error in (d) and calculated the t-statistic, how would
you have had to change the test-statistic in (a) to get the identical result?

Answer: a. H0: μF = μM; H1: μF ≠ μM

$t = \frac{20.09 - 16.50}{\sqrt{\frac{10.85^2}{8640} + \frac{9.06^2}{6285}}} = 21.98$. As a result, you can comfortably reject the null hypothesis at any reasonable
significance level.

b. $\widehat{ahe} = \hat\beta_0 = 18.58$ (full sample); $\widehat{ahe} = \hat\beta_0 = 16.50$ (females only)

Hence for each of the regressions, the intercept takes on the value of the overall mean for average hourly
earnings, and the mean average hourly earnings for females.

c. $\widehat{ahe} = \hat\beta_0 + \hat\beta_1 \times DFemme = 20.09 - 3.59 \times DFemme$

The intercept is the mean of average hourly earnings for males, and the slope is the difference between
the mean of average hourly earnings of females and males.

d. The standard error on the slope coefficient is 0.16, which is identical to the standard error in the
denominator of the t-statistic in (a) above. Hence the t-statistic is (-21.98).

e. You would have had to use the "pooled" standard error formula (3.23) in your textbook.
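
The arithmetic in parts (a), (d), and (e) can be retraced with a short sketch. The helper name `diff_in_means_t` and its `pooled` flag are illustrative choices, not textbook notation; the pooled branch uses the usual pooled-variance formula, which assumes both groups share one population variance.

```python
import math

def diff_in_means_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, pooled=False):
    """Return (t-statistic, standard error) for H0: mean_a = mean_b."""
    if pooled:
        # Pooled variance: assumes a common population variance in both groups.
        s2 = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
        se = math.sqrt(s2 * (1 / n_a + 1 / n_b))
    else:
        # Unequal-variance standard error, as in part (a).
        se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se, se

# Females minus males, using the sample statistics given in the question.
t_unpooled, se_unpooled = diff_in_means_t(16.50, 9.06, 6285, 20.09, 10.85, 8640)
t_pooled, se_pooled = diff_in_means_t(16.50, 9.06, 6285, 20.09, 10.85, 8640, pooled=True)

print(round(t_unpooled, 2), round(se_unpooled, 2))
print(round(t_pooled, 2), round(se_pooled, 2))
```

The unpooled statistic corresponds to the heteroskedasticity-robust regression t-statistic in (d), while the pooled version corresponds to the homoskedasticity-only case in (e).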

5.3 Mathematical and Graphical Problems

1) In order to formulate whether or not the alternative hypothesis is one-sided or two-sided, you need
some guidance from economic theory. Choose at least three examples from economics or other fields
where you have a clear idea what the null hypothesis and the alternative hypothesis for the slope
coefficient should be. Write a brief justification for your answer.
Answer: Answers will vary by student. The problem is to find examples where there is only a single
explanatory variable. A student may argue that the price coefficient in a demand function is downward
sloping, but unless you control for other variables, this may not be so. The demand for L.A. Laker tickets
and their price comes to mind. CAPM is a nice example. Perhaps the marginal propensity to consume in
a consumption function is another. Testing for speculative efficiency in exchange rate markets may also
work.

2) For the following estimated slope coefficients and their heteroskedasticity robust standard errors, find
the t-statistics for the null hypothesis H0: β1 = 0. Assuming that your sample has more than 100
observations, indicate whether or not you are able to reject the null hypothesis at the 10%, 5%, and 1%
level of a one-sided and two-sided hypothesis.

(a) $\hat\beta_1 = 4.2$, $SE(\hat\beta_1) = 2.4$
(b) $\hat\beta_1 = 0.5$, $SE(\hat\beta_1) = 0.37$
(c) $\hat\beta_1 = 0.003$, $SE(\hat\beta_1) = 0.002$
(d) $\hat\beta_1 = 360$, $SE(\hat\beta_1) = 300$
Answer:
a) t = 1.75; reject null at 10% level of two-sided test, and at 5% level of one-sided test.
b) t = 1.35; cannot reject null at 10% of two-sided test, reject null at 10% of one-sided test.
c) t = 1.50; cannot reject null at 10% of two-sided test, reject null at 10% of one-sided test.
d) t = 1.20; cannot reject null at 10% of both two-sided and one-sided test.
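
These decisions can be reproduced mechanically. The sketch below assumes large-sample (standard normal) critical values and a one-sided alternative of H1: β1 > 0, which is an assumption chosen because all four estimates are positive; the function name `slope_test` is hypothetical.

```python
from statistics import NormalDist

def slope_test(beta_hat, se, levels=(0.10, 0.05, 0.01)):
    """Return the t-statistic for H0: beta1 = 0 and the rejection decisions."""
    t = beta_hat / se
    z = NormalDist()  # large-sample critical values from the standard normal
    decisions = {}
    for alpha in levels:
        decisions[alpha] = {
            "two_sided": abs(t) > z.inv_cdf(1 - alpha / 2),
            # Assumed one-sided alternative: H1: beta1 > 0.
            "one_sided": t > z.inv_cdf(1 - alpha),
        }
    return t, decisions

cases = {"a": (4.2, 2.4), "b": (0.5, 0.37), "c": (0.003, 0.002), "d": (360, 300)}
results = {label: slope_test(b, se) for label, (b, se) in cases.items()}

for label, (t, decisions) in results.items():
    print(label, round(t, 2), decisions)
```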

3) Explain carefully the relationship between a confidence interval, a one-sided hypothesis test, and a
two-sided hypothesis test. What is the unit of measurement of the t-statistic?
Answer: In the case of a two-sided hypothesis test, the relationship between the t-statistic and the
confidence interval is straightforward. The t-statistic calculates the distance between the estimate and the
hypothesized value in standard deviations. If the distance is larger than 1.96 (size of the test: 5%), then the
distance is large enough to reject the null hypothesis. The confidence interval adds and subtracts 1.96
standard deviations in this case, and asks whether or not the hypothesized value is contained within the
confidence interval. Hence the two concepts resemble the two sides of a coin. They are simply different
ways to look at the same problem. In the case of the one-sided test, the relationship is more complex.
Since you are looking at a one-sided alternative, it does not really make sense to construct a confidence
interval. However, the confidence interval results in the same conclusion as the t-test if the critical value
from the standard normal distribution is appropriately adjusted, e.g. to 10% rather than 5%. The unit of
measurement of the t-statistic is standard deviations.

4) The effect of decreasing the student-teacher ratio by one is estimated to result in an improvement of the
districtwide score by 2.28 with a standard error of 0.52. Construct a 90% and 99% confidence interval for
the size of the slope coefficient and the corresponding predicted effect of changing the student-teacher
ratio by one. What is the intuition on why the 99% confidence interval is wider than the 90% confidence
interval?
Answer:
The 90% confidence interval for the slope is calculated as follows:
(2.28 - 1.645 × 0.52, 2.28 + 1.645 × 0.52) = (1.42, 3.14).

The corresponding predicted effect of a unit change in the student-teacher ratio is the same, since the
change in X is 1.
The 99% confidence interval for the slope coefficient and the unit change in the student-teacher ratio is:

(2.28 - 2.58 × 0.52, 2.28 + 2.58 × 0.52) = (0.94, 3.62).

The 99% confidence interval corresponds to a smaller size of the test. This means that you want to be
"more certain" that the population parameter is contained in the interval, and that requires a larger
interval.
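
As a quick check on the two intervals, the following sketch computes a large-sample confidence interval from an estimate and its standard error. The function name `conf_interval` is illustrative, and the normal quantiles come from the standard library rather than from a printed table, so the critical values carry a few more digits than 1.645 and 2.58.

```python
from statistics import NormalDist

def conf_interval(estimate, se, level):
    """Two-sided large-sample confidence interval at the given level (e.g. 0.90)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.645 for level = 0.90
    return estimate - z * se, estimate + z * se

ci_90 = conf_interval(2.28, 0.52, 0.90)
ci_99 = conf_interval(2.28, 0.52, 0.99)

print([round(v, 2) for v in ci_90])
print([round(v, 2) for v in ci_99])
```

Raising the confidence level raises the quantile `z`, which is the mechanical reason the 99% interval is wider.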

5) Below you are asked to decide on whether or not to use a one-sided alternative or a two-sided
alternative hypothesis for the slope coefficient. Briefly justify your decision.
(a) $\hat{q}_i^d = \hat\beta_0 + \hat\beta_1 p_i$, where $q^d$ is the quantity demanded for a good, and $p$ is its price.
(b) $\hat{p}_i^{actual} = \hat\beta_0 + \hat\beta_1 p_i^{assess}$, where $p_i^{actual}$ is the actual house price, and $p_i^{assess}$ is the assessed house price.
You want to test whether or not the assessment is correct, on average.
(c) $\hat{C}_i = \hat\beta_0 + \hat\beta_1 Y_i^d$, where $C$ is household consumption, and $Y^d$ is personal disposable income.
Answer:
(a) You would use a one-sided alternative hypothesis since economic theory suggests that the quantity
demanded and prices are negatively related.
(b) The alternative hypothesis is H1 : β1 ≠ 1 since assessments could be too large or too small, on average.
You should also test for H1 : β0 ≠ 0.
(c) You should use a one-sided alternative hypothesis, since economic theory strongly suggests that the
marginal propensity to consume is positive.

6) (Requires Appendix material) Your textbook shows that OLS is a linear estimator $\hat\beta_1 = \sum_{i=1}^n \hat{a}_i Y_i$, where
$\hat{a}_i = \frac{X_i - \bar{X}}{\sum_{j=1}^n (X_j - \bar{X})^2}$. For OLS to be conditionally unbiased, the following two conditions must hold:
$\sum_{i=1}^n \hat{a}_i = 0$ and $\sum_{i=1}^n \hat{a}_i X_i = 1$. Show that this is the case.

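Answer: Since $\sum_{i=1}^n (X_i - \bar{X}) = 0$, the first condition follows directly:
$\sum_{i=1}^n \hat{a}_i = \frac{\sum_{i=1}^n (X_i - \bar{X})}{\sum_{j=1}^n (X_j - \bar{X})^2} = 0$. For the second, note that
$\sum_{i=1}^n (X_i - \bar{X}) X_i = \sum_{i=1}^n (X_i - \bar{X})^2 + \bar{X} \sum_{i=1}^n (X_i - \bar{X}) = \sum_{i=1}^n (X_i - \bar{X})^2$,
so that $\sum_{i=1}^n \hat{a}_i X_i = 1$.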

7) (Requires Appendix material and Calculus) Equation (5.36) in your textbook derives the conditional
variance for any old conditionally unbiased estimator $\tilde\beta_1$ to be $var(\tilde\beta_1 | X_1, ..., X_n) = \sigma_u^2 \sum_{i=1}^n a_i^2$, where the
conditions for conditional unbiasedness are $\sum_{i=1}^n a_i = 0$ and $\sum_{i=1}^n a_i X_i = 1$. As an alternative to the BLUE
proof presented in your textbook, you recall from one of your calculus courses that you could minimize
the variance subject to the two constraints, thereby making the variance as small as possible while the
constraints are holding. Show that in doing so you get the OLS weights $\hat{a}_i$. (You may assume that X1,...,
Xn are nonrandom (fixed over repeated samples).)
Answer:
The Lagrangian is $L = \sigma_u^2 \sum_{i=1}^n a_i^2 - \lambda_1 \sum_{i=1}^n a_i - \lambda_2 \left( \sum_{i=1}^n a_i X_i - 1 \right)$, $i = 1, ..., n$, where $\lambda_1$ and $\lambda_2$ are two Lagrangian
multipliers. Minimizing the Lagrangian w.r.t. the n weights $a_i$ and the two Lagrangian multipliers results
in (n+2) linear equations in (n+2) unknowns. Solving these for the weights, you get
$a_i = \frac{X_i - \bar{X}}{\sum_{j=1}^n (X_j - \bar{X})^2} = \hat{a}_i$.
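
A numerical check of this result, under stated assumptions: the sketch takes the Lagrangian to be $\sigma_u^2 \sum a_i^2 - \lambda_1 \sum a_i - \lambda_2 (\sum a_i X_i - 1)$ (the sign convention for the multipliers is a choice made here), plugs in the OLS weights, and verifies that the first-order conditions $2\sigma_u^2 a_i - \lambda_1 - \lambda_2 X_i = 0$ and both constraints hold. The X values and the function names are made up for illustration.

```python
def ols_weights(x):
    """OLS weights a_i = (X_i - Xbar) / sum_j (X_j - Xbar)^2."""
    xbar = sum(x) / len(x)
    ssx = sum((xi - xbar) ** 2 for xi in x)
    return [(xi - xbar) / ssx for xi in x]

def check_first_order_conditions(x, sigma2=1.0):
    """Return (largest FOC violation, sum of a_i, sum of a_i * X_i)."""
    a = ols_weights(x)
    xbar = sum(x) / len(x)
    ssx = sum((xi - xbar) ** 2 for xi in x)
    # Multipliers implied by the first-order conditions at the OLS weights.
    lam2 = 2 * sigma2 / ssx
    lam1 = -lam2 * xbar
    foc = [2 * sigma2 * ai - lam1 - lam2 * xi for ai, xi in zip(a, x)]
    return max(abs(f) for f in foc), sum(a), sum(ai * xi for ai, xi in zip(a, x))

x_values = [1.0, 2.0, 4.0, 7.0, 11.0]  # illustrative fixed regressors
max_violation, sum_a, sum_ax = check_first_order_conditions(x_values)
print(max_violation, sum_a, sum_ax)
```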

8) Your textbook states that under certain restrictive conditions, the t-statistic has a Student t-distribution
with n-2 degrees of freedom. The loss of two degrees of freedom is the result of OLS forcing two
restrictions onto the data. What are these two conditions, and when did you impose them onto the data
set in your derivation of the OLS estimator?
Answer: The two conditions are $\sum_{i=1}^n \hat{u}_i = 0$ and $\sum_{i=1}^n \hat{u}_i X_i = 0$. These were the result of minimizing the sum
of the squared prediction mistakes, i.e., taking the derivatives of that sum with respect to b0 and b1 and
setting them to zero.

9) Assume that your population regression function is

Yi = β1Xi + ui

i.e., a regression through the origin (no intercept). Under the homoskedastic normal regression
assumptions, the t-statistic will have a Student t distribution with n-1 degrees of freedom, not n–2 degrees
of freedom, as was the case in Chapter 5 of your textbook. Explain. Do you think that the residuals will
still sum to zero for this case?
Answer: In deriving the OLS estimator $\hat\beta_1$, you minimize the sum of squared prediction mistakes w.r.t. b1
only, not b0 and b1. As a result, you are only placing one restriction on the data ($\sum_{i=1}^n \hat{u}_i X_i = 0$), not two.
Hence there are n-1 independent observations. $\sum_{i=1}^n \hat{u}_i = 0$ will no longer hold in general.
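
A small numerical illustration, using made-up data: fitting a regression through the origin leaves residuals that are orthogonal to X but that do not sum to zero. The helper `fit_through_origin` is a hypothetical name.

```python
def fit_through_origin(x, y):
    """OLS slope for Y = b1 * X (no intercept) and the resulting residuals."""
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
    resid = [yi - b1 * xi for xi, yi in zip(x, y)]
    return b1, resid

# Illustrative data whose best-fitting line clearly has a nonzero intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]

b1, resid = fit_through_origin(x, y)
sum_resid_x = sum(u * xi for u, xi in zip(resid, x))  # the one imposed restriction
sum_resid = sum(resid)                                # not forced to zero

print(b1, sum_resid_x, sum_resid)
```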

10) In many of the cases discussed in your textbook, you test for the significance of the slope at the 5%
level. What is the size of the test? What is the power of the test? Why is the probability of committing a
Type II error so large here?
Answer: The size of the test is the same as the probability of committing a Type I error. It is therefore 5%.
If the alternative hypothesis is vague, as is the case for H1 : β1 ≠ 0 or H1 : β1 < 0 (or H1 : β1 > 0), then the
distribution of the alternative hypothesis is located virtually on top of the distribution of the null
hypothesis (it is just marginally moved to the left or the right). As a result, the probability of the Type II
error must be 1-probability of the Type I error. Hence the power of the test is only 5%, which is low.

11) Assume that the homoskedastic normal regression assumptions hold. Using the Student t-distribution,
find the critical value for each of the following situations:

(a) n = 28, 5% significance level, one-sided test.


(b) n = 40, 1% significance level, two-sided test.
(c) n = 10, 10% significance level, one-sided test.
(d) n = ∞, 5% significance level, two-sided test.
Answer:
(a) 1.71
(b) between 2.75 (30 degrees of freedom) and 2.66 (60 degrees of freedom)
(c) 1.40
(d) 1.96

12) Consider the following two models involving binary variables as explanatory variables:

$\widehat{Wage} = \hat\beta_0 + \hat\beta_1 DFemme$ and $\widehat{Wage} = \hat\phi_1 DFemme + \hat\phi_2 Male$

where Wage is the hourly wage rate, DFemme is a binary variable that is equal to 1 if the person is a
female, and 0 if the person is a male. Male = 1 – DFemme. Even though you have not learned about
regression functions with two explanatory variables (or regressions without an intercept), assume that
you had estimated both models, i.e., you obtained the estimates for the regression coefficients.

What is the predicted wage for a male in the two models? What is the predicted wage for a female in the
two models? What is the relationship between the β s and the φs? Why would you prefer one model over
the other?
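Answer: Write $\hat\phi_1$ for the coefficient on DFemme and $\hat\phi_2$ for the coefficient on Male. For DFemme = 1, the
models read $\widehat{Wage} = \hat\beta_0 + \hat\beta_1$ and $\widehat{Wage} = \hat\phi_1$; for DFemme = 0, the models
read $\widehat{Wage} = \hat\beta_0$ and $\widehat{Wage} = \hat\phi_2$. Hence both $\hat\beta_0$ and $\hat\phi_2$ give you the average wage of males. Clearly
$\hat\beta_0 = \hat\phi_2$. Since the wage for females is $\hat\beta_0 + \hat\beta_1 = \hat\phi_1$, and the wage for males is $\hat\beta_0$, then
$\hat\beta_1 = \hat\phi_1 - \hat\phi_2$ must be the difference in the wage between males and females. Hence the first
formulation allows you to test directly whether or not the difference in means (here wages) is
statistically significant.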

13) Consider the sample regression function $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$. The table below lists estimates for the slope
($\hat\beta_1$) and the variance of the slope estimator ($\hat\sigma^2_{\hat\beta_1}$). In each case calculate the p-value for the null
hypothesis of β1 = 0 and a two-tailed alternative hypothesis. Indicate in which case you would reject the
null hypothesis at the 5% significance level.

Case                           1        2           3        4
Slope estimate                 -1.76    0.0025      2.85     -0.00014
Variance of slope estimator    0.37     0.000003    117.5    0.0000013

Answer: Using the table values as printed, the t-statistics are -2.89, 1.44, 0.26, and -0.123 respectively,
with p-values of 0.004, 0.15, 0.79, and 0.90. Hence you only reject the null hypothesis for the first case.
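
The p-values can be recomputed from the table with a few lines. This is a sketch assuming the large-sample normal approximation for the t-statistic; the function name `two_sided_p` is illustrative, and note that the table reports variances, so the square root is taken before dividing.

```python
import math
from statistics import NormalDist

def two_sided_p(beta_hat, variance):
    """t-statistic and two-sided p-value for H0: beta1 = 0 (normal approximation)."""
    t = beta_hat / math.sqrt(variance)  # standard error is the square root of the variance
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# (slope estimate, variance of slope estimator) for the four cases in the table
cases = [(-1.76, 0.37), (0.0025, 0.000003), (2.85, 117.5), (-0.00014, 0.0000013)]
results = [two_sided_p(b, v) for b, v in cases]
rejected = [p < 0.05 for _, p in results]

for (t, p), rej in zip(results, rejected):
    print(round(t, 3), round(p, 3), rej)
```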

14) Your textbook discussed the regression model when X is a binary variable

Yi = β0 + β1Di + ui, i = 1, ..., n

Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the slope
coefficient, prove that $\hat\beta_1$ is the difference between the average wage for females and the average wage
for males.
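Answer: Using the OLS formula for the slope, we have $\hat\beta_1 = \frac{\sum_{i=1}^n (D_i - \bar{D})(Y_i - \bar{Y})}{\sum_{i=1}^n (D_i - \bar{D})^2}$. Let $n_F$ and $n_M$ be the
numbers of females and males, so that $\bar{D} = n_F/n$, and let $\bar{Y}_F$ and $\bar{Y}_M$ be the two group averages. The
numerator equals $\sum_{i=1}^n D_i Y_i - n \bar{D} \bar{Y} = n_F (\bar{Y}_F - \bar{Y})$, and the denominator equals $n_F - n_F^2/n = n_F n_M / n$.
Since $\bar{Y} = (n_F \bar{Y}_F + n_M \bar{Y}_M)/n$, we have $\bar{Y}_F - \bar{Y} = n_M (\bar{Y}_F - \bar{Y}_M)/n$, and hence $\hat\beta_1 = \bar{Y}_F - \bar{Y}_M$.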

15) Your textbook discussed the regression model when X is a binary variable

Yi = β0 + β1Di + ui, i = 1,..., n

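Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the
intercept coefficient, prove that $\hat\beta_0$ is the average wage for males.
Answer: $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{D}$. It is easy but tedious to show that the formula for the slope reduces to the
difference between the average wage for females and the average wage for males, $\hat\beta_1 = \bar{Y}_F - \bar{Y}_M$.
With $n_F$ females and $n_M$ males, $\bar{D} = n_F/n$ and $\bar{Y} = (n_F \bar{Y}_F + n_M \bar{Y}_M)/n$, so that
$\hat\beta_0 = \bar{Y} - (\bar{Y}_F - \bar{Y}_M) n_F/n = \bar{Y}_M$.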
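
Both results for a binary regressor can be confirmed numerically. The sketch below uses a small made-up wage sample (the numbers are purely illustrative) and a hand-written `ols` helper, then compares the fitted coefficients with the group means.

```python
def ols(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = ybar - b1 * xbar
    return b0, b1

# Illustrative data: d = 1 for females, 0 for males.
d = [1, 0, 0, 1, 1, 0, 0]
wage = [15.0, 21.0, 19.5, 17.2, 16.1, 22.3, 18.0]

b0, b1 = ols(d, wage)
females = [w for w, di in zip(wage, d) if di == 1]
males = [w for w, di in zip(wage, d) if di == 0]
mean_f = sum(females) / len(females)
mean_m = sum(males) / len(males)

print(b0, mean_m)           # intercept vs. male average
print(b1, mean_f - mean_m)  # slope vs. female-minus-male difference
```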

16) Let $u_i$ be distributed $N(0, \sigma_u^2)$, i.e., the errors are distributed normally with a constant variance
(homoskedasticity). This results in $\hat\beta_1$ being distributed $N(\beta_1, \sigma^2_{\hat\beta_1})$, where $\sigma^2_{\hat\beta_1} = \frac{\sigma_u^2}{\sum_{i=1}^n (X_i - \bar{X})^2}$. Statistical
inference would be straightforward if $\sigma_u^2$ was known. One way to deal with this problem is to replace
$\sigma_u^2$ with an estimator $s_{\hat{u}}^2$. Clearly since this introduces more uncertainty, you cannot expect the standardized $\hat\beta_1$ to be still
normally distributed. Indeed, the t-statistic now follows Student's t distribution. Look at the table for the
Student t-distribution and focus on the 5% two-sided significance level. List the critical values for 10
degrees of freedom, 30 degrees of freedom, 60 degrees of freedom, and finally ∞ degrees of freedom.
Describe how the notion of uncertainty about $\sigma_u^2$ can be incorporated about the tails of the t-distribution
as the degrees of freedom increase.



Answer: The critical values are 2.23 (10 degrees of freedom), 2.04 (30), 2.00 (60), and 1.96 (∞). More
uncertainty implies that the tails of the distribution should be stretched further to the left
and right when compared to the normal distribution. Hence the critical values for the 5% significance
level should be greater than 1.96 in absolute value. However, as the number of observations (degrees of
freedom) increases, $s_{\hat{u}}^2$ will converge towards $\sigma_u^2$, so that the shape of the t-distribution should resemble
the normal distribution more and more. Finally, when there are infinite degrees of freedom, the sample
formula $s_{\hat{u}}^2$ becomes the population variance, and the t-distribution should converge to the normal
distribution.

17) In a Monte Carlo study, econometricians generate multiple sample regression functions from a known
population regression function. For example, the population regression function could be Yi = β0 + β1Xi =
100 – 0.5 Xi. The Xs could be generated randomly or, for simplicity, be nonrandom ("fixed over repeated
samples"). If we had ten of these Xs, say, and generated twenty Ys, we would obviously always have all
observations on a straight line, and the least squares formulae would always return values of 100 and –0.5
numerically. However, if we added an error term, where the errors would be drawn randomly from a
normal distribution, say, then the OLS formulae would give us estimates that differed from the
population regression function values. Assume you did just that and recorded the values for the slope
and the intercept. Then you did the same experiment again (each one of these is called a "replication").
And so forth. After 1,000 replications, you plot the 1,000 intercepts and slopes, and list their summary
statistics.

Sample: 1 1000

                 BETA0_HAT     BETA1_HAT
Mean               100.014        -0.500
Median             100.021        -0.500
Maximum            106.348        -0.468
Minimum             93.862        -0.538
Std. Dev.            1.994         0.011
Skewness             0.013        -0.042
Kurtosis             3.026         2.986
Jarque-Bera          0.055         0.305
Probability          0.973         0.858
Sum             100014.353      -499.857
Sum Sq. Dev.      3972.403         0.118
Observations      1000.000      1000.000

Here are the corresponding graphs:

Using the means listed next to the graphs, you see that the averages are not exactly 100 and –0.5.
However, they are "close." Test for the difference of these averages from the population values to be
statistically significant.

Answer: You can use a simple t-statistic to calculate whether or not (-0.499857) and 100.0144 are
statistically different from (-0.5) and 100. In the denominator of that statistic you would simply put the
standard deviations (0.0109 and 1.9941) divided by the square root of 1,000. As you can see,
$t = \frac{-0.499857 - (-0.50)}{0.0109 / \sqrt{1000}} = 0.41$ and $t = \frac{100.0144 - 100}{1.9941 / \sqrt{1000}} = 0.23$. Neither one of the estimators is more than
1.96 standard errors from the true value, and hence you cannot reject the null hypothesis that the estimators
are unbiased.
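
The experiment and the test can be sketched in a few lines. Everything about the design below is an assumption made for illustration (ten fixed X values on a grid from 10 to 100, normal errors with standard deviation 1, a fixed seed), so the simulated summary statistics will not match the table above; only the last two lines use the numbers reported in the question.

```python
import math
import random
import statistics

def ols(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    return ybar - b1 * xbar, b1

def monte_carlo(n_reps=1000, beta0=100.0, beta1=-0.5, sigma=1.0, seed=0):
    """Generate n_reps samples from Y = beta0 + beta1 * X + u and re-estimate each time."""
    rng = random.Random(seed)
    x = [10.0 * k for k in range(1, 11)]  # assumed design, fixed over repeated samples
    b0s, b1s = [], []
    for _ in range(n_reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = ols(x, y)
        b0s.append(b0)
        b1s.append(b1)
    return b0s, b1s

def t_from_summary(mean, true_value, sd, n_reps):
    """t-statistic for H0: E(estimator) = true_value, from summary statistics."""
    return (mean - true_value) / (sd / math.sqrt(n_reps))

b0s, b1s = monte_carlo()
t_sim_slope = t_from_summary(statistics.mean(b1s), -0.5, statistics.stdev(b1s), len(b1s))
t_sim_intercept = t_from_summary(statistics.mean(b0s), 100.0, statistics.stdev(b0s), len(b0s))

# Same test applied to the summary statistics reported in the question.
t_slope = t_from_summary(-0.499857, -0.5, 0.0109, 1000)
t_intercept = t_from_summary(100.0144, 100.0, 1.9941, 1000)
print(t_sim_slope, t_sim_intercept, t_slope, t_intercept)
```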
18) In the regression through the origin model Yi = β1Xi + ui, the OLS estimator is $\hat\beta_1 = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2}$. Prove
that the estimator is a linear function of Y1,..., Yn and prove that it is conditionally unbiased.
Answer: Let $w_i = \frac{X_i}{\sum_{j=1}^n X_j^2}$, then $\hat\beta_1 = \sum_{i=1}^n w_i Y_i$. Hence the OLS estimator is a linear function of Y1,..., Yn. Next,
since Yi = β1Xi + ui, we get

$\hat\beta_1 = \sum_{i=1}^n w_i (\beta_1 X_i + u_i) = \beta_1 \sum_{i=1}^n w_i X_i + \sum_{i=1}^n w_i u_i$.

Since $w_i = \frac{X_i}{\sum_{j=1}^n X_j^2}$, we have $\sum_{i=1}^n w_i X_i = \frac{\sum_{i=1}^n X_i^2}{\sum_{i=1}^n X_i^2} = 1$, which implies $\hat\beta_1 = \beta_1 + \sum_{i=1}^n w_i u_i$. Taking expectations on both sides, we find

$E(\hat\beta_1) = \beta_1 + E\left( \sum_{i=1}^n w_i u_i \right) = \beta_1 + E\left( \frac{\frac{1}{n} \sum_{i=1}^n X_i u_i}{\frac{1}{n} \sum_{i=1}^n X_i^2} \right) = \beta_1 + E\left( \frac{\frac{1}{n} \sum_{i=1}^n X_i E(u_i | X_1, ..., X_n)}{\frac{1}{n} \sum_{i=1}^n X_i^2} \right) = \beta_1$

The last equality follows by using the law of iterated expectations. By least squares assumptions, ui is
distributed independently of X for all observations other than i, so $E(u_i | X_1, ..., X_n) = E(u_i | X_i) = 0$. Hence
$E(\hat\beta_1) = \beta_1$.

19) The neoclassical growth model predicts that for identical savings rates and population growth rates,
countries should converge to the same per capita income level. This is referred to as the convergence
hypothesis. One way to test for the presence of convergence is to compare the growth rates over time to
the initial starting level, i.e., to run the regression $\widehat{g6090} = \hat\beta_0 + \hat\beta_1 \times RelProd60$, where g6090 is the
average annual growth rate of GDP per worker for the 1960-1990 sample period, and RelProd60 is GDP
per worker relative to the United States in 1960. Under the null hypothesis of no convergence, β1 = 0; H1 :
β1 < 0, implying ("beta") convergence. Using a standard regression package, you get the following output:

Dependent Variable: G6090


Method: Least Squares
Date: 07/11/06 Time: 05:46
Sample: 1 104
Included observations: 104
White Heteroskedasticity-Consistent Standard Errors & Covariance

You are delighted to see that this program has already calculated p-values for you. However, a peer of
yours points out that the correct p-value should be 0.4562. Who is right?
Answer: Statistical packages typically do not know what the alternative hypothesis is. As a result, the
packages calculate t-statistics and p-values for H1 : β1 ≠ 0. You can tell your fellow student that she is
right and you will still have to calculate p-values (and t-statistics) by hand for cases other than
H1 : β1 ≠ 0.

20) Changing the units of measurement obviously will have an effect on the slope of your regression
function. For example, let Y* = aY and X* = bX. Then it is easy but tedious to show that
$\hat\beta_1^* = \frac{\sum_{i=1}^n x_i^* y_i^*}{\sum_{i=1}^n x_i^{*2}} = \frac{a}{b} \hat\beta_1$, where lowercase letters denote deviations from sample means. Given this result, how do you
think the standard errors and the regression R2 will change?
Answer: Statistical inference should not depend on whim, and hence changes in the units of
measurement cannot have an effect on the regression R2. Also, the t-statistics should not change, and
hence $SE(\hat\beta_1^*)$ must change accordingly ($SE(\hat\beta_1^*) = \frac{a}{b} \times SE(\hat\beta_1)$).
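
The invariance claims can be checked on a toy data set. In the sketch below the data, the scale factors a = 100 and b = 0.5, and the helper `ols_with_se` are all illustrative; the standard error computed is the homoskedasticity-only one, chosen only to keep the code short.

```python
import math

def ols_with_se(x, y):
    """Return (slope, homoskedasticity-only SE of the slope, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    return b1, se_b1, 1 - ssr / tss

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]
a, b = 100.0, 0.5  # assumed rescaling: Y* = aY, X* = bX

b1, se, r2 = ols_with_se(x, y)
b1_star, se_star, r2_star = ols_with_se([b * xi for xi in x], [a * yi for yi in y])

print(b1_star / b1, se_star / se)    # both ratios equal a/b
print(r2, r2_star)                   # unchanged
print(b1 / se, b1_star / se_star)    # t-statistic unchanged
```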

21) Using the California School data set from your textbook, you run the following regression:

$\widehat{TestScore} = 698.9 - 2.28 \times STR$

n = 420, SER = 18.6

where TestScore is the average test score in the district and STR is the student-teacher ratio. The sample
standard deviation of test scores is 19.05, and the sample standard deviation of the student-teacher ratio is
1.89.

a. Find the regression R2 and the correlation coefficient between test scores and the student-teacher ratio.

b. Find the homoskedasticity-only standard error of the slope.

Answer:
a. $R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{144611.3}{152490.6} = 0.051$
The correlation coefficient is the (negative) square root of this, or (-0.23).
b. Using formula (5.29), you get $SE(\hat\beta_1) = \frac{18.6}{38.8} = 0.48$
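
The steps can be retraced from the summary statistics alone. The sketch below assumes SER = 18.6 (the value used in the answer) and builds SSR as SER² × (n - 2), TSS as the sample variance of test scores times (n - 1), and the slope's standard error as SER divided by the square root of the sum of squared deviations of STR. Because the inputs are rounded, the results agree with the printed answer only approximately.

```python
import math

def summary_regression_stats(n, ser, sd_y, sd_x):
    """R^2, correlation magnitude, and homoskedasticity-only SE of the slope."""
    ssr = ser**2 * (n - 2)     # sum of squared residuals implied by the SER
    tss = sd_y**2 * (n - 1)    # total sum of squares implied by the sample s.d. of Y
    r2 = 1 - ssr / tss
    se_slope = ser / (sd_x * math.sqrt(n - 1))
    return ssr, tss, r2, math.sqrt(r2), se_slope

ssr, tss, r2, abs_corr, se_slope = summary_regression_stats(420, 18.6, 19.05, 1.89)
corr = -abs_corr  # negative because the estimated slope (-2.28) is negative

print(round(ssr, 1), round(tss, 1))
print(round(r2, 3), round(corr, 2), round(se_slope, 2))
```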

22) Using the California School data set from your textbook, you run the following regression:

$\widehat{TestScore} = 698.9 - 2.28 \times STR$

n = 420, R2 = 0.051, SER = 18.6

where TestScore is the average test score in the district and STR is the student-teacher ratio. Using
heteroskedasticity-robust standard errors, you find that the standard error of the slope is 0.52,
while choosing the homoskedasticity-only option, the standard error is 0.48.

a. Calculate the t-statistic for both standard errors.

b. Which of the two t-statistics should you base your inference on?
Answer:
a. The respective t-statistics are 4.39 (heteroskedasticity-robust standard error) and 4.75
(homoskedasticity-only standard error).

b. Given the similarity of the two statistics and the fact that both are greater than 4, it will not make much
of a difference which one you will use. However, it is "cleaner" to use the heteroskedasticity-robust
formula, since, in general, it will result in the correct inference procedure.

23) Using data from the Current Population Survey, you estimate the following relationship between
average hourly earnings (ahe) and the number of years of education (educ):

$\widehat{ahe} = -4.58 + 1.71 \times educ$

The heteroskedasticity-robust standard error on the slope is (0.03). Calculate the 95% confidence interval
for the slope. Repeat the exercise using the 90% and then the 99% confidence interval. Can you reject the
null hypothesis that the slope coefficient is zero in the population?
Answer: The 95% confidence interval for the slope is (1.65,1.77). For the 90% confidence level, you get
(1.66,1.76) while the interval is (1.63,1.79) for the 99% level. Since none of the confidence intervals
contains zero, you can comfortably reject the null hypothesis in all three cases.
