Econometrics Example Questions and Solutions
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
      lbwght |      1192    4.767457    .1879538   3.135494   5.602119
     lfaminc |      1192    3.275699    .7157919  -.6931472   4.174387
    fatheduc |      1192    13.18624    2.745985          1         18
where lbwght is the log of the birth weight of a child (in ounces), lfaminc is the log of family income at birth, and fatheduc is the years of schooling of the child's father. You are interested in the relationship between (family) income and birth weight and estimate the following regression

$$\mathit{lbwght}_i = \beta_0 + \beta_1\,\mathit{lfaminc}_i + u_i \tag{1}$$

which gives you the following results:

. reg lbwght lfaminc

      Source |       SS       df       MS              Number of obs =    1192
-------------+------------------------------           F(  1,  1190) =    7.16
       Model |  .251709024     1  .251709024           Prob > F      =  0.0075
    Residual |  ??????????  1190  .035144808           R-squared     =  ??????
-------------+------------------------------           Adj R-squared =  0.0051
       Total |    42.07403  1191  .035326641           Root MSE      =  .18747

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |   .0203099   .0075891     2.7   0.008     ????????    ????????
       _cons |   4.700928   .0254456   184.7   0.000     4.651004    4.750851
------------------------------------------------------------------------------
1. What is the R-squared of this regression? How do you interpret it?
ANSWER: R-squared = 0.2517/42.074 = 0.006, which is the fraction of the variation in log birth weight explained by log family income.

2. Compute the residual sum of squares.
ANSWER: RSS = 42.07 - 0.25 = 41.82
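As a quick check of the two answers above, the following Python sketch recomputes the R-squared and the residual sum of squares from the ANOVA table. The variable names are illustrative and not part of the original exercise; the numbers are copied from the Stata output of regression (1).

```python
# Sums of squares copied from the ANOVA table of regression (1).
model_ss = 0.251709024      # explained sum of squares (row "Model")
total_ss = 42.07403         # total sum of squares (row "Total")
residual_df = 1190          # residual degrees of freedom
residual_ms = 0.035144808   # residual mean square (column "MS")

# R-squared is the explained share of the total variation.
r_squared = model_ss / total_ss

# The residual sum of squares is the part of the total left unexplained.
residual_ss = total_ss - model_ss

# Cross-check: the residual SS should also equal residual MS times its df.
residual_ss_check = residual_ms * residual_df

print(f"R-squared = {r_squared:.4f}")
print(f"RSS       = {residual_ss:.4f} (check via MS * df: {residual_ss_check:.4f})")
```

The cross-check uses only the residual row of the table, so agreement between the two RSS values suggests the missing entry has been filled in consistently.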
3. Compute the 95% confidence interval for the coefficient on lfaminc. Can you reject at the 5% significance level that the coefficient on lfaminc equals 0.035?
ANSWER: [b - 1.96*se, b + 1.96*se] = [0.0054, 0.0352]. The value 0.035 is just inside the confidence interval, and we therefore do not reject the null.

4. What is the t-value corresponding to the null hypothesis that the intercept equals 5?
ANSWER: t = (4.7 - 5)/0.025 = -12 with rounded inputs (about -11.75 using the unrounded estimates).

5. Interpret the coefficient on lfaminc.
ANSWER: A 1% increase in family income increases birth weight by about 0.02%.
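The interval and the t-value in answers 3 and 4 can be reproduced with a few lines of Python. This is only a sketch: it uses the normal critical value 1.96 as the answer does (with 1190 degrees of freedom the exact t critical value is very close), and the variable names are illustrative.

```python
# Estimates copied from the coefficient table of regression (1).
slope, slope_se = 0.0203099, 0.0075891          # row "lfaminc"
intercept, intercept_se = 4.700928, 0.0254456   # row "_cons"

# Normal approximation to the 5% two-sided critical value (assumption
# taken from the answer; the exact t value for 1190 df is nearly identical).
critical_value = 1.96

# 95% confidence interval for the slope.
ci_lower = slope - critical_value * slope_se
ci_upper = slope + critical_value * slope_se

# A two-sided 5% test rejects H0: slope = 0.035 only if 0.035 is outside the interval.
reject_0035 = not (ci_lower <= 0.035 <= ci_upper)

# t-statistic for H0: intercept = 5.
t_intercept = (intercept - 5) / intercept_se

print(f"95% CI for lfaminc: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"Reject H0: slope = 0.035? {reject_0035}")
print(f"t-value for H0: intercept = 5: {t_intercept:.2f}")
```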
The following regression adds the education of the father in years (fatheduc) as an explanatory variable:
. reg lbwght lfaminc fatheduc

      Source |       SS       df       MS              Number of obs =    1192
-------------+------------------------------           F(  2,  1189) =    4.91
       Model |   .34438718     2   .17219359           Prob > F      =  0.0076
    Residual |  41.7296428  1189  .035096419           R-squared     =  0.0082
-------------+------------------------------           Adj R-squared =  0.0065
       Total |    42.07403  1191  .035326641           Root MSE      =  .18734

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |   .0147377    .008323     1.8   0.077    -.0015917    .0310671
    fatheduc |   .0035255   .0021695     1.6   0.104     -.000731    .0077821
       _cons |   4.672692   .0307978   151.7   0.000     4.612267    4.733116
------------------------------------------------------------------------------
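6. Suppose that the true population model is

$$\mathit{lbwght}_i = a_0 + a_1\,\mathit{lfaminc}_i + a_2\,\mathit{fatheduc}_i + \varepsilon_i \tag{2}$$

where $\varepsilon_i$ is the error term. Suppose that the regression model given by equation (2) satisfies Assumptions MLR.1-MLR.4 (linear in parameters, random sampling, sample variation in each explanatory variable, and zero conditional mean), which implies unbiasedness of OLS. Compute the estimated omitted variables bias from using the simple regression model given by equation (1). Use the omitted variables bias formula to determine the slope estimate from a regression of fatheduc on a constant and lfaminc.

ANSWER: This is an application of the omitted variables bias formula. The OLS slope estimator from the simple regression (1) is

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})\,\mathit{lbwght}_i}{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})^2}$$

We can now replace $\mathit{lbwght}_i$ with the true model (2) and get

$$\hat\beta_1 = a_1 + a_2\,\frac{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})\,\mathit{fatheduc}_i}{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})^2} + \frac{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})\,\varepsilon_i}{\sum_{i=1}^{n} (\mathit{lfaminc}_i - \overline{\mathit{lfaminc}})^2}$$

We see that we get a biased estimate of $a_1$. The ratio multiplying $a_2$ in the second term is the slope estimate ($\tilde\delta_1$) from a regression of $\mathit{fatheduc}_i$ on a constant and $\mathit{lfaminc}_i$. We can get an estimate of the bias directly from the omitted variables formula:

$$\text{Bias} = E[\hat\beta_1 - a_1 \mid \mathit{lfaminc}_i, \mathit{fatheduc}_i] = a_2\,\tilde\delta_1, \qquad \widehat{\text{Bias}} = .0203099 - .0147377 = .0055722$$

Rearranging this equation and using that $a_2$ is estimated to be .0035255, we get

$$\tilde\delta_1 = .0055722 / .0035255 \approx 1.58$$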
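The arithmetic behind the estimated bias and the implied auxiliary slope can be checked with a short Python sketch. The variable names are illustrative; the three coefficients are copied from the two Stata regressions of lbwght shown earlier.

```python
# Coefficient estimates copied from the Stata output.
slope_short = 0.0203099     # lfaminc coefficient, regression without fatheduc
slope_long = 0.0147377      # lfaminc coefficient, regression with fatheduc
fatheduc_coef = 0.0035255   # fatheduc coefficient, regression with fatheduc

# Estimated omitted variables bias: short-regression slope minus long-regression slope.
estimated_bias = slope_short - slope_long

# The bias equals fatheduc_coef times the slope from regressing fatheduc
# on a constant and lfaminc, so that slope is recovered by division.
implied_aux_slope = estimated_bias / fatheduc_coef

print(f"Estimated bias        = {estimated_bias:.7f}")
print(f"Implied auxiliary slope = {implied_aux_slope:.2f}")
```

Read this way, a one-unit increase in log family income goes together with roughly 1.6 more years of paternal schooling in this sample.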
The positive slope suggests that highly educated men have higher family income, and since highly educated men have children with higher birth weight, there is a positive omitted variables bias.

Lastly, we change the specification and now regress birth weight (in ounces) on lfaminc and fatheduc and estimate:

$$\mathit{bwght}_i = b_0 + b_1\,\mathit{lfaminc}_i + b_2\,\mathit{fatheduc}_i + \eta_i \tag{3}$$

. reg bwght lfaminc fatheduc

      Source |       SS       df       MS              Number of obs =    1192
-------------+------------------------------           F(  2,  1189) =    5.85
       Model |  4703.73859     2  2351.86929           Prob > F      =  0.0030
    Residual |  478199.818  1189  402.186558           R-squared     =  0.0097
-------------+------------------------------           Adj R-squared =  0.0081
       Total |  482903.556  1191  405.460585           Root MSE      =  20.055

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |    1.59812    .890968     1.8   0.073    -.1499252    3.346164
    fatheduc |   .4445568   .2322474     1.9   0.056    -.0111035    .9002172
       _cons |   108.4223   3.296877    32.9   0.000      101.954    114.8906
------------------------------------------------------------------------------
7. Interpret the coefficient on lfaminc.
ANSWER: A 1% increase in family income increases birth weight by 0.016 ounces (everything else equal).

8. A friend says that you cannot interpret your estimates in a causal manner because the R-squared is too low. What do you reply?
ANSWER: Causality is about whether the error term is mean independent of the explanatory variables (E[u|x] = 0). The R-squared does not say anything about this; it only says something about the explanatory power (fit) of the model.

9. Suppose you want to test whether 10% extra family income is equivalent to a year of extra paternal education. Write down the null hypothesis for this test in terms of $b_1$ and $b_2$.
ANSWER: A 10% increase in family income raises lfaminc by approximately 0.1, so the appropriate null hypothesis is $H_0: 0.1\,b_1 = b_2$.

10. Suppose that in one region a novel medical treatment for pregnant women was introduced in 2010. All pregnant women in this region received the treatment throughout their pregnancy (treated). In other regions pregnant women didn't receive treatment (control group). You have data on a child's birth weight, the family's income and the father's education for both treated and control groups, and for births before and (9 months) after the introduction of treatment. Explain how you would modify model (3) to evaluate the impact of treatment on a child's birth weight. Control explicitly for family income and father's education, and explain if and what new variables you generate. Write down the null hypothesis that treatment has no effect on birth weight.
ANSWER: This is an application of Difference-in-Differences (DiD). Recall that the DiD estimator is given by:

$$\text{DiD} = (\bar y_{\text{treated,after}} - \bar y_{\text{treated,before}}) - (\bar y_{\text{control,after}} - \bar y_{\text{control,before}})$$

where the notation is standard. The second difference removes the time trend from the first difference if we impose a common trend assumption across the two groups. Then we are left with the policy impact (everything else equal).
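The following equation can deliver the DiD estimator:

$$\mathit{bwght}_{it} = \beta_0 + \beta_1\,\mathit{lfaminc}_{it} + \beta_2\,\mathit{fatheduc}_{it} + \beta_3\,\mathit{time}_{it} + \beta_4\,\mathit{treated}_{it} + \beta_5\,\mathit{time}_{it} \times \mathit{treated}_{it} + u_{it}$$

The variable $\mathit{time}_{it}$ is a dummy that takes the value 1 if the observation comes from after the treatment was introduced, and 0 otherwise. The variable $\mathit{treated}_{it}$ is also a dummy: it takes the value 1 if the mother belongs to the treated group, and 0 otherwise. Then it turns out that $\beta_5$ is the DiD estimator. To see that, set $\mathit{lfaminc}_{it} = \mathit{fatheduc}_{it} = 0$ for simplicity. Then the expected birth weight in each group and period is:

              treated                                      control
   before     $\beta_0 + \beta_4$                          $\beta_0$
   after      $\beta_0 + \beta_3 + \beta_4 + \beta_5$      $\beta_0 + \beta_3$

Using the expression $\text{DiD} = (\bar y_{\text{treated,after}} - \bar y_{\text{treated,before}}) - (\bar y_{\text{control,after}} - \bar y_{\text{control,before}})$, you can see that the DiD estimator is equivalent to $\beta_5$, since $(\beta_3 + \beta_5) - \beta_3 = \beta_5$. The null hypothesis that treatment has no effect on birth weight is therefore $H_0: \beta_5 = 0$. (Notice that if the covariates change with time, then the last expression above will not be equivalent to $\beta_5$. An adjustment must be made to get the policy impact $\beta_5$.)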
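The before/after comparison of the four group means can be traced numerically. The Python sketch below is only an illustration: the helper `cell_mean` and the coefficient values are hypothetical (they are not estimates from any data), and the covariates are set to zero as in the text.

```python
def cell_mean(coefs, time, treated):
    """Expected birth weight for one group/period cell of the DiD regression,
    with lfaminc = fatheduc = 0. `coefs` holds the six coefficients in the
    order: constant, lfaminc, fatheduc, time, treated, time x treated."""
    b0, _b1, _b2, b3, b4, b5 = coefs
    return b0 + b3 * time + b4 * treated + b5 * time * treated

# Hypothetical coefficient values, chosen only for illustration.
coefs = (108.0, 1.6, 0.4, -0.5, 2.0, 3.0)

# Change over time within each group.
change_treated = cell_mean(coefs, 1, 1) - cell_mean(coefs, 0, 1)
change_control = cell_mean(coefs, 1, 0) - cell_mean(coefs, 0, 0)

# Difference-in-differences: treated change minus control change.
did = change_treated - change_control

print(f"Change in treated group: {change_treated}")
print(f"Change in control group: {change_control}")
print(f"DiD: {did} (interaction coefficient: {coefs[5]})")
```

Whatever values are chosen for the other coefficients, the common time effect and the group effect cancel, leaving only the coefficient on the interaction term.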