PS5 Answer Key
1 DAGs
1.1. For the following DAGs, state which variables we should control for in order to estimate
the causal effect of X on Y . (i.e., in other words, which variables do we need to control
for to satisfy the backdoor criterion). Also note if your “control strategy” is feasible or
not.
(a) [Figure: DAG with edges A → X, A → Z, B → Z, B → Y, Z → X, Z → Y, X → M, and M → Y]
We can start by writing down all of the paths from X to Y:
i. X → M → Y
ii. X ← A → Z ← B → Y
iii. X ← Z → Y
iv. X ← Z ← B → Y
v. X ← A → Z → Y
Paths (iii.), (iv.), and (v.) are all open backdoor paths. To satisfy the backdoor criterion
we must close them, and controlling for Z is sufficient to close all three. HOWEVER,
notice that Z is a collider on path (ii.), so controlling for Z opens up that backdoor
path! Therefore we need to add a control for either A or B. This is a feasible strategy
since all variables are observed.
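As a sanity check, this control strategy can be verified with a quick simulation. The linear structural equations and coefficient values below are invented for illustration (the DAG only fixes which arrows exist, not their strengths), and `ols` is a bare-bones least-squares helper:

```python
import random

def ols(y, X):
    # Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    M = [XtX[i] + [Xty[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

random.seed(0)
n = 20000
g = random.gauss
# DAG (a): A -> X, A -> Z, B -> Z, B -> Y, Z -> X, Z -> Y, X -> M, M -> Y
A = [g(0, 1) for _ in range(n)]
B = [g(0, 1) for _ in range(n)]
Z = [a + b + g(0, 1) for a, b in zip(A, B)]
X = [a + z + g(0, 1) for a, z in zip(A, Z)]
M = [x + g(0, 1) for x in X]
Y = [m + z + b + g(0, 1) for m, z, b in zip(M, Z, B)]  # true effect of X on Y = 1 (via M)

b_naive = ols(Y, [[1.0, x] for x in X])[1]                      # no controls
b_zonly = ols(Y, [[1.0, x, z] for x, z in zip(X, Z)])[1]        # Z only: opens path (ii.)
b_za = ols(Y, [[1.0, x, z, a] for x, z, a in zip(X, Z, A)])[1]  # Z and A: backdoors closed
```

Only the last regression, which controls for both Z and A, recovers the true effect of X on Y (equal to 1 here, running through M): controlling for nothing leaves paths (iii.)–(v.) open, and controlling for Z alone opens the collider path (ii.).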
(b) [Figure: DAG with S and U in the top row, X and Y in the middle, and A below]
Again, let’s write down all of the paths from X to Y:
i. X → Y
ii. X – A – Y
iii. X – S – Y
iv. X – A – S – Y
[Figure: DAG with edges U → X1, U → X3, X2 → X1, X2 → Y, V → X2, V → Y, X1 → Y, and X3 → Y; we regress Y on X1 , X2 , and X3 ]
X1 :
i. X1 → Y
ii. X1 ← U → X3 → Y
iii. X1 ← X2 → Y
iv. X1 ← X2 ← V → Y
So there are no open backdoors: (ii.) is closed by controlling for X3 , and (iii.) and (iv.)
are closed by controlling for X2 . (Note though the subtlety if we change the direction
of the arrows between Y /X2 and X1 /X2 : in that case we would not want to control
for X2 . In general, controlling for variables “downstream” of the outcome, or downstream
of the treatment, is not good.)
X3 :
i. X3 → Y
ii. X3 ← U → X1 → Y
iii. X3 ← U → X1 ← X2 → Y
iv. X3 ← U → X1 ← X2 ← V → Y
Notice that all backdoors here are closed due to our controls! So β̂3 is unbiased.
Note as well that controlling for X1 opens the collider at X1 in (iii.) and (iv.), but
thankfully we also control for X2 , so overall these backdoor paths are still closed.
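The same path logic can be checked by simulation, assuming the DAG implied by the path lists above (U → X1, U → X3, X2 → X1, X2 → Y, V → X2, V → Y, X1 → Y, X3 → Y), with a regression of Y on X1, X2, and X3 and with U and V unobserved. All structural coefficients are set to 1 purely for illustration, and `ols` is a bare-bones least-squares helper:

```python
import random

def ols(y, X):
    # Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    M = [XtX[i] + [Xty[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

random.seed(1)
n = 20000
g = random.gauss
# Assumed DAG: U -> X1, U -> X3, X2 -> X1, X2 -> Y, V -> X2, V -> Y, X1 -> Y, X3 -> Y
U = [g(0, 1) for _ in range(n)]
V = [g(0, 1) for _ in range(n)]
X2 = [v + g(0, 1) for v in V]
X1 = [u + x2 + g(0, 1) for u, x2 in zip(U, X2)]
X3 = [u + g(0, 1) for u in U]
Y = [x1 + x2 + x3 + v + g(0, 1) for x1, x2, x3, v in zip(X1, X2, X3, V)]

# Regress Y on X1, X2, X3 (U and V are unobserved, so they are left out)
b = ols(Y, [[1.0, x1, x2, x3] for x1, x2, x3 in zip(X1, X2, X3)])
b1, b2, b3 = b[1], b[2], b[3]
```

Here β̂1 and β̂3 come out at (approximately) their true values, while β̂2 absorbs the influence of the omitted V. That is fine for our purposes, since X2 is only a control: its coefficient has no causal interpretation.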
2 Heteroskedasticity
2.1. Which of the following are consequences of heteroskedasticity? (Relative to homoskedas-
ticity)
(a) The OLS estimators β̂j are biased.
(b) Our test of multiple hypotheses (F-test), is no longer valid.
(c) Our errors are no longer normally distributed.
First, remember that MLR.5 was not necessary to prove unbiasedness, so heteroskedasticity
does not cause bias. Second, note that the distribution and the variance of the errors are
separate (although somewhat related) assumptions (MLR.5 vs. MLR.6), so a violation of
MLR.5 does not by itself affect MLR.6.
So why is (b) correct? I.e. why can’t we do the F-test like we did before, if there is
heteroskedasticity? Unfortunately, understanding this deeply requires linear algebra and
is reserved for a masters level course. So we won’t spend much time here, but just know
that under the hood, if we want to do multiple hypothesis tests under heteroskedasticity
the computer is doing something a bit fancier than the “SSR” F-statistic we were working
with under homoskedasticity. I would simply put this fact on your cheat sheet.
Note: This is based on Wooldridge Problem 8.1
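Though the F-test itself is hard to demonstrate without matrix algebra, the underlying problem already shows up for a single coefficient's standard error. Here is a small Monte Carlo sketch (all numbers invented): when Var(u|x) = σ²x², the usual homoskedastic SE formula is systematically wrong, while a heteroskedasticity-robust (HC0-style) formula tracks the true sampling variation of β̂1:

```python
import math
import random
import statistics

random.seed(2)
n, reps = 300, 400
slopes, se_classical, se_robust = [], [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    # Heteroskedastic error: u = x * N(0,1), so Var(u|x) = x^2
    y = [1 + 2 * xi + xi * random.gauss(0, 1) for xi in x]
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se_classical.append(math.sqrt(s2 / sxx))  # usual homoskedastic formula
    # HC0 heteroskedasticity-robust standard error
    se_robust.append(math.sqrt(sum((r * (xi - xbar)) ** 2 for r, xi in zip(resid, x)) / sxx ** 2))
    slopes.append(b1)

mc_sd = statistics.stdev(slopes)            # "true" sampling sd of the slope
avg_classical = sum(se_classical) / reps
avg_robust = sum(se_robust) / reps
```

In this setup the classical formula substantially understates the slope's true sampling variation, while the robust formula is close to it. That mismatch is exactly why tests built on the homoskedastic variance formulas (including the "SSR" form of the F-statistic) are no longer valid.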
2.2. Consider a linear model to explain monthly beer consumption (note: inc is income):

beer = β0 + β1 inc + β2 price + β3 educ + β4 female + u
E(u|x) = 0
Var(u|x) = σ² inc²

Since the error standard deviation is proportional to inc, we can divide the whole equation through by inc:

(1/inc) beer = (1/inc)(β0 + β1 inc + β2 price + β3 educ + β4 female + u)

beer/inc = β0 (1/inc) + β1 + β2 (price/inc) + β3 (educ/inc) + β4 (female/inc) + u/inc

The transformed error satisfies Var(u/inc | x) = σ². Therefore, if we divide all of our
variables by inc, we have a homoskedastic model instead. The slope coefficients β2 , β3 ,
and β4 are unchanged; the only difference is that β1 now plays the role of the intercept
of the transformed model, while β0 becomes the coefficient on the new regressor 1/inc.
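To see the transformation in action, here is a simulation sketch (the coefficient values and variable distributions are made up for illustration; `ols` is a bare-bones least-squares helper). We generate data with Var(u|x) = σ² inc², divide everything through by inc, and run OLS on the transformed variables:

```python
import random

def ols(y, X):
    # Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    M = [XtX[i] + [Xty[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

random.seed(3)
n = 20000
beta = [5.0, 2.0, -1.0, 0.5, -0.8]  # invented values for beta0..beta4
yt, Xt = [], []
for _ in range(n):
    inc = random.uniform(1, 5)
    price = random.gauss(0, 1)
    educ = random.uniform(8, 16)
    female = 1.0 if random.random() < 0.5 else 0.0
    u = inc * random.gauss(0, 1)  # Var(u|x) = sigma^2 * inc^2 with sigma = 1
    beer = beta[0] + beta[1] * inc + beta[2] * price + beta[3] * educ + beta[4] * female + u
    # Divide everything through by inc: the transformed error u/inc is homoskedastic
    yt.append(beer / inc)
    Xt.append([1.0 / inc, 1.0, price / inc, educ / inc, female / inc])

b = ols(yt, Xt)
# b[0] estimates beta0 (now the coefficient on 1/inc); b[1] estimates beta1 (now the intercept)
```

The transformed regression recovers all five β's: β1 as the intercept and β0 as the coefficient on 1/inc. This is exactly weighted least squares with weighting function h(x) = inc².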
3 Missing Data
3.1. Say we are interested in analyzing the relationship between campus crime and student
enrollment:
log(crime) = β0 + β1 log(enroll) + u
However, the only schools that show up in our sample are those that reported crimes. In
other words we are missing data from the schools who did not report crimes.
(a) What types of missingness would be “ok” in the sense that we wouldn’t be worried
about it biasing our regression analyses?
Of the “named” types: “Missing Completely at Random” (MCAR) and “Missing at Random” (MAR).
In a general sense, any missingness that is not associated with the error term is “ok”!
(b) Describe a likely/possible reason why the types of missingness you listed in (a) would
not hold for this example.
All we need to do is think of a story where missingness is associated with the error
term, and two possibilities come to mind. One is that schools in dangerous locations
may be incentivized not to report crime. Then it is not a random sample: more schools
with high u are missing. The opposite may also be true: if schools have no crime to
report at all, they may not report anything and so not be in the sample, in which case
more schools with low u are missing. (Note that this example could easily be extended
to measurement error as well: schools with more crime fudge their numbers downwards,
which means the measurement error would be related to the error term u.)
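A simulation sketch of the second story (all parameter values invented): schools only appear in the sample when log(crime) is high enough, so missingness depends on u, and OLS on the observed subsample is biased (here, attenuated):

```python
import random

random.seed(4)
n = 50000
b0_true, b1_true = 1.0, 1.2                                     # invented "true" coefficients
le = [random.gauss(0, 1) for _ in range(n)]                     # log(enroll)
lc = [b0_true + b1_true * x + random.gauss(0, 1) for x in le]   # log(crime)

def slope(xs, ys):
    # OLS slope of ys on xs (with an intercept)
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return num / sum((x - xbar) ** 2 for x in xs)

b_full = slope(le, lc)  # full sample: no missingness, so unbiased
# Missingness story: schools with little crime never report, so only
# observations with log(crime) above a threshold make it into the sample.
obs = [(x, y) for x, y in zip(le, lc) if y > 1]
b_obs = slope([x for x, _ in obs], [y for _, y in obs])
```

Dropping low-crime schools flattens the observed relationship: the subsample slope falls well below the full-sample slope, even though the model is correct in the full population.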
4 Measurement Error
4.1. We often assume that measurement error has zero mean, which is usually justifiable. However, let’s consider the case where it does not hold. Say we observe yi but not the underlying yi∗ :
yi∗ = β0 + β1 xi + ui
yi = yi∗ + eyi
where E[eyi |x] = c for some constant c, but eyi otherwise follows classical random
error. (Remember that classical error would state that Cov(eyi , x∗i ) = 0; also, here we
don’t have measurement error in x, so xi = x∗i .) Also, for simplicity, assume
MLR.4 holds, i.e., E[ui |xi ] = 0.
(a) How exactly would our estimates for β0 and β1 be biased in this case?
Remember that way back on problem set 2, problem 1.2, we did something similar.
It turns out only the estimate of the constant term (β0 ) is affected. To show
this, let’s start by just spelling out the feasible regression:
yi∗ = β0 + β1 xi + ui
yi − eyi = β0 + β1 xi + ui
yi = β0 + β1 xi + ui + eyi
Remember, we have bias problems when E[eyi |x] ̸= 0. So we want to rearrange
things to get an error whose expectation is zero. Let’s substitute vi = eyi − c,
and note that E[vi |x] = 0:
yi = β0 + β1 xi + ui + vi + c
yi = (β0 + c) + β1 xi + ui + vi
yi = β0′ + β1 xi + u′i
where β0′ = β0 + c and u′i = ui + vi .
For this recasting of the regression, E(u′i |x) = 0, due to our substitution, the as-
sumption of classical measurement error, and MLR.4. However, our estimate of the
intercept now recovers β0′ = β0 + c instead of β0 , i.e., it is shifted by c. So, all in
all, this biases only the constant term, which we usually don’t care about anyway,
so this sort of non-zero-mean measurement error in y is not a big deal. When we
think about measurement error we care more about how it varies than whether it
is mean zero, unless we are in a case where we really care about estimating the
intercept without bias.
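A quick simulation sketch of this result (parameter values invented): measurement error in y with mean c = 0.5 shifts the estimated intercept by about c but leaves the slope estimate alone:

```python
import random

random.seed(5)
n = 50000
beta0, beta1, c = 2.0, 3.0, 0.5  # invented true parameters; c = E[e_y | x]
x = [random.gauss(0, 1) for _ in range(n)]
ystar = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]  # true y*
y = [ys + random.gauss(c, 1) for ys in ystar]  # observed y = y* + e_y, with E[e_y] = c

# Simple OLS of observed y on x
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0_hat = ybar - b1_hat * xbar
```

As the derivation above predicts, b1_hat stays near the true slope β1 while b0_hat estimates β0′ = β0 + c.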
4.2. Say we want to explain weekly hours of a child’s television viewing with the following
model:
We are worried that we actually don’t observe the true value, tvhours∗ , because we are
using a survey; instead we only observe tvhours, where tvhours = tvhours∗ + etv .
In general: note that this is error in the DEPENDENT variable only; a different case
would be error in the x (independent) variables. The practice exam will have an example
problem for this other case; also see the lecture slides.