
PS5 Answer Key


PS5 - Misc Topics (Not to be turned in; extra study material for final)


(DAGs, Heteroskedasticity, Missing Data, Meas. Error)
Econometric Methods (Econ 3035)
Fall 2023, Vanderbilt University
Instructor: John Stromme

• Reading (for reference): See “Optional Readings” on Brightspace.

1 DAGs
1.1. For the following DAGs, state which variables we should control for in order to estimate
the causal effect of X on Y (i.e., which variables we need to control for to satisfy the
backdoor criterion). Also note whether your “control strategy” is feasible.

(a) DAG with arrows: A → X, A → Z, B → Z, B → Y, Z → X, Z → Y, X → M, M → Y.

We can start by writing down all of the paths from X to Y:

i. X → M → Y
ii. X ← A → Z ← B → Y
iii. X ← Z → Y
iv. X ← Z ← B → Y
v. X ← A → Z → Y
(iii.), (iv.), and (v.) are all open backdoor paths. To satisfy the backdoor criterion we
must close them. Controlling for Z is sufficient to close all three of these backdoor
paths. HOWEVER, notice that Z is a collider on path (ii.), so controlling for Z opens
backdoor path (ii.)! Therefore we need to add a control for either A or B. This is a
feasible strategy since all variables are observed.
(b) DAG with arrows: A → X, A → Y, A → S, S → X, S → Y, U → S, U → Y, X → Y.

Again, let’s write down all of the paths from X to Y:
i. X → Y
ii. X ← A → Y
iii. X ← S → Y
iv. X ← A → S → Y

Page 1 Compiled on 2023/12/08 at 14:03:39


v. X ← A → S ← U → Y
vi. X ← S ← U → Y
vii. X ← S ← A → Y
(ii.), (iii.), (iv.), (vi.), (vii.) are open backdoor paths. Controlling for A will close the
backdoor for (ii.), (iv.), and (vii.). To close (vi.) we may want to control for S (and
we need to control for S to close (iii.)). However, when we do this, it will open the
collider in (v.). BUT, thankfully we already controlled for A, so (v.) overall remains
closed. So the solution is to control for A and S.
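
The path-by-path reasoning in 1.1 can be checked mechanically. Below is a minimal sketch in plain Python, not a course-provided tool. The edge lists `DAG_A` and `DAG_B` are assumptions read off the path lists and answers above (the original figures are not reproduced here), and the helper names are hypothetical. A path counts as a backdoor path if its first arrow points into X; it is open if every non-collider on it is uncontrolled and every collider (or one of its descendants) is controlled.

```python
# Assumed edge lists (parent, child), reconstructed from the answers to 1.1.
DAG_A = [("A", "X"), ("A", "Z"), ("B", "Z"), ("B", "Y"),
         ("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")]
DAG_B = [("A", "X"), ("A", "Y"), ("A", "S"), ("S", "X"),
         ("S", "Y"), ("U", "S"), ("U", "Y"), ("X", "Y")]


def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(edges, start, end):
    """Every path from start to end that ignores arrow direction and repeats no node."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == end:
            yield tuple(path)
            return
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])


def is_open(edges, path, controls):
    """True if no interior node of the path blocks it, given the controls."""
    arrows = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in arrows and (nxt, mid) in arrows
        if collider:
            # A collider blocks unless it (or a descendant) is controlled for.
            if mid not in controls and not (descendants(edges, mid) & controls):
                return False
        elif mid in controls:
            # A controlled non-collider blocks the path.
            return False
    return True


def open_backdoor_paths(edges, x, y, controls=()):
    """Open paths from x to y whose first arrow points into x."""
    controls = set(controls)
    arrows = set(edges)
    return [p for p in simple_paths(edges, x, y)
            if (p[1], p[0]) in arrows and is_open(edges, p, controls)]


print(open_backdoor_paths(DAG_A, "X", "Y", {"Z"}))       # Z alone
print(open_backdoor_paths(DAG_A, "X", "Y", {"Z", "A"}))  # Z plus A
print(open_backdoor_paths(DAG_B, "X", "Y", {"A", "S"}))  # A and S
```

Under these assumed edges, controlling for Z alone in (a) leaves exactly the collider path through A, Z, and B open, while the control sets given in the answers leave no open backdoor path.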

1.2. We estimate this regression:

$$y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3 + u$$

where the underlying model is specified by this DAG:


DAG with arrows: V → X2, V → Y, X2 → X1, X2 → Y, U → X1, U → X3, X1 → Y, X3 → Y.

(a) Which of the coefficients estimate a causal relationship?


At first glance we can immediately see that $\hat\beta_2$ is biased because V is a confounder
of X2 and Y. Turning to X1 and X3, it looks like both are fine because their backdoors are
closed, but to be 100% thorough we can write out the paths for X1 and X3:
X1:

i. X1 → Y
ii. X1 ← U → X3 → Y
iii. X1 ← X2 → Y
iv. X1 ← X2 ← V → Y

So there are no open backdoors: (ii.) is closed by controlling for X3, and (iii.) and (iv.)
are closed by controlling for X2. (Note though the subtlety if we change the direction of
the arrows between Y/X2 and X1/X2: in that case we would not want to control for X2. In
general, controlling for variables “downstream” of the outcome or downstream of the
treatment is not good.)
X3:

i. X3 → Y
ii. X3 ← U → X1 → Y
iii. X3 ← U → X1 ← X2 → Y
iv. X3 ← U → X1 ← X2 ← V → Y

Notice that all backdoors here are closed due to our controls! So $\hat\beta_3$ is unbiased.
Note as well that when controlling for X1, we open a collider in (iii.) and (iv.), but
thankfully we control for X2, so overall these are still considered closed backdoors.

(b) What would you need to control for in order to have all estimates be causal
relationships? Is this feasible?
In order to also get a causal estimate for X2 we would need to control for V.
However, V is unobserved, so this is not feasible.
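
A small simulation can illustrate the conclusion of 1.2. This is only a sketch under strong assumptions: the arrows are the ones implied by the answer above, every relationship is linear with standard normal noise, and all structural coefficients (including the true effects of 1.0 for X1, X2, and X3) are hypothetical values chosen for illustration. The helper `ols` is written out by hand so the block needs only the standard library.

```python
import random


def ols(columns, y):
    """OLS with an intercept: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[p])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


rng = random.Random(0)
n = 20000

# Unobserved variables.
v = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]

# Observed regressors, following V -> X2, X2 -> X1, U -> X1, U -> X3.
x2 = [vi + rng.gauss(0, 1) for vi in v]
x1 = [0.5 * a + ui + rng.gauss(0, 1) for a, ui in zip(x2, u)]
x3 = [ui + rng.gauss(0, 1) for ui in u]

# Outcome: true effects of X1, X2, X3 are all 1.0 (hypothetical); V also drives Y.
y = [a + b + c + 2.0 * vi + rng.gauss(0, 1)
     for a, b, c, vi in zip(x1, x2, x3, v)]

b0, b1, b2, b3 = ols([x1, x2, x3], y)
print(f"b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}")
```

In this setup the estimates for X1 and X3 should land near their true value of 1.0, while the estimate for X2 absorbs part of the effect of the omitted V and is pushed well above 1.0.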

2 Heteroskedasticity
2.1. Which of the following are consequences of heteroskedasticity (relative to
homoskedasticity)?
(a) The OLS estimators $\hat\beta_j$ are biased.
(b) Our test of multiple hypotheses (F-test) is no longer valid.
(c) Our errors are no longer normally distributed.
Only (b). First, remember that MLR.5 was not necessary to prove unbiasedness, so
heteroskedasticity doesn’t matter for bias. Second, note that distribution and variance are
separate (although somewhat related) assumptions (MLR.5 vs. MLR.6), so changes to MLR.5 do
not affect MLR.6.
So why is (b) correct? I.e., why can’t we do the F-test like we did before if there is
heteroskedasticity? Unfortunately, understanding this deeply requires linear algebra and
is reserved for a master’s-level course. So we won’t spend much time here, but just know
that under the hood, if we want to do multiple hypothesis tests under heteroskedasticity,
the computer is doing something a bit fancier than the “SSR” F-statistic we were working
with under homoskedasticity. I would simply put this fact on your cheat sheet.
Note: This is based on Wooldridge Problem 8.1
2.2. Consider a linear model to explain monthly beer consumption: (note: inc is income)
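$$\text{beer} = \beta_0 + \beta_1 \text{inc} + \beta_2 \text{price} + \beta_3 \text{educ} + \beta_4 \text{female} + u$$
$$E(u \mid x) = 0$$
$$\mathrm{Var}(u \mid x) = \sigma^2 \text{inc}^2$$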

(a) Does this model exhibit hetero or homoskedasticity? Explain.


Heteroskedasticity. The variance of the error term (u) depends on the regressors; in
this case it depends only on inc.
(b) If it exhibits heteroskedasticity, can you transform the regression to one that instead
has a homoskedastic error term? (Show/Explain)
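Note that $\mathrm{Var}(u/\text{inc} \mid x) = \frac{1}{\text{inc}^2}\mathrm{Var}(u \mid x) = \sigma^2$. In words, if we can get the error to be
divided by inc, then it will be homoskedastic. So let’s try just dividing everything
by inc:

$$\frac{\text{beer}}{\text{inc}} = \frac{1}{\text{inc}}\left(\beta_0 + \beta_1 \text{inc} + \beta_2 \text{price} + \beta_3 \text{educ} + \beta_4 \text{female} + u\right)$$
$$\frac{\text{beer}}{\text{inc}} = \beta_0 \frac{1}{\text{inc}} + \beta_1 + \beta_2 \frac{\text{price}}{\text{inc}} + \beta_3 \frac{\text{educ}}{\text{inc}} + \beta_4 \frac{\text{female}}{\text{inc}} + \frac{u}{\text{inc}}$$
$$\frac{\text{beer}}{\text{inc}} = \beta_1 + \beta_0 \frac{1}{\text{inc}} + \beta_2 \frac{\text{price}}{\text{inc}} + \beta_3 \frac{\text{educ}}{\text{inc}} + \beta_4 \frac{\text{female}}{\text{inc}} + u'$$

where $u' = u/\text{inc}$. Therefore, if we divide all of our variables by inc (including the
constant, which becomes the regressor 1/inc), we have a homoskedastic model instead. The
coefficients are all the same as before, but two of them change roles: β1 is now the
intercept of the transformed regression, and β0 is the coefficient on 1/inc.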
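
The variance claim behind this transformation can be checked numerically. The sketch below simulates an error term with Var(u|inc) = σ² inc² and compares the error variance for low-income and high-income observations, before and after dividing by inc. The income range (uniform on 1 to 10), σ = 2, and the split point 5.5 are all hypothetical choices made for illustration.

```python
import random
from statistics import pvariance

rng = random.Random(1)
sigma = 2.0
n = 20000

# Hypothetical incomes; the error's standard deviation scales with inc.
inc = [rng.uniform(1.0, 10.0) for _ in range(n)]
u = [i * rng.gauss(0.0, sigma) for i in inc]  # Var(u | inc) = sigma^2 * inc^2


def variance_by_income_group(errors):
    """Error variance among low-income (inc < 5.5) and high-income observations."""
    low = [e for e, i in zip(errors, inc) if i < 5.5]
    high = [e for e, i in zip(errors, inc) if i >= 5.5]
    return pvariance(low), pvariance(high)


raw_low, raw_high = variance_by_income_group(u)

# Transformed error u / inc, as in the divided-through regression.
scaled = [e / i for e, i in zip(u, inc)]
scaled_low, scaled_high = variance_by_income_group(scaled)

print(f"raw:    low={raw_low:.1f}  high={raw_high:.1f}")
print(f"scaled: low={scaled_low:.2f}  high={scaled_high:.2f}")
```

The raw error variance should be several times larger in the high-income group, while the scaled error has roughly the same variance (close to σ² = 4) in both groups.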

(c) In practice, what is the drawback of using your solution in (b) to then estimate
homoskedastic errors, relative to running the regression as originally given and
estimating heteroskedasticity-robust errors?
The assumption that $\mathrm{Var}(u \mid x) = \sigma^2 \text{inc}^2$ is extremely strong, pretty much as strong
as a homoskedastic error assumption. If that assumption didn’t hold in real life,
then the transformed errors would not be homoskedastic. The whole point of using
heteroskedasticity-robust errors is that it is hard to justify these sorts of assumptions
about the form of the variance, so in practice we almost always use a robust standard
error, which is the more conservative approach.
(So then is knowing how to solve (b) a waste of time? No, because there are other
cases where this type of solution is helpful, and it is good to know the ins and outs of
regression in general. Also we should be aware that the opposite could happen, where
we transform a variable and it induces heteroskedasticity rather than eliminating it.)

Note: This is based on Wooldridge Problem 8.2

3 Missing Data
3.1. Say we are interested in analyzing the relationship between crime on campus and
student enrollment:

log(crime) = β0 + β1 log(enroll) + u

However, the only schools that show up in our sample are those that reported crimes. In
other words we are missing data from the schools who did not report crimes.

(a) What types of missingness would be “ok” in the sense that we wouldn’t be worried
about it biasing our regression analyses?
For the “named” types, “Missing Completely at Random” and “Missing at Random”.
In a general sense, any missingness that is not associated with the error term is “ok”!

(b) Describe a likely/possible reason why the types of missingness you listed in (a) would
not hold for this example.
All we need to do is think of an example where missingness is associated with the
error term. There are two possibilities that come to mind. One is that schools in
dangerous locations may be incentivized not to report crime. The sample is then not
random: more schools with high u are missing. The opposite may also be
true: schools that have no crime to report at all may not report
and so not be in the sample. In that case more schools with low u are
missing. (Note that this example could easily be extended to measurement error
as well: if schools with more crime fudge their numbers downwards, the
measurement error would be related to the error term u.)
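
To see the difference between harmless and harmful missingness, here is a small simulation sketch. It is not based on the actual campus crime data: x stands in for log(enroll), y for log(crime), and the true intercept and slope (both 1.0), the 50% random drop rate, and the reporting cutoff y > 1.0 are hypothetical. The second sample mimics the case where schools with little crime do not report, so that dropping out depends on the outcome and therefore on u.

```python
import random
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]           # stand-in for log(enroll)
y = [1.0 + 1.0 * xi + rng.gauss(0, 1) for xi in x]  # stand-in for log(crime)

# Missing completely at random: each school is dropped with probability 0.5.
mcar = [(xi, yi) for xi, yi in zip(x, y) if rng.random() < 0.5]

# Missingness tied to the outcome: only schools with y above the cutoff report.
reported = [(xi, yi) for xi, yi in zip(x, y) if yi > 1.0]

_, slope_full = simple_ols(x, y)
_, slope_mcar = simple_ols(*zip(*mcar))
_, slope_reported = simple_ols(*zip(*reported))

print(f"full={slope_full:.2f}  MCAR={slope_mcar:.2f}  reported-only={slope_reported:.2f}")
```

The randomly thinned sample should give a slope close to the full-sample slope, while the reported-only sample gives a noticeably smaller one.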

Note: This is based on Wooldridge Problem 9.5

4 Measurement Error
4.1. We often assume that measurement error has zero mean, which is usually justifiable.
However, let’s consider the case where it does not hold. Say we observe y but the
underlying model is:

$$y_i^* = \beta_0 + \beta_1 x_i + u_i$$
$$y_i = y_i^* + e_i^y$$

where $E[e_i^y \mid x] = c$ for some constant c, but $e_i^y$ is otherwise classical (random)
error. (Remember that classical error would state that $\mathrm{Cov}(e_i^y, x_i^*) = 0$, and also that
here we don’t have measurement error in x, so $x_i = x_i^*$.) Also, for simplicity assume
MLR.4 holds, i.e., $E[u_i \mid x_i] = 0$.

(a) How exactly would our estimates for β0 and β1 be biased in this case?
Remember from way back when, on problem set 2, problem 1.2, we did something
similar. It turns out only the estimate of the constant term (β0) is affected. To show
this, let’s start by just spelling out the feasible regression:

$$y_i^* = \beta_0 + \beta_1 x_i + u_i$$
$$y_i - e_i^y = \beta_0 + \beta_1 x_i + u_i$$
$$y_i = \beta_0 + \beta_1 x_i + u_i + e_i^y$$

Remember, we have bias problems when $E[e_i^y \mid x] \neq 0$. So we want to try to rearrange
things so we get an error whose expectation is zero. Let’s substitute $v_i = e_i^y - c$.
Note that $E[v_i \mid x] = 0$.

$$y_i = \beta_0 + \beta_1 x_i + u_i + v_i + c$$
$$y_i = (\beta_0 + c) + \beta_1 x_i + u_i + v_i$$
$$y_i = \beta_0' + \beta_1 x_i + u_i'$$

Here $\beta_0' = \beta_0 + c$ and $u_i' = u_i + v_i$. For this recasting of the regression,
$E(u_i' \mid x) = 0$, due to our substitution, the assumption of classical measurement error,
and MLR.4. However, our estimate of β0 is affected: we are now recovering β0' instead,
which is shifted by c. So all in all, it will bias the constant term, which we don’t
usually care about anyway, so this sort of non-zero-mean measurement error in y is not a
big deal. When we think about measurement error we care more about how it varies than
whether it is mean zero, unless we really care about estimating the intercept term
without bias.
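
The result can also be seen in a quick simulation. The sketch below uses hypothetical values (β0 = 1, β1 = 2, c = 3, standard normal x, u, and measurement noise) and a hand-written one-regressor OLS helper; none of these come from the problem itself.

```python
import random
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


rng = random.Random(2)
n = 20000
beta0, beta1, c = 1.0, 2.0, 3.0  # hypothetical true values

x = [rng.gauss(0, 1) for _ in range(n)]
y_star = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]

# Observed y: true y plus measurement error with mean c, unrelated to x.
y_obs = [ys + c + rng.gauss(0, 1) for ys in y_star]

b0_hat, b1_hat = simple_ols(x, y_obs)
print(f"intercept={b0_hat:.2f} (beta0 + c = {beta0 + c})  slope={b1_hat:.2f} (beta1 = {beta1})")
```

The estimated slope should stay close to β1, while the estimated intercept should sit near β0 + c rather than β0.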

4.2. Say we want to explain weekly hours of a child’s television viewing with the following
model:

$$\text{tvhours}^* = \beta_0 + \beta_1 \text{age} + \beta_2 \text{age}^2 + \beta_3 \text{educ}_{\text{mother}} + \beta_4 \text{educ}_{\text{father}} + \beta_5 n_{\text{siblings}} + u$$

We are worried that we actually don’t observe the true value, $\text{tvhours}^*$, because we are
using a survey. Instead we only observe tvhours, where $\text{tvhours} = \text{tvhours}^* + e_{tv}$.
In general: note that this is error in the DEPENDENT variable only; a different case
would be error in the x (independent) variables. The practice exam will have an example
problem for this other case; also see the lecture slides.

(a) State the classical errors-in-variables (CEV) assumption for this model.
We need CEV here in order to not have a biased regression. Remember that CEV
stipulates that the covariance between $x^*$ and e is zero, i.e., that the measurement
error is not related to your underlying explanatory variables. This measurement error
could be in the y variable, or in any of the x’s. Formally, the CEV assumption means that,
whichever variable is mismeasured, its measurement error is uncorrelated with
the true x values.
In our case here, we only have measurement error in our y variable. No measurement
error in x is equivalent to saying $x = x^*$. So the CEV assumption for this
case can be stated as:
$$\mathrm{Cov}(\text{age}, e_{tv}) = \mathrm{Cov}(\text{age}^2, e_{tv}) = \mathrm{Cov}(\text{educ}_{\text{mother}}, e_{tv}) = \mathrm{Cov}(\text{educ}_{\text{father}}, e_{tv}) = \mathrm{Cov}(n_{\text{siblings}}, e_{tv}) = 0$$
(b) For this regression, describe a likely/possible reason why this CEV assumption may
not hold.
We need to think of an example where measurement error is associated with at least
one x variable.
The first thing that comes to mind is that TV viewing is likely a self-reported variable
on a survey. People who don’t watch any TV will report very accurately. However, those
who watch a lot of TV may have more measurement error, and likely underestimate
their TV viewing. This would mean that the measurement error is associated with all
of the x variables.
(The easiest thing for these sorts of problems is to think about how the error may be
associated with y itself, because if it is associated with y itself, then it is associated
with all of the x. Also note that covariance means we need the direction of the error
to be associated with x. It is not enough to say that observations with certain x will be
measured more accurately than others. The error has to lean in a particular direction
for certain x in order to induce that covariance.)
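
The underreporting story can be illustrated with a simulation sketch. Everything numeric here is hypothetical: a single regressor x stands in for the explanatory variables, its true effect on TV hours is set to 2.0, and heavy viewers are assumed to understate their hours by 30% of the amount by which they exceed 10 hours. The point is only that measurement error that falls as true viewing rises is correlated with x and changes the estimated slope.

```python
import random
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


rng = random.Random(4)
n = 20000
beta1 = 2.0  # hypothetical effect of x on true TV hours

x = [rng.gauss(0, 1) for _ in range(n)]
tv_true = [10.0 + beta1 * xi + rng.gauss(0, 1) for xi in x]

# Assumed error: the more TV is truly watched, the more it is understated.
e_tv = [-0.3 * (t - 10.0) + rng.gauss(0, 0.5) for t in tv_true]
tv_obs = [t + e for t, e in zip(tv_true, e_tv)]

_, slope_true = simple_ols(x, tv_true)
_, slope_obs = simple_ols(x, tv_obs)
print(f"slope with true hours={slope_true:.2f}  slope with reported hours={slope_obs:.2f}")
```

With this assumed error, reported hours equal roughly 70% of true hours plus a constant and noise, so the slope on reported hours shrinks toward 0.7 times the true slope.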

Note: This is based on Wooldridge Problem 9.4
