
Takehome - Exam DiD and RDD


Take Home Exam - Econometrics III

December 2023

1 Difference-in-Differences
For this exercise, you need to open the database takehome DD.dta. This dataset
describes an observational study with units “i” entering a treatment at different
moments in time “t”, i.e., in a staggered setting. Our aim is to estimate the
treatment effect of an intervention (the variable “treatment”) on an outcome “Y”.
As the units entering the treatment at a given moment (“t”=“g”) might be selective
and we do not have enough control variables to control for the difference in
composition in a credible way, we want to implement a difference-in-differences
estimator, which can account for unobserved heterogeneity under the assumption
of a parallel trend (in the dataset the variable “g” indicates the moment when an
individual enters the treatment, “g” =0 for never treated). Answer the following
questions.

1. Provide a table describing the staggered setting: How many never treated
units do we have? How many treated units are there by the end of the data
observational periods?

tabulate g

These are the numbers of units by entry cohort. Each individual appears 6 times in
the panel, from t=0 to 5, so if the tabulation is run on the full panel each frequency
has to be divided by 6 to obtain the number of unique units; the percentages remain
unchanged.

          g |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         53       10.60       10.60
          1 |        119       23.80       34.40
          2 |        104       20.80       55.20
          3 |         71       14.20       69.40
          4 |         53       10.60       80.00
          5 |        100       20.00      100.00
------------+-----------------------------------
      Total |        500      100.00
The number of units treated by the end of the observation period is 500 - 53 = 447,
and the number of never-treated units is 53 (g=0). To obtain the number of units
already treated in period t, take the cumulative percentage at g=t, subtract the
never-treated share (10.60%), and multiply by the total number of units. Tabulating
the treatment variable over time is another option.

Here we have the split by time.
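
As a cross-check on this arithmetic, the cumulative number of treated units per period can be recomputed from the tabulated cohort sizes. This is a minimal Python sketch, not part of the original Stata workflow; `cohort_sizes` is simply the `tabulate g` table typed in by hand, and the variable names are illustrative.

```python
# Cohort sizes copied from the `tabulate g` output (g = 0 is never treated).
cohort_sizes = {0: 53, 1: 119, 2: 104, 3: 71, 4: 53, 5: 100}

total_units = sum(cohort_sizes.values())
never_treated = cohort_sizes[0]

# Units already treated in period t are all cohorts with 1 <= g <= t.
treated_by_period = {}
running = 0
for g in sorted(k for k in cohort_sizes if k > 0):
    running += cohort_sizes[g]
    treated_by_period[g] = running

for t, n in treated_by_period.items():
    share = 100 * n / total_units
    print(f"t={t}: {n} treated units ({share:.1f}% of {total_units})")
```

The last entry equals the total minus the never treated, which matches the 447 reported above.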

2. Estimate the treatment effect by implementing the standard diff-in-diff (two-way
fixed effect) estimator, assuming that the parallel trend holds
unconditionally and the treatment effect is homogeneous (tip: remember to
control for all group levels, keeping the never-treated as the omitted
category). What effect do we find? (note, clustering the standard errors by
individual identifier produces unreliable SEs due to the few number of units
in each cluster - therefore do not cluster the SEs in this exercise)

For the TWFE DiD I am going to use the following setup, based on
https://bookdown.org/mike/data_analysis/two-way-fixed-effects.html and on
Athey, Susan, and Guido W. Imbens. 2022. “Design-Based Analysis in
Difference-in-Differences Settings with Staggered Adoption.” Journal of
Econometrics 226 (1): 62–79.

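The treatment effect for group g at time t is defined as

$$\beta = E[Y_{gt} \mid g = g_i,\ t = t_i,\ \text{treatment} = 1] - E[Y_{gt} \mid g = g_i,\ t = t_i,\ \text{treatment} = 0]$$

and the estimated regression is

$$Y_{igt} = c + \alpha_g + \gamma_t + \beta \cdot \text{treatment}_{gt} + \epsilon_{igt}$$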

Our regression does not include the interaction D*T shown in the class
slides. Our dummy variable treatment equals 1 for treated groups once they
are treated and 0 otherwise. In this way we only compare treated groups,
once treated, with groups that are not yet treated or never treated.
Because we add period-specific effects, we avoid comparing a group with
itself when it was not yet treated: comparisons are made within the same
period. So it seems reasonable that the variable treatment captures the
ATT.

I have doubts about whether using the interaction D*T would also be
correct. I think that once we include the time and group fixed effects, the
treatment (post) indicator alone is enough to identify the ATT.
If we interact the treatment variable with a variable D that is always 0 for
the never treated and always 1 for the eventually treated, we get the
following combinations:

1. Treatment = 1 & D = 1
2. Treatment = 0 & D = 1
3. Treatment = 1 & D = 0
4. Treatment = 0 & D = 0

Option 3 does not occur in our data.

After accounting for unobserved heterogeneity and under the no-anticipation
assumption, the comparison between options 2 and 4 cancels out. Option 1
versus 4 is the treatment effect relative to g=0. When time and group
effects are added, only comparisons within the same period and with
another group are made. The comparison of option 1 against option 2 is the
same one made by the regression with just treatment.

We are going to look at both regressions. I consider the first one to be the
valid one, but both should give very similar results under a two-way fixed
effects design. The problem is that including both variables adds
collinearity to the regression.
We have:
reg Y i.g i.t treatment

With the i. prefix, Stata automatically drops one category of each factor variable
to avoid collinearity, as we can see in the table. In our case the omitted base
categories are g=0 and t=0.

i.g should capture group-specific effects, i.t should capture period-specific
effects, and treatment should capture the ATT. The estimated treatment effect
looks somewhat odd because the coefficient narrowly fails to be significant at
the 5% level, but it might be correct. These results rely on the assumption that
the treatment effect is constant over time, groups and individuals; confounders
or effect heterogeneity might be present. We can observe that the estimated
group and time fixed effects increase with g and t. It might also be that we are
not capturing the treatment effect well. The purpose of this regression is to
recover the average outcome by group and period.

The result for the constant is very reasonable. We can manually compute the
following quantity to verify it, or check the graph:

$$\frac{1}{N_g} \sum Y_{(g,\, t=0)}$$

sum(subsets[1,])/6
[1] 2.69253

The difference between groups in terms of unobserved heterogeneity is roughly 2,
as can be seen in the graph I added at the beginning.

We can run the second regression, with the interaction of the two variables, to see
the differences:

reg Y i.g i.t c.treatment#c.t

I have added the c. prefix so that Stata treats the variables as continuous and does
not split the interaction by category.

The regression seems good, and the treatment effect variable is now significant. We
add the interaction with time, but with c. so that a single coefficient is estimated
across time, groups and units. Note that c.treatment#c.t makes the effect
proportional to t, so its coefficient is an effect per period and is not directly
comparable to the coefficient on treatment. We get very similar results.

3. Now, implement the fixed effect estimator controlling for individual
fixed effects instead of the group fixed effects. How do the results
change?

To do this now we are going to use the following code:


xtset i t

xtreg Y i.t treatment, fe


The number of individuals i is very large, so we use xtset and xtreg with the fe
option to account for individual and time fixed effects. We get the following results:

The results are very similar; the individual fixed effects seem to absorb the
group fixed effects. Similar results are obtained with the same tool using g
as the panel variable:

xtset g
xtreg Y i.t DT, fe

Strangely, under this setting Stata reports the panel as unbalanced for g. This
is probably because the number of observations differs across values of g.

4. Let’s introduce our covariate now: having a university degree. First, show
whether having a university degree is correlated with the treatment status
and might be a confounding factor if units with a university degree have a
different time effect (for simplicity, compare the never treated to the rest).
To check the correlation between the two variables we can run a logistic
regression:

logistic university i.treatment

The coefficient for '1.treatment' is approximately 0.3539, which is the odds
ratio for treatment=1 relative to the baseline treatment=0. As it is less than 1,
being in the treated group is associated with lower odds of having a university
degree than being in the untreated group. The likelihood-ratio test
(LR chi2(1) = 186.10, p < 0.0001) indicates that the treatment variable is
statistically associated with the odds of having a university degree. Because
the logistic command reports odds ratios, the constant (1.0822) is the baseline
odds of having a university degree when the treatment variable is 0.
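
To make these odds easier to read, they can be converted into implied probabilities. The short Python sketch below does that conversion; it assumes, as is usual for Stata's `logistic` output, that 1.0822 is the baseline odds and 0.3539 the odds ratio for treatment=1. The function name `odds_to_probability` is illustrative.

```python
# Values read from the logistic output quoted above (assumed to be odds, not log-odds).
baseline_odds = 1.0822   # odds of university = 1 when treatment = 0
odds_ratio = 0.3539      # odds ratio for treatment = 1 versus treatment = 0


def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


treated_odds = baseline_odds * odds_ratio
p_untreated = odds_to_probability(baseline_odds)
p_treated = odds_to_probability(treated_odds)

print(f"P(university = 1 | treatment = 0) = {p_untreated:.3f}")
print(f"P(university = 1 | treatment = 1) = {p_treated:.3f}")
```

Under these assumptions, roughly half of the untreated observations and a bit more than a quarter of the treated observations have a university degree.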

This is in line with the distribution of university degrees across treated and
controls:

library(ggplot2)

# Bar chart of university counts, split by treatment status
ggplot(takehome_DD, aes(x = factor(university), fill = factor(treatment))) +
  geom_bar(position = "dodge") +
  labs(title = "Distribution of University for Each Treatment Level",
       x = "University",
       y = "Count") +
  scale_fill_manual(values = c("grey", "red"), name = "Treatment") +
  theme_minimal()

We can see that university=1 is more frequent among the control observations than
among the treated observations.
I think the variable university was constructed to be correlated with the
treatment; university acts like a second treatment variable for the later
groups.
Now we can run these regressions and compare the different time effects:
reg Y i.g i.t i.treatment if university == 0

reg Y i.g i.t i.treatment if university == 1

These two pairs of regressions look contradictory. For the first pair, the idea
is that if we run the regression separately by university status and the two
tables are the same, then university has no effect. But we can see that
individuals with university = 1 experience much larger time effects. University
is also correlated with the treatment effect, since the estimated treatment
effect changes. For the 3rd and 4th regressions, when we interact university
with t we get the same results but with the opposite sign. However, I doubt that
the interaction of university=0 with t really produces such strongly negative
results, so I think it is misleading. What matters here is that those
interaction terms are significant, which means that the two variables are
related. The first two regressions confirm that university produces different
time effects, and not only time effects but also a higher outcome and different
group effects.

5. Add the “university degree” covariate to the two regressions (just in levels).
Do the results change? Why?

We can observe that including the variable university has absorbed part of
the group effects and improved the R-squared of the model. I expected it to
absorb part of the time effects as well, but they remain unchanged. The
residual variance of the model is notably reduced after the inclusion.

xtset i t
xtreg Y i.t treatment university, fe


Sadly, there appear to be collinearity problems in the xtreg setting, probably
because university does not vary over time within individuals and is therefore
absorbed by the individual fixed effects. For that reason, I have tried to run the
second regression with c.i. However, I doubt how coherent this specification is:
c.i enters the identifier i as a single continuous regressor and only tries to
imitate the individual fixed effects.

With the two regressions we can conclude that the effect of university is around 8,
and that in the second regression it captures part of the individual and time
effects. The results change because the very large time and group effects did not
really belong to time and group; they were due to the omitted covariate university.

6. Forget about the covariates for now. Do we have a problem of negative
weights due to the staggered setting, as described in de Chaisemartin and
D’Haultfoeuille (2020)? Why is that?

We can see in the second estimation that we have a problem with negative
weights. Negative weights appear because groups that are already treated
are used as controls, so their treatment effect enters the counterfactual of
the units becoming treated.
However, I am not able to identify which comparisons the twowayfeweights
command has made from the number of comparisons and the number of
negative weights it reports. In my view there are more than 15 DiD
comparisons to be made.
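
To see where negative weights can come from, the weights of de Chaisemartin and D’Haultfoeuille (2020) can be sketched by hand: each treated (g, t) cell is weighted by its size times the residual of the treatment dummy after removing group and period fixed effects. The Python sketch below does this under assumptions that are not taken from the Stata output: the cohort sizes are those of the `tabulate g` table, the panel is balanced with t = 0..5, and treatment equals 1 once t >= g for g > 0. It is an illustration and does not reproduce what twowayfeweights reports.

```python
# Assumptions (not from the Stata output): cohort sizes as tabulated, periods
# t = 0..5, a balanced panel, and treatment = 1 once t >= g (for g > 0).
cohort_sizes = {0: 53, 1: 119, 2: 104, 3: 71, 4: 53, 5: 100}
periods = range(6)


def treated(g, t):
    return 1.0 if g > 0 and t >= g else 0.0


n_total = sum(cohort_sizes.values())
n_periods = len(periods)

# Means of the treatment dummy by cohort, by period, and overall.
mean_by_g = {g: sum(treated(g, t) for t in periods) / n_periods
             for g in cohort_sizes}
mean_by_t = {t: sum(n * treated(g, t) for g, n in cohort_sizes.items()) / n_total
             for t in periods}
grand_mean = sum(mean_by_t.values()) / n_periods

# Residual of treatment on cohort and period fixed effects (double demeaning,
# valid here because the assumed panel is balanced).
resid = {(g, t): treated(g, t) - mean_by_g[g] - mean_by_t[t] + grand_mean
         for g in cohort_sizes for t in periods}

# Weight of each treated cell: cell size times residual, normalised to sum to one.
treated_cells = [(g, t) for (g, t) in resid if treated(g, t) == 1.0]
denom = sum(cohort_sizes[g] * resid[(g, t)] for g, t in treated_cells)
weights = {(g, t): cohort_sizes[g] * resid[(g, t)] / denom
           for g, t in treated_cells}

negative_cells = sorted(cell for cell, w in weights.items() if w < 0)
print("treated cells:", len(treated_cells))
print("cells with negative weight (g, t):", negative_cells)
```

In this stylised design the negative weights fall on early cohorts in late periods, which is the situation where most other groups are already treated and serve as controls.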

7. Estimate the treatment effect with a staggered diff-in-diff method, imposing
that the parallel trend assumption holds unconditional on the covariates (i.e.,
the time effect is the same for everybody). Use the Callaway and Sant’Anna
(2021) estimation method.

Way 1#

We now assume that the treatment effect is not homogeneous, i.e., there are
treatment effect dynamics. Using the Callaway and Sant’Anna commands we get
the following:

Here it has calculated 25 2x2 DiD.
Unless I am mistaken, there are t*(n-t)+t good comparisons per period,
counting comparisons with the never treated, if comparisons are made period
by period. Without counting the never treated there must be 20 good
comparisons in total; if the effects are aggregated over time so that each
pair of groups is compared only once, fewer can be made. I am probably
applying the same rules as the Bacon decomposition: I consider a group gi
valid to be compared from t=ti=a onward, and comparisons are made period
by period.
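
The count implied by the t*(n-t)+t rule can be checked with a few lines of Python. This is only a sketch of the counting argument, assuming n = 5 treated cohorts (g = 1..5) and one comparison per treated cohort, comparison group and period; the variable names are illustrative.

```python
n_cohorts = 5  # treated cohorts g = 1..5 (assumption taken from the tabulation of g)

vs_not_yet_treated = 0
vs_never_treated = 0
for t in range(1, n_cohorts + 1):
    already_treated = t              # cohorts with g <= t
    still_waiting = n_cohorts - t    # cohorts with g > t, usable as controls
    vs_not_yet_treated += already_treated * still_waiting
    vs_never_treated += already_treated

print("comparisons with not-yet-treated cohorts:", vs_not_yet_treated)
print("comparisons with the never treated:", vs_never_treated)
print("total:", vs_not_yet_treated + vs_never_treated)
```

The first count reproduces the 20 comparisons mentioned above; adding one comparison with the never treated per treated cohort and period gives 15 more.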

In the Callaway and Sant’Anna table it is strange that the combination g0/g1
appears five times but not with the other groups. It might mean that t_0_1 at
gi is the comparison with the never treated for each group; however, I am not
sure which comparisons the command has made. For this reason, I have
computed the table myself with my own regression specifications, using the
following R code:

# Data: takehome_DD
time <- 1:5
g <- 1:5
result <- data.frame(Number = numeric(), Name = character(),
                     Weight = numeric(), stringsAsFactors = FALSE)

for (timei in time) {
  for (gi in g[g <= timei]) {

    # Comparison with the never treated (g = 0), only in the entry period
    if (gi == timei) {
      # & binds tighter than |, so the sample is (t == timei & g == gi) or g == 0
      d0 <- subset(takehome_DD, (t == timei & g == gi) | g == 0)
      beta <- as.numeric(coefficients(lm(Y ~ treatment + factor(g) + factor(t),
                                         data = d0))[2])
      w <- nrow(subset(takehome_DD, t >= timei & g == gi))
      if (is.na(beta)) {
        beta <- 0
      }
      namebeta <- paste("Y", "_g", gi, "_o", 0, "_t>a", sep = "")
      result <- rbind(result, cbind(Number = as.numeric(beta), Name = namebeta,
                                    Weight = w))
    }

    # Comparisons with groups that are not yet treated (g > timei)
    o <- g[g > timei]
    for (oi in o) {
      d1 <- subset(takehome_DD, (t == timei & g == gi) | g == oi)
      beta <- as.numeric(coefficients(lm(Y ~ factor(g) + factor(t) + treatment,
                                         data = d1))[4])
      w <- nrow(d1)
      namebeta <- paste("Y", "_g", gi, "_o", oi, "_t", timei, sep = "")
      result <- rbind(result, cbind(Number = as.numeric(beta), Name = namebeta,
                                    Weight = w))
    }
  }
}

# Normalize the weights so that they sum to 1
result[, 3] <- as.numeric(result[, 3]) / sum(as.numeric(result[, 3]))

result
cat("Sum Y:", sum(as.numeric(result[, 1])), "\n")
cat("Sum of weights:", sum(as.numeric(result[, 3])), "\n")
cat("Sum Weighted:", sum(as.numeric(result[, 3]) * as.numeric(result[, 1])), "\n")

I have made 20 comparisons for g>0, calculating the treatment effect in t=ti,
and 5 comparisons with g=0, one per group gi at time t=a. We get the results below:

So, 32 is the accumulated treatment effect when accounting for treatment effect
dynamics. I am not sure whether the weights are correctly calculated. I tried to
follow the approach of session 03 (Staggered), but I am not sure how to apply it
in this setting:

When there are multiple comparisons for the same gi, I do not know how the
weight should be split across periods. I simply weighted by the relative size of
the treated group. I also wonder what the difference is between using the
weights inside the regression and applying them afterwards.

The weighted sum gives the overall treatment effect:
cat("Sum Weighted:", sum(as.numeric(result[, 3]) * as.numeric(result[, 1])), "\n")

Sum Weighted: 0.9083953

Way 2 #

reg Y i.g i.t i.e


e is our treatment indicator for the treatment effect dynamics. Calculated as:

gen e=t-g+1
replace e=0 if treatment==0

When t>=g the observations are treated and e takes a positive value. When g>t the
observations are not yet treated and t-g+1 would be zero or negative, so all possible
control observations are assigned e=0.
We add one so that the scale runs from 0 to 5 and matches the time index: e=1 is the
dynamic effect for groups that have been exposed to the treatment for 1 period, and
e=5 is the dynamic effect for groups that have been exposed for 5 periods, which only
happens for g=1 at t=5.
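
The construction of e can be mirrored in a few lines of Python to check the mapping from (t, g) to exposure time. This is only a sketch of the two Stata lines above; it assumes that treatment equals 1 exactly when g > 0 and t >= g, and the function name `event_time` is illustrative.

```python
def event_time(t, g):
    """Exposure time e = t - g + 1 for treated observations, 0 otherwise.

    Assumes treatment == 1 exactly when g > 0 and t >= g (g = 0: never treated).
    """
    is_treated = g > 0 and t >= g
    return t - g + 1 if is_treated else 0


# Exposure time for every cohort and period in the assumed design (t = 0..5).
for g in range(0, 6):
    row = [event_time(t, g) for t in range(0, 6)]
    print(f"g={g}: {row}")
```

Only the cohort g=1 reaches e=5, and it does so at t=5, which matches the description above.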

lincom (_b[i1.e]+ _b[i2.e]+ _b[i3.e]+ _b[i4.e]+ _b[i5.e])/5

We get very different results under the two different methods.

8. What is the overall treatment effect for all the periods and groups? Are the
placebo tests suggesting that we can trust the (unconditional) parallel trend
assumption?

estat event

These would be the results under the Callaway and Sant’Anna command. We can see
the treatment effects aggregated by time and group.
The overall treatment effect calculated before was:

cat("Sum Weighted:", sum(as.numeric(result[, 3]) * as.numeric(result[, 1])), "\n")
Sum Weighted: 0.9083953

It is difficult to set up a placebo test that is still relevant and does not produce
collinearity in the regression. I wanted to run other tests, but they suffer from
collinearity. We have:

gen placebo_dg = runiform()
replace placebo_dg = 0 if t-g<= 0

reg Y i.g i.t i.e c.placebo_dg

We can see that the placebo variable is significant, so the parallel trend assumption
might not hold. Our placebo variable roughly mimics the behavior of the
treatment before the treatment period, but with random noise, and after its
inclusion it becomes significant. The placebo variable codes treatment as 0 and
no treatment as random noise, so that it does not produce collinearity.
Another placebo test that I have done is the following:
reg Y i.g i.t i.e if treatment==0
We restrict the sample to the non-treated observations, but e is the treatment
indicator; if e is not significant, that would be a good sign that the parallel
trend assumption holds.

Sadly, we have collinearity again.

9. Let’s now relax the parallel trend assumption to hold only conditional on
education (people with university degrees can have a differential trend).
Implement the doubly robust estimation method proposed by Callaway and
Sant’Anna (2021). What is now the average treatment effect and the treatment
effect dynamics? Do the placebo tests confirm the reliability of the (conditional)
parallel trend assumption?

I suppose that we are still under the assumption of treatment effect dynamics.

We run this regression again as in previous exercise:
logistic e university
predict propensity
gen weight = 1 / propensity

We can plot the distribution of the propensity scores:

kdensity propensity if e == 0, color(blue) addplot(kdensity propensity if e >0, color(red))

We can now run this regression to have our DiD-IPW + RA:


reg Y i.g university##e i.t##university [aweight=weight ]

I have added weights based on the propensity score and interacted university with
time, since university was affecting the period-specific effects and this can be
justified theoretically. I have also interacted university with the treatment. The
coefficients of e show the new treatment effect dynamics.

Running again:
lincom (_b[i1.e]+ _b[i2.e]+ _b[i3.e]+ _b[i4.e]+ _b[i5.e])/5

We get an overall effect of about 4.

Without interacting time and university we would have obtained the following results:
reg Y i.g university##e i.t [aweight=weight]

The interaction is absorbing part of the treatment dynamics. If we consider that
university does not affect the period-specific effects, then this regression is the
appropriate one. The negative estimates for the treatment dynamics do not look
plausible. However, if e is read as time since graduation and the outcome as
salary, they could make sense.
We now run the placebo test again. I will keep this model for the test:
reg Y i.g university##e i.t [aweight=weight]
Although it gives doubtful negative treatment effects, I continue with this model
since it looks more standard and less saturated.

The placebo test:


. reg Y i.g university##e i.t placebo_dg [aweight=weight ]

The placebo is no longer significant. If the placebo variable is well designed (it
was created to test the assumption while avoiding collinearity), this supports the
parallel trend assumption.

2 Regression Discontinuity Design
For this exercise, you need to open the database takehome RDD.dta. In this
observational study, we have an administrative rule that encourages units “i” to
enter the treatment “D” if the value of their forcing variable “Z” is above the
threshold of 13. The policymaker can only encourage treatment participation
above that threshold, and some units enter the treatment also below the threshold
(“always takers”), while some units do not enter even if they are above the
threshold (“never takers”). Answer the following questions to estimate the
treatment effect on the outcome variable “Y” by implementing a Regression
Discontinuity Design using the continuity-based approach.

1. Visually show the evolution of the outcome “Y” and the treatment “D”over
the forcing variable “Z” using the data-driven regression discontinuity plots.
Use the mimicking variance evenly spaced method with spacing estimators.
rdplot Y Z, c(13) binselect(esmv)

We can see a very large jump at c=13.

2. Estimate how successful the administrative rule was in encouraging
treatment participation by implementing a sharp RDD estimator on the
treatment “D” using local linear regression and an optimally chosen
bandwidth with robust bias-corrected confidence intervals. What is the
confidence interval?

rdrobust Y Z , c(13) fuzzy(D) p(4) bwselect(mserd)

The bwselect(mserd) option selects a single MSE-optimal bandwidth for both
sides of the cutoff, that is, the bandwidth that minimizes the mean squared
error of the RD treatment effect estimator. Note that p(4) fits a local
quartic polynomial; a local linear regression corresponds to p(1).
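
To illustrate what a sharp local linear RD estimate does, the Python sketch below fits a kernel-weighted line on each side of the cutoff and takes the difference of the two intercepts at the cutoff. It is a simplified illustration and not what rdrobust computes: the bandwidth is fixed by hand, there is no bias correction and no robust confidence interval, and the data are hypothetical (noise-free, with a jump of 3 at Z = 13). The function names are illustrative.

```python
def local_linear_intercept(points, cutoff, bandwidth):
    """Weighted least-squares fit of y on (z - cutoff) with a triangular kernel.

    Returns the fitted value at the cutoff, i.e. the intercept.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for z, y in points:
        x = z - cutoff
        w = max(0.0, 1.0 - abs(x) / bandwidth)  # triangular kernel weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    return (swy - slope * swx) / sw


def sharp_rd_jump(data, cutoff, bandwidth):
    """Difference between the right and left local linear fits at the cutoff."""
    left = [(z, y) for z, y in data if z < cutoff]
    right = [(z, y) for z, y in data if z >= cutoff]
    return (local_linear_intercept(right, cutoff, bandwidth)
            - local_linear_intercept(left, cutoff, bandwidth))


# Hypothetical noise-free data: y = 1 + 0.5 z, plus a jump of 3 for z >= 13.
data = []
for k in range(100, 161):
    z = k / 10
    data.append((z, 1.0 + 0.5 * z + (3.0 if z >= 13 else 0.0)))

jump = sharp_rd_jump(data, cutoff=13.0, bandwidth=2.0)
print(f"estimated jump at the cutoff: {jump:.4f}")
```

With real data the estimate depends on the bandwidth, which is why a data-driven selector such as mserd is used.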

The first stage, that is, the jump in D at the cutoff, gives the effect on
participation, which is 0.49. The first row reports the conventional CI (not
bias-corrected) and the second row the robust bias-corrected CI for the
estimator. Whether this effect is large depends on the specific context.
To see the bandwidth that the previous regression applied, we can run:

rdbwselect Y Z, c(13) fuzzy(D) bwselect(mserd)

3. Estimate the intention-to-treat effect (i.e., the effect of being encouraged to
take up the treatment) by implementing a sharp RDD estimator on the
outcome “Y” using the same estimation approach. What is the confidence
interval?

rdrobust Y Z , c(13) fuzzy(D) p(4) bwselect(mserd)

We can see in the previous output that the ITT is about 15. As with the first
stage effect, the first row shows the conventional CI and the second row the
bias-corrected CI. This result is broadly consistent with what we see in
the graph.

To check whether the results make sense, we can do the following quick
calculation. The bandwidth is around 2, so we make a rough estimate of
E(Y|Z=c+) - E(Y|Z=c-):
> (Y0 <- aggregate(Y ~ D, subset(takehome_RDD_1_, 11 <= Z & Z <= 13), mean))
  D         Y
1 0  5.429758
2 1 20.838739
> (Y1 <- aggregate(Y ~ D, subset(takehome_RDD_1_, 15 >= Z & Z >= 13), mean))
  D         Y
1 0  5.388968
2 1 20.763499
>
> Y1[2,2] - Y0[1,2]
[1] 15.33374

These calculations are in line with the formal calculations.

4. Estimate the local average treatment effect (i.e., the effect on compliers) by
implementing a fuzzy RDD estimator.
Dividing the ITT by the first-stage effect, we get:
> 15.154/0.49231
[1] 30.78142
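
The same ratio can be written as a small Wald-estimator helper. This Python sketch only repeats the division above with the two point estimates quoted in the text; the function name `wald_ratio` is illustrative, and the sketch says nothing about the standard error of the ratio.

```python
def wald_ratio(itt, first_stage):
    """Fuzzy RD (Wald) estimate: jump in the outcome divided by jump in take-up."""
    if first_stage == 0:
        raise ValueError("first stage is zero: the ratio is undefined")
    return itt / first_stage


itt_estimate = 15.154            # jump in Y at the cutoff (from the output above)
first_stage_estimate = 0.49231   # jump in D at the cutoff (from the output above)

late = wald_ratio(itt_estimate, first_stage_estimate)
print(f"LATE at the cutoff: {late:.5f}")
```

Because the first stage is about one half, the ratio roughly doubles the ITT.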

The fuzzy estimator is about 30.8. Assuming that the treatment effect is
homogeneous at c, the fuzzy estimator captures the ATE at c; assuming
monotonicity, it can be interpreted as the LATE for compliers at c.

However, this result looks too high. We can do another check to see if it makes
sense:

(Y0D <- aggregate(Y ~ D, subset(takehome_RDD_1_, 11 <= Z & Z <= 13),
                  function(column) quantile(column, 0.01)))
  D         Y
1 0 -1.591778
2 1 13.526899

(Y1D <- aggregate(Y ~ D, subset(takehome_RDD_1_, 13 <= Z & Z <= 15),
                  function(column) quantile(column, 0.99)))
  D        Y
1 0 12.27765
2 1 27.64276

> Y1D[2,2] - Y0D[1,2]
[1] 29.23454

The 0.99 percentile of the treated above the cutoff minus the 0.01 percentile
of the controls below it, within the bandwidth, is 29.23, which is lower than
our LATE. So the results for the fuzzy RDD estimator do not look very
plausible at first sight. The confidence intervals of our regressions are too
wide, so our estimates might not be very precise.

5. Add the covariate university degree. Does the point estimate change
substantially? What about the standard errors?
We can run this:

rdrobust Y Z , c(13) fuzzy(D) p(4) bwselect(mserd) covs(university)

We can see that we were overestimating the effect. The sharp RDD estimate
has been reduced to about 14.7, and the SE of the ITT has been reduced
considerably. However, the bias bandwidth has increased. The fuzzy RDD
estimator now gives:
> 14.694/0.49231
[1] 29.84705

6. Let’s now run some validation tests in support of the identifying assumption
of the RDD estimator. Run a density test on “Z” to see if there is a sign of
manipulation, i.e., the units tried to take up/avoid the treatment
encouragement by systematically crossing the threshold.

rddensity Z, plot c(13)

We can see that the density of Z increases as we approach the cutoff and
starts to decrease once we cross it.

This might indicate some imperfect manipulation. We can run these two other
commands to check:

rddensity Z if D==1, plot c(13)

rddensity Z if D==0, plot c(13)

Here the manipulation is much more visible. In the pooled graph, some
effects within treated and controls were offsetting each other.
7. As a further test for manipulation, check if at the same cutoff point the units
change their observable characteristics by implementing a sharp RDD
estimator on the covariate “university” using the estimation approach of
question 2. Show it also visually as in question 1.

rdplot university Z, c(13) binselect(esmv)

We can also see a small jump in university at the cutoff of Z: there is a small
change in observable characteristics after crossing the cutoff.
We can run a sharp RDD estimation:

rdrobust university Z , c(13) p(4) bwselect(mserd)

There is a change of 0.004 in university at the cutoff, which is not a large amount.
But again, the CI is very wide and the SE is large.

# Standardize a variable (z-score)
stvar <- function(x) {
  return((x - mean(x)) / sd(x))
}

before_cutoff <- subset(takehome_RDD_1_, 11 < Z & Z < 13)$university
after_cutoff <- subset(takehome_RDD_1_, 13 < Z & Z < 15)$university

plot(density(stvar(before_cutoff)),
     main = "Density Distribution of university (Before and After the Cutoff)")
lines(density(stvar(after_cutoff)), col = "red")

Probably the best way to assess this is to compare the density of university
for individuals just below and just above the cutoff, within the bandwidth.
We can see that the two densities differ, which suggests that observable
characteristics change when crossing the cutoff.

8. Implement some falsification tests by estimating some sharp RDD
estimators on the outcome “D” and on “Y” but using placebo/fake cutoff
points (c(x) with x!=13). Rely on sharp estimators for both the effect on “D”
and on “Y” (the fuzzy estimator is unreliable since small effects on “D” make
the denominator of the Wald Estimator blow up). Do the falsification tests
pass (use 5% p-value as a rejection threshold)?
We can try the following tests:
rdrobust Y Z , c(12) p(4) bwselect(mserd) covs(university) fuzzy(D)

rdrobust Y Z , c(14) p(4) bwselect(mserd) covs(university) fuzzy(D)

One of the regression coefficients is significant and the other is not.
However, both are less significant than at the true cutoff, and if we chose
placebo cutoffs farther from the true cutoff they would probably become even
less significant. Also, the coefficients are very small: there are no sizeable
jumps at points other than the cutoff.
