Panel Data Regression
i = 1, . . . , n and t = 1, . . . , T,
Selected observations on cigarette sales, prices, and taxes, by state and year
for U.S. states, 1985-1995
Terminology:
A balanced panel is a panel that has all its observations, i.e. the variables are observed for every entity in every time period
(focus of this lecture)
An unbalanced panel is a panel that has some missing data
for at least one time period or for at least one entity
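The distinction can be checked mechanically. The following is a minimal sketch, assuming the data are available as (entity, period) pairs for the non-missing observations; the function name `is_balanced` and the toy observations are hypothetical and only serve as illustration.

```python
def is_balanced(observations):
    """Return True if every entity is observed in every time period.

    `observations` is an iterable of (entity, period) pairs, one pair per
    non-missing observation.
    """
    obs = set(observations)
    entities = {i for i, _ in obs}
    periods = {t for _, t in obs}
    # Balanced: all n * T entity-period combinations are present.
    return len(obs) == len(entities) * len(periods)


# Hypothetical toy panels (not the lecture's datasets).
balanced = [("AL", 1982), ("AL", 1988), ("AZ", 1982), ("AZ", 1988)]
unbalanced = balanced[:-1]  # AZ is missing in 1988

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

The check only counts entity-period pairs; it does not look at whether individual variables are missing within an observed pair.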
Preliminary analysis:
In a first step we focus on the two years 1982 and 1988 and,
for each year, perform an OLS regression of FATALITYRATE on
BEERTAX
The estimated regression equations (neglecting subscripts)
for the 1982 and 1988 data along with the standard errors
(in brackets) are given by
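1982 data:
$\widehat{\mathit{FATALITYRATE}} = \hat\beta_0 + \hat\beta_1\,\mathit{BEERTAX}$   (6.1)
(0.15) (0.13)
1988 data:
$\widehat{\mathit{FATALITYRATE}} = \hat\beta_0 + \hat\beta_1\,\mathit{BEERTAX}$   (6.2)
(0.11) (0.13)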
Problem:
Some of these variables (such as the cultural acceptance of
drinking and driving) might be hard or even impossible to
measure
Possible remedy:
If these factors remain constant over time in a given state,
then we can use the panel data structure to effectively
hold these factors constant even though we cannot measure
them
OLS regression with fixed effects
More explicitly:
Consider the variable $Z_i$ with the following properties:
$Z_i$ determines the fatality rate in the $i$th state
$Z_i$ does not change over time
(no time-subscript $t$)
For example, $Z_i$ could represent the local cultural attitude
towards drinking and driving which changes slowly
(we consider it to be constant between 1982 and 1988)
Regression equation:
$\mathit{FATALITYRATE}_{it} = \beta_0 + \beta_1\,\mathit{BEERTAX}_{it} + \beta_2 Z_i + u_{it}$   (6.3)
with $i = 1, \ldots, n$ and $t = 1, 2$
161
Now:
$Z_i$ does not change over time
$Z_i$ does not produce any change in FATALITYRATE between
1982 and 1988
We eliminate the impact of Zi by analyzing the change in
FATALITYRATE between the two periods
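Written out from Eq. (6.3), the differencing step looks as follows. This is a sketch of the algebra only; the coefficient symbols $\beta_0, \beta_1, \beta_2$ are assumed to be the ones used in Eq. (6.3).

```latex
\begin{align*}
\mathit{FATALITYRATE}_{i,1988} &= \beta_0 + \beta_1\,\mathit{BEERTAX}_{i,1988} + \beta_2 Z_i + u_{i,1988} \\
\mathit{FATALITYRATE}_{i,1982} &= \beta_0 + \beta_1\,\mathit{BEERTAX}_{i,1982} + \beta_2 Z_i + u_{i,1982} \\
\mathit{FATALITYRATE}_{i,1988} - \mathit{FATALITYRATE}_{i,1982}
  &= \beta_1\,(\mathit{BEERTAX}_{i,1988} - \mathit{BEERTAX}_{i,1982}) + u_{i,1988} - u_{i,1982}
\end{align*}
```

Both $\beta_0$ and $\beta_2 Z_i$ cancel in the subtraction, so the unobserved $Z_i$ no longer appears in the equation for the changes.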
More precisely:
Specifying the regression in changes as in Eq. (6.4) eliminates the
effect of the unobserved variables $Z_i$ that are constant over
time
Analyzing changes in Y and X has the effect of controlling
for variables that are constant over time thereby eliminating
this source of omitted variable bias
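A small numerical illustration of this point, as a sketch under stated assumptions: the data below are hypothetical and noise-free (they are not the traffic-fatality data), the coefficient values are made up, and `ols_slope` is an illustrative helper for a simple regression with intercept.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical, noise-free toy data with made-up coefficients.
beta0, beta1, beta2 = 1.0, -1.0, 3.0
z = [1.0, 2.0, 3.0, 4.0]                  # unobserved, constant over time
x_1982 = [2.0 * zi for zi in z]           # regressor correlated with Z
x_1988 = [xi + d for xi, d in zip(x_1982, [0.5, -0.5, 1.0, 0.0])]


def outcome(x, z):
    return [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z)]


y_1982 = outcome(x_1982, z)
y_1988 = outcome(x_1988, z)

# Cross-section regression for a single year: Z is omitted, slope is biased.
slope_cross = ols_slope(x_1982, y_1982)

# Regression in changes: Z drops out, slope recovers beta1.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
slope_diff = ols_slope(dx, dy)

print(slope_cross, slope_diff)
```

In this toy setup the single-year slope is 0.5, so it even has the wrong sign, while the slope from the regression in changes equals the true value of -1.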
Consider a plot of the change in the fatality rate between 1982 and
1988 against the change in the real beer tax between 1982
and 1988 for the 48 U.S. states
Empirical results:
OLS estimation results:
$\widehat{\mathit{FATALITYRATE}_{i1988} - \mathit{FATALITYRATE}_{i1982}} = -0.072 - 1.04\,(\mathit{BEERTAX}_{i1988} - \mathit{BEERTAX}_{i1982})$   (6.5)
(0.065) (0.36)
Intercept in Eq. (6.5) allows for the possibility that the mean
change in the fatality rate, in the absence of a change in the
real beer tax, is nonzero
The negative intercept (-0.072) could reflect improvements
in auto safety from 1982 to 1988 that reduced the average
fatality rate
Remarks: [continued]
If these factors change over time and are correlated with
the real beer tax, then their omission will produce omitted
variable bias
More careful analysis in Section 6.5
Extension of the ideas for T = 2 to more than two
time periods (T > 2)
Method of fixed effects regression
More explicitly:
Consider the regression model (6.3) from Slide 161:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$,   (6.6)
(6.7)
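$D_{2,i} = \begin{cases} 1 & \text{when } i = 2 \\ 0 & \text{otherwise} \end{cases}, \quad \ldots, \quad D_{n,i} = \begin{cases} 1 & \text{when } i = n \\ 0 & \text{otherwise} \end{cases}$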
172
(6.8)
Now:
Extension to multiple X-regressors
Definition 6.2: (Fixed effects regression model)
The fixed effects regression model is
$Y_{it} = \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + u_{it}$,   (6.9)
where $i = 1, \ldots, n$ and $t = 1, \ldots, T$ and $\alpha_1, \ldots, \alpha_n$ are the entity-specific intercepts. Equivalently, the fixed effects regression
model can be written in terms of a common intercept, the X-regressors and the $n - 1$ dummy variables defined on Slide 172:
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \gamma_2 D_{2,i} + \ldots + \gamma_n D_{n,i} + u_{it}$   (6.10)
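$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$
with $\bar{Y}_i = (1/T) \sum_{t=1}^{T} Y_{it}$ and $\bar{X}_i$ and $\bar{u}_i$ similarly defined
It follows from Eq. (6.7) that
$\underbrace{Y_{it} - \bar{Y}_i}_{\tilde{Y}_{it}} = \beta_1 X_{it} + \alpha_i + u_{it} - \beta_1 \bar{X}_i - \alpha_i - \bar{u}_i$
$\qquad = \beta_1 \underbrace{(X_{it} - \bar{X}_i)}_{\tilde{X}_{it}} + \underbrace{(u_{it} - \bar{u}_i)}_{\tilde{u}_{it}}$
$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$   (6.11)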
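The entity-demeaning step behind Eq. (6.11) can be sketched in a few lines. This is an illustration under stated assumptions: a single regressor, hypothetical noise-free toy data with made-up entity intercepts, and helper names (`entity_demean`, `within_slope`) that are not taken from any package.

```python
def entity_demean(values, entities):
    """Subtract each entity's time average from its own observations."""
    totals, counts = {}, {}
    for v, i in zip(values, entities):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, entities)]


def within_slope(x, y, entities):
    """OLS slope of entity-demeaned y on entity-demeaned x (no intercept)."""
    x_t = entity_demean(x, entities)
    y_t = entity_demean(y, entities)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)


# Hypothetical noise-free toy panel: n = 3 entities, T = 3 periods.
entities = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
alpha = {"A": 1.0, "B": 2.0, "C": 4.0}    # entity-specific intercepts
beta1 = -0.5
x = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [beta1 * xi + alpha[i] for xi, i in zip(x, entities)]

beta1_hat = within_slope(x, y, entities)
print(beta1_hat)
```

Because the entity intercepts are removed by the demeaning, the demeaned regression needs no intercept and no dummy variables, and in this noise-free example it returns the slope used to generate the data.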
$\widehat{\mathit{FATALITYRATE}} = -0.66\,\mathit{BEERTAX} + \mathit{StateFixedEffects}$
(0.29)
The sign of $\hat\beta_1$ is negative and the coefficient is significant
at the 5% level
Including state fixed effects avoids omitted variable bias arising from omitted factors that vary across states but are constant over time
What about the effects of omitted factors that evolve over
time but are the same for all states?
(for example, overall automobile safety improvements)
Regression with time fixed effects
(6.12)
(6.13)
(6.14)
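$B_{2,t} = \begin{cases} 1 & \text{when } t = 2 \\ 0 & \text{otherwise} \end{cases}, \quad \ldots, \quad B_{T,t} = \begin{cases} 1 & \text{when } t = T \\ 0 & \text{otherwise} \end{cases}$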
(6.15)
Now:
Combination of entity and time fixed effects
Definition 6.3: (Entity and time fixed effects regression model)
The entity and time fixed effects regression model is
$Y_{it} = \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + \lambda_t + u_{it}$,   (6.16)
where $\alpha_1, \ldots, \alpha_n$ are the entity fixed effects and $\lambda_1, \ldots, \lambda_T$ the time fixed
effects. Equivalently, the entity and time fixed effects regression
model can be written in terms of a common intercept, the X-regressors and the $n - 1$ and $T - 1$ dummy variables defined on
Slides 172, 181:
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it}$
$\qquad + \gamma_2 D_{2,i} + \ldots + \gamma_n D_{n,i}$
$\qquad + \delta_2 B_{2,t} + \ldots + \delta_T B_{T,t} + u_{it}$.   (6.17)
Remark:
The combined entity and time fixed effects regression model
eliminates omitted variable bias arising both from unobserved variables that are constant over time and from unobserved variables that are constant across states
Parameter estimation:
The full model (6.17) can in principle be estimated by OLS
Most software packages implement a two-step algorithm using entity and time-period demeaned Y and X-variables
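The demeaning idea can be sketched for the simplest case. This is an illustration only, not the algorithm of any particular package: it assumes a balanced panel with a single regressor, where subtracting the entity mean and the period mean and adding back the grand mean removes both kinds of fixed effects in one pass. The data are hypothetical and noise-free, and the helper names are made up.

```python
def two_way_demean(v):
    """Entity- and period-demean a balanced panel stored as v[i][t]."""
    n, T = len(v), len(v[0])
    entity_mean = [sum(row) / T for row in v]
    period_mean = [sum(v[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(entity_mean) / n
    return [[v[i][t] - entity_mean[i] - period_mean[t] + grand_mean
             for t in range(T)] for i in range(n)]


def slope_no_intercept(x, y):
    """OLS slope of y on x without intercept, for panels stored as [i][t]."""
    num = sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry))
    den = sum(a * a for rx in x for a in rx)
    return num / den


# Hypothetical noise-free balanced panel: n = 3 entities, T = 4 periods.
n, T = 3, 4
alpha = [1.0, 5.0, 2.0]         # entity fixed effects
lam = [0.0, 3.0, -1.0, 2.0]     # time fixed effects
beta1 = -2.0
x = [[float((i + 1) * (t + 1)) for t in range(T)] for i in range(n)]
y = [[beta1 * x[i][t] + alpha[i] + lam[t] for t in range(T)]
     for i in range(n)]

beta1_hat = slope_no_intercept(two_way_demean(x), two_way_demean(y))
print(beta1_hat)
```

For unbalanced panels this one-pass formula is no longer exact, which is one reason to rely on the estimation routine of a statistics package in practice.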
$\widehat{\mathit{FATALITYRATE}} = -0.64\,\mathit{BEERTAX} + \mathit{StateFixedEffects} + \mathit{TimeFixedEffects}$
(0.36)
i = 1, . . . , n, t = 1, . . . , T.
Remarks:
Definition 6.4 focuses on entity fixed effects regressions neglecting time effects
An extension for including time fixed effects is straightforward
Standard errors for fixed effects regression:
Autocorrelated errors are a pervasive phenomenon in data
with a time component
(see Section 3.1.2. on Slides 48, 49)
In the case of autocorrelated errors, standard errors should
be computed using the HAC estimator of the variance
One type of HAC standard errors is clustered standard errors, which are used for the
traffic-fatality dataset
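To make the idea of clustering concrete, the following sketch computes a cluster-robust standard error for the entity-demeaned regression with a single regressor, using the entities as clusters. It is an illustration under stated assumptions: no finite-sample correction is applied (software packages typically apply one, so their numbers will differ somewhat), the function name `clustered_se_within` is hypothetical, and the toy data are made up.

```python
import math


def clustered_se_within(x, y, entities):
    """Within estimate of the slope and its cluster-robust standard error.

    Clusters are the entities. No finite-sample correction is applied.
    """
    # Collect the observations of each entity.
    groups = {}
    for xi, yi, i in zip(x, y, entities):
        groups.setdefault(i, []).append((xi, yi))

    # Entity-demean x and y.
    demeaned = {}
    for i, obs in groups.items():
        x_bar = sum(o[0] for o in obs) / len(obs)
        y_bar = sum(o[1] for o in obs) / len(obs)
        demeaned[i] = [(xi - x_bar, yi - y_bar) for xi, yi in obs]

    s_xx = sum(xt * xt for obs in demeaned.values() for xt, _ in obs)
    s_xy = sum(xt * yt for obs in demeaned.values() for xt, yt in obs)
    beta_hat = s_xy / s_xx

    # Sum x~ * residual within each entity first, then square the sum.
    # This keeps the cross-products between periods of the same entity,
    # which is what allows for autocorrelated errors within an entity.
    meat = 0.0
    for obs in demeaned.values():
        score = sum(xt * (yt - beta_hat * xt) for xt, yt in obs)
        meat += score ** 2

    return beta_hat, math.sqrt(meat / s_xx ** 2)


# Hypothetical toy panel: n = 3 entities, T = 2 periods.
entities = ["A", "A", "B", "B", "C", "C"]
x = [0.0, 2.0, 0.0, 2.0, 0.0, 4.0]
y = [0.0, 2.0, 0.0, 0.0, 0.0, 4.0]

beta_hat, se = clustered_se_within(x, y, entities)
print(beta_hat, se)
```

With only three clusters the resulting standard error is not reliable in any statistical sense; the example is meant to show the order of operations, not a usable sample size.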