
Panel Data Regression

This document discusses panel data and fixed effects regression models. It begins by defining panel data as observations on the same entities over multiple time periods. This allows researchers to control for omitted variables that differ across entities but remain constant over time through fixed effects models. The document then provides an example comparing traffic fatality rates and beer taxes in US states from 1982-1988. By analyzing changes over time, the fixed effects model controls for cultural factors that impact fatalities but do not change. The empirical results show higher beer taxes significantly reduce fatality rates. In summary, the document introduces panel data, uses a traffic fatalities example to illustrate fixed effects models, and finds higher beer taxes decrease fatality rates when controlling for time-invariant state factors

Uploaded by

Shejj Peter
Copyright
© All Rights Reserved

6. Regression with panel data


Key feature of this section:
Up to now: analysis of data on n distinct entities at a given point in time (cross-sectional data)
Example: student-performance data set
Observations on different schooling characteristics in n = 420 districts (entities)
Now: data structure in which each entity is observed at two or more points in time
Panel data
150

6.1. Structure of panel data sets


Definition 6.1: (Panel data)
Panel data consist of observations on the same n entities at two or more time periods T. If the data set contains observations on the independent variables $X_1, X_2, \ldots, X_k$ and the dependent variable $Y$, then we denote the data by

$$(X_{1,it}, X_{2,it}, \ldots, X_{k,it}, Y_{it}), \qquad i = 1, \ldots, n \text{ and } t = 1, \ldots, T,$$

where the first subscript, $i$, refers to the entity being observed and the second subscript, $t$, refers to the date at which it is observed.
151

Selected observations on cigarette sales, prices, and taxes, by state and year for U.S. states, 1985-1995

152

Terminology:
A balanced panel is a panel that has all its observations
(focus of this lecture)
An unbalanced panel is a panel that has some missing data
for at least one time period or for at least one entity
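
The distinction between balanced and unbalanced panels can be checked mechanically. The following is a minimal sketch, assuming the panel is stored in "long" format as (entity, period, value) tuples; the function name `is_balanced` and the toy values are hypothetical and not taken from the lecture data.

```python
from collections import defaultdict


def is_balanced(rows):
    """Return True if every entity is observed in every time period.

    rows: iterable of (entity, period, value) tuples in long format.
    """
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity, period, _value in rows:
        periods_by_entity[entity].add(period)
        all_periods.add(period)
    # Balanced: each entity's set of periods equals the full set of periods.
    return all(p == all_periods for p in periods_by_entity.values())


# Hypothetical toy panel: n = 2 entities, T = 2 periods (values are made up).
balanced = [("A", 1982, 2.1), ("A", 1988, 1.9),
            ("B", 1982, 2.5), ("B", 1988, 2.3)]
unbalanced = balanced[:-1]  # drop entity B's 1988 observation

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```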

Description of example data set:


Traffic deaths and alcohol taxes
(State Traffic Fatality (STF) data set)
How effective are various government policies designed to
discourage drunk driving in reducing traffic deaths?
153

Description of example data set: [continued]


Annual data for 1982-1988 for 48 U.S. states
(excluding Alaska and Hawaii)
Important variables:
FATALITYRATE is the number of annual traffic deaths per 10,000 people in the population of the state
BEERTAX is the real tax on a case of beer, expressed in 1988 U.S. dollars by adjusting for inflation
Various dummy variables indicating state-specific characteristics such as legal drinking age and punishment

154

Preliminary analysis:
In a first step we focus on the two years 1982 and 1988 and, for each year, perform an OLS regression of FATALITYRATE on BEERTAX
The estimated regression equations (neglecting subscripts) for the 1982 and 1988 data, with standard errors in parentheses below the coefficients, are given by

1982 data:
$$\widehat{\text{FATALITYRATE}} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\,\text{BEERTAX} \qquad (6.1)$$

1988 data:
$$\widehat{\text{FATALITYRATE}} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\,\text{BEERTAX} \qquad (6.2)$$

155

The traffic fatality rate and the tax on beer

156

Preliminary analysis: [continued]


The OLS estimate $\hat{\beta}_1$ for the 1982 data is not significant at the 10% level
(the t-statistic is 1.15 < 1.64)
The OLS estimate $\hat{\beta}_1$ for the 1988 data is significant at the 1% level
(the t-statistic is 3.43 > 2.58)
Both OLS estimates are positive, which, taken literally, implies that higher real beer taxes are associated with more (not fewer) traffic fatalities
Indication of substantial omitted variable bias
157

Preliminary analysis: [continued]


Some potentially neglected state-specific factors:
Quality of automobiles driven in the state
Quality of state highways
Rural versus urban driving
Density of cars on the road
Cultural acceptance of drinking and driving

158

Problem:
Some of these variables (such as the cultural acceptance of
drinking and driving) might be hard or even impossible to
measure

Possible remedy:
If these factors remain constant over time in a given state, then we can use the panel data structure to effectively hold these factors constant even though we cannot measure them
OLS regression with fixed effects
159

6.2. Panel data with two time periods: before-and-after comparisons


Aim of this section:
Provision of intuition on how we can exploit the panel data
structure to mitigate the omitted-variable-bias problem
Approach:
We consider a panel with T = 2 time periods
We focus on changes in the dependent variable
This before-and-after comparison holds constant the unobserved factors that differ from one state to the next but do
not change over time within the state
160

More explicitly:
Consider the variable $Z_i$ with the following properties:
$Z_i$ determines the fatality rate in the ith state
$Z_i$ does not change over time
(no time subscript t)
For example, $Z_i$ could represent the local cultural attitude towards drinking and driving, which changes slowly
(we consider it to be constant between 1982 and 1988)
Regression equation:

$$\text{FATALITYRATE}_{it} = \beta_0 + \beta_1 \text{BEERTAX}_{it} + \beta_2 Z_i + u_{it} \qquad (6.3)$$

with $i = 1, \ldots, n$ and $t = 1, 2$
161

Now:
$Z_i$ does not change over time
$Z_i$ does not produce any change in FATALITYRATE between 1982 and 1988
We eliminate the impact of $Z_i$ by analyzing the change in FATALITYRATE between the two periods

Derivation of the change:

Regression equations for each time period:
$$\text{FATALITYRATE}_{i1982} = \beta_0 + \beta_1 \text{BEERTAX}_{i1982} + \beta_2 Z_i + u_{i1982}$$
$$\text{FATALITYRATE}_{i1988} = \beta_0 + \beta_1 \text{BEERTAX}_{i1988} + \beta_2 Z_i + u_{i1988}$$
162

Derivation of the change: [continued]


Subtracting the 1982 equation from the 1988 equation:
$$\text{FATALITYRATE}_{i1988} - \text{FATALITYRATE}_{i1982} = \beta_1 (\text{BEERTAX}_{i1988} - \text{BEERTAX}_{i1982}) + u_{i1988} - u_{i1982} \qquad (6.4)$$
Interpretation of Eq. (6.4):
$Z_i$ does not change between 1982 and 1988
Any changes in traffic fatalities over time must have arisen from other sources
These changes are
changes in the tax on beer
changes in the error terms
(capturing changes in other factors that affect traffic deaths)
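
To make the before-and-after idea concrete, the following sketch uses hypothetical, noise-free data generated from the structure of Eq. (6.3), with an unobserved, time-constant variable that is positively related to the regressor. All numbers and the helper name `ols_slope` are assumptions for illustration, not the State Traffic Fatality data. The cross-sectional regression for one period picks up the omitted variable, while the regression in changes recovers the true slope.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical parameters: Y_it = beta0 + beta1 * X_it + beta2 * Z_i (no noise)
beta0, beta1, beta2 = 2.0, -1.0, 3.0
z = [0.0, 1.0, 2.0, 3.0]     # unobserved, constant over time
x1 = [0.5, 1.0, 1.5, 2.0]    # period-1 regressor, rises with z
x2 = [0.9, 1.2, 2.0, 2.1]    # period-2 regressor
y1 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x1, z)]
y2 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x2, z)]

# Cross-sectional regression for period 1 with Z omitted: biased slope.
slope_levels = ols_slope(x1, y1)

# Regression of the change in Y on the change in X: Z drops out.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
slope_changes = ols_slope(dx, dy)

print(round(slope_levels, 6))   # 5.0  (wrong sign relative to beta1)
print(round(slope_changes, 6))  # -1.0 (equals beta1)
```

In this constructed example the omitted variable flips the sign of the cross-sectional slope, which mirrors the positive coefficients found in the preliminary analysis.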
163

More precisely:
Specifying the regression in changes, as in Eq. (6.4), eliminates the effect of the unobserved variables $Z_i$ that are constant over time
Analyzing changes in Y and X has the effect of controlling for variables that are constant over time, thereby eliminating this source of omitted variable bias
Consider a plot of the change in the fatality rate between 1982 and 1988 against the change in the real beer tax between 1982 and 1988 for the 48 U.S. states

164

Changes in fatality rates and beer taxes, 1982-1988

165

Empirical results:
OLS estimation results (standard errors in parentheses below the coefficients):

$$\widehat{\text{FATALITYRATE}_{i1988} - \text{FATALITYRATE}_{i1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(\text{BEERTAX}_{i1988} - \text{BEERTAX}_{i1982}) \qquad (6.5)$$

The intercept in Eq. (6.5) allows for the possibility that the mean change in the fatality rate, in the absence of a change in the real beer tax, is nonzero
The negative intercept (-0.072) could reflect improvements in auto safety from 1982 to 1988 that reduced the average fatality rate
166

Empirical results: [continued]


Estimated effect of a change in the real beer tax is negative
(as predicted by economic theory)
OLS slope coefficient of -1.04 is significant at the 1% level
(the absolute value of the t-statistic is 2.89 > 2.58)
An increase in the real beer tax by 1 dollar per case reduces the traffic fatality rate by 1.04 deaths per 10,000 people
(substantial effect)
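
The significance statement can be retraced with a few lines of code. This is a small sketch, assuming two-sided tests with large-sample normal critical values; 1.64 and 2.58 are the values quoted on the slides, 1.96 is the standard 5% value, and the helper names are hypothetical.

```python
def t_statistic(coef, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def significance(t, critical=(("1%", 2.58), ("5%", 1.96), ("10%", 1.64))):
    """Strictest level at which |t| exceeds the two-sided critical value."""
    for level, cv in critical:
        if abs(t) > cv:
            return level
    return "not significant at 10%"


# Slope and standard error as reported for the regression in changes.
t = t_statistic(-1.04, 0.36)
print(round(t, 2), significance(t))  # -2.89 1%
```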
Remarks:
The regression Eq. (6.5) controls for fixed factors such as
cultural attitudes towards drinking and driving
There are other factors influencing traffic safety
167

Remarks: [continued]
If these factors change over time and are correlated with
the real beer tax, then their omission will produce omitted
variable bias
More careful analysis in Section 6.5
Transference of the ideas valid for T = 2 to more than 2
time periods (T > 2)
Method of fixed effects regression

168

6.3. Fixed effects regression


Now:
Method for controlling for omitted variables in panel data
when the omitted variables vary across entities but do not
change over time
The fixed effects regression model has n different intercepts,
one for each entity
These intercepts can be represented by a set of binary variables
These binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time
169

More explicitly:
Consider the regression model (6.3) from Slide 161:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \qquad (6.6)$$

where $Z_i$ is an unobserved variable that varies from one state to the next but does not change over time
(for example, $Z_i$ represents cultural attitudes toward drinking and driving)
We aim at estimating $\beta_1$, the effect on Y of X holding constant the unobserved state characteristic Z
We can interpret Eq. (6.6) as having n intercepts, one for each entity
170

More explicitly: [continued]


Specifically, define $\alpha_i = \beta_0 + \beta_2 Z_i$, so that Eq. (6.6) becomes

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it} \qquad (6.7)$$

$\alpha_1, \ldots, \alpha_n$ are treated as state-specific intercepts to be estimated
Population regression line for the ith state: $Y_{it} = \alpha_i + \beta_1 X_{it}$
The slope coefficient $\beta_1$ is the same for all states, but the intercept $\alpha_i$ varies from one state to the next
The intercept $\alpha_i$ can be thought of as the effect of being in entity i
171

More explicitly: [continued]


The terms $\alpha_1, \ldots, \alpha_n$ are known as entity fixed effects
The variation in the entity fixed effects comes from omitted variables (like $Z_i$ in Eq. (6.6)) that vary across entities but not over time
Eq. (6.7) is known as the fixed effects regression model
Representation with dummy variables:
Consider the $n - 1$ dummy variables

$$D_{2,i} = \begin{cases} 1 & \text{when } i = 2 \\ 0 & \text{otherwise} \end{cases}, \quad \ldots, \quad D_{n,i} = \begin{cases} 1 & \text{when } i = n \\ 0 & \text{otherwise} \end{cases}$$
172

Representation with dummy variables: [continued]


Then, the fixed effects regression model (6.7) can be equivalently expressed as

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D_{2,i} + \ldots + \gamma_n D_{n,i} + u_{it}, \qquad (6.8)$$

where $\beta_0, \beta_1, \gamma_2, \ldots, \gamma_n$ are coefficients to be estimated
Relationships between parameters in Eqs. (6.7) and (6.8):
$$\alpha_1 = \beta_0, \quad \alpha_2 = \beta_0 + \gamma_2, \quad \ldots, \quad \alpha_n = \beta_0 + \gamma_n$$
The entity-specific intercepts in Eq. (6.7) and the binary regressors in Eq. (6.8) have the same source, namely the unobserved variable $Z_i$ that varies across entities but not over time
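
The link between the entity-specific intercepts in Eq. (6.7) and the dummy-variable coefficients in Eq. (6.8) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the panel is hypothetical and noise-free (n = 3, T = 3), OLS is computed from the normal equations with a small hand-written solver, and the names `solve`, `ols`, `g2` and `g3` (the coefficients on the two dummies) are not from the slides.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical entity intercepts and common slope.
alpha = [1.0, 2.5, 4.0]
beta1 = -0.5
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0], [2.0, 2.5, 6.0]]

rows, y = [], []
for i in range(3):
    for xit in x[i]:
        # Regressors: constant, X_it, dummy for entity 2, dummy for entity 3
        # (entity 1 is the base category, so it gets no dummy).
        rows.append([1.0, xit, float(i == 1), float(i == 2)])
        y.append(alpha[i] + beta1 * xit)

b0, b1, g2, g3 = ols(rows, y)
# Recover the entity intercepts: first = b0, others = b0 + dummy coefficient.
print([round(v, 6) for v in (b0, b0 + g2, b0 + g3)])  # [1.0, 2.5, 4.0]
print(round(b1, 6))                                   # -0.5
```

Only n - 1 dummies are included because a full set of n dummies together with the common intercept would be perfectly multicollinear.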
173

Now:
Extension to multiple X-regressors
Definition 6.2: (Fixed effects regression model)
The fixed effects regression model is

$$Y_{it} = \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + u_{it}, \qquad (6.9)$$

where $i = 1, \ldots, n$ and $t = 1, \ldots, T$ and $\alpha_1, \ldots, \alpha_n$ are the entity-specific intercepts. Equivalently, the fixed effects regression model can be written in terms of a common intercept, the X-regressors and the $n - 1$ dummy variables defined on Slide 172:

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \gamma_2 D_{2,i} + \ldots + \gamma_n D_{n,i} + u_{it}. \qquad (6.10)$$
174

Estimation and inference:


In principle, the binary variable specification (6.10) can be estimated via OLS
However, specification (6.10) requires estimation of k + n parameters, which becomes problematic if the number of entities n is large
Use of special routines for OLS estimation of fixed effects regressions
(Two-step) entity-demeaned OLS algorithm
Subtract the entity-specific averages from each variable
Perform OLS regression using the entity-demeaned variables
175

Estimation and inference: [continued]


Example:

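Consider the (single-regressor) fixed effects model (6.7)
Taking (time) averages on both sides of (6.7) yields

$$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$$

with $\bar{Y}_i = (1/T) \sum_{t=1}^{T} Y_{it}$ and $\bar{X}_i$ and $\bar{u}_i$ similarly defined
It follows from Eq. (6.7) that

$$\begin{aligned}
\underbrace{Y_{it} - \bar{Y}_i}_{\tilde{Y}_{it}} &= \beta_1 X_{it} + \alpha_i + u_{it} - \beta_1 \bar{X}_i - \alpha_i - \bar{u}_i \\
&= \beta_1 \underbrace{(X_{it} - \bar{X}_i)}_{\tilde{X}_{it}} + \underbrace{(u_{it} - \bar{u}_i)}_{\tilde{u}_{it}} \\
&= \beta_1 \tilde{X}_{it} + \tilde{u}_{it}
\end{aligned} \qquad (6.11)$$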
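
The two-step entity-demeaned algorithm behind Eq. (6.11) can be sketched in a few lines. The example is a minimal illustration, assuming a single regressor and a hypothetical noise-free panel stored as a dictionary from entity to a list of (x, y) pairs; the function name `within_slope` is an assumption, not a routine named in the lecture.

```python
def within_slope(panel):
    """Entity-demeaned OLS slope for a single regressor.

    panel: dict mapping entity -> list of (x, y) pairs over time.
    """
    sxy = sxx = 0.0
    for obs in panel.values():
        T = len(obs)
        # Step 1: entity-specific time averages.
        x_bar = sum(x for x, _ in obs) / T
        y_bar = sum(y for _, y in obs) / T
        # Step 2: accumulate the OLS sums for the demeaned variables.
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    # The demeaned variables have mean zero, so no intercept is needed.
    return sxy / sxx


# Hypothetical panel: Y_it = alpha_i - 0.5 * X_it with alpha = (1.0, 2.5, 4.0)
alpha = {"s1": 1.0, "s2": 2.5, "s3": 4.0}
xs = {"s1": [1.0, 2.0, 4.0], "s2": [0.0, 3.0, 5.0], "s3": [2.0, 2.5, 6.0]}
panel = {s: [(x, alpha[s] - 0.5 * x) for x in xs[s]] for s in xs}

print(round(within_slope(panel), 6))  # -0.5
```

The entity intercepts never have to be estimated explicitly, which is why this approach scales to a large number of entities n.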
176

Estimation and inference: [continued]


Example: [continued]
Estimation of $\beta_1$ in Eq. (6.11) via OLS
Under certain assumptions stated on Slide 187 (the so-called
fixed effects regression assumptions)
the sampling distribution of the OLS estimator is normal
in large samples
the variance and the standard error of the sampling distribution can be estimated from the data
Hypothesis testing (based on t- and F -statistics) and construction of confidence intervals in exactly the same way
as in multiple regressions with cross-sectional data
177

Application to traffic deaths:


OLS estimate of the fixed effects regression based on all T = 7 years of data (standard error in parentheses below the coefficient) is

$$\widehat{\text{FATALITYRATE}} = \underset{(0.29)}{-0.66}\,\text{BEERTAX} + \text{StateFixedEffects}$$

The sign of $\hat{\beta}_1$ is negative and the coefficient is significant at the 5% level
Including state fixed effects avoids omitted variable bias arising from omitted factors that vary across states but are constant over time
What about the effects of omitted factors that evolve over
time but are the same for all states?
(for example, overall automobile safety improvements)
Regression with time fixed effects
178

6.4. Regression with time fixed effects


Now:
We aim at controlling for variables that are constant across
entities but evolve over time
(such as overall safety improvements in new cars)
To this end, we augment our regression Eq. (6.6) from Slide 170 to take the form

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}, \qquad (6.12)$$

where $S_t$ is an unobserved variable (representing automobile safety) that changes over time but is constant across states
Note that omitting $S_t$ from the regression may lead to omitted variable bias
179

Time effects only:


Let us assume for the moment that the variables $Z_i$ are not present, so that Eq. (6.12) becomes

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_3 S_t + u_{it} \qquad (6.13)$$

Similar to the entity fixed effects model, it is possible to eliminate $S_t$ from Eq. (6.13)
Specifically, we set $\lambda_t = \beta_0 + \beta_3 S_t$ to obtain

$$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it} \qquad (6.14)$$

This model has a different intercept, $\lambda_t$, for each time period, which can be thought of as the effect on Y of time period t
$\lambda_1, \ldots, \lambda_T$ are known as time fixed effects whose variation stems from omitted variables (like $S_t$) that vary over time but not across entities
180

Time effects only: [continued]


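Considering the $T - 1$ binary variables

$$B_{2,t} = \begin{cases} 1 & \text{when } t = 2 \\ 0 & \text{otherwise} \end{cases}, \quad \ldots, \quad B_{T,t} = \begin{cases} 1 & \text{when } t = T \\ 0 & \text{otherwise} \end{cases}$$

we can equivalently express model (6.14) as

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B_{2,t} + \ldots + \delta_T B_{T,t} + u_{it}, \qquad (6.15)$$

where $\beta_0, \beta_1, \delta_2, \ldots, \delta_T$ are coefficients to be estimated
Relationships between parameters in Eqs. (6.14) and (6.15):
$$\lambda_1 = \beta_0, \quad \lambda_2 = \beta_0 + \delta_2, \quad \ldots, \quad \lambda_T = \beta_0 + \delta_T$$
(see Eqs. (6.7) and (6.8) on Slides 171, 173)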
181

Now:
Combination of entity and time fixed effects
Definition 6.3: (Entity and time fixed effects regression model)
The entity and time fixed effects regression model is

$$Y_{it} = \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + \lambda_t + u_{it}, \qquad (6.16)$$

where $\alpha_1, \ldots, \alpha_n$ are the entity fixed effects and $\lambda_1, \ldots, \lambda_T$ are the time fixed effects. Equivalently, the entity and time fixed effects regression model can be written in terms of a common intercept, the X-regressors and the $n - 1$ and $T - 1$ dummy variables defined on Slides 172, 181:

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \gamma_2 D_{2,i} + \ldots + \gamma_n D_{n,i} + \delta_2 B_{2,t} + \ldots + \delta_T B_{T,t} + u_{it}. \qquad (6.17)$$
182

Remark:
The combined entity and time fixed effects regression model eliminates omitted variable bias arising both from unobserved variables that are constant over time and from unobserved variables that are constant across states

Parameter estimation:
The full model (6.17) can in principle be estimated by OLS
Most software packages implement a two-step algorithm using entity and time-period demeaned Y and X-variables
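
For a balanced panel with a single regressor, the demeaning approach with both entity and time effects can be sketched as follows. This is an illustration under stated assumptions, not the routine of any particular software package: each variable has its entity mean and its time-period mean subtracted and the overall mean added back, and OLS is then run on the transformed variables. The data and the function name `two_way_within_slope` are hypothetical.

```python
def two_way_within_slope(x, y):
    """Slope from OLS on entity- and time-demeaned data (balanced panel).

    x, y: n-by-T nested lists with x[i][t] and y[i][t].
    """
    n, T = len(x), len(x[0])

    def demean(v):
        row = [sum(v[i]) / T for i in range(n)]                      # entity means
        col = [sum(v[i][t] for i in range(n)) / n for t in range(T)]  # period means
        grand = sum(row) / n                                          # overall mean
        return [[v[i][t] - row[i] - col[t] + grand for t in range(T)]
                for i in range(n)]

    xd, yd = demean(x), demean(y)
    sxy = sum(xd[i][t] * yd[i][t] for i in range(n) for t in range(T))
    sxx = sum(xd[i][t] ** 2 for i in range(n) for t in range(T))
    return sxy / sxx


# Hypothetical noise-free panel: Y_it = -0.5 * X_it + alpha_i + lambda_t
alpha = [1.0, 2.5, 4.0]    # entity effects
lam = [0.0, -0.3, -0.8]    # time effects (e.g. a common downward trend)
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0], [2.0, 2.5, 6.0]]
y = [[-0.5 * x[i][t] + alpha[i] + lam[t] for t in range(3)] for i in range(3)]

print(round(two_way_within_slope(x, y), 6))  # -0.5
```

The simple double-demeaning shown here relies on the panel being balanced; with missing observations the transformation is more involved.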

183

Application to traffic deaths:


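OLS estimate of the entity and time fixed effects regression (standard error in parentheses below the coefficient):

$$\widehat{\text{FATALITYRATE}} = \underset{(0.36)}{-0.64}\,\text{BEERTAX} + \text{StateFixedEffects} + \text{TimeFixedEffects}$$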

This specification includes


47 state binary variables (state fixed effects, not reported)
6 single-year binary variables (time fixed effects, not reported)
the variable BEERTAX
the intercept (not reported)
184

Application to traffic deaths: [continued]


Time fixed effects have little impact on beer tax coefficient
(cf. regression estimation on Slide 178)
Coefficient is significant at the 10% level
(but not at the 5% level; t-statistic is -0.64/0.36 = -1.78)
Estimation is immune to omitted variable bias from variables
that are constant either over time or across states
However, other relevant but omitted variables may vary both
across states and over time
Specification might still be subject to omitted variable bias
More careful analysis of the dataset
see class

185

6.5. The fixed effects regression assumptions and standard errors for fixed effects regression
Aim of this section:
Formulation of OLS assumptions of the fixed effects regression model so that Theorem 2.4 on Slide 19 holds for the
involved OLS estimators
(especially the asymptotic normal distribution when n is large)
Some comments on the standard errors for fixed effects regressions

186

Definition 6.4: (Fixed effects regression assumptions)


We consider the fixed effects regression model

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T.$$

The following are called the fixed effects regression assumptions:

1. $u_{it}$ has conditional mean zero: $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.
2. $(X_{i1}, X_{i2}, \ldots, X_{iT}, u_{i1}, u_{i2}, \ldots, u_{iT})$, $i = 1, \ldots, n$, are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: $X_{it}$ and $u_{it}$ have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
For multiple regressors, $X_{it}$ should be replaced by the full list $X_{1,it}, X_{2,it}, \ldots, X_{k,it}$.
187

Remarks:
Definition 6.4 focuses on entity fixed effects regressions neglecting time effects
An extension for including time fixed effects is straightforward
Standard errors for fixed effects regression:
Autocorrelated errors are a pervasive phenomenon in data
with a time component
(see Section 3.1.2. on Slides 48, 49)
In the case of autocorrelated errors standard errors should
be computed using the HAC estimator of the variance
One type of HAC standard errors is clustered standard errors, which are used for the traffic-fatality dataset
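
As a rough illustration of what clustering by entity does, the sketch below computes the entity-demeaned slope for a single regressor together with a cluster-robust standard error. It rests on several assumptions that are not stated on the slides: clusters are the entities, the products of demeaned regressor and residual are summed within each entity before squaring, and no finite-sample correction is applied (software packages typically apply one, so their numbers will differ). The toy data and the function name `within_fit_clustered` are hypothetical.

```python
import math


def within_fit_clustered(panel):
    """Within slope and a cluster-robust (by entity) standard error.

    panel: dict mapping entity -> list of (x, y) pairs over time.
    No finite-sample correction is applied (an assumption of this sketch).
    """
    demeaned = {}
    for key, obs in panel.items():
        T = len(obs)
        x_bar = sum(x for x, _ in obs) / T
        y_bar = sum(y for _, y in obs) / T
        demeaned[key] = [(x - x_bar, y - y_bar) for x, y in obs]

    sxx = sum(x * x for obs in demeaned.values() for x, _ in obs)
    sxy = sum(x * y for obs in demeaned.values() for x, y in obs)
    b = sxy / sxx

    # Sum x * residual within each entity first, then square the sum:
    # this allows the errors to be correlated over time within an entity.
    meat = sum(sum(x * (y - b * x) for x, y in obs) ** 2
               for obs in demeaned.values())
    se = math.sqrt(meat) / sxx
    return b, se


# Hypothetical toy panel with two entities and two periods each.
toy = {"A": [(0.0, 0.0), (2.0, 2.0)], "B": [(0.0, 0.0), (2.0, 0.0)]}
b, se = within_fit_clustered(toy)
print(round(b, 3), round(se, 3))  # 0.5 0.354
```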
188
