Working Paper Series
Faculty of Finance
No. 11
Fixed-effects in Empirical Accounting
Research
Eli Amir, Jose M. Carabias, Jonathan Jona, Gilad Livne
Fixed-effects in Empirical Accounting Research
Eli Amir
Tel Aviv University and City University of London
Jose M. Carabias
London School of Economics and Political Science
Jonathan Jona
University of Melbourne
Gilad Livne
University of Exeter Business School
21 July 2015
Comments welcome
Abstract
The fixed-effects specification is often used in panel datasets as a way of dealing with correlated
omitted variables. A review of recent accounting publications reveals that while researchers are
generally aware of the need to include fixed-effects in empirical models when using panel
datasets (firm-time observations), many choose to replace firm fixed-effects with other forms of
fixed-effects, mainly industry fixed-effects. We examine the consequences of using different
specifications of fixed-effects and show analytically and using simulations that this can lead to
biased estimates and wrong inferences. To illustrate the importance of properly including firm
fixed-effects, we reexamine commonly used regression models in the accounting literature. We
show how inferences change when fixed-effects are properly included. We call for more
careful consideration of the use of fixed-effects specifications.
Address correspondence to Eli Amir at eliamir@post.tau.ac.il. We would like to thank Yakov
Amihud, Eti Einhorn, Joanne Horton, Fani Kalogirou and seminar participants at the University
of Exeter Business School and Tel Aviv University for helpful comments.
1. Introduction
Many empirical studies analyze panel datasets, where both cross-section and time-series
observations are pooled together to obtain a larger and more powerful sample. The advantage of
panel datasets is that they allow the investigation of relations of interest both in the cross section and
over time. However, researchers examining panel data in accounting should carefully account for the
underlying statistical properties of the cross-section data, as well as of the time series. The
challenge is that such properties are not easily observed.
Petersen (2009) argues that while researchers often use panel datasets, the coefficients’
standard errors are often wrongly estimated. Also, the methods used by researchers to correct
for possible biases in standard errors vary widely, but are often wrong. To address this concern,
he proposes clustering standard errors at the firm level, time dimension or both, when
appropriate. However, he finds that clustering standard errors by both firm and time appears
unnecessary in the finance applications he considers. Gow et al. (2010) review and evaluate the
procedures commonly used in estimating standard errors in accounting research and show that
in the presence of both serial and cross-sectional dependence, existing methods often produce
misspecified test statistics.
Both Petersen (2009) and Gow et al. (2010) provide important evidence on significant
biases in standard errors and their implications for test statistics. However, both studies assume
that the models estimated by researchers are well-specified and the coefficients themselves are
not subject to the bias caused by correlated omitted variables. Both studies focus on the problem
of correlations among the error terms. We address a different problem than that addressed by
Petersen (2009) and Gow et al. (2010); namely the potential bias in the estimated coefficients
due to misspecified regression models. In particular, our study complements Petersen (2009)
and Gow et al. (2010) by investigating instead the effect of model misspecification (biased and
inconsistent parameters) and its implications for statistical inference. Similar to Petersen (2009)
and Gow et al. (2010) we highlight the concern of incorrect statistical inferences. However,
while Petersen (2009) and Gow et al. (2010) consider bias in the denominator of the test
statistic (standard errors), we consider the possible bias both in the numerator of the test
statistic (the coefficient estimate) and its denominator (the standard error).
To control for unobserved firm and time effects, researchers choose between two models.
The first one, which has been widely used by accounting researchers, is the fixed-effects model.
The maintained assumption under this model is that the unobserved firm and time effects are
correlated with the main explanatory variables. This is a reasonable assumption because
financial information reflects firm characteristics, which are often time-invariant.
An alternative model that is rarely used in accounting research, but is popular in other
disciplines, when panel datasets are used, is the random-effects model.1 In this specification the
unobserved firm and time effects are assumed uncorrelated with the explanatory variables. In
datasets with firms being the units of analysis, the random-effects method estimates an
unobserved effect that is drawn from an i.i.d. normal distribution, and is independent of the
error term (Greene, 2003; Wooldridge, 2007). Consequently, the random-effects model
consumes fewer degrees of freedom relative to the fixed-effects model. When the researcher
believes that the model specification does not suffer from a correlated omitted variable problem,
then the random-effect model is the preferred specification, because it will produce unbiased
slope estimates and more efficient standard errors (Greene, 2003; Wooldridge, 2007). However,
1
Mundlak (1978) shows that the random-effects model is, in fact, a special case of the fixed-effects model.
in the presence of correlated omitted variables, which are time-invariant, the fixed-effects model
is preferred. Our study deals only with the empirical consequences of using the fixed-effects
model.
The maintained assumption under the fixed-effects model is that both dimensions (firm
and time) of the panel are correlated with the main regressors. Hence the researcher should
include both time and firm controls. Nevertheless, in many accounting studies researchers
replace firm fixed-effects with industry fixed-effects. This replacement could lead to biased and
inconsistent estimates, and hence incorrect inferences, when firm fixed-effects are correlated
with the main variables of interest.
To assess the significance of this problem in accounting research, we reviewed articles
published in six accounting journals over the period 2006-2013: Journal of Accounting and
Economics, Journal of Accounting Research, The Accounting Review, Review of Accounting
Studies, Contemporary Accounting Research, and European Accounting Review. Out of 1,842
articles, 1,152 (62.5%) can be classified as empirical studies (see Table 1, Panel A). While many
of these empirical articles use more than one empirical methodology, 933 articles (50.6%) use
some form of pooled regressions (see Table 1, Panel B). Many of these "pooled" studies use
some form of fixed-effects. The most common fixed-effects specifications used in these
regressions are time and industry. Surprisingly, only 114 articles out of the 933 (12.2%) use firm
fixed-effects, 75 of which appeared in the most recent three years. It seems that many researchers prefer using
industry instead of firm fixed-effects.
One plausible reason for researchers’ pervasive use of industry fixed-effects instead of
firm fixed-effects is the belief that industry fixed-effects are sufficient and would not lead to
incorrect inferences. That is, the researcher believes that within-industry variation of the
variable of interest is negligible. Another reason may be that one or more of the explanatory
variables are time-invariant, which obviates the need to control for firm fixed-effects. A third
possible explanation is that when empirical findings are not robust to the inclusion of firm fixed-effects, the researcher may not know which set of results is more credible and opts to report
results rather than no-results.
The purpose of this study is to examine the implications of model misspecification due to
omitted unobserved fixed-effects in accounting settings. We start by analytically identifying the
reasons why failing to control for firm fixed-effects could lead to wrong inferences. These
include the correlation between the omitted fixed-effects and the included explanatory variables,
the use of incorrect degrees of freedom, and a biased estimate of the included coefficients’ standard
error. Next, we use simulations of a simple model in which the main variable of interest is
moderately correlated with the unobserved firm fixed-effect. Even with a relatively moderate
correlation, we find that replacing firm fixed-effects with industry fixed-effects could lead to
substantially wrong inferences. This is because the inclusion of industry fixed-effects does not
fully remove cross-sectional correlations between the omitted firm fixed-effects and the
independent variables. These simulations also reveal that the distribution of the underlying
coefficient of interest measured from OLS regressions with industry and time fixed-effects does
not overlap with the distribution of this coefficient when measured from OLS regressions
featuring both firm and time fixed-effects. This “disconnect” in the distributions indicates the
severity of the inference problem that could result from failing to properly control for these
fixed-effects. Specifically, omitting fixed-effects results in a much higher rate of rejection of the
null (a Type I error). However, the severity of the problem diminishes as the number of
industries increases, or the number of firms per industry decreases. In general, however,
ignoring the within-industry variation in the main explanatory variables results in biased
estimated coefficients and wrong inferences. Furthermore, we show that the bias caused by
omitting firm fixed-effects decreases as the variance of the main regressor increases. The
intuition is that firm fixed-effects become more crucial as the main regressor is more time-invariant.
We then examine the sensitivity of commonly used capital markets-based accounting
regressions to the omission of firm fixed-effects. First, we select Basu's (1997) regression of the
asymmetric response of earnings to good and bad news. We select this regression because the
relation between earnings and stock returns may vary across firms and over time. Ball et al.
(2013) also argue that this is indeed the case because the relation between expected earnings and
expected stock returns varies cross-sectionally. Another, not mutually exclusive, explanation is
that the process of impounding economic news into earnings is firm-specific due to corporate
governance mechanisms, internal controls, or relationships with the auditor, all of which are
quite stable over time. The crucial factor, however, is that the researcher either cannot observe
the underlying mechanisms, or cannot collect the full set of data, and hence needs to consider
this in designing the research strategy. With similar motivation in mind, we also select the
predictive regressions of accruals and cash flow components of earnings as explanatory
variables for future earnings (Sloan, 1996).
In addition to the fixed-effects specifications, we also examine a number of alternative
specifications, which have been used in the literature. These include (i) first differencing of both
the dependent and independent variables (full differencing), (ii) using first differences of only
the dependent variable but not the independent variable, (iii) replacing the vectors of the
dependent and independent variables by their means and estimating a single cross-sectional
regression, (iv) demeaning only the dependent variable, and (v) using the Fama and MacBeth
(1973) estimation approach, namely estimating the coefficient in question by averaging the
coefficients and standard errors obtained from periodical cross-sectional regressions.
Of these alternative specifications, only the full differencing specification yields an
overlapping distribution with that of the correct specification (that includes both firm and time
fixed-effects). However, this specification is somewhat less efficient, as the distribution of this
alternative method is more dispersed than the distribution obtained from the correct regression
specification. This highlights that incorrect inferences under the full differencing method are
more likely than under the correct model.
Our replication of the two studies, using firm fixed-effects, reveals that the magnitude
and significance of the main coefficients of interest are quite different from what is obtained
under the original specifications. Although these replications still yield coefficients that are
reliably different from zero, the differences in magnitude suggest that the strength of the
underlying economic phenomenon could be substantially weaker. For example, our fixed-effects
estimate of Basu’s coefficient of timely loss recognition is 40-50% lower than the estimate
obtained under the Basu (1997) estimation procedure, depending on the estimation period. We
also obtain qualitatively similar results when we replicate Sloan’s (1996) study.
With respect to Basu's (1997) regression, Ball et al. (2013) also acknowledge the need to
control for fixed-effects. However, they use an alternative approach; they suggest that
demeaning the dependent variable is equivalent to using firm fixed-effects. We show
analytically that their procedure is not equivalent to using firm fixed-effects and that their
approach is not free from bias. In fact, their approach tends to understate the magnitude of the
true coefficient. Furthermore, we show empirically that the magnitude and the standard error of
the slope coefficient on negative stock returns in the Basu (1997) model are smaller using the
Ball et al. (2013) approach than when firm and time fixed-effects are included. Nevertheless, the
Basu (1997) results hold, albeit with lower magnitude and significance, which is broadly
consistent with Ball et al. (2013).2
It is important to note that including firm fixed-effects does not solve all correlated
omitted variable problems. In particular, it does not solve the problem of omitting a correlated
variable that varies across time. However, in such cases including firm and time fixed-effects
would not exacerbate the underlying problem. We therefore recommend the use of firm and
time fixed-effects because there is no harm in doing so. Including firm and time fixed-effects
when this is not necessary would nevertheless yield correct inferences, but excluding firm or
time fixed-effects would lead to incorrect inferences when they are correlated with the
explanatory variables. If including firm fixed-effects is not feasible, then the “second-best”
approach is first differencing of both the dependent variable and independent variables. While
first differencing yields unbiased coefficients, it is less efficient owing to loss of data (e.g., the
first year in the panel is lost). Either way, replacing firm with industry fixed-effects is likely to
yield biased coefficients.
Section 2 summarizes the main insights from our analytical framework. Section 3 presents
results of simulations aimed at quantifying the potential bias caused by correlated omitted
variables in panel datasets. Section 4 uses the empirical models employed by Basu (1997) and
Sloan (1996) to demonstrate the effect of correlated omitted variables on estimated coefficients.
Section 5 provides concluding remarks.
2
Patatoukas and Thomas (2015) argue that the conservatism coefficient in Ball et al. (2013) is still upward biased.
Our study focuses on the role of fixed effects in regression specifications and does not address the issues raised by
Patatoukas and Thomas (2015).
2. Analytical Derivations
An appendix to this study provides detailed calculations, which are the basis for the
following text. Here we only present the essential elements.3 Let $D = [D_F, D_T]$ be a matrix of
indicator (dummy) variables for firm fixed-effects ($D_F$) and time fixed-effects ($D_T$), and the
unobserved fixed-effects be denoted by $\alpha = [\alpha_F, \alpha_T]$, where the subscripts F and T denote firm
and time fixed-effects, respectively. Then for the panel data the model becomes

$$y = X\beta + D\alpha + \varepsilon \tag{1}$$

The variance of $\varepsilon$ is denoted $\sigma^2$. Estimating this model will yield

$$y = Xb + Da + e \tag{2}$$

where b and a are the coefficient estimates of β and α, respectively, and e is the estimated
regression residual. The vector of estimated slope coefficients b can be expressed (using
partitioned matrix conventions) as:

$$b = [X' M_D X]^{-1}[X' M_D y] = [X^{*\prime} X^{*}]^{-1}[X^{*\prime} y^{*}] \tag{3}$$

where

$$M_D = I - D[D'D]^{-1}D'; \quad X^{*} = M_D X; \quad y^{*} = M_D y; \quad \varepsilon^{*} = M_D \varepsilon \tag{4}$$

$M_D$ can be thought of as a particular process of demeaning the independent and dependent
variables. It is straightforward to show that b is unbiased (that is, $E[b \mid X] = \beta$). Furthermore, in
the case where the fixed effects are uncorrelated with X, employing (3) would generate an
unbiased estimate of β, which will be the same as the estimate one would obtain from regressing
y only on X.
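The demeaning interpretation of $M_D$ can be illustrated with a minimal Python sketch. It assumes a balanced panel, in which case projecting out firm and time dummies amounts to subtracting firm means and period means and adding back the grand mean. The function names, the panel size and the noise-free data-generating process are hypothetical choices made for illustration and are not taken from the paper.

```python
import random

def two_way_demean(v):
    """Return v_it - mean_i - mean_t + grand mean for a balanced panel.

    v is a list of rows: one row per firm, one column per period.
    """
    n_firms, n_periods = len(v), len(v[0])
    firm_mean = [sum(row) / n_periods for row in v]
    time_mean = [sum(row[t] for row in v) / n_firms for t in range(n_periods)]
    grand = sum(firm_mean) / n_firms
    return [[v[i][t] - firm_mean[i] - time_mean[t] + grand
             for t in range(n_periods)] for i in range(n_firms)]

def grand_demean(v):
    """Subtract only the overall mean (pooled OLS with an intercept)."""
    n = len(v) * len(v[0])
    grand = sum(sum(row) for row in v) / n
    return [[a - grand for a in row] for row in v]

def slope(x, y):
    """Single-regressor OLS slope on data that have already been demeaned."""
    sxy = sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry))
    sxx = sum(a * a for rx in x for a in rx)
    return sxy / sxx

def make_panel(n_firms=200, n_periods=8, beta=1.0, seed=0):
    """Hypothetical noise-free panel; the regressor is correlated with the firm effect."""
    rng = random.Random(seed)
    alpha_f = [rng.gauss(0, 1) for _ in range(n_firms)]
    alpha_t = [rng.gauss(0, 1) for _ in range(n_periods)]
    x = [[alpha_f[i] + rng.gauss(0, 1) for _ in range(n_periods)]
         for i in range(n_firms)]
    y = [[beta * x[i][t] + alpha_f[i] + alpha_t[t] for t in range(n_periods)]
         for i in range(n_firms)]
    return x, y

x, y = make_panel()
b_within = slope(two_way_demean(x), two_way_demean(y))  # firm and time effects removed
b_pooled = slope(grand_demean(x), grand_demean(y))      # fixed effects left in the error
print(f"within: {b_within:.4f}  pooled: {b_pooled:.4f}")
```

Because the illustrative data contain no idiosyncratic noise, the within slope recovers β up to floating-point error, whereas the pooled slope absorbs the correlation between the regressor and the omitted firm effect.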
Crucially, the demeaning process captured in the matrix $M_D$ is dependent on the
researcher’s choice of which fixed-effects to include. Specifically, assume that the researcher
3
The full analytical appendix is available from the authors upon request.
uses instead industry fixed-effects and time fixed-effects. Let $D^{*} = [D_H, D_T]$ and $\alpha^{*} = [\alpha_H, \alpha_T]$,
where $D_H$ stands for the matrix of industry dummies and $\alpha_H$ is the matrix of unobserved
industry fixed-effects. Then for the panel data the estimated model is

$$y = X\beta + D^{*}\alpha^{*} + \mu \tag{5}$$

Importantly, the disturbance term μ involves firm fixed-effects that have not been removed
by the inclusion of industry fixed-effects. Now, the estimated coefficient $b_1$ from this model can
be expressed similarly to (3):

$$b_1 = [X' M_{D^*} X]^{-1}[X' M_{D^*} y] = [X^{**\prime} X^{**}]^{-1}[X^{**\prime} y^{**}] = \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} M_{D^*}\mu = \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} \mu^{**} \tag{6}$$

where

$$M_{D^*} = I - D^{*}[D^{*\prime}D^{*}]^{-1}D^{*\prime}; \quad X^{**} = M_{D^*} X; \quad y^{**} = M_{D^*} y; \quad \mu^{**} = M_{D^*}\mu \tag{7}$$

Because the disturbance term still includes firm fixed-effects (since using industry fixed-effects has not fully controlled for cross-sectional variations), it follows that

$$E[b_1 \mid X] = \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} E[\mu^{**} \mid X] = \beta + [X^{**\prime} X^{**}]^{-1} X^{**\prime} [D_F^{*}\alpha_F^{*}] \tag{8}$$

It follows from (8) that $b_1$ is biased. The magnitude of the bias is related to the covariance
element, $X^{**\prime}[D_F^{*}\alpha_F^{*}]$, and the scaling factor, $[X^{**\prime} X^{**}]$, which can be thought of as a measure of
the variability in the undemeaned regressor X (see Figure 2). Hence, including industry fixed-effects instead of firm fixed-effects will affect inferences. Furthermore, t-statistics are also a
function of the estimated coefficients’ standard error. Note also that

$$\mathrm{Var}[b \mid X] = [X^{*\prime} X^{*}]^{-1}\sigma^2 \tag{9}$$

and

$$\mathrm{Var}[b_1 \mid X] = [X^{**\prime} X^{**}]^{-1}\sigma^2 \tag{10}$$
Since $\sigma^2$ is unknown to the researcher, it has to be estimated from the data. Let T denote
the number of years, F the number of firms and H the number of industries. The unbiased
estimator of $\sigma^2$ is $s^2$, whereby:

$$s^2 = \frac{\sum e^2}{FT-(F-1)-(T-1)-K-1} = \frac{(y - Xb - Da)'(y - Xb - Da)}{FT-(F-1)-(T-1)-K-1} \tag{11}$$

Hence, the conditional variance of the k-th coefficient $b_k$ is based on the diagonal element
kk in the matrix $[X^{*\prime} X^{*}]^{-1}$ as follows:

$$\hat{s}^2_{b_k} = \text{Est. Var}[b_k \mid X] = s^2\,[X^{*\prime} X^{*}]^{-1}_{kk} \tag{12}$$

With industry and time fixed-effects, the equivalent expressions are

$$s_1^2 = \frac{\sum e_1^2}{FT-(H-1)-(T-1)-K-1} = \frac{(y - Xb_1 - D^{*}a^{*})'(y - Xb_1 - D^{*}a^{*})}{FT-(H-1)-(T-1)-K-1} \tag{13}$$

and

$$\hat{s}^2_{b_{1k}} = \text{Est. Var}[b_{1k} \mid X] = s_1^2\,[X^{**\prime} X^{**}]^{-1}_{kk} \tag{14}$$

Observation 3 in the Appendix states that a t-test based on the (misspecified) regression
coefficients $b_1$, $t_{b_{1k}} = b_{1k}/\sqrt{\hat{s}^2_{b_{1k}}}$, and a t-test based on the correct regression coefficients b,
$t_{b_k} = b_k/\sqrt{\hat{s}^2_{b_k}}$, are identical if and only if

$$t_{b_{1k}} = \frac{b_{1k}}{\sqrt{\hat{s}^2_{b_{1k}}}} = \frac{b_{1k}\left([X^{**\prime} X^{**}]^{-1}_{kk}\right)^{-1/2}}{\sqrt{\sum e_1^2 / (FT-(H-1)-(T-1)-K-1)}} = \frac{b_k\left([X^{*\prime} X^{*}]^{-1}_{kk}\right)^{-1/2}}{\sqrt{\sum e^2 / (FT-(F-1)-(T-1)-K-1)}} = \frac{b_k}{\sqrt{\hat{s}^2_{b_k}}} = t_{b_k} \tag{15}$$
Otherwise, the sign and significance of the t-test of b1 would be different from that of b.
Note that with non-zero correlation between the firm fixed-effects and the independent variables
the two expressions for the t-statistics differ along four dimensions. The first is the difference
between the point estimates b and b1. The second difference relates to the specific kk-th element in the square
bracket in the numerator. The third is the two sums of the regression squared residuals. The
fourth and last is the difference in the degrees of freedom. With respect to the latter, note that as
H approaches F the difference in the degrees of freedom becomes smaller in magnitude. In the
extreme, when H = F the two t-statistics will be identical, as firm and industry fixed-effects
coincide.
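The degrees-of-freedom component of this comparison is simple arithmetic and can be sketched in a few lines of Python. The helper names below are hypothetical. The panel dimensions (800 firms, 10 periods, 20 industries) are borrowed from the simulation design described in Section 3, and a single regressor (K = 1) is assumed.

```python
import math

def residual_dof(n_firms, n_periods, n_regressors, n_groups):
    """Residual degrees of freedom FT - (G-1) - (T-1) - K - 1.

    n_groups (G) is the number of cross-sectional dummies actually included:
    F for firm fixed-effects, H for industry fixed-effects, 1 for none.
    """
    return (n_firms * n_periods - (n_groups - 1)
            - (n_periods - 1) - n_regressors - 1)

def t_statistic(b_k, rss, dof, xx_inv_kk):
    """t = b_k / sqrt(s^2 * [X'X]^{-1}_kk), with s^2 = RSS / dof."""
    s2 = rss / dof
    return b_k / math.sqrt(s2 * xx_inv_kk)

F, T, H, K = 800, 10, 20, 1
dof_firm = residual_dof(F, T, K, n_groups=F)      # firm and time fixed-effects
dof_industry = residual_dof(F, T, K, n_groups=H)  # industry and time fixed-effects
print(dof_firm, dof_industry)
```

Holding the coefficient, the residual sum of squares and the kk-th element fixed, the larger degrees of freedom under industry fixed-effects alone already produce a smaller estimated variance and hence a larger t-statistic.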
The Ball et al. (2013) specification
Ball et al. (2013) argue that the Basu (1997) model suffers from a correlated omitted
variable problem. They suggest the problem could be solved by using a fixed-effects
specification. However, due to computational infeasibility, they suggest an alternative approach to
the standard fixed-effects specification in which the dependent variable, earnings, is adjusted for
average earnings (where averages are taken at a firm level over time). Demeaning only the
dependent variable is not identical to including firm and time fixed-effects.4 It is important to
note that Ball et al. (2013) employ several measures of returns. The important specification for
us appears in Table 5, and for this table they use size and book-to-market adjusted returns.
Importantly these returns are not zero mean. This would lead to biased coefficients.
4
Ball et al. (2013) compute unexpected returns by subtracting from firm-specific returns the market return (or the
return on the corresponding size and book-to-market portfolios). If the average unexpected return is zero, the full
fixed-effects model and the Ball et al. (2013) approach yield unbiased estimators in the Basu (1997) framework.
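To see this, let $\bar{y}_i$ denote the firm level average of $y_{it}$ (that is, $\bar{y}_i = \sum_{t=1}^{T} y_{it}/T$). If only the
dependent variable is demeaned, the model can be written as (assuming it includes time fixed-effects):

$$y_{it} - \bar{y}_i \equiv y^{*}_{it} = \alpha_t + X_{it}\beta + u_{it} \tag{16}$$

where $u_{it} = \varepsilon_{it} + C_{f=i} - \bar{y}_i$.
$C_{f=i}$ denotes the time-invariant firm fixed-effects that are not explicitly modelled and hence are
absorbed in the disturbance term. Using matrix algebra, we get5

$$y^{*} = D_T\alpha_T + X\beta + u; \qquad M^{+} = I - D_T[D_T' D_T]^{-1}D_T'$$

$$y^{+} = M^{+}y^{*} = M^{+}(X\beta + u) = X^{+}\beta + u^{+} \tag{17}$$

The estimated vector of coefficients is therefore

$$b^{+} = [X^{+\prime} X^{+}]^{-1} X^{+\prime} y^{+}$$

Taking expectations, and bearing in mind that the correct model is given by equation (1)
above and that $E[C_F \mid X] = D_F\alpha_F$:

$$\begin{aligned}
E[b^{+} \mid X] &= E\big[[X^{+\prime} X^{+}]^{-1} X^{+\prime} y^{+} \mid X\big] \\
&= [X^{+\prime} X^{+}]^{-1} X^{+\prime} E[M^{+}(y - \bar{y}_i) \mid X] \\
&= [X^{+\prime} X^{+}]^{-1} X^{+\prime}\big(X^{+}\beta + M^{+}D_T\alpha_T + M^{+}D_F\alpha_F - M_A X^{+}\beta - M^{+}M_A D_T\alpha_T - M^{+}D_F\alpha_F\big) \\
&= \beta - [X^{+\prime} X^{+}]^{-1} X^{+\prime} M_A X^{+}\beta = \big[I - [X^{+\prime} X^{+}]^{-1} X^{+\prime} M_A X^{+}\big]\beta
\end{aligned} \tag{18}$$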
where $M_A$ is the firm-level average-creating matrix. That is, demeaning the dependent variable
but without demeaning the right-hand side variables leads to a biased coefficient. Only in the
case where $X^{+}$ is zero-mean do we obtain unbiased coefficients, since $M_A X^{+} = 0$. Importantly, the
bias in (18) is unrelated to the correlation between the firm fixed effects and the other
5
With matrix algebra, $u = \varepsilon + C_F - \bar{y}$.
explanatory variables. Equation (18) suggests that the Ball et al. (2013) coefficients are the true
coefficients scaled down by the expression $\big[I - [X^{+\prime} X^{+}]^{-1} X^{+\prime} M_A X^{+}\big]$.
The Ball et al. (2013) estimates are therefore biased and they would differ from a full
fixed-effect model. To stress, this is because including time and firm fixed-effects works to
demean the panel variables in a different way. Under this specification all variables (dependent
and independent) are transformed as follows:

$$v^{*}_{it} = v_{it} - \bar{v}_i - \bar{v}_t + \bar{v} \tag{19}$$

where $v^{*}_{it}$ is the demeaned variable for firm i and year t, $v_{it}$ is the original observation, $\bar{v}_i$ is the
average across all annual observations for firm i, $\bar{v}_t$ is the average of the underlying variable v in
year t across all firms, and $\bar{v}$ is the grand average of v. For the dependent variable this is
different from the Ball et al. (2013) transformation $v^{*}_{it} = v_{it} - \bar{v}_i$. This also affects inferences,
because the standard error of the t-test is derived from the residual sum of squares, which also
differs between the two approaches. Additionally, a research design using the Ball et al.
(2013) approach is likely to incorrectly calculate the degrees of freedom. The typical statistical
software used in regression analysis will identify the transformed variable $v^{*}_{it} = v_{it} - \bar{v}_i$ as a
“single” variable and hence instead of using $FT-(F-1)-(T-1)-K-1$ degrees of freedom
would use $FT-(T-1)-K-1$ degrees of freedom.6
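The scaling result can be checked numerically with a short Python sketch. It assumes a balanced panel with a single regressor and no idiosyncratic noise, so that the estimate obtained after demeaning only the dependent variable can be compared exactly with the scaling factor implied by equation (18). All function names and the data-generating process are hypothetical illustrations rather than the authors' code.

```python
import random

def firm_means(v):
    """Time-series average of each firm's row."""
    return [sum(row) / len(row) for row in v]

def time_demean(v):
    """Subtract each period's cross-sectional mean (time fixed-effects, balanced panel)."""
    n_firms, n_periods = len(v), len(v[0])
    t_mean = [sum(row[t] for row in v) / n_firms for t in range(n_periods)]
    return [[row[t] - t_mean[t] for t in range(n_periods)] for row in v]

def dot(a, b):
    """Sum of element-wise products over the whole panel."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def my_estimate_and_scale(x, y):
    """Slope after demeaning y only, plus the scaling factor from equation (18)."""
    y_bar = firm_means(y)
    y_star = [[val - y_bar[i] for val in row] for i, row in enumerate(y)]
    x_plus, y_plus = time_demean(x), time_demean(y_star)
    b_my = dot(x_plus, y_plus) / dot(x_plus, x_plus)
    # M_A X+: firm-level averages of X+ repeated over every period
    xp_bar = firm_means(x_plus)
    ma_x_plus = [[xp_bar[i]] * len(row) for i, row in enumerate(x_plus)]
    scale = 1.0 - dot(x_plus, ma_x_plus) / dot(x_plus, x_plus)
    return b_my, scale

def make_panel(n_firms=200, n_periods=10, beta=1.0, rho=0.5, seed=3):
    """Hypothetical noise-free panel; rho links the firm effect to the regressor."""
    rng = random.Random(seed)
    a_t = [rng.gauss(0, 1) for _ in range(n_periods)]
    x, y = [], []
    for _ in range(n_firms):
        x_i = rng.gauss(0, 1)
        a_i = rho * x_i + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        row_x = [x_i + rng.gauss(0, 1) for _ in range(n_periods)]
        x.append(row_x)
        y.append([beta * xv + a_i + a_t[t] for t, xv in enumerate(row_x)])
    return x, y

b_my, scale = my_estimate_and_scale(*make_panel(rho=0.5))
print(f"MY slope: {b_my:.4f}  scaling factor: {scale:.4f}")
```

In this sketch the slope equals the scaling factor times the true β for any value of rho, which is consistent with the statement that the bias does not depend on the correlation between the firm fixed-effects and the regressor.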
The Fama-MacBeth (1973) approach
According to Fama-MacBeth (1973) regression coefficients are not calculated from a
panel, but rather from periodical cross-sectional regressions. Specifically, the overall coefficient
6
Ball et al. (2013) also estimate a model in which the dependent variable (earnings) is adjusted by subtracting the
lagged dependent variable (but not the independent variable). It can be shown analytically that this approach also
leads to biased estimates. For brevity we do not include the proof here, but simulate this specification later.
is the average coefficient over the T annual regressions and the standard error is derived from
the distribution of the individual (periodical) coefficients. Because periodical regressions are not
tooled to accommodate fixed firm or annual effects, this method is still prone to the same
problem. Each annual underlying model can be written as:

$$y_t = X_t\beta + \eta_t \tag{20}$$

where $X_t$ is the matrix of annual explanatory variables. However, the individual disturbance term
incorporates the fixed-effects, implying $\eta_{it} = C_{f=i} + C_t + \varepsilon_{it}$. The estimated coefficient for a
single year t therefore can be expressed as in a simple OLS setup:

$$b_t = [X_t' X_t]^{-1}[X_t' y_t] = \beta + [X_t' X_t]^{-1} X_t'\eta_t \tag{21}$$

Averaging over T annual regressions yields the Fama-MacBeth coefficient:

$$b_{FM} = \frac{1}{T}\sum_{t=1}^{T} b_t = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t' X_t]^{-1} X_t'\eta_t \tag{22}$$

The assumption of correlated fixed-effects implies that $E[C_f + C_t \mid X_t] \neq 0$. Hence we
obtain that in expectation

$$E[b_{FM} \mid X] = \beta + \frac{1}{T}\sum_{t=1}^{T}[X_t' X_t]^{-1} E[X_t'\eta_t \mid X_t] \neq \beta. \tag{23}$$
In other words, the Fama-MacBeth (1973) procedure is prone to the omitted fixed-effects
problem under the assumption that the true model is as expressed in equation (1).
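A minimal Python sketch of this argument follows. It assumes a single regressor made of a firm-level component plus firm-year noise, a firm effect whose correlation with the firm-level component is rho, and unit variances throughout. These choices, the sample size and the function names are hypothetical and serve only to show the direction of the bias.

```python
import random

def cross_section_slope(x, y):
    """OLS slope (with intercept) from one period's cross-section."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fama_macbeth_slope(x_by_year, y_by_year):
    """Average of the periodical cross-sectional slopes."""
    slopes = [cross_section_slope(x, y) for x, y in zip(x_by_year, y_by_year)]
    return sum(slopes) / len(slopes)

def simulate(n_firms=2000, n_periods=10, beta=1.0, rho=0.5, seed=1):
    """Hypothetical panel with a firm effect correlated with the regressor."""
    rng = random.Random(seed)
    x_i = [rng.gauss(0, 1) for _ in range(n_firms)]
    a_i = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x_i]
    xs, ys = [], []
    for _ in range(n_periods):
        a_t = rng.gauss(0, 1)  # time effect, absorbed by each year's intercept
        x_t = [xi + rng.gauss(0, 1) for xi in x_i]
        y_t = [a + a_t + beta * x + rng.gauss(0, 1) for a, x in zip(a_i, x_t)]
        xs.append(x_t)
        ys.append(y_t)
    return xs, ys

b_fm = fama_macbeth_slope(*simulate(rho=0.5))
b_fm_uncorrelated = fama_macbeth_slope(*simulate(rho=0.0))
print(f"rho=0.5: {b_fm:.3f}  rho=0: {b_fm_uncorrelated:.3f}")
```

Each annual intercept absorbs the time effect, but the firm effect stays in the disturbance, so under these assumptions the averaged slope settles away from the true value of 1 whenever rho is non-zero.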
3. Simulations
To illustrate the potential bias in estimated slope coefficients that is caused by omitting
fixed-effects, we simulate a panel dataset according to the following specification:
$$y_{it} = a_i + a_t + a_I + \beta x_{it} + e_{it}$$

$$X_{it} = X_i + \varepsilon_{it}$$

$$(X_i, a_t, a_i, a_I) \sim N(\mathrm{M}, \Sigma), \qquad \mathrm{M} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & \rho_{X_i a_t} & \rho_{X_i a_i} & \rho_{X_i a_I} \\ \rho_{X_i a_t} & 1 & 0 & 0 \\ \rho_{X_i a_i} & 0 & 1 & 0 \\ \rho_{X_i a_I} & 0 & 0 & 1 \end{pmatrix}$$

and $\varepsilon_{it}, e_{it} \sim N(0,1)$. The variable $y_{it}$ is the dependent variable, $a_i$, $a_t$ and $a_I$ are firm, time and
industry fixed-effects, respectively, and $x_{it}$ is the independent variable. The values of the true
parameters are $\rho_{X_i a_t}, \rho_{X_i a_i}, \rho_{X_i a_I} \in \{0.5, 0.25, 0, -0.25, -0.50\}$ and $\beta = 1$. The null hypothesis is
$\beta = 1$. Since the bias in β depends on the correlation between the omitted fixed-effects and the
regressor X, we impose five different levels of correlations between the fixed-effects and X:
0.50, 0.25, 0, -0.25 and -0.50. We expect a positive, zero and negative bias for the positive, zero
and negative correlations, respectively, when the model omits the fixed-effects.7
Using the above specification, we simulate a panel of 8,000 observations made of 10
periods, 20 industries and 40 firms per industry. We repeat this process 8,000 times, applying
the following eight specifications:
(1) Firm and time fixed-effects (FE) – This model includes both firm and time fixed-effects.
We expect this specification to yield an unbiased slope estimate (b = 1). We label this
model as FE.
(2) Industry and time fixed-effects (IE) – Here, we include time and industry fixed-effects,
by replacing $a_i$ with $a_I$. This specification ignores within-industry variations at the firm
level and hence we expect the slope estimate to be biased. Notice that this specification
approaches the full fixed-effects model as the number of firms per industry decreases. At
7
In a single variable setting, it is easy to sign the bias; In a multivariate setting, however, the sign of the bias
depends on the correlation matrix of the regressors (see the Appendix).
the extreme case where there is one firm per industry, this specification is identical to the
full fixed-effects model. To show this, we conduct sensitivity analysis where we
sequentially increase the number of industries (and reduce the number of firms per industry)
keeping the number of observations constant (see Table 3). We label this model as IE.
(3) Misspecified model (MS) – Here we omit all fixed-effects; hence, we expect the estimated
slope coefficient (β) to be biased to a greater degree than the previous model. However,
given the small number of time periods (10) relative to the number of firms (800), the bias
from omitting time effects is expected to be small. This situation is similar to that in many
studies that use archival data, as the number of firms is much larger than the number of
periods. We label this model MS.
(4) First differences model for both the dependent and independent variables (FD) – In
this model, we use first differences instead of current values (that is, current value minus
the lagged one). This leads to an unbiased estimated slope coefficient, and one could omit
the fixed-effects from the model. However, the differencing process involves loss of
information, as the first period in the panel is lost. We label this method FD.
(5) Using first differences for the dependent variable only (LY) – Here, the researcher first-differences the dependent variable but not the independent variables. We expect this
specification to yield biased results. However, in this case the bias is induced not only by
the covariance between the independent variable and the fixed-effects, but also by the
exclusion of the variable $\beta X_{it-1}$ from the model. Hence, the bias in this case is also a function
of the true β; when β is positive the bias is negative and when β is negative the bias is
positive. We label this model LY (Lag Y).
(6) Using the time-series means of the independent and the dependent variables (MYX) –
Here we convert the panel dataset into a single cross-sectional regression by using the
means of the independent and dependent variables as the main variables in the regression
(see for instance, Aghion et al., 2010). Using time-level means implies that the error term
still includes firm fixed effects and hence the coefficient estimates are biased.
(7) Demeaning the dependent variable Y (MY) – Similar to the LY case and motivated by
Ball et al. (2013), this specification only adjusts the Ys by subtracting the firm level
averages. However, this specification is expected to yield biased estimates, as argued above.
The bias is induced not by the covariance between the independent variable and the fixed-effects, but by the failure to demean the independent variable at the firm level. Equation (18)
suggests that the coefficient estimate under this specification is a scaled-down estimate of the
true parameter. We therefore expect it to be smaller than 1, regardless of the sign of the
correlation between the fixed-effects and independent variable. We label this model as MY.
(8) Fama-MacBeth (FM) – We also estimate the model (without fixed-effects) using the
Fama-MacBeth (1973) procedure; that is, estimating 10 periodical regressions and reporting
the average slope coefficient. Equation (23) suggests the FM specification would yield
biased estimate, under the true model that includes firm and time effects. We label this
model as FM.
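Four of the specifications above (FE, IE, MS and FD) can be sketched for a single simulated panel as follows. The sketch is a simplified illustration under stated assumptions and is not the authors' simulation code: only the firm effect is correlated with the regressor, the industry and time effects are drawn independently, the panel is balanced so that fixed-effects can be removed by group demeaning, and a single draw is used instead of 8,000 repetitions. All function names are hypothetical.

```python
import random

def simulate_panel(n_ind=20, firms_per_ind=40, n_periods=10,
                   beta=1.0, rho=0.5, seed=42):
    """Return (industry, firm, period, x, y) records sorted by firm, then period."""
    rng = random.Random(seed)
    a_t = [rng.gauss(0, 1) for _ in range(n_periods)]
    rows, firm = [], 0
    for ind in range(n_ind):
        a_ind = rng.gauss(0, 1)
        for _ in range(firms_per_ind):
            x_i = rng.gauss(0, 1)
            a_i = rho * x_i + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            for t in range(n_periods):
                x = x_i + rng.gauss(0, 1)
                y = a_i + a_t[t] + a_ind + beta * x + rng.gauss(0, 1)
                rows.append((ind, firm, t, x, y))
            firm += 1
    return rows

def demean(values, *keys):
    """v - grand mean - sum over keys of (group mean - grand mean).

    With no keys this removes only the intercept; with two keys in a
    balanced panel it equals v - mean_g1 - mean_g2 + grand mean.
    """
    grand = sum(values) / len(values)
    out = [v - grand for v in values]
    for key in keys:
        sums, counts = {}, {}
        for k, v in zip(key, values):
            sums[k] = sums.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
        out = [o - (sums[k] / counts[k] - grand) for o, k in zip(out, key)]
    return out

def slope(x, y):
    """Single-regressor OLS slope on demeaned data."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def estimate_all(rows):
    ind = [r[0] for r in rows]
    firm = [r[1] for r in rows]
    per = [r[2] for r in rows]
    x = [r[3] for r in rows]
    y = [r[4] for r in rows]
    est = {
        "FE": slope(demean(x, firm, per), demean(y, firm, per)),
        "IE": slope(demean(x, ind, per), demean(y, ind, per)),
        "MS": slope(demean(x), demean(y)),
    }
    # First differences within firm; the first period of each firm is lost
    dx, dy = [], []
    for prev, cur in zip(rows, rows[1:]):
        if prev[1] == cur[1]:
            dx.append(cur[3] - prev[3])
            dy.append(cur[4] - prev[4])
    est["FD"] = slope(demean(dx), demean(dy))
    return est

for name, b in estimate_all(simulate_panel(rho=0.5)).items():
    print(f"{name}: {b:.3f}")
```

With these assumptions the FE and FD slopes should land close to the true value of 1, while the IE and MS slopes should sit visibly above it when rho is positive and below it when rho is negative.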
For each of the eight specifications, we obtain 8,000 slope estimates. We also vary the
magnitude of the correlation between the fixed-effects and the regressor X. Table 2 reports the
means of the estimated slope coefficients, standard errors, t-statistics of the distance from the
true coefficient (β = 1), and R2s for five levels of correlations: 0.5, 0.25, 0, -0.25, and -0.5. We
also present the distribution of the estimated slope coefficients in Figure 1, using three different
correlations: Figures 1a, 1b, and 1c present the distribution for a correlation of 0.5, 0, and -0.5,
respectively.
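The t-statistics in Table 2 measure the distance of the estimated slope from the true coefficient rather than from zero. A minimal sketch of that calculation follows, using mean slopes and mean standard errors taken from the ρ = 0.5 panel of Table 2. The function name `t_stat_from_true` is hypothetical, and because the table reports the mean of the simulated t-statistics, the ratio computed from mean slopes and mean standard errors only approximates the tabulated values.

```python
def t_stat_from_true(b, se, beta_true=1.0):
    """t-statistic for the null hypothesis that the slope equals beta_true."""
    return (b - beta_true) / se


# Mean slope and mean standard error from Table 2 (rho = 0.5)
table2 = [("MS", 1.250, 0.015), ("LY", 0.50, 0.019), ("MYX", 1.45, 0.045)]

if __name__ == "__main__":
    for name, b, se in table2:
        print(name, round(t_stat_from_true(b, se), 1))
```

For these three rows the ratios are about 16.7, -26.3 and 10.0, close to the tabulated 16.5, -26.0 and 10.1.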
By construction, the full fixed-effects model (FE) yields an unbiased estimate (b = 1) and
high R2s for all levels of correlation. Also, the distribution of the bs is the tightest among all
alternative distributions, as can be seen from the figures. The second model (industry and time
effects, denoted IE) yields a positive bias (b = 1.25) when the correlation between the fixed-effects and the regressor is 0.5, zero bias when the correlation is zero, and a negative bias when the
correlation is -0.5 (b = 0.75). We would incorrectly reject the null hypothesis that the slope is
equal to 1 in all cases, except for the case of zero correlation between the fixed-effects and the
regressor. Also, the regression R2s are lower relative to the FE model.
When both firm and time effects are omitted (MS), the pattern of bias is similar to that
observed for model IE. That is, with 20 industries and 800 different firms, controlling for
industry fixed-effects performs as poorly as the fully misspecified model. Moreover, from
Figures 1a and 1c (non-zero correlation) we note that the distributions of the slope coefficient under both the MS and IE
specifications are completely disjoint from the distribution of b under the FE specification. This suggests that
it is very unlikely that a slope estimate from these two specifications would fall within a
conventional confidence interval obtained under the full fixed-effect model.
Using first differences for both the dependent and independent variables (FD) yields an
unbiased slope estimate (b = 1.00) for all five correlations, but the estimate is less efficient, as
reflected by the larger standard errors, lower t-statistic, lower adjusted R2, and the larger tails
seen in Figures 1a-1c. Using a first difference only for the dependent variable (LY) yields a
biased and less efficient estimate (b = 0.50; Adjusted R2 = 0.09), regardless of the correlation
between the fixed-effects and regressor X. This is because differencing Y removes the firm fixed-effect, so the model is not sensitive to the
correlation between the fixed-effects and the independent variable; the bias arises instead from leaving X in levels.
In model MYX, we use means of X and Y and estimate a cross-sectional regression. This
model yields a large positive (negative) bias when the correlation between the fixed effects and
the regressor is positive (negative). However, in the case of demeaning the dependent variable
only (MY), the bias is negative regardless of the correlation between the fixed effects and the
regressor. The fact that the LY and MY distributions lie to the left of the FE distribution is
consistent with the theoretical prediction derived in the previous section, which states that with
one explanatory variable the estimated coefficient will be smaller in magnitude. Since β is
positive here, they yield values smaller than 1 (see footnote 8).
Using the Fama-MacBeth (1973) specification (FM) yields results qualitatively similar to those of
the misspecified model (MS), with even larger tails of the distribution. This is seen in Figures 1a-1c,
where the FM parameter distribution obscures the parameter distribution of the MS model.
Finally, when the correlation between the fixed-effects and the independent variable is zero,
omitting the fixed-effects is not expected to cause any bias. Indeed, the results show that the
slope coefficients in models IE, MS, and FM are unbiased.9
8
We also find from additional simulations (not tabulated) that when β = -1, the LY and MY distributions lie in the
negative range, to the right of the FE and FD distributions. This, again, is consistent with a lower
magnitude relative to the true value.
9
Note that the distributions of the slope coefficient depicted in the three charts of Figure 1 do not correspond to the
average standard errors reported in Table 2. For example, the distribution of β under the FM specification features
larger tails than that of the FE model, although the standard error for the FM model (0.008) is smaller than that of
the FE model (0.012). To demonstrate this issue, assume that we run 5 simulations of 8,000 observations each and
obtain the following coefficient estimates for the FM model: -2, -1, 0, 1, and 2. Also, suppose the output for the
OLS regression is such that each coefficient is estimated with standard error of 1. In contrast, for the FE model
assume we obtain coefficients of -1, -0.5, 0, 0.5 and 1, but each coefficient is estimated with a standard error of 2.
Then, if we were to chart these outcomes, the distribution of the FM (FE) will be wider (narrower), but the average
standard error tabulated would be smaller (larger) for the FM (FE) specification.
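The hypothetical numbers in footnote 9 can be checked directly. The sketch below (variable and function names are illustrative) computes, for each model, the dispersion of the coefficient estimates across simulation rounds and the average reported standard error.

```python
from statistics import fmean, pstdev

# Hypothetical outcomes from footnote 9: five rounds per model
fm_coefs, fm_ses = [-2, -1, 0, 1, 2], [1, 1, 1, 1, 1]
fe_coefs, fe_ses = [-1, -0.5, 0, 0.5, 1], [2, 2, 2, 2, 2]


def summarize(coefs, ses):
    """Return (dispersion of estimates across rounds, mean reported standard error)."""
    return pstdev(coefs), fmean(ses)


if __name__ == "__main__":
    print("FM", summarize(fm_coefs, fm_ses))
    print("FE", summarize(fe_coefs, fe_ses))
```

The FM estimates are more dispersed across rounds than the FE estimates, yet the FM average standard error is the smaller of the two, which is the point made in the footnote.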
The conclusion from this analysis is that omitting firm fixed-effects will result in biased
slope coefficients unless the fixed-effects are uncorrelated with the independent regressor. Using
firm fixed-effects is a safe approach in that it will generate unbiased coefficients even when the
data generating process does not contain unobserved correlated fixed-effects. An alternative
approach would be to conduct the Hausman (1978) test procedure to identify whether fixed-effects should be employed. However, since the Hausman test in practice runs a fixed-effects
model against a model with no fixed-effects, we see no clear advantage over routinely including
firm fixed-effects.10
(Table 2 and Figure 1 about here)
Since industry fixed-effects are often used instead of firm fixed-effects, we examine the
effect of firm distribution across industries on the results by changing the number of firms per
industry while keeping the total number of observations constant at 8,000. We consider the
following cases: (i) 10 industries with 80 companies in each industry; (ii) 20 industries with 40
companies in each industry (the baseline used above); (iii) 40 industries with 20 companies in
each industry; (iv) 160 industries with 5 companies in each industry; and (v) 400 industries with
2 companies in each industry. We expect the bias to decline as the number of industries
increases. In the extreme case of one firm per industry, there will not be any bias, as this case
coincides with the full fixed-effects specification.
Table 3 contains the results of this analysis. As the number of industries increases and the
number of companies per industry decreases, the bias declines. However, the decline in the bias
is rather small. For example, the bias is 24% when we use 40 industries and 20 companies per
industry; it declines to 17% when using 400 industries and two companies per industry. Hence,
replacing firm fixed-effects with industry fixed-effects and increasing the number of industries
will not eliminate the bias in the coefficients, although using a finer industry classification might
reduce the bias. For example, using the Fama-French 48-industry classification in estimating
panel datasets is expected to yield less biased results than using the 12-industry classification.
10
Another advantage of using fixed effects specifications is that typical software output can report the fixed effects
coefficients, if the researcher is interested in exploring or reporting these coefficients.
(Table 3 about here)
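Why the bias shrinks so slowly as industries become finer can be seen from a back-of-the-envelope approximation. The sketch assumes, purely for illustration, that the regressor is the sum of a firm-level component and a time-varying component, each with unit variance, that the firm effect has unit variance and correlation ρ with the firm-level component, and that firm effects have no common industry component; the simulation design itself is defined earlier in this section and may differ. The function name `ie_bias` is hypothetical, and the adjustment for time effects is ignored.

```python
def ie_bias(firms_per_industry, n_periods=10, rho=0.5):
    """Approximate bias of the IE slope under the assumed unit-variance design."""
    m = firms_per_industry
    between = 1.0 - 1.0 / m               # firm-level variation left after industry demeaning
    within = 1.0 - 1.0 / (m * n_periods)  # time-varying variation left after industry demeaning
    return rho * between / (between + within)


if __name__ == "__main__":
    for m in (80, 40, 20, 5, 2, 1):
        print(m, round(ie_bias(m), 3))
```

For 80, 40, 20, 5 and 2 firms per industry this gives biases of roughly 0.249, 0.247, 0.244, 0.225 and 0.172, close to the slopes in Table 3, and the bias vanishes only with one firm per industry.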
The bias caused by omitting firm and time fixed-effects depends also on the time-variance
of the main regressor X. As the time-variance of the regressor X increases, the bias caused by
omitting the firm fixed-effects is expected to decline. To see this, we let the variance of X (i.e.,
of the parameter - t ) decrease from 2.0 to 0.25 in intervals of 0.25. As Figure 2 shows, the
bias in the slope coefficient increases as the variance of X decreases. In other words, omitting
firm fixed-effects results in a larger bias as the main regressors become more time-invariant. In
contrast, when the regressor X varies over time, omitting firm fixed-effects is likely to result in
little bias if at all.
(Figure 2 about here)
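The link between the time-variance of X and the omitted-fixed-effects bias follows from the usual omitted-variable formula: the pooled slope bias equals the covariance between X and the firm effect divided by the total variance of X. A minimal sketch, assuming for illustration that X is the sum of a firm-level component and an independent time-varying component (the function name `ms_bias` and the default variances are hypothetical, not taken from the simulation design):

```python
def ms_bias(var_time_varying, rho=0.5, var_firm_component=1.0, var_effect=1.0):
    """Pooled-OLS slope bias: cov(x, firm effect) / var(x)."""
    cov = rho * (var_firm_component * var_effect) ** 0.5
    return cov / (var_firm_component + var_time_varying)


# Time-varying variance falling from 2.0 to 0.25 in steps of 0.25
grid = [2.0 - 0.25 * k for k in range(8)]

if __name__ == "__main__":
    for v in grid:
        print(v, round(ms_bias(v), 3))
```

Under these assumptions the bias rises from about 0.17 to 0.40 as the time-varying variance falls from 2.0 to 0.25, which is the qualitative pattern described above.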
Overall we draw several conclusions from the simulation analysis:
(i) Omitting firm fixed-effects may generate biased estimates and overstated t-statistics, hence,
wrong inferences. Replacing firm with industry fixed-effects is not a valid approach as it
does not eliminate the coefficient bias, if the purpose is to control for unobserved correlated
omitted variables that are time-invariant. While increasing the number of industries is likely
to reduce the bias, this approach is unlikely to eliminate the bias.
(ii) Using means of the dependent and independent variables, or the approach taken by Ball et
al. (2013), is not equivalent to using firm fixed-effects. These methods yield biased
estimates. The same holds for differencing only the dependent variable.
(iii) Using first differences (for both the dependent variable and independent variables) is a
valid, but less efficient, estimation strategy.
(iv) The coefficient distributions of several specifications may be so disjoint from the
coefficient distribution of a fixed-effect model that respective confidence intervals may be
entirely non-overlapping. That is, the chance of correct inference under the wrong
specification may be quite slim.
4. Implications for Empirical Accounting Research
We now examine the effects of using different model specifications on the results of
commonly used regression models in accounting research. We chose two regressions that have
gained wide recognition: Basu's (1997) model of asymmetric timeliness of earnings and Sloan’s
(1996) differential persistence of accruals and cash flow components of earnings.11
4.1 The Asymmetric Timeliness of Earnings – Basu (1997)
The Basu (1997) model highlights the differential reaction of earnings to good and bad
news, where stock returns serve as a proxy for news. The regression is:
$$X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$$
where $R_{it}$ denotes firm i's annual stock returns for the 12 months starting nine months prior to
fiscal year-end until three months after the fiscal year-end, a period that roughly corresponds to
the period between earnings announcements. $X_{it}$ denotes firm i's earnings per share for year t,
$P_{it-1}$ denotes firm i’s share price at the beginning of fiscal year t, and $D(R_{it} < 0)$ is an indicator
variable obtaining the value "1" if stock returns are negative, and "0" otherwise.
11
The problem of excluding fixed-effects applies to any dynamic panel data models where the dependent variable
is a function of lagged values of the independent variables. Suppose that the data generating process is
$y_{it} = a_i + a_t + \beta y_{it-1} + e_{it}$. Then it follows that $y_{it-1} = a_i + a_{t-1} + \beta y_{it-2} + e_{it-1}$. If the researcher
estimates instead the regression $y_{it} = \beta y_{it-1} + u_{it}$, where $u_{it} = e_{it} + a_i + a_t$, it follows that $E[u_{it} y_{it-1} \mid y_{it-1}] \neq 0$ and
therefore the estimate b will be biased.
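To make the specification concrete, the sketch below builds the four regressors of the Basu (1997) model (intercept, negative-return indicator, return, and their interaction) and estimates the pooled version by ordinary least squares. Everything here is illustrative: the helper names `basu_row`, `ols` and `solve` are hypothetical, the data are made up, and firm and time fixed-effects would enter either as additional dummy columns or through demeaning.

```python
def basu_row(ret):
    """Regressors for one observation: [1, D(R<0), R, D(R<0)*R]."""
    d = 1.0 if ret < 0 else 0.0
    return [1.0, d, ret, d * ret]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):
        z[k] = (M[k][n] - sum(M[k][c] * z[c] for c in range(k + 1, n))) / M[k][k]
    return z


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    returns = [-0.4, -0.2, -0.1, 0.05, 0.1, 0.3, 0.5, -0.3]  # made-up returns
    rows = [basu_row(r) for r in returns]
    true = [0.05, 0.01, 0.02, 0.20]                          # made-up coefficients
    earnings = [sum(c * v for c, v in zip(true, row)) for row in rows]
    print([round(c, 3) for c in ols(rows, earnings)])
```

The last coefficient is the one on the interaction term, which is the asymmetric timeliness measure discussed in the text.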
Like Basu (1997), we use all firm-year observations from 1963 to 1990 for which stock
returns are available on the CRSP monthly files, and the necessary accounting data available on
Compustat. Similarly, we deflate earnings by the beginning-of-year share price and eliminate
observations falling in the top or bottom 0.5% of opening price-deflated earnings in each
calendar year to reduce the effects of outliers on the results.
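The outlier filter described above can be sketched as follows. The rank-based cut-off rule is an assumption (percentile conventions differ across software packages), and the function name `trim_by_year` is hypothetical.

```python
from collections import defaultdict


def trim_by_year(obs, frac=0.005):
    """Drop observations in the top or bottom `frac` of values within each year.

    obs is a list of (year, value) pairs, where value is price-deflated earnings.
    The result is ordered by value within each year.
    """
    by_year = defaultdict(list)
    for year, value in obs:
        by_year[year].append(value)
    kept = []
    for year, values in by_year.items():
        values.sort()
        k = int(frac * len(values))  # number of observations dropped in each tail
        kept.extend((year, v) for v in values[k:len(values) - k])
    return kept
```

With 200 observations in a year, for example, this rule removes the single smallest and the single largest value for that year.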
Like Ball et al. (2013) we consider two additional definitions of stock returns: market
adjusted returns, and size and book-to-market adjusted returns. The size and book-to-market
adjusted returns are computed by forming 5x5 portfolios based on annual sorts on market
capitalization and on the book-to-market ratio (at the end of year t-1). We then calculate
monthly value-weighted mean returns for each size and book-to-market portfolio and subtract
each portfolio's return from the raw returns of the firms in the same size and book-to-market quintiles. Market
adjusted returns are raw returns minus the value-weighted market returns. To save space, we
report results only for raw returns; results for market-adjusted returns and size and
book-to-market adjusted returns are similar.
We collect data for all US firms that trade on the NYSE, AMEX and NASDAQ. Our
sample contains 114,175 firm-year observations for the period 1963-2013, and 42,546 firm-year
observations for the period 1963-1990. For comparison, Basu (1997) reports results for a sample
of 43,321 firm-year observations over the same period. Panel A of Table 4 reports summary
statistics for the regression variables for the period 1963-2013. These statistics are consistent
with those reported in Patatoukas and Thomas (2011) and Ball et al. (2013). For instance,
$X_{it}/P_{it-1}$ is left-skewed and $R_{it}$ is right-skewed. The median value of $X_{it}/P_{it-1}$ in our study is
0.063, whereas Patatoukas and Thomas (2011) report a median value of 0.063 and Ball et al.
(2013) report a median value of 0.062. The median value of annual stock returns $R_{it}$ in our study
is 0.094, whereas Patatoukas and Thomas (2011) report 0.089 and Ball et al. (2013) report a
value of -0.047.
(Table 4 about here)
Table 5 presents the sensitivity of the asymmetric timeliness of earnings regression to the
inclusion of fixed-effects. Looking at the 1963-1990 sample, our results show that when pooled
OLS is used, the coefficient $\beta_1$ is positive (0.198) and significant at the 0.01 level (t-statistic of
24.12). Basu (1997) reports a somewhat larger coefficient of 0.256 (t-statistic of 27.14). The
common interpretation of this result is that contemporaneous earnings reflect negative news in a
timelier manner than positive news (accounting earnings are conditionally conservative).
When firm and time fixed-effects are included in the model, the slope coefficient is
significantly lower than that reported by Basu (1997). Specifically, the coefficient $\beta_1$ is 0.091 (t-statistic = 10.88) for the 1963-1990 period and 0.145 (t-statistic = 28.45) for the 1963-2013
period. When industry effects are included in the model instead of firm effects, $\beta_1$ is 0.163 (t-statistic = 21.72) for the 1963-1990 period and 0.223 (t-statistic = 44.90) for the 1963-2013
period, very similar to the estimate reported by Basu (1997). Furthermore, using the Fama-MacBeth
(1973) methodology yields $\beta_1$ equal to 0.211 (t-statistic = 9.26) for the 1963-1990 period and
0.26 (t-statistic = 14.51) for the 1963-2013 period, again very close to the results reported by
Basu (1997).
In sum, adding firm fixed-effects reduces the coefficient on conservatism ($\beta_1$)
substantially, while using industry fixed-effects or the Fama-MacBeth (1973) methodology
yields a much higher conservatism coefficient. These results are in line with our theoretical
predictions and highlight the fact that when the data generating process contains firm fixed-effects, the inclusion of industry fixed-effects does not help in dealing with unobserved firm
heterogeneity. Similarly, as predicted, the Fama-MacBeth (1973) approach is also subject to
unobserved heterogeneity biases.
Aghion et al. (2010) use a different approach to estimating a panel dataset. Instead of
adding firm and time fixed-effects, they convert the panel dataset into a cross-sectional
regression whereby, instead of the vectors of dependent and independent variables, they use
firm-level time-series means (denoted here MYX). Using this method yields conservatism coefficients ($\beta_1$)
equal to 0.097 (t-statistic = 9.03) for the 1963-1990 period and 0.096 (t-statistic = 10.43) for the
1963-2013 period.
Ball et al. (2013) acknowledge that in order to deal with this problem, the researcher could
use a fixed-effects specification. However, in their empirical approach and due to computational
constraints, they use a different approach: demeaning the dependent variable (MY). To assess
the effect of their approach, the last specification in Table 5 employs the Ball et al. (2013)
specification (denoted MY). As can be seen, $\beta_1$ is 0.048 (t-statistic = 6.65) for the 1963-1990
period and 0.082 (t-statistic = 18.96) for the 1963-2013 period. These values are similar to the
estimates reported in Table 5 (row 5) in Ball et al. (2013). We therefore conclude that $\beta_1$ under
the Ball et al. (2013) specification is significantly lower than the coefficient obtained from the
full fixed-effects specification. This result suggests that the MY specification provides a lower
bound for the conditional conservatism coefficient. This is likely to be the case as the mean
value of the dependent variable is positive, as suggested by our simulations.
Overall, our results suggest that there is substantial unobserved heterogeneity at the firm
level that seems to be an important determinant for explaining price-deflated earnings. These
findings are consistent with the Ball et al. (2013) finding that the Basu (1997) model is affected by
correlated omitted variables due to the expected components of earnings being correlated with
the expected components of stock returns. However, we argue that the empirical specification of
Ball et al. (2013) is not an appropriate substitute for firm and time fixed-effects (FE). This
notwithstanding, from a qualitative standpoint, our results confirm the presence of conditional
conservatism in earnings.
(Table 5 about here)
4.2 The differential persistence of accruals and cash flow components of earnings
Sloan (1996) explores the association between future income and previous year’s accruals
and cash flows. He finds that the persistence of cash flows exceeds that of accruals, which is
consistent with the reversal property of accruals. Sloan (1996) estimates the following
regression, allowing the persistence coefficient on the accruals and cash flow components of
earnings to be different. The model is:
$$OI_{it}/TA_{it} = \alpha_0 + \beta_0 ACC_{it-1}/TA_{it-1} + \beta_1 CF_{it-1}/TA_{it-1} + e_{it}$$
where $OI_{it}/TA_{it}$ is operating income divided by average total assets, $ACC_{it-1}/TA_{it-1}$ denotes
operating accruals divided by average total assets, and $CF_{it-1}/TA_{it-1}$ denotes operating cash
flows divided by average total assets.
The accrual component of operating income is measured as ACCit = (∆CAit - ∆Cashit) - (∆CLit - ∆STDit - ∆TPit) - Depit, where ∆CAit is the change in current assets; ∆Cashit is the
change in cash and cash equivalents; ∆CLit is the change in current liabilities; ∆STDit is the
change in debt included in current liabilities; ∆TPit is the change in income taxes payable; and
Depit is the depreciation and amortization expense. The cash flow component of earnings (CFit)
is measured as the difference between operating income and the accrual component of earnings.
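The accrual and cash flow definitions above translate directly into code. The following sketch uses hypothetical function and argument names and made-up input values; it only restates the two formulas.

```python
def accruals(d_ca, d_cash, d_cl, d_std, d_tp, dep):
    """Operating accruals: (dCA - dCash) - (dCL - dSTD - dTP) - Dep."""
    return (d_ca - d_cash) - (d_cl - d_std - d_tp) - dep


def cash_flow(operating_income, acc):
    """Cash flow component: operating income minus the accrual component."""
    return operating_income - acc


if __name__ == "__main__":
    # Made-up firm-year: balance sheet changes, depreciation, operating income, average total assets
    acc = accruals(d_ca=10, d_cash=2, d_cl=5, d_std=1, d_tp=1, dep=4)
    cf = cash_flow(operating_income=13, acc=acc)
    avg_total_assets = 100
    print(acc / avg_total_assets, cf / avg_total_assets)
```

Both components are then scaled by average total assets before entering the regression, as in the last line of the example.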
Our sample includes all firm-year observations with the necessary
accounting and stock return data available on Compustat and the CRSP monthly file between 1962
and 1991. This is the sample analyzed by Sloan (1996). We also collect data for an extended
sample for 1963-2013. We only sample US firms that trade on NYSE, AMEX or NASDAQ. As
before, we eliminate the top and bottom 0.5% of observations. Table 4, Panel B, contains
descriptive statistics for the main variables for the extended period 1963-2013. Median
operating income over average total assets is 0.13. This figure is composed of a median accrual
component of 0.01 and a median cash flow component of 0.12.
Table 6 presents results of estimating the differential persistence regressions for the full
period 1963-2013 and for the sub-period 1962-1991. For the sub-period, using a pooled
regression without fixed-effects, the coefficient on the accrual component ($\beta_0$) is 0.621 (t-statistic = 141.29), lower, as expected, than the coefficient on the cash flow component ($\beta_1$),
which is 0.732 (t-statistic = 207.87). These coefficients are somewhat lower than those reported
by Sloan (1996) in Table 3 ($\beta_0$ = 0.765 and $\beta_1$ = 0.855). However, similar to Sloan (1996), our
results also show that the accrual component of earnings is less persistent than the cash flow
component (0.62 vs. 0.73) and that the difference between the two coefficients is significant at
the 0.01 level.
Adding firm and year fixed-effects reduces the persistence coefficients substantially.
The coefficient on the accrual component ($\beta_0$) is 0.468 (t-statistic = 128.33) and the coefficient
on the cash flow component ($\beta_1$) is 0.538 (t-statistic = 197.53) for the 1962-2013 period. The
corresponding coefficients for the sub-period are even lower (0.377 and 0.453, respectively). When
firm fixed-effects are replaced with industry fixed-effects, the persistence coefficients increase
to 0.680 and 0.789, respectively, for the entire sample period, and in the sub-period these
coefficients are 0.606 and 0.713, respectively. That is, the coefficients without fixed-effects and
with industry and time fixed-effects are virtually identical. Using the Fama-MacBeth (1973)
methodology has a minor effect on the persistence coefficients and these coefficients are of a
similar magnitude as without any fixed-effects.
Using means of the dependent and independent variables (MYX) yields very high
persistence measures. For the entire sample period the persistence measures of accruals and cash
flows are 0.816 and 0.988, respectively. Interestingly, for the sub-period, these measures are
very similar to those reported by Sloan (1996): 0.715 and 0.888, respectively. Finally, demeaning
the dependent variable (MY) reduces both persistence coefficients to the lowest magnitude
reported in this table. Moreover, for the sub-period 1962-1991 accruals are not less persistent
than cash flows; clearly, this last specification yields downward-biased coefficients, as shown
in our simulations.12
To summarize, with firm fixed-effects the magnitude of the coefficients in the Sloan
(1996) model is smaller than originally reported. This suggests that both accruals and cash flows
are only moderately persistent, although accruals are still found to be less persistent than cash
flows.
(Table 6 about here)
12
In all other specifications we find that accruals are less persistent than cash flows and that the difference between
the two coefficients is statistically significant at the 0.01 level.
5. Conclusion
Accounting researchers often use panel datasets that contain firm/time observations.
However, instead of controlling for firm and time fixed-effects, researchers often use industry
and time fixed-effects or none at all. When asked about the reason for avoiding firm fixed-effects, some researchers have argued that by including firm fixed-effects they "throw the baby
with the bathtub water." Our study highlights the consequences of adopting this view on
estimation results. We show analytically and empirically that omitting firm fixed-effects yields
biased and inconsistent slope estimates and hence erroneous test statistics, which in turn could
result in incorrect inferences.
We complement recent studies by Petersen (2009) and Gow et al. (2010) that address
potential problems in panel datasets due to correlation in residuals over time and thus biased
standard errors. Unlike Petersen (2009) and Gow et al. (2010), who assume that the regression
model is well-specified, we focus on cases where models are misspecified, and hence the
coefficient estimates are biased and the related standard errors are incorrect. Specifically, we
show how incorrect inferences stem not only from the test statistic's denominator, i.e., the
standard error (e.g., Petersen, 2009; Gow et al., 2010), but also from the test statistic's numerator,
i.e., the coefficient estimates.
Our study is the first that focuses on the potential bias in estimated coefficients due to
omitting firm fixed-effects when panel datasets are used. Our survey of the common panel
dataset regression specifications used in accounting literature illustrates a clear preference for
the use of industry rather than firm fixed-effects. We find that the inclusion of industry fixed-effects does not eliminate the bias and could lead to markedly incorrect inferences. This is due
to potential within-industry variations that are ignored or wrongly assumed to be immaterial by
researchers. Our results show that the bias in coefficients is negatively related to the number of
industries controlled. This provides further support for within-industry variations and for the
need to use firm fixed-effects. We further test and show that other commonly used methods such
as differencing the dependent variable, or demeaning it, and using the Fama-MacBeth (1973)
procedure yield biased slope coefficients, unless the fixed-effects are uncorrelated with the
independent variable. In addition to the firm fixed-effects model, using first differences for both
the dependent and independent variables yields unbiased estimates, but these are less efficient.
With the aim of providing guidance for empirical accounting researchers, we conclude
that the commonly used methods addressing the potential limitations in inferences of panel
datasets do not eliminate the correlated omitted variable problem, with the exception of firm and
time fixed-effects. This is due to the fact that with archival data, the exact form of the data
generation process is unknown to researchers. Our replications of two widely recognized
regression models in the accounting literature show that regression results are sensitive to the
method used. Without knowing all the underlying mechanisms, researchers should check for
substantial differences between a full fixed-effects specification and a simple pooled regression.
A substantial difference may highlight the need to use a full fixed-effects specification.
References
Aghion, P., Y. Algan, P. Cahuc, and A. Shleifer (2010). Regulation and Distrust. The Quarterly
Journal of Economics, vol. 125, no. 3, pp. 1015-1049.
Ball, R., S. P. Kothari, and V. Nikolaev (2013). On Estimating Conditional Conservatism. The
Accounting Review, vol. 88, no. 3, pp. 755-787.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th Edition). Chichester: Wiley.
Basu, S. (1997). The conservatism principle and the asymmetric timeliness of earnings. Journal
of Accounting and Economics, vol. 24, no. 1, pp. 3-37.
Easton, P. D., and T. S. Harris (1991). Earnings as an explanatory variable for returns. Journal
of Accounting Research, vol. 29, no. 1, pp. 19-36.
Fama, E., and J. MacBeth (1973). Risk, Return, and Equilibrium: Empirical Tests. Journal of
Political Economy, vol. 81, pp. 607-636.
Gow, I. D., G. Ormazabal, and D. J. Taylor (2010). Correcting for cross-sectional and time-series dependence in accounting research. The Accounting Review, vol. 85, no. 2, pp. 483-512.
Greene, W. H. (2003). Econometric Analysis (5th Edition). Upper Saddle River, NJ: Pearson
Education.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica: Journal of the
Econometric Society, vol. 46, no. 6, pp. 1251-1271.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica: Journal
of the Econometric Society, vol. 46, pp. 69-85.
Patatoukas, P., and J. Thomas (2011). More evidence of bias in differential timeliness estimates
of conditional conservatism. The Accounting Review, vol. 86, no. 5, pp. 1765-1793.
Patatoukas, P. N., and J. K. Thomas (2015). Placebo tests of conditional conservatism. The
Accounting Review, forthcoming.
Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing
approaches. Review of Financial Studies, vol. 22, no. 1, pp. 435-480.
Sloan, R. G. (1996). Do stock prices fully reflect information in accruals and cash flows about
future earnings? The Accounting Review, vol. 71, no. 3, pp. 289-315.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Table 1
A Survey of Research Methodologies in Accounting Journals (2006-2013)
Panel A. Accounting Journals’ Review Statistics
Journal   Empirical   Experiment   Theory   Essay   Survey   Case Study   Other   Total
CAR             179           42       25      47       10            3      18     324
EAR              91            5       20      68       18            6      16     224
JAE             212            1       23      44        2            0       1     283
JAR             188           22       39      45        4            0       1     299
RAST            153            6       24      46        0            0       0     229
TAR             328           75       37      17       21            2       3     483
Total          1152          151      168     267       55           11      39    1842
Panel B. Accounting Journals’ Regressions Specification and Treatments
Journal   Pooled   Annual   Portfolio   Time   Industry   Firm   Country
CAR          150        3           3     83         71     13        11
EAR           62       15           2     26         27      9         6
JAE          179        9          20     86         76     27        11
JAR          144       10          19     79         66     23        11
RAST         116       10          15     61         46      8         2
TAR          282       28          17    128        120     34        17
Total        933       75          76    463        406    114        58
Panel C. Firm Fixed-Effect per year
Year    2006   2007   2008   2009   2010   2011   2012   2013
Total      8      5      7     12      7     22     26     27
Notes:
1. Journal abbreviations are: Contemporary Accounting Research (CAR), European Accounting
Review (EAR), Journal of Accounting and Economics (JAE), Journal of Accounting
Research (JAR), Review of Accounting Studies (RAST), and The Accounting Review
(TAR).
2. Column definitions in Panel A are:
- Empirical – Studies that use archival data to support a theory or derive a conclusion.
- Experiment – Studies that carry out experiments with the goal of verifying, falsifying, or
validating a hypothesis.
- Theory – Studies that operate within a theoretically defined framework and use mathematical
derivations to illustrate and verify hypotheses.
- Essay – Studies that do not test any hypothesis but merely discuss concepts within the
accounting discipline. These studies are often discussions of other papers or editors'
comments on specific topics.
- Survey – Studies that gather and collect data by sending surveys to subjects.
- Case Study – Qualitative studies that study specific subjects in depth.
- Other - Interviews, Descriptive studies, and studies on methodology.
3. Column definitions in Panel B are:
- Pooled – Studies that use pooled cross-section and time-series regressions. Many
empirical studies use pooled regressions as a first step before improving the
specification.
- Annual – Studies that use indicator variables for specific years or periods (for instance,
pre-SOX).
- Portfolio – Studies that use portfolio analysis.
- Time – Studies that include time fixed-effects in pooled regressions.
- Industry – Studies that use industry fixed-effects in pooled regressions.
- Firm – Studies that include firm fixed-effects in the pooled regressions.
- Country – Studies that include country dummies. This could be done for specific
countries but also for all countries in the sample.
Table 2
Simulation Results
                    1        2        3        4        5        6        7        8
                    FE       IE       MS       FD       LY       MYX      MY       FM
β = 1, ρ = 0.5
Slope (b)           1.00     1.247    1.250    1.00     0.50     1.45     0.45     1.25
t-stat (b = 1)      0.01     22.8     16.5     0.01     -26.0    10.1     -46.5    35.8
Standard error      0.012    0.011    0.015    0.016    0.019    0.045    0.012    0.008
Adjusted R2         0.85     0.73     0.46     0.35     0.09     0.56     0.15     0.53
β = 1, ρ = 0.25
Slope (b)           1.00     1.12     1.12     1.00     0.50     1.23     0.45     1.12
t-stat (b = 1)      0.01     11.12    8.12     0.01     -26.00   4.82     -46.40   17.8
Standard error      0.012    0.011    0.015    0.016    0.019    0.047    0.012    0.008
Adjusted R2         0.84     0.69     0.40     0.35     0.09     0.46     0.15     0.47
β = 1, ρ = 0
Slope (b)           1.00     1.00     1.00     1.00     0.50     1.00     0.45     1.00
t-stat (b = 1)      -0.02    -0.01    -0.01    -0.03    -26.03   0.01     -46.49   -0.02
Standard error      0.012    0.011    0.015    0.016    0.019    0.048    0.012    0.008
Adjusted R2         0.83     0.66     0.34     0.35     0.09     0.35     0.15     0.40
β = 1, ρ = -0.25
Slope (b)           1.00     0.87     0.87     1.00     0.50     0.77     0.45     0.87
t-stat (b = 1)      -0.01    -11.13   -8.14    -0.03    -25.93   -4.82    -46.36   -17.66
Standard error      0.012    0.011    0.015    0.016    0.019    0.047    0.012    0.008
Adjusted R2         0.81     0.63     0.29     0.34     0.09     0.25     0.15     0.35
β = 1, ρ = -0.50
Slope (b)           1.00     0.75     0.75     1.00     0.50     0.55     0.45     0.75
t-stat (b = 1)      0.013    -22.78   -16.48   0.01     -25.95   -10.1    -46.37   -35.90
Standard error      0.012    0.011    0.015    0.016    0.019    0.045    0.012    0.008
Adjusted R2         0.79     0.61     0.23     0.34     0.09     0.15     0.15     0.29
Notes: We simulate a panel of 8,000 observations made of 10 periods, 20 industries and 40 firms
per industry. The correlations between the fixed-effects and the independent variable are 0.5,
0.25, 0, -0.25, and -0.50, respectively. The estimation methods are:
(1) FE – Firm and time fixed-effects
(2) IE - Industry and time fixed-effects
(3) MS - Misspecified model (no fixed-effects)
(4) FD - First differences for both the dependent and independent variables
(5) LY – First differences of the dependent variable only
(6) MYX – Using means of both the dependent and independent variables (a single cross-sectional
regression on the firm-level time-series means).
(7) MY – Demeaning the dependent variable (subtracting the firm mean from the dependent
variable)
(8) FM – Fama-MacBeth (1973) applied to annual OLS regressions.
The table reports the means of the estimated slope coefficient (b), the t-statistics, standard errors
and R2s across the 8,000 simulations of 8,000 observations in each round.
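The comparison in Table 2 can be illustrated with a minimal simulation sketch. The data-generating process below is an assumption made for illustration, not the paper's stated design: y_it = β x_it + f_i + e_it with x_it = a_i + u_it, unit variances for a_i, f_i, u_it and e_it, corr(a_i, f_i) = ρ, and no time effects (so the FE estimator here removes firm means only). All function and variable names are hypothetical. Under these assumptions the pooled (MS) slope converges to 1 + ρ/2, LY to 0.5 and MY to (1 - 1/T)/2, while FE and FD recover β.

```python
import numpy as np

def simulate_panel(rho, n_firms=800, n_periods=10, beta=1.0, seed=0):
    """Assumed DGP: y_it = beta*x_it + f_i + e_it, x_it = a_i + u_it, corr(a_i, f_i) = rho."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)                    # persistent firm component of x
    f = rho * a + np.sqrt(1 - rho**2) * rng.standard_normal(n_firms)  # firm fixed effect
    u = rng.standard_normal((n_firms, n_periods))       # time-varying component of x
    e = rng.standard_normal((n_firms, n_periods))       # idiosyncratic error
    x = a[:, None] + u
    y = beta * x + f[:, None] + e
    return y, x

def ols_slope(y, x):
    """Slope of a univariate OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc ** 2).sum())

def estimate_all(y, x):
    """Slope estimates for five of the specifications compared in Table 2."""
    y_bar = y.mean(axis=1, keepdims=True)               # firm means of y
    x_bar = x.mean(axis=1, keepdims=True)               # firm means of x
    dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)     # first differences over time
    return {
        "MS": ols_slope(y.ravel(), x.ravel()),                      # pooled, no fixed effects
        "FE": ols_slope((y - y_bar).ravel(), (x - x_bar).ravel()),  # within-firm estimator
        "FD": ols_slope(dy.ravel(), dx.ravel()),                    # both variables differenced
        "LY": ols_slope(dy.ravel(), x[:, 1:].ravel()),              # only y differenced
        "MY": ols_slope((y - y_bar).ravel(), x.ravel()),            # only y demeaned
    }

y, x = simulate_panel(rho=0.5, seed=1)
for name, b in estimate_all(y, x).items():
    print(f"{name}: {b:.3f}")
```

Transforming only the dependent variable (LY, MY) removes the fixed effect from y but leaves the regressor's firm-level component in the denominator, which is why those slopes are attenuated even when ρ = 0.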
Figure 1
Plots of the Distributions of Estimated Coefficients under Different Model
Specifications and Correlation Parameters
[Figure 1a: β=1; correlation=0.5. Figure 1b: β=1; correlation=0. Figure 1c: β=1; correlation=-0.5. Each panel plots the distribution of the estimated slope for the FE, IE, MS, FD, LY, MY, FM and MYX specifications. Horizontal axis: distributions of β; vertical axis: no. of observations.]
Table 3
Varying the Number of Industries and Number of Member Firms
β = 1, ρ = 0.5

                   Full     Industry Fixed-effects (industries/firms per industry)
                   FE       IE       IE       IE       IE       IE       MS
                            10/80    20/40    40/20    160/5    400/2
Slope (b)          1.00     1.249    1.248    1.24     1.22     1.17     1.25
t-stat (b = 1)     0.02     16.7     16.6     16.2     14.6     10.4     16.6
Standard error     0.012    0.015    0.015    0.015    0.016    0.017    0.015
Adjusted R2        0.85     0.46     0.47     0.47     0.50     0.52     0.46
Note: We simulate a panel of 8,000 observations made of 10 periods, a varying number of
industries (10, 20, up to 400) and a varying number of companies per industry (80, 40, down to
2). The correlation between the fixed-effects and the independent variable is 0.5. The
models are: FE – firm and time fixed-effects; IE – industry and time fixed-effects; and
MS – misspecified model (no fixed-effects). The table reports the means of the estimated slope
coefficient (b), t-statistics, standard errors and adjusted R2.
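The pattern in Table 3, where finer industry groupings remove more of the bias but only firm-level effects remove all of it, can be sketched as follows. This is an illustration under assumed conditions rather than the paper's code: the same hypothetical data-generating process as a unit-variance firm effect correlated with the firm-level component of x, no separate industry or time effects, and industries formed by grouping consecutive firms arbitrarily. The function name and its parameters are hypothetical.

```python
import numpy as np

def industry_fe_slope(firms_per_industry, rho=0.5, n_firms=800, n_periods=10, seed=0):
    """Slope from regressing y on x after removing industry means (assumed DGP)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_firms)                    # firm-level component of x
    f = rho * a + np.sqrt(1 - rho**2) * rng.standard_normal(n_firms)  # firm fixed effect
    x = a[:, None] + rng.standard_normal((n_firms, n_periods))
    y = x + f[:, None] + rng.standard_normal((n_firms, n_periods))    # true slope is 1
    # Consecutive firms share a (hypothetical) industry; each row below then
    # holds every firm-period observation belonging to one industry.
    n_industries = n_firms // firms_per_industry
    xg = x.reshape(n_industries, -1)
    yg = y.reshape(n_industries, -1)
    xd = xg - xg.mean(axis=1, keepdims=True)            # subtract industry means
    yd = yg - yg.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())

# One firm per industry is equivalent to firm fixed effects.
for n in (80, 40, 20, 5, 2, 1):
    print(f"{800 // n} industries of {n} firms: slope = {industry_fe_slope(n):.3f}")
```

With n firms per industry, industry means absorb only about 1/n of the firm-level variation, so the correlated component largely survives unless each firm has its own effect.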
Figure 2
Sensitivity to Increasing the Cross-time Variance of the Regressors
β=1; correlation=0.5
[Figure: bias in the MS estimate (MS - FE), on a vertical axis running from -0.10 to 0.50, plotted against the time variance of the Xs, on a horizontal axis running from 2 down to 0.25.]
Note: We simulate a panel of 8,000 observations made of 10 periods, a varying number of
industries and a varying number of companies per industry. The true slope coefficient is equal
to one and the correlation between the fixed-effects and the independent variable is 0.5. We
vary the variance of the time component in X from 2, 1.75, 1.50, ... down to 0.25.
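The direction of the effect in Figure 2 follows from a standard omitted-variable calculation. The decomposition below is an assumption made for illustration, and the symbols $a_i$, $u_{it}$, $\sigma_a^2$, $\sigma_u^2$ and $\sigma_{af}$ are not the paper's notation: the regressor is split into a firm-level component $a_i$ that is correlated with the fixed effect $f_i$ and a time-varying component $u_{it}$ that is uncorrelated with both $f_i$ and the error.

```latex
\begin{align*}
y_{it} &= \beta x_{it} + f_i + \varepsilon_{it}, \qquad x_{it} = a_i + u_{it}, \\
\operatorname{plim}\, b_{MS}
  &= \frac{\operatorname{Cov}(x_{it}, y_{it})}{\operatorname{Var}(x_{it})}
   = \beta + \frac{\sigma_{af}}{\sigma_a^2 + \sigma_u^2},
\qquad \sigma_{af} = \operatorname{Cov}(a_i, f_i).
\end{align*}
```

Holding $\sigma_{af}$ and $\sigma_a^2$ fixed, a larger time variance $\sigma_u^2$ shrinks the bias of the model without fixed effects. For instance, with $\sigma_{af} = 0.5$ and $\sigma_a^2 = 1$, the bias is $0.5/1.25 = 0.40$ when $\sigma_u^2 = 0.25$ and $0.5/3 \approx 0.17$ when $\sigma_u^2 = 2$.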
Table 4
Variable Definitions and Summary Statistics
Panel A: The Asymmetric Timeliness of Earnings (Basu, 1997)
The model is: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$, where $X_{it}/P_{it-1}$ denotes
earnings per share divided by lagged share price; $R_{it}$ denotes raw returns; $R_{it} - R_{mt}$ denotes
market adjusted returns; $R^{a}_{it}$ denotes size and book-to-market adjusted return; and $D(R_{it}<0)$ is
an indicator variable that takes the value of “1” if $R_{it}$ is negative, and “0” otherwise.
Summary statistics 1963-2013; 105,179 obs.

                       P5       P25      P50      P75      P95      MEAN     STD
$X_{it}/P_{it-1}$      -0.17    0.02     0.06     0.10     0.21     0.05     0.19
$R_{it}$               -0.50    -0.14    0.09     0.38     1.11     0.17     0.53
$D(R_{it}<0)$          -0.56    -0.23    -0.02    0.23     0.91     0.05     0.49
Panel B: Accruals and Cash flows as predictors of earnings (Sloan, 1996)
The model is: $OI_{it}/TA_{it} = \alpha_0 + \beta_0 ACC_{it-1}/TA_{it} + \beta_1 CF_{it-1}/TA_{it} + \varepsilon_{it}$, where $OI_{it}/TA_{it}$ denotes operating
income divided by average total assets; $ACC_{it}/TA_{it}$ denotes the accrual component of earnings
divided by average total assets; and $CF_{it}/TA_{it}$ denotes operating cash flows divided by average
total assets.
Summary Statistics 1963-2013; 104,898 obs.

                       P5       P25      P50      P75      P95      MEAN     STD
$OI_{it}/TA_{it}$      -0.14    0.08     0.13     0.20     0.31     0.12     0.15
$ACC_{it}/TA_{it}$     -0.11    -0.02    0.01     0.05     0.15     0.01     0.09
$CF_{it}/TA_{it}$      -0.17    0.05     0.12     0.19     0.31     0.11     0.15
Table 5
Alternative Estimation Methods - The Asymmetric Timeliness of Earnings
(Basu, 1997)

                                     α0        α1        β0        β1        Adj-R2   Observ.
Basu (1997) as reported
Table 1, Panel A – coefficient       0.090     0.002     0.059     0.216     0.10     43,321
t-statistic                          (68.03)   (0.86)    (18.34)   (20.66)
Table 1, Panel B – coefficient       0.030     0.014     0.047     0.256     0.12     43,321
t-statistic                          (22.62)   (6.07)    (11.03)   (27.14)
Table 1, Panel C – coefficient       0.086     -0.005    0.075     0.166     0.12     43,118
t-statistic                          (64.1)    (1.96)    (21.3)    (16.5)
Replication with Raw Returns
Pooled, no fixed-effects (MS)
1963-2013 – coefficient              0.074     -0.009    -0.006    0.256     0.05     114,175
t-statistic                          (76.00)   (-5.22)   (-3.93)   (52.02)
1963-1990 – coefficient              0.094     0.000     0.064     0.198     0.09     42,546
t-statistic                          (63.38)   (0.14)    (24.81)   (24.12)
Fixed Firm and Year Effects (FE)
1963-2013 – coefficient              0.060     -0.012    0.016     0.145     0.19     114,175
t-statistic                          (12.27)   (-7.10)   (10.63)   (28.45)
1963-1990 – coefficient              0.085     -0.004    0.074     0.091     0.24     42,546
t-statistic                          (10.37)   (-1.73)   (28.04)   (10.88)
Fixed Industry and Year Effects (IE)
1963-2013 – coefficient              0.069     -0.013    0.011     0.223     0.12     114,175
t-statistic                          (14.04)   (-7.29)   (7.28)    (44.90)
1963-1990 – coefficient              0.093     -0.006    0.071     0.163     0.16     42,546
t-statistic                          (12.30)   (-2.54)   (26.40)   (21.72)
Fama-MacBeth (FM)
1963-2013 – coefficient              0.069     -0.005    0.023     0.26      0.10     114,175
t-statistic                          (14.46)   (-1.49)   (3.63)    (14.51)
1963-1990 – coefficient              0.089     -0.004    0.049     0.211     0.13     42,546
t-statistic                          (12.65)   (-0.95)   (5.34)    (9.26)
Means of Dependent and Independent Variables (MYX)
1963-2013 – coefficient              0.017     -0.020    0.066     0.096     0.04     16,459
t-statistic                          (8.63)    (-7.62)   (10.84)   (10.43)
1963-1990 – coefficient              0.053     -0.014    0.148     0.097     0.15     7,820
t-statistic                          (24.54)   (-4.82)   (20.76)   (9.03)
Demeaning the Dependent Variable (MY)
1963-2013 – coefficient              0.008     -0.008    0.020     0.082     0.02     114,175
t-statistic                          (9.10)    (-5.39)   (15.61)   (18.96)
1963-1990 – coefficient              -0.006    -0.002    0.069     0.048     0.06     42,546
t-statistic                          (-4.61)   (-0.75)   (30.11)   (6.65)
Notes: The table reports results for estimating Basu’s (1997) asymmetric timeliness of earnings
model: $X_{it}/P_{it-1} = \alpha_0 + \alpha_1 D(R_{it}<0) + \beta_0 R_{it} + \beta_1 D(R_{it}<0) R_{it} + \varepsilon_{it}$. We report results for two periods:
1963-2013, and 1963-1990 (the sample period used in Basu (1997)). We present results for the
following specifications: (i) as reported in Basu (1997),
(ii) replication without any fixed-effects (MS), (iii) replication with firm and year fixed-effects
(FE), (iv) replication with industry and year fixed-effects (IE), (v) replication using the Fama
and MacBeth (1973) methodology (average coefficients and corresponding standard errors
obtained from annual cross-sectional regressions, FM); and (vi) a replication where the
dependent variable is mean-adjusted (MY; the mean is calculated for each cross-sectional unit by
averaging observations over time). See Table 4 for variable definitions.
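The Fama and MacBeth (1973) procedure described in the notes (annual cross-sectional regressions, with the reported coefficient being the time-series average and its standard error taken from the time-series dispersion of the annual estimates) can be sketched as below. This is a minimal illustration on synthetic data, not the paper's implementation: the function name, the balanced-panel layout and the parameter values used to generate the data are all hypothetical.

```python
import numpy as np

def fama_macbeth(y, X):
    """Fama-MacBeth estimates for a balanced panel.

    y: array (n_units, n_periods); X: array (n_units, n_periods, k), no intercept column.
    Returns (mean coefficients including intercept, t-statistics).
    """
    n_units, n_periods, k = X.shape
    coefs = np.empty((n_periods, k + 1))
    for t in range(n_periods):
        # Cross-sectional OLS for period t, with an intercept.
        Z = np.column_stack([np.ones(n_units), X[:, t, :]])
        coefs[t] = np.linalg.lstsq(Z, y[:, t], rcond=None)[0]
    mean = coefs.mean(axis=0)
    # Standard error from the time-series variation of the period estimates.
    se = coefs.std(axis=0, ddof=1) / np.sqrt(n_periods)
    return mean, mean / se

# Synthetic Basu-style data; all parameter values are hypothetical.
rng = np.random.default_rng(0)
R = rng.normal(0.1, 0.5, size=(2000, 20))          # returns
D = (R < 0).astype(float)                          # indicator for negative returns
earn = 0.09 + 0.0 * D + 0.06 * R + 0.20 * D * R + rng.normal(0, 0.1, size=R.shape)
X = np.stack([D, R, D * R], axis=2)                # regressors: D, R, D*R

mean, tstat = fama_macbeth(earn, X)
for name, b, t in zip(["alpha0", "alpha1", "beta0", "beta1"], mean, tstat):
    print(f"{name}: {b:.3f} (t = {t:.2f})")
```

Because each annual regression pools firms without firm effects, this procedure addresses cross-sectional dependence in the errors but does not remove a correlated firm fixed effect.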
Table 6
Alternative Estimation Methods - Accruals and Cash flows as Predictors of
Earnings (Sloan, 1996)

Variables                                    α0         β0         β1         Adj-R2   Observ.
Sloan (1996) As reported
Table 3, Panel A, pooled - coefficient       0.011      0.765      0.855      ?        40,679
t-statistic                                  (24.05)    (186.53)   (304.56)
Table 3, Panel B, pooled, decile ranking     -2.216     0.565      0.838      ?        40,679
t-statistic                                  (-55.86)   (141.02)   (209.43)
Replication
Pooled, no fixed-effects (MS)
1963-2013 – coefficient                      0.023      0.697      0.811      0.64     104,898
t-statistic                                  (64.52)    (206.02)   (426.34)
1962-1991 – coefficient                      0.042      0.621      0.732      0.50     43,978
t-statistic                                  (66.18)    (141.29)   (207.87)
Fixed Firm and Year Effects (FE)
1963-2013 – coefficient                      0.055      0.468      0.538      0.69     104,898
t-statistic                                  (28.97)    (128.33)   (197.53)
1962-1991 – coefficient                      0.084      0.377      0.453      0.57     43,978
t-statistic                                  (23.83)    (75.15)    (97.52)
Fixed Industry and Year Effects (IE)
1963-2013 - coefficient                      0.026      0.680      0.789      0.65     104,898
t-statistic                                  (13.41)    (197.43)   (397.55)
1962-1991 - coefficient                      0.044      0.606      0.713      0.51     43,978
t-statistic                                  (12.63)    (135.63)   (197.83)
Fama-MacBeth (FM)
1963-2013 - coefficient                      0.032      0.643      0.753      0.55     104,898
t-statistic                                  (12.89)    (28.20)    (29.86)
1962-1991 - coefficient                      0.043      0.588      0.700      0.47     43,978
t-statistic                                  (22.40)    (22.67)    (18.71)
Means of Dependent and Independent Variables (MYX)
1963-2013 - coefficient                      -0.001     0.816      0.988      0.11     8,109
t-statistic                                  (-1.34)    (64.96)    (218.92)
1962-1991 - coefficient                      0.016      0.715      0.888      0.11     4,243
t-statistic                                  (10.22)    (44.08)    (89.08)
Demeaning the Dependent Variable (MY)
1963-2013 - coefficient                      -0.024     0.215      0.202      0.109    104,898
t-statistic                                  (-71.06)   (65.46)    (109.21)
1962-1991 - coefficient                      -0.036     0.211      0.234      0.107    43,978
t-statistic                                  (-60.59)   (51.47)    (71.27)
Notes: The table reports results for the following specifications: (i) as reported in Sloan
(1996, Table 3), (ii) replication without any fixed-effects, (iii) replication with firm and
year fixed-effects, (iv) replication with industry and year fixed-effects, and (v) replication
using the Fama and MacBeth (1973) methodology (average coefficients and
corresponding standard errors obtained from annual cross-sectional regressions). We also
report results for two sample periods: 1963-2013 and 1962-1991. The model is:
$OI_{it}/TA_{it} = \alpha_0 + \beta_0 ACC_{it-1}/TA_{it} + \beta_1 CF_{it-1}/TA_{it} + \varepsilon_{it}$.
See Table 4 for variable definitions.