Working Paper 11-18
Economic Series
May, 2011
Departamento de Economía
Universidad Carlos III de Madrid
Calle Madrid, 126
28903 Getafe (Spain)
Fax (34) 916249875
State dependence and heterogeneity in health
using a bias corrected fixed effects estimator.*
Jesús M. Carro
Universidad Carlos III de Madrid
Alejandra Traferri
Pontificia Univ. Católica de Chile
Abstract
This paper considers the estimation of a dynamic ordered probit of self-assessed
health status with two fixed effects: one in the linear index equation and one in the cut
points. The two fixed effects allow us to robustly control for heterogeneity in
unobserved health status and in reporting behaviour, even though we can not separate
both sources of heterogeneity. The contributions of this paper are twofold. First it
contributes to the literature that studies the determinants and dynamics of Self-Assessed
Health measures. Second, this paper contributes to the recent literature on bias
correction in nonlinear panel data models with fixed effects by applying and studying
the finite sample properties of two of the existing proposals to our model. The most
direct and easily applicable correction to our model is not the best one, and has
important biases in our sample sizes.
JEL classification: C23, C25, I19
Keywords: dynamic ordered probit, fixed effects, self-assessed health, reporting bias,
panel data, unobserved heterogeneity, incidental parameters, bias correction.
*
This is a revised version of a paper previously circulated under the title "Correcting the bias in the
estimation of a dynamic ordered probit with fixed effects of self-assessed health status". We thank Raquel
Carrasco, Matilde Machado and seminar participants at Universidad Carlos III, Boston University, and
The Lincoln College (Oxford U.) Applied Microeconometrics Conference for useful remarks and
suggestions. The first author gratefully acknowledges that this research was supported by a Marie Curie
International Outgoing Fellowship within the 7th European Community Framework Programme, by
grants number ECO2009-11165 and SEJ2006-05710 from the Spanish Minister of Education, MCINN
(Consolider- Ingenio2010) and Consejería de Educación de la Comunidad de Madrid (Excelecon project).
The second author would like to thank the Institute for Economic Development at Boston University, in
which she conducted part of this research as a visiting scholar. Address for correspondence: Jesús M.
Carro. Departamento de Economía, Universidad Carlos III de Madrid, Madrid 28903, Getafe (Madrid, Spain). E-mail addresses: jcarro@eco.uc3m.es; atraferri@uc.cl.
1 Introduction
Self-assessed health (SAH) has been used as a proxy for true overall individual health
status in many socioeconomic studies. Moreover, it has been shown to be a good predictor of mortality and of demand for medical care (see, for example, van Doorslaer,
Jones, and Koolman, 2004). Motivated by this and by the high observed persistence in
health outcomes, Contoyannis, Jones and Rice (2004) study the dynamics and effects of
socioeconomic variables on SAH in the British Household Panel Survey. Among other
aims, they investigate the relative contribution of state dependence and unobserved heterogeneity in explaining the observed persistence in SAH. State dependence may arise for
structural reasons, such as differing abilities to deal with new health shocks depending
on previous health status, or a willingness to invest in health that changes as health
status evolves. For example, people may be less prone to invest in their health after a
health shock that lowers their returns to that investment. In any case, as happens with
labor force participation, regardless of the underlying explanations for state dependence,
knowing its magnitude is relevant for many health policy debates. This is because
state dependence informs us about the long-run implications of a policy affecting health status
today.
Given that SAH is a categorical variable, Contoyannis, Jones and Rice (2004) use
a dynamic ordered probit model, and they take a random effects approach to control
for unobserved heterogeneity in the level equation. Halliday (2008) studies the relative
contribution of state dependence and unobserved heterogeneity in SAH using a different
data set and another random effects approach. Halliday (2008) only includes age as a
covariate, as the study focuses on the evolution of health over the life-cycle.
We account for heterogeneity in reporting behavior (cut-point shifts) in addition to
heterogeneous unobserved factors that affect health status (index shifts). An example of
index shifts is genetic traits. Cut-point shifts occur if individuals use different thresholds
to assess their health and report different values of SAH even though they have the
same level of true health.1 Since we can only identify differences up to scale in discrete
choice models, we cannot separately identify the two sources of heterogeneity. We can,
nonetheless, correctly control for both sources of heterogeneity by including individual
effects in the levels and the cut points of the ordered probit. A model with only one
individual effect (usually placed in the index equation) allows both sources of heterogeneity
too, but so restrictively that it almost always gives incorrect estimates and inferences if
both sources are present and relevant.
As with one individual effect, we could take a ‘random effects’ approach. However, this
approach has the drawback of imposing either independence, or a specific and potentially
1 See Lindeboom and van Doorslaer (2004) for a test that shows evidence of existence of these two
different kinds of shifts.
too restrictive functional form on the relation between unobserved heterogeneity and other
explanatory variables. It also has the drawback of having to deal with the so-called initial
conditions problem. By taking a ‘fixed effects’ approach, we place no restrictions on
the joint distribution of the two individual effects and their correlation with explanatory
variables. Moreover, there is no initial conditions problem. Despite these advantages,
there have been very few applications of nonlinear panel models with fixed effects in
health economics, as noted in Jones’s (2007) handbook chapter.2 This is due to the
known problems in estimating nonlinear panel data models with fixed effects and the
panel data sets available. This estimation problem is usually called the incidental parameters
problem, and it results in large finite sample biases of the MLE when using panels where
T is not very large. It is more severe in a model like ours that is dynamic and contains
more than one fixed effect.
An important part of the research in microeconometrics has been concerned with
finding a solution to this problem by developing bias-adjusted methods. Some examples
are Hahn and Newey (2004), Hahn and Kuersteiner (2004), Arellano and Hahn (2006),
Carro (2007), Fernandez-Val (2009), and Bester and Hansen (2009).3 This fast growing
literature offers several bias correction methods potentially useful to estimate our model.
Bester and Hansen (2009) include an application of their so-called HS estimator to a
dynamic ordered probit model with two fixed effects. So, the HS estimator is directly applicable
to our problem, whereas others require some transformation to adapt them to our model
with two fixed effects. However, simulations of other models in the referred papers suggest
that HS is not the best one in terms of finite sample performance. They show that for
sample sizes with T less than fourteen, the remaining bias when using HS could still be
significant, especially for the ordered probit that Bester and Hansen (2009) simulate. This
result is confirmed in our simulations, which are more specific to the model we want to
estimate. Thus, we have to consider another of the proposed methods.
In this paper we derive explicit formulas of the Modified MLE (MMLE) used in Carro
(2007) for the dynamic ordered probit model considered here. We evaluate its finite
sample performance and compare it with the HS penalty estimator.4 The MMLE has
better finite sample properties and negligible bias in our sample size. This exercise is a
main contribution of this paper since, as Arellano and Hahn (2007) point out in their
conclusions, more research is needed to know “how well each of the methods recently
proposed work for other specific models and data sets of interest in applied econometrics.”
2 Jones and Schurer (2009) is a recent example of using the fixed effects approach to study SAH;
however, they use the Conditional MLE of Chamberlain (1980) which does not provide information
about the distribution of the fixed effects. This information is needed to calculate marginal effects, the
usual parameters of interest in nonlinear models. Another important difference is that Jones and Schurer
(2009) do not allow for dynamics.
3 See Arellano and Hahn (2007) for a good review of this literature, detailed references and a general
framework in which the various approaches can be included.
4 The MMLE comes from modifying the score of the MLE so that the order of the bias in T is reduced.
Also, Greene and Hensher (2008) comment on the lack of studies about the applicability
of the recent proposals for bias reduction estimators in binary choice models to ordered
choice models.
The rest of the paper proceeds as follows. Section 2 presents our model of SAH and the
data we use, and explains the relation of this paper to other recent papers about SAH.
Section 3 presents the estimation problem and the method we propose. We also comment
on possible solutions from the bias correction literature for nonlinear panel data
models with fixed effects. We use simulations to evaluate the finite sample performance
of different alternatives and to justify the selection of the MMLE as our estimator. Section 4
presents the estimation results. The estimates of our model and the comparison with
random effects estimates show that there are important state dependence effects, and
statistically significant effects of income and other socioeconomic variables. Results also
show that flexibly accounting for permanent unobserved heterogeneity matters. Section
5 concludes.
2 Model and Data
2.1 Empirical Model of self-assessed health
We consider the following dynamic panel data ordered probit with fixed effects as a
reduced-form model of self-assessed health status (SAH):
$$h^*_{it} = \alpha_i + \rho_1 \mathbf{1}(h_{i,t-1} = 1) + \rho_{-1} \mathbf{1}(h_{i,t-1} = -1) + x_{it}'\beta + \varepsilon_{it}, \qquad i = 1, \ldots, N, \quad t = 0, \ldots, T \tag{1}$$
where $x_{it}$ is a set of exogenous variables that influence SAH, $\varepsilon_{it}$ is a time- and individual-varying error term which is assumed to be $\varepsilon_{it} \overset{iid}{\sim} N(0, 1)$, and $h^*_{it}$ is the latent health. The
reported SAH ($h_{it}$), which is what we observe, is determined according to the following
thresholds:
$$h_{it} = \begin{cases} -1 & \text{if } h^*_{it} < -c_i \\ 0 & \text{if } -c_i < h^*_{it} \le 0 \\ 1 & \text{if } h^*_{it} > 0 \end{cases} \tag{2}$$
where $h_{it} = -1$ corresponds to poor health, $h_{it} = 0$ to fair health and $h_{it} = 1$ to good
health. $\alpha_i$ and $c_i$ are the model’s fixed effects; these account for permanent unobserved
heterogeneity, both in unobserved factors affecting health and in reporting behaviour, in
an unrestricted way, as explained in the introduction. Note that in addition to the usual
scale normalization in discrete choice models (i.e. restricting the variance of $\varepsilon_{it}$ to equal
one), here we are also normalizing one of the two cut points to be zero. The somewhat
more conventional normalization of setting the intercept in the linear index equal to zero
is not available to us because the distribution of the intercept, including its mean, is
unrestricted in the fixed effects approach. An alternative normalization would be to put
the two fixed effects in the two cut points and leave the linear index equation without any
intercept.
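As an illustration of how a model of this kind generates reported health, the following Python sketch simulates one individual's SAH path. It is a minimal sketch under stated assumptions, not the authors' code: covariates are omitted, the initial observation is drawn without a lag term, the cut-point effect is taken to be positive with the lower threshold at -c and the upper threshold normalized to zero, and the function name and all parameter values are hypothetical.

```python
import random


def simulate_sah_path(alpha, c, rho_pos, rho_neg, n_periods, rng):
    """Simulate h_0, ..., h_T for one individual (no covariates).

    alpha     : individual effect in the latent index
    c         : individual cut-point effect (assumed > 0 in this sketch)
    rho_pos   : state dependence coefficient on 1(h_{t-1} = 1)
    rho_neg   : state dependence coefficient on 1(h_{t-1} = -1)
    n_periods : number of periods T after the initial observation
    rng       : a random.Random instance, so that runs are reproducible
    """
    def report(latent):
        # Map latent health to reported SAH: lower cut point -c, upper cut point 0.
        if latent < -c:
            return -1  # poor
        if latent <= 0.0:
            return 0   # fair
        return 1       # good

    # Assumption of this sketch: the initial observation has no lag term.
    path = [report(alpha + rng.gauss(0.0, 1.0))]
    for _ in range(n_periods):
        previous = path[-1]
        index = alpha + rho_pos * (previous == 1) + rho_neg * (previous == -1)
        path.append(report(index + rng.gauss(0.0, 1.0)))
    return path


if __name__ == "__main__":
    print(simulate_sah_path(0.5, 1.0, 0.6, -0.6, 10, random.Random(0)))
```

Two individuals with the same index effect but different cut-point effects would report different SAH sequences for the same latent health, which is the reporting heterogeneity the second fixed effect is meant to absorb.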
As this discussion on normalization shows, it is not possible to separately identify individual effects that impact only $h^*_{it}$ from those that impact the
cut points. Therefore, though we control for the two mentioned sources of unobserved
heterogeneity, we cannot separate them. Additionally, having only the fixed effect in the
linear index ($\alpha_i$) would also account for heterogeneity in the cut points, but in a very
restrictive way. In particular, by introducing only one individual effect ($\alpha_i$), we would
be assuming that both sources of unobserved heterogeneity must have effects of opposite
signs in $\Pr(h_{it} = -1)$ and $\Pr(h_{it} = 1)$; furthermore, we would be restricting how these two
effects differ in magnitude for all individuals. We do not have evidence in favor of these
assumptions. Furthermore, given the different sources of the unobserved heterogeneity
and the potential relations among them and observable variables, these assumptions are
most likely too restrictive, leading to incorrect inference. In contrast, by having
two fixed effects in (2) we are not imposing any restrictions on the cut-point shifts, nor
on the index shift. This constitutes an important difference from previous studies like
Contoyannis, Jones and Rice (2004).
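The opposite-sign restriction can be seen in a short worked derivation. This is a sketch under assumptions that are introduced here rather than taken from the text: a single individual effect $\alpha_i$, a homogeneous cut-point parameter $c > 0$ with the lower threshold at $-c$, $\mu_{it}$ denoting the linear index of (1) without the error term, and $\Phi$ and $\phi$ denoting the standard normal distribution and density functions.

```latex
\begin{aligned}
\Pr(h_{it} = -1 \mid \cdot) &= 1 - \Phi(c + \mu_{it}),
&\qquad
\frac{\partial \Pr(h_{it} = -1 \mid \cdot)}{\partial \alpha_i} &= -\phi(c + \mu_{it}) < 0, \\
\Pr(h_{it} = 1 \mid \cdot) &= \Phi(\mu_{it}),
&\qquad
\frac{\partial \Pr(h_{it} = 1 \mid \cdot)}{\partial \alpha_i} &= \phi(\mu_{it}) > 0 .
\end{aligned}
```

Under these assumptions any heterogeneity routed through $\alpha_i$ moves the two outer probabilities in opposite directions, and the ratio of the two responses, $\phi(c + \mu_{it}) / \phi(\mu_{it})$, is pinned down by $c$ and $\mu_{it}$. With an individual cut-point effect $c_i$, by contrast, $\partial \Pr(h_{it} = -1 \mid \cdot) / \partial c_i = -\phi(c_i + \mu_{it})$ while $\Pr(h_{it} = 1 \mid \cdot)$ does not depend on $c_i$, so the two probabilities can shift independently.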
In addition to the parameters capturing the effect of heterogeneity, $\beta$ captures the effect
of exogenous variables, and $\rho_1$ and $\rho_{-1}$ are the parameters that allow for state dependence
in this model. Determining the relative importance of state dependence versus permanent
unobserved heterogeneity as alternative sources of persistence is crucial since they have
very different implications. As explained in the introduction, there are several structural
reasons for state dependence. However, regardless of the reason, state dependence gives
the long-run effect of a policy affecting health status today. This is why it is so useful to
know its magnitude.
2.2 Data and x variables
We use the British Household Panel Survey (BHPS), a longitudinal survey of private
households in Great Britain. It was designed as an annual survey of each adult (16+)
member of a representative sample of more than 5,000 households, with approximately
10,000 individual interviews. The same individuals are re-interviewed in successive waves;
if they split off from their original households, they are re-interviewed along with all adult members of their new households. Similarly, new adult members joining sample households,
and children who have reached the age of 16, become eligible for interview. We use sixteen
waves of data (years 1991 - 2006), and include individuals who gave a full interview. An
unbalanced panel of individuals who were interviewed in at least 8 subsequent waves is
used. Our sample consists of 76,128 observations from 6,375 individuals.
SAH is defined for waves 1-8 and 10-16 as the response to the question “Compared
to people of your own age, would you say your health over the last 12 months on the
whole has been: excellent, good, fair, poor, very poor?” In wave 9 the SAH question and
categories were reworded. This makes comparison with other waves difficult and wave 9
is not used in our empirical analysis.
The original five SAH categories are collapsed to a three-category variable, creating a
new SAH variable that is our dependent variable, with the following codes: poor ($h_{it} = -1$)
for individuals who reported either “very poor” or “poor” health; fair ($h_{it} = 0$) for
individuals who reported “fair” health; and good ($h_{it} = 1$) for individuals who reported
“good” or “excellent” health.
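The recoding just described can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are hypothetical, and the lower-cased response labels are an assumption about how the survey answers might be stored, not the BHPS variable coding.

```python
# Mapping from the five original response labels to the three-category code:
# -1 = poor, 0 = fair, 1 = good.
SAH_COLLAPSE = {
    "very poor": -1,
    "poor": -1,
    "fair": 0,
    "good": 1,
    "excellent": 1,
}


def collapse_sah(response):
    """Return the three-category SAH code for one survey response label."""
    # Normalise whitespace and case before the lookup; unknown labels raise KeyError.
    return SAH_COLLAPSE[response.strip().lower()]


if __name__ == "__main__":
    print([collapse_sah(r) for r in ["Excellent", "fair", "Very poor"]])
```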
Main Model. The explanatory variables x that we use in the main model we estimate are: three dummy variables representing marital status (Married, Widowed, Divorced/Separated) with Single as the reference category, size of the household (the number
of people living in the same household), number of kids in the household, household income, year dummies (excluding the necessary number to avoid perfect collinearity), and a
quadratic function of age. The question about SAH that we use to construct our dependent variable asks respondents to compare their health with people their own age. However,
SAH becomes worse over time in the raw sample data, perhaps indicating that the age
effect on health is not totally discounted by respondents. This can be seen in table 2.5
This is the reason for including age as an explanatory variable. The income variable is the
logarithm of equivalised real income, adjusted using the Retail Price Index and equivalised
by the McClements scale to adjust for household size and composition, and consists of
the sum of non-labour and labour income in the reference year.
Variables that are time-constant and specific to individuals, like the level of education
or gender, are not included in the set of explanatory variables because they cannot be
separately identified from permanent unobserved heterogeneity.6 Fixed effects account
for these variables as well as for unobserved characteristics, and we cannot separate their
effects. Sometimes this is seen as a drawback of the fixed effects approach. However, the
random effects approach only separately identifies the effect of these variables because
of the unrealistic assumption that unobserved characteristics are independent from them
(for example that unobserved healthy life style is independent of education). Even with
a correlated random effects approach, if correlation is allowed in a Mundlak (1978) and
Chamberlain (1984) style and initial conditions are controlled for following Wooldridge’s
(2005) proposal, it is not possible to separately identify the effect of these time-constant
variables from the effect of the unobserved factors correlated with them without further
assumptions. For instance, Contoyannis, Jones and Rice (2004) follow Wooldridge’s (2005)
5 See Contoyannis, Jones and Rice (2004) for further discussion on this.
6 They are, however, included in the random effects estimation we make for comparison.
proposal, and they comment about this impossibility of separating the effect of variables
like education from the effect of the unobservables correlated with them.
Additional Model. In addition to the main model we estimate a model including
variables with information on objective health problems. These variables turn part of the unobserved underlying true health, especially persistent health situations, into observables.
This will help in identifying heterogeneity in reporting behaviour. With this additional
model we try to see whether the state dependence that we may find in the main model
is still significantly different from zero even after introducing observations of persistent
determinants of health. These variables are not clean determinants of SAH and are a mix
of several components. Therefore they will induce a decrease in the effect of $h_{i,t-1}$ even if
we correctly capture and isolate all the state dependence effect in the main model. However, if state dependence is still significantly different from zero this will provide further
evidence of the robustness and importance of dynamics and state dependence in SAH.
The BHPS contains several questions about health problems and health care demand,
but many of them can be induced by a self-valuation that might differ from true health
as much as SAH, and in an unobserved way. For example, the number of visits to the
doctor can be determined by a perception of a health problem rather than a true health
problem. To avoid this endogeneity bias, we have selected only those questions that we
regard as measuring more objective health situations and that, therefore, are not affected by
personal health assessments. We introduce the following variables:
- Health problems: This is a dummy variable, which takes the value 1 if the individual
reports at least one of the following permanent health problems or disabilities: arthritis or
rheumatism, difficulty in hearing, allergies, asthma, bronchitis, blood pressure, diabetes,
migraine or frequent headaches, cancer and stroke, among others.
- Health limits daily activities: This is a dummy variable, which takes the value 1 if
the individual answers ‘yes’ to the following question: does your health in any way limit
your daily activities, compared to most people of your age? Examples of daily activities
included are: doing the housework, climbing stairs, dressing yourself, walking for at least
10 minutes, etc.
- Health limits ability to work: Similar to previous question.
- Number of days in a hospital as an in-patient in the reference year.
- Finally, we include a dummy variable representing long term sick or disabled, and four
other variables for employment status (Self employed, In paid employment, Unemployed,
Retired). The category ’Other’ (that includes looking after family or home, on maternity
leave, on a government training scheme, full-time student/at school, and something else)
is left as the reference category.
Table 1: Number of individuals that report each category of SAH by number of times it
is reported.

Number      Excellent or good        Fair                  Poor or very poor
of times    Freq.        %           Freq. (N)     %       Freq. (N)     %
0            273        4.28         2076        32.56     4380        68.71
1            170        2.67         1114        17.47      898        14.09
2            182        2.85          867        13.60      367         5.76
3            193        3.03          641        10.05      213         3.34
4            233        3.65          481         7.55      137         2.15
5            273        4.28          376         5.90       99         1.55
6            379        5.95          279         4.38       79         1.24
7            456        7.15          204         3.20       46         0.72
8            665       10.43          145         2.27       47         0.74
9            563        8.83           83         1.30       33         0.52
10           533        8.36           61         0.96       32         0.50
11           495        7.76           19         0.30       16         0.25
12           544        8.53           20         0.31        8         0.13
13           672       10.54            5         0.08        9         0.14
14           744       11.67            4         0.06       11         0.17
Total       6375      100.00         6375       100.00     6375       100.00
For example, 273 in the column Freq. of category ‘Excellent or good’ is the number of
individuals that reported ‘Excellent or good’ 0 times in total over the sample period they
are observed.
Descriptive Statistics. Tables 1, 2 and 3 contain some descriptive statistics of self-assessed health reported in our sample. The most frequent category is excellent or good,
with more than 70% of the answers corresponding to this category. There is high persistence in reported SAH, as can be seen in table 3, which shows the transition probabilities.
In this table, the largest numbers are on the diagonal for all three values of $SAH_{t-1}$. Table
2 presents the variation of SAH across different characteristics and health variables. For
example, married or single people respond in the excellent or good health category more
frequently than widowed or divorced people. The three objective health measures in table
2 alter the SAH responses in the expected direction and in greater magnitude than the
socioeconomic variables also presented in the table.
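Sample transition probabilities of this kind can be computed from individual SAH sequences as in the following Python sketch. It is an illustration under assumptions: the function name and the input layout (one list of three-category codes per individual, covering consecutive waves only) are hypothetical, and sequences that span a gap such as the omitted wave 9 would have to be split before being passed in.

```python
from collections import Counter


def transition_probabilities(paths):
    """Row percentages of SAH in t given SAH in t-1.

    paths : list of per-individual lists of SAH codes (-1, 0, 1) observed
            in consecutive waves; each adjacent pair counts as one transition.
    Returns {previous_state: {current_state: percentage}} for observed rows.
    """
    counts = Counter()
    for path in paths:
        for previous, current in zip(path, path[1:]):
            counts[(previous, current)] += 1

    states = (1, 0, -1)
    table = {}
    for previous in states:
        row_total = sum(counts[(previous, current)] for current in states)
        if row_total > 0:  # skip origin states that never occur
            table[previous] = {
                current: 100.0 * counts[(previous, current)] / row_total
                for current in states
            }
    return table


if __name__ == "__main__":
    print(transition_probabilities([[1, 1, 0], [0, 1, 1]]))
```

High values on the diagonal of the resulting table correspond to the raw persistence discussed above; on their own they cannot distinguish state dependence from permanent unobserved heterogeneity.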
2.3 Relation to recent papers studying heterogeneity and state dependence in SAH
2.3.1 Relation to Contoyannis, Jones and Rice (2004)
There is a clear connection between this paper and Contoyannis, Jones and Rice (2004):
both papers use the British Household Panel Survey to study the dynamics of SAH. Nev-
Table 2: Proportion (in %) of each category of SAH by several characteristics

Characteristics and their                       SAH categories
Sample Proportions                Excellent or good    Fair     Poor or very poor
All                                     73.19          19.39          7.42
By age group
  40.17   < 40                          78.31          16.50          5.19
  43.92   40-64                         72.92          18.91          8.17
  15.91   65+                           61.02          28.02         10.96
By sex
  46.84   Male                          75.35          18.32          6.34
  53.16   Female                        71.29          20.34          8.37
By marital status
  63.46   Married                       74.00          18.86          7.14
  8.92    Divorced                      69.63          19.29         11.08
  6.32    Widowed                       58.84          28.92         12.25
  21.3    Single                        76.52          18.20          5.28
By household size
  13.30   1                             65.57          23.82         10.62
  34.32   2                             71.67          20.51          7.83
  20.20   3                             74.33          18.44          7.24
  21.63   4                             78.30          16.50          5.20
  10.55   5+                            75.10          17.97          6.93
By kids number
  64.12   0                             70.91          20.84          8.25
  15.52   1                             76.70          17.05          6.25
  14.73   2                             78.45          16.19          5.36
  5.63    3+                            75.75          17.74          6.51
Health problems
  58.46   Yes                           60.57          27.26         12.16
  41.54   No                            90.95           8.32          0.74
Health limits daily activities
  13.36   Yes                           22.49          39.13         38.38
  86.64   No                            81.01          16.35          2.64
Health limits work
  16.43   Yes                           29.85          38.29         31.86
  83.57   No                            81.71          15.68          2.61
Table 3: Sample transition probabilities from SAH in t-1 to SAH in t

                                          SAH in t
SAH in t-1            Excellent or good    Fair     Poor or very poor    Total
Excellent or good           85.91          11.84          2.25            100
Fair                        43.22          45.18         11.59            100
Poor or very poor           17.66          31.60         50.74            100
Proportion                  72.80          19.67          7.53            100
ertheless, there are several aspects considered in Contoyannis, Jones and Rice (2004) that
are not studied here. In particular, that paper contains a more detailed data description,
and a further discussion of the estimated model; it also addresses other issues, like sample
attrition, that are not considered here.7 However, our paper complements and adds to
Contoyannis, Jones and Rice (2004) in various ways:
(i) We use more periods from the BHPS than they do. They only use the first eight
waves because the ninth contains a different question about and categorization of
SAH. While we drop the 9th wave too, we incorporate the waves after the 9th in
our estimation. Since the model specified includes only one lag of $h_{it}$, we have all
the variables we need for the 11th to 16th waves. For the 10th wave we have all the
variables but $h_{i,t-1}$, as is the case for the first wave. We treat the 10th wave like
an initial observation and condition it out in our likelihood, leaving the probability
of that observation totally unrestricted. Contoyannis, Jones and Rice (2004) cannot
do this because of their way of solving the initial conditions problem and their use of
random effects.
(ii) In our model we have two individual specific effects: one in the linear index and one in
the cut points. Lindeboom and van Doorslaer (2004) tested for and found clear evidence of different reporting behavior (cut-point shifting) for gender and age. Given
that Contoyannis, Jones and Rice (2004) impose homogeneous cut points, they estimate different models by gender to allow for that differing reporting behavior, but
they do not allow unrestricted different behavior by age. Although we cannot separately identify both sources of unobserved heterogeneity, our approach is robust to
heterogeneous cut points freely correlated with any determinant of SAH.
(iii) We use fixed effects instead of a random effects approach. The advantages of this are
7 An unbalanced panel (with random attrition) in a dynamic panel model does not pose any complications to a fixed effects estimator (as opposed to a random effects estimator), as long as it does not imply
many individuals with a very small number of periods; and in our sample all individuals have at least
8 periods. However, the assumption of attrition at random seems unrealistic. Contoyannis, Jones and
Rice (2004) made a test and found evidence of non-random attrition, but they also found that the bias
this may be causing to the estimates is negligible. Given this result using the same data set as us, and
since this problem would take us too far from the main theme of this paper, we do not consider it here.
that no arbitrary restriction is imposed on the correlation between permanent unobserved heterogeneity and the observable variables, and there is no initial conditions
problem.
(iv) As an additional complement, our study includes some objective health measures,
so we can see how much is explained by socioeconomic variables and by state dependence even after these measures are included.
Given the aspects not covered in this paper, and in order to assess
the contributions of this paper with respect to the previous literature, we also estimate
our models using the same kind of specification and estimation method as Contoyannis,
Jones and Rice (2004). Thus, we also estimate (2) using a correlated random effects
specification with only an individual effect in the linear index equation (the $\alpha_i$ parameter
in (1)), but with homogeneous cut points. Therefore, in this correlated random effects
specification:
$$h_{it} = \begin{cases} -1 & \text{if } h^*_{it} < c_1 \\ 0 & \text{if } c_1 < h^*_{it} \le c_2 \\ 1 & \text{if } h^*_{it} > c_2 \end{cases} \tag{3}$$
where $c_1$ and $c_2$ are (homogeneous) parameters to be estimated, $h^*_{it}$ is defined in (1), and
$\alpha_i$ in (1) is assumed to be:
$$\alpha_i = \alpha_0 + \alpha_1' h_{i1} + \alpha_2' \bar{x}_i + u_i \tag{4}$$
where $\bar{x}_i$ is the average over the sample period of the exogenous variables, and $u_i \overset{iid}{\sim} N(0, \sigma_u^2)$
independently of everything else. $h_{i1}$ is in (4) to deal with the initial conditions
problem following Wooldridge (2005).
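The individual-level terms that enter a correlated random effects specification such as (4) can be built as in the following Python sketch. It is a minimal illustration, not the estimation code used in the paper: the function names are hypothetical, and coding initial health as two dummies with fair health as the omitted category is an assumption made here for concreteness.

```python
def cre_regressors(x_rows, h_first):
    """Build the individual-level terms of the correlated random effects equation.

    x_rows  : list of covariate vectors, one per observed period
    h_first : SAH code in the initial period, one of -1, 0, 1
    Returns the initial-health dummies followed by the time averages of x.
    """
    n_periods = len(x_rows)
    n_covariates = len(x_rows[0])
    # Time average of each exogenous variable over the sample period.
    x_bar = [sum(row[j] for row in x_rows) / n_periods for j in range(n_covariates)]
    # Assumption of this sketch: dummies for good and poor initial health,
    # with fair health as the omitted category.
    initial = [float(h_first == 1), float(h_first == -1)]
    return initial + x_bar


def cre_alpha_mean(alpha_0, alpha_1, alpha_2, x_rows, h_first):
    """Systematic part of the individual effect, i.e. alpha_i without u_i."""
    terms = cre_regressors(x_rows, h_first)
    coefficients = list(alpha_1) + list(alpha_2)
    return alpha_0 + sum(a * v for a, v in zip(coefficients, terms))


if __name__ == "__main__":
    print(cre_regressors([[1.0, 2.0], [3.0, 4.0]], 1))
```

Any time-constant covariate is perfectly collinear with its own time average in such a specification, which is one way to see why its effect cannot be separated from that of the correlated unobservables without further assumptions.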
2.3.2 Relation to Halliday (2008)
Halliday (2008) studies state dependence and heterogeneity in SAH using data from the
Panel Study of Income Dynamics. Since his focus is on the evolution of health over the
life-cycle, he only considers age as explanatory variable. No other socio-economic variable
is included. Another difference with our study is that he further reduces health status
to two categories, estimating a logit instead of an ordered probit with three categories.
With respect to heterogeneity, on the one hand Halliday (2008) is more flexible because
his analysis allows for heterogeneous parameters both in the intercept and in the slope
of the dynamic model. On the other hand, the random effects approach he adopts has
no incidental parameters problem but restricts the distribution of the heterogeneity and
suffers from the initial conditions problem. Nonetheless, Halliday (2008) uses a discrete
finite mixture, which is potentially more flexible and less parametric in its treatment of
heterogeneity and the initial conditions than the distribution assumed in Contoyannis,
Jones and Rice (2004). This greater flexibility comes from the possibility of having many
points of support, which should provide an approximation to a variety of distributions like
asymmetric distributions, or distributions with several modes.
The limitation of this approach is computational. This limitation leads Halliday (2008)
to consider no more than four points of support, even though more points of support might
be needed.8 Four is certainly more than the two points of support assumed in other
applications, but it may not be enough to provide a good approximation to a bivariate
distribution. In this paper we have two individual specific effects potentially correlated
with each other. Such a bivariate joint distribution may be difficult to approximate with
only four points of support. In contrast to these limitations, the fixed effects approach we
follow here is non-parametric in the distribution of the heterogeneity, requires no special
treatment of the initial conditions, and does not have the same computational limitations
as estimating discrete finite distributions.
Finally, Halliday (2008) finds evidence of a great amount of heterogeneity in health,
which is the most important motivation for following the approach we propose here.
2.3.3 Relation to Jones and Schurer (2009)
Another recent paper dealing with self-assessed health measures is Jones and Schurer
(2009). They use the German Socio-Economic Panel and focus on the effect of income
on health, conducting a detailed analysis of the shape of this effect. Their paper does not
consider dynamics in SAH. However, it does address the potential importance of unobserved heterogeneity. They control for heterogeneity in both unobserved health status and
reporting behaviour by estimating a logit model with fixed effects for each of the $J - 1$
threshold values into which SAH can be dichotomized. However, they estimate it using
the Conditional MLE of Chamberlain (1980) because the standard MLE of
ordered-choice models with fixed effects suffers from a severe incidental parameters problem and there was no other solution from the panel data literature ready to be applied
to these models. Implementing a solution to this problem in the estimation of ordered-choice models with fixed effects is one of the main contributions of our paper. This allows
us to estimate marginal effects that properly account for the distribution of unobserved
heterogeneity and, especially, for its correlation with observable variables. The Conditional MLE conditions out the fixed effects and, therefore, no information about them is
recovered. This means that when calculating marginal effects they have to substitute the
fixed effect with a value that may not be representative of the population and that, in any case,
ignores the correlation between the observables and the heterogeneity.
Jones and Schurer (2009) also estimate a random effects model that assumed independence of the heterogeneity. Comparing this with the fixed effects estimates they find
8 See section 5.1.2 in Halliday (2008).
that the underlying assumptions of the statistical model matter for assessing the link between income and health. This finding provides additional support in favor of estimating
a model that makes no assumptions about the distribution of the heterogeneity.
3 Estimation Method
3.1 Estimation problem and possible solutions
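From (1), (2) and the normality assumption about $\varepsilon_{it}$, we have that
$$\Pr(h_{it}=-1 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = 1-\Phi(c_i+\mu_{it})$$
$$\Pr(h_{it}=0 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = \Phi(c_i+\mu_{it})-\Phi(\mu_{it}) \tag{5}$$
$$\Pr(h_{it}=1 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = 1-\Pr(h_{it}=-1 \mid \cdot)-\Pr(h_{it}=0 \mid \cdot) = \Phi(\mu_{it}) \tag{6}$$
where
$$\mu_{it} = \alpha_i + \rho_1 1(h_{i,t-1}=1) + \rho_{-1} 1(h_{i,t-1}=-1) + x_{it}'\beta \tag{7}$$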
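Conditioning on the first observation $h_{i0}$, the log-likelihood is:
$$l(\rho_1,\rho_{-1},\beta,\alpha,c)=\sum_{i=1}^{N}\sum_{t=1}^{T}\Big\{1\{h_{it}=-1\}\log\left[1-\Phi(c_i+\mu_{it})\right]+1\{h_{it}=0\}\log\left[\Phi(c_i+\mu_{it})-\Phi(\mu_{it})\right]+1\{h_{it}=1\}\log\left[\Phi(\mu_{it})\right]\Big\} \tag{8}$$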
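As an illustration of equations (5) to (8), the category probabilities and the log-likelihood contribution of one individual can be sketched in Python as follows. This is only a minimal sketch, not the code used for the estimates in this paper: the function names and argument names are hypothetical, the index fixed effect is called `alpha` purely for illustration, and the standard normal distribution function is computed with the error function from the standard library.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function Phi(z), computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(mu, c):
    """Probabilities of h = -1, 0, 1 given the index mu and the cut-point effect c."""
    p_low = 1.0 - norm_cdf(c + mu)             # Pr(h = -1)
    p_mid = norm_cdf(c + mu) - norm_cdf(mu)    # Pr(h = 0), equation (5)
    p_high = norm_cdf(mu)                      # Pr(h = 1), equation (6)
    return {-1: p_low, 0: p_mid, 1: p_high}

def individual_loglik(h, x, h0, alpha, c, rho_pos, rho_neg, beta):
    """Log-likelihood contribution of one individual, conditional on h0.

    h: outcomes in {-1, 0, 1} for t = 1..T; x: one covariate list per period.
    All names are hypothetical and used for illustration only.
    """
    loglik, h_prev = 0.0, h0
    for h_t, x_t in zip(h, x):
        # Linear index of equation (7), with dummies for the lagged state.
        mu = (alpha + rho_pos * (h_prev == 1) + rho_neg * (h_prev == -1)
              + sum(b * v for b, v in zip(beta, x_t)))
        loglik += math.log(category_probs(mu, c)[h_t])
        h_prev = h_t
    return loglik
```

Summing `individual_loglik` over individuals gives the objective in (8); the incidental parameters problem discussed next arises because `alpha` and `c` must be estimated for each individual from only T observations.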
The standard MLE of models like (2) is known to be biased when, as in our case, the number of periods is not large. The MLE is inconsistent when T does not go to infinity because the fixed effects act as incidental parameters. Furthermore, existing Monte Carlo experiments with dynamic nonlinear models show that the MLE has large bias. In fact, simulations of a dynamic ordered probit in Bester and Hansen (2009) and the simulations in the following sections show that the bias is non-negligible even with T as large as 20. As mentioned in the introduction, several recently developed bias-correction methods could overcome this problem. Arellano and Hahn (2007) summarize the different approaches.
The methods can be grouped into three approaches based on the object that is corrected. The first approach is to construct an analytical or numerical bias correction of a fixed effects estimator. Fernandez-Val (2009), among others, takes this approach and applies his analytical bias correction to dynamic binary choice models. The second approach is to correct the bias in the moment equations. An example is Carro (2007), who uses an estimator of this type to correct the bias in dynamic binary choice models. The third approach is to correct the objective function. Arellano and Hahn (2006) and Bester and Hansen (2009) take this approach, with the latter including an application to a dynamic ordered probit model. The HS-penalty estimator studied in Bester and Hansen (2009) is the first option we consider because our model is also a dynamic ordered probit, and because the alternative approaches require transformations or derivations. This estimator also has the advantage of being easier to compute than the Modified MLE in Carro (2007) and the bias correction in Fernandez-Val (2009), because the HS does not require the calculation of expectations and the other two do. This advantage is more relevant in our case, because our model has two fixed effects.
Arellano and Hahn (2007) show how the different approaches are related. Asymptotically, all the approaches reduce the order of the bias of the MLE from the standard $O(T^{-1})$ to $O(T^{-2})$ for the general classes of models for which they were developed. However, there may be differences when they are applied to specific cases. The following very simple example, used in Carro (2007), Arellano and Hahn (2007), and Bester and Hansen (2009), illustrates this point. Consider the model where $y_{it} \overset{iid}{\sim} N(\alpha_i, \sigma_0^2)$. The ML estimator of $\sigma_0^2$ is $\hat\sigma^2_{MLE} = \frac{1}{NT}\sum_i\sum_t (y_{it}-\hat\alpha_i)^2$. It is well known that $\hat\sigma^2_{MLE}$ is not a consistent estimator of $\sigma_0^2$ when $N \to \infty$ with fixed $T$, since it converges to $\frac{T-1}{T}\sigma_0^2$. In this case the whole problem is very easy to fix: $\frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it}-\hat\alpha_i)^2$ is the fixed-$T$ consistent estimator of $\sigma_0^2$. The MMLE from Carro (2007) produces this very same estimator, correcting not only the $O(T^{-1})$ term of the bias but all the asymptotic bias in this special example. The HS removes the $O(T^{-1})$ term of the bias, but it does not attain the fixed-$T$ consistent estimator. The one-step bias correction to the ML estimator from Fernandez-Val (2009) does not produce a fixed-$T$ consistent estimator either, but its iterated form does. Thus, differences may appear between the different approaches when applied to specific models.
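The simple example above can be checked numerically. The following Python sketch simulates the normal model with individual-specific means and compares the ML estimator of the variance with the degrees-of-freedom corrected one. It is an illustration only: the function name, the seed, and the choice of drawing the individual means from a standard normal are assumptions made for this example.

```python
import random

def variance_estimators(n, t, sigma2=1.0, seed=0):
    """Return (MLE, fixed-T consistent estimator) of the variance.

    The individual means are drawn from N(0, 1); this choice is arbitrary
    (hypothetical) because both estimators are invariant to the means.
    """
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    ssr = 0.0
    for _ in range(n):
        mean_i = rng.gauss(0.0, 1.0)
        y = [rng.gauss(mean_i, sd) for _ in range(t)]
        y_bar = sum(y) / t                      # MLE of the individual mean
        ssr += sum((v - y_bar) ** 2 for v in y)
    mle = ssr / (n * t)                         # converges to (T-1)/T * sigma2
    corrected = ssr / (n * (t - 1))             # consistent for fixed T
    return mle, corrected

if __name__ == "__main__":
    for periods in (4, 8, 20):
        mle, corrected = variance_estimators(n=5000, t=periods)
        print(periods, round(mle, 3), round(corrected, 3))
```

With a true variance of one, the first estimate should be close to $(T-1)/T$ for each $T$, while the second should stay close to one, which is the pattern described in the text.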
On the other hand, the incidental parameters problem can be seen as a finite sample bias problem in a panel data context. The problem is not important when T is large relative to N. However, since our panel does not have a large number of periods, it is reasonable to ask whether the excellent asymptotic properties of the MLE when T goes to infinity (sufficiently fast) are a good approximation in our finite sample. Simulations show that we would need panels with many more time periods than are usually found in practice. The relevant implication is that we have to examine the finite sample performance of the estimators for our model and sample sizes. For the methods considered here, this is done through Monte Carlo experiments. Bester and Hansen (2009) do not compare the finite sample properties of the method they use with others for the ordered probit case because many of the other methods require some derivation to get the specific correction for this case. However, they make such a comparison using binary choice (probit and logit) models. Also, Carro (2007) and Fernandez-Val (2009) conduct Monte Carlo experiments for logit and probit models with different sample sizes (both in T and N), allowing us to compare a wide range of methods for those models. From these comparisons we can conclude that the HS penalty approach is not the best one and that, for sample sizes with T smaller than 13, the remaining bias can still be significant. Given this result, we consider another of the proposed methods to estimate our ordered probit and evaluate its finite sample properties. Interesting candidates are the corrections discussed by Fernandez-Val (2009) and Carro (2007), since they are equally superior to other alternatives in finite sample performance in the relevant existing comparisons. In the next subsections we derive explicit formulas of the modified MLE used in Carro (2007) for the model considered here, evaluate its finite sample performance, and compare it with the HS penalty estimator.
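3.2 MMLE for a dynamic ordered probit with two fixed effects
The model we want to estimate is defined in (1) and (2), and its log-likelihood is (8). Let $\boldsymbol{\gamma} = (\beta, \rho_1, \rho_{-1})$ and $\boldsymbol{\eta}_i = (\alpha_i, c_i)$. Partial derivatives are denoted by the letter d, so the first order conditions are $d_{\boldsymbol{\gamma} i}(\boldsymbol{\gamma}, \boldsymbol{\eta}_i) \equiv \frac{\partial l_i(\boldsymbol{\gamma}, \boldsymbol{\eta}_i)}{\partial \boldsymbol{\gamma}}$ and $d_{\boldsymbol{\eta} i}(\boldsymbol{\gamma}, \boldsymbol{\eta}_i) \equiv \frac{\partial l_i(\boldsymbol{\gamma}, \boldsymbol{\eta}_i)}{\partial \boldsymbol{\eta}_i}$. Bold letters represent vectors.
The MLE of $\boldsymbol{\eta}_i$ for given $\boldsymbol{\gamma}$, $\hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})$, solves $d_{\boldsymbol{\eta} i}(\boldsymbol{\gamma}, \boldsymbol{\eta}_i) = 0$. The MLE of $\boldsymbol{\gamma}$ is obtained by maximizing the concentrated log-likelihood $\left(\sum_{i=1}^{N} l_i(\boldsymbol{\gamma}, \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma}))\right)$, i.e. by solving the following first order condition:
$$\frac{1}{TN}\sum_{i=1}^{N} d_{\boldsymbol{\gamma} i}(\boldsymbol{\gamma}, \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})) = 0 \tag{9}$$
where $d_{\boldsymbol{\gamma} i}(\boldsymbol{\gamma}, \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})) = \left.\frac{\partial l_i(\boldsymbol{\gamma}, \boldsymbol{\eta}_i)}{\partial \boldsymbol{\gamma}}\right|_{\boldsymbol{\eta}_i = \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})}$.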
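To reduce the bias of the estimation, we follow Carro (2007) in modifying the score of the concentrated log-likelihood by adding a term that removes the first order term of the asymptotic bias in T. By doing so, we get that the MMLE of the parameters of model (2) is the value that solves the following score equation:
$$
\begin{aligned}
d_{M\boldsymbol{\gamma} i}(\boldsymbol{\gamma}) ={}& d_{\boldsymbol{\gamma} i}(\boldsymbol{\gamma}, \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})) - \frac{1}{2}\,\frac{1}{d_{\alpha\alpha i}\, d_{cci} - d_{\alpha ci}^{2}} \Bigg[ d_{cci}\left( d_{\boldsymbol{\gamma}\alpha\alpha i} + d_{\alpha\alpha\alpha i}\frac{\partial \hat\alpha_i}{\partial \boldsymbol{\gamma}} + d_{\alpha\alpha ci}\frac{\partial \hat c_i}{\partial \boldsymbol{\gamma}} \right) \\
& - 2 d_{\alpha ci}\left( d_{\boldsymbol{\gamma}\alpha ci} + d_{\alpha\alpha ci}\frac{\partial \hat\alpha_i}{\partial \boldsymbol{\gamma}} + d_{\alpha cci}\frac{\partial \hat c_i}{\partial \boldsymbol{\gamma}} \right) + d_{\alpha\alpha i}\left( d_{\boldsymbol{\gamma} cci} + d_{\alpha cci}\frac{\partial \hat\alpha_i}{\partial \boldsymbol{\gamma}} + d_{ccci}\frac{\partial \hat c_i}{\partial \boldsymbol{\gamma}} \right) \Bigg]_{\boldsymbol{\eta}_i = \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})} \\
& - \left.\frac{\partial}{\partial \alpha_i}\left( \frac{E(d_{\alpha ci})E(d_{\boldsymbol{\gamma} ci}) - E(d_{cci})E(d_{\boldsymbol{\gamma}\alpha i})}{E(d_{\alpha\alpha i})E(d_{cci}) - [E(d_{\alpha ci})]^{2}} \right)\right|_{\boldsymbol{\eta}_i = \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})} \\
& - \left.\frac{\partial}{\partial c_i}\left( \frac{E(d_{\boldsymbol{\gamma}\alpha i})E(d_{\alpha ci}) - E(d_{\alpha\alpha i})E(d_{\boldsymbol{\gamma} ci})}{E(d_{\alpha\alpha i})E(d_{cci}) - [E(d_{\alpha ci})]^{2}} \right)\right|_{\boldsymbol{\eta}_i = \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma})} = 0
\end{aligned}
\tag{10}
$$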
where $d_{\boldsymbol{\gamma} i}(\boldsymbol{\gamma}, \hat{\boldsymbol{\eta}}_i(\boldsymbol{\gamma}))$ is the standard first order condition from the concentrated log-likelihood, as in (9), $d_{\boldsymbol{\gamma} ci} = \frac{\partial^2 l_i}{\partial \boldsymbol{\gamma}\, \partial c_i}$, $d_{\alpha\alpha i} = \frac{\partial^2 l_i}{\partial \alpha_i^2}$, $d_{\boldsymbol{\gamma}\alpha ci} = \frac{\partial^3 l_i}{\partial \boldsymbol{\gamma}\, \partial c_i\, \partial \alpha_i}$, and so on. From the first order conditions of $\alpha_i$ and $c_i$ we obtain $\hat\alpha_i(\boldsymbol{\gamma})$ and $\hat c_i(\boldsymbol{\gamma})$, as is done in order to concentrate the log-likelihood. All expectations are conditional on the same set of information as the likelihood. These expectations can be computed by conditioning recursively, as we do to write the conditional likelihood. The parametric model (equations (1), (2) and the assumption about $\varepsilon_{it}$) from which we write the likelihood also gives the parametric form of the expectations we need to calculate.9
We show in Appendix A how this modification of the score of the concentrated log-likelihood in (10) is a first order adjustment on the asymptotic bias of the ML score, so the first order condition is more nearly unbiased and the order of the bias of the estimator is reduced from $O(T^{-1})$ to $O(T^{-2})$. Furthermore, the bias is corrected without changing the asymptotic variance of the MLE.
3.3 Simulations
3.3.1 First DGP: Performance for different T
We simulate the model in equations (1) and (2) with the following parameter values and Data Generating Process (DGP): $\beta = 1$, $\rho_1 = 0.5$, and $\rho_{-1} = -0.5$. The error follows a normal distribution: $\varepsilon_{it} \sim N(0, 1)$. The fixed effects are constructed as follows:
$$\alpha_i = \frac{1}{2}\sum_{t=1}^{4} x_{it} + u_i, \quad \text{where } u_i \sim N(x_{i0}, 1) \tag{11}$$
$$c_i = |z_i|, \quad \text{where } z_i \sim N(x_{i0}, 1) \tag{12}$$
so that they are correlated with the explanatory variables. This correlation of the unobserved heterogeneity with the covariates makes the problem more severe than in the independence case. We study the performance of the estimators under this condition as we consider it to be more realistic.10 $x_{it}$ follows a Gaussian AR(1) with autoregressive parameter equal to 0.5. Initial conditions are $x_{i0} \sim N(0, 1)$ and $h_{i0} = \alpha_i + \beta_0 x_{i0} + \varepsilon_{i0}$. We perform 1000 replications, with a population of N = 250 individuals. For each simulation we compute the MLE, the MMLE given by equation (10) and the HS estimator defined in Bester and Hansen (2009). The HS estimator is the value of the parameters that maximizes the following penalized objective function:
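$$\sum_{i=1}^{N} l_i(\beta, \rho_1, \rho_{-1}, \alpha_i, c_i) - \sum_{i=1}^{N}\left[\frac{1}{2}\,\mathrm{trace}\left(\hat I_{\alpha c i}^{-1}\, \hat V_{\alpha c i}\right) - \frac{k}{2}\right] \tag{13}$$
where $l_i$ is the log-likelihood of individual $i$, $\hat I_{\alpha c i}$ is the sample information matrix for $e_i = (\alpha_i, c_i)'$, $\hat V_{\alpha c i}$ is a HAC estimator of $Var\left(\frac{1}{\sqrt{T}}\frac{\partial l_i}{\partial e_i}\right)$, and $k = \dim(e_i)$. This penalty term is easier to calculate than the modification of the score in (10) because the penalty does not involve any expectation.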
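The data generating process described at the start of this subsection can be sketched as follows. This is only a hedged illustration and not the simulation code behind the tables. Several details are assumptions made here because the text does not fix them: the AR(1) innovations are taken to be N(0, 1), the coefficient on $x_{i0}$ in the initial condition is set to one, the initial condition is categorized with the same thresholds as later periods, and the thresholds ($-c_i$ and 0) are the ones implied by the probabilities in (5) and (6). All function and variable names are hypothetical.

```python
import random

def categorize(latent, c):
    """Map the latent index to {-1, 0, 1} using thresholds -c and 0."""
    if latent > 0.0:
        return 1
    if latent < -c:
        return -1
    return 0

def simulate_panel(n=250, t_periods=8, beta=1.0, rho_pos=0.5, rho_neg=-0.5,
                   beta0=1.0, seed=0):
    """Simulate one panel; requires t_periods >= 4 because of equation (11)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0)]                        # x_i0 ~ N(0, 1)
        for _ in range(t_periods):
            # Gaussian AR(1) with coefficient 0.5; unit-variance shocks are assumed.
            x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))
        alpha = 0.5 * sum(x[1:5]) + rng.gauss(x[0], 1.0)  # equation (11)
        c = abs(rng.gauss(x[0], 1.0))                     # equation (12)
        # Initial condition; beta0 = 1 is an assumption of this sketch.
        h = [categorize(alpha + beta0 * x[0] + rng.gauss(0.0, 1.0), c)]
        for t in range(1, t_periods + 1):
            mu = (alpha + rho_pos * (h[-1] == 1) + rho_neg * (h[-1] == -1)
                  + beta * x[t])
            h.append(categorize(mu + rng.gauss(0.0, 1.0), c))
        panel.append({"x": x, "h": h, "alpha": alpha, "c": c})
    return panel
```

Each element of the returned list holds the covariate path, the outcome path (including the initial observation), and the two fixed effects of one individual, so the fixed effects are correlated with the covariates through $x_{i0}$ and the first four periods.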
9 Appendix B gives some indications about computing the MMLE.
10 In the simulations of an ordered probit in Bester and Hansen (2009) the fixed effects are independent of the covariates. We have simulated and compared MMLE and HS in this case too. As said, the bias is smaller for all T, but the conclusions from the comparison between MMLE and HS are the same as in the dependence case. Since the latter is more relevant in practice we do not report the independence case.
Table 4: Monte Carlo Results. Dynamic Ordered Probit parameters

Parameter              $\beta$               $\rho_1$              $\rho_{-1}$
True value             1                     0.5                   -0.5
Estimator        Mean Bias    RMSE     Mean Bias    RMSE     Mean Bias    RMSE
T = 4
  MLE              0.816      0.828      0.474      0.516      0.551      0.586
  HS               0.796      0.809      0.392      0.443      0.467      0.509
  MMLE             0.172      0.182      0.254      0.282      0.280      0.305
T = 8
  MLE              0.335      0.341      0.188      0.216      0.189      0.216
  HS               0.247      0.254      0.115      0.153      0.119      0.154
  MMLE             0.073      0.086      0.062      0.108      0.067      0.109
T = 10
  MLE              0.257      0.263      0.145      0.171      0.154      0.179
  HS               0.170      0.178      0.083      0.119      0.093      0.127
  MMLE             0.052      0.067      0.036      0.086      0.050      0.093
T = 12
  MLE              0.210      0.215      0.127      0.152      0.127      0.151
  HS               0.127      0.134      0.072      0.106      0.074      0.106
  MMLE             0.040      0.054      0.030      0.079      0.036      0.081
T = 16
  MLE              0.154      0.159      0.093      0.118      0.096      0.119
  HS               0.081      0.088      0.048      0.083      0.054      0.085
  MMLE             0.026      0.041      0.017      0.068      0.022      0.069
T = 20
  MLE              0.122      0.127      0.072      0.095      0.078      0.101
  HS               0.058      0.065      0.034      0.067      0.042      0.074
  MMLE             0.019      0.034      0.009      0.058      0.016      0.062

Note: See a detailed description of the model simulated and other characteristics of the DGP in subsection 3.3.
Results from this experiment for different T are reported in Table 4, which shows the mean bias and the Root Mean Squared Error (RMSE). We find that for all T, the MMLE performs much better than the other two estimators. Comparing it with the HS, the differences are greatest for T = 4 and T = 8, where the HS is closer to the MLE than to the MMLE. When using the MMLE, the bias is smaller than 10% of the true value with T = 10 for all but one of the parameters. With T = 12 the bias of the MMLE is already negligible, whereas the HS has biases and RMSE larger than those of the MMLE with T = 10. Even with T = 16 the HS exhibits mean biases greater than those of the MMLE with T = 10. It is not until T = 20 that the HS has small biases and RMSE. So the HS needs more periods (more than 16) to have small finite sample biases. Given this and the fact that the sample sizes we have in our empirical analysis are smaller than T = 14, we use the MMLE.
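The statement about the relative size of the MMLE bias with T = 10 can be checked with a few lines of arithmetic. The sketch below uses the mean bias magnitudes reported for the MMLE in Table 4 and expresses them as a share of the absolute true values; the dictionary keys are hypothetical labels chosen for this example.

```python
# Magnitudes of the MMLE mean bias with T = 10, as reported in Table 4.
true_values = {"beta": 1.0, "rho_1": 0.5, "rho_-1": -0.5}
mmle_bias_t10 = {"beta": 0.052, "rho_1": 0.036, "rho_-1": 0.050}

# Bias as a share of the absolute true value of each parameter.
relative_bias = {
    name: abs(mmle_bias_t10[name]) / abs(true_values[name])
    for name in true_values
}

# Parameters whose bias is strictly below 10% of the true value.
below_10_percent = [name for name, share in relative_bias.items() if share < 0.10]

for name, share in relative_bias.items():
    print(f"{name}: {100 * share:.1f}% of the true value")
```

Two of the three parameters fall strictly below the 10% mark, and the remaining one sits at it, which matches the description "for all but one of the parameters".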
The reason for this better performance of the MMLE is that it uses the specific structure of the model we want to estimate, which translates into the expectations in the modification term. The likelihood includes the fact that we know the distribution of one of the explanatory variables: the lag of the dependent variable. This distribution is, of course, that of the dependent variable in the previous period. Therefore, we write the likelihood recursively for each period (conditional on the previous period) up to the likelihood of the initial condition. This is used in the modification, so it includes expectations that use the known distribution of $h_{it-1}$ conditional on $h_{it-2}$. The HS is written in general terms, so it does not make any intensive use of a specific likelihood and it does not include such expectations. Therefore the HS does not exploit all the information that our specification provides and it requires more periods to attain the same performance as the MMLE. This confirms the idea expressed in Bester and Hansen (2009) that the simplicity of the HS (due to not having to calculate expectations) may not be free and could lead to a worse performance than other approaches.
3.3.2 Quality of inference
We consider the quality of inference in finite samples based on these estimators. Table 5 presents the coverage of 95% confidence intervals and the estimated asymptotic standard errors divided by the standard deviation. The latter is very close to 1 in all cases for the MMLE and in most cases for the other estimators, which indicates that the variance is estimated well and the problem is the bias. This corresponds with the fact that we are correcting a bias without altering the asymptotic variance. With respect to inference, the coverage of the confidence intervals is extremely poor for the MLE, especially for $\beta$. Even with T = 20, the coverage for $\beta$ is smaller than 3%. The HS estimator improves inference with respect to the MLE, but it is still too far from the theoretical coverage of 95%, and the coverage for $\beta$ is especially bad even with T = 20. As with the bias and RMSE criteria, the MMLE is clearly the best of these three estimators for doing inference, for all periods and parameters.
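The two inference measures discussed above can be computed from Monte Carlo output as in the following sketch. It is an illustration under stated assumptions rather than the code behind Table 5: the function names are hypothetical, and the toy replications use normally distributed estimates with a known standard error so that only the bias differs between the two cases.

```python
import random
import statistics

def inference_quality(estimates, std_errors, true_value, z=1.96):
    """Coverage of nominal 95% intervals and the mean SE divided by the
    standard deviation of the estimates across replications."""
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    coverage = covered / len(estimates)
    se_over_sd = statistics.mean(std_errors) / statistics.stdev(estimates)
    return coverage, se_over_sd

def toy_replications(bias, sd=0.05, reps=5000, true_value=1.0, seed=0):
    """Hypothetical replications: estimates are N(true_value + bias, sd^2)
    and the reported standard error equals sd in every replication."""
    rng = random.Random(seed)
    estimates = [true_value + bias + rng.gauss(0.0, sd) for _ in range(reps)]
    std_errors = [sd] * reps
    return estimates, std_errors

if __name__ == "__main__":
    for bias in (0.0, 0.10):  # no bias, and a bias of two standard deviations
        est, ses = toy_replications(bias)
        print(bias, inference_quality(est, ses, true_value=1.0))
```

In this toy setting the ratio SE/SD stays near one in both cases while coverage falls well below 95% once the bias is large relative to the standard deviation, which is the pattern the text attributes to the MLE and the HS.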
Table 5: Monte Carlo Results. Inference on Dynamic Ordered Probit parameters: confidence interval coverage and estimation of the standard error.

Parameter            $\rho_1$               $\rho_{-1}$             $\beta$
True value           0.5                    -0.5                    1
                % Coverage             % Coverage              % Coverage
Estimator       C.I. 95%    SE/SD      C.I. 95%    SE/SD       C.I. 95%    SE/SD
T = 8
  MLE             47%       0.87         48%       0.90          0%        0.85
  HS              74%       0.91         73%       0.94          0%        0.86
  MMLE            87%       0.93         85%       0.96          64%       1.02
T = 10
  MLE             54%       0.91         53%       0.91          0%        0.81
  HS              82%       0.96         78%       0.95          3.5%      0.83
  MMLE            90%       0.96         89%       0.96          74%       0.94
T = 12
  MLE             58%       0.91         62%       0.93          0%        0.89
  HS              85%       0.96         83%       0.98          8.8%      0.92
  MMLE            92%       0.95         92%       0.97          81%       1.00
T = 16
  MLE             69%       0.91         68%       0.94          0%        0.92
  HS              88%       0.96         88%       0.99          29%       0.95
  MMLE            93%       0.94         93%       0.96          88%       1.00
T = 20
  MLE             77%       0.96         73%       0.94          2%        0.90
  HS              91%       1            88%       0.98          48%       0.93
  MMLE            95%       0.98         93%       0.95          90%       0.97

Note: This is for the simulation experiment in Table 4. We have used the inverse of the Hessian as estimator of the variance.

3.3.3 Performance for different degrees of persistence
To check whether the results are maintained under different state dependence scenarios, we present simulations for different values of $\rho_1$ and $\rho_{-1}$, with T = 10, in Table 6. The DGP is the same as that of Table 4 except for the values of $\rho_1$ and $\rho_{-1}$. Here the state dependence changes from very negative to very positive, including the case with no state dependence. In terms of bias and RMSE, we find that the MMLE performs better than the other methods in all cases. In principle, having a more negative state dependence may improve all the estimators since it induces higher variance in $h_{it}$. This is the case for the estimation of $\beta$, where the three estimation methods improve, but it is not the case for the estimation of $\rho_1$ and $\rho_{-1}$, where the MMLE improves but the MLE and HS get worse.
3.3.4 Simulations based on real data
Finally, we perform a simulation based on the real data used in this paper. This provides further evidence about the finite sample performance of the MMLE and makes our choice of estimator more robust. The DGP takes the estimates obtained by MMLE and reported in Table 8 as the true model. It takes the real data for all the individuals used in that estimation and all the significant x variables except the time dummies. This means that in this DGP $x_{it}$ is a vector containing observations of the following variables: age, squared age, household size, number of kids, and income. The true values of the parameters are: $\rho_1 = 0.4875$, $\rho_{-1} = -0.4375$, $\beta' = (0.0205, -0.0005, -0.0388, 0.0472, 0.0396)$. N = 1739, T is the same as in our data (i.e. between 8 and 14 periods), and $\varepsilon_{it} \sim N(0, 1)$.
Table 6: Monte Carlo Results. Dynamic Ordered Probit parameters with different degrees of state dependence

Parameter              $\beta$               $\rho_1$              $\rho_{-1}$
Estimator        Mean Bias    RMSE     Mean Bias    RMSE     Mean Bias    RMSE
True value             1                     -1                    1
  MLE              0.204      0.212      0.264      0.284      0.244      0.265
  HS               0.105      0.116      0.094      0.136      0.087      0.130
  MMLE             0.012      0.044      0.008      0.089      0.003      0.086
True value             1                     -0.5                  0.5
  MLE              0.212      0.218      0.214      0.235      0.206      0.227
  HS               0.116      0.126      0.079      0.119      0.078      0.119
  MMLE             0.026      0.048      0.018      0.083      0.018      0.083
True value             1                     0                     0
  MLE              0.227      0.233      0.180      0.201      0.180      0.201
  HS               0.136      0.144      0.079      0.116      0.084      0.119
  MMLE             0.037      0.055      0.028      0.082      0.032      0.084
True value             1                     0.5                   -0.5
  MLE              0.257      0.263      0.145      0.171      0.154      0.179
  HS               0.170      0.178      0.083      0.119      0.093      0.127
  MMLE             0.052      0.067      0.036      0.086      0.050      0.093
True value             1                     1                     -1
  MLE              0.297      0.303      0.105      0.144      0.111      0.148
  HS               0.215      0.222      0.086      0.126      0.091      0.129
  MMLE             0.065      0.078      0.057      0.100      0.069      0.107

Note: 1000 Monte Carlo simulations of the Ordered Probit model in equations (1) and (2), following the same DGP as in Table 4 (described at the beginning of section 3.3), but changing the value of the state dependence parameters from negative to positive, including the case with no state dependence. T = 10.
$\alpha_i$ and $c_i$ are the estimates of these parameters by MMLE. The distributions of these two parameters can be seen in Figure 1. The distribution of $\alpha_i$ is not normal and is correlated with $c_i$ (the correlation coefficient between $\alpha_i$ and $c_i$ is -0.33). Thus, the distribution of unobserved heterogeneity is not an arbitrary and statistically convenient distribution, but an empirically founded distribution that captures both real correlations with the covariates and correlations between fixed effects. These correlations and distributions of $\alpha_i$ and $c_i$ are richer than those in the previous simulation experiments. Furthermore, this is the relevant DGP to compare the proposed strategy for dealing with unobserved heterogeneity with the random effects approach previously used in the literature. Making this comparison with an arbitrarily chosen DGP may imply an assumption that is too favorable to the random effects, as in our first DGP, or one that is too arbitrarily unfavorable. However, this case is the relevant case for our empirical analysis.
For the reasons discussed, we evaluate the finite sample performance of the random effects approach (CRE) described at the end of section 2.3.1, in addition to the MLE, HS, and MMLE. To make the comparison as close as possible to the estimators used in practice, we include the following time-constant variables as covariates when estimating by random effects: gender, race, and education indicators. These are implicitly included in the DGP through the estimated $\alpha_i$ and $c_i$, since in the fixed effects model these variables cannot be separately identified from the fixed effects.
The results of this simulation are presented in Table 7. The MMLE is clearly the best of all estimators in terms of RMSE. More specifically, the bias and RMSE for the CRE are twice the bias and RMSE of the MMLE for some parameters, like $\rho_{-1}$ and Household Size. As in the previous simulation experiments with a similar number of periods, the MMLE exhibits small biases.

Table 7: Monte Carlo Results. DGP based on the real data used in the empirical analysis.

                                                              Household   Number    Household
              $\rho_{-1}$   $\rho_1$    Age       Age^2       size        of Kids   Income
True value     -0.4375      0.4875     0.0205    -0.0005     -0.0388      0.0472    0.0396
Mean Bias
  CRE           0.0945      0.0459     0.0002     0.00006    -0.0080      0.0095   -0.0003
  MLE           0.2039     -0.1239     0.0061    -0.00016    -0.0078      0.0121    0.0063
  HS            0.1288     -0.0474     0.0044    -0.00010    -0.0049      0.0077    0.0033
  MMLE          0.0437      0.0090     0.0029    -0.00006    -0.0030      0.0046    0.0016
Root Mean Squared Error
  CRE           0.1041      0.0603     0.0265     0.00021     0.0406      0.0512    0.0348
  MLE           0.2066      0.1272     0.0113     0.00018     0.0304      0.0354    0.0257
  HS            0.1326      0.0546     0.0098     0.00013     0.0271      0.0310    0.0234
  MMLE          0.0576      0.0289     0.0086     0.00010     0.0261      0.0292    0.0222

Note: 1000 Monte Carlo simulations. DGP described at the beginning of subsection 3.3.4.
4 Estimation Results
4.1 Main Model
Table 8 presents the coefficient estimates for the main model based on three different estimators. This includes different specifications of the heterogeneity. The first estimated model (column I) is a pooled version of the model in (1) and (2), without individual specific effects. The second estimated model (column II) is the correlated random effects model described in equations (3) and (4). It is similar to the models estimated in Contoyannis, Jones and Rice (2004). It has homogeneous cut-points and uses a random effects approach to control for the individual specific intercept in the linear index. The last specification (column III) is described in previous subsections; it is the model in (1) and (2) treating $\alpha_i$ and $c_i$ as fixed effects. It is estimated by MMLE.
To compare magnitudes of the effects across variables and estimates we look at the relative effects (i.e. ratios of coefficients), and the average and median marginal effects reported in Tables 9 and 10 for the variables with a coefficient significantly different from zero.11,12

Table 8: Estimates, Main model.

                                   I                       II                      III
                                                           Correlated
Variables                          Pooled                  Random Effects          MMLE
Health in t-1: Good                 0.6527*** (0.0185)      0.5028*** (0.0234)      0.4875*** (0.0186)
Health in t-1: Poor                -0.4417*** (0.0233)     -0.3259*** (0.0343)     -0.4375*** (0.0242)
Age                                 0.0011 (0.0032)         0.0200 (0.0210)         0.0205 (0.0222)
Age square                         -0.0000 (0.0000)        -0.0007*** (0.0001)     -0.0005*** (0.0001)
Married                             0.0344 (0.0286)         0.1722 (0.0752)         0.0749 (0.0606)
Separated/Divorced                 -0.0580 (0.0358)         0.0475 (0.1028)         0.0375 (0.0729)
Widowed                            -0.0243 (0.0408)         0.3668** (0.1329)       0.0542 (0.0918)
Household size                     -0.0782*** (0.0138)     -0.0112 (0.0189)        -0.0388** (0.0177)
Number of Kids                      0.0647*** (0.0155)      0.0423 (0.0189)         0.0472** (0.0188)
Household Income                    0.0816*** (0.0122)      0.0188 (0.0191)         0.0396*** (0.0147)
Male                               -0.0095 (0.0175)         0.0116 (0.0265)
Non-white                          -0.0890* (0.0467)       -0.1277* (0.0709)
Higher/1st degree                   0.1540*** (0.0345)      0.1563*** (0.0466)
HND/A level                         0.0810*** (0.0250)      0.0696* (0.1862)
CSE/O level                         0.0860*** (0.0225)      0.0923*** (0.0327)
Cut point 1                         0.0192 (0.1233)        -0.0277*** (0.2265)
Cut point 2                         1.0698*** (0.1235)      1.0528*** (0.2267)
$\sigma_u^2$                                                0.0686
Mean $c_i$                                                                          1.1323
Variance $c_i$                                                                      0.3277
Mean $\alpha_i$                                                                    -0.0743
Variance $\alpha_i$                                                                 0.6311
Correlation($\alpha_i$, $c_i$)                                                     -0.3326
Akaike Information Criterion        38544.0                 37334.3                 37275.2

Standard errors are reported in parentheses. Number of individuals used in estimation of all models is 1739. Estimates of year dummies in all models and within means of variables in random effects are not reported.
* significant at 10%; ** significant at 5%; *** significant at 1%.
Table 9: Average Marginal Effects on Probability of reporting good and poor health for significant variables. Main model.

(a) Good
                              I                     II                     III
                                                    Correlated Random
                         Pooled    St.Err.     Effects    St.Err.     MMLE      St.Err.
Health in t-1: Good       0.2528   0.0071       0.1883    0.0456       0.1653   0.0080
Health in t-1: Poor      -0.1550   0.0078      -0.1149    0.0637      -0.1403   0.0520
Age                      -0.0005   0.0003      -0.0170    0.0055      -0.0089   0.0064
Household size           -0.0282   0.0050      -0.0040    0.0111      -0.0120   0.0054
Number of Kids            0.0233   0.0056       0.0150    0.0149       0.0145   0.0058
Household Income          0.0294   0.0044       0.0067    0.0095       0.0122   0.0045

(b) Poor
                              I                     II                     III
                                                    Correlated Random
                         Pooled    St.Err.     Effects    St.Err.     MMLE      St.Err.
Health in t-1: Good      -0.1399   0.0046      -0.1057    0.2125      -0.0984   0.1153
Health in t-1: Poor       0.1477   0.0081       0.0968    0.1372       0.1268   0.0947
Age                       0.0003   0.0002       0.0105    0.0140       0.0058   0.0117
Household size            0.0173   0.0031       0.0024    0.0072       0.0081   0.0086
Number of Kids           -0.0143   0.0034      -0.0090    0.0171      -0.0095   0.0102
Household Income         -0.0181   0.0027      -0.0040    0.0078      -0.0081   0.0082
11 These marginal effects are also called partial effects. The marginal effects are averaged (or their median is calculated) across the first eight waves of the panel as well as across the values of the covariates for each individual. This means that we first calculate the marginal effect for each individual in the sample at the observed values of the regressors and then we calculate the average (or the median) of them, instead of calculating the marginal effect at the average value of the covariates. We do this in order to obtain summary measures of the marginal effects representative of the situation of the population (see Chamberlain, 1982, pp. 1273). Moreover, a measure that substitutes the values of the covariates and especially the individual specific effect $\alpha_i$ with their means (or any other fixed value) ignores any possible correlation between them. This may give the wrong values of the marginal effects representative of the population.
12 An alternative way to identify and estimate the marginal effects is the approach taken in Chernozhukov et al. (2010). They show that in a model like ours, with fixed effects, when T is fixed the (average and quantile) marginal effects are not point identified. However, they are set identified, and the authors propose a way to estimate bounds on the partial effect. These nonparametric bounds tighten as T grows. The main advantage is that the bounds analysis applies to any T, whereas our bias correction method depends on T not being very small. However, the bounds analysis is only available with discrete covariates for the moment. In contrast, the bias correction methods work well in many examples, including continuous covariates, and they consistently point estimate the identified average effect.
Table 10: Median Marginal Effects on Probability of reporting good and poor health for significant variables.

(a) Good
                              I          II                III
                                         Corr. Random
                         Pooled          Effects           MMLE
Health in t-1: Good       0.2536          0.1889            0.1738
Health in t-1: Poor      -0.1555         -0.1175           -0.1544
Age                      -0.0004         -0.0162           -0.0080
Household size           -0.0283         -0.0040           -0.0127
Number of Kids            0.0234          0.0151            0.0154
Household Income          0.0296          0.0067            0.0130

(b) Poor
                              I          II                III
                                         Random
                         Pooled          Effects           MMLE
Health in t-1: Good      -0.1402         -0.1014           -0.0910
Health in t-1: Poor       0.1484          0.0949            0.1282
Age                       0.0002          0.0094            0.0043
Household size            0.0170          0.0023            0.0077
Number of Kids           -0.0140         -0.0086           -0.0089
Household Income         -0.0177         -0.0039           -0.0077
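The distinction between averaging individual marginal effects and evaluating the effect at average values can be illustrated with a small sketch. It computes the change in the probability of reporting good health when lagged health moves from the middle category to good, using the probability of the top category as the standard normal distribution function of the linear index. The fixed effects and index values below are hypothetical numbers chosen only for illustration (they are not the paper's data), and the function names are assumptions of this example; the coefficient is the MMLE estimate from Table 8.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def effect_of_good_lag(alpha, xb, rho_pos):
    """Change in Pr(h_t = good) when h_{t-1} moves from the middle
    category (0) to good (1), holding alpha and x'beta fixed."""
    return norm_cdf(alpha + rho_pos + xb) - norm_cdf(alpha + xb)

# Hypothetical fixed effects and index values x'beta (illustration only).
alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
xbs = [0.0, 0.0, 0.0, 0.0, 0.0]
rho_pos = 0.4875  # MMLE coefficient on "Health in t-1: Good" (Table 8)

# Average of the individual effects (the approach used in Tables 9 and 10).
individual_effects = [effect_of_good_lag(a, xb, rho_pos)
                      for a, xb in zip(alphas, xbs)]
average_effect = sum(individual_effects) / len(individual_effects)

# Effect evaluated at the average fixed effect and average index instead.
effect_at_mean = effect_of_good_lag(sum(alphas) / len(alphas),
                                    sum(xbs) / len(xbs), rho_pos)

print(round(average_effect, 4), round(effect_at_mean, 4))
```

With these hypothetical values the effect at the mean is noticeably larger than the average of the individual effects, because the normal density is highest at the center; this is why a measure evaluated at fixed values of the heterogeneity may not be representative of the population.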
The pooled model exacerbates the state dependence effect due to the lack of permanent unobserved heterogeneity. Though it is not reported, we also estimated the model in (1) and (2) by MLE. As seen in the simulations, it is severely biased, estimating much lower state dependence effects and higher effects of the other explanatory variables.
More interesting is the comparison between the correlated random effects model and the fixed effects model estimated by MMLE. They are in columns II and III of Tables 8, 9, and 10. The first difference is in the variables that are statistically significant. Table 8 shows that in the MMLE household size, number of kids, and household income have an impact that is statistically different from zero. However, none of them has a significant effect in the random effects estimates. Correspondingly, the average marginal effect of those variables increases in absolute value in the MMLE case with respect to the random effects model, especially for household income. With respect to the state dependence effect (effect of $h_{it-1}$) there are some changes too. The effect of $h_{it-1}$ = good decreases in absolute value when estimating by MMLE, and the effect of $h_{it-1}$ = poor increases. Comparing coefficients in Table 8 we can also see that the effect of $h_{it-1}$ = poor increases proportionally less than the effect of the other relevant explanatory variables. In the random effects specification the ratio of the coefficient of $1(h_{i,t-1} = \text{poor})$ to the coefficient of 'Household income' is around 17 in absolute value, whereas in the MMLE that ratio is 11. In any case, this partial increase in the effect of state dependence and in the effect of the explanatory variables is remarkable because the model in column III allows for more permanent unobserved heterogeneity, and more flexibly, than the model in column II.13
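The ratios quoted above follow directly from the coefficients in Table 8, as the following short calculation shows. The dictionary layout and key names are hypothetical; the numbers are the reported estimates for columns II and III.

```python
# Coefficients from Table 8 (column II: correlated random effects, column III: MMLE).
coefficients = {
    "CRE": {"lag_poor": -0.3259, "income": 0.0188},
    "MMLE": {"lag_poor": -0.4375, "income": 0.0396},
}

# Ratio of the lagged-poor-health coefficient to the income coefficient,
# in absolute value, for each model.
ratios = {
    model: abs(values["lag_poor"] / values["income"])
    for model, values in coefficients.items()
}

for model, ratio in ratios.items():
    print(f"{model}: {ratio:.1f}")
```

The calculation gives roughly 17.3 for the random effects model and 11.0 for the MMLE, in line with the values of 17 and 11 mentioned in the text.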
Moreover, many of those differences in the estimated effects of the explanatory variables between the correlated random effects model and the fixed effects model estimated by MMLE are statistically significant. As is known, if the restrictions imposed by the correlated random effects model are correct, its estimates are more precise (i.e. efficient) than the estimates of the fixed effects model (even after the modification of the MLE), though both are consistent. Given this, we have used a Hausman type test to see if those important differences are only due to the more imprecise estimates in column III. We have performed the test on the Average Marginal Effects instead of the parameters in Table 8 for two reasons. First, marginal effects (including their average), and not the parameters in equations (1) and (2), are usually the parameters of interest in nonlinear models. Second, the average marginal effects do not suffer from the different scales problem that makes magnitudes in columns II and III of Table 8 not directly comparable and not directly interpretable. The average marginal effects of both models are well defined within the same scale, as any other marginal effect on choice probabilities, and their magnitude has the same clear interpretation. If we were primarily interested in a single average marginal effect, like the effect of $h_{i,t-1}$ = good on the probability of $h_{i,t}$ = good, we could use a t-statistic that ignores the others. Doing this for all the average marginal effects, we reject at 5% the null hypothesis that both estimates are the same for four variables. Doing a joint test, we also reject the null hypothesis that the correlated random effects estimates and the fixed effects MMLE estimates are the same, therefore rejecting the restrictions imposed in the correlated random effects model.14
The previous two paragraphs are a clear indication that ignoring the added dimension of heterogeneity and the flexibility in the distribution of the fixed effects matters when estimating the model and the marginal effects of variables. It is not only a matter of the amount of heterogeneity but also a matter of the other restrictions being imposed on the model in column II.
Besides the formal test of random effects versus fixed effects, we look at the unobserved
heterogeneity both in the linear index equation and in the cut point shift. Figure 1
displays the estimated distribution (histogram) of both fixed effects in the population,
and both exhibit large variation. The average for $\alpha_i$ is 0.074 and for $c_i$ is 1.13. The
standard deviations of these distributions are 0.79 and 0.57, respectively. In the random
effects specification $\alpha_i$ is given by the compound equation (4), which includes a linear
relation to some observables and an additive unobserved term that is assumed to follow a
normal distribution. Given the estimates of the parameters of equation (4), the estimated
average for $\alpha_i$ in the random effects model is 1.41, and its standard deviation is 0.9626.
With respect to the heterogeneity on the cut points, the average of $c_i$, the first cut point,
is -1.13 and the estimate of the first cut point in the random effects specification is 0.03.
Also, as can be seen in the right panel of Figure 1 and as noted above, there is large
variation in $c_i$ among individuals that is ignored by the estimated random effects model.
Moreover, a test rejects normality of the distribution of $\alpha_i$ at 1%.15 Finally, the
correlation between $\alpha_i$ and $c_i$ is 0.33, so there are rich interactions between both fixed
effects, forming a joint distribution that is not the simple combination of their marginal
distributions.

Figure 1: Distribution (histogram) of the fixed effects from MML estimates.

13 Recall that permanent unobserved heterogeneity, state dependence and persistence in
observable variables are alternative explanations of the observed high persistence in $h_{it}$.
14 In the Hausman test we have used the Var-Cov of the Fixed Effects estimates only, instead
of subtracting from it the Var-Cov of the Random Effects. We do this in order to avoid the
difference not being a positive definite matrix due to the use of different estimates of the
variance of the errors. This represents a lower bound for this test, and a rejection here will
also be a rejection when using the well defined difference in the var-cov matrices.
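The lower-bound argument in footnote 14 can be illustrated with a minimal sketch. It assumes a Wald-type quadratic form in the difference between the two sets of estimates; the function names (`solve`, `hausman_stat`) and all numbers are hypothetical and are not taken from the paper or from any estimation package. Dropping the random effects variance from the weighting matrix can only lower the statistic when both matrices are positive definite.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hausman_stat(b_fe, b_re, v_fe, v_re=None):
    """Quadratic form diff' V^{-1} diff and its chi-square degrees of freedom.

    v_re=None uses Var(b_fe) only (the variant described in footnote 14);
    otherwise V = Var(b_fe) - Var(b_re), which must be positive definite.
    """
    diff = [f - r for f, r in zip(b_fe, b_re)]
    if v_re is None:
        v = [list(row) for row in v_fe]
    else:
        v = [[f - r for f, r in zip(rf, rr)] for rf, rr in zip(v_fe, v_re)]
    weighted = solve(v, diff)
    return sum(d * w for d, w in zip(diff, weighted)), len(diff)


# Made-up inputs, for illustration only.
b_fe = [0.40, -0.30]
b_re = [0.45, -0.20]
v_fe = [[0.0004, 0.0001], [0.0001, 0.0009]]
v_re = [[0.0001, 0.0], [0.0, 0.0002]]

w_lower, df = hausman_stat(b_fe, b_re, v_fe)
w_classic, _ = hausman_stat(b_fe, b_re, v_fe, v_re)
```

With these inputs `w_lower` does not exceed `w_classic`, so a rejection based on `w_lower` would carry over to the classical statistic.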
Focusing on the MML estimates, we find evidence of strong positive state dependence.
With respect to socioeconomic variables, we find that aging and household size have a
small but significant negative effect on SAH. Household income and number of kids have
a small but significant positive marginal effect on SAH. Number of kids has the biggest
effect of all the x variables.
With respect to how the models fit the observed data, in addition to the information
criteria (AIC) reported in Table 8, some predictions of the estimated models and their
sample counterparts are in Table 11. Overall the MMLE model fits the data better,
because its predictions are closer to the proportions actually observed in the sample.
Likewise, the MMLE predictions better capture the inverted-U shape of the proportion
reporting excellent or good health as we look at people with a higher number of children,
and the slope of the increasing pattern when looking at people with higher income.16
15 $c_i$ cannot be normal by definition since it is restricted to be positive.
16 Note that we are not controlling for any other observable characteristics. Thus, there may be other
Table 11: Sample vs. predicted proportions of SAH (in %)

Panel A: Total proportions.
                     Poor or very poor    Fair    Excellent or good
Sample                      16             31            53
Predicted MMLE              15             32            53
Predicted CRE               12             31            57
Predicted Pooled            14             29            57

Panel B: Proportions of people reporting Excellent or good SAH.
                                        Predicted
                        Sample    MMLE     CRE    Pooled
By number of Kids
  0                       52       53       57      56
  1                       55       54       55      57
  2                       58       56       57      60
  3+                      50       51       54      58
By income quartiles
  1st quartile            47       50       54      54
  2nd quartile            51       52       56      56
  3rd quartile            56       55       58      59
  4th quartile            58       57       59      59
In addition to considering the average and median marginal effects reported in Tables 9
and 10, we look at how many individuals have a significant marginal effect in the sample,
given their particular situation and unobserved characteristics. Table 12 presents the
proportion of individuals with significant (at 10%) marginal effects over the probability of
reporting good and bad health, for the same variables as in Table 9. Notice that although
the average marginal effects are significant, there is a great deal of heterogeneity; for
around half the population, the marginal effect over the probability of reporting good
health is not significantly different from zero for many of these variables.
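A proportion of this kind can be computed along the following lines. This is only a sketch under stated assumptions: the text does not say how individual-level significance is assessed, so the code assumes a two-sided normal test on each individual's marginal effect divided by its standard error (obtained, for instance, by the delta method). The function name and the data are hypothetical.

```python
from statistics import NormalDist


def share_significant(effects, std_errors, level=0.10):
    """Share of individuals whose marginal effect differs from zero at `level`.

    ASSUMPTION: a two-sided normal test on effect / standard error.
    """
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    flags = [abs(e / s) > critical for e, s in zip(effects, std_errors)]
    return sum(flags) / len(flags)


# Made-up individual marginal effects and standard errors.
effects = [0.12, 0.01, -0.08, 0.03]
std_errors = [0.04, 0.02, 0.03, 0.05]
share = share_significant(effects, std_errors)
```

The average of `effects` can be clearly different from zero even when `share` is well below one, which is the kind of heterogeneity the paragraph above describes.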
4.2 Estimates of additional specifications
4.2.1 Model with health measures
As explained in subsection 2.2, we add variables that contain information on objective
health problems to provide further evidence of the robustness and importance of state
dependence in SAH. Table 13 presents the estimates of this model by MMLE, and Table
14 contains the corresponding average marginal effects. Of the three significant
socioeconomic variables in the main model, only number of kids remains significantly
different from zero (at 10%). Most of the objective health measures have the biggest effect
over SAH, all with the expected signs. The variables with the next highest impact are the
two indicators of $h_{i,t-1}$. Thus, even after including objective health measures, we find
evidence of strong positive state dependence here, though it is smaller than in the main
model. The variance of the unobserved heterogeneity is even higher in both $\alpha_i$ and $c_i$
than in the main model.

Table 12: Proportion of individuals with marginal effects (on the probability of reporting
good and poor) that are significantly different from zero at 10%.

                               Proportion
Variable                     Good       Poor
Health in t-1: Good         60.44%     12.25%
Health in t-1: Poor         55.43%     34.50%
Age                         22.71%      2.53%
Household size              37.21%     11.44%
Number of Kids              41.81%     12.65%
Household Income            44.85%     15.35%

16 (continued) differences between people with different numbers of children (or different
income) that can reinforce or cancel the effect of it on average. Therefore these numbers
cannot be interpreted as the effect of the number of children (nor the effect of income).
4.2.2 Linear versus quadratic effect of age
Halliday (2008) found, based on AIC, that a quadratic function of age was only weakly
preferred to the linear model and that not much was lost with a model linear in age. We
have estimated model III in Table 8 excluding age squared as an explanatory variable,
and in our case the fit is worse because the effect of age increases more than linearly at
older ages. Also, when introducing the quadratic term, the AIC changes much more than
in Halliday (2008). Here the AIC is 37373.4 in the linear model and 37275.2 in the
quadratic model, almost a hundred points smaller.
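The size of this AIC gap can be put in perspective with a short calculation. The two AIC values are the ones quoted above; reading the gap through the relative likelihood exp(-delta/2) is a standard way of comparing AIC values and is an addition for illustration, not something the paper reports.

```python
import math

aic_linear = 37373.4     # model linear in age (value quoted in the text)
aic_quadratic = 37275.2  # model with the quadratic age term (value quoted in the text)

delta = aic_linear - aic_quadratic
# Relative likelihood of the higher-AIC (linear) model versus the quadratic one.
relative_likelihood = math.exp(-delta / 2.0)
```

A gap of this size leaves essentially no support for the linear specification under this reading.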
Table 13: Estimates, health indicators added.

Variables                      Correlated Random Effects    MMLE
Health in t-1: Good             0.4191*** (0.0337)           0.3696*** (0.0226)
Health in t-1: Poor            -0.1830*** (0.0401)          -0.2784*** (0.0296)
Age                             0.0262    (0.0324)          -0.0215    (0.0282)
Age square                     -0.0005*** (0.0002)          -0.0003*** (0.0001)
Married                         0.0974    (0.1215)           0.0350    (0.0672)
Separated/Divorced             -0.0177    (0.1547)           0.0340    (0.0817)
Widowed                         0.1601    (0.2087)           0.0474    (0.1110)
Household size                 -0.0181    (0.0359)          -0.0127    (0.0206)
Number of Kids                  0.0667    (0.0444)           0.0387*   (0.0213)
Household Income                0.0051    (0.0312)           0.0112    (0.0177)
Self employed                  -0.0941    (0.1073)           0.0216    (0.0660)
In paid employment              0.1042    (0.0665)           0.1069**  (0.0425)
Unemployed                      0.1311    (0.0956)           0.0946    (0.0680)
Retired                         0.1089    (0.1110)           0.1104*   (0.0651)
Long term sick or disa.        -0.1893    (0.1231)          -0.2562*** (0.0707)
Health problems                -0.6808*** (0.0470)          -0.7759*** (0.0334)
Health limits daily acti.      -0.6435*** (0.0465)          -0.6865*** (0.0299)
Health limits work             -0.4956*** (0.0468)          -0.4854*** (0.0306)
Hospital days                  -0.0331*** (0.0029)          -0.0350*** (0.0008)
Cut point 1                    -0.9318*** (0.2651)
Cut point 2                     0.2788    (0.2647)
$\sigma^2_u$                    0.0489
Mean $c_i$                                                   1.2775
Variance $c_i$                                               0.3942
Mean $\alpha_i$                                              2.7760
Variance $\alpha_i$                                          1.4170
Correlation($\alpha_i$, $c_i$)                              -0.0551
Akaike Information Criterion    27688.2                      27310.7

Standard errors are reported in parentheses. Number of individuals used in estimation
of all models is 1437. Estimates of year dummies in all models, constant variables and
within means of variables in random effects are not reported.
* significant at 10%; ** significant at 5%; *** significant at 1%.

Table 14: Average Marginal Effects on health for significant variables. Model with health
indicators added.

(a) Good
                               Correlated Random
                               Effects     St.Err.     MMLE       St.Err.
Health in t-1: Good             0.1416     0.0117      0.1122     0.0074
Health in t-1: Poor            -0.0610     0.0134     -0.0832     0.0223
Age                            -0.0061     0.0087     -0.0135     0.0080
Number of Kids                  0.0213     0.0141      0.0109     0.0060
In paid employment              0.0336     0.0215      0.0306     0.0122
Retired                         0.0352     0.0358      0.0316     0.0185
Long term sick or disa.        -0.0610     0.0396     -0.0729     0.0223
Health problems                -0.2250     0.0171     -0.2277     0.0480
Health limits daily acti.      -0.2169     0.0167     -0.2045     0.0340
Health limits work             -0.1666     0.0162     -0.1439     0.0141
Hospital days                  -0.0106     0.0009     -0.0099     0.0003

(b) Poor
                               Correlated Random
                               Effects     St.Err.     MMLE       St.Err.
Health in t-1: Good            -0.0780     0.0159     -0.0675     0.0877
Health in t-1: Poor             0.0434     0.0119      0.0650     0.0657
Age                             0.0038     0.0052      0.0088     0.0161
Number of Kids                 -0.0122     0.0081     -0.0070     0.0089
In paid employment             -0.0199     0.0133     -0.0201     0.0247
Retired                        -0.0208     0.0213     -0.0207     0.0266
Long term sick or disa.         0.0404     0.0280      0.0547     0.0570
Health problems                 0.1083     0.0239      0.1216     0.1667
Health limits daily acti.       0.1435     0.0264      0.1501     0.1630
Health limits work              0.1041     0.0209      0.0994     0.1136
Hospital days                   0.0063     0.0012      0.0065     0.0075

5 Conclusion
In this paper we have considered the estimation of a dynamic ordered probit of self-assessed
health status with two fixed effects: one in the linear index equation and one in the cut
points. The inclusion of two fixed effects, instead of only one as is usual, is motivated by
the potential existence of two sources of heterogeneity: unobserved health status and
reporting behavior. Even though we cannot separately identify these two sources of
heterogeneity, we robustly control for them by using two fixed effects. Based on our best
estimates, the two fixed effects exhibit important variation and it is relevant to account
for both when estimating the effect of other variables. Our estimates also show that state
dependence is large and significant even after controlling for unobserved heterogeneity and
some forms of objective health measures. The comparison with random effects estimates
previously used shows that it matters to flexibly account for more permanent unobserved
heterogeneity.
The recent literature on bias-adjusted methods of estimation of nonlinear panel data
models with fixed effects has produced several potentially equivalent estimators. We find
that the a priori most directly applicable correction to our model, which is the HS
estimator proposed in Bester and Hansen (2009), still has significant biases in our sample
size. This leads us to consider the Modified MLE proposed in Carro (2007). We derive
the expression of the MMLE for our model, conduct Monte Carlo experiments to evaluate
its finite sample properties, and compare it with the HS. The MMLE has a negligible
bias in our sample size. The Monte Carlo experiments contribute to the literature on
bias-adjusted methods of estimation of nonlinear panel data models by showing how well
two of the proposed methods work for a specific model and sample size. This information
will be useful for other applications when choosing among the several correction methods
existing in the literature.
References
[1] Arellano, M. and J. Hahn (2006): “A likelihood-based approximate solution to the
incidental parameter problem in dynamic nonlinear models with multiple effects”,
unpublished manuscript.
[2] Arellano, M. and J. Hahn (2007): “Understanding Bias in Nonlinear Panel Models:
Some Recent Developments”, in Advances in Economics and Econometrics, Theory
and Applications, Ninth World Congress, Volume 3, edited by Richard Blundell,
Whitney Newey, and Torsten Persson. Cambridge University Press.
[3] Bester, C. A. and C. Hansen (2009): “A Penalty Function Approach to Bias Reduction
in Non-linear Panel Models with Fixed Effects”, Journal of Business & Economic
Statistics, 27(2): 131-148.
[4] Carro, J. M. (2007): “Estimating dynamic panel data discrete choice models with
fixed effects”, Journal of Econometrics, 140: 503-528.
[5] Chamberlain, G. (1984): “Panel Data”, in Griliches, Z. and M.D. Intriligator (eds.)
Handbook of Econometrics, vol. 2, Elsevier Science, Amsterdam.
[6] Chernozhukov, V., I. Fernandez-Val, J. Hahn, and W. Newey (2010): “Average and
Quantile Effects in Nonseparable Panel Models”, mimeo, MIT Department of Economics.
[7] Contoyannis, P., A. M. Jones and N. Rice (2004): “The Dynamics of Health in the
British Household Panel Survey”, Journal of Applied Econometrics, 19: 473-503.
[8] Fernandez-Val, I. (2009): “Fixed effects estimation of structural parameters and
marginal effects in panel probit models”, Journal of Econometrics, 150: 71-85.
[9] Greene, W. H. and D. A. Hensher (2008): “Modeling Ordered Choices: A Primer
and Recent Developments”, Available at SSRN: http://ssrn.com/abstract=1213093.
[10] Hahn, J. and G. Kuersteiner (2004): “Bias Reduction for Dynamic Nonlinear Panel
Models with Fixed Effects”, mimeo.
[11] Hahn, J. and W. Newey (2004): “Jackknife and Analytical Bias Reduction for
Nonlinear Panel Models”, Econometrica, 72(4): 1295-1319.
[12] Halliday, T. J. (2008): “Heterogeneity, state dependence and health”, Econometrics
Journal, 11: 499-516.
[13] Jones, A. M. (2007): “Panel Data Methods and Applications to Health Economics”,
to appear in The Palgrave Handbook of Econometrics Volume II: Applied Econometrics,
edited by Terence C. Mills and Kerry Patterson. Basingstoke: Palgrave MacMillan.
[14] Jones, A. M. and S. Schurer (2009): “How does Heterogeneity Shape the Socioeconomic
Gradient in Health Satisfaction?”, Journal of Applied Econometrics, published
online, DOI: 10.1002/jae.1134.
[15] Lindeboom, M. and E. van Doorslaer (2004): “Cut-point shift and index shift in
self-reported health”, Journal of Health Economics, 23: 1083-1099.
[16] Mundlak, Y. (1978): “On the pooling of time series and cross-section data”,
Econometrica, 46(1): 69-85.
[17] Rilstone, P., V.K. Srivastava, and A. Ullah (1996): “The second-order bias and mean
squared error of nonlinear estimators”, Journal of Econometrics, 75: 369-395.
[18] van Doorslaer, E., A. M. Jones and X. Koolman (2004): “Explaining income-related
inequalities in doctor utilisation in Europe”, Health Economics, 13: 629-647.
[19] Wooldridge, J. (2005): “Simple solutions to the initial conditions problem in dynamic,
nonlinear panel data models with unobserved heterogeneity”, Journal of Applied
Econometrics, 20: 39-54.
A Appendix: Reduction of the order of the bias
In this appendix we show that the modified score presented above corrects the first order asymptotic
bias of the original score. The algebra is somewhat tedious because of the many terms, but the idea is
clear. We first expand the score of the MLE around the true value of the fixed effects and make some
calculations and substitutions on it to obtain the leading term of the bias of the MLE's score. Then we
show that the modification in the MMLE's score, equation (10), is subtracting that leading bias term
from the score. This follows Carro (2007), and is adapted to our model with two fixed effects.
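The notation used is the same as in section 3.2: $\gamma = (\beta, \rho_1, \rho_{-1})$ and
$\eta_i = (\alpha_i, c_i)$; we denote partial derivatives by the letter d; bold letters are used to
denote vectors; $d_{\gamma i} \equiv \partial l_i(\gamma, \eta_i)/\partial\gamma$,
$d_{\alpha i} \equiv \partial l_i(\gamma, \eta_i)/\partial\alpha_i$,
$d_{\gamma c i} = \partial^2 l_i/\partial\gamma\,\partial c_i$,
$d_{\alpha\alpha i} = \partial^2 l_i/\partial\alpha_i^2$,
$d_{\gamma\alpha c i} = \partial^3 l_i/\partial\gamma\,\partial\alpha_i\,\partial c_i$, and so on; the
derivatives evaluated at the true values of the parameters are represented by including a 0 in the
sub-index (e.g. $d_{\gamma i0} = d_{\gamma i}(\gamma_0, \eta_{i0})$).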
A.1 Deriving the leading term of the bias of the score in the MLE
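We start by deriving the first term of the bias in the score of the original unmodified concentrated
log-likelihood. Expanding this score around $\eta_{i0}$, and evaluating it at $\gamma_0$ we get:
\[
\begin{aligned}
d_{\gamma i}(\gamma_0, \hat\eta_i(\gamma_0)) = {} & d_{\gamma i0}
  + d_{\gamma\alpha i0}\,(\hat\alpha_i(\gamma_0) - \alpha_{i0})
  + d_{\gamma c i0}\,(\hat c_i(\gamma_0) - c_{i0}) \\
& + \tfrac{1}{2}\, d_{\gamma\alpha\alpha i0}\,(\hat\alpha_i(\gamma_0) - \alpha_{i0})^2
  + \tfrac{1}{2}\, d_{\gamma cc i0}\,(\hat c_i(\gamma_0) - c_{i0})^2 \\
& + d_{\gamma\alpha c i0}\,(\hat\alpha_i(\gamma_0) - \alpha_{i0})(\hat c_i(\gamma_0) - c_{i0})
  + O_p(T^{-1/2}) + \dots
\end{aligned}
\tag{A1}
\]
This equation clearly shows that the score evaluated at the true value $\gamma_0$ differs from the value
of the score we want to obtain, $d_{\gamma i0} = d_{\gamma i}(\gamma_0, \eta_{i0})$, as much as
$\hat\alpha_i(\gamma_0)$ and $\hat c_i(\gamma_0)$ differ from $\alpha_{i0}$ and $c_{i0}$. This
is the source of the incidental parameters problem.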
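The mechanism can be seen in the textbook Neyman-Scott example, sketched below. This is not the paper's ordered probit: it is a linear model $y_{it} = a_i + e_{it}$ in which the variance of $e_{it}$ is estimated after replacing each $a_i$ by its sample mean. With T fixed, the estimation error in the fixed effects does not average out as N grows, and the maximum likelihood variance estimate converges to $(T-1)/T$ times the true value; in this special case the bias can be removed exactly. The function name and all settings are illustrative assumptions.

```python
import random


def neyman_scott(n_units=2000, n_periods=4, sigma2=1.0, seed=0):
    """Simulate y_it = a_i + e_it and estimate Var(e_it) using unit-specific means."""
    rng = random.Random(seed)
    total_ss = 0.0
    for _ in range(n_units):
        a_i = rng.gauss(0.0, 3.0)  # fixed effect with an arbitrary spread
        y = [a_i + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_periods)]
        mean_i = sum(y) / n_periods  # MLE of a_i
        total_ss += sum((v - mean_i) ** 2 for v in y)
    mle = total_ss / (n_units * n_periods)         # converges to sigma2 * (T - 1) / T
    corrected = mle * n_periods / (n_periods - 1)  # exact correction for this model
    return mle, corrected


mle, corrected = neyman_scott()
```

In nonlinear models such as the one in this paper no exact correction is available, which is why the expansions in this appendix target only the leading term of the bias.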
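Now we need expressions for $(\hat\alpha_i(\gamma_0) - \alpha_{i0})$ and $(\hat c_i(\gamma_0) - c_{i0})$,
for which we do asymptotic expansions, following Rilstone, Srivastava and Ullah (1996):
\[
(\hat\alpha_i(\gamma_0) - \alpha_{i0}) = b^{\alpha}_{-1/2} + b^{\alpha}_{-1} + O_p(T^{-3/2}) \tag{A2}
\]
\[
(\hat c_i(\gamma_0) - c_{i0}) = b^{c}_{-1/2} + b^{c}_{-1} + O_p(T^{-3/2}) \tag{A3}
\]
where $b^{\alpha}_{-1/2}$ and $b^{c}_{-1/2}$ are the elements of the vector $b_{-1/2}$, and
$b^{\alpha}_{-1}$ and $b^{c}_{-1}$ are the elements of the vector $b_{-1}$, which are determined as follows:
\[
b_{-1/2} = -Q^{-1}R, \qquad
b_{-1} = -Q^{-1}S\,b_{-1/2} - \tfrac{1}{2}\,Q^{-1}U\,(b_{-1/2} \otimes b_{-1/2}),
\]
\[
R = \frac{1}{T}\begin{pmatrix} d_{\alpha i0} \\ d_{c i0} \end{pmatrix}, \qquad
Q = E(\nabla R), \qquad S = \nabla R - Q, \qquad U = E(\nabla^2 R).
\]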
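From the above expressions we obtain:
\[
b^{\alpha}_{-1/2} = -\,\frac{\tfrac{1}{T} d_{\alpha i0}\, E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \tfrac{1}{T} d_{ci0}\, E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)}
 {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2}
\tag{A4}
\]
\[
b^{c}_{-1/2} = -\,\frac{\tfrac{1}{T} d_{ci0}\, E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right)
  - \tfrac{1}{T} d_{\alpha i0}\, E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)}
 {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2}
\tag{A5}
\]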
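It is also useful to obtain:
\[
(\hat\alpha_i(\gamma_0) - \alpha_{i0})^2 = (b^{\alpha}_{-1/2})^2 + O_p(T^{-3/2}) \tag{A6}
\]
\[
(\hat c_i(\gamma_0) - c_{i0})^2 = (b^{c}_{-1/2})^2 + O_p(T^{-3/2}) \tag{A7}
\]
\[
(\hat\alpha_i(\gamma_0) - \alpha_{i0})(\hat c_i(\gamma_0) - c_{i0})
  = b^{\alpha}_{-1/2}\, b^{c}_{-1/2} + O_p(T^{-3/2}) \tag{A8}
\]
With respect to the squares of $b^{\alpha}_{-1/2}$ and $b^{c}_{-1/2}$, we get:
\[
(b^{\alpha}_{-1/2})^2 = \frac{
  \left(\tfrac{1}{T} d_{\alpha i0}\right)^2 \left[E\!\left(\tfrac{1}{T} d_{cci0}\right)\right]^2
  + \left(\tfrac{1}{T} d_{ci0}\right)^2 \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2
  - 2\,\tfrac{1}{T} d_{\alpha i0}\,\tfrac{1}{T} d_{ci0}\,
    E\!\left(\tfrac{1}{T} d_{cci0}\right) E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)}
 {\left\{E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2\right\}^2}
\]
\[
(b^{c}_{-1/2})^2 = \frac{
  \left(\tfrac{1}{T} d_{ci0}\right)^2 \left[E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right)\right]^2
  + \left(\tfrac{1}{T} d_{\alpha i0}\right)^2 \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2
  - 2\,\tfrac{1}{T} d_{\alpha i0}\,\tfrac{1}{T} d_{ci0}\,
    E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)}
 {\left\{E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2\right\}^2}
\]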
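Substituting by expectations, and using the information matrix identity
($E(d_{c\alpha i}) = -E(d_{\alpha i}\, d_{ci})$), we get:
\[
(b^{\alpha}_{-1/2})^2 = -\,\frac{1}{T}\,
 \frac{E\!\left(\tfrac{1}{T} d_{cci0}\right)}
 {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2} + O_p(T^{-3/2})
\tag{A9}
\]
\[
(b^{c}_{-1/2})^2 = -\,\frac{1}{T}\,
 \frac{E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right)}
 {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2} + O_p(T^{-3/2})
\tag{A10}
\]
Following the same procedure for the cross-product, we get:
\[
b^{\alpha}_{-1/2}\, b^{c}_{-1/2} = \frac{1}{T}\,
 \frac{E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)}
 {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
  - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2} + O_p(T^{-3/2})
\tag{A11}
\]
With respect to $b^{\alpha}_{-1}$ and $b^{c}_{-1}$, we follow the same procedure (replace by
expectations and use the information matrix identity) to get: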
b
1
=
1
2T
(
2E
1
1
T
E
d
E
i0
2
1
dc
T
E
i0
1
T
E
dcci0
1
d
T
cci0
1
T
dc
i0
(A12)
2 2
1
dai0 dcci0
T
+E
2
1
1
1
E
dcci0
da i0 + 2E
dai0 d i0
T
T
T
1
1
1
d i0 E
dcci0 E
d cci0 + 2E
+E
T
T
T
1
1
1
E
dc i0 E
d i0 E
dccci0 + 2E
T
T
T
1
1
1
dc i0 E
dcci0 3E
d ci0 + 4E
E
T
T
T
+E
1
dci0 dc
T
i0
+E
+ Op (T
1
bc 1 =
2T
(
2E
3=2
+ 2E
1
dci0 d
T
i0
)
1
1
T
E
d
E
i0
2
1
dc
T
1
d
+E
T
1
d
+E
T
1
E
dc
T
1
E
dc
T
+ Op (T
1
dci0 dc i0
T
1
dci0 dcci0
T
1
dai0 dc i0
T
i0
i0
1
d
T
E
1
dccci0
T
i0
i0
3=2
1
dcci0
T
1
E
dcci0
T
1
E
d i0
T
E
E
dcci0
E
2
i0
1
T
ci0
1
T
+E
+ 2E
1
d
T
1
E
da
T
1
3E
d
T
E
dc
i0
(A13)
2 2
1
dci0 d
T
i0
+E
1
dai0 dc
T
i0
1
dci0 dcci0
T
ci0
i0
cci0
)
1
dai0 dc i0
T
1
+ 2E
dai0 d i0
T
1
+ 4E
dci0 dc i0
T
+ 2E
+ 2E
1
dai0 dcci0
T
(A14)
Introducing all these expressions in (A1), and taking expectations, we get:
E(d i ( 0 ; ^i ( 0 ))) =
1
T
E
i0 dci0
d
1
T
E
+
1
2
(A15)
1
T
E
d
i0
dc
i0
E
1
T
1
T
E
1
T
E
(
d
2
1
dc
T
2E
1
T
E
i0
E
dcci0
d
d
i0 dai0
E
1
T
1
T
dc
1
T
E
dcci0
2
dc
i0
ai0
dcci0
1
d
T
E
i0
1
T
E
1
dai0 dcci0
T
+E
cci0
2 2
i0
1
dci0 dc
T
+E
i0
2
1
1
1
dcci0
da i0 + 2E
dai0 d i0
E
T
T
T
1
1
1
+E
d i0 E
dcci0 E
d cci0 + 2E
T
T
T
1
1
1
dc i0 E
d i0 E
dccci0 + 2E
E
T
T
T
1
1
1
E
dc i0 E
dcci0 3E
d ci0 + 4E
T
T
T
+E
+
+
E
1
T
d
ci0 dai0
E
1
2
1
T
E
1
d
T
1
d
+E
T
1
E
dc
T
1
dc
E
T
+E
E
d
1
dc
T
2E
E
d
1
T
i0
i0
i0
i0
d
i0
1
d
T
aci0
+ O(T
1
d
1
dccci0
T
E
dcci0
1
dc
T
E
1
T
dc
i0
+E
ci0
+ 2E
1
d
T
1
da
E
T
1
3E
d
T
E
E
i0
1
T
E
E
1
T
ci0 dci0
E
1
T
d
+ 2E
1
dci0 d
T
i0
i0
2
dc
i0
ci0
1
d
T
1
dcci0
T
1
dcci0
E
T
1
E
d i0
T
1
d
dcci0
dcci0
E
E
1
T
1
T
E
i0
E
2
i0
E
1
T
1
T
E
2
i0
dc
i0
E
(
+
1
T
1
T
E
1
dci0 dc i0
T
1
dci0 dcci0
T
1
dai0 dc i0
T
1
T
dc
1
E
2
2 2
1
dci0 d
T
i0
+E
1
dai0 dc
T
i0
1
dci0 dcci0
T
ci0
i0
cci0
1
dai0 dc i0
T
1
dai0 d i0
+ 2E
T
1
+ 4E
dci0 dc i0
T
+ 2E
+ 2E
1
dai0 dcci0
T
2
i0
1
d
T
i0
E
1
dcci0
T
1
E
2
1
d
T
cci0
E
1
d
T
i0
)
The remainder of this expression is $O(T^{-1})$ because $O_p(T^{-1/2})$ terms have zero mean. This means
that the score of the original concentrated likelihood has a bias of order O(1), whose expression is in the
previous formulae.
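A.2 Modified Score
The modified score in (10) can be decomposed in three terms, $d_{\gamma M i}(\gamma) = A + B + C$, such that:
\[
A = d_{\gamma i}(\gamma, \hat\eta_i(\gamma)) \tag{A16}
\]
\[
\begin{aligned}
B = -\frac{1}{2}\,\frac{1}{d_{\alpha\alpha i}\, d_{cci} - d_{c\alpha i}^2}
\Bigl[ \; & d_{\alpha\alpha i}\Bigl( d_{\gamma cc i}
     + d_{\alpha cc i}\,\frac{\partial\hat\alpha_i}{\partial\gamma}
     + d_{ccc i}\,\frac{\partial\hat c_i}{\partial\gamma} \Bigr) \\
 & + d_{cci}\Bigl( d_{\gamma\alpha\alpha i}
     + d_{\alpha\alpha\alpha i}\,\frac{\partial\hat\alpha_i}{\partial\gamma}
     + d_{\alpha\alpha c i}\,\frac{\partial\hat c_i}{\partial\gamma} \Bigr) \\
 & - 2\, d_{c\alpha i}\Bigl( d_{\gamma\alpha c i}
     + d_{\alpha\alpha c i}\,\frac{\partial\hat\alpha_i}{\partial\gamma}
     + d_{\alpha cc i}\,\frac{\partial\hat c_i}{\partial\gamma} \Bigr) \Bigr]
\end{aligned}
\tag{A17}
\]
\[
\begin{aligned}
C = {} & -\,\frac{\partial}{\partial\alpha_i}\left.\left(
   \frac{E(d_{\gamma c i})\,E(d_{c\alpha i}) - E(d_{cci})\,E(d_{\gamma\alpha i})}
        {E(d_{\alpha\alpha i})\,E(d_{cci}) - [E(d_{c\alpha i})]^2}
   \right)\right|_{\eta_i = \hat\eta_i(\gamma)} \\
 & -\,\frac{\partial}{\partial c_i}\left.\left(
   \frac{E(d_{\gamma\alpha i})\,E(d_{c\alpha i}) - E(d_{\alpha\alpha i})\,E(d_{\gamma c i})}
        {E(d_{\alpha\alpha i})\,E(d_{cci}) - [E(d_{c\alpha i})]^2}
   \right)\right|_{\eta_i = \hat\eta_i(\gamma)}
\end{aligned}
\tag{A18}
\]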
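A is the score of the original unmodified concentrated log-likelihood. So, we now analyze B and C.
Part B. We first want to derive expressions for $\partial\hat\alpha_i/\partial\gamma$ and
$\partial\hat c_i/\partial\gamma$. Differentiating the score of the concentrated log-likelihood with
respect to the fixed effects, $d_{\eta i}(\gamma, \hat\eta_i(\gamma))$, with respect to $\gamma$ we get a
system of two equations with two unknowns. Solving for $\partial\hat\alpha_i/\partial\gamma$ and
$\partial\hat c_i/\partial\gamma$ we get:
\[
\frac{\partial\hat\alpha_i(\gamma)}{\partial\gamma}
 = \frac{d_{\gamma c i}\, d_{c\alpha i} - d_{cci}\, d_{\gamma\alpha i}}
        {d_{\alpha\alpha i}\, d_{cci} - d_{c\alpha i}^2}
\tag{A19}
\]
\[
\frac{\partial\hat c_i(\gamma)}{\partial\gamma}
 = \frac{d_{\gamma\alpha i}\, d_{c\alpha i} - d_{\alpha\alpha i}\, d_{\gamma c i}}
        {d_{\alpha\alpha i}\, d_{cci} - d_{c\alpha i}^2}
\tag{A20}
\]
Evaluating at $\gamma_0$ and replacing by expectations:
\[
\frac{\partial\hat\alpha_i(\gamma_0)}{\partial\gamma}
 = \frac{E\!\left(\tfrac{1}{T} d_{\gamma c i0}\right) E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)
       - E\!\left(\tfrac{1}{T} d_{cci0}\right) E\!\left(\tfrac{1}{T} d_{\gamma\alpha i0}\right)}
        {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
       - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2} + O_p(T^{-1/2})
\tag{A21}
\]
\[
\frac{\partial\hat c_i(\gamma_0)}{\partial\gamma}
 = \frac{E\!\left(\tfrac{1}{T} d_{\gamma\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)
       - E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{\gamma c i0}\right)}
        {E\!\left(\tfrac{1}{T} d_{\alpha\alpha i0}\right) E\!\left(\tfrac{1}{T} d_{cci0}\right)
       - \left[E\!\left(\tfrac{1}{T} d_{c\alpha i0}\right)\right]^2} + O_p(T^{-1/2})
\tag{A22}
\]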
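The closed forms (A19) and (A20) can be checked numerically against the two-equation system they solve. The sketch below treats a single scalar common parameter; the function name and the derivative values are hypothetical placeholders rather than quantities from the estimated model.

```python
def fixed_effect_derivatives(d_aa, d_cc, d_ca, d_ga, d_gc):
    """Closed forms (A19)-(A20) for one scalar common parameter.

    Arguments are second derivatives of the log-likelihood:
    d_aa (alpha, alpha), d_cc (c, c), d_ca (c, alpha),
    d_ga (gamma, alpha) and d_gc (gamma, c).
    """
    det = d_aa * d_cc - d_ca ** 2
    dalpha_dgamma = (d_gc * d_ca - d_cc * d_ga) / det
    dc_dgamma = (d_ga * d_ca - d_aa * d_gc) / det
    return dalpha_dgamma, dc_dgamma


# Made-up derivative values with a negative definite fixed-effects Hessian.
d_aa, d_cc, d_ca, d_ga, d_gc = -2.0, -1.5, 0.4, 0.3, -0.7
dalpha, dc = fixed_effect_derivatives(d_aa, d_cc, d_ca, d_ga, d_gc)

# Residuals of the differentiated first order conditions; both should be zero.
residual_alpha = d_ga + d_aa * dalpha + d_ca * dc
residual_c = d_gc + d_ca * dalpha + d_cc * dc
```

Both residuals vanish up to rounding, which confirms that the two expressions solve the system obtained by differentiating the first order conditions of the fixed effects.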
Introducing in (A17) and rearranging terms:
1
T
E
B=
d
ci0
1
T
E
dc
1
T
1
T
E
i0
1
T
E
d i0 E
dcci0
d i d cci + dcci d
2dc i d ci
i
2(d i dcci d2c i )
d i d cci + dcci d
2dc i d
i
2(d i dcci d2c i )
1
T
E
d
E
i0
1
T
dc
1
T
E
i0
1
T
E
ci
d
i0
1=2
+ Op (T
E
1
T
d
i0
ci0
E
T
E
)
1
T d ci0
2
E
i0
1=2
)
), adding 1=T 2 in numerators and denomi-
E
1
T
1
d
T
+ Op (T
1
dc
T
dcci0
E
1
T
dc
(A24)
2 2
i0
1
1
dcci0 E
d ai0
T
T
1
1
1
1
d i0 E
d cci0 + E
dcci0 E
da i0
E
T
T
T
T
1
1
1
1
+ E
d i0 E
dc i0
d i0 E
d ci0
E
T
T
T
T
1
1
1
1
dcci0 E
d ci0 + E
d i0 E
dccci0
E
T
T
T
T
1
1
2
2 E 1 d i0 E 1 dcci0
E 1 dc i0
1
d
T
E
(A23)
1
1
2
B=
i0
i0
i0
1=2
Op (T
d
2
dc
E T1 d i0 E T1 dcci0
E T1 dc
dcci d ci + d i dccci 2dc i d cci
2(d i dcci d2c i )
dcci d ci + d i dccci 2dc i d cci
Op (T
2(d i dcci d2c i )
d i d cci + dcci d
2dc i d aci
i
2
2(d i dcci dc i )
Evaluating at 0 , using the fact that i ( ) =
nators and replacing by expectations:
1
T
dcci0 E
i0
E
T
i0
1=2
E
1
d
T
2E
1
dc
T
i0
E
1
d
T
2E
1
dc
T
i0
E
1
d
T
ci0
cci0
T
cci0
+E
1
dcci0 E
T
1
d
T
i0
2E
1
dc
T
i0
E
1
d
T
aci0
)
(A25)
Finally, taking the expected value of this expression will not change anything, except that the
remainder would be $O(T^{-1})$ instead of $O_p(T^{-1/2})$.
Part C.
To analyze C, we need the following result:
\[
\frac{\partial}{\partial\alpha_i} E(d_{\gamma c i}) = E(d_{\gamma\alpha c i}) + E(d_{\gamma c i}\, d_{\alpha i})
\tag{A26}
\]
This works with other derivatives of expectations as well.
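The result says that differentiating an expectation taken under the model adds a term involving the score, because the distribution itself depends on the parameter. A minimal numerical check is sketched below for a Bernoulli outcome with a logistic probability. The model, the function `g` and every name are illustrative assumptions and are unrelated to the paper's likelihood.

```python
import math


def logistic(theta):
    return 1.0 / (1.0 + math.exp(-theta))


def g(y, theta):
    # Arbitrary smooth function of the outcome and the parameter.
    return theta ** 2 if y == 1 else math.sin(theta)


def dg(y, theta):
    # Partial derivative of g with respect to theta.
    return 2.0 * theta if y == 1 else math.cos(theta)


def score(y, theta):
    # Derivative of the Bernoulli log-likelihood with respect to theta.
    return y - logistic(theta)


def expectation(func, theta):
    # Expectation over y in {0, 1} under Pr(y = 1) = logistic(theta).
    p = logistic(theta)
    return p * func(1, theta) + (1.0 - p) * func(0, theta)


theta0 = 0.3
step = 1e-6
numeric = (expectation(g, theta0 + step) - expectation(g, theta0 - step)) / (2.0 * step)
analytic = expectation(dg, theta0) + expectation(lambda y, t: g(y, t) * score(y, t), theta0)
```

The finite-difference derivative `numeric` and the two-term expression `analytic` agree up to numerical error.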
C is the sum of two derivatives, that we call $C^{\alpha}$ and $C^{c}$ respectively, evaluated at
$\eta_i = \hat\eta_i(\gamma)$. $C^{\alpha}$ is equal to:
@
@ai
C =
@
@ai
=
+
E(d ci )E(dc i ) E(dcci )E(d i )
E(d i )E(dcci ) [E(dc i )]2
(E(d
ci )E(dc i )
E(d
(E(d
E(dcci )E(d
i )E(dcci )
ci )E(dc i )
[E(dc
i
@
i )) @ai
E(dcci )E(d
(E(d
i ))
)]2
i )E(dcci )
E(d
i )E(dcci )
2 2
[E(dc i )]2
[E(dc i )] )
Working with the derivative and using the above result, we get:
Ca =
1
)E(d
i
cci )
E(d
fE(d
ci ) [E(d
ci )
E(dcci ) [E(d
+
E(d
+ E(dc i dai )] + E(dc i ) [E(d
i)
+ E(d
ci )E(dc i )
(E(d
fE(d
[E(dc i )]2
E(d
+ E(d
i ) [E(d cci )
ci dai )]
+ E(dcci dai )]g
E(dcci )E(d
i )E(dcci )
i ) [E(d cci )
2E(dc i ) [E(d
i dai )]
aci )
[E(dc
i
i)
2
2
)] )
+ E(dcci dai )] + E(dcci ) [E(d
ci )
i)
+ E(d
i dai )]
aci )
+ E(d
i dci )]
+ E(dc i dai )]g
Likewise, for C c we have:
Cc =
fE(d
i ) [E(d cci )
E(d
+
1
)E(d
i
cci )
E(d
E(d
+ E(d
i )E(dc i )
E(d
i )E(dcci )
fE(dcci ) [E(d
ci )
2E(dc i ) [E(d
0
+ E(dc i dci )] + E(dc i ) [E(d
i ) [E(d cci )
(E(d
We then evaluate at
[E(dc i )]2
E(d
ci ) [E(d
ci )
+ E(d
i dci )]g
i )E(d ci )
2
[E(dc i )]2 )
+ E(d
cci )
ci dci )]
i dci )]
+ E(d
i ) [E(dccci )
+ E(dcci dci )]
+ E(dc i dci )]g
and take the expected value of these expressions.
Putting everything together. Finally, if we add all the terms of B and C from before, which is
equal to $d_{\gamma M i}(\gamma) - d_{\gamma i}(\gamma, \hat\eta_i(\gamma)) = B + C$, we get exactly minus (A15).
Therefore, the modified score equals the standard score minus the first order term of the bias, because we
are subtracting it with the modification B + C. The remainder of this expansion for $d_{\gamma M i}(\gamma)$
is $O(T^{-1})$, as opposed to $O(1)$, which is the order of magnitude of the bias of
$d_{\gamma i}(\gamma, \hat\eta_i(\gamma))$. This shows that the MMLE reduces the order of the bias of the MLE.
B Computation of the MMLE
Computing the MMLE implies maximizing a likelihood whose first order condition is equation (10). This
first order condition has known closed-form analytical terms. This means that we can program this
optimization problem in any of the most frequently used programs in economics: MATLAB, GAUSS, and STATA.
We can even use one of the already written routines and tools in those programs to maximize a likelihood,
provided it allows us to specify the analytical form of the first order condition; otherwise we would obtain
the MLE instead of the MMLE. We have used FORTRAN to program the MMLE for this paper because
we are more familiar with this programming language and because we have conducted several Monte
Carlo experiments and expected FORTRAN to be faster at doing this. But nothing in the MMLE prevents
us from using other software and programming languages. The MML estimates reported in Table 8 (our main
model) took 5 minutes. The MML estimates reported in Table 13 took 34 minutes, because that model has
many more variables than the model in Table 8.
There are three main aspects when computing the MMLE:
1. We first have to obtain the several derivatives and cross derivatives of the likelihood (8). This
includes differentiating the MLE's first order conditions for the fixed effects with respect to $\gamma$,
so that we obtain $\partial\hat\alpha_i/\partial\gamma$ and $\partial\hat c_i/\partial\gamma$. This may look
somewhat tedious, but these are straightforward calculations with known compact general forms that hold
for all the parameters.
2. Calculate the expectations in (10). They are expectations of functions of $X_{it}$ and $h_{it-1}$,
$f(h_{it-1}, X_{it})$, where f denotes here any of the functions that result from the derivatives that make
up the modification. These expectations are conditional on all the values of the $x_i$ covariates, on
$h_{i0}$, and on $(\alpha_i, c_i)$; that is, $E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i]$.
Thus, the only random variable over which the expectation is taken is $h_{it-1}$ whenever t > 1. For t = 1,
\[
E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i] = f(h_{i0}, x_{it}).
\]
For t = 2,
\[
\begin{aligned}
E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i]
 = {} & f(h_{i1} = 1, x_{it}) \Pr(h_{i1} = 1 \mid x_i, h_{i0}, \alpha_i, c_i) \\
 & + f(h_{i1} = 0, x_{it}) \Pr(h_{i1} = 0 \mid x_i, h_{i0}, \alpha_i, c_i) \\
 & + f(h_{i1} = -1, x_{it}) \Pr(h_{i1} = -1 \mid x_i, h_{i0}, \alpha_i, c_i),
\end{aligned}
\]
where the $\Pr(h_{i1} \mid x_i, h_{i0}, \alpha_i, c_i)$ are those given by the model in equations (5).
For t > 2 we continue proceeding recursively, using $\Pr(h_{it-2} \mid x_i, h_{i0}, \alpha_i, c_i)$ to
calculate $\Pr(h_{it-1} \mid x_i, h_{i0}, \alpha_i, c_i)$:
\[
\begin{aligned}
\Pr(h_{it-1} \mid x_i, h_{i0}, \alpha_i, c_i)
 = {} & \Pr(h_{it-1} \mid x_{it}, h_{it-2} = 1, c_i, \alpha_i) \Pr(h_{it-2} = 1 \mid x_i, h_{i0}, \alpha_i, c_i) \\
 & + \Pr(h_{it-1} \mid x_{it}, h_{it-2} = 0, c_i, \alpha_i) \Pr(h_{it-2} = 0 \mid x_i, h_{i0}, \alpha_i, c_i) \\
 & + \Pr(h_{it-1} \mid x_{it}, h_{it-2} = -1, c_i, \alpha_i) \Pr(h_{it-2} = -1 \mid x_i, h_{i0}, \alpha_i, c_i),
\end{aligned}
\]
where $\Pr(h_{it-1} \mid x_{it}, h_{it-2}, c_i, \alpha_i)$ is given by equations (5) and
$\Pr(h_{it-2} \mid x_i, h_{i0}, \alpha_i, c_i)$ has already been obtained in this recursive process.
3. Concentrate the likelihood and estimate with fixed effects. The problems come from not having
a closed form for $\hat\alpha_i$ and $\hat c_i$ to obtain the analytic expression of the concentrated
likelihood, and from having to estimate as many fixed effects parameters as individuals in the panel with
large N. This problem is not specific to the MMLE. It affects any estimator with fixed effects and
has already been treated in the literature. On top of that, computational problems are smaller
with the current technology than they used to be. Classical references offering different solutions
are Chamberlain (1980) and Heckman and MaCurdy (1980). More recently Greene (2004) also
deals with the computational problem of inverting a large Hessian matrix. We have not used
any of these solutions when estimating the MLE and MMLE. We have followed the proposal in
Appendix B of Carro (2008) that concentrates the likelihood numerically by nesting the first order
conditions used to compute the fixed effects in the algorithm that maximizes the concentrated
likelihood with respect to the common parameters. We have found this to be faster than dividing the
optimization problem in two procedures and iterating back and forth between the two optimization
algorithms until convergence is reached, as proposed by Heckman and MaCurdy (1980). This also does not
require us to invert a large Hessian matrix and, at the same time, produces a correct estimate of
the variance. See Appendix B in Carro (2008) for further details. In any case, the message here is
that these computational problems already have satisfactory solutions.
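The recursion in step 2 can be sketched as a forward pass over the three health states. The sketch is illustrative only. Equations (5) are not reproduced in this appendix, so the transition probabilities below assume a generic three-category ordered probit whose latent index is `alpha_i + rho[previous state] + index_x`, with two cut points passed in as `cuts`. The names `transition_probs`, `expected_f`, `rho`, `cuts` and `index_xs` are hypothetical, and the coding of the states as 1, 0 and -1 follows the expressions in step 2.

```python
import math

STATES = (1, 0, -1)  # coding of h_it used in the expressions of step 2


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_probs(prev_state, index_x, alpha_i, rho, cuts):
    """Pr(h_t = s | h_{t-1} = prev_state) for each state s.

    ASSUMPTION: ordered probit with latent index alpha_i + rho[prev_state] + index_x
    and cut points cuts = (lower, upper); this stands in for equations (5).
    """
    index = alpha_i + rho.get(prev_state, 0.0) + index_x
    lower, upper = cuts
    p_low = norm_cdf(lower - index)
    p_mid = norm_cdf(upper - index) - p_low
    p_high = 1.0 - norm_cdf(upper - index)
    return {1: p_high, 0: p_mid, -1: p_low}


def expected_f(f, h0, index_xs, alpha_i, rho, cuts):
    """E[f(h_{t-1}, t)] for t = 1..T, conditional on h0 and the fixed effects."""
    dist = {s: 1.0 if s == h0 else 0.0 for s in STATES}  # law of h_{t-1} at t = 1
    out = []
    for t, index_x in enumerate(index_xs, start=1):
        out.append(sum(f(s, t) * dist[s] for s in STATES))
        # Move from the law of h_{t-1} to the law of h_t by summing over h_{t-1}.
        new = {s: 0.0 for s in STATES}
        for prev in STATES:
            if dist[prev] == 0.0:
                continue
            probs = transition_probs(prev, index_x, alpha_i, rho, cuts)
            for s in STATES:
                new[s] += probs[s] * dist[prev]
        dist = new
    return out


# Made-up inputs: index_xs[t-1] plays the role of the covariate index in period t.
rho = {1: 0.4, -1: -0.3}
prob_good_lag = expected_f(lambda s, t: 1.0 if s == 1 else 0.0,
                           h0=0, index_xs=[0.1, -0.2, 0.0],
                           alpha_i=0.2, rho=rho, cuts=(-1.0, 0.5))
total_mass = expected_f(lambda s, t: 1.0, h0=0, index_xs=[0.1, -0.2, 0.0],
                        alpha_i=0.2, rho=rho, cuts=(-1.0, 0.5))
```

Here `prob_good_lag[t-1]` is the probability that the lagged state equals 1 in period t, and its first element is zero because the initial state is fixed at 0. Any other function of the lagged state can be passed as `f` in the same way.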