Working Paper 11-18
Economic Series
May, 2011

Departamento de Economía
Universidad Carlos III de Madrid
Calle Madrid, 126
28903 Getafe (Spain)
Fax (34) 916249875

State dependence and heterogeneity in health using a bias corrected fixed effects estimator.*

Jesús M. Carro, Universidad Carlos III de Madrid
Alejandra Traferri, Pontificia Univ. Católica de Chile

Abstract

This paper considers the estimation of a dynamic ordered probit of self-assessed health status with two fixed effects: one in the linear index equation and one in the cut points. The two fixed effects allow us to robustly control for heterogeneity in unobserved health status and in reporting behaviour, even though we can not separate both sources of heterogeneity. The contributions of this paper are twofold. First it contributes to the literature that studies the determinants and dynamics of Self-Assessed Health measures. Second, this paper contributes to the recent literature on bias correction in nonlinear panel data models with fixed effects by applying and studying the finite sample properties of two of the existing proposals to our model. The most direct and easily applicable correction to our model is not the best one, and has important biases in our sample sizes.

JEL classification: C23, C25, I19

Keywords: dynamic ordered probit, fixed effects, self-assessed health, reporting bias, panel data, unobserved heterogeneity, incidental parameters, bias correction.

* This is a revised version of a paper previously circulated under the title "Correcting the bias in the estimation of a dynamic ordered probit with fixed effects of self-assessed health status". We thank Raquel Carrasco, Matilde Machado and seminar participants at Universidad Carlos III, Boston University, and The Lincoln College (Oxford U.) Applied Microeconometrics Conference for useful remarks and suggestions.
The first author gratefully acknowledges that this research was supported by a Marie Curie International Outgoing Fellowship within the 7th European Community Framework Programme, by grants number ECO2009-11165 and SEJ2006-05710 from the Spanish Minister of Education, MCINN (Consolider-Ingenio2010) and Consejería de Educación de la Comunidad de Madrid (Excelecon project). The second author would like to thank the Institute for Economic Development at Boston University, in which she conducted part of this research as a visiting scholar. Address for correspondence: Jesús M. Carro. Departamento de Economía, Universidad Carlos III de Madrid, Madrid 28903, Getafe (Madrid, Spain). E-mail addresses: jcarro@eco.uc3m.es; atraferri@uc.cl.

1 Introduction

Self-assessed health (SAH) has been used as a proxy for true overall individual health status in many socioeconomic studies. Moreover, it has been shown to be a good predictor of mortality and of demand for medical care (see, for example, van Doorslaer, Jones, and Koolman, 2004). Motivated by this and by the high observed persistence in health outcomes, Contoyannis, Jones and Rice (2004) study the dynamics of SAH and the effects of socioeconomic variables on it in the British Household Panel Survey. Among other aims, they investigate the relative contribution of state dependence and unobserved heterogeneity in explaining the observed persistence in SAH. State dependence may arise for structural reasons, such as differing abilities to deal with new health shocks depending on previous health status, or a willingness to invest in health that changes as health status evolves. For example, people may be less prone to invest in their health after a health shock that lowers their returns to that investment. In any case, as with labor force participation, regardless of the underlying explanations for state dependence, knowing its magnitude is relevant for many health policy debates.
This is because state dependence is informative about the long-run implications of a policy affecting health status today.

Given that SAH is a categorical variable, Contoyannis, Jones and Rice (2004) use a dynamic ordered probit model, and they take a random effects approach to control for unobserved heterogeneity in the level equation. Halliday (2008) studies the relative contribution of state dependence and unobserved heterogeneity in SAH using a different data set and another random effects approach. Halliday (2008) only includes age as a covariate, as the study focuses on the evolution of health over the life-cycle.

We account for heterogeneity in reporting behavior (cut-point shifts) in addition to heterogeneous unobserved factors that affect health status (index shifts). An example of a source of index shifts is genetic traits. Cut-point shifts occur if individuals use different thresholds to assess their health and report different values of SAH even though they have the same level of true health.¹ Since we can only identify differences up to scale in discrete choice models, we cannot separately identify the two sources of heterogeneity. We can, nonetheless, correctly control for both sources of heterogeneity by including individual effects in the levels and the cut points of the ordered probit. A model with only one individual effect (usually placed in the index equation) allows for both sources of heterogeneity too, but so restrictively that it almost always gives incorrect estimates and inferences if both sources are present and relevant.

¹ See Lindeboom and van Doorslaer (2004) for a test that shows evidence of the existence of these two different kinds of shifts.

As with one individual effect, we could take a ‘random effects’ approach. However, this approach has the drawback of imposing either independence, or a specific and potentially too restrictive functional form, on the relation between unobserved heterogeneity and other explanatory variables.
It also has the drawback of having to deal with the so-called initial conditions problem. By taking a ‘fixed effects’ approach, we place no restrictions on the joint distribution of the two individual effects and their correlation with explanatory variables. Moreover, there is no initial conditions problem.

Despite these advantages, there have been very few applications of nonlinear panel models with fixed effects in health economics, as noted in Jones’s (2007) handbook chapter.² This is due to the known problems in estimating nonlinear panel data models with fixed effects given the panel data sets available. This estimation problem is usually called the incidental parameters problem, and it results in large finite sample biases of the MLE when using panels where T is not very large. It is more severe in a model like ours that is dynamic and contains more than one fixed effect. An important part of the research in microeconometrics has been concerned with finding a solution to this problem by developing bias-adjusted methods. Some examples are Hahn and Newey (2004), Hahn and Kuersteiner (2004), Arellano and Hahn (2006), Carro (2007), Fernandez-Val (2009), and Bester and Hansen (2009).³

This fast growing literature offers several bias correction methods that are potentially useful for estimating our model. Bester and Hansen (2009) include an application of their so-called HS estimator to a dynamic ordered probit model with two fixed effects. The HS estimator is therefore directly applicable to our problem, whereas others require some transformation to adapt them to our model with two fixed effects. However, simulations of other models in the cited papers suggest that HS is not the best one in terms of finite sample performance. They show that for sample sizes with T less than fourteen, the remaining bias when using HS could still be significant, especially for the ordered probit that Bester and Hansen (2009) simulate. This result is confirmed in our simulations, which are more specific to the model we want to estimate.
Thus, we have to consider another of the proposed methods. In this paper we derive explicit formulas of the Modified MLE (MMLE) used in Carro (2007) for the dynamic ordered probit model considered here. We evaluate its finite sample performance and compare it with the HS penalty estimator.⁴ The MMLE has better finite sample properties and negligible bias in our sample size. This exercise is a main contribution of this paper since, as Arellano and Hahn (2007) point out in their conclusions, more research is needed to know “how well each of the methods recently proposed work for other specific models and data sets of interest in applied econometrics.” Also, Greene and Hensher (2008) comment on the lack of studies about the applicability to ordered choice models of the recent proposals for bias reduction estimators in binary choice models.

² Jones and Schurer (2009) is a recent example of using the fixed effects approach to study SAH; however, they use the Conditional MLE of Chamberlain (1980), which does not provide information about the distribution of the fixed effects. This information is needed to calculate marginal effects, the usual parameters of interest in nonlinear models. Another important difference is that Jones and Schurer (2009) do not allow for dynamics.

³ See Arellano and Hahn (2007) for a good review of this literature, detailed references and a general framework in which the various approaches can be included.

⁴ The MMLE comes from modifying the score of the MLE so that the order of the bias in T is reduced.

The rest of the paper proceeds as follows. Section 2 presents our model of SAH and the data we use, and explains the relation of this paper to other recent papers about SAH. Section 3 presents the estimation problem and the method we propose. We also comment on possible solutions from the bias correction literature for nonlinear panel data models with fixed effects.
We use simulations to evaluate the finite sample performance of different alternatives and to justify the selection of MMLE as our estimator. Section 4 presents the estimation results. The estimates of our model and the comparison with random effects estimates show that there are important state dependence effects, and statistically significant effects of income and other socioeconomic variables. Results also show that flexibly accounting for permanent unobserved heterogeneity matters. Section 5 concludes.

2 Model and Data

2.1 Empirical Model of self-assessed health

We consider the following dynamic panel data ordered probit with fixed effects as a reduced-form model of self-assessed health status (SAH):

\[
h^{*}_{it} = \alpha_i + \rho_{-1}\,\mathbf{1}(h_{i,t-1} = -1) + \rho_{1}\,\mathbf{1}(h_{i,t-1} = 1) + x_{it}'\beta + \varepsilon_{it}, \qquad i = 1, \dots, N,\quad t = 0, \dots, T \tag{1}
\]

where $x_{it}$ is a set of exogenous variables that influence SAH, $\varepsilon_{it}$ is a time- and individual-varying error term which is assumed to be $\varepsilon_{it} \overset{iid}{\sim} N(0, 1)$, and $h^{*}_{it}$ is the latent health. The reported SAH ($h_{it}$), which is what we observe, is determined according to the following thresholds:

\[
h_{it} =
\begin{cases}
-1 & \text{if } h^{*}_{it} < -c_i \\
\phantom{-}0 & \text{if } -c_i < h^{*}_{it} \le 0 \\
\phantom{-}1 & \text{if } h^{*}_{it} > 0
\end{cases}
\tag{2}
\]

where $h_{it} = -1$ corresponds to poor health, $h_{it} = 0$ to fair health and $h_{it} = 1$ to good health. $\alpha_i$ and $c_i$ are the model’s fixed effects; these account for permanent unobserved heterogeneity, both in unobserved factors affecting health and in reporting behaviour, in an unrestricted way, as explained in the introduction.

Note that in addition to the usual scale normalization in discrete choice models (i.e. restricting the variance of $\varepsilon_{it}$ to equal one), here we are also normalizing one of the two cut points to be zero. The somewhat more conventional normalization of setting the intercept in the linear index equal to zero is not available to us because the distribution of the intercept, including its mean, is unrestricted in the fixed effects approach.
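To make the data-generating process in (1)-(2) concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a single scalar regressor, writes the individual lower cut point as $-c_i$ with $c_i > 0$ (the sign convention consistent with the probabilities in Section 3.1), and draws $\alpha_i$ and $c_i$ from arbitrary distributions chosen purely for illustration; the fixed effects approach itself leaves those distributions unrestricted. The function name and the argument names `rho_poor` and `rho_good` (the coefficients on lagged poor and lagged good health) are hypothetical.

```python
import numpy as np


def simulate_sah_panel(n_individuals, n_periods, rho_poor, rho_good, beta, seed=0):
    """Simulate h_it in {-1, 0, 1} from a dynamic ordered probit with two fixed effects."""
    rng = np.random.default_rng(seed)
    # Illustrative (assumed) draws for the two fixed effects: alpha_i shifts the
    # latent index, and c_i > 0 places the individual lower cut point at -c_i.
    alpha = rng.normal(0.5, 1.0, size=n_individuals)
    c = np.exp(rng.normal(0.0, 0.3, size=n_individuals))
    x = rng.normal(size=(n_individuals, n_periods + 1))  # one scalar regressor
    h = np.zeros((n_individuals, n_periods + 1), dtype=int)

    def classify(h_star):
        # Thresholds as in (2): -1 below -c_i, 0 between -c_i and 0, 1 above 0.
        return np.where(h_star > 0.0, 1, np.where(h_star < -c, -1, 0))

    # Period 0 has no lag; drawing it without state dependence is an arbitrary choice.
    h[:, 0] = classify(alpha + beta * x[:, 0] + rng.normal(size=n_individuals))
    for t in range(1, n_periods + 1):
        index = (alpha
                 + rho_poor * (h[:, t - 1] == -1)
                 + rho_good * (h[:, t - 1] == 1)
                 + beta * x[:, t])
        h[:, t] = classify(index + rng.normal(size=n_individuals))
    return h, x, alpha, c


h, x, alpha, c = simulate_sah_panel(1000, 8, rho_poor=-0.5, rho_good=0.5, beta=0.3)
```

Because the error variance is fixed at one and the upper cut point at zero, the only individual-specific quantities in the sketch are `alpha` and `c`, which mirrors the normalization discussed in the text.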
An alternative normalization would be to put the two fixed effects in the two cut points and leave the linear index equation without any intercept. As this discussion of normalization shows, it is not possible to separately identify individual effects that affect only $h^{*}_{it}$ from those that affect the cut points. Therefore, though we control for the two sources of unobserved heterogeneity mentioned above, we can not separate them.

Additionally, having only the fixed effect in the linear index ($\alpha_i$) would also account for heterogeneity in the cut points, but in a very restrictive way. In particular, by introducing only one individual effect ($\alpha_i$), we would be assuming that both sources of unobserved heterogeneity must have effects of opposite signs in $\Pr(h_{it} = -1)$ and $\Pr(h_{it} = 1)$; furthermore, we would be restricting how these two effects differ in magnitude for all individuals. We do not have evidence in favor of these assumptions. Furthermore, given the different sources of the unobserved heterogeneity and the potential relations among them and observable variables, these assumptions are most likely too restrictive, leading to incorrect inference. In contrast with this, by having two fixed effects in (2) we are not imposing any restrictions on the cut-point shifts, nor on the index shift. This constitutes an important difference from previous studies like Contoyannis, Jones and Rice (2004).

In addition to the parameters capturing the effect of heterogeneity, $\beta$ captures the effect of exogenous variables, and $\rho_{-1}$ and $\rho_{1}$ are the parameters that allow for state dependence in this model. Determining the relative importance of state dependence versus permanent unobserved heterogeneity as alternative sources of persistence is crucial since they have very different implications. As explained in the introduction, there are several structural reasons for state dependence.
However, regardless of the reason, state dependence gives the long-run effect of a policy affecting health status today. This is why it is so useful to know its magnitude.

2.2 Data and x variables

We use the British Household Panel Survey (BHPS), a longitudinal survey of private households in Great Britain. It was designed as an annual survey of each adult (16+) member of a representative sample of more than 5,000 households, with approximately 10,000 individual interviews. The same individuals are re-interviewed in successive waves; if they split off from their original households, they are re-interviewed along with all adult members of their new households. Similarly, new adult members joining sample households, and children who have reached the age of 16, become eligible for interview. We use sixteen waves of data (years 1991-2006), and include individuals who gave a full interview. An unbalanced panel of individuals who were interviewed in at least 8 subsequent waves is used. Our sample consists of 76,128 observations from 6,375 individuals.

SAH is defined for waves 1-8 and 10-16 as the response to the question “Compared to people of your own age, would you say your health over the last 12 months on the whole has been: excellent, good, fair, poor, very poor?” In wave 9 the SAH question and categories were reworded. This makes comparison with other waves difficult, and wave 9 is not used in our empirical analysis. The original five SAH categories are collapsed into a three-category variable, creating a new SAH variable that is our dependent variable, with the following codes: poor ($h_{it} = -1$) for individuals who reported either “very poor” or “poor” health; fair ($h_{it} = 0$) for individuals who reported “fair” health; and good ($h_{it} = 1$) for individuals who reported “good” or “excellent” health.

Main Model.
The explanatory variables x that we use in the main model are: three dummy variables representing marital status (Married, Widowed, Divorced/Separated) with Single as the reference category, size of the household (the number of people living in the same household), number of kids in the household, household income, year dummies (excluding the necessary number to avoid perfect collinearity), and a quadratic function of age. The question about SAH that we use to construct our dependent variable asks respondents to compare their health with that of people their own age. However, SAH becomes worse with age in the raw sample data, perhaps indicating that the effect of age on health is not fully discounted by respondents. This can be seen in Table 2.⁵ This is the reason for including age as an explanatory variable. The income variable is the logarithm of equivalised real income, adjusted using the Retail Price Index and equivalised by the McClements scale to adjust for household size and composition; it consists of the sum of non-labour and labour income in the reference year.

Variables that are time-constant and specific to individuals, like the level of education or gender, are not included in the set of explanatory variables because they can not be separately identified from permanent unobserved heterogeneity.⁶ Fixed effects account for these variables as well as for unobserved characteristics, and we can not separate their effects. Sometimes this is seen as a drawback of the fixed effects approach. However, the random effects approach only separately identifies the effect of these variables because of the unrealistic assumption that unobserved characteristics are independent of them (for example, that an unobserved healthy lifestyle is independent of education).
Even with a correlated random effects approach, if correlation is allowed in the style of Mundlak (1978) and Chamberlain (1984) and initial conditions are controlled for following Wooldridge’s (2005) proposal, it is not possible to separately identify the effect of these time-constant variables from the effect of the unobserved factors correlated with them without further assumptions. For instance, Contoyannis, Jones and Rice (2004) follow Wooldridge’s (2005) proposal, and they comment on this impossibility of separating the effect of variables like education from the effect of the unobservables correlated with them.

⁵ See Contoyannis, Jones and Rice (2004) for further discussion on this.

⁶ They are, however, included in the random effects estimation we make for comparison.

Additional Model. In addition to the main model, we estimate a model including variables with information on objective health problems. These variables make part of the otherwise unobserved underlying true health observable, especially persistent health situations. This will help in identifying heterogeneity in reporting behaviour. With this additional model we try to see whether the state dependence that we may find in the main model is still significantly different from zero even after introducing observed persistent determinants of health. These variables are not clean determinants of SAH and are a mix of several components. Therefore they will induce a decrease in the effect of $h_{i,t-1}$ even if we correctly capture and isolate all the state dependence effect in the main model. However, if state dependence is still significantly different from zero, this will provide further evidence of the robustness and importance of dynamics and state dependence in SAH.

The BHPS contains several questions about health problems and health care demand, but many of them can be induced by a self-valuation that might differ from true health as much as SAH does, and in an unobserved way.
For example, the number of visits to the doctor can be determined by the perception of a health problem rather than by a true health problem. To avoid this endogeneity bias, we have selected only those questions that we regard as measuring more objective health situations and that, therefore, are not affected by personal health assessments. We introduce the following variables:

- Health problems: This is a dummy variable, which takes the value 1 if the individual reports at least one of the following permanent health problems or disabilities: arthritis or rheumatism, difficulty in hearing, allergies, asthma, bronchitis, blood pressure, diabetes, migraine or frequent headaches, cancer and stroke, among others.
- Health limits daily activities: This is a dummy variable, which takes the value 1 if the individual answers ‘yes’ to the following question: does your health in any way limit your daily activities, compared to most people of your age? Examples of daily activities included are: doing the housework, climbing stairs, dressing yourself, walking for at least 10 minutes, etc.
- Health limits ability to work: Similar to the previous question.
- Number of days in a hospital as an in-patient in the reference year.
- Finally, we include a dummy variable representing long term sick or disabled, and four other variables for employment status (Self employed, In paid employment, Unemployed, Retired). The category ‘Other’ (which includes looking after family or home, on maternity leave, on a government training scheme, full-time student/at school, and something else) is left as the reference category.

Table 1: Number of individuals that report each category of SAH, by number of times it is reported.

Number      Excellent or good       Fair                    Poor or very poor
of times    Freq. (N)       %       Freq. (N)       %       Freq. (N)       %
0               273       4.28        2076       32.56        4380       68.71
1               170       2.67        1114       17.47         898       14.09
2               182       2.85         867       13.60         367        5.76
3               193       3.03         641       10.05         213        3.34
4               233       3.65         481        7.55         137        2.15
5               273       4.28         376        5.90          99        1.55
6               379       5.95         279        4.38          79        1.24
7               456       7.15         204        3.20          46        0.72
8               665      10.43         145        2.27          47        0.74
9               563       8.83          83        1.30          33        0.52
10              533       8.36          61        0.96          32        0.50
11              495       7.76          19        0.30          16        0.25
12              544       8.53          20        0.31           8        0.13
13              672      10.54           5        0.08           9        0.14
14              744      11.67           4        0.06          11        0.17
Total          6375     100.00        6375      100.00        6375      100.00

For example, 273 in the column Freq. of category ‘Excellent or good’ is the number of individuals that reported ‘Excellent or good’ 0 times in total over the sample period in which they are observed.

Descriptive Statistics. Tables 1, 2 and 3 contain some descriptive statistics of the self-assessed health reported in our sample. The most frequent category is excellent or good, with more than 70% of the answers corresponding to this category. There is high persistence in reported SAH, as can be seen in Table 3, which shows the transition probabilities. In this table, the largest numbers are on the diagonal for all three values of $SAH_{t-1}$. Table 2 presents the variation of SAH across different characteristics and health variables. For example, married or single people respond in the excellent or good health category more frequently than widowed or divorced people. The three objective health measures in Table 2 alter the SAH responses in the expected direction and by a greater magnitude than the socioeconomic variables also presented in the table.

Table 2: Proportion (in %) of each category of SAH by several characteristics

                                 Sample        SAH categories
Characteristic                   proportion    Excellent or good    Fair     Poor or very poor
All                                            73.19                19.39     7.42
By age group
  < 40                           40.17         78.31                16.50     5.19
  40-64                          43.92         72.92                18.91     8.17
  65+                            15.91         61.02                28.02    10.96
By sex
  Male                           46.84         75.35                18.32     6.34
  Female                         53.16         71.29                20.34     8.37
By marital status
  Married                        63.46         74.00                18.86     7.14
  Divorced                        8.92         69.63                19.29    11.08
  Widowed                         6.32         58.84                28.92    12.25
  Single                         21.30         76.52                18.20     5.28
By household size
  1                              13.30         65.57                23.82    10.62
  2                              34.32         71.67                20.51     7.83
  3                              20.20         74.33                18.44     7.24
  4                              21.63         78.30                16.50     5.20
  5+                             10.55         75.10                17.97     6.93
By kids number
  0                              64.12         70.91                20.84     8.25
  1                              15.52         76.70                17.05     6.25
  2                              14.73         78.45                16.19     5.36
  3+                              5.63         75.75                17.74     6.51
Health problems
  Yes                            58.46         60.57                27.26    12.16
  No                             41.54         90.95                 8.32     0.74
Health limits daily activities
  Yes                            13.36         22.49                39.13    38.38
  No                             86.64         81.01                16.35     2.64
Health limits work
  Yes                            16.43         29.85                38.29    31.86
  No                             83.57         81.71                15.68     2.61

Table 3: Sample transition probabilities from SAH in t-1 to SAH in t

                       SAH in t
SAH in t-1             Excellent or good    Fair     Poor or very poor    Total
Excellent or good      85.91                11.84     2.25                100
Fair                   43.22                45.18    11.59                100
Poor or very poor      17.66                31.60    50.74                100
Proportion             72.80                19.67     7.53                100

2.3 Relation to recent papers studying heterogeneity and state dependence in SAH

2.3.1 Relation to Contoyannis, Jones and Rice (2004)

There is a clear connection between this paper and Contoyannis, Jones and Rice (2004): both papers use the British Household Panel Survey to study the dynamics of SAH. Nevertheless, there are several aspects considered in Contoyannis, Jones and Rice (2004) that are not studied here. In particular, that paper contains a more detailed data description and a further discussion of the estimated model; it also addresses other issues, like sample attrition, that are not considered here.⁷ However, our paper complements and adds to Contoyannis, Jones and Rice (2004) in various ways:

(i) We use more periods from the BHPS than they do. They only use the first eight waves because the ninth contains a different question about, and categorization of, SAH. While we drop the 9th wave too, we incorporate the waves after the 9th in our estimation.
Since the specified model includes only one lag of $h_{it}$, we have all the variables we need for the 11th to 16th waves. For the 10th wave we have all the variables but $h_{i,t-1}$, as is the case for the first wave. We treat the 10th wave like an initial observation and condition it out in our likelihood, leaving the probability of that observation totally unrestricted. Contoyannis, Jones and Rice (2004) can not do this because of their way of solving the initial conditions problem and their use of random effects.

(ii) In our model we have two individual specific effects: one in the linear index and one in the cut points. Lindeboom and van Doorslaer (2004) tested for and found clear evidence of different reporting behavior (cut-point shifting) by gender and age. Given that Contoyannis, Jones and Rice (2004) impose homogeneous cut points, they estimate different models by gender to allow for that differing reporting behavior, but they do not allow for unrestricted differences in behavior by age. Although we can not separately identify both sources of unobserved heterogeneity, our approach is robust to heterogeneous cut points freely correlated with any determinant of SAH.

(iii) We use fixed effects instead of a random effects approach. The advantages of this are that no arbitrary restriction is imposed on the correlation between permanent unobserved heterogeneity and the observable variables, and there is no initial conditions problem.

(iv) As an additional complement, our study includes some objective health measures, so we can see how much is explained by socioeconomic variables and by state dependence even after these measures are included.

⁷ An unbalanced panel (with random attrition) in a dynamic panel model does not pose any complications to a fixed effects estimator (as opposed to a random effects estimator), as long as it does not imply many individuals with a very small number of periods; in our sample all individuals are observed for at least 8 periods. However, the assumption of attrition at random seems unrealistic. Contoyannis, Jones and Rice (2004) conducted a test and found evidence of non-random attrition, but they also found that the bias this may cause in the estimates is negligible. Given this result, obtained with the same data set as ours, and since this problem would take us too far from the main theme of this paper, we do not consider it here.

Given the aspects not covered in this paper, and in order to assess the contributions of this paper with respect to the previous literature, we also estimate our models using the same kind of specification and estimation method as Contoyannis, Jones and Rice (2004). Thus, we also estimate (2) using a correlated random effects specification with only an individual effect in the linear index equation (the $\alpha_i$ parameter in (1)), but with homogeneous cut points. Therefore, in this correlated random effects specification:

\[
h_{it} =
\begin{cases}
-1 & \text{if } h^{*}_{it} < c_1 \\
\phantom{-}0 & \text{if } c_1 < h^{*}_{it} \le c_2 \\
\phantom{-}1 & \text{if } h^{*}_{it} > c_2
\end{cases}
\tag{3}
\]

where $c_1$ and $c_2$ are (homogeneous) parameters to be estimated, $h^{*}_{it}$ is defined in (1), and $\alpha_i$ in (1) is assumed to be:

\[
\alpha_i = \alpha_0 + \alpha_1' h_{i1} + \alpha_2' \bar{x}_i + u_i \tag{4}
\]

where $\bar{x}_i$ is the average over the sample period of the exogenous variables, and $u_i \overset{iid}{\sim} N(0, \sigma_u^2)$ independently of everything else. $h_{i1}$ is in (4) to deal with the initial conditions problem following Wooldridge (2005).

2.3.2 Relation to Halliday (2008)

Halliday (2008) studies state dependence and heterogeneity in SAH using data from the Panel Study of Income Dynamics. Since his focus is on the evolution of health over the life-cycle, he only considers age as an explanatory variable. No other socio-economic variable is included. Another difference with our study is that he further reduces health status to two categories, estimating a logit instead of an ordered probit with three categories.
With respect to heterogeneity, on the one hand Halliday (2008) is more flexible because his analysis allows for heterogeneous parameters both in the intercept and in the slope of the dynamic model. On the other hand, the random effects approach he adopts has no incidental parameters problem but restricts the distribution of the heterogeneity and suffers from the initial conditions problem. Nonetheless, Halliday (2008) uses a discrete finite mixture, which is potentially more flexible and less parametric in its treatment of heterogeneity and the initial conditions than the distribution assumed in Contoyannis, Jones and Rice (2004). This greater flexibility comes from the possibility of having many points of support, which should provide an approximation to a variety of distributions, such as asymmetric distributions or distributions with several modes. The limitation of this approach is computational. This limitation leads Halliday (2008) to consider no more than four points of support, even though more points of support might be needed.⁸ Four is certainly more than the two points of support assumed in other applications, but it may not be enough to provide a good approximation to a bivariate distribution. In this paper we have two individual specific effects potentially correlated with each other. Such a bivariate joint distribution may be difficult to approximate with only four points of support. In contrast to these limitations, the fixed effects approach we follow here is non-parametric in the distribution of the heterogeneity, requires no special treatment of the initial conditions, and does not have the same computational limitations as estimating discrete finite distributions. Finally, Halliday (2008) finds evidence of a great amount of heterogeneity in health, which is the most important motivation for following the approach we propose here.
⁸ See section 5.1.2 in Halliday (2008).

2.3.3 Relation to Jones and Schurer (2009)

Another recent paper dealing with self-assessed health measures is Jones and Schurer (2009). They use the German Socio-Economic Panel and focus on the effect of income on health, conducting a detailed analysis of the shape of this effect. Their paper does not consider dynamics in SAH, but it does address the potential importance of unobserved heterogeneity. They control for heterogeneity in both unobserved health status and reporting behaviour by estimating a logit model with fixed effects for each of the $J - 1$ threshold values into which SAH can be dichotomized. However, they estimate it using the Conditional MLE of Chamberlain (1980) because the standard MLE of ordered-choice models with fixed effects suffers from a severe incidental parameters problem and there was no other solution from the panel data literature ready to be applied to these models. Implementing a solution to this problem in the estimation of ordered-choice models with fixed effects is one of the main contributions of our paper. This allows us to estimate marginal effects that properly account for the distribution of unobserved heterogeneity and, especially, for its correlation with observable variables. The Conditional MLE conditions out the fixed effects and, therefore, no information about them is recovered. This means that when calculating marginal effects they have to replace the fixed effect with a value that may not be representative of the population; in any case, this ignores the correlation between the observables and the heterogeneity.

Jones and Schurer (2009) also estimate a random effects model that assumes independence of the heterogeneity. Comparing this with the fixed effects estimates, they find that the underlying assumptions of the statistical model matter for assessing the link between income and health.
This finding provides additional support in favor of estimating a model that makes no assumptions about the distribution of the heterogeneity.

3 Estimation Method

3.1 Estimation problem and possible solutions

From (1), (2) and the normality assumption about $\varepsilon_{it}$, we have that

$$\Pr(h_{it} = -1 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = 1 - \Phi(c_i + \mu_{it})$$
$$\Pr(h_{it} = 0 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = \Phi(c_i + \mu_{it}) - \Phi(\mu_{it}) \tag{5}$$
$$\Pr(h_{it} = 1 \mid x_{it}, h_{it-1}, c_i, \alpha_i) = 1 - \Pr(h_{it} = -1 \mid \cdot) - \Pr(h_{it} = 0 \mid \cdot) = \Phi(\mu_{it}) \tag{6}$$

where

$$\mu_{it} = \alpha_i + \rho_1 1(h_{i,t-1} = 1) + \rho_{-1} 1(h_{i,t-1} = -1) + x_{it}'\beta \tag{7}$$

Conditioning on the first observation $h_{i0}$, the log-likelihood is:

$$l(\rho_1, \rho_{-1}, \beta, \alpha, c) = \sum_{i=1}^{N} \sum_{t=1}^{T} \Big\{ 1\{h_{it} = -1\} \log\left[1 - \Phi(c_i + \mu_{it})\right] + 1\{h_{it} = 0\} \log\left[\Phi(c_i + \mu_{it}) - \Phi(\mu_{it})\right] + 1\{h_{it} = 1\} \log\left[\Phi(\mu_{it})\right] \Big\} \tag{8}$$

The standard MLE of models like (2) is known to be biased when, as here, the number of periods is not large. The MLE is inconsistent when $T$ does not go to infinity because the fixed effects act as incidental parameters. Furthermore, existing Monte Carlo experiments with dynamic nonlinear models show that the MLE has a large bias. In fact, simulations of a dynamic ordered probit in Bester and Hansen (2009) and the simulations in the following sections show that the bias is non-negligible even with $T$ as large as 20.

As mentioned in the introduction, several recently developed bias-correction methods could overcome this problem. Arellano and Hahn (2007) summarize the different approaches. The methods can be grouped into three approaches based on the object that is corrected. The first approach is to construct an analytical or numerical bias correction of a fixed effects estimator. Fernandez-Val (2009), among others, takes this approach and applies his analytical bias correction to dynamic binary choice models. The second approach is to correct the bias in the moment equations. An example of this is Carro (2007), who uses an estimator of this type to correct the bias in dynamic binary choice models. The third approach is to correct the objective function.
Arellano and Hahn (2006) and Bester and Hansen (2009) take this approach, with the latter including an application to a dynamic ordered probit model. The HS-penalty estimator studied in Bester and Hansen (2009) is the first option we consider because our model is also a dynamic ordered probit, and because alternative approaches require transformations or derivations. This estimator also has the advantage of being easier to compute than the Modified MLE in Carro (2007) and the Bias Correction in Fernandez-Val (2009), because the HS does not require the calculation of expectations and the other two do. This advantage is more relevant in our case, because our model has two fixed effects.

Arellano and Hahn (2007) show how the different approaches are related. Asymptotically, all the approaches reduce the order of the bias of the MLE from the standard $O(T^{-1})$ to $O(T^{-2})$ for the general classes of models for which they were developed. However, there may be differences when they are applied to specific cases. The following very simple example, used in Carro (2007), Arellano and Hahn (2007), and Bester and Hansen (2009), illustrates this point. Consider the model where $y_{it} \overset{iid}{\sim} N(\alpha_i, \sigma_0^2)$. The ML estimator of $\sigma_0^2$ is $\hat{\sigma}^2_{MLE} = \frac{1}{NT} \sum_i \sum_t (y_{it} - \hat{\alpha}_i)^2$. It is well known that $\hat{\sigma}^2_{MLE}$ is not a consistent estimator of $\sigma_0^2$ when $N \to \infty$ with fixed $T$, since it converges to $\frac{T-1}{T} \sigma_0^2$. In this case the whole problem is very easy to fix: $\frac{1}{N(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \hat{\alpha}_i)^2$ is the fixed-$T$ consistent estimator of $\sigma_0^2$. The MMLE from Carro (2007) produces this very same estimator, correcting not only the $O(T^{-1})$ term of the bias, but all the asymptotic bias in this special example. The HS removes the $O(T^{-1})$ term of the bias, but it does not attain the fixed-$T$ consistent estimator. The one-step bias correction to the ML estimator from Fernandez-Val (2009) does not produce a fixed-$T$ consistent estimator either, but its iterated form does.
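The simple example above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it simulates $y_{it} \sim N(\alpha_i, \sigma_0^2)$ with many individuals and few periods and compares the ML estimator of $\sigma_0^2$ with the degrees-of-freedom corrected one. The function name, the sample sizes, the seed and the standard normal distribution chosen for $\alpha_i$ are illustrative assumptions.

```python
import random


def neyman_scott_variances(n_individuals, n_periods, sigma2=1.0, seed=0):
    """Return (ML estimate, fixed-T consistent estimate) of sigma2.

    The individual means are drawn from N(0, 1); this is an arbitrary
    illustrative choice, since both estimators are invariant to it.
    """
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    ssr = 0.0  # sum of squared deviations from the individual sample means
    for _ in range(n_individuals):
        alpha_i = rng.gauss(0.0, 1.0)
        y = [rng.gauss(alpha_i, sd) for _ in range(n_periods)]
        alpha_hat = sum(y) / n_periods  # MLE of the individual effect
        ssr += sum((v - alpha_hat) ** 2 for v in y)
    mle = ssr / (n_individuals * n_periods)
    corrected = ssr / (n_individuals * (n_periods - 1))
    return mle, corrected


# With T = 4 the ML estimate should be close to (T - 1) / T = 0.75 times the
# true variance, while the corrected estimate should be close to 1.
mle, corrected = neyman_scott_variances(n_individuals=20000, n_periods=4)
print(round(mle, 3), round(corrected, 3))
```

Increasing `n_individuals` does not move the ML estimate towards the true value, which is the incidental parameters problem in its simplest form; only increasing `n_periods` does.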
Thus, differences may appear between the different approaches when applied to specific models. On the other hand, the incidental parameters problem can be seen as a finite sample bias problem in a panel data context. The problem is not important when $T$ is large relative to $N$. However, since our panel does not have a large number of periods, it is reasonable to wonder whether the excellent asymptotic properties of the MLE when $T$ goes to infinity (sufficiently fast) are a good approximation in our finite sample. Simulations show that we would need panels with many more time periods than are usually found in practice. The relevant implication is that we have to examine the finite sample performance of the estimators for our model and sample sizes. For the methods considered here this is done through Monte Carlo experiments.

Bester and Hansen (2009) do not compare the finite sample properties of the method they use with others for the ordered probit case, because many of the other methods require some derivation to get the specific correction for this case. However, they make such a comparison using binary choice (probit and logit) models. Also, Carro (2007) and Fernandez-Val (2009) conduct Monte Carlo experiments for logit and probit models with different sample sizes (both in $T$ and $N$), allowing us to compare a wide range of methods for those models. From these comparisons we can conclude that the HS penalty approach is not the best one, and that for sample sizes with $T$ smaller than 13 the remaining bias can still be significant. Given this result, we consider another of the proposed methods to estimate our ordered probit and evaluate its finite sample properties. Interesting candidates are the corrections discussed by Fernandez-Val (2009) and Carro (2007), since both are superior to the other alternatives in finite sample performance in the relevant existing comparisons.
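As a concrete reference for the objects being corrected, the conditional log-likelihood in (8) can be written out for one individual. This is only a sketch under stated assumptions: a single scalar covariate, outcomes coded as -1 (poor), 0 (fair) and 1 (good), and the sign conventions as read from (5) to (7). The function and argument names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def outcome_probs(mu, c):
    """Probabilities of h = -1, 0, 1 given the index mu and the shift c > 0."""
    p_poor = 1.0 - norm_cdf(c + mu)
    p_fair = norm_cdf(c + mu) - norm_cdf(mu)
    p_good = norm_cdf(mu)
    return {-1: p_poor, 0: p_fair, 1: p_good}


def individual_loglik(h, x, beta, rho_good, rho_poor, alpha_i, c_i):
    """Log-likelihood of one individual, conditional on the first outcome h[0].

    h: outcomes in {-1, 0, 1}, with h[0] the initial observation.
    x: scalar covariate values, same length as h (assumption: one regressor).
    """
    total = 0.0
    for t in range(1, len(h)):
        mu = (alpha_i
              + rho_good * (h[t - 1] == 1)
              + rho_poor * (h[t - 1] == -1)
              + beta * x[t])
        total += math.log(outcome_probs(mu, c_i)[h[t]])
    return total


# Example: previous state fair, index equal to zero, so Pr(good) = 0.5.
example = individual_loglik([0, 1], [0.0, 0.0], beta=1.0, rho_good=0.5,
                            rho_poor=-0.5, alpha_i=0.0, c_i=1.0)
```

Summing `individual_loglik` over individuals gives the sample log-likelihood. A practical implementation would guard against probabilities that underflow to zero before taking logs.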
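In the next subsections we derive explicit formulas of the modified MLE used in Carro (2007) for the model considered here, evaluate its finite sample performance, and compare it with the HS penalty estimator.

3.2 MMLE for a dynamic ordered probit with two fixed effects

The model we want to estimate is defined in (1) and (2), and its log-likelihood is (8). Let $\theta = (\beta, \rho_1, \rho_{-1})$ and $\boldsymbol{\alpha}_i = (\alpha_i, c_i)$. Partial derivatives are denoted by the letter $d$, so the first order conditions are $d_{\theta i}(\theta, \boldsymbol{\alpha}_i) \equiv \frac{\partial l_i(\theta, \boldsymbol{\alpha}_i)}{\partial \theta}$ and $d_{\boldsymbol{\alpha} i}(\theta, \boldsymbol{\alpha}_i) \equiv \frac{\partial l_i(\theta, \boldsymbol{\alpha}_i)}{\partial \boldsymbol{\alpha}_i}$. Bold letters represent vectors. The MLE of $\boldsymbol{\alpha}_i$ for given $\theta$, $\hat{\boldsymbol{\alpha}}_i(\theta)$, solves $d_{\boldsymbol{\alpha} i}(\theta, \boldsymbol{\alpha}_i) = 0$. The MLE of $\theta$ is obtained by maximizing the concentrated log-likelihood $\sum_{i=1}^{N} l_i(\theta, \hat{\boldsymbol{\alpha}}_i(\theta))$, i.e. by solving the following first order condition:

$$\frac{1}{TN} \sum_{i=1}^{N} d_{\theta i}(\theta, \hat{\boldsymbol{\alpha}}_i(\theta)) = 0 \tag{9}$$

where $d_{\theta i}(\theta, \hat{\boldsymbol{\alpha}}_i(\theta)) = \left. \frac{\partial l_i(\theta, \boldsymbol{\alpha}_i)}{\partial \theta} \right|_{\boldsymbol{\alpha}_i = \hat{\boldsymbol{\alpha}}_i(\theta)}$.

To reduce the bias of the estimation, we follow Carro (2007) in modifying the score of the concentrated log-likelihood by adding a term that removes the first order term of the asymptotic bias in $T$. By doing so, we get that the MMLE of the parameters of model (2) is the value that solves the following score equation:

$$
\begin{aligned}
d_{M \theta i}(\theta) ={}& d_{\theta i}(\theta, \hat{\boldsymbol{\alpha}}_i(\theta)) - \frac{1}{2} \, \frac{1}{d_{\alpha\alpha i} d_{cc i} - d_{\alpha c i}^2} \Bigg[ d_{cc i} \left( d_{\theta\alpha\alpha i} + d_{\alpha\alpha\alpha i} \frac{\partial \hat{\alpha}_i}{\partial \theta} + d_{\alpha\alpha c i} \frac{\partial \hat{c}_i}{\partial \theta} \right) \\
& - 2 d_{\alpha c i} \left( d_{\theta\alpha c i} + d_{\alpha\alpha c i} \frac{\partial \hat{\alpha}_i}{\partial \theta} + d_{\alpha c c i} \frac{\partial \hat{c}_i}{\partial \theta} \right) + d_{\alpha\alpha i} \left( d_{\theta c c i} + d_{\alpha c c i} \frac{\partial \hat{\alpha}_i}{\partial \theta} + d_{c c c i} \frac{\partial \hat{c}_i}{\partial \theta} \right) \Bigg] \\
& - \left. \frac{\partial}{\partial \alpha_i} \left( \frac{E(d_{\theta c i}) E(d_{\alpha c i}) - E(d_{cc i}) E(d_{\theta\alpha i})}{E(d_{\alpha\alpha i}) E(d_{cc i}) - [E(d_{\alpha c i})]^2} \right) \right|_{\boldsymbol{\alpha}_i = \hat{\boldsymbol{\alpha}}_i(\theta)} \\
& - \left. \frac{\partial}{\partial c_i} \left( \frac{E(d_{\theta\alpha i}) E(d_{\alpha c i}) - E(d_{\alpha\alpha i}) E(d_{\theta c i})}{E(d_{\alpha\alpha i}) E(d_{cc i}) - [E(d_{\alpha c i})]^2} \right) \right|_{\boldsymbol{\alpha}_i = \hat{\boldsymbol{\alpha}}_i(\theta)} = 0
\end{aligned}
\tag{10}
$$

where $d_{\theta i}(\theta, \hat{\boldsymbol{\alpha}}_i(\theta))$ is the standard first order condition from the concentrated log-likelihood, as in (9), $d_{\alpha c i} = \frac{\partial^2 l_i}{\partial \alpha_i \partial c_i}$, $d_{\alpha\alpha i} = \frac{\partial^2 l_i}{\partial \alpha_i^2}$, $d_{\theta\alpha c i} = \frac{\partial^3 l_i}{\partial \theta \partial \alpha_i \partial c_i}$, and so on. From the first order conditions of $\alpha_i$ and $c_i$ we obtain $\hat{\alpha}_i(\theta)$ and $\hat{c}_i(\theta)$, as is done in order to concentrate the log-likelihood. All expectations are conditional on the same set of information as the likelihood.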
These expectations can be computed by conditioning recursively, as we do to write the conditional likelihood. The parametric model (equations (1), (2) and the assumption about $\varepsilon_{it}$) from which we write the likelihood also gives the parametric form of the expectations we need to calculate.9

We show in Appendix A how this modification of the score of the concentrated log-likelihood in (10) is a first order adjustment on the asymptotic bias of the ML score, so the first order condition is more nearly unbiased and the order of the bias of the estimator is reduced from $O(T^{-1})$ to $O(T^{-2})$. Furthermore, the bias is corrected without changing the asymptotic variance of the MLE.

3.3 Simulations

3.3.1 First DGP: Performance for different T

We simulate the model in equations (1) and (2) with the following values of the parameters and Data Generating Process (DGP): $\beta = 1$, $\rho_1 = 0.5$, and $\rho_{-1} = -0.5$. The error follows a normal distribution: $\varepsilon_{it} \sim N(0, 1)$. The fixed effects are constructed as follows:

$$\alpha_i = \frac{1}{2} \sum_{t=1}^{4} x_{it} + u_i, \quad \text{where } u_i \sim N(x_{i0}, 1) \tag{11}$$
$$c_i = |z_i|, \quad \text{where } z_i \sim N(x_{i0}, 1) \tag{12}$$

so that they are correlated with the explanatory variables. This correlation of the unobserved heterogeneity with the covariates makes the problem more severe than in the independence case. We study the performance of the estimators under this condition because we consider it to be more realistic.10 $x_{it}$ follows a Gaussian AR(1) with autoregressive parameter equal to 0.5. Initial conditions are $x_{i0} \sim N(0, 1)$ and $h^{*}_{i0} = \alpha_i + \beta_0 x_{i0} + \varepsilon_{i0}$. We perform 1000 replications, with a population of $N = 250$ individuals. For each simulation we estimate the MLE, the MMLE given by equation (10) and the HS estimator defined in Bester and Hansen (2009).
That is, the HS estimator is the value of the parameters that maximizes the following penalized objective function:

$$\sum_{i=1}^{N} l_i(\beta, \rho_1, \rho_{-1}, \alpha_i, c_i) - \sum_{i=1}^{N} \left[ \frac{1}{2} \, \mathrm{trace}\left( \hat{I}^{-1}_{\tilde{\alpha}_i} \hat{V}_{\tilde{\alpha}_i} \right) - \frac{k}{2} \right] \tag{13}$$

where $l_i$ is the log-likelihood of individual $i$, $\hat{I}_{\tilde{\alpha}_i}$ is the sample information matrix for $\tilde{\alpha}_i = (\alpha_i, c_i)'$, $\hat{V}_{\tilde{\alpha}_i}$ is a HAC estimator of $Var\left( \frac{1}{\sqrt{T}} \frac{\partial l_i}{\partial \tilde{\alpha}_i} \right)$, and $k = \dim(\tilde{\alpha}_i)$. This penalty term is easier to calculate than the modification of the score in (10) because the penalty does not involve any expectation.

9 Appendix B gives some indications about computing the MMLE.
10 In the simulations of an ordered probit in Bester and Hansen (2009) the fixed effects are independent of the covariates. We have simulated and compared MMLE and HS in this case too. As said, the bias is smaller for all $T$, but the conclusions from the comparison between MMLE and HS are the same as in the dependence case. Since the latter is more relevant in practice we do not report the independence case.

Table 4: Monte Carlo Results. Dynamic Ordered Probit parameters

Parameter           $\beta$               $\rho_1$              $\rho_{-1}$
True value             1                    0.5                   -0.5
Estimator     Mean Bias   RMSE      Mean Bias   RMSE      Mean Bias   RMSE
T = 4
MLE             0.816     0.828       0.474     0.516       0.551     0.586
HS              0.796     0.809       0.392     0.443       0.467     0.509
MMLE            0.172     0.182       0.254     0.282       0.280     0.305
T = 8
MLE             0.335     0.341       0.188     0.216       0.189     0.216
HS              0.247     0.254       0.115     0.153       0.119     0.154
MMLE            0.073     0.086       0.062     0.108       0.067     0.109
T = 10
MLE             0.257     0.263       0.145     0.171       0.154     0.179
HS              0.170     0.178       0.083     0.119       0.093     0.127
MMLE            0.052     0.067       0.036     0.086       0.050     0.093
T = 12
MLE             0.210     0.215       0.127     0.152       0.127     0.151
HS              0.127     0.134       0.072     0.106       0.074     0.106
MMLE            0.040     0.054       0.030     0.079       0.036     0.081
T = 16
MLE             0.154     0.159       0.093     0.118       0.096     0.119
HS              0.081     0.088       0.048     0.083       0.054     0.085
MMLE            0.026     0.041       0.017     0.068       0.022     0.069
T = 20
MLE             0.122     0.127       0.072     0.095       0.078     0.101
HS              0.058     0.065       0.034     0.067       0.042     0.074
MMLE            0.019     0.034       0.009     0.058       0.016     0.062

Note: See a detailed description of the model simulated and other characteristics of the DGP in subsection 3.3.
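The first DGP of subsection 3.3.1 can be sketched for one individual as follows. This is an illustration under several assumptions that the text does not state explicitly: the AR(1) innovations of $x_{it}$ have unit variance, the initial outcome is obtained by applying the same cut points to the latent initial index, the outcomes are coded -1, 0, 1 with cut points $-c_i$ and 0 as implied by (5) and (6), and the weight on the sum of covariates in $\alpha_i$ is the one read from (11). All names are hypothetical.

```python
import random


def simulate_individual(n_periods, beta=1.0, rho_good=0.5, rho_poor=-0.5,
                        rng=None):
    """Simulate covariates, outcomes and fixed effects for one individual.

    Returns (x, h, alpha_i, c_i), where x and h have n_periods + 1 entries
    and position 0 holds the initial condition. Requires n_periods >= 4.
    """
    if n_periods < 4:
        raise ValueError("the fixed effect uses x_i1, ..., x_i4")
    rng = rng or random.Random()

    # Covariate: Gaussian AR(1) with coefficient 0.5 (unit innovation variance
    # is an assumption of this sketch).
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n_periods):
        x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))

    # Fixed effects correlated with the covariates, as in (11) and (12).
    alpha_i = 0.5 * sum(x[1:5]) + rng.gauss(x[0], 1.0)
    c_i = abs(rng.gauss(x[0], 1.0))

    def categorize(latent):
        if latent < -c_i:
            return -1  # poor
        return 1 if latent >= 0.0 else 0  # good, otherwise fair

    # Initial outcome from the latent initial index (assumed mapping).
    h = [categorize(alpha_i + beta * x[0] + rng.gauss(0.0, 1.0))]
    for t in range(1, n_periods + 1):
        mu = (alpha_i + rho_good * (h[-1] == 1)
              + rho_poor * (h[-1] == -1) + beta * x[t])
        h.append(categorize(mu + rng.gauss(0.0, 1.0)))
    return x, h, alpha_i, c_i


# One replication with N = 250 individuals and T = 8 periods.
rng = random.Random(12345)
panel = [simulate_individual(8, rng=rng) for _ in range(250)]
```

Each replication of the experiment would draw a new `panel` and then compute the three estimators on it; the estimators themselves are not sketched here.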
Results from this experiment for different $T$ are reported in Table 4, which shows the mean bias and the Root Mean Squared Error (RMSE). We find that, for all $T$, the MMLE performs much better than the other two estimators. Comparing it with the HS, the differences are greater for $T = 4$ and $T = 8$, where the HS is closer to the MLE than to the MMLE. When using the MMLE, the bias is smaller than 10% of the true values with $T = 10$ for all but one of the parameters. With $T = 12$ the bias when using the MMLE is already negligible, whereas the HS has biases and RMSE larger than those of the MMLE with $T = 10$. Even with $T = 16$ the HS exhibits mean biases greater than those of the MMLE with $T = 10$. It is not until $T = 20$ that the HS has small biases and RMSE. So the HS needs more periods (more than 16) to have small finite sample biases. Given this, and the fact that the numbers of periods we have in our empirical analysis do not exceed $T = 14$, we use the MMLE.

The reason for this better performance of the MMLE is the use of the specific structure of the model we want to estimate, which translates into the expectations in the modification term. The likelihood includes the fact that we know the distribution of one of the explanatory variables: the lag of the dependent variable. This distribution is, of course, that of the dependent variable in the previous period. Therefore, we write the likelihood recursively for each period (conditional on the previous period) up to the likelihood of the initial condition. This is used in the modification, so it includes expectations that use the known distribution of $h_{it-1}$ conditional on $h_{it-2}$. The HS is generally written so that it does not make any intensive use of a specific likelihood and it does not include such expectations. Therefore the HS does not exploit all the information that our specification provides and it requires more periods to attain the same performance as the MMLE.
This confirms the idea expressed in Bester and Hansen (2009) that the simplicity of the HS (due to not having to calculate expectations) may not be free and could lead to a worse performance than other approaches.

3.3.2 Quality of inference

We consider the quality of inference in finite samples based on these estimators. Table 5 presents the coverage of 95% confidence intervals and the estimated asymptotic standard errors divided by the standard deviation. The latter is very close to 1 in all cases for the MMLE and in most cases for the other estimators, which indicates that the variance is estimated well and the problem is the bias. This corresponds with the fact that we are correcting a bias without altering the asymptotic variance. With respect to inference, the coverage of the confidence intervals is extremely poor for the MLE, especially for $\beta$. Even with $T = 20$, the coverage for $\beta$ is smaller than 3%. The HS estimator improves inference with respect to the MLE, but it is still too far from the theoretical coverage of 95%, the coverage for $\beta$ being especially bad even with $T = 20$. As with the bias and RMSE criteria, the MMLE is clearly the best of these three estimators for doing inference, for all periods and parameters.

Table 5: Monte Carlo Results. Inference on Dynamic Ordered Probit parameters: confidence interval coverage and estimation of the standard error.

Parameter           $\beta$                $\rho_1$               $\rho_{-1}$
True value             1                     0.5                    -0.5
Estimator     % Coverage            % Coverage            % Coverage
              C.I. 95%    SE/SD     C.I. 95%    SE/SD     C.I. 95%    SE/SD
T = 8
MLE              0%       0.85        47%       0.87        48%       0.90
HS               0%       0.86        74%       0.91        73%       0.94
MMLE            64%       1.02        87%       0.93        85%       0.96
T = 10
MLE              0%       0.81        54%       0.91        53%       0.91
HS             3.5%       0.83        82%       0.96        78%       0.95
MMLE            74%       0.94        90%       0.96        89%       0.96
T = 12
MLE              0%       0.89        58%       0.91        62%       0.93
HS             8.8%       0.92        85%       0.96        83%       0.98
MMLE            81%       1.00        92%       0.95        92%       0.97
T = 16
MLE              0%       0.92        69%       0.91        68%       0.94
HS              29%       0.95        88%       0.96        88%       0.99
MMLE            88%       1.00        93%       0.94        93%       0.96
T = 20
MLE              2%       0.90        77%       0.96        73%       0.94
HS              48%       0.93        91%       1           88%       0.98
MMLE            90%       0.97        95%       0.98        93%       0.95

Note: This is for the simulation experiment in Table 4. We have used the inverse of the Hessian as estimator of the variance.

3.3.3 Performance for different degrees of persistence

To check whether the results are maintained under different state dependence scenarios, we present simulations for different values of $\rho_1$ and $\rho_{-1}$, with $T = 10$, in Table 6. The DGP is the same as that of Table 4 except for the values of $\rho_1$ and $\rho_{-1}$. Here the state dependence changes from very negative to very positive, including the case with no state dependence. In terms of bias and RMSE, we find that the MMLE performs better than the other methods in all cases. In principle, having a more negative state dependence may improve all the estimators, since it induces higher variance in $h_{it}$. This is the case for the estimation of $\beta$, where the three estimation methods improve, but it is not the case for the estimation of $\rho_1$ and $\rho_{-1}$, where the MMLE improves but the MLE and HS get worse.

Table 6: Monte Carlo Results. Dynamic Ordered Probit parameters with different degrees of state dependence

Parameter           $\beta$               $\rho_1$              $\rho_{-1}$
Estimator     Mean Bias   RMSE      Mean Bias   RMSE      Mean Bias   RMSE
True value             1                    -1                      1
MLE             0.204     0.212       0.264     0.284       0.244     0.265
HS              0.105     0.116       0.094     0.136       0.087     0.130
MMLE            0.012     0.044       0.008     0.089       0.003     0.086
True value             1                   -0.5                    0.5
MLE             0.212     0.218       0.214     0.235       0.206     0.227
HS              0.116     0.126       0.079     0.119       0.078     0.119
MMLE            0.026     0.048       0.018     0.083       0.018     0.083
True value             1                     0                      0
MLE             0.227     0.233       0.180     0.201       0.180     0.201
HS              0.136     0.144       0.079     0.116       0.084     0.119
MMLE            0.037     0.055       0.028     0.082       0.032     0.084
True value             1                    0.5                   -0.5
MLE             0.257     0.263       0.145     0.171       0.154     0.179
HS              0.170     0.178       0.083     0.119       0.093     0.127
MMLE            0.052     0.067       0.036     0.086       0.050     0.093
True value             1                     1                     -1
MLE             0.297     0.303       0.105     0.144       0.111     0.148
HS              0.215     0.222       0.086     0.126       0.091     0.129
MMLE            0.065     0.078       0.057     0.100       0.069     0.107

Note: 1000 Monte Carlo simulations of the Ordered Probit model in equations (1) and (2), following the same DGP as in Table 4 (described at the beginning of section 3.3), but changing the value of the state dependence parameters from negative to positive, including the case with no state dependence. $T = 10$.

3.3.4 Simulations based on real data

Finally, we perform a simulation based on the real data used in this paper. This provides further evidence about the finite sample performance of the MMLE and gives more robustness to our choice of estimator. The DGP takes the estimates obtained by MMLE and reported in Table 8 as the true model. It takes the real data for all the individuals used in that estimation and all the significant $x$ variables except the time dummies. This means that in this DGP $x_{it}$ is a vector containing observations of the following variables: age, squared age, household size, number of kids, and income. The true values of the parameters are: $\rho_1 = 0.4875$, $\rho_{-1} = -0.4375$, $\beta' = (0.0205, -0.0005, -0.0388, 0.0472, 0.0396)$. $N = 1739$, $T$ is the same as in our data (i.e. between 8 and 14 periods), and $\varepsilon_{it} \sim N(0, 1)$. $\alpha_i$ and $c_i$ are the estimates of these parameters by MMLE. The distributions of these two parameters can be seen in Figure 1. The distribution of $\alpha_i$ is not normal and is correlated with $c_i$ (the correlation coefficient between $\alpha_i$ and $c_i$ is -0.33). Thus, the distribution of unobserved heterogeneity is not an arbitrary and statistically convenient distribution, but an empirically founded distribution that captures both real correlations with the covariates and correlations between the fixed effects. These correlations and distributions of $\alpha_i$ and $c_i$ are richer than those in the previous simulation experiments. Furthermore, this is the relevant DGP to compare the proposed strategy for dealing with unobserved heterogeneity with the random effects approach previously used in the literature.
Making this comparison with an arbitrarily chosen DGP may imply an assumption that is too favorable to the random effects, as in our first DGP, or one that is arbitrarily too unfavorable. However, this case is the relevant case for our empirical analysis. For the reasons discussed, we evaluate the finite sample performance of the random effects approach (CRE) described at the end of section 2.3.1, in addition to the MLE, HS, and MMLE. To make the comparison as close as possible to the estimators used in practice, we include the following constant variables as covariates when estimating by random effects: gender, race, and education indicators. These are implicitly included in the DGP through the estimated $\alpha_i$ and $c_i$, since in the fixed effects model these variables can not be separately identified from the fixed effects.

Table 7: Monte Carlo Results. DGP based on the real data used in the empirical analysis.

              $\rho_{-1}$   $\rho_1$     Age       Age$^2$    Household   Number    Household
                                                                 size      of Kids    Income
True value     -0.4375      0.4875     0.0205    -0.0005     -0.0388     0.0472     0.0396
Mean Bias
CRE             0.0945      0.0459     0.0002     0.00006    -0.0080     0.0095    -0.0003
MLE             0.2039     -0.1239     0.0061    -0.00016    -0.0078     0.0121     0.0063
HS              0.1288     -0.0474     0.0044    -0.00010    -0.0049     0.0077     0.0033
MMLE            0.0437      0.0090     0.0029    -0.00006    -0.0030     0.0046     0.0016
Root Mean Squared Error
CRE             0.1041      0.0603     0.0265     0.00021     0.0406     0.0512     0.0348
MLE             0.2066      0.1272     0.0113     0.00018     0.0304     0.0354     0.0257
HS              0.1326      0.0546     0.0098     0.00013     0.0271     0.0310     0.0234
MMLE            0.0576      0.0289     0.0086     0.00010     0.0261     0.0292     0.0222

Note: 1000 Monte Carlo simulations. DGP described at the beginning of subsection 3.3.4.

The results of this simulation are presented in Table 7. The MMLE is clearly the best of all the estimators in terms of RMSE. More specifically, the bias and RMSE of the CRE are about twice the bias and RMSE of the MMLE for some parameters, like $\rho_{-1}$ and Household Size. As in the previous simulation experiments with a similar number of periods, the MMLE exhibits small biases.
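The summary measures used throughout subsection 3.3 (mean bias, RMSE, coverage of the 95% confidence interval and the ratio SE/SD) can be computed from the replications of a Monte Carlo experiment along the following lines. This is a sketch: the function name is hypothetical, and taking SE/SD as the average estimated standard error divided by the standard deviation of the estimates across replications is an assumption about how that ratio is aggregated.

```python
import math
import statistics


def monte_carlo_summary(estimates, std_errors, true_value, z=1.96):
    """Summarize replications of one parameter estimate.

    estimates:  point estimates, one per replication.
    std_errors: estimated asymptotic standard errors, one per replication.
    """
    bias = statistics.fmean(estimates) - true_value
    rmse = math.sqrt(
        statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    # Share of replications whose interval estimate +/- z * SE covers the truth.
    covered = [abs(e - true_value) <= z * s
               for e, s in zip(estimates, std_errors)]
    coverage = sum(covered) / len(covered)
    # Average estimated SE relative to the Monte Carlo standard deviation.
    se_over_sd = statistics.fmean(std_errors) / statistics.stdev(estimates)
    return {"bias": bias, "rmse": rmse,
            "coverage": coverage, "se_over_sd": se_over_sd}


# Toy input with four replications, only to show the call.
summary = monte_carlo_summary([0.9, 1.1, 1.0, 1.2], [0.1] * 4, true_value=1.0)
```

A ratio `se_over_sd` close to 1 together with poor `coverage` points to bias, rather than a badly estimated variance, as the source of the problem, which is the reading given to Table 5.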
4 Estimation Results

4.1 Main Model

Table 8 presents the coefficient estimates for the main model based on three different estimators. This includes different specifications of the heterogeneity. The first estimated model (column I) is a pooled version of the model in (1) and (2), without individual specific effects. The second estimated model (column II) is the correlated random effects model described in equations (3) and (4). It is similar to the models estimated in Contoyannis, Jones and Rice (2004). It has homogeneous cut points and uses a random effects approach to control for the individual specific intercept in the linear index. The last specification (column III) is described in previous subsections; it is the model in (1) and (2) treating $\alpha_i$ and $c_i$ as fixed effects. It is estimated by MMLE. To compare magnitudes of the effects across variables and estimates we look at the relative effects (i.e. ratios of coefficients), and at the average and median marginal effects reported in Tables 9 and 10 for the variables with a coefficient significantly different from zero.11;12

Table 8: Estimates, Main model.

                                    I                     II                    III
Variables                         Pooled          Correlated Random            MMLE
                                                       Effects
Health in t-1: Good          0.6527*** (0.0185)    0.5028*** (0.0234)    0.4875*** (0.0186)
Health in t-1: Poor         -0.4417*** (0.0233)   -0.3259*** (0.0343)   -0.4375*** (0.0242)
Age                          0.0011    (0.0032)    0.0200    (0.0210)    0.0205    (0.0222)
Age square                  -0.0000    (0.0000)   -0.0007*** (0.0001)   -0.0005*** (0.0001)
Married                      0.0344    (0.0286)    0.1722    (0.0752)    0.0749    (0.0606)
Separated/Divorced          -0.0580    (0.0358)    0.0475    (0.1028)    0.0375    (0.0729)
Widowed                     -0.0243    (0.0408)    0.3668**  (0.1329)    0.0542    (0.0918)
Household size              -0.0782*** (0.0138)   -0.0112    (0.0189)   -0.0388**  (0.0177)
Number of Kids               0.0647*** (0.0155)    0.0423    (0.0189)    0.0472**  (0.0188)
Household Income             0.0816*** (0.0122)    0.0188    (0.0191)    0.0396*** (0.0147)
Male                        -0.0095    (0.0175)    0.0116    (0.0265)
Non-white                   -0.0890*   (0.0467)   -0.1277*   (0.0709)
Higher/1st degree            0.1540*** (0.0345)    0.1563*** (0.0466)
HND/A level                  0.0810*** (0.0250)    0.0696*   (0.1862)
CSE/O level                  0.0860*** (0.0225)    0.0923*** (0.0327)
Cut point 1                  0.0192    (0.1233)   -0.0277*** (0.2265)
Cut point 2                  1.0698*** (0.1235)    1.0528*** (0.2267)
$\sigma^2_u$                                       0.0686
Mean $c_i$                                                               1.1323
Variance $c_i$                                                           0.3277
Mean $\alpha_i$                                                         -0.0743
Variance $\alpha_i$                                                      0.6311
Correlation($\alpha_i$, $c_i$)                                          -0.3326
Akaike Information Criterion    38544.0               37334.3               37275.2

Standard errors are reported in parentheses. Number of individuals used in estimation of all models is 1739. Estimates of year dummies in all models and within means of variables in random effects are not reported. * significant at 10%; ** significant at 5%; *** significant at 1%.

Table 9: Average Marginal Effects on Probability of reporting good and poor health for significant variables. Main model.

(a) Good
                               I                      II                     III
                                              Correlated Random
                         Pooled    St.Err.    Effects    St.Err.       MMLE     St.Err.
Health in t-1: Good      0.2528    0.0071      0.1883    0.0456       0.1653    0.0080
Health in t-1: Poor     -0.1550    0.0078     -0.1149    0.0637      -0.1403    0.0520
Age                     -0.0005    0.0003     -0.0170    0.0055      -0.0089    0.0064
Household size          -0.0282    0.0050     -0.0040    0.0111      -0.0120    0.0054
Number of Kids           0.0233    0.0056      0.0150    0.0149       0.0145    0.0058
Household Income         0.0294    0.0044      0.0067    0.0095       0.0122    0.0045

(b) Poor
                               I                      II                     III
                                              Correlated Random
                         Pooled    St.Err.    Effects    St.Err.       MMLE     St.Err.
Health in t-1: Good     -0.1399    0.0046     -0.1057    0.2125      -0.0984    0.1153
Health in t-1: Poor      0.1477    0.0081      0.0968    0.1372       0.1268    0.0947
Age                      0.0003    0.0002      0.0105    0.0140       0.0058    0.0117
Household size           0.0173    0.0031      0.0024    0.0072       0.0081    0.0086
Number of Kids          -0.0143    0.0034     -0.0090    0.0171      -0.0095    0.0102
Household Income        -0.0181    0.0027     -0.0040    0.0078      -0.0081    0.0082

11 These marginal effects are also called partial effects.
The marginal effects are averaged (or their median is calculated) across the first eight waves of the panel as well as across the values of the covariates for each individual. This means that we first calculate the marginal effect for each individual in the sample at the observed values of the regressors and then we calculate the average (or the median) of them, instead of calculating the marginal effect at the average value of the covariates. We do this in order to obtain summary measures of the marginal effects representative of the situation of the population (see Chamberlain, 1982, p. 1273). Moreover, a measure that substitutes the values of the covariates and especially the individual specific effect $\alpha_i$ with their means (or any other fixed value) ignores any possible correlation between them. This may give the wrong values of the marginal effects representative of the population.
12 An alternative way to identify and estimate the marginal effects is the approach taken in Chernozhukov et al. (2010). They show that in a model like ours, with fixed effects, when $T$ is fixed the (average and quantile) marginal effects are not point identified. However, they are set identified, and they propose a way to estimate bounds on the partial effect. These nonparametric bounds tighten as $T$ grows. The main advantage is that the bounds analysis applies to any $T$, whereas our bias correction method depends on $T$ not being very small. However, the bounds analysis is only available with discrete covariates for the moment. In contrast, the bias correction methods work well in many examples, including continuous covariates, and they consistently point estimate the identified average effect.

Table 10: Median Marginal Effects on Probability of reporting good and poor health for significant variables.

(a) Good
                            I             II               III
                                     Corr. Random
                         Pooled        Effects             MMLE
Health in t-1: Good      0.2536         0.1889            0.1738
Health in t-1: Poor     -0.1555        -0.1175           -0.1544
Age                     -0.0004        -0.0162           -0.0080
Household size          -0.0283        -0.0040           -0.0127
Number of Kids           0.0234         0.0151            0.0154
Household Income         0.0296         0.0067            0.0130

(b) Poor
                            I             II               III
                         Pooled     Random Effects         MMLE
Health in t-1: Good     -0.1402        -0.1014           -0.0910
Health in t-1: Poor      0.1484         0.0949            0.1282
Age                      0.0002         0.0094            0.0043
Household size           0.0170         0.0023            0.0077
Number of Kids          -0.0140        -0.0086           -0.0089
Household Income        -0.0177        -0.0039           -0.0077

The pooled model overstates the state dependence effect because it omits permanent unobserved heterogeneity. Though it is not reported, we also estimated the model in (1) and (2) by MLE. As seen in the simulations, it is severely biased, estimating much lower state dependence effects and higher effects of the other explanatory variables. More interesting is the comparison between the correlated random effects model and the fixed effects model estimated by MMLE. They are in columns II and III of Tables 8, 9, and 10. The first difference is in the variables that are statistically significant. Table 8 shows that in the MMLE household size, number of kids, and household income have an impact that is statistically different from zero. However, none of them has a significant effect in the random effects estimates. Correspondingly, the average marginal effect of those variables increases in absolute value in the MMLE case with respect to the random effects model, especially for household income. With respect to the state dependence effect (the effect of $h_{it-1}$) there are some changes too. The effect of $h_{it-1}$ = good decreases in absolute value when estimating by MMLE, and the effect of $h_{it-1}$ = poor increases. Comparing coefficients in Table 8 we can also see that the effect of $h_{it-1}$ = poor increases proportionally less than the effect of the other relevant explanatory variables.
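The averaging behind Tables 9 and 10 (an effect computed for each individual at the observed covariates and the estimated fixed effect, then averaged or taken at the median) can be sketched as follows for the effect of lagged good health on the probability of reporting good health, which is $\Phi(\mu_{it})$ in (6). Two assumptions are made only for illustration: the effect is measured as a discrete change relative to the omitted lagged category, and the covariates enter through a precomputed index $x_{it}'\beta$. All names are hypothetical.

```python
import math
import statistics


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def effect_of_lagged_good(alpha_i, xb_it, rho_good):
    """Change in Pr(h_it = good) when the lagged state moves to good."""
    base = alpha_i + xb_it
    return norm_cdf(base + rho_good) - norm_cdf(base)


def average_and_median_effect(alphas, xb, rho_good):
    """Average and median of individual effects.

    alphas[i] is the fixed effect of individual i and xb[i][t] the value of
    the covariate index for individual i in period t.
    """
    per_individual = []
    for alpha_i, xb_i in zip(alphas, xb):
        effects = [effect_of_lagged_good(alpha_i, v, rho_good) for v in xb_i]
        per_individual.append(statistics.fmean(effects))
    return statistics.fmean(per_individual), statistics.median(per_individual)


# Two individuals with very different fixed effects: the average of the
# individual effects differs markedly from the effect at the mean fixed effect.
alphas = [-2.0, 2.0]
xb = [[0.0, 0.0], [0.0, 0.0]]
ame, median_effect = average_and_median_effect(alphas, xb, rho_good=0.5)
effect_at_mean = effect_of_lagged_good(statistics.fmean(alphas), 0.0, 0.5)
```

In this toy case `effect_at_mean` is several times larger than `ame`, which illustrates why substituting the fixed effect with its mean can misrepresent the effect in the population.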
In the random effects specification the ratio of the coefficient of $1(h_{i,t-1} = \text{poor})$ to the coefficient of 'Household income' is around 17 in absolute value, whereas in the MMLE that ratio is 11. In any case, this partial increase in the effect of state dependence and in the effect of the explanatory variables is remarkable, because the model in column III allows for more permanent unobserved heterogeneity, and does so more flexibly, than the model in column II.13

Moreover, many of those differences in the estimated effects of the explanatory variables between the correlated random effects model and the fixed effects model estimated by MMLE are statistically significant. As is known, if the restrictions imposed by the correlated random effects model are correct, its estimates are more precise (i.e. efficient) than the estimates of the fixed effects model (even after the modification of the MLE), though both are consistent. Given this, we have used a Hausman type test to see if those important differences are only due to the more imprecise estimates in column III. We have made the test over the average marginal effects instead of the parameters in Table 8 for two reasons. First, marginal effects (including their average), and not the parameters in equations (1) and (2), are usually the parameters of interest in nonlinear models. Second, the average marginal effects do not suffer from the different scales problem that makes the magnitudes in columns II and III of Table 8 not directly comparable and not directly interpretable. The average marginal effects of both models are well defined within the same scale, like any other marginal effect over choice probabilities, and their magnitude has the same clear interpretation. If we were primarily interested in a single average marginal effect, like the effect of $h_{i,t-1}$ = good over the probability of $h_{i,t}$ = good, we could use a t-statistic that ignores the others. Doing this for all the average marginal effects, we reject at 5% the null hypothesis that both estimates are the same for four variables.
Doing a joint test we also reject the null hypothesis that the correlated random effects estimates and the fixed effects MMLE estimates are the same, therefore rejecting the restrictions imposed in the correlated random effects model.14

13 Recall that permanent unobserved heterogeneity, state dependence and persistence in observable variables are alternative explanations of the observed high persistence in $h_{it}$.
14 In the Hausman test we have used the Var-Cov of the fixed effects estimates only, instead of subtracting from it the Var-Cov of the random effects. We do this in order to avoid the difference not being a positive definite matrix due to the use of different estimates of the variance of the errors. This represents a lower bound for this test, and a rejection here will also be a rejection when using the well defined difference in the Var-Cov matrices.

The previous two paragraphs are a clear indication that ignoring the added dimension of heterogeneity and the flexibility in the distribution of the fixed effects matters when estimating the model and the marginal effects of variables. It is not only a matter of the amount of heterogeneity but also a matter of the other restrictions being imposed on the model in column II.

Figure 1: Distribution (histogram) of the fixed effects from MML estimates.

Besides the formal test of random effects versus fixed effects, we look at the unobserved heterogeneity both in the linear index equation and in the cut point shift. Figure 1 displays the estimated distribution (histogram) of both fixed effects in the population, and both exhibit large variation. The average for $\alpha_i$ is $-0.074$ and for $c_i$ is $1.13$. The standard deviations of these distributions are $0.79$ and $0.57$, respectively. In the random effects specification $\alpha_i$ is the compound equation (4) that includes a linear relation to some observables and an additive unobserved term that is assumed to follow a normal distribution.
Given the estimates of the parameters of equation (4), the estimated average for $\alpha_i$ in the random effects model is 1.41, and its standard deviation is 0.9626. With respect to the heterogeneity in the cut points, the average of $-c_i$, the first cut point, is -1.13, and the estimate of the first cut point in the random effects specification is 0.03. Also, as can be seen in the right panel of Figure 1 and as has been said, there is large variation in $c_i$ among individuals that is ignored by the estimated random effects model. Moreover, a test rejects normality of the distribution of $\alpha_i$ at 1% (footnote 15). Finally, the correlation between $\alpha_i$ and $c_i$ is 0.33, so there are rich interactions between both fixed effects, forming a joint distribution that is not the simple combination of their marginal distributions.

Footnote 15: $c_i$ cannot be normal by definition, since it is restricted to be positive.

Focusing on the MML estimates, we find evidence of strong positive state dependence. With respect to socioeconomic variables, we find that aging and household size have a small but significant negative effect on SAH. Household income and number of kids have a small but significant positive marginal effect on SAH. Number of kids has the biggest effect of all the x variables.

With respect to how the models fit the observed data, in addition to the information criterion (AIC) reported in Table 8, some predictions of the estimated models and their sample counterparts are in Table 11. Overall the MMLE model fits the data better, because its predictions are closer to the proportions actually observed in the sample. Likewise, the MMLE predictions better capture the inverted-U shape of the proportion reporting excellent or good health as we look at people with a higher number of children, and the slope of the increasing pattern when looking at people with higher income (footnote 16).

Footnote 16: Note that we are not controlling for any other observable characteristics. Thus, there may be other differences between people with different numbers of children (or different income) that can reinforce or cancel its effect on average. Therefore these numbers cannot be interpreted as the effect of the number of children (nor as the effect of income).

Table 11: Sample vs. predicted proportions of SAH (in %)

Panel A: Total proportions.
                      Poor or very poor    Fair    Excellent or good
Sample                       16             31            53
Predicted MMLE               15             32            53
Predicted CRE                12             31            57
Predicted Pooled             14             29            57

Panel B: Proportions of people reporting Excellent or good SAH.
                                      Predicted
                        Sample    MMLE    CRE    Pooled
By number of Kids
  0                       52       53      57      56
  1                       55       54      55      57
  2                       58       56      57      60
  3+                      50       51      54      58
By income quartiles
  1st quartile            47       50      54      54
  2nd quartile            51       52      56      56
  3rd quartile            56       55      58      59
  4th quartile            58       57      59      59

In addition to considering the average and median marginal effects reported in Tables 9 and 10, we look at how many individuals have a significant marginal effect in the sample, given their particular situation and unobserved characteristics. Table 12 presents the proportion of individuals with significant (at 10%) marginal effects on the probability of reporting good and bad health, for the same variables as in Table 9. Notice that although the average marginal effects are significant, there is a great deal of heterogeneity; for around half the population, the marginal effect on the probability of reporting good health is not significantly different from zero for many of these variables.

Table 12: Proportion of individuals with marginal effects (on the probability of reporting good and poor) that are significantly different from zero at 10%.

                              Proportion
Variable                    Good       Poor
Health in t-1: Good        60.44%     12.25%
Health in t-1: Poor        55.43%     34.50%
Age                        22.71%      2.53%
Household size             37.21%     11.44%
Number of Kids             41.81%     12.65%
Household Income           44.85%     15.35%

4.2 Estimates of additional specifications

4.2.1 Model with health measures

As explained in subsection 2.2, we add variables that contain information on objective health problems to provide further evidence of the robustness and importance of state dependence in SAH. Table 13 presents the estimates of this model by MMLE, and Table 14 contains the corresponding average marginal effects. Of the three significant socioeconomic variables in the main model, only number of kids remains significantly different from zero (at 10%). Most of the objective health measures have the biggest effects on SAH, all with the expected signs. The variables with the next highest impact are the two indicators of $h_{i,t-1}$. Thus, even after including objective health measures we find evidence of strong positive state dependence here, though it is smaller than in the main model. The variance of the unobserved heterogeneity is even higher in both $\alpha_i$ and $c_i$ than in the main model.

4.2.2 Linear versus quadratic effect of age

Halliday (2008) found, based on AIC, that a quadratic function of age was only weakly preferred to the linear model and that not much was lost with a model linear in age. We have estimated model III in Table 8 excluding age squared as an explanatory variable, and in our case the fit is worse because the effect of age increases more than linearly at older ages. Also, when introducing the quadratic term, the AIC changes much more than in Halliday (2008). Here the AIC is 37373.4 in the linear model and 37275.2 in the quadratic model, almost a hundred points smaller.

5 Conclusion

In this paper we have considered the estimation of a dynamic ordered probit of self-assessed health status with two fixed effects: one in the linear index equation and one in the cut points. The inclusion of two fixed effects, instead of only one as is usual, is motivated by the potential existence of two sources of heterogeneity: unobserved health status and reporting behavior. Even though we cannot separately identify these two sources of heterogeneity, we robustly control for them by using two fixed effects.
Based on our best estimates, the two fixed effects exhibit important variation and it is relevant to account for both when estimating the effect of other variables. Our estimates also show that state dependence is large and significant even after controlling for unobserved heterogeneity and some forms of objective health measures. The comparison with the random effects estimates previously used shows that it matters to flexibly account for more permanent unobserved heterogeneity. The recent literature on bias-adjusted methods of estimation of nonlinear panel data models with fixed effects has produced several potentially equivalent estimators. We find that the a priori most directly applicable correction to our model, which is the HS estimator proposed in Bester and Hansen (2009), still has significant biases in our sample size. This led us to consider the Modified MLE proposed in Carro (2007).

Table 13: Estimates, health indicators added.

Variables                      Correlated Random Effects      MMLE
Health in t-1: Good             0.4191***  (0.0337)            0.3696***  (0.0226)
Health in t-1: Poor            -0.1830***  (0.0401)           -0.2784***  (0.0296)
Age                             0.0262     (0.0324)           -0.0215     (0.0282)
Age square                     -0.0005***  (0.0002)           -0.0003***  (0.0001)
Married                         0.0974     (0.1215)            0.0350     (0.0672)
Separated/Divorced             -0.0177     (0.1547)            0.0340     (0.0817)
Widowed                         0.1601     (0.2087)            0.0474     (0.1110)
Household size                 -0.0181     (0.0359)           -0.0127     (0.0206)
Number of Kids                  0.0667     (0.0444)            0.0387*    (0.0213)
Household Income                0.0051     (0.0312)            0.0112     (0.0177)
Self employed                  -0.0941     (0.1073)            0.0216     (0.0660)
In paid employment              0.1042     (0.0665)            0.1069**   (0.0425)
Unemployed                      0.1311     (0.0956)            0.0946     (0.0680)
Retired                         0.1089     (0.1110)            0.1104*    (0.0651)
Long term sick or disa.        -0.1893     (0.1231)           -0.2562***  (0.0707)
Health problems                -0.6808***  (0.0470)           -0.7759***  (0.0334)
Health limits daily acti.      -0.6435***  (0.0465)           -0.6865***  (0.0299)
Health limits work             -0.4956***  (0.0468)           -0.4854***  (0.0306)
Hospital days                  -0.0331***  (0.0029)           -0.0350***  (0.0008)
Cut point 1                    -0.9318***  (0.2651)
Cut point 2                     0.2788     (0.2647)
$\sigma^2_u$                    0.0489
Mean $c_i$                                                     1.2775
Variance $c_i$                                                 0.3942
Mean $\alpha_i$                                                2.7760
Variance $\alpha_i$                                            1.4170
Correlation($\alpha_i$, $c_i$)                                -0.0551
Akaike Information Criterion    27688.2                        27310.7

Standard errors are reported in parentheses. Number of individuals used in estimation of all models is 1437. Estimates of year dummies in all models, constant variables and within means of variables in random effects are not reported.
* significant at 10%; ** significant at 5%; *** significant at 1%.

Table 14: Average marginal effects on health for significant variables. Model with health indicators added.

(a) Good
                               Correlated Random Effects    St.Err.    MMLE       St.Err.
Health in t-1: Good                  0.1416                 0.0117      0.1122    0.0074
Health in t-1: Poor                 -0.0610                 0.0134     -0.0832    0.0223
Age                                 -0.0061                 0.0087     -0.0135    0.0080
Number of Kids                       0.0213                 0.0141      0.0109    0.0060
In paid employment                   0.0336                 0.0215      0.0306    0.0122
Retired                              0.0352                 0.0358      0.0316    0.0185
Long term sick or disa.             -0.0610                 0.0396     -0.0729    0.0223
Health problems                     -0.2250                 0.0171     -0.2277    0.0480
Health limits daily acti.           -0.2169                 0.0167     -0.2045    0.0340
Health limits work                  -0.1666                 0.0162     -0.1439    0.0141
Hospital days                       -0.0106                 0.0009     -0.0099    0.0003

(b) Poor
                               Correlated Random Effects    St.Err.    MMLE       St.Err.
Health in t-1: Good                 -0.0780                 0.0159     -0.0675    0.0877
Health in t-1: Poor                  0.0434                 0.0119      0.0650    0.0657
Age                                  0.0038                 0.0052      0.0088    0.0161
Number of Kids                      -0.0122                 0.0081     -0.0070    0.0089
In paid employment                  -0.0199                 0.0133     -0.0201    0.0247
Retired                             -0.0208                 0.0213     -0.0207    0.0266
Long term sick or disa.              0.0404                 0.0280      0.0547    0.0570
Health problems                      0.1083                 0.0239      0.1216    0.1667
Health limits daily acti.            0.1435                 0.0264      0.1501    0.1630
Health limits work                   0.1041                 0.0209      0.0994    0.1136
Hospital days                        0.0063                 0.0012      0.0065    0.0075
We derive the expression of the MMLE for our model, conduct Monte Carlo experiments to evaluate its finite sample properties, and compare it with the HS. The MMLE has a negligible bias in our sample size. The Monte Carlo experiments contribute to the literature on bias-adjusted methods of estimation of nonlinear panel data models by showing how well two of the proposed methods work for a specific model and sample size. This information will be useful for other applications when choosing among the several correction methods existing in the literature.

References

[1] Arellano, M. and J. Hahn (2006): "A likelihood-based approximate solution to the incidental parameter problem in dynamic nonlinear models with multiple effects", unpublished manuscript.
[2] Arellano, M. and J. Hahn (2007): "Understanding Bias in Nonlinear Panel Models: Some Recent Developments", in Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress, Volume 3, edited by Richard Blundell, Whitney Newey, and Torsten Persson. Cambridge University Press.
[3] Bester, C. A. and C. Hansen (2009): "A Penalty Function Approach to Bias Reduction in Non-linear Panel Models with Fixed Effects", Journal of Business & Economic Statistics, 27(2): 131-148.
[4] Carro, J. M. (2007): "Estimating dynamic panel data discrete choice models with fixed effects", Journal of Econometrics, 140: 503-528.
[5] Chamberlain, G. (1984): "Panel Data", in Griliches, Z. and M.D. Intriligator (eds.), Handbook of Econometrics, vol. 2, Elsevier Science, Amsterdam.
[6] Chernozhukov, V., I. Fernandez-Val, J. Hahn, and W. Newey (2010): "Average and Quantile Effects in Nonseparable Panel Models", mimeo, MIT Department of Economics.
[7] Contoyannis, P., A. M. Jones and N. Rice (2004): "The Dynamics of Health in the British Household Panel Survey", Journal of Applied Econometrics, 19: 473-503.
[8] Fernandez-Val, I. (2009): "Fixed effects estimation of structural parameters and marginal effects in panel probit models", Journal of Econometrics, 150: 71-85.
[9] Greene, W. H. and D. A. Hensher (2008): "Modeling Ordered Choices: A Primer and Recent Developments", available at SSRN: http://ssrn.com/abstract=1213093.
[10] Hahn, J. and G. Kuersteiner (2004): "Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects", mimeo.
[11] Hahn, J. and W. Newey (2004): "Jackknife and Analytical Bias Reduction for Nonlinear Panel Models", Econometrica, 72(4): 1295-1319.
[12] Halliday, T. J. (2008): "Heterogeneity, state dependence and health", Econometrics Journal, 11: 499-516.
[13] Jones, A. M. (2007): "Panel Data Methods and Applications to Health Economics", to appear in The Palgrave Handbook of Econometrics Volume II: Applied Econometrics, edited by Terence C. Mills and Kerry Patterson. Basingstoke: Palgrave MacMillan.
[14] Jones, A. M. and S. Schurer (2009): "How does Heterogeneity Shape the Socioeconomic Gradient in Health Satisfaction?", Journal of Applied Econometrics, published online, DOI: 10.1002/jae.1134.
[15] Lindeboom, M. and E. van Doorslaer (2004): "Cut-point shift and index shift in self-reported health", Journal of Health Economics, 23: 1083-1099.
[16] Mundlak, Y. (1978): "On the pooling of time series and cross-section data", Econometrica, 46(1): 69-85.
[17] Rilstone, P., V.K. Srivastava, and A. Ullah (1996): "The second-order bias and mean squared error of nonlinear estimators", Journal of Econometrics, 75: 369-395.
[18] van Doorslaer, E., A. M. Jones and X. Koolman (2004): "Explaining income-related inequalities in doctor utilisation in Europe", Health Economics, 13: 629-647.
[19] Wooldridge, J. (2005): "Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity", Journal of Applied Econometrics, 20: 39-54.
31 A Appendix: Reduction of the order of the bias In this appendix we show that the modi…ed score presented above corrects the …rst order asymptotic bias of the original score. The algebra is somewhat tedious because of the many terms, but the idea is clear. We …rst expand the score of the MLE around the true value of the …xed e¤ects and make some calculations and substitutions on it to obtain the leading term of the bias of the MLE’s score. Then we show that the modi…cation in the MMLE’s score, equation (10), is subtracting that leading bias term from the score. This follows Carro (2007), and is adapted to our model with two …xed e¤ects. The notation used is the same as in section 3.2: = ( ; 1 ; 1 ) and i = ( i ; ci ); we denote partial @li ( ; i ) @li ( ; i ) derivatives by the letter d; bold letters are used to denote vectors; d i , , d i @ i @ 2 2 3 @ li li , d i = @@ l2i , d ci = @ @c , and so on; the derivatives evaluated at the true values of d ci = @@ @c i i@ i i the parameters are represented by including a 0 in the sub-index (e.g. d i0 = d i ( 0 ; i0 )). A.1 Deriving the leading term of the bias of the score in the MLE We start by deriving the …rst term of the bias in the score of the original unmodi…ed concentrated log-likelihood. Expanding this score around i0 , and evaluating it at 0 we get: d i ( 0 ; i ( 0 )) = d i0 + d i0 (^ i ( 0 ) i0 ) (A1) + d ci0 (^ ci ( 0 ) ci0 ) 1 1 2 + d d cci0 (^ ci ( 0 ) i0 (^ i ( 0 ) i0 ) + 2 2 + d aci0 (^ i ( 0 ) ci ( 0 ) ci0 ) + Op (T i0 )(^ ci0 )2 1=2 ) + ::: This equation clearly shows that the score evaluated at the true value 0 di¤ers from the value of the score we want to obtain, d i0 = d i ( 0 ; i0 ), as much as ^ i ( 0 ) and c^i ( 0 ) di¤er from i0 and ci0 . This is the source of the incidental parameters problem. 
Now we need expressions for (^ i ( 0 ) ci ( 0 ) ci0 ), for which we do asymptotic expansions, i0 ) and (^ following Rilstone, Srivastava and Ullah (1996): + Op (T 3=2 ) (A2) ci0 ) = bc 1=2 + bc 1 + Op (T 3=2 ) (A3) (^ i ( 0 ) i0 ) (^ ci ( 0 ) where b vector b =b 1=2 +b and bc 1=2 are the elements of the vector b 1 , which are determined as follows: 1=2 b 1=2 b 1 = Q 1 R = Q 1 Sb R= d i0 dci0 1 T 1=2 ! 1 Q 2 Q = E(rR) S = rR Q 2 U = E(r Q) 32 1 1=2 , 1 and b U (b 1=2 1 and bc 1 are the elements of the b 1=2 ) From the above expressions we obtain: b 1=2 1 T = 1 T E 1 T bc 1=2 = dc i0 1 T dai0 E i0 E 1 T dcci0 E 1 T dc i0 1 T dci0 E E 1 T dcci0 E 1 T dci0 E d dai0 E 1 T E d i0 1 T dcci0 2 1 T dc i0 1 T d i0 2 1 T dc i0 (A4) (A5) It is also useful to obtain: 2 i0 ) = (ba 1=2 )2 + Op (T 3=2 ci0 )2 = (bc 1=2 )2 + Op (T 3=2 (^ i ( 0 ) (^ ci ( 0 ) (^ i ( 0 ) ci ( 0 ) i0 ) (^ With respect to the squares of b (b 2 1=2 ) = 1 T dai0 2 E 1 T dcci0 2 1 T (bc 1=2 )2 = 1 T 2 dci0 E 1 T d 2 i0 1 T + d d 1 T E b 2 1 T E 1=2 + Op (T ) (A7) 3=2 ) 2 T1 dai0 T1 dci0 E i0 dcci0 1 T E 2 dc 1 T E i0 2 dc 1 T E i0 dai0 1 T E 2 dci0 1 T E 1=2 c (A6) (A8) and bc 1=2 , we get: 1=2 + ci0 ) = b a ) dc dcci0 E 1 T dc i0 1 T E dcci0 i0 2 T1 dai0 T1 dci0 E i0 1 T 2 2 1 T d i0 E 1 T dc i0 2 2 dc i0 Substituting by expectations, and using the information matrix identity (E(dc i ) = E(dai dci )), we get: (b 2 1=2 ) = (bc 1=2 )2 = 1 T E 1 T E 1 T E 1 T d i0 1 T E 1 T d i0 dcci0 1 T E 1 T E dcci0 d E 1 T dc i0 E 1 T dc i0 + Op (T 3=2 2 ) (A9) + Op (T 3=2 2 ) (A10) 3=2 (A11) i0 dcci0 Following the same procedure for the cross-product, we get: b c 1=2 b 1=2 = With respect to b 1 1 T E 1 T E 1 T d i0 E 1 T dc dcci0 i0 E 1 T dc 2 + Op (T ) i0 and bc 1 , we follow the same procedure (replace by expectations and use the 33 information matrix identity) to get: b 1 = 1 2T ( 2E 1 1 T E d E i0 2 1 dc T E i0 1 T E dcci0 1 d T cci0 1 T dc i0 (A12) 2 2 1 dai0 dcci0 T +E 2 1 1 1 E dcci0 da i0 + 2E dai0 d i0 T T T 1 1 
1 d i0 E dcci0 E d cci0 + 2E +E T T T 1 1 1 E dc i0 E d i0 E dccci0 + 2E T T T 1 1 1 dc i0 E dcci0 3E d ci0 + 4E E T T T +E 1 dci0 dc T i0 +E + Op (T 1 bc 1 = 2T ( 2E 3=2 + 2E 1 dci0 d T i0 ) 1 1 T E d E i0 2 1 dc T 1 d +E T 1 d +E T 1 E dc T 1 E dc T + Op (T 1 dci0 dc i0 T 1 dci0 dcci0 T 1 dai0 dc i0 T i0 i0 1 d T E 1 dccci0 T i0 i0 3=2 1 dcci0 T 1 E dcci0 T 1 E d i0 T E E dcci0 E 2 i0 1 T ci0 1 T +E + 2E 1 d T 1 E da T 1 3E d T E dc i0 (A13) 2 2 1 dci0 d T i0 +E 1 dai0 dc T i0 1 dci0 dcci0 T ci0 i0 cci0 ) 1 dai0 dc i0 T 1 + 2E dai0 d i0 T 1 + 4E dci0 dc i0 T + 2E + 2E 1 dai0 dcci0 T (A14) 34 Introducing all these expressions in (A1), and taking expectations, we get: E(d i ( 0 ; ^i ( 0 ))) = 1 T E i0 dci0 d 1 T E + 1 2 (A15) 1 T E d i0 dc i0 E 1 T 1 T E 1 T E ( d 2 1 dc T 2E 1 T E i0 E dcci0 d d i0 dai0 E 1 T 1 T dc 1 T E dcci0 2 dc i0 ai0 dcci0 1 d T E i0 1 T E 1 dai0 dcci0 T +E cci0 2 2 i0 1 dci0 dc T +E i0 2 1 1 1 dcci0 da i0 + 2E dai0 d i0 E T T T 1 1 1 +E d i0 E dcci0 E d cci0 + 2E T T T 1 1 1 dc i0 E d i0 E dccci0 + 2E E T T T 1 1 1 E dc i0 E dcci0 3E d ci0 + 4E T T T +E + + E 1 T d ci0 dai0 E 1 2 1 T E 1 d T 1 d +E T 1 E dc T 1 dc E T +E E d 1 dc T 2E E d 1 T i0 i0 i0 i0 d i0 1 d T aci0 + O(T 1 d 1 dccci0 T E dcci0 1 dc T E 1 T dc i0 +E ci0 + 2E 1 d T 1 da E T 1 3E d T E E i0 1 T E E 1 T ci0 dci0 E 1 T d + 2E 1 dci0 d T i0 i0 2 dc i0 ci0 1 d T 1 dcci0 T 1 dcci0 E T 1 E d i0 T 1 d dcci0 dcci0 E E 1 T 1 T E i0 E 2 i0 E 1 T 1 T E 2 i0 dc i0 E ( + 1 T 1 T E 1 dci0 dc i0 T 1 dci0 dcci0 T 1 dai0 dc i0 T 1 T dc 1 E 2 2 2 1 dci0 d T i0 +E 1 dai0 dc T i0 1 dci0 dcci0 T ci0 i0 cci0 1 dai0 dc i0 T 1 dai0 d i0 + 2E T 1 + 4E dci0 dc i0 T + 2E + 2E 1 dai0 dcci0 T 2 i0 1 d T i0 E 1 dcci0 T 1 E 2 1 d T cci0 E 1 d T i0 ) The remainder of this expression is O(T 1 ) because Op (T 1=2 ) terms have zero mean. This means that the score of the original concentrated likelihood has a bias of order O(1), whose expression is in the previous formulae. 
35 A.2 Modi…ed Score The modi…ed score in (10) can be decomposed in three terms, d A = d i ( ; i ( )) 1 1 B= 2 d i dcci M i( ) = A + B + C, such that: (A16) dc i (A17) 2 @ ^i @^ ci + dccci @ @ @^ ci @ ^i + d ci + dcci d i+d i @ @ @ ^i @^ ci 2dc i d aci + d ci + d cci @ @ E(d ci )E(dc i ) E(dcci )E(d i ) @ C= @ai E(d i )E(dcci ) [E(dc i )]2 d d i +d cci cci E(d i )E(dc i ) E(d i )E(d E(d i )E(dcci ) [E(dc i )]2 @ @ci (A18) i= i( ) ci ) i= i( ) A is the score of the original unmodi…ed concentrated log-likelihood. So, we now analyze B and C: Part B. We …rst want to derive expression for @ ^ i =@ and @^ ci =@ . Di¤erentiating the score of the concentrated log-likelihood, d i ( ; i ( )), with respect to we get a system of two equations with two unknowns. Solving for @ ^ i =@ and @^ ci =@ we get: d ci dc i dcci d @ ^i( ) = @ d i dcci d2c i d i dc i d i d @^ ci ( ) = @ d i dcci d2c i evaluating at 0 i (A19) ci (A20) and replacing by expectations: E @ ^i( 0) = @ 1 T E @^ ci ( 0 ) = @ 1 T d ci0 E 1 T d i0 E 1 T E d 1 T i0 E d 1 T i0 dc i0 E 1 T dc i0 E 1 T E 1 T dcci0 E dcci0 36 E 1 T 1 T dcci0 E 1 T dc d i0 E 1 T i0 + Op (T 1 2 ) (A21) i0 1 T d ci0 2 E dc d 2 i0 + Op (T 1 2 ) (A22) Introducing in (A17) and rearranging terms: 1 T E B= d ci0 1 T E dc 1 T 1 T E i0 1 T E d i0 E dcci0 d i d cci + dcci d 2dc i d ci i 2(d i dcci d2c i ) d i d cci + dcci d 2dc i d i 2(d i dcci d2c i ) 1 T E d E i0 1 T dc 1 T E i0 1 T E ci d i0 1=2 + Op (T E 1 T d i0 ci0 E T E ) 1 T d ci0 2 E i0 1=2 ) ), adding 1=T 2 in numerators and denomi- E 1 T 1 d T + Op (T 1 dc T dcci0 E 1 T dc (A24) 2 2 i0 1 1 dcci0 E d ai0 T T 1 1 1 1 d i0 E d cci0 + E dcci0 E da i0 E T T T T 1 1 1 1 + E d i0 E dc i0 d i0 E d ci0 E T T T T 1 1 1 1 dcci0 E d ci0 + E d i0 E dccci0 E T T T T 1 1 2 2 E 1 d i0 E 1 dcci0 E 1 dc i0 1 d T E (A23) 1 1 2 B= i0 i0 i0 1=2 Op (T d 2 dc E T1 d i0 E T1 dcci0 E T1 dc dcci d ci + d i dccci 2dc i d cci 2(d i dcci d2c i ) dcci d ci + d i dccci 2dc i d cci Op (T 2(d i dcci d2c i ) d i d cci + 
dcci d 2dc i d aci i 2 2(d i dcci dc i ) Evaluating at 0 , using the fact that i ( ) = nators and replacing by expectations: 1 T dcci0 E i0 E T i0 1=2 E 1 d T 2E 1 dc T i0 E 1 d T 2E 1 dc T i0 E 1 d T ci0 cci0 T cci0 +E 1 dcci0 E T 1 d T i0 2E 1 dc T i0 E 1 d T aci0 ) (A25) Finally, taking the expected value of this expression will not change anything, except that the remainder would be O(T 1 ) instead of Op (T 1=2 ). Part C. To analyze C, we need the following result: @ E (d @ i ci ) = E (d ci ) + E (d ci d i ) This works with other derivatives of expectations as well. C is the sum of two derivatives, that we call C and C c respectively, evaluated at 37 (A26) i = i( ). C is equal to: @ @ai C = @ @ai = + E(d ci )E(dc i ) E(dcci )E(d i ) E(d i )E(dcci ) [E(dc i )]2 (E(d ci )E(dc i ) E(d (E(d E(dcci )E(d i )E(dcci ) ci )E(dc i ) [E(dc i @ i )) @ai E(dcci )E(d (E(d i )) )]2 i )E(dcci ) E(d i )E(dcci ) 2 2 [E(dc i )]2 [E(dc i )] ) Working with the derivative and using the above result, we get: Ca = 1 )E(d i cci ) E(d fE(d ci ) [E(d ci ) E(dcci ) [E(d + E(d + E(dc i dai )] + E(dc i ) [E(d i) + E(d ci )E(dc i ) (E(d fE(d [E(dc i )]2 E(d + E(d i ) [E(d cci ) ci dai )] + E(dcci dai )]g E(dcci )E(d i )E(dcci ) i ) [E(d cci ) 2E(dc i ) [E(d i dai )] aci ) [E(dc i i) 2 2 )] ) + E(dcci dai )] + E(dcci ) [E(d ci ) i) + E(d i dai )] aci ) + E(d i dci )] + E(dc i dai )]g Likewise, for C c we have: Cc = fE(d i ) [E(d cci ) E(d + 1 )E(d i cci ) E(d E(d + E(d i )E(dc i ) E(d i )E(dcci ) fE(dcci ) [E(d ci ) 2E(dc i ) [E(d 0 + E(dc i dci )] + E(dc i ) [E(d i ) [E(d cci ) (E(d We then evaluate at [E(dc i )]2 E(d ci ) [E(d ci ) + E(d i dci )]g i )E(d ci ) 2 [E(dc i )]2 ) + E(d cci ) ci dci )] i dci )] + E(d i ) [E(dccci ) + E(dcci dci )] + E(dc i dci )]g and take the expected value of these expressions. Putting everything together. Finally, if we add all the terms of B and C from before, which is equal to d M i ( ) d i ( ; i ( )) = B + C, we get exactly minus (A15). 
Therefore, the modified score equals the standard score minus the first-order term of the bias, because that term is subtracted by the modification B + C. The remainder of this expansion for $d_{\theta M i}(\theta)$ is $O(T^{-1})$, as opposed to $O(1)$, which is the order of magnitude of the bias of $d_{\theta i}(\theta, \hat{\eta}_i(\theta))$. This shows that the MMLE reduces the order of the bias of the MLE.

B Computation of the MMLE

Computing the MMLE implies maximizing a likelihood whose first order condition is equation (10). This first order condition has known closed-form analytical terms. This means that we can program this optimization problem in any of the programs most frequently used in economics: MATLAB, GAUSS, and STATA. We can even use one of the routines and tools already written in those programs to maximize a likelihood, provided it allows us to specify the analytical form of the first order condition; otherwise we would obtain the MLE instead of the MMLE. We have used FORTRAN to program the MMLE for this paper because we are more familiar with this programming language and because we have conducted several Monte Carlo experiments and expected FORTRAN to be faster at doing this. But nothing in the MMLE prevents us from using other software and programming languages. The MML estimates reported in Table 8 (our main model) took 5 minutes. The MML estimates reported in Table 13 took 34 minutes, because that model has many more variables than the model in Table 8.

There are three main aspects when computing the MMLE:

1. We first have to obtain the several derivatives and cross derivatives of the likelihood (8). This includes differentiating the MLE's first order conditions for the fixed effects with respect to $\theta$, so that we obtain $\partial\hat{\alpha}_i/\partial\theta$ and $\partial\hat{c}_i/\partial\theta$. This may look somewhat tedious, but these are straightforward calculations with known compact general forms that hold for all the parameters.

2. Calculate the expectations in (10).
They are expectations of functions of $X_{it}$ and $h_{it-1}$, $f(h_{it-1}, X_{it})$, where $f$ denotes here any of the functions that result from the derivatives that make up the modification. These expectations are conditional on all the values of the covariates $x_i$, on $h_{i0}$, and on $(\alpha_i, c_i)$; that is, $E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i]$. Thus, the only random variable over which the expectation is taken is $h_{it-1}$ whenever $t > 1$. For $t = 1$,

$E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i] = f(h_{i0}, x_{it})$.

For $t = 2$,

$E[f(h_{it-1}, X_{it}) \mid X_i = x_i, h_{i0}, \alpha_i, c_i] = f(h_{i1} = 1, x_{it}) \Pr(h_{i1} = 1 \mid x_i, h_{i0}, \alpha_i, c_i) + f(h_{i1} = 0, x_{it}) \Pr(h_{i1} = 0 \mid x_i, h_{i0}, \alpha_i, c_i) + f(h_{i1} = -1, x_{it}) \Pr(h_{i1} = -1 \mid x_i, h_{i0}, \alpha_i, c_i)$,

where the $\Pr(h_{i1} \mid x_i, h_{i0}, \alpha_i, c_i)$ are those given by the model in equations (5). For $t > 2$ we continue proceeding recursively, using $\Pr(h_{it-2} \mid x_i, h_{i0}, \alpha_i, c_i)$ to calculate $\Pr(h_{it-1} \mid x_i, h_{i0}, \alpha_i, c_i)$:

$\Pr(h_{it-1} \mid x_i, h_{i0}, \alpha_i, c_i) = \Pr(h_{it-1} \mid x_{it}, h_{it-2} = 1, c_i, \alpha_i) \Pr(h_{it-2} = 1 \mid x_i, h_{i0}, \alpha_i, c_i) + \Pr(h_{it-1} \mid x_{it}, h_{it-2} = 0, c_i, \alpha_i) \Pr(h_{it-2} = 0 \mid x_i, h_{i0}, \alpha_i, c_i) + \Pr(h_{it-1} \mid x_{it}, h_{it-2} = -1, c_i, \alpha_i) \Pr(h_{it-2} = -1 \mid x_i, h_{i0}, \alpha_i, c_i)$,

where $\Pr(h_{it-1} \mid x_{it}, h_{it-2}, c_i, \alpha_i)$ is given by equations (5) and $\Pr(h_{it-2} \mid x_i, h_{i0}, \alpha_i, c_i)$ has already been obtained in this recursive process.

3. Concentrate the likelihood and estimate $\theta$ with fixed effects. The problems come from not having a closed form for $\hat{\alpha}_i$ and $\hat{c}_i$ with which to obtain the analytic expression of the concentrated likelihood, and from having to estimate as many fixed effects parameters as there are individuals in a panel with large N. This problem is not specific to the MMLE. It affects any estimator with fixed effects and has already been treated in the literature. On top of that, computational problems are smaller with current technology than they used to be. Classical references offering different solutions are Chamberlain (1980) and Heckman and MaCurdy (1980).
More recently, Greene (2004) also deals with the computational problem of inverting a large Hessian matrix. We have not used any of these solutions when estimating the MLE and MMLE. We have followed the proposal in Appendix B of Carro (2008), which concentrates the likelihood numerically by nesting the first order conditions used to compute the fixed effects in the algorithm that maximizes the concentrated likelihood with respect to the common parameters. We have found this to be faster than dividing the optimization problem into two procedures and iterating back and forth between the two optimization algorithms until convergence is reached, as proposed by Heckman and MaCurdy (1980). This approach also does not require us to invert a large Hessian matrix and, at the same time, produces a correct estimate of the variance. See Appendix B in Carro (2008) for further details. In any case, the message here is that these computational problems already have satisfactory solutions.
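The recursion described in step 2 above can be illustrated with a short Python sketch. It is not the FORTRAN code used for the paper, and several ingredients are assumptions made only for illustration: health takes the values -1, 0 and 1; the transition probabilities are those of an ordered probit with cut points -c and c; and the index is alpha + xb + gamma_good * 1(h = 1) + gamma_poor * 1(h = -1). The names `lagged_state_distributions`, `expected_values`, `gamma_good` and `gamma_poor` are hypothetical, and equations (5) of the paper should replace `transition_probs` in any real use.

```python
import math

STATES = (-1, 0, 1)  # assumed coding: poor, fair, good


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_probs(index, c):
    """Ordered probit probabilities with assumed cut points -c and c."""
    p_poor = norm_cdf(-c - index)
    p_good = 1.0 - norm_cdf(c - index)
    return {-1: p_poor, 0: 1.0 - p_poor - p_good, 1: p_good}


def lagged_state_distributions(h0, xb, alpha, c, gamma_good, gamma_poor):
    """Distributions of h_{t-1} for t = 1..T, conditional on h0 and the
    fixed effects. xb[t-1] is the covariate index of period t (assumed given)."""
    dist = {s: 1.0 if s == h0 else 0.0 for s in STATES}  # t = 1: h_0 is known
    dists = [dist]
    for xb_t in xb[:-1]:
        new = {s: 0.0 for s in STATES}
        for prev, p_prev in dist.items():
            index = (alpha + xb_t
                     + gamma_good * (prev == 1)
                     + gamma_poor * (prev == -1))
            trans = transition_probs(index, c)
            for s in STATES:
                # Sum over the three possible values of the lagged state.
                new[s] += trans[s] * p_prev
        dist = new
        dists.append(dist)
    return dists


def expected_values(f, dists, x):
    """E[f(h_{t-1}, x_t)] for each t, using the distributions of h_{t-1}."""
    return [sum(f(s, x_t) * d[s] for s in STATES) for d, x_t in zip(dists, x)]
```

Because only three probabilities are carried from one period to the next, the cost is linear in T for each individual, so these expectations can be recomputed at every evaluation of the modified score.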