Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Author manuscript, published in "Journal of Econometrics 155, 2 (2009) 117" DOI : 10.1016/j.jeconom.2009.09.021 Accepted Manuscript Dynamic invariant multinomial probit model: Identification, pretesting and estimation peer-00628315, version 1 - 2 Oct 2011 Roman Liesenfeld, Jean-François Richard PII: DOI: Reference: S0304-4076(09)00234-6 10.1016/j.jeconom.2009.09.021 ECONOM 3251 To appear in: Journal of Econometrics Received date: 7 September 2007 Revised date: 7 July 2009 Accepted date: 24 September 2009 Please cite this article as: Liesenfeld, R., Richard, J.-F., Dynamic invariant multinomial probit model: Identification, pretesting and estimation. Journal of Econometrics (2009), doi:10.1016/j.jeconom.2009.09.021 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. IPT ACCEPTED MANUSCRIPT SC R Dynamic Invariant Multinomial Probit Model: Identification, Pretesting and Estimation Roman Liesenfeld∗ Department of Economics, Christian Albrechts Universität, Kiel, Germany NU July 7, 2009 MA Abstract TE D We present a new specification for the multinomial multiperiod Probit model with autocorrelated errors. In sharp contrast with commonly used specifications, ours is invariant with respect to the choice of a baseline alternative for utility differencing. It also nests these standard models as special cases, allowing for data based selection of the baseline alternatives for the latter. Likelihood evaluation is achieved under an Efficient Importance Sampling (EIS) version of the standard GHK algorithm. Several simulation experiments highlight identification, estimation and pretesting within the new class of multinomial multiperiod Probit models. AC CE P peer-00628315, version 1 - 2 Oct 2011 Jean-François Richard Department of Economics, University of Pittsburgh, USA JEL classification: C35, C15 Keywords: Discrete choice, Efficient Importance sampling, Invariance, MonteCarlo integration, Panel data, Simulated maximum likelihood; Contact author: R. Liesenfeld, Institut für Statistik und Ökonometrie, Christian-AlbrechtsUniversität zu Kiel, Olshausenstraße 40-60, D-24118 Kiel, Germany; E-mail: liesenfeld@statecon.uni-kiel.de; Tel.: +49-(0)431-8803810; Fax: +49-(0)431-8807605. ∗ ACCEPTED MANUSCRIPT Introduction IPT 1 In this paper we revisit the Dynamic Multinomial (multiperiod) Probit model (hereafter DMP). DMP models offer a flexible and operational framework for SC R analyzing correlated sequences of discrete choices such as living arrangement decisions for elderlies (Börsch-Supan et al., 1990) or the brand choices in successive purchases (Keane, 1997). The standard DMP specification commonly used in the literature initially NU selected a priori among all available alternatives. It then assumes that the error terms associated with these differences follow a stationary diagonal AR(1) MA process. One common interpretation of that approach amounts to treating the selected baseline utility as non-random – see, e.g., Börsch-Supan et al., (1990) or Geweke et al. (1997). However, as we shall discuss below, the standard DMP model suffers from a major drawback in that it is not invariant with respect to TE D the choice of the baseline alternative. Specifically, DMP models derived under different baseline alternatives are non-nested and their respective parameterizations are not one-to-one transformations of one another. It follows that results (estimations or test statistics) derived under different baseline alternatives are mutually incompatible and, therefore, not easily comparable. AC CE P peer-00628315, version 1 - 2 Oct 2011 expresses all utilities in differences from that of a baseline alternative which is In the present paper we propose a dynamic version of the multinomial probit model which is specified in terms of utilities prior to differencing. It still relies upon an arbitrary baseline alternative in order to construct the likelihood function. However, parameters associated with different selections of baseline alternative will be in one-to-one correspondence with one another. Whence, our specification will be invariant with respect to that selection. 1 ACCEPTED MANUSCRIPT In addition, our Dynamic Invariant Multinomial Probit model (hereafter DIMP) IPT offers the critical advantage that it actually nests all DMP versions thereof, corresponding to different baseline categories. Whence it becomes trivial to test whether an initial DIMP model simplifies into a DMP model for a particular SC R baseline alternative (whose selection is now data based instead of arbitrary). Last but not least, the Monte Carlo (MC) evaluation of the likelihood function of the DIMP model is not more demanding than that of the standard DMP model. For the likelihood evaluation of both specification one can rely on very similar im- NU Hajivassiliou (1990) and Keane (1994). Actually, in the present paper we shall rely upon a numerically more Efficient Importance Sampling version of the GHK feld and Richard (2009). MA algorithm (hereafter GHK-EIS) as developed in the companion paper by Liesen- Invariance, nesting of DMPs, similar ease of computation offer strong incen- TE D tives for the adoption of our proposed DIMP specification by practitioners. In particular, it allows for pretesting of whether a DIMP model can be subsequently simplified into a standard DMP model under data based selection of a baseline alternative. The remainder of the paper is organized as follows. In Section 2, we use a sim- AC CE P peer-00628315, version 1 - 2 Oct 2011 plementations of the GHK probability simulator as developed by Geweke (1991), ple bivariate example in order to introduce some of the key features of the DIMP model under a simplified notation. The general DIMP specification is introduced in Section 3.1, followed by a discussion of its invariance (Section 3.2), identification (Section 3.3) and nesting properties (Section 3.4). Estimation in presented in Section 4 with a brief description of GHK-EIS (Section 4.1) followed by its application to likelihood evaluation (Section 4.2). MC experiments are presented in Section 5: First a correctly specified DIMP (Section 5.1), next a misspecified 2 ACCEPTED MANUSCRIPT DMP (Section 5.2) and finally a sample-based pretesting of a correctly specified Introductory example SC R 2 IPT DMP (Section 5.3). Section 6 concludes. Consider the case where there are only two categories with utilities given by  NU ut1  Ut =   = µ (xt ; β) + εt , ut2 (1) where µ(·) is momentarily left unspecified and εt follows a stationary AR(1) process ηt ∼ N2 (0, Σ). MA εt = Rεt−1 + ηt , (2) Assume that we only observe the difference Yt = d′ Ut with d′ = (1, −1), or later TE D only its sign. The following related three questions are central to our paper: (i) Which parameters remain identified? (ii) Under what conditions would d′ εt itself follow a stationary AR(1) process? (iii) What would be the consequences of incorrectly assuming that d′ εt follows AC CE P peer-00628315, version 1 - 2 Oct 2011  an AR(1) process? For the ease of exposition we initially consider the case when R is diagonal with diagonal elements ρ1 and ρ2 (with |ρi | < 1). Selecting an appropriate (re)parametrization helps clarifying the issues under consideration. Since the transformation from εt to d′ εt implies a reduction in dimensionality from 2 to 1 and, therefore, an (implicit) marginalization, we first introduce the auxiliary 3 ACCEPTED MANUSCRIPT     1 −1 Q= , 0 1 SC R  et  ε∗t =   = Q εt , εt2 IPT non-singular transformation (3) with et = d′ εt = εt1 − εt2 . Note that ε∗t follows the stationary AR(1) process ε∗t = R∗ ε∗t−1 + ηt∗ , ηt∗ ∼ N2 (0, Σ∗ ) , NU R∗ = QRQ−1 and Σ∗ = QΣQ′ . (5) MA Let Φ = (φij ) denote the stationary covariance matrix of εt and Φ∗ that of ε∗t Φ∗ = QΦQ′ . (6) TE D The most relevant parametrization is that which is associated with the factorization of the stationary density of ε∗t into a marginal density for et and a conditional density for εt2 |et . Whence, Φ∗ is re-parameterized as AC CE P peer-00628315, version 1 - 2 Oct 2011 with (4)   b2 Ψ   Ψ Φ∗ =  , b2 Ψ υ 2 + b22 Ψ (7) with Ψ > 0, υ 2 > 0 and b2 ∈ R. For the ease of reference, the relationships between the successive parameterizations just introduced are given by φ11 = σ12 , 1 − ρ21 φ12 = σ12 , 1 − ρ1 ρ2 4 φ22 = σ22 , 1 − ρ22 (8) ACCEPTED MANUSCRIPT υ2 = b2 = φ21 − φ22 , φ11 + φ22 − 2φ12 φ11 φ22 − φ212 . φ11 + φ22 − 2φ12 IPT Ψ = φ11 + φ22 − 2φ12 , (9) (10) SC R For obvious reasons of symmetry we shall also consider the (stationary) regression coefficient of εt1 , on (εt2 − εt1 ) which is given by b1 = φ21 − φ11 = − (1 + b2 ) . φ11 + φ22 − 2φ12 (11) NU ρ2 ) together with β. The identification for β is standard and has to be achieved by means of restrictions on the difference µ1 (xt , β) − µ2 (xt , β) while υ 2 which repre- MA sents the variance of the conditional distribution of the utility error term εt2 given et is clearly unidentified. We are left discussing identification of (Ψ, b2 , ρ1 , ρ2 ). Equations (5) to (7) imply that the stationary distribution of et is characterized TE D by the following moments: Var (et ) = Ψ, (12)   µ ¶ 1 ∗ s Cov (et−s , et ) = 1 0 (R∗ ) Φ∗   = γs Ψ, 0 AC CE P peer-00628315, version 1 - 2 Oct 2011 The parametrization used for the rest of the discussion consists of (Ψ, b2 , υ 2 , ρ1 , (13) with γs∗ = [ρs1 + b2 (ρs1 − ρs2 )] . (14) Identification results for Ψ are standard. If d′ Ut is observed, then Ψ is identified. If only the sign of d′ Ut is observed then it is identified only up to a constant and this indeterminacy is typically resolved by setting Ψ = 1. If b2 = 0, then ρ1 is identified. If b2 = −1 (b1 = 0), then ρ2 is identified. Otherwise, the triples 5 ACCEPTED MANUSCRIPT (ρ1 , ρ2 , b2 ) and (ρ2 , ρ1 , b1 ) are observationally equivalent, in which case (ρ1 , ρ2 , b2 ) IPT are locally but not globally identified. However, as we shall discuss below, global identification of the ρs and bs obtains from the diagonal elements of Cov(et−s , et ) zero). SC R when the dimension of et is greater than 1 (except on a subspace of measure It also follows from Equation (14) that if any of the following three conditions hold: (i) b1 = 0; (ii) b2 = 0; (iii) ρ1 = ρ2 , then γs∗ = (γ1∗ )s and et follows the NU et = γ1∗ et−1 + λt . (15) If neither of these conditions hold, then while it still is the case that Cov(λt , et−1 ) = MA 0 by construction, the higher order Cov(λt , et−s ) are non zero for s > 1. In other words, λt is no longer an innovation relative to {eτ }t−2 τ =1 and et no longer follows an AR(1) process. Furthermore, even though |ρi | < 1, it does not even follow that |γ1∗ | < 1 since b2 ∈ R. Actually γ1∗ is then unrestricted. Note that the first- TE D order covariance associated with γ1∗ in Equation (15) does not suffice to identify the ρs and bs by itself. As we shall formally demonstrate in Section 3.3 below, identification of these parameters requires taking into consideration the higher order covariances. The consequences of erroneously assuming that et follows an AR(1) process AC CE P peer-00628315, version 1 - 2 Oct 2011 stationary AR(1) process when neither of the conditions listed above hold depend upon the nature of the T were observed, then we would be discussing a standard observations. If {d′ Ut }t=1 Generalized Least Squares miss–specification and the ML estimate of β would remain consistent but would be inefficient. Since, furthermore, it need not be the case that |γ1∗ | < 1, we might be lead to erroneously conclude that the process is non–stationary. If, as commonly the case for discrete choice applications, we 6 ACCEPTED MANUSCRIPT only observed the signs of the differences d′ Ut , then the Hessian would no longer IPT be block diagonal and all parameter estimates would be inconsistent. Foremost, we shall see that the ρs and bs are identified except on a zeromeasure subspace and that the (EIS-)GHK algorithm applies as discussed below SC R to the more general model. Whence, there is no need to impose a priori the restriction that d′ εt follows an AR(1) process, which eventually can be pretested within the more general model introduced in Equations (1) and (2). Actually, we might not even restrict R to be diagonal. Let R = (rij ). The NU (16) MA     ∗ ∗ r11 r12  r11 − r12 r11 + r12 − (r22 + r21 ) . =  ∗ ∗ r21 r22 r21 r22 + r21 The coefficients γs∗ in Equation (13) then obtain from the recursion TE D     ∗ ∗ γs−1  γs  ,   = R∗  cs−1 cs s > 0, (17) ∗ = 0 is sufficient initialized by γ0∗ = 1 and c0 = b1 . It follows that the condition r12 for et to follow an AR(1) process (the conditions b1 = 0 or b2 = 0 are no longer ∗ ∗ sufficient if r12 6= 0). Actually, if r12 = 0, then it follows from Equations (4) and AC CE P peer-00628315, version 1 - 2 Oct 2011 corresponding elements for R∗ are then given by (16) that et = (r11 − r12 ) et−1 + (η2t − η1t ) . 7 (18) ACCEPTED MANUSCRIPT Dynamic Invariant Multinomial Probit (DIMP) 3.1 IPT 3 Model We now discuss the higher dimensional version of the pilot model introduced in SC R Section 2. Let Ut′ = (ut1 , . . . , utJ ) denote a J-dimensional vector of utilities at time t. We assume that Ut evolves according to the stationary process εt | εt−1 ∼ NJ (Rεt−1 , Σ) , NU (19) where µt = µ(xt ; β) is left unspecified since the focus of our analysis lies in the second order moments of the process and of its subsequent transformations. The MA stationary covariance matrix of εt denoted by Φ satisfies the identity Σ = Φ − RΦR′ . (20) TE D The standard approach consists of selecting a baseline alternative, say alternative J, and of expressing all utilities in difference from utJ . As in Section 2, we initially complete that transformation into the following non-singular transformation:      Yt   D  Ut∗ =   =   Ut = QUt , utJ i′J AC CE P peer-00628315, version 1 - 2 Oct 2011 U t = µt + ε t , (21) where D is the (J − 1) × J matrix . D = (IJ−1 .. − ιJ−1 ), (22) ι′J−1 = (1, . . . , 1) and i′J is the unit vector (0, . . . , 0, 1). The transformed system 8 ACCEPTED MANUSCRIPT is given by with ε∗t = Qεt , Σ∗ = QΣQ′ , R∗ = QRQ−1 ,   IJ−1 ιJ−1  = . 0 1 SC R µ∗t = Qµt , IPT ¢ ¡ ε∗t | ε∗t−1 ∼ NJ R∗ ε∗t−1 , Σ∗ , Ut∗ = µ∗t + ε∗t , Q−1 (23) (24) (25) NU ¡ ¢ Cov ε∗t−s , ε∗t = (R∗ )s Φ∗ . Φ∗ = Var (ε∗t ) = QΦQ′ , (26) MA As in Section 2, we partition Φ∗ as follows   TE D Ψb  Ψ Φ∗ =  . b′ Ψ υ 2 + b′ Ψb (27) The stationary moments of et = Dεt are given by Var (et ) = Ψ,   µ ¶ . IJ  Cov (et−s , et ) = IJ .. 0 (R∗ )s Φ∗   = Γ∗s Ψ, 0 AC CE P peer-00628315, version 1 - 2 Oct 2011 The stationary covariance matrix of {ε∗t }Tt=1 consists of the following blocks with Γ∗s = µ   . IJ  IJ .. 0 (R∗ )s   . b′ ¶ (28) (29) (30) In the next two sections we discuss the invariance and the identifications of the parametrization (Ψ, b, {Γ∗s }). 9 ACCEPTED MANUSCRIPT Invariance IPT 3.2 Suppose we select a different baseline alternative, say utj of instead of utJ . The simplest way to handle this change consists of permuting ujt and uJt before SC R applying the Q transformation introduced in Equation (21). Whence the error terms associated with the baseline alternative utj are given by −1 ∗ ε∗j t = QPj εt = QPj Q εt , (31) NU   0 , 1 (32) MA S j QPj Q−1 =  i′j where Sj is the (J − 1) × (J − 1) matrix   TE D 0  Ij−1 −ιj−1   Sj =  −1 0  ,  0   0 −ιJ−j−1 IJ−j−1 with SJ = IJ−1 , (33) Sj−1 = Sj and −i′j Sj = i′j . It follows from Equation (27) and (31) that the stationary covariance matrix of ε∗j t is given by AC CE P peer-00628315, version 1 - 2 Oct 2011 where Pj is the permutation matrix for rows J and j. Note that  S j Φ∗j =  i′j      Ψb  Sj′ ij   Ψj 0  Ψ Ψj bj   , =  ′ ′ ′ ′ 2 2 1 bj Ψj υj + bj Ψj bj b Ψ υ + b Ψb 0 1 (34) with Ψj = Sj ΨSj′ , bj = Sj′ (ij + b) , 10 υj2 = υ 2 . (35) ACCEPTED MANUSCRIPT S j (R∗j )s =  i′j It follows that Γ∗js = µ    S 0 0 s j .  (R∗ )  ′ ij 1 1 SC R  IPT Moreover (since − i′j Sj = i′j ),   . IJ  IJ .. 0 (R∗j )s   = Sj Γ∗s Sj . b′ ¶ (36) (37) NU −1 (Ψj , bj , υj2 , {Γ∗js }Ts=1 ) associated with baseline alternative j are in one–to–one cor−1 respondence with the parameters (Ψ, b, υ 2 , {Γ∗s }Ts=1 ) associated with baseline al- MA ternative J. Whence ML estimators of the multinominal probit model introduced in Equation (19) will be invariant w.r.t. the choice of the baseline alternative. In sharp contrast, the standard dynamic multinominal probit model which assumes a diagonal autocorrelation matrix for the differences relative to a partic- TE D ular alternative is not invariant relative to the choice of that alternative unless all correlations are equal. Specifically, assume that R∗ is diagonal, say R∗ = Diag (ρ∗i ) , AC CE P peer-00628315, version 1 - 2 Oct 2011 The immediate implication of Equations (35) and (37) is that the parameters i = 1, . . . , J. (38) Then it is trivial to verify that R∗j which obtains from equation (36) with s = 1 is not diagonal unless 3.3 R∗ = ρ∗ IJ . (39) Identification Next, we discuss identification of the DIMP model in Equation (19) when observations consist of {xt , jt }Tt=1 , where jt denotes the index of the alternative 11 ACCEPTED MANUSCRIPT selected at time t. Following our discussion in Section 3.1, we can arbitrarily IPT select a baseline alternative, say alternative J. Since the transformations associated with permutations of the alternatives are one-to-one we only need to discuss identification of the baseline parameters. The parameters whose identification SC R needs to be discussed are (Ψ, b, υ 2 , β, R). Identification of β has been extensively discussed in the literature – see, e.g., Bunch (1991) and Keane (1992) – and need not be considered further here. Moreover, under scale normalization of the mean vector µ(xt , β), the stationary co- NU left discussing identification of (b, R). Actually, the likelihood function depends upon the sequence of auxiliary re- MA −1 gression matrices {Γ∗s }Ts=1 introduced in Equation (30) which are clearly over– identified functions of b and R. (This will not be a problem for ML estimation since optimization will be conducted in terms of b and R themselves.) Since the TE D relationship between (b, R) and {Γ∗s } is highly non-linear we cannot expect global identification if R is left unconstrained (beyond stationarity). Local identification might be considered though in the present paper, we choose to focus our attention on the case where R is diagonal, for which we can establish global identification under simple conditions. Whence let AC CE P peer-00628315, version 1 - 2 Oct 2011 variance matrix Ψ in Equation (27) is identified while υ 2 is not. Whence, we are R = Diag (ρi ) , i = 1, . . . , J, 12 (40) ACCEPTED MANUSCRIPT s ρ1  0  ∗ Γs =  .  ..   0 0 ρs2 .. . 0   ρs1 ρsJ  − 0       s s    ··· 0   ρ2 − ρJ  +  · (b1 , b2 , . . . , bJ−1 ), .. ..    . .        s s s ρJ−1 − ρJ · · · ρJ−1 ··· (41) SC R  IPT in which case Γ∗s in Equation (30) is given by where the bj s are the individual elements of b. The following special cases are NU identified. In order to avoid dealing with all possible combinations of these special cases, we first proceed to a reduction of the matrices {Γ∗s } by eliminating all rows i for which ρi = ρJ and all columns j for which bj = 0. Moreover, we also MA permute alternatives in such a way that the remaining JR rows and JC columns are regrouped in the leading JR × JC blocks of {Γ∗s }. It follows that all bj s for j > JC and all ρi s for i > min(JR , JC ) are identified. The following theorem then TE D provides sufficient condition for global identification of (b, R): Theorem. If (i) JR > 1; (ii) JC > 0; (iii) either there exist a pair ρi 6= ρi′ (i, i′ ≤ JR ) or a bj 6= − 21 (j ≤ JC ), then (b, R) are globally identified. AC CE P peer-00628315, version 1 - 2 Oct 2011 trivial: (i) If ρi = ρJ , then ρi is identified; and (ii) if bj = 0, then (ρj , bj ) are Proof. Since we can permute alternatives, we need only to consider the leading (2 × 1) block of the reduced {Γ∗s }, say  s ρ 1  + b1 (ρs1 − b1 (ρs2 − ρsJ ) 13  ρsJ ) . ACCEPTED MANUSCRIPT ρs1 + b1 (ρs1 − ρsJ ) ≡ ρsJ − (1 + b1 )(ρsJ − ρs1 ). IPT As was the case for γs∗ in Equation (14) we note that SC R If, however, ρ1 6= ρ2 or ρ1 = ρ2 but b1 6= −1/2 (in which case −(1 + b1 ) 6= −b1 ), the off-diagonal element (2, 1) prevents permuting ρ1 and ρJ . 2 3.4 Reinterpreting the standard DMP NU 1997) requires selecting a baseline alternative, say alternative J, expressing all utilities in deviation from utJ and assuming that the differences et = Dεt fol- MA low a diagonal AR(1) process. This model is clearly not invariant with respect to the choice of the baseline alternative. Worse, in the absence of additional assumptions on the underlying utility process the DMP models associated with TE D different baseline alternatives are non-nested with one another. This probably explains why, to the best of our knowledge, data based selection of the baseline alternative has not been discussed in the literature. It turns out that our discussion of invariance and identification enables us to reinterpret the standard DMP model as one which is explicitly nested within AC CE P peer-00628315, version 1 - 2 Oct 2011 The standard DMP model (see e.g. Börsch-Supan et al., 1990 or Geweke et al., a DIMP with diagonal R matrix. The key to such reinterpretation lies with Equation (41) which indicates that a diagonal R matrix also produces diagonal Γ∗s matrices if b is zero. However, in such a case the matrices Γ∗js associated with baseline alternative j 6= J are not diagonal – confirming the lag of invariance of 14 ACCEPTED MANUSCRIPT s ρ1  0  ∗ Γjs =  .  ..   0 0 ρs2 .. . 0   ρs1 ρsj  − 0       s  s   ... 0   ρ2 − ρj  ′ −  ij , .. ..    . .        s s s ρJ−1 − ρj . . . ρJ−1 ... (42) SC R  IPT the standard DMP model. Instead, following Equation (37) they are given by which leaves ρJ as the sole unidentified autocorrelation coefficient, irrespective of NU – see also Equation (35) – indicates that the vector bj associated with baseline alternative j 6= J is given by MA bj = Sj′ ij = −ij . (43) This reinterpretation of the standard DMP model has fundamental impli- TE D cations for practitioners. Specifically, once the standard DMP model is nested within a DIMP model with diagonal R matrix, it is no longer advisable to select a priori (arbitrarily) a baseline alternative, say J, and to impose the assumption that the corresponding Γ∗s matrices be diagonal. Instead, one can use the DIMP as maintained hypothesis under a parametrization (Ψ, b, R) which is fundamen- AC CE P peer-00628315, version 1 - 2 Oct 2011 the baseline alternative. A direct comparison between Equations (41) and (42) tally invariant with respect to J. It is then trivial to test whether this DIMP simplifies into a standard DMP for baseline alternative j denoted by DMPj . The corresponding null hypotheses are given by H0j (“DMPj ”) : bj = 0, for j 6= J, and H0J (“DMPJ ”) : b = 0, (44) all of which are trivially nested within the DIMP. In view of Equation (35) these 15 ACCEPTED MANUSCRIPT H0j : b = −ij , for j 6= J, and IPT null hypotheses can all be reformulated in terms of b H0J : b = 0. (45) SC R Nesting DMPj s within a DIMP with diagonal R matrix removes all ambiguities as well as arbitrariness associated with the conventional interpretation of DMP models. Note, in particular, the complete disconnect between the selection of alternative J for parametrization purposes (with full invariance with respect to J) NU based tests of H0j are clearly invariant with respect to J). The fact that under the null hypotheses H0j : DMPj (j = 1, ..., J) the pa- MA rameter ρJ is not identified poses the problem that the asymptotic distribution of standard tests like the likelihood ratio (LR), Lagrange multiplier (LM), and Wald test are nonstandard, which means that the conventional critical values cannot be used to assess significance (see, e.g., Davies 1977,1987). In order to TE D derive asymptotic optimal tests for such testing problems, Davies (1977, 1987) and Andrews and Ploberger (1994) consider mappings (supremum and average) of the LR, LM, and Wald statistics as functions in the nuisance parameter which is unidentified under the null hypothesis, while Hansen (1996) suggests a useful computational method for simulating appropriate asymptotic critical values. AC CE P peer-00628315, version 1 - 2 Oct 2011 and the eventual subsequent data based selection of particular DMPj (likelihood The extension of these methods to our nonlinear simulated Maximum Likelihood (ML) framework seems to be conceptionally straightforward but is beyond the scope of the present paper and is left for future research. A simple but computationally more demanding alternative to those asymptotic methods is to perform bootstrap versions of the LR, LM, and Wald test, where the model under consideration is used to simulate the finite-sample distribution of the corresponding 16 ACCEPTED MANUSCRIPT 4 IPT test statistics under the null hypothesis. Estimation SC R Before we formally discuss simulated ML estimation, it should be noted that since DMP models are nested within DIMP models, the analysis which follows applies identically to both classes of models. The only difference is that a DIMP model includes the J additional parameters (b, ρJ ). This justifies our earlier claim of NU well as DIMPs. GHK and GHK-EIS MA 4.1 Let y ′ = (y1 , . . . , yM ) denote a M -dimensional Normal random vector with mean TE D µ and covariance matrix V : y = µ + Lη, η ∼ NM (0, IM ) , V = LL′ , (46) where L denotes the lower triangular Cholesky decomposition of V . The probability to be computed is that of the event D = {yi < 0; i = 1, . . . , M }. Let ℓ′τ = (γτ′ , δτ ) denote the τ th (lower triangular) row of L with γτ ∈ Rτ −1 and AC CE P peer-00628315, version 1 - 2 Oct 2011 similar ease of computation, and the analysis which follows applies to DMPs as δτ > 0. The τ th row of y is then given by yτ = µτ + γτ′ η(τ −1) + δτ ητ , 17 (47) ACCEPTED MANUSCRIPT P (D) = M Y Z IPT ′ with η(τ −1) = (η1 , . . . , ητ −1 ) and η(0) = ∅. It follows that ϕτ (η(τ ) )dη, RM τ =1 ϕτ (η(τ ) ) = I(ητ < − SC R with 1 [µτ + γτ′ η(τ −1) ]) · φ(ητ ), δτ (48) (49) where I denotes the indicator function and φ the standardized normal density. importance sampling density of the form mτ (ητ |η(τ −1) ; aτ ), (50) MA m(η; a) = M Y NU in Liesenfeld and Richard (2009). In short they rely upon a sequential parametric τ =1 with a′ = (a1 , . . . , aM ) ∈ A = ×M τ =1 Aτ . The density mτ obtains from an auxiliary TE D density kernel kτ (η(τ ) ; aτ ) with known functional integral χτ (η(τ −1) ; aτ ) in ητ , i.e., kτ (η(τ ) ; aτ ) mτ (ητ |η(τ −1) ; aτ ) = , χτ (η(τ −1) ; aτ ) χτ (η(τ −1) ; aτ ) = Z kτ (η(τ ) ; aτ )dητ . (51) R GHK-EIS relies upon a backward recursive sequence of auxiliary LS problems, selecting values of aτ which approximately minimize the MC sampling variances AC CE P peer-00628315, version 1 - 2 Oct 2011 Both GHK and GHK-EIS are importance sampling techniques that are described of the ratios ϕτ χτ +1 /kτ for τ = M, . . . , 1 (with χM +1 ≡ 1). If we momentarily assume that χτ +1 is given by the product of a gaussian kernel in η(τ ) by a Normal cdf of the form Φ(ωτ ), where ωτ is a linear combination of η(τ ) , then the only non-Gaussian kernel in the numerator ϕτ χτ +1 is Φ(ωτ ) itself. Whence kτ obtains by combining the various Gaussian kernels in that numerator with a Gaussian 18 ACCEPTED MANUSCRIPT 1 ln Φ(ωτ ) ≃ − (α̂τ ωτ2 + 2β̂τ ωr + κ̂τ ), 2 (52) SC R with âτ = (α̂τ , β̂τ , κ̂τ ). IPT approximations of Φ(ωτ ) obtained by the LS approximation Analytical integration with respect to ητ of the resulting Gaussian kernel kτ over the range associated with the indicator function in Equation (49) produces a (recursive) functional integral which is itself in the form of a Gaussian kernel NU and kτ cancel out in the ratios ϕτ χτ +1 /kτ . It follows that the GHK-EIS estimate of P (D) is given by (s) where ω̃τ MA "M −1 # S (s) Φ(ω̃τ ) 1X Y P̂s (D) = χ1 (â1 ) · , S s=1 τ =1 exp − 1 (α̂τ [ω̃τ(s) ]2 + 2β̂τ ω̃τ(s) + κ̂τ ) 2 (s) (53) (s) S is a linear combination of η̃(τ ) , and {{η̃τ }M τ =1 }s=1 denotes S i.i.d. TE D trajectories drawn from the EIS sampler m(η; â) as defined in Equation (51) with â = {âτ }M τ =1 . As discussed in Richard and Zhang (2007), the values of (s) the EIS parameters â and, therefore, the simulated {η̃τ }M τ =1 trajectories depend upon the values of the model parameters (µ, V ). Hence, continuity of the MC (s) S likelihood evaluation (53) in (µ, V ) requires that all trajectories {{η̃τ }M τ =1 }s=1 AC CE P peer-00628315, version 1 - 2 Oct 2011 times a Gaussian cdf in ωτ −1 . Note that all Gaussian kernels common to ϕτ χτ +1 drawn under different values of â be obtained by a set of Common Random Numbers (CRNs) pre-drawn from a canonical distribution, which does not depend on the parameters â. In the present context, the CRNs consist of M × S draws form a uniform distribution on [0, 1] to be transformed by inversion into truncated (s) Gaussian draws from mτ (ητ |η̃τ −1 ; âτ ). As for GHK, it ignores the factor χτ +1 in the construction of the auxiliary 19 ACCEPTED MANUSCRIPT Gaussian kernels kτ , which is equivalent to setting α̂τ = β̂τ = κ̂r = 0 in Equations IPT (52) and (53). Whence, for a given simulation sample size S GHK is intrinsically numerically less efficient than GHK-EIS but computationally faster since it does not require computing the auxiliary LS approximations in Equation (52). See SC R Liesenfeld and Richard (2009) for full details and for numerical examples which indicate that even if we increase the simulation sample size S for GHK in order to equate computing time, GHK-EIS remains the numerically more efficient NU 4.2 GHK-EIS likelihood evaluation MA Since individuals in a probit panel data set are typically assumed to act independently from one another, we consider here a single individual for whom observations consist of a sequence of T indicators of selected alternatives {j1 , . . . , jT ; jt ∈ (1, . . . , J)} together with a T × K matrix of exogenous variables. As discussed TE D in Section 3, the identified parameters are those associated with an (arbitrary) baseline alternative J and are denoted by (Ψ, b, R, β). The likelihood function is the joint probability of the observed sequence of choices. The covariance matrix V of the T (J − 1) vector e′ = (e′j1 , . . . , e′jT ) with ejt = Sjt et consists of the following blocks: AC CE P peer-00628315, version 1 - 2 Oct 2011 procedure. All computations reported below rely upon GHK-EIS. - The variance of ejt obtain from Equation (34): Var(ejt ) = Ψjt = Sjt ΨSj′ t (54) - The covariance between ejt and ejs obtain from Equation (29): Cov(ejt , ejs ) = Sjt Γ∗t−s ΨSj′ s , 20 t > s. (55) ACCEPTED MANUSCRIPT Brute force Cholesky decomposition of the joint covariance matrix V enables IPT us to rewrite the DIMP model in the form given by equation (46), from which GHK-EIS evaluation of the likelihood function proceeds as described in Section 4.1. SC R As discussed in Section 3.1, D(I)MP models obtain by marginalization with respect to the Jth component of the error term ε∗t in Equation (23). It is precisely this operation which led to the replacement of the correlation matrices (R∗ )t−s by Γ∗t−s in Equation (55) and to the subsequent need for brute force Cholesky NU position of the covariance matrix of {ε∗t }, transformed as needed to account for {jt }, can take full advantage of the present AR(1) structure as discussed in the MA lemma below. Whence, an efficient algorithm would first compute that Cholesky decomposition, and only then marginalizing with respect to the T Jth components of {ε∗t }. In order to do so, we complete the rectangular transformation from TE D ε∗t ∈ RJ into ejt ∈ RJ−1 into the square transformation   Sjt 0 ∗ ∗ e0jt =   εt = Kjt εt , 0 1 (56) = Kjt . The covariance matrix V0 of the T J-dimensional vector e′0 = with Kj−1 t AC CE P peer-00628315, version 1 - 2 Oct 2011 decomposition of the joint covariance matrix V . In contrast, the Cholesky decom- (e0j1 ′ , . . . , e0jT ′ ) consists of the following blocks: ¡ ¢ Var e0jt = Kjt Φ∗ Kj′t , ¢ ¡ Cov e0jt , e0js = Kjt (R∗ )t−s Φ∗ Kj′s , (57) t > s. (58) Note that the covariance matrix V0 preserves the initial AR(1) structure (up to a set of squared transformations), as indicated by the presence of (R∗ )t−s in Equa21 ACCEPTED MANUSCRIPT tion (58) - instead of Γ∗t−s in Equation (55). The Cholesky decomposition of V0 IPT obtains by application of the following lemma (deleting the subscripts 0, ∗ and j for the ease of notation). SC R Lemma. Let the T J-dimensional stationary covariance matrix V be partitioned into J-dimensional blocks of the form (59) NU t ≥ s, with Kt−1 = Kt . Let L denote the lower triangular Cholesky decomposition of V , partitioned conformably with V . The diagonal blocks of L obtain by the following MA J-dimensional Cholesky factorizations: L11 L′11 = K1 ΦK1′ , with TE D Ltt L′tt = Kt ΣKt′ , (60) Σ = Φ − RΦR′ , (61) and its off–diagonal blocks by the products £ ¤ Lts = Kt (R)t−s Ks Lss , AC CE P peer-00628315, version 1 - 2 Oct 2011 Vts = Kt (R)t−s ΦKs′ , t > s. (62) Proof: See Liesenfeld and Richard (2009, Appendix). Note that Kt can only take J different forms for the D(I)MP, corresponding to each of the J different alternatives (see Equation 56). Whence the Cholesky factorization of the T J-dimensional matrix V0 requires at most J different Jdimensional Cholesky factorizations as given in Equations (60) and (61). The extension to V0 only requires two trivial modifications of the GHK-EIS algorithm for V : (i) Since the baseline utility uJt is not observed the corresponding 22 ACCEPTED MANUSCRIPT integral with respect to the last component of e0jt (t = 1, . . . , T ) is un-truncated. IPT It produces a probability Φ(·) equal to 1 in Equations (52) and (53) with the corresponding EIS coefficients equal to zero; (ii) The stationary covariance matrix Φ∗ as given by Equations (26) and (27) includes an additional parameter υ 2 which without affecting the likelihood values. 5.1 NU Monte Carlo simulations ML-GHK-EIS estimates of the DIMP model MA In order to illustrate the identification and ease of estimation of DIMP models we conducted a Monte Carlo experiment for a DIMP panel model with J = 3 alternatives, T = 10 periods and N = 500 individuals. We arbitrarily selected alternative 3 as the baseline (remembering that DIMP models are invariant with TE D respect to the selection of the baseline alternative). Since our focus of attention lies on second order moments we specified the regression function of the utilities, as given in Equation (19) directly in differences as Dµit = Dµi (xt ; β) = (π1 + ψZit1 , π2 + ψZit2 )′ , AC CE P peer-00628315, version 1 - 2 Oct 2011 5 SC R is unidentified and can, therefore, be set equal to any (reasonable) arbitrary value (63) with β ′ = (π1 , π2 , ψ) and Zitj ∼ i.i.d.N(0, 1) for t = 1, . . . , T , i = 1, ..., N and j ∈ {1, 2}. Note that this specification represents a simplified version of that used by Geweke et al. (1997). The autocorrelation matrix R in Equation (19) is diagonal with elements ρ1 = 0.8, ρ2 = 0.6 and ρ3 = 0.3. The covariance matrix Σ = [σik ] has diagonal elements σ11 = σ22 = σ33 = 1 and off-diagonal elements σ12 = σ13 = σ23 = 0.3. 23 ACCEPTED MANUSCRIPT is given by   SC R   3.087   . Φ∗ =  0.915 1.930     −0.704 −0.733 1.099 IPT The corresponding stationary covariance matrix Φ∗ as defined in Equation (26) (64) Its partitioning according to Equation (27) produces the following values for  0  1.757 Λ=  0.521 1.288   0.134 b = − . 0.316 NU  (65) MA Since we are leaving Dµit unnormalized, Ψ is identified only up to a scale factor. Whence, for ease of comparison with the parameter true values we set λ11 equal to its true value 1.757. Had we set it equal to one as commonly done, we would 1.757. TE D have to divide the true values for (π1 , π2 , ψ, λ11 , λ12 , λ22 ) and their estimates by ML-GHK-EIS estimates are computed using for the likelihood evaluations (53) a simulation sample size S = 20 which, as we shall see, suffices to produce high numerical accuracy in likelihood function IS estimates and corresponding ML parameter estimates. Following Richard and Zhang (2007) we conduct two AC CE P peer-00628315, version 1 - 2 Oct 2011 Λ = [λik ], the Cholesky decomposition of Ψ, and b sets of replications. For the first one we keep the simulated data set fixed and replicate ML-GHK-EIS estimation 100 times under different sets of CRNs used for the likelihood evaluations. The MC standard deviations of these replications are direct measures of numerical accuracy (accounting in particular for the simulation based selection of the EIS samplers). Since these replications are conducted for a fixed data set, we also computed Root Mean Squared Errors (RMSEs) relative to the “true” ML estimates (as computed using a simulation sample size S = 1, 000, 24 ACCEPTED MANUSCRIPT which suffices to produce highly accurate approximations to the infeasible true IPT ML estimates). Our second set of replications repeats ML-GHK-EIS under 100 different simulated data sets (using a fixed set of CRNs for all 100 replications). The resulting SC R means and standard deviations are numerical estimates of the finite sample (statistical) distribution of the ML estimators. We also produced for comparison asymptotic standard deviations based upon the inverse Hessian (since EIS likelihood estimates are sufficiently smooth to produce numerically well-behaved esti- NU confirm that the DIMP model is identified and statistically well-behaved, including the parameters (b1 , b2 , ρ3 ) which are not identified under the standard DMP MA (unless, as discussed in Section 3.4, the latter is reinterpreted as nested within the DIMP in which case b1 = b2 = 0 a priori though ρ3 remains unidentified). The finite sample standard deviations are close to their asymptotic counterparts. TE D The numerical (MC) standard deviations are orders of magnitude smaller than the statistical standard deviations (with ratios ranging from 0.016 to 0.050). Though the results are not reported here, we also reproduced the entire simulation exercise under ML-GHK. Our findings confirm results presented in Liesenfeld and Richard (2009) in that for a common simulation sample size S = 20 ML- AC CE P peer-00628315, version 1 - 2 Oct 2011 mates of the Hessian). The results are reproduced in Table 1. They unequivocally GHK estimates are numerically far less accurate than ML-GHK-EIS estimates especially for the DIMP parameters (b1 , b2 , ρ1 , ρ2 , ρ3 ) with ratios of numerical standard deviations ranging from 8 to 95. In that paper, where we provide a more detailed comparison of GHK and GHK-EIS for the estimation of the standard DMP model, we also produce results in which we increase the simulation sample size of the computationally less demanding GHK from 20 to 100 in order to roughly equalize computing time for both procedures. Doing so we found 25 ACCEPTED MANUSCRIPT that GHK-EIS with S = 20 remains numerically more efficient than GHK with IPT S = 100, though with efficiency gain reductions by a factor ranging between 1.2 and 2.1 relative to the comparison for a common simulation sample size S = 20 (see Liesenfeld and Richard, 2009, Tables 2 and 4). SC R For each of the 100 ML-GHK-EIS estimates of the DIMP model under different simulated data sets we computed the three Wald statistics testing whether the DIMP simplifies into a corresponding DMPj model, which amounts to testing the restrictions b = −ι1 (DMP1 ), b = −ι2 (DMP2 ), and b = 0 (DMP3 ). Figure 1 NU for all three hypotheses nearly all values of the Wald statistic are significantly larger than the ‘naive’ 1% critical value from a χ2(2) -distribution, which is used as MA a preliminary benchmark value. Of course, one would need, as mentioned above, to adjust the naive critical value since it ignores the fact that we are ‘scanning’ across a range of values for the nuisance parameter ρ3 which is unidentified under TE D the null. This suggests that the appropriate critical value is larger than the naive one (see Davis, 1987). However, as shown in Section 5.3 below, the appropriate critical value in the present context remains in the same ballpark as its naive counterpart. Hence, the comparably large values of the Wald-statistics in Figure 1 suggest that such a Wald-type procedure is a useful tool to pretest whether AC CE P peer-00628315, version 1 - 2 Oct 2011 provides the histogram of the values for the three Wald statistics. As expected, an initial DIMP model simplifies into a DMP model for a particular baseline alternative. The main conclusions to be drawn from this simulation exercise are that: (i) DIMP models appear to be statistically well-behaved, as expected in view of the identification theorem in Section 3.3; (ii) ML-GHK-EIS can produce numerically accurate ML estimates with as little as S = 20 EIS draws; and (iii) a Wald test used to pretest whether an initial DIMP model can be substituted by a standard 26 ACCEPTED MANUSCRIPT 5.2 IPT DMP model with a particular reference category appears to have good power. Estimates of a miss-specified DMP model SC R In order to analyze the impact of erroneously assuming a DMPj model, we used the same simulated data sets from the DIMP model as in Section 5.1 and computed the ML-GHK-EIS estimates for a miss-specified DMP3 model, i.e., the DIMP model with the restriction b = 0 and unidentified ρ3 . The results are NU of the estimates for the mean parameters π01 , π02 , and ψ are comparably small while, in contrast, the estimates for the covariance parameters λ12 , ρ1 , ρ2 are MA typically severely biased due to the miss-specification. Based on the likelihood values of the miss-specified DMP3 model at the parameter estimates and those of the DIMP model obtained in Section 5.1 we computed the values of the LR statistic testing for the DMP3 model. The lower right panel TE D of Figure 1 provides the histogram of the LR values. Consistent with the results for the Wald statistic (see lower left panel of Figure 1) most of the LR values are significantly larger than the naive asymptotic critical value. However note that the finite-sample variation of the LR-statistic appears to be much smaller than that of the Wald statistic. AC CE P peer-00628315, version 1 - 2 Oct 2011 reproduced in the left columns of Table 2. They indicate that (statistical) biases 5.3 Testing for a DMP model In order to analyze the finite-sample distribution of the Wald statistic under the null hypothesis of a DMP model, we simulated 100 data sets from a DMP3 model and computed ML-GHK-EIS estimates for an unrestricted DIMP specification together with the corresponding Wald statistic for the DMP3 model. 27 ACCEPTED MANUSCRIPT Under the restriction b = (b1 , b2 )′ = 0, the DIMP model with J = 3 alterna- IPT tives and j = 3 as baseline alternative reduces to a DMP3 model. This amounts to setting the off-diagonal blocks of the stationary covariance matrix Φ∗ given by Ψb and b′ Ψ equal to zero (see Equations 26 and 27). This restriction on Φ∗ is the initial specification (19) satisfy the Equations 1 − ρ1 ρ3 σ33 1 − ρ23 σ23 = 1 − ρ2 ρ3 σ33 . 1 − ρ23 (66) NU and We consider the following values for the (R, Σ) parameters: (ρ1 , ρ2 , ρ3 , σ11 , σ22 , σ33 ) = (0.8, 0.6, 0.3, 1, 1, 0.2), together with the restrictions given in Equation (66). MA The implied values for the identified parameters for the Cholesky decomposition of Ψ, and b are (λ12 , λ22 , b1 , b2 ) = (0.223, 1.137, 0, 0). The parameter estimates of the unrestricted DIMP model for the simulated data from a DMP3 specification are reported in the right columns of Table 2. TE D As expected, the ML-GHK-EIS estimator is statistically well-behaved, except for the parameter ρ3 , which is unidentified under the data generating DMP3 model. Note also that the means of the b1 and b2 estimates are not significantly different from zero. The histograms for the Wald statistics testing for the DMP1 , DMP2 , and DMP3 model are provided in Figure 2. As for the Wald statistic testing AC CE P peer-00628315, version 1 - 2 Oct 2011 σ13 = SC R satisfied if the parameters of the autocorrelation and covariance matrix (R, Σ) in the DMP3 specification (see lower panel of Figure 1), in 2% of the cases one would reject the null using the naive 1% asymptotic critical value, in 8% for the 5% critical value, and in 10% for the 10% critical value. This clearly suggests that the naive asymptotic χ2 distribution delivers a fairly good approximation. Hence, the appropriate critical values which account for the fact that the nuisance parameter ρ3 is unidentified under the null can be expected to be fairly close to 28 ACCEPTED MANUSCRIPT the naive ones. Finally, we observe that the histograms for the Wald statistic IPT testing for a DMP1 and DMP2 model (see upper panels of Figure 2) indicate large deviations from the specification under the corresponding null, as expected. The main conclusion to be drawn from this simulation experiment are that: SC R (i) the naive asymptotic distribution of the Wald statistic under the null of a DMP model provides a fairly good approximation to the true one, and (ii) the Wald test appears to be able to discriminate between different DMP specifications obtained by using different baseline categories, allowing for data based selection NU Predictions under a miss-specified DMP model MA 5.4 The histograms in Figure 1 show that when the true model is a DIMP model, both the Wald- and LR-test reject the miss-specified DMP models at very high rates. However, it is well-known that a miss-specified model can always be rejected given TE D enough data even if the miss-specification leads to only trivial deterioration in fit and predictions. This raises the question of whether a miss-specified DMP model produces significant differences in substantively important predictions relative to a correctly specified DIMP. Note that both models share common specifications for their means with similar estimated mean coefficients in Tables 1 and 2 and AC CE P peer-00628315, version 1 - 2 Oct 2011 of the latter. differ only in their error correlation structure. It follows, for example, that we should expect DMP miss-specification to have a greater impact on predicted transition probabilities than on predicted marginal selection probabilities. In order to substantiate these conjectures, we compare marginal selection probabilities and transition probabilities and their responses to changes in the covariates under the DIMP and DMP3 model estimated in Sections 5.1 and 5.2, setting the parameters at their estimated values in Tables 1 and 2, respectively. 29 ACCEPTED MANUSCRIPT In this context the DIMP parameters serve as true values while the DMP3 pa- IPT rameters implicitly represent pseudo-true values. Probabilities are accurately estimated by Monte Carlo using 1,000,000 simulated sequences of selected alternatives {ji1 , ..., jiT } for each model. and P (jit |Zit1 = 0, Zit2 = 1) − P (jit |Zit1 = Zit2 = 0), for jit = 1, 2, 3. NU P (jit |Zit1 = 1, Zit2 = 0) − P (jit |Zit1 = Zit2 = 0), As expected, percentage differences remain fairly small ranging from -6.82% to MA 7.34%. For reference, the selection probabilities P (jit |Zit1 = Zit2 = 0) for jit = 1, 2, 3 are (0.573, 0.086, 0.342) under the DIMP model and (0.571, 0.084, 0.345) under the DMP3 model. the covariates TE D In Table 4 we report the transition probabilities evaluated at the means of P (jit |jit−1 , Zi.1 = Zi.2 = 0), for (jit , jit−1 ) ∈ {1, 2, 3} × {1, 2, 3}, where Zi.j = (Zi1j , ..., ZiT j ), and observe larger percentage differences ranging AC CE P peer-00628315, version 1 - 2 Oct 2011 (j = 1, 2) on the marginal selection probabilities SC R In Table 3 we compare the partial effects of changes in the covariates Zitj from -16.11% to 29.65%. Finally, we report in Table 5 the partial effects of a permanent change in the covariates on the transition probabilities P (jit |jit−1 , Zi.1 = 1, Zi.2 = 0) − P (jit |jit−1 , Zi.1 = Zi.2 = 0), P (jit |jit−1 , Zi.1 = 0, Zi.2 = 1) − P (jit |jit−1 , Zi.1 = Zi.2 = 0). 30 and ACCEPTED MANUSCRIPT IPT Here we observe the largest percentage changes ranging from -19.35% to 59.65%. These results fully confirm our conjecture that miss-specified DMP models can significantly distort predicted transition probabilities and their responses to SC R changes in the covariates even when marginal selection probabilities are affected to a much lesser extent. We suspect that we could easily induce larger distortions of the predicted transition probabilities by assigning autocorrelation parameters ρj over the full range from −1 to +1 for the DIMP model instead of the [0.3 , 0.8] NU model in any substantive empirical application analyzing the dynamics of decision processes since the cost of generalizing a DMP estimation algorithm into a DIMP MA one essentially consists of a one-time implementation cost associated with modifications of the Cholesky decomposition of the corresponding joint covariance 6 TE D required for GHK(-EIS). Conclusion We have proposed a new specification for the multinomial multiperiod model with autocorrelated errors which is invariant with respect to the selection of the base- AC CE P peer-00628315, version 1 - 2 Oct 2011 interval used here. Therefore, it appears prudent to initially estimate a DIMP line alternative, in sharp contrast with commonly used models. Furthermore, it identifies parameters which are not identified under the standard approach (such as the parameter governing the dynamics of the utility for the reference category). The formal identification of the proposed invariant model has been illustrated by MC experiments. Since our model also nests the standard specifications as special cases, it becomes feasible to test whether it simplifies into a standard model for a particular baseline alternative. 31 ACCEPTED MANUSCRIPT IPT Acknowledgement We are grateful to two anonymous referees for their helpful comments which have produced major clarifications on several key issues. Roman Liesenfeld ac- SC R knowledges research support provided by the Deutsche Forschungsgemeinschaft (DFG) under grant HE 2188/1-1; Jean-François Richard acknowledges the research support provided by the National Science Foundation (NSF) under grant NU MA TE D AC CE P peer-00628315, version 1 - 2 Oct 2011 SES-0516642. 32 ACCEPTED MANUSCRIPT IPT References Andrews, D.W.K., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present under the alternative. Econometrica 62, 1383–1414. SC R Bunch, D.S., 1991. Estimability in the multinomial probit model. Transportation Research B 25, 1–12. Börsch-Supan, A., Hajivassiliou, V., Kotlikoff, L., Morris, J. 1990. Health, children, and elderly living arrangements: a multiperiod multinomial probit model NU No. 3343. Davis, R.B., 1977. Hypothesis testing when a nuisance parameter is present only MA under the alternative. Biometrika 64, 247–254. Davis, R.B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43. TE D Geweke, J., 1991. Efficient simulation from the multivariate normal and studentt distributions subject to linear constraints. Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571–578. Geweke, J., Keane, M., Runkle, D. 1997. Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics 80, 125–165. AC CE P peer-00628315, version 1 - 2 Oct 2011 with unobserved heterogeneity and autocorrelated errors. NBER working paper Hajivassiliou, V., 1990. Smooth simulation estimation of panel data LDV models. Mimeo. Yale University. Hansen, B., 1996. Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64, 413–430. Keane, M., 1992. A note on identification in the multinomial probit model. Journal of Business & Economics Statistics 10, 193–200. 33 ACCEPTED MANUSCRIPT Keane, M., 1994. A computationally practical simulation estimator for panel data. IPT Econometrica 62, 95–116. Keane, M., 1997. Modeling heterogeneity and state dependence in consumer choice SC R behavior. Journal of Business and Economic Statistics 15, 310–327. Liesenfeld, R., Richard, J.-F., 2009. Efficient estimation of Probit models with correlated errors. Working paper, University of Kiel. Richard, J.-F., Zhang, W., 2007. Efficient high-dimensional importance sampling. NU MA TE D AC CE P peer-00628315, version 1 - 2 Oct 2011 Journal of Econometrics 141, 1385–1411. 34 ACCEPTED MANUSCRIPT diff. data sets ML-GHKEIS .495 .063 .063 .049 true .500 fixed data set/diff. CRNs MLvalue .537 SC R mean std. dev. rmse mean asy. s.e. ML-GHKEIS .539 .0009 .0023 mean std. dev. rmse mean asy. s.e. −1.000 −1.008 .083 .084 .066 ψ mean std. dev. rmse mean asy. s.e. 1.000 1.003 .036 .036 .030 λ12 mean std. dev. rmse mean asy. s.e. .521 λ22 mean std. dev. rmse mean asy. s.e. 1.288 b1 mean std. dev. rmse mean asy. s.e. −.134 −.139 .095 .095 .068 b2 mean std. dev. rmse mean asy. s.e. −.316 −.313 .099 .099 .072 −.427 −.418 .0040 .0098 mean std. dev. rmse mean asy. s.e. .800 .802 .029 .029 .024 .775 .775 .0008 .0008 ρ1 −.961 −.968 .0027 .0079 1.020 1.024 .0009 .0044 .523 .087 .087 .066 .550 .552 .0051 .0054 1.288 .073 .073 .056 1.291 1.296 .0026 .0059 TE D MA NU π02 AC CE P peer-00628315, version 1 - 2 Oct 2011 Parameter π01 IPT Table 1. ML-EIS-GHK estimates for the DIMP model 35 .0003 −.004 .0024 .0051 Table 1. Continued Parameter ρ2 mean std. dev. rmse mean asy. s.e. .300 .245 .160 .169 .110 MLvalue .687 .317 ML-GHKEIS .686 .0019 .0021 .302 .0055 .0166 TE D MA NU NOTE: The reported numbers for ML-GHK-EIS are mean, standard deviation, RMSE, and the mean of the asymptotic standard errors obtained for S = 20. The asymptotic standard errors are obtained from a numerical approximation to the Hessian. For the experiment with a fixed data set and different CRNs, RMSE is computed around the true ML value for that particular data set. The true ML values are the ML-GHK-EIS estimates based on S = 1000. AC CE P peer-00628315, version 1 - 2 Oct 2011 ρ3 mean std. dev. rmse mean asy. s.e. ML-GHKEIS .594 .066 .066 .045 true .600 fixed data set/diff. CRNs SC R diff. data sets IPT ACCEPTED MANUSCRIPT 36 ACCEPTED MANUSCRIPT IPT Table 2. ML-EIS-GHK estimates for the DIMP/DMP model mean std. dev. rmse mean asy. s.e. true .500 −1.000 ψ mean std. dev. rmse mean asy. s.e. 1.000 λ12 mean std. dev. rmse mean asy. s.e. .521 λ22 mean std. dev. rmse mean asy. s.e. 1.288 b1 mean std. dev. rmse mean asy. s.e. b2 ρ1 −1.043 .088 .098 .073 true .500 ML-GHKEIS .515 .064 .066 .051 −1.000 −.994 .116 .116 .070 1.005 .036 .037 .031 1.000 1.001 .047 .047 .032 .423 .094 .136 .075 .223 .245 .079 .081 .065 1.264 .072 .076 .060 1.137 1.141 .093 .093 .058 −.134 .000 −.010 .410 .410 .129 mean std. dev. rmse mean asy. s.e. −.316 .000 .004 .247 .247 .143 mean std. dev. rmse mean asy. s.e. .800 .800 .810 .082 .082 .025 NU mean std. dev. rmse mean asy. s.e. ML-GHKEIS .469 .063 .071 .052 TE D MA π02 AC CE P peer-00628315, version 1 - 2 Oct 2011 Parameter π01 true model: DMP3 / est. model: DIMP SC R true model: DIMP / est. model: DMP3 .726 .017 .075 .014 37 Table 2. Continued Parameter ρ2 ML-GHKEIS .537 .035 .072 .029 mean std. dev. rmse mean asy. s.e. true .600 NU .300 ML-GHKEIS .604 .052 .052 .037 .189 .891 .898 .346 TE D MA NOTE: The reported numbers for ML-GHK-EIS are mean, standard deviation, RMSE, and the mean of the asymptotic standard errors obtained for S = 20. The asymptotic standard errors are obtained from a numerical approximation to the Hessian. AC CE P peer-00628315, version 1 - 2 Oct 2011 ρ3 mean std. dev. rmse mean asy. s.e. true .600 true model: DMP3 / est. model: DIMP SC R true model: DIMP / est. model: DMP3 IPT ACCEPTED MANUSCRIPT 38 ACCEPTED MANUSCRIPT ∆Zit1 = 1 jit = 1 jit = 2 jit = 3 jit = 1 DIMP .207 −.045 −.162 −.085 DMP3 .206 −.042 −.164 −.51 −6.82 1.24 −.079 −6.51 AC CE P TE D MA NU Difference in percent peer-00628315, version 1 - 2 Oct 2011 ∆Zit2 = 1 39 jit = 2 jit = 3 .171 −.086 SC R Alternative IPT Table 3. Partial effects of a change in Zitj on the selection probabilites .172 −.093 .47 7.34 ACCEPTED MANUSCRIPT jit = 1 jit = 2 jit−1 = 1 DIMP DMP3 Difference in percent .781 .788 .86 jit−1 = 2 DIMP DMP3 Difference in percent .268 .347 29.65 jit−1 = 3 DIMP .300 .100 DMP3 .267 .089 Difference in percent −11.08 −10.63 40 jit = 3 .180 .167 −7.36 SC R .039 .045 16.68 .350 .382 .332 .321 −5.09 −16.11 NU MA TE D AC CE P peer-00628315, version 1 - 2 Oct 2011 Alternative IPT Table 4. Transition probabilities .600 .644 7.31 ACCEPTED MANUSCRIPT ∆Zit1 = 1 ∆Zit2 = 1 jit = 1 jit = 2 jit = 3 jit = 1 jit = 2 jit = 3 −.043 −.042 −3.53 DIMP DMP3 Difference in percent .099 .096 −3.27 −.019 −.080 −.022 −.074 14.78 −7.55 −.033 −.046 40.06 .076 .088 15.37 jit−1 = 2 DIMP DMP3 Difference in percent .121 .135 11.32 −.048 −.059 24.20 −.045 −.072 59.65 .188 −.142 .187 −.115 −.28 −19.35 jit−1 = 3 DIMP DMP3 Difference in percent .133 .122 −8.44 −.026 −.107 −.021 −.101 18.90 −5.88 SC R jit−1 = 1 −.074 −.076 3.02 NU MA TE D AC CE P peer-00628315, version 1 - 2 Oct 2011 Alternative IPT Table 5. Partial effects of a permanent change in Zitj on the transition probabilities 41 −.042 .148 −.043 .142 1.85 −4.36 −.106 −.100 −6.79 NU MA TE D Figure 1. Histogram of the Wald statistic of H0 : DMPj for j = 1 (upper left panel), j = 2 (upper right panel), j = 3 (lower left panel) and histogram of the likelihood-ratio statistic for H0 : DMP3 (lower right panel); The true data generating process is the DIMP model as given by Equations (63) to (65). The vertical line marks the 99%-percentile of a χ2 -distribution with 2 degrees of freedom given by 9.21. AC CE P peer-00628315, version 1 - 2 Oct 2011 SC R IPT ACCEPTED MANUSCRIPT 42 NU MA TE D Figure 2. Histogram of the Wald statistic of H0 : DMPj for j = 1 (upper left panel), j = 2 (upper right panel), j = 3 (lower left panel); The true data generating process is the DMP3 model. The vertical line marks the 99%-percentile of a χ2 -distribution with 2 degrees of freedom given by 9.21. AC CE P peer-00628315, version 1 - 2 Oct 2011 SC R IPT ACCEPTED MANUSCRIPT 43