Author manuscript, published in "Journal of Econometrics 155, 2 (2009) 117"
DOI: 10.1016/j.jeconom.2009.09.021

Accepted Manuscript

Dynamic invariant multinomial probit model: Identification, pretesting and estimation

Roman Liesenfeld, Jean-François Richard

PII: S0304-4076(09)00234-6
DOI: 10.1016/j.jeconom.2009.09.021
Reference: ECONOM 3251
To appear in: Journal of Econometrics
Received date: 7 September 2007
Revised date: 7 July 2009
Accepted date: 24 September 2009

peer-00628315, version 1 - 2 Oct 2011

Please cite this article as: Liesenfeld, R., Richard, J.-F., Dynamic invariant multinomial probit model: Identification, pretesting and estimation. Journal of Econometrics (2009), doi:10.1016/j.jeconom.2009.09.021
Dynamic Invariant Multinomial Probit Model: Identification, Pretesting and Estimation

Roman Liesenfeld∗
Department of Economics, Christian Albrechts Universität, Kiel, Germany

Jean-François Richard
Department of Economics, University of Pittsburgh, USA

July 7, 2009

Abstract

We present a new specification for the multinomial multiperiod Probit model with autocorrelated errors. In sharp contrast with commonly used specifications, ours is invariant with respect to the choice of a baseline alternative for utility differencing. It also nests these standard models as special cases, allowing for data based selection of the baseline alternatives for the latter. Likelihood evaluation is achieved under an Efficient Importance Sampling (EIS) version of the standard GHK algorithm. Several simulation experiments highlight identification, estimation and pretesting within the new class of multinomial multiperiod Probit models.

JEL classification: C35, C15
Keywords: Discrete choice; Efficient importance sampling; Invariance; Monte Carlo integration; Panel data; Simulated maximum likelihood

∗ Contact author: R. Liesenfeld, Institut für Statistik und Ökonometrie, Christian-Albrechts-Universität zu Kiel, Olshausenstraße 40-60, D-24118 Kiel, Germany; E-mail: liesenfeld@statecon.uni-kiel.de; Tel.: +49-(0)431-8803810; Fax: +49-(0)431-8807605.
1 Introduction

In this paper we revisit the Dynamic Multinomial (multiperiod) Probit model (hereafter DMP). DMP models offer a flexible and operational framework for analyzing correlated sequences of discrete choices, such as living arrangement decisions of the elderly (Börsch-Supan et al., 1990) or brand choices in successive purchases (Keane, 1997).

The standard DMP specification commonly used in the literature initially expresses all utilities in differences from that of a baseline alternative which is selected a priori among all available alternatives. It then assumes that the error terms associated with these differences follow a stationary diagonal AR(1) process. One common interpretation of that approach amounts to treating the selected baseline utility as non-random – see, e.g., Börsch-Supan et al. (1990) or Geweke et al. (1997). However, as we shall discuss below, the standard DMP model suffers from a major drawback in that it is not invariant with respect to the choice of the baseline alternative. Specifically, DMP models derived under different baseline alternatives are non-nested and their respective parameterizations are not one-to-one transformations of one another. It follows that results (estimates or test statistics) derived under different baseline alternatives are mutually incompatible and, therefore, not easily comparable.

In the present paper we propose a dynamic version of the multinomial probit model which is specified in terms of utilities prior to differencing. It still relies upon an arbitrary baseline alternative in order to construct the likelihood function. However, parameters associated with different selections of the baseline alternative are in one-to-one correspondence with one another. Whence, our specification is invariant with respect to that selection.
In addition, our Dynamic Invariant Multinomial Probit model (hereafter DIMP) offers the critical advantage that it actually nests all DMP versions thereof, corresponding to different baseline categories. Whence it becomes trivial to test whether an initial DIMP model simplifies into a DMP model for a particular baseline alternative (whose selection is now data based instead of arbitrary).

Last but not least, the Monte Carlo (MC) evaluation of the likelihood function of the DIMP model is no more demanding than that of the standard DMP model. For the likelihood evaluation of both specifications one can rely on very similar implementations of the GHK probability simulator as developed by Geweke (1991), Hajivassiliou (1990) and Keane (1994). Actually, in the present paper we shall rely upon a numerically more efficient Importance Sampling version of the GHK algorithm (hereafter GHK-EIS) as developed in the companion paper by Liesenfeld and Richard (2009).

Invariance, nesting of DMPs and similar ease of computation offer strong incentives for the adoption of our proposed DIMP specification by practitioners. In particular, it allows for pretesting of whether a DIMP model can subsequently be simplified into a standard DMP model under data based selection of a baseline alternative.

The remainder of the paper is organized as follows. In Section 2, we use a simple bivariate example in order to introduce some of the key features of the DIMP model under a simplified notation. The general DIMP specification is introduced in Section 3.1, followed by a discussion of its invariance (Section 3.2), identification (Section 3.3) and nesting properties (Section 3.4). Estimation is presented in Section 4 with a brief description of GHK-EIS (Section 4.1) followed by its application to likelihood evaluation (Section 4.2). MC experiments are presented in Section 5: first a correctly specified DIMP (Section 5.1), next a misspecified DMP (Section 5.2) and finally a sample-based pretesting of a correctly specified DMP (Section 5.3). Section 6 concludes.

2 Introductory example
Consider the case where there are only two categories with utilities given by

\[ U_t = \begin{pmatrix} u_{t1} \\ u_{t2} \end{pmatrix} = \mu(x_t; \beta) + \varepsilon_t, \tag{1} \]

where µ(·) is momentarily left unspecified and εt follows a stationary AR(1) process

εt = Rεt−1 + ηt,  ηt ∼ N2(0, Σ).  (2)

Assume that we only observe the difference Yt = d′Ut with d′ = (1, −1), or later only its sign. The following three related questions are central to our paper:

(i) Which parameters remain identified?

(ii) Under what conditions would d′εt itself follow a stationary AR(1) process?

(iii) What would be the consequences of incorrectly assuming that d′εt follows an AR(1) process?

For ease of exposition we initially consider the case where R is diagonal with diagonal elements ρ1 and ρ2 (with |ρi| < 1). Selecting an appropriate (re)parametrization helps clarify the issues under consideration. Since the transformation from εt to d′εt implies a reduction in dimensionality from 2 to 1 and, therefore, an (implicit) marginalization, we first introduce the auxiliary non-singular transformation

\[ \varepsilon_t^* = \begin{pmatrix} e_t \\ \varepsilon_{t2} \end{pmatrix} = Q\,\varepsilon_t, \qquad Q = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \tag{3} \]
with et = d′εt = εt1 − εt2. Note that ε*t follows the stationary AR(1) process

ε*t = R* ε*t−1 + η*t,  η*t ∼ N2(0, Σ*),  (4)

with

R* = QRQ−1 and Σ* = QΣQ′.  (5)

Let Φ = (φij) denote the stationary covariance matrix of εt and Φ* that of ε*t:

Φ* = QΦQ′.  (6)

The most relevant parametrization is that associated with the factorization of the stationary density of ε*t into a marginal density for et and a conditional density for εt2 | et. Whence, Φ* is re-parameterized as

\[ \Phi^* = \begin{pmatrix} \Psi & b_2\Psi \\ b_2\Psi & \upsilon^2 + b_2^2\Psi \end{pmatrix}, \tag{7} \]
with Ψ > 0, υ² > 0 and b2 ∈ R. For ease of reference, the relationships between the successive parameterizations just introduced are given by

φ11 = σ1²/(1 − ρ1²),  φ12 = σ12/(1 − ρ1ρ2),  φ22 = σ2²/(1 − ρ2²),  (8)

Ψ = φ11 + φ22 − 2φ12,  (9)

b2 = (φ12 − φ22)/(φ11 + φ22 − 2φ12),  υ² = (φ11φ22 − φ12²)/(φ11 + φ22 − 2φ12).  (10)

For obvious reasons of symmetry we shall also consider the (stationary) regression coefficient of εt1 on (εt2 − εt1), which is given by

b1 = (φ12 − φ11)/(φ11 + φ22 − 2φ12) = −(1 + b2).  (11)
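As a numerical sanity check on Equations (8)–(11), the following sketch maps an illustrative choice of (ρ1, ρ2, Σ) into Φ, then into (Ψ, b2, υ²), and verifies the result against Φ* = QΦQ′ from Equations (6)–(7). The parameter values are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical illustration values (not from the paper)
rho1, rho2 = 0.8, 0.3                  # diagonal elements of R
s11, s12, s22 = 1.0, 0.4, 1.5          # elements of Sigma

# Equation (8): stationary covariances of eps_t
phi11 = s11 / (1 - rho1**2)
phi12 = s12 / (1 - rho1 * rho2)
phi22 = s22 / (1 - rho2**2)

# Equations (9)-(10)
Psi = phi11 + phi22 - 2 * phi12
b2 = (phi12 - phi22) / Psi
v2 = (phi11 * phi22 - phi12**2) / Psi

# Equation (11)
b1 = (phi12 - phi11) / Psi
assert np.isclose(b1, -(1 + b2))

# Cross-check against Phi* = Q Phi Q' of Equations (6)-(7)
Q = np.array([[1.0, -1.0], [0.0, 1.0]])
Phi = np.array([[phi11, phi12], [phi12, phi22]])
Phi_star = Q @ Phi @ Q.T
assert np.isclose(Phi_star[0, 0], Psi)
assert np.isclose(Phi_star[0, 1], b2 * Psi)
assert np.isclose(Phi_star[1, 1], v2 + b2**2 * Psi)
```

The identity b1 = −(1 + b2) holds for any admissible parameter values, since b1 + b2 = −Ψ/Ψ = −1.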
The parametrization used for the rest of the discussion consists of (Ψ, b2, υ², ρ1, ρ2) together with β. The identification of β is standard and has to be achieved by means of restrictions on the difference µ1(xt, β) − µ2(xt, β), while υ², which represents the variance of the conditional distribution of the utility error term εt2 given et, is clearly unidentified. We are left discussing identification of (Ψ, b2, ρ1, ρ2).

Equations (5) to (7) imply that the stationary distribution of et is characterized by the following moments:

Var(et) = Ψ,  (12)

Cov(et−s, et) = (1, 0) (R*)^s Φ* (1, 0)′ = γ*_s Ψ,  (13)

with

γ*_s = ρ1^s + b2(ρ1^s − ρ2^s).  (14)

Identification results for Ψ are standard. If d′Ut is observed, then Ψ is identified. If only the sign of d′Ut is observed, then Ψ is identified only up to a constant and this indeterminacy is typically resolved by setting Ψ = 1. If b2 = 0, then ρ1 is identified. If b2 = −1 (b1 = 0), then ρ2 is identified. Otherwise, the triples (ρ1, ρ2, b2) and (ρ2, ρ1, b1) are observationally equivalent, in which case (ρ1, ρ2, b2) are locally but not globally identified. However, as we shall discuss below, global identification of the ρ's and b's obtains from the off-diagonal elements of Cov(et−s, et) when the dimension of et is greater than 1 (except on a subspace of measure zero).
It also follows from Equation (14) that if any of the following three conditions holds: (i) b1 = 0; (ii) b2 = 0; (iii) ρ1 = ρ2, then γ*_s = (γ*_1)^s and et follows the stationary AR(1) process

et = γ*_1 et−1 + λt.  (15)

If none of these conditions holds, then while it still is the case that Cov(λt, et−1) = 0 by construction, the higher order covariances Cov(λt, et−s) are non-zero for s > 1. In other words, λt is no longer an innovation relative to {eτ, τ = 1, ..., t − 2} and et no longer follows an AR(1) process. Furthermore, even though |ρi| < 1, it does not even follow that |γ*_1| < 1 since b2 ∈ R. Actually, γ*_1 is then unrestricted. Note that the first-order covariance associated with γ*_1 in Equation (15) does not suffice to identify the ρ's and b's by itself. As we shall formally demonstrate in Section 3.3 below, identification of these parameters requires taking into consideration the higher order covariances.

The consequences of erroneously assuming that et follows an AR(1) process when none of the conditions listed above holds depend upon the nature of the observations. If {d′Ut, t = 1, ..., T} were observed, then we would be discussing a standard Generalized Least Squares misspecification and the ML estimate of β would remain consistent but inefficient. Since, furthermore, it need not be the case that |γ*_1| < 1, we might be led to erroneously conclude that the process is non-stationary. If, as is commonly the case in discrete choice applications, we only observe the signs of the differences d′Ut, then the Hessian would no longer be block diagonal and all parameter estimates would be inconsistent.

Foremost, we shall see that the ρ's and b's are identified except on a zero-measure subspace and that the (EIS-)GHK algorithm discussed below applies to the more general model. Whence, there is no need to impose a priori the restriction that d′εt follows an AR(1) process; that restriction can instead be pretested within the more general model introduced in Equations (1) and (2).
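The breakdown of the AR(1) property described above can be illustrated directly from Equation (14); the parameter values below are arbitrary illustrations. Note in particular that γ*_1 can exceed 1 even though |ρi| < 1.

```python
import numpy as np

def gamma_star(s, rho1, rho2, b2):
    """Equation (14): autocovariance coefficients of e_t (scaled by Psi)."""
    return rho1**s + b2 * (rho1**s - rho2**s)

# None of the conditions (i)-(iii) holds (illustrative values):
rho1, rho2, b2 = 0.8, 0.3, 0.5
g1, g2 = gamma_star(1, rho1, rho2, b2), gamma_star(2, rho1, rho2, b2)
assert not np.isclose(g2, g1**2)       # e_t does NOT follow an AR(1) process
assert g1 > 1                          # gamma*_1 unrestricted despite |rho_i| < 1

# Condition (ii), b2 = 0: gamma*_s = rho1^s, geometric decay
g1, g2 = gamma_star(1, rho1, rho2, 0.0), gamma_star(2, rho1, rho2, 0.0)
assert np.isclose(g2, g1**2)

# Condition (i), b1 = 0, i.e. b2 = -1: gamma*_s = rho2^s
g1, g2 = gamma_star(1, rho1, rho2, -1.0), gamma_star(2, rho1, rho2, -1.0)
assert np.isclose(g2, g1**2)
```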
Actually, we need not even restrict R to be diagonal. Let R = (rij). The corresponding elements of R* are then given by

\[ R^* = \begin{pmatrix} r_{11}^* & r_{12}^* \\ r_{21}^* & r_{22}^* \end{pmatrix} = \begin{pmatrix} r_{11} - r_{21} & r_{11} + r_{12} - (r_{21} + r_{22}) \\ r_{21} & r_{22} + r_{21} \end{pmatrix}. \tag{16} \]

The coefficients γ*_s in Equation (13) then obtain from the recursion

\[ \begin{pmatrix} \gamma_s^* \\ c_s \end{pmatrix} = R^* \begin{pmatrix} \gamma_{s-1}^* \\ c_{s-1} \end{pmatrix}, \qquad s > 0, \tag{17} \]

initialized by γ*_0 = 1 and c0 = b2. It follows that the condition r*_12 = 0 is sufficient for et to follow an AR(1) process (the conditions b1 = 0 or b2 = 0 are no longer sufficient if r*_12 ≠ 0). Actually, if r*_12 = 0, then it follows from Equations (4) and (16) that

et = (r11 − r21) et−1 + (η1t − η2t).  (18)
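A small numerical check of the recursion (17) for an arbitrary non-diagonal stationary R is sketched below. The initialization γ*_0 = 1, c0 = b2 used here is reconstructed from Equations (13) and (7): the first column of Φ* equals Ψ(1, b2)′, so the recursion started at (1, b2)′ reproduces the first diagonal element of (R*)^s Φ* divided by Ψ.

```python
import numpy as np

# Arbitrary illustrative values: non-diagonal R with eigenvalues inside the unit circle
R = np.array([[0.6, 0.2], [0.1, 0.4]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.2]])

# Stationary covariance: Phi = R Phi R' + Sigma, solved via the vec/Kronecker identity
Phi = np.linalg.solve(np.eye(4) - np.kron(R, R), Sigma.reshape(-1)).reshape(2, 2)

Q = np.array([[1.0, -1.0], [0.0, 1.0]])
R_star = Q @ R @ np.linalg.inv(Q)               # Equation (5)
Phi_star = Q @ Phi @ Q.T                        # Equation (6)

Psi = Phi_star[0, 0]
b2 = Phi_star[0, 1] / Psi                       # partition of Equation (7)

# Recursion (17): (gamma*_s, c_s)' = R* (gamma*_{s-1}, c_{s-1})'
v = np.array([1.0, b2])                         # gamma*_0 = 1, c_0 = b2
for s in range(1, 6):
    v = R_star @ v
    direct = (np.linalg.matrix_power(R_star, s) @ Phi_star)[0, 0] / Psi
    assert np.isclose(v[0], direct)             # matches Equation (13)
```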
3 Dynamic Invariant Multinomial Probit (DIMP)

3.1 Model

We now discuss the higher dimensional version of the pilot model introduced in Section 2. Let U′t = (ut1, ..., utJ) denote a J-dimensional vector of utilities at time t. We assume that Ut evolves according to the stationary process

Ut = µt + εt,  εt | εt−1 ∼ NJ(Rεt−1, Σ),  (19)

where µt = µ(xt; β) is left unspecified since the focus of our analysis lies in the second order moments of the process and of its subsequent transformations. The stationary covariance matrix of εt, denoted by Φ, satisfies the identity

Σ = Φ − RΦR′.  (20)

The standard approach consists of selecting a baseline alternative, say alternative J, and of expressing all utilities in difference from utJ. As in Section 2, we initially complete that transformation into the following non-singular transformation:

\[ U_t^* = \begin{pmatrix} Y_t \\ u_{tJ} \end{pmatrix} = \begin{pmatrix} D \\ i_J' \end{pmatrix} U_t = Q\,U_t, \tag{21} \]

where D is the (J − 1) × J matrix

\[ D = (I_{J-1} \;\vdots\; -\iota_{J-1}), \tag{22} \]

ι′J−1 = (1, ..., 1) and i′J is the unit vector (0, ..., 0, 1). The transformed system
is given by

U*t = µ*t + ε*t,  ε*t | ε*t−1 ∼ NJ(R* ε*t−1, Σ*),  (23)

with

µ*t = Qµt,  Σ* = QΣQ′,  R* = QRQ−1,  (24)

\[ Q^{-1} = \begin{pmatrix} I_{J-1} & \iota_{J-1} \\ 0 & 1 \end{pmatrix}. \tag{25} \]

The stationary covariance matrix of {ε*t, t = 1, ..., T} consists of the following blocks:

Φ* = Var(ε*t) = QΦQ′,  Cov(ε*t−s, ε*t) = (R*)^s Φ*.  (26)

As in Section 2, we partition Φ* as follows:

\[ \Phi^* = \begin{pmatrix} \Psi & \Psi b \\ b'\Psi & \upsilon^2 + b'\Psi b \end{pmatrix}. \tag{27} \]

The stationary moments of et = Dεt are given by

Var(et) = Ψ,  (28)

Cov(et−s, et) = (I_{J−1} ⋮ 0) (R*)^s Φ* (I_{J−1} ⋮ 0)′ = Γ*_s Ψ,  (29)

with

\[ \Gamma_s^* = (I_{J-1} \;\vdots\; 0)\,(R^*)^s \begin{pmatrix} I_{J-1} \\ b' \end{pmatrix}. \tag{30} \]

In the next two sections we discuss the invariance and the identification of the parametrization (Ψ, b, {Γ*_s}).
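The quantities of this section can be assembled numerically for a small example. The sketch below uses J = 3 and arbitrary illustrative values for R and Σ, and verifies that Cov(et−s, et) = Γ*_s Ψ as in Equations (29)–(30).

```python
import numpy as np

J = 3
rng = np.random.default_rng(0)
R = np.diag([0.7, 0.5, 0.2])                    # diagonal R, illustrative values
A = rng.normal(size=(J, J))
Sigma = A @ A.T + J * np.eye(J)                 # positive definite innovation covariance

# Equation (20): Phi = R Phi R' + Sigma, solved via the vec/Kronecker identity
Phi = np.linalg.solve(np.eye(J**2) - np.kron(R, R),
                      Sigma.reshape(-1)).reshape(J, J)

# Q of Equation (21): the J-1 utility differences plus the baseline utility itself
D = np.hstack([np.eye(J - 1), -np.ones((J - 1, 1))])
Q = np.vstack([D, np.eye(J)[-1]])
R_star = Q @ R @ np.linalg.inv(Q)
Phi_star = Q @ Phi @ Q.T

# Partition (27): Psi = Var(e_t) and b recovered from the off-diagonal block Psi b
Psi = Phi_star[:J - 1, :J - 1]
b = np.linalg.solve(Psi, Phi_star[:J - 1, J - 1])

# Equation (30), and Cov(e_{t-s}, e_t) = Gamma*_s Psi from Equation (29)
for s in (1, 2, 3):
    Rs = np.linalg.matrix_power(R_star, s)
    Gamma_s = Rs[:J - 1, :] @ np.vstack([np.eye(J - 1), b])
    assert np.allclose(Gamma_s @ Psi, (Rs @ Phi_star)[:J - 1, :J - 1])
```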
3.2 Invariance

Suppose we select a different baseline alternative, say utj instead of utJ. The simplest way to handle this change consists of permuting utj and utJ before applying the Q transformation introduced in Equation (21). Whence the error terms associated with the baseline alternative utj are given by

ε*j_t = QPj εt = QPj Q−1 ε*t,  (31)

where Pj is the permutation matrix for rows j and J. Note that

\[ QP_jQ^{-1} = \begin{pmatrix} S_j & 0 \\ i_j' & 1 \end{pmatrix}, \tag{32} \]

where Sj is the (J − 1) × (J − 1) matrix

\[ S_j = \begin{pmatrix} I_{j-1} & -\iota_{j-1} & 0 \\ 0 & -1 & 0 \\ 0 & -\iota_{J-j-1} & I_{J-j-1} \end{pmatrix}, \qquad S_J = I_{J-1}, \tag{33} \]

with S−1_j = Sj and −i′j Sj = i′j. It follows from Equations (27) and (31) that the stationary covariance matrix of ε*j_t is given by

\[ \Phi^{*j} = \begin{pmatrix} S_j & 0 \\ i_j' & 1 \end{pmatrix} \begin{pmatrix} \Psi & \Psi b \\ b'\Psi & \upsilon^2 + b'\Psi b \end{pmatrix} \begin{pmatrix} S_j' & i_j \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \Psi_j & \Psi_j b_j \\ b_j'\Psi_j & \upsilon_j^2 + b_j'\Psi_j b_j \end{pmatrix}, \tag{34} \]

with

Ψj = Sj Ψ S′j,  bj = S′j (ij + b),  υ²j = υ².  (35)
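Equation (35) can be checked numerically by comparing the two routes to the covariance structure under baseline j: permuting the utilities directly before differencing as in Equation (31), versus transforming (Ψ, b) by Sj. The values below are arbitrary illustrations.

```python
import numpy as np

J, j = 4, 2                                     # baseline switched from J=4 to j=2
rng = np.random.default_rng(1)
A = rng.normal(size=(J, J))
Phi = A @ A.T + J * np.eye(J)                   # arbitrary positive definite Var(eps_t)

D = np.hstack([np.eye(J - 1), -np.ones((J - 1, 1))])
Q = np.vstack([D, np.eye(J)[-1]])               # Equation (21)

# Route 1: permute utilities j <-> J, then difference (Equation (31))
P = np.eye(J)
P[[j - 1, J - 1]] = P[[J - 1, j - 1]]
Phi_star_j_direct = (Q @ P) @ Phi @ (Q @ P).T

# Route 2: transform (Psi, b) of baseline J by S_j (Equations (33) and (35))
Phi_star = Q @ Phi @ Q.T
Psi = Phi_star[:J - 1, :J - 1]
b = np.linalg.solve(Psi, Phi_star[:J - 1, J - 1])
S = np.eye(J - 1)
S[:, j - 1] = -1.0                              # S_j: e_i - e_j for i != j, -e_j in row j
i_j = np.eye(J - 1)[j - 1]

Psi_j = S @ Psi @ S.T                           # Equation (35)
b_j = S.T @ (i_j + b)

assert np.allclose(Psi_j, Phi_star_j_direct[:J - 1, :J - 1])
assert np.allclose(Psi_j @ b_j, Phi_star_j_direct[:J - 1, J - 1])   # = Psi_j b_j
```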
Moreover (since −i′j Sj = i′j),

\[ (R^{*j})^s = \begin{pmatrix} S_j & 0 \\ i_j' & 1 \end{pmatrix} (R^*)^s \begin{pmatrix} S_j & 0 \\ i_j' & 1 \end{pmatrix}. \tag{36} \]

It follows that

\[ \Gamma_s^{*j} = (I_{J-1} \;\vdots\; 0)\,(R^{*j})^s \begin{pmatrix} I_{J-1} \\ b_j' \end{pmatrix} = S_j\,\Gamma_s^*\,S_j. \tag{37} \]

The immediate implication of Equations (35) and (37) is that the parameters (Ψj, bj, υ²j, {Γ*j_s, s = 1, ..., T − 1}) associated with baseline alternative j are in one-to-one correspondence with the parameters (Ψ, b, υ², {Γ*_s, s = 1, ..., T − 1}) associated with baseline alternative J. Whence ML estimators of the multinomial probit model introduced in Equation (19) will be invariant w.r.t. the choice of the baseline alternative.

In sharp contrast, the standard dynamic multinomial probit model, which assumes a diagonal autocorrelation matrix for the differences relative to a particular alternative, is not invariant relative to the choice of that alternative unless all correlations are equal. Specifically, assume that R* is diagonal, say

R* = Diag(ρ*_i),  i = 1, ..., J.  (38)

Then it is trivial to verify that R*j, which obtains from Equation (36) with s = 1, is not diagonal unless
R* = ρ* IJ.  (39)

3.3 Identification

Next, we discuss identification of the DIMP model in Equation (19) when observations consist of {xt, jt, t = 1, ..., T}, where jt denotes the index of the alternative selected at time t. Following our discussion in Section 3.1, we can arbitrarily select a baseline alternative, say alternative J. Since the transformations associated with permutations of the alternatives are one-to-one, we only need to discuss identification of the baseline parameters. The parameters whose identification needs to be discussed are (Ψ, b, υ², β, R).

Identification of β has been extensively discussed in the literature – see, e.g., Bunch (1991) and Keane (1992) – and need not be considered further here. Moreover, under scale normalization of the mean vector µ(xt, β), the stationary covariance matrix Ψ in Equation (27) is identified while υ² is not. Whence, we are left discussing identification of (b, R).

Actually, the likelihood function depends upon the sequence of auxiliary regression matrices {Γ*_s, s = 1, ..., T − 1} introduced in Equation (30), which are clearly over-identified functions of b and R. (This will not be a problem for ML estimation since optimization will be conducted in terms of b and R themselves.) Since the relationship between (b, R) and {Γ*_s} is highly non-linear, we cannot expect global identification if R is left unconstrained (beyond stationarity). Local identification might be considered, though in the present paper we choose to focus our attention on the case where R is diagonal, for which we can establish global identification under simple conditions. Whence let

R = Diag(ρi),  i = 1, ..., J,  (40)
in which case Γ*_s in Equation (30) is given by

\[ \Gamma_s^* = \begin{pmatrix} \rho_1^s & 0 & \cdots & 0 \\ 0 & \rho_2^s & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \rho_{J-1}^s \end{pmatrix} + \begin{pmatrix} \rho_1^s - \rho_J^s \\ \rho_2^s - \rho_J^s \\ \vdots \\ \rho_{J-1}^s - \rho_J^s \end{pmatrix} (b_1, b_2, \ldots, b_{J-1}), \tag{41} \]

where the bj's are the individual elements of b. The following special cases are trivial: (i) if ρi = ρJ, then ρi is identified; and (ii) if bj = 0, then (ρj, bj) are identified. In order to avoid dealing with all possible combinations of these special cases, we first proceed to a reduction of the matrices {Γ*_s} by eliminating all rows i for which ρi = ρJ and all columns j for which bj = 0. Moreover, we also permute alternatives in such a way that the remaining JR rows and JC columns are regrouped in the leading JR × JC blocks of {Γ*_s}. It follows that all bj's for j > JC and all ρi's for i > min(JR, JC) are identified. The following theorem then provides sufficient conditions for global identification of (b, R):

Theorem. If (i) JR > 1; (ii) JC > 0; (iii) either there exists a pair ρi ≠ ρi′ (i, i′ ≤ JR) or a bj ≠ −1/2 (j ≤ JC), then (b, R) are globally identified.

Proof. Since we can permute alternatives, we need only consider the leading (2 × 1) block of the reduced {Γ*_s}, say

\[ \begin{pmatrix} \rho_1^s + b_1(\rho_1^s - \rho_J^s) \\ b_1(\rho_2^s - \rho_J^s) \end{pmatrix}. \]

As was the case for γ*_s in Equation (14), we note that

ρ1^s + b1(ρ1^s − ρJ^s) ≡ ρJ^s − (1 + b1)(ρJ^s − ρ1^s),

so that, on the basis of the diagonal element alone, (ρ1, b1) and (ρJ, −(1 + b1)) would be observationally equivalent. If, however, ρ1 ≠ ρ2, or ρ1 = ρ2 but b1 ≠ −1/2 (in which case −(1 + b1) ≠ b1), the off-diagonal element (2, 1) prevents permuting ρ1 and ρJ. □
3.4 Reinterpreting the standard DMP

The standard DMP model (see, e.g., Börsch-Supan et al., 1990, or Geweke et al., 1997) requires selecting a baseline alternative, say alternative J, expressing all utilities in deviation from utJ and assuming that the differences et = Dεt follow a diagonal AR(1) process. This model is clearly not invariant with respect to the choice of the baseline alternative. Worse, in the absence of additional assumptions on the underlying utility process, the DMP models associated with different baseline alternatives are non-nested with one another. This probably explains why, to the best of our knowledge, data based selection of the baseline alternative has not been discussed in the literature.

It turns out that our discussion of invariance and identification enables us to reinterpret the standard DMP model as one which is explicitly nested within a DIMP with diagonal R matrix. The key to such reinterpretation lies with Equation (41), which indicates that a diagonal R matrix also produces diagonal Γ*_s matrices if b is zero. However, in such a case the matrices Γ*j_s associated with baseline alternative j ≠ J are not diagonal – confirming the lack of invariance of the standard DMP model. Instead, following Equation (37) they are given by

\[ \Gamma_s^{*j} = \begin{pmatrix} \rho_1^s & 0 & \cdots & 0 \\ 0 & \rho_2^s & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \rho_{J-1}^s \end{pmatrix} - \begin{pmatrix} \rho_1^s - \rho_j^s \\ \rho_2^s - \rho_j^s \\ \vdots \\ \rho_{J-1}^s - \rho_j^s \end{pmatrix} i_j', \tag{42} \]

which leaves ρJ as the sole unidentified autocorrelation coefficient, irrespective of the baseline alternative. A direct comparison between Equations (41) and (42) – see also Equation (35) – indicates that the vector bj associated with baseline alternative j ≠ J is given by

bj = S′j ij = −ij.  (43)
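A quick numerical illustration of Equations (37) and (42): with diagonal R and b = 0, Γ*_s is diagonal, but Γ*j_s = Sj Γ*_s Sj for j ≠ J is not, which is precisely the lack of invariance discussed above. Parameter values are arbitrary.

```python
import numpy as np

J, j, s = 4, 2, 1
rho = np.array([0.7, 0.5, 0.3, 0.1])            # illustrative rho_1, ..., rho_J

# Equation (41) with b = 0: Gamma*_s = diag(rho_1^s, ..., rho_{J-1}^s)
Gamma_s = np.diag(rho[:J - 1] ** s)

S = np.eye(J - 1)
S[:, j - 1] = -1.0                              # S_j of Equation (33)
Gamma_js = S @ Gamma_s @ S                      # Equation (37)

# Equation (42): diag(rho_i^s) minus the column (rho_i^s - rho_j^s) times i_j'
col = rho[:J - 1] ** s - rho[j - 1] ** s
expected = np.diag(rho[:J - 1] ** s)
expected[:, j - 1] -= col

assert np.allclose(Gamma_js, expected)
assert not np.allclose(Gamma_js, np.diag(np.diag(Gamma_js)))   # no longer diagonal
```

Note that ρJ (here 0.1) appears nowhere in Γ*j_s, matching the observation that it remains unidentified irrespective of the baseline alternative.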
This reinterpretation of the standard DMP model has fundamental implications for practitioners. Specifically, once the standard DMP model is nested within a DIMP model with diagonal R matrix, it is no longer advisable to select a priori (arbitrarily) a baseline alternative, say J, and to impose the assumption that the corresponding Γ*_s matrices be diagonal. Instead, one can use the DIMP as maintained hypothesis under a parametrization (Ψ, b, R) which is fundamentally invariant with respect to J. It is then trivial to test whether this DIMP simplifies into a standard DMP for baseline alternative j, denoted by DMPj. The corresponding null hypotheses are given by

H0j ("DMPj"): bj = 0, for j ≠ J,  and  H0J ("DMPJ"): b = 0,  (44)

all of which are trivially nested within the DIMP. In view of Equation (35) these null hypotheses can all be reformulated in terms of b:

H0j: b = −ij, for j ≠ J,  and  H0J: b = 0.  (45)

Nesting the DMPj's within a DIMP with diagonal R matrix removes all ambiguities as well as the arbitrariness associated with the conventional interpretation of DMP models. Note, in particular, the complete disconnect between the selection of alternative J for parametrization purposes (with full invariance with respect to J) and the eventual subsequent data based selection of a particular DMPj (likelihood based tests of H0j are clearly invariant with respect to J).

The fact that under the null hypotheses H0j: DMPj (j = 1, ..., J) the parameter ρJ is not identified poses the problem that the asymptotic distributions of standard tests like the likelihood ratio (LR), Lagrange multiplier (LM) and Wald tests are nonstandard, which means that the conventional critical values cannot be used to assess significance (see, e.g., Davies, 1977, 1987). In order to derive asymptotically optimal tests for such testing problems, Davies (1977, 1987) and Andrews and Ploberger (1994) consider mappings (supremum and average) of the LR, LM and Wald statistics as functions of the nuisance parameter which is unidentified under the null hypothesis, while Hansen (1996) suggests a useful computational method for simulating appropriate asymptotic critical values. The extension of these methods to our nonlinear simulated Maximum Likelihood (ML) framework seems to be conceptually straightforward but is beyond the scope of the present paper and is left for future research. A simple but computationally more demanding alternative to those asymptotic methods is to perform bootstrap versions of the LR, LM and Wald tests, where the model under consideration is used to simulate the finite-sample distribution of the corresponding test statistics under the null hypothesis.

4 Estimation
Before we formally discuss simulated ML estimation, it should be noted that since DMP models are nested within DIMP models, the analysis which follows applies identically to both classes of models. The only difference is that a DIMP model includes the J additional parameters (b, ρJ). This justifies our earlier claim of similar ease of computation for DMPs and DIMPs alike.

4.1 GHK and GHK-EIS

Let y′ = (y1, ..., yM) denote an M-dimensional Normal random vector with mean µ and covariance matrix V:

y = µ + Lη,  η ∼ NM(0, IM),  V = LL′,  (46)

where L denotes the lower triangular Cholesky decomposition of V. The probability to be computed is that of the event D = {yi < 0; i = 1, ..., M}. Let ℓ′τ = (γ′τ, δτ) denote the τ-th (lower triangular) row of L with γτ ∈ R^{τ−1} and δτ > 0. The τ-th component of y is then given by

yτ = µτ + γ′τ η(τ−1) + δτ ητ,  (47)

with η′(τ−1) = (η1, ..., ητ−1) and η(0) = ∅. It follows that

\[ P(D) = \int_{\mathbb{R}^M} \prod_{\tau=1}^{M} \varphi_\tau(\eta_{(\tau)})\, d\eta, \tag{48} \]

with

ϕτ(η(τ)) = I(ητ < −[µτ + γ′τ η(τ−1)]/δτ) · φ(ητ),  (49)

where I denotes the indicator function and φ the standardized Normal density.
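For concreteness, the plain GHK simulator implied by Equations (46)–(49) (the special case of GHK-EIS with the EIS corrections switched off, as discussed below) can be sketched as follows. This is a minimal illustrative implementation under arbitrary inputs, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(mu, V, S=10_000, seed=0):
    """Plain GHK estimate of P(y_i < 0 for all i), y ~ N(mu, V)."""
    rng = np.random.default_rng(seed)
    M = len(mu)
    L = np.linalg.cholesky(V)                    # V = L L', Equation (46)
    u = rng.uniform(size=(S, M))                 # canonical uniform draws (CRN-style)
    eta = np.zeros((S, M))
    weight = np.ones(S)
    for tau in range(M):
        # Truncation point from Equation (49): eta_tau < -(mu_tau + gamma' eta)/delta
        upper = -(mu[tau] + eta[:, :tau] @ L[tau, :tau]) / L[tau, tau]
        p = norm.cdf(upper)                      # probability of the tau-th constraint
        weight *= p
        # Draw eta_tau from N(0,1) truncated to (-inf, upper) by inversion
        eta[:, tau] = norm.ppf(np.clip(u[:, tau] * p, 1e-15, None))
    return weight.mean()

# Check on an independent case with a known answer: mu = 0, V = I gives 0.5**M
p_hat = ghk_probability(np.zeros(3), np.eye(3))
assert abs(p_hat - 0.125) < 0.01
```

For a correlated V the estimate converges to the orthant probability as S grows; the EIS refinement described next additionally reweights each trajectory by the correction factors appearing in Equation (53).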
Both GHK and GHK-EIS are importance sampling techniques that are described in Liesenfeld and Richard (2009). In short, they rely upon a sequential parametric importance sampling density of the form

\[ m(\eta; a) = \prod_{\tau=1}^{M} m_\tau(\eta_\tau \mid \eta_{(\tau-1)}; a_\tau), \tag{50} \]

with a′ = (a1, ..., aM) ∈ A = A1 × ... × AM. The density mτ obtains from an auxiliary density kernel kτ(η(τ); aτ) with known functional integral χτ(η(τ−1); aτ) in ητ, i.e.,

\[ m_\tau(\eta_\tau \mid \eta_{(\tau-1)}; a_\tau) = \frac{k_\tau(\eta_{(\tau)}; a_\tau)}{\chi_\tau(\eta_{(\tau-1)}; a_\tau)}, \qquad \chi_\tau(\eta_{(\tau-1)}; a_\tau) = \int_{\mathbb{R}} k_\tau(\eta_{(\tau)}; a_\tau)\, d\eta_\tau. \tag{51} \]

GHK-EIS relies upon a backward recursive sequence of auxiliary LS problems, selecting values of aτ which approximately minimize the MC sampling variances of the ratios ϕτ χτ+1/kτ for τ = M, ..., 1 (with χM+1 ≡ 1). If we momentarily assume that χτ+1 is given by the product of a Gaussian kernel in η(τ) by a Normal cdf of the form Φ(ωτ), where ωτ is a linear combination of η(τ), then the only non-Gaussian kernel in the numerator ϕτ χτ+1 is Φ(ωτ) itself. Whence kτ obtains by combining the various Gaussian kernels in that numerator with a Gaussian approximation of Φ(ωτ) obtained by the LS approximation

ln Φ(ωτ) ≃ −(1/2)(α̂τ ωτ² + 2β̂τ ωτ + κ̂τ),  (52)

with âτ = (α̂τ, β̂τ, κ̂τ).
Analytical integration with respect to ητ of the resulting Gaussian kernel kτ over the range associated with the indicator function in Equation (49) produces a (recursive) functional integral which is itself in the form of a Gaussian kernel times a Gaussian cdf in ωτ−1. Note that all Gaussian kernels common to ϕτ χτ+1 and kτ cancel out in the ratios ϕτ χτ+1/kτ. It follows that the GHK-EIS estimate of P(D) is given by

\[ \hat{P}_S(D) = \chi_1(\hat{a}_1) \cdot \frac{1}{S} \sum_{s=1}^{S} \prod_{\tau=1}^{M-1} \frac{\Phi(\tilde{\omega}_\tau^{(s)})}{\exp\!\left\{ -\frac{1}{2}\!\left( \hat{\alpha}_\tau [\tilde{\omega}_\tau^{(s)}]^2 + 2\hat{\beta}_\tau \tilde{\omega}_\tau^{(s)} + \hat{\kappa}_\tau \right) \right\}}, \tag{53} \]

where ω̃τ^(s) is a linear combination of η̃(τ)^(s), and {η̃τ^(s), τ = 1, ..., M; s = 1, ..., S} denotes S i.i.d. trajectories drawn from the EIS sampler m(η; â) as defined in Equation (51) with â = (â1, ..., âM). As discussed in Richard and Zhang (2007), the values of the EIS parameters â and, therefore, the simulated trajectories {η̃τ^(s)} depend upon the values of the model parameters (µ, V). Hence, continuity of the MC likelihood evaluation (53) in (µ, V) requires that all trajectories drawn under different values of â be obtained from a set of Common Random Numbers (CRNs) pre-drawn from a canonical distribution which does not depend on the parameters â. In the present context, the CRNs consist of M × S draws from a uniform distribution on [0, 1], to be transformed by inversion into truncated Gaussian draws from mτ(ητ | η̃(τ−1); âτ).

As for GHK, it ignores the factor χτ+1 in the construction of the auxiliary Gaussian kernels kτ, which is equivalent to setting α̂τ = β̂τ = κ̂τ = 0 in Equations (52) and (53). Whence, for a given simulation sample size S, GHK is intrinsically numerically less efficient than GHK-EIS but computationally faster, since it does not require computing the auxiliary LS approximations in Equation (52). See Liesenfeld and Richard (2009) for full details and for numerical examples which indicate that even if we increase the simulation sample size S for GHK in order to equate computing time, GHK-EIS remains the numerically more efficient procedure. All computations reported below rely upon GHK-EIS.

4.2 GHK-EIS likelihood evaluation

Since individuals in a probit panel data set are typically assumed to act independently from one another, we consider here a single individual for whom observations consist of a sequence of T indicators of selected alternatives {j1, ..., jT; jt ∈ (1, ..., J)} together with a T × K matrix of exogenous variables. As discussed in Section 3, the identified parameters are those associated with an (arbitrary) baseline alternative J and are denoted by (Ψ, b, R, β). The likelihood function is the joint probability of the observed sequence of choices.
The covariance matrix V of the T(J − 1)-dimensional vector e′ = (e′j1, ..., e′jT) with ejt = Sjt et consists of the following blocks:

- The variances of ejt obtain from Equation (34):

  Var(ejt) = Ψjt = Sjt Ψ S′jt;  (54)

- The covariances between ejt and ejs obtain from Equation (29):

  Cov(ejt, ejs) = Sjt Γ*_{t−s} Ψ S′js,  t > s.  (55)

Brute force Cholesky decomposition of the joint covariance matrix V enables us to rewrite the DIMP model in the form given by Equation (46), from which GHK-EIS evaluation of the likelihood function proceeds as described in Section 4.1.
As discussed in Section 3.1, D(I)MP models obtain by marginalization with respect to the J-th component of the error term ε*t in Equation (23). It is precisely this operation which led to the replacement of the correlation matrices (R*)^{t−s} by Γ*_{t−s} in Equation (55) and to the subsequent need for brute force Cholesky decomposition of the joint covariance matrix V. In contrast, the Cholesky decomposition of the covariance matrix of {ε*t}, transformed as needed to account for {jt}, can take full advantage of the present AR(1) structure, as discussed in the lemma below. Whence, an efficient algorithm would first compute that Cholesky decomposition, and only then marginalize with respect to the T J-th components of {ε*t}. In order to do so, we complete the rectangular transformation from ε*t ∈ R^J into ejt ∈ R^{J−1} into the square transformation

\[ e_{jt}^0 = \begin{pmatrix} S_{jt} & 0 \\ 0 & 1 \end{pmatrix} \varepsilon_t^* = K_{jt}\, \varepsilon_t^*, \tag{56} \]

with K−1_jt = Kjt. The covariance matrix V0 of the TJ-dimensional vector e′0 = (e0′j1, ..., e0′jT) consists of the following blocks:

Var(e0jt) = Kjt Φ* K′jt,  (57)

Cov(e0jt, e0js) = Kjt (R*)^{t−s} Φ* K′js,  t > s.  (58)

Note that the covariance matrix V0 preserves the initial AR(1) structure (up to a set of square transformations), as indicated by the presence of (R*)^{t−s} in Equation (58) instead of Γ*_{t−s} in Equation (55). The Cholesky decomposition of V0 obtains by application of the following lemma (deleting the subscripts 0, * and j for ease of notation).
SC
R
Lemma. Let the T J-dimensional stationary covariance matrix V be partitioned into J-dimensional blocks of the form

    Vts = Kt R^(t−s) Φ K′s,    t ≥ s,    (59)

with Kt^(−1) = Kt. Let L denote the lower triangular Cholesky decomposition of V, partitioned conformably with V. The diagonal blocks of L obtain by the following J-dimensional Cholesky factorizations:

    L11 L′11 = K1 Φ K′1,    Ltt L′tt = Kt Σ K′t  (t > 1),    (60)

with

    Σ = Φ − R Φ R′,    (61)

and its off-diagonal blocks by the products

    Lts = [Kt R^(t−s) Ks] Lss,    t > s.    (62)
Proof: See Liesenfeld and Richard (2009, Appendix).
Note that Kt can only take J different forms for the D(I)MP, corresponding to each of the J different alternatives (see Equation 56). Whence the Cholesky factorization of the T J-dimensional matrix V0 requires at most J different J-dimensional Cholesky factorizations, as given in Equations (60) and (61).
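To illustrate, the lemma can be checked numerically. The sketch below is our own Python; the involutive Kt used here are generic stand-ins for the model's transformation matrices. It assembles V from Equation (59) and verifies that the factorization (60)–(62) coincides with a brute-force Cholesky decomposition.

```python
import numpy as np

# Illustrative check of the lemma (Eqs. 59-62); all names are ours and the
# K_t below are generic involutive matrices, not the paper's exact S_jt.
rng = np.random.default_rng(7)
J, T = 3, 5

R = np.diag([0.8, 0.6, 0.3])                    # AR(1) matrix
A = rng.standard_normal((J, J))
Sigma = A @ A.T + J * np.eye(J)                 # innovation covariance (PD)

# Stationary covariance: fixed point of Phi = R Phi R' + Sigma.
Phi = Sigma.copy()
for _ in range(500):
    Phi = R @ Phi @ R.T + Sigma

def K_mat(j):
    """Involutive differencing-type matrix: identity with column j set to -1."""
    K = np.eye(J)
    K[:, j] = -1.0
    return K

Ks = [K_mat(t % J) for t in range(T)]           # one K_t per period
assert all(np.allclose(K @ K, np.eye(J)) for K in Ks)   # K_t^{-1} = K_t

Rp = lambda p: np.linalg.matrix_power(R, p)

# Full TJ x TJ covariance from the block formula, Eq. (59).
V = np.zeros((T * J, T * J))
for t in range(T):
    for s in range(t + 1):
        B = Ks[t] @ Rp(t - s) @ Phi @ Ks[s].T
        V[t*J:(t+1)*J, s*J:(s+1)*J] = B
        V[s*J:(s+1)*J, t*J:(t+1)*J] = B.T

# Block Cholesky via Eqs. (60)-(62): only J-dimensional factorizations needed.
L = np.zeros_like(V)
Sig = Phi - R @ Phi @ R.T                       # Eq. (61); equals Sigma here
for t in range(T):
    core = Phi if t == 0 else Sig               # Eq. (60)
    L[t*J:(t+1)*J, t*J:(t+1)*J] = np.linalg.cholesky(Ks[t] @ core @ Ks[t].T)
    for s in range(t):
        Lss = L[s*J:(s+1)*J, s*J:(s+1)*J]
        L[t*J:(t+1)*J, s*J:(s+1)*J] = Ks[t] @ Rp(t - s) @ Ks[s] @ Lss  # Eq. (62)

assert np.allclose(L, np.linalg.cholesky(V))    # matches brute-force Cholesky
```

Only J-dimensional Cholesky factorizations appear inside the loop, which is the computational point of the lemma.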
The extension to V0 only requires two trivial modifications of the GHK-EIS algorithm for V: (i) since the baseline utility uJt is not observed, the corresponding integral with respect to the last component of e0jt (t = 1, . . . , T) is un-truncated; it produces a probability Φ(·) equal to 1 in Equations (52) and (53), with the corresponding EIS coefficients equal to zero; (ii) the stationary covariance matrix Φ∗ as given by Equations (26) and (27) includes an additional parameter υ2 which is unidentified and can, therefore, be set equal to any (reasonable) arbitrary value without affecting the likelihood values.

5 Monte Carlo simulations

5.1 ML-GHK-EIS estimates of the DIMP model

In order to illustrate the identification and ease of estimation of DIMP models, we conducted a Monte Carlo experiment for a DIMP panel model with J = 3 alternatives, T = 10 periods and N = 500 individuals. We arbitrarily selected alternative 3 as the baseline (remembering that DIMP models are invariant with respect to the selection of the baseline alternative). Since our focus of attention lies on second order moments, we specified the regression function of the utilities, as given in Equation (19), directly in differences as

    Dµit = Dµi(xt; β) = (π1 + ψZit1, π2 + ψZit2)′,    (63)

with β′ = (π1, π2, ψ) and Zitj ∼ i.i.d. N(0, 1) for t = 1, . . . , T, i = 1, . . . , N and j ∈ {1, 2}. Note that this specification represents a simplified version of that used by Geweke et al. (1997).
The autocorrelation matrix R in Equation (19) is diagonal with elements ρ1 = 0.8, ρ2 = 0.6 and ρ3 = 0.3. The covariance matrix Σ = [σik] has diagonal elements σ11 = σ22 = σ33 = 1 and off-diagonal elements σ12 = σ13 = σ23 = 0.3. The corresponding stationary covariance matrix Φ∗ as defined in Equation (26) is given by

    Φ∗ = [  3.087   0.915  −0.704 ]
         [  0.915   1.930  −0.733 ]    (64)
         [ −0.704  −0.733   1.099 ] .

Its partitioning according to Equation (27) produces the following values for Λ = [λik], the Cholesky decomposition of Ψ, and b:

    Λ = [ 1.757   0     ]       b = [ −0.134 ]    (65)
        [ 0.521   1.288 ] ,         [ −0.316 ] .

Since we are leaving Dµit unnormalized, Ψ is identified only up to a scale factor. Whence, for ease of comparison with the true parameter values, we set λ11 equal to its true value 1.757. Had we set it equal to one, as commonly done, we would have to divide the true values for (π1, π2, ψ, λ11, λ12, λ22) and their estimates by 1.757.
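These values are easy to reproduce. The following sketch (our own Python and variable names) uses the stationarity relation Φij = σij/(1 − ρi ρj) and differencing with respect to alternative 3 to recover Equations (64) and (65) numerically:

```python
import numpy as np

# Reproduce Phi* in Eq. (64) and (Lambda, b) in Eq. (65) from (R, Sigma).
# Phi_ij = sigma_ij / (1 - rho_i * rho_j) is the stationary covariance of
# the AR(1) utility errors.
rho = np.array([0.8, 0.6, 0.3])
Sigma = np.full((3, 3), 0.3)
np.fill_diagonal(Sigma, 1.0)

Phi = Sigma / (1.0 - np.outer(rho, rho))

# Transform to (eps1 - eps3, eps2 - eps3, eps3): differences w.r.t. baseline 3.
D = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
Phi_star = D @ Phi @ D.T
print(np.round(Phi_star, 3))       # matches Eq. (64)

Psi = Phi_star[:2, :2]             # covariance of the differenced utilities
Lam = np.linalg.cholesky(Psi)      # Lambda in Eq. (65)
b = np.linalg.solve(Psi, Phi_star[:2, 2])
print(np.round(Lam, 3), np.round(b, 3))   # matches Eq. (65)
```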
ML-GHK-EIS estimates are computed using a simulation sample size of S = 20 for the likelihood evaluations (53), which, as we shall see, suffices to produce high numerical accuracy in likelihood function IS estimates and corresponding ML parameter estimates. Following Richard and Zhang (2007), we conduct two sets of replications. For the first one, we keep the simulated data set fixed and replicate ML-GHK-EIS estimation 100 times under different sets of CRNs used for the likelihood evaluations. The MC standard deviations of these replications are direct measures of numerical accuracy (accounting in particular for the simulation based selection of the EIS samplers). Since these replications are conducted for a fixed data set, we also computed Root Mean Squared Errors (RMSEs) relative to the "true" ML estimates (as computed using a simulation sample size S = 1,000, which suffices to produce highly accurate approximations to the infeasible true ML estimates).
Our second set of replications repeats ML-GHK-EIS under 100 different simulated data sets (using a fixed set of CRNs for all 100 replications). The resulting means and standard deviations are numerical estimates of the finite sample (statistical) distribution of the ML estimators. We also produced for comparison asymptotic standard deviations based upon the inverse Hessian (since EIS likelihood estimates are sufficiently smooth to produce numerically well-behaved estimates of the Hessian). The results are reproduced in Table 1. They unequivocally confirm that the DIMP model is identified and statistically well-behaved, including the parameters (b1, b2, ρ3) which are not identified under the standard DMP (unless, as discussed in Section 3.4, the latter is reinterpreted as nested within the DIMP, in which case b1 = b2 = 0 a priori though ρ3 remains unidentified). The finite sample standard deviations are close to their asymptotic counterparts.
The numerical (MC) standard deviations are orders of magnitude smaller than the statistical standard deviations (with ratios ranging from 0.016 to 0.050). Though the results are not reported here, we also reproduced the entire simulation exercise under ML-GHK. Our findings confirm results presented in Liesenfeld and Richard (2009) in that, for a common simulation sample size S = 20, ML-GHK estimates are numerically far less accurate than ML-GHK-EIS estimates, especially for the DIMP parameters (b1, b2, ρ1, ρ2, ρ3), with ratios of numerical standard deviations ranging from 8 to 95. In that paper, where we provide a more detailed comparison of GHK and GHK-EIS for the estimation of the standard DMP model, we also produce results in which we increase the simulation sample size of the computationally less demanding GHK from 20 to 100 in order to roughly equalize computing time for both procedures. Doing so, we found that GHK-EIS with S = 20 remains numerically more efficient than GHK with S = 100, though with efficiency gain reductions by a factor ranging between 1.2 and 2.1 relative to the comparison for a common simulation sample size S = 20 (see Liesenfeld and Richard, 2009, Tables 2 and 4).
For each of the 100 ML-GHK-EIS estimates of the DIMP model under different simulated data sets, we computed the three Wald statistics testing whether the DIMP simplifies into a corresponding DMPj model, which amounts to testing the restrictions b = −ι1 (DMP1), b = −ι2 (DMP2), and b = 0 (DMP3). Figure 1 provides the histogram of the values for the three Wald statistics. As expected, for all three hypotheses nearly all values of the Wald statistic are significantly larger than the 'naive' 1% critical value from a χ2(2)-distribution, which is used as a preliminary benchmark value. Of course, one would need, as mentioned above, to adjust the naive critical value, since it ignores the fact that we are 'scanning' across a range of values for the nuisance parameter ρ3 which is unidentified under the null. This suggests that the appropriate critical value is larger than the naive one (see Davies, 1987). However, as shown in Section 5.3 below, the appropriate critical value in the present context remains in the same ballpark as its naive counterpart. Hence, the comparably large values of the Wald statistics in Figure 1 suggest that such a Wald-type procedure is a useful tool to pretest whether an initial DIMP model simplifies into a DMP model for a particular baseline alternative.
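For concreteness, a Wald pretest of this kind reduces to a quadratic form in the estimated b. In the sketch below the estimate and covariance are invented for illustration (they are merely of the magnitude reported in Table 1, not actual paper output), and the critical value is the naive χ2(2) quantile discussed above:

```python
import numpy as np

# Wald pretest of H0: b = b0; b_hat and V_b are made-up illustrative values.
def wald_stat(b_hat, b0, V_b):
    d = b_hat - b0
    return float(d @ np.linalg.solve(V_b, d))

b_hat = np.array([-0.139, -0.313])       # hypothetical estimate of (b1, b2)
V_b = np.diag([0.095, 0.099]) ** 2       # hypothetical covariance (diagonal)

# 'Naive' 1% critical value of chi2(2): its cdf is 1 - exp(-x/2), so the
# 99% quantile is -2*log(0.01), about 9.21.
crit = -2.0 * np.log(0.01)

restrictions = {"DMP1": np.array([-1.0, 0.0]),   # b = -iota_1
                "DMP2": np.array([0.0, -1.0]),   # b = -iota_2
                "DMP3": np.array([0.0, 0.0])}    # b = 0
for label, b0 in restrictions.items():
    W = wald_stat(b_hat, b0, V_b)
    print(f"H0: {label}: W = {W:.1f}, reject at 1%: {W > crit}")
```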
The main conclusions to be drawn from this simulation exercise are that: (i) DIMP models appear to be statistically well-behaved, as expected in view of the identification theorem in Section 3.3; (ii) ML-GHK-EIS can produce numerically accurate ML estimates with as few as S = 20 EIS draws; and (iii) a Wald test used to pretest whether an initial DIMP model can be substituted by a standard DMP model with a particular reference category appears to have good power.

5.2 Estimates of a misspecified DMP model
In order to analyze the impact of erroneously assuming a DMPj model, we used the same simulated data sets from the DIMP model as in Section 5.1 and computed the ML-GHK-EIS estimates for a misspecified DMP3 model, i.e., the DIMP model with the restriction b = 0 and unidentified ρ3. The results are reproduced in the left columns of Table 2. They indicate that the (statistical) biases of the estimates for the mean parameters π01, π02, and ψ are comparably small while, in contrast, the estimates for the covariance parameters λ12, ρ1, ρ2 are typically severely biased due to the misspecification.
Based on the likelihood values of the misspecified DMP3 model at the parameter estimates and those of the DIMP model obtained in Section 5.1, we computed the values of the LR statistic testing for the DMP3 model. The lower right panel of Figure 1 provides the histogram of the LR values. Consistent with the results for the Wald statistic (see lower left panel of Figure 1), most of the LR values are significantly larger than the naive asymptotic critical value. Note, however, that the finite-sample variation of the LR statistic appears to be much smaller than that of the Wald statistic.
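The LR counterpart is equally mechanical. In the sketch below the two log-likelihood values are invented solely to illustrate the computation; the paper's empirical LR values are those in Figure 1:

```python
import math

# LR pretest of H0: DMP3 (b = 0); the log-likelihoods below are hypothetical.
def lr_stat(loglik_unrestricted, loglik_restricted):
    """LR statistic, nominally chi2 with 2 degrees of freedom here."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

loglik_dimp = -4712.3        # hypothetical maximized DIMP log-likelihood
loglik_dmp3 = -4721.9        # hypothetical restricted (DMP3) log-likelihood

lr = lr_stat(loglik_dimp, loglik_dmp3)
crit_1pct = -2.0 * math.log(0.01)        # chi2(2) 99% quantile, about 9.21
print(lr > crit_1pct)                    # True: reject the DMP3 restriction
```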
5.3 Testing for a DMP model
In order to analyze the finite-sample distribution of the Wald statistic under the
null hypothesis of a DMP model, we simulated 100 data sets from a DMP3 model
and computed ML-GHK-EIS estimates for an unrestricted DIMP specification
together with the corresponding Wald statistic for the DMP3 model.
Under the restriction b = (b1, b2)′ = 0, the DIMP model with J = 3 alternatives and j = 3 as baseline alternative reduces to a DMP3 model. This amounts to setting the off-diagonal blocks of the stationary covariance matrix Φ∗, given by Ψb and b′Ψ, equal to zero (see Equations 26 and 27). This restriction on Φ∗ is satisfied if the parameters of the autocorrelation and covariance matrix (R, Σ) in the initial specification (19) satisfy the equations

    σ13 = [(1 − ρ1 ρ3)/(1 − ρ3^2)] σ33    and    σ23 = [(1 − ρ2 ρ3)/(1 − ρ3^2)] σ33.    (66)

We consider the following values for the (R, Σ) parameters: (ρ1, ρ2, ρ3, σ11, σ22, σ33) = (0.8, 0.6, 0.3, 1, 1, 0.2), together with the restrictions given in Equation (66). The implied values for the identified parameters of the Cholesky decomposition of Ψ, and b, are (λ12, λ22, b1, b2) = (0.223, 1.137, 0, 0).
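These implied values can be checked numerically. The sketch below is our own code and assumes σ12 = 0.3 carries over from Section 5.1 (the text does not restate it); under that assumption it reproduces (0.223, 1.137, 0, 0):

```python
import numpy as np

# Numerical check of restriction (66) and the implied (lambda12, lambda22, b).
# Assumption: sigma_12 = 0.3 as in Section 5.1.
rho = np.array([0.8, 0.6, 0.3])
s33 = 0.2
s13 = (1 - rho[0] * rho[2]) / (1 - rho[2] ** 2) * s33    # Eq. (66)
s23 = (1 - rho[1] * rho[2]) / (1 - rho[2] ** 2) * s33
Sigma = np.array([[1.0, 0.3, s13],
                  [0.3, 1.0, s23],
                  [s13, s23, s33]])

Phi = Sigma / (1.0 - np.outer(rho, rho))     # stationary covariance
D = np.array([[1.0, 0.0, -1.0],              # difference w.r.t. alternative 3
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
Phi_star = D @ Phi @ D.T

Psi = Phi_star[:2, :2]
Lam = np.linalg.cholesky(Psi)
b = np.linalg.solve(Psi, Phi_star[:2, 2])
print(np.round(Lam[1, 0], 3), np.round(Lam[1, 1], 3))    # 0.223 1.137
print(np.round(b, 6))                                    # essentially zero
```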
The parameter estimates of the unrestricted DIMP model for the simulated data from a DMP3 specification are reported in the right columns of Table 2. As expected, the ML-GHK-EIS estimator is statistically well-behaved, except for the parameter ρ3, which is unidentified under the data generating DMP3 model. Note also that the means of the b1 and b2 estimates are not significantly different from zero. The histograms for the Wald statistics testing for the DMP1, DMP2, and DMP3 models are provided in Figure 2. As for the Wald statistic testing
the DMP3 specification (see lower panel of Figure 1), in 2% of the cases one would reject the null using the naive 1% asymptotic critical value, in 8% for the 5% critical value, and in 10% for the 10% critical value. This clearly suggests that the naive asymptotic χ2 distribution delivers a fairly good approximation. Hence, the appropriate critical values, which account for the fact that the nuisance parameter ρ3 is unidentified under the null, can be expected to be fairly close to the naive ones. Finally, we observe that the histograms for the Wald statistic testing for a DMP1 and DMP2 model (see upper panels of Figure 2) indicate large deviations from the specification under the corresponding null, as expected.
The main conclusions to be drawn from this simulation experiment are that: (i) the naive asymptotic distribution of the Wald statistic under the null of a DMP model provides a fairly good approximation to the true one, and (ii) the Wald test appears to be able to discriminate between different DMP specifications obtained by using different baseline categories, allowing for data based selection of the latter.

5.4 Predictions under a misspecified DMP model
The histograms in Figure 1 show that when the true model is a DIMP model, both the Wald and LR tests reject the misspecified DMP models at very high rates. However, it is well known that a misspecified model can always be rejected given enough data, even if the misspecification leads to only trivial deterioration in fit and predictions. This raises the question of whether a misspecified DMP model produces significant differences in substantively important predictions relative to a correctly specified DIMP. Note that both models share common specifications for their means, with similar estimated mean coefficients in Tables 1 and 2, and differ only in their error correlation structure. It follows, for example, that we should expect DMP misspecification to have a greater impact on predicted transition probabilities than on predicted marginal selection probabilities.
In order to substantiate these conjectures, we compare marginal selection probabilities and transition probabilities and their responses to changes in the covariates under the DIMP and DMP3 models estimated in Sections 5.1 and 5.2, setting the parameters at their estimated values in Tables 1 and 2, respectively. In this context the DIMP parameters serve as true values while the DMP3 parameters implicitly represent pseudo-true values. Probabilities are accurately estimated by Monte Carlo using 1,000,000 simulated sequences of selected alternatives {ji1, . . . , jiT} for each model.
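Such a simulator is straightforward to sketch. The following illustrative code is our own, with made-up parameter values rather than the estimates of Tables 1 and 2 and far fewer than 1,000,000 draws; it estimates the transition probabilities by frequency counts:

```python
import numpy as np

# Monte Carlo transition probabilities P(j_t | j_{t-1}) for a 3-alternative
# probit with AR(1) errors.  Parameter values are illustrative only.
rng = np.random.default_rng(1)
J, T, N = 3, 10, 20000

rho = np.array([0.8, 0.6, 0.3])                 # diagonal AR(1) coefficients
Sigma = np.full((J, J), 0.3)
np.fill_diagonal(Sigma, 1.0)                    # innovation covariance
mu = np.array([0.5, -1.0, 0.0])                 # mean utilities (alt. 3 at 0)

Phi = Sigma / (1.0 - np.outer(rho, rho))        # stationary covariance
C0, C = np.linalg.cholesky(Phi), np.linalg.cholesky(Sigma)

eps = rng.standard_normal((N, J)) @ C0.T        # draw t = 1 from stationarity
prev = np.argmax(mu + eps, axis=1)              # selected alternative at t = 1
counts = np.zeros((J, J))
for t in range(1, T):
    eps = eps * rho + rng.standard_normal((N, J)) @ C.T   # AR(1) update
    cur = np.argmax(mu + eps, axis=1)
    np.add.at(counts, (prev, cur), 1.0)
    prev = cur

P = counts / counts.sum(axis=1, keepdims=True)  # rows: j_{t-1}, cols: j_t
print(np.round(P, 3))
```

Partial effects such as those in Tables 3 to 5 then obtain by re-running the simulation with shifted covariate values and differencing the resulting probabilities.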
In Table 3 we compare the partial effects of changes in the covariates Zitj (j = 1, 2) on the marginal selection probabilities

    P(jit | Zit1 = 1, Zit2 = 0) − P(jit | Zit1 = Zit2 = 0)    and
    P(jit | Zit1 = 0, Zit2 = 1) − P(jit | Zit1 = Zit2 = 0),    for jit = 1, 2, 3.

As expected, percentage differences remain fairly small, ranging from −6.82% to 7.34%. For reference, the selection probabilities P(jit | Zit1 = Zit2 = 0) for jit = 1, 2, 3 are (0.573, 0.086, 0.342) under the DIMP model and (0.571, 0.084, 0.345) under the DMP3 model.
In Table 4 we report the transition probabilities evaluated at the means of the covariates,

    P(jit | jit−1, Zi.1 = Zi.2 = 0),    for (jit, jit−1) ∈ {1, 2, 3} × {1, 2, 3},

where Zi.j = (Zi1j, . . . , ZiTj), and observe larger percentage differences, ranging from −16.11% to 29.65%.
Finally, we report in Table 5 the partial effects of a permanent change in the covariates on the transition probabilities

    P(jit | jit−1, Zi.1 = 1, Zi.2 = 0) − P(jit | jit−1, Zi.1 = Zi.2 = 0)    and
    P(jit | jit−1, Zi.1 = 0, Zi.2 = 1) − P(jit | jit−1, Zi.1 = Zi.2 = 0).

Here we observe the largest percentage changes, ranging from −19.35% to 59.65%.
These results fully confirm our conjecture that misspecified DMP models can significantly distort predicted transition probabilities and their responses to changes in the covariates, even when marginal selection probabilities are affected to a much lesser extent. We suspect that we could easily induce larger distortions of the predicted transition probabilities by assigning autocorrelation parameters ρj over the full range from −1 to +1 for the DIMP model instead of the [0.3, 0.8] interval used here. Therefore, it appears prudent to initially estimate a DIMP model in any substantive empirical application analyzing the dynamics of decision processes, since the cost of generalizing a DMP estimation algorithm into a DIMP one essentially consists of a one-time implementation cost associated with modifications of the Cholesky decomposition of the corresponding joint covariance matrix required for GHK(-EIS).

6 Conclusion

We have proposed a new specification for the multinomial multiperiod Probit model with autocorrelated errors which is invariant with respect to the selection of the baseline alternative, in sharp contrast with commonly used models. Furthermore, it identifies parameters which are not identified under the standard approach (such as the parameter governing the dynamics of the utility for the reference category). The formal identification of the proposed invariant model has been illustrated by MC experiments. Since our model also nests the standard specifications as special cases, it becomes feasible to test whether it simplifies into a standard model for a particular baseline alternative.
Acknowledgement

We are grateful to two anonymous referees for their helpful comments which have produced major clarifications on several key issues. Roman Liesenfeld acknowledges research support provided by the Deutsche Forschungsgemeinschaft (DFG) under grant HE 2188/1-1; Jean-François Richard acknowledges the research support provided by the National Science Foundation (NSF) under grant SES-0516642.
References
Andrews, D.W.K., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present under the alternative. Econometrica 62, 1383–1414.

Börsch-Supan, A., Hajivassiliou, V., Kotlikoff, L., Morris, J., 1990. Health, children, and elderly living arrangements: a multiperiod multinomial probit model with unobserved heterogeneity and autocorrelated errors. NBER Working Paper No. 3343.

Bunch, D.S., 1991. Estimability in the multinomial probit model. Transportation Research B 25, 1–12.

Davies, R.B., 1977. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247–254.

Davies, R.B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43.

Geweke, J., 1991. Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints. Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571–578.

Geweke, J., Keane, M., Runkle, D., 1997. Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics 80, 125–165.

Hajivassiliou, V., 1990. Smooth simulation estimation of panel data LDV models. Mimeo, Yale University.

Hansen, B., 1996. Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64, 413–430.

Keane, M., 1992. A note on identification in the multinomial probit model. Journal of Business & Economic Statistics 10, 193–200.

Keane, M., 1994. A computationally practical simulation estimator for panel data. Econometrica 62, 95–116.

Keane, M., 1997. Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business and Economic Statistics 15, 310–327.

Liesenfeld, R., Richard, J.-F., 2009. Efficient estimation of Probit models with correlated errors. Working paper, University of Kiel.

Richard, J.-F., Zhang, W., 2007. Efficient high-dimensional importance sampling. Journal of Econometrics 141, 1385–1411.
Table 1. ML-GHK-EIS estimates for the DIMP model

                         diff. data sets                     fixed data set / diff. CRNs
Parameter      true    mean    std.dev.  rmse   asy.s.e.   ML value   mean    std.dev.  rmse
π01             .500    .495    .063      .063   .049        .537      .539    .0009     .0023
π02           −1.000  −1.008    .083      .084   .066       −.961     −.968    .0027     .0079
ψ              1.000   1.003    .036      .036   .030       1.020     1.024    .0009     .0044
λ12             .521    .523    .087      .087   .066        .550      .552    .0051     .0054
λ22            1.288   1.288    .073      .073   .056       1.291     1.296    .0026     .0059
b1             −.134   −.139    .095      .095   .068        .0003    −.004    .0024     .0051
b2             −.316   −.313    .099      .099   .072       −.427     −.418    .0040     .0098
ρ1              .800    .802    .029      .029   .024        .775      .775    .0008     .0008
ρ2              .600    .594    .066      .066   .045        .687      .686    .0019     .0021
ρ3              .300    .245    .160      .169   .110        .317      .302    .0055     .0166

NOTE: The reported numbers for ML-GHK-EIS are the mean, standard deviation, RMSE, and (for the different-data-sets experiment) the mean of the asymptotic standard errors, obtained for S = 20. The asymptotic standard errors are obtained from a numerical approximation to the Hessian. For the experiment with a fixed data set and different CRNs, the RMSE is computed around the true ML value for that particular data set. The true ML values are the ML-GHK-EIS estimates based on S = 1000.
Table 2. ML-GHK-EIS estimates for the DIMP/DMP model

                true model: DIMP / est. model: DMP3        true model: DMP3 / est. model: DIMP
Parameter    true    mean    std.dev.  rmse   asy.s.e.    true    mean    std.dev.  rmse   asy.s.e.
π01           .500    .469    .063      .071   .052        .500    .515    .064      .066   .051
π02         −1.000  −1.043    .088      .098   .073      −1.000   −.994    .116      .116   .070
ψ            1.000   1.005    .036      .037   .031       1.000   1.001    .047      .047   .032
λ12           .521    .423    .094      .136   .075        .223    .245    .079      .081   .065
λ22          1.288   1.264    .072      .076   .060       1.137   1.141    .093      .093   .058
b1           −.134    .000    —         —      —           .000   −.010    .410      .410   .129
b2           −.316    .000    —         —      —           .000    .004    .247      .247   .143
ρ1            .800    .726    .017      .075   .014        .800    .810    .082      .082   .025
ρ2            .600    .537    .035      .072   .029        .600    .604    .052      .052   .037
ρ3            .300    —       —         —      —           .300    .189    .891      .898   .346

NOTE: The reported numbers for ML-GHK-EIS are the mean, standard deviation, RMSE, and the mean of the asymptotic standard errors, obtained for S = 20. The asymptotic standard errors are obtained from a numerical approximation to the Hessian. Under the estimated DMP3 model, b1 and b2 are restricted to zero and ρ3 is not estimated.
Table 3. Partial effects of a change in Zitj on the selection probabilities

                              ∆Zit1 = 1                    ∆Zit2 = 1
Alternative            jit = 1  jit = 2  jit = 3    jit = 1  jit = 2  jit = 3
DIMP                    .207    −.045    −.162      −.085     .171    −.086
DMP3                    .206    −.042    −.164      −.079     .172    −.093
Difference in percent   −.51    −6.82     1.24      −6.51      .47     7.34
Table 4. Transition probabilities

Alternative                        jit = 1   jit = 2   jit = 3
jit−1 = 1   DIMP                    .781      .039      .180
            DMP3                    .788      .045      .167
            Difference in percent   .86      16.68     −7.36
jit−1 = 2   DIMP                    .268      .350      .382
            DMP3                    .347      .332      .321
            Difference in percent  29.65     −5.09    −16.11
jit−1 = 3   DIMP                    .300      .100      .600
            DMP3                    .267      .089      .644
            Difference in percent −11.08    −10.63      7.31
Table 5. Partial effects of a permanent change in Zitj on the transition probabilities

                                        ∆Zit1 = 1                    ∆Zit2 = 1
Alternative                      jit = 1  jit = 2  jit = 3    jit = 1  jit = 2  jit = 3
jit−1 = 1   DIMP                  .099    −.019    −.080      −.043     .076    −.033
            DMP3                  .096    −.022    −.074      −.042     .088    −.046
            Difference in percent −3.27   14.78    −7.55      −3.53    15.37    40.06
jit−1 = 2   DIMP                  .121    −.048    −.074      −.045     .188    −.142
            DMP3                  .135    −.059    −.076      −.072     .187    −.115
            Difference in percent 11.32   24.20     3.02      59.65     −.28   −19.35
jit−1 = 3   DIMP                  .133    −.026    −.107      −.042     .148    −.106
            DMP3                  .122    −.021    −.101      −.043     .142    −.100
            Difference in percent −8.44   18.90    −5.88       1.85    −4.36    −6.79
Figure 1. Histogram of the Wald statistic of H0: DMPj for j = 1 (upper left panel), j = 2 (upper right panel), j = 3 (lower left panel) and histogram of the likelihood-ratio statistic for H0: DMP3 (lower right panel). The true data generating process is the DIMP model as given by Equations (63) to (65). The vertical line marks the 99%-percentile of a χ2-distribution with 2 degrees of freedom, given by 9.21.
Figure 2. Histogram of the Wald statistic of H0: DMPj for j = 1 (upper left panel), j = 2 (upper right panel), j = 3 (lower left panel). The true data generating process is the DMP3 model. The vertical line marks the 99%-percentile of a χ2-distribution with 2 degrees of freedom, given by 9.21.