Allgemeines Statistisches Archiv 0, 0–14

© Physica-Verlag 0, ISSN 0002-6018

Ordered Response Models


By Stefan Boes and Rainer Winkelmann∗

Summary: We discuss regression models for ordered responses, such as ratings of bonds,
schooling attainment, or measures of subjective well-being. Commonly used models in
this context are the ordered logit and ordered probit regression models. They are based
on an underlying latent model with single index function and constant thresholds. We
argue that these approaches are overly restrictive and preclude a flexible estimation of
the effect of regressors on the discrete outcome probabilities. For example, the signs of
the marginal probability effects can only change once when moving from the smallest
category to the largest one. We then discuss several alternative models that overcome
these limitations. An application illustrates the benefit of these alternatives.

Keywords: Marginal effects, generalized threshold, sequential model, random coefficients,
latent class analysis, happiness. JEL C25, I25.

1. Introduction

Regression models for ordered responses, i.e. statistical models in which the
outcome of an ordered dependent variable is explained by a number of ar-
bitrarily scaled independent variables, have their origin in the biometrics
literature. Aitchison and Silvey (1957) proposed the ordered probit model
to analyze experiments in which the responses of subjects to various doses
of stimulus are divided into ordinally ranked classes. Snell (1964) suggested
the use of the logistic instead of the normal distribution as an approxima-
tion for mathematical simplification. The first comprehensive treatment of
ordered response models in the social sciences appeared with the work of
McKelvey and Zavoina (1975) who generalized the model of Aitchison and
Silvey to more than one independent variable. Their basic idea was to as-
sume the existence of an underlying continuous latent variable – related to
a single index of explanatory variables and an error term – and to obtain
the observed categorical outcome by discretizing the real line into a finite
number of intervals.
McCullagh (1980) developed independently the so-called cumulative model
in the statistics literature. He directly modelled the cumulative probabili-
ties of the ordered outcome as a monotonic increasing transformation of a
linear predictor onto the unit interval, assuming a logit or probit link func-
tion. This specification yields the same probability function as the model
of McKelvey and Zavoina, and is therefore observationally equivalent. Both
papers spurred a large literature on how to model ordered dependent vari-
ables, the former mostly in the social sciences, the latter predominantly in
the medical and biostatistics literature.
On the one hand, a number of parametric generalizations have been pro-
posed. These include alternative link functions, prominent examples being

Received: / Revised:
∗ We are grateful to an anonymous referee for valuable comment.

the log-log or the complementary log-log function (McCullagh, 1980),
generalized predictor functions that include, for example, quadratic terms or
interactions, or dispersion parameters (Cox, 1995). Olsson (1979) and Ron-
ning and Kukuk (1996) discuss estimation of models in which both depen-
dent and independent variables are ordered in the context of multivariate
latent structural models, i.e. an adaptation of log-linear models to ordinal
data. On the other hand, semi- and non-parametric approaches replace the
distributional assumptions of the standard model, or the predictor function,
by flexible semi- or non-parametric functional forms. General surveys of the
parametric as well as the semi- and nonparametric literature are given, for
example, in Agresti (1999), Barnhart and Sampson (1994), Clogg and Shi-
hadeh (1994), Winship and Mare (1984), Bellemare, Melenberg, and van
Soest (2002), and Stewart (2004), the two latter references in particular for
the semi- and nonparametric treatments of ordered data.
To judge the usefulness of these alternative models, one must first decide
on the ultimate objective of the analysis. In our view, the parameters of
the latent model have no direct interpretation per se in most applications
of ordered response models.
Rather, the interest lies in the shift of the predicted discrete ordered out-
come distribution as one or more of the regressors change, i.e. the marginal
probability effects. Perhaps surprisingly, standard ordered response models
are not very well suited to analyze these marginal probability effects, be-
cause the answer is to a large extent predetermined by the rigid parametric
structure of the model. Therefore, we consider a number of generalizations
that allow for flexible analyses of marginal probability effects. In addition to
the generalized threshold model (Maddala, 1983; Terza, 1985; Brant, 1990)
and the sequential model (Fienberg, 1980; Tutz, 1990, 1991), we show how
additional flexibility can be gained by modeling individual heterogeneity
either by means of a random coefficients model or as a finite mixture/latent
class model.
The remainder of the paper is organized as follows. In the next section
we provide a short review of the standard model, before turning to the
generalizations in section 3. In section 4 we illustrate the methods with
an analysis of the relationship between income and happiness using data
from the German Socio-Economic Panel. Our results show that marginal
probability effects in the generalized alternatives are substantially different
from those in the standard model. For example, the standard model implies
that the probability of being completely satisfied increases on average by
about 0.017 percentage points for a one-percent increase in income, while
it is decreasing or constant in the generalized models. Section 5 concludes.

2. Standard Ordered Response Models


Consider the following examples. In a survey, respondents have been asked
about their life satisfaction, or their change in health status. Answer categories
might range from 0 to 10, where 0 means completely dissatisfied and 10 means
completely satisfied, or from 1 to 5, where 1 means greatly deteriorated and 5
means greatly improved, respectively. The objective is to model
these ordered responses as functions of explanatory variables.
Formally, let the ordered categorical outcome y be coded, without loss
of generality, in a rank preserving manner, i.e. y ∈ {1, 2, . . . , J} where J
denotes the total number of distinct categories. Furthermore, suppose that
a (k × 1)-dimensional vector x of covariates is available. In standard ordered
response models, the cumulative probabilities of the discrete outcome are
related to a single index of explanatory variables in the following way

$$\Pr[y \le j \mid x] = F(\kappa_j - x'\beta), \qquad j = 1, \dots, J \tag{1}$$

where $\kappa_j$ and the $(k \times 1)$ vector $\beta$ denote unknown model parameters, and $F$ can be any
monotonic increasing function mapping the real line onto the unit interval.
Although no further restrictions are imposed a priori on the transformation
F it is standard to replace F by a distribution function, the most commonly
used ones being the standard normal (which yields the ordered probit) and
the logistic distribution (associated with the ordered logit model), and we
assume in what follows that F represents either the standard normal or
logistic distribution. In order to ensure well-defined probabilities, we require
that $\kappa_j > \kappa_{j-1}$ for all $j$, and it is understood that $\kappa_J = \infty$ such that $F(\infty) = 1$
as well as $\kappa_0 = -\infty$ such that $F(-\infty) = 0$.
Ordered response models are usually motivated by an underlying continuous
but latent process $y^*$ together with a response mechanism of the
form

$$y = j \quad \text{if and only if} \quad \kappa_{j-1} \le y^* = x'\beta + u < \kappa_j, \qquad j = 1, \dots, J$$

where $\kappa_0, \dots, \kappa_J$ are introduced as threshold parameters, discretizing the
real line, represented by $y^*$, into $J$ categories. The latent variable $y^*$ is
related linearly to observable and unobservable factors and the latter have
a fully specified distribution function $F(u)$ with zero mean and constant
variance.
The cumulative model (1) can be postulated without assuming the exis-
tence of a latent part and a threshold mechanism, though. Moreover, since
$y^*$ cannot be observed and is purely artificial, its interpretation is not of
interest. The main focus in the analysis of ordered data should be put on
the conditional cell probabilities given by

$$\Pr[y = j \mid x] = F(\kappa_j - x'\beta) - F(\kappa_{j-1} - x'\beta) \tag{2}$$

In order to identify the parameters of the model we have to fix location and
scale of the argument in F , the former by assuming that x does not contain
a constant term, the latter by normalizing the variance of the distribution
function F . Then, equation (2) represents a well-defined probability func-
tion which allows for straightforward application of maximum likelihood
methods for a random sample of size n of pairs (y, x).
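
The cell probabilities in (2) and the resulting log-likelihood can be sketched in a few lines of Python. This is a minimal illustration assuming a probit link; the function names and all numerical values are hypothetical and not taken from the paper.

```python
from math import inf, log
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function F


def cdf(z):
    """F(z), with F(-inf) = 0 and F(inf) = 1 handled explicitly."""
    if z == inf:
        return 1.0
    if z == -inf:
        return 0.0
    return PHI(z)


def cell_probs(x, beta, kappa):
    """Pr[y = j | x] for j = 1..J, given interior thresholds kappa_1 < ... < kappa_{J-1}."""
    index = sum(b * v for b, v in zip(beta, x))   # x'beta (x has no constant term)
    cuts = [-inf] + list(kappa) + [inf]           # kappa_0 = -inf, kappa_J = +inf
    return [cdf(cuts[j] - index) - cdf(cuts[j - 1] - index)
            for j in range(1, len(cuts))]


def log_likelihood(ys, xs, beta, kappa):
    """Sum of ln Pr[y_i | x_i] over a sample; outcomes ys are coded 1..J."""
    return sum(log(cell_probs(x, beta, kappa)[y - 1]) for y, x in zip(ys, xs))


# Illustrative values only (J = 3 categories, k = 2 regressors).
probs = cell_probs(x=[0.5, -1.0], beta=[0.8, 0.3], kappa=[-0.5, 1.0])
print(probs, sum(probs))
```

In practice the log-likelihood would be handed to a numerical optimizer, with the ordering of the thresholds enforced during the search.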

The most natural way to interpret ordered response models (and discrete
probability models in general) is to determine how a marginal change in
one regressor changes the distribution of the outcome variable, i.e. all the
outcome probabilities. These marginal probability effects can be calculated
as
$$MPE_{jl}(x) = \frac{\partial \Pr[y = j \mid x]}{\partial x_l} = \left[ f(\kappa_{j-1} - x'\beta) - f(\kappa_j - x'\beta) \right] \beta_l \tag{3}$$

where $f(z) = dF(z)/dz$ and $x_l$ denotes the $l$-th (continuous) element in $x$.


With respect to a discrete valued regressor it is more appropriate to calculate
the change in the probabilities before and after the discrete change $\Delta x_l$,

$$\Delta \Pr[y = j \mid x] = \Pr[y = j \mid x + \Delta x_l] - \Pr[y = j \mid x] \tag{4}$$

In general, the magnitude of these probability changes depends on the specific
values of the ith observation’s covariates. After taking expectation with
respect to x we obtain average marginal probability effects, which can be
estimated consistently by replacing the true parameters by their correspond-
ing maximum likelihood estimates and taking the average over all observa-
tions.
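
As a rough sketch of how (3) and its sample average could be computed for an ordered probit, consider the following Python fragment. The parameter values and covariates are hypothetical, and the helper names are ours.

```python
from math import inf
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pdf(z):
    """Standard normal density f(z), with f(+-inf) = 0."""
    return 0.0 if z in (inf, -inf) else STD_NORMAL.pdf(z)


def mpe(x, beta, kappa, l):
    """Equation (3): effects of x_l on Pr[y = j | x] for j = 1..J."""
    index = sum(b * v for b, v in zip(beta, x))
    cuts = [-inf] + list(kappa) + [inf]
    return [(pdf(cuts[j - 1] - index) - pdf(cuts[j] - index)) * beta[l]
            for j in range(1, len(cuts))]


def average_mpe(xs, beta, kappa, l):
    """Sample average of the marginal probability effects over all observations."""
    effects = [mpe(x, beta, kappa, l) for x in xs]
    n = len(effects)
    return [sum(e[j] for e in effects) / n for j in range(len(effects[0]))]


# Hypothetical parameter values and covariates, for illustration only.
xs = [[0.2, 1.0], [1.5, 0.0], [-0.7, 1.0]]
ampe = average_mpe(xs, beta=[0.6, -0.4], kappa=[-1.0, 0.0, 1.2], l=0)
print(ampe)        # one effect per category
print(sum(ampe))   # zero up to rounding, since the probabilities sum to one
```

In an application, the true parameters would be replaced by their maximum likelihood estimates, as described above.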
However, if we take a closer look at (3) and (4) it becomes apparent that
marginal probability effects in standard ordered response models have two
restrictive properties that limit the usefulness of these models in practice.
First, the ratio of marginal probability effects of two distinct continuous
covariates on the same outcome, i.e. the relative marginal probability effect,
is constant across individuals and the outcome distribution, because from
(3) we have that

$$\frac{MPE_{jl}(x)}{MPE_{jm}(x)} = \frac{\beta_l}{\beta_m}$$

which does not depend on $i$ and $j$. Second, marginal probability effects
change their sign exactly once when moving from the smallest to the largest
outcome. More precisely, if we move stepwise from the lowest category y = 1
to the highest category y = J, the effects are either first negative and then
positive (βl > 0), or first positive and then negative (βl < 0). This “single
crossing property” follows directly from the bell-shaped density functions
of the standard normal and the logistic distribution. Therefore, if we are
interested in the effect of a covariate on the outcome probabilities, i.e. if we
turn our attention to the effects on the full distribution of outcomes, the
standard models preclude a flexible analysis of marginal probability effects
by design.
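
The single crossing property can be made explicit with a short derivation. It is a sketch that uses only the symmetry of $f$ around zero and the fact that $f$ decreases in $|z|$, both of which hold for the standard normal and the logistic density. Write $a_j = \kappa_j - x'\beta$, so that $a_0 = -\infty < a_1 < \dots < a_J = \infty$. Then

$$f(a_{j-1}) - f(a_j) > 0 \iff |a_{j-1}| < |a_j| \iff a_{j-1} + a_j > 0,$$

where the last step follows from $a_j^2 - a_{j-1}^2 = (a_j - a_{j-1})(a_j + a_{j-1})$ and $a_j - a_{j-1} > 0$. Since $a_{j-1} + a_j$ is strictly increasing in $j$, the bracket in (3) is negative for low categories and positive for high ones, and it can switch sign only once. Multiplying by $\beta_l$ then fixes the direction of the switch.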

3. Generalized Ordered Response Models

Three assumptions of the standard model are responsible for its limitations
in analyzing marginal probability effects: First, the single index assumption,
second, the constant threshold assumption, and third, the distributional as-
sumption which does not allow for additional individual heterogeneity be-
tween individual realizations. While relaxing these assumptions we want to
retain the possibility of interpreting the model in terms of marginal prob-
ability effects. Therefore, we need to search for a richer class of parametric
models that does not impose restrictions such as constant relative effects or
single crossing. In this section we present four such alternatives.

3.1. Generalized Threshold Model. The first model we consider relaxes
the single index assumption and allows for different indices across
outcomes. This model was introduced by Maddala (1983) and Terza (1985)
who proposed to generalize the threshold parameters by making them dependent
on covariates

$$\kappa_j = \tilde{\kappa}_j + x'\gamma_j$$

where $\gamma_j$ is a $(k \times 1)$-dimensional vector of response specific parameters. Plugging
this into (1) we get the cumulative probabilities in the generalized
threshold model

$$\Pr[y \le j \mid x] = F(\tilde{\kappa}_j + x'\gamma_j - x'\beta) = F(\tilde{\kappa}_j - x'\beta_j), \qquad j = 1, \dots, J \tag{5}$$

where it is understood that $\tilde{\kappa}_0 = -\infty$ and $\tilde{\kappa}_J = \infty$, as before. The last equality
in (5) follows because $\gamma_j$ and $\beta$ cannot be identified separately with the
same $x$ entering the index function and the generalized thresholds, and we
define $\beta_j \equiv \beta - \gamma_j$. The cumulative probabilities define a probability density
function in the same manner as in (2) and parameters can be estimated
directly by maximum likelihood. A non-linear specification can be used to
ensure that $\tilde{\kappa}_{j-1} - x'\beta_{j-1} < \tilde{\kappa}_j - x'\beta_j$ for all $\tilde{\kappa}$, $\tilde{\beta}$ and $x$ (e.g. Ronning,
1990). We observe that the generalized threshold model nests the standard
model under the restrictions $\beta_1 = \dots = \beta_{J-1}$ and therefore both models
can be tested against each other by performing a likelihood ratio (LR) test.
The generalized threshold model provides a framework in which marginal
probability effects can be analyzed with much more flexibility than in the
standard model, since

$$MPE_{jl}(x) = f(\tilde{\kappa}_{j-1} - x'\beta_{j-1})\,\beta_{j-1,l} - f(\tilde{\kappa}_j - x'\beta_j)\,\beta_{jl} \tag{6}$$

does not rely anymore on a single crossing property or constant relative
effects. Nevertheless, this generalization comes at a cost. The model now
contains (J − 2)k parameters more than before which reduces the degrees
of freedom considerably, in particular when J is large.
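
The following Python sketch illustrates two points made above under a probit link: the cell probabilities implied by (5), including a check that the category specific indices are increasing at a given $x$, and the parameter count. The function names are hypothetical. The choice $J = 8$ and $k = 6$ corresponds to the empirical section, counting the quadratic in age as two regressors.

```python
from statistics import NormalDist

F = NormalDist().cdf


def generalized_cell_probs(x, betas, kappas):
    """Cell probabilities implied by (5); betas[j] and kappas[j] belong to category j + 1.

    Raises ValueError if the category specific indices are not increasing at x,
    because the implied cell probabilities would then not be positive.
    """
    args = [kap - sum(b * v for b, v in zip(beta, x))
            for kap, beta in zip(kappas, betas)]
    if any(hi <= lo for lo, hi in zip(args, args[1:])):
        raise ValueError("indices not increasing at this x")
    cum = [0.0] + [F(a) for a in args] + [1.0]    # Pr[y <= j | x] for j = 0..J
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def n_params_standard(J, k):
    """J - 1 thresholds plus one slope vector of length k."""
    return (J - 1) + k


def n_params_generalized(J, k):
    """J - 1 thresholds plus J - 1 category specific slope vectors."""
    return (J - 1) + (J - 1) * k


J, k = 8, 6
extra = n_params_generalized(J, k) - n_params_standard(J, k)
print(n_params_standard(J, k), n_params_generalized(J, k), extra)

# Hypothetical parameters with J = 3 and a single regressor.
print(generalized_cell_probs([1.0], betas=[[0.5], [0.2]], kappas=[-0.5, 0.5]))
```

The difference in the number of parameters equals $(J - 2)k$, which is also the number of restrictions tested when the two models are compared.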

3.2. Random Coefficients Model. As a second alternative we discuss
the class of random coefficients models. The basic idea is to randomize the
parameters of interest by adding an error term that is correlated with the
unobserved factors in $u$. Thus, we translate individual heterogeneity into
parameter heterogeneity, writing the vector of slopes as

$$\beta = \tilde{\beta} + \varepsilon$$

where $\varepsilon$ is an individual specific $(k \times 1)$-dimensional vector of error terms.


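Moreover, we assume for the joint error term $\gamma \equiv (\varepsilon' \; u)'$ that

$$E[\gamma \mid x] = 0 \quad \text{and} \quad E[\gamma\gamma' \mid x] = \Sigma \quad \text{with} \quad \Sigma = \begin{pmatrix} \Omega & \psi \\ \psi' & 1 \end{pmatrix}$$

where $\Omega$ is the $(k \times k)$-dimensional covariance matrix of $\varepsilon$, $\psi$ is the $(k \times 1)$-dimensional
covariance vector between the slope parameters and $u$, and
$\mathrm{Var}[u \mid x] = 1$, as before. The consequences of this modification are easiest
seen from the latent variable representation, where we now have $y^* = x'\tilde{\beta} + \tilde{u}$
with “new” error term $\tilde{u} \equiv x'\varepsilon + u$, such that

$$E[\tilde{u} \mid x] = 0 \quad \text{and} \quad E[\tilde{u}\tilde{u}' \mid x] = x'\Omega x + 2x'\psi + 1 \equiv \sigma_{\tilde{u}}^2$$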

and $\tilde{u}/\sigma_{\tilde{u}}$ is distributed with distribution function $F$. If $\varepsilon$ and $u$ are jointly
normal with covariance structure given by $\Sigma$, we obtain an ordered probit
model with unobserved heterogeneity. However, in principle, we do not need
to know the distributions of $\varepsilon$ or $u$, as long as $F$ is a well-defined distribution
function. In this case, we can express the cumulative probabilities in the
random coefficients model as

$$\Pr[y \le j \mid x] = F\left( \frac{\kappa_j - x'\tilde{\beta}}{\sigma_{\tilde{u}}} \right) \equiv \tilde{F}_j(x) \tag{7}$$

where $\sigma_{\tilde{u}} = \sqrt{x'\Omega x + 2x'\psi + 1}$ can be seen as dispersion parameter. The
standard model is a special case of the random coefficients model under the
assumption Ω = 0 and ψ = 0. Thus, a simple LR test can be used to test
for parameter heterogeneity.
The probability density function of y is obtained in the same way as
in (2), and one can calculate marginal probability effects in the random
coefficients model as
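$$\begin{aligned} MPE_{jl}(x) = {} & \left[ \tilde{f}_{j-1}(x) - \tilde{f}_j(x) \right] \frac{\tilde{\beta}_l}{\sigma_{\tilde{u}}} \\ & + \left[ \tilde{f}_{j-1}(x) \left( \kappa_{j-1} - x'\tilde{\beta} \right) - \tilde{f}_j(x) \left( \kappa_j - x'\tilde{\beta} \right) \right] \frac{x'\Omega_l + \psi_l}{\sigma_{\tilde{u}}^3} \end{aligned} \tag{8}$$

by using product and chain rules. In (8), $\Omega_l$ denotes the $l$-th column in $\Omega$
and $\psi_l$ the $l$-th element in $\psi$, respectively, and $\tilde{f}(z) = d\tilde{F}(z)/dz$. The first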
term in (8) corresponds to the marginal probability effects in the standard
model corrected for the standard deviation of the disturbance ũ. The second
term arises because we assume a specific form of heteroscedasticity which
makes the error term dependent on x. Consequently, marginal probability
effects in the random coefficient model are more flexible than those in the
standard model since the sign of the second term is indeterminate.
The random coefficients model can be estimated directly by the method
of maximum likelihood with heteroscedasticity corrected index function.
However, some caution is required in running the optimization routines.
Although the parameters of the model are identified by functional form,
the specific structure of the model might cause problems in some datasets.
Specifically, certain values of $\Omega$, $\psi$ and $x$ can drive $\sigma_{\tilde{u}}^2$ to be negative or its
square root to be almost linear in the parameters, such that the argument
in F gets complex or is not identified, respectively. Nevertheless, if the data
support the model, we should find reasonable estimates of the elements in
Ω and ψ.
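
A minimal Python sketch of the cell probabilities implied by (7) under a probit link is given below, including a guard for the case in which the implied variance is not positive. Function names and numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

F = NormalDist().cdf


def sigma_u(x, omega, psi):
    """sqrt(x' Omega x + 2 x' psi + 1); raises if the implied variance is not positive."""
    k = len(x)
    quad = sum(x[a] * omega[a][b] * x[b] for a in range(k) for b in range(k))
    var = quad + 2.0 * sum(xv * p for xv, p in zip(x, psi)) + 1.0
    if var <= 0.0:
        raise ValueError(f"implied variance {var:.4f} is not positive at this x")
    return sqrt(var)


def rc_cell_probs(x, beta, kappa, omega, psi):
    """Pr[y = j | x], j = 1..J, with the index rescaled by the dispersion parameter."""
    index = sum(b * v for b, v in zip(beta, x))
    s = sigma_u(x, omega, psi)
    cum = [0.0] + [F((kap - index) / s) for kap in kappa] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical values: k = 2 regressors, J = 3 categories.
x, beta, kappa = [0.5, -1.0], [0.8, 0.3], [-0.5, 1.0]
zero_omega, zero_psi = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
print(rc_cell_probs(x, beta, kappa, zero_omega, zero_psi))             # standard model
print(rc_cell_probs(x, beta, kappa, [[0.4, 0.0], [0.0, 0.0]], [0.2, 0.0]))
```

With $\Omega = 0$ and $\psi = 0$ the dispersion parameter equals one and the standard ordered probit probabilities are recovered.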

3.3. Finite Mixture Model. The third approach is a finite mixture
model for ordered data (Everitt, 1988; Everitt and Merette, 1990; Uebersax,
1999) which provides a very flexible way of modeling heterogeneity among
groups of individuals. It is supposed that the population is split into $C$
distinct latent classes and each class has its own data-generating process,
i.e. we relax the distributional assumption of the standard model and its implied
homogeneity. To fix ideas, let $c = 1, \dots, C$ denote the index of classes
and write the cumulative probabilities for class $c$ as

$$\Pr[y_c \le j \mid x] = F(\kappa_{cj} - x'\beta_c) \equiv F_{cj}(x)$$

However, individual class membership is not observable and we assume that
each individual belongs to a certain class $c$ with probability $\pi_c$. Thus, we can
write the cumulative probabilities of the observed outcomes as a mixture of
class specific cumulative probabilities

$$\Pr[y \le j \mid x] = \sum_{c=1}^{C} \pi_c F_{cj}(x) \tag{9}$$

where the $\pi_c$’s sum up to unity. The probability density function of the
ordered outcome is given by $\Pr[y = j \mid x] = \sum_c \pi_c \left[ F_{cj}(x) - F_{c,j-1}(x) \right]$ and
marginal probability effects can be obtained, as before, by taking the first
order derivative with respect to $x_l$

$$MPE_{jl}(x) = \sum_{c=1}^{C} \pi_c \left[ f_{c,j-1}(x) - f_{cj}(x) \right] \beta_{cl} \tag{10}$$

Again, the sign of marginal probability effects is indeterminate because of
the dependence on $\pi_c$ and $\beta_{cl}$ which might differ in magnitude and sign
among classes. The statistical significance of these differences can be tested
by conducting a LR test with restrictions π1 = . . . = πC and β1 = . . . = βC ,
that is, a total number of (C − 1)(k + 1) restrictions. Uebersax (1999) gives
conditions for identification of class specific thresholds and slope parameters.

The parameters of the finite mixture model can be estimated directly
via maximum likelihood. This requires maximization of a (in general multimodal)
log-likelihood function of the form

$$\ln L(\theta, \pi \mid y, x, z) = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \ln \left\{ \sum_{c=1}^{C} \pi_c \left[ F_{cj}(x_i) - F_{c,j-1}(x_i) \right] \right\}$$

where $\theta$ and $\pi$ are shorthand notation for the vectors of class specific parameters
$\theta_c$ (which include thresholds and slopes) and probabilities $\pi_c$, respectively,
and $y_j$ is a binary variable indicating whether $y = j$. The multimodality
of the log-likelihood function and the large number of parameters
for increasing C might cause the optimization routines to be slow in find-
ing the global maximum. Furthermore, although the probability function of
the complete mixture might be well-defined, the probabilities in a subset of
classes can turn negative. An alternative approach of getting the maximum
likelihood estimates that circumvents these problems is to formulate the
model as an incomplete data problem and to apply the EM algorithm of
Dempster et al. (1977).
To be more specific, let mc denote a binary variable indicating individ-
ual class membership which can be interpreted as independent realizations
of a C-component multinomial distribution with component probabilities
πc , the prior probability of belonging to class c. The (complete-data) log-
likelihood function for a random sample of size n conditional on observed
class membership m can be written as

$$\ln L(\theta, \pi \mid y, x, m) = \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{c=1}^{C} y_{ij} \, m_{ci} \left\{ \ln \pi_c + \ln \left[ F_{cj}(x_i) - F_{c,j-1}(x_i) \right] \right\} \tag{11}$$
Since we cannot observe individual class membership, that is the data are
incomplete, we cannot maximize this log-likelihood function directly.
The EM algorithm proceeds iteratively in two steps, based on an E-
step in which the expectation of (11) is taken with respect to m given the
observed data and the current fit of θ and π, and an M-step in which the
log-likelihood function (11) is maximized with respect to θ and π given
expected individual class membership. The linearity of the complete-data
log-likelihood in m allows for direct calculation of the expected individual
class membership given the observed data and the parameters obtained in
the q-th iteration step. This expectation corresponds to the probability of
the ith entity belonging to class c, henceforth called posterior probability τc .
From the assumptions above or simply by Bayes’ theorem it can be shown
that
 
$$\tau_c\left( y, x; \theta^{(q)}, \pi^{(q)} \right) = \frac{\pi_c^{(q)} \left[ F_{cj}^{(q)}(x) - F_{c,j-1}^{(q)}(x) \right]}{\sum_{c=1}^{C} \pi_c^{(q)} \left[ F_{cj}^{(q)}(x) - F_{c,j-1}^{(q)}(x) \right]} \tag{12}$$

where $F_{cj}^{(q)}$ denotes the value of $F$ evaluated at the parameters obtained
in the $q$-th iteration step. These probabilities can be used to analyze the
characteristics of each class, i.e. we can assign each individual to the class
for which its probability is the highest and then derive descriptive statistics
or marginal probability effects per class.
The M-step replaces $m_c$ in (11) by its expectation, $\tau_c$, and therefore considers
the expected log-likelihood to be maximized. Again, the linearity in
(11) provides a substantial simplification of the optimization routine. First,
updated estimates $\pi_c^{(q+1)}$ can be obtained directly by taking the sample
average $n^{-1} \sum_i \tau_c(\cdot)$ where $0 \le \tau_c(\cdot) \le 1$ (see (12)). Secondly, the
contribution of each class can be maximized separately with respect to $\theta_c$ to
get updated estimates $\theta_c^{(q+1)}$ taking into account the multiplicative factor $\tau_c$.
In other words, we can estimate $C$ simple ordered probits or logits while
weighting the data appropriately and alternate the E- and M-steps repeatedly
until the change in the log-likelihood value between iterations is sufficiently small.
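
The E-step in (12) and the update of the class probabilities can be sketched as follows. This is only an illustration: the cell probabilities are hypothetical numbers rather than model output, and the update of the class specific parameters $\theta_c$ (a weighted ordered probit or logit per class) is left out.

```python
def e_step(cell_probs_by_class, pi):
    """Posterior class probabilities (12) for one observation.

    cell_probs_by_class[c] is F_cj(x) - F_c,j-1(x), evaluated at the observed
    category j under the current parameters of class c.
    """
    weighted = [p * f for p, f in zip(pi, cell_probs_by_class)]
    total = sum(weighted)
    return [w / total for w in weighted]


def update_pi(posteriors):
    """Update of the class probabilities: sample average of tau_c."""
    n = len(posteriors)
    return [sum(tau[c] for tau in posteriors) / n
            for c in range(len(posteriors[0]))]


# Hypothetical cell probabilities of the observed outcome for three
# observations under C = 2 classes, and a starting value for pi.
observed_cell_probs = [[0.30, 0.10], [0.05, 0.40], [0.20, 0.20]]
pi = [0.5, 0.5]
taus = [e_step(p, pi) for p in observed_cell_probs]
pi_new = update_pi(taus)
print(taus)
print(pi_new)
```

An observation whose outcome is equally likely under both classes keeps its prior weights, while the other two are pulled towards the class that explains them better.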

3.4. Sequential Model. The last alternative for a flexible ordered re-
sponse model adopts methods from the literature on discrete time duration
data. In this literature, the main quantity of interest is the conditional exit
probability (or “hazard rate”) Pr[y = j|y ≥ j, x], where y is the duration
of the spell and j is the time of exit. The key insight is that such discrete
time hazard rate models can be used for any ordered response y. Once
the conditional transition probabilities are determined, the unconditional
probabilities are obtained from the recursive relationship
$$\Pr[y = j \mid x] = \Pr[y = j \mid y \ge j, x] \, \Pr[y \ge j \mid x], \qquad j = 1, \dots, J \tag{13}$$

where

$$\begin{aligned} \Pr[y \ge 1 \mid x] &= 1 \\ \Pr[y \ge j \mid x] &= \prod_{r=1}^{j-1} \left\{ 1 - \Pr[y = r \mid y \ge r, x] \right\}, \qquad j = 2, \dots, J \end{aligned} \tag{14}$$

and it is understood that $\Pr[y = J \mid y \ge J, x] = 1$. Using (13) and (14)
the whole probability function of $y$ can be expressed in terms of conditionals,
or more precisely, as a sequence of binary choice models where each
decision is made for a specific category j conditional on refusing all cate-
gories smaller than j. This kind of model can be motivated by a sequential
response mechanism where each of the J outcomes can be reached only
step-by-step, starting with the lowest category, and therefore the model is
referred to as the sequential model. This model implicitly accounts for the ordering
information in y without assuming any cardinality in the threshold
mechanism.
To complete the model we specify the conditional transition probabilities
as
$$\Pr[y = j \mid y \ge j, x] = F(\alpha_j + x'\beta_j) = F_j(x), \qquad j = 1, \dots, J \tag{15}$$

where $\alpha_j$ is a category specific constant, $\beta_j$ is a category specific slope
parameter, and it is understood that $\alpha_J = \infty$ such that $F_J(\infty) = 1$. Therefore,
in contrast to previously discussed models, we do not parameterize
the cumulative probabilities but rather the conditional transition proba-
bilities. The parameters can be estimated by running j consecutive binary
choice models where the dependent variable is the binary indicator yj de-
fined in the previous section, and only observations with y ≥ j are included.
Therefore, estimation is simplified considerably compared to the generalized
threshold and the random coefficients model since no further restrictions on
the parameter space are required. The downside is that computation of the
marginal probability effects is now more complicated. It can be shown that
$$\begin{aligned} MPE_{1l}(x) &= f_1(x)\beta_{1l} \\ MPE_{jl}(x) &= f_j(x)\beta_{jl} \Pr[y \ge j \mid x] - F_j(x) \sum_{r=1}^{j-1} MPE_{rl}(x), \qquad j = 2, \dots, J \end{aligned} \tag{16}$$

Clearly, these effects are very flexible, as they can vary by category and
do not rely on a single crossing property or constant relative effects. The
sequential model and the standard model are nonnested models and one
may use information based measures like the Akaike Information Criterion
(AIC) as a model selection criterion. Moreover, for the problem of choosing
among the generalized alternatives the same strategy is advisable.
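
The recursion in (13) and (14) translates directly into a short loop. The sketch below assumes a probit link for the conditional transition probabilities in (15); the function name and the parameter values are hypothetical.

```python
from statistics import NormalDist

F = NormalDist().cdf


def sequential_probs(x, alphas, betas):
    """Pr[y = j | x], j = 1..J, from J - 1 conditional probits via (13) and (14).

    alphas and betas hold the parameters of categories 1..J-1; the conditional
    probability of the last category is one by construction.
    """
    probs, survive = [], 1.0                       # survive = Pr[y >= j | x]
    for alpha, beta in zip(alphas, betas):
        hazard = F(alpha + sum(b * v for b, v in zip(beta, x)))
        probs.append(hazard * survive)             # equation (13)
        survive *= 1.0 - hazard                    # equation (14)
    probs.append(survive)                          # category J
    return probs


# Hypothetical parameters: J = 4 categories, k = 2 regressors.
probs = sequential_probs(
    x=[0.5, -1.0],
    alphas=[-1.0, -0.3, 0.4],
    betas=[[0.2, 0.1], [-0.4, 0.3], [0.1, -0.2]],
)
print(probs, sum(probs))
```

Because each conditional probability lies strictly between zero and one, the implied outcome probabilities are positive and sum to one for any parameter values, which is why no restrictions on the parameter space are needed.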

4. Empirical Illustration
In order to illustrate the benefit of the generalized ordered response models
we analyze the effect of income on happiness using data from the German
Socio-Economic Panel (GSOEP; see also Boes and Winkelmann, 2004). The
relationship between income and happiness was studied before in a number
of papers (see, for example, Easterlin, 1973, 1974; Scitovsky, 1975; Frey and
Stutzer, 2000, 2002; Shields and Wheatley Price, 2004 and the references
therein) and has gained renewed interest in the recent literature because
of its use for valuation of public goods or intangibles (see, for example,
Winkelmann and Winkelmann, 1998; Frey, Luechinger, and Stutzer, 2002;
van Praag and Baarsma, 2005).
We used data from the 1997 wave of the GSOEP and selected a sample of
1735 men aged between 25 and 65. The dependent variable happiness with
originally 11 categories was recoded to avoid cells with low frequency and,
after merging the lower categories 0/1/2 and 3/4, we retained a total of J =
8 ordered response categories. We included among the regressors logarithmic
family income and logarithmic household size as well as a quadratic form
in age, and two dummy variables indicating good health status as well as
unemployment.
In our regression analysis, we assumed that $F$ is the cumulative distribution
function of the standard normal distribution. The random coefficients model
was simplified by restricting $\Omega$ and $\psi$ such that $\sigma_{\tilde{u}}^2 = \Omega_{ll} x_l^2 + 2\psi_l x_l + 1$,

Table 1. Model Selection

                 Ordered     Generalized   Sequential   Random         Finite
                 Probit      Threshold     Probit       Coefficients   Mixture
No. of param.    [13]        [49]          [49]         [15]           [26]
ln L             -3040.58    -2999.59      -2999.12     -3035.88       -3024.65
AIC              6107.16     6097.18       6096.24      6101.76        6101.30

No. of obs.      1735


Notes: The data were drawn from the 1997 wave of the German Socio-Economic
Panel, the dependent variable happiness with originally eleven categories (0-10) was
recoded to avoid cells with low frequency; we subsumed categories 0-2 in j=1, cate-
gories 3/4 in j=2, the remaining in ascending order up to j=8.

where $x_l$ is assumed to be logarithmic income, $\Omega_{ll}$ denotes the $l$-th diagonal
element in $\Omega$ and $\psi_l$ the $l$-th element in $\psi$. Thus, we confine our analysis to
parameter heterogeneity in the income coefficient, with all other parameters
being deterministic. In the finite mixture model, we considered only two
latent classes (C = 2). The following discussion proceeds in two steps: First,
we evaluate the models by means of likelihood ratio tests and selection
criteria, and second, we examine the implications for interpretation in terms
of marginal probability effects.
The first question we address is whether one of the models presented
above uses the information inherent in the data optimally. For this pur-
pose, we perform likelihood ratio tests or AIC comparisons, depending on
the situation. For example, the differences between the generalized threshold
and the standard ordered probit model are statistically significant if we
can reject the null hypothesis of no category specific parameters. This can
be investigated by running a likelihood ratio test with minus two times
the difference between the log-likelihoods of the standard and the generalized
model as appropriate test statistic, showing a value of 81.98. The test
statistic is asymptotically $\chi^2$-distributed with 36 degrees of freedom. Thus,
we can reject the null hypothesis, and thereby the standard ordered probit
model. Likewise, we can compare the random coefficients model as well as
the finite mixture model with the ordered probit, the latter being rejected
in both cases. The sequential model and the standard ordered probit are
nonnested models which rules out the application of a LR test. Instead,
we may calculate the AIC for each model, showing values of 6107.16 and
6096.24 for the ordered probit and the sequential probit, respectively. A
smaller value indicates a better fit while penalizing for the proliferation
of parameters, and, although it has 36 more parameters, we favor the sequential
probit over the ordered probit model. Furthermore, among the generalized
alternatives the generalized threshold and the sequential model have the
smallest AIC values, followed by the finite mixture model and the random
coefficients model.
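
The selection criteria can be recomputed from the log-likelihoods and parameter counts in Table 1. The short Python sketch below does this; the helper names are ours, and the comparison with a $\chi^2$ critical value is left out.

```python
# Log-likelihoods and numbers of parameters as reported in Table 1.
models = {
    "ordered probit":        (-3040.58, 13),
    "generalized threshold": (-2999.59, 49),
    "sequential probit":     (-2999.12, 49),
    "random coefficients":   (-3035.88, 15),
    "finite mixture":        (-3024.65, 26),
}


def aic(loglik, n_params):
    """Akaike Information Criterion: -2 ln L + 2 * (number of parameters)."""
    return -2.0 * loglik + 2.0 * n_params


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Minus two times the difference between the two log-likelihoods."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)


for name, (ll, p) in models.items():
    print(f"{name:22s} AIC = {aic(ll, p):.2f}")

base_ll, base_p = models["ordered probit"]
for name in ("generalized threshold", "random coefficients", "finite mixture"):
    ll, p = models[name]
    print(f"{name:22s} LR = {lr_statistic(base_ll, ll):.2f}, df = {p - base_p}")
```

The degrees of freedom are taken here as the difference in the number of parameters between each alternative and the ordered probit.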
We now turn our attention to average marginal probability effects of income
on happiness. The MPEs of the ordered probit model are reported

Table 2. Marginal Probability Effects of Income on Happiness

         Ordered    Generalized   Sequential   Random     Finite Mixture
         Probit     Threshold     Probit       Coeff.     Class 1     Class 2
j=1      -0.0076    -0.0098       -0.0083      -0.0165    -1.3e-07    -0.0245
j=2      -0.0228    -0.0096       -0.0155      -0.0391    -0.0076     -0.0092
j=3      -0.0223    -0.0352       -0.0338      -0.0297    -0.0024     -0.0565
j=4      -0.0160    -0.0444       -0.0410      -0.0140    -0.0026     -0.0285
j=5      -0.0090     0.0039        0.0095       0.0135    -0.0030      0.0198
j=6       0.0328     0.0680        0.0697       0.0589     0.0028      0.0920
j=7       0.0275     0.0403        0.0334       0.0234     0.0073      0.0069
j=8       0.0173    -0.0133       -0.0140       0.0035     0.0056      5.7e-08

Notes: The table reports average marginal probability effects of logarithmic income
on happiness responses, $AMPE_{j,\ln(\text{income})}$. For example, in the ordered probit model
$AMPE_{6,\ln(\text{income})} = 0.0328$ means that the probability of j = 6 increases by about
0.0328 percentage points given an increase in logarithmic income by 0.01 (which
corresponds to an increase in income by about 1 percent).

in the first column of table 2. Our results show a positive coefficient of
logarithmic income, implying a negative sign of the MPEs for low happiness
responses, switching into the positive for $j \ge 6$. The interpretation
of, for example, $MPE_6 = 0.0328$ is that a one-percent increase in income
raises the probability of happiness = 6 by approximately 0.0328 percentage
points. Compared to the standard model, the generalized threshold and the
sequential model yield substantially different effects (see columns 2 and 3).
First, the sign of $MPE_5$ changes, indicating a positive effect also for the
fifth category. Second, the magnitudes of some MPEs are clearly underestimated
by the standard model. For example, the estimated $MPE_6$ in
the generalized ordered response models is more than twice as large as in
the ordered probit. Third, and probably most important, the sign of the
marginal probability effect in the utmost right part of the outcome distri-
bution turns out to be negative, violating the single crossing requirement of
the simple model. This means that an increase in income actually reduces
the probability of being very happy, a result consistent with the view that
“money does not buy happiness”.
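
The comparison above can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it copies the first four columns of Table 2, counts the sign changes of the effects across categories, and converts an AMPE into a change in percentage points. The names `AMPE`, `sign_changes`, and `change_in_percentage_points` are introduced here purely for illustration.

```python
# Average marginal probability effects copied from Table 2 (rows j = 1..8).
AMPE = {
    "ordered_probit":        [-0.0076, -0.0228, -0.0223, -0.0160, -0.0090, 0.0328, 0.0275, 0.0173],
    "generalized_threshold": [-0.0098, -0.0096, -0.0352, -0.0444, 0.0039, 0.0680, 0.0403, -0.0133],
    "sequential_probit":     [-0.0083, -0.0155, -0.0338, -0.0410, 0.0095, 0.0697, 0.0334, -0.0140],
    "random_coefficients":   [-0.0165, -0.0391, -0.0297, -0.0140, 0.0135, 0.0589, 0.0234, 0.0035],
}


def sign_changes(effects):
    """Count how often the sign flips when moving from j = 1 to j = 8."""
    signs = [e > 0 for e in effects if e != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))


def change_in_percentage_points(ampe, delta_log_income=0.01):
    """Approximate change in a response probability, in percentage points,
    for a given change in log income (0.01 is roughly a 1 percent rise)."""
    return 100 * ampe * delta_log_income


for model, effects in AMPE.items():
    # The effects must sum to (approximately) zero because the
    # category probabilities always sum to one.
    print(model, sign_changes(effects), round(sum(effects), 4))
```

Under single crossing the count is exactly one, which is what the ordered probit column shows. The generalized threshold and sequential columns change sign twice, which is the violation discussed in the text.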
The results of the random coefficients model are reported in the fourth
column of table 2. The calculated MPE's tend to support the results of
the generalized threshold and the sequential model, although there is no
negative effect on the highest happiness response. However, the random
coefficient specification provides further insights into the relationship between
income and happiness. We estimated Ω̂_ll = 0.60 and ψ̂_l = −0.77, the latter
implying that unobservables in the happiness equation are negatively
correlated with the random coefficient. This can be interpreted as follows: If
unobservables in the happiness equation tend to increase the probability of
higher responses, then the effect of income is lower for these individuals.
In the finite mixture model we can make use of the posterior probabilities
to obtain marginal probability effects per class (see columns 5 and 6). The
results indicate that the effect of income on happiness can be neglected for
one class (the relatively happy class with average happiness of 5.71) whereas
for the class of relatively unhappy people (average happiness of 4.25) income
plays a much more important role.
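
How posterior probabilities arise in a finite mixture can be sketched with Bayes' rule. The snippet below is an illustration under stated assumptions, not the authors' code: the class shares and class-specific likelihood values are hypothetical numbers, and the function names are introduced here. Whether class-specific effects are then averaged over individuals assigned to their most likely class or weighted by the posteriors is not specified in this section.

```python
def posterior_class_probabilities(class_shares, class_likelihoods):
    """Bayes' rule for one individual:
    P(class c | y, x) is proportional to share_c * f(y | x, class c)."""
    joint = [share * lik for share, lik in zip(class_shares, class_likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


def most_likely_class(posteriors):
    """Return the 1-based index of the class with the highest posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__) + 1


# Hypothetical inputs: estimated class shares and the likelihood of one
# individual's observed response under each class-specific model.
shares = [0.6, 0.4]
likelihoods = [0.05, 0.20]

post = posterior_class_probabilities(shares, likelihoods)
print(post, most_likely_class(post))
```

In this made-up case the individual is more likely to belong to the second class even though the first class has the larger share, because the observed response is four times as likely under the second class model.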

5. Concluding Remarks

In this paper we argued that the standard ordered probit and ordered logit
models, while commonly used in applied work, are characterized by some
restrictive and therefore undesirable properties. We then discussed four
generalized models, namely the generalized threshold, the random coeffi-
cients, the finite mixture, and the sequential model. All of them are sub-
stantially more flexible in analyzing marginal probability effects since they
do not rely on constant relative effects or a single crossing property.
An illustrative application with data from the 1997 wave of the GSOEP
dealt with the relationship between income and happiness. We asked how a
one-percent increase in income is predicted to change the happiness distribu-
tion, ceteris paribus. The analysis showed that the estimated marginal prob-
ability effects differed markedly between the standard ordered probit model
and the probit-specified alternatives. For example, a negative marginal effect
for the highest answer category (as predicted by the generalized threshold
model) is ruled out by assumption in the standard model.
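
Why the standard model rules this out can be seen from the form of its marginal probability effects. As a sketch, assume the usual ordered probit notation, which may differ from the symbols used earlier in the paper: thresholds κ_0 = −∞ < κ_1 < ... < κ_J = ∞, a single index x'β, and φ the standard normal density.

```latex
\begin{align*}
MPE_{jl}(x) &= \frac{\partial \Pr(y = j \mid x)}{\partial x_l}
  = \bigl[\phi(\kappa_{j-1} - x'\beta) - \phi(\kappa_j - x'\beta)\bigr]\,\beta_l, \\
MPE_{1l}(x) &= -\phi(\kappa_1 - x'\beta)\,\beta_l, \qquad
MPE_{Jl}(x) = \phi(\kappa_{J-1} - x'\beta)\,\beta_l .
\end{align*}
```

Since φ is strictly positive, the effect on the highest category has the sign of β_l for every x, and the effect on the lowest category has the opposite sign. With a positive income coefficient, the average effect on the highest category therefore cannot be negative in the standard model.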
As is common with such generalizations, the models can be computationally
burdensome due to the larger number of parameters, restrictions on the
parameter space, or a multimodality of the likelihood function. Nevertheless,
the greater flexibility and enhanced interpretation possibilities should
render these alternative models indispensable tools in all research situations
where an accurate estimation of the marginal probability effects over the
entire range of the outcome distribution is of interest.

References

Agresti, A. (1999). Modelling Ordered Categorical Data: Recent Advances and
Future Challenges. Statistics in Medicine 18 2191–2207.
Aitchison, J., and S.D. Silvey (1957). The Generalization of Probit Analysis
to the Case of Multiple Responses. Biometrika 44 131–140.
Anderson, J.A. (1984). Regression and Ordered Categorical Variables. Journal
of the Royal Statistical Society. Series B (Methodological) 46 1–30.
Barnhart, H.X., and A.R. Sampson (1994). Overview of Multinomial Models
for Ordered Data. Communications in Statistics – A. Theory and Methods 23
3395–3416.
Bellemare, C., B. Melenberg, and A. van Soest (2002). Semi-parametric
Models for Satisfaction with Income. Portuguese Economic Journal 1 181–
203.
Boes, S., and R. Winkelmann (2004). Income and Happiness: New Results from
Generalized Threshold and Sequential Models. IZA Discussion Paper No. 1175,
SOI Working Paper No. 0407.
Brant, R. (1990). Assessing Proportionality in the Proportional Odds Model for
Ordered Logistic Regression. Biometrics 46 1171–1178.
Clogg, C.C., and E.S. Shihadeh (1994). Statistical Models for Ordered Vari-
ables. Sage Publications, Thousand Oaks.
Cox, C. (1995). Location-Scale Cumulative Odds Models for Ordered Data: A
Generalized Non-Linear Model Approach. Statistics in Medicine 14 1191–
1203.
Dempster, A.P., N.M. Laird, and D.B. Rubin (1977). Maximum Likelihood
from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical
Society. Series B (Methodological) 39 1–38.
Easterlin, R. (1973). Does Money Buy Happiness? Public Interest 30 3–10.
Easterlin, R. (1974). Does Economic Growth Improve the Human Lot? Some
Empirical Evidence. In Nations and Households in Economic Growth: Essays
in Honor of Moses Abramowitz P. David, M. Reder, eds. 89–125. Academic
Press, New York.
Everitt, B.S. (1988). A Finite Mixture Model for the Clustering of Mixed-Mode
Data. Statistics and Probability Letters 6 305–309.
Everitt, B.S., and C. Merette (1990). The Clustering of Mixed-Mode Data: A
Comparison of Possible Approaches. Journal of Applied Statistics 17 283–297.
Fienberg, S.E. (1980). The Analysis of Cross-Classified Categorical Data. MIT
Press, Cambridge, MA.
Frey, B.S., S. Luechinger, and A. Stutzer (2004). Valuing Public Goods:
The Life Satisfaction Approach. CESifo Working Paper No. 1158.
Frey, B.S., and A. Stutzer (2000). Happiness, Economy and Institutions. The
Economic Journal 110 918–938.
Frey, B.S., and A. Stutzer (2002). Happiness and Economics: How the Econ-
omy and Institutions Affect Human Well-Being. Princeton University Press,
Princeton and Oxford.
Maddala, G. (1983). Limited-Dependent and Qualitative Variables in Economet-
rics. Cambridge University Press, Cambridge.
McCullagh, P. (1980). Regression Models for Ordered Data. Journal of the
Royal Statistical Society. Series B (Methodological) 42 109–142.
McKelvey, R., and W. Zavoina (1975). A Statistical Model for the Analysis
of Ordered Level Dependent Variables. Journal of Mathematical Sociology 4
103–120.
Olsson, U. (1979). Maximum-Likelihood Estimation of the Polychoric Correla-
tion Coefficient. Psychometrika 44 443–460.
Ronning, G. (1990). The Informational Content of Responses from Business Sur-
veys. In Microeconometrics. Surveys and Applications J.P. Florens, M. Ivaldi,
J.J. Laffont, F. Laisney, eds. 123–144. Basil Blackwell, Oxford.
Ronning, G., and M. Kukuk (1996). Efficient Estimation of Ordered Probit
Models. Journal of the American Statistical Association 91 1120–1129.
Scitovsky, T. (1975). Income and Happiness. Acta Oeconomica 15 45–53.
Shields, M., and S. Wheatley Price (2005). Exploring the Economic and
Social Determinants of Psychological Well-Being and Perceived Social Support
in England. Journal of the Royal Statistical Society. Series A 168 513–537.
Snell, E.J. (1964). A Scaling Procedure for Ordered Categorical Data. Biomet-
rics 20 592–607.
Stewart, M.B. (2004). A Comparison of Semiparametric Estimators for the Or-
dered Response Model. Computational Statistics and Data Analysis 49 555–
573.
Terza, J. (1985). Ordered Probit: A Generalization. Communications in Statis-
tics – A. Theory and Methods 14 1–11.
Tutz, G. (1990). Sequential Item Response Models with an Ordered Response.
British Journal of Mathematical and Statistical Psychology 43 39–55.
Tutz, G. (1991). Sequential Models in Ordered Regression. Computational Statis-
tics and Data Analysis 11 275–295.
Uebersax, J.S. (1999). Probit Latent Class Analysis with Dichotomous or Or-
dered Category Measures: Conditional Independence/Dependence Models. Ap-
plied Psychological Measurement 23 283–297.
van Praag, B.M.S., and B.E. Baarsma (2005). Using Happiness Surveys to
Value Intangibles: The Case of Airport Noise. The Economic Journal 115
224–246.
Winkelmann, L., and R. Winkelmann (1998). Why Are the Unemployed So
Unhappy? Evidence from Panel Data. Economica 65 1–15.
Winship, C., and R.D. Mare (1984). Regression Models with Ordered Variables.
American Sociological Review 49 512–525.

Stefan Boes                             Rainer Winkelmann
Socioeconomic Institute                 Socioeconomic Institute
University of Zurich                    University of Zurich
Zuerichbergstr. 14, CH-8032 Zurich      Zuerichbergstr. 14, CH-8032 Zurich
Switzerland                             Switzerland
boes@sts.unizh.ch                       winkelmann@sts.unizh.ch
